1. Introduction

International Journal of Reconfigurable Computing (IJRC), Hindawi

ISSN: 1687-7209, 1687-7195

DOI: 10.1155/2019/1949121 (Article ID 1949121)

Research Article

Dimension Reduction Using Quantum Wavelet Transform on a High-Performance Reconfigurable Computer

Naveed Mahmud (https://orcid.org/0000-0001-5570-0547)

Esam El-Araby (https://orcid.org/0000-0002-4575-1049)

Wim Vanderbauwhede

Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66045, USA (ku.edu)

Dates: 04 05 2019; 16 08 2019; 01 09 2019; 11 11 2019

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The high resolution of multidimensional space-time measurements and the enormity of data readout counts in applications such as particle tracking in high-energy physics (HEP) are becoming a major challenge. In this work, we propose combining dimension reduction techniques with quantum information processing for application in domains that generate large volumes of data such as HEP. More specifically, we propose using quantum wavelet transform (QWT) to reduce the dimensionality of high spatial resolution data. The quantum wavelet transform takes advantage of the principles of quantum mechanics to achieve reductions in computation time while processing exponentially larger amounts of information. We develop emulation architectures that are simpler and more optimized than those previously reported, to perform quantum wavelet transform on high-resolution data. We also implement the inverse quantum wavelet transform (IQWT) to accurately reconstruct the data without any losses. The algorithms are prototyped on an FPGA-based quantum emulator that supports double-precision floating-point computations. Experimental work has been performed using high-resolution image data on a state-of-the-art multinode high-performance reconfigurable computer. The experimental results show that the proposed concepts represent a feasible approach to reducing the dimensionality of high spatial resolution data generated by applications such as particle tracking in high-energy physics.

1. Introduction

High-energy physics deals with advanced instruments such as particle accelerators and detectors. These machines use electromagnetic fields to accelerate charged particles to high speeds and create collisions. By studying particle collisions and tracking collision trajectories, physicists can test the predictions of many theories of particle physics, for example, by measuring properties of the Higgs boson [1], discovering new particle families [2], and addressing many other high-energy physics problems [3]. There are a number of high-energy physics (HEP) research centers [4]. The largest particle accelerator is the Large Hadron Collider (LHC) in Geneva, Switzerland. Large-scale general-purpose particle detectors have been developed at the LHC. ATLAS [5] and the Compact Muon Solenoid (CMS) [6] are two examples, which are used for studying the properties of the Higgs boson and investigating new physics. ATLAS has an inner detector that has been used to observe the decay products of collisions. The pixel detector [7] is one of the main components of the inner detector, having over 80 million readout channels [8] (pixels), which contribute half of the total readout channels of the entire experiment. Reconstruction of high-energy particles from the pixel detector is considered a critical design and engineering challenge [9], due to its large readout count, high spatial resolution, and 3D space-time measurements. There have been efforts to improve the tracking performance of the ATLAS Inner Detector [9, 10], which involved the insertion of an additional pixel detector layer (Insertable B-Layer). Another approach that has been considered in the ATLAS FTK (Fast Tracker) upgrade [11] is using variable resolution patterns, where the data from the detector are compared to generated pattern banks of particle tracks and non-intersecting data are filtered.
In high-dimensional datasets, e.g., the pixel detector readout data, not all the measured data variables are relevant in understanding the underlying regions of interest (RoI). Generally, statistical predictive models are applied to multidimensional datasets for detection and pattern matching, which is a computationally expensive process. Thus, an effective method is needed to reduce the dimensionality [12] of the data in such high-dimensional spatial sets, for faster detection and matching.

As a feasible solution to this problem, we propose combining wavelet-based dimension reduction techniques [13–15] with quantum information processing (QIP) [16] for applications in domains that generate high-dimensional data volumes such as high-energy physics (HEP). More specifically, we propose using quantum wavelet transform (QWT) to reduce the dimensionality and high spatial resolution of data in HEP particle tracking. Wavelet-based dimension reduction has been shown to be an effective technique in image preprocessing, reducing computation time, reducing interprocessor overhead, and improving classification accuracy [13–15]. Even so, the large volume of data from domains such as high-energy particle physics presents a challenge for a classical wavelet-based method. The QWT has been demonstrated in previous works to be very useful in quantum image processing and quantum data compression [16–19]. Quantum information processing uses qubits as the basic units of information storage, in contrast to classical bits, and can exploit quantum mechanical properties such as entanglement and superposition [20]. Therefore, applying QIP techniques such as QWT for dimension reduction of HEP data has the potential to bring substantial improvements in storage and computation compared to classical signal processing techniques. To the best of our knowledge, this work is the first to investigate QWT-based dimension reduction for HEP applications. We develop simple and effective algorithms for QWT and inverse QWT (IQWT) that are well suited for dimension reduction and present corresponding emulation hardware architectures for QWT and IQWT.

The objectives of our work are to demonstrate the feasibility of QWT for dimension reduction, through emulation, and to evaluate the performance of the emulation architectures. Our proposed algorithms are prototyped on an FPGA-based quantum emulator that has been developed based on our previous works [21, 22] and has been shown to emulate full quantum algorithms such as quantum Fourier transform (QFT) [23] and Grover’s search algorithm [24]. An FPGA platform was chosen because of its reconfigurability and flexibility in emulating multiple quantum algorithms. The emulator is based on the hardware system of DirectStream [25], which is a state-of-the-art reconfigurable computing platform. This emulation platform can be conveniently used to verify and benchmark future implementations of the proposed system in HEP applications. In the next section, we discuss fundamental concepts of quantum computing, QWT, and the related work done on QWT. In Section 3, we elaborate our proposed methods and emulation architectures. In Section 4, the experimental results and analysis are presented. Section 5 presents our conclusions and future directions of this work.

2. Background and Related Work

In this section, we discuss background concepts of quantum computing and the quantum wavelet transform. We also discuss current and related work on QWT and high-energy particle detection.

2.1. Qubits, Superposition, and Entanglement

The qubit is the smallest unit of quantum information and describes a two-level quantum mechanical system. Physical implementations of the qubit can be electron, atomic, or nuclear spin, where the spin directions of the particle represent the two qubit levels. Other physical representations of the qubit include photon polarization and superconducting Josephson junctions [26]. The qubit is represented theoretically using the Bloch sphere [20], as shown in Figure 1. The basis states of the qubit, $|0\rangle$ and $|1\rangle$, are denoted by the poles of the sphere. The property that distinguishes the qubit from the classical bit is superposition. The qubit can exist in a superposition state, which is any point on the surface of the sphere other than the poles. The overall state of the qubit can be defined using a linear superposition equation $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, where $\alpha$ and $\beta$ are complex numbers determined from $\varphi$ and $\theta$ as shown in Figure 1 and satisfying $|\alpha|^2 + |\beta|^2 = 1$. Another distinguishing property of qubits is entanglement [20]. Two or more qubits can be entangled together, which means each entangled qubit becomes strongly correlated to the others along all possible combinations of the qubits. The outcome of a measurement of one qubit is dependent on the other measurements, but individually the qubits exhibit completely random behavior. In quantum computing, most algorithms assume that the qubits are fully entangled [21]. A system of $n$ entangled qubits can be represented in vector space as $N = 2^n$ complex basis state coefficients.
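As a small illustration of the superposition equation and the normalization condition, the following Python sketch computes the amplitudes from the Bloch angles. It assumes the common convention α = cos(θ/2), β = e^{iφ} sin(θ/2), which may differ from the labeling in Figure 1; the function names are hypothetical and not part of the emulator.

```python
import cmath
import math


def qubit_from_bloch(theta, phi):
    """Return (alpha, beta) for |psi> = alpha|0> + beta|1>.

    Assumed convention: alpha = cos(theta/2), beta = exp(i*phi) * sin(theta/2).
    """
    alpha = complex(math.cos(theta / 2.0), 0.0)
    beta = cmath.exp(1j * phi) * math.sin(theta / 2.0)
    return alpha, beta


def num_basis_states(n_qubits):
    """Number of complex coefficients needed for n entangled qubits (N = 2^n)."""
    return 2 ** n_qubits


# A point on the equator of the sphere: an equal superposition of |0> and |1>.
alpha, beta = qubit_from_bloch(math.pi / 2.0, 0.0)
norm = abs(alpha) ** 2 + abs(beta) ** 2  # should equal 1 up to rounding
```

Under this convention the poles θ = 0 and θ = π give the basis states, and any other point gives a superposition whose squared magnitudes sum to one.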

Figure 1

Bloch sphere representation of a single qubit.

2.2. Quantum Wavelet Transform

The wavelet transform, similar to other transforms such as the Fourier transform, decomposes input signals into their components. The principal difference is that the Fourier transform decomposes input signals into their sinusoidal, orthogonal, temporal-only bases, while the wavelet transform uses a set of non-sinusoidal functions, usually called mother wavelets, that are both spatially and temporally localized [15]. This results in a very important feature unique to the wavelet transform, which is the preservation of the spatial locality of data. In other words, the wavelet transform gives information about both the time and frequency content of the input data. The wavelet transform also has better computation speeds compared to other transforms [14]. Therefore, wavelet transforms are effective and widely used in many image processing applications [16]. The wavelet transform can be effectively implemented in the quantum information processing (QIP) domain as the quantum wavelet transform (QWT) [16, 18, 19]. However, related work on the QWT is scarce and preliminary. This is because quantum computing and QIP are fields that are gradually developing and have not yet reached their full potential. Although large-scale quantum hardware is being developed [27], its useful applications are yet to be established. We discuss the classical wavelet transform first and then apply it in the QIP domain to establish a model for the QWT. The general wavelet transform can be expressed by

$$F(a,b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} f(t)\, \Psi^{*}\!\left(\frac{t-b}{a}\right) dt, \tag{1}$$

where $\Psi^{*}$ is the complex conjugate of the mother wavelet function $\Psi$, and $a$ and $b$ are the time dilation and displacement factors, respectively. Wavelet transforms can be classified as discrete or continuous depending on the use of orthogonal or non-orthogonal wavelets, respectively. For the purposes of this paper, we will discuss the discrete wavelet transform (DWT). The DWT is a decomposition of input signals into a set of wavelet functions that are orthogonal to their translations and scalings.
The first and simplest DWT was introduced by mathematician Alfred Haar [15] and is thus named the Haar wavelet transform. The Haar mother wavelet function can be constructed using a unit step function, $u(t)$, as shown in (2). The discretized version of the Haar wavelet function is defined as (3), where $t = q \cdot \Delta t$, $b = j \cdot \Delta t$, and $a = K \cdot \Delta t$, $\Delta t$ is the sampling period, and $K$ is the Haar window size in samples. Applying (3) in (1), the expression for the discrete Haar wavelet transform can be derived to be (4):

$$\Psi\!\left(\frac{t-b}{a}\right) = u\!\left(\frac{t-b}{a}\right) - 2u\!\left(\frac{t-b}{a} - \frac{1}{2}\right) + u\!\left(\frac{t-b}{a} - 1\right), \tag{2}$$

$$\Psi_D\!\left(\frac{q-j}{K}\right) = \begin{cases} +1, & 0 \le q-j < \dfrac{K}{2}, \\ -1, & \dfrac{K}{2} \le q-j < K, \\ 0, & \text{otherwise}, \end{cases} \tag{3}$$

$$F_D(j,K) = \sum_{q=0}^{N-1} f_D(q \cdot \Delta t)\, \Psi_D\!\left(\frac{q-j}{K}\right), \tag{4}$$

where $N$ is the number of data samples. When doing computation in the quantum domain, there are efficient methods of classical-to-quantum encoding [28–30]. Classical signal samples can be encoded as the coefficients of a quantum state, which is in superposition of its constituent basis states [28, 31]. The signal samples are transformed to a normalized sequence of amplitudes as shown in (5), where $n$ is the number of qubits, $N = 2^n$ is the number of basis states of the quantum system, and $|\psi\rangle$ is the input quantum state. By applying the wavelet transform on the input quantum state, we can formulate the equivalent expression for the quantum Haar wavelet transform (QHT) as (6), where $|\psi_{QHT}\rangle$ is the output quantum state:

$$|\psi\rangle = \sum_{q=0}^{N-1} f(q \cdot \Delta t)\, |q\rangle, \quad \text{where } \sum_{q=0}^{N-1} \left|f(q \cdot \Delta t)\right|^2 = 1, \tag{5}$$

$$|\psi_{QHT}\rangle = \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} \sum_{q=0}^{N-1} f(q \cdot \Delta t)\, \Psi_D\!\left(\frac{q-j}{K}\right) |j\rangle. \tag{6}$$
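To make the discretized Haar function and the amplitude encoding concrete, the following Python sketch evaluates the piecewise definition in (3), the sum in (4), and the normalization required by (5) on a short list of samples. The function names and the sample values are illustrative assumptions, not taken from the original work.

```python
import math


def haar_discrete(q, j, K):
    """Discretized Haar wavelet Psi_D((q - j)/K): +1, -1, or 0 as in (3)."""
    shift = q - j
    if 0 <= shift < K / 2:
        return 1
    if K / 2 <= shift < K:
        return -1
    return 0


def haar_coefficient(samples, j, K):
    """Discrete Haar transform value F_D(j, K) as the sum in (4)."""
    return sum(f * haar_discrete(q, j, K) for q, f in enumerate(samples))


def amplitude_encode(samples):
    """Scale the samples so that their squared magnitudes sum to 1, as in (5)."""
    norm = math.sqrt(sum(abs(f) ** 2 for f in samples))
    if norm == 0:
        raise ValueError("cannot encode an all-zero signal")
    return [f / norm for f in samples]


samples = [4.0, 2.0, 1.0, 3.0]          # hypothetical signal samples
amplitudes = amplitude_encode(samples)  # coefficients of the input state
c0 = haar_coefficient(samples, j=0, K=2)  # window over samples 0 and 1
c2 = haar_coefficient(samples, j=2, K=2)  # window over samples 2 and 3
```

With a window of K = 2 samples, each coefficient reduces to the difference between two neighbouring samples, which is the high-frequency half of the two-point Haar kernel used later.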

There are many notable works on wavelets and applications of wavelet transforms [32–34]. We focused our survey on works that apply the wavelet transform in the field of quantum information processing, i.e., quantum wavelet transforms (QWT). Early work on the QWT was reported in [16], where the authors present gate-level circuits for the quantum Haar wavelet and Daubechies D4 wavelet. They propose techniques for efficient quantum implementation of permutation matrices, which are required for factorization of the unitary operations of the wavelet transforms. In [35], the authors present quantum algorithms for Haar wavelet transforms and demonstrate applications in analyzing the multiscale structure of a dynamical system by logistic mapping. They show the derivation of the quantum wavelet transform by factoring the classical operators into direct sums, direct products, and dot products, which is the same approach as in [16]. The work in [36] also demonstrates similar quantum circuits for QWT based on the well-known pyramid and packet algorithms which are used in classical DWT. The work in [37] presents an analytical study of the effects of imperfections in quantum computation of a QWT-based dynamical model. The authors propose a QWT-based algorithm for the Daubechies wavelets. The works in [38, 39] demonstrate applications of QWT in image watermarking. A more recent watermarking method is proposed in [18], where the authors demonstrate improvement in invisibility and robustness of the watermarked image. The most recent work on QWT is presented in [19], where the authors provide quantum circuit derivations for the Haar and Daubechies wavelet transforms. The authors propose QHT circuits which contain k levels of permutations, where k is the kernel size.

The previous work on the QWT has mostly presented circuits and software simulations, and no hardware implementations were reported. In comparison, our focus is on efficient hardware implementation of the QWT, and we propose an optimized, less resource-intensive approach for emulation on classical hardware. To the best of our knowledge, our work is the first to (1) propose using QHT for reducing data dimensionality and (2) provide hardware emulation architectures for QHT. Our approach is simpler and optimized for emulation because it uses a single Haar kernel model and a pair of permutation models, where the permutation models are implemented as classical circuits. We propose classical circuits for permutation because (1) quantum permutation circuits implemented using multiple levels of swap operations [16, 19, 35, 36] have a large quantum cost, and (2) classical permutation techniques such as index scheduling are space and time efficient for hardware implementation.

Moreover, among the previous work there have been no experimental demonstrations of QWTs on actual quantum hardware or on any quantum emulators. In our work, we present simplified architectures for implementing multilevel, multidimensional QHT operations on classical hardware and propose application of these methods in dimensionality reduction of particle tracking data in high-energy physics applications. Our proposed algorithms and architectures are more easily generalizable than those of previous works. In addition, our proposed architectures use minimal quantum and classical hardware resources, which makes them better suited for dimension reduction. We experimentally evaluate the architectures on a high-throughput and high-accuracy FPGA quantum emulator. To the best of our knowledge, this work is the first to present experimental demonstrations of quantum wavelets used for dimension reduction in large-scale applications, e.g., LHC.

2.3. High-Energy Particle Detectors

The ATLAS Fast Tracker (FTK) is a hardware processor upgrade [9] for the Large Hadron Collider (LHC) which has been developed for faster reconstruction of tracks at 100 kHz. Details of the operating principle, hardware components, and performance can be found in [40]. The reconstruction is done by matching detector data with predefined track patterns that are stored in associative memory on ASICs. The data processing and pattern matching are done using FPGA hardware. The FTK receives data from the ATLAS pixel detector and stores them as clusters to reduce data size. The clusters are arranged into regions for parallel processing. In the processing units (PU), the tracks are stored with full resolution on input FPGAs, while other FPGA processors are responsible for converting the stored data into coarser resolution segments. This is followed by comparison of the coarse-grained segments with pre-stored Monte Carlo track patterns. The coarse granularity of the tracks can cause problems in identification and pattern matching and lead to slower tracking performance of the FTK. In this work, we propose QHT techniques to reduce the dimensionality of full resolution data such as FTK particle tracks. We also demonstrate an FPGA-based hardware prototype that can be easily integrated into the current FTK ATLAS architecture.

3. Methodology and Emulation Architectures

In this section, we elaborate our methodology that uses QHT to achieve dimension reduction. We also detail the corresponding emulation hardware architectures that were implemented [41].

3.1. Dimension Reduction

The classical wavelet transform has been shown to achieve dimension reduction efficiently and can be used in various applications that use hyperspectral data, for example, remote sensing, mineralogy, and surveillance. Depending on the type of data and the application in which these data are being used, both 1D wavelet transform (1D-WT) and 2D wavelet transform (2D-WT) techniques can be used for dimension reduction. For example, while the data in remote sensing hyperspectral imagery are in the form of large 3D data cubes, the 1D-WT was previously proposed [13, 14] for efficient dimensionality reduction of such data cubes. In the experimental work in [14], five levels of wavelet decomposition were used on images of size 217 × 512 pixels by 192 bands to achieve a 32× reduction in data volume. In current and future large-scale applications, the volume of data can be overwhelming. For example, hyperspectral image cubes are typically hundreds of pixels in width and height [13], with 220–240 frequency bands [14]. The ATLAS pixel detector contains 1700 detection modules corresponding to $8 \times 10^7$ pixels [8] and has a bandwidth capacity of 48 Gb/s [11]. Hence, it is necessary to investigate and apply newer paradigms of information processing and storage for supporting future applications at full bandwidths. In quantum information processing, an exponentially greater amount of information can be held in the state of a quantum system compared to a classical binary system. Thus, we propose using quantum information processing techniques such as the QWT for the processing of high volumes of data in large-scale applications. For example, a 64K × 64K image can be reduced to a smaller resolution of 32 × 32 using a 32-qubit, 12-level QWT decomposition. The pixels are encoded as $N$ basis states of a quantum state, where $N = 2^n$ and $n$ is the number of qubits, i.e., 32.
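The relation between image size and qubit count can be checked with a few lines of Python. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that every 2D decomposition level halves both the row and the column count of the low-frequency block, as in a standard Haar decomposition.

```python
def qubits_for_image(n_rows, n_cols):
    """Number of qubits n such that N = n_rows * n_cols = 2^n basis states."""
    n_states = n_rows * n_cols
    if n_states <= 0:
        raise ValueError("image must contain at least one pixel")
    n_qubits = n_states.bit_length() - 1
    if 2 ** n_qubits != n_states:
        raise ValueError("pixel count must be a power of two")
    return n_qubits


def size_after_2d_levels(n_rows, n_cols, n_levels):
    """Low-frequency block size after n_levels of 2D decomposition.

    Assumption: each level halves both dimensions.
    """
    return n_rows >> n_levels, n_cols >> n_levels


n_qubits = qubits_for_image(1024, 1024)           # 2^20 pixels
reduced = size_after_2d_levels(1024, 1024, 2)     # two 2D levels
```

For a 1024 × 1024 image, the size used in the experiments, this gives 20 qubits and a 256 × 256 low-frequency block after two levels.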

Our proposed methodology for dimension reduction using quantum wavelet transforms is shown in Figure 2 [41]. In our proposed approach, each pixel of the input image is encoded as a basis coefficient of a quantum state. Input image data first undergoes a multidimensional quantum Haar transform, e.g., one-dimensional QHT (1D-QHT) or two-dimensional QHT (2D-QHT) operation. The operations can have multiple decomposition levels and separate the input image into a number of low frequency and high frequency replications, depending on the number of decomposition levels. The lowest frequency image replication retains the principal components of the input data without significant data loss. More importantly, the mirror images have reduced dimensionality and thus can be used for reducing preprocessing overhead or communication bandwidth congestion. Multidimensional inverse quantum Haar transform (1D-IQHT or 2D-IQHT) is then applied to reconstruct the original data. The 2D operations can be achieved by cascading 1D operations and multiple permutation sets.

Figure 2

Dimension reduction using 2D-QHT and 2D-IQHT.

The proposed kernel-based algorithms for multilevel 1D-QHT and 2D-QHT are elaborated in Algorithms 1 and 2, respectively. The algorithms perform multilevel decompositions of 1D-QHT or 2D-QHT operations based on a d-dimensional Haar wavelet kernel. The kernel functionality can be represented by a set of operations applied to some input states/pixels and is preceded and followed by perfect shuffle permutation operations [16] on the input and output states/pixels. The permutation operations are performed by means of index calculations and scheduling. Algorithm 1 performs 1D-QHT on a set of input pixels, X, to produce an output pixel set, Y. The input pixels first undergo input permutations, followed by 1D Haar kernel operations on 2 pixels every cycle, and output permutations. Algorithm 2 performs 2D-QHT on a set of input pixels, X, to produce an output pixel set, Y. The input pixels first undergo input permutations, followed by 2D Haar kernel operations on 4 pixels every cycle, and output permutations.

Algorithm 1: Multilevel 1D quantum Haar transform.

Input: X = {x0, x1, …, xN−1}, nrows, ncols, nlevels
Output: Y = {y0, y1, …, yN−1}

nstates = nrows × ncols = N
for ilevel = 1; ilevel ≤ nlevels; ilevel++ do
    for igroup = 0; igroup < nstates/2; igroup++ do
        // Initial scheduler setup
        icolGroup = ⌊igroup/(nrows/2)⌋
        irowGroup = igroup − icolGroup × (nrows/2)
        icol = icolGroup
        irow = 2 × irowGroup
        // Input permutations/scheduler
        iX00 = irow + icol × nrows
        iX10 = iX00 + 1
        X00 ⟵ X[iX00]
        X10 ⟵ X[iX10]
        // 1D-QHT kernel
        Y00 = (X00 + X10)/√2
        Y10 = (X00 − X10)/√2
        // Output permutations/scheduler
        iY00 = (irow + icol × nrows)/2
        iY10 = iY00 + nrows/2
        Y[iY00] ⟵ Y00
        Y[iY10] ⟵ Y10
    end for
end for
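The scheduling in Algorithm 1 can be followed more easily in executable form. The Python sketch below is a minimal software rendering for the simplest case of a single column (ncols = 1, so nrows = N), using the 1/√2 scaling of the Hadamard kernel. The listing does not spell out how the data are passed from one level to the next, so the multilevel wrapper assumes that each further level is applied to the low-frequency half of the previous output; that assumption and the function names are illustrative.

```python
import math


def qht_1d_level(x):
    """One level of 1D-QHT on a vector (assumes n_cols = 1, n_rows = len(x))."""
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("vector length must be even and at least 2")
    scale = 1.0 / math.sqrt(2.0)
    y = [0.0] * n
    for i_group in range(n // 2):
        # Input scheduler: each group reads two adjacent states.
        i_x00 = 2 * i_group
        i_x10 = i_x00 + 1
        x00, x10 = x[i_x00], x[i_x10]
        # 1D Haar kernel: scaled sum (low frequency) and difference (high).
        y00 = (x00 + x10) * scale
        y10 = (x00 - x10) * scale
        # Output scheduler: sums fill the first half, differences the second.
        i_y00 = i_x00 // 2
        i_y10 = i_y00 + n // 2
        y[i_y00] = y00
        y[i_y10] = y10
    return y


def qht_1d(x, n_levels):
    """Multilevel 1D-QHT.

    Assumption: level k+1 is applied to the low-frequency half of level k.
    """
    y = list(x)
    length = len(y)
    for _ in range(n_levels):
        y[:length] = qht_1d_level(y[:length])
        length //= 2
    return y


uniform = [0.5, 0.5, 0.5, 0.5]      # normalized four-state input
two_level = qht_1d(uniform, 2)      # all energy moves to the first coefficient
```

Because the kernel is a scaled Hadamard operation and the schedulers only reorder states, each level preserves the norm of the state vector.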

Algorithm 2: Multilevel 2D quantum Haar transform.

Input: X = {x0, x1, …, xN−1}, nrows, ncols, nlevels
Output: Y = {y0, y1, …, yN−1}

nstates = nrows × ncols = N
for ilevel = 1; ilevel ≤ nlevels; ilevel++ do
    for igroup = 0; igroup < nstates/4; igroup++ do
        // Initial scheduler setup
        icolGroup = ⌊igroup/(nrows/2)⌋
        irowGroup = igroup − icolGroup × (nrows/2)
        icol = 2 × icolGroup
        irow = 2 × irowGroup
        // Input permutations/scheduler
        iX00 = irow + icol × nrows
        iX10 = iX00 + 1
        iX01 = iX00 + nrows
        iX11 = iX01 + 1
        X00 ⟵ X[iX00]
        X10 ⟵ X[iX10]
        X01 ⟵ X[iX01]
        X11 ⟵ X[iX11]
        // 2D-QHT kernel
        Y00 = (X00 + X10 + X01 + X11)/2
        Y10 = (X00 − X10 + X01 − X11)/2
        Y01 = (X00 + X10 − X01 − X11)/2
        Y11 = (X00 − X10 − X01 + X11)/2
        // Output permutations/scheduler
        iY00 = (irow + icol × nrows)/2
        iY10 = iY00 + nrows/2
        iY01 = iY00 + nstates/2
        iY11 = iY01 + nrows/2
        Y[iY00] ⟵ Y00
        Y[iY10] ⟵ Y10
        Y[iY01] ⟵ Y01
        Y[iY11] ⟵ Y11
    end for
end for
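A single decomposition level of Algorithm 2 can be sketched in Python as follows. The sketch assumes column-major storage (index = row + column × nrows), which is what the index expressions in the listing imply, and it shows one level only because the listing does not state how the data are passed between levels. The function name is hypothetical.

```python
def qht_2d_level(x, n_rows, n_cols):
    """One level of 2D-QHT on a column-major image stored as a flat list."""
    n_states = n_rows * n_cols
    if len(x) != n_states or n_rows % 2 or n_cols % 2:
        raise ValueError("need an even-sized image matching len(x)")
    half_rows = n_rows // 2
    y = [0.0] * n_states
    for i_group in range(n_states // 4):
        # Initial scheduler setup: locate the 2x2 block for this group.
        i_col_group = i_group // half_rows
        i_row_group = i_group - i_col_group * half_rows
        i_col = 2 * i_col_group
        i_row = 2 * i_row_group
        # Input scheduler: read the four states of the block.
        i_x00 = i_row + i_col * n_rows
        i_x10 = i_x00 + 1
        i_x01 = i_x00 + n_rows
        i_x11 = i_x01 + 1
        x00, x10, x01, x11 = x[i_x00], x[i_x10], x[i_x01], x[i_x11]
        # 2D Haar kernel: one row of the 4x4 Hadamard matrix per output.
        y00 = (x00 + x10 + x01 + x11) / 2.0
        y10 = (x00 - x10 + x01 - x11) / 2.0
        y01 = (x00 + x10 - x01 - x11) / 2.0
        y11 = (x00 - x10 - x01 + x11) / 2.0
        # Output scheduler: place each output in its own quadrant.
        i_y00 = (i_row + i_col * n_rows) // 2
        i_y10 = i_y00 + half_rows
        i_y01 = i_y00 + n_states // 2
        i_y11 = i_y01 + half_rows
        y[i_y00], y[i_y10], y[i_y01], y[i_y11] = y00, y10, y01, y11
    return y


flat_image = [0.25] * 16                 # normalized constant 4x4 image
result = qht_2d_level(flat_image, 4, 4)  # energy collects in one quadrant
```

For a constant image, all of the energy ends up in the low-frequency quadrant (indices 0, 1, 4, and 5 in the 4 × 4 example) and the other three quadrants are zero.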

To efficiently extract output state data, quantum-to-classical readout techniques [28] such as quantum Fourier transform (QFT) can be employed. However, this was not required to be implemented in this work as full emulation of quantum computation was performed on classical hardware and the output of the emulator is in classical representation. For emulation, we develop circuit models based on these algorithms and integrate them into reconfigurable hardware architectures for multilevel, multidimensional (1D and 2D) QHT and IQHT. These models and emulation architectures are elaborated in the next section.

3.2. Quantum Haar Transform Kernel

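The Haar wavelet kernel can be generalized by quantum operations using $n$ qubits and a $d$-dimension kernel as shown in (7), where $\otimes$ is the Kronecker product [42], $H$ is the Hadamard transform [20], and $I$ is an identity matrix. Here, a group of entangled gates is denoted by the gate symbol with the size of the equivalent operation matrix as subscript, for example, $H_{2^d}$. The quantum Haar function can be implemented using $d$ entangled $H$ gates and $n-d$ entangled $I$ gates as shown in (7). For example, the transformation matrix for 2D-QHT with $d = 2$ can be derived as shown in (9):

$$U_{QHT} = I_{2^{n-d}} \otimes H_{2^d}, \tag{7}$$

where

$$H_{2^d} = \underbrace{H \otimes H \otimes \cdots \otimes H}_{d}, \quad I_{2^{n-d}} = \underbrace{I \otimes I \otimes \cdots \otimes I}_{n-d}, \quad H = H_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \quad I = I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \tag{8}$$

where $n$ is the number of qubits and $d$ is the kernel dimension:

$$U_{QHT}^{2D} = I_{2^{n-2}} \otimes H_{2^2} = I_{2^n/4} \otimes H_4 = I_{N/4} \otimes H_4, \tag{9}$$

where

$$H_4 = H \otimes H = \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}. \tag{10}$$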
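The Kronecker-product construction in (7) to (10) can be reproduced numerically for small n. The following dependency-free Python sketch builds the kernel matrix from nested lists; the helper names are hypothetical, and the dense matrix is only practical for a handful of qubits, which is one reason the emulation architectures apply the kernel to a few coefficients at a time instead.

```python
import math


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]


def identity(size):
    """Identity matrix of the given size."""
    return [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]


def hadamard_power(d):
    """H tensored with itself d times, i.e. the 2^d x 2^d matrix in (8)."""
    s = 1.0 / math.sqrt(2.0)
    h = [[s, s], [s, -s]]
    out = [[1.0]]
    for _ in range(d):
        out = kron(out, h)
    return out


def u_qht(n, d):
    """Kernel matrix of (7): identity on n-d qubits, Hadamard on d qubits."""
    return kron(identity(2 ** (n - d)), hadamard_power(d))


h4 = hadamard_power(2)   # 4x4 kernel of (10)
u = u_qht(3, 2)          # 8x8 block-diagonal matrix with two H4 blocks
```

For n = 3 and d = 2 the result is block diagonal with two copies of the 4 × 4 kernel, matching the form in (9) with N/4 = 2.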

3.3. Permutation Operations

Perfect shuffle permutation on a given vector is described as partitioning the vector in half and interleaving the elements of the top and bottom halves [16]. In our algorithms for QHT and IQHT, we apply similar input and output permutation operations before and after applying the QHT kernel, respectively. The QHT kernel is performed on a set of $k$ points, where $k = 2^d$. An input permutation operation involves dividing the input vector of size $N$ into $k$ groups and selecting a state (pixel) from every group to be applied to the kernel operation. For 1D-QHT and 2D-QHT operations, the input permutations, $P_{in}^{1D}$ and $P_{in}^{2D}$, are shown in (11) and (12), respectively. An output permutation involves arranging the pixels from $k$ groups into a single output state sequence. The output permutations for 1D-QHT and 2D-QHT operations, $P_{out}^{1D}$ and $P_{out}^{2D}$, are shown in (13) and (14), respectively:

$$P_{in}^{1D}: \left(x_0, x_1, \ldots, x_{n_{rows}}, x_{n_{rows}+1}, \ldots, x_{N-1}\right)^T \mapsto \left(x_0, x_1, \ldots, x_{n_{rows}}, x_{n_{rows}+1}, \ldots, x_{N-1}\right)^T, \tag{11}$$

$$P_{in}^{2D}: \left(x_0, x_1, x_2, x_3, \ldots, x_{n_{rows}}, x_{n_{rows}+1}, x_{n_{rows}+2}, x_{n_{rows}+3}, \ldots, x_{N-1}\right)^T \mapsto \left(x_0, x_1, x_{n_{rows}}, x_{n_{rows}+1}, x_2, x_3, x_{n_{rows}+2}, x_{n_{rows}+3}, \ldots, x_{N-1}\right)^T, \tag{12}$$

$$P_{out}^{1D}: \left(y_0, y_{n_{rows}/2}, y_1, y_{n_{rows}/2+1}, \ldots, y_{N-1}\right)^T \mapsto \left(y_0, y_1, y_2, y_3, \ldots, y_{N-1}\right)^T, \tag{13}$$

$$P_{out}^{2D}: \left(y_0, y_{n_{rows}/2}, y_{N/2}, y_{n_{rows}/2+N/2}, \ldots, y_{N-1}\right)^T \mapsto \left(y_0, y_1, y_2, y_3, \ldots, y_{N-1}\right)^T. \tag{14}$$
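The permutations can be explored with plain index arithmetic. The Python sketch below shows a textbook perfect shuffle and its inverse, followed by the read order that the 2D input scheduler of Algorithm 2 would generate for a small image. It assumes column-major indexing, and the function names are illustrative rather than part of the emulator.

```python
def perfect_shuffle(v):
    """Interleave the first and second halves of an even-length list."""
    half = len(v) // 2
    out = []
    for top, bottom in zip(v[:half], v[half:]):
        out += [top, bottom]
    return out


def perfect_unshuffle(v):
    """Inverse of perfect_shuffle: even positions first, then odd positions."""
    return v[0::2] + v[1::2]


def input_order_2d(n_rows, n_cols):
    """Read order of the 2D input scheduler for a column-major image."""
    half_rows = n_rows // 2
    order = []
    for i_group in range(n_rows * n_cols // 4):
        i_col = 2 * (i_group // half_rows)
        i_row = 2 * (i_group % half_rows)
        i_x00 = i_row + i_col * n_rows
        order += [i_x00, i_x00 + 1, i_x00 + n_rows, i_x00 + n_rows + 1]
    return order


shuffled = perfect_shuffle([0, 1, 2, 3, 4, 5])
order = input_order_2d(4, 4)
```

For nrows = 4 the read order begins with indices 0, 1, 4, 5, 2, 3, 6, 7, which matches the pattern x0, x1, x_nrows, x_nrows+1, x2, x3, x_nrows+2, x_nrows+3 in (12).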

3.4. Emulation Architectures

While developing the emulation architectures for the proposed system, as an intermediate step, we design circuit models, illustrated in Figures 3 and 4 for 1D- and 2D-QHT/IQHT, respectively. These models are derived from the sequence of operations in Algorithms 1 and 2 and can contain quantum and/or classical circuits. The 1D- and 2D-QHT models in Figures 3(a) and 4(a), respectively, consist of input permutation models ($P_{in}$), followed by Haar kernel models ($U_{QHT}$) and then output permutation models ($P_{out}$). The 1D- and 2D-IQHT models in Figures 3(b) and 4(b), respectively, consist of inverse output permutation models $P_{out}^{-1}$, followed by Haar kernel models ($U_{QHT}$) and then inverse input permutation models $P_{in}^{-1}$. The inverse models are equivalent to the direct models, as the permutation operations are reversible. To achieve multilevel decompositions, multiple iterations of the Haar kernel models are applied. The QHT and IQHT operations for 1D and 2D are summarized as unitary transformations in (15) and (16), respectively. The emulation architectures of the 1D-QHT/IQHT and 2D-QHT/IQHT are shown in Figures 5 and 6, respectively. Since the hardware implementations of the 1D and 2D are similar, we focus our following discussions on the implementation of the 2D-QHT emulation architectures:

$$\text{1D-QHT}: P_{out}^{1D} \cdot U_{QHT}^{1D} \cdot P_{in}^{1D}, \qquad \text{1D-IQHT}: \left(P_{in}^{1D}\right)^{-1} \cdot U_{QHT}^{1D} \cdot \left(P_{out}^{1D}\right)^{-1}, \tag{15}$$

$$\text{2D-QHT}: P_{out}^{2D} \cdot U_{QHT}^{2D} \cdot P_{in}^{2D}, \qquad \text{2D-IQHT}: \left(P_{in}^{2D}\right)^{-1} \cdot U_{QHT}^{2D} \cdot \left(P_{out}^{2D}\right)^{-1}. \tag{16}$$
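The structure of (15) can be checked in software by reusing one kernel routine with the read and write schedules swapped. The Python sketch below does this for a single 1D level on a vector (assuming ncols = 1); it relies on the fact that the Hadamard kernel is its own inverse, so the same kernel appears in both directions. The function names are hypothetical.

```python
import math


def schedules(n):
    """Index schedules for one 1D level on a vector of length n.

    p_in is the identity order of (11); p_out sends the two outputs of
    group g to positions g and g + n/2, as in (13).
    """
    p_in = list(range(n))
    p_out = []
    for g in range(n // 2):
        p_out += [g, g + n // 2]
    return p_in, p_out


def apply_kernel(x, read_idx, write_idx):
    """Gather pairs via read_idx, apply the Haar kernel, scatter via write_idx."""
    scale = 1.0 / math.sqrt(2.0)
    y = [0.0] * len(x)
    for g in range(len(x) // 2):
        a = x[read_idx[2 * g]]
        b = x[read_idx[2 * g + 1]]
        y[write_idx[2 * g]] = (a + b) * scale
        y[write_idx[2 * g + 1]] = (a - b) * scale
    return y


def qht_1d(x):
    """Forward transform: read with P_in, write with P_out."""
    p_in, p_out = schedules(len(x))
    return apply_kernel(x, p_in, p_out)


def iqht_1d(y):
    """Inverse transform: read with P_out (undoing it), write with P_in."""
    p_in, p_out = schedules(len(y))
    return apply_kernel(y, p_out, p_in)


state = [0.1, 0.7, 0.5, 0.5]          # normalized four-state example
recovered = iqht_1d(qht_1d(state))    # should reproduce the input
```

The round trip reproduces the input up to floating-point rounding, consistent with the lossless reconstruction described for the IQHT.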

Figure 3

(a) 1D-quantum Haar transform circuit. (b) 1D-inverse quantum Haar transform circuit.

Figure 4

(a) 2D-quantum Haar transform circuit. (b) 2D-inverse quantum Haar transform circuit.

Figure 5

Emulation architectures for the (a) 1D input permutation, (b) 1D-Haar kernel, and (c) 1D output permutation.

Figure 6

Emulation architectures for the (a) 2D input permutation, (b) 2D-Haar kernel, and (c) 2D output permutation.

As shown in Figure 4(a) and (16), the first step in the 2D-QHT operation is the input permutation $P_{in}^{2D}$, which is described by (12). The permutations can be modeled as quantum circuits with multiple swap gates, but that would incur high resource utilization in the corresponding emulation architecture. For this reason, we use classical models that involve simple index scheduling, and the corresponding emulation architecture is shown in Figure 6(a). The input is a vector of quantum state coefficients which are written to a memory array in the index order 0 to N−1. Four coefficient values are then read out each clock cycle, with the scheduler generating the read indices iX00, iX01, iX10, and iX11 according to the input permutation (see Algorithm 2 and (12)). The scheduler maintains a counter, a row index irow, and a column index icol to calculate the output indices. Multiplications and divisions by powers of two are replaced by logical shifts to optimize area and speed. The scheduler also requires a floor operation unit.

As shown in Figure 4(b) and (9), the 2D-Haar transformation, $U_{QHT}^{2D}$, is modeled using a pair of Hadamard gates. The Hadamard pair operation reduces to kernel operations on a set of four coefficients, as described in Algorithm 2. The emulation architecture for the 2D-Haar kernel is shown in Figure 6(b). The design takes in four input coefficients, applies the kernel operations, which involve addition and division, and outputs four coefficients per clock cycle. Conventional operator sharing techniques and logical shifts are applied to optimize for speed and area.

The final step in the 2D-QHT operation is the output permutation, $P_{out}^{2D}$, described by (14). The corresponding emulation architecture is shown in Figure 6(c) and works similarly to the input permutation scheduler. The input vector of coefficients is written to a memory array, four values per clock cycle, with the scheduler generating the write indices iY00, iY01, iY10, and iY11 according to the output permutation described in Algorithm 2. The permuted coefficients are then read out from memory, four values per clock cycle.

The emulation hardware architectures, i.e., input/output schedulers and 1D/2D Haar kernels, were integrated into a reconfigurable quantum emulator design based on our previous works [21, 22], whose high-level architecture is shown in Figure 7. The emulator stores input and output quantum states as vectors of the state coefficients, and core kernel operations are extracted from the input quantum algorithm. The input state vector goes through the input permutations (input schedulers) before the kernel operation is applied iteratively across each state. To get the correct final quantum state that represents the transformed data, the output permutation (output schedulers) is applied. The emulator uses a fully pipelined dataflow architecture and supports single- and double-precision floating-point arithmetic. For example, in single precision each quantum state coefficient is complex and is modeled with 32-bit floating-point precision for each of the real and imaginary components. The emulator also supports features such as fully entangled input quantum state preparation from a set of input qubits and output quantum state measurement as a classical bit string. The emulator is generic and can efficiently run a given quantum algorithm that can be reduced to its corresponding unitary transformation.

Figure 7

Reconfigurable quantum emulator architecture.

4. Experimental Work

The experimental work was performed on DS8, a state-of-the-art high-performance reconfigurable computing (HPRC) system developed by DirectStream [25]. On the DS8 platform, developers can build applications on hardware systems ranging from single-node compute instances to multinode structures, see Figure 8. A single C2 compute node of the DS8 system is equipped with a high-end Intel-Altera Arria 10AX115N4F45E3SG FPGA, with on-chip resources such as adaptive logic modules (ALMs), block RAMs (BRAMs), digital signal processors (DSPs), and on-board resources such as two 32 GB SDRAM memory banks and four 8 MB SRAM memory banks, as shown in Figure 8. A user-friendly programming environment, previously known as Carte-C [43], is integrated into the DS hardware systems. A high-level language (HLL) facilitates the development of complex, parallel, and reconfigurable codes in an efficient manner. The study in [44] showed that Carte-C has a highly productive environment, short acquisition time, and short learning time as well as a short development time. The DS8 architecture provides a combination of high performance, high scalability, runtime reconfiguration, and ease of use.

Figure 8

DS8 platform architectures. (a) Single compute node. (b) Multinode instance. (c) Node types.

The QHT and IQHT architectures were implemented using C++ on the DS8 programming environment. Input images with resolutions of up to 1024×1024 pixels and 256 grayscale levels were used to test the designs. MATLAB was used to convert the images into grayscale, generate the input vectors for DS8, and reconstruct images from the output vectors. Synthesis and hardware builds were performed using Quartus Prime Version 17.02 on the DS8 environment. Figure 9(a) shows one of the input images converted to grayscale, and Figure 9(b) is the output after a 1D-QHT operation with 1 level of decomposition. Figure 9(c) is the output after a 1D-QHT operation with 2 levels of decomposition, and Figure 9(d) shows the reconstructed image after a 1D-IQHT operation was applied. Figures 10(a)–10(d) show the results from repeating the experiment using the 2D-QHT and 2D-IQHT architectures.
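The pre- and post-processing steps described above (grayscale pixels to input vector, output vector back to an image) can be sketched as follows. This is only an illustration in Python rather than the MATLAB scripts used in the experiments; the function names are hypothetical, and the row-major flattening and L2 normalization are assumptions about how pixel values would be mapped to state coefficients.

```python
import math

def encode_image(pixels):
    """Flatten a grayscale image (list of rows, values 0-255) into a
    unit-norm state vector; returns (amplitudes, norm, num_qubits)."""
    flat = [float(v) for row in pixels for v in row]
    n = len(flat)
    num_qubits = n.bit_length() - 1
    if n == 0 or 2 ** num_qubits != n:
        raise ValueError("pixel count must be a power of two")
    norm = math.sqrt(sum(v * v for v in flat))
    if norm == 0.0:
        raise ValueError("an all-zero image cannot be normalized")
    return [v / norm for v in flat], norm, num_qubits

def decode_image(amplitudes, norm, width):
    """Rescale amplitudes back to pixel values and restore the row layout."""
    flat = [round(a * norm) for a in amplitudes]
    return [flat[r:r + width] for r in range(0, len(flat), width)]

# Illustrative 4x4 image: 16 pixels map to a 4-qubit state.
image = [[0, 64, 128, 255],
         [255, 128, 64, 0],
         [10, 20, 30, 40],
         [200, 150, 100, 50]]
amps, norm, qubits = encode_image(image)
```

With this mapping, a 16×16 image corresponds to 8 qubits and a 1024×1024 image to 20 qubits, matching the qubit counts used in the experiments.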

Figure 9

Experimental results of multilevel decomposition and reconstruction with 1D-QHT and 1D-IQHT. (a) Original image. (b) 1-level 1D-QHT. (c) 2-level 1D-QHT. (d) Reconstructed image using 1D-IQHT.

Figure 10

Experimental results of multilevel decomposition and reconstruction with 2D-QHT and 2D-IQHT. (a) Original image. (b) 1-level 2D-QHT. (c) 2-level 2D-QHT. (d) Reconstructed image using 2D-IQHT.

Resource utilizations from the hardware implementations are summarized in Tables 1 and 2 for 1D and 2D, respectively. The on-chip resources (ALMs, BRAMs, DSPs) are consumed by the static components of the design, such as counters, adders, and shift operators, and hence remain constant as the emulated circuit size (number of qubits) increases. The low on-chip resource utilizations indicate that our proposed approach and emulation architecture designs are highly space-efficient. The 1D-QHT architecture consumes fewer on-chip resources than 2D-QHT because its kernel operations are less complex. The low resource utilizations also indicate that the QHT and IQHT designs can be flexibly integrated with larger algorithms.

Table 1

1D-QHT implementation results on Arria 10AX115N4F45E3SG FPGA.

Number of pixels	Number of qubits	ALMs∗ (%)	BRAMs∗ (%)	DSPs∗ (%)	SDRAM∗∗ (bytes)	Emulation time (sec)
16 × 16	8	11	8	1	4 K	0.00018
32 × 32	10	11	8	1	16 K	0.00071
64 × 64	12	11	8	1	64 K	0.00285
128 × 128	14	11	8	1	256 K	0.01139
256 × 256	16	11	8	1	1 M	0.04557
512 × 512	18	11	8	1	4 M	0.18226
1024 × 1024	20	11	8	1	16 M	0.72905

∗ Total chip resources: N_ALM = 427,200; N_BRAM = 2,713; N_DSP = 1,518. ∗∗Total on-board SDRAM memory: 2 parallel banks of 32 GB each.

Table 2

2D-QHT implementation results on Arria 10AX115N4F45E3SG FPGA.

Number of pixels	Number of qubits	ALMs∗ (%)	BRAMs∗ (%)	DSPs∗ (%)	SDRAM∗∗ (bytes)	Emulation time (sec)
16 × 16	8	14	9	2	4 K	0.00012
32 × 32	10	14	9	2	16 K	0.00047
64 × 64	12	14	9	2	64 K	0.00187
128 × 128	14	14	9	2	256 K	0.00746
256 × 256	16	14	9	2	1 M	0.02982
512 × 512	18	14	9	2	4 M	0.11926
1024 × 1024	20	14	9	2	16 M	0.47704

∗ Total chip resources: N_ALM = 427,200; N_BRAM = 2,713; N_DSP = 1,518. ∗∗Total on-board SDRAM memory: 2 parallel banks of 32 GB each.

The SDRAM memory requirements for storing the input and output images as quantum state vectors are also reported in Tables 1 and 2. For the highest-resolution image of size 1024×1024, the state vectors occupy only about 0.024% (16 MB) of the total on-board SDRAM memory (64 GB) available on a single DS node. The pixels of the input images are encoded as basis coefficients of a quantum state. For example, to store 16×16 or 256 pixels, we need 256 complex coefficients, each of which has a real and an imaginary component occupying a total of 2×4=8 bytes in 32 bit floating-point representation. Therefore, storing both the input and output images required 2×256×8=4096 bytes of memory. The memory usage obtained for larger QHT circuits is consistent with the expected values.
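The memory arithmetic above generalizes directly to the other rows of Tables 1 and 2. The short sketch below reproduces it; the function and constant names are hypothetical, and the 64 GB capacity is treated as 64 × 2^30 bytes, which is an assumption about the unit convention.

```python
BYTES_PER_COMPONENT = 4           # 32 bit floating point
COMPONENTS_PER_COEFF = 2          # real and imaginary parts
STATE_VECTORS = 2                 # input and output state vectors
SDRAM_TOTAL = 2 * 32 * 2 ** 30    # two 32 GB banks (binary units assumed)

def sdram_bytes(num_qubits):
    """Bytes needed to hold the input and output state vectors."""
    coeffs = 2 ** num_qubits      # one coefficient per pixel / basis state
    return STATE_VECTORS * coeffs * COMPONENTS_PER_COEFF * BYTES_PER_COMPONENT

def sdram_fraction(num_qubits):
    """Share of the on-board SDRAM used by the two state vectors."""
    return sdram_bytes(num_qubits) / SDRAM_TOTAL

for q in range(8, 22, 2):
    print(f"{q:2d} qubits: {sdram_bytes(q):>10d} bytes "
          f"({100 * sdram_fraction(q):.4f}% of SDRAM)")
```

For 8 qubits this gives the 4096 bytes derived in the text, and for 20 qubits it gives the 16 M bytes listed in the tables.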

The hardware designs on the FPGA were pipelined to sustain a constant, high operating frequency of 233 MHz. The emulation times obtained for high-resolution images are also practical. For a 1024×1024 image, 20 qubits were sufficient to achieve dimension reduction using 1D-QHT and 2D-QHT. Our experimental results show that the emulation time increases linearly with the number of image pixels (states), as illustrated in Figure 11. This is because a large portion of the emulation time is spent writing in and reading out the input/output state vectors of size N (the number of pixels); hence, the emulation time complexity is O(N). This indicates the benefit of using quantum encoding of data, i.e., encoding each image pixel as a basis state coefficient in the quantum state space. Finally, the emulation times for 1D-QHT are higher than those for 2D-QHT because the 1D algorithm requires N/2 iterations, compared to N/4 iterations in the 2D algorithm (see Algorithms 1 and 2).

Figure 11

Emulation time as a function of data size (number of pixels).
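The linear trend can be checked directly against the emulation times listed in Tables 1 and 2. The sketch below is only a sanity check on the reported numbers (variable names are hypothetical): if the time is linear in N, each fourfold increase in pixel count should increase the time by a factor close to four.

```python
# Pixel counts for 16x16 ... 1024x1024 images (each step is 4x larger).
num_pixels = [4 ** k for k in range(4, 11)]

# Emulation times in seconds, copied from Tables 1 and 2.
t_1d = [0.00018, 0.00071, 0.00285, 0.01139, 0.04557, 0.18226, 0.72905]
t_2d = [0.00012, 0.00047, 0.00187, 0.00746, 0.02982, 0.11926, 0.47704]

def growth_ratios(times):
    """Ratio between consecutive emulation times."""
    return [b / a for a, b in zip(times, times[1:])]

ratios_1d = growth_ratios(t_1d)
ratios_2d = growth_ratios(t_2d)

# Per-pixel cost and 1D/2D ratio at the largest image size.
per_pixel_1d = t_1d[-1] / num_pixels[-1]
per_pixel_2d = t_2d[-1] / num_pixels[-1]
ratio_1d_to_2d = t_1d[-1] / t_2d[-1]

print([round(r, 3) for r in ratios_1d])
print([round(r, 3) for r in ratios_2d])
print(round(ratio_1d_to_2d, 3))
```

All consecutive ratios fall close to four. The 1D-to-2D time ratio at 1024×1024 is about 1.5 rather than 2, which would be consistent with part of the time going to state-vector input/output that both variants share; this reading is an inference from the tabulated values, not a separately reported measurement.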

In general, on a classical emulation platform, the emulation execution time increases with both the spatial and temporal complexities of the quantum circuit. In other words, the emulation time of a quantum circuit on a classical platform is generally a function of both the circuit width (number of qubits) and depth (number of gate levels). Owing to the optimizations and encoding techniques we used, the emulation time of our proposed emulation architectures is a function of only the quantum circuit width (number of qubits), as shown by our experimental results. On state-of-the-art superconducting NISQ devices [45, 46], the execution time is a function of only the depth (number of gate levels) of the circuit [47]. For our proposed 1D-QHT and 2D-QHT circuits, which are simple quantum circuits of depth 1, we estimate an execution time of 0.01 ms on a typical NISQ device processing a 7×7 qubit array with a sampling frequency of 100 kHz [47]. The estimated execution time is constant for a fixed circuit depth and a variable number of qubits in the quantum processing unit (QPU) array; i.e., the time complexity is theoretically O(1). In comparison, the time complexity of our emulation is O(N).
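The contrast between the two cost models can be made concrete with a small sketch. It assumes, purely for illustration, that the NISQ execution time is the circuit depth divided by the sampling frequency, and that the emulation time is the pixel count multiplied by a per-pixel cost taken from the 1024×1024 row of Table 2; the function names are hypothetical.

```python
def nisq_time_s(depth, sampling_hz):
    """Assumed NISQ model: one sampling period per gate level, independent of N."""
    return depth / sampling_hz

def emulation_time_s(num_pixels, seconds_per_pixel):
    """Assumed emulation model: cost grows linearly with the state-vector size N."""
    return num_pixels * seconds_per_pixel

# Per-pixel cost derived from the 2D-QHT result for a 1024x1024 image.
SECONDS_PER_PIXEL_2D = 0.47704 / 2 ** 20

nisq_estimate = nisq_time_s(depth=1, sampling_hz=100e3)   # 1e-5 s = 0.01 ms
emulated_20_qubits = emulation_time_s(2 ** 20, SECONDS_PER_PIXEL_2D)

print(f"NISQ estimate (depth 1): {nisq_estimate * 1e3:.2f} ms")
print(f"Emulation, 20 qubits:    {emulated_20_qubits:.5f} s")
```

Under these assumptions the NISQ estimate stays at 0.01 ms for any number of pixels, while the emulated time scales with N.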

Our emulation experiments and implementations help validate the functionality and feasibility of the proposed QHT-based methodology for dimension reduction of high-resolution images. The emulation results support the applicability of the proposed system to fast, efficient processing of particle tracking data in the large-scale, high-energy physics domain. The emulation is memory-bound by the resources on a single DS FPGA node. For larger-scale emulation, the on-board memory has to be increased, or multinode and/or multichassis architectures of the DS system can be used in conjunction with efficient scheduling techniques and high-bandwidth networks [22].

We further compare our experimental results quantitatively with existing FPGA-based emulation work [48–53], as shown in Table 3. Among the related work on FPGA emulation of quantum circuits, our emulator is capable of emulating the largest quantum circuits (QFT, QHT, and Grover’s search), with the highest operating frequency (233 MHz) and high precision (32 bit floating-point). Published FPGA hardware emulators report their results inconsistently (resource utilization, operating frequency, or emulation time is often missing), which makes a comprehensive comparison difficult. In our comparison, we included only hardware emulators, as most parallel software simulators are based on large-scale supercomputers such as Summit [47] and Sunway [54], which are extremely costly, power-hungry, and resource-hungry and are not comparable with FPGA emulators. In addition, they provide simulations of random quantum circuits and not of full quantum algorithms.

Table 3

Comparison of the proposed work against previous works of FPGA emulation.

Reported work	Algorithm	Number of qubits	Precision	Frequency (MHz)	Emulation time (sec)
Fujishima [48]	Shor’s factoring	—	—	80	10
Khalid et al. [49]	QFT	3	16 bit fixed pt.	82.1	61E − 9
Khalid et al. [49]	Grover’s search	3	16 bit fixed pt.	82.1	84E − 9
Aminian et al. [50]	QFT	3	16 bit fixed pt.	131.3	46E − 9
Lee et al. [51]	QFT	5	24 bit fixed pt.	90	219E − 9
Lee et al. [51]	Grover’s search	7	24 bit fixed pt.	85	96.8E − 9
Silva et al. [52]	QFT	4	32 bit floating pt.	—	4E − 6
Pilch et al. [53]	Deutsch	2	—	—	—
Mahmud et al. [22]	QFT	5	32 bit floating pt.	233	4.63E − 4†
Mahmud et al. [22]	Grover’s search	5	32 bit floating pt.	233	4.38E − 7†
Proposed work	QFT	20	32 bit floating pt.	233	18.4
Proposed work	QHT	20	32 bit floating pt.	233	0.477
Proposed work	Grover’s search	22	32 bit floating pt.	233	7.5E04

† Results obtained after publication.

5. Conclusions

Quantum information processing and quantum computing will have significant implications for the future of computing technology. As current quantum technology continues to improve, there is a great need to investigate useful applications of quantum information theory. In this work, we presented what is, to the best of our knowledge, a first effort to efficiently reduce data dimensionality using quantum processing methods such as the quantum wavelet transform. We propose to apply these techniques in physics applications that investigate high-energy particle detection and tracking, where dimension reduction helps to reduce communication bandwidth and speed up preprocessing computations. Our proposed architectures are simpler and better optimized for hardware implementation than previously reported works. We demonstrated the minimal resource utilization, high performance/throughput, and high precision of the proposed architectures. We prototyped our designs on a quantum emulator and demonstrated the feasibility of the proposed techniques by conducting experiments using high-resolution test image data.

Due to limitations of the current state of quantum technology, e.g., cost, availability, and current scale (size) of quantum processors, it is beyond the scope of this work to actually implement the system and measure its performance. Although not yet integrated with the ATLAS FTK project, the proposed approach and emulation hardware architectures are feasible for future implementations as current quantum technology matures. For future integration into the ATLAS FTK project, data conversion techniques such as quantum-to-classical and classical-to-quantum conversion, which are heavily researched current topics, must be perfected first, and we plan to investigate these techniques in our future work. Our future plans also include applying the proposed methods to real HEP data and combining QHT with Grover’s search algorithm as a complete solution to HEP FTK problems. We will also investigate 3D-QHT, Daubechies wavelet transforms, and their application to real-time data streaming.

Data Availability

The test data used to support the findings of this study are available from the corresponding author upon request and approval from DirectStream.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

We would like to thank Prof. Alice Bean and Prof. Christopher Rogan from the Department of Physics and Astronomy at the University of Kansas for their valuable insights and help in this work.

References

[1] CERN Accelerating science, "The Higgs boson," 2019, https://home.cern/science/physics/higgs-boson.
[2] CERN Accelerating science, "Unified forces," 2019, https://home.cern/science/physics/unified-forces.
[3] V. L. Ginzburg, The Physics of a Lifetime: Reflections on the Problems and Personalities of 20th Century Physics, Springer Science & Business Media, Berlin, Germany, 2013.
[4] Kisel, Track Reconstruction and Pattern Recognition in High-Energy Physics, 2019, https://www.physik.uni-heidelberg.de/c/image/exp/f/highrr/Kisel_HD_12.04.2016.pdf.
[5] Aad and ATLAS Collaboration, "The ATLAS experiment at the CERN large Hadron collider," Journal of Instrumentation, vol. 3, no. 8, 2008, doi:10.1088/1748-0221/3/08/S08003.
[6] O’Luanaigh, New Results Indicate that New Particle is a Higgs Boson, CERN, Geneva, Switzerland, 2019, https://home.cern/news/news/physics/new-results-indicate-new-particle-higgs-boson.
[7] CERN Accelerating science, "The inner detector," 2019, https://atlas.cern/discover/detector/inner-detector.
[8] Hugging, "The ATLAS pixel detector," in Proceedings of the IEEE Symposium Conference Record Nuclear Science, pp. 1077–1081, Rome, Italy, October 2004.
[9] Backhaus, "The upgraded pixel detector of the ATLAS experiment for run 2 at the large Hadron collider," Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, vol. 831, pp. 65–70, 2016, doi:10.1016/j.nima.2016.05.018.
[10] Pernegger, The Pixel Detector of the ATLAS Experiment for LHC Run-2, 2015.
[11] Amerio, Andreani, and Andreazza, ATLAS FTK: Fast Track Trigger, 2013.
[12] I. K. Fodor, A Survey of Dimension Reduction Techniques, pp. 1–18, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA, USA, 2002.
[13] Kaewpijit, Le Moigne, and El-Ghazawi, "Automatic reduction of hyperspectral imagery using wavelet spectral analysis," IEEE Transactions on Geoscience and Remote Sensing, vol. 41, no. 4, 2003, doi:10.1109/tgrs.2003.810712.
[14] El-Araby, El-Ghazawi, Le Moigne, and Gaj, "Wavelet spectral dimension reduction of hyperspectral imagery on a reconfigurable computer," in Proceedings of the IEEE International Conference on Field-Programmable Technology (FPT), pp. 399–402, Brisbane, Australia, December 2004.
[15] Wickmann, "A wavelet approach to dimension reduction and classification of hyperspectral data," Masters Thesis, Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway, 2007.
[16] Fijany and C. P. Williams, Quantum Wavelet Transforms: Fast Algorithms and Complete Circuits, 1998, http://arxiv.org/abs/9809004v1.
[17] Stajic, "The future of quantum information processing," Science, vol. 339, no. 6124, p. 1163, 2013, doi:10.1126/science.339.6124.1163.
[18] Heidari, Naseri, Gheibi, Baghfalaki, M. R. Pourarian, and Farouk, "A new quantum watermarking based on quantum wavelet transforms," Communications in Theoretical Physics, vol. 67, no. 6, p. 732, 2017, doi:10.1088/0253-6102/67/6/732.
[19] Hai-Sheng, Fan, Xia, and Song, "The multi-level and multi-dimensional quantum wavelet packet transforms," Scientific Reports, vol. 8, no. 1, 2018, doi:10.1038/s41598-018-32348-8.
[20] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, Cambridge, UK, 2010.
[21] Mahmud and El-Araby, "A scalable high-precision and high-throughput architecture for emulation of quantum algorithms," in Proceedings of the 31st IEEE International System-on-Chip Conference (SOCC 2018), Washington, DC, USA, September 2018.
[22] Mahmud and El-Araby, "Towards higher scalability of quantum hardware emulation using efficient resource scheduling," in Proceedings of the 3rd IEEE International Conference on Rebooting Computing (ICRC 2018), Washington, DC, USA, November 2018.
[23] P. W. Shor, "Algorithms for quantum computation: discrete logarithms and factoring," in Proceedings of the 35th IEEE Annual Symposium on Foundations of Computer Science (SFCS ’94), pp. 124–134, Santa Fe, NM, USA, November 1994.
[24] L. K. Grover, "A fast quantum mechanical algorithm for database search," in Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of computing (STOC ’96), pp. 212–219, Philadelphia, PA, USA, May 1996.
[25] DirectStream LLC, 2019, https://directstream.com.
[26] Quantum Computing Report, Qubit Technology, 2019, https://quantumcomputingreport.com/scorecards/qubit-technology/.
[27] Gomes, "Quantum computing: both here and not here," IEEE Spectrum, April 2019, https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=83-22045.
[28] C. P. Williams, Explorations in Quantum Computing, Springer Science & Business Media, Berlin, Germany, 2010.
[29] Grover and Rudolph, Creating Superpositions that Correspond to Efficiently Integrable Probability Distributions, 2002, http://arxiv.org/abs/0208112.
[30] Kaye and Mosca, Quantum Networks for Generating Arbitrary Quantum States, pp. 1–3, 2004, http://arxiv.org/abs/0407102.
[31] Yao, Wang, and Liao, "Quantum image processing and its application to edge detection: theory and experiment," Physical Review X, vol. 7, 031041, 2017, doi:10.1103/physrevx.7.031041.
[32] S. G. Mallat, "A theory for multiresolution signal decomposition: the wavelet representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 674–693, 1989, doi:10.1109/34.192463.
[33] Toufik and Mokhtar, "The wavelet transform for image processing applications," in Advances in Wavelet Theory and Their Applications in Engineering, Physics and Technology, pp. 395–422, InTech, Rijeka, Croatia, 2012.
[34] P. S. Addison, The Illustrated Wavelet Transform Handbook: Introductory Theory and Applications in Science, Engineering, Medicine and Finance, CRC Press, Boca Raton, FL, USA, 2017.
[35] Gosal and Lawton, Quantum Haar Wavelet Transforms and Their Applications, 2001.
[36] Ohnishi, Matsueda, and Zheng, "Quantum wavelet transform and matrix factorization," in Proceedings of the IEEE International Quantum Electronics Conference, pp. 1327-1328, Antwerp, Belgium, 2005.
[37] Terraneo and D. L. Shepelyansky, "Imperfection effects for multiple applications of the quantum wavelet transform," Physical Review Letters, vol. 90, no. 25, 2003, doi:10.1103/physrevlett.90.257902.
[38] X.-H. Song, Wang, Liu, A. A. Abd El-Latif, and X.-M. Niu, "A dynamic watermarking scheme for quantum images using quantum wavelet transform," Quantum Information Processing, vol. 12, no. 12, pp. 3689–3706, 2013, doi:10.1007/s11128-013-0629-2.
[39] Y.-G. Yang, Tian, and Zhang, "Analysis and improvement of the dynamic watermarking scheme for quantum images using quantum wavelet transform," Quantum Information Processing, vol. 13, no. 9, pp. 1931–1936, 2014, doi:10.1007/s11128-014-0783-1.
[40] Ilic, "The ATLAS fast tracker and tracking at the high-luminosity LHC," Journal of Instrumentation, vol. 12, no. 2, 2017, doi:10.1088/1748-0221/12/02/c02052.
[41] Mahmud, El-Araby, and Caliga, "Scaling reconfigurable emulation of quantum algorithms at high precision and high throughput," Quantum Engineering, vol. 1, no. 2, 2019, doi:10.1002/que2.19.
[42] R. K. Brylinski and Chen, Mathematics of Quantum Computation, CRC Press, Boca Raton, FL, USA, 2002.
[43] El-Araby, El-Ghazawi, Le Moigne, and Irish, "Reconfigurable processing for satellite on-board automatic cloud cover assessment," Journal of Real-Time Image Processing, vol. 4, no. 3, pp. 245–259, 2009, doi:10.1007/s11554-008-0107-8.
[44] El-Araby, S. G. Merchant, and El-Ghazawi, "Assessing productivity of high-level design methodologies for high-performance reconfigurable computers," in High-Performance Computing Using FPGAs, Vanderbauwhede and Benkrid, Eds., pp. 719–745, Springer, New York, NY, USA, 2013.
[45] A Preview of Bristlecone, Google’s New Quantum Processor, Google AI Blog, March 2018.
[46] IBM Announces Advances to IBM Q Systems & Ecosystem, IBM Press Release, November 2017.
[47] Villalonga, Lyakh, and Boixo, "Establishing the quantum supremacy frontier with a 281 Pflop/s simulation," 2019, http://arxiv.org/abs/1905.00444v1.
[48] Fujishima, "FPGA-based high-speed emulator of quantum computing," in Proceedings of the IEEE International Conference on Field-Programmable Technology (FPT 2003), Tokyo, Japan, December 2003.
[49] A. U. Khalid, Zilic, and Radecka, "FPGA emulation of quantum circuits," in Proceedings of the IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD 04), pp. 310–315, San Jose, CA, USA, October 2004.
[50] Aminian, Saeedi, M. S. Zamani, and Sedighi, "FPGA-based circuit model emulation of quantum algorithms," in Proceedings of the IEEE Computer Society Annual Symposium on VLSI (ISVLSI ’08), pp. 399–404, Montpellier, France, April 2008.
[51] Y. H. Lee, Khalil-Hani, and M. N. Marsono, "An FPGA-based quantum computing emulation framework based on serial-parallel architecture," International Journal of Reconfigurable Computing, vol. 2016, Article ID 5718124, 18 pages, 2016, doi:10.1155/2016/5718124.
[52] Silva and O. G. Zabaleta, "FPGA quantum computing emulator using high level design tools," in Proceedings of the Eight Argentine Symposium and Conference on Embedded Systems (CASE’17), pp. 1–6, Buenos Aires, Argentina, August 2017.
[53] Pilch and Długopolski, "An FPGA-based real quantum computer emulator," Journal of Computational Electronics, pp. 1–14, 2018.
[54] Ying, Sun, and Yang, "Quantum supremacy circuit simulation on Sunway TaihuLight," 2018, http://arxiv.org/abs/1804.04797.