CNOT-count optimized quantum circuit of the Shor’s algorithm

We present an improved quantum circuit for modular exponentiation of a constant, which is the most expensive operation in Shor's algorithm for integer factorization. While previous work mostly focuses on minimizing the number of qubits or the depth of the circuit, we minimize the number of CNOT gates, which primarily determines the running time on an ion trap quantum computer. First, we give implementations of basic arithmetic with the lowest known number of CNOT gates, and we construct an improved modular exponentiation of a constant by accumulating intermediate data and applying the windowing technique. Then, we precisely estimate the number of CNOT gates in the improved quantum circuit for Shor's algorithm factoring an $n$-bit integer, which is $217\frac{n^3}{\log_2 n} + 4n^2 + n$. Based on this CNOT count, we analyze the running time and feasibility of Shor's algorithm on an ion trap quantum computer. Finally, we discuss a lower bound on the number of CNOT gates needed to implement Shor's algorithm.


Introduction
Integer factorization is the problem of finding a non-trivial factor of a given composite number. It is believed to be a hard mathematical problem: no classical polynomial-time algorithm for it has yet been discovered. As the most representative and compelling quantum algorithm, Shor's algorithm [1,2] can in theory factor integers in polynomial time, a superpolynomial speedup over its best known classical counterpart (currently the number field sieve [3]). It therefore poses a serious threat to classical public-key cryptosystems whose security is based on integer factorization, including RSA [4], which is widely used in key exchange and digital signatures.
A quantum computer implements quantum computation, which takes a quantum state representing a superposition of all possible inputs and evolves it, through a sequence of quantum gates, into a superposition of the corresponding outputs. Quantum computation can be described as a quantum circuit in which the quantum gates represent unitary transformations. Since the appearance of Shor's algorithm in 1994, much effort has been devoted to designing its quantum circuit and improving it in terms of the number of qubits and the circuit depth. The first such work is [5], in which Vedral et al. provided explicit quantum circuit constructions of basic arithmetic operations, from addition up to modular exponentiation. Beckman et al. [6] estimated the number of qubits and operations required by Shor's algorithm: an $n$-bit integer can be factored with $O(n^3)$ operations and $5n + 1$ qubits. Miquel et al. [7] analyzed the impact of losses and decoherence on the quantum circuit of Shor's algorithm. In terms of the number of qubits: Beauregard [10] constructed a quantum circuit for Shor's algorithm with $2n + 3$ qubits using the QFT-based adder [8] and the semiclassical QFT [9]; Takahashi and Kunihiro [11] reduced the number of qubits to $2n + 2$ by using a comparator in the modular addition, which is the lowest known qubit count so far; Häner et al. [12] also constructed a circuit for Shor's algorithm with $2n + 2$ qubits, based on a purely Toffoli-based adder that eliminates most of the overhead originating from rotation synthesis and enables testing and debugging. In terms of circuit depth: Zalka [13] reduced the depth of Shor's algorithm to $2^{17} n^{1.2}$ with FFT-based fast multiplication, at the cost of $24n$ to $96n$ qubits; the circuit of Pavlidis and Gizopoulos [14] implements modular multiplication via division, with a depth of $2000n^2$ and $9n + 2$ qubits.
In this paper, we consider the quantum circuit of Shor's algorithm from a perspective completely different from previous work: minimizing its CNOT count. The Clifford+T gate set is universal, that is, it can approximate an arbitrary unitary transformation to arbitrary precision. CNOT is the only two-qubit gate in this set, and in an ion trap quantum computer a CNOT gate acting on non-adjacent qubits takes much longer than a single-qubit gate. Furthermore, CNOT gates cannot be executed in parallel in an ion trap quantum computer, even when they act on completely unrelated qubits. Therefore, the total CNOT count of a quantum circuit primarily determines the running time of a quantum algorithm on an ion trap quantum computer. We reduce the CNOT count of the circuit for Shor's algorithm by applying the windowing technique [15], Montgomery multiplication [16] and the pebbling technique [17] to modular exponentiation, the most computationally intensive ingredient of Shor's algorithm. We also estimate the time to run Shor's algorithm once on an ion trap quantum computer and analyze its feasibility, based on the CNOT count of our improved circuit and the lower limit on the CNOT gate time in an ion trap quantum computer given in [18].
The rest of this paper is organized as follows. Section 2 describes Shor's algorithm and previously constructed basic arithmetic circuits. Section 3 presents our arithmetic circuits and the circuit for modular exponentiation, together with an analysis of the corresponding CNOT counts. Section 4 discusses the lower bound on the number of CNOT gates needed to run Shor's algorithm. The last part gives the time estimate and the feasibility of Shor's algorithm. In the figures of this paper, black triangles on the right side of gate symbols indicate the quantum registers that are modified and hold the result of the computation.

Shor's algorithm
Given the integer $N$ to be factored, Shor's algorithm consists of quantum order finding and classical post-processing. Let $a$ be a randomly chosen integer that is less than $N$ and coprime to $N$; the order of $a$ is the least positive integer $r$ such that $a^r \equiv 1 \pmod N$. The quantum order finding shown in Figure 1 requires two quantum work registers: the first consists of $2n$ qubits initially set to $|0\rangle$, and the second consists of $n$ qubits initially set to $|1\rangle$, where $n = \lceil \log_2 N \rceil$ is the number of bits needed to represent $N$. As shown in Figure 1, quantum order finding has four steps:
i Apply the Hadamard transform $H^{\otimes 2n}$ to the first register, creating a superposition whose terms correspond to the exponents used in step ii: $\frac{1}{2^n}\sum_{x=0}^{2^{2n}-1}|x\rangle|1\rangle$.
ii Compute the modular exponentiation by the constant $a$: $\frac{1}{2^n}\sum_{x=0}^{2^{2n}-1}|x\rangle|a^x \bmod N\rangle$.
iii Apply the quantum Fourier transform $QFT_{2^{2n}}$ to the first register: $\frac{1}{2^{2n}}\sum_{x=0}^{2^{2n}-1}\sum_{y=0}^{2^{2n}-1} e^{2\pi i xy/2^{2n}}|y\rangle|a^x \bmod N\rangle$.
iv Measure the first register and find the order of $a$ with high probability by classical post-processing of the measured data.
The most expensive operation in quantum order finding is the modular exponentiation by the classically known constant $a$ in step ii, which is denoted $ME(a)$ in Figure 1.
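The classical post-processing of the measured value can be illustrated with a toy instance. The sketch below is an illustrative assumption, not the authors' code: the helper names `candidate_order` and `factors_from_order` are hypothetical, and it uses Python's `Fraction.limit_denominator` (closest fraction with a bounded denominator) in place of an explicit continued-fraction expansion.

```python
from fractions import Fraction
from math import gcd


def candidate_order(y, num_qubits, N, a):
    """Guess the order r of a mod N from a measurement y of the first register.

    y / 2**num_qubits should be close to s/r for some integer s; the closest
    fraction with denominator below N then has denominator r when s and r are
    coprime and the approximation is good enough.
    """
    approx = Fraction(y, 2 ** num_qubits).limit_denominator(N - 1)
    r = approx.denominator
    # Keep the candidate only if it really is a period of a^x mod N.
    return r if pow(a, r, N) == 1 else None


def factors_from_order(a, r, N):
    """Turn an even order r into two non-trivial factors of N, if possible."""
    if r is None or r % 2 == 1:
        return None
    h = pow(a, r // 2, N)
    if h == N - 1:  # a^(r/2) = -1 mod N only yields trivial factors
        return None
    return gcd(h - 1, N), gcd(h + 1, N)


# Toy instance: N = 15 (n = 4, so the first register has 2n = 8 qubits), a = 7.
r = candidate_order(192, 8, 15, 7)       # 192/256 = 3/4, so r = 4
print(r, factors_from_order(7, r, 15))   # prints: 4 (3, 5)
```

A measurement such as $y = 128$ gives $1/2$, whose denominator is not a period, so the run is simply repeated.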
$ME(a)$ applies the following transform to its input quantum state:
$$|x\rangle|1\rangle \to |x\rangle|a^x \bmod N\rangle.$$
According to the binary expansion of $x$,
$$x = \sum_{i=0}^{2n-1} x_i 2^i,$$
$a^x$ can be written as
$$a^x = \prod_{i=0}^{2n-1}\left(a^{2^i}\right)^{x_i}.$$
Therefore, starting from $|1\rangle$, the computation of $a^x \bmod N$ can be decomposed into $2n$ modular multiplications by the classically known constants $a^{2^i} \bmod N$, each controlled by the corresponding qubit $|x_i\rangle$, where $i$ ranges from $0$ to $2n-1$ (Figure 2). Similarly, the product $ax$ of an $n$-qubit input state $|x\rangle$ and the classically known constant $a$ can be written as
$$ax = \sum_{i=0}^{n-1} x_i\left(a 2^i\right).$$
Therefore, starting from $|0\rangle$, the computation of $ax \bmod N$ can be decomposed into $n$ modular additions by the classically known constants $a2^i \bmod N$, each controlled by the corresponding qubit $|x_i\rangle$, where $i$ ranges from $0$ to $n-1$ (Figure 3). We denote this modular multiplication by a constant as $MM(a)$: $|x\rangle|0\rangle \to |x\rangle|ax \bmod N\rangle$. Since $MM(\cdot)$ writes the product to a different quantum register initially set to $|0\rangle$, the direct approach of computing $a^x \bmod N$ with $2n$ $MM(\cdot)$ operations accumulates the intermediate data of every $MM(a^{2^i})$ operation. The following method, proposed by Vedral et al. [5], is widely adopted to implement in-place modular multiplication of the input state $|x\rangle$ by the classically known constant $a$:
i Apply $MM(a)$ to the input register and an ancillary register initially set to $|0\rangle$: $|x\rangle|0\rangle \to |x\rangle|ax \bmod N\rangle$.
ii Swap the states of the input register and the ancillary register: $|ax \bmod N\rangle|x\rangle$.
iii Apply $MM^{-1}(a^{-1})$ to the input register and the ancillary register: $|ax \bmod N\rangle|0\rangle$.
Therefore, as shown in Figure 4, the $ME(a)$ operation can be constructed from $MM(\cdot)$ operations.
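The two decompositions and Vedral et al.'s in-place trick can be checked on classical basis states. This is a minimal sketch under the assumption that classical integers stand in for computational basis states; the function names are hypothetical, and `pow(a, -1, N)` (modular inverse) needs Python 3.8 or later.

```python
def modexp_by_controlled_mults(a, x, N, num_ctrl):
    """a^x mod N as multiplications by a^(2^i) mod N, each controlled by bit x_i."""
    acc = 1  # the second register starts in |1>
    for i in range(num_ctrl):
        if (x >> i) & 1:
            acc = (acc * pow(a, 2 ** i, N)) % N   # controlled MM(a^(2^i))
    return acc


def modmul_by_controlled_adds(a, x, N, num_bits):
    """a*x mod N as additions of a*2^i mod N, each controlled by bit x_i."""
    acc = 0  # the target register starts in |0>
    for i in range(num_bits):
        if (x >> i) & 1:
            acc = (acc + (a * 2 ** i) % N) % N     # controlled modular addition
    return acc


def inplace_modmul(a, x, N):
    """Vedral et al.'s three steps on classical values; returns (input, ancilla)."""
    inp, anc = x, 0
    anc = (anc + a * inp) % N                   # step i:   MM(a)
    inp, anc = anc, inp                         # step ii:  swap the registers
    anc = (anc - pow(a, -1, N) * inp) % N       # step iii: MM^{-1}(a^{-1})
    return inp, anc


N, a, n = 15, 7, 4
print(modexp_by_controlled_mults(a, 3, N, 2 * n))   # prints: 13
print(inplace_modmul(a, 11, N))                     # prints: (2, 0)
```

The ancilla returns to $0$ because $a^{-1}(ax \bmod N) \equiv x \pmod N$, which is exactly why step iii uses the inverse constant.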

Previous works on basic arithmetic
We compare previously proposed basic arithmetic circuits and, for each operation, select the circuit requiring the fewest CNOT gates to construct Shor's algorithm.
Addition. We compared different types of addition circuits and found that the adder of Cuccaro et al. [8] (hereinafter the CDK adder) has the lowest known CNOT count. Using the standard decomposition of the Toffoli gate into the Clifford+T set, which contains 6 CNOT gates [19], the CNOT count of the CDK adder for $n$-bit binary integers is $16n + 1$: each of its $n$ MAJ blocks and $n$ UMA blocks contains one Toffoli gate and two CNOT gates, and one more CNOT copies the final carry.
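The $16n + 1$ count can be reproduced by simulating the CDK ripple-carry adder on classical bits while tallying CNOTs. This is a sketch assuming the 2-CNOT UMA variant and a 6-CNOT Toffoli; the wire layout and the names `BitCircuit` and `cdk_add` are hypothetical.

```python
TOFFOLI_CNOTS = 6  # CNOTs in the standard Clifford+T Toffoli decomposition [19]


class BitCircuit:
    """Classical simulation of a reversible circuit that also counts CNOTs."""

    def __init__(self, num_wires):
        self.bits = [0] * num_wires
        self.cnots = 0

    def cnot(self, ctrl, tgt):
        self.bits[tgt] ^= self.bits[ctrl]
        self.cnots += 1

    def toffoli(self, c1, c2, tgt):
        self.bits[tgt] ^= self.bits[c1] & self.bits[c2]
        self.cnots += TOFFOLI_CNOTS


def cdk_add(a, b, n):
    """Ripple-carry addition b <- a + b with MAJ/UMA blocks; returns (sum, a, CNOTs)."""
    anc, z = 0, 2 * n + 1                       # carry-in ancilla and carry-out qubit
    B, A = list(range(1, n + 1)), list(range(n + 1, 2 * n + 1))
    circ = BitCircuit(2 * n + 2)
    for i in range(n):
        circ.bits[A[i]] = (a >> i) & 1
        circ.bits[B[i]] = (b >> i) & 1
    carry = [anc] + A[:-1]  # wire holding the incoming carry of bit i

    for i in range(n):  # MAJ: leaves the carry c_{i+1} on A[i]
        circ.cnot(A[i], B[i])
        circ.cnot(A[i], carry[i])
        circ.toffoli(carry[i], B[i], A[i])
    circ.cnot(A[n - 1], z)  # copy the final carry to the high output bit
    for i in reversed(range(n)):  # UMA (2-CNOT version): restores a, writes sum bit to B[i]
        circ.toffoli(carry[i], B[i], A[i])
        circ.cnot(A[i], carry[i])
        circ.cnot(carry[i], B[i])

    total = sum(circ.bits[B[i]] << i for i in range(n)) | (circ.bits[z] << n)
    a_out = sum(circ.bits[A[i]] << i for i in range(n))
    return total, a_out, circ.cnots


print(cdk_add(11, 6, 4))   # prints: (17, 11, 65)
```

Each MAJ and each UMA costs $6 + 2 = 8$ CNOTs, so $n$ of each plus the carry copy gives $16n + 1$.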
Addition by a constant.Addition by a constant a can be constructed from CDK adder, as shown in figure 5. First bind the constant a to an input quantum register of CDK adder initially in |0 , then apply CDK adder to compute the sum, finally recover the input quantum register to |0 by the same way as first step.The binding operation of a constant a is to apply X gates to the appropriate qubits which are corresponding to 1 in the binary representation of a. Different form addition with 2 unknown addicands both in the form of quantum state, here the adder can be simplified by the known constant.Therefore, the CNOT-count of addition by a constant for n-bit binary integers is 13n + 1.  Comparison.We use the comparison based on the M AJ blocks given in [21], of which the CNOT-count is 16n+1 for n-bit integers.Comparison by a constant.We use the comparison by a constant given in [20] as well, which is similar to the construction of addtion by a constant.And the CNOT-count of comparison by a constant for n-bit binary integers is 12n + 1.

Improvement of basic arithmetic
Based on the basic arithmetic circuits in the previous section, we improve the modular addition, shift and modular doubling circuits. We also construct controlled comparison and controlled modular addition circuits from the previous comparison and modular addition circuits.
Controlled comparison.As shown in figure 7, we construct the controlled comparison by controling the CNOT gate and X gate on the qubit holding the result of comparison.The CNOT-count of controlled comparison for n-bit integers is 16n + 7. Modular addition.We improve the modular addition as show in figure 8, in which the substraction is the inverse of addition.The CNOT-count of our modular addition is 61n + 16.Shift.Note that there is no need for swaping two qubit with SWAP operation if a qubit is known in the state of |0 , so that we contruct the left shift and right shift for a n-qubit quantum register as shown in figure 10, of which the CNOT-count are both 2n.Windowing technique.Windowing technique is widely used to reduce the number of operation in classical computation, such as the fast implementation of CRC parity check [23].Gidney [15] showed that it is also useful to optimize quantum circuits in quantum computation and presented various windowed quantum arithmetic circuits, including a windowed modular exponentiation with nested windowed modular multiplications.The key of windowing technique in quantum computation is to merge several controlled operation acting on the target quantum state into a single operation acting on the target quantum state and a corresponding special quantum state which is created and recovered by table lookup operation.
Since the $ME(a)$ operation can be decomposed into a series of controlled $MM(\cdot)$ operations, the windowing technique is well suited to it. We iterate over the control qubits in groups of window size $m$ instead of individually, and merge the $m$ controlled $MM(\cdot)$ operations of each group with the following steps:
i Using the $m$ control qubits as the address, look up in the precomputed table the value by which the result is actually multiplied, and create the corresponding special quantum state in the ancillary register.
ii Multiply the target quantum state by the value held in the special quantum state, modulo $N$.
iii Using the $m$ control qubits as the address, look up the same value in the precomputed table again and uncompute the special quantum state in the ancillary register.
The table lookup operation in steps i and iii performs $|x\rangle|0\rangle \to |x\rangle|T_x\rangle$, where $T_x$ is the entry of the classical precomputed table at address $x$. Based on [15], we give the quantum circuit of the table lookup operation without a control qubit, shown in Figure 13.
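The three windowed steps can be traced on classical values. In this sketch (an assumption for illustration; `build_table`, `table_lookup` and `windowed_modexp` are hypothetical names) the table for window $j$ holds $a^{w 2^{jm}} \bmod N$, and the lookup XOR-binds the entry into a register, mirroring the X-gate binding of Figure 13, so that applying it a second time restores $|0\rangle$.

```python
def build_table(a, N, j, m):
    """Classical precomputed table T[w] = a^(w * 2^(j*m)) mod N for window j."""
    return [pow(a, w << (j * m), N) for w in range(2 ** m)]


def table_lookup(address, table, target):
    """XOR-bind table[address] into target: |x>|t> -> |x>|t XOR T_x>."""
    return target ^ table[address]


def windowed_modexp(a, x, N, num_ctrl, m):
    """a^x mod N with one multiplication per window of m control bits.

    Returns (result, number of multiplications performed).
    """
    acc, mults = 1, 0
    for j in range(-(-num_ctrl // m)):            # ceil(num_ctrl / m) windows
        w = (x >> (j * m)) & ((1 << m) - 1)         # the m control bits of window j
        table = build_table(a, N, j, m)
        special = table_lookup(w, table, 0)         # step i: create |T_w>
        acc = (acc * special) % N                   # step ii: one modular multiplication
        special = table_lookup(w, table, special)   # step iii: uncompute back to |0>
        assert special == 0
        mults += 1
    return acc, mults


# 2n = 8 control qubits with window size m = 3: 3 multiplications instead of 8.
print(windowed_modexp(7, 200, 15, 8, 3))   # prints: (1, 3)
```

The trade-off is that each lookup touches $2^m$ table entries, which is why the window size has to be balanced against the number of multiplications saved.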
Figure 13: Quantum circuit of the table lookup operation for window size $m = 3$. If the control qubits contain the value $i$, the table entry $T_i$ at address $i$ is bound to the lowest register by applying X gates to the appropriate qubits of that register. The binding operation is drawn as a circle containing the value to bind.
The modular multiplication of two factors that are both quantum states is computed by the modular multiplication operation denoted $MM$, which performs the following transform on its input states:
$$|x\rangle|y\rangle|0\rangle \to |x\rangle|y\rangle|x\cdot y \bmod N\rangle.$$
The quantum circuit of the windowed $ME(a)$ operation, based on the construction in which the intermediate data are accumulated, is shown in Figure 14.
Modular multiplication.Roetteler et al. [22] showed the quantum circuits of two approaches to compute modular multiplication of two factors both in the form of quantum state: Fast modular multiplication [24] and Montgomery modular multiplication [16].
By the binary expansion of the first factor $x$, the product $x \cdot y$ can be written as
$$x\cdot y = \sum_{i=0}^{n-1} x_i 2^i y = \left(\cdots\left(\left(x_{n-1}y\right)\cdot 2 + x_{n-2}y\right)\cdot 2 + \cdots\right)\cdot 2 + x_0 y.$$
So fast modular multiplication decomposes $x \cdot y \bmod N$ into a sequence of controlled modular additions and modular doublings. Based on our constructions of basic arithmetic, we improve the circuit of fast modular multiplication as shown in Figure 15; its CNOT count is $102n^2 - 54n - 42$.
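Both approaches named in [22] can be checked on classical values. The sketch below assumes Horner evaluation for fast modular multiplication and the textbook bitwise (radix-2) Montgomery reduction; whether the circuits in Figures 15 to 17 follow exactly this schedule is an assumption, and the function names are hypothetical.

```python
def fast_modmul(x, y, N, n):
    """x*y mod N by Horner's rule: n - 1 modular doublings, controlled modular additions.

    Assumes 0 <= y < N and 0 <= x < 2^n.
    """
    acc = 0
    for i in reversed(range(n)):
        if i != n - 1:                 # acc is still 0 before the first step
            acc = (2 * acc) % N        # modular doubling
        if (x >> i) & 1:
            acc = (acc + y) % N        # modular addition controlled by x_i
                                       # (the first one only copies y, cf. Figure 15)
    return acc


def montgomery_modmul(x, y, N, n):
    """x*y*2^(-n) mod N for odd N by bitwise Montgomery reduction.

    Assumes 0 <= y < N and 0 <= x < 2^n; the accumulator stays below 2N.
    """
    acc = 0
    for i in range(n):
        if (x >> i) & 1:
            acc += y                   # addition controlled by x_i
        if acc & 1:
            acc += N                   # make acc even without changing it mod N
        acc >>= 1                      # exact division by 2
    return acc - N if acc >= N else acc


print(fast_modmul(3, 5, 13, 4), montgomery_modmul(3, 5, 13, 4))   # prints: 2 5
```

Montgomery multiplication replaces the comparisons with $N$ by parity checks and halvings, at the price of the extra factor $2^{-n}$, which must be accounted for when chaining multiplications.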

Discussion and conclusion
Given the $n$-bit integer to be factored and the window size $m$, the precise number of CNOT gates in our quantum circuit for modular exponentiation is
$$\left(\left\lceil\frac{2n}{m}\right\rceil - 1\right)\left[(n+13)2^m + 90n^2 + 34n - 10\right] + (n+13)\,2^{\,2n - m\left(\lceil 2n/m\rceil - 1\right)} + 102n^2 - 54n - 42,$$
and for each input size $n$ we choose the window size $m$ that minimizes it. Fitting the data over a range of input sizes gives $217\frac{n^3}{\log_2 n}$. Adding the $4n^2 + n$ CNOT gates used for the QFT on $2n$ qubits, we conclude that the total number of CNOT gates to run Shor's algorithm once is $217\frac{n^3}{\log_2 n} + 4n^2 + n$. Reference [18] gives the lower limit on the time to execute a CNOT gate in an ion trap quantum computer, which is about $2.85 \times 10^{-4}$ s.
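The modular exponentiation count can be evaluated and minimized over $m$ directly. This sketch assumes $\lceil 2n/m \rceil$ windows, with the last window covering the remaining $2n - m(\lceil 2n/m\rceil - 1)$ control qubits; `me_cnot_count` and `best_window` are hypothetical names.

```python
def me_cnot_count(n, m):
    """CNOT count of the windowed ME(a) circuit for an n-bit modulus and window size m.

    Assumes ceil(2n/m) windows, the last one covering the remaining control qubits.
    """
    windows = -(-2 * n // m)                    # ceil(2n / m) with integer arithmetic
    last = 2 * n - m * (windows - 1)            # size of the final window
    per_window = (n + 13) * 2 ** m + 90 * n ** 2 + 34 * n - 10
    return ((windows - 1) * per_window
            + (n + 13) * 2 ** last
            + 102 * n ** 2 - 54 * n - 42)


def best_window(n):
    """Window size m in 1..2n minimizing me_cnot_count(n, m)."""
    return min(range(1, 2 * n + 1), key=lambda m: me_cnot_count(n, m))


print(me_cnot_count(8, 4))   # prints: 25464
print(best_window(1024))     # prints: 13
```

Small windows pay for many multiplications (the $90n^2$ term), while large windows pay for exponentially large table lookups (the $(n+13)2^m$ term), so the optimum grows slowly with $n$.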
Combined with the number of CNOT gates needed to run Shor's algorithm, this means that breaking 1024-bit RSA takes at least 72 years with three levels of concatenated coding.
Let $T$ be the time required to run Shor's algorithm, $t$ the time required to execute one CNOT gate, and $N_{\mathrm{CNOT}}(n)$ the lower bound on the number of CNOT gates as a function of the input size $n$. Then the lower bound on the running time of Shor's algorithm can be expressed as $T = N_{\mathrm{CNOT}}(n)\,t$. Modular exponentiation is constructed from modular multiplications, and modular multiplication is constructed from modular additions, so modular exponentiation requires more CNOT gates than modular addition. Since modular addition adds a modular reduction on top of addition, it requires more CNOT gates than the addition circuit. For two $n$-bit numbers $x$ and $y$,
$$c_{i+1} = x_i c_i \oplus y_i c_i \oplus x_i y_i = x_i \oplus (x_i \oplus y_i)(x_i \oplus c_i), \qquad s_i = x_i \oplus y_i \oplus c_i,$$
where $x_i$ and $y_i$ are the $i$-th bits of the binary representations of $x$ and $y$, $c_{i+1}$ is the carry out of the $i$-th bit, and $s_i$ is the $i$-th bit of the sum. Therefore, each bit position of the addition requires at least 1 Toffoli gate and 3 CNOT gates, that is, at least $6 + 3 = 9$ CNOT gates, so the addition of two $n$-bit numbers requires at least $9n$ CNOT gates. Thus the lower bound on the number of CNOT gates required to run Shor's algorithm is $9\frac{n^3}{\log_2 n}$. This bound is still loose, and we plan to compute the lower bound more accurately in future work.
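The running-time relation $T = N_{\mathrm{CNOT}}(n)\,t$ and the carry identity used for the $9n$ bound can be checked numerically. The text does not state the per-CNOT overhead of the three coding levels; the factor $7^3$ below (a transversal CNOT in a 7-qubit code at each of three levels) is a hypothetical assumption that happens to reproduce the quoted 72 years, and the function names are hypothetical as well.

```python
from itertools import product
from math import log2

SECONDS_PER_CNOT = 2.85e-4             # lower limit for an ion trap CNOT, from [18]
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def shor_cnot_count(n):
    """Fitted total: 217 n^3 / log2(n) for ME(a) plus 4n^2 + n for the QFT."""
    return 217 * n ** 3 / log2(n) + 4 * n ** 2 + n


def runtime_years(n, overhead=1):
    """T = N_CNOT(n) * t, scaled by an assumed physical-per-logical CNOT overhead."""
    return shor_cnot_count(n) * overhead * SECONDS_PER_CNOT / SECONDS_PER_YEAR


# Hypothetical overhead: 7 physical CNOTs per logical CNOT at each of 3 levels.
print(round(runtime_years(1024, overhead=7 ** 3)))   # prints: 72

# Carry identity behind the 9n bound, checked on all 8 input combinations.
for x, y, c in product((0, 1), repeat=3):
    assert (x & c) ^ (y & c) ^ (x & y) == x ^ ((x ^ y) & (x ^ c))
```

Without any encoding overhead the same formula gives about 0.21 years, so the assumed coding overhead dominates the estimate.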
In this paper, we improve the quantum circuits of basic arithmetic, including addition, controlled addition and comparison. We construct circuits for controlled comparison, controlled modular addition and modular multiplication. Based on this work, we improve the quantum circuit for Shor's algorithm and calculate the number of CNOT gates it requires. We estimate the time required by Shor's algorithm to attack 1024-bit RSA. Finally, we discuss a lower bound on the number of required CNOT gates that is independent of algorithmic improvements.


Figure 2: Schematic diagram of the computation of $a^x \bmod N$

Figure 3: Schematic diagram of the computation of ax mod N

Figure 4: The construction of the $ME(a)$ operation from $MM(\cdot)$ operations

Figure 5: Addition by a constant a constructed from CDK adder

Figure 6: Controlled addition by a constant based on the addition by a constant.

Figure 10: Left shift and right shift

Figure 12: The construction of the $ME(a)$ operation from $MM(\cdot)$ operations with the intermediate data accumulated.

Figure 14: Windowed $ME(a)$ operation based on the construction with the intermediate data accumulated.

Figure 15: Fast modular multiplication. We replace the first controlled modular addition with a controlled copy operation, since the modular sum equals the other addend if one addend is 0.

Figure 16: Whole quantum circuit of Montgomery modular multiplication, which performs $|x\rangle|y\rangle|0\rangle \to |x\rangle|y\rangle|\frac{x\cdot y}{2^n} \bmod N\rangle$. The forward Montgomery modular multiplication computes $\frac{x\cdot y}{2^n} \bmod N$ while holding the intermediate result of each round in ancillary qubits. To restore the ancillary qubits of the forward Montgomery modular multiplication, one can copy the result $\frac{x\cdot y}{2^n} \bmod N$ to a new quantum register and apply the backward Montgomery modular multiplication, that is, run the circuit of the forward Montgomery modular multiplication backwards.

Figure 17: Forward Montgomery modular multiplication. We replace the first controlled modular addition with a controlled copy operation, since the sum equals the other addend if one addend is 0.