Loop-Reduction LLL Algorithm and Architecture for Lattice-Reduction-Aided MIMO Detection

We propose a loop-reduction LLL (LR-LLL) algorithm for lattice-reduction-aided (LRA) multiple-input multiple-output (MIMO) detection. The LLL algorithm is an iterative algorithm built from check and process operations, and the traditional LLL algorithm performs many redundant check operations. To solve this problem, we propose a look-ahead check technique that reduces the complexity of the LLL algorithm while still producing a lattice-reduced matrix that obeys the original LLL criterion. Simulation results show that the proposed LR-LLL algorithm reduces the average number of loops and the computational complexity. It also shortens the latency in clock cycles by about 19.4%, 29.1%, and 46.1% for 4 × 4, 8 × 8, and 12 × 12 MIMO systems, respectively.


Introduction
To increase the transmission capacity, multiple-input multiple-output (MIMO) systems have been proposed for next-generation wireless communication systems, and the need for a high-performance, low-complexity MIMO detector has therefore become an important issue. The maximum likelihood (ML) detector is known to be optimal; however, it is impractical to realize owing to its high computational complexity. To address this problem, researchers have proposed tree-based search algorithms, such as sphere decoding [1] and K-Best decoding [2], which reduce the complexity while achieving near-optimal performance. On the other hand, channel matrix preprocessing techniques, such as lattice-reduction-aided (LRA) detection [3], have been proposed to improve the MIMO detection performance.
Lattice reduction transforms the channel matrix into a more orthogonal one by finding a better basis for the same lattice, so as to improve the diversity gain of the MIMO detector. The Lenstra-Lenstra-Lovász (LLL) algorithm is a lattice reduction algorithm well known for its polynomial execution time. In the literature [4], the LLL algorithm is widely employed to improve lattice-reduction-aided MIMO detection or to reduce the MIMO detection complexity. However, the LLL algorithm has many redundant check operations that have not been addressed in the literature. These redundant operations lead to many unnecessary computations and thus increase the processing latency and complexity. Therefore, we propose a look-ahead check technique to detect and avoid the unnecessary check operations in the LLL algorithm. This technique not only generates a lattice-reduced matrix that obeys the size reduction and LLL reduction conditions of the original LLL algorithm but also applies to both real- and complex-valued LLL algorithms [5].
The remainder of this paper is organized as follows. Section 2 briefly describes the signal model for MIMO detection. In Section 3, we introduce lattice-reduction-aided MIMO detection and the LLL algorithm. In Section 4, we present the proposed LR-LLL algorithm, and in Section 5 we present the simulation and analysis results. The corresponding hardware architecture and the estimated processing cycle counts are given in Section 6. Finally, we summarize our conclusions in Section 7.

System Model
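A narrow-band $N_r \times N_t$ MIMO system consisting of $N_t$ transmitters and $N_r$ receivers can be modeled by

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \tag{1}$$

where $\mathbf{x} \in A^{N_t}$ is the transmitted signal vector, $\mathbf{y} \in \mathbb{C}^{N_r}$ is the received signal vector, $\mathbf{H} = [\mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_{N_t}]$ represents a flat-fading channel matrix, and $\mathbf{n} \in \mathbb{C}^{N_r}$ is white Gaussian noise with variance $\sigma_n^2$. All the vectors $\mathbf{h}_i$ are independent and identically distributed complex Gaussian random vectors with zero mean and unit variance. The set $A$ consists of the constellation points of the QAM modulation. Then, we reformulate the equivalent real-valued system as follows:

$$\mathbf{y}_r = \begin{bmatrix} \Re(\mathbf{y}) \\ \Im(\mathbf{y}) \end{bmatrix} = \begin{bmatrix} \Re(\mathbf{H}) & -\Im(\mathbf{H}) \\ \Im(\mathbf{H}) & \Re(\mathbf{H}) \end{bmatrix} \begin{bmatrix} \Re(\mathbf{x}) \\ \Im(\mathbf{x}) \end{bmatrix} + \begin{bmatrix} \Re(\mathbf{n}) \\ \Im(\mathbf{n}) \end{bmatrix} = \mathbf{H}_r \mathbf{x}_r + \mathbf{n}_r. \tag{2}$$

The dimension of $\mathbf{H}_r$ then becomes $N \times M$, where $M = 2N_t$ and $N = 2N_r$. The vectors $\mathbf{y}_r$ and $\mathbf{n}_r$ belong to $\mathbb{R}^N$, and $\mathbf{x}_r \in A^M$.

The QR decomposition is often applied in the preprocessing stage of MIMO detection because it makes decoding more efficient. The channel matrix $\mathbf{H}_r$ can be expressed as

$$\mathbf{H}_r = \mathbf{Q}_r \mathbf{R}_r, \tag{3}$$

where $\mathbf{Q}_r \in \mathbb{R}^{N \times M}$ has orthonormal columns and $\mathbf{R}_r \in \mathbb{R}^{M \times M}$ is an upper triangular matrix. By multiplying both sides of (2) by $\mathbf{Q}_r^T$, we obtain

$$\mathbf{Q}_r^T \mathbf{y}_r = \mathbf{R}_r \mathbf{x}_r + \mathbf{Q}_r^T \mathbf{n}_r, \tag{4}$$

where $\mathbf{Q}_r^T \mathbf{n}_r$ remains white Gaussian. In addition, we adopt the column-norm-based sorted QR decomposition (SQRD) [6] because it not only enhances the detection performance but also reduces the computational complexity of the lattice reduction [7].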
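As an illustration of the real-valued reformulation and the QR preprocessing described above, the following is a minimal NumPy sketch. The function name `to_real_model`, the 16-QAM alphabet, and the noise scale are assumptions made for this example, and it uses a plain (unsorted) QR decomposition rather than the SQRD of [6].

```python
import numpy as np

def to_real_model(H, x, n):
    """Stack real and imaginary parts so that y_r = H_r x_r + n_r."""
    H_r = np.block([[H.real, -H.imag],
                    [H.imag,  H.real]])
    x_r = np.concatenate([x.real, x.imag])
    n_r = np.concatenate([n.real, n.imag])
    return H_r, x_r, n_r

rng = np.random.default_rng(0)
Nr, Nt = 4, 4

# i.i.d. complex Gaussian channel with unit variance per entry
H = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)
# 16-QAM symbols (assumed alphabet, for illustration only)
levels = np.array([-3, -1, 1, 3])
x = rng.choice(levels, Nt) + 1j * rng.choice(levels, Nt)
n = 0.1 * (rng.standard_normal(Nr) + 1j * rng.standard_normal(Nr))

y = H @ x + n                              # complex model
H_r, x_r, n_r = to_real_model(H, x, n)
y_r = H_r @ x_r + n_r                      # equivalent real model

Q_r, R_r = np.linalg.qr(H_r)               # reduced QR: Q_r is N x M, R_r is M x M
y_rot = Q_r.T @ y_r                        # equals R_r x_r + Q_r^T n_r
```

Because `R_r` is upper triangular, the rotated observation `y_rot` can be detected layer by layer starting from the last row.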

Lattice Reduction
The lattice generated by $\mathbf{H}_r$ is the set of all integer linear combinations of its columns,

$$L(\mathbf{H}_r) = \Big\{ \sum_{i=1}^{N} z_i \mathbf{h}_{ri} \;\Big|\; z_i \in \mathbb{Z} \Big\},$$

where $\{\mathbf{h}_{r1}, \ldots, \mathbf{h}_{rN} \in \mathbb{R}^N\}$ are the basis vectors. The lattice reduction algorithm aims to find a unimodular matrix $\mathbf{T}$ ($|\det \mathbf{T}| = 1$ and all elements of $\mathbf{T}$ are integers) such that a more orthogonal $\tilde{\mathbf{H}}_r = \mathbf{H}_r \mathbf{T}$ generates the same lattice as $\mathbf{H}_r$. Then, the signal model becomes

$$\mathbf{y}_r = \mathbf{H}_r \mathbf{T} \mathbf{T}^{-1} \mathbf{x}_r + \mathbf{n}_r = \tilde{\mathbf{H}}_r \mathbf{z}_r + \mathbf{n}_r, \qquad \mathbf{z}_r = \mathbf{T}^{-1} \mathbf{x}_r. \tag{5}$$

In practice, the transmitted signals $\mathbf{x}_r$ do not belong to an integer set; however, we can still transform the signals $\mathbf{x}_r \in A^N$ into an integer set by linear operations such as scaling and shifting.

Several lattice-reduction algorithms are described in the literature, and the LLL algorithm [11] is the most popular approach. Because QR preprocessing is often employed in the MIMO detector, the LLL algorithm has been modified to operate on the Q and R matrices [12], as shown in Algorithm 1. In the literature, lines (4) to (19) are often defined as a loop that can be decomposed into two parts: (1) lines (4) to (10) deal with the size reduction operations; and (2) lines (11) to (19) handle the LLL reduction operations. The number of iterations performed in the size reduction depends on the index k, and the LLL reduction operation may increase or decrease the index k depending on the result of the LLL reduction check ($\delta |R(k-1,k-1)|^2 > |R(k,k)|^2 + |R(k-1,k)|^2$). Therefore, the number of loops depends on the values in the R matrix, and the processing latency varies for different channel matrices. Moreover, we find that most of the computational complexity is contributed by the operations performed when the check conditions ($\mu \neq 0$ for the size reduction check and $\delta |R(k-1,k-1)|^2 > |R(k,k)|^2 + |R(k-1,k)|^2$ for the LLL reduction check) are satisfied, that is, when the size and LLL reduction constraints are violated.

Most importantly, redundant check operations occur very often when the index k decreases. The decrease of k is not always necessary, because the size and LLL reduction constraints have already been checked in the previous loop. We calculate the percentage of redundant decreases of k and list the results in Table 1. The percentage of redundant decreases of k reaches 67% for 2 × 2 lattice reduction and converges to about 28% when the MIMO dimension is larger than 10 × 10. Therefore, we propose a look-ahead check technique that modifies the index k and avoids the unnecessary check operations of the original LLL algorithm.

Algorithm 2: The proposed loop-reduction LLL algorithm using the look-ahead check technique.
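Algorithm 1 is not reproduced in this text, so the loop structure described above (size reduction followed by the LLL reduction check on the Q and R matrices) is sketched below in Python. This is a minimal sketch of the commonly used QR-based real-valued LLL, not the authors' exact listing; the function name `lll_qr`, the 0-based indexing, and the loop counter are assumptions of this example.

```python
import numpy as np

def lll_qr(Q, R, delta=0.75):
    """QR-based real LLL sketch. Returns reduced Q, R, unimodular T, loop count."""
    Q, R = Q.astype(float), R.astype(float)    # astype returns copies
    m = R.shape[1]
    T = np.eye(m, dtype=int)
    k = 1                                      # 0-based; corresponds to k = 2 in 1-based form
    loops = 0
    while k < m:
        loops += 1
        # Size reduction of column k against columns k-1, ..., 0
        for l in range(k - 1, -1, -1):
            mu = int(np.round(R[l, k] / R[l, l]))
            if mu != 0:                        # size reduction constraint violated
                R[:l + 1, k] -= mu * R[:l + 1, l]
                T[:, k] -= mu * T[:, l]
        # LLL reduction check
        if delta * R[k - 1, k - 1] ** 2 > R[k, k] ** 2 + R[k - 1, k] ** 2:
            # Swap columns k-1 and k, then restore the triangular form
            R[:, [k - 1, k]] = R[:, [k, k - 1]]
            T[:, [k - 1, k]] = T[:, [k, k - 1]]
            a, b = R[k - 1, k - 1], R[k, k - 1]
            G = np.array([[a, b], [-b, a]]) / np.hypot(a, b)   # Givens rotation
            R[k - 1:k + 1, k - 1:] = G @ R[k - 1:k + 1, k - 1:]
            Q[:, k - 1:k + 1] = Q[:, k - 1:k + 1] @ G.T
            k = max(k - 1, 1)                  # index k decreases
        else:
            k += 1                             # index k increases
    return Q, R, T, loops

rng = np.random.default_rng(0)
H_r = rng.standard_normal((8, 8))
Q0, R0 = np.linalg.qr(H_r)
Q, R, T, loops = lll_qr(Q0, R0)
```

On return, `Q @ R` equals `H_r @ T`, and every decrease of `k` triggers a full size reduction pass in the next loop, which is the behavior the look-ahead check targets.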

Look-Ahead Check
The number of loops is often treated as a benchmark for computational complexity and latency in the literature on LRA MIMO detection [7]. To eliminate the redundant check operations in the LLL algorithm, we propose a look-ahead check technique that classifies the original loops into forward loops and back loops. The loop-reduction LLL algorithm is shown in Algorithm 2. The flow chart of the proposed algorithm and of each loop is shown in Figure 1.

Back Loop.
We define the back loop as the loop that contains only the LLL reduction check and the LLL violation processing, as shown in Figure 1. We find that the size reduction constraint will not be violated after k is decreased, because R(k, k) has already been size reduced in the previous processing and is not changed by the column-swapping operation. In addition, the Givens rotation changes only rows k and k − 1, while the values in the rows from k − 2 upward remain size reduced. This means that only the LLL reduction check is required in the back loop. We call this check the back-state LLL reduction check to distinguish it from the original one. Because only the LLL reduction part needs to be executed in the back loop, we use a while loop in our algorithm to avoid the redundant size reduction operation. Nonetheless, there is still one case in which the size reduction would be performed. If the division result exactly equals 0.5, the original LLL algorithm performs the size reduction operation in the back loop, whereas our algorithm skips it. This produces a different lattice-reduced matrix in the end. However, in this case the resulting matrix obeys the LLL lattice reduction criterion whether or not the size reduction is performed. Although we cannot prove mathematically that the performance is the same, we show through Monte Carlo simulation in a later section that it is.

Notice that if the stay-state size reduction constraint (μ ≠ 0) is not violated, the LLL reduction constraint cannot be violated either, because R(k, k) has already been LLL reduced in the previous processing and all values remain unchanged. Therefore, we can also perform the stay-state size reduction check ahead of time. If the stay-state size reduction constraint is not violated, we can simply increase the index k by 2 to skip a redundant LLL reduction check and enter the forward loop.
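Algorithm 2 is not reproduced in this text, so the following Python sketch is only one possible reading of the look-ahead idea: a forward loop (size reduction plus LLL check), a back loop that repeats only the LLL check and the violation processing, and a stay-state size reduction check that decides whether the next forward loop can be skipped. The function name `lr_lll`, the helper names, the 0-based indexing, and the way forward and back loops are counted are all assumptions of this example, not the authors' definitions.

```python
import numpy as np

def lr_lll(Q, R, delta=0.75):
    """Look-ahead LLL sketch. Returns Q, R, T and forward/back loop counts."""
    Q, R = Q.astype(float), R.astype(float)    # astype returns copies
    m = R.shape[1]
    T = np.eye(m, dtype=int)

    def mu(l, k):                              # rounded size reduction coefficient
        return int(np.round(R[l, k] / R[l, l]))

    def violated(k):                           # LLL reduction check at index k
        return delta * R[k - 1, k - 1] ** 2 > R[k, k] ** 2 + R[k - 1, k] ** 2

    def swap_rotate(k):                        # LLL violation processing
        R[:, [k - 1, k]] = R[:, [k, k - 1]]
        T[:, [k - 1, k]] = T[:, [k, k - 1]]
        a, b = R[k - 1, k - 1], R[k, k - 1]
        G = np.array([[a, b], [-b, a]]) / np.hypot(a, b)
        R[k - 1:k + 1, k - 1:] = G @ R[k - 1:k + 1, k - 1:]
        Q[:, k - 1:k + 1] = Q[:, k - 1:k + 1] @ G.T

    k, forward, back = 1, 0, 0
    while k < m:
        # Forward loop: full size reduction, then the LLL check
        forward += 1
        for l in range(k - 1, -1, -1):
            c = mu(l, k)
            if c != 0:
                R[:l + 1, k] -= c * R[:l + 1, l]
                T[:, k] -= c * T[:, l]
        if not violated(k):
            k += 1
            continue
        swap_rotate(k)
        # Back loop: only the back-state LLL check is repeated
        while k > 1 and violated(k - 1):
            back += 1
            k -= 1
            swap_rotate(k)
        # Stay-state check: if column k needs no size reduction against
        # column k-1, its LLL check cannot fail, so move on to column k+1
        k = k + 1 if mu(k - 1, k) == 0 else k
    return Q, R, T, forward, back

rng = np.random.default_rng(1)
H_r = rng.standard_normal((8, 8))
Q0, R0 = np.linalg.qr(H_r)
Q, R, T, forward, back = lr_lll(Q0, R0)
```

In this sketch the tie case mentioned above (a division result of exactly 0.5) is not treated specially; with random floating-point inputs it essentially never occurs.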

Simulation Results
To verify the proposed LLL algorithm, we simulate LLL-aided MIMO detection based on the MIMO system described in Section 2, and we employ the sorted QR decomposition in all MIMO detectors. The LLL reduction parameter δ equals 0.75, as suggested in [11]. Table 2 shows the average number of loops of the original and the proposed LLL algorithms for different antenna numbers. Both forward loops and back loops are counted as loops in our algorithm. The proposed LLL algorithm reduces the average number of loops to 93% to 94% of that of the original LLL algorithm. The BER versus SNR curves for 4 × 4 and 8 × 8 MIMO systems are shown in Figures 2 and 3, respectively. The performance of our algorithm is exactly the same as that of the original LLL algorithm.
We also analyze the computational complexity and latency of our algorithm. The results for 4 × 4 and 8 × 8 MIMO systems are listed in Tables 3 and 4. The computation is divided into four operations: addition, multiplication, division, and Givens rotation. Our algorithm has a lower total computational complexity, especially in divisions, which tend to take more time to compute. The reduction ratio is close to the loop-reduction ratio. However, the average computational complexity alone does not clearly show the advantage of our algorithm. Because the original LLL algorithm contains many redundant check operations that cannot be processed in parallel, it takes a long average processing time to complete the lattice reduction. We therefore estimate the latency by parallelizing all operations that can be parallelized. The latency is counted as follows. Lines (5) to (9) in our algorithm are counted as one division, one addition, and one multiplication. The LLL reduction check operation contains four multiplications and two additions. The column swap operation is counted as having no operation delay. The Givens rotation is counted once in each back loop. The stay-state size reduction is counted as one division operation. The latency is listed in Tables 3 and 4 below the dashed line. The saving is about 22% to 29% and grows with the antenna number.

Hardware Architecture
6.1. Top Structure. Figure 4 shows a straightforward structure for our LR-LLL algorithm. The central controller updates the index k according to the LLL violation results and sends control signals that select the specific matrix elements fed to the input of the combinational circuit. We also refer to the size reduction part as the size reduction loop and to the LLL reduction part as the LLL reduction loop. The update circuits for the remaining elements of the R, T, and Q matrices are omitted for simplicity. In this architecture, the CORDIC circuit has two pipelined stages, so one cycle is required for a size reduction loop and four cycles for an LLL reduction loop. The traditional LLL algorithm always processes the forward loop, which involves the execution of the whole circuit, whereas in the LR-LLL algorithm some forward loops are replaced by back loops. The average cycle counts of the LLL algorithm and our LR-LLL algorithm are listed in Table 5. The reduction in the average cycle count grows as the antenna number grows. The FPGA results are shown in Table 6. In [5], the complex-valued LLL lattice reduction algorithm is shown to have lower computational complexity than the real-valued one, mainly because the real-valued system has twice the dimension of the complex-valued system. Therefore, the hardware cost or cycle counts of our design may be larger than those of the previous two complex-valued works.

[...] division architecture. In Figure 5, we show a four-stage long-division architecture for a divr circuit with a five-bit output. The size reduction update circuit is composed of multiplication and addition circuits. Instead of calculating the squared norm for the LLL reduction comparison, we use a CORDIC vectoring-mode circuit to calculate the square root of the norm, which may also serve as the output if the LLL check is violated. The square root of δ is set to 0.875 to approximate the square root of 0.75 (about 0.866). The CORDIC rotation mode is used to perform the Givens rotation of the algorithm. The output of the comparison circuit is the LLL reduction violation check result, which controls the central controller and also enables the update circuits. The LLL reduction update block contains multiple CORDIC rotation circuits that perform the Givens rotation on the remaining row elements of the R and Q matrices. It also contains a swap circuit for the T matrix.
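The CORDIC-based LLL check described above can be modeled in software as follows. This is a floating-point sketch of a generic vectoring-mode CORDIC, not the authors' fixed-point circuit; the function names `cordic_magnitude` and `lll_violation` and the iteration count are assumptions of this example.

```python
import math

def cordic_magnitude(x, y, iterations=16):
    """Vectoring-mode CORDIC: drive y to zero with micro-rotations of
    atan(2^-i) and return sqrt(x^2 + y^2) after gain compensation."""
    x = abs(x)                       # magnitude is unchanged by this reflection
    for i in range(iterations):
        s = 2.0 ** -i                # a right shift by i bits in hardware
        if y > 0:
            x, y = x + y * s, y - x * s
        else:
            x, y = x - y * s, y + x * s
    gain = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(iterations))
    return x / gain

def lll_violation(r_prev_diag, r_off, r_diag, sqrt_delta=0.875):
    """LLL check on square roots: sqrt(delta) * |R(k-1,k-1)| compared with
    the norm of (R(k-1,k), R(k,k)) instead of comparing squared values."""
    return sqrt_delta * abs(r_prev_diag) > cordic_magnitude(r_off, r_diag)

norm = cordic_magnitude(3.0, 4.0)
swap_needed = lll_violation(2.0, 0.5, 1.0)
no_swap = lll_violation(1.0, 0.5, 1.0)
```

Using 0.875 as the square root of δ corresponds to an effective δ of 0.875² = 0.765625, slightly above the nominal 0.75, while the constant itself needs only three fractional bits.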

Conclusion
In this paper, we propose a look-ahead check technique to eliminate unnecessary check operations in the LLL algorithm. The proposed algorithm reduces not only the average number of loops but also the computational complexity and latency of the LLL algorithm. We also present a straightforward architecture to estimate the clock-cycle saving of our algorithm. The saving grows with the antenna number, from about 19.4% for 4 × 4 to about 46.1% for 12 × 12 MIMO systems. Therefore, we believe that the proposed loop-reduction LLL algorithm benefits lattice-reduction-aided MIMO detection.

Figure 1: The flow chart for the loop-reduction LLL algorithm.

Table 1: The percentage of no operation after index k is decreased.

Table 2: Average number of loops of the lattice-reduction algorithms for different antenna numbers.
So the lattice-reduction performance will not suffer any degradation. Using the look-ahead check technique, we can more precisely determine the next k value at the end of each loop.

4.2. Forward Loop. The forward loop is just like the original loop defined in the previous section, except that once the LLL reduction constraint is violated, the algorithm enters the back loop. If the back-state LLL reduction constraint is not violated, we enter the stay-state size reduction check to predict the next index k.

Table 3: Average computational complexity and latency of the lattice-reduction algorithms for 4 × 4 MIMO detection.

Table 4: Average computational complexity and latency of the lattice-reduction algorithms for 8 × 8 MIMO detection.

Table 5: Average cycle counts of the lattice-reduction algorithms for different antenna numbers.

Table 6: FPGA implementation results and comparison.