Low Complexity Submatrix Divided MMSE Sparse-SQRD Detection for MIMO-OFDM with ESPAR Antenna Receiver

Multiple input multiple output-orthogonal frequency division multiplexing (MIMO-OFDM) with an electronically steerable passive array radiator (ESPAR) antenna receiver can improve the bit error rate performance and obtain additional diversity gain without increasing the number of radio frequency (RF) front-end circuits. However, due to the large size of the channel matrix, the computational cost of the detection process using Vertical-Bell Laboratories Layered Space-Time (V-BLAST) detection is too high for practical implementation. Using the minimum mean square error sparse-sorted QR decomposition (MMSE sparse-SQRD) algorithm for the detection process, the average computational cost can be considerably reduced, but it is still higher than that of a conventional MIMO-OFDM system without ESPAR antenna receiver. In this paper, we propose a low complexity submatrix divided MMSE sparse-SQRD algorithm for the detection process of MIMO-OFDM with ESPAR antenna receiver. The computational cost analysis and simulation results show that, on average, the proposed scheme further reduces the computational cost and achieves a complexity comparable to that of conventional MIMO-OFDM detection schemes.


Introduction
In multipath fading channels, multiple input multiple output (MIMO) antenna systems can achieve a great increase in the channel capacity [1]. MIMO-OFDM combines the advantages of MIMO systems with orthogonal frequency division multiplexing (OFDM) modulation, achieving good performance in frequency selective fading channels. Due to these advantages, MIMO-OFDM allows high data rates in wireless communications systems. It is used in the wireless local area network (WLAN) standard IEEE 802.11n [2] and is also considered for next-generation systems.
One of the limitations of MIMO-OFDM is that it requires one radio frequency (RF) front-end circuit for every receiver and transmitter antenna. Comparing MIMO-OFDM 2Tx-2Rx with MIMO-OFDM 2Tx-4Rx, MIMO-OFDM 2 × 4 can achieve better diversity gain and bit error rate performance but requires additional RF front-end circuits, A/D converters, and FFT blocks for every additional branch.
In [3,4] a MIMO-OFDM 2 × 2 scheme with electronically steerable passive array radiator (ESPAR) antenna receiver diversity has been proposed. It utilizes for every receiver a 2-element ESPAR antenna whose directivity is changed at a frequency equal to the OFDM symbol rate. Compared to conventional MIMO-OFDM 2 × 2 systems, this scheme gives additional diversity gain and improves the bit error rate performance without increasing the number of RF front-end circuits. For the detection, the zero forcing (ZF) Vertical-Bell Laboratories Layered Space-Time (V-BLAST) algorithm [5,6] is used but, due to the large size of the channel matrix, the required computational effort is very high.
In order to reduce the computational cost of the detection process of the scheme proposed in [3,4], the use of a minimum mean square error sparse-sorted QR decomposition (MMSE sparse-SQRD) algorithm, based on the SQRD algorithm introduced in [7,8], was proposed by the authors in [9]. The computational cost reduction is achieved by exploiting the sparse structure of the channel matrix. This detection algorithm considerably reduces the average computational cost and also improves the bit error rate performance compared to the original scheme [3,4]. A submatrix divided MMSE sparse-SQRD algorithm for the detection process of MIMO-OFDM with ESPAR antenna receiver was proposed by the authors in [10] for further reduction of the computational cost. This algorithm divides the channel matrix into $k$ smaller submatrices, which reduces the computational cost at the expense of a small degradation in the bit error rate performance.
This paper is an extension of [10] that includes results on the bit error rate performance and computational cost for higher order submatrix division schemes. Also, another approach to further reduce the bit error rate degradation caused by the submatrix division algorithm is introduced.
The rest of this paper is organized as follows. Sections 2 and 3 give a brief background description of OFDM and MIMO-OFDM with ESPAR antenna receiver. In Section 4, detection algorithms based on QR decomposition are shown. Then, in Section 5, a detailed explanation of the MMSE sparse-SQRD algorithm is included. In Section 6 the proposed submatrix divided scheme is described. The computational cost analysis and simulation results are presented in Sections 7 and 8, respectively. Finally, conclusions are given in Section 9.

OFDM with ESPAR Antenna
ESPAR is a small size and low power consumption antenna [11,12]. It is composed of a radiator element connected to the RF front-end and one or more parasitic (passive) elements terminated by variable capacitances. The beam directivity can be controlled by modifying the variable capacitances. This antenna requires only one RF front-end and is therefore also known as a single RF port antenna array.
In [13] an OFDM receiver using an ESPAR antenna is proposed. In this scheme the directivity of the ESPAR antenna is changed by a periodic wave whose frequency is the OFDM symbol rate. The block diagram of this scheme is shown in Figure 1.
A two-element ESPAR antenna is utilized. The periodic variation of the directivity causes intercarrier interference (ICI) in the received signal. The ICI is caused by the addition of phase shifted components to the received signal. The frequency domain equalizer in Figure 1 uses both the shifted and nonshifted components in the detection. Due to this effect the scheme obtains diversity gain and therefore improves the bit error rate performance.
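The effect of a periodic variation on the subcarriers can be illustrated with a short numerical sketch. The weighting waveform used below, 1 + depth·exp(j2πt/N), is an assumption chosen only for illustration, since the actual ESPAR directivity waveform is not specified in this section. The helper names `dft` and `modulated_spectrum` are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform, normalized by the length."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / n
        for k in range(n)
    ]


def modulated_spectrum(n_fft, tone_bin, depth):
    """Magnitude spectrum of one subcarrier under a symbol-rate periodic weighting.

    Assumed (hypothetical) weighting: 1 + depth * exp(j*2*pi*t/n_fft),
    i.e. one full period per OFDM symbol.
    """
    received = []
    for t in range(n_fft):
        tone = cmath.exp(2j * math.pi * tone_bin * t / n_fft)
        weight = 1.0 + depth * cmath.exp(2j * math.pi * t / n_fft)
        received.append(tone * weight)
    return [abs(v) for v in dft(received)]


# A single subcarrier at bin 3 of a 16-point FFT.
spectrum = modulated_spectrum(n_fft=16, tone_bin=3, depth=0.5)
for k, magnitude in enumerate(spectrum):
    if magnitude > 1e-9:
        print(k, round(magnitude, 6))
```

Under this assumed model the energy of the subcarrier appears in its own bin (the nonshifted component) and in the adjacent bin (the shifted component), which is the kind of ICI that the frequency domain equalizer exploits.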

MIMO-OFDM with ESPAR Antenna Receiver
Based on [13], a MIMO-OFDM receiver with ESPAR antenna was proposed in [3,4] and is described in this section. The block diagrams of the transmitter and receiver are shown in Figures 2 and 3, respectively. The transmitter is based on the WLAN standard IEEE 802.11n [2]. For simplicity, the forward error correction (FEC) and interleaver blocks are not considered in the system. The receiver uses a 2-element ESPAR antenna whose directivity is also changed periodically according to the OFDM symbol rate. An MMSE channel estimator derived in [3,4] is used, and the detection process is carried out by the ZF V-BLAST detector.

Channel Estimation.
For the channel estimation [13], let $\mathbf{P}_1$ be the pilot symbol and $\mathbf{P}_2$ its cyclically shifted version. The received signal $\mathbf{u}_i$ after the FFT processor at the $i$-th Rx is given by (1), where $\mathbf{h}^{ns}_{i,l}$ and $\mathbf{h}^{s}_{i,l}$ are the channel responses between the $i$-th receive antenna and the $l$-th transmit antenna for the phase nonshifting (ns) and phase shifting (s) elements, respectively. The matrix $\mathbf{G}$ represents the frequency shift due to the directivity variation in the ESPAR antenna and $\mathbf{z}$ is the additive white Gaussian noise (AWGN) vector.
From (1), the autocorrelation matrix $\mathbf{R}_u = E[\mathbf{u}_i \mathbf{u}_i^H]$ is given by (2), where $\sigma_n^2$ is the noise variance and $\mathbf{R}_h$ is the covariance matrix that represents the delay profile of the channel. Considering that the phase nonshifting (ns) and phase shifting (s) elements are spatially separated enough to be uncorrelated, the cross-correlation matrices $\mathbf{B} = E[\mathbf{u}_i \mathbf{h}^H]$ are given by (3). Using the MMSE criterion, the channel response is given by (4). The channel matrix $\mathbf{H}$ has a size of $2(N + 2) \times 2N$, where $N$ is the number of data subcarriers.
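For reference, the generic linear MMSE estimator that follows from an autocorrelation matrix and a cross-correlation matrix of this kind can be written as below. This is the textbook form for zero-mean $\mathbf{u}$ and $\mathbf{h}$; treating the ns and s responses as a single generic vector $\mathbf{h}$ is a simplification made here for illustration, and the exact expression derived in [3,4] may be organized differently.

```latex
\hat{\mathbf{h}} = \mathbf{B}^{H}\,\mathbf{R}_u^{-1}\,\mathbf{u},
\qquad
\mathbf{R}_u = E\!\left[\mathbf{u}\mathbf{u}^{H}\right],
\quad
\mathbf{B} = E\!\left[\mathbf{u}\mathbf{h}^{H}\right]
```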

Detection.
For the detection process the ZF V-BLAST [6] algorithm is used. In this algorithm the received signal vector is multiplied by a filter matrix $\mathbf{G}_{ZF}$, which is calculated by $\mathbf{G}_{ZF} = \mathbf{H}^{\dagger}$, where $\mathbf{H}^{\dagger}$ is the Moore-Penrose pseudoinverse of $\mathbf{H}$. The matrix $\mathbf{G}_{ZF}$ is calculated recursively after zeroing one column of the channel matrix $\mathbf{H}$; for this scheme the pseudoinverse is calculated $2N$ times. Due to the large size of the channel matrix $\mathbf{H}$, calculating the pseudoinverse demands a very high computational effort, and for this reason the detection process is the main limitation of this scheme.

QR Decomposition-Based Detection
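For MMSE detection, the channel matrix and the vector of received symbols are extended as $\underline{\mathbf{H}} = [\mathbf{H}^T \;\; \sigma_n \mathbf{I}_{2N}]^T$ and $\underline{\mathbf{u}} = [\mathbf{u}^T \;\; \mathbf{0}_{2N,1}^T]^T$, where $\sigma_n$ is the noise standard deviation, $\mathbf{I}_{2N}$ is an identity matrix of size $2N \times 2N$, and $\mathbf{0}_{2N,1}$ is a column vector with $2N$ zero elements.
The QR decomposition of the extended channel matrix $\underline{\mathbf{H}}$ can be expressed by $\underline{\mathbf{H}} = \mathbf{Q}\mathbf{R}$, where $\mathbf{Q}$ is a unitary matrix and $\mathbf{R}$ is an upper triangular matrix. The extended vector of received symbols $\underline{\mathbf{u}}$ is given by $\underline{\mathbf{u}} = \mathbf{Q}\mathbf{R}\mathbf{x} + \boldsymbol{\eta}$ (8), where $\mathbf{x}$ is the vector of transmitted symbols and $\boldsymbol{\eta}$ is the noise vector. Then (8) is multiplied by $\mathbf{Q}^H$ to obtain $\mathbf{y} = \mathbf{Q}^H \underline{\mathbf{u}} = \mathbf{R}\mathbf{x} + \boldsymbol{\nu}$ (9), where $\mathbf{Q}^H$ is the Hermitian transpose of $\mathbf{Q}$ and $\boldsymbol{\nu} = \mathbf{Q}^H \boldsymbol{\eta}$. The statistical properties of $\boldsymbol{\nu}$ remain unchanged because $\mathbf{Q}$ is a unitary matrix.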
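The reason why a QR decomposition of the extended matrix leads to MMSE rather than ZF detection can be seen from the identity below. It is written under the standard MMSE extension, assumed here, in which $\sigma_n \mathbf{I}_{2N}$ is stacked below $\mathbf{H}$ and $2N$ zeros are stacked below $\mathbf{u}$: the least-squares cost of the extended system equals the original cost plus a regularization term weighted by the noise variance.

```latex
\left\| \underline{\mathbf{u}} - \underline{\mathbf{H}}\mathbf{x} \right\|^{2}
=
\left\|
\begin{bmatrix} \mathbf{u} - \mathbf{H}\mathbf{x} \\ -\sigma_n \mathbf{x} \end{bmatrix}
\right\|^{2}
=
\left\| \mathbf{u} - \mathbf{H}\mathbf{x} \right\|^{2}
+ \sigma_n^{2} \left\| \mathbf{x} \right\|^{2}
```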

MMSE-SQRD.
In [8] an MMSE sorted QRD detection algorithm based on the modified Gram-Schmidt algorithm is introduced. The starting condition is $\mathbf{Q} = \underline{\mathbf{H}}$; then the norms of the column vectors of $\mathbf{Q}$ are calculated. In every step the remaining column of $\mathbf{Q}$ with the minimum norm is selected, so that the diagonal elements of $\mathbf{R}$ belonging to the layers that are detected first are maximized, and the columns of $\mathbf{Q}$ are exchanged before the orthogonalization process. This algorithm calculates an improved matrix $\mathbf{R}$ that reduces the error propagation through the detection layers. During the calculation, a permutation vector $\mathbf{p}$ records the column exchange operations so that the detected symbols can be reordered at the end of the algorithm.
After the matrices $\mathbf{R}$ and $\mathbf{Q}$ are calculated, $\mathbf{y}$ is obtained according to (9) and the symbols are detected iteratively. The detected symbols are then reordered using the permutation vector $\mathbf{p}$ to recover their original sequence.
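The procedure described above can be sketched as follows. This is a minimal, dense, pure-Python illustration of a sorted QR decomposition with modified Gram-Schmidt followed by successive detection, not the authors' implementation. The function names `mmse_sqrd` and `detect`, the list-of-rows matrix layout, and the nearest-neighbour slicer are assumptions made for the example.

```python
import math


def mmse_sqrd(H, sigma):
    """Sorted QR decomposition of the extended matrix [H; sigma*I].

    H is a list of rows. Returns Q (list of columns), R (list of rows) and
    the permutation p, where p[i] is the original index of sorted column i.
    """
    n_rows, n_cols = len(H), len(H[0])
    # Column c of the extended matrix: channel column stacked on sigma * e_c.
    Q = [[H[r][c] for r in range(n_rows)]
         + [sigma if d == c else 0.0 for d in range(n_cols)]
         for c in range(n_cols)]
    R = [[0.0] * n_cols for _ in range(n_cols)]
    p = list(range(n_cols))
    norms = [sum(abs(v) ** 2 for v in col) for col in Q]
    for i in range(n_cols):
        # Select the remaining column with minimum norm and exchange it.
        k = min(range(i, n_cols), key=lambda j: norms[j])
        Q[i], Q[k] = Q[k], Q[i]
        norms[i], norms[k] = norms[k], norms[i]
        p[i], p[k] = p[k], p[i]
        for row in R:
            row[i], row[k] = row[k], row[i]
        R[i][i] = math.sqrt(norms[i])
        Q[i] = [v / R[i][i] for v in Q[i]]
        # Orthogonalize the remaining columns against q_i.
        for j in range(i + 1, n_cols):
            R[i][j] = sum(a.conjugate() * b for a, b in zip(Q[i], Q[j]))
            Q[j] = [b - R[i][j] * a for a, b in zip(Q[i], Q[j])]
            norms[j] -= abs(R[i][j]) ** 2
    return Q, R, p


def detect(H, u, sigma, constellation):
    """Detect symbols layer by layer and undo the sorting permutation."""
    Q, R, p = mmse_sqrd(H, sigma)
    n_cols = len(R)
    # The extended received vector is zero-padded, so only the first
    # len(u) entries of every column of Q contribute to y.
    y = [sum(q[r].conjugate() * u[r] for r in range(len(u))) for q in Q]
    x_sorted = [0j] * n_cols
    for i in reversed(range(n_cols)):
        interference = sum(R[i][j] * x_sorted[j] for j in range(i + 1, n_cols))
        estimate = (y[i] - interference) / R[i][i]
        # Hypothetical slicer: nearest constellation point.
        x_sorted[i] = min(constellation, key=lambda s: abs(s - estimate))
    x = [0j] * n_cols
    for i, original in enumerate(p):
        x[original] = x_sorted[i]
    return x
```

Because the column with the smallest norm is processed first and detection runs from the last layer backwards, the most reliable layers are detected first, which is what limits the error propagation.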

MMSE Sparse-SQRD Algorithm
The extended channel matrix $\underline{\mathbf{H}}$, whose size is $2(2N + 2) \times 2N$, is shown in (10); as we can see, it is a sparse matrix. The MMSE sparse-SQRD algorithm is based on the MMSE-SQRD algorithm [8] and exploits the sparse structure of $\underline{\mathbf{H}}$ to reduce the computational cost of the detection process. Analysing $\underline{\mathbf{H}}$ given in (10), we can see that every column has only five nonzero elements, so only these elements need to be used for the norm calculation of the column vectors of $\mathbf{Q}$. Also, the positions of the nonzero elements are fixed, so this information is provided to the algorithm as an input matrix. Using this information, the norm calculation is shown in lines 5-9 of Algorithm 1.
In the orthogonalization process of the algorithm two calculations are performed iteratively: $r_{i,j} = \mathbf{q}_i^H \mathbf{q}_j$ (11) and $\mathbf{q}_j = \mathbf{q}_j - r_{i,j}\mathbf{q}_i$ (12). Here $r_{i,j}$ denotes the elements of the matrix $\mathbf{R}$, and $\mathbf{q}_i$, $\mathbf{q}_j$ are column vectors of the matrix $\mathbf{Q}$. In (11), the multiplications by the zero elements of the column vectors do not influence the final result, so these multiplications can be avoided. A vector containing only the indices of the nonzero elements of the column vectors is obtained in line 14, so the number of operations required to calculate $r_{i,j}$ is reduced without influencing the final result. This is shown in lines 19-21 of the algorithm. The same strategy is also used in lines 15-17 and 23-25.
Also, due to the sparse structure of (10), the result of (11) can be zero. In this case calculating (12) is unnecessary because it does not change the value of $\mathbf{q}_j$, so it can be avoided using the condition in line 22. In addition, the calculation of (9) can be simplified as $\mathbf{y} = \mathbf{Q}_1^H \mathbf{u}$, where $\mathbf{Q}_1$ is a matrix with the same size as the channel matrix $\mathbf{H}$. This is shown in lines 30-31 of the algorithm. Using these criteria, the MMSE sparse-SQRD algorithm can achieve the same bit error rate performance as the MMSE-SQRD algorithm but with a considerable reduction of the computational cost.
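The two savings described above, restricting the inner product to the nonzero rows and skipping the update when the projection is zero, can be sketched as below. This is an illustrative fragment and not a transcription of Algorithm 1. The function names and the representation of the nonzero positions as a list of row indices are assumptions.

```python
def sparse_norm_sq(column, nonzero_rows):
    """Squared norm computed only over rows known to hold nonzero entries."""
    return sum(abs(column[r]) ** 2 for r in nonzero_rows)


def sparse_orthogonalize(q_i, q_j, nonzero_rows_i):
    """One modified Gram-Schmidt step restricted to the nonzero rows of q_i.

    Returns (r_ij, updated q_j). Rows where q_i is zero contribute nothing
    to the inner product and are left untouched by the update.
    """
    r_ij = sum(q_i[r].conjugate() * q_j[r] for r in nonzero_rows_i)
    if r_ij == 0:
        # The projection is zero, so q_j would not change: skip the update.
        return r_ij, q_j
    updated = list(q_j)
    for r in nonzero_rows_i:
        updated[r] = q_j[r] - r_ij * q_i[r]
    return r_ij, updated


# q_i is already normalized and has nonzero entries only in rows 1 and 3.
q_i = [0, 0.6, 0, 0.8j, 0]
rows_i = [1, 3]

# q_a shares no nonzero row with q_i: the update is skipped.
q_a = [1, 0, 2, 0, 3]
r_a, q_a_new = sparse_orthogonalize(q_i, q_a, rows_i)

# q_b overlaps with q_i: only rows 1 and 3 are modified.
q_b = [1, 1, 0, 1j, 0]
r_b, q_b_new = sparse_orthogonalize(q_i, q_b, rows_i)
```

In this sketch each inner product costs two multiplications instead of five, and the first call returns without touching `q_a` at all.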

Submatrix Divided Proposed Algorithm
In order to further reduce the computational cost of the detection process, an algorithm based on submatrix division of the channel matrix is proposed. The block diagram of the proposed scheme is shown in Figure 4.
This detection scheme is composed of a submatrix builder block and $k$ MMSE sparse-SQRD detectors. The submatrix builder is fed with the received symbols from the FFT processors and the channel state information obtained in the channel estimator. Its function is to build the submatrices and vectors for the detectors. Every detector is fed with a vector of received symbols $\mathbf{s}_i$ and a channel submatrix $\mathbf{H}_i$.
From now on we consider the number of subcarriers to be $N = 56$, as in the IEEE 802.11n standard [2]. Let $\mathbf{a} = [a_1, a_2, \ldots, a_{58}]^T$ denote the vector of received (transmitted and interfered) symbols from the FFT1 processor and let $\mathbf{b} = [b_1, b_2, \ldots, b_{58}]^T$ denote the vector of received symbols from the FFT2 processor. For simplicity we consider that the extended channel submatrix $\underline{\mathbf{H}}_i$ is created inside the $i$-th detector. We now explain in detail the submatrix division case $k = 4$, considering two variations with 2- or 4-symbol overlapping.
The vectors of received symbols applied to the four detectors are denoted as $\mathbf{s}_1^2$, $\mathbf{s}_2^2$, $\mathbf{s}_3^2$, and $\mathbf{s}_4^2$. During the sorting process of detector 1, the columns containing the channel matrix nonshifted (ns) elements associated with subcarrier −14 are used first regardless of their norm. This reduces the degradation introduced by these elements in the upper layers during the detection process. The same is done in detector 3 with the nonshifted elements of subcarrier +15.
In the vectors of detected symbols, the overlapped detected elements $x_{15}$, $x_{71}$ in vector $\mathbf{x}_1^2$ and $x_{43}$, $x_{99}$ in vector $\mathbf{x}_3^2$ are discarded because they have a higher probability of error.

Quarter-Size Submatrix (𝑘 = 4) with 4-Symbol Overlapping.
In this subsection another variation with 4-symbol overlapping is introduced. The objective is to further reduce the degradation in the bit error rate performance caused by the submatrix division. As in the previous subsection, we divide the channel matrix into four submatrices denoted as $\mathbf{H}_{14}$, $\mathbf{H}_{24}$, $\mathbf{H}_{34}$, and $\mathbf{H}_{44}$. These matrices are shown in (23), (24), (25), and (26), respectively. In this case the vectors of received symbols applied to the four detectors are denoted as $\mathbf{s}_1^4$, $\mathbf{s}_2^4$, $\mathbf{s}_3^4$, and $\mathbf{s}_4^4$.
The vectors of detected symbols obtained from the detectors are denoted as $\mathbf{x}_1^4$, $\mathbf{x}_2^4$, $\mathbf{x}_3^4$, and $\mathbf{x}_4^4$. In this variation the 4 symbols $a_{15}$, $a_{16}$, $b_{15}$, $b_{16}$ in vectors $\mathbf{s}_1^4$ and $\mathbf{s}_2^4$ are overlapped. As in the previous subsection, the symbols $a_{15}$, $b_{15}$ in vector $\mathbf{s}_2^4$ are compensated according to (20) using the elements of $\mathbf{H}_{14}$ and $\mathbf{x}_1^4$. We also overlap the symbols $a_{44}$, $a_{45}$, $b_{44}$, $b_{45}$ in $\mathbf{s}_3^4$ and $\mathbf{s}_4^4$. In the same way, the symbols $a_{44}$, $b_{44}$ in vector $\mathbf{s}_4^4$ are compensated according to (21) using the elements of $\mathbf{H}_{34}$ and $\mathbf{x}_3^4$.
Also, the channel matrix elements associated with subcarriers −14 and −13 are included in both $\mathbf{H}_{14}$ and $\mathbf{H}_{24}$. In the same way, the elements associated with subcarriers +15 and +16 are included in both $\mathbf{H}_{34}$ and $\mathbf{H}_{44}$. During the sorting process of detector 1, the columns containing the channel matrix elements associated with subcarriers −14 and −13 are used first regardless of their norm. The same is done in detector 3 with the elements of subcarriers +15 and +16.
In the vectors of detected symbols, the overlapped elements $x_{15}$, $x_{16}$, $x_{71}$, $x_{72}$ in vector $\mathbf{x}_1^4$ and $x_{43}$, $x_{44}$, $x_{99}$, $x_{100}$ in vector $\mathbf{x}_3^4$ are discarded because they have a higher probability of error.
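The bookkeeping of kept and discarded symbols for $k = 4$ can be sketched as follows. The segment layout used here is inferred from the discarded indices quoted in the text and is an assumption: each detector is taken to own 14 consecutive symbols per transmit stream, with detectors 1 and 3 additionally detecting the first symbols of their neighbour. The function name `detector_symbol_indices` is hypothetical, and the construction of the received vectors and the compensation step are not modelled.

```python
def detector_symbol_indices(n_sub=56, overlapped_symbols=2):
    """Return, per detector, the 1-based indices of kept and discarded symbols.

    Indices 1..n_sub refer to the first transmit stream and
    n_sub+1..2*n_sub to the second one. `overlapped_symbols` counts both
    streams, so 2 means one overlapped symbol per stream.
    """
    per_stream = overlapped_symbols // 2
    quarter = n_sub // 4
    plan = []
    for d in range(4):
        start, end = d * quarter + 1, (d + 1) * quarter
        kept = list(range(start, end + 1))
        # Detectors 1 and 3 (d = 0, 2) also detect symbols owned by the
        # next detector; those overlapped results are discarded afterwards.
        if d % 2 == 0:
            extra = list(range(end + 1, end + 1 + per_stream))
        else:
            extra = []
        plan.append({
            "kept": kept + [i + n_sub for i in kept],
            "discarded": extra + [i + n_sub for i in extra],
        })
    return plan


plan_2 = detector_symbol_indices(56, overlapped_symbols=2)
plan_4 = detector_symbol_indices(56, overlapped_symbols=4)
print(plan_2[0]["discarded"], plan_2[2]["discarded"])
print(plan_4[0]["discarded"], plan_4[2]["discarded"])
```

With these assumptions every symbol index from 1 to 112 is kept by exactly one detector, and the discarded indices match those listed for both overlapping variations.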

Computational Cost
The computational cost is analysed in terms of the number of complex floating point operations (flops) F required. As in [8], for simplicity we count each complex addition as one flop and each complex multiplication as three flops. We cannot obtain a closed-form expression for the number of flops of the proposed submatrix divided algorithm because this number depends on the random sorting, so we obtained the average number of flops from the simulation results. For comparison, the number of flops required by the ML detector is computed as in [14] and depends on the constellation size. The number of flops required by the ZF-VBLAST algorithm is computed as in [15] from C, the number of columns, and D, the number of rows of the channel matrix H.
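As a small worked example of this counting convention, the sketch below compares the cost of one inner product over a full column of the extended channel matrix with the cost over only five nonzero entries, for N = 56. The function name `inner_product_flops` is hypothetical, and the figures describe a single inner product only, not the flop counts reported in the tables.

```python
ADD_FLOPS = 1  # one complex addition, as counted in the text
MUL_FLOPS = 3  # one complex multiplication, as counted in the text


def inner_product_flops(length):
    """Flops for an inner product of two complex vectors with `length` terms."""
    if length == 0:
        return 0
    # `length` multiplications followed by `length - 1` additions.
    return length * MUL_FLOPS + (length - 1) * ADD_FLOPS


n_sub = 56
rows_extended = 2 * (2 * n_sub + 2)          # rows of the extended channel matrix
dense = inner_product_flops(rows_extended)   # every row is visited
sparse = inner_product_flops(5)              # only five nonzero entries
print(rows_extended, dense, sparse)          # prints: 228 911 19
```

The count for the sparse case assumes the initial column structure with five nonzero elements; columns that acquire additional nonzero entries during the orthogonalization would cost proportionally more.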
In Tables 1 and 2 a computational cost comparison in terms of the average number of flops per subcarrier is presented for the cases of 2- and 4-symbol overlapping, respectively. The tables show the number of flops per subcarrier for different submatrix sizes using different modulation schemes. The tables also include the number of flops for a full-size channel matrix, when the submatrix division scheme is not utilized. We can see that when the submatrix division order $k$ increases, the average number of flops per subcarrier is reduced. For the eighteenth-size ($k = 18$) submatrix, that is, the maximum achievable division of the scheme, we obtain the minimum average computational cost. We can also see that the average number of flops is similar for the different modulation schemes, and that the number of flops for the 4-symbol overlapping option is larger than for the 2-symbol overlapping option.
Table 3 shows as a reference the number of flops per subcarrier of the conventional MIMO 2 × 2 VBLAST and MIMO 2 × 2 MLD, both without ESPAR antenna receiver. The computational cost of the eighteenth-size ($k = 18$) submatrix division MMSE sparse-SQRD algorithm with 2- and 4-symbol overlapping is also included. We can see that the average number of flops per subcarrier of the proposed submatrix division based algorithm is similar to that of MIMO 2 × 2 VBLAST and lower than that of the MIMO 2 × 2 MLD scheme for 16-QAM and 64-QAM modulation. For calculating the total computational cost required by the receiver, the number of flops required by the two FFT blocks, considering the data symbol and the pilot symbol, is computed based on [16] as a function of the FFT size $N_{FFT}$. The flops required by the channel estimator used for the ESPAR antenna receiver, which was presented in Section 3.1, are also taken into account. Table 4 presents the total flops per subcarrier required by the receiver using QPSK modulation. The complexity of the FFT, channel estimator, and detection blocks is also included for the different systems. The MIMO-OFDM systems analysed in this table are the original system with ESPAR antenna receiver using the ZF-VBLAST detector [3,4], the system using full-size channel matrix detection, the system using the proposed submatrix divided ($k = 18$) detection with 4-symbol overlapping, and the 2 × 2 VBLAST system without ESPAR antenna receiver. We can observe that using the proposed submatrix divided scheme ($k = 18$) with 4-symbol overlapping, the computational cost required for the detection and also the total number of flops per subcarrier required by the receiver are reduced.

Simulation Results
To determine the bit error rate performance of the proposed algorithm, a software simulation model of MIMO-OFDM with ESPAR antenna receiver was developed in C++ using the IT++ [17] communications library. It is important to note that the system does not include FEC or an interleaver. In the simulation the proposed low complexity submatrix divided MMSE sparse-SQRD detection is implemented with quarter-size, eighth-size, and eighteenth-size submatrices. Both options, with 2- and 4-symbol overlapping, are implemented for these submatrix sizes. The configuration settings of the simulation are shown in Table 5.
Figures 5 and 6 show the bit error rate performance using QPSK modulation for the cases of 2- and 4-symbol overlapping, respectively. These figures include the performance of the proposed algorithm for quarter-size ($k = 4$), eighth-size ($k = 8$), and eighteenth-size ($k = 18$) submatrices. To evaluate the degradation in the bit error rate performance caused by the algorithm, the performance for a full-size channel matrix without division is included. The performance of conventional MIMO-OFDM 2 × 2 VBLAST and MIMO-OFDM 2 × 2 MLD systems without ESPAR antenna receiver is also shown. As we can see in Figure 6, with QPSK modulation and 4-symbol overlapping, the bit error rate performance degradation is minimal even for the eighteenth-size ($k = 18$) submatrix. Also, for a BER of $10^{-3}$, the proposed scheme with eighteenth-size ($k = 18$) submatrices, which achieves the minimum computational cost, obtains an additional gain of about 11 dB compared to a conventional MIMO-OFDM 2 × 2 VBLAST system without ESPAR antenna receiver.
In the same way, the bit error rate using 16-QAM modulation is shown in Figures 7 and 8. With 16-QAM modulation the degradation in the bit error rate performance is larger than in the QPSK results. In this case the degradation is also smaller with 4-symbol overlapping. With 16-QAM, the slopes of the BER curves of the proposed scheme and of MIMO 2 × 4 VBLAST are similar and steeper than that of MIMO 2 × 2 VBLAST. Therefore, our proposed scheme achieves a diversity order similar to that of MIMO 2 × 4 VBLAST without ESPAR antenna receiver.

Conclusion
In this paper, we have proposed a low complexity submatrix divided MMSE sparse-SQRD algorithm for the detection of MIMO-OFDM with ESPAR antenna receiver. The computational cost analysis shows that this algorithm can further reduce the average computational effort, achieving a complexity comparable to that of common MIMO-OFDM detection schemes. We analysed two variations using 2- and 4-symbol overlapping. From the results, the option with 4-symbol overlapping obtains the best performance in terms of bit error rate, at the expense of a higher computational cost than the other option. The proposed detection scheme is flexible, so the best trade-off between computational cost and bit error rate can be selected depending on the design constraints.
The main application of MIMO-OFDM with ESPAR antenna receiver is to improve the bit error rate performance and diversity gain without increasing the number of RF front-end circuits. Utilizing the proposed low complexity detection scheme, we can obtain this improvement in performance with a low computational cost. The proposed detection scheme is specifically designed to reduce the computational cost of the detection of MIMO-OFDM with ESPAR antenna receiver, but it can also be applied to the detection of similar systems that have a large channel matrix.
In future research we will work on the channel estimator because it is necessary to reduce its computational cost. Also, we will add FEC and an interleaver to the system for further improvement in the bit error rate performance.

Figure 1: Block diagram of OFDM receiver with ESPAR antenna.

Figure 3: Block diagram of the MIMO-OFDM receiver with ESPAR antenna.

Table 1: Average number of flops per subcarrier of the proposed algorithm with 2-symbol overlapping.

Table 2: Average number of flops per subcarrier of the proposed algorithm with 4-symbol overlapping.