For single-carrier transmission over delay-spread multiple-input multiple-output (MIMO) channels, the computational complexity of the receiver is often considered a bottleneck with respect to (w.r.t.) practical implementations. Multi-antenna interference (MAI) together with intersymbol interference (ISI)
poses fundamental challenges for efficient and reliable data detection. In this paper, we carry out a systematic study of the interference structure of MIMO-ISI channels, and successively deduce three different Gaussian approximations to simplify the calculation of the global likelihood function. Using factor graphs
as a general framework and applying the Gaussian approximation, three low-complexity iterative detection algorithms are derived, and their performances are compared by means of Monte Carlo simulations. After a careful inspection of their merits and demerits, we propose a graph-based iterative Gaussian
detector (GIGD) for severely delay-spread MIMO channels. The GIGD is characterized by a strictly linear computational complexity w.r.t. the effective channel memory length, the number of transmit antennas, and the number of receive antennas. When the channel has a sparse ISI structure, the complexity of the
GIGD is strictly proportional to the number of nonzero channel taps. Finally, the GIGD provides near-optimum performance in terms of the bit error rate (BER) for repetition-encoded MIMO systems.
1. Introduction
In single-carrier mobile transmission systems not exploiting a guard interval, there are two sources of intersymbol interference (ISI): static ISI due to pulse shaping and receive filtering, and dynamic ISI due to the time-varying delay spread of the physical channel. Static ISI degrades the receiver performance, but can be avoided or limited by proper signal design. Dynamic ISI is particularly severe if the delay spread exceeds the symbol period, which is likely the case for high-rate data transmission. Dynamic ISI, however, provides a diversity gain in the time domain (fast fading) and the frequency domain (multipath fading). In addition to ISI, MIMO-ISI channels are characterized by another type of interference, namely, multi-antenna interference (MAI), which is caused by the simultaneous transmission of data streams via multiple antennas. MAI together with ISI manifests a fundamental challenge for efficient and reliable data detection. On the other hand, MAI provides a diversity gain in the spatial domain, from an information theoretical point of view.
There are two obvious facts that impede a practical implementation of high-rate single-carrier transmission over MIMO-ISI channels. First, with increasing signal bandwidth the effective channel memory length increases, which degrades the system performance in case of linear or decision-feedback equalization. Second, state-space-based detectors, such as the Viterbi algorithm [1, 2] and the BCJR algorithm [3], provide an excellent performance since they benefit from the diversity gain of dynamic ISI and MAI, but their computational complexity is typically prohibitive. Therefore, multi-carrier transmission schemes, particularly orthogonal frequency-division multiplexing (OFDM) [4], are often applied to circumvent the problem of ISI. An important question is whether it is truly impossible to implement a single-carrier transmission system with reasonable performance and complexity for MIMO-ISI channels. We will try to answer this question by proposing a new detection algorithm, called the graph-based iterative Gaussian detector (GIGD).
As the detection complexity of MIMO-ISI channels is mainly caused by multi-antenna interference and intersymbol interference, we will first carry out a systematic study of the interference structure and identify opportunities for simplified treatment. Based on the knowledge obtained from this study, we deduce three different Gaussian approximations, namely, the joint Gaussian approximation (JGA), the grouped joint Gaussian approximation (GJGA), and the independent Gaussian approximation (IGA), to simplify the calculation of the global likelihood function and successively reduce the data detection complexity. The JGA is already well known [5–10], while the GJGA and the IGA are new approaches proposed by the authors. Corresponding to these three Gaussian approximations, three low-complexity iterative parallel soft interference cancellation [5, 11] algorithms, namely, the joint Gaussian detector (JGD), the grouped joint Gaussian detector (GJGD), and the graph-based iterative Gaussian detector (GIGD), will be described by utilizing factor graphs [12, 13] as a general framework. From the JGD to the GJGD, and from the GJGD to the GIGD, the detection complexity is reduced dramatically in each step.
For severely delay-spread MIMO-ISI channels, we propose the GIGD as a promising solution. Applying the independent Gaussian approximation, the GIGD has a computational complexity strictly linear w.r.t. the number of nonzero channel taps, the number of transmit antennas, and the number of receive antennas. Meanwhile, the performance loss incurred by the independent Gaussian approximation can be well compensated by using a repetition code. More importantly, the GIGD shows a satisfying capability in exploiting the frequency/time/space diversity provided by the MIMO-ISI fading channels.
The remainder of this paper is organized as follows. Section 2 introduces a conventional output-oriented channel model as well as a symbol-oriented channel model. Section 3 provides a deep insight into the interference structure of MIMO-ISI channels, and Section 4 gives a brief introduction on factor graphs and message passing algorithms. Section 5 reviews the known joint Gaussian detector, Section 6 derives a grouped joint Gaussian detector, and Section 7 proposes a graph-based iterative Gaussian detector. Numerical results by means of Monte Carlo simulations are provided in Sections 8 and 9 to assess and compare the performance of the three Gaussian detectors. Finally, conclusions are drawn in Section 10.
2. Channel Model
In this section, we will first introduce a conventional MIMO-ISI channel model, and then convert it into a symbol-oriented channel model to facilitate the mathematical derivation of the new algorithms.
2.1. Output-Oriented Channel Model
The equivalent discrete-time model of a MIMO-ISI channel (including transmit and receive filters, the physical channel, and symbol-rate sampling) can be written in complex baseband notation as

y_n[k] = \sum_{m=1}^{N_T} \sum_{l=0}^{L} h_{n,m}^{l} \, x_m[k-l] + w_n[k], \quad 1 \le n \le N_R, (1)
where NR denotes the number of receive (Rx) antennas, NT the number of transmit (Tx) antennas, L the effective memory length of all subchannels, and k∈{0,1,…,K-1} the discrete time index with K denoting the block length. yn[k]∈ℂ is the channel output sample at the nth Rx antenna at time index k, and xm[k] is the channel input symbol at the mth Tx antenna at time index k. hn,ml∈ℂ marks the lth tap of the subchannel connecting the nth Rx antenna and the mth Tx antenna. wn[k] represents a complex additive white Gaussian noise (AWGN) sample at the nth Rx antenna at time index k with zero mean and variance σw2. By convention, the single-sided noise spectral density in the passband is denoted by N0. Noting that wn[k] is a complex noise sample, we have σw2=N0/2+N0/2=N0. Throughout this paper, the signal-to-noise ratio per info bit will be defined as Eb/N0, where Eb stands for the energy used for transmitting one info bit. In case of coded transmission, we have Eb=Es/R with Es≐E{|xm[k]|2} denoting the energy used for transmitting one symbol and R denoting the coding rate.
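As a concrete illustration, the output-oriented model (1) can be simulated directly. The following sketch (Python with NumPy; the dimensions, the noise level, and the function name `channel_output` are our own illustrative choices, not taken from the paper's experiments) draws Rayleigh fading taps with an equal delay power profile and produces the channel outputs of a zero-padded burst:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not the paper's simulation setup).
N_T, N_R, L, K = 2, 2, 3, 50

# Rayleigh fading taps with an equal delay power profile: E{|h|^2} = 1/(L+1).
h = rng.normal(size=(N_R, N_T, L + 1)) + 1j * rng.normal(size=(N_R, N_T, L + 1))
h *= np.sqrt(1.0 / (2 * (L + 1)))

# BPSK symbols x_m[k] in {+1, -1}.
x = rng.choice([-1.0, 1.0], size=(N_T, K))

sigma_w2 = 0.1  # noise variance sigma_w^2 = N0 (made-up value)

def channel_output(h, x, sigma_w2, rng):
    """Output-oriented model (1): y_n[k] = sum_m sum_l h_{n,m}^l x_m[k-l] + w_n[k]."""
    N_R, N_T, Lp1 = h.shape
    K = x.shape[1]
    y = np.zeros((N_R, K), dtype=complex)
    for n in range(N_R):
        for m in range(N_T):
            for l in range(Lp1):
                # x_m[k-l] is zero for k-l < 0 (zero-padded burst start)
                y[n, l:] += h[n, m, l] * x[m, : K - l]
    w = np.sqrt(sigma_w2 / 2) * (rng.normal(size=(N_R, K)) + 1j * rng.normal(size=(N_R, K)))
    return y + w

y = channel_output(h, x, sigma_w2, rng)
```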
We assume that all channel taps are constant within each data burst while varying independently from burst to burst. Moreover, we assume that the fading processes of the channel taps all have the same average power and are mutually independent. This equal delay power profile is often used for equalizer testing, for example, in the 3GPP GSM standard, since it is the most challenging case for linear equalization. Nevertheless, we will show that low-complexity high-performance data detection is in fact possible for this type of MIMO-ISI channel, by means of the receiver algorithm proposed in this paper.
If we take a second look at (1), we may recognize that it is actually an output-oriented channel model, that is, this channel model explains how a channel output sample is formed given multiple channel inputs. This kind of channel model is convenient for deriving state-space-based detection algorithms, but inconvenient for the derivation of factor-graph-based detection algorithms, which requires a channel model that explicitly states the information spread of a data symbol over multiple channel outputs.
2.2. Symbol-Oriented Channel Model
Let us consider an arbitrary data symbol xm[k]. Due to multiple Rx antennas and delay spread, there are in total NR(L+1) channel outputs containing information about xm[k]. From now on, we call these channel outputs the observations of symbol xm[k]. To facilitate the following mathematical elaboration, we collect the observations of xm[k] into a matrix

Y[k] = \begin{bmatrix} y_1[k] & y_1[k+1] & \dots & y_1[k+L] \\ y_2[k] & y_2[k+1] & \dots & y_2[k+L] \\ \vdots & \vdots & \ddots & \vdots \\ y_{N_R}[k] & y_{N_R}[k+1] & \dots & y_{N_R}[k+L] \end{bmatrix}, (2)
which may be termed the observation matrix of xm[k]. Note that Y[k] is shared by all xm[k] for m=1,2,…,NT. Hence, there is no necessity for Y[k] to have a subscript m making this distinction.
Revisiting (1), we find that the relationship between xm[k] and one of its observations yn[k+l] (1 ≤ n ≤ NR, 0 ≤ l ≤ L) can be written as

y_n[k+l] = \sum_{i=1}^{N_T} \sum_{j=0}^{L} h_{n,i}^{j} \, x_i[k+l-j] + w_n[k+l]
= h_{n,m}^{l} \, x_m[k] + \underbrace{\sum_{j=0,\, j \ne l}^{L} h_{n,m}^{j} \, x_m[k+l-j]}_{\text{ISI}} + \underbrace{\sum_{i=1,\, i \ne m}^{N_T} \sum_{j=0}^{L} h_{n,i}^{j} \, x_i[k+l-j]}_{\text{MAI}} + \underbrace{w_n[k+l]}_{\text{AWGN}}. (3)
Defining the sum of ISI, MAI, and AWGN as an effective noise term v_{n,m}^{k,l}, the relationship between xm[k] and yn[k+l] simplifies to

y_n[k+l] = h_{n,m}^{l} \, x_m[k] + v_{n,m}^{k,l}, (4)
cf. Figure 1. Combining (2) and (4), we obtain the following symbol-oriented channel model:

Y[k] = H_m \, x_m[k] + V_m^{k}, (5)
with

H_m = \begin{bmatrix} h_{1,m}^{0} & h_{1,m}^{1} & \dots & h_{1,m}^{L} \\ h_{2,m}^{0} & h_{2,m}^{1} & \dots & h_{2,m}^{L} \\ \vdots & \vdots & \ddots & \vdots \\ h_{N_R,m}^{0} & h_{N_R,m}^{1} & \dots & h_{N_R,m}^{L} \end{bmatrix}, \quad V_m^{k} = \begin{bmatrix} v_{1,m}^{k,0} & v_{1,m}^{k,1} & \dots & v_{1,m}^{k,L} \\ v_{2,m}^{k,0} & v_{2,m}^{k,1} & \dots & v_{2,m}^{k,L} \\ \vdots & \vdots & \ddots & \vdots \\ v_{N_R,m}^{k,0} & v_{N_R,m}^{k,1} & \dots & v_{N_R,m}^{k,L} \end{bmatrix} (6)
being the channel matrix of the mth Tx antenna and the effective noise matrix in Y[k] w.r.t. xm[k], respectively.
Relationship between a data symbol and one of its observations.
With this new channel model, it is clear that all information about xm[k] that we can extract from the channel outputs is fully represented by the following global likelihood function:

p(Y[k] \mid x_m[k]) = p(V_m^{k} = Y[k] - H_m x_m[k]). (7)
Now, the question is how to calculate this likelihood function in an efficient manner. According to (7), the key for this task is the probability density function (PDF) of the effective noise matrix, that is, p(Vmk). As a matter of fact, the main differences between the three Gaussian detectors to be described are in their way of dealing with p(Vmk).
3. Statistical Properties of the Effective Noise Matrix
From (3), (4), and (5), we see that the effective noise matrix Vmk consists of multi-antenna interference, intersymbol interference, and additive noise samples. Due to the large number of variables involved in Vmk, an exact calculation of p(Vmk) typically incurs a prohibitive complexity. Therefore, reasonable approximations are necessary to make things easier. In this section, we will carefully study the statistical properties of the effective noise matrix and try to find a way towards complexity reduction.
3.1. Distribution of Effective Noise Samples
Noting that each effective noise sample v_{n,m}^{k,l} is a sum of NT(L+1) independent random variables, its probability density function may be approximated by a complex Gaussian distribution:

p(v_{n,m}^{k,l}) \approx \frac{1}{\pi \sigma_v^2} \exp\!\left( - \frac{|v_{n,m}^{k,l} - \mu_v|^2}{\sigma_v^2} \right), (8)
where μv and σv2 are defined as

\mu_v \doteq E\{ v_{n,m}^{k,l} \}, \quad \sigma_v^2 \doteq E\{ |v_{n,m}^{k,l} - \mu_v|^2 \}. (9)
(Here, we neglect the correlation between the real part and the imaginary part; concerning this issue, interested readers may refer to [7].) As a rule of thumb, the accuracy of (8) is satisfactory as long as NT(L+1) ≥ 12 holds. This approximation is commonly called the Gaussian approximation, and its feasibility in the scenario of MIMO-ISI channels has been verified in the literature [6, 8, 9].
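The quality of this approximation can be checked numerically. The sketch below (all concrete numbers are made up for illustration; the rule-of-thumb value G = NT(L+1) = 12 is used as the number of interference terms) draws many realizations of one effective noise sample for a fixed channel realization and compares its empirical moments with the predictions of (8) and (9):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: G = N_T*(L+1) = 12 interfering terms (rule-of-thumb size).
G = 12
n_samples = 200_000

# One fixed channel realization; coefficients scaled to roughly unit total power.
h = (rng.normal(size=G) + 1j * rng.normal(size=G)) / np.sqrt(2 * G)
sigma_w2 = 0.05  # made-up AWGN variance

# Effective noise v = sum_i h_i x_i + w for random equiprobable BPSK symbols x_i.
x = rng.choice([-1.0, 1.0], size=(n_samples, G))
w = np.sqrt(sigma_w2 / 2) * (rng.normal(size=n_samples) + 1j * rng.normal(size=n_samples))
v = x @ h + w

# Moments predicted by (9): E{x_i} = 0 and unit symbol variance for BPSK.
mu_v_pred = 0.0
sigma_v2_pred = np.sum(np.abs(h) ** 2) + sigma_w2

mu_v_emp = v.mean()
sigma_v2_emp = np.mean(np.abs(v - mu_v_emp) ** 2)

# Excess kurtosis of the real part: 0 for an exact Gaussian, -2 for a single
# BPSK term; for G = 12 terms it should land much closer to 0 than to -2.
re = v.real
kurt = np.mean((re - re.mean()) ** 4) / np.var(re) ** 2 - 3.0
```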
3.2. Dependence between Effective Noise Samples
Since they share common sources of randomness, the elements of Vmk are in general statistically dependent on each other. However, it is so far unclear whether this dependence is strong or weak. In the following, we carry out numerical measurements to gain deeper insight into this issue. Many previous works [6–10] show that p(Vmk) can be well approximated by a joint Gaussian distribution as long as the product NT(L+1) is large enough. Moreover, it is well known that two jointly Gaussian distributed variables are independent if they are uncorrelated, so their dependence structure is completely characterized by the correlation coefficient. Therefore, by measuring the correlation between the elements of Vmk, we obtain a rough impression of the dependence between the elements of Vmk.
First, we define

v = [v_1, v_2, \dots, v_Q]^T \doteq \mathrm{vec}\{ V_m^{k} \}, (10)
with Q ≐ NR(L+1). vec{·} denotes the column-stacking operator and (·)T denotes the matrix/vector transpose. Since for a block-fading channel the statistics of Vmk do not change with m and k, the subscript m and the superscript k are omitted in v. Next, we define the magnitude of the correlation coefficient between two effective noise samples as

\varphi_{i,j} \doteq \frac{ \left| E\{ (v_i - \mu_{v_i})(v_j - \mu_{v_j})^{*} \} \right| }{ \sigma_{v_i} \sigma_{v_j} }, (11)
where (·)* denotes complex conjugation. Since φi,j is in fact a function of the random channel taps, we further define

\phi_{i,j} \doteq E\{ \varphi_{i,j} \}, (12)
where the expectation is taken over random realizations of the channel taps. Last, we collect the ϕi,j into a matrix

\Phi \doteq \begin{bmatrix} \phi_{1,1} & \phi_{1,2} & \dots & \phi_{1,Q} \\ \phi_{2,1} & \phi_{2,2} & \dots & \phi_{2,Q} \\ \vdots & \vdots & \ddots & \vdots \\ \phi_{Q,1} & \phi_{Q,2} & \dots & \phi_{Q,Q} \end{bmatrix}. (13)
Clearly, the entries on the main diagonal of Φ always equal 1, because they are the magnitudes of autocorrelation coefficients. The values of the off-diagonal entries reflect the strength of the correlation between effective noise samples and, consequently, the strength of the dependence between them.
Figure 2 shows the measured values of Φ in a BPSK system with independent Rayleigh fading channel taps and an equal delay power profile. Observing Figure 2, we see that the values of ϕi,j (i ≠ j) are small, which means that the correlation between the elements of Vmk is actually very weak. As a matter of fact, the correlation between effective noise samples drops steadily as the product NT(L+1) increases [14]. This observation is good news: it may be feasible to partially or even fully neglect the mutual dependence between the effective noise samples for the sake of complexity reduction. Certainly, the detailed dependence structure of the effective noise samples will differ from Figure 2 if another type of channel delay power profile is used; the general shape of Figure 2, however, is preserved.
Average magnitude of correlation coefficient between effective noise samples, NT=NR=4, L=4, BPSK mapping, and Eb/N0=4 dB.
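Such a measurement can be reproduced with a small Monte Carlo experiment. The sketch below (the helper name `avg_corr_offdiag`, the dimensions, the noise level, and the sample counts are our own illustrative choices, not the setup of Figure 2) estimates the average off-diagonal magnitude of Φ for a small and for a large product NT(L+1); the estimate should shrink as NT(L+1) grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def avg_corr_offdiag(N_T, N_R, L, n_channels=200, K=2000, sigma_w2=0.1):
    """Monte Carlo estimate of the average off-diagonal magnitude of Phi (13),
    i.e., the correlation between the effective noise samples of x_1[k]."""
    acc = 0.0
    for _ in range(n_channels):
        # One Rayleigh channel realization, equal delay power profile.
        h = rng.normal(size=(N_R, N_T, L + 1)) + 1j * rng.normal(size=(N_R, N_T, L + 1))
        h *= np.sqrt(1.0 / (2 * (L + 1)))
        x = rng.choice([-1.0, 1.0], size=(N_T, K))
        y = np.zeros((N_R, K), dtype=complex)
        for n in range(N_R):
            for m in range(N_T):
                for l in range(L + 1):
                    y[n, l:] += h[n, m, l] * x[m, : K - l]
        y += np.sqrt(sigma_w2 / 2) * (rng.normal(size=(N_R, K)) + 1j * rng.normal(size=(N_R, K)))
        # Effective noise samples w.r.t. symbol x_1[k]: v = y_n[k+l] - h_{n,1}^l x_1[k].
        ks = np.arange(K - L)
        v = np.empty((N_R * (L + 1), K - L), dtype=complex)
        q = 0
        for l in range(L + 1):
            for n in range(N_R):
                v[q] = y[n, ks + l] - h[n, 0, l] * x[0, ks]
                q += 1
        # Correlation magnitudes, averaged over the off-diagonal entries.
        vc = v - v.mean(axis=1, keepdims=True)
        C = (vc @ vc.conj().T) / vc.shape[1]
        d = np.sqrt(np.real(np.diag(C)))
        corr = np.abs(C) / np.outer(d, d)
        Q = v.shape[0]
        acc += corr[~np.eye(Q, dtype=bool)].mean()
    return acc / n_channels

phi_small = avg_corr_offdiag(N_T=1, N_R=1, L=2)  # N_T*(L+1) = 3
phi_large = avg_corr_offdiag(N_T=4, N_R=1, L=9)  # N_T*(L+1) = 40
```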
4. Factor Graph and Message Passing
Before specific algorithm derivation, we briefly revisit the concept of factor graphs and message passing.
4.1. Factor Graphs and Factorization
A factor graph is a type of bipartite graph that visualizes the factorization of a global function subject to maximization or minimization. To ease understanding, let us consider a simple example. Suppose that we have a BPSK symbol x with three observations:

y_1 = x + n_1, \quad y_2 = x + n_2, \quad y_3 = x + n_3, (14)
where n1, n2, and n3 are additive noise terms. Assuming that no a priori information is available for x, an optimal detector maximizes the global likelihood function according to

\hat{x} = \arg\max_{\tilde{x} \in \{\pm 1\}} \, p(y_1, y_2, y_3 \mid \tilde{x}). (15)
If n1, n2, and n3 are mutually independent, we may factorize the above global likelihood function into a product of local likelihood functions:

p(y_1, y_2, y_3 \mid x) = p(y_1 \mid x) \, p(y_2 \mid x) \, p(y_3 \mid x), (16)
which can be visualized by the factor graph given in Figure 3, where a circle represents a symbol node and a square box represents an observation node.
A symbol node connected with three observation nodes.
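A minimal numeric instance of (14)-(16) can make this concrete. In the sketch below (made-up observation values, i.i.d. real Gaussian noise of assumed variance `sigma2`), the decision obtained from the factorized likelihood coincides with the additive LLR formulation used later on:

```python
import math

# Hypothetical one-shot example of (14)-(16): BPSK symbol x with three noisy
# observations y_i = x + n_i, n_i i.i.d. real Gaussian with variance sigma2.
sigma2 = 0.5
y_obs = [0.9, 1.2, -0.1]  # made-up received values

def local_lh(y, x):
    """Local likelihood p(y | x) for y = x + n, n ~ N(0, sigma2)."""
    return math.exp(-(y - x) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def global_lh(x):
    """Factorized global likelihood (16): product of local likelihoods."""
    p = 1.0
    for y in y_obs:
        p *= local_lh(y, x)
    return p

# ML decision (15) via the factorized likelihood.
x_hat = max((+1, -1), key=global_lh)

# Equivalent LLR formulation: LLR_i = ln p(y_i|+1)/p(y_i|-1) = 2*y_i/sigma2,
# and independent observations simply add up.
llr_sum = sum(2 * y / sigma2 for y in y_obs)
```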
4.2. Iterative Message Passing Algorithms
Given a factor graph, the task of variable estimation can be accomplished by combining and exchanging messages (knowledge) from various sources over this probabilistic network. Such an algorithm is commonly called an iterative message passing algorithm. For message passing over factor graphs, only extrinsic information should be exchanged and propagated. Although different types of nodes apply different types of message processing operations, this rule must be carefully followed at every node.
4.3. Message Exchange at Symbol Nodes
For binary variables, it is often convenient to use log-likelihood ratios (LLRs). Defining

\mathrm{LLR}_i \doteq \ln \frac{ p(y_i \mid x = +1) }{ p(y_i \mid x = -1) }, (17)
the message exchange at a BPSK symbol node proceeds as shown in Figure 4(a). The underlying principle is that LLR messages from independent observations are additive. In practice, LLR_sum ≐ ∑i LLR_i is computed first, and each new outgoing message is then obtained as (LLR_sum − LLR_i). Consequently, the complexity of this operation is proportional to the number of edges diverging from the symbol node.
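The update described above can be sketched in a few lines (the function name `symbol_node_update` and the sample LLR values are our own illustrative choices):

```python
# Extrinsic message update at a BPSK symbol node: compute the total LLR once,
# then subtract each incoming message to obtain the corresponding outgoing one.
def symbol_node_update(incoming_llrs):
    """Return one extrinsic outgoing LLR per edge: LLR_sum - LLR_i.
    The cost is linear in the node degree (two passes over the edges)."""
    llr_sum = sum(incoming_llrs)
    return [llr_sum - llr for llr in incoming_llrs]

# Toy example with three incoming messages.
outgoing = symbol_node_update([1.5, -0.4, 2.1])
```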
Message exchange at different nodes.
Message exchange at a binary symbol node
Message exchange at an observation node
4.4. Message Exchange at Observation Nodes
Considering an observation node connected with three BPSK symbols, the message exchange proceeds as illustrated in Figure 4(b), where f(·) denotes a certain message combining function, often called a message update rule. Different from the situation at symbol nodes, here message combining can no longer be accomplished by a simple linear addition. As a matter of fact, f(·) is the major source of complexity in a graph-based detection algorithm, and hence will be the object of simplification in the remaining part of this paper.
5. Joint Gaussian Detector
According to Section 3, the elements of Vmk are approximately Gaussian distributed, and they are in general dependent on each other, although only weakly. Hence, a straightforward way to calculate p(Vmk) is to approximate the elements of Vmk as jointly Gaussian distributed. This approach is usually termed the joint Gaussian approximation (JGA), and the algorithm based on it is called the joint Gaussian detector (JGD), which has been known for years [6–10]. In this section, we give a clean mathematical derivation of the JGD. (For notational simplicity, BPSK mapping is assumed in the rest of the paper.)
5.1. Joint Gaussian Approximation
Using the symbol-oriented channel model (5), the joint Gaussian approximation can be written as

p(V_m^{k}) \approx \frac{1}{\pi^{Q} |\Sigma|} \exp\!\left( -(v - \mu)^{H} \Sigma^{-1} (v - \mu) \right) (18)
with

v \doteq \mathrm{vec}\{ V_m^{k} \}, \quad \mu \doteq E\{ v \}, \quad \Sigma \doteq E\{ (v - \mu)(v - \mu)^{H} \}, \quad Q \doteq N_R(L+1). (19)
Note that v is a Q×1 column vector; therefore, the order of the covariance matrix Σ is Q = NR(L+1). In the literature, however, this covariance matrix usually has order NRK, where K is the burst length, because an output-oriented channel model is used. The concept of sliding windows was introduced in [6] to reduce this order from NRK to NR(L+1). With the symbol-oriented channel model, however, it becomes clear that there is in fact no reason for the order of Σ to be related to the burst length.
5.2. Factor Graph with Joint Gaussian Approximation
Applying the joint Gaussian approximation, we admit the mutual dependence between the elements of Vmk, and hence the PDF p(Vmk) as well as the global likelihood function p(Y[k]|xm[k]) will not be factorizable at all. We also notice that the observation matrices for neighboring data symbols, namely, Y[k],Y[k+1],…,Y[k+L], partially overlap with each other. For these two reasons, the factor graph of a MIMO-ISI channel will look like Figure 5, where Y denotes the matrix which collects all channel outputs within the current data burst. No factorization exists and also no cycles are present.
Factor graph of a MIMO-ISI channel with joint Gaussian approximation, NT=2.
5.3. Message Update Rule at Observation Node
Revisiting (7) and applying (18), the message from an observation node to a symbol node can be calculated as

\mathrm{LLR}(x_m[k]) = \ln \frac{ p(Y[k] \mid x_m[k] = +1) }{ p(Y[k] \mid x_m[k] = -1) } = -(v_1 - \mu)^{H} \Sigma^{-1} (v_1 - \mu) + (v_2 - \mu)^{H} \Sigma^{-1} (v_2 - \mu) (20)
with

v_1 = \mathrm{vec}\{ Y[k] - H_m \}, \quad v_2 = \mathrm{vec}\{ Y[k] + H_m \}. (21)

μ and Σ, capturing the statistical properties of the effective noise matrix Vmk, are calculated according to (19), utilizing the incoming LLR messages from all relevant symbol nodes. Due to limited space, we refer interested readers to [6] for a detailed description of this calculation.
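The quadratic-form update (20)-(21) can be sketched for a single symbol as follows. The dimensions, the covariance matrix, and the function name `jgd_llr` are toy choices for illustration only (in the actual JGD, μ and Σ would come from the incoming LLR messages rather than being fixed):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: Q = N_R*(L+1) = 4 observations of one BPSK symbol.
Q = 4
H_col = (rng.normal(size=Q) + 1j * rng.normal(size=Q)) / np.sqrt(2)  # vec of H_m
mu = np.zeros(Q, dtype=complex)  # assumed effective-noise mean

# A made-up Hermitian positive-definite covariance for the effective noise.
A = rng.normal(size=(Q, Q)) + 1j * rng.normal(size=(Q, Q))
Sigma = A @ A.conj().T + 0.5 * np.eye(Q)

def jgd_llr(y_vec, H_col, mu, Sigma):
    """Message update (20)-(21): LLR from two quadratic forms with Sigma^{-1}."""
    v1 = y_vec - H_col  # hypothesis x_m[k] = +1
    v2 = y_vec + H_col  # hypothesis x_m[k] = -1
    Sinv = np.linalg.inv(Sigma)  # the O(Q^3) step dominating the JGD complexity
    q1 = np.real((v1 - mu).conj() @ Sinv @ (v1 - mu))
    q2 = np.real((v2 - mu).conj() @ Sinv @ (v2 - mu))
    return -q1 + q2

# Received observation vector for a true symbol x = +1 plus some noise.
y_vec = H_col * (+1) + 0.3 * (rng.normal(size=Q) + 1j * rng.normal(size=Q))
llr = jgd_llr(y_vec, H_col, mu, Sigma)
```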
5.4. Computational Complexity
The computational complexity of (20) mainly comes from the inversion of the covariance matrix Σ. Noting that (20) must be calculated for NT data symbols per time index and that matrix inversion has complexity cubic in the matrix order, we have

\mathcal{O}(\mathrm{JGD}) \propto N_T N_R^{3} (L+1)^{3}. (22)
This complexity is much lower than that of the BCJR algorithm, but it still poses a considerable problem whenever the system possesses many Rx antennas or the channel is severely delay-spread.
6. Grouped Joint Gaussian Detector
In this section, we introduce a grouped joint Gaussian approximation (GJGA) of p(Vmk), which brings a significant complexity reduction w.r.t. the joint Gaussian approximation.
6.1. Grouped Joint Gaussian Approximation
From Figure 2 we see that the average magnitude of the correlation coefficient between vn1,mk,i and vn2,mk,i is constant for all n1 ≠ n2, while the average magnitude of the correlation coefficient between vn,mk,i and vn,mk,j drops steadily as the distance (i − j) increases. This observation suggests a new approximation of p(Vmk) (initial work has been presented in [15]). As illustrated in the following expression:
we assume that the columns of Vmk are mutually statistically independent while the elements within each column are jointly Gaussian distributed. Mathematically, this approximation can be written as

p(V_m^{k}) \approx \prod_{l=0}^{L} p(v_m^{k,l}) (24)
with

v_m^{k,l} \doteq [ v_{1,m}^{k,l}, v_{2,m}^{k,l}, \dots, v_{N_R,m}^{k,l} ]^{T}, \quad p(v_m^{k,l}) \propto \exp\!\left( -(v_m^{k,l} - \mu)^{H} \Sigma^{-1} (v_m^{k,l} - \mu) \right), (25)
where μ and Σ are the mean vector and the covariance matrix of vmk,l, respectively. Note that the order of Σ is now only NR. In the following, we refer to the receiver algorithm based on this approximation as grouped joint Gaussian detector (GJGD).
6.2. Factor Graph with Grouped Joint Gaussian Approximation
Applying the grouped joint Gaussian approximation, we obtain the following factorization:

p(Y[k] \mid x_m[k]) \approx \prod_{l=0}^{L} p(y[k+l] \mid x_m[k]) (26)
with

y[k+l] \doteq [ y_1[k+l], y_2[k+l], \dots, y_{N_R}[k+l] ]^{T}. (27)
The resulting factor graph will look like Figure 6. Now the observation matrix Y[k] is split into observation vectors y[k+l]. Compared to the factor graph with the JGA, the factor graph with the GJGA becomes more complicated, that is, there are more edges diverging from each symbol node. However, the corresponding detection complexity actually becomes much lower, as explained in Section 6.4.
Factor graph of a MIMO-ISI channel with grouped joint Gaussian approximation, NT=2, L=1.
6.3. Message Update Rule at Observation Nodes
With the new approximation, the message update rule at an observation node can be written as

\mathrm{LLR}(x_m[k]) = \ln \frac{ p(y[k+l] \mid x_m[k] = +1) }{ p(y[k+l] \mid x_m[k] = -1) } = -(v_1 - \mu)^{H} \Sigma^{-1} (v_1 - \mu) + (v_2 - \mu)^{H} \Sigma^{-1} (v_2 - \mu) (28)
with

v_1 \doteq y[k+l] - h_m^{l}, \quad v_2 \doteq y[k+l] + h_m^{l}, \quad h_m^{l} \doteq [ h_{1,m}^{l}, h_{2,m}^{l}, \dots, h_{N_R,m}^{l} ]^{T}. (29)
The statistical properties μ and Σ can be calculated by utilizing the incoming LLR messages from all relevant symbol nodes. Due to limited space, we would like to refer interested readers to [15] for more details on this topic.
6.4. Computational Complexity
By checking (26) and (28), and noting that the covariance matrix Σ is now only of order NR, we have

\mathcal{O}(\mathrm{GJGD}) \propto N_T N_R^{3} (L+1). (30)
Comparing (30) with (22), it is clear that the computational complexity of the GJGD is much lower than that of the JGD, particularly for MIMO systems with severe delay spread. Nevertheless, a cubic term is still present due to matrix inversion.
7. Graph-Based Iterative Gaussian Detector
In this section, we introduce an independent Gaussian approximation (IGA), which completely eliminates matrix inversion, and a graph-based iterative Gaussian detector (GIGD) based on it (initial work has been presented in [14]).
7.1. Independent Gaussian Approximation
In Section 3.2, we mentioned that the cross-correlation between effective noise samples drops steadily as the product NT(L+1) increases. Therefore, if NT(L+1) is sufficiently large, we may completely neglect the mutual dependence, that is, approximate all effective noise samples as independently Gaussian distributed, as illustrated in the following:
Mathematically, we may write this approximation as

p(V_m^{k}) \approx \prod_{n=1}^{N_R} \prod_{l=0}^{L} p(v_{n,m}^{k,l}) (32)
with

p(v_{n,m}^{k,l}) \approx \frac{1}{\pi \sigma_v^{2}} \exp\!\left( - \frac{ |v_{n,m}^{k,l} - \mu_v|^{2} }{ \sigma_v^{2} } \right), (33)
where μv and σv2 are defined as

\mu_v \doteq E\{ v_{n,m}^{k,l} \}, \quad \sigma_v^{2} \doteq E\{ |v_{n,m}^{k,l} - \mu_v|^{2} \}. (34)
7.2. Factor Graph with Independent Gaussian Approximation
Revisiting (7) and applying (32), we obtain the following factorization:

p(Y[k] \mid x_m[k]) \approx \prod_{n=1}^{N_R} \prod_{l=0}^{L} p(y_n[k+l] \mid x_m[k]). (35)
The resulting factor graph will look like Figure 7. Now all observations are separately represented in the factor graph, and there are even more edges diverging from each symbol node. However, the corresponding detection complexity is again much lower than that of the GJGD.
Factor graph of a MIMO-ISI channel with independent Gaussian approximation, NT=NR=2, L=1.
7.3. Message Update Rule at Observation Nodes
Combining (4) with (33), the message update rule at an observation node can be written as

\mathrm{LLR}(x_m[k]) = \ln \frac{ p(y_n[k+l] \mid x_m[k] = +1) }{ p(y_n[k+l] \mid x_m[k] = -1) } = - \frac{ |y_n[k+l] - \mu_v - h_{n,m}^{l}|^{2} }{ \sigma_v^{2} } + \frac{ |y_n[k+l] - \mu_v + h_{n,m}^{l}|^{2} }{ \sigma_v^{2} } = \frac{ 4 \, \mathrm{Re}\{ h_{n,m}^{l} (y_n[k+l] - \mu_v)^{*} \} }{ \sigma_v^{2} }, (36)
with μv and σv2 as defined in (34); their calculation is described in the following.
Revisiting Figure 7, we see that each observation node is connected with G ≐ NT(L+1) symbol nodes. Replacing the complicated indices n, m, k, and l by a single index i, we may simplify the relationship between an observation and its associated data symbols as

y = \sum_{i=1}^{G} h_i x_i + w = h_j x_j + \sum_{i=1,\, i \ne j}^{G} h_i x_i + w = h_j x_j + v_j, (37)
with v_j = \sum_{i=1,\, i \ne j}^{G} h_i x_i + w denoting the effective noise sample w.r.t. x_j. Since all data symbols are mutually independent, the following statement is straightforward:

\mu_{v_j} = \sum_{i=1,\, i \ne j}^{G} h_i \, \mu_{x_i}, \quad \sigma_{v_j}^{2} = \sum_{i=1,\, i \ne j}^{G} |h_i|^{2} \, \sigma_{x_i}^{2} + \sigma_w^{2}, (38)
where μxi and σxi2 are calculated by utilizing the incoming LLR message from the corresponding symbol node:

\mu_{x_i} = \frac{ e^{\mathrm{LLR}(x_i)} - 1 }{ e^{\mathrm{LLR}(x_i)} + 1 }, \quad \sigma_{x_i}^{2} = 1 - \mu_{x_i}^{2}. (39)
Note that the principle of extrinsic information is implicitly applied in this message updating operation.
7.4. Computational Complexity
The computational load of the GIGD comes from the message updates at the symbol nodes and the observation nodes. Revisiting Figure 7, we find that there are NT symbol nodes per time index, each connected with NR(L+1) edges. Since the complexity of message exchange at a symbol node is proportional to the number of associated edges (cf. Section 4.3), we have

\mathcal{O}(\text{operations at symbol nodes}) \propto N_T N_R (L+1). (40)
In each iteration, an observation node needs to calculate the LLR values of the G = NT(L+1) data symbols associated with it. In practice, this task is accomplished in two steps. In step one, μxi and σxi2 are calculated for i = 1, 2, …, G according to (39); afterwards, the products h_i \mu_{x_i} and |h_i|^2 \sigma_{x_i}^2 as well as the sums \sum_{i=1}^{G} h_i \mu_{x_i} and ( \sum_{i=1}^{G} |h_i|^2 \sigma_{x_i}^2 + \sigma_w^2 ) are calculated and stored. Obviously, the complexity of this step is proportional to G. In step two, the calculation

\mu_{v_j} = \sum_{i=1}^{G} h_i \mu_{x_i} - h_j \mu_{x_j}, (41)

\sigma_{v_j}^{2} = \left( \sum_{i=1}^{G} |h_i|^{2} \sigma_{x_i}^{2} + \sigma_w^{2} \right) - |h_j|^{2} \sigma_{x_j}^{2}, (42)
is performed, and then LLR(x_j), j = 1, 2, …, G, is obtained according to (36). Since the two sums in (41) and (42) have already been stored in step one, the complexity of step two is proportional to G = NT(L+1) as well. Given this explanation, and noting that there are NR observation nodes per time index, we may conclude that

\mathcal{O}(\text{operations at observation nodes}) \propto N_T N_R (L+1). (43)
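The two-step procedure can be sketched as follows (toy node degree and made-up channel coefficients; the function name `observation_node_update` is hypothetical). The two sums are computed once and then corrected per symbol, so the whole update costs O(G):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy observation node with G = N_T*(L+1) = 6 connected symbol nodes.
G = 6
h = (rng.normal(size=G) + 1j * rng.normal(size=G)) / np.sqrt(2 * G)
sigma_w2 = 0.05
x_true = rng.choice([-1.0, 1.0], size=G)
y = h @ x_true + np.sqrt(sigma_w2 / 2) * (rng.normal() + 1j * rng.normal())

def observation_node_update(y, h, sigma_w2, incoming_llr):
    """Return the extrinsic LLR(x_j) for all j in O(G) time, per (36), (39), (41)-(42)."""
    # Step one: soft symbol statistics (39) and the two stored sums.
    mu_x = np.tanh(incoming_llr / 2)  # (e^L - 1)/(e^L + 1) = tanh(L/2)
    var_x = 1.0 - mu_x ** 2
    sum_mu = np.sum(h * mu_x)
    sum_var = np.sum(np.abs(h) ** 2 * var_x) + sigma_w2
    # Step two: per-symbol correction (41)-(42) and the LLR (36).
    mu_v = sum_mu - h * mu_x
    sigma_v2 = sum_var - np.abs(h) ** 2 * var_x
    return 4.0 * np.real(h * np.conj(y - mu_v)) / sigma_v2

# First iteration: no a priori knowledge, all incoming LLRs are zero.
llr_out = observation_node_update(y, h, sigma_w2, np.zeros(G))
```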
We may recognize that NTNR(L+1) is exactly the total number of channel taps. In reality, however, the discrete-time channel model often has a sparse ISI structure, that is, many channel taps are quasi-zero. In this case, the edges associated with zero taps can safely be removed from the factor graph (cf. Figure 7). Given this knowledge, and combining (40) and (43), we obtain

\mathcal{O}(\mathrm{GIGD}) \propto \text{number of nonzero channel taps} \le N_T N_R (L+1). (44)
Due to the complete elimination of matrix inversion, the complexity of the GIGD is truly linear. Besides, the GIGD is very attractive for sparse ISI channels, where the maximum delay spread is large while many zero taps are present. Note that neither the JGD nor the GJGD is able to benefit from the sparse ISI channel structure in such a straightforward manner, because of multivariate Gaussian approximations.
8. Performance in Uncoded Systems
In previous sections, we have introduced three low-complexity Gaussian detection algorithms, namely, joint Gaussian detector (JGD), grouped joint Gaussian detector (GJGD), and graph-based iterative Gaussian detector (GIGD). In this section, we provide numerical results from Monte Carlo simulations to assess and compare the performance of these three algorithms in uncoded systems, and ultimately illustrate the merits and demerits of the GIGD algorithm.
8.1. Simulation Setup
Each burst from each Tx antenna contains 400 data symbols. After one burst is transmitted, each Tx antenna ceases transmission for an interval of L symbol durations to avoid interburst interference, where L denotes the effective channel memory length. All Tx and Rx antennas are assumed to be perfectly synchronized. For simplicity, the signal mapping scheme is always BPSK. The channel coefficients hn,ml of every subchannel are normalized to form an equal delay power profile with an average sum power of one, that is, E{|h_{n,m}^{l}|^2} = 1/(L+1), so that \sum_{l=0}^{L} E\{|h_{n,m}^{l}|^2\} = 1. For a fair comparison, 5 iterations are performed for all three Gaussian detection algorithms, that is, the operations of message updating and message exchanging are repeated 5 times.
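The tap normalization described above can be sketched as follows; we draw many independent subchannel realizations (an illustrative sample count, not the paper's setup) purely to verify the power profile empirically:

```python
import numpy as np

rng = np.random.default_rng(0)
L = 4
n_sub = 10_000  # many subchannel realizations, only to check the statistics

# Rayleigh taps with equal delay power profile: E{|h_{n,m}^l|^2} = 1/(L+1),
# so each subchannel has unit average sum power.
h = rng.normal(size=(n_sub, L + 1)) + 1j * rng.normal(size=(n_sub, L + 1))
h *= np.sqrt(1.0 / (2 * (L + 1)))

avg_tap_power = np.mean(np.abs(h) ** 2)                  # close to 1/(L+1)
avg_sum_power = np.mean(np.sum(np.abs(h) ** 2, axis=1))  # close to 1
```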
8.2. Theoretical Performance Bound
The JGD, the GJGD, and the GIGD clearly fall into the class of symbol-by-symbol detectors, as they all try to maximize the global likelihood function w.r.t. individual symbols. Therefore, the symbol-by-symbol MAP detector provides a lower bound on the achievable BER performance in uncoded systems. Here, we use the BCJR algorithm [3] to implement the symbol-by-symbol MAP detector. Certainly, given the BCJR algorithm, no receiver iterations are necessary for uncoded transmission.
8.3. Performance Comparison
Figure 8 displays the BER performance of the three Gaussian detection algorithms. As can be seen, the JGD algorithm achieves a BER performance very close to that of the BCJR algorithm. It shows a trivial error floor at high SNRs due to the inaccuracy of (18) and to feeding back intrinsic information as a priori information. (In an uncoded system, the factor graph for the JGD is cycle-free, cf. Figure 5. Therefore, a self-feedback is enforced at all symbol nodes in order to implement iterative detection; the JGD algorithm for uncoded systems in fact falls into the class of probabilistic data association (PDA) algorithms [16]. Nevertheless, this is neither necessary nor proper in a coded system, since the existence of code nodes enables rigorous extrinsic information exchange.) Compared to the JGD, the GJGD algorithm shows a performance loss of approximately 1 dB at BER = 10^{-4}. Due to the further inaccuracy introduced by (24), the error floor of the GJGD is higher than that of the JGD and is no longer trivial. The performance of the GIGD algorithm is unsatisfactory in this scenario: it shows a significant error floor at BER ≈ 5×10^{-4} due to the coarseness of the approximation given in (32).
Performance of the three Gaussian detectors in an uncoded system, NT=NR=4, L=4, and K=400.
8.4. Complexity Comparison
As a matter of fact, the three Gaussian detection algorithms introduced above do not really differ in the necessary number of iterations. Though they apply different types of approximations, these algorithms never change the number of channel outputs (yn[k]) from which a symbol node can extract information. Consequently, the speed of information aggregation does not change across the three algorithms, and the required number of iterations for a satisfactory BER performance basically stays constant for a fixed system setup. Empirically, given a reasonable burst length, 5 iterations are already sufficient.
For the current system setup, the covariance matrices to be inverted are of order NR(L+1) = 20 in the JGD algorithm, but only of order NR = 4 in the GJGD algorithm. In the GIGD algorithm, matrix inversion is eliminated completely. Revisiting (22), (30), and (44), we find that the complexity of the GJGD is about 25 times lower than that of the JGD, and the complexity of the GIGD is about 16 times lower than that of the GJGD. In total, a complexity reduction by a factor of 400 is achieved by the GIGD algorithm w.r.t. the JGD algorithm. As such a complexity reduction is rather attractive, it is worthwhile to study the error floor behavior of the GIGD algorithm.
8.5. Error Floor of the GIGD
The error floor of the GIGD algorithm is mainly caused by approximating the elements of the effective noise matrix to be mutually independent. As mentioned in Section 3.2, the average correlation coefficient between effective noise samples drops when the product NT(L+1) increases. Therefore, we may expect the error floor of the GIGD to drop when the channel memory length becomes larger or when the system deploys more antennas. To verify our conjecture, we again utilize Monte Carlo simulations.
Figure 9 demonstrates the behavior of the GIGD under different channel memory lengths. Since the complexity of both the BCJR algorithm and the JGD algorithm becomes prohibitive for severely delay-spread MIMO channels, we use the BER of an AWGN channel as an asymptotic performance bound for L approaching infinity. As predicted, the error floor drops as the channel memory length and/or the number of antennas increases. This observation reveals two issues. First, the independent Gaussian approximation (32) benefits from a large number of channel taps. Second, despite its extremely low complexity achieved by a very coarse approximation, the GIGD is able to exploit the diversity provided by additional channel taps or receive antennas. The cross-over at NT = NR = 8 and L = 96 is mainly caused by the zero-padding burst structure. Both at the beginning and at the end of the burst, the channel outputs are composed of few data symbols and many zeros, which degrades the accuracy of the independent Gaussian approximation. This effect is significant at L = 96, given the burst length K = 400. By applying a tail-biting burst structure or a cyclic prefix, this problem can be eliminated, and the resulting performance will be very close to the AWGN bound.
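The asymptotic AWGN reference bound can be evaluated in closed form. A minimal sketch, assuming uncoded BPSK signaling so that the bound reads Pb = Q(sqrt(2 Eb/N0)); the function names are ours:

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def awgn_ber_bpsk(ebn0_db):
    """BER of uncoded BPSK over an AWGN channel: Q(sqrt(2 * Eb/N0))."""
    ebn0 = 10 ** (ebn0_db / 10)  # dB -> linear
    return q_function(math.sqrt(2 * ebn0))

# e.g., at 0 dB the bound is about 7.9e-2; at 8 dB about 1.9e-4
```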
Figure 9: Performance of the GIGD in uncoded systems, K = 400, 5 iterations. (a) NT = NR = 4; (b) NT = NR = 8.
The above results suggest that the GIGD algorithm is very attractive for large systems with severe delay spread. Nevertheless, the GIGD causes a significant error floor when NT(L+1) is not sufficiently large. The question that remains is whether this error floor can be eliminated by means of channel coding.
9. Performance in Coded Systems
In this section, we check the BER performance of the three Gaussian detectors in coded systems.
9.1. Simulation Setup
For simplicity and for an easy derivation of performance bounds, we adopt repetition encoding with scrambling. The scrambling pattern is fixed, that is, every second bit of a code word is flipped. For short data bursts, scrambling is very helpful for the three Gaussian detectors, since they assume that all data symbols have zero mean. Random interleaving is applied after scrambling in order to make neighboring data symbols as independent as possible. Regardless of the coding rate, the number of symbols per burst per antenna is always K = 400. Due to the presence of channel decoding, local iterations in the graph of Gaussian detection are no longer desirable, particularly in the case of the JGD. Hence, each receiver iteration contains the following sequential operations: message updating at observation nodes, message updating at symbol nodes, channel decoding, and message updating at symbol nodes. As the use of different Gaussian approximations does not really change the speed of information aggregation at symbol nodes, in the following we always apply a fixed number of iterations when comparing the performance of the different Gaussian approximations.
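The transmit-side processing described above (repetition encoding, flipping every second code bit, random interleaving) can be sketched as follows; the function names and the interleaver seed are ours, chosen for illustration only.

```python
import random

def repetition_encode(bits, rate_inv):
    """Rate 1/rate_inv repetition encoding: repeat each info bit."""
    return [b for b in bits for _ in range(rate_inv)]

def scramble(code_bits):
    """Fixed scrambling pattern: flip every second code bit, so the
    coded stream is balanced (symbols with zero mean on average)."""
    return [b ^ (k % 2) for k, b in enumerate(code_bits)]

def interleave(code_bits, seed=0):
    """Random interleaving to decorrelate neighboring data symbols.
    Returns the permuted bits and the permutation (for the receiver)."""
    perm = list(range(len(code_bits)))
    random.Random(seed).shuffle(perm)
    return [code_bits[p] for p in perm], perm

info = [1, 0, 1, 1]
coded = repetition_encode(info, 2)   # rate 1/2 -> [1, 1, 0, 0, 1, 1, 1, 1]
scrambled = scramble(coded)          # every second bit flipped
interleaved, perm = interleave(scrambled)
```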
9.2. Performance Comparison
Figure 10(a) illustrates the performance of the three Gaussian detectors in a rate-1/2 repetition encoded system. Surprisingly, all three Gaussian detectors as well as the BCJR algorithm show nearly the same performance at L = 4, despite their huge difference in complexity. A purely theoretical analysis of this phenomenon appears difficult. An empirical explanation is that the strongest detector for an uncoded system is not necessarily the best one for a coded system. From the JGD to the GJGD, and from the GJGD to the GIGD, increasingly coarse approximations are made, which renders the detector outputs less and less accurate. However, this also renders the detector outputs less and less correlated, which is beneficial for the subsequent channel decoder. From Figure 10(a), it seems that the effect of reduced accuracy is partially compensated by the effect of reduced correlation. Figure 10(b) further supports this supposition. In a rate-1/4 repetition encoded system, the performance of the BCJR algorithm is even worse than that of the three Gaussian detectors at L = 4. When the coding rate drops, the strong correlation of the outputs of the BCJR algorithm noticeably degrades the system performance, while the three Gaussian detectors stay robust. Among the four algorithms, the GIGD has a decisively lower complexity, while its BER performance is no worse than that of the others. Therefore, it is the most attractive solution.
Figure 10: Performance comparison in repetition encoded systems, NT = NR = 4, K = 400, 5 iterations. (a) R = 1/2; (b) R = 1/4.
9.3. Error Floor of the GIGD
Figures 10(a) and 10(b) also demonstrate the BER performance of the GIGD in repetition encoded systems with various channel memory lengths. Since a repetition code does not provide any coding gain, the AWGN bound still holds. At R = 1/2, error floors are still present, but they are no longer significant. At R = 1/4, error floors nearly disappear, even for L = 0, that is, for flat-fading channels. The reason for the cross-over at L = 96 and R = 1/2 is again the zero-padding burst structure; nevertheless, this effect is well mitigated by the rate-1/4 repetition code. We may thus conclude that repetition encoding is very helpful in mitigating the estimation errors caused by the independent Gaussian approximation, while these approximation errors do not harm the convergence of the repetition decoder. The asymptotic AWGN bound is quasi-approached at NT = NR = 8 and L = 96. Note that with this system setup, it is practically impossible to run the BCJR algorithm and computationally prohibitive to run the JGD. For systems with short memory lengths, repetition encoding is also truly helpful for the GIGD. It should be mentioned that the AWGN bound is only achievable for systems with very large channel memory lengths, since only then does the instantaneous channel power tend to be constant. Inspecting the performance of the GIGD for small values of L, we may conclude that these curves closely approach their respective bounds as well. Therefore, in repetition encoded systems, the GIGD is applicable to systems with a moderate number of antennas and short channel memory lengths as well.
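The repetition decoder that absorbs these approximation errors is essentially an LLR combiner: after de-interleaving, the detector LLRs are de-scrambled (flipping a bit corresponds to negating its LLR) and the LLRs of the R^-1 copies of each info bit are summed. A minimal sketch under the common LLR sign convention (positive LLR means bit 0 is more likely); the helper names are ours.

```python
def descramble_llrs(llrs):
    """Undo the fixed scrambling (every second code bit flipped) in the
    LLR domain: a flipped bit simply has its LLR negated."""
    return [l if k % 2 == 0 else -l for k, l in enumerate(llrs)]

def repetition_decode(llrs, rate_inv):
    """Sum the LLRs of the rate_inv copies of each info bit, then take
    a hard decision (positive combined LLR -> bit 0)."""
    combined = [sum(llrs[i * rate_inv:(i + 1) * rate_inv])
                for i in range(len(llrs) // rate_inv)]
    return [0 if l > 0 else 1 for l in combined], combined
```

This combining step also shows why a lower coding rate helps: a single unreliable or wrongly signed detector LLR is outvoted by the sum of the remaining copies.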
10. Conclusions and Future Work
In this paper, we revisited and slightly revised the joint Gaussian detection (JGD) algorithm, derived the grouped joint Gaussian detection (GJGD) algorithm, and proposed the graph-based iterative Gaussian detection (GIGD) algorithm. Mathematical derivations as well as a detailed performance analysis were provided. From the JGD to the GJGD and from the GJGD to the GIGD, the computational complexity decreases dramatically. The GIGD algorithm has a linear complexity and provides a promising performance for MIMO channels with severe delay spread. In [17], the combination of the GIGD algorithm with soft channel estimation has been studied.
The channel model adopted in this paper is very specific, in the sense that it presents the biggest challenge for conventional linear equalizers. Using such a channel model effectively exhibits the high potential of the proposed low-complexity Gaussian detection algorithms, particularly the GIGD. Nevertheless, from an engineering standpoint, it would be an interesting topic to test the performance of Gaussian detection with more realistic channel models. Repetition coding is considered in this paper for the sake of easy analysis as well as for its strength in mitigating estimation errors due to approximation. Future work should also target more advanced code structures, particularly concatenations of a repetition code and a sparse-graph code, for example, an LDPC code.
Acknowledgments
The authors would like to thank Shan Jiang and Ying Yu for their efforts on this topic during their master's thesis work. This work has been supported by the German Research Foundation (DFG) under Contract nos. HO 2226/8-1 and HO 2226/10-1.
References
[1] G. D. Forney, Jr., "Maximum-likelihood sequence estimation of digital sequences in the presence of intersymbol interference," IEEE Transactions on Information Theory, vol. 18, no. 3, pp. 363–378, 1972.
[2] G. Ungerboeck, "Adaptive maximum-likelihood receiver for carrier-modulated data-transmission systems," IEEE Transactions on Communications, vol. 22, no. 5, pp. 624–636, 1974.
[3] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Transactions on Information Theory, vol. 20, no. 2, pp. 284–287, 1974.
[4] A. Bahai, B. Saltzberg, and M. Ergen, Multi-Carrier Digital Communications: Theory and Applications of OFDM, Springer, New York, NY, USA, 2004.
[5] X. Wang and H. V. Poor, "Iterative (Turbo) soft interference cancellation and decoding for coded CDMA," IEEE Transactions on Communications, vol. 47, no. 7, pp. 1046–1061, 1999.
[6] S. Liu and Z. Tian, "Near-optimum soft decision equalization for frequency selective MIMO channels," IEEE Transactions on Signal Processing, vol. 52, no. 3, pp. 721–733, 2004.
[7] Y. Jia, C. Andrieu, R. J. Piechocki, and M. Sandell, "Gaussian approximation based mixture reduction for near optimum detection in MIMO systems," IEEE Communications Letters, vol. 9, no. 11, pp. 997–999, 2005.
[8] X. Yuan, K. Wu, and L. Ping, "The jointly Gaussian approach to iterative detection in MIMO systems," in Proceedings of the IEEE International Conference on Communications (ICC '06), Istanbul, Turkey, September 2006.
[9] P. H. Tan and L. K. Rasmussen, "Asymptotically optimal nonlinear MMSE multiuser detection based on multivariate Gaussian approximation," IEEE Transactions on Communications, vol. 54, no. 8, pp. 1427–1438, 2006.
[10] Y. Jia, C. Andrieu, R. J. Piechocki, and M. Sandell, "Gaussian approximation based mixture reduction for joint channel estimation and detection in MIMO systems," IEEE Transactions on Wireless Communications, vol. 6, no. 7, pp. 2384–2389, 2007.
[11] D. Divsalar, M. K. Simon, and D. Raphaeli, "Improved parallel interference cancellation for CDMA," IEEE Transactions on Communications, vol. 46, no. 2, pp. 258–268, 1998.
[12] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, "Factor graphs and the sum-product algorithm," IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 498–519, 2001.
[13] H.-A. Loeliger, "An introduction to factor graphs," IEEE Signal Processing Magazine, vol. 21, no. 1, pp. 28–41, 2004.
[14] T. Wo and P. A. Hoeher, "A simple iterative Gaussian detector for severely delay-spread MIMO channels," in Proceedings of the IEEE International Conference on Communications (ICC '07), Glasgow, Scotland, June 2007.
[15] T. Wo, J. C. Fricke, and P. A. Hoeher, "A graph-based iterative Gaussian detector for frequency-selective MIMO channels," in Proceedings of the IEEE Information Theory Workshop (ITW '06), Chengdu, China, October 2006.
[16] J. C. Fricke, M. Sandell, J. Mietzner, and P. A. Hoeher, "Impact of the Gaussian approximation on the performance of the probabilistic data association MIMO decoder," EURASIP Journal on Wireless Communications and Networking, vol. 2005, no. 5, pp. 796–800, 2005.
[17] T. Wo, C. Liu, and P. A. Hoeher, "Graph-based soft channel and data estimation for MIMO systems with asymmetric LDPC codes," in Proceedings of the IEEE International Conference on Communications (ICC '08), Beijing, China, May 2008, pp. 620–624.