Detector Design and Complexity Analysis for Cooperative Communications in Intersymbol Interference Fading Channels

Cooperative communications has recently attracted considerable research interest because of the potential benefit of increased spatial diversity. In this paper, we study detector design for cooperative communications in intersymbol interference (ISI) channels. A novel system framework employing nonorthogonal amplify-and-forward half-duplex relays through ISI channels is introduced. We focus on detector design and first consider an optimal detector that consists of a whitening filter and a maximum-likelihood sequence estimator (MLSE). However, such an optimal detector, implemented with the Viterbi algorithm, suffers from high complexity when the relay period is long. Consequently, we adopt a multitrellis Viterbi algorithm (MVA) that reduces the complexity significantly while still achieving near-optimal performance. Simulation results demonstrate the performance of both the optimal and near-optimal detector designs.


Introduction
Multipath fading is one of the major impairments to the performance of wideband wireless communication systems [1]. Historically, such fading has been combated by using time and frequency diversity techniques. In the last fifteen years, results in information theory have shown that spatial diversity can yield significant gains in the spectral efficiency and power efficiency of point-to-point multiple-antenna communication (MIMO) systems [2,3]. The transition from theory to practice has largely taken place, as many modern consumer wireless standards exploit MIMO technology. To realize such gains, however, it is necessary that the paths between transmit and receive antennas are uncorrelated. For such an assumption to be valid, it is typically required that the antenna elements are spaced at least a half carrier wavelength apart [4], and perhaps even more in environments with minimal scattering. In many scenarios, however, it may not be practical for size-constrained nodes to have even two antennas with sufficient spacing between them. Furthermore, for each antenna that is added to the node, a complete RF front end must be added. There may be cost and power constraints that preclude the inclusion of multiple antennas, as the RF portion of a communications system often accounts for the majority of the cost and power.
More recently, cooperative diversity [5,6] and relay networks [7] have attracted a lot of attention for their ability to exploit the increased spatial diversity available at distributed antennas on other nodes in the system. By intelligent cooperation among nodes in the network which may only have a single antenna, a virtual multiple-antenna system can be formed. Indeed, information-theoretic results demonstrate that some of the loss associated with using only a single antenna can be recuperated by using intelligent cooperation among distributed nodes [8,9].
While communication via cooperative relays has seen a lot of active research interest in recent years, most of the existing work has come from the information theory and coding communities. While there are a few exceptions, for example [10,11], little research has yet been conducted into the implementation issues of relaying and cooperation. As such, the majority of works in the field of cooperative diversity assume that receivers employ optimal detectors.
In this paper, we set out to investigate detector design for half-duplex relays in frequency-selective fading channels encountered in practice. While a variety of forwarding protocols have been previously proposed, we will consider amplify-and-forward (AF) for its simplicity and reduced implementation costs. Frequency-selective fading channels are an inevitable impairment in wideband communication systems [1], and such channels cause the receiver to observe the superposition of multiple delayed reflections of the transmitted signal, resulting in intersymbol interference (ISI). Even in channels which do not exhibit significant time dispersion, the nonorthogonal AF relay itself effectively introduces ISI since the destination observes a superposition of the source and relayed signals. As our focus is on the complexity of the detector itself, we do not treat the performance gains possible with relays, as this has been demonstrated elsewhere [12]. Similarly, with our focus on the complexity of detector implementation, we do not address the problem of channel estimation and thus consider the optimistic case where the detector has perfect channel knowledge. After developing a system model for the case of AF relays in ISI channels, we will present an optimal detector realization based on the Viterbi algorithm (VA), and we will address its implementation complexity. Then we will show that the delay introduced by the half-duplex AF relay causes the effective channel impulse response to become sparse when the relay period is large. To exploit the inherent sparsity in the effective channel, a multitrellis Viterbi algorithm (MVA) [13] is adopted, which results in much lower complexity but negligible performance loss. Finally, we conclude with numerical simulations demonstrating the performance of the detector in slow standardized fading channels.

Cooperative Communication System Model
A basic three-node relay system model is shown in Figure 1. Both source and relay can be considered as mobile users, and each has only one antenna. The source is attempting to send a message to the destination. Due to the broadcast nature of wireless communications, however, the relay receives transmissions from the source that are intended for the destination; therefore, the relay can assist by forwarding additional copies of these transmissions to the destination. Since the channels from source and relay to destination are statistically independent, the three-node cooperative communication scheme effectively provides spatial diversity. We consider a system where a source transmits a continuous stream of data to a destination, and a simplistic AF relay assists the source by amplifying and forwarding the data to the destination. We do not assume the relay has performed any synchronization with the destination, and so the relay forwards information to the destination in an open-loop fashion.
Relays are mainly of two types: full-duplex relays that can transmit and receive simultaneously, and half-duplex relays that can either transmit or receive in any time slot. Since full-duplex relays are difficult to implement due to the self-interference which occurs when both transmit and receive operations are in the same band, half-duplex operation is considered more practical for cooperative communication systems. In our system model, the half-duplex relay period T is a parameter which defines the frame structure where the relay receives for T symbol periods and then transmits for T symbol periods. The relay repeats these two tasks alternately. The source and relay are assumed to transmit on the same channel, employing the so-called nonorthogonal amplify-and-forward (NAF) protocol [12].
The source sends the symbols $\mathbf{x} \in \mathbb{C}^{N}$ at a symbol rate of $f$, where $N$ is the number of transmitted symbols. We assume that a square-root raised cosine (SRRC) filter with the impulse response $h_{\mathrm{Tx}}(t)$ is applied as the transmitter filter. The received signal is processed by a matched filter with the impulse response $h_{\mathrm{Rx}}(t) = h_{\mathrm{Tx}}(-t)$ and then sampled at the symbol rate $f$. The equivalent discrete-time channel impulse responses [14], which include the effect of pulse shaping, are denoted by $\mathbf{h}_{sd}$, $\mathbf{h}_{sr}$, and $\mathbf{h}_{rd}$ for the source-destination, source-relay, and relay-destination channels, respectively, and they have corresponding channel lengths $L_{sd}$, $L_{sr}$, and $L_{rd}$ (e.g., $\mathbf{h}_{sd} = [h_{sd}[0], h_{sd}[1], \ldots, h_{sd}[L_{sd}-1]]^T$). The signals $\mathbf{w}_r$ and $\mathbf{w}_d$ are complex additive white Gaussian noise (AWGN) at the relay and the destination with variances $\sigma_r^2$ and $\sigma_d^2$, respectively. The destination receives the superposition of the two signals from the source and the relay, and the received signal can be expressed as

$$\mathbf{y} = \mathbf{y}_{sd} + \mathbf{y}_{rd} + \mathbf{w}_d, \tag{1}$$

where $\mathbf{y}_{sd} \in \mathbb{C}^{N+L_{sd}-1}$ is the contribution from the source and $\mathbf{y}_{rd} \in \mathbb{C}^{N+L_{sr}+L_{rd}-2}$ is the contribution from the relay. We first consider the source-destination link. The contribution from source to destination is written as

$$\mathbf{y}_{sd} = \mathbf{H}_{sd}\mathbf{x}, \tag{2}$$

where $\mathbf{H}_{sd} \in \mathbb{C}^{(N+L_{sd}-1)\times N}$ is the complex Toeplitz channel convolution matrix whose entries are defined by

$$[\mathbf{H}_{sd}]_{i,j} = \begin{cases} h_{sd}[i-j], & 0 \le i-j \le L_{sd}-1, \\ 0, & \text{otherwise}, \end{cases}$$

where $0 \le i \le N+L_{sd}-2$ and $0 \le j \le N-1$.

For the source-relay-destination link, the corresponding contribution is given by

$$\mathbf{y}_{rd} = \mathbf{H}_{rd}\mathbf{x}_r, \qquad \mathbf{x}_r = \boldsymbol{\Gamma}\mathbf{y}_r, \qquad \mathbf{y}_r = \mathbf{H}_{sr}\mathbf{x} + \mathbf{w}_r. \tag{5}$$

The Toeplitz channel matrices $\mathbf{H}_{rd} \in \mathbb{C}^{(N+L_{sr}+L_{rd}-2)\times(N+L_{sr}-1)}$ and $\mathbf{H}_{sr} \in \mathbb{C}^{(N+L_{sr}-1)\times N}$ are defined in the same way as $\mathbf{H}_{sd}$, $\mathbf{y}_r \in \mathbb{C}^{N+L_{sr}-1}$ is the signal received by the relay, $\mathbf{x}_r \in \mathbb{C}^{N+L_{sr}-1}$ is the signal transmitted from the relay, and $\boldsymbol{\Gamma} \in \mathbb{C}^{(N+L_{sr}-1)\times(N+L_{sr}-1)}$ is a fixed matrix described below. Note that for the matrix dimensions to be compatible, we require that $L_{sd} = L_{sr} + L_{rd} - 1$; if this is not satisfied, we can append zeros to the appropriate matrix without loss of generality.
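As an illustration of the Toeplitz convolution matrices used in this model, the following minimal Python sketch builds such a matrix from a tap vector and applies it to a short symbol block. The function names `convolution_matrix` and `matvec`, as well as the tap and symbol values, are hypothetical choices made for this example and are not taken from the paper.

```python
def convolution_matrix(h, n_symbols):
    """Build the (n_symbols + len(h) - 1) x n_symbols Toeplitz matrix of h."""
    n_rows = n_symbols + len(h) - 1
    H = [[0j] * n_symbols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_symbols):
            if 0 <= i - j < len(h):
                H[i][j] = complex(h[i - j])  # entry depends only on i - j
    return H


def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# Hypothetical example values (not from the paper).
h_sd = [1.0, 0.5, 0.25]   # L_sd = 3 channel taps
x = [1, -1, 1, 1]         # N = 4 BPSK symbols

H_sd = convolution_matrix(h_sd, len(x))
y_sd = matvec(H_sd, x)    # noiseless contribution, length N + L_sd - 1 = 6
```

Multiplying by the matrix is equivalent to convolving the symbol block with the taps, which is why the output is longer than the input by the channel memory.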
The function of $\boldsymbol{\Gamma}$ is to impose the half-duplex constraint by selecting groups of $T$ symbols from $\mathbf{y}_r$ (receiving), scaling these symbols by a factor $\beta$ (amplifying), and then delaying the scaled symbols for transmission in the next $T$-symbol block (forwarding). The value of $\beta$ is typically chosen to satisfy an average power constraint at the relay and depends on $P_s$ and $P_r$, the source power and relay power, respectively. The constant matrix $\boldsymbol{\Gamma}$ is given by

$$\boldsymbol{\Gamma} = \mathbf{I}_{(N+L_{sr}-1)/(2T)} \otimes \begin{bmatrix} \mathbf{0}_T & \mathbf{0}_T \\ \beta\mathbf{I}_T & \mathbf{0}_T \end{bmatrix},$$

where $\otimes$ denotes the Kronecker product, $\mathbf{I}_k$ is the $k \times k$ identity matrix, and $\mathbf{0}_T$ is the $T \times T$ zero matrix. Here we implicitly require that $N + L_{sr} - 1$ be divisible by $2T$. As an example, when $T = 2$, the signals received and transmitted by the relay are shown in Figure 2, where the first eight time periods are considered, and the shading indicates time periods where the relay cannot receive because it is transmitting, or vice versa.
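The receive, amplify, and delay behavior described above can be sketched in a few lines of Python. This is one reading of the verbal description (each frame of 2T slots forwards its first T received samples in its last T slots); the function name `relay_matrix`, the gain value, and the sample values are hypothetical.

```python
def relay_matrix(n, T, beta):
    """n x n half-duplex relay matrix; n must be a multiple of 2*T."""
    if n % (2 * T) != 0:
        raise ValueError("n must be divisible by 2*T")
    gamma = [[0.0] * n for _ in range(n)]
    for frame in range(n // (2 * T)):
        start = frame * 2 * T
        for k in range(T):
            # The sample received in slot start+k is sent T slots later.
            gamma[start + T + k][start + k] = beta
    return gamma


def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


T = 2
beta = 0.5                       # hypothetical relay gain
y_r = [1, 2, 3, 4, 5, 6, 7, 8]   # hypothetical samples at the relay input
x_r = matvec(relay_matrix(len(y_r), T, beta), y_r)
```

With T = 2, only the samples in slots 0, 1, 4, and 5 are forwarded, and the samples that arrive while the relay is transmitting are discarded.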
From (1), (2), and (5), the received signal at the destination is expressed as

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{w}, \tag{8}$$

where $\mathbf{H} = \mathbf{H}_{sd} + \mathbf{H}_{rd}\boldsymbol{\Gamma}\mathbf{H}_{sr}$ and $\mathbf{w} = \mathbf{H}_{rd}\boldsymbol{\Gamma}\mathbf{w}_r + \mathbf{w}_d$. Note that $\mathbf{w}$ is colored, not white, since the AWGN on the source-relay link is amplified and forwarded over the relay-destination ISI channel, which colors the noise. Additionally, from (6) and (7), we see that the relay matrix $\boldsymbol{\Gamma}$ has a repetitive structure with a period of $2T$. Accordingly, the channel matrix $\mathbf{H}$ shows the same structure as $\boldsymbol{\Gamma}$. Consequently, $\mathbf{H}$ can be interpreted as a periodically time-varying FIR channel which consists of $2T$ sets of different channel coefficients. In summary, (8) allows us to describe the input-output behavior of the system with a linear equation. While the constituent channels themselves are not time-varying, the effective impulse response of the overall system is indeed time-varying due to the on/off behavior of the relay.
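To make the periodic structure concrete, the sketch below assembles the composite channel as the sum of the direct path and the relayed path (source-relay channel, relay matrix, relay-destination channel) and lets one check that its entries repeat along the diagonals with period 2T. All function names and numeric values are hypothetical, and the relay matrix follows the same assumed frame structure as the description of Γ (receive for T slots, then forward for T slots).

```python
def convolution_matrix(h, n_cols):
    """Toeplitz convolution matrix with n_cols + len(h) - 1 rows."""
    rows = n_cols + len(h) - 1
    return [[complex(h[i - j]) if 0 <= i - j < len(h) else 0j
             for j in range(n_cols)] for i in range(rows)]


def relay_matrix(n, T, beta):
    """Forward the first T samples of every 2T-slot frame, T slots later."""
    return [[beta if (i - j == T and j % (2 * T) < T) else 0.0
             for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Plain matrix product for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def effective_channel(h_sd, h_sr, h_rd, N, T, beta):
    """H = H_sd + H_rd * Gamma * H_sr, assuming L_sd = L_sr + L_rd - 1."""
    H_sd = convolution_matrix(h_sd, N)
    H_sr = convolution_matrix(h_sr, N)
    n_relay = N + len(h_sr) - 1
    H_rd = convolution_matrix(h_rd, n_relay)
    relayed = matmul(H_rd, matmul(relay_matrix(n_relay, T, beta), H_sr))
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(H_sd, relayed)]


# Hypothetical parameters chosen so that N + L_sr - 1 is a multiple of 2T.
N, T = 8, 2
H = effective_channel(h_sd=[1.0, 0.3], h_sr=[0.9], h_rd=[0.7, 0.2],
                      N=N, T=T, beta=0.5)
```

In this example the entries `H[2][0]` and `H[4][2]` lie on the same diagonal but differ, because only the first symbol falls in a receive slot of the relay; shifting both indices by 2T, however, reproduces the same entry.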

Maximum-Likelihood Detector
Assuming that receivers can acquire perfect channel knowledge, maximum-likelihood sequence estimation (MLSE) can be employed to combat the ISI by searching for the transmitted sequence at minimum Euclidean distance from the observed signal [15]. The Viterbi algorithm [16] is an efficient technique for solving the minimum distance problem, and its implementation has been investigated extensively [17][18][19][20][21]. The traditional Viterbi algorithm as proposed in [16] is directly applicable only to time-invariant channels. A modified Viterbi detector is proposed to address the periodically time-varying effective channel induced by the half-duplex relay. Furthermore, since the minimum Euclidean distance criterion is not optimal in the presence of colored Gaussian noise, we employ a whitening filter before detection, which is optimal as shown in [15]. The block diagram of our design is given in Figure 3.
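To whiten the noise, spectral factorization of the composite noise covariance must be performed. We factor the noise covariance matrix as

$$E[\mathbf{w}\mathbf{w}^H] = \sigma_d^2\mathbf{I} + \sigma_r^2\mathbf{H}_{rd}\boldsymbol{\Gamma}\boldsymbol{\Gamma}^H\mathbf{H}_{rd}^H = \mathbf{G}\mathbf{G}^H, \tag{9}$$

which can be accomplished by taking $\mathbf{G}$ to be the Cholesky factor of the covariance. We note that the Cholesky factor is not the only valid choice of $\mathbf{G}$, as the factorization in (9) is not unique. By filtering the received signal with $\mathbf{G}^{-1}$ (i.e., by forming $\mathbf{G}^{-1}\mathbf{y}$), the noise becomes white, since the covariance of the filtered noise $\mathbf{G}^{-1}\mathbf{w}$ is given by

$$E[\mathbf{G}^{-1}\mathbf{w}\mathbf{w}^H\mathbf{G}^{-H}] = \mathbf{G}^{-1}\mathbf{G}\mathbf{G}^H\mathbf{G}^{-H} = \mathbf{I}. \tag{10}$$

We note that the noise covariance matrix in (9) is positive definite, so the inverse of $\mathbf{G}$ always exists. Ignoring end effects (or, equivalently, taking the block length $N \to \infty$), $\mathbf{G}^{-1}$ follows the same repetitive structure as $\boldsymbol{\Gamma}$ and thus also exhibits the periodically time-varying property.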
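A minimal sketch of the whitening step, assuming the Cholesky choice of the factor G: the code below factors a small Hermitian positive-definite covariance and applies the inverse factor by forward substitution. The matrix `B` is a hypothetical stand-in for the product of the relay-destination channel matrix and the relay matrix, and the function names and variance values are illustrative only.

```python
import math


def cholesky(C):
    """Lower-triangular G with G G^H = C for Hermitian positive-definite C."""
    n = len(C)
    G = [[0j] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(G[i][k] * G[j][k].conjugate() for k in range(j))
            if i == j:
                G[i][i] = complex(math.sqrt((C[i][i] - acc).real))
            else:
                G[i][j] = (C[i][j] - acc) / G[j][j]  # G[j][j] is real
    return G


def solve_lower(G, b):
    """Forward substitution: return z with G z = b, i.e. z = G^{-1} b."""
    z = []
    for i in range(len(b)):
        acc = sum(G[i][k] * z[k] for k in range(i))
        z.append((b[i] - acc) / G[i][i])
    return z


def noise_covariance(B, var_d, var_r):
    """var_d * I + var_r * B B^H for a matrix B stored as a list of rows."""
    n = len(B)
    return [[var_r * sum(B[i][k] * B[j][k].conjugate()
                         for k in range(len(B[0])))
             + (var_d if i == j else 0.0)
             for j in range(n)] for i in range(n)]


# Hypothetical 3 x 3 stand-in for H_rd * Gamma (not from the paper).
B = [[0j, 0j, 0j],
     [0.8 + 0.2j, 0j, 0j],
     [0.3 - 0.1j, 0.8 + 0.2j, 0j]]
C = noise_covariance(B, var_d=1.0, var_r=0.5)
G = cholesky(C)
# Whitening a received block y would then be: y_g = solve_lower(G, y)
```

Forward substitution avoids forming the inverse explicitly; for long blocks a practical receiver would more likely exploit the periodic structure and use a truncated filter, as the simulation section does.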
After applying the whitening filter to the received signal (8), the whitening filter output becomes

$$\mathbf{y}_g = \mathbf{G}^{-1}\mathbf{y} = \mathbf{H}_{\mathrm{eff}}\mathbf{x} + \mathbf{w}_{\mathrm{eff}}, \tag{11}$$

where $\mathbf{H}_{\mathrm{eff}} = \mathbf{G}^{-1}\mathbf{H}$ and $\mathbf{w}_{\mathrm{eff}} = \mathbf{G}^{-1}\mathbf{w}$ is now white Gaussian noise. Note that the effective whitened channel matrix $\mathbf{H}_{\mathrm{eff}}$ maintains the periodically time-varying property due to the similar structures of $\mathbf{G}^{-1}$ and $\mathbf{H}$. Ignoring end effects, the effective channel matrix has a block Toeplitz structure with rows repeating every multiple of $2T$: row $n$ holds the coefficients $[\mathbf{H}_{\mathrm{eff}}]_{n,n-k} = h_{\mathrm{mod}(n,2T)}[k]$ for $0 \le k \le L-1$ and zeros elsewhere. It defines $2T$ sets of effective channel coefficients $\mathbf{h}_0, \mathbf{h}_1, \ldots, \mathbf{h}_{2T-1}$, where $\mathbf{h}_i = [h_i[0], h_i[1], \ldots, h_i[L-1]]^T$ and $L$ is the effective channel length. The effective length $L$ may be significantly extended by the delay introduced by the relay, as well as the group delay introduced by the whitening filter. We can determine the lower bound of the effective channel length in terms of the constituent channel lengths and the relay period as

$$L \ge \max(L_{sd},\; L_{sr} + L_{rd} + T - 1). \tag{13}$$

At time $n$, the corresponding coefficients of the periodically time-varying effective channel are $\mathbf{h}[n] = \mathbf{h}_{\mathrm{mod}(n,2T)}$, where $\mathrm{mod}(\cdot)$ is the modulo operation. The ideal output of the whitening filter at time $n$ is then given by

$$s[n] = \sum_{k=0}^{L-1} h_{\mathrm{mod}(n,2T)}[k]\, x[n-k],$$

which is simply the convolution of the source symbols with the periodically time-varying channel coefficients. Thus, the system model for relay-aided transmission through ISI channels reduces to a classical MLSE problem, with the additional twist that the effective channel is periodically time-varying. As the true output is of course corrupted by AWGN, the maximum-likelihood detector for estimating the source symbols $\mathbf{x}$ from (11) can be implemented most efficiently with the Viterbi algorithm.
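The convolution with periodically time-varying coefficients can be sketched as follows. The function name `ptv_output`, the convention that symbols before time 0 are zero, and the tap values are assumptions made for this illustration.

```python
def ptv_output(x, taps, T):
    """Noiseless output s[n] = sum_k h_{n mod 2T}[k] * x[n - k].

    `taps` holds the 2T coefficient sets h_0, ..., h_{2T-1};
    symbols before time 0 are taken to be zero.
    """
    if len(taps) != 2 * T:
        raise ValueError("expected 2*T coefficient sets")
    s = []
    for n in range(len(x)):
        h = taps[n % (2 * T)]  # coefficient set in force at time n
        s.append(sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0))
    return s


# Hypothetical coefficient sets h_0 and h_1 for T = 1 (period 2T = 2).
taps = [[1.0, 0.5], [2.0, -1.0]]
x = [1, -1, 1]
s = ptv_output(x, taps, T=1)
```

The only difference from an ordinary FIR convolution is the lookup `taps[n % (2 * T)]`, which selects a different impulse response at each position within the relay frame.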
For the branch metric unit (BMU) in Figure 3, the branch metrics along the trellis path depend not only on the state transitions but also on the current time instant. The branch metric calculation is modified as

$$\lambda[n] = \left| y_g[n] - s[n] \right|^2,$$

where $y_g[n]$ is the signal from the whitening filter at time instant $n$, and $s[n]$ and $\lambda[n]$ are the estimated output signal, computed with the coefficients $\mathbf{h}_{\mathrm{mod}(n,2T)}$, and the corresponding branch metric at time instant $n$, respectively. The add-compare-select (ACS) unit in Figure 3 recursively computes path metrics and decision bits,

$$\Lambda^{(j)}[n] = \min_i \left\{ \Lambda^{(i)}[n-1] + \lambda^{(i,j)}[n] \right\}, \quad n = 0, 1, \ldots, \tag{17}$$

where $\Lambda^{(j)}[n]$ denotes the path metric at state $j$ at time instant $n$, and $\lambda^{(i,j)}[n]$ is the branch metric from state $i$ to state $j$ at instant $n$. The path metrics for each state are updated for the next iteration, and the decision indicating the survivor path for state $j$ is recorded in, and later retrieved from, the survivor-path memory unit (SMU) in order to estimate the transmitted symbols along the final survivor path. Similar to the traditional MLSE, the implementation cost of the ML detector for relay networks increases exponentially with respect to the effective channel length. Furthermore, the overhead of the proposed detector when compared with the traditional MLSE comprises the whitening filter, the multiplexers in the BMU, and additional control logic to account for the periodically time-varying effective channel [22]. As indicated in (13), the effective channel length increases with the relay period $T$. In cooperative relay systems, the relay period $T$ is likely chosen to be very long, possibly spanning hundreds of symbols, so that the relay is not required to switch frequently between transmit and receive modes. When the relay period $T$ is large, however, an implementation of the Viterbi-algorithm-based optimal detector becomes impractical. This problem will be addressed in Section 4.
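The following sketch shows one simple software form of a Viterbi detector whose branch metrics depend on the time instant. It is not the hardware architecture of Figure 3: instead of a survivor-path memory with traceback, each survivor simply carries its full symbol history. The function names, the zero initial state, and all numeric values are assumptions made for this example.

```python
def ptv_output(x, taps, T):
    """Noiseless output of the periodically time-varying channel."""
    return [sum(taps[n % (2 * T)][k] * x[n - k]
                for k in range(len(taps[0])) if n - k >= 0)
            for n in range(len(x))]


def ptv_viterbi(y, taps, T, alphabet=(-1, 1)):
    """Viterbi MLSE over a periodically time-varying ISI channel.

    A state holds the L-1 most recent symbols (newest first); symbols
    before time 0 are taken to be zero. All coefficient sets in `taps`
    are assumed to have the same length L.
    """
    L = len(taps[0])
    survivors = {(0,) * (L - 1): (0.0, [])}  # state -> (metric, symbols)
    for n, y_n in enumerate(y):
        h = taps[n % (2 * T)]                # coefficients at time n
        updated = {}
        for state, (metric, path) in survivors.items():
            for a in alphabet:
                s = h[0] * a + sum(h[k] * state[k - 1] for k in range(1, L))
                cand = metric + abs(y_n - s) ** 2                 # add
                nxt = ((a,) + state)[:L - 1]
                if nxt not in updated or cand < updated[nxt][0]:  # compare
                    updated[nxt] = (cand, path + [a])             # select
        survivors = updated
    return min(survivors.values(), key=lambda mp: mp[0])[1]


# Hypothetical coefficient sets for T = 1 with L = 3 taps each.
T = 1
taps = [[1.0, 0.4, 0.0], [0.8, 0.0, -0.3]]
x = [1, -1, -1, 1, 1, -1, 1, 1]
y = ptv_output(x, taps, T)
detected = ptv_viterbi(y, taps, T)
```

The number of survivors is bounded by the alphabet size raised to the power L − 1, which is the exponential growth in the effective channel length that motivates the reduced-complexity detector.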
the sparse channel have equispaced coefficients, however, which usually cannot be satisfied in practice. Although a generalized PTVA is given to deal with general sparse channels, its performance loss is significant if the channel taps are not well approximated by an equispaced structure.
A multitrellis Viterbi algorithm (MVA) was proposed in [13] for near-optimal detection in sparse ISI channels. For the MVA, the complexity does not depend on the channel impulse response length but only on the number of nonzero coefficients. In order to process the sparse time-varying channels for relay networks, the MVA is modified and incorporated in our MVA-based ML detector.
To illustrate the operation of the MVA, we begin by considering an example. Assume $\mathbf{h}_n \in \{\mathbf{h}_0, \mathbf{h}_1, \ldots, \mathbf{h}_{2T-1}\}$ has only a few nonzero coefficients, for example, $h_n[i] \neq 0$ only for $i = 0, K, L-1$ $(0 < K < L-1)$. The ideal (noiseless) output signal at time $n$ is given by

$$s[n] = h_n[0]\,x[n] + h_n[K]\,x[n-K] + h_n[L-1]\,x[n-L+1].$$

When x[0] is to be estimated, we can see that

and it is also needed in s[L − 1] and s[2K].
In this way, we record all the output signals and symbols related to $x[0]$ in Table 1, and we use the notation $f(\cdot)$ to indicate the dependency of outputs on input symbols and channels. Note that some output signals and symbols are not needed when $x[0]$ is under detection; for example, if $K \neq 1$, there is no need to record $s[1]$, since $x[1]$, $x[1-K]$, and $x[2-L]$ in $s[1]$ do not affect the estimation of $x[0]$. With the traceback length $L_{tb} = 3(L-1)$, for example, the estimation of by a noninstantaneous relationship, assuming that $x[n]$, $n < 0$, are known.
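The first level of this dependency bookkeeping can be sketched directly from the three-tap example: an output at time n involves the symbols at n, n − K, and n − L + 1. The code below only lists these direct dependencies and does not reproduce Table 1 or the full MVA rules of [13]; the function names and the values of K and L are hypothetical.

```python
def symbols_in_output(n, tap_positions):
    """Indices m such that x[m] appears in the noiseless output s[n]."""
    return [n - d for d in tap_positions]


def outputs_containing(m, tap_positions):
    """Time instants n whose output s[n] depends on x[m]."""
    return sorted(m + d for d in tap_positions)


# Hypothetical sparse channel with nonzero taps at 0, K, and L - 1.
K, L = 2, 6
tap_positions = (0, K, L - 1)

related_outputs = outputs_containing(0, tap_positions)  # outputs using x[0]
symbols_in_s1 = symbols_in_output(1, tap_positions)     # x[1], x[1-K], x[2-L]
```

With K = 2 and L = 6, the symbol x[0] appears only in the outputs at instants 0, K, and L − 1, while the output at instant 1 contains none of the symbols that matter for x[0], which is why it need not be recorded.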
We note that some related symbols appear only once in the first column, for example, $x[L-1-K]$, and do not need to be recorded. Its value can be determined by an instant decision based on the output at instant $L-1$, in which $x[L-1]$ and $x[0]$ are known for a given state. When two or more symbols are determined by the instant decision, for example, $x[3K]$ and $x[3K-L+1]$ in $s[3K]$, they are estimated together from that output, where $x[2K]$ is known for a given state.
The definition of state depends only on the related time instant. From the list of related symbols, the state definition is derived and given in Table 2. Note that the state definition excludes the symbols assumed to be known, that is, $x[n]$ for $n < 0$, and the symbols that can be determined by the instant decision.
From Tables 1 and 2, it is observed that the corresponding trellis shrinks in two dimensions, which leads to a significant reduction in computational complexity. Furthermore, the traceback process is faster since, for some instants, the previous state can be obtained immediately from the current state without survivor path decisions. There are two categories of such instant tracebacks. In the first, the previous state definition is a subset of the current state definition. For example, the state at the instant $L-1$ is contained in the state defined at instant $2K$, so it can be obtained from the current state without the help of the survivor path decision. In the second, the state definition is the same for the current and previous states. For example, considering the instants $K+2L-1$, $2K+L-1$, $3K$, and $2L-2$, we can bypass the traceback from $K+2L-2$ to $2L-2$. Once the start state at the instant $K+2L-2$ is available, we can begin to trace back from the instant $2L-2$ at the same state.
When subsequent symbols are under detection, the structure of the trellis remains the same, except that the branch metric calculations for the first several instants are slightly different, since the initial symbols have already been estimated. The available estimated symbols are used in the calculation of the output signal $s[n]$ when needed.
Due to the reduced-size trellis, the detector can be realized by utilizing $L_{tb}$ trellises working in parallel to increase the throughput. The received signals $y_g[n]$ are filled into the $L_{tb}$ trellises sequentially. At the instant $L_{tb}-1$, the first trellis is full, and $x[0]$ is estimated. At the instant $L_{tb}$, the received signal $y_g[L_{tb}]$ is ready to fill the first trellis, and $x[1]$ is available from the second trellis. Notice that these trellises are similar in structure; however, the channel coefficients used in the branch metric calculation are not the same at different instants. For example, in the first trellis, $\mathbf{h}_0, \mathbf{h}_K, \mathbf{h}_{L-1}, \ldots, \mathbf{h}_{K+2L-2}$ are used in sequence for each step, while in the second trellis, $\mathbf{h}_1, \mathbf{h}_{K+1}, \mathbf{h}_L, \ldots, \mathbf{h}_{K+2L-1}$ are used for each step. The corresponding channel coefficients for the received signals $y_g[n]$ are $\mathbf{h}_{\mathrm{mod}(n,2T)} \in \{\mathbf{h}_0, \mathbf{h}_1, \ldots, \mathbf{h}_{2T-1}\}$.
For general Viterbi detectors in $M$-ary modulation systems, complex multiplications dominate the computational cost. There are $M^{L-1}$ states, and each state requires $M$ complex multiplications for branch metric calculation, so the estimated total computational cost is $M^L$. For the proposed MVA detector in a sparse channel with 3 nonzero coefficients, there are at most $M^3$ states, and the corresponding computational cost for each trellis is $M^4$. Thus the estimated total computational cost for the MVA detector is $L_{tb} M^4$. Furthermore, survivor path decisions are recorded in the SMU, which requires a significant amount of memory. The memory cost in bits for the general Viterbi detector is $L_{tb} M^{L-1} \log_2 M$. For the proposed MVA detector, we do not need to record the survivor path decisions for all $L_{tb}$ instants. Assuming that only $L'_{tb}$ $(< L_{tb})$ instants are considered in each trellis, the memory size for each trellis is, by the same counting, at most $L'_{tb} M^3 \log_2 M$ bits. In general, the complexity of the Viterbi detector is $O(M^L)$, where $L$ is the channel length. In comparison, the complexity of the MVA detector does not depend on the channel length but on the number of nonzero coefficients. When instantaneous decisions are made appropriately so that the symbol dependency table simplifies [13], the computing and memory resources required for the MVA detector are $O(M^{L_{nz}})$, where $L_{nz}$ is the number of nonzero coefficients. Therefore, the MVA detector is a better solution for sparse channels.
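To give a feel for the gap between the two cost estimates, the sketch below evaluates the counts quoted in the text for the general Viterbi detector and for the three-nonzero-tap MVA example. The function names and the chosen values of M, L, and the traceback length are hypothetical; the formulas are only the rough operation counts stated above, not measured hardware costs.

```python
import math


def viterbi_multiplications(M, L):
    """M**(L-1) states, each with M branch metric multiplications."""
    return M ** (L - 1) * M


def viterbi_memory_bits(M, L, L_tb):
    """Survivor memory: L_tb instants x M**(L-1) states x log2(M) bits."""
    return L_tb * M ** (L - 1) * math.log2(M)


def mva_multiplications_3tap(M, L_tb):
    """L_tb parallel trellises, each costing at most M**4 multiplications."""
    return L_tb * M ** 4


# Hypothetical scenario: BPSK and a long effective channel.
M, L = 2, 102
L_tb = 3 * (L - 1)                               # traceback length 3(L - 1)

full_cost = viterbi_multiplications(M, L)        # grows as M**L
mva_cost = mva_multiplications_3tap(M, L_tb)     # grows linearly in L_tb
```

For these illustrative numbers the full trellis needs on the order of 2^102 multiplications per step, while the three-tap MVA estimate is a few thousand, which shows why the full trellis is impractical for long relay periods.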

Simulation Results
We simulate the bit error rate (BER) performance of the proposed ML detector with the following parameters: the transmitted BPSK signal consists of i.i.d. unit-power symbols $x[n] \in \{\pm 1\}$, and $E_b/N_0^{(d)}$ and $E_b/N_0^{(r)}$ denote the bit energy-to-noise ratio for the destination and the relay, respectively. Unless otherwise specified, we assume $E_b/N_0^{(r)} = E_b/N_0^{(d)} + 10$ dB, which represents a scenario where the source-relay link is better, on average, than the source-destination link.
We assume the SRRC pulse shaping filter is employed at both the transmitting and receiving ends. The SRRC filter is truncated to $[-2/f, 2/f]$ with a roll-off factor of 0.5, where $f = 5$ MHz is the symbol rate. We simulate over 200 fading realizations in the ITU-R 3G indoor office test environment [24] with 6 independent channel paths. The time delays relative to the first path are [0, 50, 110, 170, 290, 310] ns, and the average powers relative to the strongest path are [0, −3, −10, −18, −26, −32] dB. As a practical matter, it is possible for the whitening filter to be quite long, depending on the noise covariance matrix. In these simulations, we truncate the whitening filter to have $L = \max(L_{sd}, L_{sr} + L_{rd} + T - 1)$ taps.
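As a small worked example of the channel parameters quoted above, the sketch below converts the relative path powers from dB to linear scale and expresses the path delays in symbol periods at the 5 MHz symbol rate. Normalizing the profile to unit total power is an assumption made here for illustration; the paper does not state how the profile is normalized.

```python
# Power-delay profile and symbol rate as quoted in the text.
delays_ns = [0, 50, 110, 170, 290, 310]
powers_db = [0, -3, -10, -18, -26, -32]
symbol_rate_hz = 5e6

# Convert relative powers from dB to linear scale.
linear = [10 ** (p / 10) for p in powers_db]

# Assumption: normalize so that the average path powers sum to one.
total = sum(linear)
normalized = [p / total for p in linear]

# Express each path delay as a fraction of the symbol period (200 ns).
delays_in_symbols = [d * 1e-9 * symbol_rate_hz for d in delays_ns]
```

The largest delay corresponds to about 1.55 symbol periods, so the physical channel alone spans only a few symbols; the long effective channel arises mainly from the relay period.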
We consider the effect of the number of independent paths on system performance. While additional ISI is often viewed as an impairment to reliable communication, the additional fading paths result in increased diversity, and hence improved BER performance through the cooperative relay. This effect is observed in Figure 5, where we consider three cases. Specifically, we truncate the ITU-R 3G indoor office channel so that it uses only the first two paths, the first three paths, or all six paths. We note that the BER performance with respect to $E_b/N_0^{(d)}$ improves as the number of independent paths increases.
We next consider the effect of the relay cooperation period $T$. Recall that the relay receives $T$ symbols and then amplifies and retransmits those $T$ symbols. As shown in Figure 6 for an uncoded system, better performance is obtained when choosing a smaller relay period. If the relay period $T$ is much larger than the channel delay spread, the diversity technique of using cooperative relays becomes less effective since there is almost no overlap between the symbols directly received from the source and those forwarded by the relay. The effect of the whitening filter is illustrated in Figure 7, where $T = 3$. As we expect, the BER performance with the whitening filter is better than the performance without it. It is also observed that if the noise on the source-relay link is small relative to the noise on the source-destination link, so that $\sigma_d \gg \sigma_r$, the whitening filter does little to help. The reason for this is that the noise looks approximately white when the noise on the source-destination link dominates, since $\sigma_d^2\mathbf{I} + \sigma_r^2\mathbf{H}_{rd}\boldsymbol{\Gamma}\boldsymbol{\Gamma}^H\mathbf{H}_{rd}^H \approx \sigma_d^2\mathbf{I}$. Thus, in situations where the source-relay link is particularly good, it may be possible to reduce complexity without much performance penalty by removing the whitening filter.
To simulate the BER performance of the MVA-based detector, we set the channel lengths $L_{sd} = L_{rd} = 1$ and $L_{sr} = 2$, so that each effective channel $\mathbf{h}_n$ has 3 nonzero coefficients, and Table 1 can be applied directly. Without loss of generality, we assume that each coefficient is i.i.d. as $\mathcal{CN}(0, 1)$. The performance is given in Figure 8(a) with $T = 3$ and Figure 8(b) with $T = 5$. It is shown that the performance loss from the MVA-based detector is negligible, as was claimed previously in [13].

Conclusion
A system model was introduced for cooperative communication in ISI channels with amplify-and-forward relays. Based on the system model, we presented a maximum-likelihood detector design based on the Viterbi algorithm. For long relay periods, the equivalent channel was shown to become sparse. Moreover, the channel length becomes much larger, which makes the ML detector hardly practical. Therefore, we proposed the use of the MVA, in which the complexity is determined by the number of nonzero taps in the effective channel impulse response. The simulation results show that the MVA provides near-optimal BER performance at significantly lower hardware complexity.

Figure 2: Signals received and transmitted by the relay when T = 2.

Figure 4: Effective channel impulse response in non-ISI channels.

Figure 5: Average BER performance for different channel lengths.
Figure 8: BER performance of the proposed MVA-based detector (legend: optimal VA-based detector; suboptimal MVA-based detector).

Table 1: The dependencies between x[0] and related output signals.