Relevance Vector Machines for Enhanced BER Probability in DMT-Based Systems

A new channel estimation method for discrete multitone (DMT) communication systems, based on sparse Bayesian learning with the relevance vector machine (RVM), is presented. The Bayesian framework is used to obtain sparse solutions to regression tasks with linear models. By exploiting a probabilistic Bayesian learning framework, sparse Bayesian learning provides accurate models for channel estimation and, consequently, equalization. We consider frequency-domain equalization (FEQ) using the proposed channel estimate at both the transmitter (preequalization) and the receiver (postequalization), and we compare the resulting bit error rate (BER) performance curves for both approaches and for various channel estimation techniques. Simulation results show that the proposed RVM-based method is superior to the traditional least squares technique.


Introduction
One of the research goals of the last decades has been to provide broadband communication capabilities to and from the customer premises [1,2]. To cope with the time-dispersive character of wireline and wireless transmission, multicarrier (MC) modulation offers a viable solution. Since the early seventies, MC has attracted widespread interest owing to an all-digital implementation based on the cost-effective fast Fourier transform (FFT) algorithm [3]. Nowadays, MC systems are used in digital audio and video broadcasting (DAB/DVB) [3,4], in wireless local area networks such as IEEE 802.11a/g [5] and HIPERLAN/2 [6], in wireless wide area networks such as WiMAX and LTE [7], in wireline communication over twisted pairs such as ADSL, ADSL2+, and VDSL [8,9], and in power-line local area networks [10].
Today, the transmission format presented in [3] is commonly known as Discrete Multi-Tone (DMT) modulation or Orthogonal Frequency Division Multiplexing (OFDM). A distinguishing feature of DMT compared with traditional OFDM is that DMT systems make use of channel state information (CSI) at the transmitter [11,12].
DMT divides the available bandwidth into several parallel frequency bins, or tones, each of which is quadrature amplitude modulated (QAM). Modulation and demodulation are carried out efficiently by means of an inverse fast Fourier transform (IFFT) and a fast Fourier transform (FFT), respectively. In contrast to analog MC systems, the FFT-based implementation permits considerable overlap between subchannels and therefore provides high bandwidth efficiency.
As mentioned, the various flavors of Digital Subscriber Line technology (xDSL) are a good example of practical DMT applications over traditional twisted-pair telephone lines connecting the customer premises to the Central Office (CO). The achievable bit rate depends strongly on the length of the twisted pair: higher bit rates are offered only on relatively short loops because of the signal attenuation caused by the telephone line. Since line attenuation increases with both length and frequency, short loops can exploit a broader frequency spectrum and hence support higher data rates. Tones with a higher Signal-to-Noise Ratio (SNR) can carry more bits at a predetermined Bit Error Rate (BER) than tones with a low SNR. The SNR is usually measured during the initialization phase of the modem, during which the frequency response of the channel is also estimated. To deliver high bit rates to the customer, DMT modems rely on advanced digital signal processing to mitigate several loop (channel) impairments, such as time dispersion, noise, echo, and radio frequency interference (RFI) [13,14]. In this paper, we formulate a Relevance Vector Machine (RVM) method [15] in regression mode for a baseband DMT system to improve channel estimation at low to moderate SNR, which in turn improves the error probability performance compared with the traditional Least Squares (LS) approach [16].
Section 2 describes the models used for analysis and design. Section 3 presents the RVM channel estimation methodology, and Section 4 presents the channel equalization approaches used. Section 5 presents simulation results, and Section 6 concludes the paper.

System Model
Consider the system block diagram shown in Figure 1, which represents a baseband Discrete Multi-Tone (DMT) link consisting of a DMT modulator, the channel model, a DMT demodulator, and a frequency-domain equalizer (FEQ).
Figure 1(a) depicts the DMT system with the FEQ implemented at the receiver (postequalization), while Figure 1(b) shows the configuration with the FEQ performed at the transmitter (preequalization) [11].

The DMT Modulator.
DMT transmission [3] splits a high-rate data stream, R, into N lower-rate streams that are transmitted simultaneously over subcarriers. The symbol rate of each of the N complex data streams is R/N symbols/second. After zero padding, these symbols are sent to a 2N-point inverse fast Fourier transform (IFFT) block with complex-conjugate symmetry around the center of the IFFT input, so that the frequency-domain data symbols are converted into 2N real-valued time-domain samples, as shown in Figure 2.
A cyclic prefix (CP) of length v is prepended to the 2N-point time-domain samples to form the cyclically extended DMT symbol. This is similar to the configuration used for DMT-based Asymmetric Digital Subscriber Line (ADSL), but with fixed modulation on each subchannel (i.e., QPSK). The cyclic prefix length, v, is chosen to encompass the maximum delay spread of the channel, both to prevent intersymbol interference (ISI) and to make the DMT symbol appear periodic over the time span of interest.
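The modulator described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' MATLAB code: it assumes that N − 1 data symbols occupy tones 1 to N − 1, that the DC and Nyquist tones (k = 0 and k = N) are left at zero, and it uses a direct O((2N)²) inverse DFT in place of an FFT to stay dependency-free. The function names are hypothetical.

```python
import cmath
import math


def idft(X):
    """Direct inverse DFT; an FFT would be used in practice."""
    M = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / M) for k in range(M)) / M
            for n in range(M)]


def dmt_modulate(symbols, cp_len):
    """Place N-1 QAM symbols on tones 1..N-1 of a 2N-point IDFT with
    complex-conjugate (Hermitian) symmetry so the output is real, then
    prepend a cyclic prefix of cp_len samples."""
    N = len(symbols) + 1
    X = [0j] * (2 * N)                # tones 0 (DC) and N (Nyquist) stay zero
    for k, s in enumerate(symbols, start=1):
        X[k] = s
        X[2 * N - k] = s.conjugate()  # mirror image around the centre
    # Hermitian symmetry makes the imaginary parts vanish (up to rounding)
    x = [sample.real for sample in idft(X)]
    return x[-cp_len:] + x            # CP = copy of the last cp_len samples


qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j, 1 - 1j]  # N - 1 = 7
tx = dmt_modulate(qpsk, cp_len=2)     # 2N + v = 16 + 2 = 18 real samples
```

With the paper's parameters (2N = 512, v = 32) the same function would return 544 samples per DMT symbol, although the direct IDFT would then be slow.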

The Channel Model.
When the channel impulse response (CIR) is shorter than or equal in length to the CP length, v, the 2N time-domain samples of each DMT symbol (baud) are circularly convolved with the channel. This enables straightforward frequency-domain channel equalization at the receiver. However, when the order of the CIR is larger than the cyclic prefix, more complex equalization techniques are required [17]. In this paper, the channel under consideration has a CIR that does not exceed the CP. Hence, inserting the CP at the transmitter and discarding the first v samples at the receiver gives the received real-valued time-domain samples over one DMT data block (baud) as follows:

$$\mathbf{r} = \mathbf{C}\,\mathbf{x} + \mathbf{n}, \tag{1}$$

where the vector $\mathbf{r} = [r_0, r_1, r_2, \ldots, r_{2N-1}]^T$ contains the received time-domain samples, the vector $\mathbf{x} = [x_0, x_1, x_2, \ldots, x_{2N-1}]^T$ contains the transmitted time-domain samples, and the vector $\mathbf{n} = [n_0, n_1, n_2, \ldots, n_{2N-1}]^T$ contains the samples of the band-limited additive noise with average power (variance) $\sigma_n^2$. The matrix $\mathbf{C}$ in (1) is the $2N \times 2N$ circulant channel matrix

$$\mathbf{C} = \begin{bmatrix} h_0 & h_{2N-1} & \cdots & h_1 \\ h_1 & h_0 & \cdots & h_2 \\ \vdots & \vdots & \ddots & \vdots \\ h_{2N-1} & h_{2N-2} & \cdots & h_0 \end{bmatrix}, \tag{2}$$

whose first column contains the channel taps (with $h_n = 0$ beyond the channel length), so that the matrix multiplication in (1) represents circular convolution without intersymbol interference (ISI). The circulant matrix can be diagonalized by pre- and post-multiplication with the 2N-point discrete Fourier transform (DFT) matrix, $\mathbf{W}_{2N}$, and the inverse DFT (IDFT) matrix, $(\mathbf{W}_{2N})^{-1}$ [18], that is,

$$\mathbf{W}_{2N}\,\mathbf{C}\,(\mathbf{W}_{2N})^{-1} = \boldsymbol{\Lambda}_C = \operatorname{diag}(H_0, H_1, H_2, \ldots, H_{2N-1}), \tag{3}$$

where $[H_0, H_1, H_2, \ldots, H_{2N-1}]^T$ is the frequency response of the channel.
When the transmitted symbols are generated in the frequency domain (i.e., $\mathbf{X} = \mathbf{W}_{2N}\mathbf{x} = [X_0, X_1, \ldots, X_{2N-1}]^T$) and the received samples are converted to the frequency domain ($\mathbf{R} = \mathbf{W}_{2N}\mathbf{r}$), we obtain the following very simple input-output relationship:

$$\mathbf{R} = \boldsymbol{\Lambda}_C\,\mathbf{X} + \mathbf{N}, \tag{4}$$

or, alternatively,

$$R_k = H_k X_k + N_k, \quad k = 0, 1, \ldots, 2N-1, \tag{5}$$

where $\mathbf{N} = \mathbf{W}_{2N}\mathbf{n} = [N_0, N_1, \ldots, N_{2N-1}]^T$. From (4) and (5), we see that each received frequency-domain symbol on each subchannel is simply a scaled version of the transmitted frequency-domain symbol plus white Gaussian noise. Moreover, every subchannel can be processed independently of the others. In other words, block transmission with a CP converts a time-dispersive channel into 2N parallel, narrowband, flat subchannels, or tones, each having a channel gain $H_k$ and additive Gaussian noise; hence the term discrete multi-tone (DMT) transmission.
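The claim that a sufficiently long cyclic prefix turns the dispersive channel into independent flat tones can be checked numerically. The sketch below is an illustration only, not part of the original simulation: it assumes a short hypothetical real channel, omits the noise, and uses direct DFTs. It verifies the noise-free form of (5), R_k = H_k X_k, where H_k is the DFT of the zero-padded channel taps.

```python
import cmath
import math


def dft(x):
    """Direct 2N-point DFT, X_k = sum_n x_n exp(-j 2 pi k n / 2N)."""
    M = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / M) for n in range(M))
            for k in range(M)]


def linear_convolve(x, h):
    """Physical channel: ordinary (linear) convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def channel_and_demodulate(x, h, cp_len):
    """Add a CP, pass through the channel h, drop the first cp_len received
    samples, and return the DFT of the remaining DMT block."""
    assert len(h) - 1 <= cp_len, "channel memory must not exceed the CP"
    M = len(x)
    rx = linear_convolve(x[-cp_len:] + x, h)
    return dft(rx[cp_len:cp_len + M])


x = [0.3, -1.2, 0.7, 0.1, -0.5, 0.9, -0.4, 1.1]   # one block, 2N = 8
h = [1.0, 0.5, -0.2]                              # hypothetical 3-tap channel
R = channel_and_demodulate(x, h, cp_len=2)
X = dft(x)
H = dft(h + [0.0] * (len(x) - len(h)))            # channel frequency response
# Each tone behaves as a flat channel: R_k = H_k X_k
errors = [abs(R[k] - H[k] * X[k]) for k in range(len(x))]
```

Making the channel one tap longer than cp_len + 1 would trigger the assertion, since the discarded CP could then no longer absorb the channel memory.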

Channel Estimation
An important aspect of DMT is that it uses an optimized frequency-division allocation of energy and bits to maximize the data rates that can be transmitted over band-limited channels. During startup of the DMT modem, the receiver measures the quality (SNR) of the signals received on each tone and reports this information to the transmitter [9]; this is repeated periodically for dynamic channels. To estimate the channel during initialization, a pseudorandom frequency-domain training (pilot) sequence, $\mathbf{X}_b = \operatorname{diag}(X_b(0), X_b(1), \ldots, X_b(2N-1))$, is transmitted. Here, without loss of generality, we normalize $\mathbf{X}_b$ so that $|X_b(k)| = 1$. The received pilot symbols can then be expressed in vector form as

$$\mathbf{R}_b = \mathbf{X}_b\,\mathbf{H} + \mathbf{N}. \tag{6}$$

In (6), $\mathbf{R}_b$ is the $2N \times 1$ received signal vector, the diagonal matrix $\mathbf{X}_b$ contains the transmitted (pilot) symbols, $\mathbf{N}$ is a vector containing the complex noise on the 2N subcarriers, and $\mathbf{H}$ is a vector containing the overall complex channel gains between transmitter and receiver.
The problem at hand is to estimate the frequency-domain channel vector, $H_k$ ($k = 0, 1, 2, \ldots, 2N-1$), as the FFT of L unknown sample-spaced time-domain tap gains, where L is chosen to encompass the maximum expected delay spread and does not exceed the CP length, v. The frequency-domain model of the channel for each of the 2N tones is given by

$$H_k = \sum_{n=0}^{L-1} h_n\, e^{-j2\pi kn/2N}, \quad k = 0, 1, \ldots, 2N-1, \tag{7}$$

where $h_n$ is the channel tap at discrete time n.
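To perform channel estimation from (6), we start by multiplying each of the received frequency-domain symbols by the conjugate of the corresponding training symbol to produce the vector $\mathbf{T}_b$:

$$\mathbf{T}_b = \mathbf{X}_b^H\,\mathbf{R}_b, \tag{8}$$

where the superscript H denotes the conjugate transpose.
Because the training symbols have unit magnitude ($\mathbf{X}_b^H\mathbf{X}_b = \mathbf{I}$), (8) can be expressed as

$$\mathbf{T}_b = \mathbf{H} + \mathbf{X}_b^H\,\mathbf{N}. \tag{9}$$

In the noise-free case, $\mathbf{T}_b$ in (9) contains the exact frequency-domain channel gains. In the presence of channel noise, $\mathbf{T}_b$ contains channel estimates corrupted by the additive noise term $\mathbf{X}_b^H\mathbf{N}$, which, because $|X_b(k)| = 1$, has the same magnitude as $\mathbf{N}$ on every tone. This is equivalent to least squares (LS) estimation [16], in which no assumption about the length of the channel impulse response is incorporated (i.e., $h_n$ can take nonzero values for $n = 0, 1, \ldots, 2N-1$).
The LS estimator of the channel frequency response, $\mathbf{H}$, minimizes the criterion $(\mathbf{R}_b - \mathbf{X}_b\mathbf{H})^H(\mathbf{R}_b - \mathbf{X}_b\mathbf{H})$ and is given by

$$\hat{\mathbf{H}}_{\mathrm{LS}} = (\mathbf{X}_b^H\mathbf{X}_b)^{-1}\mathbf{X}_b^H\,\mathbf{R}_b, \tag{10}$$

where, owing to the unit magnitude of the training symbols, we have

$$\hat{\mathbf{H}}_{\mathrm{LS}} = \mathbf{X}_b^H\,\mathbf{R}_b = \mathbf{T}_b. \tag{11}$$

When the values of the channel impulse response are forced to zero for sample indices larger than the cyclic prefix duration, v, the estimator becomes equivalent to "modified LS" estimation [16].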
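The two baseline estimators can be sketched as follows. This is a minimal stdlib-only illustration under stated assumptions, not the paper's simulation code: the pilot is a random unit-magnitude QPSK sequence, the channel and the complex per-tone noise are hypothetical, and 2N = 32 with v = 4 keeps the direct DFTs fast. Because truncation to the CP range is an orthogonal projection in the time domain and the true channel lies in that range, the modified LS error can never exceed the LS error.

```python
import cmath
import math
import random


def dft(x):
    M = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / M) for n in range(M))
            for k in range(M)]


def idft(X):
    M = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / M) for k in range(M)) / M
            for n in range(M)]


def ls_and_modified_ls(R_b, X_b, cp_len):
    """LS estimate T_b = X_b^H R_b (unit-magnitude pilots) and a 'modified LS'
    estimate that zeroes the impulse response beyond sample cp_len."""
    T_b = [r * x.conjugate() for r, x in zip(R_b, X_b)]
    t_b = idft(T_b)                                   # time-domain target
    t_trunc = [t if n <= cp_len else 0j for n, t in enumerate(t_b)]
    return T_b, dft(t_trunc)


random.seed(1)
M, v = 32, 4                                          # 2N = 32, CP length 4
h = [0.9, 0.4, -0.3, 0.1, 0.05] + [0.0] * (M - 5)     # hypothetical channel
H = dft(h)
X_b = [cmath.exp(1j * math.pi / 4 * random.choice([1, 3, 5, 7])) for _ in range(M)]
noise = [complex(random.gauss(0, 0.1), random.gauss(0, 0.1)) for _ in range(M)]
R_b = [Hk * Xk + Nk for Hk, Xk, Nk in zip(H, X_b, noise)]   # model (6)
H_ls, H_mls = ls_and_modified_ls(R_b, X_b, v)
err_ls = sum(abs(a - b) ** 2 for a, b in zip(H_ls, H))
err_mls = sum(abs(a - b) ** 2 for a, b in zip(H_mls, H))
```

On average, truncation keeps only (v + 1)/2N of the noise energy, which is the gain that the RVM approach below tries to extend by also suppressing noise inside the CP range.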
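Our objective is to improve the performance of channel estimation at low to moderate SNR after truncating the channel impulse response to v samples [19]. Specifically, we use a regression model similar to the one used in the relevance vector machine (RVM) method [15,20]. We apply this model to baseband DMT channel estimation by first taking the inverse Fourier transform of (9) to obtain the regression model

$$\mathbf{t}_b = (\mathbf{W}_{2N})^{-1}\mathbf{T}_b = \mathbf{h} + \tilde{\mathbf{n}}, \tag{12}$$

where $\mathbf{h} = (\mathbf{W}_{2N})^{-1}\mathbf{H}$ is the channel impulse response and $\tilde{\mathbf{n}} = (\mathbf{W}_{2N})^{-1}\mathbf{X}_b^H\mathbf{N}$ is the corresponding time-domain noise; after truncation, only the first v + 1 samples of $\mathbf{t}_b$, which span the CP range, are retained. To estimate $\mathbf{h}$ from the observations $\mathbf{t}_b$ in (12) while mitigating the effects of the noise, we use sparse Bayesian regression. This approach automatically sets the appropriate regression weights to zero, thereby avoiding fitting the noise in the signal $\mathbf{t}_b$.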
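Assume that the channel can be approximated by a function y(n) that is a linear combination of kernel functions:

$$y(n) = \sum_{i=0}^{v} w_i\,\phi(n-i). \tag{13}$$

This is a convolution that can be written in matrix-vector form as

$$\mathbf{y} = \boldsymbol{\Phi}\,\mathbf{w}, \tag{14}$$

where $\mathbf{w} = [w_0, w_1, w_2, \ldots, w_i, \ldots, w_v]^T$ are the model weights, and $\boldsymbol{\Phi}$, with entries $[\boldsymbol{\Phi}]_{n,i} = \phi(n-i)$, is the $(v+1) \times (v+1)$ convolution (design) matrix created from the selected kernel. One commonly used kernel is the Gaussian-shaped kernel

$$\phi(n) = \exp\!\left(-\frac{n^2}{2\ell^2}\right), \tag{15}$$

where $\ell$ controls the kernel width. However, there are many other choices (e.g., spline, polynomial, etc.) [20]. The model now becomes

$$\mathbf{t}_b = \mathbf{y} + \mathbf{e} = \boldsymbol{\Phi}\,\mathbf{w} + \mathbf{e}, \tag{16}$$

where $\mathbf{y} = [y_0, y_1, y_2, \ldots, y_v]^T$ is the approximating function and $\mathbf{e} = [e_0, e_1, e_2, \ldots, e_v]^T$ is the vector of regression model errors.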
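The design matrix can be built directly from the kernel. In this sketch the matrix is square, (v + 1) × (v + 1), so that y has the v + 1 entries y_0, ..., y_v; the parameterization exp(−n²/(2·width²)) and the value width = 1.0 are assumptions for illustration, since the paper does not state its kernel width.

```python
import math


def gaussian_kernel(n, width):
    """Gaussian-shaped kernel phi(n) = exp(-n^2 / (2 width^2)) (assumed form)."""
    return math.exp(-(n * n) / (2.0 * width * width))


def design_matrix(v, width):
    """(v+1) x (v+1) convolution matrix with Phi[n][i] = phi(n - i), so that
    y = Phi w realizes y(n) = sum_i w_i phi(n - i)."""
    return [[gaussian_kernel(n - i, width) for i in range(v + 1)]
            for n in range(v + 1)]


def apply(Phi, w):
    """Matrix-vector product y = Phi w."""
    return [sum(p * wi for p, wi in zip(row, w)) for row in Phi]


Phi = design_matrix(v=4, width=1.0)
y = apply(Phi, [0.0, 0.0, 1.0, 0.0, 0.0])   # a single weight at i = 2
```

A single nonzero weight produces one kernel-shaped bump centred at its index, which is why a handful of nonzero weights can represent a smooth channel impulse response.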
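The Bayesian model assumes that the errors are independent, identically distributed, zero-mean Gaussian random variables with variance $\sigma^2$, so that

$$p(e_n) = \mathcal{N}(e_n \mid 0, \sigma^2). \tag{17}$$

The parameter $\sigma^2$ can be set in advance if known, but it can also be estimated from the data. This error model implies a multivariate Gaussian likelihood for the target vector:

$$p(\mathbf{t}_b \mid \mathbf{w}, \sigma^2) = (2\pi\sigma^2)^{-(v+1)/2} \exp\!\left(-\frac{\|\mathbf{t}_b - \boldsymbol{\Phi}\mathbf{w}\|^2}{2\sigma^2}\right). \tag{18}$$

Maximum-likelihood estimation of $\mathbf{w}$ from (18) is equivalent to the least squares solution

$$\hat{\mathbf{w}}_{\mathrm{ML}} = (\boldsymbol{\Phi}^T\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^T\mathbf{t}_b, \tag{19}$$

which leads to severe overfitting of the data [20].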
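The overfitting problem is easy to see with a square design matrix: the maximum-likelihood (least squares) weights reproduce the target exactly, noise included. The sketch below assumes a Gaussian kernel of width 1, v = 6, and a target consisting of pure noise; these choices are illustrative, not from the paper.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ml_weights(Phi, t):
    """Least-squares / ML weights w = (Phi^T Phi)^(-1) Phi^T t."""
    m = len(Phi[0])
    PtP = [[sum(Phi[n][i] * Phi[n][j] for n in range(len(Phi))) for j in range(m)]
           for i in range(m)]
    Ptt = [sum(Phi[n][i] * t[n] for n in range(len(Phi))) for i in range(m)]
    return solve(PtP, Ptt)


v = 6
Phi = [[math.exp(-((n - i) ** 2) / 2.0) for i in range(v + 1)] for n in range(v + 1)]
random.seed(0)
t = [random.gauss(0.0, 1.0) for _ in range(v + 1)]      # pure-noise target
w_ml = ml_weights(Phi, t)
fit = [sum(Phi[n][i] * w_ml[i] for i in range(v + 1)) for n in range(v + 1)]
residual = max(abs(f - tn) for f, tn in zip(fit, t))   # essentially zero
```

A zero residual on a pure-noise target is exactly the "fitting of noise" that the sparse prior is meant to prevent.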
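To overcome this problem, a flexible Gaussian prior over the weights $\mathbf{w}$ and Bayesian inference are used [20]. More specifically,

$$p(\mathbf{w} \mid \boldsymbol{\alpha}) = \prod_{i=0}^{v} \mathcal{N}(w_i \mid 0, \alpha_i^{-1}), \tag{20}$$

where $\boldsymbol{\alpha} = [\alpha_0, \alpha_1, \ldots, \alpha_v]^T$ is a vector of v + 1 hyperparameters [15,20]. The flexibility of this prior comes from the fact that it uses a separate hyperparameter for every weight $w_i$. The weights $\mathbf{w}$ are marginalized according to

$$p(\mathbf{t}_b \mid \boldsymbol{\alpha}, \sigma^2) = \int p(\mathbf{t}_b \mid \mathbf{w}, \sigma^2)\, p(\mathbf{w} \mid \boldsymbol{\alpha})\, d\mathbf{w}. \tag{21}$$

This integral can be found in closed form and is given by

$$p(\mathbf{t}_b \mid \boldsymbol{\alpha}, \sigma^2) = \mathcal{N}(\mathbf{t}_b \mid \mathbf{0}, \mathbf{C}_t), \tag{22}$$

where we have defined $\mathbf{C}_t = \sigma^2\mathbf{I} + \boldsymbol{\Phi}\mathbf{A}^{-1}\boldsymbol{\Phi}^T$ and $\mathbf{A} = \operatorname{diag}(\alpha_0, \alpha_1, \ldots, \alpha_v)$. The quantity $p(\mathbf{t}_b \mid \boldsymbol{\alpha}, \sigma^2)$ is called the marginal likelihood (or evidence) for the hyperparameters $\boldsymbol{\alpha}$ and $\sigma^2$ [21]. Because the pdfs involved are Gaussian, the posterior can also be calculated in closed form:

$$p(\mathbf{w} \mid \mathbf{t}_b, \boldsymbol{\alpha}, \sigma^2) = \mathcal{N}(\mathbf{w} \mid \boldsymbol{\mu}_{w/t}, \boldsymbol{\Sigma}_{w/t}), \tag{23}$$

with

$$\boldsymbol{\Sigma}_{w/t} = \left(\sigma^{-2}\boldsymbol{\Phi}^T\boldsymbol{\Phi} + \mathbf{A}\right)^{-1}, \qquad \boldsymbol{\mu}_{w/t} = \sigma^{-2}\boldsymbol{\Sigma}_{w/t}\boldsymbol{\Phi}^T\mathbf{t}_b. \tag{24}$$

The regression estimates obtained by Bayesian inference are given by $\boldsymbol{\Phi}\boldsymbol{\mu}_{w/t}$, where the hyperparameters $\boldsymbol{\alpha}$ and $\sigma^2$ used in $\boldsymbol{\mu}_{w/t}$ are their most probable values given the observations $\mathbf{t}_b$, found by maximizing $p(\boldsymbol{\alpha}, \sigma^2 \mid \mathbf{t}_b)$ [15]. Maximizing (22) with respect to $\boldsymbol{\alpha}$ and $\sigma^2$, which is termed Type II ML [20,21], is equivalent to maximizing $p(\boldsymbol{\alpha}, \sigma^2 \mid \mathbf{t}_b)$ under a uniform hyperprior [15,20]. The values of $\boldsymbol{\alpha}$ and $\sigma^2$ that maximize (22) cannot be obtained in closed form. However, they can be found by a stationary-point analysis of (22), which leads to the following iterative update equations [22]:

$$\alpha_i^{\mathrm{new}} = \frac{\gamma_i}{\mu_i^2}, \qquad (\sigma^2)^{\mathrm{new}} = \frac{\|\mathbf{t}_b - \boldsymbol{\Phi}\boldsymbol{\mu}_{w/t}\|^2}{(v+1) - \sum_{i=0}^{v}\gamma_i}, \tag{25}$$

where $\mu_i$ is the ith component of the posterior mean $\boldsymbol{\mu}_{w/t}$ from (24), and the quantities $\gamma_i$ are defined by $\gamma_i = 1 - \alpha_i\Sigma_{ii}$, with $\Sigma_{ii}$ the ith diagonal element of the posterior weight covariance $\boldsymbol{\Sigma}_{w/t}$ from (24), computed with the current values of $\boldsymbol{\alpha}$ and $\sigma^2$.
The learning algorithm thus proceeds by iterating between (24) and (25) until a convergence criterion is satisfied [15,22]. It can be shown that this iteration is equivalent to an expectation-maximization (EM) algorithm [20], in which (24) is the E-step and (25) is the M-step. This relation guarantees convergence of the algorithm [20].
An important property of this model is that the optimal values of most hyperparameters $\alpha_i$ are typically infinite [15,22]. From (24), this implies that the corresponding $\mu_i$ are zero. Thus, the regressor $y(n) = \sum_{i=0}^{v}\mu_i\,\phi(n-i)$ is sparse, since many of its weights $\mu_i$ are zero. In matrix form, the RVM channel estimate is expressed as

$$\hat{\mathbf{h}}_{\mathrm{RVM}} = \boldsymbol{\Phi}\,\boldsymbol{\mu}_{w/t}. \tag{26}$$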
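The learning loop can be sketched as follows. This is a minimal, dependency-free Python illustration of the re-estimation rules in (24) and (25), with pruning of weights whose α_i grows very large (the "infinite" hyperparameters above), followed by the estimate in (26). The initial α and σ² values, the pruning threshold alpha_max, the fixed iteration count, the kernel width, and the synthetic channel in the usage lines are all assumptions, not values from the paper; a practical implementation would use a numerical library and a proper convergence test.

```python
import math
import random


def inverse(A):
    """Gauss-Jordan matrix inverse with partial pivoting (small matrices only)."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [val / piv for val in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def rvm_regression(Phi, t, n_iter=300, alpha_max=1e9):
    """Sparse Bayesian (RVM) regression of t on the columns of Phi.

    Alternates the posterior statistics (24) with the updates (25) and prunes
    weights whose alpha_i exceeds alpha_max. Returns the full-length posterior
    mean (pruned weights are exactly 0) and the noise variance sigma^2.
    """
    n, m_all = len(Phi), len(Phi[0])
    alpha = [1.0] * m_all                                     # assumed start
    sigma2 = max(0.01 * sum(ti * ti for ti in t) / n, 1e-12)  # assumed start
    active = list(range(m_all))
    mu_a = []
    for _ in range(n_iter):
        P = [[row[i] for i in active] for row in Phi]         # active columns
        m = len(active)
        # (24): Sigma = (Phi^T Phi / sigma^2 + A)^(-1), mu = Sigma Phi^T t / sigma^2
        S_inv = [[sum(P[k][a] * P[k][b] for k in range(n)) / sigma2
                  + (alpha[active[a]] if a == b else 0.0) for b in range(m)]
                 for a in range(m)]
        Sigma = inverse(S_inv)
        Pt = [sum(P[k][a] * t[k] for k in range(n)) for a in range(m)]
        mu_a = [sum(Sigma[a][b] * Pt[b] for b in range(m)) / sigma2 for a in range(m)]
        # (25): gamma_i = 1 - alpha_i Sigma_ii, alpha_i = gamma_i / mu_i^2
        gamma = [max(1.0 - alpha[i] * Sigma[a][a], 0.0) for a, i in enumerate(active)]
        for a, i in enumerate(active):
            alpha[i] = gamma[a] / max(mu_a[a] ** 2, 1e-300)
        resid = [t[k] - sum(P[k][a] * mu_a[a] for a in range(m)) for k in range(n)]
        sigma2 = max(sum(r * r for r in resid) / max(n - sum(gamma), 1e-12), 1e-12)
        # alpha_i -> infinity drives mu_i -> 0, so those weights are removed
        keep = [a for a, i in enumerate(active) if alpha[i] < alpha_max]
        active, mu_a = [active[a] for a in keep], [mu_a[a] for a in keep]
        if not active:
            break
    w = [0.0] * m_all
    for a, i in enumerate(active):
        w[i] = mu_a[a]
    return w, sigma2


random.seed(3)
v, width = 10, 1.0                                            # hypothetical sizes
Phi = [[math.exp(-((k - i) ** 2) / (2 * width ** 2)) for i in range(v + 1)]
       for k in range(v + 1)]
w_true = [0.0] * (v + 1)
w_true[1], w_true[4] = 1.0, -0.6                              # hypothetical channel
h_true = [sum(Phi[k][i] * w_true[i] for i in range(v + 1)) for k in range(v + 1)]
t_b = [hk + random.gauss(0.0, 0.1) for hk in h_true]          # noisy target
w, sigma2 = rvm_regression(Phi, t_b)
h_rvm = [sum(Phi[k][i] * w[i] for i in range(v + 1)) for k in range(v + 1)]  # (26)
n_nonzero = sum(1 for wi in w if wi != 0.0)
```

How many weights survive depends on the kernel width, the noise level, and the pruning threshold; the frequency response used for equalization would then follow from the DFT of the zero-padded h_rvm.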

Frequency-Domain Channel Equalization
If the channel is known and the noise is neglected, equalization can be performed by inverting (1), assuming that $\mathbf{C}$ is nonsingular. This is the zero-forcing (ZF) equalizer [23]. The eigenvalues of the $2N \times 2N$ circulant matrix are equal to the DFT coefficients of its first column [24]. Since the first column contains the channel coefficients in ascending order, from (7) these eigenvalues are

$$H_k = \sum_{n=0}^{2N-1} h_n\, W^{-kn}, \quad k = 0, 1, \ldots, 2N-1, \tag{27}$$

where $W = e^{j2\pi/2N}$ and $H_k$ represents the channel frequency response. Note that $h_n = 0$ for $L \le n \le 2N-1$.
The circulant matrix $\mathbf{C}$ can be diagonalized with the DFT matrix [24], as in (3):

$$\mathbf{C} = (\mathbf{W}_{2N})^{-1}\,\boldsymbol{\Lambda}_C\,\mathbf{W}_{2N}. \tag{28}$$

Because the DFT (and thus its matrix $\mathbf{W}_{2N}$) is invertible, the circulant matrix $\mathbf{C}$ is invertible if and only if $\boldsymbol{\Lambda}_C$ is invertible, and the diagonal matrix $\boldsymbol{\Lambda}_C$ is invertible if and only if it has no zero on its diagonal (i.e., the channel transfer function has no zero on the DFT grid). The diagonal elements of $\boldsymbol{\Lambda}_C^{-1}$ are $1/H_k$, which can be regarded as a set of DFT-domain equalizers. At the receiver, as in Figure 1(a), the symbols are equalized in the frequency domain before detection as follows:

$$\hat{\mathbf{X}} = \boldsymbol{\Lambda}_C^{-1}\,\mathbf{R}, \tag{29}$$

where $\hat{\mathbf{X}}$ is the $2N \times 1$ vector of equalized symbols and $\mathbf{R}$ is the $2N \times 1$ received signal vector. In practice, of course, channel noise is amplified by $\boldsymbol{\Lambda}_C^{-1}$ when the coefficients $1/H_k$ are large. This is a consequence of the ZF criterion, which equalizes the symbols regardless of the noise level. On each subcarrier, the received symbol is multiplied by a complex coefficient that offsets the effect of the channel, which manifests itself as attenuation and phase rotation. The equalized received symbol on each subcarrier can be expressed as

$$\hat{X}_k = \frac{R_k}{H_k} = X_k + \frac{N_k}{H_k}. \tag{30}$$

In this version (postequalization), the receiver bears all the complexity. If the channel is known at the transmitter (e.g., the transmitter has access to CSI through a feedback channel [25]), as is the case in DMT [11,12], $\boldsymbol{\Lambda}_C^{-1}$ can be moved to the transmitter. Equalization is then implemented at the transmitter (preequalization), as shown in Figure 1(b). This results in superior performance [11], as demonstrated in the simulation section. The symbols are equalized in the frequency domain before modulation and transmission as follows:

$$\tilde{\mathbf{X}} = \boldsymbol{\Lambda}_C^{-1}\,\mathbf{X}, \tag{31}$$

where $\tilde{\mathbf{X}}$ is the $2N \times 1$ vector of preequalized symbols. The received symbol on each subcarrier after demodulation is then

$$R_k = H_k\tilde{X}_k + N_k = X_k + N_k. \tag{32}$$

We can see that channel noise is not amplified in this second version. In addition, this configuration is useful when a simpler receiver is desired.
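The difference between the two FEQ placements can be illustrated on a single deeply faded tone. This sketch assumes hypothetical values for H_k, X_k, and N_k, assumes perfect channel knowledge, and applies no transmit-power normalization to the preequalized symbol; it only shows how the ZF coefficient 1/H_k acts on the noise in each configuration.

```python
import cmath


def post_equalize(R, H):
    """Receiver-side ZF FEQ: divide each received tone by its channel gain."""
    return [r / h for r, h in zip(R, H)]


def pre_equalize(X, H):
    """Transmitter-side ZF FEQ: divide each data symbol by its channel gain
    before the IFFT (no transmit-power normalization is applied here)."""
    return [x / h for x, h in zip(X, H)]


# One heavily attenuated tone with hypothetical values
H = [0.1 * cmath.exp(0.7j)]   # |H_k| = 0.1 (a 20 dB fade)
X = [1 + 1j]                  # QPSK symbol
N = [0.05 - 0.02j]            # one noise sample

x_post = post_equalize([H[0] * X[0] + N[0]], H)[0]   # X_k + N_k / H_k
r_pre = H[0] * pre_equalize(X, H)[0] + N[0]          # X_k + N_k
err_post = abs(x_post - X[0])   # |N_k| / |H_k|: noise amplified 10x
err_pre = abs(r_pre - X[0])     # |N_k|: noise not amplified
```

The tenfold error ratio here is simply 1/|H_k|, which is why attenuated subcarriers dominate the BER under postequalization.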

Simulation Results
To evaluate the performance of the DMT/RVM system, it was simulated in MATLAB. The RVM channel estimation scheme was evaluated and compared with other methods, as well as with the ideal case of perfect channel knowledge. The common simulation parameters are as follows: the DMT FFT size is 2N = 512 and the cyclic prefix length is v = 32. The additive channel noise is band-limited white Gaussian noise with power $\sigma_n^2$. The actual channel length is L = 32. The subchannels used are tones 6 to 255, and each subcarrier uses QPSK modulation.
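A BER measurement over the used tones could look like the sketch below. It assumes a Gray-coded QPSK mapping with one bit per quadrature rail and hard decisions, and a hypothetical post-equalization noise level; the paper does not specify its mapping or decision rule, so this is only an illustration of how bit errors are counted for the stated tone set.

```python
import random

USED_TONES = range(6, 256)      # tones 6..255 -> 250 subchannels
BITS_PER_TONE = 2               # QPSK


def qpsk_map(b0, b1):
    """Map two bits to a QPSK symbol, one bit per rail (assumed Gray mapping)."""
    return complex(1 - 2 * b0, 1 - 2 * b1)


def qpsk_demap(s):
    """Hard decision on the sign of each rail."""
    return (1 if s.real < 0 else 0, 1 if s.imag < 0 else 0)


def count_bit_errors(tx_bits, rx_symbols):
    errors = 0
    for (b0, b1), s in zip(tx_bits, rx_symbols):
        d0, d1 = qpsk_demap(s)
        errors += (b0 != d0) + (b1 != d1)
    return errors


random.seed(0)
bits = [(random.randint(0, 1), random.randint(0, 1)) for _ in USED_TONES]
symbols = [qpsk_map(b0, b1) for b0, b1 in bits]
sigma = 0.5   # hypothetical per-rail noise std after equalization
noisy = [s + complex(random.gauss(0, sigma), random.gauss(0, sigma)) for s in symbols]
ber = count_bit_errors(bits, noisy) / (BITS_PER_TONE * len(USED_TONES))
```

With 250 QPSK tones, each DMT symbol carries 500 bits, so BER points at low error rates require averaging over many DMT symbols.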
Figure 3 presents an example of the time-domain samples of the target vector in (12), superimposed on the actual impulse response of the channel, which the target vector would equal in the noise-free case. The channel was assumed to remain constant after each sounding with the pilot sequence. Figure 4 depicts the actual time-domain channel over the cyclic prefix range (the channel is zero for times larger than the cyclic prefix). The target vector, which is equivalent to the least squares (LS) estimate of the channel, is plotted over the same time range. The proposed Bayesian sparse regression algorithm processes the target vector samples to filter out the noise at an SNR of 5 dB. The locations of the nonzero regression weights are marked on the plot with circles. Notice that only 7 of the 32 weights are nonzero; the remaining 25 are set to zero by the sparse model. To show how estimating the channel impulse response affects the frequency-domain attenuation of each subcarrier (and hence equalization), Figure 5 plots the frequency response of the estimated channel, the actual channel, and the original noise-corrupted channel estimate.
From Figure 5, we observe that, in the frequency domain, the channel estimated by the sparse model varies more smoothly between adjacent tones and is closer to the desired frequency response.
When FEQ is implemented at the transmitter using the channel estimated at the receiver, the BER performance curves in Figure 6 (labeled TX-) demonstrate the superior performance attained with the more accurate sparse regression (RVM) channel estimate compared with the LS estimate. The BER performance curve based on the actual channel (perfect estimate) is superimposed on the same plot as a performance benchmark.
Figure 7 compares the BER performance when FEQ is implemented at the receiver with that when it is implemented at the transmitter. Two observations can be made. First, although it is intuitive that FEQ at the transmitter (preequalization) achieves better performance, we demonstrate that it is far superior to FEQ at the receiver (postequalization) for the same channel estimate. Second, the improved channel estimation accuracy has limited effect on the BER performance curves when FEQ is implemented at the receiver (labeled RX-): the BER curves are virtually identical for the actual channel, the sparse regression (RVM) channel estimate, and the LS channel estimate. The low SNR on attenuated subcarriers makes the improved channel accuracy unimportant in this case, since equalization also enhances the noise.

Conclusions
Proper equalization is commonly used to improve the utilization of the available bandwidth in a DMT data transmission system. In this paper, an improved channel estimation method for DMT communication systems based on sparse Bayesian learning and the RVM method was presented. Using the RVM method, more accurate channel estimates were obtained in low to medium SNR environments. Although frequency-domain equalization at the transmitter (preequalization) intuitively yields better BER performance, we demonstrated the additional performance enhancement achieved when the improved RVM channel estimate is used. We also showed through simulations that preequalization is more sensitive to channel estimation accuracy than frequency-domain equalization at the receiver (postequalization). Moreover, compared with the traditional least squares technique, the proposed channel estimation scheme achieved superior performance and exhibited promising characteristics, particularly when applied to an ADSL system.

Figure 1: (a) DMT system block diagram with FEQ at receiver.(b) DMT system block diagram with FEQ at transmitter.

Figure 5: Frequency response of actual and estimated channel.

Figure 6: BER curves versus SNR with equalization at the transmitter.

Figure 7: Comparison of BER curves with FEQ at the receiver and at the transmitter.