An Analytical Multimodulus Algorithm for Blind Demodulation in a Time-Varying MIMO Channel Context

This paper addresses the issue of blind multiple-input multiple-output (MIMO) demodulation of communication signals, with time-varying channels and in an interception context. A new adaptive blind source separation algorithm, based on an analytical implementation of the Multimodulus cost function, is proposed. First, a batch-processing analysis is performed; then an adaptive implementation of the Analytical Multi-Modulus Algorithm (AMMA) and of its simplified version, the Analytical Simplified Constant Modulus Algorithm (ASCMA), is detailed. These algorithms, named adaptive-AMMA and adaptive-ASCMA, respectively, are compared with the adaptive Analytical Constant Modulus Algorithm (ACMA) and the Multi-Modulus Algorithm (MMA). The adaptive-AMMA and adaptive-ASCMA achieve a lower residual intersymbol interference and bit error rate than the adaptive-ACMA and the MMA.


Introduction
The context of this study is Multiple-Input Multiple-Output (MIMO) interception, and more specifically blind demodulation at the receiver. The channel is assumed to be nonstationary, so the methods used to demodulate the received signals must be fast. Such methods exist, among them Blind Source Separation (BSS) implemented with analytical methods.
BSS algorithms require little information about the emitted signals in order to recover the transmitted symbols (sources). These algorithms estimate an equalization matrix from the received signals to obtain the transmitted sources. To find this matrix, different cost functions can be minimized, such as the Multimodulus (MM) [1,2], the Constant Modulus (CM) [3,4], the Simplified Constant Modulus (SCM) [5], and the Multiuser Kurtosis (MUK) [6]. The algorithms in [1-6] use a stochastic gradient to minimize these cost functions, so they are slow to converge and become less accurate in time-varying environments. Analytical methods such as the adaptive Analytical Constant Modulus Algorithm (ACMA) [7-10], which converge quickly, should therefore be used with time-varying channels.
Another particularity of BSS algorithms is that they restore the sources at the output up to a permutation and a rotation (phase). Moreover, when a frequency offset is present, the output constellations of the adaptive-ACMA and ACMA spin. In order to avoid this and to track the frequency offset jointly with the blind source separation, the MM or SCM cost function can be used. To our knowledge, the MM and SCM cost functions have never been associated with analytical methods so far (we have partly presented this work in a conference paper [11]). So, in this paper, an adaptive Analytical Multimodulus Algorithm and its simplified version, named the adaptive Analytical Simplified Constant Modulus Algorithm, are proposed.
The performances of the adaptive-AMMA and adaptive-ASCMA are compared with those of the adaptive-ACMA and the MMA. With time-selective channels, the adaptive-AMMA and the adaptive-ASCMA obtain a higher signal-to-interference-plus-noise ratio (SINR) and a lower bit error rate (BER) than the adaptive-ACMA and the MMA. Moreover, both proposed methods perform source separation and carrier phase recovery jointly, unlike the adaptive-ACMA.
First of all, the system model used here is described. Then a batch analysis of the blind source separation algorithm AMMA is performed, and an adaptive form of this algorithm and of its simplified version is presented. Finally, simulation results of the proposed algorithms are compared with those obtained with the adaptive-ACMA and the MMA in time-varying environments.
The Kronecker product (⊗) and the Khatri-Rao product (∘, the column-wise Kronecker product) have the following properties:

(A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD),
vec(A diag(d) B^T) = (B ∘ A)d,

where vec(·) stacks the columns of a matrix into a vector.
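As a quick numerical check of the second property (an illustrative sketch only; the helper `khatri_rao` and the matrix sizes are arbitrary choices for the example):

```python
import numpy as np

def khatri_rao(B, A):
    # Column-wise Kronecker product: the j-th column is kron(b_j, a_j).
    return np.vstack([np.kron(B[:, j], A[:, j]) for j in range(B.shape[1])]).T

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((5, 4))
d = rng.standard_normal(4)

# vec(A diag(d) B^T) = (B ∘ A) d, with vec() stacking columns (Fortran order).
lhs = (A @ np.diag(d) @ B.T).flatten(order="F")
rhs = khatri_rao(B, A) @ d
print(np.allclose(lhs, rhs))  # True
```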

System Model
A MIMO system with N_t transmitters and N_r receivers is considered. In this system, the information symbols x(k) are transmitted through the time-varying (N_r × N_t) MIMO channel H(k) at time k. Taking into consideration a carrier frequency offset δ_f, the (N_r × 1) vector of received signals y(k) can be expressed as

y(k) = e^{j2πδ_f kT_s} H(k)x(k) + b(k),

where (i) y is an (N_r × 1) vector representing the baseband received symbols without any time oversampling, (ii) x is an (N_t × 1) vector representing the transmitted signals (sources); the sources are independent, identically distributed (i.i.d.), mutually independent, with zero mean and unit variance, (iii) b is an (N_r × 1) vector representing the additive Gaussian noise, (iv) H is an (N_r × N_t) unknown, full-column-rank, instantaneous linear and time-selective MIMO channel, and (v) 1/T_s is the baud rate of the transmitter.
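As an illustration, the received-signal model can be simulated as follows (a sketch only: the sizes N_t = 2, N_r = 4, the offset δ_f T_s = 10⁻³, the noise level, and the per-sample Rayleigh channel draw are assumptions for the example, not the paper's simulation settings):

```python
import numpy as np

Nt, Nr, N, Ts, df = 2, 4, 1000, 1.0, 1e-3  # assumed example values

rng = np.random.default_rng(1)
# i.i.d. zero-mean, unit-variance 4-QAM sources x(k).
x = (rng.choice([-1, 1], (Nt, N)) + 1j * rng.choice([-1, 1], (Nt, N))) / np.sqrt(2)
# Additive Gaussian noise b(k).
b = 0.01 * (rng.standard_normal((Nr, N)) + 1j * rng.standard_normal((Nr, N)))

y = np.empty((Nr, N), dtype=complex)
for k in range(N):
    # Time-varying channel H(k), redrawn at every k for simplicity.
    H_k = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)
    # y(k) = exp(j 2 pi df k Ts) H(k) x(k) + b(k)
    y[:, k] = np.exp(2j * np.pi * df * k * Ts) * (H_k @ x[:, k]) + b[:, k]
```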

The AMMA: An Analytical BSS Method to Minimize the MM Cost Function
3.1. Hypotheses. The BSS can be performed under the following hypotheses.
(i) N r ≥ N t .
(ii) H has i.i.d. complex components; unless H is unitary, the received signals y(k) must be prewhitened before applying the BSS.
(iii) The noise is additive, white, and Gaussian, with zero mean and covariance matrix C_b = E[bb^H] = σ_b² I_{N_r}.
(iv) The sources are zero-mean discrete-time sequences with covariance matrix C_x = E[xx^H] = I_{N_t}; they must be mutually independent at a given time and identically distributed.

BSS's Principle.
The BSS objective is to find an (N_r × N_t) matrix W(k) such that

z(k) = W^H(k)y(k),

where the output vector sequence z(k) contains accurate estimates of the N_t source signals x(k). Each column of W(k) allows one source to be recovered. So, all the columns of W(k) must be orthogonal to one another, in order to obtain independent sources. The channel matrix H must be unitary; otherwise the received signals have to be prewhitened. Generally H is not unitary, so we consider a prewhitening operation, which also reduces the dimension of the observations from N_r to N_t. An underscore is used here to denote prewhitened variables. Thus, let y(k) = F^H y(k) be the prewhitened received signals, where F is an (N_r × N_t) prefilter. As in [8], the prewhitening algorithm of [12] is used. The initial problem of finding W(k) is then replaced by finding an (N_t × N_t) matrix T(k):

z(k) = T^H(k)y(k),

where W = FT and Ξ = H^H(k)FT is the global separation matrix. Unfortunately, the calculated matrix T(k) separates the sources only up to a possible permutation of z(k) and an arbitrary phase on each source signal. The different BSS cost functions, described in Section 3.3, are indeed minimized under any permutation and any rotation of the sources. Thus, after separation, the separator outputs present the following form:

z(k) = ΦΓx(k),

where Φ is a diagonal matrix and Γ is a permutation matrix, which represent the arbitrary phase and the permutation, respectively.
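The prewhitening step can be sketched in batch form as follows (a minimal sketch based on the eigendecomposition of the sample covariance; the paper itself uses the recursive algorithm of [12], and the noiseless setting here is an assumption for the example):

```python
import numpy as np

rng = np.random.default_rng(2)
Nt, Nr, N = 2, 4, 5000
x = (rng.choice([-1, 1], (Nt, N)) + 1j * rng.choice([-1, 1], (Nt, N))) / np.sqrt(2)
H = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)
y = H @ x                                   # noiseless mixture for simplicity

Ry = y @ y.conj().T / N                     # sample covariance of the observations
w, U = np.linalg.eigh(Ry)                   # eigenvalues in ascending order
F = U[:, -Nt:] / np.sqrt(w[-Nt:])           # (Nr x Nt) prewhitening prefilter
yw = F.conj().T @ y                         # prewhitened, dimension-reduced signals

Cw = yw @ yw.conj().T / N                   # should be (close to) the identity
print(np.allclose(Cw, np.eye(Nt), atol=1e-8))  # True
```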
In order to find T(k), several cost functions can be used. The next section recalls the well-known Constant Modulus (CM) and Multimodulus (MM) cost functions.

3.3. Cost Functions
3.3.1. CM Cost Function. Many telecommunication signals have a constant modulus (PSK, 4-QAM), generally normalized to one. If the samples of one source are plotted in the complex plane, every sample lies on the unit circle, but this is no longer true for a linear combination of several sources. It is this property that the CM cost function exploits. Thus, the algorithms using the CM cost function make use of the sources' constant-modulus property:

J_CM(T) = Σ_{n=1}^{N_t} E[(|z_n(k)|² − R)²],

where t_n represents the nth row of T and R = E[|x_n(k)|⁴]/E[|x_n(k)|²] is the dispersion constant. This criterion forces the BSS outputs to lie on a circle with radius R.

3.3.2. MM Cost Function.
This cost function is composed of two parts: one for the real part (R) and one for the imaginary part (I) of the equalizer output z(k):

J_MM(T) = Σ_{n=1}^{N_t} E[(R²(z_n(k)) − R)²] + E[(I²(z_n(k)) − R)²].

The error estimation for the real and imaginary parts is done jointly, so the cost function implicitly uses the phase of the equalizer output. Thus, carrier phase recovery is accomplished together with blind equalization, whereas the equalizer output constellation obtained with the CM cost function suffers from an arbitrary phase rotation. For this reason, the MM cost function is the most interesting one and is used later in this paper.

3.3.3. SCM Cost Function.
The SCM cost function [5] consists in projecting the equalizer outputs on one dimension only (either the real or the imaginary part). This cost function presents a lower computational complexity than the CM and MM cost functions:

J_SCM(T) = Σ_{n=1}^{N_t} E[(R²(z_n(k)) − R)²].

3.4. Implementation of the Cost Functions.
Several methods exist to implement cost functions. Among them are stochastic gradient descent and analytical methods.
The algorithms using the stochastic gradient are adaptive algorithms which are slow to converge. Furthermore, in varying environments, the estimates tend to be less accurate because of misadjustment. The algorithms using a stochastic gradient to minimize the CM and MM cost functions are named the Constant Modulus Algorithm (CMA) [3,4] and the MultiModulus Algorithm (MMA) [1], respectively.
Analytical methods exist in both adaptive and batch forms. In the adaptive form, the algorithm converges quickly, so it is an interesting option in varying environments.
Our choice is to use, for varying environments, an adaptive analytical method to minimize the MM cost function. So, a new algorithm named AMMA, which uses this association, is proposed, once a batch analysis has been performed.

3.5. The Analytical Multimodulus Algorithm.
The following cost function should be minimized:

J_AMMA(T) = Σ_{l=1}^{N_t} Σ_{k=1}^{N} (R²(z_l(k)) − R)² + (I²(z_l(k)) − R)².

In order to bring out the real and imaginary parts of z_l(k), this last term is written as

R(z_l(k)) = t_l^T y(k),    I(z_l(k)) = t_l^T G y(k),

where t_l and y(k) now denote the (2N_t × 1) real vectors obtained by stacking the real and imaginary parts of the lth column of T and of the prewhitened received vector, respectively, and where

G = [0_{N_t}  −I_{N_t}; I_{N_t}  0_{N_t}].

The matrix G has the following properties: G^T = −G and G^T G = I_{2N_t}. Using the Kronecker product properties and taking into account [9], R²(z_l(k)) and I²(z_l(k)) can be written as

R²(z_l(k)) = (y(k) ⊗ y(k))^T (t_l ⊗ t_l),    I²(z_l(k)) = (ỹ(k) ⊗ ỹ(k))^T (t_l ⊗ t_l),

with ỹ(k) = Gy(k). The N rows of (y(k) ⊗ y(k))^T and (ỹ(k) ⊗ ỹ(k))^T are then, respectively, stacked in the N × (2N_t)² matrices P and P̃.
Using the Khatri-Rao product (∘), P and P̃ can be written as P = (Y ∘ Y)^T and P̃ = (Ỹ ∘ Ỹ)^T. Let us define the ((2N_t)² × 1) vectors d_l = t_l ⊗ t_l and 1 = (1, . . ., 1)^T; the cost function is then given by

J_AMMA = Σ_{l=1}^{N_t} ‖P d_l − R1‖² + ‖P̃ d_l − R1‖².

This linear system can be rewritten by using a QR decomposition [7,8]. We have Q, an N_s × N_s orthogonal matrix, such that Q^T 1 = √N_s (1, 0, . . ., 0)^T. After the QR factorization of (1 P) and (1 P̃) and the application of Q^T to them, we obtain

Q^T (1 P) = [√N_s  γ^T; 0  O],    Q^T (1 P̃) = [√N_s  γ̃^T; 0  Õ],

with 1 the column vector composed of N_s ones, 0 a column vector composed of N_s − 1 zeros, γ^T and γ̃^T two 1 × (2N_t)² row vectors, and O and Õ two (N_s − 1) × (2N_t)² matrices. The cost function becomes

J_AMMA = Σ_{l=1}^{N_t} |γ^T d_l − R|² + |γ̃^T d_l − R|² + ‖O d_l‖² + ‖Õ d_l‖²,

where C = O^T O and C̃ = Õ^T Õ. By using the definition of the estimated covariance matrices of y(k) and ỹ(k), named, respectively, R_y and R_ỹ, and by using the properties of the Kronecker product, γ^T d_l and γ̃^T d_l can be related to the quadratic forms t_l^T R_y t_l and t_l^T R_ỹ t_l. Thus, the criterion J_AMMA is minimized when the vectors d_l are eigenvectors of C + C̃ associated with its smallest eigenvalues. However, these vectors do not necessarily have a Kronecker structure. Thus, in order to obtain vectors t_l in the image with a Kronecker structure, we search for an N_t × N_t full-rank matrix Λ that relates the two bases, so that

T ∘ T = DΛ,

with T = (t_1, . . ., t_{N_t}) and D = (d_1, . . ., d_{N_t}) the 2N_t × N_t and (2N_t)² × N_t matrices, respectively. Using the properties of the Kronecker product, this minimization problem can be rewritten as a joint diagonalization problem. Please note that if the matrix T were square, this minimization would be equivalent to a joint diagonalization. In order to obtain the matrix T, let us suppose at first that it is indeed square. Thus, with the aid of a joint diagonalization of the matrices D_l = vec^{-1}(d_l), a 2N_t × 2N_t matrix T′ is obtained. This diagonalization can be done by using the algorithm described in [13]. The columns of the matrix T′ are dependent; the 2N_t × N_t matrix T is extracted by keeping the linearly independent columns of T′. Now, this algorithm is implemented with an adaptive method. So, we must estimate in an adaptive manner C, C̃, and the eigenvectors of C + C̃ associated with the smallest eigenvalues. The methods used are described in the next section. Then the adaptive tracking of the minor subspace of C + C̃ is performed thanks to the NOOJA algorithm. Finally, the adaptive update of the joint diagonalization is realized.

The Adaptive Method. The function d_l^T(C + C̃)d_l will now be minimized iteratively at each time k. The matrices C and C̃ were defined over a batch of N samples; in this adaptive method they are computed over an exponential window (λ-scaling). The details are shown in [8].
The method used to estimate the matrices C and C̃ iteratively is a generalization of the Van der Veen method [9]:

C(k) = λC(k − 1) + c(k)c^T(k),    C̃(k) = λC̃(k − 1) + c̃(k)c̃^T(k),

with 0 < λ < 1 the forgetting factor; C(k) and C̃(k) are then, respectively, the estimations of the autocorrelation matrices of the signals c(k) and c̃(k). We saw that the solution of this minimization lies in the image of C + C̃. In order to obtain these vectors in an adaptive manner, a subspace tracking algorithm is used, associated with an adaptive joint diagonalization.
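The exponential-window (λ-scaling) estimate can be sketched as follows (λ = 0.99 and the Gaussian stand-in for c(k) are assumptions for illustration):

```python
import numpy as np

lam, dim = 0.99, 4                 # forgetting factor and vector size (assumed values)
rng = np.random.default_rng(4)
C = np.zeros((dim, dim))
for k in range(5000):
    c = rng.standard_normal(dim)   # stands in for the paper's vector c(k)
    C = lam * C + np.outer(c, c)   # exponential-window (lambda-scaling) update

C_hat = (1 - lam) * C              # normalized estimate of E[c c^T] (here the identity)
```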
Subspace Tracking. Used alone, the Normalized Orthogonal Oja (NOOJA) algorithm of [14] tracks the minor subspace spanned by the eigenvectors relating to the smallest eigenvalues. But, when associated with an adaptive joint diagonalization, as described in the next section, the NOOJA algorithm tracks the minor subspace spanned by the eigenvectors in the image of C + C̃ relating to the smallest eigenvalues different from 0. This algorithm adaptively extracts the minor subspace V of the autocorrelation matrix R of the input signal r(k) by maximizing a cost function of the form J(V) = −tr(V^T R V) under the constraint V^T V = I. In our problem, the minor subspace of C + C̃ is searched. So, in order to use the NOOJA algorithm, C + C̃ should be expressed as C + C̃ = E[cc^T]. But since the function f (cf. (25) and (26)) is not bilinear, the vector c is unknown. Instead of searching the minor subspace of C + C̃, the minor subspaces of both C and C̃ will be searched. The matrices C and C̃ are similar; they therefore have the same image. In order to find the minor subspace D = (d_1, . . ., d_{N_t}) spanned by the eigenvectors relating to the smallest nonzero eigenvalues of both C and C̃, we propose to maximize the following cost function:

J(D) = −tr(D^T C D) − tr(D^T C̃ D),    with D^T D = I.

Since C and C̃ have the same image, the first term of the cost function is sufficient to obtain convergence with a constant channel. Using only the matrix C is equivalent to minimizing the Simplified CM cost function [5] in an analytical way. This simplification reduces the complexity of the adaptive-AMMA, and we name the resulting algorithm the adaptive Analytical Simplified CMA (ASCMA). Please note that the ASCMA batch analysis is the same as the AMMA batch analysis performed in Section 3.5.1; however, the study then involves only C rather than C + C̃.
The maximization of J(D) may be achieved by using a gradient technique:

D′(k + 1) = D(k) − β(k)[C(k) + C̃(k)]D(k),

where β(k) is the step size. The time instant k is removed below whenever it simplifies the equations. By using D^T(k)D(k) = I, the cost function J(D(k + 1)) can be simplified; taking its gradient with respect to β(k) and setting it to zero yields the optimal step size β_opt(k). Keeping in mind that D(k) is orthogonal at each iteration, that is, D^T(k)D(k) = I, and replacing C(k) and C̃(k) by the instantaneous estimates c(k)c^T(k) and c̃(k)c̃^T(k), respectively, a closed-form expression of β_opt(k) is obtained. The updating of D(k + 1) is then performed through a Gram-Schmidt orthogonalization of D′(k + 1). The obtained algorithm is described in Algorithm 1.
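A simplified, batch-style sketch of this idea — a gradient step on −tr(D^T M D) followed by Gram-Schmidt re-orthonormalization — is given below; it uses a fixed step size instead of the optimal β_opt(k) and a synthetic matrix M standing in for C + C̃ with a known spectrum, so it is not the NOOJA recursion itself:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 6, 2
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]
# Symmetric test matrix with a known spectrum; its minor subspace is
# spanned by the first two columns of Q.
M = Q @ np.diag([0.1, 0.2, 5.0, 6.0, 7.0, 8.0]) @ Q.T

beta = 0.1                                   # fixed step size for the sketch
D = np.linalg.qr(rng.standard_normal((n, p)))[0]
for _ in range(500):
    D = D - beta * (M @ D)                   # gradient step on -tr(D^T M D)
    D, _ = np.linalg.qr(D)                   # Gram-Schmidt re-orthonormalization

# Compare the projectors onto span(D) and onto the true minor subspace.
err = np.linalg.norm(D @ D.T - Q[:, :p] @ Q[:, :p].T)
```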
Once the eigenvectors relating to the smallest eigenvalues of both C(k) and C̃(k) are obtained, the constraint d_l = t_l ⊗ t_l must be satisfied. In other words, d_l must have a Kronecker structure and must be in the image of C(k) + C̃(k).
Adaptive Update of the Joint Diagonalization. The method used to satisfy the constraint d_l = t_l ⊗ t_l is the one of [9]. T ∘ T is computed and regarded as the current estimate of the subspace basis (D = T ∘ T). Using this basis, the subspace update is performed with the Normalized Orthogonal Oja algorithm, giving D. The last step of this algorithm is the mapping of each column d_l of D to a Kronecker product t_l ⊗ t_l. To this end, a power iteration [15] is applied to D_l = vec^{-1}(d_l):

t_l ← D_l t_l / ‖D_l t_l‖.

At the next iteration, the matrix D = T ∘ T, which lies in the image and has a Kronecker structure, is obtained. Then, D is transmitted to the NOOJA algorithm, which constrains the vectors d_l obtained at the NOOJA output to lie in the image of C + C̃ and to have a Kronecker structure.
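The rank-one mapping by power iteration can be sketched as follows (real-valued and with a noiseless d = t ⊗ t, for illustration; t is recovered up to a sign):

```python
import numpy as np

rng = np.random.default_rng(6)
t_true = rng.standard_normal(4)
d = np.kron(t_true, t_true)        # a vector with an exact Kronecker structure
D_l = d.reshape(4, 4)              # vec^{-1}(d) = t t^T (rank one and symmetric)

t = rng.standard_normal(4)         # arbitrary initialization
for _ in range(20):
    t = D_l @ t                    # power iteration step
    t /= np.linalg.norm(t)         # normalization

t *= np.sqrt(np.trace(D_l))        # recover the scale from tr(t t^T) = ||t||^2
# t now equals t_true up to a sign ambiguity.
```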

Algorithm 1: Adaptive tracking of the nullspace of C and C̃ by using NOOJA.
Estimated Symbols. In order to obtain the transmitted symbols, the complex matrix T is rebuilt from its real counterpart:

t_lm = t_lm + j t_l(m+N_t),    l, m ∈ {1, . . ., N_t}.

In order to prevent the extraction of the same source several times, T is reconditioned after each update. The technique used here is the computation of a singular value decomposition of T, T = Σ_j σ_j u_j v_j^H; the singular values of T which are smaller than 0.5 are then replaced by 1. Then, the matrix T is applied to the prewhitened received signals:

z(k) = T^H y(k).

The suggested adaptive-AMMA algorithm, initialized with F, T, p = 0, p̃ = 0, and α = 0, and updating the prewhitening filter F from y(k) at each time k = 1, 2, . . . by using [12], is described in Algorithm 2.
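The reconditioning step can be sketched as follows (illustrative: the function name `recondition` and the 2 × 2 test matrix are ours; the 0.5 threshold is the paper's):

```python
import numpy as np

def recondition(T, floor=0.5):
    # Replace the singular values of T smaller than `floor` by 1, so that two
    # columns of T cannot collapse onto the same source.
    U, s, Vh = np.linalg.svd(T, full_matrices=False)
    return U @ np.diag(np.where(s < floor, 1.0, s)) @ Vh

T = np.array([[1.0, 1.0], [1.0, 1.001]])   # nearly singular separator estimate
T_rec = recondition(T)
print(np.linalg.cond(T), np.linalg.cond(T_rec))  # conditioning improves drastically
```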
International Journal of Digital Multimedia Broadcasting

Simulation Results
The channel used in the simulations is a time-selective channel; that is, it is time varying but flat in frequency. The transmitted symbols, drawn from a 4-QAM constellation, are sent through a MIMO Rayleigh fading channel. Two transmitters and four receivers are used (N_t = 2 and N_r = 4). The performances of the proposed adaptive-AMMA are compared with those of the adaptive-ASCMA, the adaptive-ACMA, and the MMA, in terms of Bit Error Rate (BER) and Signal-to-Interference-plus-Noise Ratio (SINR).
In the simulation results of Figure 1, we considered the case of a carrier frequency offset (δ_f T_s = 10⁻³) with SNR = 30 dB. The equalizer output constellation of the adaptive-ACMA (Figure 1(e)) spins because of the carrier frequency offset. Normally, the MMA (Figure 1(f)) allows the carrier frequency offset to be tracked, but in this case the offset is too high to obtain a good tracking. Only the proposed adaptive-AMMA and adaptive-ASCMA can remove the frequency offset (Figures 1(c) and 1(d)), without the use of a carrier tracking loop.
Figure 2 shows the average BER as a function of the SNR (the term "average" meaning here a time average), for two sources and 1000 runs, each run presenting a different randomly varying channel and a different noise sequence. The value of the Doppler shift f_d T_s is 10⁻³. The simulation here was realized without any carrier frequency offset. The lowest BER in Figure 2 is obtained with the proposed adaptive-AMMA algorithm, followed by the adaptive-ASCMA, then by the adaptive-ACMA; the highest BER is obtained with the MMA. So the proposed adaptive-AMMA algorithm obtains a 4 dB gain in SNR compared with the adaptive-ACMA at a BER of 10⁻⁴.
Figure 3 illustrates the convergence speed of the adaptive-AMMA. The path gain of the channel and its estimate obtained by the adaptive-AMMA are represented on the same figure. We can notice that the estimate of the adaptive-AMMA follows the variations of the channel.
From the results of Figure 4, we can see that the performances of the proposed adaptive-AMMA and adaptive-ASCMA are very close in terms of residual SINR. Moreover, these two algorithms achieve a higher residual SINR than the adaptive-ACMA and the MMA.

Conclusion
In this paper, we have proposed two new BSS algorithms, named adaptive-AMMA and adaptive-ASCMA, built on the Multimodulus and the Simplified Constant Modulus criteria, respectively, together with an analytical processing method. The second algorithm is a simplified version of the adaptive-AMMA and achieves performances close to those of the adaptive-AMMA. These algorithms obtain better results in terms of BER and SINR than the adaptive-ACMA and the MMA. In addition to the blind separation, they can perform carrier phase recovery simultaneously, unlike the adaptive-ACMA.
Appendix

Afterward, let us define the nullspace and the image of C + C̃ through the direct-sum decomposition

R^{(2N_t)²} = ker(C + C̃) ⊕ image(C + C̃),

where ⊕ denotes the direct sum; that is, ker(C + C̃) ∩ image(C + C̃) = {0}, and ker(C + C̃) and image(C + C̃) are subspaces of R^{(2N_t)²}.
Let us now assume, for the sake of simplicity, that N_t = 2 and N_r ≥ 2, so that dim(R^{(2N_t)²}) = 16. Afterward, the following affirmations will be proven. The ê_i are linearly independent and verify (C + C̃)ê_i ≠ 0, so that Property 1 can be verified. The ẽ_i and the ê_i are linearly independent, so {ẽ_1, . . ., ẽ_6, ê_1, . . ., ê_10} is a basis of R^16. Now, the following theorem can be proved. So, X = vec^{-1}(x) is a (4 × 4) skew-symmetric matrix; X and vec^{-1}(t ⊗ t), with t = (t_1, t_2, t_3, t_4)^T ∈ R^4, are written explicitly below.

Notations
The following notations are adopted:
*: complex conjugation;
T: matrix or vector transpose;
H: matrix or vector complex conjugate (Hermitian) transpose;
R(·): real part of a complex quantity;
I(·): imaginary part of a complex quantity;
∘: Khatri-Rao product;
‖ · ‖_F: Frobenius norm.

3.5.1. The Batch Analysis. Let N denote a fixed number of received signals; the N rows of (y(k) ⊗ y(k))^T and (ỹ(k) ⊗ ỹ(k))^T are then, respectively, stacked in the N × (2N_t)² matrices P and P̃.

The terms |γ^T d_l − R|² and |γ̃^T d_l − R|² allow the trivial solution d_l = 0 to be avoided. By squaring (14), the expressions of the vectors γ and γ̃ and of the matrices C = O^T O and C̃ = Õ^T Õ are obtained, and the minimization drives ‖O d_l‖² and ‖Õ d_l‖² toward 0. Afterwards, we consider γ^T d_l = R and γ̃^T d_l = R as constraints. Since the received signals are prewhitened, R_y = R_ỹ = I; thus, and up to a scaling which is not important, the optimization problem becomes

t_l = arg min_{d_l} d_l^T [C(k) + C̃(k)] d_l.

In the appendix, it is shown that the vectors d_l, for all l ∈ {1, . . ., N_t}, which minimize the function Σ_{l=1}^{N_t} d_l^T(C + C̃)d_l under the hypotheses d_l = t_l ⊗ t_l and γ^T d_l = R, are in the image of C + C̃ (cf. Theorem 2 and Lemma 2). Moreover, the minimization of Σ_{l=1}^{N_t} d_l^T(C + C̃)d_l is equivalent to the minimization of each d_l^T(C + C̃)d_l separately (cf. Theorem 1 and Lemma 1). Now, we will see how to create vectors d_l that present a Kronecker structure, in the image of C + C̃.

3.5.2. The Batch Method. The vectors d_l which minimize d_l^T(C + C̃)d_l are the eigenvectors associated with the smallest eigenvalues.

Figure 3: Gain of paths h_11 and h_22 without carrier frequency offset versus symbol number. N_t = 2, δ_f T_s = 0, f_d T_s = 0.0033, and SNR = 30 dB.

Definition 1. The vectors d_l which minimize d_l^T(C + C̃)d_l are in the nullspace of C + C̃, but the sole vector of the nullspace verifying the constraint d_l = t_l ⊗ t_l is 0_{R^16}, a trivial solution to be avoided. However, the vectors d_l are in the image of C + C̃. By construction, the sole dependences obtained on the columns of C + C̃ are C_2 = C_5, C_3 = C_9, C_4 = C_13, C_7 = C_10, C_8 = C_14, and C_12 = C_15, where C_n represents the nth column of C + C̃. So, dim[ker(C + C̃)] = 6. A basis B_ker of the nullspace is defined as

B_ker = {ẽ_1, ẽ_2, ẽ_3, ẽ_4, ẽ_5, ẽ_6} = {e_2 − e_5, e_3 − e_9, e_4 − e_13, e_7 − e_10, e_8 − e_14, e_12 − e_15},   (A.5)

where the e_i, i ∈ {1, . . ., 16}, are the vectors of the standard basis of R^16. The ẽ_i are linearly independent and verify (C + C̃)ẽ_i = 0. Thanks to the rank-nullity theorem,

rank(C + C̃) + dim[ker(C + C̃)] = dim(R^16),   (A.6)

and since the dimension of ker(C + C̃) is equal to 6, the dimension of image(C + C̃) is equal to 10, and B_im is a basis of the image:

B_im = {ê_1, . . ., ê_10} = {e_1, e_6, e_11, e_16, e_2 + e_5, e_3 + e_9, e_4 + e_13, e_7 + e_10, e_8 + e_14, e_12 + e_15}.   (A.7)

1: vector of all ones;
0_N: an N × N matrix with zero components;
I_N: the N × N identity matrix;
E[·]: expectation operator;
diag(a): a diagonal matrix constructed from a vector a;
vec(A): stacking of the columns of A into a vector.

vec^{-1}(t ⊗ t) = t t^T =
[ t_1²     t_1 t_2   t_1 t_3   t_1 t_4
  t_1 t_2  t_2²      t_2 t_3   t_2 t_4
  t_1 t_3  t_2 t_3   t_3²      t_3 t_4
  t_1 t_4  t_2 t_4   t_3 t_4   t_4²   ]

is a (4 × 4) symmetric matrix and its diagonal values are nonnegative, while X is a skew-symmetric matrix. So, the vectors x do not have a Kronecker structure, except 0_{R^16}. So, the vectors d_l are not in the nullspace of C + C̃, but they are in the image of C + C̃. Using Property 1, we deduce the following lemma. Let us consider E, the set of vectors in R^16 which have a Kronecker structure: E = {p ∈ R^16 / p = t ⊗ t, t ∈ R^4}; then E ⊂ image(C + C̃).