Decoupled Independent Vector Analysis Algorithm for Convolutive Blind Source Separation without Orthogonality Constraint on the Demixing Matrices

In this paper, we consider the problem of convolutive blind source separation in the frequency domain and introduce a solution within the independent vector analysis (IVA) framework. IVA exploits both the statistical independence of different sources within each frequency bin and the statistical dependence of the same source across different frequency bins. However, most previous works impose an orthogonality constraint on the rows of each separation matrix, which may degrade separation performance. In this work, we propose a nonorthogonal IVA algorithm based on the decoupled relative Newton method. The proposed algorithm updates the separation matrices row by row and, unlike deflation-based separation algorithms, does not accumulate separation errors. Simulation results demonstrate the superior convergence behavior and separation performance of the proposed algorithm.


Introduction
Blind source separation (BSS) has been widely researched over the last decades because it can estimate source signals from their mixtures (the observed sensor signals) without knowledge of either the mixing process or the sources. Applications include crosstalk separation in telecommunications [1], speech enhancement [2], and biomedical signal processing [3]. From the point of view of the mixing model, BSS methods can be divided into two categories: instantaneous mixing and convolutive mixing. For instantaneous mixing, the most popular and promising approach is independent component analysis (ICA) [4][5][6]. ICA assumes that the source signals are mutually statistically independent and that at most one source is Gaussian distributed. Based on this assumption, separation criteria such as non-Gaussianity maximization [7] and negentropy maximization [6] have been employed. However, in many real applications, such as wireless communications, the instantaneous mixing model is invalid: the sources undergo propagation delays and reverberation, which results in convolutive mixing. In recent decades, various approaches have been proposed for this case, and they fall into two categories: time domain [8,9] and frequency domain [10][11][12] algorithms. Time domain methods are mainly inspired by existing blind deconvolution methods, and they usually require intensive computation because the filter coefficients are coupled to each other. Frequency domain methods reduce this computational load. Applying the Fourier transform to the time domain convolutive observations converts the convolutive mixture into multiple independent instantaneous (linear) mixtures, one per frequency bin, so that separation methods for instantaneous mixing can be applied in every frequency bin. However, instantaneous BSS methods share a common problem: the order of the estimated signals is arbitrary. Therefore, further postprocessing is necessary to correct the permutation at each frequency bin before the separated signals can be properly reconstructed in the time domain. Extensive work [13,14] has been done on the permutation problem, but most of these permutation correction methods do not perform consistently well.
Independent vector analysis (IVA), an extension of ICA, can solve the frequency domain BSS problem efficiently and normally requires no bin-wise permutation correction postprocessing [10][11][12]. IVA algorithms replace the univariate score functions of ICA with multivariate score functions. In this way, the data in different frequency bins are separated jointly rather than independently, and the permutation problem is mitigated by exploiting the dependences across frequency bins. Different algorithms employ different multivariate prior distributions to model the interfrequency dependences of individual sources, and the corresponding nonlinear score functions are derived from them. However, most of these algorithms impose an orthogonality constraint on the rows of each separation matrix, which may degrade separation performance. In addition, the most popular IVA algorithms are based on gradient descent because of its simplicity [14,15], but they converge slowly. Several Newton-based algorithms have been proposed to speed up convergence [12,16]. However, as shown in this paper, the performance of these algorithms may be unsatisfactory in certain scenarios.
In this paper, we propose a nonorthogonal IVA algorithm based on the decoupled relative Newton method. The proposed algorithm updates the separation matrices row by row and, unlike deflation-based separation algorithms, does not accumulate separation errors. For real-valued data, it has been shown that the decoupled IVA algorithm may converge faster than the vector gradient descent and Newton algorithms [17]. This paper extends the work in [17] to the complex domain. To avoid singular solutions, some approximations are made in the derivation of the algorithm, and the resulting algorithm is applied to convolutive blind source separation. Simulation results confirm its superior performance.
The rest of this paper is organized as follows. The problem of IVA for frequency domain blind source separation is presented in Section 2. The proposed algorithm is introduced in Section 3. The performance of the proposed algorithm is evaluated by simulations in Section 4, and conclusions are drawn in Section 5.

Frequency Domain IVA
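Let there be $N$ sensors and $N$ independent sources; the convolutive mixture at the $i$th sensor is

$$x_i(t) = \sum_{j=1}^{N} \sum_{l=0}^{L} h_{ij}(l)\, s_j(t-l), \quad i = 1, \dots, N, \tag{1}$$

where $s_j(t)$ is the $j$th source in the time domain, $h_{ij}(l)$ is the impulse response of the channel linking the $j$th source and the $i$th sensor, and $L$ is the channel order. The signals are transformed to the frequency domain using the short-time Fourier transform (STFT): a sliding window is used to compute the discrete Fourier transform, which yields a time-frequency representation of the observed signals. Using the STFT (and assuming the window is long compared with the channel order), the sensor observation vector at time block $t$ and frequency bin $k$ becomes

$$\mathbf{x}^{[k]}_t = \mathbf{A}^{[k]} \mathbf{s}^{[k]}_t, \quad k = 1, \dots, K, \tag{2}$$

where $\mathbf{s}^{[k]}_t = [s^{[k]}_{1,t}, s^{[k]}_{2,t}, \dots, s^{[k]}_{N,t}]^T$ is the source vector (superscript $T$ denotes the transpose and superscript $H$ the conjugate transpose) and $\mathbf{A}^{[k]} \in \mathbb{C}^{N \times N}$ is the unknown nonsingular mixing matrix. The $n$th source component vector (SCV) can be written as $\mathbf{s}_{n,t} = [s^{[1]}_{n,t}, s^{[2]}_{n,t}, \dots, s^{[K]}_{n,t}]^T$, which is independent of all the other SCVs. The probability density function of the concatenated source vector therefore factorizes as $p(\mathbf{s}_{1,t}, \mathbf{s}_{2,t}, \dots, \mathbf{s}_{N,t}) = \prod_{n=1}^{N} p(\mathbf{s}_{n,t})$. IVA finds $K$ demixing matrices $\mathbf{W}^{[k]}$ and the corresponding source estimates $\mathbf{y}^{[k]}_t = \mathbf{W}^{[k]} \mathbf{x}^{[k]}_t$ for each frequency bin $k$. The $n$th estimated source of the $k$th frequency bin at time block $t$ is $y^{[k]}_{n,t} = (\mathbf{w}^{[k]}_n)^H \mathbf{x}^{[k]}_t$, where $(\mathbf{w}^{[k]}_n)^H$ is the $n$th row vector of $\mathbf{W}^{[k]}$, and the $n$th estimated SCV is $\mathbf{y}_{n,t} = [y^{[1]}_{n,t}, \dots, y^{[K]}_{n,t}]^T$.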
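To illustrate why the frequency domain model holds, the following NumPy sketch mixes random sources through random FIR channels as in the time domain model (1) and checks that, for a single block zero-padded to at least T + L samples, the DFT turns the convolutive mixture into one instantaneous mixture per frequency bin exactly. With a sliding STFT window the same relation holds only approximately. The helper `convolutive_mix`, the sizes, and the random channels are illustrative assumptions, not the paper's simulation setup.

```python
import numpy as np


def convolutive_mix(s, h):
    """Mix sources s (N x T) through FIR channels h (N x N x (L+1)) as in (1).

    Returns the full linear convolution, shape N x (T + L).
    """
    n_sens, n_src, taps = h.shape
    x = np.zeros((n_sens, s.shape[1] + taps - 1))
    for i in range(n_sens):
        for j in range(n_src):
            x[i] += np.convolve(h[i, j], s[j])  # accumulate h_ij * s_j
    return x


rng = np.random.default_rng(0)
N, T, L = 2, 64, 3                       # sources/sensors, samples, channel order
s = rng.standard_normal((N, T))
h = rng.standard_normal((N, N, L + 1))   # hypothetical random impulse responses
x = convolutive_mix(s, h)

nfft = 128                               # >= T + L, so there is no circular wrap-around
X = np.fft.fft(x, nfft, axis=1)          # observations in every bin k
S = np.fft.fft(s, nfft, axis=1)          # sources in every bin k
A = np.fft.fft(h, nfft, axis=2)          # A[:, :, k] is the mixing matrix of bin k
X_model = np.einsum("ijk,jk->ik", A, S)  # x^{[k]} = A^{[k]} s^{[k]} for all k
print(np.allclose(X, X_model))           # expected: True
```

Zero-padding is what makes the product exact here; the STFT used in practice trades this exactness for time resolution.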

Proposed Algorithm
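3.1. Decoupled Relative Learning.
In decoupled relative learning, the demixing matrix $\mathbf{W}^{[k]}$ is updated through repeated left multiplications by matrices of the form [5]

$$\mathbf{W}^{[k]}_{\text{new}} = (\mathbf{I} + \mathbf{u}\mathbf{k}^H)\, \mathbf{W}^{[k]} \tag{3}$$

until convergence, where $\mathbf{I}$ is the $N \times N$ identity matrix and $\mathbf{u}$ and $\mathbf{k}$ are two $N \times 1$ vectors to be optimized (the bold vector $\mathbf{k}$ is distinct from the frequency bin index $k$). Since the eigenvalues of $\mathbf{I} + \mathbf{u}\mathbf{k}^H$ are 1 with multiplicity $N-1$ and $1 + \mathbf{k}^H\mathbf{u}$, (3) is a rank-1 update that preserves nonsingularity as long as $1 + \mathbf{k}^H\mathbf{u} \neq 0$. Thus a nonsingular matrix can be transformed into any other nonsingular matrix within $N$ steps of (3).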
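The eigenvalue statement above can be checked numerically. The NumPy sketch below, with arbitrary random complex vectors `u`, `k` and matrix `W` chosen only for illustration, confirms that $\mathbf{I} + \mathbf{u}\mathbf{k}^H$ has $N-1$ unit eigenvalues plus the eigenvalue $1 + \mathbf{k}^H\mathbf{u}$, and that it scales $\det \mathbf{W}$ by exactly that factor.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4
u = rng.standard_normal(N) + 1j * rng.standard_normal(N)
k = rng.standard_normal(N) + 1j * rng.standard_normal(N)
W = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))

# Relative rank-1 update matrix I + u k^H from (3)
M = np.eye(N) + np.outer(u, k.conj())

lam = 1 + np.vdot(k, u)  # 1 + k^H u (np.vdot conjugates its first argument)
eigs = np.linalg.eigvals(M)

# N - 1 eigenvalues equal 1; the remaining one equals 1 + k^H u
print(np.sum(np.isclose(eigs, 1)), np.any(np.isclose(eigs, lam)))
# det(M W) = (1 + k^H u) det(W): W stays nonsingular whenever 1 + k^H u != 0
print(np.isclose(np.linalg.det(M @ W), lam * np.linalg.det(W)))
```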
Decoupled relative learning transforms the matrix optimization problem into a series of vector optimization problems, which allows the algorithm to be designed more flexibly than the relative (natural) gradient algorithm. We consider only one special kind of relative rank-1 update, namely $\mathbf{u} = \mathbf{e}_n$, where $\mathbf{e}_n$ is the $n$th standard basis vector (its $n$th element is 1 and all others are 0). In this case $\mathbf{e}_n\mathbf{k}^H\mathbf{W}^{[k]}$ is nonzero only in its $n$th row, so by letting $\mathbf{u} = \mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_N$ in turn, we can update $\mathbf{W}^{[k]}$ row by row by optimizing $\mathbf{k}$. As shown in the next subsection, this simplifies the calculation of the Newton-based learning algorithm.
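The row-by-row property can be sketched as follows: with $\mathbf{u} = \mathbf{e}_n$, the update only adds $\mathbf{k}^H\mathbf{W}^{[k]}$ to row $n$, so the full $N \times N$ product never needs to be formed. The function name `row_update` and the random test data are hypothetical, chosen for illustration.

```python
import numpy as np


def row_update(W, n, k):
    """Apply (I + e_n k^H) W, i.e. (3) with u = e_n, without forming the matrix.

    Only row n changes: w_n^H <- w_n^H + k^H W. All other rows are untouched.
    """
    W_new = W.copy()
    W_new[n, :] = W[n, :] + k.conj() @ W
    return W_new


rng = np.random.default_rng(2)
N = 3
W = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
k = rng.standard_normal(N) + 1j * rng.standard_normal(N)

n = 1
e_n = np.eye(N)[n]
W_full = (np.eye(N) + np.outer(e_n, k.conj())) @ W  # explicit rank-1 update
W_row = row_update(W, n, k)
print(np.allclose(W_full, W_row))                    # the two forms agree
```

A full sweep over one frequency bin simply calls `row_update` for n = 0, 1, ..., N-1 with a newly optimized `k` each time.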

3.2. Proposed Algorithm.
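In this paper, we use the cost function [18]

$$J = \sum_{n=1}^{N} H[\mathbf{y}_{n,t}] - \sum_{k=1}^{K} \log\left|\det \mathbf{W}^{[k]}\right| - C_1, \tag{4}$$

where $H[\mathbf{y}_{n,t}] = E[-\log p(\mathbf{y}_{n,t})]$ is the entropy of $\mathbf{y}_{n,t}$ and $C_1$ is a constant independent of the demixing matrices $\mathbf{W}^{[k]}$.
Mathematical Problems in Engineering 3
Without loss of generality, according to (3), the update of the $n$th row of $\mathbf{W}^{[k]}$ can be written as

$$\mathbf{W}^{[k]}_{\text{new}} = (\mathbf{I} + \mathbf{e}_n \Delta\mathbf{k}^H)\, \mathbf{W}^{[k]}, \tag{5}$$

where $\Delta\mathbf{k}$ is a small perturbation of $\mathbf{k}$ around $\mathbf{0}$. Correspondingly, only the $n$th estimated source within the $k$th frequency bin changes, by

$$\Delta y^{[k]}_{n,t} = \Delta\mathbf{k}^H \mathbf{y}^{[k]}_t. \tag{6}$$

Noting that

$$\det(\mathbf{W}^{[k]}_{\text{new}}) = (1 + \Delta\mathbf{k}^H \mathbf{e}_n)\det(\mathbf{W}^{[k]}) \tag{7}$$

and writing $G(\mathbf{y}_{n,t}) = -\log p(\mathbf{y}_{n,t})$, the change of the cost can be written as

$$\Delta J = E\left\{ G(\mathbf{y}_{n,t} + \Delta y^{[k]}_{n,t}\, \mathbf{e}_k) - G(\mathbf{y}_{n,t}) \right\} - \log\left|1 + \Delta\mathbf{k}^H \mathbf{e}_n\right|, \tag{8}$$

where $\mathbf{e}_k$ here denotes the $K \times 1$ vector whose $k$th element is 1. Then, using second-order Taylor series expansions to approximate $G(\mathbf{y}_{n,t} + \Delta y^{[k]}_{n,t}\mathbf{e}_k)$ and $\log|1 + \Delta\mathbf{k}^H \mathbf{e}_n|$, the change of cost can be written as

$$\Delta J \approx \operatorname{Re}\{\Delta\mathbf{k}^H \mathbf{g}\} + \tfrac{1}{2}\Delta\mathbf{k}^H \mathbf{D}_0 \Delta\mathbf{k} + \tfrac{1}{2}\operatorname{Re}\{\Delta\mathbf{k}^H \mathbf{D}_1 \Delta\mathbf{k}^*\}, \tag{9}$$

where

$$\mathbf{g} = E\{(\varphi^{[k]}_{n,t})^* \mathbf{y}^{[k]}_t\} - \mathbf{e}_n, \quad \mathbf{D}_0 = E\{\rho^{[k]}_{n,t}\, \mathbf{y}^{[k]}_t (\mathbf{y}^{[k]}_t)^H\}, \quad \mathbf{D}_1 = E\{\psi^{[k]}_{n,t}\, \mathbf{y}^{[k]}_t (\mathbf{y}^{[k]}_t)^T\} + \mathbf{e}_n \mathbf{e}_n^T,$$

and $\varphi^{[k]}_{n,t} = 2\,\partial G/\partial (y^{[k]}_{n,t})^*$, $\rho^{[k]}_{n,t} = 2\,\partial^2 G/\partial y^{[k]}_{n,t}\,\partial (y^{[k]}_{n,t})^*$, and $\psi^{[k]}_{n,t} = 2\,\partial^2 G/\partial (y^{[k]}_{n,t})^2$ are the (scaled Wirtinger) first- and second-order derivatives of $G(\mathbf{y}_{n,t})$ with respect to $y^{[k]}_{n,t}$.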
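Two ingredients of the cost change above can be verified numerically: the determinant identity for the row update, and the second-order expansion $-\log|1 + x| = -\operatorname{Re}\{\log(1+x)\} \approx -\operatorname{Re}\{x\} + \tfrac{1}{2}\operatorname{Re}\{x^2\}$ with $x = \Delta\mathbf{k}^H\mathbf{e}_n = \Delta k_n^*$, which is where the $-\mathbf{e}_n$ in $\mathbf{g}$ and the $\mathbf{e}_n\mathbf{e}_n^T$ in $\mathbf{D}_1$ come from. The small perturbation `dk` and the random `W` below are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(3)
N, n = 4, 2
W = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
dk = 0.01 * np.array([1, -2, 0.5 + 1j, 3j])  # a small perturbation Δk (illustrative)
e_n = np.eye(N)[n]

# det((I + e_n Δk^H) W) = (1 + Δk^H e_n) det(W)
W_new = (np.eye(N) + np.outer(e_n, dk.conj())) @ W
x = np.vdot(dk, e_n)  # Δk^H e_n = conj(Δk_n)
det_ok = np.isclose(np.linalg.det(W_new), (1 + x) * np.linalg.det(W))

# -log|1 + x| = -Re{log(1 + x)} ≈ -Re{x} + Re{x^2}/2 for small |x|
exact = -np.log(np.abs(1 + x))
first = -x.real                          # first-order approximation
second = -x.real + 0.5 * (x * x).real    # second-order approximation
print(det_ok, abs(exact - second) < abs(exact - first))
```

The remaining error of the second-order term is of order $|x|^3$, which is why the expansion is accurate for the small perturbations used in a Newton step.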
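To design a Newton algorithm, the quadratic form in (9) is required to be positive definite. However, it may not be positive definite when $\mathbf{y}_{n,t}$ has not yet converged to the source. Therefore, we approximate the quadratic form to ensure positive definiteness. When $\mathbf{y}_{n,t}$ converges to the source, $\mathbf{D}_0$ and $\mathbf{D}_1$ reduce to diagonal matrices, because the estimated sources are then mutually independent and zero-mean, so the off-diagonal expectations vanish. We therefore approximate $\mathbf{D}_0$ and $\mathbf{D}_1$ by their diagonal parts:

$$\tilde{\mathbf{D}}_0 = \operatorname{diag}\left\{E\left\{\rho^{[k]}_{n,t}\, \mathbf{y}^{[k]}_t \odot (\mathbf{y}^{[k]}_t)^*\right\}\right\}, \quad \tilde{\mathbf{D}}_1 = \operatorname{diag}\left\{E\left\{\psi^{[k]}_{n,t}\, \mathbf{y}^{[k]}_t \odot \mathbf{y}^{[k]}_t\right\}\right\} + \mathbf{e}_n\mathbf{e}_n^T, \tag{10}$$

where $\operatorname{diag}\{\mathbf{d}\}$ generates a diagonal matrix with diagonal element vector $\mathbf{d}$ and $\odot$ denotes element-wise multiplication of two vectors. With this approximation, the quadratic form in (9) decouples over the elements $\Delta k_m$ of $\Delta\mathbf{k}$, and the Hessian matrix for the $m$th element (with respect to $\Delta k_m^*$ and $\Delta k_m$, up to a positive scale factor) is

$$\mathbf{H}_m = \begin{bmatrix} d_{0,m} & d_{1,m} \\ d_{1,m}^* & d_{0,m} \end{bmatrix}, \tag{11}$$

where $d_{0,m}$ and $d_{1,m}$ are the $m$th diagonal elements of $\tilde{\mathbf{D}}_0$ and $\tilde{\mathbf{D}}_1$, respectively. The eigenvalues of $\mathbf{H}_m$ are $d_{0,m} \pm |d_{1,m}|$, so $\mathbf{H}_m$ is positive definite if and only if $d_{0,m} > |d_{1,m}|$; to ensure this, a modification is made using a small positive number $\epsilon$. Setting the derivative of the decoupled quadratic form with respect to $\Delta k_m^*$ to zero gives $d_{0,m}\Delta k_m + d_{1,m}\Delta k_m^* = -g_m$, where $g_m$ is the $m$th element of $\mathbf{g}$, so the Newton update rule for the $m$th element of $\Delta\mathbf{k}$ is

$$\Delta k_m = \frac{d_{1,m}\, g_m^* - d_{0,m}\, g_m}{d_{0,m}^2 - |d_{1,m}|^2}. \tag{12}$$

The $n$th row of $\mathbf{W}^{[k]}$ is then updated by (5). From the derivation, we can see that no orthogonality constraint is imposed on the demixing matrices, and, since every row is refined repeatedly together with the others rather than fixed one at a time as in deflation, no separation error accumulates.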
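The element-wise Newton step can be sketched as below, assuming the decoupled quadratic model $\operatorname{Re}\{\Delta k_m^* g_m\} + \tfrac{1}{2} d_{0,m}|\Delta k_m|^2 + \tfrac{1}{2}\operatorname{Re}\{d_{1,m}(\Delta k_m^*)^2\}$ for each element. The positive-definiteness safeguard used here, raising $d_{0,m}$ to at least $|d_{1,m}| + \epsilon$, is our own assumed choice and not necessarily the modification used in the paper. The function names `newton_step` and `quad_model` and the numeric values are hypothetical.

```python
import numpy as np


def newton_step(g, d0, d1, eps=1e-3):
    """Element-wise Newton step for Δk under the diagonal approximation.

    g: complex gradient vector, d0: real diagonal of D0~, d1: complex diagonal of D1~.
    Safeguard (assumption): raise d0 to at least |d1| + eps so that every
    2x2 Hessian [[d0, d1], [d1*, d0]] is positive definite.
    """
    d0 = np.maximum(d0, np.abs(d1) + eps)
    return (d1 * np.conj(g) - d0 * g) / (d0**2 - np.abs(d1) ** 2)


def quad_model(dk, g, d0, d1):
    """Decoupled quadratic model: Re{Δk^H g} + ½ Δk^H D0 Δk + ½ Re{Δk^H D1 Δk*}."""
    return np.sum(np.real(np.conj(dk) * g)
                  + 0.5 * d0 * np.abs(dk) ** 2
                  + 0.5 * np.real(d1 * np.conj(dk) ** 2))


g = np.array([1 + 1j, 2 - 0.5j, -0.3j])
d0 = np.array([2.0, 3.0, 1.5])
d1 = np.array([0.5 - 0.2j, 0.0, 1.0j])  # here d0 > |d1| already holds

dk = newton_step(g, d0, d1)
# Stationarity of the model: d0*Δk + d1*conj(Δk) + g = 0 element-wise
print(np.allclose(d0 * dk + d1 * np.conj(dk) + g, 0))
# Row n of W^{[k]} would then be updated as w_n^H <- w_n^H + Δk^H W^{[k]}
```

Because each 2x2 Hessian is positive definite, the step is the unique minimizer of the model; without the safeguard, a bin whose $d_{0,m} \le |d_{1,m}|$ would yield an ascent or unbounded step.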

Simulation and Results
In the first simulation, we consider the separation of two convolutively mixed speech signals generated by (1), with the channel order set to $L = 3$. A 1024-point FFT is applied to the time blocks of each mixed signal. Figure 1 shows the separation result obtained with the proposed algorithm, which indicates that the proposed algorithm can successfully separate convolutive mixtures in the frequency domain.
To numerically compare the separation performance of the proposed decoupled IVA algorithm with other IVA algorithms, the separation of multiple datasets of complex sources is simulated. The complex mixed signals are generated by (2). In addition, the elements of each SCV are made dependent by generating $\mathbf{s}_{n,t}$ from the vectors $\mathbf{z}_{n,t-l}$ through the matrices $\mathbf{M}_{n,l}$, where $\mathbf{M}_{n,l}$ is a real-valued $K \times K$ matrix and $\mathbf{z}_{n,t-l}$ is a zero-mean vector whose entries are uniformly distributed. In the following simulations, the order of this generation model is set to 3.
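A minimal sketch of such a data generator is shown below. It assumes the form $\mathbf{s}_{n,t} = \sum_{l=0}^{L-1}\mathbf{M}_{n,l}\mathbf{z}_{n,t-l}$ with $L = 3$, standard normal entries in $\mathbf{M}_{n,l}$, and complex $\mathbf{z}$ whose real and imaginary parts are uniform on $[-1, 1]$; these are assumptions for illustration, not the paper's exact generation equation. The sizes $K = 4$, $N = 4$, $T = 3000$ follow the figure captions.

```python
import numpy as np


def generate_scvs(N, K, T, L=3, rng=None):
    """Hypothetical SCV generator (assumed form, not the paper's exact equation):
    s_{n,t} = sum_{l=0}^{L-1} M_{n,l} z_{n,t-l}, with real K x K matrices M_{n,l}
    and zero-mean z whose real and imaginary parts are uniform on [-1, 1].
    Mixing the K elements through M_{n,l} makes them dependent across bins.
    """
    rng = np.random.default_rng() if rng is None else rng
    M = rng.standard_normal((N, L, K, K))
    z = rng.uniform(-1, 1, (N, K, T + L - 1)) + 1j * rng.uniform(-1, 1, (N, K, T + L - 1))
    s = np.zeros((N, K, T), dtype=complex)
    for n in range(N):
        for l in range(L):
            # column t of this slice is z_{n, t-l} (z has L-1 leading samples)
            s[n] += M[n, l] @ z[n, :, L - 1 - l : L - 1 - l + T]
    return s, M, z


rng = np.random.default_rng(4)
N, K, T = 4, 4, 3000
s, M, z = generate_scvs(N, K, T, rng=rng)

# Mix every bin with its own random complex matrix, as in (2)
A = rng.standard_normal((K, N, N)) + 1j * rng.standard_normal((K, N, N))
x = np.einsum("kij,jkt->ikt", A, s)  # x[:, k, :] = A^{[k]} s^{[k]}
```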
In this section, we compare the proposed decoupled IVA algorithm with two other algorithms: the vector gradient and Newton updates for IVA with a multivariate Gaussian model [16]. Performance is assessed using the interference-to-source ratio (ISR), which is computed from $c^{[k]}_{ij}$, the $(i,j)$ element of $\mathbf{C}^{[k]} = \mathbf{W}^{[k]}\mathbf{A}^{[k]}$; here we assume that no permutation exists. The smaller the ISR, the better the separation performance; in particular, ISR = 0 implies ideal separation. In all simulations, the results are averaged over 100 independent realizations.
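For reference, a sketch of one common ISR definition consistent with the description above: the average over bins and over off-diagonal entries of $|c^{[k]}_{ij}|^2 / |c^{[k]}_{ii}|^2$, which relies on the no-permutation assumption. This exact normalization is an assumption, since the formula is not reproduced in the text; the function name `isr` is hypothetical.

```python
import numpy as np


def isr(W, A):
    """Average ISR over K bins, assuming no permutation (as in the text).

    Assumed form: mean over k and i != j of |c_ij|^2 / |c_ii|^2,
    with C^{[k]} = W^{[k]} A^{[k]}; W and A have shape (K, N, N).
    """
    C = W @ A                                                   # batched products, (K, N, N)
    P = np.abs(C) ** 2
    ratios = P / np.diagonal(P, axis1=1, axis2=2)[:, :, None]   # divide row i by |c_ii|^2
    K, N, _ = C.shape
    off_diag = ratios.sum() - K * N                             # each diagonal ratio is exactly 1
    return off_diag / (K * N * (N - 1))


A = np.array([[[1.0, 0.1], [0.1, 1.0]]])         # K = 1 bin, N = 2 sources
print(isr(np.eye(2)[None], A))                   # 0.01 (up to rounding)
print(10 * np.log10(isr(np.eye(2)[None], A)))   # about -20 dB
```

Perfect separation gives ISR = 0 on this linear scale, matching the text; the figures report the same quantity in dB.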
Figure 2 shows the convergence behavior of the ISR.We observe that, among the three algorithms, the proposed decoupled relative Newton algorithm converges in the fewest iterations, and the vector gradient algorithm performs poorly compared with the other two algorithms.In addition, the decoupled relative Newton algorithm has the lowest ISR values when the algorithm converges, which implies that the decoupled relative Newton algorithm has the best separation performance.
Figure 3 depicts the separation performance versus the number of datasets. The proposed algorithm achieves the best separation performance of the three algorithms: its ISR is about 15-20 dB lower than that of the other two algorithms. Furthermore, the ISR of all three algorithms increases with the number of datasets. We also consider the influence of the number of sources, and the results show that it has little effect on separation performance.
Figure 4 shows the ISR of the estimated sources as a function of the number of samples per dataset. The ISR of the proposed algorithm is the lowest among the three algorithms, and the ISR of all three algorithms decreases as the number of samples increases, with the largest rate of decrease occurring for sample lengths smaller than 1000. For sample lengths smaller than 1000, the vector gradient and Newton algorithms have similar separation performance, whereas, as the sample length increases, the influence of the number of datasets becomes more apparent.

Conclusion
In this paper, we propose a new IVA algorithm, based on the decoupled relative Newton approach, for separating convolutively mixed signals in the frequency domain. The algorithm decomposes the matrix optimization problem into a series of row vector optimization problems, and the separation matrices do not need to be orthogonal. Simulation results show that the proposed algorithm converges fast and that its separation performance is superior to that of two other algorithms based on the vector gradient and Newton methods.

Figure 1: Separation of convoluted speech signals using the proposed algorithm.

Figure 2: Mean ISR of 100 trials versus number of iterations for K=4, N=4, T=3000 using the vector gradient, Newton, and proposed decoupled relative Newton optimization algorithms.

Figure 3: Mean ISR of 100 trials versus the number of datasets K for N=4, 10, T=3000 using the vector gradient, Newton, and proposed decoupled relative Newton optimization algorithms.
Figure 4: