Variable Step-Size Method Based on a Reference Separation System for Source Separation

Traditional variable step-size methods are effective in choosing the step-size during adaptive blind source separation, but the initial setting of the learning rate is critical and the convergence speed remains low. This paper proposes a novel variable step-size method based on a reference separation system for online blind source separation. The correlation between the estimated source signals and the original source signals increases with the iterations. We therefore introduce a reference separation system to approximately estimate this correlation in terms of the mean square error (MSE), which is then used to update the step-size. Computing the MSE over "minibatches" reduces the complexity of the algorithm to some extent. Simulations demonstrate that the proposed method exhibits faster convergence and better steady-state performance than the fixed step-size method in the noise-free case, while converging faster than classical variable step-size methods in both stationary and nonstationary environments.


Introduction
Blind source separation (BSS) aims at extracting latent unknown source signals from their mixtures observed by an array of sensors, without a priori knowledge of the original source signals or the mixing coefficients. In the separating process, nothing can be used except the observation sequences and statistical assumptions on the sources. This makes BSS a versatile tool in many multisensor systems, such as antenna arrays in acoustics or electromagnetism, chemical sensor arrays, and electrode arrays in electroencephalography [1].
Several optimization algorithms have been proposed for BSS [2]; they can be broadly categorized into batch-based algorithms and adaptive (sequential) algorithms. Batch-based algorithms are block-wise and do not work until a block of data samples has been received, such as the fast fixed-point algorithm [3]. In this paper, we consider the latter class, which has particular practical advantages due to its computational simplicity and its ability to track a nonstationary environment [4].
However, traditional adaptive BSS algorithms such as the equivariant adaptive separation via independence (EASI) algorithm [5] and the natural gradient algorithm (NGA) [6] usually assume the step-size to be a small positive constant, leading to an inevitable conflict between learning rate and stability, that is, slow convergence or large steady-state error. A simple way to resolve this conflict is to reduce the learning rate as the iterations proceed [7,8], but this introduces a new problem: if the learning rate becomes too small before the source components are extracted, the separation system fails to separate the sources properly. To improve both the learning rate and the stability, variable step-size algorithms have been proposed. These algorithms exploit online measurements of the state of the separation system obtained from the outputs and the parameter updates. In [4,9,10], variable step-size algorithms were derived from the gradients of different contrasts, namely, the NGA, EASI, and S-NGA algorithms. Zhang et al. put forward a grading learning algorithm based on measurements of the correlation of the separated signals, whose learning rate is updated according to the separation state [11]. Hsieh et al. proposed an effective learning-rate adjustment method based on an improved particle swarm optimizer [12]. However, the separation performance of these variable step-size algorithms is usually sensitive to the initial parameter settings. As a result, convergence is still slow, and an improper initial learning rate results in a large steady-state error or even divergence. Ou et al. proposed a variable step-size algorithm based on an auxiliary separation system [13]. The step-size is updated by estimating a pseudo-performance index, under the assumption that the index decreases in an exponential form. Compared with classical variable step-size methods, the separation performance of Ou's method is less sensitive to the initial settings.
In order to improve the initial convergence and the stability, we consider using a reference separation system, based on the MSE of the instantaneous outputs, to update the step-size. This technique is shown to improve both the convergence speed and the steady-state performance. Moreover, the use of "minibatches" reduces the overall computational load of the algorithm. The remainder of this paper is organized as follows. In Section 2, the principle of adaptive source separation methods is briefly summarized. Our algorithm is proposed in Section 3. Numerical simulation results and discussion are provided in Section 4. A concise conclusion closes the paper. In addition, this paper can be regarded as an important complement to Ou's method in [13].

Adaptive Algorithms for BSS
In the noise-free instantaneous case, we assume that n unknown, statistically independent, zero-mean source signals s(t) = [s_1(t), ..., s_n(t)]^T, with at most one having a Gaussian distribution, pass through an unknown mixing system A ∈ R^{m×n} (m ≥ n); the m mixed signals x(t) = [x_1(t), ..., x_m(t)]^T can then be modeled as

x(t) = A s(t),    (1)

where t is the time index and T is the vector transpose operator. To simplify the problem, we further assume that the number of sources matches the number of mixtures, that is, m = n, an exactly determined problem.
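The mixing model (1) can be sketched numerically as follows. This is a minimal illustration; the particular source waveforms, sample count, and random seed are our own choices, not the paper's exact signals.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 3                      # number of sources (= number of sensors here)
T = 1000                   # number of samples
t = np.arange(T) * 1e-4    # sampling period 1e-4 s, as used in Section 4

# Three zero-mean sub-Gaussian sources (illustrative choices only)
s = np.vstack([
    np.sign(np.sin(2 * np.pi * 300 * t)),   # square wave
    np.sin(2 * np.pi * 200 * t),            # sinusoid
    rng.uniform(-0.5, 0.5, T),              # uniformly distributed source
])

A = rng.standard_normal((n, n))  # random mixing matrix with N(0, 1) entries
x = A @ s                        # observed mixtures: x(t) = A s(t)
```

Each column of `x` is one observation vector x(t); the separation algorithms below operate on these columns only, never on `s` or `A`.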
The blind separation problem is then to recover the original source signals s(t) from the observations x(t), which is equivalent to estimating a separating matrix W ∈ R^{n×n} that performs the inverse operation of the mixing process, as subsequently used in the separation model. Figure 1 shows a block diagram of the adaptive BSS model. The output signal vector is then obtained as

y(t) = W x(t),    (2)

where y(t) = [y_1(t), ..., y_n(t)]^T is an estimate of s(t), up to the well-known permutation and scaling ambiguities.
Based on classical contrasts such as the mutual information contrast, the maximum likelihood contrast, and the infomax principle, many adaptive algorithms have been proposed to estimate s(t). Amari proved that the NGA algorithm is the fastest least-mean-square (LMS) type BSS algorithm [6]. The natural gradient BSS algorithms based on the mutual information contrast, the maximum likelihood contrast, and the infomax principle share the same form:

W(k+1) = W(k) + μ [I − f(y(k)) y^T(k)] W(k),    (3)

where I is the identity matrix, μ is the step-size, and f(⋅) = [f_1(⋅), ..., f_n(⋅)]^T, with the f_i(⋅), i = 1, 2, ..., n, being increasing odd functions, usually called activation functions.
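A single NGA update (3) can be sketched as below. The function name `nga_step` is ours; the cubic activation is the usual sub-Gaussian choice discussed later in the paper.

```python
import numpy as np

def nga_step(W, x_t, mu=1e-3, f=lambda y: y**3):
    """One natural-gradient update: W <- W + mu * (I - f(y) y^T) W.

    W   : current separating matrix (n x n)
    x_t : observation vector at time t, shape (n,)
    f   : activation function (cubic here, for sub-Gaussian sources)
    Returns the updated matrix and the output y(t) = W x(t)."""
    y = W @ x_t
    n = W.shape[0]
    W_next = W + mu * (np.eye(n) - np.outer(f(y), y)) @ W
    return W_next, y
```

Iterating `nga_step` over the observation stream, with W initialized to the identity, gives the plain fixed step-size NGA against which the proposed method is later compared.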
Based on the fact that the separating matrix W can be factorized into the product of an orthogonal matrix and a prewhitening matrix, and by combining the LMS-type updating formulas of these two matrices with some reasonable approximations, the EASI algorithm is derived [5]:

W(k+1) = W(k) − μ [y(k) y^T(k) − I + f(y(k)) y^T(k) − y(k) f^T(y(k))] W(k).    (4)

It has been shown that, compared with a fixed step-size, an algorithm with a variable step-size has an improved convergence rate. Yuan et al. derived a gradient variable step-size scheme for the NGA algorithm [10], which adapts the step-size in the form

μ(k) = μ(k−1) − ρ ∂Ĵ(k)/∂μ,    (5)

where ρ is a small constant and Ĵ(k) is an instantaneous estimate of the cost function from which the NGA algorithm is derived.
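The EASI update (4) admits an equally compact sketch. As with `nga_step`, the naming and the cubic activation below are our illustrative choices, following the usual statement of the algorithm in [5].

```python
import numpy as np

def easi_step(W, x_t, mu=1e-2, f=lambda y: y**3):
    """One EASI update:
    W <- W - mu * (y y^T - I + f(y) y^T - y f(y)^T) W.

    The relative (equivariant) form means the update depends on the
    data only through the output y(t) = W x(t)."""
    y = W @ x_t
    n = W.shape[0]
    G = np.outer(y, y) - np.eye(n) + np.outer(f(y), y) - np.outer(y, f(y))
    return W - mu * G @ W, y
```

Note that the skew-symmetric part f(y)y^T − y f(y)^T drives the separation while yy^T − I enforces whitening of the outputs.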
Note that the activation functions f_i(⋅) can be identical when the sources are all sub-Gaussian or all super-Gaussian.
The distinct distributions of the signals determine the different activation functions; that is, the separation of all-sub-Gaussian sources usually employs the cubic function, while the proper choice for super-Gaussian sources is the hyperbolic tangent function.

The Proposed Algorithm
As iterations proceed in adaptive BSS, the estimated signals y(t) approximate the source signals s(t), provided the permutation and scaling ambiguities of the estimated signals y(t) can be eliminated [14]. The correlation between an estimated signal y_i(t) and a source signal s_j(t) can be evaluated by the mean-square-error, defined as

mse_ij = (1/N) Σ_{t=1}^{N} [y_i(t) − s_j(t)]²,    (6)

where N is the sample size, and y_i(t) as well as s_j(t) is normalized before the evaluation of mse_ij. When the separation system is steady, the mean-square-error matrix MSE, whose (i, j) element is mse_ij, has one, and only one, zero entry in each row and column.
If we could compute the matrix MSE at each update of the separating matrix W, a natural rule for a variable step-size algorithm would be to adjust the step-size μ adaptively in terms of MSE. However, since the source signals s(t) are unknown, the matrix MSE at each update is not accessible in practice.
In this section, we propose to estimate MSE approximately by introducing a reference separation system W_r(k), which follows the same optimization criterion and updating principle as W(k), based on the natural gradient algorithm (NGA), except for the initialization. Hence we obtain the reference signal

y_r(t) = W_r(k) x(t).    (7)

The correlation between y(t) from the primary separation system and y_r(t) from the reference system should increase as the iterations go on, regardless of the ambiguities. Therefore, at every iteration, we replace the mean-square-error in (6) by

m̂se_ij(k) = (1/N) Σ_t [ŷ_i(t) − ŷ_{r,j}(t)]²,    (8)

where

ŷ_i(t) = |y_i(t)| / norm(y_i),   ŷ_{r,j}(t) = |y_{r,j}(t)| / norm(y_{r,j}),    (9)

norm(⋅) denotes the root-mean-square value of the output vector, and the operator |⋅| takes the absolute value of the normalized vector. In this way, the scaling ambiguity is removed. Online procedures use one given sample at a time [6], whereas appropriately evaluating the mean-square-error requires several samples, as (9) indicates. Therefore, we update the separating matrix once over a "minibatch," that is, a small block of signal samples, while the observation window slides [15,16]. Hence, the online updating equation of the separating matrix becomes

W(k+1) = W(k) + μ(k) [I − Q(y(k))] W(k),    (10)

where k is the iteration number index (or the minibatch index) and

Q(y(k)) = f(y(k)) y^T(k),    (11)

evaluated over the kth minibatch. The step-size parameter is updated by a nonlinear function of the form

μ(k) = β [1 − exp(−α ξ(k))],    (12)

which is a widely used rule in adaptive filtering algorithms [17]. The primary separation system follows the same updating rule above. The parameters α and β are two positive constants, which control the shape of the function curve and the initial step-size, respectively. The effects of these two parameters on the performance of the algorithm are investigated in the next section. We define the correlation function as

ξ(k) = Σ_{l=0}^{L−1} λ^l m̂se(k − l),    (13)

where m̂se(k) denotes the mean-square-error of the kth minibatch, λ is a weighting factor with 0 < λ < 1, and L is the number of weighting terms. In this way, exponential weighting is applied to past data, which is appropriate especially when the channel characteristics are time-variant [18,19]. The separation procedure using (9)-(13) constitutes the proposed variable step-size algorithm. Figure 2 shows the scheme of our proposed algorithm.
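The step-size machinery of (8), (9), (12), and (13) can be sketched as follows. This is a minimal reading of the text, not the authors' code: the function names are ours, and the unnormalized weighted sum in `weighted_mse` is one plausible interpretation of (13).

```python
import numpy as np

def minibatch_mse(y_i, y_rj):
    """MSE between normalized magnitudes of a primary output and a
    reference output over one minibatch, in the spirit of (8)-(9):
    |.| removes the sign ambiguity and dividing by the RMS removes
    the scaling ambiguity."""
    a = np.abs(y_i) / np.sqrt(np.mean(y_i ** 2))
    b = np.abs(y_rj) / np.sqrt(np.mean(y_rj ** 2))
    return float(np.mean((a - b) ** 2))

def weighted_mse(mse_history, lam=0.9, L=5):
    """Exponentially weighted combination of the last L minibatch
    MSEs, our reading of (13): xi(k) = sum_{l<L} lam^l * mse(k-l)."""
    recent = list(mse_history[-L:])[::-1]          # newest first
    return float(sum(lam ** l + 0.0 if False else lam ** l * m
                     for l, m in enumerate(recent)))

def step_size(xi, beta=0.06, alpha=1e4):
    """Nonlinear step-size rule (12): mu = beta*(1 - exp(-alpha*xi)).
    beta bounds the initial step-size; alpha shapes the curve."""
    return beta * (1.0 - np.exp(-alpha * xi))
```

With a large α and a large early MSE, `step_size` saturates at β, which matches the plateau at 0.06 observed in Figure 5 before the exponential decay sets in.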

Simulation Results and Discussion
Here, several sets of simulation results are provided to demonstrate the performance of the proposed algorithm. Comparisons among fixed step-size algorithms, classical variable step-size algorithms, and the proposed algorithm in both stationary and nonstationary environments have been carried out. In this experiment, we consider the separation of three zero-mean sub-Gaussian sources in a stationary environment, one of which, ν(t), is a random source signal distributed uniformly in [−0.5, 0.5]. The mixing matrix A is randomly generated from the normal distribution with mean 0 and standard deviation 1, and three receivers are used (m = n = 3). The sampling period is set to 0.0001 s.
To evaluate the performance of the BSS algorithms, we use the cross-talking error as the performance index [5,20-22]:

PI = Σ_{i=1}^{n} ( Σ_{j=1}^{n} |g_ij| / max_l |g_il| − 1 ) + Σ_{j=1}^{n} ( Σ_{i=1}^{n} |g_ij| / max_l |g_lj| − 1 ),

where the n × n matrix G = WA = {g_ij} is the combined mixing-separating matrix. As W converges to PDA^{−1}, the combined mixing-separating matrix G converges to PD, a generalized permutation matrix, and PI converges to zero.
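The cross-talking error above can be computed directly from the combined matrix G; the function below is a straightforward transcription of the index (naming is ours).

```python
import numpy as np

def performance_index(G):
    """Cross-talking error of the combined mixing-separating matrix
    G = W A. Each row and column is normalized by its largest
    absolute entry; PI is zero iff G is a (scaled) permutation
    matrix, i.e. perfect separation up to the usual ambiguities."""
    G = np.abs(np.asarray(G, dtype=float))
    row = (G / G.max(axis=1, keepdims=True)).sum(axis=1) - 1.0
    col = (G / G.max(axis=0, keepdims=True)).sum(axis=0) - 1.0
    return float(row.sum() + col.sum())
```

For example, any scaled permutation matrix (one nonzero entry per row and column) yields PI = 0, while a fully mixed G gives a strictly positive value.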
In the algorithms, the activation function f(y) = y³ is applied. The step-size μ = 0.001 and μ = 0.01 is taken in the natural gradient algorithm [6] and the optimized EASI algorithm [20], respectively. The parameters of the proposed algorithm are set to L = 5 and λ = 0.9. Considering the balance between tracking performance and evaluation accuracy of the mean-square-error matrix MSE, the sample size of the "minibatches" is N = 5 in all the experiments. The effects of the crucial parameters β and α on the performance of the proposed algorithm are investigated in Figure 3. Note that larger β and α, respectively, lead to a faster initial learning rate and better convergence performance, so these results provide a reference for choosing appropriate values of β and α. Hence, we set β and α to 0.06 and 10⁴, respectively. Besides, if the sources include both sub-Gaussian and super-Gaussian signals, the activation functions should not all be the same increasing odd function. The activation functions might be initialized by polynomials or kernel functions with some adjustable parameters, so that the optimal activation function vector f(⋅) = [f_1(⋅), ..., f_n(⋅)]^T can be estimated adaptively along with the iterations [23]. However, further investigation of the activation functions is beyond the scope of this paper. Figure 4 plots the average PI values obtained from the simulations of the three adaptive algorithms over 500 Monte Carlo trials. From the plots, we can see that the proposed algorithm provides the fastest convergence speed while achieving a lower steady-state error than both the NGA and optimized EASI approaches. The step-size μ, whose evolution is shown in Figure 5, generally decreases in an exponential fashion with the iterations. We observe that the step-size remains at a constant 0.06 during roughly the first 100 iterations. This is attributed to the choice of the parameter α, which allows a high initial learning rate together with sensitive detection of the separation state. As a result, the separation performance is more robust to the setting of the initial learning rate.

Experiment 2. Comparison between the proposed algorithm and variable step-size algorithms.
In this experiment, we first define the function Q(y(k)) = f(y(k)) y^T(k) in (11). In order to allow a fair comparison, the same function as in [10] is used for the proposed algorithm; in that function, diag[⋅] and off[⋅] denote the operations of taking the diagonal elements and the off-diagonal elements of a matrix, respectively. Two zero-mean sub-Gaussian sources s(t) = [sin(2π · 200t), ν(t)]^T are mixed by a 2 × 2 mixing matrix A₀. Zero-mean independent white Gaussian noise is added to the mixture, with the signal-to-noise ratio equal to 20 dB. The parameters, including the initial step-size in the LMS-type algorithms, are manually tuned so that each algorithm has nearly the same steady-state performance. The initial value of μ for the classical variable step-size algorithms, that is, VS-NGA and VS-S-NGA in [10], is set to 0.004, with ρ = 10⁻⁵, and 500 Monte Carlo trials are run to average the performance. The two parameters of Ou's method in [13] are set to 0.08 and 10. For the proposed algorithm, the parameters are β = 0.08, α = 10³, L = 5, and λ = 0.9. These parameter settings imply that the proposed algorithm can attain a higher learning rate while maintaining an appropriate steady-state performance. The average PI values resulting from the four approaches are compared in Figure 6. The proposed algorithm requires only approximately 500 samples for convergence, whereas the other three algorithms need at least 600 samples. Clearly, the performance of the proposed algorithm is considerably improved over the classical variable step-size algorithms in the noisy case. Figure 7 plots the average PI values of three approaches in a nonstationary environment. The mixing matrix used to simulate the time-varying environment is chosen as A(k) = A₀ + Ξ(k), where Ξ(k) = η Ξ(k−1) + σ · randn, randn(⋅) is the MATLAB built-in function [24], and the initial Ξ is set to a null matrix. Here, η = 0.9 and σ = 0.001. The initial parameters of the classical variable step-size algorithms are the same as in the noisy-case experiment. The parameters of the proposed algorithm are reset to β = 0.03, α = 10³, L = 5, and λ = 0.9. Likewise, the results are obtained over 500 Monte Carlo runs. From this figure, it is observed that the proposed algorithm converges faster than the VS-NGA and VS-S-NGA algorithms in the nonstationary environment.
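The time-varying mixing model can be sketched as below. The recursion Ξ(k) = ηΞ(k−1) + σ·randn and the base matrix A₀ are our reading of the partly garbled description in the text; the values of A₀ shown here are illustrative, since the paper's own matrix entries were lost.

```python
import numpy as np

rng = np.random.default_rng(1)

A0 = np.array([[1.0, 0.6],
               [0.4, 1.0]])     # illustrative 2x2 base mixing matrix

def next_mixing(A0, Xi, eta=0.9, sigma=0.001, rng=rng):
    """One step of the assumed time-varying mixing model:
    Xi(k) = eta * Xi(k-1) + sigma * randn,  A(k) = A0 + Xi(k).
    eta = 0.9 and sigma = 0.001 are the constants quoted in the
    text; the exact recursion in the paper is a guess on our part."""
    Xi = eta * Xi + sigma * rng.standard_normal(A0.shape)
    return A0 + Xi, Xi
```

With σ = 0.001 the perturbation is a slow random walk around A₀, so the algorithms must keep tracking rather than converge once and stop.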
Finally, we examine the computational time and separation performance of the different separation methods in the noisy case. The Fast-ICA algorithm, a classical batch-based method, is also included for comparison. The sources, mixing matrix, and initial parameters are set as in Experiment 2. The data length is set to 10000 samples, which suffices to achieve convergence, and the iteration number of Fast-ICA is set to 100. The results are provided in Table 1. It can be seen that Fast-ICA generally has better separation performance under a high signal-to-noise ratio (SNR). However, it incurs a large computational cost since it is a batch-based algorithm and does not work until a large number of data samples have been received. In contrast, although the proposed algorithm performs slightly worse than Fast-ICA at high SNR, it behaves better when the noise power is increased (SNR = 0 dB, 5 dB, 10 dB). As a novel adaptive online algorithm, the proposed algorithm has a particular advantage owing to its computational simplicity and its ability to track noisy and nonstationary environments. The results also show that the proposed algorithm performs better than the optimized EASI, NGA, and VS-NGA algorithms in terms of average PI, and achieves similar separation performance with a lower computational load compared with VS-S-NGA and Ou's method. This supports the complexity analysis in Section 3; that is, the utilization of "minibatches" reduces the computational cost.

Conclusion
In this paper, we propose a new variable step-size algorithm for blind source separation. A reference separation system is utilized to acquire the mean-square-error matrix, which serves as the metric for updating the step-size. For the performance comparison, fixed step-size algorithms, classical variable step-size algorithms, and the proposed algorithm have been run in both stationary and nonstationary environments, and their performance has been analyzed and compared in terms of the cross-talking error. The results reveal that the proposed scheme improves the learning rate and stability over the fixed step-size algorithms and converges faster than classical variable step-size algorithms.

Figure 2: Scheme of the proposed algorithm.

Experiment 1. Comparison between the proposed algorithm and fixed step-size algorithms.

Figure 3: Effects of the parameters β and α on the performance of the proposed algorithm.

Figure 4: Average performance indexes over 500 independent runs of the three algorithms in noise-free case.

Figure 5: Evolution of the step-size μ in the proposed algorithm.

Figure 6: Average performance indexes over 500 independent runs of the three variable step-size algorithms in noisy case.

Table 1: Average PI and execution time for the various methods under different levels of noise.