Fast Extraction for Skewed Source Signals Using Conditional Expectation

Extracting stochastic source signals whose probability density functions (PDFs) are skewed is important in many applications, such as biomedical signal processing and mechanical fault diagnosis. This paper shows that the skewed source signal with the maximal absolute value of skewness can be extracted quickly by a proposed algorithm based on conditional expectation. Compared with the existing conditional expectation-based algorithms, the proposed one has two main advantages. First, it does not require prior knowledge of the positive support of the desired source, namely, the time indices at which the source of interest is positive. Second, it can be used in both the determined and underdetermined cases. Furthermore, the proposed algorithm relies mainly on first- and second-order statistics and needs no preprocessing, so its computational cost is low. Simulation results show the superiority of the proposed algorithm over the existing methods and indicate that it also performs well in the underdetermined case when the number of sensors is slightly less than the number of sources.


Introduction
The goal of independent source extraction is to estimate a specific source from observations that are mixtures of mutually independent source signals. It can be applied in various areas such as speech and image processing, biomedical signal processing, mechanical fault diagnosis, and wireless communication. Because of these wide application fields, independent source extraction has gained much attention in the past few decades [1,2]. In some applications, especially biomedical signal processing and mechanical fault diagnosis, it is important to extract stochastic source signals with skewed probability density functions (PDFs). For instance, the mechanical vibrations produced by defective bearings, which are to be extracted in vibration analysis, may have an asymmetric PDF [3], and the fetal electrocardiogram (FECG) signal, which must be estimated from mixtures of the FECG and the maternal electrocardiogram (MECG), has a skewness quite different from that of the MECG signals [4]. The existing extraction algorithms mainly employ second- and higher-order statistics by exploiting the statistical independence of the sources, and some of them, such as FastICA [5], need to preprocess the mixture data. Recently, Zarzoso et al. [6], Xu et al. [7], and Xu and Shen [8] proposed a class of more computationally efficient algorithms for independent source extraction based on first-order statistics and the conditional expectation. However, this class of algorithms requires knowing the time indices at which the source of interest is positive, that is, the positive support of the desired source, which reduces its practicability.
In this paper, we propose a new extraction algorithm that uses the conditional expectation to extract the skewed source signal with the maximal absolute value of skewness. It does not require prior knowledge of the positive support of the desired source and can be applied in both the determined and underdetermined cases. Note that, after obtaining the source signal with the maximal absolute value of skewness, the source signal with the second largest absolute value of skewness can also be extracted by the proposed algorithm from the mixtures after the component of the estimated source has been subtracted (e.g., through linear regression as in [6]). Likewise, if required, the other skewed source signals can be obtained sequentially. For simplicity, this paper assumes that the skewed source signal with the maximal absolute value of skewness is the desired source. The proposed algorithm obtains the column vector of the mixing matrix corresponding to the source of interest by the conditional expectation and retrieves the desired source by the minimum mean-squared error (MMSE) beamforming approach [9]. The initial value of this column vector is obtained from an estimated interval that is approximately purely positive or negative, and after several iterations the desired source is estimated accurately. The proposed algorithm is cost-effective, since it is mainly based on first- and second-order statistics and does not require preprocessing. Simulation results validate the superiority of the proposed algorithm over the existing methods and show that it performs well even in the underdetermined case when the number of sensors is close to the number of sources.

Data Model
Consider the instantaneous linear mixture

$$\mathbf{x}(t) = \mathbf{A}\mathbf{s}(t), \tag{1}$$

where $\mathbf{s}(t) = [s_1(t), \ldots, s_N(t)]^T$ is composed of $N$ mutually independent source signals, $\mathbf{x}(t) = [x_1(t), \ldots, x_M(t)]^T$ consists of the $M$ mixtures received by the sensors, $\mathbf{A}$ is the unknown $M \times N$ mixing matrix, and the superscript $T$ denotes the transpose. In this paper, we consider stochastic source signals with unimodal continuous PDFs, since many signals, such as some vibration signals [10] and ECG signals [11], have these characteristics. For convenience, we further assume that the source signals are stationary with zero mean and unit variance. The stationarity assumption is reasonable, since some nonstationary sources can be divided into several stationary blocks that are processed separately. For example, some ECG signals can be regarded as stationary over the duration of one heartbeat, although they are nonstationary overall [12]. Our goal is to extract, from the $M$ accessible mixtures, the skewed source signal that has the maximal absolute value of skewness.
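As an illustration of the data model, the following minimal sketch builds a small determined mixture. The source distributions (one gamma-distributed skewed source and three symmetric ones), the dimensions, and the Gaussian mixing matrix are assumptions chosen for illustration; they are not the signals used in the simulations of this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, T = 4, 4, 5000   # sources, sensors, samples (illustrative values only)

def standardize(S):
    """Scale each row to zero mean and unit variance, as the data model assumes."""
    S = S - S.mean(axis=1, keepdims=True)
    return S / S.std(axis=1, keepdims=True)

# Row 0 is skewed (gamma); rows 1-3 are symmetric (Gaussian, Laplace, logistic).
# These distributions are assumptions made for this sketch.
S = standardize(np.vstack([
    rng.gamma(2.0, 1.0, T),
    rng.standard_normal(T),
    rng.laplace(0.0, 1.0, T),
    rng.logistic(0.0, 1.0, T),
]))

A = rng.standard_normal((M, N))   # stands in for the unknown mixing matrix
X = A @ S                         # equation (1): column t of X is x(t) = A s(t)

# Sample skewness E{y^3} of each standardized source
skew = (S ** 3).mean(axis=1)
```

In this toy setup, the row of `S` with the largest absolute entry of `skew` plays the role of the desired source.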

The Proposed Algorithm
Since the first-order statistics-based algorithms [6, 7, 8] generally extract the source of interest under the condition that the mixing matrix is unitary, they can only be used in the determined case. In this paper, we remove this condition and obtain the column vector of the mixing matrix corresponding to the desired source from the conditional expectation

$$E\{\mathbf{x}(t) \mid s_d(t) > 0\} = \alpha\,\mathbf{a}_d, \qquad E\{\mathbf{x}(t) \mid s_d(t) < 0\} = \beta\,\mathbf{a}_d, \tag{2}$$

where $E\{\cdot\}$ denotes the expectation operator, $s_d(t)$ is the desired source, $\mathbf{a}_d$ is the $d$th column vector of the mixing matrix, $\alpha = E\{s_d(t) \mid s_d(t) > 0\}$, and $\beta = E\{s_d(t) \mid s_d(t) < 0\}$. Obviously, $\alpha$ and $\beta$ are constants under the stationarity assumption. Equation (2) follows from the assumptions on the sources: because the sources are mutually independent with zero mean, $E\{s_j(t) \mid s_d(t) > 0\} = 0$ for $j \neq d$, so that

$$E\{\mathbf{x}(t) \mid s_d(t) > 0\} = \mathbf{A}\,E\{\mathbf{s}(t) \mid s_d(t) > 0\} = \alpha\,\mathbf{A}\mathbf{e}_d = \alpha\,\mathbf{a}_d, \tag{3}$$

where $\mathbf{e}_d$ is the unit vector whose $d$th entry is 1 and whose other entries are 0. The proof of the other equation in (2), for $s_d(t) < 0$, is similar to (3). Then, we estimate the desired source by the MMSE beamforming approach,

$$\hat{s}_d(t) = \mathbf{a}_d^T \mathbf{C}_x^{-1} \mathbf{x}(t), \tag{4}$$

where $\hat{s}_d(t)$ is the estimate of $s_d(t)$, $\mathbf{C}_x = E\{\mathbf{x}(t)\mathbf{x}^T(t)\}$ is the covariance matrix of the mixtures, and the superscript $-1$ denotes matrix inversion. Thus, when $\alpha$, $\beta$, and the positive or negative support of the desired source are provided, the desired source can be estimated by (2) and (4). In fact, $\alpha$ and $\beta$ are easily computed when the PDF of the desired source is known. The values of $\alpha$ for some normalized distributions are given in [6]. However, when the PDF of the desired source is unknown, $\alpha$ and $\beta$ are not available. In this case, (2) only yields the direction of the vector $\mathbf{a}_d$, which is the same as the direction of the conditional expectation $E\{\mathbf{x}(t) \mid s_d(t) > 0\}$.
Since the estimated $\mathbf{a}_d$ has the correct direction but an unknown norm, the amplitude of $\hat{s}_d(t)$ obtained by (4) is ambiguous. Fortunately, this amplitude indeterminacy is acceptable in many applications. For simplicity, we set $\alpha = \beta = 1$, so that (2) gives the direction of $\mathbf{a}_d$. Unless stated otherwise, in the rest of this section, estimating $\mathbf{a}_d$ means estimating the direction of the vector $\mathbf{a}_d$. In practical applications, complete information about the positive or negative support of the desired source is extremely hard to acquire. However, it is more feasible to obtain a subset of the samples of the desired source in which the positive samples clearly outnumber the negative samples, or vice versa. Such a subset is close to a purely positive or negative set. We define the correct index classification ratio $r = N_1/N$ as in [6], where $N_1$ is the number of positive samples in the subset and $N$ is here the total number of samples in the subset. It was suggested in [6] that when $r$ is close to 0 or 1, the subset can also be used to estimate the desired source, and the estimation performance is only slightly worse than that obtained with complete information about the positive or negative support of the desired source. Similarly, if we obtain such a subset, we can use it to roughly estimate $\mathbf{a}_d$. Fortunately, for a signal with a unimodal continuous skewed distribution, we can exploit the asymmetry of its PDF to obtain a subset that is close to being purely positive or negative.
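Equations (2) and (4) can be checked numerically when the positive support is known. The following minimal sketch does so in a toy setting where the support is read directly from the true source, which is not possible in practice. The shifted exponential source, the Gaussian interferers, and the random mixing matrix are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, T = 4, 4, 20000

# Desired source (index d = 0): shifted exponential, zero mean, unit variance.
# The other sources are Gaussian. Both choices are illustrative assumptions.
S = np.vstack([rng.exponential(1.0, T) - 1.0,
               rng.standard_normal((N - 1, T))])
A = rng.standard_normal((M, N))
X = A @ S

pos = S[0] > 0                          # positive support, known only in this toy setup
a_hat = X[:, pos].mean(axis=1)          # sample version of E{x(t) | s_d(t) > 0} = alpha * a_d
Cx = X @ X.T / T                        # sample covariance of the (zero-mean) mixtures
s_hat = a_hat @ np.linalg.solve(Cx, X)  # equation (4): a^T Cx^{-1} x(t) for every t

# Direction agreement with the true column and quality of the recovered source
cos = a_hat @ A[:, 0] / (np.linalg.norm(a_hat) * np.linalg.norm(A[:, 0]))
corr = np.corrcoef(s_hat, S[0])[0, 1]
```

Here `cos` measures how well the direction of the estimated column matches the true one, and `corr` measures how well the beamformer output follows the desired source up to the amplitude ambiguity discussed above.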
Figure 1 shows the PDF of a unimodal continuous skewed distribution, where $\xi$ denotes the skewness defined by $\xi = E\{(y - \mu)^3\}/\sigma^3$, in which $\mu$ and $\sigma$ are the mean and standard deviation of the random variable $y$, respectively. In Figure 1, $\xi > 0$, $\xi < 0$, and $\xi = 0$ represent the positively skewed distribution, the negatively skewed distribution, and the symmetric distribution, respectively. Since we consider the case in which $y$ has zero mean and unit variance, the definition of skewness reduces to $\xi = E\{y^3\}$. When $y$ follows a positively skewed distribution ($\xi > 0$), consider a finite set of samples, $\Theta$, drawn from the distribution of $y$. As the skewness reflects the asymmetry of a PDF, a larger absolute value of skewness means a stronger asymmetry of the PDF. Thus, most samples in $\Theta$ are smaller than zero, and the proportion of negative samples increases as $\xi$ rises. When some samples are randomly drawn from $\Theta$ to form a subset, the subset may be close to a purely negative set, and as $\xi$ increases, it becomes more likely that the samples in this subset are all negative. Likewise, similar results hold in the case $\xi < 0$. These observations make it possible to extract the source with the maximal absolute value of skewness based on the conditional expectation. Nevertheless, random extraction is unstable, because it is probabilistic. Instead, we divide all the samples into several intervals of equal size and test every interval to find the one closest to being purely positive or negative. Note that the size of each interval should be appropriate. If the interval is too large, it will contain both positive and negative values with high probability. Conversely, an interval that is too small cannot reveal the skewness of the skewed signal, since exhibiting the skewness requires a certain number of samples. The proper size should be adjusted depending on the nature of the signal, mainly its PDF. In the following section, we show that this proper size and the number of intervals can be determined by simulations.
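The relation between skewness and the proportion of negative samples can be illustrated numerically. The following minimal sketch uses standardized gamma variables, for which the skewness grows as the shape parameter decreases. The gamma family and the sample size are assumptions chosen for illustration.

```python
import numpy as np

def skewness(y):
    """Sample skewness E{(y - mu)^3} / sigma^3."""
    y = np.asarray(y, dtype=float)
    z = (y - y.mean()) / y.std()
    return float(np.mean(z ** 3))

rng = np.random.default_rng(2)
T = 200_000
results = []   # one (skewness, fraction of negative samples) pair per shape
for shape in (8.0, 2.0, 0.5):            # smaller shape gives larger positive skewness
    y = rng.gamma(shape, 1.0, T)
    y = (y - y.mean()) / y.std()         # zero mean, unit variance
    results.append((skewness(y), float(np.mean(y < 0))))

for xi, frac_neg in results:
    print(f"skewness = {xi:.2f}, fraction of negative samples = {frac_neg:.3f}")
```

For these positively skewed examples, more than half of the standardized samples are negative, and that fraction grows with the skewness, in line with the argument above.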
According to the analysis above, we divide the mixtures into $K$ intervals of equal size $L = \lfloor T/K \rfloor$, where $T$ is the number of samples and $\lfloor \cdot \rfloor$ denotes the floor function. The inaccessible desired source is divided into $K$ intervals accordingly. Assume that the $i$th interval of the desired source has $c_i$ positive samples. Then the correct index classification ratio of the $i$th interval is $r_i = c_i/L$. The objective is to find the optimal interval, in which the samples of the desired source are closest to being purely positive or negative, so that $\mathbf{a}_d$ can be roughly estimated by (2) using the information in this interval. Assume that this optimal interval is the $\hat{k}$th interval, which can be expressed mathematically as

$$\hat{k} = \arg\max_{i \in \{1, \ldots, K\}} \left| r_i - \tfrac{1}{2} \right|. \tag{5}$$

Since the ratios $r_i$ are not observable, we propose a new method to find the index $\hat{k}$. Suppose that the $q$th interval of the desired source is purely positive or negative, namely, $r_q = 1$ or $r_q = 0$. According to (2) with $\alpha = \beta = 1$, $\mathbf{a}_d$ is estimated by

$$\hat{\mathbf{a}}_d^{q} = \frac{1}{L} \sum_{t \in \Omega_q} \mathbf{x}(t), \tag{6}$$

where $\Omega_q$ is the set of indices in the $q$th interval. In practice, $r_q$ is almost never exactly 0 or 1, which makes the result of (6) inaccurate. Using (1), we rewrite the right side of (6) as

$$\frac{1}{L} \sum_{t \in \Omega_q} \mathbf{x}(t) = \mathbf{A}\,\frac{1}{L} \sum_{t \in \Omega_q} \mathbf{s}(t). \tag{7}$$

According to (3), if $\sum_{t \in \Omega_q} \mathbf{s}(t)/L$ is a scalar multiple of $\mathbf{e}_d$, then $\hat{\mathbf{a}}_d^{q}$ gives the exact direction of $\mathbf{a}_d$. Since the desired source has the maximal absolute value of skewness, the $\hat{k}$th interval of the desired source is closer to a purely positive or negative region and has a much greater absolute value of the sum of its samples than the $\hat{k}$th interval of any other source, namely,

$$\left| \sum_{t \in \Omega_{\hat{k}}} s_d(t) \right| \gg \left| \sum_{t \in \Omega_{\hat{k}}} s_j(t) \right|, \quad j \neq d. \tag{8}$$

Then, we have $\sum_{t \in \Omega_{\hat{k}}} \mathbf{s}(t)/L \approx a\,\mathbf{e}_d$, where $a$ is a nonzero scalar. Therefore, by evaluating (6) with the indices in the $\hat{k}$th interval, we obtain the approximate estimate $\hat{\mathbf{a}}_d^{\hat{k}}$ of $\mathbf{a}_d$. Obviously, the estimation accuracy of $\hat{\mathbf{a}}_d^{\hat{k}}$ is better than that of $\hat{\mathbf{a}}_d^{p}$ ($p \in \{1, \ldots, K\}$, $p \neq \hat{k}$), which is the estimate of $\mathbf{a}_d$ obtained from any other interval. Using (4), we obtain the estimates of the desired source, $\hat{s}_d^{\hat{k}}(t)$ and $\hat{s}_d^{p}(t)$, by applying $\hat{\mathbf{a}}_d^{\hat{k}}$ and $\hat{\mathbf{a}}_d^{p}$, respectively. Since $\hat{s}_d^{\hat{k}}(t)$ has higher estimation precision than $\hat{s}_d^{p}(t)$, and residual contributions from the other, less skewed sources reduce the absolute skewness of an estimate, the absolute value of the skewness of $\hat{s}_d^{\hat{k}}(t)$, denoted $\xi_{\hat{k}}$, is larger than the absolute value of the skewness of $\hat{s}_d^{p}(t)$, denoted $\xi_p$. Consequently, the index $\hat{k}$ of the target interval is the index corresponding to the maximum of $\{\xi_i\}_{i=1,\ldots,K}$.
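The interval search described above can be sketched as follows. The function name `select_interval`, the toy sources, and the parameter values are hypothetical and chosen for illustration. The sketch only shows the mechanics of (6), (4), and the skewness-based selection; how good the resulting initial estimate is depends on the signals.

```python
import numpy as np

def skewness(y):
    """Sample skewness of a one-dimensional signal."""
    z = (y - y.mean()) / y.std()
    return float(np.mean(z ** 3))

def select_interval(X, K):
    """Return (k_hat, a_init, xi) for mixtures X of shape (M, T).

    For each of the K intervals, a candidate column is formed by (6),
    the source is estimated by (4), and the absolute skewness is recorded.
    """
    M, T = X.shape
    L = T // K                                   # interval size, floor(T / K)
    Cx_inv = np.linalg.inv(X @ X.T / T)          # inverse sample covariance
    xi = np.empty(K)
    candidates = np.empty((K, M))
    for i in range(K):
        a_i = X[:, i * L:(i + 1) * L].mean(axis=1)   # equation (6), alpha = beta = 1
        s_i = a_i @ Cx_inv @ X                       # equation (4)
        candidates[i] = a_i
        xi[i] = abs(skewness(s_i))
    k_hat = int(np.argmax(xi))                   # interval with maximal absolute skewness
    return k_hat, candidates[k_hat], xi

# Toy usage: one skewed source and three Gaussian sources (assumed for illustration)
rng = np.random.default_rng(3)
T, K = 3000, 15
S = np.vstack([rng.exponential(1.0, T) - 1.0, rng.standard_normal((3, T))])
A = rng.standard_normal((4, 4))
X = A @ S
k_hat, a_init, xi = select_interval(X, K)
```

The returned `a_init` corresponds to the rough initial estimate of the desired column obtained from the selected interval.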
By means of the above method, $\mathbf{a}_d$ can be roughly estimated as $\hat{\mathbf{a}}_d^{\hat{k}}$ from the information in the $\hat{k}$th interval. To improve the estimation accuracy, we propose an iterative method that alternates between (4) and (2), starting from the initial value $\hat{\mathbf{a}}_d^{\hat{k}}$, until convergence. The outline of the proposed algorithm is shown below.
(1) Calculate $\mathbf{C}_x$ and its inverse.
(2) Divide the mixtures into $K$ intervals, compute $\hat{\mathbf{a}}_d^{i}$ by (6), and then obtain $\hat{s}_d^{i}(t)$ by (4) and the corresponding $\xi_i$, for $i = 1, \ldots, K$.
(3) Select the index $\hat{k}$ corresponding to the maximum of $\{\xi_i\}_{i=1,\ldots,K}$ and take $\hat{\mathbf{a}}_d^{\hat{k}}$ from (6) as the initial value.
(4) Obtain $\hat{s}_d(t)$ by (4) and determine the positive or negative support of $\hat{s}_d(t)$.
(5) Update the estimate of $\mathbf{a}_d$ by (2) using the positive or negative support found in step 4.
(6) Iterate steps 4 and 5 until convergence.
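Steps 4 to 6 of the outline can be sketched as the following refinement loop. Several choices here are assumptions rather than details given in the text: the function name `refine`, the use of the positive support only in step 5, the stopping rule based on the change in direction, and the toy data (a source with a right-triangular PDF mixed with Gaussian sources). For a controlled check, the loop is started from a deliberately perturbed version of the true column instead of the interval-based initial value.

```python
import numpy as np

def refine(X, a_init, max_iter=50, tol=1e-6):
    """Alternate equation (4) and equation (2), starting from a_init."""
    T = X.shape[1]
    Cx_inv = np.linalg.inv(X @ X.T / T)          # step 1: inverse sample covariance
    a = a_init / np.linalg.norm(a_init)
    for it in range(1, max_iter + 1):
        s_hat = a @ Cx_inv @ X                   # step 4: MMSE beamformer, equation (4)
        support = s_hat > 0                      # estimated positive support
        a_new = X[:, support].mean(axis=1)       # step 5: equation (2) with alpha = 1
        a_new /= np.linalg.norm(a_new)
        converged = 1.0 - abs(a_new @ a) < tol   # assumed stopping rule: direction change
        a = a_new
        if converged:                            # step 6: stop once the direction settles
            break
    return a, a @ Cx_inv @ X, it

# Toy data (assumed): skewed source with density 2(1 - x) on [0, 1], then standardized
rng = np.random.default_rng(4)
T = 20000
s0 = 1.0 - np.sqrt(rng.random(T))
s0 = (s0 - s0.mean()) / s0.std()
S = np.vstack([s0, rng.standard_normal((3, T))])
A = rng.standard_normal((4, 4))
X = A @ S

a_init = A[:, 0] + 0.3 * A[:, 1]                 # perturbed true column, for illustration only
a_est, s_est, n_iter = refine(X, a_init)
corr = abs(np.corrcoef(s_est, S[0])[0, 1])
```

In this toy case, `corr` indicates how closely the refined estimate follows the desired source up to sign and scale. The behavior for other source distributions or initial values is not claimed by this sketch.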

Simulation Results
We set the number of samples to 600 and rate to 300. We aim to extract the skewed source signal with the maximal absolute value of skewness from the $M$ mixtures.
FastICA [5] and the algorithm in [6] will be compared with the proposed algorithm.
We denote the algorithm in [6] as the first-order algorithm (FOA for short).
Figure 2 illustrates the average interference-to-signal ratio (ISR) (defined in [6]) versus $K$ obtained by the proposed algorithm over 200 Monte Carlo runs when $M = N$, $M = 19$, $M = 18$, and $M = 15$. Figure 2 shows that the extraction performance of the proposed algorithm is poor when $K$ is too small or too large. Since each interval should be neither too short nor too long, as discussed above, $K$ needs to be set properly. Based on Figure 2, we choose $K$ in (8, 33) empirically and set $K = 15$ in the following experiments. In Figure 3, we show the average ISR versus the iteration number obtained by FastICA, by FOA with $r = 0.6$ and $r = 0.8$ when $M = N$, and by the proposed algorithm when $M = N$, $M = 19$, $M = 18$, and $M = 15$ over 200 Monte Carlo runs, where $r$ is the correct index classification ratio defined in [6]. As depicted in Figure 3, the proposed algorithm extracts the desired source successfully after several iterations in both the determined and underdetermined cases. When $M = N$, the proposed algorithm has better extraction performance and a faster convergence rate than FastICA. Moreover, when $M = N$, the proposed algorithm converges faster than FOA with $r = 0.6$, and the proposed algorithm and FOA with $r = 0.6$ and $r = 0.8$ reach the same steady-state performance. Although FOA with $r = 0.8$ converges faster than the proposed algorithm, a priori knowledge satisfying $r = 0.8$ is hard to obtain for FOA. It can also be seen from Figures 2 and 3 that the extraction performance of the proposed algorithm deteriorates as the number of sensors, $M$, decreases. However, the proposed algorithm still performs well when $M$ is slightly less than $N$ in the underdetermined case.
We can see from Table 1 that the proposed algorithm has a lower computational cost than FastICA and that its cost decreases slightly as $M$ decreases. Table 1 also shows that FOA with $r = 0.6$ has a lower computational cost than the proposed algorithm. This is because FOA is based only on first-order statistics. However, FOA requires prior knowledge of the desired source that is difficult to obtain, and it cannot be applied in the underdetermined case. Therefore, the proposed algorithm is more practical than FOA.
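The ISR used in these comparisons is defined in [6] and is not reproduced in this section. As a rough illustration only, the following sketch computes one common form of such a ratio: the interference power over the desired-signal power in the global response of the extractor, assuming unit-variance sources. The function name `isr_db` and this particular formula are assumptions and may differ from the definition in [6].

```python
import numpy as np

def isr_db(a_hat, Cx, A, d):
    """Assumed ISR (in dB) of the extractor w = Cx^{-1} a_hat for source index d."""
    w = np.linalg.solve(Cx, a_hat)       # beamformer weights from equation (4)
    g = A.T @ w                          # global response: s_hat(t) = g^T s(t)
    signal = g[d] ** 2                   # power contributed by the desired source
    interference = np.sum(g ** 2) - signal
    return 10.0 * np.log10(interference / signal)

# Toy check with a fixed, invertible mixing matrix and unit-variance sources
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])
Cx = A @ A.T                             # covariance of x(t) when E{s s^T} = I
a_hat = A[:, 0] + 0.1 * A[:, 1]          # true column plus 10% leakage of another column
isr = isr_db(a_hat, Cx, A, d=0)
```

In this check the global response is $(1, 0.1, 0)$, so the ratio equals $0.01$, that is, $-20$ dB.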

Conclusion
In this paper, we proposed a cost-effective algorithm based on the conditional expectation for the extraction of the skewed source signal with the maximal absolute value of skewness. Simulation results confirm the superiority of the proposed algorithm and show that it performs well even in the underdetermined case when the number of sensors is close to the number of sources. Future work should extend this study to more complicated and more practical mixture models, such as the convolutive mixture and the nonlinear mixture.

Figure 1: The PDF of unimodal continuous skewed distribution, where ξ denotes the skewness and ξ > 0, ξ < 0, and ξ = 0 indicate the positively skewed distribution, the negatively skewed distribution, and the symmetric distribution, respectively.

Figure 2: The average ISR versus K obtained by the proposed algorithm over 200 Monte Carlo runs when M = N, M = 19, M = 18, and M = 15.
We consider five skewed source signals and fifteen symmetrically distributed source signals ($N = 20$). The five skewed source signals are generated as follows. Firstly, we generate five skewed signals based on the noncentral t-distributions