Mixture of Generalized Gamma Density-Based Score Function for FastICA


Introduction
By definition, independent component analysis (ICA) is the statistical method that searches for a linear transformation which effectively minimizes the statistical dependence between its components [1]. Under the physically plausible assumption of mutual statistical independence between these components, the most common application of ICA is blind signal separation (BSS). In its simplest form, BSS aims to recover a set of unknown signals, the so-called original sources $s(t) = [s_1(t), s_2(t), \ldots, s_n(t)]^T \in \mathbb{R}^n$, by relying exclusively on information that can be extracted from their linear and instantaneous mixtures $x(t) = [x_1(t), x_2(t), \ldots, x_m(t)]^T \in \mathbb{R}^m$, given by

$$x(t) = A s(t), \quad t = 1, 2, \ldots, m, \tag{1.1}$$

where $A \in \mathbb{R}^{m \times n}$ is an unknown mixing matrix of full rank and $m \ge n$. In doing so, BSS remains truly blind in the sense that very little to almost nothing is known a priori about the mixing matrix or the original source signals.
Often the sources are assumed to be zero-mean and unit-variance signals, with at most one having a Gaussian distribution. The problem of source estimation then boils down to determining the unmixing matrix $W \in \mathbb{R}^{n \times m}$ such that the linear transformation of the sensor observations is

$$u(t) = W x(t), \quad t = 1, 2, \ldots, n, \tag{1.2}$$

where $u(t) = [u_1(t), u_2(t), \ldots, u_n(t)]^T \in \mathbb{R}^n$ yields an estimate of the vector $s(t)$ corresponding to the original or true sources. In general, the majority of BSS approaches perform ICA by essentially optimizing the negative log-likelihood objective function with respect to the unmixing matrix $W$ such that

$$\mathcal{L}(u, W) = -\sum_{l=1}^{n} E\left[\log p_{u_l}(u_l)\right] - \log\left|\det W\right|, \tag{1.3}$$

where $E[\cdot]$ represents the expectation operator and $p_{u_l}(u_l)$ is the model for the marginal probability density function (pdf) of $u_l$, for all $l = 1, 2, \ldots, n$. Normally, the matrix $W$ is regarded as the parameter of interest and the pdfs of the sources are considered to be nuisance parameters. In effect, when the distribution of the sources is hypothesized correctly, the maximum likelihood (ML) principle leads to estimating functions, which in fact are the score functions of the sources [2]:

$$\phi_l(u_l) = -\frac{d}{du_l} \log p_{u_l}(u_l) = -\frac{p'_{u_l}(u_l)}{p_{u_l}(u_l)}. \tag{1.4}$$

In principle, the separation criterion in (1.3) can be optimized by any suitable ICA algorithm in which contrasts are utilized (see, e.g., [2]). A popular choice of such a contrast-based algorithm is the so-called fast (cubic) converging Newton-type (fixed-point) algorithm, normally referred to as FastICA [3]. As defined in [4], its update is driven by the vector of score functions $\phi(u) = [\phi_1(u_1), \ldots, \phi_n(u_n)]^T$, valid for all $l = 1, 2, \ldots, n$.

In the ICA framework, accurately estimating the statistical model of the sources at hand is still an open and challenging problem [2]. Practical BSS scenarios involve difficult source distributions and even situations where many sources with very different pdfs are mixed together. In this direction, a large number of parametric density models have been made available in the recent literature. Examples of such models include the generalized Gaussian density (GGD) [5], the generalized lambda density (GLD), and the generalized beta distribution (GBD), or even combinations and generalizations such as the super and generalized Gaussian mixture model (GMM) [6], the generalized gamma density (GΓD) [7], the Pearson family of distributions [4], and the so-called extended generalized lambda distribution (EGLD), which is an extended parameterization of the aforementioned GLD and GBD models [8]. In the following section, we propose the Mixture Generalized Gamma Density (MGΓD) for signal modeling in blind signal separation.

Mixture Generalized Gamma Density (MGΓD)
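A Mixture Generalized Gamma Density (MGΓD) is a parametric statistical model which assumes that the data originate from a weighted sum of several generalized gamma sources [9]. More specifically, an MGΓD is defined as

$$p(u \mid \theta) = \sum_{i=1}^{k} m_i \, p_i(u \mid c_i, \alpha_i, \beta_i, \gamma_i), \tag{2.1}$$

where $k$ is the number of mixture density components, $m_i$ is the $i$th mixture weight, satisfying $m_i \ge 0$ and $\sum_{i=1}^{k} m_i = 1$, and $p_i(u \mid c_i, \alpha_i, \beta_i, \gamma_i)$ is an individual generalized gamma density (GΓD), which is characterized by [7]

$$p_i(u \mid c_i, \alpha_i, \beta_i, \gamma_i) = \frac{\gamma_i \beta_i^{\alpha_i}}{2\Gamma(\alpha_i)} \, |u - c_i|^{\alpha_i \gamma_i - 1} \exp\left(-\beta_i |u - c_i|^{\gamma_i}\right), \tag{2.2}$$

where $c_i$ is the location parameter, $\beta_i > 0$ is the scale parameter, $\alpha_i > 0$ is the shape/power parameter, and $\gamma_i > 0$ is the shape parameter. $\Gamma(\alpha)$ is the gamma function, defined by

$$\Gamma(\alpha) = \int_0^{\infty} t^{\alpha - 1} e^{-t} \, dt.$$

By varying the parameters, it is possible to characterize a large class of distributions, such as Gaussian, super-Gaussian (more peaked than Gaussian, with heavier tails), and sub-Gaussian (flatter, more uniform). Note that for $\gamma = 1$ the GΓD yields the gamma density as a special case. Furthermore, if $\gamma = 2$ and $\alpha = 0.5$, it becomes the Gaussian pdf, and if $\gamma = 1$ and $\alpha = 1$, it represents the Laplacian pdf.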
Figures 1 and 2 show some examples of the MGΓD pdf for $k = 1$ and $k = 2$. Thanks to its shape parameters, the MGΓD is flexible and can approximate a large class of statistical distributions. This distribution requires estimating $5k$ parameters, $\theta = \{m_i, c_i, \alpha_i, \beta_i, \gamma_i\}$, $i = 1, 2, \ldots, k$. We discuss the estimation of these parameters in detail in the following section.
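As an illustration of the model, the following is a minimal sketch of how the mixture density could be evaluated numerically. It assumes the two-sided generalized gamma form $\frac{\gamma\beta^{\alpha}}{2\Gamma(\alpha)}|u-c|^{\alpha\gamma-1}e^{-\beta|u-c|^{\gamma}}$, which is consistent with the Gaussian and Laplacian special cases stated above; the function names `ggamma_pdf` and `mggd_pdf` are hypothetical and chosen only for this example.

```python
import numpy as np
from scipy.special import gammaln, xlogy


def ggamma_pdf(u, c, alpha, beta, gamma):
    """Two-sided generalized gamma density (assumed form, see the lead-in)."""
    r = np.abs(np.asarray(u, dtype=float) - c)
    # Work in the log domain; xlogy(x, r) = x*log(r) and returns 0 when x == 0,
    # which avoids 0 * log(0) at u == c when alpha * gamma == 1.
    log_p = (np.log(gamma) + alpha * np.log(beta) - np.log(2.0)
             - gammaln(alpha)
             + xlogy(alpha * gamma - 1.0, r)
             - beta * r ** gamma)
    return np.exp(log_p)


def mggd_pdf(u, weights, c, alpha, beta, gamma):
    """Weighted sum of k generalized gamma components."""
    u = np.asarray(u, dtype=float)
    components = [m * ggamma_pdf(u, ci, ai, bi, gi)
                  for m, ci, ai, bi, gi in zip(weights, c, alpha, beta, gamma)]
    return np.sum(components, axis=0)


# Illustrative mixture: a Laplacian component at 0 and a Gaussian component at 10.
grid = np.linspace(-40.0, 40.0, 160001)
density = mggd_pdf(grid, weights=[0.25, 0.75], c=[0.0, 10.0],
                   alpha=[1.0, 0.5], beta=[1.0, 0.5], gamma=[1.0, 2.0])
total_mass = np.sum(density) * (grid[1] - grid[0])  # Riemann sum, close to 1
```

Evaluating in the log domain is a design choice for numerical stability when $\alpha_i$ is large or $|u - c_i|$ is far in the tail; it does not change the density itself.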

Numerical Optimization of the Log-Likelihood Function to Estimate MGΓD Parameters
In this section we propose a generalization of the method in [9], which addresses only the case of two components, by setting the derivatives of the log-likelihood function to zero. The log-likelihood function of (2.2) is given in [10] as (3.1), where $N$ is the sample size and $h_{i,j} = p(i \mid u_j)$, $i = 1, \ldots, k$, $j \in [0, N-1]$, represents the conditional expectation of $p_i$ given the observation $u_j$, that is, the posterior probability that $u_j$ belongs to the $i$th component. In the case of the generalized gamma distribution, substituting (2.2) into (3.1) yields, after some manipulation, the form (3.2).
Accordingly, for $i = 1, 2, \ldots, k$, differentiating the log-likelihood function with respect to $c_i$, $\alpha_i$, $\beta_i$, and $\gamma_i$ and setting these derivatives to zero gives the nonlinear equations (3.3) and (3.4) for the estimated parameters, where $\Psi(\cdot)$ is the digamma function, $\Psi(x) = \Gamma'(x)/\Gamma(x)$. After a little mathematical manipulation, the ML estimate of $\gamma_i$ is obtained from (3.5).
Given the estimate of $\gamma_i$, it is straightforward to derive the estimates of $\alpha_i$, $\beta_i$, and $c_i$. Let $\hat{\gamma}_i$ be the estimate of $\gamma_i$. Then (3.6) and (3.7) give $\hat{\alpha}_i$ and $\hat{\beta}_i$, the resulting estimates of $\alpha_i$ and $\beta_i$, respectively; to estimate the location parameter, we solve (3.4) by gradient ascent. The estimate of the weight coefficient is obtained directly from $h_{i,j}$ as follows [10]:

$$\hat{m}_i = \frac{1}{N} \sum_{j=0}^{N-1} h_{i,j}. \tag{3.8}$$

However, (3.5) cannot be solved easily, so we adopt the gradient ascent algorithm to obtain the estimate of $\gamma_i$ and then determine the estimates of $\alpha_i$, $\beta_i$, and $c_i$ uniquely using this value of $\gamma_i$.
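The stationarity conditions can be sketched explicitly under two assumptions that the surviving text does not confirm: that each component has the two-sided form $p_i(u) = \frac{\gamma_i\beta_i^{\alpha_i}}{2\Gamma(\alpha_i)}|u-c_i|^{\alpha_i\gamma_i-1}e^{-\beta_i|u-c_i|^{\gamma_i}}$, and that the log-likelihood is the posterior-weighted sum over components. Under these assumptions, a worked version of the derivation reads:

```latex
\begin{align*}
L(\theta) &= \sum_{j=0}^{N-1}\sum_{i=1}^{k} h_{i,j}\Big[\log m_i + \log\gamma_i
   + \alpha_i\log\beta_i - \log 2 - \log\Gamma(\alpha_i) \\
 &\qquad\qquad + (\alpha_i\gamma_i - 1)\log|u_j - c_i|
   - \beta_i|u_j - c_i|^{\gamma_i}\Big], \\
\frac{\partial L}{\partial\alpha_i} &= \sum_{j} h_{i,j}\Big[\log\beta_i - \Psi(\alpha_i)
   + \gamma_i\log|u_j - c_i|\Big] = 0, \\
\frac{\partial L}{\partial\beta_i} &= \sum_{j} h_{i,j}\Big[\frac{\alpha_i}{\beta_i}
   - |u_j - c_i|^{\gamma_i}\Big] = 0
   \;\Longrightarrow\;
   \beta_i = \frac{\alpha_i\sum_j h_{i,j}}{\sum_j h_{i,j}\,|u_j - c_i|^{\gamma_i}}, \\
\frac{\partial L}{\partial\gamma_i} &= \sum_{j} h_{i,j}\Big[\frac{1}{\gamma_i}
   + \alpha_i\log|u_j - c_i|
   - \beta_i|u_j - c_i|^{\gamma_i}\log|u_j - c_i|\Big] = 0.
\end{align*}
```

Eliminating $\beta_i$ from the last condition leaves $\alpha_i$ as an explicit function of $\gamma_i$ (for fixed $c_i$), which is consistent with the strategy of first estimating $\gamma_i$ by a one-dimensional search and then recovering the remaining parameters from it.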
An alternative numerical method for estimating the parameters is the Nelder-Mead (NM) method. The appeal of the NM optimization technique lies in the fact that it can minimize the negative of the log-likelihood objective function given in (3.2) essentially without relying on any derivative information. Despite the danger of unreliable performance (especially in high dimensions), numerical experiments have shown that the NM method can converge to an acceptably accurate solution with substantially fewer function evaluations than multidirectional search or descent methods. Because of this good numerical performance and the significant reduction in computational complexity, optimization with the NM technique produces good estimates of the MGΓD parameters. To show the performance of NM, we consider the next example.
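A derivative-free fit of this kind can be sketched with SciPy's Nelder-Mead implementation. The example below is only an illustration for a single component ($k = 1$) and assumes the two-sided density $\frac{\gamma\beta^{\alpha}}{2\Gamma(\alpha)}|u-c|^{\alpha\gamma-1}e^{-\beta|u-c|^{\gamma}}$; the log-parameterization, the starting point, and the name `neg_log_likelihood` are choices made for this sketch, not part of the method described above.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln


def neg_log_likelihood(params, u):
    """Negative log-likelihood of one generalized gamma component (k = 1)."""
    c, log_alpha, log_beta, log_gamma = params
    # Optimizing the logarithms keeps alpha, beta, and gamma positive.
    alpha, beta, gamma = np.exp([log_alpha, log_beta, log_gamma])
    r = np.abs(u - c) + 1e-12  # small guard against log(0)
    log_p = (np.log(gamma) + alpha * np.log(beta) - np.log(2.0)
             - gammaln(alpha)
             + (alpha * gamma - 1.0) * np.log(r)
             - beta * r ** gamma)
    return -np.sum(log_p)


rng = np.random.default_rng(0)
# Laplacian data correspond to gamma = 1 and alpha = 1 (here beta = 1, c = 3).
u = rng.laplace(loc=3.0, scale=1.0, size=5000)

# Deliberately imperfect starting point: median for c, shapes away from 1.
start = np.array([np.median(u), 0.3, 0.0, 0.3])
result = minimize(neg_log_likelihood, start, args=(u,), method="Nelder-Mead",
                  options={"maxiter": 4000, "maxfev": 4000,
                           "xatol": 1e-6, "fatol": 1e-6})

c_hat = result.x[0]
alpha_hat, beta_hat, gamma_hat = np.exp(result.x[1:])
```

Because $\alpha$, $\beta$, and $\gamma$ can partly compensate for one another, individual shape estimates may vary between runs even when the fitted density is close to the data; comparing the objective value at the solution against the starting value is the more robust check.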

Example
We generate random numbers from an MGΓD with parameters $k = 2$, $m_1 = 0.25$, $m_2 = 0.75$, $\alpha_1 = 0.5$, $\alpha_2 = 0.5$, $\beta_1 = 2$, $\beta_2 = 2$, $\gamma_1 = 1$, $\gamma_2 = 4$, $c_1 = 0$, and $c_2 = 10$. By performing NM, we obtain the best estimates of the parameters. Table 1 shows the five best sets of estimated parameters, sorted according to the value of the objective function. In the following section, we resort to the FastICA algorithm for blind signal separation (BSS); this algorithm depends on the estimated parameters and on an unmixing matrix $W$, which is estimated by FastICA.
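One way to draw such samples, sketched here under the assumption that each component has the two-sided form $\frac{\gamma\beta^{\alpha}}{2\Gamma(\alpha)}|u-c|^{\alpha\gamma-1}e^{-\beta|u-c|^{\gamma}}$, is to transform gamma variates: if $T \sim \mathrm{Gamma}(\alpha, 1)$, then $|u - c| = (T/\beta)^{1/\gamma}$ with a random sign. The helper name `sample_mggd` is hypothetical.

```python
import numpy as np


def sample_mggd(n, weights, c, alpha, beta, gamma, rng):
    """Draw n samples from a mixture of two-sided generalized gamma components."""
    weights = np.asarray(weights, dtype=float)
    c, alpha, beta, gamma = (np.asarray(a, dtype=float)
                             for a in (c, alpha, beta, gamma))
    # Pick a component index for every sample according to the mixture weights.
    comp = rng.choice(len(weights), size=n, p=weights)
    # Gamma(alpha, 1) variate mapped to the radius |u - c|.
    t = rng.gamma(shape=alpha[comp], scale=1.0)
    radius = (t / beta[comp]) ** (1.0 / gamma[comp])
    sign = rng.choice([-1.0, 1.0], size=n)
    return c[comp] + sign * radius, comp


rng = np.random.default_rng(1)
# Parameters of the example: k = 2, weights 0.25 / 0.75, locations 0 and 10.
samples, labels = sample_mggd(
    20000, weights=[0.25, 0.75], c=[0.0, 10.0],
    alpha=[0.5, 0.5], beta=[2.0, 2.0], gamma=[1.0, 4.0], rng=rng)
```

The returned `labels` are not needed for estimation; they are kept only so that the empirical component proportions can be compared with the weights.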

Application of MGΓD in Blind Signal Separation
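A novel flexible score function is obtained by substituting (2.1) into (1.4) for the source estimates $u_l$, $l = 1, 2, \ldots, n$. It quickly becomes obvious that the proposed score function inherits a generalized parametric structure, which in turn can be attributed to the highly flexible MGΓD parent model. In this case, simple calculus yields the flexible BSS score function

$$\phi_l(u_l \mid \theta) = \frac{\displaystyle\sum_{i=1}^{k} m_i \, p_i(u_l \mid c_i, \alpha_i, \beta_i, \gamma_i) \left[ \beta_i \gamma_i \, \operatorname{sign}(u_l - c_i) \, |u_l - c_i|^{\gamma_i - 1} - \frac{\alpha_i \gamma_i - 1}{u_l - c_i} \right]}{\displaystyle\sum_{i=1}^{k} m_i \, p_i(u_l \mid c_i, \alpha_i, \beta_i, \gamma_i)}. \tag{4.1}$$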
In principle, $\phi_l(u_l \mid \theta)$ is capable of modeling a large number of signals, such as speech or communication signals, as well as various other types of challenging heavy- and light-tailed distributions. This is because its characterization depends explicitly on all parameters $m_i$, $c_i$, $\alpha_i$, $\beta_i$, $\gamma_i$, $i = 1, 2, \ldots, k$. Other commonly used score functions can be obtained by substituting appropriate values for $m_i$, $c_i$, $\alpha_i$, $\beta_i$, and $\gamma_i$ in (4.1). For instance, when $k = 1$, we have the score function [2]

$$\phi_l(u_l \mid \theta) = \beta_1 \gamma_1 \, \operatorname{sign}(u_l - c_1) \, |u_l - c_1|^{\gamma_1 - 1} - \frac{\alpha_1 \gamma_1 - 1}{u_l - c_1}. \tag{4.2}$$

The function $\phi_l(u_l \mid \theta)$ could become singular in some special cases, essentially those corresponding to heavy-tailed or sparse distributions defined for $\alpha_i \gamma_i \ge 1$ with $\alpha_i = 1$ and $|u_l - c_i| \to 0$. In practice, to circumvent this deficiency, the denominator in (4.1) can be modified slightly by introducing a small positive parameter $\varepsilon$ (typically around $10^{-4}$), which, when put to use, almost always guarantees that the discontinuity of (4.1) for values in or approaching the region $|u_l - c_i| = 0$ is completely avoided.
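To make the structure of the score function concrete, the following sketch evaluates $-\frac{d}{du}\log p(u \mid \theta)$ for a mixture whose components are assumed to have the two-sided form $\frac{\gamma_i\beta_i^{\alpha_i}}{2\Gamma(\alpha_i)}|u-c_i|^{\alpha_i\gamma_i-1}e^{-\beta_i|u-c_i|^{\gamma_i}}$. The way $\varepsilon$ enters (added to $|u - c_i|$) is an assumption made for this example, and `mggd_score` is a hypothetical name.

```python
import numpy as np
from scipy.special import gammaln


def mggd_score(u, weights, c, alpha, beta, gamma, eps=1e-4):
    """Score function -d/du log p(u) of the mixture, regularized by eps."""
    u = np.asarray(u, dtype=float)[..., None]            # shape (..., 1)
    m, c, alpha, beta, gamma = (np.asarray(a, dtype=float)
                                for a in (weights, c, alpha, beta, gamma))
    d = u - c                                            # shape (..., k)
    r = np.abs(d) + eps  # assumed regularization: keeps |u - c_i| away from 0
    log_p = (np.log(gamma) + alpha * np.log(beta) - np.log(2.0)
             - gammaln(alpha)
             + (alpha * gamma - 1.0) * np.log(r)
             - beta * r ** gamma)
    # Normalized weights m_i p_i / sum_l m_l p_l, computed stably in the log domain.
    log_w = np.log(m) + log_p
    log_w -= log_w.max(axis=-1, keepdims=True)
    w = np.exp(log_w)
    w /= w.sum(axis=-1, keepdims=True)
    # Per-component score: beta*gamma*sign(d)*r^(gamma-1) - (alpha*gamma-1)*sign(d)/r.
    comp_score = np.sign(d) * (beta * gamma * r ** (gamma - 1.0)
                               - (alpha * gamma - 1.0) / r)
    return np.sum(w * comp_score, axis=-1)


# k = 1 with gamma = 2, alpha = 0.5, beta = 0.5 is the unit Gaussian: score(u) ~ u.
gaussian_score = mggd_score(1.5, [1.0], [0.0], [0.5], [0.5], [2.0])
# k = 1 with gamma = 1, alpha = 1 is the Laplacian: score(u) = sign(u).
laplacian_score = mggd_score(-2.0, [1.0], [0.0], [1.0], [1.0], [1.0])
```

For $k = 1$ the normalized weight is identically one, so the function reduces to the single-component score; the two calls at the end check this against the familiar Gaussian and Laplacian scores.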

Numerical Experiments
To investigate the separation performance of the proposed MGΓD-based FastICA BSS method, a set of numerical experiments is performed. We consider only the two cases $k = 1$ and $k = 2$, which are illustrated in the following two examples.

Example 1
In this example, $k = 1$, and the data set consists of different realizations of independent signals with the distributions shown in Table 2. Note that this is a large-scale and substantially difficult separation problem, since it involves a Gaussian, various super- and sub-Gaussian symmetric PDFs, as well as asymmetric distributions. In all cases, the number of data samples has also been chosen to be relatively small, for example, $N = 250$. The source signals are mixed noiselessly with randomly generated full-rank mixing matrices $A$. The FastICA method is implemented in the so-called simultaneous separation mode, and the stopping criterion is set to $\varepsilon = 10^{-4}$. FastICA is executed using the flexible MGΓD model.

Example 2
In this example, $k = 2$, and the data set consists of different realizations of independent signals. Note that each signal is not a single Gaussian, super-Gaussian, or sub-Gaussian PDF but a mixture of such PDFs, as shown in Table 3.

Algorithm Performance
The separation performance of the ICA algorithm is evaluated with the crosstalk error measure, reported here as a performance index (PI). Here, $p_{ij}$ denotes the elements of the permutation matrix $P = WA$, which, assuming that all sources have been successfully separated, should ideally reduce to a permuted and scaled version of the identity matrix. The separation performance is PI = −12.68 dB for the first example and PI = −16.68 dB for the second example.
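A crosstalk measure of this kind can be sketched as follows. The row and column sums below follow the commonly used Amari-type definition; the normalization by $2n(n-1)$, the conversion to decibels, and the name `crosstalk_error_db` are assumptions of this example and may differ from the exact formula used for the reported values.

```python
import numpy as np


def crosstalk_error_db(W, A):
    """Crosstalk error of P = W A in dB (assumed normalization, see lead-in)."""
    P = np.abs(W @ A)
    n = P.shape[0]
    # For each row / column: sum of entries relative to the largest one, minus 1.
    row_terms = np.sum(P / P.max(axis=1, keepdims=True), axis=1) - 1.0
    col_terms = np.sum(P / P.max(axis=0, keepdims=True), axis=0) - 1.0
    index = (row_terms.sum() + col_terms.sum()) / (2.0 * n * (n - 1.0))
    # Tiny floor so that a perfect separation does not produce log10(0).
    return 10.0 * np.log10(index + 1e-300)


A = np.array([[1.0, 0.5, 0.2],
              [0.3, 1.0, 0.4],
              [0.6, 0.1, 1.0]])

# No unmixing at all: P = A, so the off-diagonal crosstalk is fully present.
pi_unseparated = crosstalk_error_db(np.eye(3), A)

# Ideal unmixing up to permutation and scaling: P is a scaled permutation matrix.
perm_scale = np.array([[0.0, 2.0, 0.0],
                       [0.0, 0.0, -0.5],
                       [3.0, 0.0, 0.0]])
pi_ideal = crosstalk_error_db(perm_scale @ np.linalg.inv(A), A)
```

More negative values indicate less residual crosstalk; the measure is invariant to the permutation and scaling ambiguities that any ICA solution carries.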

Conclusions
We have derived a novel parametric family of flexible score functions based exclusively on the MGΓD model. To calculate the parameters of these functions in an adaptive BSS setup, we have chosen to maximize the likelihood with the NM optimization method. This alleviates excessive computational cost and allows for a fast practical implementation of FastICA. Simulation results show that the proposed approach is capable of separating mixtures of signals.

Figure 5

Let the mixing matrix $A$ and the unmixing matrix $W$ be defined as in (5.1). From the equation $x = As$, we obtain the mixed signals shown in Figure 3, where the mixed signals are on the left and the source signals are on the right. After applying FastICA, we recover the sources; Figure 4 shows the estimated signals on the left and the original signals on the right, which differ in scale only.

Table 1
When $\alpha_1 \gamma_1 = 1$ and $\beta_1 = 1$, a scaled form of the GGD-based score function is obtained as a special case of (4.2).