Kernel Least Logarithmic Absolute Difference Algorithm

Kernel adaptive filtering (KAF) algorithms derived from the second-moment error criterion perform very well in nonlinear system identification under the assumption of Gaussian observation noise; however, they inevitably suffer from severe performance degradation in the presence of non-Gaussian impulsive noise and interference. To resolve this dilemma, we propose a novel robust kernel least logarithmic absolute difference (KLLAD) algorithm based on a logarithmic error cost function in reproducing kernel Hilbert spaces, taking the non-Gaussian impulsive noise into account. The KLLAD algorithm shows considerable improvement in robustness and convergence speed over existing KAF algorithms that do not restrain impulsive interference. Moreover, the convergence condition of the KLLAD algorithm with Gaussian kernel and fixed dictionary is presented in the mean sense. The superior performance of the KLLAD algorithm is confirmed by simulation results.


Introduction
Kernel adaptive filters, as a tremendous breakthrough beyond conventional linear adaptive filters, have been widely used in many practical nonlinear applications, including time series prediction [1], acoustic echo cancellation [2], channel equalization [3], abnormal event detection [4], etc. The scheme of kernel adaptive filtering (KAF) is to map the original input data into a high- or infinite-dimensional feature space via a kernel function and then apply the framework of typical linear adaptive filtering to the transformed data in the reproducing kernel Hilbert space (RKHS), leading to various KAF algorithms [5][6][7][8]. The kernel least-mean-square (KLMS) algorithm, the benchmark among KAF algorithms, is developed from the cost function of the second-order statistic of the error between the desired signal and the instantaneous estimate under the assumption of Gaussian noise, for its mathematical simplicity and convenience [9]. Consequently, the performance of KLMS-type algorithms, designed only for Gaussian disturbance noise, severely degrades when the desired signals are corrupted by impulsive interference. In practical applications, e.g., underwater acoustic signal processing [10], wireless communication environments [11], and radar clutter elimination [12], impulsive noises with statistical characteristics of infrequent occurrence, short duration, and high amplitude are more realistic than ideal Gaussian noises.
For conventional linear robust adaptive filtering, the signed adaptive filters and their theoretical analyses have been extensively studied in [13][14][15]. The generalized maximum correntropy criterion (GMCC) algorithm with a generalized Gaussian density function was proposed in [16], and its stability and steady-state mean square performance were also investigated. The authors of [17] proposed the least logarithmic absolute difference (LLAD) algorithm with the correntropy-induced metric (CIM) constraint in order to exploit system sparsity and suppress interference. The minimum kernel risk-sensitive loss (MKRSL) algorithm was developed to achieve fast convergence by applying a risk-sensitive loss while remaining robust to outliers [18]. When the shape parameter is equal to 2 and the risk-sensitive parameter tends to zero, both the GMCC and MKRSL algorithms reduce to the ordinary maximum correntropy criterion (MCC) algorithm. The constrained least mean logarithmic square (CLMLS) algorithm, based on a relative logarithmic cost function, and its variants were proposed in [19] and applied to sparse sensor array synthesis, achieving the desired beam pattern with far fewer sensor elements. In [20], a robust least mean logarithmic square (RLMLS) algorithm and its variable step-size variant were presented to combat impulsive noises, and the theoretical mean square performance was analyzed for stationary white Gaussian inputs.
Meanwhile, the topic of nonlinear system identification corrupted by impulsive noise has also attracted considerable research interest [21]. It is thus important and necessary to study robust KAF algorithms in order to resolve the performance divergence caused by non-Gaussian impulsive noise. In [22], the kernel maximum correntropy criterion (KMCC) algorithm was developed by introducing the framework of MCC into the RKHS in the presence of impulsive noises. By combining the kernel method and the affine projection sign algorithm, the kernel affine projection sign algorithm (KAPSA) was proposed in [23] to combat non-Gaussian impulsive interference. An improved variable forgetting factor recursive logarithmic least mean p-th power (IVFF-RLLMP) algorithm was developed in [24] for Volterra system identification against impulsive interference modeled by the α-stable distribution. As a counterpart of the GMCC algorithm, the generalized kernel maximum correntropy (GKMC) and quantized GKMC (QGKMC) algorithms were developed in [25] for robust nonlinear adaptive filtering. In [26], the quantized minimum kernel risk-sensitive loss (QMKRSL) algorithm was proposed to achieve better and more robust nonlinear filtering performance against outliers. Motivated by the studies in [27,28] on the Cauchy loss, which has been successfully used in various robust learning applications, the multikernel minimum Cauchy kernel loss (MKMCKL) algorithm was reported in [29], showing improved nonlinear filtering performance over its single-kernel counterpart in the presence of extreme outliers. Recently, the kernel affine projection-like (KAPL) algorithm in the RKHS was proposed and investigated for nonlinear channel equalization in non-Gaussian noise scenarios [30]. The kernel least mean p-power (KLMP) algorithm was proposed independently in [31,32] to alleviate the adverse impact of impulsive noise.
More recently, the Nyström kernel recursive generalized maximum correntropy (NKRGMC) algorithm with probability-density-rank-based quantization sampling was proposed to improve convergence performance under impulsive noises [33]. The tracking behavior of the kernel signed error algorithm (KSEA) with Gaussian kernel for time-variant nonlinear systems was analyzed thoroughly under the assumption of impulsive noise [34]. In addition, the logarithmic hyperbolic cosine-based adaptive filter (LHCAF) was proposed in [35] to address the instability of its prototype algorithm, and transient and steady-state analyses were also provided. Subsequently, the authors of [36] proposed the multiple random Fourier features Cauchy-loss conjugate gradient (MRFGCG) algorithm, which outperforms the classical KAF algorithms in terms of computational complexity and filtering accuracy. Therefore, cost functions adopting fractional-order error statistics or distinct types of error measures provide effective ways to achieve robust performance against impulsive noises. Inspired by the family of linear adaptive filtering algorithms based on the logarithmic cost proposed in [37], our purpose is to extend this scheme into the RKHS to obtain a robust KAF algorithm, particularly for non-Gaussian impulsive noise environments. In this paper, the kernel least logarithmic absolute difference algorithm, based on the logarithmic error cost framework, is proposed for nonlinear system identification in impulsive interference environments, which are frequently encountered in practical applications. Simulation results illustrate that the proposed KLLAD algorithm can consistently suppress the drastic perturbation of the recursive weight coefficients caused by instantaneous estimation errors of large amplitude and low probability.
Notation: We use normal-font small letters x for scalar variables, boldface small letters x for column vectors, and boldface capital letters X for matrices. The superscript (·)^⊤ represents the transpose of a vector or a matrix. The expectation is denoted by E{·}, and the matrix trace is denoted by tr{·}. The Gaussian distribution with mean μ and variance σ² is denoted by N(μ, σ²). The notation ‖·‖₂ is the ℓ₂-norm of its matrix or vector argument; ‖x‖²_A is the weighted squared value x^⊤Ax. The notation sgn{·} is the signum function. The operator eig_max{X} denotes the maximum eigenvalue of matrix X. The identity matrix of size N × N is denoted by I_N.

Preliminaries of KAF Algorithms
Let H denote a Hilbert space of real-valued functions ψ on a compact subspace U ⊂ R^L. The function κ: U × U ⟶ R is a reproducing kernel, and (H, 〈·, ·〉_H) is the induced RKHS with its inner product. The following unknown nonlinear system is considered:

y_n = f⋆(x_n) + z_n.  (1)

The scalar y_n and the vector x_n ∈ R^L denote the desired output and the input signal, respectively. Here, f⋆ represents the optimum functional to be identified, and z_n stands for the non-Gaussian impulsive noise modeled as the contaminated-Gaussian (CG) noise as follows [13,38]:

z_n = v_n + b_n ϵ_n,  (2)

where both v_n and ϵ_n are statistically independent zero-mean white Gaussian noises with variances σ²_v and σ²_ϵ = Kσ²_v, with parameter 1 ≪ K. Moreover, the random sequence b_n is drawn from a Bernoulli random process with probabilities Pr(b_n = 1) = p_r and Pr(b_n = 0) = 1 − p_r. Notice that v_n and b_n ϵ_n represent the common Gaussian noise and the particular impulsive interference component, respectively. Subsequently, the probability density function (pdf) of the impulsive CG noise z_n is given by

p_z(z) = ((1 − p_r)/√(2πσ²_v)) exp(−z²/(2σ²_v)) + (p_r/√(2π(σ²_v + σ²_ϵ))) exp(−z²/(2(σ²_v + σ²_ϵ))).  (3)

When p_r = 0, the impulsive CG noise z_n reduces to a common white Gaussian noise with zero mean and variance σ²_v, namely, the impulsive-interference-free case. Correspondingly, the variance of the impulsive noise z_n is given by

σ²_z = σ²_v + p_r σ²_ϵ = (1 + p_r K) σ²_v.  (4)

In addition, z_n is assumed to be statistically independent of any other signals. The reason why we adopt the CG model is that it makes the analysis of the mean stability of the KLLAD algorithm mathematically tractable.
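The CG noise model (2) can be sampled directly. The following Python sketch (function name and default values are illustrative, not from the paper) draws z_n = v_n + b_n ϵ_n and can be checked against the theoretical variance (4):

```python
import numpy as np

def cg_noise(n_samples, sigma_v2=1.0, K=1e4, p_r=0.05, rng=None):
    """Sample contaminated-Gaussian (CG) impulsive noise z_n = v_n + b_n * eps_n.

    v_n   ~ N(0, sigma_v2)       : background Gaussian noise
    eps_n ~ N(0, K * sigma_v2)   : large-variance impulsive component
    b_n   ~ Bernoulli(p_r)       : occurrence indicator of an impulse
    """
    rng = np.random.default_rng(rng)
    v = rng.normal(0.0, np.sqrt(sigma_v2), n_samples)
    eps = rng.normal(0.0, np.sqrt(K * sigma_v2), n_samples)
    b = rng.random(n_samples) < p_r
    return v + b * eps

# Theoretical variance from (4): sigma_z^2 = (1 + p_r * K) * sigma_v^2 = 501.
z = cg_noise(200_000, sigma_v2=1.0, K=1e4, p_r=0.05, rng=0)
```

Only about p_r of the samples are impulsive, but they dominate the variance, which is exactly the regime where second-moment criteria break down.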
Given a sample set of pairs of input vectors and desired output scalars, i.e., {x_n, y_n}_{n=1}^N, we aim at estimating a nonlinear regression function ψ that relates the input data x_n and the output data y_n corrupted by the impulsive noise z_n. Let H be an RKHS with kernel κ. By virtue of the representer theorem [39], the function ψ in H that minimizes the regularized least-squares problem

min_{ψ∈H} Σ_{n=1}^N (y_n − ψ(x_n))² + ε‖ψ‖²_H,  (5)

with ε a nonnegative regularization constant, can be written as a kernel expansion in terms of the available training data in the form

ψ = Σ_{n=1}^N α_n κ(·, x_n).  (6)

Setting ε = 0 does not affect the derivation of the algorithm. It can be seen from (5) and (6) that the functional representation is formulated in parametric vector form. However, solution (6) is infeasible to implement in an online manner, because the algorithm cannot cope with the linear increase of the model size N as the latest input data become available. A commonly used strategy is to adopt a finite-order model of the form [5]

ψ = Σ_{m=1}^{M_n} w_m κ(·, x_{ω,m}).  (7)
The set of selected input data D_n = {x_{ω,m}}_{m=1}^{M_n} is the so-called online dictionary with length M_n, generated from the input vectors x_n in an online manner by informative criteria, e.g., the coherence criterion [1], the surprise criterion [5], and approximate linear dependency [40]. Note that (6) appears identical to (7) in form, whereas the length M_n determined by the online sparsification criterion is analogous to the order of a transversal filter. Without loss of generality, we choose the coherence criterion, which allows the candidate x_n to be inserted into the dictionary if its maximum coherence remains below a given threshold δ_κ, namely,

max_{m=1,...,M_n} |κ(x_n, x_{ω,m})| ≤ δ_κ,  (8)

where 0 < δ_κ ≤ 1 determines both the level of sparsity and the coherence of the dictionary. Based on the stochastic gradient of a cost function J(e_n) with respect to w, using the parametric finite-order model (7), we can then obtain the corresponding recursive update equation of the KAF algorithms:

w_{n+1} = w_n − η ∇_w J(e_n).  (9)

With the positive step-size η and the weight coefficient vector w_n = [w_n(1), w_n(2), . . . , w_n(M_n)]^⊤, the instantaneous estimation error e_n is given by

e_n = y_n − w_n^⊤ κ_{ω,n}.  (10)

Moreover, the kernelized input vector κ_{ω,n} is defined by

κ_{ω,n} = [κ(x_n, x_{ω,1}), κ(x_n, x_{ω,2}), . . . , κ(x_n, x_{ω,M_n})]^⊤.  (11)

Adopting the mean-squared error (MSE) criterion in (9) leads to the KLMS algorithm:

w_{n+1} = w_n + η e_n κ_{ω,n},  (12)

where the complete KLMS algorithm with the online sparsification criterion is not presented for clarity. It can easily be observed from (12) that KLMS-type algorithms suffer from severe performance degradation and even divergence caused by the instantaneous estimation error given in (10) when it is contaminated by the non-Gaussian impulsive noise defined in (2). It is thus critical for the design of a robust KAF algorithm to effectively suppress the adverse effects of impulsive interference noise.
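The preliminaries above can be condensed into a short sketch. The following Python code (all names and parameter values are illustrative, not from the paper) implements the KLMS recursion (12) with the coherence-criterion dictionary (8), using a Gaussian kernel:

```python
import numpy as np

def gaussian_kernel(x, D, xi):
    """kappa(x, x') = exp(-||x - x'||^2 / (2 xi^2)) against each dictionary atom."""
    return np.exp(-np.sum((D - x) ** 2, axis=1) / (2 * xi ** 2))

def klms_coherence(X, y, eta=0.2, xi=0.5, delta=0.9):
    """KLMS with coherence-criterion sparsification, cf. (8) and (10)-(12)."""
    D = [X[0]]                    # online dictionary D_n
    w = np.zeros(1)               # weight coefficient vector w_n
    errors = []
    for x_n, y_n in zip(X[1:], y[1:]):
        k = gaussian_kernel(x_n, np.asarray(D), xi)   # kernelized input (11)
        e_n = y_n - w @ k                             # instantaneous error (10)
        errors.append(e_n)
        if np.max(k) <= delta:    # coherence test (8): accept x_n
            D.append(x_n)
            w = np.append(w + eta * e_n * k, eta * e_n)  # kappa(x_n, x_n) = 1
        else:                     # reject: dictionary unchanged, update (12)
            w = w + eta * e_n * k
    return np.asarray(D), w, np.asarray(errors)

# Illustrative run on a noiseless nonlinearity:
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, (500, 1))
y = np.sin(3.0 * X[:, 0])
D, w, errors = klms_coherence(X, y)
```

Under Gaussian-free conditions the error decays as expected; the point of the sections that follow is that this same recursion becomes unstable once e_n contains impulses.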

KLLAD Algorithm
In this section, we shall derive the KLLAD algorithm based on the logarithmic cost function.
Logarithmic cost, as one type of relative cost measure, is capable of providing legitimate amplification of ordinary error fluctuations and significant attenuation of the very large error values induced by impulsive interference. It has been illustrated in [37] that the logarithmic cost function can proportionally adjust the weight coefficients for small and large error values, depending on combination weights varying with time. Therefore, we introduce the differentiable combined logarithmic error cost function as follows:

J(e_n) = F(e_n) − (1/λ) ln(1 + λF(e_n)),  (13)

where λ is a design parameter and F(e_n) is a conventional cost function of the instantaneous estimation error. Note that the universal adaptive filtering algorithm obtained from (13) is able to update the coefficients for small errors and suppress drastic error perturbations simultaneously. Taking the derivative of (13) with respect to the weight vector w, we obtain

∇_w J(e_n) = ∇_w F(e_n) · λF(e_n)/(1 + λF(e_n)).  (14)
Since impulsive CG noise exists, kernel adaptive filters based on a quadratic cost function face severe performance degradation. Nevertheless, signed adaptive filtering algorithms based on the ℓ₁-norm of the estimation error are robust to impulsive noise because their recursive update equations utilize only the sign of the instantaneous estimation error. In order to mitigate the adverse impact of impulsive noise on KAF, let the conventional error cost function F(e_n) be, using (10),

F(e_n) = E{|e_n|}.  (15)

It should be pointed out that cost function (13) with (15) applies little attenuation to small estimation errors and strong attenuation to relatively large estimation errors. Substituting (15) into (14) leads to

∂J(e_n)/∂w = (∂F(e_n)/∂w) · λF(e_n)/(1 + λF(e_n)).  (16)
Applying the steepest-descent method to minimize the logarithmic cost function (13), the weight vector of the KLLAD algorithm can be solved iteratively by

w_{n+1} = w_n − η ∇_w J(e_n).  (17)

By (16), the recursive update equation of the KLLAD algorithm can be reformulated as

w_{n+1} = w_n − η (∂F(e_n)/∂w) · λF(e_n)/(1 + λF(e_n)),  (18)

where the subgradient in the second term on the right-hand side of (18) is calculated as

∂F(e_n)/∂w ≈ −sgn(e_n) κ_{ω,n}.  (19)
Note that the approximation in (19) means that the subgradient of the conventional cost function F(e_n) is replaced by its instantaneous estimate. Substituting (15) and (19) into (18) and then removing the expectations, the stochastic update equation of the KLLAD algorithm can be rewritten as

w_{n+1} = w_n + η (λ|e_n|/(1 + λ|e_n|)) sgn(e_n) κ_{ω,n} = w_n + η (λe_n/(1 + λ|e_n|)) κ_{ω,n},  (20)

where κ_{ω,n} denotes the M_n × 1 dimensional kernelized input vector as in (11). For universal nonlinear filtering performance, we focus only on the Gaussian kernel function κ(x, x′) = exp(−‖x − x′‖²/(2ξ²)), with ξ > 0 the kernel bandwidth, in this paper. Moreover, the Gaussian kernel function has been successfully used in the theoretical analysis of KAF algorithms due to its mathematical convenience and tractability [34,41]. Based on (20), the recursive update equations of the KLLAD algorithm for the finite-order model using the coherence criterion are presented in the following.
At each time instant n, the input x_n is either rejected or accepted according to the coherence criterion (8) for the online dictionary D_n, as follows.

Case 1 (rejection): if max_m |κ(x_n, x_{ω,m})| > δ_κ, the dictionary is unchanged, D_{n+1} = D_n, and

w_{n+1} = w_n + η (λe_n/(1 + λ|e_n|)) κ_{ω,n}.  (21)

Note that the dimensions of the kernelized input vector κ_{ω,n} and the weight vector w_n remain unchanged in this case.

Case 2 (acceptance): if max_m |κ(x_n, x_{ω,m})| ≤ δ_κ, the candidate is inserted, D_{n+1} = D_n ∪ {x_n}, and

w_{n+1} = [w_n^⊤, 0]^⊤ + η (λe_n/(1 + λ|e_n|)) κ_{ω,n},  (22)

where the dimensions of the kernelized input vector κ_{ω,n} and the weight vector w_n are augmented (κ(x_n, x_n) is appended to κ_{ω,n} and a zero entry to w_n) to make the update equation of the weight vector valid.
It should be pointed out that the threshold δ_κ is selected by a grid search over the interval [0, 1) to determine the sparsity of the online dictionary D_n, as in [1,[5][6][7]. The KLLAD algorithm described by (21) and (22) combines the advantages of the KLMS and KSEA algorithms, leading to improved convergence performance when Gaussian noise and non-Gaussian impulsive noise exist simultaneously. When the design parameter λ is set to a large value, the KLLAD algorithm achieves robust convergence performance against impulsive noise. In addition, the KLLAD algorithm has a faster convergence rate than the KSEA algorithm, especially for highly correlated input signals. The scheme of the KLLAD algorithm is summarized as Algorithm 1.
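The two-case recursion above can be sketched compactly. The following Python code (names, defaults, and the test signal are illustrative assumptions, not from the paper) implements the KLLAD updates (20)-(22) with a Gaussian kernel and a coherence-criterion dictionary:

```python
import numpy as np

def kllad(X, y, eta=0.1, lam=1.0, xi=0.5, delta=0.9):
    """KLLAD with the coherence-criterion dictionary: a sketch of (20)-(22).

    The gain lam * e_n / (1 + lam * |e_n|) is bounded by 1 in magnitude, so a
    single impulsive error can move the weights by at most eta per atom.
    """
    D = [X[0]]
    w = np.zeros(1)
    for x_n, y_n in zip(X[1:], y[1:]):
        k = np.exp(-np.sum((np.asarray(D) - x_n) ** 2, axis=1) / (2 * xi ** 2))
        e_n = y_n - w @ k
        g = eta * lam * e_n / (1.0 + lam * abs(e_n))   # bounded update gain (20)
        if np.max(k) <= delta:   # accept x_n: augment dictionary and weights (22)
            D.append(x_n)
            w = np.append(w + g * k, g)                # kappa(x_n, x_n) = 1
        else:                    # reject: dimensions unchanged (21)
            w = w + g * k
    return np.asarray(D), w

def predict(D, w, X_test, xi=0.5):
    """Evaluate the learned finite-order model (7) at new inputs."""
    K = np.exp(-((X_test[:, None, :] - D[None, :, :]) ** 2).sum(-1) / (2 * xi ** 2))
    return K @ w

# Illustrative run: nonlinearity plus a crude impulsive-noise mixture.
rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, (2000, 1))
noise = 0.1 * rng.normal(size=2000) + (rng.random(2000) < 0.05) * rng.normal(0.0, 10.0, 2000)
y = np.sin(3.0 * X[:, 0]) + noise
D, w = kllad(X, y)
```

Despite roughly 100 impulsive samples in the run above, the bounded gain keeps each update small, so the learned function stays close to the underlying nonlinearity.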
By (9), the recursive update equation of KAF algorithms can be expressed in the general form

w_{n+1} = w_n + η f(e_n) e_n κ_{ω,n},  (23)

where f(e_n) is the nonlinear error function, which acts like a generalized variable step-size against non-Gaussian impulsive noise. According to (23), various nonlinear KAF algorithms can readily be obtained by solving distinct cost functions in the RKHS. Hence, the error functions of some of the KAF algorithms mentioned previously are listed in Table 1.
It should be pointed out that the error function of the MKRSL algorithm reduces to that of the KMCC algorithm as the risk-sensitive parameter λ_f ⟶ 0⁺, and to that of the KLMS algorithm as the kernel bandwidth ξ_f ⟶ ∞ [18]. This means that the KMCC algorithm is in fact a particular case of the MKRSL algorithm.
Thus, we are only concerned with the KMCC algorithm, which is robust to impulsive noises, in the following. Correspondingly, the curves of the error functions listed in Table 1 with different parameters are plotted in Figure 1. First of all, it can be observed from Figure 1(a) that the error function of the KLMS algorithm is constantly equal to one, without any ability to suppress impulsive noise, due to its quadratic cost function designed for Gaussian noise. We can then see from Figures 1(b)-1(d) that the output values of the nonlinear error functions of the KMCC, KLMP, and KLLAD algorithms are effectively attenuated even for the very large instantaneous estimation errors caused by impulsive interference. Specifically, the error function of KMCC with a small kernel bandwidth ξ_f attenuates very quickly, whereas its maximum output values, corresponding to relatively small error inputs near zero, are always less than 1. Consequently, the KMCC algorithm has to take a much larger step-size η than the KLMP and KLLAD algorithms. Although the error functions of the KLMP algorithm with small p and the KLLAD algorithm with large λ have similar shapes, the error function of KLLAD consistently gives much smaller outputs than that of KLMP for large error inputs.
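The qualitative comparison above can be checked numerically. The sketch below encodes the error functions f(e) in the spirit of Table 1 (the constant factor of KLMP is dropped, and all parameter values are illustrative assumptions) and evaluates the effective update f(e)·e for small, moderate, and impulsive errors:

```python
import numpy as np

# Error functions f(e) for the general KAF update
# w_{n+1} = w_n + eta * f(e_n) * e_n * kappa_{omega,n}, cf. (23).
f_klms  = lambda e: np.ones_like(e)                          # quadratic cost
f_kmcc  = lambda e, xi_f=1.0: np.exp(-e**2 / (2 * xi_f**2))  # correntropy
f_klmp  = lambda e, p=1.2: np.abs(e)**(p - 2)                # p-power, constant dropped
f_kllad = lambda e, lam=10.0: lam / (1.0 + lam * np.abs(e))  # proposed KLLAD

e = np.array([0.1, 1.0, 100.0])   # small, moderate, and impulsive errors
step_klms  = f_klms(e) * e        # grows linearly: no impulse suppression
step_kmcc  = f_kmcc(e) * e        # vanishes for large |e|
step_klmp  = f_klmp(e) * e        # grows only sublinearly, as |e|^(p-1)
step_kllad = f_kllad(e) * e       # saturates below 1 for large |e|
```

For the impulsive error e = 100, KLMS passes the full error through, KMCC suppresses it to essentially zero, KLMP limits it to about 100^0.2 ≈ 2.5, and KLLAD saturates just below 1.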

Convergence Condition on Mean Stability of the KLLAD Algorithm
In this section, we investigate the sufficient convergence condition of the KLLAD algorithm with the Gaussian kernel function and a fixed dictionary in the mean sense. For mathematical tractability of the theoretical analysis, the CG-modeled impulsive noise z_n given in (2), instead of the standard symmetric α-stable distribution, is adopted; the CG model has been successfully used in the theoretical analysis of robust adaptive filters [13,14,34].
On the one hand, the coherence criterion guarantees that the length M_n remains finite [1]. On the other hand, the length M_n of the online dictionary D_n gradually becomes invariant in the steady-state phase. Therefore, we only need to consider the recursive update equation (21) of the KLLAD algorithm with a steady fixed dictionary D of constant length M in the derivation of the convergence condition on mean stability.
We start by introducing the weight error vector defined by

v_n = w_n − w⋆,  (24)

where w⋆ = [w⋆_1, w⋆_2, . . . , w⋆_M]^⊤ represents the optimal weight vector of the vector-valued form of the nonlinear system. It should be emphasized that w⋆ corresponds to the functional f⋆ based on the specific dictionary elements. From (1) and (24), the instantaneous estimation error (10) can be rewritten as

e_n = z_n − v_n^⊤ κ_{ω,n},  (25)

with the kernelized input vector for the fixed dictionary D

κ_{ω,n} = [κ(x_n, x_{ω,1}), κ(x_n, x_{ω,2}), . . . , κ(x_n, x_{ω,M})]^⊤.  (26)

Subtracting w⋆ from both sides of (21) and using (25) yields

v_{n+1} = v_n + η (λe_n/(1 + λ|e_n|)) κ_{ω,n}.  (27)

Taking the expected values of both sides of (27), it follows that

E{v_{n+1}} = E{v_n} + η E{(λe_n/(1 + λ|e_n|)) κ_{ω,n}}.  (28)

Since the impulsive CG noise z_n with zero mean is assumed to be statistically independent of any other signals, as mentioned above, and κ_{ω,n} κ_{ω,n}^⊤ is assumed to be statistically independent of the weight error vector v_n, i.e., the modified independence assumption (MIA) widely used in [34,41,42], (28) can be reformulated as

E{v_{n+1}} = E{v_n} − η E{(λ/(1 + λ|e_n|)) κ_{ω,n} κ_{ω,n}^⊤ v_n}.  (29)

For tractability of the theoretical analysis, the second term of (29) can be approximated as

E{(λ/(1 + λ|e_n|)) κ_{ω,n} κ_{ω,n}^⊤ v_n} ≈ (λ/(1 + λE{|e_n|})) E{κ_{ω,n} κ_{ω,n}^⊤} E{v_n}.  (30)

On the one hand, the numerator of (30) contains the autocorrelation matrix of the kernelized input vector, defined by

R_κκ = E{κ_{ω,n} κ_{ω,n}^⊤} ∈ R^{M×M}.  (31)

Its (i, j)-th element can be computed in closed form for the Gaussian kernel, as determined in the theoretical analysis of KAF algorithms [34,41], with the input covariance matrix R_xx = E{x_n x_n^⊤} ∈ R^{L×L}. On the other hand, using (25) and the MIA assumption, the MSE of the KLLAD algorithm is given by

E{e_n²} = σ²_z + tr{R_κκ V_n},  (33)

with the autocorrelation matrix of the weight error vector V_n = E{v_n v_n^⊤} ∈ R^{M×M}. Although the recursion of V_n is not explicitly provided, it has trivial influence on the convergence condition on the mean stability of the KLLAD algorithm, as shown below. By the result of (4), (33) can be expressed as

E{e_n²} = (1 + p_r K)σ²_v + tr{R_κκ V_n}.  (34)

The expectation in the denominator of (30) can be roughly approximated, via the Gaussian moment E{|e_n|} ≈ √(2E{e_n²}/π), as

E{|e_n|} ≈ √((2/π)[(1 + p_r K)σ²_v + tr{R_κκ V_n}]).  (35)

Then, (29) can be further written as

E{v_{n+1}} = (I_M − (ηλ/(1 + λE{|e_n|})) R_κκ) E{v_n}.  (36)

By (36), the sufficient convergence condition on mean stability is given by

0 < η < 2(1 + λE{|e_n|})/(λ eig_max{R_κκ}).  (37)

Since tr{R_κκ V_n} ≥ 0, dropping it in (35) only tightens the bound, and thus (37) can be rigorously reformulated as

0 < η < 2(1 + λ√(2(1 + p_r K)σ²_v/π))/(λ eig_max{R_κκ}),  (38)

where the calculation of the autocorrelation matrix of the weight error vector V_n is not required, as explained before.

Algorithm 1: KLLAD.
(1) Input: x_n, y_n, n = 1, 2, . . .
(2) Initialization: select the step-size η > 0, the Gaussian kernel bandwidth ξ > 0, the threshold δ_κ > 0, the parameter λ > 0, the dictionary D_0 = {x_1}, and the weight coefficient w_0(1) = 0.
(3) Iterate the coherence test (8) and the updates (21) and (22) for each n.
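A practical way to pick a step-size consistent with a bound of the form (37)-(38) is sketched below in Python. The Monte Carlo estimation of R_κκ, the assumed Gaussian input distribution, and the Gaussian-moment approximation of E{|e_n|} are all illustrative assumptions, not the paper's closed-form expressions:

```python
import numpy as np

def kllad_stepsize_bound(D, lam, xi, sigma_v2, K, p_r, n_mc=10_000, seed=0):
    """Estimate a mean-stability step-size bound of the form
    eta < 2 * (1 + lam * E|e_n|) / (lam * eig_max(R_kk)),
    with R_kk estimated by Monte Carlo over an assumed Gaussian input
    distribution and E|e_n| approximated by sqrt(2 * sigma_z^2 / pi)
    (the weight-error contribution is dropped, which only tightens the bound).
    """
    rng = np.random.default_rng(seed)
    L = D.shape[1]
    X = rng.normal(size=(n_mc, L))                     # assumed Gaussian inputs
    d2 = ((X[:, None, :] - D[None, :, :]) ** 2).sum(-1)
    Kmat = np.exp(-d2 / (2 * xi ** 2))                 # kernelized inputs (26)
    R_kk = Kmat.T @ Kmat / n_mc                        # sample autocorrelation (31)
    sigma_z2 = (1 + p_r * K) * sigma_v2                # CG noise variance (4)
    e_abs = np.sqrt(2 * sigma_z2 / np.pi)              # Gaussian-moment estimate
    return 2 * (1 + lam * e_abs) / (lam * np.max(np.linalg.eigvalsh(R_kk)))

# Hypothetical fixed dictionary of M = 3 atoms in R^2:
bound = kllad_stepsize_bound(np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]]),
                             lam=1.0, xi=0.65, sigma_v2=1.0, K=1e4, p_r=0.05)
```

Note how a larger impulsive variance enlarges E{|e_n|} and therefore relaxes the admissible step-size range, which matches the intuition that the bounded KLLAD gain shrinks when errors are large.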

Simulation Results
In this section, we evaluate the performance of the proposed KLLAD algorithm in the presence of impulsive noise through simulations. All curves are obtained by averaging over 200 independent Monte Carlo trials.

Example I.
In general, the optimal weight vectors of KAF algorithms, which correspond to the elements of a dictionary built in an online manner, are unavailable a priori. As a consequence, it is difficult to exhibit the convergence performance, particularly the mean-square deviation (MSD). In order to clearly demonstrate the mean and mean square convergence performance of the KLLAD algorithm, the desired output of an ideal nonlinear synthesis system consisting of the optimum weight vector and the kernelized inputs is given by

y_n = w⋆^⊤ κ_{ω,n} + z_n,  (39)

where the kernel bandwidth is set to ξ = 0.65, and z_n is the non-Gaussian impulsive CG noise with parameters K = 1 × 10⁴, σ²_v = 1, and p_r = 0.05. Furthermore, the CG noise model defined by (2) can be generalized into a mixture Gaussian noise model by changing the distribution of the random variable ϵ_n to the uniform, binary, Rayleigh, Laplacian distributions, etc. We are then able to thoroughly investigate the variations of the nonlinear filtering performance of the KLLAD algorithm in the presence of distinct impulsive noises with large outliers.
Meanwhile, the preselected dictionary with 5 elements utilized for the kernelized nonlinear mapping is given by D = {x_{ω,1}, x_{ω,2}, x_{ω,3}, x_{ω,4}, x_{ω,5}}, which is generated a priori by the coherence criterion from the input signals. Correspondingly, the learning curve of the MSD is defined as

MSD_n = (1/T) Σ_{t=1}^T ‖w⋆ − w_n^{(t)}‖²,  (40)

where T is the total number of Monte Carlo runs. The input signal was assumed to be a sequence of statistically independent vectors x_n = [x_{1,n}, x_{2,n}]^⊤ with correlated samples satisfying x_{1,n} = 0.5x_{2,n} + τ_{x,n}, where x_{2,n} is a white Gaussian noise sequence with variance σ²_{x₂} = 1 and τ_{x,n} is a white Gaussian noise sequence chosen so that x_{1,n} has variance σ²_{x₁} = 1. The optimum weight vector w⋆ was supposed to change abruptly between the two stages of the experiment. The set of parameters of all the algorithms used in Example I is listed in Table 2. It should be pointed out that the step-size of the KLLAD algorithm is selected to satisfy the convergence condition (38). The convergence curves of the weight coefficients obtained by the KLMS, KMCC, KLMP, and KLLAD algorithms are illustrated in Figures 2-5, respectively. As illustrated in Figure 2, the averaged weight coefficient curves of the KLMS algorithm fail to tend to the optimal weight coefficients during the two stages due to the interference of the impulsive CG noise. In contrast, the averaged weight coefficient curves of the KMCC, KLMP, and KLLAD algorithms are smooth and all converge to the optimal weight coefficients within the two stages. In addition, the effectiveness of the convergence condition on mean stability (38) is validated by Figure 5.
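The correlated input model of Example I can be reproduced directly; the variance of τ_{x,n}, 1 − 0.5² = 0.75, is inferred from the stated unit-variance constraint on x_{1,n} and is the only quantity added here:

```python
import numpy as np

# Correlated two-dimensional inputs of Example I: x_{1,n} = 0.5 x_{2,n} + tau_n,
# with var(tau) = 1 - 0.5^2 = 0.75 so that both components have unit variance.
rng = np.random.default_rng(0)
N = 100_000
x2 = rng.normal(0.0, 1.0, N)
tau = rng.normal(0.0, np.sqrt(0.75), N)
x1 = 0.5 * x2 + tau
X = np.stack([x1, x2], axis=1)   # input vectors x_n = [x_{1,n}, x_{2,n}]^T
# The correlation coefficient between the components is cov(x1, x2) = 0.5.
```

This correlation is what makes the comparison with the KSEA algorithm interesting, since sign-based updates are known to converge slowly for correlated inputs.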
More importantly, Figures 6(a)-6(f) show that the filtering performance of the KLLAD algorithm consistently outperforms those of the KLMS, KMCC, and KLMP algorithms in terms of robustness, convergence rate, and accuracy of the MSD learning curves during the two stages, using the mixture Gaussian noise model based on the normal, uniform, binary, Rayleigh, Laplacian, and symmetric α-stable distributions, respectively. Therefore, the robust performance of the KLLAD algorithm is validated by the simulation results of nonstationary nonlinear system identification in the presence of non-Gaussian impulsive noise.

Example II.
As the second example, we consider an input random sequence u_n generated by a first-order autoregressive relation with correlation factor ρ driven by ζ_n, where ζ_n is a random noise following the i.i.d. standard normal distribution. Here, the correlation factor ρ and the standard deviation σ_u of the random sequence u_n were both chosen as 0.5. The desired output of the nonlinear system is generated by a nonlinear mapping of the input contaminated by z_n, where z_n is the non-Gaussian impulsive CG noise with parameters K = 1 × 10⁴, σ²_v = 1, and p_r = 0.1. At each time instant n, the input vector x_n = [u_n, u_{n−1}]^⊤ is used to estimate the nonlinear desired response signal y_n contaminated by the impulsive noise. In order to clearly investigate the convergence of the weight coefficients, the 6 elements of the dictionary were chosen a priori by the coherence criterion from the input vectors. The set of parameters of the algorithms used in Example II is listed in Table 3. Likewise, the step-size of the KLLAD algorithm in the second example is also determined from the range given by the convergence condition (38).
Although the fixed dictionary with 6 elements is used to compare the convergence performance of the algorithms, the corresponding optimal weight coefficients are in fact still unknown. The KLMS algorithm is omitted due to its severe convergence instability. Figure 7 shows that the mean value curves of the weight coefficients of the KLLAD algorithm converge much more smoothly and faster than those of the KLMP and KMCC algorithms, which verifies the superiority of the KLLAD algorithm.

Conclusion
In this paper, we presented a novel KLLAD algorithm based on the logarithmic error cost criterion under the assumption of non-Gaussian impulsive CG noise. The KLLAD algorithm can effectively mitigate the instability of the convergence learning curves caused by impulsive noise. The simulation results demonstrated that the proposed KLLAD algorithm outperforms the KLMS, KMCC, and KLMP algorithms in the presence of impulsive noise.
Data Availability

The ideal mathematical model data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.