Extended Stochastic Gradient Identification Method for Hammerstein Model Based on Approximate Least Absolute Deviation



Introduction
In recent years, the Hammerstein model has drawn considerable attention because of its block-oriented nonlinear (BONL) structure. It can describe a variety of nonlinear systems, such as nonlinear filtering, actuator saturation, signal analysis, and chemical processes in biological systems. So far, several kinds of methods have been developed for identifying Hammerstein models under white noise, such as the iterative method [1,2], the overparameterization method [3], the support vector machine (SVM) method [4], the subspace method [5][6][7], the blind method [8], the frequency-domain identification method [9], and the artificial intelligence method [10][11][12]. In industrial process control, however, the noise often includes colored noise and peak noise, so it is necessary to consider identification methods for the Hammerstein model when the data are contaminated by colored noise and peak noise. Chang and Luus proposed an iterative method for the Hammerstein model with colored noise [13], but it cannot be used for online identification.
Ding and Chen proposed an extended stochastic gradient method based on least squares (LS) for the nonlinear Hammerstein ARMAX system and proved its convergence by the martingale convergence theorem [2]. In most studies, the LS criterion is taken as the objective function for Hammerstein model identification. The LS method performs well when the stochastic noise is normally distributed [14]. However, it does not work as well as the least absolute deviation (LAD) method in certain conditions; for example, when the noise does not obey the normal distribution, the statistical performance of LAD estimation is better than that of LS estimation, with irreplaceable advantages [15]. Because of the squared term in the LS objective function, a small change in the measured data can greatly affect the identification results when outliers are present [16]. To compensate for the effect of impulse noise and outliers on identification accuracy, the LAD criterion, which replaces the squared terms with absolute deviations, is chosen as the objective function. Since the LAD criterion takes only the first power of the deviation, it reduces sensitivity to impulse noise and outliers and greatly improves robustness. However, the LAD objective function is not differentiable, so a nonsmooth optimization problem must be solved [17], which complicates the computation. The proposed method replaces the absolute deviation in LAD with a differentiable function and builds an approximate least absolute deviation (ALAD) objective function. This paper derives an identification algorithm for the Hammerstein model from the ALAD objective function and the extended stochastic gradient method. To improve the identification accuracy and convergence rate, we add an inertial term to the proposed method. The convergence of the algorithm is proved by Lyapunov stability theory.
The simulation experiments show that the proposed method can effectively eliminate the influence of impulse noise and outliers. Compared with the LS method, the ALAD method has stronger robustness and higher identification accuracy, demonstrating the superiority of the proposed method.
The rest of this paper is organized as follows. Section 2 describes the identification problem of nonlinear Hammerstein systems with colored noise. Section 3 derives the proposed identification algorithm from the ALAD criterion; the convergence of the proposed method is also discussed there. Section 4 offers an illustrative example and compares the proposed algorithm with existing LS methods. Some concluding remarks are provided in Section 5.

Hammerstein Model with Colored Noise
As is shown in Figure 1, the Hammerstein model is a series connection of a memoryless static nonlinear block and a linear dynamic block. The static nonlinear block receives the input signal; the linear dynamic block produces the output signal.
In Figure 1, u(k) is the input, and x(k) is the output of the nonlinear part and, at the same time, the input of the linear part. y(k) is the measured output, namely, the real output disturbed by the colored noise w(k). v(k) is additive white noise with zero mean. The colored noise w(k) is the output of the linear link N(z) driven by the white noise v(k); N(z) is the "noise model," which determines the properties of w(k), and G(z) is the transfer function of the linear block. Assume that the nonlinear part f(·) can be represented by a polynomial of known order in the input u(k),

x(k) = f(u(k)) = c_1 u(k) + c_2 u^2(k) + ... + c_{n_c} u^{n_c}(k),

and that the linear block of the Hammerstein model is described by a CARMA/ARMAX (Controlled Autoregressive Moving Average/Autoregressive Moving Average eXogenous) model with transfer functions G(z) = B(z)/A(z) and N(z) = D(z)/A(z). The relationship between input and output can then be written as

A(z) y(k) = B(z) x(k) + D(z) v(k),

where A(z), B(z), and D(z) are polynomials in the shift operator z^{-1}:

A(z) = 1 + a_1 z^{-1} + ... + a_{n_a} z^{-n_a},
B(z) = b_1 z^{-1} + ... + b_{n_b} z^{-n_b},
D(z) = 1 + d_1 z^{-1} + ... + d_{n_d} z^{-n_d},

and a_i, b_j, c_l, and d_m (i = 1, ..., n_a; j = 1, ..., n_b; l = 1, ..., n_c; m = 1, ..., n_d) are the parameters to be identified. Expanding the model gives

y(k) = -∑_{i=1}^{n_a} a_i y(k-i) + ∑_{j=1}^{n_b} b_j x(k-j) + v(k) + ∑_{m=1}^{n_d} d_m v(k-m).

Define the parameter vector θ = [a^T, b^T, d^T, c^T]^T, where a, b, and d are the parameter vectors of the linear section and c is the parameter vector of the nonlinear section of the Hammerstein model, and let φ(k) be the corresponding information vector of past outputs, powers of past inputs, and past noise terms. The model can then be rewritten in the regression form

y(k) = φ^T(k) θ + v(k).

Approximate Least Absolute Deviation.
As is known, the absolute deviation is not differentiable. To overcome this shortcoming, we choose a differentiable function to approximate the LAD criterion. This requirement can be met by the following logarithmic function:

f(e) = μ ln[cosh(e/μ)],

where μ > 0 is an adjustable parameter and the nonlinear function f(·) depends on μ. When μ is small enough, cosh(e/μ) ≈ (1/2) e^{|e|/μ}, so μ ln[cosh(e/μ)] ≈ |e| − μ ln 2 ≈ |e|; that is, f(e) ≈ |e|, approximating the absolute value function. The property curve of the logarithmic function when μ = 0.01 is shown in Figure 2.
The curve shows that when μ is small enough, the nonlinear function f(e) = μ ln[cosh(e/μ)] approximates the absolute value function |e| effectively. Because f(e) is differentiable, an identification algorithm based on f(e) can be solved by standard optimization methods.
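As a quick check on this approximation, the sketch below (function names are ours, not the paper's) implements f(e) = μ ln[cosh(e/μ)] in a numerically stable form, together with its derivative tanh(e/μ):

```python
import math

def alad(e, mu=0.01):
    """Smooth ALAD penalty f(e) = mu * ln(cosh(e / mu)), approximating |e|.

    Uses the identity ln(cosh(x)) = |x| + ln(1 + exp(-2|x|)) - ln(2)
    to avoid overflow of cosh for large |e| / mu.
    """
    a = abs(e) / mu
    return mu * (a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0))

def alad_grad(e, mu=0.01):
    """Derivative of the ALAD penalty with respect to e: tanh(e / mu)."""
    return math.tanh(e / mu)
```

With μ = 0.01, f(1) ≈ 0.993 ≈ |1|, while f stays smooth at the origin, where |e| has a kink; its derivative is bounded by 1, which is what later tames impulsive residuals.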

Algorithm Derivation.
In the regression form y(k) = φ^T(k)θ + v(k), the input u(k) and output y(k) are measured data, and according to the known structure of the nonlinear function f(·) we can compute the powers of u(k). But the terms v(k−m) in φ(k) are white noise and unmeasurable, which makes the identification problem unsolvable directly. Based on the idea that the noise can be replaced by its estimated value, we use the noise residual v̂(k−m) from the kth step to replace v(k−m) in the (k+1)th step, so φ(k) can be replaced by its estimate φ̂(k). The identification of θ then consists in finding the θ that minimizes the following ALAD criterion function built from the data φ̂(k) and y(k):

J(θ) = ∑_k μ ln{cosh[(y(k) − φ̂^T(k)θ)/μ]}.

This is a typical nonlinear optimization problem. To reduce the computational complexity, we use the stochastic gradient method to derive a recursive equation for θ̂(k) [18,19]:

θ̂(k) = θ̂(k−1) − η_k grad[J(θ̂(k−1))],

where grad(·) denotes the gradient and η_k is the step size. This formula corrects θ̂(k) along the negative gradient direction until the extreme value of J(θ) is reached.
The gradient of J(θ) with respect to θ follows from the chain rule: the derivative of μ ln[cosh(e/μ)] with respect to e is tanh(e/μ), so the search direction at step k is tanh(ê(k)/μ) φ̂(k), where ê(k) = y(k) − φ̂^T(k) θ̂(k−1). Choosing the step size by an exact line search forces two adjacent gradient search directions to be orthogonal, which usually produces a zigzag phenomenon: the convergence rate is slow because of the linear convergence order, and the identified parameters may fluctuate. To remedy this shortcoming, an improved stochastic gradient method is applied in which the step size 1/r(k) uses gradient information from earlier iteration steps; r(k) is computed by the recursive formula

r(k) = r(k−1) + λ(k) ‖φ̂(k)‖², with λ(k) = ê(k−1)/ê(k).

This recursion is equivalent to adding an inertial term to the recursive equation for r(k), whose weight is related to the iterative output errors at steps k−1 and k.
In conclusion, we obtain the extended stochastic gradient identification method based on approximate least absolute deviation (ALADESG):

ê(k) = y(k) − φ̂^T(k) θ̂(k−1),
r(k) = r(k−1) + λ(k) ‖φ̂(k)‖²,
θ̂(k) = θ̂(k−1) + (φ̂(k)/r(k)) tanh(ê(k)/μ),

where the information matrix φ̂(k)φ̂^T(k) appearing in the convergence analysis is real, symmetric, and of rank 1.
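The paper's full recursions did not survive extraction here, so the following sketch shows only the core idea, under our own naming: an extended-stochastic-gradient update in which the raw residual of the LS version is passed through the bounded ALAD gradient tanh(ê/μ). Because |tanh| ≤ 1, an impulsive residual cannot produce an unbounded correction, which is the source of the robustness.

```python
import numpy as np

def aladesg_step(theta, r, phi, y, mu=0.01, lam=1.0):
    """One ALAD stochastic-gradient update (a sketch, not the paper's exact form).

    The derivative of mu*ln(cosh(e/mu)) w.r.t. e is tanh(e/mu), bounded by 1,
    so a single large (impulsive) residual moves the estimate by at most
    ||phi|| / r.  The paper additionally weights the accumulation of r by the
    inertial term lam = e_hat(k-1)/e_hat(k); here lam is a plain parameter.
    """
    e = y - phi @ theta                 # a priori residual e(k)
    r = r + lam * (phi @ phi)           # accumulated step-size denominator
    theta = theta + (phi / r) * np.tanh(e / mu)
    return theta, r, e
```

In the LS counterpart (LSESG) the correction would be proportional to e itself, so one large impulse moves the estimate proportionally far; here it is clipped.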
Obviously, when the step size and inertial term satisfy the appropriate inequality, the matrix appearing in the parameter-error recursion is negative semidefinite, and the error system is asymptotically stable in the large at the equilibrium point θ̃(k) = 0, where θ̃(k) denotes the parameter estimation error. The analysis relies on the facts that tanh(ê(k)/μ) ê(k) > 0 and 1/r²(k) > 0. During the iterative process, if we can ensure that

λ(k−1) ê(k−1) ê(k) ≥ 0,

then the Lyapunov function is nonincreasing. Therefore, to guarantee the convergence of the proposed algorithm, this inequality must be added as a constraint when the inertial recursion is used for iteration.

Separation of Parameters by Average Method.
It is noteworthy that in the description of the Hammerstein model (as shown in Figure 1), for any nonzero and finite constant α, the pair (α f(·), G(z)/α) yields exactly the same input and output measurements as (f(·), G(z)); thus, the nonlinear function f(·) and the transfer function G(z) are not unique. In other words, no identification scheme can distinguish between (f(·), G(z)) and (α f(·), G(z)/α). Therefore, without any loss of generality, one of the gains of f(·) or G(z) has to be fixed.
There are several ways to normalize these gains [20,21]. We adopt the following method.

Simulation Results and Discussions
Consider a Hammerstein model of the structure above, whose nonlinear static part is described by a polynomial. The input sequence {u(k)} is an uncorrelated stochastic sequence with zero mean and unit variance, σ_u² = 1. The noise sequence {v(k)} is a stochastic white noise sequence with zero mean and variance σ_v² = 0.5. The input and output data for the identification procedure are obtained from this model. We then apply the proposed algorithm to estimate the parameters of this system in different situations. The identification process and result analysis are demonstrated in the following. The simulation experiments are performed in Matlab, with μ taken as 0.01.
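The paper's true parameter values are not recoverable from this text, so the sketch below generates data from a hypothetical low-order instance of the structure described above (the coefficients a1, b1, d1 and the polynomial vector c are our own illustrative choices); the input is zero-mean unit-variance white noise and the disturbance is white noise of variance 0.5, colored through a moving-average term, as in the experiment description:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical coefficients -- the paper's true values were lost in extraction.
a1, b1, d1 = -0.6, 1.0, 0.3            # linear block and noise-model parameters
c = np.array([0.8, 0.5, 0.25])          # polynomial coefficients of f(u)

N = 500
u = rng.standard_normal(N)              # zero-mean, unit-variance input
v = np.sqrt(0.5) * rng.standard_normal(N)  # white noise, variance 0.5

x = c[0] * u + c[1] * u**2 + c[2] * u**3   # static nonlinearity x(k) = f(u(k))
y = np.zeros(N)
for k in range(1, N):
    # CARMA-type linear block; the MA term d1*v(k-1) colors the disturbance.
    y[k] = -a1 * y[k - 1] + b1 * x[k - 1] + v[k] + d1 * v[k - 1]
```

The pole at z = 0.6 keeps the linear block stable, so the generated record is suitable as identification data for either ALADESG or LSESG.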
The relative error of the parameter estimates is used as the evaluation criterion for the algorithm. In order to verify the performance of the proposed algorithm, we designed two simulation cases.
Case 1. Only colored noise exists in the measured data. This case is used to show the impact of the inertial term on the identification results of the ALADESG method.
Case 2. Both colored noise and impulse noise exist in the measured data. This case is used to compare the identification performance of the ALADESG method with that of the LSESG method when the measured data are contaminated by impulse noise. The robustness of the ALADESG method is also verified when impulse noise of different amplitudes is added.
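The exact formula of the relative-error criterion used in both cases was lost in extraction; the usual choice, and the one the sketch below assumes, is the normalized parameter-error norm δ(k) = ‖θ̂(k) − θ‖ / ‖θ‖:

```python
import numpy as np

def relative_error(theta_hat, theta_true):
    """Relative parameter-estimation error ||theta_hat - theta|| / ||theta||.

    This is the standard normalized criterion; the paper's exact definition
    did not survive extraction, so treat this as an assumed reconstruction.
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    theta_true = np.asarray(theta_true, dtype=float)
    return np.linalg.norm(theta_hat - theta_true) / np.linalg.norm(theta_true)
```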
The impulse noise in the experiment follows the SαS (symmetric α-stable) distribution [23]. The probability density function of the standard SαS distribution is expressed through the gamma function Γ(·) and the characteristic exponent α. The smaller α is, the more likely large-amplitude samples of the SαS-distributed random variable become, and the stronger the pulse strength is.
The procedure to generate a random variable obeying the standard SαS distribution follows the steps below [24]:
(1) Generate a random variable V uniformly distributed on (−π/2, π/2).
(2) Generate a random variable W exponentially distributed with mean 1.
(3) With V and W independent, obtain a standard SαS variable by the Chambers–Mallows–Stuck formula

X = [sin(αV)/(cos V)^{1/α}] · [cos(V − αV)/W]^{(1−α)/α}.
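The three steps above are exactly the Chambers–Mallows–Stuck construction; a compact NumPy sketch (function name ours):

```python
import numpy as np

def sas_sample(alpha, size, rng=None):
    """Chambers-Mallows-Stuck generator for standard symmetric alpha-stable
    (S-alpha-S) noise, 0 < alpha <= 2."""
    if rng is None:
        rng = np.random.default_rng()
    V = rng.uniform(-np.pi / 2, np.pi / 2, size)  # step (1): uniform phase
    W = rng.exponential(1.0, size)                # step (2): exponential, mean 1
    # Step (3): combine V and W into a standard S-alpha-S sample.
    return (np.sin(alpha * V) / np.cos(V) ** (1.0 / alpha)
            * (np.cos(V - alpha * V) / W) ** ((1.0 - alpha) / alpha))
```

For α = 2 the formula reduces to 2 sin(V)·W^{1/2}, which is Gaussian with variance 2; decreasing α toward 0.9 produces progressively heavier tails, matching the paper's three test cases.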

The Effect of Inertial Term λ(k) on the Identification Results.
(1) Let λ(k) = 0; namely, the step size is computed with current gradient information only. The curve of the relative error is shown in Figure 3.
According to the theoretical analysis, computing the optimal search step size by an exact line search has several drawbacks, such as linear convergence order, slow convergence rate, and fluctuating parameter estimates. As shown in Figure 3, the relative error curve of the Hammerstein model parameters fluctuates severely, and the estimated parameters do not converge to the true values. That is to say, the method cannot provide reliable parameter estimation for the Hammerstein model when λ(k) = 0.
(2) In order to improve the accuracy and convergence rate and to enhance the reliability of the identification results, we add the inertial term λ(k) = ê(k−1)/ê(k) to the method, applying the inertial recursion together with its convergence constraint to parameter identification. The results are shown in Table 1.
Obviously, when λ(k) = ê(k−1)/ê(k), the deviations between the estimated and true parameter values are small, and the identification accuracy is much better than with λ(k) = 0. This shows that the inertial term from the improved stochastic gradient method enhances the identification accuracy effectively and ensures the convergence of the ALADESG method. The relative error curve is shown in Figure 4.
Comparing Figure 3 with Figure 4 shows that the relative error curve in Figure 4 is much smoother, the estimated parameters converge steadily to the true values, and the relative error is smaller than in the situation shown in Figure 3. The simulation results in Figures 3 and 4 show that the inertial term greatly improves the identification accuracy and significantly enhances the reliability of the parameter estimation.

Considering the Measured Data with Colored Noise Only.
Following the same idea as in Section 3.2, we can derive iterative equations for LSESG that also contain a similar inertial term. The proposed ALADESG method was then compared with the LSESG method when the measured data are contaminated with colored noise only. The simulation result is shown in Figure 5. As shown there, the identification accuracies of the ALADESG method and the LSESG method are both acceptable when the measured data are disturbed by colored noise only, but the LSESG method achieves higher accuracy and a faster convergence rate. So when the measured data are contaminated with colored noise only, the identification performance of the LSESG method is better than that of the ALADESG method.

The Measured Data with Impulse Noise Subject to SαS Distribution.
The measured data are contaminated with both colored noise and impulse noise. Let the characteristic exponent of the impulse noise be α = 1.5. The time-domain waveform of the impulse noise is shown in Figure 6. The measured data are then identified by the ALADESG method and the LSESG method, respectively; the relative errors are shown in Figure 7. Figure 7 shows that the identification accuracy of the ALADESG method is higher than that of the LSESG method when the measured data are disturbed by colored noise and impulse noise with α = 1.5. From Figures 6 and 7, we notice that the relative error of the LSESG method fluctuates severely at the moment a large impulse is added, and the larger the amplitude of the impulse, the more severe the fluctuation, which prevents the parameter estimates from converging steadily. In the LSESG method, the objective function is the square of the error, so the influence of impulse noise is amplified, especially when the impulse noise is large. In contrast, the ALADESG method is based on the ALAD criterion, which restrains the influence of impulse noise and enhances robustness: the parameter estimates converge to the true values quickly and steadily, and the identification accuracy is also improved. The simulation results indicate that the ALADESG method has better identification performance than the LSESG method when impulse noise is added.

The Influence of Different Pulse Amplitudes on the Identification Results of the ALADESG Method.
In this part, we conduct three simulations with α = 1.5, α = 1.2, and α = 0.9.
The identification results of the ALADESG method are shown in Table 2, and the relative errors are shown in Figure 8. In addition, the identification results of the ALADESG method are compared with those of the LSESG method when α = 1.2 and α = 0.9; the corresponding relative error curves are shown in Figures 9 and 10, respectively. From Table 2 and Figure 8, the identification accuracy of the ALADESG method becomes worse as α decreases, but only slightly. In contrast, Figures 7, 9, and 10 show that the accuracy of the LSESG method degrades markedly as α decreases. In particular, when α = 1.2 and α = 0.9, the LSESG method does not converge at all, so it cannot identify the parameters of the Hammerstein model in this situation, whereas the ALADESG method still achieves accurate and stable identification results. Therefore, it can be concluded that the method based on the ALAD criterion overcomes the impact of impulse noise very well and has better robustness to impulse noise.

Identification Results Analysis.
According to the identification results shown in Sections 4.1 and 4.2, the following conclusions are summarized.
(1) When the measured data are contaminated by colored noise, the Hammerstein model can be identified effectively by the proposed method.
(2) By applying the inertial term λ(k), which contains gradient information from past steps, the identification accuracy and convergence rate are improved significantly.
(3) The identification performance of the LSESG method is better than that of the ALADESG method when only colored noise exists.
(4) The ALADESG method has better accuracy and convergence rate than the LSESG method when both colored noise and impulse noise exist in the measured data. In particular, when the amplitude of the impulse noise is very large (such as when α = 0.9), the ALADESG method can still obtain an acceptable identification result, which proves the method's adaptability to more severe noise. The robustness of the method is also confirmed by the simulation results.

Conclusions
In this paper, we extend the application of the LAD technique to the field of nonlinear identification. A new algorithm is proposed for the Hammerstein model based on the ALAD criterion and an improved stochastic gradient search algorithm, and its convergence is proved by Lyapunov stability theory. With the ALAD criterion, the proposed method overcomes the influence of impulse noise and colored noise on the identification results and improves robustness to a large extent. The proposed algorithm is easy to implement. The simulation results show that it yields better robustness against impulse noise and a faster convergence rate than the LSESG method.

Figure 1: The structure of the Hammerstein model.

Figure 2: The property curve of the function f(e) when μ = 0.01.

Figure 6: The time-domain waveform of the impulse noise when α = 1.5.

Figure 8: Relative error with impulse noise of different amplitudes.

Table 2: Identification results with different α.