Parameter Identification of ARX Models Based on Modified Momentum Gradient Descent Algorithm

The parameter estimation problem of the ARX model is studied in this paper. First, some traditional identification algorithms are briefly reviewed; then a new parameter estimation algorithm, the modified momentum gradient descent algorithm, is developed. Two gradient directions, each with its own step size, are derived in each iteration. Compared with traditional parameter identification algorithms, the modified momentum gradient descent algorithm has a faster convergence rate. A simulation example shows that the proposed algorithm is effective.


Introduction
There are many identification algorithms that can estimate the parameters of linear and nonlinear models, such as the coupled identification algorithms [1,2], the filtered identification algorithms [3,4], and the hierarchical identification algorithms [5][6][7]. The autoregressive exogenous (ARX) model extends the traditional autoregressive model by adding measurable external inputs at various times to generate the output. Such models are widely used in engineering practice. For example, Naveros used the ARX model to identify the physical parameters of walls [8], Qin et al. applied the ARX model to control a magnetic levitation ball system [9], and Haddouche et al. utilized the ARX model to control a gas conditioning tower [10]. Since a robust controller often assumes that the structure of the system is given a priori [11][12][13], system identification plays an important role in control engineering. Its basic idea is to use identification algorithms to determine a mathematical model [14][15][16][17] by which the behavior of the system can be predicted.

The gradient descent algorithm is usually used for ARX model identification. It effectively reduces the computational effort but has a slow convergence rate [18,19]. The gradient descent algorithm consists of two steps: the first is to determine a direction, namely the negative gradient, and the second is to determine a suitable step size along that direction. Besides, the least squares algorithm is another widely used method in system identification, and it has a faster convergence rate [20][21][22][23][24][25][26]. However, the least squares algorithm requires heavy computational effort and needs to solve a derivative equation. Therefore, it is inefficient for models with complex nonlinear structures.
In order to determine the step size of the gradient descent algorithm, the root of a higher-order equation must be calculated, which is challenging or even impossible. Fortunately, the stochastic gradient (SG) algorithms [27,28] avoid the root calculation by updating the parameters at each sampling instant with only one set of input-output data. They are widely applicable in engineering practice because of their simple structure. However, since only one set of data is used at each sampling instant, the convergence rate of the SG algorithm is slow. To improve the convergence rate, Ding et al. first proposed a multi-innovation stochastic gradient algorithm and a multi-innovation least squares algorithm for linear regression models [29,30], both of which have fast convergence rates. The conjugate gradient descent method also converges faster than the gradient descent algorithm, but it is only available for offline identification [31][32][33][34]. Inspired by the conjugate gradient descent algorithm, the focus of this paper is to propose a modified momentum gradient descent algorithm, which has a faster convergence rate and requires no root calculation.

The remainder of this paper is organized as follows. Section 2 introduces the ARX model and the traditional SG algorithm. The two-innovation stochastic gradient algorithm is presented in Section 3. In Section 4, a modified momentum gradient descent algorithm is developed. A simulation example is given in Section 5. Finally, conclusions and future directions are summarized in Section 6.

Stochastic Gradient Descent Algorithm
Consider the following ARX model:

A(z)y(t) = B(z)u(t) + v(t),  (1)

where y(t) is the output, u(t) is the input, v(t) is the noise, and A(z) and B(z) are polynomials in the unit backward shift operator z^(-1):

A(z) = 1 + a_1 z^(-1) + a_2 z^(-2) + ... + a_n z^(-n),  (2)
B(z) = b_1 z^(-1) + b_2 z^(-2) + ... + b_m z^(-m).  (3)

Take equations (2) and (3) into equation (1), and let a_i := -a_i, i = 1, 2, ..., n; the ARX model can then be written as

y(t) = phi^T(t) theta + v(t),  (4)

where theta is the parameter vector and phi(t) is the information vector:

theta = [a_1, ..., a_n, b_1, ..., b_m]^T,
phi(t) = [y(t-1), ..., y(t-n), u(t-1), ..., u(t-m)]^T.

Let theta* be the true parameter vector and theta its estimate. Define the cost function as follows:

J(theta) = (1/2)[y(t) - phi^T(t) theta]^2.  (5)

To obtain the minimum value of J(theta), let the iteration function be

theta_t = theta_(t-1) + a p,  (6)

where p is the negative gradient direction of J and a is the step size:

p = phi(t)[y(t) - phi^T(t) theta_(t-1)].

When the estimated parameter vector converges to the true value theta*, the iteration reaches a fixed point. Substituting equation (6) into equation (5) yields

J(a) = (1/2)[y(t) - phi^T(t)(theta_(t-1) + a p)]^2.

In order to get the minimum value of J(a), let dJ/da = 0, which gives a = 1/[phi^T(t) phi(t)]. The steepest descent algorithm can then be obtained:

theta_t = theta_(t-1) + phi(t)[y(t) - phi^T(t) theta_(t-1)] / [phi^T(t) phi(t)].

Remark 1. When theta_t is close to the true value, the calculated step size can be imprecise, which causes the error to fluctuate. Therefore, the steepest descent algorithm is inefficient.
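The single-sample steepest-descent update above can be sketched as follows (a minimal illustration with NumPy; the function name is mine, not the paper's):

```python
import numpy as np

def steepest_descent_step(theta, phi, y):
    """One steepest-descent update for J(theta) = 0.5*(y - phi^T theta)^2.

    The negative gradient direction is p = phi * e with innovation
    e = y - phi^T theta, and the exact line search along p gives the
    closed-form step size a = 1 / (phi^T phi).
    """
    e = y - phi @ theta           # innovation (prediction error)
    p = phi * e                   # negative gradient direction
    a = 1.0 / (phi @ phi)         # optimal step size from dJ/da = 0
    return theta + a * p
```

Because the step size is exact for a single sample, one update drives that sample's residual to zero, which is precisely why the step becomes unreliable near the true value when noise is present (Remark 1).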
The SG algorithm proposed in the following can deal with this problem:

theta_t = theta_(t-1) + phi(t)[y(t) - phi^T(t) theta_(t-1)] / r(t),
r(t) = r(t-1) + phi^T(t) phi(t),  r(0) = 1.

Remark 2. The step size 1/r(t) decreases as time increases. When theta_t is close to the true value, the smaller step size reduces the fluctuation dramatically.
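The SG recursion with its accumulating normalizer r(t) can be sketched as follows (function name and batch interface are my own choices):

```python
import numpy as np

def sg_identify(Phi, Y):
    """SG sketch for y(t) = phi(t)^T theta + v(t).

    The step size 1/r(t) shrinks as r(t) accumulates phi(t)^T phi(t),
    which damps fluctuations near the true parameter vector.
    """
    theta = np.zeros(Phi.shape[1])
    r = 1.0                                  # r(0) = 1
    for phi, y in zip(Phi, Y):
        r = r + phi @ phi                    # r(t) = r(t-1) + ||phi(t)||^2
        theta = theta + phi * (y - phi @ theta) / r
    return theta
```

The shrinking step size is exactly what makes the SG algorithm stable but slow, motivating the multi-innovation variants of the next section.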

Two-Innovation Stochastic Gradient (TI-SG) Descent Algorithm
Because of the slow convergence rate of the SG algorithm, Ding proposed a multi-innovation stochastic gradient (MI-SG) algorithm in [6]. As a special case of the MI-SG algorithm, when two sets of input-output data are used in each iteration, we term it the two-innovation stochastic gradient (TI-SG) algorithm. For the ARX model, two sets of input-output data, {phi(2t), y(2t)} and {phi(2t+1), y(2t+1)}, are collected in each iteration. Establishing the following two cost functions J_1(theta) and J_2(theta), we get

J_1(theta) = (1/2)[y(2t) - phi^T(2t) theta]^2,
J_2(theta) = (1/2)[y(2t+1) - phi^T(2t+1) theta]^2.

We can calculate the negative gradient directions p_1 and p_2, respectively:

p_1 = phi(2t)[y(2t) - phi^T(2t) theta],
p_2 = phi(2t+1)[y(2t+1) - phi^T(2t+1) theta].

The overall cost function is J(theta) = J_1(theta) + J_2(theta). Let the iteration function be

theta_t = theta_(t-1) + a(p_1 + p_2).

There are two ways to calculate the step size a:

(1) The two-innovation stochastic gradient descent algorithm uses the same step-size rule as the SG algorithm, a = 1/r(t) with r(t) = r(t-1) + phi^T(2t) phi(2t) + phi^T(2t+1) phi(2t+1) and initial value r(0) = 1.

(2) The other method is to calculate the optimal step size; the resulting algorithm is called the modified two-innovation stochastic gradient (MT-SG) descent algorithm.
Let dJ/da = 0; then the optimal step size is

a = (e_1 g_1 + e_2 g_2) / (g_1^2 + g_2^2),

where e_1 = y(2t) - phi^T(2t) theta_(t-1), e_2 = y(2t+1) - phi^T(2t+1) theta_(t-1), g_1 = phi^T(2t)(p_1 + p_2), and g_2 = phi^T(2t+1)(p_1 + p_2). The MT-SG algorithm can then be designed with this optimal step size.

Remark 3. The traditional two-innovation algorithm and the modified two-innovation algorithm use two gradients and assume that the two gradient directions share the same step size. Although this reduces the computational effort, it is not optimal. Because each gradient direction plays a different role in estimating the parameters, it is necessary to consider assigning a different weight to each gradient.
Remark 4. Compared with the traditional two-innovation method, the modified two-innovation method calculates the optimal step size at each sampling instant. Therefore, the modified two-innovation algorithm has a faster convergence rate but heavier computational effort.
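The two step-size choices of this section can be contrasted in code. The sketch below (function names mine; step-size formulas follow the derivation above under the stated assumptions) gives one TI-SG update with the SG-type shrinking step and one MT-SG update with the exact line search:

```python
import numpy as np

def ti_sg_step(theta, r, phi1, y1, phi2, y2):
    """TI-SG sketch: two innovations share one SG-type step size 1/r."""
    p1 = phi1 * (y1 - phi1 @ theta)      # negative gradient of J1
    p2 = phi2 * (y2 - phi2 @ theta)      # negative gradient of J2
    r = r + phi1 @ phi1 + phi2 @ phi2    # accumulate the normalizer
    return theta + (p1 + p2) / r, r

def mt_sg_step(theta, phi1, y1, phi2, y2):
    """MT-SG sketch: same combined direction p1 + p2, but the step size a
    comes from the exact line search on J(a) = J1 + J2 (quadratic in a)."""
    e1 = y1 - phi1 @ theta
    e2 = y2 - phi2 @ theta
    p = phi1 * e1 + phi2 * e2            # combined negative gradient
    g1, g2 = phi1 @ p, phi2 @ p
    denom = g1 * g1 + g2 * g2
    a = (e1 * g1 + e2 * g2) / denom if denom > 0 else 0.0
    return theta + a * p
```

Both updates move along the same direction p_1 + p_2; they differ only in how far they move, which is why MT-SG converges faster at the cost of the extra inner products.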

Modified Momentum Gradient Descent Algorithm (MMG)
Before introducing the modified momentum gradient descent algorithm, we first review the conjugate gradient descent algorithm. Assume that L sets of input-output data have been collected. The stacked information matrix and output vector are Phi(L) and Y(L), respectively:

Phi(L) = [phi(1), phi(2), ..., phi(L)]^T,  Y(L) = [y(1), y(2), ..., y(L)]^T.

Set up the cost function as follows:

J(theta) = (1/2)||Y(L) - Phi(L) theta||^2.

To calculate the minimum value of J(theta), simply make dJ/dtheta = 0:

dJ/dtheta = -Phi^T(L)[Y(L) - Phi(L) theta] = 0,

which is equivalent to the linear system A theta = b with A = Phi^T(L) Phi(L) and b = Phi^T(L) Y(L), where A belongs to R^((m+n)x(m+n)) and b belongs to R^(m+n). When the rank of Phi(L) equals n + m, A is a symmetric positive definite matrix.
Use the conjugate gradient descent method to solve this higher-order matrix equation. Let r = b - A theta, where r is the current negative gradient direction. Combine the previous iteration direction p' with the current negative gradient direction r to form the new iteration direction p = r + beta p'. Making p' and p conjugate with respect to A, that is p'^T A p = 0, gives

beta = -(p'^T A r) / (p'^T A p').

Let the iteration function be

theta := theta + a p,

where a is the step size and p is the iteration direction. Calculating the minimum value of J(a) and letting dJ/da = 0 yields

a = (p^T r) / (p^T A p).

The iteration stops when theta satisfies ||b - A theta|| <= epsilon for a small tolerance epsilon.

Remark 5. Here r is the negative gradient direction at the current position and p' is the direction of the previous iteration. The current iteration direction p is obtained from r and p'. Compared with the traditional gradient descent method, this method has a faster convergence rate but heavier computational effort.
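The conjugate gradient steps above can be transcribed directly (a sketch assuming A symmetric positive definite; the function name and tolerance default are mine):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10):
    """Conjugate gradient for A @ theta = b, A symmetric positive definite.

    Follows the text: the new direction is p = r + beta * p_prev with
    p_prev and p made A-conjugate, and the step size a comes from the
    exact line search dJ/da = 0.
    """
    theta = np.zeros_like(b, dtype=float)
    r = b - A @ theta                     # current negative gradient
    p = r.copy()                          # first direction: pure gradient
    for _ in range(len(b)):
        Ap = A @ p
        a = (p @ r) / (p @ Ap)            # exact line search step
        theta = theta + a * p
        r = b - A @ theta                 # new negative gradient
        if np.linalg.norm(r) < tol:
            break
        beta = -(p @ (A @ r)) / (p @ Ap)  # enforce p^T A p_new = 0
        p = r + beta * p
    return theta
```

In exact arithmetic the A-conjugacy of the directions makes the method terminate in at most n + m iterations, which is the source of its speed advantage over plain gradient descent.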
Inspired by the conjugate gradient descent method, the modified momentum gradient descent algorithm is proposed. Its basic idea is to use two gradient directions in each iteration (sampling instant) and then assign a different step size to each direction.
When using the TI-SG algorithm, a set of data repeated during two neighbouring sampling instants would be involved, which makes the step size unsolvable. To overcome this difficulty, a new method is developed. For the ARX model, collect two sets of information vectors and two outputs in each iteration, phi(2t), y(2t), phi(2t+1), and y(2t+1). Establish two cost functions J_1(theta) and J_2(theta) as follows:

J_1(theta) = (1/2)[y(2t) - phi^T(2t) theta]^2,
J_2(theta) = (1/2)[y(2t+1) - phi^T(2t+1) theta]^2.

Using J_1(theta) and J_2(theta) to calculate the negative gradient directions p_1 and p_2 yields

p_1 = phi(2t)[y(2t) - phi^T(2t) theta_(t-1)],
p_2 = phi(2t+1)[y(2t+1) - phi^T(2t+1) theta_(t-1)].

Let the iteration function be

theta_t = theta_(t-1) + a_1 p_1 + a_2 p_2.

Then the cost functions become functions of a_1 and a_2. Let dJ_1/da_1, dJ_1/da_2, dJ_2/da_1, and dJ_2/da_2 all be equal to 0; this yields the 2x2 linear system

[phi^T(2t)p_1    phi^T(2t)p_2  ] [a_1]   [e_1]
[phi^T(2t+1)p_1  phi^T(2t+1)p_2] [a_2] = [e_2],

where e_1 = y(2t) - phi^T(2t) theta_(t-1) and e_2 = y(2t+1) - phi^T(2t+1) theta_(t-1). Solving this system gives the optimal step sizes a_1 and a_2 in each iteration. The MMG algorithm constitutes the following steps (Algorithm 1).

Remark 6. In each iteration, the MMG algorithm uses two directions and assigns the optimal step size to each direction.
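One MMG iteration can be sketched as follows (function name mine; the 2x2 system follows the derivation above, solved in least-squares sense to cover the degenerate case where an innovation is zero):

```python
import numpy as np

def mmg_step(theta, phi1, y1, phi2, y2):
    """MMG sketch: two negative-gradient directions p1, p2 with individual
    step sizes a1, a2 obtained from the 2x2 linear system that sets the
    partial derivatives of J1 and J2 with respect to a1, a2 to zero."""
    e = np.array([y1 - phi1 @ theta, y2 - phi2 @ theta])   # innovations
    p1 = phi1 * e[0]                      # negative gradient of J1
    p2 = phi2 * e[1]                      # negative gradient of J2
    # G[i, j] = phi_i^T p_j couples the two directions
    G = np.array([[phi1 @ p1, phi1 @ p2],
                  [phi2 @ p1, phi2 @ p2]])
    a, *_ = np.linalg.lstsq(G, e, rcond=None)              # a = [a1, a2]
    return theta + a[0] * p1 + a[1] * p2
```

When G is nonsingular, the optimal pair (a_1, a_2) drives both innovations of the current iteration to zero, which is why MMG makes more progress per iteration than a single shared step size.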

Example
Consider the following ARX model. The parameter estimation results of the SG, TI-SG, MT-SG, and MMG algorithms are shown in Figure 2 and Tables 1-4.
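A simulation of this kind can be set up in outline as follows. Since the example's coefficients are not reproduced above, the second-order model and noise level below are assumed values for illustration only, and batch least squares is used as a reference estimator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical second-order ARX model (assumed coefficients, not the
# paper's): y(t) = a1*y(t-1) + a2*y(t-2) + b1*u(t-1) + b2*u(t-2) + v(t)
theta_true = np.array([0.35, 0.15, 0.8, 0.5])     # [a1, a2, b1, b2]

N = 3000
u = rng.normal(size=N)                            # persistently exciting input
v = 0.1 * rng.normal(size=N)                      # white noise
y = np.zeros(N)
Phi = np.zeros((N, 4))
for t in range(2, N):
    Phi[t] = [y[t - 1], y[t - 2], u[t - 1], u[t - 2]]
    y[t] = Phi[t] @ theta_true + v[t]

# Batch least-squares reference estimate for comparison with the
# recursive algorithms of Sections 2-4
theta_ls, *_ = np.linalg.lstsq(Phi[2:], y[2:], rcond=None)
```

Feeding the same (Phi, y) data to the SG, TI-SG, MT-SG, and MMG recursions and plotting the parameter estimation errors against iteration number reproduces the kind of comparison reported in Figures 2 and 3.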
Select 100 new data points based on the true model, and use the models estimated by the SG, TI-SG, MT-SG, and MMG algorithms to generate the predicted outputs, respectively. The errors between the true outputs and the predicted outputs are shown in Figure 4.
Finally, a Monte Carlo experiment with 100 noise realizations is performed using the MMG algorithm, and the results are shown in Figure 5. The following conclusions can be drawn:

(1) It can be seen from Figures 2 and 3 and Tables 2 and 3 that the MT-SG algorithm converges significantly faster than the original TI-SG algorithm.

(2) From Figures 2 and 3 and Tables 1-4, we can see that the MMG algorithm has the fastest convergence rate among the four algorithms.

(3) Figure 4 demonstrates that the model estimated by the MMG algorithm is the most accurate among the four estimated models.

(4) Figure 5 shows that the MMG algorithm is robust to the noise.

Conclusions
This paper proposes an improved gradient descent algorithm for ARX models based on the conjugate gradient descent method. Since two gradient directions and their two corresponding step sizes are used in each iteration, the proposed algorithm has a faster convergence rate. The simulation example shows the effectiveness of the proposed algorithm. The algorithm increases the convergence rate and does not require root calculation. Therefore, it can be combined with other identification techniques [43][44][45][46] to study the parameter estimation issues of linear and nonlinear stochastic systems with colored noises [47][48][49][50] and can be extended to other areas [51][52][53][54], such as signal modeling, parameter identification, information processing, and engineering application systems [55][56][57].
Although the MMG algorithm is expected to be a powerful tool for parameter identification, its convergence analysis remains an open and challenging problem.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.