Adaptive Critic Learning-Based Robust Control of Systems with Uncertain Dynamics

Model uncertainties caused by imperfect system modeling, disturbances, and nonsmooth dynamics are usually unavoidable in control systems. This paper presents a novel method to address the robust control problem for uncertain systems. The original robust control problem of the uncertain system is first transformed into an optimal control problem of the nominal system by selecting an appropriate cost function. Then, we develop an adaptive critic learning algorithm to learn the optimal control solution online, where only the critic neural network (NN) is used and the actor NN widely used in existing methods is removed. Finally, a feasibility analysis of the control algorithm is given. Simulation results are given to show the effectiveness of the presented control method.


Introduction
The basis of intelligent optimization and decision-making using the adaptive dynamic programming (ADP) method is optimal control design. There are many mature methods for the optimal regulation control design of linear systems in the field of control theory and control engineering. However, for general nonlinear systems, the Hamilton-Jacobi-Bellman (HJB) equation is obtained. Finding the analytical solution of the HJB equation is not easy since it is inherently a partial differential equation. Recently, the optimal control design of systems has attracted extensive attention. Among the available approaches, the successive approximation methods [1-3] overcome this difficulty by finding an approximate solution of the HJB equation, which is closely related to the ADP method. ADP is a method based on the idea of intelligent learning, which can provide effective optimal control solutions for complex dynamic systems [4,5]. In the past two decades, ADP has been widely used in solving adaptive optimal control problems of discrete-time and continuous-time systems [6,7]. Data-driven control design has now become a research hotspot in the field of control theory and control engineering [8,9]. The ADP methods can promote research on data-based decision-making and optimal control and are conducive to the development of artificial intelligence and computational intelligence technology.
Most existing ADP results are obtained without considering the uncertainty of the controlled plant. However, an actual control system is always affected by model uncertainty, external disturbances, or other changes. These factors must be considered in the controller design to avoid deteriorating the closed-loop performance and to improve the robustness of the controlled system. For robust control design, several alternative methodologies have been suggested in the control community. The work in [10] exploited the relationship between the robust control of an uncertain system and the optimal control of its nominal system subject to a specific value function, which indicates that a robust controller can be designed by solving an equivalent optimal control problem instead. Similarly, it was shown in [11] that robust control design may be accomplished by addressing an H∞ control problem. Nevertheless, solving the derived optimal control equations online was not discussed in [10]; instead, offline schemes were adopted to seek their solution. Recently, robust control design using the adaptive critic learning method has gradually become one of the research hotspots in the field of ADP, and many methods have been proposed [12-14]. These results show that the ADP method is suitable for the robust control design of complex nonlinear systems in uncertain environments. Since much of the previous ADP literature does not focus on the robust performance of the controller, the emergence of robust adaptive critic control greatly expands the application scope of the ADP method. Generally, a controller based on robust ADP not only stabilizes the original uncertain system but also makes the system optimal in the absence of dynamic uncertainty. Thus, adaptive critic learning-based robust control includes the discussion of system stability, convergence, optimality, and robustness. It plays an important role in the field of intelligent learning control of complex systems in uncertain environments.
Based on the above facts, we develop an adaptive critic learning algorithm to resolve the robust control problem of uncertain systems. To this end, we construct an equivalence between the robust control problem and the optimal control problem by selecting an appropriate cost function; then, a single critic NN is used to reformulate the cost function. To realize the optimal control solution, we design an adaptive critic learning algorithm; since it has strong convergence, the actor NN widely used in existing ADP results is removed. Then, a feasibility analysis of the control algorithm is given. Simulations are given to indicate the validity of the developed method. The major contributions of this paper include the following:
(1) To address the robust control problem, we transform the robust control problem of uncertain systems into an optimal control problem of the nominal system. This provides a new approach to address the robust control problem.
(2) Different from [13], the uncertainty in the input matrix is considered in this paper, and the proposed control method is then used in robotic systems. This helps to apply the proposed control algorithm to practical industrial robotic systems in the future.
(3) A newly designed adaptation algorithm driven by the NN weight errors is used to learn the critic NN weights online. Different from [15], the convergence of the estimated NN weights to the true values can be retained.
This paper is organized as follows. In Section 2, we introduce the robust control problem and transform it into an optimal control problem. In Section 3, a single critic NN is used to reformulate the optimal cost function, and then an adaptive critic learning method is proposed to address the derived optimal control problem. Section 4 gives some simulation results to illustrate the effectiveness of the proposed method. Some conclusions are stated in Section 5.

Preliminaries and Problem Formulation
A continuous-time (CT) uncertain system can be written as where x ∈ R^n and u ∈ R^m are the system state and the control action, respectively, f(x) ∈ R^n with f(0) = 0 and g(x) ∈ R^{n×m} are nonlinear functions, and b(x) and d(x) are the uncertainties. The purpose of this paper is to design a controller that makes system (1) asymptotically stable under the uncertainties b(x) and d(x). To this end, we give the following assumptions.
For a linear system, a robust controller can be designed by means of a linear matrix inequality (LMI) [16], whereas for the nonlinear system (1), this is not easy. Inspired by [10,12], an equivalence is built between the robust control problem of the uncertain system and the optimal control problem of the nominal system by selecting an appropriate cost function.
Thus, we define the nominal system of the uncertain system (1) as For system (2), a control action u should be found to minimize the following cost function [17]: where M ∈ R^{n×n} and N ∈ R^{m×m} are positive definite weight matrices. Hence, based on the optimality principle, we can obtain the Lyapunov equation associated with the cost function (3) as where V_x = ∂V/∂x is the derivative of V with respect to x. Therefore, we can get the optimal cost function as and its corresponding HJB equation can be given as By solving (6), we have the optimal control action as Then, we give a lemma to explain how the robust control problem of system (1) can be transformed into an optimal control problem of system (2) by constructing the cost function (3).
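As a hedged illustration of how the optimal control action follows from the HJB equation, suppose (this is an assumption, since the integrand of the cost function (3) is not reproduced here) that the running cost has the form ρ(x) + x^T M x + u^T N u, where ρ(x) ≥ 0 is a hypothetical placeholder for any state-dependent terms that do not involve u, and N is symmetric. Setting the gradient of the bracketed expression with respect to u to zero gives a stationary point, which is the minimizer because N is positive definite:

```latex
\begin{align*}
0 &= \min_{u}\Big[\rho(x) + x^{T} M x + u^{T} N u
      + \big(V_x^{*}\big)^{T}\big(f(x) + g(x)u\big)\Big],\\
0 &= 2 N u^{*} + g^{T}(x)\, V_x^{*}
\;\;\Longrightarrow\;\;
u^{*} = -\tfrac{1}{2}\, N^{-1} g^{T}(x)\, V_x^{*}.
\end{align*}
```

Under this assumed structure, the optimal control depends on the cost function only through its gradient V_x^*, which is why approximating V^* is sufficient to recover u^*.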
Lemma 1 (see [11,18]). Assume that the solution can be solved via optimal control problem of system (2) with cost function (3) and , and this solution can make uncertain system (1) asymptotically stable, which means that the optimal control solution is the solution of the robust control problem for system (1).
Proof. From (5), V*(x) can be considered as a Lyapunov function; based on (6) and (7), we have According to the condition given in Lemma 1, i.e., V̇*(x) < 0 for x ≠ 0 and V̇*(0) = 0, the uncertain system (1) is asymptotically stable for any uncertainties b(x) and d(x). According to the above facts, the optimal solution u* is the robust control solution of the uncertain system (1). This completes the proof.
Remark 1. Lemma 1 shows that the robust control problem of the original uncertain system is equivalent to the optimal control problem of the nominal system, so the solution of the robust control problem can be obtained indirectly by solving the optimal control problem. Therefore, this equivalence can be used to develop a new robust control design method, which is solved by using the ADP method, as described in the following section.

Remark 2. It is well known that H∞ control belongs to robust control. Although many H∞ control design techniques have been proposed, it should be noted that, as explained in Section 8.5 in [18], H∞ control differs from the optimal control method proposed in this paper. In the optimal control method, we start from the uncertainty bounds and then design the controller according to these bounds. Hence, if the controller exists, we can say the uncertain system is robustly stable.

Solving the Robust Control Problem via Adaptive Critic Learning
To obtain the optimal control solution (7), the unknown cost function (5) should be resolved. However, it is quite difficult to solve for the cost function (5) directly; therefore, a critic NN is proposed in this section to approximate the cost function (5). This allows us to develop an adaptive learning method that updates the NN weights online, where the convergence of the NN weights can be retained. Because of this strong convergence, the actor NN widely used in ADP schemes is removed. The proposed control system structure is given in Figure 1. This section proposes an adaptive critic learning method to obtain the solution of the derived optimal control problem. To this end, a critic NN is trained to estimate the cost function V*(x), where V*(x) is considered to be continuous; hence, we have the following NN [13], where W ∈ R^l is the ideal critic NN weight, σ(x) ∈ R^l is the regressor vector, l is the number of neurons, and ε_v(x) is the NN approximation error. Then, we have the partial derivative of the cost function as where ∇σ(x) = ∂σ(x)/∂x ∈ R^{l×n} is the regressor matrix and ∇ε_v(x) = ∂ε_v(x)/∂x. Without loss of generality, the following assumption is given in [13].
In fact, the ideal NN weight W is unknown; hence, its estimate Ŵ is updated online, and the estimated cost function can be written as Hence, the practical estimated optimal solution can be obtained as According to (10) and (11), we have the ideal optimal control (7) as and its practical optimal control is given as The next problem is to determine the estimate Ŵ of the unknown weight W such that the stability of the controlled system is guaranteed and Ŵ converges to the ideal value W. Most existing ADP methods can only obtain the uniform ultimate boundedness (UUB) of the approximated NN weight rather than its convergence. In this paper, a novel adaptive critic learning method is introduced to guarantee the convergence of Ŵ to W. This strong convergence property makes it possible to avoid the use of an actor NN, and the optimal control approximated via the critic NN can converge to its ideal optimal solution. Substituting (11) into (4), we can rewrite the HJB equation as where ε_HJB = (∇ε_v(x))^T [f(x) + g(x)u] is the residual error determined by the approximation error ∇ε_v(x).
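The practical control computed from the critic alone can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: it assumes the practical control has the usual form û = -1/2 N^{-1} g(x)^T ∇σ(x)^T Ŵ, and the quadratic regressor σ(x) = [x1², x1·x2, x2²], the input matrix g, and the weight values below are hypothetical choices made only for illustration.

```python
import numpy as np

def grad_sigma(x):
    """Jacobian of the hypothetical regressor sigma(x) = [x1^2, x1*x2, x2^2], shape (3, 2)."""
    x1, x2 = x
    return np.array([[2.0 * x1, 0.0],
                     [x2, x1],
                     [0.0, 2.0 * x2]])

def critic_control(x, W_hat, g, N):
    """Assumed critic-only control: u = -1/2 * N^{-1} g(x)^T grad_sigma(x)^T W_hat."""
    V_x = grad_sigma(x).T @ W_hat              # estimated cost gradient, shape (n,)
    return -0.5 * np.linalg.solve(N, g(x).T @ V_x)

# Hypothetical input matrix, weight matrix, and critic weights (illustration only).
g = lambda x: np.array([[0.0], [1.0]])
N = np.eye(1)
u = critic_control(np.array([1.0, -0.2]), np.array([1.0, 0.0, 2.0]), g, N)
```

No separate actor network appears here: once Ŵ is available, the control is an explicit function of the state, which is the design choice the text motivates by the convergence of Ŵ.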
To develop an adaptive critic learning law that estimates the critic NN weight W, the known terms in (16) can be defined as Then, the HJB equation (16) with (17) can be given as According to (18), only the NN weight W is unknown in this parameterized formulation. Hence, it can be estimated by using a recently proposed learning algorithm [19,20], which is driven by the derived estimation error.
To this end, the filtered regressor matrices P ∈ R^{l×l} and Q ∈ R^l can be denoted as [19,20] where ℓ > 0 is a positive parameter. Hence, their solutions can be derived as which can be calculated online based on the system state x.
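A minimal numerical sketch of the filtering step is given below. It assumes that the filters are the first-order low-pass operations dP/dt = -ℓP + ΞΞ^T and dQ/dt = -ℓQ + ΞΘ with zero initial conditions, where Θ denotes the known scalar term of the parameterized HJB equation; this form, the forward-Euler discretization, and the constant test regressor are assumptions for illustration and are not taken verbatim from the paper.

```python
import numpy as np

def filter_step(P, Q, xi, theta, ell, dt):
    """One forward-Euler step of the assumed filters
    dP/dt = -ell*P + xi xi^T  and  dQ/dt = -ell*Q + xi*theta."""
    P_next = P + dt * (-ell * P + np.outer(xi, xi))
    Q_next = Q + dt * (-ell * Q + xi * theta)
    return P_next, Q_next

# Sanity check with a constant regressor, for which the closed-form solution is
# P(t) = (1 - exp(-ell*t)) / ell * xi xi^T (and similarly for Q).
ell, dt, T = 10.0, 1e-4, 1.0
xi, theta = np.array([1.0, 2.0]), 0.5
P, Q = np.zeros((2, 2)), np.zeros(2)
for _ in range(int(round(T / dt))):
    P, Q = filter_step(P, Q, xi, theta, ell, dt)
P_exact = (1.0 - np.exp(-ell * T)) / ell * np.outer(xi, xi)
Q_exact = (1.0 - np.exp(-ell * T)) / ell * xi * theta
```

With a constant regressor, P stays rank one, so this check exercises only the filter arithmetic; a time-varying, sufficiently rich regressor is needed for P to become positive definite.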
With P and Q in (20), an auxiliary vector M_1 ∈ R^l can be defined as From (18) and (20), we have Q = −PW + v with v = −∫_0^t e^{−ℓ(t−s)} ε_HJB(s) Ξ(s) ds, which is bounded, i.e., ‖v‖ ≤ ε_v for a positive constant ε_v > 0. Then, we can obtain from (19)-(21) that with W̃ = W − Ŵ being the NN weight estimation error. Using the estimation error in the adaptive learning algorithm helps to guarantee the convergence of the estimate, as shown in [13]. Hence, we can design the following adaptive law to calculate Ŵ online as with Γ > 0 being the adaptive learning gain.

Remark 3. The adaptive law (23) is driven by the estimation error W̃. The purpose of this new learning algorithm is to guarantee the convergence of the estimate Ŵ to the unknown weight W. Therefore, the learning algorithm given in this paper is different from those used in existing ADP methods, e.g., [3,21], which employ gradient-based methods [22] and guarantee only the boundedness of Ŵ.

To illustrate the convergence of the proposed learning algorithm, the positive definiteness of the matrix P defined in (20) is introduced first.

Lemma 2. When the regressor Ξ in (18) fulfills the persistent excitation (PE) condition, the matrix P defined in (20) is positive definite.

The convergence of the proposed learning algorithm can be summarized as follows.

Theorem 1. For the adopted critic NN with adaptive law (23), if the regressor vector Ξ in (18) satisfies the PE condition, the critic NN weight error W̃ exponentially converges to a small bounded set around zero.
Proof. From Lemma 2, the matrix P is positive definite when the regressor Ξ satisfies the PE condition, i.e., its minimum eigenvalue satisfies λ_min(P) > δ > 0. Hence, a Lyapunov function can be chosen as V_1 = (1/2) W̃^T Γ^{−1} W̃, and its derivative V̇_1 along (23) can be derived as which further implies Thus, the NN weight estimation error W̃ converges to a compact set Ω: {W̃ | ‖W̃‖ ≤ ε_v/δ}, whose size depends on the bound ε_v and the excitation level δ. For an arbitrarily small NN approximation error (according to the NN approximation property, this error can be made arbitrarily small with sufficiently many NN nodes, i.e., ∇ε_v(x) → 0 as l → ∞), Ŵ therefore converges to W. In the ideal case, i.e., ε_HJB = 0 and v = 0, the weight estimation error W̃ converges to zero exponentially.
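The steps of this argument can be sketched as follows. The sketch assumes that the adaptive law (23) has the form dŴ/dt = −Γ M_1 with M_1 = PŴ + Q = −PW̃ + v, which is consistent with the relations Q = −PW + v and W̃ = W − Ŵ stated above but is not a verbatim reproduction of (21)-(23); it also assumes that W is constant and Γ is symmetric.

```latex
\begin{align*}
\dot{V}_1 &= \tilde{W}^{T}\Gamma^{-1}\dot{\tilde{W}}
           = -\tilde{W}^{T}\Gamma^{-1}\dot{\hat{W}}
           = \tilde{W}^{T} M_1
           = -\tilde{W}^{T} P\, \tilde{W} + \tilde{W}^{T} v \\
          &\le -\delta\,\|\tilde{W}\|^{2} + \varepsilon_v\,\|\tilde{W}\|
           = -\|\tilde{W}\|\,\big(\delta\,\|\tilde{W}\| - \varepsilon_v\big).
\end{align*}
```

Under these assumptions, V̇_1 < 0 whenever ‖W̃‖ > ε_v/δ, which matches the set Ω above, and with v = 0 the bound reduces to V̇_1 ≤ −δ‖W̃‖², from which exponential convergence follows.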

□
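The convergence claim can be illustrated numerically with a small simulation. This is a sketch under stated assumptions rather than the authors' code: it assumes the filters dP/dt = -ℓP + ΞΞ^T and dQ/dt = -ℓQ + ΞΘ, the adaptive law dŴ/dt = -Γ(PŴ + Q), and the ideal case ε_HJB = 0 so that Θ = -W^T Ξ. The three-sinusoid regressor and the function name `learn_critic_weights` are hypothetical and are chosen only so that the PE condition holds.

```python
import numpy as np

def learn_critic_weights(W_true, regressor, gamma=100.0, ell=10.0, dt=1e-3, T=10.0):
    """Forward-Euler simulation of the assumed filtered-regressor adaptive law."""
    l = W_true.size
    P, Q, W_hat = np.zeros((l, l)), np.zeros(l), np.zeros(l)
    for k in range(int(round(T / dt))):
        xi = regressor(k * dt)
        theta = -W_true @ xi                     # ideal case: eps_HJB = 0
        P += dt * (-ell * P + np.outer(xi, xi))  # filtered regressor matrix
        Q += dt * (-ell * Q + xi * theta)        # filtered regressor vector
        M1 = P @ W_hat + Q                       # equals -P @ (W_true - W_hat) here
        W_hat += dt * (-gamma * M1)              # assumed adaptive law
    return W_hat, P

# Hypothetical persistently exciting regressor (three distinct frequencies).
regressor = lambda t: np.array([np.sin(50 * t), np.cos(110 * t), np.sin(170 * t)])
W_true = np.array([1.0, 0.0, 2.0])
W_hat, P = learn_critic_weights(W_true, regressor)
min_eig = np.linalg.eigvalsh(P).min()            # excitation level of the filtered matrix
```

In this setting the smallest eigenvalue of P plays the role of δ in Theorem 1: the larger it is, the faster Ŵ approaches W, whereas a regressor that is not sufficiently rich would leave P nearly singular and slow or stall the learning.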
For system (2) with the practical optimal control (15) and the adaptive law (23), if the regressor Ξ satisfies the PE condition, the error W̃ converges to a small set around zero. Moreover, the practical optimal control û in (15) converges to a region around its optimal solution u* in (14), i.e., ‖û − u*‖ ≤ ε_u. Hence, the original robust control problem is resolved.

Numerical Simulation. Consider an uncertain system as
where , , then we can obtain the optimal control problem as with the cost function as As given in [18], the optimal cost function is written as Then, we can obtain its optimal solution as A critic NN is used to approximate the cost function V; thus, the activation function σ(x) is defined as In the simulation, we set the learning parameters Γ = 100 and ℓ = 10, the initial system state x_0 = [1, −0.2]^T, the initial weight Ŵ(0) = 0, and the weight matrices M = I and N = I. Figure 2 shows the estimated critic NN weights. It can be seen from Figure 2 that the estimated NN weight converges to a constant value. This result verifies the convergence stated in Theorem 1 and the effectiveness of the proposed learning algorithm, which indicates that the estimated critic NN weight Ŵ converges to its true value, i.e., W = [1, 0, 2]. To better display the performance of the proposed learning method, the error between the ideal cost function V* and the practical cost function V̂ is given in Figure 3, which shows a fairly satisfactory approximation performance. Similar simulation results to those in Figures 2 and 3 can also be found in [13]; different from [13], this paper considers the uncertainties involved in the control input. Figure 4 shows the state of the controlled system under the derived optimal control, which shows that the closed-loop system is asymptotically stable. The corresponding control input, shown in Figure 5, is bounded and smooth.

Application to Robotic Systems.
This section presents a simulation based on a 2-DOF robot [18,23]. The robotic system model can be defined as where q denotes the joint variables, τ is the generalized forces, M(q) denotes the inertia matrix, C(q, q̇) is the centripetal vector, F(q) is the friction vector, and G(q) defines the gravity vector.
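To apply the control method, the robot model has to be written in the state-space form used earlier. The following sketch shows one common way to do this, assuming the model has the standard structure M(q)q̈ + C(q, q̇) + F + G(q) = τ, with state x = [q, q̇] and input u = τ. The helper name `robot_state_space`, the friction signature, and the 1-DOF pendulum parameters are hypothetical; the pendulum is not the 2-DOF robot of [18,23] and serves only to exercise the helper.

```python
import numpy as np

def robot_state_space(M, C, F, G):
    """Return f(x), g(x) with x = [q, dq], assuming M(q) ddq + C(q, dq) + F(q, dq) + G(q) = tau."""
    def f(x):
        n = x.size // 2
        q, dq = x[:n], x[n:]
        ddq_free = -np.linalg.solve(M(q), C(q, dq) + F(q, dq) + G(q))
        return np.concatenate([dq, ddq_free])    # drift dynamics
    def g(x):
        n = x.size // 2
        return np.vstack([np.zeros((n, n)), np.linalg.inv(M(x[:n]))])  # input matrix
    return f, g

# Hypothetical 1-DOF pendulum (mass, length, gravity, viscous damping), illustration only.
m, length, grav, damping = 1.0, 0.5, 9.81, 0.1
f, g = robot_state_space(
    M=lambda q: np.array([[m * length**2]]),
    C=lambda q, dq: np.zeros(1),
    F=lambda q, dq: damping * dq,
    G=lambda q: m * grav * length * np.sin(q),
)
x = np.array([np.pi / 2, 1.0])
fx, gx = f(x), g(x)
```

In this form, a change of load enters through M(q) and G(q), so it perturbs both the drift f(x) and the input matrix g(x), which corresponds to the two kinds of uncertainty considered in this paper.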
This result verifies the convergence stated in Theorem 1 and the effectiveness of the proposed learning algorithm. Figure 7 shows the state of the controlled system under the derived optimal control when the load is set to m_L = 3 kg, which shows that the closed-loop system is asymptotically stable. The corresponding control input is shown in Figure 8. Although it jitters at first, it becomes smooth as the system stabilizes.
The above simulation results show that the proposed learning method and control technique are effective.

Conclusion
The purpose of this paper is to address the robust control problem of uncertain systems by developing an adaptive critic learning method. To this end, the robust control problem of the uncertain system is transformed into an optimal control problem of the nominal system by selecting an appropriate cost function. Then, a single NN is used to reformulate the cost function, where the unknown cost function can be represented as a known term; an adaptive critic learning method based on the adaptive parameter estimation technique is then presented to obtain the optimal cost function such that the optimal control problem can be solved. Simulations are given to show the effectiveness of the proposed learning algorithm and control method. Future work will focus on robust tracking control of uncertain systems.

Data Availability
Data were curated by the authors and are available upon request.

Computational Intelligence and Neuroscience