Robust Adaptive Control via Neural Linearization and Compensation

We propose a new type of neural adaptive control via dynamic neural networks. For a class of unknown nonlinear systems, a neural identifier-based feedback linearization controller is first used. Dead-zone and projection techniques are applied to assure the stability of neural identification. Then four types of compensators are addressed. The stability of the closed-loop system is also proven.


Introduction
Feedback control of nonlinear systems is a big challenge for engineers, especially when complete model information is not available. A reasonable solution is to identify the nonlinear system first; an adaptive feedback controller can then be designed based on the identifier. Neural networks seem to be a very effective tool for identifying complex nonlinear systems when we have no complete model information or even treat the controlled plant as a "black box".
Neuroidentifiers can be classified as static (feedforward) or dynamic (recurrent) [1]. Most publications on nonlinear system identification use static networks, for example multilayer perceptrons, which are implemented to approximate the nonlinear function on the right-hand side of the dynamic model equations [2]. The main drawback of these networks is that the weight updating uses information on the local data structures (local optima) and the function approximation is sensitive to the training data [3]. Dynamic neural networks can successfully overcome this disadvantage and also behave adequately in the presence of unmodeled dynamics, because their structure incorporates feedback [4][5][6].
Neurocontrol seems to be a very useful tool for unknown systems because it is model-free, that is, the controller does not depend on a model of the plant. Many kinds of neurocontrol have been proposed in recent years. For example, supervised neurocontrol [7] is able to clone human actions: the neural network inputs correspond to sensory information perceived by the human, and the outputs correspond to the human control actions. Direct inverse control [1] uses an inverse model of the plant cascaded with the plant, so the composed system results in an identity map between the desired response and the plant response, but the absence of feedback undermines its robustness. Internal model neurocontrol [8] places forward and inverse models within the feedback loop. Adaptive neurocontrol has two kinds of structure: indirect and direct. Direct adaptive neurocontrol realizes the controller by the neural network directly [1]. The indirect method combines a neural network identifier with adaptive control; the controller is derived from the on-line identification [5].
In this paper we extend our previous results in [9,10]. In [9], the neurocontrol was derived by the gradient principle, so the neural control is locally optimal. No restriction is needed, because the controller did not include the inverse of the weights. In [10], we assumed that the inverse of the weights exists, so the learning law was normal. The main contributions of this paper are: (1) a special weight updating law is proposed to assure the existence of the neurocontrol; (2) four different robust compensators are proposed. By means of a Lyapunov-like analysis, we derive stability conditions for

Neuroidentifier
The controlled nonlinear plant is given by (1), where $f(x_t)$ is an unknown vector function. In order to realize indirect neural control, a parallel neural identifier (2) is used as in [9,10] (in [5] the series-parallel structure is used), where $\hat{x}_t \in \mathbb{R}^n$ is the state of the neural network, $W_{1,t}, W_{2,t} \in \mathbb{R}^{n \times n}$ are the weight matrices, and $A \in \mathbb{R}^{n \times n}$ is a stable matrix. The vector function $\sigma(\cdot) \in \mathbb{R}^n$, and $\phi(\cdot) \in \mathbb{R}^{n \times n}$ is a diagonal matrix. The function $\gamma(\cdot)$ is selected such that $\|\gamma(u_t)\|^2 \le \bar{u}$; for example, $\gamma(\cdot)$ may be a linear saturation function. The elements of $\sigma(\cdot)$ and $\phi(\cdot)$ are selected as monotone increasing functions; a typical choice is the sigmoid function with parameters $a_i, b_i, c_i > 0$. In order to avoid $\phi(\hat{x}_t) = 0$, the elements of $\phi(\cdot)$ are selected as in (5).
Remark 1. The dynamic neural network (2) has been discussed by many authors, for example [4,5,9,10]. The Hopfield model is a special case of this network with $A = \mathrm{diag}\{a_i\}$, $a_i := -1/(R_i C_i)$, $R_i > 0$, and $C_i > 0$, where $R_i$ and $C_i$ are the resistance and capacitance at the $i$th node of the network, respectively.
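As a rough illustration of the identifier structure described above, the following Python sketch evaluates a shifted sigmoid with $a_i, b_i, c_i > 0$ and advances a dynamic network by one forward-Euler step. The concrete formulas are assumptions, since they are not reproduced in this text: the sigmoid is taken as $a/(1+e^{-bx}) - c$, the elements of $\phi$ are offset so that they stay strictly positive, $\gamma$ is a clipping saturation, and the network is assumed to have the form $\dot{\hat{x}} = A\hat{x} + W_1\sigma(\hat{x}) + W_2\phi(\hat{x})\gamma(u)$. All function names and numerical values are hypothetical.

```python
import math


def sigma_i(x, a=2.0, b=2.0, c=0.5):
    """Assumed shifted sigmoid a / (1 + exp(-b x)) - c with a, b, c > 0."""
    return a / (1.0 + math.exp(-b * x)) - c


def phi_i(x, a=2.0, b=2.0, c=0.5):
    """Assumed strictly positive variant, so that phi never vanishes."""
    return a / (1.0 + math.exp(-b * x)) + c


def gamma(u, u_max=1.0):
    """Linear saturation: clip each control component to [-u_max, u_max]."""
    return [max(-u_max, min(u_max, ui)) for ui in u]


def matvec(M, v):
    """Plain matrix-vector product for lists of lists."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def identifier_step(x_hat, u, A, W1, W2, dt):
    """One forward-Euler step of the assumed identifier dynamics."""
    sig = [sigma_i(x) for x in x_hat]
    g = gamma(u)
    # phi is diagonal, so phi(x_hat) * gamma(u) is an elementwise product.
    phi_g = [phi_i(x) * gi for x, gi in zip(x_hat, g)]
    lin = matvec(A, x_hat)
    t1 = matvec(W1, sig)
    t2 = matvec(W2, phi_g)
    return [x + dt * (l + p + q) for x, l, p, q in zip(x_hat, lin, t1, t2)]
```

With a stable diagonal $A$ and identity weights, a zero state and zero input move only through the $W_1\sigma(\hat{x})$ term, which makes the step easy to check by hand.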
Let us define the identification error as $\Delta_t = \hat{x}_t - x_t$. Generally, the dynamic neural network (2) cannot follow the nonlinear system (1) exactly. The nonlinear system may be written as (7), where $W_1^0$ and $W_2^0$ are the initial matrices of $W_{1,t}$ and $W_{2,t}$, $W_1$ and $W_2$ are a priori known matrices, and the vector function $f_t$ can be regarded as modelling error and disturbances. Because $\sigma(\cdot)$ and $\phi(\cdot)$ are chosen as sigmoid functions, they clearly satisfy a Lipschitz property in which $\tilde{\sigma} = \sigma(\hat{x}_t) - \sigma(x_t)$, $\tilde{\phi} = \phi(\hat{x}_t) - \phi(x_t)$, and $\Lambda_1$, $\Lambda_2$, $D_\sigma$, and $D_\phi$ are known positive constant matrices. The error dynamics (10) are obtained from (2) and (7). As in [4,5,9,10], we assume that the modeling error is bounded.
(A1) The unmodeled dynamic $f_t$ satisfies a known bound, in which $\Lambda_f$ is a known positive constant matrix.
If we define the matrices $R$ and $Q$, and the matrices $A$ and $Q_0$ are selected to fulfill the following conditions: (1) the pair $(A, R^{1/2})$ is controllable and the pair $(Q^{1/2}, A)$ is observable; (2) the local frequency condition [9] is satisfied; then the following assumption can be established.
(A2) There exist a stable matrix $A$ and a strictly positive definite matrix $Q_0$ such that the matrix Riccati equation (14) has a positive solution $P = P^T > 0$. This condition is easily fulfilled if we select $A$ as a stable diagonal matrix. The next theorem states the learning procedure of the neuroidentifier.
Theorem 2. Subject to assumptions A1 and A2 being satisfied, if the weights $W_{1,t}$ and $W_{2,t}$ are updated by the learning law (15), where $K_1, K_2 > 0$, $P$ is the solution of the Riccati equation (14), and $\mathrm{Pr}_i[\omega]$ ($i = 1, 2$) are projection functions that switch on a "condition", then the weight matrices and the identification error remain bounded; that is, for any $T > 0$ the identification error fulfills a tracking performance bound in which $\kappa$ is the condition number of $Q_0$.
Proof. Select a Lyapunov function $V_t$ in which $P \in \mathbb{R}^{n \times n}$ is a positive definite matrix. Its derivative is computed according to (10). Since $\Delta_t^T P W_1^* \tilde{\sigma}_t$ is a scalar, we use (9) and the matrix inequality (22), $X^T Y + Y^T X \le X^T \Lambda^{-1} X + Y^T \Lambda Y$, where $X, Y \in \mathbb{R}^{n \times k}$ are any matrices and $\Lambda$ is any positive definite matrix, to bound the cross terms. The term containing $f_t$ is bounded in view of the matrix inequality (22) and (A1). Using the updating law (15), we can conclude that $V_t$ is bounded. Integrating (27) from $0$ up to $T$ and using $\kappa \ge 1$ gives the integral bound on the identification error. From (I) and (II), $V_t$ is bounded, so (18) is realized.
The dead-zone $s_t$ is applied to overcome the robustness problem caused by the unmodeled dynamic $f_t$. In the presence of disturbances or unmodeled dynamics, adaptive procedures may easily become unstable. The lack of robustness of parameter identification was demonstrated in [11] and became a hot issue in the 1980s. The dead-zone method is one simple and effective tool. The second technique is the projection approach, which guarantees that the parameters remain within a constrained region and does not alter the properties of the adaptive law established without projection [12]. The projection approach proposed in this paper is explained in Figure 1. We want to force $W_{2,t}$ inside the ball of center $W_2^0$ and radius $r$. If $\|W_{2,t} - W_2^0\| < r$, we use the normal gradient algorithm. When $W_{2,t} - W_2^0$ is on the ball and the update vector $\dot{W}_{2,t}$ points either inside or along the ball, we also keep the normal algorithm. Otherwise the update is projected so that $\dot{W}_{2,t}$ is directed toward the inside of the ball, that is, $W_{2,t}$ will never leave the ball. Since $r < \|W_2^0\|$, $W_{2,t} \ne 0$.
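The dead-zone and projection mechanisms described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the learning law (15) itself: the weight matrix is flattened into a list, the dead-zone is modeled as a simple threshold on a scalar weighted error, and the projection removes the outward radial component of the raw update when the weights sit on the boundary of the ball of center `W0` and radius `r`. The names `dead_zone` and `project_update` are hypothetical.

```python
def dead_zone(weighted_error, eta):
    """Return 1.0 (learning on) when the weighted error exceeds eta, else 0.0."""
    return 1.0 if weighted_error > eta else 0.0


def project_update(W, W0, dW, r):
    """Return an update that keeps W inside the ball ||W - W0|| <= r.

    Inside the ball, or when dW points inward or along the boundary,
    the raw update is kept. Otherwise its outward radial component is
    removed, so the distance to W0 cannot grow to first order.
    """
    d = [w - w0 for w, w0 in zip(W, W0)]
    dist2 = sum(x * x for x in d)
    outward = sum(x * y for x, y in zip(d, dW))
    if dist2 < r * r or outward <= 0.0:
        return list(dW)
    scale = outward / dist2
    return [y - scale * x for x, y in zip(d, dW)]


def guarded_update(W, W0, dW, r, weighted_error, eta):
    """Combine both safeguards: projected update, switched off in the dead zone."""
    s = dead_zone(weighted_error, eta)
    return [s * v for v in project_update(W, W0, dW, r)]
```

For example, with `W0 = [1.0, 0.0]`, `r = 0.5`, and `W = [1.5, 0.0]` on the boundary, the raw update `[1.0, 1.0]` loses its radial part and only the tangential part remains. Choosing `r` smaller than the norm of `W0` keeps the origin outside the ball, which matches the argument that the weights never become zero.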
Remark 4. Figure 1 and (7) show that the initial conditions of the weights influence the identification accuracy. In order to find good initial weights, we design an offline method. From the above theorem, we know that the weights converge to a zone. Starting from any initial weights $W_1^0$ and $W_2^0$, after time $T_0$ the identification error should become smaller, that is, $W_{1,T_0}$ and $W_{2,T_0}$ are better than $W_1^0$ and $W_2^0$. We use the following steps to find the initial weights.
(1) Start from any initial value for $W_1^0$ and $W_2^0$.
(2) Do identification until the training time reaches $T_0$.
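The offline search for initial weights can be pictured as a restart loop. The sketch below assumes, beyond the two steps listed above, that the weights obtained at $T_0$ are reused as the next initial weights and that the loop stops when the identification error no longer improves; these continuation and stopping rules are assumptions, not steps taken from this text. The callable `identify` stands for one identification run over $[0, T_0]$ and is hypothetical, as is the toy stand-in used to exercise the loop.

```python
def refine_initial_weights(identify, w_init, rounds=5, tol=1e-6):
    """Repeat identification over [0, T0], restarting from the last weights.

    identify(w) must return (weights_at_T0, identification_error).
    """
    w = w_init
    best_err = float("inf")
    for _ in range(rounds):
        w_new, err = identify(w)
        if best_err - err < tol:  # no meaningful improvement: stop
            break
        w, best_err = w_new, err
    return w, best_err


def toy_identify(w, target=3.0):
    """Toy stand-in: each run moves a scalar weight halfway to a target."""
    w_new = w + 0.5 * (target - w)
    return w_new, (w_new - target) ** 2


if __name__ == "__main__":
    print(refine_initial_weights(toy_identify, 0.0))
```

In practice `identify` would wrap a simulation of the identifier with the learning law, and the returned weights would serve as $W_1^0$ and $W_2^0$ for the online stage.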
Remark 5. Since the updating rate is $K_i P$ ($i = 1, 2$) and $K_i$ can be selected as any positive matrix, the learning process of the dynamic neural network (15) is free of the solution of the Riccati equation (14).
Remark 6. Notice that the upper bound (19) turns out to be "sharp", that is, in the case of no uncertainties (the exactly matching case, $f = 0$) we obtain $\eta = 0$ and, hence, the $\limsup$ bound vanishes, from which, for this special situation, the asymptotic stability property ($\Delta_t \to 0$ as $t \to \infty$) follows. In general, only asymptotic stability "in average" is guaranteed, because the dead-zone parameter $\eta$ can never be set to zero.

Robust Adaptive Controller Based on Neuro Identifier
From (7) we know that the nonlinear system (1) may be modeled as (33). Equation (33) can be rewritten in a form whose residual term $d_t$ collects the unmodeled dynamics.
The objective of adaptive control is to force the nonlinear system (1) to follow an optimal trajectory $x_t^* \in \mathbb{R}^r$, which is assumed to be smooth enough. This trajectory is regarded as a solution of a nonlinear reference model $\dot{x}_t^* = \varphi(x_t^*, t)$ with a fixed initial condition. If the trajectory has points of discontinuity at some fixed moments, we can use any smooth approximating trajectory. In the case of the regulation problem, $\varphi(x_t^*, t) = 0$ and $x^*(0) = c$, where $c$ is a constant. Let us define the state trajectory error as $\Delta_t^* = x_t - x_t^*$. From (34) and (36) we have the error dynamics (38). Let us select the control action $\gamma(u_t)$ in the linear form (39), where $U_{1,t} \in \mathbb{R}^n$ is the direct control part and $U_{2,t} \in \mathbb{R}^n$ is a compensation of the unmodeled dynamic $d_t$. As $\varphi(x_t^*, t)$, $x_t^*$, $W_{1,t}\sigma(\hat{x}_t)$, and $W_{2,t}\phi(\hat{x}_t)$ are available, we can select $U_{1,t}$ as in (40). The required inverse exists because $\phi(\hat{x}_t)$ in (5) is different from zero and $W_{2,t} \ne 0$ by the projection approach in Theorem 2. Substituting (39) and (40) into (38) yields the error equation. Four robust algorithms may be applied to compensate $d_t$.
(A) Exact Compensation. From (7) and (2) we have an expression for $d_t$ in terms of $\dot{x}_t$. So, the ODE which describes the state trajectory error is $\dot{\Delta}_t^* = A \Delta_t^*$. Because $A$ is stable, $\Delta_t^*$ is globally asymptotically stable.
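The stability claim for exact compensation can be checked numerically on a small example. The sketch below assumes that, once $d_t$ is cancelled, the tracking error obeys the linear equation $\dot{\Delta}^* = A\Delta^*$, and integrates it with a forward-Euler scheme. The matrix, the initial error, and the function name are illustrative choices only.

```python
import math


def simulate_error(A, delta0, dt=0.01, steps=1000):
    """Integrate d(delta)/dt = A delta with forward Euler and return the final error."""
    delta = list(delta0)
    for _ in range(steps):
        rate = [sum(a * d for a, d in zip(row, delta)) for row in A]
        delta = [d + dt * r for d, r in zip(delta, rate)]
    return delta


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


if __name__ == "__main__":
    A = [[-1.0, 0.5], [0.0, -2.0]]  # stable: eigenvalues -1 and -2
    final = simulate_error(A, [1.0, -1.0])
    print(norm(final))  # far smaller than the initial error norm
```

The decay rate is set by the eigenvalues of $A$, which is why the same matrix also shapes the speed of the identifier.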
(B) An Approximate Method. If $\dot{x}_t$ is not available, an approximate method may be used in which $\dot{x}_t$ is approximated and $\delta_t > 0$ is the differential approximation error. Let us select the compensator as in (47) and define a Lyapunov-like function as in (49). In the time derivative (50) of (49), the term $2\Delta_t^{*T} P_2 \delta_t$ can be estimated by the matrix inequality, where $\Lambda$ is any positive definite matrix. So (50) becomes an inequality in which $Q$ is any positive definite matrix. Because $A$ is stable, there exist $\Lambda$ and $Q_2$ such that the matrix Riccati equation (53) has a positive solution $P_2 = P_2^T > 0$. Defining the seminorms (55), where $Q_2 = Q_2^T > 0$ is the given weighting matrix, the state trajectory tracking can be formulated as an optimization problem. Based on the dynamic neural network (2), the control law (47) makes the trajectory tracking error satisfy a bound on $\|\Delta_t^*\|_{Q_2}^2$. A suitable selection of $\Lambda$ and $Q_2$ ensures that the Riccati equation (53) has a positive solution and makes $\|\Delta_t^*\|_{Q_2}^2$ small enough if $\tau$ is small enough.
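To make the role of the approximation error $\delta_t$ concrete, the following sketch replaces the unavailable derivative by a backward difference over an interval $\tau$. Reading $\tau$ as the differencing interval is an assumption, since the approximation formula is not reproduced in this text, and the test signal $x(t) = \sin t$ is chosen only because its derivative is known exactly.

```python
import math


def backward_difference(x, t, tau):
    """Approximate dx/dt at time t from the current and a delayed sample."""
    return (x(t) - x(t - tau)) / tau


def approximation_error(x, x_dot, t, tau):
    """Magnitude of the differential approximation error at time t."""
    return abs(x_dot(t) - backward_difference(x, t, tau))


if __name__ == "__main__":
    for tau in (0.1, 0.01, 0.001):
        print(tau, approximation_error(math.sin, math.cos, 1.0, tau))
```

For a smooth signal the error shrinks roughly in proportion to $\tau$, which is consistent with the statement that the tracking bound becomes small when $\tau$ is small. With noisy measurements a very small $\tau$ would amplify the noise, so the choice involves a trade-off.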
(C) Sliding Mode Compensation. If $\dot{x}_t$ is not available, the sliding mode technique may be applied. Let us define a Lyapunov-like function using $P_3$, a solution of the Lyapunov equation. Using (41), we obtain its time derivative (60). According to the sliding mode technique, we may select the compensator $U_{2,t}$ as in (61), where $k$ is a positive constant. Substituting (59) and (61) into (60) gives the bound on the derivative.
(D) Local Optimal Compensation. We reformulate (71) as an inequality. Then, integrating each term from $0$ to $\tau$, dividing each term by $\tau$, and taking the upper limit of these integrals as $\tau \to \infty$, we obtain a bound which, in view of the definitions of the seminorms (55), can be expressed through the cost $\Psi(U_{2,t}^d)$. To minimize $\Psi(U_{2,t}^d)$, we assume that, at the given (positive) $t$, $x^*(t)$ and $x(t)$ are already realized and do not depend on $U_{2,t}^d$. We call $U_{2,t}^{d*}$ the locally optimal control, because it is calculated based only on "local" information. The solution of this optimization problem is a typical quadratic programming problem. Without restrictions, $U^*$ is selected according to the linear squares optimal control law.
When the sliding mode control is inserted in the closed-loop system, chattering occurs in the control input, which may excite unmodeled high-frequency dynamics. To eliminate chattering, the boundary layer compensator can be used; it offers a continuous approximation to the discontinuous sliding mode control law inside the boundary layer and guarantees the output tracking error within any neighborhood of the origin [13].
Finally, we give the following design steps for the robust neurocontrollers proposed in this paper.
(1) According to the dimension of the plant (1), design a neural network identifier (2) which has the same dimension as the plant. In (2), $A$ can be selected as a stable matrix. $A$ influences the dynamic response of the neural network: larger eigenvalues of $A$ make the neural network slower. The initial conditions for $W_{1,t}$ and $W_{2,t}$ are obtained as in Remark 4.
(2) Do online identification. The learning algorithm is (15) with the dead zone in Theorem 2. Assuming that the upper bound of the modeling error is known, we can give a value for $\eta$. $Q_0$ is chosen such that the Riccati equation (14) has a positive definite solution; $R$ can be selected as any positive definite matrix because $\Lambda_1^{-1}$ is an arbitrary positive definite matrix. The updating rate in the learning algorithm (15) is $K_1 P$, and $K_1$ can be selected as any positive definite matrix, so the learning process is free of the solution of the Riccati equation (14) (see Remark 5).
The results of local optimal compensation are shown in Figures 6 and 7.
We find that the neurocontrol remains robust and effective when the robot is changed.

Conclusion
By means of Lyapunov analysis, we establish bounds for both the identifier and the adaptive controller. The main contribution of our paper is that we give four different compensation methods and prove the stability of the neural controllers.

Remark 7. Approaches (A) and (C) are exact compensations of $d_t$. Approach (A) needs the information of $\dot{x}_t$. Because Approach (C) uses the sliding mode control $U_{2,t}^c$
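The difference between a discontinuous sliding mode term and its boundary-layer approximation can be sketched as follows. The compensator form $-k\,\mathrm{sgn}(\cdot)$ applied componentwise to the tracking error is an assumption made for illustration, because the sliding mode law of Approach (C) is not reproduced in this text. The layer width `epsilon` and all function names are hypothetical.

```python
def sign(s):
    """Sign function returning -1, 0, or 1."""
    return (s > 0) - (s < 0)


def saturate(s, epsilon):
    """Boundary-layer replacement for sign(s): linear for |s| <= epsilon."""
    return max(-1.0, min(1.0, s / epsilon))


def compensator(error, k, epsilon=None):
    """Componentwise -k * sign(e), or its boundary-layer version if epsilon is given."""
    if epsilon is None:
        return [-k * sign(e) for e in error]
    return [-k * saturate(e, epsilon) for e in error]


if __name__ == "__main__":
    err = [0.05, -2.0, 0.0]
    print(compensator(err, k=2.0))               # switches at full gain near zero
    print(compensator(err, k=2.0, epsilon=0.1))  # scaled down inside the layer
```

Outside the layer both versions coincide, so the reaching behavior is unchanged. Inside the layer the continuous version trades exact convergence for a residual error whose size shrinks with `epsilon`.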