Multiple Model Adaptive Tracking Control Based on Adaptive Dynamic Programming

Adaptive dynamic programming (ADP) has been shown to be an effective method for the optimal control of nonlinear systems. However, because the structure of ADP requires the control input to satisfy the initial admissible control condition, control performance may deteriorate after an abrupt parameter change or a system failure. In this paper, we introduce the multiple model idea into ADP: multiple subcontrollers run in parallel to supply initial conditions for different environments, and a switching index is set up to select the appropriate initial conditions for the current system. With this strategy, the proposed multiple model ADP achieves optimal control for systems with jumping parameters. The convergence of multiple model adaptive control based on ADP is proved, and simulation shows that the proposed method effectively improves the transient response of the system.


Introduction
In recent years, multiple model adaptive control (MMAC) has been a research focus for improving the transient response of nonlinear systems. In practical control processes, system dynamics may change abruptly because of a system failure or a parameter change. Traditional adaptive control methods cannot deal with this kind of change, which results in poor transient response or even system instability. According to multiple model adaptive control theory, multiple models are established to cover the system uncertainty, and corresponding controllers are constructed [1]. Based on the switching mechanism, at every moment the controller corresponding to the model closest to the current system is selected as the current controller. Thus, the transient response and the control performance are greatly improved.
Since the 1990s, multiple model adaptive control based on an index switching function has obtained satisfactory results for linear systems, linear time-varying systems with jumping parameters, and stochastic systems with stochastic disturbances. However, for nonlinear systems, there is still no unified research method or satisfactory result. Among the main MMAC studies for nonlinear systems, multiple model adaptive control based on neural networks has attracted more and more attention [2][3][4]. Because neural networks perform well in approximating nonlinear systems, they can turn the system uncertainty into uncertainty in the weights and structure of the neural networks. Thus, multiple model adaptive control for nonlinear systems can be designed based on changes in the weights and structure of the neural networks.
In recent years, neural networks (NNs) and fuzzy logic have been widely used to handle control problems of nonlinear systems owing to their fast adaptability and excellent approximation ability. For systems without complete model information, or systems regarded as "black boxes," neural networks show great advantages. For uncertain nonlinear discrete-time systems with dead-zone input, [5] introduces NNs to approximate the unknown functions in the transformed systems, so that the tracking error converges while the dead zone is handled by an adaptive compensation term. Fuzzy logic systems are used to approximate the unknown functions to achieve control of discrete-time systems with backlash [6,7] or input constraints [8].
Combining dynamic programming, neural networks, and reinforcement learning [9], adaptive dynamic programming (ADP) addresses the "curse of dimensionality" of traditional dynamic programming and provides a practical scheme for the optimal control of nonlinear systems. ADP adopts two neural networks, a critic neural network to approximate the cost function and an actor neural network to approximate the control strategy, so that the principle of optimality can be satisfied [10,11]. In 2002, Murray first proposed the iterative ADP algorithm for continuous-time systems. Iterative ADP updates the policy and the value function by policy iteration and value iteration [12,13]. However, iterative ADP can only be computed offline, because the number of iterations is not known in advance and the computation time is long. In recent years, online ADP strategies have been widely proposed [14][15][16][17]. They obtain the optimal solution adaptively rather than by offline calculation.
Paper [18] proposed an ADP tracking strategy that does not require any knowledge of the drift dynamics of the system, which means that it has the adaptivity to deal with model uncertainty. However, as in most existing online ADP methods, the controller needs the initial control to satisfy the admissible condition for the corresponding system [15,19,20]. Thus, once the system undergoes an abrupt change of parameters and the control signal at the moment of change does not satisfy the initial admissible condition for the changed parameters, the ADP controller can no longer make the state track the desired trajectory. In this paper, we introduce MMAC into ADP: multiple models are established to cover the uncertainty of the system, and corresponding subcontrollers are constructed and run in parallel. A switching index function is introduced to decide which model describes the current system most accurately. Once a model switch occurs, the corresponding controller is selected to provide its current state and control signal as the initial condition of the system. Based on this idea, we design multiple fixed models for the case in which the submodels are precisely known. For imprecisely estimated models, multiple fixed models and one adaptive model are combined to obtain an improved transient response.
This paper is organized as follows. The system with jumping parameters is described in Section 2. Then, a transformed ADP tracking control scheme is introduced and proved convergent in Section 3. In Section 4, the main structure of MMAC based on ADP is described, and two kinds of MMAC strategies are introduced for precisely known submodels and for imprecise models. Simulation experiments are shown in Section 5, and Section 6 concludes this paper.

Problem Description
Consider a nonlinear discrete-time system with jumping parameters, denoted by (1). The objective of the tracking problem is to design an optimal controller with a constrained control signal so that the output state tracks the desired trajectory generated by (2) in an optimal way. As shown in [21], (2) can generate a large class of trajectories that satisfy the requirements of most applications, including unit steps, sinusoidal waveforms, and damped sinusoids.
According to Bellman's principle of optimality and the first-order necessary condition, the theoretical optimal control law can be calculated and the theoretical HJB equation derived in terms of a utility function that is the sum of a term in the state and a term in the control input.
In the remainder of this section, an online actor-critic structure is introduced to solve the optimal tracking problem: the critic neural network (NN) is designed to approximate the value function, and the actor NN is designed to approximate the optimal control signal.
(1) Critic NN. A two-layer NN is utilized as the critic NN to approximate the value function. In this approximation, $V_c \in \mathbb{R}^{N_c \times 2n}$ and $W_c \in \mathbb{R}^{1 \times N_c}$ are the constant target weights of the hidden layer and the output layer, respectively, $\phi_c(\cdot)$ is the activation function, $\varepsilon_c \in \mathbb{R}$ is the bounded approximation error, and $N_c$ is the number of neurons in the hidden layer. The target weights, the activation function $\phi_c(\cdot)$, and the gradient of $\phi_c(\cdot)$ with respect to its input are each assumed to be bounded in norm by a positive constant. The actual output of the critic NN is computed with $\hat{W}_c$ and $\hat{V}_c$, the estimates of $W_c$ and $V_c$. The approximate HJB function error is then derived, and the goal of the critic NN is to minimize a function of this error. Using the gradient-descent method, the update law of the critic NN is given by (20). In this paper, the activation function of the critic NN is selected as $\phi_c(\cdot) = \tanh(\cdot)$.
(2) Actor NN. To obtain the optimal control input, a two-layer NN is utilized as the actor NN. Here $V_a \in \mathbb{R}^{N_a \times 2n}$ and $W_a$, which has $N_a$ columns and one row per control input, are the constant target weights of the hidden layer and the output layer, respectively, $\phi_a(\cdot)$ is the corresponding activation function, $\varepsilon_a$ is the bounded approximation error, and $N_a$ is the number of neurons in the hidden layer. The activation function $\phi_a(\cdot)$ and the target weights are assumed to be bounded in norm by positive constants.
The actual output of the actor NN is computed with $\hat{W}_a$ and $\hat{V}_a$, the estimates of $W_a$ and $V_a$, respectively. Using (17), the actual approximation target is obtained, and the goal of the actor NN is to minimize a function of the actor NN approximation error. Using the gradient-descent method, the update law of the actor NN is given by (28). In this paper, the activation function of the actor NN is selected as $\phi_a(\cdot) = \tanh(\cdot)$, and the input of each hidden neuron is the sum of the $2n$ network inputs weighted by the corresponding entries of $\hat{V}_a(k)$. Finally, the optimal control signal is obtained as in (30).
Remark 3. To obtain the optimal control policy, the actor NN is designed to approximate the signal that enters the bounding function in (30), rather than the control itself, so that the control signal is strictly confined to the given constraints. In schemes where the actor NN approximates the optimal control directly, unsuitable weights in the initial period can drive the control signal outside the constraints.
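The point of Remark 3 can be illustrated with a minimal Python sketch. It evaluates a two-layer network and passes the result through a saturating function, so that the control stays within its bound whatever the weights are. The function names, the use of tanh as the saturating function, the weight shapes, and all numeric values are assumptions made for illustration; they are not taken from (30).

```python
import math

def two_layer_nn(W, V, x):
    """Return W * tanh(V x) for hidden-layer weights V and output-layer weights W."""
    hidden = [math.tanh(sum(v * xj for v, xj in zip(row, x))) for row in V]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W]

def bounded_control(W, V, x, u_max):
    """Squash the actor output with tanh so that |u| <= u_max always holds."""
    return [u_max * math.tanh(v) for v in two_layer_nn(W, V, x)]

# Hypothetical weights: 3 hidden neurons, 4 inputs, 1 control channel.
V = [[0.5, -0.2, 0.1, 0.3], [1.0, 0.4, -0.7, 0.2], [-0.3, 0.8, 0.6, -0.5]]
W = [[50.0, -40.0, 30.0]]  # deliberately large, "unsuitable" initial weights
x = [1.0, 0.5, 0.0, 0.33]

raw = two_layer_nn(W, V, x)                 # unconstrained actor output
u = bounded_control(W, V, x, u_max=0.4)     # constrained control signal
```

With these weights the raw network output is far larger in magnitude than 0.4, while the squashed control still respects the bound, which is the behaviour the remark describes.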
Theorem 4. For the nonlinear discrete-time system given by (7), let the weight tuning laws of the critic NN and the actor NN be given by (20) and (28), respectively, and let the initial weights of the actor NN reflect an initial admissible control of system (7). Then there exist positive learning-rate constants for the critic and actor NNs such that the system state and the estimation errors of the two networks are all uniformly ultimately bounded (UUB).
Proof of Theorem 4 is shown in the Appendix.
In contrast with traditional ADP tracking strategies, the above method does not require knowledge of the system drift dynamics. It therefore offers some adaptability: for systems with different drift dynamics, the method can still make the system state track the desired trajectory. However, the initial admissible control condition must still be satisfied for each different system.

Multiple Model Control Scheme Based on ADP
In this section, we first propose the multiple model ADP for systems with accurately known submodels. Then, an adaptive ADP main controller is introduced so that the new multiple model ADP can deal with estimated submodels.

Multiple Model ADP with Accurately Known Submodels.
In this section, we consider the case in which the known submodels, indexed over a finite set, precisely reflect the system dynamics at every working point.
According to the idea of multiple model adaptive control, it is natural to design multiple independent subcontrollers that track the target trajectory in parallel and to use a switching index function to decide which controller should control the current system. The main structure of the multiple model ADP controller for accurately known submodels is shown in Figure 1.
For every submodel, according to Theorem 4, if the initial weights of the actor NN reflect an initial admissible control and the weights of the two NNs are tuned online according to (20) and (28), respectively, with appropriate learning rates, then the output states can track the desired trajectory in the optimal manner. Thus, multiple subcontrollers can be constructed, one per submodel, each with its own learning rates for the two NNs.
At every moment, an index function is calculated to measure how well each model matches the current system. The model that describes the current system most accurately is then selected, and at the switching point the controller of the selected model takes over control of the system.

Multiple Model ADP with Estimated Submodels.
In Section 4.1, we discussed multiple model ADP for accurately known submodels. Because the submodels are precisely known, a subcontroller can control the system directly if the corresponding submodel matches the current system. In practice, however, submodels are estimated and somewhat imprecise. In this case, the control scheme in Section 4.1 cannot obtain a satisfactory control result because it lacks adaptivity.
A system with jumping parameters as in (1) can be viewed as exhibiting different dynamic characteristics, each described by a different set of fixed parameters. The ADP tracking controller discussed in Section 3 does not require any knowledge of the drift dynamics of the system, which means that it has the adaptivity to deal with model uncertainty. However, the controller needs the initial control to satisfy the admissible condition for the corresponding system. Thus, once the system undergoes an abrupt change of parameters and the control signal at the moment of change does not satisfy the initial admissible condition for the changed parameters, the ADP controller cannot make the state track the desired trajectory accurately.
The main idea of multiple model adaptive ADP for estimated submodels is to design a main controller that provides adaptivity and to use multiple submodels and subcontrollers to guarantee the initial admissible control condition after the system changes. The structure of multiple model ADP is shown in Figure 3.
The main procedure of multiple model adaptive ADP tracking control is as follows:
(1) Multiple models are established to cover the uncertainty of the system.
(2) Multiple ADP subcontrollers are set up according to the multiple models.
(3) The independent subcontrollers run in parallel to track the same reference trajectory.
(4) At every moment, a switching index function is calculated to decide which model is closest to the current system.
(5) Once the switching index function indicates a model switch, the state and control of the corresponding subcontroller are selected as the initial condition of the main ADP controller for the new stage.
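The hand-off in step (5) can be sketched in Python as follows. This is only an illustrative skeleton: the class names `SubController` and `MainController`, the function `supervise`, and the placeholder states and weights are hypothetical, and the ADP weight updates themselves are not modeled.

```python
from dataclasses import dataclass, field

@dataclass
class SubController:
    name: str
    state: list          # current model state (placeholder values)
    actor_weights: list  # current actor NN weights (placeholder values)

    def initial_parameters(self):
        """Snapshot handed to the main controller when this model is selected."""
        return {"state": list(self.state), "actor_weights": list(self.actor_weights)}

@dataclass
class MainController:
    params: dict = field(default_factory=dict)
    resets: int = 0

    def reinitialize(self, params):
        """Restart the main controller from the supplied initial parameters."""
        self.params = params
        self.resets += 1

def supervise(best_model_per_step, subcontrollers, main):
    """Re-initialize the main controller whenever the selected model changes."""
    current = None
    for best in best_model_per_step:
        if best != current:
            main.reinitialize(subcontrollers[best].initial_parameters())
            current = best
    return main

subs = [SubController("M1", [1.0, 0.5], [0.1]), SubController("M2", [0.9, 0.4], [0.2])]
main = supervise([0, 0, 1, 1, 0], subs, MainController())
```

In this sketch the very first selection also counts as a re-initialization, so the sequence of selected models 0, 0, 1, 1, 0 triggers three hand-offs.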

Design of Main Controller.
For the system with jumping parameters as in (1), we adopt the ADP controller of Section 3 as the main controller. However, its initial parameters, including the initial admissible control and the initial state, are given according to the control and state of the most accurate model, as shown in Figure 3.

Establishment of Multiple Models and Subcontrollers.
According to the different working conditions, multiple estimation models are constructed to cover the system uncertainty. It must be ensured that, for every working condition, at least one model is close enough to the corresponding plant. The multiple models are set up over a finite index set.
Theoretically, for a given system with jumping parameters, control performance improves with more submodels. However, too many submodels also increase the computational load and may cause frequent model switching. A suitable number of submodels is chosen from experience and simulation.
According to Theorem 4, for each model we can find an initial weight of the actor NN, $\hat{W}_{ai}(1)$, that reflects the initial admissible control $\hat{u}_i(1)$, together with appropriate learning rates, so that the output state of the model tracks the desired trajectory when the weights of the two NNs are tuned according to (20) and (28), respectively. Thus, multiple ADP subcontrollers corresponding to the multiple models can be designed, where $\hat{u}_i(1)$ is the initial admissible control, $\hat{X}_i(1)$ is the initial state, and the learning rates of the critic and actor NNs are selected for the $i$th model. Figure 2 shows the structure of the subcontrollers. The role of the ADP subcontrollers is to supply the initial parameters $\mathrm{IP}_i(k)$, including the initial state $\hat{X}_i(k)$ and the initial weights of the actor NN, $\hat{W}_{ai}(k)$ and $\hat{V}_{ai}(k)$, which reflect the initial control $\hat{u}_i(k)$, to the main ADP controller. Because parameter changes and model switches can happen at any moment, the independent subcontrollers run in parallel at all times.

Choice of Switching Mechanism.
At every moment, the switching function determines which model is closest to the system and which group of initial parameters should be given to the main controller. To avoid incorrect switching caused by the performance at a single point, we employ an accumulation of the model error as the index function, with a forgetting factor between 0 and 1. The model estimation error of each model is the difference between its model state used for comparison and the system state. At every moment, the model that best describes the current system is selected, and the state and actor NN weights of the corresponding subcontroller are selected as the initial parameters of the main controller for the new stage of the system.
The multiple model scheme with accurately known submodels given in Section 4.1 is normally the most accurate. However, in a few cases, such as when there is a disturbance, (33) will indicate the wrong best model. Moreover, in most cases, precise jumping parameters are hard to obtain. Therefore, we prefer the latter multiple model scheme, given in Section 4.2 with the main ADP controller, as the final optimal control strategy.
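The accumulated-error switching rule can be sketched in Python as follows. The exact index function is not reproduced here; the recursive form used below (previous index scaled by the forgetting factor plus the current squared error) is an assumed, commonly used variant, and the function names and toy data are hypothetical. The forgetting factor 0.4 matches the value used in the simulation section.

```python
def update_indices(indices, model_states, plant_state, forgetting):
    """One step of an exponentially forgotten squared-error index per model."""
    updated = []
    for index, x_hat in zip(indices, model_states):
        err2 = sum((xh - x) ** 2 for xh, x in zip(x_hat, plant_state))
        updated.append(forgetting * index + err2)
    return updated

def best_model(indices):
    """Position of the model with the smallest accumulated error."""
    return min(range(len(indices)), key=indices.__getitem__)

# Toy data: model 0 fits for three steps, then the plant jumps and model 1 fits.
plant = [[1.0], [1.0], [1.0], [2.0], [2.0], [2.0]]
models = [[1.0], [2.0]]  # constant model states, for illustration only
indices = [0.0, 0.0]
selected = []
for x in plant:
    indices = update_indices(indices, models, x, forgetting=0.4)
    selected.append(best_model(indices))
```

Because past errors are discounted rather than discarded, a single outlying sample changes the index less than a sustained mismatch does, which is the motivation given above for accumulating the error.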
Remark 7. For the multiple model adaptive ADP described in Section 4.2, convergence can hardly be proved if there are infinitely many model switches. To facilitate convergence and stability, the following assumption is made. Assuming that model switching starts at time step $k_0$, we set up a period $\Delta$ and a tracking error limit to improve the transient response.
If $k > k_0 + \Delta$ or the tracking error falls below the limit, switching between the submodels is stopped and the main ADP controller keeps working. Thus, the optimal tracking control problem can be viewed as the same as in Section 3, and convergence is guaranteed by Theorem 4.
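The stopping rule of Remark 7 amounts to a simple predicate, sketched here in Python. The function name, the argument names, the use of the absolute value of a scalar tracking error, and the numbers in the example are assumptions for illustration.

```python
def switching_enabled(k, k0, window, tracking_error, error_limit):
    """Switching stays enabled only inside the window and while the error is large."""
    timed_out = k > k0 + window
    converged = abs(tracking_error) < error_limit
    return not (timed_out or converged)

# Hypothetical numbers: switching starts at step 50, window of 20 steps, limit 0.01.
flags = [switching_enabled(k, 50, 20, err, 0.01)
         for k, err in [(55, 0.3), (60, 0.005), (71, 0.3)]]
```

Once the predicate returns false, only the main ADP controller keeps running, so the analysis of Section 3 applies from that point on.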

Simulation
In this section, an experiment is constructed to show the effectiveness of the proposed method. Consider the nonlinear system with jumping parameters (42), where $x = [x_1\ x_2]^T$ is the state vector with initial state $x(0) = [1\ 0.5]^T$ and the scalar control input is bounded by $|u(k)| \le 0.4$. Its drift dynamics and input dynamics are denoted as in system (1). The control objective is to force the system state to track the following target trajectory in an optimal manner: $r_1(k+1) = 0.6 + e^{-0.2}(r_1(k) - 0.6)$, where $r(0) = [0\ \ 0.6(1 - e^{-0.2})]^T$.
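As a quick check of the target trajectory, the following Python sketch iterates the recursion for its first component. It assumes that the recursion has the form "next value = 0.6 + exp(-0.2) * (current value - 0.6)" starting from 0, which is the reading adopted here from the recoverable text; the function and variable names are hypothetical, and the second component is not modeled.

```python
import math

def reference_first_component(steps, start=0.0, target=0.6, rate=0.2):
    """Iterate r(k+1) = target + exp(-rate) * (r(k) - target) from r(0) = start."""
    decay = math.exp(-rate)
    values = [start]
    for _ in range(steps):
        values.append(target + decay * (values[-1] - target))
    return values

r1 = reference_first_component(50)
```

Under this assumption the component rises monotonically from 0 toward 0.6, and its value after one step equals 0.6(1 - exp(-0.2)), the second entry of the stated initial reference vector.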
In this example, the cost function in (9) is designed with its two scalar parameters set to 0.2 and 1, respectively, and its weighting matrix set to $5I_2$, where $I_2$ is the two-dimensional identity matrix. The two known estimated submodels share the input dynamics in (42) and differ in their drift dynamics through the parameters $b_1 = 1$ and $b_2 = 2$.
The proposed ADP control algorithm is applied to construct the subcontrollers and the main controller. The two-layer critic and actor NNs are designed with five neurons in the hidden layers, and both use the tanh(⋅) activation function. For the subcontrollers, the initial weights of the critic NN are set to random values in [−1, 1], and the initial weights of the actor NN are selected to reflect the initial admissible control for the corresponding submodels.
(1) Single Model ADP. For the nonlinear system with jumping parameters (42), only the first submodel and subcontroller are adopted. Figure 4 shows that the system state tracks the desired trajectory closely in the first stage. However, after the parameter change at the 50th time step, the adopted submodel no longer matches the system, and the controller can no longer track the desired trajectory. In Figure 4, the output state after step 55 is omitted because the state diverges far from the desired trajectory.
(2) Multiple Model ADP. The proposed multiple model ADP strategy is adopted to control the system with jumping parameters. The forgetting factor in the switching index function (39) is set to 0.4.
Figure 5 shows that the model switching process matches the system change. For example, after the parameter changes from 1.1 to 1.95 at the 50th time step, the selected model switches from the first model, with $b_1 = 1$, to the second model, with $b_2 = 2$.
The control results obtained with the proposed multiple model ADP method are shown in Figures 6 and 7. The states deviate from the desired trajectory when the parameters change. However, because the most precise model is selected and the corresponding initial parameters are given to the main controller, the system states track the desired trajectory again after a transient. Figure 8 shows that the control input stays within [−0.4, 0.4].

Conclusion
This paper proposes an ADP-based multiple model adaptive control scheme for nonlinear systems with jumping parameters. System uncertainty is covered by multiple submodels, and corresponding subcontrollers are constructed and run in parallel. A switching mechanism is introduced to decide the most precise model and the corresponding initial parameters, so that the initial admissible condition is satisfied at all times. The proposed method realizes optimal tracking control for nonlinear systems with jumping parameters and greatly improves the transient response and control quality. Because a model switch occurs only after some period of model error accumulation, the transient response may still be unsatisfactory. Future work will focus on designing a scheme that further improves the control quality by introducing an upper limit on the state or by adopting multiple set-points.

Appendix
Proof of Theorem 4. To simplify the proof, we consider the case in which the hidden-layer weights of the two NNs are kept fixed after some tuning time. Define the weight estimation errors of the action and critic networks as $\tilde{W}_a(k) = \hat{W}_a(k) - W_a$ and $\tilde{W}_c(k) = \hat{W}_c(k) - W_c$, respectively. For brevity, a shorthand is used for the activation functions evaluated with the estimated hidden-layer weights. Considering the first difference term, we substitute (7) and (13) and then apply the Cauchy-Schwarz inequality [24]. The learning rates of the two networks are selected to be smaller than 1/7 and 4/9 for nonlinear systems whose optimal closed-loop bound lies strictly between 0 and 1/2. Therefore, according to the Lyapunov extension theorem [25], the system states and the weight estimation errors of the critic and actor NNs are UUB.

Figure 1: Structure of multiple model ADP with accurate submodels.

Figure 2: Structure of multiple model ADP with estimated submodels.