Model-Free Composite Control of Flexible Manipulators Based on Adaptive Dynamic Programming

This paper studies tip position regulation and vibration suppression of flexible manipulators without using a system model. Because of the two-timescale characteristics of flexible manipulators, applying existing model-free control methods may lead to ill-conditioned numerical problems. In this paper, the dynamics of a flexible manipulator are decomposed by singular perturbation (SP) theory into two subsystems that are linear and controllable at different timescales, and a model-free composite controller is designed to alleviate the ill-conditioned numerical problems. To this end, a model-free composite control strategy is constructed that facilitates controller design in the slow and fast timescales. In the slow timescale, the slow subsystem controller is designed by adaptive dynamic programming (ADP) based on measurements of the slow inputs and the position, while the vibration in the slow timescale is estimated by the least squares method. In the fast timescale, the vibration is reconstructed from the measured vibration and its slow-timescale estimate, and the fast controller is then designed using ADP. Stability of the closed-loop system is proved by SP theory. Finally, simulations are given to show the feasibility and effectiveness of the proposed methods.


Introduction
Flexible manipulators are widely applied in aircraft, construction industries, and other areas because of advantages such as fast motion, a higher payload-to-robot weight ratio, lower manufacturing consumption, and a larger workspace [1,2]. Taking into account the physical forces caused by actuation and inertial effects, the motion of flexible manipulators includes macro rigid-body rotation and micro flexible vibration, which are strongly coupled with each other [3]. Flexible manipulators are known to be nonlinear, infinite-dimensional, and uncertain systems [4]. Thus, simultaneously improving the positioning accuracy and suppressing the vibration caused by flexibility is a challenging problem.
Based on the dynamic model of flexible manipulators, researchers have carried out many studies on flexible manipulator control. On the one hand, effective control strategies have been investigated based on the coupled system model, such as traditional PID control [5,6], optimal control [7,8], sliding mode control [9,10], H∞ control [11], robust control [12][13][14], boundary control [15], and neural network control [16,17]. On the other hand, taking the two-timescale characteristics into account, the SP approach has been successfully introduced into the modeling and control of complex flexible manipulator systems [18,19]. In [20], a composite controller based on computed torque control and linear-quadratic control was proposed to satisfactorily suppress joint and link vibration while achieving accurate trajectory tracking. In [21], an adaptive boundary control scheme using hyperbolic functions was developed to suppress the vibration and regulate the tip position. In [22], a dual sliding-mode scheme was employed to track a desired trajectory and stabilize the link vibration. Using output redefinition, a two-performance enhanced controller based on PD control and neural networks was designed for flexible manipulators in [23]. These studies show that, with SP theory, designing controllers for the subsystems separately is more efficient and, with suitable controllers, can achieve higher performance.
Although many results have been achieved in flexible manipulator control, most control strategies are based on the dynamic model. However, flexible manipulators are usually subject to uncertainties, so controlling flexible manipulators using measurements of the inputs and states has become a hot topic. In [24,25], neural networks were designed to approximate system uncertainties. In [26], a nonlinear partial differential equation observer was proposed to estimate the positions and velocities of a flexible pendulum, and a sliding-mode scheme was designed for vibration suppression based on SP theory. However, these studies only consider controller design when the dynamics are partially unknown; model-free composite controller design has not been discussed. In [27], a fuzzy logic controller based on the SP approach was proposed for a single flexible arm: the slow subsystem fuzzy controller achieved trajectory tracking, and the fast subsystem hybrid fuzzy controller was designed to damp out the vibration caused by the elasticity of the structure. However, it is not easy to tune the fuzzy controller parameters to achieve optimal performance. In [20,28,29], optimal control schemes were used to suppress vibration based on the subsystem models. Experimental results showed that these methods performed well, but they required accurate system parameters. Thus, it is of great significance to study model-free optimal control of flexible manipulators.
In recent years, using ADP theory to solve optimal control problems for unknown systems has received much attention [30][31][32]. ADP uses a function approximation structure to obtain an approximate optimal control policy, so the optimal control problem of linear or nonlinear systems can be solved effectively [33]. With ADP, the optimal controller can be designed by solving the algebraic Riccati equation from measurements of the system inputs and states, which greatly simplifies controller design [34]. Based on SP theory [35,36], the flexible manipulator dynamics can be decomposed into slow and fast subsystems that are linear and controllable. Inspired by the two-timescale characteristics of flexible manipulators, we apply ADP theory to solve the optimal control problem of flexible manipulators without using the system model.
In this paper, a model-free composite controller for flexible manipulators is proposed based on ADP. With this method, the dual control objectives of position regulation and vibration suppression are achieved. First, the dynamics of a flexible manipulator are decomposed into two subsystems at different timescales by SP theory. Then, a slow subsystem optimal controller is designed by ADP from the inputs and the position in the slow timescale. At the same time, the vibration in the slow timescale is estimated by the least squares (LS) method, which lays the foundation for the fast subsystem controller design. A fast subsystem optimal controller is then designed by ADP from the fast states in the fast timescale. The contributions of this paper are as follows.
(1) A novel controller for flexible manipulators is proposed based on ADP without using the model, and simulation results show that it achieves better control performance than a fuzzy logic composite controller.
(2) It is proved by SP theory that the closed-loop system is stable under the model-free composite controller.
(3) The proposed composite control structure based on dual ADP lays the theoretical foundation for model-free control of general two-timescale systems.
Section 2 establishes the dynamic model using the Lagrange and assumed-mode methods, decomposes it into two subsystems by SP theory, and formulates the problem under consideration. Section 3 designs a model-free composite controller by ADP. Section 4 presents numerical simulations to verify the effectiveness of the proposed methods. Section 5 concludes the paper.

Problem Description
Figure 1 shows the mechanical structure of the single flexible manipulator system. As shown in Figure 1, $X_0OY_0$ and $X_1OY_1$ denote the inertial frame and the local rotating reference frame, respectively. $u$ is the control input, $m$ is the beam mass, $M$ is the payload mass, and $L$ is the beam length. The variable $\theta$ denotes the rotation angle, and $\omega$ denotes the actual vibration, which is measured by sensor S.
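The dynamic model of the flexible manipulator is established by the Lagrange and assumed-mode methods [18] as follows:

$$\mathbf{M}(\theta,q)\begin{bmatrix}\ddot{\theta}\\ \ddot{q}\end{bmatrix}+\begin{bmatrix}G_\theta(\theta,\dot{\theta},q,\dot{q})\\ G_q(\theta,\dot{\theta},q,\dot{q})\end{bmatrix}+\begin{bmatrix}0\\ Kq\end{bmatrix}=\begin{bmatrix}u\\ 0\end{bmatrix}, \qquad (1)$$

where $\mathbf{M}$ is the positive definite inertia matrix, $G_\theta$ and $G_q$ are the nonlinear terms, $K$ is the stiffness matrix, $u$ is the joint torque vector, $\theta$ is the rotation angle vector, and $q$ is the vector of generalized modal coordinates used to describe the actual vibration measured by sensor S. When the system model is unknown, achieving both position regulation and vibration suppression is a challenging problem.

Define

$$\mathbf{H}=\mathbf{M}^{-1}=\begin{bmatrix}H_{\theta\theta} & H_{\theta q}\\ H_{q\theta} & H_{qq}\end{bmatrix}. \qquad (2)$$

Then, (1) can be written as

$$\ddot{\theta}=-H_{\theta\theta}G_\theta-H_{\theta q}G_q-H_{\theta q}Kq+H_{\theta\theta}u, \qquad (3)$$

$$\ddot{q}=-H_{q\theta}G_\theta-H_{qq}G_q-H_{qq}Kq+H_{q\theta}u. \qquad (4)$$

Let $k$ be the minimum eigenvalue of the stiffness matrix $K$, and define $\varepsilon^2=1/k$, $z=kq$, and $\tilde{K}=K/k$. Then, (3) and (4) can be rewritten as

$$\ddot{\theta}=-H_{\theta\theta}G_\theta-H_{\theta q}G_q-H_{\theta q}\tilde{K}z+H_{\theta\theta}u, \qquad (5)$$

$$\varepsilon^2\ddot{z}=-H_{q\theta}G_\theta-H_{qq}G_q-H_{qq}\tilde{K}z+H_{q\theta}u. \qquad (6)$$

Based on SP theory, (1) can be decomposed into two subsystems at different timescales. Since $\varepsilon$ is sufficiently small [18], setting $\varepsilon=0$ in (5) and (6) gives the state $z$ in the slow timescale:

$$z^s=\tilde{K}^{-1}\bar{H}_{qq}^{-1}\left[\bar{H}_{q\theta}\left(u^s-\bar{G}_\theta\right)-\bar{H}_{qq}\bar{G}_q\right], \qquad (7)$$

where the overbar denotes evaluation in the slow timescale ($q=0$, $\theta=\bar{\theta}$). Substituting (7) into (5), the slow dynamics can be written as

$$\ddot{\bar{\theta}}=\left(\bar{H}_{\theta\theta}-\bar{H}_{\theta q}\bar{H}_{qq}^{-1}\bar{H}_{q\theta}\right)\left(u^s-\bar{G}_\theta\right)=\bar{\mathbf{M}}_{\theta\theta}^{-1}\left(u^s-\bar{G}_\theta\right), \qquad (8)$$

where the superscript "s" denotes the slow dynamics, $u^s$ is the control torque of the slow subsystem (slow controller), and $\bar{\theta}$ is the approximation of $\theta$ in the slow timescale. Considering the two-timescale characteristics of flexible manipulators [35], define $z^f=z-z^s$ and the fast time $\tau=t/\varepsilon$. Setting $\varepsilon=0$ yields

$$\frac{d\bar{\theta}}{d\tau}=0,\qquad \frac{dz^s}{d\tau}=0. \qquad (9)$$

Thus, the slow variables are regarded as constants in the fast timescale. Taking (6), (7), (8), and (9) into account, the fast dynamics can be written as

$$\frac{d^2z^f}{d\tau^2}=-\bar{H}_{qq}\tilde{K}z^f+\bar{H}_{q\theta}u^f, \qquad (10)$$

where $u^f$ is the control torque of the fast subsystem (fast controller). Figure 2 gives the block diagram of the classical composite control. Combining the slow and fast controllers, full control of the flexible manipulator is achieved by the composite controller

$$u=u^s+u^f. \qquad (11)$$

Based on the Tikhonov theorem [37], the subsystems (8) and (10) are related to the full-order system (1) as follows:

$$\theta=\bar{\theta}+O(\varepsilon), \qquad (12)$$

$$z=z^s+z^f+O(\varepsilon), \qquad (13)$$

where $O(\varepsilon)$ denotes terms of order $\varepsilon$. From (13), the flexible mode trajectory consists of $z^s$ and $z^f$: $z^s$ depends on $u^s$ in the slow timescale, whereas $z^f$ depends on $u^f$ as well as on the flexible mode trajectory in the fast timescale. When the parameters of the dynamics are known, $z^s$ can be obtained from (7), and the fast state $z^f=z-z^s$ can then be reconstructed for the fast controller design. Most existing controller design methods for flexible manipulators assume fully known or partially unknown dynamics [24][25][26]; when the dynamic model is completely unknown, these methods cannot be applied. This paper therefore considers the model-free composite control problem for flexible manipulators.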
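To make the decomposition concrete, the sketch below evaluates the partition (2), the scaling $\varepsilon^2=1/k$, the slow input gain in (8), and the fast subsystem (10) written in first-order form with state $[z^f;\ dz^f/d\tau]$. It assumes a hypothetical single-link, two-mode manipulator with a constant inertia matrix; the numbers in `M` and `K` are illustrative only and are not the Table 1 parameters. A constant $\mathbf{M}$ is a simplification: in general $\mathbf{M}$ depends on $\theta$ and $q$ and is evaluated in the slow timescale.

```python
import numpy as np

# Hypothetical single-link, two-mode model with a constant inertia matrix;
# these numbers are illustrative only, NOT the Table 1 parameters.
M = np.array([[2.0, 0.3, 0.1],
              [0.3, 1.0, 0.0],
              [0.1, 0.0, 1.0]])          # rows/cols ordered (theta, q1, q2)
K = np.diag([50.0, 400.0])               # modal stiffness matrix

H = np.linalg.inv(M)                     # H = M^{-1}, partitioned as in (2)
H_tt, H_tq = H[:1, :1], H[:1, 1:]
H_qt, H_qq = H[1:, :1], H[1:, 1:]

k = np.linalg.eigvalsh(K).min()          # minimum stiffness eigenvalue
eps = 1.0 / np.sqrt(k)                   # eps^2 = 1/k
K_tilde = K / k                          # z = k q, so K q = K_tilde z

# Slow input gain in (8): H_tt - H_tq H_qq^{-1} H_qt, which equals M_tt^{-1}
slow_gain = H_tt - H_tq @ np.linalg.solve(H_qq, H_qt)

# Fast subsystem (10) in first-order form with state [z^f, dz^f/dtau]
nq = K.shape[0]
A_f = np.block([[np.zeros((nq, nq)), np.eye(nq)],
                [-H_qq @ K_tilde, np.zeros((nq, nq))]])
B_f = np.vstack([np.zeros((nq, 1)), H_qt])

print("eps =", eps)
print("slow gain =", slow_gain.item(), " 1/M_tt =", 1.0 / M[0, 0])
print("eig(A_f) =", np.linalg.eigvals(A_f))
```

Because $\bar{H}_{qq}$ and $\tilde{K}$ are both symmetric positive definite, the eigenvalues of `A_f` are purely imaginary: without $u^f$ the fast subsystem is an undamped oscillator, which is why a dedicated fast controller is needed for vibration suppression.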

ADP-Based Model-Free Composite Control
In this section, ADP is adopted to design a model-free composite controller for flexible manipulators. In the ADP framework, the controller is designed from measurements of the inputs and states; in practice, the rotation angle and the actual vibration can be measured. However, the states of the slow subsystem (8) and the fast subsystem (10) cannot be measured directly. From (7), (12), and (13), the position information $\theta$ can be used to design an ADP-based slow controller, while the vibration $z^s$ in the slow timescale must be estimated to reconstruct the fast state $z^f$, which is then used to design the ADP-based fast controller. The flowchart is shown in Figure 3.

3.1. ADP-Based Slow Controller Design.
As shown in (8), the slow subsystem represents the rigid-body motion of the flexible manipulator system. From (12), $\bar{\theta}$ can be approximated by the angle $\theta$, which is easy to measure.
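Define the trajectory tracking error as

$$e_c=\bar{\theta}-\theta_d, \qquad (14)$$

where $\theta_d$ is the desired joint angle of the end-effector. Defining the new variables $x^s_1=e_c$ and $x^s_2=\dot{e}_c$, the slow subsystem (8) can be rewritten as

$$\dot{x}^s=A_sx^s+B_su^s, \qquad (15)$$

where $x^s=[x^s_1,\ x^s_2]^T$, and $A_s$ and $B_s$ are the slow subsystem matrices, which are unknown to the designer. Choose the performance index [38]

$$J_s=\int_0^\infty\left(x^{sT}Q_sx^s+u^{sT}R_su^s\right)dt, \qquad (17)$$

where $Q_s\ge0$ and $R_s>0$ are symmetric weighting matrices. Implement the algorithm in Section 3.3 with $x=x^s$, $u=u^s$, and $K_{s0}$ a stabilizing feedback gain matrix for (15). Then, the slow controller is obtained as

$$u^s=-K_s^*x^s. \qquad (18)$$

3.2. Estimation of the Vibration in the Slow Timescale.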
According to (13), the vibration caused by flexibility consists of $z^s$ and $z^f$. To design the fast controller using ADP, $z^s$ must be estimated first. In the slow timescale, the LS method is a suitable way to estimate $z^s$. As shown in (7), $z^s$ depends affinely on $u^s$, so the approximate model structure between $z^s$ and $u^s$ is

$$z^s=au^s+b, \qquad (19)$$

where $a$ and $b$ are the parameters to be estimated.
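Fitting the affine model (19) reduces to the closed-form LS estimates obtained by setting the partial derivatives of the cost $J$ with respect to $a$ and $b$ to zero. The sketch below is a minimal NumPy version for a single vibration mode; the function name `ls_affine_fit` and the synthetic data (true values $a=0.4$, $b=-0.05$, noise level 0.01) are hypothetical, and for a multi-mode $z^s$ the fit would simply be repeated for each mode.

```python
import numpy as np

def ls_affine_fit(u_s, z_s):
    """Closed-form LS estimates (a_hat, b_hat) for z_s ~ a * u_s + b (one mode).

    u_s, z_s: 1-D sequences of N paired samples. The denominator is nonzero
    only if u_s contains at least two distinct values.
    """
    u_s = np.asarray(u_s, dtype=float)
    z_s = np.asarray(z_s, dtype=float)
    N = u_s.size
    su, sz = u_s.sum(), z_s.sum()
    suu, suz = np.dot(u_s, u_s), np.dot(u_s, z_s)
    den = N * suu - su ** 2
    a_hat = (N * suz - su * sz) / den
    b_hat = (suu * sz - su * suz) / den
    return a_hat, b_hat

# Synthetic data from a hypothetical mode with a = 0.4, b = -0.05 plus noise v_i
rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, 200)
z = 0.4 * u - 0.05 + 0.01 * rng.standard_normal(200)
a_hat, b_hat = ls_affine_fit(u, z)
z_s_hat = a_hat * u + b_hat                      # estimate of z^s as in (24)
print(a_hat, b_hat, np.polyfit(u, z, 1))         # polyfit returns [slope, intercept]
```

The cross-check against `np.polyfit` with degree 1 confirms that the hand-derived normal-equation solution is the ordinary LS line fit.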
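Considering the existence of random errors, (19) can be rewritten as

$$z^s_i=au^s_i+b+v_i,\quad i=1,2,\ldots,N, \qquad (20)$$

where $v_i$ represents the random error and $z^s_i$ is the $i$th measured sample. According to the LS method, the cost function $J$ is defined as

$$J=\sum_{i=1}^{N}v_i^2=\sum_{i=1}^{N}\left(z^s_i-au^s_i-b\right)^2. \qquad (21)$$

To minimize $J$, its partial derivatives with respect to $a$ and $b$ are set to zero:

$$\frac{\partial J}{\partial a}=-2\sum_{i=1}^{N}u^s_i\left(z^s_i-au^s_i-b\right)=0,\qquad \frac{\partial J}{\partial b}=-2\sum_{i=1}^{N}\left(z^s_i-au^s_i-b\right)=0. \qquad (22)$$

Solving (22), the estimated values of $a$ and $b$ are

$$\hat{a}=\frac{N\sum_{i=1}^{N}u^s_iz^s_i-\sum_{i=1}^{N}u^s_i\sum_{i=1}^{N}z^s_i}{N\sum_{i=1}^{N}(u^s_i)^2-\left(\sum_{i=1}^{N}u^s_i\right)^2},\qquad \hat{b}=\frac{\sum_{i=1}^{N}(u^s_i)^2\sum_{i=1}^{N}z^s_i-\sum_{i=1}^{N}u^s_i\sum_{i=1}^{N}u^s_iz^s_i}{N\sum_{i=1}^{N}(u^s_i)^2-\left(\sum_{i=1}^{N}u^s_i\right)^2}. \qquad (23)$$

Therefore, the estimate of $z^s$ is

$$\hat{z}^s=\hat{a}u^s+\hat{b}. \qquad (24)$$

Defining $x^f_1=z^f$ and $x^f_2=dz^f/d\tau$, the state-space equation of the fast subsystem (10) can be expressed as

$$\frac{dx^f}{d\tau}=A_fx^f+B_fu^f, \qquad (25)$$

where

$$A_f=\begin{bmatrix}0 & I\\ -\bar{H}_{qq}\tilde{K} & 0\end{bmatrix},\qquad B_f=\begin{bmatrix}0\\ \bar{H}_{q\theta}\end{bmatrix} \qquad (26)$$

are unknown to the designer. By combining (13) and (24), $z^f$, and hence $x^f$, can be obtained as

$$z^f\approx z-\hat{z}^s. \qquad (27)$$

In the fast timescale, choose the performance index [38]

$$J_f=\int_0^\infty\left(x^{fT}Q_fx^f+u^{fT}R_fu^f\right)d\tau, \qquad (28)$$

where $Q_f\ge0$ and $R_f>0$ are symmetric weighting matrices. Implement the algorithm in Section 3.3 with $x=x^f$, $u=u^f$, and $K_{f0}$ a stabilizing feedback gain matrix for (25). Then, the fast controller is obtained as

$$u^f=-K_f^*x^f. \qquad (29)$$

The ADP algorithm of [34], which solves optimal control problems for systems with unknown dynamics, is as follows for a system with state $x\in\mathbb{R}^n$ and input $u\in\mathbb{R}^m$:

(1) Design an initial controller on the time interval $[t_0,t_l]$, where $l$ is a positive integer:

$$u=-K_0x+\kappa,\quad t\in[t_0,t_l], \qquad (30)$$

where $K_0\in\mathbb{R}^{m\times n}$ is any stabilizing feedback gain matrix and $\kappa$ is the exploration noise. Compute $\delta_{xx}$, $I_{xx}$, and $I_{xu}$ until the rank condition

$$\operatorname{rank}\left(\left[I_{xx},\ I_{xu}\right]\right)=\frac{n(n+1)}{2}+mn \qquad (31)$$

is satisfied. In (31), $\delta_{xx}$, $I_{xx}$, and $I_{xu}$ are the matrices used to collect state and input information during learning, defined as

$$\delta_{xx}=\left[x\otimes x\big|_{t_0}^{t_1},\ x\otimes x\big|_{t_1}^{t_2},\ \ldots,\ x\otimes x\big|_{t_{l-1}}^{t_l}\right]^T,$$

$$I_{xx}=\left[\int_{t_0}^{t_1}x\otimes x\,d\tau,\ \int_{t_1}^{t_2}x\otimes x\,d\tau,\ \ldots,\ \int_{t_{l-1}}^{t_l}x\otimes x\,d\tau\right]^T,$$

$$I_{xu}=\left[\int_{t_0}^{t_1}x\otimes u\,d\tau,\ \int_{t_1}^{t_2}x\otimes u\,d\tau,\ \ldots,\ \int_{t_{l-1}}^{t_l}x\otimes u\,d\tau\right]^T, \qquad (32)$$

where $\otimes$ denotes the Kronecker product.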
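(2) Solve $P_k$ and $K_{k+1}$ from

$$\Theta_k\begin{bmatrix}\operatorname{vec}(P_k)\\ \operatorname{vec}(K_{k+1})\end{bmatrix}=\Xi_k, \qquad (33)$$

where $P_k$ is the real symmetric positive definite solution of the Lyapunov equation $(A-BK_k)^TP_k+P_k(A-BK_k)+Q+K_k^TRK_k=0$ at iteration $k$, which converges to the solution of the algebraic Riccati equation, $K_k$ is the feedback gain at iteration $k$, and

$$\Theta_k=\left[\delta_{xx},\ -2I_{xx}\left(I_n\otimes K_k^TR\right)-2I_{xu}\left(I_n\otimes R\right)\right],\qquad \Xi_k=-I_{xx}\operatorname{vec}(Q_k),\qquad Q_k=Q+K_k^TRK_k. \qquad (34)$$

Note that (33) uses only the measured data $\delta_{xx}$, $I_{xx}$, and $I_{xu}$; the matrices $A$ and $B$ appear only in the characterization of $P_k$.

(3) Check the convergence condition

$$\left\|P_k-P_{k-1}\right\|\le\alpha, \qquad (35)$$

where $\alpha$ is a small threshold; if (35) does not hold, let $k\leftarrow k+1$ and return to step (2).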
(4) Letting $K^*=K_k$, the approximate optimal control law is

$$u=-K^*x. \qquad (36)$$

3.4. Composite Controller Design.
As described in (11), the composite controller of the SP system is

$$u=u^s+u^f=-K_s^*x^s-K_f^*x^f, \qquad (37)$$

where $x^s$ and $x^f$ are the slow and fast states defined in (15) and (25).

Theorem 1. Let $(A_s,Q_s^{1/2})$ and $(A_f,Q_f^{1/2})$ be observable, and let $K_{s0}$ and $K_{f0}$ be any stabilizing feedback gain matrices, such that (15) and (25) under these gains are asymptotically stable. Then, for sufficiently small $\varepsilon$, the obtained composite controller (37) stabilizes the whole system.

Proof. Since $(A_s,Q_s^{1/2})$ and $(A_f,Q_f^{1/2})$ are observable and $K_{s0}$ and $K_{f0}$ are stabilizing feedback gain matrices, the obtained $K_s^*$ and $K_f^*$ make the subsystems (15) and (25) asymptotically stable [34,39]. Then, according to SP theory [37], the full-order system (1) is asymptotically stable under the obtained composite controller for sufficiently small $\varepsilon$.
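One evaluation of the composite law (37), including the fast-state reconstruction from (24) and (27), can be sketched as follows. The sketch assumes set-point regulation (constant $\theta_d$, so $\dot{e}_c=\dot{\theta}$), uses the measured $\theta$ in place of $\bar{\theta}$ per (12), and obtains $dz^f/d\tau=\varepsilon\,dz/dt$ from $z^f=z-z^s$, $\tau=t/\varepsilon$, and (9). The helper name `composite_control`, its arguments, and the availability of a measured $dz/dt$ and of $\varepsilon$ are hypothetical, and the numbers in the usage call are chosen only to exercise the code.

```python
import numpy as np

def composite_control(theta, theta_dot, theta_d, z, z_dot,
                      a_hat, b_hat, eps, K_s, K_f):
    """Evaluate u = u^s + u^f at one sampling instant (hypothetical helper).

    K_s: 1-D slow gain of length 2; K_f: 1-D fast gain of length 2*len(z);
    a_hat, b_hat: per-mode LS estimates of the affine model (19).
    """
    # Slow state x^s = [e_c, de_c/dt], using theta in place of theta_bar (12)
    x_s = np.array([theta - theta_d, theta_dot])
    u_s = -np.dot(K_s, x_s)

    # Slow vibration estimate (24) and fast state (27): z^f = z - z^s_hat
    z_s_hat = np.asarray(a_hat) * u_s + np.asarray(b_hat)
    z_f = np.asarray(z) - z_s_hat
    # dz^f/dtau = eps * dz/dt, since dz^s/dtau = 0 in the fast timescale (9)
    x_f = np.concatenate([z_f, eps * np.asarray(z_dot)])
    u_f = -np.dot(K_f, x_f)

    return u_s + u_f          # composite law (11)/(37)

# Hypothetical numbers: two modes, gains chosen only to exercise the code
u = composite_control(theta=0.5, theta_dot=0.1, theta_d=1.0,
                      z=[0.2, 0.1], z_dot=[1.0, 0.0],
                      a_hat=[0.2, 0.1], b_hat=[0.0, 0.0], eps=0.1,
                      K_s=[3.0, 5.0], K_f=[0.0, 0.0, 2.0, 0.0])
print(u)
```

The ordering matters: $u^s$ is computed first because the slow vibration estimate $\hat{z}^s$ in (24) depends on it, and only then can $x^f$ and $u^f$ be formed.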
Figure 4 shows the flowchart of the ADP-based model-free composite control algorithm for flexible manipulators.
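The data-driven iteration in steps (1) to (4) of Section 3.3 can be sketched in Python as below. Everything about the plant is an assumption for illustration: a double integrator with unit input gain stands in for an unknown slow subsystem, and its matrices `A` and `B` are used only to simulate measurement data (fixed-step RK4 plus trapezoidal integration), never by the learner. The weights and initial gain reuse the simulation values ($Q=\mathrm{diag}(1,0.1)$, $R=1$, $K_0=[3\ \ 5]$); the sum-of-sinusoids exploration noise, interval length $T$, number of intervals $l$, and threshold $\alpha$ are hypothetical choices. The learned gain is compared with the gain from SciPy's `solve_continuous_are`, which does use the model.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical plant: used ONLY to generate data, never by the learner below.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
n, m = B.shape
Q = np.diag([1.0, 0.1])
R = np.array([[1.0]])
K0 = np.array([[3.0, 5.0]])              # initial stabilizing gain K_0
OMEGAS = np.array([1.0, 3.0, 7.0, 11.0, 13.0])

def kappa(t):
    """Exploration noise: a sum of sinusoids (assumed choice)."""
    return 0.3 * np.sum(np.sin(OMEGAS * t))

def behavior_input(t, x):
    """Initial controller (30): u = -K_0 x + kappa."""
    return -K0 @ x + kappa(t)

def f(t, x):
    return A @ x + B @ behavior_input(t, x)

# Step (1): collect data on [t_0, t_l] with l intervals of length T.
dt, T, l = 1e-3, 0.1, 50
steps = int(round(T / dt))
t, x = 0.0, np.array([1.0, 0.0])
xs, us = [x], [behavior_input(t, x)]
for _ in range(l * steps):               # fixed-step RK4 simulation
    k1 = f(t, x)
    k2 = f(t + dt / 2, x + dt / 2 * k1)
    k3 = f(t + dt / 2, x + dt / 2 * k2)
    k4 = f(t + dt, x + dt * k3)
    x = x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    t += dt
    xs.append(x)
    us.append(behavior_input(t, x))
xs, us = np.array(xs), np.array(us)

xx = np.einsum("ti,tj->tij", xs, xs).reshape(len(xs), -1)   # rows: kron(x, x)
xu = np.einsum("ti,tj->tij", xs, us).reshape(len(xs), -1)   # rows: kron(x, u)

def interval_integrals(Y):
    """Trapezoidal integral of each column of Y over every [t_j, t_{j+1}]."""
    rows = []
    for j in range(l):
        seg = Y[j * steps:(j + 1) * steps + 1]
        rows.append(dt * (seg.sum(axis=0) - 0.5 * (seg[0] + seg[-1])))
    return np.array(rows)

idx = np.arange(l + 1) * steps
d_xx = xx[idx[1:]] - xx[idx[:-1]]        # delta_xx
I_xx = interval_integrals(xx)
I_xu = interval_integrals(xu)
# Rank condition (31)
assert np.linalg.matrix_rank(np.hstack([I_xx, I_xu])) == n * (n + 1) // 2 + m * n

# Steps (2)-(4): policy iteration that uses only the collected data.
K, P_prev, alpha = K0.copy(), None, 1e-8
for _ in range(50):
    Qk = Q + K.T @ R @ K
    Theta = np.hstack([d_xx,
                       -2 * I_xx @ np.kron(np.eye(n), K.T @ R)
                       - 2 * I_xu @ np.kron(np.eye(n), R)])
    Xi = -I_xx @ Qk.reshape(-1, order="F")               # -I_xx vec(Q_k)
    sol = np.linalg.lstsq(Theta, Xi, rcond=None)[0]
    P = sol[:n * n].reshape(n, n, order="F")
    P = 0.5 * (P + P.T)                                   # symmetric part of vec(P_k)
    K = sol[n * n:].reshape(m, n, order="F")              # K_{k+1}
    if P_prev is not None and np.linalg.norm(P - P_prev) <= alpha:   # (35)
        break
    P_prev = P

K_star = K
K_are = np.linalg.solve(R, B.T @ solve_continuous_are(A, B, Q, R))
print("learned K*:", K_star.ravel())
print("ARE gain  :", K_are.ravel())
```

For this plant the Riccati gain is $[1\ \ \sqrt{2.1}]\approx[1\ \ 1.449]$, and the learned gain should agree with it up to the accuracy of the numerical integration. One batch of data is reused in every iteration, so the behavior input (30) never has to be changed during learning.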

Simulation and Analysis
To verify the effectiveness of the proposed method, simulation results for an aluminum-alloy flexible manipulator are given. The parameters of the flexible manipulator are listed in Table 1.
In the framework of ADP, the proposed method designs a model-free composite controller for the flexible manipulator from measurements of the inputs and states, without relying on the system parameters. According to SP theory, the nonlinear system can be decomposed into two subsystems describing the rigid and the flexible motion of the manipulator, respectively. For the slow subsystem, $\bar{\theta}\approx\theta$ by (12), so the measured angle can be used directly to design the optimal controller with the ADP algorithm of Section 3.3, where the initial stabilizing feedback gain is chosen as $K_{s0}=[3\ \ 5]$ and the weighting matrices are set as $Q_s=\operatorname{diag}(1,0.1)$ and $R_s=I$. After a finite number of iterations, the final feedback gain matrix $K_s^*$ is obtained. For comparison, the optimal gain $K_{sd}=R^{-1}B^TP$ is computed by directly solving the algebraic Riccati equation $A^TP+PA-PBR^{-1}B^TP+Q=0$ with $A=A_s$, $B=B_s$, $Q=Q_s$, and $R=R_s$. $K_s^*$ is approximately equal to $K_{sd}$; that is, the slow controller can be obtained by ADP using only the inputs and states of the system. Figure 5 shows the convergence of $K_{s,k}$ to the optimal value $K_{sd}$: the feedback gain converges to the optimal value after four iterations.
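The convergence of $K_{s,k}$ can be related to Kleinman's model-based policy iteration: when the data are exact and (31) holds, each data-driven step (33) has the same solution as the corresponding Kleinman step, which solves a Lyapunov equation using $(A,B)$, so both produce the same gain sequence. The sketch below runs that model-based iteration from $K_{s0}=[3\ \ 5]$ with $Q_s=\mathrm{diag}(1,0.1)$ and $R_s=1$ on a hypothetical slow subsystem (a double integrator with unit input gain, not the Table 1 manipulator) and compares the result with the direct ARE solution, the analogue of $K_{sd}$.

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

# Hypothetical slow subsystem (NOT the Table 1 manipulator): double integrator.
A_s = np.array([[0.0, 1.0], [0.0, 0.0]])
B_s = np.array([[0.0], [1.0]])
Q_s = np.diag([1.0, 0.1])
R_s = np.array([[1.0]])
K = np.array([[3.0, 5.0]])                       # K_s0 from the simulation setup

# K_s0 must be stabilizing for the iteration to be well defined.
assert np.all(np.linalg.eigvals(A_s - B_s @ K).real < 0)

# Direct ARE solution: the analogue of K_sd.
K_sd = np.linalg.solve(R_s, B_s.T @ solve_continuous_are(A_s, B_s, Q_s, R_s))

history = [K]
for _ in range(8):
    A_k = A_s - B_s @ K
    # Lyapunov step: A_k^T P + P A_k = -(Q_s + K^T R_s K)
    P = solve_continuous_lyapunov(A_k.T, -(Q_s + K.T @ R_s @ K))
    K = np.linalg.solve(R_s, B_s.T @ P)          # K_{k+1} = R^{-1} B^T P_k
    history.append(K)

for i, K_i in enumerate(history):
    print(i, K_i.ravel(), np.linalg.norm(K_i - K_sd))
```

The iteration count needed here says nothing about the manipulator itself; the four and five iterations reported in this section refer to the Table 1 system.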
For the fast subsystem controller design, the estimate $\hat{z}^s$ is first obtained by the LS method. Then, according to (27), the fast state variables $z^f$ are obtained and used to design the fast subsystem controller with the algorithm of Section 3.3. The initial feedback gain is chosen as $K_{f0}=[-10\ \ 12\ \ 1\ \ 1]$, and the weighting matrices are set as $Q_f=\operatorname{diag}(1,0.1,1,0.1)$ and $R_f=I$. After learning, the final feedback gain matrix $K_f^*$ is obtained, and the optimal gain $K_{fd}$ is computed by directly solving the algebraic Riccati equation with $A=A_f$, $B=B_f$, $Q=Q_f$, and $R=R_f$. $K_f^*$ is approximately equal to $K_{fd}$. Figure 6 shows the convergence of $K_{f,k}$ to the optimal value $K_{fd}$: after five iterations, $K_{f,k}$ converges to the optimal value. Figure 7 shows the control input under the ADP-based composite controller.
To verify the performance of the model-free composite controller designed in this paper, comparative simulation results between the ADP-based composite controller and the fuzzy logic composite controller of [27] are given.
Figure 8 shows the trajectory of the flexible manipulator from 0 to 1 rad. As shown in Figure 8, the system reaches steady state after about 5 s under the ADP-based composite controller, whereas the fuzzy logic composite controller takes about 12 s. Under the ADP-based composite controller, the flexible manipulator reaches the desired position quickly and accurately.

Complexity
The responses of the first two modes of the flexible manipulator are shown in Figures 9 and 10, respectively. They show that the controller designed in this paper suppresses vibration better than the fuzzy logic composite controller.

Conclusion
This paper has proposed a novel composite controller for flexible manipulators with completely unknown dynamics. By SP theory, the dynamics are decomposed into two linear and controllable subsystems. In the slow timescale, the vibration is estimated by the LS method, while the slow subsystem controller is designed by ADP from measurements of the inputs and slow states. In the fast timescale, the fast states are reconstructed from the measured vibration and its estimate in the slow timescale, and the fast subsystem controller is then designed by ADP. Together, these form a model-free composite controller based on ADP that achieves tip position regulation and vibration suppression. Compared with existing methods, the proposed design approach is model-free and guarantees the stability of the closed-loop system, and the dual-ADP structure provides an example of model-free control design for general two-timescale systems.

Figure 1: The diagram of the single flexible manipulator system.

Figure 4: Model-free composite control algorithm flowchart of flexible manipulators based on ADP.

Table 1: Parameters of the flexible manipulator.

Figure 5: Convergence of $K_{s,k}$ to its optimal value $K_{sd}$ during the learning process.

Figure 7: Control input under the ADP-based composite controller.

Figure 8: Position trajectory under the ADP-based and fuzzy logic composite controllers.

Figure 9: The first mode of the flexible manipulator under the ADP-based and fuzzy logic composite controllers.

Figure 10: The second mode of the flexible manipulator under the ADP-based and fuzzy logic composite controllers.