Adaptive Inverse Optimal Control for Rehabilitation Robot Systems Using Actor-Critic Algorithm

The overarching goal of a rehabilitation robot is to aid a person in achieving a desired functional task (e.g., tracking a trajectory) based on the assisted-as-needed principle. To this end, a new adaptive inverse optimal hybrid control (AHC) scheme combining inverse optimal control and actor-critic learning is proposed. Specifically, an uncertain nonlinear rehabilitation robot model that includes human motor behavior dynamics is first developed. Based on this model, an open-loop error system is formed; an inverse optimal control input is then designed to minimize a cost functional, while an NN-based actor-critic feedforward signal compensates for the nonlinear dynamic part contaminated by uncertainties. Finally, the AHC controller is proven, through a Lyapunov-based stability analysis, to yield a uniformly ultimately bounded stability result, and the resulting cost functional is shown to be meaningful. Simulations and experiments on a rehabilitation robot demonstrate the effectiveness of the proposed control scheme.


Introduction
Rehabilitation robots are becoming increasingly common in upper extremity rehabilitation [1], because they can not only provide intensive rehabilitation consistently for long durations but also offer objective assessment of a patient's motor performance. Relevant studies show that they contribute significantly to motor recovery [2][3][4]. However, most existing rehabilitation robots (RRs) still suffer from a number of challenges. Notably, a truly autonomous rehabilitation robot has yet to be developed, owing to the lack of an advanced, intelligent controller. To meet this requirement, various control strategies have been proposed, such as impedance control [5], admittance control [6], and PID control [7]. A common feature of most RR control algorithms applied to robot-aided therapy is that they allow a variable deviation from a predefined trajectory rather than imposing a rigid training pattern [8]. Relevant studies demonstrate that these RR controllers are effective to some extent [9,10]. However, their control parameters are, in essence, predefined by humans. From a practical viewpoint, fixed, predefined control parameters often lead to either an overdetermined or an underdetermined control performance. Furthermore, a predefined control pattern does not suit every person. Thus, how to provide controllable assistance tailored to a particular patient is a problem worth studying.
One possible solution to this problem is adaptive functional training. The basic idea of adaptive functional training is to allow modification of the controller parameters or task difficulty levels according to the patient's motor performance, rather than imposing a rigid predefined training pattern. Typical examples include adaptive admittance control [11], adaptive impedance control [12,13], adaptive task-oriented therapy [13][14][15], adaptive mixed-reality rehabilitation training [16,17], and adaptive sliding mode control [18]. Relevant works show that these approaches can adjust their controller parameters or task difficulty levels according to the patient's motor performance or diagnosis and can provide better control performance. However, their resulting impedance parameters (e.g., mass, damping, and stiffness) are still obtained in a heuristic way and cannot easily be extended to other applications. Recently, combinations of neural network- (NN-) based intelligent control and adaptive functional training have been proposed for RRs, such as NN-based adaptive sliding mode control [19], neural-fuzzy robust adaptive control [20], and neural-fuzzy control [21]. Relevant studies show that these NN-based intelligent RR controllers can provide better performance than the traditional adaptive controllers mentioned above. Unfortunately, their results are limited to uniformly ultimately bounded (UUB) stability because of the inevitable NN reconstruction error [22]. Crucially, most of these existing intelligent RR controllers are in essence adaptive but not optimal for rehabilitation training. In practical applications, however, an optimal RR controller can benefit the patient's rehabilitation training more than traditional adaptive RR controllers, owing to the trade-off between control torques and tracking accuracy [23].
Reinforcement learning (RL) has the potential to overcome the above-mentioned problems [24][25][26][27][28]. Actor-critic- (AC-) based controllers have, for example, been used for controlling a three-finger gripper [29] and for finding suitable compliance for different contact tasks [30]. Recently, AC-based controller design has been extensively investigated for functional electrical stimulation (FES) [31] and myoelectric devices [32]. Thomas et al. [31] showed the feasibility of using the AC for FES control of the upper extremities. Similarly, Pilarski et al. [32] proposed an AC-based reinforcement learning method for optimizing the control of myoelectric devices. Despite such encouraging results, the AC-based control approach has so far found relatively limited application in real-time robotic rehabilitation systems, because rehabilitation actions are often continuous in the robotic rehabilitation process, whereas most existing AC-based controllers concern discrete actions; unfortunately, discrete AC controllers cannot ensure effectiveness in continuous-time systems. Moreover, during the rehabilitation training process, since the exogenous reference inputs for many online applications are event-driven and not known a priori, it is often impossible to monitor online whether a signal satisfies the persistence of excitation (PE) condition, whereas PE is vital for AC learning [33]. In this case, the AC-based controller is not able to ensure convergence in finite time. Recently, a continuous adaptive critic control [34] has addressed this problem: an NN-based critic is used to approximate a long-term cost function, and an NN-based actor approximates the optimal control. Simulation results demonstrate the feasibility of that AC-based approach. However, a limitation of the method is that it requires knowledge of the Hamilton-Jacobi-Bellman (HJB) equation; in practice, solving the HJB equation is a difficult and time-consuming task in a model-free, online way [35].
To overcome the above-mentioned difficulties and to develop an AC-based controller for highly nonlinear RRs, a new adaptive inverse optimal hybrid control (AHC) combining inverse optimal control and AC learning is proposed in this paper. The inverse optimal controller guarantees the RR's optimality by minimizing a meaningful cost functional, so the effort of solving the HJB equation is avoided entirely, while the AC-based learning method helps the robot shape this meaningful cost functional despite the uncertainties in the RR. In addition, since understanding the human motor behavior is essential to assisted-as-needed rehabilitation training, an uncertain nonlinear rehabilitation robot model that includes human motor behavior dynamics is first developed. Based on this model, an open-loop error system is formed; an inverse optimal feedback control input is then designed to handle the nonlinear static part and minimize the cost functional, while an NN-based AC feedforward control handles the nonlinear dynamic part contaminated by uncertainties. Finally, a Lyapunov-based stability analysis proves the stability of the proposed AHC and establishes the corresponding meaningful cost functional. Simulations and experiments on a 2-degree-of-freedom rehabilitation robot indicate the feasibility and effectiveness of the proposed AHC.
Based on the previous discussion, we highlight the contributions of this paper as follows.
(1) An uncertain nonlinear rehabilitation robot model is developed that includes individual motor behavior dynamics, so that the rehabilitation robot can capture sufficient information about the real individual's motor behavior.
(2) A data-driven NN is employed to cope with the unknown rehabilitation robot dynamics, so that neither the robot structure nor its physical parameters are required in the control design.
(3) AHC is adopted so that the robot can achieve a functional task (e.g., trajectory tracking) in an optimal way.
The rest of the paper is organized as follows. Section 2 describes the rehabilitation robot system under study and discusses the control objective. Section 3 presents the details of the proposed AHC control, followed by a rigorous analysis. Section 4 verifies the validity of the proposed method through simulation and experimental studies. Concluding remarks are given in Section 5.

Problem Statement
2.1. System Description. In this paper, we investigate a typical robot-assisted system, which includes a human limb and a rehabilitation robot with an end-effector, as shown in Figure 1. Without loss of generality, and similar to [36], the following assumption is made.

Assumption 1. The handle (C) of the rehabilitation robot is tightly grasped by the patient, and there is no relative motion between the patient's hand and the handle (i.e., the end-effector). Under this assumption, the patient's hand is regarded as "a part" of the rehabilitation robot.
The rehabilitation robot in task space can be modeled as

M_R(x)ẍ + C_R(x, ẋ)ẋ + G_R(x) + d = u + f_h, (1)

where x, ẋ, ẍ ∈ R^n are the position, velocity, and acceleration of the end-effector, respectively; M_R(x) ∈ R^{n×n} is the positive-definite inertia matrix of the robot at the end-effector; C_R(x, ẋ) ∈ R^{n×n} denotes the centripetal and Coriolis matrix at the end-effector; G_R(x) ∈ R^n is the stiffness term; d ∈ R^n is an unknown bounded disturbance; f_h ∈ R^n is the user's force; and u ∈ R^n is the control input vector. In many cases, f_h in (1) is considered only as a measurable load. However, the patient's motion behavior is typically a time-varying trajectory that cannot be represented by force states alone; thus, such purely load-based control is limited by its poor robustness in practice [37].
To solve this problem, it is necessary to integrate the patient's motor behavior dynamics into system (1). The user's motor dynamics are modeled as [38]

M_h ẍ + B_h ẋ + K_h x + d_1 = f_h* − f_h, (2)

where M_h, B_h, and K_h are the mass, damping, and stiffness matrices of the arm at the end-effector, respectively; f_h is the user's hand force; f_h* is the force planned in the patient's CNS; and d_1 is an unknown bounded disturbance (e.g., due to muscle spasticity). Note that, by using (2), the rehabilitation robot is able to capture the dynamic characteristics of the patient's motion behavior.
Substituting (2) into (1), the total rehabilitation robot dynamics can be rewritten as

M(x)ẍ + C(x, ẋ)ẋ + G(x) + d̄ = u + f_h*, (3)

where M, C, and G denote the combined patient-robot positive-definite inertia matrix, centripetal and Coriolis matrix, and stiffness term at the handle, respectively, and satisfy

M = M_R + M_h, C = C_R + B_h, G(x) = G_R(x) + K_h x, d̄ = d + d_1. (4)

Remark 2. An uncertain nonlinear rehabilitation robot model (3) has been developed that includes the human motor behavior dynamics (2). The objective of the developed model is not to replicate human behavior but only to capture enough information to compensate for the drawbacks inherent in most prior rehabilitation robot dynamics [39,40]. By recognizing the patient's motor behavior, RRs can adjust the robotic assistance according to the patient's performance on the task, following the assisted-as-needed principle [39].
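As a minimal numerical sketch of model (3), the following simulates a scalar version of the combined dynamics under simple PD feedback; all parameter values (M, C, G, the gains) are illustrative assumptions of our own, and the disturbance d̄ is omitted for clarity.

```python
import numpy as np

# Scalar sketch of the combined patient-robot model (3):
#   M*x'' + C*x' + G*x = u + f_star   (disturbance omitted for clarity)
# M, C, G and the PD gains below are illustrative assumptions.
M, C, G = 2.0, 0.5, 1.0
f_star = 0.0                      # planned force from the patient's CNS

def step(x, xd, u, dt=1e-3):
    """One explicit-Euler step of the combined dynamics."""
    xdd = (u + f_star - C * xd - G * x) / M
    return x + dt * xd, xd + dt * xdd

# PD regulation toward the origin as a sanity check
x, xd = 1.0, 0.0
for _ in range(20000):            # 20 s of simulated time
    u = -20.0 * x - 10.0 * xd
    x, xd = step(x, xd, u)

print(abs(x))  # position settles near zero
```

With these gains the closed loop is well damped, so the end-effector position converges to the set point despite the coupled stiffness term.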
Without loss of generality, the following properties are given.
Property 1. The inertia matrix M(x) is symmetric and positive definite and satisfies

m‖ξ‖² ≤ ξᵀM(x)ξ ≤ m̄(x)‖ξ‖², ∀ξ ∈ R^n, (5)

where m is a known positive constant, m̄(x) is a known positive function, and ‖·‖ denotes the standard Euclidean norm.
Property 2. The following skew-symmetric relationship is satisfied:

ξᵀ(Ṁ(x) − 2C(x, ẋ))ξ = 0, ∀ξ ∈ R^n. (6)

Property 3. The nonlinear disturbance term and its first and second time derivatives are bounded by known constants.
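Property 2 can be verified numerically. The sketch below uses the standard closed-form Ṁ and C of a planar two-link manipulator (an illustrative assumption; the combined parameter b = m₂l₁l_c₂ is given an arbitrary value) and checks that ξᵀ(Ṁ − 2C)ξ vanishes.

```python
import numpy as np

# Numerical check of Property 2 for a planar two-link arm (illustrative):
# N = Mdot - 2C is skew-symmetric, hence xi^T N xi = 0 for any xi.
b = 0.5                 # combined parameter m2*l1*lc2 (assumed value)
q2, dq1, dq2 = 0.7, 0.3, -0.4
s2 = np.sin(q2)

Mdot = np.array([[-2 * b * s2 * dq2, -b * s2 * dq2],
                 [-b * s2 * dq2,      0.0         ]])
C    = np.array([[-b * s2 * dq2,     -b * s2 * (dq1 + dq2)],
                 [ b * s2 * dq1,      0.0                 ]])

N = Mdot - 2 * C
xi = np.array([1.3, -2.1])
print(xi @ N @ xi)      # vanishes (up to rounding) for any xi
```

The off-diagonal entries of N are exact negatives of each other, which is what the Lyapunov analysis later exploits to cancel the Coriolis contribution.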
Property 4. The desired trajectory x_d(t) is designed such that ẋ_d(t) and ẍ_d(t) exist and are bounded.
Property 5 (see [41]). A continuous nonlinear function f(x) can be approximated by an NN as

f(x) = Wᵀσ(x) + ε(x), (7)

where W represents the ideal weight vector, σ(·) denotes the NN activation function, and ε(x) is the approximation error satisfying ‖ε(x)‖ ≤ ε_N, with ε_N the approximation error bound. The activation function and the weight vector are upper bounded by positive constants such that ‖σ(x)‖ ≤ σ_N and ‖W‖ ≤ W_N. Note that Property 5 is easily satisfied, because any C¹ function can be expressed in the form (7).
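Property 5 can be illustrated with a small experiment: fix random sigmoid hidden units and fit the output weights W by least squares so that Wᵀσ(x) approximates a continuous function; the residual then plays the role of ε(x). The target function and all sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target continuous function to approximate (illustrative choice)
f = lambda x: np.sin(x) + 0.5 * x

# Fixed random sigmoid hidden layer; least-squares output weights W,
# so that f(x) ~ W^T sigma(x) + eps(x) as in Property 5.
n_hidden = 25
a = rng.uniform(-2, 2, n_hidden)   # hidden input weights (randomly fixed)
b = rng.uniform(-3, 3, n_hidden)   # hidden biases
sigma = lambda x: 1.0 / (1.0 + np.exp(-(np.outer(x, a) + b)))

x_train = np.linspace(-3, 3, 200)
Phi = sigma(x_train)               # 200 x n_hidden design matrix
W, *_ = np.linalg.lstsq(Phi, f(x_train), rcond=None)

x_test = np.linspace(-3, 3, 50)
err = np.max(np.abs(sigma(x_test) @ W - f(x_test)))
print(err)                         # reconstruction error stays small
```

The maximum residual corresponds to the bound ε_N in (7); adding hidden units shrinks it further.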
Remark 3. The above properties are fundamental to this paper, and their validity will be shown later.

Control Objective.
The objective of a rehabilitation robot is to enable a patient to achieve a functional task (e.g., trajectory tracking) based on the assisted-as-needed principle. Toward this goal, the current control objective is to find a control input that enables a patient to track a desired trajectory x_d(t), despite uncertainties in the dynamic model, while also minimizing a performance index that penalizes both the tracking error and the control effort. To quantify this objective, a position tracking error e_1(t) ∈ R^n is defined as

e_1 = x_d − x. (8)

Further, to facilitate the subsequent closed-loop error system development and stability analysis, a filtered tracking error is defined as

r = ė_1 + αe_1, (9)

where α denotes a positive constant, which behaves like a control gain in a PID controller.
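In code, the error definitions (8)-(9) reduce to a two-line computation (scalar case shown, with an illustrative gain α):

```python
alpha = 5.0  # filtered-error gain (illustrative)

def tracking_errors(x, x_dot, x_d, x_d_dot):
    """Position error e1 = x_d - x and filtered error r = e1' + alpha*e1."""
    e1 = x_d - x
    e1_dot = x_d_dot - x_dot
    return e1, e1_dot + alpha * e1

e1, r = tracking_errors(x=0.8, x_dot=0.1, x_d=1.0, x_d_dot=0.0)
print(e1, r)  # 0.2 and 0.9, since r = -0.1 + 5*0.2
```

Driving the single signal r to zero forces both e_1 and ė_1 to zero, which is why the stability analysis works with r instead of the pair.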

Adaptive Inverse Optimal Hybrid Control Design
In this section, a new AC-based adaptive inverse optimal hybrid control (i.e., AHC) for the rehabilitation robot system is presented. First, a state-space model is developed based on the tracking errors in (8) and (9). Then, based on this model, direct feedforward terms (i.e., an identifier NN and an action NN) together with an inverse optimal PD feedback controller are given, and subsequently a critic NN is proposed; finally, Lyapunov-based stability and optimality analyses are performed to show that the AHC yields asymptotic optimal tracking.

Actor NN-Based Control.
Using the tracking errors in (8) and (9) and the system dynamics (3), an open-loop error system for (1) can be obtained as (10). To facilitate the subsequent control design, (10) is rewritten as (11), where auxiliary terms f_1, f_2, and f_3 are defined in (12). Note that f_1 and f_2 depend on the position and the desired trajectory and contain the same system parameters (as seen from the target impedance), whereas f_3 is an error function. The motivation for this segregation is to facilitate the subsequent control design and stability analysis. Nevertheless, since all of these terms contain unknown parameters, the controller u in (10) cannot be implemented directly. If, however, the proposed control input can identify and/or cancel these effects, then the optimal control law can be realized. To this end, two typical NN-based approximators for f_1 and f_2 are defined:

f_1 = W_1ᵀσ(z) + ε_1(z), f_2 = W_2ᵀσ(z) + ε_2(z), (13)

where W_1 and W_2 are the ideal weight matrices, σ(·) denotes the NN activation function, ε_1(z) and ε_2(z) are bounded reconstruction errors, and z is the input vector to the NN, defined in (14). The estimates of f_1 and f_2 are designed as

f̂_1 = Ŵ_1ᵀσ(z), f̂_2 = Ŵ_2ᵀσ(z), (15)

where Ŵ_1 and Ŵ_2 are the estimates of the ideal weights. Based on the open-loop error system in (10), the control input is defined as

u = f̂_1 + f̂_2 + u_01, (16)

where u_01 is the feedback control law to be developed, later shown to minimize a meaningful cost, and the estimates of the NN weights in (16) are generated online by the adaptive update laws (17)-(18), in which Υ_1 and Υ_2 are constant, positive-definite, symmetric gain matrices, and φ, Ŵ_c, V̂_c, and R are the input to the critic NN, its weight estimates, and the reinforcement signal, respectively. Note that the weight update laws (17)-(18) consist of a gradient-based term and a reinforcement-signal term R; this allows the subsequent critic NN to affect the behavior of the action and identifier NNs.
As described earlier, the convergence of AC learning cannot be ensured in finite time owing to the lack of PE; that is, weight parameter convergence is not guaranteed. To overcome this difficulty and ensure weight parameter convergence in (17)-(18), a virtual control input carrying a known probing signal (i.e., providing PE) [42,43] is injected into the control design (16); this virtual control input is defined in (19), where k_v ∈ R denotes a positive, adjustable control gain and δ is any given nonzero measurable signal that is known exactly a priori and bounded by ‖δ‖ ≤ δ̄. The control input with the PE term is then defined in (20). To facilitate the analysis, an auxiliary function is given in (21). Substituting (20) and (21) into (11), the closed-loop error system for (1) is obtained in (22), with the weight mismatch errors defined in (23). For simplicity, (22) is expressed as (24), with two auxiliary terms defined in (25). According to Properties 1 and 4 and using the mean value theorem, the upper bound in (26) can be developed, with the bounding function defined in (27). Since the two auxiliary terms contain the same parameters, to facilitate the subsequent stability analysis, the corresponding term in (24) can be bounded as in (28), where the bounding constants are known.
Based on (10) and the subsequent stability analysis, an inverse optimal PD feedback control input u_01 is designed as in (29).

Remark 4. The terms Ŵ_1ᵀσ and Ŵ_2ᵀσ in (16) constitute the actor NN in practice. The goal of the actor NN is to generate appropriate control signals and approximate the uncertainties in the system, whereas u_01 in (29) is used to stabilize the system. In contrast to typical controllers, for example, NN + PD control [44] and AC control [45], the proposed AHC can not only ensure system stability but also improve control performance through flexible gains. That is, by combining the inverse optimal policy with the AC direct optimal algorithm, the rehabilitation robot can achieve the desired trajectory tracking while minimizing a cost functional, despite the uncertainties and nonlinearities in the dynamic system.
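The role of the actor-NN feedforward can be illustrated with a toy sketch of our own construction (it does not reproduce the paper's exact update laws (17)-(18)): a gradient-based weight update driven by the tracking error, with a small probing signal injected for PE, learns to cancel an unknown nonlinearity g(x) in a scalar plant ẋ = g(x) + u.

```python
import numpy as np

# Toy sketch (our own construction, not the paper's exact laws): an actor-NN
# feedforward W_hat^T sigma(x) learns to cancel an unknown nonlinearity g(x)
# in  x' = g(x) + u, using a gradient-based weight update driven by the
# tracking error plus a small injected probing (PE) signal.
g = lambda x: 2.0 * np.sin(x)                  # "unknown" dynamics

centers = np.linspace(-2.0, 2.0, 15)
sigma = lambda x: np.exp(-(x - centers) ** 2)  # RBF activation vector

W_hat = np.zeros(15)
Gamma, k, dt = 5.0, 4.0, 1e-3                  # adaptation gain, feedback gain
x, x_d = 0.0, 1.0                              # regulate toward x_d

for i in range(50000):
    e = x_d - x                                # tracking error
    u = k * e + W_hat @ sigma(x) + 0.1 * np.sin(0.01 * i)  # PE probing term
    W_hat += dt * Gamma * sigma(x) * e         # gradient-based weight update
    x += dt * (g(x) + u)                       # explicit-Euler plant step

print(abs(x_d - x))  # small residual error set by the probing amplitude
```

The feedback term stabilizes the loop while the feedforward weights absorb g(x); the probing term keeps the regressor exciting, at the cost of a small residual error, mirroring the PE trade-off discussed above.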

Critic NN.
In this section, a critic NN is employed to approximate the long-term cost function. Similar to [46,47], the reinforcement signal R(t) ∈ R is defined as in (30), where Ŵ_c and V̂_c are the estimates of the ideal critic weight matrices, σ_c(·) denotes the activation function, φ is the input to the critic NN, and P is an adaptive term (auxiliary critic signal) with a known probing noise (the noise is used to ensure the PE condition) that aids the subsequent stability analysis; its time derivative is given in (31), with the quantities defined as before. Based on the subsequent stability analysis, the weight update laws for the critic NN are readily obtained in (32), where Υ_c and Υ_v are positive, known constants.
Taking the time derivative of (30) yields (33). To facilitate the subsequent stability analysis, (33) is rewritten as (34), with the auxiliary terms given in (35). Using Properties 1-5 and the mean value theorem, the upper bound (36) can be developed.

Remark 5. The critic network monitors the state of the rehabilitation robot system and affects the behavior of the actor (i.e., the action NN and the identifier NN). In addition, since the additional term in (29) changes the time derivative of R in (34), increasing it can speed up the convergence of the critic; consequently, the proposed AHC converges faster than the AC scheme in [45]. Furthermore, provided that the critic NN converges correctly and R can be adjusted by the actor NN, the proposed AHC provides globally stable and optimal control [48].
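The critic's job, approximating a long-term cost whose temporal-difference error serves as the reinforcement signal R, can be illustrated by a minimal TD(0) sketch on a toy two-state chain; the transition matrix, costs, and rates below are assumptions unrelated to the robot model.

```python
import numpy as np

# Minimal TD(0) critic sketch: a linear critic V(s) = w^T phi(s) learns the
# discounted long-term cost of a two-state chain; the TD error plays the
# role of the reinforcement signal R fed back to the actor.
phi = np.eye(2)                     # one-hot features for states 0 and 1
w = np.zeros(2)                     # critic weight estimates
P = np.array([[0.9, 0.1],           # transition probabilities (assumed)
              [0.2, 0.8]])
cost = np.array([1.0, 0.0])         # instantaneous cost per state
gamma_d, lr = 0.9, 0.05             # discount factor, learning rate

rng = np.random.default_rng(0)
s = 0
for _ in range(20000):
    s_next = rng.choice(2, p=P[s])
    td = cost[s] + gamma_d * (w @ phi[s_next]) - w @ phi[s]  # TD error
    w += lr * td * phi[s]
    s = s_next

print(w)  # approaches the true discounted cost of each state
```

The learned weights rank the states by their long-term cost, which is exactly the information a critic must supply so that the actor update favors low-cost behavior.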

Stability Analysis.
The stability of the proposed AHC given in (20) and (29) can be examined through Theorem 10.
First, some assumptions are given.

Theorem 10. For the rehabilitation robot system (1)-(3), let the identification scheme in (15), together with the weight update law in (17), be used to identify the human motor behavior dynamics in (2), and let the actor-critic controller given in (15) and (30), together with the weight update laws for the action and critic NNs given in (18) and (32) and the inverse optimal control u_01 in (29), be applied. Then all closed-loop system signals are bounded, and the position tracking error is regulated in the sense of (39), provided that the gains introduced in (29) and α in (9) are selected according to the sufficient condition in (40), and the remaining gains are chosen to satisfy the sufficient condition in (41).

Proof. Consider the positive definite Lyapunov function candidate defined in (42), whose components are defined in (43). Using Property 1 and Assumptions 6-8, V(t) can be bounded as in (44), where λ_1, λ_2, λ_3 ∈ R are known positive constants and the composite state z(t) is defined in (45). Taking the time derivative of (42) yields (46). Using (17)-(18), (32), and (29), the expression in (46) can be bounded as in (47). Using (24)-(25) and (8)-(9) and canceling similar terms, the expression in (47) can be bounded as in (48), with an auxiliary term defined in (49). Using inequality (36) and Young's inequality, this auxiliary function can be rewritten as in (50). Furthermore, using (25)-(26), (8)-(9), (19), and (38) and canceling similar terms, the expression in (48) can be bounded as in (51). Based on the above analysis and using (26)-(28), the expression in (51) can be rewritten as (52). Provided that the conditions in (41) are satisfied, (52) simplifies to (53). Assuming additionally that (54) holds, (53) can be rewritten as (55). To facilitate the subsequent analysis, using inequality (44), (55) is further simplified to (56), where the positive constant appearing in (56) is defined in (57). Furthermore, using inequalities (44) and (56) and assuming λ_1 = λ_2, inequality (56) can be rewritten as (58), where Ω is defined as before. Provided that the sufficient conditions in (40)-(41) are satisfied, (42) and (58) can be used to prove that the control input and all closed-loop signals are bounded in Ω (59). From (56)-(57) and (59), it is observed that a larger λ_1 expands the size of the domain Ω and reduces the residual error. From inequality (58), the result in (39) follows.
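The structure of the conclusion, a Lyapunov derivative bounded as V̇ ≤ −λV + c implying convergence of V to a residual ball of radius c/λ (the UUB result), can be illustrated with a scalar numerical sketch; the values of λ and c are arbitrary.

```python
# Scalar UUB sketch: if Vdot <= -lam*V + c, then V(t) converges to within
# the ultimate bound c/lam (values of lam and c are arbitrary).
lam, c, dt = 2.0, 0.5, 1e-3
V = 10.0                       # start far outside the residual ball
for _ in range(10000):         # 10 s of simulated time
    V += dt * (-lam * V + c)
print(V, c / lam)              # V settles at the ultimate bound 0.25
```

The residual c/λ plays the role of the bound in (39): a larger decay rate (cf. λ_1 above) shrinks the ball the tracking error ultimately lives in.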

Optimality Analysis.
Motivated by the work in [49], the ability of the controller u_01 to minimize a meaningful cost can be examined through the following theorem.

Theorem 11. The feedback law u*_01 given in (60), with the scalar gain constant selected as η ≥ 2 and the adaptive weight update laws given in (17)-(18) and (32), minimizes the meaningful cost functional (61), in which the positive function M(z, u) is defined in (62).
Proof. The cost functional J in (61) is meaningful if it is a positive control functional, that is, if M > 0. To examine the sign of M(z, u), using (46) and inequality (55), inequality (63) is obtained. Multiplying both sides by −2η and then adding an η(η − 2)-weighted quadratic term, inequality (63) can be rewritten as (64), which, based on (63), simplifies to (65). According to inequality (65), if η ≥ 2, then M ≥ 0 is satisfied; therefore J is a meaningful cost.
To show that u*_01 minimizes J(·), an auxiliary signal is defined in (66). Substituting (66) into (61), then substituting M from (62) into (67), and using V̇ in (46) together with lim_{t→∞} V = 0, (67) can be rewritten as (68). According to the result in (68), u*_01 minimizes J(·) and stabilizes the system; therefore, it is the optimal control law. Moreover, if the action NN and the identifier NN can precisely approximate the uncertainties and nonlinearities in system (1), or if their approximation performance satisfies the condition proposed in [50], then a globally optimal control law for the rehabilitation robot system is obtained.
Remark 12. Differing from direct optimal control [33][34][35], the proposed AHC is a hybrid (indirect and direct) optimal control that minimizes a meaningful cost without having to solve an HJB equation; the computational complexity is thereby alleviated.

Simulations and Experiments
In this section, the proposed AHC method is examined through simulations and experiments. The objective of the simulation is to examine the performance of the controller in (20), whereas the objective of the experiment is to test the controller's validity in practice. Below, we present the simulation and experimental procedures and the corresponding results obtained with several volunteers.

Simulation.
To illustrate the effectiveness and robustness of the proposed control algorithm on the robotic rehabilitation system (3), a comparison among adaptive proportional-derivative (PD) control, NN-based adaptive control (NN + PD), AC-based adaptive robust control (NN + AC), and the proposed AC-based inverse optimal control (NN + AHC) is made. The corresponding simulation results are shown in Figures 3-6; summary results for NN + AC and NN + AHC are given in Table 1. The desired trajectory (in rad) contains a switching term Δ = sgn(sin(0.05t − 0.5)) that determines its shape. First, the adaptive PD control is applied, with control parameters K_p = 20, K_d = 30, and Λ = 5I; its simulation results are shown in Figure 3. From Figure 3, it is evident that the adaptive PD control cannot track the desired position with the predefined control parameters. Specifically, the maximum position tracking error is about 0.8 rad, the minimum position tracking error is about 0.2 rad, and the steady-state mean error is about 0.6 rad.
Then, an action NN is added to the PD controller, yielding an NN-based adaptive control (NN + PD), which is applied to the simulation with control parameters K_p = 10 and K_d = 20 and a learning rate of 3. The simulation results of the NN-based adaptive control are shown in Figure 4. Although the NN-based adaptive control can track the desired trajectory, its control performance still needs improvement: the maximum position tracking error is about 0.2 rad, the minimum position tracking error is less than 0.1 rad, and the steady-state mean error is about 0.13 rad.
Thereafter, a critic NN is added to NN + PD, yielding an AC control (NN + AC), which is applied to the simulation with control parameters selected as in [45]. The simulation results of the AC-based robust adaptive control are shown in Figure 5. Although a favorable tracking performance is achieved owing to the effectiveness of the critic NN, the unwanted transients (e.g., at 10 s, 30 s, ..., 110 s) result in unsatisfactory behavior.
Finally, the AC-based inverse optimal control (NN + AHC) is applied to the simulation. To further illustrate the robustness of the proposed controller, an external disturbance d = 0.05 sin(3t)cos(t) is considered. The initial position is (0, 0). Both the critic NN and the actor NN contain 16 hidden-layer nodes, and the NN weights are initialized randomly in [0, 1]. All hidden-neuron activation functions are selected as sigmoid functions. The gains of the inverse optimal controller and the control gains of the critic and actor NNs were selected through trial simulations, with all gains chosen in consideration of the stability requirements. The results of the AC-based inverse optimal control are shown in Figure 6. A favorable tracking performance is achieved, and the unwanted transients are almost completely avoided with little control torque; specifically, the steady-state position tracking error of the proposed AHC decreases to less than 0.001 rad within 43 s. Figure 2 shows that the meaningful cost functional is positive; thus, the controller u_01 in (29) minimizes a meaningful cost, which means the proposed AHC is optimal.
Figures 3-6 show the simulation results of the traditional controllers and the proposed controller. From Figures 3-6, NN + PD, NN + AC, and the proposed controller all reduce the tracking error during the learning process, since these controllers have learning ability. However, the proposed controller reduces the tracking error faster than NN + PD and NN + AC, with slightly reduced control torque. The control torque of the proposed controller is also smoother, with smaller vibration than NN + AC, for the same level of performance. This means that an NN + AC controller augmented by an inverse optimal term can trade performance off against control input by minimizing a cost functional. It is concluded that the approximation ability of the proposed NN + AHC is better than that of the NN + AC control system and that the proposed AHC is optimal. Table 1 indicates that the position tracking error with NN + AHC is approximately 9% smaller than that with NN + AC, the resulting force is approximately 15% smaller, and the convergence speed increases by 36%. The control accuracy in simulation is sufficient for typical functional tasks. Nonetheless, the performance of the proposed method may vary when implemented on a real rehabilitation robot, since the simulation conditions are idealized compared with the real application.

Tracking Experiment.
Tracking experiments with the two controllers (i.e., NN + AC and NN + AHC) were conducted on both arms of five volunteers (two females and three males, ages 22 to 36 years), designated A, B, C, D, and E. Subject A is used as an example below; Figure 7 shows the experimental setup.
To better achieve the control objective, each volunteer (e.g., A) first undergoes a standard assessment session before the experiment, in which he/she performs no less than 30 seconds of reaching movements in eight specified movement directions unassisted [51]. Then, according to the evaluation result, a suitable training task is designed, as shown in Figure 8. Thereafter, after a rest, the volunteer is asked to track the desired trajectory (i.e., the green line and the red circle): in the first phase of the experiment (0-50 s), subject A draws a circle, and in the second phase (50-100 s), subject A draws a line, following the instructions on a monitor as accurately as possible. However, since individual motor behavior varies, subject A may not be able to track the desired motion trajectory throughout. Once the user's motor behavior varies, the robot automatically adjusts its assistance mode (i.e., assistance or resistance) using the proposed controller. In Table 2, the maximum steady-state position error (PEr) is defined as the maximum absolute value of the error that occurs after the first 2 seconds of the trial. The maximum steady-state position errors range from 1.71 to 2.27 cm, with a mean of 1.95 cm and a standard deviation of 0.16 cm. The RMS position errors range from 0.41 to 0.73 cm, with a mean RMS error of 0.58 cm and a standard deviation of 0.09 cm. The maximum steady-state force (F) is defined as the maximum absolute value of the force that occurs after the first 2 seconds of the trial. The maximum forces range from 4.53 to 9.82 N, with a mean of 6.98 N and a standard deviation of 1.98 N. The RMS forces range from 0.11 to 0.27 N, with a mean RMS force of 0.20 N and a standard deviation of 0.05 N. Table 2 indicates that the control accuracy achieved in the experiment is sufficient for typical functional tasks. Nonetheless, these tests were conducted on healthy individuals; future efforts will therefore focus on real patients.
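The steady-state metrics reported in Table 2 (maximum absolute error and RMS error after the first 2 s of a trial) can be computed from a logged error trace as follows; the error trace here is synthetic, for illustration only.

```python
import numpy as np

def steady_state_metrics(t, err, settle_time=2.0):
    """Maximum steady-state error and RMS error, ignoring the first
    `settle_time` seconds of the trial (as in the experimental tables)."""
    t = np.asarray(t, dtype=float)
    err = np.asarray(err, dtype=float)
    window = err[t >= settle_time]
    p_err = np.max(np.abs(window))        # maximum steady-state error
    rms = np.sqrt(np.mean(window ** 2))   # RMS error over the same window
    return p_err, rms

# Synthetic error trace: decaying transient plus a small residual oscillation
t = np.linspace(0, 10, 1001)
err = 0.5 * np.exp(-t) + 0.02 * np.sin(2 * np.pi * t)
p_err, rms = steady_state_metrics(t, err)
print(p_err, rms)
```

Discarding the initial window separates the learning transient from the steady-state behavior that the tables compare across subjects and controllers.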
Based on these results, it can be concluded that the proposed NN + AHC is able to evaluate the patient's performance during training and automatically change its assistance/resistance according to the volunteer's requirements. However, at the initial time, the control performance of the AHC is not better than that of NN + AC, since the RISE feedback structure in the AC is more sensitive to disturbances and nonlinearities than the inverse optimal controller u_01. Nevertheless, as NN + AHC learning proceeds, the inverse optimal controller begins to outperform NN + AC, whose control performance degrades because of its inherent accumulated error.

Conclusions
A Lyapunov-based stability analysis indicates that the developed adaptive AC-based inverse optimal hybrid control method yields uniformly ultimately bounded tracking for an unknown nonlinear rehabilitation robot with human motor behavior dynamics, even in the presence of uncertain additive disturbances. Simulation and experimental results clearly illustrate that the proposed AHC controller enables a person to achieve a desired functional task while minimizing a meaningful cost and that it outperforms the AC controller in terms of reduced error and convergence speed. However, the performance of the controller on real patients remains to be verified. Future efforts will focus on improving practical intelligent control algorithms for rehabilitation robots and on experimental trials with more volunteers, potentially including stroke and hemiplegia patients.

Figure 1: The system under study.

Figure 2: (a) The meaningful cost functional.(b) The integral of the meaningful cost functional.

Figure 3: Position tracking, tracking error, control torque, and weights of actor NN with PD.

Figure 4: Position tracking, tracking error, control torque, and weights of actor NN with NN + PD.

Figure 5: Position tracking, tracking error, control torque, weights of actor NN, and weights of critic NN with NN + AC.

Figure 6: Position tracking, tracking error, control torque, weights of actor NN, and weights of critic NN with the proposed NN + AHC.

Figure 7: A subject sitting on a chair with back support, holding the handle of the four-link robot manipulator. The subject generates an interaction force on the handle of the robot. The actual and reference handle positions were displayed on a monitor in front of the subject.
Note: A-E denote the five volunteers; L represents the left arm; R represents the right arm; PEr represents the position tracking error; F represents the resulting force; S.D. denotes the standard deviation.
Assumption 6. The ideal NN weight values W_1, W_2, W_c, and V_c are assumed to exist and be bounded by known positive constants; that is, ‖W_1‖ ≤ W_{1,max}, ‖W_2‖ ≤ W_{2,max}, ‖W_c‖ ≤ W_{c,max}, and ‖V_c‖ ≤ V_{c,max}. (37)

Assumption 7. The weight estimates and the weight mismatch errors for the ideal weights are assumed to exist and be bounded by known positive values.
Note: TS represents the transient state; SS denotes the steady state.
The experimental results of subject A are shown in Figures 8, 9, 10, and 11, and the overall experimental results of the five subjects for the desired trajectories are summarized in Table 2. Figures 8-11 show the experimental results of NN + AC and the proposed controller. From Figures 8-11, both NN + AC and the proposed controller can track the desired trajectory during the learning process, since both controllers have learning ability. However, the proposed controller tracks the desired trajectory faster than NN + AC, and its control torque is clearly smaller than that of NN + AC; this is a favorable property, because a high control torque can fatigue the arm faster.

Table 2: Summarized experimental results of the five volunteers.