Data-Based Control for Humanoid Robots Using Support Vector Regression, Fuzzy Logic, and Cubature Kalman Filter

Time-varying external disturbances can destabilize humanoid robots or even tip them over. In this work, a trapezoidal fuzzy least squares support vector regression (TF-LSSVR)-based control system is proposed to learn the external disturbances and increase the zero-moment-point (ZMP) stability margin of humanoid robots. First, the humanoid states and the corresponding control torques of the joints for training the controller are collected from simulation experiments. Secondly, a TF-LSSVR with a time-related trapezoidal fuzzy membership function (TFMF) is proposed to train the controller using the simulated data. Thirdly, the parameters of the proposed TF-LSSVR are updated using a cubature Kalman filter (CKF). Simulation results are provided. The proposed method is shown to be effective in learning and adapting to occasional external disturbances and in ensuring the stability margin of the robot.


Introduction
In general, considerable experience is needed to tune the controller parameters of humanoid walking robots [1,2]. At the same time, the tuned parameters may stop working once external disturbances occur [3,4]. It is still a major challenge for humanoid robots to walk autonomously in disturbed environments. How to improve the antidisturbance ability of humanoid robots using online and offline data is an interesting problem that remains to be settled.
Traditionally, an accurate dynamic model of the considered robot should be built in order to implement the desired high-quality control [5,6]. The dynamic model of a humanoid robot is a set of strongly coupled nonlinear ordinary differential equations in the joint variables. When humanoid robots walk slowly, the coupling among the joints can be treated as a disturbance. In this case, proportional-integral-derivative (PID) control for each independent joint can be adopted. Additionally, inverse dynamics feed-forward control, decoupling control, and feedback linearization control can also be considered. All of the abovementioned methods share a common feature: they depend strongly on the established mathematical model.
The control performance can be excellent if the system model is known exactly. However, these methods need to be improved in practical applications because external disturbances always exist and the system model cannot be known exactly [7,8]. Because the errors caused by disturbances accumulate, the difference between the actual motion states and the target values increases rapidly when the robot tracks a walking pattern planned in advance. As a consequence, humanoid robots always fall down after a few steps. To keep humanoid walking stable and sustained, the control systems must be improved to cope with the disturbances.
To realize stable walking of humanoid robots, researchers have proposed some effective methods, such as stability controls based on the linear inverted pendulum model, stability controls based on ZMP theory, and attitude controls for the upper part of the robot. Across the existing humanoid prototypes, ZMP-based methods are the most popular and practical [9-13]. When the humanoid dynamics (the mass of each module, the center of mass, and the moment of inertia) are known exactly, the gait of a humanoid robot can be obtained by solving the ZMP equation. However, it is really difficult to collect all the precise parameters from a physical robot. As a result, several simplified methods have been proposed to guarantee the stability criterion of ZMP. For example, Liu et al. designed an effective fuzzy logic (FL) controller for humanoid walking robots using ZMP as one of the antecedents [14]. Inspired by that work, we try to realize a kind of data-based control with implicit ZMP constraints.
Mathematical Problems in Engineering
Considering the time-varying external disturbances, it is necessary to deduce the control torques from the time-varying states of the humanoid joints. Support vector machines (SVMs) are supervised learning methods with associated learning algorithms that analyze data [15-17]. SVMs can be used to solve classification problems [18,19] and also regression problems [20-22]. In this paper, support vector regression (SVR) refers to the use of SVMs for solving regression problems. The formulation of the SVM employs the structural risk minimization (SRM) principle, which has been shown to be superior to the empirical risk minimization (ERM) principle based on infinite samples. SRM is a technique in which nested sets of functions of different complexity, controlled by the regularization term, are considered. One then selects the function that minimizes the upper bound on the generalization error. This feature makes the SVM more efficient in solving learning problems with limited training data. To cope with time-varying external disturbances, a time-based fuzzy SVR is proposed in this work to learn the dynamics between the states and the control torques of each joint.
On the other hand, the effectiveness of the time-based fuzzy SVR depends on the design of the SVR and the parameters of the fuzzy system. Kalman filters have been used in studies on the training of neural networks and fuzzy systems. Singhal and Wu [23] demonstrated that the extended Kalman filter (EKF) could serve as the basis for training multilayer perceptron (MLP) networks by treating the weights of the network as the state of a nonlinear dynamical system. Inspired by the successful application of the Kalman filter for training neural networks [24] and for defuzzification strategies [25], Simon [26] built a nonlinear system to train fuzzy systems using an EKF. More recently, a third-degree spherical-radial cubature rule was embedded into the Bayesian filter to build a new filter named the CKF [27]. The CKF has demonstrated excellent performance in solving nonlinear filtering problems with minimal computational effort [28-31]. Therefore, we explore approaches to train the proposed time-based fuzzy SVR using the CKF.
The contributions of this work can be summarized as follows: (i) For the first time, a trapezoidal fuzzy SVR is proposed to cope with time-varying external disturbances imposed on humanoid robots. (ii) For the first time, a novel approach for training the trapezoidal fuzzy SVR using the CKF is presented. The organization of this paper is as follows. In Section 2, the backgrounds of humanoid robots, SVR, and CKF are presented. The details of the proposed framework are presented in Section 3. Simulation results are provided in Section 4, followed by the conclusions in Section 5.

Background
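2.1. Dynamic Balance of Humanoid Robots. The dynamic equations of the single support phase (SSP) can be written as
$$M(\theta)\ddot{\theta} + C(\theta, \dot{\theta})\dot{\theta} + G(\theta) = \tau + \tau_d, \tag{1}$$
where $\theta \in \mathbb{R}^{n}$ is the generalized coordinate and $M(\theta) \in \mathbb{R}^{n \times n}$ is the inertia matrix. $C(\theta, \dot{\theta}) \in \mathbb{R}^{n \times n}$ denotes the matrix of centripetal acceleration and Coriolis terms, $G(\theta) \in \mathbb{R}^{n \times 1}$ is the gravity vector, and $\tau \in \mathbb{R}^{n}$ denotes the input torque vector during the SSP. The external disturbances are represented by $\tau_d \in \mathbb{R}^{n}$. The dynamic equations of the double support phase (DSP) can be written as
$$M(\theta)\ddot{\theta} + C(\theta, \dot{\theta})\dot{\theta} + G(\theta) + J^{T}(\theta)\lambda = \tau + \tau_d, \tag{2}$$
where $J(\theta) \in \mathbb{R}^{m \times n}$ is a Jacobian matrix and $\lambda \in \mathbb{R}^{m}$ is the force vector of constraints caused by the contact with the ground.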
To analyze the stability of the humanoid motions, the ZMP theory is used as the criterion of dynamic humanoid balance in this work. The concept of the zero moment point (ZMP) has been applied successfully to many famous humanoid robots, such as Honda's ASIMO [11]. The ZMP is a well-known concept introduced in 1990 [32]: it is the point on the ground at which the net moment of the inertial forces and the gravity forces has no component along the horizontal axes. At a given time instant, dynamic balance of a legged system is ensured if the ZMP is inside the support area.
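As a minimal illustration of this criterion (a sketch, not the simulation code used in this work), the following Python snippet treats the support area in the sagittal plane as an interval along the walking direction and reports how far the ZMP lies from the nearest edge of that interval. The function names, and the choice to measure the stability margin as the distance to the nearest edge, are assumptions made for illustration only.

```python
def zmp_stability_margin(x_zmp, support_min, support_max):
    """Signed distance (m) from the ZMP to the nearest edge of the support interval.

    Positive: the ZMP is inside the support interval (dynamic balance holds).
    Negative: the ZMP is outside the support interval.
    """
    if support_min > support_max:
        raise ValueError("support_min must not exceed support_max")
    return min(x_zmp - support_min, support_max - x_zmp)


def is_dynamically_balanced(x_zmp, support_min, support_max):
    """True when the ZMP lies inside (or on the edge of) the support interval."""
    return zmp_stability_margin(x_zmp, support_min, support_max) >= 0.0
```

With a hypothetical foot spanning 0.0 m to 0.2 m, a ZMP at 0.05 m has a margin of 0.05 m, whereas a ZMP at 0.25 m violates the criterion.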

Support Vector Regression.
In this section, we briefly review the basis of the theory of SVR.
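Given a labeled training data set $\{x_i, y_i\}$ $(i = 1, 2, \ldots, N)$, $x_i \in \mathbb{R}^{n}$ is the input vector of the system and $y_i \in \mathbb{R}$ is the output. The basic idea of SVR is to map the input space into a higher-dimensional feature space using a nonlinear mapping function $\varphi(\cdot)$ and to search for the optimal linear regression function in this feature space. The objective function of the least squares SVR (LS-SVR) [17] is
$$\min_{\mathbf{w}, b, \xi} J(\mathbf{w}, \xi) = \frac{1}{2}\mathbf{w}^{T}\mathbf{w} + \frac{\gamma}{2}\sum_{i=1}^{N}\xi_i^{2}, \quad \text{s.t. } y_i = \mathbf{w}^{T}\varphi(x_i) + b + \xi_i, \; i = 1, \ldots, N, \tag{3}$$
where $\gamma$ is a positive real constant for tuning. Because $\gamma$ multiplies the error term, the training error of the regression becomes smaller as $\gamma$ becomes larger. $\mathbf{w}$ is a weight vector and $b$ is a bias; $\xi_i$ is the slack variable that accounts for the permitted errors. To solve this optimization problem we construct the Lagrangian
$$L(\mathbf{w}, b, \xi, \alpha) = J(\mathbf{w}, \xi) - \sum_{i=1}^{N}\alpha_i\left[\mathbf{w}^{T}\varphi(x_i) + b + \xi_i - y_i\right] \tag{4}$$
and find the saddle point of $L(\mathbf{w}, b, \xi, \alpha)$, where $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_N]^{T}$ is a vector of the Lagrange multipliers. The parameters must satisfy the following conditions:
$$\frac{\partial L}{\partial \mathbf{w}} = 0 \Rightarrow \mathbf{w} = \sum_{i=1}^{N}\alpha_i\varphi(x_i), \quad \frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{i=1}^{N}\alpha_i = 0, \quad \frac{\partial L}{\partial \xi_i} = 0 \Rightarrow \alpha_i = \gamma\xi_i, \quad \frac{\partial L}{\partial \alpha_i} = 0 \Rightarrow \mathbf{w}^{T}\varphi(x_i) + b + \xi_i - y_i = 0. \tag{5}$$
Eliminating $\mathbf{w}$ and $\xi$, problem (3) can be transformed into
$$\begin{bmatrix} 0 & \mathbf{1}^{T} \\ \mathbf{1} & \Omega + \gamma^{-1}I \end{bmatrix}\begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}, \tag{6}$$
where $y = [y_1, y_2, \ldots, y_N]^{T}$, $\mathbf{1} = [1, 1, \ldots, 1]^{T}$, and $\Omega_{ij} = \varphi(x_i)^{T}\varphi(x_j) = K(x_i, x_j)$ is the kernel matrix. The resulting LS-SVR model [17] is
$$y(x) = \sum_{i=1}^{N}\alpha_i K(x, x_i) + b. \tag{7}$$
2.3. Cubature Kalman Filter. To describe the CKF, we consider the filtering problem of a nonlinear dynamic system with additive noise, whose state space model is defined by a process equation and a measurement equation in discrete time:
$$x_k = f(x_{k-1}, u_{k-1}) + w_{k-1}, \quad z_k = h(x_k, u_k) + v_k, \tag{8}$$
where $x_k$ is the state of the dynamic system at discrete time $k$; $f(\cdot)$ and $h(\cdot)$ are some known functions; $u_k$ is the control input; $z_k$ is the measurement; $\{w_{k-1}\}$ and $\{v_k\}$ are independent process and measurement Gaussian noise sequences with zero means and covariances $Q_{k-1}$ and $R_k$, respectively. When state estimation is performed with a nonlinear filter, the integrals for the means and variances of the states can be expressed in the form of a Gaussian-weighted integral. Consider a Gaussian-weighted integral of the following form:
$$I(f) = \int_{\mathbb{R}^{n}} f(x)\exp(-x^{T}x)\,dx, \tag{9}$$
where $f(x)$ is an arbitrary function and $\mathbb{R}^{n}$ is the region of integration. Different integration methods lead to different nonlinear filters. For a CKF, the spherical-radial cubature rule is adopted to implement the integration. Let $x = ry$ ($y^{T}y = 1$, $r \in [0, +\infty)$), and integration (9) can be separated into radial integration and spherical integration. That is,
$$I(f) = \int_{0}^{\infty}\int_{U_n} f(ry)\,r^{n-1}\exp(-r^{2})\,d\sigma(y)\,dr, \tag{10}$$
where $U_n$ is the surface of the sphere defined by $U_n = \{y \in \mathbb{R}^{n} \mid y^{T}y = 1\}$. Using Gauss-Laguerre quadrature, the radial integration can be rewritten as
$$\int_{0}^{\infty} S(r)\,r^{n-1}\exp(-r^{2})\,dr \approx \frac{1}{2}\Gamma\!\left(\frac{n}{2}\right)S\!\left(\sqrt{\frac{n}{2}}\right), \tag{11}$$
where $S(r) = \int_{U_n} f(ry)\,d\sigma(y)$. Using a cubature rule of degree three, the spherical integration can be rewritten as
$$\int_{U_n} f(ry)\,d\sigma(y) \approx \frac{2\sqrt{\pi^{n}}}{2n\,\Gamma(n/2)}\sum_{i=1}^{2n} f\left(r\langle 1\rangle_i\right), \tag{12}$$
where $\langle 1\rangle_i$ denotes the $i$th column of the set $\langle 1\rangle$. For example, for $n = 2$, $\langle 1\rangle = \{[1, 0]^{T}, [0, 1]^{T}, [-1, 0]^{T}, [0, -1]^{T}\}$. Combining (11) and (12), the spherical-radial cubature rule is as follows [27]:
$$I(f) \approx \frac{\sqrt{\pi^{n}}}{2n}\sum_{i=1}^{2n} f\!\left(\sqrt{\frac{n}{2}}\,\langle 1\rangle_i\right). \tag{13}$$
For the standard Gaussian distribution, (13) can be rewritten using
$$I_N(f) = \int_{\mathbb{R}^{n}} f(x)\,N(x; 0, I)\,dx = \frac{1}{\sqrt{\pi^{n}}}\int_{\mathbb{R}^{n}} f\left(\sqrt{2}\,x\right)\exp(-x^{T}x)\,dx, \tag{14}$$
where $N(x; 0, I)$ denotes the Gaussian density of $x$ with mean $0$ and covariance $I$. Combining (9), (13), and (14), we get
$$I_N(f) \approx \sum_{i=1}^{2n}\omega_i f(\boldsymbol{\xi}_i), \tag{15}$$
where $n$ is the dimension of the state vector. The pair $(\boldsymbol{\xi}_i, \omega_i)$ is called a cubature point here. $\boldsymbol{\xi}_i$ and $\omega_i$ can be calculated as follows:
$$\boldsymbol{\xi}_i = \sqrt{n}\,\langle 1\rangle_i, \quad \omega_i = \frac{1}{2n}, \quad i = 1, 2, \ldots, 2n. \tag{16}$$
This means that the third-degree spherical-radial rule entails a total of $2n$ cubature points. After calculating the cubature points, we use the cubature-point set to numerically compute the integrals and obtain the CKF algorithm; details of the time update and the measurement update can be found in the literature [27].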
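To make the third-degree spherical-radial rule more concrete, the following Python sketch generates the $2n$ equally weighted cubature points and uses them to approximate a Gaussian expectation. It is an illustrative sketch only: the function names are hypothetical, and the shift to a Gaussian with arbitrary mean and covariance through a Cholesky factor is the usual CKF step from [27] rather than something spelled out in this section.

```python
import numpy as np


def cubature_points(mean, cov):
    """Return the 2n cubature points (as columns) and their equal weights."""
    n = mean.shape[0]
    sqrt_cov = np.linalg.cholesky(cov)  # cov = sqrt_cov @ sqrt_cov.T
    # Unit cubature directions: +/- sqrt(n) along each coordinate axis.
    unit = np.sqrt(n) * np.hstack([np.eye(n), -np.eye(n)])
    points = mean[:, None] + sqrt_cov @ unit
    weights = np.full(2 * n, 1.0 / (2 * n))
    return points, weights


def gaussian_expectation(func, mean, cov):
    """Approximate E[func(x)] for x ~ N(mean, cov) with the cubature rule."""
    points, weights = cubature_points(mean, cov)
    values = np.array([func(points[:, i]) for i in range(points.shape[1])])
    return weights @ values
```

Because the rule is exact for polynomials up to degree three, the expectation of $x^{T}x$ is reproduced exactly as the trace of the covariance plus the squared norm of the mean.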

Figure 2: The trapezoidal fuzzy membership function for humanoid walking samples (horizontal axis: time; vertical axis: learning weights of the training data).

Data-Based Control for Humanoid Walking Robots
External disturbance is one of the key issues that influence the stability of humanoid walking robots. On the other hand, it is difficult to measure the external disturbances directly.
Based on these facts, we turn our attention to the data of the humanoid states and the corresponding control torques, because the pattern of the disturbances can be disclosed, to some extent, by the varying data collected from the humanoid robot. First of all, the states and the control torques of the humanoid joints are collected from a simulated stable walking robot. Then, a data-based controller that considers the varying states of the humanoid robot is designed using SVR and fuzzy theory. To optimize the controller, a CKF is designed to train the parameters of the SVR and the fuzzy system. The complete framework for data-based humanoid control using SVR, FL, and CKF is shown in Figure 1. In the first step, data of the joint angles are generated from the reference trajectory planned offline. The trajectories are given by (17), which expresses the position of the hip and the position of the swinging ankle joint in terms of the walking step length, the height of the swinging ankle, the total number of samples for a step, the index of the samples, and the lengths of the lower limbs. Secondly, a proportional-integral-derivative (PID) controller is used to obtain the driving torques. In this work, the initial driving torques of all the joints are obtained using this PID controller. Then the key driving torques, including the driving torques for the support hip and the support ankle in the SSP and the driving torques for the knees in the DSP, are improved using SVR, FL, and CKF. The PID controller is as follows:
$$\tau_i = K_P e_i + K_I\int_{0}^{T} e_i\,dt + K_D\frac{de_i}{dt}, \tag{18}$$
where $\tau_i$ $(i = 1, \ldots, n)$ is the torque of the joints, $e_i$ denotes the offset between the desired reference trajectories and the actual trajectories, and $T$ is the integral period. The proportional gains $K_P$, integral gains $K_I$, and differential gains $K_D$ are slightly modified by the trial-and-error method.
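For readers who prefer code, a discrete-time version of such a joint-level PID law can be sketched in Python as follows. This is a hedged illustration, not the controller used in the simulations: the class name, the rectangular integration, the backward-difference derivative, and the gain values in the usage note are all assumptions.

```python
class JointPID:
    """Discrete PID law for a single joint (illustrative sketch)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.dt = dt              # sampling interval in seconds
        self.integral = 0.0       # running integral of the tracking error
        self.prev_error = None    # previous error, used for the derivative

    def torque(self, desired_angle, actual_angle):
        """Return the driving torque for the current tracking error."""
        error = desired_angle - actual_angle
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0      # no derivative information on the first call
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

For example, `JointPID(kp=2.0, ki=0.5, kd=0.1, dt=0.025)` uses hypothetical gains together with the 0.025 s sampling interval reported in the simulation section; one such object would be created per joint.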

Designing the Controller Using the Collected Data.
A TF-LSSVR is proposed in this section to design the controller using the collected data.

Humanoid Controller to Be Built Using the TF-LSSVR.
Based on the existing literature [14], when the ZMP criterion is satisfied, the dynamics between the torque control inputs and the joint angles of the DSP can be presented as
$$\tau_{\text{left knee}} = h(\theta_{\text{left knee}}, \theta_{\text{right knee}}), \quad \tau_{\text{right knee}} = g(\theta_{\text{left knee}}, \theta_{\text{right knee}}), \tag{19}$$
where $\tau_{\text{left knee}}$ and $\tau_{\text{right knee}}$ are the driving torques of the left knee and the right knee. $\theta_{\text{left knee}}$ and $\theta_{\text{right knee}}$ are the joint angles of the left knee and the right knee. $h(\cdot)$ and $g(\cdot)$ are the nonlinear dynamics in the DSP to be learned using the proposed TF-LSSVR.
On the other hand, when the ZMP criterion is satisfied, the dynamics between the torque control inputs and the joint angles of the SSP can be presented as [14]
$$\tau_{\text{sup hip}} = p(\theta_{\text{sup hip}}, \theta_{\text{sup ankle}}), \quad \tau_{\text{sup ankle}} = q(\theta_{\text{sup hip}}, \theta_{\text{sup ankle}}), \tag{20}$$
where $\tau_{\text{sup hip}}$ and $\tau_{\text{sup ankle}}$ are the driving torques of the supporting hip and the supporting ankle. $\theta_{\text{sup hip}}$ and $\theta_{\text{sup ankle}}$ are the joint angles of the supporting hip and the supporting ankle. $p(\cdot)$ and $q(\cdot)$ are the nonlinear dynamics in the SSP to be learned using the proposed TF-LSSVR.
For illustration purposes, we mainly expound the situation in the SSP. The solution for the DSP can be easily deduced in the same way. Then, the humanoid controller to be built can be simply presented as follows:
$$\tau = f(\Theta), \tag{21}$$
where $f(\cdot)$ is the nonlinear dynamics that the TF-LSSVR tries to build, $\tau$ is the driving torque, and $\Theta = (\theta_{\text{sup hip}}, \theta_{\text{sup ankle}})$ is a vector of joint angles. In the objective function (22) of the TF-LSSVR, $\mathbf{w}$ is a weight vector, $\varphi(\cdot)$ is a nonlinear mapping function, $\gamma$ is a penalty coefficient, $\xi_i$ is a positive slack variable, and $b_{\text{sup hip}}$ is the corresponding bias. $\{\theta^{(i)}_{\text{sup hip}}, \theta^{(i)}_{\text{sup ankle}}, \tau^{(i)}, \mu_i\}$ is the $i$th sample, and $N$ is the number of samples. $\mu_i$ is the fuzzy learning weight for the training samples, which can be calculated using the fuzzy membership functions designed in the next section. Note that a weighting scheme similar to (22) has also been proposed in [17], but it is based on robust statistics. In this paper, a different weighting scheme is proposed.
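A weighted least-squares objective of this kind leads to a single linear system in the bias and the Lagrange multipliers, in the same way as the standard LS-SVR, with the regularization term of sample $i$ scaled by its learning weight. The Python sketch below illustrates this under stated assumptions: the error term is taken to be $\frac{\gamma}{2}\sum_i \mu_i \xi_i^2$, the RBF kernel is written as $\exp(-\lVert x - x' \rVert^2 / (2\sigma^2))$, and all function names are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def fit_weighted_lssvr(X, y, mu, gamma, sigma):
    """Solve the weighted LS-SVR linear system for the bias b and multipliers alpha.

    X: (N, d) inputs, y: (N,) targets, mu: (N,) learning weights in (0, 1].
    """
    N = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0                    # constraint: sum of alpha equals zero
    A[1:, 0] = 1.0                    # bias column
    A[1:, 1:] = K + np.diag(1.0 / (gamma * mu))
    rhs = np.concatenate(([0.0], y))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]


def predict_lssvr(X_train, b, alpha, sigma, X_new):
    """Evaluate the learned regression function at the rows of X_new."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b
```

In this sketch, a sample with a small weight $\mu_i$ receives a large diagonal term $1/(\gamma\mu_i)$, so its fitting error is penalized less and it influences the learned torque map less than recent samples do.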

Designing the Learning Weights Using TFMF. The walking data from different times can be evaluated using linguistic terms such as "more important" or "less important." When the timeliness of the training data is considered, the collected data from current steps are "more important," and those from past steps are "less important." For this reason, a TFMF, which is the left half of a trapezoidal function, is proposed as the membership function of humanoid walking samples. The proposed time-related trapezoidal fuzzy membership function is shown in Figure 2. The formula for the trapezoidal fuzzy membership function is as follows:
$$\mu_i = \begin{cases} h + (1 - h)\dfrac{t_i - t_{k-m-n}}{t_{k-m} - t_{k-m-n}}, & t_{k-m-n} \le t_i < t_{k-m}, \\ 1, & t_{k-m} \le t_i \le t_k, \end{cases} \tag{23}$$
where $\mu_i$ is the trapezoidal fuzzy membership function of time $t_i$ $(i = 1, 2, \ldots, N)$, $\mu_i \in [h, 1]$, and $h \in (0, 1)$ is a tuning parameter. $t_{k-m-n}$ and $t_k$ are the beginning and the ending of the time window, respectively, and $t_{k-m}$ is a time point between the beginning and the ending.
From Figure 2 and (23), it can be seen that the time window $[t_{k-m-n}, t_k]$ is divided into two parts. The first part $[t_{k-m}, t_k]$ contains the newest walking samples, which are designed to have a full fuzzy membership grade. The second part $[t_{k-m-n}, t_{k-m}]$ covers the relatively old samples, which are assigned descending fuzzy membership grades. As shown in Figure 2, the learning weights at the more recent sample points are larger than that at the older sample point because the corresponding states are closer to the current situation. Besides, as the sampling time window moves, the trapezoidal learning-weight function moves along the time axis accordingly. Therefore, reasonable and adaptable learning weights are assigned to all samples over the whole walking process. The deduced learning weights are then used in the learning algorithm of the TF-LSSVR.
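The learning weights described above can be computed with a few lines of Python. The sketch below assumes that the descending part of the membership function is a straight line from the lower bound $h$ at the beginning of the window up to 1 at the interior time point; the function name and argument names are hypothetical.

```python
import numpy as np


def trapezoidal_learning_weights(t, t_start, t_mid, t_end, h):
    """Left-half trapezoidal membership grades for sample times t.

    Samples in [t_mid, t_end] get full weight 1; samples in [t_start, t_mid)
    get a weight that rises linearly from h (at t_start) to 1 (at t_mid).
    """
    t = np.asarray(t, dtype=float)
    if not (t_start < t_mid <= t_end):
        raise ValueError("require t_start < t_mid <= t_end")
    if not (0.0 < h < 1.0):
        raise ValueError("h must lie in (0, 1)")
    if np.any((t < t_start) | (t > t_end)):
        raise ValueError("all sample times must lie inside the time window")
    ramp = h + (1.0 - h) * (t - t_start) / (t_mid - t_start)
    return np.where(t >= t_mid, 1.0, ramp)
```

As the time window slides forward, the same function is simply called again with the new window boundaries, so older samples gradually lose weight.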
The nonlinear dynamic model $f(\cdot)$ in (21) can be built when the learning process of the TF-LSSVR is completed. After this, considering the parameters of the TF-LSSVR, controller (21) can be written as
$$\tau = f(\Theta, x),$$
where $\tau$ is the driving torque, $\Theta$ is the joint angle, and $x = [\gamma, \sigma, m, n, h]^{T}$ is a parameter vector. Here, we denote the penalty coefficient by $\gamma$, the width of the RBF kernel by $\sigma$, and the width of the moving time window by $m$ and $n$, and we denote the minimum value of the membership function by $h$. To obtain optimized parameters for the controller, a CKF-based training method is proposed in the next section.

Parameters Optimization Problem in a Form Suitable for CKF.
In order to cast the parameters optimization problem of the TF-LSSVR in a form suitable for CKF, we let the parameters of the TF-LSSVR constitute the state of a nonlinear system, and we let the output of the TF-LSSVR constitute the output of the nonlinear system to which the CKF is applied.
As mentioned above, we denote the state of the nonlinear system by
$$x = [\gamma, \sigma, m, n, h]^{T}.$$
The vector $x$ thus consists of all of the parameters of the TF-LSSVR arranged in a linear array. Let $y = f(\Theta, x)$ be the transfer function of the TF-LSSVR, where $y$ is the output, $\Theta$ is the input, and $x = [\gamma, \sigma, m, n, h]^{T}$ is a parameter vector.
The training of the TF-LSSVR can be formulated as a filtering problem. The model for updating the parameters using the CKF can be written as
$$x_k = x_{k-1}, \quad y_k = f(\Theta_k, x_k).$$
Similar to the approach of training fuzzy systems with the extended Kalman filter [26], in order to execute a stable Kalman filter algorithm, we add some process noise and measurement noise to the above model. That is,
$$x_k = x_{k-1} + w_{k-1}, \quad y_k = f(\Theta_k, x_k) + v_k,$$
where $w_{k-1}$ and $v_k$ are artificially added Gaussian noise sequences with zero means and covariances $Q_{k-1}$ and $R_k$, respectively.

The CKF Algorithm for Updating the Parameters of the Controller.
To update the parameters of the controller, the CKF algorithm [27] is implemented as follows.
After time update and measurement update, an estimation of the parameters  for the TF-LSSVR is obtained.That is to say, the initial controller with random parameters is updated to the proposed data-based controller.
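One update cycle of this kind can be sketched in Python as follows. The sketch assumes the random-walk parameter model with additive Gaussian noise and follows the general CKF time-update and measurement-update structure of [27]; the function names are hypothetical, and `model` stands in for whatever map takes a parameter vector to the controller output on the current training data.

```python
import numpy as np


def cubature_set(mean, cov):
    """Return the 2n cubature points of N(mean, cov) as columns."""
    n = mean.size
    sqrt_cov = np.linalg.cholesky(cov)
    offsets = np.sqrt(n) * np.hstack([np.eye(n), -np.eye(n)])
    return mean[:, None] + sqrt_cov @ offsets


def ckf_parameter_update(x, P, y, model, Q, R):
    """One CKF step for x_k = x_{k-1} + w_{k-1}, y_k = model(x_k) + v_k."""
    # Time update: the random-walk model keeps the mean and inflates the covariance.
    x_pred = x.copy()
    P_pred = P + Q
    # Measurement update: propagate the cubature points through the model.
    points = cubature_set(x_pred, P_pred)
    num = points.shape[1]
    Z = np.column_stack([np.atleast_1d(model(points[:, i])) for i in range(num)])
    z_pred = Z.mean(axis=1)                 # equal weights 1 / (2n)
    dZ = Z - z_pred[:, None]
    dX = points - x_pred[:, None]
    P_zz = dZ @ dZ.T / num + R              # innovation covariance
    P_xz = dX @ dZ.T / num                  # cross covariance
    gain = P_xz @ np.linalg.inv(P_zz)
    x_new = x_pred + gain @ (np.atleast_1d(y) - z_pred)
    P_new = P_pred - gain @ P_zz @ gain.T
    return x_new, P_new
```

For a linear `model` the step reduces to the ordinary Kalman filter update, which gives a convenient sanity check before a nonlinear learner is plugged in.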

Simulation Research
In this section, we test the proposed learning control method on a seven-link robot in simulation experiments. Matlab 7.0 is used to model the humanoid robot and the controller.

Simplified Model of a Seven-Link Humanoid Robot. The simplified model of the robot has two legs and a trunk. Each leg is composed of a thigh, a shank, and a foot. There is one degree of freedom (DOF) in the trunk, one for each hip, one for each knee, and one for each ankle. In this paper, we focus on the sagittal dynamics of the humanoid robot. The simple model of the humanoid is shown in Figure 3. The details of the humanoid can be found in Table 1. Each walking cycle comprises the DSP and the SSP, with 0.8 s for the SSP. The trunk of the humanoid robot is assumed to be upright during walking. Three different step lengths are implemented (0.16 m, 0.18 m, and 0.20 m), and the step height is 0.02 m for all three step lengths. The sampling interval is 0.025 s. That is to say, 40 groups of samples are collected in a single walking cycle.
To validate the advantage of the proposed method, a simulated time-varying external perturbation $\tau_d = 0.08\sin(5t)$ (Nm) is considered here for the tests. It is a horizontal external force with a duration of 0.1 s that is applied on the hip at $t = 0.15$ s in the DSP and at $t = 0.65$ s in the SSP, respectively.

Experimental Conditions and the Parameters.
Here, we use the universal RBF as the kernel for the TF-LSSVR. The penalty coefficient $\gamma = 1000$ and the width of the RBF kernel $\sigma = 0.9$ are determined by a 10-fold cross-validation strategy. The other initial parameters of the proposed framework are chosen from experience; they include the number of steps for a full weight $m = 1$, the number of steps for a gradient weight $n = 1$, and the lower bound of the fuzzy membership function $h = 0.5$. Then the initial parameters are updated using the CKF.

Learning Results of the TF-LSSVR. The learning results of the proposed TF-LSSVR are compared with those of two other intelligent methods, namely, fuzzy control and the traditional LS-SVR. The design details of the fuzzy control system can be found in the literature [14]. The nonlinear dynamics of the submodels formulated in (19)-(20) are learned using the three different methods (see the corresponding figures). Define the control error $e = \hat{\tau} - \tau$; the integral square error (ISE) is then defined as the measure index:
$$\mathrm{ISE} = \sum_{k=1}^{N} e^{2}(k),$$
where $k$ denotes the sampling index and $N = 40$ is the number of all the samples. $\hat{\tau}(k)$ is the output of the learning methods and $\tau(k)$ is the desired output. The ISE values are listed in Table 3, which indicates that the proposed TF-LSSVR achieves a better learning result than the other two existing methods. As we can see, the ZMP trajectories corresponding to the proposed method have bigger stability margins for the humanoid compared to the other two intelligent methods. That is to say, by using the online and offline data with different weights, the proposed learning control method is more effective in learning the external disturbances and increasing the stability margin of humanoid robots.
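The ISE index used in this comparison is straightforward to compute. The short Python sketch below assumes that the index is the plain sum of squared errors over the samples of one walking cycle, without scaling by the sampling interval; the function name is hypothetical.

```python
import numpy as np


def integral_square_error(tau_hat, tau):
    """Sum of squared differences between learned and desired torques."""
    error = np.asarray(tau_hat, dtype=float) - np.asarray(tau, dtype=float)
    return float(np.sum(error**2))
```

A smaller value indicates that the learned torque sequence follows the desired one more closely over the cycle.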

Conclusions
In this work, a TF-LSSVR-based control system is proposed to learn the external disturbances and increase the stability margin of humanoid robots. First, data for training the controller are collected from simulation experiments. Secondly, a TF-LSSVR with a time-related TFMF is proposed to train the controller using the simulated data. Thirdly, the parameters of the proposed TF-LSSVR are updated using a CKF. Simulation results are provided. The proposed method is shown to be effective in learning and adapting to occasional external disturbances and in ensuring the stability margin of the robot. We believe that the proposed method is promising for the development of autonomous humanoid robots.
Figure 1: Data-based control for humanoid walking robots using support vector regression, fuzzy logic, and cubature Kalman filter. Blocks in the diagram: gait planning; a PID controller; simulated biped robot; data collecting from the simulations (simulation data: angle and torque); designing the controller using the collected data (biped controller to be built using the TF-LSSVR; objective functions of the TF-LSSVR; designing the learning weights using TFMF); an initial controller with random parameters; training the parameters of the controller using CKF (parameters optimization problem in a form suitable for CKF; the CKF algorithm for updating the parameters of the controller); the proposed controller.

Figure 3: Simplified model of the simulated humanoid robot.

3.1. Data Collecting from the Simulations. Data for training the controller are collected from simulation experiments. Two kinds of data are collected from the simulated humanoid robot: the joint angles and the driving torques. The way the training data are obtained is described in detail next.

4.4. Performance Comparisons. The ZMP trajectories with disturbances in the SSP can be found in Figures 16-18, which indicate that the proposed learning control method produces a larger stability margin than the traditional ones. Similar results are obtained in the DSP; the comparisons of humanoid walking when disturbances occur in the DSP are shown in Figure 19 for a step length of 0.16 m.

Figure 19: ZMP trajectory with disturbance in the DSP when the step length is 0.16 m. Fuzzy controller (green); LS-SVR controller (magenta); the proposed TF-LSSVR controller (black).
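Objective Functions of the TF-LSSVR. When the timeliness of the training data is considered, the collected data from current steps are "more important," and those from past steps are "less important." Based on this, in the proposed objective function of the TF-LSSVR, the error of each training sample is weighted by its fuzzy learning weight $\mu_i$:
$$\min_{\mathbf{w},\, b_{\text{sup hip}},\, \xi} \; \frac{1}{2}\lVert\mathbf{w}\rVert^{2} + \frac{\gamma}{2}\sum_{i=1}^{N}\mu_i\xi_i^{2} \quad \text{s.t. } \tau^{(i)}_{\text{sup hip}} = \mathbf{w}^{T}\varphi\left(\theta^{(i)}_{\text{sup hip}}, \theta^{(i)}_{\text{sup ankle}}\right) + b_{\text{sup hip}} + \xi_i, \; i = 1, \ldots, N. \tag{22}$$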

Table 1: Parameters of the humanoid robot.

Table 2: Initial radians of the humanoid joint angles.

Table 3: Comparisons of the ISE performances.