Online Adaptive Optimal Control of Vehicle Active Suspension Systems Using Single-Network Approximate Dynamic Programming

In view of the performance requirements (e.g., ride comfort, road holding, and suspension space limitation) for vehicle suspension systems, this paper proposes an adaptive optimal control method for a quarter-car active suspension system by using the approximate dynamic programming (ADP) approach. The online optimal control law is obtained by using a single adaptive critic neural network (NN) to approximate the solution of the Hamilton-Jacobi-Bellman (HJB) equation. Stability of the closed-loop system is proved by Lyapunov theory. Compared with the classic linear quadratic regulator (LQR) approach, the proposed ADP-based adaptive optimal control method demonstrates improved performance in the presence of parametric uncertainties (e.g., sprung mass) and unknown road displacement. Numerical simulation results of a sedan suspension system are presented to verify the effectiveness of the proposed control strategy.


Introduction
Improving suspension systems plays an important role in making vehicles more comfortable and safer. Naturally, suspension systems are expected to have more intelligence to adapt to various road conditions. Hence, many control methods, as summarized in [1], have been used for suspension controller design. However, model uncertainties arising from parameter uncertainties and unknown road inputs in real suspension systems pose a great challenge for controller design.
It is well known that optimal controllers are normally designed offline by solving the Hamilton-Jacobi-Bellman (HJB) equation. For a linear system, the linear quadratic regulator (LQR) controller is designed offline by solving the Riccati equation (a special case of the HJB equation). However, the main drawback of the conventional LQR method is that the system model has to be known precisely in advance to find the optimal control law. In addition, the feedback control gains are obtained offline. Once obtained, they cannot be changed to suit a different driving environment. Thus, a more efficient control strategy is needed to cope adaptively, in real time, with an active suspension control problem subject to time-varying parameters under different driving situations.
Recent studies summarized in [2] show that approximate dynamic programming (ADP) based controller design, which merges key principles from adaptive control and optimal control, can overcome the need for an exact model and achieve optimality at the same time. The learning mechanism of ADP supported by the actor-critic structure has two steps [3]. First, policy evaluation executed by the critic is used to assess the control action, and then policy improvement executed by the actor is used to modify the control policy. Most of the available adaptive optimal control methods are based on the dual NN architecture [4][5][6][7], where the critic NN and action NN are employed to approximate the optimal cost function and optimal control policy, respectively. The complicated structure and computational burden make practical implementation difficult. In this paper, an adaptive optimal control method with a simplified structure is proposed for the quarter-car active suspension system. The optimal control action is calculated directly from the critic NN instead of the actor-critic dual networks. Meanwhile, a robust learning rule driven by the parameter error is presented for the critic NN, which is used to identify the optimal cost function. Finally, based on the critic NN, the optimal control action is calculated by solving the HJB equation online. Uniformly ultimately bounded (UUB) stability of the overall closed-loop system is guaranteed via Lyapunov theory. Compared with the conventional LQR control approach, simulation results from a quarter-car active suspension system verify the improved performance of the proposed ADP-based controller in terms of ride comfort, road holding, and suspension space limitation.
The remainder of this paper is organized as follows. In Section 2, the preliminary problem formulation is given. The proposed ADP-based control algorithm is introduced in Section 3. Section 4 presents the simulation results from a vehicle suspension system and, finally, the conclusions are drawn in Section 5.

Problem Formulation
The two-degree-of-freedom quarter-car suspension system is shown in Figure 1, which has been widely used in the literature [8,9]. It represents the motion of the vehicle body at any one of the four wheels. The suspension system is composed of a spring $k_s$, a damper $b_s$, and an active force actuator $u$. The active force can be set to zero for the passive suspension. The sprung mass $m_s$ denotes the quarter equivalent mass of the vehicle body. The unsprung mass $m_u$ represents the equivalent mass of the tire assembly system. The parameters $k_t$ and $b_t$ represent the vertical stiffness and damping coefficient of the tire, respectively. Vertical displacements of the sprung mass, unsprung mass, and road are denoted by $z_s$, $z_u$, and $z_r$, respectively.
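The motion equations of the sprung and unsprung masses shown in Figure 1 are given as follows [10]:

$$\begin{aligned} m_s\ddot z_s + b_s(\dot z_s - \dot z_u) + k_s(z_s - z_u) &= u,\\ m_u\ddot z_u + b_s(\dot z_u - \dot z_s) + k_s(z_u - z_s) + k_t(z_u - z_r) + b_t(\dot z_u - \dot z_r) &= -u. \end{aligned} \tag{1}$$

Define the state variables as

$$x_1 = z_s - z_u,\quad x_2 = \dot z_s,\quad x_3 = z_u - z_r,\quad x_4 = \dot z_u, \tag{2}$$

where $z_s - z_u$ is the suspension deflection, $\dot z_s$ is the velocity of the sprung mass, $z_u - z_r$ is the tire deflection, and $\dot z_u$ is the velocity of the unsprung mass.
Then, (1) can be rewritten in the following state-space form:

$$\dot x = Ax + Bu + B_r\dot z_r, \tag{3}$$

where $x = [x_1, x_2, x_3, x_4]^T$, and $A$, $B$, and $B_r$ are the system, input, and road-disturbance matrices that follow from (1) and (2).
The objective of the controller design for active suspension systems should consider the following three tasks [11].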
Firstly, one of the main tasks is to maintain good ride quality. This means reducing the vibratory forces transmitted from the axle to the vehicle body via a well-designed suspension controller, that is, reducing the sprung mass acceleration as far as possible in the presence of uncertainty in the sprung mass $m_s$ and unknown road displacement $z_r$.
Secondly, to ensure good road holding performance, firm uninterrupted contact of the wheels with the road should be ensured, and the variations in normal tire load related to the vertical tire deflection $z_u - z_r$ should be small.
Lastly, the suspension deflection should not exceed the maximum suspension deflection $z_{\max}$; that is,

$$|z_s - z_u| \le z_{\max}. \tag{4}$$

The suspension control goal is to find a controller which ensures the stability of the closed-loop system and minimizes the performance index (5), in which $Q$ and $R$ are weighting matrices chosen by the designer that determine the optimality of the control law.
Remark 1. Most existing LQR controller designs are based on (3) with the assumption that all of the parameters are known in advance. Besides, the feedback control law of the LQR approach is usually obtained by solving the Riccati equation offline. Once it is obtained, the control law cannot be updated online when subjected to uncertainty in the sprung mass $m_s$ and unknown road displacement $z_r$, which may degrade the control performance. Therefore, a high-performance control approach that tolerates various operating conditions should be developed for active suspension system design.
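As an illustration of the model in this section, the following Python sketch assembles state-space matrices for the state $x = [z_s - z_u, \dot z_s, z_u - z_r, \dot z_u]^T$ under the standard quarter-car force balance and evaluates $\dot x = Ax + Bu + B_r\dot z_r$ at one operating point. The matrix entries are derived here under that assumption rather than copied from (3); the name `B_r`, the function names, and the numerical parameter values are hypothetical (they are not those of Table 1).

```python
def quarter_car_matrices(m_s, m_u, k_s, b_s, k_t, b_t):
    """Return (A, B, B_r) for x = [z_s - z_u, dz_s, z_u - z_r, dz_u]."""
    A = [
        [0.0, 1.0, 0.0, -1.0],
        [-k_s / m_s, -b_s / m_s, 0.0, b_s / m_s],
        [0.0, 0.0, 0.0, 1.0],
        [k_s / m_u, b_s / m_u, -k_t / m_u, -(b_s + b_t) / m_u],
    ]
    B = [0.0, 1.0 / m_s, 0.0, -1.0 / m_u]   # actuator force acts on both masses
    B_r = [0.0, 0.0, -1.0, b_t / m_u]       # road velocity enters via the tire
    return A, B, B_r


def state_derivative(A, B, B_r, x, u, dz_r):
    """Evaluate dx/dt = A x + B u + B_r * dz_r for a scalar input u."""
    return [
        sum(a * xi for a, xi in zip(row, x)) + b * u + br * dz_r
        for row, b, br in zip(A, B, B_r)
    ]


# Hypothetical parameter values, chosen only to exercise the functions.
params = dict(m_s=300.0, m_u=40.0, k_s=16000.0, b_s=1000.0,
              k_t=190000.0, b_t=0.0)
A, B, B_r = quarter_car_matrices(**params)
x = [0.01, 0.1, 0.002, -0.05]
dx = state_derivative(A, B, B_r, x, u=50.0, dz_r=0.02)
```

The second and fourth entries of `dx` are the sprung and unsprung mass accelerations, so they can be checked directly against the two force balances.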

Controller Design Based on Approximate Dynamic Programming
In this section, an adaptive optimal control method for active suspension systems is realized by using the ADP approach with only one critic network instead of the classic actor-critic dual networks, as presented in Figure 2. The measured outputs are the suspension states. Implementing the state feedback control algorithm requires that all states of the system be measurable or that the unmeasured states be estimated, for example, by a Kalman filter. In this paper, we assume that all of the suspension states are available.
The objective of the optimal regulator problem is to design an optimal controller that stabilizes system (3) and minimizes the infinite-horizon performance cost function

$$V(x) = \int_t^{\infty} r(x(\tau), u(\tau))\,d\tau, \tag{6}$$

where the utility function with symmetric positive definite matrices $Q$ and $R$ is defined as $r(x, u) = x^TQx + u^TRu$.
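According to the optimal regulator problem design in [4], an admissible control policy should be designed to ensure that the infinite-horizon cost function (6) related to (3) is minimized. So, the Hamiltonian of (3) is defined as

$$H(x, u, V_x) = r(x, u) + V_x^T(Ax + Bu + B_r\dot z_r), \tag{7}$$

where $V_x \triangleq \partial V(x)/\partial x$ denotes the partial derivative of the cost function $V(x)$ with respect to $x$, and $B_r\dot z_r$ is the road-disturbance term of (3).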
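The optimal cost function $V^*(x)$ is defined as

$$V^*(x) = \min_u \int_t^{\infty} r(x(\tau), u(\tau))\,d\tau, \tag{8}$$

and it satisfies the HJB equation

$$0 = \min_u H(x, u, V_x^*), \tag{9}$$

where $V_x^* \triangleq \partial V^*(x)/\partial x$. Based on the assumption that the minimum on the right-hand side of (8) exists and is unique, by solving $\partial H(x, u, V_x^*)/\partial u = 0$, the feedback form for the admissible optimal control $u^*$ can be derived as

$$u^* = -\tfrac{1}{2}R^{-1}B^TV_x^*. \tag{10}$$

Substituting (10) into (9), we have

$$0 = x^TQx + V_x^{*T}(Ax + B_r\dot z_r) - \tfrac{1}{4}V_x^{*T}BR^{-1}B^TV_x^*. \tag{11}$$

Assumption 2 (see [4]). The solution to (9) is smooth, which allows us to invoke, informally, the Weierstrass higher-order approximation theorem. Then, there exists a complete independent basis $\varphi(x) \in \mathbb{R}^N$ such that the solution $V^*(x)$ to (9) is uniformly approximated.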
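The stationarity step behind (10) and (11) can be spelled out as follows. This is a sketch under two assumptions: that the Hamiltonian carries the road term, written here as $B_r\dot z_r$, and that $R$ is symmetric positive definite so that $R^{-1}$ exists.

```latex
\begin{aligned}
H(x,u,V_x^*) &= x^TQx + u^TRu + V_x^{*T}\,(Ax + Bu + B_r\dot z_r),\\
\frac{\partial H}{\partial u} &= 2Ru + B^TV_x^* = 0
  \;\Longrightarrow\; u^* = -\tfrac{1}{2}R^{-1}B^TV_x^*,\\
u^{*T}Ru^* &= \tfrac{1}{4}V_x^{*T}BR^{-1}B^TV_x^*,\qquad
V_x^{*T}Bu^* = -\tfrac{1}{2}V_x^{*T}BR^{-1}B^TV_x^*,\\
0 &= x^TQx + V_x^{*T}\,(Ax + B_r\dot z_r) - \tfrac{1}{4}V_x^{*T}BR^{-1}B^TV_x^*.
\end{aligned}
```

Because $\partial^2 H/\partial u^2 = 2R > 0$, the stationary point is the unique minimizer, which is consistent with the uniqueness assumption stated above.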
From (10), one can see that the optimal control $u^*$ depends on the optimal cost function $V^*(x)$. However, it is difficult to solve the HJB equation (11) for $V^*(x)$ because of the uncertain parameters in matrix $A$ and the unknown road displacement input. The usual method is to obtain an approximate solution via a critic NN as in [12,13]. Hence, from Assumption 2, it is justified to assume that there exist weights $W$ such that the value function $V^*(x)$ is approximated as

$$V^*(x) = W^T\varphi(x) + \varepsilon, \tag{12}$$

where $W \in \mathbb{R}^N$ is the nominal weight vector, $\varepsilon$ is the approximation error, $\varphi(x) \in \mathbb{R}^N$ is the activation function, and $N$ is the number of neurons.
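Then, substituting (12) into (10), one obtains

$$u^* = -\tfrac{1}{2}R^{-1}B^T\left(\nabla\varphi^T(x)W + \nabla\varepsilon\right), \tag{13}$$

where $\nabla\varphi(x) = \partial\varphi(x)/\partial x$ and $\nabla\varepsilon = \partial\varepsilon/\partial x$. In practical implementation, the critic NN is represented as $\hat V(x) = \hat W^T\varphi(x)$, which gives the control

$$u = -\tfrac{1}{2}R^{-1}B^T\nabla\varphi^T(x)\hat W, \tag{14}$$

where $\hat W$ is the estimate of the nominal weight $W$.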
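To make the step from the critic weights to the control input concrete, the sketch below evaluates $u = -\tfrac{1}{2}R^{-1}B^T\nabla\varphi^T(x)\hat W$ for a scalar input weight. The quadratic basis $\varphi(x) = [x_ix_j]_{i \le j}$, the function names, and the two-state numerical example are assumptions made for illustration; they are not the activation function or the weights used in the simulations.

```python
def quadratic_basis_jacobian(x):
    """Jacobian of phi(x) = [x_i * x_j for i <= j]; one row per basis term."""
    n = len(x)
    rows = []
    for i in range(n):
        for j in range(i, n):
            row = [0.0] * n
            row[i] += x[j]   # d(x_i x_j)/dx_i
            row[j] += x[i]   # d(x_i x_j)/dx_j (gives 2*x_i when i == j)
            rows.append(row)
    return rows


def critic_control(x, w_hat, B, r_scalar):
    """u = -(1/2) R^{-1} B^T grad_phi(x)^T w_hat for a scalar weight R."""
    jac = quadratic_basis_jacobian(x)
    n = len(x)
    # Gradient of the approximate cost V_hat = w_hat^T phi(x) w.r.t. x.
    grad_v = [sum(w * row[k] for w, row in zip(w_hat, jac)) for k in range(n)]
    return -0.5 / r_scalar * sum(b * g for b, g in zip(B, grad_v))


# Hypothetical two-state check: V_hat = x^T P x with P = [[2, 1], [1, 3]]
# corresponds to weights [2, 2, 3] on the basis [x1^2, x1*x2, x2^2].
u_example = critic_control(x=[1.0, -2.0], w_hat=[2.0, 2.0, 3.0],
                           B=[0.0, 1.0], r_scalar=0.5)
```

For a cost of the form $x^TPx$ the expression reduces to the familiar linear feedback $-R^{-1}B^TPx$, which is what the example reproduces.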
Remark 3. There are many ways to estimate $W$, such as the least-squares [4] or gradient method [14,15]. It has been proved in [16][17][18] that an adaptive estimation method that exploits the parameter information can greatly improve the convergence speed compared with conventional estimation methods driven by the observer error. Inspired by these facts, a novel robust estimation method for $W$ is presented in the following analysis.
Substituting (12) into (9), one obtains

$$\Theta = -W^T\Xi - \varepsilon_{\mathrm{HJB}}, \tag{15}$$

where $\Xi = \nabla\varphi(x)(Ax + Bu)$, $\Theta = x^TQx + u^TRu$, and $\varepsilon_{\mathrm{HJB}} = W^T\nabla\varphi(x)B_r\dot z_r + \nabla\varepsilon^T(Ax + Bu + B_r\dot z_r)$ denotes the Bellman error caused by the approximation of the cost function, with $B_r\dot z_r$ the road-disturbance term of (3).
The filtered versions of the two signals in (15) and of the Bellman error $\varepsilon_{\mathrm{HJB}}$ are defined in (16), which introduces a filter parameter. It should be noted that the fictitious filtered Bellman error is used only for analysis.
Then, an auxiliary regression matrix in $\mathbb{R}^{N \times N}$ and an auxiliary vector in $\mathbb{R}^N$ are defined in (17), using the positive constant defined in (16).
The solution to (17) is given in (18). Another auxiliary vector $M$ is defined in (19). Finally, the adaptive law for $\hat W$ is given by (20), which contains the learning gain.
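The idea of a weight update driven by the parameter error can be illustrated with one filtered-regressor estimator of this kind. The sketch below is not a restatement of (16) to (20): it assumes first-order filters with time constant `k`, leaky integrators with forgetting factor `ell`, the auxiliary vector $M = P\hat W + Q$, and the update $\dot{\hat W} = -\Gamma M$. The names `k`, `ell`, `gamma`, the toy regressor, the noise-free relation $\Theta = -W^T\Xi$, and the two-element weight vector are all hypothetical.

```python
import math


def run_estimator(w_true, k=0.01, ell=1.0, gamma=50.0, dt=1e-3, t_end=20.0):
    """Estimate a 2-element w_true from theta = -w_true^T xi (Euler steps)."""
    n = len(w_true)
    xi_f = [0.0] * n                    # filtered regressor
    theta_f = 0.0                       # filtered scalar signal
    P = [[0.0] * n for _ in range(n)]   # auxiliary regression matrix
    Q = [0.0] * n                       # auxiliary vector
    w_hat = [0.0] * n
    for step in range(int(round(t_end / dt))):
        t = step * dt
        xi = [math.sin(t), math.cos(2.0 * t)]   # toy exciting regressor
        theta = -sum(w * z for w, z in zip(w_true, xi))
        # M = P w_hat + Q equals P (w_hat - w_true) here, i.e. it carries
        # the parameter error directly.
        M = [sum(P[i][j] * w_hat[j] for j in range(n)) + Q[i]
             for i in range(n)]
        w_hat = [w_hat[i] - dt * gamma * M[i] for i in range(n)]
        P = [[P[i][j] + dt * (-ell * P[i][j] + xi_f[i] * xi_f[j])
              for j in range(n)] for i in range(n)]
        Q = [Q[i] + dt * (-ell * Q[i] + xi_f[i] * theta_f) for i in range(n)]
        xi_f = [xi_f[i] + dt / k * (xi[i] - xi_f[i]) for i in range(n)]
        theta_f = theta_f + dt / k * (theta - theta_f)
    return w_hat


w_est = run_estimator([1.5, -0.7])
```

In this noise-free toy case the estimate converges to the true weights once the regressor has excited both directions; with a nonzero residual, as with the Bellman error above, one would expect convergence only to a neighborhood of them.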
Theorem 4. For system (3) with the adaptive optimal control signal $u$ given in (14) and the adaptive law (20), the optimal control $u$ converges to a small bound around its ideal optimal solution $u^*$ in (13).
From the basic inequality $ab \le a^2\eta/2 + b^2/(2\eta)$ with $\eta > 0$, we can rewrite (25) as (26). The time derivative of (23) can be deduced from (3) and (8) as (27). From (26) and (27), the time derivative of $L$ is $\dot L = \dot L_1 + \dot L_2$ and satisfies an upper-bounding inequality. The steady state of the upper bound of (31) gives (32), where the bound depends on the critic NN approximation error. From (32), we know that the optimal control $u$ converges to a small bound around its ideal optimal solution $u^*$.
Remark 5. In this paper, a simplified ADP structure with a single critic NN approximator is proposed instead of the commonly used, more complex dual NN approximators (critic NN and actor NN). Moreover, the weight updating law of the critic NN is based on the parameter estimation error rather than on minimizing the residual Bellman error in the HJB equation by the least-squares or gradient methods, so that the weight error of the critic NN is guaranteed to converge to a residual set around zero at a faster rate.

Simulation Results and Discussions
In this section, numerical simulations are carried out for the sedan active suspension model presented in Section 2 to evaluate the effectiveness of the proposed ADP-based adaptive control method designed in Section 3. The main parameters are listed in Table 1. Comparative results for the passive suspension and the active suspension with two different control methods (i.e., the LQR method and the proposed ADP-based adaptive optimal control method) are presented in the following two cases.
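Case 1 (with bump road displacement). The suspension is excited by a bump road input as in [11], which is given by

$$z_r(t) = \begin{cases} \dfrac{a}{2}\left(1 - \cos\dfrac{2\pi V_0}{l}t\right), & 0 \le t \le \dfrac{l}{V_0},\\[4pt] 0, & t > \dfrac{l}{V_0}, \end{cases}$$

where $a$ and $l$ are the height and the length of the bump, respectively, and $V_0$ is the vehicle forward velocity. Here, we select $a = 0.1$ m, $l = 5$ m, and $V_0 = 60$ km/h.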
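A bump excitation of this kind can be sketched in a few lines. The code assumes the common half-cosine profile of height $a$ and length $l$ traversed at constant speed; the function and argument names are illustrative, and the default values are the ones selected above.

```python
import math


def bump_road(t, a=0.1, l=5.0, v0_kmh=60.0):
    """Road displacement z_r(t) in metres for an assumed half-cosine bump."""
    v0 = v0_kmh / 3.6                      # convert km/h to m/s
    if 0.0 <= t <= l / v0:
        return 0.5 * a * (1.0 - math.cos(2.0 * math.pi * v0 * t / l))
    return 0.0                             # flat road before and after the bump


# Sample the profile on a 1 ms grid over the first second.
profile = [bump_road(i * 1e-3) for i in range(1000)]
```

With these values the vehicle needs $l/V_0 = 0.3$ s to cross the bump, and the profile reaches its full height halfway through.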
For the LQR controller design, the weighting matrices $Q$ and $R$ have to be predetermined. For a fixed $Q$, a decrease in the values of $R$ will decrease the transition time and the overshoot but will increase the rise time and the steady-state error. When $R$ is kept fixed but $Q$ is decreased, the transition time and overshoot will increase but the rise time and steady-state error will decrease [19]. Based on this rule, $Q$ and $R$ are selected for the performance cost function (6) as $R = 2 \times 10^{-5}$ and $Q = \mathrm{diag}[10, 65, 1.8, 20]$. Then, we obtain the state feedback gain matrix from Matlab as $K = 10^4 \times [0.0166\ \ 0.5520\ \ {-5.5777}\ \ {-0.2564}]$.
The proposed ADP-based adaptive optimal control (14) with the updating law (20) for the critic NN (12) is simulated with its two design parameters set to 500 and 1500, together with the selected critic NN activation function vector.
Simulation results with two different sprung masses are presented in Figures 3 and 4. Compared with the passive suspension system, the active suspension systems with the LQR and the proposed ADP-based adaptive optimal control methods have lower peaks and less vibration. Meanwhile, one can see from Figures 3(a) and 4(a) that the proposed ADP-based adaptive optimal control method demonstrates smaller amplitude and faster transient convergence in terms of the suspension deflection and sprung mass acceleration than the LQR control method. Figures 3(b) and 4(b) further provide the simulation results of the suspension deflection and sprung mass acceleration with a varying sprung mass (from 250 kg to 350 kg). The proposed ADP-based adaptive optimal control method still shows improved performance compared with the LQR control method under the different sprung mass. Besides, Figure 3 also shows that the suspension working space falls within the acceptable range and thus satisfies the suspension space limitation requirement.
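The influence of the weights on an offline LQR design can be illustrated on a scalar analogue, which is not the four-state suspension design. For $\dot x = ax + bu$ with cost $\int (qx^2 + ru^2)\,dt$, the Riccati equation $2ap - b^2p^2/r + q = 0$ has the positive root used below, and the gain is $bp/r$. The function name and all numerical values are hypothetical.

```python
import math


def scalar_lqr(a, b, q, r):
    """Return (p, gain) for dx/dt = a x + b u with cost q x^2 + r u^2."""
    # Positive root of the scalar Riccati equation 2 a p - b^2 p^2 / r + q = 0.
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    gain = b * p / r                     # state feedback u = -gain * x
    return p, gain


a, b, q = -1.0, 1.0, 10.0
p_large_r, gain_large_r = scalar_lqr(a, b, q, r=1.0)
p_small_r, gain_small_r = scalar_lqr(a, b, q, r=0.1)

# Closed-loop poles a - b * gain for the two choices of r.
pole_large_r = a - b * gain_large_r
pole_small_r = a - b * gain_small_r
```

Reducing $r$ for a fixed $q$ raises the gain and moves the closed-loop pole further into the left half-plane. Once computed, the gain is fixed, which is the limitation of the offline design that the adaptive approach is intended to address.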
In order to ensure good road holding performance, the ratio between the tire dynamic load and the static load should be less than 1; that is, $|F_t + F_b|/((m_s + m_u)g) < 1$, where $F_t = k_t(z_u - z_r)$ and $F_b = b_t(\dot z_u - \dot z_r)$ are the elasticity force and damping force of the tire [20] and $g$ is the gravitational acceleration. Here, the tire damping $b_t$ is assumed to be negligible. From Figure 5, one can see that the ratio is always less than 1 for both the proposed ADP-based adaptive optimal control method and the LQR control method, which implies that road holding is guaranteed and thus the stability of the vehicle is ensured. Actuator output forces are presented in Figure 6, from which one can find that the proposed ADP-based adaptive optimal control method needs a smaller actuator force than the LQR control approach.
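The road holding criterion can be evaluated sample by sample as sketched below. The static load is taken here as the total weight $(m_s + m_u)g$, which is an assumption about the denominator; the function name and the numerical values are hypothetical.

```python
def tire_load_ratio(z_u, z_r, dz_u, dz_r, k_t, b_t, m_s, m_u, g=9.81):
    """Ratio of tire dynamic load to static load; below 1 means contact."""
    f_t = k_t * (z_u - z_r)        # tire elasticity force
    f_b = b_t * (dz_u - dz_r)      # tire damping force (zero if b_t = 0)
    return abs(f_t + f_b) / ((m_s + m_u) * g)


# Hypothetical operating point: 5 mm tire deflection, negligible tire damping.
ratio = tire_load_ratio(z_u=0.005, z_r=0.0, dz_u=0.1, dz_r=0.0,
                        k_t=190000.0, b_t=0.0, m_s=300.0, m_u=40.0)
```

Applying the function to every sample of a simulated response and taking the maximum gives a single number to compare against the limit of 1.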
Furthermore, the root mean square (RMS) values of the responses in Figures 3 and 4 are listed in Table 2. All of the RMS values of the proposed ADP-based adaptive optimal control method are smaller than those of the LQR control method, which further shows the improved performance of the proposed method.
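For reference, an RMS index of a sampled response can be computed as in the following minimal sketch. The function name and the sample data are illustrative and are unrelated to the values in Table 2.

```python
import math


def rms(samples):
    """Root mean square of a non-empty sequence of samples."""
    if not samples:
        raise ValueError("rms() requires at least one sample")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# Hypothetical sprung mass acceleration samples in m/s^2.
acc_rms = rms([0.3, -0.4, 0.1, -0.2])
```

Because the samples are squared before averaging, a few large peaks raise the index more than many small oscillations do.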
Case 2 (with sinusoid road displacement). In this case, a sinusoid road displacement of 0.005 sin(5) is used and all other simulation parameters are the same as those in Case 1. Comparative simulation results in Figures 7 and 8 further validate the improved performance of the proposed ADP-based adaptive optimal control method compared with the LQR control method. It should be pointed out that the poor self-adaptive property of the LQR method leads to the small oscillation of the suspension deflection response and the sprung mass acceleration response, as shown in Figure 3 and Figure 4. Meanwhile, the smaller oscillation for the proposed ADP-based adaptive optimal control method may come from the Bellman error caused by the approximation of the value function, as shown in (15).
The reasons for the improved performance of the proposed ADP-based adaptive optimal control method in the aforementioned simulation results are analyzed as follows.
The LQR design process is based on the precise dynamic model (3) with the assumption that all the system parameters are time invariant. The optimal state feedback gain matrix is obtained by solving the Riccati equation offline and cannot be updated online in response to time-varying parameters and unknown road input. So, control performance may degrade and stability may even be lost. The proposed ADP-based adaptive optimal control approach provides a novel solution to the online optimal control of the uncertain system. Its feedback control law can be updated online under time-varying parameters such as the sprung mass and under unknown road displacement input. It could therefore be concluded that the self-adaptive property of the proposed ADP-based optimal control method provides a more effective solution for the controller design of active suspension systems and can greatly enhance passenger comfort, thus helping to achieve more comfortable and safer vehicles.

Conclusion
In this paper, an ADP-based adaptive optimal controller for active suspension systems considering the performance requirements has been proposed. Compared with the commonly used LQR method, the proposed ADP-based adaptive optimal control method, owing to its self-adaptive property, shows improved performance in terms of the basic tasks of suspension control, namely ride comfort, road holding, and suspension space limitation. This performance improvement will increase passenger comfort and at the same time enhance vehicle handling and stability on the road. In future work, a full-vehicle suspension controller that considers state constraints and actuator saturation limits will be designed.

Figure 2: Schematic of the proposed controller system.