Real-Time Energy Management Strategy Based on Driver-Action-Impact MPC for Series Hybrid Electric Vehicles



Introduction
Global concerns such as the energy crisis, air pollution, and the associated health problems pose severe challenges to the development of vehicles [1]. For traditional fuel vehicles, engine emission requirements have become more stringent in recent years [2], which puts considerable time and cost pressure on their research and development. For pure electric vehicles, factors such as high initial cost, short driving range, and long charging time have highlighted their limitations [3]. Hybrid electric vehicles (HEVs) are considered a way to cope with the challenges of the energy crisis and pollution by combining the advantages of traditional fuel vehicles and pure electric vehicles. A typical HEV employs an internal combustion engine (ICE), an energy storage source (ESS), electric machine(s), and inverter(s) [4].
Since hybrid electric vehicles have multiple energy sources and operating modes, the main challenges in their development are the coordination of multiple energy sources and converters and the power flow control of the mechanical and electrical paths. The energy management strategy (EMS), as the brain of a hybrid electric vehicle, plays a vital role in improving the energy efficiency and dynamic response of the vehicle, so it has become both a difficulty and a focus of HEV research in recent years. Existing EMSs can be roughly divided into rule-based strategies and optimization-based strategies [5], as shown in Figure 1.
Rule-based energy management strategies are widely used in practice because they are easy to develop and operate quite reliably [6]. The idea of rule-based strategies is to define the operating states of the hybrid system in advance and make the system operate according to the preset rules. These rules are generally based on heuristics, intuition, human expertise, and mathematical models. Rule-based controllers can be further subcategorized into deterministic rule-based strategies and fuzzy rule-based strategies [7]. The deterministic rule-based strategies can be subdivided into the thermostat (on/off) strategy, the power follower strategy, the modified power follower strategy, and the state machine strategy [8]. Deterministic rules are usually established from the MAP and efficiency curves of power components such as engines, motors, and power batteries. A series of such rules controls the mode switching and energy distribution so that the power components work in their high-efficiency regions as much as possible and the SOC remains relatively stable while the power demand is met.
Fuzzy rule controllers in general originate from deterministic rule-based controllers [9]. Because of its robustness and adaptability, the fuzzy logic system (FLS) is characterized as a powerful tool to cope with uncertain dynamics and unknown nonlinearities by using linguistic knowledge representation and the corresponding fuzzy rules [10]. Therefore, it is more suitable for complex nonlinear time-varying systems such as HEVs. Fuzzy strategies can be further categorized into conventional, adaptive [11,12], and predictive [13] strategies.
Optimization-based strategies are reported to achieve performance targets by optimizing a cost function that represents efficiency and emissions over a drive cycle, yielding globally optimal operating points [5]. Optimization-based strategies can be sorted into two categories: offline (global) optimization strategies and online (real-time) optimization strategies.
Offline (global) optimization means that the optimization problem is solved under the premise that the driving cycle and power requirements are known in advance. Commonly used algorithms include linear programming (LP), dynamic programming (DP), the genetic algorithm (GA), and particle swarm optimization (PSO) [9]. These methods can ensure the global optimality of the solution. However, because of the large amount of computation and the need to know the entire driving cycle in advance, they cannot be applied in real time in practice; instead, they are often used as a benchmark against which other control strategies are compared.
Online (real-time) optimization-based strategies reduce the global optimization problem to a succession of local optimization problems, thus reducing the associated computational burden [9]. Although the results are suboptimal compared with global optimization, online optimization eliminates the need for future driving information, so it can be implemented in real time. Online optimization methods include the equivalent consumption minimization strategy (ECMS), artificial neural networks (ANN), robust control, and model predictive control (MPC) [14][15][16]. In addition, online fault diagnosis, which significantly increases system reliability by tracking down faults in the system at run time, has been widely applied in safety-critical applications such as hybrid vehicle automotive power systems [17][18][19][20].
In recent years, model predictive control (MPC), as one of the most effective methods for multivariable constrained control problems, has attracted great attention from scholars and has been widely used in the automotive industry. In MPC, future driving information is a prerequisite, and the performance and practicality of MPC depend heavily on the accuracy of the forecast of that information. Most current prediction models use historical vehicle information to predict the future driving velocity [21][22][23][24], which is easy to implement and analyze. However, since the transportation system is a comprehensive system composed of people, vehicles, roads, and the environment, the driver plays a key role as the intermediate link between the complex environment and the vehicle. Prediction models built only on historical velocity data cannot reflect the impact of the driver and the environment and are therefore less accurate. For this reason, a real-time energy management strategy based on driver-action-impact MPC is proposed in this paper. The main contributions of this paper are as follows: (1) a long short-term memory (LSTM) neural network model, trained on traffic data derived from a VR-based driving simulator, is adopted to predict the future driving speed from driver action information and the current driving velocity; (2) combined with the LSTM velocity prediction model, a real-time MPC-based energy management strategy is established, and a better fuel consumption result is obtained compared with the rule-based strategy; (3) the effectiveness of the real-time computation of the EMS is validated on a hardware-in-the-loop test platform.
The paper is organized as follows. In Section 2, a velocity prediction model based on an LSTM neural network is established and trained. In Section 3, the mathematical model and optimization problem of the series HEV are presented, an EMS based on driver-action-impact MPC is introduced, and the simulation results are shown and analyzed. In Section 4, a hardware-in-the-loop test is implemented to verify the control efficiency of the real-time system. Finally, the conclusions are given in Section 5.

Velocity Prediction Model Based on LSTM Networks
2.1. Structure of LSTM Networks. As a basic neural network, the multilayer perceptron (MLP) has been widely used in vehicle velocity prediction because of its high efficiency in processing data [25,26]. However, when an MLP is trained, the correction of the network weights and thresholds depends only on the current set of data and not on the previous set, so a traditional MLP cannot reflect the continuity of the data in time-series problems such as velocity prediction. As a special recurrent neural network (RNN), the long short-term memory (LSTM) network avoids the long-term dependency problem of the standard RNN and has achieved great success in time-series problems such as speech recognition, language modeling, translation, trajectory prediction, and image and video processing [27][28][29][30][31][32][33].
Since Hochreiter and Schmidhuber proposed the LSTM cell in 1997, LSTM networks have been modified and popularized by many researchers [34]. Most LSTM networks can be divided into two broad categories: LSTM-dominated networks and integrated LSTM networks, and the classification of popular structures of these two networks is shown in Figure 2. LSTM-dominated networks focus on optimizing the connections between inner LSTM cells, and integrated LSTM networks mainly pay attention to integrating the advantageous features of different components [30].
Although there are many different networks, the basic structure of a standard LSTM cell is almost the same, as shown in Figure 3. The black arrows indicate vector transmission, and each yellow box represents a neural network layer; an LSTM cell has four interacting neural network layers. The pink circles represent pointwise operations, such as vector addition and multiplication. σ and tanh represent the sigmoid and hyperbolic tangent functions, respectively, which are defined in equation (1).
An LSTM cell contains three gates: an input gate, an output gate, and a forget gate. The key element of the LSTM is the cell state $C$. The forget gate layer decides what information in the previous cell state $C_{t-1}$ is discarded, the input gate layer decides which values in the cell state are updated, and the output gate layer controls the output $h_t$ based on the current cell state $C_t$ and the input $x_t$. By using these three gates, the LSTM can forget or add information in the cell state, thus avoiding the vanishing gradient problem of the traditional RNN. The activation functions are defined as
\[
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}. \tag{1}
\]
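The gate mechanism described above can be illustrated with a minimal sketch of one time step of a standard LSTM cell. This follows the common textbook formulation, which is assumed here to match Figure 3; the function name `lstm_step`, the weight layout `W`/`b`, and the toy values are illustrative and not taken from the paper.

```python
import math

def sigmoid(x):
    # Logistic function from equation (1)
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell (textbook form, assumed layout).

    W[gate] holds one weight row per hidden unit over the concatenation
    [h_prev, x]; b[gate] holds one bias per hidden unit.
    Gates: 'f' forget, 'i' input, 'g' candidate state, 'o' output.
    """
    z = h_prev + x  # list concatenation [h_{t-1}, x_t]

    def affine(gate):
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(W[gate], b[gate])]

    f = [sigmoid(a) for a in affine("f")]    # what to discard from C_{t-1}
    i = [sigmoid(a) for a in affine("i")]    # which values to update
    g = [math.tanh(a) for a in affine("g")]  # candidate values
    o = [sigmoid(a) for a in affine("o")]    # what to expose as h_t
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

# Toy check with one hidden unit and all-zero weights: every gate is 0.5
# and the candidate is 0, so the cell state is simply halved.
W0 = {gate: [[0.0, 0.0]] for gate in "figo"}
b0 = {gate: [0.0] for gate in "figo"}
h1, c1 = lstm_step([1.0], [0.0], [2.0], W0, b0)
```

The additive update of `c` is what lets information and gradients pass through many time steps largely unchanged when the forget gate stays close to 1.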

Acquisition of Training Dataset.
The dataset is the basis for training neural networks, and whether the data are used reasonably directly determines the quality of the prediction results. However, real road testing is time-consuming and costly; moreover, there are certain safety risks while the vehicle is still in the development stage. A driving simulator is considered a feasible solution to eliminate risk, reduce cost, and accelerate vehicle development [35]. In this paper, a virtual reality (VR)-based driving simulator is used to obtain operation data for different drivers, roads, and environments. The driving simulator was previously developed by the Vision Simulation Laboratory of Beijing Institute of Technology (BIT) [36], and the system architecture is illustrated in Figure 4.
Since the system was established, a large amount of driving data has been collected for different drivers and different environments, as shown in Figure 5. These data include vehicle state data, such as velocity, acceleration, steering radius, and steering angle speed, and driver manipulation data, such as the pedal signals and the steering wheel angle signal.
The collected data are classified and recorded according to different driving habits and environmental conditions.

LSTM Velocity Prediction Model Based on Driver's Action.
To reflect the influence of the driver's actions on the vehicle's velocity, the opening and change rate of the accelerator and brake pedals, together with the current vehicle velocity, are selected as the inputs of the LSTM network, and the outputs are the predicted velocity values at the next five moments, as shown in Figure 6.
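As a rough sketch of how such input/output pairs could be assembled from logged signals, the helper below builds five input features and a five-step velocity target for each time index. The function name `make_samples`, the backward-difference estimate of the pedal change rate, and the sampling step `dt` are assumptions made for illustration; the paper does not specify them.

```python
def make_samples(acc, brk, vel, dt=1.0, horizon=5):
    """Build (features, target) pairs from equally sampled signals.

    acc, brk: accelerator and brake pedal openings; vel: vehicle velocity.
    Features at step k: pedal openings, their change rates (backward
    difference, an assumption), and the current velocity.
    Target: the velocities at the next `horizon` steps.
    """
    samples = []
    for k in range(1, len(vel) - horizon):
        features = [
            acc[k],
            (acc[k] - acc[k - 1]) / dt,
            brk[k],
            (brk[k] - brk[k - 1]) / dt,
            vel[k],
        ]
        target = vel[k + 1:k + 1 + horizon]
        samples.append((features, target))
    return samples

# Toy signals: steadily increasing accelerator opening, no braking.
acc_log = [0.1 * k for k in range(10)]
brk_log = [0.0] * 10
vel_log = [float(k) for k in range(10)]
pairs = make_samples(acc_log, brk_log, vel_log)
```

Each pair has five input features and five target values, which matches the input and output sizes stated for the network.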
After extensive comparisons of convergence speed and accuracy, the conventional LSTM network model shown in Figure 3 is used in this paper. The basic parameters of this model are as follows: input_size is the number of expected features in the input vector, output_size is the size of the output vector, hidden_size is the number of nodes in each LSTM hidden layer, and num_layers is the number of recurrent layers. Because the numbers of inputs and outputs are both 5, input_size and output_size are set to 5. The other two parameters are selected by repeated comparison of accuracy; finally, hidden_size and num_layers are set to 50 and 1, respectively. The backpropagation through time (BPTT) algorithm is used to train the network, as shown in Figure 7 [37]. The Adam optimizer is used to optimize the model with a learning rate of 0.001 [38]. The number of iterations is set to 500.
In view of the particularity of the vehicles studied in this paper, the driving data of different drivers under the conditions of intense and steady driving on urban roads and suburban roads are chosen to form the network training set, as shown in Figure 8.
After training, a standard HWFET driving cycle is used to verify the accuracy of the velocity prediction model. The results are shown in Figure 9. The average error per second and the root mean square error (RMSE) of the predictions of the LSTM network and a three-layer MLP network are compared in Table 1.
It can be seen that the prediction accuracy of the LSTM network is significantly better than that of the three-layer MLP network. This result shows that the established LSTM velocity prediction model based on the driver's action is feasible for vehicle velocity prediction, which lays the foundation for predictive control.
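The two accuracy measures used in this comparison can be computed as in the short sketch below. It assumes that the "average error per second" is the mean absolute error over samples taken once per second; that interpretation, the function names, and the toy numbers are illustrative and not taken from the paper.

```python
import math

def mean_abs_error(pred, actual):
    # Average absolute deviation between predicted and actual velocity
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)

def rmse(pred, actual):
    # Root mean square error: penalises large deviations more strongly
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))

# Toy example: one large miss out of four samples.
predicted = [1.0, 2.0, 3.0, 4.0]
measured = [1.0, 2.0, 3.0, 8.0]
mae_value = mean_abs_error(predicted, measured)
rmse_value = rmse(predicted, measured)
```

In this toy case the RMSE is twice the mean absolute error, which illustrates why reporting both helps distinguish many small errors from a few large ones.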

Real-Time EMS Based on Driver-Action-Impact MPC for Series HEV
3.1. Series HEV Powertrain Model. In this paper, a series HEV composed of an engine-generator set (EGS), a battery pack, and two drive motors is studied. The powertrain configuration is shown in Figure 10, and the specifications of the system are given in Table 2. The engine and generator are connected by a damper, and the EGS serves as the main energy source of the vehicle. As the only energy storage device, the battery pack is connected to the DC bus through a DC/DC converter. The vehicle control unit (VCU) with integrated inverters is used to control the power components, enabling them to operate with higher efficiency while meeting power requirements. The two motors drive the wheels on both sides through reducers.
Ignoring the complex dynamic characteristics of the power components, the mathematical model of the powertrain is established based on the balance of power. In this model, $P_{req}$ is the required power of the powertrain, $P_{batt}$ is the power of the battery pack, $P_{gen}$ is the output power of the EGS, $P_{mot}$ is the total power of the two motors, $P_{eng}$ is the engine power, and $T_g$ and $n_e$ are the torque of the generator and the rotation speed of the engine, respectively.
$P_{req}$ comes from the driver's intention. Based on the velocity prediction model in Section 2, the required power is calculated from the predicted velocity through the longitudinal force balance of the vehicle, where $F_f(t)$ is the traction force, $P_f(t)$ is the traction power, $F_j(t)$ is the acceleration resistance, $F_\alpha(t)$ is the air resistance, $F_r(t)$ is the rolling resistance, $v(t)$ is the velocity of the vehicle, $\delta$ is the conversion coefficient of the rotating mass, $m$ is the vehicle mass, $C_D$ is the air resistance coefficient, $\rho$ is the air density, and $A$ is the frontal area of the vehicle.
The state of charge (SOC) of the battery is a very important variable for hybrid electric vehicles. To reflect the dynamic response characteristics of the battery pack while meeting the requirement of real-time computation, the internal resistance model is adopted in this paper. In the equations for the SOC and $P_{batt}$, $\dot{SOC}$ is the change rate of the SOC, $I_{batt}$ is the current of the battery pack, $Q_{batt}$ is the capacity of the battery pack, $U_{ocv}$ is the open-circuit voltage, and $R_{int}$ is the internal resistance of the battery pack.
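The quantities defined above can be tied together with a small sketch that uses the standard textbook forms of the longitudinal force balance and the internal resistance battery model. These forms are assumed to correspond to the paper's equations. The rolling resistance coefficient `f_roll`, the gravitational constant `G`, the sign convention (positive power and current mean discharge), and all numeric values are illustrative assumptions.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed constant)

def required_power(v, a, m, delta, c_d, rho, area, f_roll):
    """Traction power P_f = F_f * v on a flat road (assumed textbook form).

    F_j = delta * m * a              acceleration resistance
    F_alpha = 0.5 * C_D * rho * A * v^2   air resistance
    F_r = m * g * f_roll             rolling resistance (f_roll is assumed)
    """
    f_j = delta * m * a
    f_alpha = 0.5 * c_d * rho * area * v ** 2
    f_r = m * G * f_roll
    return (f_j + f_alpha + f_r) * v

def battery_current(p_batt, u_ocv, r_int):
    """Solve P_batt = U_ocv * I - I^2 * R_int for the smaller root I."""
    disc = u_ocv ** 2 - 4.0 * r_int * p_batt
    if disc < 0.0:
        raise ValueError("requested battery power exceeds the model limit")
    return (u_ocv - math.sqrt(disc)) / (2.0 * r_int)

def soc_rate(p_batt, u_ocv, r_int, q_batt):
    """SOC change rate, dSOC/dt = -I_batt / Q_batt (Q_batt in ampere-seconds)."""
    return -battery_current(p_batt, u_ocv, r_int) / q_batt

# Illustrative numbers only (not the vehicle in Table 2).
p_req = required_power(v=20.0, a=0.5, m=1500.0, delta=1.1,
                       c_d=0.3, rho=1.2, area=2.0, f_roll=0.01)
i_batt = battery_current(2990.0, u_ocv=300.0, r_int=0.1)
d_soc = soc_rate(2990.0, u_ocv=300.0, r_int=0.1, q_batt=36000.0)
```

The smaller root of the quadratic is chosen because it is the physically meaningful operating point: it gives zero current at zero power, whereas the larger root corresponds to dissipating most of the power in the internal resistance.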

Problem Formulation for MPC.
The fundamental principle of MPC is to obtain the optimal control variables over a finite receding horizon through a predictive model, rolling optimization, and feedback compensation [39]. The predictive model is built from the powertrain model developed in the previous section. The state variable $x$, control input $u$, input disturbance $v$, and output variable $y$ are defined for this problem, where $\dot{m}_{fuel}$ denotes the fuel consumption rate, which is obtained from the look-up table of the engine. The constraints include the engine speed limits $n_{e,\min} \le n_e \le n_{e,\max}$.
In optimization problems, whether the cost function is reasonable directly determines the quality of control. For the series HEV studied in this paper, the core of the EMS is to minimize the fuel consumption and maintain the stability of the battery SOC while meeting the power demands and constraints. In addition, to limit the fluctuation of the EGS power during the optimization process, the change rates of the control variables are limited. For practical calculation, the cost function is discretized, subject to output constraints of the form $y_{\min} \le y(k) \le y_{\max}$, $k = 0, 1, \ldots, N-1$, where $N$ is the prediction horizon length; $Q$, $Z$, and $R$ are the penalty weights of the states, inputs, and outputs, respectively; and $(k+i|k)$ denotes the predicted value at the $i$-th step after the current sampling time $k$.
The prediction model of this system is linearized and discretized using the first-order Taylor formula, as shown in equation (11). The trajectory of the future states is then obtained from the discrete model, as in equation (12).
Figure 10: Configuration of the series HEV.
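The linearization and rollout steps can be illustrated on a deliberately reduced example: a single-state model in which the SOC is the state and the battery power is the input. This is a sketch under stated assumptions and not the paper's full prediction model; the internal resistance battery equations, the sampling time `ts`, and all numeric values are illustrative.

```python
import math

def soc_next(soc, p_batt, ts, u_ocv, r_int, q_batt):
    # Nonlinear one-step SOC update with an internal resistance battery model
    i_batt = (u_ocv - math.sqrt(u_ocv ** 2 - 4.0 * r_int * p_batt)) / (2.0 * r_int)
    return soc - ts * i_batt / q_batt

def linearize(p0, ts, u_ocv, r_int, q_batt):
    """First-order Taylor expansion about the operating power p0.

    Returns (a, b, d) such that soc(k+1) ~= a * soc(k) + b * p(k) + d.
    Uses dI/dP = 1 / sqrt(U_ocv^2 - 4 * R_int * P) evaluated at p0.
    """
    root = math.sqrt(u_ocv ** 2 - 4.0 * r_int * p0)
    i0 = (u_ocv - root) / (2.0 * r_int)
    b = -ts / (q_batt * root)
    d = -ts * i0 / q_batt - b * p0
    return 1.0, b, d

def predict(soc0, powers, a, b, d):
    # Roll the linear discrete model forward over the prediction horizon
    trajectory = []
    soc = soc0
    for p in powers:
        soc = a * soc + b * p + d
        trajectory.append(soc)
    return trajectory

params = dict(ts=1.0, u_ocv=300.0, r_int=0.1, q_batt=36000.0)
a_lin, b_lin, d_lin = linearize(2990.0, **params)
soc_traj = predict(0.6, [2990.0, 3500.0, 3500.0], a_lin, b_lin, d_lin)
exact_step = soc_next(0.6, 2990.0, **params)
```

At the operating point the linear model reproduces the nonlinear step, and nearby inputs give only a small error, which is what makes the resulting optimization problem quadratic in the inputs.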

Complexity 9
By inserting equation (12) into the original objective function, equation (10), and ignoring the constant term, the original optimization problem can be transformed into a standard quadratic programming (QP) problem, where the Hessian matrix $H$ is symmetric and positive definite or positive semidefinite, $P$ is the gradient vector, $I_{N \times N}$ is an $N \times N$ identity matrix, and $U_{\max}$ and $U_{\min}$ are column vectors composed of the upper and lower limits of the control variables, respectively. The QP problem is solved by the barrier method, a particular interior-point algorithm, which applies Newton's method to a sequence of equality-constrained problems or to a sequence of modified versions of the KKT conditions [40]. The explicit expression of the barrier method is not reported here for the sake of brevity.
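Since the barrier method is only named in the text, the following sketch shows the idea on the smallest possible case: a one-dimensional QP with box constraints. It is a generic log-barrier scheme with damped Newton centering steps and is not the solver used in the paper. The function name, the parameters `t0`, `mu`, and `tol`, and the test values are illustrative assumptions.

```python
import math

def barrier_qp_1d(h, p, u_min, u_max, t0=1.0, mu=10.0, tol=1e-8):
    """Minimise 0.5*h*u^2 + p*u subject to u_min <= u <= u_max (h > 0).

    Log-barrier method: for increasing t, minimise
    t*f(u) - log(u_max - u) - log(u - u_min) with damped Newton steps.
    """
    def centering_cost(u, t):
        return (t * (0.5 * h * u * u + p * u)
                - math.log(u_max - u) - math.log(u - u_min))

    u = 0.5 * (u_min + u_max)  # strictly feasible starting point
    t = t0
    while 2.0 / t > tol:       # two constraints: duality gap bound is 2/t
        for _ in range(50):
            grad = t * (h * u + p) + 1.0 / (u_max - u) - 1.0 / (u - u_min)
            hess = t * h + 1.0 / (u_max - u) ** 2 + 1.0 / (u - u_min) ** 2
            if grad * grad / hess < 1e-12:  # Newton decrement small: centred
                break
            step = -grad / hess
            f0 = centering_cost(u, t)
            s = 1.0
            # Backtracking: stay strictly feasible and require sufficient decrease
            while s > 1e-12:
                cand = u + s * step
                if (u_min < cand < u_max and
                        centering_cost(cand, t) <= f0 + 0.25 * s * grad * step):
                    u = cand
                    break
                s *= 0.5
            else:
                break  # no acceptable step at working precision
        t *= mu
    return u

u_interior = barrier_qp_1d(2.0, -2.0, 0.0, 3.0)   # unconstrained minimum at 1
u_upper = barrier_qp_1d(2.0, -10.0, 0.0, 3.0)     # unconstrained minimum at 5
```

In the MPC problem the same scheme operates on the full vector of control moves, with the scalar derivative and curvature replaced by the gradient vector and the Hessian matrix.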

Real-Time EMS Based on Driver-Action-Impact MPC.
In this section, a real-time energy management strategy based on driver-action-impact model predictive control is established, the framework of which is shown in Figure 11. The EMS is mainly composed of two parts: a velocity prediction module and an MPC control module. The velocity prediction module uses the pedal signals from the driver and the actual vehicle velocity signal to predict the future power demands through an LSTM neural network. The MPC control module solves the QP problem mentioned above and outputs the optimized control variables to the underlying controllers of the power components. In MPC, a sequence of control variables is obtained through optimization at the current sample time; rolling optimization and feedback compensation are achieved by applying only the first control variable to the vehicle and repeating the process at the next sample time.
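The receding-horizon principle described in this section can be sketched with a toy loop. The plant (`x + u`), the stand-in planner `plan`, and all numbers are hypothetical placeholders for the vehicle, the LSTM prediction, and the QP solver; only the structure of the loop is meant to reflect the strategy.

```python
def plan(x, ref, horizon, u_max):
    """Hypothetical stand-in for the optimiser: bounded moves toward ref."""
    sequence = []
    for _ in range(horizon):
        u = max(-u_max, min(u_max, ref - x))
        sequence.append(u)
        x += u  # predicted state under the toy model x(k+1) = x(k) + u(k)
    return sequence

def run_receding_horizon(x0, ref, steps, horizon=5, u_max=0.1):
    x = x0
    history = [x]
    for _ in range(steps):
        u_seq = plan(x, ref, horizon, u_max)  # optimise from the measured state
        x = x + u_seq[0]                      # apply only the first control move
        history.append(x)                     # feedback: next plan starts here
    return history

states = run_receding_horizon(x0=0.0, ref=0.35, steps=6)
```

Replanning from the measured state at every step is what provides the feedback compensation: prediction errors made at one step are corrected when the horizon is shifted forward.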

Simulation Results of the EMS.
To better evaluate the performance of the EMS, computer simulations are carried out on two standard driving cycles: EUDC and UDDS. The velocity, the powers and torques of the main power components, the SOC of the battery pack, and the engine operating points are shown in Figure 12. It can be seen that the EMS based on driver-action-impact MPC follows the speed requirements well under both steady and intense driving conditions. The power and torque curves show the power and torque of each main power component. Under steady driving conditions, the EGS can meet the power demands of the motors with almost no need for additional battery power. When the demanded power changes suddenly, the battery acts as a buffer, preventing the output power of the EGS from being insufficient or excessive during the sudden power change. In the simulation, the reference value of the battery SOC is set to 0.6. The control strategy stabilizes the SOC around the set value of 0.6, except when the battery is charged during the final braking phase.
From the engine operating points at the bottom of Figure 12, it can be seen that the engine works near the optimal fuel consumption curve because the front and rear parts of the powertrain are decoupled, which enables efficient control of the engine. Compared with the rule-based EMS, the fuel consumption is reduced by 5.6% on average, resulting in better fuel economy.

Experiment Results
Hardware-in-the-loop (HIL) simulation is a technique for performing system-level testing of embedded systems in a comprehensive, cost-effective, and repeatable manner [41]. Herein, an HIL test platform, which includes a real vehicle control unit (VCU), a vehicle simulator, a CAN communication bus, and signal reading equipment, is built to evaluate the real-time performance of the EMS. The schematic diagram of the platform is shown in Figure 13.
The VCU, which was developed by the Key Laboratory of Vehicle Transmission at BIT for realistic vehicle control applications, carries the EMS proposed in Section 3.3. The rapid prototyping product OpenECU M220, designed by Pi Innovo, is used as the vehicle simulator, which receives the control signals from the VCU and outputs the state signals of all the power components through the CAN bus.
The Kvaser Leaf Light v2 is a high-speed USB interface for CAN, and a PC with the calibration software PiSnoop is used for data monitoring and online calibration of control parameters. A picture of the test platform is shown in Figure 14. The sampling interval is set to 0.01 s, and the results of the real-time experiment on the HIL test platform are shown in Figure 15. Compared with the results in Figure 12, the results of the hardware real-time calculation are basically consistent with the offline computer simulation results.
Figure 11: The framework of the real-time EMS based on driver-action-impact MPC.
In real road tests, the CAN bus induces time-varying delays when there are many communication nodes on the bus [42]. The test platform in this paper has fewer communication nodes than a real vehicle, so the delay can almost be ignored. The computation time of the algorithm in each step is an important criterion for the complexity of a control technique. In the HIL experiment, the output period of the control variables monitored by PiSnoop is stable at 8 ms, which shows that the designed EMS can meet the requirements of real-time computation.

Conclusions
In this paper, a real-time energy management strategy based on driver-action-impact MPC is proposed for series HEVs. In this approach, a long short-term memory (LSTM) neural network model, trained on traffic data derived from a VR-based driving simulator, is adopted to predict the future driving velocity. To develop the MPC-based strategy, a nonlinear optimization problem considering both the fuel consumption and the battery SOC is built, and the results of the EMS on different driving cycles are discussed. Compared with the rule-based strategy, the proposed EMS reduces the fuel consumption by 5.6%. To validate the efficiency of real-time calculation, the EMS is embedded into an HIL test platform, and the results show that the designed EMS can meet the requirements of real-time computation.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.