Machine Learning-Based Management of Hybrid Energy Storage Systems in e-Vehicles

In transportation systems based on e-vehicles, the energy demand is met by integrating renewable energy sources while maintaining the voltage profile and mitigating active and reactive power losses. A vehicle-to-grid optimization technique is used to ensure this integration. Minimum active and reactive power losses are achieved when e-vehicles are integrated with the renewable energy sources in a hybrid mode. A machine learning framework with nested learning is used to determine an optimal methodology for triggering vehicular movement and monitoring the battery state of charge (SoC). When the HEV operates, there is a high possibility of battery degradation, leading to loss of its capacity. To determine the optimal policy, the TD(λ) learning algorithm is incorporated. This algorithm is known to deliver high performance and a high convergence rate in a non-Markovian environment. The operation is simulated and the readings are recorded, with the aim of optimizing the total operating cost and reducing battery replacement. The results show that the battery replacement cost is higher for shorter drives and that the proposed work can increase the battery life by 21%. Similarly, the recordings indicate that the proposed work achieves a significant reduction of about 8%-10% in the operating cost when compared with the RL and rule-based policies.


Introduction
There is a rapid increase in energy demand by consumers across several applications. To meet this demand, distribution network operators have to maintain the voltage profile while reducing energy losses. Consumers are supplied with energy from centralized power plants through large transmission and distribution networks. Throughout this process, around 35% of the electricity is lost to transmission and distribution losses. By 2030, the electricity demand is estimated to rise to 900 GW, while the environmental pollution caused by the conventional energy sources that operate to meet this demand may rise to 59% [1]. In order to meet the demands of the consumers, renewable energy sources (RESs) are installed close to the load centers.
Along with the distribution and transmission losses, environmental pollution can also be reduced with appropriate integration [2]. At the consumer end, the green energy generation sources installed are termed RESs. Microhydro, wind turbines, solar photovoltaics, and so on are some of the types of RESs [3]. The system complexity increases with high losses, random energy consumption profiles, and different types of connected sources with the integration of the RES. Identification of objectives and integrated approaches are implemented with the integration of the RES. It is crucial to integrate the backup energy sources to increase the reliability of the RES [4]. e-Vehicles or battery stations can be considered as a backup energy source. During transportation, the excess energy from the e-vehicles can be supplied back to the grid, enabling better payoffs at the consumer end [5].
When energy storage facilities are not available, the self-consumption at the generation site increases by 20%-40% with the integration of additional devices, as the renewable generation and load profiles do not coincide with each other [6]. In Germany, over 34,000 decentralized solar energy storage systems were installed in the past three years. In the domestic sector, new photovoltaic plants are routinely installed to enable these applications [7]. During the past few years, despite the availability of several commercial energy storage devices such as supercapacitors, chemical hydrogen storage, and batteries, none of these devices has had an efficient tracking and management system. Often, a compromise has to be made in the choice of energy storage device, as none of the devices can meet all the requirements of the customer for any specific application, according to Daniel and Besenhard [8].
The high-power residential storage systems have a high energy density. When two or more energy storage systems are combined together, the coupling benefits the applications as one system complements the other based on the demand. An overall high efficiency is achieved by the management system design in such cases. Improper utilization of the available energy and facilities may result if the renewable energy allocation is not optimized. Hybrid energy storage systems (HESS) are formed by pairing two different storage devices. These devices are paired with each other and operate on a swap mode. It is primarily used in the building sector and several other applications. Coupling HESS with complementary characteristics is beneficial as the strengths of each device complement the other to optimize the system.
Multiple energy storage systems are interchangeably operated by the hybrid system thereby benefiting from the most efficient characteristics of each storage facility. An electric vehicle installed with photovoltaic components consisting of a vanadium redox flow battery as well as a solar lead-acid battery is considered as the HESS in this work. At a smart house premise, the power required to charge the e-vehicle with the integration of the vanadium redox flow battery and the solar lead-acid battery via PV installation is considered to be a commercially mature technology as it meets the load demand in an optimized manner. Low fluctuations are observed when lower grid interactions occur at a higher self-energy consumption rate at both the energy storage systems. This energy is termed the upper target. During deep discharges, intolerance and high current characteristics are observed with short cycle life in the lead-acid battery systems.
The power and capacity of the battery are independent of their sizes, making them easy to differentiate. Without causing any damage to the system, these batteries can perform deep discharge while maintaining the cycle durability of VRB-type batteries. It is possible to operate the battery in higher power ranges by replacing it with a powerful one or by increasing the capacity of the storage tanks. However, when compared to VRB, the lead-acid battery systems have a considerable efficiency rate, medium energy density, shorter reaction time, and financially favorable conditions. When compared to the lead-acid battery systems, the VRB has lesser efficiency and a higher price and can be called an immature technology [9]. Some of the key contributions of this proposed work include the following:

Literature Survey
In order to reduce the reliance on fossil fuels and adopt RESs, decentralization of electrical energy generation is a crucial step [10]. Over the past few years, there has been a 7% increase in wind power and 4% increase in solar PV globally. There has been a 13% increase in wind energy and 27% increase in the solar PV-based energy generation on average in the last five years [11,12]. RESs depend on weather constraints and are unpredictable, small in capacity, and complex. In conventional power systems, several issues and challenges with respect to high active and reactive power losses, voltage profile balancing, and network reliability are caused by these characteristics [13,14]. Hybrid backup energy source models and enhanced integration approaches are used by researchers to analyze the impact of the RES on the system. Some of the most commonly used energy backup sources are the battery banks and e-vehicles. They supply power to the grid during peak and emergency hours and increase reliability due to their interconnections. The system reliability is enhanced by researchers by considering battery banks as backup sources. Very few researchers have paid attention to using e-vehicles for this purpose. While overcoming the electric constraints, integration of e-vehicles into the grid is crucial. Reliability assessment and the advantages of integration of the RES are discussed in [15]. Reference [16] discusses the integration approaches and challenges of distribution of energy sources and the uncertainty model. For energy 2 Journal of Nanomaterials management, [17] discusses distributed energy resource optimization using an internet framework. Reference [18] discusses and analyzes the issues of unbalanced grids and voltage sag. The case studies with implementation in Western Australia, architecture development and application, and adaptive schemes for renewable energy source integration are discussed in [17] for overcoming the issues in the optimization of the RES. 
Improving the distribution network voltage profile, minimization of the utility cost, energy generation cost, tariff structuring, initial cost, carbon emission, line loading, real and reactive power losses, and other such multiple advantages rely on the proper allocation of the RES. Machine learning, neural network, particle swarm optimization, genetic algorithms, fuzzy logic controller, and other tools can be used for the optimization of voltage sensitivity analysis, voltage indexing, power loss sensitivity techniques, and other novel methods used for RES integration.
In order to integrate the RES into the grid, the roadmap is discussed by the authors in [19] whereas the limitations of integration of the RES into the grid are discussed by the authors in [20]. Various optimization techniques for the integration of e-vehicles and the grid are discussed by the authors of [21]. The reliability of the RES and integrated grid can be improved largely by identifying an appropriate vehicle-to-grid optimization approach. The currently used HESS systems cannot be optimized with conventional techniques. Several researchers are exploring and researching the control systems for this reason. The energy flow between the conventional battery and supercapacitor is managed with rule-based algorithms in Chatzakis et al. [22]. The threshold values are compared for parameters like battery output current and load demand for the application of the respective rule. The lithium-ion battery and LAB are used for the constitution of the HESS, which is controlled using the abovementioned techniques. The results are compared by Piao et al. [23]. When compared to first-order filtering, the rule-based approach termed "amplitude sharing algorithm" delivers better results. When the battery and fuel cell or battery and superconducting magnet are used as power storage systems, the power allocation is managed efficiently with the help of a fuzzy logic controller as observed in Zhang et al. [24] and Min et al. [25].
This control technique belongs to the rule-based algorithms, in which the most appropriate alternative is chosen from a set of rules designed by experts, without the need for complex mathematical knowledge. However, the difficulty of configuration increases with the complexity of the system [26]. The technique in Zhang et al. [24] focuses on the smooth operation of fuel cells rather than on exploiting the high-efficiency operating ranges of the storage systems. Because a group of rules has to be set and assigned before the management technique for the analyzed HESS is designed, generalization is weakened, and this approach is not considered within the scope of this paper. First-order filtering is the most commonly used HESS management technique in the existing literature. According to this technique, a conventional battery system is used in addition to a supercapacitor or another storage system to manage the high power fluctuations [22,27]. Along with several other parameters, the response time of the two energy storage systems is considered when designing the linear filter.
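As an illustration of the first-order filtering idea described above, the following minimal Python sketch splits a power demand series into a smooth part and a fluctuating part. It is not taken from any of the cited works. The function name split_power, the time constant tau, and the assignment of the smooth part to the battery and the residual to the supercapacitor are assumptions made for this example.

```python
def split_power(demand, dt, tau):
    """Split a power-demand series with a discrete first-order low-pass filter.

    demand: list of power samples (W)
    dt:     sample period (s)
    tau:    filter time constant (s), an assumed tuning parameter
    Returns (slow, fast): the filtered part and the residual fluctuations.
    """
    alpha = dt / (tau + dt)              # discrete low-pass coefficient
    filtered = demand[0] if demand else 0.0
    slow, fast = [], []
    for p in demand:
        filtered += alpha * (p - filtered)   # low-pass filtered demand
        slow.append(filtered)                # assumed: served by the battery
        fast.append(p - filtered)            # assumed: served by the supercapacitor
    return slow, fast


# Example: a step in demand from 0 W to 10 W
slow, fast = split_power([0.0, 0.0, 10.0, 10.0, 10.0], dt=1.0, tau=1.0)
```

At every sample the two parts add up to the original demand, and a larger tau shifts more of a sudden change onto the fast device. This is where the response time of the two storage systems would enter the filter design.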
In the current management system, the design is independent of the response time, and hence, the filtering techniques presented by Changhao et al. and Liu et al. [23,27] are unsuitable. Shema et al. [28] analyze a HESS project on the island of Pellworm in the North Sea. A cost-efficient and stable energy supply is achieved by exploiting the redox flow batteries and lithium-ion batteries used in the storage system of the renewable energy sources, namely, a 300 kWp wind farm and a 700 kWp PV park. The optimization approach can follow the mixed-integer linear programming technique. Linear programming algorithms are simple and easily comprehensible, but they lack generalization. When global solutions are claimed, this approach can be avoided. Hybrid techniques are often preferred in the existing literature, as combining several techniques overcomes several drawbacks. For example, two neural networks are combined with a low-pass filter by Xia et al. in [29] to decide the reference power percentage that must be allocated to each facility. The efficiency is worsened by the excessive partitioning of on-site generation or demand energy between the available storage devices, making this technique unfavorable. As explained, unlike the case designed here, low-pass filters are not applied. Chatzakis et al. in [22] conceptualized the combination of a filtering technique with a rule-based algorithm. However, the requirements mentioned in this topic cannot be fulfilled by this system.

Proposed Architecture
The proposed work is aimed at decreasing the HEV operating cost with respect to battery replacement and fuel cost. A machine learning framework with nested learning is used to determine an optimal methodology for triggering vehicular movement and monitoring the battery SoC level. Here, outer-loop adaptive learning to decrease battery replacement costs and inner-loop reinforcement learning to decrease the amount of fuel consumed are incorporated. Reinforcement learning is used in the inner loop because of the following considerations:
(1) The HEV energy management inner loop holds information on the current fuel consumption, power demand, and vehicle speed without retaining any prior information on the vehicle parameters.
(2) Different HEV operation modes are required as the battery charge level, power demand, and vehicle speed change during a driving trip. Based on the current state, the reinforcement learning agent takes different actions.
(3) Instead of decreasing the instantaneous fuel consumption at every time step, the inner-loop HEV energy management focuses on decreasing the total amount of fuel consumed during the driving trip. Similarly, reinforcement learning targets the optimization of the cumulative return instead of the immediate reward.
For optimal energy management, knowledge of the current reward and the current state of the system is required, without the need for prior information. Along with decreasing fuel usage, SoH degradation in the battery is taken into consideration, so that the inner-loop reinforcement learning also accounts for battery capacity fading. The system works such that the inner loop acts as an independent HEV energy management system that automatically decreases the operating cost. To determine the optimal SoC range, an adaptive learning methodology is used in the outer loop over several trips. The SoH battery degradation can be identified by observing the SoC range.
Hence, the outer loop is important in decreasing battery replacement in the e-vehicle. Moreover, using prior information about the average speed and the trip length of the driving trip, the outer loop can reduce the battery replacement cost.
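The nested structure described above can be pictured with the following Python skeleton. It is only a sketch under stated assumptions: the text does not specify the adaptive learning rule of the outer loop, so the running-average selection used here, the function name run_nested_learning, the callback run_inner_loop, and the step parameter are all hypothetical.

```python
def run_nested_learning(trips, soc_ranges, run_inner_loop, step=0.2):
    """Outer loop: choose an SoC range per trip, refine it from observed cost.

    trips:          iterable of (category, trip_data); the category stands for
                    prior trip information such as average speed and length
    soc_ranges:     candidate (soc_min, soc_max) tuples
    run_inner_loop: hypothetical callback that runs the inner-loop energy
                    management for one trip and returns its total cost
    step:           assumed step size of the running average
    """
    avg_cost = {}   # (category, soc_range) -> running average of trip cost
    choices = []
    for category, trip in trips:
        untried = [r for r in soc_ranges if (category, r) not in avg_cost]
        if untried:
            soc_range = untried[0]       # try every candidate range once
        else:
            soc_range = min(soc_ranges, key=lambda r: avg_cost[(category, r)])
        cost = run_inner_loop(trip, soc_range)   # one trip of the inner loop
        key = (category, soc_range)
        old = avg_cost.get(key, cost)
        avg_cost[key] = old + step * (cost - old)
        choices.append(soc_range)
    return avg_cost, choices
```

The point of the sketch is the division of labor: the inner loop is called once per trip and only reports a cost, while the outer loop works across trips and only decides the SoC range handed to the inner loop.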

SoH Estimation.
When the HEV operates, there is a high possibility of battery degradation, leading to loss of its capacity. In general, a battery is said to reach the end of its life when its capacity fading level reaches 20%-30%. The fading level Q_fade is expressed by equation (1), where Q_full represents the full charge capacity of the battery and Q_full^nom is the nominal value of Q_full. Similarly, the state of charge (SoC) is expressed by equation (2). Under continuous operation, the total capacity fading after M cycles is determined using equation (3). This holds when every charging/discharging cycle of the battery has the same average SoC and the same SoC swing. In practical scenarios, however, the battery does not adhere to a specific charging/discharging cycle pattern. Hence, a cycle-decoupling methodology is used to determine the pattern of battery charging/discharging. Using this technique, it is possible to calculate the total capacity fading of the battery.
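The quantities in this subsection can be illustrated with a short Python sketch. The formulas are assumptions based on the usual definitions, not a reproduction of the equations of this paper: capacity fading is taken as the fraction of nominal capacity lost, SoC as the stored charge q_bat relative to Q_full, and the total fading as the sum of per-cycle contributions. The per-cycle model toy_fade is a purely hypothetical placeholder.

```python
def capacity_fade(q_full, q_full_nom):
    """Assumed definition: fraction of the nominal full-charge capacity lost."""
    return (q_full_nom - q_full) / q_full_nom


def state_of_charge(q_bat, q_full):
    """Assumed definition: stored charge relative to full-charge capacity."""
    return q_bat / q_full


def total_fade(cycles, fade_per_cycle):
    """Sum the fading of decoupled cycles, each given as (soc_avg, soc_swing)."""
    return sum(fade_per_cycle(avg, swing) for avg, swing in cycles)


def reached_end_of_life(q_full, q_full_nom, threshold=0.2):
    """End of life once fading reaches the threshold (20%-30% in the text)."""
    return capacity_fade(q_full, q_full_nom) >= threshold


def toy_fade(soc_avg, soc_swing):
    # Hypothetical placeholder, NOT a validated degradation model:
    # deeper swings at a higher average SoC fade the battery faster.
    return 1e-5 * soc_swing * (1.0 + soc_avg)


# Example: 1000 decoupled cycles with an average SoC of 0.6 and a swing of 0.4
fade_after_trips = total_fade([(0.6, 0.4)] * 1000, toy_fade)
```

Passing the per-cycle model as an argument keeps the cycle-decoupling step separate from the degradation model, so a measured model can replace the placeholder without changing the rest.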

Reinforcement Learning.
In the reinforcement learning setting, the "agent" is the decision maker while everything else is known as the "environment." The agent-environment interaction is represented in Figure 1 for a sequence of discrete time steps "t." At every step "t," the agent observes the state of the environment and selects from a set of possible actions. After one time step, the outcome of the action taken is returned as a reward and the environment moves to a new state. To incorporate inner-loop reinforcement learning, the reward for the action taken should be known to the HEV controller (agent), as it plays a crucial part in deriving the optimal policy. For an action "a" taken in state "s," the reward "r" is evaluated from the fuel consumed and the battery capacity fading during the time step, where w represents the weight applied to the battery capacity fading, ΔT is the length of the time step, and ΔQ_fade and ṁ_f are the battery capacity fading and the fuel consumption rate, respectively. Here, ṁ_f·ΔT can be determined directly from the fuel consumption, while ΔQ_fade has to be determined with the help of SoH estimation and cannot be determined online. However, since the time complexity is large, it is safer to derive ΔQ_fade using an equivalent cycle method. Accordingly, it is possible to calculate the SoC_avg and SoC_swing values using equation (5).
To determine the optimal policy, the TD(λ) learning algorithm is incorporated. This algorithm is known to deliver high performance and a high convergence rate in a non-Markovian environment. The parameter "λ" is the trace decay parameter and lies between 0 and 1. Here, the Q value of every state-action pair is represented by Q(s, a). The state "s" includes the battery charge level "q" and the vehicle speed "v." Similarly, the action "a" selects the kth gear ratio and the corresponding current "i" discharged from the battery.
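Returning to the reward defined earlier in this subsection, a minimal Python sketch is given below. The exact formula is an assumption: the reward is taken here as the negative of the fuel consumed in the step plus the weighted capacity fading, so that maximizing the cumulative return minimizes the combined cost. The function names and the argument names are hypothetical.

```python
def step_reward(fuel_rate, dt, delta_q_fade, w):
    """Assumed reward for one time step: negative fuel use plus weighted fading.

    fuel_rate:    fuel consumption rate (the rate written as m_f in the text)
    dt:           length of the time step (ΔT)
    delta_q_fade: battery capacity fading during the step (ΔQ_fade)
    w:            weight applied to the capacity fading term
    """
    fuel_used = fuel_rate * dt                 # fuel consumed in this step
    return -(fuel_used + w * delta_q_fade)


def trip_return(rewards):
    """Cumulative (undiscounted) return over one driving trip."""
    return sum(rewards)


# Example: three identical steps of one second each
rewards = [step_reward(0.5, 1.0, 1e-5, 1000.0) for _ in range(3)]
total = trip_return(rewards)
```

The weight w sets the trade-off: a larger w makes the agent accept more fuel use in exchange for less battery fading.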
The following are the steps involved in the TD(λ) algorithm:
Step 1. An arbitrary value is assigned to Q(s, a) at the initial stage.
Step 2. At every step "t," an action "a" is chosen based on the values of Q(s, a).
Step 3. The exploration-exploitation policy is used to avoid the risk of being trapped in a suboptimal solution. This means that, for the current state, the action "a" with the maximum Q(s, a) is not always chosen.
Step 4. Based on the action chosen, a new state is observed and a reward is received.
Step 5. According to the reward and the new state, the values of Q(s, a) are updated for the (s, a) pairs in proportion to e(s, a). Here, e(s, a) represents the eligibility of every state-action pair that has been previously visited.
Step 6. The constant λ, which holds a value between 0 and 1, is used to decay the eligibility.
Step 7. When using the eligibility of the state-action pairs, it is not necessary to update "e" and the Q value for every state-action pair.
Step 8. Hence, only a record of the "M" most recent state-action pairs is kept while all other pairs are ignored.
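The steps above can be sketched as a small tabular agent in Python. This is a simplified TD(λ)-style update under stated assumptions, not the exact controller used in this work: the learning rate alpha, the discount factor gamma, the exploration rate epsilon, the epsilon-greedy rule, and the use of replacing traces are choices made for the example, and the class name TDLambdaAgent is hypothetical. States and actions may be any hashable values, for example a discretized (q, v) tuple and a (k, i) tuple.

```python
import random
from collections import defaultdict


class TDLambdaAgent:
    def __init__(self, actions, alpha=0.1, gamma=0.95, lam=0.8,
                 epsilon=0.1, max_pairs=20, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.epsilon = epsilon
        self.max_pairs = max_pairs        # M: number of recent pairs tracked
        self.q = defaultdict(float)       # Step 1: initial Q(s, a), here zero
        self.e = {}                       # eligibility of recent (s, a) pairs
        self.rng = random.Random(seed)

    def choose(self, state):
        # Steps 2-3: mostly greedy on Q(s, a), sometimes a random action
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s_next):
        # Steps 4-5: TD error from the observed reward and new state
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        delta = r + self.gamma * best_next - self.q[(s, a)]
        # Step 6: decay the eligibility of previously visited pairs
        for pair in self.e:
            self.e[pair] *= self.gamma * self.lam
        self.e.pop((s, a), None)
        self.e[(s, a)] = 1.0              # replacing trace, stored as newest
        # Steps 7-8: keep only the M most recent pairs
        while len(self.e) > self.max_pairs:
            del self.e[next(iter(self.e))]
        for pair, elig in self.e.items():
            self.q[pair] += self.alpha * delta * elig


# Example: two actions, two updates along a short trajectory
agent = TDLambdaAgent(actions=[0, 1], epsilon=0.0)
agent.update("s0", 1, 1.0, "s1")
agent.update("s1", 0, 2.0, "s2")
```

Limiting the record to the M most recent pairs bounds the work per step, since the eligibility of older pairs has already decayed towards zero.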

Result and Discussion
The operation of the electric vehicle is simulated using the vehicle simulator ADVISOR. Table 1 presents the key parameters that are taken into consideration. In this work, we have compared the proposed work with the rule-based policy and the reinforcement learning policy. The output is recorded. Based on the simulation results, the following outcomes are observed:
(1) On optimizing the total operation cost, it is seen that the replacement cost of the battery is significant.
(2) The cost of replacing the battery is a significant part of the total operating cost and is even identified to be higher than the fuel cost.
(3) The RL policy that is in effect will use the rule-based policy to decrease the fuel consumption. However, this policy will not take into consideration the cost of the battery.
The results of the observations are tabulated in Table 2, which also shows that the battery replacement cost is higher for shorter drives and that the proposed work can increase the battery life.
Readings were taken for several trips across Coimbatore, Tamil Nadu, India, and the amount of charge required for the trips was recorded. The recordings indicate that the proposed work achieved a significant reduction in the operating cost when compared with the RL and rule-based policies.

Conclusions and Future Scope
In this paper, HEV energy management is outlined, together with a solution that manages it efficiently by optimizing the HEV operating cost with respect to battery replacement and fuel cost. Rather than decreasing the instantaneous fuel consumption at every time step, the inner-loop HEV energy management focuses on reducing the total amount of fuel consumed. Similarly, reinforcement learning targets the optimization of the cumulative return instead of the immediate reward. The proposed work indicates that the battery replacement cost is higher for shorter drives and that the battery life can be increased by 21%. The readings observed also show a significant reduction of about 8%-10% in the operating cost when compared with the RL and rule-based policies.