A Hybrid Data-Driven Approach for Multistep Ahead Prediction of State of Health and Remaining Useful Life of Lithium-Ion Batteries

In this paper, a novel multistep ahead predictor based on a fusion of kernel recursive least square (KRLS) and Gaussian process regression (GPR) is proposed for the accurate prediction of the state of health (SoH) and remaining useful life (RUL) of lithium-ion batteries. Empirical mode decomposition is used to divide the battery capacity into local regeneration (intrinsic mode functions) and global degradation (residual) components. The KRLS and GPR submodels are employed to track the residual and the intrinsic mode functions, respectively. For RUL, the KRLS-predicted residual signal is used. Publicly available experimental battery aging data are used to evaluate the proposed model. Comparison with other methodologies (i.e., GPR, KRLS, empirical mode decomposition with GPR, and empirical mode decomposition with KRLS) reveals the distinctiveness and superiority of the proposed approach. For 1-step ahead prediction (battery B0006), the proposed method tracks the trajectory with a root mean square error (RMSE) of 0.2299, and an increase of only 0.2243 in RMSE is noted for 30-step ahead prediction. RUL prediction using the residual signal shows a 3 to 5% increase in accuracy compared with RUL prediction using SoH. The proposed methodology is a promising approach for efficient battery health prognostics.


Introduction
The depletion of fossil fuel resources and issues related to climate change provide a strong impetus for developers to focus on green energy resources, green transportation, and smart grids [1,2]. Energy storage devices are the core component in these fields. Owing to their light weight, high energy and power density, low self-discharge rate, and long lifecycle, lithium-ion (Li-ion) batteries are superior to other energy storage devices [3,4]. However, because the Li-ion battery is one of the most expensive components of the system, it must be handled carefully using an efficient battery management system (BMS) [5]. The role of an intelligent BMS is to manage the battery efficiently and to monitor its state with high accuracy. Li-ion battery malfunctions often lead to functional impairment, degraded performance, or total failure. In recent years, the estimation and prediction of battery state of health (SoH), state of charge (SoC), state of life (SoL), remaining useful life (RUL), and state of function (SoF) have gained significant attention for battery health prognostics (BHP) [6][7][8]. In smart grids, renewable energy systems, and electric vehicles, battery life is one of the most important features for economic viability, and battery degradation under dynamic operating conditions is one of the most critical issues affecting it. Therefore, early estimation and prediction of battery SoH and RUL are crucial tasks of a smart BMS for reliable operation.
Researchers have been working on Li-ion battery capacity estimation in recent years, as capacity is the decisive SoH indicator [9,10]. When the capacity of a Li-ion battery reaches 80% of its initial value, the battery must be replaced to ensure smooth and reliable operation [11]. However, battery capacity cannot be measured directly by any physical sensor, which makes accurate determination of SoH and RUL challenging. To date, various methodologies have been reported to estimate and predict SoH and RUL. Based on the literature, they can be categorized as model-based methods, data-driven methods, and hybrid approaches [1,12]. Model-based methods describe battery degradation behavior using differential, algebraic, or empirical equations. Researchers have presented empirical models [13][14][15], mechanistic models (also known as chemical models) [16][17][18], equivalent circuit models [19,20], and fused models [21] to capture battery degradation behavior. Hu et al. [22] presented a model-based method for the co-estimation of SoC and SoH of Li-ion batteries. Their fractional-order battery model was identified using a hybrid optimization algorithm and showed a steady-state error of less than 1%. In their subsequent work [23], the authors used incremental capacity analysis to determine the SoH of electric taxis, and their proposed methodology achieved a root mean square error of 0.0204. Although model-based methods have good accuracy, they still have some drawbacks. Empirical and equivalent circuit models are easy to build, but they accurately capture only short-term states because the model parameters change during cycling. Filtering algorithms can be used to update the model parameters, but at the cost of high system complexity. Mechanistic models likewise have increased complexity and require expert knowledge to build [1].
It is also difficult to build these models in noisy or uncertain environments. Data-driven methods require only Li-ion battery sensor data (voltage, current, and temperature) to predict SoH and RUL [24]. Different machine learning algorithms have been used to build the connection between operating data and battery degradation. Unlike model-based approaches, these methods do not require a complex physical model; they only learn a weight vector from the training data. Tian et al. [25] proposed a deep learning sequence-to-sequence model to predict the capacity degradation of Li-ion batteries, using the data of one cycle for multistep (100, 200, and 300 cycles) ahead prediction. In another study [26], a deep neural network was trained with discrete sections of the charging curves as input to predict the entire charging curve; thirty data points, collected in less than 10 minutes, were used as input to train the deep learning model. Wang et al. [27] proposed a data-driven approach to diagnosing abnormalities in the battery charging capacity. These techniques need historical data to train the model. In the past, the relevance vector machine, logic regression, and the support vector machine have been reported to predict the RUL [28]. In [29], the authors presented a Bayesian model to predict the RUL of Li-ion batteries under dynamic operating conditions and showed that it had better prediction accuracy than the support vector machine. Tang et al. [30] proposed a balancing-current-ratio-based SoH predictor for series-connected cells in a battery pack. Liu et al. [31] proposed a two-stage trajectory model to determine the future aging trajectory with uncertainty quantification. Wang et al. [32] proposed another variant of the Bayesian model to predict the RUL. Neural networks [33,34], an autoregressive fused model [35], and the Box-Cox transformation [36] have also been used to estimate battery capacity.
The aforementioned studies neglect the effect of fluctuation and local regeneration in the capacity, which affects prediction accuracy. A Gaussian process functional regression model was proposed to tackle the issue of local capacity regeneration [37]. Long short-term memory (LSTM), a variant of the recurrent neural network, was proposed to predict Li-ion battery capacity [38], and the experimental results showed an average error of 0.0765 Ah (2.46%). In a recent study [39], a hybrid method based on LSTM and Gaussian process regression (GPR) was proposed to predict the capacity and RUL of Li-ion batteries, with GPR and LSTM capturing the local regeneration and the global capacity degradation trend, respectively. The authors also predicted the battery RUL multiple steps ahead, with a maximum error below 1.8%. However, local fluctuation and regeneration have been observed to significantly affect multistep ahead prediction of SoH and RUL. Therefore, further research is needed to predict q-step ahead SoH and RUL with high accuracy.
Driven by the desire to increase BMS reliability and improve battery safety, this study proposes a novel hybrid method consisting of multiscale kernel recursive least square (KRLS) and GPR for q-step ahead SoH prediction of Li-ion batteries. To be more explicit, the key contributions of the proposed approach are as follows: (i) The empirical mode decomposition (EMD) method is employed to separate the local regeneration, the global battery degradation, and other fluctuations. (ii) KRLS with an autoregressive moving average with exogenous inputs (ARMAX) model is used recursively to predict the global battery degradation, while GPR is applied to track the local fluctuation and regeneration of the Li-ion battery. (iii) Finally, the KRLS and GPR predictions are ensembled to obtain the final predicted SoH.
2 Computational Intelligence and Neuroscience
(iv) The RUL is predicted using the SoH, the intrinsic mode functions (IMFs), and the residual of the battery data. (v) The suggested approach is validated using various online datasets (NASA and CALCE). (vi) Experimental results and comparative analysis reveal the effectiveness and superiority of the proposed methodology, respectively.

State of Health of Lithium-Ion Battery
The Li-ion battery is a highly nonlinear and complex electrochemical system whose health is significantly affected by dynamic operating conditions. SoC, SoH, SoL, and RUL are the parameters primarily used to assess the health of Li-ion batteries [40,41]. SoH is one of the essential components of the BHP system [42]. The most widely accepted definition of the SoH of a Li-ion battery is the ratio of the battery capacity at the kth cycle to that at the initial cycle:
$$\mathrm{SoH}_k = \frac{Q_k}{Q_o} \times 100\% \tag{1}$$
where $\mathrm{SoH}_k$ is the SoH at the kth cycle, and $Q_k$ and $Q_o$ are the battery capacities at the kth cycle and the initial cycle, respectively. However, degradation can occur at both the cathode and the anode; therefore, a single scalar SoH is not sufficient to describe it fully. For further details, see [43,44].
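Definition (1) is a one-line computation per cycle. The Python sketch below assumes that per-cycle capacities are given in consistent units. When no rated value is supplied, it takes $Q_o$ to be the first-cycle capacity. The function name `state_of_health` is hypothetical.

```python
import numpy as np


def state_of_health(capacity, initial_capacity=None):
    """SoH_k = Q_k / Q_o * 100 (%).

    capacity: per-cycle capacities Q_k (e.g., in Ah).
    initial_capacity: Q_o; defaults to the first-cycle capacity (an assumption).
    """
    q = np.asarray(capacity, float)
    q0 = q[0] if initial_capacity is None else float(initial_capacity)
    return 100.0 * q / q0
```

For example, capacities of 2.0, 1.9, and 1.6 Ah map to roughly 100%, 95%, and 80% SoH. The last value is the 80% replacement level mentioned in the Introduction.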

Methodology
This section explains the framework of the proposed methodology in detail.

Empirical Mode Decomposition (EMD).
The EMD is a very efficient tool for analyzing highly dynamic signals. It decomposes nonstationary and nonlinear signals into oscillatory components, known as intrinsic mode functions (IMFs), plus a residual. Owing to these capabilities, it has also been applied in other fields (e.g., image processing, vibration analysis, and rotating machinery). Huang et al. [45] discussed the EMD approach in more detail. In the EMD approach, each IMF must satisfy the following conditions after decomposition.
(1) The mean value of the upper and lower envelopes must be equal to 0 at any instant. (2) Over the whole input time series, the number of zero crossings and the number of extrema must be equal or differ by at most one.
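Condition (2) can be checked mechanically by counting extrema and sign changes. The sketch below is minimal and makes these assumptions:

- the input is a uniformly sampled 1-D signal;
- strict local extrema are found with SciPy's `argrelextrema`;
- the helper names are hypothetical.

Condition (1) is not checked here because it requires the spline envelopes.

```python
import numpy as np
from scipy.signal import argrelextrema


def count_extrema(h):
    """Number of strict interior local maxima plus minima."""
    h = np.asarray(h, float)
    return len(argrelextrema(h, np.greater)[0]) + len(argrelextrema(h, np.less)[0])


def count_zero_crossings(h):
    """Number of sign changes; exact zeros are dropped so they are not double-counted."""
    s = np.sign(np.asarray(h, float))
    s = s[s != 0]
    return int(np.sum(s[1:] != s[:-1]))


def meets_extrema_condition(h):
    """IMF condition (2): #extrema and #zero crossings equal or differ by at most one."""
    return abs(count_extrema(h) - count_zero_crossings(h)) <= 1
```

A pure oscillation passes the check. The same oscillation shifted away from zero fails, because it keeps its extrema but loses its zero crossings.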
In this work, the local fluctuation and regeneration phenomena in the original SoH signal are treated as high-frequency components, and the global SoH degradation is treated as the low-frequency component. The decomposition procedure is known as sifting. First, all the extrema (minima and maxima) of the input signal $x_k$ are located. The local minima and maxima are then connected by spline interpolation to form the lower ($e_{k,lower}$) and upper ($e_{k,upper}$) envelopes, respectively. Next, the local mean of both envelopes is computed as
$$m_k = \frac{e_{k,upper} + e_{k,lower}}{2},$$
and the difference D between $x_k$ and the mean value $m_k$ is determined:
$$D = x_k - m_k.$$
After calculating the difference, check whether D fulfills the IMF conditions discussed above. If it meets all the conditions, D is an IMF and is removed from $x_k$ to obtain the residual signal (res); otherwise, the sifting is repeated on D.
All the steps are repeated until the residue meets the stopping criteria. The IMFs then hold all the information on local fluctuation and regeneration, and the monotonic residue holds the information on the global degradation of SoH [46]. By adding all the IMFs and the monotonic residue, the original input signal can be described as
$$x_k = \sum_{i=1}^{n} \mathrm{IMF}_{i,k} + res_k,$$
where n is the number of IMFs. In this work, the Wavelet Toolbox and Signal Processing Toolbox of MATLAB® were used to perform the EMD. The flowchart of the EMD procedure is shown in Figure 1.
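The sifting procedure described in this subsection can be sketched in Python as follows. This is a simplified illustration, not the MATLAB toolbox implementation used in the paper. The following choices are all assumptions:

- the envelopes are built with SciPy's `CubicSpline`;
- both envelopes are pinned to the end samples as a crude boundary treatment;
- sifting stops on a Huang-style standard-deviation (SD) criterion with a threshold of 0.2.

The function names are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema


def envelope_mean(h, t):
    """Mean m_k of the cubic-spline upper/lower envelopes; None if h has too few extrema."""
    max_idx = argrelextrema(h, np.greater)[0]
    min_idx = argrelextrema(h, np.less)[0]
    if len(max_idx) < 2 or len(min_idx) < 2:
        return None
    # Assumed boundary handling: pin both envelopes to the first and last samples.
    ends_first, ends_last = [0], [len(h) - 1]
    max_idx = np.concatenate((ends_first, max_idx, ends_last))
    min_idx = np.concatenate((ends_first, min_idx, ends_last))
    upper = CubicSpline(t[max_idx], h[max_idx])(t)
    lower = CubicSpline(t[min_idx], h[min_idx])(t)
    return 0.5 * (upper + lower)


def emd(x, max_imfs=6, max_sifts=50, sd_threshold=0.2):
    """Split x into IMFs (rows) and a residual, with x == IMFs.sum(axis=0) + res."""
    x = np.asarray(x, float)
    t = np.arange(len(x), dtype=float)
    res = x.copy()
    imfs = []
    for _ in range(max_imfs):
        if envelope_mean(res, t) is None:   # residual is (near-)monotonic: stop
            break
        h = res.copy()
        for _ in range(max_sifts):          # sifting loop
            m = envelope_mean(h, t)
            if m is None:
                break
            d = h - m                        # D = x_k - m_k
            sd = np.sum((h - d) ** 2) / np.sum(h ** 2)  # simplified SD stopping criterion
            h = d
            if sd < sd_threshold:
                break
        imfs.append(h)                       # accept h as an IMF
        res = res - h                        # remove it to obtain the new residual
    return np.array(imfs).reshape(len(imfs), len(x)), res
```

By construction, the returned IMFs and residual add back up to the input. This mirrors the statement that summing all IMFs and the residue recovers the original signal. For a slowly decaying SoH-like trend with a small superimposed oscillation, the decaying trend should end up mostly in the residual.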

Kernel Recursive Least Square.
In this work, the ARMAX model is used to predict the SoH of the battery. The general form of the ARMAX model is given in [47]. In it, y and u are the measured signal and desired response, respectively, α, β, and c are the model coefficients to be estimated recursively, and ε represents zero-mean Gaussian noise. M and N are the orders of the system and the input, respectively. The model can be written in the simplified regression form
$$y_k = \varphi_k^{T}\theta + \varepsilon_k,$$
where $\varphi_k^{T}$ is the transpose of the regression vector and θ collects the unknown coefficients. The KRLS method can be used to determine these coefficients. Its cost function is built on a Mercer kernel κ, with ϕ, R, H, and λ denoting the kernel matrix, the regularization factor (always a positive number), the reproducing kernel Hilbert space (RKHS), and the forgetting factor, respectively. The most commonly used kernels for prediction include the Gaussian kernel [48] and the polynomial kernel. In these kernels, σ, φ″, c, and p are the scaling factor, the latest incoming data, a positive constant, and the polynomial order, respectively; s and t are both positive constants. In this work, all of these kernel functions were implemented.
The presented results are for the polynomial kernel, which showed the best accuracy. The KRLS method works by mapping the input data into a high-dimensional RKHS. In this process, each inner product of the linear algorithm is simply replaced by a kernel evaluation, which corresponds to an inner product in the RKHS [49,50]. Linear algorithms can then be applied in the transformed feature space (RKHS).
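The kernel substitution described above can be made concrete. For 2-D inputs, the quadratic polynomial kernel equals an ordinary inner product of explicit 6-D features, so a linear method in that feature space is a nonlinear method in the input space. The sketch makes three assumptions, since the paper's exact kernel expressions are not reproduced here:

- the Gaussian kernel is scaled by $2\sigma^2$;
- the polynomial kernel has the form $(\varphi^T\varphi'' + c)^p$;
- `lagged_regressors` builds only the autoregressive part of the ARMAX regression vector, with no exogenous or noise terms.

All function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian kernel exp(-||x - z||^2 / (2 sigma^2)) (scaling convention assumed)."""
    d = np.asarray(x, float) - np.asarray(z, float)
    return float(np.exp(-(d @ d) / (2.0 * sigma ** 2)))


def polynomial_kernel(x, z, c=1.0, p=2):
    """Polynomial kernel (x^T z + c)^p."""
    return float((np.dot(x, z) + c) ** p)


def quadratic_feature_map(x, c=1.0):
    """Explicit map for 2-D x whose inner product reproduces (x^T z + c)^2."""
    x1, x2 = x
    r2, r2c = np.sqrt(2.0), np.sqrt(2.0 * c)
    return np.array([x1 ** 2, x2 ** 2, r2 * x1 * x2, r2c * x1, r2c * x2, c])


def lagged_regressors(y, M):
    """Rows phi_k = [y_{k-1}, ..., y_{k-M}] with targets y_k (AR part only)."""
    y = np.asarray(y, float)
    Phi = np.array([y[k - M:k][::-1] for k in range(M, len(y))])
    return Phi, y[M:]
```

With x = [1, 2] and z = [0.5, -1], both `polynomial_kernel(x, z)` and the inner product of the two explicit feature vectors give 0.25. The kernel reaches the same value without ever forming the 6-D features.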
A unique global solution is a salient feature of kernel-based methods [51]. Additionally, if the input data are highly nonlinear, linear regression techniques fail to model them accurately. Kernel-based algorithms tackle this issue by mapping the nonlinear data into a high-dimensional feature space in which the relation becomes linear. Because of the high dimensionality of the RKHS, however, the model is prone to overfitting. This issue can be resolved by adding an L2-norm penalty, as shown in (10) [52], whose solution can be updated recursively [53]. The approximate linear dependency (ALD) criterion is used to curb the growth of the KRLS computational complexity as the number of observations increases [54]. In this work, KRLS coupled with ALD has been implemented in MATLAB®. To estimate the model capacity ($y_k$), (7) is rewritten as (13), which is in turn modified for q-step ahead prediction ($y_{k+q}$).
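The KRLS-with-ALD recursion can be sketched as follows. This is the basic algorithm of Engel, Mannor, and Meir (2004), swapped in for illustration; it is not the authors' MATLAB implementation. It omits the forgetting factor λ and the regularization factor R used in the paper. The Gaussian kernel width, the ALD threshold `nu`, and the iterated (feed-back) q-step strategy in `recursive_forecast` are all assumptions, and the class and function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(x, z, sigma=0.5):
    """Gaussian kernel exp(-||x - z||^2 / (2 sigma^2)); scaling convention assumed."""
    d = np.asarray(x, float) - np.asarray(z, float)
    return float(np.exp(-(d @ d) / (2.0 * sigma ** 2)))


class ALDKRLS:
    """Kernel recursive least squares with an approximate linear dependency (ALD) test."""

    def __init__(self, kernel, nu=1e-3):
        self.kernel = kernel      # Mercer kernel kappa(x, z)
        self.nu = nu              # ALD threshold: larger -> smaller dictionary
        self.dictionary = []      # stored input vectors
        self.K_inv = None         # inverse kernel matrix of the dictionary
        self.P = None             # auxiliary RLS matrix
        self.alpha = None         # expansion coefficients

    def _k_vec(self, x):
        return np.array([self.kernel(d, x) for d in self.dictionary])

    def update(self, x, y):
        x = np.atleast_1d(np.asarray(x, float))
        k_tt = self.kernel(x, x)
        if not self.dictionary:   # the first sample initializes the dictionary
            self.dictionary.append(x)
            self.K_inv = np.array([[1.0 / k_tt]])
            self.P = np.array([[1.0]])
            self.alpha = np.array([y / k_tt])
            return
        k_vec = self._k_vec(x)
        a = self.K_inv @ k_vec                # best dictionary approximation of phi(x)
        delta = k_tt - k_vec @ a              # ALD residual
        err = y - k_vec @ self.alpha          # a priori prediction error
        if delta > self.nu:                   # not approximately dependent: grow dictionary
            n = len(self.dictionary)
            self.dictionary.append(x)
            K_inv = np.empty((n + 1, n + 1))  # block-inverse update of K_inv
            K_inv[:n, :n] = delta * self.K_inv + np.outer(a, a)
            K_inv[:n, n] = -a
            K_inv[n, :n] = -a
            K_inv[n, n] = 1.0
            self.K_inv = K_inv / delta
            P = np.zeros((n + 1, n + 1))
            P[:n, :n] = self.P
            P[n, n] = 1.0
            self.P = P
            self.alpha = np.append(self.alpha - a * err / delta, err / delta)
        else:                                 # dictionary unchanged: RLS-style update
            Pa = self.P @ a
            q = Pa / (1.0 + a @ Pa)
            self.P = self.P - np.outer(q, Pa)
            self.alpha = self.alpha + (self.K_inv @ q) * err

    def predict(self, x):
        x = np.atleast_1d(np.asarray(x, float))
        return float(self._k_vec(x) @ self.alpha)


def recursive_forecast(model, history, M, q):
    """q-step ahead forecast that feeds each prediction back as the newest lag."""
    window = list(np.asarray(history, float)[-M:])  # chronological, oldest first
    preds = []
    for _ in range(q):
        phi = np.array(window[::-1])                 # [y_{k-1}, ..., y_{k-M}]
        y_next = model.predict(phi)
        preds.append(y_next)
        window = window[1:] + [y_next]
    return np.array(preds)
```

Usage is in two steps. First, call `update` once per cycle with the lagged regression vector and the observed residual value. Then call `recursive_forecast` to obtain q-step ahead values. The ALD test keeps the dictionary much smaller than the number of processed samples, which is the complexity saving the paragraph above refers to.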

Gaussian Process Regression.
A GPR is an effective approach to solving nonlinear regression and classification problems [55,56]. GPR is a probabilistic nonparametric model that places a Gaussian process prior over the latent function f(x), so that any finite collection of function values follows a joint Gaussian distribution. The GPR model is fully described by its mean and covariance (kernel) functions:
$$f(x) \sim \mathcal{GP}\big(m(x),\, k(x, x')\big),$$
where m(x) and k(x, x′) are the mean and covariance functions, respectively. The mean function m(x) is usually assumed to be zero. The relation between input and output can be expressed as
$$y = f(x) + \varepsilon,$$
where ε is additive noise with zero mean and variance $\sigma_n^2$, i.e., $\varepsilon \sim \mathcal{N}(0, \sigma_n^2)$.
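For M training outputs $\mathbf{y} = [y_1, \dots, y_M]^T$, the likelihood is
$$p(\mathbf{y} \mid \mathbf{f}) = \mathcal{N}(\mathbf{f},\, \sigma_n^2 I),$$
where I is the M × M unit matrix. According to [57], the prior distribution p(f) can be written as
$$p(\mathbf{f}) = \mathcal{N}(\mathbf{0},\, K),$$
where $K = [k(x_i, x_j)]$. Using (18) and (19), the marginal distribution of y is
$$p(\mathbf{y}) = \mathcal{N}(\mathbf{0},\, K_y),$$
where $K_y = K + \sigma_n^2 I$. To predict the target value $y_*$ at a new input $x_*$, the joint distribution over $y_1, y_2, \dots, y_M, y_*$ can be written as
$$\begin{bmatrix}\mathbf{y} \\ y_*\end{bmatrix} \sim \mathcal{N}\left(\mathbf{0},\, \begin{bmatrix} K_y & \mathbf{k}_* \\ \mathbf{k}_*^{T} & k(x_*, x_*) + \sigma_n^2 \end{bmatrix}\right),$$
where $\mathbf{k}_* = [k(x_1, x_*), \dots, k(x_M, x_*)]^T$ and $y_* = f_* + \varepsilon_*$. Here $f_* = f(x_*)$ is the latent function value at the input $x_*$, and $\varepsilon_*$ is the corresponding noise.
The predictive distribution $p(y_* \mid \mathbf{y})$ is Gaussian, with mean and variance
$$\bar{y}_* = \mathbf{k}_*^{T} K_y^{-1} \mathbf{y}, \qquad \operatorname{var}(y_*) = k(x_*, x_*) + \sigma_n^2 - \mathbf{k}_*^{T} K_y^{-1} \mathbf{k}_*.$$
The inverse $K_y^{-1}$ can be calculated using Cholesky decomposition [58]. The covariance (kernel) function is a critical component of the prediction process, and the rational quadratic kernel function is used for the prediction [39].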
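The GPR prediction step can be sketched with NumPy. The sketch factors $K_y = K + \sigma_n^2 I$ with a Cholesky decomposition instead of inverting it, and it uses a rational quadratic kernel. It assumes a zero mean function and the common parameterization $k(x, x') = s^2 (1 + \lVert x - x'\rVert^2 / (2\alpha l^2))^{-\alpha}$. The hyperparameter values are illustrative, not fitted, and the paper's MATLAB settings are not reproduced. Inputs are 2-D arrays of shape (samples, features).

```python
import numpy as np


def rational_quadratic(A, B, length_scale=1.0, alpha=1.0, variance=1.0):
    """k(x, x') = variance * (1 + ||x - x'||^2 / (2 alpha l^2))^(-alpha) for row-wise inputs."""
    A, B = np.atleast_2d(A), np.atleast_2d(B)
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    sq = np.maximum(sq, 0.0)  # guard against tiny negative values from round-off
    return variance * (1.0 + sq / (2.0 * alpha * length_scale ** 2)) ** (-alpha)


def gpr_predict(X, y, X_star, noise_var=1e-4, **kernel_args):
    """Zero-mean GPR predictive mean and variance of y_* at the rows of X_star."""
    K_y = rational_quadratic(X, X, **kernel_args) + noise_var * np.eye(len(X))
    L = np.linalg.cholesky(K_y)                         # K_y = L L^T
    w = np.linalg.solve(L.T, np.linalg.solve(L, y))     # w = K_y^{-1} y
    K_s = rational_quadratic(X, X_star, **kernel_args)  # column j is k_* for X_star[j]
    mean = K_s.T @ w
    v = np.linalg.solve(L, K_s)                         # k_*^T K_y^{-1} k_* = ||v||^2
    prior_var = np.diag(rational_quadratic(X_star, X_star, **kernel_args))
    var = prior_var + noise_var - np.sum(v ** 2, axis=0)
    return mean, var
```

In the proposed framework, this role would be played by the GPR submodel tracking each IMF. Solving two triangular systems avoids forming $K_y^{-1}$ explicitly, which is the usual reason to prefer the Cholesky route.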

Proposed Methodology.
In this work, a fused battery SoH prediction model based on EMD, KRLS, and GPR is proposed.
The framework of the proposed approach is shown in Figure 2. The raw battery sensor data are passed through a Savitzky-Golay filter to reduce measurement noise [59]; the filter is implemented using the MATLAB® function sgolayfilt. The battery SoH is then calculated using (1). The EMD technique decomposes the battery SoH into IMFs and a residual signal, as discussed in Section 3.1. KRLS and GPR are adopted to track the global degradation (residual) and the local regeneration (IMFs) of the Li-ion battery, respectively. Finally, the predicted IMFs and residual are ensembled to obtain the predicted SoH. When the predicted SoH reaches the battery end-of-life (EOL) threshold, the RUL is predicted. The percentage fitting (FIT) and the root mean square error (RMSE) are used to evaluate the SoH prediction performance.
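The RMSE is computed as
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(y_k - \hat{y}_k\right)^2},$$
where $y_k$ and $\hat{y}_k$ are the original and estimated outputs, and N is the total number of samples.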
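The post-processing steps of the framework can be sketched in Python: Savitzky-Golay smoothing, ensembling the predicted IMFs and residual, detecting the EOL crossing, and scoring. SciPy's `savgol_filter(x, window_length, polyorder)` stands in for MATLAB's `sgolayfilt`, which takes the polynomial order before the frame length. The following are assumptions rather than the paper's settings:

- the window length of 11 and polynomial order of 3;
- the 80% EOL threshold, taken from the replacement criterion in the Introduction;
- the convention that `soh_forecast[i]` is the SoH i + 1 cycles after the prediction start;
- the normalized FIT formula, since the paper's exact FIT expression is not reproduced here.

All function names are hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter


def smooth_signal(x, window_length=11, polyorder=3):
    """Savitzky-Golay smoothing of the raw measurement sequence."""
    return savgol_filter(np.asarray(x, float), window_length, polyorder)


def ensemble_soh(predicted_imfs, predicted_residual):
    """Final SoH forecast = sum of the GPR-predicted IMFs + the KRLS-predicted residual."""
    return np.sum(np.atleast_2d(predicted_imfs), axis=0) + np.asarray(predicted_residual, float)


def rul_from_forecast(soh_forecast, eol_threshold=80.0):
    """Cycles until the forecast first reaches the EOL threshold (None if never reached)."""
    hits = np.nonzero(np.asarray(soh_forecast, float) <= eol_threshold)[0]
    return None if hits.size == 0 else int(hits[0]) + 1


def rmse(y, y_hat):
    """Root mean square error between original and estimated outputs."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def fit_percent(y, y_hat):
    """Assumed normalized FIT: 100 * (1 - ||y - y_hat|| / ||y - mean(y)||)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(100.0 * (1.0 - np.linalg.norm(y - y_hat) / np.linalg.norm(y - y.mean())))
```

A perfect prediction gives a FIT of 100%. Predicting the constant mean gives 0%.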
In this study, to examine the accuracy of RUL prediction, the following testing standard has been followed:

Experimental Data and Results
In this section, the distinctiveness of the proposed methodology is evaluated using NASA's publicly available data [60]. The details of the different battery datasets are presented in Table 1. All processing is done in MATLAB® 2021 on a personal computer with an Intel(R) Core(TM) i7-10700 CPU @ 2.90 GHz, 32 GB RAM, a 1 TB SSD, and the 64-bit Windows 10 Pro operating system (OS). The cyclic aging experiments were carried out on all NASA batteries using a programmable electronic load, an adjustable temperature chamber, and a power supply [61].
The discharge current and temperature of all the Li-ion batteries are shown in Table 1. Further details of the experimental setup can be found in [61]. The SoH trends of all Li-ion batteries can be seen in Figure 3.
After the battery data are collected through transducers, they are passed through the Savitzky-Golay filter, which reduces the measurement noise. The EMD technique decomposes the Li-ion battery SoH into residual and IMF signals, as shown in Figure 4. The prediction results of Li-ion batteries B0005, B0006, B0018, B0055, and B0056 using the proposed technique (EMD, KRLS, and GPR) are shown in Figures 5(a)-5(e), respectively.
The results of the comparison between the proposed methodology and other methodologies, namely solo GPR, solo KRLS, EMD + GPR, and EMD + KRLS, are presented in Figure 6.
To further validate the model, another publicly available dataset, from the Center for Advanced Life Cycle Engineering (CALCE) at the University of Maryland, is used for prediction [62]. All cycling tests on the CALCE battery (CX2-16) were performed with an Arbin BT2000 system and a temperature-controlled chamber. The CX2-16 battery was discharged at a constant current of 1.1 A; for further information on the experimental setting, see [39,61]. Sixty percent of the data is used for training and the rest for the Li-ion battery capacity prediction (CX2-16).
The prediction results are shown in Figure 7. The FIT of the 1-step ahead prediction of the proposed methodology for all datasets is shown in Figure 8. The q-step ahead prediction of the proposed methodology is shown in Figure 9 (for B0018), and the RMSE results for all datasets are presented in Table 2. The q-step ahead prediction comparison between the proposed and other methodologies is illustrated in Figure 10.
The RUL prediction accuracies of the Li-ion batteries for different parameters are presented in Tables 3-7. For all Li-ion battery datasets, the RUL is predicted at various cycle numbers to check the accuracy. The comparison of the proposed approach with another state-of-the-art study is shown in Table 8.

Discussion
In this work, a BHP model is proposed to avoid unexpected battery failures. As discussed earlier, accurate and early SoH prediction of Li-ion batteries is one of the main components of an intelligent BMS. The basic framework of the proposed approach is shown in Figure 2. After the filtering step, the EMD technique divides the SoH of the Li-ion battery into its global degradation (residual) and local regeneration (IMFs) components (see Figure 4). The EMD technique takes 0.54 ms and 2.31 ms to decompose the data of Li-ion batteries B0055 (102 data points) and CX2-16 (1998 data points), respectively. The residual captures the global SoH degradation trend of the Li-ion battery (Figure 4), while the IMFs capture all the local regeneration points of the original SoH. The one-step-ahead prediction results of Li-ion batteries B0005, B0006, B0018, B0055, and B0056 are shown in Figure 5. For B0005 and B0006, 110 of the 168 battery cycles were used to train the model, as shown in Figures 5(a) and 5(b). The KRLS effectively tracks the residual values without any significant error, as shown in Figure 5. The GPR was used to predict the IMF signals of the batteries and shows good tracking ability, as also reported in [39]. The proposed methodology shows similar accuracy for B0018, for which 80 of the 127 samples were used to train the models (see Figure 5(c)). In [61], the authors used B0005, B0006, and B0018 to validate their proposed multiscale logic regression (LR) and GPR model. Their results showed a maximum RMSE of 0.8 for 1-step ahead prediction, whereas our proposed methodology shows a maximum RMSE of 0.284 on the same datasets. The data of Li-ion batteries B0055 and B0056 were noisy because these batteries were operated at 4°C. The proposed methodology still shows high accuracy in the presence of this perturbation, as seen in Figures 5(d) and 5(e).
Figure 6 reports the comparison results. Solo GPR has poor tracking capability and shows a significant prediction error, whereas EMD with KRLS shows the second-best prediction accuracy after the proposed approach. For the CALCE dataset, 1200 of the 1998 data points were used to train the model. The prediction RMSE was just 0.64 over the entire prediction of 798 data points (see Figure 7). The proposed method predicts 1-step ahead values with high accuracy (Figure 8). The SoH fitting accuracy of B0055 and B0056 is somewhat lower due to the high perturbation in the measured signal; however, it is still better than that reported in [61].
For all datasets, 5-, 10-, 15-, 20-, 25-, and 30-step ahead predictions were carried out. The q-step ahead prediction for Li-ion battery B0018 is shown graphically in Figure 9. The proposed methodology maintains high accuracy even for 30-step ahead prediction (see Table 2). The RMSE of the 1-step prediction of B0006 was 0.2299, and it increased by only 0.2243 for the 30-step ahead prediction. In some cases, the prediction RMSE decreases as the number of prediction steps increases. For B0005, an RMSE of 0.2823 was noted for 5-step ahead prediction, while the RMSE for 10-step ahead prediction is just 0.2296, which is 0.0527 lower than the 5-step ahead prediction error. The 5-step ahead prediction of Li-ion battery B0005 had to predict a regeneration point, which is why its RMSE was higher than at 10 steps. The maximum RMSE of 1.1021 was noted for Li-ion battery B0055 at 30-step ahead prediction under a perturbed environment. The q-step ahead comparison analysis reveals the effectiveness and distinctiveness of the proposed methodology for q-step ahead prediction (see Figure 10).
For a smart BMS, early and accurate prediction of Li-ion battery RUL is one of the key components for safe and reliable operation. In this work, different features were used to predict the RUL at different cycle numbers using the proposed robust model. The predicted SoH, IMFs, and residual were each used to estimate the future RUL of the Li-ion battery. All the RUL prediction results are tabulated in Tables 3-7. For B0005 and B0006, the RUL prediction was started at cycles 50 to 120 in steps of 5 cycles. Table 3 shows that the RUL accuracy was just 75.59% at the 50th cycle using SoH as the predictor, while the residual achieved an accuracy of 99.21% at the same point. In [61], an RUL prediction accuracy of just 79.84% was observed at the 50th cycle. The RUL prediction accuracy increased with the prediction starting point (e.g., at 110 cycles, the RUL prediction accuracy was 94.57%). Similarly, a prediction error of 3.3% was noted in [39]. The residual has its minimum RUL prediction accuracy of 96.06% at the 90th cycle, whereas SoH has an accuracy of 96.85% at the same point. The accuracy of the IMFs was far below that of SoH and the residual, as reflected in the results. The average RUL prediction accuracy using residual and SoH as a feature […] Tables 4-7). For Li-ion battery CX2-16, the average RUL prediction accuracy was 99.51%. Average absolute errors of only 3, 12, and 3 cycles are noted for Li-ion batteries B0005, B0006, and B0018, which are 13, 10, and 2 cycles fewer than in the other study [61] (see Table 8). Hence, after extensive experimentation and comprehensive analysis, it can be concluded that the proposed trained model predicts the SoH and RUL with high accuracy.

Limitations and Future Perspectives
The presented technique for predicting battery health might be employed to develop a BMS. The prediction model, however, is validated in a controlled environment, with constant charging/discharging current and temperature. In practice, operating conditions fluctuate substantially across cycles, causing the battery to deteriorate in multiple phases. Therefore, the performance of the proposed approach must be checked under dynamic conditions. Furthermore, only the RUL prediction of a single battery cell is considered in this study, whereas a battery pack contains numerous cells connected in series/parallel. Because temperature differences cause the cells to age unequally, battery pack RUL prediction with uncertainty quantification must be investigated in the future.

Conclusion
In this work, a battery health predictor has been proposed to reduce the chances of unexpected battery failures. To address the issue of accurately predicting local regeneration in the SoH signal, the EMD technique was employed to decompose the data into low- and high-frequency signals. The recursive KRLS method was used to track the global battery degradation, and GPR was used to predict the local fluctuation and regeneration points with high accuracy. The proposed methodology shows above 91% fitting accuracy at 1-step ahead prediction under a normal environment and a maximum RMSE of 1.1021 at 30-step ahead prediction under a perturbed environment. The comparison analysis also showed that the proposed method is more effective and accurate. Furthermore, the results show that RUL prediction using the residual has 3 to 5% higher accuracy than RUL prediction using SoH. This suggests that the proposed technique can be used to design battery health prognostic systems.

Conflicts of Interest
The authors declare no conflicts of interest.