Data-Driven ICA-Bi-LSTM-Combined Lithium Battery SOH Estimation

Lithium battery state of health (SOH) is a key parameter for characterizing the actual battery life, but it cannot be measured directly. To further improve the accuracy of SOH estimation, this paper proposes a model that combines incremental capacity analysis (ICA) with bidirectional long short-term memory (Bi-LSTM) neural networks, driven by health characteristic parameters, to predict the SOH of lithium-ion batteries. First, candidate health characteristic parameters are selected from the battery charging curve and screened with the Pearson correlation coefficient. The candidates are the constant-current charging time, the constant-voltage charging time, the voltage change rate from 300 s to 1000 s, and the voltage at 200 s of each cycle. Second, ICA is used to mine deeper associations with SOH, and the peaks of the IC curves and their corresponding voltages are extracted as additional model inputs. Then, Bi-LSTM networks are combined through adaptive weighting factors to form a combined SOH estimation model. Finally, the model is verified on the 5th battery (B0005) of the NASA lithium battery data set. The experimental results show that, compared with single models such as the BP neural network (BPNN), LSTM, and the gated recurrent unit (GRU) network, the proposed combined model reduces the mean square error by 55.17%, 49.28%, and 41.47%, respectively.


Introduction
With the increasingly serious energy crisis and environmental problems [1][2][3], the demand for energy keeps growing [4]. To reduce carbon emissions and fossil energy consumption [5], many industries are actively promoting the development of energy and power technologies [6], such as lithium-ion batteries [7] and supercapacitors [8,9]. Lithium-ion batteries have become the preferred battery type for electric vehicles, mobile phones, power grids, and other applications because of their high energy density [10], high output voltage, absence of memory effect, low self-discharge rate, and long service life [11]. However, aging of Li-ion batteries is inevitable, and degradation results from a complex combination of internal reactions and the external environment [12,13]. Its most significant external manifestation is the reduction of battery capacity [14]. SOH is the key indicator used to characterize battery aging, and it is generally accepted that a battery reaches the end of its life when its SOH falls to 70% [15]. There is no single agreed definition of SOH [16]; it is commonly calculated as the ratio of the current maximum capacity to the initial capacity:

$$\mathrm{SOH} = \frac{C_{\mathrm{current}}}{C_{\mathrm{initial}}} \times 100\%$$

where $C_{\mathrm{current}}$ is the current maximum capacity and $C_{\mathrm{initial}}$ is the initial capacity. SOH cannot be obtained directly with measurement equipment [17]. Therefore, accurately evaluating battery aging in real vehicles under complex and variable operating conditions has become a core task of battery management [18]. At present, prediction methods for lithium-ion batteries fall mainly into model-driven methods and data-driven methods.
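As a small worked example of the SOH definition and the 70% end-of-life criterion, the sketch below converts a per-cycle capacity series into SOH and finds the first cycle at or below the threshold. The capacities are made-up illustrative numbers, not NASA data, and using the first measured capacity as $C_{\mathrm{initial}}$ when no rated capacity is supplied is an assumption.

```python
import numpy as np

def soh_from_capacity(capacity_ah, initial_capacity_ah=None):
    """SOH of each cycle as current maximum capacity / initial capacity (0..1 scale)."""
    cap = np.asarray(capacity_ah, dtype=float)
    # Assumption: without a rated capacity, the first measured cycle is the reference
    c0 = cap[0] if initial_capacity_ah is None else float(initial_capacity_ah)
    return cap / c0

def end_of_life_cycle(soh, threshold=0.70):
    """Index of the first cycle whose SOH is at or below the threshold, else None."""
    below = np.flatnonzero(np.asarray(soh, dtype=float) <= threshold)
    return int(below[0]) if below.size else None

# Illustrative capacities in Ah (made-up numbers)
soh = soh_from_capacity([2.0, 1.9, 1.6, 1.3, 1.2])
eol = end_of_life_cycle(soh)
```

Passing a rated capacity through `initial_capacity_ah` instead of relying on the first cycle makes the reference explicit when the first recorded cycle is already partly degraded.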
The model-driven method is mainly based on a complex internal physical model of the lithium-ion battery [19], which is partly established from experience and then transformed into a mathematical problem; the degradation process of the battery is then described by such models [20], using methods such as the Kalman filter [21] and the particle filter [22]. For example, Wang et al. used the Kalman filter to estimate the state of charge (SOC) of a lithium-ion battery energy system [23], and Gholizade-Narm and Charkhgard used a square-root unscented Kalman filter to estimate the SOC of lithium-ion batteries [24]. Hua et al. solved simultaneous unknown-input and state estimation for linear systems with a rank-deficient distribution matrix using specific recursive steps of the corresponding filters [25,26].
With the advent of the big data era, and thanks to the computing power of modern computers, data-driven machine learning (ML) [27] and deep learning (DL) have become increasingly important tools. For example, Cui et al. reviewed the use of deep learning for real-time prediction of the SOC of lithium-ion batteries [28]. Typical data-driven models include the artificial neural network (ANN) [29], the support vector machine (SVM) [30], and LSTM: Liu et al. used an SVM model to predict the SOH of lithium-ion batteries [31], and Zhou et al. used LSTM to predict the remaining useful life of supercapacitors [32]. Data-driven ML methods have the following characteristics:
(1) They do not require detailed knowledge of the internal mechanism of the target, but they do require features that are highly correlated with the output.
(2) General ML and DL methods need a large amount of data. Because they do not build an internal mechanism model, a small amount of data alone cannot support accurate model building.
(3) Data quality is generally considered the main obstacle to further improving ML prediction accuracy. Because of unavoidable factors in actual measurement, the data contain considerable noise, which seriously limits the offline training of ML models.
Among the various data-driven methods, neural networks have received extensive attention in battery life prediction because they can mine nonlinear relationships in data. In particular, the long short-term memory network alleviates the vanishing and exploding gradient problems of the recurrent neural network (RNN) and has achieved good results in battery life prediction. However, the generalization ability of a single data-driven model is limited, and its accuracy and robustness still leave room for improvement.
To further improve the SOH prediction accuracy of lithium-ion batteries, this study proposes a combined ICA-Bi-LSTM estimation method. First, health characteristic parameters are extracted from the voltage, current, and time of the battery charging stage and screened with the Pearson correlation coefficient. Then, the local features of the input parameters are extracted by ICA [16], the local features are passed to the LSTM as a time series, and the ICA-Bi-LSTM fusion model is constructed to mine the potential relationship between the health features and SOH.
The multimodel combined prediction model is built with adaptive weights to achieve accurate prediction of lithium battery SOH.

Battery Data Source.
This study uses the NASA lithium battery data set to study the state of health of the battery [33]. Based on the parameters of batteries B0005, B0006, B0007, and B0018, the influence of battery aging on the internal battery parameters was observed at room temperature (24°C).
The main parameters are the charging voltage, charging current, and charging time. The charging process of these four batteries includes two stages, constant-current charging and constant-voltage charging: each battery is first charged at a constant current of 1.5 A until the voltage reaches the rated 4.2 V, and then charged in constant-voltage mode until the current drops to 20 mA. The charging voltage and charging current curves of the battery are shown in Figure 1.

Health Feature Extraction.
For the four batteries in the NASA battery data set, only one maximum capacity value is given for each charge-discharge cycle, so the health features in this study are extracted per cycle. Taking B0005 as an example, the battery reaches the end of its life after a total of 166 cycles.
As the battery ages, its maximum capacity inevitably decreases. Correspondingly, the constant-current charging time decreases and the constant-voltage charging time gradually increases. Battery temperature is also an important indicator, but because this data set does not provide a rigorous internal battery temperature and the available temperature lacks correlation with capacity fade, temperature features are not used. In addition, the electrochemical reaction rate and the change in internal resistance are considered important indicators of SOH, but they are not considered here because measuring them requires devices that are usually unavailable in practical applications, which limits their applicability. Therefore, the health features extracted in this study are the constant-current charging time $T_{Im}$, the constant-voltage charging time $T_{Vm}$, the voltage change rate $\Delta V_t$ from 300 s to 1000 s, and the voltage $V_{200}$ at 200 s.
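How the four candidate features might be read off one charge record can be sketched as follows. The function name `charge_features`, the rule that the constant-current stage ends when the voltage first reaches 4.2 V minus a small tolerance, and the interpretation of $\Delta V_t$ as $(V_{1000} - V_{300})/700$ in V/s are assumptions; the text names the features but not the exact extraction rules.

```python
import numpy as np

def charge_features(t_s, v, v_max=4.2, v_tol=0.005):
    """Hypothetical helper: T_Im, T_Vm, dV_t and V_200 from one CC-CV charge.

    t_s: sample times in seconds; v: terminal voltage in volts.
    Assumption: the CC stage ends at the first sample with v >= v_max - v_tol,
    and the record ends when the CV stage ends (current down to 20 mA).
    """
    t = np.asarray(t_s, dtype=float)
    v = np.asarray(v, dtype=float)
    reached = np.flatnonzero(v >= v_max - v_tol)
    t_cc_end = t[reached[0]] if reached.size else t[-1]
    elapsed = t - t[0]
    # Linear interpolation gives the voltage at fixed elapsed times
    v200, v300, v1000 = np.interp([200.0, 300.0, 1000.0], elapsed, v)
    return {
        "T_Im": float(t_cc_end - t[0]),                    # constant-current time, s
        "T_Vm": float(t[-1] - t_cc_end),                   # constant-voltage time, s
        "dV_t": float((v1000 - v300) / (1000.0 - 300.0)),  # voltage change rate, V/s
        "V_200": float(v200),                              # voltage at 200 s, V
    }
```

Interpolating at fixed elapsed times keeps $V_{200}$ and $\Delta V_t$ well defined even when the sampling instants differ from cycle to cycle.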
To quantify the correlation between each of these health features and the SOH of lithium-ion batteries, the Pearson correlation coefficient between each feature and SOH was calculated; the results are shown in Table 1. The coefficient is defined as

$$\operatorname{Cov}(X,Y) = E\big[(X - E(X))(Y - E(Y))\big]$$

$$\rho_{XY} = \frac{\operatorname{Cov}(X,Y)}{\sqrt{D(X)}\,\sqrt{D(Y)}}$$

where $E$ is the mathematical expectation, $D$ is the variance, the square root of $D$ is the standard deviation, and $\operatorname{Cov}(X,Y)$ is the covariance of the random variables $X$ and $Y$.
The quotient of the covariance and the product of the standard deviations is called the correlation coefficient $\rho_{XY}$ of $X$ and $Y$. The larger the absolute value of the Pearson correlation coefficient, the stronger the correlation between the two variables. A coefficient greater than 0 means that the feature is positively correlated with the current battery capacity, and a coefficient less than 0 means that it is negatively correlated. The features with a correlation coefficient greater than 0.95, namely the constant-current charging time $T_{Im}$ and the voltage $V_{200}$ at 200 s, are selected as the input parameters for SOH prediction of the lithium-ion battery.
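The correlation screening can be sketched with NumPy. The sketch computes $\rho_{XY}$ directly from the covariance and standard deviations (it agrees with `np.corrcoef`) and compares $|\rho|$ with 0.95, assuming that strongly negative correlations are kept as well; the feature names and data in any usage are illustrative.

```python
import numpy as np

def pearson(x, y):
    """rho_XY = Cov(X, Y) / (sqrt(D(X)) * sqrt(D(Y)))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    cov = np.mean(xc * yc)                  # E[(X - E X)(Y - E Y)]
    return float(cov / (np.sqrt(np.mean(xc ** 2)) * np.sqrt(np.mean(yc ** 2))))

def select_features(features, soh, threshold=0.95):
    """Return names of features with |rho| > threshold, plus all coefficients."""
    rho = {name: pearson(values, soh) for name, values in features.items()}
    selected = [name for name, r in rho.items() if abs(r) > threshold]
    return selected, rho
```

Returning the full `rho` dictionary alongside the selection makes it easy to tabulate every coefficient, as in Table 1, rather than only the retained ones.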

Incremental Capacity Analysis
ICA quantitatively analyzes the relationship between the voltage and the capacity of a lithium-ion battery during charging and discharging [34]. During charging, the open-circuit voltage (OCV) rises with a relatively flat section: in this section the voltage changes only slowly as the charged capacity increases, which is called a voltage plateau.
This is because the internal chemical reactions of the lithium-ion battery reach a relatively balanced state during charging and discharging. As the number of cycles increases, the voltage plateau gradually shifts upward. The voltage plateau is a characteristic electrochemical phenomenon of lithium-ion batteries, but its change is very gentle and its amplitude is small, which makes quantitative analysis difficult. ICA addresses this by differentiating capacity with respect to voltage:

$$Q_t = \int_0^t I_\tau \, d\tau$$

$$\mathrm{IC} = \frac{dQ}{dV} \approx \frac{Q_t - Q_{t-1}}{V_t - V_{t-1}}$$

where $Q_t$ is the charge at time $t$, $V_t$ is the voltage at time $t$, and $I_t$ is the current at time $t$.
Through this transformation, the flat regions of the voltage plateau become peaks on the incremental capacity (IC) curve, that is, the points of maximum slope of the Q-V curve. Changes in the voltage plateau can thus be observed directly, and the relationship between the external characteristics of the battery and the chemical reactions inside it can be used to predict the battery SOH.
Since the NASA data set does not give the real-time capacity change in each cycle, and the current is almost constant in the constant-current charging mode, the time difference is used in place of the capacity change:

$$\frac{dQ}{dV} \approx \frac{Q_t - Q_{t-1}}{V_t - V_{t-1}} = \frac{I\,\Delta t}{V_t - V_{t-1}}$$

where $I$ is the constant current of 1.5 A in the constant-current mode and $\Delta t$ is the time between the two samples. The grey curve in Figure 2 is the IC curve drawn directly from the original sampling data. The curve is noisy and its basic characteristics cannot be identified directly, so it must be denoised. A Savitzky-Golay (S-G) filter is used here, which fits a low-order polynomial by least squares within a sliding window in the time domain. Its main advantage is that it largely preserves the shape and width of signal peaks while removing noise, which smooths the curve and reduces noise interference. Its degree of smoothing depends on the selected window width, which can be tuned to the application.
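The IC computation and S-G smoothing can be sketched in Python with NumPy and SciPy's `savgol_filter`. This is a minimal sketch under stated assumptions, not the authors' code: the charged capacity is approximated as $I \cdot \Delta t$ with $I = 1.5$ A as in the text, but instead of dividing raw sample differences the sketch first resamples $Q$ onto a uniform voltage grid (step `dv`), a common way to avoid dividing by near-zero $\Delta V$. The grid step, window length (11), and polynomial order (3) are illustrative choices, not values from the paper.

```python
import numpy as np
from scipy.signal import savgol_filter

def ic_curve(t_s, v, current_a=1.5, dv=0.005, window=11, polyorder=3):
    """Smoothed dQ/dV (Ah/V) for the constant-current part of one charge.

    Q is taken as I * (t - t0) / 3600, i.e. the time difference stands in
    for the capacity change. Returns (voltage grid, smoothed IC values).
    """
    t_s = np.asarray(t_s, dtype=float)
    v = np.asarray(v, dtype=float)
    q_ah = current_a * (t_s - t_s[0]) / 3600.0
    # Keep only samples that set a new voltage maximum, so V is strictly increasing
    prev_max = np.maximum.accumulate(v)[:-1]
    keep = np.concatenate(([True], v[1:] > prev_max))
    v_k, q_k = v[keep], q_ah[keep]
    # Assumed step: resample Q on a uniform voltage grid, then differentiate
    grid = np.arange(v_k[0], v_k[-1], dv)
    q_grid = np.interp(grid, v_k, q_k)
    dqdv = np.gradient(q_grid, dv)
    # Savitzky-Golay: local least-squares polynomial fit in a sliding window
    return grid, savgol_filter(dqdv, window_length=window, polyorder=polyorder)
```

A wider window gives a smoother curve but flattens and broadens the peaks, so the window should stay well below the voltage width of the peaks being tracked.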
After filtering, as shown by the red curve in Figure 2, the curve is smooth and has three obvious peaks. Owing to the characteristics of the selected lithium-ion battery, the last two can be regarded as a single peak. In this way, the voltage plateau characteristics are converted into IC curve features, and the relationship between the external characteristics of the battery and its internal electrochemical characteristics can be established more intuitively.
As shown in Figure 3, where the number in the legend is the cycle number, the peak value of the IC curve gradually decreases as cycling progresses, while the voltage corresponding to the peak gradually increases. This is because the chemical reactions inside the battery change with factors such as the loss of active material and of lithium ions, which increases the internal resistance and shifts the voltage plateau upward. Therefore, the changes in peak position and peak amplitude on the IC curve reflect the decline of the battery's state of health during cycling and can be used to estimate the SOH of the battery (Table 2).
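Tracking the peak height and its voltage across cycles, which become the additional model inputs, can be sketched as a simple arg-max over each smoothed IC curve. This assumes the curves are already smoothed and sampled on a voltage grid; the optional `v_window` argument is a hypothetical addition for isolating one peak when several are present.

```python
import numpy as np

def main_peak(grid, ic, v_window=None):
    """Height and voltage of the highest point of a smoothed IC curve.

    v_window: optional (v_low, v_high) restricting the search, e.g. to
    follow one particular peak (hypothetical parameter, not from the text).
    """
    grid = np.asarray(grid, dtype=float)
    ic = np.asarray(ic, dtype=float)
    if v_window is not None:
        mask = (grid >= v_window[0]) & (grid <= v_window[1])
        grid, ic = grid[mask], ic[mask]
    k = int(np.argmax(ic))
    return float(ic[k]), float(grid[k])

def peak_features(curves, v_window=None):
    """curves: list of (grid, ic) pairs, one per cycle -> array of [height, voltage]."""
    return np.array([main_peak(g, c, v_window) for g, c in curves])
```

With aging, the first column of the result is expected to fall and the second to rise, matching the trend described for Figure 3.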

Bi-LSTM Model
LSTM is a type of RNN and an improvement on the simple RNN. The parameters of an RNN are learned with backpropagation through time, in which the error is propagated step by step in reverse time order. When the input sequence is long, the gradients can explode or vanish, which is also called the long-term dependency problem. To address this issue, gating mechanisms were introduced to improve recurrent neural networks, giving LSTM and GRU. Figure 4 shows the internal structure of LSTM, which has three special structures called "gates" [35]. Together, these gates determine more effectively which information is forgotten and which is retained, as follows.
Forget Gate ($f_t$). The forget gate decides which part of the previous cell state $C_{t-1}$ should be forgotten, based on the current input $x_t$ and the previous output $h_{t-1}$:

$$f_t = \sigma_g\left(W_f x_t + U_f h_{t-1} + b_f\right)$$

where $W_f$ is the input weight matrix of the forget gate, $U_f$ is the recurrent weight matrix applied to the previous output, $b_f$ is the bias, and $\sigma_g$ is the sigmoid activation function.
Input Gate ($i_t$). After the forget gate has removed some information, the input gate determines which new information enters the current state $C_t$, based on $x_t$ and $h_{t-1}$:

$$i_t = \sigma_g\left(W_i x_t + U_i h_{t-1} + b_i\right)$$

$$\tilde{C}_t = \sigma_c\left(W_c x_t + U_c h_{t-1} + b_c\right)$$

where $\tilde{C}_t$ is the candidate state.
Output Gate ($o_t$). After the new state $C_t$ is calculated, the output at the current moment is generated through the output gate from $x_t$ and $h_{t-1}$:

$$o_t = \sigma_g\left(W_o x_t + U_o h_{t-1} + b_o\right)$$

The current state $C_t$ and output $h_t$ of the LSTM are then obtained as

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$

$$h_t = o_t * \sigma_c(C_t)$$

where $\sigma_c$ is the tanh activation function and $*$ denotes element-wise (Hadamard) multiplication.
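The gate equations translate directly into one NumPy time step. This is an illustrative sketch, not the authors' implementation; the weights are passed as dictionaries keyed by gate name, a layout chosen here for readability, whereas frameworks such as Keras and PyTorch stack the four gates into single matrices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step following the forget/input/output gate equations.

    W, U, b are dicts keyed by 'f', 'i', 'c', 'o' (forget, input, candidate, output):
    W[g] has shape (hidden, n_in), U[g] (hidden, hidden), b[g] (hidden,).
    """
    pre = {g: W[g] @ x_t + U[g] @ h_prev + b[g] for g in "fico"}
    f_t = sigmoid(pre["f"])                # forget gate
    i_t = sigmoid(pre["i"])                # input gate
    c_tilde = np.tanh(pre["c"])            # candidate state
    o_t = sigmoid(pre["o"])                # output gate
    c_t = f_t * c_prev + i_t * c_tilde     # element-wise products
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t
```

With all weights and biases at zero every gate outputs 0.5 and the candidate is 0, so the cell state simply halves at each step, which is a convenient sanity check.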
Bi-LSTM is a variant of LSTM that performs better in many tasks. Because LSTM depends strongly on the order of its inputs, shuffling or reversing the time steps completely changes the features that LSTM extracts from a sequence. Bi-LSTM takes advantage of this sensitivity to order by combining two ordinary LSTMs, one processing the sequence in forward time order and one in reverse, to form a bidirectional network. This makes it possible to mine the correlation between the characteristics of Li-ion batteries and their SOH from both sequence directions, so that Bi-LSTM can capture information that a unidirectional network may miss.
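A bidirectional pass can be sketched by running one LSTM over the sequence and a second LSTM over the reversed sequence, then re-reversing the second output so both directions are aligned per time step before concatenation. This is a minimal NumPy sketch, assuming externally supplied weights with the four gates stacked in the order forget, input, candidate, output; in practice a trained framework layer such as Keras `Bidirectional(LSTM(...))` would be used instead.

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_run(xs, params, hidden):
    """Run one LSTM over xs of shape (T, n_in); return hidden states (T, hidden)."""
    W, U, b = params  # W: (4*hidden, n_in), U: (4*hidden, hidden), b: (4*hidden,)
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    out = []
    for x in xs:
        z = W @ x + U @ h + b
        f, i, g, o = np.split(z, 4)        # stacked pre-activations [f, i, candidate, o]
        c = _sigmoid(f) * c + _sigmoid(i) * np.tanh(g)
        h = _sigmoid(o) * np.tanh(c)
        out.append(h)
    return np.array(out)

def bilstm(xs, fwd_params, bwd_params, hidden):
    """Concatenate a forward pass and a time-reversed pass, aligned per time step."""
    h_fwd = lstm_run(xs, fwd_params, hidden)
    h_bwd = lstm_run(xs[::-1], bwd_params, hidden)[::-1]
    return np.concatenate([h_fwd, h_bwd], axis=1)   # shape (T, 2*hidden)
```

The backward output at time step 0 has already seen the whole sequence, which is the extra context a unidirectional LSTM lacks.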

Data Processing.
During the measurement and collection of battery data, noise from uncontrollable factors such as the environment and the measuring equipment inevitably produces abnormal values. Therefore, to improve the accuracy of the proposed combined model, the boxplot method is used to identify abnormal data and strengthen the underlying correlations in the data [36]. Outliers are determined from the quartiles and the interquartile range: data less than $Q_1 - 1.5\,I_{QR}$ or greater than $Q_3 + 1.5\,I_{QR}$ are treated as abnormal:

$$D_{\mathrm{low}} = Q_1 - 1.5\,I_{QR}, \qquad D_{\mathrm{high}} = Q_3 + 1.5\,I_{QR}, \qquad I_{QR} = Q_3 - Q_1$$

where $D_{\mathrm{low}}$ and $D_{\mathrm{high}}$ are the lower and upper boundaries of abnormal data, $Q_1$ and $Q_3$ are the first and third quartiles of the battery data, and $I_{QR}$ is the interquartile range. Data outside these boundaries are deleted to improve data quality.
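The boxplot rule can be sketched as a NumPy mask. This sketch assumes that a whole cycle (row) is dropped when any of its features falls outside the bounds; note that `np.percentile` uses linear interpolation between samples by default, so other quartile conventions give slightly different bounds.

```python
import numpy as np

def iqr_inlier_mask(X, k=1.5):
    """True for rows whose every column lies in [Q1 - k*IQR, Q3 + k*IQR]."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                      # treat a 1-D series as one column
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    d_low, d_high = q1 - k * iqr, q3 + k * iqr
    return np.all((X >= d_low) & (X <= d_high), axis=1)
```

Returning a mask rather than the filtered data lets the same rows be removed from the feature matrix and from the SOH labels together.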
In the deep learning algorithm, the battery health features have different dimensions and scales [34], which increases the amount of computation and can cause gradient explosion during fitting [35]. To improve the convergence speed and accuracy of the model, Z-score standardization is adopted, using the mean and standard deviation of the original data [37]. The processed data have a mean of 0 and a standard deviation of 1:

$$y = \frac{x - \mu}{\sigma}$$

where $x$ and $y$ are the data before and after standardization, $\mu$ is the mean of the original data, and $\sigma$ is its standard deviation. After processing, all features are centered on 0 and have comparable magnitudes, which benefits deep learning training.
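A minimal sketch of Z-score standardization follows. Fitting $\mu$ and $\sigma$ on the training cycles only and reusing them for the test cycles is an assumption made here to avoid information leakage; the text does not say which subset the statistics come from. The guard for a zero $\sigma$ is also an addition.

```python
import numpy as np

def fit_zscore(X_train):
    """Per-feature mean and standard deviation of the training data."""
    X_train = np.asarray(X_train, dtype=float)
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)   # guard against constant features
    return mu, sigma

def apply_zscore(X, mu, sigma):
    """y = (x - mu) / sigma, applied column-wise."""
    return (np.asarray(X, dtype=float) - mu) / sigma
```

Standardization only rescales each feature; it does not make the data normally distributed, so skewed features remain skewed after this step.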

SOH Estimation Method.
The overall SOH estimation process is shown in Figure 5. First, abnormal data in the battery charging voltage, charging current, and charging and discharging time are screened out and removed. Meanwhile, the IC curve is extracted for incremental capacity analysis and the relevant features are obtained, and the health feature parameters are selected with the Pearson correlation coefficient and normalized. Then, all normalized lithium-ion battery features are used as inputs to the Bi-LSTM network for model fitting, and the outputs are combined through the adaptive weight module to produce the final SOH estimate.
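The text does not specify how the adaptive weights are computed. One common choice, assumed here purely for illustration and not taken from the paper, is to weight each sub-model's prediction in proportion to the inverse of its MSE on validation data, so that more accurate sub-models contribute more. The function names are hypothetical.

```python
import numpy as np

def inverse_mse_weights(y_val, preds_val, eps=1e-12):
    """Weights proportional to 1 / MSE on validation data, normalised to sum to 1.

    preds_val: array of shape (n_models, n_samples).
    """
    y_val = np.asarray(y_val, dtype=float)
    preds_val = np.asarray(preds_val, dtype=float)
    mse = np.mean((preds_val - y_val) ** 2, axis=1)
    inv = 1.0 / np.maximum(mse, eps)        # eps avoids division by zero
    return inv / inv.sum()

def combine(preds, weights):
    """Weighted sum of the sub-model predictions, shape (n_samples,)."""
    return np.asarray(weights, dtype=float) @ np.asarray(preds, dtype=float)
```

Because the weights sum to 1, the combined SOH estimate stays on the same scale as the individual predictions.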

Case Analysis.
To verify that the proposed ICA-Bi-LSTM combined model improves the SOH estimation accuracy of lithium-ion batteries, the B0005 battery data from the NASA data set were used as simulation data. The B0005 battery was cycled 166 times; the first 140 cycles are used as the training set and the last 21 cycles as the test set (5 cycles were removed as abnormal data). The parameters of Bi-LSTM are set as follows: the time step of the input layer is 1, the input dimension is 4, the hidden layer has 64 neurons, the maximum number of training epochs is 175, and dropout with a rate of 0.2 is added for regularization. In the comparison group, L1 and L2 regularization with a coefficient of 0.1 is added to the BPNN, and LSTM and GRU also use a dropout layer with a rate of 0.2. For all methods, the neural network learning rate is set to 0.001 [36]. In a Python simulation environment, the BP neural network, LSTM, GRU, and ICA-Bi-LSTM are used to predict the SOH of lithium-ion batteries, and the results are shown in Figure 6. The training error is the mean square error (MSE) between the predicted and actual battery capacity:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

where $n$ is the number of samples, $y_i$ is the actual value, and $\hat{y}_i$ is the predicted value. The comparison in Figure 6 shows that, after training, the ICA-Bi-LSTM combined model, which mines features in depth with ICA and a bidirectional LSTM, achieves higher accuracy than the traditional BPNN, LSTM, and GRU and generalizes well to the test set. Although its minimum MSE is not much lower than that of the other models, its error curve is smooth and nearly flat, which is an advantage in practical engineering applications.
In terms of stability and fitting speed, the advantages of the bidirectional LSTM stand out: compared with the other models, it completes fitting quickly while maintaining considerable accuracy, balancing fitting efficiency and fitting accuracy.
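The chronological split, the MSE, and the relative MSE reduction used to compare models can be sketched as follows. Reading the reported percentages (for example 55.17% against BPNN) as $(\mathrm{MSE}_{\text{baseline}} - \mathrm{MSE}_{\text{ICA-Bi-LSTM}})/\mathrm{MSE}_{\text{baseline}}$ is an assumption; the helper names are hypothetical and no results from the paper are reproduced here.

```python
import numpy as np

def chrono_split(X, y, n_train):
    """First n_train cycles train the model, the remaining cycles test it (no shuffling)."""
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]

def mse(y_true, y_pred):
    """MSE = (1/n) * sum((y_i - y_hat_i)^2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def mse_reduction_pct(mse_baseline, mse_model):
    """Relative MSE reduction of a model against a baseline, in percent."""
    return 100.0 * (mse_baseline - mse_model) / mse_baseline
```

Splitting by cycle order rather than at random keeps the test cycles strictly later in the battery's life than the training cycles, which is what an SOH forecast has to handle.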
Figure 4: LSTM structure.

Mathematical Problems in Engineering
Table 3 quantifies the average training-set MSE and the average validation-set MSE of the four models in SOH prediction of lithium-ion batteries. The observed fitting behaviour is mainly attributable to the limited size of the data set. Considering both the training-set and validation-set MSE, ICA-Bi-LSTM still shows a significant advantage: compared with the BPNN, LSTM, and GRU models, it reduces the MSE by 55.17%, 49.28%, and 41.47%, respectively. The comparative analysis therefore shows that the ICA-Bi-LSTM prediction method proposed in this study achieves higher prediction efficiency and accuracy.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.