Compound Fault Diagnosis of Stator Interturn Short Circuit and Air Gap Eccentricity Based on Random Forest and XGBoost

Taking the traction motor of the CRH2 high-speed train as the research object, this paper proposes a diagnosis method based on random forest and XGBoost for the compound fault resulting from stator interturn short circuit and air gap eccentricity. First, the U-phase and V-phase currents are used as the fault diagnosis signals, and the Savitzky-Golay filtering method is used to reduce the noise in the signals. Second, wavelet packet decomposition is used to extract the compound fault features, and the high-dimensional features are then optimized by principal component analysis (PCA). Finally, random forest and XGBoost are combined to detect compound faults. Using experimental data from the CRH2 semiphysical simulation platform, the diagnosis of different fault modes is completed with high diagnosis accuracy, which verifies the validity of the method.


Introduction
The traction motor is one of the key components of the drive system in the CRH2 high-speed train [1]. Because of the harsh working environment and its special structure, the traction motor is prone to failure. The short circuit between stator windings is a common fault in the motor [2]. Some degree of static air gap eccentricity exists in almost every motor in engineering practice, so a short circuit in the stator windings is effectively a compound fault of stator interturn short circuit and air gap eccentricity. The traction motor of a high-speed railway train is installed on the bogie. Because it works for a long time in a harsh environment, mechanical wear destroys the symmetry of the air gap magnetic field and causes air gap eccentricity [3]. Copper chips in the motor can easily damage the insulation layer of the stator windings, resulting in short circuits between windings [4]. If not diagnosed in time, the fault leads to further expansion of the insulation damage [5], which in turn increases the number of short-circuited turns and causes more serious distortion of the air gap magnetic field [6].
In engineering, operating conditions vary widely across industries. For fault analysis of complex mechanical components under various nonlinear responses, Keshtegar et al. [7] proposed modified response surface basis models for the failure response of a turbine blisk; the approach is also multiphase, comprising two regression processes that regress the input variables and calibrate the model more precisely. In the traction motor domain, health diagnostics for induction traction motors have been reviewed in [8]. Newer techniques such as deep learning [9] and transfer learning [10], in which detection knowledge is learned from another domain, are useful when data are scarce. Genetic algorithms that handle various fault types using multiobjective optimization methods have also been studied [11,12].
Motivated by these observations, a multiphase diagnosis method based on random forest and XGBoost for the compound fault of stator interturn short circuit and air gap eccentricity is proposed in this paper. Compared with the existing results, the main contributions of this paper are threefold: (1) SG filtering is used for signal denoising pretreatment. Wavelet packet decomposition is used to extract the detailed information of each frequency band of the current signal to form the fault feature vector. (2) Because the dimension of the feature vector is too high, the PCA method is used to reduce the dimension of the feature vector and eliminate unimportant features. (3) The feature vectors after dimensionality reduction are used to train the random forest classifier, and the most important features are selected to train the XGBoost classifier, which improves the prediction accuracy and generalization performance of the classification model. The trained classification model is used to identify different fault modes, and a better compound fault diagnosis result is obtained.
The diagnostic flowchart is shown in Figure 1. The remainder of this paper is organized as follows. Section 2 proposes an SG filtering method to preprocess the three-phase current of the motor. Section 3 presents the signal feature extraction and optimization based on wavelet packet and PCA. Fault diagnosis based on random forest and XGBoost is provided in Section 4. Section 5 introduces the CRH2 fault injection simulation platform. The experimental results and analysis are provided in Section 6. Section 7 concludes the paper.

Signal Noise Reduction Based on SG Filtering
The traction system of a high-speed train works in a tough environment, and the current signal is inevitably affected by noise [12]. Savitzky-Golay (SG) filtering is one of the commonly used noise reduction pretreatment methods in signal analysis; it improves signal smoothness and reduces the impact of noise [13]. The SG filtering method is an improvement of the moving average smoothing algorithm that reduces the loss of useful signal information during smoothing [14]. According to the average trend of the signal in the time domain, suitable filtering parameters are selected, and a least squares polynomial fit is carried out within the sliding time window [15]. By changing the window width, the SG filter is applied to smooth the three-phase current signal, which reduces noise interference and facilitates the extraction of fault features in the frequency domain [16,17].
For a signal sequence x[n], let the SG filter window have size 2M + 1 and be centered at n = 0. The following polynomial of order N is used to fit the data points in the window:

$$p(n) = \sum_{k=0}^{N} a_k n^k \tag{1}$$

According to the fitting polynomial, the squared error between the fitted curve and the original data can be obtained as follows:

$$E_N = \sum_{n=-M}^{M} \bigl(p(n) - x[n]\bigr)^2 = \sum_{n=-M}^{M} \Bigl(\sum_{k=0}^{N} a_k n^k - x[n]\Bigr)^2 \tag{2}$$

The output of the filter at the window center is y[0] = p(0) = a_0 (for example, when N = 2 and M = 2), so only the constant term of p(n) is required. The SG filtering method obtains the constant term by applying an FIR filter to the original data through convolution, that is, by taking a weighted average of the data, as shown in the following formula [18]:

$$y[n] = \sum_{m=-M}^{M} h[m]\, x[n-m] \tag{3}$$

When the residual of the least squares fit is smallest, its partial derivative with respect to each coefficient is zero [19], as shown in the following formula:

$$\frac{\partial E_N}{\partial a_i} = \sum_{n=-M}^{M} 2 n^i \Bigl(\sum_{k=0}^{N} a_k n^k - x[n]\Bigr) = 0, \quad i = 0, 1, \ldots, N \tag{4}$$

Auxiliary matrices A and B are introduced as follows:

$$A = \{\alpha_{n,i}\}, \quad \alpha_{n,i} = n^i, \quad -M \le n \le M, \; 0 \le i \le N; \qquad B = A^T A \tag{5}$$

According to the above definition, formula (4) can be rewritten as

$$\sum_{k=0}^{N} \Bigl(\sum_{n=-M}^{M} n^{i+k}\Bigr) a_k = \sum_{n=-M}^{M} n^i x[n], \quad i = 0, 1, \ldots, N \tag{6}$$

Then, the vectors x and a are built as follows:

$$x = \bigl[x[-M], \ldots, x[-1], x[0], x[1], \ldots, x[M]\bigr]^T, \quad a = [a_0, a_1, \ldots, a_N]^T \tag{7}$$

Therefore, we can obtain

$$B a = A^T A a = A^T x, \qquad a = (A^T A)^{-1} A^T x = H x \tag{8}$$

In formula (8), only the first row of the matrix H is required, because it contains the convolution coefficients that produce a_0. According to the expression of H, it is determined only by the order of the least squares polynomial and the size of the filter window, and it is independent of the original data. The SG filtering method is used to preprocess the three-phase current of the motor [20], which effectively reduces noise interference and enhances the discrimination between fault signals and normal signals in the frequency domain, so that the accuracy of compound fault feature extraction can be improved.
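The least squares solution a = Hx described above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name sg_coefficients is hypothetical and is not taken from the paper. It builds the matrix A of powers n^i, forms H = (A^T A)^{-1} A^T, and returns the first row of H, which holds the smoothing weights.

```python
import numpy as np

def sg_coefficients(M: int, N: int) -> np.ndarray:
    """Return the 2M+1 Savitzky-Golay smoothing weights for polynomial order N."""
    n = np.arange(-M, M + 1)                    # window positions -M..M
    A = np.vander(n, N + 1, increasing=True)    # A[row, i] = n**i
    H = np.linalg.inv(A.T @ A) @ A.T            # H = (A^T A)^{-1} A^T, shape (N+1, 2M+1)
    return H[0]                                 # first row yields a_0, the smoothed center value

# Window 2M+1 = 5 with a quadratic fit (M = 2, N = 2).
h = sg_coefficients(M=2, N=2)
# The weights equal (-3, 12, 17, 12, -3) / 35 and do not depend on the data.
```

Because the weights depend only on M and N, they can be computed once and reused for every window position.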

Signal Feature Extraction and Optimization Based on Wavelet Packet and PCA
In wavelet packet transformation, a series of filters with the same bandwidth but different center frequencies are used to filter the signal, and the signal is decomposed into several layers so that its details can be analyzed [21]. The wavelet packet transform decomposes both the low-frequency and the high-frequency components of the signal at every level, which greatly improves the resolution and enables better local time-frequency analysis of the signal without redundancy or omission [22]. In this paper, the denoised stator current is decomposed by wavelet packet, and the wavelet packet coefficients of the final layer are obtained. When the signal is reconstructed from the wavelet packet coefficients, the energy of each node is defined as the squared 2-norm of the wavelet packet coefficients of the node [23]:

$$E_{mi} = \lVert d_{mi} \rVert_2^2 = \sum_{k} \lvert d_{mi}(k) \rvert^2, \quad i = 0, 1, \ldots, 2^m - 1 \tag{9}$$

where m is the number of decomposition levels and d_{mi} is the wavelet packet coefficient vector of the i-th node in the last layer. The wavelet packet energy feature vector is extracted using the following formula, in which each element is the ratio of the energy of one node to the total energy of all nodes:

$$T = \left[\frac{E_{m0}}{E}, \frac{E_{m1}}{E}, \ldots, \frac{E_{m,2^m-1}}{E}\right], \quad E = \sum_{i=0}^{2^m-1} E_{mi} \tag{10}$$
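The energy feature extraction described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the Haar filter is used as a simple stand-in for the wavelet family, the two test signals are synthetic 50 Hz sinusoids (a hypothetical fundamental frequency), and the function names are invented for this example. The signal length must be divisible by 2 raised to the number of levels.

```python
import numpy as np

def haar_wavelet_packet(x, levels):
    """Full wavelet packet tree with the Haar filter; returns the 2**levels leaf nodes."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        children = []
        for c in nodes:
            even, odd = c[0::2], c[1::2]
            children.append((even + odd) / np.sqrt(2.0))  # low-pass branch
            children.append((even - odd) / np.sqrt(2.0))  # high-pass branch
        nodes = children
    return nodes

def energy_features(x, levels=3):
    """Normalized node energies of the last decomposition layer."""
    leaves = haar_wavelet_packet(x, levels)
    energies = np.array([np.sum(d ** 2) for d in leaves])  # squared 2-norm per node
    return energies / energies.sum()

# Synthetic stand-ins for the denoised U-phase and V-phase currents (256 samples at 2500 Hz).
t = np.arange(256) / 2500.0
i_u = np.sin(2 * np.pi * 50 * t)
i_v = np.sin(2 * np.pi * 50 * t - 2 * np.pi / 3)
features = np.concatenate([energy_features(i_u), energy_features(i_v)])  # 8 + 8 values
```

With three levels, each phase contributes eight normalized energies, so concatenating two phases gives a 16-dimensional vector.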
The compound fault feature vector obtained above has a high dimension. If it is used directly as the input sample of the classifier, the complexity of the classifier model increases and many abnormal features are learned, resulting in overfitting and poor generalization to new test data [24]. Therefore, PCA is used to reduce the dimension of the feature vectors. While preserving as much of the original information as possible, the correlation between features is analyzed and uncorrelated principal component features are selected, so that unimportant features such as noise are removed and the generalization performance of the classifier is enhanced.
Principal component analysis (PCA) is often used for dimensionality reduction and compression of data [25]. Its core idea is to map n-dimensional features to a k-dimensional space [26]. The mutually orthogonal features of this k-dimensional space are called principal components, which are constructed from the original features [27]. In order to keep as much information of the original data as possible in the new dataset, the PCA algorithm needs to find a set of k basis vectors such that the projections of the original data onto each basis vector are scattered as widely as possible, that is, they have a large variance [28]. At the same time, the basis vectors must be orthogonal to each other to ensure a small degree of coupling [29]. In general, covariance is used to measure the linear correlation between the projections of the features on different basis vectors.
In practice, the covariance matrix of the dataset is the basis of PCA. For n-dimensional random variables, the covariance between two feature dimensions a and b can be expressed as follows:

$$\mathrm{Cov}(a, b) = \frac{1}{m} \sum_{i=1}^{m} (x_{a,i} - \bar{x}_a)(x_{b,i} - \bar{x}_b) \tag{11}$$

where m is the number of samples and $\bar{x}_a$ and $\bar{x}_b$ are the sample means of the two dimensions. Covariance is a measure of the degree of linear correlation between two variables. When the covariance is positive, the two variables are positively correlated. When the covariance is negative, the two variables are negatively correlated. When the covariance is 0, the two variables are linearly uncorrelated. For n-dimensional samples, the covariances form a symmetric covariance matrix [30]. For example, the covariance matrix of 3-dimensional data A = (x, y, z) can be expressed as

$$C = \begin{bmatrix} \mathrm{Cov}(x, x) & \mathrm{Cov}(x, y) & \mathrm{Cov}(x, z) \\ \mathrm{Cov}(y, x) & \mathrm{Cov}(y, y) & \mathrm{Cov}(y, z) \\ \mathrm{Cov}(z, x) & \mathrm{Cov}(z, y) & \mathrm{Cov}(z, z) \end{bmatrix} \tag{12}$$

The elements on the main diagonal of the covariance matrix are the variances of the individual feature dimensions, while the off-diagonal elements are the covariances between pairs of feature dimensions [31]. The smaller the correlation between different features of the data, the smaller the off-diagonal values. When different features are not correlated with each other, the covariance matrix becomes a diagonal matrix. Therefore, the goal of PCA dimensionality reduction is to make the diagonal elements of the covariance matrix of the feature dataset as large as possible and the off-diagonal elements as small as possible.
For the original data feature set Q, the steps of the PCA algorithm based on eigenvalue decomposition of the covariance matrix are as follows:
(1) Subtract the mean of each dimension from that dimension.
(2) Calculate the covariance matrix QQ^T/m, where m is the number of samples (the columns of Q).
(3) Find the eigenvalues λ_i (i = 1, 2, ..., n) of the covariance matrix by eigenvalue decomposition, together with the corresponding eigenvectors.
(4) Arrange the eigenvalues from large to small, and calculate the total contribution rate of the eigenvalues to select the features that retain the most information of the original data. The total contribution rate of the first k eigenvalues can be expressed as

$$\eta_k = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{n} \lambda_i} \tag{13}$$

In general applications, the eigenvectors corresponding to the first k eigenvalues whose total contribution rate exceeds 85% are selected to form the dimensionality reduction matrix P, and the principal component dataset Q′ is then obtained as Q′ = PQ.
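The four PCA steps above can be sketched with NumPy as follows. This is a minimal illustration, assuming Q is stored with one feature per row and one sample per column; the function name pca_reduce and the synthetic data are hypothetical and only serve to exercise the steps.

```python
import numpy as np

def pca_reduce(Q, threshold=0.85):
    """Q has shape (n_features, m_samples). Returns (P, Q_reduced, contribution)."""
    Qc = Q - Q.mean(axis=1, keepdims=True)        # step 1: center each feature dimension
    C = Qc @ Qc.T / Qc.shape[1]                   # step 2: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)          # step 3: eigen-decomposition (C is symmetric)
    order = np.argsort(eigvals)[::-1]             # step 4: sort eigenvalues, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contribution = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(contribution, threshold) + 1)  # smallest k reaching the threshold
    P = eigvecs[:, :k].T                          # k x n dimensionality reduction matrix
    return P, P @ Qc, contribution

# Synthetic 16-dimensional data driven by 3 latent factors plus small noise (illustrative only).
rng = np.random.default_rng(0)
latent = rng.normal(size=(3, 100))
mixing = rng.normal(size=(16, 3))
Q = mixing @ latent + 0.05 * rng.normal(size=(16, 100))
P, Q_reduced, contribution = pca_reduce(Q)
```

The covariance matrix of Q_reduced is diagonal, which matches the stated goal of removing correlation between the retained features.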

The Principle of Random Forest.
Random forest (RF) is a classification model based on decision trees and the Bagging algorithm [32]. A random forest can be regarded as a set of decision tree classifiers. It is a classifier with high accuracy that can process a large number of input features and can evaluate the importance of features when determining categories. It has a high tolerance for abnormal feature values and is widely used in gene classification, image recognition, and other classification problems. The classification model of random forest is composed of n decision trees, each of which classifies the input samples. Finally, the random forest votes on the classification results of the decision trees, and the result with the most votes is taken as the final classification result of the random forest [33]. In this paper, the feature selection algorithm used in the decision tree model of random forest is the CART algorithm, and every decision tree is a binary tree. Random forest is an ensemble of multiple decision trees, and the Bagging algorithm is a common ensemble method for the random forest model [34]. This method first trains several classifiers with training samples and then combines the classifiers by aggregating their outputs to obtain the final classification result. Compared with a single classifier, the ensemble classifier has higher accuracy and better generalization [35]. The Bagging algorithm draws training samples from the given training set with replacement and generates a decision tree from each drawn sample set; all decision tree models then classify the test sample set, and their classification results are put to a vote. Finally, the category with the most votes is taken as the classification result of the test sample.
The Bagging algorithm is a simple but stable ensemble learning algorithm that can effectively improve the generalization ability of the decision tree classification model. The random forest classification algorithm adds randomness to the steps of the Bagging algorithm in two respects: (1) The original dataset is randomly sampled with replacement. The data in different subdatasets can be repeated, but each subdataset contains the same amount of data as the original dataset. A decision tree classifier is trained on each subdataset, which yields multiple output results. When new data are used to test the classification effect, the final classification result of the random forest is obtained by voting on the classification results of the decision trees. (2) The random forest randomly extracts a certain number of feature attributes from the feature set as the candidate split attributes of a node and then selects the optimal attribute among them, so that the classification results of the decision trees differ from one another and the performance of the classifier is improved.
The classification process of random forest is as follows: (1) The Bagging algorithm is used to randomly sample the training dataset with replacement to obtain the training sample sets θ_k. (2) Based on the CART algorithm, a binary classification tree is generated for each random training sample set, and each node of the classification tree selects its split attribute from a random subset of feature attributes [36].
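The two sources of randomness and the final vote described above can be illustrated with a short sketch. The helper names are hypothetical, tree training itself is omitted, and the sizes used (225 training samples, 3 candidate features out of 9) are assumptions chosen only to match the scale of the experiments.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

def bootstrap_indices(n_samples):
    """Draw n_samples indices with replacement (one training sample set theta_k)."""
    return rng.integers(0, n_samples, size=n_samples)

def random_feature_subset(n_features, size):
    """Pick `size` distinct candidate features for one node split."""
    return rng.choice(n_features, size=size, replace=False)

def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees for one sample."""
    return Counter(tree_predictions).most_common(1)[0][0]

idx = bootstrap_indices(225)               # same size as the training set, repeats allowed
candidates = random_feature_subset(9, 3)   # candidate split attributes for one node
label = majority_vote([0, 2, 2, 1, 2])     # class 2 receives three of the five votes
```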

RF Feature Importance Assessment.
In the process of random forest construction, the importance of the features used in each tree is different, so the importance of each feature needs to be evaluated as the basis for feature selection. In this paper, the Gini index is used to rank the importance of the features, and the features with the highest importance are selected as training samples of the XGBoost classifier in the next section. The feature importance score, denoted VIM, measures the change of node impurity caused by splitting on feature x_j across all decision trees in the random forest model, with impurity measured by the Gini index. The Gini index of node m can be defined as

$$G_m = \sum_{k=1}^{K} p_{mk}(1 - p_{mk}) = 1 - \sum_{k=1}^{K} p_{mk}^2 \tag{14}$$

In the above formula, K is the number of categories and p_{mk} is the proportion of samples in node m that belong to category k. The importance of x_j at node m can be defined as the change of the Gini index before and after the node splits, that is,

$$VIM_{jm} = G_m - G_l - G_r \tag{15}$$

where G_l and G_r, respectively, represent the Gini indexes of the two child nodes obtained after node m splits to the lower layer of the decision tree. If the nodes of the i-th decision tree that split on x_j form the set M, the importance of x_j in the tree can be obtained as follows:

$$VIM_{ij} = \sum_{m \in M} VIM_{jm} \tag{16}$$

If there are n trees in the random forest, the total importance score of x_j can be further obtained:

$$VIM_j = \sum_{i=1}^{n} VIM_{ij} \tag{17}$$

Finally, the importance scores of all features can be normalized as follows:

$$VIM_j^{\mathrm{norm}} = \frac{VIM_j}{\sum_{j'=1}^{c} VIM_{j'}} \tag{18}$$

where c is the number of features, so the denominator is the sum of the importance scores of all features. In this way, the importance of the features used in the random forest model training can be obtained.
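The Gini-based score described above can be worked through on a small example. The sketch below follows the unweighted difference stated in the text (parent impurity minus the impurities of the two children); the class counts and feature names are hypothetical.

```python
def gini(class_counts):
    """Gini index of a node given the number of samples in each class."""
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)

def node_importance(parent, left, right):
    """Change of the Gini index when a node splits into a left and a right child."""
    return gini(parent) - gini(left) - gini(right)

def normalize(scores):
    """Scale the per-feature scores so that they sum to 1."""
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

# Hypothetical splits, each given as (parent, left, right) class counts.
splits = {
    "x0": [([50, 50], [50, 0], [0, 50])],   # perfect split: 0.5 - 0 - 0
    "x1": [([60, 40], [55, 5], [5, 35])],   # imperfect split with a smaller gain
}
totals = {name: sum(node_importance(*s) for s in nodes) for name, nodes in splits.items()}
scores = normalize(totals)
```

With the unweighted form, a weak split can yield a negative score; common implementations such as scikit-learn weight each child's impurity by its share of the node's samples, which avoids this.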

The Principle of XGBoost.
The XGBoost (Extreme Gradient Boosting) algorithm is an implementation of the traditional boosting tree algorithm [37]. On the basis of the traditional gradient boosting decision tree, the XGBoost algorithm rewrites the objective function and defines the tree complexity. The traditional boosting tree model reduces the loss through the iteration of several decision trees and finally obtains the classification model. During the construction of each decision tree, node splitting is carried out according to the splitting criteria of the regression tree, including the least squares loss and the logarithmic loss.
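In XGBoost, the tree structure in the boosting model is decomposed into a structure part and a leaf weight part, and the complexity of the tree is defined as follows:

$$\Omega(f_t) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2 \tag{19}$$

where T is the number of leaf nodes, γ is the control parameter of T, λ is the regularization coefficient of the leaf weights, w_j is the weight of leaf node j, and f_t is the model function of the t-th tree. The boosting tree model is calculated as shown in formula (20), where L is the loss function and $\hat{y}_i$ and $y_i$, respectively, represent the predicted value and the actual value:

$$L = \sum_{i=1}^{n} l(y_i, \hat{y}_i) \tag{20}$$

Writing the prediction after the t-th tree is added as $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$, the objective function of tree model learning over the n training samples can be defined as shown in the following equation, where c is a constant:

$$Obj^{(t)} = \sum_{i=1}^{n} l\bigl(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\bigr) + \Omega(f_t) + c \tag{21}$$

For the least squares loss, the optimal solution of the objective function in the above equation is relatively simple to find, but the solution for other loss functions is more complex. For this reason, the XGBoost algorithm approximates the objective with the second-order Taylor expansion, which is shown in the following equation:

$$f(x + \Delta x) \approx f(x) + f'(x) \Delta x + \frac{1}{2} f''(x) \Delta x^2 \tag{22}$$

For the original objective function, two variables are defined so that it can be expanded:

$$g_i = \frac{\partial\, l\bigl(y_i, \hat{y}_i^{(t-1)}\bigr)}{\partial \hat{y}_i^{(t-1)}}, \qquad h_i = \frac{\partial^2 l\bigl(y_i, \hat{y}_i^{(t-1)}\bigr)}{\partial \bigl(\hat{y}_i^{(t-1)}\bigr)^2} \tag{23}$$

Then, the original objective function can be rewritten in the following form:

$$Obj^{(t)} \approx \sum_{i=1}^{n} \Bigl[ Y_i + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \Bigr] + \Omega(f_t) + c \tag{24}$$

where $Y_i = l\bigl(y_i, \hat{y}_i^{(t-1)}\bigr)$. Define the set of samples assigned to leaf j as $I_j = \{ i \mid q(x_i) = j \}$, where q maps a sample to its leaf index, so that $f_t(x_i) = w_{q(x_i)}$. Then, after the constant terms are dropped, the objective function used in model training can be expressed as

$$Obj^{(t)} = \sum_{j=1}^{T} \Bigl[ G_j w_j + \frac{1}{2} (H_j + \lambda) w_j^2 \Bigr] + \gamma T \tag{25}$$

where $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$.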
The value of w_j can be obtained by setting the derivative of the above equation with respect to w_j to zero, and the optimal value of the objective can be obtained by substituting it back into the function, as shown in equations (26) and (27):

$$w_j^* = -\frac{G_j}{H_j + \lambda} \tag{26}$$

$$Obj^* = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T \tag{27}$$

In this paper, a subset of the features with high importance in the random forest model is selected to train the XGBoost model, which completes the classification and recognition of different fault modes, and the classification accuracy of the random forest model (RF) is compared with that of the classification model combining random forest and XGBoost (RF + XGBoost).
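The closed-form leaf weights and objective value in equations (26) and (27) can be verified on a tiny example. The sketch below is an illustration only: the function name is hypothetical, the tree structure is given directly as a leaf index per sample, and the squared loss l = 0.5 (y - ŷ)^2 is assumed so that g_i = ŷ_i - y_i and h_i = 1.

```python
import numpy as np

def optimal_leaves(g, h, leaf_index, lam=1.0, gamma=0.0):
    """Optimal leaf weights and objective value for a fixed tree structure."""
    T = int(leaf_index.max()) + 1
    G = np.bincount(leaf_index, weights=g, minlength=T)   # sum of g_i over each leaf
    H = np.bincount(leaf_index, weights=h, minlength=T)   # sum of h_i over each leaf
    w = -G / (H + lam)                                     # equation (26)
    obj = -0.5 * np.sum(G ** 2 / (H + lam)) + gamma * T    # equation (27)
    return w, obj

y = np.array([1.0, 2.0, 3.0, 4.0])
y_prev = np.zeros(4)                  # prediction before the new tree is added
g = y_prev - y                        # first-order gradients of the squared loss
h = np.ones(4)                        # second-order gradients of the squared loss
leaf_index = np.array([0, 0, 1, 1])   # samples 0,1 fall in leaf 0; samples 2,3 in leaf 1
w, obj = optimal_leaves(g, h, leaf_index, lam=1.0)
# G = [-3, -7], H = [2, 2], so w = [1, 7/3] and obj = -0.5 * (9/3 + 49/3) = -29/3
```

A larger λ shrinks the leaf weights toward zero, and a larger γ penalizes trees with more leaves.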

CRH2 Fault Injection Simulation Platform
The experimental data used in this paper come from the CRH2 experimental simulation platform of the Electric Locomotive Research Institute in Hunan Province. The platform consists of an upper computer, dSPACE, a traction control unit (DCU), and a signal regulator. Its internal control strategy and relevant parameters are consistent with the traction drive system of the CRH2 high-speed train. The platform appearance is shown in Figure 2.
The main circuit simulation module is built in MATLAB and controlled by the physical DCU; it mainly includes the traction transformer, rectifier, intermediate DC part, inverter, and 4 asynchronous motors. MATLAB is used to generate code on the upper PC, and the code is downloaded to the dSPACE real-time simulator to run. The upper PC can control the real-time simulation module of the dSPACE simulator using software compiled in MATLAB.
The dSPACE real-time simulator is mainly used to control the corresponding real-time computing simulation module.
The DCU functions as the main circuit control system. It sends control instructions to the simulation module and receives the feedback values calculated by dSPACE to control the main breaker and the contactor in the main circuit.
The DCU connects to the debugging PC through its own Ethernet port. Thus, the control parameters as well as the data can be dynamically displayed and recorded; moreover, the DCU supports online adjustment of parameters and program download.

Experimental Results and Analysis
In this section, the experimental results of the three parts of the compound fault diagnosis process are introduced. The data used are from the semiphysical simulation experimental platform of Zhuzhou Electric Locomotive Research Institute. The first part is signal noise reduction based on the SG filter. The second part is fault feature extraction based on wavelet packet and PCA. The third part is fault diagnosis based on random forest and XGBoost. A total of 300 sets of U-phase and V-phase current data were collected under three fault modes, with 100 sets per fault mode. The three fault modes are as follows: Mode I represents the normal condition; Mode II represents a 2-turn stator short circuit with 5% air gap eccentricity; Mode III represents a 4-turn stator short circuit with 10% air gap eccentricity. The percentage of air gap eccentricity is the ratio of the offset distance between the stator and rotor centers to the radial length of the original air gap. In this paper, the data of the three fault modes are detected and classified.

Signal Noise Reduction.
The U-phase current signal is collected when the current sensor is normal, with a sampling frequency of 2500 Hz. Because of the high sampling frequency, a data segment of 0.1 second is captured for smoothing and noise reduction pretreatment, which makes the filtering and smoothing effect easy to observe. By experimental comparison, a least squares polynomial of order 4 and a window length of 13 are selected to obtain a better smoothing effect. The experimental results are shown in Figure 3.
It can be seen from the above figure that the SG filter can effectively reduce the influence of noise and smooth the signal by performing the least-squares fitting filter on the signal in each time window, which lays a foundation for frequency domain feature extraction of composite faults.
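The smoothing step with the reported settings (window length 13, polynomial order 4, 2500 Hz sampling, 0.1 s segment) can be reproduced on synthetic data. The sketch below is not the experimental signal: the 50 Hz sinusoid, the noise level, and the function name sg_smooth are assumptions made for illustration, and the ends of the signal are padded by repeating the edge samples.

```python
import numpy as np

def sg_smooth(x, window=13, order=4):
    """Savitzky-Golay smoothing by convolving x with least squares fitting weights."""
    M = window // 2
    n = np.arange(-M, M + 1)
    A = np.vander(n, order + 1, increasing=True).astype(float)
    h = np.linalg.pinv(A)[0]                  # weights that return the fitted value at n = 0
    padded = np.pad(x, M, mode="edge")        # repeat end samples so the output keeps len(x)
    return np.convolve(padded, h[::-1], mode="valid")

rng = np.random.default_rng(0)
fs = 2500.0                                   # sampling frequency in Hz
t = np.arange(250) / fs                       # 0.1 s segment
clean = np.sin(2 * np.pi * 50.0 * t)          # hypothetical noise-free current
noisy = clean + 0.1 * rng.normal(size=t.size)
smoothed = sg_smooth(noisy)

rms_before = np.sqrt(np.mean((noisy - clean) ** 2))
rms_after = np.sqrt(np.mean((smoothed - clean) ** 2))
```

Comparing rms_after with rms_before gives a simple numerical check of the noise reduction that a plot shows visually.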

Fault Feature Extraction and Optimization.
According to the wavelet packet decomposition principle, the dbN wavelet is used to decompose the U-phase and V-phase currents of the three fault modes into three layers. Eight energy values are obtained from the bottom layer for each phase, and the two phases together form a 16-dimensional feature vector.
Three 100 × 16 matrices are obtained, one per fault mode, by decomposing the 300 sets of data, and eigenvalue decomposition is used to reduce the dimension of the matrices. The obtained eigenvalues are arranged from large to small, and the curves of the cumulative contribution rate are shown in Figures 4-6.
As can be seen from the three figures above, the total contribution rate exceeds 90% for the top 9 eigenvalues obtained from the mode I data, the top 8 eigenvalues obtained from the mode III data, and the top 6 eigenvalues obtained from the mode II data. In order to retain the original information of the signal to the greatest extent, the original 16-dimensional feature data samples of the three fault modes are projected onto a new space spanned by the eigenvectors corresponding to the first nine eigenvalues, and the new 9-dimensional principal component feature dataset X = {x_0, x_1, ..., x_8} is used as the input sample of the classifier in the next section.

Fault Diagnosis.
The 300 groups of 9-dimensional principal component samples are divided into training and test sample sets in a ratio of 3 : 1. Because of the randomness of sampling in random forest training, for each decision tree there is a part of the data that is not involved in its generation. This part of the data is called the out-of-bag (OOB) data of the tree. For each sample in the total sample set, the classification results of all the trees that have it as OOB data are calculated, and the final classification result of the sample is obtained by voting. Finally, the ratio of the number of misclassified samples to the total number of samples is used as the OOB error rate of the random forest. The OOB error rate is an unbiased estimator of the generalization error of the random forest [38].
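The OOB bookkeeping described above can be sketched without training any trees. In the following illustration the bootstrap draws are simulated, and the per-tree predictions passed to the hypothetical function oob_error are placeholders; only the masking and voting logic is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_trees = 300, 200

# in_bag[t, i] is True when sample i was drawn into the bootstrap set of tree t.
in_bag = np.zeros((n_trees, n_samples), dtype=bool)
for t in range(n_trees):
    in_bag[t, rng.integers(0, n_samples, size=n_samples)] = True
oob = ~in_bag
oob_fraction = oob.mean()   # close to (1 - 1/n)^n, roughly 0.37 for large n

def oob_error(tree_preds, y, oob_mask, n_classes):
    """tree_preds[t, i] is the class that tree t predicts for sample i."""
    votes = np.zeros((len(y), n_classes), dtype=int)
    for t in range(tree_preds.shape[0]):
        idx = np.flatnonzero(oob_mask[t])          # samples that tree t never saw
        votes[idx, tree_preds[t, idx]] += 1
    voted = votes.sum(axis=1) > 0                  # samples that are OOB for at least one tree
    return np.mean(votes[voted].argmax(axis=1) != y[voted])
```

Each sample is judged only by trees that did not train on it, which is why the OOB error can stand in for a separate validation set.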
In this section, the random forest classifier of the scikit-learn machine learning library for Python is applied to diagnose compound faults. The default CART algorithm is used as the criterion for partitioning decision tree nodes. The grid search method is used to adjust the other parameters of the model, and the optimal parameter values are selected by comparing the out-of-bag error rate of the random forest model. The original samples and the principal component samples are trained separately to compare the performance of the random forest model. The number of decision trees in the random forest is set to 200, and the other parameters are adjusted. Training is conducted on the 300 groups of original feature samples and principal component feature samples, respectively. The curves of the out-of-bag error rate of the two models as the number of decision trees increases are shown in Figure 7.
In the above figure, n_features and n_estimators represent the dimension of the input feature dataset and the number of decision trees, respectively, while the blue line and the yellow line represent the OOB error rate of the random forest model trained on the 9-dimensional principal component feature dataset and the 16-dimensional original feature dataset, respectively. It can be seen that, after the dimensionality reduction of the input feature dataset, the OOB error rate of the random forest is reduced from 8% to less than 5%, that is, the prediction accuracy of the random forest exceeds 95%.
The feature_importances_ attribute of the RF model, which ranks the importance of the training features according to the Gini index principle, is used to obtain the normalized importance scores of the 9-dimensional principal component features used to train the RF model, as shown in Figure 8.
As can be seen from the above figure, after the features are ranked by importance score, the total importance score of the top 6 features exceeds 90%. Therefore, the bottom 3 features are removed, and the XGBoost model is trained with the top 6 features as the input samples.
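The selection rule described above, keeping the highest-ranked features until their cumulative importance reaches 90%, can be sketched as follows. The importance values are hypothetical and are not the scores reported in the figure; the function name is likewise invented for this example.

```python
import numpy as np

def select_top_features(importances, threshold=0.90):
    """Indices of the highest-ranked features whose cumulative importance reaches threshold."""
    order = np.argsort(importances)[::-1]                  # rank features, most important first
    cumulative = np.cumsum(importances[order])
    k = int(np.searchsorted(cumulative, threshold) + 1)    # smallest count reaching the threshold
    return order[:k]

# Hypothetical normalized importance scores for 9 principal component features.
importances = np.array([0.25, 0.20, 0.15, 0.12, 0.10, 0.09, 0.04, 0.03, 0.02])
selected = select_top_features(importances)
# Cumulative sums are 0.25, 0.45, 0.60, 0.72, 0.82, 0.91, so the first six features are kept.
```

The returned indices can be used to slice the columns of the feature matrix before training the second classifier.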
The XGBClassifier module in the XGBoost toolkit is used, and the 300 groups of 6-dimensional feature samples are input for training. The proportion of the test dataset is set to 0.5; the training set is used to train the model, and the test set is used to test its accuracy. Since the base learner of XGBoost is also the CART decision tree model, most of the tunable parameters are similar to those of the random forest model. In this section, the grid search method, implemented with the GridSearchCV function, is used to adjust the parameters, and the optimal parameter combination is determined according to the classification accuracy of the model. The values of some important parameters of the XGBoost model after tuning are listed in Table 1.
Apart from the parameters listed in the above table, the remaining parameters use the default values of the XGBoost toolkit, and the final model has an accuracy of 97.4%. The classification performance of the RF model and the RF + XGBoost model was tested by reselecting 80 groups of data for each of the three fault modes. The results are shown in Table 2.
It can be seen from the above table that both the RF model and the RF + XGBoost model can effectively diagnose the compound fault of stator interturn short circuit and air gap eccentricity compared with other methods such as SVM and ANN. In addition, the RF + XGBoost model improves the detection accuracy compared with the RF model alone.

Results and Discussion
Aiming at the compound fault of stator interturn short circuit and air gap eccentricity of the high-speed traction motor, this paper studies a diagnosis algorithm based on random forest and XGBoost. The detailed information of each frequency band of the current signal is extracted by wavelet packet decomposition to form the fault feature vector, and its dimension is reduced by PCA. Finally, the dimensionality-reduced feature vectors are used to train the random forest and XGBoost classifiers, which improves the prediction accuracy and generalization performance of the model. Based on the onboard experimental data of the CRH2 train motor provided by Zhuzhou, the effectiveness of the diagnosis method is verified. In the future, further research will be carried out on compound faults under no-load and half-load conditions of the motor, and multiple diagnosis classifiers will be designed for different working conditions to broaden the application scope of the diagnosis method.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare no conflicts of interest.