Building on the detection of fault categories, this study proposes diagnosing the causes of rotor faults, which contributes to the field of intelligent operation and maintenance. To improve diagnostic accuracy and practical efficiency, a hybrid model based on particle swarm optimization and extreme gradient boosting, namely PSO-XGBoost, is designed. XGBoost serves as the classifier that diagnoses rotor fault causes; it performs well because of its second-order Taylor expansion of the loss and its explicit regularization term. PSO automatically tunes the XGBoost parameters, which overcomes the shortcomings of adjusting them empirically or by trial and error. The hybrid model combines the advantages of the two algorithms and can diagnose nine rotor fault causes accurately. Based on the diagnostic results, maintenance measures are retrieved automatically from the corresponding knowledge base. Finally, the proposed PSO-XGBoost model is compared with five state-of-the-art intelligent classification methods. The experimental results demonstrate that the proposed method has higher diagnostic accuracy and practical efficiency in diagnosing rotor fault causes.

The steam turbine rotor plays an important role in transforming thermal energy into mechanical energy. Because the rotor operates at high rotational speed, any defect on it affects safe operation and can even cause serious accidents [

In the field of industrial intelligent operation and maintenance, the research studies mainly focus on the detection of rotor fault categories [

In essence, diagnosing rotor fault causes is a classification problem, and various intelligent classification methods have been applied. Support vector machine [

To diagnose rotor fault causes accurately and efficiently, a hybrid model based on particle swarm optimization and extreme gradient boosting (PSO-XGBoost) is proposed. XGBoost, a scalable end-to-end tree boosting system that uses a second-order Taylor expansion of the loss and an explicit regularization term, is used as the classifier to diagnose the rotor fault causes. PSO automatically optimizes parameters such as the L1 and L2 regularization terms on weights during XGBoost model training, which overcomes the low accuracy and low efficiency of adjusting these parameters empirically or by trial and error. By combining the advantages of the two algorithms, the hybrid model can diagnose rotor fault causes more accurately. Based on the diagnostic results, maintenance measures are retrieved automatically from the corresponding knowledge base.

The innovations and main contributions of this study are described as follows:

Building on the detection of fault categories, the diagnosis of rotor fault causes is proposed, which contributes to the field of intelligent operation and maintenance

A novel hybrid model based on PSO and XGBoost is developed to effectively simplify the parameter adjustment process of the XGBoost model and improve the accuracy of diagnosis

The further detailed structure of this study is summarized in the remaining sections. Section

XGBoost [

For a given dataset with ^{th} sample, composed of fault types

The objective function is

The first term

Formally, let ^{th} instance at the ^{th} iteration; then, add

Second-order approximation can be used to optimize equation (

The definition of the tree

Here,

After reformulating the tree model, the objective value with the ^{th} tree can be written as^{th} leaf.

By defining

In equation (

Equation (

Equation (
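For reference, the derivation outlined in this section follows the standard XGBoost formulation, which can be restated as below. The notation ($l$, $f_t$, $g_i$, $h_i$, $T$, $w_j$, $I_j$, $\gamma$, $\lambda$) is the conventional one and is an assumption here, not necessarily the authors' exact symbols; the L1 penalty on leaf weights is omitted for brevity.

```latex
\begin{aligned}
\text{Obj} &= \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k),
\qquad \Omega(f) = \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2} \\
\text{Obj}^{(t)} &= \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) \\
&\approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^{2}(x_i) \right] + \Omega(f_t) \\
g_i &= \frac{\partial\, l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial\, \hat{y}_i^{(t-1)}},
\qquad
h_i = \frac{\partial^{2} l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^{2}} \\
G_j &= \sum_{i \in I_j} g_i, \qquad H_j = \sum_{i \in I_j} h_i \\
w_j^{*} &= -\frac{G_j}{H_j + \lambda},
\qquad
\text{Obj}^{*} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^{2}}{H_j + \lambda} + \gamma T \\
\text{Gain} &= \frac{1}{2} \left[ \frac{G_L^{2}}{H_L + \lambda} + \frac{G_R^{2}}{H_R + \lambda} - \frac{(G_L + G_R)^{2}}{H_L + H_R + \lambda} \right] - \gamma
\end{aligned}
```

Here $I_j$ is the set of samples assigned to the $j$th leaf, $w_j^{*}$ is the optimal weight of that leaf, and the gain scores a candidate split of a leaf into left ($L$) and right ($R$) children.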

The particle swarm optimization algorithm [

Suppose a population ^{th} particle is represented as a D-dimensional vector. According to the objective function, each particle’s corresponding fitness of position can be calculated. The individual extremum of ^{th} particle’s speed is

In equations (
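The velocity and position updates described above can be written in the usual inertia-weight form of PSO. The symbols below ($\omega$ for the weighting coefficient, $c_1$ and $c_2$ for the learning factors, $r_1$ and $r_2$ for uniform random numbers in $[0, 1]$, $p_{id}$ and $p_{gd}$ for the individual and global extrema) are assumed conventional notation rather than the authors' exact symbols.

```latex
\begin{aligned}
v_{id}^{k+1} &= \omega\, v_{id}^{k} + c_1 r_1 \left( p_{id}^{k} - x_{id}^{k} \right) + c_2 r_2 \left( p_{gd}^{k} - x_{id}^{k} \right) \\
x_{id}^{k+1} &= x_{id}^{k} + v_{id}^{k+1}
\end{aligned}
```

The index $i$ runs over particles, $d$ over the $D$ dimensions, and $k$ over iterations.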

Although XGBoost performs well in many respects, it has many parameters, and the combination of parameter values largely determines the performance of the model. PSO is well suited to optimizing these parameters, which can improve the efficiency and accuracy of diagnosing rotor fault causes. In this study, six parameters that strongly influence the model are optimized by PSO. The information of each parameter is given in Table

Parameters to be optimized.

Parameter | Default value | Range | Description |
---|---|---|---|
eta | 0.3 | [0, 1] | Learning rate. |
subsample | 1 | (0, 1] | Subsample ratio of the training instances. |
colsample_bytree | 1 | (0, 1] | Subsample ratio of columns when constructing each tree. |
colsample_bylevel | 1 | (0, 1] | Subsample ratio of columns for each level. |
reg_alpha | 0 | [0, ∞) | L1 regularization term on weights. |
reg_lambda | 1 | [0, ∞) | L2 regularization term on weights. |

According to Table ^{th} particle at the ^{th} iteration can be expressed as

The position vector is assigned to the corresponding parameters of XGBoost, and the negative accuracy score of the XGBoost model is used as the fitness value to measure the performance of PSO. The fitness value of the ^{th} particle at the ^{th} iteration is shown as

The individual optimum of the ^{th} particle at the ^{th} iteration is

The global optimum of the ^{th} particle at the ^{th} iteration is
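As a minimal sketch of the encoding described above, a particle's six-dimensional position can be mapped to XGBoost keyword arguments, and the fitness taken as the negative accuracy. The ordering of the coordinates, the helper names `decode_position` and `fitness`, and the injected `train_and_score` callable are assumptions for illustration; `learning_rate` is the scikit-learn-style name for `eta`.

```python
from typing import Callable, Dict, Sequence

# Assumed order of the six tuned parameters inside a position vector.
PARAM_NAMES = (
    "learning_rate",      # eta
    "subsample",
    "colsample_bytree",
    "colsample_bylevel",
    "reg_alpha",
    "reg_lambda",
)


def decode_position(position: Sequence[float]) -> Dict[str, float]:
    """Map a 6-dimensional particle position to a parameter dictionary."""
    if len(position) != len(PARAM_NAMES):
        raise ValueError("expected one coordinate per tuned parameter")
    return dict(zip(PARAM_NAMES, position))


def fitness(position: Sequence[float],
            train_and_score: Callable[[Dict[str, float]], float]) -> float:
    """Return the negative accuracy, so a smaller value means a better particle.

    `train_and_score` is a hypothetical callable that trains a classifier
    with the given parameters and returns its accuracy in [0, 1].
    """
    accuracy = train_and_score(decode_position(position))
    return -accuracy
```

With the `xgboost` package installed, `train_and_score` could, for example, fit `xgboost.XGBClassifier(**params)` on the training split and return its accuracy on a held-out split; that wiring is left out here so the sketch stays dependency-free.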

The XGBoost algorithm and the PSO-XGBoost algorithm are shown in Figure

XGBoost and PSO-XGBoost methods. (a) XGBoost algorithm flow chart. (b) The proposed PSO-XGBoost algorithm flow chart.

The procedures of the proposed method for diagnosis of rotor fault causes, which can be seen from Figure

Step 1. Initialize the particle swarm. Initialize the particle swarm parameters, including the particle number, learning factors, weighting coefficient, and the maximum number of iterations.

Step 2. Train the XGBoost model. The parameters to be optimized take the values given by each particle's current position and change as the particles move.

Step 3. Calculate and assess the fitness value. The fitness value is the negative accuracy score output by the XGBoost model and is used to evaluate each particle. A smaller fitness value indicates better performance.

Step 4. Judge the stop condition. If the maximum number of iterations is reached, terminate the iteration process and take the global optimum as the optimal parameters of the XGBoost model. Otherwise, update the particles and return to Step 2.

Step 5. Validate the classification model. Use the optimization results to build the XGBoost model and output the results of diagnosing rotor fault causes.
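The five steps can be sketched as a small, dependency-free PSO loop that minimizes an arbitrary fitness function. This is an illustrative sketch, not the authors' implementation: the default swarm size, inertia weight, and learning factors are assumed values, positions are clipped to the search bounds, and a toy quadratic stands in for the negative accuracy of a trained XGBoost model.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(fitness: Callable[[Sequence[float]], float],
                 bounds: Sequence[Tuple[float, float]],
                 n_particles: int = 20,
                 n_iterations: int = 50,
                 inertia: float = 0.7,
                 c1: float = 1.5,
                 c2: float = 1.5,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Minimize `fitness` inside `bounds` with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)

    # Step 1: initialize positions inside the bounds, velocities at zero.
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]

    # Steps 2-3: evaluate every particle and record the individual
    # and global optima.
    pbest_pos = [p[:] for p in positions]
    pbest_val = [fitness(p) for p in positions]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest_pos, gbest_val = pbest_pos[g][:], pbest_val[g]

    # Step 4: iterate until the maximum number of iterations is reached.
    for _ in range(n_iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c1 * r1 * (pbest_pos[i][d] - positions[i][d])
                    + c2 * r2 * (gbest_pos[d] - positions[i][d])
                )
                lo, hi = bounds[d]
                moved = positions[i][d] + velocities[i][d]
                positions[i][d] = min(hi, max(lo, moved))  # stay in bounds
            value = fitness(positions[i])
            if value < pbest_val[i]:
                pbest_pos[i], pbest_val[i] = positions[i][:], value
                if value < gbest_val:
                    gbest_pos, gbest_val = positions[i][:], value

    # Step 5: the caller rebuilds the final model from the best position.
    return gbest_pos, gbest_val


def toy_fitness(x: Sequence[float]) -> float:
    """Stand-in for negative accuracy: minimum of -1.0 at (0.1, 0.8)."""
    target = (0.1, 0.8)
    return sum((a - b) ** 2 for a, b in zip(x, target)) - 1.0


best_pos, best_val = pso_minimize(toy_fitness, bounds=[(0.0, 1.0), (0.0, 1.0)])
```

In the setting of this study, `fitness` would train XGBoost with the parameters encoded in the position and return the negative accuracy, and `bounds` would hold one search interval per tuned parameter.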

In this study, 450 sets of operation data related to three kinds of high-pressure rotor faults of a 330 MW unit in a power plant are used as a verification example. The specifications are given in Table

The specific rotor fault causes.

Number | Fault type | Cause description | Number of samples | Label |
---|---|---|---|---|
1 | Rubbing fault (F1) | Rubbing at shaft seal caused by cylinder deformation | 50 | C1 |
2 | Rubbing fault (F1) | Rubbing at shaft seal caused by the fast rate of loading up | 50 | C2 |
3 | Rubbing fault (F1) | Rubbing at shaft seal caused by the long time of low load remaining | 50 | C3 |
4 | Rubbing fault (F1) | Rotor rubbing with oil baffle | 50 | C4 |
5 | Mass imbalance fault (F2) | Poor stiffness of bearing pedestal | 50 | C5 |
6 | Mass imbalance fault (F2) | Fracture and falling off of rotating parts (blades and coupling wind shields) | 50 | C6 |
7 | Mass imbalance fault (F2) | Other reasons | 50 | C7 |
8 | Self-excited oscillation (F3) | Poor stability of bearing | 50 | C8 |
9 | Self-excited oscillation (F3) | Excessive journal disturbance | 50 | C9 |

In this study, ten running parameters with high correlation with rotor rubbing fault, mass imbalance fault, and self-excited oscillation fault are selected, including the steam temperature of high-pressure cylinder shaft seal and cylinder expansion value of high-pressure cylinder. The details are given in Table

Running parameters of a high-pressure steam turbine rotor.

Index | Parameter | Unit | Symbol |
---|---|---|---|
1 | Steam temperature of high-pressure cylinder shaft seal | °C | HT1 |
2 | Temperature difference between upper and lower cylinders of a high-pressure cylinder | °C | HT2 |
3 | Expansion value of a high-pressure cylinder | mm | HE1 |
4 | The change rate of unit load | MW/min | HC1 |
5 | High-pressure cylinder temperature | °C | HT3 |
6 | Lubricating oil temperature of high-pressure rotor | °C | HT4 |
7 | Fundamental frequency amplitude of No.1 bearing shaft vibration in the X direction | mm | X1 |
8 | Fundamental frequency amplitude of No.1 bearing shaft vibration in the Y direction | mm | Y1 |
9 | Fundamental frequency amplitude of No.1 bearing pedestal vibration in the X direction | mm | X2 |
10 | Fundamental frequency amplitude of No.1 bearing pedestal vibration in the Y direction | mm | Y2 |

In all, the input data, composed of running parameters and fault type, have eleven dimensions, i.e.,

Data preprocessing adapts the data to the requirements of the model. It mainly includes missing value handling, nondimensionalization (centering and scaling), categorical feature encoding (text to numeric), and continuous feature processing.

For missing values, in this study, numerical features are filled with the mean and character features are filled with the mode.
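The imputation rule can be illustrated with a short sketch using only the standard library. Representing a missing entry as `None` and the helper names `fill_numeric` and `fill_categorical` are assumptions made for this example.

```python
from statistics import mean, mode
from typing import List, Optional


def fill_numeric(column: List[Optional[float]]) -> List[float]:
    """Replace missing numerical values with the mean of the observed ones."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def fill_categorical(column: List[Optional[str]]) -> List[str]:
    """Replace missing character values with the most frequent observed one."""
    observed = [v for v in column if v is not None]
    fill = mode(observed)
    return [fill if v is None else v for v in column]


temperatures = fill_numeric([1.0, None, 3.0])
fault_types = fill_categorical(["F1", None, "F1", "F2"])
```

In this toy input the missing temperature becomes 2.0 and the missing fault type becomes "F1".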

In the original dataset, the fault types among the classification features (rubbing fault (F1), mass imbalance fault (F2), and self-excited oscillation (F3)) and the fault cause category labels (for example, rubbing at shaft seal caused by cylinder deformation (C1) and rubbing at shaft seal caused by the fast rate of loading up (C2)) are stored as text rather than numbers. To adapt the data to the algorithm, they must be encoded as numerical values. The mutually independent fault types (F1, F2, and F3) are transformed into dummy variables by one-hot encoding, namely, F1 [1 0 0], F2 [0 1 0], and F3 [0 0 1]. The fault cause category labels [C1, C2, …, C9] are directly converted into the numeric form [0, 1, …, 8].
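The two encodings can be sketched in a few lines of plain Python. The helper names `one_hot_fault_type` and `encode_cause` are hypothetical; the mappings themselves are the ones stated above.

```python
from typing import List

FAULT_TYPES = ("F1", "F2", "F3")
CAUSE_LABELS = tuple(f"C{i}" for i in range(1, 10))  # "C1" ... "C9"


def one_hot_fault_type(fault_type: str) -> List[int]:
    """Encode a fault type as a dummy vector, e.g. "F2" -> [0, 1, 0]."""
    if fault_type not in FAULT_TYPES:
        raise ValueError(f"unknown fault type: {fault_type!r}")
    return [1 if fault_type == name else 0 for name in FAULT_TYPES]


def encode_cause(label: str) -> int:
    """Encode a fault cause label as an integer, "C1" -> 0 ... "C9" -> 8."""
    return CAUSE_LABELS.index(label)
```

One-hot encoding is used for the fault types because they have no natural order, whereas the cause labels are the classification target and only need distinct integer codes.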

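First, center the data by subtracting the mean $\mu$ of each feature; then, scale by the standard deviation $\sigma$, so that each feature has zero mean and unit variance:

$$x^{*} = \frac{x - \mu}{\sigma}$$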

The preprocessed dataset is given in Table

Preprocessed dataset.

Number | HT1 | HT2 | HE1 | … | F1 | F2 | F3 | Label |
---|---|---|---|---|---|---|---|---|
1 | −0.31395 | −1.0698 | −0.14448 | … | −0.89442719 | 1.414213562 | −0.534522484 | 6 |
2 | −1.33696 | −0.97908 | −0.94812 | … | −0.89442719 | −0.70710678 | 1.870828693 | 7 |
3 | −0.41837 | −1.61409 | −0.5463 | … | −0.89442719 | −0.70710678 | 1.870828693 | 8 |
… | … | … | … | … | … | … | … | … |
449 | 1.05843 | 1.440195 | 0.900258 | … | 1.118033989 | −0.70710678 | −0.534522484 | 0 |
450 | 1.045868 | 1.149048 | −0.38557 | … | 1.118033989 | −0.70710678 | −0.534522484 | 0 |
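The F1, F2, and F3 values in the preprocessed table are consistent with applying zero-mean, unit-variance scaling to the one-hot columns. The sketch below checks this under two assumptions: the population standard deviation is used, and the class sizes are 200, 150, and 100 samples, as implied by the table of rotor fault causes (four, three, and two causes of 50 samples each).

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def standardize(column: Sequence[float]) -> List[float]:
    """Center a column on its mean and scale by its population std."""
    mu = fmean(column)
    sigma = pstdev(column)
    return [(v - mu) / sigma for v in column]


# One-hot columns for 200 F1, 150 F2, and 100 F3 samples (450 in total).
f1 = [1] * 200 + [0] * 250
f2 = [0] * 200 + [1] * 150 + [0] * 100
f3 = [0] * 350 + [1] * 100

z1, z2, z3 = standardize(f1), standardize(f2), standardize(f3)

# Scaled value for "belongs to the type" and "does not belong", per column.
f1_on, f1_off = z1[0], z1[-1]    # approx.  1.118034, -0.894427
f2_on, f2_off = z2[200], z2[0]   # approx.  1.414214, -0.707107
f3_on, f3_off = z3[-1], z3[0]    # approx.  1.870829, -0.534522
```

The sample order inside each column is arbitrary; only the class proportions affect the scaled values.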

The test set is used to verify the performance of the PSO-XGBoost model. The model is quantitatively evaluated using evaluation indicators, such as the accuracy, confusion matrix, precision, recall, and F1-score [

For each category, the prediction results can be divided into four groups: true positive (TP), false positive (FP), true negative (TN), and false negative (FN). Here, TP is the number of positive samples correctly predicted as positive, FP is the number of negative samples incorrectly predicted as positive, TN is the number of negative samples correctly predicted as negative, and FN is the number of positive samples incorrectly predicted as negative.

Accuracy is the ratio of correctly predicted samples to all samples:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Precision is the ratio of true positives to all samples predicted as positive, that is, both TP and FP:

$$\text{Precision} = \frac{TP}{TP + FP}$$

Recall is the ratio of true positives to all samples that actually belong to the positive class:

$$\text{Recall} = \frac{TP}{TP + FN}$$

F1-score is the harmonic mean of precision and recall:

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

The confusion matrix is used to evaluate the model on a multiclass classification problem. Each column of the confusion matrix represents a predicted category, and the column total is the number of samples predicted to be in that category. Each row represents an actual category, and the row total is the number of samples that belong to that category. The larger the values on the diagonal and the smaller the values elsewhere, the better the classification performance.
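The indicators can be computed directly from a confusion matrix laid out as described, with rows as actual categories and columns as predicted categories. The sketch below averages the per-class precision, recall, and F1-score weighted by class support; this averaging scheme is an assumption, chosen because it makes the recall equal to the accuracy, which matches the pattern in the comparison table of evaluation indicators.

```python
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     n_classes: int) -> List[List[int]]:
    """Rows are actual categories, columns are predicted categories."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


def weighted_scores(matrix: List[List[int]]) -> Dict[str, float]:
    """Accuracy plus support-weighted precision, recall, and F1-score."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    scores = {"accuracy": sum(matrix[k][k] for k in range(n)) / total,
              "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for k in range(n):
        tp = matrix[k][k]
        support = sum(matrix[k])                          # TP + FN (row sum)
        predicted = sum(matrix[r][k] for r in range(n))   # TP + FP (column sum)
        p_k = tp / predicted if predicted else 0.0
        r_k = tp / support if support else 0.0
        f_k = 2 * p_k * r_k / (p_k + r_k) if (p_k + r_k) else 0.0
        weight = support / total
        scores["precision"] += weight * p_k
        scores["recall"] += weight * r_k
        scores["f1"] += weight * f_k
    return scores


# Toy three-class example: one sample of class 1 is misclassified as class 2.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
scores = weighted_scores(cm)
```

In the toy example the accuracy and weighted recall are both 5/6, while the weighted precision is 8/9 because class 2 receives one wrong prediction.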

The result is shown in Figure

The accuracy of the PSO-XGBoost model.

The confusion matrix of the PSO-XGBoost model.

Figure

Table

PSO-XGBoost’s evaluation indicators.

Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) |
---|---|---|---|
98.52 | 98.52 | 98.52 | 98.52 |

An investigation of five different classifiers is performed to verify the superiority of PSO-XGBoost in classification performance, including XGBoost, RF, GBDT, DT, and SVM. The classification results of these algorithms are shown in Figures

The classification result of the XGBoost model.

The classification result of the RF model.

The classification result of the GBDT model.

The classification result of the DT model.

The classification result of the SVM model.

For a detailed quantitative analysis of each classifier's results, five confusion matrices, one per compared classifier, are introduced to record the recognition results and the percentage of misclassification for the different rotor fault causes. Figures

The confusion matrix of the XGBoost model.

The confusion matrix of the RF model.

The confusion matrix of the GBDT model.

The confusion matrix of the DT model.

The confusion matrix of the SVM model.

Figures

Comprehensive model evaluation indicators.

Method | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) |
---|---|---|---|---|
PSO-XGBoost | 98.52 | 98.52 | 98.52 | 98.52 |
XGBoost | 95.56 | 96.13 | 95.56 | 95.48 |
RF | 93.33 | 93.45 | 93.33 | 92.93 |
GBDT | 92.59 | 93.46 | 92.59 | 92.30 |
DT | 91.85 | 93.02 | 91.85 | 92.02 |
SVM | 84.44 | 86.90 | 84.44 | 84.28 |

The detailed comparison of six algorithms in the accuracy, precision, recall, and F1-score is shown in Figures

The accuracy of six algorithms.

The precision of six algorithms.

The recall of six algorithms.

The F1-score of six algorithms.

The comparison of different algorithms in the iterative process.

In the view of Figures

The comparison of different algorithms in the iterative process is shown in Figure

From Figure

For nine different rotor fault causes, we build a knowledge base, mapping each rotor fault cause to a specific solution, in order to achieve the purpose of intelligent operation and maintenance. For example, when we diagnose the rotor fault cause C1, the computer will automatically link to the solution M1. Other details in the knowledge base are given in Table

Knowledge base for rubbing fault, mass imbalance fault, and self-excited oscillation of a high-pressure steam turbine rotor.

Fault type | Fault cause label | Fault cause description | Fault measure label | Solution |
---|---|---|---|---|
Rubbing fault | C1 | Rubbing at shaft seal caused by cylinder deformation | M1 | Adjust the clearance of vertical pin and thrust bearing and tighten the bolt of valve screw |
Rubbing fault | C2 | Rubbing at shaft seal caused by the fast rate of loading up | M2 | Reduce the rate of loading up |
Rubbing fault | C3 | Rubbing at shaft seal caused by the long time of low load remaining | M3 | Reduce residence time under low load and increase load as soon as possible |
Rubbing fault | C4 | Rotor rubbing with oil baffle | M4 | Adjust the dynamic and static clearance and control the thermal parameters in the start-up operation |
Mass imbalance fault | C5 | Poor stiffness of bearing pedestal | M5 | Reduce the excitation force by rotor dynamic balance |
Mass imbalance fault | C6 | Fracture and falling off of rotating parts (blades and coupling wind shields) | M6 | Deal with the wind deflector or replace high quality hexagon bolts |
Mass imbalance fault | C7 | Other reasons | M7 | Carry out first- or second-order dynamic balance |
Self-excited oscillation | C8 | Poor stability of bearing | M8 | (1) Use bearings with good stability such as tilting pad and elliptical bush; (2) increase the bearing specific pressure, for example by reducing the bearing length and adjusting the bearing height; (3) increase the temperature of lubricating oil and reduce the viscosity of lubricating oil; (4) reduce the top clearance of the fixed pad bearing and improve the bearing preload |
Self-excited oscillation | C9 | Excessive journal disturbance | M9 | Reduce the vibration of the shaft and the disturbing force of the journal |
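The link from a diagnosed cause to its maintenance measure can be sketched as a dictionary lookup. The data structure and the helper name `recommend` are assumptions for illustration; the labels and solutions are copied from the knowledge base table, and the predicted label is the integer code 0 to 8 used for C1 to C9.

```python
from typing import Dict, Tuple

# cause label -> (measure label, solutions)
KNOWLEDGE_BASE: Dict[str, Tuple[str, Tuple[str, ...]]] = {
    "C1": ("M1", ("Adjust the clearance of vertical pin and thrust bearing "
                  "and tighten the bolt of valve screw",)),
    "C2": ("M2", ("Reduce the rate of loading up",)),
    "C3": ("M3", ("Reduce residence time under low load and increase load "
                  "as soon as possible",)),
    "C4": ("M4", ("Adjust the dynamic and static clearance and control the "
                  "thermal parameters in the start-up operation",)),
    "C5": ("M5", ("Reduce the excitation force by rotor dynamic balance",)),
    "C6": ("M6", ("Deal with the wind deflector or replace high quality "
                  "hexagon bolts",)),
    "C7": ("M7", ("Carry out first- or second-order dynamic balance",)),
    "C8": ("M8", (
        "Use bearings with good stability such as tilting pad and "
        "elliptical bush",
        "Increase the bearing specific pressure",
        "Increase the temperature and reduce the viscosity of lubricating oil",
        "Reduce the top clearance of the fixed pad bearing and improve the "
        "bearing preload",
    )),
    "C9": ("M9", ("Reduce the vibration of the shaft and the disturbing "
                  "force of the journal",)),
}


def recommend(predicted_label: int) -> Tuple[str, str, Tuple[str, ...]]:
    """Map a predicted integer label (0-8) to cause, measure, and solutions."""
    cause = f"C{predicted_label + 1}"
    measure, solutions = KNOWLEDGE_BASE[cause]
    return cause, measure, solutions


cause, measure, solutions = recommend(0)
```

For a predicted label of 0, the lookup returns cause C1 and measure M1, mirroring the example given in the text.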

Building on the detection of fault categories, the diagnosis of rotor fault causes is proposed, which contributes to the field of intelligent operation and maintenance. This study proposes a hybrid model for diagnosing rotor fault causes using the PSO-XGBoost algorithm. To address the low accuracy and low efficiency of adjusting XGBoost parameters empirically, PSO is used to tune the parameters automatically, which also improves the diagnostic accuracy. The experimental results show that

Compared with directly constructing an XGBoost model to diagnose rotor fault causes, the hybrid model achieves higher diagnostic accuracy and practical efficiency.

The hybrid model can effectively identify nine different fault causes under three types of faults, and its classification accuracy, precision, recall, and F1-score are all above 98%. In terms of comprehensive classification performance, the PSO-XGBoost model is more effective for diagnosing rotor fault causes than XGBoost, RF, GBDT, DT, and SVM.

The csv data used to support the findings of this study have been deposited in the Baidu Netdisk repository (

The authors declare that they have no conflicts of interest.

This work was supported by the Shanghai 2019 “Science and Technology Innovation Action Plan” High-tech Field Project (19511103700).