Fault Detection for Turbine Engine Disk Based on One-Class Large Vector-Angular Region and Margin

Fault detection identifies deviations caused by unknown abnormalities by establishing a computational model built exclusively from the key features of normal samples. The multimodal process data distribution of a turbine engine disk is inevitably affected by ambient disturbance: the mean and covariance can vary significantly, degrading detection accuracy. By maximizing the vector-angular mean and minimizing the vector-angular variance simultaneously in the feature space, a one-class large vector-angular region and margin (one-class LARM) framework is developed for fault detection of the turbine engine disk, enhancing the robustness of dynamic multimode process monitoring. Simulations on both single-mode and multimode operation of the turbine engine disk are thoroughly performed and compared, and the results validate the favorable performance of the proposed method.


Introduction
Fault detection is an active research topic in which a model is learned from normal data and used to identify anomalous behavior. In the real world, fault detection has wide practical applications, for example, machine fault detection [1][2][3][4], industrial process monitoring [5], and aviation safety [6]. Anomaly detection can generally be cast as a classification problem. In classification-based anomaly detection methods, the detection model is learned from labeled training data and predicts whether test samples are normal or abnormal. In real fault detection applications, normal patterns can be obtained easily while abnormal ones are difficult to acquire. To solve this problem effectively, one-class classification methods have been proposed, which can detect fault signals given only normal samples.
Recently, kernel-based methods have been widely adopted in the field of fault detection, such as kernel principal component analysis (KPCA) [7], one-class support vector machine (OCSVM) [8], and support vector data description (SVDD) [9]. The KPCA algorithm, introduced by Schölkopf et al. [7], is an extension of principal component analysis (PCA) using kernel methods: through a kernel function, the linear operations of PCA are performed in a reproducing kernel Hilbert space. Choi et al. [10] apply KPCA to fault detection and identification of nonlinear processes, in which a unified fault detection index is calculated for fault detection and a reconstruction-based identification index is developed for fault identification. The monitoring method is successfully applied to a simulated continuous stirred tank reactor (CSTR); the fault detection and identification results confirm that KPCA-based process monitoring can be applied to a CSTR process under four different fault scenarios. The computational and spatial complexities of KPCA are O(n^3) and O(n^2), respectively. In practice, the dimension of the kernel matrix grows with the size of the dataset, leading to higher computation and storage costs, which hampers KPCA-based fault detection. Günter et al. [11] introduce a gain adaptation method to increase the computational efficiency of KPCA, accelerating the kernel algorithm by improving the convergence of the Kernel Hebbian Algorithm (KHA) for iterative kernel PCA. Conventional KPCA is intrinsically sensitive to abnormal data because its objective function is expressed with the L2 norm during training. Xiao et al. [12] introduce the L1 norm into conventional KPCA and apply it to fault detection; their experimental results confirm that L1 norm-based KPCA is robust to outliers.
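As an illustration of the cost discussed above, the KPCA projection step can be sketched in a few lines of NumPy (the RBF kernel choice, parameter values, and function names here are illustrative, not the authors' implementation); the eigendecomposition of the n×n kernel matrix is the O(n^3) bottleneck:

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    # pairwise RBF kernel: exp(-gamma * ||a - b||^2)
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def kpca_fit(X, n_comp=2, gamma=0.5):
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)                 # O(n^2) storage
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                              # center the kernel matrix
    w, V = np.linalg.eigh(Kc)                   # O(n^3) eigendecomposition
    idx = np.argsort(w)[::-1][:n_comp]
    w, V = w[idx], V[:, idx]
    alphas = V / np.sqrt(np.maximum(w, 1e-12))  # scale eigenvectors for unit projections
    return X, K, alphas, gamma

def kpca_project(model, Xnew):
    Xtr, K, alphas, gamma = model
    Kt = rbf_kernel(Xnew, Xtr, gamma)
    # center the test kernel against training statistics
    Ktc = Kt - Kt.mean(1, keepdims=True) - K.mean(0) + K.mean()
    return Ktc @ alphas
```

A monitoring index (e.g., SPE or T^2) would then be computed from these projections; the sketch stops at the projection itself.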
OCSVM is proposed by Schölkopf et al. [8] for tackling the fault detection problem and is an unsupervised learning approach. To enhance generalization ability, OCSVM constructs a kernel-based hyperplane by a maximum-margin algorithm, in which the optimal hyperplane separates the normal and abnormal samples in the feature space. Solving OCSVM-related problems by conventional methods is computationally expensive for large sample sets. Therefore, a series of training algorithms have been proposed to reduce the computation time of OCSVM. In references [13,14], the reduced support vector machine (RSVM) is proposed to handle large datasets using SVMs coupled with nonlinear kernels; in RSVM, training uses an active training set, which is a subset of the complete training set. Similarly, ν-Anomica for anomaly detection is proposed by Das et al. [15]; its accuracy is comparable to that of OCSVM, while training and testing times improve by factors of 5 to 20. The one-class SVDD algorithm, introduced by Tax and Duin [9], is a single-class classification method based on minimum bounding sphere theory that can handle the fault signal classification problem using only normal samples.
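For readers who want to try OCSVM directly, a minimal sketch using scikit-learn's OneClassSVM (a LIBSVM-backed implementation); the synthetic data here merely stand in for real vibration features:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X_train = rng.randn(200, 4)                   # stand-in for normal-condition features
X_test = np.vstack([rng.randn(50, 4),         # held-out normal samples
                    rng.randn(50, 4) + 5.0])  # shifted samples acting as faults

# nu upper-bounds the fraction of training outliers / margin errors
clf = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_train)
pred = clf.predict(X_test)                    # +1 = normal, -1 = anomaly
```

Only normal data are used for fitting; the shifted test points fall far from the learned region and are flagged as anomalies.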
The optimization procedure of one-class SVDD establishes a minimal-volume hypersphere around the normal data in the feature space to realize fault detection; anomalous behavior in the data can then be detected with this minimal hypersphere. To improve generalization ability, Wu and Ye [16] propose a small sphere and large margin (SSLM) algorithm for anomaly detection with few outlier training data. SSLM aims at constructing a minimal-volume hypersphere enclosing most of the normal training examples in the feature space while making the margin from any outlier to this hypersphere as large as possible. In real applications, fault detection is often required to process continuous massive data streams; online one-class SVDD [17] and incremental one-class SVDD [18] have been proposed for data stream anomaly detection.
In this paper, we propose a novel algorithm for learning a one-class large vector-angular region and margin (one-class LARM) for fault detection of the turbine engine disk. The major improvements of the presented work lie in two aspects. First, one-class LARM is a flexible extension of the two-class LARM algorithm. One-class LARM is an unsupervised classification method, meaning that only normal training samples are needed to build the detection model. This is of great significance because plenty of normal samples can be acquired easily, while fault samples are difficult to obtain or even unavailable; therefore, one-class LARM is better suited to practical applications. Second, the multimodal process data distribution inevitably encounters disturbances, under which the mean and covariance of the data change significantly, reducing estimation accuracy. The proposed method maximizes the vector-angular mean and minimizes the vector-angular variance simultaneously, which achieves better generalization performance for processes working in multiple modes. The article is organized as follows. Section 1 introduces the kernel-based fault detection technique. Section 2 reviews the two-class LARM approach. Section 3 presents the proposed one-class LARM-based fault detection method. Section 4 elucidates the detailed experimental design and compares the results on the dataset. Section 5 discusses the detection performance of the comparative methods. Section 6 concludes the work and gives future prospects.

Two-Class LARM
We proposed the two-class LARM approach [19] and successfully applied it to anomaly detection on imbalanced data. The two-class LARM algorithm is described as follows.
It is assumed that S = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} is the training set with n examples and ϕ(·) is a feature mapping function that maps input examples from a given input space X ⊂ R^d to a high-dimensional feature space F. The feature mapping can be computed implicitly through a given kernel function κ(·, ·), i.e., κ(x_i, x_j) = ϕ(x_i)^T ϕ(x_j) [20]. The vector-angular function expresses the length of the perpendicular projection of a training pattern ϕ(x_i). The primal two-class LARM can be formulated as follows: where X = [ϕ(x_1), ϕ(x_2), ..., ϕ(x_n)], y = [y_1, y_2, ..., y_n]^T, m_1 is the number of normal training patterns, m_2 = n − m_1 is the number of abnormal training patterns, ω² (ω > 0) is the width of the vector-angular region, ρ² (ρ > 0) is the vector-angular margin, ξ = [ξ_1, ..., ξ_n]^T is the vector of slack variables, and ν, ν_1, ν_2, and λ are four positive constants.
Training a two-class LARM-based fault detection model requires both normal and abnormal samples. However, in fault detection of the turbine engine disk, abundant normal examples can usually be obtained under normal working conditions, while abnormal data are difficult or expensive to obtain. An unsupervised technique is a fault detection method that requires only normal samples and aims to build a description model of the normal samples. Therefore, one-class LARM is proposed to construct the fault detection model; it is an unsupervised learning method and does not require abnormal samples to detect faults of the turbine engine disk.

One-Class LARM-Based Fault Detection for Turbine Engine Disk
One-class LARM utilizes the strategy of maximizing the vector-angular mean and minimizing the vector-angular variance simultaneously. According to the definition in reference [21], the vector-angular mean and variance between the training patterns ϕ(x_i), i = 1, ..., ℓ, and a vector v can be expressed as where X = [ϕ(x_1), ϕ(x_2), ..., ϕ(x_ℓ)], e = [1, ..., 1]^T, and ℓ is the number of training patterns. In this case, maximizing the vector-angular mean while minimizing the vector-angular variance can be formulated as follows: where v is the optimal vector, ρ is the vector-angular margin, ξ = [ξ_1, ..., ξ_ℓ]^T is the vector of slack variables, and ν and λ are two positive constants. Unlike two-class LARM, one-class LARM is an unsupervised classification method, and only the normal training matrix X = [ϕ(x_1), ..., ϕ(x_ℓ)] is needed to build the fault detection model. For the turbine engine disk, the majority of data are obtained under normal working conditions, while fault signals under abnormal working conditions are difficult and expensive to obtain. According to reference [21], the optimal vector v* for problem (4) can be written as follows: Hence, X^T v = X^T Xα = Kα, where K = X^T X is the kernel matrix. Problem (4) can then be rewritten as follows: To investigate the solution of the constrained problem (6), the Lagrangian function is constructed as follows: where β = [β_1, ..., β_ℓ]^T and η = [η_1, ..., η_ℓ]^T are Lagrange multipliers. The following equations can be deduced by setting the partial derivatives of L(ρ, α, ξ, β, η) with respect to the primal variables to zero: Substituting (8) into (7), the dual form can be obtained as follows: where H = KQ^{-1}K, p = −λHe/ℓ, and Q^{-1} is the inverse of the matrix Q.
By solving the dual problem (9), the decision function is obtained, where S is the set of support vectors and m denotes the number of support vectors.

Experimental Results
The performance of the proposed fault detection algorithm is evaluated on the Disk Defect Data [22,23], which were collected by the National Aeronautics and Space Administration (NASA) Glenn Research Center's Rotor Dynamics Laboratory. In the experimental testing system, a notch was cut in the rim region of the disk. Three different spin speeds (3000 revolutions per minute (RPM), 4000 RPM, and 5000 RPM) and three states (normal, notch, and large notch) were recorded in the Disk Defect Data.
All the testing simulations were conducted on a computer with a Xeon(R) E5-1630 v4 CPU at 3.70 GHz and 16 GB of main memory. The accuracy and recall rates are used to evaluate the performance of the results. They are expressed as accuracy = (TP + TN)/(TP + TN + FP + FN) and recall = TP/(TP + FN), where TP is the number of true positive samples, TN is the number of true negative samples, FP is the number of false positive samples, and FN is the number of false negative samples.
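In code, these two rates are straightforward to compute from the four counts (a minimal helper, not part of the paper's implementation):

```python
def accuracy(tp, tn, fp, fn):
    # fraction of all test samples classified correctly
    return (tp + tn) / (tp + tn + fp + fn)

def recall(tp, fn):
    # fraction of actual positive samples that were detected
    return tp / (tp + fn)
```

For example, 95 true positives and 5 missed faults give a recall of 0.95 regardless of how the negative samples were classified.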

Parameter Selection and Impact Analysis.
For all the experiments, the Radial Basis Function (RBF) is selected as the kernel function, where c is the kernel parameter of the RBF. The parameter c is selected from the set {α_0/32, α_0/16, α_0/8, α_0/4, α_0/2, α_0} by 5-fold cross validation, where α_0 is the average norm of the training instances. For one-class LARM, ν and λ are two regularization parameters that affect the detection performance. Therefore, the following experiments were conducted to understand how ν and λ influence the performance. The first simulation measures the effect of ν on the detection performance, in which λ is selected from the set {0.01k, 0.1k}, k = 1, 3, 5, 7, 9, by 5-fold cross validation. In this experiment, the training and testing patterns are randomly selected from the Disk Defect Data at each RPM; the number of training patterns is set to 45, 60, 75, 90, and 105 (15, 20, 25, 30, and 35 normal samples per RPM), and the number of testing patterns is set to 6000 (1000 normal samples and 1000 abnormal samples per RPM). All the experiments are repeated 10 times, and the average accuracy and recall rates are displayed in Figure 1. The second simulation measures the influence of λ on the detection performance, in which ν is selected from the set {0.1k}, k = 1, 2, ..., 9, by 5-fold cross validation. The settings of training and testing samples are the same as in the previous experiment. All the experiments were repeated 10 times, and the average values are displayed in Figure 2.
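A grid search over such parameter sets can be sketched as follows. This sketch uses scikit-learn's OneClassSVM as a stand-in model (the paper's own solver is not reproduced here) and scores each (nu, gamma) pair by the mean acceptance rate of held-out normal folds, one simple proxy objective for one-class cross validation:

```python
import numpy as np
from itertools import product
from sklearn.model_selection import KFold
from sklearn.svm import OneClassSVM

def select_params(X, nus, gammas, k=5):
    # pick (nu, gamma) maximizing the mean held-out acceptance rate of normal data
    best, best_score = None, -1.0
    for nu, gamma in product(nus, gammas):
        scores = []
        for tr, va in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
            clf = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X[tr])
            scores.append((clf.predict(X[va]) == 1).mean())  # +1 = accepted as normal
        score = float(np.mean(scores))
        if score > best_score:
            best, best_score = (nu, gamma), score
    return best, best_score
```

In practice a one-class criterion usually also penalizes the volume of the accepted region; the acceptance-only score above is kept deliberately simple for illustration.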
From Figures 1 and 2, when the number of training patterns is 60, 75, or 90, the values of ν and λ have little impact on the accuracy and recall rates. However, when the number of training patterns is 45 or 105, both rates fluctuate more with ν or λ. That is because too few training samples may cause underfitting, while too many may cause overfitting. In this paper, the hyperparameters ν and λ of one-class LARM are selected by cross validation.
The hyperparameter ν is selected from the set {0.1, 0.3, 0.6, 0.9}, while the hyperparameter λ is selected from the set {0.01, 0.05, 0.1, 0.5, 0.9}, by 5-fold cross validation. These ranges of ν and λ have proven suitable for the subsequent simulations.

Performance Evaluation of One-Class LARM.
In this section, one-class LARM is compared with OCSVM and one-class SVDD for notch detection on the Disk Defect Data. One-class LARM is implemented with the LIBSVM package [24,25]; for comparison, one-class SVDD and OCSVM also use the LIBSVM package in the simulation.

Case 1: Notch Detection at Single Mode.
In case 1, the fault of the turbine engine disk is detected in a single-mode process in which the training samples consist of normal samples at a rotating speed of 3000 RPM. The process runs normally for 3000 samples and then switches to the notch mode for 1000 samples at rotating speeds of 3000 RPM, 4000 RPM, and 5000 RPM, respectively. The experiment is repeated 20 times, with all training and testing patterns randomly selected from the Disk Defect Data. The detection results of the three algorithms are statistically depicted in Figure 3.
From Figure 3(a), we can see that one-class LARM has an obvious advantage in accuracy rate when the number of training samples is larger than 30. The recall rate of one-class LARM (see Figure 3(b)) is clearly better than those of OCSVM and one-class SVDD. However, the training of one-class LARM, as shown in Figure 3(c), is slower than that of OCSVM and one-class SVDD. This is because solving the quadratic programming (QP) problem of one-class LARM requires calculating the inverse of the matrix Q; as the number of training samples increases, the matrix inversion becomes considerably time-consuming. The results in Figure 3(d) indicate that there is only a minor difference between the testing times of OCSVM, one-class SVDD, and one-class LARM. When the numbers of training and testing samples are 105 and 6000, the average testing time of one-class LARM is approximately 0.0365 seconds.

Case 2: Notch Detection Process at Multimode.
The turbine engine disk operates in a multimode process with three rotating speeds: 3000 RPM, 4000 RPM, and 5000 RPM. The samples are selected in six phases as follows: in phase 1, 1000 samples at a speed of 3000 RPM; in phase 2, 1000 samples at 4000 RPM; in phase 3, another 1000 samples at 5000 RPM. After the normal working modes, the notch occurs on the turbine engine disk and the process switches to phase 4, in which 1000 samples are picked at 5000 RPM; 1000 samples are recorded at 4000 RPM in phase 5, followed by 1000 samples at 3000 RPM in phase 6. The training samples consist of equal numbers of normal samples at 3000 RPM, 4000 RPM, and 5000 RPM. The settings of training and testing samples are the same as those in case 1. All the results are listed in Figure 4. The results in Figure 4 show that one-class LARM has obvious advantages in accuracy and recall rates. Notably, its recall rate remains near 98%, which is higher than those of OCSVM and one-class SVDD. A similar analysis to that of case 1 applies: Figure 4(c) shows that the training times of OCSVM and one-class SVDD increase linearly while one-class LARM exhibits a nonlinear trend, particularly when the number of training samples exceeds 75. At the same time, only a minor difference is observed in the testing times in Figure 4(d): one-class LARM detects testing examples quickly, with 6,000 samples detected in merely 0.03 seconds.
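The six-phase test stream described above can be assembled programmatically. The loader below is a hypothetical stub (`load_segment` and its random data are illustrative, not the NASA data interface), but the phase ordering and labeling mirror the description:

```python
import numpy as np

def load_segment(rpm, state, n=1000, d=4, seed=0):
    # hypothetical loader: returns an (n, d) feature matrix for one
    # operating condition; stubbed with random data for illustration
    rng = np.random.RandomState(hash((rpm, state, seed)) % (2**32))
    return rng.randn(n, d)

# phases 1-3: normal operation; phases 4-6: notch mode, speeds reversed
phases = [(3000, "normal"), (4000, "normal"), (5000, "normal"),
          (5000, "notch"),  (4000, "notch"),  (3000, "notch")]

X = np.vstack([load_segment(rpm, state) for rpm, state in phases])
y = np.concatenate([np.full(1000, 1 if state == "normal" else -1)
                    for _, state in phases])   # +1 = normal, -1 = fault
```

This yields a 6000-sample stream whose first half is normal and second half is faulty, matching the multimode test protocol.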

Discussion
In OCSVM, an optimal hyperplane, described by the normal vector w and the offset ρ, is constructed by maximizing the margin between the origin and the hyperplane in the feature space. The OCSVM can be solved by the following primal optimization problem [8]: min_{w, ξ, ρ} (1/2)‖w‖² + (1/(νℓ)) Σ_{i=1}^{ℓ} ξ_i − ρ, subject to ⟨w, ϕ(x_i)⟩ ≥ ρ − ξ_i, ξ_i ≥ 0, i = 1, ..., ℓ, where the ξ_i are nonnegative slack variables and ν is a positive constant.
In one-class SVDD, a minimal enclosing ball with radius R and center c is constructed to enclose most of the training samples in the feature space. The one-class SVDD can be solved by the following primal optimization problem [9]: min_{R, c, ξ} R² + C Σ_{i=1}^{ℓ} ξ_i, subject to ‖ϕ(x_i) − c‖² ≤ R² + ξ_i, ξ_i ≥ 0, i = 1, ..., ℓ, where C is a positive constant and the ξ_i are nonnegative slack variables. Several kernel-based methods, including one-class SVDD and OCSVM, have been proposed for solving the fault detection problem when abnormal data are unavailable for model training. Like OCSVM, one-class LARM also aims at building a separating hyperplane by maximizing the margin between the hyperplane and the origin. However, one-class LARM finds an optimal vector v in the feature space using only positively labeled training data, attempting to maximize the vector-angular mean and minimize the vector-angular variance simultaneously. Figure 5 illustrates the principle of one-class LARM.
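For completeness, the standard SVDD dual (following Tax and Duin's formulation; derived by introducing multipliers α_i for the ball constraints, not taken from this paper's omitted equations) is:

```latex
\max_{\alpha}\;\sum_{i=1}^{\ell}\alpha_i\,\kappa(x_i,x_i)
\;-\;\sum_{i=1}^{\ell}\sum_{j=1}^{\ell}\alpha_i\alpha_j\,\kappa(x_i,x_j)
\qquad\text{s.t.}\quad 0\le\alpha_i\le C,\quad \sum_{i=1}^{\ell}\alpha_i=1,
```

with center c = Σ_i α_i ϕ(x_i). Note that for an RBF kernel, κ(x, x) is constant, so this dual reduces to the OCSVM dual with C = 1/(νℓ), which is one reason the two baselines often behave similarly in the experiments above.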
From Figure 5, we can see that one-class LARM finds an optimal vector v in the feature space onto which the training samples project with a large mean and a small variance. The projection onto v separates the origin from the vector-angular mean as much as possible while keeping the within-class projections as dense as possible. Therefore, the one-class LARM-based fault detection model achieves favorable generalization performance in both single-mode and multimode processes of the turbine engine disk.
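The large-mean/small-variance principle can be checked numerically with a toy projection (illustrative names; an input-space projection rather than the feature-space projection used by the method):

```python
import numpy as np

def projection_stats(X, v):
    # project rows of X onto the unit direction of v; return (mean, variance)
    v = v / np.linalg.norm(v)
    p = X @ v
    return p.mean(), p.var()

# data shifted away from the origin along the first axis, tight in that direction
rng = np.random.RandomState(0)
X = rng.randn(500, 2) * [0.3, 1.0] + [4.0, 0.0]

m_good, s_good = projection_stats(X, np.array([1.0, 0.0]))  # aligned with the shift
m_bad, s_bad = projection_stats(X, np.array([0.0, 1.0]))    # orthogonal direction
```

The aligned direction yields a large projected mean and small variance, which is exactly the trade-off the one-class LARM objective encodes in the feature space.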
To facilitate comparison of the detection performance of the three methods, statistics over all working modes are combined and drawn in Figure 6.

Mathematical Problems in Engineering
As shown in Figures 6(a) and 6(b), for all examined process modes, the detection accuracy and recall rates indicate that one-class LARM compares favorably to both one-class SVDD and OCSVM. One-class LARM achieves the best detection performance among all the methods, especially for the turbine engine disk working in a multimode process.
From Figure 6(c), we can see that OCSVM and one-class SVDD train faster than one-class LARM. That is again because the QP solver of one-class LARM must calculate a matrix inversion; as the number of training samples increases, this inversion becomes extremely time-consuming. For the same reason, the training time of one-class LARM is a major limitation on the algorithm's applicability to medium- and large-sized datasets. Even when the number of training samples is relatively small, the training times of the three methods differ, but one-class LARM has obvious advantages in detection performance in terms of accuracy and recall rates.
It is observed from Figure 6(d) that the testing times differ only slightly. One-class LARM detects test examples quickly, taking 0.0365 s to detect 6,000 samples in the single-mode process and under 0.03 s for 6,000 samples in the multimode process.

Conclusions
In the presented work, the one-class LARM algorithm is proposed for novelty detection of the turbine engine disk; its detection model can be built using only normal training samples. Considering that fault samples are difficult to obtain or unavailable, the proposed method is well suited to practical problems. In addition, the method searches for the optimal vector in the feature space by maximizing the vector-angular mean and minimizing the vector-angular variance simultaneously, which achieves better detection performance, particularly in the multimode process. Comprehensive comparisons of simulation results have demonstrated the effectiveness of the proposed algorithm.
To take full advantage of the proposed method and examine its robustness, future work will extend one-class LARM with decremental learning and test it on medium- and large-size datasets.

Data Availability
The Disk Defect Data used to support the findings of this study have been deposited in the DASHlink repository (https://c3.nasa.gov/dashlink/resources/314/); they were collected by the National Aeronautics and Space Administration (NASA) Glenn Research Center's Rotor Dynamics Laboratory.

Conflicts of Interest
The authors declare that they have no conflicts of interest.