Anomaly Detection for Aviation Safety Based on an Improved KPCA Algorithm

Thousands of flights datasets should be analyzed per day for a moderate sized fleet; therefore, flight datasets are very large. In this paper, an improved kernel principal component analysis (KPCA) method is proposed to search for signatures of anomalies in flight datasets through the squared prediction error statistics, in which the number of principal components and the confidence for the confidence limit are automatically determined by OpenMP-based K-fold cross-validation algorithm and the parameter in the radial basis function (RBF) is optimized by GPU-based kernel learning method. Performed on Nvidia GeForce GTX 660, the computation of the proposed GPU-based RBF parameter is 112.9 times (average 82.6 times) faster than that of sequential CPU task execution. The OpenMP-based K-fold cross-validation process for training KPCA anomaly detection model becomes 2.4 times (average 1.5 times) faster than that of sequential CPU task execution. Experiments show that the proposed approach can effectively detect the anomalies with the accuracy of 93.57% and false positive alarm rate of 1.11%.


Introduction
In a flight, onboard Quick Access Recorder (QAR) and Flight Data Recorder (FDR) can record more than 600 parameters sampled at 1 Hz.Typical FDR and QAR parameters are acquired from the propulsion system, avionics system, landing gear, cockpit switches position, control surfaces, and other critical systems.Due to the complexity of flight conditions and aircraft systems, there are a large number of abnormalities that are often unclear.Detecting precursors of aviation safety incidents based on anomalies detection of FDR and QAR data from aircraft is becoming more and more important for pilots and airline safety management teams.Once these anomalies are detected, the additional contextual information of historical flight data is needed to determine the frequency of anomalies occurrence.
Bay and Schwabacher [1] proposed a distance-based anomaly detection method called Orca.The distance metric was used to detect abnormal points.The output from Orca was a distance score representing the average distance to its -nearest neighbors.Higher score meant more anomalies since the nearest neighbors were farther away.Iverson et al. [2] developed a health monitoring software called Inductive Monitoring System (IMS).Clustering method was used to analyze system data; a higher composite distance value was adopted to detect anomaly data.
Recently, kernel methods were applied to flight data analysis.Das et al. [3] developed a multiple kernel anomaly detection (MKAD) algorithm which could work with heterogeneous datasets including both continuous and discrete sample sequences.MKAD could detect some significant anomalies from real aviation operational data.The major advantage of MKAD was that it could detect potential safety anomalies in both discrete streams and continuous streams.
Smart et al. [4] presented a two-phase novelty detection approach to locate abnormalities of flight data in the descent phase.Compared with the mixture of Gaussians and means, the support vector machine (SVM) provided the best detection rate and identified obstacle abnormalities with higher accuracy.
Kernel principal component analysis (KPCA) algorithm was used by Cho et al. [5] for fault identification in process monitoring.There were two significant problems for anomaly detection in the KPCA-based method: computation performance and adaptability.For the KPCA-based method, the kernel matrix was defined for the principal component feature extraction and analysis.However, computing the associated kernel matrices required ( 3 ) computational complexity, where  was the size of the training data, which limited the applicability of these methods for large dataset.Because the parameter in the kernel function, the number of principal components, and the confidence for the confidence limit should be set before anomaly detection by KPCA method, the adaptability of the KPCA method was largely limited.
To solve problems of KPCA, many extended methods were developed including fast iterative KPCA (FIKPCA) [6], adaptive KPCA [7,8], and multiscale KPCA [9].The adaptive KPCA [7] could flexibly track the change of external environment or sample data.However, the model should be updated with real-time data, which affected the real-time diagnosis with increasing data.In [8], the number of principal components and squared prediction error (SPE) confidence limit were obtained by experience.
The method proposed in this paper is based on KPCA and aims at anomaly detection in the flight.In order to analyze vast amounts of flight data, the anomaly detection algorithm should be effectively accelerated.The rest of this paper is organized as follows.In Section 2, classical KPCA algorithm is introduced.Section 3 introduces the improved KPCA algorithm used to optimize key parameters such as the parameter of the radial basis function (RBF), the number of principal components, and the SPE confidence limit.Experimental results and performance evaluation of the proposed anomaly detection model are given in Section 4.
Given the training set  = {(x  ,   ),  = 1, . . ., }, x  ∈   ,  = 1, 2, . . .,  is the training data with zero mean,   = {+1, −1},  = 1, 2, . . .,  is the class label, and the sample covariance matrix C in the feature space is expressed as where (⋅) is a nonlinear mapping function.The principal component can be obtained by where ⟨x, y⟩ denotes the dot product of x and y, k ̸ = 0 is the eigenvector, and  > 0 is the corresponding eigenvalue.The k corresponding to the largest  is the first principal component, and the k corresponding to the smallest  is the last principal component.The eigenvector k can be expressed as To obtain coefficients   ,  = 1, . . ., , defining the kernel matrix K of  × , the elements in the matrix are determined by   = (x  ) T (x  ) = (x  , x  ), where (x  , x  ) is the inner product of x  and x  .In this paper, the kernel matrix K is calculated through RBF kernel function, which is adopted as follows: where  is the parameter of RBF.The coefficient  can be obtained by The centered kernel matrix K can be calculated by where matrix E = (1/) [ Assuming (x new ) is a nonlinear mapping function that projects the new data x new ∈   from the original space to the feature space, the projection from (x new ) onto the eigenvector k  ,  = 1, 2, . . ., ; 1 ≤  ≤  can be obtained by The number of principal components can influence the detection rate and computational complexity.In the anomaly detection model, the cumulative percent variance theory [8] is adopted to calculate the number of principal components: where  is the threshold of principal components and  is the number of principal components.SPE statistic is adopted to detect abnormal flight data in this paper.In [11], SPE is expressed as follows: The confidence limit for SPE is calculated through its approximate distribution [11]: where  is the confidence degree of  2 distribution,  = /2, ℎ = 2 2 /,  is the estimated mean of the SPE, and  is the estimated variance of the SPE.

Improved KPCA Anomaly Detection Model
To improve the computation efficiency of KPCA-based method, an improved KPCA method based on parallel computing is proposed, where the RBF parameter is calculated Algorithm 1: Gradient descent algorithm for obtaining the optimal RBF parameter.
on GPU and implemented by CUDA [12], and the number of principal components and the confidence for the confidence limit are determined via parallel -fold cross-validation by OpenMP [13].

RBF Parameter Computation by CUDA.
Because the structure of the training data in the feature space is determined by the chosen kernel function, inappropriate choice of parameter of kernel function will seriously affect the detection performance.In this paper, an optimized method for calculating the parameter of the RBF kernel function is proposed based on the training data.Two important properties of the RBF kernel are (x, x, ) = 1 and 0 < (x, z, ) ≤ 1.The optimization of the RBF parameter  can be achieved by following properties [14]: (1) For samples in the same class, the RBF kernel (2) For samples in different classes, the RBF kernel The optimal RBF parameter  * is obtained by solving the following optimization problem [15]: where K is a constant matrix K = ( K )  ,=1 with entries K is the kernel matrix K = (K  )  ,=1 with entries The schematic illustration of function () is shown in Figure 1.The horizontal axis represents the RBF parameter  and the vertical axis represents the corresponding ().The graph shows that there is only one minimum value which is the optimal RBF parameter value in the proposed method.
The optimal RBF parameter  * is obtained via gradient descent [16].The partial derivative of function () is expressed as The iterative formula of gradient descent is expressed as follows: where ∇   is the partial derivative of function () and  is the learning step.The gradient descent algorithm for obtaining the optimal RBF parameter is given in Algorithm 1.
The approach is executed with the sequential algorithm on CPU.When the amount of sample increases, computation time increases rapidly.If the dimension of the kernel matrix is 8350, the total computation time increases to 1106 s.The computation of lines (5) and (6) in Algorithm 1 is implemented through level 2 for-loop, so it is better to perform them on GPU.The optimal RBF parameter  * is obtained by a parallel algorithm on GPU which replaces overlapped ones by the sequential CPU algorithm.The GPU code of the gradient descent algorithm for obtaining the optimal RBF parameter is given in Algorithm 2.
If kernel matrix K is transferred from GPU to CPU in each step of the convergence loop, the data transfer becomes an acceleration bottleneck.To reduce the amount of transferred data, we focus on the structure of matrices Ω and K as described in lines (2) and (3) of Algorithm 2.
The matrix Ω and matrix K do not change with the optimal RBF parameter  * in each step of the convergence loop; the transfer of matrices Ω and K is necessary only in the initialization.In addition, Ω and K are symmetric matrices.Thus, it is necessary to transfer upper triangular part of matrices Ω and K from CPU to GPU.The size of the transferred data in Algorithm 2 on lines (4) and ( 5) is (( + 1) × )/2, respectively.

Parameters 𝛾 and 𝜂 Computation by OpenMP.
To make the anomaly detection model more accurate, the number of principal components and the confidence for the confidence limit are determined via -fold cross-validation [20].
The -fold cross-validation process randomly splits D into  disjointed parts with almost equal size  1 , . . .,   .At each th fold, D  ( = 1, . . ., ) and D (−) = D − D  are used as the test dataset and the training dataset.To assess the performance of the -fold cross-validation, the balanced error rate matrix is utilized.The balanced error rate (BER) [4] is given by BER = FPR + FNR, (16) where false positive rate (FPR) denotes the percentage of incorrectly identified normal flight data and false negative rate (FNR) denotes the percentage of incorrectly identified abnormal flight data The threshold of principal components  and the confidence for the confidence limit  are confirmed according to a fivefold cross-validation with log

Numerical Experiments and Discussions
The performance evaluation of the proposed algorithm in the CPU-GPU heterogeneous environment is given in this section.For the GPU algorithm, i7-3770 CPU with Nvidia GTX 660 is used, and the bandwidth is about 144.2 GB/s.For the CPU algorithm i7-3770@3.40GHz CPU with 8 GB SDRAM is used.Synthetic datasets [3] are used in the experiment, which include 300 flights, each flight with 1000 sample points.In this paper, fault type I in the synthetic data is used to access the performance of the improved KPCA.The specifications of the experiment environment are listed in Table 1.

Computation Performance.
Experiment results are showed in Figures 2, 3, and 4. Figure 2 shows the dimension of the kernel matrix, elapsed time for computing the parameter of RBF function on GPU and CPU, and speedup achieved by the parallel code over the sequential code.Figure 3 shows the breakdown of elapsed time for computing the parameter of RBF function on GPU. Figure 4 shows the dimension of the kernel matrix, elapsed time for computing the -fold cross-validation process, and speedup achieved by the parallel code over the sequential code.(5) for  = 1 to 7 do (6) for  = 1 to 7 do (7) Training(T,   ,   ) (8) Testing(D  ,   ,   ) (9) BER , = FPR , + FNR , (10) end for (11) end for (12)   As shown in Figure 2, the speedup on GPU code is larger than 100 when the dimension of the kernel matrix is larger than 5050.Elapsed time on GPU is slower than that on CPU with small dimension of kernel matrix, but faster with large kernel matrix dimension.When the dimension of kernel matrix is 100, the elapsed time on GPU is up to 2.3 times slower than that on CPU.However, it is up to 112.9 times faster on GPU than that on CPU while the dimension of kernel matrix is 8350.The reason is that when kernel matrix dimension is small, the synchronization cost for parallelism on GPU is larger than that needed on CPU and the data transfer between the CPU and GPU becomes the accelerating bottleneck.When the dimension of the kernel matrix is small, the computation cost of the exponentiation and addition arithmetic is negligible on CPU, and the effective acceleration by parallel processing cannot offset the data transfer time between the CPU and GPU.
As shown in Figure 3, the ratio of time taken up by "Data Transfer" and "Others" is relatively high for smalldimension kernel matrix.As the dimension of the kernel matrix becomes larger, computation cost becomes higher, and almost all the elapsed time is spent on exponentiation and addition arithmetic for RBF parameter computation, so the time for data transfer is relatively reduced.
The time for "Data Transfer" is significantly lower than that in other processes when the dimension of the kernel matrix is larger.For large-dimension kernel matrices, the occupancy of "Data Transfer" is less than 15%.In other words, data transfer does not become the acceleration bottleneck and the size of data transferred between the CPU and GPU is effectively reduced in the proposed GPU-based method, especially for large-dimension matrices.
As shown in Figure 4, the running time of the parallel algorithm and the sequential algorithm increases quickly when the kernel matrix dimension increases.When the dimension of the kernel matrix is 100, the parallel algorithm is 1.8 times faster than the sequential algorithm; when the dimension of the kernel matrix is 500, it is up to 2.4 times faster, but 1.1 times slower when the dimension of the kernel matrix is 1500.The reason is that the OpenMP-based parallel algorithm needs to synchronize among multiple processors.In this case, when kernel matrix dimension is large, the synchronization cost on parallel algorithm becomes higher than that needed on sequential algorithm.

Detection Performance.
In the synthetic datasets [3], a set of Gaussian distributions was defined for each continuous parameter.The continuous parameters would draw from their defined distribution at each time step.The statistical distribution of each continuous parameter is shown in The initial training data number (including normal sample and abnormal sample), the test sample number, the best parameters, the detection rate (DR), FNR, and FPR are listed in Table 2.The continuous parameters are decomposed for three layers through Daubechies-4 (DB4) wavelet.Wavelet coefficients of the third layer are used to reconstruct highfrequency bands and scaling coefficients of the third layer are used to reconstruct low-frequency bands.Then, the feature parameter vector is constructed by the high-frequency and low-frequency bands of the continuous parameters.DR denotes the percentage of correctly identified abnormal flight data, FPR denotes the percentage of incorrectly identified normal flight data, and FNR denotes the percentage of incorrectly identified abnormal flight data.From the result, when normal sample is 1000 and abnormal sample is 266, DR is up to 93.57% with FPR of 1.11%.

Comparison of Some Detection
Methods.Two kinds of kernel-based fault detection methods, one-class support vector machine (OCSVM) [21] and improved KPCA, are compared, and results are listed in Table 3.For both algorithms, the number of initial samples and testing samples are 1266 (1000 normal samples and 266 abnormal samples) and 10000, respectively.
For improved KPCA, parameter  of the RBF kernel is adjusted by the gradient descent algorithm, which is given in Algorithm 1.The parameters  and  are searched according to a fivefold cross-validation with log From Table 3, it can be clearly seen that the FPR of the improved KPCA fault detection method is better than that of OCSVM.But the DR and FNR of OCSVM fault detection method are better than those of the improved KPCA.Considering DR, FNR, and FPR, the detection performance of the improved KPCA fault detection method is more favorable compared to that of OCSVM.

Conclusion
Kernel method is often used in anomaly detection in the field of aviation safety.The computation performance is the main problem in aviation data analysis domain.In this paper, an improved KPCA solution is proposed for efficient anomaly detection.The RBF parameter is optimized by GPU and OpenMP-based -fold cross-validation is adopted for training KPCA anomaly detection model.The experiment was performed with i7-3770 CPU and Nvidia GTX 660.Results show that RBF parameter computation accelerates up

2 ∈
{−7, −6, . . ., −1}.The -fold cross-validation process can be exploited by loop-level parallelism which executes loop iterations across multiple processors concurrently.The execution time can be reduced significantly.The OpenMPbased -fold cross-validation algorithm for obtaining optimal parameters  * and  * is given in Algorithm 3.

end for Algorithm 3 :Figure 2 :
Figure 2: Elapsed time and speedup for RBF parameter computation on GPU and CPU.

Figure 3 :
Figure 3: Breakdown of elapsed time for RBF parameter computation on GPU.

Fig- ure 5 .
It can be clearly seen that the statistical distribution of each continuous parameter follows Gaussian distribution, and favorable results are obtained by normal probability density function.

Table 1 :
Specifications of the experiment environment.

Table 2 :
Detection performance by the improved KPCA method.

Table 3 :
Comparison of different methods.