KPCA and AE Based Local-Global Feature Extraction Method for Vibration Signals of Rotating Machinery



Introduction
Rotating machinery is widely used in modern industry. However, after long operation under harsh working conditions, it will inevitably break down, which can result in catastrophic accidents and huge economic losses [1,2]. Therefore, developing accurate fault diagnosis technologies is of great significance for avoiding potential losses [3,4].
In many studies, fault diagnosis for rotating machinery is converted into the classification of features extracted from vibration signals [5][6][7][8][9]. Hence, the accuracy of fault diagnosis depends heavily on the fault features extracted from the original signals. Fault features in the time domain, frequency domain, and time-frequency domain extracted by signal analysis techniques have been widely investigated [10,11]. The wavelet transform was adopted in [12] to extract 2-D time-frequency features from raw 1-D vibration signals. Complementary ensemble empirical mode decomposition was used in [13] to remove undesirable noise and extract fault signatures from the vibration signals of rolling bearings. Optimized variational mode decomposition and envelope demodulation analysis were utilized in [14] to extract features from vibration signals of bearings. However, most of the aforementioned feature extraction methods rely on the experience of domain experts with the specific problem to select the most discriminative features, which makes the choice of features sensitive to the case at hand. Instead of relying on a human operator to select features, deep-learning techniques extract useful features in a data-driven way and have become a research hotspot. A weighted sparse representation classification based on dictionary learning was proposed in [15] to extract fault features and identify the fault status. This method considers the locality of the sample and solves the time-shift deviation problem of vibration signals. An improved convolutional neural network based on LeNet-5 was constructed in [16] to learn and enhance features from samples obtained by multisensor data fusion, and it achieved good results. Although fault diagnosis methods based on deep-learning techniques, such as convolutional neural networks [17][18][19], can automatically learn useful information from original signals, they are supervised feature extraction processes that require large amounts of labeled data.
In practice, it is difficult to acquire self-contained negative samples from the online monitoring of critical plants [20]. Therefore, most of the available data are unlabeled, which creates difficulties for supervised methods.
To solve the above dilemma, unsupervised learning, which discovers useful representations from unlabeled data, has caught the attention of researchers. Commonly used methods of unsupervised feature learning include kernel principal component analysis (KPCA) and the autoencoder (AE). In the field of fault diagnosis, these two methods are often used to extract the time-domain features of signals, and they also have great potential for frequency-domain feature extraction. In practice, analysis of the spectrum obtained by the fast Fourier transform (FFT) is a commonly used fault diagnosis technique. On the one hand, changes in the main frequency components of the spectrum contain a wealth of information about the state of mechanical motion. On the other hand, frequency components with significant amplitude changes, or frequency components that have a fixed ratio to the rotation frequency, are often used for fault detection and regarded as key objects for fault type identification. Consequently, the frequency spectrum of a vibration signal contains much information that is critical for fault diagnosis, and the two unsupervised feature learning methods mentioned above can be used to extract it. The redundant information in the frequency spectrum of vibration signals, such as frequency components with the same amplitude in different samples, can be removed by using KPCA as a data preprocessing method. Then, the most important information in the frequency spectrum can be extracted by the autoencoder in an unsupervised way.
In this paper, to take advantage of the aforementioned unsupervised feature learning techniques and extract, from the frequency spectrum of vibration signals of rotating machinery, the key information that distinguishes the state of the device, an unsupervised feature learning method based on KPCA and AE is proposed. KPCA is a technique for nonlinear principal component analysis. Experiments on pattern recognition with a linear classifier have shown that nonlinear principal components afford better recognition rates than the corresponding numbers of linear principal components [21]. In the proposed method, KPCA reduces useless information in the frequency spectrum and roughly extracts the useful features. Next, an autoencoder learns an effective encoding of these features, so that the final extracted features retain the most important frequency-domain information of the signal at a very low dimension. Generally, it is difficult to predict which frequency components will change abnormally and can serve as fault features when a device fails, so attention should be paid to both local and global features of the frequency spectrum to ensure that important failure information is not missed. As KPCA is a popular method for global structure analysis of datasets and omits local information [22][23][24][25], the frequency spectrum is segmented in this study. First, FFT is utilized to obtain the frequency spectrum of the vibration signals. Next, the frequency spectrum is divided into several segments. Then, KPCA is employed to extract local features from these segments, and these local features are connected in sequence to obtain one global feature vector. After that, the global feature vector is sent to the AE to obtain the most discriminative low-dimensional features.
The proposed feature extraction method combined with a classifier can realize fault diagnosis for rotating machinery. A rotor dataset and a rolling bearing dataset were used to verify the effectiveness and generalization of the proposed method. Several experiments were conducted to confirm the robustness of the proposed method with different numbers of randomly selected training samples and under load fluctuation. In addition, the superiority of the proposed method was validated through comparative experiments.

Principal Component Analysis.
Principal component analysis (PCA) is an unsupervised learning method that can be used for feature extraction, noise reduction, and pattern recognition. PCA is a basis transformation that diagonalizes an estimate of the covariance matrix of the data $x_k$, $k = 1, \ldots, l$, $x_k \in \mathbb{R}^N$, $\sum_{k=1}^{l} x_k = 0$, defined as follows [21]:
$$C = \frac{1}{l} \sum_{j=1}^{l} x_j x_j^{T}. \tag{1}$$
The new coordinates in the eigenvector basis, i.e., the orthogonal projections onto the eigenvectors, are called principal components.
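As a minimal sketch of this definition, the snippet below centers a data matrix, forms the covariance estimate, and projects onto the leading eigenvectors. The function name `pca`, the use of NumPy, and the synthetic data are illustrative assumptions and are not part of the original method description.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto the leading eigenvectors of the covariance estimate."""
    Xc = X - X.mean(axis=0)                  # enforce sum_k x_k = 0
    C = Xc.T @ Xc / Xc.shape[0]              # C = (1/l) * sum_j x_j x_j^T
    eigvals, eigvecs = np.linalg.eigh(C)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    return Xc @ eigvecs[:, order], eigvals[order]

# Synthetic data (assumption): 200 samples, 5 features with unequal spread.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])
scores, variances = pca(X, 2)
```

The variance of each column of `scores` equals the corresponding eigenvalue, which is one way to check that the projection is onto the eigenvector basis.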

Kernel Principal Component Analysis.
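Kernel principal component analysis performs a nonlinear form of principal component analysis by using integral operator kernel functions [21]. Assume that the data $x_k$, $k = 1, \ldots, l$, $x_k \in \mathbb{R}^N$, are mapped into the feature space $F$ by
$$\Phi: \mathbb{R}^N \to F, \quad x \mapsto \Phi(x), \tag{2}$$
and that $\Phi(x_1), \ldots, \Phi(x_l)$ are centered, i.e., $\sum_{k=1}^{l} \Phi(x_k) = 0$. To do PCA for the covariance matrix,
$$\bar{C} = \frac{1}{l} \sum_{j=1}^{l} \Phi(x_j) \Phi(x_j)^{T}, \tag{3}$$
we need to find eigenvalues $\lambda \ge 0$ and eigenvectors $V \in F \setminus \{0\}$ satisfying
$$\lambda V = \bar{C} V. \tag{4}$$
Considering the equivalent system of (4),
$$\lambda (\Phi(x_k) \cdot V) = (\Phi(x_k) \cdot \bar{C} V), \quad k = 1, \ldots, l, \tag{5}$$
and there exist coefficients $\alpha_1, \ldots, \alpha_l$ such that
$$V = \sum_{i=1}^{l} \alpha_i \Phi(x_i). \tag{6}$$
Substituting (3) and (6) into (5) and defining an $l \times l$ matrix $K$ by
$$K_{ij} = (\Phi(x_i) \cdot \Phi(x_j)), \tag{7}$$
we arrive at
$$l \lambda K \alpha = K^2 \alpha, \tag{8}$$
where $\alpha$ denotes the column vector with entries $\alpha_1, \ldots, \alpha_l$. To find solutions of (8), we solve the eigenvalue problem
$$l \lambda \alpha = K \alpha \tag{9}$$
for nonzero eigenvalues. Then, the solutions $\alpha^k$ belonging to nonzero eigenvalues are normalized by normalizing the corresponding vectors in $F$, i.e., $(V^k \cdot V^k) = 1$. By virtue of (6), (7), and (9), this can be converted to
$$\lambda_k (\alpha^k \cdot \alpha^k) = 1, \tag{10}$$
where $\lambda_k$ denotes the $k$-th eigenvalue of $K$, i.e., the corresponding solution $l\lambda$ of (9). For principal component extraction, projections of the image of a test point $\Phi(x)$ onto the eigenvectors $V^k$ in $F$ can be computed according to
$$(V^k \cdot \Phi(x)) = \sum_{i=1}^{l} \alpha_i^k (\Phi(x_i) \cdot \Phi(x)). \tag{11}$$
Note that neither (7) nor (11) requires the $\Phi(x_i)$ in explicit form; they are only needed in dot products. Therefore, kernel functions can be used for computing these dot products without actually performing the map $\Phi$. The kernel functions commonly used in KPCA are shown in Table 1.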
Assume that the dimension of the data $x_k$, $k = 1, \ldots, l$, $x_k \in \mathbb{R}^N$, is to be reduced to $J$. The KPCA algorithm is described in the following steps [26]:
(1) Substitute kernel functions for all occurrences of $(\Phi(x_i) \cdot \Phi(x_j))$.
(2) Compute the kernel matrix $K$ defined by (7).
(3) Centralize $K$ according to
$$\tilde{K} = K - \frac{1}{l} I_l K - \frac{1}{l} K I_l + \frac{1}{l^2} I_l K I_l,$$
where $\tilde{K}$ is the centralized kernel matrix and $I_l$ is an $l \times l$ matrix whose every element is 1.
(4) Solve (9) by diagonalizing $\tilde{K}$, and normalize the eigenvector expansion coefficients $\alpha^n$ by requiring equation (10).
(5) Extract the principal components of a test point $x$ by computing projections onto the eigenvectors according to equation (11). Take the principal components corresponding to the largest $J$ eigenvalues as the output.
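The steps above can be sketched in a few lines of NumPy. This is an illustrative implementation under stated assumptions, not the authors' code: the Gaussian kernel is taken in the common form $k(a, b) = \exp(-\gamma \lVert a - b \rVert^2)$, the function names `gaussian_kernel`, `kpca_fit`, and `kpca_transform` are hypothetical, and the data are synthetic.

```python
import numpy as np

def gaussian_kernel(A, B, gamma):
    """k(a, b) = exp(-gamma * ||a - b||^2) for all row pairs (assumed kernel form)."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kpca_fit(X, n_components, gamma):
    l = X.shape[0]
    K = gaussian_kernel(X, X, gamma)                 # kernel matrix, equation (7)
    one = np.full((l, l), 1.0 / l)                   # all-ones matrix scaled by 1/l
    Kc = K - one @ K - K @ one + one @ K @ one       # centralized kernel matrix
    eigvals, alphas = np.linalg.eigh(Kc)             # eigenvalue problem (9)
    idx = np.argsort(eigvals)[::-1][:n_components]   # keep the J largest
    eigvals, alphas = eigvals[idx], alphas[:, idx]
    alphas = alphas / np.sqrt(eigvals)               # normalization (10)
    return {"X": X, "K": K, "alphas": alphas, "eigvals": eigvals, "gamma": gamma}

def kpca_transform(model, X_new):
    """Project new points onto the kernel principal axes, equation (11)."""
    K, X = model["K"], model["X"]
    Kt = gaussian_kernel(X_new, X, model["gamma"])
    # Center the test kernel consistently with the training kernel.
    Kt_c = Kt - K.mean(axis=0)[None, :] - Kt.mean(axis=1)[:, None] + K.mean()
    return Kt_c @ model["alphas"]

# Synthetic data (assumption): 40 samples with 6 features each.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 6))
model = kpca_fit(X_train, n_components=2, gamma=0.5)
Z_train = kpca_transform(model, X_train)
```

Projecting the training data itself gives zero-mean components whose squared norms equal the retained eigenvalues, which is a convenient sanity check on the centering and normalization.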

2.3. Autoencoder.
An autoencoder is one of the best-known unsupervised machine learning algorithms and is generally used for dimension reduction and feature extraction [27][28][29][30]. Assume that there is a $D$-dimensional dataset, $x^{(n)} \in \mathbb{R}^D$, $1 \le n \le N$. The encoder maps each sample in this dataset to the feature space to obtain its encoding, $z^{(n)} \in \mathbb{R}^M$, $1 \le n \le N$. The goal is to use the encoding of each sample to reconstruct the original sample. The original AE is a three-layer feedforward neural network consisting of an input layer, a hidden layer, and an output layer. The input layer and the hidden layer constitute the encoder:
$$f: \mathbb{R}^D \to \mathbb{R}^M,$$
which transforms the input data into low-dimensional features. The hidden layer and the output layer constitute the decoder:
$$g: \mathbb{R}^M \to \mathbb{R}^D,$$
which reconstructs the input data from the corresponding features. The learning goal of the autoencoder is to minimize the reconstruction error:
$$\mathcal{L} = \sum_{n=1}^{N} \left\| x^{(n)} - g\left(f\left(x^{(n)}\right)\right) \right\|^2.$$
If $M < D$, the autoencoder is equivalent to a dimension reduction or feature extraction method. After the training of the AE is completed, the decoder is generally removed. Only the encoder is retained, and the output of the encoder can be used as the feature of each sample.
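To make the encoder, decoder, and reconstruction objective concrete, the following sketch trains a three-layer autoencoder with plain full-batch gradient descent. The tanh encoder activation, the linear decoder, the learning rate, the function name `train_autoencoder`, and the synthetic data are all assumptions chosen for brevity; the text above does not prescribe them.

```python
import numpy as np

def train_autoencoder(X, m, epochs=500, lr=0.01, seed=0):
    """Three-layer AE: D inputs -> m hidden units (tanh) -> D outputs (linear)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1, b1 = rng.normal(scale=0.1, size=(d, m)), np.zeros(m)
    W2, b2 = rng.normal(scale=0.1, size=(m, d)), np.zeros(d)
    losses = []
    for _ in range(epochs):
        Z = np.tanh(X @ W1 + b1)             # encoder f
        X_hat = Z @ W2 + b2                  # decoder g
        err = X_hat - X
        losses.append(float((err**2).sum(axis=1).mean()))
        # Backpropagate the mean squared reconstruction error.
        g_out = 2.0 * err / n
        g_W2, g_b2 = Z.T @ g_out, g_out.sum(axis=0)
        g_pre = (g_out @ W2.T) * (1.0 - Z**2)   # derivative of tanh
        g_W1, g_b1 = X.T @ g_pre, g_pre.sum(axis=0)
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2

    def encode(A):
        # After training only the encoder is kept.
        return np.tanh(A @ W1 + b1)

    return encode, losses

# Synthetic data (assumption): 8-dimensional samples driven by 2 latent factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(120, 2))
X = np.tanh(latent @ rng.normal(size=(2, 8)))
encode, losses = train_autoencoder(X, m=2)
features = encode(X)
```

Here `features` plays the role of the encodings $z^{(n)}$: the decoder is discarded after training and only the encoder output is used downstream.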

Proposed Feature Extraction Method.
The time-domain statistical characteristics of rotating machinery vibration signals, such as mean square value, standard deviation, and kurtosis, can only reflect whether the state of the equipment is normal. Determining the location and type of a failure requires further analysis. In practical situations, spectrum analysis is an important analysis method, and the fast Fourier transform is its most commonly used tool. In the frequency spectrum of the vibration signal of a rotating machine, the frequency components that have a fixed relationship with the rotation frequency are the main objects of concern. However, neglecting the other frequency components may cause a misjudgment when a device fails. On the one hand, it is difficult to predict which frequency component's amplitude will change when the state of the device is abnormal. This means it is difficult to determine which information should become the key feature of the frequency spectrum. On the other hand, abnormal changes in some frequency components of some samples may correspond to new types of faults, and these samples are unlabeled. Such unlabeled samples create difficulties for supervised fault diagnosis methods. To sum up, more frequency components need to be considered, so that both the local features and the global features of the frequency spectrum can be extracted and used to judge the state of the rotating machine.
In this study, to extract from the frequency spectrum of vibration signals of rotating machinery the key information that distinguishes the state of the device, an unsupervised feature extraction method combining KPCA and AE is proposed. In addition, to overcome the shortcoming that KPCA ignores the local features of the data, the frequency spectrum is segmented and features are extracted from each frequency spectrum segment. Then, the features of these segments are merged to obtain the global feature of the frequency spectrum, which not only avoids missing local features of the frequency spectrum but also reduces the influence of noise components on the result of feature extraction. The feature extraction for vibration signals using the proposed method is implemented in the following steps:
(1) Calculate the FFT of the entire signal to obtain its frequency spectrum. For a given signal $T$ with $N$ points, $[t_1, t_2, \ldots, t_N]$, let $R$ with $L$ spectrum lines, $[r_1, r_2, \ldots, r_L]$, be its frequency spectrum; $R$ is normalized into $[-1, 1]$.
(2) Divide $R$ into $N_s$ segments, each containing $L/N_s$ points:
$$R = [r^1, r^2, \ldots, r^{N_s}].$$
(3) Apply KPCA to each item in $r^1, r^2, \ldots, r^{N_s}$ to extract the principal components, where $V_s$ is the number of principal components. The principal components $C^i$ of each segment $r^i$ are regarded as local features of the frequency spectrum and are connected in sequence to obtain a global feature vector $x$ with a dimension of $N_s \times V_s$ as follows:
$$x = [C^1, C^2, \ldots, C^{N_s}].$$
(4) All samples in the dataset are processed into the form of $x$ according to the previous three steps, and then $x$ is normalized into $[-1, 1]$.
(5) Select some processed samples as training samples and send them to the constructed AE for training, as detailed in Figure 1.
(6) After training, the AE is capable of reconstructing the input $x$: $x \to h \to \hat{x}$, where $h$ is a three-dimensional vector output by the trained encoder. $h$ is the desired latent feature of the frequency spectrum.
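Steps (1) and (2) can be illustrated with a short sketch. It assumes a one-sided amplitude spectrum with $N/2$ lines and min-max scaling into $[-1, 1]$; the text does not state which normalization is used, and the function name `segmented_spectrum` and the two-tone test signal are hypothetical.

```python
import numpy as np

def segmented_spectrum(signal, n_segments):
    """FFT amplitude spectrum, scaled into [-1, 1] and split into equal segments."""
    amp = np.abs(np.fft.rfft(signal))[: len(signal) // 2]   # L = N/2 spectrum lines
    lo, hi = amp.min(), amp.max()
    amp = 2.0 * (amp - lo) / (hi - lo) - 1.0                 # min-max scaling (assumption)
    return amp.reshape(n_segments, -1)                       # row i is segment r^i

# Synthetic signal (assumption): 1 s at 2048 Hz with tones at 20 Hz and 160 Hz.
fs, n = 2048, 2048
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 20 * t) + 0.3 * np.sin(2 * np.pi * 160 * t)
segments = segmented_spectrum(x, 32)
```

Each row of `segments` would then be passed to KPCA in step (3), and the resulting local features concatenated into the global feature vector.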
Fault diagnosis integrating the proposed method and a classifier is depicted in Figure 2. As shown in Figure 2, fault diagnosis can be divided into three parts: data preprocessing, feature extraction, and fault recognition [1]. Therefore, the effectiveness of feature extraction can be verified by checking that the faults are correctly identified. The accuracy of fault identification is measured by the classification accuracy of the classifier. The classification model establishes the relationship between fault features and fault types during the training phase, and this relationship is then employed to make decisions for testing samples [31]. After that, the classification accuracy on the testing dataset is calculated and used as an index to evaluate the performance of feature extraction methods. However, apart from the feature extraction method, other factors also affect classification accuracy, such as the number of training samples [32,33], how they are selected (fixed or random), and the type and parameter configuration of the classifier. On the whole, an outstanding feature extraction method should have strong stability, which means that it leads to high classification accuracy when the training samples are selected randomly, when the number of training samples changes, or when the classifier type changes.

Case Study 1: Feature Extraction of Rotor Signals
The data were acquired from the experimental rotating machinery system displayed in Figure 3. The rotor laboratory bench is driven by a DC motor controlled by the DH5600 speed controller. The rotor, with a diameter of 10 mm and a length of 850 mm, is supported by bearing brackets located at the two sides and consists of two single shafts joined by a coupling. Two rub screw housings are installed on the rack of the system, and two mass disks with a diameter of 75 mm are fixed on the rotor. The sensors for signal acquisition comprise a photoelectric sensor for speed measurement, a piezoelectric accelerometer for vibration measurement, and two eddy current sensors for displacement measurement. The signals collected by the eddy current sensors and the piezoelectric accelerometer are sent to a front-end processor for filtering and amplification and then transmitted to the computer for storage and analysis. The system can simulate many types of rotating machinery faults. Here, three kinds of faults are simulated: unbalance, misalignment, and contact-rubbing. The unbalance condition is simulated by screwing a 2 g mass block into the threaded hole near the edge of the mass disk. The misalignment condition is simulated by changing the relative position of the coupled shafts, while contact-rubbing is simulated by screwing the rub screw into the rub screw housing. During the experiments, the speed of the system is set to 1200 rpm, the sampling frequency is 2048 Hz, and the sample length is 1 s [1]. Under each of the three fault conditions and the normal condition, 45 vibration signals are acquired, for a total of 180 samples. The time-domain waveforms of the vibration signals in the four states are illustrated in Figure 4.

Parameter Selection.
Following the steps described in Section 2.4, the FFT of the entire rotor signal is calculated to obtain the frequency spectrum. The frequency spectra of one group of signals covering the four rotor health states are illustrated in Figure 5.
It can be observed in Figure 4 that the signals are buried in noise, which makes it difficult to distinguish the different rotor states in the time domain. In contrast, it can be seen from Figure 5 that the frequency spectra of the four rotor states differ in the following ways: (1) the amplitude of the first maximum (the rotor's rotation frequency, 20 Hz) is significantly different; (2) the six spectrum lines with the largest amplitudes in each spectrum, circled in red, are different. These spectrum lines can be regarded as the dominant frequency components of the frequency spectrum, so the dominant frequency components differ between device states. Meanwhile, in some frequency bands, the amplitude of the same frequency component is obviously different, such as the bands 150 Hz to 200 Hz, 400 Hz to 500 Hz, and 600 Hz to 700 Hz. Therefore, to extract the features of each frequency band and avoid missing the local information of the frequency spectrum, a strategy of spectrum segmentation is adopted in this study.
In the proposed method, two parameters must be selected before using KPCA, i.e., $N_s$ and $V_s$. As described in Section 2.4, $N_s$ is the number of segments into which the frequency spectrum is divided, while $V_s$ is the output dimension of KPCA. These two parameters not only determine the input dimension of the AE but also influence its structural complexity and training difficulty. Therefore, they should be determined carefully. The parameter $N_s$, which determines the frequency range covered by each frequency spectrum segment, should be determined first. In this experiment, the sampling frequency of the rotor's vibration signal is 2048 Hz and the sampling time is 1 s, so the frequency resolution of the spectrum is about 1 Hz, and the rotation frequency of the rotor corresponds to the 21st spectrum line. From the comparison in Figure 5, it is inferred that feature extraction will perform better when the spectrum line of the rotation frequency falls within the first frequency spectrum segment, which means $N_s$ should be less than 48. Meanwhile, as the proposed method divides the entire frequency spectrum equally, the set of candidate values of $N_s$ is [2, 4, 8, 16, 32]. Comparative experiments were conducted to verify this inference.
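The bound on $N_s$ follows from simple arithmetic, sketched below. The count of 1024 spectrum lines is taken from the later description of the rotor spectrum; the variable names and the list of power-of-two candidates are illustrative assumptions.

```python
# Rotor case: 2048 points sampled over 1 s give 1024 spectrum lines at about 1 Hz.
n_lines = 1024
rotation_line = 21                     # the 20 Hz rotation frequency is the 21st line

# The first segment must hold at least 21 lines, so n_lines / N_s >= 21.
bound = n_lines / rotation_line        # roughly 48.8

# Equal division of the spectrum restricts N_s to divisors such as powers of two.
candidates = [2, 4, 8, 16, 32, 64, 128]
valid = [ns for ns in candidates if n_lines // ns >= rotation_line]
print(bound, valid)
```

With 32 segments each segment holds 32 lines, so the first segment covers the rotation frequency but not its second harmonic near the 41st line.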
Here, the local features obtained by applying KPCA to each segment of the frequency spectrum are connected in sequence to obtain a global feature, which is then classified by the k-nearest neighbor (KNN) classifier. The classification accuracy is then calculated to help select appropriate parameters.
The classification accuracies of the KNN classifier under different parameter configurations are shown in Figure 6. The parameter configurations with a mean classification accuracy higher than 95% are shown in Table 2. To reduce the effect of randomness, 100 trials were conducted and the mean accuracy was calculated. Note that $N_s = 1$ means that KPCA is applied to the frequency spectrum without division. Unfortunately, there is still no standard method for choosing the best kernel for a given problem [21]. Therefore, the commonly used Gaussian kernel, which has only one hyperparameter ($\gamma$), is chosen as the kernel function, and $\gamma$ is set as $\gamma = 1/V_s$.
Comparing the mean classification accuracies in Figure 6 and Table 2, the following conclusions can be drawn: (1) the highest accuracy is 98.89%, obtained when $N_s = 32$ and $V_s = 2$, which means that each frequency spectrum segment contains 32 spectrum lines and the first segment includes only one frequency component that has an integer proportional relationship with the rotor rotation frequency, namely, 1 times the rotation frequency; (2) when $N_s = 1$, that is, when the principal components are extracted without frequency spectrum segmentation, the classification accuracy is lower than 95% no matter how many principal components are extracted, indicating that segmentation of the frequency spectrum is necessary to improve the performance of KPCA; (3) the more segments there are, the fewer principal components need to be kept in each segment. For example, when $N_s = 1$, the classification accuracy is highest for $V_s$ equal to 5, 6, or 7; when $N_s = 128$, the classification accuracy is highest for $V_s = 1$. When $N_s$ takes the value of 2, 4, 8, 16, 32, or 64, $V_s = 2$ gives the highest accuracy. Finally, $N_s = 32$, $V_s = 2$, and $\gamma = 0.5$ is regarded as the best parameter configuration.
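The evaluation protocol used for parameter selection, a KNN classifier scored over repeated random splits, can be sketched as follows. The value of k, the 50/50 split, the function names, and the synthetic four-class data are assumptions for illustration; the text does not report the value of k.

```python
import numpy as np

def knn_predict(train_X, train_y, test_X, k=1):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    d = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(votes).argmax() for votes in train_y[nearest]])

def mean_accuracy(X, y, n_trials=100, train_frac=0.5, k=1, seed=0):
    """Mean and standard deviation of test accuracy over random train/test splits."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_trials):
        perm = rng.permutation(len(y))
        n_train = int(train_frac * len(y))
        tr, te = perm[:n_train], perm[n_train:]
        pred = knn_predict(X[tr], y[tr], X[te], k)
        accs.append((pred == y[te]).mean())
    return float(np.mean(accs)), float(np.std(accs))

# Synthetic stand-in for the extracted features (assumption): 4 states, 45 samples each.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10]], dtype=float)
y = np.repeat(np.arange(4), 45)
X = centers[y] + rng.normal(size=(180, 2))
mean_acc, std_acc = mean_accuracy(X, y, n_trials=20)
```

Running this loop once per candidate pair of segment count and component count would yield a grid of mean accuracies comparable in form to the one summarized in Figure 6.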
Based on the above experimental results, the experience of selecting $N_s$ and $V_s$ is summarized. When the proposed method is applied to other projects, the following rules can be used to make an initial choice of $N_s$ and $V_s$: (1) when selecting $N_s$, the upper limit of the frequency range of the first spectrum segment should be more than 1 times but less than 2 times the rotation frequency; (2) $V_s$ should be less than or equal to 3. According to the above analysis, the proposed method is applied to the rotor signals as follows. First, each frequency spectrum with 1024 spectrum lines is divided into 32 segments. Next, two principal components, considered as local features of the spectrum, are extracted from each spectrum segment using KPCA. Then, these local features are connected in sequence to obtain the global feature vector. Finally, the global feature vectors are sent to the constructed AE to extract the most critical, lower-dimensional features of the frequency spectrum.
In this study, the modules in the AE are parameterized through neural networks. Both the encoder and the decoder have only one hidden layer, containing 100 neurons, and the activation function is the rectified linear unit (ReLU), given by the following:
$$f(x) = \max(0, x).$$
There are only three neurons in the output layer of the encoder, and its activation function is linear. The activation of the last layer of the decoder is the hyperbolic tangent (tanh), to match the input, which is normalized into $[-1, 1]$. Full-batch training was used during the experiments because of the limited number of samples. Then, the loss on the validation dataset was examined to choose an appropriate learning rate. Training samples, validation samples, and test samples account for 60%, 20%, and 20%, respectively. All samples are randomly selected during the experiments.
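The layer layout described here can be written down as a forward pass. This is a sketch only: the input dimension of 64 follows from 32 segments with 2 components each, while the weight initialization, function names, and random input are assumptions, and no training is performed.

```python
import numpy as np

def build_autoencoder(input_dim, hidden=100, code=3, seed=0):
    """Random weights for input -> 100 (ReLU) -> 3 (linear) -> 100 (ReLU) -> input (tanh)."""
    rng = np.random.default_rng(seed)
    sizes = [input_dim, hidden, code, hidden, input_dim]
    # He-style initialization is an assumption, not taken from the text.
    return [(rng.normal(scale=np.sqrt(2.0 / a), size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    (W1, b1), (W2, b2), (W3, b3), (W4, b4) = params
    h_enc = np.maximum(0.0, x @ W1 + b1)      # encoder hidden layer, ReLU
    code = h_enc @ W2 + b2                    # three-dimensional code, linear
    h_dec = np.maximum(0.0, code @ W3 + b3)   # decoder hidden layer, ReLU
    recon = np.tanh(h_dec @ W4 + b4)          # output in [-1, 1], matching the input range
    return code, recon

params = build_autoencoder(input_dim=64)
x = np.random.default_rng(1).uniform(-1.0, 1.0, size=(5, 64))
code, recon = forward(params, x)
```

The tanh output layer bounds every reconstructed value to the same range as the normalized input, which is the stated reason for choosing it.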
In most cases, the learning rate (lr) can be fixed, decreasing, momentum-based, or adaptive and is set in the range [1e-4, 1e-1]. In this study, two learning rate strategies are studied: a fixed learning rate, which does not change with epochs, and a decreasing learning rate, which decreases as the number of epochs increases. The learning rate schemes are displayed in Table 3, where patience = 10 (p = 10) means that if the loss on the validation dataset no longer decreases after 10 epochs, the learning rate is reduced according to $lr_{n+1} = lr_n \times 0.1$. The variation of the validation loss with the number of training epochs is shown in Figure 7.
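The decreasing strategy with a patience parameter can be sketched in plain Python. The function name `reduce_on_plateau` and the example loss sequence are hypothetical; the rule itself follows the description above, multiplying the learning rate by 0.1 once the validation loss has failed to improve for `patience` consecutive epochs.

```python
def reduce_on_plateau(val_losses, lr0, patience=10, factor=0.1):
    """Return the learning rate used at each epoch under a patience-based schedule."""
    lr, best, wait, history = lr0, float("inf"), 0, []
    for loss in val_losses:
        history.append(lr)              # learning rate in effect for this epoch
        if loss < best:
            best, wait = loss, 0        # improvement resets the counter
        else:
            wait += 1
            if wait >= patience:        # no improvement for `patience` epochs
                lr, wait = lr * factor, 0
    return history

# Example (assumption): the loss improves once and then stalls.
schedule = reduce_on_plateau([1.0, 0.9] + [0.9] * 11, lr0=1e-3)
```

In this example the first twelve epochs run at 1e-3 and the thirteenth at 1e-4, because ten stalled epochs have elapsed by then.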
It can be seen from Figure 7 that (1) with fixed learning rates of 1e-4 and 1e-2, the validation loss converges at the slowest and fastest speed, respectively; (2) when the learning rate is set to 1e-1, the validation loss suddenly increases around the third epoch; (3) when the learning rate is set to 1e-1 or 1e-2, the validation loss first decreases and then increases, indicating that the model is overfitted. This suggests that the model performs poorly when the learning rate is set to 1e-1, 1e-2, or 1e-4. It is also observed from Figure 7 that the validation loss converges at around 800 epochs under the different learning rate settings, and the lowest validation loss is 0.0092, obtained with scheme 7. Ultimately, scheme 7 is adopted and the number of training epochs is set to 1000 in the following experiments.

Comparative Experiments.
The effectiveness of the proposed method was verified by combining it with the KNN classifier. Comparative experiments were designed to validate the superiority of the proposed method. In Method 1, the SFS-KPCA-AE model was set up as described above. In Method 2, the steps were the same as those in Method 1, except that the frequency spectrum was replaced by the time series. In Method 3, the first four steps were the same as in the proposed method, while at Step 5, the AE was replaced by KPCA. In Method 4, the first two steps were the same as in the proposed method, while at Step 3, KPCA was replaced by an AE. Note that, to reduce the complexity of Method 4, only two autoencoders were trained: one for extracting local features, namely, AE-1, and one for extracting global features, namely, AE-2. During model training, 90 samples were randomly selected in each round of experiments. Because each sample contained 32 spectrum segments, a total of 2880 spectrum segments were obtained and constituted the training dataset. First, AE-1 was trained to obtain the local features of each frequency spectrum. Second, these local features were connected in sequence to obtain a global feature. Finally, AE-2 was trained with the global features to obtain a three-dimensional feature.
In Method 5, the same features (F20) as in [34] were extracted from the time domain and frequency domain through FFT and statistical methods to comprehensively describe the health status. For a given signal $T$ with $N$ points, $[t_1, t_2, \ldots, t_N]$, let $R$ with $L$ spectrum lines, $[r_1, r_2, \ldots, r_L]$, be its frequency spectrum, and let $f_l$ be the frequency of the $l$-th spectrum line. Ten time-domain feature parameters, $(TF_1 - TF_{10})$, and ten frequency-domain feature parameters, $(FF_1 - FF_{10})$, can be calculated as shown in Tables 4 and 5, respectively. Let $F = [TF_1 \cdots TF_{10} \; FF_1 \cdots FF_{10}]$ denote the features extracted from each sample. $F$ is normalized into $[-1, 1]$ to reduce the large discrepancy between different feature parameters. Finally, the dimension of the normalized mixed-domain feature is reduced to 3 by using KPCA. In Method 6, the same twenty features were used as the input of an AE whose hyperparameter configuration was the same as in Method 1. Then, these 20-dimensional features were converted into 3-dimensional features as the output of the trained encoder.
In Method 7, Ensemble Empirical Mode Decomposition and Curve Code (EEMD-CC) was used for the rotor dataset. Details of the method can be found in reference [32].
Each of the experiments was conducted in 20 trials. Training samples (90) and testing samples (90) were selected randomly to reduce the influence of randomness. The classification accuracies of each method in each trial are detailed in Figure 8. The mean accuracy, with the corresponding standard deviation, and the time cost of each method are listed in Table 6.
As illustrated by Table 6, Method 3 takes 1.04 s, which is the minimum, Method 7 requires 28.31 s, which is the maximum, and the remaining methods take no more than 4 s. Because the autoencoder needs to be trained for 1,000 epochs, the methods using the autoencoder cost more time than the method without it. As shown in Figure 8, Method 1 achieves high classification accuracy with strong robustness. Moreover, as can be seen from Table 6, the mean testing accuracy of Method 1 is higher than that of the other methods. The testing accuracy of Method 2, 47.00%, is the lowest in Table 6. This indicates that the features extracted from the time series are less discriminative than the features extracted from the frequency spectrum and thus lead to lower classification accuracy than that of Method 1. The mean testing accuracies of Method 3 and Method 4 are 86.67% and 84.55%, both lower than that of Method 1, which indicates that both the KPCA algorithm and the autoencoder have great influence on the performance of feature extraction and that combining KPCA and AE extracts the most discriminative features. Moreover, the mean testing accuracies of Method 4 and Method 6 are lower than those of Method 3 and Method 5, which suggests that KPCA performs better than AE in feature extraction on the frequency spectrum and on the mixed-domain features. Note that the dataset used here is the same as in [31], but the data were not denoised, which explains why the EEMD-CC method did not perform as well as in the literature. For further comparison, the proposed method was applied to the denoised vibration signals as in reference [31], and a test accuracy of 97.92% with a standard deviation of 1.15% was achieved. This accuracy is lower than that of Method 5. However, comparison of the computing time of Method 1 and Method 7 in Table 6 shows that the latter takes significantly more time than the former.
Hence, Method 1 is more suitable for rapid fault diagnosis than Method 7. Most importantly, this comparative experiment indicates that the proposed method has excellent robustness and can reduce the interference caused by noise on feature extraction.

Data Description.
To study the generalization of the proposed method, the classic rolling bearing datasets provided by the Case Western Reserve University Bearing Data Center Website [35] were used for the test. The datasets include acceleration measured in the vertical direction on the housing of the drive-end (DE) bearing, on the fan-end (FE) bearing housing, and on the motor supporting base plate (BA). Vibration signals were collected under four different loads: 0 hp, 1 hp, 2 hp, and 3 hp, and each load condition contained four bearing health states: normal, fault in inner race (IR), fault in ball (BA), and fault in outer race (OR). Each bearing fault is a single-point defect introduced by electro-discharge machining, and each fault case contains three different fault widths: 0.007 in, 0.014 in, and 0.021 in. The sampling frequency is 12,000 Hz, and the sampling time for each sample in the original dataset is approximately 10 s.
Drive-end bearing faults were adopted in this case study. In order to obtain enough samples for training and testing, random sampling from the original dataset was utilized to get samples.
For instance, the data of inner race faults with a width of 0.007 in under 1 hp, whose ID is 106, contain a total of 121991 points. One hundred nonrepeating points were selected randomly from these data points as the start points of the samples [11]. Then, 12000 points, suitable for FFT, were collected from each start point to obtain a high frequency resolution. As detailed in Figure 9, 100 samples, each including 12000 points, could be acquired from the original sample through the described approach. Four datasets, labeled set 0, set 1, set 2, and set 3, were constructed for the experiments. A total of 4000 samples participated in the experiments. The details of each dataset are listed in Table 7.
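The sampling procedure can be sketched as follows. The restriction of start points to positions that leave room for a full 12000-point window is an assumption, since the text does not say how windows near the end of the record are handled; the function name `draw_samples` and the placeholder record are hypothetical.

```python
import random

def draw_samples(record, n_samples=100, length=12000, seed=0):
    """Cut n_samples windows of `length` points at distinct random start points."""
    rng = random.Random(seed)
    # Only start points that leave room for a full window are eligible (assumption).
    starts = rng.sample(range(len(record) - length + 1), n_samples)
    return [record[s:s + length] for s in starts], starts

# Placeholder record (assumption) with the same length as the file with ID 106.
record = list(range(121991))
samples, starts = draw_samples(record)
```

Because the windows are much longer than the typical spacing between start points, neighboring samples overlap heavily, which is inherent to drawing 100 windows of 12000 points from a record of about 122000 points.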

Validation of the Proposed Method.
(1) Fault Diagnosis with Different Training Sample Numbers. Following the parameter selection guidelines detailed in Section 3.1.2, the value of $N_s$ is determined first: the upper limit of the frequency range of the first spectrum segment should be more than one but less than two times the rotation frequency. As the rotation frequency of the motor is about 30 Hz and the frequency resolution is about 1 Hz, the frequency spectrum of each sample in each dataset is divided into 100 segments ($N_s = 100$). Secondly, two local features ($V_s = 2$) are extracted by applying the KPCA algorithm to each segment, and these local features are connected in sequence to obtain a global feature vector. Thirdly, the global feature vector is used as the input of the AE, whose hyperparameters are the same as those used for the rotor dataset.
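The segmentation and local feature extraction steps described above can be sketched with scikit-learn's `KernelPCA`. This is an illustration under stated assumptions, not the authors' implementation: the RBF kernel with default parameters, the use of the one-sided amplitude spectrum without the Nyquist bin, and the function name `segmented_kpca_features` are all choices made here. The AE stage is not included, and the demo uses small synthetic inputs so that it runs quickly.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

def segmented_kpca_features(samples, n_segments=100, n_local=2, kernel="rbf"):
    """FFT each sample, split the amplitude spectrum into n_segments equal
    segments, extract n_local KPCA components per segment, and concatenate
    them in segment order into one global feature vector per sample."""
    samples = np.asarray(samples, dtype=float)
    n_points = samples.shape[1]
    # One-sided amplitude spectrum; the Nyquist bin is dropped (assumption)
    # so that the spectrum has exactly n_points // 2 bins.
    spectrum = np.abs(np.fft.rfft(samples, axis=1))[:, : n_points // 2]
    seg_len = spectrum.shape[1] // n_segments
    local_features = []
    for k in range(n_segments):
        segment = spectrum[:, k * seg_len:(k + 1) * seg_len]
        # Kernel choice and its parameters are assumptions of this sketch.
        kpca = KernelPCA(n_components=n_local, kernel=kernel)
        local_features.append(kpca.fit_transform(segment))
    return np.hstack(local_features)

# Small synthetic demo: 40 samples of 1200 points, 10 segments, 2 features each.
rng = np.random.default_rng(0)
demo = rng.standard_normal((40, 1200))
features = segmented_kpca_features(demo, n_segments=10, n_local=2)
print(features.shape)  # (40, 20)
```

With the settings in the text (12000 points per sample, $N_s = 100$, $V_s = 2$), the same function would split 6000 spectral bins into segments of 60 bins each and return a 200-dimensional global feature vector per sample.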
Different training sample numbers were adopted to verify the stability of the proposed method. Each experiment was repeated for 50 trials, with training and testing samples selected randomly in each trial, to reduce the influence of randomness. The KNN classifier was again used for classification during the experiments. The testing accuracies of SFS-KPCA-AE + KNN for the bearing dataset are shown in Figure 10. As illustrated by Figure 10, the testing accuracies of each dataset with different training sample numbers are not less than 99%, and the average testing accuracy of each dataset is not less than 99.5%. It is observed from Figure 10 that for samples in set 2 and set 3, the testing accuracies with different training sample numbers all reach 100%. The results indicate that the feature extraction capability of the proposed method is stable when the number of training samples changes.
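The repeated-trial protocol above can be sketched as follows. The helper name `repeated_knn_accuracy`, the stratified split, and the choice of k = 5 neighbors are assumptions of this sketch, and the three-dimensional features are synthetic clusters rather than outputs of SFS-KPCA-AE.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def repeated_knn_accuracy(features, labels, train_fraction, n_trials=50, k=5):
    """Return the KNN testing accuracy of n_trials random train/test splits."""
    accuracies = []
    for trial in range(n_trials):
        # A new random split per trial; stratification is an assumption.
        X_tr, X_te, y_tr, y_te = train_test_split(
            features, labels, train_size=train_fraction,
            stratify=labels, random_state=trial)
        clf = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
        accuracies.append(clf.score(X_te, y_te))
    return np.array(accuracies)

# Synthetic stand-in: 10 health states, 100 samples each, 3 features per sample.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 100)
centers = 10.0 * np.arange(10)[:, None] * np.ones((1, 3))
features = centers[labels] + 0.5 * rng.standard_normal((1000, 3))

accs = repeated_knn_accuracy(features, labels, train_fraction=0.25)
print(accs.shape, accs.mean())
```

Reporting the mean and the minimum over the trials, as done for Figure 10, separates the typical behavior from the worst-case split.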
(2) Fault Diagnosis under Load Fluctuation. Experimental results under the same load show that the proposed method has a good feature extraction ability for rolling bearing vibration signals. However, in practice, bearings usually have to work under unstable load. Rotation speed varies with load fluctuations, which makes feature extraction difficult [11]. Hence, another experiment was conducted to investigate the stability of the proposed method under different motor loads. In this experiment, twelve conditions were tested. Condition 0_1 means that the autoencoder in the proposed method is trained on 50% of the samples randomly selected in set 0 and tested on all samples in set 1. After that, the KNN classifier is used to classify the samples in set 1: fifty percent of the samples are randomly selected for training, and the rest are used for testing. Results obtained by repeated experiments are given in Figure 11. It can be observed from Figure 11 that the testing accuracy remains high under different conditions of motor load fluctuation, and none of the accuracies is less than 94%. These results suggest that the proposed method has good stability under load fluctuation.
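The naming of the conditions can be illustrated with a short sketch. It assumes that the twelve conditions are all ordered pairs of distinct load sets (4 × 3 = 12), which is consistent with the description of condition 0_1 but is not stated explicitly in the text.

```python
from itertools import permutations

load_sets = [0, 1, 2, 3]  # set 0 to set 3, one per motor load

# "src_dst": the AE is trained on set src and applied to set dst (assumed reading).
conditions = [f"{src}_{dst}" for src, dst in permutations(load_sets, 2)]

print(len(conditions))  # 12
print(conditions[:3])   # ['0_1', '0_2', '0_3']
```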

Comparative Experiments.
As discussed previously, good feature extraction can be the key step of fault diagnosis. Yet, the performance of the classifier used also has an impact on the result of fault diagnosis. It is difficult to judge whether the high accuracy of a trained classifier is due to the classifier itself or to the strong discriminating ability of the extracted features. A good feature extraction method should be able to extract the most discriminative features to help different classifiers obtain a high classification accuracy.
Here, 25% of the samples of each dataset were used to train the AE in the proposed method, and the rest were used for testing. The latent coding spaces of the AE are displayed in Figure 12, where the coordinate axes are the three features extracted by the proposed SFS-KPCA-AE. As can be seen from Figure 12, the feature space successfully separates the 10 health states of the bearing, which indicates that the three extracted features are discriminative and verifies the strong feature extraction ability of the proposed method.
Furthermore, the features extracted by the proposed method were sent to the KNN classifier and the SVM classifier, respectively, for classification. The average testing accuracies of SFS-KPCA-AE + KNN and SFS-KPCA-AE + SVM for the four datasets are shown in Table 8. It can be seen that the testing accuracy of SFS-KPCA-AE + KNN is 99.93%, and that of SFS-KPCA-AE + SVM is 99.97%. Then, the proposed method was compared with other methods published in the literature using the same dataset. The results of the comparison are listed in Table 8.
In reference [36], the experimental samples included different fault locations (inner race, ball, and outer race) and different fault widths (0.007 in and 0.021 in) under the same load of 0, 1, 2, or 3 hp. The method was based on ensemble empirical mode decomposition (EEMD) and optimized support vector machines. The average testing accuracy under different loads was 98.22%. In reference [37], empirical mode decomposition (EMD), wavelet kernel local Fisher discriminant analysis (WKLFDA), and SVM were integrated to distinguish ten health conditions in the bearing dataset, and the average testing accuracy reached 98.80%. In reference [38], bispectrum features were extracted, and SVM was used to classify four bearing health conditions; the classification accuracy was 96.98%. In reference [39], the stacked denoising AE-based deep neural network (SDA-DNN) was used to realize bearing health state identification, and an average classification accuracy of 92.35% was obtained. In reference [40], Gated Recurrent Unit-based nonlinear predictive denoising AEs (GRU-NP-DAEs) were constructed to classify different fault types of the bearings, and an average accuracy of 99.22% was achieved. A local connection network (LCN) constructed by the normalized sparse AE (NSAE) was proposed in [41] to learn meaningful and dissimilar features from raw vibration signals and recognize mechanical health conditions. This method achieved an accuracy of 99.92%.
It is noted that the methods in references [36][37][38] extract features by using different signal processing technologies, such as EEMD and wavelet analysis, while in references [39][40][41], features are extracted by using deep-learning techniques, such as DNN and DAE. All these methods are thoroughly developed, yet they achieve lower diagnosis accuracies than SFS-KPCA-AE + KNN and SFS-KPCA-AE + SVM. In addition, compared with the proposed method, these methods may rely too much on prior experience to select the most discriminative features or require a lot of time to train a complex model, while the proposed feature extraction method extracts features based on unsupervised learning and is easy to train. The results of the comparative experiments demonstrate the robustness and superiority of the proposed SFS-KPCA-AE.

Conclusions
To extract the most discriminating fault signature from the frequency spectrum and ease the burden of designing features for fault diagnosis, a feature extraction method, namely, SFS-KPCA-AE, is proposed in this paper. In this method, FFT is first used to acquire the frequency spectrum of the vibration signal, and the frequency spectrum is then divided into several segments. Next, several principal components, considered as local features of the spectrum, are extracted from each spectrum segment using KPCA. Then, these local features are connected in sequence to obtain the global feature vector. Finally, the global feature vectors are sent to the constructed AE to extract the most critical and low-dimensional features of the frequency spectrum. A rotor dataset and a bearing dataset are used in the experiments to verify the effectiveness of the proposed SFS-KPCA-AE approach. By discussing the experimental results, three main conclusions are obtained: (1) The proposed method, which integrates the segmented frequency spectrum, KPCA, and AE, is based on unsupervised learning techniques, which means that it requires neither labeled data nor extensive human labor to design the most discriminative features. However, since the proposed method is used for feature extraction of the frequency spectrum, the sensitivity to the sampling length of the signal remains to be studied further. In order to adapt to more types of machine vibration signals, it is necessary to further study how to segment the spectrum (evenly or unevenly).

Conflicts of Interest
The authors declare that they have no conflicts of interest.