Data-Driven Bearing Fault Diagnosis of Microgrid Network Power Device Based on a Stacked Denoising Autoencoder in Deep Learning and Clustering by Fast Search without Data Labels

The traditional method of constructing a health indicator (HI) for electric equipment in microgrid networks, such as bearings, requires different time-frequency domain indicators and the combination of several models. It is therefore necessary to manually select appropriate and sensitive models, such as time-frequency domain indicators and multimodel fusion, and to build HIs in multiple steps. This is complicated, because only sensitive characteristics and suitable models are representative of bearing degradation trends. In this paper, we use the stacked denoising autoencoder (SDAE) model from deep learning to construct the HI directly from the raw bearing signals of microgrid power equipment. With this model, the HI can be constructed without multiple model combinations or the need for manual experience in selecting the sensitive indicators. The SDAE can adaptively extract representative degradation information from the original data through several nonlinear hidden layers and approximate complicated nonlinear functions with a small reconstruction error. After the SDAE extracts the preliminary HI, a model is needed to divide the wear state of the HI constructed by the SDAE. A clustering model is commonly used for this. Unlike most clustering methods such as k-means, k-medoids, and fuzzy c-means (FCM), in which the clustering center point must be preset, clustering by fast search (CFS) can automatically find available cluster center points according to the distance and local density between each point and its clustering center point. The selected cluster center points are then used to divide the wear state of the bearing. The root mean square (RMS), kurtosis, Shannon entropy (SHE), approximate entropy (AE), permutation entropy (PE), and principal component analysis (PCA) are also used to construct the HI. 
Finally, the results show that the performance of the method (SDAE-CFS) presented is superior to other combination HI models, such as EEMD-SVD-FCM/k-means/k-medoids, stacked autoencoder-CFS (SAE-CFS), RMS, kurtosis, SHE, AE, PE, and PCA.


Introduction
The microgrid power equipment (bearings) is a very commonly used mechanical device in the industrial field, but it wears down easily. Its status and reliable operation are of great significance in ensuring power system safety and reducing equipment operating costs. As bearing running time increases, performance gradually degrades. The quality of bearings also affects the operation of the entire adjacent power system [1]. An indicator is commonly used to assess the health status of the bearing, which can provide a sound foundation for bearing performance degradation assessment (PDA) [2]. Vibration signals are often used as an indicator to monitor the status because the quality of the bearing vibration signal can indicate bearing health in the PDA [3,4]. Many models, including various statistical parameters and mechanical signal-processing methods, are often used to extract useful degradation features for constructing bearing health indicators. Root mean square (RMS) and kurtosis are the most commonly used time-domain statistical parameters and can be considered for monitoring the health status of bearings by using a vibration signal. Williams et al. used RMS and kurtosis and demonstrated that they could effectively reflect and extract the bearing fault features [5]. Tse and Wang developed a method based on RMS to construct a health indicator for bearing PDA models, after the original vibration signal had been filtered to a selected frequency band [6]. In [6], RMS is used to track the degradation status at the point when the vibration energy has changed. Shen et al. used multiple time indicators, including the RMS, to extract the useful degradation characteristics of bearings. They used a multivariable support vector machine to predict the remaining useful life of microgrid power equipment (bearings) [7]. Lei et al. also considered RMS for extracting the degradation trend to evaluate the degradation status of bearings. 
They demonstrated that the RMS could provide useful status information in the degradation stage, but not under normal conditions [8]. In [9], RMS and kurtosis were used to monitor bearings with a low filter, to filter out the useless frequency band and retain the useful band, according to the bearing working frequency. Lei et al. used multidimensional time-frequency indicators, including RMS and kurtosis, for bearing fault feature extraction and fault diagnosis [10]. Other statistical indicators such as entropy models, including Shannon entropy (SHE), approximate entropy (AE), and permutation entropy (PE), are useful ways to assess the degradation trend of a mechanical device. AE can represent the regularity of multidimensional time series and contains more time-related information by using a coarse-graining operation for a time series [11,12]. Yan et al. presented a health indicator for bearing PDA based on AE. As the working condition of the bearing deteriorates, the degree of wear increases and the number of frequency components contained in the vibration signal increases, eventually causing its regularity to decrease and its corresponding AE value to increase. A detailed analysis of the impact of parameters on the AE model is described in that study [13]. The computational efficiency of PE is higher than that of AE, as it uses only the order of the values, and it is robust under a nonlinear distortion of the signal [14]. Yan et al. used PE as a health indicator to track the degradation of bearings, as it can describe the complexity of a vibration signal measured in a physical system by using phase-space reconstruction and takes into account the nonlinear behavior of the vibration signal [15]. Many traditional mechanical signal-processing models, such as the wavelet transform, empirical mode decomposition (EMD) [16], and ensemble empirical mode decomposition (EEMD) [17], are often used to construct health indicators for the bearing PDA model. Qiu et al. 
used the wavelet transform to filter the noise, which could mask the bearing vibration signal, and then SHE was used to optimize the Morlet wavelet shape to extract the weak fault feature [18]. Lou et al. used the wavelet transform and fuzzy inference to extract useful fault features for bearing fault diagnosis [19]. Compared with the wavelet transform, EMD can decompose the original vibration signal into intrinsic mode functions (IMFs) adaptively, without wavelet basis function selection and decomposition level choices. Wang et al. [20] used the EMD to decompose the vibration signal into IMFs and then applied singular value decomposition (SVD) to calculate the singular values of the IMFs. After these two steps, the Mahalanobis distance was used to construct a health indicator for bearing PDA. Rai et al. [21] considered the EEMD for decomposing the bearing vibration signal into IMFs, and then SVD was used to calculate the singular values (SVs). Finally, k-medoids was used to find the available cluster center points, which can reflect the bearing degradation status by using a health indicator, known as the confidence value (CV), to build the bearing PDA model. These methods have achieved significant success in health indicator construction and are used for bearing PDA. However, there are some problems with these PDA models.
(1) Time-frequency domain indicators are commonly used, but sensitive indicators must be selected to show the difference between different faults and to improve the accuracy of fault classification. Wei et al., for example, considered the self-weight to evaluate and judge the quality of the time-frequency domain indicators for fault diagnosis [22]. The optimized indicators can then be used to diagnose faults and to improve the identification accuracy. Tse et al. selected commonly used sensitive time-frequency indicators to extract the oil sand pump degradation trend with principal component analysis (PCA) [23]. Thus, manual experience is applied in filtering the selected sensitive indexes, ensuring that they improve the performance of fault diagnosis and the PDA of a device in a complex mechanical system.
(2) The operating environment of complex mechanical systems is variable, and the commonly used time-frequency domain indicators exhibit different advantages and disadvantages depending on the operating conditions. Therefore, relying on manual experience to select sensitive features applicable to complex and varied mechanical equipment operating environments is difficult.
(3) In addition, commonly used models must be combined to extract useful degradation information from the raw vibration signal, such as EEMD combined with SVD and a clustering model to construct the health indicator for evaluating the PDA of bearings. Thus, these combined models to some extent lack versatility. Therefore, these fused time-frequency indicators and combined models are complicated and reliant on manual experience.
To overcome these drawbacks, in this study, the stacked denoising autoencoder (SDAE) [24,25] from deep learning is used to extract the initial degradation level for bearing PDA directly from the raw vibration signal, without selecting various indicators and model combinations. The deep architecture of the SDAE enables it to extract the representative information adaptively from the original data through several nonlinear hidden layers and to approximate complicated nonlinear functions, with a small reconstruction error and without manual experience and data labels [26]. Thus, the SDAE needs no manual experience or prior knowledge. The bearing degradation trend can be extracted by using encoder and decoder processes and reconstructing the input through several hidden layers. The SDAE is an unsupervised model, which is robust when there is noise in the original vibration signal. It is an improved model based on the stacked autoencoder (SAE), which is a basic deep learning model that is widely used in different domains, such as fault feature extraction and fault diagnosis. Feng et al. [26] used the SAE to extract the bearing fault information from the frequency-domain signal directly after a fast Fourier transform (FFT), without time-frequency indicator selection. Lv et al. used a weighted time series fault diagnosis method based on the SAE. The model proposed in that study not only captures the high-order correlations among monitoring variables but also uses the time correlations among samples [29]. To further explore more representative fault characteristics using the SAE network, Qi et al. combined EEMD and the autoregressive (AR) model to preprocess the collected nonstationary vibration signals and obtain AR parameters based on intrinsic extraction. The decomposed mode function components are selected as an input feature of the SAE network [27,28]. 
The vibration data gathered from these various engineering devices contain noise, and missing and corrupted data make the analysis difficult. The SDAE is robust because it sets part of the original data to zero according to the denoising probability and reconstructs the input data by using encoding and decoding. This denoising operation improves on the SAE so that it can learn a more robust representation. Therefore, the SDAE is more robust and stable than the SAE. Xu et al. used the SDAE to train on the bearing vibration signal after a fast Fourier transform (FFT) and extracted useful fault features under various conditions. The SDAE was found to reduce the feature dimension to 2 directly without PCA. The clustering model was then used for bearing fault diagnosis [29]. The authors also demonstrated that the SDAE is more robust than the SAE. Additionally, the SDAE has been used successfully in other domains, including multimodal video classification, physiological signal processing, and 3-D object identification [30-34]. However, few studies have focused on the PDA of bearings when using the SDAE, and most instead consider bearing fault diagnosis. The SDAE has had many successful applications, so it is used in this study to extract the initial degradation trend for bearings directly from the original vibration signal.
After the SDAE has extracted the degradation trend to construct a health status without a data label, a clustering model is a common way of building a health indicator for bearing PDA and of determining the degradation status by calculating the distance between each sample point and its cluster center point. Pan et al. proposed a model based on the wavelet transform and fuzzy c-means (FCM) for bearing PDA [31], and Rai et al. developed a method based on EEMD, SVD, and k-medoids clustering models to construct a health indicator for bearing PDA. They demonstrated that the proposed model was better than other models such as EEMD-SVD-k-means [21], RMS, and kurtosis. However, clustering models such as FCM, k-means, and k-medoids require the number of clusters to be selected before calculation. Typically, the three degradation statuses of Normal, Slight, and Severe are suitable for bearing degradation division. These clustering models have been used successfully in the PDA of bearings.
However, these clustering models need the number of clusters to be preset. The number of states may sometimes differ from these three predefined statuses. Manually fixing the number of degradation states by choosing the number of cluster center points can lead to erroneous judgments. For example, a bearing may have only two degradation stages, such as Normal and Severe, and not all three statuses.
This preset three-status method cannot adapt to the dynamic changes in data acquired in different situations under complex operating conditions.
To solve this problem, in this paper, we use a clustering by fast search (CFS) model for bearing PDA. CFS can find the available number of cluster centers automatically according to the local density and the distance between any two samples [34]. CFS has been used successfully in other domains. To address the limitation that most current clustering techniques can process only static data into clusters, Zhang et al. proposed a CFS model based on the peak of k-medoids that integrates the current model into the previous model to achieve the final clustering, and they applied it to the analysis of dynamically acquired industrial data. Effectively analyzing these data can help improve industrial services and ensure the system has no possibility of symptomatic failure [35]. Xu et al. used the EEMD with base-scale entropy to extract the useful fault information of bearings under different conditions. The base-scale entropy-based feature vector is then used as the input of CFS for fault diagnosis. CFS has been applied successfully in various areas, but few studies report that CFS has been used for the PDA of bearings [36]. Thus, in this study, CFS is used to find the available cluster center points and then construct a health indicator, as in [21], to evaluate the degradation status.
As mentioned above, an unsupervised method based on the SDAE and CFS is proposed to construct a health indicator for bearing PDA without data labels or prior knowledge. The contributions of this paper are as follows:
(1) The SDAE extracts the bearing degradation directly from the original vibration signal without the manual intervention that is often needed to select sensitive time-frequency indicators and to combine several available models to construct the HI. The SDAE and CFS are used in this paper for bearing PDA because other research has investigated bearing PDA using the SDAE and AP.
(2) To demonstrate that the model proposed is better than other combined models, including EEMD-SVD-FCM/k-means/k-medoids, SAE-CFS, PCA, RMS, kurtosis, SHE, AE, and PE, a detailed comparative analysis is provided.
The rest of this paper is organized as follows. The basic theories of the SDAE and CFS are given in Section 2. Section 3 describes the procedures of the method proposed, the experimental comparison analysis is given in Section 4, and Section 5 concludes the paper.

Basic Theories of the SDAE and CFS
2.1. Basic Theory of the SDAE. In this section, the basic unit of the SDAE, the DAE, which is based on the AE, is introduced first. The basic structure of the SDAE, which is stacked from DAEs, is then described.

Autoencoder (AE).
The main idea of the AE [25] is to approximate an identity mapping between input X and output Z, achieving dimensionality reduction while preserving the feature information of the data. It can be divided into two parts: encoder and decoder.
(1) Encoder. As Figure 1 illustrates, encoding is the process of mapping an input dataset X into a low-dimensional space by an activation function. The encoder performs a mapping conversion from the input vector $X = \{x_1, x_2, \ldots, x_n\}$ to the output representation $Y = \{y_1, y_2, \ldots, y_n\}$ by using an activation function, where n is the total number of samples. The calculation expression is as follows:
$$Y = f_\delta(X) = s(WX + b) \tag{1}$$
where $f_\delta$ and s are the sigmoid activation functions, $f_\delta(X) = 1/(1 + e^{-X})$, and $f_\delta(X)$ is the abbreviation of $s(WX + b)$ for input X. W is the weight matrix in the neural network between the former and latter layers, and b denotes the bias item.
(2) Decoder. Decoding is the procedure of mapping Y from the low-dimensional space back into the high-dimensional space, giving $Z = \{z_1, z_2, \ldots, z_n\}$, and thereby reconstructing the input sample X as Z. The calculation is as follows:
$$Z = g_\delta(Y) = s(WY + b) \tag{2}$$
where $g_\delta$ and s are the sigmoid activation functions, $g_\delta(Y) = 1/(1 + e^{-Y})$, and $g_\delta(Y)$ is the abbreviation of $s(WY + b)$. The reconstruction error (loss function) between Z and X is denoted $J(X, Z)$ (equation (3)). The backpropagation algorithm is used to adjust the weight matrix W and bias item b and to train the autoencoder network to reduce the reconstruction error. Hence, the reconstruction error $J(X, Z)$ is minimized until it converges or the termination condition is met, i.e., the maximum number of iterations is reached.
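The encoder and decoder mappings above can be illustrated with a minimal sketch. It assumes a squared-error form for the reconstruction error $J(X, Z)$ and separate decoder parameters, neither of which is stated explicitly in the text; the layer sizes, the random initialization, and all function names are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(v):
    """Element-wise logistic function s(v) = 1 / (1 + exp(-v))."""
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def affine(weights, bias, x):
    """Return W x + b for a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def encode(x, w_enc, b_enc):
    """Encoder: y = s(W x + b)."""
    return sigmoid(affine(w_enc, b_enc, x))


def decode(y, w_dec, b_dec):
    """Decoder: z = s(W' y + b'), with its own parameters (an assumption)."""
    return sigmoid(affine(w_dec, b_dec, y))


def reconstruction_error(x, z):
    """Squared-error loss 0.5 * ||x - z||^2 (assumed form of J(X, Z))."""
    return 0.5 * sum((xi - zi) ** 2 for xi, zi in zip(x, z))


def random_matrix(rows, cols, rng):
    """Small random initial weights; the range is illustrative only."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_in, n_hidden = 8, 3                      # toy sizes, not the paper's
x = [rng.random() for _ in range(n_in)]    # one sample already scaled to [0, 1]
w_enc, b_enc = random_matrix(n_hidden, n_in, rng), [0.0] * n_hidden
w_dec, b_dec = random_matrix(n_in, n_hidden, rng), [0.0] * n_in

y = encode(x, w_enc, b_enc)                # low-dimensional representation
z = decode(y, w_dec, b_dec)                # reconstruction of the input
error = reconstruction_error(x, z)         # quantity that training would reduce
```

Training would adjust the weights and biases by backpropagation to reduce `error`; that step is omitted here to keep the sketch short.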

Denoising Autoencoder (DAE).
The data collected in actual engineering often contain noise, and hence the characteristics obtained by the autoencoder contain errors owing to the presence of noise. The denoising autoencoder (DAE) [37] solves this problem by setting part of the noise-containing data to zero according to the denoising probability P and reconstructing the corrupted input $X_1$ into output Z by using the encoder and decoder of the AE. The basic structure of the DAE is shown in Figure 2.
The black round node in Figure 2 is a corrupted data point in $X_1$, and P denotes the denoising probability. Some parts of the input data X are set to zero, which changes X into the dataset $X_1$. f and g denote the sigmoid activation functions in formulas (1) and (2). The remaining calculation steps are the same as for the AE: an encoder and a decoder are used to reconstruct the original input data X as the output Z.
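The corruption step of the DAE can be sketched as masking noise, under the assumption that each input element is zeroed independently with probability P. The function name `corrupt` and the sample values are hypothetical.

```python
import random


def corrupt(x, p, rng):
    """Masking noise: set each element of x to zero with probability p."""
    return [0.0 if rng.random() < p else xi for xi in x]


rng = random.Random(42)                    # fixed seed for repeatability
x = [0.2, 0.9, 0.5, 0.7, 0.1, 0.3, 0.8, 0.6, 0.4, 1.0]
x1 = corrupt(x, p=0.1, rng=rng)            # corrupted input fed to the encoder
```

The reconstruction error is still measured against the clean input `x`, not against `x1`, which is what pushes the network toward a representation that is robust to the corruption.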

Stacked Denoising Autoencoder (SDAE).
The SDAE contains three types of layers: (1) an input layer; (2) several hidden layers; and (3) an output layer. It uses several hidden layers, which are stacked from several DAE units, to extract the useful information. The output Z from the previous DAE hidden layer is regarded as the input of the next DAE hidden layer. The connection weight matrix W and bias vector b are then iteratively updated during the pretraining period. After the pretraining has been completed, the entire network is fine-tuned, and after the above steps, the SDAE is formed. The basic structure of the SDAE is shown in Figure 3. Here, N is the number of DAE hidden layers in the SDAE.

Basic Theory of CFS.
The CFS clustering algorithm is mainly based on the following characterization of the cluster center.
(a) The cluster center point itself has a high density and is surrounded by neighboring data points whose densities do not exceed its own.
(b) The distance between the cluster center point and data points with a higher density, which belong to other clusters, is relatively large.
Figure 1: Structure of AE.
The detailed computational procedures of CFS clustering are given in [32], and its calculation process is as follows:
(1) For a given dataset $X = \{x_1, x_2, \ldots, x_n\}$, the distance $d_{ij}$ between any two points $x_i$ and $x_j$ is calculated.
(2) The local density $\rho_i$ of a data point $x_i$ is calculated as the total number of distances between $x_i$ and the other points that are less than the cutoff distance $d_c$; N is the total number of samples.
(3) Calculate the distance $\delta_i$. Let $\{q_i\}_{i=1}^{N}$ be the index order that sorts $\{\rho_i\}_{i=1}^{N}$ in descending order, so that $\rho_{q_1} \ge \rho_{q_2} \ge \cdots \ge \rho_{q_N}$. When $x_i$ has the greatest local density, $\delta_i$ is the greatest distance between $x_i$ and any other point. Otherwise, $\delta_i$ is the smallest distance between $x_i$ and any point with a higher local density.
(4) Compute $\gamma_i = \rho_i \delta_i$.
(5) Use $\gamma$, sorted in descending order, to determine the potential clustering center points. The authors suggest that the greater the $\gamma_i$, the greater the possibility of point $x_i$ being a cluster center point. The data points with the greatest $\gamma$ values are selected sequentially as the cluster center points; the $\gamma$ value shows a significant jump at the transition from cluster center points to noncluster center points, and the points before this jump are selected as cluster center points. Finally, classification is achieved by comparing the distance between each data point and the cluster centers with the cutoff distance.
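The five steps above can be sketched as follows, under stated assumptions: the distance is taken to be Euclidean, the product $\gamma_i = \rho_i \delta_i$ follows the usual density-peak formulation, and ties in density are broken by index order. The toy points, the cutoff `d_c = 1.0`, and all function names are hypothetical. The final choice of centers is reduced to taking the two largest $\gamma$ values, whereas the text selects them at the jump in the sorted $\gamma$ curve.

```python
import math


def pairwise_distances(points):
    """Distance matrix d[i][j]; the Euclidean metric is an assumption."""
    return [[math.dist(p, q) for q in points] for p in points]


def local_density(d, d_c):
    """rho_i: number of other points closer to point i than the cutoff d_c."""
    n = len(d)
    return [sum(1 for j in range(n) if j != i and d[i][j] < d_c)
            for i in range(n)]


def delta_distance(d, rho):
    """delta_i: distance from point i to the nearest point of higher density.

    The point of highest density instead receives the largest distance
    from it to any other point.
    """
    n = len(d)
    order = sorted(range(n), key=lambda i: -rho[i])   # descending density
    delta = [0.0] * n
    delta[order[0]] = max(d[order[0]])
    for rank in range(1, n):
        i = order[rank]
        delta[i] = min(d[i][j] for j in order[:rank])
    return delta


def cfs_scores(points, d_c):
    """gamma_i = rho_i * delta_i; large values suggest cluster centers."""
    d = pairwise_distances(points)
    rho = local_density(d, d_c)
    delta = delta_distance(d, rho)
    return [r * dl for r, dl in zip(rho, delta)]


# Two well-separated toy groups, each a unit square plus its midpoint.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
gamma = cfs_scores(points, d_c=1.0)
centers = sorted(sorted(range(len(points)), key=lambda i: -gamma[i])[:2])
```

In this toy case the two midpoints stand out because they combine a high local density with a large distance to any denser point, which is the behaviour the characterization in (a) and (b) describes.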

Procedure of the Presented Model
The procedure of the method comprises four steps: (1) data preprocessing; (2) preliminary degradation trend generation; (3) degradation trend dimension reduction; and (4) bearing degradation assessment and comparative analysis. The details of these steps are as follows.
(1) Data preprocessing: to extract the useful bearing degradation trend and preprocess the data more easily, the absolute amplitude values of all original vibration signals, after standardizing into [0, 1], are regarded as the input of the SAE and SDAE for training.
(2) Preliminary degradation trend generation: for the SAE and SDAE, there are nine hidden layers from which the useful initial degradation trend of the bearing can be extracted. The dimension of the bearing degradation trend extracted by each hidden layer, except the last, is not 2. To demonstrate that the SDAE is more robust and stable than the SAE, PCA is used to reduce the dimension of the extracted degradation trend for the first eight hidden layers. After the original vibration signal is decomposed by EEMD, SVD is used to calculate the SVs to identify the degradation trend. In addition, to show that the model presented is superior to others, RMS, kurtosis, SHE, AE, PE, and PCA are also used for extraction of the degradation feature.
(3) Degradation trend dimension reduction: for data visualization, the number of neural nodes at the last hidden layer in the SAE and SDAE is set directly to 2. For EEMD, two IMF components are selected according to the two largest correlation coefficients, which are calculated between each IMF and the original vibration signal. The greater the correlation coefficient value, the greater the amount of useful vibration information the IMF contains. Then, SVD is used to compute the SVs for dimension reduction. For PCA, the first two principal components (PCs) are selected as the extracted features.
(4) Bearing degradation severity assessment and comparative analysis:
(a) Bearing degradation severity assessment: the two-dimensional degradation features extracted using the SAE and SDAE are selected as the input of CFS to find the available cluster center points. The health indicator, or confidence value (CV), is then used to build a PDA model. The CV is calculated by equation (8), in which c denotes the scale factor and DI is the Euclidean distance, which is often used to compute the distance between each point $A(x_1, y_1)$ and its cluster center point $B(x_2, y_2)$. DI is calculated by
$$DI = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} \tag{9}$$
The CV maps the distance DI to one cluster center point into [0, 1]. The CV is close to 1 when the "Normal" clustering center point is used, which indicates that the sample belongs to "Normal" [21]. For ease of the comparison analysis, all CVs are normalized to [0, 1], and then the CV degradation trend curve is smoothed by using a smoothing function with a window of four points.
(b) Comparative analysis: the method proposed is demonstrated to be superior to other models such as the SAE-CFS, EEMD-SVD-FCM/k-means/k-medoids in [21], PCA, RMS, kurtosis, SHE, AE, and PE, through the detailed analysis given in the following section.
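The IMF selection rule in step (3) can be illustrated with a short sketch. It assumes the Pearson correlation coefficient, which the text does not name explicitly, and it uses synthetic sinusoids as stand-ins for IMFs, since running an actual EEMD decomposition is outside the scope of this example. The names `pearson` and `select_imfs` are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def select_imfs(imfs, signal, k=2):
    """Indices of the k components most correlated with the original signal."""
    scores = [pearson(imf, signal) for imf in imfs]
    ranked = sorted(range(len(imfs)), key=lambda i: -scores[i])
    return ranked[:k]


# Synthetic stand-ins for IMFs: three sinusoids of different strength.
n = 200
strong = [1.00 * math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
medium = [0.50 * math.sin(2 * math.pi * 20 * t / n) for t in range(n)]
weak = [0.05 * math.sin(2 * math.pi * 50 * t / n) for t in range(n)]
signal = [a + b + c for a, b, c in zip(strong, medium, weak)]

imfs = [weak, strong, medium]
selected = select_imfs(imfs, signal)       # indices of the two kept components
```

The weak component contributes little to the signal, so its correlation coefficient is small and it is dropped, which mirrors the reasoning that a larger coefficient indicates more useful vibration information.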
The SVs obtained from EEMD and SVD are regarded as the input of FCM, k-means, and k-medoids to find the available cluster center points. Then, the CVs are calculated according to equation (8).
Through the above steps, all CVs are obtained from the proposed method, SAE-CFS, EEMD-SVD-FCM, k-means, and k-medoids. The steps of the method are shown in
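The distance and post-processing steps around the CV can be sketched as follows. The sketch covers only the Euclidean distance DI, the normalization to [0, 1], and the smoothing; the CV formula of equation (8) is not reproduced, so the CV values below are made-up inputs. A trailing moving average is assumed for the smoothing function with a window of four, and all function names are hypothetical.

```python
import math


def euclidean_di(point, center):
    """DI: Euclidean distance between a 2-D feature point and a cluster center."""
    return math.hypot(point[0] - center[0], point[1] - center[1])


def min_max_normalize(values):
    """Rescale values linearly to [0, 1]; a constant series maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def moving_average(values, window=4):
    """Trailing moving average; early points average over what is available."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Made-up CV sequence for illustration only (not data from the paper).
raw_cv = [0.90, 1.00, 0.80, 0.95, 0.60, 0.40, 0.50, 0.20]
cv_curve = moving_average(min_max_normalize(raw_cv), window=4)
```

Normalizing before smoothing keeps every curve on the same [0, 1] scale, which is what makes the CV curves of different methods directly comparable.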

PDA Building and Comparison Analysis
In this section, the experimental data and the data collection platform are first introduced. The SAE and SDAE are then used to extract the degradation trend, and the smoothness of their results is compared. The extracted features are then used to find the clustering center points with CFS. Finally, a comparison analysis is given.

Original Vibration Signals.
The experimental data acquisition platform is shown in Figure 5. The operating conditions of the bearing depend on the instantaneous measurement of the radial force exerted on the bearing, the rotation speed of the shaft that drives the bearing, and the magnitude of the torque exerted on the bearing. The bearing degradation feature is based on two sensors, vibration and temperature. The vibration sensor consists of two micro accelerometers that are perpendicular to each other. The first is in the vertical position, and the other is in the horizontal position. In addition, the vibration sensor is fixed on the outer ring of the bearing. The data sampling frequency is 25.6 kHz. The temperature sensor is not described in detail here. The vibration data in the horizontal position are used in the experiment.
For more information on the experimental platform, refer to the literature [38]. The experimental dataset comes from an accelerated degradation test of the bearing under various operating conditions, which provides measured data over the bearing life cycle for fault detection and prediction of the bearing's remaining life [38]. The three load conditions are 4000 N, 4200 N, and 5000 N. The corresponding speeds are approximately 1800 rpm, 1650 rpm, and 1500 rpm. The experimental device samples the data every 0.1 seconds. The data length of each sample is 2560. The details of the experimental data for bearing 1 are given in Table 1. The original vibration time-domain waveform for bearings 11-15 is shown in Figure 6. Bearings 11-14 have 2 or 3 degradation statuses. For bearings 11 and 13, the amplitude of the vibration signal gradually increases. The marked red rectangle denotes the Severe status in Figure 6. Bearing 12 clearly shows a jump and some noise under the Normal condition. Hence, it has only two degradation statuses (Normal and Severe). Compared with bearing 12, bearing 14 has two obvious jumping points; hence, bearing 14 contains three statuses. The vibration signal in the blue rectangle denotes the Slight condition, and the Severe condition is shown in the red rectangle. It is difficult to identify the status at first glance without extensive manual experience.
The degradation status of bearing 15 is even more problematic, as it cannot be seen clearly when extracted manually because there is massive noise. We also take one sample to show the frequency result. The FFT results are shown in Figure 7.
Except for bearing 15, the signal is more prominent in the frequency range [350, 450], which is not an approximate integer multiple of the working frequency (25.6 Hz). This result indicates that the frequency-domain signal contains no additional useful degradation trend information. Moreover, there is no useful information in the frequency domain for bearing 15 because the massive noise obscures the degradation trend. Therefore, we use the absolute amplitude of the original signal to extract the degradation of the bearing and reduce the reliance on manual experience.
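The preprocessing of the raw signal can be sketched as taking the absolute amplitude and scaling it into [0, 1]. Min-max scaling applied per sample is an assumption here, since the text states only that the input is standardized into [0, 1]; the function name `preprocess` and the sample values are hypothetical.

```python
def preprocess(sample):
    """Absolute amplitude followed by min-max scaling to [0, 1]."""
    magnitudes = [abs(v) for v in sample]
    lo, hi = min(magnitudes), max(magnitudes)
    if hi == lo:                           # flat sample: avoid division by zero
        return [0.0] * len(magnitudes)
    return [(m - lo) / (hi - lo) for m in magnitudes]


raw = [-2.0, 1.0, 0.0, 2.0]                # made-up vibration amplitudes
scaled = preprocess(raw)                   # network input in [0, 1]
```

Scaling into [0, 1] matches the output range of the sigmoid decoder, so the reconstruction can in principle reach every input value.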
In the following section, the SDAE and SAE are used to extract the preliminary degradation trend and CFS is then used to find some available clustering center points, which are in turn used to determine the degradation trends and construct the health indicator CV for assessing the bearing's PDA.

Degradation Trend Extracted by the SAE and SDAE.
In this part, the SAE and SDAE are used to extract the preliminary degradation trend through several hidden layers. Before preliminary degradation extraction, the absolute values of the vibration amplitudes for all bearings are taken as the input of the SAE and SDAE, which reduce the data dimension for convenient data visualization. The hidden layers in the SAE and SDAE have a triangular structure, that is, the number of nodes in each hidden layer is half that of the preceding adjacent hidden layer. In [29], the authors use a triangular hidden layer structure to extract the potential fault feature and confirm the validity of the proposed model. Therefore, the numbers of neural nodes in the nine hidden layers are set to [1280, 640, 320, 160, 80, 40, 20, 10, 2] in the SAE and SDAE. The maximum iteration number for each layer is 50. With regard to the learning rates in the SAE and SDAE, the lower the learning rate is, the more slowly the cost function is updated. A small value will result in a local minimum [37,39]; hence, we use 0.1 in this study. In the SDAE, if the value of the denoising probability P is too great, more information will be lost from the original data. The authors suggest that the parameter P is typically fixed below 0.5 [37,39]. Therefore, in this study, a low value of 0.1 is used for the denoising probability P.
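The triangular layer structure and the reported settings can be written down as a small sketch. The helper `triangular_layers` and the dictionary keys are hypothetical names; the values are those stated in the text, with the input dimension taken from the sample length of 2560.

```python
def triangular_layers(input_dim, n_halvings, final_dim):
    """Hidden-layer sizes that halve at each step, ending in final_dim nodes."""
    sizes = []
    width = input_dim
    for _ in range(n_halvings):
        width //= 2
        sizes.append(width)
    sizes.append(final_dim)
    return sizes


# Each sample has 2560 points; eight halvings reach 10 nodes, and the
# ninth hidden layer is set to 2 nodes for visualization.
hidden_sizes = triangular_layers(input_dim=2560, n_halvings=8, final_dim=2)

# Settings as reported in the text (key names are illustrative).
config = {
    "hidden_sizes": hidden_sizes,
    "max_iterations_per_layer": 50,
    "learning_rate": 0.1,
    "denoising_probability": 0.1,
}
```

Only the last step departs from strict halving: the ninth layer goes from 10 nodes to 2 so that its output can be plotted directly.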

To demonstrate that the robustness and feature extraction performance of the SDAE are superior to those of the SAE, the degradation trends of all bearings from the first eight hidden layers are used for comparison. For easy visualization of the data, PCA is used to reduce the dimensions of the extracted degradation vectors to two for the first eight hidden layers, as shown in Figures 8 and 9 (due to limited space, only bearing 11 is given as an example in this study). Figures 8 and 9 show that the degradation trend can be extracted successfully from the original vibration signal after the SAE and SDAE have been trained on the original vibration data. All curves show monotonic growth or reduction at each hidden layer. Compared with the SAE, the curves of the SDAE are more stable and less noisy at each layer, and there is little or no noise within the first 1500 points. In Figure 9, the curves look like a straight line without fluctuation when the SDAE is used, but there is a small amount of curve fluctuation at the seventh hidden layer in the SAE. Take the 391st data point as an example. In Figure 8, as the number of hidden layers increases, the 391st data point still contains obvious noise, even in the trend extracted from the last hidden layer. In contrast, Figure 9 shows that as the number of hidden layers increases, the noise contained in the 391st data point gradually weakens; for example, from the sixth hidden layer onward, the noise of the 391st data point is obviously weakened. The SDAE sets part of the input data to 0 according to the denoising rate and reconstructs it; this reduces the noise, and hence the denoising effect of the SDAE is better than that of the SAE. The SAE and SDAE are also used to extract the degradation trend for different bearings (11-15), and the corresponding results through the ninth hidden layer are shown in Figures 10 and 11, respectively. 
Similar to bearing 11, as the number of hidden layers increases, the noise of the HI curve extracted through the final hidden layer by SDAE is not obvious, while the noise of the HI curve extracted by SAE is still obvious, such as bearing 14 in Figures 10 and 11.
In Figures 10 and 11, all curves show a monotonic increase or decrease, except for bearing 15. Compared with the SAE, there is an obvious monotonically increasing or decreasing curve for bearings 11-14 when the SDAE is used in Figure 11. Starting from around the 1000th point, the curve shows a stable status for bearing 15 in Figure 11. Before the 1000th data point, there is an evident rising and falling trend when the SDAE is used. There is no conspicuous trend at the ninth hidden layer when the SAE is used in Figure 8, as it is submerged in massive noise. The SDAE destroys and reconstructs the input to improve its robustness, confirming that the SDAE has a denoising ability. The trend for bearings 12 and 14 looks like a straight line and is not rising and falling because the vibration amplitude is very smooth at each stage. This is in accordance with Figure 6. In Figure 10, particularly for bearing 14, there is some noise in the SAE under the Normal condition, while there is a stable line in Figure 11 when the SDAE is used. The line pattern for bearing 13 is similar.
These results demonstrate that the robustness and stability of the SDAE are superior to those of the SAE. In addition, the SAE and SDAE can extract the preliminary degradation characteristics well without extensive manual experience and tagged data labels.
At first glance, bearings 12 and 14 appear to have only two states when judged manually. However, bearing 14 may have two degradation statuses (Normal and Severe) or perhaps three (Normal, Slight, and Severe). Identifying how many degradation statuses there are for bearing 15 is difficult with the naked eye, and determining the number of degradation statuses of a bearing from manual experience and visual inspection is in general error-prone. Therefore, CFS is used to find the cluster center points under different degradation statuses, giving an engineer an option for determining the degradation status of a bearing.

Constructing a Health Indicator CV and Judging Degradation by Using CFS. Before CFS calculation, some parameters need to be preset, such as the cutoff distance d_c. In [32], the authors advise choosing d_c so that the average number of neighbors of a data point is about 1-2% of the total number of data points. Hence, d_c is set so that the average number of neighbors is about 1-2% of the total sample size. If d_c is too large, the local density ρ_i of each point x_i becomes large, which results in low discrimination; if d_c is too small, the same cluster class will be split into multiple parts [32]. The results of the local density ρ_i and the distance δ_i when the SAE/SDAE-CFS is used for bearings 11-15 are shown in Figure 12 ... d_c and the local density ρ_i, the more likely it is to become the cluster center point. Hence, these three points with obvious jumping are selected as the clustering center points, and there are only two points with obvious jumping.
(2) Compared with the SAE, the selected clustering center points with jumping for bearings 13 ... Figure 16(c). It is easier to choose and determine the clustering center points by using the SDAE. This is also shown in Figure 16(e), where four selected clustering center points are very clearly separated from the other points, which lie close to the horizontal axis. However, in Figure 14(c), many data points are scattered because there is massive noise in Figure 8 when the SAE is used. Thus, the feature extraction of the SDAE is superior to that of the SAE.
(3) Figures 12 and 13 show that compared with the SAE-CFS, the SDAE-CFS performs better at clustering. In Figures 13(b) and 13(d), all samples are separated well by using the SDAE, as few points are scattered around its clustering center point in Figure 12 ... but there is only one point identifiable under the Normal condition in Figure 13(d) using the naked eye. This is consistent with the situation in Figure 10 because when the SAE is used, the extracted degradation curve contains some noise under the Normal condition throughout the entire life.
... Figure 17, and it is evident that starting from the 66th point, the RMS curve increased sharply until the 141st point. After ... Figure 18(e), the degradation trend is similar to that of RMS, and several turning points, such as the 66th point, the 359th point, and the 1104th point, can reflect the degradation trend of bearing 15 well by using our presented model. Each subfigure in Figure 18(e) can be regarded as a reference to determine the degradation trend together with other curves. The first three subfigures in Figure 18 show that starting from the 359th point, these three curves clearly become stable, so it is easier to judge the Severe status for bearing 15 ...
... models presented in [21], such as the time-frequency indicators RMS and kurtosis and PCA, are considered and compared with our proposed model.
(1) In Figure 20, the degradation trend is obviously obscured by noise, which will easily result in the degradation status being misjudged, particularly for bearings 11 and 12. Unlike kurtosis and RMS, the SDAE-CFS, shown in Figure 18, can reflect the degradation well.
(2) Some RMS curves show small fluctuations and a little noise. Thus, the stability and smoothness of the RMS curves are inferior to those of the SDAE-CFS.
In Figure 18, most of the CV curves under different conditions look like straight lines at first glance, especially under the Normal condition; these CV curves can be used to identify the degradation status more easily than those of RMS when the degradation status has changed. For example, the status change from Normal to Severe in bearing 12 is very clear, starting from the 827th point; there is an obvious jump in Figure 18, while the RMS curves in Figure 19 show a gradual change, not a jump.
(3) For bearing 15, the degradation trend must be judged manually, while CFS can determine a suitable number of degradation statuses automatically. Therefore, these results demonstrate that the proposed model is better than RMS and kurtosis. In addition, CFS can provide the available clustering center points to assess the number of degradation statuses.
Figure 18: CVs for bearings 11-15 resulting from using the SDAE-CFS under various conditions such as "Normal," "Slight," and "Severe."
... degradation. In Figure 21, there is no obvious degradation for bearing 15 because much noise masks the trend of the degradation. The SDAE uses the DAE to destroy part of the data by setting it to zero and then reconstructs it. Therefore, the SDAE is more robust and stable than PCA and CFS in finding suitable clustering center points to determine the degradation status. In addition, most of the CV curves obtained from PCA are not as stable and smooth as those of the SDAE-CFS.
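The CFS quantities used in this section (the cutoff distance d_c, the local density ρ_i, and the distance δ_i) can be illustrated with a small Python sketch. It follows the usual density-peak formulation with a cutoff kernel and is only an illustration on synthetic 2-D points: the function name `cfs_decision_values`, the 2% neighbor fraction, the tie-breaking by index, and the choice of the two largest δ_i as centers are assumptions, not the authors' code.

```python
import math
import random

def cfs_decision_values(points, neighbor_fraction=0.02):
    """Return (d_c, rho, delta) for clustering by fast search (density peaks)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Pick d_c as a low quantile of all pairwise distances, so that each point
    # has on average about neighbor_fraction * n neighbours within d_c.
    pairs = sorted(dist[i][j] for i in range(n) for j in range(i + 1, n))
    d_c = pairs[max(0, round(neighbor_fraction * len(pairs)) - 1)]
    # Local density rho_i: number of other points closer than d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    # delta_i: distance to the nearest point of higher density. Ties in rho are
    # broken by index; the densest point gets its largest distance instead.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])
        else:
            delta[i] = min(dist[i][j] for j in order[:rank])
    return d_c, rho, delta

# Synthetic stand-in for two well-separated wear states (not bearing data).
rng = random.Random(0)
cluster_a = [(rng.random(), rng.random()) for _ in range(30)]
cluster_b = [(10 + rng.random(), 10 + rng.random()) for _ in range(30)]
points = cluster_a + cluster_b

d_c, rho, delta = cfs_decision_values(points)
# Points that "jump" in the decision graph: here simply the two largest delta.
centers = sorted(range(len(points)), key=lambda i: delta[i], reverse=True)[:2]
```

In this toy case one center falls in each group, because only the densest point of each group has no denser neighbor nearby. On real HI data the centers would be read off the decision graph as the points with both large ρ_i and large δ_i.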

The Presented Method Compared with EEMD-SVD-k-Means/k-Medoids/FCM. In this section, the proposed model is compared with other models in [21], such as EEMD-SVD-k-medoids/k-means/FCM. Some parameters should be set before calculating EEMD and k-medoids/k-means/FCM.
(1) EEMD: two parameters must be selected before the EEMD calculation: the ensemble number m and the amplitude of the added white noise n_i(t) [17]. The amplitude of the added white noise is set relative to the standard deviation (SD) of the original vibration signal; in [17], the authors advise setting it at 20% of the SD of the original data. For parameter m, an ensemble number of a few hundred will result in greater accuracy. Hence, m = 100 is selected in this study.
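The two EEMD parameters can be made concrete with a short Python sketch of the ensemble construction only. The function name `noisy_ensemble` is hypothetical, Gaussian white noise and the population standard deviation are assumptions, and the EMD sifting step that would decompose each noisy copy into IMFs (followed by averaging the IMFs over the m trials) is deliberately omitted.

```python
import math
import random
import statistics

def noisy_ensemble(signal, m=100, noise_ratio=0.2, seed=0):
    """Build m noise-added copies of the signal for EEMD.

    The white-noise amplitude is noise_ratio times the SD of the signal
    (0.2 corresponds to the 20% setting described in the text).
    """
    rng = random.Random(seed)
    sigma = noise_ratio * statistics.pstdev(signal)
    return [[x + rng.gauss(0.0, sigma) for x in signal] for _ in range(m)]

# Synthetic stand-in for a vibration snapshot (not bearing data).
signal = [math.sin(2 * math.pi * i / 25) for i in range(200)]
ensemble = noisy_ensemble(signal, m=100, noise_ratio=0.2)

# Averaging over the ensemble cancels most of the added noise, which is why
# a larger m gives a more accurate decomposition.
ensemble_mean = [sum(col) / len(col) for col in zip(*ensemble)]
max_error = max(abs(a - b) for a, b in zip(ensemble_mean, signal))
```

The residual error of the ensemble mean shrinks roughly with the square root of m, which is the trade-off behind choosing an ensemble number in the hundreds.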
First, EEMD is used to decompose the original signals into IMFs. As space is limited, here we only use bearings 11 and 12 as examples. The IMFs obtained from EEMD are shown in Figure 22. The amplitude of the first two IMFs is greater than that of the others because all IMFs are decomposed in order of frequency from high to low. In addition, the correlation coefficient is used to calculate the degree of relevance between each IMF and the original signal. The values of the corresponding correlation coefficients are shown in Figure 23. The two highest values are for IMF1 and IMF2. This indicates that these first two IMFs contain useful information about the original signal. Therefore, IMF1 and IMF2 are used to calculate the SVs (SV1 and SV2) through SVD. The results for SV1 and SV2 are shown in Figure 24. In this figure, bearing 11 has 3 statuses while bearing 12 has 2. The black rectangle denotes the Severe status. Therefore, these two extracted feature vectors [SV1, SV2] are regarded as the input of k-medoids/k-means/FCM for finding the available clustering center points.
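The IMF selection and the [SV1, SV2] features described above can be sketched as follows. This is a hedged illustration: the helper names are hypothetical, the IMFs are synthetic sinusoids rather than EEMD output, and stacking the two selected IMFs of one sample as the rows of a 2 x n matrix before SVD is an assumption, since this section does not state how the matrix is formed. The singular values are obtained in closed form from the 2 x 2 Gram matrix, which is equivalent to an SVD for a two-row matrix.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_imfs(imfs, signal, k=2):
    """Indices of the k IMFs most correlated with the original signal."""
    scores = [pearson(imf, signal) for imf in imfs]
    return sorted(range(len(imfs)), key=lambda i: scores[i], reverse=True)[:k]

def singular_values_two_rows(a, b):
    """Singular values of the 2 x n matrix [a; b] via its 2 x 2 Gram matrix."""
    aa = sum(v * v for v in a)
    bb = sum(v * v for v in b)
    ab = sum(u * v for u, v in zip(a, b))
    trace, det = aa + bb, aa * bb - ab * ab
    root = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    return math.sqrt((trace + root) / 2.0), math.sqrt(max((trace - root) / 2.0, 0.0))

# Synthetic "IMFs" ordered from high to low frequency (stand-ins, not EEMD output).
n = 512
imfs = [
    [2.0 * math.sin(2 * math.pi * 40 * i / n) for i in range(n)],
    [1.0 * math.sin(2 * math.pi * 15 * i / n) for i in range(n)],
    [0.1 * math.sin(2 * math.pi * 3 * i / n) for i in range(n)],
]
signal = [sum(vals) for vals in zip(*imfs)]

selected = select_imfs(imfs, signal, k=2)
sv1, sv2 = singular_values_two_rows(imfs[selected[0]], imfs[selected[1]])
```

The resulting pair (sv1, sv2) plays the role of one [SV1, SV2] feature vector; repeating this per sample would give the 2-D points passed to k-medoids/k-means/FCM.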
The two-dimensional clustering results when EEMD-SVD-k-medoids/k-means/FCM is used for bearings 11-15 are shown in Figure 25. The corresponding CVs for bearings 11-15 obtained from EEMD-SVD-k-medoids/k-means/FCM under various conditions are shown in Figure 26.
(1) Figures 12 and 13 show that the SAE/SDAE-CFS performs better at clustering when compared with EEMD-SVD-k-medoids/k-means/FCM. In Figure 13 ...
(2) For bearing 15, the number of clustering center points is set at 4 according to the CFS clustering result referred to above. Thus, CFS can provide us with an available option to determine the number of degradation statuses for bearing 15 without prior knowledge, but k-medoids/k-means/FCM cannot do this.
(3) In Figures 26(b), 26(g), and 26(l), there is some noise under the Severe status for bearing 12. This noise may lead to misjudgment when assessing the state of degradation. In Figure 18(b), there are only two straight lines to divide the trend statuses.
(4) In Figure 18(a), the CV line shows an obvious increase between the Normal and Slight statuses for bearing 11, and it is easy to identify these statuses. However, in Figure 26, the CV lines are similar under the Normal and Slight conditions when EEMD-SVD-k-medoids/k-means/FCM is used.
(5) All CV lines in Figure 18 are more stable than those in Figure 26, particularly under the Normal status. In addition, there is some noise in Figure 26 when different models are used.

The Presented Method Compared with SHE, AE, and PE.
In this section, typical health indicators such as SHE, AE, and PE are used to assess the bearing degradation trend.

Some parameters should be preconfigured before AE and PE calculation.
AE: the two parameters that should be set before calculation are the embedding dimension and the tolerance. Increasing the embedding dimension will cause the AE to include more useful information in the calculation, but it will also increase the computational cost. The authors of [13] suggest that the embedding dimension is often fixed at 2. The tolerance is often set at 0.1-0.25 times SD, where SD is the standard deviation of the original data [13].
PE: in references [15,40], the authors demonstrate that the embedding dimension should be in the range of 3-7. If the embedding dimension is more than 8, the corresponding calculation efficiency is poor because the reconstruction of phase space will homogenize the vibration signals. If the time delay is more than 5 and the embedding dimension is less than 4, the calculation cannot accurately detect small changes in the vibration signal. In addition, the experimental results demonstrate, and the authors suggest, that fixing the embedding dimension at 6 and the time delay at 3 could provide a suitable PE ...
... where it is easier to misjudge different degradation statuses, and there is a lot of noise in Figure 28. Compared with SHE and PE, when AE is used, the curve in Figure 29 has a blurred degradation trend for bearing 15, and AE cannot provide a suggestion for the number of degradation statuses with which to identify the degradation trend, while CFS can. In Figure 18, the four different CV curves can be used to identify the differences and can be combined to divide the state by using different clustering center points.
(2) In Figure 27, not all curves show a monotonically increasing or decreasing trend, such as the SHE curves for bearings 11, 13, and 15. The PE values for bearing 11 are similar to those of SHE, while all CV curves in Figure 18 increase or decrease monotonically. The noise in Figures 28 and 29 is obvious, for example, in Figure 29(b).
(3) For PE, starting from the 200th point, the curves for bearings 12 and 14 are close to stable in Figure 28, but the degradation trend for bearing 12 changes from Normal to Severe around the 820th point, not around the 200th point. In addition, bearing 14 has three degradation statuses, but after the 200th point, the curve is stable until the end, as in Figure 28. The status of bearing 15 is similar to that of bearings 12 and 14.
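For reference, the AE and PE indicators with the parameter settings quoted above can be sketched in plain Python. These follow the standard textbook definitions of approximate entropy (self-matches counted) and permutation entropy; the function names are hypothetical, the tolerance factor 0.2 is one value from the quoted 0.1-0.25 range, the normalization of PE by ln(d!) is an assumption, and the test signals are synthetic rather than bearing data.

```python
import math
import random
import statistics
from collections import Counter

def approximate_entropy(x, m=2, r_factor=0.2):
    """Approximate entropy with embedding dimension m and tolerance r_factor * SD."""
    r = r_factor * statistics.pstdev(x)

    def phi(dim):
        n = len(x) - dim + 1
        templates = [x[i:i + dim] for i in range(n)]
        total = 0.0
        for t in templates:
            # Count templates within Chebyshev distance r (self-match included).
            matches = sum(
                1 for u in templates
                if max(abs(a - b) for a, b in zip(t, u)) <= r
            )
            total += math.log(matches / n)
        return total / n

    return phi(m) - phi(m + 1)

def permutation_entropy(x, dim=6, delay=3, normalize=True):
    """Permutation entropy with embedding dimension dim and time delay."""
    n = len(x) - (dim - 1) * delay
    counts = Counter()
    for i in range(n):
        window = [x[i + k * delay] for k in range(dim)]
        # Ordinal pattern: the order of indices that sorts the window.
        counts[tuple(sorted(range(dim), key=window.__getitem__))] += 1
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(math.factorial(dim)) if normalize else h

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]           # irregular signal
sine = [math.sin(2 * math.pi * i / 20) for i in range(200)]  # regular signal
ramp = [float(i) for i in range(200)]                        # strictly rising

ae_noise, ae_sine = approximate_entropy(noise), approximate_entropy(sine)
pe_noise, pe_sine = permutation_entropy(noise), permutation_entropy(sine)
pe_ramp = permutation_entropy(ramp)
```

Both indicators are higher for the irregular signal than for the regular one, and the strictly rising ramp has a single ordinal pattern and hence zero PE. Neither indicator, however, returns a number of degradation statuses, which is the gap the text attributes to CFS.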
To further demonstrate the denoising effect of the SDAE, we also use the monotonicity (Mon) index. Mon uses the differences between adjacent HI points to assess the monotonicity of the extracted HI [41]. If the difference between two adjacent HI points is greater than 0, the HI curve is rising at that point, and vice versa. If the curve alternately rises and falls within short spans, then the HI curve has significant noise and oscillations. Mon is calculated from dF, the difference between any two adjacent HI points. The closer Mon is to 1, the better the performance is [41-53]. We take bearing 12 as an example; the Mon results of the different models are shown in Table 2. Table 2 shows that the Mon of the SDAE-CFS is higher than that of the other models. The SDAE sets part of the input data of each hidden layer to zero according to the denoising rate and then reconstructs the input data. Therefore, the SDAE can denoise the extracted HI curve well.
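A commonly used form of the monotonicity index that matches the description above is Mon = |N(dF > 0) − N(dF < 0)| / (n − 1), where n is the number of HI points and N(·) counts the adjacent differences with the given sign. The Python sketch below assumes this form; whether [41] uses exactly this normalization is an assumption, and the function name `monotonicity` is hypothetical.

```python
def monotonicity(hi):
    """Mon = |#(dF > 0) - #(dF < 0)| / (n - 1) for an HI sequence of length n."""
    diffs = [b - a for a, b in zip(hi, hi[1:])]  # dF between adjacent HI points
    positive = sum(1 for d in diffs if d > 0)
    negative = sum(1 for d in diffs if d < 0)
    return abs(positive - negative) / len(diffs)

rising = [0.1, 0.2, 0.3, 0.4, 0.5]          # clean, monotonic HI
oscillating = [0.1, 0.3, 0.2, 0.4, 0.3]     # noisy HI that rises and falls

mon_rising = monotonicity(rising)
mon_oscillating = monotonicity(oscillating)
```

A strictly rising or strictly falling HI gives Mon = 1, while an HI that rises and falls equally often gives Mon = 0, so a noisier HI curve scores lower.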

Conclusions
The original vibrations over the entire life of the bearings under different conditions were used as the input for the SAE and SDAE to extract the degradation trend and directly reduce the dimension of the extracted feature to two without PCA. The results demonstrate that the SDAE was more robust and had better feature extraction performance than the SAE. CFS was then implemented to find the available clustering center points, which were used to assess the health status through the CV index without data labeling or prior knowledge. To verify the performance of the proposed method, it was compared with other models, such as EEMD-SVD-k-medoids/k-means/FCM, RMS, kurtosis, SHE, AE, PE, and PCA. The experimental results confirmed that the SDAE-CFS was more robust and stable than the other models. Finally, the Mon index was used to evaluate the denoising effect of the different models. The larger the value of Mon, the smaller the noise of the extracted HI curve, and vice versa. As shown in Table 2, the model proposed in this article increases the value of Mon by about an order of magnitude, raising it from the second decimal place to the first decimal place.

Data Availability
Previously reported bearing data were used to support this study and are available at https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/.

Conflicts of Interest
The authors declare that they have no financial and personal relationships with other people or organizations that can inappropriately influence their work.
There is no professional or other personal interest of any nature or kind in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, the manuscript.