Fault Diagnosis of Rolling-Element Bearing Using Multiscale Pattern Gradient Spectrum Entropy Coupled with Laplacian Score

Feature extraction is a critical stage in bearing fault diagnosis. Pattern spectrum (PS) and pattern spectrum entropy (PSE) have been successfully applied to feature extraction in recent years, but they easily miss part of the impulse signatures hidden in bearing vibration data. In this paper, the pattern gradient spectrum (PGS) and pattern gradient spectrum entropy (PGSE) are first presented to improve the fault feature extraction performance of PS and PSE. Nonetheless, PSE and PGSE evaluate the dynamic behavior of a time series on a single scale only, so feature information at other scales is not considered. To address this problem, a novel approach entitled multiscale pattern gradient spectrum entropy (MPGSE) is further implemented to extract fault features across multiple scales, where its key parameters are determined adaptively by grey wolf optimization (GWO). Meanwhile, a Laplacian score (LS)-based feature selection strategy is employed to choose the sensitive features and establish a new feature set. Finally, the selected feature set is fed into an extreme learning machine (ELM) to identify different health conditions of the rolling bearing. The performance of the designed algorithm is tested on two experimental cases. The results confirm the feasibility of the proposed algorithm in feature extraction and show that it can effectively recognize different bearing fault categories and severities. More importantly, the designed approach achieves higher recognition accuracy and better stability than the other entropy-based methods considered in this paper.


Introduction
With the rapid progress of industrial modernization, large-scale mechanical equipment (e.g., generators, wind turbines, compressors, aeroengines, and hydroturbines) has been widely used in electric power, petrochemical, aerospace, marine, and other fields [1]. Rolling bearings are key joints of modern mechanical equipment. Owing to various factors (e.g., manufacturing error, material defect, improper installation, inadequate lubrication, poor sealing, high speed, and heavy load), their running state inevitably changes and various faults may appear (e.g., wear, notch, scratch, smudginess, and fretting corrosion). Even a minor defect in a rolling bearing brings security risks to enterprises: mild cases cause economic losses, and severe cases result in casualties [2][3][4][5]. On the one hand, owing to the unstable operating condition of the mechanical system and the impact of some nonlinear elements (e.g., friction, clearance, and stiffness), practical bearing fault data are nonlinear, nonstationary, and of low signal-to-noise ratio (SNR), which makes effective diagnosis directly through the fast Fourier transform (FFT) and its variants difficult [6]. On the other hand, according to relevant statistics on the key parts (e.g., bearing, gear, and rotor) of mechanical equipment, up to 30% of equipment failures are caused by bearing damage, about 20% of gearbox faults are caused by bearing damage, and the bearing damage rate in motors is up to 40%, which indicates that the rolling bearing is one of the most easily damaged parts in mechanical equipment. Accordingly, exploring an efficient damage detection approach is an urgent and challenging task in engineering applications.
Mathematical morphology (MM) is a nonlinear signal processing algorithm and known for its simplicity and practicality [7]. Currently, there are many applications of morphological filter in mechanical condition monitoring, including feature extraction and intelligent fault diagnosis, where feature extraction is a critical stage in intelligent fault diagnosis of rolling-element bearing. For instance, Bai and Zhou [8] put forward a modified top-hat transformation to capture periodic impulse characteristics related to bearing fault. Dong et al. [9] proposed a modified morphological method based on the average operator of opening and closing and signal-to-noise ratio (SNR) to extract bearing fault feature information. Li and Liang [10] presented a continuous-scale mathematical morphology to identify impulsive fault feature information located in the optimal scale band. Shen et al. [11] presented a fast and adaptive varying-scale morphological analysis approach, and an effective detection result can be obtained. Li et al. [12] proposed a diagonal slice spectrum assisted optimal scale morphological filter for bearing feature extraction and achieved a good diagnosis result. Lv and Yu [13] proposed an average combination difference morphological filter (ACDIF) and combined Teager energy kurtosis to select the length of structuring element (SE) to extract bearing fault feature.
On the other hand, MM-based intelligent fault diagnosis approaches have also been widely studied (e.g., the one-dimensional adaptive rank-order morphological filter [14], the combination of morphological filter and k-nearest neighbor classifier [15], the combination of morphological operators and fuzzy inference [16], the combination of MM and support vector machine [17], the combination of morphological filter and local tangent space alignment [18], and the combination of morphological filter and grey relational degree [19]). Meanwhile, most MM-based fault diagnosis studies focus on three aspects: morphological fractal dimension [20], morphological particle [21], and pattern spectrum (PS) [22]. Among these, PS can reveal shape characteristics of the bearing vibration signal based on the morphological opening or closing operation with a multiscale SE, while pattern spectrum entropy (PSE), which is defined on the basis of PS, is regarded as a recently introduced index of complexity estimation [23]. At present, some applications of PS and PSE in fault detection have been reported. Hao et al. [24] applied PSE to extract feature vectors from the bearing fault signal and then fed the extracted fault features into a support vector machine (SVM) to identify different bearing fault types. Zhang et al. [25] combined local mean decomposition (LMD) and pattern spectrum to extract fault signatures and recognize different health conditions of bearings. Zheng et al. [26] first adopted ensemble empirical mode decomposition (EEMD) to decompose the vibration signal into several components named intrinsic mode functions (IMFs), then calculated the PS of the first three IMFs to build a feature set, and finally applied the SVM classifier to identify different health conditions of the bearing. However, the traditional PS and PSE are developed based on the morphological opening or closing operation.
Obviously, the morphological opening or closing operation is well suited to extracting unilateral fault pulse characteristics of a rolling bearing, but it easily neglects feature information hidden on the other side [27]. According to MM theory, the morphological gradient operation (MGO) can effectively extract bilateral impulse signatures (in the positive and negative directions) from the bearing fault signal [28,29]. Meanwhile, MGO has the advantages of noise reduction and detail preservation. Therefore, based on MGO and PSE, a modified indicator called pattern gradient spectrum entropy (PGSE) is introduced in this paper, which can improve the fault feature extraction performance of PSE and measure the dynamic change of various time series. Unfortunately, both PSE and PGSE assess the complexity of a time series on a single scale, so the relevant characteristics on other scales are ignored. Consequently, it is necessary to design an effective nonlinear dynamic algorithm to address this issue.
Multiscale entropy (MSE) proposed by Costa et al. [30] is an effective algorithm for estimation of complexity of signal, which can obtain different feature information over multiple scales by applying a coarse-grained procedure and overcome the weakness of inadequate feature extraction of single-scale entropy. Besides, MSE has been utilized in various fields including biomedical engineering [31], EEG data denoising [32], and mechanical fault detection [33]. For instance, Zhang et al. [34] firstly applied improved multiscale entropy to extract fault features, and then the extracted features are taken as the input of SVM classifier for achieving the identification of different health conditions of bearing. Liu and Han [35] firstly adopted LMD to separate bearing fault signal to a series of product function (PF) and then take multiscale entropy of PF as the inputs of BP neural network to recognize different fault categories of rolling bearing. Hsieh et al. [36] firstly used empirical mode decomposition (EMD) to decompose a vibration signal into several IMF components and then adopted the MSE curve of IMF component to identify the local fault of high-speed spindle. Nevertheless, the conventional MSE easily appears as an undefined entropy value for short-term time series and possesses an underlying instability with the increase of the scale factor. Furthermore, computational efficiency of traditional MSE is not high, so it will cost significantly for the application of real-time condition monitoring of industrial scene [37]. 
Consequently, to solve these problems existing in traditional MSE and capture more comprehensively fault symptoms over multiple scales, through the combination of the coarse-grained procedure and two entropies (i.e., PSE and PGSE), two new multiscale entropies (i.e., multiscale pattern spectrum entropy (MPSE) and multiscale pattern gradient spectrum entropy (MPGSE)) are proposed in this paper to quantify the complexity of time series, which has the performance of multiresolution analysis and can extract fault characteristics over a range of scales.
As is known to all, because the large feature dimension easily leads to the low computational efficiency and even brings about dimension disaster, it is required for feature selection after using entropy-based method to extract multiscale features [38]. Laplacian score (LS) is a newfangled feature selection approach [39], which estimates the importance of different features according to the score of the alternative features and then complete effective selection of sensitive features. Compared with some other feature selection approaches (e.g., Fisher score [40] and data variance [41]), the LS approach is simpler and easier to understand and has advantages of reserving immensely local and global structure information. Hence, the LS approach in this paper is employed to feature selection, which is aimed at reducing information redundancy and improving calculating efficiency of the whole fault diagnosis. A novel sensitive feature set can be obtained after using the LS algorithm for feature selection. Ultimately, in order to realize automatically the health status identification of bearings, an appropriate classification model needs to be employed. Existing classification models are divided into supervised and unsupervised approaches. Generally speaking, the supervised method with the mapping relation between the sample and label can achieve better classification results than the unsupervised method without the label information [42]. Various supervised classifiers in recent years are devoted to fault detection in rolling bearing, such as artificial neural network (ANN) [43], SVM [44], and Bayesian classification model [45]. However, these approaches are equipped with some shortcomings. For example, the ANN is easy to result in a local minimum, slow convergence speed, and overfitting due to the empirical risk minimization principle. 
Two important parameters of SVM (i.e., the penalty factor c and the kernel parameter g) must be defined in advance and have a strong influence on its classification ability. The Bayesian classification model needs to assume that the features are independently distributed, but complete independence is difficult to satisfy in reality. Compared with the supervised approaches above, the extreme learning machine (ELM) has some advantages, including higher computational efficiency, stronger generalization ability, and less human intervention [46]. Hence, in this paper, the selected new feature set is regarded as the input of ELM to identify different fault patterns of the rolling bearing. In a word, the main contribution and originality of this paper is a novel intelligent bearing diagnosis scheme based on MPGSE and LS. Compared with the previous MSE, the proposed MPGSE method has some merits, which are mainly expressed in three aspects. Firstly, because of the simplicity of morphological operations, the calculation of MPGSE is faster than that of MSE, especially for long time series. Secondly, MPGSE can avoid the undefined entropy value existing in MSE and provide a more reliable and accurate estimation of the complexity of a nonlinear signal.
Thirdly, MPGSE is relatively insensitive to noise and is more stable than MSE in assessing the irregularity of the signal. Of course, the proposed MPGSE also has some areas for improvement. The biggest drawback is that its computational efficiency still needs to be improved if MPGSE is applied to real-time health monitoring of mechanical equipment under variable conditions, which is planned as the focus of future research work. Specific novelties of this paper are summarized as follows:
(1) A modified method called PGSE is introduced to improve dynamic change detection of the signal.
(2) A new approach termed MPGSE is developed to extract more abundant features over multiple scales.
(3) The LS-based feature selection strategy is employed to choose sensitive characteristics with higher discrimination.
(4) Simulation data and experimental examples are used to verify the feasibility of the presented approach.
The remainder of this paper is organized as follows. Section 2 reviews the theory of PS and PSE. In addition, the concept of PGS and PGSE is presented in Section 2, and simulation analysis is conducted to compare their performance. In Section 3, a new method named MPGSE, which combines the coarse-grained procedure and PGSE, is proposed. Section 4 describes the flowchart of the proposed approach in detail. In Section 5, two experimental cases are analyzed to show the effectiveness of the proposed approach. Finally, conclusions and some future works are given in Section 6.

Multiscale Morphological Operation.
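For a given signal $f(n)$, let $g$ be the unit SE; $\lambda g$ is the SE at scale $\lambda$ and is defined as

$$\lambda g = g \oplus g \oplus \cdots \oplus g \quad (\lambda - 1 \text{ dilations}), \tag{1}$$

where $\oplus$ represents the dilation operation. Multiscale dilation and erosion are expressed as

$$(f \oplus g)_{\lambda} = f \oplus \lambda g, \qquad (f \,\Theta\, g)_{\lambda} = f \,\Theta\, \lambda g, \tag{2}$$

where $\Theta$ denotes the erosion operation. The multiscale opening and closing operations can be obtained by

$$(f \circ g)_{\lambda} = (f \,\Theta\, \lambda g) \oplus \lambda g, \qquad (f \bullet g)_{\lambda} = (f \oplus \lambda g) \,\Theta\, \lambda g. \tag{3}$$

Correspondingly, the multiscale morphological gradient operation (MGO) can be written as [47]

$$\mathrm{MGO}_{\lambda}(f) = (f \oplus \lambda g) - (f \,\Theta\, \lambda g). \tag{4}$$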
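As a rough illustration of the multiscale dilation, erosion, opening, closing, and MGO named above, the following sketch implements them for a flat SE in plain Python. It assumes that a flat SE at scale λ has length λ + 2 (the length rule quoted later in the paper) and that windows are simply truncated at the signal borders; the function names and the border handling are illustrative choices, not taken from the paper.

```python
def _span(scale):
    """Window reach of a flat SE, assuming length = scale + 2."""
    length = scale + 2
    before = length // 2
    return before, length - 1 - before


def _sliding(signal, before, after, pick):
    """Apply pick (min or max) over a window truncated at the borders."""
    n = len(signal)
    return [pick(signal[max(0, i - before): min(n, i + after + 1)])
            for i in range(n)]


def erode(signal, scale):
    before, after = _span(scale)
    return _sliding(signal, before, after, min)


def dilate(signal, scale):
    before, after = _span(scale)
    # Dilation uses the reflected window, so opening and closing stay consistent.
    return _sliding(signal, after, before, max)


def opening(signal, scale):
    return dilate(erode(signal, scale), scale)


def closing(signal, scale):
    return erode(dilate(signal, scale), scale)


def gradient(signal, scale):
    """Morphological gradient: dilation minus erosion at the same scale."""
    return [d - e for d, e in zip(dilate(signal, scale), erode(signal, scale))]


sig = [0, 0, 5, 0, 0, -5, 0, 0]      # one positive and one negative impulse
print(opening(sig, 1))   # keeps only the negative impulse
print(closing(sig, 1))   # keeps only the positive impulse
print(gradient(sig, 1))  # responds to both impulses
```

On this toy signal the opening and closing each retain a single impulse direction, while the gradient is nonzero around both impulses, which is the bilateral behaviour the text attributes to MGO.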

PS and PSE.
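If $f(n)$ is a given one-dimensional signal and $g(m)$ is the SE, PS is defined as

$$PS(f, \lambda, g) = \begin{cases} -\dfrac{dA(f \circ \lambda g)}{d\lambda}, & \lambda \ge 0, \\[2mm] \dfrac{dA(f \bullet (-\lambda) g)}{d\lambda}, & \lambda < 0, \end{cases} \tag{5}$$

where $\lambda$ is the SE scale, $A(f)$ is a finite area in the domain of definition, and $\circ$ and $\bullet$ are the opening and closing operations, respectively. When $\lambda \ge 0$, equation (5) is called the PS for the opening operation and is denoted by $PS^{+}(f, \lambda, g)$; when $\lambda < 0$, it is called the PS for the closing operation and is denoted by $PS^{-}(f, \lambda, g)$. PS is usually composed of the positive and negative intervals, where $PS^{+}(f, \lambda, g)$ represents intrinsic information of the signal and $PS^{-}(f, \lambda, g)$ represents background information of the signal. Because the positive and negative intervals are consistent, $PS^{-}(f, \lambda, g)$ is usually used for calculating the PS. If $f(n)$ is a one-dimensional discrete signal, equation (5) is rewritten as

$$PS^{+}(f, \lambda, g) = A[f \circ \lambda g - f \circ (\lambda + 1) g], \quad 0 \le \lambda \le \lambda_{\max}, \qquad PS^{-}(f, -\lambda, g) = A[f \bullet \lambda g - f \bullet (\lambda - 1) g], \quad 1 \le \lambda \le \lambda_{\max},$$

where $A(f) = \sum_{n} f(n)$. According to the definition of Shannon entropy, PSE is defined as

$$PSE = -\sum_{\lambda} q(\lambda) \log q(\lambda),$$

where $q(\lambda)$ is the normalized PS value at scale $\lambda$.

PGS and PGSE.
PS and PSE are defined based on the opening or closing operation, which can extract only unilateral (negative) impulses and therefore easily ignores some key information. Because MGO can disclose the positive and negative impulse sequences simultaneously, PGS and PGSE are presented in this section by combining MGO with the two methods (PS and PSE). If $f(n)$ is a one-dimensional signal and $g(m)$ is the SE, PGS is built on the multiscale MGO in the same way that the discrete PS is built on the opening or closing operation, with $A(f) = \sum_{n} f(n)$. According to the definition of Shannon entropy, PGSE is then defined on the normalized PGS in the same form as PSE. Apparently, the larger the PGSE is, the higher the complexity and uncertainty of the signal.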
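To make the relationship between PSE and PGSE concrete, the sketch below computes both as Shannon entropies of a normalized spectrum. Several points are assumptions rather than the paper's exact definitions: the spectrum at each scale is taken as the absolute change in area between successive scales, it is normalized by its own sum, the flat SE at scale λ has length λ + 2, and scales run from 1 to `max_scale`. All function names are hypothetical.

```python
import math


def _span(scale):
    length = scale + 2                      # assumed flat-SE length rule
    before = length // 2
    return before, length - 1 - before


def _sliding(x, before, after, pick):
    n = len(x)
    return [pick(x[max(0, i - before): min(n, i + after + 1)]) for i in range(n)]


def erode(x, s):
    before, after = _span(s)
    return _sliding(x, before, after, min)


def dilate(x, s):
    before, after = _span(s)
    return _sliding(x, after, before, max)


def opening(x, s):
    return dilate(erode(x, s), s)


def gradient(x, s):
    return [d - e for d, e in zip(dilate(x, s), erode(x, s))]


def spectrum_entropy(areas):
    """Shannon entropy of the normalized area changes between successive scales."""
    steps = [abs(a - b) for a, b in zip(areas, areas[1:])]
    total = sum(steps)
    if total == 0:
        return 0.0                          # flat spectrum: no structure to measure
    return -sum((s / total) * math.log(s / total) for s in steps if s > 0)


def pse(x, max_scale):
    """Entropy of the opening-based spectrum (PSE-like quantity)."""
    return spectrum_entropy([sum(opening(x, s)) for s in range(1, max_scale + 2)])


def pgse(x, max_scale):
    """Entropy of the gradient-based spectrum (PGSE-like quantity)."""
    return spectrum_entropy([sum(gradient(x, s)) for s in range(1, max_scale + 2)])


x = [math.sin(0.3 * k) + (1.0 if k % 17 == 0 else 0.0) for k in range(200)]
print(pse(x, 5), pgse(x, 5))
```

With `max_scale` spectrum bins, both values lie between 0 and `log(max_scale)`; the only difference between the two functions is whether the opening or the morphological gradient feeds the spectrum.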

Simulation Analysis.
To verify the effectiveness of PGS and PGSE, a simplified model of a bearing with a local fault is established as follows:

$$x(k) = e^{-\alpha t_0} \sin(2\pi f_c k / f_s) + r(t), \tag{11}$$

where $t_0 = \mathrm{mod}(k/f_s, 1/f_m)$, $k = 0, 1, 2, \ldots, 4999$, $\alpha$ and $f_c$ denote the attenuation coefficient and carrier frequency of the system, respectively, and $f_s$ and $f_m$ denote the sampling frequency and fault frequency of the signal, respectively. Because the noise contained in practical measured data is often colored noise, $r(t)$ in the above fault model should be considered as nonwhite noise (i.e., pink noise). To study the efficacy of the proposed method more comprehensively, we investigate two cases here (i.e., simulation signals containing white noise or nonwhite (pink) noise). Specifically, in equation (11), the sampling frequency $f_s = 10$ kHz, the sampling length $L = 5000$ points, the attenuation coefficient $\alpha = 800$ rad/s, the carrier frequency $f_c$ is set to 2000 Hz, 3000 Hz, and 4000 Hz, respectively, $r(t)$ is white or nonwhite noise, and the fault frequency $f_m$ is set to 60 Hz, 120 Hz, and 180 Hz, respectively. Hence, three simulation signals containing white or nonwhite noise are obtained, which are regarded as the vibration signals generated by the outer race fault (ORF), inner race fault (IRF), and ball fault (BF) of a rolling bearing, respectively. Figure 1 shows the temporal waveforms of the three simulation signals under the two noises (i.e., white noise and pink noise). Commonly used SEs include the flat, triangular, and semicircular SE. The flat SE is simple in structure and requires less computing time than the triangular and semicircular SE, so the flat SE $g = [0, 0, 0]$ is selected directly to compute PGS and PGSE. In the process of calculating PGS and PGSE, the largest SE scale $\lambda_{\max}$ is set as 20.
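The simulated fault signals described above can be generated with a few lines of Python. This is a minimal sketch that assumes the impulse train is an exponentially decaying sinusoid restarted every 1/f_m seconds and that the white-noise case is modeled by Gaussian samples; the noise level `noise_std`, the seed, and the function name are illustrative assumptions, and pink noise is not covered.

```python
import math
import random


def simulate_fault_signal(fc, fm, fs=10_000.0, n=5000, alpha=800.0,
                          noise_std=0.5, seed=0):
    """Decaying-sinusoid impulse train plus white Gaussian noise.

    fc: carrier frequency (Hz), fm: fault frequency (Hz), fs: sampling
    frequency (Hz), alpha: attenuation coefficient. noise_std is an
    assumed noise level, not a value given in the paper.
    """
    rng = random.Random(seed)
    samples = []
    for k in range(n):
        t = k / fs
        t0 = t % (1.0 / fm)                 # time since the last impact
        impulse = math.exp(-alpha * t0) * math.sin(2.0 * math.pi * fc * t)
        samples.append(impulse + rng.gauss(0.0, noise_std))
    return samples


# The three (carrier, fault) frequency pairs used for ORF, IRF, and BF.
signals = {
    "ORF": simulate_fault_signal(2000.0, 60.0),
    "IRF": simulate_fault_signal(3000.0, 120.0),
    "BF": simulate_fault_signal(4000.0, 180.0),
}
print({name: len(sig) for name, sig in signals.items()})
```

Setting `noise_std=0.0` gives the clean impulse train, which is convenient for checking that the impacts repeat at the intended fault frequency.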

MPSE and MPGSE
Multiscale Entropy.
MSE presented by Costa et al. [30] generates a sequence of coarse-grained time series and effectively describes the irregularity of a nonstationary signal. To be specific, MSE is the set of sample entropy values of a time series at different scales, but it easily yields undefined values for short time series. The concept of MSE is summarized as follows:
(1) For a given time series $\{x_1, x_2, \ldots, x_N\}$, the coarse-grained time series at different scale factors $\tau$ is obtained by equation (12). Taking scale factors $\tau = 2$ and $\tau = 3$ as examples, the corresponding coarse-grained procedure is displayed in Figure 4. As shown in Figure 4, the coarse-grained procedure of MSE averages the raw time series within nonoverlapping windows of length $\tau$, which amounts to downsampling by a factor of $\tau$:

$$y_j^{(\tau)} = \frac{1}{\tau} \sum_{i=(j-1)\tau+1}^{j\tau} x_i, \quad 1 \le j \le \lfloor N/\tau \rfloor, \tag{12}$$

where $\tau = 1, 2, \ldots$ denotes the scale factor. Specifically, when $\tau = 1$, the coarse-grained time series is identical to the raw signal. For $\tau > 1$, the raw signal is reduced to a coarse-grained time series of length $\lfloor N/\tau \rfloor$, where $\lfloor \cdot \rfloor$ denotes rounding down to the nearest integer.
(2) Compute the sample entropy of each coarse-grained time series $y_j^{(\tau)}$ ($1 \le j \le N/\tau$) and then plot the sample entropy as a function of the scale factor $\tau$:

$$MSE(x, \tau, m, r) = SampEn(y^{(\tau)}, m, r),$$

where $m$ means the embedding dimension, $r$ represents the similarity tolerance, and $\tau$ denotes the scale factor. The flowchart of the MSE algorithm is shown in Figure 5, where $\tau_m$ represents the maximum scale factor.
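The two MSE steps can be sketched as follows. The code assumes the usual sample entropy definition (template matching under the Chebyshev distance, self-matches excluded) and, as a simplification, takes the tolerance `r` in absolute units instead of as a fraction of the standard deviation; it returns `None` when the entropy is undefined, which is the weakness for short series mentioned in the text. The function names are illustrative.

```python
import math


def coarse_grain(x, tau):
    """Average nonoverlapping windows of length tau (floor(N / tau) points)."""
    n = len(x) // tau
    return [sum(x[j * tau:(j + 1) * tau]) / tau for j in range(n)]


def sample_entropy(x, m=2, r=0.2):
    """SampEn(m, r); returns None when no template matches are found."""
    n = len(x)

    def matches(dim):
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if max(abs(x[i + k] - x[j + k]) for k in range(dim)) <= r:
                    count += 1
        return count

    b = matches(m)          # matching template pairs of length m
    a = matches(m + 1)      # pairs that still match at length m + 1
    if a == 0 or b == 0:
        return None         # undefined entropy value
    return -math.log(a / b)


def mse(x, tau_max, m=2, r=0.2):
    """Sample entropy of the coarse-grained series for tau = 1..tau_max."""
    return [sample_entropy(coarse_grain(x, tau), m, r)
            for tau in range(1, tau_max + 1)]


print(coarse_grain([1, 2, 3, 4, 5, 6, 7], 2))        # [1.5, 3.5, 5.5]
print(sample_entropy([1.0, 5.0, 9.0, 13.0]))          # None: series too short
```

Because the coarse-grained series shrinks to floor(N / τ) points, the `None` case becomes more likely as τ grows, which is one motivation the paper gives for replacing sample entropy with PSE or PGSE.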

MPSE and MPGSE.
To extract different fault features over multiple scales and avoid the undefined entropy values that occur in MSE, two new multiscale entropies (i.e., MPSE and MPGSE) are proposed by combining the coarse-grained process with two entropies (i.e., PSE and PGSE), which are defined as follows:
(1) For a given time series $\{x_1, x_2, \ldots, x_N\}$, the coarse-grained time series at different scale factors $\tau$ is obtained using the following formula:

$$y_j^{(\tau)} = \frac{1}{\tau} \sum_{i=(j-1)\tau+1}^{j\tau} x_i, \quad 1 \le j \le \lfloor N/\tau \rfloor, \quad 1 \le \tau \le \tau_{\max},$$

where $\tau_{\max}$ stands for the defined largest scale factor. Note that once $\tau$ has run from 1 to $\tau_{\max}$, $\tau_{\max}$ coarse-grained time series are obtained, the one at scale $\tau$ having a length of $\lfloor N/\tau \rfloor$, where $\lfloor \cdot \rfloor$ denotes rounding down to the nearest integer.
(2) Compute the PS and PGS of each coarse-grained time series, where $\lambda = 0, 1, \ldots, \lambda_{\max}$ and $\lambda_{\max}$ is the largest SE scale.
(3) Compute the PSE and PGSE of each coarse-grained time series, which gives MPSE and MPGSE as functions of the scale factor:

$$MPSE(x, \tau, \lambda) = PSE(y^{(\tau)}, \lambda), \qquad MPGSE(x, \tau, \lambda) = PGSE(y^{(\tau)}, \lambda),$$

where $\lambda$ means the SE scale and $\tau$ denotes the scale factor. Figures 6(a) and 6(b) describe the flowcharts of the two methods (i.e., MPSE and MPGSE).

Two multiscale entropies (i.e., MPSE and MPGSE) are introduced to deal with nonlinear and nonstationary data. Firstly, because the flat SE is simple in structure and its computational efficiency is higher than that of the triangular and semicircular SE, we select the flat SE in this study. Moreover, according to the properties of multiscale morphological analysis, if the SE scale $\lambda$ is too large, the two multiscale entropies have good noise immunity, but they preserve detailed information poorly and are more time-consuming. Conversely, if $\lambda$ is too small, the two multiscale entropies have superior detail-preserving ability and high computational efficiency, but their antinoise property is inferior. Hence, based on these facts, we suggest the flat SE scale $\lambda = 1 \sim \lfloor f_s/f_g \rfloor - 2$ for the two multiscale entropies, where $\lfloor \cdot \rfloor$ denotes rounding down to the nearest integer, and $f_s$ and $f_g$ are the sampling frequency and fault frequency, respectively. There are two reasons for this choice of the flat SE scale. On the one hand, according to Dong's suggestion [9], the maximal analysis length of the flat SE is normally set as $f_s/f_g$, which is meaningful and covers the feature information of one fault repetition period completely. On the other hand, according to [48,49], the relationship between the length $L$ and the scale $\lambda$ of the flat SE satisfies $L = \lambda + 2$. That is, the maximal analysis scale of the flat SE can be set as $\lfloor f_s/f_g \rfloor - 2$. Therefore, we recommend that the flat SE scale lie between 1 and $\lfloor f_s/f_g \rfloor - 2$, which achieves a tradeoff between noise reduction and detail preservation.
Secondly, according to related properties of MSE, for the selection of scale factor τ, if τ is set as too small, two multiscale entropies (i.e., MPSE and MPGSE) will not be able to extract complete and comprehensive fault feature signatures [50]. On the contrary, if τ is set as too big, two multiscale entropies (i.e., MPSE and MPGSE) will give rise to the instability entropy value at the larger scale factor. Besides, the bigger τ will increase computation time of two multiscale entropies (i.e., MPSE and MPGSE). Consequently, according to the recommendation of [51], the maximum scale factor τ max is usually chosen as 20, which can achieve an accurate evaluation of complexity of the signal and is sufficient to handle the actual data.
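The following sketch ties the pieces of this section together: it coarse-grains a series for τ = 1, ..., τ_max and evaluates a gradient-spectrum entropy on each coarse-grained series, giving one value per scale factor. It also evaluates the recommended upper SE scale ⌊f_s/f_g⌋ − 2. As before, the gradient-spectrum form (absolute area change between successive scales, normalized by its sum) and the flat-SE length rule L = λ + 2 with truncated borders are assumptions made for illustration; function names are hypothetical.

```python
import math


def max_se_scale(fs, fg):
    """Upper bound of the recommended flat-SE scale range, floor(fs / fg) - 2."""
    return math.floor(fs / fg) - 2


def coarse_grain(x, tau):
    n = len(x) // tau
    return [sum(x[j * tau:(j + 1) * tau]) / tau for j in range(n)]


def gradient_area(x, scale):
    """Area under the flat-SE morphological gradient (dilation minus erosion)."""
    length = scale + 2                      # assumed length rule L = lambda + 2
    before = length // 2
    after = length - 1 - before
    n = len(x)
    area = 0.0
    for i in range(n):
        hi = max(x[max(0, i - after): min(n, i + before + 1)])   # dilation
        lo = min(x[max(0, i - before): min(n, i + after + 1)])   # erosion
        area += hi - lo
    return area


def pgse(x, max_scale):
    areas = [gradient_area(x, s) for s in range(1, max_scale + 2)]
    steps = [abs(a - b) for a, b in zip(areas, areas[1:])]
    total = sum(steps)
    if total == 0:
        return 0.0
    return -sum((s / total) * math.log(s / total) for s in steps if s > 0)


def mpgse(x, tau_max=20, max_scale=5):
    """One gradient-spectrum entropy value per scale factor tau."""
    return [pgse(coarse_grain(x, tau), max_scale) for tau in range(1, tau_max + 1)]


print(max_se_scale(10_000, 60))             # 164
x = [math.sin(0.3 * k) + (1.0 if k % 17 == 0 else 0.0) for k in range(400)]
features = mpgse(x, tau_max=20, max_scale=5)
print(len(features))                         # 20 features, one per scale factor
```

The resulting 20-element vector is the kind of multiscale feature set that the later feature-selection step is meant to prune.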

Comparison between MPGSE and MPSE Using Simulation Signal.
To validate the efficacy of the presented MPGSE algorithm, two stochastic signals (white noise and 1/f noise) are analyzed. As shown in Figure 7, the 1/f noise has stronger complexity than white noise owing to its long-range correlation properties. Meanwhile, white noise is more uncertain than 1/f noise. In other words, the entropy value of white noise is theoretically larger than that of 1/f noise. Firstly, MPGSE and MPSE are applied to the white noise and 1/f noise over 20 scales, and the results are displayed in Figure 8. It can be found from Figure 8 that the entropy curve obtained using MPGSE is more stable and smoother than that obtained using MPSE. In addition, for both MPGSE and MPSE, the entropy curve of 1/f noise is higher than that of white noise across most scales, which is consistent with the intuitive result of spectrum analysis. These preliminary results show that MPGSE gives a stable estimation of the complexity of nonlinear and nonstationary time series, which indicates that MPGSE is suitable for detecting the dynamic variation of a complex signal.
To further compare the performance of the MPGSE and MPSE, we apply the MPGSE and MPSE to analyze 100 independent white noise and 1/f noise, each of which contains 2000 data points. Figure 9(a) shows the error bar of white noise for two entropies (MPGSE and MPSE), while Figure 9(b) shows the error bar of 1/f noise for two entropies (MPGSE and MPSE). As can be seen from Figures 9(a) and 9(b), the mean value curve of MPSE has bigger fluctuations than MPGSE. In addition, the standard deviation (SD) of MPGSE at each scale is smaller than that of MPSE, which implies MPGSE can offer a more accurate and stable estimation of entropy.
To investigate the impact of the SE scale $\lambda$ on MPGSE, we calculate the two entropies (MPGSE and MPSE) of two noise signals (white noise and 1/f noise) at different SE scales $\lambda$, and Figure 10 shows the results. It is obvious in Figure 10 that the MPSE curves of the two noise signals fluctuate significantly, whereas the MPGSE curves change steadily, which indicates that MPGSE outperforms MPSE in the estimation accuracy of entropy. Besides, as the SE scale increases, the entropy value of the two noise signals at the same scale factor gradually grows larger. According to multiscale morphological theory, a larger SE scale implies more computing time. Consequently, to make a tradeoff between reliable estimation and computing efficiency, according to [48], the SE scale $\lambda$ should be neither too large nor too small, and its value is commonly recommended as 1 to $\lfloor f_s/f_g \rfloor - 2$, where $\lfloor \cdot \rfloor$ denotes rounding down to the nearest integer and $f_s$ and $f_g$ are the sampling frequency and fault frequency, respectively.
To study the effect of the data size of the signal on MPGSE, we apply MPGSE and MPSE, respectively, to two noise signals (white noise and 1/f noise) with five data sizes (i.e., N = 2000, 4000, 6000, 8000, and 10000). Figure 11 shows the results for the two noise signals under different data sizes. As shown in Figure 11, as the data size increases, the results of MPGSE basically remain unchanged, which indicates that MPGSE is barely affected by the data size of the signal. Thus, in this case, a signal with a data size of 4000 points is usually sufficient for the implementation and calculation of MPGSE. However, in practice, it should be noted that more data may be needed for a more accurate estimation of the complexity of the signal. Considering that there are no specific criteria to determine the data size of the signal accurately, we intend to choose different data sizes according to the specific signals, which is seen as the focus of our future research.
Besides, it can be observed in Figure 11 that the results of MPGSE have better stability than MPSE. Namely, MPGSE has more superior performance in complexity estimation of signal compared with MPSE.
To evaluate the computational efficiency of the two entropies (MPGSE and MPSE), Tables 2 and 3 list the CPU running time of MPGSE and MPSE for two noise signals containing different data sizes, respectively. As shown in Tables 2 and 3, the CPU time of MPGSE is slightly smaller than that of MPSE. In addition, when MPGSE and MPSE are used to detect the dynamic mutation of the signal, a longer data length in Tables 2 and 3 implies more CPU time, and a smaller CPU time indicates higher computational efficiency. Hence, according to the overall comparison above, MPGSE is preferred in this paper for extracting multiscale fault features from the raw vibration data.

The Proposed Fault Detection Scheme
Laplacian Score for Feature Selection.
As is well known, a high-dimensional feature space is constructed when MPGSE is applied to extract fault signatures across 20 scales. Nevertheless, more features mean more information redundancy and lower computing efficiency. Laplacian score (LS) is an effective feature selection algorithm which can refine the sensitive feature information from the acquired multiscale features and improve fault classification accuracy. Hence, this paper selects LS for feature selection.
The theory of LS is summarized as follows. Assume that $m$ data samples are collected and each data sample contains $n$ features. Suppose that $L_r$ indicates the Laplacian score of the $r$-th feature, where $r = 1, 2, \ldots, n$, and $f_{ri}$ indicates the $r$-th feature of the $i$-th sample, where $i = 1, 2, \ldots, m$. The specific procedure of LS for feature selection is as follows:
(1) Establish a nearest neighbor graph $G$ with $m$ nodes, where the $i$-th node corresponds to $x_i$. If $x_i$ and $x_j$ are "close," i.e., $x_i$ is among the $k$-nearest neighbors of $x_j$ or $x_j$ is among the $k$-nearest neighbors of $x_i$, an edge is placed between them; otherwise, there is no edge. When the node labels are known, one can connect two nodes of the same label.
(2) If nodes $i$ and $j$ are connected, the weight $S_{ij}$ of the graph is defined as

$$S_{ij} = e^{-\|x_i - x_j\|^2 / t},$$

where $t$ represents a suitable constant. Otherwise, $S_{ij} = 0$.
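(3) For the $r$-th feature, define $f_r$ as

$$f_r = [f_{r1}, f_{r2}, \ldots, f_{rm}]^{T}, \quad D = \mathrm{diag}(S\mathbf{1}), \quad \mathbf{1} = [1, \ldots, 1]^{T}, \quad L = D - S,$$

where the matrix $L$ is known as the Laplacian matrix of the graph $G$. To avoid the influence of features with significantly different magnitudes on the construction of the neighbor graph, each feature is mean-centered by

$$\tilde{f}_r = f_r - \frac{f_r^{T} D \mathbf{1}}{\mathbf{1}^{T} D \mathbf{1}} \mathbf{1}.$$

(4) Obtain the Laplacian score of the $r$-th feature as

$$L_r = \frac{\tilde{f}_r^{T} L \tilde{f}_r}{\tilde{f}_r^{T} D \tilde{f}_r},$$

where the numerator is proportional to $\sum_{ij} (f_{ri} - f_{rj})^2 S_{ij}$ and the denominator is the weighted variance $\mathrm{Var}(f_r)$ of the $r$-th feature.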
According to a previous research study, for a larger $S_{ij}$, a smaller $(f_{ri} - f_{rj})^2$ yields a smaller Laplacian score $L_r$; that is, the smaller the difference between adjacent samples in this feature, the stronger its ability to preserve local information. Besides, a greater $\mathrm{Var}(f_r)$ also yields a smaller Laplacian score $L_r$, which means that the larger the difference between different samples of this feature, the better its classification performance. In concrete terms, a superior and important feature has a small LS value. Hence, for feature selection, we first calculate the LS value of each feature, then rank these LS values from low to high, and finally choose the first several features with the smallest LS values as the sensitive features for fault pattern identification.
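The four LS steps and the final ranking can be sketched in plain Python as below. The sketch assumes an unsupervised k-nearest-neighbor graph with a heat-kernel weight and uses the common matrix form of the score (graph-weighted squared differences divided by degree-weighted variance); the values of `k` and `t`, the toy data, and the function name are illustrative assumptions.

```python
import math


def laplacian_scores(samples, k=2, t=1.0):
    """Laplacian score of every feature; smaller means more important."""
    m, n = len(samples), len(samples[0])
    dist2 = [[sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
              for j in range(m)] for i in range(m)]

    # Steps (1)-(2): symmetric k-nearest-neighbor graph with heat-kernel weights.
    weights = [[0.0] * m for _ in range(m)]
    for i in range(m):
        nearest = sorted((j for j in range(m) if j != i),
                         key=lambda j: dist2[i][j])[:k]
        for j in nearest:
            weights[i][j] = weights[j][i] = math.exp(-dist2[i][j] / t)

    degree = [sum(row) for row in weights]          # diagonal of D
    total_degree = sum(degree)

    scores = []
    for r in range(n):
        feature = [s[r] for s in samples]
        # Step (3): remove the degree-weighted mean.
        mean = sum(f * d for f, d in zip(feature, degree)) / total_degree
        centered = [f - mean for f in feature]
        # Step (4): f^T L f (as a sum over edges) divided by f^T D f.
        numerator = 0.5 * sum(weights[i][j] * (centered[i] - centered[j]) ** 2
                              for i in range(m) for j in range(m))
        denominator = sum(d * c * c for d, c in zip(degree, centered))
        scores.append(numerator / denominator if denominator > 0 else float("inf"))
    return scores


# Toy data: feature 0 separates two groups, feature 1 varies within each group.
data = [[0.0, 0.0], [0.1, 1.0], [0.2, 0.0], [0.3, 1.0],
        [10.0, 0.0], [10.1, 1.0], [10.2, 0.0], [10.3, 1.0]]
scores = laplacian_scores(data)
ranking = sorted(range(len(scores)), key=scores.__getitem__)  # low to high
print(scores, ranking)
```

Ranking from low to high and keeping the first few indices mirrors the selection rule described in the text; how many features to keep is left to the user.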

ELM for Fault Pattern Identification.
After using the LS algorithm for feature selection, an appropriate method needs to be adopted to identify automatically fault categories and severities of bearing. ELM proposed by Huang et al. [52] is an effective pattern recognition method, which can avoid some shortcomings (e.g., the local minimum, improper learning rate, and overfitting) of the conventional feedforward neural network. Hence, this paper selects ELM for bearing fault pattern identification.
The theory of ELM is summarized as follows.
Suppose that $N$ different training samples $(x_j, t_j)$, $j = 1, 2, \ldots, N$, are available, where $x_j = [x_{j1}, x_{j2}, \ldots, x_{jd}]^{T} \in R^{d}$ is the network input vector and $t_j = [t_{j1}, t_{j2}, \ldots, t_{js}]^{T} \in R^{s}$ denotes the target output vector. The mathematical model of an ELM containing $L$ nodes in a single hidden layer is expressed as

$$\sum_{i=1}^{L} \beta_i \, g(\omega_i \cdot x_j + b_i) = o_j \quad (j = 1, 2, \ldots, N;\ i = 1, 2, \ldots, L), \tag{21}$$

where $g(\cdot)$ represents the sigmoid activation function, $\omega_i = [\omega_{i1}, \omega_{i2}, \ldots, \omega_{id}]^{T}$ indicates the weight vector between the input layer and the $i$-th hidden layer neuron, $\beta_i = [\beta_{i1}, \beta_{i2}, \ldots, \beta_{is}]^{T}$ represents the connection weight of the $i$-th hidden node to the output layer, $b_i$ denotes the bias of the $i$-th hidden node, and $o_j$ is the output of the ELM model for the $j$-th sample. When the outputs match the targets, equation (21) can be written compactly as

$$H\beta = T,$$

where $\beta = [\beta_1, \beta_2, \ldots, \beta_L]^{T}$ represents the output weight matrix, $T = [t_1, t_2, \ldots, t_N]^{T}$ denotes the target matrix, and $H$ indicates the hidden layer output matrix of the network, which is given by

$$H = \begin{bmatrix} g(\omega_1 \cdot x_1 + b_1) & \cdots & g(\omega_L \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(\omega_1 \cdot x_N + b_1) & \cdots & g(\omega_L \cdot x_N + b_L) \end{bmatrix}_{N \times L}.$$

The purpose of training the ELM model is to find the parameters $\beta$ that minimize the error between the output matrix and the target matrix. The output weight $\beta$ is expressed as

$$\beta = H^{+} T,$$

where $H^{+}$ is the Moore-Penrose generalized inverse matrix of $H$. It can be seen that there is no need to tune the input weights and hidden-layer biases in ELM, since they are assigned randomly. Moreover, the problem of local optimal solutions does not appear in ELM. Hence, in this paper, it is feasible to adopt ELM for pattern recognition after feature extraction and selection.
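A compact way to see the ELM training rule in action is the standard-library sketch below. Input weights and biases are drawn at random and never trained; only the output weights are solved for. Note one deliberate substitution: instead of the Moore-Penrose inverse, the sketch solves the ridge-regularized normal equations (HᵀH + εI)β = HᵀT with a small Gauss-Jordan routine, which approximates the pseudo-inverse solution for small ε. The hidden-layer size, the ridge value, the weight range, and all function names are assumptions for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hidden_output(x, weights, biases):
    """One row of H: sigmoid responses of all hidden nodes to input x."""
    return [sigmoid(sum(w * v for w, v in zip(wi, x)) + b)
            for wi, b in zip(weights, biases)]


def solve(a, b):
    """Gauss-Jordan elimination with partial pivoting for A X = B."""
    n = len(a)
    aug = [ra[:] + rb[:] for ra, rb in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def train_elm(xs, ts, hidden=20, ridge=1e-3, seed=0):
    rng = random.Random(seed)
    d, s = len(xs[0]), len(ts[0])
    weights = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(hidden)]
    biases = [rng.uniform(-1, 1) for _ in range(hidden)]
    h = [hidden_output(x, weights, biases) for x in xs]
    # Regularized normal equations stand in for beta = pinv(H) T.
    hth = [[sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0)
            for j in range(hidden)] for i in range(hidden)]
    htt = [[sum(row[i] * t[c] for row, t in zip(h, ts)) for c in range(s)]
           for i in range(hidden)]
    return weights, biases, solve(hth, htt)


def predict(x, model):
    weights, biases, beta = model
    h = hidden_output(x, weights, biases)
    outputs = [sum(h[i] * beta[i][c] for i in range(len(h)))
               for c in range(len(beta[0]))]
    return outputs.index(max(outputs))      # class with the largest output


xs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2],
      [3.0, 3.1], [3.2, 3.0], [3.1, 3.3], [3.3, 3.2]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
ts = [[1.0, 0.0] if c == 0 else [0.0, 1.0] for c in labels]
model = train_elm(xs, ts)
print([predict(x, model) for x in xs])
```

Because nothing in the hidden layer is trained iteratively, training reduces to one linear solve, which is the source of the speed advantage the text attributes to ELM.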

The Proposed Fault Detection Scheme.
To identify the different health states of a rolling bearing accurately, a novel fault diagnosis scheme based on MPGSE and LS is presented. The flowchart of our algorithm is illustrated in Figure 12, which mainly consists of three steps (i.e., multiscale fault feature extraction, feature selection, and fault classification). The overall procedure of our algorithm is given as follows:

Figure 12: Flowchart of the proposed algorithm.

Step 1: bearing vibration data are collected from a mechanical fault simulator by using a data collection unit.

Step 2: the MPGSE algorithm is applied to obtain multiscale fault signatures from the collected bearing vibration data. Concretely, in this step, to avoid relying on experience when selecting the parameters of MPGSE, an intelligent optimization algorithm termed grey wolf optimization (GWO) is first employed to determine adaptively two important parameters (i.e., SE scale λ and scale factor τ) of MPGSE. Subsequently, MPGSE with the optimized parameters is applied for multiscale feature extraction. Figure 13 shows the flowchart of parameter optimization of MPGSE. The specific optimization procedure is described as follows:

(1) Input the data sample, initialize the population $X_i$ ($i = 1, 2, \ldots, m$), and set the parameters of GWO. Concretely, the population size of wolves is $m = 30$ and the maximum number of iterations is $T = 10$. Because only two variables (λ, τ) are optimized, each wolf is expressed as $X_i = (x_\lambda, x_\tau)$, $i = 1, 2, \ldots, m$, where $x_\lambda$ and $x_\tau$ represent the SE scale and scale factor, respectively.

(2) Calculate and compare the fitness values, where the fitness is the misclassification rate (the ratio of the number of misclassified samples to the number of training samples), find the best position $X_i^{best}$ of each individual grey wolf, and determine the global best position $X_g^{best}$ of the whole pack.

(3) Update the position of each wolf according to the grey wolf movement pattern, which in standard GWO takes the form

$$X(t+1) = X_p(t) - A \cdot \left| C \cdot X_p(t) - X(t) \right|,$$

where $X(t)$ is the current position of a wolf, $X_p(t)$ is the position of the prey (estimated from the best positions found so far), $A$ denotes the convergence factor and satisfies $A = 2d \cdot r_1 - d$, $C$ denotes the swing factor and satisfies $C = 2 \cdot r_2$, $d$ denotes the range control parameter, which decays linearly from 2 to 0 over the whole iteration, and $r_1$ and $r_2$ are random numbers between 0 and 1.

(4) Determine whether the stopping condition is satisfied. Concretely, check whether the current iteration has reached the maximum number of iterations (i.e., $t \geq T$) or whether the minimum error is small enough. If so, stop the iteration and output the best wolf $X^{best}$ (i.e., the best parameters $x_\lambda^{best}$ and $x_\tau^{best}$ of MPGSE). Otherwise, let $t = t + 1$ and go back to substep (2) until the stopping condition is satisfied.
Step 3: the LS method is applied to calculate the Laplacian score of each feature, the extracted fault features are sorted by score, and a new low-dimensional feature vector is built from the first four sensitive features, i.e., those with the lowest Laplacian scores.
Step 4: the obtained low-dimensional feature vector is fed into the ELM to recognize the different fault categories and severities of the rolling bearing, and the final diagnosis results and reports are given automatically.
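The GWO parameter search in Step 2 can be sketched as follows. This is a minimal version of the standard GWO scheme, in which each wolf moves toward the average of the positions suggested by the three best wolves; it is not necessarily the exact variant used by the authors. The function name `gwo_minimize`, the search bounds, and the toy fitness function (a stand-in for the misclassification rate, which would require the full MPGSE and ELM pipeline) are hypothetical. Only the population size of 30, the 10 iterations, and the definitions of $A$, $C$, and $d$ come from the text.

```python
import numpy as np

def gwo_minimize(fitness, lower, upper, n_wolves=30, n_iter=10, seed=0):
    """Minimize `fitness` over a box with a basic grey wolf optimizer."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    X = rng.uniform(lower, upper, size=(n_wolves, lower.size))
    best_x, best_f, history = None, np.inf, []
    for t in range(n_iter):
        scores = np.array([fitness(x) for x in X])
        order = np.argsort(scores)
        if scores[order[0]] < best_f:            # keep the best-so-far solution
            best_f, best_x = float(scores[order[0]]), X[order[0]].copy()
        history.append(best_f)
        leaders = X[order[:3]].copy()            # alpha, beta, and delta wolves
        d = 2.0 * (1.0 - t / n_iter)             # decays linearly from 2 toward 0
        moved = np.zeros_like(X)
        for leader in leaders:
            A = 2.0 * d * rng.random(X.shape) - d    # convergence factor
            C = 2.0 * rng.random(X.shape)            # swing factor
            moved += leader - A * np.abs(C * leader - X)
        X = np.clip(moved / 3.0, lower, upper)   # average of the three moves
    scores = np.array([fitness(x) for x in X])   # evaluate the final population
    i = int(np.argmin(scores))
    if scores[i] < best_f:
        best_f, best_x = float(scores[i]), X[i].copy()
    return best_x, best_f, history

def toy_fitness(x):
    # Hypothetical smooth surrogate with its minimum at (lambda, tau) = (4, 20).
    lam, tau = x
    return (lam - 4.0) ** 2 + ((tau - 20.0) / 5.0) ** 2

best_x, best_f, history = gwo_minimize(toy_fitness, lower=[1, 1], upper=[10, 30])
```

In an actual application the two coordinates would presumably be rounded to integers before calling MPGSE, and `toy_fitness` would be replaced by the misclassification rate on the training samples.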

Experimental Investigation
Every algorithm has advantages and disadvantages, and each is limited in the scenarios to which it applies. Given this, in this section, two experimental cases are conducted to show the effectiveness of our proposed algorithm in detecting different working conditions of rolling bearings. Meanwhile, the proposed algorithm is also compared with other entropy-based methods to reveal its benefits.

Description of the Dataset.
The presented algorithm is first used to process bearing vibration data from Case Western Reserve University (CWRU) [53]. The laboratory equipment and its sketch are displayed in Figure 14; the test rig is mainly composed of a three-phase induction motor (left), a torque transducer (middle), and a load motor (right). During this experiment, four loads (0, 1, 2, and 3 hp) are applied separately to the test bearing. Single-point faults with four sizes (0.007, 0.014, 0.021, and 0.028 inches) were introduced on normal bearings by electric discharge machining. Figure 15 describes the data types of the bearing under different health conditions. As shown in Figure 15, the inner race, outer race, and ball faults are abbreviated to "IR," "OR," and "B," respectively. The number after the abbreviation indicates the fault size; for example, "7" in IR7 denotes a damage size of 0.007 inches. The test bearing is located at the motor drive end. The accelerometer is mounted at the 12 o'clock position of the drive end of the motor housing to collect the vibration data. All data samples were obtained at a sampling frequency of 12 kHz. Table 4 lists the detailed specification of the test bearing. To verify the recognition performance of our algorithm on bearing fault type and severity, three datasets (A, B, and C) at 1797 rpm are investigated, which are detailed in Table 5. Concretely, in this case, only three kinds of bearing faults (i.e., inner race, outer race, and ball faults) are investigated, each with three fault sizes (i.e., 0.007, 0.014, and 0.021 inches), where 1 inch equals 25.4 mm. Besides, because the position of an outer race fault is fixed relative to the load zone, only the outer race faults located at 6 o'clock (orthogonal to the load zone) are analyzed in this experiment. A more specific description of the bearing faults can be found in [53].
For data collection, 200 samples for each dataset are obtained through a 2048-point nonoverlapping window. For each working state (i.e., normal, IR fault, OR fault, and ball fault), there are 50 samples in total, of which 25 are selected randomly for training and the remaining 25 are used for testing. Therefore, for each dataset, 100 training samples and 100 test samples are established. Taking the A dataset as an example, Figure 16 shows the temporal waveforms and the corresponding frequency spectra of the four running states. As shown in Figure 16, although the fault type of a bearing can be identified using time domain and frequency domain analysis, this approach is not automatic and demands considerable professional knowledge. Thus, it is necessary to apply an intelligent identification method to process these vibration data.
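The sample construction described above (50 nonoverlapping 2048-point windows per working state, then a random 25/25 split within each state) can be sketched as follows. The helper names `segment` and `split_per_class` are hypothetical, and the records are synthetic noise standing in for the real vibration signals.

```python
import numpy as np

def segment(signal, win=2048, n_segments=50):
    """Cut the start of a 1-D record into nonoverlapping windows."""
    needed = win * n_segments
    if signal.size < needed:
        raise ValueError("record too short for the requested segments")
    return signal[:needed].reshape(n_segments, win)

def split_per_class(samples_by_class, n_train=25, seed=0):
    """Randomly split each class into n_train training samples and the rest for testing."""
    rng = np.random.default_rng(seed)
    X_tr, y_tr, X_te, y_te = [], [], [], []
    for label, samples in enumerate(samples_by_class):
        idx = rng.permutation(len(samples))
        X_tr.append(samples[idx[:n_train]])
        X_te.append(samples[idx[n_train:]])
        y_tr += [label] * n_train
        y_te += [label] * (len(samples) - n_train)
    return np.vstack(X_tr), np.array(y_tr), np.vstack(X_te), np.array(y_te)

# Four synthetic records, one per working state (normal, IR, OR, ball).
rng = np.random.default_rng(1)
records = [rng.standard_normal(2048 * 50 + 100) for _ in range(4)]
samples_by_class = [segment(r) for r in records]
X_train, y_train, X_test, y_test = split_per_class(samples_by_class)
```

Splitting within each class, rather than over the pooled 200 samples, keeps the training and test sets balanced at 25 samples per state, which matches the counts given in the text.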

Experimental Results and Analysis.
The presented algorithm is first applied to the A dataset as an example. In the first step, the GWO method is employed to select automatically the parameters of MPGSE and MPSE as λ = 4 and τ = 20. MPGSE with the optimized parameters is then used to extract the multiscale fault features of the A dataset, giving a high-dimensional feature matrix with a size of 200 × 20. Figure 17 shows the PS and PGS curves of the bearing vibration signal under different health conditions. It is obvious from Figure 17 that the PGS curves of the four working conditions are more clearly separated than the PS curves. Besides, the PGS curve has better stability in feature extraction. Then, the LS algorithm is employed to sort the obtained fault features from low score to high score, and the resulting ranking is shown in equation (27). According to equation (27), the first four features (τ = 2, 1, 4, and 3), which have the lowest scores, are selected as important features to establish a new fault feature set. Figures 18(a) and 18(b), respectively, show the two-dimensional distribution of MPGSE of different health conditions before and after applying LS. Finally, the selected new feature set is fed into the ELM to identify different health conditions of bearings. Figure 19 gives the diagnosis results of our method for the A dataset. As can be seen in Figure 19, our method achieves an accuracy rate of 100%, which validates the feasibility and effectiveness of our method in intelligent fault detection of bearings.
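To make the ranking step concrete, the sketch below computes Laplacian scores in the standard way (a symmetric k-nearest-neighbor graph with heat-kernel weights) and keeps the features with the lowest scores. The neighbor count `k`, the kernel width `t`, and the synthetic feature matrix are assumptions for illustration only; the paper's own settings and features are not reproduced here.

```python
import numpy as np

def laplacian_score(X, k=5, t=1.0):
    """Laplacian score of each column of X (n_samples, n_features); lower is better."""
    n = X.shape[0]
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)   # pairwise squared distances
    nn = np.argsort(sq, axis=1)[:, 1:k + 1]                     # k nearest neighbors, self excluded
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), k), nn.ravel()] = True
    mask |= mask.T                                              # symmetrize the graph
    S = np.where(mask, np.exp(-sq / t), 0.0)                    # heat-kernel edge weights
    deg = S.sum(axis=1)                                         # diagonal of the degree matrix D
    L = np.diag(deg) - S                                        # graph Laplacian
    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r]
        f_t = f - (f @ deg) / deg.sum()                         # remove the D-weighted mean
        scores[r] = (f_t @ L @ f_t) / (f_t @ (deg * f_t))
    return scores

# Synthetic 200 x 6 matrix: features 0 and 1 carry the class structure, the rest are noise.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 50)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [5.0, 5.0]])
informative = centers[labels] + 0.05 * rng.standard_normal((200, 2))
X = np.hstack([informative, rng.standard_normal((200, 4))])

scores = laplacian_score(X)
selected = np.argsort(scores)[:2]      # indices of the lowest-score features
```

For a 200 × 20 MPGSE feature matrix, the same call followed by `np.argsort(scores)[:4]` would give the four features retained for classification.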
To illustrate the superiority of our proposed approach, three methods (MPGSE and LS; MPSE and LS; and MSE and LS) are compared on the same experimental data. For each algorithm, 20 trials are conducted to reduce the randomness of the diagnostic results. For fairness of comparison, the important parameters of MSE are also determined by the GWO method as m = 4, τ = 20, and r = 0.15σ, where σ is the standard deviation of the original signal. Figure 20 plots the identification results obtained by the different methods, and the detailed diagnosis results are listed in Table 6, including the maximum, minimum, mean, and standard deviation (SD) of accuracies and the average CPU time. As can be seen, our method has the highest recognition accuracy (100%). Besides, the proposed approach has the lowest SD and CPU time, which indicates that our method has better stability and higher computing efficiency. Experimental results show that our method is effective in identifying different fault patterns of bearings.
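The summary statistics reported for the repeated trials can be gathered with a small harness like the one below. It is only a sketch: `run_trials` and `dummy_evaluate` are hypothetical names, the accuracies returned by `dummy_evaluate` are made-up placeholders rather than results from the paper, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not say which definition was used.

```python
import time
import numpy as np

def run_trials(evaluate, n_trials=20):
    """Run `evaluate(seed)` n_trials times and summarize accuracy and CPU time."""
    accs, times = [], []
    for seed in range(n_trials):
        start = time.process_time()              # CPU time, not wall-clock time
        accs.append(evaluate(seed))
        times.append(time.process_time() - start)
    accs = np.asarray(accs, dtype=float)
    return {
        "max": float(accs.max()),
        "min": float(accs.min()),
        "mean": float(accs.mean()),
        "sd": float(accs.std(ddof=1)),           # sample SD (assumed definition)
        "cpu_time": float(np.mean(times)),
    }

def dummy_evaluate(seed):
    # Placeholder for one full trial: random split, feature extraction, ELM training.
    return 100.0 if seed % 2 == 0 else 98.0

summary = run_trials(dummy_evaluate)
```

In a real run, `evaluate` would redo the random training and testing split with the given seed so that the 20 accuracies reflect split-to-split variation.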
To validate the effectiveness of combining MPGSE and LS, Figures 21(a)-21(c) plot the two most important features obtained by the three methods (MPGSE and LS; MPSE and LS; and MSE and LS), respectively. As can be seen, the sensitive features selected by the LS method are clearly separated. Besides, the feature aggregation and differentiation of the proposed algorithm are superior to those of the other two methods, which means that the LS algorithm can select fault features with higher discrimination. As a comparison, two randomly selected features of the three methods (MPGSE, MPSE, and MSE) are shown in Figures 22(a)-22(c), respectively. From Figure 22, we can find that the discriminating ability of the randomly selected features is not good enough, which indicates that it is essential to apply LS to refine the sensitive features.
To show the efficacy of applying the LS method, 20 trials are conducted with the three methods (MPGSE, MPSE, and MSE) without LS-based feature selection. Specifically, four random features (τ = 1, 3, 7, and 9) from the multiscale feature set are directly input to an ELM classifier to identify different fault patterns of the bearing. Table 7 shows the classification results of the three methods (MPGSE, MPSE, and MSE). As can be seen in Table 7, the average accuracy of MPGSE without LS is 99%, which is only slightly lower than the average accuracy obtained using MPGSE and LS; that is, the advantage of applying the LS method is not obvious here. We give a specific explanation for this issue. In case 1, the benchmark bearing data under different fault patterns differ markedly in complexity (i.e., the MPGSE features of different bearing vibration data differ greatly), so they are easier to distinguish. In other words, for the benchmark bearing data, the within-class distance among samples of the same category is small enough and the between-class distance among samples of different categories is large enough that the average accuracies obtained by MPGSE with and without LS differ only slightly. In addition, it is important to note that the benchmark bearing vibration data in case 1 suffer from less noise interference than actual vibration data, so the MPGSE of the benchmark data can be identified effectively whether LS is used or not. This further verifies the powerful feature extraction ability of MPGSE. Nonetheless, from another point of view, the average accuracy (96.75%) of combining MPSE and LS is clearly higher than the average accuracy (73.55%) of using only MPSE, which indicates that LS-based feature selection is still a necessary step for improving diagnostic accuracy. Moreover, the average accuracy of MPGSE is still higher than that of MPSE and MSE, which further verifies the advantage of the proposed method in fault recognition.
To show the influence of feature dimension on the diagnosis result, Figure 23 plots the relation between accuracy and the feature dimension selected by LS. As shown in Figure 23, as the feature dimension increases, the accuracy of the different methods tends to increase. However, a larger feature dimension implies more computing time. Thus, to make a tradeoff between diagnosis accuracy and computational efficiency, this paper selects the first four most important features for fault identification, which meets the engineering requirements.
Our approach is further applied to the B and C datasets, and 20 trials are again performed. The detailed diagnosis results of the three methods (MPGSE and LS; MPSE and LS; and MSE and LS) are given in Table 8. As can be seen in Table 8, the mean accuracies of our approach reach 94.70% and 100% for the B and C datasets, respectively, which are higher than those of the other two methods. This indicates that our approach can effectively recognize different fault patterns of bearings.
Similarly, for the B and C datasets, the classification performance of the three methods (MPGSE, MPSE, and MSE) is investigated when LS is not used for feature selection. Table 9 lists the detailed diagnostic results. As shown in Table 9, for both the B and C datasets, the average accuracy obtained without LS is lower than that obtained with LS. That is, the LS method is effective in improving the fault identification rate.
Figure 24 displays an overall view of the experimental bench, which primarily consists of loading equipment, a bearing test module, a driving system, an electrical control system, and computer monitoring equipment. In this experiment, four kinds of faults are separately manufactured on normal bearings by spark machining, including outer race fault (ORF), inner race fault (IRF), outer race-ball compound fault (ORBF), and outer-inner race compound fault (OIRF). Specifically, each single local fault on the inner race or outer race is 0.5 mm in depth and 0.1 mm in width. Besides, a scratch fault is machined on the ball surface of the bearing. Figure 25 shows the bearings with different defects. The specific size of the test bearing is given in Table 10. A PCB accelerometer with a sensitivity of 100 mV/g was installed near the test bearing block to gather the bearing fault signal. For each fault pattern, the motor speed is stable at 1050 rpm, and bearing fault data are sampled with a sampling frequency of 10240 Hz.
In this experiment, for every failure state, 50 samples with a data length of 2048 points are selected, which means that a total of 200 data samples are generated. More specifically, 25 samples for every failure state are randomly chosen as the training data and the remaining 25 samples are regarded as the testing data. Accordingly, 100 training samples and 100 testing samples are obtained to evaluate our algorithm. This is essentially a four-class classification problem. Figure 26 plots the time-domain waveforms of the different fault signals and their corresponding FFT spectra. As illustrated in Figure 26, because the collected bearing vibration signals contain heavy background noise and other external interference, and the raw bearing vibration data under different fault patterns are similar in waveform and spectrum, the bearing fault pattern cannot be recognized reliably from the waveform and spectrum alone, which implies that an appropriate technique should be introduced to recognize each fault pattern efficiently.
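With a sampling frequency of 10240 Hz and 2048 points per sample, the FFT spectrum of one sample has a frequency resolution of 10240/2048 = 5 Hz. The sketch below checks this on a synthetic segment; the 100 Hz test tone, the noise level, and the variable names are assumptions for illustration, not quantities from the experiment.

```python
import numpy as np

fs = 10240                      # sampling frequency in Hz (from the text)
n = 2048                        # points per sample (from the text)
t = np.arange(n) / fs

# Synthetic segment: a hypothetical 100 Hz tone plus weak noise.
rng = np.random.default_rng(0)
f0 = 100.0
x = np.sin(2 * np.pi * f0 * t) + 0.1 * rng.standard_normal(n)

freqs = np.fft.rfftfreq(n, d=1.0 / fs)          # frequency axis from 0 to fs/2
amplitude = 2.0 * np.abs(np.fft.rfft(x)) / n    # single-sided amplitude; DC and Nyquist bins are not handled specially
resolution = freqs[1] - freqs[0]                # spacing between adjacent bins
peak_freq = freqs[np.argmax(amplitude[1:]) + 1] # strongest component, skipping the DC bin
```

A 5 Hz bin spacing means that spectral lines closer than about 5 Hz cannot be separated within a single 2048-point sample.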

Experimental Results and Analysis.
The proposed approach is utilized to analyze the abovementioned fault data. Firstly, we apply the GWO-based parameter-optimized MPGSE to extract multiscale fault features over different scales. Note that the optimal parameters of MPGSE are selected as λ = 4 and τ = 20. Figures 27(a) and 27(b) show the PS and PGS curves of the bearing fault data under four states, respectively. As can be seen in Figure 27, the PGS curves discriminate between the fault states better than the PS curves. Meanwhile, as the SE scale increases, the PGS value of the fault data tends to stabilize, whereas the PS value fluctuates strongly, which has a negative influence on fault classification and complexity estimation. Secondly, the LS algorithm is employed to rank the extracted multiscale features according to their importance and sensitivity. The new order of the multiscale features is shown in equation (28). According to equation (28), the first four features (τ = 3, 8, 1, and 7), which contain the richest fault information, are selected to build a new feature set with a size of 200 × 4. Figures 28(a) and 28(b) display the distribution of multiscale features after and before applying LS, respectively. Finally, the new feature set is used as the input of the ELM to recognize the fault categories of bearings. Figure 29 plots the diagnostic results of our method. As can be seen in Figure 29, our method obtains an accuracy rate of 100%, which verifies the validity of our method.
To highlight the advantages of the proposed approach, the same fault data are analyzed with two comparison methods (MPSE and MSE). Note that the important parameters of MPSE and MSE are also determined by the GWO method, which are optimized as m = 4 and τ = 20. To reduce the contingency of the diagnosis results, 20 trials are conducted for each method. The identification results obtained using the different methods are shown in Figure 30, and the detailed results are given in Table 11. As can be seen, the proposed method (MPGSE and LS) achieves the highest average accuracy (100%), the second method (MPSE and LS) has a diagnostic accuracy of 95% to 98%, and the third method (MSE and LS) has a diagnostic accuracy of 90% to 98%. In addition, the SD of the recognition results of the proposed approach is lower than those of the second and third methods, which means that the recognition rate of MPGSE is more stable than that of MPSE and MSE. Meanwhile, the CPU running time of the proposed MPGSE is less than that of MPSE and MSE. Hence, the superiority of the presented MPGSE method in fault classification is demonstrated by the comparative analysis.
To show the necessity of integrating the LS method, the two most important features selected by LS for the three methods (MPGSE, MPSE, and MSE) are plotted in Figures 31(a)-31(c), and two randomly selected features of the three methods are plotted in Figures 32(a)-32(c), respectively. As can be seen in Figure 31(a), the features selected by LS cluster well and are clearly recognizable, whereas the randomly selected features in Figure 32(a) cannot be recognized clearly. This implies that the combination of MPGSE and LS is helpful in fault classification. Besides, by comparing Figures 31(b) and 32(b), we can find that the features obtained by MPSE are not well differentiated either before or after applying LS. By comparing Figures 31(c) and 32(c), it can also be found that the features obtained by MSE are not good enough either before or after applying LS.
To further show the effectiveness of the LS approach, we randomly select four features (τ = 1, 7, 9, and 11) as the input of the ELM classifier to identify different bearing fault patterns. For each approach, 20 trials are again carried out to reduce the randomness of the classification results. Detailed diagnosis results of the different approaches are given in Table 12. It can be observed that our method achieves an average accuracy of 90.30%, which is higher than that of MPSE and MSE. In other words, MPGSE performs better in feature extraction than MPSE and MSE. However, the average accuracies in Table 12 are clearly lower than those in Table 11, which validates the efficacy of LS in feature selection.

Results and Discussion.
According to the experimental analysis of the two cases above, the proposed fault diagnosis scheme is demonstrated to be effective for bearing fault identification. Concretely, the proposed method (i.e., MPGSE and LS) achieves a mean classification accuracy of 94.70% on the B dataset and above 99% on the other datasets examined, while requiring less CPU time than the methods it improves upon. In addition, various combination comparisons are performed to verify the superiority of the proposed diagnosis algorithm. Concretely, several metrics (i.e., maximum, minimum, mean, and SD of classification accuracy) and CPU running time are utilized to compare the diagnosis performance of different methods, which show that the classification accuracy of the proposed approach is higher than that of the comparison methods (i.e., MPSE and LS; MSE and LS). Meanwhile, the CPU running time of the proposed approach is smaller than that of the comparison methods (MPSE and LS; MSE and LS). Nevertheless, some challenges remain when the proposed diagnosis approach is applied to identify different health conditions.
(1) Although the proposed MPGSE approach can overcome the shortcoming of undefined entropy values in traditional multiscale entropy and has superior feature extraction performance for rolling element bearings, the selection of its two parameters (i.e., SE scale λ and scale factor τ) deserves further study. Hence, besides the GWO used in this paper, future work can also adopt other intelligent optimizers (e.g., genetic algorithm (GA), particle swarm optimization (PSO), cuckoo search algorithm (CSA), firefly algorithm (FA), fruit fly optimization algorithm (FOA), and whale optimization algorithm (WOA)) to adaptively optimize the parameters of MPGSE.
(2) The core idea of this paper is that the presented MPGSE and LS are combined to achieve intelligent fault diagnosis of rolling element bearings. That is, in addition to MPGSE-based feature extraction, feature selection is also very important in the proposed method. Hence, in future research, it is necessary to introduce other effective feature selection techniques to analyze and compare the experimental data, such as the hybrid feature selection scheme [54], local and global principal component analysis (LGPCA), minimum redundancy maximum relevance (mRMR) [55], partial maximum correlation information (PMCI) [56], and multicluster feature selection (MCFS) [57]. The research and comparison of these methods are the focus of follow-up research.
(3) Another important point worth mentioning is that the proposed fault diagnosis scheme increases running time due to the fusion of three stages (MPGSE-based feature extraction, LS-based feature selection, and ELM-based fault classification). Although running time is not a major issue thanks to the development of computer technology, improving the computational efficiency of our approach, so that it can be applied quickly to on-line condition monitoring and diagnosis of machinery, is regarded as our future research direction.

Conclusions
In this paper, a novel dynamical indicator named PGSE is proposed for evaluating the complexity and uncertainty of time series. To extract different fault signatures over multiple scales, the coarse-grained procedure and PGSE are combined to design a new algorithm called MPGSE, whose key parameters are selected by GWO. Then, the LS approach is utilized to select the sensitive features and establish a new feature set, which removes redundant or irrelevant feature information and improves computational efficiency. Ultimately, the new feature set is fed into the ELM classifier to identify automatically the different health conditions of rolling bearings. According to the experimental analysis of the two cases above, our designed algorithm is shown to be effective in identifying different fault categories and severities of rolling element bearings.
Innovations and main contributions of this article are summarized as follows:
(1) A modified method called PGSE is formulated, which can improve the detection of dynamic changes in time series.
(2) A new algorithm termed MPGSE is developed to extract abundant fault features over multiple scales.
(3) The LS method is employed to select several sensitive features with higher discrimination.
(4) The advantages of our algorithm in fault classification are validated on two experimental cases.
The abovementioned results show that our method is satisfactory and promising in intelligent fault detection of rolling bearings. However, the performance of our method in recognizing the health status of bearings under variable speed is unknown. It will be very valuable to apply MPGSE to diagnose different fault patterns of rolling bearings running at variable speed. This research point will be addressed in future work.

SNR: Signal-to-noise ratio
FFT: Fast Fourier transform

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.