Fault Diagnosis Method for Rotating Machinery Based on Hierarchical Amplitude-Aware Permutation Entropy and Pairwise Feature Proximity

To address the limitation that multiscale amplitude-aware permutation entropy (MAAPE) can only quantify the low-frequency features of a time series and ignores the equally important high-frequency features, a novel nonlinear time series feature extraction method, hierarchical amplitude-aware permutation entropy (HAAPE), is proposed. By constructing high-frequency and low-frequency operators, this method extracts the features of different frequency bands of a time series simultaneously and thus avoids information loss. In view of its advantages, HAAPE is introduced into the field of fault diagnosis to extract fault features from vibration signals of rotating machinery. Combined with the pairwise feature proximity (PWFP) feature selection method and a support vector machine optimized by the gray wolf algorithm (GWO-SVM), a new intelligent fault diagnosis method for rotating machinery is proposed. First, HAAPE is adopted to extract the original high-frequency and low-frequency fault features of rotating machinery. Next, PWFP is used to rank the original features, and the most important features are retained to obtain low-dimensional sensitive feature vectors. Finally, the sensitive feature vectors are input into GWO-SVM for training and testing, so as to identify the faults of rotating machinery. The performance of the proposed method is verified using two data sets, one of bearings and one of a gearbox. The results show that the proposed method has clear advantages over the existing methods, and the identification accuracy reaches 100%.


Introduction
With the rapid development of modern manufacturing, rotating machinery is widely applied in large-scale precision equipment such as wind turbines, aeroengines, and driverless cars, where it plays an important role in bearing loads and transmitting power [1]. The most representative rotating machinery components are bearings and gears. The working environment of such components is generally harsh, and they often operate at high speed and under high load, which makes them prone to faults. Once a fault occurs, it causes equipment downtime at the least and may lead to huge economic losses and casualties [2]. Therefore, research on efficient and accurate new fault diagnosis methods for rotating machinery has important practical significance for ensuring the safe and stable operation of equipment and reducing accidents.
Rotating machinery fault diagnosis based on vibration signal analysis is a current research hotspot, and feature extraction is the most critical step in the fault diagnosis process [3]. The fault vibration signals of rotating machinery are typically nonlinear and nonstationary, and traditional linear processing methods cannot effectively extract the fault features that represent the working state of rotating machinery. Therefore, many nonlinear time-frequency analysis methods have been proposed and applied to fault diagnosis, such as ensemble empirical mode decomposition (EEMD), local mean decomposition (LMD), and wavelet packet transform (WPT). Feng et al. [4] employed EEMD to process the bearing vibration signal to obtain the corresponding intrinsic mode functions (IMFs) and then screened the sensitive IMFs for reconstruction, so as to identify different fault types. Chen et al. [5] used wavelet packet decomposition to process the original signal and then calculated the energy entropy value of each subsignal as the fault feature for fault identification of bearings.
In addition to the above time-frequency analysis methods, nonlinear dynamic parameters based on entropy theory have developed rapidly in recent years and are widely used in feature extraction of nonlinear time series. Commonly used entropy methods include sample entropy (SE) [6], fuzzy entropy (FE) [7], and permutation entropy (PE) [8]. As effective feature extraction tools, these methods have been widely used in medical signal analysis, image processing, and vibration signal analysis. However, these methods also have certain shortcomings. SE has low computational efficiency and is highly dependent on the length of the time series. The computational process of FE is similar to that of SE, so its computational efficiency is also low. Compared with FE and SE, PE has simpler principles and higher computational efficiency. Nevertheless, PE only considers the order structure of the time series and ignores the amplitude information of each element, which leads to poor robustness against noise. In addition, PE does not handle well the inaccurate evaluation of the permutation pattern caused by elements with the same amplitude in the time series. To overcome these shortcomings of PE, Azami et al. [9] proposed amplitude-aware permutation entropy (AAPE). This method fully considers the amplitude information of the time series and the amplitude difference between different elements and is more sensitive to the amplitude and frequency of the signal. Moreover, by improving how PE handles the patterns of equal-valued elements, AAPE can better deal with elements of equal amplitude. In [9], experiments on simulated and biomedical signals showed that, compared with SE, FE, and PE, AAPE has better dynamic feature evaluation ability.
Unfortunately, AAPE can only evaluate the complexity of a time series on a single scale, so the important information hidden in other scales is discarded. Moreover, single-scale analysis cannot account for the long-range correlation of the time series, which results in inaccurate complexity assessment. To solve this defect, multiscale amplitude-aware permutation entropy (MAAPE) [11] was proposed based on multiscale entropy theory [10]; it extends the dynamic feature quantification of AAPE to multiple time scales. However, the multiscale extension adopts the traditional averaging operation to obtain multiple coarse-grained time series, which has poor stability. To improve the performance of MAAPE, Chen et al. proposed an improved MAAPE (IMAAPE) method [12], which uses an improved coarse-graining method instead of the traditional one and significantly improves the stability and feature extraction performance of MAAPE. However, both methods can only extract the dynamic feature information of the low-frequency components and ignore the equally important information hidden in the high-frequency components. To address the inability of multiscale entropy to extract high-frequency features, Jiang et al. [13] proposed the concept of hierarchical entropy and divided the original time series into different frequency bands by constructing high-frequency and low-frequency operators, thus obtaining the corresponding high-frequency and low-frequency components. In view of this, this paper combines hierarchical entropy with AAPE and proposes hierarchical amplitude-aware permutation entropy (HAAPE). The method adopts AAPE to quantify the complexity of each subseries obtained through hierarchical decomposition, so it can extract the low-frequency and high-frequency features of the time series at the same time, thereby overcoming the loss of high-frequency information in MAAPE.
In view of the theoretical advantages of HAAPE, this paper introduces it into the field of fault diagnosis to extract fault features from vibration signals of rotating machinery. However, the original fault feature vectors extracted by HAAPE are high-dimensional: they contain not only sensitive information closely related to the working state but also, inevitably, redundant information that does not contribute to fault identification. Moreover, high-dimensional feature vectors mean high computational cost and possible overfitting of the classifier. Therefore, it is necessary to choose an effective feature selection method to screen sensitive features from the original feature vectors and reduce the impact of redundant information [14].
Pairwise feature proximity (PWFP) [15] is a dimensionality reduction tool recently proposed by S. L. Happy et al. In this method, following the principle of keeping the within-class distance minimal and the between-class distance maximal, all features are ranked by a statistical method, and the features with low scores are the required sensitive features. In [15], the dimensionality reduction performance of PWFP was verified on several real-world data sets, and the results showed that PWFP outperformed traditional methods such as Relief-F, Laplace Score, and Fisher Score when processing high-dimensional data with low sample size.
SVM [16] is a nonlinear supervised learning classifier with excellent generalization ability on small-sample classification problems. However, the performance of SVM depends on the setting of two key parameters, namely, the penalty factor and the kernel function parameter. To obtain the best performance from SVM, many optimization algorithms have been applied to its parameter selection. Among them, gray wolf optimization (GWO) [17] offers fast convergence and good global search ability, so we employ it in this paper to determine the best parameters of SVM by iterative optimization. Based on the above, this paper proposes a new fault diagnosis method for rotating machinery based on HAAPE, PWFP, and GWO-SVM. First, HAAPE is used to extract the original fault features representing the working state of rotating machinery from the vibration signals. Then, PWFP is used to rank the original features according to feature sensitivity, and the best features are selected to form the low-dimensional sensitive feature vectors, which reduces the impact of redundant information. Finally, the sensitive feature vectors are input into the GWO-SVM-based classifier for training and recognition, which completes the fault diagnosis of rotating machinery. To demonstrate the effectiveness and universality of the proposed method, its performance is verified using two fault data sets, one of bearings and one of gears. The results show that the proposed method can effectively complete the fault identification of rotating machinery, and the accuracy reaches 100%. In addition, to demonstrate the superiority of our method, we compare the proposed HAAPE with MAAPE, IMAAPE, multiscale permutation entropy (MPE), hierarchical permutation entropy (HPE), multiscale sample entropy (MSE), and hierarchical sample entropy (HSE). The results show that the feature extraction performance of HAAPE is superior.
All in all, the main contributions and innovations of this paper are as follows: (1) HAAPE, a new nonlinear time series feature extraction method, is proposed in this paper and applied to the field of rotating machinery fault diagnosis, where it shows better feature extraction performance than comparative methods such as MAAPE, MPE, and HPE.
The rest of the paper is organized as follows. Section 2 explains the basic principles of AAPE, MAAPE, and HAAPE, verifies the advantages of HAAPE over MAAPE through a simulation signal experiment, and discusses the choice of HAAPE's parameters. Section 3 describes the diagnostic procedure of the proposed method in detail and summarizes the principles of PWFP and GWO-SVM. In Section 4, two data sets, one of bearings and one of a gearbox, are used to verify the performance of the proposed method. Finally, the whole paper is summarized in Section 5.

AAPE.
To address the problem that PE considers neither the amplitude of each element in the time series nor the amplitude difference between adjacent elements, which leads to inaccurate evaluation of dynamic characteristics, AAPE was proposed. In other words, AAPE builds on PE. Therefore, to explain the calculation process of AAPE clearly, we first summarize the principle of PE as follows.
Given a random time series $Y = \{y_i\}$, $i = 1, 2, \ldots, N$, and based on the time delay $\lambda$ and embedding dimension $m$, $Y$ can be transformed into multiple reconstruction series $Y_j^{m,\lambda}$ as given in equation (1). The elements of each reconstructed series are sorted in ascending order according to their size relationship, and different permutation patterns $\pi_{v_0, v_1, \ldots, v_{m-1}}$ are obtained as in equation (2), where $v_*$ represents the position index of each element in the reconstruction series. When the embedding dimension is $m$, the length of the reconstructed series is also $m$. Therefore, there are $m!$ possible permutation patterns, and the $k$-th permutation pattern is denoted as $\pi_k$. The occurrence probability of each permutation pattern can be calculated by equation (3), where $\mathrm{Num}(\pi_k)$ represents the number of occurrences of pattern $\pi_k$ in all reconstruction subseries. Specifically, $\mathrm{Num}(\pi_k)$ increases by 1 whenever the permutation pattern of $Y_j^{m,\lambda}$ is $\pi_k$. Based on Shannon entropy theory, the PE value of the time series $Y$ can be calculated by equation (4).
It can be seen from the above that PE sorts different elements according to their amplitudes to obtain different permutation patterns, and the final entropy value is calculated from the occurrence probabilities of the different permutation patterns. There are two drawbacks in this process. First, PE only considers the order information of the time series and ignores the amplitude information of each element. Second, when elements with the same amplitude appear in the time series, PE numbers them according to their order of occurrence, which leads to imprecise statistics of the permutation patterns.
To remedy these defects, AAPE was proposed. Compared with PE, the specific improvements of AAPE are as follows. Set the initial value of $p(\pi_k)$ to 0. When the permutation pattern of $Y_j^{m,\lambda}$ is $\pi_k$, AAPE, unlike the counting rule of PE, adopts equation (5) to calculate the contribution of that pattern. Here, $A \in [0, 1]$ denotes the adjustment coefficient, which adjusts the relative weight of the average amplitude of the time series and the deviation between the amplitudes, and $l$ is determined by the number of elements with the same amplitude in $Y_j^{m,\lambda}$. For example, when there are three elements with the same amplitude in $Y_j^{m,\lambda}$, $l = 6$, and when there are no elements with the same amplitude in $Y_j^{m,\lambda}$, $l = 1$.
Correspondingly, the relative probability of the permutation pattern $\pi_k$ can be calculated as follows:

Shock and Vibration
Therefore, the AAPE value of the time series $Y$ can be calculated by
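The AAPE procedure described above can be illustrated with a minimal Python sketch. It follows the commonly cited formulation of AAPE in [9]: each embedded vector contributes a weighted sum of its mean absolute amplitude and its mean absolute successive difference, and when equal amplitudes make the ordering ambiguous, the contribution is shared equally among the $l$ compatible patterns. The function name `aape`, its argument names, the use of the natural logarithm, and the absence of normalisation are assumptions made for illustration and may differ from the exact form of equations (5) to (7).

```python
import math
from collections import defaultdict
from itertools import permutations


def aape(series, m=3, delay=1, A=0.5):
    """Amplitude-aware permutation entropy (natural log, not normalised)."""
    if m < 2:
        raise ValueError("embedding dimension m must be at least 2")
    n_vectors = len(series) - (m - 1) * delay
    if n_vectors < 1:
        raise ValueError("series too short for the chosen m and delay")
    weights = defaultdict(float)
    for j in range(n_vectors):
        vec = [series[j + i * delay] for i in range(m)]
        # Contribution of this vector: weighted mean amplitude plus
        # weighted mean absolute difference between neighbours.
        mean_term = sum(abs(v) for v in vec) / m
        diff_term = sum(abs(vec[i] - vec[i - 1]) for i in range(1, m)) / (m - 1)
        contribution = A * mean_term + (1 - A) * diff_term
        # Every ascending ordering compatible with ties is a valid pattern;
        # without ties exactly one ordering qualifies (l = 1).
        patterns = [p for p in permutations(range(m))
                    if all(vec[p[i]] <= vec[p[i + 1]] for i in range(m - 1))]
        for p in patterns:
            weights[p] += contribution / len(patterns)
    total = sum(weights.values())
    if total == 0:
        return 0.0  # all-zero signal: no pattern carries any weight
    probs = [w / total for w in weights.values() if w > 0]
    return -sum(p * math.log(p) for p in probs)


if __name__ == "__main__":
    print(aape([4, 7, 9, 10, 6, 11, 3], m=3, delay=1, A=0.5))
```

With this tie rule, a vector of three equal values is compatible with all $3! = 6$ orderings, which matches the $l = 6$ example in the text.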

MAAPE.
To improve the feature extraction performance of AAPE, MAAPE was proposed based on multiscale entropy theory. For a time series $Y = \{y_i\}$, $i = 1, 2, \ldots, N$, the coarse-graining process of equation (8) is applied first,
where $\tau$ is the scale factor and $y_j^{\tau}$ is the obtained coarse-grained time series.
MAAPE is then obtained by calculating the AAPE value of the coarse-grained time series at each scale factor $\tau$.
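As a minimal sketch of the multiscale step, the code below averages consecutive non-overlapping blocks of $\tau$ samples, which is the traditional coarse-graining of multiscale entropy, and then applies an entropy function at each scale. The names `coarse_grain` and `multiscale` are hypothetical, and the entropy estimator is passed in as a callable so that the sketch stays independent of any particular AAPE implementation.

```python
def coarse_grain(series, tau):
    """Average consecutive, non-overlapping blocks of length tau."""
    if tau < 1:
        raise ValueError("tau must be a positive integer")
    n_blocks = len(series) // tau  # trailing samples that do not fill a block are dropped
    return [sum(series[j * tau:(j + 1) * tau]) / tau for j in range(n_blocks)]


def multiscale(series, max_tau, entropy_fn):
    """Apply entropy_fn to the coarse-grained series for tau = 1..max_tau."""
    return [entropy_fn(coarse_grain(series, tau)) for tau in range(1, max_tau + 1)]


if __name__ == "__main__":
    print(coarse_grain([1, 2, 3, 4, 5, 6], 2))
    # len stands in for an entropy estimator, only to show the call pattern.
    print(multiscale([1, 2, 3, 4], 2, len))
```

Because each coarse-grained sample is a local mean, the operation acts as a low-pass filter, which is why the multiscale features describe only the low-frequency content.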
2.3. HAAPE.
As shown in the principle of MAAPE, the essence of multiscale extension is to calculate the mean value of $\tau$ adjacent elements in the time series, so as to obtain different coarse-grained time series. Each subseries obtained in this way is a low-frequency component of the time series, so MAAPE can only extract the low-frequency information of the time series and ignores the high-frequency information. To solve this issue, this paper proposes hierarchical amplitude-aware permutation entropy (HAAPE). This method adopts hierarchical processing instead of multiscale extension and can therefore extract the high-frequency and low-frequency features of a time series simultaneously. The specific principle of HAAPE is summarized as follows:
(1) For the time series $Y = \{y_i\}$ of length $N = 2^n$, the low-frequency operator $Q_0$ and high-frequency operator $Q_1$ are defined as follows, where $Q_0(y)$ and $Q_1(y)$ are the low-frequency and high-frequency components of the time series, respectively. The matrix representation of $Q_0$ and $Q_1$ is as follows, where $j = 0$ or $1$.
(2) When the number of hierarchical layers is $k$, a $k$-dimensional vector $[s_1, s_2, \ldots, s_k]$ is constructed, where each $s_i$ is 0 or 1. The hierarchical node $l$ can be calculated using the following equation:
(3) Based on $[s_1, s_2, \ldots, s_k]$, repeat step (1) to obtain the hierarchical component $Y_{k,l}$ corresponding to node $l$ when the number of hierarchical layers is $k$:
(4) Based on the above, the HAAPE of the original series $Y = \{y_i\}$ can be calculated as follows:
To explain the process of hierarchical decomposition intuitively, Figure 1 shows the schematic diagram when the number of hierarchical layers is 3, where $Y_{1,0}$ and $Y_{1,1}$, respectively, represent the low-frequency component and high-frequency component of $Y$ when the hierarchical layer is 1.
According to the above definition, the low-frequency operator $Q_0$ and high-frequency operator $Q_1$ of hierarchical entropy correspond to the low-pass and high-pass filters of the Haar wavelet, respectively. Thus, in essence, HAAPE first applies wavelet packet decomposition based on the Haar wavelet to decompose the original signal into subseries of different frequency bands and then calculates the AAPE value of each subseries to obtain the corresponding feature vectors.
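The hierarchical decomposition can be sketched in a few lines of Python. The sketch assumes the operator definitions commonly used for hierarchical entropy [13]: $Q_0$ takes the average and $Q_1$ the half-difference of each pair of adjacent samples, and the child of node $l$ produced by operator $Q_s$ gets index $2l + s$, which is equivalent to $l = \sum_{i=1}^{k} 2^{k-i} s_i$. The function names `q0`, `q1`, and `hierarchical_nodes` are hypothetical.

```python
def q0(y):
    """Low-frequency operator: averages of adjacent sample pairs."""
    return [(y[2 * j] + y[2 * j + 1]) / 2 for j in range(len(y) // 2)]


def q1(y):
    """High-frequency operator: half-differences of adjacent sample pairs."""
    return [(y[2 * j] - y[2 * j + 1]) / 2 for j in range(len(y) // 2)]


def hierarchical_nodes(y, k):
    """Return the 2**k components of layer k, ordered by node index."""
    nodes = [list(y)]
    for _ in range(k):
        next_layer = []
        for component in nodes:
            # Children of node l are stored at positions 2l (Q0) and 2l + 1 (Q1).
            next_layer.append(q0(component))
            next_layer.append(q1(component))
        nodes = next_layer
    return nodes


if __name__ == "__main__":
    for index, component in enumerate(hierarchical_nodes([1, 3, 5, 7], 2)):
        print(index, component)
```

Computing an entropy value such as AAPE on each returned component then yields one feature per node, so a decomposition with $k$ layers gives a feature vector of length $2^k$.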

Simulation Signal Experiment Analysis.
To test the performance of the proposed HAAPE method and compare it with the existing MAAPE method, 50 independent random white noise signals with a length of 2048 are adopted for comparative experiments. Figure 2 displays the time-domain and frequency-domain waveforms of random white noise. The complexity of white noise remains roughly constant throughout the frequency band, so theoretically the entropy values of white noise should remain stable [13]. HAAPE and MAAPE are used to extract the entropy values of all white noise samples, where the parameters of HAAPE and MAAPE are set to respectively. The error bar curves corresponding to the two methods are shown in Figure 3, which gives the mean entropy value and standard deviation. It can be seen that, for MAAPE, the entropy value of white noise shows an obvious downward trend as the scale factor increases, which indicates that MAAPE can only extract the low-frequency information of the time series and cannot evaluate the complexity of the high-frequency band well. In contrast, for HAAPE, the entropy value of white noise is basically unchanged across nodes, which is consistent with the theoretical expectation and shows that HAAPE can effectively prevent the loss of high-frequency information. In addition, the standard deviation of HAAPE is smaller than that of MAAPE, indicating that HAAPE is more stable. Overall, HAAPE has better feature extraction performance than MAAPE.
2.5. The Parameter Selection of HAAPE.
As described above, the main parameters of HAAPE include the embedding dimension $m$, time delay $\lambda$, adjustment coefficient $A$, and hierarchical layer $k$. The average amplitude and the amplitude difference of rotating machinery vibration signals are both very important for the extraction of fault features, so the adjustment coefficient is set to $A = 0.5$ in this paper, which is also in line with the recommendation of [9]. The time delay has little influence on the final entropy value and is generally set to $\lambda = 1$ [18]. For the embedding dimension $m$, if $m$ is too small, AAPE cannot effectively detect the dynamic features of the time series, while if $m$ is too large, the calculation cost increases. In [6], $m$ is recommended to be set between 3 and 7. For the hierarchical layer, if $k$ is too small, the high-frequency and low-frequency information of the time series is extracted insufficiently; if $k$ is too large, the calculation cost increases greatly and the practicability of the method is reduced [19]. Considering feature extraction performance and calculation cost, as well as the common parameter settings for hierarchical entropy and AAPE [9,11,20], this paper finally sets the parameters of HAAPE as follows:

The Proposed Fault Diagnosis Method for Rotating Machinery
3.1. PWFP Feature Selection Method.
By constructing low-frequency and high-frequency operators, HAAPE can simultaneously extract the high-frequency and low-frequency features that represent the state of a time series, thereby avoiding the loss of high-frequency information in MAAPE. However, the original feature vectors extracted by HAAPE are high-dimensional, and they inevitably contain redundant information that affects the final pattern recognition. To eliminate the influence of redundant information and improve the separability of feature vectors, it is necessary to use an effective feature selection method to screen sensitive features from the original feature vectors. The PWFP feature selection method is a recently proposed dimensionality reduction tool that is suitable for the classification of high-dimensional data with low sample size; its criterion is based on the minimum within-class distance and maximum between-class distance. Unlike traditional methods, which take all feature samples of the same class as a whole for distance measurement, this method takes pairs of feature samples of the same or different classes as subsets and counts the number of optimal features corresponding to each subset. After that, each feature is scored and ranked, and the features with low scores are the sensitive features with better separability. In [15], comparative experiments show that PWFP has a better dimensionality reduction effect than Fisher Score, Laplace Score, and Relief-F. In view of the excellent performance of PWFP, this paper employs it to further process the original high-dimensional features extracted using HAAPE. The specific principles of PWFP can be summarized as follows:
(2) Select the first $d$ features from $p_{jk}^{*}$ as a subset $p^{*}$. The same method is adopted to get the feature subset $p^{*}$ corresponding to all paired samples with the same label.
(4) Select the first $d$ features from $q_{jk}^{*}$ as a subset. The same method is used to obtain the feature subset $q^{*}$ corresponding to all paired samples with different labels. Then, the occurrence times of all features are counted to obtain the statistical vector $q = [c_1, c_2, \ldots, c_d]$.
(5) Use equation (15) to calculate the occurrence probability of the required features in all paired samples, where $m_k$ stands for the number of samples labeled $k$.
(6) The criterion shown in equation (16) is adopted to screen the optimal features, where $S(i)$ represents the score of each feature, and selecting the optimal $m$ features means selecting the $m$ features with the lowest scores in the original feature vector.

GWO-SVM.
SVM is a machine learning method that is widely used in the field of pattern recognition. It adopts a kernel function to map samples that are nonseparable in a low-dimensional space to a high-dimensional space and constructs a hyperplane that makes the samples linearly separable in the high-dimensional space, so as to improve the generalization ability. For small-sample classification problems, SVM is generally the preferred method, so it has been widely used in the field of fault diagnosis. For reasons of space, the principle of SVM is not described in detail in this paper. The performance of SVM depends on the setting of the penalty factor $c$ and the kernel function parameter $g$. Reasonable parameter settings bring out the best classification effect of SVM, whereas unreasonable settings easily lead to overfitting or underfitting. To determine the optimal parameters of SVM adaptively and avoid the influence of manual parameter setting, gray wolf optimization (GWO), which has strong global search ability, is employed, with the average error obtained by three-fold cross-validation on the training samples as the fitness value. The specific steps of GWO-SVM are as follows:
(1) Set the parameters of GWO, and generate the initial wolf pack with $[c, g]$ as the wolf's position.
(2) The fitness values of all wolves in the initial wolf pack are calculated, and the wolves corresponding to the three minimum fitness values are set as $\alpha$, $\beta$, and $\delta$, respectively, while their positions are set as $X_\alpha$, $X_\beta$, and $X_\delta$. The other wolves are all $\omega$ wolves, and their positions are set as $X$.
(3) According to $X_\alpha$, $X_\beta$, and $X_\delta$, $X$ is updated by the following equation to obtain the new wolf pack, where $j = \alpha, \beta, \delta$, $X_n^k$ represents the position of the $k$-th wolf in the $n$-th generation wolf pack, $D_j$ represents the distance between the $\omega$ wolf and wolf $j$, and $X_{n+1}^k$ represents the position of the updated wolf. $A$ and $C$ stand for calculation coefficients, which are calculated as follows:
(4) Repeat steps (2) and (3) after getting the new wolf pack.
(5) Iterate until the maximum number of cycles $T_{\max}$ is reached, and output the best $[c, g]$ and the corresponding classification accuracy.
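The GWO steps above can be sketched as follows, assuming the standard grey wolf update rules: a control parameter $a$ decreases linearly from 2 to 0, $A = 2a r_1 - a$ and $C = 2 r_2$ with $r_1, r_2$ uniform in $[0, 1]$, $D_j = |C X_j - X|$, and the new position is the mean of $X_j - A D_j$ over the three leaders. The function `gwo_minimize`, its arguments, and the quadratic toy fitness are hypothetical; in the method described here the fitness would instead be the three-fold cross-validation error of an SVM evaluated at $[c, g]$.

```python
import random


def gwo_minimize(fitness, bounds, n_wolves=10, max_iter=50, seed=0):
    """Minimise fitness(position) inside box bounds with a basic grey wolf optimiser."""
    if n_wolves < 3:
        raise ValueError("need at least three wolves for alpha, beta and delta")
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best_pos, best_fit = None, float("inf")
    for t in range(max_iter + 1):
        # Rank the pack; the three fittest wolves lead the search.
        scored = sorted(((fitness(w), w) for w in wolves), key=lambda fw: fw[0])
        if scored[0][0] < best_fit:
            best_fit, best_pos = scored[0][0], list(scored[0][1])
        if t == max_iter:
            break
        leaders = [w for _, w in scored[:3]]  # alpha, beta, delta
        a = 2.0 * (1.0 - t / max_iter)  # decreases linearly from 2 to 0
        new_pack = []
        for wolf in wolves:
            position = []
            for d, (lo, hi) in enumerate(bounds):
                estimate = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - wolf[d])
                    estimate += leader[d] - A * D
                # Average the three leader-guided moves and keep the wolf in bounds.
                position.append(min(max(estimate / 3.0, lo), hi))
            new_pack.append(position)
        wolves = new_pack
    return best_pos, best_fit


if __name__ == "__main__":
    # Toy fitness standing in for a cross-validation error over [c, g].
    toy = lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2
    print(gwo_minimize(toy, [(-5.0, 5.0), (-5.0, 5.0)]))
```

The sketch keeps the best position seen over all generations, so the returned fitness is never worse than that of the best wolf in the initial pack.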
The parameters of GWO-SVM in this paper are set as shown in Table 1.
(3) Feature dimensionality reduction: PWFP is used to rank the original features according to the separability of different features, and the features with low scores are selected as sensitive features to form the low-dimensional sensitive feature vector. In this paper, the number of sensitive features is set to 8; that is, the sensitive feature vector is a low-dimensional vector of length 8.

Experimental Verification
For the purpose of verifying the effectiveness of the proposed fault diagnosis method, in this section, we firstly adopt the bearing fault data set of Case Western Reserve University, which is widely used in the fault diagnosis field, to test the performance of the presented method. In addition, to prove the universality of our method, the measured gearbox fault data set is also used to verify the performance of the proposed method. With a view to highlighting the advantages of the proposed method, we conduct a series of comparative experiments to verify the advantages of the proposed HAAPE feature extraction method and the necessity of using the PWFP feature selection method.

Case 1. Fault diagnosis based on rolling bearing
In this section, the publicly available Case Western Reserve University bearing fault data set is used to verify the diagnostic performance of the proposed method. The experimental platform is displayed in Figure 5; it is mainly composed of a motor, a torque sensor, and a dynamometer. The test bearing is located at the motor drive end, and its model is SKF6205. EDM technology is applied to normal bearings to simulate bearings with different fault types and degrees. Bearings with inner ring faults, outer ring faults, and rolling ball faults with fault diameters of 0.1778, 0.3556, and 0.5334 mm are obtained, so there are ten different working states, including the normal working state and nine fault states. During the experiment, the speed of the motor is 1797 r/min, and the load is 0. The accelerometer is installed at the drive end to collect vibration signals of the rolling bearings under different working states with a sampling frequency of 12 kHz. Figure 6 and Table 2 give the details of the different working states of the bearings.
According to the fault diagnosis process described in Section 3.3, HAAPE is first employed to extract the fault features of all samples to obtain the original high-dimensional fault feature vectors. The mean entropy curves corresponding to different states are shown in Figure 7(a). Then, PWFP is used to rank the obtained original fault features, and the results are shown in Figure 7(b). The first eight features after ranking (i.e., the features at positions 12, 2, 3, 10, 13, 1, 14, and 9 in the original feature vectors) are selected as sensitive features in this paper to form the sensitive feature vectors. The criterion of the PWFP feature selection method is based on the minimum within-class distance and maximum between-class distance. Therefore, the eight sensitive features represent the most separable features in the original feature vector. The sensitive feature vectors of the training samples are input into the GWO-SVM-based fault classifier for training to construct the optimal SVM classifier. Then, the sensitive feature vectors of the testing samples are input for identification. The results are shown in Figure 8. It can be seen that the proposed method can effectively distinguish different fault types and fault degrees of rolling bearings, and the fault recognition accuracy is 100%, which shows that the proposed method has a good fault identification effect. To highlight the feature extraction advantages of the proposed HAAPE method, four multiscale entropy methods, namely, MAAPE, IMAAPE, MPE, and MSE, and two hierarchical entropy methods, namely, HPE and HSE, are each used to replace HAAPE in the proposed method for fault feature extraction, while the other diagnostic steps remain unchanged. The parameter settings of the different methods are displayed in Table 3.
The identification results of the different methods are shown in Table 4 and Figure 9, where "time" in Table 4 refers to the time required to extract the fault features of a single sample using the corresponding method. The experimental results show that the HAAPE method proposed in this paper and the IMAAPE method proposed by Chen et al. both achieve the highest recognition accuracy, reaching 100%. However, the computational efficiency of HAAPE is significantly higher than that of IMAAPE, so it extracts the fault features of samples faster. In addition, it can be observed that the recognition accuracy of each hierarchical entropy method is higher than that of the corresponding multiscale entropy method (HAAPE > MAAPE, HPE > MPE, HSE > MSE). The reason is that hierarchical entropy can extract the high-frequency and low-frequency features of the vibration signal simultaneously and overcomes the loss of high-frequency information in multiscale entropy, so its feature extraction performance is better. In terms of computational efficiency, the PE-based methods are significantly more efficient than the SE-based methods.
The computational cost of the proposed HAAPE method is slightly higher than that of MAAPE, HPE, and MPE, but it is sufficient to meet the needs of practical application. Overall, compared with the other methods, the feature extraction performance of the proposed HAAPE is better.
To study the relationship between the number of features and the recognition accuracy, and to demonstrate the necessity of using the PWFP method to reduce the feature dimension, the original features are first ranked by PWFP and the curve of recognition accuracy against the number of features is then drawn, as depicted in Figure 10. It can be seen that, as the number of features increases, the recognition accuracy of each method first increases and then remains basically stable. It is worth noting that the more features are used, the higher the training cost. Therefore, it is not necessary to use all the features for training and recognition. The ideal situation is to obtain the highest recognition accuracy with the fewest features, which indicates that it is necessary to adopt the PWFP method for feature selection. HAAPE, MAAPE, and IMAAPE reach their highest accuracy with the fewest features, 8 in each case. However, the highest recognition accuracy of HAAPE and IMAAPE is 100%, whereas that of MAAPE is 97.67%.
This phenomenon shows that, compared with the other methods, HAAPE and IMAAPE enjoy stronger robustness and can perfectly realize bearing fault identification with only a small number of features. In addition, according to Table 4, the computational cost of HAAPE is significantly lower than that of IMAAPE. Therefore, overall, HAAPE enjoys the best feature extraction performance among these methods.
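The rule "highest accuracy with the fewest features" can be sketched as follows. The accuracy values in `curve` are hypothetical placeholders chosen for illustration only; they are not the values plotted in Figure 10, and the function name `smallest_best_k` is likewise an assumption.

```python
def smallest_best_k(accuracies):
    """Return (k, accuracy) for the fewest top-ranked features reaching the best accuracy.

    accuracies[i] is the test accuracy obtained with the first (i + 1)
    features of the ranked list.
    """
    best = max(accuracies)
    # list.index returns the first occurrence, i.e. the smallest feature count.
    k = accuracies.index(best) + 1
    return k, best


# Hypothetical accuracy curve (percent), for illustration only.
curve = [62.0, 78.5, 90.0, 95.5, 97.0, 98.5, 99.0, 100.0, 100.0, 100.0]
k, best = smallest_best_k(curve)
print(k, best)
```

Because the curve flattens once the informative features are included, every feature beyond `k` adds training cost without improving accuracy.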

Case 2. Fault diagnosis based on gearbox
In the previous section, the rolling bearing fault data set was used to verify the performance of the proposed method. The experimental results show that our method can effectively identify different fault types and fault degrees of bearings and has advantages in recognition accuracy and robustness compared with other methods. To examine the universality of the proposed fault diagnosis model, a gear fault data set is used in this section to further verify the performance of the method.
The gear fault data are collected from the QPZZ-II rotating machinery fault experimental platform, whose structure is shown in Figure 11. The gearbox used in the experiment is a single reduction gear unit, in which the pinion is the driving wheel with 55 teeth and the gear wheel is the driven wheel with 75 teeth. Gears with different fault types replace the normal gears in the gearbox to simulate different fault states of the gearbox. There are four different faults: pinion wear, gear wheel pitting, gear wheel broken teeth, and gear wheel pitting + pinion wear. In the experiment, the motor is connected to the synchronous belt to drive the shaft and provide power for the gearbox. The motor speed is 880 r/min, and the load is 0. The acceleration sensor installed on the bearing seat on the motor side of the gearbox output shaft is used to collect the vibration signals of the gearbox under different states. The sampling frequency is 5120 Hz, and the sampling time is 10.4 s. Figure 12 shows the time-domain waveforms corresponding to the different working states. Due to the limited data length, this paper adopts a sliding window with a length of 2048 and a step size of 1024 for sampling, and 50 samples with a length of 2048 are taken for each working state, among which 20 samples are randomly selected as training samples and the remaining 30 samples are used as testing samples. Table 5 lists the details of the different working states of the gearbox.
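The segmentation described above can be sketched as follows. The record length is derived from the stated sampling frequency and duration (5120 Hz × 10.4 s = 53248 points). The zero-filled signal, the choice to keep the first 50 windows, and the random seed are assumptions made for illustration.

```python
import random


def sliding_windows(signal, length=2048, step=1024):
    """Cut a signal into overlapping windows of the given length and step."""
    last_start = len(signal) - length
    return [signal[s:s + length] for s in range(0, last_start + 1, step)]


n_points = round(5120 * 10.4)   # 53248 points per record
signal = [0.0] * n_points       # placeholder, not real vibration data

windows = sliding_windows(signal)
samples = windows[:50]          # assumption: keep the first 50 windows

rng = random.Random(0)          # fixed seed so the split is reproducible
train_idx = set(rng.sample(range(len(samples)), 20))
train = [w for i, w in enumerate(samples) if i in train_idx]
test = [w for i, w in enumerate(samples) if i not in train_idx]

print(len(windows), len(train), len(test))
```

With a 50% overlap, a record of this length yields 51 complete windows, so the 50 samples per working state reported above fit within a single record.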
As in Case 1, the HAAPE values of all samples are extracted to obtain the original fault feature vectors. The mean entropy curves corresponding to the different states are shown in Figure 13(a). It can be seen that not all features have good separability, so feature selection is necessary. The features sorted by PWFP are shown in Figure 13(b). The first 8 features are selected (that is, the features whose positions in the original feature vectors are 7, 12, 8, 15, 1, 5, 11, and 2) to form the sensitive feature vectors. The sensitive feature vectors of the training samples are input into GWO-SVM for training, and the optimal parameters of the SVM are determined as c = 361.09 and g = 140.34. Then, the sensitive feature vectors of the testing samples are input into the trained SVM for recognition. At the same time, in order to compare the performance of different feature extraction methods, MAAPE, IMAAPE, HPE, MPE, HSE, and MSE are used in place of HAAPE for feature extraction, respectively. The parameter settings of the different methods are shown in Table 3, and the diagnostic results are presented in Table 6 and Figure 14. It can be seen that the identification accuracy of the proposed method reaches 100%, which indicates that the method can effectively identify different gear fault types. Considering both computational efficiency and recognition accuracy, the proposed method clearly has superior performance. In addition, the recognition accuracy of the different methods shows that the feature extraction performance of hierarchical entropy is better than that of multiscale entropy, which agrees with the conclusion of Case 1 and thus further supports the advantage of hierarchical entropy over multiscale entropy.
Figure 9: The recognition results corresponding to different feature extraction methods.
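Forming a sensitive feature vector from the ranked positions quoted for Case 2 amounts to a simple index lookup, sketched below. The positions are taken from the text and treated as 1-based. The 15-element placeholder vector and the function name `select_features` are assumptions.

```python
# Positions of the 8 top-ranked features in the original vector (1-based, from the text).
RANKED_POSITIONS = [7, 12, 8, 15, 1, 5, 11, 2]


def select_features(vector, positions=RANKED_POSITIONS):
    """Pick the features at the given 1-based positions, preserving rank order."""
    return [vector[p - 1] for p in positions]


# Placeholder vector in which each value equals its own position,
# so the output makes the selection order visible.
full_vector = [float(i) for i in range(1, 16)]
sensitive = select_features(full_vector)
print(sensitive)
```

The same lookup must be applied to both training and testing samples so that the classifier sees the features in the same order.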
Similar to Case 1, to study the relationship between the number of features and recognition accuracy, after ranking the original features using PWFP, the relationship curve between the number of features and recognition accuracy is drawn, as shown in Figure 15. It can be observed that the proposed method only needs seven features to achieve a recognition accuracy of 100%, which not only demonstrates that this method is less dependent on the number of features and enjoys strong robustness but also proves the necessity of using PWFP for feature selection. In general, Case 2 once again proves the effectiveness of the proposed method and its advantages over other methods, which indicates that our method has good universality and provides a new idea for the fault diagnosis of rotating machinery.
[Figure axes: hierarchical nodes, entropy value]

Conclusion
To accurately identify different fault states of rotating machinery, a new fault diagnosis method based on HAAPE, PWFP, and GWO-SVM is proposed in this paper, and the performance of the proposed method is verified using two fault data sets of bearing and gearbox. The main work of this paper can be summarized as follows:
(1) To address the shortcoming that multiscale amplitude-aware permutation entropy (MAAPE) can only extract the low-frequency features of time series and ignores the high-frequency features, this paper proposes hierarchical amplitude-aware permutation entropy (HAAPE). By constructing high-frequency and low-frequency operators, HAAPE can extract the high-frequency and low-frequency information of time series at the same time, which effectively overcomes the information loss problem in MAAPE.
(2) Using HAAPE as the feature extraction method and combining it with the PWFP feature selection method and the GWO-SVM classifier, a new fault diagnosis method for rotating machinery is proposed. First, the fault features of rotating machinery are extracted by HAAPE. Then, the original features are ranked by PWFP, and the sensitive features are screened to form sensitive feature vectors. Finally, the obtained sensitive feature vectors are input into the GWO-SVM classifier for training and recognition. The performance of the proposed method is verified using two data sets of bearing and gearbox, and the results show that the proposed method can efficiently and accurately identify different fault types of rotating machinery.
(3) To compare the proposed method with existing methods, a series of comparative experiments is carried out. The results show that HAAPE is superior to MAAPE, IMAAPE, HPE, MPE, HSE, and MSE in feature extraction performance and robustness. In addition, we study the relationship between the number of features and the recognition accuracy and demonstrate the necessity of using the PWFP-based dimensionality reduction method.
Data Availability
The rolling bearing data are provided by the Case Western Reserve University and can be downloaded from https://csegroups.case.edu/bearingdatacenter/pages/download-data-file.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.