Principal Components of Superhigh-Dimensional Statistical Features and Support Vector Machine for Improving Identification Accuracies of Different Gear Crack Levels under Different Working Conditions

Gears are widely used in gearboxes to transmit power from one shaft to another. Gear crack is one of the most frequent gear fault modes found in industry. Identification of different gear crack levels helps prevent unexpected machine breakdowns and reduce economic losses, because a gear crack leads to gear tooth breakage. In this paper, an intelligent fault diagnosis method for identification of different gear crack levels under different working conditions is proposed. First, superhigh-dimensional statistical features are extracted from the continuous wavelet transform at different scales. The proposed method extracts 920 statistical features, so the extracted features are superhigh dimensional. To reduce their dimensionality and generate new significant low-dimensional statistical features, a simple and effective method called principal component analysis is used. To further improve identification accuracies of different gear crack levels under different working conditions, a support vector machine is employed. Three experiments are investigated to show the superiority of the proposed method, and comparisons with other existing gear crack level identification methods are conducted. The results show that the proposed method has the highest identification accuracies among all existing methods.


Introduction
Gears are commonly used in mechanical transmission systems to transmit power from one shaft to another. Because a gear is a mechanical component, its performance degrades over time in service [1][2][3][4]. Gear crack evaluation is the special case of this degradation in which a gear crack is taken as the first agent of the gear performance degradation process [5]. Identification of different gear crack levels helps prevent unexpected machine breakdowns and reduce economic losses, because a gear crack leads to gear tooth breakage [6]. Vibration analysis is a major tool for diagnosing gear faults because vibration signals are easily collected from the gearbox casing [7,8]. In past years, wavelet analysis of vibration signals has been widely used to diagnose gear faults [9][10][11]. Additionally, in recent years, to diagnose the multistage gearbox used in a bucket wheel excavator system, Bartelmus and Zimroz [12][13][14][15][16] proposed a simple and effective health indicator for distinguishing the good and bad health conditions of the multistage gearbox. Data collected from such complex machines are nonstationary gearbox vibration signals under time-varying working conditions. Because a gearbox in a bad health condition is more susceptible to external varying loads, the proposed health indicator was designed to be a function of the instantaneous input speed. The results show that the health indicator has a linear relationship with the instantaneous input speed when the gearbox is in either the good or the bad health condition. Moreover, as the gearbox degrades, the slope of this linear relationship increases.
Wavelet analysis is one of the most popular methods for diagnosing gear faults. When wavelet analysis is used, selecting a proper wavelet basis is crucial because wavelet analysis computes the inner product between a signal and the wavelet basis. Recently, Rafiee et al. [17,18] conducted an exhaustive study of the performance of 324 mother wavelet candidates for gear fault feature extraction. Their results showed that Daubechies 44 has the shape most similar to gear fault features. Even though gear fault diagnosis has become a hot topic, according to our literature review only a few methods, such as the mean frequency of the Scalogram [19], instantaneous energy density [20], regularization dimension [21], cyclic spectral analysis [22], and self-similarity [23], are closely related to identification of different gear crack levels. However, because these methods are based on signal processing, interpreting their results requires expertise.
To identify different gear crack levels automatically, intelligent methods [24][25][26] need to be developed. To fill this gap, a series of experiments [5] was first conducted to collect gear vibration signals under different working conditions, comprising four motor speeds and three loads. Five artificial gear crack levels (crack levels 0%, 25%, 50%, 75%, and 100%) were produced to simulate gear crack deterioration. The definition of these gear crack levels is given in Section 3.1. Then, a weighted k-nearest neighbor algorithm [24] was proposed to identify three gear crack levels (crack levels 0%, 25%, and 50%) under the four motor speeds and three loads, because identification of early gear crack levels is more useful for preventive maintenance. The results showed that the weighted k-nearest neighbor algorithm achieves high prediction accuracies in identifying these three gear crack levels under the four motor speeds and three loads.
In this paper, to improve on the work reported in [24], identification of three gear crack levels under different loads and speeds is extended to identification of five gear crack levels under different loads and speeds; that is, two more crack deterioration levels, crack level 75% and crack level 100%, are considered. The gear crack levels, ranging from 0% to 100% in steps of 25%, describe the whole gear crack deterioration process. Because the number of gear crack levels increases, identifying the crack level under different loads and speeds becomes more complicated and difficult. Therefore, a more advanced gear crack identification method is needed.
The rest of this paper is organized as follows. The proposed method for identification of different gear crack levels under different loads and speeds is introduced in Section 2. Three experiments are investigated in Section 3 to illustrate how the proposed method works, and comparisons with other existing gear crack level identification methods are conducted. Conclusions are drawn in Section 4.

The Proposed Method for Identification of Different Gear Crack Levels under Different Motor Speeds and Loads
The proposed method for identification of five different gear crack levels under different loads and speeds is summarized in Figure 1, and its mathematical formulas are introduced in the following subsections. First, to represent different gear crack levels, statistical features must be extracted from a gear vibration signal. Traditionally, statistical features are extracted directly from the temporal gear vibration signal and its frequency spectrum, which is obtained by applying the Fourier transform to the temporal signal. Such features can be regarded as "global" statistical features. These "global" statistical features are useful if the signal-to-noise ratio of the gear vibration signal is high, that is, if the fault features caused by a gear crack can be clearly found in the gear vibration signal and its frequency spectrum. However, besides the fault features caused by a gear crack, the gear vibration signal contains noise and many unknown vibration components, so it is necessary to enhance the signal-to-noise ratio of the gear vibration signal before the statistical features are extracted. Unlike the Fourier transform, which decomposes the gear vibration signal into a sum of global complex exponentials, the continuous wavelet transform uses an inner product to measure the local similarity between a gear vibration signal and a wavelet mother function. The wavelet mother function is a locally oscillating analyzing function that can be shifted and scaled: the smaller the scale, the more compressed the wavelet mother function; the larger the scale, the more stretched it is. The continuous wavelet transform at different scales therefore facilitates detecting different local characteristics of the gear vibration signal, such as the features generated by a gear crack. In this paper, the continuous wavelet transform at different scales, namely, the Scalogram [27,28], is applied to the gear vibration signal to highlight local gear fault features. Statistical features extracted from the continuous wavelet coefficients can be regarded as "local" statistical features and can better represent different gear crack levels. Additionally, because the wavelet functions at some of these scales have very similar shapes, the resulting wavelet coefficients are highly redundant, and so are the statistical features extracted from them.
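To make the scale-dependent matching concrete, here is a minimal NumPy sketch of a discretized continuous wavelet transform. It assumes a real Mexican hat (Ricker) wavelet as a stand-in for Daubechies 44, which has no closed-form expression; the function names `ricker` and `cwt_ricker` and the truncation of the wavelet at about five scales are illustrative choices, not the authors' implementation. Because the Ricker wavelet is real and symmetric, the complex conjugate and the time reversal drop out, and each scale reduces to a convolution.

```python
import numpy as np

def ricker(t):
    """Mexican hat (Ricker) mother wavelet, unnormalized: (1 - t^2) exp(-t^2 / 2)."""
    return (1.0 - t**2) * np.exp(-0.5 * t**2)

def cwt_ricker(x, scales):
    """Discretized CWT: W[a, b] = sum_k x[b + k] * psi(k / a) / sqrt(a).

    Returns an array of shape (len(scales), len(x)), i.e. a time-scale map.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    W = np.empty((len(scales), n))
    for row, a in enumerate(scales):
        # Truncate the wavelet to about +/-5 scales, but never longer than the signal
        half = min(int(np.ceil(5 * a)), (n - 1) // 2)
        k = np.arange(-half, half + 1)
        psi = ricker(k / a) / np.sqrt(a)
        # psi is real and even, so correlating with it equals convolving with it
        W[row] = np.convolve(x, psi, mode="same")
    return W

# Usage: an isolated impulse, loosely mimicking a crack-induced shock
x = np.zeros(512)
x[200] = 1.0
W = cwt_ricker(x, range(1, 46))       # scales 1..45, as in the text
print(W.shape)                        # (45, 512)
print(int(np.argmax(np.abs(W[4]))))   # 200: at scale 5 the response peaks at the impulse
```

A one-dimensional signal of length N becomes a 45 × N array, which is where the redundancy discussed above comes from.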
Ten popular statistical features, namely, mean, standard deviation, root mean square, peak, skewness, kurtosis, crest factor, clearance factor, shape factor, and impulse factor, are applied to the Scalogram at scales 1 to 45 and to the corresponding frequency spectra; together with the same features extracted from the original signal and its frequency spectrum, this yields 920 statistical features. Compared with other intelligent gear crack level identification methods [24][25][26], in which only 10 to 30 statistical features were extracted, these 920 statistical features are highly redundant and superhigh dimensional, so it is impractical to use all of them directly to train and test statistical models. Before any statistical model is used, the dimensionality of the 920 redundant statistical features must therefore be reduced. Many dimensionality reduction methods exist, both linear and nonlinear [30]. Nonlinear dimensionality reduction methods have the following disadvantages. First, their computational efficiency is low. Second, they require a large amount of computer memory, without which they fail to generate low-dimensional features. Third, even though they can process training data, mapping testing data to the low-dimensional space, namely, out-of-sample extension, is still questionable and often results in distinct estimation errors. Besides, through several numerical and real case studies, [30] concluded that nonlinear dimensionality reduction methods do not outperform traditional linear methods such as principal component analysis. Therefore, a simple and efficient linear dimensionality reduction method, principal component analysis [31], is employed in this paper to generate new significant low-dimensional statistical features, namely, principal components, to distinguish different gear crack levels under the different working conditions.
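The ten statistics and the 920-dimensional feature layout can be sketched as follows. The text names the ten statistics without reproducing their formulas here, so the definitions below are common textbook ones and are assumptions: population (1/N) moments, peak taken as max|x|, and clearance factor computed as max|x| divided by the squared mean of sqrt|x|. The "frequency spectrum" is likewise assumed to be the one-sided FFT magnitude from `np.fft.rfft`. The names `ten_features`, `magnitude_spectrum`, and `superhigh_features` are hypothetical.

```python
import numpy as np

N_SCALES = 45  # CWT scales 1..45

def ten_features(x):
    """Ten common condition-monitoring statistics of a 1-D array (assumed definitions).

    No guard is included for constant inputs, where skewness and kurtosis are undefined.
    """
    x = np.asarray(x, dtype=float)
    mean, std = x.mean(), x.std()             # population (1/N) standard deviation
    rms = np.sqrt(np.mean(x**2))
    abs_x = np.abs(x)
    peak = abs_x.max()
    return {
        "mean": mean,
        "standard_deviation": std,
        "root_mean_square": rms,
        "peak": peak,
        "skewness": np.mean((x - mean) ** 3) / std**3,
        "kurtosis": np.mean((x - mean) ** 4) / std**4,
        "crest_factor": peak / rms,
        "clearance_factor": peak / np.mean(np.sqrt(abs_x)) ** 2,
        "shape_factor": rms / abs_x.mean(),
        "impulse_factor": peak / abs_x.mean(),
    }

def magnitude_spectrum(v):
    """Assumed 'frequency spectrum': magnitude of the one-sided FFT."""
    return np.abs(np.fft.rfft(v))

def superhigh_features(x, W):
    """Stack 45*10 + 45*10 + 2*10 = 920 features for one vibration sample.

    x -- raw vibration signal, shape (N,); W -- CWT coefficients, shape (45, N).
    """
    x = np.asarray(x, dtype=float)
    if W.shape != (N_SCALES, len(x)):
        raise ValueError("W must have shape (45, len(x))")
    arrays = list(W)                                   # wavelet coefficients per scale
    arrays += [magnitude_spectrum(row) for row in W]   # their spectra
    arrays += [x, magnitude_spectrum(x)]               # raw signal and its spectrum
    return np.array([value for a in arrays for value in ten_features(a).values()])

f = ten_features([0.0, 0.0, 0.0, 4.0])
print(round(f["kurtosis"], 4), f["clearance_factor"])   # 2.3333 16.0

rng = np.random.default_rng(0)
x = rng.normal(size=1024)                 # stand-in vibration sample
W = rng.normal(size=(N_SCALES, 1024))     # stand-in CWT coefficients
print(superhigh_features(x, W).shape)     # (920,)
```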
Finally, to ensure high training and testing accuracies, a support vector machine [32,33] is used to identify different gear crack levels under the different working conditions, because it can use the kernel trick to map the new significant statistical features to a high-dimensional feature space in which linear classification is possible.
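The kernel trick can be checked numerically on a case where the feature map is small enough to write out. The sketch below uses the homogeneous quadratic kernel $K(\mathbf{x},\mathbf{y})=(\mathbf{x}\cdot\mathbf{y})^2$ in two dimensions, whose explicit map is $\phi(\mathbf{x})=(x_1^2,\sqrt{2}x_1x_2,x_2^2)$. It only illustrates the principle; it is not the Gaussian kernel used later, whose feature space is infinite dimensional. The names `poly2_kernel` and `phi` are hypothetical.

```python
import numpy as np

def poly2_kernel(x, y):
    """Homogeneous quadratic kernel K(x, y) = (x . y)^2, evaluated in the input space."""
    return float(np.dot(x, y)) ** 2

def phi(x):
    """Explicit feature map for poly2_kernel in 2-D: R^2 -> R^3."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2.0) * x1 * x2, x2**2])

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
print(poly2_kernel(x, y))              # 121.0, computed without leaving R^2
print(float(np.dot(phi(x), phi(y))))   # same value up to rounding, via the explicit R^3 map
```

The kernel evaluates the high-dimensional inner product without ever constructing $\phi$, which is what lets a support vector machine work in feature spaces that are too large to build explicitly.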

Shock and Vibration
The continuous wavelet transform of a signal $x(t)$ is
$$W(a,b)=\frac{1}{\sqrt{a}}\int_{-\infty}^{\infty}x(t)\,\psi^{*}\!\left(\frac{t-b}{a}\right)dt, \qquad (1)$$
where $\psi$ is the wavelet mother function, $a>0$ is the scale, $b$ is the time shift, $*$ is the complex conjugate operator, and $W(a,b)$ are the wavelet coefficients. From (1), the continuous wavelet transform converts a one-dimensional signal into a two-dimensional time-scale representation, which generates redundant wavelet coefficients at different scales. After the continuous wavelet transform, the 10 statistical features listed below are applied to quantify the wavelet coefficients at scales 1 to 45 and their corresponding frequency spectra. Adding 20 statistical features extracted from the original signal (without the continuous wavelet transform) and its frequency spectrum gives $45\times10+45\times10+2\times10=920$ redundant statistical features in all. The statistical features used to quantify the wavelet coefficients at each scale are as follows (the same features quantify the frequency spectra of the wavelet coefficients, and $N$ is the length of the signal).

Mean value
Root mean square
Standard deviation
Skewness
Kurtosis
Crest factor
Clearance factor
Shape factor
Impulse factor
Maximum

2.2. Dimensionality Reduction.
In the previous section, the superhigh-dimensional statistical features were extracted by means of the continuous wavelet transform. The dimensionality of the statistical features is 920. Using all 920 statistical features directly for identification of different gear crack levels under the different working conditions leads to the curse of dimensionality [34], meaning that the amount of data needed to support any result grows exponentially with the dimensionality. To relieve this problem, dimensionality reduction is conducted before support vector machines are used. As discussed at the beginning of Section 2, principal component analysis is chosen in this paper, and its fundamentals are introduced as follows [31]. Suppose that $M$ training samples of different gear crack levels are obtained. Based on the $M$ training samples and the 920 statistical features extracted from each sample, an $M\times920$ feature matrix is constructed:
$$\mathbf{F}=\begin{bmatrix}f_{1,1}&f_{1,2}&\cdots&f_{1,920}\\ f_{2,1}&f_{2,2}&\cdots&f_{2,920}\\ \vdots&\vdots&\ddots&\vdots\\ f_{M,1}&f_{M,2}&\cdots&f_{M,920}\end{bmatrix}. \qquad (12)$$
The statistical features in (12) are highly redundant. Principal component analysis generates significant new statistical features from the high-dimensional space and forms a low-dimensional orthogonal space to express different gear crack levels. Each newly generated feature is called a principal component. The first principal component has the greatest variance, the second principal component has the second greatest variance, and so on. Suppose that each column (feature) of (12) has zero mean and unit variance. To find the direction of greatest variance, the following optimization problem is constructed:
$$\max_{\mathbf{w}}\ \mathbf{w}^{T}\mathbf{F}^{T}\mathbf{F}\mathbf{w}\quad\text{subject to}\quad\mathbf{w}^{T}\mathbf{w}=1, \qquad (13)$$
where $T$ is the transpose operator. The Lagrange function of (13) is
$$L(\mathbf{w},\lambda)=\mathbf{w}^{T}\mathbf{F}^{T}\mathbf{F}\mathbf{w}-\lambda\left(\mathbf{w}^{T}\mathbf{w}-1\right), \qquad (14)$$
where $\lambda$ is the Lagrange multiplier. The first partial derivative of $L$ with respect to $\mathbf{w}$ is
$$\frac{\partial L}{\partial\mathbf{w}}=2\mathbf{F}^{T}\mathbf{F}\mathbf{w}-2\lambda\mathbf{w}. \qquad (15)$$
Setting (15) to zero shows that $\mathbf{w}$ and the Lagrange multiplier satisfy the eigenvalue equation of the symmetric matrix $\mathbf{F}^{T}\mathbf{F}$:
$$\mathbf{F}^{T}\mathbf{F}\mathbf{w}=\lambda\mathbf{w}. \qquad (16)$$
Because (16) gives $\mathbf{w}^{T}\mathbf{F}^{T}\mathbf{F}\mathbf{w}=\lambda\mathbf{w}^{T}\mathbf{w}=\lambda$, the objective of (13) equals the eigenvalue, so the eigenvector with the largest eigenvalue yields the first principal component. Collecting the eigenvectors as columns of $\mathbf{w}$, let the first column be the eigenvector corresponding to the largest eigenvalue, the second column the eigenvector corresponding to the second largest eigenvalue, and so on. Consequently, the feature matrix in (12) can be mapped to a new space consisting of the principal components:
$$\mathbf{t}=\mathbf{F}\mathbf{w}, \qquad (17)$$
where the first column of $\mathbf{t}$ is the first principal component $\mathbf{t}_1$, the second column is the second principal component $\mathbf{t}_2$, and so on. For a fair comparison with the existing gear crack level identification methods reported in [24], where seven statistical features were selected from 25, the first seven principal components obtained by the proposed method are used to train and test support vector machines. Besides, the testing data can be mapped directly to the principal component space by using the established linear transformation matrix $\mathbf{w}$.
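Equations (12) to (17) can be sketched in NumPy as follows. The sketch standardizes each feature with the training mean and standard deviation (the zero-mean, unit-variance assumption above), eigendecomposes $\mathbf{F}^{T}\mathbf{F}$ with `np.linalg.eigh`, and reuses the training statistics and $\mathbf{w}$ to map testing data; applying the training standardization to the testing data is an assumption about how that mapping is done. The function names are hypothetical, the random matrix only stands in for real features, and the signs of the eigenvectors are arbitrary.

```python
import numpy as np

def fit_pca(F_train, k=7):
    """Return (mu, sd, w, eigvals) for the first k principal components."""
    F_train = np.asarray(F_train, dtype=float)
    mu = F_train.mean(axis=0)
    sd = F_train.std(axis=0)
    sd[sd == 0] = 1.0                             # leave constant features unscaled
    Z = (F_train - mu) / sd                       # zero-mean, unit-variance columns
    eigvals, eigvecs = np.linalg.eigh(Z.T @ Z)    # symmetric matrix, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]             # largest eigenvalue first, as in (16)
    w = eigvecs[:, order[:k]]
    return mu, sd, w, eigvals[order[:k]]

def to_principal_components(F, mu, sd, w):
    """Map feature rows to principal components, t = F w, as in (17)."""
    return ((np.asarray(F, dtype=float) - mu) / sd) @ w

rng = np.random.default_rng(0)
F_train = rng.normal(size=(60, 920))              # stand-in for the M x 920 matrix (12)
mu, sd, w, lam = fit_pca(F_train, k=7)
t_train = to_principal_components(F_train, mu, sd, w)
t_test = to_principal_components(rng.normal(size=(10, 920)), mu, sd, w)
print(t_train.shape, t_test.shape)                # (60, 7) (10, 7)
```

The sum of squares of each column of `t_train` equals the corresponding eigenvalue, which is the variance ordering that makes the first seven components the most informative ones.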

Identification of Different Gear Crack Levels under Different Working Conditions.
To identify different gear crack levels automatically under different motor speeds and loads, a support vector machine [32] is used in this paper. It is a supervised learning method that has been widely investigated in past years for solving various classification and regression problems. Given training data $\{(\mathbf{y}_i,c_i)\}_{i=1}^{n}$ of two different gear crack levels, where $\mathbf{y}_i$ is the statistical feature vector with dimensionality $d$ and $c_i\in\{-1,+1\}$ is the binary classification label, if the training data are linearly separable, a linear decision function can be determined by solving the following optimization problem [32]:
$$\min_{\mathbf{v},b}\ \frac{1}{2}\lVert\mathbf{v}\rVert^{2}\quad\text{subject to}\quad c_i\left(\mathbf{v}\cdot\mathbf{y}_i+b\right)\ge1,\quad i=1,\ldots,n, \qquad (18)$$
where $\mathbf{v}$ is the normal vector to the decision function, $\cdot$ is the dot product, and $b$ is the offset of the decision function.
Minimizing the objective function of (18) maximizes the margin, that is, the distance between the two parallel hyperplanes that bound the two classes with no training data between them; this distance equals two divided by the norm of the normal vector. In other words, the linear decision function lies as far as possible from the nearest training data.
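A small worked instance of (18) shows the margin numerically. For the four hypothetical 2-D points below, the hyperplane with normal vector $\mathbf{v}=(1,1)$ and offset $b=-3$ satisfies every constraint of (18), with equality at the two points closest to it. Subtracting the constraints at those two points forces $v_1+v_2\ge2$, so no smaller $\lVert\mathbf{v}\rVert$ is feasible and this hyperplane is the solution; its margin is $2/\lVert\mathbf{v}\rVert=\sqrt{2}$.

```python
import numpy as np

# Hypothetical toy data: two classes in 2-D with labels c_i in {-1, +1}
Y = np.array([[2.0, 2.0], [3.0, 3.0], [1.0, 1.0], [0.0, 0.0]])
c = np.array([1.0, 1.0, -1.0, -1.0])

v = np.array([1.0, 1.0])   # normal vector of the decision function
b = -3.0                   # offset of the decision function

functional_margins = c * (Y @ v + b)   # left-hand sides of the constraints in (18)
print(functional_margins)              # [1. 3. 1. 3.]: all >= 1, equality at the support vectors
print(2.0 / np.linalg.norm(v))         # geometric margin 2/||v|| = sqrt(2), about 1.414
```

The two points with functional margin exactly 1, (2, 2) and (1, 1), are the support vectors, and their distance from each other equals the margin.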
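Considering noise, with slack variables $\xi_i$ and the error penalty constant $C$, (18) is revised as [32]
$$\min_{\mathbf{v},b,\boldsymbol{\xi}}\ \frac{1}{2}\lVert\mathbf{v}\rVert^{2}+C\sum_{i=1}^{n}\xi_i\quad\text{subject to}\quad c_i\left(\mathbf{v}\cdot\mathbf{y}_i+b\right)\ge1-\xi_i,\ \ \xi_i\ge0,\quad i=1,\ldots,n. \qquad (19)$$
Solving (19) is equivalent to solving the following Lagrangian problem with Lagrange multipliers $\alpha_i\ge0$ and $\mu_i\ge0$:
$$\min_{\mathbf{v},b,\boldsymbol{\xi}}\ \max_{\boldsymbol{\alpha},\boldsymbol{\mu}}\ \frac{1}{2}\lVert\mathbf{v}\rVert^{2}+C\sum_{i=1}^{n}\xi_i-\sum_{i=1}^{n}\alpha_i\left[c_i\left(\mathbf{v}\cdot\mathbf{y}_i+b\right)-1+\xi_i\right]-\sum_{i=1}^{n}\mu_i\xi_i. \qquad (20)$$
Setting the derivatives of (20) with respect to $\mathbf{v}$, $b$, and $\xi_i$ to zero gives
$$\mathbf{v}=\sum_{i=1}^{n}\alpha_ic_i\mathbf{y}_i,\qquad\sum_{i=1}^{n}\alpha_ic_i=0,\qquad C-\alpha_i-\mu_i=0. \qquad (21)$$
Substituting (21) into (20) turns (20) into a dual quadratic optimization problem [32]:
$$\max_{\boldsymbol{\alpha}}\ \sum_{i=1}^{n}\alpha_i-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_jc_ic_j\,\mathbf{y}_i\cdot\mathbf{y}_j\quad\text{subject to}\quad C\ge\alpha_i\ge0,\ i=1,\ldots,n,\quad\sum_{i=1}^{n}\alpha_ic_i=0, \qquad (22)$$
where the upper bound $C$ follows from $\mu_i=C-\alpha_i\ge0$. After solving (22), the linear decision function is
$$f(\mathbf{y})=\operatorname{sign}\left(\sum_{i=1}^{n}\alpha_ic_i\,\mathbf{y}_i\cdot\mathbf{y}+b\right). \qquad (23)$$
To extend the linear classification problem to a nonlinear one, the kernel trick can be used to map the training data to a high-dimensional space, where the classification problem may become linearly solvable, before (23) is established. With a kernel function, (23) becomes [32]
$$f(\mathbf{y})=\operatorname{sign}\left(\sum_{i=1}^{n}\alpha_ic_i\,K(\mathbf{y}_i,\mathbf{y})+b\right), \qquad (24)$$
where $K(\mathbf{y}_i,\mathbf{y})$ is a linear or nonlinear kernel function that should satisfy Mercer's theorem. Three popular kernel functions are the linear, polynomial, and Gaussian radial basis functions [32]. Generally, the Gaussian radial basis function is the preferable choice for the support vector machine because, unlike the linear kernel, it can produce nonlinear decision boundaries:
$$K(\mathbf{y}_i,\mathbf{y})=\exp\left(-\frac{\lVert\mathbf{y}_i-\mathbf{y}\rVert^{2}}{2\sigma^{2}}\right), \qquad (25)$$
where $\sigma$ is the kernel parameter and $\lVert\cdot\rVert$ is the Euclidean norm of the feature vector. To classify five different gear crack levels, the popular one-against-all and one-against-one strategies can be used [32,33].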
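The dual (22), the kernel decision function (24), the Gaussian kernel (25), and the one-against-one strategy can be sketched together. The solver below is the well-known simplified sequential minimal optimization (SMO) heuristic, swapped in for illustration; it is not necessarily the solver behind [32,33] or the one the authors used. The function names, the defaults for $C$ and $\sigma$, and the toy clusters are hypothetical. Each binary machine separates one pair of classes, and each testing sample receives one vote per pair.

```python
import itertools
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix, K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2)), cf. (25)."""
    d2 = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def smo_train(Y, c, C=10.0, sigma=1.0, tol=1e-3, max_passes=5, max_iter=1000, seed=0):
    """Simplified SMO for the dual (22) with a Gaussian kernel; labels c in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n = len(c)
    K = rbf_kernel(Y, Y, sigma)
    alpha, b, passes = np.zeros(n), 0.0, 0
    for _ in range(max_iter):
        if passes >= max_passes:                 # stop after several sweeps with no update
            break
        changed = 0
        for i in range(n):
            Ei = (alpha * c) @ K[:, i] + b - c[i]          # prediction error on sample i
            if (c[i] * Ei < -tol and alpha[i] < C) or (c[i] * Ei > tol and alpha[i] > 0):
                j = int(rng.integers(n - 1))
                if j >= i:                                 # random partner j != i
                    j += 1
                Ej = (alpha * c) @ K[:, j] + b - c[j]
                ai, aj = alpha[i], alpha[j]
                if c[i] != c[j]:                           # box keeps sum(alpha * c) = 0
                    L, H = max(0.0, aj - ai), min(C, C + aj - ai)
                else:
                    L, H = max(0.0, ai + aj - C), min(C, ai + aj)
                eta = 2.0 * K[i, j] - K[i, i] - K[j, j]
                if L == H or eta >= 0:
                    continue
                new_aj = np.clip(aj - c[j] * (Ei - Ej) / eta, L, H)
                if abs(new_aj - aj) < 1e-5:
                    continue
                alpha[j] = new_aj
                alpha[i] = ai + c[i] * c[j] * (aj - new_aj)
                b1 = b - Ei - c[i] * (alpha[i] - ai) * K[i, i] - c[j] * (alpha[j] - aj) * K[i, j]
                b2 = b - Ej - c[i] * (alpha[i] - ai) * K[i, j] - c[j] * (alpha[j] - aj) * K[j, j]
                b = b1 if 0 < alpha[i] < C else b2 if 0 < alpha[j] < C else 0.5 * (b1 + b2)
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b

def decision_values(Y_train, c, alpha, b, Y, sigma):
    """sum_i alpha_i c_i K(y_i, y) + b, the argument of sign(.) in (24)."""
    return (alpha * c) @ rbf_kernel(Y_train, Y, sigma) + b

def ovo_fit_predict(Y_train, labels, Y_test, C=10.0, sigma=1.0):
    """One-against-one: one binary SVM per pair of classes, then a majority vote."""
    classes = np.unique(labels)
    votes = np.zeros((len(Y_test), len(classes)), dtype=int)
    for p, q in itertools.combinations(range(len(classes)), 2):
        mask = (labels == classes[p]) | (labels == classes[q])
        c = np.where(labels[mask] == classes[p], 1.0, -1.0)
        alpha, b = smo_train(Y_train[mask], c, C=C, sigma=sigma)
        f = decision_values(Y_train[mask], c, alpha, b, Y_test, sigma)
        votes[:, p] += f >= 0
        votes[:, q] += f < 0
    return classes[np.argmax(votes, axis=1)]

# Usage on hypothetical, well-separated 2-D clusters standing in for five crack levels
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [4, 0], [0, 4], [4, 4], [8, 8]], dtype=float)
Y_train = np.vstack([m + 0.3 * rng.normal(size=(10, 2)) for m in centers])
y_train = np.repeat(np.arange(5), 10)
Y_test = np.vstack([m + 0.3 * rng.normal(size=(4, 2)) for m in centers])
y_test = np.repeat(np.arange(5), 4)
pred = ovo_fit_predict(Y_train, y_train, Y_test, C=10.0, sigma=1.0)
print("accuracy:", np.mean(pred == y_test))
```

With five classes, one-against-one trains ten binary machines; one-against-all would train five, each on all of the data.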

Experimental Platform.
The experiments used in this paper were designed by one of the coauthors [5] to collect data at different gear crack levels under different motor speeds and loads from the experimental setup shown in Figures 3(a) and 3(b).
The experimental setup included a gearbox, a 3-hp ac motor that drove the input shaft of the gearbox, and a magnetic brake that provided different loads. Four motor speeds (1200 rpm, 1400 rpm, 1600 rpm, and 1800 rpm) and three loads (no load, half load, and full load) were used. Gears 1, 2, 3, and 4 had 48, 16, 24, and 40 teeth, respectively, and gear 3 was the tested gear. Artificial gear cracks, denoted crack levels 0%, 25%, 50%, 75%, and 100%, were produced to simulate the whole range of gear crack deterioration, and their geometries are tabulated in Table 1. The crack thickness was 0.4 mm because the thinnest knife available in the coauthor's lab was 0.4 mm. For the four nonzero crack levels, the crack depths were 0.25, 0.5, 0.75, and 1 times half of the chordal tooth thickness, respectively, because a tooth breaks rapidly once the crack depth exceeds half of the chordal tooth thickness. Here, the chordal tooth thickness is the tooth thickness at the pitch line. The crack widths were 0.25, 0.5, 0.75, and 1 times the face width of 25 mm, respectively. The crack angle was 45 degrees. The crack angle is sketched in Figure 4(a), and the face width and chordal tooth thickness in Figure 4(b). The vibration signals were measured by two acceleration sensors produced by PCB Electronics (model number 352C67), mounted on the gearbox casing in the vertical and horizontal directions, respectively. In [5], the vertical direction was reported to be more sensitive to the different crack levels, so the vibration signals collected in the vertical direction were used in this paper. The sampling frequency was 5120 Hz, and each sample contained 8192 points. For each working condition, two samples were collected, giving 2 samples × 3 loads × 4 speeds × 5 gear health conditions = 120 samples in all. Considering different combinations of motor speeds and loads, three experiments similar to those designed in [24] were used in this paper. Compared with the experiments in [24], two more gear crack levels, 75% and 100%, were considered, which makes gear crack level identification more difficult. The designed experiments are tabulated in Table 2. In the first experiment, for each gear crack level, 24 samples were collected under each combination of the 4 motor speeds and 3 loads, giving 24 × 4 × 3 × 5 samples in all. Half of the samples (12 × 4 × 3 × 5) were used in the training phase, and the other half (12 × 4 × 3 × 5) in the testing phase.
In the training phase of the second experiment, for each gear crack level, 12 samples were collected under each combination of 2 motor speeds and the 3 loads, giving 12 × 2 × 3 × 5 training samples. In the testing phase, for each gear crack level, another 12 samples were collected under each combination of the other 2 motor speeds and the 3 loads, giving 12 × 2 × 3 × 5 testing samples. Experiment 2 is designed to investigate the influence of different motor speeds on the prediction accuracies of the proposed method.
In the training phase of the third experiment, for each gear crack level, 8 samples were collected under each combination of the 4 motor speeds and 1 load, giving 8 × 4 × 1 × 5 training samples. In the testing phase, for each gear crack level, 16 samples were collected under each combination of the same 4 motor speeds and the other 2 loads, giving 16 × 4 × 2 × 5 testing samples. Experiment 3 is designed to investigate the influence of different loads on the prediction accuracies of the proposed method.
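The sample counts of the three experiments follow from crossing the working conditions with the five crack levels. In the sketch below, which two speeds form the training group in experiment 2 and which load is used for training in experiment 3 are hypothetical choices, since the contents of Table 2 are not reproduced here; only the per-condition counts are taken from the text. The helper names `design` and `count` are hypothetical.

```python
from itertools import product

SPEEDS = (1200, 1400, 1600, 1800)   # motor speeds, rpm
LOADS = ("N", "H", "F")              # no load, half load, full load
LEVELS = (0, 25, 50, 75, 100)        # crack levels, %

def design(speeds, loads, per_condition):
    """Rows (speed, load, crack level, number of samples) for one phase."""
    return [(s, load, lvl, per_condition) for s, load, lvl in product(speeds, loads, LEVELS)]

def count(rows):
    return sum(r[3] for r in rows)

experiments = {
    # experiment 1: every working condition appears in both phases
    1: (design(SPEEDS, LOADS, 12), design(SPEEDS, LOADS, 12)),
    # experiment 2: hypothetical split into two training speeds and two unseen testing speeds
    2: (design(SPEEDS[:2], LOADS, 12), design(SPEEDS[2:], LOADS, 12)),
    # experiment 3: hypothetical choice of one training load and two unseen testing loads
    3: (design(SPEEDS, LOADS[:1], 8), design(SPEEDS, LOADS[1:], 16)),
}
for k, (train, test) in experiments.items():
    print(f"experiment {k}: {count(train)} training / {count(test)} testing samples")
# experiment 1: 720 training / 720 testing samples
# experiment 2: 360 training / 360 testing samples
# experiment 3: 160 training / 640 testing samples
```

Keeping the working conditions of the testing phase disjoint from those of the training phase is what makes experiments 2 and 3 harder than experiment 1.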

Comparisons of the Proposed Method with the Four Existing Gear Crack Level Identification Methods Reported in [24].
For experiment 1, support vector machines are trained and tested with the five gear crack levels under the four motor speeds and the three loads. Because all working conditions are considered in training, identification of the different crack levels is relatively easy compared with experiments 2 and 3, in which only part of the working conditions are used to train the support vector machines. In vibration analysis, motor speed and load strongly influence the amplitudes and waveforms of vibration signals, and these changes alter the statistical feature values. Therefore, the design of experiments 2 and 3 makes gear crack level identification more complicated. For the support vector machine, the Gaussian radial basis function was used, and its kernel width was optimized to 0.14. The prediction accuracies of the proposed method and the four existing advanced gear crack level identification methods reported in [24] are tabulated in Table 3, where method 1 is the direct use of the k-nearest neighbor algorithm and methods 2 to 4 are the other methods reported in [24].
Table 3: Comparisons of the prediction accuracies obtained by using the proposed method and the four methods reported in [24] (unit: %). (Note: methods 1 to 4 are used to distinguish three gear health conditions, while the proposed method is used to classify five gear health conditions.)
The results in Table 3 show that the proposed method has the highest prediction accuracies among all methods, and its prediction accuracies reach 100%. The reasons for such high prediction accuracies are as follows. First, the statistical features used in the proposed method are highly redundant and number as many as 920, so they provide more gear crack fault signatures. Second, the principal components are new significant statistical features generated from the redundant statistical features to represent different gear crack levels. For experiment 1, the first two and the first three principal components of the training data are plotted in Figures 6(a) and 6(b), respectively. Third, because the support vector machine uses the kernel trick to map the principal components to a high-dimensional feature space, where they can be linearly separated, this technique enhances the prediction accuracy of the five different gear crack levels.

Conclusions
In this paper, an intelligent gear crack level identification method under different working conditions is proposed.
The main idea of the proposed method is to use superhigh-dimensional redundant statistical features to represent five different gear crack levels under different working conditions. There are as many as 920 redundant features, obtained by applying 10 statistical features to the Scalogram and its frequency spectra, as well as to the original signal and its frequency spectrum. Then, to reduce the dimensionality of the redundant statistical features and relieve the curse of dimensionality, principal component analysis is performed on them to generate new significant statistical features. Finally, support vector machines with a Gaussian radial basis function are used to identify the five gear crack levels under the four motor speeds and three loads. Comparisons with the four existing gear crack level identification methods show that the proposed method has the highest prediction accuracies among all methods, with prediction accuracies of up to 100% in the three experiments. The reasons for these high prediction accuracies are summarized as follows.
First, extracting high-dimensional redundant statistical features from the continuous wavelet transform at different scales helps mine more gear fault signatures under different working conditions, because these features reflect both the global and local characteristics of the gear crack level data. Second, the new significant statistical features, namely, the principal components, generated from these high-dimensional redundant statistical features are useful for distinguishing different crack levels under different working conditions. The principal components shown in Figures 6 to 11 clearly indicate that, as the number of principal components used increases from 1 to 3, the five crack levels become well separable. Finally, the support vector machine uses the kernel trick to map the principal components to a high-dimensional feature space, where the separation of the five crack levels is more pronounced.

Figure 2: The analyses of the gear crack data: (a) the waveform of Daubechies 44; (b) the gear data with crack level 0%; (c) the gear data with crack level 100%; (d) the absolute wavelet coefficients of the gear data with crack level 0%; (e) the absolute wavelet coefficients of the gear data with crack level 100%.
Besides, the wavelet mother function has a significant impact on the wavelet coefficients: different wavelet mother functions yield different wavelet coefficients. Therefore, proper selection of the wavelet mother function for the continuous wavelet transform remains an open question. As mentioned in the Introduction, Daubechies 44 is used in this paper, and its temporal waveform is plotted in Figure 2(a). To show the redundant wavelet coefficients obtained by the continuous wavelet transform, the data with gear crack levels 0% and 100% are plotted in Figures 2(b) and 2(c), respectively, and their wavelet coefficients are plotted in Figures 2(d) and 2(e), respectively, where absolute wavelet coefficients are used to enhance the three-dimensional visualization. Figures 2(d) and 2(e) show that each one-dimensional gear signal in Figures 2(b) and 2(c) is transformed into a two-dimensional time-scale diagram of redundant wavelet coefficients. As a result, compared with the original signals in Figures 2(b) and 2(c), the redundant wavelet coefficients provide more fault signatures.
The different gear crack levels are shown in Figures 5(a) to 5(d), respectively.

Figure 4: The diagrammatic view [24] of (a) the crack angle and (b) the face width and the chordal tooth thickness.

Figure 5: The different gear crack levels, panels (a) to (d).

Figure 6: The principal components of the training data for experiment 1: (a) the first two principal components; (b) the first three principal components.

Figure 7: The principal components of the testing data for experiment 1: (a) the first two principal components; (b) the first three principal components.

Figure 8 :Figure 9 :
Figure 8: The principal components of the training data for experiment 2: (a) the first two principal components; (b) the first three principal components.

Figure 10: The principal components of the training data for experiment 3: (a) the first two principal components; (b) the first three principal components.

Figure 11: The principal components of the testing data for experiment 3: (a) the first two principal components; (b) the first three principal components.

Table 1: The geometries of different gear crack levels.

Table 2: The design of the three experiments (N, H, and F denote no load, half load, and full load, respectively).