Discriminant Feature Distribution Analysis-Based Hybrid Feature Selection for Online Bearing Fault Diagnosis in Induction Motors

Optimal feature distribution and feature selection are of paramount importance for reliable fault diagnosis in induction motors. This paper proposes a hybrid feature selection model with a novel discriminant feature distribution analysis-based feature evaluation method. The hybrid feature selection employs a genetic algorithm- (GA-) based filter analysis to select suboptimal features and a k-NN average classification accuracy-based wrapper analysis that selects the most optimal features from them. The proposed feature selection model is applied through an offline process, in which a high-dimensional hybrid feature vector representing a discriminative fault signature is extracted from acquired acoustic emission (AE) signals. The feature selection determines the optimal features for different types and sizes of single and combined bearing faults under different speed conditions. The effectiveness of the proposed feature selection scheme is verified through an online process that diagnoses faults in an unknown AE signal by extracting only the selected features and using the k-NN classification algorithm to classify the fault condition manifested in the signal. The classification performance of the proposed approach is compared with those of existing state-of-the-art average distance-based approaches. Our experimental results indicate that the proposed approach outperforms the existing methods with regard to classification accuracy.


Introduction
Induction motors are vital and perhaps the most widely used equipment in modern industry, accounting for a large share of industrial energy consumption, and they are the primary means of providing rotary motion at different speeds and under varying load conditions [1]. Friction is inevitably associated with all forms of mechanical motion and is a major cause of energy loss and tribological wear in machines. In rotary motion, the effects of friction are mitigated through the use of bearings and lubricants.
During the operational life of an induction motor, bearings are constantly subjected to varying stresses due to changes in mechanical and electrical loads, which adversely affect their fatigue life. These cyclical variations in load lead to fatigue failure of the bearing material, which ultimately appears in the form of surface cracks. These cracks and the resultant spalls in a bearing can cause an induction motor to fail and lead to unexpected downtime in a manufacturing process [2,3]. Bearings are responsible for nearly 50% of induction motor failures; hence, machine condition monitoring, which aims to alleviate unexpected downtime, mostly focuses on the bearings [4]. Early detection and reliable diagnosis of bearing faults can help avoid unexpected machine failure and process shutdown.
Research on bearing fault diagnosis has followed two main courses: model-driven [5][6][7] and data-driven. In the model-driven approach, detailed physical models of the system are developed, and the difference between the actual output of the system and the output predicted by the physical model is monitored to determine the current condition of the system. In the data-driven approach, no models of the system are developed; rather, data collected from the actual system, in its normal and abnormal conditions, are mined for features that are representative of the two conditions.

Journal of Sensors
The data-driven approach has gained a lot of attention in the research community due to its simplicity and effectiveness. An effective data-driven fault diagnosis model involves three basic steps: (1) data acquisition from the bearing during operation, (2) feature extraction from the acquired data or fault signal to characterize fault signatures, and (3) classification of different bearing fault conditions on the basis of characteristic features.
Data-driven fault diagnosis utilizes various kinds of signals or data, including current, voltage, vibration, temperature, and chemical signals, each with its own advantages and shortcomings. The current and voltage signals enable a low-cost and easy signal measurement system that offers reasonably satisfactory performance in bearing fault detection, as demonstrated in [8][9][10]. However, current signals contain unnecessary components, and current signal analysis is not suitable for bearing fault diagnosis in machines operating at low rotational speeds [11]. Alternatively, vibration signals have been used to classify induction motor faults [11]. Vibration-based condition monitoring is also susceptible to performance degradation at low rotational speeds and is not appropriate for diagnosing faults that are still in the early stages. In order to detect bearing faults before they manifest in the form of cracks on the bearing surface, acoustic emissions (AEs) have been found to be very useful. The AE signal-based approach is also very effective in diagnosing faults at low rotational speeds because it utilizes low-energy signals that are captured using a wideband AE sensor [12].
Data-driven approaches generally employ various signal processing techniques to extract characteristic fault information from signals obtained from defective bearings. This involves the calculation of different statistical parameters or features of the time-domain signal, which include the mean, standard deviation, skewness, kurtosis, and root mean square (RMS) value [13,14]. Feature extraction from the fault signal in the frequency domain usually involves the use of the fast Fourier transform, power spectrum estimation, cepstrum analysis, and envelope spectrum analysis [14,15]. The features obtained using these techniques are suitable for particular situations only, which are determined by the characteristics of the fault signals [16].
In order to evolve a fault diagnosis strategy that works well for a range of different conditions, this study uses a hybrid feature vector, which includes statistical features from the time domain, frequency domain, and envelope spectrum of an AE signal, to extract the maximum possible fault information. However, a high-dimensional feature vector can degrade the classification performance due to irrelevant and redundant information and imposes a high computational overhead on the classifier. Therefore, this paper proposes an efficient feature selection model with discriminant feature distribution analysis to ensure optimal features that can improve classification accuracy and reduce computational complexity.
Feature selection or dimensionality reduction is carried out by evaluating individual features according to some criteria and selecting only those that will render the best classification results. In general, feature selection techniques can be divided into three categories: wrapper, filter, and hybrid approaches [17,18].
The wrapper approach selects the feature subset whose feature variables show the highest classification accuracy for a particular classifier, whereas the filter approach ranks the feature variables using some property values or an evaluation model [19]. The filter approach, however, does not consider the classifier when ranking feature variables and hence cannot ensure the best results in every situation. The hybrid approach combines both the wrapper and filter approaches and is highly effective in complicated feature spaces [20][21][22][23].
The optimal features are selected by performing a complete, sequential, or heuristic search of the feature space. A complete search ensures a high-quality feature subset, but it is very costly in terms of computational time. In contrast, a sequential search is comparatively faster but does not guarantee the best results. Heuristic techniques, including the genetic algorithm (GA), provide a good tradeoff between computational complexity and the quality of the selected optimal features [17].
The evaluation of feature subsets using an appropriate evaluation method is vital in determining the most discriminatory features during feature selection. Several feature evaluation methods have been proposed based on classification accuracy or Euclidean distance-based feature distribution criteria. In [24], Kanan and Faez proposed selecting the optimal features by evaluating candidate features using average classification accuracy, which improves classification performance but only for a specific classification algorithm. In [25], Kang et al. proposed a feature subset evaluation method based upon intraclass compactness and interclass separation calculated using average Euclidean distances. However, they did not consider all possible feature distributions. Moreover, the intraclass compactness value considers dense areas only, and samples that are located in less dense areas or on the outskirts of a class are ignored, which can affect the multiclass distribution. Similarly, a high distance value between two classes can dominate the distance values between other classes and hence the overall interclass separation value.
To overcome the limitations of conventional average distance-based methods, this paper proposes a hybrid feature selection scheme that uses the GA and a new discriminant feature distribution analysis-based filter approach. First, a set of suboptimal features is selected from the original set of hybrid features, which are extracted from known datasets. This is done through an offline process that runs N iterations and creates a feature occurrence histogram. Then, these suboptimal features are further evaluated using a k-NN classifier to select the optimal features from the feature occurrence histogram. Finally, the proposed scheme is tested through an online process that extracts the selected optimal features from an unknown AE signal and classifies faults based on those optimal features alone, using the k-NN classification algorithm.
The rest of this paper is organized as follows. Section 2 describes AE fault signal acquisition and the different datasets. Section 3 describes the proposed hybrid feature selection model, and Section 4 presents the experimental results and analysis. Finally, Section 5 concludes the paper.

AE Fault Signal Acquisition
The performance of any fault diagnosis system depends on the quality of the acquired fault signal. In this study, a wideband AE sensor is used to collect acoustic data from the bearings.
As shown in Figure 1, the AE sensor is attached to the top of the bearing housing at the nondrive end of the shaft in the experimental test rig [12] developed by the Intelligence Dynamics Lab (Gyeongsang National University, Korea). The signal from the AE sensor is sampled at 250 kHz.
In this study, a normal (defect-free) bearing signal, single fault bearing signals, and combined fault bearing signals with different physical crack sizes were collected under various rotational speeds. The physical cracks were etched on different parts of the bearing using a diamond cutter bit. Figure 2 shows the different single (i.e., inner raceway, outer raceway, and roller fault) and combined bearing faults (i.e., inner raceway and roller fault, outer and inner raceway fault, outer raceway and roller fault, and inner and outer raceway and roller fault) that were etched into the bearings.
In this study, 10 datasets were collected for two physical crack sizes (see Figure 3) and five rotational speed conditions (300 rpm, 350 rpm, 400 rpm, 450 rpm, and 500 rpm). Moreover, a normal (no crack) bearing signal under each rotational speed condition is also included in these datasets. Figure 3 shows the datasets used in this study.
Each dataset contains AE signals for eight different conditions: the normal condition and the inner, outer, roller, inner + roller, outer + inner, outer + roller, and inner + outer + roller fault types. For each fault type, 90 different AE signals are involved, each with a length of 10.

Proposed Method
The proposed fault diagnosis model, as depicted in Figure 4, has two processes: an offline process for fault feature analysis and discriminative feature selection from previously acquired datasets, and an online process that detects and classifies faults in an unknown AE signal by extracting only those features selected in the offline process.

Hybrid Feature Extraction.
To extract the maximum possible fault information, this study considers a high-dimensional hybrid feature vector that consists of a total of 22 features, including 10 time-domain and three frequency-domain statistical parameters of the AE fault signal, along with nine statistical parameters extracted from its envelope spectrum. The time-domain statistical features are RMS, square root of amplitude (SRA), kurtosis value (KV), skewness value (SV), peak-to-peak value (PPV), crest factor (CF), impulse factor (IF), margin factor (MF), shape factor (SF), and kurtosis factor (KF), whereas the frequency-domain statistical features are frequency center (FC), RMS frequency (RMSF), and root variance frequency (RVF). Tables 1 and 2 present all of the time-domain and frequency-domain statistical features, along with the mathematical relations to calculate them.
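These statistical parameters can be computed directly from the raw signal. The sketch below uses common textbook definitions of these quantities; the paper's exact relations are those in Tables 1 and 2, so individual formulas here (e.g., the kurtosis factor as KV/RMS^4) should be treated as assumptions.

```python
import numpy as np

def time_domain_features(x):
    """Ten time-domain statistics under common definitions (cf. Table 1)."""
    ax = np.abs(x)
    rms = np.sqrt(np.mean(x**2))                      # root mean square
    sra = np.mean(np.sqrt(ax))**2                     # square root of amplitude
    kv = np.mean((x - x.mean())**4) / x.std()**4      # kurtosis value
    sv = np.mean((x - x.mean())**3) / x.std()**3      # skewness value
    ppv = x.max() - x.min()                           # peak-to-peak value
    cf = ax.max() / rms                               # crest factor
    impulse = ax.max() / ax.mean()                    # impulse factor
    mf = ax.max() / sra                               # margin factor
    sf = rms / ax.mean()                              # shape factor
    kf = kv / rms**4                                  # kurtosis factor
    return [rms, sra, kv, sv, ppv, cf, impulse, mf, sf, kf]

def frequency_domain_features(x, fs):
    """FC, RMSF, and RVF from the magnitude spectrum (cf. Table 2)."""
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    fc = np.sum(freqs * spectrum) / np.sum(spectrum)  # frequency center
    rmsf = np.sqrt(np.sum(freqs**2 * spectrum) / np.sum(spectrum))
    rvf = np.sqrt(np.sum((freqs - fc)**2 * spectrum) / np.sum(spectrum))
    return [fc, rmsf, rvf]
```

For a pure 50 Hz sine wave, for example, the RMS evaluates to about 0.707 and the frequency center falls at 50 Hz.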
To find more specific fault information, we extract the RMS features of each fault's defect frequency region using the envelope power spectrum of the corresponding AE signal. Figure 5 illustrates the steps involved in extracting the envelope spectrum RMS features of a given fault signal. In Figures 6(a), 6(b), and 6(c), the green, black, and red rectangular windows represent the inner, outer, and roller defect frequency ranges, respectively. The frequency ranges for each defect are calculated using (1), (2), and (3), respectively, in terms of the operating frequency, the cage frequency, and the inner, outer, and roller defect frequencies; for example, the roller defect frequency range in (3) is range_roller = 2 × (number of sidebands × (roller defect frequency + cage frequency × error rate) + error rate × roller defect frequency). (3)
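The envelope-spectrum extraction sketched in Figure 5 can be approximated as follows. This is a minimal illustration assuming an FFT-based Hilbert transform for the envelope and simple rectangular defect-frequency windows; the function names and window bounds are hypothetical, not the paper's.

```python
import numpy as np

def envelope_spectrum(x, fs):
    """Envelope via the analytic signal (FFT-based Hilbert transform),
    followed by the spectrum of the mean-removed envelope."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)                 # one-sided weighting for the analytic signal
    h[0] = 1
    if n % 2 == 0:
        h[n // 2] = 1
        h[1:n // 2] = 2
    else:
        h[1:(n + 1) // 2] = 2
    envelope = np.abs(np.fft.ifft(X * h))
    spec = np.abs(np.fft.rfft(envelope - envelope.mean()))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, spec

def band_rms(freqs, spec, f_lo, f_hi):
    """RMS of the envelope spectrum inside one defect-frequency window."""
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return np.sqrt(np.mean(spec[mask]**2))
```

For a bearing-like amplitude-modulated signal, the envelope spectrum peaks at the modulation (defect) frequency, so the window around that frequency yields a large band RMS.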

Hybrid Feature Selection with Discriminant Feature Distribution Analysis.
This study uses a hybrid feature selection scheme to reduce the dimensionality of the original feature space by selecting an optimal set of features that yields higher classification performance. The proposed hybrid feature selection model comprises two parts, as depicted in Figure 7: a filter-based feature selection part, in which the feature subset evaluation process is independent of the classification algorithm, and a wrapper-based part that selects the optimal features based on classification accuracy.
The filter-based feature analysis is performed in N iterations. For each iteration, 2/3 of all of the samples are selected randomly, and then the GA with the proposed evaluation function (fitness function) is applied to select the suboptimal features. At the completion of this step, the sets of suboptimal features selected in each iteration of the filter-based analysis are used to generate a feature occurrence histogram. Finally, in the wrapper-based selection part, the optimal features are selected by evaluating different levels of the feature occurrence histogram based on the k-NN average classification accuracy. The overall process is depicted in Figure 8.
In the filter-based feature selection, the GA is used to search different feature subsets. The GA is a robust and effective optimization technique based on natural evolutionary theory [26][27][28]. The GA operates in discrete steps, which usually include encoding, parent selection, crossover and mutation, replacement, and evaluation of each solution. The best solution is generated in the form of the best chromosome, which is a combination of genes, each representing an element of the optimal feature vector. In this study, a generational GA is used. For each generation, a fixed number of offspring are generated, and the same number of chromosomes in the population are replaced with the newly generated offspring. Figure 9 shows the flow of the GA.
In the proposed GA-based optimization, we use binary encoding, roulette-wheel parent selection, uniform crossover [29][30][31], and one-point mutation. In this study, we fix the maximum number of generations at 500, and the initial population size is 500. In each generation, 50 offspring are generated, and the 50 worst-valued chromosomes in the population are replaced with the newly generated offspring. The feature space of the different fault classes is usually complex and high dimensional. The within-class compactness and between-class separation, as determined by the average Euclidean distance-based approach, are not always sufficient to fully describe the distribution of samples of all classes. Figure 10(a) shows a distribution of three classes in which the average distance-based compactness does not cover all possible sample distributions of these classes. In Figure 10(b), all samples of the three classes are distributed in such a way that two classes overlap each other and one class is located at a larger distance. This large distance value can lead to imprecise inferences about the distribution of samples when using average distance-based interclass separability. To overcome these issues, this paper proposes a discriminant feature distribution analysis-based objective function.
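With these operators, the GA loop can be sketched as below. This is an illustrative skeleton rather than the authors' implementation: it assumes a positive-valued fitness function (such as the objective value defined later in this section), and the default parameters mirror the paper's population of 500, 50 offspring per generation, 500 generations, and a 22-bit chromosome in which each gene selects one feature.

```python
import numpy as np

rng = np.random.default_rng(0)

def roulette_select(pop, fitness):
    # Selection probability proportional to (positive) fitness.
    p = fitness / fitness.sum()
    return pop[rng.choice(len(pop), p=p)]

def uniform_crossover(a, b):
    # Each gene is taken from either parent with equal probability.
    mask = rng.random(a.size) < 0.5
    return np.where(mask, a, b)

def one_point_mutation(chrom):
    # Flip a single randomly chosen bit.
    c = chrom.copy()
    i = rng.integers(c.size)
    c[i] ^= 1
    return c

def ga_feature_select(fitness_fn, n_features=22, pop_size=500,
                      n_offspring=50, generations=500):
    pop = rng.integers(0, 2, size=(pop_size, n_features))
    fit = np.array([fitness_fn(c) for c in pop])
    for _ in range(generations):
        kids = np.array([one_point_mutation(
            uniform_crossover(roulette_select(pop, fit),
                              roulette_select(pop, fit)))
            for _ in range(n_offspring)])
        kid_fit = np.array([fitness_fn(c) for c in kids])
        worst = np.argsort(fit)[:n_offspring]   # replace the worst chromosomes
        pop[worst], fit[worst] = kids, kid_fit
    return pop[np.argmax(fit)]
```

Because only the worst chromosomes are replaced, the best solution found so far is never discarded, which is the implicit elitism of this replacement scheme.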

Discriminant Feature Distribution Analysis-Based Objective Function.
In this study, the Euclidean distance is used to calculate the distance between two samples in the feature space. The Euclidean distance between samples x and y with n features is formulated as d(x, y) = sqrt(∑_{i=1}^{n} (x_i − y_i)^2). Figure 11 illustrates the processes of calculating the within-class compactness and the between-class distance. This model takes the Euclidean distance between the class median point and the farthest sample as the within-class compactness value for that class. The distance between two classes (i.e., the between-class distance) is the minimum distance between the boundary points of those classes; it is determined by calculating the minimum of the Euclidean distances from all samples of one class to all samples of the other class.
The main goal of the discriminant feature distribution analysis is to maximize the objective function such that the within-class compactness is minimal and the between-class distance is maximal. The objective function value of the optimal feature set ensures the best distribution of samples of the different fault classes, which improves the classification performance. The final objective value is calculated as objective value = (between-class distance) / (within-class compactness).
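A sketch of this evaluation follows, with the within-class compactness taken as the median-to-farthest-sample distance and the between-class distance as the minimum boundary distance, as described above. How the per-class values are aggregated into a single objective (here the worst-case compactness and the closest class pair) is an assumption, since the text does not spell this out.

```python
import numpy as np

def euclidean(a, b):
    return np.sqrt(np.sum((a - b)**2))

def within_class_compactness(X):
    """Distance from the class median point to its farthest sample."""
    median = np.median(X, axis=0)
    return max(euclidean(x, median) for x in X)

def between_class_distance(Xa, Xb):
    """Minimum sample-to-sample distance between two classes."""
    return min(euclidean(a, b) for a in Xa for b in Xb)

def objective_value(classes):
    """classes: list of (n_samples, n_features) arrays, one per fault class.
    Assumed aggregation: worst compactness vs. closest class pair."""
    within = max(within_class_compactness(X) for X in classes)
    between = min(between_class_distance(classes[i], classes[j])
                  for i in range(len(classes))
                  for j in range(i + 1, len(classes)))
    return between / within
```

Two compact, well-separated classes thus score much higher than overlapping ones, matching the distributions favored in Figure 14.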

Fault Classification Using k-NN
The proposed feature optimization model selects an optimal feature set with the most effective feature elements for fault diagnosis. In the online process, we use the k-NN classifier to validate our optimal feature sets in terms of classification performance. The k-NN classifier is one of the most popular classification methods [32] and is widely used because of its simplicity and computational efficiency. The basic idea of k-NN is to classify a sample based on the votes of its k nearest neighbors, which are determined by calculating a distance parameter [32][33][34].
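A minimal k-NN classifier of the kind described, using Euclidean distance and majority voting, can be written as:

```python
import numpy as np
from collections import Counter

def knn_classify(train_X, train_y, sample, k=3):
    """Majority vote among the k nearest training samples (Euclidean)."""
    d = np.sqrt(((train_X - sample)**2).sum(axis=1))
    nearest = np.argsort(d)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

With k = 3, as used in this study, a query sample is assigned the label held by at least two of its three nearest neighbors.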

Experimental Results and Analysis
The proposed feature selection model is validated through experiments involving the 10 datasets. The datasets are divided into two categories: one for offline feature analysis and selection and the other for online evaluation. The analysis datasets consist of 30 of the 90 signals for each fault type for a given speed condition. The remaining 60 signals of each dataset are used as the unknown signals for online evaluation of the proposed fault diagnosis scheme.
In the fault feature extraction phase, a hybrid feature vector is constructed using a total of 22 feature elements, including 10 time-domain statistical features of the signal, three frequency-domain statistical features of the signal, and nine RMS features from the defect frequency zones of the envelope power spectrum of that signal. Figure 12 shows the extracted feature vector for a sample dataset.
The large variations in the feature values of different dimensions, as shown in Figure 12, are due to the extraction of signal features from different domains. These large variations in feature values result in the domination of low-magnitude features by the high-magnitude ones. To minimize this problem, min-max normalization is applied to all of the features.
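Min-max normalization rescales each feature dimension to [0, 1] so that no high-magnitude feature dominates. A minimal sketch follows; the guard for constant columns is an added assumption, not part of the paper.

```python
import numpy as np

def min_max_normalize(F):
    """Column-wise min-max scaling of an (n_samples, n_features) matrix."""
    fmin, fmax = F.min(axis=0), F.max(axis=0)
    span = np.where(fmax > fmin, fmax - fmin, 1.0)  # avoid divide-by-zero
    return (F - fmin) / span
```

After this step, every feature dimension spans the same range regardless of its original units.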
Combined faults are usually not very distinctly represented by all of the features in the original feature vector. In the high-dimensional space, the feature distributions of some classes (fault types) overlap with those of other classes, which degrades the classification performance. However, some features represent the combined faults very distinctively. Figure 13 shows the feature distribution of some sample dimensions for a dataset.
The main objective of the proposed discriminant feature distribution analysis-based hybrid feature selection scheme is to select the minimum number of discriminative features. To do so, the GA is applied in each iteration of the analysis to identify a suboptimal feature subset. In the GA, the proposed discriminant feature distribution analysis-based objective function is utilized to calculate the fitness values of all of the chromosomes (feature sets) in the population. During this evaluation, the feature distributions are examined to identify optimal features that yield minimum within-class compactness (well-compacted classes) and maximum between-class distances. Figure 14 illustrates four example cases of feature distributions. In case 1, the classes are compact and well separated, so the between-class distance is high; thus, the objective function value is the highest among the four cases considered in Figure 14. In case 2, one class distribution is relatively less compact, which decreases the objective function value. In cases 3 and 4, samples of different classes overlap with each other, resulting in low objective function values.
The proposed hybrid feature selection method performs N iterations of the filter-based analysis. In this study, we used N = 10. Therefore, 10 suboptimal feature subsets are selected during the filter-based analysis and are used to create an optimal feature occurrence histogram for each dataset. The histogram shows how frequently a feature has been selected as an optimal feature. Figure 15 shows the optimal feature occurrence histograms for different datasets.
In the wrapper-based analysis, the average classification accuracies are calculated using all of the features that appear in the feature occurrence histogram at a particular frequency or level. The most optimal features are then selected from the feature subsets at the different occurrence levels according to the highest classification accuracy. Figure 16 shows the average classification accuracies at different feature occurrence levels for different datasets.
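The histogram construction and level-wise wrapper evaluation can be sketched as follows. Whether a "level" means features occurring exactly that many times or at least that many times is not stated; the sketch assumes "at least", and `accuracy_fn` is a hypothetical stand-in for the k-NN average classification accuracy of a candidate subset.

```python
from collections import Counter

def occurrence_histogram(subsets):
    """subsets: list of feature-index sets from the N filter iterations."""
    return Counter(f for s in subsets for f in s)

def select_by_level(hist, accuracy_fn):
    """Evaluate the features appearing at each occurrence level and keep
    the level with the highest average classification accuracy."""
    best_feats, best_acc = None, -1.0
    for level in range(1, max(hist.values()) + 1):
        feats = sorted(f for f, c in hist.items() if c >= level)
        if not feats:
            continue
        acc = accuracy_fn(feats)   # e.g., cross-validated k-NN accuracy
        if acc > best_acc:
            best_feats, best_acc = feats, acc
    return best_feats, best_acc
```

Higher levels yield smaller candidate subsets, so this search trades subset size against accuracy across the histogram's levels.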
The final sets of the most optimal features determined by the filter- and wrapper-based analyses are listed in Table 3.
Finally, in the online fault diagnosis process, the evaluation dataset is used as unknown data to evaluate the efficiency of the selected optimal feature vector. To do this, only the optimal features are extracted from the unknown signals of the evaluation dataset, and the k-NN classifier is applied to calculate the classification accuracy. In this study, k has been set to three for both the k-NN classifier and the k-fold cross-validation. The k-fold (threefold) cross-validation splits the evaluation dataset into three parts: one part is used as a training set, and the other two are used for testing in order to increase the reliability of the experimental results. The average accuracy over the three folds is considered the final accuracy of the system. To measure the multiclass classification performance of a classifier, several performance measures can be employed, including accuracy, sensitivity (true positive rate), specificity (true negative rate), and precision. In this study, we use precision to gauge the performance of our model, which is useful for determining the classification performance for individual classes. Precision values for each class are calculated from the individual class confusion matrix as Precision = TP / (TP + FP), where, by definition, True Positive (TP) is the number of samples of class "A" correctly classified as class "A" and

False Positive (FP) is the number of samples of another class incorrectly classified as class "A."
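The per-class precision computation follows directly from these definitions and can be sketched as:

```python
import numpy as np

def per_class_precision(y_true, y_pred, classes):
    """Precision = TP / (TP + FP) for each class, from label vectors."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    out = {}
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))   # correctly labeled c
        fp = np.sum((y_pred == c) & (y_true != c))   # wrongly labeled c
        out[c] = tp / (tp + fp) if tp + fp else 0.0
    return out
```

For instance, if one class-B sample is misclassified as class A while all class-A predictions are otherwise correct, the precision of class A drops below 1 while its sensitivity is unaffected.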
To verify the performance of the proposed feature distribution evaluation algorithm for feature selection, we compare it with an existing state-of-the-art average distance-based feature subset evaluation algorithm (i.e., Algorithm 1 [25]). Table 4 shows the classification precision values of the different fault types for the different datasets using the original feature vector, our proposed feature selection method, and the average distance-based, state-of-the-art feature selection method.
The experimental results listed in Table 4 and depicted in Figure 17 clearly show that the proposed feature selection model outperforms the other approaches under different conditions. In the datasets for the small crack size, the weakly generated fault signals are not significantly distinguishable, which affects the classification performance. The proposed feature selection model selects the optimal set of features with the best distribution in the high-dimensional feature space to increase the classification performance of the fault diagnosis system. In contrast, the existing average distance-based approaches do not consider the distribution of features and hence yield reduced classification performance. In some instances, the proposed method shows a performance similar to that of the approach without feature selection. However, our proposed model drastically reduces the dimensionality of the feature vector from 22 to 2 by selecting the most optimal features, which significantly reduces the computational overhead.
The proposed fault diagnosis scheme is also applied to benchmark data acquired from the Bearing Data Center Website of Case Western Reserve University. This data was collected using a test bench with a 1 HP electric motor operating at 1730 rpm, 1750 rpm, 1772 rpm, and 1797 rpm (28.83 Hz, 29.17 Hz, 29.53 Hz, and 29.95 Hz, resp.), a torque transducer, and a dynamometer. For each dataset from the standard benchmark data, the set of optimal features as determined by the proposed feature selection scheme is listed in Table 5. The proposed scheme is used to classify different fault conditions, including normal, ball fault, outer race fault, and inner race fault, using the set of optimal features for each dataset. The classification performance of the proposed scheme is compared with that of a state-of-the-art algorithm and a model that uses all the features. Table 6 and Figure 18 show the classification performance of the three different approaches.
As shown in Table 6 and Figure 18, the proposed scheme outperforms the other methods in classification performance, exhibiting a 1.5-4.5% improvement when the fault features are not easily distinguishable (e.g., the 14 mil crack size). For the 21 mil crack size, in which the fault features are more pronounced and easily distinguishable, all the methods perform equally well. We also measure the classification performance of these three schemes in the presence of 15 dB white Gaussian noise to further demonstrate the capability of the proposed model.

Conclusions
This paper presented a hybrid feature selection model with a discriminant feature distribution analysis-based feature subset evaluation method. Based on a robust analysis and evaluation using several datasets representing different operating conditions, the proposed feature subset evaluation method was shown to identify the optimal feature dimensions, in which all the samples of the different classes are well distributed. With the help of the evaluation method, the GA selects an optimal feature set in 10 iterations of filter-based analysis on randomly selected data from the analysis dataset and creates an optimal feature occurrence histogram. Using wrapper-based analysis, the most optimal features for each dataset are selected from the feature occurrence histograms based upon the k-NN average classification accuracy. Finally, the selected optimal feature sets for the different datasets are used by an online process for fault diagnosis. In the online fault diagnosis, only the best features are extracted from the unknown signals of the evaluation dataset, and the k-NN classifier is used to evaluate the classification performance. During classification, k-fold cross-validation was used, and the average value was considered the final performance in order to increase the reliability of the experiments. The analysis of our experiments on the acquired acoustic emission signals shows that the proposed feature selection and subset evaluation model outperforms the state-of-the-art average distance-based selection model and a model with no feature selection by a performance margin of about 5%.

Figure 3: Summary of datasets used in this study.

Figure 4: Overall block diagram of the proposed fault diagnosis model.

Figure 6: Envelope spectrum RMS features extracted from the (a) inner, (b) outer, and (c) roller defect frequency ranges; the green, black, and red rectangular windows represent the inner, outer, and roller defect frequency ranges, respectively.

Figure 5: Process for extracting the envelope spectrum RMS features.

Figure 8: Working process of the hybrid feature selection model.

Figure 9: Block diagram of the genetic algorithm- (GA-) based feature selection.

Figure 10: Examples of exceptions in feature distribution.
Figure 11: Calculation of the within-class compactness and between-class distance.

Figure 12: Hybrid feature vector of different fault types: (a) all features, (b) time-domain features, (c) frequency-domain features, and (d) envelope spectrum RMS features.

Figure 15: Optimal feature occurrence histograms of different datasets.

Figure 16: Average classification accuracies at different feature occurrence levels for different datasets.

Table 1: Ten time-domain statistical parameters.

Table 3: Final optimal feature sets of the different datasets.

Table 4: Classification performance of the three different approaches for individual fault types of the different datasets.

Table 5: Final optimal feature set for the different datasets of standard benchmark data.

Table 6: Classification performance of the three different approaches for different datasets of standard benchmark data.

Table 7: Classification performance of the three different approaches for standard benchmark data with white Gaussian noise of 15 dB.
Figure 17: Average classification accuracy of the proposed model and other models.
Figure 18: Average classification accuracy of the proposed model compared with that of other models using standard benchmark data.
Figure 19: Average classification accuracy of the proposed model compared with that of other models using standard benchmark data with white Gaussian noise of 15 dB.
The results listed in Table 7 and depicted in Figure 19 clearly show that the proposed fault diagnosis scheme outperforms the other methods, exhibiting up to 10% classification performance improvement.