Distance and Density Similarity Based Enhanced k-NN Classifier for Improving Fault Diagnosis Performance of Bearings

1School of Electrical, Electronics and Computer Engineering, University of Ulsan, Ulsan, Republic of Korea 2Department of Computer Science and Engineering, University of Asia Pacific, Dhaka, Bangladesh 3Power Generation Laboratory, KEPCO Research Institute, Jeollanam-do, Republic of Korea 4Department of Energy Mechanical Engineering, Gyeongsang National University, Gyeongsangnam-do, Republic of Korea


Introduction
Rotary machines, in both industry and common households, use bearings to reduce friction and ensure steady, energy-efficient operation. Bearings reduce the noise and vibration levels of a machine, which is essential for the long-term health of both the machine and its operators. Although bearings are sturdy components with long useful lives, material fatigue due to variations in operating load, currents due to electric discharge, thermal stresses due to variations in operating temperature, corrosion, and contaminants in the operating environment can cause them to fail abruptly. A bearing failure can result in the abrupt shutdown of a machine, which leads to tremendous financial losses. Bearings account for more than 50% of failures in induction motors alone [1], which makes their condition monitoring essential to preventing abrupt failures. Thus, early and reliable detection of bearing defects, which ultimately lead to bearing failure, is very important.
Many data-driven techniques have been proposed for diagnosing faults in bearings. These techniques largely use time-frequency analysis of the fault signals to extract meaningful information about underlying faults [2,3]. Fault signals, such as stator current, vibration acceleration, and acoustic emissions, are inherently nonstationary; hence, they are processed in the time-frequency domain, using the short-time Fourier transform (STFT) [4], wavelet transforms [5-10], empirical mode decomposition (EMD) [11-15], and the Hilbert-Huang transform [16-18], to extract characteristic information about different bearing defects. Acoustic emissions are characterized by their low energies and very high bandwidths. They are captured using wideband acoustic sensors and are very effective in diagnosing nascent faults [19-21]. This paper presents a data-driven approach for fault diagnosis in bearings, which extracts hybrid features from the acoustic emission (AE) signals and then employs the proposed enhanced k-NN classifier to diagnose different bearing defects. The hybrid feature vectors are constructed by calculating different statistical measures of the time- and frequency-domain AE signal and its envelope power spectrum. This rather extensive set of features is constructed to uniquely identify each fault condition; nevertheless, not all features are of equal utility in classifying a given fault correctly. Moreover, a high dimensional feature vector is bound to make the classification process computationally more expensive. Furthermore, if the feature vector contains too many redundant or irrelevant features, it may also degrade the classifier's accuracy. Hence, the dimensionality of the feature vector is reduced using feature selection methods, which prune the high dimensional feature vector by eliminating suboptimal features and selecting only those that would yield the highest classification accuracy. These optimal features are used to create a model of the data by training a classifier, which is then employed to classify the unknown fault signals.
Due to its simplicity and effectiveness, k-NN is usually the first choice for solving any classification problem. However, two factors can degrade its performance. First, k-NN determines the similarity between two samples using only a distance measure; the widely used distance measures are the Euclidean and Manhattan distances. Second, the classification decision, and hence the accuracy, is sensitive to the neighborhood size, k. These problems are highlighted in Figure 1, where the classification decision for the unknown test sample (shown as a red circle) changes with the neighborhood size. The test sample is labeled as "B" if k = 3, whereas it is labeled as "A" if k = 5. The limitations of traditional k-NN, due to its use of a distance based similarity measure, can be overcome using the local outlier factor (LOF) [22,23] and the local correlation integral (LOCI) [24], which are measures of similarity based on the density of data samples. Hence, in this study, hybrid similarity measures (i.e., both distance and density based) are proposed to improve the diagnostic performance of classical k-NN and make it more resilient to the choice of neighborhood size, k.
The main contribution of this study is an enhanced k-NN classifier, which uses hybrid measures of similarity between data samples to make it more resilient to the choice of neighborhood size, k, and to increase its diagnostic performance relative to classical k-NN. The density based similarity measure (i.e., the LOF) is used to boost the decision of classical k-NN, which classifies an unknown sample based only upon its Euclidean distance from its k nearest neighbors using the majority rule. In the proposed k-NN, when the k nearest neighbors of an unknown sample do not all belong to the same class, the LOF is used to decide the class membership of the unknown sample.
The rest of the paper is organized as follows. In Section 2, the fault simulator and data acquisition setup are presented. In Section 3, the fault diagnosis scheme and the proposed enhanced k-NN classifier are discussed in detail. In Section 4, a discussion of the achieved results is provided, and in Section 5, conclusions are drawn.

Fault Simulator and Data Acquisition System
The acoustic emission (AE) signals are acquired using a machinery fault simulator, which is used to simulate different fault conditions. The fault simulator uses cylindrical roller element bearings (FAG NJ206-E-TVP2), which are seeded with cracks on different parts. AE signals are collected for bearings at the nondrive end of the simulator using a wide-band acoustic sensor and a PCI-2 based data acquisition system, which samples the AE signals at a rate of 250 kHz [25]. The acoustic sensor is attached to the top of the bearing housing at an approximate distance of 21.48 mm from the bearing, as shown in Figure 2. The nondrive end shaft is connected to the drive end through a gearbox with a reduction ratio of 1.52 : 1.
The bearings are seeded with cracks of two different sizes (i.e., 3 mm and 12 mm), and these cracks are introduced on either one or two components of the bearing to study both single and compound bearing defects. The AE signals recorded for bearings with 3 mm cracks and for bearings with 12 mm cracks are grouped into separate datasets. Moreover, for each crack size, the AE signals are recorded at two different shaft speeds (i.e., 300 RPM and 350 RPM). Thus, a total of four datasets are considered, each with AE signals recorded at a different shaft speed and crack size. The types of single and compound bearing defects are shown in Figure 3; they include cracks on the roller (BFR), inner raceway (BFI), outer raceway (BFO), inner and outer raceways (BFIO), inner raceway and roller (BFIR), outer raceway and roller (BFOR), and inner and outer raceways and the roller (BFIOR). For each shaft speed, the AE signal for a healthy bearing (FFB) is also recorded.
As mentioned earlier, the AE signals are divided into 4 datasets based upon the crack size and shaft speed, as given in Table 1. For every bearing defect, 90 AE signals are recorded; each signal is of 5-second duration. Similarly, 90 AE signals are recorded for the healthy bearing. Thus, every dataset contains a total of 720 AE signals.

The Proposed Methodology for Bearing Fault Diagnosis
The proposed methodology for bearing fault diagnosis works in two phases, as illustrated in Figure 4. The first phase comprises an offline process that involves feature extraction and feature selection, which are discussed in detail in Sections 3.1 and 3.2, respectively. The offline process is used to determine the set of optimal features that would yield the highest classification accuracy. In the second phase, an online process is used to classify the unknown AE signals using the proposed enhanced k-NN classifier. The online process calculates only the optimal set of features for each AE signal and, using only those features, it labels the unknown AE signals.

Feature Extraction.
To accurately identify each bearing defect, a high dimensional hybrid feature vector is constructed using 22 different features of the AE signal. These features are useful in extracting maximum information about each fault [26] and include ten statistical measures of the time-domain AE signal and three statistical measures of the frequency-domain AE signal. These features are listed in Table 2 along with the mathematical relationships for their calculation. Moreover, nine statistical measures, calculated over the envelope power spectrum of the AE signal, are also included in the hybrid feature vector. The features from the envelope power spectrum are the root mean square (RMS) values for each of the three defect frequencies and their first two harmonics. The defect frequencies include the ball pass frequency over the inner race (BPFI), the ball pass frequency over the outer race (BPFO), and the ball spin frequency (BSF).
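As an illustration, a few common time-domain statistical measures of the kind listed in Table 2 can be computed as follows. This is a sketch of a subset only; the feature names are ours, and the paper's exact 22-feature set is the one defined in Table 2.

```python
import numpy as np
from scipy.stats import kurtosis, skew

def time_domain_features(x):
    """Compute a subset of common statistical time-domain features of a
    signal x (illustrative; Table 2 defines the full feature set)."""
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))        # root mean square
    peak = np.max(np.abs(x))              # peak amplitude
    mean_abs = np.mean(np.abs(x))         # mean absolute value
    return {
        "rms": rms,
        "peak_to_peak": np.ptp(x),        # max(x) - min(x)
        "crest_factor": peak / rms,       # peak relative to RMS
        "kurtosis": kurtosis(x),          # impulsiveness indicator
        "skewness": skew(x),              # asymmetry indicator
        "shape_factor": rms / mean_abs,
        "impulse_factor": peak / mean_abs,
    }
```

The same statistics can then be evaluated over the frequency-domain signal and the envelope power spectrum to build the full hybrid feature vector.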
The range of values for these defect frequencies and their harmonics is shown in Figure 5.
The range of values for the defect frequencies and their first two harmonics is calculated using (1), (2), and (3), respectively, where f_i is the inner defect frequency, f_o is the outer defect frequency, f_c is the cage frequency, and f_r is the roller defect frequency.
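The defect frequencies themselves follow from the bearing geometry and shaft speed. As a hedged sketch, the standard kinematic formulas for these frequencies (assumed here; the paper's equations (1)-(3) define ranges around such values) can be written as follows, with hypothetical geometry parameters rather than the actual FAG NJ206-E-TVP2 values:

```python
import math

def bearing_defect_frequencies(shaft_hz, n_rollers, d_roller, d_pitch,
                               contact_deg=0.0):
    """Standard kinematic defect-frequency formulas (assumed; parameter
    values passed in are illustrative, not the FAG NJ206 geometry)."""
    ratio = (d_roller / d_pitch) * math.cos(math.radians(contact_deg))
    bpfo = (n_rollers / 2.0) * shaft_hz * (1.0 - ratio)   # outer raceway defect
    bpfi = (n_rollers / 2.0) * shaft_hz * (1.0 + ratio)   # inner raceway defect
    bsf = (d_pitch / (2.0 * d_roller)) * shaft_hz * (1.0 - ratio ** 2)  # roller spin
    ftf = (shaft_hz / 2.0) * (1.0 - ratio)                # cage frequency
    return {"BPFO": bpfo, "BPFI": bpfi, "BSF": bsf, "FTF": ftf}
```

A useful sanity check on any such implementation is that BPFO + BPFI equals the number of rollers times the shaft frequency.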

Feature Selection.
Although a high dimensional hybrid feature vector is highly desirable to capture the characteristics of different types of defects, the diagnostic performance of the proposed method can be degraded by potentially irrelevant and redundant features. Moreover, a high dimensional feature vector entails an increased computational cost during feature extraction and classification, which involves the calculation of distances and densities between different samples [25-27]. Hence, the original feature vector is evaluated to determine the set of optimal features that would yield the best diagnostic performance and reduce the computational cost of the proposed method.
In this study, sequential forward selection (SFS) is used for feature selection, which is a simple and fast greedy search algorithm. It starts with an initially empty set, S = ∅, and then iteratively selects the most significant feature from the original set with respect to the set S. This is done by first selecting a feature from the original set and then adding it to the set S only if the newly selected feature maximizes the value of the objective function for the set S. The feature is discarded and the process moves to the next feature if the selected feature decreases the value of the objective function for the set S. The objective function for SFS is given by (4), which is the ratio of interclass separability to intraclass compactness [25]. The interclass separability is given by the interclass distance, d_interclass, whereas d_intraclass measures the intraclass compactness. Although SFS is simple, efficient, and reasonably accurate, it has its disadvantages. It suffers from the nesting problem; that is, a feature retained once cannot be discarded, which can result in suboptimal feature selection [28-30].

The Proposed Enhanced k-NN Classifier.

As shown in Figure 6, the proposed k-NN first calculates the membership probabilities for the unknown test samples using probabilistic k-NN, which uses the Euclidean distance as a measure of similarity. The probabilistic k-NN does not assign any class labels to the test samples; instead, it only calculates their membership probabilities for all the classes.
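The greedy SFS loop described above can be sketched as follows. The separability objective below is a simple stand-in for (4), built from class centroids; the function names and the exact objective are our assumptions, not the paper's implementation.

```python
import numpy as np

def separability(Xs, y):
    """Stand-in objective: ratio of mean interclass centroid distance to
    mean intraclass spread (assumed form; the paper's objective is (4))."""
    classes = np.unique(y)
    cents = np.array([Xs[y == c].mean(axis=0) for c in classes])
    inter = np.mean([np.linalg.norm(a - b)
                     for i, a in enumerate(cents) for b in cents[i + 1:]])
    intra = np.mean([np.linalg.norm(Xs[y == c] - cents[i], axis=1).mean()
                     for i, c in enumerate(classes)])
    return inter / (intra + 1e-12)

def sequential_forward_selection(X, y, objective, n_select):
    """Greedy SFS: start from an empty set S and repeatedly add the single
    feature that most increases the objective; stop when nothing helps."""
    selected, remaining = [], list(range(X.shape[1]))
    best_score = -np.inf
    while remaining and len(selected) < n_select:
        score, j = max((objective(X[:, selected + [j]], y), j)
                       for j in remaining)
        if score <= best_score:   # no remaining feature improves the objective
            break
        best_score = score
        selected.append(j)
        remaining.remove(j)
    return selected
```

Note that, as stated above, this greedy loop never removes a feature once added, which is exactly the nesting problem.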
If, for each class, the membership probability of a test sample is less than 1.0, then the output of the majority rule is ignored and the final membership of the test sample is determined using the LOF value, as shown in Figure 7.

Calculating the Local Outlier Factor (LOF).
The local outlier factor (LOF) has been used for the detection of outliers or anomalous data points [22], which have relatively low probabilities of being members of any class. An unknown sample is classified by comparing its density with that of its neighbors. Points with densities similar to their neighbors are classified accordingly; that is, points with lower densities are labeled according to their neighbors with lower densities, whereas points with higher densities are labeled according to their neighbors with higher densities. The LOF can be calculated as follows:

(i) First, the distance of every data point p to its kth nearest neighbor (i.e., its k-distance, k-distance(p)) is calculated; this is illustrated, for k = 3, in Figure 8(a).
(ii) Second, for each data point p, its reachability distance with respect to a data point o, reach-dist_k(p, o), is the true distance between p and o with a minimum value of k-distance(o), as illustrated in Figure 8(b). It can be calculated as follows:

reach-dist_k(p, o) = max{k-distance(o), d(p, o)}. (5)

(iii) Third, for each data point p, its local reachability density, lrd_k(p), is defined as the inverse of its average reachability distance from its k nearest neighbors, N_k(p), as given in (6). The value of k is set to 16, as given in Table 3:

lrd_k(p) = ( (1/|N_k(p)|) Σ_{o ∈ N_k(p)} reach-dist_k(p, o) )^(-1). (6)

(iv) Finally, for each data point p, its local outlier factor, LOF_k(p), is determined by comparing its local reachability density to that of its k nearest neighbors using the following relation:

LOF_k(p) = (1/|N_k(p)|) Σ_{o ∈ N_k(p)} ( lrd_k(o) / lrd_k(p) ). (7)

The LOF values for all the training samples are computed using (7) during the training phase. The unknown test samples are classified based upon the similarity of their LOF values to those of their neighbors.
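Steps (i)-(iv) can be sketched as a direct, unoptimized O(n²) computation; the function name and vectorized layout below are ours, and a value near 1 indicates a point whose density matches its neighborhood while larger values indicate outliers:

```python
import numpy as np

def lof(X, k):
    """Plain LOF per steps (i)-(iv): k-distance, reachability distance (5),
    local reachability density (6), and the LOF ratio (7)."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)                  # a point is not its own neighbor
    knn = np.argsort(D, axis=1)[:, :k]           # indices of k nearest neighbors
    kdist = D[np.arange(n), knn[:, -1]]          # (i) k-distance of each point
    # (ii) reach-dist_k(p, o) = max{k-distance(o), d(p, o)}
    reach = np.maximum(kdist[knn], D[np.arange(n)[:, None], knn])
    lrd = 1.0 / reach.mean(axis=1)               # (iii) local reachability density
    return lrd[knn].mean(axis=1) / lrd           # (iv) LOF value per point
```

On a tight cluster plus one distant point, the cluster points score near 1 and the distant point scores well above it.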

Results and Discussion
In this section, a discussion of the experimental results achieved by the proposed method for bearing fault diagnosis is provided. As mentioned earlier, four datasets are used to test the proposed method, details of which are given in Table 1. The method uses the enhanced k-NN classifier, which has been proposed to address the limitations of traditional k-NN. The enhanced k-NN classifier was used with the parameters given in Table 3. To demonstrate the effectiveness of the proposed k-NN classifier, the classification of inner race fault samples from dataset 1 is illustrated in Figure 9, using both the traditional and proposed k-NN classifiers with neighborhood sizes of 3 and 7 (i.e., k = 3 and k = 7). The samples shown inside the red ellipse are to be classified; their true label is "inner race fault" (i.e., these samples belong to the inner race fault class). However, the classification result of the traditional k-NN classifier varies with the value of k: for k = 3, it correctly classifies these samples as inner race fault samples, whereas, for k = 7, it classifies them as outer race fault samples, which is incorrect. This happens because traditional k-NN uses the majority rule to decide the class label for an unknown test sample. In this particular case, among the three nearest neighbors of these unknown test samples, two are inner race fault and one is outer race fault. Hence, for k = 3, they are correctly classified as inner race fault samples. However, among the seven nearest neighbors of these unknown test samples, four are outer race fault and three are inner race fault. Hence, for k = 7, they are incorrectly classified as outer race fault samples. In contrast, the proposed k-NN always classifies these samples as inner race fault samples, irrespective of the neighborhood size (i.e., the value of k).
The proposed k-NN classifier correctly classifies these unknown test samples because it uses the LOF, which is a density based similarity measure. The LOF is used only when the nearest neighbors of a given test sample do not all belong to the same class (i.e., the vote is not unanimous). Therefore, the class membership probabilities for the unknown test samples are determined first. In this particular case, for k = 3, the probability that a given test sample is a member of the inner race fault class is 66.67%, and the probability that it belongs to the outer race fault class is 33.33%. Since both class membership probabilities are less than one, the proposed k-NN classifier employs the LOF values of the unknown test samples and their neighbors to determine the final class labels. This is demonstrated in Figure 10. Similarly, when k = 7, the probability that a given test sample is a member of the inner race fault class is 42.86%, and the probability that it belongs to the outer race fault class is 57.14%. Likewise, for other datasets and fault types, this is how the proposed k-NN classifier improves the classification accuracy of traditional k-NN. This is clearly evident in Figure 11, which compares the performance of the two classifiers in terms of average classification accuracy, and in Table 4, which lists the classification accuracies for each dataset and individual fault type. Moreover, it can be observed that the accuracy of the proposed k-NN is not affected by the neighborhood size, k, whereas the accuracy of traditional k-NN varies with the neighborhood size, achieving its maximum accuracy for k = 3.
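The decision rule just illustrated, majority vote first and an LOF comparison only when the vote is not unanimous, can be sketched as follows. This is a hypothetical minimal implementation: `lof_of` is an assumed helper that returns the LOF of a sample computed with respect to one candidate class, and `lof_train` holds precomputed LOF values of the training samples.

```python
import numpy as np

def enhanced_knn_predict(x, X_train, y_train, lof_train, lof_of, k=3):
    """Sketch of the enhanced k-NN decision rule: keep the distance-based
    label on a unanimous vote; otherwise pick the candidate class whose
    training neighbors' LOF values best match the LOF of x."""
    d = np.linalg.norm(X_train - x, axis=1)       # Euclidean distances
    nn = np.argsort(d)[:k]                        # indices of k nearest neighbors
    labels, counts = np.unique(y_train[nn], return_counts=True)
    if counts.max() == k:                         # unanimous vote: mp = 1.0
        return labels[counts.argmax()]
    # Non-unanimous vote: compare the LOF of x within each candidate class
    # against the LOF values of that class's neighbors.
    best, best_gap = None, np.inf
    for cls in labels:
        cls_nn = nn[y_train[nn] == cls]
        gap = abs(lof_of(x, cls) - lof_train[cls_nn].mean())
        if gap < best_gap:
            best, best_gap = cls, gap
    return best
```

In the worked example above, the test samples' LOF values are close to those of the inner race fault neighbors and far from those of the outer race fault neighbors, so the inner race fault label wins regardless of k.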
The size of the optimal neighborhood, which maximizes the classification accuracy of traditional k-NN, has to be determined on a case by case basis. There are no general rules that work equally well in all situations and for all classes, which makes the whole process computationally expensive and inflexible. The robustness of the proposed k-NN to variations in the neighborhood size, k, makes it more flexible and efficient to use, and it delivers better and steadier performance. Moreover, in multiclass problems like the one considered in this study, where the densities of different classes vary, traditional k-NN performs poorly, as it relies solely on distance based similarity and ignores the local density of each class.

Conclusion
In this paper, an enhanced k-nearest neighbor (k-NN) classification algorithm was presented, which employs both density and distance based similarity measures to improve diagnostic performance in bearing fault diagnosis. The density based similarity measure, the LOF, was used to boost the classification performance of traditional k-NN, which deteriorates in the case of overlapping samples, outliers, and multiple classes with different feature distributions. Moreover, the distance based similarity measure makes the classification performance of traditional k-NN highly susceptible to the neighborhood size, k. These limitations were addressed through the use of both distance and density based similarity metrics between the training and test samples. Using the enhanced k-NN classifier, the diagnostic performance of the proposed bearing fault diagnosis scheme was significantly improved, and the results were more robust to variations in the neighborhood size, k.

Figure 1: Limitation of classical k-NN using only the distance based similarity measure.

Figure 2: (a) The fault simulator with a three-phase induction motor, a wide-band acoustic sensor, and a gearbox, and (b) a PCI-2 based system for AE data acquisition.

Figure 3: The single and compound bearing defect types.

Figure 4: The proposed methodology for bearing fault diagnosis.

Figure 5: Fault frequency regions up to three harmonics at the (a) inner, (b) outer, and (c) roller fault frequencies.
Figure 6: Calculating class membership probabilities (mp) using the probabilistic k-NN. A unanimous vote gives mp = 1.0 for one class and mp = 0.0 for the other, and the sample is identified using the distance similarity rule; otherwise (e.g., mp = 0.4 for class A and mp = 0.6 for class B), the LOF values (e.g., LOF = 2.5 for class A and LOF = 5.5 for class B) decide the label.

Figure 7: Classifying a test sample using the enhanced k-NN classifier.

Figure 9: Classification of inner race fault samples from dataset 1 using the traditional k-NN classifier (a) with k = 3 and (b) with k = 7, and using the proposed k-NN classifier (c) with k = 3 and (d) with k = 7.
Figure 10 shows the LOF values for the test samples and their nearest neighbors. The LOF values of the test samples with respect to the outer race fault class are 5.09, 5.069, and 4.979, whereas, with respect to the inner race fault class, their LOF values are 3.33, 3.399, and 3.192, respectively. If the LOF values of these test samples for both the outer and inner race fault classes are compared to the LOF values of their nearest training samples, it can be observed that the LOF values of the test samples for the inner race fault class are similar to the LOF values of training samples from that class. Hence, it can be argued that these test samples are outliers to the outer race fault class and inliers to, or members of, the inner race fault class.

Figure 10: The classification of unknown test samples using the LOF based density similarity measure.

Figure 11: Performance comparison of traditional k-NN and the proposed enhanced k-NN in terms of average classification accuracy: (a) dataset 1, (b) dataset 2, (c) dataset 3, and (d) dataset 4.

Table 1: Datasets for the proposed bearing fault diagnosis methodology.

Table 2: Statistical measures calculated over the time- and frequency-domain AE signal.
Peak-to-peak value (PPV): PPV = max(x_n) − min(x_n). In Table 2, N_sidebands is the number of sidebands, f_s is the operating frequency, e_rate is the error rate, and f_i is the inner defect frequency.

Table 3: Values of various parameters for the enhanced k-NN classifier.

Table 4: Diagnostic performance of the two classifiers for different fault types and datasets.