Association Rule-Based Feature Mining for Automated Fault Diagnosis of Rolling Bearing



Introduction
Machinery reliability is critical to the operational safety of the system. Rolling element bearings, widely used components in large mechanical systems, play an important role in ensuring the availability of machinery such as aircraft engines, wind turbines, and compressors [1]. Owing to harsh operating conditions (e.g., high speed, heavy load, and high temperature), bearings may suffer sudden and catastrophic failures [2]. If a bearing fault is not diagnosed early, it may incur great losses or even a serious accident. Effective algorithms for bearing defect diagnosis and prognosis are therefore in demand and remain an active research field for increasing system reliability [3].
Fault diagnosis is the problem of detecting the potential faults hidden in observed instances related to specific application domains [4]. Extensive efforts in bearing fault diagnosis have followed a forward approach: feature extraction, then feature selection and fusion, and finally fault diagnosis modeling. Various signal processing techniques, including the wavelet transform [5][6][7], empirical mode decomposition [8], order tracking [9], and spectral analysis [10,11], have been investigated for incipient defect feature extraction and diagnosis. An integrative algorithm of sparse coding and online dictionary learning is developed in [2] to extract impulse features for machinery fault detection. The extracted features are then selected or fused to build a data-driven model based on artificial intelligence techniques, including neural networks [12,13], support vector machines [14][15][16], and fuzzy c-means [17,18], for bearing fault classification. In such a forward approach, feature extraction plays a key role in bearing defect diagnosis. However, feature extraction is usually performed empirically based on prior experience. Thus, a systematic approach to bearing defect signature analysis is lacking.
To address the above issues, association rule mining provides a new approach to bearing defect signature analysis [19]. It takes an inverse approach, searching for relevance and associations in large volumes of data, and has been investigated in commerce [20,21], traffic [22], tourism [23], biomedical applications [24,25], power plant equipment diagnosis [26], and the analysis of telecommunication networks [27]. In association rule mining, data discretization is a critical step that turns the quantitative attributes of the relational tables into potential items. Equal-density and equal-width discretization are the commonly used uniform partitioning approaches. However, such approaches neglect the probability distribution characteristics of the features. Thus, they may produce unbalanced data, either excessively concentrated or overly dispersed, and generate unsatisfactory association rules [28].
In line with the above challenges, this paper presents a new association rule mining method based on equal probability for bearing defect feature analysis. First, a series of features extracted from signal data are discretized following the guideline of an equalized probability distribution of the data, in order to avoid excessively concentrated or dispersed data. To evaluate discretization performance, a new criterion, the interval class-information entropy, is formulated. Next, the data matrix composed of arrays of discretized features and defect labels is exploited to generate association rules that represent the relation between features and fault types. The generated rules are then used for bearing defect classification based on fuzzy proximity, and for feature selection, yielding the representative features related to typical defects. An experimental study using the bearing test data provided by Case Western Reserve University (CWRU) validates the effectiveness of the presented method. The results show that the new method generates a series of underlying association rules for bearing fault diagnosis and yields the best discretization performance and classification accuracy among the compared methods. The related features selected by the proposed method can be used directly to analyze bearing signals for fault classification and defect severity identification, avoiding the impact of irrelevant features while preserving the original features.
The intellectual merits of this paper are twofold. (1) A new association rule mining method with data discretization based on an equal probability distribution is presented, and a new criterion, the interval class-information entropy, is formulated. (2) The presented method, as an inverse approach to bearing defect signature analysis, provides a new tool for guiding feature extraction, replacing the empirical feature extraction of the current forward approach to bearing defect diagnosis.
The rest of the paper is structured as follows. Section 2 introduces the theoretical framework, including the association rule mining technique and the Apriori algorithm. Section 3 discusses the proposed method in detail and formulates the interval class-information entropy criterion for assessing the performance of data discretization. Section 4 demonstrates the effectiveness of the presented method using the bearing test data provided by CWRU. Finally, conclusions are drawn in Section 5.

Related Work
Many studies in the literature use association rule mining to proactively discover useful knowledge from databases. The knowledge discovered as association rules can be applied to information management, decision making, process control, and many other applications.

Association Rule Mining.
Association rule mining is a technique for detecting and extracting meaningful association relationships hidden in databases. It was first introduced by Agrawal et al. [34] and has been investigated in applications including commodity sales [35], disease study [36], quality improvement of a production process [37], and alarm correlation analysis [38]. The association rule mining problem is formulated as follows.
Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of literals, referred to as items, and let $D = \{t_1, t_2, \ldots, t_n\}$ be a set of transactions. Each transaction $t$ in $D$ has a unique transaction identifier (TID) and contains a subset of items $I' \subseteq I$. An association rule is an implication of the form $X \Rightarrow Y$, where $X, Y \subset I$ and $X \cap Y = \emptyset$. The support of an itemset $X$, $\mathrm{supp}(X)$, is the proportion of transactions in $D$ that contain $X$. Itemsets that meet the minimum support are called large itemsets, and all others are called small itemsets. The confidence of a rule is defined as $\mathrm{conf}(X \Rightarrow Y) = \mathrm{supp}(X \cup Y)/\mathrm{supp}(X)$.
Therefore, an association rule $X \Rightarrow Y$ must satisfy
$$\mathrm{supp}(X \cup Y) \ge \sigma, \qquad \mathrm{conf}(X \Rightarrow Y) \ge \delta,$$
where $\sigma$ and $\delta$ are the minimum support and minimum confidence, respectively. Association rule mining is typically decomposed into two steps. First, all large itemsets, whose transaction support is above the minimum support, are found. Then, each large itemset is used to generate the desired rules, which must satisfy the minimum confidence constraint.
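The two thresholds can be checked directly from these definitions. The Python sketch below computes support and confidence over transactions represented as frozensets. The item names (such as "a1" and "Normal") and the toy transactions are illustrative assumptions, not data from the paper. The thresholds 0.07 and 0.9 are the values the paper later uses for mining.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """supp(X): fraction of transactions that contain every item of X."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(x: FrozenSet[str], y: FrozenSet[str],
               transactions: List[Transaction]) -> float:
    """conf(X => Y) = supp(X u Y) / supp(X)."""
    supp_x = support(x, transactions)
    if supp_x == 0:
        return 0.0  # X never occurs; treated here as zero confidence
    return support(x | y, transactions) / supp_x


def is_valid_rule(x, y, transactions, sigma=0.07, delta=0.9) -> bool:
    """Keep X => Y only if supp(X u Y) >= sigma and conf(X => Y) >= delta."""
    return (support(x | y, transactions) >= sigma
            and confidence(x, y, transactions) >= delta)


# Toy transactions with hypothetical items (feature level + state label).
D = [frozenset(t) for t in (
    {"a1", "b2", "Normal"},
    {"a1", "b2", "Normal"},
    {"a9", "b7", "IR"},
    {"a1", "b3", "Normal"},
)]
X, Y = frozenset({"a1"}), frozenset({"Normal"})
print(support(X | Y, D))        # 0.75
print(confidence(X, Y, D))      # 1.0
print(is_valid_rule(X, Y, D))   # True
```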

Apriori Algorithm.
As a classical algorithm, the Apriori algorithm discovers frequent itemsets by making multiple passes over the data. In the first pass, the support of each individual item is counted, and the items that meet the minimum support form the frequent 1-itemsets. In each subsequent pass, the seed set of itemsets found to be large in the previous pass is used to generate new potentially large itemsets, called candidate itemsets. The actual support of these candidate itemsets is counted during the pass over the data. At the end of the pass, the candidate itemsets that are actually large become the seed for the next pass. This process continues until no new frequent itemsets are found. To improve computational efficiency, the Apriori algorithm generates candidate itemsets from the large itemsets of the previous pass alone, without considering the transactions in the database.
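The following Python code is a compact, level-wise sketch of this candidate-generation loop. It is a textbook-style rendition written for illustration, not the implementation from [42]. Its join step unions pairs of large (k-1)-itemsets and then prunes any candidate that has a subset which is not large. This produces the same candidates as the classical prefix-based join.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def apriori(transactions: List[FrozenSet[str]],
            min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every frequent itemset with its support (level-wise search)."""
    n = len(transactions)

    # Pass 1: count individual items and keep those meeting min_support.
    counts: Dict[FrozenSet[str], int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    large = {s: c / n for s, c in counts.items() if c / n >= min_support}
    result = dict(large)

    k = 2
    while large:
        seeds = list(large)
        # Candidate generation uses only the previous large itemsets.
        candidates = set()
        for i in range(len(seeds)):
            for j in range(i + 1, len(seeds)):
                union = seeds[i] | seeds[j]
                if len(union) != k:
                    continue
                # Prune: every (k-1)-subset must itself be large.
                if all(frozenset(sub) in large
                       for sub in combinations(union, k - 1)):
                    candidates.add(union)

        # One pass over the data to count the actual candidate supports.
        large = {}
        for c in candidates:
            cnt = sum(1 for t in transactions if c <= t)
            if cnt / n >= min_support:
                large[c] = cnt / n
        result.update(large)
        k += 1
    return result


D = [frozenset(t) for t in (
    {"a1", "b2", "N"}, {"a1", "b2", "N"}, {"a9", "b7", "IR"}, {"a1", "b3", "N"},
)]
freq = apriori(D, min_support=0.5)
print(len(freq))  # 7
```

The toy items are hypothetical. With a minimum support of 0.5, the loop stops after the 3-itemset {a1, b2, N}, because no pair of seeds remains to join.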

The Proposed Method
In bearing defect diagnosis, analyzing bearing defect signatures systematically is a challenge. To recognize bearing faults efficiently and accurately, this paper presents a new equal probability-based association rule mining method for bearing defect signature analysis.

Formulation of the Proposed Approach.
The framework of the presented method consists of four modules: data acquisition, feature extraction, equal probability-based association rule mining, and association analysis and fault diagnosis, as shown in Figure 1. First, normal and fault signal data are collected from the monitoring equipment, and features in the time and frequency domains are extracted from the obtained data. Then, the extracted features are discretized and transformed into symbolic sequences. Next, the relation between the discretized features and the labeled defect modes is used to formulate the rules. Finally, the representative features related to typical defects are extracted and investigated based on the rules. These features can be used to classify the bearing under different conditions and also provide guidance for traditional bearing fault diagnosis.
As a feature selection method, it differs from data dimensionality reduction methods such as PCA, KPCA, and LLE, which transform the data matrix and thereby strip the selected features of their actual physical meaning. The related features selected by the proposed method can be used directly to analyze the fault, avoiding the impact of irrelevant features while preserving the original features. The ranges of feature values obtained from the representative features can be used to determine the type and size of a fault from the respective values of the features. This makes it possible to assess bearing status directly from sensor measurements instead of relying on the complex models of conventional bearing defect diagnosis.

Association Rule Mining Based on Equal Probability.
Association rule mining typically requires Boolean attributes, such as "0" and "1" [39]. However, the relational tables in most business and scientific domains have richer attribute types, such as quantitative attributes (e.g., age and income), of which Boolean attributes can be considered a special case [39]. A simple way to handle such attributes is to partition their values into intervals and then map each interval to a Boolean attribute. Data discretization therefore becomes an essential step in mining association rules, transforming quantitative attributes into Boolean attributes. Inspired by symbolic aggregate approximation (SAX, a symbolic representation of sequential data) [40], this paper proposes a new association rule mining method based on equal probability distribution. Based on the distribution characteristics of the data, the method divides the sequence into several intervals using the criterion of equal probability. First, the sequence is standardized, and the normalized sequence is assumed to follow the standard Gaussian distribution, $X \sim N(0, 1)$:
$$B = \frac{A - \mu}{\sigma},$$
where $B$ is the normalized array of $A$, $\mu$ is the mean of the sequence $A$, and $\sigma$ is its standard deviation.
When the data follow the Gaussian distribution, the probability that a data point falls within the range $[a, b]$ is the area under the standard Gaussian curve over that range, as shown in Figure 2:
$$P(a \le X \le b) = \int_a^b \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx.$$
This property allows the data to be graded into levels of equal probability: for $k$ levels, the cut points are chosen so that each interval has probability $1/k$. The data matrix then consists of arrays of features at different levels together with a series of labels that represent the different states.
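Under the Gaussian assumption, equal-probability levels follow from the inverse CDF: as in SAX, the $j$th cut point is $\Phi^{-1}(j/k)$. The Python sketch below uses only the standard library (`statistics.NormalDist` requires Python 3.8 or later). It standardizes a feature array and maps each value to a level from 1 to $k$. Using the population standard deviation (`pstdev`) is an assumption, since the text does not say which estimator is used.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev
from typing import List, Sequence


def equal_probability_breakpoints(k: int) -> List[float]:
    """Cut points so that each of the k intervals has probability 1/k
    under N(0, 1), i.e. Phi^{-1}(j / k) for j = 1..k-1."""
    std_normal = NormalDist(0.0, 1.0)
    return [std_normal.inv_cdf(j / k) for j in range(1, k)]


def discretize_equal_probability(a: Sequence[float], k: int = 10) -> List[int]:
    """Standardize A to B = (A - mu) / sigma, then return levels 1..k."""
    mu, sigma = mean(a), pstdev(a)
    b = [(x - mu) / sigma for x in a]
    cuts = equal_probability_breakpoints(k)
    # bisect_right counts the cut points <= value, giving 0..k-1.
    return [bisect_right(cuts, v) + 1 for v in b]


levels = discretize_equal_probability([1, 2, 3, 4, 5, 6, 7, 8], k=4)
print(levels)  # [1, 1, 2, 2, 3, 3, 4, 4]
```

With $k = 4$, the cut points are about $-0.674$, $0$, and $0.674$, so the eight evenly spaced values are split two per level.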
Next, the association relation between the discretized features and the labeled defect modes is mined to formulate the rules. Finally, the sensitive features related to typical defects are extracted and investigated according to the rules. The presented method supports bearing status assessment directly from sensor measurements instead of relying on the complex models of the traditional fault diagnosis approach.

Feature Extraction and Discretization Effect Assessment Method.
A total of 12 commonly used bearing vibration signal features are extracted, eight from the time domain and four from the frequency domain, as listed in Table 1. These features are often used to describe the waveform, abrupt changes, and distribution characteristics of the bearing vibration signal for bearing fault diagnosis. To compare the discretization performance of different methods and demonstrate the effectiveness of the discretization presented in this paper, the interval class-information entropy criterion is introduced. It reflects the diversity of categories within one interval: the larger the interval class-information entropy, the worse the discretization performance.
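The defining equation of this criterion is not reproduced in this text. As a hedged sketch only, the Python code below assumes the usual Shannon form built from the nomenclature's counts, $H_i = -\sum_j (c_{ij}/r_i)\log_2(c_{ij}/r_i)$ for each interval $i$, and reads $j$ as the defect class. It then averages $H_i$ over the non-empty intervals. The log base, the class reading of $j$, and the unweighted average are all assumptions.

```python
from collections import Counter, defaultdict
from math import log2
from typing import Dict, Sequence


def interval_class_entropy(levels: Sequence[int],
                           labels: Sequence[str]) -> Dict[int, float]:
    """ASSUMED form: Shannon entropy of the class labels inside each interval."""
    by_interval: Dict[int, Counter] = defaultdict(Counter)
    for lvl, lab in zip(levels, labels):
        by_interval[lvl][lab] += 1  # c_ij: samples of class j in interval i
    entropies = {}
    for lvl, counts in by_interval.items():
        r_i = sum(counts.values())  # r_i: total samples in interval i
        entropies[lvl] = 0.0 - sum((c / r_i) * log2(c / r_i)
                                   for c in counts.values())
    return entropies


def mean_interval_class_entropy(levels, labels) -> float:
    """Unweighted mean over non-empty intervals (an assumption)."""
    h = interval_class_entropy(levels, labels)
    return sum(h.values()) / len(h)


h = interval_class_entropy([1, 1, 2, 2], ["Normal", "Normal", "IR", "OR"])
print(h)  # {1: 0.0, 2: 1.0}
print(mean_interval_class_entropy([1, 1, 2, 2], ["Normal", "Normal", "IR", "OR"]))  # 0.5
```

An interval that holds a single class contributes zero entropy. Mixing classes raises the value, which is why a smaller average indicates a cleaner discretization.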

Shock and Vibration 3
To evaluate the proposed interval class-information entropy criterion, the common RMS-based indicator is also used to compare the discretization performance of different methods. Here, $\mathrm{BRI}_i$ denotes the RMS-based discretization performance indicator used to evaluate the different methods. A larger value means that more kinds of data coexist in an interval, that is, the discretization method performs worse.

Experimental Setup and Dataset.
The experimental data were provided by Case Western Reserve University [41], and the experimental setup is shown in Figure 3. A motor drives a shaft via a dynamometer and an electronic control system. The test data used in this study come from deep groove ball bearings (6205-2RS JEM SKF) installed at the drive end of the motor in the motor-driven mechanical system. Single-point faults were seeded on the test bearings by electrodischarge machining (EDM). For each test, vibration data were collected at a sampling rate of 12,000 Hz by accelerometers magnetically attached to the housing at the drive end. The motor speed is 1,797 r/min, so the theoretical shaft frequency is 29.95 Hz. There are three fault types (outer race, inner race, and rolling element faults), and each is grouped into three categories according to fault diameters of 0.007, 0.014, and 0.021 in. Together with the normal state, there are 10 bearing running states in total. With 15 samples per state, the 12 features listed in Table 1 and the state label form a 150 × 13 matrix. The matrix is processed using the equal probability-based discretization technique to divide each feature column into 10 intervals. To validate the performance of the proposed discretization method intuitively, the interval class-information entropy is computed. The results of the discretization methods based on equal probability, equal density, and equal width are listed in Table 2.
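The frequency-domain features $f_{\mathrm{BPFO}}$, $f_{\mathrm{BPFI}}$, and $f_{\mathrm{BSF}}$ depend on the bearing geometry listed in the nomenclature ($f_r$, $N$, $D$, $d$, $\phi$). The Python sketch below uses the standard kinematic formulas. The geometry values (9 balls, ball diameter 0.3126 in, pitch diameter 1.537 in) are the ones commonly listed for the CWRU drive-end 6205-2RS bearing. They are not stated in this paper and are assumptions here, as is $\phi = 0$ for a deep groove ball bearing, so they should be checked against [41]. Some references also define the ball fault frequency as twice the spin frequency computed here.

```python
from math import cos, radians
from typing import Dict


def bearing_defect_frequencies(f_r: float, n_balls: int, d: float, D: float,
                               phi_deg: float = 0.0) -> Dict[str, float]:
    """Standard kinematic defect frequencies (Hz) for a rolling bearing.

    f_r: shaft rotation frequency (Hz); d: rolling element diameter;
    D: pitch diameter (same units as d); phi_deg: contact angle (degrees).
    """
    ratio = (d / D) * cos(radians(phi_deg))
    bpfo = 0.5 * n_balls * f_r * (1 - ratio)          # outer race
    bpfi = 0.5 * n_balls * f_r * (1 + ratio)          # inner race
    bsf = (D / (2 * d)) * f_r * (1 - ratio ** 2)      # ball spin
    return {"BPFO": bpfo, "BPFI": bpfi, "BSF": bsf}


f_r = 1797 / 60  # 29.95 Hz shaft frequency from the test setup
# ASSUMED CWRU drive-end geometry (inches): 9 balls, d = 0.3126, D = 1.537.
freqs = bearing_defect_frequencies(f_r, n_balls=9, d=0.3126, D=1.537)
for name, value in freqs.items():
    print(f"{name}: {value:.2f} Hz")
```

Under these assumptions, the shaft speed of 1,797 r/min gives roughly 107.4 Hz for BPFO, 162.2 Hz for BPFI, and 70.6 Hz for BSF.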
In addition, the RMS-based indicator is used as a reference to evaluate the newly proposed interval class-information entropy criterion. The results of the discretization methods based on equal probability, equal density, and equal width are listed in Table 3.
Figure 4 compares the performance of the different discretization methods as line charts. The average values of both the interval class-information entropy and the RMS-based indicator are smaller for equal probability than for the other two methods, indicating that the proposed method achieves better discretization performance. We also plot the distributions produced by the equal probability discretization method and by the two other methods, based on equal density and equal width, as illustrated in Figure 5. Compared with the other two methods, the distribution produced by equal probability discretization most closely resembles that of the raw data.
In addition, a support vector machine (SVM) is applied to classify the discretized data. Each fault type has 15 sets of data, of which seven are used for training and eight for prediction. Table 4 shows the classification accuracy of the three discretization methods. The equal probability method also achieves the highest classification accuracy.
Evenly dividing methods, such as the equal-density and equal-width approaches, are the most basic discretization techniques. They consider only the data boundaries and neglect the overall distribution of the data, which leads to excessive concentration or dispersion. In contrast, the equal probability-based discretization method follows the equalized probability distribution of the data and naturally divides each feature array into reasonable intervals. It thereby avoids unbalanced intervals and the arbitrariness of manual settings in traditional methods.
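For comparison, the two evenly dividing baselines can be sketched in Python as follows. Reading "equal density" as equal-frequency (quantile) binning is an assumption. In this simple rank-based version, tied values may be split across neighboring intervals, and the equal-width version assumes the array is not constant.

```python
from typing import List, Sequence


def equal_width_levels(a: Sequence[float], k: int) -> List[int]:
    """Split [min, max] into k intervals of equal width; return levels 1..k."""
    lo, hi = min(a), max(a)
    width = (hi - lo) / k  # assumes hi > lo
    # The maximum value would land in interval k + 1, so clamp it to k.
    return [min(int((x - lo) / width), k - 1) + 1 for x in a]


def equal_density_levels(a: Sequence[float], k: int) -> List[int]:
    """Give each interval about len(a) / k samples, assigned by rank."""
    order = sorted(range(len(a)), key=lambda i: a[i])
    levels = [0] * len(a)
    for rank, idx in enumerate(order):
        levels[idx] = rank * k // len(a) + 1
    return levels


skewed = [0.0, 0.1, 0.2, 0.3, 10.0]
print(equal_width_levels(skewed, 2))    # [1, 1, 1, 1, 2]
print(equal_density_levels(skewed, 2))  # [1, 1, 1, 2, 2]
```

On this skewed toy array, equal-width binning puts four of the five values into one interval, which is the excessive concentration described above. Equal-frequency binning balances the counts but ignores the shape of the distribution.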

Representative Feature Mining of Bearing.
The 150 × 13 matrix introduced previously is used for further analysis. The number of intervals is again 10, so each feature is graded into 10 levels. For easier observation and analysis, each interval is transformed into a symbol such as "a1, a2, ..., b1, ..., e10," where the letter represents a specific feature and the number indicates the level of the interval. There are 12 feature columns, so the letters a to l represent the features $f_{\mathrm{BPFO}}$, $f_{\mathrm{BSF}}$, $f_{\mathrm{BPFI}}$, $f$, $x_{\mathrm{MV}}$, $x_{\mathrm{SD}}$, $x_{\mathrm{RMS}}$, $x_{\mathrm{RA}}$, $x_{\mathrm{CF}}$, $x_{\mathrm{S}}$, $x_{\mathrm{K}}$, and $x_{\mathrm{KC}}$, respectively. Thus, the numerical matrix is transformed into a symbolic matrix. The symbolic matrix is then mined with the Apriori algorithm [42] to generate a series of underlying association rules for bearing fault diagnosis. The minimum support and confidence are set at 0.07 and 0.9, respectively. Table 5 presents the mining results.
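This symbolic encoding can be sketched as turning each row of the level matrix into one transaction. The column index becomes a letter, the level becomes a number, and the running-state label is added as an extra item. The label strings below are hypothetical placeholders. Rules useful for diagnosis would then be those whose consequent is a label item, mined with the support and confidence thresholds given above.

```python
from string import ascii_lowercase
from typing import FrozenSet, List, Sequence


def to_transactions(level_matrix: Sequence[Sequence[int]],
                    labels: Sequence[str]) -> List[FrozenSet[str]]:
    """Encode each sample as items like 'a1', 'b3', ... plus its state label."""
    transactions = []
    for row, label in zip(level_matrix, labels):
        # Column 0 -> 'a', column 1 -> 'b', ...; the level is appended.
        items = {f"{ascii_lowercase[col]}{lvl}" for col, lvl in enumerate(row)}
        items.add(label)  # hypothetical label item, e.g. "Normal" or "IR007"
        transactions.append(frozenset(items))
    return transactions


rows = [[1, 3, 10], [2, 3, 9]]
tx = to_transactions(rows, ["Normal", "IR007"])
print(sorted(tx[0]))  # ['Normal', 'a1', 'b3', 'c10']
```

Because level 10 is written as "a10", it remains a distinct item from "a1".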
According to the mining results, a map of the three bearing faults at different fault sizes is drawn to illustrate the range of characteristic values intuitively. As shown in Figure 6, it clearly reflects the sensitivity and relevance between the features and the bearing defects as the fault degree changes. The feature values associated with the normal running state lie in the low-level intervals; that is, the feature amplitudes during fault-free operation are relatively small. For the roller fault, the feature values are close to those of the normal state, except that $f_{\mathrm{BSF}}$ and $x_{\mathrm{RA}}$ increase slightly. In addition, since the feature values of the 0.014 in. diameter faults behave unusually, priority is given to the features of the faults with diameters of 0.007 and 0.021 in. For the inner race defect, the amplitudes of $f_{\mathrm{BSF}}$, $f_{\mathrm{BPFI}}$, $x_{\mathrm{SD}}$, $x_{\mathrm{RMS}}$, and $x_{\mathrm{RA}}$ are much higher than in the normal state. Of these, $f_{\mathrm{BPFI}}$ changes dramatically, while $f_{\mathrm{BSF}}$, $x_{\mathrm{SD}}$, $x_{\mathrm{RMS}}$, and $x_{\mathrm{RA}}$ change steadily. As the defect severity increases, the amplitude increases obviously, and so do $x_{\mathrm{K}}$ and $x_{\mathrm{KC}}$, although less markedly. Similarly, the features $f_{\mathrm{BPFO}}$, $x_{\mathrm{SD}}$, $x_{\mathrm{RMS}}$, $x_{\mathrm{RA}}$, and $x_{\mathrm{CF}}$ are more sensitive to the outer race defect. Among them, $f_{\mathrm{BPFO}}$ fluctuates remarkably and is extracted as the representative feature. It is also found that, as the outer race defect severity increases, the related features fluctuate distinctly instead of increasing steadily. The features $f_{\mathrm{BSF}}$ and $x_{\mathrm{RA}}$ are selected as the ball (rolling element) defect-related features, and $f_{\mathrm{BSF}}$, which fluctuates regularly, is chosen as the representative feature. The representative features for the different bearing defects are listed in Table 6. The ranges of feature values for the 0.007 and 0.021 in. faults are then mined, as presented in Table 7. These ranges can be used to determine the type and size of a fault from the respective values of the features.

Fuzzy proximity is applied to validate the classification effectiveness of the rules mined by the proposed method, and the analysis results are shown in Figure 7. The fuzzy proximity classification accuracy for each type of bearing defect is generally higher with the proposed method than with the other discretization methods. Figure 7(b) shows the average accuracy over the 10 types of bearing defects. The proposed method achieves the highest average accuracy, 98.67%, which demonstrates the effectiveness of equal probability-based association rule mining.
According to Table 7, nine representative features related to the various defects can be selected from the 12 features. These nine features are used as the indicators for data classification, with SVM and a BP neural network as the classifiers. For comparison, the traditional dimensionality reduction methods PCA, KPCA, and LLE are used to reduce the data dimension from 12 to 9 on the three types of fault datasets (outer race, inner race, and rolling element faults). The classification results are shown in Table 8. The representative features selected by the proposed method achieve superior performance compared with the traditional feature selection methods based on data transformation.

Conclusions
To improve the reliability of rotary machinery, effective and efficient diagnosis methods are highly needed. This paper presents a new equal probability-based association rule mining method that directly uncovers the underlying relation between labeled defects and abnormal features for bearing fault analysis. To overcome the shortcomings of the traditional evenly dividing methods used in association rule mining, the approach is built on an equal probability discretization method that avoids excessively concentrated or dispersed data. First, a series of features extracted from signal data are discretized following the guideline of an equalized probability distribution of the data. Then, the data matrix composed of arrays of discretized features and defect labels is exploited to generate association rules that represent the relation between features and fault types. The rules are used for bearing fault diagnosis and support systematic bearing defect signature analysis. In the experimental study, the proposed method is compared with two evenly dividing discretization methods. Moreover, as a new feature selection method, it does not transform the data matrix and therefore preserves the physical meaning of the selected features, unlike the traditional PCA, KPCA, and LLE dimensionality reduction methods. From the analysis and comparison results, the following conclusions can be drawn:
(1) Discretization is the most important step in quantitative association rule mining. Two types of methods, the interval class-information entropy criterion and SVM classification, are used to assess discretization performance, and both show that the equal probability method is clearly superior.
(2) Some features, which can be called representative features, are sensitive to particular bearing defects, such as the ball pass frequency of the inner race ($f_{\mathrm{BPFI}}$) to the inner race fault. The feature map drawn by the proposed method intuitively illustrates the sensitivity and relevance between the features and the bearing defects as the fault degree changes. However, there are also special circumstances, such as the mining result for the rolling element defect. For this defect there is no obvious characteristic, apart from an inconspicuous relationship between the ball (roller) spin frequency and the root amplitude, which is partly due to the complex rolling mechanism. The proposed method thus also offers a new approach to feature selection that preserves the physical meaning of the selected features without transforming the data matrix.
(3) The effectiveness of the proposed method is confirmed by the fuzzy proximity method in comparison with common evenly dividing discretization methods, and the presented method achieves the highest classification accuracy. In addition, the proposed method achieves superior performance compared with the traditional feature selection methods based on data transformation.

Figure 1: The framework of association rule mining for bearing defect diagnosis.

Figure 4: The performance comparison based on different discretization methods. (a) The average value of the interval class-information entropy. (b) The average value of the RMS-based indicator.

Figure 5: Performance comparison of features from time and frequency domains.

Figure 7: The classification accuracies of defect types using the fuzzy proximity method. (a) The classification accuracy of each type of bearing defect and (b) the average accuracy of 10 types of bearing defects.
$x_i$: Value of the $i$th point in the sequence
$n$: Length of the time series
$f_r$: Shaft speed
$N$: Number of rolling elements
$D$: Bearing pitch diameter
$d$: Rolling element diameter
$\phi$: Angle of the load from the radial plane
$k$: Number of intervals
$r_i$: Total number of samples in the $i$th interval
$c_{ij}$: Number of samples in the $i$th interval with the $j$th eigenvalue

Table 1: List of extracted features.

Table 2: Interval class-information entropy based on different discretization methods.

Table 3: The indicator of the RMS-based criterion based on different discretization methods.

Table 4: Classification accuracy of SVM based on different discretization methods.

Table 5: Results of mining association rules.

Table 6: Representative features of different defects.

Table 7: The range of feature values for different fault sizes.

Table 8: Performance comparison of fault classification with different feature selection schemes.