The selection of a smaller set of more representative features from multidimensional features is important when an artificial neural network (ANN) is used as a classifier. In this paper, a new feature selection method called the mean impact variance (MIVAR) method is proposed to determine which features are more suitable for classification. The method is built on the training process of the ANN. To verify its effectiveness, the MIVAR value is used to rank the multidimensional features in a bearing fault diagnosis task. In detail, (1) 70-dimensional all waveform features are extracted from a rolling bearing vibration signal covering four different operating states, (2) the MIVAR values of all 70 features are calculated to rank the features, (3) 14 groups of 10-dimensional features are generated from the ranking results and from the principal component analysis (PCA) algorithm, and a back propagation (BP) network is constructed, and (4) the validity of the ranking is proven by training this BP network with the 14 groups of 10-dimensional features and comparing the corresponding recognition rates. The results show that features with larger MIVAR values lead to higher recognition rates.
Feature extraction is a key factor in pattern recognition because only sufficient and effective features can describe a given sample comprehensively and thereby differentiate between classes [
Feature selection is a necessary preprocessing step between feature extraction and pattern recognition. Its main purpose is to choose the more sensitive features from the original multidimensional features as a subset that maintains the same recognition ability. To achieve this goal, several algorithms have been proposed based on principal component analysis (PCA), artificial neural networks (ANNs), genetic algorithms (GAs), support vector machines (SVMs), and pattern recognition theory. The PCA algorithm is the most common linear dimensionality reduction algorithm; it maps multidimensional features into a space of lower dimension. Reference [
In this study, a method called the mean impact variance (MIVAR) method is constructed to determine which features are more sensitive for classification. The method is applied after the BP network training step by perturbing the magnitude of each feature separately. A feature with a larger MIVAR value is considered the better choice when the BP network is used as the classifier. To verify the effectiveness of this method, we use it to rank multidimensional time-domain features and select more representative features for a bearing fault diagnosis.
The rest of the paper is organized as follows: Section
MIVAR is a new method that can be used to select more representative features from multidimensional features. To specify the algorithm in detail,
First,
The
The network is simulated with these
The absolute value of the difference between the
The process is repeated from Step
The process is repeated from Step
The variance of the four MIVs of each feature is calculated for the four different states, and a method called MIVAR, which represents the fluctuation in the MIVs, is obtained. Consider
MIVAR is a proposed method that can determine which features are more suitable for classification. Thus, features with larger MIVAR values should be selected for the final classification.
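The per-feature MIV computation in the steps above can be sketched as follows. This is a minimal illustration: the `simulate` function is a stand-in for the trained BP network, and the 10% perturbation and toy dimensions are assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for the trained BP network: a fixed random layer with a
# sigmoid activation (in the paper this would be the trained classifier).
W = rng.standard_normal((5, 3))

def simulate(X):
    return 1.0 / (1.0 + np.exp(-(X @ W)))

def mean_impact_values(X, delta=0.1):
    """MIV of each feature: scale that feature up and down by `delta`,
    simulate the network on both copies, and average the output change."""
    miv = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        up, down = X.copy(), X.copy()
        up[:, j] *= 1.0 + delta
        down[:, j] *= 1.0 - delta
        miv[j] = np.mean(simulate(up) - simulate(down))
    return miv

X = rng.standard_normal((20, 5))      # toy data: 20 samples, 5 features
miv = mean_impact_values(X)
```

Repeating this computation once per operating state yields the per-state MIV vectors whose variance defines the MIVAR.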
In this paper, the effectiveness of the MIVAR-based feature selection algorithm is proven by selecting more representative features for a bearing fault diagnosis using the data from the Bearing Data Center of Case Western Reserve University [
The raw signal is rounded to the nearest hundredth, and the original signal data are divided into
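This preprocessing can be sketched as follows; the signal here is random stand-in data, and the segment length of 1000 points is an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
signal = rng.standard_normal(12000)   # stand-in vibration signal

# Round to the nearest hundredth, as in the preprocessing step.
signal = np.round(signal, 2)

# Split into fixed-length, non-overlapping samples; the segment
# length (1000 points) is an assumption.
seg_len = 1000
n_segs = len(signal) // seg_len
segments = signal[:n_segs * seg_len].reshape(n_segs, seg_len)
```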
The
Probability density curves for four different running states: (a) NO, (b) IR, (c) RE, and (d) OR.
New features are extracted on the basis of the probability density curve. The corresponding
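Probability-density-based features of this kind can be estimated, for example, with a normalized histogram. This is a sketch under assumptions: the bin count (50) and the particular curve features (peak height, peak location, spread) are illustrative choices, not necessarily the features used in the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
segment = rng.standard_normal(1000)   # one stand-in signal segment

# Estimate the probability density with a normalized histogram.
density, edges = np.histogram(segment, bins=50, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
width = np.diff(edges)[0]

# Example curve-based features: peak height, peak location, and the
# spread of the density curve about its mean.
peak_height = density.max()
peak_location = centers[density.argmax()]
mu = np.sum(centers * density) * width
spread = np.sqrt(np.sum(density * (centers - mu) ** 2) * width)
```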
Schematic diagram for conceptual explanation.
While extracting the all waveform features of the training and testing sets, we find that the maximum value of
All waveform features with 70 dimensions.
In this section, the MIVAR-based feature selection algorithm is validated by ranking the aforementioned 70-dimensional all waveform features.
First, a network with a structure of
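A minimal numpy sketch of training such a BP network on toy data is given below. The 70 input dimensions and 4 output classes follow the paper; the random data, hidden-layer size (12), learning rate, and epoch count are assumptions, and in the paper the inputs would be the extracted waveform features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in data: 40 samples of 70-dimensional features, 4 classes.
X = rng.standard_normal((40, 70))
y = rng.integers(0, 4, size=40)
T = np.eye(4)[y]                          # one-hot targets

# One sigmoid hidden layer; the hidden size (12) is an assumption.
W1 = 0.1 * rng.standard_normal((70, 12)); b1 = np.zeros(12)
W2 = 0.1 * rng.standard_normal((12, 4));  b2 = np.zeros(4)

def forward(X):
    H = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))   # hidden activations
    O = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))   # network outputs
    return H, O

def sse(O):
    return float(np.sum((O - T) ** 2))         # squared-error loss

loss_before = sse(forward(X)[1])
lr = 0.5
for _ in range(500):                      # plain batch back propagation
    H, O = forward(X)
    dO = (O - T) * O * (1 - O)            # squared-error + sigmoid grads
    dH = (dO @ W2.T) * H * (1 - H)
    W2 -= lr * H.T @ dO / len(X); b2 -= lr * dO.mean(axis=0)
    W1 -= lr * X.T @ dH / len(X); b1 -= lr * dH.mean(axis=0)
loss_after = sse(forward(X)[1])
```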
Feature MIVs in different states: (a) NO, (b) IR, (c) OR, and (d) RE.
In Figure
Top three MIV sequence numbers for different classes.
Ordinal number | Top NO sequence numbers | Top IR sequence numbers | Top RE sequence numbers | Top OR sequence numbers |
---|---|---|---|---|
1 | 33 | 1 | 1 | 1 |
2 | 32 | 2 | 3 | 2 |
3 | 9 | 10 | 6 | 3 |
In Table
Second, the corresponding MIVARs of every feature are calculated by (
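Given the four per-state MIV vectors, the MIVAR of each feature is its variance across the states, and features are then ranked by descending MIVAR. The numbers below are illustrative only, not values from the paper.

```python
import numpy as np

# Per-state MIVs: rows = four operating states (NO, IR, RE, OR),
# columns = features. Illustrative numbers only.
miv = np.array([[0.8, 0.1, 0.3],
                [0.1, 0.1, 0.9],
                [0.5, 0.1, 0.1],
                [0.2, 0.1, 0.7]])

# MIVAR of each feature: variance of its MIV across the four states.
mivar = miv.var(axis=0)

# Rank features by descending MIVAR; larger values are preferred.
ranking = np.argsort(mivar)[::-1]
```

Here the second feature has the same MIV in every state, so its MIVAR is zero and it is ranked last; the third feature fluctuates the most across states and is ranked first.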
MIVAR histogram of every feature.
In Figure
MIVAR ranks of 70-dimensional features.
Ranking numbers | Sequence numbers (in rank order) |
---|---|
1–10 | 33, 1, 32, 9, 3, 38, 4, 28, 35, 19 |
11–20 | 40, 10, 2, 11, 23, 7, 47, 12, 6, 31 |
21–30 | 17, 8, 22, 29, 25, 34, 13, 46, 16, 49 |
31–40 | 44, 39, 14, 20, 15, 43, 30, 41, 27, 42 |
41–50 | 24, 50, 58, 18, 21, 5, 57, 37, 45, 52 |
51–60 | 26, 54, 55, 48, 53, 36, 51, 59, 61, 62 |
61–70 | 56, 60, 63, 64, 66, 67, 65, 68, 70, 69 |
Third, several comparisons are presented to prove the validity of the ranking results by constructing 14 groups of features as follows:

- Group 1: Features 33, 1, 32, 9, 3, 38, 4, 28, 35, and 19, whose sequence numbers are the top ten;
- Group 2: Features 40, 10, 2, 11, 23, 7, 47, 12, 6, and 31, whose sequence numbers are the second ten;
- Group 3: Features 17, 8, 22, 29, 25, 34, 13, 46, 16, and 49, whose sequence numbers are the third ten;
- Group 4: Features 44, 39, 14, 20, 15, 43, 30, 41, 27, and 42, whose sequence numbers are the fourth ten;
- Group 5: Features 24, 50, 58, 18, 21, 5, 57, 37, 45, and 52, whose sequence numbers are the fifth ten;
- Group 6: Features 26, 54, 55, 48, 53, 36, 51, 59, 61, and 62, whose sequence numbers are the sixth ten;
- Group 7: Features 56, 60, 63, 64, 66, 67, 65, 68, 70, and 69, whose sequence numbers are the bottom ten;
- Groups 8–14: newly constructed 10-dimensional features based on the PCA algorithm, using the components with the top ten, second ten, third ten, fourth ten, fifth ten, sixth ten, and bottom ten scores, respectively.
In detail, 14 new training sets and testing sets are generated to train and test a newly constructed network with the structure
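The PCA-based groups (Groups 8 to 14) can be sketched with a plain SVD-based PCA. The data below are random stand-ins and the sample count is an assumption; only the 70 feature dimensions and the seven 10-dimensional groups follow the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((80, 70))     # stand-in: 80 samples x 70 features

# PCA via SVD on the centered data; the "scores" are the projections
# onto the principal components, ordered by explained variance.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                    # shape (80, 70)

# Seven 10-dimensional groups from the PCA scores: top ten,
# second ten, ..., bottom ten components.
pca_groups = [scores[:, 10 * i:10 * (i + 1)] for i in range(7)]
```

Because the components are ordered by explained variance, the first group carries the most variance and the last group the least, mirroring the top-ten-to-bottom-ten construction in the comparison.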
Recognition rate of different groups.
Network input | Group 1 | Group 2 | Group 3 | Group 4 | Group 5 | Group 6 | Group 7 |
---|---|---|---|---|---|---|---|
Recognition rate (%) | 98 | 95 | 90 | 75 | 73 | 50 | 25 |

Network input | Group 8 | Group 9 | Group 10 | Group 11 | Group 12 | Group 13 | Group 14 |
---|---|---|---|---|---|---|---|
Recognition rate (%) | 90 | 25 | 25 | 25 | 25 | 25 | 25 |
According to the comparison results listed in Table
Recognition rate of MIVAR-based features.
Recognition rate of PCA-based features.
Last, we display the 70-dimensional all waveform features in the order of the corresponding MIVAR value in Figure
Ranking results of all 70-dimensional all waveform features.
In this paper, a MIVAR method was proposed to determine the feature that is more suitable for ANN-based classification. The MIVAR values of all the features were calculated by changing the input vectors and then measuring the differences of the output vectors after the training process of the BP network. It was proven that using the features with higher MIVAR values can lead to higher recognition rates.
As an example, the 70-dimensional all waveform features of a rolling bearing vibration signal were ranked based on the MIVAR method. The features with the ten largest MIVAR values led to a recognition rate of 98%, and the corresponding recognition rates for the second, third, fourth, fifth, sixth, and seventh groups of ten were 95%, 90%, 75%, 73%, 50%, and 25%, respectively. This steadily decreasing recognition rate demonstrates the effectiveness of the MIVAR method. To compare the MIVAR method with a traditional algorithm, the PCA algorithm was then used to generate seven groups of 10-dimensional features (Groups 8 to 14). The 10-dimensional features with the top ten PCA scores led to a recognition rate of 90%, which is lower than that of Groups 1 and 2.
In addition, it should be pointed out that the discussion is limited to the use of time-domain features to describe a steady vibration signal. The MIVAR algorithm, however, can also be extended to the selection of frequency-domain features.
The authors declare that there is no conflict of interest regarding the publication of this paper.
This work is supported in part by the National Natural Science Foundation of China (51275030) and the Fundamental Research Funds for the Central Universities (M11JB00210).