A Novel Clustering Method Combining ART with Yu's Norm for Fault Diagnosis of Bearings

Clustering methods have been widely applied to the fault diagnosis of mechanical systems, but the need to determine the number of clusters in advance limits their range of application. In this paper, a novel clustering method that combines the adaptive resonance theory (ART) with a similarity measure based on Yu's norm is presented and applied to the fault diagnosis of rolling element bearings. The method generates the number of clusters adaptively through the vigilance parameter test. Time-domain features, frequency-domain features, and time series model parameters are extracted to represent the fault-related information of the bearings. Because some features are irrelevant or redundant, salient features are selected by an improved distance discriminant technique and input into the proposed clustering method to diagnose the faults of the bearings. The experimental results confirm that the proposed clustering method diagnoses the fault categories accurately and has better diagnosis performance than fuzzy ART and the Self-Organizing Feature Map (SOFM).


Introduction
In order to decrease the downtime of production machinery and to increase reliability against possible failures, some important machinery is equipped with condition monitoring systems, but classifying the data samples collected by such systems intelligently remains challenging. Artificial neural networks (ANNs), used as an intelligent classification tool, have been widely applied to the fault diagnosis of machine conditions, which is treated as a classification problem based on learning patterns from empirical data in complex mechanical processes and systems [1]. In the diagnosis process, however, some ANNs are unable to detect unexpected fault changes. In other words, if an ANN has not learned a fault category during training, the trained network cannot identify it when that fault category occurs. In this case, the network must be retrained on the complete data set, and the trained network model must be modified to learn the new category. Thus, the previously learned knowledge is forgotten and the memories of prior training are destroyed, which makes the process time-consuming and costly. Moreover, in reality it is impossible to obtain a data set that represents the features of all fault categories. These characteristics limit the application of these neural networks in the fault diagnosis field. To solve this problem, the adaptive resonance theory (ART) network has been developed and applied to pattern recognition and fault diagnosis. It is designed according to the adaptive resonance theory to overcome the stability-plasticity dilemma; that is, its learning system is able to protect useful historical data from corruption (stability) while simultaneously learning new data (plasticity) [2,3]. Owing to the advantages of the ART network, ART models such as ART2 and fuzzy ART have been developed and applied in specific fields [4][5][6]. Furthermore, building on the advantages of ART, classification methods that combine ART with other neural networks, such as the CNN-ART algorithm [7] and the ART-KOHONEN neural network [8], have been proposed for fault diagnosis; these methods can expand their knowledge continuously without losing previous knowledge while learning new knowledge.
Currently, clustering methods have been widely studied and applied to fault diagnosis because they do not depend on a supervisor. Following the principle that similar objects belong to the same cluster and dissimilar objects belong to different clusters, most clustering methods employ similarity and distance measures to partition a data set into several clusters, such as $k$-nearest neighbor (KNN) and fuzzy $c$-means (FCM) [9][10][11]. Recently, a novel clustering method using a similarity measure derived from Yu's norm was also developed to classify medical data sets [12]; it can deal with uncertainty through the fuzzy formalism. With these methods, however, the number of cluster nodes must be determined before classification, and in the real world it is rather hard to predict the number of classes. This is especially true in industrial applications, where mechanical equipment operates in a dynamic environment and under the influence of numerous uncertain factors. These characteristics limit the application range of these clustering methods.
To the best of our knowledge, the clustering method using the similarity measure based on Yu's norm has seldom been applied to the fault diagnosis of mechanical systems. In fault diagnosis applications, when the samples of different fault classes overlap in some regions of the feature space, the traditional hard (crisp) clustering method mainly uses distance to compare a data sample with the fault classes and assigns the sample to one and only one cluster [13], which can result in misclassification. In fuzzy clustering, by contrast, a data sample belongs to a cluster with a certain grade of membership, and a clustering method based on a similarity measure makes its classification decision by comparing how similar the data sample is to the class vectors. Therefore, it is preferable to use the similarity measure based on Yu's norm to develop a new diagnostic method that learns from the process data stream by identifying the different fault categories automatically; namely, the diagnostic method adaptively generates the number of cluster nodes according to the number of faults in real time. On this basis, a novel clustering method that combines ART with the similarity measure based on Yu's norm (ART-similarity) is proposed to diagnose the faults of rolling element bearings.
The rest of the paper is organized as follows. Section 2 reviews the adaptive resonance theory and the clustering method using the similarity measure based on Yu's norm. Section 3 describes the proposed ART-similarity clustering method based on Yu's norm. Section 4 explains the diagnosis system using the proposed clustering method; namely, many feature parameters are extracted through signal processing methods to depict the fault-related information, and salient features are selected by the modified distance discriminant technique and input into the ART-similarity clustering method to diagnose the faults of bearings. Finally, the conclusions about the proposed clustering method in the field of fault diagnosis are given in Section 5.

Review of ART and Clustering Method Using Similarity Measure Based on Yu's Norm
Adaptive resonance theory (ART), which is treated as a theory of human cognitive information processing, was designed by Carpenter and Grossberg in 1976. The ART network is an online learning system that mainly uses self-organization to develop a stable and plastic clustering of input samples; namely, the ART network uses the vigilance test to resolve the stability-plasticity dilemma. In the learning process, when a new sample is input into the ART network, the network attempts to categorize the sample by comparing it with the stored weight vectors of the existing cluster nodes, each of which represents a category. If the sample is similar to an existing category and the match degree is greater than or equal to the vigilance value, the sample is classified into that category and the weight vector of the corresponding cluster node is modified. Otherwise, a new category is created without affecting the existing memory. The detailed dynamics and algorithm can be found in [4].
Because a fuzzy relation can be regarded as a similarity relation [14], clustering methods using similarity measures based on fuzzy set theory have been developed. In 2007, Luukka constructed a similarity measure based on Yu's norm and presented the corresponding clustering method [12]. Yu's norms in fuzzy logic, namely, Yu's $t$-norm and Yu's $s$-norm, are written as follows, respectively:

$$T(x, y) = \max\left(0, (1 + \lambda)(x + y - 1) - \lambda x y\right),$$

$$\mathrm{Sn}(x, y) = \min\left(1, x + y + \lambda x y\right),$$

in which $\lambda > -1$. According to the equivalence relations from $t$-norms, $s$-norms, and negation [7], the similarity measure can be derived as follows:

$$S(x, y) = T\left(\mathrm{Sn}(\bar{x}, y), \mathrm{Sn}(x, \bar{y})\right).$$
Thus, the data samples can be clustered by the similarity measure. The clustering algorithm is described as follows. Assume that a set $X$ of objects needs to be classified into $N$ different classes $C_1, \dots, C_N$ by $M$ different feature parameters that describe the objects, and that these feature parameters are normalized so that the objects are vectors in $[0, 1]^M$. The weight vector $v_i = (v_i(1), \dots, v_i(M))$ ($i = 1, \dots, N$), which represents class $i$, is obtained by calculating the arithmetic mean of a sample set $X_i$ known to belong to class $i$. Once the weight vectors are determined, the class to which a sample vector $x \in X$ belongs is decided by comparing it with the weight vector of each class. The comparison is made by computing the similarity degree based on Yu's norm, which can be written as follows:

$$S\langle x, v \rangle = \max\left(0, (1 + \lambda)\left(\mathrm{Sn}(\bar{x}, v) + \mathrm{Sn}(x, \bar{v}) - 1\right) - \lambda\, \mathrm{Sn}(\bar{x}, v)\, \mathrm{Sn}(x, \bar{v})\right),$$

where $\bar{x} = 1 - x$ for all $x \in [0, 1]^M$. The parameter $\lambda > -1$ affects the classification accuracy, and in general the smaller $\lambda$ is, the higher the classification accuracy. Without loss of generality, $\lambda$ is set to 0.6 here. Thus, the sample $x$ is assigned to class $C_i$ when

$$S\langle x, v_i \rangle = \max_{j = 1, \dots, N} S\langle x, v_j \rangle.$$
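The similarity computation and the class decision described in this section can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the standard forms of Yu's $t$-norm and $s$-norm are assumed, and the per-feature similarities are averaged into one degree (an assumption, since the aggregation over features is not spelled out in the text).

```python
def yu_t_norm(a, b, lam):
    """Yu's t-norm (assumed standard form), defined for lam > -1."""
    return max(0.0, (1.0 + lam) * (a + b - 1.0) - lam * a * b)


def yu_s_norm(a, b, lam):
    """Yu's s-norm (assumed standard form), defined for lam > -1."""
    return min(1.0, a + b + lam * a * b)


def yu_similarity(x, v, lam=0.6):
    """Similarity degree between sample x and weight vector v in [0, 1]^M.

    Assumption: per-feature similarities are averaged over the M features.
    """
    per_feature = [
        yu_t_norm(yu_s_norm(1.0 - xi, vi, lam), yu_s_norm(xi, 1.0 - vi, lam), lam)
        for xi, vi in zip(x, v)
    ]
    return sum(per_feature) / len(per_feature)


def class_mean(samples):
    """Weight vector of a class: arithmetic mean of its known samples."""
    return [sum(column) / len(samples) for column in zip(*samples)]


def classify(x, weight_vectors, lam=0.6):
    """Return the index of the class whose weight vector is most similar to x."""
    scores = [yu_similarity(x, v, lam) for v in weight_vectors]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical two-class example with two normalized features.
weights = [
    class_mean([[0.1, 0.2], [0.2, 0.1]]),
    class_mean([[0.8, 0.9], [0.9, 0.8]]),
]
predicted_class = classify([0.15, 0.2], weights)
```

Note that for positive $\lambda$ the measure saturates at 1 for samples that are close to a weight vector, which is consistent with the very high vigilance values used later in the paper.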

ART-Similarity Clustering Method Based on Yu's Norm
The ART neural network and the clustering method using the similarity measure based on Yu's norm each have their own advantages, as noted above, and the proposed ART-similarity clustering method combines them. Its architecture, shown in Figure 1, is similar to that of fuzzy ART, excluding the adaptive filter, and it adaptively adjusts the number of clusters through the vigilance parameter $\rho$. It mainly comprises an input layer that stores the input samples, a comparison layer that receives bottom-up input from the input layer and top-down input from the discernment layer, and a discernment layer that contains the active category and stores the category nodes. Each normalized input vector $I$ is written as a $2M$-dimensional vector $(x, 1 - x)$, where $x$ is the normalized data sample and $M$ is its dimension. The weight vector associated with the $j$th category node in the discernment layer is denoted $w_j$ ($j = 1, \dots, N$). Since the proposed clustering method is based on a similarity measure, each generated category is a merger of similar samples. Initially, the weight vector is set to the zero vector and the number of category nodes is set to zero. For every input sample except the first, categorization with the ART-similarity clustering method proceeds through category choice, vigilance test, and learning.
(1) Category Choice. The purpose of this stage is to select the winner category node from all existing category nodes. The choice function is the similarity measure between the $j$th category and the $i$th input sample, by (4), which is represented as $S\langle I_i, w_j \rangle$, where $I_i$ and $w_j$ are the $i$th input sample and the weight vector of the $j$th category node, respectively. The winner category $J$, which has the biggest similarity degree, is selected by the following equation: $J = \arg\max_{j} S\langle I_i, w_j \rangle$.
(2) Vigilance Test. To determine whether the $i$th input sample matches the selected winner category $J$, a vigilance parameter $\rho$, introduced as the evaluation criterion of similarity, is used to test the similarity degree to which $I_i$ is a subset of category $w_J$. If the similarity degree meets the vigilance criterion $S\langle I_i, w_J \rangle \geq \rho$, where $\rho \in [0, 1]$, the input sample $I_i$ is sufficiently similar to the selected winner category $w_J$; the sample is then classified into the $J$th category and learning is performed. Otherwise, a new category node is generated in the discernment layer to contain the input sample, and the weight vector of the new category is given by the following formula: $w_{N+1} = I_i$, where $N$ is the total number of current category nodes.
(3) Learning. When the selected category satisfies the vigilance criterion, the weight vector of the current category is updated (learning) by the following equation: where $w_J'$ is the updated weight vector of category $J$, $w_J$ is the original weight vector of category $J$, and $n$ is the number of samples that belong to category $J$.
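The three stages above (category choice, vigilance test, learning) can be sketched as a single online loop. This is an illustrative sketch rather than the authors' code: the function names are hypothetical, the similarity uses the assumed standard forms of Yu's norms averaged over features, a new node is assumed to be initialized with the sample itself, and the learning step is assumed to be a running mean of the samples in the category, since the update equation is not available in the text.

```python
def yu_similarity(x, w, lam=0.6):
    """Mean per-feature similarity derived from Yu's norms (assumed forms)."""
    total = 0.0
    for xi, wi in zip(x, w):
        a = min(1.0, (1.0 - xi) + wi + lam * (1.0 - xi) * wi)  # s-norm of (1 - x, w)
        b = min(1.0, xi + (1.0 - wi) + lam * xi * (1.0 - wi))  # s-norm of (x, 1 - w)
        total += max(0.0, (1.0 + lam) * (a + b - 1.0) - lam * a * b)  # t-norm
    return total / len(x)


def art_similarity_cluster(samples, rho, lam=0.6):
    """Cluster samples in [0, 1]^M online; returns (labels, weights)."""
    weights, counts, labels = [], [], []
    for x in samples:
        coded = list(x) + [1.0 - xi for xi in x]  # complement coding (x, 1 - x)
        winner = None
        if weights:
            # Category choice: the node with the largest similarity degree wins.
            sims = [yu_similarity(coded, w, lam) for w in weights]
            best = max(range(len(sims)), key=sims.__getitem__)
            # Vigilance test: accept the winner only if it is similar enough.
            if sims[best] >= rho:
                winner = best
        if winner is None:
            # No node passes the test: create a new category from the sample.
            weights.append(coded)
            counts.append(1)
            winner = len(weights) - 1
        else:
            # Learning (assumed rule): running mean of the samples in the category.
            n = counts[winner]
            weights[winner] = [
                (n * wi + ci) / (n + 1) for wi, ci in zip(weights[winner], coded)
            ]
            counts[winner] = n + 1
        labels.append(winner)
    return labels, weights


# Hypothetical data: two tight groups of normalized two-feature samples.
data = [[0.1, 0.1], [0.12, 0.11], [0.9, 0.9], [0.88, 0.91]]
labels, weights = art_similarity_cluster(data, rho=0.9)
```

Raising `rho` makes the vigilance test stricter, so more category nodes are created; lowering it merges more samples into existing nodes.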

Diagnosis System Using ART-Similarity Clustering Method
The test bearing is a 6205-2RS JEM SKF deep groove ball bearing. Single point defects are introduced into the drive-end bearing of the motor by electrodischarge machining. Defects of four different diameters (0.007, 0.014, 0.021, and 0.028 inch) are introduced into the balls to simulate different fault severities; a defect of 0.014 inch diameter is introduced into the inner race and into the outer race, respectively, to simulate different fault categories. The depth of all defects is 0.011 inch. Each bearing is tested under four different loads (0, 1, 2, and 3 hp) at a speed of approximately 1800 rpm. Thus, bearing data sets are obtained from the experimental system under different operating loads and seven different conditions: (1) normal condition; (2) outer race fault; (3) inner race fault; and (4)-(7) ball faults of four different severity levels.

Features Extraction.
Feature parameters are mainly used to depict the fault-related information of the bearings. To acquire more information, many different feature parameters are extracted from the vibration signals.
Statistical feature parameters in the time domain and the frequency domain are often used to characterize the shape of a vibration signal from different perspectives. In this study, nine time-domain feature parameters and seven frequency-domain feature parameters are extracted and used as the basis for the fault diagnosis of bearings; they are listed in Table 1.
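As an illustration of this kind of feature extraction, the following sketch computes several time-domain statistics that are commonly used for vibration signals. The selection of statistics and the function name are assumptions made for illustration; they are not necessarily the nine parameters of Table 1, and the sketch assumes a signal that is neither constant nor all zeros.

```python
import math


def time_domain_features(x):
    """Common time-domain statistics of a signal given as a list of floats.

    Assumption: these are generic textbook statistics, not the exact
    parameters listed in Table 1 of the paper.
    """
    n = len(x)
    mean = sum(x) / n
    abs_mean = sum(abs(v) for v in x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)  # population form
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    return {
        "mean": mean,
        "std": std,
        "rms": rms,
        "peak": peak,
        "skewness": sum((v - mean) ** 3 for v in x) / (n * std ** 3),
        "kurtosis": sum((v - mean) ** 4 for v in x) / (n * std ** 4),
        "crest_factor": peak / rms,
        "shape_factor": rms / abs_mean,
        "impulse_factor": peak / abs_mean,
    }


# Hypothetical signal: a short square wave.
features = time_domain_features([1.0, -1.0, 1.0, -1.0])
```

Each returned value would become one entry of the feature vector before normalization to $[0, 1]$.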
A time series model can characterize the dynamic process of a mechanical system. Because the model parameters are sensitive to the shape of the vibration data, they are also used as feature parameters to represent the fault-related information of the bearing. The autoregressive (AR) model, which is the basic time series model, can work as a predictor; its basic expression can be written as follows:

$$\hat{x}_t = \sum_{i=1}^{p} a_i x_{t-i},$$

where $x_{t-1}, x_{t-2}, \dots, x_{t-p}$ are the $p$ previous samples, $\hat{x}_t$ is the predicted sample of the signal $x_t$, and $a_1, a_2, \dots, a_p$ are the AR model parameters. The algorithm used to obtain these AR model parameters is described in detail in [16]. Here the parameter $p$ is set to 16.
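One way to estimate AR parameters is sketched below. The paper obtains them with the algorithm of [16], which is not reproduced here; this sketch swaps in the Yule-Walker equations solved by the Levinson-Durbin recursion, a standard alternative. The function names and the simulated test signal are hypothetical, and the sketch assumes a non-constant signal.

```python
import random


def autocovariance(x, lag):
    """Biased sample autocovariance of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n)) / n


def ar_coefficients(x, order):
    """Estimate a_1..a_p of x_t ~ sum_i a_i * x_{t-i} (Yule-Walker, Levinson-Durbin)."""
    r = [autocovariance(x, k) for k in range(order + 1)]
    a = [0.0] * (order + 1)  # a[1..order] hold the coefficients
    err = r[0]               # prediction error variance
    for k in range(1, order + 1):
        acc = r[k] - sum(a[j] * r[k - j] for j in range(1, k))
        refl = acc / err     # reflection coefficient of stage k
        updated = a[:]
        updated[k] = refl
        for j in range(1, k):
            updated[j] = a[j] - refl * a[k - j]
        a = updated
        err *= 1.0 - refl * refl
    return a[1:]


def simulate_ar(coeffs, n, seed=0):
    """Generate n samples of an AR process with unit-variance Gaussian noise."""
    rng = random.Random(seed)
    x = [0.0] * len(coeffs)
    for _ in range(n):
        x.append(sum(c * x[-i - 1] for i, c in enumerate(coeffs)) + rng.gauss(0.0, 1.0))
    return x[len(coeffs):]


# Hypothetical check: recover the coefficient of a simulated AR(1) process.
estimated = ar_coefficients(simulate_ar([0.8], 5000), order=1)
```

With `order=16`, the 16 returned coefficients would play the role of the AR feature parameters described above.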
Thus, an original feature set containing 32 feature parameters is obtained, which preserves fault-related information covering the time domain, the frequency domain, and the time series model.

Features Selection.
When the above 32 features are used as the input of the proposed ART-similarity clustering method to diagnose the bearings, there is a possibility that the diagnosis accuracy decreases and the computation time is increased because of the redundancy or irrelevance of some features.In order to improve the diagnosis performance, some sensitive features providing characteristic information for the diagnosis system need to be selected, and irrelevant or redundant features must be removed.Here, the distance discriminant technique [17] is adopted to select the salient features from the original feature set.Considering the overlapping degree among different classes, an improved version is proposed.
Assume that a feature set of $C$ classes consists of $N$ samples, and that the $c$th class contains $N_c$ samples, where $c = 1, 2, \dots, C$ and $N = \sum_{c=1}^{C} N_c$. Each sample is represented by $M$ features, and the $m$th feature of the $n$th sample is written as $x_n^{(m)}$. The feature selection process can be described as follows.
Step 1. Calculate the standard deviation and the mean of all samples in the $m$th feature:
Step 2. Calculate the standard deviation and the mean of the samples of the $c$th class in the $m$th feature, respectively:

Table 1 (recovered fragments): average frequency at which the waveform crosses the mean of the time-domain signal; stabilization factor of the wave shape. Here $x_i$ is the signal series for $i = 1, 2, \dots, n$, and $n$ is the number of data points.
Step 3. Calculate the weighted standard deviation of the class centers in the $m$th feature: where the quantities involved are the center of all samples in the $m$th feature, the center of the samples of the $c$th class in the $m$th feature, the weighted means of the squared class centers and of the class centers in the $m$th feature, and the prior probability $P_c$ of the $c$th class, which can be calculated by the formula $P_c = N_c / \sum_{c=1}^{C} N_c$, $c = 1, 2, \dots, C$, with $\sum_{c=1}^{C} P_c = 1$.
Step 4. Calculate the distance of the $m$th feature between different classes:
Step 5. Define and calculate the variance factor of the between-class distance in the $m$th feature as follows:
Step 6. Calculate the distance of the $m$th feature within classes:
Step 7. Define and calculate the variance factor of the within-class distance in the $m$th feature as follows:
Step 8. The compensation factor of the $m$th feature can be defined and calculated as follows:
Step 9. Calculate the modified distance discriminant factor of the $m$th feature: where a control parameter is used to adjust the impact of the compensation factor.
Step 11. Set a threshold value, and select from the set of $M$ features the sensitive features whose modified distance discriminant factor is greater than or equal to the threshold.
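The general idea of scoring each feature by class separability and keeping those above a threshold can be sketched as follows. This is a deliberately simplified stand-in, not the modified distance discriminant factor of the steps above, whose equations are not reproduced here: the score used is the fraction of a feature's variance explained by the class centers (between-class variance divided by total variance, weighted by class priors), and all function names are hypothetical.

```python
def separability_scores(samples, labels):
    """Per-feature score in [0, 1]: between-class variance / total variance.

    Assumption: a simplified class-separability ratio used for illustration,
    not the modified distance discriminant factor of the paper.
    """
    classes = sorted(set(labels))
    n_samples = len(samples)
    scores = []
    for m in range(len(samples[0])):
        column = [s[m] for s in samples]
        overall_mean = sum(column) / n_samples
        between = 0.0
        within = 0.0
        for c in classes:
            values = [s[m] for s, lab in zip(samples, labels) if lab == c]
            prior = len(values) / n_samples  # prior probability of class c
            center = sum(values) / len(values)
            between += prior * (center - overall_mean) ** 2
            within += prior * sum((v - center) ** 2 for v in values) / len(values)
        total = between + within
        scores.append(between / total if total > 0.0 else 0.0)
    return scores


def select_features(samples, labels, threshold):
    """Indices of the features whose score is at least the threshold."""
    scores = separability_scores(samples, labels)
    return [m for m, score in enumerate(scores) if score >= threshold]


# Hypothetical data: feature 0 separates the two classes, feature 1 does not.
toy_samples = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
toy_labels = [0, 0, 1, 1]
selected = select_features(toy_samples, toy_labels, threshold=0.88)
```

As in Step 11, only the features whose score reaches the threshold are passed on to the clustering method.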
Further, in order to demonstrate the superiority and character of the improved distance discriminant technique, a numerical example to compare the improved distance discriminant technique with the original distance discriminant technique is presented in the appendix.

Fault Diagnosis.
In the fault diagnosis phase, data samples of the bearings are used to evaluate the performance of the proposed method. The data samples cover seven different fault conditions, labeled 1 to 7, and each fault condition contains 25 data samples. The detailed description is given in Table 2.
The detailed fault diagnosis flowchart of the proposed method is shown in Figure 4. First, the data samples are preprocessed to obtain the 32 feature parameters. Second, to reduce the computation time and improve the diagnosis accuracy, the proposed improved distance discriminant technique is used to select the salient features from the original feature set. Figure 5 shows the modified distance discriminant factors of all features. From the figure it can be seen that when the threshold is set to 0.88, the number of selected salient features is 19.
Finally, the proposed ART-similarity clustering method based on Yu's norm is applied to the fault diagnosis of bearings. A characteristic of the method is that training and testing are performed together, and all 175 data samples are used for both. When the second input sample enters the model, if its similarity degree to the first cluster node is bigger than the set vigilance parameter $\rho$, the input sample is classified into the first cluster and the weight vector of that cluster node is modified by (7); otherwise, the second cluster node is produced. When the third input sample enters the model, it is compared with all the existing cluster nodes, and the cluster node with the biggest degree of similarity is the winner. If the similarity degree meets the vigilance criterion, the sample is classified into the winner cluster; otherwise, a new cluster node is produced. Proceeding in this way, a trained classifier is obtained.
Generally, one fault class needs several cluster nodes to be learned because of the complex fault mechanism. To evaluate the performance of the proposed ART-similarity clustering method based on Yu's norm and to understand the relationship among the classification accuracy, the number of cluster nodes, and the vigilance parameter, a series of fault diagnosis experiments with increasing vigilance parameters is conducted. For convenience of computation, the classification accuracy is obtained by the formula given in [8], which is defined in terms of the number of correctly classified samples, the total number of samples, and the number of generated cluster nodes; the total number of samples minus the number of training samples gives the number of samples used for testing. Figure 6 shows the relationship between the classification accuracy and the increasing vigilance parameter. The classification accuracy rises with the vigilance parameter $\rho$, although not continuously, and reaches 100% for $\rho \geq 0.999995$. Figure 7 shows the relationship between the number of cluster nodes and the increasing vigilance parameter. The number of cluster nodes rises with the vigilance parameter, and each fault class can be composed of many cluster nodes. When $\rho = 0.999999$, the number of cluster nodes covering the 7 fault conditions of the bearings is as high as 20. From these two figures it can be seen that when $\rho < 0.99999$, the number of cluster nodes is about 10, but the classification accuracy is below 75%; as $\rho$ rises above 0.99999, both the classification accuracy and the number of cluster nodes increase; at $\rho = 0.999995$, the classification accuracy reaches its highest value of 100% with 15 cluster nodes, which is relatively the fewest. Therefore, considering the relationship among the number of cluster nodes, the vigilance parameter $\rho$, and the classification accuracy, $\rho$ is set to 0.999995 here.
For ease of understanding, Figure 8 and Table 3 show the classification result when all samples are used for training and testing with $\rho = 0.999995$. The table and figure show that all samples are classified accurately and that the bearing conditions use different numbers of cluster nodes. Conditions 1, 3, 4, 5, and 6 each use only one node to learn and test, whereas condition 7, namely, the very severe ball fault, needs eight cluster nodes. This is mainly because the region covered by a cluster node grows or shrinks depending on the spatial distribution of the data samples of the same condition. When the node regions become small, the corresponding number of cluster nodes increases.

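Performance Test of ART-Similarity Clustering Method Based on Bootstrap Method.
It is well known that the initial conditions affect the performance of the ART-similarity clustering method. To study the stability and generalization of the proposed method, the bootstrap method is used, and the statistical results are given in Table 4. The comparison of the proposed method with fuzzy ART and SOFM is shown in Table 5.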
From Table 5 it can be concluded that the classification accuracy of the ART-similarity method is the highest, reaching 100%, and that its number of cluster nodes, 15, is the smallest. For fuzzy ART and SOFM, the classification accuracy is 96.57% and 94.36%, and the corresponding number of cluster nodes is 79 and 76, respectively. These results indicate that the proposed ART-similarity clustering method has comparatively superior classification performance.

Conclusions
In this paper, a new clustering method that combines the adaptive resonance theory (ART) with the similarity measure based on Yu's norm is presented to diagnose the faults of rolling element bearings; the method generates cluster nodes dynamically. Before the proposed clustering method is applied to the fault diagnosis of bearings, time-domain statistical features, frequency-domain statistical features, and AR time series model parameters are extracted to characterize the fault-related information of the bearing. Owing to the redundancy and irrelevance of some features, the improved distance discriminant technique is used to select the sensitive features, which are then input into the proposed clustering method to diagnose the fault categories of the bearings. The experimental results show that the proposed ART-similarity clustering method diagnoses the faults of bearings successfully and that its diagnosis accuracy is higher than that of fuzzy ART and SOFM. Because the initial conditions affect the performance of the proposed clustering method, the bootstrap method is used to analyze the diagnosis results, and the statistical analysis shows that the proposed clustering method is stable and generalizes well. All of this indicates that the proposed method has better diagnosis ability and performance and that it is promising for the fault diagnosis of mechanical systems.
The numerical example in the appendix indicates that the modified distance discriminant technique is superior to the original distance discriminant technique.

Figure 1: The architecture of ART-similarity classifier based on Yu's norm.

Figure 4: Fault diagnosis flowchart of the proposed method.

Figure 6: Classification accuracy with the different vigilance parameters.

Figure 7: Relationship of cluster nodes number and vigilance parameters.

Figure 8: Classification of all data samples.

Table 1: Time-domain and frequency-domain feature parameters. Here $s(k)$ is the power spectrum density obtained by the Welch method, and $K$ is the number of spectrum lines used to calculate the parameters.

Table 2: Statistics of each fault condition of bearing.

Table 3: Number of neurons representing each condition.

Table 4: Statistical performance of ART-similarity clustering method.

Table 5: Comparison of classification with different neural networks.

From Table 4 it can be observed that the estimated statistical mean of the diagnosis result is 98.95%, the standard deviation is 0.54%, and the 95% confidence interval is [97.87%, 99.38%]. These results indicate that the performance of the proposed ART-similarity clustering method is stable and generalizes well.
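A bootstrap summary of this kind (mean, standard deviation, and a 95% confidence interval) can be sketched as follows. The details of the paper's bootstrap procedure are not available in the text, so this sketch assumes a percentile bootstrap over the accuracies of repeated runs; the function name and the accuracy values are hypothetical and are not the paper's data.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_resamples=1000, confidence=0.95, seed=0):
    """Percentile bootstrap of the mean: returns (mean, std, (lower, upper))."""
    rng = random.Random(seed)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    tail = (1.0 - confidence) / 2.0
    lower = means[max(0, round(tail * n_resamples))]
    upper = means[min(n_resamples - 1, round((1.0 - tail) * n_resamples) - 1)]
    return statistics.fmean(means), statistics.stdev(means), (lower, upper)


# Hypothetical accuracies from repeated runs with different initial conditions.
run_accuracies = [0.98, 0.99, 1.0, 0.985, 0.995, 0.99, 0.975, 1.0]
mean_acc, std_acc, (ci_low, ci_high) = bootstrap_mean_ci(run_accuracies)
```

A narrow interval around a high mean accuracy is what supports the stability claim in the text.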