Fuzzy ARTMAP Ensemble Based Decision Making and Application

Because the performance of a single fuzzy ARTMAP (FAM) is affected by the order in which samples are presented during offline training, a FAM ensemble approach based on the improved Bayesian belief method is proposed to improve the classification accuracy. The training samples are input into a committee of FAMs in different sequences, the outputs of these FAMs are combined, and the final decision is derived by the improved Bayesian belief method. The experimental results show that the proposed FAM ensemble classifies the different categories reliably and achieves better classification performance than a single FAM.


Introduction
Recently, artificial neural networks (ANNs) have been widely used as intelligent classifiers that identify different categories by learning patterns from empirical data in complex systems [1]. For example, the BP, RBF, and SVM models have developed quickly and have been used to classify the different fault classes of machine equipment [2][3][4][5][6]. However, these traditional neural network methods have limited generalization ability, which can lead to models that overfit the training samples. To solve this problem, the fuzzy ARTMAP (FAM) neural network, an incremental and supervised network model designed in accordance with adaptive resonance theory, was created and applied to classification [7][8][9]. Although FAM is able to overcome the stability-plasticity dilemma [10], in real-world applications its performance is affected by the order in which samples are presented during offline training [11,12].
To address this drawback, some preprocessing procedures known as ordering algorithms, such as min-max clustering and the genetic algorithm [13,14], have been proposed for FAM. Furthermore, a number of fusion techniques have been proposed for FAM to overcome this problem. Tang and Yan employed a voting algorithm over FAMs to diagnose bearing faults [15], and Loo and Rao applied multiple FAMs with a probabilistic plurality voting strategy to medical diagnosis and classification problems [16]. Since these voting algorithms do not consider the effect of the number of samples in each class, in this paper an improved Bayesian belief method (BBM) is used to combine multiple FAM classifiers that are trained offline with different sample sequences.
In view of the above, a novel ensemble of FAM classifiers is proposed to improve the classification performance of a single FAM. The identification scheme is shown in Figure 1. Firstly, feature parameters are extracted from the raw signals by different feature extraction methods. Secondly, the optimal feature set is selected from the original feature set by the modified distance discriminant technique. Finally, an ensemble of multiple FAM classifiers based on the improved BBM produces the final classification results. The proposed method is applied to the fault diagnosis of a hydraulic pump. The experimental results show the effectiveness of the proposed ensemble of FAM classifiers.

Fuzzy ARTMAP Ensemble Using the Improved Bayesian Belief Method
Fuzzy ARTMAP consists of two fuzzy ART modules, ART$_a$ and ART$_b$, linked via a map field [10]. It is capable of forming associative maps between clusters of the input domain, in which the ART$_a$ module performs the clustering, and clusters of the output domain, in which the ART$_b$ module performs the clustering. Each module comprises three layers: the normalization layer $F_0$, the input layer $F_1$, and the recognition layer $F_2$. The structure of FAM is shown in Figure 2. When the output domain is a finite set of class labels, FAM can be used as a classifier. The algorithm of FAM can be described briefly as follows.
The ART$_a$ module receives the input pattern, and the normalized $M$-dimensional input vector $\mathbf{a}$ is complement-coded to a $2M$-dimensional vector
$$\mathbf{A} = (\mathbf{a}, \mathbf{a}^{c}) = (a_1, \ldots, a_M, 1 - a_1, \ldots, 1 - a_M). \tag{1}$$
Then the norm of the input vector is kept constant:
$$|\mathbf{A}| = \sum_{i=1}^{M} a_i + \sum_{i=1}^{M} (1 - a_i) = M. \tag{2}$$
The winning category node is the one that maximizes the choice function
$$T_j^{a} = \frac{|\mathbf{A} \wedge \mathbf{w}_j^{a}|}{\alpha_a + |\mathbf{w}_j^{a}|}, \tag{3}$$
where $\wedge$ is the min operator, $T_j^{a}$ is the choice function of ART$_a$, $\mathbf{w}_j^{a}$ is the weight vector of the $j$th category node, and $\alpha_a$ is the choice parameter.
When a winning category node is selected, a vigilance test (VT), namely, a similarity check of the chosen category node against a vigilance parameter $\rho_a$, takes place:
$$\frac{|\mathbf{A} \wedge \mathbf{w}_J^{a}|}{|\mathbf{A}|} \ge \rho_a, \tag{4}$$
where $\mathbf{w}_J^{a}$ is the weight vector of the winning $J$th node. When this category match function (CMF) satisfies the criterion, resonance occurs and learning takes place; namely, the weight vector $\mathbf{w}_J^{a}$ is updated according to
$$\mathbf{w}_J^{a(\mathrm{new})} = \beta \left( \mathbf{A} \wedge \mathbf{w}_J^{a(\mathrm{old})} \right) + (1 - \beta)\, \mathbf{w}_J^{a(\mathrm{old})}, \tag{5}$$
where $\beta \in [0,1]$ is the learning rate. Otherwise, the search continues and, if no existing node passes the test, a new node is created in $F_2^{a}$ to code the input pattern. In the meantime, the same learning algorithm runs in ART$_b$ using the target pattern.
After resonance occurs in ART$_a$ and ART$_b$, the winning node in $F_2^{a}$ sends a prediction to ART$_b$ via the map field. The map field vigilance test is used to check this prediction. If the test fails, the winning node of ART$_a$ has predicted an incorrect target class in ART$_b$, and a match tracking process is initiated. During match tracking, the value of $\rho_a$ is increased until it is slightly higher than $|\mathbf{A} \wedge \mathbf{w}_J^{a}|\,|\mathbf{A}|^{-1}$; then a new search for another winning node in ART$_a$ is carried out, and the process continues until the selected $F_2^{a}$ node makes a correct prediction in ART$_b$.
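The ART$_a$ steps described above (complement coding, category choice, vigilance test, and learning) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the list-based weight store are hypothetical, the defaults mirror the parameter values reported later in the paper, and the map field and match tracking are omitted.

```python
# Minimal sketch of one pattern presentation to a fuzzy ART module.
# All names are illustrative assumptions; map field and match tracking are omitted.

def complement_code(a):
    """Map an M-dimensional vector in [0, 1] to the 2M-dimensional (a, 1 - a)."""
    return list(a) + [1.0 - v for v in a]


def fuzzy_and(x, y):
    """Component-wise minimum (the fuzzy AND operator)."""
    return [min(u, v) for u, v in zip(x, y)]


def norm1(x):
    """City-block norm |x| of a non-negative vector."""
    return sum(x)


def present(a, weights, rho=0.5, alpha=0.001, beta=1.0):
    """Present one pattern and return the index of the resonating node.

    `weights` is a list of 2M-dimensional weight vectors, updated in place.
    """
    A = complement_code(a)
    # Visit nodes in decreasing order of the choice function
    # T_j = |A ^ w_j| / (alpha + |w_j|).
    order = sorted(
        range(len(weights)),
        key=lambda j: norm1(fuzzy_and(A, weights[j])) / (alpha + norm1(weights[j])),
        reverse=True,
    )
    for j in order:
        match = norm1(fuzzy_and(A, weights[j])) / norm1(A)
        if match >= rho:  # vigilance test passed: resonance and learning
            w = weights[j]
            weights[j] = [beta * m + (1.0 - beta) * wi
                          for m, wi in zip(fuzzy_and(A, w), w)]
            return j
    # No existing node passed the vigilance test: commit a new node.
    weights.append(A)
    return len(weights) - 1
```

With `beta = 1.0` (fast learning) the winning weight vector is simply replaced by the fuzzy AND of the input and the old weight, so weights can only shrink; this is why the category structure depends on the presentation order.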

Decision Fusion Using Bayesian Belief Method.
The Bayesian belief method was proposed in [17]. It is based on the assumption that the classifiers are mutually independent, and it takes the errors of each classifier into account. Assume that the pattern space contains $C$ classes and that there are $K$ classifiers. A classifier $e_k$ can be considered as a function
$$e_k(x) = j_k, \quad k = 1, 2, \ldots, K, \quad j_k \in \{1, 2, \ldots, C, C + 1\}. \tag{6}$$
It signifies that the sample $x$ is assigned to class $j_k$ by the classifier $e_k$. Its two-dimensional confusion matrix can be represented as
$$PT_k = \left[ n_{ij}^{(k)} \right]_{C \times (C+1)}, \quad (k = 1, \ldots, K), \tag{7}$$
where $C + 1$ is an unknown label. The matrix is obtained by executing $e_k(x)$ on the test data set after $e_k(x)$ is trained. Each row $i$ corresponds to class $c_i$ and each column $j$ corresponds to $e_k(x) = j$. The entry $n_{ij}^{(k)}$ is the number of input samples from class $c_i$ that are assigned to label $j$ by classifier $e_k(x)$. The number of samples in class $c_i$ is $n_{i\cdot}^{(k)} = \sum_{j=1}^{C+1} n_{ij}^{(k)}$, where $i = 1, \ldots, C$, and the number of samples labeled $j$ by $e_k(x)$ is $n_{\cdot j}^{(k)} = \sum_{i=1}^{C} n_{ij}^{(k)}$, where $j = 1, \ldots, C + 1$. To account for the different numbers of samples in each class, a belief measure of classification is calculated for each classifier from the confusion matrix by the belief function (8) given in [18].
When multiple classifiers $e_1, e_2, \ldots, e_K$ are developed, their corresponding beliefs $B_1, B_2, \ldots, B_K$ are computed from the performance of the base classifiers. Combining the belief measures of all fused classifiers gives the final belief measure of the multiple classifier system. In the case of equal a priori class probabilities, the combination rule (9) is defined over the labels $1, 2, \ldots, C + 1$.
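A minimal sketch of this kind of belief-based fusion is given below. It rests on two assumptions that may differ from the exact definitions in [17, 18]: the belief in class $i$ given output $j$ is taken as the class-size-normalized rate $n_{ij}/n_{i\cdot}$, renormalized over classes, and the beliefs of the individual classifiers are combined by a normalized product under independence and equal priors. The function names are hypothetical.

```python
# Sketch of Bayesian belief fusion from confusion matrices.
# ASSUMPTION: belief(i | output j) is proportional to n_ij / n_i. (class-size
# normalization); the paper's exact belief function may differ.

def beliefs_from_confusion(conf):
    """conf[i][j] = number of class-i samples labelled j by one classifier.

    Returns bel, where bel[j][i] is the belief in class i given output j.
    """
    n_classes = len(conf)
    n_labels = len(conf[0])
    row_sums = [sum(row) for row in conf]
    bel = []
    for j in range(n_labels):
        rates = [conf[i][j] / row_sums[i] if row_sums[i] else 0.0
                 for i in range(n_classes)]
        total = sum(rates)
        # Fall back to a uniform belief if label j was never produced.
        bel.append([r / total if total else 1.0 / n_classes for r in rates])
    return bel


def combine(beliefs, labels):
    """Normalized product of per-classifier beliefs (independence, equal priors).

    beliefs[k] is the table of classifier k; labels[k] is its output for x.
    """
    n_classes = len(beliefs[0][0])
    scores = [1.0] * n_classes
    for bel, j in zip(beliefs, labels):
        scores = [s * b for s, b in zip(scores, bel[j])]
    total = sum(scores)
    if total == 0:
        return [1.0 / n_classes] * n_classes
    return [s / total for s in scores]


def decide(beliefs, labels):
    """Return the class index with the largest combined belief."""
    combined = combine(beliefs, labels)
    return combined.index(max(combined))
```

Unlike plain voting, a classifier that is often wrong when it outputs a given label contributes a flatter belief for that label, so its vote carries less weight in the product.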

Case Study
To evaluate the effectiveness of the proposed FAM ensemble, the fault identification of a hydraulic pump is taken as an example. Figure 3 shows the schematic diagram of the experiment rig. Four accelerometers are attached to the housing with magnetic bases and mounted at positions P1, P2, P3, and P4. A pressure sensor is mounted at position P5. Considering the sensitivity to the fault conditions of the hydraulic pump, the vibration signal acquired by the accelerometer at position P2 is used to identify the fault categories. The vibration signals are acquired under the normal condition and under six fault conditions: inner plunger wear, inner race wear, ball wear, swashplate wear, portplate wear, and paraplungers wear.

Data Preparation.
The data set contains 490 samples. These data samples are divided into 245 training and 245 test samples. The detailed descriptions of the three data sets are shown in Table 1. To identify the different fault categories, a seven-class classification problem needs to be solved.

Feature Extraction and Selection
Feature Extraction.
Feature parameters are used to characterize the information relevant to the conditions of the hydraulic pump. To acquire more fault-related information, many features in different symptom domains are extracted from the measured signals.
The frequency domain gives another description of a signal. In [19], some novel features are proposed that give a much fuller picture of the frequency distribution in each frequency band. Suppose that the $N$ points of the normalized PSD, $P_i$, of the vibration signal $x_i$ are divided into $L$ segments, where $L$ is 1 in this study. Four features based on the moment estimates of power are obtained as defined in [19], where $N$ is the total number of data points and $N_l$ is the number of sample points in the $l$th segment.
To characterize the spectrum more accurately, four further features, the moment estimates of frequency weighted by power, are calculated as defined in [19]; they use the frequency $f(i)$ corresponding to $P_l(i)$ and the total power in the segment. The total number of features extracted from each spectrum is therefore 1 × 8.
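As an illustration of moment estimates of frequency weighted by power, the sketch below computes the power-weighted mean (spectral centroid), variance, skewness, and kurtosis of one PSD segment. This particular choice of four moments and their normalization is an assumption made for illustration; the exact definitions used in the paper are those of [19]. The function name is hypothetical.

```python
# Sketch: power-weighted frequency moments of one PSD segment.
# ASSUMPTION: the four moments are centroid, variance, skewness and kurtosis;
# the definitions in the cited reference may differ in detail.

def frequency_moments(freqs, psd):
    """Return (centroid, variance, skewness, kurtosis) of frequency,
    each weighted by the PSD values of the segment."""
    total_power = sum(psd)
    if total_power <= 0:
        raise ValueError("PSD segment must have positive total power")
    centroid = sum(f * p for f, p in zip(freqs, psd)) / total_power
    variance = sum((f - centroid) ** 2 * p
                   for f, p in zip(freqs, psd)) / total_power
    if variance == 0:
        # All power sits at a single frequency: higher moments are undefined.
        return centroid, 0.0, 0.0, 0.0
    std = variance ** 0.5
    skewness = sum((f - centroid) ** 3 * p
                   for f, p in zip(freqs, psd)) / (total_power * std ** 3)
    kurtosis = sum((f - centroid) ** 4 * p
                   for f, p in zip(freqs, psd)) / (total_power * std ** 4)
    return centroid, variance, skewness, kurtosis
```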
To describe the fault-related information about the hydraulic pump quantitatively, the first-order continuous wavelet grey moment (WGM) [20] of the vibration signal is extracted. Let $[W]_{m \times n}$ be the wavelet coefficient matrix displayed by the continuous wavelet transform (CWT) scalogram, where $m$ and $n$ are the numbers of scales and time points of the scalogram, respectively. The matrix $[W]_{m \times n}$ is divided equally into $q$ parts along the scale axis, and the first-order wavelet grey moment $g_1$ of each part is calculated as defined in [20] from the elements $w_{ij}$ of the submatrix $[W]_{(m/q) \times n}$. In this paper, $q$ is set to 8 and the wavelet function is the Morlet wavelet.
In addition, because autoregressive (AR) model parameters are sensitive to the shape of the vibration data, they are used to characterize the conditions of the hydraulic pump. The AR model is written as
$$x_t = a_1 x_{t-1} + a_2 x_{t-2} + \cdots + a_p x_{t-p},$$
where $x_{t-1}, x_{t-2}, \ldots, x_{t-p}$ are the $p$ previous samples, $x_t$ is the predicted sample of the signal, and $a_1, a_2, \ldots, a_p$ are the AR model parameters, which can be obtained by the least squares method in [21]. In this study, the order $p$ is set to 8. Thus, 24 features (8 spectral features, 8 wavelet grey moments, and 8 AR parameters) constitute the original feature set.
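The AR parameters can be estimated by ordinary least squares over lagged samples. The sketch below is one standard way to do this, solving the normal equations with Gaussian elimination; it is an assumption that this matches the procedure of [21] in detail, and the function names are hypothetical.

```python
# Sketch: least-squares estimation of AR(p) coefficients in pure Python.
# ASSUMPTION: ordinary least squares on lagged samples via the normal equations.

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ar(x, order):
    """Return [a_1, ..., a_p] minimizing the squared one-step prediction error."""
    # Each row holds the `order` previous samples, most recent first.
    rows = [[x[t - i] for i in range(1, order + 1)]
            for t in range(order, len(x))]
    targets = [x[t] for t in range(order, len(x))]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(order)]
           for i in range(order)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(order)]
    return solve_linear(xtx, xty)
```

For a vibration record `signal`, `fit_ar(signal, 8)` would return the eight coefficients used as features, provided the record is long enough for the normal equations to be well conditioned.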

Feature Selection.
To improve the identification accuracy and reduce the computational burden, sensitive features that provide characteristic information for the classification system need to be selected, and irrelevant or redundant features must be removed. In this study, based on [22], a modified distance discriminant technique is employed to select the optimal features. Suppose that a feature set of $C$ classes consists of $N$ samples and that the $c$th class contains $N_c$ samples, where $c = 1, 2, \ldots, C$ and $N = \sum_{c=1}^{C} N_c$. Each sample is represented by $J$ features, and the $j$th feature of the $n$th sample is written as $x_{nj}$. The feature selection process can then be described as follows.
Step 1. Calculate the standard deviation $\sigma_j$ and the mean $\mu_j$ of all samples in the $j$th feature.
Step 2. Calculate the standard deviation $\sigma_j^{c}$ and the mean $\mu_j^{c}$ of the samples of the $c$th class in the $j$th feature.
Step 3. Calculate the weighted standard deviation of the class centers in the $j$th feature. Here $\mu_j$ is the center of all samples in the $j$th feature, $\mu_j^{c}$ is the center of the samples of the $c$th class in the $j$th feature, and the weighted means of the squared class centers and of the class centers in the $j$th feature are taken with the prior probabilities $\pi_c$ of the classes, where $\sum_{c=1}^{C} \pi_c = 1$.
Step 4. Calculate the distance discriminant factor $d_j$ of the $j$th feature from the distance $d_j^{(b)}$ of the $j$th feature between different classes and the distance $d_j^{(w)}$ of the $j$th feature within classes. The parameter $\eta$ controls the impact of $d_j^{(w)}$ and is set to 2 in this paper.
Considering the degree of overlap among different classes, a compensation factor is then calculated. Two variance factors are first defined and calculated in the $j$th feature, the compensation factor of the $j$th feature is defined from them, and the modified distance discriminant factor $d'_j$ ($j = 1, 2, \ldots, J$) is obtained.
Step 5. Rank the $J$ features in descending order according to the modified distance discriminant factors $d'_j$; then normalize $d'_j$ by $\lambda_j = (d'_j - \min(d'_j))/(\max(d'_j) - \min(d'_j))$ to obtain the distance discriminant criteria. Clearly, a bigger $\lambda_j$ ($j = 1, 2, \ldots, J$) signifies that the corresponding feature separates the $C$ classes better.
Step 6. Set a threshold value $\gamma$ and select from the set of $J$ features the sensitive features whose distance discriminant criterion satisfies $\lambda_j \ge \gamma$.
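Steps 5 and 6 amount to a min-max normalization followed by a threshold. A minimal sketch is shown below; it takes the modified distance discriminant factors as given, and the function name and return format are assumptions for illustration.

```python
# Sketch of Steps 5 and 6: normalize the factors to [0, 1], rank, and threshold.
# The input factors are assumed to have been computed by Steps 1 to 4.

def select_features(factors, threshold):
    """Return (criteria, selected).

    criteria[j] is the min-max normalized factor of feature j;
    selected lists the indices with criteria >= threshold, best first.
    """
    lo, hi = min(factors), max(factors)
    span = hi - lo
    if span == 0:
        # All features score the same; treat them as equally good.
        criteria = [1.0] * len(factors)
    else:
        criteria = [(d - lo) / span for d in factors]
    ranked = sorted(range(len(factors)), key=lambda j: criteria[j], reverse=True)
    selected = [j for j in ranked if criteria[j] >= threshold]
    return criteria, selected
```

Because the criteria are scaled to the interval from 0 to 1, a threshold of 0 keeps every feature and a threshold of 1 keeps only the top-scoring one.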

Diagnosis Analysis.
It is well known that the ordering of the training samples can affect the classification accuracy of a single FAM, and that using a single output to represent multiple classes may lower the classification accuracy. To assess how well the proposed FAM ensemble works, that is, how much the generalization ability is improved by using the improved Bayesian belief method to combine the classification results of a committee of single FAMs trained with different orderings of the training samples, the performance of the single FAM is also evaluated.
In the diagnosis phase, the single FAM and the FAM ensemble are all trained in the fast learning and conservative mode (i.e., setting $\beta = 1$ in (5) and $\alpha_a = 0.001$ in (3)). In addition, to balance stability and plasticity, the vigilance parameter of FAM is set to $\rho_a = 0.5$, and the ensemble size is set to 5.
To improve the classification accuracy and reduce the computation time, in each case some salient features are selected from each feature set by the modified distance discriminant technique and then input into the five single FAMs in different sequences during training. Figure 4 shows the modified distance discriminant factor $\lambda_j$ of all features in the feature sets. From the figure it can be seen that the threshold $\gamma$ corresponding to the optimal features differs from case to case. That is to say, the number of salient features is different.
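One simple way to produce the different training sequences for the committee members is to shuffle the sample indices once per member. The sketch below assumes random shuffling with a fixed seed; the paper does not state how the orderings were generated, and the function name is hypothetical.

```python
# Sketch: one presentation order of the training samples per ensemble member.
# ASSUMPTION: orderings are independent random permutations.
import random


def training_orders(n_samples, n_members=5, seed=0):
    """Return n_members permutations of range(n_samples)."""
    rng = random.Random(seed)  # fixed seed makes the experiment repeatable
    orders = []
    for _ in range(n_members):
        order = list(range(n_samples))
        rng.shuffle(order)
        orders.append(order)
    return orders
```

Each member is then trained on the same 245 training samples, visited in its own order, before the outputs are fused.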
Figure 5 summarizes the classification results in terms of the test accuracy of the single FAM and the FAM ensemble. From the figure, it can be seen that the FAM ensemble (0.988) outperforms the single FAMs in terms of accuracy, and that the test accuracy increases as the number of single FAMs in the ensemble increases. These results indicate that the FAM ensemble can identify the different fault categories of the hydraulic pump well.

Effect of Different Threshold for Feature Selection.
As shown in Figure 4, when the threshold value $\gamma$ is set properly, redundant and irrelevant features can be removed from the original feature set. To test the effect of the proposed feature selection method based on the modified distance discriminant technique, a series of experiments is carried out for different threshold values $\gamma$, in which the parameters of the single FAM are the same as above and the size of the FAM ensemble is set to 5.
Figure 6 shows the classification accuracy of the five individual FAMs and the FAM ensemble for the different thresholds.
From the figure, it can be seen that when $\gamma = 0$ (the original feature set), the test accuracy of the single FAM and the FAM ensemble is 0.824 and 0.845, respectively. The highest test accuracy of the single FAM and the FAM ensemble (0.915 and 0.988) is reached at the same threshold, $\gamma = 0.8$, where the optimal feature set is selected. However, when the threshold value continues to increase, the test accuracy of both the single FAM and the FAM ensemble tends to decrease. When $\gamma > 0.9$, the test accuracy of the single FAM and the FAM ensemble is lower than that obtained with all features ($\gamma = 0$). This is mainly because too few features remain to discriminate the classes; namely, a drastic reduction of features leads to a decrease in the test accuracy.

Classification Performance Comparison with Other Classification Methods.
To test the superiority of the proposed FAM ensemble method, the test results produced by the FAM ensemble and the single FAM are compared with those produced by other classification methods. In this experiment, the parameters of the FAM ensemble and the single FAM are the same as above.
Table 2 shows the test results of the FAM ensemble versus other classification methods. From the table it can be seen that the average test accuracy of the single FAMs is the lowest. The test accuracy produced by the two FAM ensemble methods is higher than that produced by the single classifier, and the test accuracy of the proposed FAM ensemble is the highest, exceeding that of the FAM ensemble with the voting algorithm. These results indicate that the proposed FAM ensemble has comparatively superior diagnosis performance.

Conclusions
The classification performance of FAM is affected by the sequence of the training samples. In this paper, a novel and reliable FAM ensemble based on the improved Bayesian belief method is proposed to improve the classification performance of FAM. It combines the outputs of a committee of FAMs fed with different orderings of the training samples and derives the combined decision.
The proposed FAM ensemble method is applied to the fault identification of a hydraulic pump. The experimental results show that the proposed FAM ensemble diagnoses the fault categories accurately and reliably and has better diagnosis performance than a single FAM. These results indicate that the proposed FAM ensemble is promising for classification and decision making in engineering.

Figure 4: Feature selection for three different data sets.

Figure 5: Relationship between the test accuracy and the number of single FAM used in FAMs' ensemble.

Table 2: Test accuracy produced by different classification methods.