Improvement of Roller Bearing Diagnosis with Unlabeled Data Using Cut Edge Weight Confidence Based Tritraining

Roller bearings are among the most commonly used components in rotating machines. The fault diagnosis of roller bearings thus plays an important role in ensuring the safe functioning of mechanical systems. However, in most cases of bearing fault diagnosis, only a limited number of labeled data are available, which is insufficient to achieve a proper fault diagnosis. Therefore, exploiting unlabeled data together with a few labeled data, this paper proposes a roller bearing fault diagnosis method based on tritraining to improve roller bearing diagnosis performance. To overcome the noise introduced by wrong labeling into the classifier training process, the cut edge weight confidence is introduced into the diagnosis framework. In addition, a small trick called the suspect principle is adopted to avoid the overfitting problem. The proposed method is validated on vibration signals from two independent roller bearing fault experiments, both of which include three types of faults: inner-ring fault, outer-ring fault, and rolling element fault. The results demonstrate the desirable improvement in diagnostic performance achieved by the proposed method in the extreme situation where only a limited number of labeled data are available.


Introduction
Roller bearings are among the most commonly used components in rotating machines, and their faults may lead to huge economic losses, environmental pollution, and human casualties. Hence, the fault diagnosis of roller bearings is vital to guarantee the smooth and safe functioning of mechanical systems.
There is a great deal of research on vibration-based fault diagnosis of roller bearings, and several powerful diagnostic methods are available [1]. Li et al. [2] presented an approach for motor roller bearing fault diagnosis using neural networks. Seryasat et al. [3] brought forward a ball bearing fault diagnosis method using fast Fourier transform (FFT) and wavelet energy entropy mean and root mean square (RMS). Peng and Chiang [4] used the C4.5 decision tree and random forest algorithms to diagnose ball bearing faults in a three-phase induction motor. Jin et al. [5] introduced a bearing fault diagnosis method using trace ratio linear discriminant analysis. Liu et al. [6] proposed an extended wavelet spectrum analysis technique to achieve a more positive assessment of bearing health conditions. All these methods yield excellent performance for fault diagnosis of different bearings. However, the data used in those methods are all labeled data, that is, data already marked according to the bearing states. In bearing fault diagnosis, labeled data are quite expensive to obtain since they require human effort, while large amounts of unlabeled data are readily available. For better practical value, the use of unlabeled data ought to be considered. Therefore, semisupervised learning, a technique that exploits unlabeled data together with a few labeled data to train a good classifier, is a promising candidate for roller bearing diagnosis when only a limited number of labeled data are available.
Good reviews of semisupervised classification methods are given in [7, 8]. Among them, generative models, self-training, and cotraining are three classic semisupervised learning methods. Generative models specify a joint probability distribution over observation and label sequences and thus are used for modeling data. Nigam et al. applied the expectation maximization (EM) algorithm, a classic generative approach, to a mixture of multinomial distributions for the task of text classification, and the results showed that the classifiers performed better than those trained only from labeled data [9]. However, the generative model must be carefully constructed to reflect reality; otherwise unlabeled data that are supposed to help may actually hurt accuracy [10]. Self-training is a technique where a classifier is first trained from the small amount of labeled data and then used to classify the unlabeled data, which are added to the training set for further retraining. Rosenberg et al. [11] applied self-training to object detection systems from images and showed that the semisupervised technique compares favorably with a state-of-the-art detector. But self-training suffers from wrong labeling, since the classifier uses its own predictions to teach itself [12]. Cotraining, proposed by Blum and Mitchell [13], can be quite effective: in the extreme case only one labeled point is needed to learn the classifier [14]. However, cotraining makes rather strong assumptions on the splitting of features: (1) the features can be split into two sets; (2) each subfeature set is sufficient to train a good classifier; and (3) the two sets are conditionally independent given the class. These assumptions generally cannot be met in real life.
To deal with this problem, Zhou and Li [15] proposed a cotraining style semisupervised learning algorithm named tritraining. In the tritraining process, three weak classifiers are generated from the original labeled example set and are then refined using unlabeled examples. Tritraining neither requires the instance space to be described with sufficient and redundant views nor puts any constraints on the supervised learning algorithm. In addition, it possesses the merits of good efficiency and generalization ability. Tritraining has been successfully applied in Chinese chunking [16], biomedical named entity recognition [17], and web spam detection [18]. With these advantages and successful applications in other areas, tritraining is expected to be a promising method in bearing fault diagnosis too. However, the way tritraining handles unlabeled data is the simplistic consistency principle. In detail, in each round of tritraining, an unlabeled example is labeled for the third classifier if the other two classifiers agree on the labeling, under certain conditions. This might undermine the performance stability of tritraining because the unlabeled data may often be wrongly labeled by both classifiers during the learning process [19]. In order to overcome this problem, the cut edge weight statistic (CEWS) [20] is utilized to give the confidence of each predicted label of the unlabeled data. Only when the confidence is high enough can the predicted label be added to the training set. With this problem addressed by the cut edge weight confidence (CEWC), and given all its merits, tritraining is a promising semisupervised algorithm for the improvement of bearing fault diagnosis.
Hence, to fully exploit the large amount of unlabeled roller bearing data and thus improve the performance of bearing fault diagnosis, this paper presents a roller bearing fault diagnosis method based on the combination of tritraining and CEWC. The remainder of the paper is organized as follows. In Section 2, a detailed description of the methodologies used in this paper is presented. In Section 3, the experiment setup and related information on two independent roller bearing fault datasets are presented. In Section 4, the results are presented. In Section 5, the results are discussed. Finally, in Section 6, the conclusion of the research is given.

Tritraining.
Tritraining is a semisupervised machine learning algorithm proposed by Zhou and Li [15]. The procedure of tritraining is as follows. First, three diverse classifiers are trained from bagging samples of the original labeled example set. The diversity of the classifiers is obtained by manipulating the original labeled example set through a popular ensemble learning algorithm, that is, Bagging [21]. Second, the three trained classifiers are used to predict the examples from the unlabeled set. Those examples that pass the consistency principle are added to the labeled dataset. Third, the initial classifiers are updated and the process repeats.
Let $L$ denote the labeled dataset with size $|L|$ and $U$ denote the unlabeled dataset with size $|U|$. In the standard tritraining algorithm, there are three diverse classifiers $h_1$, $h_2$, and $h_3$ initially trained from the original $L$. Then, for any classifier, an unlabeled example can be labeled for it as long as the other two classifiers agree on the labeling of this example. For example, if $h_1$ and $h_2$ agree on the labeling of an example $x$ in $U$, then $x$ can be labeled for $h_3$. It is obvious that in such a scheme, if the prediction of $h_1$ and $h_2$ on $x$ is correct, then $h_3$ will get a solid new instance for further training. Otherwise, $h_3$ will get an example with a noisy label. However, Zhou and Li [15] proved that, even in the worst case, the increase in the classification noise can be compensated if the amount of newly labeled examples is sufficient and the constraint condition (1) is met.
In condition (1), $L^t$ and $L^{t-1}$ are the sets of examples that are labeled for a classifier by the other two classifiers in the $t$th round and the $(t-1)$th round, respectively; $\hat{e}_1^t$ is the upper bound of the classification error rate of those other two classifiers in the $t$th round; and $\eta_L$ is the classification noise rate of $L$, that is, the number of examples in $L$ that are mislabeled is $\eta_L |L|$.
It is noteworthy that if the labeled examples are not sufficient or the constraint condition is not met, it is rather doubtful whether the benefits outweigh the drawbacks when an unlabeled example is wrongly labeled. Therefore, it is still necessary to measure the confidence of the labeling of each classifier.
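The consistency principle described above can be illustrated with a minimal sketch. The function name `label_for_third` and the toy threshold classifiers are hypothetical and serve only as an illustration; the paper's classifiers are BP neural networks, and the additional constraint condition on the error rates is not modeled here.

```python
from typing import Callable, List, Sequence, Tuple

# A classifier is modeled as any callable mapping a feature vector to a label.
Classifier = Callable[[Sequence[float]], int]


def label_for_third(h_a: Classifier, h_b: Classifier,
                    unlabeled: List[Sequence[float]]
                    ) -> List[Tuple[Sequence[float], int]]:
    """Return the (x, label) pairs on which h_a and h_b agree.

    These pairs are the candidates handed to the third classifier.
    """
    newly_labeled = []
    for x in unlabeled:
        label_a, label_b = h_a(x), h_b(x)
        if label_a == label_b:  # consistency principle: both must agree
            newly_labeled.append((x, label_a))
    return newly_labeled


# Toy usage with two threshold classifiers on a single feature.
h1 = lambda x: int(x[0] > 0.5)
h2 = lambda x: int(x[0] > 0.3)
candidates = label_for_third(h1, h2, [[0.1], [0.4], [0.9]])
```

In this toy run the example `[0.4]` is withheld because the two classifiers disagree on it. If both classifiers happen to be wrong in the same way, the example is still passed on, which is the weakness the confidence measure is meant to address.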

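Cut Edge Weight Confidence.
In the first step of establishing the CEWC, a $k$-nearest neighbor graph $G_L$ is constructed over the labeled examples, and each example $(x_p, y_p) \in L$ corresponds to a vertex in $G_L$. There is an edge connecting the vertices of $x_p$ and $x_q$ if either $x_p$ is among the $k$-nearest neighbors of $x_q$ or $x_q$ is among the $k$-nearest neighbors of $x_p$. A weight $w_{pq} \in [0, 1]$ is associated with the edge, computed as $w_{pq} = (1 + d(x_p, x_q))^{-1}$, where $d(x_p, x_q)$ is the Euclidean distance between $x_p$ and $x_q$.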
In the second step, the confidence of whether the label $y_p$ associated with $x_p$ is correct is evaluated by exploring the information encoded in the structure of $G_L$. As illustrated in Figure 1, an edge in $G_L$ is called a cut edge if the two vertices connected by it have different associated labels. The CEWS is as follows:

$$J_p = \sum_{x_q \in C_p} w_{pq} I_{pq},$$

where $C_p$ corresponds to the set of examples which are connected with $x_p$ in $G_L$ and $I_{pq}$ corresponds to an i.i.d. Bernoulli random variable which takes the value 1 if $y_q$ is different from $y_p$. When the size of $C_p$ is sufficiently large, according to the central limit theorem, $J_p$ can be approximately modeled by a normal distribution. Let $J_p^*$ denote the standardized form of $J_p$. Then, based on the left unilateral $p$ value of $J_p^*$ with respect to $N(0, 1)$, the labeling confidence of $(x_p, y_p)$ is as follows:

$$CF_L(x_p, y_p) = 1 - \Phi(J_p^*),$$

where $CF_L(x_p, y_p)$ is the labeling confidence and $\Phi$ is the cumulative distribution function of $N(0, 1)$. Note that $CF_L(x_p, y_p)$ represents only a heuristic way to estimate the labeling confidence of $(x_p, y_p)$ and should by no means be deemed to represent the ground-truth probability of $y_p$ being the correct label of $x_p$. Nevertheless, experimental results in [22] validated the usefulness of this heuristic confidence estimation strategy in discriminating correctly labeled examples from incorrectly labeled ones.
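The two steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The standardization uses the null hypothesis that a neighbor's label differs with probability one minus the relative frequency of the example's own class, which is the usual choice for cut edge weight statistics but is not spelled out in the text. The function name, the default `k`, and the neutral fallback value 0.5 for a degenerate variance are all hypothetical.

```python
import math
from collections import Counter


def cut_edge_weight_confidence(X, y, k=3):
    """Heuristic labeling confidence for every labeled example (x_p, y_p)."""
    n = len(X)
    dist = [[math.dist(X[p], X[q]) for q in range(n)] for p in range(n)]

    # Step 1: k nearest neighbours of each vertex (the vertex itself excluded).
    knn = []
    for p in range(n):
        others = sorted((q for q in range(n) if q != p),
                        key=lambda q: dist[p][q])
        knn.append(set(others[:k]))

    # Assumption: class priors are estimated from the label frequencies.
    prior = {label: count / n for label, count in Counter(y).items()}

    confidences = []
    for p in range(n):
        # C_p: vertices joined to p (either one is a k-NN of the other).
        c_p = [q for q in range(n)
               if q != p and (q in knn[p] or p in knn[q])]
        w = [1.0 / (1.0 + dist[p][q]) for q in c_p]  # edge weights w_pq

        # Step 2: cut edge weight statistic J_p (sum over cut edges only).
        j_p = sum(w_q for w_q, q in zip(w, c_p) if y[q] != y[p])

        # Mean and standard deviation of J_p under the null hypothesis.
        pi = 1.0 - prior[y[p]]
        mu = pi * sum(w)
        sigma = math.sqrt(pi * (1.0 - pi) * sum(w_q ** 2 for w_q in w))
        if sigma == 0.0:
            confidences.append(0.5)  # degenerate case: no evidence either way
            continue

        z = (j_p - mu) / sigma                            # standardized J_p
        phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # left p value
        confidences.append(1.0 - phi)
    return confidences
```

An example surrounded by neighbors with the same label has few cut edges, a strongly negative standardized statistic, and therefore a confidence close to 1; an example whose neighbors mostly carry a different label receives a confidence close to 0.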

Diagnosis Framework.
The proposed approach combines tritraining and CEWC to achieve bearing fault diagnosis and is thus called C-tritraining. Its framework is illustrated in Figure 2. The data used for diagnosis are bearing vibration signals. First, the diagnostic features of the original vibration signals are extracted. Using ensemble empirical mode decomposition (EEMD), the original vibration signals are decomposed into intrinsic mode functions (IMFs) [23]. The information entropies of the IMFs, which are effective features for bearing fault diagnosis [24], are used as the input of the proposed method. Then, three bagging sample sets are drawn from the labeled feature set, and each of them is used for the initial training of a weak classifier, for which a BP neural network is adopted in this paper. The three weak classifiers thus obtained are used to predict a certain proportion of the unlabeled feature examples. In detail, the predictions of weak classifier 1 and weak classifier 2, if the CEWC of both is higher than the threshold, are added to sample set 3 for the updating of weak classifier 3. The same goes for the updating of weak classifiers 1 and 2; that is, the training set of each weak classifier is enlarged by the predictions of the other two. The initial proportion of unlabeled feature examples drawn from the database is set to 0.5, and the proportion is then updated at each iteration according to the change in training error. The tritraining process keeps running until the termination condition is reached. The final output of the framework is the ensemble classifier, which performs the final bearing diagnosis by majority voting.
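The update of weak classifier 3 and the final voting step can be sketched as follows. This is a hedged illustration: the function names are hypothetical, the confidence values are taken as given inputs, and requiring the two classifiers to agree in addition to passing the threshold is an assumption carried over from standard tritraining.

```python
from collections import Counter


def select_for_third(pred_1, pred_2, conf_1, conf_2, threshold):
    """Return (index, label) pairs of unlabeled examples passed to classifier 3.

    An example is selected only if classifiers 1 and 2 predict the same label
    and both labeling confidences exceed the threshold.
    """
    selected = []
    for i, (label_1, label_2) in enumerate(zip(pred_1, pred_2)):
        if (label_1 == label_2
                and conf_1[i] > threshold
                and conf_2[i] > threshold):
            selected.append((i, label_1))
    return selected


def majority_vote(votes):
    """Most common label among the classifier outputs.

    If all votes differ, the label of the first classifier is returned.
    """
    return Counter(votes).most_common(1)[0][0]
```

For instance, with predictions `[0, 1, 2]` and `[0, 1, 1]`, confidences `[0.9, 0.6, 0.9]` and `[0.95, 0.9, 0.9]`, and a threshold of 0.8, only the first example is selected: the second fails the confidence check and the third fails the agreement check.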
To avoid the overfitting problem, a small trick named the suspect principle is introduced into the classifier updating process as the termination condition. The core of the suspect principle is the following: when the three initial classifiers in tritraining appear to have been updated to their best (the error rates stop decreasing) with the help of unlabeled examples, we remain doubtful whether they have truly reached their best or have just fallen into a local optimum. Therefore, the updating process is allowed to keep running for a certain number of additional rounds after the error rates stop decreasing, and it terminates only then. This number is referred to as the suspect principle value. It is worth discussing how the suspect principle value should be set. The experimental results in Section 4 show that four is a good choice.
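The termination rule can be read as a patience counter, sketched below. This is one possible interpretation rather than the authors' exact rule: it assumes that the count of non-improving rounds restarts whenever the error rate reaches a new minimum, which the text does not specify. The function name and the error sequences are hypothetical.

```python
def rounds_until_stop(error_history, suspect_value=4):
    """Return the round at which updating stops under the suspect principle.

    error_history: error rates observed after each updating round.
    suspect_value: number of non-improving rounds tolerated before stopping.
    """
    best = float("inf")
    stalled = 0
    for round_index, err in enumerate(error_history, start=1):
        if err < best:
            best = err      # new best error: reset the doubt counter
            stalled = 0
        else:
            stalled += 1    # error did not decrease in this round
            if stalled >= suspect_value:
                return round_index
    return len(error_history)  # history exhausted before the rule fired
```

A larger suspect principle value gives the classifiers more chances to escape a local optimum at the cost of more training rounds.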

Case Study Description
To verify the effectiveness and generalization ability of the proposed method, datasets from two individual bearing fault cases conducted by different groups were adopted.
Case 1. As shown in Figure 3, the first case was originally conducted on a rotational machinery fault simulation test bed (QPZZ-II) by the Prognostic and Health Management Laboratory of the School of Reliability and Systems Engineering, BUAA. The inner-ring fault, outer-ring fault, and roller element fault were introduced by wire-electrode cutting a crevice on the surface of the inner ring, the outer ring, and one of the roller elements, as marked in Figure 4. The vibration signals are sampled at a frequency of 5120 samples per second, and the rotation speed is 1500 revolutions per minute.
The test bearings used are cylindrical roller bearings (N205EM HRB CHINA), the detailed structure information of which is listed in Table 1.
Bearing faults in Case 2 include an inner-ring fault, an outer-ring fault, and a roller element fault, with circle-shaped spalling of area 3.8 mm², 7 mm², and 3 mm² on the surface of the inner ring, outer ring, and roller element, respectively. The test bearings used are deep groove ball bearings (6308), the detailed structure information of which is listed in Table 2. The sampling frequency is 20 k samples per second, and the rotation speed is 1500 revolutions per minute.

Results
Through the EEMD process, the original vibration signals collected from the two cases are transformed into two feature sets. According to [18], the two parameters of EEMD are set as follows: the ratio of the standard deviation of the added noise to that of the input signal is 0.15, and the ensemble number is 100. Information on the feature sets is tabulated in Table 3. For Case 1 at an unlabeling rate of 80 percent, 68 of the 340 training examples are put into $L$ and the other 272 examples are put into $U$ without their labels. To reduce the randomness of the results, 50 independent runs are performed and the averaged results are summarized as the final outcome.
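The partition arithmetic can be checked with a short sketch. The function name is hypothetical, and rounding to the nearest integer is an assumption, since the text does not state how fractional counts are handled.

```python
def partition_sizes(n_examples, unlabeling_rate, train_fraction=0.85):
    """Return (labeled, unlabeled, test) set sizes for one feature set."""
    n_train = round(n_examples * train_fraction)
    n_test = n_examples - n_train
    # The labeled pool is the part of the training set that keeps its labels.
    n_labeled = round(n_train * (1.0 - unlabeling_rate))
    n_unlabeled = n_train - n_labeled
    return n_labeled, n_unlabeled, n_test


# Case 1: 400 examples at an unlabeling rate of 0.8.
case_1 = partition_sizes(400, 0.8)
# Case 2: 128 examples at an unlabeling rate of 0.8.
case_2 = partition_sizes(128, 0.8)
```

For Case 1 this gives 68 labeled, 272 unlabeled, and 60 test examples (68 + 272 = 340 training examples). For Case 2 it gives about 22 labeled examples, which matches the figure quoted in the discussion of the small dataset.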
Figure 6 shows the classification error rate of Cases 1 and 2 under different unlabeling rates and suspect principle values. When the suspect principle value is set to four, the classification error rates are the lowest or the second lowest in most situations, the only exception being Case 2 under the unlabeling rate of 0.6. Therefore, setting the suspect principle value to four is taken as a practical choice. With the suspect principle value set to four, the averaged results are summarized in Table 5, which presents the classification error rate of the initial ensemble of weak classifiers (that is, the combination of the three initial BP neural network classifiers trained only from $L$), that of the final ensemble classifier generated by tritraining, and the improvement of the latter over the former. The architecture and parameters of the BP neural network are shown in Table 4.

Comparative Experiments with Different Semisupervised Learning Models.
In this paper, self-training (also called self-learning) and tritraining models were run for comparison. The self-learning model is a traditional semisupervised learning method where the most confident unlabeled data samples, together with their predicted labels, are added to the initial training set, so that the neural network classifier can be retrained, and the procedure repeats. The tritraining model is an elementary model whose parameters were the same as those of C-tritraining except for the CEWS optimization process. Detailed diagnosis results are listed in Tables 5-7 and Figure 7.
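The self-learning baseline can be sketched as the loop below. To keep the example self-contained, a toy nearest-centroid learner on one-dimensional features stands in for the neural network classifier used in the paper, and the distance margin between the two closest centroids stands in for the prediction confidence. Both substitutions, along with all function names, are assumptions made for illustration; the labeled set must contain at least two classes.

```python
def nearest_centroid_fit(X, y):
    """Toy base learner: one centroid (mean value) per class, 1-D features."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict_with_margin(centroids, x):
    """Return (label, margin); a larger margin means a more confident label."""
    ranked = sorted(centroids, key=lambda label: abs(x - centroids[label]))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(x - centroids[runner_up]) - abs(x - centroids[best])
    return best, margin


def self_training(X_l, y_l, X_u, batch=1):
    """Repeatedly move the most confident unlabeled samples to the training set."""
    X_l, y_l, pool = list(X_l), list(y_l), list(X_u)
    while pool:
        model = nearest_centroid_fit(X_l, y_l)
        # Rank the remaining unlabeled samples by prediction confidence.
        ranked = sorted(pool, key=lambda x: predict_with_margin(model, x)[1],
                        reverse=True)
        for x in ranked[:batch]:
            X_l.append(x)
            y_l.append(predict_with_margin(model, x)[0])  # trust own label
            pool.remove(x)
    return nearest_centroid_fit(X_l, y_l)


final_model = self_training([0.0, 10.0], [0, 1], [1.0, 2.0, 8.0, 9.0])
```

Because the classifier retrains on its own predictions, an early mistake is never corrected, which is the weakness that tritraining and the confidence measure are designed to mitigate.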

Comparative Experiments with Different Base Classifiers.
For the purpose of investigating the diagnosis performance with different base classifiers, an additional experiment was conducted in which a support vector machine (SVM) was built with an RBF kernel function whose kernel parameter is set to 0.08 and whose penalty factor is set to 128. The SVM model was trained using the one-versus-all criterion. Note that the SVM model can be regarded as a more stable classifier, while neural network based classifiers are mostly unstable in terms of training mechanism. Taking Case 1 as an example, detailed diagnosis results are displayed in Table 8.

Discussion
(1) Different from supervised learning based diagnosis methods for fault detection and identification, this manuscript proposes a new incremental learning approach that takes advantage of unlabeled data to improve the diagnosis performance of rolling bearings. Considering that fault samples are continuously acquired over the monitoring time, semisupervised ensemble learning is employed so as to avoid manual labeling errors and to improve classification accuracy for health assessment, utilizing previously learned knowledge and newly acquired information in a real-time diagnosis mechanism. In this regard, tritraining, where three diverse classifiers are generated from the bagging samples and integrated for fault diagnosis, is conducted to improve the classification performance of the base classifiers. On this basis, CEWC is employed in this study to further mine salient characteristics of unlabeled data with a view to designing a more intelligent diagnosis model. The method was applied to two bearings with different proportions of unlabeled samples (20, 40, 60, and 80 percent, respectively). As shown in Table 5, the proposed method effectively improves the performance of the initial ensemble classifiers under all unlabeling rates for both Cases 1 and 2. The improvement percentage ranges from 2.6% to 25.9%.
(2) It is noteworthy in Figure 8 that the improvement percentage increases sharply as the unlabeling rate increases in both Cases 1 and 2. This means that, by utilizing unlabeled data, the proposed method really makes a difference where there is limited labeled data to train the classifiers, and the less labeled data there is, the more the proposed method improves the classifiers' performance. However, the absolute value of the improvement and the diagnostic error rate in Case 2 are generally higher than those in Case 1. The difference between the two results is caused by their dataset sizes.
(3) It is observed in Case 1 that the diagnosis result of self-learning decreased at an unlabeling rate of 0.8, which may be due to some negative effects of improper training such as overfitting problems.
(4) From the diagnosis results of the SVM based C-tritraining, it is noted that the fault classification performance was improved as well, demonstrating the effectiveness of the proposed semisupervised learning method in rolling bearing diagnosis; that is, the model can be appropriately applied using various base classifiers. However, the misclassification rates on the testing data were relatively high compared to the BPNN based model, which may be due to the lower diversity among the three SVM models. The ensemble process can only be more effective on condition that the base classifiers are of greater diversity. Therefore, in this study, when determining the base classifier and architecture, we follow the simple idea that the classifiers should be as different as possible in the bagging process so that more information can be learned from the unlabeled data.

Conclusion
In order to improve the performance of bearing fault diagnosis when there is limited labeled data, this paper presents a roller bearing fault diagnosis method based on the combination of tritraining and CEWC. The method is validated on two roller bearing fault cases conducted by two independent groups. The results show that, with the help of unlabeled examples, the method effectively improves fault diagnosis for both the cylindrical roller bearing and the deep groove ball bearing when there are limited labeled examples. The proposed method still helps even when there is enough labeled data, and the diagnostic accuracy can reach up to 95%.
Although the proposed method is promising, some aspects could be improved in future work. The features extracted from the vibration signals are the information entropies of IMFs obtained through EEMD, which is an iterative process, and so is tritraining. This makes the proposed method time-consuming, which undermines its applicability to online roller bearing diagnosis. Hence, improving efficiency is among the priorities of future work.
The CEWC is established by a two-step process. In the first step, by employing the $k$-nearest neighbor criterion, a neighborhood graph is constructed from the labeled examples $L = \{(x_p, y_p) \mid p = 1, 2, \ldots, |L|\}$, where $x_p$ is the attribute vector of the $p$th example in set $L$ and $y_p$ is its label. Concretely, each example $(x_p, y_p) \in L$ corresponds to a vertex in the graph.
Figure 1 (legend): example K in class 1; nearest examples of K in class 1; nearest examples of K in class 2; edge; cut edge; distance d; cut edge weight.
In the proportion update rule, $prop_t$ and $prop_{t-1}$ are the proportions at the $t$th and $(t-1)$th iterations, and $e_t$ and $e_{t-1}$ are the training errors at the $t$th and $(t-1)$th iterations. The proportion updating process is rather intuitive. If the error decreases after the training set is enlarged with the added unlabeled predictions, we are naturally confident that the weak classifiers are reliable and able to handle more unlabeled examples next time. However, if the error increases, we have lower confidence in the weak classifiers, and fewer unlabeled examples are entrusted to them next time.

Figure 2: Framework of the proposed method.

Case 2. The second case was originally conducted by Institute of Intelligent instrument and Diagnosis, Xi'an Jiaotong University. The test rig shown in Figure 5 was designed and manufactured entirely by them. It consists mainly of a speed
For each feature set, 85 percent of the data are kept as the training set while the rest are used as the test set to examine the trained classifiers. The training set, composed of the labeled pool and the unlabeled pool, that is, $L \cup U$, is partitioned under different unlabeling rates: 80 percent, 60 percent, 40 percent, and 20 percent. Take the data of Case 1, whose size is 400 examples, as an example: the training set has 340 examples (85 percent) and the test set has 60 examples (15 percent). When the unlabeling rate is 80 percent, 68 examples out of the 340 are labeled.

Figure 4: Faults of inner ring, outer ring, and roller element.

Table 1: Bearing structure information.

Table 2: Bearing structure information.

Table 3: Feature sets. Columns: feature set; attribute; size; class (normal/inner/outer/element).

Table 5: Classification error rate of the initial and final hypotheses and the corresponding improvements of C-tritraining under different unlabeling rates, with the suspect principle value set to 4.
Figure 5: Test rig of Case 2.
The feature set of Case 1 has 400 examples while that of Case 2 has only 128 examples. For Case 2, when the unlabeling rate is 0.8, there are only 128 × 85% × 20% ≈ 22 labeled examples to train the classifiers, which is clearly not enough to train good classifiers. It is therefore unsurprising that the classification error of the initial, underfitted classifiers reaches 0.4589 when the unlabeling rate is 0.8. In this extreme situation, the proposed method improves the performance of the initial classifiers by 25.91%. When there is enough labeled data, for example, in Case 1 with an unlabeling rate of 0.2, the classification error rate drops to 0.0487 (95.13% diagnostic accuracy). This implies that roller bearing fault diagnosis based on tritraining is promising both in situation (a), where there is not enough labeled data to obtain good classifiers, and in situation (b), where there is enough labeled data.
Figure 6: Classification error rate of Case 1 (a) and Case 2 (b) under different unlabeling rates and suspect principle values.

Table 6: Classification error rate of the initial and final hypotheses and the corresponding improvements of self-training under different unlabeling rates.

Table 7: Classification error rate of the initial and final hypotheses and the corresponding improvements of tritraining under different unlabeling rates.

Table 8: Comparative diagnosis results of SVM and BPNN in Case 1.
... and the Fundamental Research Funds for the Central Universities (Grant no. YWF-16-BJ-J-18).