Rolling element bearings are core components of many systems, such as aircraft, trains, steamboats, and machine tools, and their failure can lead to reduced capability, downtime, and even catastrophic breakdowns. Owing to misoperation, manufacturing deficiencies, or a lack of monitoring and maintenance, they are often found to be the most unreliable components within these systems. Therefore, effective and efficient fault diagnosis of rolling element bearings plays an important role in ensuring the continued safe and reliable operation of their host systems. This study presents a trace ratio criterion-based kernel discriminant analysis (TR-KDA) for fault diagnosis of rolling element bearings. The binary immune genetic algorithm (BIGA) is employed to solve the trace ratio problem in TR-KDA. Numerical results obtained from extensive simulations indicate that the proposed TR-KDA using BIGA (called TR-KDA-BIGA) can effectively and efficiently classify different classes of rolling element bearing data, while also providing real-time visualization that helps practitioners monitor the health status of rolling element bearings. Empirical comparisons show that the proposed TR-KDA-BIGA performs better than existing methods in classifying different classes of rolling element bearing data. The proposed TR-KDA-BIGA may therefore be a promising tool for fault diagnosis of rolling element bearings.
1. Introduction
Rolling element bearings are core components of many systems, such as aircraft, trains, steamboats, and machine tools, and their failure can lead to reduced capability, downtime, and even catastrophic breakdowns [1–6]. Owing to misoperation, manufacturing deficiencies, or a lack of monitoring and maintenance, they are often found to be the most unreliable components within these systems. Therefore, effective and efficient fault diagnosis of rolling element bearings plays an important role in ensuring the continued safe and reliable operation of their host systems.
Over the past few years, much research effort has been devoted to developing approaches to fault diagnosis of rolling element bearings. When faults occur in rolling element bearings, the vibration signals in the time and frequency domains have been shown to deviate from their normal counterparts because of increased friction and impulsive forces [7–10]. Usually, dozens or even hundreds of time/frequency-domain features are calculated from the bearing vibration signals to represent the different health statuses. In the current study, 9 time-domain features and 6 time-frequency-domain features are extracted from the bearing vibration signals to jointly construct a 15-dimensional feature vector. Fault diagnosis of rolling element bearings is therefore usually treated as a high-dimensional pattern recognition problem. However, the intrinsic dimension of high-dimensional data may be small; for example, the number of features responsible for a certain type of fault pattern may be small. Moreover, projecting high-dimensional data onto a 2- or 3-dimensional subspace can provide real-time visualization, which makes it convenient for the user to monitor the health status of rolling element bearings. Projection onto a low-dimensional subspace also acts as a form of data compression, which is helpful for efficient storage and retrieval. Thus, dimensionality reduction techniques are often used to project the high-dimensional feature space to a lower-dimensional space while preserving most of the “intrinsic information” contained in the data [11–15]. After dimensionality reduction, the compact representation can be used for subsequent tasks (e.g., visualization and classification). Among various dimensionality reduction methods [16–24], principal component analysis (PCA) and linear discriminant analysis (LDA) are the two most common [21].
The former is an unsupervised method, which pursues the direction of maximum variance for optimal reconstruction. The latter is a supervised method, which aims to maximize the between-class scatter while minimizing the within-class scatter. Owing to its use of label information, the latter generally outperforms the former if sufficient labeled samples are provided [21]. In the past few years, a series of studies have formulated LDAs for pattern recognition, including those by Fukunaga [21], Wang et al. [22], Sun and Chen [25], Guo et al. [26], Zhao et al. [27], Jin et al. [28], and Jia et al. [29]. Generally, the formulation of LDAs is based on the ratio trace criterion rather than the trace ratio criterion, because the ratio trace problem is more tractable than the trace ratio problem. Nevertheless, as pointed out by Wang et al. [22], solutions obtained with the ratio trace criterion may deviate from the original intent of the trace ratio problem. To improve the behaviour of LDA implementations, Wang et al. [22], Guo et al. [26], Zhao et al. [27], Jin et al. [28], and Jia et al. [29] presented various trace ratio criterion-based LDAs (TR-LDAs), in which the numerator and denominator of the criterion directly reflect the Euclidean distances between interclass and intraclass samples, respectively. Another advantage of the trace ratio criterion is that the calculated projection matrix is orthogonal, which eliminates the redundancy between different projection directions. In addition, when Euclidean distance is used to evaluate the similarity between data points, an orthogonal projection preserves these similarities without any change [22]. Although the above TR-LDA formulation methods have these advantages, they are criticized for being unable to deal with redundancy among eigenvectors. For example, if the most discriminative eigenvector is duplicated several times, the above TR-LDA formulation methods are prone to selecting all of the duplicates.
This is problematic for selecting an optimal subset of eigenvectors because other discriminative and complementary eigenvectors will be missed. A classifier built on eigenvectors selected in this way can perform poorly. Therefore, the issue of TR-LDA formulation remains unresolved.
A review of the related literature also indicates that most previous work applying LDA or TR-LDA to fault diagnosis assumed that the samples in each class follow a linear distribution. However, in many fault diagnosis applications, the samples in each class follow a nonlinear distribution and violate this assumption. When the assumption does not hold, the separation of different classes may not be well characterized by the scatter matrices, which degrades the classification results [21]. To solve this problem, the kernel trick [30–32], which extends many linear methods to nonlinear kernel versions, can be used to extend TR-LDA to nonlinear problems. Thus, this study develops a nonlinear kernel version of TR-LDA, that is, trace ratio criterion-based kernel discriminant analysis (TR-KDA), for fault diagnosis of rolling element bearings. However, like the TR-LDA models, the TR-KDA model presented in this study must still solve the trace ratio problem when formulating the projection matrix. It therefore inherits the inability of existing TR-LDA solution methods to handle redundancy in eigenvector selection: duplicated discriminative eigenvectors tend to be selected together, other discriminative and complementary eigenvectors are missed, and the resulting classifier can perform poorly. Fortunately, the immune genetic algorithm (IGA), a novel evolutionary computation technique developed by Jiao and Wang [33], has the potential to determine a set of discriminative and mutually nonredundant eigenvectors.
In this study, we propose a method called TR-KDA-BIGA that uses binary IGA (BIGA) to formulate TR-KDA for dimensionality reduction of statistical and wavelet features extracted from the vibration signals, enabling effective and efficient fault diagnosis of rolling element bearings. In particular, the contributions are to
(1) use an immune evolutionary computation technique, namely BIGA, to obtain a reduced set of discriminative and mutually nonredundant eigenvectors for TR-KDA-BIGA formulation,
(2) provide a two-dimensional representation of bearing data that helps practitioners monitor the health status of bearings,
(3) build a TR-KDA-BIGA model architecture on vibration measurements for effective and efficient fault diagnosis of rolling element bearings.
The rest of this study is structured as follows. Section 2 briefly reviews the basic concepts of TR-LDA and kernel extension. Section 3 presents a TR-KDA-BIGA method. Section 4 discusses its convergence and initialization. Section 5 conducts performance evaluations of the proposed TR-KDA-BIGA on benchmark problems. Section 6 describes an overall flowchart of the proposed TR-KDA-BIGA for fault diagnosis of rolling element bearings. Section 7 summarizes the conclusions drawn from this study.
2. Review of TR-LDA and Kernel Extension
2.1. Review of TR-LDA
Suppose we are given a set of $n$ $d$-dimensional samples $x_1, x_2, \ldots, x_n$ belonging to $l$ different classes. LDA seeks a linear projection matrix $W \in \mathbb{R}^{d \times k}$ that maps the original $d$-dimensional data $x_i$ onto $k$-dimensional data $y_i$ (usually $k \ll d$) by maximizing the between-class scatter while minimizing the within-class scatter. The between-class scatter matrix $S_B$ and the within-class scatter matrix $S_W$ are expressed as follows:

$$S_B = \sum_{i=1}^{l} n_i \left(m^{(i)} - m\right)\left(m^{(i)} - m\right)^T, \qquad S_W = \sum_{i=1}^{l}\sum_{j=1}^{n_i} \left(x_j^{(i)} - m^{(i)}\right)\left(x_j^{(i)} - m^{(i)}\right)^T, \tag{1}$$

where $m$ represents the total sample mean vector, $n_i$ represents the number of samples in the $i$th class, $m^{(i)}$ represents the mean vector of the $i$th class, and $x_j^{(i)}$ represents the $j$th sample in the $i$th class. The new mapped feature vectors $y_i \in \mathbb{R}^k$ can then be expressed as $y_i = W^T x_i$. The original LDA formulation, known as the Fisher LDA [21], only handles binary classification problems. However, many practical applications involve multiclass classification. To overcome this issue, a number of researchers have proposed optimization criteria for extending the Fisher LDA to multiclass classification problems. The first optimization criterion is in a ratio trace form (referred to as RT-LDA):

$$W^* = \arg\max_{W^T W = I} \operatorname{Tr}\left[\left(W^T S_W W\right)^{-1}\left(W^T S_B W\right)\right], \tag{2}$$

where $\operatorname{Tr}[\cdot]$ denotes the matrix trace and $I$ is an identity matrix. The constraint $W^T W = I$ is usually added to (2) in order to obtain a set of orthonormal vectors. The second optimization criterion is in a trace ratio form (referred to as TR-LDA):

$$W^* = \arg\max_{W^T W = I} \frac{\operatorname{Tr}\left[W^T S_B W\right]}{\operatorname{Tr}\left[W^T S_W W\right]}. \tag{3}$$

The optimization problem in (2) can be solved directly through the generalized eigenvalue decomposition (GED) method [22]:

$$S_B w_i = \tau_i S_W w_i, \tag{4}$$

where $\tau_i$ is the $i$th largest eigenvalue, $w_i$ is the eigenvector corresponding to $\tau_i$, and $w_i$ constitutes the $i$th column vector of the matrix $W$. Although the GED yields a closed-form solution that can serve as an approximate solution for (3), it does not necessarily guarantee the best trace ratio optimization.
Thus, approximating trace ratio optimization by ratio trace optimization may reduce the classification capability of the derived low-dimensional feature space. Moreover, the physical meaning of the trace ratio form is clearer than that of the ratio trace form. However, the optimization problem in (3) is generally nonconvex and has no closed-form solution. Fortunately, a recent study conducted by Guo et al. [26] showed that, using the trace difference function $z(\lambda) = \max_{W^T W = I} \operatorname{Tr}[W^T (S_B - \lambda S_W) W]$, the trace ratio problem can be solved equivalently by finding the zero points of the equation $z(\lambda) = 0$. Following up on Guo et al.’s work, Wang et al. presented an iterative method named the ITR algorithm to solve the trace ratio problem [22]. The ITR algorithm optimizes the objective function $\lambda = \max_{W^T W = I} \operatorname{Tr}[W^T S_B W] / \operatorname{Tr}[W^T S_W W]$ in an iterative and incremental manner. The $W$ in the $t$th iteration step (referred to as $W_t$) is obtained by solving the trace difference problem $\arg\max_{W^T W = I} \operatorname{Tr}[W^T (S_B - \lambda_t S_W) W]$, where $\lambda_t$ represents the trace ratio value derived from the $W$ in the previous iteration step (referred to as $W_{t-1}$). However, the initialization of $W$ substantially influences the convergence performance of the ITR algorithm. A good initialization generally makes the ITR algorithm converge quickly, whereas a bad initialization usually increases the number of iterations. Moreover, in the ITR algorithm, although the $W$ formed with the eigenvectors corresponding to the $k$ largest eigenvalues of $S_B - \lambda_t S_W$ maximizes the trace difference $\operatorname{Tr}[W^T (S_B - \lambda_t S_W) W]$, it does not necessarily maximize the trace ratio $\operatorname{Tr}[W^T S_B W] / \operatorname{Tr}[W^T S_W W]$. On the other hand, from the perspective of fault diagnosis, the aim is mainly to find a set of projection vectors that provide the highest level of discrimination between the different fault patterns.
Thus, the eigenvectors with the largest eigenvalues are not necessarily the most representative for discriminating one class from the others, as mentioned in Section 1. To overcome the above shortcomings, this study presents a BIGA-based solution method for the trace ratio criterion (discussed in detail in Section 3).
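The scatter matrices in (1) and the ITR iteration described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the synthetic data, the identity-based initialization of $W$, and the stopping tolerance are all assumptions made for illustration.

```python
import numpy as np

def scatter_matrices(X, y):
    """Between- and within-class scatter (eq. 1); samples are the columns of X (d x n)."""
    d = X.shape[0]
    m = X.mean(axis=1, keepdims=True)            # total sample mean
    S_B, S_W = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[:, y == c]
        mc = Xc.mean(axis=1, keepdims=True)      # class mean
        S_B += Xc.shape[1] * (mc - m) @ (mc - m).T
        S_W += (Xc - mc) @ (Xc - mc).T
    return S_B, S_W

def trace_ratio(W, S_B, S_W):
    return np.trace(W.T @ S_B @ W) / np.trace(W.T @ S_W @ W)

def itr(S_B, S_W, k, max_iter=100, tol=1e-12):
    """ITR-style iteration: solve the trace difference problem, then update lambda."""
    d = S_B.shape[0]
    W = np.eye(d)[:, :k]                         # assumed (arbitrary) orthonormal start
    history = [trace_ratio(W, S_B, S_W)]
    for _ in range(max_iter):
        lam = history[-1]
        _, vecs = np.linalg.eigh(S_B - lam * S_W)  # eigenvalues in ascending order
        W = vecs[:, -k:]                         # eigenvectors of the k largest eigenvalues
        history.append(trace_ratio(W, S_B, S_W))
        if history[-1] - lam < tol:              # no further improvement
            break
    return W, history

# Synthetic three-class data with a class-dependent shift (illustrative only).
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 20)
X = rng.normal(size=(5, 60)) + 2.0 * np.eye(5)[:, y]
S_B, S_W = scatter_matrices(X, y)
W, history = itr(S_B, S_W, k=2)
print(history)
```

The printed sequence of trace ratio values should be nondecreasing, which is the behaviour the iteration relies on; how many steps it takes depends on the starting $W$, as noted in the text.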
2.2. Kernel Extension
In some applications, it is insufficient to model the data using TR-LDA, which is a linear discriminating method. To address nonlinearities in the data, this section presents a nonlinear discriminating method based on the kernel trick [30–32], that is, TR-KDA. The kernel trick maps the original data to a high-dimensional Hilbert space through a nonlinear mapping function $\phi$. Let $\phi(X)$ denote the data matrix in the Hilbert space: $\phi(X) = [\phi(x_1), \phi(x_2), \ldots, \phi(x_n)]$. The functional form of the mapping does not need to be known, since it is implicitly defined by the choice of kernel function $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$, that is, the inner product in the kernel-induced feature space. The kernel function $K$ may be any positive kernel satisfying Mercer’s condition. The radial basis function (RBF) kernel, one of the most popular kernel functions in kernel-based learning algorithms, is adopted in this study. Then, (3) in Hilbert space can be written as follows:

$$W^{\phi *} = \arg\max_{(W^\phi)^T W^\phi = I} \frac{\operatorname{Tr}\left[(W^\phi)^T S_B^\phi W^\phi\right]}{\operatorname{Tr}\left[(W^\phi)^T S_W^\phi W^\phi\right]}, \tag{5}$$

where $W^\phi$, $S_B^\phi$, and $S_W^\phi$ are the matrices in Hilbert space corresponding to $W$, $S_B$, and $S_W$ in (3), respectively. Notably, the matrices $S_B$ and $S_W$ in (3) can be expressed as $S_B = X L_B X^T$ and $S_W = X L_W X^T$, respectively, through simple manipulation, where $X = [x_1, x_2, \ldots, x_n]$ is the data matrix. The matrices $L_B$ and $L_W$ are the graph Laplacian matrices [34] of the weighted undirected graphs reflecting the between-class and within-class relationships of the samples. Consider

$$\begin{aligned}
S_B &= \sum_{i=1}^{l} n_i \left(m^{(i)} - m\right)\left(m^{(i)} - m\right)^T \\
&= \sum_{i=1}^{l} n_i m^{(i)} m^{(i)T} - m \sum_{i=1}^{l} n_i m^{(i)T} - \sum_{i=1}^{l} n_i m^{(i)} m^T + \sum_{i=1}^{l} n_i m m^T \\
&= \sum_{i=1}^{l} \frac{1}{n_i}\left(x_1^{(i)} + x_2^{(i)} + \cdots + x_{n_i}^{(i)}\right)\left(x_1^{(i)} + x_2^{(i)} + \cdots + x_{n_i}^{(i)}\right)^T - 2 n m m^T + n m m^T \\
&= \sum_{i=1}^{l} \sum_{j,q=1}^{n_i} \frac{1}{n_i} x_j^{(i)} x_q^{(i)T} - n m m^T \\
&= X G X^T - n m m^T = X G X^T - X \frac{1}{n} e e^T X^T = X \left(G - \frac{1}{n} e e^T\right) X^T,
\end{aligned} \tag{6}$$

where $e = [1, 1, \ldots, 1]^T$ is an $n$-dimensional vector.
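We can simplify the above equation even further by defining

$$G_{ij} = \begin{cases} \dfrac{1}{n_q} & \text{if samples } x_i \text{ and } x_j \text{ belong to the } q\text{th class}, \\ 0 & \text{otherwise}, \end{cases} \tag{7}$$

$$L_B = G - \frac{1}{n} e e^T. \tag{8}$$

Thus, we get

$$S_B = X L_B X^T. \tag{9}$$

The matrix $S_W$ can be computed similarly:

$$\begin{aligned}
S_W &= \sum_{i=1}^{l}\sum_{j=1}^{n_i} \left(x_j^{(i)} - m^{(i)}\right)\left(x_j^{(i)} - m^{(i)}\right)^T \\
&= \sum_{i=1}^{l}\sum_{j=1}^{n_i} \left(x_j^{(i)} x_j^{(i)T} - m^{(i)} x_j^{(i)T} - x_j^{(i)} m^{(i)T} + m^{(i)} m^{(i)T}\right) \\
&= \sum_{i=1}^{l} \left(\sum_{j=1}^{n_i} x_j^{(i)} x_j^{(i)T} - n_i m^{(i)} m^{(i)T}\right) \\
&= \sum_{i=1}^{l} \left(X_i X_i^T - \frac{1}{n_i}\left(x_1^{(i)} + x_2^{(i)} + \cdots + x_{n_i}^{(i)}\right)\left(x_1^{(i)} + x_2^{(i)} + \cdots + x_{n_i}^{(i)}\right)^T\right) \\
&= \sum_{i=1}^{l} \left(X_i X_i^T - \frac{1}{n_i} X_i e_i e_i^T X_i^T\right) = \sum_{i=1}^{l} X_i \left(I - \frac{1}{n_i} e_i e_i^T\right) X_i^T = \sum_{i=1}^{l} X_i L_i X_i^T,
\end{aligned} \tag{10}$$

where $X_i = [x_1^{(i)}, x_2^{(i)}, \ldots, x_{n_i}^{(i)}]$ is a $d \times n_i$ matrix, $e_i = [1, 1, \ldots, 1]^T$ is an $n_i$-dimensional vector, $I$ is the identity matrix, $L_i = I - (1/n_i) e_i e_i^T$ is an $n_i \times n_i$ matrix, and $X_i L_i X_i^T$ is the scatter (unnormalized covariance) matrix of the $i$th class. Based on (7), the above equation can be simplified similarly by defining

$$L_W = I - G. \tag{11}$$

Thus, we get

$$S_W = X L_W X^T. \tag{12}$$

Using the definitions in (9) and (12), (5) can be rewritten as follows:

$$W^{\phi *} = \arg\max_{(W^\phi)^T W^\phi = I} \frac{\operatorname{Tr}\left[(W^\phi)^T \phi(X) L_B \phi(X)^T W^\phi\right]}{\operatorname{Tr}\left[(W^\phi)^T \phi(X) L_W \phi(X)^T W^\phi\right]}. \tag{13}$$

To obtain the matrix $W^{\phi *}$, $\phi(X)$ is decomposed into a matrix $Q$ with orthonormal columns (satisfying $Q^T Q = I$) and a right triangular matrix $R$ such that $QR = \phi(X)$. We have

$$\phi(X)^T \phi(X) = R^T R. \tag{14}$$

Let us map $W^\phi$ into the span of $Q$. Since $Q$ is an orthonormal basis of $\phi(X)$, we have

$$W^\phi = Q V^\phi, \tag{15}$$

where $V^\phi \in \mathbb{R}^{n \times k}$ is a matrix with orthonormal columns satisfying $(V^\phi)^T V^\phi = I$. Using the definitions in (14) and (15), (13) can be further rewritten as follows:

$$V^{\phi *} = \arg\max_{(V^\phi)^T V^\phi = I} \frac{\operatorname{Tr}\left[(V^\phi)^T R L_B R^T V^\phi\right]}{\operatorname{Tr}\left[(V^\phi)^T R L_W R^T V^\phi\right]}. \tag{16}$$

Let $\tilde{S}_B = R L_B R^T$ and $\tilde{S}_W = R L_W R^T$; then (16) can be further rewritten as follows:

$$V^{\phi *} = \arg\max_{(V^\phi)^T V^\phi = I} \frac{\operatorname{Tr}\left[(V^\phi)^T \tilde{S}_B V^\phi\right]}{\operatorname{Tr}\left[(V^\phi)^T \tilde{S}_W V^\phi\right]}. \tag{17}$$

After the matrix $V^{\phi *}$ is obtained with the BIGA-based solution method (discussed in detail in Section 3), the output points in the reduced data space can be expressed as

$$Y^{\phi *} = (W^{\phi *})^T \phi(X) = (V^{\phi *})^T Q^T Q R = (V^{\phi *})^T R. \tag{18}$$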
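The graph Laplacian identities in (7) to (12) and the kernel scatter matrices used in (17) can be checked numerically with a short NumPy sketch. The data are synthetic, and the RBF width `gamma` and the small diagonal jitter added before the Cholesky factorization are assumptions chosen only so that the example runs stably.

```python
import numpy as np

rng = np.random.default_rng(1)
y = np.repeat([0, 1, 2], 10)               # l = 3 classes, n = 30 samples (synthetic)
X = rng.normal(size=(4, 30))               # columns are the samples x_1, ..., x_n
n = X.shape[1]

# Eq. (7): G_ij = 1/n_q if samples i and j both belong to class q, else 0.
same_class = y[:, None] == y[None, :]
class_size = np.bincount(y)[y]             # size of the class of each sample
G = same_class / class_size[None, :]
e = np.ones((n, 1))
L_B = G - (e @ e.T) / n                    # eq. (8)
L_W = np.eye(n) - G                        # eq. (11)

# Direct evaluation of eq. (1) for comparison.
m = X.mean(axis=1, keepdims=True)
S_B = np.zeros((4, 4))
S_W = np.zeros((4, 4))
for c in np.unique(y):
    Xc = X[:, y == c]
    mc = Xc.mean(axis=1, keepdims=True)
    S_B += Xc.shape[1] * (mc - m) @ (mc - m).T
    S_W += (Xc - mc) @ (Xc - mc).T

# Kernel version: RBF kernel matrix, then K = R^T R with R upper triangular.
gamma = 0.5                                # assumed kernel width
sq = (X ** 2).sum(axis=0)
D2 = sq[:, None] + sq[None, :] - 2.0 * X.T @ X
K = np.exp(-gamma * D2)
K_reg = K + 1e-8 * np.eye(n)               # jitter for numerical stability (assumption)
R = np.linalg.cholesky(K_reg).T            # NumPy returns the lower factor, so transpose
S_B_kernel = R @ L_B @ R.T                 # \tilde{S}_B
S_W_kernel = R @ L_W @ R.T                 # \tilde{S}_W

print(np.allclose(X @ L_B @ X.T, S_B), np.allclose(X @ L_W @ X.T, S_W))
```

Working with $R$ instead of $\phi(X)$ is what keeps the method tractable: only the $n \times n$ kernel matrix is ever formed, never the mapped features themselves.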
3. The Proposed TR-KDA-BIGA
As previously mentioned, constructing TR-KDA requires selecting $k$ out of $d$ eigenvectors to form the matrix $V^\phi$ for dimensionality reduction. However, finding a subset of eigenvectors based on the trace ratio criterion is not an easy task, since the space of possible subsets is very large, especially when $d$ is large. Thus, it is impractical to use exhaustive search to find an optimal subset of $k$ eigenvectors. Instead, in this study, the BIGA is used to select the $k$ out of $d$ eigenvectors that form the projection matrix $V_t^\phi$ based on the trace ratio criterion, such that the trace ratio value $\lambda = \operatorname{Tr}[(V^\phi)^T \tilde{S}_B V^\phi] / \operatorname{Tr}[(V^\phi)^T \tilde{S}_W V^\phi]$ is maximized. The immune genetic algorithm, originally developed by Jiao and Wang [33], is a genetic algorithm based on biological immune theory that combines the immune mechanism with the evolutionary mechanism. The proposed TR-KDA-BIGA is discussed further in what follows.
3.1. Chromosome Encoding
Encoding a solution of a problem into a chromosome is an important issue when using BIGAs. In this study, every chromosome in a BIGA corresponds to a discrete binary selector $u = [u_1, u_2, \ldots, u_d]$. A gene value of “1” indicates that the corresponding eigenvector $v_i^\phi$ $(i = 1, 2, \ldots, d)$ of $\tilde{S}_B - \lambda_t \tilde{S}_W$ is used in forming the projection matrix $V^\phi$ of the $t$th step, while “0” denotes its absence. Thus, the length of the chromosome is $d$.
3.2. Genetic Operators
Genetic operators give every chromosome the chance to become the fittest chromosome of its generation. If it is difficult to reach the target of trace ratio optimization, crossover and mutation may introduce degeneracy into generations of chromosomes.
3.2.1. Crossover Operator
The crossover operator in a BIGA generates two new child chromosomes from two existing parent chromosomes selected from the current population according to a prespecified crossover rate. In this study, a “one-point” crossover operator is adopted, which randomly selects a cut point and exchanges the parts of the parent chromosomes between the cut point and the end of the string. Specifically, suppose that two parent chromosomes $P_1$ and $P_2$ selected randomly from the population undergo the crossover operation at a randomly selected crossover point $g$ $(1 \le g \le d)$, where

$$P_1 = \left(u_{i,1}, u_{i,2}, \ldots, u_{i,d}\right), \qquad P_2 = \left(u_{j,1}, u_{j,2}, \ldots, u_{j,d}\right). \tag{19}$$

One-point crossover on the genes of the two parents then yields the two offspring chromosomes $C_1$ and $C_2$:

$$C_1 = \left(u_{i,1}, u_{i,2}, \ldots, u_{i,g-1}, u_{i,g}, u_{j,g+1}, \ldots, u_{j,d}\right), \qquad C_2 = \left(u_{j,1}, u_{j,2}, \ldots, u_{j,g-1}, u_{j,g}, u_{i,g+1}, \ldots, u_{i,d}\right). \tag{20}$$

However, the exchange procedure cannot simply swap the gene segments after the crossover point, because the number of eigenvectors included in the subset must remain equal to $k$. Therefore, a simple but effective repair strategy is applied to ensure that the crossover operator does not change the total number of “1” genes in the chromosomes.
Let

$$n_{P_1} = \sum_{q=g+1}^{d} u_{i,q}, \qquad n_{P_2} = \sum_{q=g+1}^{d} u_{j,q}. \tag{21}$$

When $n_{P_1}$ is not equal to $n_{P_2}$, the following retention criterion is applied:
If nP1 is larger than nP2, randomly select (nP1-nP2) genes with “0-bit” from the current offspring chromosome C1 and reset these (nP1-nP2) selected genes to “1-bit,” and then randomly select (nP1-nP2) genes with “1-bit” from the current offspring chromosome C2 and reset these (nP1-nP2) selected genes to “0-bit.”
If nP1 is smaller than nP2, randomly select (nP2-nP1) genes with “1-bit” from the current offspring chromosome C1 and reset these (nP2-nP1) selected genes to “0-bit,” and then randomly select (nP2-nP1) genes with “0-bit” from the current offspring chromosome C2 and reset these (nP2-nP1) selected genes to “1-bit.”
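The one-point crossover with the retention criterion above can be sketched in NumPy as follows. The helper names (`random_selector`, `crossover_fixed_k`) and the example sizes are hypothetical; the sketch only illustrates how the repair step keeps exactly $k$ genes set to 1 in each child.

```python
import numpy as np

def random_selector(d, k, rng):
    """Random binary chromosome of length d with exactly k genes set to 1."""
    u = np.zeros(d, dtype=int)
    u[rng.choice(d, size=k, replace=False)] = 1
    return u

def _flip(child, bit, count, rng):
    """Flip `count` randomly chosen genes that currently hold `bit` (in place)."""
    idx = rng.choice(np.flatnonzero(child == bit), size=count, replace=False)
    child[idx] = 1 - bit

def crossover_fixed_k(p1, p2, g, rng):
    """One-point crossover after gene g (1-based), followed by the repair step."""
    c1 = np.concatenate([p1[:g], p2[g:]])
    c2 = np.concatenate([p2[:g], p1[g:]])
    diff = int(p1[g:].sum()) - int(p2[g:].sum())   # n_P1 - n_P2 from eq. (21)
    if diff > 0:        # C1 lost ones, C2 gained ones
        _flip(c1, 0, diff, rng)
        _flip(c2, 1, diff, rng)
    elif diff < 0:      # C1 gained ones, C2 lost ones
        _flip(c1, 1, -diff, rng)
        _flip(c2, 0, -diff, rng)
    return c1, c2

rng = np.random.default_rng(0)
d, k = 15, 2
p1, p2 = random_selector(d, k, rng), random_selector(d, k, rng)
g = int(rng.integers(1, d + 1))                    # crossover point, 1 <= g <= d
c1, c2 = crossover_fixed_k(p1, p2, g, rng)
print(p1, p2, g, c1, c2, sep="\n")
```

Note that the repair flips genes anywhere in the child, not only in the exchanged tail, which matches the wording of the criterion ("from the current offspring chromosome").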
3.2.2. Mutation Operator
Mutation operator in a BIGA is used primarily as a mechanism for maintaining diversity in the population. For each gene in a chromosome that is undergoing the mutation, a real-valued number is randomly selected within the range of [0,1]. If the real-valued number is less than the prespecified mutation rate, then the gene will change from “0-bit” to “1-bit” and vice versa. Upon adding (or removing) one eigenvector in that way, we shall randomly remove (or add) a different one such that the number of eigenvectors to be included in the subset is equal to k. The mutation operator helps the chromosomes to guide the search in new areas.
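A minimal sketch of this mutation step, assuming NumPy and a hypothetical helper name `mutate_fixed_k`, is given below. Each gene is flipped with probability equal to the mutation rate, and a different, randomly chosen gene is flipped in the opposite direction so that the chromosome still selects exactly $k$ eigenvectors.

```python
import numpy as np

def mutate_fixed_k(u, rate, rng):
    """Bit-flip mutation that keeps the number of 1 genes unchanged."""
    u = u.copy()
    for i in range(len(u)):
        if rng.random() < rate:            # random number in [0, 1) vs. mutation rate
            old = u[i]
            u[i] = 1 - old                 # add (0 -> 1) or remove (1 -> 0) an eigenvector
            # Compensate with a different gene that holds the new value of gene i.
            candidates = np.flatnonzero(u == u[i])
            candidates = candidates[candidates != i]
            if candidates.size == 0:       # nothing to swap with: undo the flip
                u[i] = old
                continue
            u[rng.choice(candidates)] = old
    return u

rng = np.random.default_rng(0)
u = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])
mutated = mutate_fixed_k(u, rate=0.2, rng=rng)
print(u, mutated, sep="\n")
```

The guard for an empty candidate set only matters in the degenerate cases $k = 0$ or $k = d$; it is an implementation detail added here, not something the text specifies.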
3.3. Immune Operators
The immune ability of BIGAs is realized through two kinds of immune operators: a vaccination and an immune selection. The vaccination is responsible for improving individuals’ overall fitness levels. The immune selection is responsible for prevention of deterioration.
3.3.1. Vaccination Operator
Given a chromosome u, vaccination operation in a BIGA is employed to modify the genes on some bits according to a priori knowledge such that individuals with higher fitness have a greater probability of being selected. Let U=(u1,u2,…,un0) be a population; the vaccination operation on U means that the operation is performed on nα=αn0 chromosomes selected from U according to the proportion of α, where n0 represents the population size of a BIGA. A vaccine is abstracted from the prior knowledge of the pending problem, whose information amount and validity play an important role in the performance of the algorithm.
3.3.2. Immune Selection Operator
The immune selection operation consists of the following two steps. The first step is the immunity test: if the fitness of a chromosome $u$ is smaller than that of its parent chromosome, which indicates that degeneration occurred during crossover and mutation, then the parent chromosome is used for the next competition. The second step is the annealing selection [35]: a chromosome $u_i$ is selected from the current offspring population $U_k$ to join the new parents with the following probability:

$$P(u_i) = \frac{\exp\left(f(u_i)/T_k\right)}{\sum_{i=1}^{n_0} \exp\left(f(u_i)/T_k\right)}, \tag{22}$$

where $f(u_i)$ is the fitness of the individual $u_i$ and $\{T_k\}$ is the temperature-controlled series tending towards 0.
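The annealing selection probability in (22) is a softmax over fitness values scaled by the temperature. The sketch below, with hypothetical function names and example fitness values, shows how the distribution sharpens as the temperature decreases; subtracting the maximum before exponentiating is a standard numerical safeguard that leaves the probabilities unchanged.

```python
import numpy as np

def annealing_probabilities(fitness, temperature):
    """Selection probabilities of eq. (22) for a vector of fitness values."""
    scaled = np.asarray(fitness, dtype=float) / temperature
    scaled -= scaled.max()                 # avoids overflow; ratios are unaffected
    weights = np.exp(scaled)
    return weights / weights.sum()

def annealing_selection(fitness, temperature, n_select, rng):
    """Draw indices of chromosomes according to the annealing probabilities."""
    p = annealing_probabilities(fitness, temperature)
    return rng.choice(len(p), size=n_select, replace=True, p=p)

fitness = [1.0, 2.0, 3.0]                  # illustrative fitness values
p_warm = annealing_probabilities(fitness, temperature=10.0)
p_cold = annealing_probabilities(fitness, temperature=0.01)
chosen = annealing_selection(fitness, 1.0, n_select=5, rng=np.random.default_rng(0))
print(p_warm, p_cold, chosen)
```

At a high temperature the probabilities are nearly uniform, which favors exploration; as $T_k$ tends towards 0 almost all of the probability mass falls on the fittest chromosome.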
3.4. Fitness Evaluation
Fitness evaluation plays a critical role in selecting offspring chromosomes from the current population for the next generation. In this study, the fitness function for eigenvector selection is defined as

$$f(u) = \frac{u_1 h_1 + u_2 h_2 + \cdots + u_d h_d}{u_1 g_1 + u_2 g_2 + \cdots + u_d g_d} = \frac{u h^T}{u g^T}, \tag{23}$$

where $h_i$ denotes the value $v_i^T \tilde{S}_B v_i$ for the $i$th eigenvector, $g_i$ denotes the value $v_i^T \tilde{S}_W v_i$ for the $i$th eigenvector, $h = [h_1, h_2, \ldots, h_d]$, $g = [g_1, g_2, \ldots, g_d]$, $u = [u_1, u_2, \ldots, u_d]$, $u_i \in \{0, 1\}$, $u \mathbf{1}^T = k$ (with $\mathbf{1}$ the all-ones row vector), and $i = 1, 2, \ldots, d$. Here $u$ is called the binary selector and $k$ is the desired lower feature dimension. Finally, according to the evolved binary selector $u$, the projection matrix $V^\phi$ of the $t$th step is formed by choosing the $k$ eigenvectors with $u_i = 1$ $(i = 1, 2, \ldots, d)$. The procedures of the proposed TR-KDA-BIGA and the computational flow of the BIGA built from the aforementioned genetic and immune operators are summarized in the two lists that follow.
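The fitness in (23) equals the trace ratio of the projection matrix assembled from the selected eigenvectors, which the following NumPy sketch checks on random positive semidefinite matrices. The matrices, the selector `u`, and the function name `fitness` are illustrative assumptions, not data from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 6, 2
A, B = rng.normal(size=(d, d)), rng.normal(size=(d, d))
S_B = A @ A.T                              # stand-in for \tilde{S}_B
S_W = B @ B.T + np.eye(d)                  # stand-in for \tilde{S}_W (positive definite)

lam = np.trace(S_B) / np.trace(S_W)
_, V = np.linalg.eigh(S_B - lam * S_W)     # columns are the eigenvectors v_i

# h_i = v_i^T S_B v_i and g_i = v_i^T S_W v_i for every eigenvector.
h = np.einsum("ji,jk,ki->i", V, S_B, V)
g = np.einsum("ji,jk,ki->i", V, S_W, V)

def fitness(u, h, g):
    """Eq. (23): ratio of the selected h values to the selected g values."""
    u = np.asarray(u, dtype=float)
    return (u @ h) / (u @ g)

u = np.array([0, 1, 0, 0, 1, 0])           # binary selector with k ones
V_sel = V[:, u == 1]
ratio = np.trace(V_sel.T @ S_B @ V_sel) / np.trace(V_sel.T @ S_W @ V_sel)
print(fitness(u, h, g), ratio)
```

Because $h$ and $g$ are computed once per outer iteration, evaluating a chromosome costs only two dot products, which is what makes a population-based search over selectors affordable.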
The Procedures of the Proposed TR-KDA-BIGA.
The procedures are as follows:
(1) Construct the kernel matrix $K = \phi(X)^T \phi(X)$.
(2) Perform Cholesky decomposition of the kernel matrix, $K = R^T R$.
(3) Form the kernel scatter matrices as $\tilde{S}_B = R L_B R^T$ and $\tilde{S}_W = R L_W R^T$.
(4) Set the iteration number $t$ to 1.
(5) Set the initial trace ratio value $\lambda_t$ to $\operatorname{Tr}(\tilde{S}_B) / \operatorname{Tr}(\tilde{S}_W)$.
(6) Compute the eigendecomposition of $\tilde{S}_B - \lambda_t \tilde{S}_W$ as $(\tilde{S}_B - \lambda_t \tilde{S}_W) v_i^\phi = \tau_i v_i^\phi$, where $v_i^\phi$ $(i = 1, 2, \ldots, d)$ is an eigenvector of $\tilde{S}_B - \lambda_t \tilde{S}_W$.
(7) Calculate $h_i = (v_i^\phi)^T \tilde{S}_B v_i^\phi$ and $g_i = (v_i^\phi)^T \tilde{S}_W v_i^\phi$ for each eigenvector $v_i^\phi$ $(i = 1, 2, \ldots, d)$.
(8) Generate a population of BIGA selectors.
(9) Evolve the population, where the fitness of a BIGA selector $u$ is measured as $f(u) = u h^T / u g^T$.
(10) Take the best evolved BIGA selector as $u^*$.
(11) Form the projection matrix $V_t^\phi$ by choosing the $k$ eigenvectors $v_i^\phi$ with $u_i^* = 1$ $(i = 1, 2, \ldots, d)$.
(12) Update the trace ratio value $\lambda_{t+1} = \operatorname{Tr}[(V_t^\phi)^T \tilde{S}_B V_t^\phi] / \operatorname{Tr}[(V_t^\phi)^T \tilde{S}_W V_t^\phi]$, set $t = t + 1$, and go to step (6). Repeat this procedure until convergence, that is, until the trace ratio value has not increased for 5 consecutive iterations.
(13) Output $Y^{\phi *} = (V^{\phi *})^T R$.
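Steps (5) to (12) of the procedure can be sketched as the outer loop below. To keep the example short and deterministic, the BIGA in steps (8) to (10) is replaced by an exhaustive search over all $k$-subsets, which is only feasible for very small $d$ and is explicitly not the method proposed here; the random matrices stand in for $\tilde{S}_B$ and $\tilde{S}_W$, and all function names are hypothetical.

```python
import itertools
import numpy as np

def best_selector_exhaustive(h, g, k):
    """Stand-in for the BIGA: try every k-subset and keep the best fitness (eq. 23)."""
    best_idx, best_f = None, -np.inf
    for idx in itertools.combinations(range(len(h)), k):
        idx = list(idx)
        f = h[idx].sum() / g[idx].sum()
        if f > best_f:
            best_idx, best_f = idx, f
    return best_idx, best_f

def trace_ratio_outer_loop(S_B, S_W, k, patience=5, max_iter=100):
    lam = np.trace(S_B) / np.trace(S_W)                  # step (5)
    history, stall, V_t = [lam], 0, None
    for _ in range(max_iter):
        _, V = np.linalg.eigh(S_B - lam * S_W)           # step (6)
        h = np.einsum("ji,jk,ki->i", V, S_B, V)          # step (7)
        g = np.einsum("ji,jk,ki->i", V, S_W, V)
        idx, new_lam = best_selector_exhaustive(h, g, k) # steps (8)-(10), stand-in
        V_t = V[:, idx]                                  # step (11)
        stall = 0 if new_lam > lam + 1e-12 else stall + 1
        lam = new_lam                                    # step (12)
        history.append(lam)
        if stall >= patience:                            # no increase for 5 iterations
            break
    return V_t, history

rng = np.random.default_rng(0)
d, k = 6, 2
A, B = rng.normal(size=(d, d)), rng.normal(size=(d, d))
S_B, S_W = A @ A.T, B @ B.T + np.eye(d)
V_star, history = trace_ratio_outer_loop(S_B, S_W, k)
print(history)
```

With the kernel scatter matrices in place of the random ones, the reduced representation in step (13) would be obtained as `V_star.T @ R`.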
The Computational Flow of the BIGA.
The computational flow is as follows:
(1) Set the generation counter $l$ to 1.
(2) Randomly initialize the original population $A_l$.
(3) Evaluate each chromosome in the population $A_l$.
(4) Abstract vaccines according to the prior knowledge.
(5) Check the termination criteria. If the fixed number of generations has not been reached or the best chromosome found so far is not satisfactory, go to the next step. Otherwise, output the best chromosome as the final solution for further decision-making.
(6) Perform the crossover operation on $A_l$ to generate the population $B_l$.
(7) Perform the mutation operation on $B_l$ to generate the population $C_l$.
(8) Perform the vaccination operation on $C_l$ to generate the population $D_l$.
(9) Perform the immune selection operation on $D_l$ to generate the next-generation population $A_{l+1}$. Go to step (3).
4. The Convergence of the Proposed TR-KDA-BIGA
In this section, we analyze the convergence of the proposed TR-KDA-BIGA. Before doing so, it is worth noting that the BIGA itself is convergent: Jiao and Wang [33] demonstrated that, given enough iterations, the immune genetic population converges towards the true optimum with probability one.
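Recall the trace difference function

$$z(\lambda) = \max_{(V^\phi)^T V^\phi = I} \operatorname{Tr}\left[(V^\phi)^T \left(\tilde{S}_B - \lambda \tilde{S}_W\right) V^\phi\right]; \tag{24}$$

it follows that

$$z(\lambda_{t+1}) = \max_{(V^\phi)^T V^\phi = I} \operatorname{Tr}\left[(V^\phi)^T \left(\tilde{S}_B - \lambda_{t+1} \tilde{S}_W\right) V^\phi\right] \implies z(\lambda_{t+1}) = \operatorname{Tr}\left[(V_{t+1}^\phi)^T \left(\tilde{S}_B - \lambda_{t+1} \tilde{S}_W\right) V_{t+1}^\phi\right]. \tag{25}$$

Since $\lambda_{t+1} = \operatorname{Tr}[(V_t^\phi)^T \tilde{S}_B V_t^\phi] / \operatorname{Tr}[(V_t^\phi)^T \tilde{S}_W V_t^\phi]$ as previously mentioned, we get

$$\operatorname{Tr}\left[(V_t^\phi)^T \left(\tilde{S}_B - \lambda_{t+1} \tilde{S}_W\right) V_t^\phi\right] = 0. \tag{26}$$

Consider the inequality

$$\operatorname{Tr}\left[(V_{t+1}^\phi)^T \left(\tilde{S}_B - \lambda_{t+1} \tilde{S}_W\right) V_{t+1}^\phi\right] \ge \operatorname{Tr}\left[(V_t^\phi)^T \left(\tilde{S}_B - \lambda_{t+1} \tilde{S}_W\right) V_t^\phi\right], \tag{27}$$

and the equation

$$\operatorname{Tr}\left[(V_t^\phi)^T \left(\tilde{S}_B - \lambda_{t+1} \tilde{S}_W\right) V_t^\phi\right] = 0, \tag{28}$$

and we have

$$z(\lambda_{t+1}) \ge 0, \qquad \operatorname{Tr}\left[(V_{t+1}^\phi)^T \tilde{S}_B V_{t+1}^\phi\right] - \lambda_{t+1} \operatorname{Tr}\left[(V_{t+1}^\phi)^T \tilde{S}_W V_{t+1}^\phi\right] \ge 0. \tag{29}$$

Consequently,

$$\frac{\operatorname{Tr}\left[(V_{t+1}^\phi)^T \tilde{S}_B V_{t+1}^\phi\right]}{\operatorname{Tr}\left[(V_{t+1}^\phi)^T \tilde{S}_W V_{t+1}^\phi\right]} \ge \lambda_{t+1} \implies \lambda_{t+2} \ge \lambda_{t+1}. \tag{30}$$

Replacing the subscript $t+1$ by $t$ yields

$$\lambda_{t+1} \ge \lambda_t. \tag{31}$$

Inequality (31) gives the first expression of convergence of the proposed TR-KDA-BIGA.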
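Further, suppose that $\lambda^*$ is the optimal trace ratio value; it follows that

$$z(\lambda^*) = \operatorname{Tr}\left[(V^{\phi *})^T \left(\tilde{S}_B - \lambda^* \tilde{S}_W\right) V^{\phi *}\right] = 0, \tag{32}$$

where $V^{\phi *}$ is the optimal projection matrix. We therefore have

$$\begin{aligned}
z(\lambda^*) &= \operatorname{Tr}\left[(V^{\phi *})^T \left(\tilde{S}_B - \lambda^* \tilde{S}_W\right) V^{\phi *}\right] \ge \operatorname{Tr}\left[(V_t^\phi)^T \left(\tilde{S}_B - \lambda^* \tilde{S}_W\right) V_t^\phi\right] \\
&= \operatorname{Tr}\left[(V_t^\phi)^T \left(\tilde{S}_B - \lambda_{t+1} \tilde{S}_W\right) V_t^\phi\right] + \left(\lambda_{t+1} - \lambda^*\right) \operatorname{Tr}\left[(V_t^\phi)^T \tilde{S}_W V_t^\phi\right] \\
&= \left(\lambda_{t+1} - \lambda^*\right) \operatorname{Tr}\left[(V_t^\phi)^T \tilde{S}_W V_t^\phi\right],
\end{aligned} \tag{33}$$

where the last step uses (26). Since $z(\lambda^*) = 0$ and $\tilde{S}_W$ is positive semidefinite, so that $\operatorname{Tr}[(V_t^\phi)^T \tilde{S}_W V_t^\phi] \ge 0$, we have

$$\lambda_{t+1} - \lambda^* \le 0, \tag{34}$$

which gives the second expression of convergence of the proposed TR-KDA-BIGA:

$$\lambda_{t+1} \le \lambda^*. \tag{35}$$

We conclude therefore that, for a particular initial trace ratio value $\lambda_t$, the updated value $\lambda_{t+1}$ always satisfies (1) $\lambda_{t+1} \ge \lambda_t$ and (2) $\lambda_{t+1} \le \lambda^*$.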
5. Performance Evaluation on Benchmark Problems
In order to verify the performance of the proposed TR-KDA-BIGA, it is first tested on a variety of commonly used benchmark problems taken from the UCI machine learning repository and evaluated with the classification rate (i.e., the number of correctly classified test examples divided by the total number of test examples) in comparison with existing methods, namely PCA, LDA, KPCA [30, 31], KDA [32], and TR-LDA. The data sets are Heart-statlog, Ionosphere, Iris, Wine, Waveform, Balance, and Synthetic Control Chart Time Series (SCCTS) (Table 1), which cover small and large sample sizes as well as low and high dimensions. For the comparative study, we randomly select 50% of the data points from each data set as the training set and use the remaining data points as the test set. All methods use the training set in the reduced output space to train a one-nearest-neighbor (1NN) classifier, which is then used to evaluate the classification rate on the test set. To limit the influence of random effects, the experiments with PCA, LDA, KPCA, KDA, TR-LDA, and TR-KDA-BIGA on each benchmark problem are independently repeated for 20 runs. Table 2 compares the classification rate of the proposed TR-KDA-BIGA on the benchmark problems with that of PCA, LDA, KPCA, KDA, and TR-LDA. As seen in Table 2, the proposed TR-KDA-BIGA achieves the highest mean classification rate on every data set except Heart-statlog, where TR-LDA is marginally higher (76.1 versus 75.8).
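Table 1: Specification of benchmark problems.

| Data set | #samples | #dim. | #class |
|---|---|---|---|
| Heart-statlog | 270 | 13 | 2 |
| Ionosphere | 351 | 34 | 2 |
| Iris | 150 | 4 | 3 |
| Wine | 178 | 13 | 3 |
| Waveform | 5000 | 40 | 3 |
| Balance | 625 | 4 | 3 |
| SCCTS | 600 | 60 | 6 |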
Table 2: Results of the classification rate (%) for the benchmark problems (mean ± deviation).

| Data set | PCA | KPCA | LDA | KDA | TR-LDA | TR-KDA-BIGA |
|---|---|---|---|---|---|---|
| Heart-statlog | 59.7 ± 4.6 | 60.0 ± 4.2 | 75.5 ± 3.3 | 74.2 ± 2.7 | 76.1 ± 3.1 | 75.8 ± 2.3 |
| Ionosphere | 74.4 ± 4.0 | 74.9 ± 3.8 | 74.7 ± 5.7 | 75.1 ± 4.7 | 75.2 ± 5.9 | 76.7 ± 4.5 |
| Iris | 91.0 ± 4.0 | 91.0 ± 4.2 | 91.0 ± 3.5 | 91.8 ± 3.0 | 92.3 ± 3.2 | 92.9 ± 3.6 |
| Wine | 68.3 ± 5.2 | 69.0 ± 4.9 | 77.7 ± 8.4 | 77.0 ± 6.9 | 78.4 ± 8.5 | 78.6 ± 6.3 |
| Waveform | 72.9 ± 1.7 | 73.1 ± 1.5 | 73.3 ± 1.7 | 73.0 ± 2.3 | 74.5 ± 1.3 | 74.9 ± 2.8 |
| Balance | 63.4 ± 4.2 | 66.4 ± 4.7 | 77.8 ± 4.9 | 78.4 ± 3.7 | 78.3 ± 4.2 | 79.2 ± 3.2 |
| SCCTS | 80.0 ± 4.8 | 80.1 ± 5.0 | 80.5 ± 5.4 | 81.1 ± 6.2 | 81.2 ± 5.6 | 82.9 ± 6.7 |
The results obtained demonstrate the ability of the proposed TR-KDA-BIGA in classifying different classes well. Thus, the proposed TR-KDA-BIGA may be effectively employed for fault diagnosis of rolling element bearings.
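The evaluation protocol used in this section (a random 50/50 split, a 1NN classifier in the reduced space, and 20 independent runs) can be sketched as follows. The two-class Gaussian data are a synthetic stand-in, not one of the UCI data sets, no dimensionality reduction step is included, and the function name is hypothetical.

```python
import numpy as np

def one_nn_accuracy(train_X, train_y, test_X, test_y):
    """Classification rate of a 1NN classifier (rows are samples)."""
    d2 = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    predicted = train_y[d2.argmin(axis=1)]     # label of the nearest training sample
    return float((predicted == test_y).mean())

rng = np.random.default_rng(0)
# Synthetic stand-in for data already projected to two dimensions.
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(10.0, 1.0, size=(100, 2))])
y = np.repeat([0, 1], 100)

rates = []
for _ in range(20):                            # 20 independent runs
    perm = rng.permutation(len(y))
    train, test = perm[: len(y) // 2], perm[len(y) // 2:]
    rates.append(one_nn_accuracy(X[train], y[train], X[test], y[test]))

mean_rate, std_rate = np.mean(rates), np.std(rates)
print(f"{100 * mean_rate:.1f} ± {100 * std_rate:.1f}")
```

In a real comparison, the projection would be learned on the training half only and then applied to both halves before the 1NN step.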
6. The Proposed TR-KDA Using BIGA for Fault Diagnosis of Rolling Element Bearings
In this section, the proposed TR-KDA-BIGA is applied to fault diagnosis of rolling element bearings. Vibration signals from rolling element bearings are first filtered with a low-pass filter. The filtered vibration signals are then divided into sections of equal window length. One set of relevant features is computed from each window to characterize the health status of the rolling element bearings. Most faults occurring in rolling element bearings introduce increased friction and impulsive forces when the bearings are rotating, which generally cause the vibration signals in the time domain, frequency domain, and/or time-frequency domain to deviate from the normal ones. In this study, 9 time-domain statistical features (Table 3) are extracted from the vibration signal; all of them reflect the characteristics of the time series data in the time domain. Moreover, 6 time-frequency-domain wavelet features, namely the percentages of energy corresponding to the wavelet coefficients, are extracted by using the Daubechies-4 (db4) wavelet to decompose the vibration signal into five levels [32]; a five-level decomposition produces one set of approximation coefficients and five sets of detail coefficients, which is consistent with six energy features. Wavelet features extracted in this way reflect the distribution of vibration energy in the time-frequency domain. Thus, 9 time-domain statistical features together with 6 time-frequency-domain wavelet features are used to represent the vibration signal of each window.
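Table 3: Time-domain statistical features.

| Feature | Eq. | Feature | Eq. |
| --- | --- | --- | --- |
| Standard deviation | $x_{\mathrm{std}} = \sqrt{\frac{\sum_{i=1}^{N} (x(i) - \bar{x})^{2}}{N}}$ | Clearance factor | $\mathrm{CLF} = \frac{x_{p}}{x_{\mathrm{smr}}}$ |
| Peak | $x_{p} = \max \lvert x(i) \rvert$ | Shape factor | $\mathrm{SF} = \frac{x_{\mathrm{rms}}}{(1/N) \sum_{i=1}^{N} \lvert x(i) \rvert}$ |
| Skewness | $x_{\mathrm{ske}} = \frac{(1/N) \sum_{i=1}^{N} (x(i) - \bar{x})^{3}}{x_{\mathrm{std}}^{3}}$ | Impact factor | $\mathrm{IF} = \frac{x_{p}}{(1/N) \sum_{i=1}^{N} \lvert x(i) \rvert}$ |
| Kurtosis | $x_{\mathrm{kur}} = \frac{(1/N) \sum_{i=1}^{N} (x(i) - \bar{x})^{4}}{x_{\mathrm{std}}^{4}}$ | Square mean root | $x_{\mathrm{smr}} = \left( \frac{1}{N} \sum_{i=1}^{N} \sqrt{\lvert x(i) \rvert} \right)^{2}$ |
| Crest factor | $\mathrm{CF} = \frac{x_{p}}{x_{\mathrm{rms}}}$ | | |

where $x(i)$, $i = 1, 2, \ldots, N$, is a digital signal series, $N$ is the number of elements of the digital signal, and $\bar{x} = \sum_{i=1}^{N} x(i)/N$ and $x_{\mathrm{rms}} = \sqrt{\sum_{i=1}^{N} x(i)^{2}/N}$ are the mean value and root-mean-square value of the digital signal series, respectively.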
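As an illustration of how the nine time-domain features of Table 3 might be computed for each window, the sketch below splits a signal into equal-length windows and evaluates the features with the Python standard library only. It assumes the usual absolute-value forms for the peak, the mean amplitude, and the square mean root, and non-overlapping windows. The names `split_windows` and `time_domain_features` are hypothetical.

```python
import math


def split_windows(signal, window_len):
    """Cut a signal into consecutive, non-overlapping windows of equal length.

    Trailing samples that do not fill a whole window are dropped.
    """
    usable = len(signal) - len(signal) % window_len
    return [signal[i:i + window_len] for i in range(0, usable, window_len)]


def time_domain_features(x):
    """Nine time-domain statistical features of one window.

    A constant window has zero standard deviation, so skewness and
    kurtosis are undefined for it (division by zero).
    """
    n = len(x)
    mean = sum(x) / n
    abs_mean = sum(abs(v) for v in x) / n                   # mean amplitude
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)    # divides by N
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    smr = (sum(math.sqrt(abs(v)) for v in x) / n) ** 2      # square mean root
    return {
        "std": std,
        "peak": peak,
        "skewness": sum((v - mean) ** 3 for v in x) / n / std ** 3,
        "kurtosis": sum((v - mean) ** 4 for v in x) / n / std ** 4,
        "crest_factor": peak / rms,
        "clearance_factor": peak / smr,
        "shape_factor": rms / abs_mean,
        "impact_factor": peak / abs_mean,
        "square_mean_root": smr,
    }
```

Calling `time_domain_features` on every window returned by `split_windows` gives one 9-element feature set per window, to which the 6 wavelet features would be appended.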
6.1. Experimental Setup
To demonstrate the performance of the proposed TR-KDA-BIGA, rolling element bearing data from the Bearing Data Centre of Case Western Reserve University [36] are used. The data were collected from the drive end ball bearing of a test rig driven by an induction motor (Reliance Electric 2 HP IQPreAlert); Figure 1 illustrates the experimental setup. The test bearings were SKF 6205 JEM deep groove ball bearings. Single-point faults were seeded into the drive end ball bearing by electrodischarge machining. As the bearing rotated, these faults produced impact-like vibration signals that behaved like damped oscillations, and an accelerometer mounted on the drive end of the motor housing recorded them. Vibration signals were captured for four health statuses: normal bearing (Normal), inner race fault (IR), ball fault (BA), and outer race fault (OR). Each of the three fault statuses (IR, BA, and OR) was seeded at three levels of severity, with fault diameters of 0.007, 0.014, and 0.021 inches. Together with the normal status, the nine fault conditions account for the 10 classes listed in Table 4. All experiments were carried out under three load conditions (1 HP, 2 HP, and 3 HP). Table 4 gives a short description of the rolling element bearing data.
Table 4: Description of rolling element bearing data.

| #samples | #dimension | #class | #samples per class |
| --- | --- | --- | --- |
| 800 | 15 | 10 | 80 |
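The class count in Table 4 can be checked with a short sketch. It assumes that the 10 classes are the normal status plus every combination of fault type and fault diameter, which is one reading consistent with the setup described above. The label strings are hypothetical.

```python
from itertools import product

FAULT_TYPES = ["IR", "BA", "OR"]            # inner race, ball, outer race
FAULT_DIAMETERS_IN = [0.007, 0.014, 0.021]  # fault diameters in inches
SAMPLES_PER_CLASS = 80

# One normal class plus 3 fault types x 3 severities (assumed labelling).
classes = ["Normal"] + [
    f"{fault}-{diameter:.3f}in"
    for fault, diameter in product(FAULT_TYPES, FAULT_DIAMETERS_IN)
]
# One label per sample, 80 samples for each class.
labels = [name for name in classes for _ in range(SAMPLES_PER_CLASS)]
```

Under this reading, `classes` has 10 entries and `labels` has 800, matching the #class and #samples columns of Table 4.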
Figure 1: Experimental setup: (a) test rig; (b) schematic description of the test rig.
6.2. Experimental Results
6.2.1. Visualization of Bearing Data
The visualization performance of the proposed TR-KDA-BIGA is compared with that of PCA, LDA, KPCA, KDA, and TR-LDA, where KPCA and KDA are the kernel extensions of PCA and LDA, respectively. The two-dimensional visualization results obtained by the six methods under the three load conditions (1, 2, and 3 HP) are shown in Figures 2, 3, and 4, respectively. Under all three load conditions, the proposed TR-KDA-BIGA outperforms the compared methods both in clustering bearing data of the same class tightly and in clearly separating bearing data of different classes. Compared with the unsupervised methods (PCA and KPCA), the supervised methods (LDA, KDA, TR-LDA, and TR-KDA-BIGA) preserve more of the discriminative information embedded in the bearing data and yield clearer, less overlapped class boundaries. Figures 2, 3, and 4 also show that the methods using the kernel trick (KPCA, KDA, and TR-KDA-BIGA) separate samples from different classes in the learned subspace better than their counterparts without it (PCA, LDA, and TR-LDA).
Figure 2: Two-dimensional representation of bearing data under 1 HP motor load by each method: (a) PCA; (b) KPCA; (c) LDA; (d) KDA; (e) TR-LDA; (f) TR-KDA-BIGA.
Figure 3: Two-dimensional representation of bearing data under 2 HP motor load by each method: (a) PCA; (b) KPCA; (c) LDA; (d) KDA; (e) TR-LDA; (f) TR-KDA-BIGA.
Figure 4: Two-dimensional representation of bearing data under 3 HP motor load by each method: (a) PCA; (b) KPCA; (c) LDA; (d) KDA; (e) TR-LDA; (f) TR-KDA-BIGA.
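The visual comparison of the two-dimensional representations could be complemented by a numerical score. The sketch below is not an evaluation used in this study. It assumes the standard scatter definitions and computes the ratio of the between-class scatter trace to the within-class scatter trace for a set of embedded points, so tighter and better separated clusters give a larger value. The function names are hypothetical.

```python
def scatter_traces(points, labels):
    """Return (tr(S_w), tr(S_b)) for embedded points grouped by class label."""
    dim = len(points[0])
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    overall = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    within = between = 0.0
    for members in groups.values():
        center = [sum(p[j] for p in members) / len(members) for j in range(dim)]
        # Squared distances of the class members to their own class mean.
        within += sum(
            sum((p[j] - center[j]) ** 2 for j in range(dim)) for p in members
        )
        # Squared distance of the class mean to the overall mean, weighted
        # by the class size.
        between += len(members) * sum(
            (center[j] - overall[j]) ** 2 for j in range(dim)
        )
    return within, between


def trace_ratio(points, labels):
    """Between-class over within-class scatter trace (larger is better)."""
    within, between = scatter_traces(points, labels)
    return between / within
```

For example, two classes at `[(0, 0), (0, 2)]` and `[(4, 0), (4, 2)]` give a within-class trace of 4 and a between-class trace of 16, so the ratio is 4.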
6.2.2. Classification of Bearing Data
The classification performance of the proposed TR-KDA-BIGA is compared with that of PCA, LDA, KPCA, KDA, and TR-LDA. To show the robustness of the proposed TR-KDA-BIGA, 4 independent experiments with different data partitions are performed for each load condition: 10, 20, 30, or 40 samples are randomly selected from each class as the training set, and the remaining samples form the test set. For each method, a nearest-neighbor (1NN) classifier is trained on the training set and used to classify the health status of the samples in the test set. Tables 5, 6, and 7 summarize the average classification results of PCA, LDA, KPCA, KDA, TR-LDA, and the proposed TR-KDA-BIGA with various numbers of training samples for the 1 HP, 2 HP, and 3 HP load conditions, respectively. Overall, the health status is classified well: every method reaches an accuracy above 94% in every setting. The proposed TR-KDA-BIGA attains the highest average accuracy under all three load conditions (98.96%, 99.31%, and 99.76%), ahead of the next-best method, TR-LDA (98.48%, 98.47%, and 99.44%). The proposed model also provides real-time visualization, which helps practitioners monitor the health status of rolling element bearings. Tables 5, 6, and 7 further show that the number of training samples affects the classification accuracy: for every method, accuracy is higher with 40 training samples per class than with 10.
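The evaluation protocol described above can be sketched as follows. This is a simplified illustration under stated assumptions: the dimensionality reduction step of each method is omitted, so the 1NN classifier works directly on whatever feature vectors are passed in, and Euclidean distance is assumed. The function names and the `seed` parameter are hypothetical.

```python
import math
import random


def split_per_class(labels, n_train, seed=0):
    """Randomly pick n_train sample indices per class for training."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        train_idx += indices[:n_train]
        test_idx += indices[n_train:]   # remaining samples form the test set
    return train_idx, test_idx


def predict_1nn(train_x, train_y, query):
    """Label of the training sample closest to the query (Euclidean)."""
    nearest = min(range(len(train_x)),
                  key=lambda i: math.dist(train_x[i], query))
    return train_y[nearest]


def accuracy_1nn(features, labels, n_train, seed=0):
    """Test-set accuracy in percent for one random partition."""
    train_idx, test_idx = split_per_class(labels, n_train, seed)
    train_x = [features[i] for i in train_idx]
    train_y = [labels[i] for i in train_idx]
    correct = sum(
        predict_1nn(train_x, train_y, features[i]) == labels[i]
        for i in test_idx
    )
    return 100.0 * correct / len(test_idx)
```

Calling `accuracy_1nn` with `n_train` set to 10, 20, 30, and 40 reproduces the four partition sizes. In the study the classifier would operate on the samples after projection by each method rather than on the raw features.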
Table 5: Classification accuracy (%) of bearing data under 1 HP motor load, by number of training samples per class.

| Method | 10 | 20 | 30 | 40 | Average |
| --- | --- | --- | --- | --- | --- |
| PCA | 94.7793 | 95.9016 | 96.7564 | 97.7785 | 96.30395 |
| KPCA | 94.9203 | 95.9007 | 96.5873 | 98.4867 | 96.47375 |
| LDA | 96.2741 | 97.1929 | 98.0017 | 98.8438 | 97.57813 |
| KDA | 96.7026 | 97.1957 | 98.1504 | 99.1511 | 97.79995 |
| TR-LDA | 97.1914 | 97.9510 | 99.0587 | 99.7006 | 98.47543 |
| TR-KDA-BIGA | 97.5793 | 98.8971 | 99.5416 | 99.8068 | 98.95620 |
Table 6: Classification accuracy (%) of bearing data under 2 HP motor load, by number of training samples per class.

| Method | 10 | 20 | 30 | 40 | Average |
| --- | --- | --- | --- | --- | --- |
| PCA | 94.4654 | 95.4452 | 96.3670 | 97.6571 | 95.98368 |
| KPCA | 94.2155 | 96.5207 | 97.3084 | 98.7273 | 96.69298 |
| LDA | 96.0477 | 96.9640 | 97.6314 | 99.0035 | 97.41165 |
| KDA | 96.3333 | 97.5962 | 97.2495 | 99.4105 | 97.64738 |
| TR-LDA | 97.3188 | 98.0867 | 98.7036 | 99.7777 | 98.47170 |
| TR-KDA-BIGA | 98.9123 | 98.6209 | 99.7737 | 99.9311 | 99.30950 |
Table 7: Classification accuracy (%) of bearing data under 3 HP motor load, by number of training samples per class.

| Method | 10 | 20 | 30 | 40 | Average |
| --- | --- | --- | --- | --- | --- |
| PCA | 96.1337 | 97.3051 | 97.6719 | 97.9738 | 97.27113 |
| KPCA | 96.7442 | 97.0369 | 97.6968 | 98.2854 | 97.44083 |
| LDA | 97.0421 | 98.5900 | 99.2875 | 99.4996 | 98.60480 |
| KDA | 97.4820 | 98.8408 | 99.1516 | 99.6082 | 98.77065 |
| TR-LDA | 98.8961 | 99.3457 | 99.9190 | 99.6082 | 99.44225 |
| TR-KDA-BIGA | 99.2918 | 99.9153 | 99.9103 | 99.9291 | 99.76163 |
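The Average column in Tables 5, 6, and 7 is the mean of the four per-partition accuracies. As a small worked check, the sketch below recomputes it for the TR-KDA-BIGA rows, with the values copied from the tables. The variable names are hypothetical.

```python
# TR-KDA-BIGA accuracies (%) for 10, 20, 30, and 40 training samples per class.
accuracies = {
    "1 HP": [97.5793, 98.8971, 99.5416, 99.8068],
    "2 HP": [98.9123, 98.6209, 99.7737, 99.9311],
    "3 HP": [99.2918, 99.9153, 99.9103, 99.9291],
}

# Mean over the four data partitions for each load condition.
averages = {load: sum(values) / len(values)
            for load, values in accuracies.items()}
```

The resulting means agree with the tabulated averages of 98.95620, 99.30950, and 99.76163 to the reported precision.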
7. Conclusions
The rolling element bearing is a core component of many systems, and its failure can lead to reduced capability, downtime, and even catastrophic breakdowns. Effective and efficient fault diagnosis of rolling element bearings plays an extremely important role in the safe and reliable operation of their host systems. In the current study, fault diagnosis of rolling element bearings is treated as a pattern recognition problem: a high-dimensional feature data set that represents the different health statuses of the bearings is calculated from vibration signals. Specifically, the TR-KDA is presented for fault diagnosis of rolling element bearings, and the BIGA is employed to solve the trace ratio problem in TR-KDA. The numerical results obtained using extensive simulation indicate that the proposed TR-KDA-BIGA can effectively classify different classes of rolling element bearing data, while also providing real-time visualization that helps practitioners monitor the health status of rolling element bearings. Empirical comparisons show that the proposed TR-KDA-BIGA performs better than existing methods in classifying different rolling element bearing data. The proposed TR-KDA-BIGA may be a promising tool for fault diagnosis of rolling element bearings.
Three research directions are worth pursuing. First, although this study considers the specific fault diagnosis of rolling element bearings, the proposed method can be modified and extended to address the fault diagnosis of gearboxes [37, 38] and cutting tools [39, 40]. Second, frequency-domain information can be utilized for fault diagnosis of rolling element bearings [41, 42]; it would thus be interesting to integrate frequency-domain features with the time-domain and time-frequency-domain features. Third, empirical mode decomposition is a very powerful tool for nonlinear and nonstationary signal processing [43–45]; it would also be interesting to employ empirical mode decomposition to extract periodic components and random transient components from the bearing vibration signal mixture, which may help in extracting fault signatures from a collected bearing vibration signal.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The research is funded partially by the National Science Foundation of China (51405239), National Defense Basic Scientific Research Program of China (A2620132010, A2520110003), Jiangsu Provincial Natural Science Foundation of China (BK20150745, BK20140727), Jiangsu Province Science and Technology Support Program (BE2014134), Fundamental Research Funds for the Central Universities (1005-YAH15055), and Jiangsu Postdoctoral Science Foundation of China (1501024C). The authors would like to express sincere appreciation to Professor K. A. Loparo and Case Western Reserve University for making the bearing data set available and for permission to use it.
References
[1] Jin X., Ma E. W. M., Cheng L. L., Pecht M. Health monitoring of cooling fans based on Mahalanobis distance with mRMR feature selection.
[2] Jin X., Chow T. W. S. Anomaly detection of cooling fan and fault classification of induction motor using Mahalanobis–Taguchi system.
[3] Zarei J. Induction motors bearing fault detection using pattern recognition techniques.
[4] Yu J.-B. Bearing performance degradation assessment using locality preserving projections.
[5] Wang D., Tse P. W., Tse Y. L. A morphogram with the optimal selection of parameters used in morphological analysis for enhancing the ability in bearing fault diagnosis.
[6] Wang W., Pecht M. Economic analysis of canary-based prognostics and health management.
[7] Randall R. B., Antoni J. Rolling element bearing diagnostics—a tutorial.
[8] Yang Y., Liao Y., Meng G., Lee J. A hybrid feature selection scheme for unsupervised learning and its application in bearing fault diagnosis.
[9] Rafiee J., Rafiee M. A., Tse P. W. Application of mother wavelet functions for automatic gear and bearing fault diagnosis.
[10] He W., Jiang Z.-N., Feng K. Bearing fault detection based on optimal wavelet filter and sparse code shrinkage.
[11] Tenenbaum J. B., de Silva V., Langford J. C. A global geometric framework for nonlinear dimensionality reduction.
[12] Roweis S. T., Saul L. K. Nonlinear dimensionality reduction by locally linear embedding.
[13] Prieto M. D., Cirrincione G., Espinosa A. G., Ortega J. A., Henao H. Bearing fault detection by a novel condition-monitoring scheme based on statistical-time features and neural networks.
[14] Zhao M. B., Zhang Z., Chow T. W. S. Trace ratio criterion based generalized discriminative learning for semi-supervised dimensionality reduction.
[15] He X., Yan S., Hu Y., Niyogi P., Zhang H.-J. Face recognition using Laplacianfaces.
[16] Feng K., Jiang Z., He W., Ma B. A recognition and novelty detection approach based on Curvelet transform, nonlinear PCA and SVM with application to indicator diagram diagnosis.
[17] Jiang Q., Jia M., Hu J., Xu F. Machinery fault diagnosis using supervised manifold learning.
[18] Strangas E. G., Aviyente S., Zaidi S. S. H. Time-frequency analysis for efficient fault diagnosis and failure prognosis for interior permanent-magnet AC motors.
[19] Wang Y., Ma E. W. M., Chow T. W. S., Tsui K.-L. A two-step parametric method for failure prediction in hard disk drives.
[20] Yu J. Local and nonlocal preserving projection for bearing defect classification and performance assessment.
[21] Fukunaga K.
[22] Wang H., Yan S., Xu D., Tang X., Huang T. Trace ratio vs. ratio trace for dimensionality reduction. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), June 2007, Minneapolis, Minn, USA. doi:10.1109/cvpr.2007.382983.
[23] Zhao M. B., Chan R. H. M., Tang P., Chow T. W. S., Wong S. W. H. Trace ratio linear discriminant analysis for medical diagnosis: a case study of dementia.
[24] Zhou L., Wang L., Shen C. H. Feature selection with redundancy-constrained class separability.
[25] Sun T., Chen S. Class label versus sample label-based CCA.
[26] Guo Y.-F., Li S.-J., Yang J.-Y., Shu T.-T., Wu L.-D. A generalized Foley-Sammon transform based on generalized Fisher discriminant criterion and its application to face recognition.
[27] Zhao M. B., Jin X. H., Zhang Z., Li B. Fault diagnosis of rolling element bearings via discriminative subspace learning: visualization and classification.
[28] Jin X. H., Zhao M. B., Chow T. W. S., Pecht M. S. Motor bearing fault diagnosis using trace ratio linear discriminant analysis.
[29] Jia Y. Q., Nie F. P., Zhang C. S. Trace ratio problem revisited.
[30] Yang J., Frangi A. F., Yang J.-Y., Zhang D., Jin Z. KPCA plus LDA: a complete kernel Fisher discriminant framework for feature extraction and recognition.
[31] Zhang C., Nie F., Xiang S. A general kernelization framework for learning algorithms based on kernel PCA.
[32] Ji S., Ye J. Kernel uncorrelated and regularized discriminant analysis: a theoretical and computational study.
[33] Jiao L. C., Wang L. A novel genetic algorithm based on immunity.
[34] Chung F. R. K.
[35] Zhang J. S., Xu Z. B., Liang Y. The whole annealing genetic algorithms and their sufficient and necessary conditions of convergence.
[36] Loparo K. A. Bearings vibration data set. Case Western Reserve University, http://csegroups.case.edu/bearingdatacenter/pages/welcome-case-western-reserve-university-bearing-data-center-website
[37] Wang D., Miao Q., Kang R. Robust health evaluation of gearbox subject to tooth failure with wavelet decomposition.
[38] Wang D., Tse P. W., Guo W., Miao Q. Support vector data description for fusion of multiple health indicators for enhancing gearbox fault diagnosis and prognosis.
[39] Alonso F. J., Salgado D. R. Analysis of the structure of vibration signals for tool wear detection.
[40] Zhu K. P., Hong G. S., Wong Y. S. A comparative study of feature selection for hidden Markov model-based micro-milling tool wear monitoring.
[41] Wang D., Miao Q., Fan X. F., Huang H.-Z. Rolling element bearing fault detection using an improved combination of Hilbert and wavelet transforms.
[42] Lei Y. G., Zuo M. J., He Z. J., Zi Y. Y. A multidimensional hybrid intelligent method for gear fault diagnosis.
[43] Wang D., Guo W., Tse P. W. An enhanced empirical mode decomposition method for blind component separation of a single-channel vibration signal mixture.
[44] Lei Y. G., Zuo M. J., Hoseini M. R. The use of ensemble empirical mode decomposition to improve bispectral analysis for fault detection in rotating machinery.
[45] Lei Y. G., He Z. J., Zi Y. Y. Application of the EEMD method to rotor fault diagnosis of rotating machinery.