The emergence of complex machinery and equipment in several areas demands efficient fault diagnosis methods. Several fault diagnosis methods based on different theories and approaches have been proposed in the literature. According to the concept
of intelligent maintenance, the application of intelligent systems to accomplish
fault diagnosis from process historical data has been shown to be a promising approach.
In problems involving complex nonstationary dynamic systems, an adaptive fault diagnosis system is required to cope with changes in the monitored process. In order to address fault diagnosis in this scenario, use of the so-called “evolving intelligent systems” is suggested. This paper proposes the application of an evolving fuzzy classifier for fault diagnosis based on a new approach that combines a recursive clustering algorithm and a drift detection method. In this approach, the clustering update
depends not only on a similarity measure, but also on the monitoring of changes in the
input data flow. A merging cluster mechanism was incorporated into the algorithm
to enable the removal of redundant clusters. Multivariate Gaussian membership
functions are employed in the fuzzy rules to avoid information loss if there is interaction
between variables. The novel approach provides greater robustness to outliers
and noise present in data from process sensors. The classifier is evaluated in fault
diagnosis of a DC drive system. In the experiments, a DC drive system fault simulator
was used to simulate normal operation and several faulty conditions. Outliers
and noise were added to the simulated data to evaluate the robustness of the fault
diagnosis model.
1. Introduction
The advance of technology has resulted in the emergence of complex machinery and equipment, which imposes great challenges for their management and maintenance. In industry, for instance, fault diagnosis in major processes is vitally important to assure normal operation of a plant. In these cases, due to the complexity of the systems, it is infeasible for human operators to diagnose abnormal situations (faults) in a timely manner, leading them to make wrong decisions. Statistical studies indicate that approximately 70% of industrial accidents are caused by human error, which can result in economic losses, reduced safety, and environmental damage [1].
This scenario led to the emergence of new concepts on management and maintenance of machinery and equipment, such as condition-based maintenance (CBM) [2]. CBM refers to the use of machine or equipment data obtained in real time to infer its working condition (or faulty condition), allowing maintenance scheduling and preventing equipment crashes. Based on CBM, the concept of intelligent maintenance has emerged [3]. It employs advanced fault diagnosis systems to achieve the desired goals. Thus, intelligent maintenance becomes necessary for current complex machinery and equipment.
Over the past decades, several intelligent fault diagnosis methods based on different theories and approaches have been proposed in the literature. In general, these methods use mathematical/statistical models, accumulated experience, or process data to perform fault diagnosis [1]. Although methods based on models or experience have been shown to be effective, they have the disadvantage of requiring previous knowledge of the dynamic system in question. In contrast, methods based on process data do not require prior knowledge; they rely solely on data obtained directly from the system.
Recently, fault diagnosis methods based on process data have received great emphasis, since data acquisition through sensors is widely common in today’s automation systems [4, 5]. In this scenario, it is often easier to extract knowledge from data than to develop a model or accumulate experience. For this type of diagnosis, several works have proposed data-based diagnostic methods employing so-called “intelligent systems,” which are tools derived from computational intelligence, mainly artificial neural networks, fuzzy systems, and neurofuzzy networks, among others [2].
However, despite the good performance achieved by intelligent systems in fault diagnosis, they tend to face difficulties when the problem involves complex nonstationary dynamic systems, which represent the vast majority of current real cases. In such systems, physical parameters, operating characteristics, and fault behaviors change over time, requiring an adaptive fault diagnosis system able to self-adapt in order to cope with changes in the monitored system. To address fault diagnosis in this scenario, several works propose the use of so-called “evolving intelligent systems” [6–10].
Evolving intelligent systems are systems based on fuzzy inference systems, artificial neural networks, or a combination of both (neurofuzzy networks), whose main characteristic is the ability to gradually determine both their structure and parameters from input data acquired in online mode and often in real time [11, 12]. The application of evolving intelligent systems has been growing in recent years, and many works present successful applications in real-world complex problems involving modeling, control, classification, or prediction [13]. An important aspect of evolving intelligent systems is that different theoretical and practical approaches can be used for their implementation. Regardless of the approach used, the main features of evolving intelligent systems are as follows:
their structure is not fixed and is not defined a priori: it grows (expands or shrinks) naturally as the system evolves;
their parameters are adjusted (adapted) as the system evolves;
their operation is continuous; that is, they are based on online learning algorithms and, if necessary, operate in real time.
One of the most used approaches to define the structure of an evolving intelligent system is unsupervised recursive clustering. Generally, the algorithm performs data clustering in the input or input-output data space in an incremental manner, defining the center of each cluster, and in some cases, the radius of the cluster (or zone of influence). During the evolving process, the algorithm can create new clusters, update existing clusters, or eliminate redundant ones. The models proposed in [14–21] are examples of intelligent systems based on evolving clustering algorithms.
Most evolving intelligent systems based on recursive clustering adopt a mechanism to update the structure and parameters of the system (creation/modification/removal of clusters) using some measure of similarity between input data samples and existing clusters. Although this mechanism is functional, it may lead to an erroneous definition of the structure, since outliers or noisy samples (as data acquired by sensors in industrial environments usually are) that exceed the similarity measure may generate clusters that do not effectively represent the spatial structure of the data [21]. Some evolving intelligent systems adopt more elaborate mechanisms to update the model structure and system parameters, such as the models proposed in [20, 21], using methods to ignore/filter outliers and noise.
Considering the fault diagnosis problem, the use of evolving intelligent systems based on recursive clustering algorithms robust to outliers and data noise is mandatory. In this problem, each new cluster created is usually associated with a new faulty condition. Thus, if the clustering procedure is not robust, the fault diagnosis model tends to have a high false alarm rate; that is, new faulty conditions are erroneously detected. In this context, this paper proposes a fault diagnosis approach based on an evolving fuzzy classifier which uses a new robust unsupervised recursive clustering algorithm. The proposed classifier uses a modified version of the Gustafson-Kessel (GK) clustering algorithm [22] with the incorporation of the drift detection method (DDM) [23].
GK is a powerful clustering algorithm. Unlike many others, it allows the identification of clusters with different shapes and orientations in space. The algorithm employs a technique to adapt the distance metric to the shape of each cluster using an estimate of the cluster covariance matrix. Furthermore, the GK algorithm also has the advantage of being relatively insensitive to data scale and to the initialization of the partition matrix [24]. Several applications based on this clustering algorithm have been proposed in the literature, such as time series prediction, dynamic systems modeling, fault diagnosis, and prognosis.
According to the literature, drift detection is a method to detect gradual changes in the context of input data. A context is understood as the set of data generated while the process is stationary. Thus, a drift detection method is able to detect the time instants when changes occur in the context of the data. The detection of a new context suggests that the current model is outdated and needs to be updated using current relevant information. Drift detection methods are suitable for machine learning applications in which algorithms are applied to real-world problems in complex, nonstationary, and dynamic environments. In these applications, large amounts of information are provided in a continuous flow of high-speed data presenting variations over time, as in, for example, real-time monitoring of industrial plants [25]. The learning algorithms must be able to monitor the behavior of the dynamic system in question and adapt the model as changes occur. Among the several methods proposed for drift detection, the DDM algorithm employs a simple and computationally efficient method to detect the moments when changes occur. It is an independent drift detection method that can be embedded into any learning algorithm, increasing its efficiency in problems involving nonstationary dynamic models.
The new unsupervised recursive clustering algorithm proposed in this paper combines the advantages of the GK algorithm, especially the ability to identify clusters with different shapes and orientations in online mode, with the DDM algorithm. The DDM algorithm is used to detect changes in the input stream, triggering updates in the cluster structure. In the proposed algorithm, any clustering update depends not only on the similarity measure, but also on the monitoring of changes in the input data flow, which gives the algorithm greater robustness to the presence of outliers and noise. A cluster merging mechanism was also incorporated into the algorithm to enable the removal of redundant clusters. The fuzzy rule base of the proposed classifier is updated whenever the cluster structure is modified. The cluster centers and covariance matrices are used as parameters of the fuzzy rules. Multivariate Gaussian membership functions are employed in the rules, characterized by a central vector and a dispersion matrix, which represents the current dispersion of the input variables as well as the interactions between them [21].
In accordance with the characteristics of the proposed recursive clustering algorithm, the main benefits achieved by the classifier used in this work are
the ability to learn faults of the dynamic system in online mode and, if necessary, in real time, eliminating the need for prior knowledge of the system;
the ability to adapt whenever changes are detected in the monitored system, allowing the application to real problems;
low false alarm rate and high fault isolation rate due to the robustness to outliers and noise, increasing the reliability of diagnosis.
To evaluate the performance of the proposed approach in fault diagnosis, a DC drive system fault simulator was used to simulate normal operation and several faulty conditions. Outliers and noise were added to the simulated data to evaluate the robustness of the fault diagnosis model.
This paper is organized as follows. Section 2 presents the theoretical concepts regarding recursive clustering algorithms and drift detection methods, and introduces the proposed recursive clustering algorithm. Next, Section 3 presents the proposed classifier and its application in fault diagnosis. Section 4 presents the experiments and results. Finally, Section 5 presents the conclusion and suggestions for future works.
2. Recursive Clustering Algorithm and Drift Detection
2.1. Recursive Gustafson-Kessel Algorithm
In pattern recognition, clustering algorithms are among the most useful tools to solve problems that involve the analysis of nonlabeled data, or unsupervised learning [26]. Over the past decades, thousands of clustering algorithms have been proposed [27], but most of them are based on the offline (or batch) learning concept, in which it is assumed that the entire dataset is available beforehand. However, for many applications, data is acquired in real time, requiring online learning.
In contrast to clustering algorithms for offline learning, which find clusters employing an iterative strategy, such as K-means and Fuzzy C-Means (FCM) [27], clustering algorithms for online learning are based on recursive strategies, which allow the algorithm to find clusters processing each input data sample only once. Several algorithms based on this approach have been proposed in recent years, such as the evolving clustering method (ECM) [14], evolving vector quantization (eVQ) [18], and eClustering [28]. A common feature of these algorithms is that they assume that the clusters are spherical, which can be a limiting factor in real applications, where the clusters may have different shapes and orientations in space.
Unlike many clustering algorithms that employ the Euclidean distance as the measure of similarity, the GK algorithm employs the Mahalanobis distance, which allows the identification of clusters with ellipsoidal shapes. In this algorithm, the distance is defined as follows:
$$d_{ik}^2 = (x_k - v_i)\,A_i\,(x_k - v_i)^T, \tag{1}$$
where $d_{ik}^2$ represents the distance between an input data sample $x_k = [x_{k1},\dots,x_{kn}]$, $k = 1,\dots,N$, and the cluster center $v_i$, $i = 1,\dots,c$, where $N$ is the number of data samples, $n$ is the number of data dimensions, and $c$ is the number of clusters. The norm-inducing matrix $A_i$, $i = 1,\dots,c$, defines the shape and orientation of each cluster in space, and depends on a fuzzy covariance matrix $F_i$, $i = 1,\dots,c$, and on the membership degree of the input data sample $u_{ik}$, $i = 1,\dots,c$, $k = 1,\dots,N$. The GK algorithm uses an iterative process to estimate the parameters of the clusters (the cluster center and fuzzy covariance matrix), which are used to define the distance $d_{ik}^2$ and membership degree $u_{ik}$. This process finishes when a certain convergence criterion is reached. However, as discussed at the beginning of this section, when the application requires clustering in online mode, a recursive procedure is required. More details about the GK algorithm can be found in [22].
In [24], an extended version of the GK algorithm named evolving GK-like algorithm (eGKL) is proposed. This approach estimates the number of clusters and adapts their parameters recursively, maintaining the advantages of the GK algorithm, such as the ability to identify clusters with generic shapes and orientations. The eGKL algorithm does not demand any a priori information regarding the number of clusters. In order to estimate the number of clusters, a strategy to evaluate each new input data sample is used. The strategy checks whether each sample belongs to an existing cluster. If the current data sample belongs to an already defined cluster, the parameters of that cluster (center and covariance matrix) are updated. If the data sample does not belong to any of the existing clusters, it is used to define a new one. To evaluate the similarity between a new data sample and one of the existing clusters, the eGKL algorithm employs the Mahalanobis distance, defined as follows:
$$D_{ik}^2 = (x_k - v_i)\,F_i^{-1}\,(x_k - v_i)^T. \tag{2}$$
In this strategy, the current data sample belongs to an existing cluster if the distance to the cluster center is smaller than the cluster radius. The eGKL algorithm uses an approach inspired by concepts of statistical process control to estimate the radius of each cluster. In this approach, it is assumed that a sample belongs to a cluster if the following relationship holds:
$$D_{ik}^2 < \chi^2_{n,\beta}, \tag{3}$$
where $\chi^2_{n,\beta}$ is the value of a Chi-squared distribution with $n$ degrees of freedom and confidence level $\beta$. The number of degrees of freedom $n$ corresponds to the input space dimension. This approach has the advantage of avoiding the so-called “curse of dimensionality” [29], that is, the problem of the distance between two adjacent points increasing with the input space dimensionality, since $\chi^2_{n,\beta}$ is proportional to the dimension of the input data.
In the eGKL algorithm, if condition (3) is satisfied, the current data sample belongs to a cluster, so the cluster parameters are updated. Otherwise, it is assumed that the current data sample does not belong to any of the existing clusters, and a new cluster is created. The complete procedures of the eGKL algorithm can be found in [24].
To increase the robustness of the eGKL algorithm to outliers, its authors propose a mechanism based on the number of data samples that belong to a cluster. In this mechanism, if the number of data samples $M_i$, $i = 1,\dots,c$, already assigned to an existing cluster is less than $M_{\min}$ (a minimum number chosen initially), the cluster parameters are updated even if the new data sample does not belong to that cluster. Although functional, this mechanism depends on a proper choice of parameters for the problem at hand, which can be difficult when a priori information is not available.
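As a concrete illustration of the similarity test in condition (3), the following sketch checks whether a sample falls inside a cluster. The function name and argument layout are illustrative; the chi-squared threshold is passed in precomputed (e.g., $\chi^2_{2,\,0.95} \approx 5.99$ for a two-dimensional input space).

```python
import numpy as np

def belongs_to_cluster(x, v, F, chi2_thr):
    """Condition (3): a sample x belongs to the cluster with center v
    and fuzzy covariance matrix F if its squared Mahalanobis distance,
    Eq. (2), is below the chi-squared threshold chi2_thr (which grows
    with the input dimension, mitigating the curse of dimensionality)."""
    d = np.asarray(x, dtype=float) - np.asarray(v, dtype=float)
    D2 = d @ np.linalg.inv(F) @ d   # squared Mahalanobis distance
    return D2 < chi2_thr
```

In practice the threshold would be obtained from a chi-squared table or a statistics library for the chosen confidence level $\beta$ and dimension $n$.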
2.2. Drift Detection Method
Several drift detection methods have been proposed. In general, they can be classified into two categories: methods that perform adaptive learning at regular intervals regardless of the occurrence of changes, and methods that first detect changes and subsequently adapt the learning to them [25]. In the first category, methods can use time windows of fixed size or weight the data according to their age or utility [30–32]. When fixed-size time windows are used, at each time frame learning is performed only with the data samples included in the window. An inherent difficulty with fixed-size windows is choosing the appropriate window size for each problem. In the second category, methods monitor some indicators over time to detect changes, such as performance measures, data distribution, or data properties [23, 33, 34]. If a drift is detected during the monitoring process, actions are taken to adapt the model to the change that has occurred, as in the case of adaptive-size time windows, where the action is to adjust the window according to the extent of the context change.
The DDM algorithm, which belongs to the second category, employs a simple method with direct application. The method is based on monitoring the number of errors produced by a learning model during prediction. It uses the Binomial distribution to determine the general form of the probability for the random variable that represents the number of prediction errors in a sequence of $n$ input data samples. At each point $k$ in the sequence, the error rate is the probability of prediction error $p_k$, with standard deviation $s_k = \sqrt{p_k(1-p_k)/k}$. According to the probably approximately correct (PAC) learning model [35], the error rate of the learning algorithm decreases as the number of input data samples increases, so if the distribution is stationary, a significant increase in the error rate suggests a context change. In this case, it is assumed that the current model is inappropriate and should be updated.
While monitoring the error, the method defines a warning level and a drift level. When $p_k + s_k$ exceeds the warning level, the data samples are stored in memory. If $p_k + s_k$ exceeds the drift level, it is considered that a context change has occurred. In this situation, the model induced by the learning algorithm should be updated with the data samples stored since the warning level was reached. It is possible that the error increases and, after reaching the warning level, decreases again to lower levels. This situation corresponds to a false alarm: there is no context change, no action is required, and the data samples stored in memory are no longer needed. More details about the DDM method can be found in [23].
Using the DDM algorithm embedded in a model learning algorithm can keep the dynamic system model continuously updated to the current context. For instance, DDM can be embedded in a recursive clustering algorithm. In this case, the definition of the clusters is adjusted whenever a context change is detected. DDM is used to avoid the nonrobust approach of creating new clusters whenever a similarity measure threshold is violated. This mechanism gives the recursive clustering algorithm greater robustness to outliers and noise in applications where online learning of nonstationary dynamic models is necessary.
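Under these definitions, the error-monitoring core of DDM can be sketched in a few lines. The class layout and names are illustrative; the level defaults ($z_1 = 2$, $z_2 = 3$) follow the description above, and the reset logic performed after a confirmed drift is omitted for brevity.

```python
import math

class DDM:
    """Minimal sketch of the drift detection method described above:
    track the running error rate p and its deviation s, remember the
    smallest p + s seen, and compare against warning/drift levels."""

    def __init__(self, z1=2.0, z2=3.0):
        self.z1, self.z2 = z1, z2
        self.k, self.errors = 0, 0
        self.p_min, self.s_min = float("inf"), float("inf")

    def update(self, error):
        """Feed one prediction outcome (1 = error, 0 = correct);
        return 'in-control', 'warning', or 'drift'."""
        self.k += 1
        self.errors += error
        p = self.errors / self.k
        s = math.sqrt(p * (1.0 - p) / self.k)
        if p + s < self.p_min + self.s_min:   # track the minimum of p + s
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.z2 * self.s_min:
            return "drift"                    # context change: update the model
        if p + s > self.p_min + self.z1 * self.s_min:
            return "warning"                  # start buffering samples
        return "in-control"
```

A long run of correct predictions drives $p_{\min} + s_{\min}$ down, so a subsequent burst of errors quickly crosses the drift level.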
2.3. Proposed Algorithm
This section describes the proposed unsupervised recursive clustering algorithm with a new mechanism of clustering update. The algorithm is a recursive version of the GK algorithm, inspired by the eGKL algorithm, incorporating the DDM algorithm. In the proposed algorithm, clustering is performed in online mode and, if necessary, in real time.
Assuming that there is no a priori information about the clustering structure nor an initial set of input data samples, the proposed algorithm starts by associating the center of the first cluster, $v_1$, with the first data sample $x_1$. The corresponding covariance matrix $F_1$, the learning rate $\alpha_1$, and the number of samples associated with the first cluster $M_1$ are defined as follows:
$$v_1 = x_1; \quad F_1 = F_{\mathrm{init}}; \quad \alpha_1 = \alpha_{\mathrm{init}}; \quad M_1 = 1, \tag{4}$$
where $F_{\mathrm{init}} = \gamma I$; $I$ is an identity matrix of size $n$, $\gamma$ is a small positive number (default value: $\gamma = 10^{-2}$), and $\alpha_{\mathrm{init}} \in [0,1]$ is the initial learning rate (default value: $\alpha_{\mathrm{init}} = 0.5$).
The algorithm stops when all data samples have been processed; otherwise, a new data sample $x_k$ is obtained and the distance between the sample and the centers of the existing clusters is computed:
$$D_{ik}^2 = (x_k - v_i)\,F_i^{-1}\,(x_k - v_i)^T, \quad i = 1,\dots,c. \tag{5}$$
The similarity between the current data sample and the existing clusters is verified by the similarity condition
$$D_{ik}^2 < \chi^2_{n,\beta}, \quad i = 1,\dots,c. \tag{6}$$
If similarity condition (6) is met for a given cluster, it is assumed that the current sample belongs to this cluster. The cluster parameters (center, covariance matrix, learning rate, and number of samples in the cluster) are then updated as follows:
$$\begin{aligned} v_q &= v_q + \alpha_q (x_k - v_q), \\ F_q &= F_q + \alpha_q \left[ (x_k - v_q)^T (x_k - v_q) - F_q \right], \\ \alpha_q &= \alpha_{\mathrm{init}} / M_q, \\ M_q &= M_q + 1, \end{aligned} \tag{7}$$
where $q = \arg\min_{i=1,\dots,c}(D_{ik}^2)$.
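The winning-cluster update of Eq. (7) can be sketched as follows. The function name is illustrative, and the sketch assumes one common convention: the difference $x_k - v_q$ used in the covariance correction is taken with respect to the pre-update center.

```python
import numpy as np

def update_cluster(v, F, alpha, M, x, alpha_init=0.5):
    """Recursive update of the closest cluster, Eq. (7): the center
    moves toward x, the covariance is corrected toward the outer
    product of the deviation, and the learning rate decays as
    alpha_init / M (sketch; pre-update center used in both terms)."""
    x = np.asarray(x, dtype=float)
    d = x - v                              # deviation from the current center
    v = v + alpha * d                      # center update
    F = F + alpha * (np.outer(d, d) - F)   # covariance update
    alpha = alpha_init / M                 # decaying learning rate
    M = M + 1                              # one more sample in the cluster
    return v, F, alpha, M
```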
If similarity condition (6) is not met, it is assumed that the current sample does not belong to any existing cluster. The algorithm increments a variable that represents the number of dissimilarities, $M_{\mathrm{dis}} = M_{\mathrm{dis}} + 1$; then, the error probability and the standard deviation are computed as
$$p = \frac{M_{\mathrm{dis}}}{k}, \quad s = \sqrt{\frac{p(1-p)}{k}}. \tag{8}$$
In this algorithm, the $p$ and $s$ values are stored whenever $p + s$ reaches its lowest value during the process, yielding $p_{\min}$ and $s_{\min}$. If the following condition is met,
$$p + s < p_{\min} + s_{\min}, \tag{9}$$
then $p_{\min} = p$ and $s_{\min} = s$. Note that, when the algorithm starts, $p_{\min}$ and $s_{\min}$ must be initialized with a positive number; setting each of them to one is suggested.
To decide whether the current data sample $x_k$ represents a new cluster or is just an outlier, warning and drift conditions are evaluated. The warning condition is verified as
$$p + s > p_{\min} + z_1 \cdot s_{\min}, \tag{10}$$
where $z_1$ is the warning level (default value: $z_1 = 2$). If the warning level is reached, the current data sample is stored in a window of samples $W(\mathrm{data})_j$, $j = 1,\dots,m$ (where $m$ is the current size of the window), and the drift condition is then evaluated. Otherwise, the algorithm processes the next input data sample. The drift condition is verified as
$$p + s > p_{\min} + z_2 \cdot s_{\min}, \tag{11}$$
where $z_2$ is the drift level (default value: $z_2 = 3$). If the drift level is reached, a new cluster is created, and its center and covariance matrix are determined from the samples stored in the data window as follows:
$$c = c + 1, \quad v_c = \frac{1}{m} \sum_{j=1}^{m} W(\mathrm{data})_j, \quad F_c = \operatorname{cov}\!\left(W(\mathrm{data})_j\right), \; j = 1,\dots,m. \tag{12}$$
The remaining parameters of the new cluster (learning rate and number of samples in the cluster) are initialized as
$$\alpha_c = \alpha_{\mathrm{init}}; \quad M_c = 1. \tag{13}$$
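The new-cluster construction of Eq. (12) amounts to taking the mean and sample covariance of the buffered window; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def create_cluster_from_window(window):
    """Eq. (12): the new cluster's center is the mean of the samples
    stored since the warning level was reached, and its covariance is
    the sample covariance of that window."""
    W = np.asarray(window, dtype=float)    # shape (m, n): m samples, n features
    v_new = W.mean(axis=0)                 # center, Eq. (12)
    F_new = np.cov(W, rowvar=False)        # covariance, Eq. (12)
    return v_new, F_new
```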
In order to avoid the formation of redundant clusters, the similarity between clusters is checked during the update. To achieve this, the distances between the centers of the clusters are computed as follows:
$$D_{ij}^2 = (v_i - v_j)\,F_i^{-1}\,(v_i - v_j)^T, \qquad D_{ji}^2 = (v_j - v_i)\,F_j^{-1}\,(v_j - v_i)^T, \qquad i, j = 1,\dots,c. \tag{14}$$
If one of the following similarity conditions is met for two existing clusters $i$ and $j$,
$$D_{ij}^2 < \chi^2_{n,\beta} \quad \text{or} \quad D_{ji}^2 < \chi^2_{n,\beta}, \qquad i, j = 1,\dots,c, \tag{15}$$
the clusters are merged. These clusters have a hyperellipsoidal shape, defined by a mean vector, a covariance matrix, and the number of samples associated with each one. The combination of the two clusters produces a new one with parameters computed as follows [36]:
$$\begin{aligned} v_i &= \frac{M_i}{M_i + M_j} v_i + \frac{M_j}{M_i + M_j} v_j, \\ F_i &= \frac{M_i - 1}{M_i + M_j + 1} F_i + \frac{M_j - 1}{M_i + M_j + 1} F_j + \frac{M_i M_j}{(M_i + M_j)(M_i + M_j - 1)} (v_i - v_j)^T (v_i - v_j), \\ M_i &= M_i + M_j. \end{aligned} \tag{16}$$
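The merging rule of Eq. (16) can be sketched as below. The function name is illustrative; all right-hand sides use the pre-merge values of the centers and sample counts.

```python
import numpy as np

def merge_clusters(v_i, F_i, M_i, v_j, F_j, M_j):
    """Eq. (16): merge two hyperellipsoidal clusters into one whose
    center is the sample-count-weighted mean of the two centers and
    whose covariance pools both dispersions plus a between-centers
    correction term."""
    v_i, v_j = np.asarray(v_i, dtype=float), np.asarray(v_j, dtype=float)
    d = v_i - v_j
    M = M_i + M_j
    v = (M_i * v_i + M_j * v_j) / M
    F = ((M_i - 1) * F_i + (M_j - 1) * F_j) / (M + 1) \
        + (M_i * M_j) / (M * (M - 1)) * np.outer(d, d)
    return v, F, M
```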
Algorithm 1 summarizes the proposed recursive clustering algorithm.
Algorithm 1: Recursive clustering algorithm with drift detection.
Input: $x_k$, $\chi^2_{n,\beta}$, $F_{\mathrm{init}}$, $\alpha_{\mathrm{init}}$, $z_1$, $z_2$;
Output: $v_i$, $F_i$;
Read the first data sample $x_1$;
Initialize the first cluster;
for $k = 2, 3, \dots$ do
  Read $x_k$;
  Compute $D_{ik}^2$ for all clusters;
  Identify the closest cluster;
  if $D_{qk}^2 < \chi^2_{n,\beta}$ then
    Update the closest cluster;
  else
    Update the dissimilarity number $M_{\mathrm{dis}}$;
    Compute $p$ and $s$;
    if $p + s < p_{\min} + s_{\min}$ then
      Update $p_{\min}$ and $s_{\min}$;
    end if
    if $p + s > p_{\min} + z_1 \cdot s_{\min}$ then
      Store $x_k$ in the data window $W(\mathrm{data})_j$;
    end if
    if $p + s > p_{\min} + z_2 \cdot s_{\min}$ then
      Create new cluster;
    end if
  end if
  Compute $D_{ij}^2$ and $D_{ji}^2$ for all clusters;
  if $D_{ij}^2 < \chi^2_{n,\beta}$ or $D_{ji}^2 < \chi^2_{n,\beta}$ then
    Merge redundant clusters;
  end if
end for
3. Evolving Fuzzy Classifier for Fault Diagnosis
The use of pattern classification algorithms is present in many current applications, such as fingerprint recognition for security systems, handwriting recognition on touch screen computers, identification of DNA sequences in medical diagnostic software, and fault diagnosis in industrial equipment. In this context, the pattern classification problem consists in assigning a class or category to each data sample from a set of “raw” data [26].
In many applications, pattern classification algorithms based on fuzzy rules have been used due to their advantages over classic pattern classification algorithms [26], especially their good prediction performance in real problems and the transparency of linguistic rules [37], which allows easy comprehension of the dependence between pattern characteristics. The typical architecture of a fuzzy classifier consists of a set of IF-THEN fuzzy rules, defined as
$$\mathrm{RULE}_i: \text{IF } x_1 \text{ IS } \mu_{i1} \text{ AND } \dots \text{ AND } x_n \text{ IS } \mu_{in} \text{ THEN } y_i = L_i, \tag{17}$$
where $x_k = [x_{k1},\dots,x_{kn}]$ are the input variables or input patterns of dimensionality $n$; $\mu_{i1},\dots,\mu_{in}$ are the antecedent fuzzy sets of the $i$th fuzzy rule; $y_i$ is the output; and $L_i$ is the crisp output corresponding to the class label from the set $\{1,\dots,K\}$, where $K$ is the number of classes.
The classification of each new input data sample $x_k$ is obtained by assigning to it the label of the class associated with the rule having the highest activation degree. The class is determined as follows:
$$y = L_{i^*}, \tag{18}$$
where $i^* = \arg\max_{1 \le i \le R}(\tau_i)$, $R$ is the number of fuzzy rules, and $\tau_i$ is the activation degree of the $i$th fuzzy rule, defined by a t-norm, usually the product operator:
$$\tau_i = \prod_{j=1}^{n} \mu_{ij}(x_j), \tag{19}$$
where $\mu_{ij}$ are the membership functions of the fuzzy sets, defined by Gaussians:
$$\mu_{ij} = e^{-\frac{1}{2} (x_j - v_{ij})^2 / \sigma_{ij}^2}, \tag{20}$$
where $v_{ij}$ and $\sigma_{ij}^2$ represent, respectively, the center and variance of the membership function.
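Eqs. (18)–(20) can be combined into a compact classification sketch; the function name and parameter layout are illustrative.

```python
import numpy as np

def classify(x, centers, variances, labels):
    """Winner-takes-all fuzzy classification: each rule's activation
    is the product (t-norm, Eq. (19)) of its univariate Gaussian
    memberships (Eq. (20)); the most active rule's label is returned
    (Eq. (18)). centers/variances: arrays of shape (R, n)."""
    x = np.asarray(x, dtype=float)
    mu = np.exp(-0.5 * (x - centers) ** 2 / variances)  # Eq. (20), per rule/feature
    tau = mu.prod(axis=1)                               # Eq. (19), rule activations
    return labels[int(np.argmax(tau))]                  # Eq. (18), winning label
```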
To implement this fuzzy classifier architecture, clustering is usually performed in the input or input-output data space. Then, rules are created using one-dimensional (or univariate) fuzzy sets, generated from the projection of the clusters in the axis of each variable. According to [21], this approach can lead to information loss if there is interaction between variables, and, to avoid this, the authors propose the use of multivariate Gaussian membership functions to represent antecedent fuzzy sets of each rule. These membership functions are described as
$$H(x) = e^{-\frac{1}{2}(x - v)\,\Sigma^{-1}\,(x - v)^T}, \tag{21}$$
where $v$ is a $1 \times n$ central vector and $\Sigma$ is an $n \times n$ symmetric positive definite matrix. The central vector is defined as the modal value and represents the typical value of $H(x)$, while the matrix $\Sigma$ denotes the dispersion and represents the spread of $H(x)$.
In this case, each cluster found by the clustering algorithm is associated with a fuzzy rule, and the parameters of the multivariate Gaussian membership function are defined as the parameters of the corresponding cluster. If multivariate Gaussian membership functions are used, the fuzzy classifier will have a rule set defined as
$$\mathrm{RULE}_i: \text{IF } x_k \text{ IS } A_i \text{ THEN } y_i = L_i, \tag{22}$$
where Ai is the fuzzy set with multivariate Gaussian membership function (21) of the ith fuzzy rule, with parameters extracted from the corresponding cluster.
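Evaluating the multivariate membership of Eq. (21) is a one-liner; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def mv_membership(x, v, Sigma):
    """Eq. (21): multivariate Gaussian membership H(x). Unlike the
    product of univariate Gaussians in Eq. (20), the off-diagonal
    terms of Sigma capture the interaction between input variables,
    avoiding the information loss mentioned in the text."""
    d = np.asarray(x, dtype=float) - np.asarray(v, dtype=float)
    return float(np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d))
```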
Usually, more than one rule can be used to describe a class; for example, the class can be multimodal. In this case, a single rule may not be sufficient to describe all possible variations of the same class. Thus, the fuzzy classifier aggregates the outputs of rules associated with the same class using an s-norm. The result of the aggregation can be interpreted as rules of the form
$$\text{IF } x_k \text{ IS } A_i \text{ OR } x_k \text{ IS } A_j \text{ OR } \dots \text{ OR } x_k \text{ IS } A_k \text{ THEN } y_i = L_i. \tag{23}$$
The result of this aggregation is the degree of relevance of each known class. The classification of each new sample xk is defined by the class with the highest relevance degree.
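The multimodal aggregation of Eq. (23) can be sketched as follows, assuming the max operator as the s-norm (a common choice; the rule-list layout is illustrative).

```python
import numpy as np

def classify_multimodal(x, rules):
    """Eq. (23): rules sharing a class label are aggregated with the
    max s-norm; the class with the highest aggregated relevance wins.
    rules: list of (v, Sigma, label) triples, one per fuzzy rule."""
    x = np.asarray(x, dtype=float)
    relevance = {}
    for v, Sigma, label in rules:
        d = x - np.asarray(v, dtype=float)
        h = float(np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d))  # Eq. (21)
        relevance[label] = max(relevance.get(label, 0.0), h)    # max s-norm
    return max(relevance, key=relevance.get)
```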
In some pattern classification applications, the classes of the data samples are not known a priori. In these situations, an unsupervised learning process is required for classifier implementation. Moreover, in applications where pattern classification should be performed in real time, learning should be performed using incremental algorithms, processing each data sample of a data stream only once. To solve these problems, the solution is to use a recursive clustering algorithm.
In this paper, we propose an evolving fuzzy classifier based on the recursive clustering algorithm with drift detection presented in Section 2.3, which allows the creation of a fuzzy rule base in online mode and, if necessary, in real time from input data samples. This approach differs from the ones employed in traditional fuzzy classifiers, which require some training (usually supervised) conducted in offline mode.
The proposed classifier updates the rule base using the output of the recursive clustering algorithm described in the previous section. For each new input data sample, if a new cluster is created, a new fuzzy rule (22) is added to the rule base, where the cluster center and covariance matrix are used as the parameters of the multivariate Gaussian membership function of the antecedent. The rule consequent (the crisp output corresponding to the class label) must be defined by experts or system operators, since in unsupervised learning processes incoming online samples are usually not prelabeled. If a cluster is updated, the corresponding fuzzy rules are updated, the class label is determined as the consequent of the fuzzy rule with the highest activation degree, and user intervention is not necessary. If two clusters are merged by the recursive clustering algorithm, the corresponding fuzzy rules are also merged to represent a unique class. It should be noted that both the number of rules and the number of classes are determined during the evolving process, and it is not necessary to set these parameters a priori. Algorithm 2 summarizes the procedures of the classifier.
Algorithm 2: Evolving fuzzy classifier.
Input: xk;
Output: yk;
Initialize the classifier;
for k = 1, 2, … do
    Read xk;
    Execute the recursive clustering algorithm;
    if a new cluster is created then
        Create a new fuzzy rule;
        Define the new class elicited by the expert/system operator;
        yk = label of the new class;
    end if
    if a cluster is updated then
        Update the corresponding fuzzy rule;
        Find the most active rule;
        yk = label of the most active rule;
    end if
    if clusters are merged then
        Merge the corresponding fuzzy rules;
    end if
end for
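The control flow of Algorithm 2 can be sketched in Python. This is a toy illustration, not the paper's implementation: a simple Euclidean distance threshold stands in for the recursive clustering algorithm with drift detection, `ask_operator` is a hypothetical callback standing in for the expert/system operator, and cluster merging is omitted for brevity.

```python
import numpy as np

class EvolvingFuzzyClassifier:
    """Toy sketch of Algorithm 2. A Euclidean distance threshold stands in
    for the recursive clustering algorithm with drift detection."""

    def __init__(self, threshold, ask_operator):
        self.threshold = threshold        # similarity bound for "cluster updated"
        self.ask_operator = ask_operator  # hypothetical operator callback: x -> label
        self.centers = []                 # cluster centers (rule antecedent parameters)
        self.counts = []                  # samples per cluster
        self.labels = []                  # rule consequents (class labels)

    def step(self, xk):
        xk = np.asarray(xk, dtype=float)
        if self.centers:
            dists = [np.linalg.norm(xk - c) for c in self.centers]
            i = int(np.argmin(dists))
            if dists[i] <= self.threshold:
                # cluster updated: update the corresponding rule and answer
                # with the most active rule, no operator intervention
                self.counts[i] += 1
                self.centers[i] += (xk - self.centers[i]) / self.counts[i]
                return self.labels[i]
        # new cluster created: add a rule whose consequent label is
        # elicited from the expert/system operator
        self.centers.append(xk.copy())
        self.counts.append(1)
        self.labels.append(self.ask_operator(xk))
        return self.labels[-1]
```

In use, the operator is queried only when a new cluster (new rule) appears; subsequent samples falling into an existing cluster are labelled automatically.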
The application of the proposed classifier to fault diagnosis is illustrated in Figure 1. Data samples are obtained from a dynamic system as a continuous stream, usually provided by the sensors that monitor the process. These data might require preprocessing techniques for feature extraction.
Fault diagnosis with an evolving fuzzy classifier.
The classifier starts with an empty rule set. Rules are created as the recursive clustering algorithm creates clusters to represent the data stream. Each rule is related to a class, and each class is related to a dynamic system condition, representing either normal operation or a faulty condition. When a new rule is created, the system operator is notified and provides the label of the class, defining it as normal operation or as a specific fault. All of the necessary diagnostic information, the fuzzy rules, and the class labels are stored in a unified database that is updated while the system is used.
After an initial period of operation, the database will contain the set of fuzzy rules and class labels defined so far. When a new data sample is associated with an existing cluster, the classifier updates the corresponding fuzzy rule and classifies the dynamic system condition with the label present in the consequent of the fuzzy rule with the highest activation degree. In this situation, system operator intervention is not required, and the classification of the dynamic system condition is performed automatically.
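The automatic classification step described above, choosing the consequent of the rule with the highest activation degree, can be sketched as follows. The multivariate Gaussian membership form and the `(center, covariance, label)` rule representation are assumptions made for illustration.

```python
import numpy as np

def rule_activation(x, center, cov):
    """Multivariate Gaussian membership degree of sample x for one rule,
    with antecedent parameters (center, covariance) taken from a cluster."""
    d = np.asarray(x, dtype=float) - np.asarray(center, dtype=float)
    return float(np.exp(-0.5 * d @ np.linalg.inv(cov) @ d))

def classify(x, rules):
    """Label of the most active rule; `rules` is a hypothetical list of
    (center, covariance, label) tuples standing in for the rule base."""
    return max(rules, key=lambda r: rule_activation(x, r[0], r[1]))[2]
```

A sample exactly at a rule's center attains the maximum membership degree of 1; degrees decay with the Mahalanobis distance from the center.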
The main characteristic of the classifier proposed in this work is the ability to diagnose faults in complex nonstationary dynamic systems. The classifier requires neither a priori information about the dynamic model nor historical process data. This allows the classifier to construct a rule base in an evolving way and, with the aid of the operator, to learn to diagnose faults as they occur. Thus, the proposed classifier is able to adapt to the dynamic system, making it possible to diagnose faults not previously known.
4. Experiments and Results
The proposed classifier was evaluated for fault diagnosis in a DC drive system. A fault simulator was used in this evaluation, from which normal operation data and fault data were generated and organized in random sequences of different operation modes. The output of the classifier was compared with the provided sequence to assess its effectiveness in detecting and classifying faults.
4.1. DC Drive System
The DC drive system model employed was proposed in [38] and serves as a benchmark for fault detection and diagnosis. As illustrated in Figure 2, the system comprises two power supplies, two controlled static converters, a direct current machine, and a mechanical load. The variables shown in the representation of the system are defined as follows:
va: voltage of the armature circuit;
vfd: voltage of the field circuit;
ia: current of the armature circuit;
ifd: current of the field circuit;
ra,La: resistance/inductance of the armature circuit;
rfd,Lfd: resistance/inductance of the field circuit;
ea: counter-electromotive force of the armature;
Tem: electromagnetic torque;
TL: torque required by the mechanical load.
Representation of the DC drive system.
Using this benchmark, it is possible to simulate faults in the actuators (armature and field converters), in the plant or process (machine and mechanical load), and in the sensors (current and speed meters), as detailed in Table 1. The fault simulations employed 750 V power supplies, constant speed, and an overload of 25% of the nominal torque applied at half of the simulation interval. A sampling period of 2 ms was used for the monitored variables.
Types of faults on DC drive system.
Index   Description
0       Normal operation
1       Armature converter disconnection
2       Field converter disconnection
3       Armature converter short circuit
4       Field converter short circuit
5       Armature turns short-circuit
6       Field turns short-circuit
7       Ventilation system fault
8       Bearing lubrication fault
9       Armature current sensor fault
10      Field current sensor fault
11      Machine speed sensor fault
Figures 3, 4, 5, 6, and 7 show, as an example, the curves of the armature current (Ia), field current (Ifd), and machine speed (V) in the fault simulations. In this case, the following faults were simulated: armature converter disconnection, field converter short circuit, armature turns short-circuit, bearing lubrication fault, and field current sensor fault. At the beginning of each simulation, the system is working under normal operation.
In Figure 8, the same faults presented in the previous figures are shown in three-dimensional space, where it is possible to observe that, while some faults have an abrupt behavior, others have an incipient behavior.
Fault simulation in 3D space.
4.2. Fault Diagnosis
The fault diagnosis experiments were performed considering different scenarios. Each scenario consists of the simulation of a sequence of 3 to 11 randomly selected fault types, with periods of normal operation between faults. To assess the robustness of the proposed classifier to the presence of noise in the data, zero-mean Gaussian noise with standard deviation equal to 2% of the variable's nominal value under normal operation was added to each monitored variable.
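The noise injection described above can be sketched as follows; the nominal values used in the test are illustrative placeholders, not values taken from the experiments.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded for reproducibility

def add_measurement_noise(samples, nominal, rel_std=0.02):
    """Add zero-mean Gaussian noise whose standard deviation equals
    rel_std (here 2%) of each monitored variable's nominal value,
    mirroring the experiment setup."""
    scale = rel_std * np.asarray(nominal, dtype=float)  # per-variable std
    return np.asarray(samples, dtype=float) + rng.normal(0.0, scale, size=np.shape(samples))
```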
Data samples of the monitored variables of the DC drive system, armature current (Ia), field current (Ifd), and machine speed (V), were provided as inputs to the classifier in online mode, and, for each sequence, the classifier output was compared to the provided sequence. Since the classifier starts with no fuzzy rules, the first data samples must correspond to normal operation of the system, so that the first rule created describes normal operation. After that, during the diagnosis, faults are detected and new rules describing each type of fault are created. The label of each class, which defines it as normal operation or a specific fault, is provided by the system operator at the time each new rule is created. After the first occurrence of a fault, operator intervention is no longer needed for that fault. For the experiments, the parameters of the recursive clustering algorithm were defined as follows: χ²_{3,0.95} = 7.8147; F_init = 10^{-2} I; α_init = 0.5; z_1 = 2; z_2 = 3.
Figures 9, 10, 11, 12, and 13 show the results of fault diagnosis in each of the simulated scenarios, comparing the estimated output (classified fault sequence) of the proposed classifier with the desired output (selected fault sequence) for the input data samples. The results show that the classifier was able to correctly diagnose all the DC drive system faults. Despite the presence of noise in the data samples, the occurrence of false alarms or misclassifications (represented by isolated points on the graphs) is very low, even in the scenario with the highest number of possible faults.
Desired output and estimated output by proposed classifier in a scenario of 3 faults.
Desired output and estimated output by proposed classifier in a scenario of 5 faults.
Desired output and estimated output by proposed classifier in a scenario of 7 faults.
Desired output and estimated output by proposed classifier in a scenario of 9 faults.
Desired output and estimated output by proposed classifier in a scenario of 11 faults.
In this work, the classifier performance was evaluated in terms of fault detection and fault classification, as suggested in [3]. Three metrics were calculated in the fault detection evaluation.
Probability of detection (POD): it assesses the detected faults over all potential fault cases (sensitivity):
(24) POD = a / (a + c).
Probability of false alarm (POFA): it considers the proportion of all fault-free cases that trigger a fault detection alarm:
(25) POFA = b / (b + d).
Accuracy (ACC): it measures the effectiveness of the algorithm in correctly distinguishing between fault-present and fault-free conditions:
(26) ACC = (a + d) / (a + b + c + d),
where a represents the number of detected faults; b represents the number of false alarms; c represents the number of missed faults; and d represents the number of correct rejections.
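The three detection metrics are a direct transcription of (24)-(26) in terms of the counts a, b, c, and d defined above:

```python
def detection_metrics(a, b, c, d):
    """Fault detection metrics (24)-(26): a = detected faults,
    b = false alarms, c = missed faults, d = correct rejections."""
    pod = a / (a + c)                 # probability of detection (24)
    pofa = b / (b + d)                # probability of false alarm (25)
    acc = (a + d) / (a + b + c + d)   # accuracy (26)
    return pod, pofa, acc
```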
Regarding fault classification evaluation, the metric fault isolation rate (FIR) was used, expressing the percentage of all faults that the classifier is able to isolate unambiguously. This metric is computed as
(27)FIR=AA+C,
where A represents the total of detected and correctly classified faults and C represents the total of detected and incorrectly classified faults.
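The fault isolation rate (27) is computed analogously from the classification counts A and C:

```python
def fault_isolation_rate(A, C):
    """Fault isolation rate (27): A = detected faults classified correctly,
    C = detected faults classified incorrectly."""
    return A / (A + C)
```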
Other metrics that were used to assess the performance of the classifier are as follows.
Detection delay time (DDT): it represents the time lag between the first occurrence of a given fault and its detection by the algorithm.
Isolation delay time (IDT): it represents the time lag between the second occurrence of a given fault and its classification by the algorithm.
Operator intervention rate (OIR): it is the percentage of faults classified with the intervention of the operator.
All fault diagnosis results obtained with the DC drive system by the classifier proposed in this work were compared to those obtained using the classifier proposed in [10]. For the experiments, the parameters of this alternative classifier were set to w = 100, λ = 0.001, α = 0.01, and T_μy = 0.01.
Table 2 summarizes the results for both classifiers using the fault detection metrics described. The results show that the classifier proposed in this work achieves higher fault detection rates and accuracy in all scenarios, with values above 99%, and low false alarm rates, with values below 0.3%. These results demonstrate the effectiveness of the algorithm in detecting the simulated faults in the DC drive system. Despite its lower fault detection rates and lower accuracy, the classifier proposed in [10] did not produce any false alarms.
Fault detection performance.
Scenario    | Proposed                     | Lemos et al. [10]
            | POD (%)  POFA (%)  ACC (%)   | POD (%)  POFA (%)  ACC (%)
3 faults    | 99.85    0.00      99.89     | 99.79    0.00      99.85
5 faults    | 99.68    0.00      99.72     | 98.39    0.00      98.66
7 faults    | 99.79    0.00      99.89     | 98.50    0.00      98.68
9 faults    | 99.82    0.03      99.93     | 99.77    0.00      99.79
11 faults   | 99.33    0.29      99.32     | 93.73    0.00      94.26
Table 3 summarizes the results for both classifiers using the fault classification metrics described. The results show that the classifier proposed in this work presented a higher fault isolation rate in almost all scenarios, with an average value of approximately 97%. In all scenarios, operator intervention in fault classification was less than or equal to 0.05%. These results show the ability of the classifier to automatically diagnose almost all faults after their first occurrence, and they also reveal its ability to learn. Note that the classifier proposed in [10] in general had lower fault classification performance than the proposed classifier and needed more operator interventions.
Fault classification performance.
Scenario    | Proposed           | Lemos et al. [10]
            | FIR (%)  OIR (%)   | FIR (%)  OIR (%)
3 faults    | 99.75    0.05      | 99.77    0.11
5 faults    | 99.71    0.05      | 98.65    0.23
7 faults    | 99.76    0.03      | 98.39    0.19
9 faults    | 95.12    0.05      | 94.39    0.14
11 faults   | 94.39    0.05      | 90.85    0.38
Table 4 summarizes the results for both classifiers using the time metrics for fault detection and classification. The average fault detection time found in the experiments with the classifier proposed in this work was approximately 0.060 s, which is primarily determined by the number of data samples required by the recursive clustering algorithm to detect a context change. The average fault isolation time found in the experiments was approximately 0.009 s. A comparison between the average values of fault detection and fault isolation time shows that fault classification is faster after the first occurrence of each type of fault, since the classifier database already contains the fuzzy rules and labels for all detected fault types and no operator intervention is required. The experiments with the classifier proposed in [10] resulted in average fault detection and fault isolation times of 0.040 s and 0.007 s, respectively, showing that their classifier responds more quickly than the classifier proposed in this work, a consequence of the different update mechanisms in the clustering algorithms used by the two classifiers.
Fault detection and classification time.
Scenario    | Proposed           | Lemos et al. [10]
            | DDT (s)  IDT (s)   | DDT (s)  IDT (s)
3 faults    | 0.038    0.004     | 0.012    0.002
5 faults    | 0.047    0.003     | 0.047    0.005
7 faults    | 0.060    0.005     | 0.070    0.005
9 faults    | 0.068    0.008     | 0.012    0.006
11 faults   | 0.086    0.024     | 0.072    0.016
Another experiment was conducted to evaluate the robustness of the proposed classifier to the presence of outliers in the data. In this experiment, a scenario of 5 faults was simulated, and outliers were inserted into the data samples; that is, some samples were corrupted with high-variance noise. Figure 14 shows the fault simulation results in the presence of outliers in three-dimensional space. Figure 15 shows the results of fault diagnosis in this scenario.
Fault simulation in the presence of outliers.
Desired output and estimated output of the proposed classifier in a scenario of 5 faults in the presence of outliers.
The fault diagnosis results for this experiment show that, even in the presence of outliers, the proposed classifier was able to correctly detect and diagnose all faults considered, correctly distinguishing between outliers and valid data samples. The results of this experiment are presented in Tables 5 and 6. Analysing these tables, one can note that the proposed classifier has virtually the same fault diagnosis performance with and without outliers, although a slight increase in false alarm rate is observed. This experiment showed the greater robustness of the classifier proposed in this work when compared with the classifier proposed in [10], since the latter showed major differences in false alarm and fault isolation rates between the scenarios with and without outliers.
Fault detection performance in the presence of outliers.
Scenario          | Proposed                     | Lemos et al. [10]
                  | POD (%)  POFA (%)  ACC (%)   | POD (%)  POFA (%)  ACC (%)
Without outliers  | 99.69    0.00      99.85     | 99.70    0.08      99.74
With outliers     | 99.68    0.02      99.83     | 99.79    0.96      99.66
Fault classification performance in the presence of outliers.
Scenario          | Proposed           | Lemos et al. [10]
                  | FIR (%)  OIR (%)   | FIR (%)  OIR (%)
Without outliers  | 99.75    0.03      | 99.32    0.21
With outliers     | 99.73    0.03      | 98.80    0.22
5. Conclusions
In this work, we presented an evolving fuzzy classifier for fault diagnosis of complex nonstationary dynamic systems. The proposed classifier is composed of a set of fuzzy rules created and updated by a recursive unsupervised clustering algorithm. In this algorithm, a new mechanism for cluster updating based on a drift detection method is employed. With this mechanism, cluster updates depend not only on the similarity measure between data samples and clusters, but also on monitoring of the data context. This feature gives the proposed classifier robustness to outliers and noise, as suggested by the experimental results. Multivariate Gaussian membership functions are used in the fuzzy rule antecedents, whose parameters are extracted directly from the clusters. This multivariate approach avoids the loss of information due to possible interactions between the input variables.
The classifier proposed in this work was evaluated in fault diagnosis experiments performed with a DC drive system model. The experiments showed that the classifier was able to detect and classify all faults with high performance, even in the presence of outliers and noise. The low false alarm rates and high fault isolation rates obtained in all experiments showed that the recursive clustering algorithm with the drift detection method was able to efficiently distinguish representative data samples from invalid data. Moreover, the proposed classifier was able to automatically diagnose almost all faults, requiring operator intervention in only a small percentage of cases. This demonstrates the advantage of the continuous and incremental learning of the classifier over classifiers that require retraining whenever an unknown type of fault is found.
Considering the presented features, the classifier proposed in this work has as advantages the ability to learn from faults in online mode and in real time, the ability to adapt to cope with changes in the dynamic system, and robustness to the presence of outliers and noise in the input data. In summary, the proposed classifier has shown itself to be a promising alternative for fault diagnosis in complex nonstationary dynamic systems, where other methods prove inefficient or less advantageous because of the characteristics of such systems. In future work, we will investigate the application of the proposed algorithm in a real-time equipment diagnosis and prognosis system.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work was supported by the Brazilian National Research Council (CNPq) and the Research Foundation of the State of Minas Gerais (FAPEMIG).
References
[1] Venkatasubramanian, V., "Prognostic and diagnostic monitoring of complex systems for product lifecycle management: challenges and opportunities," 2005, 29(6), 1253–1263, doi:10.1016/j.compchemeng.2005.02.026.
[2] Jardine, A. K. S., Lin, D., Banjevic, D., "A review on machinery diagnostics and prognostics implementing condition-based maintenance," 2006, 20(7), 1483–1510, doi:10.1016/j.ymssp.2005.09.012.
[3] Vachtsevanos, G., Lewis, F., Roeme, M., Hess, A., Wu, B., John Wiley & Sons, New York, NY, USA, 2006.
[4] Abellan-Nebot, J. V., Romero Subirón, F., "A review of machining monitoring systems based on artificial intelligence process models," 2010, 47(1–4), 237–257, doi:10.1007/s00170-009-2191-8.
[5] Zhang, Y., Zhang, L., Zhang, H., "Fault detection for industrial processes," 2012, Article ID 757828, doi:10.1155/2012/757828.
[6] Lughofer, E., Guardiola, C., "Applying evolving fuzzy models with adaptive local error bars to on-line fault detection," Proc. 3rd International Workshop on Genetic and Evolving Systems (GEFS '08), March 2008, 35–40, doi:10.1109/GEFS.2008.4484564.
[7] Chivala, D., Mendonca, L. F., Sousa, J. M. C., Sá da Costa, J. M. G., "Application of evolving fuzzy modeling to fault tolerant control," 2010, 1(4), 209–223, doi:10.1007/s12530-010-9019-5.
[8] Filev, D. P., Chinnam, R. B., Tseng, F., Baruah, P., "An industrial strength novelty detection framework for autonomous equipment monitoring and diagnostics," 2010, 6(4), 767–779, doi:10.1109/TII.2010.2060732.
[9] Petković, M., Rapaić, M. R., Jeličić, Z. D., Pisano, A., "On-line adaptive clustering for process monitoring and fault detection," 2012, 39(11), 10226–10235, doi:10.1016/j.eswa.2012.02.150.
[10] Lemos, A., Caminhas, W., Gomide, F., "Adaptive fault detection and diagnosis using an evolving fuzzy classifier," 2013, 220, 64–85, doi:10.1016/j.ins.2011.08.030.
[11] Angelov, P., Kasabov, N., "Evolving intelligent systems—eIS," 2006.
[12] Kasabov, N., Filev, D., "Evolving intelligent systems: methods, learning, & applications," Proc. International Symposium on Evolving Fuzzy Systems (EFS '06), September 2006, 8–18, doi:10.1109/ISEFS.2006.251185.
[13] Angelov, P., Filev, D., Kasabov, N., John Wiley & Sons, New York, NY, USA, 2010.
[14] Kasabov, N. K., Song, Q., "DENFIS: dynamic evolving neural-fuzzy inference system and its application for time-series prediction," 2002, 10(2), 144–154, doi:10.1109/91.995117.
[15] Angelov, P., Filev, D., "On-line design of Takagi-Sugeno models," Proc. 10th International Fuzzy Systems Association World Congress, July 2003, 576–584.
[16] Leng, G., McGinnity, T. M., Prasad, G., "An approach for on-line extraction of fuzzy rules using a self-organising fuzzy neural network," 2005, 150(2), 211–243, doi:10.1016/j.fss.2004.03.001.
[17] Rong, H., Sundararajan, N., Huang, G., Saratchandran, P., "Sequential Adaptive Fuzzy Inference System (SAFIS) for nonlinear system identification and prediction," 2006, 157(9), 1260–1275, doi:10.1016/j.fss.2005.12.011.
[18] Lughofer, E. D., "FLEXFIS: a robust incremental learning approach for evolving Takagi-Sugeno fuzzy models," 2008, 16(6), 1393–1410, doi:10.1109/TFUZZ.2008.925908.
[19] Soleimani-B., H., Lucas, C., Araabi, B. N., "Recursive Gath-Geva clustering as a basis for evolving neuro-fuzzy modeling," 2010, 1(1), 59–71, doi:10.1007/s12530-010-9006-x.
[20] Lima, E., Hell, M., Gomide, F., Ballini, R., John Wiley & Sons, New York, NY, USA, 2010.
[21] Lemos, A., Caminhas, W., Gomide, F., "Multivariable Gaussian evolving fuzzy modeling system," 2011, 19(1), 91–104, doi:10.1109/TFUZZ.2010.2087381.
[22] Gustafson, D. E., Kessel, W. C., "Fuzzy clustering with fuzzy covariance," Proc. IEEE Conference on Decision and Control, 1979, 761–766.
[23] Gama, J., Medas, P., Castillo, G., Rodrigues, P., "Learning with drift detection," Proc. 17th Brazilian Symposium on Artificial Intelligence, 2004, 286–295.
[24] Filev, D., Georgieva, O., "An extended version of the Gustafson-Kessel algorithm for evolving data stream clustering," John Wiley & Sons, New York, NY, USA, 2010, 273–300, doi:10.1002/9780470569962.ch12.
[25] Sebastião, R., Gama, J., "A study on change detection methods," Proc. 14th Portuguese Conference on Artificial Intelligence, 2009, 353–364.
[26] Duda, R. O., Hart, P. E., Stork, D. G., 2nd ed., Wiley-Interscience, New York, NY, USA, 2000.
[27] Jain, A. K., "Data clustering: 50 years beyond K-means," 2010, 31(8), 651–666, doi:10.1016/j.patrec.2009.09.011.
[28] Angelov, P., "An approach for fuzzy rule-base adaptation using on-line clustering," 2004, 35(3), 275–289, doi:10.1016/j.ijar.2003.08.006.
[29] Hastie, T., Tibshirani, R., Friedman, J., 2nd ed., Springer, New York, NY, USA, 2001.
[30] Maloof, M. A., Michalski, R. S., "Selecting examples for partial memory learning," 2000, 41(1), 27–52, doi:10.1023/A:1007661119649.
[31] Klinkenberg, R., Joachims, T., "Detecting concept drift with support vector machines," Proc. 17th International Conference on Machine Learning (ICML '00), 2000, 487–494.
[32] Klinkenberg, R., "Learning drifting concepts: example selection vs. example weighting," 2004, 8(3), 281–300.
[33] Kifer, D., Ben-David, S., Gehrke, J., "Detecting change in data streams," Proc. 30th International Conference on Very Large Data Bases (VLDB '04), 2004, 180–191.
[34] Sebastião, R., Gama, J., Rodrigues, P. P., Bernardes, J., "Monitoring incremental histogram distribution for change detection in data streams," Proc. Workshop on Knowledge Discovery from Sensor Data (KDD '08), Las Vegas, NV, USA, August 2008, 25–42.
[35] Mitchell, T. M., McGraw-Hill, New York, NY, USA, 1997.
[36] Kelly, P. M., "An algorithm for merging hyperellipsoidal clusters," Tech. Rep. LA-UR-94-3306, Los Alamos National Laboratory, Los Alamos, NM, USA, 1994.
[37] Jang, J. S. R., Sun, C. T., Mizutani, E., Prentice-Hall, Upper Saddle River, NJ, USA, 1997.
[38] Caminhas, W. M., Takahashi, R. H. C., "Dynamic system failure detection and diagnosis employing sliding mode observers and fuzzy neural networks," Proc. Joint 9th IFSA World Congress and 20th NAFIPS International Conference, Vancouver, Canada, July 2001, 304–309.