Fault Diagnosis with Evolving Fuzzy Classifier Based on Clustering Algorithm and Drift Detection

1Graduate Program in Electrical Engineering, Federal University of Minas Gerais, Avenue Antônio Carlos 6627, 31270-901 Belo Horizonte, MG, Brazil 2Department of Computer Engineering, Faculdade de Ciência e Tecnologia de Montes Claros, Avenue Deputado Esteves Rodrigues 1637, 39400-142 Montes Claros, MG, Brazil 3Department of Electronics Engineering, Federal University of Minas Gerais, Avenue Antônio Carlos 6627, 31270-901, Belo Horizonte, MG, Brazil


Introduction
The advance of technology has resulted in the emergence of complex machinery and equipment, which imposes great challenges for their management and maintenance. In industries, for instance, fault diagnosis in major processes is vitally important to assure the normal operation of a plant. In these cases, due to the complexity of the systems, it is infeasible for human operators to diagnose abnormal situations (faults) in a timely manner, leading them to make wrong decisions. Statistical studies indicate that approximately 70% of the accidents in industries are caused by human error, which can account for economic losses, security reductions, and environmental damage [1]. This scenario led to the emergence of new concepts in the management and maintenance of machinery and equipment, such as condition-based maintenance (CBM) [2]. CBM refers to the use of machine or equipment data obtained in real time to infer its working condition (or faulty condition), allowing maintenance scheduling and preventing equipment crashes. Based on CBM, the concept of intelligent maintenance has emerged [3]. It employs advanced fault diagnosis systems to achieve the desired goals. Thus, intelligent maintenance becomes necessary for current complex machinery and equipment.
Over the past decades, several intelligent fault diagnosis methods based on different theories and approaches have been proposed in the literature. In general, these methods use mathematical/statistical models, accumulated experience, or process data to perform fault diagnosis [1]. Although methods based on models or experience have been shown to be effective, they have the disadvantage of requiring previous knowledge of the dynamic system in question. In contrast, methods based on process data do not require prior knowledge; they rely solely on data obtained directly from the system.
Recently, fault diagnosis methods based on process data have received great emphasis, since the acquisition of data through sensors is widely common in today's automation systems [4,5]. Given this scenario, it is often easier to extract knowledge from data than to develop a model or accumulate experience. In this type of diagnosis, several works have already proposed data-based diagnostic methods employing so-called "intelligent systems," which are tools derived from computational intelligence, mainly artificial neural networks, fuzzy systems, and neurofuzzy networks, among others [2].
However, despite the good performance achieved by intelligent systems in fault diagnosis, they tend to face difficulties when the problem involves complex nonstationary dynamic systems, which represent the vast majority of current real cases. In such systems, physical parameters, operating characteristics, and fault behaviors change over time, requiring an adaptive fault diagnosis system, able to self-adapt to cope with changes in the monitored system. In order to address fault diagnosis in this scenario, several works propose the use of the so-called "evolving intelligent systems" [6][7][8][9][10].
Evolving intelligent systems are systems based on fuzzy inference systems, artificial neural networks, or a combination of both, the neurofuzzy networks, whose main characteristic is the ability to gradually determine both their structure and parameters from input data acquired in online mode and often in real time [11,12]. The application of evolving intelligent systems has been growing in recent years. Many works present successful applications to complex real-world problems involving modeling, control, classification, or prediction [13]. An important aspect of evolving intelligent systems is that different theoretical and practical approaches can be used for their implementation. Regardless of the approach used, the main features of evolving intelligent systems are as follows: (i) their structure is not fixed and is not defined a priori: it grows (expands or shrinks) naturally as the system evolves; (ii) their parameters are adjusted (adapted) as the system evolves; (iii) their operation is continuous; that is, they are based on online learning algorithms and, if necessary, operate in real time.
One of the most used approaches to define the structure of an evolving intelligent system is unsupervised recursive clustering. Generally, the algorithm clusters the input or input-output data space in an incremental manner, defining the center of each cluster and, in some cases, the radius of the cluster (or zone of influence). During the evolving process, the algorithm can create new clusters, update existing clusters, or eliminate redundant ones. The models proposed in [14][15][16][17][18][19][20][21] are examples of intelligent systems based on evolving clustering algorithms.
Most evolving intelligent systems based on recursive clustering adopt a mechanism to update the structure and parameters of the system (creation/modification/removal of clusters) using some measure of similarity between input data samples and existing clusters. Although this mechanism is functional, it may lead to an erroneous definition of the structure, since outliers or noisy samples (as the data acquired by sensors in industrial environments usually are) which exceed the similarity measure may generate clusters that do not effectively represent the spatial structure of the data [21]. Some evolving intelligent systems adopt more elaborate mechanisms to update the model structure and system parameters, such as the models proposed in [20,21], using methods to ignore/filter outliers and noise.
Considering the fault diagnosis problem, the use of evolving intelligent systems based on recursive clustering algorithms robust to outliers and data noise is mandatory. In this problem, each new cluster created is usually associated with a new faulty condition. Thus, if the clustering procedure is not robust, the fault diagnosis model tends to have a high false alarm rate; that is, new faulty conditions are erroneously detected. In this context, this paper proposes a fault diagnosis approach based on an evolving fuzzy classifier which uses a new robust unsupervised recursive clustering algorithm. The proposed classifier uses a modified version of the Gustafson-Kessel (GK) clustering algorithm [22] with the incorporation of the drift detection method (DDM) [23].
GK is a powerful clustering algorithm. Unlike many others, it allows the identification of clusters with different shapes and orientations in space. The algorithm employs a technique to adapt the distance metric to the shape of each cluster using an estimate of the cluster covariance matrix. Furthermore, the GK algorithm also has the advantage of being relatively insensitive to data scale and to the initialization of the partition matrix [24]. Several applications based on this clustering algorithm have been proposed in the literature, such as time series prediction, dynamic systems modeling, fault diagnosis, and prognosis.
According to the literature, drift detection is a method to detect gradual changes in the context of input data. A context is understood as a set of data generated while the process is stationary. Thus, a drift detection method is able to detect the time instants when changes occur in the context of the data. The detection of a new context suggests that the current model is outdated and needs to be updated using current relevant information. Drift detection methods are suitable for applications involving machine learning, where algorithms are applied to real-world problems in complex, nonstationary, and dynamic environments. In these applications, large amounts of information are provided in a continuous flow of high-speed data presenting variations over time, as, for example, in the real-time monitoring of industrial plants [25]. The learning algorithms must be able to monitor the behavior of the dynamic system in question and adapt the model as changes occur. Among the several methods proposed for drift detection, the DDM algorithm employs a simple and computationally efficient method to detect the moments when changes occur. It is an independent drift detection method that can be embedded into any learning algorithm, increasing its efficiency in problems involving nonstationary dynamic models.
The new unsupervised recursive clustering algorithm proposed in this paper combines the advantages of the GK algorithm, especially the ability to identify clusters with different shapes and orientations in online mode, with the DDM algorithm. The DDM algorithm is used to detect changes in the input stream, triggering updates in the cluster structure. In the proposed algorithm, any clustering update depends not only on the similarity measure but also on the monitoring of changes in the input data flow, which gives the algorithm greater robustness to the presence of outliers and noise. A cluster merging mechanism was also incorporated into the algorithm to enable the removal of redundant clusters. The fuzzy rule base of the proposed classifier is updated whenever the cluster structure is modified. The cluster centers and covariance matrices are used as parameters of the fuzzy rules. Multivariate Gaussian membership functions are employed in the rules, characterized by a central vector and a dispersion matrix, which represent the current dispersion of the input variables as well as the interactions between them [21].
In accordance with the characteristics of the proposed recursive clustering algorithm, the main benefits achieved by the classifier used in this work are (i) the ability to learn faults of the dynamic system in online mode and, if necessary, in real time, eliminating the need for prior knowledge of the system; (ii) the ability to adapt whenever changes are detected in the monitored system, allowing the application to real problems; (iii) a low false alarm rate and a high fault isolation rate due to the robustness to outliers and noise, increasing the reliability of the diagnosis.
To evaluate the performance of the proposed approach in fault diagnosis, a DC drive system fault simulator was used to simulate normal operation and several faulty conditions. Outliers and noise were added to the simulated data to evaluate the robustness of the fault diagnosis model. This paper is organized as follows. Section 2 presents the theoretical concepts regarding recursive clustering algorithms and drift detection methods and presents the proposed recursive clustering algorithm. Next, Section 3 presents the proposed classifier and its application to fault diagnosis. Section 4 presents the experiments and results. Finally, Section 5 presents the conclusion and suggestions for future works.

Recursive Clustering Algorithm and Drift Detection
2.1. Recursive Gustafson-Kessel Algorithm. In pattern recognition, clustering algorithms are among the most useful tools to solve problems that involve the analysis of nonlabeled data, or unsupervised learning [26]. Over the past decades, thousands of clustering algorithms have been proposed [27], but most of them are based on the offline (or batch) learning concept; that is, it is assumed that the entire dataset is previously available. However, for many applications, data is acquired in real time, requiring online learning.
In contrast to clustering algorithms for offline learning, which find clusters employing an iterative strategy, such as K-means and Fuzzy C-Means (FCM) [27], clustering algorithms for online learning are based on recursive strategies, which allow the algorithm to find clusters by processing each input data sample only once. Several algorithms based on this approach have been proposed in recent years, such as the evolving clustering method (ECM) [14], evolving vector quantization (eVQ) [18], and eClustering [28]. A common feature of these algorithms is that they assume that the clusters are spherical, which can be a limiting factor in real applications, where the clusters may have different shapes and orientations in space.
Unlike many clustering algorithms that employ the Euclidean distance as a measure of similarity, the GK algorithm employs a Mahalanobis-type adaptive distance, which allows the identification of clusters with ellipsoidal shapes. In this algorithm, the distance is defined as follows:

d²_ik = (x_k − v_i)ᵀ A_i (x_k − v_i), (1)

where d²_ik represents the distance between an input data sample x_k = [x_k1, ..., x_kn], k = 1, ..., N, and the cluster center v_i, i = 1, ..., c, where N is the number of data samples, n is the number of data dimensions, and c is the number of clusters. The norm-inducing matrix A_i, i = 1, ..., c, defines the shape and orientation of each cluster in space; it depends on a fuzzy covariance matrix F_i, i = 1, ..., c, and on the membership degrees μ_ik of the input data samples x_k, k = 1, ..., N, i = 1, ..., c. The GK algorithm uses an iterative process to estimate the parameters of the clusters (the cluster centers and fuzzy covariance matrices), which are used to define the distances d²_ik and membership degrees μ_ik. This process finishes when a certain convergence criterion is reached. However, as discussed at the beginning of this section, when the application requires clustering in online mode, a recursive procedure is required. More details about the GK algorithm can be found in [22].
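The adaptive distance above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name is ours, and the standard GK choice A_i = det(F_i)^(1/n) F_i⁻¹ for the norm-inducing matrix is assumed.

```python
import numpy as np

def gk_distance_sq(x, v, F):
    """Squared GK distance between sample x and cluster center v.

    The norm-inducing matrix A = det(F)^(1/n) * inv(F) adapts the metric
    to the shape and orientation encoded in the fuzzy covariance matrix F
    (the standard GK choice, assumed here)."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    F = np.asarray(F, dtype=float)
    n = len(x)
    A = np.linalg.det(F) ** (1.0 / n) * np.linalg.inv(F)
    d = x - v
    return float(d @ A @ d)
```

With F equal to the identity, the distance reduces to the squared Euclidean distance; an elongated F stretches the metric along the cluster's principal directions.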
In [24], an extended version of the GK algorithm named evolving GK-like algorithm (eGKL) is proposed. This approach estimates the number of clusters and adapts their parameters recursively, maintaining the advantages of the GK algorithm, such as the ability to identify clusters with generic shapes and orientations. The eGKL algorithm does not demand any a priori information regarding the number of clusters. In order to estimate the number of clusters, a strategy to evaluate each new input data sample is used. The strategy checks whether each sample belongs to an existing cluster. If the current data sample belongs to an already defined cluster, the parameters of the cluster (center and covariance matrix) are updated. If the data sample does not belong to any of the existing clusters, it is used to define a new one. To evaluate the similarity between a new data sample and one of the existing clusters, the eGKL algorithm employs the Mahalanobis distance, defined as follows:

D²_ik = (x_k − v_i)ᵀ F_i⁻¹ (x_k − v_i). (2)

In this strategy, the current data sample belongs to an existing cluster if the distance to the cluster center is smaller than the cluster radius. The eGKL algorithm uses an approach inspired by concepts of statistical process control to estimate the radius of each cluster. In this approach, it is assumed that a sample belongs to a cluster if the following relationship holds:

D²_ik < χ²_{n,β}, (3)

where χ²_{n,β} is the value of a Chi-squared distribution with n degrees of freedom and a confidence level β. The number of degrees of freedom n corresponds to the input space dimension. This approach has the advantage of avoiding the problem called the "curse of dimensionality" [29], that is, the increase of the distance between two adjacent points with the increase of the input space dimensionality, since χ²_{n,β} is proportional to the dimension of the input data.
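The membership test of condition (3) can be sketched as follows. The function and variable names are illustrative; the default threshold uses the tabulated value χ²_{3,0.95} ≈ 7.8147 quoted later in the experiments.

```python
import numpy as np

CHI2_3_095 = 7.8147  # chi-squared value for n = 3 dof, beta = 0.95

def belongs_to_cluster(x, v, F, chi2_threshold=CHI2_3_095):
    """Condition (3): the sample belongs to the cluster if its squared
    Mahalanobis distance to the cluster center is below the chi-squared
    value for the input dimension and chosen confidence level."""
    d = np.asarray(x, dtype=float) - np.asarray(v, dtype=float)
    D2 = float(d @ np.linalg.inv(np.asarray(F, dtype=float)) @ d)
    return D2 < chi2_threshold
```

Because the threshold grows with the dimension n, the test remains well behaved as more monitored variables are added, which is the point made about the curse of dimensionality.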
In the eGKL algorithm, if condition (3) is satisfied, it means that the current data sample belongs to a cluster, so the cluster parameters are updated. Otherwise, it is assumed that the current data sample does not belong to any of the existing clusters, and a new cluster is created. The complete procedures of the eGKL algorithm can be found in [24].
To increase the eGKL algorithm's robustness to outliers, the authors propose a mechanism based on the number of data samples that belong to a cluster. In this mechanism, if the number of data samples M_i, i = 1, ..., c, already assigned to an existing cluster is less than M_min (a minimum number chosen initially), the cluster parameters are updated even if the new data sample does not belong to that cluster. Although functional, this mechanism depends on the proper choice of parameters for the problem at hand, which can be difficult for problems where a priori information is not available.

Drift Detection Method.
Several drift detection methods have been proposed. In general, they can be classified into two categories: methods that perform adaptive learning at regular intervals regardless of the occurrence of changes, and methods that first detect changes and subsequently adapt the learning to them [25]. In the first category, methods can use time windows of fixed size or weight the data according to their age or utility [30][31][32]. When fixed-size time windows are used, at each time frame, learning is performed only with the data samples included in the window. An inherent difficulty of methods using fixed-size windows is choosing the appropriate window size for each problem. In the second category, methods use indicators monitored over time to detect changes, such as performance measures, data distribution, or data properties [23,33,34]. If a drift is detected during the monitoring process, actions are taken to adapt the model to the change that has occurred, as in the case of adaptive-size time windows, where the action is to adjust the window according to the extent of the context change.
The DDM algorithm, which belongs to the second category, employs a simple method with direct application. This method is based on monitoring the number of errors produced by a learning model during prediction. The method uses the Binomial distribution to determine the general form of the probability for the random variable that represents the number of prediction errors in a sequence of i input data samples. For each sequence of i data samples, the error rate is the probability of prediction error p_i, with standard deviation s_i = √(p_i(1 − p_i)/i). According to the probably approximately correct (PAC) learning model [35], the error rate of the learning algorithm decreases as the number of input data samples increases, and if the distribution is stationary, a significant increase in the error rate suggests a context change. In this case, it is assumed that the current model is inappropriate and should be updated.
In this method, while monitoring the error, a warning level and a drift level are defined. When p_i + s_i exceeds the warning level, the data samples are stored in memory. If p_i + s_i exceeds the drift level, it is considered that there is a context change. In this situation, the model induced by the learning algorithm should be updated with the data samples stored since the time the warning level was reached. It is possible that the error increases, reaches the warning level, and then decreases again to lower levels. This situation corresponds to a false alarm: there is no context change, no action is required, and the data samples stored in memory are no longer needed. More details about the DDM method can be found in [23].
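The warning/drift logic can be sketched in a minimal DDM class. This is a hedged illustration, not the reference implementation: the class and method names are ours, and the 30-sample warm-up before any detection follows common DDM implementations rather than the text above.

```python
import math

class DDM:
    """Minimal sketch of the drift detection method.

    Tracks the error rate p with standard deviation s = sqrt(p(1-p)/i),
    stores the minima p_min, s_min of p + s, and signals a warning when
    p + s >= p_min + 2*s_min and a drift when p + s >= p_min + 3*s_min
    (the default levels used in the text)."""

    MIN_SAMPLES = 30  # warm-up before detection (common-practice assumption)

    def __init__(self, warning_level=2.0, drift_level=3.0):
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        self.i = 0
        self.errors = 0
        self.p_min = 1.0  # initialized to a positive value, as suggested
        self.s_min = 1.0

    def update(self, error):
        """error: 1 if the model misclassified the sample, else 0.
        Returns 'normal', 'warning', or 'drift'."""
        self.i += 1
        self.errors += error
        p = self.errors / self.i
        s = math.sqrt(p * (1.0 - p) / self.i)
        if self.i < self.MIN_SAMPLES or self.errors == 0:
            return "normal"
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s  # new minimum of p + s
        if p + s >= self.p_min + self.drift_level * self.s_min:
            self.reset()  # context change: start monitoring anew
            return "drift"
        if p + s >= self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "normal"
```

Feeding a stream with a low error rate followed by a burst of errors first raises the warning level and then the drift level, as described above.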
The use of the DDM algorithm embedded in a model learning algorithm can keep the dynamic system model continuously updated to the current context. For instance, DDM can be embedded in a recursive clustering algorithm. In this case, the cluster definitions are adjusted whenever a context change is detected. DDM is used to avoid the nonrobust approach of creating new clusters whenever a similarity measure threshold is violated. This mechanism gives the recursive clustering algorithm greater robustness to outliers and noise in applications where the online learning of nonstationary dynamic models is necessary.

Proposed Algorithm.
This section describes the proposed unsupervised recursive clustering algorithm with a new cluster update mechanism. The algorithm is a recursive version of the GK algorithm, inspired by the eGKL algorithm, incorporating the DDM algorithm. In the proposed algorithm, clustering is performed in online mode and, if necessary, in real time.
Assuming that there is neither a priori information about the clustering structure nor an initial set of input data samples, the proposed algorithm starts by associating the center of the first cluster v_1 with the first data sample x_1. The corresponding covariance matrix F_1, the learning rate α_1, and the number of samples associated with the first cluster M_1 are initialized as follows:

v_1 = x_1, F_1 = F_init = σI, α_1 = α_init, M_1 = 1, (4)

where I is an identity matrix of size n, σ is a small positive number (default value: σ = 10⁻²), and α_init ∈ [0, 1] is the initial learning rate (default value: α_init = 0.5).
The algorithm stops when all data samples have been processed; otherwise, a new data sample x_k is obtained and the distance between this sample and the centers of the existing clusters is computed:

D²_ik = (x_k − v_i)ᵀ F_i⁻¹ (x_k − v_i), i = 1, ..., c. (5)

The similarity between the current data sample and the existing clusters is verified by the similarity condition

D²_jk < χ²_{n,β}, where j = arg min_{i=1,...,c}(D²_ik). (6)

If similarity condition (6) is met, it is assumed that the current sample belongs to cluster j, and the cluster parameters (center, covariance matrix, learning rate, and number of samples in the cluster) are updated recursively. If similarity condition (6) is not met, it is assumed that the current sample does not belong to any existing cluster. The algorithm then increments a variable that represents the number of dissimilarities, n_dis = n_dis + 1, and the error probability p and the standard deviation s = √(p(1 − p)/k) are computed. The p and s values are stored whenever p + s reaches its lowest value during the process, yielding p_min and s_min; that is, if

p + s < p_min + s_min, (9)

then p_min = p and s_min = s. Note that, when the algorithm starts, p_min and s_min must be initialized to a positive number; a value of one is suggested for each.
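The recursive update of the winning cluster's parameters can be sketched as follows. The exact update equations are given in the paper; this decaying-learning-rate form (rate α_init/M, exponentially weighted covariance) is an assumption chosen for illustration, and the function name is ours.

```python
import numpy as np

def update_cluster(v, F, alpha, M, x, alpha_init=0.5):
    """One plausible recursive update of a cluster's parameters
    (center v, covariance F, learning rate alpha, sample count M)
    with the new sample x assigned to it."""
    M += 1
    alpha = alpha_init / M                 # decaying learning rate (assumption)
    v = v + alpha * (x - v)                # move the center toward the sample
    d = x - v
    F = (1.0 - alpha) * F + alpha * np.outer(d, d)  # update the covariance
    return v, F, alpha, M
```

Repeatedly assigning the same sample pulls the center toward it while the shrinking learning rate keeps the cluster stable against isolated outliers.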
To decide whether the current data sample x_k represents a new cluster or is just an outlier, warning and drift conditions are evaluated. The warning condition is verified as

p + s ≥ p_min + k₁ s_min, (10)

where k₁ is the warning level (default value: k₁ = 2). If the warning level is reached, the current data sample is stored in a window of samples w_l, l = 1, ..., W (where W is the current size of the window), and then the drift condition is evaluated. Otherwise, the algorithm processes the next input data sample. The drift condition is verified as

p + s ≥ p_min + k₂ s_min, (11)

where k₂ is the drift level (default value: k₂ = 3). If the drift level is reached, a new cluster is created, and the center and covariance matrix of the new cluster are determined from the samples stored in the data window (the center as the sample mean and the covariance matrix as the sample covariance of the window). The remaining parameters of the new cluster (learning rate and number of samples in the cluster) are initialized as in (4). In order to avoid the formation of redundant clusters, the similarity between clusters is checked during the update. To achieve this, the distances between the centers of the clusters are computed, i = 1, ..., c, j = 1, ..., c. If one of the following similarity conditions is met for two existing clusters i and j,

D²_ij < χ²_{n,β} or D²_ji < χ²_{n,β}, i = 1, ..., c, j = 1, ..., c,

the clusters are merged. These clusters have a hyperellipsoidal shape, defined by a mean vector, a covariance matrix, and the number of samples associated with each one. The combination of the two clusters produces a new one with parameters computed as in [36]. Algorithm 1 summarizes the proposed recursive clustering algorithm.
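The merging step can be sketched with the standard combination of two Gaussian components (sample-count-weighted mean, pooled covariance plus a between-means term); whether this matches the exact formulas cited as [36] is an assumption, and the function name is ours.

```python
import numpy as np

def merge_clusters(v1, F1, M1, v2, F2, M2):
    """Merge two hyperellipsoidal clusters (mean vector, covariance
    matrix, sample count) into a single one."""
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    M = M1 + M2
    v = (M1 * v1 + M2 * v2) / M            # weighted mean of the centers
    d = (v1 - v2).reshape(-1, 1)
    # Pooled covariance plus a term accounting for the center separation.
    F = (M1 * np.asarray(F1, dtype=float) + M2 * np.asarray(F2, dtype=float)) / M \
        + (M1 * M2 / M**2) * (d @ d.T)
    return v, F, M
```

Merging two equal-sized clusters centered at (0, 0) and (2, 0) yields a cluster at (1, 0) whose covariance is stretched along the axis joining the original centers.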

Evolving Fuzzy Classifier for Fault Diagnosis
The use of algorithms for pattern classification is present in many current applications, such as fingerprint recognition in security systems, handwriting recognition on touch screen computers, DNA sequence identification in medical diagnostic software, and fault diagnosis in industrial equipment. In this context, the problem of pattern classification consists in assigning a class or a category to each data sample from a set of "raw" data [26]. In many applications, pattern classification algorithms based on fuzzy rules have been used due to their advantages over classic pattern classification algorithms [26], especially their good prediction performance in real problems and the good transparency of linguistic rules [37], which allows an easy comprehension of the dependence between pattern characteristics.
In a fuzzy rule-based classifier, the output is the class label of the rule with the highest activation degree, y = y_{i*}, where i* = arg max_{1≤i≤R}(τ_i), R is the number of fuzzy rules, and τ_i is the activation degree of the ith fuzzy rule, defined by a t-norm, usually expressed as the product operator τ_i = μ_i1(x_1) μ_i2(x_2) ⋯ μ_in(x_n), where μ_ij are the membership functions of fuzzy sets defined by Gaussians μ_ij(x_j) = exp(−(x_j − v_ij)²/(2σ²_ij)), where v_ij and σ²_ij represent, respectively, the membership function center and variance.
To implement this fuzzy classifier architecture, clustering is usually performed in the input or input-output data space. Then, rules are created using one-dimensional (univariate) fuzzy sets, generated from the projection of the clusters onto the axis of each variable. According to [21], this approach can lead to information loss if there is interaction between variables; to avoid this, the authors propose the use of multivariate Gaussian membership functions to represent the antecedent fuzzy sets of each rule. These membership functions are described as

μ(x) = exp(−(1/2)(x − v) Σ⁻¹ (x − v)ᵀ), (21)

where v is a 1 × n central vector and Σ is an n × n symmetric positive definite matrix. The central vector is defined as the modal value and represents the typical value of μ(x), and the Σ matrix denotes the dispersion and represents its spreading. In this case, each cluster found by the clustering algorithm is associated with a fuzzy rule, and the parameters of the multivariate Gaussian membership function are defined as the parameters of the corresponding cluster. If multivariate Gaussian membership functions are used, the fuzzy classifier will have a rule set in which A_i is the fuzzy set with multivariate Gaussian membership function (21) of the ith fuzzy rule, with parameters extracted from the corresponding cluster (22).
Usually, more than one rule may be needed to describe a class; for example, the class can be multimodal. In this case, a single rule may not be sufficient to describe all possible variations of the same class. Thus, the fuzzy classifier aggregates the outputs of rules associated with the same class using an s-norm, usually the maximum operator. The result of this aggregation is the degree of relevance of each known class. The classification of each new sample x_k is defined by the class with the highest relevance degree.
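Classification with multivariate Gaussian membership functions and per-class max aggregation can be sketched as follows. The representation of a rule as a (center, dispersion matrix, class label) tuple and the function names are illustrative choices, not the paper's data structures.

```python
import numpy as np

def mv_gaussian_membership(x, v, Sigma):
    """Multivariate Gaussian membership (21):
    mu(x) = exp(-(1/2) (x - v) Sigma^-1 (x - v)^T)."""
    d = np.asarray(x, dtype=float) - np.asarray(v, dtype=float)
    return float(np.exp(-0.5 * d @ np.linalg.inv(np.asarray(Sigma, dtype=float)) @ d))

def classify(x, rules):
    """rules: list of (v, Sigma, class_label) tuples, one per fuzzy rule.
    Activations of rules sharing a class label are aggregated with max
    (an s-norm); the winning class has the highest aggregated degree."""
    scores = {}
    for v, Sigma, label in rules:
        mu = mv_gaussian_membership(x, v, Sigma)
        scores[label] = max(scores.get(label, 0.0), mu)
    return max(scores, key=scores.get)
```

A multimodal class is simply represented by several rules carrying the same label; the max aggregation makes any one of its modes sufficient to win.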
In some pattern classification applications, the classes of the data samples are not known a priori. In these situations, the use of an unsupervised learning process is required for the classifier implementation. Moreover, in applications where pattern classification should be performed in real time, learning should be performed using incremental algorithms, processing each data sample only once, as a data stream. To address these problems, the solution is to use a recursive clustering algorithm.
In this paper, we propose an evolving fuzzy classifier based on the recursive clustering algorithm with drift detection presented in Section 2.3, which allows the creation of a fuzzy rule base in online mode and, if necessary, in real time from input data samples. This approach differs from the ones employed in traditional fuzzy classifiers, which require some training (usually supervised) conducted in offline mode.
The proposed classifier updates the rule base using the output of the recursive clustering algorithm described in the previous section. For each new input data sample, if a new cluster is created, a new fuzzy rule (22) is added to the rule base, where the cluster center and covariance matrix are used as parameters of the multivariate Gaussian membership function of the antecedent. The rule consequent (the crisp output corresponding to the class label) must be defined by experts or system operators, since in unsupervised learning processes incoming online samples are usually not prelabelled. If a cluster is updated, the corresponding fuzzy rules are updated, the class label is determined as the consequent of the fuzzy rule with the highest activation degree, and user intervention is not necessary. It should be noted that both the number of rules and the number of classes are determined during the evolving process; it is not necessary to set these parameters a priori. Algorithm 2 summarizes the procedures of the classifier. The application of the proposed classifier for fault diagnosis is illustrated in Figure 1. Data samples are obtained from a dynamic system in a continuous stream, usually provided by sensors that monitor the process. These data might require the use of preprocessing techniques for feature extraction.
The classifier starts with an empty rule set. Rules are created as the recursive clustering algorithm creates clusters to represent the data stream. Each rule is related to a class, and each class is related to a dynamic system condition, representing normal operation or a faulty condition. When a new rule is created, the system operator is notified and informs the label of the class that defines it as normal operation or as a specific fault. All of the necessary diagnostic information, the fuzzy rules, and the class labels are stored in a unified database and updated while the system is used.
After an initial period of operation, the database contains the set of fuzzy rules and class labels defined so far. When a new data sample is associated with an existing cluster, the classifier updates the corresponding fuzzy rule and classifies the dynamic system condition with the label present in the consequent of the fuzzy rule with the highest activation degree. In this situation, system operator intervention is not required, and the classification of the dynamic system condition is performed automatically.
The main characteristic of the classifier proposed in this work is the ability to diagnose faults in complex nonstationary dynamic systems. The classifier does not require any a priori information about the dynamic model, nor historical process data. This allows the classifier to construct a rule base in an evolving way and, with the aid of the operator, to learn to diagnose faults as they occur. Thus, the proposed classifier is able to adapt to the dynamic system, making it possible to diagnose faults not previously known.

Experiments and Results
The proposed classifier was evaluated for fault diagnosis in a DC drive system. A fault simulator was used in this evaluation, from which normal operation data and fault data were generated and organized in random sequences of different operation modes. The output of the classifier was compared with the provided sequence to assess its efficiency in detecting and classifying faults.

DC Drive System.
The DC drive system model employed was proposed by [38] and consists of a benchmark for fault detection and diagnosis. As illustrated in Figure 2, the system comprises two power supplies, two controlled static converters, a direct current machine, and a mechanical load. The variables shown in the representation of the system are defined as follows: (i) V_a: voltage of the armature circuit; (ii) V_fd: voltage of the field circuit; (iii) I_a: current of the armature circuit; (iv) I_fd: current of the field circuit; (v) R_a, L_a: resistance/inductance of the armature circuit; (vi) R_fd, L_fd: resistance/inductance of the field circuit; (vii) E_a: counter-electromotive force of the armature; (viii) T_em: electromagnetic torque; (ix) T_l: torque required by the mechanical load.
Using this benchmark, it is possible to simulate faults in the actuators (armature and field converters), in the plant or process (machine and mechanical load), and in the sensors (current and speed meters), as detailed in Table 1. The fault simulations employed 750 V power supplies, constant-speed operation, and an overload of 25% of nominal torque applied at half of the simulation interval. A sampling period of 2 ms was used for the monitored variables. Figures 3, 4, 5, 6, and 7 show, as an example, the curves of the armature current (I_a), field current (I_fd), and machine speed (ω) in fault simulations. In this case, the following faults were simulated: armature converter disconnection, field converter short-circuit, armature turns short-circuit, bearing lubrication fault, and field current sensor fault. At the beginning of each simulation, the system is working under normal operation.
In Figure 8, the same faults presented in the previous figures are shown in three-dimensional space, where it can be observed that some faults have an abrupt behavior while others have an incipient behavior.

Fault Diagnosis.
The fault diagnosis experiments were performed considering different scenarios. Each scenario consists of the simulation of a sequence of 3 to 11 fault types randomly selected from the set of faults, with periods of normal operation between faults. In order to assess the robustness of the proposed classifier to the presence of noise in the data, random Gaussian noise with zero mean and standard deviation equal to 2% of the nominal value of each monitored variable (considering normal operation of the system) was added to that variable.
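The noise injection described above can be sketched as follows; the function name and the nominal values used in the example are illustrative, not taken from the benchmark:

```python
import numpy as np

rng = np.random.default_rng(42)

def add_measurement_noise(samples, nominal_values, ratio=0.02, rng=rng):
    """Corrupt each monitored variable with zero-mean Gaussian noise whose
    standard deviation is `ratio` (here 2%) of that variable's nominal value."""
    samples = np.asarray(samples, dtype=float)
    sigma = ratio * np.asarray(nominal_values, dtype=float)
    # sigma broadcasts column-wise: one noise level per monitored variable
    return samples + rng.normal(0.0, sigma, size=samples.shape)

# Illustrative nominal values for (armature current, field current, speed)
noisy = add_measurement_noise(np.zeros((2000, 3)), [10.0, 1.0, 150.0])
```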
Data samples of the monitored variables of the DC drive system, armature current (I_a), field current (I_fd), and machine speed (ω), were provided as inputs to the classifier in online mode, and, in each sequence, the classifier output was compared to the provided sequence. Since the classifier starts with an empty fuzzy rule set and the first data samples correspond to normal operation of the system, the first rule created describes the normal operation. After that, during the diagnosis, faults are detected and new rules that describe each type of fault are created. The label of each class, which defines it as normal operation or fault, is provided by the system operator at the time each new rule is created. Following the first occurrence of a fault, operator intervention is no longer needed for that fault. For the experiments, the parameters of the recursive clustering algorithm were defined as follows: chi-square threshold χ²_{3,0.95} = 7.8147; Σ_init = 10⁻²; r_init = 0.5; m₁ = 2; m₂ = 3.
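The distance threshold above can be checked numerically: 7.8147 is the 0.95 quantile of the chi-square distribution with 3 degrees of freedom (one per monitored variable). A minimal sketch, assuming SciPy is available:

```python
from scipy.stats import chi2

# 0.95 quantile of the chi-square distribution with 3 degrees of freedom,
# used as the threshold for assigning a sample to an existing cluster.
threshold = chi2.ppf(0.95, df=3)
print(round(threshold, 4))  # 7.8147
```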
Figures 9, 10, 11, 12, and 13 show the results of fault diagnosis in each of the simulated scenarios, comparing the estimated output (classified fault sequence) of the proposed classifier with the desired output (selected fault sequence) for the input data samples. The results show that the classifier was able to correctly diagnose all the DC drive system faults. Despite the presence of noise in the data samples, the occurrence of false alarms or misclassifications (represented by isolated points on the graphs) is significantly low, even in the scenario with the highest number of possible faults. In this work, the classifier performance was evaluated in terms of fault detection and fault classification, as suggested in [3]. Three metrics were calculated in the fault detection evaluation.
(iii) Accuracy (ACC): it measures the effectiveness of the algorithm in correctly distinguishing between a fault-present condition and a fault-free condition:

ACC = (D + CR) / (D + FA + MF + CR),

where D represents the number of detected faults, FA the number of false alarms, MF the number of missed faults, and CR the number of correct rejections. Regarding fault classification evaluation, the fault isolation rate (FIR) metric was used, expressing the percentage of all faults that the classifier is able to isolate unambiguously:

FIR = 100 ⋅ CC / (CC + IC),

where CC represents the total of detected and correctly classified faults and IC the total of detected and incorrectly classified faults.
Other metrics used to assess the performance of the classifier are as follows. All results of the fault diagnosis experiments with the DC drive system obtained with the classifier proposed in this work were compared to the results obtained using the classifier proposed by [10]. For the experiments, the parameters of this alternative classifier were set to 100, 0.001, 0.01, and 0.01, respectively.
Table 2 summarizes the results for both classifiers using the fault detection metrics described. The results show that the classifier proposed in this work achieves higher fault detection rates and accuracy in all scenarios, with values above 99%, together with low false alarm rates. These results confirm the efficiency of the algorithm in detecting the simulated faults in the DC drive system. Despite its lower fault detection rates and lower accuracy, the classifier proposed by [10] did not show any false alarms.
Table 3 summarizes the results for both classifiers using the fault classification metrics described. The results show that the classifier proposed in this work presented a higher fault isolation rate in all scenarios, with an average value of approximately 97%. In all scenarios, the operator intervention rate in fault classification was less than or equal to 0.05%. These results show the ability of the classifier to automatically diagnose almost all faults after their first occurrence, and they also reveal its ability to learn. Note that the classifier proposed by [10] in general had a lower fault classification performance than the proposed classifier and needed more operator interventions.
Table 4 summarizes the results for both classifiers using the time metrics for fault detection and classification. The average fault detection time found in the experiments with the classifier proposed in this work was approximately 0.060 s, which is primarily determined by the amount of data samples required by the recursive clustering algorithm to detect a context change. The average fault isolation time found in the experiments was approximately 0.009 s. Another experiment was conducted to evaluate the robustness of the proposed classifier to the presence of outliers in the data. In this experiment, a scenario of 5 faults was simulated, and outliers were inserted in the data samples; that is, some samples were corrupted with high-variance noise. Figure 14 shows the fault simulation results in the presence of outliers in three-dimensional space. Figure 15 shows the results of fault diagnosis in this scenario.
The fault diagnosis results for this experiment show that, even in the presence of outliers, the proposed classifier was able to correctly detect and diagnose all the faults considered; that is, it was able to correctly distinguish between outliers and valid data samples. The results of this experiment are presented in Tables 5 and 6. Analysing these tables, one can note that the proposed classifier has virtually the same fault diagnosis performance in the absence or presence of outliers, apart from an increase in the false alarm rate. This experiment showed the greater robustness of the classifier proposed in this work compared with the classifier proposed by [10], since the latter showed major differences in false alarm and fault isolation rates between the scenarios with and without outliers.

Conclusions
In this work, we presented an evolving fuzzy classifier for fault diagnosis of complex nonstationary dynamic systems. The proposed classifier is composed of a set of fuzzy rules created and updated by a recursive unsupervised clustering algorithm. In this algorithm, a new mechanism for cluster updating based on a drift detection method is employed. With this mechanism, the update of a cluster depends not only on the similarity measure between data samples and clusters, but also on the monitoring of the data context. This feature gives the proposed classifier robustness to outliers and noise, as suggested by the experimental results. Multivariate Gaussian membership functions, whose parameters are extracted directly from the clusters, are used in the fuzzy rule antecedents. This multivariate approach avoids the loss of information due to possible interactions between the input variables.
The classifier proposed in this work was evaluated in fault diagnosis experiments performed with a DC drive system model. The experiments showed that the classifier was able to detect and classify all faults with high performance, even in the presence of outliers and noise. The low false alarm rate and the high fault isolation rate obtained in all experiments showed that the recursive clustering algorithm with the drift detection method was able to efficiently distinguish valid data samples from invalid ones. Moreover, the proposed classifier was able to automatically diagnose almost all faults, requiring operator intervention in a small percentage of cases. This demonstrates the advantage of the continuous and incremental learning of the classifier over classifiers that require retraining whenever an unknown type of fault is found.
Considering the presented features, the classifier proposed in this work has as advantages the ability to learn from faults online and in real time, the ability to adapt to changes in the dynamic system, and robustness to the presence of outliers and noise in the input data. In summary, the proposed classifier has shown to be a promising alternative for fault diagnosis in complex nonstationary dynamic systems, where other methods prove inefficient or less advantageous because of the characteristics of such systems. In future work, we will investigate the application of the proposed algorithm in a real-time equipment diagnosis and prognosis system.

Figure 1: Fault diagnosis with an evolving fuzzy classifier.

Figure 2: Representation of the DC drive system.

Figure 9: Desired output and estimated output by proposed classifier in a scenario of 3 faults.

Figure 10: Desired output and estimated output by proposed classifier in a scenario of 5 faults.

Figure 11: Desired output and estimated output by proposed classifier in a scenario of 7 faults.

Figure 12: Desired output and estimated output by proposed classifier in a scenario of 9 faults.

Figure 13: Desired output and estimated output by proposed classifier in a scenario of 11 faults.
(i) Probability of detection (POD): it assesses the detected faults over all actual fault cases (sensitivity):

POD = D / (D + MF),

where D represents the number of detected faults and MF the number of missed faults.

(ii) Probability of false alarm (POFA): it considers the proportion of all fault-free cases that trigger a fault detection alarm:

POFA = FA / (FA + CR),

where FA represents the number of false alarms and CR the number of correct rejections.
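These two detection metrics can be sketched from the same event counts (function and argument names are illustrative):

```python
def probability_of_detection(detected, missed):
    """POD (sensitivity): detected faults over all actual fault cases."""
    return detected / (detected + missed)

def probability_of_false_alarm(false_alarms, correct_rejections):
    """POFA: proportion of fault-free cases that wrongly trigger an alarm."""
    return false_alarms / (false_alarms + correct_rejections)

print(probability_of_detection(98, 2))      # 0.98
print(probability_of_false_alarm(1, 99))    # 0.01
```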
(i) Detection delay time (DDT): the time lag between the first occurrence of a given fault and its detection by the algorithm.
(ii) Isolation delay time (IDT): the time lag between the second occurrence of a given fault and its classification by the algorithm.
(iii) Operator intervention rate (OIR): the percentage of faults classified with the intervention of the operator.
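DDT can be computed from per-sample label sequences; a minimal sketch, assuming integer class labels with 0 denoting normal operation and the 2 ms sampling period used in the experiments:

```python
def detection_delay_time(true_labels, predicted_labels, sample_period=0.002):
    """DDT: time between the first sample at which a fault is present
    (true label != 0) and the first sample at which the classifier
    flags a fault, in seconds."""
    onset = next(i for i, y in enumerate(true_labels) if y != 0)
    detected = next(i for i in range(onset, len(predicted_labels))
                    if predicted_labels[i] != 0)
    return (detected - onset) * sample_period

# Fault appears at sample 2, flagged at sample 4 -> delay of 2 samples
print(detection_delay_time([0, 0, 1, 1, 1, 1], [0, 0, 0, 0, 1, 1]))  # 0.004
```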
Input: χ²_{n,0.95}, Σ_init, r_init, m₁, m₂; Output: cluster centers and covariances. Read the first data sample x₁ and initialize the first cluster. For k = 2, 3, . . .: update the dissimilarity number n_dis; compute the drift statistics p_k and s_k; if p_k + s_k < p_min + s_min, then update p_min and s_min; if p_k + s_k > p_min + m₁ ⋅ s_min, then store x_k in the data window; if p_k + s_k > p_min + m₂ ⋅ s_min, then a context change (drift) is signalled.

The fuzzy rules of the classifier have the form

R_i: IF x₁ IS A_{i1} AND . . . AND x_n IS A_{in} THEN y_i = c_i, (17)

where [x₁, . . ., x_n] are the input variables (input pattern of dimensionality n); [A_{i1}, . . ., A_{in}] are the antecedent fuzzy sets of the i-th fuzzy rule; and y_i is the crisp output corresponding to the class label c_i from the set {1, . . ., C}, where C is the number of classes. The classification of each new input data sample x_k is obtained by assigning to it the label of the class associated with the rule having the highest activation degree.
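The winner-takes-all classification step can be sketched as follows; the exact multivariate Gaussian membership expression and the rule representation are assumptions consistent with the antecedents described in the text:

```python
import numpy as np

def activation_degree(x, center, cov):
    """Multivariate Gaussian membership: degree to which sample x
    belongs to the cluster behind one fuzzy rule."""
    d = x - center
    return float(np.exp(-0.5 * d @ np.linalg.inv(cov) @ d))

def classify(x, rules):
    """Assign x the label of the rule with the highest activation degree."""
    return max(rules, key=lambda r: activation_degree(x, r["center"], r["cov"]))["label"]

# Two illustrative rules: one for normal operation, one for a fault class
rules = [
    {"center": np.zeros(2), "cov": np.eye(2), "label": "normal"},
    {"center": np.array([5.0, 5.0]), "cov": np.eye(2), "label": "fault1"},
]
print(classify(np.array([4.8, 5.1]), rules))  # fault1
```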

Table 1: Types of faults on DC drive system.

Table 4: Fault detection and classification time.

Table 5: Fault detection performance with outliers presence.

Table 6: Fault classification performance with outliers presence.