Vibration-Based Adaptive Novelty Detection Method for Monitoring Faults in a Kinematic Chain

1 MCIA Research Center, Department of Electronic Engineering, Technical University of Catalonia (UPC), Rbla. San Nebridi 22, Gaia Research Building, Terrassa, 08222 Barcelona, Spain
2 CA Mecatronica, Facultad de Ingenieria, Campus San Juan del Rio, Universidad Autonoma de Queretaro, Rio Moctezuma 249, Col. San Cayetano, 76807 San Juan del Rio, QRO, Mexico
3 CA Telematica, DICIS, Universidad de Guanajuato, Carr. Salamanca-Valle km 3.5 + 1.8, Palo Blanco, 36885 Salamanca, GTO, Mexico


Introduction
Condition-based maintenance has become accepted by the industrial manufacturing sector as a key factor to achieve the currently required high-competitiveness levels. In the field of electromechanical systems monitoring, an increasing number of health monitoring schemes have been proposed during the last decades [1][2][3]. These methodologies have been demonstrated to be a reliable option as fault detection techniques when data of the possible faults is available.
Nevertheless, in order to meet the practical requirements of industrial implementation, some considerations must be taken into account. There is a need for methodologies capable of monitoring complicated industrial machinery, that is, diagnosis approaches capable of considering complete electromechanical kinematic chains including electric motors, gearboxes, and additional mechanical elements. It is important to mention that information regarding the possible faults that could occur on complicated machines is not usually available, because obtaining it is costly and requires a great amount of time.
Unexpected events in the form of faulty scenarios, not initially considered, or deviations over the normal operation take place during the useful life of the machinery. These situations must be identified in order to avoid misclassifications and the consequent incorrect maintenance action. To cope with these challenging scenarios, the condition-based maintenance methods must be able to work under the assumption that only information of the healthy condition of the machine is available [4]. Another relevant characteristic is the capacity to incorporate new knowledge. Since new faults could appear during monitoring, the condition-based maintenance methods must also be able to adapt and include new scenarios to their initial knowledge [5,6].
These challenging scenarios are not always taken into consideration in condition monitoring methodologies; therefore, the application of classical methods in an industrial environment is limited or nonviable.
In the field of electromechanical system monitoring, a great number of health monitoring schemes based on vibrations have been proposed during the last years [7][8][9][10]. Mainly, these proposals can be divided into two categories regarding the available information of the monitored machine and the classification objective.
The first category, multifault detection, consists of the development of a monitoring system capable of identifying multiple previously known faults [1], which implies that the information available to train the monitoring system includes measurements of the healthy state of the machine but also measurements of the machine working under different faulty conditions. Over the last years, instead of looking for highly specific features for a component fault, there has been a trend towards the calculation of more general statistical features and the fusion of information to enhance the performance of the diagnosis system [11]. Yet, this approach is mostly carried out at the laboratory scale, with successful results where controlled faults are provoked on the machine.
However, the information of the possible faults that could appear on a machine is not always available; consequently, a different approach must be considered if only information of the healthy condition of the machine is available. The second category corresponds to novelty detection, also known as anomaly detection, and it represents a solution to cope with the lack of information available of the monitored machine. The objective of novelty detection is to detect novel events that differ in some manner from the information on which the model was trained. The novelty detection framework has been successfully applied to network intrusion, medical diagnosis, image segmentation and handwritten digit recognition, fault detection, and condition monitoring [4,6,12-14]. Special effort has been made to improve condition monitoring novelty detection applications based on vibrations. Ma et al. proposed a novelty detection approach to rotating machinery by means of monitoring the thresholds obtained from an extreme value distribution [15]. McBain and Timusk proposed a feature extraction methodology for novelty detection applied to a rotating mechanical system [16].
Most of the methodologies for novelty detection are limited to a static analysis and the incorporation of the novel information to the novelty detection system is not usually considered. An approach to include adaptability to the novelty framework, based on vibrations, is proposed by Wang et al. [17]. The proposed monitoring scheme includes testing data on the boundary of the novelty model, based on Support Vector Data Description, and retrains the model with this information to gain robustness. Nevertheless, this approach does not take into consideration the possibility to include novel scenarios during the monitored process.
Other approaches to develop an adaptive condition monitoring scheme were presented by Filev et al., where a practical framework for autonomous monitoring of industrial equipment based on novelty detection is analyzed [18], and by Costa et al., where a two-stage algorithm for real-time fault detection and identification is presented [19]. Both approaches provide the opportunity to incorporate newly detected faults into the monitoring system; nevertheless, in both methods, the incorporation is limited to updating the known database, and an adaptation of the numerical features analyzed is not considered.
The performance of a novelty detection system is highly dependent on the numerical features considered. When there is no previous information of the possible faults that can occur, the application of a suitable numerical features analysis strategy represents a critical challenge [4]. Consider a continuous monitoring framework where the initial information available is the healthy operating condition and, later on, different faults are identified progressively as the machine condition deteriorates: none of the approaches previously discussed modifies the initial set of features when new information of faults is incorporated. This static approach has the advantage of simplifying online adaptation; nevertheless, analyzing the information of the faults detected during the monitoring process could help to identify a more adequate set of features to discriminate the possible upcoming or already detected fault scenarios.
To improve the performance of the novelty detection task, a dynamic approach for reselection of the features, including information of newly detected faults, is proposed in this work. To the best of the authors' knowledge, the study of the dynamic change of the feature space by considering new operating scenarios is novel in fault diagnosis applications for electromechanical systems.
Thereby, the contribution of this study is to provide an adaptive methodology for novelty detection where the information of faults identified during the monitoring process is exploited to improve the anomaly detection task. The methodology is divided into two recursive stages: first, an offline stage for initialization and retraining of the feature reduction and novelty detection modules; second, an online monitoring stage to continuously assess the condition of the machine. The novelty model employed is a one-class Support Vector Machine (OC-SVM) [20], which is a domain based model successfully employed in vibration signal monitoring [4]. Contrary to classical static feature reduction approaches, the proposed method reformulates the features by employing first a Laplacian Score ranking and then the Fisher Score ranking for retraining.
The proposed methodology is tested using real data from a laboratory based kinematic chain, where faults are induced, and the vibration signals from two accelerometer axes are monitored to assess the system condition. To highlight the benefits of the methodology proposed, the adaptive feature reduction scheme is compared with classical feature reduction approaches.
This paper is organized as follows. In Section 2, theoretical aspects of the proposed method are described. Section 3 describes the method presented. The experimental setup employed for assessment and the results obtained are presented and discussed in Section 4. Finally, conclusions and future work are summarized in Section 5.

Theoretical Background
One-Class Support Vector Machine.
As mentioned, the objective of novelty detection is to detect abnormal events that differ in some manner from the information on which the model was trained. The one-class Support Vector Machine (OC-SVM), which is a domain based model successfully employed in condition monitoring applications, encloses the training data by means of a boundary or threshold which represents the limit of normality. The OC-SVM admits a kernel formulation, providing it with enough flexibility to adapt to the distribution of the training data. To assess the condition of the machine under test, new measurements are analyzed regarding their position relative to the boundary created in the training phase. The test measurements lying inside the boundary are considered "normal" or "known," while the measurements lying outside the boundary are considered "outlier" or "novel." Compared to other novelty detection techniques, the OC-SVM is robust to outliers during the training phase but has the disadvantage of presenting several configuration parameters, which are described later in this section.
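The OC-SVM was proposed by Schölkopf et al. for estimating the support of a high-dimensional distribution [20]. The OC-SVM classification objective is to separate one class of target samples from all other class samples. In this type of problem, one class, called the target class, is characterized properly, while for the other class, usually, no measurements are available. Consider $X = [x_1, \ldots, x_n]^T \in \mathbb{R}^{n \times d}$, which denotes the normal data set, where $x_i$, $i = 1, \ldots, n$, denotes the training samples (available measurements), each characterized by $d$ numerical features. Then, in order to obtain the boundary, an optimization model is considered as follows:
$$\min_{w,\,\xi,\,\rho} \; \frac{1}{2}\|w\|^2 + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho \quad \text{s.t.} \quad w \cdot \Phi(x_i) \ge \rho - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, n, \tag{1}$$
where $\nu$ is a regularization parameter and $\xi_i$ is the slack variable for the point $x_i$. The constants $w$ and $\rho$ are the normal vector and offset of the hyperplane, respectively. Thus, the decision boundary can be formulated as
$$f(x) = \operatorname{sgn}\left(w \cdot \Phi(x) - \rho\right), \tag{2}$$
where $x \in \mathbb{R}^d$ and $\Phi$ is a higher dimensional projection vector. For the classification problem of two categories, the data sets are not always linearly separable in the original space; then, $\Phi$ projects the original data sets into a higher dimensional space, the so-called feature space, where the data sets can be linearly separable. However, $\Phi$ is not explicit in practical applications, and only the dot product $\Phi(x_i) \cdot \Phi(x_j)$ needs to be known. $K$ represents the kernel function $\Phi(x_i) \cdot \Phi(x_j)$. The most commonly used kernel function is the Gaussian:
$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right). \tag{3}$$
In order to solve optimization problem (1), Lagrange multipliers $\alpha_i \ge 0$ and $\beta_i \ge 0$ ($i = 1, \ldots, n$) are introduced and the Lagrange equation is formed as
$$L(w, \xi, \rho, \alpha, \beta) = \frac{1}{2}\|w\|^2 + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho - \sum_{i=1}^{n}\alpha_i\left(w \cdot \Phi(x_i) - \rho + \xi_i\right) - \sum_{i=1}^{n}\beta_i \xi_i. \tag{4}$$
The partial derivatives of the Lagrangian equation with respect to $w$, $\xi$, and $\rho$ are set to zero. Then, $w$ and $\alpha_i$ can be formulated as
$$w = \sum_{i=1}^{n}\alpha_i \Phi(x_i), \qquad \alpha_i = \frac{1}{\nu n} - \beta_i, \qquad \sum_{i=1}^{n}\alpha_i = 1. \tag{5}$$
Substituting (5) into Lagrangian equation (4), its dual form is presented as
$$\min_{a} \; \frac{1}{2} a^T K a \quad \text{s.t.} \quad 0 \le \alpha_i \le \frac{1}{\nu n}, \quad \sum_{i=1}^{n}\alpha_i = 1, \tag{6}$$
where $a = [\alpha_1, \ldots, \alpha_n]^T$ and $K$ is the kernel matrix, whose element $K_{ij}$ can be expressed as
$$K_{ij} = K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j). \tag{7}$$
Solving optimization problem (6) yields $a$, and then $\rho$ can be given as
$$\rho = \frac{1}{n_s}\sum_{i=1}^{n_s}\sum_{j=1}^{n}\alpha_j K(x_j, x_i), \tag{8}$$
where $n_s$ is the number of Support Vectors, over which the outer sum runs.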
Tax and Duin proposed another form of OC-SVM which is called Support Vector Data Description (SVDD) [21]. The basic idea of SVDD is to construct a minimum-volume hypersphere in a high-dimensional feature space to enclose as many normal data points as possible. The two forms have equivalent solutions if the diagonal entries of the kernel matrix $K$ are equal to a constant.
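As an illustration of the dual problem (6) and the offset (8), the following minimal Python sketch solves the dual approximately with pairwise coordinate updates and evaluates the decision value $\sum_j \alpha_j K(x_j, x) - \rho$. It is not the solver used in this work: the function names, the synthetic two-dimensional "healthy" data, the values of `nu`, `sigma`, and `sweeps`, and the Gaussian kernel parametrization with $2\sigma^2$ are all assumptions made for the example. The offset is averaged over the Support Vectors lying strictly inside the box constraints when such vectors exist.

```python
import math
import random


def gaussian_kernel(x, y, sigma):
    """Gaussian kernel K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def train_ocsvm(X, nu=0.1, sigma=1.0, sweeps=50):
    """Approximately minimise 0.5 a^T K a with 0 <= a_i <= 1/(nu n), sum(a) = 1."""
    n = len(X)
    upper = 1.0 / (nu * n)
    K = [[gaussian_kernel(xi, xj, sigma) for xj in X] for xi in X]
    alpha = [1.0 / n] * n  # feasible start as long as nu <= 1
    grad = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]  # K a
    for _ in range(sweeps):
        for i in range(n):
            for j in range(i + 1, n):
                curvature = K[i][i] + K[j][j] - 2.0 * K[i][j]
                if curvature < 1e-12:
                    continue
                # Move weight d from alpha_j to alpha_i so the sum stays 1,
                # then clip d so both multipliers remain inside [0, upper].
                d = (grad[j] - grad[i]) / curvature
                d = max(d, -alpha[i], alpha[j] - upper)
                d = min(d, upper - alpha[i], alpha[j])
                alpha[i] += d
                alpha[j] -= d
                for k in range(n):
                    grad[k] += d * (K[k][i] - K[k][j])
    tol = 1e-8
    margin = [i for i in range(n) if tol < alpha[i] < upper - tol]
    support = margin or [i for i in range(n) if alpha[i] > tol]
    rho = sum(grad[i] for i in support) / len(support)  # offset as in (8)
    return alpha, rho


def decision(x, X, alpha, rho, sigma=1.0):
    """Positive inside the boundary (known), negative outside (novel)."""
    return sum(a * gaussian_kernel(xi, x, sigma) for a, xi in zip(alpha, X)) - rho


# Synthetic stand-in for healthy-condition feature vectors (assumption).
rng = random.Random(0)
healthy = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(30)]
alpha, rho = train_ocsvm(healthy, nu=0.1, sigma=1.0)
far_score = decision((100.0, 100.0), healthy, alpha, rho)
```

A point far from every training sample has kernel values close to zero, so its decision value approaches $-\rho$ and it is reported as novel.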

Dimensionality Reduction Techniques.
Working with high-dimensional data sets could complicate the learning part of novelty detection methods, not only because of the possible presence of noise and redundancy in the data but for other reasons as well [22]. The empty space phenomenon states that, to cover the whole space, a number of samples that grows exponentially with dimensionality is needed. The curse of dimensionality implies that, in order to learn successfully, a number of training examples that also grows exponentially with the dimensionality is needed. The "concentration of measure" phenomenon seems to render distance measures not relevant to whatever concept is to be learnt as the dimension of the data increases. For these reasons, among others, there is a necessity to apply dimensionality reduction techniques in novelty detection applications. Dimensionality reduction strategies differ in whether the learning process is supervised or unsupervised. The difference between both learning processes is the availability of labels to distinguish the different classes.
Principal Component Analysis (PCA) is one of the most commonly used techniques for unsupervised dimensionality reduction [23]. It aims to find the linear projections that best capture the variability of the data.
Another well-known dimensionality reduction technique is the Laplacian Score (LS), where the merit of each feature is measured according to its locality preservation power [24]. A nearest neighbor based graph is constructed from the training set and analyzed to rank each feature individually according to a weighting approach selected for the graph edges. If the labels of the classes of the analyzed data set are given, the method changes to a supervised approach and the graph is created based on these labels, instead of the nearest neighbor approach. Two variations for weighting the edges can be applied: Heat Kernel and Simple minded. To rank each feature, its Laplacian Score is computed, which measures the extent to which the analyzed feature preserves the structure present in the graph, divided by the variance of the feature. For a feature to be selected, it must have a low LS, which implies high variance and locality.
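The ranking described above can be sketched in a few lines of Python. This is a hedged illustration of the unsupervised Laplacian Score with a symmetrized nearest neighbor graph and "simple minded" (binary) edge weights, following the usual formulation of [24]; the function name `laplacian_scores`, the toy data, and the choice of two neighbors in the usage example are assumptions for the example.

```python
import math


def laplacian_scores(X, k=3):
    """Laplacian Score of each feature; lower scores rank first."""
    n, d = len(X), len(X[0])

    def dist(a, b):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

    # S[i][j] = 1 if i is among the k nearest neighbours of j, or vice versa.
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist(X[i], X[j]))
        for j in others[:k]:
            S[i][j] = S[j][i] = 1.0
    degree = [sum(row) for row in S]  # diagonal of D

    scores = []
    for r in range(d):
        f = [x[r] for x in X]
        # Remove the degree-weighted mean of the feature.
        mean = sum(fi * di for fi, di in zip(f, degree)) / sum(degree)
        f = [fi - mean for fi in f]
        # f^T L f with L = D - S equals half the sum of S_ij (f_i - f_j)^2.
        num = 0.5 * sum(S[i][j] * (f[i] - f[j]) ** 2
                        for i in range(n) for j in range(n))
        den = sum(di * fi ** 2 for fi, di in zip(f, degree))  # f^T D f
        scores.append(num / den if den > 0 else float("inf"))
    return scores


# Toy data: feature 0 follows the two-cluster structure, feature 1 alternates.
toy = [(0.0, 1.0), (0.1, -1.0), (0.2, 1.0),
       (10.0, -1.0), (10.1, 1.0), (10.2, -1.0)]
toy_scores = laplacian_scores(toy, k=2)
ranking = sorted(range(len(toy_scores)), key=lambda r: toy_scores[r])
```

In this toy case the first feature varies little between neighboring samples relative to its variance, so it obtains the lower score and is ranked first.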
Linear discriminant analysis (LDA) is one of the most well-known supervised techniques for linear dimensionality reduction in multiclass problems [25]. LDA attempts to maximize the linear separation between data points belonging to different classes. In contrast to most of the other dimensionality reduction techniques, LDA, as a feature extraction technique, finds a linear mapping that maximizes the linear class separation in the low-dimensional representation of the data. The criteria that are used to formulate linear class separation in LDA are the within-class scatter and the between-class scatter.
Another variant based on the Fisher coefficient is a feature dimensionality reduction approach called Fisher Score (FS), where the objective is to find a subset of features which maximizes the Fisher coefficient [26]. The main difference between LDA and Fisher Score is that LDA is a feature extraction approach and Fisher Score is a feature selection approach; that is, instead of extracting new features from the subspace obtained, FS ranks the features and selects the reduced subset of features which maximizes the dimensionality reduction criterion, in this case, the Fisher criterion.
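A per-feature Fisher Score can be illustrated as follows. The sketch uses a common form of the criterion, the ratio of between-class scatter to within-class scatter computed feature by feature; the exact expression used in [26] may differ in normalization, and the function name and toy data are assumptions for the example.

```python
def fisher_scores(X, labels):
    """Fisher Score of each feature; higher scores rank first."""
    d = len(X[0])
    classes = sorted(set(labels))
    scores = []
    for r in range(d):
        column = [x[r] for x in X]
        overall_mean = sum(column) / len(column)
        between, within = 0.0, 0.0
        for c in classes:
            values = [v for v, y in zip(column, labels) if y == c]
            mean_c = sum(values) / len(values)
            var_c = sum((v - mean_c) ** 2 for v in values) / len(values)
            between += len(values) * (mean_c - overall_mean) ** 2
            within += len(values) * var_c
        scores.append(between / within if within > 0 else float("inf"))
    return scores


# Toy data: feature 0 separates the two classes, feature 1 does not.
toy_X = [(0.0, 5.0), (1.0, 6.0), (10.0, 5.0), (11.0, 6.0)]
toy_y = [0, 0, 1, 1]
toy_fs = fisher_scores(toy_X, toy_y)
```

Selecting the two or three features with the highest scores then gives the reduced feature set, whereas LDA would instead build new features as linear combinations of all of them.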
The aforementioned linear feature reduction techniques have a different combination of objective (variance preservation, topology preservation, and discrimination) and method of employment (unsupervised or supervised and extraction or selection) and have been widely used in the literature with successful results [16,22]; yet, they have been applied under a static framework where the set of features is reduced initially and no possible adaptation during the monitoring phase is considered.

Methodology
Given that the data initially available is usually related to the healthy condition of the machine under analysis, condition-based monitoring schemes must be designed to overcome two main challenges.
(i) The identification of significant features to characterize the known conditions of the machine, under the consideration that the occurrence of additional unknown faults must be detected.
(ii) The adaptation of the condition-based monitoring scheme to update the considered database of the machine, once unknown fault scenarios have been detected.
Such requirements are addressed in this work by means of the proposed adaptive novelty detection method shown in Figure 1. This methodology represents an important step towards the introduction of adaptive novelty detection schemes in electromechanical system diagnosis procedures.
The proposed method is composed of two stages: an offline stage and an online monitoring stage. The main objectives of the offline stage are, first, the analysis of the information available of the monitored machine to find a reduced set of numerical features to characterize the known machine conditions and, second, the design of the novelty model by means of the selection of the configuration parameters and training.
Once a reduced set of characteristic features is obtained and the novelty model is designed, the online stage is carried out. During the online stage, new measurements are continuously compared with the normality threshold defined during the novelty model training in the offline stage. If a novel scenario is detected, the supervision of an expert user is proposed in order to confirm and label the new condition of the machine; consequently, the monitoring system is retrained to include the characteristics of the novel scenario. Detailed information of each stage and the retraining process is given in Sections 3.1-3.3.

Offline Stage.
During the initialization, it is assumed that only information of the machine operating under healthy condition is available in the database. The first step is the calculation of numerical features from the vibration measurements obtained during the machine operation. Since the information of the possible faults of the monitored machine is not available yet during this initialization, a generic set of statistical time-based numerical features is proposed to be extracted from each available vibration axis measurement.
The proposed set of potential statistical time-frequency features includes the Root Mean Square (RMS), the Crest Factor (CF), and the Shape Factor. These features have been successfully employed for fault detection in the last years [1,27]. The resulting number of numerical features is proportional to the number of available vibration axes collected during the acquisition. However, in order to allow the compression and visualization of the data, a feature reduction module is implemented. During the offline stage initialization, an unsupervised feature reduction approach must be used; a Laplacian Score ranking is proposed in this work as a good trade-off between simplicity and performance [24], to rank the features according to their topology preservation capabilities. The first two or three features in the Laplacian Score ranking are selected.
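For reference, the three features named above can be computed from one vibration segment as in the following sketch. It uses the usual textbook definitions (RMS as the root of the mean squared amplitude, Crest Factor as peak over RMS, Shape Factor as RMS over mean absolute value), which are assumed here to match the ones intended in this work; the function names and the toy segment are illustrative.

```python
import math


def rms(signal):
    """Root Mean Square of a vibration segment."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def crest_factor(signal):
    """Peak absolute amplitude divided by the RMS."""
    return max(abs(v) for v in signal) / rms(signal)


def shape_factor(signal):
    """RMS divided by the mean absolute amplitude."""
    return rms(signal) / (sum(abs(v) for v in signal) / len(signal))


# Toy segment (assumption): a square-like wave has all three values equal to 1.
segment_values = [1.0, -1.0, 1.0, -1.0]
features = (rms(segment_values), crest_factor(segment_values),
            shape_factor(segment_values))
```

Applying such functions to every segment of every monitored axis yields the feature vectors that are later ranked by the feature reduction module.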
Next, the novelty model is designed. There is a significant number of novelty models proposed in the literature [4], each one demonstrated to be a capable option under certain circumstances. An increasing number of works indicate that domain based novelty detection models present promising results [19,28]. In this work, a standard OC-SVM with Gaussian kernel is used. The design of the novelty model includes the selection of the configuration parameters and the training, employing the known scenarios stored in the database. The initialization of the offline stage then concludes with the design of the OC-SVM.

Online Stage.
This stage continuously monitors the condition of the machine to detect if an anomaly is present. To accomplish this, new measurements of the machine are acquired at predefined time intervals. Each measurement is segmented and a set of features is calculated from each segment. The set of numerical features calculated in this stage is reduced to the subset selected in the offline stage by means of the Laplacian Score.
Thus, each new measurement characterized by the numerical features is analyzed by the novelty model. If no novelty is detected, it is assumed that the machine is working under known conditions. However, if the analyzed measurement is detected as a novelty, an alarm is triggered in order to request the user assessment. Then, if the occurrence of a new scenario is confirmed, the corresponding measurements are stored in the database and a retraining procedure is performed.

Retraining.
Once the retraining is triggered, the feature reduction and the novelty model design modules are modified at the offline stage. A diagram of the retraining procedure is presented in Figure 2.
It must be noted that two important contributions are proposed in this retraining approach: first, the reconsideration of the feature reduction module and, second, the incorporation of fault scenarios to the novelty model. As has been explained, the feature reduction module, during its initialization, is supported by an unsupervised Laplacian Score ranking due to the lack of additional scenarios; yet, once the information of a fault is available, the feature reduction module could improve its performance by employing supervised methods.
Then, the Laplacian approach is replaced by a Fisher Score ranking approach in this work, where the features are sorted according to the Fisher coefficient calculated from each feature. It is important to mention that the Laplacian Score can also be configured to work under a supervised framework and could obtain results similar to those of a Fisher approach; similarities between the two approaches are discussed in [24]. Nevertheless, Fisher Score is used in this methodology to search for a possible discriminative scenario to increase the detection of novel scenarios. A comparison of results obtained from both feature reduction approaches is presented in the results of this work.
The consideration of a faulty scenario in the novelty model may contradict the principle of anomaly detection, where the objective is to distinguish healthy behavior from the rest. Nevertheless, the aim of an adaptive condition monitoring system should be to learn from all the identified conditions so that a fault detection module can subsequently detect them if they are presented again. Indeed, some works present a parallel structure for fault detection and novelty detection modules [18,19], where the novelty module learns the known faults along with the healthy operation because its objective is to detect only new scenarios, while the objective of the fault detection module is to identify the condition of the machine, including known faults. This work is based on such a parallel structure, where the fault scenarios must be included in the novelty model, and then they could be identified by a complementary fault detection module with a high confidence level.

Description of the Experimental Platform.
The test bench used for testing the kinematic chain with different faults and the acquisition system used to capture the vibration signals are shown in Figure 3. The test bench consists of a 1492 W three-phase induction motor (WEG 00236ET3E145T-W22), with its rotational speed controlled through a variable frequency drive (VFD, WEG CFW08); the operating speed is fixed to 60 Hz for all experiments. It also includes a 4:1 ratio gearbox (BALDOR GCF4X01AA) and a DC generator (BALDOR CDP3604), which is used as the mechanical load comprising around 20% of the nominal load. The vibration signals from the plane perpendicular to the motor shaft are acquired using a triaxial accelerometer (LIS3L02AS4), mounted on a board with the signal conditioning and antialiasing filtering. Two 12-bit, 4-channel, serial-output sampling analog-to-digital converters (ADS7841) are used in the data acquisition system (DAS) board. The sampling frequency is set to 3 kHz for vibration acquisition. The data retrieved by the DAS is stored in a regular computer (PC).
Three scenarios are considered to verify the performance of the proposed method: in the first one, the kinematic chain works under healthy condition, and in the other two, the kinematic chain works under faulty conditions. In the first faulty scenario, the motor works with a half broken bar, and in the second faulty scenario, the motor works with a fully broken bar. The detail of the failures is shown in Figure 4. The half broken bar failure is artificially produced by drilling a 6 mm hole with a depth of 3 mm, which corresponds to about 22% of the section of the rotor bar, and the full broken bar is produced by a through-hole with a diameter of 6 mm and a depth of 14 mm, which corresponds to the complete section of the rotor bar.
The information stored from the kinematic chain consists of an acquisition of 60 seconds of the machine working under each of the three scenarios mentioned; each acquisition is segmented into 30 parts of 2 seconds and a set of features is calculated from each of the 30 segments. Since two axes are taken into consideration, a total of ten features are calculated from each segment of the machine working under the different scenarios mentioned. The first step of the methodology is the offline stage, where a reduced set of features is obtained and the novelty model is designed.
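The segmentation arithmetic can be checked with a short sketch: at the 3 kHz sampling frequency stated above, a 60 s acquisition contains 180000 samples per axis, and each 2 s segment contains 6000 samples. The helper name `segment_signal` and the zero-valued placeholder signal are assumptions for the example.

```python
def segment_signal(signal, fs_hz, segment_s):
    """Split a signal into consecutive, non-overlapping segments."""
    size = int(round(fs_hz * segment_s))
    return [signal[i:i + size]
            for i in range(0, len(signal) - size + 1, size)]


FS_HZ = 3000          # sampling frequency reported for the test bench
ACQUISITION_S = 60    # duration of one acquisition
SEGMENT_S = 2.0       # duration of one segment

# Placeholder samples standing in for one accelerometer axis (assumption).
axis_signal = [0.0] * (FS_HZ * ACQUISITION_S)
segments = segment_signal(axis_signal, FS_HZ, SEGMENT_S)
n_segments = len(segments)
samples_per_segment = len(segments[0])
```

With five features per axis and two axes, each of the 30 segments is then described by a ten-dimensional feature vector.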

Parameter Selection.
Regarding the Laplacian Score configuration, a simple approach is followed for parameter tuning; that is, a value of k = 3 neighbors is used for constructing the adjacency graph and a "simple minded" weighting approach is followed. Since the proposed approach is compared to different feature reduction modules, selecting generic parameter settings is useful for the purpose of evaluation but ignores that there may be dependencies between the feature reduction model and the novelty model. Regarding the design of the novelty model, the kernel used is the Gaussian and the value of the configuration parameter is tuned to minimize the validation error. In all experiments, 80% of the samples are used for training and 20% for validation. To train and adjust the parameter of the novelty model, a fivefold cross-validation is used.
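A fivefold cross-validation naturally produces the 80%/20% split mentioned above. The sketch below only generates the index partitions; the helper name, the fixed random seed, and the use of 30 samples (one per segment of a single scenario) are assumptions for the example, and the tuning loop over the kernel parameter is omitted.

```python
import random


def kfold_indices(n_samples, n_folds=5, seed=0):
    """Return a list of (train_indices, validation_indices) pairs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[f::n_folds] for f in range(n_folds)]
    splits = []
    for f in range(n_folds):
        validation = sorted(folds[f])
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        splits.append((train, validation))
    return splits


splits = kfold_indices(30, n_folds=5)
sizes = [(len(train), len(validation)) for train, validation in splits]
```

For each candidate value of the kernel parameter, the novelty model would be trained on each training partition and scored on the corresponding validation partition, keeping the value with the lowest average validation error.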

Results and Discussions.
In order to highlight the contributions and motivation of this work, the results are presented as follows: first, a test is performed with a classical approach; then the proposed methodology is applied, and the results are compared. The classical static approach implies conserving the reduced feature set obtained at the offline initialization stage; meanwhile, the proposed dynamic approach implies a possible reformulation of the reduced feature set during the retraining stage. During the initialization stage, the reduced set of features selected by means of the Laplacian Score ranking is composed of the RMS of the -axis and the Kurtosis of the -axis. The OC-SVM model is trained employing healthy condition data. The resulting OC-SVM after training is shown in Figure 5. The marks (*) in the figure represent measurements of the machine used to train the model; in this case and in the subsequent figures, only one fold of the fivefold cross-validation is displayed. The dotted line represents the novelty threshold value; all data inside the boundaries of the dotted line is considered normal. The contour plot represents the novelty score evaluated over different regions of the feature map.
Once the offline initialization stage is finished, the online stage follows; that is, new measurements are obtained to assess the condition of the monitored machine. To give robustness to the novelty detection and avoid false alarms, a batch consisting of 30 measurements is evaluated; if 75% of the analyzed measurements are evaluated as novelty, then the alarm is triggered. Next, the first faulty scenario (half broken bar) is presented to test the performance of the model. The plot of the scenario and the novelty threshold obtained during training is presented in Figure 6.
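The batch rule can be expressed compactly. In this sketch the alarm is assumed to trigger when the fraction of novel measurements is greater than or equal to the 75% threshold (the text does not state whether the comparison is strict), and the function name and example batches are illustrative.

```python
def batch_alarm(novelty_flags, threshold=0.75):
    """Return (alarm, ratio) for a batch of per-measurement novelty flags."""
    ratio = sum(1 for flag in novelty_flags if flag) / len(novelty_flags)
    return ratio >= threshold, ratio


# A batch of 30 measurements with 23 flagged as novel triggers the alarm.
alarm_high, ratio_high = batch_alarm([True] * 23 + [False] * 7)
# A batch with only half of the measurements flagged does not.
alarm_half, ratio_half = batch_alarm([True] * 15 + [False] * 15)
```

Under this rule, a scenario in which only 50% of the measurements fall outside the boundary is not reported as novel.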
As can be appreciated, the new scenario lies outside the novelty threshold, so it is successfully detected as novelty. Once a novel scenario is detected and identified as a fault by the user, it is incorporated into the database as part of the known scenarios and the novelty model is retrained, without changing the features, to include this information. Figure 7 shows the feature space after the novelty model is trained using the healthy and half-broken-bar data as the known scenarios.
Once the model is retrained, the third scenario (fully broken bar) is introduced. The visual representation of the test is presented in Figure 8; this scenario is not detected as novel because only 50% of the measurements are labeled as novel by the model. A similar result is obtained when the novelty model is trained using the healthy and fully-broken-bar data and is tested with the half-broken-bar data. A summary of the results of novelty detection maintaining the same features obtained during initialization is shown in Table 1.
As can be seen in Table 1, it is easy to detect the novel scenarios using the reduced set obtained during initialization when there is only information of the healthy condition; but when a novel scenario is included in the database and the novelty model is retrained, the reduced set of features initially obtained does not necessarily provide a good representation to detect a new scenario during test.
To improve these results, the methodology presented in this work proposes to evaluate the feature reduction module again each time a retraining is applied. Following the outline presented for the results and starting from the first retraining, where the half-broken-bar scenario is included in the database, the feature reduction module is applied again, but this time including information from the healthy condition and the half-broken-bar scenario. Since two scenarios are taken into consideration and the labels are known, a supervised approach can be applied; in this case, a Fisher Score ranking for feature reduction is employed. The novelty model using the initial set of features and the novelty model after retraining with the new reduced set of features are shown in Figure 9.
The new reduced set of features is composed of the RMS of the -axis and the Kurtosis of the -axis; as can be appreciated, the new set of features presents a more discriminative distribution of the three scenarios considered. At the last step (Figure 9(d)), when the fully-broken-bar scenario is included, the Fisher Score still ranks the same features as in the previous retraining (Figure 9(b)) as the highest; yet, a different set of features could have been obtained. The results obtained demonstrate the advantages of including the Fisher Score reduction module in the retraining procedure: the new distribution of the scenarios avoids overlapping, and the new scenarios are successfully identified as novel, since the percentage obtained in both cases is higher than the 75% threshold predefined in the methodology to activate the alarm. Both techniques achieved high scores, but still the Fisher Score provided a more appropriate selection of features.
As mentioned in Section 2, LDA is a feature extraction technique based on the Fisher discriminant coefficient, so similar results between Fisher Score ranking and LDA would be expected; however, Fisher Score ranking achieved a better result. A possible cause is that the test consists of novel scenarios, and LDA finds directions in the feature space that are specialized for the two supervised scenarios employed during training, whereas Fisher Score ranking provides a more general approach by selecting features.
To test the robustness of the Laplacian Score and Fisher Score approaches, a comparative test is performed in which the initial set of features is increased from 10 to 15 and the number of reduced features selected is varied. The five features added to the original set are obtained from the -axis of the monitored accelerometer and are calculated as the aforementioned statistical time-frequency features. These features were initially discarded because they are not part of the perpendicular plane of the motor and do not contribute significantly to the monitoring; in fact, including them could affect the performance of the feature reduction and novelty detection modules. The results are presented in Table 3.
The features obtained from the Fisher Score still present a better distribution to detect the new scenarios. The Laplacian Score performance is affected when irrelevant features are included in the feature set, but it increases when more dimensions are taken into consideration. Since the objective of the Fisher Score, contrary to that of the Laplacian Score (LS), is not topology preservation, it is capable of discarding all the irrelevant features that were included, and its performance is not affected.
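For comparison with the supervised ranking, a minimal sketch of an unsupervised Laplacian Score is given next. It assumes the usual locality-preserving formulation with a heat-kernel similarity on a fully connected graph (nearest-neighbor graphs are also common); the kernel width `t`, the function name `laplacian_scores`, and the toy data are assumptions for illustration and do not reproduce the configuration used in the experiments.

```python
import math


def laplacian_scores(samples, t=1.0):
    """Return one Laplacian Score per feature (lower preserves locality better)."""
    n = len(samples)
    n_features = len(samples[0])
    # Heat-kernel similarity between every pair of measurements (no self-loops).
    S = [[0.0 if i == j else math.exp(-math.dist(samples[i], samples[j]) ** 2 / t)
          for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in S]
    total_degree = sum(degree)
    scores = []
    for r in range(n_features):
        f = [row[r] for row in samples]
        # Remove the degree-weighted mean of the feature.
        weighted_mean = sum(fi * di for fi, di in zip(f, degree)) / total_degree
        centred = [fi - weighted_mean for fi in f]
        # Graph-Laplacian quadratic form: half the weighted sum of squared
        # differences over all ordered pairs.
        numerator = 0.5 * sum(S[i][j] * (centred[i] - centred[j]) ** 2
                              for i in range(n) for j in range(n))
        denominator = sum(di * ci ** 2 for di, ci in zip(degree, centred))
        scores.append(numerator / denominator if denominator > 0 else float("inf"))
    return scores


# Toy data (hypothetical): feature 0 follows the two clusters, feature 1 alternates.
samples = [[0.0, 0.0], [0.1, 1.0], [0.2, 0.0],
           [5.0, 1.0], [5.1, 0.0], [5.2, 1.0]]

scores = laplacian_scores(samples)
print(sorted(range(len(scores)), key=lambda j: scores[j]))  # best feature first
```

In this formulation the graph is built from distances over all features, so added irrelevant features can distort the neighborhood structure that the score tries to preserve, which may explain the sensitivity reported above.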

Conclusions
This work proposes an adaptive novelty detection methodology based on vibration analysis for the condition monitoring and diagnosis of the components of a kinematic chain. The methodology is based on the acquisition of the vibration signals generated in the kinematic chain, along with adequate signal processing to extract features that characterize the components and an adaptive novelty detection model to detect anomalies. The method is composed of two sequential stages: an offline stage to initialize and retrain the modules and an online stage to continuously assess the condition of the machine. During initialization, the model is trained employing only information from the machine working under the healthy condition, and two additional faulty scenarios are then introduced to test the performance of the method under unknown operating scenarios. The adaptive novelty detection approach successfully detected both novel scenarios, and the model incorporated the information to avoid generating alarms if the same fault is detected again. A comparison between the proposed method and classical dimensionality reduction approaches highlights the limitations of maintaining a static set of features during monitoring instead of reformulating the feature reduction module once new information is available. In this particular study, employing features reduced by Laplacian Score and Fisher Score obtained similar results; nevertheless, this does not imply that a similar outcome will be obtained in the analysis of other faults. The use of the Fisher Score rather than the Laplacian Score is encouraged in this methodology because of the similarity between the objective of the method and the objective function of the Fisher Score; in both, the ideal case is to find the features that maximize the distance between scenarios while maintaining compact clusters. A specific comparison of performance between LS and Fisher Score was also included, in which the Fisher Score obtained better results when irrelevant features are included in the original set of features and when the dimensionality of the reduced set is increased; this highlights the advantages and robustness of feature selection by Fisher Score compared to LS.
From an industrial perspective, two aspects should be mentioned. First, even for an identical operating scenario (e.g., healthy), a variation of the working conditions (speed and torque patterns) will affect the resulting values of the estimated numerical features and, consequently, their representation in the feature space used for the novelty model design. Thus, introducing the working condition as an additional degree of freedom in the novelty detection scheme increases the dimensionality of the space that the novelty detection model must manage. This increases the risk of data overlapping (different scenarios under different operating conditions showing similar feature values) and also the complexity of defining the boundaries during novelty modelling (data spread). Thus, the proposed method is not limited to stationary conditions but to the repetition of the pattern of working conditions used to train the model. Indeed, in practical industrial applications, where cyclostationary processes take place, the proposed method should be executed during the same period (the same working conditions) in order to reduce data variability and increase novelty detection performance. If multiple working conditions are to be considered, parallel novelty model structures could be generated, each one being activated depending on the corresponding working condition.
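The idea of parallel novelty model structures can be sketched as a simple dispatcher that keeps one model per working condition and activates the one matching the current condition. The class names, the condition labels, and the centroid-based model below are hypothetical placeholders chosen for brevity; they are not the novelty model used in this work.

```python
import math


class CentroidNoveltyModel:
    """Toy stand-in: a point is novel if it lies farther than `radius`
    from the centroid of the training measurements."""

    def __init__(self, training_points, radius):
        n = len(training_points)
        dims = len(training_points[0])
        self.centre = [sum(p[d] for p in training_points) / n for d in range(dims)]
        self.radius = radius

    def is_novel(self, point):
        return math.dist(point, self.centre) > self.radius


class ParallelNoveltyMonitor:
    """Holds one novelty model per working condition."""

    def __init__(self):
        self.models = {}

    def register(self, condition, model):
        self.models[condition] = model

    def is_novel(self, condition, point):
        if condition not in self.models:
            raise KeyError(f"no novelty model trained for condition {condition!r}")
        return self.models[condition].is_novel(point)


# Invented healthy measurements for two working conditions.
monitor = ParallelNoveltyMonitor()
monitor.register("low_speed", CentroidNoveltyModel(
    [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2]], radius=0.5))
monitor.register("high_speed", CentroidNoveltyModel(
    [[3.0, 3.0], [3.2, 2.8], [2.8, 3.2]], radius=0.5))

point = [1.1, 0.9]
print(monitor.is_novel("low_speed", point))   # evaluated against the low-speed model
print(monitor.is_novel("high_speed", point))  # evaluated against the high-speed model
```

The same measurement is judged against a different boundary in each working condition, which keeps each model compact instead of widening a single boundary to cover every condition.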
Second, the integration of additional physical magnitudes, such as stator currents and temperatures, is a current trend in electromechanical fault diagnosis aimed at improving diagnosis resolution and performance. The corresponding data fusion scheme implies a high-dimensional feature space, that is, multiple numerical features estimated from the multiple physical magnitudes considered. At this point, it must be noted that the presented methodology proposes the reevaluation of the feature reduction module each time a retraining is applied. Thus, the larger the available pool of features, the greater the potential of the proposed method to improve performance during data characterization.
The proposed method can also be extended and improved in further developments; such improvements could include a diagnosis method that not only detects anomalies in the kinematic chain but also identifies the fault causing the abnormal behavior. In this sense, future work will include a specific comparison of broken bar fault detection capabilities between the proposed methodology and classical approaches.

Figure 1: Proposed methodology for the novelty detection approach. The monitoring method is composed of an offline stage for initialization and retraining and an online stage for continuous monitoring.

Figure 2: Proposed retraining approach. First, measurements characterizing the fault are stored, and then the feature reduction module and the novelty design module are modified to incorporate the new scenario encountered.

Figure 3: Test bench used for experimentation.

Figure 4: Detail of the faults produced in the test bench. (a) corresponds to the 1/2 broken rotor bar and (b) to one broken rotor bar.

Figure 5: Initial novelty model representation. The limit of the novelty threshold is marked with ⋅⋅⋅ and the measurements used to train the model with *.

Figure 6: Evaluation of the fault scenario 1. The novelty model is trained employing data from the healthy operating condition.

Figure 7:

Figure 8: Evaluation of the fault scenario 2. The novelty model is trained employing data from the healthy scenario and scenario 1.

Figure 9: Process of evaluation and retraining employing the proposed methodology. (a) Evaluation of the fault scenario 1. (b) Retraining of the novelty model and reformulation of the reduced set of features including scenario 1. (c) Evaluation of the fault scenario 2. (d) Retraining of the novelty model and reformulation of the reduced set of features including scenario 2.

Table 1: Performance of the novelty detection using only healthy class data to reduce the number of features, where DR stands for dimensionality reduction. Different scenarios are included according to the information available to train and test the novelty model.

It is worth mentioning that if scenario 2 is initially used for training and scenario 1 is used for testing, the new set of features obtained could differ from the aforementioned one. The results achieved for both scenarios are shown in Table 2, which also includes a comparison with the results obtained employing the PCA, LDA, and Laplacian Score dimensionality reduction techniques in the retraining step instead of the proposed Fisher Score. Regarding the classical feature extraction techniques, PCA and LDA, the test scenario is not identified as novel in either case. The Fisher Score and the Laplacian Score, in contrast, identify the test scenario as novel.

Table 2: Performance of the novelty detection employing a reduction of features during retraining. Different scenarios are included according to the information available to train and test the novelty model.

Table 3: Performance of the novelty detection increasing the number of initial features from 10 to 15 and varying the number of the reduced set of features.