Fault Isolation for Nonlinear Systems Using Flexible Support Vector Regression

While support vector regression is widely used as both a function approximating tool and a residual generator for nonlinear system fault isolation, a drawback of this method is the freedom in selecting model parameters. Moreover, for samples whose distribution complexity varies from region to region, a single reasonable parameter setting may not exist at all. To alleviate this problem, we introduce the method of flexible support vector regression (F-SVR), which is especially suited for modelling complicated sample distributions, as it is free from parameter selection: reasonable parameters for F-SVR are automatically generated from a given sample distribution. Lastly, we apply this method to the fault isolation of high frequency power supplies, where satisfactory results have been obtained.


Introduction
With the increasing use of complex systems, there has been great interest in the development of techniques for fault isolation. Generally, the major approaches for fault isolation can be divided into two categories, namely, model-based and data-driven techniques. The fundamental aspect of model-based fault isolation is a process model that runs parallel to the process [1]. With traditional methods like observers, approximating the function between state vectors and input/output vectors is successful due to precise mathematical modelling by the use of filters. While these methods have successfully modelled linear systems, when applied to nonlinear systems like chemical processing, precision devices, and aerodynamic systems, they often fail to construct a sufficient model because the mechanism models of such systems are hard to derive. Model-based approaches have advantages in terms of on-board implementation considerations, but their reliability may decrease as the complexity of the nonlinear system increases [2].
Therefore, data-driven techniques have been introduced to construct process models more accurately, as these methods are free from the requirement to analytically derive equations for a given system, as shown in Figure 1. One feasible method is to use the artificial neural network (ANN). ANN utilizes the empirical risk minimization (ERM) principle to construct the process model, where the target function is numerically approximated by minimizing residuals between function estimates and outputs of the process data. Applications of ANN-based fault isolation have been widely addressed in the literature. For example, Sadough Vanini et al. [2] used dynamic neural networks to isolate the faults of a dual spool gas turbine engine. Filippetti et al. [3] applied fuzzy-NN to the fault isolation of induction motor drives. However, the learning ability of ANN depends on the number of training samples: it requires massive samples to ensure the modelling performance. In most practical applications, only a few fault samples can be acquired.
More recently, the principle of structural risk minimization [4] has been introduced in fault isolation through the utilization of support vector regression (SVR) [5,6], as it can provide more accurate results than neural networks when the training set is small. It was constructed on the basis of statistical learning theory, which provides the theoretical proofs of learning from finite samples. Much has been addressed in the literature where SVR shows superiority over ANN in process modeling [7].
However, the performance of SVR-based modelling is greatly affected by its parameters. Although SVR has been well studied and many remarkable achievements have been obtained, the theoretical estimation of the regression parameters has remained unsolved over the last decade. There is no general consensus on the selection of proper parameters, only some practical recommendations on this issue. This greatly increases the difficulty for common operators to master the SVR-based approach. Moreover, in some complicated cases, no reasonable parameter setting can be found at all. A rigorous selection of regression parameters can lead to overlearning of the training samples, while a slack selection can lead to underlearning. In such cases, no parameters yield a good trade-off between overlearning and underlearning.
In this paper we introduce a flexible SVR (F-SVR) approach [8] to more accurately implement models that construct different residual generators for fault isolation. By automatically dividing training samples into several regions, this method is not only free from parameter selection, but also able to learn well and to generalize well for complicated cases.

Problem Statement
Support vector regression (SVR) is a process modeling tool that approximates the function between inputs and outputs:

$$y = f(x, w) = \sum_{i=1}^{l} w_i K(x, x_i) + b. \quad (1)$$

Here $x$ and $y$ represent the input and output vectors, respectively, $f$ is the modeled function, $w$ is its weight vector, and $b$ is the bias decided by the vector $w$.
The SVR-based modelling can be viewed as a process of finding the optimal weight vector $w_0$ with a proper parameter vector $\theta_0$ for a given data set $\{(x_1, y_1), \ldots, (x_l, y_l)\}$:

$$(w_0, \theta_0) = \arg\min_{w, \theta} R(w, \theta) = \arg\min_{w, \theta} \int L(y, f(x, w, \theta)) \, dP(x, y), \quad (2)$$

where $L(y_i, f(x_i, w, \theta)) = C \cdot |y_i - (\sum_{j=1}^{l} w_j K(x_i, x_j) + b)|_{\varepsilon}$ is the loss function, $P(x, y)$ is the unknown joint distribution of $x$ and $y$, $C$ is the regularized parameter, $\varepsilon$ is the insensitive parameter, $K(\cdot)$ is the kernel function, $R(w, \theta)$ denotes the generalization ability for the regression, and $\theta = \{C, \varepsilon, K(\cdot)\}$. The optimal weight vector $w_0$ can be obtained by Lagrangian approaches. Thus the core problem for SVR modeling is the selection of parameters.
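The role of the insensitive parameter and the regularized parameter in the loss can be illustrated with a minimal sketch. It assumes the standard definition of the insensitive loss, in which residuals smaller than the insensitive parameter cost nothing; the function names `eps_insensitive_loss` and `empirical_risk` and the numbers used are illustrative assumptions, not part of the original method description.

```python
def eps_insensitive_loss(y_true, y_pred, C=1.0, eps=0.1):
    """Return C * max(0, |y_true - y_pred| - eps) for a single sample."""
    return C * max(0.0, abs(y_true - y_pred) - eps)


def empirical_risk(ys, preds, C=1.0, eps=0.1):
    """Average the per-sample loss over a finite data set.

    This is only a finite-sample stand-in for the risk over the unknown
    joint distribution of inputs and outputs.
    """
    losses = [eps_insensitive_loss(y, p, C, eps) for y, p in zip(ys, preds)]
    return sum(losses) / len(losses)


# A residual inside the insensitive tube costs nothing ...
inside = eps_insensitive_loss(1.0, 1.05, C=10.0, eps=0.1)
# ... while a residual outside the tube is penalised linearly, scaled by C.
outside = eps_insensitive_loss(1.0, 1.5, C=2.0, eps=0.1)
```

A small insensitive parameter with a large regularized parameter penalises almost every deviation (a rigorous setting), whereas a wide tube ignores most of them (a slack setting).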
The framework for SVR-based fault isolation is shown in Figure 2. Different operating models are constructed by separate SVRs with given parameters. The residuals between estimated outputs and real outputs are generated for fault isolation.
However, the selection of SVR parameters is not easy. With a fixed regression parameter $\theta$, the drawback of SVR is the difficulty of the trade-off between overlearning and underlearning. Moreover, in some complicated cases, no reasonable parameters can be found at all. As shown in Figure 2, due to the different complexities of sample distributions, the requirements on the parameters are discordant. If a rigorous parameter $\theta_1$ is selected, the regression overlearns in region A. In contrast, if a slack parameter $\theta_2$ is selected, the regression fails to learn in region B (underlearning). No single parameter can adequately fit all of the cases. Consequently, we advocate using the F-SVR approach to overcome this drawback [8], as the regression parameter is automatically generated and is variable instead of fixed (Figure 3).

Fault Isolation Using Flexible Support Vector Regression (F-SVR)

The Principle of Flexible Support Vector Regression.
We proposed a parameter-free algorithm for process modeling, namely, flexible support vector regression. The F-SVR attempts to divide the training samples into several regions according to the distribution complexity, and for the $i$th region, a parameter set $\theta_i$ is generated. By minimizing (3), the function between $x$ and $y$ is approximated. With given parameters, (3) can be minimized by solving a quadratic programming (QP) problem. Supposing the training samples are divided into two areas, each with its own parameters, the minimization of (3) can be stated as problem (4), whose Lagrangian is (5). Taking the partial derivatives of (5) and substituting the results gives the dual problem (7) of (4). As $\alpha \cdot \alpha^* \equiv 0$, (7) can be rewritten in a more compact form. Further, the regression can be written as a standard QP problem, which can be solved by an active set method [9]. The feasibility for more complicated cases, in which the samples are divided into more regions, can be proved similarly.

Detailed Process of F-SVR Modeling.
F-SVR modeling contains three major steps, shown in Figure 4.
Step 1 (sample divisions). This step shows how the regions in (3) are determined. Given the training samples, formula (12) is used to estimate the distribution complexity; it is built from the difference vectors between consecutive samples, $(x_{i+1}, y_{i+1}) - (x_i, y_i)$, together with a cosine term. Supposing the training samples have been divided into several areas, formula (13) is used to evaluate the performance of the division. The samples are divided randomly several times, and the division with the smallest value of (13) is treated as the best division. The number of random divisions is fixed in advance; in this paper, it is set to one-tenth of a product that scales with the number of training samples. The $i$th region denotes the subset of inputs $x$ assigned to it by the best division.
Step 2 (setting parameters for each region). For the support vector regression approach, empirical formulas are used to set the hyperparameters $\theta_i$ of the $i$th region. These formulas are evaluated on the outputs of the $i$th region only, that is, on $\{y \mid x \in \text{the } i\text{th region}\}$.
Remark 1. The empirical setting of parameters for each region follows Cherkassky's work [10] from 2004.
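Remark 1 points to Cherkassky's empirical parameter settings [10]. As a hedged illustration, the sketch below applies the widely cited Cherkassky and Ma rules of thumb to the samples of a single region: the regularization parameter is taken from the mean and spread of the outputs, and the insensitive parameter from the noise level and the sample count. Whether these rules coincide exactly with the formulas used in this paper is an assumption. The names `region_parameters`, `noise_std`, and `width_factor` are hypothetical, the noise standard deviation is treated as a given input rather than estimated, and the kernel width rule (a fraction of the input range) is a simplification for one-dimensional inputs.

```python
import math
import statistics


def region_parameters(xs, ys, noise_std, width_factor=0.3):
    """Empirical SVR hyperparameters for the samples of one region.

    xs, ys       : inputs and outputs that fall in the region
    noise_std    : assumed (known or pre-estimated) noise standard deviation
    width_factor : assumed fraction of the input range used as kernel width
    """
    n = len(ys)
    y_mean = statistics.mean(ys)
    y_std = statistics.pstdev(ys)
    # Regularization parameter: covers the output range, robust to outliers.
    C = max(abs(y_mean + 3 * y_std), abs(y_mean - 3 * y_std))
    # Insensitive parameter: shrinks as more samples become available.
    eps = 3 * noise_std * math.sqrt(math.log(n) / n)
    # Kernel width: a fraction of the input range (assumption, 1-D inputs).
    sigma = width_factor * (max(xs) - min(xs))
    return {"C": C, "eps": eps, "sigma": sigma}


# Hypothetical region with four samples and a noise level of 0.1.
params = region_parameters([0.0, 1.0, 2.0, 3.0], [1.0, 1.0, 1.0, 1.0], noise_std=0.1)
```

Calling such a function once per region yields a different parameter set for each region, which is the behaviour Step 2 describes.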
Step 3 (function approximation using selected support vectors). We use the conventional SVR with parameters $\theta_i$ to extract informative samples for the $i$th region. As shown in Figure 6(a), the red samples are selected as support vectors (SVs). If only some of the training samples are selected as SVs, the regression problem $f(x) = \sum_{i} w_i K(x, x_i) + b_0$ over the whole training set can be approximated by the regression problem over the SVs alone [8]. Thus, the minimization of (3) can be simplified as (17). In (17), rigorous parameters $\theta = \{0, 1000, 0.01\}$ are set to ensure the learning ability of the regression. This problem can be solved by using the Lagrangian method in the same way as in conventional SVR (Figure 5).
As shown in Figure 6(b), F-SVR, unlike conventional SVR, successfully approximates the distribution function without overlearning or underlearning. As mentioned earlier, F-SVR is free from the issues that can arise in the manual selection of parameters, as all of the parameters are generated automatically.

Figure 6: (a) Training samples added with noise. (b) Function approximation using F-SVR.

The Basic Scheme of F-SVR-Based Fault Isolation.
Fault isolation algorithms attempt to reveal which fault is occurring in the operating system. In our method, we determine which model of the system is most likely to be accurate by first constructing the models for all faulty statuses and then calculating the deviations between the real outputs and all of the model outputs. As shown in Figure 7, once the best model has been located, the fault type can be isolated.
Compared with conventional SVR-based fault isolation, the most significant contribution of our work is that we have alleviated the problem of parameter setting. F-SVR-based fault isolation requires only process samples for the different operating cases.
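The scheme described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the three lambda functions are hypothetical stand-ins for trained F-SVR models, the status names are illustrative, and the root-mean-square residual is an assumed deviation measure, not necessarily the one defined in this paper.

```python
import math


def deviation(model, xs, ys):
    """Root-mean-square residual between real outputs and model outputs."""
    squared = [(y - model(x)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared) / len(squared))


def isolate_fault(models, xs, ys):
    """Return the status whose model deviates least, plus all deviations."""
    deviations = {name: deviation(m, xs, ys) for name, m in models.items()}
    best = min(deviations, key=deviations.get)
    return best, deviations


# Hypothetical stand-ins for models trained on each operating status.
models = {
    "normal": lambda t: 1.0,
    "overcurrent": lambda t: 1.0 + 2.0 * t,
    "excitation": lambda t: math.sin(t),
}

# Testing samples that follow the overcurrent behaviour with a small offset.
xs = [0.0, 0.5, 1.0, 1.5]
ys = [1.0 + 2.0 * t + 0.01 for t in xs]
status, devs = isolate_fault(models, xs, ys)
```

Because each status has its own model, adding a new fault type only requires training one more model and adding it to the dictionary.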
The detailed process of F-SVR-based fault isolation is as follows. Training samples are given from several different operating statuses, including the normal status and all faulty statuses, where $(x_i, y_i)$ represents the training samples for the $i$th status. The function $f_i(\cdot)$ between the input vector $x_i$ and the output vector $y_i$ is initially approximated by the F-SVR method. After the input/output models for all of the statuses have been constructed from the training samples, the testing samples $(x, y)$ are sent to these models to generate the residuals between the real outputs and the model outputs, thereby forming the residual vector. We define a function to measure the deviation between the real output and the model output of the $i$th status. Faults can be isolated by analysis of the residual vector. In this paper, we simply regard the testing samples as belonging to the $i^*$th status, that is, the status whose model yields the smallest deviation.

In order to approximate the distribution, both F-SVR and least-squares SVR (LS-SVR) [11] are implemented.

Experiments and Real Applications
To provide a fair comparison, the parameters for LS-SVR were first optimized using a grid search strategy. The evaluating index for LS-SVR depends on the number of SVs and on $\hat{y}$, the function output of LS-SVR. A small value of the index implies that the regression both generalizes well and learns accurately. As shown in Figure 8, the best parameter set for LS-SVR is $\{\gamma = 10, \sigma^2 = 0.2\}$. The performance of the two methods is shown in Figure 9. It is clear in this case that F-SVR models the data more accurately than the well-known LS-SVR.
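The grid search over the two LS-SVR parameters can be sketched as follows. This is a minimal pure-Python illustration, not the implementation used in the experiments: it solves the standard LS-SVR linear system with a Gaussian kernel and, as an assumption, selects parameters by validation mean squared error as a stand-in for the evaluating index. The kernel scaling, the toy target `sin(3x)`, the grids, and all function names are hypothetical.

```python
import math


def rbf(a, b, sigma2):
    """Gaussian kernel exp(-(a - b)^2 / (2 * sigma2)) for scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma2))


def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z


def lssvr_fit(xs, ys, gamma, sigma2):
    """Fit LS-SVR by solving [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(xs)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf(xs[i], xs[j], sigma2) for j in range(n)]
        row[i + 1] += 1.0 / gamma
        A.append(row)
    z = solve(A, [0.0] + list(ys))
    b, alpha = z[0], z[1:]
    return lambda x: b + sum(a * rbf(x, xi, sigma2) for a, xi in zip(alpha, xs))


def grid_search(train, valid, gammas, sigma2s):
    """Return (mse, gamma, sigma2) with the smallest validation error."""
    best = None
    for g in gammas:
        for s2 in sigma2s:
            f = lssvr_fit(train[0], train[1], g, s2)
            mse = sum((y - f(x)) ** 2 for x, y in zip(*valid)) / len(valid[0])
            if best is None or mse < best[0]:
                best = (mse, g, s2)
    return best


# Toy data: 21 training points and 20 validation points between them.
train_x = [i / 10 for i in range(-10, 11)]
train_y = [math.sin(3 * x) for x in train_x]
valid_x = [x + 0.05 for x in train_x[:-1]]
valid_y = [math.sin(3 * x) for x in valid_x]
best_mse, best_gamma, best_sigma2 = grid_search(
    (train_x, train_y), (valid_x, valid_y), [1.0, 10.0, 100.0], [0.02, 0.2, 2.0]
)
```

Every grid point requires a full refit, which is the manual tuning cost that F-SVR is intended to avoid.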

Fault Isolation for High Frequency Power Supply.
The high frequency power supply (HFPS) is a nonlinear device that has been widely used in power plants for dust removal. As its structure is highly complicated, a precise model is usually hard to construct using classical analytical approaches [12][13][14][15][16][17][18]. Thus, data-driven approaches are utilized for process modeling and fault isolation. However, when applied to different power plants, the HFPS yields very different input/output functions due to changes in its loads and working environments. This means that there is no general process model for HFPS in all conditions.
For each HFPS that has been installed, a particular process model should be constructed. Therefore, for conventional SVR-based fault isolation, experienced operators are required to select the modeling parameters on site. In this section, F-SVR is applied to isolate the faults of the high frequency power supply. Three operating cases were investigated: normal status, overcurrent fault, and learning excitation fault. Data for the three operating cases of HFPS are shown in Figure 10, and an overcurrent fault sample, shown in Figure 11, was used as the input data for testing.
The basic scheme for HFPS fault isolation is shown in Figure 12. First, process data for each operating case are acquired; then, F-SVR is implemented to approximate the unknown function between input (time) and output (the first primary current) for each operating case. Once the models for all operating cases have been established, the residual vectors can be generated, and the fault can be isolated by finding the model with the smallest residual.
In this experiment, F-SVR is implemented to approximate the functions between the input and the output, namely, $f_1(x)$, $f_2(x)$, and $f_3(x)$. As shown in Figure 13, the functions obtained using F-SVR accurately describe the relationship between the input and the output for the different operating cases without any parameter setting.
Since $f_1(x)$, $f_2(x)$, and $f_3(x)$ were already obtained by the F-SVR method, the residuals could be generated using (12). The corresponding residuals are recorded in Table 1. According to Table 1, the testing sample belongs to the overcurrent fault ($f_2(x)$ yields the smallest residual). This diagnostic result is consistent with prior knowledge of the testing sample and shows the feasibility of our method. The LS-SVR method [11,19] is also implemented for comparison. As shown in Table 1, the LS-SVR method also makes a correct diagnosis, and it has a better ability to generalize than our method (its number of SVs is smaller). However, our method yields smaller residuals, which implies a better modeling accuracy. Most importantly, all parameters must be selected manually in LS-SVR but are selected automatically in F-SVR.

Conclusions
SVR is one of the most efficient tools for fault diagnosis because it is able to accurately model a function between the input and the output using process data. However, even though the SVR approach has been utilized for over a decade, there is still no consensus within the community on how to adequately select regression parameters. Given that the F-SVR method offers an automatic selection of regression parameters, we chose it to implement fault isolation for nonlinear systems. We demonstrated on both a numerical experiment and the fault isolation of HFPS that F-SVR is especially suited for cases with complicated sample distributions. This is because the method divides the training samples into different regions according to the sample distribution complexity and generates reasonable parameters for each region. Based on this work, we hope that F-SVR will become more widely recognized as a preferred fault isolation method for nonlinear systems.

Numerical Experiments.
In our first attempt to validate our method by a numerical experiment, we used a data set with complicated distributions. White noise (SNR = 30 dB) is added to the analytical equation shown in (21), where the training set consists of 600 samples extracted from $x \in (-1, 1]$.
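The construction of such a training set can be sketched as follows. The analytical equation (21) is not reproduced in this sketch, so the function `target` is a purely hypothetical stand-in that mixes a slowly varying part with a rapidly varying part; the helper name `add_white_noise` and the fixed random seed are likewise assumptions made for illustration.

```python
import math
import random


def add_white_noise(signal, snr_db, rng):
    """Add zero-mean Gaussian noise so the signal-to-noise ratio is snr_db."""
    power = sum(s * s for s in signal) / len(signal)
    noise_std = math.sqrt(power / (10 ** (snr_db / 10)))
    return [s + rng.gauss(0.0, noise_std) for s in signal]


def target(x):
    """Hypothetical stand-in for the analytical equation (21)."""
    if x < 0:
        return math.sin(2 * math.pi * x)   # smooth region
    return math.sin(12 * math.pi * x)      # rapidly varying region


rng = random.Random(0)                         # fixed seed for repeatability
n = 600
xs = [-1 + 2 * (i + 1) / n for i in range(n)]  # 600 points in (-1, 1]
clean = [target(x) for x in xs]
noisy = add_white_noise(clean, 30.0, rng)
```

A target whose complexity differs between regions is the kind of case in which a single fixed parameter set tends to overlearn in one region or underlearn in the other.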

Figure 10: Process data for different operating cases of HFPS.

Table 1: Diagnostic performance of F-SVR and LS-SVR.