Data-Driven Incipient Sensor Fault Estimation with Application in Inverter of High-Speed Railway

Hongtian Chen

Incipient faults in high-speed railways have rarely been addressed before they develop into faults or failures. In this paper, a new data-driven incipient fault estimation (FE) methodology is proposed within a multivariate statistics framework; it combines the Kullback-Leibler divergence (KLD) from information theory with neural network approximation from machine learning. By defining a sensitive fault indicator (SFI), the incipient fault amplitude can be precisely estimated. The proposed incipient FE algorithm is examined on an experimental platform of China Railway High-speed 2 (CRH2), and its higher sensitivity and accuracy for tiny abnormalities are demonstrated. Following the incipient FE results, several factors affecting FE performance are further analyzed.


Introduction
Due to its high efficiency and loading capacity, high-speed railway has developed rapidly in the past two decades [1-3]. As one of the most important pieces of equipment in high-speed railway, the inverter is the actuator of the motors, and its sensor information is used to construct the motor control law [4]. Typically, the sensors in the inverter are vulnerable to faults because they interact directly with the working environment [3, 5]. Such faults may lead to abnormal motor operation and loss of traction force, and may even cause an emergency stop. Thus, real-time sensor fault detection and diagnosis (FDD) and fault estimation (FE) are urgently needed in high-speed railway to improve its reliability.
There has been abundant research on inverter sensor FDD over the past several decades [2, 4, 6-9]. Most of it can be classified into two general categories: model-based methods and knowledge-based methods. Model-based FDD methods build a mathematical model of the electrical traction system from the outputs or working mechanisms of the monitored system. Usually, a residual is generated as the fault indicator from the available input and output information of the inverter sensors [4, 6, 9, 10]. Knowledge-based methods [2, 8] mainly depend on prior knowledge of the inverter, whose characteristics are summarized to distinguish or diagnose the faults.
Besides these, data-driven FDD methods, which originate from the chemical industry, use the information in sampled data to directly analyze features such as data correlation, spectrum, and mean value. This latent information is crucial for identifying and diagnosing sensor faults without considering a mathematical model of the system. Recently, some modified data-driven methods were proposed to improve process monitoring performance. For example, key-performance-indicator-related partial least squares (PLS) was developed to acquire more meaningful fault information [11], and fault-relevant principal component analysis (PCA) was designed to select variables optimally [12]. Detailed reviews of data-driven methods can be found in [13-15]. However, these methods have rarely been applied to FDD in electrical domains.
Model-based incipient fault estimation requires an accurate mathematical model of the inverter of high-speed railway. However, due to the complex operating environment, the inverter model is time-varying, is affected by unknown noise, and cannot be represented by accurate functions. In addition, expert knowledge cannot be completely acquired and is often invalid for incipient faults. Therefore, a data-driven method is considered to estimate incipient faults in the inverter of high-speed railway.
In this paper, the main contributions of the proposed method are summarized as follows:
(1) Considering the non-Gaussian characteristics of the traction system, the rotary principal component subspace (RPCS) and rotary residual subspace (RRS) are introduced for the first time for data-driven FE problems.
(2) Aiming at incipient FE, a sensitive fault indicator (SFI) is defined without any improper assumptions on the measured signals.
(3) The analytic relation between the incipient fault amplitude and the SFI is analyzed and is then estimated by a neural network.
The rest of this paper is organized as follows. In Section 2, a brief introduction to the inverter of China Railway High-speed 2 (CRH2), PCA, and the Kullback-Leibler divergence (KLD) scheme is given. In Section 3, a rotary data space including RPCS and RRS is presented, in which the SFI is defined by an information distance named KLD; then the relation between the incipient fault amplitude and the SFI is analyzed and obtained by theoretical derivations and function approximation. The whole FE strategy is presented in Section 4. Section 5 presents FE results and discussions about the performance of the proposed method. Finally, the conclusion drawn from the research, along with discussions, is given in Section 6.

2.1. A Three-Phase Three-Level VSI of CRH2. The CRH2 traction system is a complex electrical system consisting of several types of electrical equipment, such as traction transformers, traction rectifiers, traction inverters, and traction motors. The traction inverter is regarded as a major part, which supplies the variable-velocity, variable-frequency voltage. A three-phase three-level voltage source inverter (VSI) of CRH2 is mainly composed of a filter circuit, an overvoltage suppression unit, Insulated-Gate Bipolar Transistors (IGBTs), and so on. As shown in Figure 1, every IGBT switch is composed of a transistor unit (TU) and a diode unit (DU), and there are twelve switches in the inverter altogether.
Incipient faults in CRH2 have not been sufficiently considered. Numerous studies [5, 16, 17] have mentioned that an incipient fault is characterized by the following features:
(i) Qualitative aspect: the degradation of system performance is so insignificant that it does not trigger any set alarm.
(ii) Quantitative aspect: the percentage deviation of the faulty signal from its actual value under normal conditions is quite small, for example, ranging from 1% to 10%.
(iii) Necessity aspect: if the incipient fault is not detected and no action is taken, it will develop into a fault or even a failure as time goes on.
Based on the main circuit protection list of CRH2, there are altogether 41 types of faults [1]. Although some abnormalities of 3%-15% can be detected, many missed alarms and false alarms still occur. For example, the overcurrent fault encoded as Fault 28 will not trigger any fault alarm.
A physical model of a common train system derived from first principles contains 84 differential equations [18]. For the more complex CRH2, it is impossible to set up an accurate system model. Besides, the high-frequency charging and discharging of the many energy storage devices makes the signals change frequently. As a result, all signals in the inverter obey non-Gaussian distributions. In the following two subsections, the preliminaries of the SFI are presented.

Basic Form of PCA Process Monitoring. PCA is a popular multivariate statistical method proposed for reducing the dimensionality of a mass of correlated data [19]. The lower-dimensional subspace obtained by projection retains most of the features of the original data [20]. This data-driven method and its many variants [13, 21, 22] have been successfully applied in fault detection and diagnosis [14, 15].
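The offline training data $X_{\mathrm{off}}$ contains $N$ sampling measurements from $m$ sensors and can be denoted as $X_{\mathrm{off}} = [x_1, \ldots, x_N]^T \in \mathbb{R}^{N \times m}$, where $x_i$ is the $i$-th sample. It is usually normalized to zero mean and unit variance before PCA modeling, and its normalized form is written as $\bar{X}_{\mathrm{off}}$. Then, the sample covariance matrix can be denoted as

$$S_{\mathrm{off}} = \frac{1}{N-1}\,\bar{X}_{\mathrm{off}}^T \bar{X}_{\mathrm{off}}. \quad (1)$$

Performing SVD on the covariance matrix $S_{\mathrm{off}}$ gives

$$S_{\mathrm{off}} = P \Lambda P^T, \quad (2)$$

where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_m)$ and all eigenvalues are ranked in descending order. As given in [20], the loading matrix $P$ and the diagonal eigenvalue matrix $\Lambda$ are usually divided into the following forms according to $l$:

$$P = [P_{pc}\ \ P_{res}], \qquad \Lambda = \mathrm{diag}(\Lambda_{pc}, \Lambda_{res}), \quad (3)$$

where $P_{pc} \in \mathbb{R}^{m \times l}$, $P_{res} \in \mathbb{R}^{m \times (m-l)}$, $\Lambda_{pc} = \mathrm{diag}(\lambda_1, \ldots, \lambda_l)$, $\Lambda_{res} = \mathrm{diag}(\lambda_{l+1}, \ldots, \lambda_m)$, and $l$ is the number of principal components. In this application, $l$ can be obtained by cumulative percent variance. Then the principal and residual parts of $\bar{X}_{\mathrm{off}}$ can be calculated by $\hat{X} = \bar{X}_{\mathrm{off}} P_{pc} P_{pc}^T$ and $\tilde{X} = \bar{X}_{\mathrm{off}} - \hat{X}$, respectively.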
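The PCA modeling steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `pca_model`, the 95% cumulative percent variance threshold, and the one-row-per-sample data layout are assumptions made for the example and are not specified in the paper.

```python
import numpy as np

def pca_model(X_off, cpv_threshold=0.95):
    """Fit a basic PCA monitoring model.

    X_off: (N, m) array, one row per sample and one column per sensor
    (assumed layout). cpv_threshold is an illustrative choice.
    """
    X_off = np.asarray(X_off, dtype=float)
    mean = X_off.mean(axis=0)
    std = X_off.std(axis=0, ddof=1)
    X_bar = (X_off - mean) / std                 # zero mean, unit variance
    n = X_bar.shape[0]
    S = X_bar.T @ X_bar / (n - 1)                # sample covariance matrix
    P, eigvals, _ = np.linalg.svd(S)             # eigenvalues in descending order
    cpv = np.cumsum(eigvals) / eigvals.sum()     # cumulative percent variance
    l = min(int(np.searchsorted(cpv, cpv_threshold)) + 1, len(eigvals))
    P_pc, P_res = P[:, :l], P[:, l:]             # principal / residual loadings
    X_hat = X_bar @ P_pc @ P_pc.T                # principal part
    X_tilde = X_bar - X_hat                      # residual part
    return {"mean": mean, "std": std, "P_pc": P_pc, "P_res": P_res,
            "eigvals": eigvals, "l": l, "X_hat": X_hat, "X_tilde": X_tilde}
```

Because the data are scaled to unit variance, the eigenvalues returned here sum to the number of sensors, which gives a quick sanity check on the decomposition.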

Kullback-Leibler Divergence.
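The KLD is one of the most fundamental quantities in information theory [23]. It has been regarded as a powerful tool in many applications [24, 25]. Its original definition can be found in [26] and has the following form:

$$I(f_1 \| f_2) = \int f_1(x) \log \frac{f_1(x)}{f_2(x)}\, dx, \quad (4)$$

where $\log(\cdot)$ has base $e$, $f_1(x)$ and $f_2(x)$ are continuous probability density functions (PDFs), and $I(f_1 \| f_2)$ is the KLD of $f_2(x)$ with respect to $f_1(x)$. As pointed out in [23], the KLD has the characteristic of a Pythagorean inequality, so a symmetric quantity is usually defined as

$$\mathrm{KLD}(f_1, f_2) = I(f_1 \| f_2) + I(f_2 \| f_1). \quad (5)$$

Given two normal PDFs such that

$$f_1 \sim N(\mu_1, \sigma_1^2), \qquad f_2 \sim N(\mu_2, \sigma_2^2), \quad (6)$$

then, for positive variances, the KLD in (5) between the two normal distributions above is equal to

$$\mathrm{KLD}(f_1, f_2) = \frac{1}{2}\left[\frac{\sigma_1^2}{\sigma_2^2} + \frac{\sigma_2^2}{\sigma_1^2} + (\mu_1 - \mu_2)^2\left(\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}\right) - 2\right]. \quad (7)$$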
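As a quick numerical check of the closed form for two normal distributions, the sketch below builds the symmetric quantity from the two one-sided divergences. The function names are illustrative and do not come from the paper.

```python
import math

def kl_gauss(mu1, var1, mu2, var2):
    """One-sided KLD I(f1 || f2) for f1 = N(mu1, var1), f2 = N(mu2, var2)."""
    return 0.5 * (math.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def symmetric_kld(mu1, var1, mu2, var2):
    """Symmetric KLD: the sum of the two one-sided divergences."""
    return kl_gauss(mu1, var1, mu2, var2) + kl_gauss(mu2, var2, mu1, var1)

def symmetric_kld_closed_form(mu1, var1, mu2, var2):
    """Closed form for normal PDFs; the logarithmic terms cancel in the sum."""
    d2 = (mu1 - mu2) ** 2
    return 0.5 * (var1 / var2 + var2 / var1 + d2 * (1.0 / var1 + 1.0 / var2) - 2.0)
```

The logarithmic terms of the two one-sided divergences cancel, which is why only the means and variances appear in the symmetric form.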

Incipient Fault Estimation Method
3.1. Data Preprocessing. As explained in Section 2.1, CRH2 is a non-Gaussian system, which prevents existing PCA-based FDD methods from being applied directly. Therefore, the system data should be preprocessed according to its properties so that the measurements from the three current sensors in Figure 1 approximately follow a Gaussian distribution. To this end, some preprocessing steps based on the characteristics of the selected signals are implemented before incipient FE.
Let $I_{abc,\mathrm{off}} = [i_a\ \ i_b\ \ i_c] \in \mathbb{R}^{N \times m}$ be the offline normal current signals of the CRH2; it is straightforward to see that $m = 3$. Figure 2 gives the relationship among three data spaces: the $a$-$b$-$c$ coordinate, the $\alpha$-$\beta$ coordinate, and the $d$-$q$ coordinate. In Figure 1, the currents $i_a$, $i_b$, and $i_c$ are revolving axes with the same constant length in the original $a$-$b$-$c$ coordinate if there is no fault.
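In stationary coordinates, the phase currents can be transformed from the $a$-$b$-$c$ frame to the $\alpha$-$\beta$ frame by the Clarke transformation, whose mathematical formulation is

$$[i_\alpha\ \ i_\beta]^T = T_{3s/2s}\,[i_a\ \ i_b\ \ i_c]^T, \quad (8)$$

where $T_{3s/2s}$ is the Clarke transformation matrix and $\Theta$ is the angle between the coordinate $a$ and the coordinate $\alpha$. In order to reduce complexity, $\Theta$ is chosen as 0, as depicted in Figure 2. The $d$-$q$ coordinate in Figure 2 is a rotary frame whose rotary angular velocity $\omega$ is the synchronous angular velocity, and the transformation angle can be calculated by $\theta = \omega t$. Then the currents $i_d$ and $i_q$ can be derived from $i_\alpha$ and $i_\beta$ in the stationary $\alpha$-$\beta$ frame, and the Park transformation matrix $T_{2s/2r}$ is given as follows:

$$T_{2s/2r} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}. \quad (9)$$

Furthermore, the transformation matrix from the $a$-$b$-$c$ coordinate to the $d$-$q$ coordinate can be obtained by combining the Clarke and Park transformation matrices, and its form can be given as

$$T_{3s/2r} = T_{2s/2r}\, T_{3s/2s}. \quad (10)$$

With $I_{dq,\mathrm{off}} = [i_d\ \ i_q] \in \mathbb{R}^{N \times 2}$, it can be obtained by

$$I_{dq,\mathrm{off}} = I_{abc,\mathrm{off}}\, T_{3s/2r}^T, \quad (11)$$

which is applied sample by sample because $\theta$ varies with time. By the nonlinear projections in (11), $I_{dq,\mathrm{off}}$ can be expressed as

$$I_{dq,\mathrm{off}} = \hat{X}_{dq,\mathrm{off}} + \tilde{X}_{dq,\mathrm{off}} = I_{dq,\mathrm{off}}\, p_{pc} p_{pc}^T + I_{dq,\mathrm{off}}\, p_{res} p_{res}^T, \quad (12)$$

where $\hat{X}_{dq,\mathrm{off}}$ and $\tilde{X}_{dq,\mathrm{off}}$ are the principal and residual parts of $I_{dq,\mathrm{off}}$, respectively, and $p_{pc}$ and $p_{res}$ are the principal and residual vectors of $I_{dq,\mathrm{off}}$. Since $T_{3s/2r}$ is closely related to the rotary angular velocity $\omega$ of the $d$-$q$ coordinate, both subspaces must be rotary. Then, the three-phase sine currents can be projected onto RPCS and RRS, which are spanned by $p_{pc}$ and $p_{res}$, respectively.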
Remark 1. The data processing in (12) transforms the original data set $I_{abc,\mathrm{off}}$ into a new rotary data set $I_{dq,\mathrm{off}}$ which approximately obeys a Gaussian distribution. It makes an important connection between $\Lambda$ in (2) and $\sigma^2$ in (7). Rewriting the score matrix as $T = (t_1, \ldots, t_m)$ and combining (1) with (2), it is easy to see that $\sigma_i^2 = \mathrm{var}(t_i) = \lambda_i$, where $i = 1, \ldots, m$. Taking $f(t)$ as the reference model obtained from offline data and calculating $\hat{f}(t)$ from the online data, the KLD can then be used as an SFI to estimate the amplitude of an incipient fault online.
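The projection from the three phase currents to the rotary frame can be illustrated with the following NumPy sketch. It assumes the amplitude-invariant Clarke scaling (factor 2/3) with the angle between the $a$ and $\alpha$ axes set to 0, and the common Park sign convention; the scaling actually used in the paper is not recoverable from the text, and the function name `abc_to_dq` is hypothetical.

```python
import numpy as np

def abc_to_dq(i_abc, theta):
    """Map three-phase currents to the rotary d-q frame.

    i_abc: (N, 3) array with columns i_a, i_b, i_c.
    theta: (N,) array of transformation angles (theta = omega * t), in rad.
    Assumes the amplitude-invariant Clarke scaling 2/3 (an assumption).
    """
    i_abc = np.asarray(i_abc, dtype=float)
    theta = np.asarray(theta, dtype=float)
    clarke = (2.0 / 3.0) * np.array([[1.0, -0.5, -0.5],
                                     [0.0, np.sqrt(3) / 2, -np.sqrt(3) / 2]])
    i_ab = i_abc @ clarke.T                      # stationary alpha-beta frame
    c, s = np.cos(theta), np.sin(theta)
    i_d = c * i_ab[:, 0] + s * i_ab[:, 1]        # Park rotation, row by row
    i_q = -s * i_ab[:, 0] + c * i_ab[:, 1]
    return np.column_stack([i_d, i_q])
```

Under these assumptions, balanced fault-free currents map to constant $d$-$q$ values, while a constant bias added to one phase appears as a sinusoid at the synchronous frequency, which matches the periodic behavior of the fault described later in the paper.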

SFI.
After the above nonlinear projections, the measurement signals approximately obey a Gaussian distribution. Let the reference PDF of the $i$-th score be $f(t_i) = N(\mu_i, \sigma_i^2)$. After data normalization of $I_{dq,\mathrm{off}}$, every column of $\bar{I}_{dq,\mathrm{off}}$ has zero mean and unit variance. Then $\mu_i$ is zero, and the variance parameter $\sigma_i^2$ of each principal score $t_i$ can be obtained from $\Lambda_{pc}$ in (3). For online sampling, the online PDF $\hat{f}(t_i) = N(\hat{\mu}_i, \hat{\delta}_i^2)$ can be estimated via the scores in RPCS and RRS.
If the incipient fault is to be estimated accurately, an SFI with high sensitivity to faults should be chosen. In order to emphasize the unpredictable small changes caused by incipient faults, the SFI using KLD can be defined as

$$\mathrm{SFI}_i = \mathrm{sgn}(\hat{\mu}_i - \mu_i)\, \mathrm{KLD}(f, \hat{f}), \quad (13)$$

where $\mathrm{sgn}(x)$ is the sign of $x$.
The SFI in (13) contains two sets of parameters: $\mu_i$ and $\sigma_i^2$, obtained from offline data, and $\hat{\mu}_i$ and $\hat{\delta}_i^2$, calculated in real time.
Remark 2. If the slight abnormality is to be estimated accurately, the SFI in (13) should satisfy four conditions: (i) sufficient sensitivity to faults; (ii) appropriate robustness to noise; (iii) a precise relation between the fault amplitude and the SFI; (iv) a relation from the fault amplitude to the SFI that is a one-to-one (bijective) mapping.
For condition (i), the proposed SFI is more sensitive than the Mahalanobis distance and the Euclidean distance; a similar theoretical demonstration can be found in [27]. For condition (ii), the satisfactory robustness to noise is analyzed in Section 5.3. Condition (iii) is easy to achieve because many curve fitting techniques can be used. In this application, a neural network is adopted; this relation and the supporting experimental results are presented in Sections 3.3 and 5.2, respectively. The mapping relation in condition (iv) holds because the sgn function ensures that the fault and the SFI share the same sign, and the positive correlation between the fault amplitude and the SFI follows from Sections 3.3 and 3.4.
Remark 3. The developed SFI in (13) has three advantages over other related works. Firstly, it is more effective in detecting changes in the mean, which the methods in [16, 17] cannot do because they assume that $(\mu_i - \hat{\mu}_i)$ is 0. Secondly, it can determine the sign of the faulty parameter, which is useful in the subsequent fault isolation. Thirdly, it is more robust to noise and disturbances than the methods in [16], because a moving window strategy is used in (13).
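A minimal sketch of how such a signed, KLD-based indicator could be computed from one moving window of scores is given below. It assumes that the sign is taken from the mean shift of the window relative to the offline reference, which is an interpretation of the text rather than a confirmed detail, and the function name `signed_kld_sfi` is hypothetical.

```python
import numpy as np

def signed_kld_sfi(window_scores, mu_ref, var_ref):
    """Signed symmetric KLD between a reference normal PDF and a window estimate.

    window_scores: 1-D array of the scores inside the moving window.
    mu_ref, var_ref: mean and variance of the score from offline data.
    The sign convention (sign of the mean shift) is an assumption.
    """
    window_scores = np.asarray(window_scores, dtype=float)
    mu_hat = window_scores.mean()                # online mean estimate
    var_hat = window_scores.var(ddof=1)          # online variance estimate
    d2 = (mu_hat - mu_ref) ** 2
    kld = 0.5 * (var_hat / var_ref + var_ref / var_hat
                 + d2 * (1.0 / var_ref + 1.0 / var_hat) - 2.0)
    return np.sign(mu_hat - mu_ref) * kld
```

One consequence of this sign convention is that a window whose mean exactly equals the reference mean yields an indicator of zero, even if its variance differs.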

Covariance Matrix Analysis.
For simplicity, $I_{dq,\mathrm{off}}$ is replaced by $X_{\mathrm{off}} = [x_1\ \ x_2]$ in the following analysis. In this study, let the $i$-th measurement variable of the fault-free portion be $x_i$. In the presence of an incipient fault $f_i$, the sample value $x_i^*$ can be described as follows [13]:

$$x_i^* = x_i + f_i. \quad (14)$$

In order to simplify the analysis, (14) can be further rewritten as

$$x_i^* = x_i (1 + a_i), \quad (15)$$

where $a_i = f_i ./ x_i$ is the amplitude variation rate on $x_i$. Assume that there are $N$ observations; then (15) can be expressed in multivariate statistical form as

$$X^* = X \circ (\mathbf{1} + \Xi), \quad (16)$$

where $\Xi = [a_1, \ldots, a_i, \ldots, a_m] \in \mathbb{R}^{N \times m}$ is the fault magnitude rate (FMR) matrix, $\mathbf{1}$ is the all-ones matrix of the same size, and $\circ$ denotes the Hadamard product. Assume that the $i$-th sensor is affected by an incipient fault after the sampling step $K$, and that the change of the fault amplitude $a$ depends on the sampling step $k$; then $\Xi = [0 \cdots a_i \cdots 0]$, and the FMR vector is $a_i = [0 \cdots a_K \cdots a_N]^T$. For simplicity, the single FMR value $a_k$ is abbreviated as $a(k)$, where $k = 1, \ldots, N$.
Here, $a(k) = 0$ when $k \in [1, K-1]$, and there exists $a(k) \neq 0$ if $k \geq K$. For an incipient fault, the fault magnitude rate is taken to be constant within a small moving window of size $w$, so that $a(k-w+1) = \cdots = a(k) = a$. After data normalization using the means and variances obtained from the offline data set, $X^*$ in (16) can be rewritten as in (17), and the online covariance matrix $S_a$ is given by (18). Based on the computation rule of the Hadamard product, $S_a$ can be written as in (19), that is, as $S_{\mathrm{off}}$ plus a perturbation $\Delta S$ that depends on the FMR. Asymptotically, $\Delta S \to 0$ as $a \to 0$. Comparing the covariance matrix $S_a$ with $S_{\mathrm{off}}$, the statistical parameters $\Delta S$ are changed only slightly by the amplitude $a$ of the incipient fault. Because the spanning vectors in RPCS and RRS share the same directions, the eigenvalues are only slightly affected by $a$ through the expression of $S_a$. Based on (18), relation (20) holds; that is to say, $\Delta\Lambda \approx 0$ when the measurements are affected by a tiny abnormal value. A Taylor expansion of $\Lambda_a$ can then be used in the neighborhood of $a = 0$, as in (21). From (20), $\Lambda_a = P^T S_a P$. Substituting (21) into (20) yields (22) and (23), in which $\partial^3 \lambda_a / \partial a^3 = \cdots = \partial^n \lambda_a / \partial a^n = 0$. It is interesting to find that all partial derivatives of $\lambda_a$ of order higher than 2 are equal to 0. Thus, $\Delta\Lambda$ can be described as in (24).
Remark 4. From (24), $\Delta\Lambda$ is only affected by the fault amplitude. As $\Lambda = \mathrm{Var}(T_{\mathrm{off}})$ and $\Delta\Lambda = \mathrm{Var}(T_a)$, the KLD between the offline and online PDFs can be calculated directly from $\Lambda$ and $\Delta\Lambda$ instead of obtaining the PDFs by kernel density estimation, which remarkably reduces the computational cost.

3.4. Incipient FE Analysis. Recall (13) and take $f_i$ and $\hat{f}_i$ to be the PDFs of $t_i$ and $\hat{t}_i$; then the $i$-th SFI can be written as in (25), whose three coefficients, defined in (26), can be obtained from online data. From these expressions, the variations of the SFI caused by an incipient fault depend only on the fault magnitude. In fact, there exist four candidate analytical expressions for $\hat{a}$, which allow one to derive the relation between the fault amplitude and the KLD. However, two main reasons make this solution unsuitable: (i) the four analytical expressions of $\hat{a}$ may lead to an undetermined result; (ii) some approximations are used in the above expressions, which inevitably introduce an estimation bias on the fault amplitude even when the system noise is considered. By considering the relation in (25), the SFI can hence be expressed as a function of the fault amplitude, as in (27). The correlation in (27) must be nonlinear and monotonically increasing. An alternative solution based on a neural network is therefore used to determine this nonlinear relationship. Using a backpropagation neural network, the correlations for the application in Section 5 are shown in Figures 3 and 4. Both curves reflect the correlations described in (27).
The curves in both Figures 3 and 4 provide an indicator that grows with the abnormality, which makes FE possible.
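Because the relation between fault amplitude and SFI is monotonic, it can be inverted once a calibration curve is available. The sketch below uses piecewise-linear interpolation as a simple stand-in for the backpropagation neural network used in the paper; it is not the authors' method, and the function name `fit_inverse_map` and the toy calibration data in the usage note are hypothetical.

```python
import numpy as np

def fit_inverse_map(amplitudes, sfi_values):
    """Build an estimator a_hat = g^{-1}(SFI) from calibration pairs.

    amplitudes, sfi_values: 1-D arrays of matching fault amplitudes and
    SFI values. The relation must be strictly increasing (monotonic).
    Linear interpolation is used here in place of a neural network.
    """
    a = np.asarray(amplitudes, dtype=float)
    s = np.asarray(sfi_values, dtype=float)
    order = np.argsort(s)
    a, s = a[order], s[order]
    if np.any(np.diff(s) <= 0) or np.any(np.diff(a) <= 0):
        raise ValueError("calibration data must be strictly increasing")

    def estimate(sfi):
        # np.interp clamps queries outside the calibrated SFI range.
        return np.interp(sfi, s, a)

    return estimate
```

For example, with a toy calibration such as `a_grid = np.linspace(-0.1, 0.1, 21)` and `s_grid = np.sign(a_grid) * 50.0 * a_grid**2`, calling `fit_inverse_map(a_grid, s_grid)(0.125)` returns an amplitude close to 0.05. Queries outside the calibrated range are clamped to the end points, so the calibration should cover the fault amplitudes of interest.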

On-Line Fault Estimation Strategy
The flow diagram of the fault estimation strategy is given in Figure 5; it contains data preprocessing, SFI computation, and fault estimation. Data preprocessing is one of the key steps in Figure 5, since it makes the original data set approximately obey a Gaussian distribution in the rotary space. The rotary speed depends on the rotary angular velocity $\omega$ of the traction motor in CRH2. In addition, data normalization is needed, which gives the same weighting to $i_d$ and $i_q$ and simplifies the online fault diagnosis algorithms. The data normalization steps are summarized in Algorithm 5, and the projection of the original data into the rotary space is embedded in Algorithms 6 and 7. The complete fault estimation strategy shown in Figure 5 can be implemented by Algorithms 6 and 7.
Algorithm 5. Consider the following.

Offline Modeling Steps
Algorithm 6. Consider the following.
Step 1. Collect normal operating data $I_{abc,\mathrm{off}} \in \mathbb{R}^{N \times 3}$ from the three current sensors in CRH2 under steady state.
Step 5. Calculate the score matrix in RPCS.
Step 6. Obtain the mean value and variance of the score vector.
Step 7. Determine the nonlinear one-to-one mapping relations between $a$ and $\mathrm{SFI}_1$, $\mathrm{SFI}_2$.

Online FE Steps
Algorithm 7. Consider the following.
Step 1. Load new current data $I_{\mathrm{new}}$ from the running CRH2.
Step 2. Project $I_{\mathrm{new}}$ into RPCS and normalize the data $I_{dq,\mathrm{new}}$ as $\bar{I}_{dq,\mathrm{new}}$ using Algorithm 5.
Step 4. Calculate the mean value and variance of the online score vector.
Step 5. Compute the SFI by (13).
Step 6. Estimate the fault amplitude according to the relations obtained in Step 7 of Algorithm 6, and then go back to Step 1.
Remark 8. The reasons for adopting the moving window approach in Step 3 of Algorithm 6 are the following: (i) a single sampled value has no mean and variance; (ii) the moving window approach can weaken the influence of noise.
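The offline and online steps can be tied together as in the following sketch, which starts from data already projected into the $d$-$q$ frame. Several steps of the original algorithms are not recoverable from the text, so this is only one plausible arrangement: the function names, the dictionary layout, the use of both score directions, and the sign convention of the SFI are all assumptions.

```python
import numpy as np

def offline_model(I_dq_off):
    """Offline stage: normalize, decompose the covariance, store score statistics."""
    I_dq_off = np.asarray(I_dq_off, dtype=float)
    mean = I_dq_off.mean(axis=0)
    std = I_dq_off.std(axis=0, ddof=1)
    X = (I_dq_off - mean) / std
    S = X.T @ X / (X.shape[0] - 1)
    P, _, _ = np.linalg.svd(S)                   # loading vectors as columns
    T = X @ P                                    # offline score matrix
    return {"mean": mean, "std": std, "P": P,
            "mu": T.mean(axis=0), "var": T.var(axis=0, ddof=1)}

def online_sfi(I_dq_new, model, w=20):
    """Online stage: one signed-KLD indicator per score and per step.

    Rows before the first full window of w samples are left as NaN.
    """
    X = (np.asarray(I_dq_new, dtype=float) - model["mean"]) / model["std"]
    T = X @ model["P"]
    out = np.full(T.shape, np.nan)
    for k in range(w - 1, T.shape[0]):
        win = T[k - w + 1:k + 1]                 # the w most recent scores
        mu_hat = win.mean(axis=0)
        var_hat = win.var(axis=0, ddof=1)
        d2 = (mu_hat - model["mu"]) ** 2
        kld = 0.5 * (var_hat / model["var"] + model["var"] / var_hat
                     + d2 * (1.0 / model["var"] + 1.0 / var_hat) - 2.0)
        out[k] = np.sign(mu_hat - model["mu"]) * kld
    return out
```

The resulting indicator columns would then be passed to the amplitude mapping determined offline in order to obtain the estimated fault amplitude.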

Results and Illustrations
In order to test the performance of the proposed method, experiments are conducted in this section on the CRH2 platform developed by Central South University [28], as shown in Figure 6. The experimental setup, which supports fault injection for CRH2, consists of a DSP controller, an upper computer, dSPACE, and data acquisition and display devices. The main parameters of the electric multiple unit (EMU) are given in Table 1. In this part, we concentrate on incipient sensor FE to illustrate the effectiveness of the proposed methodology.

Incipient Fault Injections.
When the reference running speed of 200 km/h is given, the trends of the six curves no longer change after 1 s, which indicates that the traction system is running in the steady state. Therefore, the historical data generated after 1 s are used to establish the offline data model. In this paper, we consider the three continuous output currents as the selected signals.
Without loss of generality, one phase current is chosen as the corrupted signal, and two types of incipient current sensor faults are considered.
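The two fault types and the measurement noise can be emulated on a synthetic phase current as sketched below. The fault is expressed as a fraction of the nominal current amplitude, which is an assumption, as are the function names and any numerical values passed to them.

```python
import numpy as np

def bias_fault(t, amp, t_fault, rate):
    """Constant bias of rate * amp that starts at time t_fault."""
    return np.where(t >= t_fault, rate * amp, 0.0)

def ramp_fault(t, amp, t_start, t_end, rate_max):
    """Drift that climbs linearly from 0 to rate_max * amp over [t_start, t_end]."""
    frac = np.clip((t - t_start) / (t_end - t_start), 0.0, 1.0)
    return rate_max * amp * frac

def add_noise(x, snr_db, rng):
    """Add zero-mean Gaussian noise so that the signal-to-noise ratio is snr_db."""
    p_signal = np.mean(x ** 2)
    p_noise = p_signal / 10.0 ** (snr_db / 10.0)
    return x + rng.normal(0.0, np.sqrt(p_noise), size=x.shape)
```

A corrupted measurement is then obtained by adding one of the fault signals to the clean phase current and passing the sum through `add_noise`, for example with a 30 dB SNR and a drift that reaches 1% of the amplitude.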

Incipient Fault Estimation Results
(1) Results for $f_1$. For the normal current signals and those corrupted by $f_1$, both data sets under SNR = 30 dB are shown in the rotary space in Figure 7. The added sensor noise gives an SNR of 30 dB, which is a reasonable noise level for an electric system. The equivalent value of the actual fault is presented in Figure 8. As the current sensors are corrupted by zero-mean Gaussian noise, an extremely small value exists even under normal conditions, which can be filtered out by the proposed method. After 0.5 s, the constant bias fault becomes a steady oscillation as a result of the preprocessing, as shown in Figure 8. In the rotary data space, this oscillating abnormality is a sine wave whose period and amplitude depend on the synchronous angular velocity $\omega$ and the abnormal value, respectively. Based on the online strategy in Section 4, two latent components are selected to describe the data model in the rotary space, because the two eigenvalues in (30) are approximately equal. In Figures 9 and 10, the two SFIs display their sensitivity to tiny distortions. Both figures clearly show this effectiveness for a small bias. This performance is well suited to practical application under a 30 dB noise level.
In Figures 11 and 12, the red line is the actual incipient fault value on the corrupted phase current, and the blue line marked by + is the value $\hat{a}$ estimated by the proposed method. In the enlarged view in Figure 11, the estimated amplitude is close to the actual fault value, which shows the correctness of the proposed incipient fault estimation method. However, the results exhibit a small, acceptable delay. The reason for this phenomenon is discussed in the following subsection. Similarly, the result based on $\mathrm{SFI}_2$ in Figure 12 also shows satisfactory estimation performance.
(2) Results for $f_2$. For a ramp incipient fault, its waveform and the corresponding corrupted signal are shown in Figure 13. In this case, the tiny amplitude $a$ climbs steadily from 0 to 1% as the time ranges from 0.5 s to 1 s. This makes FE more difficult than for the constant bias distortion. After the nonlinear projection onto the rotary space, its gradually changing characteristic is preserved, as depicted in Figure 14.
For this type of slight abnormality, Figures 15 and 16 show the actual fault value and the estimated value as the red line and the marked blue line, respectively. Even for a very tiny fault amplitude, the accuracy of the results is still acceptable.

Discussions. Based on (11), if there exists an incipient fault $f_j$ in the current sensors, its form in the rotary space can be described as

$$[f_d\ \ f_q]^T = T_{3s/2r}\,[f_a\ \ f_b\ \ f_c]^T,$$

where $T_{3s/2r}$ is the transformation matrix from the $a$-$b$-$c$ coordinate to the $d$-$q$ coordinate, $f_j$ is the incipient fault in the $j$-th sensor, $j = a, b, c$, and $f_s$ is its equivalent form in the $s$ coordinate, $s = d, q$. Therefore, both bias and drifting faults are transformed into periodic signals, as shown in Figures 7 and 14. In this paper, the number of scores within the moving window is chosen as 20. Following the results in the two examples, a window size of $w = 20$ is sufficient for good FE performance. However, it has a small flaw, namely a short delay. In the online stage, the estimated mean $\hat{\mu}$ and variance $\hat{\delta}^2$ of the score vector in (34) are calculated from multiple current measurements. This leads to two results.
(i) Improving the robustness: by using the moving window approach in the online computation, the effect of sensor noise can be notably reduced. Theoretically, the effect of noise is completely filtered out as the window size tends toward infinity.
(ii) Producing a time or step delay: because the fault value is periodic after the nonlinear projection, its amplitude fluctuates between its trough and peak in the rotary space. Since multiple score values are combined, the evaluation function at step $k$ is affected by the earlier scores $t_{k-w+1}, \ldots, t_{k-1}$ in the window, where $w$ denotes the window size. In fact, the length of the delay depends on the moving window size. If $w$ is chosen as 1, the SFI is determined only by the current score value, and the delay is eliminated.
Therefore, the tradeoff between robustness and estimation delay should be considered in choosing the window size $w$. From the asymptotic behavior of the SFI, increasing the number of score values no longer affects the robustness when $w > 30$. In fact, this effect is approximately achieved if $w \geq 20$. This characteristic was illustrated for incipient fault detection in [27]. Moreover, the waveforms of the SFI and of the estimated values are similar to the actual fault value. As a longer window is chosen, the similarity among them is reduced, and the fault estimation delay becomes larger. In addition, a large window size is problematic when the sensor precision degrades.
The sampling time of the experimental platform in Table 1 gives the step time. The constant delay time is then 0.4 ms, which is obtained from the size of the moving window. From the enlarged views in the four fault estimation figures, this short delay is acceptable for industrial application. Among them, Figures 11 and 12 show better amplitude estimation because of the larger fault amplitude.
Beyond the moving window size, the Fault-to-Noise Ratio (FNR) is introduced to explain the effectiveness of the proposed method using the results in Figures 17 and 18. The FNR is defined as $\mathrm{FNR} = 10\log(\sigma_f^2/\sigma_n^2)$, where $\sigma_f^2$ and $\sigma_n^2$ are the fault power and the noise level, respectively. Although the results in Figures 17 and 18 are weaker than those in Figures 11 and 12, they show more useful information by varying the FNR. For the drifting fault in Figure 13, the peak FNR level in every period ranges from minus infinity to 10 dB after 0.5 s. From the FE results, it can be seen that the proposed SFI is sensitive enough to emphasize such tiny abnormalities under high noise levels.
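The FNR definition can be evaluated directly from sampled fault and noise signals, as in the sketch below. It assumes a base-10 logarithm and uses the mean-square value as the power estimate; both choices, and the function name `fnr_db`, are assumptions made for illustration.

```python
import numpy as np

def fnr_db(fault, noise):
    """Fault-to-Noise Ratio in dB: 10 * log10(fault power / noise power)."""
    fault = np.asarray(fault, dtype=float)
    noise = np.asarray(noise, dtype=float)
    p_fault = np.mean(fault ** 2)                # mean-square fault power
    p_noise = np.mean(noise ** 2)                # mean-square noise power
    with np.errstate(divide="ignore"):           # zero fault gives -inf dB
        return 10.0 * np.log10(p_fault / p_noise)
```

A fault signal that is identically zero gives an FNR of minus infinity, which is the lower end of the range quoted above for the drifting fault.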

Conclusion
In this paper, real-time incipient sensor FE in CRH2 is investigated. An effective SFI based on KLD and PCA is developed and analyzed. In order to meet the latent requirement of PCA, a rotary space is introduced for the first time into the data-driven FDD and FE domain. The proposed FE methodology not only emphasizes tiny abnormalities but is also insensitive to sampling noise. By testing incipient faults on the experimental setup of CRH2, the feasibility and efficiency of the developed method are validated. The effect of several factors on the FE results has also been explored. Moreover, the proposed method can be extended to other electrical systems on the basis of the nonlinear projections and the SFI, from both theoretical and practical points of view.

Figure 2: Two data space transformations to deal with non-Gaussian data set.

Figure 5: Flow diagram of incipient FE based on the proposed approach.

Figure 7: The influence on rotary data space caused by $f_1$.

Table 1: The system parameters for EMU in CRH2.