A Combined Markov Chain Model and Generalized Projection Nonnegative Matrix Factorization Approach for Fault Diagnosis

The presence of sets of incomplete measurements is a significant issue in the real-world application of multivariate statistical processmonitoringmodels for industrial process fault detection. Since themissing data in the incompletemeasurements are usually correlated with some of the available variables, these measurements can be used if an efficient algorithm is presented. To resolve the problem, a novel method combiningMarkov chainmodel and generalized projection nonnegative matrix factorization (MCMGPNMF) is proposed to detect and diagnose the faults in industrial process.The basic idea of the approach is to useMCM-GPNMF to extract the dominant variables from incomplete process data and to combine themwith statistical processmonitoring techniques. T2 G and SPEG statistics are defined as online monitoring quantities for fault detection and corresponding contribution plots are also considered for fault isolation. The proposed method is applied to a 1000MW unit boiler process. The simulation results clearly illustrate the feasibility of the proposed method.


Introduction
As industrial process becomes more and more complex, process monitoring and diagnosis techniques are gaining importance for plant safety, maintenance cost, and profit margins.Multivariate statistical process monitoring (MSPM) techniques have been widely used to build statistical models for some unmeasured variables and for establishing online monitoring schemes for industrial process [1,2].These models extract a small number of latent variables, which in a manner better summarize the properties contained in the original variables.Monitoring and diagnosis using these latent variables are both simpler and more powerful than using the original variables [3].
It is well known that the traditional MSPM models usually require that the process data on all variables must be complete.In practice, however, one of the challenges in applying these models is to deal with the process data sets that contain some missing observations.Sometimes, more than 50% of industrial process data would contain the missing data [4].Since the missing data in the incomplete measurements is usually correlated with some of the available variables, the conventional MSPM methods generally eliminate the sample data in the data matrix that contain them, but doing so will leave the corresponding nature of process unknown.
It is of great importance to determine how to use the incomplete process data sets to build the normal operating model.The well-known Markov chain model (MCM), a typical stochastic process model, is one of commonly used prediction approaches.The most basic feature of MCM is "Markov property," also known as "no aftereffect."The next process variable state is predicted by the transition probability matrix obtained using MCM to predict the mobility of measurements if the original time series satisfies the conditions of the Markov chain (MC) [5,6].The MC prediction method has been widely used in various fields.In Jeon and Lee's research, MC-based prediction routing methods are proposed to select the optimal behavior nodes [7].In order to keep balance of electric power system, Yoder et al. use the MCM to predict short-term wind power [8].
Nonnegative matrix factorization (NMF) is a novel multivariate data analysis and dimension reduction technique that has many applications in spectroscopy, data mining, and pattern recognition [9].NMF and its variant methods are typically applied to high-dimensional data where each element has a non-negative value, and they find a low rank approximation from the historical process data sets.Unlike the traditional MSPM methods, the NMF-based algorithms do not have any assumption about the nature of the process variables except for nonnegativity.The nonnegativity restriction lets only additions in the factorization process.This property makes NMF obtain sparse and part-based subspace representations of the original data sets [10,11].Therefore, NMF-based methods have potential superiority to solve the monitoring and diagnosis problem of complex industrial process.
Normally, the NMF-based algorithms require that the measurement data be nonnegative.In practice, however, the process data of industrial process may be not fulfilling this constraint.Due to the difference in unit selection, the collected data is likely to contain negative numbers.Although it is possible by adjusting the unit to make negative data satisfy the nonnegative constraints, in order to make NMF method have a wider range of applications, we want to relax the nonnegative constraints on the original data sets.In this research, we will propose a new variant of NMF to solve the above problem.It can be called generalized projection nonnegative matrix factorization (GPNMF).Then, MCM and GPNMF are combined to extract useful information from incomplete process data and to combine them with statistical process monitoring techniques.
The rest of the article is organized as follows: Section 2 introduces MCM-based prediction method.Section 3 proposes the GPNMF method.In Section 4, the MCM-GPNMFbased process monitoring method is introduced.In Section 5, a case study in a 1000 MW unit boiler process is shown and discussed.Section 6 contains the conclusions.

MCM-Based Prediction Method
2.1.Markov Chain Model.Markov chain model is a stochastic process which can be used to estimate the transition probability between the discrete states of the system, and it can be expressed by some parameters.In the first-order Markov chain model, the state at a certain moment only depends on the state of its previous moment [12].In the further research, the second-order or even higher-order Markov chain process was proposed [12,13].In these second-order or higher-order Markov chain processes, the state of a moment depends on two or more states before it.In this paper, we will use the second-order Markov chain model to estimate the missing values in the data set.
For a second-order Markov chain model, let   be a stochastic process.The set of all possible states in the stochastic process is called state space.The state space is defined as  = {1, 2, . . ., }.
For a given time series  1 <  2 < ⋅ ⋅ ⋅ <  −1 <   , its conditional probability distribution can be expressed as Markov transition probability  ,, is defined as follows: where  represents the current state of the moment and  and  represent the previous and next states of , respectively.If a stochastic process contains  states, the matrix  ∈  ×× consisting of all transition probabilities is called the state transition matrix of the second-order Markov chain.The state transition matrix  can be expressed as follows: The second-order Markovian transfer matrix has the following property.For any state , the sum of the probability of its transition to all states is 1.
The maximum likelihood estimation of transfer probability  ,, for second-order Markov chains is defined as follows: where  ,, is the transition frequency that the state is transferred from the previous moment state  and the current time state  to the next moment state .
The cumulative distribution function (CDF)  CDF is also used in the process of generating the time series.The CDF of the second-order Markov chain model is calculated as follows: The second-order Markov chain model is used to generate the process variable time series, where the initial state is completely random.Generate a random number  between 0 and 1 by the random number generator.For the second-order Markov chain model, the current time state  and the previous time state  are known.If  satisfies  CDF,,−1 <  <  CDF,, , then the next time state is .
Only a time series of state is obtained by the above steps.Next, it is necessary to transform the resulting time series into the time series of actual process variable.In addition to the fact that the first and last states do not need to be transformed, all other states need to use formula (7) to transform.
where   and   represent the upper and lower limits of a state, respectively.  represents a random number evenly distributed between 0 and 1.  represents the actual value of the process variable.

Markov Property
where  , stands for the frequency that the variable transitions from state  to state .
When the amount of data is large enough, the statistic  obeys the  2 distribution and the degree of freedom of  2 distribution is ( − 1) 2 .The statistic  can be calculated as follows: where  , stands for the probability that the variable transitions from state  to state .When the significance level  is given, the value of  2  ((− 1) 2 ) can be obtained by looking up the table.The Markov chain model can be used to process the variable if  >  2   ((− 1) 2 ) is satisfied.

Numerical Example.
In this section, the active power of a 1000 MW unit is chosen for numerical experiment.Under the influence of random noise, the power measurement is randomly fluctuating near the set point.Therefore, the active power can be regarded as a stochastic process.The normal operation data of the active power is used as experiment data vector with 500 samples.The maximum and minimum values of power in all samples are 743.86MW and 749.94 MW, respectively.Therefore, the stochastic process is divided into 16 states.In these states, the power values 743 MW and 750 MW are defined as state 1 and state 16, respectively.The other states are evenly distributed between state 1 and state 16, and the power interval between each two states is 500 kW.The significance level is given by 0.05 in the present work.The conclusion can be easily obtained through the calculation; that is,  ≫  2 0.05 (225).Therefore, The Markov chain model can be used to process this time series.Next, we will select 50 samples from the experimental data randomly and their values will be set to zero.These 50 samples represent the missing data in experiment data vector.Then, the second-order Markov chain model is used to deal with the missing data in experiment data vector.The incomplete data can be replaced by the predicted value, which is shown in Figure 1.
As seen in Figure 1, the predicted data are very close to the corresponding original data.In other words, the prediction value by second-order MCM is very effective.Therefore, this example can illustrate the feasibility of the second-order MCM algorithm.

Generalized Projection Nonnegative Matrix Factorization
3.1.NMF Algorithm.The mathematical formulation of NMF is expressed as follows.Given a process data matrix  = [ 1 ,  2 , . . .,   ] ∈  × (each column of  is a sample vector), where all elements are nonnegative and a natural number  < {, }, NMF aims to find two low rank nonnegative matrices  ∈  × and  ∈  × such that  ≈ .Here, each column of  is called basis vector.Each column of  is called an encoding and is in one-to-one correspondence with a sample vector in  [10].In other words, each sample vector   is approximated by a linear combination of the columns of , weighted by the components of ℎ  [11].
The approximate factorization  ≈  problem can be formulated as an optimization problem that uses a cost function to measure the quality of the approximation.Lee and Seung constructed a simple objective function which utilizes the square of the Euclidean to measure the distance between  and  [11].The objective function of the optimization problem can be expressed as follows: min It is well known that the function ‖ − ‖ 2 is convex in  only or  only but is not convex in both variables meanwhile.Therefore, it is realistic to find local minima by solving the above optimization problem. Lee and Seung presented an iterative algorithm to obtain the local minima.The objective function ‖ − ‖ 2 is monotonically nonincreasing under the following rules [11]: Heretofore, the multiplicative iterative algorithm is the most classic and widely used monotone algorithm.The contradiction of the optimization algorithm between convergence speed and easy use can be better coordinated by using above iterative rules.

GPNMF Algorithm.
Many improvements of NMF have been presented based on the objective function of the basic NMF algorithm by adding different regularization terms so as to increase the constraint conditions to NMF from different perspectives [14,15].On the contrary, Yuan and Oja proposed an improved NMF algorithm based on linear projection [16].It can be called projective nonnegative matrix factorization (PNMF).However, the PNMF algorithm does not guarantee that the objective function is monotonically decreasing during the iteration, and there may be oscillations.In order to solve this problem and make the NMF algorithm better adapt to the actual industrial process, we propose a new variation of NMF method named generalized projection nonnegative matrix factorization (GPNMF).This method will not only guarantee the objective function monotone nonincreasing under the new iterative rule but also make the process data not necessarily constrained to be nonnegative.In this approach, the approximate factorization  ≈  will be rewritten as  ≈    and the optimization problem can be expressed as follows: The above constrained optimization problem can be regarded as a typical application of regularized least squares problem proposed by Tikhonov in 1963 [17].The objective function of (12) can be expanded to a function about the variable  which ignores the constant term   .Hence, (12) can be rewritten as follows: The gradient matrix of ( 13) can be obtained by using matrix differential: When the value of ( 14) is equal to 0, the solution of the least squares problem in ( 12) is given by Since the nonnegative constraint for the original data matrix  is relaxed in the GPNMF algorithm, now consider the case where  contains positive and negative elements.The factors  + and  − are defined as the absolute values of all positive elements and negative elements in , respectively. + and  − are calculated separately as follows: where || represents the absolute value for all the elements of the matrix .Then the original data matrix  can be expressed as  ± =  + − − and ( 15) can be adapted as follows: It can be seen from ( 13) that the objective function  is a quartic function with respect to the coefficient matrix .
Here, a new iterative rule for the optimization problem in ( 12) is presented as follows: The objective function in (12) is invariant under this update if and only if  is at local minima of the constrained optimization problem.

Fault Diagnosis Algorithm Based on MCM-GPNMF
4.1.Initialization of GPNMF Algorithm.For the moment, NMF and its improved algorithms are solved by iteration.
It is well known that a good iteration initial value can improve the convergence speed and accuracy of the NMFbased algorithm.Although many researchers have pointed out the importance of a good initial value for the NMF algorithm in their article, most people still use the random method to initialize the NMF algorithm in the practical application.Since the NMF can only converge to the local optimal solution, the different initial values will result in different results.Langville et al. compared several commonly used initialization methods [18].For the presented GPNMF algorithm, there is only one unknown factor, that is, coefficient matrix .Besides, the approximate factorization  ≈    can clearly indicate that the coefficient matrix  is calculated to satisfy    ≈ , where  is identity matrix of size .In this research, therefore, the coefficient matrix  is initialized with a -by--dimensional matrix with ones on the diagonal and zeros elsewhere.

Statistical Monitoring Model Based on GPNMF.
The GPNMF-based statistical process monitoring model can be described as where the matrix  Ĥ and  represent the eigensubspace and the residual subspace, respectively.The matrix Ĥ = (  ) −1    is the reconstructed value of the coefficient matrix .It reflects the changes in process variables.Due to the fact that the eigensubspace and the residual subspace of GPNMF algorithm are similar to the principal component subspace and residual subspace of PCA-based method, according to the definition of monitoring statistic  2 and SPE in PCA method, we construct two new monitoring statistics based on GPNMF statistical monitoring model to monitor the change of eigensubspace and residual subspace.They are calculated as follows: where the vector Ĥ() and () represent the th column of matrices Ĥ and , respectively.The column vector x() is the reconstructed value of ().It can be calculated as follows: For the two new monitoring statistics proposed in this section, it is obviously unreasonable to assume that these two metrics are subject to a particular distribution and calculate their upper control limits like traditional monitoring algorithm.Therefore, we use kernel density estimation (KDE) which is a more general method to calculate the actual distribution information of the monitoring statistics and then determine their upper control limits.The confidence interval  is given by 99% during the calculation process.

Contribution Plot Method.
It is well known that contribution plot method has been widely used in fault isolation research field.Once a fault is detected by the fault detection method, this method will be used to isolate the variables that are most likely to cause the failure.Contribution plot method is a commonly used fault separation method in PCAbased fault diagnosis algorithm.The monitoring statistics  2 and SPE are often used in the contribution plot method.The corresponding contributions to the  2 statistic and SPE statistic can be expressed using the following equations, respectively: where  = Λ −1   and C =  −   .The vector   represents the th column of unit matrix   .The integer  represents the number of process variables in PCA.The parameters Cont  2  and Cont SPE  represent the contribution of each process variable to the monitoring statistics  2 and SPE, respectively.
Alcala and Qin made a detailed analysis and summary of the contribution plot method [19].Besides, a basic framework to construct contribution graph method was given in their research.Based on this framework, the corresponding contributions to the  2  statistic and SPE  statistic can be designed as follows: where the parameters Cont represent the contribution of each process variable to the monitoring statistics  2  and SPE  , respectively.The vector   represents the th column of unit matrix   .The natural number  represents the number of process variables in GPNMF method.

Monitoring Performance
5.1.Fault Detection.In this section, the proposed method is applied to a 1000 MW unit boiler process.The boiler process for monitoring experiment contains three control systems: main steam temperature control system, main steam pressure control system, and feedwater control system.This process mainly consists of 33 consecutive process measurements variables, including 14 temperature signals, 9 pressure signals, 9 flow signals, and 1 power signal.All the 33 process variables are used for fault detection in this experiment.The normal operation history data of the boiler process is used as training data set which has 500 samples.For the fault data, the corresponding testing data set consists of 500 observations.Remarkably, the testing data set begins with normal operation data and the abnormal data is introduced from sample 51.All these process data are sampled every 3 sec.
In order to display the monitoring performance of MCM-GPNMF-based method, a bias fault in the main steam temperature  is taken into account.The magnitude of the fault is 1% of the actual measured value.Next, we consider the condition that 10% of the measurement values in testing matrix are missing randomly.The testing matrix contains 33 variables and each variable has 500 samples.In other words, 1650 values of testing matrix are missing.The monitoring result of bias fault when 10% of testing data are missing is shown in Figure 2. performance, if all the process data is complete.However, the monitoring performance of the method has a serious decline when the testing data set contains missing data.When the fault occurs, the localized features of missing values will have a little change.So it is a huge challenge for  2  statistic of GPNMF, which is shown in Figure 2.Many fault samples are error-detected by  2  statistic.For the SPE  statistic, although it can detect almost all of the faulty samples, many normal samples are error-detected.On the contrary, the  2  statistic and SPE  statistic of MCM-GPNMF-based method can detect almost all the faulty samples (over 98% actually).Meanwhile, the MCM-GPNMF-based method has the lower false alarm rate of SPE  statistic than GPNMF, when the testing matrix is incomplete.However, in all three cases, it should be mentioned that several normal samples are errordetected, but the monitoring result of normal samples is still acceptable.In summary, the presented MCM-GPNMFbased monitoring method can deal with the fault detection problem, which contains incomplete data.

Fault Isolation.
Based on the monitoring result of the previous section, in this part, we will use the contribution plot based on  2  statistic and SPE  statistic to identify the variable that most likely leads to the fault.Due to the fact that the accuracy of fault variable identification is closely related to the accuracy of fault detection, it is meaningful to calculate the contribution value of each variable after the system fault is detected accurately.The most obvious change when bias fault occurs is that the value of main steam temperature  suddenly changes by the reason of a step temperature change at itself.The main steam temperature  is the 13th variable of the chosen boiler process.Both of the contribution plots corresponding to  2  statistic and SPE  statistic, shown in Figure 3, identify this variable accurately.It could illustrate the effectiveness of the presented algorithm.

Conclusions
In recent years, the NMF-based algorithm, which is still under development, has shown great potential in the field of industrial process fault detection.However, there are still some weaknesses for these methods in dealing with the problem of missing data.This paper proposed a new variant of NMF named generalized projection nonnegative matrix factorization which combines with the second-order Markov chain model for the situation that the process data contain missing data, which significantly improves the detection rate of industrial process.Meanwhile, two types of monitoring indices,  2  statistic and SPE  statistic, and the corresponding contribution plots were designed, respectively.As a result, the simulation in a 1000 MW unit boiler process showed that the proposed method yielded better results for fault detection and fault isolation.In other words, the monitoring method based on MCM-GPNMF proposed in this work is valuable for research and application.

Figure 1 :
Figure 1: Comparison of predicted and real values.

Figures 2 (Figure 2 :
Figure 2: The monitoring result of bias fault.

Figure 3 :
Figure 3: The contribution plots when bias fault occurs.
Test.When using the Markov chain to predict the value of a variable, it is not necessary to consider the change of the variable in the past.If the current state of the variable is known, the state of the next moment can be predicted.The Markov property test of the original time series should be carried out before the establishment of the Markov chain model.The specific method of Markov property test is given as follows: we assume that all states of the original time series are .The marginal probability  ⋅, is defined as follows: