A Partial Robust M-Regression-Based Prediction and Fault Detection Method

Algorithm 2: Main steps of PRM.
S1: Calculate $\psi_i^r$ using (7) with initial value $r_i = y_i - \mathrm{med}_j(y_j)$, calculate $\psi_i^x$ using (6) with $t_i$ replaced by $x_i$, and then compute $\psi_i$ using (11).
S2: Multiply each row of $X$ and $y$ by $\sqrt{\psi_i}$, then perform PLS regression (see Algorithm 1) on the reweighted data. Divide each row of $T$ by $\sqrt{\psi_i}$.
S3: Calculate $r_i$ using (8), and $\psi_i$ using (6), (7), and (11).
S4: If the relative difference in norm between two consecutive approximations of the regression coefficients is smaller than $10^{-3}$, continue to the next step; otherwise go back to S2.
S5: Get $T$, $P$, $q$, $\beta$ from the last PLS step at the final iteration.

Algorithm 3: Main steps of the new prediction and diagnosis approach.
S1: Compute the regression coefficient $\beta$ using the PRM algorithm (see Algorithm 2).
S2: Realize online prediction using (5).
S3: Perform SVD on $\beta\beta^T$.
S4: Monitor subspace $\hat{X}$ using (15) and (16).
S5: Monitor subspace $\tilde{X}$ using (17) and (18).


Introduction
With the rapid development of modern science and technology, industrial production processes have become more automated and more complicated. As a result, the safety and reliability of complicated processes have become critical concerns in process design [1, 2]. Considerable effort has been made in both industry and academia. If a precise analytical model of the process is known a priori, the well-developed model-based diagnosis approaches can be successfully applied for online process monitoring [3–8]. However, owing to poor understanding of the underlying process, it is quite difficult to obtain a precise model, which means that model-based techniques usually cannot be applied in practice.
Different from model-based approaches, data-driven techniques do not require any knowledge about the model of the complex process. Many efficient data-driven methods have been developed in recent years [9–13]. Owing to its simplicity and ease of implementation, partial least squares (PLS) [14, 15] has quickly become one of the most popular methods. By identifying the regression coefficient between the measurable variables space and the prediction variables space, PLS can be easily applied to the prediction of quality-related indicators [16, 17]. Successful applications of PLS in fault detection have also been reported in the literature [9, 16, 18]. However, one drawback of PLS is that it is very sensitive to abnormal characteristics of the measured process data, for example, outliers, which may be caused by formatting errors, hardware failure, nonrepresentative sampling, and so forth. A single outlier may seriously affect the performance of PLS. In a statistical sense, outliers are samples with extreme values that are located far from the data majority. Outliers fall into two categories, those in the measurable variables space and those in the prediction variables space, called high leverage points and high residual points, respectively. To overcome this drawback of classical PLS, many robust versions of PLS have been proposed [19–21]. Nevertheless, these methods either are not robust to high leverage points or are not efficient enough. To develop a robust and efficient method, Serneels et al. [22] proposed the partial robust M-regression (PRM) approach, which weakens the effect of outliers by choosing a proper weighting scheme at a relatively low computational load. PRM has become a popular method, and a MATLAB toolbox has been developed.
On the other hand, the goal of a modern industrial process is high quality, not just high production. Ensuring high-quality products is extremely important if enterprises are to survive the fierce competition of the worldwide market. Nowadays, quality-related prediction and diagnosis play a critical role in practical production and have a wide range of applications [23, 24]. Therefore, it is of practical significance to develop a robust, data-driven, quality-related prediction and diagnosis method which can deal with outliers. The main purpose of this paper is to develop such an approach. Based on the partial robust M-regression method, this paper first realizes a PRM-based prediction approach. Furthermore, with an orthogonal decomposition [16] performed on the measurable variables space, this paper finally develops a quality-related prediction and diagnosis method.

S1: Normalize $X$ and $y$ to zero mean and unit variance.
S2: Set [...], for $i = 1, \ldots, h$:
2.1 compute [...], which is the dominant eigenvector of [...]
The rest of this paper is organized as follows. Section 2 first reviews the basic algorithm of partial robust M-regression and then proposes the new method. Section 3 briefly introduces the industrial benchmark of the Tennessee Eastman (TE) process. Section 4 presents the simulation results, and Section 5 draws the conclusion.

Preliminaries and the New Approach
2.1. Partial Robust M-Regression. PRM is a robust version of PLS which weakens the effect of outliers by choosing a proper weighting scheme. Let us first review the classical PLS algorithm. Consider a measurement data matrix (measurable variables space) $X$, in which $N$ observations of $m$ measurable variables are recorded, and a quality variable vector $y$, which contains $N$ observations of one prediction variable (a univariate output is considered here), that is, $X \in \mathbb{R}^{N \times m}$ and $y \in \mathbb{R}^{N \times 1}$. By projecting $X$ and $y$ onto the latent variables space, we have the following PLS model:

$$X = T P^T + \tilde{X},$$
$$y = T q^T + \tilde{y},$$
$$\hat{y} = X\beta, \tag{5}$$

where $h$ is the number of latent variables, $T \in \mathbb{R}^{N \times h}$ is the score matrix, and $P \in \mathbb{R}^{m \times h}$ and $q \in \mathbb{R}^{1 \times h}$ are the loading matrices of $X$ and $y$, respectively. $\hat{y}$ is the predicted output and $\beta \in \mathbb{R}^{m \times 1}$ is the regression coefficient between the measurable variables space and the prediction variables space. $\tilde{X}$ and $\tilde{y}$ are the residuals of $X$ and $y$, respectively. Algebraically, the PLS model can be calculated by an iterative algorithm, such as NIPALS [25] or SIMPLS [15]. We take SIMPLS as an example; it is summarized in Algorithm 1.
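To make the roles of the scores, loadings, and regression coefficient concrete, the following is a minimal NumPy sketch of PLS for a single output. It is a NIPALS-style recursion rather than the SIMPLS listing of Algorithm 1, and the function name `pls1` and its argument layout are illustrative assumptions, not part of the original method description. The inputs are assumed to be already centered.

```python
import numpy as np

def pls1(X, y, h):
    """NIPALS-style PLS for one output (illustrative sketch, not SIMPLS).

    X: (N, m) centered data matrix, y: (N,) centered output, h: number of
    latent variables. Returns scores T, loadings P and q, and beta.
    """
    Xd = np.array(X, dtype=float)          # deflated copy of X
    yd = np.array(y, dtype=float)          # deflated copy of y
    W, P, q = [], [], []
    for _ in range(h):
        w = Xd.T @ yd                      # weight: direction of max covariance
        w /= np.linalg.norm(w)
        t = Xd @ w                         # score vector
        tt = t @ t
        p = Xd.T @ t / tt                  # X-loading
        qi = yd @ t / tt                   # y-loading
        W.append(w); P.append(p); q.append(qi)
        Xd -= np.outer(t, p)               # deflate X
        yd = yd - qi * t                   # deflate y
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    R = W @ np.linalg.inv(P.T @ W)         # maps original X to scores: T = X R
    T = np.asarray(X, dtype=float) @ R
    beta = R @ q                           # regression coefficient, y_hat = X beta
    return T, P, q, beta
```

With `h` equal to the number of columns of a full-rank `X`, the returned `beta` coincides with the ordinary least squares solution, which is a convenient sanity check for the sketch.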
As mentioned previously, there are two categories of outliers, located in the measurable variables space and the prediction variables space, respectively. To weaken their influence, two types of weighting coefficients are designed in PRM, called leverage weights $\psi_i^x$ and residual weights $\psi_i^r$, which are computed as follows:

$$\psi_i^x = f\left(\frac{\lVert t_i - \mathrm{med}_{L_1}(T)\rVert}{\mathrm{med}_i \lVert t_i - \mathrm{med}_{L_1}(T)\rVert},\, c\right), \tag{6}$$
$$\psi_i^r = f\left(\frac{r_i}{\hat{\sigma}},\, c\right), \tag{7}$$

with

$$r_i = y_i - t_i q^T, \tag{8}$$
$$\hat{\sigma} = \mathrm{med}_i \left| r_i - \mathrm{med}_j(r_j) \right|,$$
$$f(z, c) = \frac{1}{\left(1 + \left| z/c \right|\right)^2},$$

where $\mathrm{med}$ and $\mathrm{med}_{L_1}$ are the median estimate and $L_1$-median estimate, respectively, $f$ is the "fair" function, and $c$ is a tuning constant [22]. Then, the global weight $\psi_i$ is

$$\psi_i = \psi_i^x \psi_i^r. \tag{11}$$

To solve PRM, an iteratively reweighted partial least squares algorithm is used. In each step, the observation $(x_i, y_i)$ is first multiplied by $\sqrt{\psi_i}$ to give $(\sqrt{\psi_i}\,x_i, \sqrt{\psi_i}\,y_i)$, and then PLS regression is performed on the reweighted model. Detailed steps of PRM are summarized in Algorithm 2.
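The reweighting loop of Algorithm 2 can be sketched in NumPy as follows. This is a hedged illustration under several assumptions that are not stated in the text: the "fair" function is taken in the form $f(z, c) = 1/(1 + |z/c|)^2$ with $c = 4$ as used in the PRM literature [22], the scale estimate is the median absolute deviation of the residuals, the $L_1$-median is approximated by Weiszfeld iterations, the data are assumed to be already (robustly) centered, and the inner PLS step is a NIPALS-style routine rather than SIMPLS. All function names are hypothetical.

```python
import numpy as np

def fair(z, c=4.0):
    """Assumed 'fair' weight function f(z, c) = 1 / (1 + |z/c|)^2."""
    return 1.0 / (1.0 + np.abs(z / c)) ** 2

def l1_median(A, n_iter=200, tol=1e-10):
    """L1-median of the rows of A via Weiszfeld iterations (approximation)."""
    mu = np.median(A, axis=0)
    for _ in range(n_iter):
        d = np.maximum(np.linalg.norm(A - mu, axis=1), 1e-12)
        new = (A / d[:, None]).sum(axis=0) / (1.0 / d).sum()
        done = np.linalg.norm(new - mu) < tol
        mu = new
        if done:
            break
    return mu

def leverage_weights(T, c=4.0):
    d = np.linalg.norm(T - l1_median(T), axis=1)   # distance to robust center
    return fair(d / np.median(d), c)

def residual_weights(r, c=4.0):
    sigma = np.median(np.abs(r - np.median(r)))    # robust scale of residuals
    return fair(r / sigma, c)

def pls1(X, y, h):
    """Compact NIPALS-style PLS for one output (stand-in for Algorithm 1)."""
    Xd, yd = np.array(X, dtype=float), np.array(y, dtype=float)
    W, P, q = [], [], []
    for _ in range(h):
        w = Xd.T @ yd
        w /= np.linalg.norm(w)
        t = Xd @ w
        p = Xd.T @ t / (t @ t)
        qi = yd @ t / (t @ t)
        W.append(w); P.append(p); q.append(qi)
        Xd -= np.outer(t, p)
        yd = yd - qi * t
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    R = W @ np.linalg.inv(P.T @ W)
    return np.asarray(X, dtype=float) @ R, P, q, R @ q

def prm(X, y, h, c=4.0, tol=1e-3, max_iter=50):
    """Iteratively reweighted PLS following steps S1-S5 of Algorithm 2."""
    r = y - np.median(y)                                   # S1: initial residuals
    psi = leverage_weights(X, c) * residual_weights(r, c)  # S1: x_i replaces t_i
    beta_old = None
    for _ in range(max_iter):
        s = np.sqrt(psi)
        T, P, q, beta = pls1(X * s[:, None], y * s, h)     # S2: weighted PLS
        T = T / s[:, None]                                 # S2: unweight scores
        r = y - T @ q                                      # S3: new residuals
        psi = leverage_weights(T, c) * residual_weights(r, c)
        if beta_old is not None and (
            np.linalg.norm(beta - beta_old) <= tol * np.linalg.norm(beta_old)
        ):
            break                                          # S4: converged
        beta_old = beta
    return T, P, q, beta, psi                              # S5
```

On data where a few outputs are grossly shifted, the weights `psi` of the contaminated samples shrink toward zero over the iterations, so the returned `beta` stays close to the coefficient of the clean majority, whereas a plain PLS fit is pulled toward the outliers.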

2.2. The Proposed Prediction and Diagnosis Approach.
Based on the PRM algorithm (see Algorithm 2) and (5), we can easily implement online prediction of quality-related indicators. Next we will develop an EM-PRM-based fault detection scheme. For most existing schemes, detecting all faults promptly and accurately is the most important evaluation criterion. However, not all faults cause serious damage in practice; for example, faults that are unrelated to the quality-related indicators are harmless to production. Therefore, if the nature of the fault is known in advance, reducing the false alarm rate for faults unrelated to the quality-related indicators is another important evaluation criterion [16]. Zhou et al. [26] proposed such a criterion, which classifies faults into two categories, that is, faults affecting $y$ and faults having no influence on $y$. Based on this criterion, we should design test statistics and the corresponding thresholds in subspaces $\hat{X}$ and $\tilde{X}$ separately. Following the idea of [16], an orthogonal decomposition algorithm is employed in our new fault detection scheme.
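First, perform singular value decomposition on matrix $\beta\beta^T$:

$$\beta\beta^T = \begin{bmatrix} \hat{P}_\beta & \tilde{P}_\beta \end{bmatrix} \begin{bmatrix} \Lambda_\beta & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \hat{P}_\beta^T \\ \tilde{P}_\beta^T \end{bmatrix},$$

where $\hat{P}_\beta$ and $\tilde{P}_\beta$ collect the singular vectors associated with the nonzero and zero singular values, respectively. Then, construct orthogonal spaces $\Pi_\beta$, $\Pi_\beta^\perp$ as follows:

$$\Pi_\beta = \hat{P}_\beta \hat{P}_\beta^T, \qquad \Pi_\beta^\perp = \tilde{P}_\beta \tilde{P}_\beta^T.$$

Last, project $X$ onto the orthogonal subspaces $\hat{X}$ and $\tilde{X}$:

$$\hat{X} = X\Pi_\beta, \qquad \tilde{X} = X\Pi_\beta^\perp.$$

After obtaining $\hat{X}$ and $\tilde{X}$, we can design test statistics and thresholds in the two subspaces, respectively. First, we use $\hat{x} = \hat{P}_\beta^T x$ for the $T^2$ statistic monitoring subspace $\hat{X}$; the statistic $T^2_{\hat{X}}$ is given by (15) and the corresponding threshold $J_{th,T^2_{\hat{X}}}$ by (16). If $T^2_{\hat{X}} \geq J_{th,T^2_{\hat{X}}}$, a fault which affects $y$ has appeared; otherwise the process is fault free. Likewise, subspace $\tilde{X}$ is monitored with the statistic $T^2_{\tilde{X}}$ and threshold $J_{th,T^2_{\tilde{X}}}$ given by (17) and (18). If $T^2_{\tilde{X}} \geq J_{th,T^2_{\tilde{X}}}$, a fault which has no influence on $y$ has appeared; otherwise the process is fault free. We summarize the main steps of our new method in Algorithm 3.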
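The orthogonal decomposition and the two monitoring statistics can be illustrated with a short NumPy sketch. Several parts are assumptions rather than the formulas of the paper: the statistic is a standard Hotelling $T^2$ of the projected sample with the covariance estimated from training data, and the threshold is an empirical quantile of the training statistics, used here only as a stand-in for the thresholds referred to as (16) and (18). The function names are hypothetical.

```python
import numpy as np

def orthogonal_projectors(beta, rtol=1e-10):
    """Split the input space via the SVD of beta beta^T.

    Returns P_hat (directions that influence y_hat = X beta) and P_til
    (directions with no influence on y_hat).
    """
    b = np.asarray(beta, dtype=float).reshape(-1, 1)
    U, s, _ = np.linalg.svd(b @ b.T)
    l = int(np.sum(s > rtol * s[0]))       # numerical rank (1 for a vector beta)
    return U[:, :l], U[:, l:]

def t2_statistic(x, P, X_train):
    """Hotelling-type T^2 of the projected sample P^T x (assumed form)."""
    N = X_train.shape[0]
    S = P.T @ X_train.T @ X_train @ P / (N - 1)   # covariance of projections
    z = P.T @ x
    return float(z @ np.linalg.solve(S, z))

def empirical_threshold(P, X_train, level=0.99):
    """Stand-in threshold: empirical quantile of the training T^2 values."""
    t2 = [t2_statistic(x, P, X_train) for x in X_train]
    return float(np.quantile(t2, level))

def diagnose(x, beta, X_train, level=0.99):
    """Return (affects_y, unrelated_to_y) alarm flags for one sample x."""
    P_hat, P_til = orthogonal_projectors(beta)
    alarm_hat = t2_statistic(x, P_hat, X_train) >= empirical_threshold(P_hat, X_train, level)
    alarm_til = t2_statistic(x, P_til, X_train) >= empirical_threshold(P_til, X_train, level)
    return alarm_hat, alarm_til
```

Because $\Pi_\beta^\perp \beta = 0$, any deviation confined to the columns of `P_til` leaves the prediction $x^T\beta$ unchanged, which is the sense in which the second subspace is unrelated to the quality variable.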

Description of Tennessee Eastman Benchmark
The Tennessee Eastman process is a chemical plant simulator in which a total of 53 variables are available, with 12 manipulated variables (XMV(1-12)) and 41 process variables (XMEAS(1-41)). It was developed by Eastman Chemical Company to serve as a benchmark for research purposes and can be downloaded from http://brahms.scs.uiuc.edu. Figure 1 shows the schematic diagram of the TE process. As can be seen, the process contains five units: a vapor-liquid separator, a condenser, a reactor, a product stripper, and a recycle compressor. The process produces two products from four reactants. An inert and a by-product are also present, making a total of eight components, named A, B, C, D, E, F, G, and H [10]. Additionally, for monitoring studies, 21 faults (IDV(1-21)) are designed in the benchmark, as shown in Table 1. The effectiveness of the proposed approach will be verified on the TE benchmark.

Simulation Results
In this section, the proposed scheme is applied to the TE benchmark. Two tasks are involved in the simulation, that is, quality-related prediction and fault detection. First, we determine the input and output variables. As mentioned earlier, there are 53 variables available, and we choose 22 process measurements (XMEAS(1-22)) and 11 manipulated variables (XMV(2-12)) as the input variables. The analyzer for component G (XMEAS(40)) is used for the final product analysis; therefore, we choose it as the output variable, that is, the quality indicator. A total of 480 samples are obtained from normal process operation, and these samples will be used for [...]

Fault number    Location
IDV(1)          A/C feed ratio, B composition constant
IDV(2)          B composition, A/C ratio constant
IDV(3)          D feed temperature
IDV(4)          Reactor cooling water inlet temperature
IDV(5)          Condenser cooling water inlet temperature
IDV(6)          A feed loss
IDV(7)          C header pressure loss-reduced availability
IDV(8)          A, B, C feed composition
IDV(9)          D feed temperature
IDV(10)         C feed temperature
IDV(11)         Reactor cooling water inlet temperature
IDV(12)         Condenser cooling water inlet temperature
IDV(13)         Reaction kinetics
IDV(14)         Reactor cooling water valve
IDV(15)         Condenser cooling water valve
IDV(16)         Unknown
IDV(17)         Unknown
IDV(18)         Unknown
IDV(19)         Unknown
IDV(20)         Unknown
IDV(21)         The valve fixed at steady state position

[...] method has an obvious prediction bias due to the existence of outliers, especially in Figure 5. In contrast, the PRM-based method provides a more accurate prediction result. These results illustrate the nonrobust nature of PLS and verify the robustness of PRM.

Next, we apply the PRM-based method to quality-related fault detection. As explained previously, in the sense of quality-related classification of faults, the PRM-based method should distinguish whether or not a fault affects the predicted output. To illustrate this, we detect the faults IDV(4), IDV(5), IDV(8), and IDV(18) using the PRM-based fault detection method. The statistics in the two orthogonal subspaces $\hat{X}$ and $\tilde{X}$ are shown in Figures 6(a)-6(d), denoted by "T2h" and "T2t", respectively. According to [16], $\tilde{X}$ and $y$ are completely unrelated. As seen from Figures 6(a)-6(d), faults that affect the predicted output and faults that have no effect on it are clearly distinguished. For example, in Figure 6(b), although both subspaces show a fault between 160 s and 350 s, the fault in one of the two subspaces disappears after 350 s. In Figure 6(d), between 250 s and 400 s there is a fault in one subspace but no fault in the other. In Figures 6(a) and 6(c), the two subspaces indicate a fault, or its absence, synchronously.
Taken together, these simulation results demonstrate the effectiveness of the proposed approach.

Conclusion
Aiming to address the nonrobustness of PLS against missing values and outliers, this paper presents a PRM-based quality-related prediction and fault detection scheme. Based on the partial robust M-regression approach, a prediction method is first implemented. Following the idea of orthogonal projection, separate test statistics are designed in the two orthogonal subspaces. Quality-related fault detection is thereby realized, in which faults that affect the quality indicator are distinguished from those that do not, so that the false alarm rate for faults unrelated to the quality indicator is reduced. The effectiveness of the new approach is finally demonstrated on the Tennessee Eastman benchmark process.

Table 1: Predefined faults in Tennessee Eastman process.