Soft Sensor Development Based on Quality-Relevant Slow Feature Analysis and Bayesian Regression with Application to Propylene Polymerization

Data-driven soft sensors are widely used to predict quality indices in propylene polymerization processes, improving the availability and efficiency of quality measurements. To deal with the nonlinearity and dynamics of propylene polymerization processes, this paper proposes a novel soft sensor based on quality-relevant slow feature analysis and Bayesian regression. The proposed method handles process dynamics better by extracting quality-relevant slow features, which exhibit both slowly varying characteristics and correlations with the quality indices. Meanwhile, a Bayesian inference model is developed to predict the quality indices; it takes advantage of a probabilistic framework, with iterative maximum likelihood techniques for parameter estimation and a sparsity constraint to avoid overfitting. Finally, a case study with data sampled from an industrial propylene polymerization process demonstrates the effectiveness and superiority of the proposed method.


Introduction
As modern process industries become larger in scale and more integrated, key performance indices related to product quality, process safety, and pollution reduction must be closely monitored [1][2][3][4]. In addition, real-time measurement of quality indices is important for process monitoring, control, and product optimization [5][6][7]. Thermoplastic polymers such as polypropylene are important materials used in various sectors. The melt index (MI), which determines the grade of the polymer product, is considered one of the most crucial quality-control indicators in industrial propylene polymerization (PP) processes. To date, an accurate first-principles MI model is still not available. The MI is usually evaluated offline by a laboratory analytical procedure that takes 1.5-2 h to complete, so no quality indicator is available to the quality control system during this period, which delays control actions. In the absence of an economical and effective online measurement, soft sensors can serve as an alternative solution [8][9][10][11][12][13][14]. Moreover, with the wide availability of process data in PP processes, data-driven soft sensors have increasingly been adopted to predict the MI.
Data-driven approaches serve as an alternative to first-principles models. Sensor data often contain a large number of variables that describe the same process phenomena. For such high-dimensional data, it is useful to project the data into a latent space that is more compact than the original space. For MI prediction, principal component analysis (PCA) [15] and partial least squares (PLS) [16] have been applied. Unfortunately, because these models are linear, they are inadequate when the relationships among the soft sensing variables are nonlinear. To this end, nonlinear modeling methods have been used for MI prediction, such as artificial neural networks (ANNs) [17], support vector machines (SVMs) [18], Gaussian process regression (GPR) [19,20], and the relevance vector machine (RVM) [21][22][23][24][25]. Recently, Liu et al. proposed an adversarial transfer learning (ATL)-based soft sensor [26] and a domain adaptation transfer learning soft sensor for product quality prediction [7]. As classical methods, PCA and PLS have achieved great success in quality prediction by extracting latent variables (LVs) [27][28][29]. However, they extract only static LVs, which are unsuitable for modeling dynamic processes because they do not capture the temporal correlations in process data.
In practical chemical processes, equipment characteristics frequently fluctuate over time, which makes the processes dynamic. For soft sensing of quality indices, the dynamic characteristics of the data can play an important role in regression, and the regression results can strongly influence the performance of the quality control system. Slow feature analysis (SFA) [30], a recently emerged unsupervised algorithm, is applied in this work because of its remarkable ability to extract slowly varying, temporally related features for modeling dynamic processes. SFA has been widely used in areas such as blind source separation [31], signal processing [32], and process monitoring [33][34][35][36]. Unlike the static models generated by PCA or PLS, SFA-based models can better describe the temporal behavior of a process by extracting slowly varying LVs. In modeling, the extracted slowly varying LVs are regarded as intrinsic process properties, whereas the fast-varying LVs are regarded as process noise. Several SFA-based methods have been applied to prediction modeling. Shang et al. [37] proposed a soft sensor model based on probabilistic slow feature analysis (PSFA) for quality prediction. Fan et al. [38] proposed a robust PSFA (RPSFA)-based regression model that models outliers in the observation data using Student's t-distribution. Zhong et al. [39] put forward an online quality prediction method based on modified regularized SFA (ReSFA). Jiang et al. [40] proposed a real-time semisupervised predictive modeling strategy for an industrial continuous catalytic reforming process using SFA. However, traditional slow features are computed by considering only the slowness of process variations, and their correlations with quality indices are neglected during feature extraction.
Moreover, after the slow features are extracted, quality prediction is typically performed by ordinary least squares regression, which may not describe nonlinear relationships among variables well. Given the nonlinearity of the propylene polymerization process, a nonlinear regression method is necessary.
Hence, to deal with the nonlinearity and dynamics in PP processes, this paper proposes a soft sensor based on quality-relevant slow feature analysis and Bayesian regression. First, quality-relevant slow feature analysis (QSFA) is adopted to extract slow features that exhibit both slowly varying characteristics and correlations with the quality indices. Then, the extracted quality-relevant slow features (QSFs) are sorted according to a correlation index, and the features used for regression modeling are selected by cross-validation. Based on the selected QSFs, a Bayesian inference framework named relevance vector regression (RVR) is developed to predict the quality variable, i.e., the MI of the polypropylene products. It takes advantage of a probabilistic framework, with iterative maximum likelihood techniques for parameter estimation and a sparsity constraint to avoid overfitting. Finally, the effectiveness of the proposed method (QSFA-RVR) is confirmed with real data from industrial propylene polymerization plants.
The rest of the paper is organized as follows. Section 2 presents the methods used to develop the proposed soft sensor: it first briefly introduces traditional SFA and then formulates the QSFA-RVR prediction model, including the QSFA method, relevance vector regression, and the quality prediction method based on QSFA-RVR. Experimental results on real data from an industrial process are analyzed and discussed in Section 3. Finally, conclusions are drawn in Section 4.

Methods
2.1. Overview of Slow Feature Analysis. SFA is an unsupervised algorithm proposed by Wiskott and Sejnowski [30] that extracts invariant, slowly varying features from quickly varying signals. The slow variations are generally regarded as the intrinsic features of the signals, whereas rapid variations are often attributed to measurement noise. The SFA methodology is described as follows [41].
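Given a J-dimensional input signal $X$, SFA aims to find mapping functions $s_j = g_j(X)$, $j \in \{1, 2, \cdots, J\}$, from the input to the slow features that minimize

$$ \Delta(s_j) = \langle \dot{s}_j^{2} \rangle \tag{1} $$

under the constraints

$$ \langle s_j \rangle = 0, \quad \langle s_j^{2} \rangle = 1, \quad \forall i < j:\ \langle s_i s_j \rangle = 0, \tag{2} $$

where $\langle \cdot \rangle$ denotes time averaging and $\dot{s}_j$ denotes the first-order derivative of $s_j$ with respect to time.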
To simplify this optimization problem, the slow features are assumed to be linear combinations of the input variables. The linear mapping is

$$ s = W X. \tag{3} $$

The optimization problem in (1) can be solved by two successive singular value decompositions (SVDs) [42], which is equivalent to numerically solving the generalized eigenvalue decomposition (GED) problem

$$ A W^{T} = B W^{T} \Omega, \tag{4} $$

where $A = \langle \dot{X} \dot{X}^{T} \rangle$ is the covariance matrix of the first-order derivative of $X$ and $B = \langle X X^{T} \rangle$ is the covariance matrix of $X$; $W = [w_1, w_2, \cdots, w_J]^{T}$ is the matrix of generalized eigenvectors, i.e., the coefficient vectors of the linear mappings; and $\Omega = \mathrm{diag}\{\lambda_1, \cdots, \lambda_J\}$ is a diagonal matrix whose entries $\lambda_j = \langle \dot{s}_j^{2} \rangle$ are the generalized eigenvalues, arranged in ascending order.
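By performing the first SVD on $B$,

$$ B = \langle X X^{T} \rangle = U \Lambda U^{T}, \tag{5} $$

the original input $X$ is whitened as

$$ Z = \Lambda^{-1/2} U^{T} X, \tag{6} $$

where $U$ is an orthogonal matrix, $\Lambda$ is a diagonal matrix, and $\mathrm{cov}(Z) = \langle Z Z^{T} \rangle = I_J$. Then, a second SVD on the covariance of the first-order derivative of $Z$ yields

$$ \langle \dot{Z} \dot{Z}^{T} \rangle = P \Omega P^{T}, \tag{7} $$

where $P$ is an orthogonal matrix. Thus, the weighting matrix is calculated as $W = P^{T} \Lambda^{-1/2} U^{T}$, and the slow features $s$ can be obtained using equation (3).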
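The two-step procedure in equations (5) to (7) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: time-ordered samples are stored as rows of an array, so the paper's $s = WX$ becomes `S = Xc @ W` and the returned `W` is the transpose of the paper's $W$ (weighting vectors as columns); the symmetric covariance matrices are decomposed with an eigendecomposition, which coincides with the SVD here; and the time derivative is approximated by a first-order difference. The function name `linear_sfa` is hypothetical.

```python
import numpy as np


def linear_sfa(X):
    """Linear SFA via whitening plus a second decomposition (illustrative sketch).

    X: (N, J) array of time-ordered samples (rows). Returns W (J, J) whose
    columns are the weighting vectors, and omega, the values <s_dot_j^2>
    in ascending order (slowest feature first).
    """
    Xc = X - X.mean(axis=0)                 # enforce the zero-mean constraint
    B = Xc.T @ Xc / len(Xc)                 # covariance of X
    lam, U = np.linalg.eigh(B)              # B = U Lambda U^T
    whiten = U / np.sqrt(lam)               # U Lambda^{-1/2}: Z has identity covariance
    Z = Xc @ whiten
    Zdot = np.diff(Z, axis=0)               # first-order difference approximates dZ/dt
    omega, P = np.linalg.eigh(Zdot.T @ Zdot / len(Zdot))  # ascending eigenvalues
    W = whiten @ P                          # maps centered X directly to slow features
    return W, omega


# Example: recover a slow sine hidden in a linear mixture with a fast sine.
t = np.linspace(0, 2 * np.pi, 2000)
sources = np.column_stack([np.sin(t), np.sin(37 * t)])
X = sources @ np.array([[1.0, 0.5], [0.4, 1.0]]).T
W, omega = linear_sfa(X)
S = (X - X.mean(axis=0)) @ W
print(omega)                                        # first value is much smaller
print(abs(np.corrcoef(S[:, 0], np.sin(t))[0, 1]))   # close to 1
```

Because `np.linalg.eigh` returns eigenvalues in ascending order, the first column of `S` is the slowest feature, matching the ordering of $\Omega$ in equation (4).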

2.2. Quality-Relevant Slow Feature Analysis.
Conventional SFA considers only the slowness of process variations when extracting slow features, so its ability to retain the process information related to the quality indices is limited. The idea is therefore to integrate the relationships between the slow features and the quality indices into the optimization problem of SFA. In this way, an improved SFA algorithm, named quality-relevant slow feature analysis (QSFA), can extract quality-relevant slow features (QSFs) from a process, and these features better reflect the quality-related behavior of the process. To enable QSFA, a new objective function is designed that considers both the slowness of process variations and their correlations with the quality indices. Given the normalized process variables $X$ and the quality index $y$, the new objective function $J(w)$ is composed of three subobjective functions. The first subobjective function $J_1$, which requires the QSF to change slowly, is given as follows: where $s$ is the QSF and $w$ is the corresponding weighting vector. Because the QSF should be highly correlated with the quality index, the second subobjective function $J_2$ aims to maximize the correlation between $s$ and $y$ and is given as follows: The third subobjective function $J_3$ maximizes the variance of $s$, which represents the variation information of the process.
Thus, a new optimization problem of QSFA can be described as By introducing the Lagrange multipliers $\lambda$ and $\delta$, equation (14) is obtained, and the value of the objective function $J(w)$ equals $(\lambda + \delta)$.
Therefore, the solution of equation (13) can be rewritten as the following GED problem [43]: where $I$ is a $J$-dimensional identity matrix. By performing an SVD on the covariance matrix $\dot{X}^{T}\dot{X}$, we have

$$ \dot{X}^{T}\dot{X} = U \Lambda U^{T}. \tag{16} $$

The whitening transformation with the whitening matrix $H$ is

$$ H = U \Lambda^{-1/2}, \qquad Z = H^{T} \dot{X}^{T}. \tag{17} $$

It then holds that $Z Z^{T} = H^{T}\dot{X}^{T}\dot{X} H = I$. Thus, finding a weighting vector $w$ is equivalent to finding a vector $\varphi$ with $w = H\varphi$, so that $s = XH\varphi$. The GED problem in equation (15) can be reformulated as Left-multiplying both sides of equation (18) by $H^{T}$ and substituting $H^{T}\dot{X}^{T}\dot{X} H = I$ and $H^{T}H = \Lambda^{-1}$, we obtain This is an ordinary eigenvalue decomposition problem, from which the eigenvector $\varphi$ is calculated. Finally, the QSF $s$ can be computed as

$$ s = X H \varphi. \tag{20} $$

2.3. Relevance Vector Regression. After the QSFs have been extracted and selected, a sparse Bayesian framework named relevance vector regression (RVR) is developed as the prediction model. Given a dataset $\{x_n, t_n\}_{n=1}^{N}$ with input data $x_n$ and the corresponding target $t_n$, the target can be expressed as follows [44,45]:

$$ t_n = y(x_n) + \varepsilon_n, \tag{21} $$

where $\varepsilon_n \sim \mathcal{N}(0, \sigma^{2})$. The function $y(x)$ is defined as follows:

$$ y(x) = \sum_{i=1}^{N} \omega_i \phi_i(x) + \omega_0, \tag{22} $$

where $\phi_i(x) = K(x, x_i)$ is the kernel function and $\omega = [\omega_0, \omega_1, \cdots, \omega_N]^{T}$ is the weight vector of the kernel functions.
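The kernel expansion $y(x)$ can be made concrete by building the design matrix that stacks $[1, K(x, x_1), \cdots, K(x, x_N)]$ for each sample. The paper does not state which kernel is used, so the sketch below assumes a Gaussian (RBF) kernel with a hypothetical `width` parameter and places the bias weight $\omega_0$ on a leading column of ones; all helper names are illustrative.

```python
import numpy as np


def rbf_kernel(Xa, Xb, width=1.0):
    """Assumed Gaussian kernel K(x, x') = exp(-||x - x'||^2 / width^2)."""
    sq_dist = ((Xa[:, None, :] - Xb[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / width ** 2)


def design_matrix(X, centers, width=1.0):
    """Rows phi(x)^T = [1, K(x, x_1), ..., K(x, x_N)]; the 1 multiplies omega_0."""
    K = rbf_kernel(X, centers, width)
    return np.hstack([np.ones((len(X), 1)), K])


def kernel_model(X, centers, omega, width=1.0):
    """y(x) = omega_0 + sum_i omega_i K(x, x_i) for every row of X."""
    return design_matrix(X, centers, width) @ omega


X_train = np.array([[0.0], [1.0], [2.0]])
Phi = design_matrix(X_train, X_train)
print(Phi.shape)  # (3, 4): N rows, N + 1 columns
```

Every training sample acts as a kernel center, which is why the weight vector has $N + 1$ entries; sparsity later decides which of these centers survive.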
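The conditional distribution of each target is $p(t_n \mid x_n) = \mathcal{N}(t_n \mid y(x_n), \sigma^{2})$. The likelihood function of the target values $t = [t_1, t_2, \cdots, t_N]^{T}$ is written as

$$ p(t \mid \omega, \sigma^{2}) = (2\pi\sigma^{2})^{-N/2} \exp\left( -\frac{\lVert t - \Phi \omega \rVert^{2}}{2\sigma^{2}} \right), \tag{23} $$

where $\Phi$ is the $N \times (N+1)$ design matrix consisting of kernel functions, whose $n$-th row is $\phi(x_n)^{T}$ with $\phi(x) = [1, K(x, x_1), \cdots, K(x, x_N)]^{T}$. A zero-mean Gaussian prior is placed on the weight parameters $\omega$:

$$ p(\omega \mid \alpha) = \prod_{i=0}^{N} \mathcal{N}(\omega_i \mid 0, \alpha_i^{-1}), \tag{24} $$

where $\alpha = [\alpha_0, \alpha_1, \cdots, \alpha_N]^{T}$ is a vector of hyperparameters. Then, the posterior distribution over the weights is calculated through Bayes' rule:

$$ p(\omega \mid t, \alpha, \sigma^{2}) = \frac{p(t \mid \omega, \sigma^{2})\, p(\omega \mid \alpha)}{p(t \mid \alpha, \sigma^{2})} = \mathcal{N}(\omega \mid \mu, \Sigma), \tag{25} $$

where the posterior mean and covariance are given by

$$ \mu = \sigma^{-2} \Sigma \Phi^{T} t, \tag{26} $$

$$ \Sigma = \left( \sigma^{-2} \Phi^{T} \Phi + A \right)^{-1}, \quad A = \mathrm{diag}(\alpha_0, \alpha_1, \cdots, \alpha_N). \tag{27} $$

For a new test sample $x_*$ and the corresponding target $t_*$, the predictive distribution is given by

$$ p(t_* \mid t, \alpha_{MP}, \sigma^{2}_{MP}) = \int p(t_* \mid \omega, \sigma^{2}_{MP})\, p(\omega \mid t, \alpha_{MP}, \sigma^{2}_{MP})\, d\omega, \tag{28} $$

where $\alpha_{MP}$ and $\sigma^{2}_{MP}$ are the most probable values of the hyperparameters and the noise variance, respectively. Using equations (23) and (25), the predictive model is

$$ p(t_* \mid t, \alpha_{MP}, \sigma^{2}_{MP}) = \mathcal{N}(t_* \mid y_*, \sigma_*^{2}), \qquad y_* = \mu^{T} \phi(x_*), \tag{29} $$

where the variance is $\sigma_*^{2} = \sigma^{2}_{MP} + \phi(x_*)^{T} \Sigma \phi(x_*)$. Thus, a Bayesian prediction model is obtained. The sparseness of the model results from the zero-mean prior on the weights: during training, many $\alpha_i$ grow very large, which drives the corresponding weights to zero. The values of $\alpha$ and $\sigma^{2}$ are estimated by maximizing the marginal likelihood $p(t \mid \alpha, \sigma^{2})$ given by Tipping [44]:

$$ p(t \mid \alpha, \sigma^{2}) = (2\pi)^{-N/2} \lvert C \rvert^{-1/2} \exp\left( -\tfrac{1}{2} t^{T} C^{-1} t \right), \qquad C = \sigma^{2} I + \Phi A^{-1} \Phi^{T}, \tag{30} $$

where $C$ is the covariance of the marginal distribution. The values of $\alpha$ and $\sigma^{2}$ can be calculated by the iterative method:

$$ \alpha_i^{\text{new}} = \frac{\gamma_i}{\mu_i^{2}}, \qquad \gamma_i = 1 - \alpha_i \Sigma_{ii}, \tag{31} $$

$$ (\sigma^{2})^{\text{new}} = \frac{\lVert t - \Phi \mu \rVert^{2}}{N - \sum_{i} \gamma_i}. \tag{32} $$

2.4. Quality Prediction Method Based on QSFA-RVR. Since plenty of data have been sampled from the propylene polymerization process, a soft sensor model based on quality-relevant slow feature analysis and Bayesian regression is constructed to predict the MI value for quality monitoring.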
The model extracts quality-relevant slow features from the sampled data and builds a conditional probability distribution over sparse weight parameters; the MI values are then predicted by Bayesian inference. The model parameters are estimated by an iterative expectation-maximization technique. Different QSFs may have different significance, and only some of them are critical to quality prediction, so the noncritical QSFs should be removed from model development to avoid undesirable disturbances. Here, cross-validation is adopted to determine which QSFs are retained for Bayesian regression.
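Choosing the number $M$ of retained QSFs by cross-validation can be illustrated with a k-fold loop over candidate values of $M$. To keep the sketch short, it uses an ordinary least-squares fit with an intercept as a stand-in regressor instead of RVR, and it assumes the QSFs are already sorted by relevance as in Step 4; the fold count `k`, the contiguous (unshuffled) folds, and the function names are hypothetical choices.

```python
import numpy as np


def cv_rmse(S, y, k=5):
    """k-fold CV RMSE of a least-squares fit y ~ [1, S] (stand-in for RVR)."""
    n = len(y)
    sq_errors = []
    for test_idx in np.array_split(np.arange(n), k):    # contiguous folds, no shuffling
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        A_tr = np.column_stack([np.ones(len(train_idx)), S[train_idx]])
        coef, *_ = np.linalg.lstsq(A_tr, y[train_idx], rcond=None)
        A_te = np.column_stack([np.ones(len(test_idx)), S[test_idx]])
        sq_errors.append((y[test_idx] - A_te @ coef) ** 2)
    return float(np.sqrt(np.concatenate(sq_errors).mean()))


def select_num_features(S_sorted, y, k=5):
    """Return the M (1..J) whose first-M sorted features give the lowest CV RMSE."""
    scores = [cv_rmse(S_sorted[:, :m], y, k) for m in range(1, S_sorted.shape[1] + 1)]
    return int(np.argmin(scores)) + 1, scores


# Example: only the first two (sorted) features carry information about y.
rng = np.random.default_rng(1)
S = rng.standard_normal((200, 4))
y = 2.0 * S[:, 0] - S[:, 1] + 0.01 * rng.standard_normal(200)
M, scores = select_num_features(S, y)
print(M, np.round(scores, 3))
```

Contiguous folds are used because the samples are time-ordered; swapping `cv_rmse` for a routine that trains RVR on each fold would match the paper's setting more closely.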
The flowchart of the QSFA-RVR soft sensing approach for quality prediction is shown in Figure 1. The step-by-step procedure for constructing the quality prediction model is as follows:
Step 1. Divide the modeling data into two groups, namely, the training dataset $\{\tilde{X}_{tr}, \tilde{y}_{tr}\}$ and the testing dataset $\{\tilde{X}_{te}, \tilde{y}_{te}\}$.
Step 2. Normalize the training dataset $\{\tilde{X}_{tr}, \tilde{y}_{tr}\}$ as $(\tilde{X}_{tr} - u)/\sigma$ to obtain $\{X_{tr}, y_{tr}\}$, where $u$ and $\sigma$ are the mean and standard deviation of $\tilde{X}_{tr}$.

Journal of Sensors
Step 3. Conduct quality-relevant slow feature analysis on $\{X_{tr}, y_{tr}\}$ to obtain a series of QSFs $s = \{s^{tr}_1, s^{tr}_2, \cdots, s^{tr}_J\}$ according to equations (15)-(20).
Step 4. Sort the QSFs $s = \{s^{tr}_1, s^{tr}_2, \cdots, s^{tr}_J\}$ in descending order of the correlation index $R(i)$. For brevity, the sorted QSFs are still denoted $s = \{s^{tr}_1, s^{tr}_2, \cdots, s^{tr}_J\}$, and the corresponding weighting matrix is saved as $\tilde{w}$.
Step 5. Determine the retained QSFs $s_{tr} = \{s^{tr}_1, s^{tr}_2, \cdots, s^{tr}_M\}$ for regression modeling using cross-validation, where $M$ is the number of retained features. The corresponding weighting matrix is saved as $w$.
Step 6. Initialize the parameters $\alpha$ and $\sigma^{2}$ of the quality prediction model.
Step 7. Given the values of $\alpha$ and $\sigma^{2}$ and the training set $\{s_{tr}, y_{tr}\}$, compute the posterior mean $\mu$ and covariance $\Sigma$ of the model according to equations (26) and (27).
Step 8. Given the posterior statistics $\mu$ and $\Sigma$, update the model parameters $\alpha$ and $\sigma^{2}$ by equations (31) and (32).
Step 9. If $\alpha$ and $\sigma^{2}$ have converged, continue to Step 10; otherwise, return to Step 7.
Step 10. Normalize the raw testing data $\tilde{X}_{te}$ with the mean and standard deviation of the training dataset to obtain $X_{te}$.
Step 11. Calculate the corresponding QSFs $s_{te} = \{s^{te}_1, s^{te}_2, \cdots, s^{te}_M\}$ by projecting the normalized $X_{te}$ onto the weighting matrix $w$, that is, $s_{te} = X_{te} w$.
Step 12. Apply the prediction model of equation (29) to the input $s_{te}$ to obtain the estimated value of the quality index $y$.
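Steps 2, 10, and 11 share one detail that is easy to get wrong: the testing data must be scaled with the training mean and standard deviation, not with their own statistics. A minimal sketch, assuming samples are rows and the sorted weighting vectors are the columns of a matrix `W_sorted` (a hypothetical name); the use of the sample standard deviation (`ddof=1`) is also an assumption.

```python
import numpy as np


def fit_scaler(X_raw_tr):
    """Step 2: column means u and standard deviations of the raw training data."""
    return X_raw_tr.mean(axis=0), X_raw_tr.std(axis=0, ddof=1)


def transform(X_raw, u, sd):
    """Steps 2 and 10: z-score using the *training* statistics."""
    return (X_raw - u) / sd


def project(X_norm, W_sorted, M):
    """Step 11: keep the first M sorted weighting vectors, s = X w."""
    return X_norm @ W_sorted[:, :M]


X_raw_tr = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]])
u, sd = fit_scaler(X_raw_tr)
X_te = transform(np.array([[3.0, 4.0]]), u, sd)   # scaled with training statistics
s_te = project(X_te, np.eye(2), M=1)
print(X_te, s_te.shape)
```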
To quantify the prediction accuracy, five performance indices are used for comparisons, namely, the mean absolute error (MAE), the mean relative error (MRE), the root mean square error (RMSE), Theil's inequality coefficient (TIC), and standard deviation of absolute error (STD). The error indicators are defined as follows:

where $e_i = y_i - \hat{y}_i$, $\bar{e} = \frac{1}{n}\sum_{i=1}^{n} e_i$, $y_i$ is the real MI value, $\hat{y}_i$ is the predicted MI value, and $\bar{y}$ is the mean value of the output. The MAE, MRE, and RMSE measure the prediction accuracy of the models: the smaller these values, the more accurate the model. The STD indicates the stability of the prediction model: the smaller the value, the more stable the model. The TIC indicates the level of agreement between the model and the studied process; it ranges from 0 to 1, and values closer to 0 indicate better agreement.
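The defining equations of the five indices are not reproduced above, so the sketch below uses common textbook forms as an assumption: MRE is taken relative to the true value (multiply by 100 for a percentage), TIC is Theil's U1 statistic, and STD is the sample standard deviation of the errors. The exact definitions used in the paper may differ, and `performance_indices` is a hypothetical name.

```python
import numpy as np


def performance_indices(y, y_hat):
    """MAE, MRE, RMSE, TIC, and STD of the prediction errors e = y - y_hat."""
    e = y - y_hat
    n = len(y)
    mae = np.mean(np.abs(e))
    mre = np.mean(np.abs(e) / np.abs(y))                  # fraction, not percent
    rmse = np.sqrt(np.mean(e ** 2))
    tic = rmse / (np.sqrt(np.mean(y ** 2)) + np.sqrt(np.mean(y_hat ** 2)))
    std = np.sqrt(np.sum((e - e.mean()) ** 2) / (n - 1))  # uses e_bar
    return {"MAE": mae, "MRE": mre, "RMSE": rmse, "TIC": tic, "STD": std}


m = performance_indices(np.array([2.0, 4.0]), np.array([1.0, 5.0]))
print(m)
```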

Results and Discussion
The proposed QSFA-RVR model is applied to predict the MI in an industrial propylene polymerization process located in China. Figure 2 shows the schematic diagram of the industrial process. The process consists of four reactors in series: the first two are continuous stirred tank reactors (CSTRs), and the last two are fluidized-bed reactors (FBRs). The reactor feed consists of propylene, hydrogen, and the Ziegler-Natta catalyst. In the first two reactors, the polymerization reaction takes place in the liquid phase, and in the third and fourth reactors, the reaction is completed in the vapor phase to produce the powdered polymer product. The MI, which depends on the catalyst properties, reactant composition, reactor temperature, and other factors, determines the different grades of the product. To develop a prediction model for the MI, a total of nine process variables are chosen as input variables based on operators' experience and an analysis of the reaction mechanism. The selected variables, including flow rates, temperature, and pressure, are listed in Table 1. The MI is analyzed in the laboratory every 2 hours, and the corresponding values of the nine process variables are acquired from the distributed control system of the polypropylene process. There are 170 samples in total, divided into training and testing datasets: the first 119 samples (about 70%) are used for training, and the remaining 51 samples (about 30%) are used for testing. Note that the current MI data come from the production of a single grade of polypropylene, which is a slowly varying dynamic process. The data are filtered to discard abnormal operating conditions and to improve the quality of the prediction results. All variables are normalized by statistical (z-score) normalization.
First, QSFs are extracted with the proposed method. To provide a visual picture of the extracted QSFs, Figure 3 shows all QSFs extracted from the training set of the PP process. The first six QSFs change slowly, and their slowness decreases progressively. To investigate these QSFs in detail, two quantitative indices are defined: one measures the slowness of each QSF, and the other measures its correlation with quality. The slowness index is calculated as $S(i) = \Delta(s_i) = \langle \dot{s}_i^{2} \rangle$, which has been defined in equation (10). The correlation index $R(i)$ defined in equation (38) is used to evaluate the correlation between a QSF and the quality index. The slowness of the QSFs and their correlation coefficients with respect to the quality index, i.e., MI, are then calculated and shown in Figure 4. The QSFs are sorted in descending order of correlation, and their slowness values are arranged accordingly. As can be seen from the figure, the slowness of the QSFs is consistent with their relevance to quality. In particular, for the first three QSFs, the slowness index $S(i)$ increased as the correlation weakened. Unlike the first seven QSFs, the 8th and 9th QSFs vary quickly and are almost irrelevant to the quality index. This indicates that the MI is determined almost entirely by the slowly varying QSFs. Thus, good prediction performance can be expected from the extracted QSFs, which exhibit both slowly varying characteristics and correlations with the MI.
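The two diagnostic indices can be computed directly from a matrix of extracted features. In this sketch, the slowness index is $\langle \dot{s}_i^2 \rangle$ with the derivative approximated by a first-order difference after scaling each feature to zero mean and unit variance, and the correlation index $R(i)$ is assumed to be the absolute Pearson correlation with the quality variable, since its defining equation is not reproduced here. The function names and the synthetic signals are hypothetical.

```python
import numpy as np


def slowness_index(S):
    """<s_dot_i^2> per column, after scaling each feature to zero mean, unit variance."""
    Sn = (S - S.mean(axis=0)) / S.std(axis=0)
    return np.mean(np.diff(Sn, axis=0) ** 2, axis=0)


def correlation_index(S, y):
    """Assumed R(i): absolute Pearson correlation of feature i with the quality y."""
    return np.array([abs(np.corrcoef(S[:, i], y)[0, 1]) for i in range(S.shape[1])])


def rank_features(S, y):
    """Order features by descending R(i) and report their slowness in that order."""
    R = correlation_index(S, y)
    order = np.argsort(-R)
    return order, R[order], slowness_index(S)[order]


t = np.linspace(0, 10, 500)
S = np.column_stack([np.sin(20 * t), np.sin(0.5 * t)])   # fast feature, slow feature
y = np.sin(0.5 * t) + 0.1 * np.sin(20 * t)               # quality driven mainly by the slow one
order, R_sorted, slow_sorted = rank_features(S, y)
print(order, np.round(R_sorted, 3), slow_sorted)
```

In this toy case the most quality-relevant feature is also the slowest, which is the pattern the text reports for the leading QSFs.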
For prediction, cross-validation is adopted to determine the retained QSFs for Bayesian regression, which gives $M = 5$ retained features. The proposed QSFA-RVR model is then built on the selected QSFs. For comparison, the RVR model, the quality-relevant slow feature regression (QSFR) model, and the PCA-based RVR (PCA-RVR) model are also developed. The performance of the different models in predicting MI values on the testing dataset is shown in Figure 5. The laboratory analysis values are marked with points, and the predictions of RVR, QSFR, PCA-RVR, and QSFA-RVR are marked with triangles, asterisks, squares, and pentagrams, respectively. As can be seen from the figure, the proposed QSFA-RVR model performs best among all models: it not only yields more accurate predictions but also tracks process variations well.
Moreover, another qualitative comparison is displayed in Figure 6. The scatter plots show how closely the predicted values gather around the real values: the closer the points lie to the black diagonal line, the more accurate the predictions. As illustrated in Figure 6, the […] 45.2% compared to the PCA-RVR model. The comparison results show that the soft sensor based on the QSFA-RVR model achieves good performance in MI prediction for the propylene polymerization process.
The above experiments were performed on a personal computer with the following configuration: operating system: Windows 10 (64-bit); CPU: Intel Core i5-7200U (2.70 GHz); RAM: 8.00 GB; software: MATLAB 2017b. The computation times of the QSFR, PCA-RVR, and QSFA-RVR models are 0.96 s, 1.65 s, and 1.94 s, respectively. Prediction on the testing data takes less than 2 s. Since the sampling interval of the industrial MI analysis is about 2 h, the proposed method is fast enough to serve as an online soft sensor for MI prediction. Time lags of the MI, in the form of the average residence time, do exist and have been taken into account in this work.

Conclusions
In this paper, a novel soft sensor based on quality-relevant slow feature analysis and Bayesian regression is proposed for quality prediction in industrial PP processes. The proposed method handles process dynamics better by extracting quality-relevant slow features and deals with nonlinearity by introducing a Bayesian inference model. A case study on MI prediction in a real industrial PP plant is carried out to evaluate the performance of the proposed QSFA-RVR method. For comparison, the RVR model, the QSFR model, and the PCA-RVR model are also developed and evaluated. The application of the proposed model to the testing dataset demonstrates its superiority. The QSFA-RVR model predicts the MI with an MRE of 0.19%, which is considerably more accurate than the PCA-RVR model (MRE of 0.45%) and the QSFR model (MRE of 0.64%). These results demonstrate the prediction accuracy and validity of the proposed model and indicate that the QSFA-RVR modeling approach is a promising and efficient methodology for industrial MI prediction.