Recursive Gaussian Process Regression Model for Adaptive Quality Monitoring in Batch Processes

. Inchemicalbatchprocesseswithslowresponsesandalongduration,itistime-consumingandexpensivetoobtainsufficientnormal dataforstatisticalanalysis.Withthepersistentaccumulationofthenewlyevolvingdata,themodellingbecomesadequategradually andthesubsequentbatcheswillchangeslightlyowingtotheslowtime-varyingbehavior.Toefficientlymakeuseofthesmall amountofinitialdataandthenewlyevolvingdatasets,anadaptivemonitoringschemebasedontherecursiveGaussianprocess (RGP)modelisdesignedinthispaper.Basedontheinitialdata,aGaussianprocessmodelandthecorrespondingSPEstatisticare constructedatfirst.Whenthenewbatchesofdataareincluded,astrategybasedontheRGPmodelisusedtochoosetheproperdata formodelupdating.Theperformanceoftheproposedmethodisfinallydemonstratedbyapenicillinfermentationbatchprocess andtheresultindicatesthattheproposedmonitoringschemeiseffectiveforadaptivemodellingandonlinemonitoring.


Introduction
In modern industries, batch and semibatch processes are of great importance for the production of high-quality and value-added specialty chemicals, such as semiconductors, pharmaceuticals, and biological products.For product consistency and quality improvement, multivariate statistical process monitoring (MSPM) has been developed and widely used in many batch processes [1][2][3][4][5][6][7][8][9][10][11].Nomikos and MacGregor [12,13] are the first to apply multiway principle component analysis (MPCA) and multiway partial least squares (MPLS) to batch process monitoring.MPCA is powerful at analyzing the batch-to-batch variations and monitoring the newly evolving batches with the assumption that the batchwise unfolded data follow a multivariate Gaussian distribution.However, there are still some disadvantages of the conventional MPCA and its extensions.Firstly, MPCA assumes that the length of each batch is equal, which is unlikely to happen in practice.Furthermore, the entire batch data are needed for MPCA for online monitoring.Hence, the unknown future data of a newly evolving batch have to be estimated.Also, it is hard to reveal the timewise variations since the entire batch data are treated as a single object [14].
To solve these problems, variable-wise unfolding approaches were developed [15].This method provides a straightforward scheme for online monitoring and the equal length of the collected batches is not required.
It is noticed that, however, both batchwise and variablewise unfolding methods are based on the sufficient batch data.Abundant data are easy to be collected in the rapid manufacturing process, such as wafer etching, spray-drying, and spray-coating.Nevertheless, it is time-consuming and expensive to obtain data from slow chemical processes, such as emulsion polymerization, fermentation, and pharmaceutical and biotechnical products [15].In this situation, it is not proper to construct monitoring models after all the sufficient data are collected.It is interesting to design a statistical scheme to construct an initial model using limited batch data and then adapt the strategy when newly evolving batch data arrive.Therefore, Zhao et al. [14,16] proposed an adaptive monitoring model based on ICA [14].In their article, multiphase ICA was constructed based on limited reference cycles and an adaptive algorithm was formulated with the accumulation of newly available normal batches.However, their adaptive strategy was complex and time-consuming since the adaptive criterion was not straightforward and the updated model needed to be retrained completely.
Process monitoring is crucial not only to process safety but also to quality improvement.When the quality variables are examined, an input-output model is needed to reveal the quality-relevant variations.Since the batch process data are highly nonlinear, dynamic, and seriously interconnected with uncertainties, it is hard to build the relationship between the process variables and the quality variables and detect the abnormal production conditions.Therefore, there has been an increasing interest in quality monitoring of batch processes [13,[16][17][18][19].However, their research was also conducted based on the assumption that the normal batch data were sufficient, when the chemical batch processes are operated in a long batch time.Moreover, the process is slow-responding.Thus, only limited batch runs can be used for modelling.With the restricted batch data, the statistical process analysis would be inaccurate.Also, the time-varying behavior of the newly evolving batches may not be completely described based on the data recorded in the early stage of the operating batches.Hence, an adaptive strategy is important for batch quality monitoring, especially in slow processes.
In this paper, a recursive Gaussian process (RGP) model is designed for adaptive quality monitoring with limited initial batch data.A Gaussian process (GP) model is trained at first using limited batch data to construct the relationship between the process and the quality variables.In the GP model, the measure of the confidence level in the prediction is also taken into account.Hence, not only the mean value but also the variance of the quality prediction is supplied.With the estimation of the variance from GP as a level of confidence, the adaptive strategy is proposed to determine when and how the model should be updated.This updating strategy is straightforward and easy to implement.It is helpful to describe the time-varying dynamics over batches.Using the online updating strategy, an adaptive quality monitoring scheme is constructed.
The rest of the paper is organized as follows.The Gaussian process model is revisited in Section 2. Next, the batch process monitoring based on GP using limited batch data is discussed.The SPE statistic and its confidence bound are constructed then.With the accumulation of newly evolving batches, the updating strategy is developed and the implementation of the recursive GP based adaptive quality monitoring in the batch process is discussed in Section 3. Also, an industrial case is provided to demonstrate the efficiency of the proposed method in Section 4. Finally, conclusions are made.

GP Regression Model
In supervised learning, the objective is to infer a distribution over functions ( | X,Y) given the inputs X and outputs Y. Usually, a parametric representation for the function  is assumed.Then the model parameters ( | X, Y) are inferred instead of ( | X,Y).The methods above are called parametric modelling.In nonparametric modelling, without limitations of the model structure, the distribution over functions themselves is inferred.In a GP model, Bayesian inference is used for estimating the distribution of functions.
Given  inputs X = (x 1 , x 2 , . . ., x  )  and outputs Y = ( 1 ,  2 , . . .,   )  , a GP model defines a Gaussian prior distribution as [20] Y ∼  (, K) , where  is the mean value of outputs and K is the covariance function in which K  = (x  , x  ).In the GP model, the data collected are not assumed to follow the same Gaussian distribution any more.On the contrary, all the samples follow a joint Gaussian distribution.Hence, the joint Gaussian distribution of the training data X and the test data x * can be estimated: where K * = (X, x * ) and  = (x * , x * ).According to the property of conditional Gaussian distribution, the posterior distribution of the test data has the following form: It is noticed that it is common to use the mean function as  = 0 since GP is flexible enough to model arbitrary mean value [20].In the kernel function of GP, any positive definite kernel can be used for GP covariance.Generally, three common kernels, linear kernel, Gaussian kernel, and squared-exponential kernel, are mostly used [21].In this paper, Gaussian kernel is employed, which is given by [22] where  controls the vertical length scales of the function and  reflects the horizontal variations.To estimate the kernel parameters, which are called hyperparameters, the empirical Bayes approach is used.Firstly, the marginal log-likelihood is written as [20] Then the log-likelihood is maximized with respect to each hyperparameter as [23] In the GP model, it will take ( 3 ) to compute K −1 and ( 2 ) for hyperparameter gradient estimation.Particularly in the batch process model, the model needs to be updated with the newly evolving batches.With the accumulation of the sample size, the training of the GP model is timeconsuming.In this research work, a RGP model needs to be constructed.

Adaptive Quality Monitoring Based on RGP in Batch Processes
For a slowly varying batch process, all the batch data cannot sufficiently be collected at the initial operation stage.If the monitoring system is not adapted with the newly evolving batches, it will cause nuisance alarms.An online adaptive strategy is proposed to determine when the model needs to be updated and how to adapt the model without retraining the GP model using all the data set.The update criterion is based on the estimation of the variance from GP and the SPE statistic to judge if the newly evolving data are needed for updating.Then the covariance matrix of GP is recursively updated to provide a better representative of the current state of the system.

GP Based Batch Process Monitoring with Limited Batches.
Using limited batch data at first, the data are collected in the form of three-dimensional array.Assume the measured process variables are summarized as X( ×  × ), where  = 1, 2, . . .,  variables are collected at times  = 1, 2, . . .,  and  is the batch number.Simultaneously, the key quality variable is collected as Y(×1×).For simple explanation, only single quality variable is measured, but it is very easy to extend it to multiple quality variables.Also, in this paper, a variablewise unfolding method is used for data preprocessing [15].Using the unfolded matrix X( × ) and Y( × 1), a GP model can be trained as an initial quality monitoring model to identify the early correlations between the process and the quality variables.
The prediction error of the GP model and the corresponding SPE statistic can be calculated as [24]  =   − ŷ , The confidence limit of SPE is approximated by  2 distribution as SPE ∼  ⋅  2  ℎ , in which  and ℎ are the parameters of  2 distribution and they are estimated as [25]  ⋅ ℎ = mean (SPE) , The significance level of SPE statistics is 99%.When a new sample of the process variables x * is collected, the prediction value of quality can be estimated using (3) and the SPE statistic is calculated using (7).Generally, if the SPE statistic exceeds the control limit, it is thought that a fault occurs and vice versa.It is noticed that, however, the accuracy of the statistic is not evaluated because most of the time the collected data are not sufficient in the initial modeling stage.In the GP model, not only the mean value but also the variance of prediction of each sample is provided.With the variance serving as a measure of the confidence for the prediction, the required information can be found and included to enhance the accuracy of the monitoring model.Since the prediction follows a Gaussian distribution, the real value of quality mostly relies on a region covering ±3 of the prediction distribution around the mean.Owing to the uncertainty of prediction, the SPE statistic can also be estimated in a corresponding confidence region instead of a single value.According to the 3 criterion, the upper and lower limit of mean can be estimated as ( ŷUpper ) .

(9)
In this way, the confidence bound of the SPE statistic is constructed to evaluate its reliability.If SPE Mean  is below the control limit while SPE Upper  is above it, the normal state this time becomes questionable.By contrast, in some states there may be false alarms if only SPE Mean  is above the control limit while SPE Lower  indicates it is normal.Thus, the confidence bound of the SPE statistic is helpful for online updating strategy for quality monitoring in batch processes.

Updating Strategy and RGP Based on Adaptive Quality
Monitoring.After the initial monitoring model is built using the early collected normal batches, with the new batch data evolved, one needs to judge if the model should be adjusted to enhance the predictability of the monitoring model.Based on the discussions in the previous section, we have the following.
(1) If the upper bond SPE Upper  of the test data is still below the control limit, there is statistically sufficient evidence to show that it is normal, no more new information is needed, and the original model is good enough without being updated.
(2) If the lower bond SPE Lower  of the test data is larger than the control limit, chances are that almost all the potential values of predictions are far from the real data.In this situation, it is highly doubted that an abnormal variation has occurred, and it is mainly caused by the unknown disturbances instead of the normal time-varying behavior.
(3) If the control limit falls in the region of the SPE statistic, in which it is larger than SPE Lower  and smaller than SPE Upper  , the cases become indistinct.The result indicates that part of the potential values has exceeded the control limited while others have not.Hence, the part of false alarm results from the failure of initial models and an updating model is needed to adapt the model to the state of current batches.Figure 1 illustrates the updating strategy of our method and the relationship between SPE Lower

𝑖 and SPE
Upper  of a test data and the control limit.In Figure 1, the value of -coordinate is the control limit based on the mean prediction error of the normal data.Hence, when the control limit is smaller than SPE Lower , all the test data are defined as faults; when the control limit is larger than SPE Upper , it indicates that almost all the possible prediction values fall in the normal region and all the test data would be normal.
When the test data are added for updating the model, a recursive method is used.Suppose  inputs X  = (x where K +1 = (X +1 , X +1 ).The updated covariance K +1 can be directly calculated using  + 1 data which are computationally demanded.In this paper, a recursive method is employed using Sherman-Morrison-Woodbury formula to estimate K +1 by the covariance of the first  samples [26].
The updated covariance is calculated as where Γ +1 = (X  , x +1 ) and  +1 = (x +1 , x +1 ).Hence, when the new data need to be added, the recursive formula for K −1 +1 can be used.After the model has been updated, the corresponding monitoring system can be set up as GP.
To sum up, the procedures of adaptive quality monitoring based on RGP are listed as follows.

Initial Phase
(1) Collect the historical normal batch data even if there are a limited number of operation batches.(2) Unfold the data using variable-wise unfolding and train an initial GP model.(3) Calculate the SPE statistic and the control limit based on the currently available data.Estimate the data regions using ( 7) and (9).(5) Update the covariance using (11) and calculate the new SPE statistic and the region of the control limit based on new SPE Lower

𝑖 and SPE
Upper  in the updated model.

Monitoring Phase
(6) Check if the current SPE statistic exceeds the control limit.
For batch-size updating, the region of control limits SPE Lower

and SPE
Upper  is estimated based on all the useful data after one batch run.Then, the active data are chosen in a newly evolving batch to update the model.For sample-size updating, during the batch run, the region of control limit will be changed as soon as the model is updated.Compared with the two updating strategies, the selected active data in the batch-size updating may be similar and redundant.Thus, in this paper, sample-size updating is applied since it is more time-saving.
Case Study.In this section, a fed-batch penicillin benchmark is illustrated to evaluate the performance of the proposed method.In a typical operating process for penicillin fermentation, the microorganisms are generated in certain circumstances during the initial stage.When the substrate has been consumed by the microorganisms, the penicillin is produced as the metabolite.The concentration of microorganisms reaches the stationary phase.During the penicillin production stage, glucose is fed continuously to achieve a high product formation rate due to the catabolized repressor effect [27,28].The batch data are carried out using a simulator (PenSim v2.0) developed by the monitoring and control group at the Illinois Institute of Technology (http://www.chee.iit.edu/∼cinar).The flow diagram of the penicillin fermentation process is given in Figure 2.
In this paper, the duration of every batch is 400 hours and the sampling intervals of the process variables are set to be 1 hour.For modelling, a total of 11 key process variables are selected.They are tabulated in Table 1.The concentration of penicillin is chosen as the quality variable.It is noticed that the sampling interval of quality cannot be 1 hour in reality as the concentration needs to be examined offline at the lab generally.Therefore, to mimic the real situation, it is assumed that the quality data are recorded every 20 h.To build the input-output relationship using the unequal size of the process variables and the quality variables, some research has been done [29].Since it is not the main concern of this paper, the simple downsampling method is used here to equalize the sampling frequency of inputs and outputs.Besides, the measured variables are added by Gaussian noises (NID(0, 0.02 2 )) to simulate the real process environment.Forty normal batches are generated to demonstrate the proposed method.To simulate the slow time-varying behavior among batches, the culture volume of these normal batches is set to change gradually from 100 L to 120 L.Although the culture volume has changed, it is still considered that these batches are normal and the variations over batches can be tolerated.For constructing initial model, 5 batches data are    To test the efficiency of the proposed method, the prediction result using the initial GP model without updating is firstly discussed.With the slow time-varying dynamics, the final quality production has changed slightly from one batch after another.If the initial model is still used to predict all the following bathes, the prediction of mean will drift from the real value gradually (Figure 4).Moreover, even ±3 of the  prediction distribution cannot cover the real value in the last several batches.In this situation, the SPE statistics based on the prediction error have several false alarms (Figure 5).The false alarm rate is the percentage of regular samples as faulty ones for the normal part (Type I error).By contrast, with the new batches data, the proposed RGP model can judge when the model should be updated and which data should be selected.Thus, the time-varying dynamics are reflected by the updating model and the prediction performance is improved.
In Figure 6, it is found that the predicted value is close to the real data and the covariance shadow is narrow, which indicates the result is reliable.Based on the prediction result, the SPE statistic can be calculated for all the 35 bathes.As shown in Figure 7, there are almost no false alarms during testing all the samples.To sum up, the RMSE values and false alarms of these two models listed in Table 2 indicate that the accuracy of predicting the monitoring performance is improved by recursive GP.RMSE (root-mean-square error) is the sample standard deviation of the differences between predicted values and observed values.Finally, for a clearer explanation of the updating model procedures, the 6th batch data are demonstrated particularly.In Figure 8(a), it can be found that the prediction becomes inaccurate staring from the 25th sampling interval if the original model is still used.The quality monitoring results are shown in Figure 9

Conclusion
In this paper, an adaptive quality monitoring scheme is proposed based on the RGP model.It is mainly used for the slow time-varying batch process with limited initial batches.The cases discussed in this paper are commonly seen in fine chemicals.Using the initial normal data, a GP model and the corresponding SPE statistic can be constructed for revealing the variations of early batches.With the accumulation of the newly evolving data, an updating strategy is introduced to judge if the new data should be added so that the model can be enhanced.Then a recursive GP model is used for implementation of updating.The proposed method is efficient since it avoids the computation of the inversion of the covariance matrix in the GP model.It is helpful for online updating.The feasibility and the monitoring performance have been demonstrated by a real penicillin fermentation batch process.Among different batches, the operating state has changed slowly and the proposed method can make an accurate prediction and right monitoring decision over batches.Therefore, the proposed adaptive monitoring scheme is an effective tool for solving the slow time-varying behavior of the process.The proposed method is mainly used for slowresponding chemical processes; the corresponding model changes slightly during the operation.In this situation, only

Figure 1 :
Figure 1: Relationship between the SPE statistics and the control limit.

( 4 )
Calculate the SPE statistic for new batch data one time point by one time point, and judge if each sampling data point needs to be included for updating the model.If yes, go to Step (5); otherwise, go to Step (6).

Figure 2 :
Figure 2: Flow diagram of the penicillin fermentation process.
first used.According to the downsampling method, a GP model can be constructed based on the process variables and the quality variables X(5 × 11 × 20) and Y(5 × 1 × 20).The remaining 35 batches are employed for updating the test.The prediction result of GP training is illustrated in Figure 3.The solid curve represents the prediction of the mean and the red dashed line represents the real data.The shade region denotes ±3 of the prediction distribution around the mean value.It can be seen that the shadow is very thin, indicating that the training model is quite accurate and representative.

Figure 3 :
Figure 3: Prediction result of the GP model using initial batches.

Figure 4 :
Figure 4: Prediction result of the test data using the initial GP model.

Figure 5 :Figure 6 :Figure 7 :Figure 8 :Figure 9 :
Figure 5: The SPE statistic of the test data using the initial GP model.
−   )  ( ŷ −   ) can also be constructed based on the prediction error between mean value and the real value.It is noticed that SPE are not the real bounds of the SPE statistic because of the quadratic property.However, it is easy to prove that the upper and lower limits can be determined by SPEMean 1 , x 2 , . . ., x  )  and outputs Y  = ( 1 ,  2 , . . .,   )  are used for training the initial GP model; the new data (x +1 ,  +1 ) are added for model updating.Then the new  + 1 data Y +1 = ( 1 ,  2 , . . .,   ,  +1 )  follow a new joint Gaussian distribution as

Table 1 :
Process variables used in the modeling of the fed-batch penicillin process.

Table 2 :
Mean prediction and the process monitoring result in the penicillin process.