Ensemble Just-In-Time Learning-Based Soft Sensor for Mooney Viscosity Prediction in an Industrial Rubber Mixing Process

The lack of online sensors for Mooney viscosity measurement has posed significant challenges for efficient monitoring, control, and optimization of industrial rubber mixing processes. To obtain real-time and accurate estimations of Mooney viscosity, a novel soft sensor method, referred to as multimodal perturbation- (MP-) based ensemble just-in-time learning Gaussian process regression (MP-EJITGPR), is proposed by exploiting ensemble JIT learning. This method employs perturbations on the similarity measure and the input variables to generate diversity among JIT learners. Furthermore, a set of accurate and diverse JIT learners is built through evolutionary multiobjective optimization by balancing the accuracy and diversity objectives explicitly. Moreover, all base JIT learners are combined adaptively using a finite mixture mechanism. The proposed method is applied to an industrial rubber mixing process for Mooney viscosity prediction, and the experimental results demonstrate its effectiveness and superiority over traditional soft sensor methods.


Introduction
Rubber mixing is a crucial step in the rubber and tire industry. The quality of rubber products highly depends on the exact mixing of raw materials and additives.
The Mooney viscosity, indicating the molecular weight and viscoelastic behavior of an elastomer, has been recognized as an important quality index for producing nonvulcanized rubbery materials [1,2]. In most rubber and tire factories, however, the Mooney viscosity can only be determined through manual analysis, which often takes 4∼6 h after a batch has been discharged, while the duration of a batch run of the mixing process is only about 2∼5 min. Therefore, in recent years, soft sensor methods have been widely applied to provide real-time estimations of the Mooney viscosity so as to obtain optimal and uniform rubber product quality [3][4][5][6][7][8][9][10][11][12].
In general, it is time-consuming and even impossible to build accurate first-principles soft sensors for Mooney viscosity due to the lack of in-depth chemical and physical knowledge of the rubber mixing process. Alternatively, data-driven soft sensors have attracted much attention because of the availability of large amounts of data and advanced data mining and analytics tools [13][14][15][16][17][18]. The early attempts at data-driven soft sensors for quality estimation mainly focused on global modeling techniques, such as multivariate statistical techniques [19,20], artificial neural networks [3,21], support vector regression [22], and Gaussian process regression [4,5]. Recently, deep learning methods have also been introduced to soft sensor applications [23].
However, global soft sensor methods cannot always function well because they may fail to handle local process characteristics effectively or to perform model adaptation efficiently.
Thus, in practical applications, local learning-based soft sensor methods are more appealing for providing accurate predictions. Compared to global modeling, local learning exhibits two outstanding advantages. First, global soft sensors are usually based on the underlying assumption of a constant operating phase and conditions throughout the entire duration of a production process, whereas industrial processes are often characterized by strong nonlinearity and multiple operation phases or modes. Thus, from the process industry viewpoint, local learning is more suitable for handling complex process characteristics. Second, since local learning is completely localized, the local models can be built and updated independently of each other, which greatly simplifies the incremental adaptation, inclusion, or removal of local models and receptive fields.
Generally, there are two categories of local learning methods: ensemble methods [24][25][26][27] and just-in-time learning (JIT) methods [6,7,9,12]. These methods employ the divide-and-conquer philosophy to model the relationships between the inputs and output by building a set of locally valid models. In particular, JIT learning, known as a representative local learning paradigm, has gained growing attention in soft sensor applications for Mooney viscosity estimation due to its strong capability of handling nonlinearity, time-varying behavior, multiphase and multimode process characteristics, etc. [9,11,12]. However, traditional JIT soft sensors attempt to build a globally optimal encapsulation of local modeling techniques, similarity measures, input variables, model hyperparameters, etc., while the diversity of JIT learning is ignored. To tackle this problem, various ensemble JIT learning (EJIT) soft sensors have been developed [6,8,25,[28][29][30][31].
The basic idea of EJIT modeling is to build multiple component JIT learners and then combine their predictions. For instance, Liu et al. (2012) [28] employed heterogeneous predictive models to build base JIT models and then combined them through a simple averaging rule. Liu and Gao (2015) [6] developed an EJIT soft sensor using diverse similar data sets, obtained by assigning diverse hyperparameters to the support vector clustering used for outlier detection. Kaneko and Funatsu (2016) [25] developed an ensemble locally weighted partial least squares (LWPLS) soft sensor, where diverse subsets are first built using a moving window method, multiple subsets most relevant to the query state are then selected to build diverse LWPLS models, and the models are finally integrated via Bayes' theorem. The authors of [29] built an EJIT kernel learning framework by perturbing the hyperparameters of local learning methods. Yuan et al. (2018) [30] developed an EJIT soft sensor using different similarity measures. Besides, we proposed an EJIT soft sensor that perturbs the input features to build diverse input subspaces [8]. Recently, we developed an EJIT soft sensor employing multiple weighted Euclidean distance- (WED-) based similarity measures, which are optimized through an evolutionary multiobjective optimization approach [31]. These studies show that it is feasible and effective to enhance the prediction accuracy of JIT soft sensors by introducing ensemble learning.
However, it remains challenging to build high-performance EJIT soft sensors due to the following issues. First, many current EJIT soft sensor methods consider only a single perturbation, such as perturbing the training data [25], the similarity measure [30,31], the input variables [8], the local modeling technique [28], or the model parameters [29]. In practice, the diversity of JIT learning often originates from multiple factors. Second, most current EJIT methods construct base JIT learners in a heuristic way, making it difficult to achieve a good tradeoff between the accuracy and diversity objectives of the JIT learners. Finally, most methods employ nonadaptive weightings for the combination of base JIT learners, which limits the prediction performance of EJIT soft sensors.
To address the aforementioned issues, a novel EJIT soft sensor method, referred to as multimodal perturbation-based EJITGPR (MP-EJITGPR), is proposed for enabling accurate predictions of Mooney viscosity.
This method works by integrating perturbation on similarity measures and perturbation on input variables. With the multimodal perturbation mechanism, a set of accurate and diverse JIT learners is built by balancing the accuracy and diversity objectives explicitly through an evolutionary multiobjective optimization (EMO) approach. Then, a finite mixture mechanism- (FMM-) based weighting strategy is used to achieve an adaptive combination of the base learners. In summary, the main contributions of this study are as follows:
(1) A multimodal perturbation mechanism is proposed by utilizing heterogeneous similarity measures and building diverse input subspaces, which allows enhancing the diversity of base JIT learners efficiently.
(2) The generation of accurate and diverse JIT learners is formulated as a multiobjective optimization problem and then solved by an EMO approach.
(3) The combination of base JIT learners is achieved through the finite mixture mechanism, which enables adaptive assignment of weights.
(4) A novel EJIT soft sensor modeling framework is built by integrating the multimodal perturbation mechanism-based diversity creation, the EMO-based generation of base JIT learners, and the FMM-based adaptive combination of base JIT learners.
The rest of the paper proceeds as follows. Section 2 briefly introduces JIT learning and Gaussian process regression. Section 3 details the proposed MP-EJITGPR soft sensor method and its implementation procedure. The application of MP-EJITGPR for Mooney viscosity prediction in an industrial rubber mixing process is reported in Section 4. Finally, conclusions are drawn in Section 5.

Just-In-Time Learning.

JIT learning [32], also known as lazy learning [33] and locally weighted learning [34], refers to a family of algorithms in which all historical data are stored in a database and local models are built dynamically by retrieving the data most similar to the query state. Compared to conventional global modeling methods, JIT learning has the following features: (1) All available modeling data are stored in a database, and only the samples most similar to the query point are used for modeling in each run of prediction. (2) A local model is built dynamically, based on samples with high similarity to the query point, only when an estimation is required.

Advances in Polymer Technology
(3) The constructed local model is discarded after the estimation is given.
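The three features above can be condensed into a short sketch. This is illustrative only: an ordinary least-squares local model stands in for the GPR models used later in the paper, the Euclidean distance stands in for the similarity measures discussed in Section 3, and all names are assumptions.

```python
import numpy as np

def jit_predict(X_db, y_db, x_query, k=10):
    """One run of just-in-time prediction: retrieve the k samples most
    similar to the query, fit a throwaway local model, predict, discard."""
    d = np.linalg.norm(X_db - x_query, axis=1)   # distance to every stored sample
    idx = np.argsort(d)[:k]                      # k most similar samples
    X_loc, y_loc = X_db[idx], y_db[idx]
    # local model: ordinary least squares with a bias term (a stand-in
    # for the local GPR models of the paper)
    A = np.hstack([X_loc, np.ones((k, 1))])
    coef, *_ = np.linalg.lstsq(A, y_loc, rcond=None)
    return np.append(x_query, 1.0) @ coef        # model is discarded after this

# toy database: y = 2*x0 + 3*x1
rng = np.random.default_rng(0)
X_db = rng.normal(size=(200, 2))
y_db = 2 * X_db[:, 0] + 3 * X_db[:, 1]
y_q = jit_predict(X_db, y_db, np.array([0.5, -0.2]), k=10)
```

Because the local model is rebuilt from scratch for every query, adding or removing database samples requires no retraining, which is exactly the adaptation advantage claimed above.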

Gaussian Process Regression.
A Gaussian process is a collection of random variables, any finite number of which follows a joint Gaussian distribution [35]. Considering a data set $D = \{X, y\} = \{x_i, y_i\}_{i=1}^{n}$, the regression model can be formulated as

$$y = f(x) + \varepsilon,$$

where $f(\cdot)$ represents an unknown regression function and $\varepsilon$ denotes Gaussian noise with zero mean and variance $\sigma_n^2$. A Gaussian process is completely specified by its mean function $m(x)$ and covariance function $C(x, x')$:

$$f(x) \sim GP\left(m(x), C(x, x')\right).$$

Since the modeling data are usually normalized to zero mean, the output observations follow a Gaussian distribution:

$$y \sim N(0, C),$$

where $C$ is an $n \times n$ covariance matrix with $C_{ij} = C(x_i, x_j)$ representing its $ij$th element. In this study, a Matérn covariance function with a noise term is adopted:

$$C(x_i, x_j) = \sigma_f^2\, k_{\mathrm{Mat\acute{e}rn}}\!\left(\frac{\lVert x_i - x_j \rVert}{l}\right) + \sigma_n^2 \delta_{ij},$$

where $\Theta = \{\sigma_f^2, l, \sigma_n^2\}$ are the hyperparameters, $l$ is the input scale, $\sigma_n^2$ is the noise variance, $\sigma_f^2$ is the output scale, and $\delta_{ij} = 1$ if $i = j$ and $0$ otherwise. The hyperparameters $\Theta$ are determined by Bayesian inference. Given a query data point $x_{new}$, the training outputs $y$ and the test output $y_{new}$ follow a joint Gaussian distribution:

$$\begin{bmatrix} y \\ y_{new} \end{bmatrix} \sim N\!\left(0, \begin{bmatrix} C & c_{new} \\ c_{new}^{T} & c(x_{new}, x_{new}) \end{bmatrix}\right),$$

where $c_{new} = [C(x_{new}, x_1), \ldots, C(x_{new}, x_n)]^{T}$. Then, the prediction output $\hat{y}_{new}$ and variance $\sigma_{new}^2$ can be calculated as

$$\hat{y}_{new} = c_{new}^{T} C^{-1} y, \qquad \sigma_{new}^2 = c(x_{new}, x_{new}) - c_{new}^{T} C^{-1} c_{new}.$$
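A minimal numerical illustration of the GPR predictive equations follows. It assumes the Matérn 3/2 member of the Matérn family and fixed, untuned hyperparameters (the paper optimizes its hyperparameters by Bayesian inference, which is omitted here); all names are illustrative.

```python
import numpy as np

def matern32(A, B, sf2=1.0, l=1.0):
    """Matérn 3/2 covariance, one common member of the Matérn family."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    r = np.sqrt(3.0) * d / l
    return sf2 * (1.0 + r) * np.exp(-r)

def gpr_predict(X, y, x_new, sf2=1.0, l=1.0, sn2=1e-4):
    """Zero-mean GPR prediction: mean c^T C^{-1} y and its variance."""
    C = matern32(X, X, sf2, l) + sn2 * np.eye(len(X))  # noise term on the diagonal
    c = matern32(X, x_new[None, :], sf2, l).ravel()
    y_new = c @ np.linalg.solve(C, y)
    var = matern32(x_new[None, :], x_new[None, :], sf2, l)[0, 0] + sn2 \
          - c @ np.linalg.solve(C, c)
    return y_new, var

# toy 1-D regression task: recover sin(x) from 30 noisy-free samples
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(30, 1))
y = np.sin(X).ravel()
y_hat, var = gpr_predict(X, y, np.array([0.3]))
```

Note that the predictive variance produced here is what the FMM combination scheme of Section 3 later uses to weight the base JITGPR models.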

Proposed MP-EJITGPR Soft Sensor
In this section, a multimodal perturbation-based ensemble just-in-time learning Gaussian process regression (MP-EJITGPR) is presented. First, data preprocessing is conducted on the three-way data matrices of the industrial rubber mixing process. Then, heterogeneous similarity measures are defined. Furthermore, by introducing the multimodal perturbation mechanism, a set of accurate and diverse JIT learners, equipped with heterogeneous similarity measures and diverse subspaces, is generated through an EMO approach. Next, a finite mixture mechanism is employed to achieve an adaptive combination of the base JIT learners. Finally, the implementation procedure of MP-EJITGPR is provided.

Data Preprocessing.
Typically, the online data of the easy-to-measure variables in an industrial rubber mixing process can be arranged in a three-way matrix X(I × J × K) consisting of J process variables measured at K time points for I batches. Meanwhile, the quality variable (i.e., Mooney viscosity), which is only available at the end of the batch, i.e., time point K, can be expressed as y_K(I × 1). Before soft sensor modeling, some essential preprocessing is desirable. First, the process data are screened using a simple 3σ rule for outlier detection. Then, the three-way data matrix X(I × J × K) is unfolded into a two-way matrix, which allows standard regression techniques to be used for building the predictive model between the online measured variables and the end-use quality variable. Generally, X can be unfolded in six different ways [36], among which batchwise unfolding is employed in this study, as illustrated in Figure 1. In practice, this way of unfolding has been recognized as the most meaningful one for the analysis and monitoring of batch processes. With this unfolding, all potential input variables at different time instants can be obtained for predicting the final quality variable. Moreover, since the dimensions and magnitudes of the various process variables differ significantly from each other, another crucial step to guarantee the reliability and accuracy of soft sensors is data normalization, which is achieved in this study by scaling the unfolded data matrices to zero mean and unit variance.
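The unfolding and normalization steps can be sketched as follows on toy data. The column ordering after unfolding (variable-major here, via a row-major reshape) is one common convention and is an assumption, not necessarily the ordering used in the paper.

```python
import numpy as np

# toy three-way data: I = 4 batches, J = 3 process variables, K = 5 time points
I, J, K = 4, 3, 5
rng = np.random.default_rng(2)
X3 = rng.normal(size=(I, J, K))

# batchwise unfolding: each batch becomes one row holding all J*K
# variable/time-instant combinations as potential input variables
X2 = X3.reshape(I, J * K)

# scale the unfolded matrix to zero mean and unit variance, column by column
mu, sd = X2.mean(axis=0), X2.std(axis=0)
Xn = (X2 - mu) / sd
```

After this step each batch is a single sample whose features cover the whole trajectory, which is what allows the end-of-batch Mooney viscosity to be regressed on measurements from every time instant.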

Definition of Heterogeneous Similarity Measures.
Similarity measure plays a central role in JIT modeling. In contrast to traditional modeling methods utilizing all available data, the JIT method constructs a local model based on a small data set with high similarity to the query data. Thus, the key to building highly accurate JIT soft sensors is to define appropriate similarity metrics. Despite the availability of various similarity measures, no single similarity evaluation criterion is consistently better than the others across different application scenarios. Consequently, it is common practice to select the best similarity measure for a given task, which is usually time-consuming and sometimes infeasible. In practice, different similarity measures provide different insights into the similarity between data points. Thus, in this work, heterogeneous similarity measures are combined for JIT learning, including Euclidean distance- (ED-) based similarity, cosine similarity (cosine), covariance weighted distance- (CWD-) based similarity, and correlation coefficient- (CC-) based similarity. The ED similarity measure is the most commonly used metric for JIT learning due to its simplicity and efficiency. It is defined based on the Euclidean distance between points in space; that is,

$$s_i^{ED} = \exp\left(-\frac{d_i}{\varphi\, \sigma_d}\right), \qquad d_i = \lVert x_q - x_i \rVert_2,$$

where $\sigma_d$ is the standard deviation of $d_i$ ($i = 1, 2, \ldots, n$) and $\varphi$ is a localization parameter. One disadvantage of ED similarity is that the differences among input variables are ignored. To address this issue, various weighted distance-based similarity measures have been proposed, among which the CWD similarity measure is defined by considering the relationships among the input variables and between the input and output variables [37]. That is, CWD similarity is defined using the weighted distance metric:

$$d_i^{CWD} = \sqrt{(x_q - x_i)^{T} H (x_q - x_i)},$$

where $H$ denotes a weighting matrix computed from the input and output matrices $X$ and $y$, respectively. Alternatively, by exploiting the angle between two vectors in space, the cosine similarity measure can be defined as

$$s_i^{\cos} = \cos_i = \frac{x_q^{T} x_i}{\lVert x_q \rVert\, \lVert x_i \rVert},$$

where $\cos_i$ denotes the cosine of the angle between the two vectors.
In addition to the distance and angle criteria, the relevance between two vectors can also be used to evaluate the similarity between samples. For the sake of simplicity, and without loss of generality, the frequently used correlation coefficient (CC) criterion is used to define a similarity measure as follows:

$$s_i^{CC} = \frac{\mathrm{Cov}(x_q, x_i)}{\sqrt{\mathrm{Var}(x_q)\,\mathrm{Var}(x_i)}},$$

where $\mathrm{Cov}(\cdot, \cdot)$ and $\mathrm{Var}(\cdot)$ compute the covariance and variance, respectively.
It is noteworthy that the above similarity measures are defined from different points of view and thus behave differently in different applications. Hence, one promising way to further improve the prediction performance of JIT learning soft sensors is to use heterogeneous similarity measures together, which has not been well investigated and will be discussed in the following section.
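The four similarity measures can be sketched as below. The exponential forms for the ED and weighted-distance similarities follow one common convention in the JIT soft sensor literature, and the weighting matrix `H` for the CWD measure is taken as given (its construction from input/output covariance information is described in [37]); function names are assumptions.

```python
import numpy as np

def ed_similarity(X, x_q, phi=1.0):
    """ED similarity: s_i = exp(-d_i / (phi * sigma_d))."""
    d = np.linalg.norm(X - x_q, axis=1)
    return np.exp(-d / (phi * d.std()))

def cosine_similarity(X, x_q):
    """Cosine of the angle between each stored sample and the query."""
    return (X @ x_q) / (np.linalg.norm(X, axis=1) * np.linalg.norm(x_q))

def wd_similarity(X, x_q, H, phi=1.0):
    """Weighted-distance similarity with weighting matrix H; the CWD
    measure builds H from input/output covariance information."""
    diff = X - x_q
    d = np.sqrt(np.einsum('ij,jk,ik->i', diff, H, diff))  # per-row quadratic form
    return np.exp(-d / (phi * d.std()))

def cc_similarity(X, x_q):
    """Correlation-coefficient similarity between each sample and the query."""
    return np.array([np.corrcoef(x, x_q)[0, 1] for x in X])
```

With `H` set to the identity, the weighted-distance similarity collapses to the ED similarity, which makes the relationship between the two measures explicit.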

Generation of Base JIT Learners through Evolutionary Multiobjective Optimization.

When JIT learning is taken as the base learner for ensemble learning, EJIT modeling is essentially an ensemble method. It has been proven both theoretically and empirically that both the accuracy and the diversity of base learners are crucial to guaranteeing high ensemble performance [38]. According to the famous bias-variance decomposition [39] and error-ambiguity decomposition [40] theories, the more accurate and diverse the base learners are, the better the ensemble is. Hence, the success of developing high-performance EJIT soft sensors lies in generating accurate and diverse base JIT learners.
Among the various ensemble learning soft sensors, perturbing the training data remains the dominant way of creating diversity, e.g., via clustering [41], moving windows [24,27], bootstrap sampling [42,43], and sequential sampling [44]. However, such data manipulation strategies do not always function well for EJIT modeling, because JIT learning relies on only a small subset of relevant samples for each run of prediction and is therefore less sensitive to randomness injected into the database. Moreover, perturbation of the input variables is often ignored in developing ensemble soft sensors. In addition, many current methods for base learner generation are based on heuristic mechanisms without explicitly measuring or ensuring diversity among base learners.
Thus, in this study, diversity generation is achieved through the multimodal perturbation mechanism, i.e., perturbing the similarity measure and the input variables together. Then, the generation of accurate and diverse base JIT learners is formulated as a multiobjective optimization problem (MOP). Finally, the MOP is solved using an EMO approach, which leads to a tradeoff between the accuracy and diversity objectives. In the following, the decision variables, the optimization objectives, and the adopted EMO approach are detailed.
Suppose M JIT learners, each characterized by one similarity measure and one input subspace, are required for constructing an EJIT model. Since the M heterogeneous similarity measures have been defined in advance, the decision variables $z$ only include the selection variables, which indicate whether an input variable is selected to form a subspace. That is, the decision vector $z$ can be expressed as

$$z = \left[\theta_{1,1}, \ldots, \theta_{1,D}, \ldots, \theta_{M,1}, \ldots, \theta_{M,D}\right],$$

where $M$ and $D$ denote the number of base JIT learners and the number of potential input variables for building subspaces, respectively, and $\theta_{m,d}$ is a binary variable indicating inclusion ("1") or exclusion ("0") of an input variable. Furthermore, the accuracy and diversity objectives are defined. In this study, the ensemble accuracy is given as the average of the individual accuracies. Given the training data set $D_{trn} = \{X_{trn}, y_{trn}\}$ and an independent validation set $D_{val} = \{X_{val}, y_{val}\}$, the accuracy objective is estimated as

$$RMSE_{avg,val} = \frac{1}{M} \sum_{m=1}^{M} RMSE_{val}^{m},$$

where $RMSE_{val}^{m}$ denotes the root-mean-squared error (RMSE) on the validation set for the $m$th JIT learner.
In comparison with the accuracy objective, measuring ensemble diversity is not straightforward: to date, there is no generally accepted formal formulation or measure of ensemble diversity. Thus, in this study, the standard deviation of the individual prediction outputs is used to evaluate the ensemble diversity. By applying the base JIT learners $\{f_{JIT}^{m}\}_{m=1}^{M}$, the prediction outputs of the individual JIT learners are given as

$$\hat{y}_{val}^{m} = f_{JIT}^{m}\left(X_{val}\right), \quad m = 1, \ldots, M,$$

where $\hat{y}_{val}^{m}$ is a column vector denoting the prediction outputs on the validation data using the $m$th JIT learner. Let $\hat{y}_{val,i} = [\hat{y}_{val,i}^{1}, \ldots, \hat{y}_{val,i}^{M}]$ be a row vector denoting the prediction outputs from the base JIT learners for the $i$th query data point in the validation set. Then, the ensemble diversity can be defined as

$$\sigma_{avg,val} = \frac{1}{N_{val}} \sum_{i=1}^{N_{val}} \sigma_{val,i},$$

where $N_{val}$ is the number of validation samples and $\sigma_{val,i}$ is the standard deviation of $\hat{y}_{val,i}$. To build accurate and diverse base JIT learners, a small $RMSE_{avg,val}$ and a large $\sigma_{avg,val}$ are desirable. Thus, generating accurate and diverse JIT learners can be formulated as a biobjective optimization problem:

$$\min_{z}\ \left\{ RMSE_{avg,val},\ 1/\sigma_{avg,val} \right\}.$$
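Given a matrix of per-learner validation predictions, the two objectives can be computed as in the following sketch (the population standard deviation is assumed; the function name is illustrative):

```python
import numpy as np

def ensemble_objectives(Y_pred, y_val):
    """Accuracy and diversity objectives for a candidate ensemble.
    Y_pred: (N_val, M) matrix of predictions from the M base JIT learners."""
    # accuracy: average RMSE of the individual learners on the validation set
    rmse_m = np.sqrt(((Y_pred - y_val[:, None]) ** 2).mean(axis=0))
    rmse_avg = rmse_m.mean()
    # diversity: average per-sample standard deviation across learners
    sigma_avg = Y_pred.std(axis=1).mean()
    # both objectives are minimized: (RMSE_avg, 1/sigma_avg)
    return rmse_avg, 1.0 / sigma_avg
```

Each candidate decision vector z maps to one such objective pair, so the EMO search operates directly on these two numbers.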
To solve the above MOP, one of the most famous EMO algorithms, NSGA-II (nondominated sorting genetic algorithm II), is employed.
Details of the NSGA-II algorithm can be found in [45]. First, the decision vector z is coded as a binary string and an initial population is created. Then, the Pareto-optimal solutions are obtained through the following procedure:
(i) Step 1: evaluate the accuracy and diversity objectives for each individual in the current population.
(ii) Step 2: generate an offspring population through binary tournament selection, crossover, and mutation.
(iii) Step 3: combine the parent and offspring populations and select the next-generation population by nondominated sorting with crowding-distance assignment; repeat Steps 1-3 until the maximum number of generations is reached.
(iv) Step 4: find the Pareto-optimal solutions from the combined population in the last generation by applying the nondominated sorting method.
The outcome of this step is a set of Pareto-optimal solutions, one of which is selected for the ensemble construction of MP-EJITGPR modeling. In this way, a set of base JIT learners can be obtained, where each f_JIT is built by the GPR method in this paper. When a query data point is to be predicted, each base JIT learner makes a prediction of the output variable; to get the final prediction, these individual predictions have to be combined.
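The extraction of nondominated solutions in Step 4 can be illustrated with a brute-force sketch over a toy set of objective pairs (NSGA-II itself uses fast nondominated sorting plus crowding distance, which is omitted here; all values are illustrative):

```python
def pareto_front(points):
    """Return indices of nondominated points for a minimization problem,
    i.e., the first front used by NSGA-II's nondominated sorting."""
    front = []
    for i, p in enumerate(points):
        # p is dominated if some other point is no worse in every
        # objective and differs in at least one
        dominated = any(
            all(qv <= pv for qv, pv in zip(q, p)) and q != p
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            front.append(i)
    return front

# candidate (RMSE_avg, 1/sigma_avg) pairs for four hypothetical ensembles
objs = [(0.9, 0.5), (0.7, 0.8), (0.6, 1.2), (1.0, 1.0)]
front = pareto_front(objs)
```

Every index on the returned front is a legitimate accuracy/diversity tradeoff; the paper picks one such solution to instantiate the final ensemble.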

Adaptive Combination of Base JIT Learners by Finite Mixture Mechanism.
Generally, there are two classes of combination methods: nonadaptive and adaptive weightings. In the former, the weights assigned to the base learners remain unchanged once deployed in real-life operation, whereas in the latter the weights are assigned adaptively to accommodate the query process state. The simplest nonadaptive weighting method is the simple averaging rule, which takes the average of the individual predictions as the final prediction. Another popular nonadaptive weighting is to determine the weights according to the prediction capability on the training or validation set; for instance, weights can be determined using linear regression methods such as PCR and PLS. Besides, the combination of base learners can be achieved by learning, i.e., stacking, which usually also leads to a nonadaptive combination model. However, a deficiency of nonadaptive combination methods is that they tend to assign larger weights to the models that exhibit excellent prediction on the training or validation set, which may lead to overestimation or underestimation of the weights and thus deteriorate the generalization capability of the ensemble model. Therefore, adaptive combination strategies are highly appealing.
In this study, a finite mixture mechanism-based adaptive weighting method is proposed to achieve the combination of base learners. For a new query data point $x_{new}$, the predictive distribution of the target variable is estimated from the M JITGPR models as a finite Gaussian mixture:

$$p\left(y_{new} \mid x_{new}\right) = \sum_{m=1}^{M} \omega_m\, N\!\left(\hat{y}_{m,new},\ \sigma_{m,new}^2\right),$$

where $\hat{y}_{m,new}$ and $\sigma_{m,new}^2$ represent the prediction output and variance of the $m$th JITGPR model, respectively, and $\omega_m$ denotes the mixture weights satisfying the constraints

$$\omega_m \geq 0, \qquad \sum_{m=1}^{M} \omega_m = 1.$$

Since the prediction uncertainty can effectively indicate the confidence level of the output predictions, we assume the mixture weights are inversely proportional to the prediction variances of the individual JITGPR models. Thus, $\omega_{m,new}$ can be estimated as

$$\omega_{m,new} = \frac{\left(1/\sigma_{m,new}^2\right)^p}{\sum_{j=1}^{M} \left(1/\sigma_{j,new}^2\right)^p},$$

where $p$ is an adjustable parameter. The proposed FMM-based combination strategy allows the predictions from the individual JITGPR models to be combined adaptively at each run of prediction.
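A sketch of the variance-based weighting, where the mixture mean serves as the final point prediction (the function name and the example numbers are illustrative):

```python
import numpy as np

def fmm_combine(y_m, var_m, p=1.0):
    """Adaptive weighting: weights proportional to (1/variance)^p,
    normalized to sum to one; returns the mixture prediction and weights."""
    w = (1.0 / np.asarray(var_m)) ** p
    w = w / w.sum()
    return w @ np.asarray(y_m), w

# three base JITGPR predictions with their predictive variances:
# the most confident model (smallest variance) gets the largest weight
y_hat, w = fmm_combine([50.0, 52.0, 58.0], [0.5, 1.0, 4.0], p=1.0)
```

Because the weights are recomputed from the per-query predictive variances, the combination adapts to the query process state without any retraining, which is what distinguishes it from the nonadaptive schemes above.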

Implementation Procedure.
The step-by-step procedure of the proposed MP-EJITGPR soft sensor method for Mooney viscosity prediction is summarized below, and the schematic diagram of this approach is illustrated in Figure 2.

Offline Optimization Phase
(a) Collect the process data of the batch process for model training and validation.
(b) Perform data preprocessing, including outlier detection, data unfolding, mean-centering, and scaling.
(c) Formulate the generation of accurate and diverse JITGPR models as a multiobjective optimization problem (MOP).
(d) Solve the MOP using the EMO approach, i.e., NSGA-II.
(e) Using one of the best-performing Pareto-optimal solutions, construct a set of JITGPR models characterized by heterogeneous similarity measures and diverse input subspaces.
It is worth noting that the computational load of the proposed MP-EJITGPR method is concentrated in the offline optimization phase, especially the NSGA-II-based EMO optimization. However, once the learning configurations for generating the diverse JIT models have been obtained, online prediction for new test samples is fast. This is because, on the one hand, if the similarity measures are defined appropriately, only a small number of samples are selected for online local modeling, which enables fast training of the diverse JIT models for each query; on the other hand, the finite mixture mechanism-based adaptive combination is very efficient because only simple calculations are involved. Therefore, the proposed approach can be applied to provide real-time estimations of Mooney viscosity in an industrial rubber mixing process.

Application to an Industrial Rubber Mixing Process

The effectiveness and superiority of the proposed MP-EJITGPR soft sensor for Mooney viscosity prediction are demonstrated on an industrial rubber mixing process in China. The compared methods are summarized in Table 1 and can be roughly categorized into five classes: (1) global methods, i.e., PLS and GPR; (2) ensemble methods using training data perturbation, i.e., GMMGPR; (3) JITGPR methods using a single similarity measure; (4) EJITGPR methods using similarity perturbation and ensemble learning, i.e., SP-EJITGPR; (5) EJITGPR methods using multimodal perturbation and ensemble learning, i.e., MP-EJITGPR. For the EJIT methods, four types of combination methods are investigated, i.e., the simple averaging rule (SAR), PLS stacking, GPR stacking, and the finite mixture mechanism (FMM). The modeling data are split into three parts: a training set for model learning, a validation set for parameter tuning and EMO optimization, and a testing set for model evaluation.
Moreover, some critical parameters must be predetermined. In detail, the number of principal components for PLS is selected based on the prediction accuracy on the validation set. The local modeling size $l$ and the adjustable parameter $p$ of the FMM weighting are determined by trial and error. In addition, the optimization settings for NSGA-II are as follows: population size $N_{pop} = 100$ and maximum number of generations $N_{gen}^{max} = 100$. To assess the prediction performance of the soft sensors, three indices, namely, the root-mean-square error (RMSE), the relative RMSE (RRMSE), and the coefficient of determination ($R^2$), are used:

$$RMSE = \sqrt{\frac{1}{n_{test}} \sum_{i=1}^{n_{test}} \left(y_i - \hat{y}_i\right)^2},$$

$$RRMSE = \sqrt{\frac{1}{n_{test}} \sum_{i=1}^{n_{test}} \left(\frac{y_i - \hat{y}_i}{y_i}\right)^2} \times 100\%,$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n_{test}} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n_{test}} \left(y_i - \bar{y}\right)^2},$$

where $y_i$ and $\hat{y}_i$ denote the actual and predicted outputs, respectively; $\bar{y}$ represents the mean of the actual outputs; and $n_{test}$ is the number of testing samples. The computer configuration for the experiments is as follows: OS: Windows 10 (64 bit); CPU: Intel(R) Core(TM) i7-6700 (3.4 GHz × 2); RAM: 8 GB; simulation software: MATLAB R2016a. The MATLAB code for GPR regression can be downloaded from http://www.gaussianprocess.org/gpml/code/matlab/doc/.
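The three indices can be computed as in the following sketch (standard definitions, with the RRMSE expressed as a percentage; the function name is illustrative):

```python
import numpy as np

def evaluate(y, y_hat):
    """RMSE, RRMSE (%), and R^2 for a set of test predictions."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    rmse = np.sqrt(((y - y_hat) ** 2).mean())
    rrmse = 100.0 * np.sqrt((((y - y_hat) / y) ** 2).mean())
    r2 = 1.0 - ((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return rmse, rrmse, r2

# illustrative Mooney viscosity values (actual vs. predicted)
rmse, rrmse, r2 = evaluate([50.0, 55.0, 60.0], [51.0, 54.0, 61.0])
```

Smaller RMSE and RRMSE and an R² closer to one indicate better prediction performance, which is the ordering used when reading Table 2.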

Process Description.
Rubber mixing is a crucial step in the rubber and tire industry [7,8]. The industrial rubber mixing process in this study is operated by a Chinese tire company.
The industrial production site is illustrated in Figure 3. The rubber mixing process lasts for 2 min, during which various raw materials are fed into the raw rubber according to the technical formula to produce synthetic rubber, and various complex chemical reactions then take place in an internal mixer. Generally, Mooney viscosity is a crucial index for monitoring the product quality of the rubber mixing process. However, in a practical production process, Mooney viscosity is only measured with a viscometer, with a large delay of 4∼6 h after a batch has been discharged. Consequently, it is challenging to assure the optimal and uniform quality of mixed rubber. Fortunately, soft sensor technology provides the possibility of estimating Mooney viscosity in real time. Thus, we attempt to build a high-performing soft sensor for Mooney viscosity prediction in this study. The process variables used for soft sensor modeling include the temperature in the mixer chamber, motor power, ram pressure, motor speed, and energy.

Prediction Results of Mooney Viscosity.
The modeling data were collected from the DCS system and laboratory analysis and then preprocessed using a simple 3σ rule for outlier detection, batchwise unfolding, mean centering, and unit variance scaling. With a sampling interval of 2 s, a total of 1172 batches were selected from three internal mixers and further divided into three sets: 822 batches as the training set, 175 batches as the validation set, and 175 batches as the testing set. By considering the time instants 0 s, 14 s, 18 s, 22 s, . . ., 118 s, a total of 140 delayed and nondelayed variables were obtained as potential input variables, and the Mooney viscosity was chosen as the output variable.
The prediction results of Mooney viscosity from the different soft sensor methods are presented in Table 2. It is readily observed that PLS yields the poorest prediction performance among these methods in terms of RMSE, RRMSE, and R². This is mainly because PLS cannot effectively handle the nonlinearity of the rubber mixing process. In comparison, the other, nonlinear soft sensor methods achieve much better prediction accuracy than PLS.
Though GPR obtains a significant accuracy improvement, it still produces high prediction errors due to its failure to deal with local process characteristics. Instead of relying on a global model, GMMGPR and the various JITGPR methods employ the local learning philosophy, thus obtaining much better performance than global GPR. Although GMMGPR performs well in this case study, the JITGPR methods are more appealing, providing better prediction performance.
However, the prediction accuracy of the JITGPR methods is highly dependent on the similarity measure definition. As can be seen in Table 2, different similarity measures give different prediction performance, and in real applications it is difficult to determine in advance which similarity measure performs best. Thus, a promising idea is to fully exploit the advantages of multiple similarity measures for JIT learning through ensemble methods. As expected, by introducing ensemble learning, SP-EJITGPR delivers better prediction results than the single similarity measure-based JITGPR methods. It is noteworthy, however, that inappropriate combination methods can lead to performance degradation instead of improvement. Among the compared combination methods, the proposed FMM-based combination achieves a significant performance enhancement, while the simple averaging rule, PLS stacking, and GPR stacking lead to performance degradation. These results reveal that the integration of heterogeneous similarity measures with the FMM-based adaptive combination method allows the prediction accuracy of JIT learning soft sensors to be significantly improved.
Apart from the similarity measure, input variable selection is also critical to guaranteeing the performance of JIT learning. Thus, it is interesting to explore whether the EJITGPR model using only similarity perturbation can be further improved by performing perturbations on the similarity measure and the input variables simultaneously. As expected, when the FMM-based adaptive combination is employed, the EJITGPR model using multimodal perturbation, i.e., MP-EJITGPR (FMM), performs better than the SP-EJITGPR methods. Once again, the simple averaging rule, PLS stacking, and GPR stacking do not function well in this study because they are nonadaptive. The above observations show that the proposed MP-EJITGPR (FMM) soft sensor method is the best among the compared methods. In addition, as illustrated in Figure 4, the superior prediction performance of MP-EJITGPR (FMM) is further verified by the good agreement between the predicted and actual trend plots of Mooney viscosity.
To further investigate the estimation performance of the MP-EJITGPR (FMM) soft sensor, the prediction RMSE values of JITGPR, SP-EJITGPR, and MP-EJITGPR under different local modeling sizes are compared in Figure 5. In most cases, increasing the number of local modeling samples reduces the prediction accuracy of all compared methods. In particular, for this case study, the prediction accuracy of the JITGPR methods and SP-EJITGPR with small local modeling sizes is significantly better than with large local modeling sizes. In comparison, the proposed MP-EJITGPR (FMM) is much less sensitive to the local modeling size than the other methods. Therefore, the proposed approach is more desirable than traditional JIT methods for providing accurate and reliable predictions.
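The JIT workflow underlying these comparisons can be sketched as follows: for each query, the most similar historical samples are retrieved and a local model is fitted on the fly. To keep the example self-contained, the local GPR model is replaced here by a similarity-weighted mean, which is a simplification and not the paper's local learner; `local_size` corresponds to the local modeling size varied in Figure 5:

```python
import math

def euclidean_similarity(x, xq):
    """Similarity from Euclidean distance: s = exp(-d)."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, xq)))
    return math.exp(-d)

def jit_predict(X, y, xq, local_size=5):
    """Just-in-time prediction: select the `local_size` most similar
    historical samples to the query and fit a local model on them
    (a similarity-weighted mean here, standing in for local GPR)."""
    scored = [(euclidean_similarity(x, xq), yi) for x, yi in zip(X, y)]
    scored.sort(key=lambda t: t[0], reverse=True)
    top = scored[:local_size]
    wsum = sum(s for s, _ in top)
    return sum(s * yi for s, yi in top) / wsum
```

Because the local model is rebuilt per query from only `local_size` samples, the choice of that size directly trades off locality against statistical stability, which is the sensitivity examined in Figure 5.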
Compared with traditional JIT learning soft sensors, the outstanding prediction performance of MP-EJITGPR (FMM) is mainly due to the effective cooperation of multimodal perturbation, EMO optimization, and adaptive combination in ensemble construction. On the one hand, the utilization of heterogeneous similarity measures, i.e., the ED, cosine, CWD, and CC similarity metrics, together with input variable selection for constructing subspaces as shown in Figure 6, helps generate accurate and diverse base JITGPR models. On the other hand, the accuracy and diversity objectives of the base JITGPR models can be well balanced by the EMO approach. Additionally, the FMM-based adaptive combination scheme allows the proposed MP-EJITGPR method to accommodate the query process state by dynamically assigning weights to the base JITGPR models, as illustrated in Figure 7. Moreover, the real-time performance of MP-EJITGPR (FMM) for online prediction is analyzed. The average CPU time for each run of prediction under different local modeling sizes is shown in Figure 8. Clearly, the online computational load grows with the number of local modeling samples. However, only a small relevant subset is required for local modeling, and a prediction time of less than 1 s is completely acceptable in practical applications. The obtained application results confirm that the proposed MP-EJITGPR (FMM) soft sensor method outperforms the other traditional JIT soft sensors, implying that it is more suitable for providing accurate predictions of Mooney viscosity in an industrial rubber mixing process.
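The adaptive combination step can be sketched as a finite mixture: each base learner contributes a predicted mean and variance, and query-dependent responsibilities weight the contributions. This is a generic mixture-combination sketch under stated assumptions; the exact responsibility computation of the paper's FMM scheme is not reproduced here, and the mixture-variance line is a standard formula added for illustration:

```python
def fmm_combine(means, variances, responsibilities):
    """Finite-mixture combination of base JIT learner predictions.
    `responsibilities` are nonnegative, query-dependent weights
    (e.g. posterior probabilities that the query belongs to each
    local model); they are normalized before combining."""
    total = sum(responsibilities)
    w = [r / total for r in responsibilities]
    # Mixture mean: responsibility-weighted sum of base predictions
    y_hat = sum(wi * mi for wi, mi in zip(w, means))
    # Mixture variance: within-model variance plus between-model spread
    var = sum(wi * (vi + (mi - y_hat) ** 2)
              for wi, mi, vi in zip(w, means, variances))
    return y_hat, var
```

Because the weights are recomputed for every query, a base learner whose local region matches the current process state dominates the combination, which is what makes the scheme adaptive where simple averaging is not.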

Conclusions
In this paper, a new soft sensor method, MP-EJITGPR, is proposed to facilitate accurate estimation of the Mooney viscosity in an industrial rubber mixing process.
This method enhances the diversity of base JIT learners through a multimodal perturbation mechanism, i.e., perturbing both the similarity measure and the input variables. Moreover, a group of accurate and diverse base JIT learners is generated by employing an EMO approach to explicitly trade off the accuracy and diversity objectives. In addition, a finite mixture mechanism is exploited to combine the base JIT learners adaptively. By integrating multimodal perturbation-based diversity generation, EMO-based generation of base JIT learners, and FMM-based adaptive combination of base learners for EJIT modeling, the proposed MP-EJITGPR method provides a marked improvement in prediction performance over its conventional counterparts in nonlinear process modeling. The superiority and effectiveness of the proposed approach are demonstrated through Mooney viscosity prediction in an industrial rubber mixing process.
Beyond the presented case study, the proposed method has the potential to address other nonlinear modeling problems in the process industry. In future research, more efforts are encouraged to extend the library of heterogeneous similarity measures and to improve the diversity generation mechanism for building high-performance JIT soft sensors. Moreover, although this paper focuses mainly on manipulating input variables to build diverse input spaces through an evolutionary multiobjective optimization approach, exploiting feature extraction by deep learning and making use of unlabeled data by semisupervised learning to improve the prediction performance of soft sensors are also of interest [47]. These will be investigated in the future.

Abbreviations

CC: Correlation coefficient
CWD: Covariance weighted distance
ED: Euclidean distance
EJIT: Ensemble just-in-time learning
EJITGPR: Ensemble just-in-time learning Gaussian process regression
EMO: Evolutionary multiobjective optimization
FMM: Finite mixture mechanism
GPR: Gaussian process regression
JIT: Just-in-time learning
JITGPR: Just-in-time learning Gaussian process regression
MOP: Multiobjective optimization problem
MP-EJITGPR: Multimodal perturbation-based ensemble just-in-time learning Gaussian process regression
NSGA-II: Nondominated sorting genetic algorithm II
SAR: Simple averaging rule
SP-EJITGPR: Similarity perturbation-based ensemble just-in-time learning Gaussian process regression
Data Availability

The industrial data set of the rubber mixing process involved in the present study cannot be disclosed for reasons of commercial confidentiality.

Conflicts of Interest
The authors declare no financial conflicts of interest.