Integrated Multiscale Latent Variable Regression and Application to Distillation

Proper control of distillation columns requires estimating some key variables that are challenging to measure online (such as compositions), which are usually estimated using inferential models. Commonly used inferential models include latent variable regression (LVR) techniques, such as principal component regression (PCR), partial least squares (PLS), and regularized canonical correlation analysis (RCCA). Unfortunately, measured practical data are usually contaminated with errors, which degrade the prediction abilities of inferential models. Therefore, noisy measurements need to be filtered to enhance the prediction accuracy of these models. Multiscale filtering has been shown to be a powerful feature extraction tool. In this work, the advantages of multiscale filtering are utilized to enhance the prediction accuracy of LVR models by developing an integrated multiscale LVR (IMSLVR) modeling algorithm that integrates modeling and feature extraction. The idea behind the IMSLVR modeling algorithm is to filter the process data at different decomposition levels, model the filtered data from each level, and then select the LVR model that optimizes a model selection criterion. The performance of the developed IMSLVR algorithm is illustrated using three examples, one using synthetic data, one using simulated distillation column data, and one using experimental packed bed distillation column data. All examples clearly demonstrate the effectiveness of the IMSLVR algorithm over the conventional methods.


Introduction
In the chemical process industry, models play a key role in various process operations, such as process control, monitoring, and scheduling. For example, the control of a distillation column requires the availability of the distillate and bottom stream compositions. Measuring compositions online is very challenging and costly; therefore, these compositions are usually estimated (using inferential models) from other process variables, which are easier to measure, such as temperature, pressure, flow rates, heat duties, and others. However, there are several challenges that can affect the accuracy of these inferential models, which include the presence of collinearity (or redundancy among the variables) and the presence of measurement noise in the data.
The presence of collinearity, which is due to the large number of variables associated with inferential models, increases the uncertainty about the estimated model parameters and degrades the model's prediction accuracy. Latent variable regression (LVR), which is a commonly used framework in inferential modeling, deals with collinearity among the variables by transforming the variables so that most of the data information is captured in a smaller number of variables that can be used to construct the model. In fact, LVR models perform regression on a small number of latent variables that are linear combinations of the original variables. This generally results in well-conditioned models and good predictions [1]. LVR model estimation techniques include principal component regression (PCR) [2,3], partial least squares (PLS) [2,4,5], and regularized canonical correlation analysis (RCCA) [6][7][8][9]. PCR is performed in two main steps: transform the input variables using principal component analysis (PCA), and then construct a simple model relating the output to the transformed inputs (principal components) using ordinary least squares (OLS). Thus, PCR completely ignores the output(s) when determining the principal components. Partial least squares (PLS), on the other hand, transforms the variables taking the input-output relationship into account by maximizing the covariance between the output and the transformed input variables. That is why PLS has been widely utilized in practice, such as in the chemical industry to estimate distillation column compositions [10][11][12][13]. Other LVR model estimation methods include regularized canonical correlation analysis (RCCA). RCCA is an extension of another estimation technique called canonical correlation analysis (CCA), which determines the transformed input variables by maximizing the correlation between the transformed inputs and the output(s) [6,14]. Thus, CCA also takes the input-output relationship into account when transforming the variables. CCA, however, requires computing the inverse of the input covariance matrix. Thus, in the case of collinearity among the variables or rank deficiency, this matrix is regularized to enhance the conditioning of the estimated model, and the resulting method is referred to as regularized CCA (RCCA). Since the covariance and correlation of the transformed variables are related, RCCA reduces to PLS under certain assumptions.
The other challenge in constructing inferential models is the presence of measurement noise in the data. Measured process data are usually contaminated by random and gross errors due to normal fluctuations, disturbances, instrument degradation, and human errors. Such errors mask the important features in the data and degrade the prediction ability of the estimated inferential model. Therefore, measurement noise needs to be filtered to improve the model's prediction. Unfortunately, measured data are usually multiscale in nature, which means that they contain features and noise with varying contributions over both time and frequency [15]. For example, an abrupt change in the data spans a wide range in the frequency domain and a small range in the time domain, while a slow change spans a wide range in the time domain and a small range in the frequency domain. Filtering such data using conventional low pass filters, such as the mean filter (MF) or exponentially weighted moving average (EWMA) filter, does not usually provide a good noise-feature separation because these filtering techniques classify noise as high frequency features and filter the data by removing all features having frequencies higher than a defined threshold. Thus, modeling multiscale data requires developing multiscale modeling techniques that can take this multiscale nature of the data into account.
Many investigators have used multiscale techniques to improve the accuracy of estimated empirical models [16][17][18][19][20][21][22][23][24][25][26][27]. For example, in [17], the authors used multiscale representation of data to design wavelet prefilters for modeling purposes. In [16], on the other hand, the author discussed the advantages of using multiscale representation in empirical modeling, and in [18], he developed a multiscale PCA modeling technique and used it in process monitoring. Also, in [19,20,23], the authors used multiscale representation to reduce collinearity and shrink the large variations in FIR model parameters. Furthermore, in [21,24], multiscale representation was utilized to enhance the prediction and parsimony of fuzzy and ARX models, respectively. In [22], the author extended the classic single-scale system identification tools to the description of multiscale systems. In [25], the authors developed a multiscale latent variable regression (MSLVR) modeling algorithm by decomposing the input-output data at multiple scales using wavelet and scaling functions and then constructing multiple latent variable regression models at multiple scales using the scaled signal approximations of the data. Note that in this MSLVR approach [25], the LVR models are estimated using only the scaled signals and thus neglect the effect of any significant wavelet coefficients on the model input-output relationship. Later, the same authors extended the same principle to construct nonlinear models using multiscale representation [26]. Finally, in [27], wavelets were used as modulating functions for control-related system identification. Unfortunately, the advantages of multiscale filtering have not been fully utilized to enhance the prediction accuracy of the general class of latent variable regression (LVR) models (e.g., PCR, PLS, and RCCA), which is the focus of this paper.
The objective of this paper is to utilize wavelet-based multiscale filtering to enhance the prediction accuracy of LVR models by developing a modeling technique that integrates multiscale filtering and LVR model estimation. The sought technique should provide improvement over conventional LVR methods as well as those obtained by prefiltering the process data (using low pass or multiscale filters).
The remainder of this paper is organized as follows. In Section 2, a statement of the problem addressed in this work is presented, followed by descriptions of several commonly used LVR model estimation techniques in Section 3. In Section 4, brief descriptions of low pass and multiscale filtering techniques are presented. Then, in Section 5, the advantages of utilizing multiscale filtering in empirical modeling are discussed, followed by a description of an algorithm, called integrated multiscale LVR modeling (IMSLVR), that integrates multiscale filtering and LVR modeling. Then, in Section 6, the performance of the developed IMSLVR modeling technique is assessed through three examples, two simulated examples using synthetic data and distillation column data, and one experimental example using practical packed bed distillation column data. Finally, concluding remarks are presented in Section 7.

Problem Statement
This work addresses the problem of enhancing the prediction accuracy of linear inferential models (that can be used to estimate or infer key process variables that are difficult or expensive to measure from more easily measured ones) using multiscale filtering. All variables, inputs and outputs, are assumed to be contaminated with additive zero-mean Gaussian noise. Also, it is assumed that there exists a strong collinearity among the variables. Thus, given noisy measurements of the input and output data, it is desired to construct a linear model with enhanced prediction ability (compared to existing LVR modeling methods) using multiscale data filtering. A general form of a linear inferential model can be expressed as
$$ y = X b + \epsilon, \tag{1} $$
where $X \in \mathbb{R}^{n \times m}$ is the input matrix, $y \in \mathbb{R}^{n \times 1}$ is the output vector, $b \in \mathbb{R}^{m \times 1}$ is the unknown model parameter vector, and $\epsilon \in \mathbb{R}^{n \times 1}$ is the model error. Multiscale filtering has great feature extraction properties as will be discussed in Sections 4 and 5. However, modeling prefiltered data may result in the elimination of model-relevant information from the filtered input-output data. Therefore, the developed multiscale modeling technique is expected to integrate multiscale filtering and LVR model estimation to enhance the prediction ability of the estimated LVR model. Some of the conventional LVR modeling methods are described next.

Latent Variable Regression (LVR) Modeling
One main challenge in developing inferential models is the presence of collinearity among the large number of process variables associated with these models, which affects their prediction ability. Multivariate statistical projection methods such as PCR, PLS, and RCCA can be utilized to deal with this issue by performing regression on a smaller number of transformed variables, called latent variables (or principal components), which are linear combinations of the original variables. This approach, which is called latent variable regression (LVR), generally results in well-conditioned parameter estimates and good model predictions [1].
In this section, descriptions of some of the well-known LVR modeling techniques, which include PCR, PLS, and RCCA, are presented. However, before we describe these techniques, let us introduce some definitions. Let the matrix $D$ be defined as the augmented scaled input and output data, that is, $D = [X \;\; y]$. Note that scaling the data is performed by making each variable (input and output) zero-mean with a unit variance. Then, the covariance of $D$ can be defined as follows [9]:
$$ C = \frac{1}{n-1} D^T D = \begin{bmatrix} C_{XX} & C_{Xy} \\ C_{yX} & C_{yy} \end{bmatrix}, \tag{2} $$
where the matrices $C_{XX}$, $C_{Xy}$, $C_{yX}$, and $C_{yy}$ are of dimensions $(m \times m)$, $(m \times 1)$, $(1 \times m)$, and $(1 \times 1)$, respectively.
Since the latent variable model will be developed using transformed (latent) variables, let us define the transformed inputs as follows:
$$ z_i = X a_i, \tag{3} $$
where $z_i$ is the $i$th latent input variable ($i = 1, \ldots, m$), and $a_i$ is the $i$th input loading vector, which is of dimension $(m \times 1)$.

Principal Component Regression (PCR)
PCR accounts for collinearity in the input variables by reducing their dimension using principal component analysis (PCA), which utilizes singular value decomposition (SVD) to compute the latent variables or principal components. Then, it constructs a simple linear model between the latent variables and the output using ordinary least squares (OLS) regression [2,3]. Therefore, PCR can be formulated as two consecutive estimation problems. First, the loading vectors are estimated by maximizing the variance of the estimated principal components as follows:
$$ \hat{a}_i = \arg\max_{a_i} \operatorname{var}(z_i) \quad (i = 1, \ldots, m), \quad \text{s.t. } a_i^T a_i = 1, \; z_i = X a_i, \tag{4} $$
which (because the data are mean centered) can also be expressed in terms of the input covariance matrix $C_{XX}$ as follows:
$$ \hat{a}_i = \arg\max_{a_i} a_i^T C_{XX} a_i \quad (i = 1, \ldots, m), \quad \text{s.t. } a_i^T a_i = 1. \tag{5} $$
The solution of the optimization problem (5) can be obtained using the method of Lagrange multipliers, which results in the following eigenvalue problem [3,28]:
$$ C_{XX} \hat{a}_i = \lambda_i \hat{a}_i, \tag{6} $$
which means that the estimated loading vectors are the eigenvectors of the matrix $C_{XX}$. Secondly, after the principal components (PCs) are computed, a subset (or all) of these PCs (which correspond to the largest eigenvalues) are used to construct a simple linear model (that relates these PCs to the output) using OLS. Let the subset of PCs used to construct the model be defined as $Z = [z_1 \cdots z_p]$, where $p \le m$; then the model parameters relating these PCs to the output can be estimated using the following optimization problem:
$$ \hat{\beta} = \arg\min_{\beta} \left\| Z \beta - y \right\|_2^2, \tag{7} $$
which has the following closed-form solution:
$$ \hat{\beta} = \left( Z^T Z \right)^{-1} Z^T y. \tag{8} $$
Note that if all the estimated principal components are used in constructing the inferential model (i.e., $p = m$), then PCR reduces to OLS. Note also that all principal components in PCR are estimated at the same time (using (6)) and without taking the model output into account. Other methods that take the input-output relationship into consideration when estimating the principal components include partial least squares (PLS) and regularized canonical correlation analysis (RCCA), which are presented next.
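The two PCR steps described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `pcr_fit`, the synthetic collinear data, and the fixed number of retained components are all assumptions made for the example.

```python
import numpy as np

def pcr_fit(X, y, p):
    """PCR sketch. X (n x m) and y (n,) are assumed already scaled to zero mean."""
    n = X.shape[0]
    C_xx = X.T @ X / (n - 1)                  # input covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C_xx)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:p]     # indices of the p largest eigenvalues
    A = eigvecs[:, order]                     # loading vectors (m x p)
    Z = X @ A                                 # retained principal components
    beta = np.linalg.solve(Z.T @ Z, Z.T @ y)  # OLS on the retained PCs
    return A @ beta                           # regression vector in the input space

# Hypothetical collinear data: six inputs driven by two latent signals.
rng = np.random.default_rng(0)
t = rng.standard_normal((200, 2))
X = t @ rng.standard_normal((2, 6)) + 0.05 * rng.standard_normal((200, 6))
y = t[:, 0] - 0.5 * t[:, 1] + 0.05 * rng.standard_normal(200)
X = (X - X.mean(0)) / X.std(0, ddof=1)
y = (y - y.mean()) / y.std(ddof=1)

b_pcr = pcr_fit(X, y, p=2)                     # two retained PCs
b_full = pcr_fit(X, y, p=6)                    # all PCs retained
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]   # plain OLS for comparison
```

With all six components retained, `b_full` should agree with `b_ols` up to round-off, which mirrors the remark that PCR reduces to OLS when every principal component is used.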

Partial Least Squares (PLS)
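PLS computes the input loading vectors, $a_i$, by maximizing the covariance between the estimated latent variable $\hat{z}_i$ and model output, $y$, that is [14,29],
$$ \hat{a}_i = \arg\max_{a_i} \operatorname{cov}(z_i, y), \quad \text{s.t. } a_i^T a_i = 1, \; z_i = X a_i, \tag{9} $$
where $i = 1, \ldots, p$, $p \le m$. Since $z_i = X a_i$ and the data are mean centered, (9) can also be expressed in terms of the covariance matrix $C_{Xy}$ as follows:
$$ \hat{a}_i = \arg\max_{a_i} a_i^T C_{Xy}, \quad \text{s.t. } a_i^T a_i = 1. \tag{10} $$
The solution of the optimization problem (10) can be obtained using the method of Lagrange multipliers, which leads to the following eigenvalue problem [3,28]:
$$ C_{Xy} C_{yX} \hat{a}_i = \lambda_i^2 \hat{a}_i, \tag{11} $$
which means that the estimated loading vectors are the eigenvectors of the matrix $(C_{Xy} C_{yX})$.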
Note that PLS utilizes an iterative algorithm [14,30] to estimate the latent variables used in the model, where one latent variable or principal component is added iteratively to the model.After the inclusion of a latent variable, the input and output residuals are computed and the process is repeated using the residual data until a cross-validation error criterion is minimized [2,3,30,31].
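As an illustration of the iterative procedure, the sketch below implements a standard single-output PLS (often called PLS1) with deflation of the input and output residuals. It is a hedged example only: the name `pls1_fit` and the test data are hypothetical, and the number of latent variables is fixed by hand here instead of being chosen by cross-validation as the text describes.

```python
import numpy as np

def pls1_fit(X, y, p):
    """Single-output PLS sketch with p latent variables; X and y are zero mean."""
    X_res, y_res = X.copy(), y.copy()
    m = X.shape[1]
    W, P, q = np.zeros((m, p)), np.zeros((m, p)), np.zeros(p)
    for i in range(p):
        w = X_res.T @ y_res                     # direction of maximum covariance
        w /= np.linalg.norm(w)                  # unit-norm loading vector
        t = X_res @ w                           # latent variable (score)
        tt = t @ t
        P[:, i] = X_res.T @ t / tt              # input regression on the score
        q[i] = y_res @ t / tt                   # output regression on the score
        W[:, i] = w
        X_res = X_res - np.outer(t, P[:, i])    # deflate the input residual
        y_res = y_res - q[i] * t                # deflate the output residual
    b = W @ np.linalg.solve(P.T @ W, q)         # regression vector in the input space
    return b, W

rng = np.random.default_rng(1)
t = rng.standard_normal((150, 2))
X = t @ rng.standard_normal((2, 5)) + 0.1 * rng.standard_normal((150, 5))
y = t[:, 0] + 0.3 * t[:, 1] + 0.1 * rng.standard_normal(150)
X, y = X - X.mean(0), y - y.mean()

b1, W1 = pls1_fit(X, y, p=1)
b2, W2 = pls1_fit(X, y, p=2)
mse1 = np.mean((y - X @ b1) ** 2)
mse2 = np.mean((y - X @ b2) ** 2)

# The first loading vector should match the dominant eigenvector of C_Xy C_yX.
C_xy = X.T @ y / (len(y) - 1)
_, vecs = np.linalg.eigh(np.outer(C_xy, C_xy))
alignment = abs(vecs[:, -1] @ W1[:, 0])
```

For a single output, the matrix in the eigenvalue problem has rank one, so its dominant eigenvector is simply the normalized input-output covariance vector; `alignment` should therefore be very close to 1.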

Regularized Canonical Correlation Analysis (RCCA).
RCCA is an extension of a method called canonical correlation analysis (CCA), which was first proposed by Hotelling [6]. CCA reduces the dimension of the model input space by exploiting the correlation among the input and output variables. The assumption behind CCA is that the input and output data contain some joint information that can be represented by the correlation between these variables. Thus, CCA computes the model loading vectors by maximizing the correlation between the estimated principal components and the model output [6][7][8][9], that is,
$$ \hat{a}_i = \arg\max_{a_i} \operatorname{corr}(z_i, y), \quad \text{s.t. } z_i = X a_i, \tag{12} $$
where $i = 1, \ldots, p$, $p \le m$. Since the correlation between two variables is the covariance divided by the product of the standard deviations of the individual variables, (12) can be written in terms of the covariance between $z_i$ and $y$ subject to the following two additional constraints: $a_i^T C_{XX} a_i = 1$ and $C_{yy} = 1$. Thus, the CCA formulation can be expressed as follows:
$$ \hat{a}_i = \arg\max_{a_i} \operatorname{cov}(z_i, y), \quad \text{s.t. } z_i = X a_i, \; a_i^T C_{XX} a_i = 1. \tag{13} $$
Note that the constraint ($C_{yy} = 1$) is omitted from (13) because it is satisfied by scaling the data to have zero-mean and unit variance as described in Section 3.
Since the data are mean centered, (13) can be written in terms of the covariance matrix $C_{Xy}$ as follows:
$$ \hat{a}_i = \arg\max_{a_i} a_i^T C_{Xy}, \quad \text{s.t. } a_i^T C_{XX} a_i = 1. \tag{14} $$
The solution of the optimization problem (14) can be obtained using the method of Lagrange multipliers, which leads to the following eigenvalue problem [14,28]:
$$ C_{XX}^{-1} C_{Xy} C_{yX} \hat{a}_i = \lambda_i^2 \hat{a}_i, \tag{15} $$
which means that the estimated loading vector is the eigenvector of the matrix $C_{XX}^{-1} C_{Xy} C_{yX}$. Equation (15) shows that CCA requires inverting the matrix $C_{XX}$ to obtain the loading vector, $a_i$. In the case of collinearity in the model input space, the matrix $C_{XX}$ becomes nearly singular, which results in poor estimation of the loading vectors, and thus a poor model. Therefore, a regularized version of CCA (called RCCA) has been developed to account for this drawback of CCA [14]. The formulation of RCCA can be expressed as follows:
$$ \hat{a}_i = \arg\max_{a_i} a_i^T C_{Xy}, \quad \text{s.t. } a_i^T \left[ (1 - \tau_a) C_{XX} + \tau_a I \right] a_i = 1. \tag{16} $$
The solution of the optimization problem (16) can be obtained using the method of Lagrange multipliers, which leads to the following eigenvalue problem [14]:
$$ \left[ (1 - \tau_a) C_{XX} + \tau_a I \right]^{-1} C_{Xy} C_{yX} \hat{a}_i = \lambda_i^2 \hat{a}_i, \tag{17} $$
which means that the estimated loading vectors are the eigenvectors of the matrix $\left[ (1 - \tau_a) C_{XX} + \tau_a I \right]^{-1} C_{Xy} C_{yX}$. Note from (17) that RCCA deals with possible collinearity in the model input space by inverting a weighted sum of the matrix $C_{XX}$ and the identity matrix, that is, $[(1 - \tau_a) C_{XX} + \tau_a I]$, instead of inverting the matrix $C_{XX}$ itself. However, this requires knowledge of the weighting or regularization parameter $\tau_a$. We know, however, that when $\tau_a = 0$, the RCCA solution (17) reduces to the CCA solution (15), and when $\tau_a = 1$, the RCCA solution (17) reduces to the PLS solution (11) since $C_{yy}$ is a scalar.
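The two limiting cases of RCCA can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' code: it handles a single output, where the eigenvalue problem has a closed-form solution (the regularized inverse applied to the input-output covariance), and the names `rcca_loading` and `tau` (standing for the regularization parameter) as well as the test data are hypothetical.

```python
import numpy as np

def rcca_loading(X, y, tau):
    """First RCCA loading vector for one output; X and y are zero mean."""
    n, m = X.shape
    C_xx = X.T @ X / (n - 1)
    C_xy = X.T @ y / (n - 1)
    M = (1.0 - tau) * C_xx + tau * np.eye(m)   # regularized input covariance
    a = np.linalg.solve(M, C_xy)               # eigenvector of M^{-1} C_xy C_yx
    return a / np.sqrt(a @ M @ a)              # enforce the constraint a' M a = 1

def cosine(u, v):
    return abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

rng = np.random.default_rng(2)
t = rng.standard_normal((200, 3))
X = t @ rng.standard_normal((3, 5)) + 0.1 * rng.standard_normal((200, 5))
y = t @ np.array([1.0, -0.5, 0.2]) + 0.1 * rng.standard_normal(200)
X = (X - X.mean(0)) / X.std(0, ddof=1)
y = (y - y.mean()) / y.std(ddof=1)

C_xy = X.T @ y / (len(y) - 1)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]

cos_pls = cosine(rcca_loading(X, y, 1.0), C_xy)    # tau = 1: PLS direction
cos_cca = cosine(rcca_loading(X, y, 0.0), b_ols)   # tau = 0: CCA direction
```

Both cosines should be close to 1: with `tau = 1` the loading vector points along the input-output covariance (the PLS direction), and with `tau = 0` it points along the unregularized CCA direction, which for a single output coincides with the OLS coefficient vector.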

3.3.1. Optimizing the RCCA Regularization Parameter.
The above discussion shows that depending on the value of the regularization parameter $\tau_a$, where $0 \le \tau_a \le 1$, RCCA provides a solution that converges to CCA or PLS at the two end points, 0 or 1, respectively. In [14], it has been shown that RCCA can provide better results than PLS for some intermediate values of $\tau_a$ between 0 and 1. Therefore, in this section, we propose to optimize the performance of RCCA by optimizing its regularization parameter by solving the following nested optimization problem to find the optimum value of $\tau_a$:
$$ \hat{\tau}_a = \arg\min_{\tau_a} \left( y - \hat{y} \right)^T \left( y - \hat{y} \right), \quad \text{s.t. } \hat{y} = \text{RCCA model prediction}. \tag{18} $$
The inner loop of the optimization problem shown in (18) solves for the RCCA model prediction given the value of the regularization parameter $\tau_a$, and the outer loop selects the value of $\tau_a$ that provides the least cross-validation mean square error using unseen testing data.
Note that RCCA solves for the latent variable regression model in an iterative fashion similar to PLS, where one latent variable is estimated in each iteration [14].Then, the contributions of the latent variable and its corresponding model prediction are subtracted from the input and output data, and the process is repeated using the residual data until an optimum number of principal components or latent variables are used according to some cross-validation error criterion.
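The nested optimization can be approximated by a simple grid search, as in the sketch below. To keep the example short it makes several assumptions that are not part of the original method: only one latent variable is used (no deflation loop), the outer loop is a coarse grid over the regularization parameter (called `tau` here), and the data and the function names `rcca_one_lv` and `select_tau` are hypothetical.

```python
import numpy as np

def rcca_one_lv(X, y, tau):
    """Inner problem: one-latent-variable RCCA regression vector (zero-mean data)."""
    n, m = X.shape
    C_xx = X.T @ X / (n - 1)
    C_xy = X.T @ y / (n - 1)
    a = np.linalg.solve((1.0 - tau) * C_xx + tau * np.eye(m), C_xy)
    z = X @ a                          # latent variable
    return a * (z @ y) / (z @ z)       # regress y on z, map back to input space

def select_tau(X_tr, y_tr, X_te, y_te, taus):
    """Outer problem: pick the tau with the smallest testing MSE."""
    mse = np.array([np.mean((y_te - X_te @ rcca_one_lv(X_tr, y_tr, tau)) ** 2)
                    for tau in taus])
    return taus[int(np.argmin(mse))], mse

rng = np.random.default_rng(3)
t = rng.standard_normal((300, 2))
X = t @ rng.standard_normal((2, 6)) + 0.2 * rng.standard_normal((300, 6))
y = t[:, 0] + 0.5 * t[:, 1] + 0.2 * rng.standard_normal(300)

# Scale both sets with the training statistics (an assumption of this sketch).
mx, sx = X[:200].mean(0), X[:200].std(0, ddof=1)
my, sy = y[:200].mean(), y[:200].std(ddof=1)
X_tr, X_te = (X[:200] - mx) / sx, (X[200:] - mx) / sx
y_tr, y_te = (y[:200] - my) / sy, (y[200:] - my) / sy

taus = np.linspace(0.0, 1.0, 11)
best_tau, mse = select_tau(X_tr, y_tr, X_te, y_te, taus)
```

Because the grid contains both end points, the selected model can never be worse on the testing data than the one-component CCA (`tau = 0`) or PLS (`tau = 1`) solutions evaluated in the same way.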

Data Filtering
In this section, brief descriptions of some of the filtering techniques which will be used later to enhance the prediction of LVR models are presented. These techniques include linear (or low pass) as well as multiscale filtering techniques.
4.1. Linear Data Filtering.
Linear filtering techniques filter the data by computing a weighted sum of previous measurements in a window of finite or infinite length and are called finite impulse response (FIR) and infinite impulse response (IIR) filters, respectively. A linear filter can be written as follows:
$$ \hat{y}_k = \sum_{i=0}^{N-1} w_i \, y_{k-i}, $$
where $\sum_i w_i = 1$, and $N$ is the filter length. Well-known FIR and IIR filters include the mean filter (MF) and the exponentially weighted moving average (EWMA) filter, respectively. The mean filter uses equal weights, that is, $w_i = 1/N$, while the EWMA filter averages all the previous measurements. The EWMA filter can also be implemented recursively as follows:
$$ \hat{y}_k = \alpha y_k + (1 - \alpha) \hat{y}_{k-1}, $$
where $y_k$ and $\hat{y}_k$ are the measured and filtered data samples at time step $(k)$. The parameter $\alpha$ is an adjustable smoothing parameter lying between 0 and 1, where a value of 1 corresponds to no filtering and a value of zero corresponds to keeping only the first measured point. A more detailed discussion of different types of filters is presented in [32].
In linear filtering, the basis functions representing raw measured data have a temporal localization equal to the sampling interval. This means that linear filters are single scale in nature since all the basis functions have the same fixed time-frequency localization. Consequently, these methods face a tradeoff between accurate representation of temporally localized changes and efficient removal of temporally global noise [33]. Therefore, simultaneous noise removal and accurate feature representation of measured signals containing multiscale features cannot be effectively achieved by single-scale filtering methods [33]. Enhanced denoising can be achieved using multiscale filtering as will be described next.
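The two linear filters discussed above can be sketched as follows. The function names and the handling of the first few samples of the mean filter (a shorter window until enough past measurements are available) are assumptions of this example, not details taken from the source.

```python
import numpy as np

def mean_filter(y, N):
    """Causal mean filter: average of the current and previous N-1 samples."""
    y_hat = np.empty(len(y))
    for k in range(len(y)):
        start = max(0, k - N + 1)          # shorter window near the start
        y_hat[k] = np.mean(y[start:k + 1])
    return y_hat

def ewma_filter(y, alpha):
    """Recursive EWMA filter with smoothing parameter alpha in [0, 1]."""
    y_hat = np.empty(len(y))
    y_hat[0] = y[0]                        # initialize with the first measurement
    for k in range(1, len(y)):
        y_hat[k] = alpha * y[k] + (1.0 - alpha) * y_hat[k - 1]
    return y_hat

rng = np.random.default_rng(4)
step = np.concatenate([np.zeros(100), np.ones(100)])   # abrupt change
noisy = step + 0.2 * rng.standard_normal(200)

smooth_mf = mean_filter(noisy, N=8)
smooth_ewma = ewma_filter(noisy, alpha=0.2)
```

Plotting `smooth_mf` and `smooth_ewma` against `step` shows the single-scale tradeoff: stronger smoothing (larger `N`, smaller `alpha`) removes more noise but also blurs and delays the abrupt change.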

Multiscale Data Filtering.
In this section, a brief description of multiscale filtering is presented.However, since multiscale filtering relies on multiscale representation of data using wavelets and scaling functions, a brief introduction to multiscale representation is presented first.

Multiscale Representation of Data.
Any square-integrable signal (or data vector) can be represented at multiple scales by expressing the signal as a superposition of wavelets and scaling functions, as shown in Figure 1. The signals in Figures 1(b), 1(d), and 1(f) are at increasingly coarser scales compared to the original signal shown in Figure 1(a). These scaled signals are determined by filtering the data using a low pass filter of length $r$, $h_f = [h_1, h_2, \ldots, h_r]$, which is equivalent to projecting the original signal on a set of orthonormal scaling functions of the form
$$ \phi_{jk}(t) = \sqrt{2^{-j}} \, \phi\left( 2^{-j} t - k \right). $$
On the other hand, the signals in Figures 1(c), 1(e), and 1(g), which are called the detail signals, capture the details between any scaled signal and the scaled signal at the finer scale. These detail signals are determined by projecting the signal on a set of wavelet basis functions of the form
$$ \psi_{jk}(t) = \sqrt{2^{-j}} \, \psi\left( 2^{-j} t - k \right), $$
or equivalently by filtering the scaled signal at the finer scale using a high pass filter of length $r$, $g_f = [g_1, g_2, \ldots, g_r]$, that is derived from the wavelet basis functions. Therefore, the original signal can be represented as the sum of all detail signals at all scales and the scaled signal at the coarsest scale as follows:
$$ x(t) = \sum_{k=1}^{n 2^{-J}} a_{Jk} \, \phi_{Jk}(t) + \sum_{j=1}^{J} \sum_{k=1}^{n 2^{-j}} d_{jk} \, \psi_{jk}(t), $$
where $j$, $k$, $J$, and $n$ are the dilation parameter, translation parameter, maximum number of scales (or decomposition depth), and the length of the original signal, respectively [27][34][35][36].
Fast wavelet transform algorithms with $O(n)$ complexity for a discrete signal of dyadic length have been developed [37]. For example, the scaling function and wavelet coefficients at a particular scale $(j)$, $a_j$ and $d_j$, can be computed in a compact fashion by multiplying the scaling coefficient vector at the finer scale, $a_{j-1}$, by the matrices $H_j$ and $G_j$, respectively, that is,
$$ a_j = H_j a_{j-1}, \qquad d_j = G_j a_{j-1}, $$
where $H_j$ and $G_j$ are built from the coefficients of the low pass and high pass filters, $h_f$ and $g_f$, respectively. Note that the length of the scaled and detail signals decreases dyadically at coarser resolutions (higher $j$). In other words, the length of the scaled signal at scale $(j)$ is half the length of the scaled signal at the finer scale $(j - 1)$. This is due to downsampling, which is used in the discrete wavelet transform.
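The matrix form of one decomposition step can be made concrete with the Haar wavelet, whose low pass and high pass filters have only two coefficients each. The sketch below is an assumption-laden illustration: the Haar filter is chosen for brevity (the examples in this paper use a Daubechies filter), the downsampling is built into rectangular matrices, and the explicit matrix products cost more than the fast transform mentioned above.

```python
import numpy as np

def haar_matrices(n):
    """Return (H, G), each of shape (n//2, n), for an even signal length n."""
    s = 1.0 / np.sqrt(2.0)
    H = np.zeros((n // 2, n))
    G = np.zeros((n // 2, n))
    for k in range(n // 2):
        H[k, 2 * k:2 * k + 2] = [s, s]     # low pass (scaling) filter, shifted by 2
        G[k, 2 * k:2 * k + 2] = [s, -s]    # high pass (wavelet) filter, shifted by 2
    return H, G

def decompose(x, J):
    """Scaled signal at depth J plus the detail signals at scales 1..J."""
    a = np.asarray(x, dtype=float)
    details = []
    for _ in range(J):
        H, G = haar_matrices(len(a))
        details.append(G @ a)              # wavelet coefficients at this scale
        a = H @ a                          # scaling coefficients at this scale
    return a, details

rng = np.random.default_rng(5)
x = rng.standard_normal(16)
a3, details = decompose(x, J=3)

H, G = haar_matrices(16)
x_back = H.T @ (H @ x) + G.T @ (G @ x)     # one-level perfect reconstruction
```

The lengths halve at each scale (8, 4, and 2 detail coefficients here), and because the Haar basis is orthonormal the signal energy is preserved across the coefficients.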

Multiscale Data Filtering Algorithm.
Multiscale filtering using wavelets is based on the observation that random errors in a signal are present over all wavelet coefficients while deterministic changes get captured in a small number of relatively large coefficients [16][38][39][40][41]. Thus, stationary Gaussian noise may be removed by a three-step method [40].
(i) Transform the noisy signal into the time-frequency domain by decomposing the signal on a selected set of orthonormal wavelet basis functions.
(ii) Threshold the wavelet coefficients by suppressing any coefficients smaller than a selected threshold value.
(iii) Transform the thresholded coefficients back into the original time domain.
Donoho and coworkers have studied the statistical properties of wavelet thresholding and have shown that for a noisy signal of length $n$, the filtered signal will have an error within $O(\log n)$ of the error between the noise-free signal and the signal filtered with a priori knowledge of the smoothness of the underlying signal [39].
Selecting the proper value of the threshold is a critical step in this filtering process, and several methods have been devised. For good visual quality of the filtered signal, the Visushrink method determines the threshold as [42]
$$ t_j = \sigma_j \sqrt{2 \log n}, $$
where $n$ is the signal length and $\sigma_j$ is the standard deviation of the errors at scale $j$, which can be estimated from the wavelet coefficients at that scale using the following relation:
$$ \sigma_j = \frac{1}{0.6745} \operatorname{median}\left\{ \left| d_{jk} \right| \right\}. $$
Other methods for determining the value of the threshold are described in [43].
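The three filtering steps and the Visushrink threshold can be combined in a short sketch. Several choices below are assumptions made for illustration: the Haar wavelet is used instead of a longer filter, coefficients below the threshold are set to zero (hard thresholding), the natural logarithm is used in the threshold, and the test signal is a hypothetical piecewise-constant profile.

```python
import numpy as np

def haar_step(a):
    """One decomposition level: (scaling coefficients, wavelet coefficients)."""
    s = np.sqrt(2.0)
    return (a[0::2] + a[1::2]) / s, (a[0::2] - a[1::2]) / s

def haar_inverse_step(a, d):
    """Invert haar_step, doubling the signal length."""
    s = np.sqrt(2.0)
    out = np.empty(2 * len(a))
    out[0::2] = (a + d) / s
    out[1::2] = (a - d) / s
    return out

def multiscale_filter(x, J):
    """Decompose to depth J, threshold the details, and reconstruct."""
    n = len(x)
    a = np.asarray(x, dtype=float)
    kept = []
    for _ in range(J):
        a, d = haar_step(a)
        sigma = np.median(np.abs(d)) / 0.6745          # noise level at this scale
        t = sigma * np.sqrt(2.0 * np.log(n))           # Visushrink threshold
        kept.append(np.where(np.abs(d) > t, d, 0.0))   # suppress small coefficients
    for d in reversed(kept):
        a = haar_inverse_step(a, d)
    return a

rng = np.random.default_rng(6)
clean = np.repeat([0.0, 4.0, -2.0, 3.0], 128)          # 512 samples with abrupt changes
noisy = clean + 0.5 * rng.standard_normal(512)
filtered = multiscale_filter(noisy, J=4)

mse_noisy = np.mean((noisy - clean) ** 2)
mse_filtered = np.mean((filtered - clean) ** 2)
```

In this setting `mse_filtered` should be well below `mse_noisy`, while the abrupt changes in `clean` are preserved, which is the noise-feature separation that single-scale filters struggle to achieve.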

Multiscale LVR Modeling
In this section, multiscale filtering will be utilized to enhance the prediction accuracy of various LVR modeling techniques in the presence of measurement noise in the data. It is important to note that in practical process data, features and noise span wide ranges over time and frequency. In other words, features in the input-output data may change at a high frequency over a certain time span, but at a much lower frequency over a different time span. Also, noise (especially colored or correlated) may have varying frequency contents over time. In modeling such multiscale data, the model estimation technique should be capable of extracting the important features in the data and removing the undesirable noise and disturbance to minimize the effect of these disturbances on the estimated model.

Advantages of Multiscale Filtering in LVR Modeling.
Since practical process data are usually multiscale in nature, modeling such data requires a multiscale modeling technique that accounts for this type of data. Below is a description of some of the advantages of multiscale filtering in LVR model estimation [44].
(i) The presence of noise in measured data can considerably affect the accuracy of estimated LVR models. This effect can be greatly reduced by filtering the data using wavelet-based multiscale filtering, which provides effective separation of noise from important features to improve the quality of the estimated models. This noise-feature separation can be visually seen from Figure 1, which shows that the scaled signals are less noise corrupted at coarser scales.
(ii) Another advantage of multiscale representation is that correlated noise (within each variable) gets approximately decorrelated at multiple scales. Correlated (or colored) noise arises in situations where the source of error is not completely independent and random, such as malfunctioning sensors or erroneous sensor calibration. Having correlated noise in the data makes modeling more challenging because such noise is interpreted as important features in the data, while it is in fact noise. This property of multiscale representation is really useful in practice, where measurement errors are not always random [33].
These advantages will be utilized to enhance the accuracy of LVR models by developing an algorithm that integrates multiscale filtering and LVR model estimation as described next.

Integrated Multiscale LVR (IMSLVR) Modeling.
The idea behind the developed integrated multiscale LVR (IMSLVR) modeling algorithm is to combine the advantages of multiscale filtering and LVR model estimation to provide inferential models with improved predictions. Let the time domain input and output data be $X$ and $y$, and let the filtered data (using the multiscale filtering algorithm described in Section 4.2.2) at a particular scale $(j)$ be $X_j$ and $y_j$; then the inferential model (which is estimated using these filtered data) can be expressed as follows:
$$ y_j = X_j b_j + \epsilon_j, $$
where $X_j \in \mathbb{R}^{n \times m}$ is the filtered input data matrix at scale $(j)$, $y_j \in \mathbb{R}^{n \times 1}$ is the filtered output vector at scale $(j)$, $b_j \in \mathbb{R}^{m \times 1}$ is the estimated model parameter vector using the filtered data at scale $(j)$, and $\epsilon_j \in \mathbb{R}^{n \times 1}$ is the model error when the filtered data at scale $(j)$ are used. Before we present the formulations of the LVR modeling techniques using the multiscale filtered data, let us define the following. Let the matrix $D_j$ be defined as the augmented scaled and filtered input and output data, that is, $D_j = [X_j \;\; y_j]$. Then, the covariance of $D_j$ can be defined as follows [9]:
$$ C_j = \frac{1}{n-1} D_j^T D_j = \begin{bmatrix} C_{X_j X_j} & C_{X_j y_j} \\ C_{y_j X_j} & C_{y_j y_j} \end{bmatrix}. $$
Also, since the LVR models are developed using transformed variables, the transformed input variables using the filtered inputs at scale $(j)$ can be expressed as follows:
$$ z_{i,j} = X_j a_{i,j}, $$
where $z_{i,j}$ is the $i$th latent input variable ($i = 1, \ldots, m$) and $a_{i,j}$ is the $i$th input loading vector, which is estimated using the filtered data at scale $(j)$ using any of the LVR modeling techniques, that is, PCR, PLS, or RCCA. Thus, the LVR model estimation problem (using the multiscale filtered data at scale $(j)$) can be formulated as follows.

Integrated Multiscale LVR Modeling Algorithm.
It is important to note that multiscale filtering enhances the quality of the data and the accuracy of the LVR models estimated using these data. However, filtering the input and output data a priori without taking the relationship between these two data sets into account may result in the removal of features that are important to the model. Thus, multiscale filtering needs to be integrated with LVR modeling for proper noise removal. This is what is referred to as integrated multiscale LVR (IMSLVR) modeling. One way to accomplish this integration between multiscale filtering and LVR modeling is using the following IMSLVR modeling algorithm, which is schematically illustrated in Figure 2:
(i) Split the data into two sets: training and testing.
(ii) Scale the training and testing data sets.
(iii) Filter the input and output training data at different scales (decomposition depths) using the algorithm described in Section 4.2.2.
(iv) Using the filtered training data from each scale, construct an LVR model. The number of principal components is optimized using cross-validation.
(v) Use the estimated model from each scale to predict the output for the testing data, and compute the cross-validation mean square error.
(vi) Select the LVR model with the least cross-validation mean square error as the IMSLVR model.
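The steps of the algorithm can be sketched end to end as follows. This is a simplified illustration under explicit assumptions rather than the authors' implementation: Haar wavelets with hard thresholding stand in for the multiscale filter, PCR with a fixed number of components `p` stands in for the cross-validated LVR model, both data sets are scaled with the training statistics, and the data and all function names are hypothetical.

```python
import numpy as np

def haar_filter(x, depth):
    """Haar decomposition to `depth`, hard thresholding of details, reconstruction."""
    a, n, kept = np.asarray(x, dtype=float), len(x), []
    for _ in range(depth):
        a, d = (a[0::2] + a[1::2]) / np.sqrt(2), (a[0::2] - a[1::2]) / np.sqrt(2)
        t = np.median(np.abs(d)) / 0.6745 * np.sqrt(2 * np.log(n))
        kept.append(np.where(np.abs(d) > t, d, 0.0))
    for d in reversed(kept):
        up = np.empty(2 * len(a))
        up[0::2], up[1::2] = (a + d) / np.sqrt(2), (a - d) / np.sqrt(2)
        a = up
    return a

def pcr_fit(X, y, p):
    """PCR regression vector from zero-mean data using the p leading PCs."""
    vals, vecs = np.linalg.eigh(X.T @ X)
    A = vecs[:, np.argsort(vals)[::-1][:p]]
    Z = X @ A
    return A @ np.linalg.solve(Z.T @ Z, Z.T @ y)

def imslvr(X_tr, y_tr, X_te, y_te, max_depth, p):
    # Step (ii): scale both sets using the training mean and standard deviation.
    mx, sx = X_tr.mean(0), X_tr.std(0, ddof=1)
    my, sy = y_tr.mean(), y_tr.std(ddof=1)
    X_tr, X_te = (X_tr - mx) / sx, (X_te - mx) / sx
    y_tr, y_te = (y_tr - my) / sy, (y_te - my) / sy
    mse, models = [], []
    for depth in range(max_depth + 1):           # depth 0 means no filtering
        # Step (iii): filter the training inputs and output at this depth.
        Xf = np.column_stack([haar_filter(col, depth) for col in X_tr.T])
        yf = haar_filter(y_tr, depth)
        # Steps (iv) and (v): fit the model and score it on the testing data.
        b = pcr_fit(Xf, yf, p)
        models.append(b)
        mse.append(np.mean((y_te - X_te @ b) ** 2))
    best = int(np.argmin(mse))                   # step (vi)
    return best, models[best], np.array(mse)

rng = np.random.default_rng(7)
k = np.arange(512)
latent = np.column_stack([np.where((k // 64) % 2 == 0, 1.0, -1.0),
                          np.sin(2 * np.pi * k / 128)])
X = latent @ rng.standard_normal((2, 6)) + 0.3 * rng.standard_normal((512, 6))
y = latent @ np.array([1.0, 0.5]) + 0.3 * rng.standard_normal(512)

# Step (i): first half for training, second half for testing.
best_depth, b, mse = imslvr(X[:256], y[:256], X[256:], y[256:], max_depth=4, p=2)
```

Because depth 0 (the unfiltered data) is among the candidates, the selected model is never worse on the testing set than the conventional model built from the raw training data. The training length must be divisible by `2 ** max_depth` for this Haar sketch to work.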

Illustrative Examples
In this section, the performance of the IMSLVR modeling algorithm described in Section 5.2.2 is illustrated and compared with that of the conventional LVR modeling methods as well as the models obtained by prefiltering the data (using either multiscale filtering or low pass filtering). This comparison is performed through three examples. The first two examples are simulated examples, one using synthetic data and the other using simulated distillation column data. The third example is a practical example that uses experimental packed bed distillation column data. In all examples, the estimated models are optimized and compared using cross-validation, by minimizing the output prediction mean square error (MSE) using unseen testing data as follows:
$$ \mathrm{MSE} = \frac{1}{n} \sum_{k=1}^{n} \left( y(k) - \hat{y}(k) \right)^2, $$
where $y(k)$ and $\hat{y}(k)$ are the measured and predicted outputs at time step $(k)$, and $n$ is the total number of testing measurements. Also, the number of retained latent variables (or principal components) by the various LVR modeling techniques (RCCA, PLS, and PCR) is optimized using cross-validation. Note that the data (inputs and output) are scaled (by subtracting the mean and dividing by the standard deviation) before constructing the LVR models to enhance their prediction abilities.

Example 1: Inferential Modeling of Synthetic Data.
In this example, the performances of the various LVR modeling techniques are compared by modeling synthetic data consisting of ten input variables and one output variable.

Data Generation.
The data are generated as follows.
The first two input variables are "block" and "heavy-sine" signals, and the other input variables are computed as linear combinations of the first two inputs as follows: $x_4 = 0.3x_1 + 0.7x_2$, $x_5 = 0.3x_3 + 0.2x_4$, which means that the input matrix X is of rank 2. Then, the output is computed as a weighted sum of all inputs as follows:

$$y = \sum_{i=1}^{10} b_i x_i,$$

where $b_i$ = {0.07, 0.03, −0.05, 0.04, 0.02, −1.1, −0.04, −0.02, 0.01, −0.03}, for $i = 1, \ldots, 10$. The total number of generated data samples is 512. All variables, inputs and output, which are assumed to be noise-free, are then contaminated with additive zero-mean Gaussian noise. Different levels of noise, which correspond to signal-to-noise ratios (SNR) of 5, 10, and 20, are used to illustrate the performances of the various methods at different noise contributions. The SNR is defined as the variance of the noise-free data divided by the variance of the contaminating noise. A sample of the output data, where SNR = 10, is shown in Figure 3.
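Given this definition of the SNR, the standard deviation of the added noise follows directly from the variance of the noise-free signal. The snippet below is a minimal sketch of that step; the helper name `add_noise` is hypothetical, and the sine wave is only a stand-in for the block and heavy-sine signals used in the example.

```python
import numpy as np

def add_noise(clean, snr, rng):
    """Add zero-mean Gaussian noise so that var(clean) / var(noise) = snr.
    For a 2-D array the variance is taken per column (per variable)."""
    clean = np.asarray(clean, dtype=float)
    noise_std = np.sqrt(clean.var(axis=0) / snr)
    return clean + noise_std * rng.standard_normal(clean.shape)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 512)
clean = np.sin(2 * np.pi * 3 * t)            # stand-in signal, not the paper's
for snr in (5, 10, 20):
    noisy = add_noise(clean, snr, rng)
    achieved = clean.var() / (noisy - clean).var()
    print(f"target SNR = {snr:2d}, achieved SNR = {achieved:.2f}")
```

With only 512 samples, the achieved SNR of a single realization fluctuates around the target, which is one reason results averaged over many realizations are more informative than a single run.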

Selection of Decomposition Depth and Optimal Filter Parameters.
The decomposition depth used in multiscale filtering and the parameters of the low pass filters (i.e., the length of the mean filter and the value of the EWMA smoothing parameter) are optimized using a cross-validation criterion, which was proposed in [43]. The idea here is to split the data into two sets, odd ($y_o$) and even ($y_e$); filter the odd set; compute estimates of the even-numbered data from the filtered odd data by averaging the two adjacent filtered samples, that is, $\hat{y}_{e,j} = \frac{1}{2}(\hat{y}_{o,j} + \hat{y}_{o,j+1})$; and then compute the cross-validation MSE (CVMSE) with respect to the even data samples as follows:

$$\mathrm{CVMSE}_{y_o} = \frac{1}{N_e}\sum_{j=1}^{N_e}\left(y_{e,j} - \hat{y}_{e,j}\right)^2,$$

where $N_e$ is the number of even-numbered samples that have an estimate. The same process is repeated using the even-numbered samples as the training data, and then the optimum filter parameters are selected by minimizing the sum of the cross-validation mean squared errors obtained using both the odd and even data samples.
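The odd/even criterion can be illustrated with a short Python sketch. The function names are hypothetical, the EWMA recursion shown is the standard first-order form (assumed here, since this section does not give it), and the candidate grid of smoothing parameters is arbitrary.

```python
import numpy as np

def ewma(y, alpha):
    """Standard EWMA filter: s[k] = alpha * y[k] + (1 - alpha) * s[k-1]."""
    out = np.empty(len(y))
    out[0] = y[0]
    for k in range(1, len(y)):
        out[k] = alpha * y[k] + (1.0 - alpha) * out[k - 1]
    return out

def odd_even_cvmse(y, filt):
    """Sum of the two cross-validation MSEs (odd -> even and even -> odd)."""
    y = np.asarray(y, dtype=float)
    odd, even = y[0::2], y[1::2]             # 1st, 3rd, ... and 2nd, 4th, ... samples
    f_odd, f_even = filt(odd), filt(even)
    # Midpoints of adjacent filtered odd samples estimate even[0], even[1], ...
    est_even = 0.5 * (f_odd[:-1] + f_odd[1:])
    # Midpoints of adjacent filtered even samples estimate odd[1], odd[2], ...
    est_odd = 0.5 * (f_even[:-1] + f_even[1:])
    cv_odd = np.mean((even[:len(est_even)] - est_even) ** 2)
    cv_even = np.mean((odd[1:1 + len(est_odd)] - est_odd) ** 2)
    return cv_odd + cv_even

def select_alpha(y, grid):
    """Pick the smoothing parameter with the smallest summed CVMSE."""
    return min(grid, key=lambda a: odd_even_cvmse(y, lambda s: ewma(s, a)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 512)
    y = np.sin(2 * np.pi * 2 * t) + 0.3 * rng.standard_normal(512)
    grid = np.linspace(0.1, 1.0, 10)
    print("selected smoothing parameter:", select_alpha(y, grid))
```

The same `odd_even_cvmse` function can score any filter passed as `filt`, so the mean-filter length or the multiscale decomposition depth could be selected with the same loop over a different candidate grid.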

Simulation Results.
In this section, the performance of the IMSLVR modeling algorithm is compared to those of the conventional LVR algorithms (RCCA, PLS, and PCR) and those obtained by prefiltering the data using multiscale filtering, mean filtering (MF), and EWMA filtering. In multiscale filtering, the Daubechies wavelet filter of order three is used, and the filtering parameters for all filtering techniques are optimized using cross-validation. To obtain statistically valid conclusions, a Monte Carlo simulation using 1000 realizations is performed, and the results are shown in Table 1. The results in Table 1 clearly show that modeling prefiltered data (using multiscale filtering (MSF+LVR), EWMA filtering (EWMA+LVR), or mean filtering (MF+LVR)) provides a significant improvement over the conventional LVR modeling techniques. This advantage is much clearer for multiscale filtering than for the single-scale (low pass) filtering techniques. However, the IMSLVR algorithm provides a further improvement over multiscale prefiltering (MSF+LVR) for all noise levels. This is because the IMSLVR algorithm integrates modeling and feature extraction to retain features in the data that are important to the model, which improves the model prediction ability. Finally, the results in Table 1 also show that the advantages of the IMSLVR algorithm are clearer for larger noise contents, that is, smaller SNR. As an example, the performances of all estimated models using RCCA are demonstrated in Figure 4 for the case where SNR = 10, which clearly shows the advantages of IMSLVR over other LVR modeling techniques.

Effect of Wavelet Filter on Model Prediction.
The choice of the wavelet filter has a great impact on the performance of the model estimated using the IMSLVR modeling algorithm. To study this effect, the simulations in this example were repeated using different wavelet filters (Haar and the Daubechies second- and third-order filters), and the results of a Monte Carlo simulation using 1000 realizations are shown in Figure 5. The simulation results clearly show that the Daubechies third-order filter is the best filter for this example, which makes sense because it is smoother than the other two filters, and thus it fits the nature of the data better.

In this simulated modeling problem, the input variables consist of ten temperatures at different trays of the column, in addition to the flow rates of the feed and reflux streams. The output variables, on the other hand, are the compositions of the light component (propane) in the distillate and the bottom streams (i.e., $x_D$ and $x_B$, resp.). The dynamic temperature and composition data generated using the Aspen simulator (due to the perturbations in the feed and reflux flow rates) are assumed to be noise-free and are then contaminated with zero-mean Gaussian noise. To assess the robustness of the various modeling techniques to different noise contributions, different levels of noise (which correspond to signal-to-noise ratios of 5, 10, and 20) are used. Sample training and testing data sets showing the effect of the perturbations on the column compositions are shown in Figures 6(a), 6(b), 6(c), and 6(d) for the case where the signal-to-noise ratio is 10.

Simulation Results.
In this section, the performance of the IMSLVR algorithm is compared to the conventional LVR models as well as the models estimated using prefiltered data. To obtain statistically valid conclusions, a Monte Carlo simulation of 1000 realizations is performed, and the results are presented in Tables 3 and 4 for the estimation of the top and bottom distillation column compositions, that is, $x_D$ and $x_B$, respectively. As in the first example, the results in both Tables 3 and 4 show that modeling prefiltered data significantly improves the prediction accuracy of the estimated LVR models over the conventional model estimation methods. The IMSLVR algorithm, however, improves the prediction of the estimated LVR model even further, especially at higher noise contents, that is, at smaller SNR. To illustrate the relative performances of the various LVR modeling techniques, as an example, the performances of the estimated RCCA models for the top composition ($x_D$) in the case of SNR = 10 are shown in Figure 7.

Ten Resistance Temperature Detector (RTD) sensors are fixed at various locations in the setup to monitor the column temperature profile. The flow rates and densities of various streams (e.g., feed, reflux, top product, and bottom product) are also monitored. In addition, the setup includes four pumps and five heat exchangers at different locations.

Modelling and Simulation in Engineering
The feed stream enters the column near its midpoint. The part of the column above the feed constitutes the rectifying section, and the part below (and including) the feed constitutes the stripping section. The feed flows down the stripping section into the bottom of the column, where a certain level of liquid is maintained by a closed-loop controller. A steam-heated reboiler is used to heat and vaporize part of the bottom stream, which is then sent back to the column. The vapor passes up the entire column, contacting the descending liquid. The bottom product is withdrawn from the bottom of the column and is then sent to a heat exchanger, where it is used to heat the feed stream. The vapors rising through the rectifying section are completely condensed in the condenser, and the condensate is collected in the reflux drum, in which a specified liquid level is maintained.

... and reflux streams), eight columns for the lagged inputs, and one column for the lagged output. To show the advantage of the IMSLVR algorithm, its performance is compared to those of the conventional LVR models and the models estimated using multiscale prefiltered data, and the results are shown in Figure 10. The results clearly show that multiscale prefiltering provides a significant improvement over the conventional LVR (RCCA) method (which tends to overfit the measurements), and that the IMSLVR algorithm provides further improvement in the smoothness and the prediction accuracy. Note that Figure 10 shows only a part of the testing data for the sake of clarity.

Conclusions
Latent variable regression models are commonly used in practice to estimate variables that are difficult to measure from other easier-to-measure variables. This paper presents a modeling technique to improve the prediction ability of LVR models by integrating multiscale filtering and LVR model estimation, which is called integrated multiscale LVR (IMSLVR) modeling. The idea behind the developed IMSLVR algorithm is to filter the input and output data at different scales, construct different models using the filtered data from each scale, and then select the model that provides the minimum cross-validation MSE. Through three examples, two simulated and one practical, the performance of the IMSLVR modeling algorithm is compared to the conventional LVR modeling methods as well as to modeling prefiltered data, using either low pass filtering (such as mean filtering or EWMA filtering) or multiscale filtering.
The simulated examples use synthetic data and simulated distillation column data, while the practical example uses experimental packed bed distillation column data. The results of all examples show that data prefiltering (especially using multiscale filtering) provides a significant improvement over the conventional LVR methods, and that the IMSLVR algorithm provides a further improvement, especially at higher noise levels. The main reason for the advantages of the IMSLVR algorithm over modeling prefiltered data is that it integrates multiscale filtering and LVR modeling, which helps retain the model-relevant features in the data that can provide enhanced model predictions.

Figure 1: Multiscale decomposition of a heavy-sine signal using Haar.

Figure 3: Sample output data set used in example 1 for the case where SNR = 10 (solid line: noise-free data; dots: noisy data).

Figure 4: Comparison of the model predictions using the various LVR (RCCA) modeling techniques in example 1 for the case where SNR = 10 (solid blue line: model prediction; solid red line: noise-free data; black dots: noisy data).

Figure 5: Comparison of the MSEs for various wavelet filters in example 1 for the case where SNR = 10.

Figure 6: The dynamic input-output data used for training and testing the models in the simulated distillation column example for the case where the noise SNR = 10 (solid red line: noise-free data; blue dots: noisy data).

Example 3: Dynamic LVR Modeling of an Experimental Packed Bed Distillation Column.
In this example, the developed IMSLVR modeling algorithm is used to model a practical packed bed distillation column with a recycle ...

Figure 8: A schematic diagram of the packed bed distillation column setup.

Figure 10: Comparison of the model predictions using the various modeling methods for the experimental packed bed distillation column example (solid blue line: model prediction; black dots: plant data).

Table 1: Comparison of the Monte Carlo MSEs for the various modeling techniques in example 1.

Table 2: Steady state operating conditions of the distillation column.

Table 3: Comparison of the Monte Carlo MSEs for $x_D$ in the simulated distillation column example.

Table 4: Comparison of the Monte Carlo MSEs for $x_B$ in the simulated distillation column example.

Figure 7: Comparison of the model predictions of $x_D$ using the various LVR (RCCA) modeling techniques for the simulated distillation column example and the case where the noise SNR = 10 (solid blue line: model prediction; black dots: noisy data; solid red line: noise-free data).

Table 5: Steady state operating conditions of the packed bed distillation column.

... packing sections (bottom, middle, and top section) rising to a height of 20 feet. The column, which is used to separate a methanol-water mixture, has Koch-Sulzer structured packing with liquid distributors above each packing section. An industrial quality Distributed Control System (DCS) is used to control the column. A schematic diagram of the packed bed distillation column is shown in Figure 8.