Multiscale Latent Variable Regression

Multiscale wavelet-based representation of data has been shown to be a powerful tool for feature extraction from practical process data. In this paper, this characteristic of multiscale representation is utilized to improve the prediction accuracy of latent variable regression models, such as Principal Component Regression (PCR) and Partial Least Squares (PLS), by developing a multiscale latent variable regression (MSLVR) modeling algorithm. The idea is to decompose the input-output data at multiple scales using wavelet and scaling functions, construct multiple latent variable regression models at multiple scales using the scaled signal approximations of the data, and then, using cross-validation, select among all MSLVR models the one which best describes the process. The main advantage of the MSLVR modeling algorithm is that it inherently accounts for the presence of measurement noise in the data through the low-pass filters used in multiscale decomposition, which in turn improves the model's robustness to measurement noise and enhances its prediction accuracy. The advantages of the developed MSLVR modeling algorithm are demonstrated using a simulated inferential model which predicts the distillate composition from measurements of some of the tray temperatures.


Introduction
Process models are an essential part of many process operations, such as model-based control [1,2]. However, constructing empirical models from measurements of the process variables is associated with many difficulties, which include dealing with collinearity or redundancy in the variables and accounting for the presence of measurement noise in the data.
Collinearity is common in models which involve a large number of variables, such as Finite Impulse Response (FIR) models [3,4] and inferential models. Collinearity increases the variance of the estimated model parameters, which degrades their estimation accuracy. Many modeling techniques have been developed to deal with collinearity, including Ridge Regression (RR) [5-7] and latent variable regression [3-5]. RR reduces the variations in the model parameters by imposing a penalty on the norm of their estimated values. Latent variable regression models, on the other hand, use singular value decomposition to reduce the dimension of the input variables and thus provide a better-conditioned set of inputs. Popular latent variable regression model estimation techniques include the well-known Principal Component Regression (PCR) and Partial Least Squares (PLS) modeling methods [3-5].
Also, the presence of measurement noise in the data used in empirical modeling, even in small amounts, can largely degrade the estimated model's prediction accuracy. Therefore, measurement noise needs to be filtered out for improved model prediction. However, modeling prefiltered data does not usually provide satisfactory modeling performance [8]. This is because applying data filtering without taking the input-output relationship into account may remove features from the data which are important for the model. Therefore, filtering and modeling need to be integrated for improved model accuracy.
Unfortunately, measured data are usually multiscale in nature, which means that they contain features and noise with varying contributions over both time and frequency [9]. For example, an abrupt change in the data spans a wide range in the frequency domain and a small range in the time domain, while a slow change spans a wide range in the time domain and a small range in the frequency domain. Filtering such data using conventional low-pass filters usually does not result in good noise-feature separation, because these filtering techniques classify noise as high-frequency features and filter the data by removing all features with frequencies above a defined threshold. Thus, modeling multiscale data requires developing multiscale modeling techniques that account for this multiscale nature of the data.
Many investigators have used multiscale techniques to improve the accuracy of estimated empirical models [8,10-17]. For example, the authors in [10] showed how to use wavelet representation to design wavelet prefilters for process modeling purposes. In [9], the author discussed some of the advantages of using multiscale representation in empirical modeling, and in [11], he enhanced the noise removal ability of the Principal Component Analysis model by constructing multiscale PCA models, which he also used in process monitoring. Also, the authors in [12-14] used multiscale representation to reduce collinearity and shrink the large variations in FIR model parameters. Furthermore, the authors in [15,16] used multiscale representation to enhance the prediction of fuzzy models and the parsimony and accuracy of ARX models. Finally, in [17], the authors used wavelets as modulating functions for control-related system identification.
In this work, multiscale representation of data is utilized to improve the prediction accuracy of some of the common latent variable regression modeling methods, such as PCR and PLS, by developing a multiscale latent variable regression (MSLVR) modeling algorithm that reduces the effect of measurement noise on the accuracy and prediction of these models. The MSLVR algorithm integrates modeling and data filtering by constructing multiple latent variable regression models at multiple scales using the scaled signal approximations of the input and output data and then selecting, among all scales, the model that provides the optimum prediction and maximum noise-feature separation.
The rest of this paper is organized as follows. In Section 2, the formulation and estimation of latent variable regression models are introduced, followed by a description of the wavelet-based multiscale representation of data in Section 3. In Section 4, the representation and algorithm of MSLVR modeling are presented. Then, in Section 5, the performance of the developed MSLVR modeling algorithm is illustrated and compared to time-domain models through a simulated distillation column example. Finally, the paper is concluded with a few remarks in Section 6.

Latent Variable Model Representation and Estimation
Given measurements of the input and output data, that is, {x_k, y_k}_{k=1,...,n}, where all variables are assumed to be contaminated with additive zero-mean Gaussian noise (i.e., x = x̃ + ε_x and y = ỹ + ε_y, where the superscript "∼" denotes the noise-free variables), it is desired to construct latent variable regression models of the following form:

y_k = z_k β + ε_k, (1)
z_k = x_k α^T. (2)

In (1) and (2), z_k (the latent variable vector at time step k), β (the latent variable model parameter vector), and α (the projection directions matrix) are of sizes (1 × p), (p × 1), and (p × m), respectively, and α α^T = I. Note that latent variable regression models reduce the dimension of the input variables from (m), which is the length of x_k, to (p), which is the length of z_k, where m > p, and the model output is regressed on the latent variables instead of the original input variables.
Defining the input matrix X = [x_1^T, ..., x_n^T]^T of size (n × m), the output vector Y = [y_1, ..., y_n]^T of size (n × 1), and the latent variable matrix Z = X α^T of size (n × p), the latent variable regression model can be written in matrix form as follows:

Y = Z β + E = X α^T β + E,

where E is the vector of model errors. Common methods of estimating the above latent variable regression model include PCR and PLS, which are described below.
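To make the matrix form concrete, the following short sketch (Python/NumPy; the dimensions n = 64, m = 5, p = 2 and all variable names are illustrative, not taken from the paper) constructs Z = X α^T and Y = Z β with orthonormal projection directions:

```python
import numpy as np

# Minimal sketch of the latent variable model structure in matrix form.
# Dimensions are illustrative: n samples, m inputs, p latent variables.
rng = np.random.default_rng(0)
n, m, p = 64, 5, 2

X = rng.standard_normal((n, m))                          # input matrix, n x m
alpha = np.linalg.qr(rng.standard_normal((m, p)))[0].T   # p x m, orthonormal rows
beta = rng.standard_normal((p, 1))                       # latent model parameters, p x 1

Z = X @ alpha.T          # latent variables, n x p (dimension reduced from m to p)
Y = Z @ beta             # noise-free model output, n x 1

# Orthonormality of the projection directions: alpha @ alpha.T = I_p
assert np.allclose(alpha @ alpha.T, np.eye(p))
```

Here the rows of α come from a QR factorization purely to obtain orthonormal directions; PCR and PLS estimate α from data.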

Principal Component Regression (PCR).

PCR accounts for collinearity in the input variables by reducing their dimension using Principal Component Analysis (PCA), which uses Singular Value Decomposition (SVD) to compute the latent variables or principal components, z_k. Then, it constructs a simple linear model between the latent variables and the output using the well-known Ordinary Least Squares (OLS) regression method [5,18]. Therefore, PCR can be formulated as two consecutive estimation problems: first, the projection directions α are obtained from the SVD of the input matrix X; second, the model parameters are estimated by OLS, that is,

β̂ = (Z^T Z)^{−1} Z^T Y.
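This two-step procedure (PCA via SVD, then OLS on the scores) can be sketched as follows; the function name pcr_fit, the synthetic data with one collinear input, and all dimensions are illustrative assumptions, not from the paper:

```python
import numpy as np

def pcr_fit(X, Y, p):
    """PCR sketch: PCA (via SVD) for the projection directions,
    then OLS regression of the output on the latent variables."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt: principal directions
    alpha = Vt[:p]                  # p x m projection directions (alpha @ alpha.T = I)
    Z = Xc @ alpha.T                # n x p latent variables (principal components)
    beta, *_ = np.linalg.lstsq(Z, Y - Y.mean(), rcond=None)  # OLS step
    return alpha, beta

# Illustrative data: the last input column nearly repeats the first (collinearity).
rng = np.random.default_rng(1)
X = rng.standard_normal((128, 4))
X[:, 3] = X[:, 0] + 0.01 * rng.standard_normal(128)
Y = X[:, 0] + 2.0 * X[:, 1] + 0.1 * rng.standard_normal(128)

alpha, beta = pcr_fit(X, Y, p=3)
Y_hat = (X - X.mean(axis=0)) @ alpha.T @ beta + Y.mean()
mse = np.mean((Y - Y_hat) ** 2)
```

With p = 3 latent variables, the near-zero-variance direction created by the collinear column is discarded, which is what conditions the regression.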

Partial Least Squares (PLS) Regression.

PLS regression uses the same model structure as PCR but extends it by taking the output variables into account when computing the latent variables or principal components. It determines the projection directions that capture the variations in the input variables which are closest to the output by maximizing the following objective function [19]:

max_{α_i} Cov(X α_i^T, Y), subject to α_i α_i^T = 1,

where α_i is the ith row of α, that is, the ith projection direction. A similar formulation of PLS has also been used to extend PLS to deal with nonlinear problems, where the projection directions are estimated by minimizing the sum of the input and output errors as follows [20]:

min_{α, β} ||X − Z α||² + ||Y − Z β||²,

subject to the constraints shown in (6) and (7).
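A minimal single-output PLS sketch in the same spirit is given below (a NIPALS-style implementation with deflation; this is a standard textbook variant, not necessarily the exact algorithm of [19,20], and all names and data are illustrative):

```python
import numpy as np

def pls_fit_predict(X, Y, p):
    """Single-output PLS sketch (NIPALS-style): each direction maximizes the
    covariance between the input scores and the output, with deflation."""
    Xk = X - X.mean(axis=0)
    yk = Y - Y.mean()
    y_hat = np.full(len(Y), Y.mean())
    for _ in range(p):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)        # covariance-maximizing direction, ||w|| = 1
        t = Xk @ w                    # scores (latent variable)
        pload = Xk.T @ t / (t @ t)    # input loadings
        coef = (t @ yk) / (t @ t)     # regression of the output on the score
        y_hat += coef * t
        Xk = Xk - np.outer(t, pload)  # deflate the inputs
        yk = yk - coef * t            # deflate the output
    return y_hat

rng = np.random.default_rng(2)
X = rng.standard_normal((128, 5))
Y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.standard_normal(128)
Y_hat = pls_fit_predict(X, Y, p=2)
mse = np.mean((Y - Y_hat) ** 2)
```

Because the directions are driven by the input-output covariance, two components suffice here even though the input space is five-dimensional.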
It can be seen from the formulation of both PCR and PLS that they partially reduce the effect of measurement noise as they reduce the redundancy among the input variables.However, improvement can be made if the noise content within each variable is also reduced, which can be achieved using multiscale representation of data.

Multiscale Representation of Data
A proper way of analyzing real data requires their representation at multiple scales. This can be achieved by expressing the data as a weighted sum of orthonormal basis functions, which are localized in both time and frequency, such as wavelets. Wavelets are a computationally efficient family of multiscale basis functions. A signal can be represented at multiple resolutions by decomposing it on a family of wavelets and scaling functions. The signals in Figures 1(b), 1(d), and 1(f) are at increasingly coarser scales compared to the original signal in Figure 1(a). These scaled signals are determined by projecting the original signal on a set of orthonormal scaling functions of the form

φ_{jk}(t) = 2^{−j/2} φ(2^{−j} t − k),

or equivalently by filtering the signal using a low-pass filter h of length r derived from the scaling functions. On the other hand, the signals in Figures 1(c), 1(e), and 1(g), which are called the detail signals, capture the differences between any scaled signal and the scaled signal at the finer scale. These detail signals are determined by projecting the signal on a set of wavelet basis functions of the form

ψ_{jk}(t) = 2^{−j/2} ψ(2^{−j} t − k),

or equivalently by filtering the scaled signal at the finer scale using a high-pass filter g of length r derived from the wavelet basis functions. Therefore, the original signal can be represented as the sum of all detail signals at all scales and the scaled signal at the coarsest scale as follows:

x(t) = Σ_{k=1}^{n2^{−J}} a_{Jk} φ_{Jk}(t) + Σ_{j=1}^{J} Σ_{k=1}^{n2^{−j}} d_{jk} ψ_{jk}(t),

where j, k, J, and n are the dilation parameter, the translation parameter, the maximum number of scales (or decomposition depth), and the length of the original signal, respectively [21,22]. Fast wavelet transform algorithms of O(n) complexity for discrete signals of dyadic length have been developed [23]. For example, the wavelet and scaling function coefficients at a particular scale (j), d_j and a_j, can be computed in a compact fashion by multiplying the scaling coefficient vector at the finer scale, a_{j−1}, by the matrices G_j and H_j, respectively, that is,

d_j = G_j a_{j−1},
a_j = H_j a_{j−1},

where the rows of H_j and G_j contain the low-pass filter coefficients h and the high-pass filter coefficients g, respectively, shifted by two samples between consecutive rows to account for downsampling. Note that the length of the scaled and detail signals decreases dyadically at coarser resolutions (higher j); in other words, the length of the scaled signal at scale (j) is half the length of the scaled signal at the finer scale (j − 1). This is due to the downsampling used in the discrete wavelet transform.
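For the Haar family used later in the illustrative example, one level of this filter-and-downsample decomposition can be sketched as follows (Python/NumPy; dwt_haar is an illustrative helper that assumes a dyadic-length signal):

```python
import numpy as np

# Haar filters derived from the scaling and wavelet functions.
h = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low-pass (scaling) filter
g = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass (wavelet) filter

def dwt_haar(a):
    """One decomposition level: return (a_j, d_j) from the finer-scale a_{j-1}.
    Filtering plus dyadic downsampling halves the signal length."""
    pairs = np.asarray(a, dtype=float).reshape(-1, 2)
    return pairs @ h, pairs @ g

signal = np.array([2.0, 4.0, 6.0, 8.0])
a1, d1 = dwt_haar(signal)                 # approximation and detail at scale 1

# Perfect reconstruction: the original signal is recovered from a1 and d1.
recon = np.column_stack([a1 + d1, a1 - d1]).ravel() / np.sqrt(2.0)
```

Note that a1 and d1 each have half the length of the original signal, as stated above.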
As an example to illustrate the multiscale decomposition procedure and to introduce some terminology, consider a discrete signal, Y_o, of length (n) in the time domain (i.e., j = 0). The scaled signal approximation of Y_o at scale (j), denoted Y_j, can be computed by applying the low-pass filter matrices recursively:

Y_j = H_j H_{j−1} · · · H_1 Y_o.

Note that this decomposition algorithm is batch; that is, it requires the availability of the entire data set beforehand. An online wavelet decomposition algorithm has also been developed and used in data filtering [24].
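The repeated low-pass filtering that produces the scaled approximation can be sketched with explicit Haar filter matrices (the helper names haar_H and scaled_approximation are illustrative, and a dyadic signal length is assumed):

```python
import numpy as np

def haar_H(n):
    """Low-pass Haar filter-and-downsample matrix H_j of size (n/2) x n."""
    H = np.zeros((n // 2, n))
    for k in range(n // 2):
        H[k, 2 * k] = H[k, 2 * k + 1] = 1.0 / np.sqrt(2.0)
    return H

def scaled_approximation(Y0, J):
    """Y_J = H_J ... H_1 Y_0: apply the low-pass matrix J times."""
    Yj = np.asarray(Y0, dtype=float)
    for _ in range(J):
        Yj = haar_H(len(Yj)) @ Yj
    return Yj

Y0 = np.arange(8, dtype=float)       # dyadic length n = 8 (time domain, j = 0)
Y2 = scaled_approximation(Y0, J=2)   # length n / 2^2 = 2 after two levels
```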

Multiscale Latent Variable Regression (MSLVR)
In this section, the feature extraction abilities of multiscale data representation are utilized to develop multiscale latent variable regression models, which are less affected by the presence of noise in the data. The main idea is to decompose the input-output data at multiple scales and construct a latent variable regression model at each scale using the scaled signal approximations of the data. Then, among all scales, the optimum latent variable regression model that provides the best noise-feature separation and the least prediction error is selected.

Multiscale Latent Variable Regression Model Formulation.
Denoting the model input and output matrices in the time domain as X_o and Y_o, the scaled signal approximations of these matrices at the first scale can be computed using the low-pass filter matrices as shown in (17) as follows:

X_1 = H_1 X_o,
Y_1 = H_1 Y_o.

Note that X_1 and Y_1 are of sizes (n/2) × m and (n/2) × 1, respectively, because of the downsampling used in wavelet decomposition. Having computed the matrices X_1 and Y_1, the latent variable regression model at the first scale, which can be estimated using either PCR or PLS, can be represented as follows:

Y_1 = Z_1 β_1 + E_1, where Z_1 = X_1 α_1^T.

Repeating this process at coarser scales, the latent variable regression model at scale (j) can be expressed as follows:

Y_j = Z_j β_j + E_j, where Z_j = X_j α_j^T, X_j = H_j H_{j−1} · · · H_1 X_o, and Y_j = H_j H_{j−1} · · · H_1 Y_o.

Multiscale Latent Variable Regression (MSLVR) Modeling Algorithm.

Based on the above discussion, the following algorithm is proposed for multiscale latent variable regression.
(1) Given the input-output data, construct the matrices X and Y and estimate a latent variable regression model using either PCR or PLS as described in Section 2. This requires estimating the rank of the latent variable regression model, which can be done using cross-validation [25].
(2) Decompose the input and output data at coarser scales as shown in (23) and (24).
(3) At each scale (j), using the scaled signal approximations of the data,
(a) estimate either a PCR or PLS model as described in step (1), and use it to predict the output at that scale;
(b) reconstruct the predicted output of the PCR or PLS model back to the time domain;
(c) use the reconstructed output prediction from each scale to compute the following cross-validation mean square error (CVMSE) at each scale [25]:

CVMSE_j = Σ_{k=1}^{n} (y_k − ŷ_k^{(j)})²,

where ŷ_k^{(j)} is the output prediction reconstructed to the time domain from the model estimated at scale (j).
(4) Select among all scales the optimum multiscale latent variable regression model that minimizes the cross-validation mean square error criterion shown in (25).
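The steps above can be sketched end to end as follows (Python/NumPy, with PCR at each scale and Haar filters; because a full cross-validation loop would lengthen the sketch, scale selection here scores each reconstructed prediction against a caller-supplied reference output, which in a simulated setting can be the noise-free output; all names and data are illustrative assumptions):

```python
import numpy as np

def haar_H(n):
    """Low-pass Haar filter-and-downsample matrix H_j of size (n/2) x n."""
    H = np.zeros((n // 2, n))
    for k in range(n // 2):
        H[k, 2 * k] = H[k, 2 * k + 1] = 1.0 / np.sqrt(2.0)
    return H

def pcr_predict(Xj, Yj, p):
    """Fit a PCR model at one scale and return its fitted output."""
    Xc = Xj - Xj.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:p].T
    beta, *_ = np.linalg.lstsq(Z, Yj - Yj.mean(), rcond=None)
    return Z @ beta + Yj.mean()

def mslvr(X0, Y0, Y_ref, p, J):
    """MSLVR sketch: fit a model on the scaled approximations at scales
    0..J, reconstruct each prediction to the time domain, and keep the
    scale whose reconstructed prediction is closest to Y_ref."""
    best = None
    for j in range(J + 1):
        Xj, Yj, synth = X0, Y0, np.eye(len(Y0))
        for _ in range(j):                      # step (2): decompose to scale j
            H = haar_H(len(Yj))
            Xj, Yj, synth = H @ Xj, H @ Yj, synth @ H.T
        y_rec = synth @ pcr_predict(Xj, Yj, p)  # steps (3a)-(3b)
        mse = np.mean((Y_ref - y_rec) ** 2)     # step (3c), scored against Y_ref
        if best is None or mse < best[1]:       # step (4): keep the optimum scale
            best = (j, mse, y_rec)
    return best

# Illustrative smooth process data with additive measurement noise.
rng = np.random.default_rng(3)
n = 256
t = np.arange(n)
s1 = np.sin(2 * np.pi * t / 64)
s2 = np.cos(2 * np.pi * t / 32)
s3 = np.sin(2 * np.pi * t / 128)
X_clean = np.column_stack([s1, s2, s3, s1])   # last column collinear with the first
Y_clean = s1 - s2 + 0.5 * s3
X0 = X_clean + 0.2 * rng.standard_normal((n, 4))
Y0 = Y_clean + 0.2 * rng.standard_normal(n)

j_opt, mse_opt, y_opt = mslvr(X0, Y0, Y_ref=Y_clean, p=3, J=3)
```

The matrix synth accumulates H_1^T · · · H_j^T, which maps an approximation at scale j back to the time domain (the details are implicitly set to zero, which is what provides the low-pass filtering effect).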

Illustrative Example
In this section, the performance of the developed MSLVR modeling algorithm is illustrated and compared to those of the time-domain latent variable regression techniques and to modeling prefiltered data using an Exponentially Weighted Moving Average (EWMA) filter. The MSLVR models are used as inferential models that estimate the distillation column composition using temperature measurements. The data are simulated using a 30-tray distillation column used to separate methanol, ethanol, 1-propanol, and 1-butanol under temperature control. The objective of this distillation process is to maintain high-purity separation of the light and heavy components. A more detailed description of the process and its operating conditions is provided in [26].
For control purposes, measuring compositions on-line is very expensive, and thus it is desired to build accurate inferential models that estimate the product compositions in the distillate and bottom streams. In this example, an inferential model is constructed to estimate the composition of ethanol in the bottom stream from temperature measurements at different trays. As shown in [26], the distillate composition can be estimated using temperature measurements from nine trays, which are T_3, T_6, T_9, T_12, T_15, T_19, T_22, T_25, and T_28. The simulated data, which consist of 1024 samples, are assumed to be noise-free. All variables, inputs and outputs, are then contaminated with zero-mean Gaussian noise. Different levels of noise, that is, signal-to-noise ratios (SNR) of 10 and 50, are used to test the robustness of the developed MSLVR modeling algorithm.
To show the effect of prefiltering on the model prediction accuracy, the data are filtered using an EWMA filter, and then the filtered data are used to construct the LVR models. The EWMA filter has the following structure:

w̄_k = α w_k + (1 − α) w̄_{k−1},

where w_k is a raw data sample, w̄_k is a filtered data sample, and α is a filter parameter that is optimized using cross-validation. Of course, for different levels of noise, the optimum value of α changes. For example, for an SNR of 10, α is found to be 0.41, while for an SNR of 50, it is found to be 0.52. The performances of the multiscale and time-domain latent variable regression methods are compared using the prediction mean square errors with respect to the noise-free output, that is,

MSE = (1/n) Σ_{k=1}^{n} (ỹ_k − ŷ_k)²,

where ŷ and ỹ are the predicted and noise-free outputs, respectively. Note that such a comparison is possible in this simulated example because the noise-free output is known. Also, in this example, the Haar wavelet and scaling functions are used in the multiscale representation of the data.
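This EWMA prefilter can be sketched as follows (Python/NumPy; the test signal, noise level, and the fixed α = 0.2 are illustrative choices, not the cross-validated values reported here):

```python
import numpy as np

def ewma_filter(w, alpha):
    """EWMA filter sketch: w_bar_k = alpha * w_k + (1 - alpha) * w_bar_{k-1}."""
    w = np.asarray(w, dtype=float)
    out = np.empty(len(w))
    out[0] = w[0]                      # initialize with the first raw sample
    for k in range(1, len(w)):
        out[k] = alpha * w[k] + (1.0 - alpha) * out[k - 1]
    return out

# Illustrative slowly varying signal contaminated with Gaussian noise.
rng = np.random.default_rng(4)
t = np.arange(512)
clean = np.sin(2 * np.pi * t / 128)
noisy = clean + 0.3 * rng.standard_normal(512)
filtered = ewma_filter(noisy, alpha=0.2)

mse_noisy = np.mean((clean - noisy) ** 2)
mse_filt = np.mean((clean - filtered) ** 2)
```

For a slowly varying signal, the filter reduces the error with respect to the noise-free signal; a smaller α filters more aggressively but introduces more lag, which is why α is tuned by cross-validation above.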
To make statistically valid conclusions about the performances of the various modeling methods, a Monte Carlo simulation of 100 realizations is performed, and the results are presented in Tables 1 and 2.
Table 1 shows that PLS outperformed PCR in the time domain. It also shows that, for both PCR and PLS, prefiltering the data using an EWMA filter helps the prediction of the estimated models. However, the multiscale latent variable regression modeling techniques (MS PCR and MS PLS) provide better prediction results than their time-domain counterparts and than modeling prefiltered data, and the level of improvement increases for larger noise contents (smaller SNR). This improvement is illustrated in Figures 2 and 3, which show the advantages of constructing latent variable regression models at multiple scales. Figures 2 and 3 also show that the accuracy of the estimated MSLVR models improves at coarser scales, but only up to a certain scale, beyond which the quality of the estimated models deteriorates. This can also be noted from Table 2, which reports the MSEs of the estimated PCR and PLS models at different scales and shows that there is an intermediate scale at which MSLVR performs best. This is because, at very coarse scales, important features are eliminated, which affects the model's quality. That is why it is very important to select the optimum scale for modeling. Table 2 also presents (in parentheses) the percentages at which each scale is selected as optimum using the cross-validation mean square error criterion (shown in (25)) and shows that the optimum scale increases, or gets coarser (higher j), for higher noise levels or smaller SNR. This makes sense because, for higher noise levels, more filtering is needed for good noise-feature separation.

Conclusions
In this paper, the noise-feature separation capabilities of multiscale representation of data are exploited to improve the prediction accuracy of latent variable regression models by presenting a multiscale latent variable regression (MSLVR) modeling algorithm that enhances the robustness of these models to measurement noise in the data. The MSLVR algorithm integrates modeling and filtering by decomposing the input-output data and using the scaled signal approximations to construct different latent variable regression models at different scales. Then, among all scales, the model that minimizes a cross-validation mean square error criterion is selected as the optimum model. Through a simulated example, the models estimated using the developed MSLVR modeling algorithm are shown to outperform their time-domain counterparts and modeling prefiltered data, which clearly shows the advantages of integrating filtering and model estimation on the prediction of the estimated models.

Figure 1: A schematic diagram of data representation at multiple scales.

Figure 2: The performance of MS PCR modeling at multiple scales for the case where the data SNR = 10 (dashed line: noise-free output; solid line: predicted output).

Figure 3: The performance of MS PLS modeling at multiple scales for the case where the data SNR = 10 (dashed line: noise-free output; solid line: predicted output).

Table 1: Comparison between the output prediction mean square errors for the various modeling methods.

Table 2: Comparison between the output prediction mean square errors obtained by MSLVR at multiple scales (the numbers in parentheses indicate the percentages at which each scale is selected as optimum using the CVMSE criterion).