Reduced Noise Effect in Nonlinear Model Estimation Using Multiscale Representation

Nonlinear process models are widely used in various applications. In the absence of fundamental models, one usually relies on empirical models, which are estimated from measurements of the process variables. Unfortunately, measured data are usually corrupted with measurement noise that degrades the accuracy of the estimated models. Multiscale wavelet-based representation of data has been shown to be a powerful data analysis and feature extraction tool. In this paper, these characteristics of multiscale representation are utilized to improve the estimation accuracy of the linear-in-the-parameters nonlinear model by developing a multiscale nonlinear (MSNL) modeling algorithm. The main idea in this MSNL modeling algorithm is to decompose the data at multiple scales, construct a nonlinear model at each scale, and then select among all scales the model that best describes the process. The main advantage of the developed algorithm is that it integrates modeling and feature extraction to improve the robustness of the estimated model to the presence of measurement noise in the data. This advantage of MSNL modeling is demonstrated using a nonlinear reactor model.


Introduction
Process models are a core element in many process operations, such as process control and optimization [1,2], and the accuracy of these models has a direct impact on the quality of these operations and ultimately on the overall performance of the process. Therefore, improving the accuracy of the models used in these operations is always desirable. Since fundamental models are not always available, in many cases one relies on deriving empirical or semiempirical models from measurements of the process variables. However, data-driven approaches for model estimation are associated with many challenges, which include defining the model structure and accounting for the presence of measurement noise in the data. The objective of this work is to improve the prediction accuracy of the well-known class of nonlinear, but linear-in-the-parameters, process models using multiscale representation to account for the presence of measurement noise in the data.
The presence of measurement noise, even in small amounts, can significantly affect the prediction accuracy of the estimated model. Therefore, such noise needs to be filtered to improve the model's prediction. Modeling of prefiltered measured data does not usually provide satisfactory performance [3]. This is because applying data filtering without taking into account the input-output relationship may result in the removal of certain features from the data which are important for the model. Therefore, filtering and modeling need to be integrated for satisfactory model estimation.
Unfortunately, measured data usually have a multiscale character, which means that they contain features and noise that have varying contributions over both time and frequency [4]. For example, an abrupt change in the data spans a wide range in the frequency domain and a small range in the time domain, while a slow change spans a wide range in the time domain and a small range in the frequency domain. Filtering such data using conventional low pass filters usually does not result in a good noise-feature separation because these filtering techniques classify noise as high-frequency features and filter the data by removing features with frequency higher than a defined frequency threshold. Thus, modeling multiscale data requires developing multiscale modeling techniques that account for this multiscale nature of the data.
Many investigators have used multiscale techniques to improve the accuracy of estimated empirical models [3, 5-9]. For example, the authors in [5] showed how to use wavelet representation to design wavelet prefilters for process modeling purposes. In [3], the author discussed some of the advantages of using multiscale representation in empirical modeling, and in [6], he enhanced the noise removal ability of the Principal Component Analysis (PCA) model by constructing multiscale PCA models, which he also used in process monitoring. Also, the authors in [7-9] used multiscale representation to reduce collinearity and shrink the large variations in the Finite Impulse Response (FIR) model parameters. Furthermore, in [10], the authors used wavelets as modulating functions for control-related system identification. Finally, the authors in [11,12] used wavelet-based representation to enhance the accuracy and parsimony of the Autoregressive with exogenous variable (ARX) model and the Takagi-Sugeno fuzzy model.
In this work, multiscale representation of data is utilized to improve the prediction accuracy of nonlinear models by developing a multiscale nonlinear (MSNL) process modeling algorithm that accounts for the presence of noise in the data. The developed MSNL modeling algorithm integrates modeling and data filtering by constructing multiple nonlinear models at multiple scales using the scaled signal approximations of the input and output data and then selecting, among all scales, the model that provides the optimum prediction and maximum noise-feature separation.
The rest of this paper is organized as follows. In Section 2, the problem statement and the formulation and estimation of the linear-in-the-parameters nonlinear model are introduced, followed by a description of the wavelet-based multiscale representation of data in Section 3. Then, in Section 4, the formulation, estimation, and algorithm for MSNL modeling are described. In Section 5, the performance of the developed MSNL modeling algorithm is illustrated and compared with that of the time-domain method. Finally, the paper is concluded with a few remarks in Section 6.

Problem Statement
In this work, we address the problem of empirically estimating (from measurements of the input and output variables) linear-in-the-parameters nonlinear models that are less affected by the presence of measurement noise in the data. Given measurements of the input-output data, that is, $\{u(k)\}_{k=1,\dots,n}$ and $\{y(k)\}_{k=1,\dots,n}$, where the output is assumed to be contaminated with additive zero-mean Gaussian noise (i.e., $y = \tilde{y} + \varepsilon$, where $\varepsilon \sim N(0, \sigma_\varepsilon^2)$, and the superscript "∼" denotes noise-free variables), it is desired to construct a linear-in-the-parameters nonlinear model of the form
$$y(k+1) = \sum_{i} \beta_i\, f_i\big(z_i(k), \theta_i\big), \tag{1}$$
where $y(k+1)$ is the process output at time step $(k+1)$, $f_i(z_i(k), \theta_i)$ is the $i$th basis function, which depends on the vector $z_i(k)$ of lagged outputs and inputs and on the parameter vector $\theta_i$, where these basis functions are assumed to be known, and $\beta_i$ is the model parameter corresponding to the $i$th basis function. Also, $p_i$ and $q_i$ are the numbers of lagged outputs and inputs used in the model, respectively.

Nonlinear Model Estimation.
The nonlinear model shown in (1) can also be represented in matrix form as
$$Y = X\beta, \tag{2}$$
where $Y$ is the vector of outputs $y(k+1)$, $X$ is the information matrix whose $i$th column contains the basis function $f_i(z_i(k), \theta_i)$ evaluated at the data, and $\beta = [\beta_1\ \beta_2\ \cdots]^T$ is the parameter vector, which can be estimated using ordinary least squares (OLS) regression as follows:
$$\hat{\beta} = (X^T X)^{-1} X^T Y. \tag{4}$$
Note that since OLS minimizes the mean square error of the output prediction when estimating the model parameters, it implicitly assumes that all variables in the information matrix, $X$, are noise-free. In the nonlinear model given in (2), however, past outputs are also a part of the information matrix. Therefore, the accuracy of the model prediction can be affected by the presence of measurement noise in the data, especially in large amounts. This is because OLS will provide a biased estimate of the parameters, which degrades the accuracy of the model's prediction. More details about bias and its effect on the model's prediction can be found in [13]. In this paper, an alternative modeling approach will be developed to reduce the effect of measurement noise in the data on the prediction accuracy of the estimated model. This approach will utilize multiscale wavelet-based representation of data, which is introduced next.
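The estimation step in (4) and the bias caused by noisy lagged outputs can be illustrated with a minimal sketch. The basis set ($y(k)$, $y(k)^2$, $u(k)$), the parameter values, and the helper names below are assumptions chosen for illustration only; they are not taken from the paper.

```python
import numpy as np

def information_matrix(y, u):
    """Stack the basis functions evaluated at time k as the columns of X.

    Illustrative basis set (an assumption): y(k), y(k)^2 and u(k).
    """
    return np.column_stack([y[:-1], y[:-1] ** 2, u[:-1]])

def estimate_ols(y, u):
    """Least-squares estimate of beta in Y = X beta, where Y holds y(k+1)."""
    X = information_matrix(y, u)
    Y = y[1:]
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return beta

def simulate(beta, u, y0=0.0):
    """Generate the noise-free output of the illustrative model."""
    y = np.empty(len(u))
    y[0] = y0
    for k in range(len(u) - 1):
        y[k + 1] = beta[0] * y[k] + beta[1] * y[k] ** 2 + beta[2] * u[k]
    return y

rng = np.random.default_rng(0)
beta_true = np.array([0.9, -0.05, 0.1])      # hypothetical parameter values
u = 2.0 * rng.integers(0, 2, size=2000)      # binary input switching between 0 and 2
y_clean = simulate(beta_true, u)
y_noisy = y_clean + rng.normal(0.0, 0.15, size=y_clean.size)

beta_clean = estimate_ols(y_clean, u)        # noise-free data: recovers beta_true
beta_noisy = estimate_ols(y_noisy, u)        # noisy y(k) enters X, so the estimate degrades
```

With noise-free data the least-squares solution reproduces the true parameters to numerical precision, whereas the noisy lagged outputs in `X` move the estimate away from them, which is the effect the multiscale approach aims to reduce.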

Multiscale Representation of Data
A proper analysis of real data requires their representation at multiple scales. This can be achieved by expressing the data as a weighted sum of orthonormal basis functions that are localized in both time and frequency, such as wavelets. Wavelets are a computationally efficient family of multiscale basis functions. A signal can be represented at multiple resolutions by decomposing it on a family of wavelets and scaling functions, as illustrated in Figure 1. The scaled signals, which are increasingly coarser approximations of the original signal, are determined by projecting the signal on a set of orthonormal scaling functions of the form
$$\phi_{jk}(t) = \sqrt{2^{-j}}\,\phi\big(2^{-j}t - k\big), \tag{5}$$
or equivalently by filtering the signal using a low pass filter of length $r$, $h = [h_1\ h_2\ \cdots\ h_r]$, derived from the scaling functions [14]. On the other hand, the signals in Figures 1(c), 1(e), and 1(g), which are called the detail signals, capture the differences between any scaled signal and the scaled signal at the finer scale. These detail signals are determined by projecting the signal on a set of wavelet basis functions of the form
$$\psi_{jk}(t) = \sqrt{2^{-j}}\,\psi\big(2^{-j}t - k\big), \tag{6}$$
or equivalently by filtering the scaled signal at the finer scale using a high pass filter of length $r$ derived from the wavelet basis functions. Therefore, the original signal can be represented as the sum of all detail signals at all scales and the scaled signal at the coarsest scale as follows:
$$x(t) = \sum_{k=1}^{n2^{-J}} a_{Jk}\,\phi_{Jk}(t) + \sum_{j=1}^{J}\sum_{k=1}^{n2^{-j}} d_{jk}\,\psi_{jk}(t), \tag{7}$$
where $j$, $k$, $J$, and $n$ are the dilation parameter, translation parameter, maximum number of scales (or decomposition depth), and the length of the original signal, respectively [15,16]. Fast wavelet transform algorithms of $O(n)$ complexity for a discrete signal of dyadic length (i.e., of length $2^j$, where $j$ is a positive integer) have been developed [14]. For example, the wavelet and scaling function coefficients at a particular scale $(j)$, $d_j$ and $a_j$, can be computed in a compact fashion by multiplying the scaling coefficient vector at the finer scale, $a_{j-1}$, by the matrices $G_j$ and $H_j$, respectively, that is,
$$a_j = H_j\, a_{j-1}, \qquad d_j = G_j\, a_{j-1}, \tag{8}$$
where $H_j$ and $G_j$ are built from the low pass and high pass filter coefficients, respectively. Note that the length of the scaling and detail signals decreases dyadically at coarser resolutions (higher $j$). In other words, the length of the scaled signal at scale $(j)$ is half the length of the scaled signal at the finer scale, $(j-1)$. This is due to downsampling, which is used in the discrete wavelet transform. To illustrate the multiscale decomposition procedure and to introduce some terminology, consider the following discrete signal, $Y_0$, of length $n$ in the time domain (i.e., $j = 0$):
$$Y_0 = [\,y_0(1)\ \cdots\ y_0(k)\ \cdots\ y_0(n)\,]^T. \tag{11}$$
The scaled signal approximation of $Y_0$ at scale $(j)$, which can be written as
$$Y_j = [\,y_j(1)\ \cdots\ y_j(k)\ \cdots\ y_j(n2^{-j})\,]^T, \tag{12}$$
can be computed as follows:
$$Y_j = H_j H_{j-1} \cdots H_1\, Y_0. \tag{13}$$
Note that this decomposition algorithm is batch, that is, it requires the availability of the entire data set beforehand.
An online wavelet decomposition algorithm has also been developed and used in data filtering [17].
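As a rough sketch of the batch decomposition described above, the following code computes the scaled and detail signals with the Haar filter pair, for which filtering and dyadic downsampling reduce to pairwise sums and differences. The function names are hypothetical, and the sign convention of the detail coefficients is one common choice among several.

```python
import numpy as np

def haar_step(a_prev):
    """One decomposition level: low/high pass filtering followed by downsampling."""
    even, odd = a_prev[0::2], a_prev[1::2]
    a = (even + odd) / np.sqrt(2.0)   # scaled signal (approximation) at the coarser scale
    d = (even - odd) / np.sqrt(2.0)   # detail signal at the coarser scale
    return a, d

def haar_decompose(signal, depth):
    """Return the scaled signals and detail signals at scales 1..depth."""
    a = np.asarray(signal, dtype=float)
    if len(a) % (2 ** depth) != 0:
        raise ValueError("signal length must be divisible by 2**depth")
    approximations, details = [], []
    for _ in range(depth):
        a, d = haar_step(a)
        approximations.append(a)
        details.append(d)
    return approximations, details

def haar_reconstruct(a_coarsest, details):
    """Invert the decomposition from the coarsest scaled signal and all details."""
    a = np.asarray(a_coarsest, dtype=float)
    for d in reversed(details):
        finer = np.empty(2 * len(a))
        finer[0::2] = (a + d) / np.sqrt(2.0)
        finer[1::2] = (a - d) / np.sqrt(2.0)
        a = finer
    return a

y0 = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
approximations, details = haar_decompose(y0, depth=3)
y0_back = haar_reconstruct(approximations[-1], details)
```

Each call to `haar_step` plays the role of multiplying by the matrices $H_j$ and $G_j$: the signal length halves at every level, and the original signal is recovered exactly from the coarsest scaled signal plus all detail signals.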

Multiscale Nonlinear Process Modeling
In this section, the feature extraction abilities of multiscale representation of data are utilized to construct multiscale nonlinear models that are less affected by the presence of noise in the data. The main idea is to decompose the input-output data at multiple scales and construct a nonlinear model at each scale using the scaled signal approximations of the data. Then, among all scales, the optimum nonlinear model is selected as the one that provides the best prediction and noise-feature separation.
The ability of multiscale representation to extract important features from data can be verified by computing the signal-to-noise ratio (SNR) of the scaled signal at multiple scales. Theoretically, the SNR at any scale can be computed as follows:
$$\mathrm{SNR}(j) = \frac{\operatorname{var}(\tilde{y}_j)}{\operatorname{var}(y_j - \tilde{y}_j)}, \tag{14}$$
where $\tilde{y}_j$ is the noise-free scaled signal representation of the data at scale $(j)$. It can be empirically illustrated that the SNR of the scaled signals peaks at some intermediate scale, which can be explained as follows. Over the first few (fine) scales, high-frequency noise gets filtered out, which decreases the noise content and increases the SNR. However, at very coarse scales, important features start getting removed, which decreases the signal content and decreases the SNR. Therefore, there is an intermediate scale at which the SNR peaks. This observation will be used later to estimate the optimum modeling scale. Another characteristic of multiscale representation is that correlated noise gets decorrelated at multiple scales [4]. This gives another advantage to multiscale models over conventional ones.
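The SNR behavior across scales can be checked numerically with a small sketch. It assumes Haar scaled signals, a hypothetical slow sinusoidal feature with a period of 64 samples, and white Gaussian noise; none of these choices come from the paper.

```python
import numpy as np

def haar_scaled_signal(x, scale):
    """Haar scaled signal (approximation) of x at the given scale."""
    a = np.asarray(x, dtype=float)
    for _ in range(scale):
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    return a

def snr_per_scale(y_noisy, y_clean, max_scale):
    """SNR(j) = var(noise-free scaled signal) / var(noise in the scaled signal)."""
    snr = []
    for j in range(max_scale + 1):
        yj = haar_scaled_signal(y_noisy, j)
        yj_clean = haar_scaled_signal(y_clean, j)
        snr.append(np.var(yj_clean) / np.var(yj - yj_clean))
    return np.array(snr)

rng = np.random.default_rng(1)
n = 1024
t = np.arange(n)
y_clean = np.sin(2.0 * np.pi * t / 64.0)            # hypothetical slow feature
y_noisy = y_clean + rng.normal(0.0, 0.5, size=n)    # additive white Gaussian noise

snr = snr_per_scale(y_noisy, y_clean, max_scale=6)
best_scale = int(np.argmax(snr))                    # scale with the largest SNR
```

In this setup the SNR grows over the first scales, because the noise variance of the scaled signal stays roughly constant while the feature is still preserved, and it collapses at scale 6, where each coefficient averages a full period of the sinusoid and the feature is removed.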

MSNL Model Representation.
Having computed the scaled signal approximations of the input-output data at multiple scales as described in Section 3, nonlinear models of the form
$$y_j(k+1) = \sum_{i} \beta_{ij}\, f_{ij}\big(z_{ij}(k), \theta_{ij}\big) \tag{15}$$
can be constructed at each scale $(j)$ using these scaled signals of the input and output data, $u_j$ and $y_j$. Note here that the basis functions used at scale $(j)$, that is, $f_{ij}(z_{ij}(k), \theta_{ij})$, are not the same basis functions used in the time domain, $f_i(z_i(k), \theta_i)$. This is because of the different transformations used to compute the scaled signals at different scales. Note, however, that the form of the model's basis functions at any scale can be defined in a similar fashion to that used to dilate the wavelet and scaling functions shown in (5) and (6). That is, for the basis function $f_i(z_i(k), \theta_i)$ in the time domain, the equivalent basis function at any scale $(j)$, $f_{ij}(z_{ij}(k), \theta_{ij})$, is the dilated version of $f_i$ defined in (16). This effect of basis function dilation at multiple scales is illustrated in Figure 2, which shows that for a given time-domain ($j = 0$) nonlinear basis function, $f_{i0}(\cdot)$, the dilated basis function, $f_{ij}(z_{ij}(k), \theta_{ij})$, is stretched at coarser scales (larger $j$) to account for the dyadic downsampling used in multiscale representation. The nonlinear model at scale $(j)$, referred to as (17), is then obtained by combining (15) and (16).

MSNL Model Estimation.
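The nonlinear model at scale $(j)$, shown in (17), can also be written in matrix form as
$$Y_j = X_j\, \beta_j, \tag{18}$$
where $Y_j$, $X_j$, and $\beta_j$ are the output vector, the information matrix, and the parameter vector at scale $(j)$, respectively, defined as in the time-domain model (2). Therefore, the model parameters at scale $(j)$ can be estimated using OLS as shown in (4) for the time-domain model.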

MSNL Modeling Algorithm.
Based on the above discussion on the nonlinear model representation and estimation at multiple scales, the following algorithm is proposed for multiscale nonlinear (MSNL) identification:
(1) decompose the input/output data at multiple scales using the wavelet filter of choice;
(2) at each scale, using the input/output scaled signals,
    (a) define the structure of the model's basis functions at that scale, given the basis functions in the time domain, using (16),
    (b) express the model in matrix form as shown in (18),
    (c) estimate the nonlinear model using OLS and predict the output at that scale,
    (d) compute the output prediction SNR as $\mathrm{SNR}(j) = \operatorname{var}(\hat{y}_j)/\operatorname{var}(y_j - \hat{y}_j)$, where $\hat{y}_j$ is the predicted output at scale $(j)$;
(3) select the optimum model among all scales by choosing the one with the maximum output prediction SNR;
(4) for the optimum model, reconstruct the predicted output back to the time domain.
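The steps above can be sketched in code under several simplifying assumptions: the Haar filter is used for the decomposition, the same illustrative basis set ($y_j(k)$, $y_j(k)^2$, $u_j(k)$) is reused at every scale with re-estimated parameters in place of the dilated basis functions of (16), and the simulated process and all function names are hypothetical. This is not the authors' implementation.

```python
import numpy as np

def haar_scaled_signal(x, scale):
    """Step (1): Haar scaled signal of x at the given scale."""
    a = np.asarray(x, dtype=float)
    for _ in range(scale):
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    return a

def fit_and_predict(yj, uj):
    """Steps (2b)-(2c): OLS fit and one-step-ahead prediction at one scale."""
    X = np.column_stack([yj[:-1], yj[:-1] ** 2, uj[:-1]])   # assumed basis set
    beta, *_ = np.linalg.lstsq(X, yj[1:], rcond=None)
    return beta, X @ beta

def to_time_domain(pred_j, scale):
    """Step (4): Haar reconstruction with all detail signals set to zero."""
    return np.repeat(pred_j, 2 ** scale) / np.sqrt(2.0) ** scale

def msnl(y, u, max_scale):
    """Fit one model per scale and select the one with the largest prediction SNR."""
    results = []
    for j in range(max_scale + 1):
        yj, uj = haar_scaled_signal(y, j), haar_scaled_signal(u, j)
        beta, pred = fit_and_predict(yj, uj)
        snr = np.var(pred) / np.var(yj[1:] - pred)           # step (2d)
        # The reconstructed prediction aligns with y[2**j:] in the time domain.
        results.append({"scale": j, "beta": beta, "snr": snr,
                        "prediction": to_time_domain(pred, j)})
    best = max(results, key=lambda r: r["snr"])              # step (3)
    return best, results

# Hypothetical process used only to exercise the sketch.
rng = np.random.default_rng(2)
n = 512
u = np.repeat(2.0 * rng.integers(0, 2, size=n // 16), 16)    # held binary input
y_clean = np.zeros(n)
for k in range(n - 1):
    y_clean[k + 1] = 0.95 * y_clean[k] - 0.01 * y_clean[k] ** 2 + 0.05 * u[k]
y = y_clean + rng.normal(0.0, 0.15, size=n)

best, all_scales = msnl(y, u, max_scale=4)
```

The selected scale is available as `best["scale"]`, and `best["prediction"]` holds the time-domain prediction, which is piecewise constant over blocks of $2^j$ samples because of the dyadic reconstruction.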

Illustrative Example
In this section, the advantages of the MSNL modeling algorithm are illustrated through a simulated example. The model used in this example relates the concentration of the inlet stream to a stirred tank reactor (input) to the exit stream concentration (output). Let the inlet and exit stream concentrations of a species "A" be $C_{Ai}$ and $C_A$, respectively, and let the flow rates in and out of the reactor be $q$ (see Figure 3). If species "A" is converted into "B" in the reactor according to the reaction 2A → B, with a reaction rate of $r = k_r C_A^2$, and the reactor volume is constant and equals $V$, the rate of change of species "A" can be expressed using a mass balance as follows:
$$\frac{dC_A}{dt} = \frac{q}{V}\,C_{Ai} - \frac{q}{V}\,C_A - k_r C_A^2. \tag{20}$$
Discretizing the model shown in (20) with a sampling time interval of $\Delta t$, we obtain the following nonlinear discrete model:
$$C_A(k+1) = \left(1 - \frac{q\,\Delta t}{V}\right) C_A(k) - k_r\,\Delta t\, C_A^2(k) + \frac{q\,\Delta t}{V}\, C_{Ai}(k), \tag{21}$$
where $C_{Ai}$ and $C_A$ are the model input and output, respectively. Assuming the following values for the parameters: $q = 1$ L/s, $k_r = 5 \times 10^{-5}$ mole/(L.s), $V = 100$ L, and $\Delta t = 1$ s, the above model is used to generate data by applying a PRBS input changing between 0 and 2, and the output, which is assumed to be noise-free, is then contaminated with zero-mean Gaussian noise. Different levels of noise (standard deviation $\sigma_\varepsilon$ = 0.05, 0.1, and 0.15, which approximately correspond to output signal-to-noise ratios of 50, 10, and 5) are used to test the robustness of the MSNL modeling algorithm. A 500-sample input-output data set, where $\sigma_\varepsilon = 0.15$, is shown in Figure 4.
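A data set of this kind can be generated with a short simulation. The sketch below assumes a forward-Euler discretization in which species A is consumed at the rate $k_r C_A^2$, a zero initial concentration, and a binary input held for 25 samples between switches; the hold time, the initial condition, and the function name are assumptions, since the text does not specify them.

```python
import numpy as np

def simulate_reactor(c_in, q=1.0, k_r=5e-5, volume=100.0, dt=1.0, c0=0.0):
    """Forward-Euler simulation of the exit concentration C_A (assumed discretization)."""
    c = np.empty(len(c_in))
    c[0] = c0                                        # assumed initial concentration
    for k in range(len(c_in) - 1):
        rate = (q / volume) * (c_in[k] - c[k]) - k_r * c[k] ** 2
        c[k + 1] = c[k] + dt * rate
    return c

rng = np.random.default_rng(3)
n, hold = 500, 25                                    # hold time is an assumption
c_in = np.repeat(2.0 * rng.integers(0, 2, size=n // hold), hold)   # input between 0 and 2
c_clean = simulate_reactor(c_in)                     # noise-free output
c_noisy = c_clean + rng.normal(0.0, 0.15, size=n)    # sigma = 0.15 noise level
```

Repeating the last line with standard deviations of 0.05 and 0.1 gives the other two noise levels considered in the example.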
The performance of the MSNL algorithm is compared to that of its time-domain counterpart by comparing their prediction mean square errors with respect to the noise-free output, that is,
$$\mathrm{MSE} = \frac{1}{n}\sum_{k=1}^{n}\big(\hat{y}(k) - \tilde{y}(k)\big)^2, \tag{22}$$
where $\hat{y}$ and $\tilde{y}$ are the predicted and noise-free outputs, respectively. Note that such a comparison is possible in this simulated example because the noise-free output is known. Also, in this example, the Haar wavelet and scaling functions are used in the multiscale representation of the data. Note that smoother wavelet filters (e.g., Daubechies) may provide better performance in the case of smoother data. To make statistically valid conclusions about the performances of the various modeling methods, a Monte Carlo simulation of 1000 realizations is performed, and the results are presented in Tables 1 and 2. Table 1 shows that MSNL models achieve a noticeable improvement over those estimated using the raw data (in the time domain) and that this improvement increases for larger noise contents. This improvement can also be seen in Figure 5, which shows the advantage of constructing nonlinear models at multiple scales.
Figure 5 shows the change in the models' accuracies at different scales. It can be seen that prediction accuracy improves at coarser scales but only up to a certain level, beyond which the quality of the estimated model deteriorates. In other words, the accuracy improves from the time domain up to scale 3, but starting at scale 4, the quality of the model deteriorates. This can also be noted from Table 2, which reports the MSE of the estimated models at different scales and shows that there is an intermediate scale at which MSNL modeling is the best. This is because at very coarse scales, features in the data (which are important to the model) get eliminated and thus affect the model's quality. That is why it is very important to select the optimum scale for modeling. Table 2 also presents (in parentheses) the percentages at which each scale is selected as optimum using the SNR criterion and shows that the optimum scale increases or gets coarser (higher $j$) for higher noise levels. This makes sense because, for higher noise levels, more filtering is needed for good noise-feature separation. Also, note that a multiscale model estimated at a particular scale $(j)$ will update its prediction every $2^j$ samples. This is because of the dyadic upsampling performed during the reconstruction of the model prediction, that is, one time sample prediction at scale $(j)$ corresponds to $2^j$ samples in the time domain.

Conclusions
In this paper, the noise-feature separation capabilities of multiscale representation of data are exploited to improve the estimation and prediction accuracy of the linear-in-the-parameters nonlinear model, by presenting a multiscale nonlinear (MSNL) modeling algorithm with enhanced robustness to measurement noise in the data. The MSNL modeling algorithm integrates modeling and filtering by decomposing the input-output data and using the scaled signal approximations to construct different nonlinear models at different scales. Then, among all scales, the model with the largest output prediction SNR is selected as the optimum MSNL model. Finally, the performance of the MSNL algorithm is illustrated

Figure 1: A schematic diagram of data representation at multiple scales.

Figure 2: Dilation of basis functions at multiple scales.

Figure 3: A schematic diagram of a stirred tank reactor.

Table 1: Comparison between the prediction mean square errors of the multiscale and time-domain modeling methods.

Table 2: Comparison between the prediction mean square errors at multiple scales.