Research Article

Optimal Modeling and Filtering of Stochastic Time Series for Geoscience Applications

Article deposited according to Hindawi open access publishing policy for Mathematical Problems in Engineering: http://www.hindawi.com/oa/ [July 23, 2013].


Introduction
Consider a stochastic process or time series {X(t), −∞ < t < ∞} which has been observed or measured as {Y(t), −∞ < t < ∞}, in practice most often at equispaced discrete time or space intervals, taken as Δt = 1 for simplicity, over some finite domain. In practice, it is also often convenient to assume these processes or time series to be centered, that is, with zero mean value or expectation, following appropriate trend modeling in increment-stationary situations. When these observations (or measurements) are of Gaussian processes, their second moments or covariances then fully specify the processes. Otherwise, it can only be assumed that the covariance functions are sufficient for the following modeling and filtering operations. Specifically, it is assumed that these covariance functions have positive Fourier transforms or power spectra for realizability.
In general, the optimal estimate of X(t) from observations Y(s) is given by the conditional expectation E[X(t) | Y(s)], in which the conditional expectation of a random variable X given an observation Y = v is defined using the conditional density f(x | v); that is,
\[ \mathrm{E}[X \mid Y = v] = \int x\, f(x \mid v)\, dx, \qquad f(x \mid v) = \frac{f(x, v)}{f_Y(v)}, \]
for the joint density f(x, v) with positive marginal density f_Y(v). These quantities can readily be generalized to several variables. The previous expression is also called the conditional mean, with corresponding conditional variance
\[ \mathrm{Var}[X \mid Y = v] = \mathrm{E}\big[(X - \mathrm{E}[X \mid Y = v])(X - \mathrm{E}[X \mid Y = v])^{*} \mid Y = v\big], \]
which is commonly used in optimization, with * denoting conjugate transpose, or simply transpose in real cases. For the random variable Y corresponding to the observations, E[X | Y] and Var[X | Y] are then the conditional expectation for the mean and variance of X given Y.

Such a conditional expectation E[X(t) | Y(s)] can easily be seen to be an optimal estimate,
\[ \hat X(t) = \mathrm{E}[X(t) \mid Y(s)], \]
with variance Var[X(t) | Y(s)], as expected. For any other estimator Z(t), that is, any function of Y(s), it is straightforward to verify that
\[ \mathrm{E}\big[(X(t) - \hat X(t))^2\big] \le \mathrm{E}\big[(X(t) - Z(t))^2\big] \]
for the mean-square errors (MSEs); that is, the conditional-expectation estimator E[X(t) | Y(s)] is optimal in the mean-square sense. For any unbiased estimator Z(t), the corresponding relationship holds in the least-squares sense, with variances replacing the mean-square errors. More discussion of these relationships can be found, for example, in [1].
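The mean-square optimality of the conditional expectation can be checked numerically. The following is a minimal sketch in a jointly Gaussian setting, where E[X | Y] has the known closed form (C_XY/C_YY)·Y for centered variables; the correlation value 0.8, the competing estimator 0.5·Y, and the sample size are illustrative choices, not from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Jointly Gaussian centered (X, Y) with correlation 0.8; here the conditional
# expectation E[X | Y] has the closed form (C_xy / C_yy) * Y.
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=200_000).T

x_hat = (cov[0, 1] / cov[1, 1]) * y        # conditional-expectation estimator
z = 0.5 * y                                # some other (linear) estimator of X

mse_cond = np.mean((x - x_hat) ** 2)       # theory: 1 - 0.8**2 = 0.36
mse_other = np.mean((x - z) ** 2)          # theory: 1 + 0.25 - 0.8 = 0.45
print(mse_cond, mse_other)
```

Any estimator other than the conditional expectation yields a strictly larger empirical mean-square error, in agreement with the inequality above.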
In the linear context, the conditional expectation takes the form
\[ \hat X(t) = \int h(t,s)\, Y(s)\, ds \]
for some unknown kernel h(t,s), or, with discrete data,
\[ \hat X(t) = \sum_{s} h(t,s)\, Y(s), \]
that is, simply an unknown linear combination of the available data. In practice, only finite data are available, so that the previous limits are finite; furthermore, in causal cases, only past observations are available for processing. To simplify the presentation in the following, the inner-product notation
\[ \langle h(t,s),\, Y(s) \rangle = \int h(t,s)\, Y(s)\, ds \quad\text{or}\quad \sum_{s} h(t,s)\, Y(s) \]
will be used for the continuous and discrete applications whenever appropriate. The superscript * stands for conjugate transpose, although only real applications are considered herein. The objective in the following is to estimate X(t) optimally from the available data Y(s) using minimal assumptions. Using the above-discussed property of the conditional-expectation estimator E[X(t) | Y(s)], one can write the implied orthogonality of the residuals to the observations, that is,
\[ \mathrm{E}\big[(X(t) - \hat X(t))\, Y^{*}(u)\big] = 0, \]
that is, using the linear transformation ⟨h(t,s), Y(s)⟩ for X̂(t),
\[ \mathrm{E}[X(t)\, Y^{*}(u)] = \langle h(t,s),\, \mathrm{E}[Y(s)\, Y^{*}(u)] \rangle, \]
which is usually interpreted as the cross-covariance between X(t) and Y(u) being equal to the autocovariance of Y(s) modified by the unknown kernel h(t,s), assuming E[Y(s)] = 0; equivalently, using the cross-covariance C_XY(t,u) and the (symmetric real positive definite) covariance C_YY(s,u), often written as C_Y(s,u) in practice,
\[ \langle h(t,s),\, C_{YY}(s,u) \rangle = C_{XY}(t,u). \]
These normal equations are the well-known Wiener-Hopf equations, which are Fredholm integral equations (of the first kind) in the continuous context. In cases of stationarity, the unknown kernel h(t,s) becomes h(t−s), which will be seen to greatly simplify the situation in practice in terms of computations.
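In the discrete finite-data case, the Wiener-Hopf normal equations are simply a linear system in the kernel values. The following sketch assumes an illustrative exponential signal covariance and additive white observation noise (model, grid size, and noise level are chosen for the example only), solves for the optimal kernel, and checks the result by Monte Carlo simulation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64
t = np.arange(n)

# Assumed illustrative model: C_XX(s) = exp(-|s|/10), Y = X + white noise (var 0.25).
C_xx = np.exp(-np.abs(t[:, None] - t[None, :]) / 10.0)
C_yy = C_xx + 0.25 * np.eye(n)

# Discrete Wiener-Hopf normal equations <h(t0,s), C_YY(s,u)> = C_XY(t0,u):
t0 = 32
c_xy = C_xx[t0]                           # Cov[X(t0), Y(u)] = C_XX(t0, u)
h = np.linalg.solve(C_yy, c_xy)           # optimal linear kernel h(t0, .)

# Monte Carlo check: empirical MSE matches C_XX(t0,t0) - h . c_xy.
L = np.linalg.cholesky(C_xx + 1e-9 * np.eye(n))
x = L @ rng.standard_normal((n, 20_000))
y = x + 0.5 * rng.standard_normal((n, 20_000))
mse_emp = np.mean((x[t0] - h @ y) ** 2)
mse_theory = C_xx[t0, t0] - h @ c_xy
print(mse_emp, mse_theory)
```

The empirical mean-square error of the linear estimate agrees with the theoretical value implied by the normal equations.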
It should be mentioned that the previous interchange of the order of the linear expectation operation and the inner products needs to be done with care in continuous applications with infinite limits. For discrete applications with finite limits, however, the situation is much simpler and the interchange is usually carried out without second thought. More discussion of those aspects can be found in, for example, the works of Dougherty [1] and Pugachev [2], with references in functional analysis.
The general nonlinear solution methodology will also be briefly overviewed in the Gaussian context, with some of its practical limitations. These different formulations are discussed with emphasis on the properties of the covariance matrices and are illustrated with numerical examples and references. General recommendations are included for practical geoscience applications.

Fourier Transform Approaches
The Wiener-Hopf equations, written as ⟨h(t,s), C_YY(s,u)⟩ = C_XY(t,u) for unknown h(t,s), are trivial for white noise, that is, with C_XY(t,u) ≡ C_YY(t,u) ≡ δ(t−u), the Dirac delta function, implying that h(t,s) ≡ δ(t−s) = C_XY(t,s) ≡ C_XY(t−s) in such cases. However, when the white noise has different intensities in different parts of the domain, this simple formulation cannot be used, and the subsequent sections will deal with the more general situation. When the covariance kernel C_YY(s,u) can be written as C_YY(s−u), meaning that the covariance depends only on the spacing between its arguments, it can be diagonalized using a Fourier transform. The corresponding covariance matrix is Toeplitz; that is, its diagonals are constant in terms of i−j for row index i and column index j with gridded data. Notice that the discrete Fourier transform (DFT) of a finite Toeplitz matrix is not necessarily diagonal unless the matrix is circulant, as will be illustrated in Section 6.
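The distinction between Toeplitz and circulant matrices under the DFT can be verified directly. The following sketch builds a Toeplitz covariance matrix from an illustrative exponential model and its circulant (wrap-around) analogue, and measures the off-diagonal content of the DFT spectral matrix F C F* in each case.

```python
import numpy as np

n = 32
idx = np.arange(n)
F = np.fft.fft(np.eye(n)) / np.sqrt(n)           # unitary DFT matrix

# Toeplitz covariance from C(s) = exp(-|s|/4): constant diagonals, not circulant.
toep = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 4.0)

# Circulant analogue: lags wrap around the (periodic) domain.
c = np.exp(-np.minimum(idx, n - idx) / 4.0)
circ = c[(idx[None, :] - idx[:, None]) % n]

def offdiag(M):
    """Frobenius norm of the off-diagonal part."""
    return np.linalg.norm(M - np.diag(np.diag(M)))

spec_circ = F @ circ @ F.conj().T                # exactly diagonal
spec_toep = F @ toep @ F.conj().T                # not diagonal
print(offdiag(spec_circ), offdiag(spec_toep))
```

The circulant matrix is diagonalized to machine precision, while the finite Toeplitz matrix retains substantial off-diagonal spectral content, as stated above.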
Explicitly, using a Fourier matrix F, in which F · F* = F* · F = I, the normal equations become, in the case of stationarity,
\[ \hat h(\omega)\, \hat C_{YY}(\omega) = \hat C_{XY}(\omega), \]
or
\[ \hat h(\omega) = \frac{\hat C_{XY}(\omega)}{\hat C_{YY}(\omega)} \]
for frequencies ω, obviously for Ĉ_YY(ω) ≠ 0, assuming the symmetry and positive definiteness of C_YY. This means that the problem reduces to estimating frequency components of the Y(s) observations to infer the corresponding frequency components of the X(t) process. Furthermore, in general, the transfer kernel or function ĥ(ω) is noncausal, as for each time t, it uses all of the information from −∞ < s < ∞ in the Fourier transform. Hence, in practical cases with only finite or even only past information, complications can be expected (see, e.g., the discussion in [3]).
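The per-frequency division above is the frequency-domain Wiener filter. A minimal sketch follows, on a circular (periodic) grid where the division is exact; the circular exponential covariance and the noise variance 0.5 are illustrative assumptions, and the signal realization is synthesized spectrally.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 256
lags = np.minimum(np.arange(n), n - np.arange(n))

# Circular stationary model (illustrative): C_XX(s) = exp(-|s|/8), noise var 0.5.
S_xx = np.fft.fft(np.exp(-lags / 8.0)).real      # signal power spectrum
S_xx = np.maximum(S_xx, 0.0)                     # guard tiny negative round-off
S_yy = S_xx + 0.5                                # Y = X + white noise
H = S_xx / S_yy                                  # h^(w) = C^_XY(w) / C^_YY(w)

# One realization: synthesize X spectrally, add noise, filter in the frequency domain.
x = np.fft.ifft(np.sqrt(S_xx) * np.fft.fft(rng.standard_normal(n))).real
y = x + np.sqrt(0.5) * rng.standard_normal(n)
x_hat = np.fft.ifft(H * np.fft.fft(y)).real

mse_raw = np.mean((x - y) ** 2)
mse_filt = np.mean((x - x_hat) ** 2)
print(mse_raw, mse_filt)
```

Each frequency component of the observations is attenuated by the signal-to-total-power ratio, reducing the mean-square error well below the raw noise level.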

Empirical Orthogonal Function Expansions
The covariance kernel C_YY(s,t) for the observations is generally assumed to be symmetric and positive definite for physical realizability. Its eigenfunctions are the exponential functions only in the stationary case, as F* C_YY F is then diagonal, with F* C_YY F = diag(λ_ν) or C_YY = F⁻¹ diag(λ_ν) F for the corresponding eigenvalues λ_ν. The discrete Fourier transform application, or orthogonal projection onto the exponential eigenfunctions, can be generalized to other systems of orthogonal functions in the sense of Galerkin's orthogonal projection methodology for such functional operator systems. This will be illustrated in Section 6 using Haar wavelet transforms.
Hence, for a spectrum of orthogonal eigenfunctions φ_k(t) and corresponding eigenvalues λ_k, usually ordered as λ_1 ≥ λ_2 ≥ ⋯ ≥ 0, an expansion of the random functions can be used. The eigenfunction expansions of random functions, or Karhunen-Loève transformed random variables, then have a diagonal covariance kernel or matrix. Notice that, in finite dimensions, the expansion of the covariance matrix is simply the eigenvalue expansion
\[ C = \sum_{k} \lambda_k\, \varphi_k\, \varphi_k^{*}, \]
which is a special application of the Singular Value Decomposition (SVD) (see, e.g., the works of Klema and Laub [4], Lewis [5], and Parlett [6]). The eigenfunction expansion of the covariance kernel is often referred to as Mercer's theorem in the mathematical literature. The eigenfunctions derived from the data covariance are also called empirical orthogonal functions (EOFs) and have proven most effective in signal data processing and in modeling dynamical systems, such as in oceanography and related fields (see, e.g., Kim and Wu [7], North and Cahalan [8]).
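The finite-dimensional eigenvalue expansion can be sketched numerically. The covariance model, grid size, and truncation order below are illustrative assumptions; the sketch verifies the full Mercer-type expansion and shows that a few leading EOF modes already capture most of the covariance structure.

```python
import numpy as np

n = 50
idx = np.arange(n)
C = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 5.0)   # symmetric PD covariance

# EOFs = eigenvectors; expansion C = sum_k lam_k phi_k phi_k^T (Mercer, finite dim).
lam, phi = np.linalg.eigh(C)
order = np.argsort(lam)[::-1]                            # usual ordering lam1 >= lam2 >= ...
lam, phi = lam[order], phi[:, order]

k = 10                                                   # leading modes only
C_k = (phi[:, :k] * lam[:k]) @ phi[:, :k].T
rel_err = np.linalg.norm(C - C_k) / np.linalg.norm(C)
energy = lam[:k].sum() / lam.sum()
print(rel_err, energy)
```

The truncated expansion with the leading fifth of the modes already accounts for most of the total variance, which is the basis for the computational savings mentioned in the concluding remarks.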
The mathematical situation is summarized by the well-known Karhunen-Loève theorem: let Y(t) be a zero-mean random function on T with a weight function w(t) ≥ 0 such that its covariance kernel C_YY(s,t) is integrable on T × T relative to this weight; then (i) there exist eigenvalues λ_k and eigenfunctions φ_k(t) satisfying
\[ \int_{T} C_{YY}(s,t)\, \varphi_k(s)\, w(s)\, ds = \lambda_k\, \varphi_k(t) \]
for all t, such that {φ_k(t)} is a deterministic orthonormal system on T relative to the weight function w(t), meaning that
\[ \int_{T} \varphi_j(t)\, \varphi_k(t)\, w(t)\, dt = \delta_{jk}; \]
(ii) the corresponding generalized Fourier coefficients Z_k of Y(t) relative to these eigenfunctions are uncorrelated, with Var[Z_k] = λ_k; (iii) Y(t) can then be represented as the eigenfunction expansion
\[ Y(t) = \sum_{k} Z_k\, \varphi_k(t) \]
in the mean-square sense.
This last orthogonal expansion of the random function Y(t) is known in the literature as a generalized Fourier expansion, a Karhunen-Loève expansion, or a canonical expansion. Also, the Fourier and Karhunen-Loève expansions are usually introduced in the functional-analysis context of square-integrable functions such as L²(ℝ), while the canonical-expansion context is simply a decomposition of the covariance matrix or kernel (see, e.g., the work of Dougherty [1] for more discussion).
In practice, when only the observational sequence {Y(t)} is known, the SVD can be used to obtain the eigenvectors of the corresponding covariance function without explicitly evaluating the covariance matrix. As different data matrices are generally rectangular in shape, different strategies are advisable for the SVD computations (see, e.g., the works of Björck [9] and Blais [10]). Furthermore, especially in multiple dimensions, the SVD and Principal Component Analysis (PCA) terminologies are not always consistent in the applied science and engineering literature, largely because of differing data array conventions and objectives.
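The equivalence of the two computational routes can be sketched as follows: the right singular vectors of a centered data matrix coincide (up to sign) with the eigenvectors of the sample covariance matrix, so the covariance matrix never needs to be formed. The covariance model, sample size, and seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 500, 40
idx = np.arange(n)
C = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 4.0)

# m centered realizations of the n-point series; rows are realizations.
data = rng.multivariate_normal(np.zeros(n), C, size=m)
data -= data.mean(axis=0)

# Route 1: EOFs from the explicitly formed sample covariance matrix.
S = data.T @ data / (m - 1)
lam, vecs = np.linalg.eigh(S)
lead_eig = vecs[:, np.argmax(lam)]

# Route 2: right singular vectors of the data matrix -- no covariance matrix formed.
_, sing, vt = np.linalg.svd(data, full_matrices=False)
lead_svd = vt[0]

align = abs(lead_eig @ lead_svd)            # identical up to sign
print(align, sing[0] ** 2 / (m - 1), lam.max())
```

The squared singular values, scaled by the number of degrees of freedom, reproduce the sample eigenvalues, which is why the SVD route is preferred for rectangular data matrices.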
Notice that, when the given sequence {u_k(t)} of square-integrable functions is orthogonal, the generated biorthogonal sequence {v_k(t)} is simply the original orthogonal sequence {u_k(t)}. Furthermore, if the given sequence {u_k(t)} happened to be the eigenfunction sequence {φ_k(t)} from C_YY(s,t), then the generated biorthogonal sequence {v_k(t)} would simply be the eigensequence {φ_k(t)}, with normalizing coefficients being the corresponding eigenvalues λ_k, and the situation reduces to the previous orthogonal case with unit weight function w(t) ≡ 1.
For a random function Y(t) over some domain T with zero mean value and covariance kernel C_YY(s,t), let {v_k(t), t ∈ T} denote a sequence of square-integrable functions such that {u_k(t)} and {v_k(t)} are biorthogonal function sequences on T. Then the following conditions are equivalent: (i) Y(t) is equal in the mean-square sense to its canonical expansion in terms of the coefficients Z_k and u_k(t); (ii) C_YY(s,t) has a canonical expansion in terms of λ_k, u_k(s), and u_k(t); (iii) Var[Y(t)] has a canonical expansion in terms of λ_k and u_k(t). More discussion and applications of these results can be found in, for example, [1]. Again, in some applications where an arbitrary orthogonal function sequence is available, such as from previous measurements or observations, the biorthogonal basis function approach is generally advantageous over other approaches that do not make use of the prior information.

Nonstationary and Nonlinear Stochastic Analysis
Although most nonstationary stochastic processes are not amenable to linear analysis, there are exceptional cases where a simple variable transformation renders linear analysis possible. For instance, multiplicative noise contamination becomes additive "noise" under a logarithmic transformation, and hence linear analysis can be done in the logarithmic space. Some stochastic processes become stationary under differentiation in continuous contexts or differencing in discrete contexts. For example, a (covariance stationary) process with an unknown linear trend as first moment becomes stationary under double differentiation or differencing, as commonly done in Global Positioning System (GPS) data processing. Such processes are called increment stationary, and their second moments remain stationary under the linear operations of differentiation or differencing. Obviously, only finitely many such operations are considered, corresponding to the (finite) degree of the polynomial trend.
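The effect of differencing on a linear trend can be sketched in a few lines; the trend coefficients, noise model, and series length below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
t = np.arange(n, dtype=float)

# Stationary white noise plus an unknown linear trend: nonstationary mean.
y = 3.0 + 0.05 * t + rng.standard_normal(n)

d1 = np.diff(y)          # mean becomes the constant slope: stationary after one difference
d2 = np.diff(y, n=2)     # zero mean: the linear trend is removed exactly

print(y.mean(), d1.mean(), d2.mean())
```

A single difference reduces the linear trend to a constant mean, and a double difference removes it entirely, leaving a zero-mean stationary series; the second moments of the differenced series remain stationary throughout.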
Assuming that Y(t) is increment stationary, its expected value can be modeled using an algebraic polynomial p(t) = a_0 + a_1 t + a_2 t² + ⋯ + a_m t^m with unknown coefficients a_0, a_1, a_2, …, a_m, and the previous Wiener-Hopf equations for unknown h(t,s),
\[ \langle h(t,s),\, C_{YY}(s,u) \rangle = C_{XY}(t,u), \]
have to be generalized to so-called extended normal equations, with a Lagrange multiplier vector μ for the constrained optimization. Explicitly, using the terminology and conventions of regionalized variables in geostatistics, for the linear prediction Ŷ(t_0) = Σ_{k=1}^{n} w_k Y(t_k), with an algebraic polynomial trend or drift model a_0 + a_1 t + ⋯ + a_m t^m for the function Y(t), and denoting K_{jk} = E[Y(t_j) Y*(t_k)], the normal equations become
\[ \begin{pmatrix} K & F \\ F^{*} & 0 \end{pmatrix} \begin{pmatrix} w \\ \mu \end{pmatrix} = \begin{pmatrix} k_0 \\ f_0 \end{pmatrix}, \]
with F the matrix of drift basis terms evaluated at the data points, k_0 the vector of covariances between the prediction point and the data points, and f_0 the drift terms at the prediction point. This is the same extended form of normal equations encountered with empirical Radial Basis Functions (RBFs) instead of (generalized) covariance functions. Notice that, when the mean is known and constant, the process can be centered with the Lagrange multiplier μ set to zero, and these extended normal equations reduce to the standard Wiener-Hopf equations. Such an extended covariance matrix is easily invertible under minimal conditions, as
\[ \begin{pmatrix} K & F \\ F^{*} & 0 \end{pmatrix}^{-1} = \begin{pmatrix} K^{-1} - K^{-1} F S^{-1} F^{*} K^{-1} & K^{-1} F S^{-1} \\ S^{-1} F^{*} K^{-1} & -S^{-1} \end{pmatrix}, \qquad S = F^{*} K^{-1} F, \]
where S is sometimes called the Schur complement of K in the extended matrix (see, e.g., the work of Kailath et al. [3] for more details). Hence, under minimal conditions, these extended normal equations can be solved explicitly for the linear coefficients w_k and the trend polynomial coefficients. The inverse of a covariance matrix is usually called an information matrix. Also, generalized covariance functions, in the sense of generalized functions, are often advantageous for estimation in practical applications (see, e.g., the work of Blais [11] for more discussion with simulations).
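A minimal numerical sketch of the extended normal equations follows, assuming an exponential covariance, a linear drift with basis 1 and t, and arbitrary illustrative data locations; none of these specifics are from the article.

```python
import numpy as np

# Illustrative extended (universal-kriging-type) normal equations with an
# assumed exponential covariance and a linear drift a0 + a1*t.
t = np.array([0.0, 1.0, 2.5, 4.0, 5.5])
t0 = 3.0                                           # prediction location

K = np.exp(-np.abs(t[:, None] - t[None, :]))       # K_jk = C(t_j, t_k)
F = np.column_stack([np.ones_like(t), t])          # drift basis: 1, t
k0 = np.exp(-np.abs(t - t0))                       # C(t_j, t0)
f0 = np.array([1.0, t0])                           # drift terms at t0

# Extended system  [[K, F], [F^T, 0]] [w; mu] = [k0; f0].
p = F.shape[1]
A = np.block([[K, F], [F.T, np.zeros((p, p))]])
sol = np.linalg.solve(A, np.concatenate([k0, f0]))
w, mu = sol[:len(t)], sol[len(t):]

# Schur complement S = F^T K^-1 F, invertible when F has full column rank.
Schur = F.T @ np.linalg.solve(K, F)
print(w, mu)
```

The constraint block of the system enforces F^T w = f_0, which reproduces the drift exactly at the prediction point, and the positive-definite Schur complement guarantees invertibility of the extended matrix.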
Recall that, given observations {Y(s)}, the optimal general estimate E[X(t) | Y(s)] (not necessarily linear), relative to any linear or nonlinear estimator Ψ(Y(s)), can sometimes be evaluated directly using numerical simulations.

Notice that formulating the preceding covariance matrix differently for the given covariance function Cov[d](s) = 1/(1 + s²) would not imply the above diagonal matrix. Also, the scaling of this diagonal matrix would not be needed with normalized Fourier transforms. Again considering the synthetic covariance function
\[ \mathrm{Cov}[d](s) = \frac{1}{1 + (s/a)^2} \]
for some real scalar a, the plot of Cov[d](s) for s = 0, 1, …, 512 (using a = 1 for simplicity) has corresponding power spectrum or Fourier transform
\[ \pi\, e^{-|\omega|} \]
for frequencies ω, which is shown in Figure 1(a). Adding some noise amplification within −1 ≤ t ≤ 1 in space or time, the covariance function is modified using the Heaviside (or step) function and a scalar amplification factor; its spectrum then becomes nondiagonal, as shown in Figure 1(b), with ripples away from the diagonal.
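The qualitative behavior of Figures 1(a) and 1(b) can be sketched numerically. The grid size and the particular Heaviside-window amplification below are illustrative assumptions: the localized extra variance is modeled as added diagonal variance inside a window, which produces exactly the off-diagonal (Dirichlet-kernel) ripples in the DFT spectral matrix described above.

```python
import numpy as np

n = 128
t = np.arange(n) - n // 2
F = np.fft.fft(np.eye(n)) / np.sqrt(n)                 # unitary DFT matrix

def offdiag_fraction(M):
    """Fraction of the Frobenius norm off the diagonal."""
    off = M - np.diag(np.diag(M))
    return np.linalg.norm(off) / np.linalg.norm(M)

# Stationary synthetic covariance Cov(s) = 1/(1 + s^2): nearly diagonal spectrum.
C_stat = 1.0 / (1.0 + (t[:, None] - t[None, :]) ** 2)
spec_stat = F @ C_stat @ F.conj().T

# Localized (Heaviside-window) noise amplification: extra variance on |t| <= 8.
window = (np.abs(t) <= 8).astype(float)
C_mod = C_stat + 2.0 * np.diag(window)
spec_mod = F @ C_mod @ F.conj().T                      # ripples away from the diagonal

print(offdiag_fraction(spec_stat), offdiag_fraction(spec_mod))
```

The stationary covariance yields a spectral matrix that is strongly diagonal-dominant, while the locally amplified covariance spreads a large fraction of its spectral energy off the diagonal.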
A corresponding modification of the covariance function does not imply much variation in the corresponding spectrogram (see Figure 1(d)). Similar results were obtained by Keller [13] with Haar wavelets and by Zhang [12] using a slightly different orthogonal basis. Although the orthogonal Haar wavelet and similar transforms would seem most appropriate for such covariance matrices, they only succeed in making the covariance matrix sparse, not diagonal. The EOF approach does diagonalize any symmetric positive definite covariance matrix by the very nature of the eigenvector decomposition. Such eigenvector and eigenvalue decompositions are computationally demanding, but it is worth noting that the biorthogonal basis construction is computationally advantageous whenever some arbitrary basis is available, as mentioned above.
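The sparse-but-not-diagonal behavior of the Haar transform, against the exact diagonalization by the EOF decomposition, can be checked directly. The covariance model and sizes are illustrative, and the orthonormal Haar matrix is built recursively.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar wavelet matrix for n a power of two."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    top = np.kron(h, [1.0, 1.0])                 # scaling (average) rows
    bot = np.kron(np.eye(n // 2), [1.0, -1.0])   # wavelet (detail) rows
    return np.vstack([top, bot]) / np.sqrt(2.0)

n = 64
H = haar_matrix(n)
idx = np.arange(n)
C = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 4.0)

W = H @ C @ H.T                                  # Haar-transformed covariance
sparse_W = np.sum(np.abs(W) > 0.01)              # significant entries after Haar
sparse_C = np.sum(np.abs(C) > 0.01)              # significant entries before

lam, phi = np.linalg.eigh(C)                     # EOF route: exact diagonalization
D = phi.T @ C @ phi
off_W = np.linalg.norm(W - np.diag(np.diag(W)))
off_D = np.linalg.norm(D - np.diag(np.diag(D)))
print(sparse_W, sparse_C, off_W, off_D)
```

The Haar transform compresses the covariance matrix to far fewer significant entries but leaves clear off-diagonal content, whereas the eigenvector transform is diagonal to machine precision.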
Next, assuming an unknown mean for the simulated stochastic process, the covariance matrix Cov[d] can easily be modified such that the last column and row are 1's with 0 as the last diagonal element; the corresponding singular spectral matrix is shown in Figure 1(e). Using this modified covariance matrix Cov[d], its DFT spectral matrix is shown in Figure 1(f), which exhibits anomalous effects due to the border modifications of Cov[d]. In other words, the DFT and FFT approaches are not appropriate with extended covariance matrices, even with equispaced data. This is obviously an important result for applications of extended covariance and RBF normal matrices in geostatistics (see, e.g., the works of Chilès and Delfiner [14]).

Concluding Remarks
Stochastic time series are very common in geoscience, and better analytical tools and procedures for their modeling and filtering warrant more research and development. Multiresolution analysis and synthesis of sequences of observations and measurements are needed for all kinds of applications in geoscience and other fields.
DFTs and FFTs are normally used for stationary processes with equispaced data. In practice, however, nonstationarities are often unavoidable, and even in increment-stationary cases with equispaced data, DFTs and FFTs are not appropriate. Galerkin's projection methods, especially with EOFs, are more general but unfortunately more computationally demanding. In some applications, however, only the leading spectral modes are necessary for analysis and inference, thus reducing the necessary computational effort.
For simple prediction applications in stationary cases, the normal equations can be solved using Cholesky's square-root algorithm, which has well-known numerical properties, while in increment-stationary cases, as well as in various RBF formulations, some Cholesky and Givens reduction procedure can be used (see, e.g., the work of Blais [10]). For more in-depth numerical analysis, the SVD and the Karhunen-Loève transform (KLT) are most recommendable in general, and visualization tools are very helpful in the structural analysis of covariance and information matrices.
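The Cholesky square-root solution of the stationary normal equations can be sketched as follows; the exponential covariance model and the off-grid prediction point are illustrative, and the forward and back substitutions are written with generic solves for brevity.

```python
import numpy as np

n = 200
idx = np.arange(n)

# Stationary normal equations C w = c0 solved via the Cholesky square root C = L L^T.
C = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 6.0)
c0 = np.exp(-np.abs(idx - 100.5) / 6.0)        # covariances to an off-grid point

L = np.linalg.cholesky(C)                       # square-root factor
z = np.linalg.solve(L, c0)                      # forward substitution: L z = c0
w = np.linalg.solve(L.T, z)                     # back substitution: L^T w = z

resid = np.linalg.norm(C @ w - c0)
print(resid)
```

The factorization exploits the symmetric positive definiteness of the covariance matrix, and the two triangular solves reproduce the direct solution of the normal equations to machine precision.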