We find asymptotically sufficient statistics that could help simplify inference in nonparametric regression problems with correlated errors. These statistics are derived from a wavelet decomposition that is used to whiten the noise process and to effectively separate high-resolution and low-resolution components. The lower-resolution components contain nearly all the available information about the mean function, and the higher-resolution components can be used to estimate the error covariances. The strength of the correlation among the errors is tied to the rate at which the variances of the higher-resolution components shrink, and this rate is treated as an additional nuisance parameter in the model. We show that the nonparametric regression (NPR) experiment with correlated noise is asymptotically equivalent to an experiment that observes the mean function in the presence of a continuous Gaussian process that is similar to a fractional Brownian motion. These results provide a theoretical motivation for some commonly proposed wavelet estimation techniques.

A nonparametric regression (NPR) problem consists of estimating an unknown smooth mean function from noisy observations taken at different design points. There are

Brown and Low [

All of these results assume that the errors

Our approach is motivated by the work by Johnstone and Silverman [

The nonparametric regression experiment

This will be proven in two steps. First, Lemma

Furthermore, in both experiments the lower-frequency terms in the wavelet decomposition are sufficient for estimating the means, allowing the higher-frequency terms to be used to give information about the variance process. This leads to Theorem

The NPR experiment

The experiment

For

This theorem can be seen as an extension of Carter [

Wang [

Lemma

Instead of focusing on individual estimation techniques, we will consider approximations of the entire statistical experiment. For large sample sizes, there is often a simpler statistical experiment that approximates the problem at hand. One benefit of finding an approximating experiment is that it may have convenient sufficient statistics even when none are available in the original experiment.

Our approximations will therefore be of

The NPR experiment will be approximated using Le Cam's notion of asymptotically equivalent experiments [

Asymptotic sufficiency is a stronger notion, where if

Le Cam's asymptotic equivalence is characterized using the total-variation distance

We will use orthonormal wavelet bases to characterize the function space and to simplify the covariance structure of the errors.
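As a concrete illustration of the role such a basis plays (a sketch of ours, not a construction from the paper), the following code builds the orthonormal Haar transform matrix recursively and verifies that it preserves energy, so that a vector of i.i.d. standard normal errors remains white in the wavelet domain:

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar wavelet transform matrix for n = 2^J points,
    built recursively from averages (coarse) and differences (detail)."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    avg = np.kron(h, [1.0, 1.0]) / np.sqrt(2.0)
    dif = np.kron(np.eye(n // 2), [1.0, -1.0]) / np.sqrt(2.0)
    return np.vstack([avg, dif])

W = haar_matrix(8)
# Orthonormality: the transform preserves inner products, so a vector of
# i.i.d. N(0,1) errors keeps covariance W I W' = I in the new basis.
assert np.allclose(W @ W.T, np.eye(8))
assert np.allclose(W.T @ W, np.eye(8))
```

With correlated errors the point of the decomposition is precisely that this identity-covariance property fails unless the basis is adapted to the covariance operator.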

Assuming we are considering periodic functions on the interval

The mean functions

These results rely on a specific structure for the covariance matrix of the errors in the NPR experiment. As in Johnstone [

Traditionally, the asymptotics of the NPR experiment have assumed independent noise. This white-noise model is especially convenient because all of the eigenvalues of the covariance operator are equal; thus, any orthonormal basis generates a set of independent standard normal coefficients. With a more general covariance function, the eigenvalues differ and only particular decompositions lead to independent coefficients. There is therefore much less flexibility in the choice of basis, and this basis determines some of the structure of the covariance.
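The contrast can be checked numerically. In the sketch below (illustrative, with an arbitrary covariance of our choosing), any orthogonal rotation leaves an identity covariance diagonal, while a generic covariance is diagonalized only by its own eigenbasis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Any orthonormal basis Q (here a random one via QR) leaves white
# noise white: Q I Q' = I.
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
assert np.allclose(Q @ np.eye(6) @ Q.T, np.eye(6))

# For a generic covariance C, the same Q does NOT decorrelate...
A = rng.standard_normal((6, 6))
C = A @ A.T + np.eye(6)           # an arbitrary positive definite covariance
rotated = Q @ C @ Q.T
off_diag = rotated - np.diag(np.diag(rotated))
assert np.abs(off_diag).max() > 1e-8    # coefficients stay correlated

# ...but the eigenbasis of C does: U' C U is diagonal.
eigvals, U = np.linalg.eigh(C)
D = U.T @ C @ U
assert np.allclose(D, np.diag(eigvals))
```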

Following Johnstone [

We will assume that there exists an orthonormal basis

This error structure is mathematically convenient, but it is not unrealistic: wavelet decompositions nearly whiten the fractional Brownian motion process. Wornell [
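The near-whitening effect can be seen directly from the covariance of the wavelet coefficients. In the sketch below (our own discretized computation, not from the paper), the finest-level Haar coefficients of ordinary Brownian motion (H = 1/2) are exactly uncorrelated, because same-level Haar wavelets have disjoint supports; for H ≠ 1/2 the correlations are nonzero but small:

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar wavelet transform matrix for n = 2^J points."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    avg = np.kron(h, [1.0, 1.0]) / np.sqrt(2.0)
    dif = np.kron(np.eye(n // 2), [1.0, -1.0]) / np.sqrt(2.0)
    return np.vstack([avg, dif])

def fbm_cov(n, H):
    """Covariance of fractional Brownian motion at t_i = (i + 1)/n."""
    t = (np.arange(n) + 1.0) / n
    s, u = np.meshgrid(t, t)
    return 0.5 * (s ** (2 * H) + u ** (2 * H) - np.abs(s - u) ** (2 * H))

n = 64
W = haar_matrix(n)
fine = W[n // 2:, :]          # finest-level detail coefficients

# H = 1/2 is ordinary Brownian motion: same-level Haar coefficients
# have disjoint supports and are exactly uncorrelated.
D = fine @ fbm_cov(n, 0.5) @ fine.T
assert np.allclose(D, np.diag(np.diag(D)))

# For H != 1/2 the coefficients are only *nearly* uncorrelated; the
# off-diagonal correlations are nonzero but decay with separation.
D7 = fine @ fbm_cov(n, 0.7) @ fine.T
corr = D7 / np.sqrt(np.outer(np.diag(D7), np.diag(D7)))
```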

Section

A well-established method for estimating the parameter
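One common method of this general kind can be sketched as a log-level regression, in the spirit of Wornell's wavelet spectrum estimators. Everything below is an illustrative assumption, not the paper's procedure: we posit coefficient variances of the form Var(d_{j,k}) = σ² 2^{−jδ} and regress log₂ of the average squared coefficient at level j on j, so that the slope estimates −δ:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2, delta = 1.0, 0.8          # illustrative true parameters

# Simulate independent coefficients d_{j,k} ~ N(0, sigma2 * 2^{-j*delta}).
js, log_energy = [], []
for j in range(4, 12):
    scale = np.sqrt(sigma2 * 2.0 ** (-j * delta))
    d = rng.normal(scale=scale, size=2 ** j)
    js.append(j)
    log_energy.append(np.log2(np.mean(d ** 2)))   # log2 average energy at level j

# Least-squares line through (j, log2 energy); the slope estimates -delta.
slope, intercept = np.polyfit(js, log_energy, 1)
delta_hat = -slope
assert abs(delta_hat - delta) < 0.3
```

The higher levels contribute many coefficients each, so the regression is dominated by the well-estimated fine-scale energies.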

The assumptions in Theorem

The first step in the proof of Theorem

The experiment

In the Gaussian sequence experiment

We want to approximate this experiment

By (

Theorem

We suppose that we have

The expected value of this transformed vector is

In the original NPR experiment, the variances are

If the mean function

This lemma essentially goes back to the original work of Mallat [

The NPR observations are such that the covariance matrix of

A standard calculation bounds the difference between the means when

The covariance matrix is positive definite and such that

The theorem follows from the fact that the observations

Furthermore, by Lemma

This result is restrictive in that it requires a specific known covariance structure. We are working under the assumption that the covariance matrix has eigenfunctions that correspond to a wavelet basis. This does not generally lead to a typical covariance structure. It does not even necessarily lead to a stationary Gaussian process; see the Haar basis example below.
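The non-stationarity is easy to exhibit numerically: a stationary process on a regular grid must have a Toeplitz covariance matrix, but a covariance synthesized from level-dependent eigenvalues in the Haar basis is not Toeplitz. (The construction and the eigenvalue choices below are ours, for illustration only.)

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar wavelet transform matrix for n = 2^J points."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    avg = np.kron(h, [1.0, 1.0]) / np.sqrt(2.0)
    dif = np.kron(np.eye(n // 2), [1.0, -1.0]) / np.sqrt(2.0)
    return np.vstack([avg, dif])

n = 8
W = haar_matrix(n)
# Rows of W: scaling function, then details at levels j = 0, 1, 2.
# Assign level-dependent eigenvalues: variance 1 for the scaling term
# and the level-0 detail, then 2^{-j} for deeper levels.
eig = np.array([1.0, 1.0, 0.5, 0.5, 0.25, 0.25, 0.25, 0.25])
C = W.T @ np.diag(eig) @ W

# The diagonal is constant, yet the covariance is not Toeplitz:
# Cov(X_0, X_1) = 0.25 while Cov(X_1, X_2) = 0.125, so no stationary
# process on the grid has this covariance matrix.
assert np.allclose(np.diag(C), 0.5)
assert abs(C[0, 1] - C[1, 2]) > 0.1
```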

The difficulty is that the requirement for asymptotically equivalent experiments is quite strict: even small differences in the covariance structure lead to a non-negligible total-variation distance between the processes. For two multivariate Gaussian distributions with the same means but where one covariance matrix is
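To quantify this sensitivity, one can use the standard closed form for the Kullback-Leibler divergence between centered Gaussians (a textbook formula, not a bound from the paper). For covariances I_d and (1+ε)I_d, KL = (d/2)(ln(1+ε) − ε/(1+ε)) ≈ dε²/4, so a fixed relative perturbation of the eigenvalues keeps the experiments far apart unless ε shrinks faster than d^{−1/2}:

```python
import math

def kl_scaled_identity(d, eps):
    """KL( N(0, I_d) || N(0, (1+eps) I_d) ) from the closed form
    (1/2) [ tr(S1^{-1} S0) - d + ln(det S1 / det S0) ]."""
    return 0.5 * d * (math.log(1.0 + eps) - eps / (1.0 + eps))

d = 1024
# A fixed 10% inflation of every eigenvalue: KL grows linearly in d,
# so the two Gaussian experiments remain easily distinguishable.
assert kl_scaled_identity(d, 0.10) > 2.0

# Shrinking eps at the rate d^{-1/2} keeps KL bounded (about d*eps^2/4 = 1/4).
eps = d ** -0.5
assert 0.1 < kl_scaled_identity(d, eps) < 0.5
```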

If the correlation between the highest level coefficients is

Thus, the difference

The key limitation of Theorem

By Carter [

We add flexibility to the covariance structure by allowing the magnitude of the

Specifically, the experiment

This experiment

The theorem can be proven by applying Lemma

The first step is to decompose the nonparametric regression into a set of wavelet coefficients. The

The key strategy is to break the observations from this wavelet decomposition into pieces starting at level

For each resolution level with

The error in approximating

This

The distance between

In the experiment

Therefore, we can compare this experiment

We can improve this approximation by replacing the estimators

Finally, we create a continuous Gaussian version of the

All that is left to do is to choose the level

We need a bound on the distance between two multivariate normal distributions with different means in order to bound the error in many of our approximations.
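One standard route to such a bound (general facts about Gaussian distributions, not the paper's specific inequality) combines the divergence KL(N(μ₀,Σ) ‖ N(μ₁,Σ)) = ½‖Σ^{−1/2}(μ₀−μ₁)‖² with Pinsker's inequality, TV ≤ √(KL/2). In one dimension the total-variation distance is even available in closed form, 2Φ(|μ₀−μ₁|/(2σ)) − 1, which allows a direct numerical check:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def tv_mean_shift(mu0, mu1, sigma):
    """Exact TV distance between N(mu0, sigma^2) and N(mu1, sigma^2)."""
    return 2.0 * norm_cdf(abs(mu0 - mu1) / (2.0 * sigma)) - 1.0

def pinsker_bound(mu0, mu1, sigma):
    """sqrt(KL / 2) with KL = (mu0 - mu1)^2 / (2 sigma^2)."""
    kl = (mu0 - mu1) ** 2 / (2.0 * sigma ** 2)
    return math.sqrt(kl / 2.0)

# The Pinsker bound dominates the exact TV distance for every shift.
for shift in (0.1, 0.5, 1.0, 2.0):
    assert tv_mean_shift(0.0, shift, 1.0) <= pinsker_bound(0.0, shift, 1.0) + 1e-12
```

Both the exact distance and the bound depend on the means only through the standardized shift, which is what makes mean-difference bounds of this type convenient in the multivariate approximations.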

For shifted Gaussian processes, the total-variation distance between the distributions is

For the Gaussian process with correlated components, we will assume that the variance of each wavelet coefficient is of the form

In order to expand our asymptotically sufficient statistics out into a continuous Gaussian experiment, we need a bound on the total-variation distance between

For two normal distributions with the same means

To bound the expected value of the divergence in (

Via elementary linear algebra calculations we get that for

A simple bound of

Thus,

If we add up these errors over the

The Haar basis is simple enough that we can carry out some explicit calculations of the properties of the error distribution. We will show that the resulting errors

The scaling functions for the Haar basis are constant on

The formula for synthesizing the scaling function coefficients

Using the covariance structure described above, the variance of

To find the covariance between two variables

This work is supported by NSF Grant no. DMS-08-05481.