On the adaptive estimation of a multiplicative separable regression function

We investigate the estimation of a multiplicative separable regression function from a bi-dimensional nonparametric regression model with random design. We present a general estimator for this problem and study its mean integrated squared error (MISE) properties. A wavelet version of this estimator is developed. In some situations, we prove that it attains the standard unidimensional rate of convergence under the MISE over Besov balls.

Laboratoire de Mathématiques Nicolas Oresme, Université de Caen Basse-Normandie, Campus II, Science 3, 14032 Caen, France. E-mail: chesneau@math.unicaen.fr

We consider the bi-dimensional nonparametric regression model with random design described as follows. Let (Y_i, U_i, V_i)_{i∈Z} be a stochastic process defined on a probability space (Ω, A, P), where

Y_i = h(U_i, V_i) + ξ_i, i ∈ Z, (1)

(ξ_i)_{i∈Z} is a strictly stationary stochastic process, (U_i, V_i)_{i∈Z} is a strictly stationary stochastic process with support in [0, 1]^2 and h : [0, 1]^2 → R is an unknown bivariate regression function. It is assumed that E(ξ_1) = 0, E(ξ_1^2) exists, the (U_i, V_i), i ∈ Z, are independent, the ξ_i, i ∈ Z, are independent and, for any i ∈ Z, (U_i, V_i) and ξ_i are independent. In this study, we focus our attention on the case where h is a multiplicative separable regression function: there exist two functions f : [0, 1] → R and g : [0, 1] → R such that

h(x, y) = f(x) g(y). (2)

We aim to estimate h from the n random variables (Y_1, U_1, V_1), ..., (Y_n, U_n, V_n). This problem arises in many practical situations, for instance in utility, production, and cost function applications. See, e.g., Linton and Nielsen (1995), Yatchew and Bos (1997), Pinske (2000), Lewbel and Linton (2007) and Jacho-Chávez et al. (2010).
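To fix ideas, the following minimal sketch simulates a sample of size n, reading model (1) as Y_i = h(U_i, V_i) + ξ_i with h(x, y) = f(x) g(y). The factors f and g, the Gaussian noise, its standard deviation and the uniform design are illustrative assumptions of ours; only the additive, multiplicative separable structure comes from the text.

```python
import math
import random


def f(x):
    # Hypothetical first factor, chosen only for illustration.
    return 1.0 + x


def g(y):
    # Hypothetical second factor, chosen only for illustration.
    return 2.0 + math.cos(2.0 * math.pi * y)


def simulate(n, sigma=0.5, seed=0):
    """Draw (Y_i, U_i, V_i), i = 1..n, with Y_i = f(U_i) g(V_i) + xi_i."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        u, v = rng.random(), rng.random()  # (U_i, V_i) uniform on [0, 1]^2 (assumption)
        xi = rng.gauss(0.0, sigma)         # centered noise with finite variance (assumption: Gaussian)
        sample.append((f(u) * g(v) + xi, u, v))
    return sample


sample = simulate(1000)
print(sample[0])
```

Any centered noise with a finite second moment could replace the Gaussian draw, since only E(ξ_1) = 0 and the existence of E(ξ_1^2) are assumed.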
In this paper, we provide a theoretical contribution to the subject by introducing a new general estimation method for h. A sharp upper bound for its associated mean integrated squared error (MISE) is proved. We then adapt our methodology to propose an efficient and adaptive wavelet procedure. It is based on two wavelet thresholding estimators that are adaptive for a wide class of unknown functions and enjoy good MISE properties. Further details on such wavelet estimators can be found in, e.g., Antoniadis (1997), Vidakovic (1999) and Härdle et al. (1998). Despite the so-called "curse of dimensionality" coming from the bi-dimensionality of (1), we prove that our wavelet estimator attains the standard unidimensional rate of convergence under the MISE over Besov balls (for both the homogeneous and inhomogeneous zones). This complements the asymptotic results proved by Linton and Nielsen (1995) via non-adaptive kernel methods for the structured nonparametric regression model.
The paper is organized as follows. Assumptions on (1) and some notations are introduced in Section 2. Section 3 presents our general MISE result. Section 4 is devoted to our wavelet estimator and its performance in terms of rate of convergence under the MISE over Besov balls. Technical proofs are collected in Section 5.

Assumptions and notations
For any p ≥ 1, we set [...]. We set [...] (provided that they exist). We formulate the following assumptions.

(H1) There exists a known constant [...]
(H3) The density of (U_1, V_1), denoted by q, is known and there exist two constants c_3 > 0 and C_3 > 0 such that [...]
(H4) There exists a known constant ω > 0 such that [...]

The assumptions (H1) and (H2), involving the boundedness of h, are standard in nonparametric regression models. The knowledge of q required in (H3) is restrictive but plausible in some situations, the most common case being (U_1, V_1) ∼ U([0, 1]^2) (the uniform distribution on [0, 1]^2). Finally, note that (H4) is a technical assumption that is more realistic than requiring knowledge of e_o and e_* (which depend on f and g respectively).

MISE result
Theorem 1 presents an estimator for h and shows an upper bound for its MISE.
Then there exists a constant C > 0 such that [...]

The form of ĥ (3) is derived from the multiplicative separable structure of h (2) and a ratio-type normalization. Other results about such ratio-type estimators in a general statistical context can be found in Vasiliev (2012).
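The ratio-type normalization can be made explicit under one assumption that is ours: that e_o and e_* denote the integrals of f and g over [0, 1] (their definitions do not appear in the text above). Under that reading, the separable structure (2) gives the following identities, which show why estimating the two univariate functions f e_* and g e_o, together with the constant e_o e_*, is enough to recover h.

```latex
% Assumption (ours): e_o = \int_0^1 f(x)\,dx and e_* = \int_0^1 g(y)\,dy.
\begin{align*}
\int_0^1 h(x,y)\,dy &= f(x)\,e_*, \\
\int_0^1 h(x,y)\,dx &= g(y)\,e_o, \\
\int_0^1\!\!\int_0^1 h(x,y)\,dx\,dy &= e_o\,e_*, \\
h(x,y) = f(x)\,g(y) &= \frac{\bigl(f(x)\,e_*\bigr)\bigl(g(y)\,e_o\bigr)}{e_o\,e_*},
\qquad e_o\,e_* \neq 0 .
\end{align*}
```

The last line only makes sense when e_o e_* is bounded away from zero, which is presumably the role of the constant ω in (H4).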
Based on Theorem 1, ĥ is efficient for h if and only if the univariate estimator of f e_* and the univariate estimator of g e_o are themselves efficient in terms of MISE. This result motivates the investigation of wavelet methods, which are adaptive for a wide class of unknown functions and have optimal properties under the MISE. For details on the merits of wavelet methods in nonparametric statistics, we refer to Antoniadis (1997), Vidakovic (1999) and Härdle et al. (1998).

Adaptive wavelet estimation
Before introducing our wavelet estimators, let us present some basics on wavelets.

Wavelet basis on [0,1]
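Let us briefly recall the construction of a wavelet basis on the interval [0, 1] introduced by Cohen et al. (1993). Let N be a positive integer, and let φ and ψ be the initial wavelets of the Daubechies orthogonal wavelets db2N. We set

φ_{j,k}(x) = 2^{j/2} φ(2^j x − k), ψ_{j,k}(x) = 2^{j/2} ψ(2^j x − k).

With appropriate treatments at the boundaries, there exists an integer τ satisfying 2^τ ≥ 2N such that the collection

S = {φ_{τ,k}, k ∈ {0, ..., 2^τ − 1}; ψ_{j,k}, j ≥ τ, k ∈ {0, ..., 2^j − 1}}

is an orthonormal basis of L^2([0, 1]). Any v ∈ L^2([0, 1]) can be expanded on S as

v(x) = Σ_{k=0}^{2^τ−1} α_{τ,k} φ_{τ,k}(x) + Σ_{j=τ}^{∞} Σ_{k=0}^{2^j−1} β_{j,k} ψ_{j,k}(x),

where α_{j,k} and β_{j,k} are the wavelet coefficients of v defined by

α_{j,k} = ∫_0^1 v(x) φ_{j,k}(x) dx, β_{j,k} = ∫_0^1 v(x) ψ_{j,k}(x) dx. (4)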
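As a concrete illustration, the sketch below uses the Haar basis, the simplest orthonormal wavelet basis on [0, 1], in place of the boundary-corrected Daubechies construction of Cohen et al. (1993). The swap is ours and is made only because Haar functions have a closed form; the function names are hypothetical. The code checks orthonormality numerically and computes one wavelet coefficient by a midpoint rule.

```python
def haar_phi(j, k, x):
    """Haar scaling function at level j, position k: 2^(j/2) on [k/2^j, (k+1)/2^j)."""
    t = (2 ** j) * x - k
    return 2.0 ** (j / 2.0) if 0.0 <= t < 1.0 else 0.0


def haar_psi(j, k, x):
    """Haar wavelet at level j, position k: +2^(j/2) on the left half of its support, -2^(j/2) on the right half."""
    t = (2 ** j) * x - k
    if 0.0 <= t < 0.5:
        return 2.0 ** (j / 2.0)
    if 0.5 <= t < 1.0:
        return -(2.0 ** (j / 2.0))
    return 0.0


def inner(u, v, m=1024):
    """Midpoint-rule approximation of the L2([0, 1]) inner product of u and v."""
    return sum(u((i + 0.5) / m) * v((i + 0.5) / m) for i in range(m)) / m


def coefficient(v, j, k, m=1024):
    """Approximate the wavelet coefficient: integral of v * psi_{j,k} over [0, 1]."""
    return inner(v, lambda x: haar_psi(j, k, x), m)


# Orthonormality checks and one coefficient of v(x) = x.
print(inner(lambda x: haar_psi(1, 0, x), lambda x: haar_psi(1, 0, x)))
print(inner(lambda x: haar_psi(1, 0, x), lambda x: haar_psi(1, 1, x)))
print(coefficient(lambda x: x, 0, 0))
```

With m a power of two, the midpoint grid never falls on a dyadic breakpoint of the coarse levels used here, so the piecewise-constant Haar functions are integrated without discretization error.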

Besov balls
For the sake of simplicity, we consider the sequential version of Besov balls defined as follows. Let M > 0, s > 0, p ≥ 1 and r ≥ 1. A function v belongs to B^s_{p,r}(M) if and only if there exists a constant M* > 0 (depending on M) such that the associated wavelet coefficients (4) satisfy [...]

In this expression, s is a smoothness parameter and p and r are norm parameters. For a particular choice of s, p and r, B^s_{p,r}(M) contains the Hölder and Sobolev balls. See, e.g., DeVore and Popov (1988), Meyer (1992) and Härdle et al. (1998).
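For reference, the sequential Besov-ball condition is usually written as below (see, e.g., Härdle et al. (1998)). We give this standard form as an assumption about the display intended here, with α_{τ,k} and β_{j,k} the wavelet coefficients of v and τ the coarsest level of the basis.

```latex
% Standard sequential Besov-ball condition (assumed form), with the usual
% modification when p = \infty or r = \infty.
\[
\Biggl(\sum_{k=0}^{2^{\tau}-1} |\alpha_{\tau,k}|^{p}\Biggr)^{1/p}
+ \Biggl(\sum_{j=\tau}^{\infty}\Biggl(2^{j(s+1/2-1/p)}
\Biggl(\sum_{k=0}^{2^{j}-1} |\beta_{j,k}|^{p}\Biggr)^{1/p}\Biggr)^{r}\Biggr)^{1/r}
\le M^{*}.
\]
```

The weight 2^{j(s+1/2−1/p)} forces the level-wise ℓ^p norms of the detail coefficients to decay geometrically in j, at a speed governed by the smoothness s.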
We focus our attention on wavelet hard thresholding estimators for the two univariate estimators involved in (3). They are based on a term-by-term selection of estimators of the wavelet coefficients of the unknown function: those whose magnitude exceeds a threshold are kept, and the others are removed. This selection is the key to the adaptivity and the good performance of hard thresholding wavelet estimators. See, e.g., Donoho et al. (1996), Delyon and Juditsky (1996) and Härdle et al. (1998).
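The term-by-term selection can be sketched as follows. This is an illustration under assumptions of ours, not the calibrated estimator of the paper: it uses Haar wavelets, a uniform design (q = 1), the empirical coefficient (1/n) Σ Y_i ψ_{j,k}(U_i), which is unbiased for the coefficient of x ↦ ∫ h(x, y) dy, and a hypothetical threshold 2 (ln n / n)^{1/2}. The function names and the toy factors are hypothetical as well.

```python
import math
import random


def haar_psi(j, k, x):
    """Haar wavelet at level j, position k (stand-in for the Daubechies family)."""
    t = (2 ** j) * x - k
    if 0.0 <= t < 0.5:
        return 2.0 ** (j / 2.0)
    if 0.5 <= t < 1.0:
        return -(2.0 ** (j / 2.0))
    return 0.0


def empirical_coeffs(sample, j_max):
    """Empirical coefficients (1/n) sum_i Y_i psi_{j,k}(U_i), assuming a uniform design."""
    n = len(sample)
    return {
        (j, k): sum(y * haar_psi(j, k, u) for y, u, _ in sample) / n
        for j in range(j_max + 1)
        for k in range(2 ** j)
    }


def hard_threshold(coeffs, lam):
    """Keep the coefficients with magnitude at least lam; set the others to zero."""
    return {key: (b if abs(b) >= lam else 0.0) for key, b in coeffs.items()}


# Toy data with hypothetical factors f(x) = 1 + x and g(y) = 2 + cos(2 pi y).
rng = random.Random(1)
n = 5000
sample = []
for _ in range(n):
    u, v = rng.random(), rng.random()
    y = (1.0 + u) * (2.0 + math.cos(2.0 * math.pi * v)) + rng.gauss(0.0, 0.5)
    sample.append((y, u, v))

coeffs = empirical_coeffs(sample, j_max=4)
lam = 2.0 * math.sqrt(math.log(n) / n)  # hypothetical calibration, not the paper's
kept = hard_threshold(coeffs, lam)
print(sum(1 for b in kept.values() if b != 0.0), "of", len(kept), "coefficients kept")
```

Because the selection depends only on the data and on n, no smoothness parameter of the unknown function enters the procedure, which is the sense in which such estimators are adaptive.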
Estimator for f e_*. We define the hard thresholding estimator of f e_* by [...] (5)

Estimator for g e_o. We define the hard thresholding estimator of g e_o by [...] (6)

Estimator for h. From the estimators (5) and (6), we consider the following estimator for h (2): [...], where ω refers to (H4). Let us mention that ĥ is adaptive in the sense that it does not depend on f or g in its construction.

Remark 2 The calibration of the parameters in the estimators (5) and (6) is based on theoretical considerations; thus defined, they can attain a fast rate of convergence under the MISE over Besov balls. See Chaubey et al. (2013, Theorem 6.1). Further details are given in the proof of Theorem 2.
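A ratio-type combination of this kind can be sketched as follows. Everything specific here is an assumption of ours: the names f_tilde, g_tilde and e_tilde, the estimator of the double integral of h by the average of Y_i / q(U_i, V_i), and the guard that returns 0 when the denominator falls below ω/2. The toy check plugs in oracle univariate functions instead of wavelet estimators, so it only illustrates the normalization step.

```python
import math
import random


def estimate_total_integral(sample, q):
    """Average of Y_i / q(U_i, V_i): unbiased for the double integral of h when q > 0."""
    return sum(y / q(u, v) for y, u, v in sample) / len(sample)


def ratio_estimator(f_tilde, g_tilde, e_tilde, omega):
    """Return h_hat(x, y) = f_tilde(x) g_tilde(y) / e_tilde, set to 0 if |e_tilde| < omega / 2."""
    def h_hat(x, y):
        if abs(e_tilde) < omega / 2.0:  # hypothetical guard against a near-zero denominator
            return 0.0
        return f_tilde(x) * g_tilde(y) / e_tilde
    return h_hat


def f(x):
    return 1.0 + x  # hypothetical factor, integral over [0, 1] equal to 1.5


def g(y):
    return 2.0 + math.cos(2.0 * math.pi * y)  # hypothetical factor, integral equal to 2


rng = random.Random(2)
sample = []
for _ in range(20000):
    u, v = rng.random(), rng.random()
    sample.append((f(u) * g(v) + rng.gauss(0.0, 0.5), u, v))

e_tilde = estimate_total_integral(sample, lambda u, v: 1.0)  # uniform design, q = 1
# Oracle stand-ins for the univariate estimators of f * 2 and g * 1.5.
h_hat = ratio_estimator(lambda x: 2.0 * f(x), lambda y: 1.5 * g(y), e_tilde, omega=1.0)
print(e_tilde, h_hat(0.3, 0.7), f(0.3) * g(0.7))
```

In practice the two oracle functions would be replaced by data-driven univariate estimators, and the quality of h_hat would then be driven by theirs.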

Rate of convergence
Theorem 2 investigates the rate of convergence attained by ĥ under the MISE over Besov balls.
The rate of convergence (ln n/n)^{2s_*/(2s_*+1)} is near-optimal in the minimax sense for the unidimensional regression model with random design under the MISE over Besov balls B^{s_*}_{p,r}(M). See, e.g., Tsybakov (2004) and Härdle et al. (1998). In this sense, Theorem 2 proves that our estimator escapes the so-called "curse of dimensionality". Such a result is not possible with the standard bi-dimensional hard thresholding estimator, which attains the rate of convergence (ln n/n)^{2s/(2s+d)} with d = 2 under the MISE over bi-dimensional Besov balls defined with s as smoothness parameter. See Delyon and Juditsky (1996).
Theorem 2 complements the asymptotic results proved by Linton and Nielsen (1995), who investigate this problem for the structured nonparametric regression model via another estimation method based on non-adaptive kernels.
Remark 4 Note that Theorem 2 does not require knowledge of the distribution of ξ_1; the conditions E(ξ_1) = 0 and E(ξ_1^2) < ∞ are enough.
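The gap between the two rates can be made concrete numerically. The short sketch below evaluates (ln n/n)^{2s/(2s+d)} for d = 1 and d = 2 at a common smoothness s = 2, a value we pick only for illustration; the function name is hypothetical.

```python
import math


def rate(n, s, d=1):
    """Evaluate (ln n / n)^(2s / (2s + d)) for sample size n, smoothness s, dimension d."""
    return (math.log(n) / n) ** (2.0 * s / (2.0 * s + d))


for n in (10**3, 10**4, 10**5, 10**6):
    print(n, rate(n, s=2.0, d=1), rate(n, s=2.0, d=2))
```

Since ln n/n < 1 and 2s/(2s+1) > 2s/(2s+2), the unidimensional rate is smaller at every n, and the ratio between the two columns widens as n grows.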

Proofs
In this section, for the sake of simplicity, C denotes a generic constant; its value may change from one term to another.