A Note on Wavelet Estimation of the Derivatives of a Regression Function in a Random Design Setting

We investigate the estimation of the derivatives of a regression function in the nonparametric regression model with random design. New wavelet estimators are developed. Their performances are evaluated via the mean integrated squared error. Fast rates of convergence are obtained for a wide class of unknown functions.

In the literature, various estimation methods have been proposed and studied. The main ones are kernel methods (see, e.g., [1][2][3][4][5]), smoothing splines, and local polynomial methods (see, e.g., [6][7][8][9]). The object of this note is to introduce new efficient estimators based on wavelet methods. Contrary to the others, they enjoy local adaptivity against discontinuities thanks to the use of a multiresolution analysis. Reviews on wavelet methods can be found in, for example, Antoniadis [10], Härdle et al. [11], and Vidakovic [12]. To the best of our knowledge, only Cai [13] and Petsa and Sapatinas [14] have proposed wavelet estimators for $f^{(m)}$ from (1), but defined with a deterministic equidistant design, that is, $X_i = i/n$. The consideration of a random design significantly complicates the problem, and no wavelet estimators exist in this case. This motivates our study.
In the first part, assuming that $g$ is known, we propose two wavelet estimators: the first is linear and nonadaptive, and the second is nonlinear and adaptive. Both use the approach of Prakasa Rao [15], initially developed in the context of the density estimation problem. We then determine their rates of convergence by considering the mean integrated squared error (MISE) and assuming that $f^{(m)}$ belongs to Besov balls. In the second part, we develop a linear wavelet estimator in the case where $g$ is unknown. It is derived from the one introduced by Pensky and Vidakovic [16] for the estimation of $f^{(0)} = f$ from (1). We evaluate its rate of convergence, again under the MISE over Besov balls. The obtained rates of convergence are similar to those attained by wavelet estimators for the derivatives of a density (see, e.g., [15,17,18]).
The organization of this note is as follows. The next section describes some basics on wavelets and Besov balls. Our estimators and their rates of convergence are presented in Section 3. The proofs are carried out in Section 4.

Preliminaries
This section is devoted to the presentation of the considered wavelet basis and the Besov balls.

Besov Balls.
We consider the following wavelet sequential definition of the Besov balls. We say that $h \in B^{s}_{p,r}(M)$ with $s > 0$, $p \ge 1$, $r \ge 1$, and $M > 0$ if there exists a constant $C > 0$ such that its wavelet coefficients (6) satisfy [...], with the usual modifications if $p = \infty$ or $r = \infty$. The interest of Besov balls is that they contain various kinds of homogeneous and inhomogeneous functions $h$. For particular choices of $s$, $p$, and $r$, $B^{s}_{p,r}(M)$ corresponds to standard balls of function spaces, such as the Hölder and Sobolev balls (see, e.g., [11,22]).
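For orientation, the sequence-norm condition that usually defines such balls in the wavelet literature is sketched here. It is the standard textbook form, not a verbatim restoration of the display in this note; the coarsest level $\tau$, the unspecified ranges of $k$, and the coefficient names $c_{\tau,k}$ (scaling) and $d_{j,k}$ (detail) are assumptions.

```latex
\[
2^{\tau(1/2 - 1/p)} \Bigl( \sum_{k} |c_{\tau,k}|^{p} \Bigr)^{1/p}
+ \Biggl( \sum_{j=\tau}^{\infty}
    \Bigl( 2^{j(s + 1/2 - 1/p)} \Bigl( \sum_{k} |d_{j,k}|^{p} \Bigr)^{1/p} \Bigr)^{r}
  \Biggr)^{1/r}
\le C .
\]
```

In this form, the weight $2^{j(s + 1/2 - 1/p)}$ forces the detail coefficients to decay faster across levels $j$ as the smoothness $s$ increases, while $p < 2$ allows a few large coefficients, which is how spatially inhomogeneous functions are accommodated.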

Results
In this section, we set the assumptions on the model, present our wavelet estimators, and determine their rates of convergence under the MISE over Besov balls.
(K1) We have $f^{(u)}(0) = f^{(u)}(1) = 0$ for any $u \in \{0, \ldots, m\}$.
(K2) There exists a constant $c_1 > 0$ such that $\sup$ [...].
(K3) There exists a constant $c_2 > 0$ such that [...].
(K4) There exists a constant $c_3 > 0$ such that $\sup$ [...].

3.2. Wavelet Estimators: When $g$ Is Known.
We consider the wavelet basis $\mathcal{B}$ with $N > 5$ to ensure that $\phi$ and $\psi$ belong to [...].

Linear Wavelet Estimator.
We define the linear wavelet estimator $\hat f^{(m)}_1$ by [...], where [...] and $j_0$ is an integer chosen a posteriori. The definition of $\hat c^{(m)}_{j,k}$ is motivated by the following unbiasedness property: using the independence between $X_1$ and $\xi_1$, $E(\xi_1) = 0$, and $m$ integrations by parts with (K1), we obtain [...], which is the wavelet coefficient of $f^{(m)}$ associated with $\phi_{j,k}$.
International Journal of Mathematics and Mathematical Sciences
This approach was initially introduced by Prakasa Rao [15] for the estimation of the derivatives of a density. Its adaptation to (1) gives a suitable alternative to the wavelet methods developed by Cai [13] and Petsa and Sapatinas [14] in the case $X_i = i/n$, especially in the treatment of the random design.
Note that, for the standard case $m = 0$, this estimator has been considered and studied in Chesneau [23].
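The unbiasedness argument behind the linear estimator can be checked numerically. The sketch below assumes an empirical coefficient of the form $(-1)^m n^{-1} \sum_i Y_i\, \phi^{(m)}(X_i) / g(X_i)$, which is a reconstruction from the verbal description rather than the displayed formula of the note. The regression function `f`, the smooth test function `phi` (standing in for a wavelet $\phi_{j,k}$), the uniform design ($g = 1$), and the noise level are all hypothetical choices made for illustration, with $m = 1$.

```python
import math
import random

def f(x):
    # Hypothetical regression function with f(0) = f(1) = 0 and f'(0) = f'(1) = 0,
    # so the boundary condition (K1) holds for m = 1.
    return x ** 2 * (1 - x) ** 2

def f_prime(x):
    return 2 * x * (1 - x) * (1 - 2 * x)

def phi(x):
    # Hypothetical smooth test function standing in for phi_{j,k}.
    return math.cos(3 * x)

def phi_prime(x):
    return -3 * math.sin(3 * x)

def integrate(h, a=0.0, b=1.0, steps=20000):
    # Composite midpoint rule on [a, b].
    w = (b - a) / steps
    return w * sum(h(a + (i + 0.5) * w) for i in range(steps))

m = 1

# Coefficient of the derivative: integral of f^{(m)} * phi.
target = integrate(lambda x: f_prime(x) * phi(x))

# Same quantity after m integrations by parts (boundary terms vanish by (K1)).
by_parts = (-1) ** m * integrate(lambda x: f(x) * phi_prime(x))

# Monte Carlo version with a random design: X ~ U[0, 1], hence g = 1 (assumed known).
rng = random.Random(0)
n = 100000
total = 0.0
for _ in range(n):
    x = rng.random()
    y = f(x) + rng.gauss(0.0, 0.1)   # Y = f(X) + noise, noise independent of X
    total += y / 1.0 * phi_prime(x)  # Y / g(X) * phi^{(m)}(X)
c_hat = (-1) ** m * total / n

print(target, by_parts, c_hat)
```

The first two numbers agree up to quadrature error, which is the integration-by-parts step; the third is the random-design average and fluctuates around them at the usual $n^{-1/2}$ scale.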
Theorem 1 investigates the rate of convergence attained by $\hat f^{(m)}_1$ under the MISE assuming that $f^{(m)}$ belongs to Besov balls.
In the rest of the study, the rate of convergence $n^{-2s/(2s+2m+1)}$ will be taken as a benchmark. We do not claim that it is optimal in the minimax sense, since the lower bounds are not determined. It is, however, a serious candidate on heuristic grounds.
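A short worked example may help in reading the benchmark rate $n^{-2s/(2s+2m+1)}$. The values of $s$ and $m$ used here are purely illustrative.

```latex
\[
\frac{2s}{2s+2m+1}\bigg|_{s=2,\,m=0} = \frac{4}{5},
\qquad
\frac{2s}{2s+2m+1}\bigg|_{s=2,\,m=1} = \frac{4}{7},
\qquad
\frac{2s}{2s+2m+1}\bigg|_{s=2,\,m=2} = \frac{4}{9}.
\]
```

For $m = 0$ the exponent reduces to the familiar $2s/(2s+1)$ of standard nonparametric regression. Each additional derivative adds 2 to the denominator, so for a fixed smoothness $s$ the benchmark rate becomes slower as the order $m$ increases.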
Hard Thresholding Wavelet Estimator.
We define the hard thresholding wavelet estimator $\hat f^{(m)}_2$ by [...], $x \in [0, 1]$, where $\hat c^{(m)}_{j,k}$ is defined by (12), $\mathbf{1}$ is the indicator function, $\kappa > 0$ is a large enough constant, $j_1$ is the integer satisfying [...], and [...]. The construction of $\hat f^{(m)}_2$ is an adaptation of the hard thresholding wavelet estimator introduced by Delyon and Juditsky [24] to the estimation of $f^{(m)}$ from (1). It uses the modern version developed by Chaubey et al. [25]. The advantage of $\hat f^{(m)}_2$ is that it is adaptive: thanks to the thresholding in (17), its performance does not depend on the knowledge of the smoothness of $f^{(m)}$. The second thresholding in (17) enables us to relax some assumptions on the model and, in particular, to suppose only $E(\xi_1^2) < \infty$ on $\xi_1$ (its density can be unknown). Basics and important results on hard thresholding wavelet estimators can be found in, for example, Donoho and Johnstone [26,27], Donoho et al. [28,29], and Delyon and Juditsky [24].
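The generic hard thresholding rule, keeping an estimated detail coefficient only when its magnitude exceeds a threshold, can be sketched as follows. This is a minimal illustration of the selection step alone, not the estimator of (16) and (17): the function name `hard_threshold`, the threshold scale $\sqrt{\ln n / n}$, the value of `kappa`, and the sample coefficients are all assumptions, and the level-dependent factors and the second truncation mentioned above are not modelled.

```python
import math

def hard_threshold(coeffs, kappa, lam):
    """Return a copy of coeffs where every entry with |d| < kappa * lam is set to 0."""
    return [d if abs(d) >= kappa * lam else 0.0 for d in coeffs]

n = 1000
lam = math.sqrt(math.log(n) / n)   # assumed threshold scale, roughly 0.083 for n = 1000

# Hypothetical estimated detail coefficients: two large ones and three at noise level.
estimated = [0.90, -0.02, 0.05, -0.40, 0.01]
kept = hard_threshold(estimated, kappa=1.0, lam=lam)
print(kept)
```

Only the coefficients that stand out from the noise level survive, which is why the rule does not need the smoothness of the target function as an input.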
Theorem 2 determines the rate of convergence attained by $\hat f^{(m)}_2$ under the MISE assuming that $f^{(m)}$ belongs to Besov balls.

Theorem 2. Suppose that (K1), (K2), and (K3) are satisfied and that $f^{(m)}$ [...]. Let $\hat f^{(m)}_2$ be defined by (16). Then there exists a constant $C > 0$ such that [...].

The proof is based on a general result proved in [25, Theorem 6.1]. Let us observe that, for the case $p \ge 2$, $(\ln n / n)^{2s/(2s+2m+1)}$ is equal to the rate of convergence attained by $\hat f^{(m)}_1$ up to a logarithmic factor (see Theorem 1). However, for the case $p \in [1, 2)$, it is significantly better in terms of power.

Wavelet Estimators: When $g$ Is Unknown.
In the case where $g$ is unknown, we propose the linear wavelet estimator $\hat f^{(m)}_3$ [...], where [...] $a_n = [n/2]$, $j_2$ is an integer chosen a posteriori, $c_2$ refers to (K3), and $\hat g$ is an estimator of $g$ constructed from the random variables $U_n = (X_{a_n+1}, \ldots, X_n)$. For instance, we can consider the linear wavelet estimator $\hat g$ defined by [...], where [...] and $j_3$ is an integer chosen a posteriori.
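The sample-splitting idea described above can be sketched in code: one half of the design points is used to estimate the density, and the other half is used for the coefficient, with observations discarded wherever the density estimate is too small. This is a sketch under stated assumptions and not the construction of (20) to (22): a histogram replaces the wavelet estimator $\hat g$, the guard value `floor` is a free parameter here, and `f`, `phi_prime`, the uniform design, and the noise level are hypothetical.

```python
import math
import random

def f(x):
    # Hypothetical regression function satisfying (K1) for m = 1.
    return x ** 2 * (1 - x) ** 2

def phi_prime(x):
    # Derivative of a hypothetical smooth test function phi(x) = cos(3x).
    return -3 * math.sin(3 * x)

def split_sample(xs, ys):
    """First half (X_i, Y_i) for the coefficient, second half of the X_i for the density."""
    a = len(xs) // 2
    return xs[:a], ys[:a], xs[a:]

def histogram_density(points, bins=10):
    """Histogram density estimate on [0, 1]; a stand-in for the wavelet estimator of g."""
    counts = [0] * bins
    for x in points:
        counts[min(int(x * bins), bins - 1)] += 1
    total = len(points)
    return lambda x: counts[min(int(x * bins), bins - 1)] * bins / total

def plug_in_coefficient(xs, ys, g_hat, dphi, m, floor):
    """(-1)^m times the average of Y_i * dphi(X_i) / g_hat(X_i), skipping |g_hat(X_i)| < floor."""
    total = 0.0
    for x, y in zip(xs, ys):
        gx = g_hat(x)
        if abs(gx) >= floor:          # guard against dividing by a tiny density estimate
            total += y / gx * dphi(x)
    return (-1) ** m * total / len(xs)

rng = random.Random(1)
n = 40000
xs = [rng.random() for _ in range(n)]            # X ~ U[0, 1]; g is treated as unknown
ys = [f(x) + rng.gauss(0.0, 0.1) for x in xs]    # Y = f(X) + noise

x_first, y_first, x_second = split_sample(xs, ys)
g_hat = histogram_density(x_second)
c_hat = plug_in_coefficient(x_first, y_first, g_hat, phi_prime, m=1, floor=0.5)
print(c_hat)
```

Because the density estimate is built from observations that are independent of those entering the average, it can be treated as fixed when the average is analysed, which is the simplification that the partitioning of the variables is meant to provide.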

The estimator $\hat f^{(m)}_3$ is close to the "NES linear wavelet estimator" proposed by Pensky and Vidakovic [16] for $m = 0$. However, there are notable differences in the thresholding in (21), the partitioning of the variables, and the definition of $\hat g$, which make the study of its performance under the MISE simpler (see the proof of Theorem 3 below).
Theorem 3 determines an upper bound of the MISE of $\hat f^{(m)}_3$ and then exhibits its rate of convergence when $f^{(m)}$ belongs to Besov balls.

Theorem 3. Suppose that (K1), (K2), and (K3) are satisfied and that [...]. Let $\hat f^{(m)}_3$ be defined by (20). Then there exists a constant $C > 0$ such that [...] with [...]. [...] with the estimator $\hat g$ defined by (22) with $j_3$ such that [...]. Then there exists a constant $C > 0$ such that [...].

The first point of Theorem 3 is proved for any estimator $\hat g$ of $g$ depending on $U_n$. Taking $\hat g = g$, it corresponds to the upper bound of the MISE for $\hat f^{(m)}_1$ established in the proof of Theorem 1. Note that the rate of convergence described in the second point is slower than the one attained by $\hat f^{(m)}_1$ (see Theorem 1). The fact that the smoothness of $g$ influences the performance of $\hat g$ and, a fortiori, $\hat f^{(m)}_3$ seems natural. This phenomenon also appears in [16, Theorem 2.1], for $m = 0$.

Remark 4. If $c_2$ exists but is unknown, we can define $\hat f^{(m)}_3$ as (20) with $1/\ln n$ instead of $c_2$ in the threshold of (21). The impact of this modification is a logarithmic term in Theorem 3; that is, [...]. Moreover, choosing $j_2$ such that [...], there exists a constant $C > 0$ such that [...].

Remark 5. Note that the assumption (K4) has been used only in the second point of Theorem 3.

Conclusion and Perspectives.
We explore the estimation of $f^{(m)}$ from (1). Distinguishing the cases where $g$ is known or not, we propose wavelet methods and prove that they attain fast rates of convergence under the MISE assuming that $f^{(m)} \in B^{s}_{p,r}(M)$. Perspectives of this work are (i) to develop an adaptive wavelet estimator, such as the hard thresholding one, for the estimation of $f^{(m)}$ in the case where $g$ is unknown; (ii) to relax assumptions on the model: several techniques exist to relax (K3), that is, to allow $g$ to have zeros (see, for example, Kerkyacharian and Picard [30], Gaïffas [31], and Antoniadis et al. [32]), but their adaptation to the estimation of $f^{(m)}$ is more difficult than it appears at first glance; (iii) to consider dependent $(Y_1, X_1), \ldots, (Y_n, X_n)$.
These aspects need further investigation, which we leave for future work.

Proofs
In this section, $C$ denotes any constant that does not depend on $j$, $k$, and $n$. Its value may change from one term to another and may depend on $\phi$ or $\psi$.

2.1. Wavelet Basis. We set