A Note on the Adaptive Estimation of a Conditional Continuous-Discrete Multivariate Density by Wavelet Methods

The estimation of conditional densities is an important statistical challenge with applications in many practical problems, especially those connected with forecasting (economics, etc.). There is a vast literature in this area. We refer to the papers of Li and Racine [1], Akakpo and Lacour [2], and Chagny [3] and the references therein. In this paper we focus our attention on a specific problem: the estimation of a multivariate continuous-discrete conditional density. The considered model is described as follows. Let d, d*, ℓ, and n be positive integers and let (X1, Y1), ..., (Xn, Yn) be n iid random vectors defined on the probability space (Ω, A, ℙ). We suppose that X1 is continuous with support [0, 1]^d and that Y1 is discrete with support {0, 1, ..., ℓ}^{d*}. Let f be the density of (X1, Y1). We define the density function of X1 conditionally to the event {Y1 = m} by g(x, m) = f(x, m)/ℙ(Y1 = m), (x, m) ∈ [0, 1]^d × {0, 1, ..., ℓ}^{d*}.


Introduction
The estimation of conditional densities is an important statistical challenge with applications in many practical problems, especially those connected with forecasting (economics, etc.). There is a vast literature in this area. We refer to the papers of Li and Racine [1], Akakpo and Lacour [2], and Chagny [3] and the references therein. In this paper we focus our attention on a specific problem: the estimation of a multivariate continuous-discrete conditional density. The considered model is described as follows. Let d, d*, ℓ, and n be positive integers and let (X1, Y1), ..., (Xn, Yn) be n iid random vectors defined on the probability space (Ω, A, ℙ). We suppose that X1 is continuous with support [0, 1]^d and that Y1 is discrete with support {0, 1, ..., ℓ}^{d*}. Let f be the density of (X1, Y1). We define the density function of X1 conditionally to the event {Y1 = m} by g(x, m) = f(x, m)/ℙ(Y1 = m), (x, m) ∈ [0, 1]^d × {0, 1, ..., ℓ}^{d*}. We aim to estimate g(x, m) from (X1, Y1), ..., (Xn, Yn). The most common approach is based on the kernel methods developed by Li and Racine [4].
Applications and recent developments for these methods are described in detail in Li and Racine [1].
In this paper we develop a new estimator ĝ(x, m) based on wavelet methods. It is now an established fact that, in comparison to kernel methods, wavelet methods have the advantage of achieving a high degree of adaptivity for a large class of unknown functions, with possibly complex discontinuities (jumps, spikes, etc.). See, for instance, Antoniadis [5], Härdle et al. [6], and Vidakovic [7]. This fact motivates our interest in developing wavelet methods for the considered conditional density estimation problem. The main ingredients in the construction of ĝ(x, m) are an estimation of f(x, m) with a new wavelet estimator f̂(x, m), an estimation of ℙ(Y1 = m) by an empirical estimator, and a global thresholding technique developed by Vasiliev [8]. In particular, the considered estimator f̂(x, m) can be viewed as a multivariate (but "nonsmooth") version of the one introduced in the univariate case, that is, d = d* = 1, in Chesneau et al. [9]. We prove that ĝ(x, m) is both adaptive and efficient; its construction does not depend on the smoothness of g(x, m) and, under mild assumptions on the smoothness of g(x, m) (we assume that it belongs to a wide class of functions, the so-called Besov balls), it attains fast rates of convergence under the Lp risk (with p ≥ 1). These theoretical guarantees are illustrated by a numerical study showing the good practical performance of our estimator.
The remainder of this paper is set out as follows. In Section 2 we briefly describe the considered multidimensional wavelet bases and Besov balls. Our wavelet estimator and some of its theoretical properties are presented in Section 3. A short numerical study can be found in Section 4. Finally, the proofs are postponed to Section 5.

Conditional Density Estimation
We formulate the following assumptions.
(B2) There exists a known constant c* ∈ (0, 1) such that ℙ(Y1 = m) ≥ c* for every m ∈ {0, 1, ..., ℓ}^{d*}.

We propose a "ratio-thresholding estimator" ĝ(x, m) for g(x, m), defined as the ratio of a wavelet estimator f̂(x, m) to the empirical frequency ρ̂_m = (1/n) Σ_{i=1}^{n} 1{Y_i = m}, where 1 denotes the indicator function and the thresholding is governed by the constant c* of (B2). The estimator f̂(x, m) is defined with a large enough constant λ and an integer j1 such that n/ln(n) ≤ 2^{j1} ≤ 2n/ln(n). The estimator (10) uses a hard thresholding technique on the wavelet coefficient estimators (12). Such a selection rule is at the heart of the adaptive nature of wavelet methods, which have the ability to capture the most important wavelet coefficients of a function, that is, those with high magnitudes.
We refer to Antoniadis [5], Härdle et al. [6], and Vidakovic [7] for further details. The definition of the threshold, that is, λ_n = √(ln(n)/n), corresponds to the universal one proposed by Donoho and Johnstone [14] and Donoho et al. [15]. It is based on technical considerations ensuring good convergence properties of the hard thresholding wavelet estimator (see also Theorem A.3 in the Appendix).
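As a minimal illustration of the universal threshold, the following sketch keeps an empirical wavelet coefficient only when its magnitude exceeds λ_n = √(ln(n)/n). The paper's own code is in MATLAB; this is a Python sketch with hypothetical function names, not the authors' implementation.

```python
import numpy as np

def universal_threshold(n):
    """Universal threshold lambda_n = sqrt(ln(n) / n) of Donoho and Johnstone."""
    return np.sqrt(np.log(n) / n)

def hard_threshold(coeffs, n):
    """Hard thresholding: keep a coefficient unchanged if its magnitude
    exceeds lambda_n, and set it to zero otherwise."""
    lam = universal_threshold(n)
    return np.where(np.abs(coeffs) > lam, coeffs, 0.0)
```

For n = 100, λ_n ≈ 0.215, so a coefficient of 0.5 is kept while a coefficient of 0.1 is set to zero.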
Note that (10) can be viewed as a nonsmooth multivariate version of the estimator proposed by Chesneau et al. [9]. The main advantage of this estimator is that it is easier to implement from a practical point of view (see Section 4 below for a numerical comparison in the univariate case). Concerning ρ̂_m, let us mention that it is a natural unbiased estimator of ℙ(Y1 = m) with nice convergence properties, which will be used in the proof of our main result.
The global construction of (9) follows the idea proposed by Vasiliev [8] for other statistical contexts. Note that a control on the lower bound of ρ̂_m is necessary; it must be large enough to ensure good statistical properties for (9).
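The ratio step can be sketched as follows in Python (hypothetical names; in particular, the exact lower-bound condition on ρ̂_m used in (9) is assumed here to be ρ̂_m ≥ c*/2, which is only a placeholder, not the paper's stated cutoff):

```python
import numpy as np

def rho_hat(y, m):
    """Empirical frequency rho_hat_m = (1/n) sum_i 1{Y_i = m},
    an unbiased estimator of P(Y_1 = m)."""
    return np.mean(np.asarray(y) == m)

def ratio_threshold(f_hat_values, y, m, c_star):
    """Ratio-thresholding sketch: divide the joint-density estimate by
    rho_hat_m, but return 0 when rho_hat_m falls below the assumed
    cutoff c_star / 2 (placeholder for the paper's exact condition)."""
    rho = rho_hat(y, m)
    if rho >= c_star / 2:
        return np.asarray(f_hat_values, dtype=float) / rho
    return np.zeros_like(np.asarray(f_hat_values, dtype=float))
```

The cutoff prevents division by a near-zero denominator, which is exactly the role of the lower-bound control discussed above.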
The following result investigates the rates of convergence attained by (9) under the Lp risk with p ≥ 1.

Theorem 1. Let p ≥ 1 and let ĝ(x, m) be defined by (9) with a large enough λ (the exact condition is described in (29)). Suppose that (B1) and (B2) hold and that f(x, m) belongs to a Besov ball B^s_{π,r}(M) with s > 0, s > d/π, π ≥ 1, and r ≥ 1. Then there exists a constant C > 0 such that, for n large enough, ĝ(x, m) attains a fast rate of convergence under the Lp risk.

The proof of Theorem 1 is based on several technical inequalities and the application of a general result derived from [16, Theorem 5.1] and [17, Theorem 1] (see Theorem A.3 in the Appendix).
Theorem 1 provides theoretical guarantees on the convergence of (9) under the Lp risk, under mild assumptions on the smoothness of f(x, m) and a fortiori g(x, m). The obtained rates of convergence are sharp. However, since lower minimax bounds have not been established in our setting, we do not claim that they are optimal in the minimax sense. An important benchmark is that they correspond, up to a logarithmic term, to the optimal minimax rates for the standard multivariate density estimation problem, which corresponds to d* = 1 with Y1 constant almost surely (see [15]).
Finally, note that the factor d* plays a secondary role in our study; it only appears in the presentation of the model and in the construction of ρ̂_m, and the performance of the estimator does not depend on the value of d*.

A Short Numerical Study
In this section we investigate some practical aspects of our wavelet methods. For the sake of simplicity, we focus our attention on the univariate case, that is, d = d* = 1. The codes are written in MATLAB and are adapted from Ramirez and Vidakovic [18]. First we compare the performance of the new estimators of the density function f(x, y) with those proposed in our former publication, Chesneau et al. [9], in two respects: accuracy and speed of computation. In order to illustrate the rate of decrease of the errors, as in Chesneau et al. [9], we employ the indicator defined by the average, over N replications with sample size n, of the L2 error between the true density f and an estimator f̂. We consider three estimators based on our statistical methodology: the linear wavelet estimator, the hard thresholding wavelet estimator defined by (10), and the smooth version of the linear wavelet estimator after local linear regression (see, e.g., [19]). The practical construction of this smooth version was proposed by Ramirez and Vidakovic [18]. Several studies confirm that this version has nice performance in different fields (see, e.g., [20, 21]). We adopt a setup similar to Chesneau et al. [9] for our example; that is, we use Daubechies's compactly supported wavelet "Daubechies 3" and we take j0 = 6. Also, for the different sample sizes n = 20, 50, 100, 200, 500, and 1000, we generate data points X1, ..., Xn from the Beta(2, 3) distribution. The discrete random sample is generated from a Binomial(1, ·) distribution; the bivariate density function f(x, y) is defined on [0, 1] × {0, 1}. Table 2 depicts the computation times of the two groups of estimators, in seconds. The codes are run on an ordinary laptop with 4.3 GB of RAM. As we see, the computation time of the new version of the estimators is much smaller than that of the former version. For example, when the sample size is 1000, the computation time is about 200 times smaller than for the former version of the wavelet density estimators. This difference grows as the sample size increases.
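The error indicator described above can be approximated as in the following Python sketch (the paper's code is MATLAB; here `estimator` and `true_density` are hypothetical callables, and the integrated squared error is discretized on a grid):

```python
import numpy as np

def l2_indicator(estimator, true_density, n, N, seed=0):
    """Average, over N replications of size n, of the discretized
    integrated squared error between an estimator and the true density."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 256)
    errors = []
    for _ in range(N):
        x = rng.beta(2.0, 3.0, size=n)   # Beta(2, 3) sample, as in the study
        f_hat = estimator(x, grid)       # estimator evaluated on the grid
        errors.append(np.mean((f_hat - true_density(grid)) ** 2))
    return float(np.mean(errors))
```

As a sanity check, an "oracle" estimator that returns the true Beta(2, 3) density 12x(1 − x)^2 yields an indicator of exactly zero.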
In the second part of this section we show the performance of the proposed estimators of the conditional density function. Note that the conditional density function in the above example is supported on x ∈ [0, 1]. Figures 1 and 2 depict g(x, 0) and g(x, 1), respectively. In each case the true conditional density function is shown as a black line, the linear wavelet estimator in blue, the hard thresholding wavelet estimator in red, and the smooth version of the linear one in green. All the figures illustrate the good performance of our proposed linear and nonlinear estimators of the conditional density function. It should be noted that the hard thresholding estimator has no tuning parameter; it is entirely adaptive. The smooth version of our linear wavelet estimator has the best performance. Furthermore, Table 3 shows the impact of the sample size on the performance of the three estimators. The number of replications is 500. As the sample size increases, the value of the indicator decreases, and the performance of the smooth version of the linear wavelet estimator is the best.

Proof of Theorem 1
In what follows, C denotes any constant that does not depend on j, k, and n. Its value may change from one term to another.
Owing to (B2), the Lp risk of ĝ(x, m) can be decomposed into two terms, which we now bound in turn.

Upper Bound for the First Term. We investigate an upper bound by using Theorem A.3 in the Appendix. First of all, thanks to (B1), which implies that f(x, m) ∈ L2([0, 1]^d), let us expand the density f(x, m) on the considered wavelet basis. We prove similarly that 𝔼(ĉ_{j,k}(m)) = c_{j,k}(m).
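The unbiasedness of ĉ_{j,k}(m) follows in one line, assuming the empirical coefficient takes the standard form ĉ_{j,k}(m) = (1/n) Σ_{i=1}^{n} φ_{j,k}(X_i) 1{Y_i = m} (this form is an assumption here, chosen to be consistent with the model and with ρ̂_m):

```latex
\mathbb{E}\bigl(\hat{c}_{j,k}(m)\bigr)
  = \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}\bigl(\phi_{j,k}(X_i)\,\mathbf{1}_{\{Y_i=m\}}\bigr)
  = \mathbb{E}\bigl(\phi_{j,k}(X_1)\,\mathbf{1}_{\{Y_1=m\}}\bigr)
  = \int_{[0,1]^d}\phi_{j,k}(x)\,f(x,m)\,dx
  = c_{j,k}(m),
```

where the second equality uses that the (X_i, Y_i) are iid.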

Investigation of (C1).
This completes the proof of Theorem 1.

Figure 1: The true conditional density function g(x, 0) is shown as a black line, the linear wavelet estimator in blue, the hard thresholding wavelet estimator in red, and its smooth version in green, with n = 500.

Figure 2: The true conditional density function g(x, 1) is shown as a black line, the linear wavelet estimator in blue, the hard thresholding wavelet estimator in red, and its smooth version in green, with n = 500.

Table 1: Computed values of the L2-norm for various sample sizes.

Table 1 reports the values of the L2-norm computed from 100 simulations for different sample sizes. This table should be compared with Table 1 on page 70 of Chesneau et al. [9]. As we see, similar results are obtained; the L2-norm decreases while the sample size increases. The performance of the smooth version of the linear wavelet estimator is the best, and there is no significant difference between the new version of the estimators and the former versions in Chesneau et al. [9].

Table 3: Computed values of 100 × L2-norm for various sample sizes.