Bandwidth Selection for Recursive Kernel Density Estimators Defined by Stochastic Approximation Method

Abstract. We propose an automatic selection of the bandwidth of the recursive kernel estimators of a probability density function defined by the stochastic approximation algorithm introduced by Mokkadem et al. (2009a). We show that, using the selected bandwidth and the stepsize which minimizes the MISE (mean integrated squared error) of the class of the recursive estimators defined in Mokkadem et al. (2009a), the recursive estimator is better than the nonrecursive one in small sample settings, in terms of both estimation error and computational cost. We corroborate these theoretical results through a simulation study.


Introduction
The problem of automatically choosing the smoothing parameter has been widely studied. There are many reasons to use an automatic choice; one is that, in many situations, smoothing is carried out by nonexperts. In this paper we focus only on one-dimensional kernel density estimation, although the main ideas are useful in all types of nonparametric curve estimation, including regression, distribution, and time series estimation. The bandwidth selection methods studied in the literature can be divided into two broad classes: cross-validation techniques and plug-in ideas.
There are many varieties of the cross-validation technique: pseudolikelihood cross-validation [1], least squares cross-validation [2], and biased cross-validation [3]. Reviews of all these bandwidth selection methods can be found in Marron [4].
Plug-in methods [5], also called "second generation methods" [6], need a pilot bandwidth to estimate the unknown quantities. A number of approaches for choosing the pilot bandwidth have been proposed; see Jones et al. [7] for details and references. An interesting approach is the smoothed bootstrap [8]. In this paper, we develop a specific second generation bandwidth selection method for the recursive kernel estimators of a probability density function defined by the stochastic approximation algorithm introduced by Mokkadem et al. [9].
Let $X_1, \ldots, X_n$ be independent, identically distributed random variables and let $f$ denote the probability density of $X_1$. To construct a stochastic algorithm which approximates the function $f$ at a given point $x$, Mokkadem et al. [9] define an algorithm of search of the zero of the function $h : y \mapsto f(x) - y$. Following Robbins-Monro's procedure, this algorithm is defined by setting $f_0(x) \in \mathbb{R}$ and, for all $n \geq 1$,
$$f_n(x) = f_{n-1}(x) + \gamma_n W_n(x), \qquad (1)$$
where $W_n(x)$ is an "observation" of the function $h$ at the point $f_{n-1}(x)$ and the stepsize $(\gamma_n)$ is a sequence of positive real numbers that goes to zero. To define $W_n(x)$, Mokkadem et al. [9] follow the approach of Révész [10, 11] and of Tsybakov [12] and introduce a kernel $K$ (i.e., a function satisfying $\int_{\mathbb{R}} K(z)\,dz = 1$) and a bandwidth $(h_n)$ (i.e., a sequence of positive real numbers that goes to zero), and set $W_n(x) = h_n^{-1} K(h_n^{-1}(x - X_n)) - f_{n-1}(x)$. Then, the estimator $f_n$ which recursively estimates the density function $f$ at the point $x$ can be written as
$$f_n(x) = (1 - \gamma_n) f_{n-1}(x) + \gamma_n h_n^{-1} K\!\left(\frac{x - X_n}{h_n}\right). \qquad (2)$$
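As an illustration, the recursive update (2) can be sketched in a few lines (a minimal sketch, not the paper's software: it assumes a Gaussian kernel and the illustrative defaults $\gamma_n = n^{-1}$ and $h_n = n^{-1/5}$; the function name is ours):

```python
import numpy as np

def recursive_kde(x_grid, sample, gammas=None, bandwidths=None):
    """Build the recursive estimate f_n on a grid, one observation at a time:
    f_n(x) = (1 - gamma_n) f_{n-1}(x) + gamma_n h_n^{-1} K((x - X_n)/h_n),
    with a Gaussian kernel K and f_0 = 0."""
    n = len(sample)
    if gammas is None:                        # illustrative choice gamma_n = 1/n
        gammas = 1.0 / np.arange(1, n + 1)
    if bandwidths is None:                    # illustrative choice h_n = n^{-1/5}
        bandwidths = np.arange(1, n + 1) ** (-1.0 / 5.0)
    f = np.zeros_like(x_grid, dtype=float)    # f_0 = 0
    for k in range(n):
        h = bandwidths[k]
        kern = np.exp(-0.5 * ((x_grid - sample[k]) / h) ** 2) / (h * np.sqrt(2.0 * np.pi))
        f = (1.0 - gammas[k]) * f + gammas[k] * kern
    return f
```

Note that with $\gamma_1 = 1$ the initial value $f_0 = 0$ receives zero weight, so $f_n$ is a convex combination of rescaled kernels and integrates to one.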

Journal of Probability and Statistics
This estimator was introduced by Mokkadem et al. [9]; its large and moderate deviation principles were established by Slaoui [13].
Throughout this paper, we suppose that $f_0(x) = 0$ and we let $\Pi_n = \prod_{k=1}^{n}(1 - \gamma_k)$; then it follows from (2) that one can estimate $f$ recursively at the point $x$ by
$$f_n(x) = \Pi_n \sum_{k=1}^{n} \Pi_k^{-1} \gamma_k h_k^{-1} K\!\left(\frac{x - X_k}{h_k}\right). \qquad (3)$$
Moreover, it was shown in Mokkadem et al. [9] that the bandwidth which minimizes the MISE of $f_n$ depends on the choice of the stepsize $(\gamma_n)$; they show in particular that, under some regularity conditions on $f$, the sequence $(\gamma_n) = (n^{-1})$ belongs to the set of MISE-minimizing stepsizes, and that the bandwidth $(h_n)$ must then be of order $n^{-1/5}$ (see (4)). The first aim of this paper is to propose an automatic selection of such a bandwidth through a plug-in method, and the second aim is to give the conditions under which the recursive estimator $f_n$ is better than the nonrecursive kernel density estimator introduced by Rosenblatt [14] (see also Parzen [15]) and defined as
$$\widetilde{f}_n(x) = \frac{1}{n h_n} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h_n}\right). \qquad (5)$$
The simulation results given in Section 3 corroborate these theoretical results. The remainder of the paper is organized as follows. In Section 2, we state our main results. Section 3 is devoted to our simulation results. We conclude the paper in Section 4.
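For comparison, Rosenblatt's nonrecursive estimator (5) can be sketched as follows (again with a Gaussian kernel; the helper name is ours):

```python
import numpy as np

def rosenblatt_kde(x_grid, sample, h):
    """Nonrecursive Rosenblatt-Parzen estimator with one fixed bandwidth h:
    f~_n(x) = (n h)^{-1} sum_i K((x - X_i)/h), Gaussian kernel K."""
    u = (x_grid[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (len(sample) * h * np.sqrt(2.0 * np.pi))
```

Unlike (3), every observation here is smoothed with the same bandwidth $h$, so the whole sample must be revisited when $h$ changes.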

Assumptions and Main Results
We define the following class of regularly varying sequences.
Definition 1. Let $\gamma \in \mathbb{R}$ and let $(v_n)_{n \geq 1}$ be a nonrandom positive sequence. One says that $(v_n) \in \mathcal{GS}(\gamma)$ if
$$\lim_{n \to +\infty} n\left[1 - \frac{v_{n-1}}{v_n}\right] = \gamma. \qquad (6)$$
Condition (6) was introduced by Galambos and Seneta [16] to define regularly varying sequences (see also Bojanic and Seneta [17]) and by Mokkadem and Pelletier [18] in the context of stochastic approximation algorithms. Note that the acronym GS stands for [16]. Typical sequences in $\mathcal{GS}(\gamma)$ are, for $b \in \mathbb{R}$, $n^{\gamma}(\log n)^{b}$, $n^{\gamma}(\log\log n)^{b}$, and so on.
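The defining limit in (6) is easy to probe numerically. The snippet below (names ours) evaluates $n[1 - v_{n-1}/v_n]$ at a large $n$ for two typical sequences, $v_n = n^{2/5} \in \mathcal{GS}(2/5)$ and $v_n = n^{-1} \in \mathcal{GS}(-1)$:

```python
def gs_ratio(v, n):
    """Evaluate n * (1 - v(n-1)/v(n)); its limit, if it exists,
    is the GS exponent gamma of the sequence v."""
    return n * (1.0 - v(n - 1) / v(n))

# v_n = n^{2/5} should give a value close to 2/5 = 0.4,
# and v_n = 1/n a value close to -1.
print(gs_ratio(lambda n: n ** 0.4, 10 ** 6))
print(gs_ratio(lambda n: 1.0 / n, 10 ** 6))
```

Sequences with slowly varying factors such as $(\log n)^b$ satisfy (6) with the same $\gamma$, though the convergence of the ratio is then much slower.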
The assumptions to which we will refer are as follows.
Assumption (A2)(iii) on the limit of $(n\gamma_n)$ as $n$ goes to infinity is usual in the framework of stochastic approximation algorithms. It implies in particular that the limit of $([n\gamma_n]^{-1})$ is finite. Throughout this paper we will use the notations introduced above. In order to measure the quality of our recursive estimator (3), we use the quantity MISE*. Moreover, in the case $a = 1/5$, Proposition 1 in Mokkadem et al. [9] gives the bias and the variance of $f_n$, and hence an expansion of MISE*. The following corollary ensures that the bandwidth which minimizes MISE* depends on the stepsize $(\gamma_n)$, and then the corresponding MISE* also depends on the stepsize $(\gamma_n)$.

Moreover, the minimum of $\xi_0^{2}(\xi_0 - 2/5)^{-6/5}$ is reached at $\xi_0 = 1$; the bandwidth $(h_n)$ must then equal the expression given in (16), together with the corresponding MISE*. In order to estimate the optimal bandwidth (16), we must estimate $I_1$ and $I_2$. We follow the plug-in approach of Altman and Léger [19] and use a kernel estimator $\widehat{I}_1$ of $I_1$, where $K_b$ is a kernel and $b$ the associated bandwidth.
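The claim that $\xi_0^{2}(\xi_0 - 2/5)^{-6/5}$ is minimized at $\xi_0 = 1$ is elementary to verify numerically (a sanity check of ours, not part of the derivation):

```python
import numpy as np

xi = np.linspace(0.41, 3.0, 200001)       # the expression requires xi_0 > 2/5
g = xi ** 2 * (xi - 0.4) ** (-1.2)        # xi_0^2 (xi_0 - 2/5)^{-6/5}
print(float(xi[np.argmin(g)]))            # grid minimizer, close to 1
```

Setting the derivative to zero gives $2(\xi_0 - 2/5) = (6/5)\xi_0$, i.e. $\xi_0 = 1$, in agreement with the numerical check.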
In practice, we take $b$ as in (19) (see Silverman [20]), with $\widehat{s}$ the sample standard deviation and $Q_1$, $Q_3$ denoting the first and third quartiles, respectively.
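A Silverman-style pilot bandwidth of this form can be computed as below (a sketch: the normal-reference constant 1.06 and the exact combination of $\widehat{s}$ and the interquartile range are our assumptions for illustration; $\beta$ is the exponent used later, e.g. $2/5$ or $3/14$):

```python
import numpy as np

def pilot_bandwidth(sample, beta, c=1.06):
    """Silverman-style pilot bandwidth
    b_n = c * min(s_hat, (Q3 - Q1)/1.349) * n^{-beta},
    with s_hat the sample standard deviation and Q1, Q3 the quartiles."""
    s_hat = np.std(sample, ddof=1)
    q1, q3 = np.percentile(sample, [25, 75])
    return c * min(s_hat, (q3 - q1) / 1.349) * len(sample) ** (-beta)
```

The $\min$ of the two scale estimates makes the pilot bandwidth robust to heavy tails and outliers in the sample.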
The following theorem gives the bias and variance of $\widehat{I}_1$.
Theorem 4. Let assumptions (A2)-(A3) hold and suppose that the kernel $K_b$ satisfies assumption (A1) and that $(b_n) \in \mathcal{GS}(-\beta)$, with $\beta \in\, ]0, 1[$; one then has the stated expansions of the bias and variance of $\widehat{I}_1$. The following corollary shows that the bandwidth which minimizes the MISE of $\widehat{I}_1$ depends on the stepsize $(\gamma_n)$, and then the corresponding MISE also depends on the stepsize $(\gamma_n)$.
Furthermore, to estimate $I_2$, we introduce a kernel estimator $\widehat{I}_2$ built from $K^{(2)}_{b'}$, the second-order derivative of a kernel $K_{b'}$. The bias and variance of $\widehat{I}_2$ are computed in the following theorem.
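As a sketch of this idea (our reading, not necessarily the paper's exact $\widehat{I}_2$: a functional of the squared second derivative of $f$ is estimated by plugging in a kernel estimate of $f''$ built from the second derivative of the Gaussian kernel, $K^{(2)}(u) = (u^2 - 1)\varphi(u)$):

```python
import numpy as np

def second_deriv_kde(x, sample, b):
    """Kernel estimate of f''(x):
    f''_hat(x) = (n b^3)^{-1} sum_j K2((x - X_j)/b),
    where K2(u) = (u^2 - 1) phi(u) is the Gaussian kernel's second derivative."""
    u = (np.asarray(x)[..., None] - sample) / b
    phi = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return ((u ** 2 - 1.0) * phi).sum(axis=-1) / (len(sample) * b ** 3)

def i2_sketch(sample, b):
    """Average f''_hat(X_i)^2 over the sample -- one plug-in route to a
    functional of (f'')^2; an illustrative sketch only."""
    return float(np.mean(second_deriv_kde(sample, sample, b) ** 2))
```

The pilot bandwidth $b'$ used here must shrink more slowly than the one used for $\widehat{I}_1$, which is why a different exponent ($3/14$ rather than $2/5$) appears in the simulations.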
It follows that the expected MISE* of the recursive estimator defined by (3) is smaller than the expected MISE* of the nonrecursive estimator defined by (5) in small sample settings.

Simulation
The aim of our simulation study is to compare the performance of the nonrecursive Rosenblatt estimator defined in (5) with that of the recursive estimator defined in (3).
When applying $f_n$ one needs to choose three quantities.
(i) For the function $K$, we choose the normal kernel.
(ii) For the stepsize, we choose $(\gamma_n) = (n^{-1})$, the stepsize which minimizes the MISE of the class of recursive estimators defined in [9].
(iii) The bandwidth $(h_n)$ is chosen to be equal to (37).
When applying $\widetilde{f}_n$ one needs to choose two quantities.
(i) For the function $K$, as in the recursive framework, we use the normal kernel.
(ii) The bandwidth $(h_n)$ is chosen to be equal to (52).
In order to compare the two estimators, we consider two densities for $f$: the standard normal $\mathcal{N}(0, 1)$ distribution (see Table 1) and the exponential $\mathcal{E}(1/2)$ distribution (see Table 2). For each of these two cases, 500 samples of sizes $n = 50$ and $n = 100$ were generated. For each fixed bandwidth $h$, we computed the mean and the standard deviation (over the 500 samples) of $I_1$, $I_2$, $h_n$, and MISE*. The plug-in estimators (37) and (52) require two kernels to estimate $I_1$ and $I_2$; in both cases we use the normal kernel, with $b$ and $b'$ given by (19) with $\beta$ equal, respectively, to $2/5$ and $3/14$. Both tables show the following: (1) the bias and the standard deviation of $I_1$ obtained with the recursive algorithm (3) are very similar to those obtained with the nonrecursive estimator (5); (2) the bias and the standard deviation of $I_2$ obtained with the recursive algorithm (3) are always smaller than those obtained with the nonrecursive estimator (5); (3) the mean and the standard deviation of the bandwidths selected by the recursive estimator (3) are always smaller than those of the bandwidths selected by the nonrecursive estimator (5); and (4) the mean and the standard deviation of the MISE* corresponding to the bandwidths selected by the recursive estimator (3) are always smaller than those corresponding to the bandwidths selected by the nonrecursive estimator (5). In Tables 1 and 2 the Ref. column can be used as a reference for the mean of $I_1$ and $I_2$. Figures 1 and 2 show boxplots of the bandwidths selected by the two algorithms (3) and (5), respectively. For samples of sizes 50 and 100, the bandwidths selected by the recursive estimator (3) are always smaller than the bandwidths selected by the nonrecursive estimator (5).
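The Monte Carlo comparison can be sketched as follows (a compressed sketch of ours, not the study itself: 50 replications instead of 500, a fixed bandwidth rule $h_n = n^{-1/5}$ instead of the plug-in bandwidths (37) and (52), and only the normal target; both estimators are restated so the sketch is self-contained):

```python
import numpy as np

def gauss(u):
    """Standard normal kernel."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def recursive_kde(x, sample):
    """Recursive estimator (3) with gamma_n = 1/n and h_n = n^{-1/5}."""
    f = np.zeros_like(x)
    for k, xk in enumerate(sample, start=1):
        h = k ** (-0.2)
        f = (1.0 - 1.0 / k) * f + (1.0 / k) * gauss((x - xk) / h) / h
    return f

def rosenblatt_kde(x, sample):
    """Nonrecursive estimator (5) with h_n = n^{-1/5}."""
    h = len(sample) ** (-0.2)
    return gauss((x[:, None] - sample) / h).mean(axis=1) / h

def ise(est, x, truth):
    """Integrated squared error of the estimate on the grid."""
    return np.trapz((est - truth) ** 2, x)

rng = np.random.default_rng(1)
x = np.linspace(-5.0, 5.0, 401)
truth = gauss(x)                      # true N(0,1) density
errors = {"recursive": [], "nonrecursive": []}
for _ in range(50):                   # 50 replications to keep the sketch fast
    s = rng.normal(size=50)
    errors["recursive"].append(ise(recursive_kde(x, s), x, truth))
    errors["nonrecursive"].append(ise(rosenblatt_kde(x, s), x, truth))
print({k: round(float(np.mean(v)), 4) for k, v in errors.items()})
```

Replacing the fixed bandwidth rule by the plug-in selectors and averaging over 500 samples reproduces the structure of the study summarized in Tables 1 and 2.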
Figures 3 and 4 show boxplots of the expected MISE* of the two algorithms (3) and (5), respectively. For samples of sizes 50 and 100, the expected MISE* of the recursive estimator (3) with the selected bandwidth is always smaller than that of the nonrecursive estimator (5).

Conclusion
In this paper we proposed an automatic selection of the bandwidth of the recursive kernel estimators of a probability density function defined by the stochastic approximation algorithm (2). We showed that, using the selected bandwidth and the stepsize $(\gamma_n) = (n^{-1})$ (the stepsize which minimizes the MISE of the class of recursive estimators defined in Mokkadem et al. [9]), the recursive estimator is better than the nonrecursive one in small sample settings. The simulation study corroborated these theoretical results. Moreover, the simulation results indicate that the proposed recursive estimator is computationally more efficient than the nonrecursive estimator.
In conclusion, the proposed method allowed us to obtain better results than the nonrecursive estimator proposed by Rosenblatt [14] in small sample settings. In future work, we plan to extend our method to the case of a regression function, as in Härdle and Marron [22], in a recursive way (see Mokkadem et al. [23]), and to the case of time series, as in Hart and Vieu [24], in a recursive way.

Figure 1: The bandwidths minimizing the MISE* of the nonrecursive estimator (5) and the recursive estimator (3) for 500 samples of size $n = 50$ (a) and of size $n = 100$ (b) from a normal distribution.