Adaptive Wavelet Estimation of a Biased Density for Strongly Mixing Sequences

The estimation of a biased density for exponentially strongly
mixing sequences is investigated. We construct a new adaptive wavelet
estimator based on a hard thresholding rule. We determine a sharp
upper bound of the associated mean integrated square error for a wide
class of functions.


Introduction
In the standard density estimation problem, we observe n random variables X_1, ..., X_n with common density function f. The goal is to estimate f from X_1, ..., X_n. However, in some applications, X_1, ..., X_n are not accessible; we only have n random variables Z_1, ..., Z_n with the common density

g(x) = w(x) f(x) / μ,  x ∈ [0, 1],   (1.1)

where w denotes a known positive function and μ is the unknown normalization parameter: μ = ∫ w(y) f(y) dy. Our goal is to estimate the "biased density" f from Z_1, ..., Z_n. Practical examples can be found in, for example, [1-3] and the survey by the author of [4].
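To make the model concrete, the following sketch simulates biased observations for one hypothetical choice of f and w (a Beta(2,2) density and w(x) = x + 0.5; neither comes from the paper, and the draws are i.i.d. rather than mixing). The observed Z_i then have the density g = w f / μ of (1.1).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical choices for illustration only (not from the paper):
# true density f = Beta(2,2) and known positive bias function w.
def f(x):
    return 6.0 * x * (1.0 - x)

def w(x):
    return x + 0.5

# Normalization mu = int_0^1 w(y) f(y) dy (here mu = 1 exactly,
# but we compute it numerically as in the general case).
grid = np.linspace(0.0, 1.0, 100_000, endpoint=False) + 0.5e-5
mu = np.mean(w(grid) * f(grid))  # midpoint rule on [0, 1]

def g(x):
    # Common density of the observed Z_i: g = w f / mu.
    return w(x) * f(x) / mu

def sample_biased(n):
    # Rejection sampling from g with a uniform proposal on [0, 1].
    out = np.empty(0)
    bound = 1.7  # any constant >= sup g works as an envelope
    while out.size < n:
        x = rng.uniform(size=2 * n)
        u = rng.uniform(size=2 * n)
        out = np.concatenate([out, x[u * bound <= g(x)]])
    return out[:n]

z = sample_biased(50_000)
```

Because E[1/w(Z_1)] = ∫ f / μ = 1/μ, averages of 1/w(Z_i) recover the normalization; this identity is what drives the estimator (4.1) of Section 4.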
The standard i.i.d. case has been investigated in several papers; see, for example, [5-9]. To the best of our knowledge, the dependent case has only been examined in [10] for positively or negatively associated Z_1, ..., Z_n. In this paper, we study another dependent and realistic structure which has not been addressed earlier: we suppose that Z_1, ..., Z_n is a sample of a strictly stationary and exponentially strongly mixing process (Z_i)_{i∈Z} (to be defined in Section 2). Such a dependence condition arises for a wide class of GARCH-type time series models classically encountered in finance. See, for example, [11, 12] for an overview.

International Journal of Mathematics and Mathematical Sciences
We focus our attention on wavelet methods because they provide a coherent set of procedures that are spatially adaptive and near optimal over a wide range of function spaces. See, for example, [13, 14] for a detailed coverage of wavelet theory in statistics. We develop two new wavelet estimators: a nonadaptive linear one based on projections and an adaptive nonlinear one using the hard thresholding rule introduced in [15]. We measure their performances by determining upper bounds of the mean integrated squared error (MISE) over Besov balls (to be defined in Section 3). We prove that our adaptive estimator attains a sharp rate of convergence, close to the one attained by the linear wavelet estimator constructed in a nonadaptive fashion to minimize the MISE.
The rest of the paper is organized as follows. Section 2 is devoted to the assumptions on the model. In Section 3, we present wavelets and Besov balls. The considered wavelet estimators are defined in Section 4. Section 5 is devoted to the results. The proofs are postponed to Section 6.

Assumptions on the Model
We assume that Z_1, ..., Z_n come from a strictly stationary process (Z_i)_{i∈Z}. For any m ∈ Z, we define the m-th strongly mixing coefficient of (Z_i)_{i∈Z} by

a_m = sup_{(A,B) ∈ F^Z_{−∞,0} × F^Z_{m,∞}} |P(A ∩ B) − P(A) P(B)|,   (2.1)

where, for any u ∈ Z, F^Z_{−∞,u} is the σ-algebra generated by the random variables ..., Z_{u−1}, Z_u, and F^Z_{u,∞} is the σ-algebra generated by the random variables Z_u, Z_{u+1}, .... We consider the exponentially strongly mixing case; that is, there exist three known constants γ > 0, c > 0, and θ > 0 such that, for any m ∈ Z,

a_m ≤ γ exp(−c |m|^θ).   (2.2)

This assumption is satisfied by a large class of GARCH processes; see, for example, [11, 12, 16, 17]. Note that, when θ → ∞, we are in the standard i.i.d. case. Without loss of generality, we assume that the supports of f and w are [0, 1].
There exist two constants c > 0 and C > 0 such that

c ≤ w(x) ≤ C,  x ∈ [0, 1].   (2.3)

There exists a known constant C > 0 such that

sup_{x ∈ [0,1]} f(x) ≤ C.   (2.4)

For any m ∈ {1, ..., n}, let g_{(Z_0, Z_m)} be the density of (Z_0, Z_m). There exists a constant C > 0 such that

sup_{m ∈ {1,...,n}} sup_{(x,y) ∈ [0,1]²} |g_{(Z_0,Z_m)}(x, y) − g(x) g(y)| ≤ C.   (2.5)

The first two boundedness assumptions are standard in the estimation of biased densities. See, for example, [6-8].

Wavelets and Besov Balls
Let N be a positive integer, and let φ and ψ be the initial wavelets of the Daubechies family dbN, so that supp φ = supp ψ = [1 − N, N]. With an appropriate treatment at the boundaries, there exists an integer τ satisfying 2^τ ≥ 2N such that the collection

B = {φ_{τ,k}, k ∈ {0, ..., 2^τ − 1}; ψ_{j,k}, j ≥ τ, k ∈ {0, ..., 2^j − 1}},

where φ_{j,k}(x) = 2^{j/2} φ(2^j x − k) and ψ_{j,k}(x) = 2^{j/2} ψ(2^j x − k), is an orthonormal basis of L²([0, 1]), the space of square-integrable functions on [0, 1]. See [18].
For any integer ℓ ≥ τ, any h ∈ L²([0, 1]) can be expanded on B as

h(x) = Σ_{k=0}^{2^ℓ − 1} α_{ℓ,k} φ_{ℓ,k}(x) + Σ_{j=ℓ}^{∞} Σ_{k=0}^{2^j − 1} β_{j,k} ψ_{j,k}(x),   (3.2)

where α_{j,k} and β_{j,k} are the wavelet coefficients of h defined by

α_{j,k} = ∫_0^1 h(x) φ_{j,k}(x) dx,  β_{j,k} = ∫_0^1 h(x) ψ_{j,k}(x) dx.   (3.3)

Let M > 0, s > 0, p ≥ 1, and r ≥ 1. A function h belongs to the Besov ball B^s_{p,r}(M) if and only if there exists a constant M* > 0 (depending on M) such that the associated wavelet coefficients (3.3) satisfy

(Σ_{j=τ}^{∞} (2^{j(s + 1/2 − 1/p)} (Σ_{k=0}^{2^j − 1} |β_{j,k}|^p)^{1/p})^r)^{1/r} ≤ M*.   (3.4)

In this expression, s is a smoothness parameter, and p and r are norm parameters. For particular choices of s, p, and r, B^s_{p,r}(M) contains classical sets of functions such as the Hölder and Sobolev balls. See [19].
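As a numerical sanity check on the expansion (3.2), the following sketch expands an arbitrary function on the Haar basis (db1, which needs no boundary correction, so τ = 0; the paper's results require dbN with N large enough) and verifies that the L²([0, 1]) approximation error at a finite resolution is small.

```python
import numpy as np

# Haar scaling function and wavelet on [0, 1] (db1; used here only
# because it avoids the boundary corrections needed for general dbN).
def phi(x):
    return np.where((x >= 0) & (x < 1), 1.0, 0.0)

def psi(x):
    return np.where((x >= 0) & (x < 0.5), 1.0,
                    np.where((x >= 0.5) & (x < 1), -1.0, 0.0))

def phi_jk(x, j, k):
    return 2.0 ** (j / 2) * phi(2.0 ** j * x - k)

def psi_jk(x, j, k):
    return 2.0 ** (j / 2) * psi(2.0 ** j * x - k)

grid = np.linspace(0.0, 1.0, 2 ** 16, endpoint=False)
dx = grid[1] - grid[0]

def coeff(h_vals, wavelet, j, k):
    # Numerical version of alpha_{j,k} = int_0^1 h(x) phi_{j,k}(x) dx.
    return np.sum(h_vals * wavelet(grid, j, k)) * dx

# An arbitrary smooth function to expand (illustrative choice).
h = np.sin(2 * np.pi * grid)

tau, J = 0, 8
approx = sum(coeff(h, phi_jk, tau, k) * phi_jk(grid, tau, k)
             for k in range(2 ** tau))
for j in range(tau, J):
    for k in range(2 ** j):
        approx = approx + coeff(h, psi_jk, j, k) * psi_jk(grid, j, k)

l2_error = np.sqrt(np.sum((h - approx) ** 2) * dx)
```

The retained energy Σ α² + Σ Σ β² approaches ∫ h² = 1/2 as the maximal level J grows, illustrating Parseval's identity for the orthonormal basis B.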

Estimators
Firstly, we consider the following estimator for μ:

μ̂ = n / Σ_{i=1}^n (1/w(Z_i)).   (4.1)

It is obtained by the method of moments (see Proposition 6.2 below). Then, for any integer j ≥ τ and any k ∈ {0, ..., 2^j − 1}, we estimate the unknown wavelet coefficients α_{j,k} = ∫_0^1 f(x) φ_{j,k}(x) dx and β_{j,k} = ∫_0^1 f(x) ψ_{j,k}(x) dx by

α̂_{j,k} = (μ̂/n) Σ_{i=1}^n φ_{j,k}(Z_i)/w(Z_i),   (4.2)

β̂_{j,k} = (μ̂/n) Σ_{i=1}^n ψ_{j,k}(Z_i)/w(Z_i).   (4.3)

Note that they are those considered in the i.i.d. case (see, e.g., [8, 9]). Their statistical properties, under our dependence structure, are investigated in Propositions 6.2, 6.3, and 6.4 below.
Assuming that f ∈ B^s_{p,r}(M) with p ≥ 2, we define the linear estimator f̂_L by

f̂_L(x) = Σ_{k=0}^{2^{j_0} − 1} α̂_{j_0,k} φ_{j_0,k}(x),  x ∈ [0, 1],   (4.4)

where α̂_{j,k} is defined by (4.2) and j_0 is the integer satisfying

(1/2) n^{1/(2s+1)} ≤ 2^{j_0} ≤ n^{1/(2s+1)}.

For a survey on wavelet linear estimators for various density models, we refer the reader to [20]. For the consideration of strongly mixing sequences, see, for example, [21, 22].
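The estimators (4.1), (4.2), and (4.4) can be sketched numerically. Everything model-specific here is an assumption for illustration, not from the paper: f = Beta(2,2), w(x) = x + 0.5 (so μ = 1), Haar wavelets in place of dbN, smoothness s = 2, and an i.i.d. sample standing in for the mixing one (the estimator formulas are unchanged).

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative (hypothetical) setup: f = Beta(2,2), bias w(x) = x + 0.5,
# so mu = int w f = 1 and the observed density is g = w f.
f = lambda x: 6.0 * x * (1.0 - x)
w = lambda x: x + 0.5
g = lambda x: w(x) * f(x)

def sample_g(n):
    # Rejection sampling from g (i.i.d. stand-in for the mixing sample).
    out = np.empty(0)
    bound = 1.6  # >= sup g on [0, 1]
    while out.size < n:
        x = rng.uniform(size=2 * n)
        u = rng.uniform(size=2 * n)
        out = np.concatenate([out, x[u * bound <= g(x)]])
    return out[:n]

# Haar scaling function, used for simplicity in place of dbN.
def phi_jk(x, j, k):
    y = 2.0 ** j * x - k
    return 2.0 ** (j / 2) * ((y >= 0) & (y < 1))

n = 20_000
z = sample_g(n)

# (4.1): method-of-moments estimator of mu.
mu_hat = n / np.sum(1.0 / w(z))

# (4.2): empirical wavelet coefficients.
def alpha_hat(j, k):
    return (mu_hat / n) * np.sum(phi_jk(z, j, k) / w(z))

# (4.4): linear estimator at level j0 with 2^{j0} ~ n^{1/(2s+1)};
# s = 2 is an assumed smoothness.
s = 2.0
j0 = int(np.ceil(np.log2(n) / (2 * s + 1)))
grid = np.linspace(0.0, 1.0, 1024, endpoint=False)
f_lin = sum(alpha_hat(j0, k) * phi_jk(grid, j0, k) for k in range(2 ** j0))
```

Note that j_0 depends on the (unknown) smoothness s, which is exactly why f̂_L is nonadaptive.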
We define the hard thresholding estimator f̂_H by

f̂_H(x) = Σ_{k=0}^{2^τ − 1} α̂_{τ,k} φ_{τ,k}(x) + Σ_{j=τ}^{j_1} Σ_{k=0}^{2^j − 1} β̂_{j,k} 1{|β̂_{j,k}| ≥ κ λ_n} ψ_{j,k}(x),  x ∈ [0, 1],   (4.5)

where α̂_{τ,k} is defined by (4.2) and β̂_{j,k} by (4.3); for any random event A, 1_A is the indicator function of A; j_1 is the integer satisfying

(1/2) n/(ln n)^{1+1/θ} ≤ 2^{j_1} ≤ n/(ln n)^{1+1/θ};

θ is the constant in (2.2); κ is a large enough constant (the one in Proposition 6.4 below); and λ_n is the threshold

λ_n = ((ln n)^{1+1/θ}/n)^{1/2}.   (4.6)

The feature of the hard thresholding estimator is that it only estimates the "large" unknown wavelet coefficients of f, which contain its main characteristics.
For the construction of hard thresholding wavelet estimators in the standard density model, see, for example, [15, 23].
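The thresholding rule can be sketched in the same illustrative setting as above (f = Beta(2,2), w(x) = x + 0.5, Haar wavelets instead of dbN, i.i.d. draws instead of mixing ones); κ = 1, θ = 1, and τ = 0 are also illustrative values, the paper only requiring κ large enough.

```python
import numpy as np

rng = np.random.default_rng(2)

# Same hypothetical setup as before: f = Beta(2,2), w(x) = x + 0.5, mu = 1.
f = lambda x: 6.0 * x * (1.0 - x)
w = lambda x: x + 0.5

def sample_g(n):
    out = np.empty(0)
    while out.size < n:
        x = rng.uniform(size=2 * n)
        u = rng.uniform(size=2 * n)
        out = np.concatenate([out, x[u * 1.6 <= w(x) * f(x)]])
    return out[:n]

def phi_jk(x, j, k):
    y = 2.0 ** j * x - k
    return 2.0 ** (j / 2) * ((y >= 0) & (y < 1))

def psi_jk(x, j, k):
    y = 2.0 ** j * x - k
    return 2.0 ** (j / 2) * (((y >= 0) & (y < 0.5)) * 1.0
                             - ((y >= 0.5) & (y < 1)) * 1.0)

n = 20_000
z = sample_g(n)
mu_hat = n / np.sum(1.0 / w(z))          # (4.1)

def coeff_hat(wavelet, j, k):            # (4.2) / (4.3)
    return (mu_hat / n) * np.sum(wavelet(z, j, k) / w(z))

theta, kappa, tau = 1.0, 1.0, 0          # assumed values, not from the paper
lam_n = np.sqrt(np.log(n) ** (1 + 1 / theta) / n)   # threshold (4.6)
j1 = int(np.floor(np.log2(n / np.log(n) ** (1 + 1 / theta))))

grid = np.linspace(0.0, 1.0, 1024, endpoint=False)
f_hard = sum(coeff_hat(phi_jk, tau, k) * phi_jk(grid, tau, k)
             for k in range(2 ** tau))
for j in range(tau, j1 + 1):
    for k in range(2 ** j):
        b = coeff_hat(psi_jk, j, k)
        if abs(b) >= kappa * lam_n:      # hard thresholding rule
            f_hard = f_hard + b * psi_jk(grid, j, k)

mise = np.mean((f_hard - f(grid)) ** 2)
```

Unlike the linear estimator, no smoothness parameter s enters the construction: the levels kept are decided by the data through the threshold κ λ_n, which is what makes f̂_H adaptive.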

Results
Theorem 5.1 (upper bound for f̂_L). Consider (1.1) under the assumptions of Section 2. Suppose that f ∈ B^s_{p,r}(M) with s > 0, p ≥ 2, and r ≥ 1. Let f̂_L be defined by (4.4). Then there exists a constant C > 0 such that

E ∫_0^1 (f̂_L(x) − f(x))² dx ≤ C n^{−2s/(2s+1)}.   (5.1)

The proof of Theorem 5.1 uses a suitable decomposition of the MISE and a moment inequality on (4.2) (see Proposition 6.3 below).
Note that n^{−2s/(2s+1)} is the optimal rate of convergence in the minimax sense for the standard density model in the independent case (see, e.g., [14, 23]).

Theorem 5.2 (upper bound for f̂_H). Consider (1.1) under the assumptions of Section 2. Suppose that f ∈ B^s_{p,r}(M) with r ≥ 1 and either {p ≥ 2 and s > 0} or {p ∈ [1, 2) and s > 1/p}. Let f̂_H be defined by (4.5). Then there exists a constant C > 0 such that

E ∫_0^1 (f̂_H(x) − f(x))² dx ≤ C ((ln n)^{1+1/θ}/n)^{2s/(2s+1)}.   (5.2)
The proof of Theorem 5.2 uses a suitable decomposition of the MISE, some moment inequalities on (4.2) and (4.3) (see Proposition 6.3 below), and a concentration inequality on (4.3) (see Proposition 6.4 below).
Theorem 5.2 shows that, besides being adaptive, f̂_H attains a rate of convergence close to that of f̂_L. The only difference is the logarithmic term ((ln n)^{1+1/θ})^{2s/(2s+1)}.
Note that, if we restrict our study to the independent case, that is, θ → ∞, the rate of convergence attained by f̂_H becomes the standard one: (ln n/n)^{2s/(2s+1)}. See, for example, [14, 15, 23].
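The collapse of the dependent-case rate to the i.i.d. one can be checked directly on the exponent of the logarithm:

```latex
\lim_{\theta \to \infty} (\ln n)^{1 + 1/\theta} = \ln n,
\qquad \text{hence} \qquad
\left( \frac{(\ln n)^{1 + 1/\theta}}{n} \right)^{2s/(2s+1)}
\xrightarrow[\theta \to \infty]{}
\left( \frac{\ln n}{n} \right)^{2s/(2s+1)}.
```

Thus weaker dependence (larger θ) costs only a vanishing extra power of ln n, never a power of n.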

Proofs
In this section, we consider (1.1) under the assumptions of Section 2. Moreover, C denotes any constant that does not depend on j, k, or n. Its value may change from one term to another and may depend on φ or ψ.

Auxiliary Results
Lemma 6.1. For any integer j ≥ τ and any k ∈ {0, ..., 2^j − 1}, let α̂_{j,k} be (4.2) and α_{j,k} = ∫_0^1 f(x) φ_{j,k}(x) dx. Then, under the assumptions of Section 2, there exists a constant C > 0 such that the inequality below holds. It also holds for ψ instead of φ (and, a fortiori, for β̂_{j,k} defined by (4.3) instead of α̂_{j,k}, and β_{j,k} instead of α_{j,k}).

Proof of Lemma 6.1. We have

Using (2.4) and the Cauchy-Schwarz inequality, we obtain the desired bound with a constant C.

6.4
Hence Lemma 6.1 is proved.

Proposition 6.2. For any integer j ≥ τ such that 2^j ≤ n and any k ∈ {0, ..., 2^j − 1}, let α_{j,k} = ∫_0^1 f(x) φ_{j,k}(x) dx and μ̂ be (4.1). Then: (1) E[1/w(Z_1)] = 1/μ; (2) there exists a constant C > 0 such that a second-moment bound holds; (3) there exists a constant C > 0 such that a fourth-moment bound holds. These results hold for ψ instead of φ (and, a fortiori, for β_{j,k}).

Proof of Proposition 6.2. (1) We have

E[1/w(Z_1)] = ∫_0^1 (1/w(x)) g(x) dx = (1/μ) ∫_0^1 f(x) dx.   (6.9)

Since f is a density, we obtain E[1/w(Z_1)] = 1/μ.

6.12
It follows from the stationarity of (Z_i)_{i∈Z} and 2^j ≤ n that

where

6.14
Let us now bound T_1 and T_2.
Upper Bound for T_1. Using (2.5) and (2.3), and making the change of variables y = 2^j x − k, we obtain

6.15
Therefore,

Upper Bound for T_2. By the Davydov inequality for strongly mixing processes (see [24]), for any q ∈ (0, 1), it holds that

(6.17)

and, by (6.12),

Therefore,

Since Σ_{m=2^j}^{n} m^q a_m^q ≤ Σ_{m=1}^{∞} m^q a_m^q ≤ γ^q Σ_{m=1}^{∞} m^q exp(−c q m^θ) < ∞, we have

Σ_{m=2^j}^{n} (n − m + 1) m^q a_m^q ≤ C n.   (6.21)
It follows from (6.13), (6.16), and (6.21) that

Combining (6.11), (6.12), and (6.22), we obtain

(3) Proceeding in a similar fashion to (2), we obtain

6.24
Using (2.3), which implies sup_{x ∈ [0,1]} 1/w(x) ≤ C, and applying the Davydov inequality, we obtain

The proof of Proposition 6.2 is complete.

Proposition 6.3. For any integer j ≥ τ such that 2^j ≤ n and any k ∈ {0, ..., 2^j − 1}, let α_{j,k} = ∫_0^1 f(x) φ_{j,k}(x) dx and α̂_{j,k} be (4.2). Then:
(1) there exists a constant C > 0 such that

E[(α̂_{j,k} − α_{j,k})²] ≤ C (1/n),

(2) there exists a constant C > 0 such that

E[(α̂_{j,k} − α_{j,k})^4] ≤ C 2^j (1/n).   (6.27)
These inequalities hold for β̂_{j,k} defined by (4.3) instead of α̂_{j,k}, and β_{j,k} instead of α_{j,k}.

Proof of Proposition 6.3. (1) Applying Lemma 6.1 and Proposition 6.2, we have

6.31
It follows from (6.31) and (6.28) that

The proof of Proposition 6.3 is complete.

6.33
Proof of Proposition 6.4. It follows from Lemma 6.1 that

where

6.35
In

6.37
Then U_1, ..., U_n are identically distributed and depend on the stationary strongly mixing process (Z_i)_{i∈Z}, which satisfies (2.2). Proposition 6.2 gives

and, by (2.3) and (6.4),

6.40
Therefore, for large enough κ and u, we have

6.41
Upper Bound for P_2. For any i ∈ {1, ..., n}, set

Then U_1, ..., U_n are identically distributed and depend on the stationary strongly mixing process (Z_i)_{i∈Z}, which satisfies (2.2). Proposition 6.2 gives

By (2.3), we have

It follows from Lemma 6.5, applied with U_1, ..., U_n, λ = κ C λ_n, λ_n = ((ln n)^{1+1/θ}/n)^{1/2}, m = u (ln n)^{1/θ} (with u > 0 chosen later), and M = C, that

6.45
Therefore, for large enough κ and u, we have

(6.46)

Putting (6.34), (6.41), and (6.46) together ends the proof of Proposition 6.4.

Proofs of the Main Results
Proof of Theorem 5.1. We expand the function f on B as

f(x) = Σ_{k=0}^{2^{j_0} − 1} α_{j_0,k} φ_{j_0,k}(x) + Σ_{j=j_0}^{∞} Σ_{k=0}^{2^j − 1} β_{j,k} ψ_{j,k}(x),

where α_{j_0,k} = ∫_0^1 f(x) φ_{j_0,k}(x) dx and β_{j,k} = ∫_0^1 f(x) ψ_{j,k}(x) dx. We have, for any x ∈ [0, 1],

f̂_L(x) − f(x) = Σ_{k=0}^{2^{j_0} − 1} (α̂_{j_0,k} − α_{j_0,k}) φ_{j_0,k}(x) − Σ_{j=j_0}^{∞} Σ_{k=0}^{2^j − 1} β_{j,k} ψ_{j,k}(x).

Since B is an orthonormal basis of L²([0, 1]), we have

E ∫_0^1 (f̂_L(x) − f(x))² dx = Σ_{k=0}^{2^{j_0} − 1} E[(α̂_{j_0,k} − α_{j_0,k})²] + Σ_{j=j_0}^{∞} Σ_{k=0}^{2^j − 1} β_{j,k}².   (6.49)
Using Proposition 6.3, we obtain

E ∫_0^1 (f̂_L(x) − f(x))² dx ≤ C (2^{j_0}/n + 2^{−2 j_0 s}) ≤ C n^{−2s/(2s+1)}.

The proof of Theorem 5.1 is complete.
Proof of Theorem 5.2. We expand the function f on B as

f(x) = Σ_{k=0}^{2^τ − 1} α_{τ,k} φ_{τ,k}(x) + Σ_{j=τ}^{∞} Σ_{k=0}^{2^j − 1} β_{j,k} ψ_{j,k}(x),

where α_{τ,k} = ∫_0^1 f(x) φ_{τ,k}(x) dx and β_{j,k} = ∫_0^1 f(x) ψ_{j,k}(x) dx. We have, for any x ∈ [0, 1],

6.54
Since B is an orthonormal basis of L²([0, 1]), we have

E ∫_0^1 (f̂_H(x) − f(x))² dx = R + T + S,

where

6.56
Let us bound R, T, and S in turn.

6.57
Upper Bound for T. For r ≥ 1 and p ≥ 2, we have B^s_{p,r}(M) ⊆ B^s_{2,∞}(M). Since 2s/(2s+1) < 2s, we have

6.60
Upper Bound for S. Note that we can write the term S as

S = S_1 + S_2 + S_3 + S_4,   (6.61)

where

6.62
Let us investigate the bounds of S_1, S_2, S_3, and S_4 in turn.
Upper Bounds for S_1 and S_3. We have

6.64
It follows from the Cauchy-Schwarz inequality, Propositions 6.3 and 6.4, and 2^j ≤ 2^{j_1} ≤ n that

6.66
Upper Bound for S_2. Using Proposition 6.3 again, we obtain

Hence,

6.68
Let j_2 be the integer satisfying

(1/2) (n/(ln n)^{1+1/θ})^{1/(2s+1)} ≤ 2^{j_2} ≤ (n/(ln n)^{1+1/θ})^{1/(2s+1)}.

6.69
We have where

6.84
The proof of Theorem 5.2 is complete.