Research Article Central Limit Theorem of the Smoothed Empirical Distribution Functions for Asymptotically Stationary Absolutely Regular Stochastic Processes

Let $\hat F_n$ be an estimator obtained by integrating a kernel-type density estimator based on a random sample of size $n$. A central limit theorem is established for the target statistic $\hat F_n(\xi_n)$, where the underlying random vectors form an asymptotically stationary absolutely regular stochastic process and $\xi_n$ is an estimator of a multivariate parameter $\xi$ built from a vector of U-statistics. The results obtained extend previous results from the stationary univariate case to the asymptotically stationary multivariate case. An example of an asymptotically stationary absolutely regular multivariate ARMA process and an example of a useful estimation of $F(\xi)$ are given in the applications.


Introduction
The purpose of this paper is to estimate the value of a multivariate distribution function, called the target distribution function, at a given point, when observing a nonstationary process. Clearly, there must be a connection between the process and the target distribution. We will assume that, as time goes on, the marginal distribution of the process gets closer and closer to the target in a suitable sense. The point at which we want to estimate the target distribution is not an arbitrary vector: we will assume that it can be estimated by a vector of U-statistics. Such a problem is clearly out of reach in that generality, and we will assume that, though nonstationary, the process exhibits an asymptotic form of stationarity and has a suitable mixing property. These notions will be defined formally after this general introduction.
Let $(X_i)_{i\ge 1}$ be a stochastic process indexed by the positive integers, taking values in a finite-dimensional Euclidean space $H$. Identifying $H$ with a product of finitely many copies of the real line, we write $F_i$ for the distribution function of $X_i$. We will assume that the process has some form of asymptotic stationarity, implying that the sequence $(F_i)$ converges, in a sense to be made precise, to a limiting distribution function $F$.
For $i \le j$, let $\mathcal{A}_i^j$ denote the σ-algebra of events generated by $X_i, \dots, X_j$. We will say that the nonstationary stochastic process is absolutely regular if its mixing coefficients
$$\beta(n) = \sup_{k\ge 1} E\Big[\sup\big\{|P(A \mid \mathcal{A}_1^k) - P(A)| : A \in \mathcal{A}_{k+n}^\infty\big\}\Big] \tag{1.1}$$
tend to zero. Assume that, for some positive $\delta$ less than $2/5$, the coefficients $\beta(n)$ satisfy the polynomial decay condition (1.2). We consider a parameter $\xi$ in $H$ whose components can be naturally estimated by U-statistics. To be more formal and precise, we assume that $\xi$ is defined as follows. Let $m$ be an integer, the degree of the U-statistics. Let $\Phi$ be a function from $H^m$ into $H$, invariant under permutation of its arguments. We are interested in parameters of the form
$$\xi = \int \Phi(x_1, \dots, x_m)\, dF(x_1) \cdots dF(x_m) = \int \Phi \, dF^{\otimes m}. \tag{1.3}$$
Example 1.1. Take $H$ to be $\mathbb{R}$. The mean corresponds to taking $m = 1$ and $\Phi$ the identity.
Example 1.2. Take $H$ to be $\mathbb{R}^2$. Consider $\xi$ to be the 2-dimensional vector whose components are the marginal variances. We take $m = 2$, and $\Phi$ is a function defined on $(\mathbb{R}^2)^2$. It has two arguments, each being in $\mathbb{R}^2$, and it is defined by
$$\Phi\big((x_1, x_2), (y_1, y_2)\big) = \Big(\tfrac{1}{2}(x_1 - y_1)^2, \; \tfrac{1}{2}(x_2 - y_2)^2\Big).$$
Such a parameter can be estimated naturally by U-statistics, essentially replacing $F^{\otimes m}$ in (1.3) by an empirical counterpart. Using the invariance of $\Phi$, the estimator of $\xi$ is then of the form
$$\xi_n = \binom{n}{m}^{-1} \sum_{1 \le i_1 < \cdots < i_m \le n} \Phi(X_{i_1}, \dots, X_{i_m}). \tag{1.5}$$
Having described the parameter $\xi$ and its estimator, we return to our problem: we need to define an estimator of the distribution function $F$.
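As an illustration, the estimator (1.5) can be computed directly by averaging the kernel over all index subsets. The following sketch (function names are ours, not the paper's) computes the degree-2 U-statistic with the marginal-variance kernel of Example 1.2:

```python
import itertools

def u_statistic(sample, phi, m):
    """Degree-m U-statistic: average of the symmetric kernel phi over all
    index subsets {i_1 < ... < i_m} of the sample, as in (1.5)."""
    vals = [phi(*(sample[i] for i in idx))
            for idx in itertools.combinations(range(len(sample)), m)]
    dim = len(vals[0])
    return [sum(v[j] for v in vals) / len(vals) for j in range(dim)]

# Kernel of Example 1.2 for the vector of marginal variances:
# Phi((x1,x2),(y1,y2)) = ((x1-y1)^2/2, (x2-y2)^2/2), symmetric in its arguments.
def phi_var(x, y):
    return [(x[0] - y[0]) ** 2 / 2.0, (x[1] - y[1]) ** 2 / 2.0]
```

On a bivariate sample this returns the unbiased marginal sample variances, which is exactly why this kernel estimates the variance vector.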
A natural candidate would be the empirical distribution function computed from the observed values of the process. Even though the empirical distribution function is optimal with respect to the rate of convergence of the mean square error, it is not appropriate here because it ignores the smoothness of $F$, and in particular the existence of a density $f$.
It is therefore natural to seek a smooth estimator of the target distribution. A good candidate is obtained by smoothing the empirical distribution function with a kernel. Another way to introduce this estimator is to say that we integrate a standard kernel estimator of the density. Such an estimator estimates the mean distribution function
$$\bar F_n = n^{-1} \sum_{i=1}^n F_i.$$
But since the sequence $(F_i)$ has a limit $F$, it estimates the limit $F$ as well. To be explicit, we consider a sequence $(K_n)_{n\ge1}$ of distribution functions converging, in the usual sense of convergence in distribution, to that of the point mass at the origin. We write $F_n$ for the empirical distribution function pertaining to the measure having mass $1/n$ at each sample point, and set
$$\hat F_n = K_n * F_n, \tag{1.7}$$
where $*$ denotes the convolution operator. Finally, our estimator of $F(\xi)$ is $\hat F_n(\xi_n)$. Our method is an adaptation of some of the ideas of Puri and Ralescu [1], who proved a central limit theorem for $\hat F_n(\xi_n)$ in the i.i.d. case; this was generalized by Sun [2] to the stationary absolutely regular case, and Sun [3] then proved the asymptotic normality of $\hat F_n$ and of the perturbed sample quantiles under a nonstationary strong mixing condition. We also mention Harel and Puri [4, 5], who proved central limit theorems for U-statistics of nonstationary, not necessarily bounded, strong mixing double arrays of random variables; Ducharme and El Mouvid [6], who proved limit theorems for the conditional cumulative distribution function by using the convergence of the ratio of two U-statistics; and Oodaira and Yoshihara [7], who obtained the law of the iterated logarithm for sums of random variables satisfying absolute regularity. Harel and Puri [8] then proved the law of the iterated logarithm for the perturbed empirical distribution function of nonstationary absolutely regular random variables; this result was later generalized to the strong mixing case by Sun and Chiang [9].
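Concretely, with a smooth kernel the estimator (1.7) is an average of rescaled kernel distribution functions. A minimal univariate sketch with a Gaussian kernel (names are ours, chosen for illustration):

```python
import math

def gauss_cdf(t):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def smoothed_edf(sample, a_n):
    """K_n * F_n with K_n(x) = K(x / a_n): returns the map
    x -> (1/n) * sum_l K((x - X_l) / a_n)."""
    n = len(sample)
    return lambda x: sum(gauss_cdf((x - xl) / a_n) for xl in sample) / n
```

As the window width $a_n$ tends to zero, this function recovers the ordinary empirical distribution function, while for fixed $a_n > 0$ it is smooth in $x$.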
In addition, some ideas of Billingsley [10] and Yoshihara [11] have been used to study our problem. For limit theorems dealing with U-statistics of processes which are uniformly mixing in both directions of time, the reader is also referred to Denker and Keller [12].

Preliminaries
To specify our assumptions on the process, it is convenient to introduce copies of $H$. Hence we write $H_i$, $i \ge 1$, for an infinite sequence of copies of $H$. The basic idea is to think of the process at time $i$ as taking values in $H_i$, and to think of each $H_i$ as the $i$th component of $H^\infty$. We then agree on the following definition.
We write $S_p$ for a generic canonical $p$-subspace.
The origin of this terminology is that, when $H$ is the real line, a canonical $p$-subspace is a subspace spanned by exactly $p$ distinct vectors of the canonical basis of $H^\infty$. We write $\sum_{S_p \subset H^n}$ for a sum over all canonical $p$-subspaces included in $H^n$.

Journal of Applied Mathematics and Stochastic Analysis
To such a canonical subspace $S_p = H_{i_1} \oplus \cdots \oplus H_{i_p}$, we can associate the distribution function $F_{S_p}$ of $(X_{i_1}, \dots, X_{i_p})$, as well as the distribution function $F^{\otimes S_p}$ with the same marginals. Clearly, the marginals of $F^{\otimes S_p}$ are independent, while those of $F_{S_p}$ are not. Consider two nested canonical subspaces $S_p$ and $S_{m-p}$, where $S_{m-p} \subset H^n \ominus S_p$. For a function $\varphi$, symmetric in its arguments and defined on $S_p \oplus S_{m-p}$, we can define its projection onto the functions defined on $S_p$ by
$$\varphi_{S_p}(x) = \int \varphi(x, y)\, dF^{\otimes S_{m-p}}(y).$$
Identifying $S_p \oplus S_{m-p}$ with $H^m$ and $H^p$ with $S_p$ allows us to project functions defined on $H^m$ onto functions on $H^p$. However, with this identification, the projection depends on the particular choice of $S_{m-p}$ in $H^n$. To remove the dependence on $S_{m-p}$, we sum over all choices of $S_{m-p}$. Given U-statistics of degree $m$, we can then define an analogue of the Hoeffding decomposition (e.g., Hoeffding [13]) when the random variables come from a nonstationary process. For this purpose, consider first the expectation that $U_n$ would have if the process had no dependence. Then, for any $p = 1, \dots, m$, we define the projections $\xi_{n,j,p}$ and construct the vector
$$\xi_{n,\cdot,p} = \left( \binom{m_j}{p} \xi_{n,j,p} \right)_{1 \le j \le d}. \tag{2.11}$$
Now, writing $m$ for the largest of the $m_j$'s, we can write a vector version of the Hoeffding decomposition
$$\xi_n = \sum_{0 \le p \le m} \xi_{n,\cdot,p}. \tag{2.12}$$
Note that this decomposition makes an explicit use of convention 2.7 , and this is why this convention was introduced.
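In the classical i.i.d. setting, the Hoeffding decomposition is an exact algebraic identity; the nonstationary version above replaces $F^{\otimes m}$ by the product laws $F^{\otimes S_p}$. As a sanity check (our own illustration, not the paper's construction): for the degree-2 variance kernel $h(x,y) = (x-y)^2/2$ under a standard normal reference law, the terms are $\theta = 1$, $h_1(x) = (x^2-1)/2$, $h_2(x,y) = -xy$, and $U_n$ equals $\theta$ plus the linear projection plus the degenerate remainder, exactly:

```python
import itertools

def hoeffding_check(sample):
    """Exact degree-2 Hoeffding decomposition of h(x,y) = (x-y)^2/2 with
    reference law N(0,1): theta = 1, h1(x) = (x^2-1)/2, h2(x,y) = -x*y.
    Returns (U_n, theta + linear projection + degenerate remainder)."""
    pairs = list(itertools.combinations(range(len(sample)), 2))
    u_n = sum((sample[i] - sample[j]) ** 2 / 2.0 for i, j in pairs) / len(pairs)
    linear = (2.0 / len(sample)) * sum((x * x - 1.0) / 2.0 for x in sample)
    degenerate = sum(-sample[i] * sample[j] for i, j in pairs) / len(pairs)
    return u_n, 1.0 + linear + degenerate
```

The identity $h(x,y) = \theta + h_1(x) + h_1(y) + h_2(x,y)$ holds pointwise here, so the two returned values agree for any sample, not just in the limit.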
We now need to specify exactly what we mean by asymptotic stationarity of a process. For this, recall the following notion of distance between probability measures. Definition 2.3. The distance in total variation between two probability measures $P$ and $Q$ defined on the same σ-algebra $\mathcal{A}$ is
$$\|P - Q\|_{\mathcal{A}} = \sup_{A \in \mathcal{A}} |P(A) - Q(A)|.$$
If $S_p$ is a canonical subspace of $H^\infty$, we write $\sigma(S_p)$ for the σ-algebra generated by the $X_i$'s with $H_i \subset S_p$. We write $P$ for the probability measure pertaining to the process $(X_i)_{i\ge1}$, which is a probability measure on $H^\infty$.
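For measures on a finite set, the supremum in Definition 2.3 is attained and equals half the $L^1$ distance between the probability vectors; a small sketch (function name is ours):

```python
def tv_distance(p, q):
    """Total variation sup_A |P(A) - Q(A)| between two probability vectors
    on the same finite set; the supremum equals (1/2) * sum_i |p_i - q_i|."""
    assert abs(sum(p) - 1.0) < 1e-9 and abs(sum(q) - 1.0) < 1e-9
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))
```

The maximizing event $A$ is the set of points where $p_i > q_i$, which is why the half-sum formula realizes the supremum.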
Definition 2.4. The process $(X_i)_{i\ge1}$ with probability measure $P$ on $H^\infty$ is geometrically asymptotically pairwise stationary if there exist a strictly stationary process with distribution $Q$ on $H^\infty$ and a positive $\tau$ less than 1 such that, for $1 \le i < j$, the restrictions of $P$ and $Q$ to $\sigma(H_i \oplus H_j)$ are at total variation distance at most $\tau^i$ (2.14). Since $Q$ is strictly stationary in this definition, its restriction to $\sigma(H_i \oplus H_j)$ depends in fact only on $j - i$. Hence, this definition asserts that the process $(X_i, X_{i+1}, \dots)$ is very close to being stationary when $i$ is large. It also implies that the restriction of $P$ to $\sigma(H_i)$ converges geometrically fast to that of $Q$.

This asserts that the marginal distribution of the process converges geometrically fast to a fixed distribution. We suppose that there exists a strictly stationary process $(X_i^*)_{i\ge1}$, with probability measure $Q$ on $H^\infty$, which is absolutely regular with the same rate as the process $(X_i)_{i\ge1}$; $F$ is the distribution function of $X_i^*$. We define the function $\varphi^*$ on $H_1$ by (2.16). Identifying $H$ with $\mathbb{R}^d$, say, the vector of U-statistics being defined by a vector function $\Phi$, we can construct the corresponding vector of projections, where $s(x) = 1$ for $x \ge 0$, $s(x) = 0$ otherwise, and $D$ is the differential operator. We also define the asymptotic variance $\sigma^2$ in (2.22).

Weak convergence of the smoothed empirical distribution function
In this section, we identify $H$ with $\mathbb{R}^d$, and we have, of course, a vector of U-statistics defined by a vector function $\Phi = (\varphi_j)_{1\le j\le d}$, where the degree of $\varphi_j$ is $m_j$. Let $k$ be a probability density function on $H$, and let $(a_n)_{n\ge1}$ be a sequence of positive window widths tending to zero as $n \to \infty$. Denote $k_n(t) = a_n^{-1} k(t a_n^{-1})$ and $K_n(x) = \int_{y \le x} k_n(y)\, dy$, and consider the perturbed empirical distribution function $\hat F_n$ defined by (1.7) corresponding to the sequence $(K_n)_{n\ge1}$.

Consider the smoothed empirical distribution $\hat F_n$ defined in (1.7), using the kernel density estimator $f_n$, where $f_n(x) = (n a_n)^{-1} \sum_{l=1}^n k((x - X_l)/a_n)$, and define
$$\hat F_n(x) = \int_{t \le x} f_n(t)\, dt. \tag{3.1}$$
Note that this is of the form (1.7), with $K_n(x) = \int_{t \le x} k_n(t)\, dt$, where $k_n(t) = a_n^{-1} k(t a_n^{-1})$. For a better understanding of the use of the integral-type estimator $\hat F_n$, it is of interest to study the asymptotic behavior of the distribution of $\hat F_n$ defined by (3.1), evaluated at a random point $\xi_n$ defined by (1.5). Such a statistic is useful for estimating a functional $F(\xi)$ when $F$ is unknown.
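The identity between (3.1) and (1.7) — integrating the kernel density estimator gives the average of rescaled kernel distribution functions — can be checked numerically. A sketch with a Gaussian kernel (names and the sample values are ours):

```python
import math

def gauss_pdf(t):
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)

def gauss_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def f_n(x, sample, a):
    """Kernel density estimator (n a_n)^{-1} sum_l k((x - X_l)/a_n)."""
    return sum(gauss_pdf((x - s) / a) for s in sample) / (len(sample) * a)

def F_hat(x, sample, a):
    """Smoothed empirical d.f. n^{-1} sum_l K((x - X_l)/a_n), as in (1.7)."""
    return sum(gauss_cdf((x - s) / a) for s in sample) / len(sample)

# Trapezoid-rule integration of f_n from far left up to x, matching (3.1):
sample, a, x = [0.2, 1.5, -0.7], 0.5, 1.0
lo, steps = -30.0, 20000
h = (x - lo) / steps
integral = sum(0.5 * (f_n(lo + i * h, sample, a) + f_n(lo + (i + 1) * h, sample, a)) * h
               for i in range(steps))
```

The numerically integrated density agrees with the closed-form average of kernel distribution functions up to quadrature error.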
Supposing that the conditions introduced in Section 2 are satisfied, our main result establishes that $\hat F_n(\xi_n)$ is an estimator which converges to $F(\xi)$, and the asymptotic normality will allow us to obtain confidence intervals for $F(\xi)$. Using the notation introduced in Section 2, we can state the following result.
Theorem 3.1. Suppose that: (i) the bandwidth condition holds, where $\delta$ is the number introduced in (1.2); (ii) the mixing rate of absolute regularity satisfies Condition (1.2); (iii) Condition (2.14) is satisfied; (iv) the kernel condition holds, where $k$ is a probability density function; (v) the distribution functions $F_i$ and $F$ are twice differentiable on $H$ with uniformly bounded first and second partial derivatives. Then the central limit theorem for $\hat F_n(\xi_n)$ holds.
We are then faced with a difficulty, as the variance $\sigma^2$ defined in (2.22) is unknown. To overcome this difficulty, we can estimate $\sigma^2$ by truncating the expansion of $\sigma^2$, keeping only the $I$ most informative leading terms, and estimating $\sigma^2$ by its empirical counterpart $\sigma_n^2$ defined by (3.6), which involves the terms $D K_n(x - X_l)$.
From Condition (1.2), we obtain a bound for the truncation error in which $M$ is some finite positive constant.
To obtain a suitable value for $I$, a simple criterion consists of computing the smallest integer $I$ for which this bound falls below $\alpha$, where $\alpha$ is the required level of precision.
From the empirical construction of the estimator $\sigma_n^2$, we easily deduce the convergence in distribution of $\sigma_n^2$ to $\sigma_I^2$.

Application to an ARMA process
First, we give an example in which the stochastic process $(X_i)_{i\ge1}$ satisfies our general conditions, that is, $(X_i)_{i\ge1}$ is a multivariate asymptotically stationary absolutely regular stochastic process.
Example 4.1 (ARMA process). Consider a $d$-variate ARMA(1,1) process $(X_i)_{i\ge1}$ defined by
$$X_i = A X_{i-1} + \varepsilon_i + B \varepsilon_{i-1}, \quad i \ge 2, \tag{4.1}$$
where the initial vector $X_1$ has a distribution which is not necessarily the invariant one and admits a strictly positive density, $A$ and $B$ are square matrices, and $(\varepsilon_i)_{i\ge1}$ is a $d$-variate white noise with strictly positive density and geometric absolute regularity. If the eigenvalues of the square matrix $A$ have modulus strictly less than 1, then the process $(X_i)_{i\ge1}$ satisfies Condition (2.14) (for a proof, see (5.4) in [8]), and the process is asymptotically stationary and geometrically absolutely regular. Consider the strictly stationary process $(X_i^*)_{i\ge1}$ satisfying (4.1), associated to the process $(X_i)_{i\ge1}$, where $X_1^*$ has the invariant distribution. Some parameters of the model (4.1) can be estimated by estimators of the form (1.5), and we can apply Theorem 3.1.
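A quick simulation sketch of the recursion above for $d = 2$ (the matrix form is the standard multivariate ARMA(1,1) recursion; function and variable names are ours). With the eigenvalues of $A$ inside the unit disk, the influence of the non-invariant initial value $X_1$ dies out geometrically:

```python
import random

def arma11_path(A, B, x0, noise):
    """Simulate X_i = A X_{i-1} + e_i + B e_{i-1} for 2x2 matrices A, B
    (given as lists of lists); `noise` is the list of innovations e_1, ..., e_n."""
    def mv(M, v):  # 2x2 matrix-vector product
        return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]
    x, e_prev = list(x0), [0.0, 0.0]
    path = [list(x)]
    for e in noise:
        Ax, Be = mv(A, x), mv(B, e_prev)
        x = [Ax[0] + e[0] + Be[0], Ax[1] + e[1] + Be[1]]
        path.append(list(x))
        e_prev = e
    return path

# Gaussian innovations for an actual simulation run:
rng = random.Random(1)
eps = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(500)]
path = arma11_path([[0.5, 0.0], [0.0, 0.3]], [[0.2, 0.0], [0.0, 0.2]], [50.0, 50.0], eps)
```

Passing zero innovations isolates the deterministic part $A^n X_1$, which makes the geometric forgetting of the initial condition visible directly.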
For example, take $d = 2$ and denote by $\mu$ the mean and by $\Gamma(\cdot)$ the covariance function of the process $(X_i^*)_{i\ge1}$. Consider $\xi$ to be the 2-dimensional vector $\mu$; if we want to estimate $\mu$, one possibility is to use the estimator $\xi_n$, whose associated kernel $\Phi$ is, of course, the identity.
For estimating the two parameters $\xi^1$ and $\xi^2$, where $\xi^1$ is the first column and $\xi^2$ the second column of $\Gamma(\cdot)$, we could also use the estimator $\xi_n$, with the associated kernels $\Phi^1$ of $\xi^1$ and $\Phi^2$ of $\xi^2$ defined accordingly.

Application to estimation of the median
We give a very simple example in which it is useful to estimate $F(\xi)$. For simplicity, we suppose $d = 1$.
Let $X_1, \dots, X_n$ be a random sample for which the sequence of distribution functions $F_i$ of $X_i$ converges to the limiting distribution function $F$ with median $\xi$. A well-known estimator of $\xi$ is the Hodges-Lehmann estimator
$$\xi_n = \operatorname{median}\Big\{ \tfrac{1}{2}(X_i + X_j) : 1 \le i \le j \le n \Big\}.$$
The estimator $\xi_n$ is a weighted U-statistic with kernel $\varphi(x_1, x_2) = \tfrac{1}{2}(x_1 + x_2)$. The convergence theorems for U-statistics remain true for weighted U-statistics. We can easily conclude that $\xi_n$ converges in law to $\xi$, and also that $\hat F_n(\xi_n)$ converges in law to $1/2$. We can compare these two results to assess the validity of the estimation of the parameter $\xi$ by $\xi_n$.
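The Hodges-Lehmann estimator is the median of the Walsh (pairwise) averages; a direct $O(n^2)$ sketch (function name is ours; some references take $i < j$ instead of $i \le j$, which changes nothing asymptotically):

```python
import statistics

def hodges_lehmann(sample):
    """Median of the Walsh averages (X_i + X_j)/2 over all pairs i <= j."""
    n = len(sample)
    walsh = [(sample[i] + sample[j]) / 2.0
             for i in range(n) for j in range(i, n)]
    return statistics.median(walsh)
```

One can then plug the result into a smoothed empirical distribution function and check how close $\hat F_n(\xi_n)$ is to $1/2$, which is the diagnostic suggested above.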

Proof of Theorem 3.1
We are going to use the following lemma, proved by Harel and Puri [4, Lemma 2.2], which is a generalization of a lemma of Yoshihara [11, Lemma 2].
Writing
$$\hat F_n(\xi_n) - F(\xi) = \big[\hat F_n(\xi_n) - \hat F_n(\xi)\big] + \big[\hat F_n(\xi) - \bar F_n * K_n(\xi)\big] + \big[\bar F_n * K_n(\xi) - F(\xi)\big] \tag{5.2}$$
will allow us to determine the contributions of the stochastic behavior of $\xi_n$ and of $\hat F_n$ to the limiting distribution. First, we use the smoothness of our nonparametric estimator to linearize the term $\hat F_n(\xi_n) - \hat F_n(\xi)$, approximating it by the differential $D\hat F_n(\xi)(\xi_n - \xi)$ minus a centralization term $D\bar F_n(\xi)(\xi_n - \xi)$. The second term $\hat F_n(\xi) - \bar F_n * K_n(\xi)$, plus the centralization term defined above, is analyzed using an empirical process technique for dependent random variables. Finally, the last term satisfies $\bar F_n * K_n(\xi) - F(\xi) = o(n^{-1/2})$, using the exponential asymptotic stationarity. Setting
$$H_{n,1}(\xi) = n^{1/2}\big[\hat F_n(\xi) - \bar F_n * K_n(\xi) + D\bar F_n(\xi)(\xi_n - \xi)\big],$$
$$H_{n,2}(\xi) = n^{1/2}\big[\hat F_n(\xi_n) - \hat F_n(\xi) - D\bar F_n(\xi)(\xi_n - \xi)\big], \tag{5.3}$$
we can rewrite the sum of the first and second terms as
$$n^{1/2}\big[\hat F_n(\xi_n) - \hat F_n(\xi) + \hat F_n(\xi) - \bar F_n * K_n(\xi)\big] = H_{n,1}(\xi) + H_{n,2}(\xi). \tag{5.4}$$
The leading part of this decomposition is $n^{-1/2}\sum_{l=1}^{n} T_{n,l}$, where $T_{n,l} = K_n(\xi - X_l) - E K_n(\xi - X_l) + D\bar F_n(\xi)\,\zeta_{l,\cdot,1}$, with $\zeta_{l,\cdot,1} = (m_j \zeta_{l,j,1})_{1\le j\le d}$, and $\varphi_{j,H_l}$ is defined by (2.3) for the component $\varphi_j$ of $\Phi$.
From the exponential asymptotic stationarity,
$$n^{-1/2} \sum_{l=1}^n E K_n(\xi - X_l) - n^{1/2}\, \bar F_n * K_n(\xi) \tag{5.8}$$
converges to zero, and from Lemma 5.1 and the Markov inequality we deduce that the remainder terms involving $D\bar F_n(\xi)\, \xi_{l,\cdot,p}$ (5.9) converge to zero in probability. It remains to show that $n^{-1/2} \sum_{l=1}^n T_{n,l}$ converges to a normal distribution, noting that $(T_{n,l})_{1\le l\le n}$ is a nonstationary, absolutely regular, unbounded sequence of random variables which satisfies the mixing rate (1.2). To prove the asymptotic normality of $n^{-1/2} \sum_{l=1}^n T_{n,l}$, we use the following lemma, obtained by Harel and Puri [4, Lemma 2.3]. Suppose that conditions (5.10)-(5.14) hold, where $c^2$ and $c_2(K)$ are positive constants. Then $n^{-1/2} \sum_{l=1}^n Y_{n,l}$ converges in law to a normal distribution with mean zero and variance $c^2$.
First, we prove (5.10) and (5.11) for the sequence $(T_{n,l})_{1\le l\le n}$, using the quantity introduced in (5.16).
From condition (v), we have the required bound, where $m = \max_{1\le j\le d} m_j$, and (5.10) is proved. Now, we use the inequality (5.19).
From (3.2), we obtain an expansion in terms of the covariances $\varphi(l,k)$, where, for $k > l$,
$$\varphi(l,k) = \int \big[K_n(\xi - x) - E K_n(\xi - X_l) + D\bar F_n(\xi)\, \zeta_{l,\cdot,1}\big]\big[K_n(\xi - y) - E K_n(\xi - X_k) + D\bar F_n(\xi)\, \zeta_{k,\cdot,1}\big]\, dF_{H_l \oplus H_k}(x, y). \tag{5.23}$$
It follows that the resulting bound decomposes into four terms $L_{1,n}$, $L_{2,n}$, $L_{3,n}$, $L_{4,n}$.

Thus, to prove (5.12), we have to show that
$$L_{1,n} + L_{2,n} + L_{3,n} + L_{4,n} \longrightarrow 0 \quad \text{as } n \longrightarrow \infty. \tag{5.26}$$
From (2.14) and the convergence of $K_n$ to the function $s$, $L_{1,n} \to 0$ as $n \to \infty$, and by using Harel and Puri [4, Lemma 3.1], $|L_{2,n}| \to 0$ as $n \to \infty$.
From the well-known inequality on the moments of absolutely regular processes, we have, for $\delta' < \delta/2$, the corresponding moment bounds; then
$$L_{3,n} \longrightarrow 0 \quad \text{as } n \longrightarrow \infty. \tag{5.32}$$
We then have the chain of estimates leading to (5.43).
We then deduce a bound in which $C$ is a positive constant. It remains to prove (5.46).
Since $\bar F_n$ is differentiable, there exists $\theta$ in $(0, 1)$ such that
$$\bar F_n(y) - \bar F_n(x) = D\bar F_n\big(x + \theta(y - x)\big) \cdot (y - x). \tag{5.47}$$
The differential $D\bar F_n$ being bounded, there exists a positive constant $M_3$ such that the bound (5.50) holds, where $G_n$ is the copula of $\bar F_n$ and $I(\cdot)$ denotes the indicator function. We then have (5.51), where $\omega$ is the modulus of continuity, defined for any bounded function $f \colon [0,1]^d \to \mathbb{R}$ by
$$\omega(f, \eta) = \sup_{\|s - t\| \le \eta} |f(s) - f(t)|.$$
We generalize Relation (3.11) of Sun [2] from the univariate case to the multivariate case, using methods similar to those of Harel and Puri [14, Lemmas 6.3 and 6.5]. Therefore, we get
$$\limsup_{n\to\infty} P\big(\omega(W_n, \eta_n) \ge \varepsilon\big) \le \varepsilon \quad \text{for all } \varepsilon > 0. \tag{5.54}$$
It follows from the inequalities (5.51) and (5.54) that $A_n$ converges to zero in probability as $n \to \infty$.
For the second term of $H_{n,2}$, using the Lagrange form of Taylor's theorem, applied to $\bar F_n$ at the points $\xi_n - t$ and $\xi - t$ up to second order, there exists $\theta$ such that
$$\bar F_n(\xi_n - t) - \bar F_n(\xi - t) = D\bar F_n(\xi - t)(\xi_n - \xi) + \tfrac{1}{2} D^2 \bar F_n(z_\theta)(\xi_n - \xi, \xi_n - \xi), \tag{5.55}$$
with $z_\theta = \xi - t + \theta(\xi_n - \xi)$, $0 < \theta < 1$.
From Harel and Puri [5], we deduce that $n^{1/2}(\xi_n - \xi)$ converges to a multivariate normal distribution as $n \to \infty$. From condition (iv), we obtain the required bound on the second-order term. Consequently, we deduce that $H_{n,2}$ converges to zero in probability, and Lemma 5.4 is proved. The proof of Theorem 3.1 then follows from Lemmas 5.2 and 5.4.