We consider the nonparametric estimation of the generalised regression
function for continuous time processes with irregular paths when the regressor takes values
in a semimetric space. We establish the mean-square convergence of our estimator with
the same superoptimal rate as when the regressor is real valued.
1. Introduction
Since the pioneering works of [1, 2], the nonparametric estimation of the regression function has been very widely studied for real and vectorial regressors (see, e.g., [3–8]) and, more recently, the case when the regressor takes values in a semimetric space of infinite dimension has been addressed. Interest in this type of explanatory variable has increased quickly since the foundational work of Ramsay and Silverman (1997), who proposed efficient methods for linear modelling (see [9] for a reissue of this work or [10, 11] for other developments on this topic). Later, fully nonparametric methods were proposed (e.g., [12–15]), but the increased generality comes at a price in terms of convergence rate: in the regression estimation framework, it is well known that the efficiency of a nonparametric estimator decreases quickly when the dimension of the regressor grows. This problem, known as the “curse of dimensionality,” is due to the sparsity of data in high-dimensional spaces. However, when studying continuous time processes with irregular paths, it has been shown in [16] that even when the regressor is R^d-valued, we can estimate the regression function with the parametric rate of convergence O(1/T). This kind of superoptimal rate of convergence for nonparametric estimators is always obtained under hypotheses on the joint probability density functions of the process which are very similar to those introduced by [17]. Since there is no equivalent of the Lebesgue measure on an infinite-dimensional Hilbert space, the definition of a density is less natural in the infinite-dimensional framework and the classical techniques cannot be applied. Under hypotheses about probabilities of small balls, we show that we can reach superoptimal rates of convergence for nonparametric estimation of the regression function when the regressor takes values in an infinite-dimensional space.
Notations and assumptions are presented in Section 2. Section 3 introduces our estimator and the main result. We comment on hypotheses and results and give some examples of processes fulfilling our hypotheses in Section 4. A numerical study can be found in Section 5. The proofs are postponed to Section 6.
2. Problem and Assumptions
Let {Xt,Yt}t∈[0,∞) be a measurable continuous time process defined on a probability space (Ω,F,P) and observed for t∈[0,T], where Yt is real valued and Xt takes values in a semimetric vectorial space H equipped with the semimetric d(·,·). We suppose that the law of (Xt,Yt) does not depend on t and that there exists a regular version of the conditional probability distribution of Yt, given Xt (see [18–20] for conditions giving the existence of the conditional probability). Throughout this paper, C denotes a compact set of H. Let Ψ be a real valued Borel function defined on R and consider the generalized regression function
$$r(x) := E(\Psi(Y_0) \mid X_0 = x), \quad x \in C. \tag{1}$$
We aim to estimate r from {Xt,Yt}t∈[0,T].
We gather hereafter the assumptions that are needed to establish our result.
(H1) For any x∈H and any h>0, set B(x,h)∶={y∈C; d(y,x)≤h}. There exist three constants (c1,C,η)∈(0,∞)^2×]0,1] such that, for any x∈C and any (u,v)∈B(x,c1)^2, we have
$$|r(u) - r(v)| \le C\, d(u,v)^{\eta}. \tag{2}$$
(H2) There exist
(i) a function ϕ and three constants (β1,β2,c2)∈[0,∞)^3 such that, for any x∈C and any h∈(0,c2], we have
$$0 < \beta_1 \phi(h) \le P(X_0 \in B(x,h)) \le \beta_2 \phi(h), \tag{3}$$
(ii) a constant c3>0 and a function g0 integrable on (0,∞) such that, for any x∈C, any s>t≥0, and any h∈(0,c3], we have
$$\left| P\big((X_t, X_s) \in B(x,h)^2\big) - P\big(X_t \in B(x,h)\big)^2 \right| \le g_0(s-t)\, \phi(h)^2. \tag{4}$$
(H3) For any t≥0, we set ɛt∶=Ψ(Yt)-E(Ψ(Yt)∣Xt). There exists an integrable bounded function g1 on [0,∞) such that, for any (s,t)∈[0,∞)^2, we have
$$\max\big\{ |E(\varepsilon_s \mid X_s, X_t)|,\; |E(\varepsilon_s \varepsilon_t \mid X_s, X_t)| \big\} \le g_1(|s-t|). \tag{5}$$
(H4) Let $\mathcal{T}_T$ be the sigma-algebra generated by {Xt, t∈[0,T]}. There exists a constant R>0, not depending on T, such that
$$\sup_{t \in [0,T]} E\big(\Psi(Y_t)^2 \mid \mathcal{T}_T\big) < R. \tag{6}$$
3. Estimator and Result
We define the generalized regression function estimate by
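$$\hat r_T(x) := \begin{cases} \dfrac{\int_{t=0}^{T} \Psi(Y_t)\, K\big(h_T^{-1} d(x, X_t)\big)\, dt}{\int_{t=0}^{T} K\big(h_T^{-1} d(x, X_t)\big)\, dt} & \text{if } \int_{t=0}^{T} K\big(h_T^{-1} d(x, X_t)\big)\, dt \ne 0, \\[2ex] \dfrac{\int_{t=0}^{T} \Psi(Y_t)\, dt}{T} & \text{otherwise,} \end{cases} \tag{7}$$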
where K(x)=I_{[0,1]}(x) is the indicator function of [0,1] and h_T is a bandwidth decreasing to 0 as T→∞. Note that this estimator is the same as the one defined in [21, page 130], with the semimetric d used in place of the simple difference used in the real case.
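In practice the path is only available on a time grid, so the integrals in (7) have to be approximated. The following is a minimal sketch, assuming a regular grid of step `dt`, curves stored as rows of an array, and an L2 distance as the semimetric; the function names `l2_semimetric` and `regression_estimate` are hypothetical and are not taken from the paper.

```python
import numpy as np

def l2_semimetric(x, y, du):
    """Illustrative semimetric: L2 distance between two curves sampled
    on a regular grid of step du (Riemann-sum approximation)."""
    return float(np.sqrt(np.sum((x - y) ** 2) * du))

def regression_estimate(x, X, psi_Y, h, dt, d):
    """Riemann-sum approximation of the estimator (7) with the uniform kernel.

    x     : curve at which the regression function is estimated
    X     : array of shape (n, m), the observed curves X_t on the time grid
    psi_Y : array of shape (n,), the values Psi(Y_t) on the same grid
    h     : bandwidth h_T
    dt    : time step of the grid (assumption: regular grid)
    d     : semimetric, called as d(x, X_t)
    """
    # K(d(x, X_t) / h) with K the indicator function of [0, 1]
    weights = np.array([1.0 if d(x, Xt) <= h else 0.0 for Xt in X])
    denom = weights.sum() * dt
    if denom != 0:
        return float((weights * psi_Y).sum() * dt / denom)
    # Second branch of (7): time average of Psi(Y_t)
    return float(psi_Y.mean())
```

The step `dt` cancels in the ratio; it is kept only to make the correspondence with the two integrals in (7) explicit.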
Theorem 1 explores the performance of r^T(x) in terms of mean-square error.
Theorem 1.
Suppose that (H1)–(H4) hold. Let r be defined by (1) and $\hat r_T$ by (7) with $h_T = O(T^{-1/\eta})$. Then, one has
$$\sup_{x \in C} E\big(\hat r_T(x) - r(x)\big)^2 = O\!\left(\frac{1}{T}\right). \tag{8}$$
We can compare this rate of convergence with the one obtained for discrete time processes in [14], which is, with our notations,
$$\big(\hat r_n(x) - r(x)\big)^2 = O\!\left(h_n^{2} + \frac{1}{n\,\phi(h_n)}\right). \tag{9}$$
Note that, with infinite-dimensional variables, ϕ(h) can decrease to zero at an exponential rate when h tends to zero, so that h_n has to tend to zero at a logarithmic rate.
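To make the comparison concrete, the two terms in (9) can be balanced under specific small ball functions. The first case below, ϕ(h)=h^d, is the finite-dimensional situation mentioned in Section 4; the second, ϕ(h)=exp(-1/h), is a purely hypothetical example of exponential decay chosen for illustration and is not taken from the paper.

```latex
% Case 1 (finite-dimensional illustration): \phi(h) = h^d.
% Balancing the two terms of (9):
h_n^{2} = \frac{1}{n h_n^{d}}
\;\Longrightarrow\; h_n = n^{-1/(d+2)},
\qquad
h_n^{2} + \frac{1}{n\,\phi(h_n)} = 2\, n^{-2/(d+2)} .

% Case 2 (hypothetical exponential decay): \phi(h) = e^{-1/h}.
% The second term vanishes only if e^{1/h_n}/n \to 0, that is,
\frac{1}{h_n} - \log n \to -\infty
\;\Longrightarrow\; h_n > \frac{1}{\log n} \text{ for large } n,
\qquad\text{so}\qquad h_n^{2} > (\log n)^{-2}.
```

In both cases the discrete time bound is slower than the rate 1/T of Theorem 1: polynomially slower in the first case, and only logarithmic in the second.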
4. Comments and Examples
(H1) is a very classical Hölderian condition on the true regression function, but, in the infinite-dimensional framework, this condition depends on the semimetric used.
The assumption on small balls probabilities given in (H2)-(i) is widely used in nonparametric estimation for functional data (see, e.g., the monograph [22]). However, we want to point out the fact that if we define equivalence classes using the semidistance d, we can construct a quotient space on which d is a distance and if this quotient space is infinite-dimensional, then this condition can be satisfied only very locally in that for any point x of our compact C, we can find, for any ɛ>0, a point y and a positive number h<ɛ such that d(x,y)<ɛ and P(X0∈B(y,h))≤β1ϕ(h): in that case, we could not extend our hypothesis to every point in an open ball (see [23] for a result on the consequences of a similar hypothesis on every point in a ball).
The most specific and restrictive assumption is (H2)-(ii), which is an adaptation to infinite-dimensional processes of the conditions on the density function introduced in [17] for real valued processes and transposed in [21, pages 135-136] to the estimation of the regression function with a vectorial regressor. Note that when H=R^d and ϕ(h)=h^d, the rate of convergence obtained in Theorem 5.3 in [21, page 136] is the same as the one we obtain here, and the condition I2 used there implies (H2)-(ii). On the other hand, processes can meet (H2)-(ii) and violate the condition in [21], especially when the vectorial process Xt does not admit a density. For real valued processes, a slightly different version of the Castellana and Leadbetter hypothesis on the joint density is given in [24], where it is shown that this hypothesis is satisfied for a wide class of diffusion processes, including the Ornstein-Uhlenbeck process: these processes are also examples of the range of applications of our result. Real continuous-time fractional ARMA processes studied in [25] are given as examples in [26]. Depending on the choice of the impulse response functions, a vector composed of d such processes can fulfil (H2)-(ii) for any d: using the notations of [25], if ((X1,T),…,(Xd,T)) are d independent processes complying with the conditions of Proposition 4 in [25] with D>-1/2-1/d and a>0, then the vectorial process ((X1,T),…,(Xd,T)) meets (H2)-(ii). For processes valued in infinite-dimensional spaces, we can also give the example of hidden processes: let (Zt) be a nonobserved process valued in R^d, for which the conditions of Theorem 5.3 in [21, page 136] hold for every x in a compact A, let Γ be an unknown function from R^d to a space H (that can be infinite-dimensional) equipped with a semimetric d, and let (Xt)=(Γ(Zt)) be an observed process. If there exist two positive constants (a,b) such that for any (x,y)∈R^d×R^d, a∥x-y∥≤d(Γ(x),Γ(y))≤b∥x-y∥, then (Xt) fulfills (H2) with ϕ(h)=h^d and C=Γ(A).
Note that even if H=R^{d′} with d′>d, (Xt) does not satisfy the assumptions usually imposed on vectorial processes to obtain a superoptimal rate.
There are two conditions in (H3). The condition |E(ɛs∣Xs,Xt)|≤g1(|s-t|) is less restrictive than imposing that the regressor and the noise are independent. The condition |E(ɛsɛt∣Xs,Xt)|≤g1(|s-t|) is a weak condition on the decay of dependence as the distance between observations increases, and (ɛt) need not be α-mixing. Note that we do not require (ɛt) to be an irregular path process.
Finally, it is much less restrictive to impose (H4) than to suppose that Ψ(Yt) is bounded. In particular, this assumption allows us to consider the model
$$\Psi(Y_t) = r(X_t) + \varepsilon_t, \tag{10}$$
where r is a bounded function, (ɛt) is a square integrable process, and (ɛt) and (Xt) are independent.
On a given space, we can define many semidistances, and hypotheses (H1)-(H2), as well as the estimator itself, depend largely on the choice of this semidistance: the importance of this choice is widely discussed in [22] and a method to choose the semimetric for independent variables is proposed in [27], but this method does not ensure that (H1) holds. Actually, we can obtain a semimetric d such that d(x,y)=0 does not imply r(x)=r(y). It would be of interest to develop a data-driven method adapted to continuous time processes to select the semimetric.
In the statement of our theorem, we impose that h_T=O(T^{-1/η}), where η is an unknown parameter, so that adapting to continuous time processes the method developed in [28] to choose the bandwidth would be interesting but is not, in theory, necessary in our framework. In fact, and this is what was very surprising when Castellana and Leadbetter first obtained a superoptimal rate of convergence, the bound for the variance of the estimator does not depend on h_T, and we can choose h_T=T^{-log(T)}, which satisfies h_T≤T^{-1/η} for T large enough: even if this choice has no reason to be optimal, it leads to the claimed superoptimal rate of convergence.
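The threshold beyond which the bandwidth T^{-log(T)} is small enough can be made explicit. The short computation below only uses T>1 and η∈]0,1]; the numerical values of η are illustrative.

```latex
% For T > 1, comparing exponents of the same base T:
T^{-\log T} \le T^{-1/\eta}
\;\Longleftrightarrow\; \log T \ge \frac{1}{\eta}
\;\Longleftrightarrow\; T \ge e^{1/\eta}.

% Illustrative values:
\eta = 1 \;\Rightarrow\; T \ge e \approx 2.72,
\qquad
\eta = \tfrac{1}{2} \;\Rightarrow\; T \ge e^{2} \approx 7.39 .
```

The smaller the Hölder exponent η, the longer the observation window needed before this η-free bandwidth meets the condition of Theorem 1.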
Recently, results have been obtained in the case where the response Y is valued in a Banach space, which can be infinite-dimensional (see [29, 30]). Note that as long as Ψ is a real valued Borel function, there is no need to change our proofs to obtain our result if Y is valued in a Banach space. However, in the case where Ψ(Y) is a Banach valued variable, we could not easily adapt our proofs, and obtaining a superoptimal rate would involve very different techniques; it would be an interesting extension for further work.
5. Simulations
We chose L2([-1,1]) endowed with its natural norm as the functional space and simulated our process as follows.
First, we simulated an Ornstein-Uhlenbeck process, solution of the stochastic differential equation
$$dOU_t = -9\,(OU_t - 2)\,dt + 6\,dW_t, \tag{11}$$
where W_t denotes a Wiener process. Here, we took the time step dt=0.0005.
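The paper does not state which discretisation scheme was used for (11). The sketch below assumes a plain Euler-Maruyama scheme with the time step dt=0.0005 given above; the starting value `x0`, the seed, and the function name `simulate_ou` are assumptions made for illustration.

```python
import numpy as np

def simulate_ou(T, dt=0.0005, theta=9.0, mu=2.0, sigma=6.0, x0=2.0, seed=0):
    """Euler-Maruyama path of dOU_t = -theta (OU_t - mu) dt + sigma dW_t on [0, T].

    theta=9, mu=2, sigma=6 are the coefficients of (11); x0 and seed are
    illustrative assumptions. Returns an array of length round(T/dt) + 1.
    """
    rng = np.random.default_rng(seed)
    n = int(round(T / dt))
    # Brownian increments over each step: N(0, dt)
    dW = rng.normal(0.0, np.sqrt(dt), size=n)
    ou = np.empty(n + 1)
    ou[0] = x0
    for k in range(n):
        ou[k + 1] = ou[k] - theta * (ou[k] - mu) * dt + sigma * dW[k]
    return ou
```

For instance, `simulate_ou(10.0)` returns a path on [0,10] that fluctuates around the long-run mean 2 of the process.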
Denoting the floor function by ⌊·⌋, let Γ be the function from R to L2([-1,1]) defined by
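$$\Gamma(x) := (1 + \lfloor x \rfloor - x)\, P_{\mathrm{num}(\lfloor x \rfloor)} + (x - \lfloor x \rfloor)\, P_{\mathrm{num}(\lfloor x+1 \rfloor)} \quad \forall x \in \mathbb{R}, \tag{12}$$
where $P_i$ is the Legendre polynomial of degree i and $\mathrm{num}(x) := 1 + 2 \times \mathrm{sign}(x) \times x - \mathrm{sign}(x) \times (1 + \mathrm{sign}(x))/2$. Then we define our functional process for any t∈[0,T] by setting
$$X_t := \Gamma(OU_t). \tag{13}$$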
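The map num sends the integers 0, 1, -1, 2, -2, … to the degrees 1, 2, 3, 4, 5, …, and Γ interpolates linearly between the two Legendre polynomials attached to the integers surrounding x. A minimal sketch, assuming the curves are represented by their values on a grid `u` of [-1,1] (a representation choice not specified in the paper; the names `num` and `gamma` mirror the notation above):

```python
import math
import numpy as np
from numpy.polynomial.legendre import Legendre

def num(k):
    """Degree attached to the integer k: 1 + 2*sign(k)*k - sign(k)*(1 + sign(k))/2."""
    s = (k > 0) - (k < 0)  # sign of k as an integer in {-1, 0, 1}
    return 1 + 2 * s * k - (s * (1 + s)) // 2

def gamma(x, u):
    """Values on the grid u of the curve Gamma(x) defined in (12)."""
    k = math.floor(x)
    w = x - k  # weight of the polynomial attached to floor(x) + 1
    p_low = Legendre.basis(num(k))(u)
    p_high = Legendre.basis(num(k + 1))(u)
    return (1.0 - w) * p_low + w * p_high
```

Applied to a simulated Ornstein-Uhlenbeck path, `gamma(ou_t, u)` gives the sampled curve X_t of (13) at each time point.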
For any square integrable function x on [-1,1], we define the function
$$r(x) := \int_{u=-1}^{1} x(u)\,\big(2u + x(u)\big)\, du \tag{14}$$
and set
$$Y_t = r(X_t) + U_t, \tag{15}$$
where $U_t = W'_t - W'_{t-1}$ and $W'_t$ is a Wiener process independent of X.
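For a curve known through its values on a grid of [-1,1], the integral in (14) can be approximated by a quadrature rule. The sketch below uses the composite trapezoidal rule, which is an assumption (the paper does not say how the integral was computed); the function name `regression_functional` is hypothetical.

```python
import numpy as np

def regression_functional(x_vals, u):
    """Trapezoidal approximation of r(x) = int_{-1}^{1} x(u) (2u + x(u)) du.

    x_vals : values of the curve x on the grid u
    u      : increasing grid covering [-1, 1]
    """
    f = x_vals * (2.0 * u + x_vals)  # integrand of (14)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(u)))
```

As a sanity check, for x(u)=u the integrand is 3u², whose integral over [-1,1] equals 2.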
In order to obtain a panel of 20 points (in L2([-1,1])) where we can evaluate the regression function, we did a first simulation with T=10 and set C∶={X_{i/2}, i∈{1,2,…,20}}. Once obtained, C is considered as a deterministic set. We represent these functions in Figure 1.
Remark. We check here that the simulated processes fulfil our hypotheses.
First, denoting by Id the identity function on [-1,1], for any (x,y)∈L2([-1,1])×L2([-1,1]), we have
$$|r(x) - r(y)| = \big|\, \|x\|^2 - \|y\|^2 + 2\langle x - y, \mathrm{Id} \rangle \,\big| \le \big(\|x + y\| + 2\big)\, \|x - y\| \tag{16}$$
and r satisfies (H1) with η=1.
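The steps behind (16) can be spelled out as follows. The derivation only uses the definition (14) of r and the Cauchy-Schwarz inequality; the closing remark on the norm of the curves relies on the standard identity ‖P_n‖²=2/(2n+1) for Legendre polynomials and on the fact that the curves considered are convex combinations of such polynomials of degree at least 1.

```latex
% Expanding (14):
r(x) = \int_{-1}^{1} x(u)^2\,du + 2\int_{-1}^{1} u\,x(u)\,du
     = \|x\|^2 + 2\langle x, \mathrm{Id}\rangle .

% Difference of two values:
r(x) - r(y) = \langle x - y,\, x + y\rangle + 2\langle x - y,\, \mathrm{Id}\rangle .

% Cauchy--Schwarz, with \|\mathrm{Id}\|^2 = \int_{-1}^{1} u^2\,du = 2/3 \le 1:
|r(x) - r(y)| \le \|x - y\|\,\big(\|x + y\| + 2\|\mathrm{Id}\|\big)
              \le \big(\|x + y\| + 2\big)\,\|x - y\| .

% On \mathrm{Im}(\Gamma): \|\Gamma(z)\| \le \max_{n \ge 1}\|P_n\| = \sqrt{2/3},
% so \|x + y\| \le 2\sqrt{2/3} and the Lipschitz constant is uniform there.
```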
The Ornstein-Uhlenbeck process satisfies the part of Condition I2 on the regressor's density in [21, page 136]. Moreover, Γ is a bijection from R to Im(Γ), and it can be shown that, for some constant C, there exist 0<a<b such that for any 0<h<C and any x∈Γ^{-1}(C), the two following implications hold:
$$\big(z \in B(x, ah)\big) \Longrightarrow \big(\Gamma(z) \in B(\Gamma(x), h)\big), \qquad \big(\Gamma(z) \in B(\Gamma(x), h)\big) \Longrightarrow \big(z \in B(x, bh)\big), \tag{17}$$
which implies that (H2)-(i) and (H2)-(ii) are fulfilled when taking ϕ(h)=h.
Since (ɛt) and (Xt) are independent and Cov(ɛt,ɛs)=0 if |t-s|>1, (H3) is satisfied.
Finally, the model used in the simulation corresponds to the choice of the identity function for Ψ in (1), where (Yt) is an unbounded process and r(·) is not a bounded function. However, r(·) is bounded on Im(Γ) and so (H4) is fulfilled.
We simulated the paths of the process (Xt,Yt)t∈[0,T] for different values of T. Figure 2 represents the path of the process (Yt) for t∈[0,1].
We estimated the regression function at each point in C, for different values of T, and compared our results to those obtained when studying a discrete time functional process, that is, when we observe (Xt,Yt) only for t∈N, and we use the estimator defined in [12] with the indicator function as the kernel: it corresponds to an infinite-dimensional version of the Nadaraya-Watson estimator with a uniform kernel. When working with the discrete time process, we used the data-driven way of choosing the bandwidth proposed in [28]. When working with the continuous time process, which is observed on a very fine grid, for T=50, we chose the same bandwidth as the one used for the discrete time process and, for T>50, we supposed r to be Lipschitz (i.e., η=1, which is the case here) and used the bandwidth h_T=h_50×(50/T). In Table 1, we give the mean square error evaluated on the functions of the panel for T=50, 500, and 2000.
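Table 1: Mean square error evaluated on the functions of the panel.

| | Continuous time process | Discrete time process |
|---|---|---|
| T=50 | 0.056623 | 0.231032 |
| T=500 | 0.003235 | 0.037855 |
| T=2000 | 0.000698 | 0.0155137 |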
We can see that, for T=50, we already have a smaller mean square error with the estimator using the continuous time process, and when T increases, the mean square error seems to decrease much more quickly when working with the continuous time process. However, the continuous time approach takes much more time and much more memory; we had to split the calculation into several parts and delete intermediate calculations to avoid saturating memory.
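A crude way to quantify this difference in speed is to read off the slope of log(MSE) against log(T) from the values of Table 1. The sketch below does this between the first and last values of T; with only three points per column this is an informal indication, not a rate estimate, and the helper name `loglog_slope` is hypothetical.

```python
import math

# Values reported in Table 1
T_values = [50, 500, 2000]
mse_continuous = [0.056623, 0.003235, 0.000698]
mse_discrete = [0.231032, 0.037855, 0.0155137]

def loglog_slope(T, mse):
    """Slope of log(MSE) against log(T) between the first and last points."""
    return (math.log(mse[-1]) - math.log(mse[0])) / (math.log(T[-1]) - math.log(T[0]))

# Empirical decay exponents a such that MSE behaves roughly like T^(-a)
decay_continuous = -loglog_slope(T_values, mse_continuous)
decay_discrete = -loglog_slope(T_values, mse_discrete)
print(f"continuous: {decay_continuous:.2f}, discrete: {decay_discrete:.2f}")
```

The exponent comes out at roughly 1.2 for the continuous time estimator, in line with the O(1/T) bound of Theorem 1, and at roughly 0.7 for the discrete time estimator.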
In Figures 3 and 4, we have in abscissa the value of the real regression function applied to each function of our panel and in ordinate the estimated value of the regression function. We represent on the left the results for the continuous time estimator and on the right the results for the discrete time estimator.
Figure 3: Outputs for T = 500. Continuous time estimator (left) and discrete time estimator (right); in abscissa the value of the real regression function applied to each function of our panel and in ordinate the estimated value of the regression function.
Figure 4: Outputs for T = 2000. Continuous time estimator (left) and discrete time estimator (right); in abscissa the value of the real regression function applied to each function of our panel and in ordinate the estimated value of the regression function.
6. Proofs
6.1. Intermediary Results
In the sequel, we use the following notations:
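$$\Delta_{T,t}(x) = K\big(h_T^{-1} d(x, X_t)\big), \qquad \hat r_{1,T}(x) := \frac{1}{T\, E(\Delta_{T,0}(x))} \int_{t=0}^{T} \Delta_{T,t}(x)\, dt, \qquad \hat r_{2,T}(x) := \frac{1}{T\, E(\Delta_{T,0}(x))} \int_{t=0}^{T} \Psi(Y_t)\, \Delta_{T,t}(x)\, dt. \tag{18}$$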
Lemma 2 below studies the behavior of the bias of r^2,T.
Lemma 2.
Under the conditions of Theorem 1, one has
$$\sup_{x \in C} \big| E(\hat r_{2,T}(x)) - r(x) \big| = O\!\left(\frac{1}{T}\right). \tag{19}$$
Lemma 3 below provides an upper bound for the variances of r^1,T and r^2,T.
Lemma 3.
Under the conditions of Theorem 1, one has
$$\sup_{x \in C} \big( \operatorname{Var}(\hat r_{2,T}(x)) + \operatorname{Var}(\hat r_{1,T}(x)) \big) = O\!\left(\frac{1}{T}\right). \tag{20}$$
6.2. Proofs of the Intermediary Results
For the sake of conciseness, when no confusion is possible, we use the notations Ψt∶=Ψ(Yt) and ΔT,t∶=ΔT,t(x).
Proof of Lemma 2.
Observe that, for any x∈C,
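$$E(\hat r_{2,T}(x)) = \frac{1}{T\, E(\Delta_{T,0})} \int_{t=0}^{T} E(\Psi_t \Delta_{T,t})\, dt = \frac{E(\Psi_0 \Delta_{T,0})}{E(\Delta_{T,0})} = \frac{E\big(E(r(X_0) + \varepsilon_0 \mid X_0)\, \Delta_{T,0}\big)}{E(\Delta_{T,0})} = \frac{E(r(X_0)\, \Delta_{T,0})}{E(\Delta_{T,0})}. \tag{21}$$
Hence,
$$E(\hat r_{2,T}(x)) - r(x) = \frac{E(r(X_0)\, \Delta_{T,0})}{E(\Delta_{T,0})} - r(x) = \frac{E\big((r(X_0) - r(x))\, \Delta_{T,0}\big)}{E(\Delta_{T,0})}. \tag{22}$$
Owing to (H1), we have $|r(X_0) - r(x)|\, \Delta_{T,0} \le \Delta_{T,0} \sup_{u \in B(x, h_T)} |r(u) - r(x)| \le C\, \Delta_{T,0}\, h_T^{\eta}$. Therefore, by Jensen's inequality and $h_T^{\eta} = O(1/T)$, we have
$$\sup_{x \in C} \big| E(\hat r_{2,T}(x)) - r(x) \big| \le \sup_{x \in C} \frac{E\big(|r(X_0) - r(x)|\, \Delta_{T,0}\big)}{E(\Delta_{T,0})} \le C\, h_T^{\eta} = O\!\left(\frac{1}{T}\right). \tag{23}$$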
This ends the proof of Lemma 2.
Proof of Lemma 3.
For any x∈C, by Fubini's Theorem, we have
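$$\operatorname{Var}(\hat r_{2,T}(x)) = \frac{1}{T^2\, E(\Delta_{T,0})^2} \int_{t=0}^{T} \int_{s=0}^{T} \operatorname{Cov}(\Psi_s \Delta_{T,s}, \Psi_t \Delta_{T,t})\, dt\, ds. \tag{24}$$
Upper Bound of the Covariance Term. In order to simplify the notations, we set $R(X_t) := E(\Psi_t \mid X_t)$ and $\varepsilon_t := \Psi_t - R(X_t)$. Note that
$$E(\Psi_s \Psi_t \mid X_s, X_t) = R(X_s) R(X_t) + R(X_s)\, E(\varepsilon_t \mid X_s, X_t) + R(X_t)\, E(\varepsilon_s \mid X_s, X_t) + E(\varepsilon_s \varepsilon_t \mid X_s, X_t). \tag{25}$$
Therefore, the covariance term can be expanded as follows:
$$\begin{aligned} \operatorname{Cov}(\Psi_s \Delta_{T,s}, \Psi_t \Delta_{T,t}) &= E(\Psi_s \Delta_{T,s} \Psi_t \Delta_{T,t}) - E(\Psi_s \Delta_{T,s})\, E(\Psi_t \Delta_{T,t}) \\ &= E\big(\Delta_{T,s} \Delta_{T,t}\, E(\Psi_s \Psi_t \mid X_s, X_t)\big) - E(\Delta_{T,s} R(X_s))^2 \\ &= E\big(\Delta_{T,s} \Delta_{T,t} R(X_s) R(X_t)\big) + E\big(\Delta_{T,s} \Delta_{T,t} \big(R(X_s)\, E(\varepsilon_t \mid X_s, X_t) + R(X_t)\, E(\varepsilon_s \mid X_s, X_t)\big)\big) \\ &\quad + E\big(\Delta_{T,s} \Delta_{T,t}\, E(\varepsilon_s \varepsilon_t \mid X_s, X_t)\big) - E(\Delta_{T,s} R(X_s))^2. \end{aligned} \tag{26}$$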
Set
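$$d_t := R(X_t) - r(x). \tag{27}$$
We have
$$\begin{aligned} \operatorname{Cov}(\Psi_s \Delta_{T,s}, \Psi_t \Delta_{T,t}) &= r(x)^2 E(\Delta_{T,s} \Delta_{T,t}) + r(x) \big( E(\Delta_{T,s} \Delta_{T,t} d_t) + E(\Delta_{T,s} \Delta_{T,t} d_s) \big) + E(\Delta_{T,s} \Delta_{T,t} d_t d_s) \\ &\quad + r(x)\, E\big(\Delta_{T,s} \Delta_{T,t} \big(E(\varepsilon_t \mid X_s, X_t) + E(\varepsilon_s \mid X_s, X_t)\big)\big) \\ &\quad + E\big(\Delta_{T,s} \Delta_{T,t} \big(d_s\, E(\varepsilon_t \mid X_s, X_t) + d_t\, E(\varepsilon_s \mid X_s, X_t)\big)\big) + E\big(\Delta_{T,s} \Delta_{T,t}\, E(\varepsilon_s \varepsilon_t \mid X_s, X_t)\big) \\ &\quad - r(x)^2 E(\Delta_{T,s})^2 - E(\Delta_{T,s} d_s)^2 - 2 r(x)\, E(\Delta_{T,s} d_s)\, E(\Delta_{T,s}) \\ &= r(x)^2 \big( E(\Delta_{T,s} \Delta_{T,t}) - E(\Delta_{T,s})^2 \big) - \big( E(\Delta_{T,s} d_s)^2 + 2 r(x)\, E(\Delta_{T,s} d_s)\, E(\Delta_{T,s}) \big) + E(\Delta_{T,s} \Delta_{T,t} Q), \end{aligned} \tag{28}$$
with
$$Q = d_s\, E(\varepsilon_t \mid X_s, X_t) + d_t\, E(\varepsilon_s \mid X_s, X_t) + d_s d_t + E(\varepsilon_s \varepsilon_t \mid X_s, X_t) + r(x) \big( d_s + d_t + E(\varepsilon_t \mid X_s, X_t) + E(\varepsilon_s \mid X_s, X_t) \big). \tag{29}$$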
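The triangle inequality and Jensen's inequality yield
$$|\operatorname{Cov}(\Psi_s \Delta_{T,s}, \Psi_t \Delta_{T,t})| \le L + M + N, \tag{30}$$
where
$$L = r(x)^2 \big| E(\Delta_{T,s} \Delta_{T,t}) - E(\Delta_{T,s})^2 \big|, \qquad M = E(\Delta_{T,s} |d_s|)^2 + 2 |r(x)|\, E(\Delta_{T,s} |d_s|)\, E(\Delta_{T,s}), \qquad N = E(\Delta_{T,s} \Delta_{T,t} |Q|). \tag{31}$$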
Upper Bound for L. Using (H2)-(ii), we have
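$$L \le r(x)^2\, g_0(|s-t|)\, \phi(h_T)^2. \tag{32}$$
Upper Bound for M. Owing to (H1), we have $\Delta_{T,s} |d_s| \le \Delta_{T,s} \sup_{u \in B(x, h_T)} |r(u) - r(x)| \le C\, \Delta_{T,s}\, h_T^{\eta}$. It follows from this inequality and (H2)-(i) that
$$M \le \big( 2 |r(x)|\, C h_T^{\eta} + C^2 h_T^{2\eta} \big)\, E(\Delta_{T,s})^2 \le \big( 2 |r(x)|\, C h_T^{\eta} + C^2 h_T^{2\eta} \big)\, \beta_2^2\, \phi(h_T)^2. \tag{33}$$
Upper Bound for N. By techniques similar to those in the bound for M and (H3), we obtain
$$\Delta_{T,s} \Delta_{T,t} |Q| \le \Delta_{T,s} \Delta_{T,t} \times \Big( 2 |r(x)|\, C h_T^{\eta} + C^2 h_T^{2\eta} + \big( 2 (|r(x)| + C h_T^{\eta}) + 1 \big)\, g_1(|s-t|) \Big). \tag{34}$$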
On the other hand, by (H2)-(ii),
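$$E(\Delta_{T,s} \Delta_{T,t}) \le |\operatorname{Cov}(\Delta_{T,s}, \Delta_{T,t})| + E(\Delta_{T,s})^2 \le \big( \beta_2^2 + g_0(|s-t|) \big)\, \phi(h_T)^2. \tag{35}$$
Hence,
$$N \le \Big( 2 |r(x)|\, C h_T^{\eta} + C^2 h_T^{2\eta} + \big( 2 (|r(x)| + C h_T^{\eta}) + 1 \big)\, g_1(|s-t|) \Big) \times \big( \beta_2^2 + g_0(|s-t|) \big)\, \phi(h_T)^2. \tag{36}$$
Therefore, setting
$$\begin{aligned} G_T(y) := {} & r(x)^2\, g_0(y)\, \phi(h_T)^2 + \Big( 2 |r(x)|\, C h_T^{\eta} + C^2 h_T^{2\eta} + \big( 2 (|r(x)| + C h_T^{\eta}) + 1 \big)\, g_1(y) \Big) \times \big( \beta_2^2 + g_0(y) \big)\, \phi(h_T)^2 \\ & + \big( 2 |r(x)|\, C h_T^{\eta} + C^2 h_T^{2\eta} \big)\, \beta_2^2\, \phi(h_T)^2, \end{aligned} \tag{37}$$
the obtained upper bounds for L, M, and N yield
$$|\operatorname{Cov}(\Psi_s \Delta_{T,s}, \Psi_t \Delta_{T,t})| \le G_T(|s-t|). \tag{38}$$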
Final Bound. Combining (24) and (38) and using (H2)-(i), we have
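$$\operatorname{Var}(\hat r_{2,T}(x)) \le \frac{2}{T^2\, E(\Delta_{T,0})^2} \int_{t=0}^{T} \int_{s=t}^{T} G_T(s-t)\, dt\, ds \le \frac{2}{T\, E(\Delta_{T,0})^2} \int_{y=0}^{T} G_T(y)\, dy \le \frac{2}{T\, \beta_1^2\, \phi(h_T)^2} \int_{y=0}^{T} G_T(y)\, dy. \tag{39}$$
Since g0 and g1 are integrable, r is bounded on C, and $h_T^{\eta} = O(1/T)$, there exists a constant C0 such that
$$\sup_{x \in C} \operatorname{Var}(\hat r_{2,T}(x)) \le \frac{C_0}{T}. \tag{40}$$
The special choice Ψ: x↦1 leads us to
$$\sup_{x \in C} \operatorname{Var}(\hat r_{1,T}(x)) \le \frac{C_1}{T}. \tag{41}$$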
This last inequality concludes the proof of Lemma 3.
Proof of Theorem 1.
We can write
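$$\hat r_T(x) - r(x) = \big( \hat r_T(x) - \hat r_{2,T}(x) \big) + \big( \hat r_{2,T}(x) - E(\hat r_{2,T}(x)) \big) + \big( E(\hat r_{2,T}(x)) - r(x) \big). \tag{42}$$
The elementary inequality $(a+b+c)^2 \le 3(a^2+b^2+c^2)$, valid for any $(a,b,c) \in \mathbb{R}^3$, yields
$$\sup_{x \in C} E\big( \hat r_T(x) - r(x) \big)^2 \le 3 (U + V + W), \tag{43}$$
where
$$U = \sup_{x \in C} E\big( \hat r_T(x) - \hat r_{2,T}(x) \big)^2, \qquad V = \sup_{x \in C} E\big( \hat r_{2,T}(x) - E(\hat r_{2,T}(x)) \big)^2, \qquad W = \sup_{x \in C} \big( E(\hat r_{2,T}(x)) - r(x) \big)^2. \tag{44}$$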
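For completeness, the elementary inequality used to obtain (43) is an instance of the Cauchy-Schwarz inequality in R^3 and therefore holds whatever the signs of the three terms of the decomposition (42):

```latex
(a + b + c)^2
  = \big( 1 \cdot a + 1 \cdot b + 1 \cdot c \big)^2
  \le (1^2 + 1^2 + 1^2)\,(a^2 + b^2 + c^2)
  = 3\,(a^2 + b^2 + c^2),
\qquad (a, b, c) \in \mathbb{R}^3 .
```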
Upper Bound for V. Lemma 3 yields
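$$V = \sup_{x \in C} \operatorname{Var}(\hat r_{2,T}(x)) = O\!\left(\frac{1}{T}\right). \tag{45}$$
Upper Bound for W. Lemma 2 yields
$$W = O\!\left(\frac{1}{T^2}\right). \tag{46}$$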
Upper Bound for U. We define, for any t∈[0,T], the quantity:
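$$Z_t := \begin{cases} \dfrac{K\big(h_T^{-1} d(x, X_t)\big)}{\int_{t=0}^{T} K\big(h_T^{-1} d(x, X_t)\big)\, dt} & \text{if } \int_{t=0}^{T} K\big(h_T^{-1} d(x, X_t)\big)\, dt \ne 0, \\[2ex] \dfrac{1}{T} & \text{otherwise.} \end{cases} \tag{47}$$
Note that, when $\hat r_{1,T}(x) \ne 0$,
$$\hat r_{2,T}(x) = \hat r_T(x) \times \hat r_{1,T}(x), \tag{48}$$
so that
$$U \le \sup_{x \in C} E\big( \hat r_T(x) (1 - \hat r_{1,T}(x)) \big)^2 + E\left( I_{\{0\}}(\hat r_{1,T}(x)) \int_0^T \frac{\Psi(Y_t)}{T}\, dt \right). \tag{49}$$
Using (H4) and Lemma 3, we get
$$\begin{aligned} \sup_{x \in C} E\big( \hat r_T(x) (1 - \hat r_{1,T}(x)) \big)^2 &= \sup_{x \in C} E\Big( E\big( \hat r_T^2(x) (1 - \hat r_{1,T}(x))^2 \mid \mathcal{T}_T \big) \Big) \\ &= \sup_{x \in C} E\Big( E\big( \hat r_T^2(x) \mid \mathcal{T}_T \big) \times (1 - \hat r_{1,T}(x))^2 \Big) \\ &= \sup_{x \in C} E\left( E\left( \Big( \int_{t=0}^{T} Z_t \Psi(Y_t)\, dt \Big)^2 \,\Big|\, \mathcal{T}_T \right) \times (1 - \hat r_{1,T}(x))^2 \right) \\ &= \sup_{x \in C} E\left( \int_{(s,t) \in [0,T]^2} Z_t Z_s\, E\big( \Psi(Y_t) \Psi(Y_s) \mid \mathcal{T}_T \big)\, ds\, dt \times (1 - \hat r_{1,T}(x))^2 \right) \\ &\le \sup_{x \in C} E\left( \int_{(s,t) \in [0,T]^2} Z_t Z_s R\, ds\, dt \times (1 - \hat r_{1,T}(x))^2 \right) \\ &\le \sup_{x \in C} R\, E\big( (1 - \hat r_{1,T}(x))^2 \big) = \sup_{x \in C} R \operatorname{Var}(\hat r_{1,T}(x)) = O\!\left(\frac{1}{T}\right). \end{aligned} \tag{50}$$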
Similarly, (H4), Lemma 3, and Chebyshev's inequality lead to
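$$\sup_{x \in C} E\left( I_{\{0\}}(\hat r_{1,T}(x)) \int_0^T \frac{\Psi(Y_t)}{T}\, dt \right) \le \sup_{x \in C} R\, E\big( I_{\{0\}}(\hat r_{1,T}(x)) \big) \le \sup_{x \in C} R\, P\big( |\hat r_{1,T}(x) - 1| \ge 1 \big) \le \frac{R}{T}. \tag{51}$$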
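The passage from the indicator to the tail probability in (51), and the use of Chebyshev's inequality, can be detailed as follows. The only ingredients are the definition (18), the fact that the law of X_t does not depend on t, and the variance bound (41); the constant C_1 is the one appearing in (41).

```latex
% Centering: by (18) and stationarity of the law of X_t,
E(\hat r_{1,T}(x)) = \frac{1}{T\,E(\Delta_{T,0})}\int_{t=0}^{T} E(\Delta_{T,t})\,dt = 1 .

% Event inclusion:
\{\hat r_{1,T}(x) = 0\} \subset \{\,|\hat r_{1,T}(x) - 1| \ge 1\,\} .

% Chebyshev's inequality, then (41):
P\big(|\hat r_{1,T}(x) - 1| \ge 1\big)
  \le \operatorname{Var}(\hat r_{1,T}(x))
  \le \frac{C_1}{T} .
```

Up to the multiplicative constant, this gives the bound of order 1/T stated in (51).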
We finally obtain
$$U = O\!\left(\frac{1}{T}\right). \tag{52}$$
Putting the obtained upper bounds for U, V, and W together, we have
$$\sup_{x \in C} E\big( \hat r_T(x) - r(x) \big)^2 = O\!\left(\frac{1}{T}\right). \tag{53}$$
Theorem 1 is proved.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The authors wish to thank the Editor and two anonymous referees for their constructive suggestions which led to some improvements in some earlier versions of the paper.
References
[1] E. Nadaraya, "On estimating regression," vol. 9, pp. 141–142, 1964.
[2] G. Watson, "Smooth regression analysis," vol. 26, no. 4, pp. 359–372, 1964.
[3] M. Rosenblatt, "Conditional probability density and regression estimators," pp. 25–31, Academic Press, New York, NY, USA, 1969.
[4] C. J. Stone, "Optimal global rates of convergence for nonparametric regression," vol. 10, no. 4, pp. 1040–1053, 1982. doi:10.1214/aos/1176345969. MR673642. Zbl0511.62048.
[5] G. Collomb and W. Härdle, "Strong uniform convergence rates in robust nonparametric time series analysis and prediction: kernel regression estimation from dependent observations," vol. 23, no. 1, pp. 77–89, 1986. doi:10.1016/0304-4149(86)90017-7. MR866288.
[6] A. Krzyzak and M. Pawlak, "The pointwise rate of convergence of the kernel regression estimate," vol. 16, no. 2, pp. 159–166, 1987. doi:10.1016/0378-3758(87)90065-6. MR895756.
[7] G. G. Roussas, "Nonparametric regression estimation under mixing conditions," vol. 36, no. 1, pp. 107–116, 1990. doi:10.1016/0304-4149(90)90045-T. MR1075604.
[8] D. Bosq, "Vitesses optimales et superoptimales des estimateurs fonctionnels pour les processus à temps continu," vol. 317, no. 11, pp. 1075–1078, 1993. MR1249792.
[9] J. O. Ramsay and B. W. Silverman, 2nd edition, Springer Series in Statistics, Springer, New York, NY, USA, 2005.
[10] J. O. Ramsay and B. W. Silverman, Springer Series in Statistics, Springer, New York, NY, USA, 2002.
[11] L. Horváth and P. Kokoszka, Springer Series in Statistics, Springer, New York, NY, USA, 2012. doi:10.1007/978-1-4614-3655-3. MR2920735.
[12] F. Ferraty and P. Vieu, "Nonparametric models for functional data, with application in regression, time-series prediction and curve discrimination," vol. 16, no. 1-2, pp. 111–125, 2004. The International Conference on Recent Trends and Directions in Nonparametric Statistics. doi:10.1080/10485250310001622686. MR2053065.
[13] E. Masry, "Nonparametric regression estimation for dependent functional data: asymptotic normality," vol. 115, no. 1, pp. 155–177, 2005. doi:10.1016/j.spa.2004.07.006. MR2105373.
[14] F. Ferraty, A. Mas, and P. Vieu, "Nonparametric regression on functional data: inference and practical aspects," vol. 49, no. 3, pp. 267–286, 2007. doi:10.1111/j.1467-842X.2007.00480.x. MR2396496.
[15] F. Ferraty and P. Vieu, "Kernel regression estimation for functional data," pp. 72–129, Oxford University Press, Oxford, UK, 2011. MR2908020.
[16] D. Bosq, "Parametric rates of nonparametric estimators and predictors for continuous time processes," vol. 25, no. 3, pp. 982–1000, 1997. doi:10.1214/aos/1069362734. MR1447737. Zbl0885.62041.
[17] J. V. Castellana and M. R. Leadbetter, "On smoothed probability density estimation for stationary processes," vol. 21, no. 2, pp. 179–193, 1986. doi:10.1016/0304-4149(86)90095-5. MR833950.
[18] M. Jirina, "Conditional probabilities on strictly separable σ-algebras," vol. 4(79), pp. 372–380, 1954. MR0069416.
[19] M. Jirina, "On regular conditional probabilities," vol. 9(84), pp. 445–451, 1959. MR0115202.
[20] R. Grunig, "Probabilités conditionnelles régulières sur des tribus de type non dénombrable," vol. 2, no. 3, pp. 227–229, 1966. MR0196799.
[21] D. Bosq, vol. 110, 2nd edition, Lecture Notes in Statistics, Springer, New York, NY, USA, 1998.
[22] F. Ferraty and P. Vieu, Springer Series in Statistics, Springer, New York, NY, USA, 2006. MR2229687.
[23] J. Azaïs and J.-C. Fort, "Remark on the finite-dimensional character of certain results of functional statistics," vol. 351, no. 3-4, pp. 139–141, 2013. doi:10.1016/j.crma.2013.02.004. MR3038004.
[24] F. Leblanc, "Density estimation for a class of continuous time processes," vol. 6, no. 2, pp. 171–199, 1997. MR1466626. Zbl0880.62043.
[25] M.-C. Viano, C. Deniau, and G. Oppenheim, "Continuous-time fractional ARMA processes," vol. 21, no. 4, pp. 323–336, 1994. doi:10.1016/0167-7152(94)00015-8. Zbl0809.62085.
[26] D. Blanke, "Sample paths adaptive density estimation," vol. 13, no. 2, pp. 123–152, 2004. MR2090469. Zbl1129.62075.
[27] C. Timmermans, L. Delsol, and R. von Sachs, "Using Bagidis in nonparametric functional data analysis: predicting from curves with sharp local features," vol. 115, pp. 421–444, 2013. doi:10.1016/j.jmva.2012.10.013.
[28] K. Benhenni, F. Ferraty, M. Rachdi, and P. Vieu, "Local smoothing regression with functional data," vol. 22, no. 3, pp. 353–369, 2007. doi:10.1007/s00180-007-0045-0. MR2336341.
[29] F. Ferraty, A. Laksaci, A. Tadj, and P. Vieu, "Kernel regression with functional response," vol. 5, pp. 159–171, 2011. doi:10.1214/11-EJS600. Zbl1274.62281.
[30] F. Ferraty, I. van Keilegom, and P. Vieu, "Regression when both response and predictor are functions," vol. 109, pp. 10–28, 2012. doi:10.1016/j.jmva.2012.02.008. Zbl1241.62054.