Exploiting an expansion for analytic functions of operators, the asymptotic distribution of an estimator of the functional regression parameter is obtained in a rather simple way; the result is applied to testing linear hypotheses. The expansion is also used to obtain a quick proof for the asymptotic optimality of a functional classification rule, given Gaussian populations.
1. Introduction
Certain functions of the covariance operator (such as the square root of a regularized inverse) are important components of many statistics employed for functional data analysis. If Σ is a covariance operator on a Hilbert space, Σ̂ a sample analogue of this operator, and φ a function on the complex plane which is analytic on a domain containing a contour around the spectrum of Σ, a tool of generic importance is the comparison of φ(Σ̂) and φ(Σ) by means of a Taylor expansion:

φ(Σ̂) = φ(Σ) + φ̇Σ(Σ̂ − Σ) + remainder. (1.1)
(It should be noted that φ̇Σ is not in general equal to φ′(Σ), where φ′ is the numerical derivative of φ; see also Section 3.) In this paper, two further applications of the approximation in (1.1) will be given, both related to functional regression.
The first application (Section 4) concerns the functional regression estimator itself. Hall and Horowitz [1] have shown that the IMSE of their estimator, based on a Tikhonov type regularized inverse, is rate optimal. In this paper, as a complementary result, the general asymptotic distribution is obtained, with potential application to testing linear hypotheses of arbitrary finite dimension, mentioned in Cardot et al. [2] as an open problem; these authors concentrate on testing a simple null hypothesis. Cardot et al. [3] establish convergence in probability and almost sure convergence of their estimator, which is based on spectral cutoff regularization of the inverse of the sample covariance operator. In the present paper, the covariance structure of the Gaussian limit will be completely specified. The proof turns out to be routine thanks to a “delta method” for φ(Σ̂) − φ(Σ), which is almost immediate from (1.1).
The second application (Section 5) concerns functional classification, according to a slight modification of a method by Hastie et al. [4], exploiting penalized functional regression. It will be shown that this method is asymptotically optimal (Bayes) when the two populations are represented by equivalent Gaussian distributions with the same covariance operator. The simple proof is based on an upper bound for the norm of φ(Σ̂)-φ(Σ), which follows at once from (1.1).
Let us conclude this section with some comments and further references. The expansion in (1.1) can be found in Gilliam et al. [5], and the ensuing delta method is derived and applied to regularized canonical correlation in Cupidon et al. [6]. For functional canonical correlation see also Eubank and Hsing [7], He et al. [8], and Leurgans et al. [9]. When the perturbation (Σ̂ − Σ in the present case) commutes with Σ, the expansion (1.1) can already be found in Dunford & Schwartz [10, Chapter VII], and the derivative does indeed reduce to the numerical derivative. This condition is fulfilled only in very special cases, for instance, when the random function, whose covariance operator is Σ, is a second order stationary process on the unit interval. In this situation, the eigenfunctions are known and only the eigenvalues are to be estimated. This special case, which will not be considered here, is discussed in Johannes [11], who in particular deals with regression function estimators and their IMSE in Sobolev norms, when the regressor is such a stationary process. General information about functional data analysis can be found in the monographs by Ramsay and Silverman [12] and Ferraty and Vieu [13]. Functional time series are considered in Bosq [14]; see also Mas [15].
2. Preliminaries and Introduction of the Models
2.1. Preliminaries
As will be seen in the examples below, it is expedient to consider functional data as elements in an abstract Hilbert space ℍ of infinite dimension, separable, and over the real numbers. Inner product and norm in ℍ will be denoted by 〈·,·〉 and ‖·‖, respectively. Let (Ω,ℱ,ℙ) be a probability space, X: Ω → ℍ a Hilbert space valued random variable (i.e., measurable with respect to the σ-field of Borel sets ℬℍ in ℍ), and η: Ω → ℝ a real valued random variable. For all that follows it will be sufficient to assume that

E‖X‖⁴ < ∞, Eη² < ∞. (2.1)
The mean and covariance operator of X will be denoted by

EX = μX, E(X − μX) ⊗ (X − μX) = ΣX,X, (2.2)
respectively, where a⊗b is the tensor product in ℍ. The Riesz representation theorem guarantees that these quantities are uniquely determined by the relations

E⟨a, X⟩ = ⟨a, μX⟩, ∀a ∈ ℍ,
E⟨a, X − μX⟩⟨X − μX, b⟩ = ⟨a, ΣX,X b⟩, ∀a, b ∈ ℍ; (2.3)
see Laha & Rohatgi [16]. Throughout ΣX,X is assumed to be one-to-one.
Let ℒ denote the Banach space of all bounded linear operators T: ℍ → ℍ, equipped with the norm ‖·‖ℒ. An operator U ∈ ℒ is called Hilbert-Schmidt if

∑_{k=1}^∞ ‖Uek‖² < ∞, (2.4)
for any orthonormal basis e1, e2, … of ℍ. (The number in (2.4) is in fact independent of the choice of basis.) The subspace ℒHS ⊂ ℒ of all Hilbert-Schmidt operators is a Hilbert space in its own right with the inner product

⟨U, V⟩HS = ∑_{k=1}^∞ ⟨Uek, Vek⟩, (2.5)
again independent of the choice of basis. This inner product yields the norm

‖U‖HS² = ∑_{k=1}^∞ ‖Uek‖², (2.6)
which is the number in (2.4). The tensor product for elements a,b∈ℍ will be denoted by a⊗b, and that for elements U,V∈ℒHS by U⊗HSV.
The two problems to be considered in this paper both deal with cases where the best predictor of η in terms of X is linear:

E(η | X) = α + ⟨X, f⟩, α ∈ ℝ, f ∈ ℍ. (2.7)
Just as in the univariate case (Rao [17, Section 4g]), we have the relation

ΣX,X f = E(η − μη)(X − μX) = ΣX,η. (2.8)
It should be noted that if ΣX,X is one-to-one and ΣX,η lies in its range, we can solve (2.8) and obtain

f = ΣX,X⁻¹ ΣX,η. (2.9)
Since the underlying distribution is arbitrary, the empirical distribution, given a sample (X1, η1), …, (Xn, ηn) of independent copies of (X, η), can be substituted for it. The minimization property becomes the least squares property, and the same formulas are obtained with μX, ΣX,X, μη, and ΣX,η replaced with their estimators

μ̂X = (1/n) ∑_{i=1}^n Xi = X̄, (2.10)
Σ̂X,X = (1/n) ∑_{i=1}^n (Xi − X̄) ⊗ (Xi − X̄), (2.11)
μ̂η = (1/n) ∑_{i=1}^n ηi = η̄, (2.12)
Σ̂X,η = (1/n) ∑_{i=1}^n (ηi − η̄)(Xi − X̄). (2.13)
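For concreteness, the following minimal sketch (an illustration under assumptions not made in the paper: ℍ = L²(0,1) discretized on a uniform grid, simulated Brownian-motion-like curves, a hypothetical slope function) computes the empirical analogues in (2.10)–(2.13); on the grid, the covariance operator becomes a kernel matrix and a ⊗ b an outer product.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 101, 200                       # grid size, sample size
t = np.linspace(0.0, 1.0, m)

# Simulated curves X_i and responses eta_i (all modelling choices hypothetical).
X = np.cumsum(rng.normal(scale=1 / np.sqrt(m), size=(n, m)), axis=1)
f = np.sin(2 * np.pi * t)             # a stand-in slope function
eta = X @ f / m + rng.normal(scale=0.1, size=n)     # <X_i, f> by quadrature, plus noise

# Empirical analogues of (2.10)-(2.13).
mu_X = X.mean(axis=0)                                       # muhat_X = Xbar
Xc = X - mu_X
Sigma_XX = Xc.T @ Xc / n                                    # kernel of Sigmahat_{X,X}
mu_eta = eta.mean()                                         # muhat_eta = etabar
Sigma_Xeta = ((eta - mu_eta)[:, None] * Xc).mean(axis=0)    # Sigmahat_{X,eta}
```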
Let us next specify the two problems.
2.2. Functional Regression Estimation
The model here is

η = α + ⟨X, f⟩ + ε, (2.14)
where ε is a real valued error variable and the following assumption is satisfied.
Assumption 2.1.
The error variable has a finite second moment, and
ε ⫫ X, Eε = 0, Var ε = v² < ∞. (2.15)
Example 2.2.
The functional regression model in Hall and Horowitz [1] is essentially obtained by choosing ℍ = L²(0,1), so that the generic observation is given by
η = α + ∫_0^1 X(t)f(t) dt + ε. (2.16)
Example 2.3.
Mas and Pumo [18] argue that in the situation of Example 2.2, the derivative X′ of X may contain important information and should therefore be included. Hence, these authors suggest choosing for ℍ the Sobolev space W^{2,1}(0,1), in which case the generic observation satisfies
η = α + ∫_0^1 X(t)f(t) dt + ∫_0^1 X′(t)f′(t) dt + ε. (2.17)
Example 2.4.
Just as in the univariate case, we have that the model
η = α + ⟨X, f⟩² + ε, (2.18)
quadratic in the inner product of ℍ, is in fact linear in the inner product of ℒHS, because
⟨X, f⟩² = ⟨X ⊗ X, f ⊗ f⟩HS. (2.19)
We will not pursue this example here.
In the infinite dimensional case Σ̂X,X cannot be one-to-one, and in order to estimate f from the sample version of (2.9), a regularized inverse of Tikhonov type will be used, as in Hall & Horowitz [1]. Thus, we arrive at the estimator (see also (2.11) and (2.13))

f̂δ = (δI + Σ̂X,X)⁻¹ ((1/n) ∑_{i=1}^n ηi(Xi − X̄)) = (δI + Σ̂X,X)⁻¹ Σ̂X,η, for some δ > 0. (2.20)
Let us also introduce

fδ = (δI + Σ)⁻¹ Σ f, f ∈ ℍ. (2.21)
In Section 4, the asymptotic distribution of this estimator will be obtained, and the result will be applied to testing.
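As a hedged illustration of (2.20), in the same discretized setting as the sketch above (the function name and the grid quadrature are assumptions of this illustration, not part of the paper), the Tikhonov-regularized estimator reduces to a ridge-type linear solve:

```python
import numpy as np

def tikhonov_regression(X, eta, delta):
    """Sketch of (2.20): fhat_delta = (delta*I + Sigmahat_{X,X})^{-1} Sigmahat_{X,eta}.

    X     : (n, m) array of curves on a uniform grid on [0, 1]
    eta   : (n,) array of responses
    delta : regularization parameter > 0
    """
    n, m = X.shape
    Xc = X - X.mean(axis=0)
    # Covariance *operator* on the grid: kernel matrix times quadrature weight 1/m.
    Sigma = Xc.T @ Xc / n / m
    Sigma_Xeta = ((eta - eta.mean())[:, None] * Xc).mean(axis=0)
    return np.linalg.solve(delta * np.eye(m) + Sigma, Sigma_Xeta)
```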
2.3. Functional Classification
The method discussed here is essentially the one in Hastie et al. [4] and Hastie et al. [19, Sections 4.2 and 4.3]. Let P1 and P2 be two probability distributions on (ℍ, ℬℍ) with means μ1 and μ2 and common covariance operator Σ. Consider a random element (I, X): Ω → {1,2} × ℍ with distribution determined by

P{X ∈ B | I = j} = Pj(B), B ∈ ℬℍ, P{I = j} = πj ≥ 0, π1 + π2 = 1. (2.22)
In this case, the distribution of X is π1P1 + π2P2, with mean

μX = π1μ1 + π2μ2, (2.23)
and covariance operator

ΣX,X = Σ + π1π2(μ1 − μ2) ⊗ (μ1 − μ2). (2.24)
Hastie et al. [19] now introduce the indicator response variables ηj = 1{j}(I), j = 1, 2, and assume that each ηj satisfies (2.7) for some αj ∈ ℝ and fj ∈ ℍ. Note that

μηj = Eηj = πj, (2.25)
ΣX,ηj = Eηj(X − μX) = {π1π2(μ1 − μ2), j = 1; π1π2(μ2 − μ1), j = 2}. (2.26)
Since ηj is Bernoulli, we have, of course, E(ηj | X) = P{I = j | X}. Precisely as for matrices (Rao & Toutenburg [20, Theorem A.18]), the inverse of the operator in (2.24) equals

ΣX,X⁻¹ = Σ⁻¹ − γ · Σ⁻¹((μ1 − μ2) ⊗ (μ1 − μ2))Σ⁻¹, (2.27)
where γ = π1π2/(1 + π1π2⟨μ1 − μ2, Σ⁻¹(μ1 − μ2)⟩), provided that the following assumption is satisfied.
Assumption 2.5.
The vector μ1 − μ2 lies in the range of Σ, that is,

Σ⁻¹(μ1 − μ2) is well defined. (2.28)
It will also be assumed that

π1 = π2 = 1/2. (2.29)
Assuming (2.28), (2.8) can be solved and yields after some algebra

fj = ΣX,X⁻¹ ΣX,ηj = {γΣ⁻¹(μ1 − μ2), j = 1; γΣ⁻¹(μ2 − μ1), j = 2}. (2.30)
If only X, and not I, is observed, the rule in Hastie et al. [19] assigns X to P1 if and only if

E(η1 | X) > E(η2 | X) (2.31)
⟺ ⟨X − μX, Σ⁻¹(μ1 − μ2)⟩ > (π2 − π1)/(2γ). (2.32)
Because of assumption (2.29), the rule reduces to

⟨X − (1/2)(μ1 + μ2), Σ⁻¹(μ1 − μ2)⟩ > 0. (2.33)
Hastie et al. [19] claim that in the finite dimensional case their rule reduces to Fisher’s linear discriminant rule, and to the usual rule when the distributions are normal. This remains in fact true in the present infinite dimensional case. Let us assume that

Pj = 𝒢(μj, Σ), j = 1, 2, (2.34)
where 𝒢(μ, Σ) denotes a Gaussian distribution with mean μ and covariance operator Σ. It is well known [21–23] that under Assumption 2.5 these Gaussian distributions are equivalent. This is important since there is no “Lebesgue measure” on ℍ [24]. However, now the density of P1 with respect to P2 can be considered; it is well known that

dP1/dP2(x) = e^{⟨x − (1/2)(μ1 + μ2), Σ⁻¹(μ1 − μ2)⟩}, x ∈ ℍ. (2.35)
This leads at once to (2.33) as an optimal (Bayes) rule, equal in appearance to the one for the finite dimensional case.
In most practical situations, μ1, μ2, and Σ are not known, but a training sample (I1, X1), …, (In, Xn) of independent copies of (I, X) is given. Let

X̄j = (1/nj) ∑_{i∈Jj} Xi, Jj = {i: Ii = j}, #Jj = nj, (2.36)
Σ̂ = (1/n) ∑_{j=1}^2 ∑_{i∈Jj} (Xi − X̄j) ⊗ (Xi − X̄j), (2.37)
and we have (cf. (2.24))

Σ̂X,X = Σ̂ + (n1n2/n²)(X̄1 − X̄2) ⊗ (X̄1 − X̄2). (2.38)
Once again the operator Σ̂ (and Σ̂X,X for that matter) cannot be one-to-one. In order to obtain an empirical analogue of the rule (2.32), Hastie et al. [4] employ penalized regression, and Hastie et al. [19] also suggest using a regularized inverse. (The methods are related.) Here the latter method will be used, and X will be assigned to P1 if and only if

⟨X − (1/2)(X̄1 + X̄2), (δI + Σ̂)⁻¹(X̄1 − X̄2)⟩ > 0. (2.39)
Section 5 is devoted to showing that this rule is asymptotically optimal when Assumption 2.5 is fulfilled.
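A minimal sketch of the empirical rule (2.39) on gridded curves (the helper below and its discretization are illustrative assumptions of this presentation; Σ̂ is the pooled within-group covariance from (2.37)):

```python
import numpy as np

def classify(x, X1, X2, delta):
    """Assign the curve x to population 1 iff the score in (2.39) is positive.

    X1, X2 : (n_j, m) arrays of training curves from the two populations
    """
    m = X1.shape[1]
    n = X1.shape[0] + X2.shape[0]
    xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
    resid = np.vstack([X1 - xbar1, X2 - xbar2])
    Sigma = resid.T @ resid / n / m            # Sigmahat as an operator on the grid
    direction = np.linalg.solve(delta * np.eye(m) + Sigma, xbar1 - xbar2)
    score = (x - 0.5 * (xbar1 + xbar2)) @ direction / m   # inner product in L2(0,1)
    return 1 if score > 0 else 2
```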
3. A Review of Some Relevant Operator Theory
It is well known [16] that the covariance operator Σ is nonnegative, Hermitian, of finite trace, and hence Hilbert-Schmidt and therefore compact. The assumption that Σ is one-to-one is equivalent to assuming that Σ is strictly positive. Consequently, Σ has eigenvalues σ1² > σ2² > ⋯ ↓ 0, all of finite multiplicity. If we let P1, P2, … be the corresponding eigenprojections, so that ΣPk = σk²Pk, we have the spectral representation

Σ = ∑_{k=1}^∞ σk² Pk, with ∑_{k=1}^∞ Pk = I. (3.1)
The spectrum of Σ equals σ(Σ) = {0, σ1², σ2², …} ⊂ [0, σ1²]. Let us introduce a rectangular contour Γ around the spectrum as in Figure 1, where δ > 0 is the regularization parameter in (2.20), and let Ω be the open region enclosed by Γ. Furthermore, let D ⊃ (Ω ∪ Γ) = Ω̄ be an open neighborhood of Ω̄ and suppose that

φ: D → ℂ is analytic. (3.2)
We are interested in approximations of φ(Σ̃) = φ(Σ + Π), where Π ∈ ℒ is a perturbation. The application we have in mind arises for Π = Π̂ = Σ̂ − Σ and yields an approximation of φ(Σ̂); see also Watson [25] for the matrix case. Therefore, we will not in general assume that Π and Σ commute. In the special case where X is stationary, as considered in Johannes [11], there exists a simpler estimator Σ̃ of Σ, such that Σ and Π̃ = Σ̃ − Σ do commute, which results in a simpler theory; see also Remark 4.8.
The resolvent of Σ,

R(z) = (zI − Σ)⁻¹, z ∈ ρ(Σ) = [σ(Σ)]^c, (3.3)
is analytic on the resolvent set ρ(Σ), and the operator

φ(Σ) = (1/2πi) ∮_Γ φ(z)R(z) dz (3.4)
is well defined. For the present operator Σ, as given in (3.1), the resolvent equals more explicitly

R(z) = ∑_{k=1}^∞ (z − σk²)⁻¹ Pk. (3.5)
Substitution of (3.5) in (3.4) and application of the Cauchy integral formula yields

φ(Σ) = ∑_{k=1}^∞ φ(σk²) Pk. (3.6)
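In finite dimensions, (3.6) is the ordinary functional calculus for symmetric matrices. The following sketch (purely illustrative; the matrix Σ is a random stand-in) checks that applying φ1(z) = 1/(δ + z) spectrally reproduces the directly computed regularized inverse (δI + Σ)⁻¹:

```python
import numpy as np

rng = np.random.default_rng(1)
m, delta = 6, 0.1
A = rng.normal(size=(m, m))
Sigma = A @ A.T / m                    # symmetric nonnegative stand-in for a covariance

# Spectral functional calculus as in (3.6): phi(Sigma) = sum_k phi(sigma_k^2) P_k.
evals, evecs = np.linalg.eigh(Sigma)
phi = lambda z: 1.0 / (delta + z)      # phi_1 from (3.7) below
phi_Sigma = (evecs * phi(evals)) @ evecs.T

# Agrees with the direct regularized inverse (delta*I + Sigma)^{-1}.
print(np.allclose(phi_Sigma, np.linalg.inv(delta * np.eye(m) + Sigma)))
```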
Example 3.1.
The two functions
φ1(z) = 1/(δ + z), φ2(z) = z/(δ + z), z ∈ ℂ∖{−δ}, (3.7)
are analytic on their common domain, which satisfies the above conditions. With the help of these functions we may write (cf. (2.20) and (2.21))
f̂δ − fδ = φ1(Σ̂)((1/n) ∑_{i=1}^n εi(Xi − X̄)) + (φ2(Σ̂) − φ2(Σ))f, f ∈ ℍ, (3.8)
and (cf. (2.39))
(δI + Σ̂)⁻¹(X̄1 − X̄2) = φ1(Σ̂)(X̄1 − X̄2). (3.9)
Regarding the following brief summary and slight extension of some of the results in [5], we also refer to Dunford & Schwartz [10], Kato [26], and Watson [25]. Henceforth, we will assume that

‖Π‖ℒ ≤ δ/4. (3.10)
For such perturbations, we have σ(Σ̃) = σ(Σ + Π) ⊂ Ω, so that the resolvent set of Σ̃ satisfies

ρ(Σ̃) = ρ(Σ + Π) ⊃ Ω^c ⊃ Γ. (3.11)
It should also be noted that

‖R(z)‖ℒ = sup_{k∈ℕ} 1/|z − σk²| ≤ 2/δ, ∀z ∈ Ω^c. (3.12)
The basic expansion (similar to Watson [25])

R̃(z) = (zI − Σ̃)⁻¹ = R(z) + ∑_{k=1}^∞ R(z)(ΠR(z))^k, z ∈ Ω^c, (3.13)
can be written as

R̃(z) = R(z) + R(z)ΠR(z)(I − ΠR(z))⁻¹, (3.14)
useful for analyzing the error probability for δ → 0, as n → ∞, and also as

R̃(z) = R(z) + R(z)ΠR(z) + R(z)(ΠR(z))²(I − ΠR(z))⁻¹, (3.15)
useful for analyzing the convergence in distribution of the estimators.
Let us decompose the contour Γ into two parts Γ0 = {−(1/2)δ + iy: −1 ≤ y ≤ 1} and Γ1 = Γ∖Γ0, write Mφ = max_{z∈Γ}|φ(z)|, and observe that (3.10) and (3.12) entail that

‖(I − ΠR(z))⁻¹‖ℒ ≤ 2, z ∈ Ω^c. (3.16)
We now have

‖(1/2πi) ∮_Γ φ(z)R(z)(ΠR(z))^k(I − ΠR(z))⁻¹ dz‖ℒ
≤ (1/π) Mφ ‖Π‖ℒ^k ∮_Γ ‖R(z)‖ℒ^{k+1} |dz|
≤ (1/π) Mφ ‖Π‖ℒ^k {∫_{−1}^{+1} ((1/4)δ² + y²)^{−(1/2)(k+1)} dy + |∫_{Γ1} 1 dz|}
≤ (2/π) Mφ ‖Π‖ℒ^k {(4/δ)^k + 5 + 2‖Σ‖ℒ}, k ∈ ℕ. (3.17)
Multiplying both sides of (3.14) and (3.15) by φ(z), taking (1/2πi)∮_Γ ⋯ dz, and using 0 < C < ∞ as a generic constant that does not depend on Π or δ, we obtain the following from (3.16) and (3.17).
Theorem 3.2.
Provided that (3.10) is fulfilled, one has
‖φ(Σ + Π) − φ(Σ)‖ℒ ≤ C Mφ ‖Π‖ℒ/δ, (3.18)
‖φ(Σ + Π) − φ(Σ) − φ̇ΣΠ‖ℒ ≤ C Mφ (‖Π‖ℒ/δ)², (3.19)
where φ̇Σ:ℒ→ℒ is a bounded operator, given by
φ̇ΣΠ = ∑_k φ′(σk²) PkΠPk + ∑∑_{j≠k} ((φ(σk²) − φ(σj²))/(σk² − σj²)) PjΠPk. (3.20)
Remark 3.3.
If Σ and Π commute, so will Pk and Π, and R(z) and Π, and the expressions simplify considerably. In particular, (3.20) reduces to
φ̇ΣΠ = (∑_k φ′(σk²) Pk)Π, (3.21)
that is, φ̇Σ=φ′(Σ), where φ′ is the numerical derivative of φ, and φ′(Σ) should be understood in the sense of “functional calculus’’ as in (3.6). For the commuting case, see Dunford & Schwartz [10].
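To make (3.20) concrete, here is an illustrative finite-dimensional sketch (assuming simple eigenvalues, so that Pk = pk ⊗ pk) that builds φ̇ΣΠ from the divided-difference coefficients and compares it with a numerical directional derivative of t ↦ φ(Σ + tΠ) at t = 0:

```python
import numpy as np

rng = np.random.default_rng(2)
m, delta = 5, 0.5
A = rng.normal(size=(m, m)); Sigma = A @ A.T / m
B = rng.normal(size=(m, m)); Pi = (B + B.T) / 20        # symmetric perturbation

phi  = lambda z: z / (delta + z)                        # phi_2 from (3.7)
dphi = lambda z: delta / (delta + z) ** 2               # its numerical derivative

lam, V = np.linalg.eigh(Sigma)                          # sigma_k^2 and eigenvectors p_k
# Divided-difference coefficients from (3.20).
W = np.empty((m, m))
for j in range(m):
    for k in range(m):
        W[j, k] = dphi(lam[j]) if j == k else (phi(lam[k]) - phi(lam[j])) / (lam[k] - lam[j])
deriv = V @ (W * (V.T @ Pi @ V)) @ V.T                  # phidot_Sigma(Pi)

def phi_mat(t):                                         # t -> phi(Sigma + t*Pi), spectrally
    w, U = np.linalg.eigh(Sigma + t * Pi)
    return (U * phi(w)) @ U.T

h = 1e-6
fd = (phi_mat(h) - phi_mat(-h)) / (2 * h)               # central finite difference
print(np.max(np.abs(deriv - fd)))                       # should be small
```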
Let us now present some basic facts that are useful for the subsequent statistical applications. Dauxois et al. [27] have shown that there exists a Gaussian random element 𝒢Σ: Ω → ℒHS such that √n(Σ̂ − Σ) →d 𝒢Σ, as n → ∞, in ℒHS. Because the identity map from ℒHS into ℒ is continuous, we may state

√n(Σ̂ − Σ) →d 𝒢Σ, as n → ∞, in ℒHS ⟹ in ℒ, (3.22)
by the continuous mapping theorem. This entails that

‖Π̂‖ℒ = ‖Σ̂ − Σ‖ℒ = Op(1/√n), as n → ∞, (3.23)
so that condition (3.10) is fulfilled with arbitrarily high probability for n sufficiently large. Expansions (3.14) and (3.15) and the resulting inequalities hold true for Σ̃ replaced with Σ̂(ω) = Σ + Π̂(ω), for ω ∈ {‖Π̂‖ℒ ≤ δ/4}.
Example 3.4.
Application to asymptotic distribution theory. In this application δ > 0 will be kept fixed; see also Section 4.1. It is based on the delta method for functions of operators [6], which follows easily from (3.19). In conjunction with (3.22) this yields
√n(φ2(Σ̂) − φ2(Σ)) →d φ̇2,Σ𝒢Σ, as n → ∞, in ℒ. (3.24)
In turn this yields
√n(φ2(Σ̂) − φ2(Σ))f →d (φ̇2,Σ𝒢Σ)f, as n → ∞, in ℍ, (3.25)
for any f∈ℍ, by the continuous mapping theorem. This result will be used in Section 4.
Example 3.5.
Application to classification. Here we will let δ = δ(n) ↓ 0, as n → ∞, and write φ1,n(z) = 1/(δ(n) + z) to stress this dependence on the sample size. Since
max_{z∈Γ} |φ1,n(z)| ≤ 1/δ(n), (3.26)
it is immediate from (3.18) that
‖φ1,n(Σ̂) − φ1,n(Σ)‖ℒ = Op(1/(δ²(n)√n)), as n → ∞, (3.27)
a result that will be used in Section 5.
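A small simulation (all modelling choices hypothetical: a truncated diagonal Σ with eigenvalues k⁻², Gaussian data, δ(n) = n^{−1/5}) illustrating the rate in (3.27); the last printed column, the error multiplied by δ²(n)√n, should stay bounded:

```python
import numpy as np

rng = np.random.default_rng(3)
m = 50
sig2 = 1.0 / np.arange(1, m + 1) ** 2          # eigenvalues sigma_k^2 of a truncated Sigma
Sigma = np.diag(sig2)

for n in (100, 1000, 10000):
    delta = n ** (-1 / 5)                      # delta(n) >> n^{-1/4}, cf. (5.1) below
    X = rng.normal(size=(n, m)) * np.sqrt(sig2)          # data with covariance Sigma
    Xc = X - X.mean(axis=0)
    Sigma_hat = Xc.T @ Xc / n
    diff = (np.linalg.inv(delta * np.eye(m) + Sigma_hat)
            - np.linalg.inv(delta * np.eye(m) + Sigma))  # phi_{1,n}(Sigmahat) - phi_{1,n}(Sigma)
    err = np.linalg.norm(diff, 2)              # operator norm
    print(n, round(delta, 3), err, err * delta ** 2 * np.sqrt(n))
```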
4. Asymptotics for the Regression Estimator
4.1. The Asymptotic Distribution
The central limit theorem in Hilbert spaces entails at once

(1/√n) ∑_{i=1}^n εi(Xi − μ) →d 𝒢0, as n → ∞, in ℍ, (4.1)
where 𝒢0 is a zero mean Gaussian random variable in ℍ, and

(1/√n) ∑_{i=1}^n {(Xi − μ) ⊗ (Xi − μ) − Σ} →d 𝒢Σ, as n → ∞, in ℒHS, (4.2)
where 𝒢Σ is a zero mean random variable in ℒHS. These convergence results remain true with μ replaced by X̄ and, because ε ⫫ X by assumption (2.15), we also have that jointly

[(1/√n) ∑_{i=1}^n εi(Xi − X̄), √n(Σ̂ − Σ)] →d [𝒢0, 𝒢Σ], as n → ∞, in ℍ × ℒHS, (4.3)
where 𝒢Σ is the same as in (3.22), and

𝒢0 ⫫ 𝒢Σ. (4.4)
Because the limiting variables are generated by the sums of iid variables on the left in (4.1) and (4.2), we have

E𝒢0 ⊗ 𝒢0 = E{ε(X − μ)} ⊗ {ε(X − μ)} = (Eε²) E(X − μ) ⊗ (X − μ) = v²Σ, (4.5)
E𝒢Σ ⊗HS 𝒢Σ = E{(X − μ) ⊗ (X − μ) − Σ} ⊗HS {(X − μ) ⊗ (X − μ) − Σ}, (4.6)
for the respective covariance structures. These are important to further specify the limiting distribution of the regression estimator as will be seen in Section 4.2.
Let us write, for brevity,

f̂δ − fδ = Un + Vn, (4.7)
where, according to (2.20) and (2.21),

Un = φ1(Σ̂)((1/n) ∑_{i=1}^n εi(Xi − X̄)), Vn = (φ2(Σ̂) − φ2(Σ))f. (4.8)
Note that φ1 and φ2 depend on δ.
With statistical applications in mind, it would be interesting if there existed numbers an ↑ ∞ and δ(n) ↓ 0, as n → ∞, such that

an(f̂δ(n) − f) →d ℋ, as n → ∞, in ℍ, (4.9)
where ℋ is a nondegenerate random vector. It has been shown in Cardot et al. [28], however, that such a convergence in distribution, when we center at f, is not in general possible.
Theorem 4.1.
For fixed δ>0, one has
√n(f̂δ − fδ) →d ℋ = ℋ1 + ℋ2, as n → ∞, in ℍ, (4.10)
where ℋ1=φ1(Σ)𝒢0 and ℋ2=(φ̇2,Σ𝒢Σ)f are zero mean Gaussian random elements, and ℋ1⫫ℋ2.
Further information about the structure of the covariance operator of the random vector ℋ on the right in (4.10) will be needed in order to be able to exploit the theorem for statistical inference. This will be addressed in the next section.
4.2. Further Specification of the Limiting Distribution
It follows from (4.5) that 𝒢0 has a Karhunen-Loève expansion in terms of unit eigenvectors pj of Σ corresponding to the σj² (so that Pj = pj ⊗ pj when the eigenvalues are simple):

𝒢0 = ∑_{j=1}^∞ v σj Zj pj, (4.11)
where the real valued random variables

Zj (j ∈ ℕ) are iid N(0, 1). (4.12)
Accordingly, ℋ1 in (4.10) can be further specified as

ℋ1 = φ1(Σ)𝒢0 = v ∑_{j=1}^∞ (σj/(δ + σj²)) Zj pj. (4.13)
The Gaussian operator in (4.2) has been investigated in Dauxois et al. [27], and here we will briefly summarize some of their results in our notation. By evaluating the inner product in ℒHS in the basis p1, p2, … it follows from (4.6) that

E⟨pj ⊗ pk, 𝒢Σ⟩HS⟨𝒢Σ, pα ⊗ pβ⟩HS
= E⟨pj, ⟨X − μ, pk⟩(X − μ) − σk²pk⟩ × ⟨pα, ⟨X − μ, pβ⟩(X − μ) − σβ²pβ⟩
= E⟨X − μ, pj⟩⟨X − μ, pk⟩⟨X − μ, pα⟩⟨X − μ, pβ⟩ − δj,k δα,β σk² σβ². (4.14)
This last expression does not in general simplify further. However, if we assume that the regressor X satisfies

X =d 𝒢(μ, Σ), (4.15)
it can be easily seen that the

⟨X − μ, pj⟩ =d N(0, σj²) are independent, (4.16)
so that the expression in (4.14) equals zero if (j, k) ≠ (α, β). As in Dauxois et al. [27], we obtain in this case

E⟨pj ⊗ pk, 𝒢Σ⟩HS⟨𝒢Σ, pα ⊗ pβ⟩HS = {0, (j, k) ≠ (α, β); vj,k², (j, k) = (α, β)}, (4.17)
where

vj,k² = {2σj⁴, j = k; σj²σk², j ≠ k}. (4.18)
Consequently the pj ⊗ pk (j, k ∈ ℕ) are an orthonormal basis of eigenvectors of the covariance operator of 𝒢Σ with eigenvalues vj,k². Hence 𝒢Σ has the Karhunen-Loève expansion (in ℒHS)

𝒢Σ = ∑_{j=1}^∞ ∑_{k=1}^∞ vj,k Zj,k pj ⊗ pk, (4.19)
where the random variables

Zj,k (j, k ∈ ℕ) are iid N(0, 1). (4.20)
Let us, for brevity, write (see (3.7) for φ2)

wj,k = {(φ2(σk²) − φ2(σj²))/(σk² − σj²), j ≠ k; φ2′(σj²), j = k} = δ/((δ + σj²)(δ + σk²)), ∀j, k ∈ ℕ, (4.21)

so that

ℋ2 = (φ̇2,Σ𝒢Σ)f = ∑_{j=1}^∞ ∑_{k=1}^∞ wj,k Pj𝒢ΣPk f
= ∑_{j=1}^∞ ∑_{k=1}^∞ wj,k Pj (∑_{α=1}^∞ ∑_{β=1}^∞ vα,β Zα,β pα ⊗ pβ) Pk f
= ∑_{j=1}^∞ ∑_{k=1}^∞ wj,k vj,k Zj,k ⟨f, pk⟩ pj. (4.22)
Summarizing, we have the following result.
Theorem 4.2.
The random element ℋ1 on the right in (4.10) can be represented by (4.13). If one assumes that the regressor X =d 𝒢(μ, Σ), the random element ℋ2 on the right in (4.10) can be represented by (4.22), where the Zj,k in (4.20) are stochastically independent of the Zj in (4.12).
4.3. Asymptotics under the Null Hypothesis
Let us recall that fδ is related to f according to (2.21), so that the equivalence

H0: fδ = 0 ⟺ f = 0, (4.23)
where again δ>0 is fixed, holds true. The following is immediate from Theorem 4.1.
Theorem 4.3.
Under the null hypothesis in (4.23), one has
n‖f̂δ‖² →d ‖ℋ‖² = ‖ℋ1‖², as n → ∞, (4.24)
where
‖ℋ1‖² =d v² ∑_{j=1}^∞ (σj²/(δ + σj²)²) Zj². (4.25)
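Since the series in (4.25) depends on infinitely many unknowns (see Remark 4.7 below), any practical use requires truncation and plug-in estimates; the following hypothetical Monte Carlo sketch approximates the critical value for the test based on (4.24):

```python
import numpy as np

def null_critical_value(sig2_hat, v2_hat, delta, alpha=0.05, reps=100_000, seed=0):
    """Approximate the upper alpha-quantile of ||H_1||^2 in (4.25),
    using truncated estimated eigenvalues sig2_hat and error variance v2_hat."""
    rng = np.random.default_rng(seed)
    weights = v2_hat * sig2_hat / (delta + sig2_hat) ** 2   # coefficients of the Z_j^2
    Z2 = rng.chisquare(1, size=(reps, len(sig2_hat)))       # Z_j^2 are iid chi-square(1)
    return np.quantile(Z2 @ weights, 1 - alpha)

# Reject H_0: f = 0 at level alpha when n * ||fhat_delta||^2 exceeds this value.
```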
An immediate generalization of the hypothesis in (4.23) is

H0: fδ = fδ,0 = (φ2(Σ))f0 ⟺ f = f0, (4.26)
for some given f0 ∈ ℍ. This hypothesis is in principle of interest for confidence sets. Of course, testing (4.26) can be reduced to testing (4.23) by replacing the ηi with

ηi′ = ηi − ⟨Xi, f0⟩ = α + ⟨Xi, f − f0⟩ + εi, (4.27)
and then using the estimator

f̂δ′ = (δI + Σ̂)⁻¹((1/n) ∑_{i=1}^n ηi′(Xi − X̄)). (4.28)
Since f − f0 = 0 under the null hypothesis, the following is immediate.
Theorem 4.4.
Assuming (4.27), one has
n‖f̂δ′‖² →d ‖ℋ1‖², as n → ∞, (4.29)
where ∥ℋ1∥2 has the same distribution as in (4.25). Related results can be found in Cardot et al. [2].
Another generalization of the hypothesis in (4.23) is

H0: fδ ∈ 𝕄 = [q1, …, qM], (4.30)
where q1, …, qM are orthonormal vectors in ℍ. Let Q and Q⊥ denote the orthogonal projections onto 𝕄 and 𝕄⊥, respectively, and note that fδ ∈ 𝕄 if and only if ‖Q⊥fδ‖ = 0. A test statistic might be based on ‖Q⊥f̂δ‖², and we have the following.
Theorem 4.5.
Under H0 in (4.30), one has
n‖Q⊥f̂δ‖² →d ‖Q⊥ℋ‖², as n → ∞. (4.31)
The distribution on the right in (4.31) is rather complicated if q1, …, qM remain arbitrary. But if we are willing to assume (4.20), it follows from (4.13) and (4.22) that

‖Q⊥ℋ‖² = ‖Q⊥ℋ1‖² + ‖Q⊥ℋ2‖² + 2⟨Q⊥ℋ1, Q⊥ℋ2⟩
= v² ∑_{j=1}^∞ ∑_{k=1}^∞ (σj/(δ + σj²))(σk/(δ + σk²)) Zj Zk ⟨Q⊥pj, Q⊥pk⟩
+ ∑_{j=1}^∞ ∑_{k=1}^∞ ∑_{α=1}^∞ ∑_{β=1}^∞ wj,k wα,β vj,k vα,β Zj,k Zα,β ⟨f, pk⟩⟨f, pβ⟩⟨Q⊥pj, Q⊥pα⟩
+ 2v ∑_{j=1}^∞ (σj/(δ + σj²)) Zj ∑_{α=1}^∞ ∑_{β=1}^∞ wα,β vα,β Zα,β ⟨f, pβ⟩⟨Q⊥pj, Q⊥pα⟩. (4.32)
A simplification is possible if we are willing to modify the hypothesis in (4.30) and use a so-called neighborhood hypothesis. This notion has a rather long history and has been investigated by Hodges & Lehmann [29] for certain parametric models. Dette & Munk [30] have rekindled the interest in it by an application in nonparametric regression. In the present context we might replace (4.30) with the neighborhood hypothesis

H0,ε: ‖Q⊥fδ‖² ≤ ε², for some ε > 0. (4.33)
It is known from the literature that the advantage of using a neighborhood hypothesis is not only that such a hypothesis might be more realistic and that the asymptotics are much simpler, but also that without extra complication we might interchange null hypothesis and alternative. This means in the current situation that we might as well test the null hypothesis

H0,ε′: ‖Q⊥fδ‖² ≥ ε², for some ε > 0, (4.34)
which could be more suitable, in particular in goodness-of-fit problems.
The functional g ↦ ‖Q⊥g‖², g ∈ ℍ, has a Fréchet derivative at fδ given by the functional g ↦ 2⟨g, Q⊥fδ⟩, g ∈ ℍ. Therefore, the delta method in conjunction with Theorem 4.1 entails the following result.
Theorem 4.6.
One has
√n{‖Q⊥f̂δ‖² − ‖Q⊥fδ‖²} →d 2⟨ℋ, Q⊥fδ⟩, as n → ∞. (4.35)
The limiting distribution on the right in (4.35) is normal with mean zero and complicated variance

Δ² = 4E⟨ℋ, Q⊥fδ⟩² = 4{E⟨ℋ1, Q⊥fδ⟩² + E⟨ℋ2, Q⊥fδ⟩²}
= 4v² ∑_{j=1}^∞ (σj²/(δ + σj²)²)⟨pj, Q⊥fδ⟩² + 4E⟨∑_{j=1}^∞ ∑_{k=1}^∞ wj,k Pj (∑_{α=1}^∞ ∑_{β=1}^∞ vα,β Zα,β pα ⊗ pβ) Pk f, Q⊥fδ⟩²
= 4v² ∑_{j=1}^∞ (σj²/(δ + σj²)²)⟨pj, Q⊥fδ⟩² + 4 ∑_{j=1}^∞ ∑_{k=1}^∞ wj,k² vj,k² ⟨f, pk⟩² ⟨pj, Q⊥fδ⟩². (4.36)
Remark 4.7.
As we see from the expressions in (4.24), (4.32), and (4.36), the limiting distributions depend on infinitely many parameters that must be suitably estimated in order to be in a position to use the statistics for actual testing. Estimators for the individual parameters are not too hard to obtain. The eigenvalues σj² and eigenvectors pj of Σ, for instance, can in principle be estimated by the corresponding quantities of Σ̂. Although in any practical situation only a finite number of these parameters can be estimated, theoretically this number must increase with the sample size, and some kind of uniform consistency will be needed for a suitable approximation of the limiting distribution. This interesting question of uniform consistency seems to require quite some technicalities and will not be addressed in this paper.
Remark 4.8.
In this paper we have dealt with the situation where Σ is entirely unknown. It has been observed in Johannes [11] that if X is a stationary process on the unit interval, the eigenfunctions pj of the covariance operator are always the same known system of trigonometric functions, and only its eigenvalues σj² are unknown. Knowing the pj leads to several simplifications. In the first place, Σ can now be estimated by the expression on the right in (3.1) with only the σk² replaced with estimators. If Σ̃ is this estimator, it is clear that Σ and Π̃ = Σ̃ − Σ commute, so that the derivative φ̇2,Σ now simplifies considerably (see Remark 3.3). Secondly, we might consider the special case of H0 in (4.30), where qj = pj, j = 1, …, M. We now have
fδ ∈ 𝕄 = [p1, …, pM] ⟺ f ∈ 𝕄, (4.37)
so that even for fixed δ we can test the actual regression function. In the third place, under the null hypothesis in (4.37), the number of unknown parameters in (4.32) reduces considerably because now Q⊥pj = 0 for j = 1, …, M. When the pj are known, in addition to all the changes mentioned above, also the limiting distribution of Σ̃ differs from that of Σ̂. Considering all these modifications that are needed, it seems better not to include this important special case in this paper.
4.4. Asymptotics under Local Alternatives
Again we assume that X is Gaussian. Suppose that

f = fn = f̃ + (1/√n)g, for f̃, g ∈ ℍ. (4.38)
For such fn only minor changes in the asymptotics are required, because the conditions on the Xi and εi are still the same and do not change with n. Let us write

fδ = fn,δ = f̃δ + (1/√n)gδ, (4.39)
where f̃δ = (δI + Σ)⁻¹Σf̃ and gδ = (δI + Σ)⁻¹Σg. The following is immediate from a minor modification of Theorem 4.1.
Theorem 4.9.
For fδ=fn,δ as in (4.39), one has
√n(f̂δ − f̃δ) →d gδ + ℋ1 + ℋ̃2, (4.40)
where ℋ1=φ1(Σ)𝒢0 is the same as in (4.13), ℋ̃2 is obtained from ℋ2 in (4.22) by replacing f with f̃, and ℋ1⫫ℋ̃2.
By way of an example, let us apply this result to testing the neighborhood hypothesis and consider the asymptotics of the test statistic in (4.35) under the local alternatives fn,δ in (4.39) with

‖Q⊥f̃δ‖² = ε², gδ ⊥ 𝕄, ⟨f̃δ, gδ⟩ > 0. (4.41)
Note that under such alternatives, f̂δ(n) is still a consistent estimator of f̃ for any δ(n) ↓ 0, as n → ∞, and so is f̂δ for f̃δ. To conclude this section, let us assume that the parameters can be suitably estimated. We then arrive at the limiting distribution of a test statistic that allows the construction of an asymptotic level-α test whose asymptotic power can be computed; the following is immediate from Theorem 4.9.
Theorem 4.10.
For fδ = fn,δ, as in (4.39) and (4.41), one has
Tn = (√n/Δ̂)(‖Q⊥f̂δ‖² − ε²) →d N(2⟨gδ, Q⊥f̃δ⟩/Δ, 1), as n → ∞, (4.42)
assuming that Δ̂ is a consistent estimator of Δ in (4.36). Note that the limiting distribution is N(0, 1) under the null hypothesis (i.e., gδ = 0).
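A hypothetical plug-in sketch of the statistic in (4.42), truncating the series for Δ² in (4.36) at K estimated eigenpairs; every estimator choice below (plugging f̂δ in for fδ, an estimate of f, quadrature on a grid) is an assumption of this illustration, not a prescription of the paper:

```python
import numpy as np

def neighborhood_test_stat(fhat, Q_basis, sig2, p, f, v2, delta, n, eps):
    """T_n from (4.42), with Delta^2 from (4.36) truncated at K = len(sig2) terms.

    fhat    : (m,) values of fhat_delta on a uniform grid
    Q_basis : (M, m) orthonormal rows spanning the hypothesized subspace M
    sig2, p : (K,) eigenvalue estimates and (K, m) eigenfunction estimates of Sigma
    f, v2   : plug-in estimates of the slope function f and the error variance v^2
    """
    m = fhat.shape[0]
    h = 1.0 / m                                     # quadrature weight on the grid
    Qperp_fhat = fhat - Q_basis.T @ (Q_basis @ fhat) * h     # Qperp applied to fhat
    a = (p @ Qperp_fhat) * h                        # <p_j, Qperp f_delta>, f_delta ~ fhat
    b = (p @ f) * h                                 # <f, p_k>
    w = delta / np.outer(delta + sig2, delta + sig2)         # w_{j,k} from (4.21)
    vjk2 = np.outer(sig2, sig2)
    np.fill_diagonal(vjk2, 2 * sig2 ** 2)                    # v_{j,k}^2 from (4.18)
    Delta2 = (4 * v2 * np.sum(sig2 / (delta + sig2) ** 2 * a ** 2)
              + 4 * np.sum(w ** 2 * vjk2 * b[None, :] ** 2 * a[:, None] ** 2))
    return np.sqrt(n / Delta2) * (np.sum(Qperp_fhat ** 2) * h - eps ** 2)

# Compare the returned T_n with standard normal quantiles.
```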
5. Asymptotic Optimality of the Classification Rule
In addition to Assumption 2.5 and (2.34), it will be assumed that the regularization parameter δ = δ(n) in (2.39) satisfies

δ(n) → 0, δ(n) ≫ n^{−1/4}, as n → ∞. (5.1)
We will also assume that the sizes of the training samples n1 and n2 (see (2.36)) are deterministic and satisfy (n = n1 + n2)

0 < lim inf_{n→∞} nj/n ≤ lim sup_{n→∞} nj/n < 1. (5.2)
Let us recall from (3.7) that φ1(z) = φ1,n(z) = 1/(δ(n) + z), z ≠ −δ(n).
Under these conditions the probability of misclassification equals P{⟨X − (1/2)(X̄1 + X̄2), φ1,n(Σ̂)(X̄1 − X̄2)⟩ > 0 | X =d 𝒢(μ2, Σ)}. Let us note that

|⟨X − (1/2)(X̄1 + X̄2), φ1,n(Σ̂)(X̄1 − X̄2)⟩ − ⟨X − (1/2)(μ1 + μ2), φ1,n(Σ)(μ1 − μ2)⟩|
≤ (1/2){‖X̄1 − μ1‖ + ‖X̄2 − μ2‖} ‖φ1,n(Σ)‖ℒ ‖μ1 − μ2‖
+ ‖X − (1/2)(X̄1 + X̄2)‖ × [‖φ1,n(Σ̂) − φ1,n(Σ)‖ℒ ‖X̄1 − X̄2‖ + {‖X̄1 − μ1‖ + ‖X̄2 − μ2‖} ‖φ1,n(Σ)‖ℒ]. (5.3)
Since ‖X̄j − μj‖ = Op(n^{−1/2}), ‖φ1,n(Σ)‖ℒ = O(δ⁻¹(n)), and, according to (3.27),

‖φ1,n(Σ̂) − φ1,n(Σ)‖ℒ = Op(1/(δ²(n)√n)), (5.4)
it follows from (5.1) that the limit of the misclassification probability equals

lim_{n→∞} P{⟨X − μ2 − (1/2)(μ1 − μ2), φ1,n(Σ)(μ1 − μ2)⟩ > 0} = 1 − Φ((1/2)√⟨μ1 − μ2, Σ⁻¹(μ1 − μ2)⟩), (5.5)
where Φ is the standard normal cdf.
For (5.5) we have used the well-known property of regularized inverses that ‖(δI + Σ)⁻¹Σf − f‖ → 0, as δ → 0, for all f ∈ ℍ, and the fact that we may choose f = Σ⁻¹(μ1 − μ2) by Assumption 2.5. Since rule (2.33) is optimal when the parameters are known, we have obtained the following result.
Theorem 5.1.
Let Pj =d 𝒢(μj, Σ), j = 1, 2, and let Assumption 2.5 and (5.1) be satisfied. Then the classification rule (2.39) is asymptotically optimal.
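To close, a small simulation (illustrative assumptions throughout: a truncated diagonal Σ, μ1 − μ2 chosen in the range of Σ so that Assumption 2.5 holds, δ(n) = n^{−1/5} in accordance with (5.1)) comparing the empirical misclassification rate of rule (2.39) with the limit from (5.5):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(4)
m, n1, n2, n_test = 30, 500, 500, 4000
sig2 = 1.0 / np.arange(1, m + 1) ** 2     # eigenvalues of Sigma (diagonal here)
dmu = 0.4 * sig2                          # mu1 - mu2 lies in the range of Sigma
delta = (n1 + n2) ** (-1 / 5)             # delta(n) >> n^{-1/4}, cf. (5.1)

def sample(mu, size):                     # Gaussian curves in the eigenbasis of Sigma
    return mu + rng.normal(size=(size, m)) * np.sqrt(sig2)

X1, X2 = sample(dmu, n1), sample(np.zeros(m), n2)        # mu1 = dmu, mu2 = 0
xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
resid = np.vstack([X1 - xbar1, X2 - xbar2])
Sigma_hat = resid.T @ resid / (n1 + n2)
direction = np.linalg.solve(delta * np.eye(m) + Sigma_hat, xbar1 - xbar2)

# Misclassification rate on test data from population 2, versus the limit (5.5).
scores = (sample(np.zeros(m), n_test) - 0.5 * (xbar1 + xbar2)) @ direction
d2 = np.sum(dmu ** 2 / sig2)              # <mu1 - mu2, Sigma^{-1}(mu1 - mu2)>
bayes = 0.5 * (1 - erf(0.5 * sqrt(d2) / sqrt(2)))        # 1 - Phi(sqrt(d2)/2)
print(np.mean(scores > 0), bayes)
```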
References

[1] Hall, P. and Horowitz, J. L., “Methodology and convergence rates for functional linear regression,” The Annals of Statistics, vol. 35, no. 1, pp. 70–91, 2007.
[2] Cardot, H., Ferraty, F., Mas, A., and Sarda, P., “Testing hypotheses in the functional linear model,” Scandinavian Journal of Statistics, vol. 30, no. 1, pp. 241–255, 2003.
[3] Cardot, H., Ferraty, F., and Sarda, P., “Functional linear model,” Statistics & Probability Letters, vol. 45, no. 1, pp. 11–22, 1999.
[4] Hastie, T., Buja, A., and Tibshirani, R., “Penalized discriminant analysis,” The Annals of Statistics, vol. 23, no. 1, pp. 73–102, 1995.
[5] Gilliam, D. S., Hohage, T., Ji, X., and Ruymgaart, F., “The Fréchet derivative of an analytic function of a bounded operator with some applications,” International Journal of Mathematics and Mathematical Sciences, vol. 2009, Article ID 239025, 17 pages, 2009.
[6] Cupidon, J., Gilliam, D. S., Eubank, R., and Ruymgaart, F., “The delta method for analytic functions of random operators with application to functional data,” Bernoulli, vol. 13, no. 4, pp. 1179–1194, 2007.
[7] Eubank, R. L. and Hsing, T., “Canonical correlation for stochastic processes,” Stochastic Processes and their Applications, vol. 118, no. 9, pp. 1634–1661, 2008.
[8] He, G., Müller, H.-G., and Wang, J.-L., “Functional canonical analysis for square integrable stochastic processes,” Journal of Multivariate Analysis, vol. 85, no. 1, pp. 54–77, 2003.
[9] Leurgans, S. E., Moyeed, R. A., and Silverman, B. W., “Canonical correlation analysis when the data are curves,” Journal of the Royal Statistical Society, Series B, vol. 55, no. 3, pp. 725–740, 1993.
[10] Dunford, N. and Schwartz, J. T., Linear Operators, Part I, Interscience, New York, NY, USA, 1957.
[11] Johannes, J., privileged communication, 2008.
[12] Ramsay, J. O. and Silverman, B. W., Functional Data Analysis, Springer Series in Statistics, Springer, New York, NY, USA, 2nd edition, 2005.
[13] Ferraty, F. and Vieu, P., Nonparametric Functional Data Analysis, Springer Series in Statistics, Springer, New York, NY, USA, 2006.
[14] Bosq, D., Linear Processes in Function Spaces, vol. 149 of Lecture Notes in Statistics, Springer-Verlag, New York, NY, USA, 2000.
[15] Mas, A., doctoral dissertation, Université Paris VI, 2000.
[16] Laha, R. G. and Rohatgi, V. K., Probability Theory, Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, New York, NY, USA, 1979.
[17] Rao, C. R., Linear Statistical Inference and Its Applications, John Wiley & Sons, New York, NY, USA, 1965.
[18] Mas, A. and Pumo, B., “Functional linear regression with derivatives,” technical report, Institut de Modélisation Mathématique de Montpellier, 2006.
[19] Hastie, T., Tibshirani, R., and Friedman, J., The Elements of Statistical Learning, Springer Series in Statistics, Springer-Verlag, New York, NY, USA, 2001.
[20] Rao, C. R. and Toutenburg, H., Linear Models: Least Squares and Alternatives, Springer Series in Statistics, Springer, New York, NY, USA, 1995.
[21] Feldman, J., “Equivalence and perpendicularity of Gaussian processes,” Pacific Journal of Mathematics, vol. 8, pp. 699–708, 1958.
[22] Hájek, J., “On a property of normal distribution of any stochastic process,” Czechoslovak Mathematical Journal, vol. 8, no. 83, pp. 610–617, 1958.
[23] Grenander, U., Abstract Inference, Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, New York, NY, USA, 1981.
[24] Skorohod, A. V., Integration in Hilbert Space, Springer, New York, NY, USA, 1974.
[25] Watson, G. S., Statistics on Spheres, John Wiley & Sons, New York, NY, USA, 1983.
[26] Kato, T., Perturbation Theory for Linear Operators, vol. 132 of Die Grundlehren der mathematischen Wissenschaften, Springer-Verlag, New York, NY, USA, 1966.
[27] Dauxois, J., Pousse, A., and Romain, Y., “Asymptotic theory for the principal component analysis of a vector random function: some applications to statistical inference,” Journal of Multivariate Analysis, vol. 12, no. 1, pp. 136–154, 1982.
[28] Cardot, H., Mas, A., and Sarda, P., “CLT in functional linear regression models,” Probability Theory and Related Fields, vol. 138, no. 3-4, pp. 325–361, 2007.
[29] Hodges, J. L., Jr. and Lehmann, E. L., “Testing the approximate validity of statistical hypotheses,” Journal of the Royal Statistical Society, Series B, vol. 16, pp. 261–268, 1954.
[30] Dette, H. and Munk, A., “Validation of linear regression models,” The Annals of Statistics, vol. 26, no. 2, pp. 778–800, 1998.