Abstract and Applied Analysis, Volume 2013, Article ID 139318, DOI 10.1155/2013/139318
Hindawi Publishing Corporation

Research Article

Regularized Least Square Regression with Unbounded and Dependent Sampling

Xiaorong Chu and Hongwei Sun
School of Mathematical Science, University of Jinan, Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan 250022, China

Academic Editor: Changbum Chun

Received 29 October 2012; Accepted 22 March 2013

Copyright © 2013 Xiaorong Chu and Hongwei Sun. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This paper focuses on the least square regression problem for α-mixing and ϕ-mixing processes. The standard boundedness assumption on the output data is abandoned: the learning algorithm is implemented with samples drawn from a dependent sampling process under a more general condition on the output data. Capacity-independent error bounds and learning rates are deduced by means of the integral operator technique.

1. Introduction and Main Results

The aim of this paper is to study the least square regularized regression learning algorithm. The main novelty here is the unboundedness and dependence of the sampling process. Let $X$ be a compact metric space (usually a subset of $\mathbb{R}^n$) and $Y=\mathbb{R}$. Suppose that $\rho$ is a probability distribution on $Z=X\times Y$. In regression learning, one wants to learn or approximate the regression function $f_\rho\colon X\to Y$ given by

(1) $f_\rho(x)=\mathbb{E}(y|x)=\int_Y y\,d\rho(y|x)$, $\quad x\in X$,

where $\rho(y|x)$ is the conditional distribution of $y$ given $x$. Since $\rho$ is unknown, $f_\rho$ cannot be computed directly. Instead, we learn a good approximation of $f_\rho$ from a set of observations $\mathbf{z}=\{(x_i,y_i)\}_{i=1}^m\in Z^m$ drawn according to $\rho$.

The learning algorithm studied here is based on a Mercer kernel $K\colon X\times X\to\mathbb{R}$, that is, a continuous, symmetric, and positive semidefinite function. The RKHS $\mathcal{H}_K$ associated with the Mercer kernel $K$ is the completion of $\mathrm{span}\{K_x=K(\cdot,x)\colon x\in X\}$ with respect to the inner product satisfying $\langle K_x,K_t\rangle_K=K(x,t)$. The learning algorithm is a regularization scheme in $\mathcal{H}_K$ given by

(2) $f_{\mathbf{z},\lambda}=\arg\min_{f\in\mathcal{H}_K}\bigl\{\frac{1}{m}\sum_{i=1}^m(f(x_i)-y_i)^2+\lambda\|f\|_K^2\bigr\}$,

where $\lambda>0$ is a regularization parameter.
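Scheme (2) is kernel ridge regression. By the representer theorem, its minimizer has the form $f_{\mathbf{z},\lambda}=\sum_{i=1}^m c_i K_{x_i}$ with coefficients solving $(\mathbb{K}+m\lambda I)c=y$, where $\mathbb{K}=(K(x_i,x_j))_{i,j=1}^m$ is the kernel matrix. A minimal numerical sketch (Gaussian kernel, synthetic data; all names and parameter values here are illustrative assumptions, not from the paper):

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma):
    # Mercer kernel K(x, t) = exp(-(x - t)^2 / (2 sigma^2)) on X = [0, 1].
    return np.exp(-(X1[:, None] - X2[None, :]) ** 2 / (2 * sigma ** 2))

def krr_fit(x, y, lam, sigma):
    # Representer theorem: f_{z,lambda} = sum_i c_i K_{x_i},
    # where (K + m * lambda * I) c = y solves the minimization (2).
    m = len(x)
    K = gaussian_kernel(x, x, sigma)
    c = np.linalg.solve(K + m * lam * np.eye(m), y)
    return lambda t: gaussian_kernel(t, x, sigma) @ c

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(200)
f = krr_fit(x, y, lam=1e-3, sigma=0.1)
```

Larger $\lambda$ shrinks $\|f\|_K$ and smooths the fit; as $\lambda\to 0$ the estimator tends to interpolate the noisy data.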

The error analysis of learning algorithm (2) has been studied extensively for independent samples. In recent years, some studies have relaxed the independence restriction and turned to learning with dependent sampling. The learning performance of regularized least square regression with mixing sequences was studied in earlier work, and the result for this setting was later refined by means of an operator monotone inequality.

For a stationary real-valued sequence $\{z_i\}_{i\ge 1}$, the $\sigma$-algebra generated by the random variables $z_a,z_{a+1},\ldots,z_b$ is denoted by $\mathcal{M}_a^b$. The uniformly mixing condition (or $\phi$-mixing condition) and the strongly mixing condition (or $\alpha$-mixing condition) are defined as follows.

Definition 1 ($\phi$-mixing).

The $l$th $\phi$-mixing coefficient of the sequence is defined as

(3) $\phi_l=\sup_{k\ge 1}\ \sup_{A\in\mathcal{M}_1^k,\,B\in\mathcal{M}_{k+l}^\infty}|P(B|A)-P(B)|$.

The process $\{z_i\}_{i\ge 1}$ is said to satisfy a uniformly mixing condition (or $\phi$-mixing condition) if $\phi_l\to 0$ as $l\to\infty$.

Definition 2 ($\alpha$-mixing).

The $l$th $\alpha$-mixing coefficient of the random sequence $\{z_i\}_{i\ge 1}$ is defined as

(4) $\alpha_l=\sup_{k\ge 1}\ \sup_{A\in\mathcal{M}_1^k,\,B\in\mathcal{M}_{k+l}^\infty}|P(A\cap B)-P(A)P(B)|$.

The random process $\{z_i\}_{i\ge 1}$ is said to satisfy a strongly mixing condition (or $\alpha$-mixing condition) if $\alpha_l\to 0$ as $l\to\infty$.

Since $P(A\cap B)=P(B|A)P(A)$, we have $\alpha_l\le\phi_l$, so the $\alpha$-mixing condition is weaker than the $\phi$-mixing condition. Many random processes satisfy the strongly mixing condition: for example, stationary Markov processes that are uniformly pure nondeterministic, stationary Gaussian sequences with a continuous spectral density bounded away from $0$, certain ARMA processes, and some aperiodic, Harris-recurrent Markov processes; see [5, 9] and the references therein.
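As a concrete illustration (a sketch, not taken from the paper): a stationary Gaussian AR(1) sequence is geometrically $\alpha$-mixing, and for Gaussian sequences the mixing coefficients are controlled by the correlations, which decay like $\rho^l$. The simulation below (hypothetical names) exhibits this correlation decay:

```python
import numpy as np

def ar1_sample(m, rho, seed=0):
    # Stationary Gaussian AR(1): x_{i+1} = rho * x_i + sqrt(1 - rho^2) * e_i,
    # with e_i i.i.d. N(0, 1).  Such chains are geometrically alpha-mixing
    # (alpha_l decays like rho^l), far faster than the polynomial decay
    # alpha_l <= b * l^{-t} assumed in the theorems below.
    rng = np.random.default_rng(seed)
    x = np.empty(m)
    x[0] = rng.standard_normal()
    for i in range(1, m):
        x[i] = rho * x[i - 1] + np.sqrt(1 - rho ** 2) * rng.standard_normal()
    return x

x = ar1_sample(20000, rho=0.5, seed=1)
# Lag-l correlation of a stationary AR(1) is rho^l; for Gaussian
# sequences these correlations control the mixing coefficients.
lag1 = np.corrcoef(x[:-1], x[1:])[0, 1]
```

With `rho=0.5` the empirical lag-1 correlation is close to $0.5$ and the lag-5 correlation close to $0.5^5$, reflecting the geometric decay of dependence.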

In this paper we follow [7, 8] to consider α-mixing and ϕ-mixing processes, estimate the error bounds, and derive the learning rates of algorithm (2), where the output data satisfy the following unbounded condition.

Unbounded Hypothesis. There exist two constants $M>0$ and $p\ge 2$ such that

(5) $\mathbb{E}|y|^p\le M$.

The error analysis for algorithm (2) was usually carried out under the standard assumption that $|y|\le M$ almost surely for some constant $M>0$. This standard assumption has been abandoned in several recent works. One line of work introduced the condition

(6) $\int_Y\Bigl(\exp\Bigl\{\frac{|y-f_{\mathcal{H}}(x)|}{M}\Bigr\}-\frac{|y-f_{\mathcal{H}}(x)|}{M}-1\Bigr)\,d\rho(y|x)\le\frac{\Sigma^2}{2M^2}$

for almost every $x\in X$ and some constants $M,\Sigma>0$, where $f_{\mathcal{H}}$ is the orthogonal projection of $f_\rho$ onto the closure of $\mathcal{H}_K$ in $L^2_{\rho_X}(X)$. Another line of work conducted the error analysis under the following moment hypothesis: there exist constants $\tilde{M}>0$ and $\hat{C}>0$ such that $\int_Y|y|^l\,d\rho(y|x)\le\hat{C}\,l!\,\tilde{M}^l$ for all $l\in\mathbb{N}$ and $x\in X$. Notice that, with different constants, the moment hypothesis and (6) are equivalent when $f_{\mathcal{H}}\in L^\infty(X)$. Obviously, our unbounded hypothesis is a natural generalization of the moment hypothesis. An example for which unbounded hypothesis (5) is satisfied but the moment hypothesis fails was given in our earlier work on half supervised coefficient regularization with indefinite kernels and unbounded sampling, where the unbounded condition is $\int_Z y^2\,d\rho\le\hat{M}^2$ for some constant $\hat{M}>0$.

Since $\mathcal{E}(f_\rho)=\min_f\mathcal{E}(f)$, where the generalization error is $\mathcal{E}(f)=\int_Z(f(x)-y)^2\,d\rho$, the goodness of the approximation of $f_\rho$ by $f_{\mathbf{z},\lambda}$ is usually measured by the excess generalization error $\mathcal{E}(f_{\mathbf{z},\lambda})-\mathcal{E}(f_\rho)=\|f_{\mathbf{z},\lambda}-f_\rho\|_{\rho_X}^2$. Denoting

(7) $\kappa=\sup_{x\in X}\sqrt{K(x,x)}<\infty$,

the reproducing property in the RKHS $\mathcal{H}_K$ yields $\|f\|_\infty\le\kappa\|f\|_K$ for any $f\in\mathcal{H}_K$. Thus, when $f_\rho\in\mathcal{H}_K$, the distance between $f_{\mathbf{z},\lambda}$ and $f_\rho$ in $\mathcal{H}_K$ can be used to measure this approximation as well.

The noise-free limit of algorithm (2) takes the form

(8) $f_\lambda=\arg\min_{f\in\mathcal{H}_K}\{\|f-f_\rho\|_{\rho_X}^2+\lambda\|f\|_K^2\}$,

so the error analysis can be divided into two parts. The difference between $f_{\mathbf{z},\lambda}$ and $f_\lambda$ is called the sample error, and the distance between $f_\lambda$ and $f_\rho$ is called the approximation error. We will bound the error in $L^2_{\rho_X}(X)$ and in $\mathcal{H}_K$, respectively. Estimating the sample error is more difficult because $f_{\mathbf{z},\lambda}$ changes with the sample $\mathbf{z}$ and cannot be treated as a fixed function. The approximation error does not depend on the samples and has been studied in the literature [2, 3, 7, 16, 17].

We mainly devote the next two sections to estimating the sample error with more general sampling processes. Our main results can be stated as follows.

Theorem 3.

Suppose that the unbounded hypothesis holds, $L_K^{-r}f_\rho\in L^2_{\rho_X}(X)$ for some $r>0$, and the $\phi$-mixing coefficients satisfy a polynomial decay, that is, $\phi_i\le a i^{-t}$ for some $a>0$ and $t>0$. Then, for any $0<\eta<1$, one has with confidence $1-\eta$,

(9) $\|f_{\mathbf{z},\lambda}-f_\rho\|_{\rho_X}=O\bigl(m^{-\theta\min\{t/2,1\}}(\log m)^{3/4}\bigr)$,

where $\theta$ is given by

(10) $\theta=\begin{cases}\dfrac{3r}{4(r+1)}&\text{if }0<r<\frac12,\\[4pt] \dfrac{r}{2r+1}&\text{if }\frac12\le r<1,\\[4pt] \dfrac13&\text{if }r\ge 1.\end{cases}$

Moreover, when $r>1/2$, one has with confidence $1-\eta$,

(11) $\|f_{\mathbf{z},\lambda}-f_\rho\|_K=O\bigl(m^{-\theta\min\{t/2,1\}}(\log m)^{1/2}\bigr)$,

where $\theta$ is given by

(12) $\theta=\begin{cases}\dfrac{2r-1}{2(2r+1)}&\text{if }\frac12<r<\frac32,\\[4pt] \dfrac14&\text{if }r\ge\frac32.\end{cases}$
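The piecewise exponents (10) and (12) can be transcribed directly; the helper below (hypothetical names, written only to make the rate computation concrete) evaluates $\theta$ and the overall exponent $\theta\min\{t/2,1\}$ appearing in (9):

```python
def theta_L2(r):
    # Exponent theta from (10) for the L^2 rate (9).
    if r < 0.5:
        return 3.0 * r / (4.0 * (r + 1.0))
    if r < 1.0:
        return r / (2.0 * r + 1.0)
    return 1.0 / 3.0

def theta_HK(r):
    # Exponent theta from (12) for the H_K rate (11); requires r > 1/2.
    if r < 1.5:
        return (2.0 * r - 1.0) / (2.0 * (2.0 * r + 1.0))
    return 0.25

def rate_exponent(r, t):
    # Overall exponent in (9): the rate is O(m^{-theta * min(t/2, 1)}).
    return theta_L2(r) * min(t / 2.0, 1.0)
```

Note that the branches of (10) and (12) match at the breakpoints ($\theta_{L^2}$ is continuous at $r=1/2$ and $r=1$, and $\theta_{\mathcal{H}_K}$ at $r=3/2$), reflecting the saturation effect discussed below.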

Theorem 3 establishes the asymptotic convergence of algorithm (2) when the samples satisfy a uniformly mixing condition. Our second main result considers this algorithm with an α-mixing process.

Theorem 4.

Suppose that the unbounded hypothesis with $p>2$ holds, $L_K^{-r}f_\rho\in L^2_{\rho_X}(X)$ for some $r>0$, and the $\alpha$-mixing coefficients satisfy a polynomial decay, that is, $\alpha_l\le b l^{-t}$ for some $b>0$ and $t>0$. Then, for any $0<\eta<1$, one has with confidence $1-\eta$,

(13) $\|f_{\mathbf{z},\lambda}-f_\rho\|_{\rho_X}=O\bigl(m^{-\vartheta\min\{(p-2)t/p,1\}}(\log m)^{1/2}\bigr)$,

where $\vartheta$ is given by

(14) $\vartheta=\begin{cases}\dfrac{pr}{2(2r+p-1)}&\text{if }0<r<\frac12,\ 0<t<\frac{p}{p-2},\\[4pt] \dfrac{3pr}{2(4r+3p-2)}&\text{if }0<r<\frac12,\ t\ge\frac{p}{p-2},\\[4pt] \dfrac{r}{2r+1}&\text{if }\frac12\le r<1,\\[4pt] \dfrac13&\text{if }r\ge 1.\end{cases}$

Moreover, when $r>1/2$, with confidence $1-\eta$,

(15) $\|f_{\mathbf{z},\lambda}-f_\rho\|_K=O\bigl(m^{-\vartheta\min\{(p-2)t/p,1\}}(\log m)^{1/2}\bigr)$,

where $\vartheta$ is given by

(16) $\vartheta=\begin{cases}\dfrac{2r-1}{4r+2}&\text{if }\frac12<r<\frac32,\\[4pt] \dfrac14&\text{if }r\ge\frac32.\end{cases}$
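In the same spirit as for Theorem 3, the exponent $\vartheta$ in (14) can be transcribed directly; its distinctive feature is that for $0<r<1/2$ the branch switches at the dependence saturation level $t=p/(p-2)$, which itself depends on the moment parameter $p$. A direct transcription (hypothetical names):

```python
def vartheta_L2(r, t, p):
    # Exponent vartheta from (14) for the L^2 rate (13); requires p > 2.
    if r < 0.5:
        if t < p / (p - 2.0):   # below the dependence saturation level
            return p * r / (2.0 * (2.0 * r + p - 1.0))
        return 3.0 * p * r / (2.0 * (4.0 * r + 3.0 * p - 2.0))
    if r < 1.0:
        return r / (2.0 * r + 1.0)
    return 1.0 / 3.0
```

For $r\ge 1/2$ the exponent no longer depends on $p$ or $t$, consistent with the saturation effects discussed after the theorems.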

The proofs of these two theorems are given in Sections 2, 3, and 4; notice that the logarithmic term in Theorem 3 can be dropped when $t\ne 2$. Our error analysis reveals some interesting phenomena for learning with unbounded and dependent sampling.

A smoother target function $f_\rho$ (i.e., larger $r$) yields better learning rates. Stronger dependence between samples (i.e., smaller $t$) means that the samples carry less information and hence leads to worse rates.

The learning rates improve as the dependence between samples weakens and as $r$ grows, but they stop improving beyond certain values of $t$ and $r$. This phenomenon is called the saturation effect and has been discussed in the literature on spectral regularization algorithms. In our setting, there is both saturation in the smoothness of $f_\rho$, mainly related to the approximation error, and saturation in the dependence between samples. An interesting phenomenon revealed here is that when the $\alpha$-mixing coefficients satisfy $\alpha_l=O(l^{-t})$ as $l\to\infty$ for some $t>0$, the saturation level for the dependence between samples is $t=p/(p-2)$ with $p>2$, which depends on the unbounded condition parameter $p$.

For the $\phi$-mixing process, the learning rates do not involve the unbounded condition parameter $p$, since $\mathbb{E}(y-f_\lambda(x))^2$ can be bounded via $\mathbb{E}y^2<\infty$. But for the $\alpha$-mixing process, to derive the learning rate we have to estimate $\mathbb{E}|y-f_\lambda(x)|^p$ with $p>2$.

Under the $\alpha$-mixing condition, when $t>p/(p-2)$ and $r\ge 1/2$, the influence of the unbounded condition becomes weak. Recall that the learning rate derived previously for uniformly bounded dependent sampling is $O(m^{-r/(1+2r)})$ for $1/2\le r\le 1$ and $t\ge 1$. Hence, when $t$ is large enough, our learning rate for unbounded samples is as sharp as that for uniformly bounded sampling.

2. Sampling Satisfying the $\phi$-Mixing Condition

In this section, we apply the integral operator technique to handle the sample error under the $\phi$-mixing condition. Unlike the uniformly bounded case, the learning performance with unbounded sampling is not measured directly. Instead, expectations are estimated first, and the bound for the sample error then follows from the Markov inequality.

To this end, define the sampling operator $S_{\mathbf{x}}\colon\mathcal{H}_K\to\ell^2(\mathbf{x})$ by $S_{\mathbf{x}}(f)=(f(x_i))_{i=1}^m$, where $\mathbf{x}$ is the set of input data $\{x_1,\ldots,x_m\}$. Its adjoint is $S_{\mathbf{x}}^T c=\sum_{i=1}^m c_i K_{x_i}$ for $c\in\ell^2(\mathbf{x})$. The analytic expressions of the optimization solutions $f_{\mathbf{z},\lambda}$ and $f_\lambda$ are

(17) $f_{\mathbf{z},\lambda}=\bigl(\tfrac{1}{m}S_{\mathbf{x}}^T S_{\mathbf{x}}+\lambda I\bigr)^{-1}\tfrac{1}{m}S_{\mathbf{x}}^T y$, $\quad f_\lambda=(L_K+\lambda I)^{-1}L_K f_\rho$,

where $L_K\colon L^2_{\rho_X}(X)\to L^2_{\rho_X}(X)$ is the integral operator defined by

(18) $L_K f(x)=\int_X K(x,t)f(t)\,d\rho_X(t)$, for any $x\in X$.
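For a kernel with an explicit finite-dimensional feature space, the operator expression for $f_{\mathbf{z},\lambda}$ in (17) and the usual kernel-matrix solution of (2) coincide. The sketch below (synthetic data, illustrative only) checks this numerically for the linear kernel $K(x,t)=\langle x,t\rangle$, where $\mathcal{H}_K\cong\mathbb{R}^d$ and $S_{\mathbf{x}}$ is the design matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
m, d, lam = 50, 3, 0.1
X = rng.standard_normal((m, d))                 # inputs x_1, ..., x_m
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(m)

# For the linear kernel K(x, t) = <x, t>, H_K is R^d and the sampling
# operator S_x acts as the design matrix: (S_x w)_i = <w, x_i>.
Sx = X

# Operator form of (17): f = (S_x^T S_x / m + lam I)^{-1} S_x^T y / m.
w = np.linalg.solve(Sx.T @ Sx / m + lam * np.eye(d), Sx.T @ y / m)

# Kernel-matrix form of the minimizer of (2): c = (K + m lam I)^{-1} y,
# with f = sum_i c_i K_{x_i}, i.e., the weight vector X^T c.
K = X @ X.T
c = np.linalg.solve(K + m * lam * np.eye(m), y)

# The two representations agree (push-through identity).
assert np.allclose(w, X.T @ c)
```

The agreement is exact (up to floating-point error) by the identity $X^T(XX^T+m\lambda I)^{-1}=(X^TX+m\lambda I)^{-1}X^T$.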

For a random variable $\xi$ with values in a Hilbert space and $0\le u\le+\infty$, denote the $u$th moment by $\|\xi\|_u=(\mathbb{E}\|\xi\|^u)^{1/u}$ if $1\le u<\infty$, and $\|\xi\|_\infty=\sup\|\xi\|$. Lemma 5 is due to Billingsley.

Lemma 5.

Let $\xi$ and $\eta$ be random variables with values in a separable Hilbert space, measurable with respect to the $\sigma$-fields $\mathcal{J}$ and $\mathcal{D}$, respectively, and having finite $p$th and $q$th moments, where $p,q\ge 1$ with $p^{-1}+q^{-1}=1$. Then

(19) $|\mathbb{E}\langle\xi,\eta\rangle-\langle\mathbb{E}\xi,\mathbb{E}\eta\rangle|\le 2\phi^{1/p}(\mathcal{J},\mathcal{D})\|\xi\|_p\|\eta\|_q$.

Lemma 6.

For a $\phi$-mixing sequence $\{x_i\}$, one has

(20) $\mathbb{E}\bigl\|L_K-\tfrac{1}{m}S_{\mathbf{x}}^T S_{\mathbf{x}}\bigr\|^2\le\frac{\kappa^4}{m}\Bigl(1+4\sum_{i=1}^{m-1}\phi_i^{1/2}\Bigr)$.

Proof.

With the definition of the sampling operator, we have

(21) $L_K-\frac{1}{m}S_{\mathbf{x}}^T S_{\mathbf{x}}=L_K-\frac{1}{m}\sum_{i=1}^m K_{x_i}\otimes K_{x_i}$.

Let $\eta(x)=K_x\otimes K_x$; then $\eta(x)$ is an $HS(\mathcal{H}_K)$-valued random variable defined on $X$, where $HS(\mathcal{H}_K)$ is the space of Hilbert-Schmidt operators on $\mathcal{H}_K$. Note that $\mathbb{E}\eta(x)=L_K\in HS(\mathcal{H}_K)$, with $\|L_K\|_{HS}\le\kappa^2$ and $\|\eta(x)\|_{HS}\le\kappa^2$. We have

(22) $\mathbb{E}\bigl\|L_K-\tfrac{1}{m}S_{\mathbf{x}}^T S_{\mathbf{x}}\bigr\|^2\le\mathbb{E}\Bigl\|\mathbb{E}\eta-\frac{1}{m}\sum_{i=1}^m\eta(x_i)\Bigr\|_{HS}^2=\frac{1}{m}\|\eta\|_2^2+\frac{1}{m^2}\sum_{i\ne j}\mathbb{E}\langle\eta(x_i),\eta(x_j)\rangle_{HS}-\|L_K\|_{HS}^2$.

By Lemma 5 with $p=q=2$, for $i\ne j$,

(23) $\mathbb{E}\langle\eta(x_i),\eta(x_j)\rangle_{HS}\le\langle\mathbb{E}\eta(x_i),\mathbb{E}\eta(x_j)\rangle_{HS}+2\phi_{|i-j|}^{1/2}\|\eta\|_2^2\le\|L_K\|_{HS}^2+2\kappa^4\phi_{|i-j|}^{1/2}$.

The desired estimate follows by plugging (23) into (22).

Proposition 7.

Suppose that the unbounded hypothesis holds with some $p\ge 2$, the sample sequence $\{(x_i,y_i)\}_{i=1}^m$ satisfies a $\phi$-mixing condition, and $L_K^{-r}f_\rho\in L^2_{\rho_X}(X)$ with $r>0$. Then one has

(24) $\mathbb{E}\|f_{\mathbf{z},\lambda}-f_\lambda\|_{\rho_X}\le C\Bigl(\lambda^{-1/2}m^{-1/2}+\lambda^{-1}m^{-3/4}\Bigl(1+4\sum_{i=1}^{m-1}\phi_i^{1/2}\Bigr)^{1/4}\Bigr)\times\Bigl(1+4\sum_{i=1}^{m-1}\phi_i^{1/2}\Bigr)^{1/2}$,

where $C$ is a constant depending only on $\kappa$ and $M$.

Proof.

By [7, Theorem 3.1], we have

(25) $\mathbb{E}\|f_{\mathbf{z},\lambda}-f_\lambda\|_{\rho_X}\le\Bigl(\lambda^{-1/2}+\lambda^{-1}\bigl(\mathbb{E}\|L_K-\tfrac{1}{m}S_{\mathbf{x}}^T S_{\mathbf{x}}\|^2\bigr)^{1/4}\Bigr)\times\sqrt{\mathbb{E}\Bigl\|\frac{1}{m}\sum_{i=1}^m\xi(z_i)-L_K(f_\rho-f_\lambda)\Bigr\|_K^2}$,

where $\xi(z)=(y-f_\lambda(x))K_x$ is a random variable with values in $\mathcal{H}_K$ and $\mathbb{E}\xi=L_K(f_\rho-f_\lambda)$. A computation similar to that in Lemma 6 leads to

(26) $\mathbb{E}\Bigl\|\frac{1}{m}\sum_{i=1}^m\xi(z_i)-L_K(f_\rho-f_\lambda)\Bigr\|_K^2\le\frac{1}{m}\Bigl(1+4\sum_{i=1}^{m-1}\phi_i^{1/2}\Bigr)\|\xi\|_2^2$.

It suffices to estimate $\|\xi\|_2$. By the Hölder inequality,

(27) $\mathbb{E}y^2\le(\mathbb{E}|y|^p)^{2/p}\le M^{2/p}$, $\quad\|f_\rho\|_{\rho_X}^2=\int_X f_\rho^2(x)\,d\rho_X=\int_X\Bigl(\int_Y y\,d\rho(y|x)\Bigr)^2 d\rho_X\le\int_Z y^2\,d\rho\le M^{2/p}$.

Thus $\mathbb{E}(y-f_\rho(x))^2=\mathbb{E}y^2-\|f_\rho\|_{\rho_X}^2\le M^{2/p}$ and

(28) $\mathbb{E}(f_\rho(x)-f_\lambda(x))^2=\|\lambda(\lambda I+L_K)^{-1}f_\rho\|_{\rho_X}^2\le\|f_\rho\|_{\rho_X}^2\le M^{2/p}$,

which implies

(29) $\|\xi\|_2^2=\mathbb{E}\bigl((y-f_\lambda(x))^2K(x,x)\bigr)\le\kappa^2\,\mathbb{E}(y-f_\lambda(x))^2=\kappa^2\bigl(\mathbb{E}(y-f_\rho(x))^2+\mathbb{E}(f_\rho(x)-f_\lambda(x))^2\bigr)\le 2\kappa^2 M^{2/p}$.

Plugging (29) into (26), there holds

(30) $\mathbb{E}\Bigl\|\frac{1}{m}\sum_{i=1}^m\xi(z_i)-L_K(f_\rho-f_\lambda)\Bigr\|_K^2\le 2M^{2/p}\kappa^2 m^{-1}\Bigl(1+4\sum_{i=1}^{m-1}\phi_i^{1/2}\Bigr)$.

Combining (25), (20), and (30), and taking the constant $C=2\kappa(\kappa+1)M^{1/p}$, we complete the proof.

The following proposition provides the bound of the difference between fz,λ and fλ in HK with ϕ-mixing process.

Proposition 8.

Under the assumptions of Proposition 7, there holds

(31) $\mathbb{E}\|f_{\mathbf{z},\lambda}-f_\lambda\|_K\le\sqrt{2}\,M^{1/p}\kappa\lambda^{-1}m^{-1/2}\Bigl(1+4\sum_{i=1}^{m-1}\phi_i^{1/2}\Bigr)^{1/2}$.

Proof.

The representations of $f_{\mathbf{z},\lambda}$ and $f_\lambda$ imply that

(32) $\mathbb{E}\|f_{\mathbf{z},\lambda}-f_\lambda\|_K=\mathbb{E}\Bigl\|\bigl(\tfrac{1}{m}S_{\mathbf{x}}^T S_{\mathbf{x}}+\lambda I\bigr)^{-1}\Bigl(\frac{1}{m}\sum_{i=1}^m\xi(z_i)-L_K(f_\rho-f_\lambda)\Bigr)\Bigr\|_K\le\lambda^{-1}\sqrt{\mathbb{E}\Bigl\|\frac{1}{m}\sum_{i=1}^m\xi(z_i)-L_K(f_\rho-f_\lambda)\Bigr\|_K^2}$.

Then the desired bound follows from (30) and (32).

3. Samples Satisfying the $\alpha$-Mixing Condition

We now turn to bounding the sample error when the sampling process satisfies the strongly mixing condition and the unbounded hypothesis holds. In Section 2, the key point was to estimate $\|\xi\|_2$ in the absence of uniform boundedness. For sampling satisfying the $\alpha$-mixing condition, we have to deal with $\|\xi\|_p$ for some $p>2$.

Proposition 9.

Suppose that the unbounded hypothesis holds with some $p>2$, the sample sequence $\{(x_i,y_i)\}_{i=1}^m$ satisfies an $\alpha$-mixing condition, and $L_K^{-r}f_\rho\in L^2_{\rho_X}(X)$ with $r>0$. Then one gets

(33) $\mathbb{E}\|f_{\mathbf{z},\lambda}-f_\lambda\|_{\rho_X}\le\tilde{C}\lambda^{\min\{(p-2)(2r-1)/2p,\,0\}}\Bigl(1+\sum_{l=1}^{m-1}\alpha_l^{(p-2)/p}\Bigr)^{1/2}\times\Bigl(\lambda^{-1/2}m^{-1/2}+\lambda^{-1}m^{-3/4}\Bigl(1+\sum_{l=1}^{m-1}\alpha_l\Bigr)^{1/4}\Bigr)$,

where $\tilde{C}$ is a constant depending only on $\kappa$, $M$, and $\|L_K^{-\min\{r,1/2\}}f_\rho\|_{\rho_X}$.

Proof.

For the strongly mixing process, by [8, Lemma 5.1],

(34) $\mathbb{E}\bigl\|L_K-\tfrac{1}{m}S_{\mathbf{x}}^T S_{\mathbf{x}}\bigr\|^2\le\frac{\kappa^4}{m}\Bigl(1+30\sum_{l=1}^{m-1}\alpha_l\Bigr)$.

Taking $\delta=p-2$ in [8, Lemma 4.2], we have

(35) $\mathbb{E}\Bigl\|\frac{1}{m}\sum_{i=1}^m\xi(z_i)-L_K(f_\rho-f_\lambda)\Bigr\|_K^2\le\frac{1}{m}\|\xi\|_2^2+\frac{30}{m}\sum_{l=1}^{m-1}\alpha_l^{(p-2)/p}\|\xi\|_p^2$.

The estimate of $\|\xi\|_2$ was obtained in Section 2, so we now mainly devote ourselves to estimating $\|\xi\|_p$. For this, a bound on $f_\lambda$ is needed, which can be stated as follows ([3, Lemma 3] or [8, Lemma 4.3]):

(36) $|f_\lambda(x)|\le\kappa\|f_\lambda\|_K\le C_1\kappa\lambda^{\min\{(2r-1)/2,\,0\}}$,

where $C_1=\|L_K^{-\min\{r,1/2\}}f_\rho\|_{\rho_X}$. Observe that $\|f_\lambda\|_{\rho_X}^2\le\|f_\rho\|_{\rho_X}^2\le\mathbb{E}y^2\le M^{2/p}$. Hence,

(37) $(\mathbb{E}|f_\lambda(x)|^p)^{2/p}\le\bigl(\|f_\lambda\|_{\rho_X}^2 C_1^{p-2}\kappa^{p-2}\lambda^{(p-2)\min\{(2r-1)/2,\,0\}}\bigr)^{2/p}\le M^{4/p^2}(C_1^2\kappa^2+1)\lambda^{\min\{(p-2)(2r-1)/p,\,0\}}$.

Now we can deduce that

(38) $\|\xi\|_p^2=\bigl(\mathbb{E}\bigl((y-f_\lambda(x))^2K(x,x)\bigr)^{p/2}\bigr)^{2/p}\le\kappa^2(\mathbb{E}|y-f_\lambda(x)|^p)^{2/p}\le 4\kappa^2\bigl(\mathbb{E}\max\{|y|^p,|f_\lambda(x)|^p\}\bigr)^{2/p}\le 4\kappa^2\bigl((\mathbb{E}|y|^p)^{2/p}+(\mathbb{E}|f_\lambda(x)|^p)^{2/p}\bigr)\le 4\kappa^2\bigl(M^{2/p}+M^{4/p^2}(C_1^2\kappa^2+1)\bigr)\lambda^{\min\{(p-2)(2r-1)/p,\,0\}}$.

Plugging this estimate into (35) yields

(39) $\mathbb{E}\Bigl\|\frac{1}{m}\sum_{i=1}^m\xi(z_i)-L_K(f_\rho-f_\lambda)\Bigr\|_K^2\le C_2 m^{-1}\lambda^{\min\{(p-2)(2r-1)/p,\,0\}}\Bigl(1+\sum_{l=1}^{m-1}\alpha_l^{(p-2)/p}\Bigr)$,

where $C_2$ is a constant depending only on $\kappa$, $M$, and $\|L_K^{-\min\{r,1/2\}}f_\rho\|_{\rho_X}$. Combining (34) and (39) with (25), we complete the proof.

For the $\alpha$-mixing process, the following proposition bounds the sample error in $\mathcal{H}_K$; its proof follows directly from inequality (32).

Proposition 10.

Under the assumptions of Proposition 9, one has

(40) $\mathbb{E}\|f_{\mathbf{z},\lambda}-f_\lambda\|_K\le C_3 m^{-1/2}\lambda^{-1+\min\{(p-2)(2r-1)/2p,\,0\}}\Bigl(1+\sum_{l=1}^{m-1}\alpha_l^{(p-2)/p}\Bigr)^{1/2}$,

where $C_3=\sqrt{C_2}$.

4. Error Bounds and Learning Rates

In this section we derive the learning rates, that is, the convergence rates of $\|f_{\mathbf{z},\lambda}-f_\rho\|_{\rho_X}$ and $\|f_{\mathbf{z},\lambda}-f_\rho\|_K$ as $m\to\infty$, by choosing the regularization parameter $\lambda$ according to $m$. The following approximation error bound is needed to get the convergence rates.

Proposition 11.

Suppose that $L_K^{-r}f_\rho\in L^2_{\rho_X}(X)$ for some $r>0$. Then there holds

(41) $\|f_\lambda-f_\rho\|_{\rho_X}\le\lambda^{\min\{r,1\}}\|L_K^{-\min\{r,1\}}f_\rho\|_{\rho_X}$.

Moreover, when $r\ge 1/2$, that is, $f_\rho\in\mathcal{H}_K$, there holds

(42) $\|f_\lambda-f_\rho\|_K\le\lambda^{\min\{r-1/2,\,1\}}\|L_K^{-\min\{r,3/2\}}f_\rho\|_{\rho_X}$.

The first conclusion in Proposition 11 has been proved in the literature, and the second one can be proved in the same way. To derive the learning rates, we need to balance the approximation error and the sample error. For this purpose, the following elementary bounds are needed:

(43) $\sum_{l=1}^{m-1}l^{-s}\le\begin{cases}\dfrac{1}{1-s}m^{1-s}&\text{if }0<s<1,\\[4pt] 1+\log m&\text{if }s=1,\\[4pt] \dfrac{s}{s-1}&\text{if }s>1.\end{cases}$
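The series bounds in (43), with the additive constants written out to cover the $l=1$ term, can be checked numerically; the names below are illustrative only:

```python
import numpy as np

def partial_sum(m, s):
    # Computes sum_{l=1}^{m-1} l^{-s}.
    l = np.arange(1, m)
    return float(np.sum(l ** (-float(s))))

def series_bound(m, s):
    # The three bounds of (43): for 0 < s < 1 the sum grows like
    # m^{1-s}; for s = 1 it grows logarithmically; for s > 1 it is
    # bounded by a constant independent of m.
    if s < 1:
        return m ** (1.0 - s) / (1.0 - s)
    if s == 1:
        return 1.0 + np.log(m)
    return s / (s - 1.0)
```

These three regimes are exactly what produce, in the proofs below, the polynomial factors $m^{1-t/2}$, the $\log m$ terms at the boundary decay rate, and the constants when the mixing coefficients are summable.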

Proof of Theorem 3.

The estimate of the learning rate in the $L^2_{\rho_X}(X)$ norm is divided into two cases.

Case 1 ($0<t<2$). By (43) and $\phi_i\le a i^{-t}$,

(44) $1+4\sum_{i=1}^{m-1}\phi_i^{1/2}\le 1+4\sqrt{a}\sum_{i=1}^{m-1}i^{-t/2}\le\Bigl(1+\frac{8\sqrt{a}}{2-t}\Bigr)m^{1-t/2}$.

Thus Proposition 7 yields

(45) $\mathbb{E}\|f_{\mathbf{z},\lambda}-f_\lambda\|_{\rho_X}\le 2C\Bigl(1+\frac{16\sqrt{a}}{2-t}\Bigr)\bigl(\lambda^{-1/2}m^{-t/4}+\lambda^{-1}m^{-3t/8}\bigr)$.

By Proposition 11 and the Markov inequality, with confidence $1-\eta$, there holds

(46) $\|f_{\mathbf{z},\lambda}-f_\rho\|_{\rho_X}\le O\bigl(\lambda^{\min\{r,1\}}+\eta^{-1}(\lambda^{-1/2}m^{-t/4}+\lambda^{-1}m^{-3t/8})\bigr)$.

For $0<r<1/2$, taking $\lambda=m^{-3t/(8(r+1))}$ gives the learning rate $O(m^{-3tr/(8(r+1))})$. For $1/2\le r<1$, taking $\lambda=m^{-t/(2(2r+1))}$ gives the rate $O(m^{-rt/(2(2r+1))})$. For $r\ge 1$, the desired convergence rate is obtained by taking $\lambda=m^{-t/6}$.

Case 2 ($t\ge 2$). With confidence $1-\eta$, there holds

(47) $\|f_{\mathbf{z},\lambda}-f_\rho\|_{\rho_X}=O\bigl(\lambda^{\min\{r,1\}}+\eta^{-1}(\lambda^{-1/2}m^{-1/2}+\lambda^{-1}m^{-3/4})(\log m)^{3/4}\bigr)$.

For $0<r<1/2$, taking $\lambda=m^{-3/(4(r+1))}$ gives the learning rate $O(m^{-3r/(4(r+1))}(\log m)^{3/4})$; for $1/2\le r<1$, taking $\lambda=m^{-1/(2r+1)}$ gives the rate $O(m^{-r/(2r+1)}(\log m)^{3/4})$. For $r\ge 1$, the desired convergence rate is obtained by taking $\lambda=m^{-1/3}$.

Next, to bound the generalization error in $\mathcal{H}_K$, Proposition 8 together with Proposition 11 tells us that, with confidence $1-\eta$,

(48) $\|f_{\mathbf{z},\lambda}-f_\rho\|_K=O\Bigl(\lambda^{\min\{r-1/2,\,1\}}+\eta^{-1}\lambda^{-1}m^{-1/2}\Bigl(1+4\sum_{i=1}^{m-1}\phi_i^{1/2}\Bigr)^{1/2}\Bigr)$.

The rest of the proof is analogous to the estimate of $\|f_{\mathbf{z},\lambda}-f_\rho\|_{\rho_X}$ above.

Proof of Theorem 4.

For $0<t<1$, by (43) and $\alpha_l\le bl^{-t}$,

(49) $1+\sum_{l=1}^{m-1}\alpha_l^{(p-2)/p}\le 1+b^{(p-2)/p}\sum_{l=1}^{m-1}l^{-(p-2)t/p}\le\Bigl(1+\frac{pb^{(p-2)/p}}{p-(p-2)t}\Bigr)m^{1-(p-2)t/p}$, $\quad 1+\sum_{l=1}^{m-1}\alpha_l\le 1+b\sum_{l=1}^{m-1}l^{-t}\le\Bigl(1+\frac{b}{1-t}\Bigr)m^{1-t}$.

By Propositions 9 and 11 and the Markov inequality, with confidence $1-\eta$, there holds

(50) $\|f_{\mathbf{z},\lambda}-f_\rho\|_{\rho_X}=O\bigl(\lambda^{\min\{r,1\}}+\eta^{-1}\lambda^{\min\{(p-2)(2r-1)/2p,\,0\}-1/2}\bigl(m^{-(p-2)t/2p}+\lambda^{-1/2}m^{-(p-2)t/2p-t/4}\bigr)\bigr)$.

For $0<r<1/2$, taking $\lambda=m^{-(p-2)t/(2(2r+p-1))}$ gives the learning rate $O(m^{-(p-2)tr/(2(2r+p-1))})$. For $1/2\le r<1$, taking $\lambda=m^{-(p-2)t/(p(2r+1))}$ gives the rate $O(m^{-(p-2)rt/(p(2r+1))})$. For $r\ge 1$, the desired convergence rate is obtained by taking $\lambda=m^{-(p-2)t/(3p)}$.

The rest of the analysis is similar; we omit it here.

Acknowledgment

This work is supported by the National Natural Science Foundation of China (Grant no. 11071276).

References

[1] T. Evgeniou, M. Pontil, and T. Poggio, "Regularization networks and support vector machines," Advances in Computational Mathematics, vol. 13, no. 1, pp. 1–50, 2000.
[2] S. Smale and D.-X. Zhou, "Shannon sampling II: connections to learning theory," Applied and Computational Harmonic Analysis, vol. 19, no. 3, pp. 285–302, 2005.
[3] S. Smale and D.-X. Zhou, "Learning theory estimates via integral operators and their approximations," Constructive Approximation, vol. 26, no. 2, pp. 153–172, 2007.
[4] Q. Wu, Y. Ying, and D.-X. Zhou, "Learning rates of least-square regularized regression," Foundations of Computational Mathematics, vol. 6, no. 2, pp. 171–192, 2006.
[5] D. S. Modha and E. Masry, "Minimum complexity regression estimation with weakly dependent observations," IEEE Transactions on Information Theory, vol. 42, no. 6, pp. 2133–2145, 1996.
[6] S. Smale and D.-X. Zhou, "Online learning with Markov sampling," Analysis and Applications, vol. 7, no. 1, pp. 87–113, 2009.
[7] H. Sun and Q. Wu, "A note on application of integral operator in learning theory," Applied and Computational Harmonic Analysis, vol. 26, no. 3, pp. 416–421, 2009.
[8] H. Sun and Q. Wu, "Regularized least square regression with dependent samples," Advances in Computational Mathematics, vol. 32, no. 2, pp. 175–189, 2010.
[9] K. B. Athreya and S. G. Pantula, "Mixing properties of Harris chains and autoregressive processes," Journal of Applied Probability, vol. 23, no. 4, pp. 880–892, 1986.
[10] A. Caponnetto and E. De Vito, "Optimal rates for the regularized least-squares algorithm," Foundations of Computational Mathematics, vol. 7, no. 3, pp. 331–368, 2007.
[11] Z.-C. Guo and D.-X. Zhou, "Concentration estimates for learning with unbounded sampling," Advances in Computational Mathematics, vol. 38, no. 1, pp. 207–223, 2013.
[12] S.-G. Lv and Y.-L. Feng, "Integral operator approach to learning theory with unbounded sampling," Complex Analysis and Operator Theory, vol. 6, no. 3, pp. 533–548, 2012.
[13] C. Wang and D.-X. Zhou, "Optimal learning rates for least squares regularized regression with unbounded sampling," Journal of Complexity, vol. 27, no. 1, pp. 55–67, 2011.
[14] C. Wang and Z.-C. Guo, "ERM learning with unbounded sampling," Acta Mathematica Sinica, vol. 28, no. 1, pp. 97–104, 2012.
[15] X. R. Chu and H. W. Sun, "Half supervised coefficient regularization for regression learning with unbounded sampling," International Journal of Computer Mathematics, 2013.
[16] S. Smale and D.-X. Zhou, "Shannon sampling and function reconstruction from point values," Bulletin of the American Mathematical Society, vol. 41, no. 3, pp. 279–305, 2004.
[17] H. Sun and Q. Wu, "Application of integral operator for regularized least-square regression," Mathematical and Computer Modelling, vol. 49, no. 1-2, pp. 276–285, 2009.
[18] F. Bauer, S. Pereverzev, and L. Rosasco, "On regularization algorithms in learning theory," Journal of Complexity, vol. 23, no. 1, pp. 52–72, 2007.
[19] L. Lo Gerfo, L. Rosasco, F. Odone, E. De Vito, and A. Verri, "Spectral algorithms for supervised learning," Neural Computation, vol. 20, no. 7, pp. 1873–1897, 2008.
[20] H. Sun and Q. Wu, "Least square regression with indefinite kernels and coefficient regularization," Applied and Computational Harmonic Analysis, vol. 30, no. 1, pp. 96–109, 2011.
[21] P. Billingsley, Convergence of Probability Measures, John Wiley & Sons, New York, NY, USA, 1968.