Probability error bounds for approximation of functions in reproducing kernel Hilbert spaces

We find probability error bounds for approximations of functions $f$ in a separable reproducing kernel Hilbert space $\mathcal{H}$ with reproducing kernel $K$ on a base space $X$, firstly in terms of finite linear combinations of functions of type $K_{x_i}$ and then in terms of the projection $\pi^n_x$ onto $\mathrm{Span}\{K_{x_i}\}^n_{i=1}$, for random sequences of points $x=(x_i)_i$ in $X$. Given a probability measure $P$, letting $P_K$ be the measure defined by $\mathrm{d} P_K(x)=K(x,x)\mathrm{d} P(x)$, $x\in X$, our approach is based on the nonexpansive operator \[L^2(X;P_K)\ni\lambda\mapsto L_{P,K}\lambda:=\int_X \lambda(x)K_x\mathrm{d} P(x)\in \mathcal{H},\] where the integral exists in the Bochner sense. Using this operator, we then define a new reproducing kernel Hilbert space, denoted by $\mathcal{H}_P$, that is the operator range of $L_{P,K}$. Our main result establishes bounds, in terms of the operator $L_{P,K}$, on the probability that the Hilbert space distance between an arbitrary function $f\in\mathcal{H}$ and linear combinations of functions of type $K_{x_i}$, for $(x_i)_i$ sampled independently from $P$, falls below a given threshold. For sequences of points $(x_i)_{i=1}^\infty$ constituting a so-called uniqueness set, the orthogonal projections $\pi^n_x$ onto $\mathrm{Span}\{K_{x_i}\}^n_{i=1}$ converge in the strong operator topology to the identity operator. We prove that, under the assumption that $\mathcal{H}_P$ is dense in $\mathcal{H}$, any sequence of iid samples from $P$ yields a uniqueness set with probability $1$. This result improves on previous error bounds in weaker norms, such as uniform or $L^p$ norms, which yield only convergence in probability and not almost certain convergence. Two examples that show the applicability of this result, to a uniform distribution on a compact interval and to the Hardy space $H^2(\mathbb{D})$, are presented as well.


Introduction
Several machine learning algorithms that use positive semidefinite kernels, such as support vector machines (SVM), have been analysed and justified rigorously using the theory of reproducing kernel Hilbert spaces (RKHS), yielding statements of optimality, convergence, and $L^p$ approximation bounds, e.g., see Cucker and Smale [1]. Reproducing kernel Hilbert spaces are Hilbert spaces of functions associated to a suitable kernel, such that convergence with respect to the Hilbert space norm implies pointwise convergence, and, in the context of approximation, they possess various favourable properties resulting from the Hilbert space structure. For example, under certain conditions on the kernel, every function in the Hilbert space is sufficiently differentiable, and differentiation is in fact a nonexpansive linear map with respect to the Hilbert space norm, e.g., see ([2], Subsection 2.1.3).
In order to substantiate the motivation for our investigation, we briefly review previously obtained bounds on the approximation of functions as linear combinations of kernels evaluated at finitely many points. The statistical learning theory of Vapnik and Chervonenkis [3-5] relies on concentration inequalities, such as Hoeffding's inequality, to bound the supremum distance between expected and empirical risk. The theory considers a data space $X\subseteq\mathbb{R}^d$ on which an unknown probability distribution $P$ is defined, a hypothesis set $\mathcal{H}$, and a loss function $V:\mathcal{H}\times X\to\mathbb{R}_+$, such that one wishes to find a hypothesis $h\in\mathcal{H}$ that minimizes the expected risk \[R(h)=\int_X V(h,x)\,\mathrm{d}P(x).\] Since $P$ is not known in general, instead of minimizing the expected risk one usually minimizes the empirical risk \[R_{\mathrm{emp}}(h)=\frac{1}{n}\sum_{i=1}^n V(h,x_i)\] over a finite set $S=\{x_i\}_{i=1}^n\subseteq X$ of samples. Vapnik-Chervonenkis theory measures the probability with which the maximum distance between $R$ and $R_{\mathrm{emp}}$ falls below a given threshold. Recall that the Vapnik-Chervonenkis (VC) dimension of $\mathcal{H}$ with respect to $V$ is the maximum cardinality of finite subsets $Y\subseteq X$ that can be shattered by $\mathcal{H}$, i.e., for each $Y'\subseteq Y$ there exist $h\in\mathcal{H}$ and $\alpha\in\mathbb{R}$ such that $V(h,x)\geq\alpha$ for $x\in Y'$ and $V(h,x)<\alpha$ for $x\in Y\setminus Y'$. They prove that, assuming $A\leq V(h,x)\leq B$ for each $h\in\mathcal{H}$, $x\in X$, and that the VC dimension of $\mathcal{H}$ is $d<\infty$, then, for any $\eta\in(0,1)$, the probability that $\sup_{h\in\mathcal{H}}|R(h)-R_{\mathrm{emp}}(h)|$ exceeds a prescribed threshold can be bounded explicitly in terms of $n$, $d$, $B-A$, and $\eta$. Girosi, see [6] and ([7], Proposition 2), has used this general result to bound the uniform distance between integrals $\int J(x,y)\lambda(y)\,\mathrm{d}y$ and sums of the form $\frac{1}{n}\sum_{i=1}^n J(x,x_i)$, by reinterpreting $\mathcal{H}$ as $\mathbb{R}^d$, $V$ as $J$, and $\mathrm{d}P(y)$ as $|\lambda(y)|/\|\lambda\|_{L^1}\,\mathrm{d}y$. Kon and Raphael [7] then applied this methodology to obtain uniform approximation bounds for functions in reproducing kernel Hilbert spaces. They consider two cases, where the Hilbert space is dense in $L^2(\mathbb{R}^d)$ with a stronger norm ([7], Theorem 4), and where it is a closed subspace with the same norm ([7], Theorem 5). Also, Kon et al.
[8] extended Girosi's approximation estimates to functions in Sobolev spaces. While these bounds guarantee uniform convergence in probability, the approximating functions are neither orthogonal projections of $f$ nor necessarily elements of a reproducing kernel Hilbert space, and hence may not capture $f$ exactly at $(x_i)_{i=1}^n$ nor converge monotonically. Furthermore, since the norm is not a RKHS norm, derivatives of $f$ may not be approximated in general: differentiation is not bounded with respect to the uniform norm, unlike the RKHS norm associated with a continuously differentiable kernel.
The purpose of this article is thus to establish sufficient conditions for convergence and approximation in the reproducing kernel Hilbert space norm. In Section 3, we find probability error bounds for approximations of functions $f$ in a separable reproducing kernel Hilbert space $\mathcal{H}$ with reproducing kernel $K$ on a base space $X$, firstly in terms of finite linear combinations of functions of type $K_x$ and then in terms of the projection $\pi^n_x$ onto $\mathrm{span}\{K_{x_i}\}_{i=1}^n$, for random sequences of points $x=(x_i)_i$ in the base space $X$. Given a probability measure $P$, letting $P_K$ be the measure defined by $\mathrm{d}P_K(x)=K(x,x)\,\mathrm{d}P(x)$, $x\in X$, we approach these problems by firstly showing the existence of the nonexpansive operator \[L^2(X;P_K)\ni\lambda\mapsto L_{P,K}\lambda:=\int_X\lambda(x)K_x\,\mathrm{d}P(x)\in\mathcal{H},\tag{5}\] where the integral exists in the Bochner sense. Using this operator, we then define a new reproducing kernel Hilbert space, denoted by $\mathcal{H}_P$, that is the operator range of $L_{P,K}$.
Our main result establishes bounds, in terms of the operator $L_{P,K}$, on the probability that the Hilbert space distance between an arbitrary function $f\in\mathcal{H}$ and linear combinations of functions of type $K_{x_i}$, for $(x_i)_i$ sampled independently from $P$, falls below a given threshold, see Theorem 8. For sequences of points $(x_i)_{i=1}^\infty$ constituting a so-called uniqueness set, see Subsection 3.4, the orthogonal projections $\pi^n_x$ onto $\mathrm{span}\{K_{x_i}\}_{i=1}^n$ converge in the strong operator topology to the identity operator. As an application of our main result, we show that, under the assumption that $\mathcal{H}_P$ is dense in $\mathcal{H}$, any sequence of points sampled independently from $P$ yields a uniqueness set with probability $1$.
The results obtained in this article improve on the results obtained by Kon and Raphael in several senses: the convergence of approximations is in the RKHS norm, which is stronger than the uniform norm whenever the kernel is bounded; the type of convergence with respect to the points $(x_i)_i$ is strengthened from convergence in probability to almost certain convergence; and the separability of $\mathcal{H}$ then allows the result to be extended from the approximation of a single function to the simultaneous approximation of all functions in the Hilbert space. In addition, when compared to the existing methods for problems of this kind, our approach, based on the operator $L_{P,K}$ defined in (5), which encodes the interplay between the kernel $K$ and the probability measure $P$, and on the associated RKHS $\mathcal{H}_P$, is completely new and has the potential to overcome many difficulties.
These results are confined to the special case of a separable RKHS $\mathcal{H}$ of functions on an arbitrary set $X$, for several reasons, one of them being the fact that the Bochner integral requires the assumption of separability; we do not see this as a loss of generality, since most of the spaces of interest for applications are separable. In the last section, we present two examples that point out both the applicability and the limitations of our results: the first concerns the uniform probability distribution on the compact interval $[-\pi,\pi]$, together with a class of bounded continuous kernels, and the second concerns the Hardy space $H^2(\mathbb{D})$ corresponding to the Szegö kernel, which is unbounded. In each case, we can explicitly calculate the space $\mathcal{H}_P$, its reproducing kernel $K_P$, and the operator $L_{P,K}$.

Notation and Preliminary Results
2.1. Reproducing Kernel Hilbert Spaces. In this subsection, we briefly review some concepts and facts on reproducing kernel Hilbert spaces, following classical texts such as Aronszajn [9,10] and Schwartz [11], or more modern ones such as Saitoh and Sawano ([2], Chapter 2) and Paulsen and Raghupathi [12].
Throughout this article, we denote by $\mathbb{F}$ one of the commutative fields $\mathbb{R}$ or $\mathbb{C}$. For a nonempty set $X$, let $\mathbb{F}^X$ denote the set of $\mathbb{F}$-valued functions on $X$, forming an $\mathbb{F}$-vector space under pointwise addition and scalar multiplication. For each $p\in X$, the evaluation map at $p$ is the linear functional \[\mathrm{ev}_p:\mathbb{F}^X\to\mathbb{F},\qquad\mathrm{ev}_p(f)=f(p).\] The evaluation maps equip $\mathbb{F}^X$ with the locally convex topology of pointwise convergence, which is the weakest topology on $\mathbb{F}^X$ that renders each evaluation map continuous. Under this topology, a generalized sequence in $\mathbb{F}^X$ converges if and only if it converges pointwise, i.e., its image under each evaluation map converges. Since each evaluation map is linear, the vector space operations are continuous, and this renders $\mathbb{F}^X$ a complete Hausdorff locally convex space. With respect to this topology, if $X$ is a topological space, a map $\phi:X\to\mathbb{F}^X$ is continuous if and only if $\mathrm{ev}_p\circ\phi:X\to\mathbb{F}$ is continuous for all $p\in X$.
We are interested in Hilbert spaces $\mathcal{H}\subseteq\mathbb{F}^X$ with topologies at least as strong as the topology of pointwise convergence of $\mathbb{F}^X$, so that the convergence of a sequence of functions in $\mathcal{H}$ implies that the functions also converge pointwise. When $X$ is a finite set, $\mathbb{F}^X\cong\mathbb{F}^d$, where $d$ is the number of elements of $X$, can itself be made into a Hilbert space with the canonical inner product $\langle f,g\rangle:=\sum_{p\in X}f(p)\overline{g(p)}$, or, in general, with an inner product induced by a positive semidefinite $d\times d$ matrix. This leads to the concept of reproducing kernel Hilbert spaces.
Recalling Riesz's Representation Theorem for bounded linear functionals on Hilbert spaces, if each $\mathrm{ev}_p$, restricted to $\mathcal{H}\subseteq\mathbb{F}^X$, is continuous, then, for each $p\in X$, there exists a unique vector $K_p\in\mathcal{H}$ such that $\mathrm{ev}_p=\langle\cdot,K_p\rangle$. But, since each vector in $\mathcal{H}$ is itself a function $X\to\mathbb{F}$, these vectors altogether define a map $K:X\times X\to\mathbb{F}$, $K(p,q):=K_q(p)$. Also, recall that a map $K:X\times X\to\mathbb{F}$ is usually called a kernel.

Definition 1. Let $\mathcal{H}\subseteq\mathbb{F}^X$ be a Hilbert space and $K:X\times X\to\mathbb{F}$ a kernel. For each $p\in X$ define $K_p:=K(\cdot,p)\in\mathbb{F}^X$. $K$ is said to be a reproducing kernel for $\mathcal{H}$, and $\mathcal{H}$ is then said to be a reproducing kernel Hilbert space (RKHS), if, for each $p\in X$, we have
(i) $K_p\in\mathcal{H}$;
(ii) $\langle f,K_p\rangle=f(p)$ for all $f\in\mathcal{H}$.
The second property is referred to as the reproducing property of the kernel $K$.
We may then summarize the last few paragraphs with the following characterization: let $\mathcal{H}\subseteq\mathbb{F}^X$ be a Hilbert space; then the following assertions are equivalent:
(a) each evaluation map $\mathrm{ev}_p$, $p\in X$, is continuous on $\mathcal{H}$;
(b) $\mathcal{H}$ admits a reproducing kernel.
In that case, the reproducing kernel admitted by the Hilbert space is unique, by the uniqueness of the Riesz representatives $K_p$ of the evaluation maps. We may further apply the reproducing property to each $K_q$ to obtain that $K(p,q)=\langle K_q,K_p\rangle$ for each $p,q\in X$, yielding the following properties:
(i) For each $p\in X$, $K(p,p)=\|K_p\|^2\geq0$.
(ii) For each $p,q\in X$, $K(q,p)=\overline{K(p,q)}$.
(iii) For all $n\in\mathbb{N}$, $p_1,\ldots,p_n\in X$, and $a_1,\ldots,a_n\in\mathbb{F}$, $\sum_{i,j=1}^n a_i\overline{a_j}K(p_j,p_i)\geq0$, and
\[|K(p,q)|^2\leq K(p,p)\,K(q,q),\qquad p,q\in X.\tag{7}\]
The property in (7) is the analogue of the Schwarz Inequality. As a consequence of it, if $K(p,p)=0$ for some $p\in X$, then $K(p,q)=K(q,p)=0$ for all $q\in X$.
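Written out, the analogue (7) of the Schwarz Inequality is an immediate consequence of the reproducing property and the Schwarz Inequality in $\mathcal{H}$:

```latex
\[
|K(p,q)|^2=\big|\langle K_q,K_p\rangle\big|^2
\le\|K_q\|^2\,\|K_p\|^2
=K(q,q)\,K(p,p),\qquad p,q\in X.
\]
```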
For any $K:X\times X\to\mathbb{F}$, each $K_p\in\mathbb{F}^X$, so we may define the subspace \[\mathcal{H}_K:=\mathrm{span}\{K_p\mid p\in X\}\] of $\mathbb{F}^X$. If $K$ is the reproducing kernel of a Hilbert space $\mathcal{H}$, then $\mathcal{H}_K$ is also a subspace of $\mathcal{H}$; moreover, since any $f\in\mathcal{H}$ orthogonal to every $K_p$ vanishes identically, $\mathcal{H}_K$ is a dense subspace of $\mathcal{H}$, equivalently, $\{K_p\mid p\in X\}$ is a total set for $\mathcal{H}$. The property at item (iii) is known as the positive semidefiniteness property. A positive semidefinite kernel $K$ is called definite if $K(p,p)\neq0$ for all $p\in X$. Positive semidefiniteness is in fact sufficient to characterize all reproducing kernels. By the Moore-Aronszajn Theorem, for any positive semidefinite kernel $K:X\times X\to\mathbb{F}$, there is a unique Hilbert space $\mathcal{H}_K\subseteq\mathbb{F}^X$ with reproducing kernel $K$.
Let us briefly recall the construction of the Hilbert space $\mathcal{H}_K$ in the proof. We first render $\mathcal{H}_K$ a pre-Hilbert space satisfying the reproducing property. Define on $\mathcal{H}_K$ the inner product \[\Big\langle\sum_{i=1}^n a_iK_{p_i},\sum_{j=1}^m b_jK_{q_j}\Big\rangle:=\sum_{i=1}^n\sum_{j=1}^m a_i\overline{b_j}\,K(q_j,p_i),\] for any $\sum_{i=1}^n a_iK_{p_i},\sum_{j=1}^m b_jK_{q_j}\in\mathcal{H}_K$. It is proven that this definition is correct and indeed provides an inner product.
Let $\hat{\mathcal{H}}_K$ be the completion of $\mathcal{H}_K$; then $\hat{\mathcal{H}}_K$ is a Hilbert space with an isometric embedding $\phi:\mathcal{H}_K\to\hat{\mathcal{H}}_K$ whose image is dense in $\hat{\mathcal{H}}_K$. It is proven that this abstract completion can actually be realized in $\mathbb{F}^X$ and that it is the RKHS with reproducing kernel $K$, which we denote by $\mathcal{H}_K$.
In applications, one of the most useful tools is the interplay between reproducing kernels and orthonormal bases of the underlying RKHSs. Although this fact holds in higher generality, we state it for separable Hilbert spaces since, most of the time, this is the case of interest: letting $\mathcal{H}\subseteq\mathbb{F}^X$ be a separable RKHS with reproducing kernel $K$, and letting $\{\phi_n\}_n$ be an orthonormal basis of $\mathcal{H}$, then \[K(p,q)=\sum_n\phi_n(p)\overline{\phi_n(q)},\qquad p,q\in X,\tag{12}\] where the series converges absolutely pointwise.

We now recall a useful result on the construction of new RKHSs and positive semidefinite kernels from existing ones. It also shows that the concept of reproducing kernel Hilbert space is actually a special case of the concept of operator range. Let $\mathcal{H}$ be a Hilbert space and $\phi:\mathcal{H}\to\mathbb{F}^X$ a continuous linear map. Then $\phi(\mathcal{H})\subseteq\mathbb{F}^X$, with the norm \[\|f\|_{\phi(\mathcal{H})}:=\min\{\|h\|_{\mathcal{H}}\mid\phi(h)=f\},\tag{13}\] is a RKHS, unitarily isomorphic to $(\ker\phi)^\perp$. The kernel for $\phi(\mathcal{H})$ is then given by the map \[K(p,q)=(\phi u_q)(p)=\langle u_q,u_p\rangle_{\mathcal{H}},\] where $u_q\in\mathcal{H}$ is such that $\mathrm{ev}_q\circ\phi=\langle\cdot,u_q\rangle$ on $\mathcal{H}$. Applying this proposition to particular continuous linear maps, one obtains useful results for pullbacks, restrictions, sums, scaling, and normalizations of kernels.
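As a quick numerical illustration (not from the paper), on a finite base set the expansion (12) can be checked with an eigendecomposition of the kernel matrix: the functions $\phi_n=\sqrt{\lambda_n}\,v_n$, built from eigenpairs $(\lambda_n,v_n)$, form an orthonormal basis of the finite-dimensional RKHS, and summing $\phi_n(p)\overline{\phi_n(q)}$ recovers $K(p,q)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# A positive semidefinite kernel matrix on a finite base set X = {0, ..., 4}:
# K[p, q] = K(p, q).
A = rng.standard_normal((5, 5))
K = A @ A.T

# Spectral decomposition K = sum_n lam_n v_n v_n^T; the functions
# phi_n(p) = sqrt(lam_n) v_n[p] are orthonormal in H_K, since
# <K a, K b>_{H_K} = b^T K a gives <phi_n, phi_m> = delta_{nm}.
lam, V = np.linalg.eigh(K)
phi = V * np.sqrt(np.clip(lam, 0.0, None))  # column n is phi_n as a function on X

# Check the pointwise expansion K(p, q) = sum_n phi_n(p) phi_n(q).
K_rebuilt = phi @ phi.T
assert np.allclose(K, K_rebuilt)
print("max deviation:", np.abs(K - K_rebuilt).max())
```

The same computation, with `eigh` replaced by a truncated eigenexpansion, is the discrete analogue of Mercer-type expansions used later in Section 4.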

2.2. Integration of RKHS-Valued Functions.
In this article, we use integrals of Hilbert space-valued functions. We first provide fundamental definitions and properties concerning the Bochner integral, an extension of the Lebesgue integral to Banach space-valued functions, following Cohn ([13], Appendix E). Let $(E,\|\cdot\|)$ be a (real or complex) Banach space and $(X,\Sigma,\mu)$ a finite measure space. On $E$, we consider the Borel $\sigma$-algebra, denoted $\mathcal{B}(E)$. A map $f:X\to E$ is called measurable if $f^{-1}(S)\in\Sigma$ for all $S\in\mathcal{B}(E)$, and it is called strongly measurable if it is measurable and its range $f(X)$ is separable. If $E$ is a separable Banach space, then the two concepts coincide. Both the set of measurable functions and the set of strongly measurable functions are vector spaces. It is proven that a function $f:X\to E$ is strongly measurable if and only if there exists a sequence of simple functions $(\phi_n)_n$ such that $\phi_n\to f$ pointwise on $X$; in addition, in this case, the sequence can be chosen such that $\|\phi_n(x)\|\leq\|f(x)\|$ for all $x\in X$ and $n\in\mathbb{N}$.

A function $f:X\to E$ is Bochner integrable if it is strongly measurable and the scalar function $X\ni x\mapsto\|f(x)\|\in\mathbb{R}$ is integrable. In this case, the Bochner integral of $f$ is defined by approximation with simple functions. Bochner integrable functions share many properties with scalar-valued integrable functions, but not all. For example, the collection of all Bochner integrable functions is a vector space, and, for any Bochner integrable function $f$, we have \[\Big\|\int_X f\,\mathrm{d}\mu\Big\|\leq\int_X\|f\|\,\mathrm{d}\mu.\tag{15}\] Also, letting $L^1(X;\mu;E)$ denote the collection of all equivalence classes of Bochner integrable functions, identified $\mu$-almost everywhere, this is a Banach space with norm \[\|f\|_1=\int_X\|f\|\,\mathrm{d}\mu.\] In addition, the Dominated Convergence Theorem holds for the Bochner integral as well, e.g., see ([13], Theorem E.6).
In this article, we will use the following result, which is a special case of a theorem of Hille, e.g., see ([14], Theorem III.2.6). In Hille's Theorem, the linear transformation is supposed to be only closed and, consequently, additional assumptions are needed, so we provide a proof for the special case of bounded linear operators for the reader's convenience.

Theorem 2. Let $E$ and $F$ be Banach spaces, $(X,\Sigma,\mu)$ a measure space, $f:X\to E$ a Bochner integrable function, and $L:E\to F$ a bounded linear operator. Then $L\circ f:X\to F$ is Bochner integrable and \[L\int_X f\,\mathrm{d}\mu=\int_X L\circ f\,\mathrm{d}\mu.\]

Proof. Since $f$ is Bochner integrable, there exists a sequence $(\phi_n)_n$ of simple functions that converges pointwise to $f$ on $X$, with $\|\phi_n(x)\|\leq\|f(x)\|$ for all $x\in X$ and all $n\in\mathbb{N}$. Then \[\|L\phi_n(x)-Lf(x)\|\leq\|L\|\,\|\phi_n(x)-f(x)\|\xrightarrow[n\to\infty]{}0,\] hence the sequence $(L\circ\phi_n)_n$ converges pointwise to $L\circ f$. Also, it is easy to see that $L\circ\phi_n$ is a simple function for all $n\in\mathbb{N}$. These show that $L\circ f$ is strongly measurable.

Journal of Function Spaces
Since $\|Lf(x)\|\leq\|L\|\,\|f(x)\|$ for all $x\in X$ and $f$ is Bochner integrable, it follows that \[\int_X\|Lf(x)\|\,\mathrm{d}\mu(x)\leq\|L\|\int_X\|f(x)\|\,\mathrm{d}\mu(x)<\infty,\] hence $L\circ f$ is Bochner integrable.
On the other hand, $L\int_X\phi_n\,\mathrm{d}\mu=\int_X L\circ\phi_n\,\mathrm{d}\mu$ for each $n\in\mathbb{N}$, since each $\phi_n$ is simple, and $\|L\phi_n(x)\|\leq\|L\|\,\|f(x)\|$ provides an integrable dominating function; hence, by the Dominated Convergence Theorem for the Bochner integral, it follows that \[L\int_X f\,\mathrm{d}\mu=\lim_{n\to\infty}L\int_X\phi_n\,\mathrm{d}\mu=\lim_{n\to\infty}\int_X L\circ\phi_n\,\mathrm{d}\mu=\int_X L\circ f\,\mathrm{d}\mu.\] ☐

A direct consequence of this fact is a sufficient condition for when a pointwise integral coincides with the Bochner integral, valid not only for RKHSs but also for Banach spaces of functions on which the evaluation maps at all points are continuous, e.g., $C(Y)$ for some compact Hausdorff space $Y$.

Proposition 3. Let $(X,\Sigma,\mu)$ be a measure space and $B\subseteq\mathbb{F}^X$ a Banach space of functions on $X$ such that all evaluation maps on $B$ are continuous. Let $\lambda:X\times X\to\mathbb{F}$ be such that, for each $q\in X$, we have $\lambda_q:=\lambda(\cdot,q)\in B$.
If the map $X\ni q\mapsto\lambda_q\in B$ is Bochner integrable, then the scalar map $X\ni q\mapsto\lambda(p,q)\in\mathbb{F}$ is integrable, for each fixed $p\in X$.
Moreover, in that case, the pointwise integral map $X\ni p\mapsto\int_X\lambda(p,q)\,\mathrm{d}\mu(q)$ lies in $B$ and coincides with the Bochner integral $\int_X\lambda_q\,\mathrm{d}\mu(q)$.
Proof. Since the map $X\ni q\mapsto\phi(q):=\lambda(\cdot,q)\in B$ is Bochner integrable and, for each $p\in X$, the linear functional $\mathrm{ev}_p$ is continuous, by Theorem 2 we have \[\mathrm{ev}_p\Big(\int_X\phi(q)\,\mathrm{d}\mu(q)\Big)=\int_X(\mathrm{ev}_p\circ\phi)(q)\,\mathrm{d}\mu(q).\] Since $\mathrm{ev}_p\circ\phi(q)=\lambda(p,q)$ for all $p,q\in X$, this means that the scalar map $X\ni q\mapsto\lambda(p,q)\in\mathbb{F}$ is integrable, for each fixed $p\in X$, and hence the pointwise integral map $X\ni p\mapsto\int_X\lambda(p,q)\,\mathrm{d}\mu(q)$ lies in $B$ and coincides with the Bochner integral $\int_X\lambda_q\,\mathrm{d}\mu(q)$. ☐

Main Results
Throughout this section, we consider a probability measure space $(X;\Sigma;P)$ and a RKHS $(\mathcal{H};\langle\cdot,\cdot\rangle)$ in $\mathbb{F}^X$, with norm denoted by $\|\cdot\|_{\mathcal{H}}$, such that its reproducing kernel $K$ is measurable. In addition, throughout this section, the reproducing kernel Hilbert space $\mathcal{H}$ is supposed to be separable.
3.1. The Reproducing Kernel Hilbert Space $\mathcal{H}_P$. On the measurable space $(X;\Sigma)$, we define the measure $P_K$ by \[\mathrm{d}P_K(x)=K(x,x)\,\mathrm{d}P(x),\qquad x\in X,\tag{24}\] that is, $P_K$ is the measure absolutely continuous with respect to $P$ such that the function $X\ni x\mapsto K(x,x)$ is the Radon-Nikodym derivative of $P_K$ with respect to $P$.
With respect to the measure space $(X;\Sigma;P_K)$, we consider the Hilbert space $L^2(X;P_K)$. Our approach is based on the following natural bounded linear operator mapping $L^2(X;P_K)$ to $\mathcal{H}$.

Proposition 4. With notation and assumptions as before, let $\lambda:X\to\mathbb{F}$ be a measurable function such that the integral $\int_X|\lambda(x)|^2\,\mathrm{d}P_K(x)$ is finite. Then the Bochner integral \[\int_X\lambda(x)K_x\,\mathrm{d}P(x)\] exists in $\mathcal{H}$.
In addition, the mapping \[L^2(X;P_K)\ni\lambda\mapsto L_{P,K}\lambda:=\int_X\lambda(x)K_x\,\mathrm{d}P(x)\in\mathcal{H}\tag{26}\] is a nonexpansive, hence bounded, linear operator.
Proof. By the assumptions, the map $X\ni x\mapsto\lambda(x)K_x\in\mathcal{H}$ is measurable and, since $\mathcal{H}$ is separable, it follows that this map is actually strongly measurable. Letting $\|\cdot\|_{\mathcal{H}}$ denote the norm on $\mathcal{H}$ and using the assumption that $\int_X|\lambda(x)|^2K(x,x)\,\mathrm{d}P(x)$ is finite, we have \[\int_X\|\lambda(x)K_x\|_{\mathcal{H}}\,\mathrm{d}P(x)=\int_X|\lambda(x)|\,K(x,x)^{1/2}\,\mathrm{d}P(x),\] hence, by the Schwarz Inequality and taking into account that $P$ is a probability measure, \[\int_X\|\lambda(x)K_x\|_{\mathcal{H}}\,\mathrm{d}P(x)\leq\Big(\int_X|\lambda(x)|^2K(x,x)\,\mathrm{d}P(x)\Big)^{1/2}<\infty.\] This implies that the Bochner integral $\int_X\lambda(x)K_x\,\mathrm{d}P(x)$ exists in $\mathcal{H}$. Consequently, the mapping $L_{P,K}$ as in (26) is correctly defined, and it is clear that it is a linear transformation.
For arbitrary $\lambda\in L^2(X;P_K)$, by the triangle inequality (15) for the Bochner integral, we then have \[\|L_{P,K}\lambda\|_{\mathcal{H}}\leq\int_X|\lambda(x)|\,K(x,x)^{1/2}\,\mathrm{d}P(x),\] and, applying the Schwarz Inequality for the integral and taking into account that $P$ is a probability measure, \[\|L_{P,K}\lambda\|_{\mathcal{H}}\leq\Big(\int_X|\lambda(x)|^2K(x,x)\,\mathrm{d}P(x)\Big)^{1/2}=\|\lambda\|_{L^2(X;P_K)};\] hence $L_{P,K}:L^2(X;P_K)\to\mathcal{H}$ is a nonexpansive linear operator. ☐

Using the bounded linear operator $L_{P,K}$ defined as in (26), let us denote its range by \[\mathcal{H}_P:=L_{P,K}\big(L^2(X;P_K)\big),\tag{31}\] which is a subspace of the RKHS $\mathcal{H}$.
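The nonexpansiveness of $L_{P,K}$ can be observed in miniature. The following sketch (not from the paper; it assumes a finite base set, so that $L_{P,K}\lambda=\sum_x P(\{x\})\lambda(x)K_x$ and all norms are finite-dimensional quadratic forms) compares $\|L_{P,K}\lambda\|_{\mathcal{H}}$ with $\|\lambda\|_{L^2(X;P_K)}$:

```python
import numpy as np

rng = np.random.default_rng(1)

m = 6
A = rng.standard_normal((m, m))
K = A @ A.T                      # PSD kernel matrix on X = {0, ..., m-1}
p = rng.random(m); p /= p.sum()  # probability measure P on X

lam = rng.standard_normal(m)     # a function lambda in L^2(X; P_K)

# L_{P,K} lambda = sum_x p(x) lambda(x) K_x has coefficient vector c below;
# the RKHS norm of sum_x c(x) K_x is sqrt(c^T K c).
c = p * lam
norm_H = np.sqrt(c @ K @ c)

# ||lambda||_{L^2(P_K)}^2 = sum_x |lambda(x)|^2 K(x, x) p(x).
norm_L2 = np.sqrt(np.sum(lam**2 * np.diag(K) * p))

assert norm_H <= norm_L2 + 1e-12   # nonexpansiveness, as in Proposition 4
print(norm_H, norm_L2)
```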
Proposition 5. $\mathcal{H}_P$ is a RKHS contained in $\mathcal{H}$, hence in $\mathbb{F}^X$, and its reproducing kernel $K_P$ is \[K_P(x,y)=\int_X\frac{K(x,z)K(z,y)}{K(z,z)}\,\mathrm{d}P(z),\qquad x,y\in X,\] where, whenever $K(z,z)=0$, by convention we set $K(x,z)K(z,y)/K(z,z)=0$ for all $x,y\in X$.
Proof. Since $L^2(X;P_K)$ is a Hilbert space and $L_{P,K}$ is a bounded linear map, by (13) it follows that $\mathcal{H}_P$ is a RKHS in $\mathbb{F}^X$, isometrically isomorphic to the orthogonal complement of $\ker L_{P,K}\subseteq L^2(X;P_K)$, and its norm is given by \[\|f\|_{\mathcal{H}_P}=\min\{\|\lambda\|_{L^2(X;P_K)}\mid L_{P,K}\lambda=f\}.\] Let \[X_0:=\{z\in X\mid K(z,z)=0\},\] and let us define $u_x:X\to\mathbb{F}$ by \[u_x(z):=\frac{K(z,x)}{K(z,z)},\quad z\in X\setminus X_0,\qquad u_x(z):=0,\quad z\in X_0.\] From the Schwarz Inequality (7) for the kernel $K$, it follows that if $x\in X_0$ then $K(x,y)=0$ for all $y\in X$. This shows that $u_x=0$ for all $x\in X_0$.
For each $x\in X$, by the Schwarz Inequality and the fact that $P$ is a probability measure, we have \[\|u_x\|^2_{L^2(X;P_K)}=\int_{X\setminus X_0}\frac{|K(z,x)|^2}{K(z,z)^2}\,K(z,z)\,\mathrm{d}P(z)\leq\int_X K(x,x)\,\mathrm{d}P(z)=K(x,x),\] hence $u_x\in L^2(X;P_K)$. Then, taking into account that $K(x,y)=0$ for all $y\in X_0$ and all $x\in X$, it follows that, for each $\lambda\in L^2(X;P_K)$ and $x\in X$, we have \[(L_{P,K}\lambda)(x)=\int_X\lambda(z)K(x,z)\,\mathrm{d}P(z)=\int_{X\setminus X_0}\lambda(z)\,\overline{u_x(z)}\,K(z,z)\,\mathrm{d}P(z)=\langle\lambda,u_x\rangle_{L^2(X;P_K)}.\] In conclusion, $u_x$ is exactly the representative of the functional $\mathrm{ev}_x\circ L_{P,K}$, so, by (13), the kernel of $\mathcal{H}_P$ is \[K_P(x,y)=\langle u_y,u_x\rangle_{L^2(X;P_K)},\] and, using the convention that $K(x,z)K(z,y)/K(z,z)=0$ whenever $K(z,z)=0$, for arbitrary $x,y\in X$, \[K_P(x,y)=\int_X\frac{K(x,z)K(z,y)}{K(z,z)}\,\mathrm{d}P(z).\] ☐

One of the main results of this article, see Theorem 11, assumes that the space $\mathcal{H}_P$ is dense in $\mathcal{H}$. The next proposition provides sufficient conditions for this.

Proposition 6. Let $X$ be a topological space, $P$ a Borel probability measure on $X$, $\mathcal{H}\subseteq\mathbb{F}^X$ a RKHS with measurable kernel $K$, and let $P_K$, $L_{P,K}$, and $\mathcal{H}_P$ be defined as in (24), (26), and (31), respectively.
Suppose that $K$ is continuous on $X$, that $\mathcal{H}\subseteq L^2(X;P_K)$, and that $P$ is strictly positive on any nonempty open subset of $X$. Then $\mathcal{H}_P$ is dense in $\mathcal{H}$.
Proof. The assertion is clearly equivalent to showing that the orthogonal complement of $\mathcal{H}_P$ in $\mathcal{H}$ is the null space. To this end, let $f\in\mathcal{H}$ with $f\perp\mathcal{H}_P$, that is, for each $\lambda\in L^2(X;P_K)$, we have \[\Big\langle f,\int_X\lambda(x)K_x\,\mathrm{d}P(x)\Big\rangle_{\mathcal{H}}=0.\] Then, noting that $\int_X\lambda(x)K_x\,\mathrm{d}P(x)$ is a Bochner integral, and hence, by Theorem 2, it commutes with inner products, \[0=\int_X\overline{\lambda(x)}\,\langle f,K_x\rangle\,\mathrm{d}P(x)=\int_X\overline{\lambda(x)}\,f(x)\,\mathrm{d}P(x).\] By assumption, $f\in\mathcal{H}\subseteq L^2(X;P_K)$, so we can take $\lambda=f$ to obtain \[\int_X|f(x)|^2\,\mathrm{d}P(x)=0.\] This implies that $f=0$ $P$-almost everywhere, i.e., the set $f^{-1}(\mathbb{F}\setminus\{0\})$ has zero $P$ measure.
Since $K$ is continuous by assumption, every function in $\mathcal{H}$ is continuous, hence $f^{-1}(\mathbb{F}\setminus\{0\})$ is an open subset of $X$. But, since $P$ is assumed strictly positive on any nonempty open set, it follows that $f^{-1}(\mathbb{F}\setminus\{0\})$ must be empty, hence $f=0$ identically. ☐

3.2. Probability Error Bounds of Approximation.

The first step in our enterprise is to find error bounds for approximations of functions in the reproducing kernel Hilbert space $\mathcal{H}$ in terms of finite linear combinations of functions of type $K_x$. To do that, we use the celebrated Markov-Bienaymé-Chebyshev Inequality on the concentration of probability measures to obtain regions of large measure with small approximation error, measured in the Hilbert space norm and not simply the uniform norm.
Theorem 7 (Markov-Bienaymé-Chebyshev Inequality). Let $(X;\Sigma;P)$ be a probability space, $(B,\|\cdot\|)$ a Banach space, and let $f,g:X\to B$ be two Borel measurable functions. Then, for any $\delta>0$, we have \[P\big(\{x\in X\mid\|f(x)-g(x)\|\geq\delta\}\big)\leq\frac{1}{\delta^2}\int_X\|f(x)-g(x)\|^2\,\mathrm{d}P(x).\tag{43}\]

The classical Bienaymé-Chebyshev Inequality is obtained from (43) applied for $B=\mathbb{R}$, $g(x)=E(f)$, and $\delta=k\sigma$, for $k>0$, where $E(f)=\int_X f(x)\,\mathrm{d}P(x)$ is the expected value of the random variable $f$ and \[\sigma^2=\int_X|f(x)-E(f)|^2\,\mathrm{d}P(x).\]

Theorem 8. With notation and assumptions as before, let $\lambda\in L^2(X;P_K)$ and $f\in\mathcal{H}$. For each $n\in\mathbb{N}$ and $\delta>0$, consider the set \[A_{n,\delta}:=\Big\{(x_1,\ldots,x_n)\in X^n\,\Big|\,\Big\|f-\frac{1}{n}\sum_{i=1}^n\lambda(x_i)K_{x_i}\Big\|_{\mathcal{H}}<\delta\Big\}.\tag{45}\] Then, letting $P^n$ denote the product probability measure on $X^n$ and defining the bounded linear operator $L_{P,K}$ as in (26), we have \[P^n(A_{n,\delta})\geq1-\frac{1}{\delta^2}\Big(\|f-L_{P,K}\lambda\|^2_{\mathcal{H}}+\frac{1}{n}\big(\|\lambda\|^2_{L^2(X;P_K)}-\|L_{P,K}\lambda\|^2_{\mathcal{H}}\big)\Big).\]

Proof. By Proposition 4, the Bochner integral $\int_X\lambda(x)K_x\,\mathrm{d}P(x)$ exists in $\mathcal{H}$ and the linear operator $L_{P,K}$ is well defined and bounded. In order to simplify the notation, consider $g:X^n\to\mathcal{H}$, the function defined by \[g(x_1,\ldots,x_n):=f-\frac{1}{n}\sum_{i=1}^n\lambda(x_i)K_{x_i};\] observe that $g$ is measurable and, for each $\delta>0$, we have \[A_{n,\delta}=\{x\in X^n\mid\|g(x)\|_{\mathcal{H}}<\delta\}.\] Then, expanding the square of the norm, we have \[\|g(x)\|^2_{\mathcal{H}}=\|f\|^2_{\mathcal{H}}-\frac{2}{n}\,\mathrm{Re}\sum_{i=1}^n\overline{\lambda(x_i)}f(x_i)+\frac{1}{n^2}\sum_{i,j=1}^n\lambda(x_i)\overline{\lambda(x_j)}K(x_j,x_i).\] Since $P^n$ is a probability measure, we have \[\int_{X^n}\|f\|^2_{\mathcal{H}}\,\mathrm{d}P^n=\|f\|^2_{\mathcal{H}}.\] On the other hand, by Fubini's theorem and the fact that the Bochner integral commutes with continuous linear operations, see Theorem 2, we have \[\int_{X^n}\overline{\lambda(x_i)}f(x_i)\,\mathrm{d}P^n(x)=\int_X\overline{\lambda(t)}f(t)\,\mathrm{d}P(t)=\langle f,L_{P,K}\lambda\rangle_{\mathcal{H}},\] \[\int_{X^n}|\lambda(x_i)|^2K(x_i,x_i)\,\mathrm{d}P^n(x)=\|\lambda\|^2_{L^2(X;P_K)},\] and, for each $i,j=1,\ldots,n$, $i\neq j$, \[\int_{X^n}\lambda(x_i)\overline{\lambda(x_j)}K(x_j,x_i)\,\mathrm{d}P^n(x)=\|L_{P,K}\lambda\|^2_{\mathcal{H}}.\] Integrating both sides of the expansion of $\|g\|^2_{\mathcal{H}}$ and using all the previous equalities, we therefore have \[\int_{X^n}\|g\|^2_{\mathcal{H}}\,\mathrm{d}P^n=\|f-L_{P,K}\lambda\|^2_{\mathcal{H}}+\frac{1}{n}\big(\|\lambda\|^2_{L^2(X;P_K)}-\|L_{P,K}\lambda\|^2_{\mathcal{H}}\big).\] Finally, in view of the Markov-Bienaymé-Chebyshev Inequality (43), with $X$ replaced by $X^n$ and $P$ by $P^n$, and taking into account the previous equality, we get the required inequality. ☐
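The variance identity at the heart of Theorem 8's proof can be checked by simulation. The following sketch (not from the paper; it assumes a finite base set, where $L_{P,K}\lambda$ has coefficient vector $c(x)=P(\{x\})\lambda(x)$ and RKHS norms are quadratic forms in the kernel matrix) estimates $\mathbb{E}\,\|f-\frac{1}{n}\sum_i\lambda(x_i)K_{x_i}\|^2_{\mathcal{H}}$ by Monte Carlo for $f=L_{P,K}\lambda$ and compares it with $\frac{1}{n}(\|\lambda\|^2_{L^2(P_K)}-\|L_{P,K}\lambda\|^2_{\mathcal{H}})$:

```python
import numpy as np

rng = np.random.default_rng(3)

m, n, T = 5, 10, 20000
A = rng.standard_normal((m, m)); K = A @ A.T   # kernel matrix on X = {0, ..., 4}
p = rng.random(m); p /= p.sum()                # probability measure P
lam = rng.standard_normal(m)                   # lambda in L^2(X; P_K)

c = p * lam                                    # coefficients of f := L_{P,K} lambda
# Theorem 8's computation with f = L lambda:
# E || f - (1/n) sum_i lam(x_i) K_{x_i} ||_H^2
#   = (1/n) ( ||lam||^2_{L^2(P_K)} - ||L lam||^2_H ).
var_term = (np.sum(lam**2 * np.diag(K) * p) - c @ K @ c) / n

idx = rng.choice(m, size=(T, n), p=p)          # T independent samplings of size n
counts = np.eye(m)[idx].sum(axis=1)            # empirical counts per sampling
d = counts / n * lam - c                       # coefficient error vectors
mc = np.einsum('ti,ij,tj->t', d, K, d).mean()  # Monte Carlo estimate of E ||.||_H^2

assert abs(mc - var_term) < 0.1 * var_term
print(mc, var_term)
```

The $1/n$ decay of this expected squared error is exactly what drives the probability bound via (43).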

3.3. Convergence in Probability.
As with the special case of kernel embeddings, for which $\lambda=1$, see Smola et al. [15], we may use the bound in Theorem 8 to obtain a statement of convergence in probability.
With notation and assumptions as before, given $f\in\mathcal{H}$ and fixed $(x_i)_{i=1}^N\in X^N$, we consider the problem of finding the optimal approximation of $f$ in $\mathrm{span}\{K_{x_i}\}_{i=1}^N$. We may assume, without loss of generality, that $\{K_{x_i}\}_{i=1}^N$ are linearly independent, by removing points as necessary without affecting $\mathrm{span}\{K_{x_i}\}_{i=1}^N$ (or losing any information about $f$, by the reproducing property). According to Körezlioglu [16], if $(x_i)_{i=1}^N\in X^N$ is a sampling such that $\{K_{x_i}\}_{i=1}^N$ are linearly independent, then, considering the finite-dimensional subspace $\mathcal{H}^N_x:=\mathrm{span}\{K_{x_i}\}_{i=1}^N$ of $\mathcal{H}$, the orthogonal projection $\pi^N_x$ of $\mathcal{H}$ onto $\mathcal{H}^N_x$ is given by \[\pi^N_xf=\sum_{i=1}^N\omega^\pi_iK_{x_i},\] for any $f\in\mathcal{H}$, where the coefficients $(\omega^\pi_i)_{i=1}^N$ solve the linear system $\sum_{i=1}^N\omega^\pi_iK(x_j,x_i)=f(x_j)$, $j=1,\ldots,N$, associated to the Gram matrix of $\{x_1,\ldots,x_N\}$. More generally, if $\{K_{x_i}\}_{i=1}^N$ are not linearly independent, the same formula applies for any maximal linearly independent subset. Note that, in general, $\omega^\pi_i$ is not simply a multiple of $f(x_i)$; hence, setting $\omega_i:=V_if(x_i)$ for any fixed $V_i$ will not yield the best possible approximation. However, with such coefficients, depending only on $x_i$, it is easier to bound $\|f-\sum_i\omega_iK_{x_i}\|_{\mathcal{H}}$. Then, any upper bound on $\|f-\sum_i\omega_iK_{x_i}\|_{\mathcal{H}}$ for some fixed $(\omega_i)_i$ will also be an upper bound on $\|f-\pi^N_xf\|_{\mathcal{H}}$.
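The Gram-matrix formula for $\pi^N_x$ can be sketched concretely. The following example (not from the paper; it assumes a Gaussian kernel and a function $f$ that is itself a finite combination of kernel sections, so that all RKHS quantities are computable in closed form) solves the linear system for $\omega^\pi$ and checks the interpolation and positivity properties of the projection:

```python
import numpy as np

# A concrete positive semidefinite kernel (any other PSD kernel works the same way).
def k(s, t):
    return np.exp(-0.5 * (s[:, None] - t[None, :]) ** 2)

x = np.array([-1.0, -0.3, 0.4, 1.2])   # sample points x_1, ..., x_N
y = np.array([-0.8, 0.1, 0.9])         # f = sum_k a_k K_{y_k} lies in H
a = np.array([1.0, -2.0, 0.5])

G = k(x, x)                            # Gram matrix G[j, i] = K(x_j, x_i)
fx = k(x, y) @ a                       # point values f(x_j)
w = np.linalg.solve(G, fx)             # omega^pi solves G w = (f(x_j))_j

# pi_x^N f = sum_i w_i K_{x_i} interpolates f at the sample points, since
# <f - pi f, K_{x_j}> = f(x_j) - (pi f)(x_j) = 0 by the reproducing property.
px = G @ w
assert np.allclose(px, fx)

# ||f - pi f||_H^2 = ||f||_H^2 - <f, pi f>_H = a^T K_yy a - w^T fx >= 0.
res2 = a @ k(y, y) @ a - w @ fx
assert res2 > -1e-9
print("squared projection error:", res2)
```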
Theorem 9 (Convergence in Probability of Projections). Let $X$, $P$, $K$, and $\mathcal{H}$ be as in Theorem 8. For each sequence $x=(x_i)_i\in X^{\mathbb{N}}$ and each $n\in\mathbb{N}$, let $\pi^n_x$ denote the orthogonal projection of $\mathcal{H}$ onto $\mathrm{span}\{K_{x_i}\}_{i=1}^n$. Let $f\in\mathcal{H}$ and, for each $\delta>0$ and $n\in\mathbb{N}$, define \[B_{n,\delta}:=\{x\in X^n\mid\|f-\pi^n_xf\|_{\mathcal{H}}<\delta\}.\] Then, for each $\delta>0$, \[\liminf_{n\to\infty}P^n(B_{n,\delta})\geq1-\frac{d_{\mathcal{H}}(f,\mathcal{H}_P)^2}{\delta^2},\] where $d_{\mathcal{H}}(f,\mathcal{H}_P)=\inf_{g\in\mathcal{H}_P}\|f-g\|_{\mathcal{H}}$.
In particular, if $f$ belongs to $\overline{\mathcal{H}_P}^{\mathcal{H}}$, the closure of $\mathcal{H}_P$ with respect to the topology of $\mathcal{H}$, then \[\lim_{n\to\infty}P^n(B_{n,\delta})=1.\]

Proof. Let $\lambda\in L^2(X;P_K)$ and fix $\delta>0$ arbitrary. Then \[\|f-\pi^n_xf\|_{\mathcal{H}}\leq\Big\|f-\frac{1}{n}\sum_{i=1}^n\lambda(x_i)K_{x_i}\Big\|_{\mathcal{H}},\tag{61}\] hence, with notation as in (45), we have $A_{n,\delta}\subseteq B_{n,\delta}$. By Theorem 8, this implies \[P^n(B_{n,\delta})\geq P^n(A_{n,\delta})\geq1-\frac{1}{\delta^2}\Big(\|f-L_{P,K}\lambda\|^2_{\mathcal{H}}+\frac{1}{n}\big(\|\lambda\|^2_{L^2(X;P_K)}-\|L_{P,K}\lambda\|^2_{\mathcal{H}}\big)\Big).\] Therefore, \[\liminf_{n\to\infty}P^n(B_{n,\delta})\geq1-\frac{\|f-L_{P,K}\lambda\|^2_{\mathcal{H}}}{\delta^2}.\] Thus, since the left-hand side is independent of $\lambda$, \[\liminf_{n\to\infty}P^n(B_{n,\delta})\geq1-\frac{d_{\mathcal{H}}(f,\mathcal{H}_P)^2}{\delta^2}.\] In particular, if $f$ belongs to $\overline{\mathcal{H}_P}^{\mathcal{H}}$, then $d_{\mathcal{H}}(f,\mathcal{H}_P)=0$. ☐

3.4. Uniqueness Sets and Almost Certain Convergence of Projections.

With notation and assumptions as before, we now follow ([2], Subsection 2.4.4) in recalling the strong convergence of $\pi^N_x$ to the identity map as $N\to\infty$, for appropriately chosen $(x_i)_{i=1}^\infty$. Since $\mathcal{H}$ is separable, there exists a countable subset of $\{K_p\}_{p\in X}$ which is total in $\mathcal{H}$; thus, there exists a countable set $F\subseteq X$ such that $\mathrm{span}\{K_x\}_{x\in F}$ is dense in $\mathcal{H}$. This motivates the following definition: a sequence of points $x=(x_i)_{i=1}^\infty\in X^{\mathbb{N}}$ is called a uniqueness set for $\mathcal{H}$ if $\mathrm{span}\{K_{x_i}\}_{i=1}^\infty$ is dense in $\mathcal{H}$. In that case, for every $f\in\mathcal{H}$, \[\pi^n_xf\xrightarrow[n\to\infty]{}f\] under the topology of $\mathcal{H}$, with distance decreasing monotonically. Consequently, \[\langle\pi^n_xf,g\rangle\xrightarrow[n\to\infty]{}\langle f,g\rangle,\] for $f,g\in\mathcal{H}$, and \[(\pi^n_xf)(p)\xrightarrow[n\to\infty]{}f(p),\] for $f\in\mathcal{H}$, $p\in X$. This has implications in interpolation theory, e.g., see ([2], Corollary 2.6).
Coming back to our problem, by noting that $\|f-\pi^n_xf\|_{\mathcal{H}}$ is monotonically nonincreasing with respect to $n$, our next goal is to strengthen Theorem 9 to almost certain convergence, after passing to a single measure space. First, recall, e.g., see ([13], Proposition 10.6.1), that the countably infinite product space $X^{\mathbb{N}}$, equipped with the smallest $\sigma$-algebra rendering each projection map $X_i:X^{\mathbb{N}}\to X$ measurable, admits a unique probability measure $P^{\mathbb{N}}$ such that the projection maps are independent random variables with distribution $P$.

Lemma 10. Let $X$, $P$, $K$, and $\mathcal{H}$ be as in Theorem 8 and $f\in\mathcal{H}$. For each $\delta>0$ define \[S_{n,\delta}:=\{x\in X^{\mathbb{N}}\mid\|f-\pi^n_xf\|_{\mathcal{H}}<\delta\},\qquad n\in\mathbb{N},\] and \[S_\delta:=\bigcup_{n=1}^\infty S_{n,\delta}.\tag{69}\] Then \[P^{\mathbb{N}}(S_\delta)\geq1-\frac{d_{\mathcal{H}}(f,\mathcal{H}_P)^2}{\delta^2},\] and, consequently, if $f\in\overline{\mathcal{H}_P}^{\mathcal{H}}$, then $P^{\mathbb{N}}(S_\delta)=1$.

Proof. Since $\|f-\pi^m_xf\|_{\mathcal{H}}\leq\|f-\pi^n_xf\|_{\mathcal{H}}$ whenever $m\geq n$, we have $S_{n,\delta}\subseteq S_{m,\delta}$ for each $\delta>0$. Then, by continuity from below of $P^{\mathbb{N}}$, \[P^{\mathbb{N}}(S_\delta)=\lim_{n\to\infty}P^{\mathbb{N}}(S_{n,\delta});\] hence, for any $\lambda\in L^2(X;P_K)$, by Theorem 8 and the inequality (61), and since $P^{\mathbb{N}}$ is monotone and $S_{N,\delta}\subseteq S_\delta$ for all $N\in\mathbb{N}$, \[P^{\mathbb{N}}(S_\delta)\geq1-\frac{\|f-L_{P,K}\lambda\|^2_{\mathcal{H}}}{\delta^2}.\] Taking the infimum over $\lambda$ yields the conclusion. ☐

The main result of this subsection is the following.
Theorem 11 (Almost Certain Convergence of Projections). Let $X$, $P$, $K$, $\mathcal{H}$ be as in Theorem 8 and suppose $\mathcal{H}_P$ is dense in $\mathcal{H}$. Then, for each $f\in\mathcal{H}$, \[P^{\mathbb{N}}\big(\{x\in X^{\mathbb{N}}\mid\lim_{n\to\infty}\|f-\pi^n_xf\|_{\mathcal{H}}=0\}\big)=1,\] hence \[P^{\mathbb{N}}\big(\{x\in X^{\mathbb{N}}\mid\lim_{n\to\infty}\|f-\pi^n_xf\|_{\mathcal{H}}=0\text{ for all }f\in\mathcal{H}\}\big)=1.\]

Proof. Let $f\in\mathcal{H}$. With the same sets $S_\delta$ defined in (69), \[\Big\{x\in X^{\mathbb{N}}\,\Big|\,\lim_{n\to\infty}\|f-\pi^n_xf\|_{\mathcal{H}}=0\Big\}=\bigcap_{m=1}^\infty S_{1/m}.\] Observe further that $S_{\delta'}\subseteq S_\delta$ whenever $\delta>\delta'$, and for each $\delta>0$ there exists $m\in\mathbb{N}$ such that $\delta>1/m$, so that \[\bigcap_{\delta>0}S_\delta=\bigcap_{m=1}^\infty S_{1/m};\] thus, taking into account that $\mathcal{H}_P$ is dense in $\mathcal{H}$ and using Lemma 10, each $S_{1/m}$ has $P^{\mathbb{N}}$-measure $1$, hence so does the countable intersection.

Since $\mathcal{H}$ is separable, let $D$ be a countable dense subset of $\mathcal{H}$. Since each $\pi^n_x$ is a continuous linear operator with operator norm at most $1$, for any $f\in\mathcal{H}$, $g\in D$, and $n\in\mathbb{N}$, \[\|f-\pi^n_xf\|_{\mathcal{H}}\leq\|f-g\|_{\mathcal{H}}+\|g-\pi^n_xg\|_{\mathcal{H}}+\|\pi^n_x(g-f)\|_{\mathcal{H}}\leq2\|f-g\|_{\mathcal{H}}+\|g-\pi^n_xg\|_{\mathcal{H}}.\] Thus, by the countable subadditivity of $P^{\mathbb{N}}$, the set \[\bigcap_{g\in D}\Big\{x\in X^{\mathbb{N}}\,\Big|\,\lim_{n\to\infty}\|g-\pi^n_xg\|_{\mathcal{H}}=0\Big\}\] has $P^{\mathbb{N}}$-measure $1$, and on this set $\lim_{n\to\infty}\|f-\pi^n_xf\|_{\mathcal{H}}=0$ for every $f\in\mathcal{H}$. ☐

In summary, for a given probability measure $P$, under the assumption that it renders the space $\mathcal{H}_P$, the image of $L_{P,K}$, dense in $\mathcal{H}$, a sequence of points sampled independently from $P$ yields a uniqueness set with probability $1$. Proposition 6 provides a sufficient condition, valid for many applications, under which this assumption holds.
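The almost certain convergence can be watched numerically. The following sketch (not from the paper; it assumes a Gaussian kernel on $[0,1]$, the uniform measure as $P$, and a function $f$ that is a finite combination of kernel sections) computes $\|f-\pi^n_xf\|^2_{\mathcal{H}}$ along a single iid sample path via the Gram matrix, using a pseudo-inverse so that near-dependent samples still define the projection onto the span:

```python
import numpy as np

rng = np.random.default_rng(4)

def k(s, t):
    return np.exp(-8.0 * np.subtract.outer(s, t) ** 2)

# f = sum_k a_k K_{y_k}, a fixed element of the RKHS on [0, 1].
y = np.array([0.2, 0.5, 0.8])
a = np.array([1.0, -1.0, 2.0])
f_norm2 = a @ k(y, y) @ a

x = rng.uniform(0.0, 1.0, size=60)   # one iid sample path from P = Unif[0, 1]

def sq_dist(n):
    # ||f - pi_x^n f||_H^2 = ||f||_H^2 - fx^T G^+ fx (projection onto span{K_{x_i}})
    G = k(x[:n], x[:n])
    fx = k(x[:n], y) @ a
    w, *_ = np.linalg.lstsq(G, fx, rcond=None)
    return f_norm2 - fx @ w

errs = [sq_dist(n) for n in (1, 5, 20, 60)]
print(errs)
assert errs[0] >= errs[-1]           # monotone decrease along the path
assert errs[-1] < 1e-3 * f_norm2     # projection error is already negligible
```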
Examples

4.1. Uniform Distribution on a Compact Interval. Let $(\mu_j)_{j\in\mathbb{Z}}\in\ell^1(\mathbb{Z})$ be such that $\mu_j>0$ for all $j\in\mathbb{Z}$ and denote $\mu:=\sum_{j\in\mathbb{Z}}\mu_j$. For each $j\in\mathbb{Z}$ define \[\phi_j(t):=e^{ijt},\qquad t\in[-\pi,\pi],\] and consider the Hilbert space \[\mathcal{H}:=\Big\{f=\sum_{j\in\mathbb{Z}}c_j\phi_j\,\Big|\,\sum_{j\in\mathbb{Z}}\frac{|c_j|^2}{\mu_j}<\infty\Big\}\] with the inner product \[\langle f,g\rangle_{\mathcal{H}}:=\sum_{j\in\mathbb{Z}}\frac{c_j\overline{d_j}}{\mu_j},\qquad f=\sum_{j\in\mathbb{Z}}c_j\phi_j,\quad g=\sum_{j\in\mathbb{Z}}d_j\phi_j.\] Then $\{\sqrt{\mu_j}\,\phi_j\}_{j\in\mathbb{Z}}$ is an orthonormal basis of $\mathcal{H}$ and, for an arbitrary function $f\in\mathcal{H}$, we have the Fourier representation \[f=\sum_{j\in\mathbb{Z}}c_j\phi_j,\tag{83}\] with coefficients $\{c_j\}_{j\in\mathbb{Z}}$ subject to the condition \[\sum_{j\in\mathbb{Z}}\frac{|c_j|^2}{\mu_j}<\infty,\tag{84}\] where the convergence of the series in (83) is at least guaranteed with respect to the norm $\|\cdot\|_{\mathcal{H}}$. However, for any $m\in\mathbb{N}_0$ and $t\in[-\pi,\pi]$, by the Cauchy inequality, we have \[\sum_{|j|\geq m}|c_j\phi_j(t)|\leq\Big(\sum_{|j|\geq m}\frac{|c_j|^2}{\mu_j}\Big)^{1/2}\Big(\sum_{|j|\geq m}\mu_j\Big)^{1/2},\] hence the convergence in (83) is absolute and uniform on $[-\pi,\pi]$; in particular, $f$ is continuous.
By (12), $\mathcal{H}$ has the reproducing kernel \[K(s,t)=\sum_{j\in\mathbb{Z}}\mu_je^{ij(s-t)},\qquad s,t\in[-\pi,\pi],\tag{86}\] where the convergence of the series is guaranteed at least pointwise. In addition, for any $t\in[-\pi,\pi]$, we have \[K(t,t)=\sum_{j\in\mathbb{Z}}\mu_j=\mu<\infty,\] and hence the kernel $K$ is bounded. In particular, this implies that the series in (86) actually converges absolutely and uniformly on $[-\pi,\pi]\times[-\pi,\pi]$, hence the kernel $K$ is continuous on $[-\pi,\pi]\times[-\pi,\pi]$. That is, $K(s,t)$ is given by $\kappa(s-t)$, where $\kappa:\mathbb{R}\to\mathbb{C}$ is a continuous function with period $2\pi$ whose Fourier coefficients $(\mu_j)_{j\in\mathbb{Z}}$ are all positive and absolutely summable.
We consider now the uniform probability measure $P$ on $[-\pi,\pi]$, $\mathrm{d}P(t)=\mathrm{d}t/2\pi$, so that $\mathrm{d}P_K=\mu\,\mathrm{d}P$, and the nonexpansive operator $L_{P,K}:L^2_{P_K}[-\pi,\pi]\to\mathcal{H}$ defined as in (26). Then, for any $j\in\mathbb{Z}$ and $t\in[-\pi,\pi]$, we have \[(L_{P,K}\phi_j)(t)=\frac{1}{2\pi}\int_{-\pi}^{\pi}\phi_j(s)K(t,s)\,\mathrm{d}s=\mu_j\phi_j(t),\] where the series commutes with the integral, either by the Bounded Convergence Theorem for the Lebesgue integral or by using the uniform convergence of the series and the Riemann integral. Similarly, the Hilbert space $\mathcal{H}_P:=L_{P,K}(L^2_{P_K}[-\pi,\pi])$, as in Proposition 5, is a RKHS with kernel \[K_P(s,t)=\frac{1}{\mu}\sum_{j\in\mathbb{Z}}\mu_j^2e^{ij(s-t)},\qquad s,t\in[-\pi,\pi].\] Thus, letting $\mu_j':=\mu_j^2/\mu\leq\mu_j$, $j\in\mathbb{Z}$, and noting that $\sum_{j\in\mathbb{Z}}\mu_j'\leq\sum_{j\in\mathbb{Z}}\mu_j<\infty$, we have \[\mathcal{H}_P=\Big\{f=\sum_{j\in\mathbb{Z}}c_j\phi_j\,\Big|\,\sum_{j\in\mathbb{Z}}\frac{|c_j|^2}{\mu_j'}<\infty\Big\}.\] In particular, $\mathcal{H}_P$ is dense in $\mathcal{H}$, since both contain $\mathrm{span}\{\phi_j\}_{j\in\mathbb{Z}}$ as a dense subset, but this follows from the more general statement in Proposition 6 as well.
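Both identities above can be verified numerically. The following sketch (not from the paper; it uses a truncated sequence $\mu_j=2^{-|j|}$, $|j|\le J$, as a stand-in, so that all series are finite, and the rectangle rule, which is exact for trigonometric polynomials over a full period) checks $L_{P,K}\phi_j=\mu_j\phi_j$ and the formula for $K_P$:

```python
import numpy as np

# Truncated stand-in for the example: mu_j = 2^{-|j|}, |j| <= J, so every
# series below is a finite trigonometric polynomial.
J, N = 8, 512
js = np.arange(-J, J + 1)
mu = 2.0 ** (-np.abs(js))
t = -np.pi + 2 * np.pi * np.arange(N) / N      # uniform grid; dP = dt / (2*pi)

def K(s, u):
    return (mu[:, None] * np.exp(1j * js[:, None] * (s - u))).sum(axis=0)

# Check (L_{P,K} phi_j)(s) = mu_j phi_j(s) for phi_j(t) = e^{ijt}:
j, s = 3, 0.7
L_phi = np.mean(np.exp(1j * j * t) * K(s, t))  # (1/2pi) int ... dt, rectangle rule
assert np.allclose(L_phi, mu[j + J] * np.exp(1j * j * s))

# Check K_P(s, u) = (1/mu) sum_j mu_j^2 e^{ij(s-u)} against the integral formula
# K_P(s, u) = int K(s, r) K(r, u) / K(r, r) dP(r), with K(r, r) = mu constant here:
u = -1.2
KP_int = np.mean(K(s, t) * K(t, u)) / mu.sum()
KP_series = ((mu**2 / mu.sum()) * np.exp(1j * js * (s - u))).sum()
assert np.allclose(KP_int, KP_series)
print(L_phi, KP_series)
```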
Let now $\lambda\in L^2_{P_K}[-\pi,\pi]=L^2_P[-\pi,\pi]$ be arbitrary, hence \[\lambda=\sum_{j\in\mathbb{Z}}\lambda_j\phi_j,\qquad\sum_{j\in\mathbb{Z}}|\lambda_j|^2<\infty.\] Then \[L_{P,K}\lambda=\sum_{j\in\mathbb{Z}}\mu_j\lambda_j\phi_j,\] and, consequently, \[\|L_{P,K}\lambda\|^2_{\mathcal{H}}=\sum_{j\in\mathbb{Z}}\mu_j|\lambda_j|^2.\] Also, for arbitrary $f\in\mathcal{H}$ as in (83) and (84), we have \[\|f-L_{P,K}\lambda\|^2_{\mathcal{H}}=\sum_{j\in\mathbb{Z}}\frac{|c_j-\mu_j\lambda_j|^2}{\mu_j}.\] Let $(x_n)_{n\in\mathbb{N}}$ be a sequence of points in $[-\pi,\pi]$. By Theorem 8, taking into account the inequality (61), for any $N\in\mathbb{N}$ and $\delta>0$, we have \[P^N\big(\{x\in X^N\mid\|f-\pi^N_xf\|_{\mathcal{H}}<\delta\}\big)\geq1-\frac{1}{\delta^2}\Big(\|f-L_{P,K}\lambda\|^2_{\mathcal{H}}+\frac{1}{N}\Big(\mu\sum_{j\in\mathbb{Z}}|\lambda_j|^2-\sum_{j\in\mathbb{Z}}\mu_j|\lambda_j|^2\Big)\Big).\tag{95}\] On the other hand, we observe that the left-hand side of (95) does not depend on $\lambda$; hence, for any $\varepsilon>0$, there exists $\lambda\in L^2_{P_K}[-\pi,\pi]$ such that \[\|f-L_{P,K}\lambda\|^2_{\mathcal{H}}\leq d_{\mathcal{H}}(f,\mathcal{H}_P)^2+\varepsilon,\] and then, for sufficiently large $N$, we get \[P^N\big(\{x\in X^N\mid\|f-\pi^N_xf\|_{\mathcal{H}}<\delta\}\big)\geq1-\frac{d_{\mathcal{H}}(f,\mathcal{H}_P)^2+2\varepsilon}{\delta^2}.\] In particular, if $f\in\mathcal{H}_P$, that is, the inequality (84) is replaced by the stronger one \[\sum_{j\in\mathbb{Z}}\frac{|c_j|^2}{\mu_j'}=\mu\sum_{j\in\mathbb{Z}}\frac{|c_j|^2}{\mu_j^2}<\infty,\] we can choose $\lambda_j=c_j/\mu_j$, $j\in\mathbb{Z}$, and we have $\lambda\in L^2_{P_K}[-\pi,\pi]$, hence $f=L_{P,K}\lambda$. For example, this is the case for $f=\phi_k$ for some $k\in\mathbb{Z}$, hence $c_j=\delta_{j,k}$, $j\in\mathbb{Z}$; letting $\lambda=\phi_k/\mu_k$, hence $\lambda_j=\delta_{j,k}/\mu_k$, $j\in\mathbb{Z}$, we have $f=L_{P,K}\lambda$ and hence \[P^N\big(\{x\in X^N\mid\|\phi_k-\pi^N_x\phi_k\|_{\mathcal{H}}<\delta\}\big)\geq1-\frac{\mu-\mu_k}{N\delta^2\mu_k^2}.\] This shows that the larger $\mu_k$ is, the faster $\phi_k$ will be approximated; but, since $\mu_j\to0$ as $|j|\to\infty$, the functions $\phi_j$ cannot be approximated uniformly in $j$, in the sense that there does not exist a single $N$ making each $\|\phi_j-\pi^N_x\phi_j\|_{\mathcal{H}}$ bounded by the same $\delta$ with the same probability $\eta$.
This analysis can be applied more generally to kernels that admit an expansion analogous to (86) in terms of basis functions (ϕ_j)_j which constitute a total orthonormal set in L²(X; P_K), e.g., as guaranteed by Mercer's Theorem ([2], Theorem 2.30).

The Hardy Space H²(D).
We consider the open unit disc in the complex plane D = {z ∈ ℂ : |z| < 1} and the Szegő kernel K(z, ζ) = Σ_{n=0}^∞ z^n ζ̄^n = 1/(1 − z ζ̄), z, ζ ∈ D, where the series converges absolutely and uniformly on any compact subset of D. The RKHS associated to K is the Hardy space H²(D) of all functions f : D → ℂ that are holomorphic in D with power series expansion f(z) = Σ_{n=0}^∞ f_n z^n, with norm ‖f‖²_{H²} = Σ_{n=0}^∞ |f_n|². For each ζ ∈ D, we have K(ζ, ζ) = 1/(1 − |ζ|²), hence the kernel K is unbounded. We take P to be the normalized Lebesgue measure on D, that is, for z = x + iy = re^{iθ}, dP(z) = (1/π) dx dy = (1/π) r dr dθ, hence dP_K(z) = K(z, z) dP(z) = dP(z)/(1 − |z|²). Then L²(D; P_K) is contractively embedded in L²(D; P). Further on, in view of Proposition 5 and (101), for any z, ζ ∈ D, we have K_P(z, ζ) = ∫_D K(z, w) K(w, ζ)(1 − |w|²) dP(w), which, by using twice the Bounded Convergence Theorem for the Lebesgue measure, equals Σ_{n=0}^∞ z^n ζ̄^n/((n+1)(n+2)). This shows that the RKHS H²_P(D) induced by K_P consists of all functions h that are holomorphic in D with power series representation h(z) = Σ_{n=0}^∞ h_n z^n and such that Σ_{n=0}^∞ (n+1)(n+2)|h_n|² < ∞. In particular, an orthonormal basis of H²_P(D) is given by the functions z ↦ z^n/√((n+1)(n+2)), n ≥ 0. In order to calculate the operator L_{P,K} : L²(D; P_K) → H²(D), let λ ∈ L²(D; P_K) be arbitrary, that is, λ is a complex-valued measurable function on D such that ∫_D |λ(z)|²/(1 − |z|²) dP(z) < ∞. Then, in view of Proposition 3, we have (L_{P,K}λ)(z) = ∫_D λ(w) K(z, w) dP(w), which, by the Bounded Convergence Theorem, equals Σ_{n=0}^∞ λ_n z^n, where for each integer n ≥ 0 we denote λ_n := ∫_D λ(w) w̄^n dP(w). Observing that, letting ϕ_n(z) := √(n+1) z^n for all integers n ≥ 0 and z ∈ D, the set {ϕ_n}_{n≥0} is orthonormal in L²(D; P), it follows that λ_n = ⟨λ, ϕ_n⟩_{L²(D;P)}/√(n+1) for all integers n ≥ 0 and, hence, (λ_n)_{n≥0} is the weighted sequence of Fourier coefficients of λ with respect to the system of orthonormal functions {ϕ_n}_{n≥0} in L²(D; P). On the other hand, since L²(D; P_K) is contractively embedded in L²(D; P), this shows that L_{P,K} is the restriction to L²(D; P_K) of a Bergman-type weighted projection of L²(D; P) onto a subspace of the Hardy space H²(D), which happens to be exactly H²_P(D). Finally, let f ∈ H²(D) with power series representation as in (102) and let λ ∈ L²(D; P_K)
with norm given as in (111). Then, by Theorem 8 and taking into account the inequality (61), for any N ∈ ℕ and δ > 0, we have

ℙ(‖f − π^N_z f‖_{H²} ≥ δ) ≤ (1/δ²) (‖f − L_{P,K}λ‖²_{H²} + (‖λ‖²_{L²(D;P_K)} − ‖L_{P,K}λ‖²_{H²})/N),   (115)

where z = (z_i)_{i∈ℕ} denotes an arbitrary sequence of points in D and π^N_z denotes the projection of H²(D) onto span{K_{z_i} | i = 1, …, N}. By exploiting the fact that the left-hand side in (115) does not depend on λ, together with the density of H²_P(D) in H²(D), for any ε > 0 there exists λ ∈ L²(D; P_K) such that ‖f − L_{P,K}λ‖_{H²} < ε, and hence, for N sufficiently large, we have ℙ(‖f − π^N_z f‖_{H²} ≥ δ) ≤ 2ε²/δ². Let us consider now the special case when the function f ∈ H²_P(D), that is, with respect to the representation as in (102), we have the stronger condition Σ_{n=0}^∞ (n+1)(n+2)|f_n|² < ∞. In this case, letting λ(z) := (1 − |z|²) Σ_{n=0}^∞ (n+1)(n+2) f_n z^n, calculations similar to (108) and (112) show that λ ∈ L²(D; P_K) and f = L_{P,K}λ, hence the first term in the right-hand side of (115) vanishes and we get

ℙ(‖f − π^N_z f‖_{H²} ≥ δ) ≤ (‖λ‖²_{L²(D;P_K)} − ‖f‖²_{H²})/(N δ²).   (122)

For example, if f(z) = z^n for some integer n ≥ 0, then ℙ(‖f − π^N_z f‖_{H²} ≥ δ) ≤ ((n+1)(n+2) − 1)/(N δ²), showing that better approximations are obtained for smaller n than for bigger n.
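Both ingredients of this example lend themselves to quick numerical sketches (ours, for illustration only): the moment identity behind the coefficients of K_P, and the projection of f(z) = zⁿ onto the span of kernel functions at iid samples from P. Assuming NumPy:

```python
import numpy as np

# (1) Check the moment identity behind K_P:
#     \int_D |w|^{2n} (1 - |w|^2) dP(w) = 1/((n+1)(n+2)),
# with P the normalized area measure, dP = (1/pi) r dr dtheta; the angular
# integral contributes a factor 2, the radial one is done by midpoint rule.
r = (np.arange(200000) + 0.5) / 200000
vals = [2.0 * np.mean(r ** (2 * n + 1) * (1 - r ** 2)) for n in range(6)]
exact = [1.0 / ((n + 1) * (n + 2)) for n in range(6)]
coeff_err = max(abs(v - e) for v, e in zip(vals, exact))

# (2) Monte Carlo projection in H^2(D): sample z_i iid from P (radius sqrt(U),
# uniform angle) and project f(z) = z^n onto span{K_{z_1},...,K_{z_N}} for the
# Szego kernel K(z, w) = 1/(1 - z conj(w)). Squared error: ||f||^2 - <c, b>,
# with (G + eps I) c = b, b_i = f(z_i); the ridge eps is only for stability.
rng = np.random.default_rng(1)
z = np.sqrt(rng.uniform(size=60)) * np.exp(2j * np.pi * rng.uniform(size=60))
n = 2                                   # f(z) = z^2, ||f||_{H^2}^2 = 1
errs = []
for N in (5, 20, 60):                   # nested point sets
    zN = z[:N]
    G = 1.0 / (1.0 - np.outer(zN, np.conj(zN)))   # G[a, b] = K(z_a, z_b)
    b = zN ** n
    c = np.linalg.solve(G + 1e-10 * np.eye(N), b)
    errs.append(1.0 - np.real(np.vdot(c, b)))     # ||f - pi_z^N f||^2

print(coeff_err, errs)
```

As in the trigonometric example, the error in part (2) is nonincreasing because the point sets are nested.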

Conclusions
Certain key properties of Hilbert spaces drive the analysis obtained in this article, together with the properties of reproducing kernel Hilbert spaces that make them attractive for function approximation. The Hilbert space structure provides the orthogonal projection as the unique best approximation; it can be computed, via the reproducing property, as an exact interpolation, and it is shown to converge monotonically to the function for uniqueness sets. The monotonicity of convergence is then used to derive almost sure convergence directly from convergence in probability, and thus to establish sufficient conditions for almost every sequence of samples from a probability distribution to be a uniqueness set. For the approximation bound itself, stated in Theorem 8, the mean squared distance in Chebyshev's inequality can be calculated explicitly thanks to the norm being induced by an inner product and to the existence of the Bochner integral.
We did not include in this article an example with the Gaussian kernel, one of the most useful kernels in applications, although calculations similar to those in Section 4 are available. One reason for this omission is that Gaussian kernels have additional invariance and differentiability/analyticity properties that can be used to obtain stronger results by slightly different techniques; this work is in progress and will form the contents of future research.