Analysis of Approximation by Linear Operators on Variable $L^{p(\cdot)}_\rho$ Spaces and Applications in Learning Theory



Introduction
Approximation by Bernstein type positive linear operators has a long history and is an important topic in approximation theory. It started with the Bernstein operators [1] for proving the Weierstrass theorem about the denseness of the set of polynomials in the space $C[0,1]$ of continuous functions on the interval $[0,1]$. These classical operators are defined as
$$B_n(f, x) = \sum_{k=0}^{n} f\left(\frac{k}{n}\right) p_{n,k}(x)$$
for $f \in C[0,1]$ and $x \in [0,1]$, with the Bernstein basis given by $p_{n,k}(x) = \binom{n}{k} x^{k} (1-x)^{n-k}$. The Bernstein operators have been extended in various forms for the purpose of approximating discontinuous functions, by replacing the point evaluation functionals by some integrals.
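As a quick numerical illustration (a minimal Python sketch of the standard definition above; the function names are our own, not part of the paper), the Bernstein operator can be evaluated directly from the basis formula:

```python
import math

def bernstein(f, n, x):
    """B_n(f, x) = sum_{k=0}^n f(k/n) * C(n, k) * x^k * (1 - x)^(n - k)."""
    return sum(f(k / n) * math.comb(n, k) * x**k * (1 - x) ** (n - k)
               for k in range(n + 1))

# B_n reproduces affine functions exactly and converges uniformly for continuous f.
f = lambda t: abs(t - 0.5)                      # continuous, not differentiable at 1/2
grid = [i / 200 for i in range(201)]
sup_err = lambda n: max(abs(bernstein(f, n, x) - f(x)) for x in grid)
```

As expected from the Weierstrass-type convergence, `sup_err(n)` decreases as `n` grows.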
The classical examples for approximation in $L^{p}[0,1]$ (with $1 \le p < \infty$), the Banach space of all measurable functions $f$ on $[0,1]$ with finite norm $\|f\|_{L^{p}} = (\int_0^1 |f(x)|^{p}\, dx)^{1/p}$, are the Bernstein-Kantorovich operators and the Bernstein-Durrmeyer operators. Quantitative estimates for approximation by Bernstein type positive linear operators in $C[0,1]$ or $L^{p}[0,1]$ have been presented in a large literature (e.g., [4, 5]). See the book [6] and references therein for details and extensions to infinite intervals and linear combinations of positive operators for achieving high orders of approximation.
In this paper we provide a general framework for approximation by linear operators on variable $L^{p(\cdot)}_\rho(\Omega)$ spaces on an open subset $\Omega$ of $\mathbb{R}^{d}$. Here $p : \Omega \to [1, \infty)$ is a measurable function called the exponent function and $\rho$ is a positive bounded Borel measure on $\Omega$. The variable space $L^{p(\cdot)}_\rho(\Omega)$ is a generalization of the weighted $L^{p}$ spaces with a constant exponent $p \in [1, \infty)$: it consists of all measurable functions $f$ on $\Omega$ for which the modular $\int_\Omega (|f(x)|/\lambda)^{p(x)}\, d\rho$ is finite for some $\lambda > 0$, equipped with the Luxemburg norm
$$\|f\|_{L^{p(\cdot)}_\rho} = \inf\left\{\lambda > 0 : \int_\Omega \left(\frac{|f(x)|}{\lambda}\right)^{p(x)} d\rho \le 1\right\}.$$
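The Luxemburg norm above can be computed for a discretized function by bisection, since the modular is decreasing in $\lambda$. This is a minimal Python sketch (our own illustration; the function names and the discretization are assumptions, not part of the paper):

```python
def modular(f_vals, p_vals, weights, lam):
    """Discretized modular: sum_i w_i * (|f_i| / lam)^{p_i}."""
    return sum(w * (abs(v) / lam) ** p for v, p, w in zip(f_vals, p_vals, weights))

def luxemburg_norm(f_vals, p_vals, weights, tol=1e-10):
    """inf{lam > 0 : modular(f/lam) <= 1}; the modular is decreasing in lam,
    so bisection applies."""
    if all(v == 0 for v in f_vals):
        return 0.0
    lo, hi = tol, 1.0
    while modular(f_vals, p_vals, weights, hi) > 1:
        hi *= 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if modular(f_vals, p_vals, weights, mid) > 1 else (lo, mid)
    return hi
```

With a constant exponent $p(x) \equiv p$ and total mass one, this reduces to the usual weighted $L^p$ norm, which gives a quick sanity check.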
The space $L^{p(\cdot)}_\rho(\Omega)$ is a Banach space [7]. The idea of variable $L^{p(\cdot)}$ spaces was introduced by Orlicz [8]. Motivated by connections to variational integrals with nonstandard growth related to the modeling of electrorheological fluids [9], these function spaces have been developed in analysis, and research topics include boundedness of maximal operators, continuity of translates, and denseness of smooth functions. We will not go into details, which can be found in [7, 10] and references therein. Instead, we only mention the following core condition of log-Hölder continuity of the exponent function, which leads to the boundedness of Hardy-Littlewood maximal operators and the rich theory of the variable $L^{p(\cdot)}_\rho(\Omega)$ spaces: there exists a constant $C > 0$ such that
$$|p(x) - p(y)| \le \frac{C}{\log(e + 1/|x - y|)}, \quad x, y \in \Omega.$$

The issue of approximation by Bernstein type positive linear operators on variable $L^{p(\cdot)}_\rho(\Omega)$ spaces was raised by the second author in [11]. It turned out that the variety of the exponent function $p$ creates technical difficulty in the study of approximation. In particular, the uniform boundedness of the Bernstein-Kantorovich operators (1) and Bernstein-Durrmeyer operators (2) is already a difficult problem. The key analysis in [11] is to show that the Bernstein-Kantorovich operators and Bernstein-Durrmeyer operators are uniformly bounded when the exponent function $p$ is Lipschitz $\alpha$ for some $\alpha \in (0, 1]$. It was conjectured there that the uniform boundedness still holds when $p$ is log-Hölder continuous. The first main result of this paper is to confirm this conjecture in Theorem 6 below.
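To make the log-Hölder condition concrete, the following small Python check (our own illustration, assuming the standard form of the condition with the modulus $\log(e + 1/|x-y|)$) evaluates the quantity whose uniform boundedness over all pairs is exactly the condition:

```python
import math

def log_holder_quotient(p, x, y):
    """|p(x) - p(y)| * log(e + 1/|x - y|); log-Hoelder continuity of p means
    this quantity is bounded uniformly over all pairs x != y."""
    return abs(p(x) - p(y)) * math.log(math.e + 1.0 / abs(x - y))

# A Lipschitz exponent is log-Hoelder; a jump discontinuity is not:
p_smooth = lambda x: 2.0 + x
p_jump = lambda x: 2.0 if x < 0.5 else 3.0
close_pair = (0.5 - 1e-9, 0.5 + 1e-9)
```

For `p_smooth` the quotient stays tiny even for very close pairs, while for `p_jump` it blows up like $\log(1/|x-y|)$ across the jump.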
Our second main result drops the positivity assumption and presents quantitative estimates for high order approximation by linear operators, including linear combinations of Bernstein type positive linear operators, extending the results in [11] for first order approximation by positive operators.

Motivations from Learning Theory
Our main motivation for considering the approximation of functions by linear operators on variable $L^{p(\cdot)}_\rho(\Omega)$ spaces comes from learning theory. Besides the example of extending the Bernstein-Durrmeyer operators (2) to those associated with a general probability measure $\rho$ on $\Omega$ in [12, 13] for the multivariate case, we mention two learning theory settings here. Since error analysis for concrete learning algorithms in terms of the introduced noise conditions involves sample error estimates which are beyond the scope of this paper, we leave the detailed error bounds to our further study.

Noise Conditions for Classification and Approximation.
The first learning theory setting related to approximation on variable $L^{p(\cdot)}_\rho(\Omega)$ spaces concerns noise conditions for binary classification. Here $\Omega$ is an input space consisting of possible events while the output space is denoted as $Y = \{1, -1\}$. A Borel probability measure $\rho$ on the product space $\Omega \times Y$ can be decomposed into its marginal distribution $\rho_\Omega$ on $\Omega$ and conditional distributions $\rho(\cdot \mid x)$ for $x \in \Omega$. A binary classifier $C : \Omega \to Y$ makes predictions $C(x) \in Y$ for future events $x \in \Omega$. The best classifier $f_c$, called the Bayes rule, is given by $f_c(x) = 1$ if $\rho(1 \mid x) > 1/2$ and $-1$ otherwise. The probability measure $\rho$ fits the binary classification problem well if the conditional probabilities $\rho(1 \mid x)$ and $\rho(-1 \mid x)$ are well separated from the boundary value $1/2$ for most events $x$. Their separations are equivalent to the separation of the value $f_\rho(x)$ of the regression function $f_\rho(x) = \int_Y y\, d\rho(y \mid x) = \rho(1 \mid x) - \rho(-1 \mid x)$ from $0$ and can be measured in various quantitative ways. The Tsybakov noise condition [14] with noise exponent $q \in (0, \infty]$ asserts that for some constant $c_q > 0$, there holds
$$\rho_\Omega\left(\left\{x \in \Omega : |f_\rho(x)| \le c_q t\right\}\right) \le t^{q}, \quad \forall t > 0. \tag{7}$$
When $q = \infty$, the Tsybakov noise condition (7) means $|f_\rho(x)| \ge c_q$ almost surely, and $f_\rho(x)$ is well separated from $0$. The case $q < \infty$ means that the measure of the set of events $x$ with $f_\rho(x)$ not well separated from $0$ decays polynomially fast as the threshold $c_q t$ tends to $0$. More details about the Tsybakov noise condition, the so-called Tsybakov function, and its applications to the study of classification problems can be found in [15]. Here we introduce a noise condition by allowing some noise situations measured by an exponent function $p$.
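The Tsybakov condition (7) can be checked empirically for a discretized marginal. The sketch below (our own illustration; the example regression function and names are assumptions) measures the mass of the small-margin set for $f_\rho(x) = 2x - 1$ with $\rho_\Omega$ uniform on $[0,1]$, where the mass equals $t$ exactly, so (7) holds with $q = 1$ and $c_q = 1$:

```python
def small_margin_mass(f_vals, weights, t):
    """rho_Omega({x : |f_rho(x)| <= t}) for a discretized marginal distribution."""
    return sum(w for v, w in zip(f_vals, weights) if abs(v) <= t)

# f_rho(x) = 2x - 1 with rho_Omega uniform on [0, 1]:
# mass(|f_rho| <= t) = t, i.e. the noise exponent is q = 1.
n = 100000
xs = [(i + 0.5) / n for i in range(n)]
vals = [2 * x - 1 for x in xs]
weights = [1.0 / n] * n
```

Fitting a power law to `small_margin_mass` as a function of `t` gives an empirical estimate of the noise exponent $q$.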
Example 2. We say that the probability measure $\rho$ satisfies the noise condition associated with an exponent function $p : \Omega \to [0, \infty)$ if for some $\lambda > 0$, there holds
$$\int_\Omega \left(\frac{\lambda}{|f_\rho(x)|}\right)^{p(x)} d\rho_\Omega(x) < \infty,$$
that is, $1/f_\rho$ lies in the variable space associated with $p$ and $\rho_\Omega$.

Remark 3. The above condition can be applied to the regression setting for dealing with unbounded regression functions.
The following is an example to show some differences.

Noise Conditions for Quantile Regression and Approximation.
The second learning theory setting related to approximation on variable $L^{p(\cdot)}_\rho(\Omega)$ spaces concerns noise conditions for quantile regression. Here the output space is $Y = \mathbb{R}$. Similar to least squares regression [16] for learning means of conditional distributions $\rho(\cdot \mid x)$, but providing richer information [17] about response variables such as stretching or compressing tails, the learning problem for quantile regression aims at estimating quantiles of conditional distributions. With a quantile parameter $0 < \tau < 1$, the value of a quantile regression function $f_{\tau,\rho}$ at $x \in \Omega$ is defined by its value $f_{\tau,\rho}(x)$ as a $\tau$-quantile of $\rho(\cdot \mid x)$, that is, a value $t^* \in Y$ satisfying
$$\rho\left((-\infty, t^*] \mid x\right) \ge \tau \quad \text{and} \quad \rho\left([t^*, \infty) \mid x\right) \ge 1 - \tau.$$
Quantile regression has been studied by kernel-based regularization schemes in the learning theory literature (e.g., [18, 19]). For optimal error analysis of these learning algorithms, asymptotic behaviors of the conditional distributions near the $\tau$-quantiles are needed. In particular, one is interested in how slowly the following function decays as $t$ decreases:
$$\varphi_x(t) = \min\left\{\rho\left((t^*, t^* + t) \mid x\right),\ \rho\left((t^* - t, t^*) \mid x\right)\right\}.$$
A noise condition was introduced in [18] by requiring lower bounds $\varphi_x(t) \ge b_x t^{q-1}$ for every $t \in [0, a_x]$ and some $q \in (1, \infty)$, $p \in (0, \infty)$ and constants $a_x, b_x > 0$ satisfying $(b_x a_x^{q-1})^{-1} \in L^{p}_{\rho_\Omega}$. This condition was extended to a logarithmic bound in [19] by replacing $t^{q-1}$ by $(\log(1/t))^{-q}$ and $a_x^{q-1}$ by $(\log(1/a_x))^{-q}$. Here we introduce the following noise condition which is more general than the one in [18] in that it allows the indices $p, q$ to depend on the events $x \in \Omega$.
Example 5. We say that the probability measure $\rho$ satisfies the quantile noise condition associated with exponent functions $q : \Omega \to (1, \infty)$ and $p : \Omega \to (0, \infty)$ if for every $x \in \Omega$, there exist a $\tau$-quantile $t^* \in \mathbb{R}$ and constants $a_x \in (0, 2]$, $b_x > 0$ such that for each $t \in [0, a_x]$,
$$\varphi_x(t) \ge b_x t^{q(x)-1}, \tag{12}$$
where $\varphi_x$ is the conditional decay function introduced above, and that for some $\lambda > 0$, there holds
$$\int_\Omega \left(\frac{1}{\lambda\, b_x a_x^{q(x)-1}}\right)^{p(x)} d\rho_\Omega(x) < \infty.$$
While the lower bounds (12) imply polynomial decays of the conditional distributions near the $\tau$-quantiles with a power index depending on the event, the finiteness of the integral is equivalent to the requirement that the function $x \mapsto (b_x a_x^{q(x)-1})^{-1}$ belongs to the variable space $L^{p(\cdot)}_{\rho_\Omega}(\Omega)$.
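The decay function $\varphi_x(t) = \min\{\rho((t^*, t^* + t) \mid x),\ \rho((t^* - t, t^*) \mid x)\}$ can be evaluated from a conditional CDF; for continuous distributions the open versus closed endpoints are immaterial. A minimal Python sketch (our own illustration with a uniform conditional, which is an assumption, not an example from the paper):

```python
def phi(cdf, t_star, t):
    """phi_x(t) = min{rho((t*, t*+t) | x), rho((t*-t, t*) | x)} via the conditional CDF."""
    return min(cdf(t_star + t) - cdf(t_star), cdf(t_star) - cdf(t_star - t))

# rho(. | x) uniform on [-1, 1] with tau = 1/2: the median is t* = 0 and
# phi(t) = t/2 for t in [0, 1], so the lower bound (12) holds with q(x) = 2, b_x = 1/2.
uniform_cdf = lambda y: min(max((y + 1.0) / 2.0, 0.0), 1.0)
```

Plotting `phi` against `t` on a log-log scale gives an empirical view of the exponent $q(x) - 1$ at a given event $x$.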

Main Results on $L^{p(\cdot)}_\rho$ Spaces
Our first theorem is about the uniform boundedness of a sequence of linear operators on the variable $L^{p(\cdot)}_\rho$ spaces. These operators take the form
$$L_n(f, x) = \int_\Omega K_n(x, y) f(y)\, d\rho(y) \tag{13}$$
in terms of their kernels $\{K_n(x, y)\}_{n=1}^{\infty}$ defined on $\Omega \times \Omega$. We assume that the kernels satisfy three conditions, stated as (14), (15), and (16), with some positive constants $C_0 \ge 1$, $\beta$, $A_n$, and $B_n$ (depending on $n \in \mathbb{N}$). Then the uniform boundedness follows; this will be proved in Section 5.

Theorem 6. Under conditions (14), (15), and (16), the operators $L_n$ defined by (13) are uniformly bounded on $L^{p(\cdot)}_\rho(\Omega)$:
$$\left\|L_n f\right\|_{L^{p(\cdot)}_\rho} \le C_p \left\|f\right\|_{L^{p(\cdot)}_\rho},$$
with a positive constant $C_p$ depending on $p$ and the constants in (14), (15), and (16), given explicitly in the proof.
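One natural kernel condition of the kind imposed here is a uniform bound on the row mass $\int_\Omega |K_n(x, y)|\, d\rho(y)$; whether this is literally condition (14) is our assumption. For the Bernstein-Kantorovich kernel on $[0,1]$ with the Lebesgue measure, the row mass is exactly $1$ for every $n$ and $x$, which the following Python sketch verifies numerically (the indicator form of the kernel is standard; helper names are ours):

```python
import math

def kantorovich_kernel(n, x, y):
    """K_n(x, y) = sum_k p_{n,k}(x) * (n+1) * 1{y in [k/(n+1), (k+1)/(n+1))}."""
    k = min(int(y * (n + 1)), n)   # index of the subinterval containing y
    return math.comb(n, k) * x**k * (1 - x) ** (n - k) * (n + 1)

def row_mass(n, x):
    """Midpoint-rule value of int_0^1 K_n(x, y) dy; m is a multiple of n+1 so each
    subinterval is sampled exactly, making the sum equal sum_k p_{n,k}(x) = 1."""
    m = 200 * (n + 1)
    return sum(kantorovich_kernel(n, x, (j + 0.5) / m) for j in range(m)) / m
```

The bound being exactly $1$ uniformly in $n$ and $x$ is what makes these operators a natural test case for Theorem 6.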
The following theorem, to be proved in Section 5 and extending the results for $r = 1$ in [11], gives orders of approximation by linear operators on $L^{p(\cdot)}_\rho(\Omega)$ when the K-functional has explicit decay rates.
where $\lfloor r/2 \rfloor$ is the integer part of $r/2$ and the constant $C_{p,r,\rho}$ is independent of $f \in L^{p(\cdot)}_\rho$ (given explicitly in the proof).
The vanishing moment assumption (20) corresponds to Strang-Fix type conditions in the literature on shift-invariant spaces; see, for example, [20, 21]. It has appeared in the literature on Bernstein type operators when linear combinations are considered, as described by (34) in the next section.

Approximation by Bernstein Type Operators
In this section we apply our main results to Bernstein type positive linear operators and give high orders of approximation by linear combinations of these operators on variable $L^{p(\cdot)}_\rho(\Omega)$ spaces. We demonstrate the analysis for the general Bernstein-Durrmeyer operators in detail and briefly describe results for the general Bernstein-Kantorovich operators as an example of other families of operators.
Since $\Psi_n(x, y) \ge 0$, the required kernel bounds follow. With all three conditions verified, the desired uniform bound (25) for the Bernstein-Durrmeyer operators follows from Theorem 6. This proves the proposition.
The Bernstein-Durrmeyer operators (23) are positive, which prevents them from achieving high order approximation due to a saturation phenomenon. Linear combinations of such operators can be used to obtain high orders of approximation. The idea and a literature review of this method can be found in [6]; further developments will not be discussed here. The linear combinations are defined as in (34), where $N = (r + d - 1)!/(r!(d - 1)!)$ is the dimension of the space of polynomials of degree at most $r - 1$, and the coefficients are bounded by two positive constants $B_1, B_2$ independent of $n$, as in (35). For the classical Bernstein-Durrmeyer operators with respect to the Lebesgue measure (or even the Jacobi weights), the existence of the above linear combinations can be found in the literature. The existence of such linear combinations with respect to an arbitrary measure $\rho$ is a nontrivial problem and deserves intensive study. This technical question is beyond the scope of this paper and will be discussed in our further work. Here we concentrate on the variable $L^{p(\cdot)}_\rho(\Omega)$ spaces and state the following result for high orders of approximation under condition (35), which is an immediate consequence of Theorem 7.
where $\lfloor r/2 \rfloor$ is the integer part of $r/2$ and the constant $C_{p,r,\rho}$ is independent of $f \in L^{p(\cdot)}_\rho$.
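The gain from linear combinations is visible already in the simplest classical case on $[0,1]$ with the plain Bernstein operators: the combination $2B_{2n} - B_n$ cancels the leading $1/n$ term of the Voronovskaya expansion, leaving an $O(1/n^2)$ error. This Python sketch is our own illustration with the classical operators, not the paper's general construction for an arbitrary measure $\rho$:

```python
import math

def bernstein(f, n, x):
    """B_n(f, x) = sum_k f(k/n) * C(n, k) * x^k * (1 - x)^(n - k)."""
    return sum(f(k / n) * math.comb(n, k) * x**k * (1 - x) ** (n - k)
               for k in range(n + 1))

def combo(f, n, x):
    """Classical second order combination 2*B_{2n} - B_n: if the error of B_n
    expands as A/n + B/n^2 + ..., the A/n terms cancel, leaving O(1/n^2)."""
    return 2 * bernstein(f, 2 * n, x) - bernstein(f, n, x)

f = lambda t: t**4
plain_err = abs(bernstein(f, 32, 0.3) - f(0.3))
combo_err = abs(combo(f, 32, 0.3) - f(0.3))
```

For smooth $f$, `combo_err` is markedly smaller than `plain_err` at the same cost scale, which is the saturation-breaking effect the linear combinations (34) provide in general.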
Let us now briefly describe approximation results for the Bernstein-Kantorovich operators on $\Omega$ defined [27] in terms of averages of $f$ over subdomains $\{Q_{n,k}\}_k$ of $\Omega$. In the same way as for the Bernstein-Durrmeyer operators, we have the following results for the Bernstein-Kantorovich operators.
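On $[0,1]$ with the Lebesgue measure, the Bernstein-Kantorovich operators replace the point evaluations $f(k/n)$ by averages of $f$ over the subintervals $[k/(n+1), (k+1)/(n+1)]$, which makes them well defined for merely integrable $f$. A minimal Python sketch (midpoint quadrature for the inner integrals; names and quadrature choice are ours):

```python
import math

def kantorovich(f, n, x, m=64):
    """K_n(f, x) = sum_k p_{n,k}(x) * (n+1) * integral of f over [k/(n+1), (k+1)/(n+1)],
    with each inner integral approximated by an m-point midpoint rule."""
    total = 0.0
    for k in range(n + 1):
        a, b = k / (n + 1), (k + 1) / (n + 1)
        avg = sum(f(a + (b - a) * (j + 0.5) / m) for j in range(m)) / m
        total += math.comb(n, k) * x**k * (1 - x) ** (n - k) * avg
    return total
```

For affine $f$ the midpoint rule is exact, and one can check the known identity $K_n(t, x) = (nx + 1/2)/(n + 1)$, so $K_n$ does not reproduce linear functions exactly but does so in the limit.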

Proof of Main Results
In this section we give detailed proofs of our main results. Let us first prove Theorem 6.
Proof of Theorem 6. For $x \in \Omega$, we define two subsets $\Omega_{x,n}$ and $\Omega^{c}_{x,n}$ of $\Omega$ according to the distance from $x$. The value $L_n(f, x)$ is then decomposed into three parts, as in (44), and the three terms are estimated separately. Finally, we put the estimates (48), (56), and (59) into (44) to conclude the bound, taking $\lambda = \lambda_n$ as an explicit constant built from the three estimates. We are now in a position to prove Theorem 7.
Proof of Theorem 7. We follow the standard procedure in approximation theory and consider the error $L_n(f, x) - f(x)$ for $f$ in the smoothness class of Theorem 7. Apply the Taylor expansion of $f$ about $x$, with the remainder term $R_r(f; x, y)$ in integral form. We then see from the vanishing moment condition (20) that the polynomial terms of the expansion are annihilated, leaving only the remainder to estimate. The proof of Theorem 7 is complete.