Global universality of the two-layer neural network with the $k$-rectified linear unit

This paper concerns the universality of the two-layer neural network with the $k$-rectified linear unit activation function, $k=1,2,\ldots$, under a suitable norm and without any restriction on the shape of the domain. This type of result is called global universality; it extends the previous result for $k=1$ by the present authors. As an application of the fundamental result on $k$-rectified linear unit functions, this paper also covers $k$-sigmoidal functions.


Introduction
The goal of this note is to specify the closure of the linear subspaces generated by the $k$-rectified linear unit functions under various norms. As in [1], for $k \in \mathbb{N}$, we set
$$\operatorname{ReLU}^k(x) = \max(x, 0)^k, \quad x \in \mathbb{R}.$$
The function $\operatorname{ReLU}^k$ is called the $k$-rectified linear unit ($k$-ReLU for short), which is introduced to compensate for properties that ReLU does not have. Our approach will be a completely mathematical one. Recently, increasing attention has been paid to the $k$-ReLU function as well as the original ReLU function. For example, if $k \geq 2$, the function $k$-ReLU is of class $C^{k-1}$, so it is smoother than the ReLU function. When we study neural networks, the function $k$-ReLU is called an activation function. As in [2], $k$-ReLU functions are used to reduce the amount of computation. Using this smoothness property, Siegel and Xu investigated error estimates for the approximation [1]. Mhaskar and Micchelli worked on compact sets in $\mathbb{R}^n$, while in the present work we consider the approximation on the whole real line.
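The smoothness claim can be checked numerically. The following sketch (our illustration in Python with NumPy; the name relu_k is ours, not from the paper) differentiates $\operatorname{ReLU}^k$ on a grid: derivatives of order less than $k$ are continuous at the origin, while the $k$-th derivative jumps by $k!$ there.

```python
import numpy as np

def relu_k(x, k):
    """k-rectified linear unit: ReLU^k(x) = max(x, 0)**k."""
    return np.maximum(x, 0.0) ** k

k = 3
x = np.linspace(-1.0, 1.0, 20001)   # uniform grid, step h = 1e-4
d = relu_k(x, k)
i = len(x) // 2                     # index of the grid point x = 0
for order in range(1, k + 1):
    d = np.gradient(d, x)           # one more numerical derivative
    jump = d[i + 5] - d[i - 5]      # one-sided values just across 0
    print(f"order {order}: jump across 0 = {jump:.4f}")
# Orders 1 and 2 report jumps of size O(h), consistent with continuity,
# while order 3 reports a jump close to 3! = 6.
```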
A problem arises when we deal with $k$-ReLU as a function over the whole line: the function $k$-ReLU is not bounded on $\mathbb{R}$. Our goal in this paper is to propose a Banach space that allows us to handle such unbounded functions. Actually, for $k = 1, 2, \ldots$, we let
$$Y_k(\mathbb{R}) = \left\{ f \in C(\mathbb{R}) : \lim_{x \to \infty} \frac{f(x)}{1 + |x|^k} \text{ and } \lim_{x \to -\infty} \frac{f(x)}{1 + |x|^k} \text{ exist in } \mathbb{R} \right\},$$
equipped with the norm
$$\|f\|_{Y_k} = \sup_{x \in \mathbb{R}} \frac{|f(x)|}{1 + |x|^k},$$
and define
$$H_{\operatorname{ReLU}^k}(\mathbb{R}) = \operatorname{span}\{\operatorname{ReLU}^k(a \cdot + b) : a, b \in \mathbb{R}\}.$$
Our main result in this paper is as follows:

Theorem 1. The space $H_{\operatorname{ReLU}^k}(\mathbb{R})$ is dense in $Y_k(\mathbb{R})$ with respect to the norm $\|\cdot\|_{Y_k}$.

Understanding the structure of $H_{\operatorname{ReLU}^k}(\mathbb{R}^n)$ has been important in the field of machine learning in the last decade; we refer to [4, 5], for example. Furthermore, dealing with unbounded activation functions is important from the viewpoint of applications (see [6]). Remark that the approximation over bounded domains has a long history (see [7]).
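To make the norm concrete, here is a minimal numerical sketch (the name y_k_norm is ours) of $\|\cdot\|_{Y_k}$ evaluated on a grid; it illustrates that the unbounded generator $\operatorname{ReLU}^k$ nevertheless has finite $Y_k$-norm.

```python
import numpy as np

def y_k_norm(f, k, grid=np.linspace(-1e3, 1e3, 200001)):
    """Grid approximation of ||f||_{Y_k} = sup_x |f(x)| / (1 + |x|^k)."""
    return np.max(np.abs(f(grid)) / (1.0 + np.abs(grid) ** k))

k = 2
print(y_k_norm(lambda x: np.maximum(x, 0.0) ** k, k))  # ~ 1: ReLU^k lies in Y_k
print(y_k_norm(np.cos, k))                             # ~ 1: bounded functions too
```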
As is seen from the definition of the norm $\|\cdot\|_{Y_k}$, given a function $f \in Y_k(\mathbb{R})$, we can easily find a function $g \in H_{\operatorname{ReLU}^k}(\mathbb{R})$ such that $\lim_{x \to \pm\infty} (f(x) - g(x))/(1 + |x|^k) = 0$. However, after choosing such a function $g$, we have to look for a way to control $f - g$ inside any compact interval by a function $h \in Y_k(\mathbb{R}) \cap C_c(\mathbb{R})$. Although $Y_k(\mathbb{R})$ contains unbounded functions, we can manage to do so by induction on $k$. Actually, once we are given a compact interval, we will find $h \in Y_k(\mathbb{R}) \cap C_c(\mathbb{R})$ such that $f - g - h$ is sufficiently small.
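This two-step strategy can be seen numerically. In the sketch below (our illustration, with an ad hoc test function $f$), the limits of $f(x)/(1 + |x|^k)$ at $\pm\infty$ dictate the choice $g = c_+ \operatorname{ReLU}^k(\cdot) + c_- \operatorname{ReLU}^k(-\cdot)$, after which the weighted error $f - g$ decays at infinity, so only a compactly supported correction $h$ remains.

```python
import numpy as np

k = 2
relu_k = lambda x: np.maximum(x, 0.0) ** k
f = lambda x: np.abs(x) ** k * np.tanh(x) + np.cos(x)  # a function in Y_k

# Limits of f(x)/(1+|x|^k) at +infty and -infty: here 1 and -1.
c_plus, c_minus = 1.0, -1.0
g = lambda x: c_plus * relu_k(x) + c_minus * relu_k(-x)

for R in (10.0, 100.0, 1000.0):
    x = np.linspace(R, 10 * R, 10001)
    xs = np.concatenate([-x, x])    # both tails |x| in [R, 10R]
    tail = np.abs(f(xs) - g(xs)) / (1.0 + np.abs(xs) ** k)
    print(f"weighted error for |x| in [{R}, {10 * R}]: {tail.max():.2e}")
# The weighted tail error tends to 0, so f - g only needs to be
# corrected on a compact interval by some h in C_c(R).
```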
Theorem 1 says that the space $Y_k(\mathbb{R})$ is mathematically suitable when we consider the activation function $k$-ReLU. We compare Theorem 1 with the following fundamental result by Cybenko. For a function space $X(\mathbb{R})$ over the real line $\mathbb{R}$ and an open set $\Omega$, $X(\Omega)$ stands for the space of restrictions of the elements of $X(\mathbb{R})$ to $\Omega$, that is,
$$X(\Omega) = \{ f|_\Omega : f \in X(\mathbb{R}) \},$$
and the norm is given by
$$\|f\|_{X(\Omega)} = \inf\{ \|g\|_{X(\mathbb{R})} : g \in X(\mathbb{R}),\ g|_\Omega = f \}.$$

Theorem 2 (see Cybenko [8]). Let $K \subset \Omega$ be a compact set, and let $\sigma: \mathbb{R} \to \mathbb{R}$ be a continuous sigmoidal function. Then, for all $f \in C(K)$ and $\varepsilon > 0$, there exists $g \in H_\sigma(\Omega)$ such that
$$\sup_{x \in K} |f(x) - g(x)| < \varepsilon.$$

We remark that Theorem 1 is not a direct consequence of Theorem 2: Theorem 2 concerns uniform approximation over compact sets, while Theorem 1 deals with uniform approximation over the whole real line. We will prove Theorem 1 without using Theorem 2.
Let $k = 0, 1, \ldots$. Our results readily carry over to the case of $k$-sigmoidal functions. As in Definition 4, a function $\sigma: \mathbb{R} \to \mathbb{R}$ is said to be $k$-sigmoidal if
$$\lim_{x \to \infty} \frac{\sigma(x)}{x^k} = 1, \qquad \lim_{x \to -\infty} \frac{\sigma(x)}{|x|^k} = 0.$$
Needless to say, $\operatorname{ReLU}^k$ is $k$-sigmoidal. If $k = 0$, then we say that $\sigma$ is a continuous sigmoidal. As a corollary of Theorem 1, we extend this theorem to the case of $k$-sigmoidal functions:

Theorem 3. Let $\sigma$ be a continuous $k$-sigmoidal function. Then $H_\sigma(\mathbb{R})$ is dense in $Y_k(\mathbb{R})$.
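As a quick illustration (our code, with $\sigma(x) = x^k (1 + \tanh x)/2$ chosen ad hoc as a smooth example), one can check the two defining limits numerically, both for this $\sigma$ and for $\operatorname{ReLU}^k$; note that for $x > 0$ the quantity $x^k$ equals $|-x|^k$, so the second column approximates the limit at $-\infty$.

```python
import numpy as np

k = 2
relu_k = lambda x: np.maximum(x, 0.0) ** k
# An ad hoc smooth k-sigmoidal example: sigma(x) = x^k (1 + tanh x) / 2.
sigma = lambda x: x ** k * (1.0 + np.tanh(x)) / 2.0

for s in (relu_k, sigma):
    for x in (1e2, 1e4):
        print(s(x) / x ** k, s(-x) / x ** k)
# First column tends to 1 (limit at +infty); second tends to 0 (at -infty).
```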
We can transplant Theorem 3 to various Banach lattices over any open set $\Omega$ on the real line $\mathbb{R}$. Here and below, $L^0(\Omega)$ denotes the set of all Lebesgue measurable functions from $\Omega$ to $\mathbb{C}$. Let $X(\Omega)$ be a Banach space contained in $L^0(\Omega)$ endowed with the norm $\|\cdot\|_{X(\Omega)}$. We say that $X(\Omega)$ is a Banach lattice if, for any $f \in L^0(\Omega)$ and $g \in X(\Omega)$ satisfying $|f(x)| \leq |g(x)|$ for almost every $x \in \Omega$, we have $f \in X(\Omega)$ and $\|f\|_{X(\Omega)} \leq \|g\|_{X(\Omega)}$. For example, $X(\Omega) = L^p(\Omega)$ with $1 \leq p \leq \infty$ is a Banach lattice by the monotonicity of the integral. We refer to [3] for the case where $X(\Omega)$ is a variable exponent Lebesgue space. See [9] for the function spaces to which Theorem 1 is applicable.
We write $\overline{H_\sigma(\Omega)}^{X(\Omega)}$ for the closure of $H_\sigma(\Omega)$ in $X(\Omega)$. It is noteworthy that we can deal with the case of $\Omega = \mathbb{R}$.
Remark 5.
(1) The condition that $\chi_\Omega \in X(\Omega)$ is a natural condition, since $\sigma \in X(\Omega)$.
(2) If $k = 0$, then we saw in [9] that our result recaptures the result by Funahashi [10]. So, our result includes a further extension of his result.

Remark 6. Let $X(\Omega)$ be a Banach lattice, and let $\sigma$ be a $1$-sigmoidal function. Then the corresponding density assertion follows from the result for the case of $k = 1$.

Proof of Theorem 1
We need the following lemmas. First, we embed $Y_k(\mathbb{R})$ into a function space over $\overline{\mathbb{R}} = \mathbb{R} \cup \{\pm\infty\}$.

Lemma 7. The mapping $f \mapsto f/(1 + |\cdot|^k)$, extended to $\overline{\mathbb{R}}$ by continuity, is an isometric isomorphism from $Y_k(\mathbb{R})$ onto $BC(\overline{\mathbb{R}})$.

If k = 1, then this can be found in Lemma 3 in [9].
Proof. Observe that the inverse is given for $F \in BC(\overline{\mathbb{R}})$ by
$$F \mapsto (1 + |\cdot|^k) F|_{\mathbb{R}}.$$
Since the operator $Y_k(\mathbb{R}) \to BC(\overline{\mathbb{R}})$, $f \mapsto f/(1 + |\cdot|^k)$, preserves the norms, we see that this operator is an isomorphism.
We set
$$\varphi_k(x) = \sum_{j=0}^{k} (-1)^j \binom{k}{j} \operatorname{ReLU}^k(x - j), \quad x \in \mathbb{R}.$$
We will use the following algebraic relation for $\varphi_k$.

Lemma 8. Let $k \in \mathbb{N}$. Then, for all $x \in \mathbb{R}$,
$$\varphi_k(x) = \begin{cases} 0, & x \leq 0, \\ k!, & x \geq k. \end{cases}$$

Proof of Lemma 8. If $x \leq 0$, then every summand vanishes. If $x \geq k$, then $\operatorname{ReLU}^k(x - j) = (x - j)^k$ for every $j = 0, 1, \ldots, k$, so, by comparing the coefficients, we may reduce the matter to the proof of the following two equalities:
$$\sum_{j=0}^{k} (-1)^j \binom{k}{j} j^\ell = 0$$
for each $\ell = 0, 1, \ldots, k - 1$, and
$$\sum_{j=0}^{k} (-1)^j \binom{k}{j} j^k = (-1)^k k!.$$
We compute
$$\sum_{j=0}^{k} (-1)^j \binom{k}{j} e^{jt} = (1 - e^t)^k = (-1)^k t^k + O(t^{k+1}) \quad (t \to 0),$$
and then, differentiating $\ell$ times at $t = 0$,
$$\sum_{j=0}^{k} (-1)^j \binom{k}{j} j^\ell = \left. \frac{d^\ell}{dt^\ell} (1 - e^t)^k \right|_{t=0}.$$
Hence, the left-hand side vanishes for $\ell = 0, 1, \ldots, k - 1$ and equals $(-1)^k k!$ for $\ell = k$, which proves the two equalities.

Although $\operatorname{ReLU}^k$ is unbounded, if we consider suitable linear combinations, we can approximate any function in $C_c(\mathbb{R})$.
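Since Lemma 8 is purely algebraic, it is easy to confirm numerically; the following sketch (our code) evaluates the alternating sum on both half-lines.

```python
import numpy as np
from math import comb, factorial

def phi(x, k):
    """Alternating binomial combination of shifted ReLU^k from Lemma 8."""
    x = np.asarray(x, dtype=float)
    return sum((-1) ** j * comb(k, j) * np.maximum(x - j, 0.0) ** k
               for j in range(k + 1))

k = 4
left = np.linspace(-10.0, 0.0, 101)     # sample points with x <= 0
right = np.linspace(k, k + 10.0, 101)   # sample points with x >= k
print(np.allclose(phi(left, k), 0.0))            # True
print(np.allclose(phi(right, k), factorial(k)))  # True: k! = 24
```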

Lemma 9. Any function in $C_c(\mathbb{R})$ can be approximated uniformly over $\mathbb{R}$ by functions in $H_{\operatorname{ReLU}^k}(\mathbb{R})$.
For the proof, we will use the following observation: if $k \geq 2$ and $a, b \in \mathbb{R}$, then $\frac{d}{dx} \operatorname{ReLU}^k(ax + b) = ka \operatorname{ReLU}^{k-1}(ax + b)$.

Proof. We induct on $k$. The base case $k = 1$ was proved already in [9]. Suppose that we have $f \in C_c(\mathbb{R})$ with $\operatorname{supp} f \subset [-a', a]$ for $a, a' > 0$; in fact, we can approximate $f$ by the functions in $H_{\operatorname{ReLU}^k}(\mathbb{R})$ supported in $[-a', a]$. Let $\varepsilon > 0$ be given. By mollification and dilation, we may assume $f \in C^1(\mathbb{R})$. By the induction assumption, there exists $\psi \in H_{\operatorname{ReLU}^{k-1}}(\mathbb{R})$ supported in $[-a', a]$ such that
$$|f'(x) - \psi(x)| < \frac{\varepsilon}{\ell} \quad (x \in \mathbb{R}), \tag{22}$$
where $\ell = a + a' = \operatorname{diam}(\operatorname{supp} f)$. Integrating estimate (22), we obtain
$$\left| f(x) - \int_{-\infty}^{x} \psi(t)\,dt \right| \leq \varepsilon \quad (x \in \mathbb{R}).$$
Using Lemma 8, the dilation, and the translation, we choose $\varphi^* \in H_{\operatorname{ReLU}^k}(\mathbb{R})$, which depends on $k$, $a$, and $a'$, such that $\operatorname{supp} \varphi^* \subset [-a', \infty)$ and $\varphi^*$ agrees with $1$ over $[a, \infty)$. Therefore, subtracting a suitable multiple of $\varphi^*$ from the integral of $\psi$, we obtain a function $\tau \in H_{\operatorname{ReLU}^k}(\mathbb{R})$ satisfying $\operatorname{supp} \tau \subset [-a', a]$ and $\|f - \tau\|_{L^\infty} < C\varepsilon$, where $C$ depends on $k$, $a$, and $a'$, that is, on $k$ and $f$. We will now prove Theorems 1 and 3.
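For concreteness, here is a numerical sketch (our code) of one admissible choice of $\varphi^*$: normalizing the combination from Lemma 8 by $k!$ and composing with the affine map sending $[-a', a]$ onto $[0, k]$ produces an element of $H_{\operatorname{ReLU}^k}(\mathbb{R})$ that vanishes on $(-\infty, -a']$ and equals $1$ on $[a, \infty)$.

```python
import numpy as np
from math import comb, factorial

def phi(x, k):
    """phi_k from Lemma 8: equals 0 for x <= 0 and k! for x >= k."""
    x = np.asarray(x, dtype=float)
    return sum((-1) ** j * comb(k, j) * np.maximum(x - j, 0.0) ** k
               for j in range(k + 1))

def phi_star(x, k, a, a_prime):
    """Element of H_{ReLU^k}(R): 0 on (-inf, -a'], 1 on [a, inf)."""
    # Affine change of variables mapping [-a', a] onto [0, k];
    # composing ReLU^k(. - j) with an affine map stays inside H_{ReLU^k}(R).
    t = k * (np.asarray(x, dtype=float) + a_prime) / (a + a_prime)
    return phi(t, k) / factorial(k)

k, a, a_prime = 3, 2.0, 1.0
x = np.linspace(-5.0, 5.0, 11)
print(np.round(phi_star(x, k, a, a_prime), 4))
# Vanishes for x <= -1, equals 1 for x >= 2, and interpolates in between.
```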
Proof of Theorem 1. We identify $Y_k(\mathbb{R})$ with $BC(\overline{\mathbb{R}})$ as in Lemma 7. We have to show that any finite Borel measure $\mu$ on $\overline{\mathbb{R}}$ which annihilates $H_{\operatorname{ReLU}^k}(\mathbb{R})$ is zero. Since $C_c(\mathbb{R})$ is contained in the closure of the space $H_{\operatorname{ReLU}^k}(\mathbb{R})$, as we have seen in Lemma 9, $\mu$ is not supported on $\mathbb{R}$. Therefore, we have only to show that $\mu(\{\infty\}) = 0$ and that $\mu(\{-\infty\}) = 0$. However, since we have shown that $\mu$ is not supported on $\mathbb{R}$, this is a direct consequence of the following observations: under the identification of Lemma 7, the functions $\operatorname{ReLU}^k$ and $\operatorname{ReLU}^k(-\cdot)$ correspond to elements of $BC(\overline{\mathbb{R}})$ taking the value $1$ at $\infty$ and $0$ at $-\infty$, and the value $0$ at $\infty$ and $1$ at $-\infty$, respectively; testing $\mu$ against these two functions yields $\mu(\{\infty\}) = \mu(\{-\infty\}) = 0$.

Proof of Theorem 3. We identify $Y_k(\mathbb{R})$ with $BC(\overline{\mathbb{R}})$ as in Lemma 7 once again. Then, to show that $H_\sigma(\mathbb{R})$ is dense in $BC(\overline{\mathbb{R}})$ under this identification, it suffices to show that any finite measure $\mu$ over $\overline{\mathbb{R}}$ is zero if it annihilates $H_\sigma(\mathbb{R})$. Assuming that $\mu$ annihilates $H_\sigma(\mathbb{R})$, we see that
$$\int_{\overline{\mathbb{R}}} \frac{\sigma(a(x - b))}{a^k (1 + |x|^k)}\, d\mu(x) = 0 \tag{32}$$
for any fixed $b \in \mathbb{R}$ and all $a > 0$. Furthermore, since $\sigma$ is $k$-sigmoidal,
$$\lim_{a \to \infty} \frac{\sigma(a(x - b))}{a^k} = \operatorname{ReLU}^k(x - b)$$
for every $x \in \mathbb{R}$, with the quotients bounded by a constant multiple of $1 + |x|^k$. Therefore, by the Lebesgue convergence theorem, letting $a \to \infty$ in (32), we have
$$\int_{\overline{\mathbb{R}}} \frac{\operatorname{ReLU}^k(x - b)}{1 + |x|^k}\, d\mu(x) = 0.$$
The same argument applied to $\sigma(-a(x - b))$ shows that $\mu$ also annihilates $\operatorname{ReLU}^k(b - \cdot)$. This means that $\mu$ annihilates $H_{\operatorname{ReLU}^k}(\mathbb{R})$. Thus, by Theorem 1, $\mu = 0$.
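In display form, the boundary-mass computation in the proof of Theorem 1 reads as follows (a sketch; the hat denotes the element of $BC(\overline{\mathbb{R}})$ corresponding to a function in $Y_k(\mathbb{R})$ under Lemma 7, a notation we introduce only for this display):

```latex
% mu vanishes on R, so only the atoms at +infty and -infty can contribute:
\[
  0 = \int_{\overline{\mathbb{R}}} \widehat{\operatorname{ReLU}^k}\,d\mu
    = \widehat{\operatorname{ReLU}^k}(\infty)\,\mu(\{\infty\})
      + \widehat{\operatorname{ReLU}^k}(-\infty)\,\mu(\{-\infty\})
    = \mu(\{\infty\}),
\]
\[
  0 = \int_{\overline{\mathbb{R}}} \widehat{\operatorname{ReLU}^k(-\cdot)}\,d\mu
    = \mu(\{-\infty\}).
\]
```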

Proof of Theorem 4: Application of Theorem 1
We first show (36). One of the inclusions follows from Lemma 9. Hence, it remains to prove the opposite inclusion.
For any $f \in H_{\operatorname{ReLU}^k}(\mathbb{R})$, there exists $K \gg 1$ such that $f$ agrees with a polynomial of degree at most $k$ both on $[K, \infty)$ and on $(-\infty, -K]$. Fix $R \gg K$ for the time being, and define
$$f_R(x) = \begin{cases} f(x), & |x| \leq R, \\ 0, & \text{otherwise}. \end{cases} \tag{41}$$
By the use of Lemma 9, we choose a compactly supported function in $H_{\operatorname{ReLU}^k}(\mathbb{R})$ that approximates $f_R$ uniformly. Combining the resulting estimates, we obtain (36).
From (36), we deduce the assertion; thus, the proof is complete if $X = Y_k$. For general Banach lattices $X$, we use a routine approximation procedure based on the same estimates.

Conclusion
We specified the closure of $H_{\operatorname{ReLU}^k}(\mathbb{R})$ under the norm $\|\cdot\|_{Y_k}$. This is useful when we consider the approximation by functions in the function space $H_{\operatorname{ReLU}^k}(\mathbb{R})$. We illustrated this situation using Banach lattices. Our result contains the existing result on the approximation in variable exponent Lebesgue spaces. It is also remarkable that our work can be regarded as an attempt to understand neural networks. For example, Carroll and Dickinson used the Radon transform [11], and other research employed some other topologies (see [12, 13]).
Remark that this note was submitted as a preprint: https://arxiv.org/abs/2212.13713.

Discussion
So far, we can manage to handle the case where $k$ is a nonnegative integer. Our discussion heavily depended on algebraic relations such as Lemma 8, so we do not know how to handle the case where $k$ is not an integer. Even for the case $k = 1/2$, the problem is difficult.
