A global universality of two-layer neural networks with ReLU activations

In the present study, we investigate the universality of neural networks, which concerns the density of the set of two-layer neural networks in a function space. Many existing works handle convergence over compact sets. In the present paper, we consider global convergence by introducing a suitable norm, so that our results are uniform over every compact set.


Introduction
A neural network is a function that models the neuron system of a biological brain and is defined as an alternating composition of affine maps and nonlinear maps. The nonlinear map in a neural network is called the activation function. Over the last decade, neural networks have played a central role in machine learning, with a vast number of real-world applications.
We focus on two-layer feed-forward neural networks with ReLU (Rectified Linear Unit) activation, that is, functions f : R → R of the form f(x) = Σ_{j=1}^n c_j ReLU(a_j x + b_j), where n ∈ N and a_j, b_j, c_j ∈ R. Here, the function ReLU is called the rectified linear unit and is defined by ReLU(x) := max(x, 0).
The ReLU is one of the most popular activation functions for feed-forward neural networks in practical machine learning tasks.
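For concreteness, such a network is straightforward to evaluate. The following Python sketch is our illustration, not part of the paper; the helper name two_layer_relu and the sample parameters are ours. It computes f(x) = Σ_j c_j ReLU(a_j x + b_j):

```python
def relu(x):
    # The rectified linear unit: ReLU(x) := max(x, 0)
    return max(x, 0.0)

def two_layer_relu(x, params):
    # Evaluate f(x) = sum_j c_j * ReLU(a_j * x + b_j),
    # where params is a list of triples (a_j, b_j, c_j).
    return sum(c * relu(a * x + b) for (a, b, c) in params)

# Example: f(x) = ReLU(x) - ReLU(x - 1) is a bounded ramp:
# 0 for x <= 0, equal to x on [0, 1], and 1 for x >= 1.
params = [(1.0, 0.0, 1.0), (1.0, -1.0, -1.0)]
print(two_layer_relu(0.5, params))  # 0.5
```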
We consider the space of two-layer feed-forward neural networks, namely the linear space X := span{x ↦ ReLU(ax + b) : a, b ∈ R}. Then it is natural to ask whether X is dense in a given function space (topological linear space). Historically, the density of X in the space C(R) of continuous functions on R has been investigated by several authors ([2, 3, 5]), as it is important for finding a feed-forward neural network f ∈ X that approximates an unknown continuous function. Here, the topology of C(R) is generated by the seminorms h ↦ sup_{x∈K} |h(x)|, where K ranges over all compact sets in R. Thus, the approximation property of two-layer feed-forward neural networks makes sense only on a local domain.
In this study, we prove an approximation property of X in a global sense. More precisely, we prove that X is dense in the Banach subspace of C(R) defined by Y := {f ∈ C(R) : f/(1 + |·|) extends to a continuous function on R̄}, equipped with the norm ‖f‖_Y := sup_{x∈R} |f(x)|/(1 + |x|). Note that any element of Y, divided by 1 + |·|, is a continuous function over R̄ := R ∪ {±∞}. Our main result in this paper is as follows:

Theorem 1.1. The linear subspace X is dense in Y.
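Numerically, this global norm can be estimated on a finite grid. The sketch below assumes the natural norm ‖f‖_Y = sup_{x∈R} |f(x)|/(1 + |x|) suggested by the definition of Y; the helper name y_norm_estimate and the grid are our own illustration:

```python
def y_norm_estimate(f, grid):
    # Approximate ||f||_Y = sup_x |f(x)| / (1 + |x|) over a finite grid.
    return max(abs(f(x)) / (1.0 + abs(x)) for x in grid)

# The identity map x -> x belongs to Y: x / (1 + |x|) extends continuously
# to the extended line R u {+-infinity} with limits +-1, so its Y-norm is 1,
# attained only in the limit |x| -> infinity.
grid = [k / 10.0 for k in range(-1000, 1001)]  # samples of [-100, 100]
estimate = y_norm_estimate(lambda x: x, grid)  # 100/101, just below 1
```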
Before we conclude this section, we offer some words on existing results. See [8] for the L²-approximation over the real line. Other attempts have been made to understand neural networks through the Radon transform [1] or by considering other topologies [5, 7].

Proof of the main theorem
Lemma 2.2. The operator A : Y → C(R̄) defined by Af := f/(1 + |·|) is an isomorphism from Y onto C(R̄).
A tacit understanding here is that we extend f/(1 + |·|), which is initially defined over R, continuously to R̄.
Thus, any continuous functional on Y is realized by a Borel measure over R̄.
Our theorem can recapture the case where the underlying domain is bounded. Indeed, if the domain Ω is contained in [−R, R] for some R > 0, then for every f ∈ Y we have sup_{x∈Ω} |f(x)| ≤ (1 + R) sup_{x∈Ω} |f(x)|/(1 + |x|) ≤ (1 + R)‖f‖_Y, so that convergence in Y implies uniform convergence over Ω; this recovers the results by Cybenko [2] and Funahashi [3].

Now we start the proof of Theorem 1.1. As Cybenko did in [2], take any Borel measure µ over R̄ such that µ annihilates X. We will show that µ = 0. Once this is proved, the Riesz representation theorem shows that the only continuous linear functional that vanishes on X is the zero functional, and the Hahn-Banach theorem then implies that X is dense in Y.
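The inequality behind this reduction, sup_{|x|≤R} |f(x)| ≤ (1 + R)‖f‖_Y, follows from 1 ≤ 1 + |x| ≤ 1 + R on [−R, R] and can be checked numerically. The following sketch assumes ‖f‖_Y = sup |f(x)|/(1 + |x|); the test function and helper names are ours:

```python
import math

def sup_norm(f, xs):
    # sup of |f(x)| over the sample points
    return max(abs(f(x)) for x in xs)

def weighted_sup(f, xs):
    # sup of |f(x)| / (1 + |x|) over the sample points
    return max(abs(f(x)) / (1.0 + abs(x)) for x in xs)

R = 3.0
xs = [-R + 2.0 * R * k / 2000 for k in range(2001)]  # grid on [-R, R]
f = lambda x: x * math.cos(x) + 1.0  # an arbitrary continuous test function

# Pointwise, |f(x)| = (1 + |x|) * |f(x)|/(1 + |x|) <= (1 + R) * ||f||_Y,
# so the supremum over [-R, R] obeys the same bound.
lhs = sup_norm(f, xs)
rhs = (1.0 + R) * weighted_sup(f, xs)
```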

Remark that every compactly supported, continuous, piecewise-linear function belongs to X: if its nodes are t_0 < t_1 < · · · < t_N, it can be written exactly as Σ_{j=0}^N c_j ReLU(x − t_j), where c_j is the jump of the slope at t_j. Thus, any element of C_c(R) can be approximated by a function in X in the L∞-norm.
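To illustrate (our sketch, not the paper's argument): a piecewise-linear "hat" function is an exact combination of three ReLUs, hence lies in X, and sums of such hats approximate elements of C_c(R) uniformly. The helper name hat is ours:

```python
def relu(x):
    return max(x, 0.0)

def hat(x, a, b, c):
    # Piecewise-linear hat: 0 on (-inf, a], rising to 1 at b, falling
    # back to 0 at c, and vanishing on [c, inf). It is exactly a
    # combination of three ReLUs, so it belongs to the span X.
    s1 = 1.0 / (b - a)   # upward slope on [a, b]
    s2 = 1.0 / (c - b)   # downward slope on [b, c]
    return s1 * relu(x - a) - (s1 + s2) * relu(x - b) + s2 * relu(x - c)
```

For x ≥ c the three linear pieces cancel exactly, which is why the function has compact support despite each ReLU term being unbounded.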
Thus, we conclude that X is dense in Y.

Availability of data and material. No data or materials were used to support this study.
Competing interests. The authors declare that there are no conflicts of interest regarding the publication of this paper.
Funding. This work was supported by a JST CREST Grant (Number JPMJCR1913, Japan). This work was also supported by the RIKEN Junior Research Associate Program. The second author was supported by a Grant-in-Aid for Young Scientists (No. 19K14581), Japan Society for the Promotion of Science. The fourth author was supported by a Grant-in-Aid for Scientific Research (C) (No. 19K03546), Japan Society for the Promotion of Science.
Authors' contributions. The four authors contributed equally to this paper. All of them read the whole manuscript and approved the content of the paper.