A Smoothing Interval Neural Network

In many applications, it is natural to use interval data to describe various kinds of uncertainties. This paper is concerned with an interval neural network with a hidden layer. For the original interval neural network, the learning procedure may cause oscillation of the weights, as indicated in our numerical experiments. In this paper, a smoothing interval neural network is proposed to prevent the weight oscillation during the learning procedure. Here, by smoothing we mean that, in a neighborhood of the origin, we replace the absolute values of the weights in the hidden and output layers by a smooth function of the weights. The convergence of a gradient algorithm for training the smoothing interval neural network is proved. Supporting numerical experiments are provided.


Introduction
In the last two decades, artificial neural networks have been successfully applied to various domains, including pattern recognition [1], forecasting [2, 3], and data mining [4, 5]. One of the most widely used neural networks is the feedforward neural network with the well-known error backpropagation learning algorithm. But in most neural network architectures, the input variables and the predicted results are represented as single-point values, not as intervals. However, in real-life situations, available information is often uncertain, imprecise, and incomplete, which can be represented by fuzzy data, a generalization of interval data. So in many applications it is more natural to treat the input variables and the predicted results as intervals rather than as single-point values.
Since multilayer feedforward neural networks have a high capability as universal approximators of nonlinear mappings [6-8], several methods for handling interval data via neural networks have been proposed. For instance, in [9], the BP algorithm [10, 11] was extended to the case of interval input vectors. In [12], the author proposed a new extension of backpropagation using interval arithmetic, called Interval Arithmetic Backpropagation (IABP). This new algorithm permits training samples and targets that can be, indistinctly, points or intervals. In [13], the author proposed a new model of the multilayer perceptron based on interval arithmetic that facilitates handling input and output interval data, where the weights and biases are single-valued and not interval-valued.
However, weight oscillation phenomena during the learning procedure were observed in our numerical experiments with these interval neural network models. In order to prevent the weight oscillation, a smoothing interval neural network is proposed in this paper. Here, by smoothing we mean that, in the activation function and in a neighborhood of the origin, we replace the absolute values of the weights by a smooth function of the weights. Gradient algorithms [14-17] are applied to train the smoothing interval neural network. The weak and strong convergence theorems for the algorithms are proved. Supporting numerical results are provided.
The remainder of this paper is organized as follows. Some basic notions of interval analysis are described in Section 2. The traditional interval neural network is introduced in Section 3. Section 4 is devoted to our smoothing interval neural network and the gradient algorithm. The convergence results for the gradient learning algorithm are stated in Section 5. Supporting numerical experiments are provided in Section 6. The appendix is devoted to the proof of the theorem.

Interval Arithmetic
Interval arithmetic appeared as a tool in numerical computing in the late 1950s. Interval mathematics is a theory introduced by Moore [18] and Sunaga [19] in order to control errors in numerical computations. The fundamentals used in this paper are described below.
Let us denote intervals by uppercase letters such as A and real numbers by lowercase letters such as a. An interval can be represented by its lower bound a^L and upper bound a^U as A = [a^L, a^U], or equivalently by its midpoint a^C and radius a^R as A = (a^C, a^R), where

a^C = (a^L + a^U)/2,   a^R = (a^U - a^L)/2.   (2.1)

For intervals A = [a^L, a^U] and B = [b^L, b^U], the basic interval operations are defined by

A + B = [a^L + b^L, a^U + b^U],   kA = [k a^L, k a^U] for k >= 0,   kA = [k a^U, k a^L] for k < 0,   (2.2)

where k is a constant.
If f is an increasing function, then the interval output is given by

f([a^L, a^U]) = [f(a^L), f(a^U)].   (2.3)

In this paper, we use the following weighted Euclidean distance for a pair of intervals A and B:

d^2(A, B) = β (a^C - b^C)^2 + (1 - β)(a^R - b^R)^2.   (2.4)

The parameter β ∈ (0, 1) facilitates giving more importance to the prediction of the output centres or to the prediction of the radii. For β = 1, learning concentrates on the prediction of the output interval centre, and no importance is given to the prediction of its radius. For β = 0.5, both the centres and the radii have the same weight in the objective function. For our purpose, we assume β ∈ (0, 1).
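The operations above can be sketched in a few lines of Python using the midpoint-radius representation A = (a_C, a_R); all function and variable names here are illustrative, not from the paper.

```python
# Illustrative helpers for the interval notions of this section, in the
# midpoint-radius representation A = (a_C, a_R).

def to_mid_rad(a_L, a_U):
    # a_C = (a_L + a_U) / 2, a_R = (a_U - a_L) / 2   (cf. (2.1))
    return ((a_L + a_U) / 2.0, (a_U - a_L) / 2.0)

def add(A, B):
    # [a_L + b_L, a_U + b_U]: midpoints and radii both add
    return (A[0] + B[0], A[1] + B[1])

def scale(k, A):
    # k * [a_L, a_U]: the midpoint scales by k, the radius by |k|
    return (k * A[0], abs(k) * A[1])

def image(f, A):
    # for an increasing f: f([a_L, a_U]) = [f(a_L), f(a_U)]
    lo, hi = f(A[0] - A[1]), f(A[0] + A[1])
    return ((lo + hi) / 2.0, (hi - lo) / 2.0)

def dist2(A, B, beta=0.5):
    # weighted squared distance: beta*(a_C - b_C)^2 + (1 - beta)*(a_R - b_R)^2
    return beta * (A[0] - B[0]) ** 2 + (1.0 - beta) * (A[1] - B[1]) ** 2

A = to_mid_rad(1.0, 3.0)           # the interval [1, 3] -> (2.0, 1.0)
B = to_mid_rad(0.0, 2.0)           # the interval [0, 2] -> (1.0, 1.0)
print(add(A, B))                   # (3.0, 2.0), i.e. [1, 5]
print(scale(-2.0, A))              # (-4.0, 2.0), i.e. [-6, -2]
print(image(lambda t: t ** 3, A))  # [1, 27] -> (14.0, 13.0)
print(dist2(A, B))                 # 0.5 * 1 + 0.5 * 0 = 0.5
```

Note how a negative scalar leaves the radius positive: the radius of an interval is always nonnegative, which is why absolute values of the weights appear in the network equations below.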

Interval Neural Network
In this paper, we consider an interval neural network with three layers, where the input and output are interval-valued and the weights are real-valued. The numbers of neurons in the input, hidden, and output layers are N, M, and 1, respectively. Let W_m = (w_m1, w_m2, ..., w_mN)^T ∈ R^N, m = 1, 2, ..., M, be the weight vectors connecting the input and hidden layers. The weight vector connecting the hidden and output layers is denoted by W_0 = (w_01, w_02, ..., w_0M)^T ∈ R^M. To simplify the presentation, we write W = (W_0^T, W_1^T, ..., W_M^T)^T. In the interval neural network, a nonlinear activation function f(x) is used in the hidden layer, and a linear activation function in the output layer.
For an arbitrary interval-valued input X = (X_1, X_2, ..., X_N), where X_i = (x_i^C, x_i^R), i = 1, 2, ..., N, since the weights of the proposed structure are real-valued, the linear combination at the m-th hidden neuron results in an interval S_m = (s_m^C, s_m^R) given by

s_m^C = Σ_{i=1}^N w_mi x_i^C,   s_m^R = Σ_{i=1}^N |w_mi| x_i^R.   (3.1)
Then the output of the interval neuron in the hidden layer is given by

H_m = f(S_m) = [f(s_m^C - s_m^R), f(s_m^C + s_m^R)],   (3.2)

that is, h_m^C = (f(s_m^C + s_m^R) + f(s_m^C - s_m^R))/2 and h_m^R = (f(s_m^C + s_m^R) - f(s_m^C - s_m^R))/2.
Finally, the output of the interval neuron in the output layer is given by

y^C = Σ_{m=1}^M w_0m h_m^C,   y^R = Σ_{m=1}^M |w_0m| h_m^R.   (3.5)
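The forward pass of this section can be sketched compactly in midpoint-radius form. The componentwise formulas are our interpretation of (3.1)-(3.5), and tanh stands in for the increasing activation f; the weight values are arbitrary illustrations.

```python
import numpy as np

# Sketch of the interval forward pass in midpoint-radius form. The radius of a
# real-weighted combination of intervals uses |w|, which keeps it nonnegative.
def forward(W, W0, xC, xR, f=np.tanh):
    sC = W @ xC                       # hidden pre-activation midpoints
    sR = np.abs(W) @ xR               # hidden pre-activation radii (via |w_mi|)
    lo, hi = f(sC - sR), f(sC + sR)   # an increasing f maps bounds to bounds
    hC, hR = (lo + hi) / 2.0, (hi - lo) / 2.0
    yC = W0 @ hC                      # output midpoint (linear output layer)
    yR = np.abs(W0) @ hR              # output radius (via |w_0m|)
    return yC, yR

W = np.array([[0.5, -0.3], [0.2, 0.1]])   # M = 2 hidden neurons, N = 2 inputs
W0 = np.array([1.0, -2.0])
yC, yR = forward(W, W0, np.array([1.0, 2.0]), np.array([0.1, 0.2]))
print(yC, yR)   # the output radius yR is always >= 0
```

When all input radii are zero, the network reduces to an ordinary point-valued feedforward network, as expected.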

Smoothing Interval Neural Network Structure
As revealed in the numerical experiment later in this paper, weight oscillation phenomena appear during the learning procedure for the original interval neural network presented in the last section. In order to prevent the weight oscillation, we propose a smoothing interval neural network by replacing |w_mi| and |w_0m| with smooth functions φ(w_mi) and φ(w_0m) in (3.1) and (3.5). Then, the output of the smoothing interval neuron in the hidden layer is defined as

H_m = [f(s_m^C - s_m^R), f(s_m^C + s_m^R)],   with s_m^C = Σ_{i=1}^N w_mi x_i^C and s_m^R = Σ_{i=1}^N φ(w_mi) x_i^R.   (4.2)
The output of the smoothing interval neuron in the output layer is given by

y^C = Σ_{m=1}^M w_0m h_m^C,   y^R = Σ_{m=1}^M φ(w_0m) h_m^R.   (4.3)
For our purpose, φ(x) can be chosen as any smooth function that approximates |x| near the origin. For definiteness and simplicity, we choose φ(x) as a piecewise polynomial function:

φ(x) = |x| for |x| >= μ,   φ(x) = -x^4/(8μ^3) + 3x^2/(4μ) + 3μ/8 for |x| < μ,   (4.4)

where μ > 0 is a small constant. We observe that the above-defined φ(x) is a convex function in C^2(R), and it is identical to the absolute value function |x| outside the zero neighborhood (-μ, μ).
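One concrete choice can be checked numerically. The quartic coefficients below are the unique ones for an even degree-4 polynomial that matches |x| in value, slope, and curvature at x = ±μ; they are an assumption of this sketch rather than a quotation from the paper.

```python
def phi(x, mu=0.5):
    """C^2, convex smoothing of |x|, identical to |x| for |x| >= mu.
    The quartic is the unique even degree-4 polynomial matching |x|
    in value, first, and second derivative at x = +/- mu (assumed choice)."""
    if abs(x) >= mu:
        return abs(x)
    return -x ** 4 / (8 * mu ** 3) + 3 * x ** 2 / (4 * mu) + 3 * mu / 8

mu = 0.5
print(phi(1.0, mu))   # 1.0: outside (-mu, mu), phi coincides with |x|
print(phi(mu, mu))    # 0.5: the two pieces agree at the boundary x = mu
print(phi(0.0, mu))   # 3 * mu / 8 = 0.1875: strictly positive, no kink at 0
```

The central second difference of φ is nonnegative everywhere, consistent with the claimed convexity, and the one-sided slope of the quartic tends to 1 at x = μ, so the glue is C^1 (indeed C^2) at the boundary.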

Gradient Algorithm of the Smoothing Interval Neural Network
Suppose that we are supplied with a training sample set {X_j, O_j}_{j=1}^J, where the X_j are the input samples and the O_j are the corresponding ideal output samples. Our task is to find weights such that

O_j = Y(X_j),   j = 1, 2, ..., J.   (4.6)
But usually, a weight vector W = (W_0^T, W_1^T, ..., W_M^T)^T satisfying (4.6) does not exist; instead, the aim of the network learning is to choose the weights W so as to minimize an error function of the smoothing interval neural network. By (2.4), a simple and typical error function is the quadratic error function:

E(W) = (1/2) Σ_{j=1}^J [ β (y_j^C - o_j^C)^2 + (1 - β)(y_j^R - o_j^R)^2 ].   (4.7)
Let us denote f_j^C(t) = (1/2)(t - o_j^C)^2 and f_j^R(t) = (1/2)(t - o_j^R)^2, j = 1, 2, ..., J, t ∈ R; then the error function (4.7) is rewritten as

E(W) = Σ_{j=1}^J [ β f_j^C(y_j^C) + (1 - β) f_j^R(y_j^R) ].   (4.8)

Now, we introduce the gradient algorithm [15, 16] for the smoothing interval neural network. The gradient of the error function E(W) with respect to W_0 has the components

∂E(W)/∂w_0m = Σ_{j=1}^J [ β f_j^C′(y_j^C) h_{m,j}^C + (1 - β) f_j^R′(y_j^R) φ′(w_0m) h_{m,j}^R ],   m = 1, 2, ..., M,   (4.9)

where

h_{m,j}^C = (f(s_{m,j}^C + s_{m,j}^R) + f(s_{m,j}^C - s_{m,j}^R))/2,   h_{m,j}^R = (f(s_{m,j}^C + s_{m,j}^R) - f(s_{m,j}^C - s_{m,j}^R))/2.   (4.10)
The gradient of the error function E(W) with respect to W_m, m = 1, 2, ..., M, is given by

∂E(W)/∂w_mi = Σ_{j=1}^J [ β f_j^C′(y_j^C) w_0m ∂h_{m,j}^C/∂w_mi + (1 - β) f_j^R′(y_j^R) φ(w_0m) ∂h_{m,j}^R/∂w_mi ],   i = 1, 2, ..., N,   (4.11)

where

∂h_{m,j}^C/∂w_mi = (1/2)[ f′(s_{m,j}^C + s_{m,j}^R)(x_{i,j}^C + φ′(w_mi) x_{i,j}^R) + f′(s_{m,j}^C - s_{m,j}^R)(x_{i,j}^C - φ′(w_mi) x_{i,j}^R) ],
∂h_{m,j}^R/∂w_mi = (1/2)[ f′(s_{m,j}^C + s_{m,j}^R)(x_{i,j}^C + φ′(w_mi) x_{i,j}^R) - f′(s_{m,j}^C - s_{m,j}^R)(x_{i,j}^C - φ′(w_mi) x_{i,j}^R) ].   (4.12)
In the learning procedure, the weights W are iteratively refined as follows:

W^{k+1} = W^k + ΔW^k,   (4.13)

ΔW^k = -η E_W(W^k),   (4.14)

where η > 0 is a constant learning rate and k = 0, 1, 2, ....
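The whole procedure can be sketched end to end. This is a simplified illustration under stated assumptions, not the paper's implementation: it uses an assumed quartic smoothing φ, tanh for f, a tiny made-up training set, and central finite differences in place of the closed-form gradient E_W.

```python
import numpy as np

def phi(x, mu):
    # smoothed |x|: quartic inside (-mu, mu), |x| outside (assumed choice)
    return np.where(np.abs(x) >= mu, np.abs(x),
                    -x ** 4 / (8 * mu ** 3) + 3 * x ** 2 / (4 * mu) + 3 * mu / 8)

def net(W, W0, xC, xR, mu):
    # smoothing forward pass: phi(w) replaces |w| in the radius terms
    sC, sR = W @ xC, phi(W, mu) @ xR
    lo, hi = np.tanh(sC - sR), np.tanh(sC + sR)
    hC, hR = (lo + hi) / 2.0, (hi - lo) / 2.0
    return W0 @ hC, phi(W0, mu) @ hR

def E(theta, data, N, M, beta, mu):
    # quadratic error accumulated over the training samples
    W, W0 = theta[:M * N].reshape(M, N), theta[M * N:]
    total = 0.0
    for xC, xR, oC, oR in data:
        yC, yR = net(W, W0, xC, xR, mu)
        total += beta * (yC - oC) ** 2 + (1.0 - beta) * (yR - oR) ** 2
    return 0.5 * total

def train(data, N, M, eta=0.2, beta=0.5, mu=0.5, steps=300, h=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.uniform(-0.01, 0.01, M * N + M)  # small random initial weights
    e0 = E(theta, data, N, M, beta, mu)
    for _ in range(steps):
        # W^{k+1} = W^k - eta * E_W(W^k); gradient by central differences
        g = np.empty_like(theta)
        for i in range(theta.size):
            tp, tm = theta.copy(), theta.copy()
            tp[i] += h
            tm[i] -= h
            g[i] = (E(tp, data, N, M, beta, mu) - E(tm, data, N, M, beta, mu)) / (2 * h)
        theta -= eta * g
    return theta, e0, E(theta, data, N, M, beta, mu)

# a toy target: interval inputs with fixed radius 0.1, assumed outputs
data = [(np.array([x]), np.array([0.1]), 0.2 * x, 0.05) for x in (0.0, 0.5, 1.0)]
theta, e0, e1 = train(data, N=1, M=2)
print(e0, e1)   # the error decreases over the iterations
```

Because φ is C^2, the error function is smooth in the weights, which is exactly what the convergence analysis of the next section exploits.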

Convergence Theorem for SINN
For any x ∈ R^n, its Euclidean norm is denoted by ||x||. Let Ω_0 = {W ∈ Ω : E_W(W) = 0} be the stationary point set of the error function E(W), where Ω ⊂ R^{NM+M} is a bounded region satisfying (A2) below. Let Ω_{0,s} ⊂ R be the projection of Ω_0 onto the s-th coordinate axis, that is, Ω_{0,s} = {w_s : W = (w_1, w_2, ..., w_{NM+M})^T ∈ Ω_0}. To analyze the convergence of the algorithm, we need the following assumptions.
(A2) There exists a bounded region Ω ⊂ R^{NM+M} such that the weight sequence {W^k} remains in Ω.

(A3) The learning rate η is small enough such that (A.10) below is valid.
(A4) The set Ω_{0,s} does not contain any interior point for each s = 1, 2, ..., NM + M.
Now we are ready to present a convergence theorem for the learning algorithm. Its proof is given in the appendix.
Theorem 5.1. Let the error function E(W) be defined by (4.7), and let the weight sequence {W^k} be generated by the learning procedure (4.13) and (4.14) for the smoothing interval neural network, with W^0 being an arbitrary initial guess. If Assumptions (A1), (A2), and (A3) are valid, then we have

lim_{k→∞} ||E_W(W^k)|| = 0.   (5.3)

Furthermore, if Assumption (A4) also holds, then there exists a point W* ∈ Ω_0 such that

lim_{k→∞} W^k = W*.   (5.4)

Numerical Experiment
We compare the performances of the interval neural network and the smoothing interval neural network by approximating a simple interval function. For the above two interval neural networks, the error function E(W) is defined as in (4.7). But in order to see the error more clearly in the figures, we will also use a second error measure D. The number of training iterations is 2000, the initial weight vector is selected randomly from [-0.01, 0.01], and two neurons are used in the hidden layer. The fixed learning rate is η = 0.2, with β = 0.5 and μ = 0.5.
In the learning procedure for the interval neural network, we clearly see from Figure 1(a) that the gradient norm does not converge. Figure 2(a) shows that the error function D oscillates and does not converge. On the contrary, we see from Figure 1(b) that the gradient norm of the smoothing interval neural network converges. Figure 2(b) shows that the error function D, as well as E, is monotonically decreasing and convergent.
From this numerical experiment, we can see that the proposed smoothing interval neural network efficiently avoids oscillation during the training process.
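The source of the oscillation can be seen directly in the derivative at the origin: the one-sided slopes of |w| are -1 and +1, so a radius-gradient term flips sign whenever a weight crosses zero, while a smooth φ has a continuous derivative with φ′(0) = 0. A small numerical check (the quartic φ below is an assumed concrete smoothing, not necessarily the paper's exact formula):

```python
def phi(x, mu=0.5):
    # quartic smoothing of |x| inside (-mu, mu); |x| outside (assumed choice)
    if abs(x) >= mu:
        return abs(x)
    return -x ** 4 / (8 * mu ** 3) + 3 * x ** 2 / (4 * mu) + 3 * mu / 8

h = 1e-6
# |x|: the one-sided slopes at the origin disagree, so the gradient of the
# radius terms flips sign whenever a weight crosses zero.
left = (abs(0.0) - abs(-h)) / h      # -> -1.0
right = (abs(h) - abs(0.0)) / h      # -> +1.0
print(left, right)

# phi: the derivative is continuous, with phi'(0) = 0 and phi'(mu) = 1,
# so the radius gradient varies smoothly through zero.
d0 = (phi(h) - phi(-h)) / (2 * h)              # -> 0.0 (phi is even)
dmu = (phi(0.5 + h) - phi(0.5 - h)) / (2 * h)  # -> ~1.0 across x = mu
print(d0, dmu)
```

With the kink removed, a gradient step no longer reverses direction abruptly when a weight changes sign, which matches the monotone error curves observed for the smoothing network.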

Appendix
First, we give Lemmas A.1 and A.2. Then, we use them to prove Theorem 5.1.
For any a ∈ (γ_1, γ_2), there exists ε > 0 such that, since b_m travels between γ_1 and γ_2 with very small pace for all large enough m, there must be an infinite number of points of the sequence {b_m} falling into (a - ε, a + ε). This implies a ∈ S and thus (γ_1, γ_2) ⊂ S. Furthermore, (γ_1, γ_2) ⊂ S immediately leads to [γ_1, γ_2] ⊂ S. This completes the proof.
Lemma A.2. Suppose Assumptions (A2) and (A3) hold. Then for any k = 0, 1, 2, ... and 1 ≤ j ≤ J we have the estimates (A.3)-(A.10), where the constants M_i (i = 0, 1, 2, 3, 4, 5, 6) are independent of k and j, and ξ^C_{0,k,j} lies on the segment between Φ^C_{0,k+1,j} and Φ^C_{0,k,j}.
Proof. The proof of (A.3): For the given training sample set, by Assumption (A2), (4.2), and (4.4), it is easy to see that (A.3) is valid. The proof of (A.4): By (4.9) and (4.14), we have (A.11).
This proves (A.4). The proof of (A.5): Using the Mean Value Theorem, for any 1 ≤ m ≤ M, 1 ≤ j ≤ J, and k = 0, 1, 2, ..., we have (A.12), where t^1_{k,j,m} lies on the segment between s^C_{k+1,j,m} + s^R_{k+1,j,m} and s^C_{k,j,m} + s^R_{k,j,m}. By (A.3), we have (A.13)-(A.16), where τ^k_{1,m} lies on the segment between W^{k+1}_m and W^k_m. According to (A.16) and (A.13), we can obtain (A.17). By (A.17), for any 1 ≤ j ≤ J and k = 0, 1, 2, ..., we have (A.18). According to the definition of f^C_j(t), we get f^C_j′(t) = t - o^C_j; combining this with (A.3), we deduce that |f^C_j′(Φ^C_{0,k,j})| ≤ 2M_0. By (A.18), we have (A.19), where M_1 = βJM_0 max{1, 4M_0^4}. This proves (A.5). The proof of (A.6): Using the Taylor expansion, we get (A.20), where τ^k_{2,m} and τ^k_{3,m} both lie on the segment between W^{k+1}_m and W^k_m. Similarly, we can deduce (A.21) and (A.22), where τ^k_{4,m} and τ^k_{5,m} both lie on the segment between W^{k+1}_m and W^k_m. Combining with (A.20), we have (A.23).
By (A.23), we get (A.24), where (A.25) and (A.26) hold. This, together with (A.25), leads to (A.27), and together with (A.26) it leads to (A.28) and (A.29). Similarly, we can obtain (A.30). So by (A.28), (A.29), and (A.30), we have (A.31). With (A.23), similarly, we get (A.32), where τ^k_{6,m}, τ^k_{7,m}, τ^k_{8,m}, and τ^k_{9,m} lie on the segment between W^{k+1}_m and W^k_m, and t^5_{k,j,m} lies on the segment between s^C_{k+1,j,m} - s^R_{k+1,j,m} and s^C_{k,j,m} - s^R_{k,j,m}. By (A.32), we have (A.33), where (A.34) and (A.35) hold. By (A.34), we have (A.36); with (A.31), similarly, this together with (A.35) leads to (A.37). By (A.27), (A.31), (A.36), and (A.37), we obtain (A.38). This proves (A.6).

Discrete Dynamics in Nature and Society

The proof of (A.8): With (A.17), similarly, for any 1 ≤ j ≤ J and k = 0, 1, 2, ..., we can get (A.41). According to the definition of f^R_j(t), we get f^R_j′(t) = t - o^R_j; combining this with (A.3), we can obtain |f^R_j′(Φ^R_{0,k,j})| ≤ 2M_0. By (A.16) and (A.41), we deduce (A.42), where M_4 = (1 - β)JM_0 max{1, 4M_0^4}. This proves (A.8). The proof of (A.9): By |f^R_j′(Φ^R_{0,k,j})| ≤ 2M_0, (A.3), and (A.16), we get (A.43), where M_5 = (3/(2μ))(1 - β)JM_0^2. This proves (A.9).
The proof of (A.10): According to the definition of f^R_j(t), we get f^R_j″(t) = 1; combining this with (A.3) and (A.41), we have (A.44), where M_6 = (1 - β)JM_0^2 max{1, 4M_0^4}. This proves (A.10) and completes the proof of Lemma A.2. Now we are ready to prove Theorem 5.1.
Proof. Using the Taylor expansion and Lemma A.2, for any k = 0, 1, 2, ..., we have the corresponding estimate, where M_7 = M_1 + M_2 + M_3 + M_4 + M_5 + M_6, ξ^C_{0,k,j} lies on the segment between Φ^C_{0,k+1,j} and Φ^C_{0,k,j}, and ξ^R_{0,k,j} lies on the segment between Φ^R_{0,k+1,j} and Φ^R_{0,k,j}. Then we see that w_λ is an accumulation point of {W^k} for any λ ∈ (0, 1). But this means that Ω_{0,1} has interior points, which contradicts (A4). Thus, W* must be the unique accumulation point of {W^k}_{k=0}^∞. This proves (5.4) and completes the proof of Theorem 5.1.

Figure 1: Norm of the gradient for the interval neural network and the smoothing interval neural network during training.

Figure 2: Values of the error function D for the interval neural network and the smoothing interval neural network.
By Assumption (A1), the sequence {W^k}_{k∈N} has a subsequence {W^{k_i}}_{i∈N} that is convergent to, say, W* ∈ Ω_0. It follows from (5.3) and the continuity of E_W(W) that E_W(W*) = 0. This implies that W* is a stationary point of E(W). Hence, {W^k} has at least one accumulation point, and every accumulation point must be a stationary point. Next, by reduction to absurdity, we prove that {W^k} has precisely one accumulation point. Let us assume on the contrary that {W^k} has at least two accumulation points W̄ ≠ W̃. It is easy to see from (4.13) and (4.14) that lim_{k→∞} ||W^{k+1} - W^k|| = 0. Without loss of generality, we assume that the first components of W̄ and W̃ do not equal each other, that is, w̄_1 ≠ w̃_1. For any real number λ ∈ (0, 1), let w_λ = λw̄_1 + (1 - λ)w̃_1. Repeating this procedure, we end up with decreasing subsequences {m_{k_1}} ⊃ {m_{k_2}} ⊃ ... ⊃ {m_{k_{n_p+1}}}.