A Novel Single Neuron Perceptron with Universal Approximation and XOR Computation Properties

We propose a biologically motivated, brain-inspired single neuron perceptron (SNP) with universal approximation and XOR computation properties. This computational model extends the input pattern and is based on excitatory and inhibitory learning rules inspired by neural connections in the human nervous system. The resulting SNP architecture can be trained by supervised excitatory and inhibitory online learning rules. The main features of the proposed single layer perceptron are its universal approximation property and low computational complexity. The method is tested on 6 UCI (University of California, Irvine) pattern recognition and classification datasets. Comparisons with a multilayer perceptron (MLP) trained by the gradient descent backpropagation (GDBP) learning algorithm indicate the superiority of the approach in terms of higher accuracy, lower time and spatial complexity, and faster training. Hence, we believe the proposed approach can be generally applicable to various problems such as pattern recognition and classification.


Introduction
In various computer applications such as pattern recognition, classification, and prediction, a learning module can be implemented by various approaches, including statistical, structural, and neural approaches. Among these methods, artificial neural networks (ANNs) are inspired by the physiological workings of the brain. They are based on a mathematical model of a single neural cell (neuron), named the single neuron perceptron (SNP), and try to resemble the actual networks of neurons in the brain. As a computational model, the SNP has particular characteristics such as the ability to learn and generalize. Although the multilayer perceptron (MLP) can approximate any function [1,2], the traditional SNP is not a universal approximator. The MLP can learn through the error backpropagation (EBP) algorithm, whereby the error of the output units is propagated back to adjust the connecting weights within the network. In the MLP architecture, increasing the number of neurons in the input layer, the output layer, or the hidden layer(s) significantly increases the number of learning parameters and the computational complexity of the algorithm. This problem is usually referred to as the curse of dimensionality [3,4]. Many researchers have therefore tried to propose more powerful single layer architectures and faster algorithms, such as functional link networks (FLNs) and Levenberg-Marquardt (LM) and its modified and extended versions [5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20].
In contrast to the MLP, the SNP and FLNs do not impose high computational complexity and are far from the curse of dimensionality. But because they lack the universal approximation property, the SNP and FLNs are not very popular in applications. In contrast to the previous knowledge about the SNP, this paper proposes a novel SNP model that can solve the XOR problem, and we show that it can be a universal approximator. The proposed SNP can solve the XOR problem only if an additional nonlinear operator is used. As illustrated in the next section, the universal approximation property of the SNP can be achieved simply by extending the input patterns and using the nonlinear operator max. Like functional link networks (FLNs) [21], the proposed SNP does not include hidden units or expand the input vector, but guarantees universal approximation. FLNs are single-layer neural networks that can be considered an alternative approach in data mining to overcome the complexities associated with the MLP [22], but they do not guarantee universal approximation.
Algorithm 1 (fragment). Input: initial random weights $w_1, w_2, \ldots, w_n, w_{n+1}$ and input bias $b$. (1) Take the $k$th learning sample ($k$th input pattern and target) ...
The paper is organized as follows. The proposed SNP and the universal approximation theorem are presented in Section 2. Section 3 presents the numerical results, where the proposed SNP is compared with the backpropagation MLP. There are various versions of the backpropagation algorithm; in classification problems, we compare with gradient descent backpropagation (GDBP) [23], which is the standard basic algorithm. Finally, conclusions are drawn in Section 4.

Proposed Single Neuron Perceptron
Figure 1 shows the proposed SNP. In the figure, the model is presented as an $(n+1)$-input, single-output architecture. The variable $p$ is the input pattern and the variable $t$ is the related target applied in the learning process (3). Let us extend the input pattern as follows:
$$p_{n+1} = \max_{j=1,\ldots,n} p_j. \quad (1)$$
The max operation increases the input dimension to $n+1$, so the new input pattern has $n+1$ elements. In Figure 1, the input pattern is illustrated by the vector $p_{1 \le j \le n+1}$, and the final output $E$ is calculated by the following formula:
$$E = f\Big(\sum_{j=1}^{n+1} v_j p_j + b\Big), \quad (2)$$
where $f$ is the activation function and $v_1, v_2, \ldots, v_{n+1}$ and $b$ are adjustable weights. The error is then obtained as
$$e = t - E, \quad (3)$$
and the learning weights can be adjusted by the following excitatory learning rule:
$$v_j = v_j + \alpha \max(e, 0)\, p_j, \quad j = 1, \ldots, n+1, \quad (4)$$
and then by an inhibitory rule (5), where $t$ is the target, $E$ is the output of the network, $e$ is the related error, and $\alpha$ is the learning rate. The bias $b$ can also be trained (6). It should be added that the max operation, applied to the input pattern and also in the learning phase, has been motivated by computational models of the limbic system in the brain [24][25][26]. The limbic system is an emotional processor in the mammalian brain [27][28][29]. In these models [24][25][26], the max operator prepares the output and input of the main parts of the limbic system.
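The forward computation and the excitatory update described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the extra input is taken to be the maximum of the original inputs, tansig is taken to be the hyperbolic tangent, and the numeric values are made up only to exercise the code.

```python
import math

def extend_pattern(p):
    """Append max(p) as the extra (n+1)-th input (assumed form of the extension)."""
    return list(p) + [max(p)]

def tansig(x):
    """Hyperbolic tangent sigmoid, used here as the 'tansig' activation."""
    return math.tanh(x)

def forward(p, v, b, f=tansig):
    """Output E = f(sum_j v_j * p_j + b), computed over the extended pattern."""
    q = extend_pattern(p)
    return f(sum(vj * qj for vj, qj in zip(v, q)) + b)

def excitatory_step(p, t, v, b, alpha):
    """One excitatory update: v_j <- v_j + alpha * max(e, 0) * p_j, with e = t - E."""
    q = extend_pattern(p)
    e = t - forward(p, v, b)
    return [vj + alpha * max(e, 0.0) * qj for vj, qj in zip(v, q)]

# Hypothetical numbers, chosen only for illustration.
p, t = [0.2, 0.8], 1.0
v, b, alpha = [0.1, 0.1, 0.1], 0.0, 0.5
E = forward(p, v, b)
v_new = excitatory_step(p, t, v, b, alpha)
print("output:", E)
print("weights after excitatory step:", v_new)
```

Because the update is gated by max(e, 0), the weights move only when the target exceeds the output; when the output overshoots the target, this step leaves them unchanged.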
In summary, the feedforward computation and backward learning algorithm of the proposed SNP, in online form and with the tansig activation function, are given in Algorithm 1.
In the algorithm, the learning rate $\alpha$ can be picked empirically or changed adaptively during the learning process according to adaptive learning methods [30,31].
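An online training loop in the spirit of the algorithm might look like the sketch below. Only the excitatory step follows the rule stated in the text; the inhibitory step (subtracting the learning rate times max(-e, 0) times the input) and the bias step (adding the learning rate times e) are assumptions made here by symmetry, and `train_snp`, `predict`, and the OR-style toy data are hypothetical.

```python
import math
import random

def predict(p, v, b):
    """SNP output with tanh ('tansig') activation on the max-extended pattern."""
    q = list(p) + [max(p)]
    return math.tanh(sum(vj * qj for vj, qj in zip(v, q)) + b)

def train_snp(samples, alpha=0.1, epochs=300, seed=0):
    """Online (sample-by-sample) training; returns the weights v and bias b."""
    rng = random.Random(seed)
    n = len(samples[0][0])
    v = [rng.uniform(0.0, 1.0) for _ in range(n + 1)]  # random initial weights in [0, 1]
    b = rng.uniform(0.0, 1.0)
    for _ in range(epochs):
        for p, t in samples:
            q = list(p) + [max(p)]
            e = t - predict(p, v, b)
            # Excitatory rule (as stated in the text).
            v = [vj + alpha * max(e, 0.0) * qj for vj, qj in zip(v, q)]
            # ASSUMPTION: inhibitory rule mirrors the excitatory one.
            v = [vj - alpha * max(-e, 0.0) * qj for vj, qj in zip(v, q)]
            # ASSUMPTION: bias follows the signed error.
            b += alpha * e
    return v, b

# Hypothetical toy data (logical OR), used only to exercise the loop.
OR_SAMPLES = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
v, b = train_snp(OR_SAMPLES)
for p, t in OR_SAMPLES:
    print(p, t, round(predict(p, v, b)))
```

Neither update uses the derivative of the activation function, which is consistent with the description of the method as derivative free.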
The proposed SNP solves the XOR problem. Consider a 2-1 architecture with the hardlim activation function and the following weights: $v_1 = -1$, $v_2 = -1$, $v_3 = 2$, and $b = -1$; thus,
$$E_{XOR} = \mathrm{hardlim}\big(2\max(p_1, p_2) - p_1 - p_2 - 1\big), \quad (7)$$
where hardlim is calculated by the following formula:
$$\mathrm{hardlim}(x) = \begin{cases} 1, & x \ge 0, \\ 0, & \text{otherwise}. \end{cases} \quad (8)$$
Since $E_{XOR}$ is in the form of (2), an SNP can compute the XOR function. The proposed model has a lower computational complexity than other methods, such as spiking neural networks [32], that have solved the XOR problem. The computational complexity of the proposed SNP is $O(n)$, and it benefits from very simple equations for adjusting the weights.
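The XOR claim can be checked by enumerating the four binary inputs. The sketch below assumes hardlim returns 1 for a nonnegative argument and 0 otherwise, and it assumes the symmetric weights $v_1 = v_2 = -1$, $v_3 = 2$, $b = -1$. With $v_2 = -2$ instead, the net input equals $-1$ for both $(0,0)$ and $(0,1)$, so those two patterns could not be separated by any threshold. The function names are hypothetical.

```python
def hardlim(x):
    """Hard-limit activation: 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0

def snp_xor(p1, p2, v=(-1, -1, 2), b=-1):
    """Two-input SNP with the extra input p3 = max(p1, p2)."""
    p3 = max(p1, p2)
    return hardlim(v[0] * p1 + v[1] * p2 + v[2] * p3 + b)

for p1 in (0, 1):
    for p2 in (0, 1):
        print(p1, p2, "->", snp_xor(p1, p2))
```

The max term contributes 2 whenever at least one input is active, and the two negative weights cancel it only when both inputs are active, which is what makes the single unit nonlinear in the original inputs.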
In the next section, we prove that SNP is a universal approximator and can approximate all real continuous functions.

Universal Approximation Theorem.
Let us ignore the activation function in the model and rewrite (2) as
$$E(p) = \sum_{j=1}^{n+1} v_j p_j + b. \quad (9)$$
Consider $Y$ as the set of all functions of the form (9) and $d_\infty(E_1, E_2) = \sup_{p \in U} |E_1(p) - E_2(p)|$ as a sup-metric; then $(Y, d_\infty)$ is a metric space [33]. The following theorem shows that $(Y, d_\infty)$ is dense in $(C[U], d_\infty)$, where $C[U]$ is the set of all real continuous functions on the compact set $U$. We use the following Stone-Weierstrass theorem to prove the theorem.
Stone-Weierstrass Theorem (see [33,34]). Let $Z$ be a set of real continuous functions on a compact set $U$. If (1) $Z$ is an algebra, that is, the set $Z$ is closed under scalar multiplication (according to [34], closure under addition and multiplication is not necessary for real continuous functions); (2) $Z$ separates points on $U$, that is, for every $x, y \in U$ such that $x \neq y$, there exists $f \in Z$ such that $f(x) \neq f(y)$; and (3) $Z$ vanishes at no point of $U$, that is, for each $x \in U$, there exists $f \in Z$ such that $f(x) \neq 0$, then the uniform closure of $Z$ consists of all real continuous functions on $U$; that is, $(Z, d_\infty)$ is dense in $(C[U], d_\infty)$.
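To make the sup-distance concrete, the following sketch estimates it on a regular grid for two models of the form (9). It is only an illustration: the domain is assumed to be the unit hypercube (inputs normalized to [0, 1]), a grid maximum is in general a lower bound on the true supremum, and `linear_snp` and `sup_distance` are hypothetical names.

```python
import itertools

def linear_snp(v, b):
    """Model without activation: sum_j v_j * p_j + v_{n+1} * max(p) + b."""
    def E(p):
        q = list(p) + [max(p)]
        return sum(vj * qj for vj, qj in zip(v, q)) + b
    return E

def sup_distance(E1, E2, n, steps=11):
    """Grid estimate of sup |E1(p) - E2(p)| over [0, 1]^n (assumed domain)."""
    grid = [i / (steps - 1) for i in range(steps)]
    return max(abs(E1(p) - E2(p)) for p in itertools.product(grid, repeat=n))

E1 = linear_snp([1.0, 0.0, 0.0], 0.0)  # E1(p) = p1
E2 = linear_snp([0.0, 0.0, 1.0], 0.0)  # E2(p) = max(p1, p2)
print(sup_distance(E1, E2, n=2))
```

For this pair the largest gap occurs at the corner $p = (0, 1)$, where the first model gives 0 and the second gives 1.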

Numerical Results
One parameter related to the computational complexity of a learning method is the number of learning weights adjusted in each epoch. A lower number of learning weights implies fewer computations and lower computational complexity. To evaluate the number of learning weights of the proposed SNP, we use the measure Rw. Rw can be used to compare the computational complexity of the proposed SNP and the MLP. A higher Rw shows that the SNP has a lower number of learning weights, and thus fewer computations and a lower computational complexity. Additionally, in classification problems, accuracy can be a proper performance measure to evaluate the algorithms. This measure is generally expressed as follows:
$$\text{accuracy} = \frac{\text{number of correctly classified patterns}}{\text{total number of patterns}}.$$
For all learning scenarios listed below, the training set contained 70% of the data, the testing set contained 15%, and the remaining 15% was used as the validation set. Input patterns have been normalized to [0, 1]. Output targets are binary digits (i.e., a single class is labeled by the digits "1" and "0," two classes are labeled as "01" and "10," three classes are labeled as "001," "010," and "100," and so on). The initial weights were also randomly selected in [0, 1].
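The data preparation and the accuracy measure described above can be sketched as follows. This is an illustrative outline under assumptions, not the experimental code: the helper names are hypothetical, min-max scaling is assumed for the normalization, the split is done by a seeded shuffle, and accuracy is taken as the fraction of correctly classified patterns.

```python
import random

def min_max_scale(rows):
    """Scale each attribute (column) to [0, 1] using its own min and max."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(x - l) / (h - l) if h > l else 0.0
             for x, l, h in zip(row, lo, hi)] for row in rows]

def split_70_15_15(items, seed=0):
    """Shuffle and split into 70% training, 15% testing, and the rest for validation."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(0.70 * len(items))
    n_test = round(0.15 * len(items))
    return (items[:n_train],
            items[n_train:n_train + n_test],
            items[n_train + n_test:])

def accuracy(predicted, target):
    """Fraction of patterns whose predicted label equals the target label."""
    correct = sum(1 for y, t in zip(predicted, target) if y == t)
    return correct / len(target)

# Hypothetical values, only to show the calls.
print(min_max_scale([[1, 10], [3, 20], [5, 30]]))
train, test, val = split_70_15_15(range(100))
print(len(train), len(test), len(val))
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```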
Before entering the comparative numerical studies, let us analyze the computational complexity. The proposed learning algorithm adjusts $O(2n)$ weights for each learning sample, where $n$ is the number of input attributes. In contrast, the computational time is $O(hn)$ for the MLP, where $h$ is the number of hidden neurons (the lowest $h$ is 2). Additionally, the GDBP MLP compared here is based on derivative computations, which impose high complexity, while the proposed method is derivative free. So, the proposed method has lower computational complexity and higher efficiency than the MLP. This improved computing efficiency can be important for online predictions, especially when the time interval between observations is small.
To test and assess the SNP in classification, 6 UCI datasets were used; the labeling was binary. Table 1 shows the information related to the datasets, including the number of attributes and instances. The SNP and MLP architectures, the number of learning weights, and Rw are also presented in the table. As illustrated in Table 1, the SNP reduces the number of learning weights by approximately 50% for each dataset.
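A rough parameter count shows where a reduction of about 50% can come from. The sketch below is a back-of-the-envelope check under assumptions: biases are counted as learning weights, the MLP has a single hidden layer, and `weight_reduction` is a hypothetical helper that is not claimed to be the paper's Rw.

```python
def snp_weight_count(n):
    """n input weights, one weight for the max input, and one bias."""
    return (n + 1) + 1

def mlp_weight_count(n, h, m=1):
    """Single-hidden-layer MLP n-h-m, counting biases as weights."""
    return (n + 1) * h + (h + 1) * m

def weight_reduction(n, h, m=1):
    """Fraction of learning weights saved by the SNP relative to the MLP."""
    return 1.0 - snp_weight_count(n) / mlp_weight_count(n, h, m)

for n in (4, 10, 30):
    print(n, snp_weight_count(n), mlp_weight_count(n, 2),
          round(weight_reduction(n, 2), 3))
```

With the smallest hidden layer considered in the text (two hidden neurons) and one output, the MLP has $2n + 5$ parameters against $n + 2$ for the SNP, so the saving approaches one half as $n$ grows.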
In the proposed SNP algorithm, we consider = 0, and the learning parameter values are shown in Table 1. The activation function was tansig and the stop criterion in the learning process was the maximum number of epochs. The maximum and minimum values of each dataset were determined, and the scaled data (between 0 and 1) were used to adjust the weights. The training was repeated 10 times and the average accuracy on the test set was recorded. Figure 2 presents the average accuracy and the confidence interval obtained from the SNP and the MLP. The SNP is more accurate than the MLP with the GDBP algorithm on some datasets. The results in Figure 2 are based on Student's t-test with 95% confidence.
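One common way to obtain an average and a 95% confidence interval from 10 repeated runs is the Student's t interval sketched below. Whether the reported intervals were computed exactly this way is an assumption; the function name and the accuracy values are hypothetical, and 2.262 is the two-sided 95% critical value of Student's t with 9 degrees of freedom.

```python
import math
import statistics

T_CRIT_95_DF9 = 2.262  # two-sided 95% critical value, 9 degrees of freedom

def mean_ci95_ten_runs(accuracies):
    """Return (mean, lower, upper) of a 95% t-interval for exactly 10 runs."""
    if len(accuracies) != 10:
        raise ValueError("this sketch hard-codes the critical value for 10 runs")
    m = statistics.mean(accuracies)
    half_width = T_CRIT_95_DF9 * statistics.stdev(accuracies) / math.sqrt(10)
    return m, m - half_width, m + half_width

# Hypothetical test-set accuracies from 10 repetitions.
runs = [0.91, 0.93, 0.90, 0.92, 0.94, 0.89, 0.92, 0.91, 0.93, 0.90]
print(mean_ci95_ten_runs(runs))
```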
Although, according to Figure 2, GDBP appears to be better in some cases, what is most important in the results is the number of learning epochs. Table 2 shows the learning epoch comparisons. According to Table 2, the MLP needs many more epochs to reach the results of the SNP. This is the main feature of the proposed SNP: fast learning with lower computational complexity, which makes it suitable for various applications, especially online problems.
Computational Intelligence and Neuroscience 5

Conclusion
In this paper, we prove that a single neuron perceptron (SNP) can solve the XOR problem and can be a universal approximator. These features are achieved by extending the input pattern and using the max operator. The SNP with this extension is a novel computational model of the neural cell that is trained by excitatory and inhibitory rules. This new SNP architecture works with fewer learning weights. Specifically, it only generates $O(2n)$ learning weights and only requires $O(2n)$ operations during each training iteration, where $n$ is the size of the input vector. Furthermore, the universal approximation property is theoretically proved for this architecture. The source code of the proposed algorithm is accessible from http://www.bitools.ir/projects.html. In numerical studies, the SNP was used to classify 6 UCI datasets. The comparisons between the proposed SNP and the backpropagation MLP lead to the following conclusions. Firstly, the number of learning parameters of the SNP is much lower than that of the standard MLP. Secondly, in classification problems, the performance of the supervised excitatory and inhibitory learning algorithm is higher than that of gradient descent backpropagation (GDBP). Thirdly, the lower computational complexity resulting from fewer learning parameters, together with the faster training of the proposed SNP, makes it suitable for real-time classification. In short, the SNP is a universal approximator with a simple structure and is motivated by neurophysiological knowledge of the human brain. We believe, based on the multiple case studies as well as the theoretical results in this report, that the SNP can be effectively used in pattern recognition and classification problems.