A Novel Kernel for RBF Based Neural Networks

Radial basis function (RBF) networks are well known to provide excellent performance in function approximation and pattern classification. The conventional RBF uses basis functions that rely on distance measures, such as the Gaussian kernel of the Euclidean distance (ED) between the feature vector and the neuron's center. In this work, we introduce a novel RBF artificial neural network (ANN) whose basis function is a linear combination of an ED-based Gaussian kernel and a cosine kernel, where the cosine kernel computes the angle between the feature and center vectors. The novelty of the proposed work lies in showing that there are scenarios in which two feature vectors (FVs) are more clearly distinguishable via the proposed cosine measure than via the conventional ED measure. We discuss adaptive symbol detection for M-ary phase shift keying (MPSK) signals as a practical example in which the angle information can be pivotal, which in turn justifies the proposed RBF kernel. To corroborate our theoretical developments, we investigate the performance of the proposed RBF on problems from three different domains. Our results show that the proposed RBF outperforms the conventional RBF in all three applications.


Introduction
A computational model for neural networks was first proposed by McCulloch and Pitts [1]. Since then, artificial neural networks (ANNs) have been recognized as a decision-making tool by many researchers [2][3][4]. ANNs are particularly useful for solving problems that are difficult to solve with conventional rule-based programming [5].
The simple yet powerful generalization capability of ANNs has drawn the attention of numerous past and present researchers [2,3,6,7]. It all started with Rosenblatt, who created the perceptron [8], a pattern recognition algorithm for supervised classification.
However, Rosenblatt's idea could not be extended to trainable multilayer networks until the development of the backpropagation algorithm, which has so far been the most popular algorithm in the ANN paradigm [9]. Thereafter, immense research was done in this field; in the last 50 years or so, the domain has grown extraordinarily, resulting in several sophisticated algorithms [6,10,11].
A radial basis function (RBF) network [12] is an ANN whose activation functions are radial basis functions. It was first introduced by Broomhead and Lowe [10], and it has since become a very popular methodology for problems that suit the ANN paradigm [11][12][13][14]. The main advantage of RBF networks over other ANN-based algorithms is the simplicity of computing the network parameters [12]. Another important feature of RBF neural networks is their ability to perform complex nonlinear mappings while allowing fast, linear, and robust learning [5]. Originally, RBF networks were developed for data interpolation in high-dimensional spaces [12]. Nonetheless, they have been used in diverse domains, including pattern classification [7], time series prediction [13], system control [14], and function approximation [15]. Some of the most commonly used basis functions are Gaussian functions [12], multiquadric functions [12], thin plate spline functions [12], and inverse multiquadric functions [12]. There is no general rule; the choice of a radial basis function is highly problem specific. Moreover, most RBF applications use a free shape parameter that plays a pivotal role in the accuracy of the method and is commonly chosen by cross-validation [16]. It is standard practice [12] to learn three sets of parameters for an RBF network: the locations (centers), the widths, and the weight factors of the RBF kernels. A large body of work [17][18][19] has addressed selecting these parameters optimally.
In the conventional RBF kernel, the Gaussian of the Euclidean distance between the feature vector and the neuron's center is mostly used [17]. However, there can be scenarios in which the Euclidean distance is not the dominant measure for separating features: for example, two feature vectors may lie at equal distances from a center but at unequal angles to it. In that case, the cosine of the angle can play a vital role in differentiating the feature vectors. We discuss such scenarios in detail in Section 3.
Motivated by this observation, we propose a novel RBF kernel consisting of a linear combination of Gaussian and cosine RBF kernels. The cosine RBF kernel computes the cosine of the angle between the input feature vector and the center vector associated with the neuron.
Several existing works in the literature have discussed the use of cosine measures with RBF kernels [20][21][22][23][24][25][26]. Karayiannis and Randolph-Gips [27] proposed a novel RBF that is a normalized version of the multiquadric radial basis function, in which the cosine represents the angle between the transformed vectors rather than the original vectors. Liu et al. [28] used a cosine similarity measure to achieve high classification performance by selecting meaningful features. They compute the cosine similarity among the kernels rather than among the original vectors; by doing so, they transform all the vectors to the same length, whereas we do not perturb the feature space. Moreover, these cosine kernels were developed for support vector machines (SVMs). Cho and Saul [29] used the arc cosine of the angle between inputs in their kernel.
To the best of our knowledge, existing works that incorporate a cosine measure inside kernels have used it either in SVM kernels [28,29] outside the ANN paradigm or in a transformed domain [30] within the ANN paradigm. Our work differs from all of them in several respects. First, we propose a cosine kernel in the original vector space rather than in a transformed space.
Second, we incorporate the effects of both the ED and cosine measures through a linear combination. Finally, unlike the existing works [27][28][29], we use the proposed kernel within an RBF ANN classifier.
To validate our theoretical developments, we investigate three different research problems: adaptive symbol detection of MPSK modulated symbols, pattern recognition, and nonlinear plant identification.
The rest of this paper is organized as follows. Section 2 provides an overview of the conventional RBF. Section 3 describes our proposed algorithm. Sections 4 and 5 provide a proof-of-principle of our method with application examples. Section 6 then discusses our results in detail.

Overview of the Conventional RBF
RBF networks in their general form consist of three layers: an input layer, a hidden layer where nonlinear activation functions operate, and a linear output layer, as shown in Figure 1. Generally, the input is a real vector, $\mathbf{x} \in \mathbb{R}^n$, and each output neuron maps the input vector to a scalar, $f: \mathbb{R}^n \to \mathbb{R}$, according to
$$y_j(\mathbf{x}) = \sum_{i=1}^{N} w_{ij}\,\phi_i(\mathbf{x}) + b_j, \qquad j = 1, \dots, N_0, \tag{1}$$
where $N$ and $N_0$ are the numbers of hidden and output layer neurons, respectively, $\mathbf{c}_i \in \mathbb{R}^n$ is the center of the $i$th hidden neuron, $w_{ij}$ is the output layer weight connecting the $i$th hidden neuron to the $j$th output neuron, $b_j$ is the bias term of the $j$th output neuron, and $\phi_i$ is the basis function associated with the $i$th hidden neuron. An RBF network solves a problem by mapping the input nonlinearly into a high-dimensional space and then applying a linear decision boundary. This transformation is justified by Cover's theorem, according to which a pattern classification problem cast nonlinearly into a high-dimensional space is more likely to be linearly separable than in a low-dimensional space [31].
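As an illustration of the output equation above, the following is a minimal Python sketch of the forward pass of an RBF network with a Gaussian kernel. The function names `gaussian` and `rbf_forward`, the $2\sigma^2$ scaling inside the Gaussian, and the list-based layout `weights[i][j]` are assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def gaussian(x: Vector, c: Vector, sigma: float = 1.0) -> float:
    """Gaussian of the Euclidean distance between input x and center c."""
    sq_dist = sum((xm - cm) ** 2 for xm, cm in zip(x, c))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def rbf_forward(
    x: Vector,
    centers: Sequence[Vector],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
    kernel: Callable[[Vector, Vector], float] = gaussian,
) -> List[float]:
    """Compute y_j(x) = sum_i w_ij * phi_i(x) + b_j for every output neuron j.

    weights[i][j] connects hidden neuron i to output neuron j (hypothetical layout).
    """
    # Hidden layer: one basis-function activation per center.
    phi = [kernel(x, c) for c in centers]
    # Linear output layer with one bias per output neuron.
    return [
        sum(phi[i] * weights[i][j] for i in range(len(centers))) + b
        for j, b in enumerate(biases)
    ]


# Two hidden neurons and one output neuron (toy values).
y = rbf_forward([0.0, 0.0], centers=[[0.0, 0.0], [1.0, 0.0]],
                weights=[[1.0], [2.0]], biases=[0.5])
```

Here the input coincides with the first center, so that neuron fires at its maximum value of 1, while the second neuron, one unit away, contributes $\exp(-1/2)$ scaled by its weight.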
Adding a bias to the output improves the approximation quality by shifting the decision boundary. The weights of the network govern the position of the decision boundary; however, if no bias is used during the adaptive weight update, the separating hyperplane is forced to pass through the origin of the space spanned by the hidden-layer outputs. Although this is acceptable for some problems, in many others the separating boundary needs to lie elsewhere.
As a general rule, all inputs are connected to each hidden neuron. The argument of the activation function is a norm, typically the Euclidean distance between the input and the center of each neuron. The most commonly used RBF kernels are as follows [5].

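Multiquadrics:
$$\phi_i(\mathbf{x}) = \left(\lVert \mathbf{x} - \mathbf{c}_i \rVert^2 + a^2\right)^{1/2}, \tag{2}$$
Inverse multiquadrics:
$$\phi_i(\mathbf{x}) = \left(\lVert \mathbf{x} - \mathbf{c}_i \rVert^2 + a^2\right)^{-1/2}, \tag{3}$$
Gaussian:
$$\phi_i(\mathbf{x}) = \exp\left(-\frac{\lVert \mathbf{x} - \mathbf{c}_i \rVert^2}{2\sigma^2}\right), \tag{4}$$
where $a > 0$ is a constant and $\sigma$ is the spread parameter. The sensitivity of a hidden neuron to a data point depends on the distance of the data point from its center. For example, in a conventional ED-based RBF network with a Gaussian kernel, this sensitivity can be tuned by adjusting $\sigma$: a large $\sigma$ implies lower sensitivity and vice versa. The weights and biases are usually updated adaptively using the following steepest descent rule [32]:
$$w_{ij}(k+1) = w_{ij}(k) + \eta\, e_j(k)\, \phi_i(\mathbf{x}(k)), \qquad b_j(k+1) = b_j(k) + \eta\, e_j(k), \tag{5}$$
where $\eta$ is the learning rate of the network and $e_j(k)$ is the error between the desired and the actual output of the $j$th output neuron at the $k$th iteration.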
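The multiquadric, inverse multiquadric, and Gaussian kernels, together with one steepest descent update of the output weights and bias, can be sketched as follows for a single output neuron ($N_0 = 1$). This is a minimal illustration assuming fixed centers and the Gaussian kernel inside the update; the helper names, the default learning rate, and the toy training target are hypothetical.

```python
import math


def sq_dist(x, c):
    """Squared Euclidean distance ||x - c||^2."""
    return sum((xm - cm) ** 2 for xm, cm in zip(x, c))


def multiquadric(x, c, a=1.0):
    return math.sqrt(sq_dist(x, c) + a ** 2)


def inverse_multiquadric(x, c, a=1.0):
    return 1.0 / math.sqrt(sq_dist(x, c) + a ** 2)


def gaussian(x, c, sigma=1.0):
    return math.exp(-sq_dist(x, c) / (2.0 * sigma ** 2))


def steepest_descent_step(x, d, centers, w, b, eta=0.05, sigma=1.0):
    """One update of the output weights w and bias b; centers stay fixed.

    Returns (new_w, new_b, e), where e = d - y is the error before the update.
    """
    phi = [gaussian(x, c, sigma) for c in centers]
    y = sum(wi * pi for wi, pi in zip(w, phi)) + b
    e = d - y
    new_w = [wi + eta * e * pi for wi, pi in zip(w, phi)]
    new_b = b + eta * e
    return new_w, new_b, e


# Fit a single target d = 1 at x = 0.5 with two 1-D centers (toy example).
w, b = [0.0, 0.0], 0.0
for _ in range(200):
    w, b, e = steepest_descent_step([0.5], 1.0, [[0.0], [1.0]], w, b)
```

With a fixed learning rate, the error on this single sample shrinks geometrically from one iteration to the next. A larger `sigma` makes `gaussian` decay more slowly with distance, which is the "lower sensitivity" described above.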

Proposed RBF
Intuition suggests that ED is not the only measure for contrasting FVs. For example, when FVs are equidistant from a center, the ED can no longer distinguish them.
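To deal with this issue, we propose a generalized RBF kernel that linearly combines the conventional ED-based RBF kernel and our proposed cosine-based RBF kernel:
$$\phi_i(\mathbf{x}) = \phi(\mathbf{x}, \mathbf{c}_i) = \alpha_1\,\phi_1(\mathbf{x}, \mathbf{c}_i) + \alpha_2\,\phi_2(\mathbf{x}, \mathbf{c}_i), \tag{6}$$
where $\alpha_1, \alpha_2 \in [0, 1]$, with $\alpha_1 + \alpha_2 = 1$, are the weightage parameters for the cosine and Euclidean kernels, respectively, and $\phi_1(\mathbf{x}, \mathbf{c}_i)$ and $\phi_2(\mathbf{x}, \mathbf{c}_i)$ are the cosine and Euclidean kernels, respectively, of the $i$th neuron. These are defined as follows:
$$\phi_1(\mathbf{x}, \mathbf{c}_i) = \frac{\mathbf{x} \cdot \mathbf{c}_i}{\lVert \mathbf{x} \rVert\,\lVert \mathbf{c}_i \rVert}, \tag{7}$$
$$\phi_2(\mathbf{x}, \mathbf{c}_i) = \exp\left(-\frac{\lVert \mathbf{x} - \mathbf{c}_i \rVert^2}{2\sigma^2}\right), \tag{8}$$
where $\mathbf{x} \cdot \mathbf{c}_i$ denotes the dot product of the two vectors. Consequently, (6) can be rewritten as
$$\phi(\mathbf{x}, \mathbf{c}_i) = \alpha_1 \frac{\sum_{m=1}^{n} x_m c_{im}}{\sqrt{\sum_{m=1}^{n} x_m^2}\,\sqrt{\sum_{m=1}^{n} c_{im}^2}} + \alpha_2 \exp\left(-\frac{\sum_{m=1}^{n} (x_m - c_{im})^2}{2\sigma^2}\right), \tag{9}$$
where $n$ is the length of each incoming feature vector $\mathbf{x}$. From (7), the kernel $\phi_1(\mathbf{x}, \mathbf{c}_i)$ computes the cosine of the angle between $\mathbf{x}$ and $\mathbf{c}_i$; hence, it takes values in the range $[-1, +1]$. A value of 1 implies that $\mathbf{x}$ is aligned with $\mathbf{c}_i$, a value of 0 corresponds to $\mathbf{x}$ being orthogonal to $\mathbf{c}_i$, and a value of $-1$ indicates that $\mathbf{x}$ and $\mathbf{c}_i$ point in opposite directions.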
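A minimal Python sketch of the proposed kernel (6) follows. It assumes the Gaussian form with $2\sigma^2$ in the Euclidean part, enforces $\alpha_2 = 1 - \alpha_1$, and leaves the cosine part undefined for zero vectors, as in (7); the function names are hypothetical.

```python
import math


def cosine_kernel(x, c):
    """Cosine of the angle between x and c, as in (7); x and c must be nonzero."""
    dot = sum(xm * cm for xm, cm in zip(x, c))
    return dot / (math.sqrt(sum(xm * xm for xm in x)) * math.sqrt(sum(cm * cm for cm in c)))


def euclidean_kernel(x, c, sigma=1.0):
    """Gaussian of the Euclidean distance, as in (8)."""
    sq = sum((xm - cm) ** 2 for xm, cm in zip(x, c))
    return math.exp(-sq / (2.0 * sigma ** 2))


def proposed_kernel(x, c, alpha1=0.5, sigma=1.0):
    """Linear combination (6): alpha1 * cosine + alpha2 * Euclidean, alpha2 = 1 - alpha1."""
    if not 0.0 <= alpha1 <= 1.0:
        raise ValueError("alpha1 must lie in [0, 1]")
    alpha2 = 1.0 - alpha1
    return alpha1 * cosine_kernel(x, c) + alpha2 * euclidean_kernel(x, c, sigma)
```

For instance, `cosine_kernel([1, 0], [1, 0])`, `cosine_kernel([0, 1], [1, 0])`, and `cosine_kernel([-1, 0], [1, 0])` return 1.0, 0.0, and -1.0, matching the aligned, orthogonal, and opposite cases described above.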
3.1. Some Special Scenarios Related to the Proposed RBF Kernel.
To gain more insight into the advantage of our proposed RBF kernel, consider two feature vectors $\mathbf{f}_1$ and $\mathbf{f}_2$ at distances $d_1$ and $d_2$ and angles $\theta_1$ and $\theta_2$, respectively, from the center vector $\mathbf{c}$ (see Figure 2). We explore some special cases of this situation in the following scenarios.

Scenario 1.
Consider $d_1 = d_2$ and $\theta_1 \neq \theta_2$. In this case, as Figure 2(a) shows, an RBF kernel based on the Euclidean distance cannot differentiate between $\mathbf{f}_1$ and $\mathbf{f}_2$. Using the cosine of the angle as the activation instead yields a kernel that works in this scenario. We therefore remove the ED component from (6) by setting $\alpha_2 = 0$ and $\alpha_1 = 1$, which gives the pure cosine kernel defined by (7) and (9). A similar scenario arises in the dataset of our pattern classification application discussed in Section 5.2. To support this argument, we performed a statistical analysis using Silhouette widths [33].
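A small numeric check of this scenario, using hypothetical 2-D vectors: $\mathbf{f}_1$ and $\mathbf{f}_2$ are both at distance 1 from $\mathbf{c}$, so the Gaussian Euclidean kernel (assumed here with $2\sigma^2$ and $\sigma = 1$) assigns them identical values, while the cosine kernel separates them.

```python
import math


def cosine_kernel(x, c):
    """Cosine of the angle between x and c (both nonzero)."""
    dot = sum(a * b for a, b in zip(x, c))
    return dot / (math.hypot(*x) * math.hypot(*c))


def euclidean_kernel(x, c, sigma=1.0):
    """Gaussian of the Euclidean distance between x and c."""
    return math.exp(-math.dist(x, c) ** 2 / (2.0 * sigma ** 2))


c = (2.0, 0.0)   # hypothetical center vector
f1 = (2.0, 1.0)  # distance 1 from c, angle atan(1/2) to c
f2 = (1.0, 0.0)  # distance 1 from c, collinear with c (angle 0)

ed_values = (euclidean_kernel(f1, c), euclidean_kernel(f2, c))
cos_values = (cosine_kernel(f1, c), cosine_kernel(f2, c))
print("Euclidean kernel:", ed_values)  # identical: ED cannot tell f1 from f2
print("Cosine kernel:   ", cos_values)  # different: the angle separates them
```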

Scenario 2. Consider $d_1 \neq d_2$ and $\theta_1 = \theta_2$.
As Figure 2(b) shows, in this scenario the Euclidean kernel is more suitable than ours, so we set $\alpha_1 = 0$ and $\alpha_2 = 1$ in (6); the resulting kernel is the Euclidean kernel defined in (8).

Scenario 3. Consider $d_1 \neq d_2$ and $\theta_1 \neq \theta_2$.
From (6), it is clear that we fuse the two kernels with certain weights [34]. The weightage parameters in (6) can be tuned to the different scenarios above to produce good classification results. For example, in Scenario 1 we have $\alpha_1 = 1$ and $\alpha_2 = 0$, and in Scenario 2 the parameters take the complementary values. In scenarios like Scenario 3 (see Figure 2(c)), however, the weightage parameters can take any values in the range $(0, 1)$, subject to the condition that $\alpha_1 + \alpha_2 = 1$.
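To illustrate how the weightage parameters trade off the two measures, the sketch below uses a hypothetical Scenario 1 pair (equal distances, unequal angles) and a hypothetical Scenario 2 pair (equal angles, unequal distances), with the Euclidean part assumed to be a Gaussian with $2\sigma^2$. Under these assumptions, $\alpha_1 = 1$ cannot separate the Scenario 2 pair, $\alpha_1 = 0$ cannot separate the Scenario 1 pair, and an intermediate value such as $\alpha_1 = 0.5$ separates both.

```python
import math


def proposed_kernel(x, c, alpha1, sigma=1.0):
    """alpha1 * cosine kernel + (1 - alpha1) * Gaussian of the Euclidean distance."""
    cos_term = sum(a * b for a, b in zip(x, c)) / (math.hypot(*x) * math.hypot(*c))
    ed_term = math.exp(-math.dist(x, c) ** 2 / (2.0 * sigma ** 2))
    return alpha1 * cos_term + (1.0 - alpha1) * ed_term


# Hypothetical (center, f1, f2) triples for the two scenarios.
scenario1 = ((2.0, 0.0), (2.0, 1.0), (1.0, 0.0))  # d1 == d2, theta1 != theta2
scenario2 = ((1.0, 0.0), (2.0, 0.0), (4.0, 0.0))  # d1 != d2, theta1 == theta2


def separation(triple, alpha1):
    """Absolute difference between the kernel responses to f1 and f2."""
    c, f1, f2 = triple
    return abs(proposed_kernel(f1, c, alpha1) - proposed_kernel(f2, c, alpha1))


for alpha1 in (0.0, 0.5, 1.0):
    print(alpha1, separation(scenario1, alpha1), separation(scenario2, alpha1))
```

A zero separation means the kernel responds identically to $\mathbf{f}_1$ and $\mathbf{f}_2$, so no downstream weight can tell them apart through that neuron.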
Another scenario arises when $d_1 = d_2$ and $\theta_1 = \theta_2$. In this case, although both the conventional and the cosine kernel individually fail to produce good results, we anticipate that a proper choice of weightage parameters may improve the results.

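Properties of the Proposed RBF Kernel.
The proposed kernel satisfies the following properties:
(1) $\phi(\mathbf{x}, \mathbf{c}_i) = \phi(\mathbf{c}_i, \mathbf{x})$ (commutativity);
(2) $\phi(\mathbf{c}_i, \mathbf{c}_i) = 1$;
(3) $|\phi(\mathbf{x}, \mathbf{c}_i)| \leq 1$.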
These properties can be justified using (6)-(8). For example, swapping $\mathbf{x}$ and $\mathbf{c}_i$ in (6) leaves the result unchanged; hence, the proposed kernel obeys the commutative property.
Similarly, substituting $\mathbf{x} = \mathbf{c}_i$ into (6) and using (7) and (8), the right-hand side of (6) reduces to the sum of the weightage parameters $\alpha_1$ and $\alpha_2$. As discussed earlier, this sum is always 1, which justifies the second property.
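Finally, (7) and (8) show that the magnitudes of both kernels are bounded above by 1; applying this to (6) gives $|\phi(\mathbf{x}, \mathbf{c}_i)| \leq \alpha_1 |\phi_1(\mathbf{x}, \mathbf{c}_i)| + \alpha_2 |\phi_2(\mathbf{x}, \mathbf{c}_i)| \leq \alpha_1 + \alpha_2 = 1$, which is the third property.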
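The three properties can also be checked numerically. The sketch below draws random nonzero vectors and verifies commutativity, self-similarity equal to 1, and the unit bound up to a floating-point tolerance. The sampling ranges, the tolerance, and the Gaussian form with $2\sigma^2$ are assumptions for illustration; a passing check is a sanity test, not a proof.

```python
import math
import random


def proposed_kernel(x, c, alpha1, sigma):
    """alpha1 * cosine kernel + (1 - alpha1) * Gaussian of the Euclidean distance."""
    cos_term = sum(a * b for a, b in zip(x, c)) / (math.hypot(*x) * math.hypot(*c))
    ed_term = math.exp(-math.dist(x, c) ** 2 / (2.0 * sigma ** 2))
    return alpha1 * cos_term + (1.0 - alpha1) * ed_term


def check_properties(trials=1000, dim=5, tol=1e-12, seed=0):
    """Return True if all three properties hold on `trials` random draws."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        c = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        alpha1 = rng.random()
        sigma = rng.uniform(0.1, 2.0)
        k_xc = proposed_kernel(x, c, alpha1, sigma)
        if abs(k_xc - proposed_kernel(c, x, alpha1, sigma)) > tol:  # property (1)
            return False
        if abs(proposed_kernel(c, c, alpha1, sigma) - 1.0) > tol:    # property (2)
            return False
        if abs(k_xc) > 1.0 + tol:                                    # property (3)
            return False
    return True
```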

Intuition from a Practical Example: Adaptive Symbol Detection of MPSK Modulated Symbols
In digital communication systems, MPSK modulation is a common and well-known practice. When such signals are transmitted through a noisy channel, the received symbols are dispersed around their original locations (see Figure 3), which severely degrades system performance. It is therefore crucial to design an efficient receiver capable of recovering the original symbols without errors. Many well-known methods exist in the literature for this purpose; our rationale for discussing this problem here is to show a practical example in which the angle measure is more discriminative than the ED measure. To gain more insight, observe the scatter plot of MPSK modulated signals (for $M = 16$ symbols) shown in Figure 4, which clearly shows that these modulated signals have equal amplitudes but differ in their phases. Intuitively, then, a more suitable receiver in this scenario is one that deals with the phases only, which makes our proposed kernel a preferable candidate. Our simulation study for this problem, presented in Section 5.1, supports this intuition: the proposed RBF kernel performs better than the conventional ED-based RBF kernel.
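To make this intuition concrete, the sketch below builds a unit-amplitude 16-PSK constellation as 2-D (in-phase, quadrature) vectors and detects a received point by its largest cosine to each symbol, that is, by phase alone. This nearest-phase rule is a simplified stand-in chosen for illustration, not the RBF receiver evaluated in Section 5.1, and the received point is hypothetical.

```python
import math


def psk_constellation(M=16):
    """Unit-amplitude M-PSK symbols as (I, Q) vectors at phases 2*pi*m/M."""
    return [(math.cos(2 * math.pi * m / M), math.sin(2 * math.pi * m / M)) for m in range(M)]


def cosine(x, c):
    """Cosine of the angle between two nonzero 2-D vectors."""
    return (x[0] * c[0] + x[1] * c[1]) / (math.hypot(*x) * math.hypot(*c))


def detect_by_phase(r, constellation):
    """Index of the symbol whose direction is closest to r (largest cosine)."""
    return max(range(len(constellation)), key=lambda m: cosine(r, constellation[m]))


symbols = psk_constellation(16)
# Hypothetical received point: symbol 5 with its amplitude scaled by 1.4
# and its phase rotated by 0.05 rad.
phase = 2 * math.pi * 5 / 16 + 0.05
received = (1.4 * math.cos(phase), 1.4 * math.sin(phase))
detected = detect_by_phase(received, symbols)
```

Because the cosine ignores the amplitude factor of 1.4, the decision depends only on the 0.05 rad phase offset, which is well within half of the $2\pi/16$ spacing between neighboring symbols.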

Comparative Study of the Conventional and the Proposed RBF
In the present work, we aim to show a proof-of-principle application of our proposed algorithm. We conducted a comparative study by applying both the conventional RBF and our proposed RBF to the following domains, in which RBF networks are frequently used.
In order to recover the original transmitted signals, we employ our proposed RBF kernel and compare its performance with that of the conventional ED-based RBF. For this purpose, we used four neurons in the hidden layer and one in the output layer, with learning rate $\eta = 0.05$ and spread $\sigma = 1$.

Pattern Recognition: Classification of Leaves.
Our dataset is taken from [35]. It contains three features (shape, margin, and texture) for the leaves of one hundred plant species; for each feature, a 64-element vector is given per leaf sample. We concatenated these features into a single feature vector per sample, so each feature vector has length 192. To perform binary classification, we chose the samples of the species Acer campestre and Zelkova serrata, which originally contained 16 instances per class. The problem to be solved is therefore the following: given these features, classify the leaves of these two species with the minimum possible classification error. We compared the convergence time, in terms of the number of epochs, and the accuracy of both approaches, the conventional ED-based RBF and the proposed RBF; the results are shown in Figures 4, 5 and 6 and are discussed in Section 5.2.
We used adaptive kernels for both the conventional and the proposed RBF. We tried different combinations of the learning rate $\eta$, the spread $\sigma$, and the number of hidden neurons $N$, and after an exhaustive search we obtained 100% accuracy at $\eta = 0.05$, $\sigma = 1$, and $N = 6$.

Control Theory: Nonlinear Plant Identification.
A vital step towards nonlinear plant identification is the development of a nonlinear model [36]. It is therefore important to develop models that are as accurate as possible for plants with highly nonlinear behavior. ANNs in general, and RBF networks in particular, have often been used for this purpose [37][38][39]. The system model is shown in Figure 9. In our study, we consider a highly nonlinear plant whose input and output are related by the following relation:
where $u(k)$ is the plant's input, $v(k)$ is the plant's disturbance, modeled as a zero-mean normally distributed random variable, the $a_i$'s are polynomial coefficients defining the system's zeros, and $\gamma > 0$ is a constant. In our experiment, we chose the polynomial coefficients as $a_1 = 2$, $a_2 = -0.5$, $a_3 = -0.1$, $a_4 = -0.7$, and $\gamma = 3$.
In Figure 9, $p(k)$ is the plant's impulse response and $y(k)$ is the final output of the plant; $\hat{p}(k)$ and $\hat{y}(k)$ are estimates of $p(k)$ and $y(k)$, respectively, and $e(k)$ is the estimation error. In this study, we generated the plant's disturbance with a variance of 0.0025.
Our method requires the norms of the vectors to be greater than zero, but the plant's input may occasionally contain zero values. To keep (7) meaningful here, we modify it as follows:
$$\phi_1(\mathbf{x}, \mathbf{c}_i) = \frac{\mathbf{x} \cdot \mathbf{c}_i}{\lVert \mathbf{x} \rVert\,\lVert \mathbf{c}_i \rVert + \epsilon},$$
where $\epsilon > 0$ is a very small number added to the denominator to avoid division by zero.
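A minimal sketch of the modified cosine kernel follows. The value $\epsilon = 10^{-8}$ is an assumption, since the text only requires a very small positive number, and the function name is hypothetical.

```python
import math


def cosine_kernel_eps(x, c, eps=1e-8):
    """Cosine kernel with eps added to the denominator to avoid division by zero."""
    dot = sum(a * b for a, b in zip(x, c))
    return dot / (math.hypot(*x) * math.hypot(*c) + eps)


zero_input = cosine_kernel_eps([0.0, 0.0], [1.0, 2.0])  # defined even for a zero input
aligned = cosine_kernel_eps([1.0, 2.0], [2.0, 4.0])     # slightly below 1 because of eps
```

A zero input now yields 0, so it is treated as orthogonal to every center. A nonzero $\epsilon$ also slightly shrinks the kernel for nonzero vectors, so perfectly aligned vectors give a value just below 1; keeping $\epsilon$ small keeps this bias negligible.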
As before, we used adaptive kernels for both the conventional and the proposed RBF. We tried different combinations of the learning rate $\eta$, the spread $\sigma$, and the number of hidden neurons $N$, and obtained the best identification performance at $\eta = 0.05$, $\sigma = 1$, and $N = 41$. The results of this application are discussed in Section 6.3.

Recovery of MPSK Modulated Signals in Noisy Environment.
In this application, we compare the total mean square error (MSE) of our algorithm with that of the conventional ED-based RBF. The total MSE of each algorithm is computed as the sum over epochs of the average squared error:
$$\text{MSE} = \sum_{k=1}^{K} E\left[\lvert \mathbf{o} - \hat{\mathbf{o}}_k \rvert^{2}\right],$$
where $K$ is the total number of epochs, $\mathbf{o}$ is the vector of true symbols, $\hat{\mathbf{o}}_k$ is the vector of predicted symbols at the $k$th epoch, and $E$ is the expectation operator denoting the ensemble average. For a comprehensive comparison, we investigated the performance of the two algorithms at three signal-to-noise ratio (SNR) values: 10 dB, 20 dB, and 30 dB. These results are reported in Table 1, which shows that the proposed RBF kernel outperforms the conventional one.
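A minimal sketch of the total MSE computation follows. It assumes that the ensemble average is approximated by the sample mean of the squared symbol errors within each epoch and that symbols are represented as Python complex numbers; the function name and the toy values are hypothetical.

```python
def total_mse(true_symbols, predictions_per_epoch):
    """Sum over epochs of the mean squared error between true and predicted symbols.

    true_symbols: sequence of complex symbols o.
    predictions_per_epoch: one sequence of predicted symbols per epoch k = 1..K.
    """
    total = 0.0
    for predicted in predictions_per_epoch:
        if len(predicted) != len(true_symbols):
            raise ValueError("each epoch must predict every symbol")
        sq_errors = [abs(o - o_hat) ** 2 for o, o_hat in zip(true_symbols, predicted)]
        total += sum(sq_errors) / len(sq_errors)  # sample estimate of E[|o - o_hat|^2]
    return total


# Two epochs on two symbols (toy values).
mse = total_mse([1 + 0j, -1 + 0j], [[1.1 + 0j, -0.9 + 0j], [1 + 0j, -1 + 0j]])
```

Because the per-epoch errors are summed rather than averaged over epochs, an algorithm that converges in fewer epochs accumulates a smaller total, so this measure reflects both accuracy and convergence speed.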
Moreover, the scatter plot of the recovered signals at 10 dB SNR, shown in Figure 13, indicates that the proposed RBF kernel effectively suppresses the effect of noise.

Classification of Leaves.
Through statistical data analysis, we evaluated our claim that, for our data, the cosine RBF is more powerful than its Euclidean counterpart. Silhouette widths, first described by Rousseeuw [33], provide a succinct graphical representation of how well each object lies within its cluster; in other words, they show how well the algorithm associates each data point with its center.
For validation with a standard approach, we used the k-means and Silhouette functions from the MATLAB Statistics Toolbox. The result is shown in Figure 3. Ideally, observations with large positive Silhouette values (∼1) are well clustered, those with values around 0 lie between clusters, and those with negative values are placed in the "wrong" cluster. Note that in Figure 3(a), where the Euclidean kernel is used, some samples have negative Silhouette values, indicating misclassification; in Figure 3(b), where the cosine kernel is used, no negative values remain.
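The Silhouette computation can be sketched in a few lines. The version below follows Rousseeuw's definition $s(i) = (b_i - a_i)/\max(a_i, b_i)$, where $a_i$ is the mean distance from point $i$ to the other members of its own cluster and $b_i$ is the smallest mean distance to the members of any other cluster. Unlike the MATLAB workflow above, it takes the cluster labels as given rather than running k-means, and it uses $1 - \cos$ as an assumed cosine distance. The toy points are hypothetical, chosen so that the two clusters differ by angle but overlap in Euclidean distance.

```python
import math


def euclidean_distance(p, q):
    return math.dist(p, q)


def cosine_distance(p, q):
    """1 - cosine of the angle between nonzero vectors p and q."""
    dot = sum(a * b for a, b in zip(p, q))
    return 1.0 - dot / (math.hypot(*p) * math.hypot(*q))


def silhouette(points, labels, dist):
    """Silhouette value of every point; requires at least two clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    values = []
    for i, p in enumerate(points):
        own = [j for j in clusters[labels[i]] if j != i]
        if not own:  # singleton cluster: s(i) = 0 by convention
            values.append(0.0)
            continue
        a = sum(dist(p, points[j]) for j in own) / len(own)
        b = min(sum(dist(p, points[j]) for j in members) / len(members)
                for lab, members in clusters.items() if lab != labels[i])
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


points = [(1.0, 0.0), (6.0, 0.0), (0.0, 1.0), (0.0, 6.0)]
labels = ["A", "A", "B", "B"]
s_euclidean = silhouette(points, labels, euclidean_distance)
s_cosine = silhouette(points, labels, cosine_distance)
```

On these points, the cosine distance gives every point a Silhouette value of 1, while the Euclidean distance gives the point (1, 0) a negative value, mirroring the qualitative contrast between Figures 3(a) and 3(b).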
This observation is directly supported by the test accuracies of both approaches. We started our experiments with 100 epochs for both the conventional RBF kernel and our proposed kernel and repeated the experiments over many runs, increasing the number of epochs by 50 in each subsequent run. Our kernel achieved 100% accuracy once it reached 150 epochs, as shown in Figure 8. By contrast, Figure 7 shows that the Euclidean kernel requires 3500 epochs to reach 100% classification accuracy, far more than the proposed kernel. This is also reflected in Figure 6, which compares the objective functions of the three approaches used (the conventional kernel, the proposed kernel, and the cosine part of the proposed kernel alone): both the proposed kernel and its cosine part alone work very well on this dataset, whereas the conventional kernel needs many more epochs to minimize the objective function.

6.3. Nonlinear Plant Identification.
The plant identification error for both RBFs is shown in Figure 10, which indicates that our method converges faster than the ED-based RBF. Moreover, Figure 11 shows that the proposed kernel emulates the plant's behavior very well except in transition phases, where spikes appear due to the state transitions of the input square wave. After each such transition, the proposed RBF also requires fewer iterations to re-adapt than the conventional RBF, as evident from Figure 11.

Conclusion
In this work, we have introduced a new generalized RBF kernel that fuses the conventional ED kernel and the proposed cosine kernel. The proposed RBF kernel promises good performance in scenarios where the angles between the feature and center vectors are discriminative. This is also observed in our application studies, where a statistical analysis of Silhouette widths shows why the proposed RBF kernel is more suitable in such scenarios.
To validate the performance of the proposed RBF kernel, we investigated three applications of diverse nature: adaptive symbol detection of MPSK modulated symbols, classification of leaves, and nonlinear plant identification. Our algorithm outperformed the conventional RBF kernel in terms of recognition accuracy and run time in epochs. We achieved 100% accuracy in pattern recognition with faster convergence; in adaptive symbol detection, the proposed RBF kernel suppressed the effect of noise more effectively; and in nonlinear plant identification, our kernel converged faster and traced the nonlinear plant output better than its conventional counterpart. We expect the proposed kernel to find interesting applications in other research areas as well.

Figure 1: Architecture of the radial basis function neural network.

Figure 3: (a) Silhouette plot for Euclidean RBF kernel: negative Silhouette indicates misclassification.(b) Silhouette plot for proposed RBF kernel: large Silhouette widths with no negatives.

Figure 7: RBF NN based modeling of a nonlinear plant.

Figure 8: Comparing plant identification error for the two RBF kernels: Euclidean and proposed.
: Nonlinear Plant Identification.Plants are dynamic systems with high complexity and nonlinearity.

Table 1: Comparison of the total mean square error.