Improved SpikeProp for Using Particle Swarm Optimization

A spiking neural network encodes information in the timing of individual spikes. A novel supervised learning rule for SpikeProp is derived to overcome the discontinuities introduced by the spiking thresholding. This algorithm is based on an error-backpropagation learning rule suited for supervised learning of spiking neurons that use exact spike time coding. SpikeProp demonstrates that spiking neurons can perform complex nonlinear classification in fast temporal coding. This study proposes enhancements of the SpikeProp learning algorithm for supervised training of spiking networks which can deal with complex patterns. The proposed methods include SpikeProp with particle swarm optimization (PSO) and an angle driven dependency learning rate. These methods are applied to the SpikeProp network for multilayer learning enhancement and weight optimization. Input and output patterns are encoded as spike trains of precisely timed spikes, and the network learns to transform the input trains into target output trains. With these enhancements, our proposed methods outperformed other conventional neural network architectures.


Introduction
For all three generations of neural networks, the output signals can be continuously altered by variation in synaptic weights (synaptic plasticity). Synaptic plasticity is the basis for learning in all ANNs. As long as the activation function does not vary, accurate classification based on certain vector input values can be implemented with the help of a BP learning algorithm such as gradient descent [1,2].
A spiking neural network (SNN) uses individual spikes in the time domain to communicate and to perform computation, much as real neurons do [3,4]. This method of sending and receiving individual pulses is called pulse coding, where the transmitted information is carried by the pulse rate. Hence, this type of coding permits multiplexing of data [2].
For instance, analysis of visual input in humans requires less than 100 ms for facial recognition. Yet, facial recognition was performed by [5] using an SNN with a minimum of 10 synaptic steps from the retina to the temporal lobe, allowing nearly 10 ms for the neurons to process. The processing time is short, but it is sufficient to permit the averaging procedure required by pulse coding [2,5,6]. In fact, the pulse coding technique is preferred when speed of computation is the issue [5].

Spiking Neural Networks (SNNs).
Neural networks that perform artificial information processing are built from processing units composed of linear or nonlinear processing elements (a sigmoid function is widely used) [6-9]. SNNs remained unexplored for many years because they were considered too complex and too difficult to analyze. Apart from that, biological cortical neurons have long time constants. Inhibition speed can be of the order of several milliseconds, while excitation speed can reach several hundreds of milliseconds. These dynamics can considerably constrain applications that need fine temporal processing [10,11].
Little is known about how information is encoded in time for SNNs. Although it is known that neurons receive and emit spikes, whether neurons encode information using spike rate or precise spike time is still unclear [12]. For those supporting the theory of spike rate coding, it is reasonable to approximate the average number of spikes in a neuron with continuous values and consequently process them with traditional processing units (sigmoid, for instance). In that view, it is not necessary to perform simulations with spikes, as computation with continuous values is simpler to implement and evaluate [13].
An important landmark study by Maass [14] showed that SNNs can be used as universal approximators of continuous functions. Maass proposed a three-layer SNN (consisting of the input layer, the generalization layer, and the selection layer) to perform unsupervised pattern analysis. Reference [15] applied spiking neural networks to several benchmark datasets (including internet traffic data, EEG data, XOR problems, 3-bit parity problems, and the Iris dataset) and performed function approximation and supervised pattern recognition [16].
One of the ongoing issues in SNN research is how the networks can be trained. Much research has been done on biologically inspired local learning rules [13,17], but these rules can only carry out unsupervised learning, so the networks cannot be trained to perform a given task. Classical neural network research became famous because of the error-backpropagation learning rule. With this rule, a neural network can be trained to solve a problem that is specified by a representative set of examples. Spiking neural networks can use a learning rule called SpikeProp, which operates on networks of spiking neurons and uses exact spike time temporal coding [18]. This means that the exact spike times of input and output spikes encode the input and output values.

Learning in Networks of Spiking Neurons (SpikeProp).
Learning in perceptron networks is usually performed by a gradient descent method [19] using the backpropagation algorithm [20], which explicitly evaluates the gradient of an error function. The same approach has been employed in the SpikeProp gradient learning algorithm [18], which learns the desired firing times of the output neurons by adapting the weight parameters in the Spike Response Model SRM0 [21]. Several experiments have been carried out on SpikeProp to clarify open issues, for example, the role of parameter initialization and negative weights [22]. The performance of the original algorithm can be improved by adding a momentum term [23]. SpikeProp can be further enhanced with additional learning rules for synaptic delays, thresholds, and time constants [24], which normally results in faster convergence and smaller network sizes for the given learning tasks. An essential speedup was achieved by approximating the firing time function using the logistic sigmoid [25]. Implementation of the SpikeProp algorithm on recurrent network architectures has shown promising results [26].
SpikeProp does not usually allow more than one spike per neuron, which makes it suitable only for the "time-to-first-spike" coding scheme [5]. Its adaptation mechanism fails for the weights of neurons that do not emit spikes. These difficulties arise because the creation or removal of a spike due to weight updates is highly discontinuous. ASNA-Prop has been proposed [24] to solve this problem by emulating the feedforward networks of spiking neurons with discrete-time analog sigmoid networks with local feedback, which are then used for deriving the gradient learning rule. It is also possible to estimate the gradient by measuring the fluctuations in the error function in response to perturbation of the dynamic neuron parameters [27].
SpikeProp adopts error backpropagation procedures, which have been used widely in the training of analog neural networks, to perform supervised learning [28]. SpikeProp does have weaknesses. The first weakness concerns sensitivity to parameter initialization values: if a neuron is still inactive after initialization, SpikeProp will not train its weights, because the neuron produces no spike. The second weakness is that SpikeProp is only suitable in cases where there is latency-based coding. The third weakness is that SpikeProp works only for SNNs where each neuron spikes only once in the simulation time. Finally, the SpikeProp algorithm has been designed for training the weights only. To address these weaknesses, several improvements to SpikeProp algorithms have been suggested [29].
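The time-to-first-spike behavior described above, and the weakness that a neuron with weak weights never fires, can be illustrated with a minimal sketch. It assumes a spike-response kernel of the form ε(s) = (s/τ)·exp(1 − s/τ), which is commonly paired with SpikeProp; the function names, the time constant, the threshold, and the simulation step are all hypothetical choices for illustration, not values taken from this study.

```python
import math


def spike_response(s, tau=7.0):
    """Assumed kernel: eps(s) = (s / tau) * exp(1 - s / tau) for s > 0, else 0."""
    if s <= 0.0:
        return 0.0
    return (s / tau) * math.exp(1.0 - s / tau)


def first_spike_time(input_times, weights, delays,
                     threshold=1.0, t_max=50.0, dt=0.01):
    """Return the first time the membrane potential reaches the threshold.

    Returns None when the neuron stays silent, which is the case in which
    SpikeProp has no firing time to differentiate and cannot train the weights.
    """
    steps = int(t_max / dt)
    for k in range(steps + 1):
        t = k * dt
        # Sum the weighted, delayed responses to every presynaptic spike.
        potential = sum(
            w * spike_response(t - t_in - d)
            for t_in, w, d in zip(input_times, weights, delays)
        )
        if potential >= threshold:
            return t
    return None
```

With two simultaneous input spikes, weights of 0.6 produce an output spike a few milliseconds after the inputs, while weights of 0.1 never reach the threshold and the function returns None.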

Methodology
We introduce new learning rules and an enhanced architecture for improving SpikeProp.

Enhancement of SpikeProp Architecture by PSO (PSO-SpikeProp) (Model 1).
In the proposed method, SpikeProp is accelerated using four basic PSO parameters: the acceleration constant for gbest, the acceleration constant for pbest, the time interval ($\Delta t$), which depends on the time of SpikeProp, and the number of particles used in SpikeProp.
The acceleration constants specify the swarm behavior of the particles in the simulation. The pull toward the gbest and pbest positions is set by the constants $c_1$ and $c_2$. The global best solution over the particles depends on the pulse time of SpikeProp and is weighted by the constant $c_1$. The individual personal best solution of each particle depends on the pulse time of SpikeProp and is weighted by the constant $c_2$. If $c_1$ is greater than $c_2$, the swarm moves around the global best solution. Conversely, when $c_2$ is greater than $c_1$, the swarm moves around the individual personal best.
The time interval $\Delta t$ defines the time of each pulse interval, that is, of each movement in the solution space. Reducing $\Delta t$ leads to finer-grained movement within the solution space and a higher time value of the highest pulse in SpikeProp on the node. The more particles there are in the swarm, the larger the portion of the problem space that is covered; hence, the optimization becomes slower.
Depending on the node that has the higher pulse time in the SpikeProp solution space, the parameters can be adapted to achieve better optimization. These basic PSO parameters are used by SpikeProp. There are also subparameters that depend on the dataset of the problem (such as particle dimension, number of particles, and the stopping condition). The number of particles in SpikeProp using PSO significantly affects the execution time, so there is a tradeoff between the size of the particle swarm and the execution time. PSO with a well-selected parameter set can perform well under all circumstances [30].
In this study, the parameters that have been used are summarized in Table 1, while the particle positions (weight and bias values) are initialized randomly, with the initial velocity set to 0.
Each particle position in the swarm represents a set of weights for the current iteration. The dimension of each particle is determined by the number of weights in the network. In order to minimize the learning error, the particle should move within the weight space. Updating the weights of the network means changing the particle position so that fewer iterations are needed. For each iteration, a new velocity is calculated to determine the movement to the new particle position. The new set of weights is used to obtain the new error and thus a new position. For PSO, the new weights are registered even if there is no noticeable improvement. This process applies to all particles. The global best particle position is the one with the lowest error. The training process stops when the target minimum error is reached or the number of computational steps exceeds the maximum number of iterations allowed. When the training is complete, the weights are used to compute the classification error for the training patterns. The same patterns are used to test the network by using the same set of weights.
To our knowledge, PSO has not previously been applied to SpikeProp. The gbest and pbest values are applied to solve problems associated with the learning error. The new SpikeProp weights and biases are obtained by adding the calculated velocity, as shown in (1) and (2). The new set of positions is used to produce the new learning error. In the proposed method, the classification output is obtained in a minimum number of iterations with the lowest error. The PSO-SpikeProp learning process, in which PSO is applied to the SpikeProp algorithm, is summarized in Figure 1. Equation (1), which gives the new weight ($w_{\text{new}}$), is applied to each element of the position with respect to gbest and pbest. In (2), the velocity update subtracts the current weight in each dimension from the corresponding dimension of the best vector and then multiplies the difference by a random number (between 0.0 and 1.0) and an acceleration constant ($c_1$ or $c_2$). A number of particles have been used to solve 8 different dataset problems. The objectives of the study are to reduce errors, to enhance the learning rate of SpikeProp, and to speed up the algorithm. Figure 1 shows that particle swarms with random initial rates have different mean squared errors (MSE). During the learning process, all particles move together toward pbest and gbest. The gbest fit is the optimum solution.
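The update loop described in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `pso_minimize`, the default parameter values, and the inertia factor are hypothetical (the text does not mention an inertia term; it is added here only to keep the sketch numerically stable), and `error_fn` stands in for a full SpikeProp forward pass that would return the MSE for a given weight vector. As in the text, $c_1$ weights the pull toward gbest and $c_2$ the pull toward pbest.

```python
import random


def pso_minimize(error_fn, dim, n_particles=20, c1=1.4, c2=1.4, inertia=0.7,
                 dt=1.0, max_iter=200, target_error=1e-6, seed=0):
    """Search a weight vector of length `dim` that minimizes `error_fn`."""
    rng = random.Random(seed)
    # Random initial positions (weights and biases), zero initial velocity.
    positions = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in positions]
    pbest_err = [error_fn(p) for p in positions]
    best_i = min(range(n_particles), key=lambda i: pbest_err[i])
    gbest, gbest_err = pbest[best_i][:], pbest_err[best_i]

    for _ in range(max_iter):
        if gbest_err <= target_error:  # stopping condition on the error
            break
        for i in range(n_particles):
            for d in range(dim):
                # Difference to the best vector, scaled by a random number
                # in [0, 1) and an acceleration constant.
                pull_g = c1 * rng.random() * (gbest[d] - positions[i][d])
                pull_p = c2 * rng.random() * (pbest[i][d] - positions[i][d])
                velocities[i][d] = inertia * velocities[i][d] + pull_g + pull_p
                # New weight = old weight + velocity * time interval.
                positions[i][d] += velocities[i][d] * dt
            err = error_fn(positions[i])  # the new position is always kept
            if err < pbest_err[i]:
                pbest[i], pbest_err[i] = positions[i][:], err
            if err < gbest_err:
                gbest, gbest_err = positions[i][:], err
    return gbest, gbest_err
```

In practice `error_fn` would decode the particle into network weights, run the spiking network on the training patterns, and return the resulting error; any cheap quadratic function is enough to check that the loop behaves as expected.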

Proposed 𝜇 Angle Driven Dependency Learning Rate (Model 2).
The proposed $\mu$ angle driven dependency learning rate is an extension of Chan's [31] adaptation of the learning rate and momentum during training, as used in BP. The proposed method enhances SpikeProp learning according to the angle between $\Delta w(t)$ and $\Delta w(t-1)$. The adaptation pivots on an angle of 90 degrees, in line with the Pythagorean theorem, under which the square of the hypotenuse equals the sum of the squares of the other two sides. If the angle is less than 90 degrees, the learning rate is increased inversely with the angle, but if the angle is larger than 90 degrees, the learning rate is decreased. These mathematical methods have been applied to enhance SpikeProp, as shown next.
The adapted learning rate is calculated by (3), and the adapted momentum is obtained from (4). As mentioned previously, the weights are then changed accordingly, and the learning rate is adapted much faster under the modified adaptation rule. Moreover, a backtracking strategy has been used, which reruns the learning step with half the learning rate if the total error increases. From this it can be concluded that the learning rate in SpikeProp improves faster than in standard SpikeProp.
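The angle-based adaptation can be illustrated with a short sketch. The exact formulas of this study are not reproduced here; the multiplicative rule below, which scales the learning rate by 1 + gain·cos(θ), is an assumption chosen because it increases the rate for angles below 90 degrees, leaves it unchanged at 90 degrees, and decreases it above 90 degrees. The names `cosine_between`, `adapt_learning_rate`, `gain`, `eta_min`, and `eta_max` are hypothetical.

```python
import math


def cosine_between(u, v):
    """Cosine of the angle between two weight-change vectors (0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return max(-1.0, min(1.0, dot / (norm_u * norm_v)))


def adapt_learning_rate(eta, delta_w, prev_delta_w,
                        gain=0.5, eta_min=1e-5, eta_max=1.0):
    """Raise eta when successive updates agree, lower it when they oppose.

    Assumed rule: eta_new = eta * (1 + gain * cos(theta)), clipped to
    [eta_min, eta_max]; theta is the angle between the two weight changes.
    """
    cos_theta = cosine_between(delta_w, prev_delta_w)
    new_eta = eta * (1.0 + gain * cos_theta)
    return min(eta_max, max(eta_min, new_eta))
```

Two identical successive updates (angle 0) raise a rate of 0.1 to about 0.15, two opposite updates (angle 180 degrees) lower it to about 0.05, and orthogonal updates leave it unchanged.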

Merging Model 1 with Model 2 for Enhancing SpikeProp (Model 3).
In order to obtain better performance and enhance the operation of SpikeProp, Model 1 and Model 2 have been merged, resulting in Model 3. Model 3 has an architecture that is partly PSO and partly the angle driven dependency learning rate system. The flowchart in Figure 2 shows the general working of Model 3.

Error Measurement
This study uses additional error functions, namely RMSE (9), MAPE (10), and MAD (11), to further validate the accuracy of the enhanced SpikeProp (SNN) models proposed here (Models 1, 2, and 3). From Table 2, RMSE is better than MSE for the training datasets of BTX, Wine, Heart, Iris, Diabetes, Breast Cancer, Liver, Hepatitis, and XOR, respectively. The RMSE is also better than the MAPE and MAD error measurements for all datasets. These results show in a direct way that networks of spiking neurons trained with the SpikeProp algorithm can carry out complex, nonlinear tasks in a temporal code. As the experiments indicate, the SpikeProp algorithm is able to perform correct classification on nonlinearly separable datasets under all types of error measurement when compared to traditional sigmoidal networks (BP), as shown in Table 3. Tables 2 and 3 show that BP generates higher errors than SpikeProp. Table 5 shows the findings of the model, which are based on ten independent runs on both the training and testing datasets. The average testing errors are calculated along with the standard deviations for all datasets.
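For reference, the four error measures named in this section can be computed as in the sketch below. It uses the textbook definitions, which are assumed to match the ones intended here: MAD is taken as the mean absolute deviation between target and output, and MAPE is expressed in percent and requires nonzero targets. The function names are illustrative.

```python
import math


def mse(targets, outputs):
    """Mean squared error."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)


def rmse(targets, outputs):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(targets, outputs))


def mape(targets, outputs):
    """Mean absolute percentage error, in percent (targets must be nonzero)."""
    return 100.0 * sum(abs((t - o) / t)
                       for t, o in zip(targets, outputs)) / len(targets)


def mad(targets, outputs):
    """Mean absolute deviation between targets and outputs."""
    return sum(abs(t - o) for t, o in zip(targets, outputs)) / len(targets)
```

Because RMSE is the square root of MSE, the two are not on the same scale: for errors smaller than 1 the RMSE is numerically larger than the MSE, which is worth keeping in mind when the measures are compared side by side.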

Analysis of the Proposed Model 1: PSO-Spikeprop.
As can be seen from Table 5, the standard deviations are small for all error rates on both the training set and the testing set in all datasets. The results of the proposed Model 2 demonstrate that the generalization of MSE is better for the Diabetes, Heart, and Wine datasets, while the results for the Liver, BTX, Hepatitis, Iris, Breast Cancer, and XOR datasets are the least competitive. The results also show that RMSE is better on all datasets and that the other error measurements are less competitive, except for the Diabetes dataset, which has better MSE values than RMSE values. In general, the diversity of error rates (MSE, RMSE, MAPE, and MAD) for the proposed Model 2 is considered better for all datasets. With this model, the error is lower than that of BP and standard SpikeProp.

Analysis of the Proposed Model 3: Hybridization of Model 1 and Model 2.
Just as a good strain is obtained by cross-breeding two good genes, it may be possible to obtain a good SpikeProp algorithm by merging (hybridizing) two good techniques. Therefore, this paper is concerned with merging Model 1 and Model 2 (to obtain Model 3), as illustrated in Figure 2.
In this section, we hybridize Model 1 (PSO-SpikeProp) and Model 2 (SpikeProp with angle driven dependency) to obtain better performance in terms of error rates (MSE, RMSE, MAPE, and MAD). As shown in Table 6, performance is better with the hybridization than with the previously proposed methods. The experiments are based on 10 independent runs for training and testing on all datasets, respectively. The results reveal that the generalization of error rates in RMSE is better for the BTX, Wine, Diabetes, Iris, and Heart datasets and least competitive for the Breast Cancer, Hepatitis, Liver, and XOR datasets. Similarly, the MSE error rate is less favorable for the Diabetes, Heart, Wine, Iris, Breast Cancer, Liver, Hepatitis, and BTX datasets, respectively. In short, Model 1 and Model 2 were hybridized into Model 3 in order to minimize the error and approach the optimum error value.

Analysis and Result for Proposed Methods Based on Error.
In this section, the spiking neural network and SpikeProp use an encoding that depends on spike timing, in which the first spike carries a higher weight than the last one. Since a biological neuron takes only a few milliseconds to process information, only a few spikes are required and emitted; however, the first few spikes, which carry the most information, can contribute to the whole learning process, as a Gaussian function is used for encoding. Figures 3-10 show the convergence of the error against the number of iterations for standard SpikeProp and for the proposed methods for improved SpikeProp on the Liver, Breast Cancer, BTX, Diabetes, Heart, Hepatitis, Iris, and Wine classification problems. The standard SpikeProp configuration had a much slower rate of convergence, as can be clearly seen in the plots. Although its error decreases gradually from the first to the last iteration, its MSE remains high on all data problems, as shown in Figures 3-10.
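The Gaussian encoding mentioned at the start of this subsection can be sketched as a population code in which each input value excites several neurons with overlapping Gaussian receptive fields, and strongly excited neurons fire early. The layout of the centers and widths below follows a scheme that is commonly used with SpikeProp and is an assumption here, as are the function name and all default values.

```python
import math


def gaussian_spike_times(x, x_min, x_max, n_neurons=8, t_max=10.0, beta=1.5):
    """Encode a scalar x as one spike time per encoding neuron.

    Assumed layout (requires n_neurons > 2): centers evenly spaced so that
    the outer fields extend half a width beyond [x_min, x_max], with
    sigma = width / beta. A response of 1 maps to time 0 (earliest spike),
    a response near 0 maps to a spike close to t_max.
    """
    width = (x_max - x_min) / (n_neurons - 2)
    sigma = width / beta
    times = []
    for i in range(1, n_neurons + 1):
        center = x_min + (2 * i - 3) / 2.0 * width
        response = math.exp(-0.5 * ((x - center) / sigma) ** 2)
        times.append(t_max * (1.0 - response))
    return times
```

An input that coincides with the center of one receptive field makes that neuron fire at time 0, while neurons whose centers are far away fire close to `t_max`; this is the sense in which the earliest spikes carry the most information.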

Analysis Error and Iterations of Model 1 (PSO-SpikeProp).
The convergence plots also show that the proposed methods for improved SpikeProp perform much better than standard SpikeProp. For the Liver dataset (Figure 3), the error of PSO-SpikeProp (Model 1) dropped dramatically from the first 10 iterations down to iteration 40, after which it declined only slightly. For the Breast Cancer dataset (Figure 4), the curve steps down at the start from an error near 0.5 to 0.38 and then descends gradually to 0.35 at the last iteration. PSO-SpikeProp (Model 1) is the first of the proposed methods; for the BTX dataset (Figure 5), its error fell quickly to 1.78 in the first 10 iterations and continued to drop to 1.27 at the last iteration. For the Diabetes dataset (Figure 6), the error fell quickly from 0.7 to 0.39 in the first 10 iterations and then dropped steadily until iteration 86, where it reached 0.24. From iteration 87 to the last iteration the curve sloped only slightly, ending at an error of 0.24.
For the Heart dataset (Figure 7), the error dropped dramatically from 0.45 to 0.28 in the first 15 iterations; the decline then slowed until iteration 67 at an error of 0.28, and the curve stayed steady at the same error until the last iteration. For the Hepatitis dataset (Figure 8), with the same model, the curve falls quickly from 0.76 to 0.49 in the first 15 iterations, slows down until iteration 38 at an error of 0.48, and stays nearly stable until the last iteration. For the Iris dataset (Figure 9), the plot drops very quickly in the first 10 iterations from an error of 1.105, then zigzags in the error range 0.433 to 0.410 until iteration 90, falls sharply until iteration 112 to an error of 0.351, and remains there until the last iteration. Finally, for the Wine dataset (Figure 10), the error drops steeply from 0.534 to 0.312 in the first 15 iterations and then stabilizes at this error until the last iteration. Figures 3-10 thus show how this model obtains the least error in the least number of iterations across all datasets.

Analysis Error and Iterations of Model 2.
In the second improvement to SpikeProp, the learning rate angle driven dependency (Model 2), the Liver dataset (Figure 3) shows a slight drop during the first 10 iterations at an error of around 0.75, followed by a steep fall until iteration 50 to an error of nearly 0.55, and then a slight decline until the last iteration, ending at an error of 0.4. For the Breast Cancer dataset (Figure 4), the plot falls gradually over the first 20 iterations from an error of 0.57 to 0.47. For the BTX dataset (Figure 5), the curve starts dropping in the first 5 iterations from an error of 1.73 to 1.72 and then fluctuates between iteration 6 and iteration 197 in the error range 1.73 to 1.31. Afterwards it drops steadily until iteration 250, over the error interval 1.30 to 1.25. For the Diabetes dataset (Figure 6), the error drops slowly during the first seven iterations to 0.47, then steps down quickly until iteration 20 to 0.35, and continues steadily until the last iteration, reaching 0.24. For the Heart dataset (Figure 7), the plot steps down from an error of 0.45 at the first iteration to 0.36 at the last iteration, staying stable over some iterations before continuing to descend. For the Hepatitis dataset (Figure 8), the error is stable at 0.786 for the first 5 iterations, then falls until iteration 80 to 0.549, and afterwards declines more slowly, in a sequence of steps, to 0.523 at the last iteration.
For the Iris dataset (Figure 9), the curve drops slowly in the first 10 iterations from an error of 1.111 to 1.040, accelerates until iteration 55 to an error of 0.53, and then slows down, reaching 0.438 at the last iteration. Finally, for the Wine dataset (Figure 10), Model 2 (learning rate angle driven dependency) is almost steady in the first 15 iterations, with the error going from 0.538 to 0.526, and then the drop accelerates continuously until the last iteration, reaching an error of 0.366.

Analysis Error and Iterations of Model 3 (SpikeProp with Angle Driven Dependency ($\mu$)).
For the merged model (PSO-SpikeProp and learning rate angle driven dependency), or Model 3, the plot for the Liver dataset in Figure 3 clearly shows the best improvement in error convergence and number of iterations. The curve falls impressively from an error slightly below 0.8 at the first iteration until iteration 20 and then descends gradually until the last iteration, reaching an error close to 0.354. The curve in Figure 5 shows a dramatic drop in the first 10 iterations, starting from an error of 1.95, after which the decline slows until the last iteration at an error of 0.97. The curves in Figure 4 show that this last model is much better than standard SpikeProp and the previous methods on the Breast Cancer dataset. The error steps down in a steady and fast manner.

Result and Analysis Comparison of the Proposed Methods in Terms of Accuracy.
This section presents the results of standard SpikeProp alongside the proposed methods for enhanced SpikeProp, measured in terms of accuracy. The experiments are based on 10 independent runs on training and testing for all datasets, respectively (refer to Tables 7 and 8 and Figures 11 and 12). As shown in Table 7 for training, when the first proposed method, PSO-SpikeProp (Model 1), is evaluated in terms of accuracy, its value for Breast Cancer is better than that of standard SpikeProp and the other proposed methods except Model 5. Regarding the BTX dataset, it is also better than standard SpikeProp and the other proposed methods except Model 3.
The generalization of accuracy for the proposed method PSO-SpikeProp is better than that of standard SpikeProp and the learning rate angle driven dependency (Model 2) in all datasets. The learning rate angle driven dependency, our second proposed method, is better than standard SpikeProp in all datasets. Finally, Model 3, the merged model (PSO-SpikeProp and learning rate angle driven dependency), is better in accuracy generalization than all other proposed methods and standard SpikeProp in all datasets, as illustrated in Figure 11 and Table 7.

Conclusions
We introduced several extensions to the SpikeProp learning algorithm that make it possible to learn not only the weights, but also the delays and synaptic time constants of the connections and the thresholds of the neurons. Due to these enhancements, a smaller network architecture can be used. This is mainly because delays can now be trained and need not be enumerated. The 8 simple datasets could be solved with the same precision as the original SpikeProp algorithm, with fewer errors (making the simulation and learning phase of the network much faster) and with faster learning convergence. To improve the performance of SpikeProp further, several models were proposed and two or more good architectures were hybridized (for instance, the hybridization of Model 1 and Model 2 to obtain Model 3). The purpose of hybridization is to leverage the best function of each component of the hybrid. As an example, Model 3 is the hybridization of Model 1, which is PSO-SpikeProp (enhancement of the SpikeProp architecture by PSO), with Model 2, which is the SpikeProp enhancement using the angle driven dependency learning rate. In Model 3, when the search position is far from the optimum, PSO is used to move the search point directly toward the optimum. When the search point is close to the optimum, Model 3 switches over to the SpikeProp enhancement using the angle driven dependency learning rate to reach the optimum position. A thorough analysis of the weight initialization problem is also required, because the convergence rate appears to be quite sensitive to it. Several techniques used in classic neural networks to speed up backpropagation learning could be added to SpikeProp to further speed up learning.

Figure 11: Results in training of the proposed methods in terms of accuracy.
Functions. The target of the SpikeProp algorithm is to learn a set of target firing times, denoted by $\{t_j^d\}$, at the output neurons $j \in J$ for a given set of input patterns $\{P[t_1 \cdots t_h]\}$, where $P[t_1 \cdots t_h]$ defines a single input pattern described by single spike times for each neuron $h \in H$. Given the desired spike times $\{t_j^d\}$ and actual firing times $\{t_j^a\}$, this error function is defined by
$$E = \frac{1}{2} \sum_{j \in J} \left( t_j^a - t_j^d \right)^2 .$$
This section presents the results of the study on learning in the SpikeProp network based on the proposed methods for improved SpikeProp. The results for all datasets involved are analyzed based on the convergence to MSE, RMSE, MAPE, and MAD, together with their classification performance. The results of the proposed methods for each dataset are analyzed based on performance (accuracy). For analysis purposes, the methods for improved SpikeProp are used to train and optimize the networks, comparing different measurements of error. The results of SpikeProp based on the proposed methods for improved SpikeProp are presented in the following subsections.
Results and Analysis of Standard SpikeProp.
This section presents the results of standard SpikeProp for all datasets. The results are analyzed based on the convergence to MSE, RMSE, MAPE, and MAD, together with their classification performance, as shown in Table 2. All experiments for standard SpikeProp are based on ten runs. From Table 2, RMSE is better than MSE for the training datasets of BTX, Wine,

Table 3
The RMSE is better than MSE for the training datasets of Diabetes, Heart, Breast Cancer, Liver, Hepatitis, Wine, Iris, BTX, and XOR, respectively. The MSE is also better than MAPE and MAD for all datasets except BTX, for which the MAD values are lower than the MSE values. This traditional BP network is widely used by other researchers for classification problems. Bohte et al. [18] designed SpikeProp as a BP learning strategy; the comparisons between BP and SpikeProp are given in Tables 2 and 3.

Table 4
SpikeProp gives the smallest error in MAPE compared to standard SpikeProp. However, PSO-SpikeProp stands out as better when RMSE is compared. Since the errors are squared before they are averaged, the RMSE gives relatively low error rates, although the MSE values are close to the RMSE values, as shown in Table 4. Therefore, Model 1: PSO-SpikeProp has its own good characteristics in generating

Table 2: Analysis for standard SpikeProp algorithm.

Table 3: Analysis for standard BP algorithm.