Coincidence Detection Using Spiking Neurons with Application to Face Recognition

$\tau_s$ is used as the learning parameter in representing the variations learned from a set of training data at the classifier level. This classifier uses a coincidence detection (CD) strategy trained in a supervised manner using a novel supervised learning method called $\tau_s$ Prediction, which adjusts the precise timing of output spikes towards the desired spike timing through iterative adaptation of $\tau_s$. This paper also discusses the approximation of spike timing in the Spike Response Model (SRM) for the purpose of coincidence detection. This approximation significantly speeds up the whole process of learning and classification. Performance evaluations with face datasets such as AR, FERET, JAFFE, and CK+ show that the proposed method delivers better face classification performance than the network trained with Supervised Synaptic-Time-Dependent Plasticity (STDP). We also found that the proposed method delivers better classification accuracy than k nearest neighbor (kNN), ensembles of kNN, and Support Vector Machines. Evaluation on several types of spike codings also reveals that latency coding delivers the best result for face classification as well as for classification of other multivariate datasets.


Introduction
Donald Hebb first proposed that if two neurons effectively cooperate in an activity, then the efficacy of the synapse between them would be strengthened. Since the cooperation between these neurons is more effective if it happens within a specific period of time, the idea of "Hebbian Plasticity" can also be considered a form of coincidence detection or neuronal synchronization between the inputs of the two neurons. Previous studies show that thalamic synchronization has a significant impact on cortical responsiveness and suggest that coincidence detection plays a critical role in sensory information transmission between different brain regions [1] as well as in phosphoinositide signaling [2]. Subsequently, the resulting Long-Term Potentiation (LTP), a phenomenon in which synaptic strength is enhanced following bursts of synaptic activity, is vital for learning and memory [3].
In this paper, we discuss two ways of learning and classification by coincidence detection, namely, (1) learning by weight adaptation in the form of Supervised STDP and (2) learning by synaptic time constant adaptation in the form of a novel approach called $\tau_s$ Prediction. These two strategies are both based on Hebbian plasticity, but their implementations are quite different. Here supervised learning rules are used to form the synaptic weights or synaptic time constants that represent the training data, and then the trained network is used for classification.
In the learning stage, the network is presented with a set of positive (negative) samples and is allowed a certain amount of time to fire spikes. If the designated neurons fail to fire spikes, the weights or synaptic time constants are adjusted based on the desired spike timing. This strategy results in higher (lower) weights or synaptic time constants for neurons that barely (easily) fire. In face identification, similar faces compete more than dissimilar faces; thus this strategy enforces stricter conditions for spike firing on neurons dedicated to facial regions with a high degree of similarity. This is done by imposing smaller weights or synaptic time constants on the synaptic connections. The restrictions ensure that only highly similar facial regions cause firing in the output neuron.
On the other hand, the process also imposes looser restrictions on dissimilar facial regions. In practice, the coincidence detection strategy causes neurons connected to similar faces to fire more easily and vice versa. Thus, from the point of view of coincidence detection and input synchrony, it can be hypothesized that if the synaptic connection of a presynaptic input pair from similar faces requires a larger weight or a larger synaptic time constant to facilitate output spike firing, then the input pair is connected to a facial area which possesses smaller discriminative capacity for face recognition.
Implicitly, to realize this hypothesis, it is assumed that (1) human beings exhibit similar intrapersonal variations, and thus (2) the weights and synaptic time constants learned from a set of generic face samples can represent the actual variations that might appear in unseen face samples.

Related Works
In this section we briefly review the idea of coincidence detection as proposed by Maass [4]. Then we highlight several supervised spiking neural network (SNN) learning methods. Subsequently, we discuss the basics of the Spike Response Model (SRM) used in this paper. Finally, we take a closer look at the multilayer supervised learning algorithm based on the Synaptic-Time-Dependent Plasticity (STDP) approach proposed by Sporea and Grüning [5].

Coincidence Detection Overview.
Spiking neurons can act as coincidence detectors for incoming input pulses by relaying the synchronized synaptic inputs and the exact timing of spikes [4,6,7]. Studies of the somatosensory [8] and visual systems [9] suggested that neuronal synchronization is critical in transmitting sensory information. This relies on the fact that synchronous input signals are more effective in producing higher firing rates of output spikes than asynchronous input signals.
Assuming the input pulses received encode some set of numbers, coincidence detection can determine whether some of these numbers have equal or almost equal values. This operation, if carried out on a more traditional type of ANN, is actually very expensive [4]. By the description of the basic idea of coincidence detection by spiking neurons [4], an output neuron does not fire if the input neurons fire at a temporal distance of $\geq c_2$, but it fires when the input neurons fire at a temporal distance of $\leq c_1$. If a set of $n$ input neurons is used to encode $n$ real numbers $x_1, x_2, \ldots, x_n$, the firing pattern {0 or 1} of the output neuron can be used to decode the input. For example, two cases, one with output 0 and one with output 1, are shown in Figure 1.
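The decision rule described above can be illustrated with a minimal Python sketch. The function name `coincidence_output` and the bound names `c1` and `c2` are illustrative assumptions rather than notation taken from the paper, and the sketch models only the input-output behaviour, not the neuron dynamics.

```python
from itertools import combinations

def coincidence_output(values, c1, c2):
    """Return 1 if any two values lie within c1 of each other, 0 if every
    pair is at least c2 apart, and None in the undefined gap in between."""
    gaps = [abs(a - b) for a, b in combinations(values, 2)]
    if any(g <= c1 for g in gaps):
        return 1
    if all(g >= c2 for g in gaps):
        return 0
    return None

# Four encoded values, all well separated: the detector stays silent.
print(coincidence_output([1.0, 4.0, 7.0, 10.0], c1=0.5, c2=2.0))  # 0
# Two of the four values nearly coincide: the detector fires.
print(coincidence_output([1.0, 4.0, 4.2, 10.0], c1=0.5, c2=2.0))  # 1
```

A conventional implementation has to compare all pairs explicitly, as this sketch does, which is the cost that a spiking coincidence detector avoids.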
Recent work showed that a simple SNN model constructed from integrate-and-fire neurons and a single coincidence-detector neuron can precisely read out subthreshold noisy signals [10]. The authors highlight that the two important parameters that determine the reliability and precision of the coincidence detection of the input pulses are the detection time window, which can be manipulated by $\tau_s$, and the threshold. They suggest that it is possible to obtain as much as 100% reliability of the outputs by choosing an optimal pair of detection time window and threshold.

Supervised Learning Methods for SNN.
Supervised learning for SNN is usually performed based on traditional gradient descent techniques. However, due to the nature of spiking neuron timing, some modifications or special methods are introduced to deal specifically with this temporal adaptation problem. Popular methods of supervised learning in SNN include a learning method designed specifically for SNN based on gradient descent by backpropagation of error, called SpikeProp [11,12], and the Remote Supervised Method (ReSuMe) [13]. Later, Sporea and Grüning [5] extended ReSuMe into a multilayer supervised learning algorithm based on STDP, where results of benchmarks on the XOR problem and the Iris dataset reveal successful implementation of the algorithm as well as the flexibility of this learning rule to learn different spike coding and timing patterns.
Recently, an SNN learning rule called Chronotron was proposed by Florian [14]. Xu et al. [15] then proposed a supervised multispike learning rule with temporal coding based on gradient descent that aims to solve the problem of error function construction and interference among multiple output spikes during learning. Another learning rule, called Spike Pattern Association (SPAN) [16], is based on the Widrow-Hoff learning rule and temporal coding and can associate multiple spatiotemporal spike patterns with a desired output spike pattern. Other methods include statistical methods [17,18], a linear algebra method [19], an evolutionary method [20], and Analog Spiking Neuron Approximation Backpropagation (ASNAProp) [21].

Spike Response Model.
The formulation of spiking neuron behavior in SNN implementations described by the integrate-and-fire model can be further simplified and represented by the SRM [22]. Let $w_{ij}$ be the weight between postsynaptic neuron $i$ and presynaptic neuron $j$, $\tau_s$ the synaptic time constant, $\tau_{rec}$ the recovery time constant, $I_{ext}$ the external current, $t_j^f$ the time of the presynaptic spikes, and $\hat{t}$ the time of the output spike, while $\eta$, $\varepsilon_{ij}$, and $\kappa$ are kernels, $\delta(s)$ is the Dirac delta function, and $s = t - t_j^f$. According to [22], the state of the membrane potential $u_i(t)$ can be computed from (1), where the kernel $\varepsilon_{ij}$ is an alpha function computed from (2).
In the SRM, each incoming spike from neuron $j$ at time $t_j^f$ perturbs $u_i$ to produce a postsynaptic potential (PSP), and the time course of $u_i$ as a result of the perturbation is defined by the kernel $\varepsilon_0$. If, after the summation of PSPs, $u_i$ reaches the threshold V, an output spike at time $\hat{t}$ is triggered. The form of the spike and the after-spike potential is described by the kernel $\eta$. The zero-order SRM can then be constructed by neglecting the dependence of $\varepsilon_{ij}$ and $\kappa$ upon the $t - \hat{t}$ argument, so the kernels are set so that $\varepsilon_0(s) = \varepsilon_{ij}(\infty, s)$ and $\kappa_0(s) = \kappa(\infty, s)$, respectively. Assuming that there is no external current discharged into the neuron, we let $I_{ext} = 0$, so that (1) reduces to (3). Therefore each presynaptic spike evokes a PSP with the same time course, independent of the index $j$ of the presynaptic neuron and independent of the last firing time $\hat{t}$ of the postsynaptic neuron. Thus the synaptic efficacies $w_{ij}$ and $\tau_s$ are the parameters responsible for scaling the amplitude of the PSPs and their "effective time interval," respectively.
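As a hedged illustration of the zero-order SRM without external current, the following Python sketch sums weighted alpha-shaped PSPs and steps through time to find the first threshold crossing. The specific kernel form $(s/\tau_s)\exp(1 - s/\tau_s)$, the function names, and the step size `dt` are assumptions made for this sketch; the paper's own equations (1) to (3) should be taken as authoritative.

```python
import math

def alpha_kernel(s, tau_s):
    """Alpha-shaped PSP kernel (assumed form): zero before the spike,
    peak value 1 at s = tau_s."""
    if s <= 0:
        return 0.0
    return (s / tau_s) * math.exp(1.0 - s / tau_s)

def membrane_potential(t, spike_times, weights, tau_s, eta0=0.0):
    """Zero-order SRM with no external current: a constant reset term
    plus the weighted sum of PSPs from all presynaptic spikes."""
    return eta0 + sum(w * alpha_kernel(t - tf, tau_s)
                      for tf, w in zip(spike_times, weights))

def first_crossing(spike_times, weights, tau_s, threshold, t_end, dt=0.01):
    """Discrete-time search: first t in [0, t_end] with u(t) >= threshold,
    or None if the threshold is never reached."""
    steps = int(round(t_end / dt))
    for k in range(steps + 1):
        t = k * dt
        if membrane_potential(t, spike_times, weights, tau_s) >= threshold:
            return t
    return None

# Two presynaptic spikes at 1 ms and 3 ms, equal weights of 5, tau_s = 3 ms.
print(first_crossing([1.0, 3.0], [5.0, 5.0], tau_s=3.0, threshold=7.0, t_end=20.0))
```

The cost of `first_crossing` grows with the number of time steps, which is the motivation for approximating the spike time instead of stepping through it.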
2.4. Supervised STDP Learning Method.
Using neurons described by the SRM in a fully connected feed-forward SNN with a single hidden layer, this learning rule is based on backpropagation of error [5]. The error is defined as the difference between the actual firing rate and the target firing rate for all neurons. It is similar to standard backpropagation in discrete time but is derived as a functional derivative in continuous time. Assuming that each neuron has only a single spike train, according to STDP learning [22][23][24][25], the weight change between output and hidden neurons, $\Delta w_{oh}$, and the weight change between hidden neurons and input neurons, $\Delta w_{hi}$, can be described by the weight update equations, where $A_+ > 0$ is the amplitude, $\tau_+ > 0$ is the time constant, $t_h$ is the hidden neuron firing time, $t_i$ is the input neuron firing time, $t_o$ is the output neuron firing time, and $t_d$ is the target neuron firing time. Note that the weight modification rules do not depend on the specific dynamics of the neuron model but only on the target, output, and input firing times, which makes them applicable to any neuron model.

Detailed Description on Coincidence Detection
To  (2) or none at all at the CD neuron.Since  is effectively determined by the spatiotemporal pattern of the input, then the objective of a CD neuron would be to facilitate an output spike at only certain range of  and depress firing at other instances of .
However, in order to achieve the specified objective of coincidence detection, the behaviors of learning parameters such as $\tau_s$, $w$, and V need to be closely examined. Descriptions on a few possible coincidence detection outcomes of presynaptic inputs
Thus, the PSP$_{1,\max}$ can be denoted as
For a CD neuron receiving two presynaptic spikes at a time, to ensure appropriate firing while avoiding firing facilitated by only a single presynaptic spike, the proper selection of the threshold should follow $\sum w \geq V > w$. It is assumed that there exists a minimum required synaptic time constant $\tau_{s,\min}$ that would cause a CD neuron to fire a spike. Any value of $\tau_s$ larger than $\tau_{s,\min}$ would definitely cause the CD neuron to fire, but with a larger delay. This can be summarized in
There are two distinctive regions which can be defined for the firing behavior of a CD neuron: (1) the firing region (FG) and (2) the nonfiring region (NFG). These regions are illustrated in Figure 3. According to Figure 3, any PSP that is strong enough to cause a CD neuron to fire reaches the threshold in the FG region, while any neuron that failed to fire in the FG region cannot fire in the NFG region. The two NFG regions are separated by an FG region, whose maximum time interval ends at the latest possible firing time $\tau_s + t_2^f$. The boundaries for these distinctive regions are given in the following equation.
Consider the case where the threshold V is set at the highest possible value, V = 2$w$. For this threshold, only a PSP resulting from two presynaptic spikes fired at the same time can reach it. Based on (3), assuming that the CD neuron is allowed to fire only once, the refractory kernel $\eta(t - \hat{t})$ can be fixed at $\eta_0 \leq 0$. Let $x = (t - t_j^f)/\tau_s$ and $\eta_0 = 0$, since a negative $\eta_0$ would yield a lower membrane potential; thus the maximum achievable membrane potential for this coincidence neuron would happen before or precisely at $\tau_s + t_2^f$.
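The threshold bounds can be motivated by a short worked derivation. It assumes that the PSP kernel is the alpha function $\varepsilon_0(s) = (s/\tau_s)\exp(1 - s/\tau_s)$ and that both inputs share the same weight $w$; both are assumptions of this sketch rather than statements recovered from the paper's equations.

```latex
\frac{d\varepsilon_0}{ds}
  = \frac{1}{\tau_s}\left(1 - \frac{s}{\tau_s}\right)\exp\!\left(1 - \frac{s}{\tau_s}\right) = 0
  \;\Longrightarrow\; s = \tau_s, \qquad \varepsilon_0(\tau_s) = 1 .
% A single presynaptic spike therefore peaks at
\max_s \, w\,\varepsilon_0(s) = w ,
% while two spikes arriving at the same instant peak at
\max_s \, 2w\,\varepsilon_0(s) = 2w .
% Firing on coincident pairs but never on a single spike thus requires
2w \;\geq\; V \;>\; w .
```

Under these assumptions, any threshold in this interval admits firing only when the two presynaptic spikes are sufficiently close in time, and the admissible temporal distance shrinks to zero as V approaches $2w$.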

Proposed Method
4.1. Output Spike Time Prediction.
One major problem of any spiking neuron model is the processing time taken to evaluate the exact level of membrane potential prior to any spike triggering. According to Makino [26], it is difficult to predict firing for a complex neuron model such as the SRM since it involves delayed firing and causality. Furthermore, approximation methods such as firing time prediction can be inexact, while exact simulation is limited to simple models [27]. Makino [26] proposes an event-driven SRM using an incremental partitioning method, which uses linear envelopes of the state variable of a neuron to partition the simulated time. The firing time can then be reliably calculated by applying the bisection-combined Newton-Raphson method to each resulting partition.
In the discrete-time approach, the system needs to accurately compute the level of membrane potential at each discrete time and also to update the state variables for precise output spike timing. If the number of discrete intervals $N$ (also known as the sampling rate $f_s$ in continuous-to-discrete signal conversion) within a specific period of time $\Delta t$ is large, then the evaluations take a considerably large amount of time. One way to overcome this problem is to reduce $N$, but this reduces the precision of the resulting output spike timing. Here we propose an approximation method called Output Spike Time Prediction (OSTP) SRM to solve this problem.
Consider the simplified SRM model in (3), where the output neuron $i$ is assumed to receive two presynaptic inputs from two presynaptic neurons $j = 1$ and $j = 2$ at two different times, which gives (10). Then, after a change of variable, we can write (10) as (11). For simplification of notation, let $A = \exp(1)[1 + \exp(d/\tau_s)]$ and $B = (d/\tau_s) \cdot \exp(1 + (d/\tau_s))$, where $d$ is the temporal distance between the two presynaptic spikes, so that the OSTP equation can be described as in (12). For the complete calculation, readers are referred to Appendix B. Then, using the numerical approximation discussed previously to solve for $x_1$, the estimated spike time $t_{est}$ can be obtained using (13). For $t_1^f = 1$ ms, $t_2^f = 3$ ms, $\tau_s = 5$, and $x_1 = [0, 5]$, a plot of the OSTP equation in (12) with 10 equally spaced values of the term $(V - \eta_0)/w$ from 0 to 3 is illustrated in Figure 4.
From Figure 4, several lines cross 0 while others do not, which indicates that no solution for $x_1$ can be obtained in those cases. The lines that cross 0 (blue lines) are drawn from variables that produce spike firings, while those that do not cross (red dashed lines) are drawn from variables that do not produce any spike firing. Thus, the precise spike firing time can be approximated provided that the number of discrete intervals between the boundaries of $x_1$ is sufficient for a precise estimate. Similar to the discrete-time case, OSTP relies on discrete intervals $N_{OSTP}$ within $\Delta x_1$ in order to produce an accurate estimate. However, unlike in the discrete-time approach, the number of discrete intervals within $\Delta x_1$ in OSTP does not have a significant effect on the overall processing time, as will be seen later.
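The idea of solving for the spike time in a normalized variable can be sketched in Python as follows. This is a reconstruction under stated assumptions, not the paper's exact procedure: it assumes the alpha kernel $(s/\tau_s)\exp(1 - s/\tau_s)$, equal weights $w$ on both inputs, and $V - \eta_0 > w$ so that a single spike cannot fire the neuron. With $x = (t - t_1)/\tau_s$ and $d = (t_2 - t_1)/\tau_s$, the threshold condition becomes $(Ax - B)e^{-x} = (V - \eta_0)/w$, where $A = e(1 + e^{d})$ and $B = d\,e^{1+d}$. The function name, the scan-plus-interpolation strategy, and the search window are illustrative choices.

```python
import math

def ostp_spike_time(t1, t2, tau_s, w, threshold, eta0=0.0, n_intervals=100):
    """Estimate the first threshold crossing for two input spikes (t1 <= t2).

    Scans x = (t - t1)/tau_s over the window t2 <= t <= t2 + tau_s for a
    sign change of (A*x - B)*exp(-x) - (threshold - eta0)/w and refines
    the crossing by linear interpolation. Returns None if there is none.
    """
    d = (t2 - t1) / tau_s
    a = math.e * (1.0 + math.exp(d))
    b = d * math.exp(1.0 + d)
    target = (threshold - eta0) / w

    def f(x):
        return (a * x - b) * math.exp(-x) - target

    lo, hi = d, d + 1.0                      # x at t = t2 and at t = t2 + tau_s
    step = (hi - lo) / n_intervals
    x_prev, f_prev = lo, f(lo)
    for k in range(1, n_intervals + 1):
        x = lo + k * step
        fx = f(x)
        if f_prev < 0.0 <= fx:
            # Linear interpolation inside the bracketing interval.
            x_cross = x_prev + (x - x_prev) * (-f_prev) / (fx - f_prev)
            return t1 + x_cross * tau_s
        x_prev, f_prev = x, fx
    return None

print(ostp_spike_time(1.0, 3.0, tau_s=5.0, w=1.0, threshold=1.5))   # a spike time
print(ostp_spike_time(1.0, 3.0, tau_s=5.0, w=1.0, threshold=2.5))   # None
```

The number of function evaluations depends only on `n_intervals`, not on the length of the simulated time span, which is consistent with the claim that the interval count has little effect on total processing time.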

Output of Coincidence Detection Network.
As explained earlier, any sufficiently close pair of presynaptic inputs produces output spikes. However, for reliable and accurate output spikes, two additional spikes or "cues," called the start cue and the end cue, are added to each output spike train. These two cues indicate the start and end of the CD neuron simulation. Without the start cue, the firing delay of each occurring spike cannot be accurately determined. The significance of the start cue has been discussed at length in the literature [11,28]. The time of the start cue is set when the first presynaptic spike arrives at the CD neuron. In contrast, without the end cue, some meaningful information encoded by presynaptic spikes that are unable to evoke any output spike would be lost, and this would affect the spike coding accuracy.
Another element which is particularly important for STDP supervised learning is the "imaginary spike." The imaginary spike is not used for classification; it is used to compute the error signal for supervised adjustment of the weights. This imaginary spike indicates the end of a designated time $t_{deg}$ if a presynaptic input fails to produce a spike within a certain period. As shown earlier, the maximum time for a presynaptic input pair to attain the highest membrane level and evoke a spike is $\tau_s + t_2^f$, so the designated time $t_{deg}$ for a spike to fire is set slightly larger than this maximum time. Note that the presynaptic inputs only maintain the temporal integrity of the input, while the spatial integrity is embedded in the spatial location L of the presynaptic neuron. To achieve a spatiotemporally reliable output spike pattern, it is vital to sequence the presynaptic inputs. This is done by allowing only a single pair of presynaptic inputs from neurons sharing the same spatial location to take part in evoking an output spike at one time. After an output or imaginary spike is produced, the pair from the next location is allowed to take part in evoking a spike, if any. This process is repeated until the end of the simulation, that is, until all presynaptic inputs have taken part in evoking output spikes.
In this implementation, the computation of the exact membrane potential and the term $\eta(t - \hat{t})$ would slow the whole network down. Thus, for simplicity, the current level of membrane potential $u_i$ (for the spiking case) and the membrane reset term $\eta(t - \hat{t})$ (for the nonspiking case) just before the next presynaptic input pair takes part in evoking the output spike are replaced by the membrane reset constants $\eta_0 = \eta_{no\,spike}$ and $\eta_0 = \eta_{spike}$, respectively, with respect to (10). This is feasible assuming that both spiking and nonspiking cases of coincidence detection produce spikes (actual and imaginary spikes). The output spike train of a CD neuron using continuous SRM, discrete-time SRM, and OSTP SRM in a CD classification network accepting 9 presynaptic input pairs is shown in Figure 5, where the parameters used are $w = 5$, $\tau_s = 3$ ms, V = 7 mV, $\eta_{no\,spike} = 2$ mV, and $\eta_{spike} = -2$ mV. In Figure 5, each bar represents the precise timing of output spikes (blue), imaginary output spikes (red), cues (dashed black), sequenced presynaptic inputs (green), and original presynaptic inputs (purple). The resulting output spikes for these 3 approaches are exactly the same. Imaginary output spikes are needed specifically to train the network by the Supervised STDP approach and are not used for classification. There are 2 cues (start and end, the latter co-occurring with the end of the output spike train), and 7 spikes and 2 imaginary spikes are produced by the CD neuron.
The number of actual spikes is generally less than the number of presynaptic input pairs; however, the total number of spikes (actual + imaginary) is equal to the number of presynaptic input pairs. Note that the original presynaptic inputs do not keep their spatial integrity intact, since they share several instances of similar presynaptic spike times and tend to cluster together in a temporal neural network.
The temporal sequencing of the input makes use of spatial locations by temporally rearranging the presynaptic inputs. Furthermore, it can be observed that the OSTP SRM implementation produces the same spikes as the discrete-time SRM.
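The bookkeeping behind sequenced outputs can be sketched as follows. The interface is hypothetical: `pair_delays` holds one firing delay per presynaptic pair (or `None` when the pair fails to fire), and advancing a running clock by the delay or by the designated time is a simplifying assumption of this sketch, not the paper's exact timing rule.

```python
def sequence_outputs(pair_delays, t_deg):
    """Serve presynaptic pairs one at a time; each pair contributes exactly
    one event: an actual spike, or an imaginary spike at the designated time."""
    events, clock = [], 0.0
    for delay in pair_delays:
        if delay is not None and delay <= t_deg:
            clock += delay
            events.append((clock, "actual"))
        else:
            clock += t_deg
            events.append((clock, "imaginary"))
    return events

# Nine pairs, two of which fail to fire within the designated time.
delays = [1.2, None, 0.8, 2.0, 1.1, None, 0.5, 1.9, 1.0]
events = sequence_outputs(delays, t_deg=3.5)
print(sum(kind == "actual" for _, kind in events),
      sum(kind == "imaginary" for _, kind in events))  # 7 2
```

Because every pair yields exactly one event, the total number of actual and imaginary spikes always equals the number of presynaptic pairs.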

Learning Process for Coincidence Detection.
We propose a novel learning approach called $\tau_s$ Prediction, which is carried out by approximating the synaptic time constant required to produce an output spike. Unlike Supervised STDP in [5], this learning process only requires positive (+1) class samples.
The main objective of $\tau_s$ Prediction is much simpler than that of Supervised STDP, namely, to evoke a spike with a small delay for a matching sample. Additionally, this process is assumed to implicitly depress the spike, or delay it longer, for a nonmatching sample, given that the temporal distances between the input pairs are greater than a certain range. Since changes in $\tau_s$ can facilitate, depress, or delay the output spike, using $\tau_s$ as the learning parameter should allow this type of training in a coincidence detection spiking neural network.
Assume that the training process needs to approximate the value of $\tau_s$ that would allow a pair of inputs $t_1^f$ and $t_2^f$ from a matching face to evoke a spike at the CD neuron output at the desired time $t = t_d$. By letting $x = 1/\tau_s$, from (10) we obtain (15). For the complete calculation, readers are referred to Appendix C. Then, $x$ can be solved by finding the zero crossing using the numerical approximation discussed earlier. After that, we can compute the approximated synaptic time constant $\tau_{s,approx}$ using (16). Finally, the change in synaptic time constant $\Delta\tau_{s,i,p}$ for each pixel $i$ belonging to local patch $p$, such that $i \in p$, can be computed using (17), whose parameters are the total number of training subjects (classes), the number of positive samples per training subject, and the learning constant. Since the teacher signal $t_d = t_2^f + \tau_s$ depends on the current state of $\tau_s$, it evolves along the training process, which gives it dynamic behavior. Additionally, there are instances of the function in (15) where the plot does not produce any zero crossing, similar to the no-solution case shown in Figure 4.
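A minimal sketch of the prediction step is given below. It is an illustration under assumptions, not the paper's equation (15): it assumes the alpha kernel $(s/\tau_s)\exp(1 - s/\tau_s)$, equal weights $w$ on both inputs, and a desired time $t_d$ later than both input spikes. It scans $y = 1/\tau_s$ for the first zero crossing of the threshold condition and refines it by bisection; choosing the first crossing (the largest admissible $\tau_s$) is a design choice of this sketch.

```python
import math

def predict_tau_s(t1, t2, t_d, w, threshold, eta0=0.0, n_intervals=1000):
    """Find tau_s such that the two-spike membrane potential equals the
    threshold at the desired time t_d; return None if no such tau_s exists
    in the scanned range."""
    s1, s2 = t_d - t1, t_d - t2            # elapsed times, both must be positive
    target = (threshold - eta0) / w        # assumed positive

    def g(y):
        return (s1 * y * math.exp(1.0 - s1 * y)
                + s2 * y * math.exp(1.0 - s2 * y) - target)

    y_hi = 1.0 / min(s1, s2)               # both kernels decay in y beyond this
    step = y_hi / n_intervals
    y_prev = 0.0                           # g(0) = -target < 0
    for k in range(1, n_intervals + 1):
        y = k * step
        if g(y) >= 0.0:
            lo, hi = y_prev, y
            for _ in range(60):            # bisection refinement
                mid = 0.5 * (lo + hi)
                if g(mid) >= 0.0:
                    hi = mid
                else:
                    lo = mid
            return 1.0 / hi
        y_prev = y
    return None

print(predict_tau_s(1.0, 3.0, t_d=6.0, w=5.0, threshold=7.0))    # a time constant
print(predict_tau_s(1.0, 3.0, t_d=6.0, w=5.0, threshold=11.0))   # None, no zero crossing
```

The second call corresponds to the no-zero-crossing situation: with a threshold above $2w$, no synaptic time constant can make the pair fire.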
Therefore, the approximated $\tau_s$ can take the value of  simply as a resulting error from sensitivity (true positive rate), which is given as (18), where, for $\tau_s$ Prediction, the spike errors for true positives (TP) and false negatives (FN) are each defined as a difference. Thus, from (18) the total spike error follows.
4.4. Face Classification Using Coincidence Detection.
In order to apply coincidence detection as a classifier, the CD neurons are used as output neurons in a 2-layer feed-forward neural network. Using the local ensemble strategy for face recognition employed in [29,30], one CD neuron is attached to each local patch. Thus the number of CD neurons in the classifier network is equal to the total number of local patches. The number of input neurons, however, depends on the dimension $d$ of the local patch. Since coincidence detection evaluates the synchronization between a gallery input and a probe input, each CD neuron has $2d$ presynaptic neurons and connections. The detailed network connections and elements comprising a single CD neuron as output neuron are shown in Figure 6.
The output spike train for each local patch from the CD neuron is then fed to a summation stage, where the outputs of all CD neurons in the network are evaluated to produce vectors called Non-Coincidence Factors (NCF), denoted $\mathrm{NCF}_p$ for each local patch $p$. The NCF describes the degree of coincidence between an input pair, where smaller values of $\mathrm{NCF}_p$ indicate higher coincidence between the inputs and thus higher matching probability, and vice versa. Different spike codings can be used to interpret the output spike trains and, in this paper, the performance of several spike codings, namely, latency, rate by spike count (denoted simply as "rate" afterwards), rank order (RO), and time to first spike (TTFS), is investigated. Equations (20) to (23) describe the spike codings used and the associated NCF as follows.

[Figure: delay (ms) or spike count (f) for the latency, rate, RO, and TTFS spike codings.]
Latency. Consider the following:
Rate (Spike Count). Consider the following:
RO. Consider the following:
TTFS. Consider the following:
At the summation stage, the NCF obtained from the different spike codings indicates the gallery image which has the highest probability of being the correct match of the probe. For a more detailed description of the classification model, the delays and spike firing counts of a CD neuron's firings caused by classification of a probe image patch of dimension $d = 49$ with 10 lateral gallery image patches are shown in Figure 7. Note that each bar for each spike coding case represents a gallery image patch, and the first gallery in each case is the actual match for the probe. Based on the figure, we can observe that, for latency, RO, and TTFS coding, a lower delay in the CD neuron's firing signifies higher coincidence and hence a higher probability of a correct match. For spike interpretation by rate coding, the firing delay is insignificant, but a higher rate of spike firings indicates higher coincidence between the probe and the gallery image. Note also in the figure that a misclassification occurs in the TTFS case, where the 9th gallery image patch is found to have the highest coincidence.
Aside from applying the proper spike codings to interpret the output spike trains, two other processes are carried out at the summation stage in order to add discriminative influence to the final classification. Firstly, after the spike codings are applied, local ensemble strategies are adopted to locally classify the resulting $\mathrm{NCF}_p$, which are then normalized to find the local confidence vectors $\log(\arg\min\{\mathrm{NCF}_p\} + 2)/\log(\mathrm{NCF}_p + 2)$. Secondly, these confidence vectors are weighted using discriminants acquired from the learnt synaptic time constants $\tau_{s,i,p}$, where $i \in p$.
Here, the discriminants are normalized to ensure that they take values in [0, 1]. Consider a probe that needs to be matched to a gallery with each local patch of dimension $d = 2$, using a CD classification network with fully trained $\tau_s$; the total weighted confidence of the probe as belonging to the gallery can be obtained from (25). For illustration purposes, the whole classification network is shown in Figure 8.

Experimental Results and Discussions
In this section, we conduct a test to validate and evaluate the accuracy of the OSTP approximation and its efficiency. Then we compare the performance of coincidence detection trained by $\tau_s$ Prediction against coincidence detection trained by Supervised STDP. Subsequently, we investigate the performance of several spike codings used in our proposed coincidence detection. Then, using Principal Component Analysis (PCA) and Gabor features, we assess the performance of the proposed CD classifier against several types of classifiers, namely, the $k$ nearest neighbor classifier (kNN), ensembles of kNN classifiers (soft kNN) [30], the Support Vector Machine (SVM), and ensembles of SVM classifiers (soft SVM) (inspired by [30]). We adopt Single Sample per Person (SSPP) face recognition (for a review, see [31]), where only a single image per person is used as gallery.
Four publicly available datasets are used for the experiments, namely, the AR, JAFFE, FERET, and CK+ datasets. The AR dataset [32] contains frontal images of 76 males and 60 females with several types of variations such as different illumination conditions, expressions, and partial occlusions. Images were taken in two sessions (S1 and S2) with 13 images per session. We use only 8 expression-variant images (neutral, smile, angry, and scream) and 4 partially occluded images (sunglasses and scarf) from both sessions. The JAFFE dataset [33] contains 212 expression-variant images from 10 female Japanese subjects. There are 7 types of expressions in this dataset, and each subject portrays at least 3 images for each expression. The FERET dataset [34] consists of 13,539 facial images corresponding to 1,565 subjects, which are diverse across ethnicity, gender, and age. Two subsets were used, namely, fa and fb, following the standard FERET evaluation protocol [34]. Subset fa, containing 1,196 frontal images of 1,196 subjects, was used as gallery, while fb (1,195 expression-variant images) was used as probes. The CK+ dataset [35] contains 523 sequences from 123 subjects portraying seven basic expressions (happiness, sadness, surprise, anger, disgust, fear, and contempt). Examples of images from the AR, JAFFE, FERET, and CK+ datasets are shown in Figure 9.
As a standard preprocessing step, all images used are aligned and resized to 84 × 84 pixels. Histogram equalization is applied to all images except for images with scarves and sunglasses in the AR dataset. This is because histogram equalization causes too much irregularity on the occluded parts of the image (i.e., the scarves and sunglasses), consistent with the suggestion in [30]. Each image is partitioned into 144 square local patches using a 7 × 7 scanning window. This results in a total dimension of 7056 pixels per image and a vector dimension of $d = 49$ per local patch. Hence the number of afferents that connect input neuron pairs to CD neurons is equal to 7056. Each afferent is assumed to represent a pair of neurons connected to inputs at spatial location L.
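The patch arithmetic can be checked with a short Python sketch. It assumes non-overlapping 7 × 7 windows in row-major order, which is consistent with 144 patches of 49 pixels covering an 84 × 84 image; the function name and the synthetic image are illustrative.

```python
def split_into_patches(image, patch=7):
    """Split a square image (list of rows) into non-overlapping
    patch x patch windows, each flattened to a vector, in row-major order."""
    size = len(image)
    patches = []
    for top in range(0, size, patch):
        for left in range(0, size, patch):
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches

# Synthetic 84 x 84 image whose pixel value encodes its position.
image = [[r * 84 + c for c in range(84)] for r in range(84)]
patches = split_into_patches(image)
print(len(patches), len(patches[0]), len(patches) * len(patches[0]))  # 144 49 7056
```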

OSTP Performance Analysis.
A test is carried out to determine the accuracy and speed of the OSTP approximation by comparing $t_{est}$ with the exact spike firing time $t_{exc}$ obtained from computation of the discrete-time SRM in a coincidence neuron. The discrete intervals $N$ for the discrete-time SRM and the OSTP SRM are measured per 1 ms and per $\Delta x_1 = 1$, respectively; the evaluation of the discrete-time SRM uses $N_{disc} = 1$ while the OSTP SRM uses $N_{OSTP} = 100$. The error $e$ for this test is defined as $e = |\lceil t_{est} \rceil - t_{exc}|$, where $\lceil \cdot \rceil$ denotes the ceiling operation. The test is performed on 191000 neurons with different values of threshold, synaptic time constant, and weights. It is found that the approximation is correct 99.6% of the time and that the processing time taken by OSTP is 9.2454 seconds, while the discrete-time SRM took 498.7757 seconds. This is a reduction in processing time of more than 98% relative to the discrete-time SRM.
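The reported reduction follows directly from the two measured processing times; the symbols $T_{disc}$ and $T_{OSTP}$ are introduced here only for this worked calculation.

```latex
\frac{T_{disc} - T_{OSTP}}{T_{disc}}
  = \frac{498.7757 - 9.2454}{498.7757} \approx 0.9815 ,
\qquad
\frac{T_{disc}}{T_{OSTP}} = \frac{498.7757}{9.2454} \approx 53.9 .
```

In other words, OSTP needs about 1.85% of the discrete-time processing time in this test, a speed-up of roughly 54 times.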
In order to further investigate the effect of the discrete intervals $N_{OSTP}$ and $N_{disc}$ on the performance of the spike firing time approximation, the accuracy, Mean Squared Error (MSE), processing time, and number of floating point operations of OSTP and the discrete-time SRM are compared for different values of $N_{OSTP}$ and $N_{disc}$, ranging between 1 and 10000. This test uses $n = 20500$ pairs of input neurons with different combinations of presynaptic inputs, thresholds, synaptic time constants, and weights. The accuracy and MSE of both OSTP and the discrete-time SRM are compared with the exact spike timing $t_{exc}$, where $t_{exc}$ is obtained by discrete-time evaluation using $N_{disc} = 1000$. The approximated spike firing accuracy is calculated based on the accuracy equation, while the MSE is computed from $\mathrm{MSE} = (1/n)\sum_{i=1}^{n}(t_{est} - t_{exc})^2$. The number of floating point operations is computed as the total number of numerical operations from the start to the end of the test. The results on the effect of the number of discrete intervals on the performance of OSTP and the discrete-time SRM are shown in Figure 10.
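The two quality measures can be computed as in the following sketch. The MSE follows the formula given in the text; the accuracy definition used here (the fraction of spikes whose ceiling error is zero) is an assumption, since the accuracy equation itself is not reproduced in this section, and the function name is illustrative.

```python
import math

def spike_time_metrics(t_est, t_exc):
    """Return (accuracy, mse) for estimated versus exact spike times.

    accuracy: fraction of spikes with |ceil(t_est) - t_exc| == 0 (assumed definition)
    mse:      mean of (t_est - t_exc)^2 over all spikes
    """
    errors = [abs(math.ceil(a) - b) for a, b in zip(t_est, t_exc)]
    accuracy = sum(1 for e in errors if e == 0) / len(errors)
    mse = sum((a - b) ** 2 for a, b in zip(t_est, t_exc)) / len(t_est)
    return accuracy, mse

accuracy, mse = spike_time_metrics([2.3, 4.0, 5.1], [3, 4, 5])
print(round(accuracy, 4), round(mse, 4))  # 0.6667 0.1667
```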
According to Figure 10, OSTP produces accuracy and MSE comparable to those of the discrete-time SRM at $N_{OSTP} = 100$, while requiring minimal processing time and a constant number of floating point operations. The discrete-time approach, on the other hand, even though it produces good accuracy and low MSE, requires exponentially increasing processing time and floating point operations with respect to $N_{disc}$. From Figure 10(b), for all tested $N$, an average of 12.82 seconds is required by OSTP to produce all spikes, as opposed to $18.75 \times 10^4$ seconds required by the discrete-time SRM. According to Figure 10(b), at $N_{OSTP} = N_{disc} = 100$, OSTP is more than 99% faster than the discrete-time SRM while producing comparable spike timing accuracy, as indicated by Figure 10(a). These results highlight the efficiency and performance of the proposed OSTP.

Face Recognition Performance of $\tau_s$ Prediction and Supervised STDP.
In order to examine the recognition accuracy of the CD classifier trained with Supervised STDP and $\tau_s$ Prediction, an experiment is conducted. Following the recommendation by Nordlie et al. [36], a tabular description of the experimental setup is given in Table 1.
Each dataset is randomly split into two groups, each containing half of the total number of subjects available. Each split follows the 2-fold cross validation method: each group is used once for training and once for testing, and the average recognition accuracy is taken. This random split is repeated 10 times, and the final average accuracy for both training and test, along with the standard deviation, is recorded in Table 2. For this particular experiment, the spike coding used to interpret the output spikes of the CD neuron is latency coding. The Baseline accuracy in Table 2 is obtained from the kNN approach.
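The evaluation protocol can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: `repeated_two_fold` and the `evaluate(train_subjects, test_subjects)` callback are hypothetical names, and the callback is assumed to train a classifier on the first group and return its accuracy on the second.

```python
import random
from statistics import mean, stdev

def repeated_two_fold(subject_ids, evaluate, repeats=10, seed=0):
    """Repeat a random half/half split of subjects; within each split,
    swap the roles of the two groups and average the two accuracies."""
    rng = random.Random(seed)
    split_accuracies = []
    for _ in range(repeats):
        shuffled = list(subject_ids)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        group_a, group_b = shuffled[:half], shuffled[half:]
        # Each group is used once for training and once for testing.
        fold_scores = [evaluate(group_a, group_b), evaluate(group_b, group_a)]
        split_accuracies.append(mean(fold_scores))
    return mean(split_accuracies), stdev(split_accuracies)

# Dummy evaluator for demonstration: returns the training share of subjects.
def dummy_evaluate(train, test):
    return len(train) / (len(train) + len(test))

avg, sd = repeated_two_fold(range(10), dummy_evaluate)
print(avg, sd)
```

Splitting by subject rather than by image keeps all images of a person in the same group, which matches the description that each group holds half of the subjects.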
Based on the results presented in Table 2, the CD classifier using either Supervised STDP or $\tau_s$ Prediction delivers, on average, better recognition accuracy than the Baseline approach. For the test sets, the CD classifier with Supervised STDP delivers an average recognition accuracy of 95.56%, while $\tau_s$ Prediction reaches 96.31%; both are more than 20% better than the Baseline accuracy. The performances on training and test samples are comparable, signifying that no overfitting occurs during the training process. On average, $\tau_s$ Prediction performs slightly better than Supervised STDP, by around 1% in recognition accuracy.

Convergence Analysis.
Since both learning methods are iterative algorithms, their performance with respect to the number of iterations needs to be examined. Using AR Scream S1 and Scarf S1, the average recognition accuracy on the training set as the iteration grows is shown for Supervised STDP and $\tau_s$ Prediction in Figures 11(a) and 11(b), respectively. The corresponding spike error, with its parameter set to 2, is presented in Figure 11(c). According to Figure 11(b), latency coding converges to the minimum recognition error immediately at epoch = 1, while RO and TTFS coding converge at epoch > 10. Rate coding is a special case: it converges to the minimum only after epoch = 30, where it produces the lowest recognition error of 0. However, its test-set performance is not as high, with a test error of 0.10 (not shown in Figure 11). This indicates overfitting for rate coding, and also signifies that, in rate coding, large data with large dimension would require a very large number of spikes to reliably distinguish each individual class. Thus it is recommended to use epoch = 1

To investigate the convergence of both methods further, Figure 12 shows the evolution of output spikes from a CD neuron during Supervised STDP learning and $\tau_s$ Prediction learning. The neuron receives inputs from 2 populations of neurons that share the same spatial location and serve as reference and test. The 2 populations are acquired from images belonging to the same subject (matching samples). Based on Figure 12, both methods start training by producing only a small number of spikes at epoch 0. However, Supervised STDP requires 30 to 35 epochs before the output spikes stabilize, while $\tau_s$ Prediction requires only 5 to 15 epochs. Based on convergence alone, $\tau_s$ Prediction is favorable since it trains and converges faster than Supervised STDP.
to the initial objective of enforcing stricter conditions for spike firing on neurons attached to facial regions with a high degree of similarity. This would ensure that only highly similar facial regions cause firing in the output neuron. The distribution of the fully trained $\tau_s$ is shown in Figure 13.

Discriminants from
According to Figure 13, facial regions with low discrimination, such as the mouth in the scream set, receive higher values of the fully trained $\tau_s$, which signifies lower importance to the final classification. This is used in accordance with the feature selection strategy of locally rewarding or penalizing each local NCF obtained from the CD classifier based on the computed discriminants.

Performance Comparison of Different Spike Codings.
From the results presented earlier, in terms of recognition accuracy and convergence, one learning method stands out from the other. $\tau_s$ Prediction delivers better recognition accuracy than Supervised STDP and also trains faster. Furthermore, Supervised STDP is limited in the types of spike coding it can use, since rank-order coding does not deliver acceptable results. Thus, the next analysis, on the performance of the CD classifier using different spike codings to interpret the output spikes, is based only on the CD classifier trained using $\tau_s$ Prediction. Using experimental settings similar to those described in Section 5.2, the recognition accuracy is recorded in Table 3.
According to Table 3, on average, latency coding produces the best result on test samples with 96.31% accuracy, followed by rate coding with 94.67%, RO coding with 94.30%, and TTFS coding with 93.42%. RO coding works slightly better than latency coding on AR Scarf S1, where it produces 95.90% accuracy as opposed to 95.30% for latency coding. On the other hand, considering that TTFS uses only the first output spike from each CD neuron, it delivers quite an impressive result, on average trailing latency coding by only around 3% in accuracy.
In addition, in order to closely examine how each spike coding interprets the output spike distribution for both matching and nonmatching samples, 4 images of 2 subjects from AR Scream S1 and Scarf S1 are used. These images form 2 matching pairs and 2 nonmatching pairs. Each pair of images is classified by the fully trained CD classifier, and the input and output spike patterns are recorded. The input patterns and the output spike interpretations of the different spike codings are given in Figure 14. The codings are applied to the outputs from each local population of input neurons (i.e., 144 local patches).
From Figure 14, the output spike delays rely heavily on the coincidence of the presynaptic input sequence: matching samples produce lower-delay spikes, and nonmatching samples produce higher-delay spikes. The rate of output spike firing is also higher in matching samples and slightly lower in nonmatching samples. In the plot, upper face parts are bound to the neurons' afferents at lower locations, while lower face parts are attached to afferents at higher locations. Note that, in both matching and nonmatching samples, the output spike delays and spike counts are nearly the same for the upper afferents. However, at the lower afferents, significant changes in delays and spike counts can be observed between matching and nonmatching samples. Nonmatching samples produce fewer spikes and higher delays than matching samples at the lower afferents. Since a stricter condition is imposed on the upper face part, it is much harder to evoke output spikes when the inputs actually belong to different subjects.
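To make the four readouts concrete, the sketch below computes simple versions of them from lists of output spike times. These are generic simplifications assumed for illustration (mean spike delay for latency, spike count for rate, first spike time for TTFS, and the firing order of first spikes for rank order); the paper's own definitions are given in an earlier section and may differ. All function names and sample values are hypothetical.

```python
def rate_code(spikes):
    """Number of output spikes emitted by one CD neuron."""
    return len(spikes)

def ttfs_code(spikes):
    """Time of the first output spike, or infinity if the neuron is silent."""
    return min(spikes) if spikes else float("inf")

def latency_code(spikes):
    """Mean delay of the output spikes, or infinity if the neuron is silent."""
    return sum(spikes) / len(spikes) if spikes else float("inf")

def rank_order(first_spike_times):
    """Rank of each neuron by its first spike time (0 = fires first)."""
    order = sorted(range(len(first_spike_times)),
                   key=lambda i: first_spike_times[i])
    ranks = [0] * len(order)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks

# Hypothetical output spike times in ms for one CD neuron.
matching = [3.0, 5.0, 9.0]   # more spikes, lower delay
nonmatching = [12.0]         # fewer spikes, higher delay

print(rate_code(matching), rate_code(nonmatching))
print(ttfs_code(matching), ttfs_code(nonmatching))
print(latency_code(matching), latency_code(nonmatching))
print(rank_order([3.0, 12.0, 7.5]))
```

Under these simplified readouts, the matching pair scores a higher rate and a lower delay than the nonmatching pair, which mirrors the qualitative behaviour described for Figure 14.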

Results on Face Recognition Using PCA and Gabor Features.
For the final experiment, we investigate the performance of our proposed CD classifier against several widely used classifiers. We use two popular feature representation approaches, namely, PCA and Gabor Wavelets, to represent the face. The PCA implementation follows the Locally Lateral Subspace (LLS) strategy employed in [29], where 8 PCA features are retained per local patch. Local Gabor features, on the other hand, were acquired using the approach adopted in [37], and the resulting Gabor features per local patch were further downsampled by a factor of 3. Soft kNN follows the approach detailed in [30], while the soft SVM implementation follows the same sum aggregation of ensembles of classifiers adopted by soft kNN [30]. The SVM implementation uses the LibSVM library with an RBF kernel [38]. For the AR, JAFFE, and FERET datasets, the trained CD classifier acquired in Section 5.2 was used, while, for the CK+ dataset, 123 images at the beginning of the first sequence are used as gallery and 577 peak images from each sequence are used as probe in training. Then, the 4 most expressive images from each sequence, giving a total of 2290 images, were used as test samples. The result of this experiment is given in Table 4.
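The sum aggregation mentioned above can be illustrated with a small sketch. It assumes each local patch classifier outputs one score per gallery class, with higher scores meaning a better match, and that the ensemble decision is the class with the largest summed score. The function name `sum_aggregate` and the scores are hypothetical; the exact soft kNN and soft SVM formulations are those of [30].

```python
def sum_aggregate(patch_scores):
    """Sum per-class scores over all local patches and return the index
    of the winning class together with the summed scores."""
    n_classes = len(patch_scores[0])
    totals = [0.0] * n_classes
    for scores in patch_scores:
        if len(scores) != n_classes:
            raise ValueError("every patch needs one score per class")
        for c, s in enumerate(scores):
            totals[c] += s
    best = max(range(n_classes), key=lambda c: totals[c])
    return best, totals

# Hypothetical scores from 3 local patches over 3 gallery classes.
scores = [
    [0.2, 0.7, 0.1],
    [0.5, 0.4, 0.1],
    [0.1, 0.8, 0.1],
]
best_class, totals = sum_aggregate(scores)
print(best_class, totals)
```

In this example the second patch alone would pick class 0, but the summed evidence favours class 1, which is the point of aggregating over patches.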
According to Table 4, for PCA representation, the CD classifier delivers the best result for all tested datasets, while, for Gabor representation, the CD classifier gives the best recognition accuracy except on the CK+ dataset. On average, the CD classifier is more than 5% and 11% better than soft kNN and soft SVM, respectively, in PCA representation. For Gabor representation, the CD classifier is 2% and 13% better than soft kNN and soft SVM, respectively. The advantage of the CD classifier is more apparent in PCA representation than in Gabor because Gabor features are robust against small spatial perturbations, which increases the discrimination of facial features, while, in PCA, the noise due to variations

According to the results presented in Table 5, the CD classifier with latency coding produces a slightly superior result, around 2% better than the kNN and SVM approaches in all datasets. This holds even though local discrimination is not applicable here, since the variations within the data are not as generic as the variations found in face images and there is no clear indication of how to divide each piece of data into locally lateral vectors. Even when the division was done by treating each single element of the vector as a local vector, we found that no further improvement in classification could be achieved. Furthermore, the variations are more random, and even though the discrimination can be computed, the learnt discriminants of the training data would not be able to faithfully represent the variations in the test set.
Additionally, this experiment shows that, for classification of multivariate data, latency coding works best, with an average accuracy 18%, 2%, and 33% better than rate, RO, and TTFS coding, respectively. The inferiority of rate coding stems from the limited maximum encoding capacity of the firing rate, which is caused by the relatively small number of variables compared with the number of samples; for the Iris dataset, only 4 variables were available for classification of 120 samples. In contrast, rate coding achieves significantly better results on Wisconsin and Statlog than on Iris because of their higher numbers of variables, 32 and 36, respectively, which increase the maximum encoding capacity of the firing rate. Meanwhile, RO coding is only slightly inferior to latency coding. The worst average performance is produced by TTFS coding, since it fails to capture the underlying similarities between the probe and the gallery when only one spike per CD neuron (the first spike) is considered.

Conclusions and Future Works
In this paper, an SNN-based classifier, namely, the coincidence detection (CD) classifier, is proposed, and two learning methods used to train the CD classifier are presented. A method of optimizing the discrete-time Spike Response Model (SRM) by predicting the output spike time is also discussed in detail. We found that our proposed Output Spike Time Prediction (OSTP) method can produce, from an input pair, an output spike pattern identical to that of discrete-time SRM but with significantly fewer floating-point operations and a much shorter processing time, with an average of 12.82 seconds as opposed to 18.75 × 10^4 seconds in discrete-time SRM over all tested discrete intervals. We also showed that coincidence detection can capture the degree of synchronization between two presynaptic inputs by producing lower-delay output spikes for more synchronized input pairs and higher-delay spikes for less synchronized ones. While the CD classifier can produce spikes based on the coincidence of inputs, the closeness between the inputs that triggers an output spike is explicitly determined by the training of the learning parameter $\tau_s$.
In addition, the CD classifier trained with $\tau_s$ Prediction delivered performance comparable to Supervised STDP while converging faster, with fewer epochs required. We found that latency coding produced the best recognition accuracy at 96.31%, although its performance is not far from that of the other spike codings. Furthermore, the distribution of discriminants derived from the learning parameters revealed the ability of $\tau_s$ Prediction learning to capture the underlying variation within the training faces. Further investigation of the CD classifier using PCA and Gabor features showed that our proposed method performs 5% and 11% better than soft kNN and soft SVM, respectively, in PCA representation, and 2% and 13% better than soft kNN and soft SVM, respectively, in Gabor representation. The experiment on the feasibility of the CD classifier for other multivariate data revealed that the CD classifier with latency coding is around 2% better than the kNN and SVM classifiers. Additionally, for the tested multivariate data, latency coding delivers the best result, which is 18%, 2%, and 33% better than rate, RO, and TTFS coding, respectively.
As for future work, we will explore the possibility of extending the proposed method to object recognition tasks and to temporal recognition of faces from video sequences. We will further study how to embed the global information of the face image together with local patch information so that the resulting classification is more robust against global variations such as pose, age, and illumination.

$t^f_1 = 2$ ms and $t^f_2 = 8$ ms (thus a separation of 6 ms) with respect to different learning parameters are shown in Figure 2. All spikes are generated using the SRM model with a fixed threshold. Consider the maximum amplitude of the PSP evoked by the first presynaptic input, conveniently denoted afterwards as

Figure 3: Illustration of the firing region and nonfiring region of a CD neuron receiving 2 presynaptic spikes at $t^f_1 = 0$ ms and $t^f_2 = 7$ ms.

1 .
The objective of OSTP is to find the estimated spike time  ,est of the output neuron.For convenience, let   = ( −    )/  and ( − t ) =  0 so that (3) now becomes

Figure 5: Precise spike firing time at a CD neuron by continuous SRM ($\Delta_{\text{disc}} = 100$ ms, providing almost the actual membrane level for comparison), discrete-time SRM ($\Delta_{\text{disc}} = 2$ ms), and OSTP SRM ($\Delta_{\text{OSTP}} = 10$) approximation in CD SNN classification.

Figure 6: Illustration of a CD neuron (center) attached to a single patch of the face, receiving presynaptic inputs attached to locally lateral LP from the gallery face and probe face. Each neuron connected to the CD neuron patch assumes a spatially different input L to ensure that spatiotemporal information is intact. This coincidence detection implementation adopts a feed-forward 2-layer neural network approach comprising an input layer and an output layer.

Figure 7: Detailed description on the classification model, showing different spike interpretations on classification of a probe patch against 10 gallery patches.

Figure 8: Coincidence detection classifier evaluating the synchronization of two classes of inputs, namely, the reference face and the test face, with CD neurons as output neurons.

Figure 9: Examples of images used in this paper. The variations of probe samples in AR are neutral, smile, angry, scream, sunglass, and scarf, while the variations of probe samples in JAFFE are neutral, smile, angry, surprised, sad, disgust, and fear. FERET probe samples' variations, however, consist of mixtures of several different expressions, while CK+ consists of several FACS-coded expressions.

Figure 10: Comparisons of the (a) accuracy, (b) processing speed, (c) MSE, and (d) number of floating-point operations of OSTP and discrete-time SRM for different values of $\Delta_{\text{OSTP}}$ and $\Delta_{\text{disc}}$ between 1 and 10000.
) and 11(b) the convergence to minimum recognition error is achieved faster in $\tau_s$ Prediction for all types of spike codings. Similarly, from Figure 11(c), the spike error converges to its minimum faster in $\tau_s$ Prediction than in Supervised STDP.

Figure 11: Convergence analysis of (a) Supervised STDP and (b) $\tau_s$ Prediction using different spike codings; (c) shows the comparison of the spike error of the Supervised STDP and $\tau_s$ Prediction learning methods as the training iteration grows. The errors are taken as the average recognition error from AR Scream S1 and Scarf S1. Recognition errors above 0.25 in (a) and (b) are not shown for clarity of the graph. For Supervised STDP, the recognition error for RO coding is not shown since it equals 1 for all tested epochs.

Figure 12: Evolution of output spikes from a CD neuron during the Supervised STDP learning and $\tau_s$ Prediction learning processes. The output spikes are shown as small bar plots indicating the exact output spike time in milliseconds. The input spikes are obtained from an identical subject, and the output spikes shown also include start and end cues but exclude the imaginary spikes.

Figure 13: Result of learning by $\tau_s$ Prediction for several face datasets, where each pixel at location L within the images represents the synaptic time constant $\tau_s$ of the synaptic connection between the CD neuron and the input neuron attached to location L. Darker pixels represent higher $\tau_s$ and vice versa. The number of epochs used is 1.

Figure 14: Different spike codings applied to the output spikes of the CD classifier fully trained by $\tau_s$ Prediction with epoch = 1, taking inputs from AR Scream S1 on the left and AR Scarf S1 on the right. Figures in row (a) are plotted from matching face samples, while those in bottom row (b) are from nonmatching face samples. Scatter plots comprise spatiotemporal input spike patterns of the reference face (blue asterisk) and test face (red circle) from 7056 neuron afferents, while bar plots on the right show the CD neurons' outputs with latency, rate, rank order, and TTFS coding applied. The input spikes are obtained by direct conversion of pixel values to the time domain, which is then normalized to be between 0 and 18.

Table 2: Result of face recognition accuracy using CD classifier with Supervised STDP and $\tau_s$ Prediction.

to avoid overtraining the synaptic time constant $\tau_s$, where a train error of 0.08 and a test error of 0.08 are obtained for rate coding. By comparing Figures 11(a

Table 3: Result of face recognition accuracy using CD classifier trained by $\tau_s$ Prediction with different spike codings.

Table 4: Comparison of face recognition accuracy using CD classifier trained by $\tau_s$ Prediction against several other popular classifiers.
5.7. Results on Other Multivariate Datasets. To investigate the viability of the proposed CD classifier approach on multivariate data other than face images, another experiment is conducted using Iris dataset, Breast Cancer Wisconsin

Table 5: Comparison of classification accuracy of several multivariate datasets using NN, SVM, and CD classifier.

neighborhoods in a satellite image. There are 6435 vectors of 36 elements to be classified into 7 classes. Using 10-fold cross validation, 10 splits of training/testing are carried out, except for the Statlog dataset since its training and test data are fixed, and the averages and standard deviations are recorded in Table 5. CD classifier parameters used are  = 5, V = 6 mV,  + = 1.0,  spike = −2 mV, and  no spike = 2 mV. Training is done by the $\tau_s$ Prediction learning method, where epoch = 1 is used for classification by latency and rate coding, and epoch = 30 is used for classification by RO and TTFS coding.