Evolutionary Dendritic Neural Model for Classification Problems

In this paper, an evolutionary dendritic neuron model (EDNM) is proposed to solve classification problems. It utilizes synapses and dendritic branches to implement nonlinear computation. Distinct from the classical dendritic neuron model (CDNM), which is trained by the backpropagation (BP) algorithm, the proposed EDNM is trained by the metaheuristic cuckoo search (CS) algorithm, which is regarded as a global search algorithm. The CS algorithm enables EDNM to avoid several disadvantages of BP, such as slow convergence, trapping into local minima, and sensitivity to initial values. To evaluate the performance of EDNM, we compare it with a multilayer perceptron (MLP) and CDNM on two benchmark classification problems. The experimental results demonstrate that EDNM is superior to MLP and CDNM in terms of accuracy rate, receiver operating characteristic (ROC) curve, and convergence speed. In addition, the neural structure of EDNM can be replaced completely by a logical circuit, which can be implemented in hardware easily. The corresponding experimental results also verify the effectiveness of the logical circuit classifier.


Introduction
Classification is a machine learning technique that assigns objects to predefined classes. Many problems in science, business, and medicine can be treated as classification problems, for example, medical diagnosis, quality control, bankruptcy prediction, credit scoring, handwritten character recognition, and speech recognition [1]. Various machine learning techniques have been proposed to solve classification problems, namely, k-nearest neighbor [2], decision tree [3], artificial neural networks (ANNs) [4], rule-based classifiers [5], naive Bayes classifiers [6], linear discriminant analysis [7], and support vector machines [8].
Among them, ANNs are considered one of the most comprehensive classifiers [9]. They are computational models inspired by the biological nervous system that mimic the information processing of neurons in the human brain [10]. The first mathematical model of ANNs, which applies a nonlinear threshold unit, was proposed by McCulloch and Pitts in 1943 [11]. In McCulloch and Pitts's model, the neuron receives input signals from other neurons and assigns to each input a weight that represents the connection strength between nerve cells. The neuron then determines whether it is activated or remains inactive based on the weighted result [12]. Research reveals that it is difficult for this model to solve nonlinearly separable problems because of its oversimplified structure [13].
Although the learning ability of MLP, which utilizes McCulloch and Pitts's model as its fundamental computational unit, makes it a powerful tool for various applications [14], further biological studies have inferred that a single neuron can possess powerful computation capacity by taking into account the synaptic nonlinearities of a dendritic tree, which is significantly different from McCulloch and Pitts's model [15,16]. Similar issues also occur in spiking neural networks (SNNs), in which integrate-and-fire neurons communicate information via discrete spike events based on the spike-timing-dependent plasticity (STDP) rule [17]. It has been shown that although SNNs are biologically more plausible, they ignore the information computation of dendritic structures and bear little resemblance to the biological neural model [18].
Different neurons own distinct dendritic structures in vivo; even a small variation in the dendritic morphology produces a great change in neuron functions [19,20]. By analyzing the interaction of the excitatory and inhibitory synaptic inputs in neural cells, Koch et al. proposed an ϵ-like neural model with a dendritic structure [21,22]. However, the dendritic structure of this model remains unchanged for all given tasks, so it cannot realize the plasticity of the dendritic morphology. To be specific, the ϵ-like neural model lacks an effective mechanism to determine whether a synapse is excitatory or inhibitory, as well as which branches of the dendritic tree are redundant and need to be eliminated, which means that Koch's model is unrealistic when compared with biological neuron models [23].
Recently, Legenstein and Maass proposed a single neuron model with a dynamic dendritic structure based on STDP and branch-strength potentiation (BSP) [24]. The neuron model is used to solve a simple feature-binding problem in a self-organized manner. However, it has been proven that this neuron model is not capable of solving nonlinearly separable problems, such as the EXOR benchmark problem [25]. In addition, Ritter et al. proposed a mathematical model named lattice neural networks (LNNs) for a morphological neuron based on lattice algebra [26]. LNNs make use of the lattice operators inf and sup for the construction of the computational algorithms and replace the multiplication operator of the real numbers by the addition operator. They are closely related to Lattice Computing, which is regarded as a collection of computational intelligence tools [27]. Although LNNs incorporate dendrite computation in the neural model, they do not further interpret or realize the implementation of the plasticity mechanisms in the dendritic structure.
In our previous work [28,29], we proposed a single neuron model with nonlinear interaction among synapses in the dendrites, named CDNM. Experimental studies prove that CDNM can effectively settle practical tasks including cancer diagnosis [30,31], financial time series forecasting [32], and credit risk assessment [33]. Besides, an unsupervised learnable neuron model based on CDNM was proposed and has been proved able to solve the two-dimensional multidirectional selectivity problem [34]. Moreover, studies prove that the plasticity mechanisms of CDNM can be implemented via a neuron-pruning mechanism, which consists of synaptic pruning and dendritic pruning. Neuron pruning occurs along with the training process of the model. The pruned neural model can be replaced with logical circuits that merely contain comparators and logical AND, OR, and NOT gates [35,36].
Although CDNM has been used effectively in various applications, the original BP algorithm largely limits its computation capability. BP is a gradient-based training algorithm; it requires that the neuron transfer function be differentiable. The gradient information is highly sensitive to the initial conditions, which makes BP easily trapped in local minima [37]. In addition, BP and its variations have several drawbacks, such as slow convergence speed and overfitting [38]. Therefore, to avoid these disadvantages of the BP algorithm, the proposed EDNM employs the nature-inspired CS algorithm [39] as its learning algorithm in this paper, which is acknowledged as a global search algorithm. The CS algorithm combines a global random walk with a local random walk, mimicking the brood parasitic behavior of cuckoo species and the Lévy flight [40] behavior of some birds and fruit flies. Its powerful optimization ability makes CS an effective training algorithm, and EDNM can avoid trapping into local minima because the update of the solution is independent of explicit gradient information. The performance of EDNM is evaluated and compared on two benchmark classification datasets in our experiments. In addition, we also verify the effectiveness of neuron pruning and logical circuit replacement. The rest of this paper is organized as follows: Section 2 describes the details of the proposed model. Section 3 introduces the CS algorithm. Simulations covering the descriptions of the two benchmark datasets, evaluation metrics, experimental setup, and performance comparison are provided and discussed in Section 4. Finally, concluding remarks are presented in Section 5.

Model Description
The proposed EDNM mimics the mechanism of signal interactions in the biological neural model. The signal processing of EDNM proceeds as follows: First, the synaptic layer receives the input signals and processes them through one of the defined connection cases. Then, the results of the synapses are transferred to the dendritic branches. The membrane layer sums the dendritic activations and transfers the result to the cell body. The structural morphology of EDNM is presented in Figure 1.

Synaptic Layer.
In the synaptic layer, each synapse connects to one feature attribute to receive the input signals of the training samples. A sigmoid function is adopted to describe the process; it can be expressed by

Y_{i,m} = 1 / (1 + e^{−k(w_{i,m} x_i − q_{i,m})}),

where x_i represents the ith (i = 1, 2, 3, . . . , I) input signal and Y_{i,m} represents the output of the ith synapse on the mth (m = 1, 2, 3, . . . , M) dendritic branch. k is a user-defined parameter and remains constant in the calculation process. The parameters w_{i,m} and q_{i,m} are initialized randomly in the range [−2, 2] and are then trained by the learning algorithm. Based on the values of w_{i,m} and q_{i,m}, the threshold θ_{i,m} of the synaptic layer can be calculated as follows:

θ_{i,m} = q_{i,m} / w_{i,m}.

In addition, according to the values of w_{i,m} and q_{i,m}, the connection cases of the synaptic layer can be divided into four types, namely, the direct connection (•), the inverse connection ( _ ), the constant-0 connection (⓪), and the constant-1 connection (①). The graphic symbols of the synapses in the four connection cases are provided in Figure 2.
(i) Type 1: Direct Connection. Case (a): 0 < q_{i,m} < w_{i,m}; for example, w_{i,m} = 1.0 and q_{i,m} = 0.5. As shown in Figure 3(a), direct connection means that if the input x_i is greater than the threshold, the synapse outputs "1"; otherwise, it outputs "0."
(ii) Type 2: Inverse Connection. Case (b): w_{i,m} < q_{i,m} < 0; e.g., w_{i,m} = −1.0 and q_{i,m} = −0.5. The sigmoid function of the inverse connection is illustrated in Figure 3(b). Contrary to the direct connection, if the input x_i is greater than the threshold, the synapse outputs "0"; otherwise, it outputs "1."
(iii) Type 3: Constant-1 Connection. Case (c1): q_{i,m} < 0 < w_{i,m}, and Case (c2): q_{i,m} < w_{i,m} < 0. For inputs x_i normalized to [0, 1], constant-1 connection implies that the output of the synapse remains "1" regardless of the input. It contributes to the synaptic pruning, which will be introduced in the next section.
(iv) Type 4: Constant-0 Connection. Case (d1): 0 < w_{i,m} < q_{i,m}; e.g., w_{i,m} = 1.0 and q_{i,m} = 1.5. Case (d2): w_{i,m} < 0 < q_{i,m}; for example, w_{i,m} = −1.0 and q_{i,m} = 0.5. Similarly, constant-0 connection implies that the output of the synapse remains "0" regardless of the input. It contributes to the dendritic pruning, which will be introduced in the next section.
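The connection case of a trained synapse follows directly from the signs and ordering of w_{i,m} and q_{i,m} when the inputs lie in [0, 1]. A minimal Python sketch of the synaptic sigmoid and this case analysis (function and parameter names are ours, not from the paper):

```python
import math

def synapse(x, w, q, k=5.0):
    """Synaptic output Y = 1 / (1 + exp(-k * (w*x - q)))."""
    return 1.0 / (1.0 + math.exp(-k * (w * x - q)))

def connection_type(w, q):
    """Classify a trained synapse by its (w, q) pair, assuming inputs x in [0, 1]."""
    if 0 < q < w:
        return "direct"      # output near 1 once x exceeds theta = q / w
    if w < q < 0:
        return "inverse"     # output near 1 once x falls below theta = q / w
    if q < 0 < w or q < w < 0:
        return "constant-1"  # w*x - q > 0 for every x in [0, 1]
    return "constant-0"      # w*x - q < 0 for every x in [0, 1]
```

For example, w = 1.0 and q = 1.5 satisfy 0 < w < q, so w·x − q is negative for every x in [0, 1] and the synapse saturates near 0.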

Dendritic Layer.
The dendritic structure plays an important role in neural computation. Different neurons own distinct dendritic structures; even a small variation in the dendritic morphology arouses a great change in neural function. Thus, to realize the plasticity of the dendritic morphology, the simplest nonlinear operation, multiplication, is adopted in EDNM. Combined with the four connection cases of the synaptic layer, it can implement the neuron-pruning function to build a unique dendritic structure for each specific problem. The mathematical formula can be expressed as

Z_m = ∏_{i=1}^{I} Y_{i,m},

where Z_m denotes the output of the mth dendritic branch.

Membrane Layer.
The membrane layer receives the signals from each dendritic branch and completes a sublinear summation operation. Then, it transfers the result to the cell body. Its equation is defined as follows:

V = ∑_{m=1}^{M} Z_m.

Cell Body (Soma).
The output signal from the membrane layer is processed by a nonlinear sigmoid function in the cell body, which is the core computational part of the single neural model. The signal is compared with the threshold of the soma; if it is larger, the neuron fires; otherwise, it does not. The function of the cell body is expressed as follows:

O = 1 / (1 + e^{−k_soma (V − θ_soma)}),

where k_soma denotes the positive constant parameter of the cell body, and θ_soma represents the threshold of the cell body, whose range is [0, 1].
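Combining the four layers, a single forward pass of the model can be sketched as follows; the helper names and the tiny two-branch example weights are illustrative assumptions, not values from the paper:

```python
import math
from functools import reduce

def ednm_forward(x, W, Q, k=5.0, k_soma=5.0, theta_soma=0.5):
    """One forward pass: synapses -> branch products -> membrane sum -> soma.

    x: list of I inputs in [0, 1]; W, Q: M x I synaptic weight/threshold lists."""
    sig = lambda u, kk: 1.0 / (1.0 + math.exp(-kk * u))
    Z = []
    for w_row, q_row in zip(W, Q):
        Y = [sig(w * xi - q, k) for xi, w, q in zip(x, w_row, q_row)]
        Z.append(reduce(lambda a, b: a * b, Y, 1.0))   # dendritic multiplication
    V = sum(Z)                                         # membrane summation
    return sig(V - theta_soma, k_soma)                 # soma fires if V > theta_soma

# Tiny two-branch example (weights are illustrative only).
out = ednm_forward([0.9, 0.1],
                   W=[[1.0, -1.0], [1.0, 1.0]],
                   Q=[[0.5, -0.5], [0.5, -0.5]])
```

Raising θ_soma lowers the soma output for the same membrane potential, which matches the firing-threshold reading of the soma equation.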

Neuron-Pruning Function.
EDNM adopts the neuron-pruning function to realize the plasticity of the dendritic structure. Specifically, the neuron-pruning function prunes unnecessary synapses and dendritic branches during the training process. It thereby builds a unique structural morphology of EDNM for each specific problem. In EDNM, the pruning mechanism contains two parts, namely, synaptic pruning and dendritic pruning.

Synaptic Pruning.
As introduced above, if a synaptic layer is in the constant-1 connection case, its output is fixed to 1 no matter what its input is. The fundamental mathematical operation of the dendritic layer is multiplication, and any value multiplied by 1 is equal to itself. This implies that the output of such a synaptic layer has no influence on the result of its local dendritic branch. Thus, we can ignore the synapse and the feature attribute it connects to, and this kind of synaptic layer is discarded in EDNM.

Dendritic Pruning.
Similarly, if a synaptic layer is in the constant-0 connection case, whatever the input is, its output remains 0. Because of the multiplication operation and the rule that any value multiplied by 0 equals 0, the output of the whole dendritic branch is fixed to 0. Such a branch makes no contribution to the output of the soma. Therefore, we eliminate this kind of dendritic layer, including all the synaptic layers on it and the connected feature attributes.
To further demonstrate the mechanism of neuron pruning, an example of the pruning process in EDNM is illustrated in Figure 4. It can be observed in Figure 4(a) that, before pruning, the neural structure owns two dendritic layers and each dendritic branch has four synaptic layers. Since the synapse that connects to the input x_2 on Dendrite-2 is in the constant-0 connection case, according to the mechanism of dendritic pruning, Dendrite-2 and all the synaptic layers on it need to be pruned simultaneously. Thus, the pruned parts of the neural structure are drawn in dotted lines, as illustrated in Figure 4(b).
Besides, because the synaptic layer that connects to the input x_2 on Dendrite-1 is in the constant-1 connection case, on the basis of synaptic pruning, this synapse should be deleted. Finally, the simplified neural structure is presented in Figure 4(c).
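The two pruning rules reduce to a scan over the trained (w, q) pairs: drop any branch containing a constant-0 synapse, then drop constant-1 synapses from the surviving branches. A sketch under that reading (names are ours):

```python
def prune(W, Q):
    """Apply dendritic then synaptic pruning to an M x I (w, q) structure.

    Returns, per surviving branch, the (input index, w, q) triples of synapses
    that are still in a direct or inverse connection case."""
    def ctype(w, q):
        if 0 < q < w:
            return "direct"
        if w < q < 0:
            return "inverse"
        if q < 0 < w or q < w < 0:
            return "constant-1"
        return "constant-0"

    kept = []
    for w_row, q_row in zip(W, Q):
        types = [ctype(w, q) for w, q in zip(w_row, q_row)]
        if "constant-0" in types:      # dendritic pruning: branch output is fixed to 0
            continue
        branch = [(i, w, q)            # synaptic pruning: constant-1 synapses dropped
                  for i, (w, q, t) in enumerate(zip(w_row, q_row, types))
                  if t in ("direct", "inverse")]
        kept.append(branch)
    return kept
```

On a toy two-branch structure where one branch carries a constant-0 synapse, the whole branch disappears, while the other branch merely loses its constant-1 synapse, mirroring the Figure 4 example.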

Logical Circuit.
Through synaptic pruning and dendritic pruning, only the direct connections and inverse connections are retained, and a unique simplified neural structure is formed for the problem. Furthermore, the simplified structure can be transformed into a logical circuit composed of comparators and logical AND, OR, and NOT gates. As shown in Figure 5, in the synaptic layer, the direct connection can be implemented by a comparator, and a combination of a comparator and a logical NOT gate can replace the inverse connection. For the dendritic layer, the multiple synaptic layers on a branch can be connected by a logical AND gate. All the dendritic layers are aggregated in the membrane layer, which is equivalent to a logical OR gate. In the cell body, a simple nonlinear mapping operation is implemented, which can be replaced by a single wire. A unique logical circuit can be obtained through these processes, and since there is no floating-point calculation in the logical circuit, the classification speed can be greatly improved without sacrificing accuracy. In the era of big data, the logical circuit classifier might be a promising technology owing to its simplicity.
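Evaluating the pruned structure as a logic circuit then needs only threshold comparisons and boolean gates. A small sketch of such a classifier (the branch encoding below is our own assumption):

```python
def logic_classify(x, branches):
    """Evaluate a pruned dendritic structure as a pure logic circuit.

    branches: per dendritic branch, a list of (input index, connection, theta),
    where connection is "direct" (comparator) or "inverse" (comparator + NOT).
    Synapses are ANDed within a branch and branches are ORed at the membrane;
    no floating-point arithmetic beyond the threshold comparisons."""
    def synapse_bit(xi, conn, theta):
        bit = 1 if xi > theta else 0                  # comparator
        return bit if conn == "direct" else 1 - bit   # NOT gate for inverse
    return int(any(                                   # membrane layer -> OR gate
        all(synapse_bit(x[i], conn, th) for i, conn, th in branch)  # AND gate
        for branch in branches))
```

With a single branch holding one direct and one inverse synapse, the circuit fires only when the first input is above its threshold and the second is below, exactly as the comparator/NOT/AND reading of Figure 5 suggests.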

CS Algorithm
The CS algorithm is inspired by the special lifestyle and aggressive reproduction strategy of cuckoo species. Cuckoos never hatch eggs by themselves; instead, they lay their eggs in the nests of other bird species and let the hosts hatch them. Some cuckoo species (e.g., ani and guira) not only put their eggs in the communal host nest but also throw the hosts' eggs away to increase the hatching probability of their own eggs [41]. Sometimes, the hosts discover the alien eggs and counterattack by throwing these alien eggs away or abandoning the nest and building a new one. Studies have found that, in addition to simple parasitic behavior, cuckoos of the genus Tapera mimic the color and pattern of the eggs of the selected host [41]. This behavior is more conducive to increasing the number of eggs that are successfully hatched. The CS algorithm was first proposed by Yang and Deb in 2009 [39]; for simplicity in describing CS, the following three basic rules are utilized: (i) Each cuckoo lays one egg at a time and places it in a randomly selected nest.
(ii) The nests with the highest quality of eggs are carried over to the next generations. (iii) The number of available host nests is constant, and an egg is discovered by the host bird with a probability P_a ∈ [0, 1]. The latter assumption can be approximated by the fraction P_a of the n nests being replaced by new ones (new random solutions).
With these three rules, the basic steps of CS are summarized as the pseudocode shown in Algorithm 1. In the CS algorithm, a global random walk is combined with a local random walk. First, the equation of the local random walk can be expressed as follows:

x_i^{t+1} = x_i^t + α s ⊗ H(p_a − ϵ) ⊗ (x_j^t − x_k^t),

where x_j^t and x_k^t are two distinct random solutions in the current population, s represents the step size, and α denotes its scaling factor. H(·) represents the unit step function, p_a is a switching parameter that controls the balance between the local and global random walks, and ϵ is a random value drawn from a uniform distribution. The symbol ⊗ represents entry-wise multiplication. Then, the global random walk, which applies Lévy flights, can be described as follows:

x_i^{t+1} = x_i^t + α L(s, λ),

where α denotes the scaling factor of the step size and the function L(s, λ) can be calculated by

L(s, λ) = (λ Γ(λ) sin(πλ/2) / π) · (1 / s^{1+λ}), (s ≫ s_0 > 0),

where λ is the Lévy exponent and the gamma function Γ(λ) is a constant for a given λ. It is widely regarded that Lévy flights can maximize the efficiency of resource searches, and they have been observed in the foraging patterns of albatrosses, fruit flies, and spider monkeys [42-44]. In addition, empirical evidence has verified that CS is superior to particle swarm optimization (PSO) and genetic algorithms [39]. Therefore, the CS algorithm is employed as the training algorithm in our experiments (Algorithm 1).
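The two random walks above can be sketched in a bare-bones cuckoo search; Mantegna's algorithm is one common way to draw Lévy-distributed steps. All parameter values below are illustrative, not the paper's settings:

```python
import math
import random

def levy_step(lam=1.5):
    """Mantegna's algorithm: draw an approximately Levy-distributed step."""
    sigma = (math.gamma(1 + lam) * math.sin(math.pi * lam / 2) /
             (math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    u, v = random.gauss(0, sigma), random.gauss(0, 1)
    return u / abs(v) ** (1 / lam)

def cuckoo_search(f, dim, n=15, pa=0.25, alpha=0.01, iters=200, seed=1):
    """Minimize f over [-1, 1]^dim with a bare-bones cuckoo search."""
    random.seed(seed)
    nests = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
    best = min(nests, key=f)
    for _ in range(iters):
        for i in range(n):
            # Global random walk: Levy flight around the current nest,
            # accepted greedily when it improves the nest.
            cand = [xi + alpha * levy_step() for xi in nests[i]]
            if f(cand) < f(nests[i]):
                nests[i] = cand
        for i in range(n):
            # Local random walk: a fraction pa of nests is abandoned and
            # rebuilt by stepping along the difference of two random nests.
            if random.random() < pa:
                j, k = random.sample(range(n), 2)
                r = random.random()
                nests[i] = [xi + r * (a - b)
                            for xi, a, b in zip(nests[i], nests[j], nests[k])]
        best = min(nests + [best], key=f)
    return best
```

Run on the sphere function, the best nest drifts toward the origin, which is the behavior EDNM relies on when the nests encode its w and q parameters.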

Simulations

Two benchmark classification datasets, the glass identification dataset (GID) and the Congressional voting records dataset (CVRD), are adopted in our experiments; their details are summarized in Table 1.

GID.
GID is obtained by measuring the chemical constitution of glass fabricated by two different processes [45]. The dataset contains 163 samples of window glass and 51 samples of nonwindow glass. Each record has 9 attributes, which include the refractive index and the proportions of eight chemical components (Na, Mg, Al, Si, K, Ca, Ba, and Fe). These analytical results are recorded as 9 continuous numerical values.

CVRD.

CVRD records the voting results of the 98th Congress. It contains 435 samples that record the votes of each U.S. House of Representatives Congressman on the 16 key votes (attributes) identified by the CQA. Its classification task is to find the correct political party affiliation of each congressman [46]. Since some samples of CVRD include missing attribute values, the samples with missing values are deleted. Finally, 232 complete samples are left, which include 124 "Democrat" samples and 108 "Republican" samples. CVRD is recorded as categorical attributes; thus, the 16 categorical attributes, "Yes" and "No," are converted to numerical "1" and "0," and the two categorical classes, "Republican" and "Democrat," are changed to numerical "1" and "0," respectively [47].

Evaluation Metrics.
In our experiments, to measure the performance of each model, we adopt four evaluation criteria, namely, accuracy rate, receiver operating characteristic (ROC) curve, convergence speed, and a nonparametric statistical test.
(a) Accuracy Rate. The most important evaluation metric is the accuracy rate, which can be expressed as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN),

where TP, TN, FP, and FN indicate true positive, true negative, false positive, and false negative, respectively. To understand the equation better, the confusion matrix constructed by TP, TN, FP, and FN is shown in Table 2. Although accuracy is the simplest, most intuitive, and most commonly used performance comparison method, it is not enough for a complete performance evaluation [48].
(b) ROC. The ROC curve is a widely used method to display complete information on the set of all possible combinations of sensitivity and specificity, and it is also useful as a graphical characterization of the magnitude of separation between the case and control marker distributions [49]. The area under the ROC curve, known as the AUC, is more intuitive and has been considered the standard method to assess the accuracy of predictive distribution models. When continuous probability-derived scores are converted to a binary presence-absence variable, the supposed subjectivity in the threshold selection process can be avoided by summarizing the overall model performance over all possible thresholds [50]. If the case measurements and control measurements have no overlap, the AUC takes the value "1" and the marker is perfect in discriminating between cases and controls. Alternatively, if the case and control distributions are identical, the marker corresponds to random classification. Correspondingly, the AUC can be described as follows:

AUC = ∫_0^1 ROC(t) dt,

where t denotes the false positive rate and ROC(t) the corresponding true positive rate.

(c) Convergence Speed. A high convergence speed indicates high efficiency of the model. Thus, the mean squared error (MSE) at each iteration is used to compare the convergence speeds of different classifiers [51,52]. The MSE is calculated as follows:

MSE = (1/J) ∑_{j=1}^{J} (T_j − O_j)^2,

where O_j and T_j indicate the actual output and the predicted output, respectively, and J is the number of samples in the training dataset.
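The three quantitative metrics can be computed in a few lines of Python; the AUC is obtained here through its probabilistic interpretation (the chance that a randomly chosen positive scores above a randomly chosen negative, ties counting one half), which equals the area under the empirical ROC curve. Names are ours:

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified samples."""
    return (tp + tn) / (tp + tn + fp + fn)

def mse(targets, outputs):
    """Mean squared error between target and model outputs."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)

def auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Perfectly separated scores give an AUC of 1, perfectly inverted scores give 0, and identical score distributions give 0.5, matching the case/control discussion above.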
(d) Nonparametric Statistical Test. Since a nonparametric statistical test is not limited by the underlying distribution and makes relatively few assumptions, it is more robust and has wider applicability than a parametric statistical test. It is more reliable than the t-test, which is based on the assumption of a normal distribution and ignores the absolute magnitudes of the differences [53]. In our experiments, Wilcoxon's rank-sum test [54] is adopted as the nonparametric statistical test.

Simulation Setup.
In our experiments, each dataset is split into a training set and a testing set, each containing 50% of the samples [55], as shown in Table 3. Before training, each dataset is normalized to the range [0, 1]. The normalization follows the maximum-minimum normalization method, which can be expressed as follows:

x' = (x − x_min) / (x_max − x_min).

To maintain the fairness of the comparison, the number of parameters in each model should be set the same or as nearly equal as possible. The model structure of MLP is different from those of CDNM and EDNM; the number of weights and thresholds in MLP can be calculated as follows:

N_MLP = I × W + W + W + 1,

where I and W represent the numbers of neurons in the input layer and hidden layer, respectively. In the neural structure of EDNM and CDNM, since each synapse owns two parameters w_{i,m} and q_{i,m}, when the number of dendritic branches M is determined, the total parameter number in the EDNM and CDNM structure is expressed by the following equation:

N_EDNM = 2 × I × M.

In our experiments, once a benchmark dataset is chosen, the value of I is determined. Based on equations (13) and (14), setting suitable values of W and M makes N_MLP and N_EDNM approximately equal to each other. Table 4 summarizes the model structures of MLP, CDNM, and EDNM on the two benchmark datasets. It is easy to observe that all three methods have nearly the same parameter numbers for both datasets. In addition, the transfer functions of MLP in the hidden layer and output layer are both set to "Log-sigmoid." The learning rate of CDNM and MLP is 0.01. The population size of EDNM is set to 50. The number of iterations of the three methods is set to 1000, and each method runs 30 times independently in our experiments.
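Under our reading of equations (13) and (14), the normalization and the parameter-count balancing can be checked as follows; the MLP count assumes an I-W-1 architecture with one bias per hidden and output neuron, which is our assumption:

```python
def minmax_normalize(column):
    """Maximum-minimum normalization of one feature column into [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

def n_mlp(I, W):
    """Weights + thresholds of an I-W-1 MLP: I*W + W weights, W + 1 biases."""
    return I * W + W + W + 1

def n_ednm(I, M):
    """Each of the I x M synapses carries two parameters, w and q."""
    return 2 * I * M
```

For example, with I = 9 and M = 12 (the GID setting found by the Taguchi experiments), n_ednm gives 216 parameters, so a hidden layer of roughly W = 20 neurons (221 parameters under this count) would put the MLP on a comparable footing.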

Optimal Parameter Setting.

In EDNM, there are three parameters, namely, k, θ_soma, and M, which need to be defined by users. k is a constant value in the sigmoid function of the synaptic layer, θ_soma denotes the threshold of the soma, and M represents the number of dendritic branches. To find an optimal combination of these parameters, the Taguchi method is adopted in our experiment; it can reduce the number of experimental runs while ensuring the dependability of the experiment [56,57]. According to the Taguchi method, only 16 experiments out of 64 are run, and these 16 experiments are enough to find the optimal parameter setting quickly and efficiently. Table 5 shows the four levels of interest for the two benchmark datasets. The L16(4^3) orthogonal arrays are shown in Tables 6 and 7, respectively. It can be observed that, in Table 6, the 12th parameter setting (k = 8, M = 12, and θ_soma = 0.5) holds the highest testing accuracy, while in Table 7, the highest testing accuracy is obtained by the 6th parameter setting (k = 5, M = 18, and θ_soma = 0.3).
Through the above experiments, the optimal parameter settings for the two benchmark datasets can be determined.

Performance Comparison.
In order to verify the classification performance of EDNM, we compare it with MLP and the original CDNM on the two benchmark datasets. Table 8 presents the experimental results. It is easy to observe that EDNM obtains higher accuracy than MLP and CDNM on both problems. To detect significant differences between EDNM and the other models, Wilcoxon's rank-sum test is utilized in our experiment, with the significance level set to 0.05. If the p value is less than 0.05, the null hypothesis that there are no significant differences between the two comparison objects can be rejected. The statistical results are shown in Table 8, which implies that EDNM performs significantly better than both MLP and CDNM on the two benchmark problems. In addition, for a comprehensive evaluation of model performance, the convergence curves of the three models on the two benchmark problems are illustrated in Figure 6. As shown in Figure 6, EDNM obviously has a higher convergence speed than MLP and CDNM. In Table 8, the statistical results also demonstrate that the AUC value of EDNM is significantly larger than those of the other models on both problems. The corresponding ROC curves are compared and presented in Figure 7.
Based on the above experimental results, it can be concluded that EDNM is capable of providing more powerful classification performance on the GID and CVRD problems compared to MLP and CDNM. The higher convergence speed indicates that EDNM is a more efficient classifier, which saves computation time in practical applications.
Structural Morphology Analysis.

Firstly, we present the evolution of the structural morphology on the GID problem in Figure 8. It can be observed that there are 9 feature attributes and 12 dendritic branches in the structure before learning; the connection cases of all 108 synapses are randomly set in Figure 8(a). After training by the CS algorithm, the structural morphology of EDNM is presented in Figure 8(b). According to the rules of dendritic pruning, 10 dendritic branches are deleted and only Dendrite-5 and Dendrite-8 are left. Figure 8(c) illustrates the simplified structure of EDNM after dendritic pruning. Then, based on the rule of synaptic pruning, 14 unnecessary synaptic layers are ruled out and only 4 synapses are retained. Finally, the mature structural morphology of EDNM on the GID problem is provided in Figure 8(d). Similarly, Figure 9 illustrates the evolution of the structural morphology on the CVRD problem. The pruning results of the two benchmark problems are summarized in Table 9. It is easy to conclude that the pruned neural structures are much more simplified than the original ones; the neuron-pruning function can significantly simplify the structural morphology of EDNM.

Logical Circuit Analysis.

As mentioned above, the simplified structures of EDNM can be completely substituted by logical circuits. In this section, we verify the effectiveness of the logical circuit classifiers. According to the final neural structures in Figures 8 and 9, the logical circuit classifiers of the two benchmark problems are presented in Figure 10. As shown in Figure 10, the logical circuit classifiers consist of comparators and logical AND, OR, and NOT gates, where the comparators compare the input signals with their thresholds θ. If an input exceeds its threshold, the output is 1, and 0 otherwise.
It is noteworthy that since the final neural structure of CVRD has only one synaptic layer left, there are no logical AND, OR, or NOT gates in the corresponding logical circuit classifier, only a single comparator.
Besides, we compare the classification performances of the logical circuit classifiers and the normal EDNM in Table 10. As illustrated in the table, the logical circuits do not sacrifice accuracy on either benchmark problem. In addition, once the logical circuit classifiers are implemented in hardware, the classification speed will be much higher than that of the other classifiers in the literature. Given these characteristics, the logical circuit classifier is considered a satisfactory and efficient classifier for real-world classification tasks.

Conclusion
In this study, an EDNM is proposed to solve classification problems. It consists of four layers, namely, the synaptic layer, the dendritic layer, the membrane layer, and the soma. This unique structure allows EDNM to implement the neuron-pruning mechanism, which can rule out unnecessary synapses and dendritic branches. Compared with the original BP algorithm of CDNM, the CS algorithm achieves higher convergence speed and greater classification accuracy on the two benchmark problems, where the statistical results demonstrate that EDNM performs significantly better than MLP and CDNM. Besides, we also present the logical circuit classifiers produced by EDNM and verify their accuracy rate. The experimental results show that the logical circuits maintain satisfying classification performance. It is noted that, to the best of our knowledge, when the logical circuit classifiers run on hardware, the classification speed will be higher than that of all the other classifiers in the literature. In our future research, we will attempt to adopt multiobjective optimization algorithms to train the structure and weights of EDNM simultaneously, which may produce a more simplified and highly accurate logical circuit for each classification problem.
Data Availability

The benchmark classification datasets can be downloaded freely at https://archive.ics.uci.edu/ml/index.php.

Conflicts of Interest
The authors declare that they have no conflicts of interest.