An Extension Network of Dendritic Neurons

Deep learning (DL) has achieved breakthrough successes in various tasks, owing to its layer-by-layer information processing and sufficient model complexity. However, DL suffers from both redundant model complexity and low interpretability, mainly because of its oversimplified basic McCulloch–Pitts neuron unit. A widely recognized, biologically plausible dendritic neuron model (DNM) has demonstrated its effectiveness in alleviating these issues, but it can only solve binary classification tasks, which significantly limits its applicability. In this study, a novel extended network based on the dendritic structure is proposed, enabling it to solve multiclass classification problems, and an efficient error-backpropagation learning algorithm is derived for the first time. Extensive experiments demonstrate the effectiveness and superiority of the proposed method in comparison with nine other state-of-the-art classifiers on ten datasets, including a real-world quality-of-web-service application. The experimental results suggest that the proposed learning algorithm is competent and reliable in terms of classification performance and stability and has a notable advantage on small-scale, unbalanced data. Additionally, the ways in which the network structure is constrained by scale are examined.


Introduction
In recent years, deep learning (DL) has dominated the research field of artificial intelligence (AI) and achieved dramatic successes in speech recognition, protein structure prediction, drug discovery, image and video processing, and other areas [1]. At present, mainstream deep learning models are mostly constructed upon neural networks, that is, multilayered, parameterized McCulloch-Pitts neurons inspired by the biological neuron [2]. Neural networks, as black boxes, are not only extensively studied in the field of artificial intelligence but also widely applied in the information technology industry [3, 4]. The appearance of deep neural networks has pushed the development of neural networks to a new peak. However, given the numerous difficulties and problems they face, including the lack of a theoretical foundation for explanation [5, 6], fairness [7, 8], and causal discovery [9, 10], neural networks tend to become stuck in a relatively inert state. With the increasing attention paid to interpretability theory, newer and better discoveries and more valuable research orientations are urgently required to keep scientific and technological progress from proceeding blindly.
From the perspective of understanding the mechanics of artificial neural networks, various methods and contributions have been introduced [11, 12]. Through the application of Monte Carlo simulation to quantify variable importance, Olden et al. justified the correctness of the connection-weight calculation method in neural networks [13]. From a statistical point of view, Sarle et al. described the relations between neural networks and the generalized linear model, maximum redundancy analysis, projection pursuit, cluster analysis, and other statistical models, and also translated neural network terms into statistical terms [14]. Similarly, certain relations between artificial neural networks and statistical methods were proposed in [15].
Despite the abovementioned studies, the bottleneck of neural networks has not been addressed. Originating from the simulation of neurons in biology, the artificial neural network takes artificial neurons as nodes to construct a complete conduction structure. As a basic information processing unit, the neuron is formed by dendrites, a cell body (soma), an axon, and synapses, as shown in Figure 1(a). Specifically, the dendrite receives information from the outside world, the cell body processes the information, and the axon and synapses transmit the signal and pass it on to other neurons, respectively [16]. The modeling of biological neurons can be traced back to the 1940s, when McCulloch and Pitts jointly published the abstract McCulloch-Pitts (MP) neuron model [17] for the first time, as illustrated in Figure 1(b). Then, in 1949, the Hebb rule [18] was proposed based on the theory of the variability of synaptic connections between neurons within the human brain. Its weight-adjusting method was introduced into machine learning, thereby laying the foundation for learning algorithms.
Inspired by the structure of biological neurons and the MP model, Todo et al. proposed the dendritic neuron model (DNM) [19]. Different from traditional neural network models, the dendritic neuron model is designed around the conduction within a single neuron, and it draws on biological theory to simulate the biological neuron. Furthermore, DNM compensates for some defects of the homologous perceptron model, such as the inability to solve the XOR problem. At the same time, it has also stimulated novel studies of the human brain [20].
As a classifier, shown in Figure 1(c), DNM has been applied to various classification problems. For example, Sha et al. classified the breast cancer dataset, and Jiang et al. detected liver disorders [21] for assisted disease diagnosis. Apart from medical-aid applications, this unconventional method has also been applied in the financial field. To improve classification performance, metaheuristic algorithms were introduced to train the hyperparameters of DNM [22, 23]. Through the use of a decision tree, Luo et al. initialized the model to achieve better effectiveness [24]. For solving generalized large-scale classification problems, Jia et al. suggested a reconciliation method for DNM using a particle antagonism mechanism, and Ji et al. proposed a DNM-based multiobjective evolutionary algorithm [25]. In terms of feature selection, Song et al. addressed the high-dimensional challenge [26], and Gao et al. also showed the expansibility and flexibility of DNM for diverse applications [27]. Exploiting the multiplication operation, which is useful for information processing within a single neuron, the computation in the synapses is imaginatively described by sigmoid functions. It is advantageous to establish the morphology of a neuron by determining the values of the parameters in the synapses, since the output of the synapses can effectively represent signals. Nevertheless, a single neuron is limited in some application scenarios. In [28], the binary classification results of DNM were combined to undertake multiple classification tasks and thus recognize multiclassification datasets.
By adopting quality of service (QoS) as the evaluation dataset, this study implements the multiclassification of web service selection. QoS refers to a network's use of a range of basic technologies to provide superior service capabilities for designated network communication [29]. As a security mechanism and technology of the network, QoS is employed to deal with network delay [30], blocking [31], and other problems. As an illustration, consider the common network bandwidth, a significant QoS metric. Before standards of service quality appeared, the network environment treated all services and applications equally, resulting in a disordered situation, as shown in Figure 2(a), where the colored areas stand for different web services and applications. In other words, when a network device does not have QoS capability, the network environment is threatened and a bottleneck is created [32]. As shown in Figure 2(b), prioritization from the perspective of QoS provides a more orderly, efficient, and stable network environment. QoS comprises a set of nonfunctional attributes, which measure characteristics of web services such as reliability and response time, so that different services can be effectively classified and sorted.
Web services are software modules running on the network that are service-oriented and based on distributed programs. Because web services employ general Internet standards, such as HTTP and XML (a subset of the standard generalized markup language) [33], people have access to data on the web via various terminal devices in various places. In this article, the described web service differs from a common network application: it generally refers to application modules, such as network protocols and methods, that form the basis of network applications. With the development of the Internet, many candidate services implement the same task, and most of them have the same functions but different nonfunctional characteristics. As a result, these services are divided into different service quality levels. Overlap is inevitable because of the wide range of web services on the network, and QoS-based web service selection is considered an effective solution [34]. As network technology and operational concepts develop rapidly, web services are becoming the latest technology and development trend for constructing distributed, modular, and service-oriented applications.
Based on DNM, this article proposes a multiple dendritic neural network (MDNN) comprising multiple single neurons to achieve the multiclassification of web service selection based on QoS. To support the multiclassification mechanism, the structure of DNM introduced in Figure 3 is reconstructed. To accelerate the gradient descent and improve the multiclassification accuracy, the backpropagation algorithm and adaptive moment estimation optimization are derived for MDNN for the first time. Experiments are carried out on the Quality of Web Service dataset and nine UCI multiclassification datasets [35]. The comparison between MDNN and nine state-of-the-art classifiers demonstrates the superiority of the proposed method.
The contributions are mainly the following: (1) a novel multiple single-neuron neural network for multiclassification tasks is developed; (2) the potential and application scenarios of the dendritic neural network are explored; (3) a new approach for QoS-driven web service selection is proposed. The remainder of this article is organized as follows: Section 2 presents the structure of the multiple dendritic neural network. Section 3 elaborates on the learning processes of the proposed method and expounds on the optimization strategies. The comparison with other algorithms and the experimental results are shown in Section 4. Finally, Section 5 concludes the paper and formulates future work.

The Dendritic Neuron Network-Based Multiclassifier Approach
The proposed multiclassifier is constructed from multiple single neurons; the general architecture is shown in Figure 4. For each neuron, the input x_i of the model is preprocessed by a nonlinear sigmoid filter. To differentiate neurons, the function carries the subscript j and is defined as follows:

Y_{j,i,m} = \frac{1}{1 + e^{-k(w_{j,i,m} x_i - q_{j,i,m})}},  (1)

where k is a positive scale factor, i = 1, ..., I indexes the attributes of the sample, m = 1, ..., M indexes the nodes within the hidden layer, and j = 1, ..., J indexes the output classes. In addition, the weights w_{j,i,m} and thresholds q_{j,i,m} are the parameters of the neural network learned in the training stage; they are randomly initialized within (0, 0.01) and at 0, respectively. In contrast to the perceptron model, a multiplicative operation is adopted in the hidden layer to not only rule out inhibited neuronal excitation but also enhance activated neuronal excitation:

Z_{j,m} = \prod_{i=1}^{I} Y_{j,i,m},  (2)

V_j = \sum_{m=1}^{M} Z_{j,m}.  (3)

Equation (2) requires all synapses of a branch to be activated and is therefore equivalent to a logical AND. Equation (3) is equivalent to a logical OR, in which all inhibited neuronal excitations from the former layer are suppressed and the remaining ones are accumulated. As a result, the multiclassification structure of multiple neurons is formed.
Apart from the dendritic mechanism, MDNN utilizes the normalized exponential function to output the final results; for consistency of representation, its customary form is followed. Notably, the output of the multiple neurons is computed from all the information of the previous layer rather than being directly conveyed:

O_j = \frac{e^{V_j}}{\sum_{n=1}^{J} e^{V_n}},  (4)

where O_j is the predicted probability of each class. The normalized exponential function converts the output values of the upper layer, V_j, into a probability distribution on [0, 1] in which the probability values of the neurons sum to 1: the formula first passes each V_j through an exponential, ensuring nonnegative probabilities, and then normalizes the values so that they sum to 1.
Since the prediction result follows a probability distribution, the cross-entropy function is considered a proper substitute for the mean square error as the loss function and is defined as follows:

E = \sum_{j=1}^{J} E_j, \quad E_j = -T_j \ln O_j,  (5)

where E_j quantifies, per class, the discrepancy between the probability distribution predicted by the model and the actual classification, and T_j is the actual (one-hot) classification label.
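For concreteness, a minimal NumPy sketch of the forward pass and loss in Equations (1)-(5) is given below. It is an illustration under stated assumptions rather than the paper's implementation; in particular, the scale factor k and all tensor shapes are illustrative.

```python
import numpy as np

def mdnn_forward(x, w, q, k=5.0):
    """Forward pass of MDNN for one sample.

    x: (I,) input attributes; w, q: (J, I, M) synaptic weights and
    thresholds. k is an assumed positive sigmoid scale factor.
    Returns class probabilities O (J,) plus intermediates for backprop.
    """
    # Eq. (1): synaptic layer, sigmoid filter per class j, attribute i, branch m
    Y = 1.0 / (1.0 + np.exp(-k * (w * x[None, :, None] - q)))  # (J, I, M)
    # Eq. (2): dendrite layer, product over attributes (logical AND)
    Z = Y.prod(axis=1)                                         # (J, M)
    # Eq. (3): membrane layer, sum over branches (logical OR)
    V = Z.sum(axis=1)                                          # (J,)
    # Eq. (4): normalized exponential function, numerically stabilized
    expV = np.exp(V - V.max())
    O = expV / expV.sum()
    return O, (Y, Z, V)

def cross_entropy(O, T):
    # Eq. (5): cross-entropy between prediction O and one-hot label T
    return -np.sum(T * np.log(O + 1e-12))
```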

Learning Mechanism and Optimization Strategies
Existing learning algorithms cannot be applied directly, since MDNN is a new dendritic neuron model that contains multiplication operators in its calculation. Accordingly, in this section, we derive learning algorithms for the proposed MDNN for the first time: one is traditional error backpropagation, and the other is an Adam-like learning algorithm.

Backpropagation.
In the course of learning from samples, the model is improved by stochastic gradient descent on the parameters w_{j,i,m} and q_{j,i,m}, described as follows:

w_{j,i,m}(t) = w_{j,i,m}(t-1) - \eta \Delta w_{j,i,m},  (6)

q_{j,i,m}(t) = q_{j,i,m}(t-1) - \eta \Delta q_{j,i,m},  (7)

where the learning rate η is a positive constant, and t and t - 1 denote the current and previous iterations in the training stage, respectively. The error of the proposed MDNN is calculated by the cross-entropy function, and based on this calculated error, the error backpropagation algorithm is introduced as the learning scheme. In backpropagation, either all samples or a batch of samples are involved. To aid intuition, the relation among the layers is shown in Figure 5.
Δw_{j,i,m} and Δq_{j,i,m} are expressed in partial differential form as follows:

\Delta w_{j,i,m} = \frac{\partial E}{\partial w_{j,i,m}}, \quad \Delta q_{j,i,m} = \frac{\partial E}{\partial q_{j,i,m}}.  (8)

Since the model is trained in batches, the Δw_{j,i,m} and Δq_{j,i,m} used by the gradient descent are finally calculated as

\Delta w_{j,i,m} = \frac{1}{N} \sum_{s=1}^{N} \frac{\partial E_s}{\partial w_{j,i,m}}, \quad \Delta q_{j,i,m} = \frac{1}{N} \sum_{s=1}^{N} \frac{\partial E_s}{\partial q_{j,i,m}},  (9)

where N denotes the size of the input batch within the current iteration and E_s is the error on the s-th sample.
Following the chain rule, the derivation procedures and results are presented according to backpropagation. First, the partial derivative of the error E is calculated. Owing to the form of the normalized exponential function, ∂E_j/∂O_j and ∂O_j/∂V_j are computed jointly instead of separately. In the forward propagation for multiclassification, the outputs and membrane values do not correspond one-to-one, since each O depends on every V; the derivation of the error E is therefore discussed in Cases (1) and (2). To avoid confusion, the subscripts of V and O are redefined as m and n, respectively. Their relation is simplified as

O_n = \frac{e^{V_n}}{\sum_{m'=1}^{J} e^{V_{m'}}}.  (10)

On the basis of Equations (5) and (4), when n = m, there is Case (1):

\frac{\partial O_n}{\partial V_m} = O_n (1 - O_m).  (11)

When n ≠ m, there is Case (2):

\frac{\partial O_n}{\partial V_m} = -O_n O_m.  (12)

We incorporate Cases (1) and (2) into the following formula:

\frac{\partial O_n}{\partial V_m} = O_n (\delta_{nm} - O_m),  (13)

where δ_{nm} equals 1 when n = m and 0 otherwise. Combining Equation (13) with ∂E/∂O_n = -T_n / O_n and the fact that the one-hot labels satisfy Σ_n T_n = 1, ∂E/∂V_m is expressed as

\frac{\partial E}{\partial V_m} = \sum_{n} \frac{\partial E}{\partial O_n} \frac{\partial O_n}{\partial V_m} = O_m - T_m.  (14)

For the remaining layers of MDNN, the derivatives follow from Equations (3) and (2) as

\frac{\partial V_j}{\partial Z_{j,m}} = 1, \quad \frac{\partial Z_{j,m}}{\partial Y_{j,i,m}} = \prod_{i' \neq i} Y_{j,i',m} = \frac{Z_{j,m}}{Y_{j,i,m}}.  (15)

Taking the derivative of Equation (1) with the sigmoid function, ∂Y_{j,i,m}/∂w_{j,i,m} and ∂Y_{j,i,m}/∂q_{j,i,m} are obtained as

\frac{\partial Y_{j,i,m}}{\partial w_{j,i,m}} = k x_i Y_{j,i,m}(1 - Y_{j,i,m}), \quad \frac{\partial Y_{j,i,m}}{\partial q_{j,i,m}} = -k Y_{j,i,m}(1 - Y_{j,i,m}).  (16)
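Collecting the chain-rule terms above gives per-sample gradients; the following hedged NumPy fragment (continuing the earlier forward-pass sketch and its assumed scale factor k) mirrors the derivation rather than any reference implementation.

```python
def mdnn_gradients(x, T, O, Y, Z, k=5.0):
    """Per-sample gradients of the cross-entropy error w.r.t. w and q.

    Uses dE/dV_j = O_j - T_j (softmax plus cross-entropy, Cases (1)-(2)),
    dV_j/dZ_{j,m} = 1, dZ_{j,m}/dY_{j,i,m} = Z_{j,m} / Y_{j,i,m}, and the
    sigmoid derivatives of Eq. (1).
    """
    dE_dV = O - T                               # Eq. (14), shape (J,)
    dZ_dY = Z[:, None, :] / (Y + 1e-12)         # Eq. (15): product of other synapses
    dE_dY = dE_dV[:, None, None] * dZ_dY        # dV/dZ = 1
    dsig = k * Y * (1.0 - Y)                    # sigmoid derivative factor
    dE_dw = dE_dY * dsig * x[None, :, None]     # Eq. (16): dY/dw
    dE_dq = -dE_dY * dsig                       # Eq. (16): dY/dq
    return dE_dw, dE_dq
```

Averaging these per-sample gradients over a batch yields the Δw_{j,i,m} and Δq_{j,i,m} of Equation (9).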

Adam-Like Optimization.
For improving the convergence and classification ability of the proposed model, and inspired by the well-known adaptive moment estimation (Adam) [36], an Adam-like learning algorithm for MDNN is also introduced to accelerate the gradient descent without diverging. The way the weights are updated in each iteration is selectable: either the traditional scheme of Section 3.1 or the Adam-like scheme may be used, according to the user's setting.
As an extended optimization strategy of stochastic gradient descent (SGD) [37], momentum [38, 39] is introduced to reduce oscillation and accelerate the gradient descent. The fundamental concept of gradient descent with momentum lies in updating the weights with an exponentially weighted average of the gradients:

v_{\Delta w_{j,i,m}}(t) = \alpha v_{\Delta w_{j,i,m}}(t-1) + (1-\alpha) \Delta w_{j,i,m},  (17)

v_{\Delta q_{j,i,m}}(t) = \alpha v_{\Delta q_{j,i,m}}(t-1) + (1-\alpha) \Delta q_{j,i,m},  (18)

where α is a positive constant that smooths out the gradient descent process. Intuitively, Δw_{j,i,m} and Δq_{j,i,m} are interpreted as acceleration in physics, v_{Δw_{j,i,m}} and v_{Δq_{j,i,m}} are regarded as velocities, and α is seen as friction: the gradients provide the acceleration that builds up the velocities, while the friction α restrains it. In this case, the updates of the parameters w_{j,i,m} and q_{j,i,m} are modified as follows:

w_{j,i,m}(t) = w_{j,i,m}(t-1) - \eta v_{\Delta w_{j,i,m}}(t),  (19)

q_{j,i,m}(t) = q_{j,i,m}(t-1) - \eta v_{\Delta q_{j,i,m}}(t).  (20)

Serving as a crucial part of Adam, root mean square propagation (RMSprop) [40] additionally accelerates the gradient descent by accumulating squared gradients:

u_{\Delta w_{j,i,m}}(t) = \beta u_{\Delta w_{j,i,m}}(t-1) + (1-\beta) \Delta w_{j,i,m}^2,  (21)

where β is a positive constant similar to α. w_{j,i,m} is then calculated as

w_{j,i,m}(t) = w_{j,i,m}(t-1) - \eta \frac{\Delta w_{j,i,m}}{\sqrt{u_{\Delta w_{j,i,m}}(t)} + \varepsilon}.  (22)

Similarly, u_{\Delta q_{j,i,m}} is obtained by

u_{\Delta q_{j,i,m}}(t) = \beta u_{\Delta q_{j,i,m}}(t-1) + (1-\beta) \Delta q_{j,i,m}^2.  (23)

In order to avoid the bias of the exponentially weighted averages in the initial learning stage, Equations (17), (18), (21), and (23) are modified to obtain more accurate results as follows:

\hat{v} = \frac{v(t)}{1-\alpha^t}, \quad \hat{u} = \frac{u(t)}{1-\beta^t}.  (24)

For the acceleration of the gradient descent, Adam combines RMSprop with momentum. Thus, based on the bias-corrected moments in Equation (24), the parameter-updating equations optimized by Adam are expressed as follows:

w_{j,i,m}(t) = w_{j,i,m}(t-1) - \eta \frac{\hat{v}_{\Delta w_{j,i,m}}}{\sqrt{\hat{u}_{\Delta w_{j,i,m}}} + \varepsilon},  (25)

q_{j,i,m}(t) = q_{j,i,m}(t-1) - \eta \frac{\hat{v}_{\Delta q_{j,i,m}}}{\sqrt{\hat{u}_{\Delta q_{j,i,m}}} + \varepsilon},  (26)

where ε is a small positive constant that prevents division by zero.
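The combined update can be summarized in a short sketch; the default constants below (η, α, β, ε) are illustrative choices, not the paper's tuned values.

```python
import numpy as np

def adam_like_step(param, grad, v, u, t, eta=0.01, alpha=0.9,
                   beta=0.999, eps=1e-8):
    """One Adam-like update of a parameter tensor (w or q) of MDNN.

    v, u: running first and second moments (same shape as param);
    t: 1-based iteration counter used for bias correction.
    """
    v = alpha * v + (1.0 - alpha) * grad        # momentum, Eqs. (17)/(18)
    u = beta * u + (1.0 - beta) * grad ** 2     # RMSprop, Eqs. (21)/(23)
    v_hat = v / (1.0 - alpha ** t)              # bias correction, Eq. (24)
    u_hat = u / (1.0 - beta ** t)
    param = param - eta * v_hat / (np.sqrt(u_hat) + eps)  # Eqs. (25)/(26)
    return param, v, u
```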

Experimental Setup.
The quality of web service (QWS) dataset [41, 42] is a real-world dataset based on quality of service. Several versions of QWS are available; in this study, the experiments use the original version, which consists of 364 web services whose quality is described by a total of 10 nonfunctional attribute indexes. The QWS dataset divides the web services into 4 levels from highest to lowest: platinum, gold, silver, and bronze. To avoid overfitting while improving the accuracy of the results, data preprocessing strategies are adopted in the experiments. The raw data are normalized by using the rule of standardization:

x^{*} = \frac{x - \mu}{\sigma},  (27)

where μ and σ denote the mean and standard deviation of each attribute. Moreover, the normalized data are randomly divided into three parts, 70 percent for the training process, 15 percent for the testing process, and the remaining data for validation, to reduce unnecessary time consumption. Sigmoid, tanh, Rectified Linear Unit (ReLU), and Leaky ReLU are available as activation functions and can be chosen freely. For large datasets, the samples are divided into mini-batches and shuffled for the gradient descent during the training stage. If the batch size equals the number of samples in an iteration, batch gradient descent is executed.
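The preprocessing pipeline can be summarized by the following hedged sketch; the random seed and the exact column layout of the QWS file are assumptions for illustration.

```python
import numpy as np

def preprocess(X, y, seed=0):
    """Standardize attributes (Eq. (27)) and split 70/15/15 into
    train/test/validation sets.

    X: (N, 10) QWS attribute matrix, y: (N,) 4-level labels.
    The split ratios follow the paper; the seed is an assumption.
    """
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)  # per-attribute z-score
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_tr, n_te = int(0.70 * len(X)), int(0.15 * len(X))
    tr, te, va = idx[:n_tr], idx[n_tr:n_tr + n_te], idx[n_tr + n_te:]
    return (X[tr], y[tr]), (X[te], y[te]), (X[va], y[va])
```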

Optimal Parameter Settings.
All of the hyperparameters are illustrated in Table 1, which presents their descriptions and values.
However, it is tricky to determine the epoch size. To enable the model to reach optimal performance, a self-adaptive appending of training epochs is arranged in the training stage according to the convergence of the validation process. Beginning with the default configuration, the number of epochs then increases adaptively depending on whether the gradient descent is approaching stagnation; the initial value is set to 100 so that an overly large epoch count does not inflate the time consumption. For neural networks in general, experimental results are strongly affected by the combination of parameters. Therefore, the parameters batchsize, M, η, and precision are tuned by an orthogonal experiment with 4 factors and 3 levels; the specific design is listed in Table 2, and the resulting optimal parameters of MDNN for QWS are shown in Table 3. The other parameters comply with the settings in Table 1.
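The self-adaptive appending of epochs can be sketched as follows; the extension step and the stagnation tolerance are illustrative choices, not values reported in the paper.

```python
def train_with_adaptive_epochs(run_epoch, val_loss, base_epochs=100,
                               extend_by=20, tol=1e-4, max_epochs=1000):
    """Run base_epochs, then keep appending epochs while the validation
    loss still improves by more than tol (an assumed tolerance).

    run_epoch(): performs one training epoch; val_loss(): returns the
    current validation loss. Both stand in for the MDNN routines.
    """
    epochs, best, budget = 0, float("inf"), base_epochs
    while epochs < budget and epochs < max_epochs:
        run_epoch()
        epochs += 1
        if epochs == budget:          # budget exhausted: check for stagnation
            loss = val_loss()
            if best - loss > tol:     # still descending: append more epochs
                best, budget = loss, budget + extend_by
    return epochs
```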

Experimental Results.
For a comprehensive performance evaluation of MDNN, the following statistical indicators are used: precision, recall, F1 score, accuracy, and area under the curve (AUC) [43]. It is worth noting that precision here refers to the classification precision. Both for the convenience of intuitive evaluation and to justify the following comparison, the macroaverage, that is, the arithmetic average of the performance indicators over all categories rather than over instances, is adopted to statistically process the classification results. Table 4 shows the classification results of MDNN on the 4-class QWS dataset. The average and optimum values represent the classification performance of MDNN. Although the dataset is unbalanced, as the per-class counts of QWS in Table 5 show, the stability and generalization ability of MDNN are considered to be effectively validated.
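Macro-averaged indicators can be computed as in the sketch below (using scikit-learn, an implementation choice not stated in the paper); the macro average weights every class equally, regardless of its instance count.

```python
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             accuracy_score, roc_auc_score)

def macro_report(y_true, y_pred, y_prob):
    """Macro-averaged precision, recall, F1, accuracy, and AUC.

    y_prob: (N, J) class probabilities O_j from the softmax output,
    needed for the one-vs-rest multiclass AUC.
    """
    return {
        "precision": precision_score(y_true, y_pred, average="macro"),
        "recall": recall_score(y_true, y_pred, average="macro"),
        "f1": f1_score(y_true, y_pred, average="macro"),
        "accuracy": accuracy_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_prob, multi_class="ovr",
                             average="macro"),
    }
```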
The five statistical indicators suggest that the classification achieved by MDNN for each class is effective, stable, and reliable, and their mean values indicate that MDNN has good classification performance. The gradient descent optimization strategy effectively reduces the errors. Moreover, MDNN accelerates the gradient descent while maintaining a continuous downward trend, thereby ultimately supporting the generalization and robustness of MDNN.

Comparison of Methods.
To further verify the efficiency of MDNN, nine classifiers in total are compared with MDNN on nine multiclassification datasets from the UCI machine learning repository and the QWS dataset. The information of the datasets and the parameter settings of MDNN are listed in Table 5. The nine classifiers consist of BP, SVM, KNN, CART, naïve Bayes, LDA, QDA, J48, and random forest [44]. The ten datasets are Iris, Wine, Vehicle, Balance scale, CMC, Seed, Vowel, Thyroid, Robot navigation, and QWS.
Experimental results are shown in Table 6, where the best result for each dataset among all compared methods is highlighted in bold. According to the five statistical indicators, MDNN attains the most optimal values in comparison with the other classifiers. On the Iris, Wine, Seed, and Thyroid datasets, MDNN gives the best performance, with perfect outcomes of 100 percent correctness. MDNN also performs well on unbalanced data such as Vowel, Thyroid, and QWS. As a result, MDNN's classification performance is more consistent than that of the other methods. Nevertheless, MDNN appears to have a minor disadvantage on larger datasets, such as Balance scale, Vehicle, CMC, and Robot navigation, which seems to be a constraint imposed by the distribution of the network structure. The comparison between MDNN and the other classifiers verifies the superiority and effectiveness of the multiple dendritic neuron structure.
The receiver operating characteristic (ROC) curves of the ten multiclassification methods in Figure 6 show the correct classification coverage of each class of the QWS dataset. MDNN not only performs consistently on each class of QWS but also outperforms the other classifiers, indicating the effectiveness and stability of MDNN for QWS classification and for unbalanced multiclassification applications in general. The experiments thus also demonstrate the efficiency and superiority of MDNN in terms of classification performance and stability.

Morphology and Logic Circuit Realization.
To display a data sample, the shuffle operation set in the pretraining period is at our disposal. According to the initialization of w_{j,i,m} and q_{j,i,m} in Section 2, the synaptic outputs were initially calculated to be around 0.5. Through the training stage, the weights w_{j,i,m} and thresholds q_{j,i,m} gradually stabilized. As shown in Figure 7, the synaptic changes thus accomplish the pruning of the redundant network structure. It can easily be observed that neuron 1, neuron 2, and neuron 4 are fully inhibited in accordance with the rule of Equation (3). Consequently, the structure of neuron 1, neuron 3, and neuron 4 is ruled out. To specify the states of dendrites, a total of four scenarios are listed as follows (see also the sketch after this list):

dendritic state =
  constant-1 connection, when q_{j,i,m} < w_{j,i,m} < 0 or q_{j,i,m} < 0 < w_{j,i,m};
  constant-0 connection, when 0 < w_{j,i,m} < q_{j,i,m} or w_{j,i,m} < 0 < q_{j,i,m};
  excitatory connection, when 0 < q_{j,i,m} < w_{j,i,m};
  inhibitory connection, when w_{j,i,m} < q_{j,i,m} < 0.
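As a minimal sketch, the four states can be read off directly from a trained weight-threshold pair:

```python
def connection_state(w, q):
    """Classify a synapse by its trained weight w and threshold q,
    following the four scenarios listed above."""
    if 0 < q < w:
        return "excitatory"
    if w < q < 0:
        return "inhibitory"
    if q < w < 0 or q < 0 < w:
        return "constant-1"   # prunable: no effect on the AND product
    if 0 < w < q or w < 0 < q:
        return "constant-0"   # silences its whole dendritic branch
    return "boundary case"    # w == q or a zero-valued parameter
```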
For the remaining neuron, the residual dendrite morphology is shown in Figure 8(a), where dashed lines indicate pruning. As mentioned in Section 2, logic-based relations are inherent in dendritic structures. Finally, since constant-1 connections have no substantive impact on the attributes, the connections among the dendrites are equivalent to a logic OR. Thus, the hardware realization can be transformed as illustrated in Figure 8(b), where the multiplexer acts as a 1-of-2 numerical selector. In addition to showing the extendibility of MDNN, the above also indicates that MDNN can avoid overfitting by increasing the sparsity of the dendritic matrix.

Conclusion and Future Directions
This article puts forward a novel extended network of dendritic neurons, namely, the multiple dendritic neural network (MDNN). The architecture of MDNN is completely different from previous DL models based on MP neuron models. By deriving its new learning algorithms, MDNN becomes, for the first time among dendritic neuron models, able to solve multiclassification problems, in contrast to previous single dendritic neuron models. Besides, we propose an approach to improve the interpretability of artificial neural networks with the theoretical support of neuroscience. Experiments are mainly carried out on a QoS-related application. The comparison between MDNN and other classifiers shows the superior performance of the proposed model, and MDNN is particularly advantageous on small-scale unbalanced data. At the same time, the performance and efficiency of the proposed neural network are limited by scale. In follow-up work, the deficiencies of these experiments will be addressed to improve the generalization ability [45, 46] and to study the model's capabilities and limitations. Meanwhile, applicable domains for MDNN will be explored in the following aspects: (1) expanding research on more computer-related data mining to solve practical engineering problems, such as quality of service of mobile networks [47] and security bug reports [48, 49]; (2) practicing on other forms of data structures, e.g., semantic data [50]; and (3) focusing on unbalanced data [51] and adequately simplifying the network structure [52] in practice.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Authors' Contributions
Qianyi Peng was involved in the investigation, methodology, software, visualization, validation, and writing the original draft. Shangce Gao was responsible for conceptualization, methodology, supervision, validation, and writing, review, and editing. Yirui Wang was involved in the conceptualization, software, methodology, writing, review, and editing, as well as supervision. Junyan Yi, Gang Yang, and Yuki Todo were responsible for conceptualization, methodology, and writing, review, and editing.