Designing Artificial Neural Networks Using Particle Swarm Optimization Algorithms

Artificial Neural Network (ANN) design is a complex task because its performance depends on the architecture, the selected transfer function, and the learning algorithm used to train the set of synaptic weights. In this paper, we present a methodology that automatically designs an ANN using particle swarm optimization algorithms: Basic Particle Swarm Optimization (PSO), Second Generation of Particle Swarm Optimization (SGPSO), and a New Model of PSO called NMPSO. These algorithms evolve, at the same time, the three principal components of an ANN: the set of synaptic weights, the connections or architecture, and the transfer function of each neuron. Eight different fitness functions are proposed to evaluate each solution and find the best design. These functions are based on the mean square error (MSE) and the classification error (CER) and implement strategies to avoid overtraining and to reduce the number of connections in the ANN. In addition, the ANNs designed with the proposed methodology are compared with those designed manually and trained with the well-known back-propagation and Levenberg-Marquardt learning algorithms. Finally, the accuracy of the method is tested on different nonlinear pattern classification problems.


Introduction
Artificial Neural Networks (ANNs) are systems composed of neurons organized in input, output, and hidden layers. The neurons are connected to each other by a set of synaptic weights. An ANN is a powerful tool that has been applied to a broad range of problems such as pattern recognition, forecasting, and regression. During the learning process, the ANN continuously changes its synaptic weights until the acquired knowledge is sufficient (that is, until a specific number of iterations is reached or a goal error value is achieved). When the learning process or training stage has finished, the generalization capability of the ANN must be evaluated using samples of the problem different from those used during training. Ultimately, the ANN is expected to classify the patterns of a particular problem with acceptable accuracy during both the training and testing stages.
Several classic algorithms to train an ANN have been proposed and developed in recent years. However, many of them can become trapped in undesirable solutions, that is, solutions far from the optimum. Moreover, most of these algorithms cannot explore multimodal and noncontinuous surfaces. Therefore, other kinds of techniques, such as bioinspired algorithms (BIAs), are needed to train an ANN.
Computational Intelligence and Neuroscience
BIAs are well accepted by the Artificial Intelligence community because they are powerful optimization tools that can solve very complex optimization problems. For a given problem, BIAs can explore large multimodal and noncontinuous search spaces and find solutions close to the optimum. BIAs are based on a natural behavior described as swarm intelligence. This concept is defined in [1] as a property of systems composed of unintelligent agents with limited individual capabilities but with an intelligent collective behavior.
Several works use evolutionary and bioinspired algorithms to train ANNs as another fundamental form of learning [2]. Metaheuristic methods for training neural networks are based on local search, population methods, and other approaches such as cooperative coevolutionary models [3].
An excellent literature review of evolutionary algorithms used to evolve ANNs is given in [2]. However, most of the reported studies focus only on evolving the synaptic weights or parameters [4], or on evolving the number of neurons in the hidden layers while the number of hidden layers is fixed in advance by the designer. Moreover, these studies do not evolve the transfer functions, which are an important element of an ANN because they determine the output of each neuron.
For example, in [5], the authors proposed a method that combines Ant Colony Optimization (ACO), to find a particular architecture (the connections) for an ANN, with Particle Swarm Optimization (PSO), to adjust the synaptic weights. Other studies, such as [6], implemented a modification of PSO combined with Simulated Annealing (SA) to obtain a set of synaptic weights and ANN thresholds. In [7], the authors use Evolutionary Programming to obtain the architecture and the set of weights in order to solve classification and prediction problems. Another example is [8], where Genetic Programming is used to obtain graphs that represent different topologies. In [9], the Differential Evolution (DE) algorithm was applied to design an ANN for a weather forecasting problem. In [10], the authors use a PSO algorithm to adjust the synaptic weights of an ANN that models the daily rainfall-runoff relationship in Malaysia. In [11], the authors compare the back-propagation method with basic PSO for adjusting only the synaptic weights of an ANN that solves classification problems. In [12], the set of weights is evolved using Differential Evolution and basic PSO.
In other works, such as [13], the three principal elements of an ANN are evolved at the same time: architecture, transfer functions, and synaptic weights. The authors of [13] proposed a New Model of PSO (NMPSO) algorithm, while the authors of [14] solved the same problem by means of a Differential Evolution (DE) algorithm. Another example is [15], where the authors used an Artificial Bee Colony (ABC) algorithm with two different fitness functions to evolve the design of an ANN.
This research makes significant contributions in comparison with these last three works. First, eight fitness functions are proposed to deal with three common problems that emerge during the design of an ANN: accuracy, overfitting, and reduction of the ANN size. To handle these problems, the fitness functions take into account the classification error, the mean square error, the validation error, the reduction of the architecture, and combinations of them. Furthermore, this research explores the behavior of three bioinspired algorithms using different values for their parameters. During the experimentation phase, the best parameter values for these algorithms are determined, and the best configuration is then used to generate a set of statistically valid experiments for each selected classification problem. Moreover, the results obtained with the proposed methodology in terms of the number of connections, the number of neurons, and the transfer functions selected for each ANN are presented and discussed. Another contribution of this research is a new metric for efficiently comparing the results provided by the ANNs generated with the proposed methodology. This metric takes into account the recognition rates obtained during the training and testing stages, with the testing accuracy weighted more heavily than the training accuracy. Finally, the results achieved by the three bioinspired algorithms are compared against those achieved with two classic learning algorithms. These three bioinspired algorithms were selected because NMPSO is a relatively new algorithm (proposed in 2009) based on the metaphor of the basic PSO technique, so it is important to compare its performance with other algorithms inspired by the same phenomenon.
In general, the problem to be solved can be stated as follows: given a set of input patterns $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_p\}$, $\mathbf{x} \in \mathbb{R}^n$, and a set of desired patterns $D = \{\mathbf{d}_1, \ldots, \mathbf{d}_p\}$, $\mathbf{d} \in \mathbb{R}^m$, find the ANN represented by $W \in \mathbb{R}^{q \times (q+3)}$ such that a function $f(X, D, W)$ is minimized, where $q$ is the predefined maximum number of neurons. It is important to remark that the search space involves three different domains (architecture, synaptic weights, and transfer functions).
This research provides a complete study of how an ANN can be automatically designed by applying bioinspired algorithms, in particular Basic Particle Swarm Optimization (PSO), Second Generation PSO (SGPSO), and the New Model of PSO (NMPSO). The proposed methodology simultaneously evolves the architecture, the synaptic weights, and the type of transfer functions in order to design the ANNs that provide the best accuracy for a particular problem. Moreover, the performance of the particle swarm algorithms is compared with that of classic learning methods (back-propagation and Levenberg-Marquardt). In addition, this research presents a new way to select the maximum number of neurons (MNN). The accuracy of the proposed methodology is tested by solving several real and synthetic pattern recognition problems. In this paper, we show the results obtained with ten classification problems of different complexities.
The basic concepts concerning the three PSO algorithms and ANN are presented in Sections 2 and 3, respectively. In Section 4 the methodology and the strategy used to design the ANN automatically are described. In Section 5 the eight fitness functions used in this research are described. In Section 6, the experimental results about tuning the parameters for PSO algorithms are described. Moreover, the experimental results are outlined in Section 7. Finally, in Sections 8 and 9 the general discussion and conclusions of this research are given.

Particle Swarm Optimization Algorithms
In this section, three different algorithms based on PSO metaphor are described. The first one is the original PSO algorithm. Then, two algorithms which improve the original PSO are shown: the Second Generation of PSO and a New Model of PSO.

Original Particle Swarm Optimization Algorithm.
The Particle Swarm Optimization (PSO) algorithm is a method for the optimization of continuous nonlinear functions proposed by Eberhart et al. [16]. This algorithm is inspired by observations of the social and collective behavior of bird flocks searching for food or safety, as well as of fish schooling. In PSO, each particle moves according to the best member of the population and, at the same time, according to its own experience. The metaphor indicates that a set of solutions moves through a search space with the aim of reaching the best position or solution.
The population is considered as a swarm of particles, where each particle represents a position $\mathbf{x}_i \in \mathbb{R}^n$, $i = 1, \ldots, M$, in a multidimensional space. These particles are evaluated with a particular optimization function to determine their fitness values, and the best solution is saved. All the particles change their position in the search space according to a velocity $\mathbf{v}_i$ that takes into account the best position found by any particle in the population, $\mathbf{p}_g \in \mathbb{R}^n$ (the social component), as well as each particle's own best position, $\mathbf{p}_i \in \mathbb{R}^n$ (the cognitive component). The particles move to a different position at each iteration until they reach an optimum position. At each time $t$, the velocity of particle $i$ is updated using
$$\mathbf{v}_i(t+1) = \omega\,\mathbf{v}_i(t) + c_1 r_1 \left(\mathbf{p}_i(t) - \mathbf{x}_i(t)\right) + c_2 r_2 \left(\mathbf{p}_g(t) - \mathbf{x}_i(t)\right), \tag{1}$$
where $\omega$ is the inertia weight, typically set to vary linearly from 1 to 0 during the course of a run; $c_1$ and $c_2$ are acceleration coefficients; and $r_1$ and $r_2$ are uniformly distributed random numbers in $(0, 1)$. The velocity $\mathbf{v}_i$ is limited to the range $[V_{\min}, V_{\max}]$. Updating the velocity in this way enables particle $i$ to search around its best individual position $\mathbf{p}_i(t)$ and the best global position $\mathbf{p}_g(t)$; the position of the particle is then updated as
$$\mathbf{x}_i(t+1) = \mathbf{x}_i(t) + \mathbf{v}_i(t+1). \tag{2}$$

Second Generation of PSO Algorithm.
The SGPSO algorithm [17] is an improvement of the original PSO algorithm that considers three aspects: the local optimum solution of each particle, the global best solution, and a new concept, the geometric center of the optimum swarm. The authors explain that birds keep a certain distance from the swarm center (food). On the other hand, no bird calculates the exact position of the swarm center at every moment. A bird flock always stays in the same area for a specified time, during which the swarm center remains fixed in every bird's eyes. Afterward, the swarm moves to a new area, and all birds must keep a certain distance from the new swarm center. This fact is the basis of SGPSO.
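The position of the geometric center $\mathbf{p}_c \in \mathbb{R}^n$ of the optimum swarm is updated according to
$$\mathbf{p}_c = \frac{1}{M} \sum_{i=1}^{M} \mathbf{p}_i \quad \text{if } \mathrm{CI} \bmod k = 0, \tag{3}$$
and is kept unchanged otherwise, where $M$ is the number of particles in the swarm, CI is the current iteration number, and $k$ is the geometric center updating time of the optimum swarm, with a value in $[1, \mathrm{MAXITER}]$.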
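In SGPSO, the velocity is updated by (4) and the position of each particle by (5):
$$\mathbf{v}_i(t+1) = \omega\,\mathbf{v}_i(t) + c_1 r_1 \left(\mathbf{p}_i(t) - \mathbf{x}_i(t)\right) + c_2 r_2 \left(\mathbf{p}_g(t) - \mathbf{x}_i(t)\right) + c_3 r_3 \left(\mathbf{p}_c - \mathbf{x}_i(t)\right), \tag{4}$$
$$\mathbf{x}_i(t+1) = \mathbf{x}_i(t) + \mathbf{v}_i(t+1), \tag{5}$$
where $c_1$, $c_2$, and $c_3$ are constants called acceleration coefficients, $r_1$, $r_2$, and $r_3$ are random numbers in the range $[0, 1]$, and $\omega$ is the velocity inertia.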
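The updates (1) to (5) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the acceleration coefficients (c1 = c2 = 1.5, c3 = 0.5), the centre updating time k = 10, the velocity limit, the [-4, 4] search range, and the sphere test function are all assumptions chosen only for the demonstration.

```python
import random


def update_particle(x, v, p_best, g_best, w, c1, c2,
                    center=None, c3=0.0, v_max=1.0, rng=random):
    """Apply one velocity and position update to a single particle.

    With center=None this follows the basic PSO rule, Eqs. (1)-(2); passing the
    geometric centre of the optimum swarm together with c3 > 0 adds the SGPSO
    term of Eq. (4) before the position update of Eq. (5).
    """
    new_x, new_v = [], []
    for d in range(len(x)):
        r1, r2, r3 = rng.random(), rng.random(), rng.random()
        vd = (w * v[d]
              + c1 * r1 * (p_best[d] - x[d])     # cognitive component
              + c2 * r2 * (g_best[d] - x[d]))    # social component
        if center is not None:
            vd += c3 * r3 * (center[d] - x[d])   # SGPSO geometric-centre term
        vd = max(-v_max, min(v_max, vd))         # clamp to [Vmin, Vmax] = [-v_max, v_max]
        new_v.append(vd)
        new_x.append(x[d] + vd)                  # position update
    return new_x, new_v


def geometric_center(p_bests, iteration, k, previous):
    """Eq. (3): recompute the centre every k iterations, otherwise keep it fixed."""
    if previous is None or iteration % k == 0:
        M = len(p_bests)
        return [sum(p[d] for p in p_bests) / M for d in range(len(p_bests[0]))]
    return previous


def minimize(f, dim, n_particles=20, iters=200, sgpso=False, seed=0):
    """Minimize f over [-4, 4]^dim with basic PSO (sgpso=False) or SGPSO."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-4, 4) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    p_best = [x[:] for x in xs]
    p_val = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda i: p_val[i])
    g_best, g_val = p_best[g][:], p_val[g]
    center = None
    for t in range(iters):
        w = 1.0 - t / iters                      # inertia decreasing linearly from 1 towards 0
        if sgpso:
            center = geometric_center(p_best, t, k=10, previous=center)
        for i in range(n_particles):
            xs[i], vs[i] = update_particle(xs[i], vs[i], p_best[i], g_best, w,
                                           c1=1.5, c2=1.5, center=center,
                                           c3=0.5, rng=rng)
            val = f(xs[i])
            if val < p_val[i]:                   # update the particle's own best
                p_best[i], p_val[i] = xs[i][:], val
                if val < g_val:                  # update the swarm's best
                    g_best, g_val = xs[i][:], val
    return g_best, g_val
```

For example, `minimize(lambda x: sum(c * c for c in x), dim=2, sgpso=True)` runs the SGPSO variant on a two-dimensional sphere; with `sgpso=False` the centre stays `None` and only the basic PSO terms are used.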

New Model of Particle Swarm Optimization.
This algorithm was proposed by Garro et al. [13] and is based on several ideas that other authors proposed to improve the basic PSO algorithm [4]. These ideas are described in the following paragraphs. Shi and Eberhart [18] proposed an inertia weight that varies linearly over the course of the generations, which significantly improves the performance of basic PSO. The inertia is computed as
$$\omega = (\omega_1 - \omega_2) \times \frac{\mathrm{MAXITER} - \mathrm{iter}}{\mathrm{MAXITER}} + \omega_2, \tag{6}$$
where $\omega_1$ and $\omega_2$ are the initial and final values of the inertia weight, respectively, iter is the current iteration number, and MAXITER is the maximum number of allowable iterations. The empirical studies in [18] indicated that the optimal solution could be improved by varying $\omega$ from 0.9 at the beginning of the evolutionary process to 0.4 at the end. Yu et al. [4] developed a strategy in which, when the global best position does not improve as the number of generations increases, each particle $i$ is selected from the population with a predefined probability, and a random perturbation is then added to each velocity vector dimension $v_{ik}$ of the selected particle. The velocity resetting is computed as
$$v_{ik} \leftarrow v_{ik} + (2r - 1)\,V_{\max}, \tag{7}$$
where $r$ is a uniformly distributed random number in the range $(0, 1)$ and $V_{\max}$ is the maximum random perturbation magnitude applied to each selected particle dimension. Based on some evolutionary schemes of Genetic Algorithms (GA), several effective mutation and crossover operators have been proposed for PSO. Løvberg et al. [19] proposed a crossover operator, applied with a certain crossover rate, defined as
$$\mathrm{ch}_1(\mathbf{x}) = r\,\mathrm{par}_1(\mathbf{x}) + (1 - r)\,\mathrm{par}_2(\mathbf{x}), \tag{8}$$
where $r$ is a uniformly distributed random number in the range $(0, 1)$, $\mathrm{ch}_1$ is the offspring, and $\mathrm{par}_1$ and $\mathrm{par}_2$ are two parents randomly selected from the population. The offspring velocity is calculated as the sum of the two parents' velocity vectors, normalized to the original length of the corresponding parent's velocity vector:
$$\mathrm{ch}_1(\mathbf{v}) = \frac{\mathrm{par}_1(\mathbf{v}) + \mathrm{par}_2(\mathbf{v})}{\left\lVert \mathrm{par}_1(\mathbf{v}) + \mathrm{par}_2(\mathbf{v}) \right\rVert} \left\lVert \mathrm{par}_1(\mathbf{v}) \right\rVert. \tag{9}$$
Higashi and Iba [20] proposed a Gaussian mutation operator, applied with a certain mutation rate and defined in (10), to improve the performance of PSO; in (10), ch is the offspring, par is a parent randomly selected from the population, iter is the current iteration number, MAXITER is the maximum number of allowable iterations, and $G$ is a Gaussian distribution. Using these operators in PSO has the potential to achieve faster convergence and to find better solutions.
Algorithm 1 (NMPSO), partially recovered steps: (1) Given a population of $\mathbf{x}_i \in \mathbb{R}^n$, $i = 1, \ldots, M$, individuals. (2) Initialize the population at random. ... Apply the mutation operator (10). ...
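The inertia schedule (6), the velocity resetting (7), and the crossover (8) and (9) can be sketched as small helper functions. This is a minimal sketch under stated assumptions: the defaults w1 = 0.9 and w2 = 0.4 follow the values reported for [18], the perturbation in (7) is assumed to have the symmetric form (2r - 1) * Vmax, and the handling of cancelling parent velocities is our own choice. The Gaussian mutation (10) is not sketched because its exact form is not given here.

```python
import math
import random


def inertia(it, max_iter, w1=0.9, w2=0.4):
    """Eq. (6): inertia weight decreasing linearly from w1 to w2."""
    return (w1 - w2) * (max_iter - it) / max_iter + w2


def reset_velocity(v, v_max, rng=random):
    """Eq. (7), assuming a perturbation in [-v_max, v_max] is added to
    every velocity dimension of a selected particle."""
    return [vk + (2.0 * rng.random() - 1.0) * v_max for vk in v]


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def crossover(par1_x, par2_x, par1_v, par2_v, rng=random):
    """Eqs. (8)-(9): arithmetic crossover of two parents.

    The offspring position is a random convex combination of the parents; its
    velocity is the parents' velocity sum rescaled to the length of par1_v.
    """
    r = rng.random()
    child_x = [r * a + (1.0 - r) * b for a, b in zip(par1_x, par2_x)]
    s = [a + b for a, b in zip(par1_v, par2_v)]
    s_norm = norm(s)
    if s_norm == 0.0:
        # Parent velocities cancel out; keep the first parent's velocity (our choice).
        return child_x, list(par1_v)
    scale = norm(par1_v) / s_norm
    return child_x, [c * scale for c in s]
```

For instance, parents with velocities (3, 0) and (0, 4) produce an offspring velocity (1.8, 2.4): the direction of their sum, rescaled to the length 3 of the first parent's velocity.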
NMPSO also uses dynamic random neighborhoods that change according to certain rates. First, a maximum number of neighborhoods, MAXNEIGH, is defined as the population size divided by 4, so that each neighborhood $N_j$, $j = 1, \ldots, \mathrm{MAXNEIGH}$, has at least 4 members. Then, the members of each neighborhood are randomly selected, and the best particle $\mathbf{p}_{g_j}$ of each neighborhood is computed. Finally, the velocity of each particle is updated as
$$\mathbf{v}_i(t+1) = \omega\,\mathbf{v}_i(t) + c_1 r_1 \left(\mathbf{p}_i(t) - \mathbf{x}_i(t)\right) + c_2 r_2 \left(\mathbf{p}_{g_j}(t) - \mathbf{x}_i(t)\right) \tag{11}$$
for all $i \in N_j$, $j = 1, \ldots, \mathrm{MAXNEIGH}$. NMPSO combines the varying schemes of the inertia weight and the acceleration coefficients $c_1$ and $c_2$, velocity resetting, crossover and mutation operators, and dynamic random neighborhoods [13]. The NMPSO algorithm is described in Algorithm 1.
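The neighborhood construction can be sketched as follows. This is an illustrative sketch, not the authors' code: dealing the shuffled particle indices round-robin into MAXNEIGH = M // 4 groups is our assumption, chosen because it guarantees at least 4 members per neighborhood.

```python
import random


def random_neighborhoods(n_particles, rng=random):
    """Randomly split particle indices into MAXNEIGH = n_particles // 4 groups.

    Assumption: the shuffled indices are dealt round-robin, so every
    neighborhood receives at least 4 members.
    """
    if n_particles < 4:
        raise ValueError("need at least 4 particles")
    max_neigh = n_particles // 4
    idx = list(range(n_particles))
    rng.shuffle(idx)
    groups = [[] for _ in range(max_neigh)]
    for pos, i in enumerate(idx):
        groups[pos % max_neigh].append(i)
    return groups


def neighborhood_bests(groups, fitness):
    """Map each particle to the index of the best (lowest-fitness) member of its group."""
    best_of = {}
    for group in groups:
        best = min(group, key=lambda i: fitness[i])
        for i in group:
            best_of[i] = best
    return best_of
```

In the velocity update (11), the global best of the basic PSO rule is replaced by the best position of particle `best_of[i]`; regenerating the groups at a chosen rate makes the neighborhoods dynamic.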

Artificial Neural Networks
An ANN is a system that performs a mapping between input and output patterns that represent a problem [22]. ANNs learn information during the training process over several iterations. When the learning process finishes, the ANN is ready to classify new information, predict new behaviors, or approximate nonlinear functions. Its structure consists of a set of neurons (represented by functions) that are connected to one another and organized in layers. The patterns $\mathbf{a} \in \mathbb{R}^n$ that codify the real problem are sent through the layers, and the information is transformed by the corresponding synaptic weights $\mathbf{w} \in \mathbb{R}^n$ (values between 0 and 1). Then, the neurons in the following layer compute a summation of this information, depending on whether a connection exists between them. This summation also includes another input, called the bias, whose input value is 1. The bias is a threshold, represented by $\theta$, that represents the minimum level a neuron needs to become active. The summation function is
$$o = \sum_{i=1}^{n} a_i w_i + \theta. \tag{12}$$
After that, the result of the summation is evaluated by the transfer function $f(\cdot)$ of the neuron. The result is the neuron's output, and this information is sent to the other connected neurons until it reaches the last layer. Finally, the output of the ANN is obtained.
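The computation of a single neuron, the summation (12) followed by its transfer function, can be sketched as below. The transfer-function table is hypothetical: this section does not list the nF functions used by the methodology, so the four entries are common choices picked only for illustration.

```python
import math

# Hypothetical transfer-function table indexed 1..nF (not taken from the paper).
TRANSFER = {
    1: lambda o: 1.0 / (1.0 + math.exp(-o)),   # logistic sigmoid
    2: math.tanh,                               # hyperbolic tangent
    3: lambda o: 1.0 if o >= 0 else 0.0,        # hard limit
    4: lambda o: o,                             # linear
}


def neuron_output(a, w, theta, tf_id):
    """Eq. (12) followed by the transfer function: f(sum_i a_i * w_i + theta)."""
    o = sum(ai * wi for ai, wi in zip(a, w)) + theta
    return TRANSFER[tf_id](o)
```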
The learning process consists of adapting the synaptic weights until the desired behavior is reached. The output is evaluated to measure the performance of the ANN; if the output is not as desired, the synaptic weights are changed or adjusted in terms of the input patterns $\mathbf{a} \in \mathbb{R}^n$. There are two ways to verify whether the ANN has learned: first, the ANN computes the degree of similarity between the input patterns and the information it has already learned (unsupervised learning); second, the ANN output $\mathbf{y}$ is compared with the desired patterns $\mathbf{d} \in \mathbb{R}^m$ (supervised learning). In our case, supervised learning is applied, where the objective is to produce an output that approximates the desired patterns of a set of input-output samples:
$$T = \left\{ (\mathbf{a}_\xi, \mathbf{d}_\xi) \right\}_{\xi=1}^{p}, \tag{13}$$
where $\mathbf{a}_\xi$ is the input pattern and $\mathbf{d}_\xi$ the desired response. Given the training sample $T$, the requirement is to design and compute the free parameters of the neural network so that the actual output $\mathbf{y}_\xi$ of the neural network due to $\mathbf{a}_\xi$ is close enough to $\mathbf{d}_\xi$ for all $\xi$ in a statistical sense [15]. We may use the mean square error (MSE) as the first objective function to be minimized:
$$e = \frac{1}{p\,m} \sum_{\xi=1}^{p} \sum_{j=1}^{m} \left(d_{\xi j} - y_{\xi j}\right)^2. \tag{14}$$
There are algorithms that adjust the synaptic weights to obtain a minimum error, such as the classic back-propagation (BP) algorithm [23, 24]. This algorithm, like others, is based on the gradient descent technique, which can become trapped in a local minimum. Furthermore, the BP algorithm cannot solve noncontinuous problems. For this reason, other techniques that can handle noncontinuous and nonlinear problems are needed to obtain better ANN performance and to solve truly complex problems.

Proposed Methodology
The most important elements to design and improve the accuracy of an ANN are the architecture (or topology), the set of transfer functions (TF), and the set of synaptic weights and bias. These elements should be codified into the individual that represents the solution of our problem. The solutions generated by the bioinspired algorithms are measured by the fitness function in order to select the best individual, which represents the best ANN. The three bioinspired algorithms (basic PSO, SGPSO, and NMPSO) lead the evolutionary learning process until the best ANN is found, using one of the eight fitness functions proposed in this paper. It is important to remark that only pattern classification problems are solved by the proposed methodology.
The methodology is evaluated with three particle swarm algorithms and eight fitness functions, which involves an extensive behavioral study of each algorithm. Another point to review is the maximum number of neurons (MNN) used by the methodology to generate the ANN, which is directly related to the dimension of the individual. Because supervised learning is applied, the only information available to determine the size of the individuals for a specific problem is the dimension of the input and desired patterns; therefore, it was necessary to propose an equation that gives the MNN used to design the ANN. This equation is explained in the Individual section.
Figure 1 shows a diagram of the proposed methodology. During the training stage, it is necessary to define the individual and the fitness function used to evaluate each individual. The size of the individual depends on the dimensions of the input patterns and of the desired patterns. The individuals are evolved for a certain number of generations to obtain the best solution (the one with the minimum error). At the end of the learning process, the ANN is expected to provide acceptable accuracy during both the training and testing stages.

Individual.
When solving an optimization problem, the problem has to be described as a feasible model. After the model is defined, the next step is to design the individual that codifies the solution of the problem. Equation (15) shows an individual represented as a matrix that codifies the ANN design. This codification was previously described in [13–15]. Because the three ANN elements must be evolved at the same time, the matrix $W \in \mathbb{R}^{q \times (q+3)}$ is composed of three principal parts: first, the topology ($T$); second, the synaptic weights and bias (SW); and third, the transfer functions (TF), where $q$ is the maximum number of neurons (MNN), defined by $q = n + m + \frac{n + m}{2}$, $n$ is the dimension of the input pattern vector, and $m$ is the dimension of the desired pattern vector:
$$W = \begin{bmatrix} T_1 & w_{1,1} & \cdots & w_{1,q} & \theta_1 & \mathrm{TF}_1 \\ T_2 & w_{2,1} & \cdots & w_{2,q} & \theta_2 & \mathrm{TF}_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ T_q & w_{q,1} & \cdots & w_{q,q} & \theta_q & \mathrm{TF}_q \end{bmatrix}. \tag{15}$$
Because the matrix that represents the individual codifies three different types of information (topology, synaptic weights, and transfer functions), the exploration range of each type of information must be determined in its corresponding search space. For the topology, the range is set to $[1, 2^{\mathrm{MNN}} - 1]$, because the integer value in this part is decoded into a binary vector of MNN elements that indicates whether there is a connection between neuron $i$ and neuron $j$.
The synaptic weights and bias take values in the range $[-4, 4]$ or $[-2, 2]$, and the transfer functions take values in the range $[1, nF]$, where $nF$ is the total number of transfer functions.
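A random individual with the layout of (15) can be generated as sketched below. Two points are assumptions, not statements from the paper: the fractional part of (n + m)/2 is truncated when n + m is odd, and a single `bound` (2 or 4) is used for both weights and bias.

```python
import random


def mnn(n, m):
    """Maximum number of neurons, q = n + m + (n + m)/2.
    Assumption: the fractional part is truncated when n + m is odd."""
    return n + m + (n + m) // 2


def random_individual(n, m, n_tf, bound=4.0, rng=random):
    """A q x (q + 3) matrix whose rows are [topology | q weights | bias | transfer function]."""
    q = mnn(n, m)
    W = []
    for _ in range(q):
        topology = rng.randint(1, 2 ** q - 1)            # integer coding the connections
        weights = [rng.uniform(-bound, bound) for _ in range(q)]
        bias = rng.uniform(-bound, bound)
        tf = rng.randint(1, n_tf)                        # index into the nF transfer functions
        W.append([topology] + weights + [bias, tf])
    return W
```

For a problem with 4 input features and 3 classes, `mnn(4, 3)` gives 10 neurons, so each individual is a 10 x 13 matrix.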

Architecture and Synaptic Weights.
Once the individuals, or possible solutions, are obtained, the information in matrix $W$ must be decoded into an ANN for its evaluation. The first element to decode is the topology, which is then combined with the synaptic weights and transfer functions stored in the matrix.
This research is limited to feed-forward ANNs; for this reason, some rules were proposed to guarantee that no recurrent connections appear in the ANN (the only restriction imposed on the ANN). In future work, we will include recurrent connections and study the behavior of this type of ANN.
The architectures generated by the proposed methodology are composed of only three layers: input, hidden, and output. To generate valid architectures, the following three rules must be satisfied.
Let ILN be the set of neurons composing the input layer, HLN the set of neurons composing the hidden layer, and OLN the set of neurons composing the output layer. To decode the architecture according to these rules, the information in $W_{ij}$ with $i = 1, \ldots, \mathrm{MNN}$ and $j = 1$ (which is stored in decimal base) is decoded into the binary square matrix $Z$. This matrix represents a graph in which each component $z_{ij}$ indicates a link between neuron $i$ and neuron $j$ when $z_{ij} = 1$. For example, suppose that $W_{i1}$ holds the integer 57. It is transformed into the binary code "0111001", which is interpreted as the connections of the $i$th neuron to seven neurons (the number of bits). In this case, only neurons two, three, four, and seven (counting from left to right) are linked to neuron $i$.
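The decoding of the topology column, including the "57" example, can be sketched as follows. The function names are hypothetical; the three validity rules are not implemented because they are not reproduced in this section.

```python
def decode_topology(value, mnn):
    """Return the MNN connection bits of the integer stored in column 1,
    read from left to right (bit j = 1: neuron j is linked to this neuron)."""
    bits = format(int(value), "0{}b".format(mnn))
    # Keep only the MNN least-significant bits in case value >= 2**mnn.
    return [int(b) for b in bits[-mnn:]]


def linked_neurons(value, mnn):
    """1-based indices of the neurons linked to the current neuron."""
    return [j + 1 for j, bit in enumerate(decode_topology(value, mnn)) if bit]
```

With MNN = 7, `decode_topology(57, 7)` yields the bits 0111001 and `linked_neurons(57, 7)` returns neurons 2, 3, 4, and 7, as in the example above.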
The architecture is then evaluated with the corresponding synaptic weights stored in the components $W_{ij}$ with $i = 1, \ldots, \mathrm{MNN}$ and $j = 2, \ldots, \mathrm{MNN} + 1$. The bias of each neuron is encoded in the component $W_{ij}$ with $i = 1, \ldots, \mathrm{MNN}$ and $j = \mathrm{MNN} + 2$. Finally, each neuron computes its output with its corresponding transfer function, stored in the same row at $j = \mathrm{MNN} + 3$.
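Splitting one row of the individual into its parts, following the 1-based column layout described above, can be sketched as follows. Rounding the topology and transfer-function entries to integers is an assumption: a continuous PSO produces real values, and this section does not say how they are discretized.

```python
def split_row(row, mnn):
    """Split a row into (topology, weights, bias, transfer function).

    Column layout (1-based): 1 = topology, 2..MNN+1 = weights,
    MNN+2 = bias, MNN+3 = transfer function. Rounding is an assumption.
    """
    topology = int(round(row[0]))
    weights = row[1:mnn + 1]
    bias = row[mnn + 1]
    tf = int(round(row[mnn + 2]))
    return topology, weights, bias, tf


def incoming_weights(row, mnn):
    """Weights of the links that actually exist (0.0 where the topology bit is 0)."""
    topology, weights, _, _ = split_row(row, mnn)
    bits = format(topology, "0{}b".format(mnn))[-mnn:]
    return [w if b == "1" else 0.0 for w, b in zip(weights, bits)]
```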

ANN Output.
Once the information in the individual has been decoded, its efficiency must be measured with one of the fitness functions. To do this, the output of the designed ANN is calculated during the training and generalization stages. This output is calculated using Algorithm 2, where $o_i$ is the output of neuron $i$, $a_j$ is the input pattern that feeds the ANN, $n$ is the dimensionality of the input pattern, $m$ is the dimensionality of the desired pattern, and $y_i$ is the output of the ANN.
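Algorithm 2 itself is not reproduced in this section, so the following is only a hypothetical feed-forward evaluation of a decoded individual. It assumes that neurons 1..n form the input layer, that the last m neurons form the output layer, and that a neuron only uses incoming links from neurons with a smaller index, which is one simple way to rule out recurrent connections; the paper's own rules may differ.

```python
def forward(W, a, n, m, transfer):
    """Hypothetical feed-forward evaluation of one individual W (q x (q + 3)).

    Assumptions: neurons 0..n-1 (0-based) are inputs, the last m neurons are
    outputs, and neuron i only receives links from neurons j < i.
    `transfer` maps a transfer-function index to a callable.
    """
    q = len(W)
    out = [0.0] * q
    for i in range(q):
        if i < n:
            out[i] = a[i]                    # input neurons pass the pattern through
            continue
        row = W[i]
        bits = format(int(round(row[0])), "0{}b".format(q))[-q:]
        o = row[q + 1]                       # bias theta_i
        for j in range(i):                   # only earlier neurons feed neuron i
            if bits[j] == "1":
                o += row[1 + j] * out[j]     # Eq. (12) restricted to existing links
        out[i] = transfer[int(round(row[q + 2]))](o)
    return out[q - m:]                       # y: outputs of the last m neurons
```

For example, with one input, one hidden, and one output neuron, all using a linear transfer function, a hidden weight of 1.0 and an output weight of 3.0 map the input 2.0 to the output 6.0.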

Proposed Fitness Functions
Each individual is evaluated according to its fitness, and the best solution is selected based on the evaluation (performance) of each individual. In this work, we propose eight different fitness functions to design an ANN. It is important to remark that the fitness functions are used only during the training stage to evaluate each solution. After the ANN has been designed, we use a new metric that allows us to compare efficiently the results provided by the ANNs generated with the proposed methodology.

Mean Square Error.
The mean square error (MSE) represents the error between the ANN output and the desired patterns. In this case, the best individual is the one that generates the minimum MSE:
$$\mathrm{MSE} = \frac{1}{p\,m} \sum_{\xi=1}^{p} \sum_{j=1}^{m} \left(d_{\xi j} - y_{\xi j}\right)^2, \tag{16}$$
where $y_{\xi j}$ is the output of the ANN.

Classification Error.
The classification error (CER) is calculated as follows: the output of the ANN is transformed into a binary code by means of the winner-take-all technique. The binary chain must contain a single 1, with the rest composed of 0s; the position of the 1 indicates the class to which the input pattern belongs. This binary chain is compared against the desired pattern; if they are equal, the pattern was classified correctly.
In this case, the best ANN is the one that generates the minimum number of wrongly classified patterns. The CER is given by
$$\mathrm{CER} = 1 - \frac{\mathrm{npbc}}{\mathrm{tpc}}, \tag{17}$$
where npbc represents the number of correctly classified patterns and tpc is the total number of patterns to classify.
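Both error measures can be sketched for lists of desired patterns `D` and ANN outputs `Y`. The 1/(p * m) normalization of the MSE is an assumption (averaging over every pattern and output), and ties in the winner-take-all step are resolved in favour of the first position, which is also our choice.

```python
def mse(D, Y):
    """Mean of the squared differences over all p patterns and m outputs."""
    p, m = len(D), len(D[0])
    return sum((d - y) ** 2
               for dk, yk in zip(D, Y)
               for d, y in zip(dk, yk)) / (p * m)


def winner_take_all(y):
    """Binary chain with a single 1 at the position of the largest output."""
    k = max(range(len(y)), key=lambda j: y[j])
    return [1 if j == k else 0 for j in range(len(y))]


def cer(D, Y):
    """Eq. (17): CER = 1 - npbc / tpc."""
    npbc = sum(1 for dk, yk in zip(D, Y) if winner_take_all(yk) == list(dk))
    return 1.0 - npbc / len(D)
```

For two patterns with desired codes (1, 0) and (0, 1) and outputs (0.9, 0.2) and (0.6, 0.4), the second pattern is misclassified, so the CER is 0.5 while the MSE is about 0.1925.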

Validation Error.
When the ANN is trained for a long period, it can fit the training patterns too closely (overfitting). This is a disadvantage because, if the input data during the testing stage are contaminated with even a small amount of noise, the ANN will not be able to recognize new patterns.
For that reason, a validation phase is included to prevent overfitting and thus guarantee adequate generalization. Therefore, we designed fitness functions that integrate the assessment of both the training and validation stages.
Based on this idea, two fitness functions were generated: the first evaluates the mean square error on the training set, $\mathrm{MSE}_T$, and on the validation set, $\mathrm{MSE}_V$; see (18). The second takes into account the classification error on the training set, $\mathrm{CER}_T$, and on the validation set, $\mathrm{CER}_V$; see (19). To evaluate the fitness of each solution using (18) and (19), the MSE or CER is first computed on the training set and then on the validation set. It is important to notice that the error achieved on the validation set is weighted more heavily than the error obtained on the training set.
To favor smaller networks, we proposed the following factor, which measures the size of the ANN in terms of its number of connections:
$$\mathrm{RA} = \frac{\mathrm{NC}}{\mathrm{NMaxC}}, \tag{20}$$
where NC represents the number of connections of the ANN generated by the proposed methodology and NMaxC represents the maximum number of connections that an ANN can have, computed as in (21) from MNN, the maximum number of neurons. It is important to mention that fewer or more connections do not necessarily produce better performance; however, by using the factor RA, it is possible to weight other performance metrics of the ANN and find the ANN with fewer connections and acceptable performance.
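The two ingredients introduced above can be sketched as follows. Both details are assumptions: the validation weight of 0.6 mirrors the 0.4/0.6 split of the weighted recognition rate used later, since this section only says that the validation error receives the larger weight, and RA is taken as a plain ratio with NMaxC passed in because (21) is not reproduced here.

```python
def validation_fitness(err_train, err_val, w_val=0.6):
    """Weighted combination of training and validation errors, as in (18)-(19).

    Assumption: w_val = 0.6; the text only requires the validation error to
    be weighted more heavily than the training error.
    """
    if not 0.5 < w_val <= 1.0:
        raise ValueError("validation error must receive the larger weight")
    return (1.0 - w_val) * err_train + w_val * err_val


def reduction_factor(n_connections, n_max_connections):
    """RA = NC / NMaxC; NMaxC is supplied by the caller."""
    return n_connections / n_max_connections
```

The same `validation_fitness` function serves for both (18) and (19): pass the training and validation MSE for one, and the training and validation CER for the other.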
Accordingly, we proposed two new fitness functions: one in terms of the MSE, (22), and one in terms of the CER, (23). These fitness functions tend to the global minimum when both the factor RA and the error are small; however, when either term increases, the fitness function moves away from the global minimum.

Architecture Reduction and Validation Error with MSE and CER Errors.
Finally, two fitness functions, RA MSE and RA CER, were generated: the first simultaneously reduces the architecture, the validation error, and the MSE; see (24).
The second reduces the architecture, the validation error, and the CER; see (25).

Tuning the Parameters for PSO Algorithms
Ten classification problems of different complexity were selected to evaluate the accuracy of the methodology: Iris plant, wine, breast cancer, diabetes, and liver disorder datasets which were taken from the UCI machine learning benchmark repository [25]. The object recognition problem was taken from [26], and the spiral, synthetic 1, and synthetic 2 datasets were developed in our laboratory. The pattern dispersions of these datasets are shown in Figure 2. Table 1 shows the description for each classification problem.
Each dataset was randomly divided into three sets for training, testing, and validating the ANN as follows: 33% of the total patterns for the training stage, 33% for validation stage, and 34% for testing stage.
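The random 33/33/34 division can be sketched as follows. Truncating the 33% sizes to integers and giving the remainder to the testing set is our assumption; the text does not say how fractional sizes are handled.

```python
import random


def split_dataset(patterns, seed=0):
    """Randomly split patterns into about 33% training, 33% validation, and the rest for testing."""
    idx = list(range(len(patterns)))
    random.Random(seed).shuffle(idx)
    n_train = int(0.33 * len(idx))           # assumption: truncate fractional sizes
    n_val = int(0.33 * len(idx))
    train = [patterns[i] for i in idx[:n_train]]
    val = [patterns[i] for i in idx[n_train:n_train + n_val]]
    test = [patterns[i] for i in idx[n_train + n_val:]]
    return train, val, test
```

For a 150-pattern dataset such as Iris, this gives 49 training, 49 validation, and 52 testing patterns.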
After that, the best parameter values for each algorithm were found to obtain the best performance for each classification problem. Then, the best configuration for each algorithm was used to validate statistically the accuracy of the ANN.
To determine which parameters generate the best ANN in terms of accuracy, it is necessary to analyze training and testing performance. Although the accuracy of the ANN should be measured in terms of its testing performance, it is also important to consider the performance the ANN achieves during the training stage, in order to find the parameters that yield the best results in both stages. Instead of analyzing the training and testing performances separately, we propose a new metric that considers the accuracy of the ANN during both stages. This metric weights the testing performance more heavily to validate the accuracy of the proposal while, at the same time, giving confidence that the training stage was done with acceptable accuracy. The metric computes a weighted recognition rate (wrr):
$$\mathrm{wrr} = 0.4 \times \mathrm{Tr}_{rr} + 0.6 \times \mathrm{Te}_{rr}, \tag{26}$$
where $\mathrm{Tr}_{rr}$ represents the recognition rate obtained during the training stage and $\mathrm{Te}_{rr}$ the recognition rate obtained during the testing stage. From (26), the testing and training stages are weighted by factors of 0.6 and 0.4, respectively. These factors make it harder for a high training recognition rate to compensate for a low testing recognition rate and thus produce a high wrr value.
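Equation (26), together with the averaging over repeated experiments used during parameter tuning, can be sketched as follows; the helper names are ours.

```python
def wrr(train_rr, test_rr):
    """Eq. (26): weighted recognition rate (testing weighted 0.6, training 0.4)."""
    return 0.4 * train_rr + 0.6 * test_rr


def average_wrr(runs):
    """Average wrr over several runs, given (train_rr, test_rr) pairs."""
    return sum(wrr(tr, te) for tr, te in runs) / len(runs)
```

An ANN with 100% training and 70% testing recognition scores about 0.82, whereas one with 90% and 85% scores about 0.87, so the metric favors the network that generalizes better.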
The analysis to select the best parameter values of each algorithm took into account the ten classification problems described above. The parameters of each algorithm were varied over different ranges to evaluate the performance of the algorithms on different pattern recognition problems. To find the best configuration, several experiments were performed assigning different values to each parameter of the three bioinspired algorithms (original PSO, SGPSO, and NMPSO).
The parameters were divided into two types. Common parameters are shared by all algorithms, such as the number of generations, the number of individuals, the range of the variables, and the fitness function. Specific parameters are unique to each algorithm: for the basic PSO algorithm, they are the inertia weight and the first and second acceleration coefficients; the SGPSO algorithm has two specific parameters, the third acceleration coefficient and the geometric center; finally, the NMPSO algorithm has the crossover rate, the mutation rate, and a parameter that determines when each neighborhood should be updated.
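One possible way to keep common and specific parameters apart in code is sketched below in Python. The class and key names (`CommonParams`, `AlgorithmConfig`, `accel_1`, and so on) are hypothetical; the values are the tuned configurations reported in the Conclusions, and the population size is left unset because its best value is not stated in this section.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class CommonParams:
    """Parameters shared by basic PSO, SGPSO, and NMPSO."""
    fitness_function: str                   # e.g. "CER" or "MSE"
    search_range: float                     # variables lie in [-search_range, search_range]
    generations: int = 5000                 # generations used in the final 30-run experiments
    population_size: Optional[int] = None   # 50 or 100 in the tuning grid; best value not stated here


@dataclass(frozen=True)
class AlgorithmConfig:
    """Common parameters plus the parameters specific to one algorithm."""
    name: str
    common: CommonParams
    specific: Dict[str, float] = field(default_factory=dict)


# Tuned configurations as reported in the Conclusions (key names are hypothetical).
TUNED = [
    AlgorithmConfig("PSO", CommonParams("CER", 2.0),
                    {"inertia": 0.3, "accel_1": 1.0, "accel_2": 1.5}),
    AlgorithmConfig("SGPSO", CommonParams("CER", 2.0),
                    {"accel_3": 0.5, "geometric_center": 100}),
    AlgorithmConfig("NMPSO", CommonParams("CER", 4.0),
                    {"neighborhood_update": 200, "crossover_rate": 0.1, "mutation_rate": 0.1}),
]

for cfg in TUNED:
    lo, hi = -cfg.common.search_range, cfg.common.search_range
    print(f"{cfg.name}: fitness={cfg.common.fitness_function}, range=[{lo}, {hi}], {cfg.specific}")
```

Storing the specific parameters in a per-algorithm dictionary lets a single tuning loop handle all three algorithms even though each exposes a different set of knobs.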
For each parameter configuration and each problem, five experiments of 2000 generations each were performed. Once the ANNs were designed with the proposed methodology, the average weighted recognition rate (wrr) was computed.
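The tuning protocol (every combination of parameter values, five runs of 2000 generations each, ranked by average wrr) amounts to a grid search. The Python sketch below assumes a hypothetical `run_experiment(config, generations, seed)` callable standing in for one complete ANN design run that returns its wrr; here it is replaced by a deterministic dummy so the example runs. The grid uses the population sizes {50, 100} and search ranges {2, 4} described in this section, and lists only two of the eight fitness functions for brevity.

```python
import itertools
import statistics
from typing import Callable, Dict, Tuple

Config = Tuple[int, float, str]  # (population size, search range, fitness function)


def tune(run_experiment: Callable[[Config, int, int], float],
         grid: Dict[str, list], runs: int = 5, generations: int = 2000) -> Tuple[Config, float]:
    """Return the configuration with the highest average wrr over `runs` repetitions."""
    best_cfg, best_score = None, float("-inf")
    for cfg in itertools.product(grid["population"], grid["range"], grid["fitness"]):
        # One independent run per seed; average their wrr values.
        scores = [run_experiment(cfg, generations, seed) for seed in range(runs)]
        avg = statistics.mean(scores)
        if avg > best_score:
            best_cfg, best_score = cfg, avg
    return best_cfg, best_score


def dummy_experiment(cfg: Config, generations: int, seed: int) -> float:
    # Deterministic placeholder, NOT a model of the real methodology: it simply
    # favours CER, the smaller range, and the larger population, plus a per-seed offset.
    population, search_range, fitness = cfg
    score = 70.0 + (5.0 if fitness == "CER" else 0.0) + (2.0 if search_range == 2 else 0.0)
    return score + population / 100.0 + 0.1 * seed


grid = {"population": [50, 100], "range": [2, 4], "fitness": ["MSE", "CER"]}
best, score = tune(dummy_experiment, grid)
print(best, round(score, 2))
```

Passing the experiment as a callable keeps the search loop independent of which bioinspired algorithm is being tuned.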
Next, we describe the values tested for each parameter to obtain the best configuration of each bioinspired algorithm.
The common parameters for the three algorithms are represented as follows. For the population size, the set V = {50, 100} contains two values: 50 individuals and 100 individuals. For the search-space size, the set {2, 4} indicates that the variables range over [−2, 2] or [−4, 4], respectively. The type of fitness function used with the bioinspired algorithm is represented

Experimental Results
Once we determined the best configuration for each algorithm, we performed 30 runs for each pattern classification problem. The accuracy of the ANN generated by the methodology was measured in terms of the weighted recognition rate (26). The following subsections describe the results obtained for each database and each bioinspired algorithm: the evolution of the fitness function over 5000 generations, the weighted recognition rate, and some examples of the architectures generated with the methodology.

Results for Basic PSO Algorithm.
Figure 3 shows some of the ANNs generated with the PSO algorithm that provide the best results for each recognition problem. Figure 4(a) shows the evolution of the CER fitness function, where the tendency for each classification problem can be seen. These results were obtained with the best configuration of the basic PSO algorithm.
The evolution of the fitness function represents the average of the 30 experiments for each problem. For the glass, spiral, liver disorders, diabetes, and synthetic 2 problems, the value of the fitness function decreases only slightly over the generations. Smaller values of the fitness function were achieved with the Iris plant, breast cancer, and synthetic 1 problems. For the object recognition and wine problems, the value of the fitness function decreased as the generation limit was approached. The average weighted recognition rate for each problem is presented in Figure 4(b). For the glass problem, the ANN achieved the smallest average weighted recognition rate (52.67%), followed by the spiral (53.39%), liver disorders (68.74%), diabetes (76.90%), object recognition (80.22%), synthetic 2 (82.96%), and wine (86.49%) problems. The highest average weighted recognition rates were achieved for the synthetic 1 (95.03%), Iris plant (96.35%), and breast cancer (96.99%) problems. Table 2 presents the frequency at which each of the six transfer functions was selected for the ANN during the training stage. With the PSO algorithm, only a small subset of the functions was selected. For example, the sinusoidal function was selected most often for the spiral, synthetic 1, and synthetic 2 problems, whereas the Gaussian function was selected most often for the Iris plant, breast cancer, diabetes, liver disorders, object recognition, wine, and glass problems. Table 3 shows the maximum, minimum, standard deviation, and average number of connections used by the ANN. On average, the number of connections is low for the spiral, synthetic 1, and synthetic 2 problems.
For the glass and wine problems, 97.43 and 91.1 connections were used on average, respectively. Table 4 shows the maximum, minimum, standard deviation, and average number of neurons used in the ANN generated with the proposed method; for the ten classification problems, the ANN never used more than 13 neurons.

Results for SGPSO Algorithm.
Figure 5 shows some of the best ANNs generated with the SGPSO algorithm. Figure 5(c) shows an example of an ANN with an input neuron that has no connections. The absence of connections indicates that the corresponding input feature was not necessary to solve the problem; in other words, the proposed methodology also performed a dimensionality reduction of the input pattern. Figure 6(a) shows the evolution of the CER fitness function for each classification problem. These results were obtained with the best parameter configuration for the SGPSO algorithm. In general, the problems whose values come closest to the optimal solution are breast cancer, Iris plant, and synthetic 1, whereas the liver disorders, glass, and spiral problems show the highest errors.
The average weighted recognition rate for each problem is presented in Figure 6(b). For the glass problem, the proposed methodology achieved the smallest weighted recognition rate (54.31%), followed by the spiral (55.60%), liver disorders (69.19%), diabetes (76.09%), object recognition (80.45%), synthetic 2 (81.39%), wine (82.47%), and synthetic 1 (93.61%) problems. The second highest weighted recognition rate was achieved for the Iris plant problem (96.45%), and the highest for the breast cancer problem (97.03%). Table 5 presents the number of times that each transfer function was selected using the SGPSO algorithm. The sinusoidal function was the most selected in 9 of the 10 classification problems: spiral, synthetic 1, synthetic 2, Iris plant, diabetes, liver disorders, object recognition, wine, and glass. For the breast cancer problem, the sinusoidal function was selected almost as often as the Gaussian function.
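The six candidate transfer functions discussed in these results (sigmoid, hyperbolic tangent, sinusoidal, Gaussian, linear, and hard limit) can be written as in the Python sketch below. The exact forms used by the methodology (for example, the Gaussian width or the hard-limit threshold) are not given in this section, so the standard textbook definitions here are assumptions, as is the integer code a particle might store to select one function per neuron.

```python
import math
from typing import Callable, Dict

# Standard textbook forms; the parameterizations used in the paper may differ.
TRANSFER_FUNCTIONS: Dict[str, Callable[[float], float]] = {
    # Numerically stable logistic sigmoid (avoids overflow for large negative x).
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)) if x >= 0 else math.exp(x) / (1.0 + math.exp(x)),
    "tanh": math.tanh,
    "sinusoidal": math.sin,
    "gaussian": lambda x: math.exp(-x * x),
    "linear": lambda x: x,
    "hard_limit": lambda x: 1.0 if x >= 0.0 else 0.0,
}

# Hypothetical integer encoding: a particle could store one code per neuron.
CODES = list(TRANSFER_FUNCTIONS)


def neuron_output(net_input: float, code: int) -> float:
    """Apply the transfer function selected by `code` to a neuron's net input."""
    return TRANSFER_FUNCTIONS[CODES[code]](net_input)


for code, name in enumerate(CODES):
    print(f"{code} {name:>10}: f(0.5) = {neuron_output(0.5, code):.4f}")
```

Because every function maps a scalar net input to a scalar output, tallying how often each code appears in the best particles is all that is needed to build frequency tables such as Table 5.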
Furthermore, Table 6 shows the maximum, minimum, standard deviation, and average number of connections used by the ANN designed with the proposed methodology. For the ten classification problems, SGPSO generated more connections between neurons than the basic PSO algorithm. Table 7 shows the maximum, minimum, standard deviation, and average number of neurons required by the ANN designed with the SGPSO algorithm.

Results for NMPSO Algorithm.
Figure 7 shows some of the best ANNs generated with the NMPSO algorithm. The fitness function used with the NMPSO algorithm was CER.
The evolution of the fitness function for the 10 classification problems is shown in Figure 8(a), where the minimum values are reached with the synthetic 1, breast cancer, and Iris plant problems. For the wine problem, the value of the fitness function improves as the number of generations increases. The worst case was observed for the glass problem.
The weighted recognition rate for each problem is shown in Figure 8(b). The average weighted recognition rate was 54.06% for the glass problem, 62.97% for the spiral problem, 70.01% for liver disorders, 76.89% for diabetes, 85.73% for object recognition, and 86.30% for synthetic 2. The best recognition rates were achieved with the wine (88.62%), Iris plant (96.60%), breast cancer (97.11%), and synthetic 1 (97.42%) problems.
The number of times that each transfer function was selected using the NMPSO algorithm is given in Table 8. With the sinusoidal function, the ANNs provided better results for the spiral, synthetic 1, synthetic 2, and object recognition problems, whereas for the Iris plant, breast cancer, diabetes, liver disorders, wine, and glass problems the Gaussian function was the most selected.
In general, the transfer function most often selected by the NMPSO algorithm was the Gaussian function, followed by the sinusoidal function, the hyperbolic tangent, and the linear function, with the sigmoid and hard limit functions in last place. Table 9 shows the maximum, minimum, standard deviation, and average number of connections, and Table 10 shows the maximum, minimum, standard deviation, and average number of neurons used by the ANN generated with the NMPSO algorithm.
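Tables 3, 4, 6, 7, 9, and 10 all report the same four statistics over the runs of each problem. A Python sketch of that summary follows; the per-run connection counts are made-up placeholders rather than data from the experiments, and because this section does not say whether the sample or the population standard deviation was used, the choice of `statistics.stdev` (sample) is an assumption.

```python
import statistics
from typing import Dict, List


def summarize(counts: List[int]) -> Dict[str, float]:
    """Max, min, standard deviation, and mean of a per-run count (connections or neurons)."""
    return {
        "max": max(counts),
        "min": min(counts),
        "std": statistics.stdev(counts),  # sample standard deviation (assumption)
        "mean": statistics.mean(counts),
    }


# Placeholder connection counts for five runs of one problem (not the paper's data).
connections_per_run = [12, 15, 9, 14, 10]
print(summarize(connections_per_run))
```

Swapping `statistics.stdev` for `statistics.pstdev` gives the population version if that is what the tables report.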

General Discussion
Table 11 summarizes the average weighted recognition rate obtained with the three bioinspired algorithms.
For the spiral, synthetic 1, synthetic 2, Iris plant, breast cancer, liver disorders, object recognition, and wine problems, the NMPSO algorithm provided the best results. For the glass problem, the best accuracy was achieved with the SGPSO algorithm, and for the diabetes problem, the best performance was achieved with the basic PSO algorithm.
Table 11 shows that, averaged over the ten classification problems, the best algorithm in terms of the weighted recognition rate was NMPSO (81.57%), followed by basic PSO (78.97%) and SGPSO (78.65%).
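These averages can be cross-checked against the per-problem values reported in the results subsections. The Python sketch below recomputes each algorithm's mean and the best algorithm for every problem; because it works from the rounded percentages quoted in the text, its means agree with the reported ones only to within 0.01 percentage points.

```python
from statistics import mean

# Average weighted recognition rates (%) as reported in the text for each algorithm.
WRR = {
    "glass":              {"PSO": 52.67, "SGPSO": 54.31, "NMPSO": 54.06},
    "spiral":             {"PSO": 53.39, "SGPSO": 55.60, "NMPSO": 62.97},
    "liver disorders":    {"PSO": 68.74, "SGPSO": 69.19, "NMPSO": 70.01},
    "diabetes":           {"PSO": 76.90, "SGPSO": 76.09, "NMPSO": 76.89},
    "object recognition": {"PSO": 80.22, "SGPSO": 80.45, "NMPSO": 85.73},
    "synthetic 2":        {"PSO": 82.96, "SGPSO": 81.39, "NMPSO": 86.30},
    "wine":               {"PSO": 86.49, "SGPSO": 82.47, "NMPSO": 88.62},
    "synthetic 1":        {"PSO": 95.03, "SGPSO": 93.61, "NMPSO": 97.42},
    "Iris plant":         {"PSO": 96.35, "SGPSO": 96.45, "NMPSO": 96.60},
    "breast cancer":      {"PSO": 96.99, "SGPSO": 97.03, "NMPSO": 97.11},
}
ALGORITHMS = ["PSO", "SGPSO", "NMPSO"]

# Mean over the ten problems for each algorithm.
averages = {alg: mean(rates[alg] for rates in WRR.values()) for alg in ALGORITHMS}

# Algorithm with the highest wrr for each problem.
winners = {problem: max(rates, key=rates.get) for problem, rates in WRR.items()}

for alg, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{alg:>6}: {avg:.2f}%")
for problem, alg in winners.items():
    print(f"{problem:>18}: {alg}")
```

The per-problem winners show NMPSO ahead on eight problems, SGPSO on glass, and basic PSO on diabetes, where it leads NMPSO by only 0.01 percentage points.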
Moreover, these results were compared with those obtained with classic algorithms such as gradient descent and Levenberg-Marquardt. Because the classic techniques require a predefined architecture, two kinds of ANN were designed manually: the first with one hidden layer and the second with two hidden layers.
To determine the maximum number of neurons (MNN) used to generate the ANN, we followed the same rule proposed in the methodology. For the ANN with two hidden layers, a pyramidal distribution was used, where the first hidden layer has 60% of the total hidden neurons and the second hidden layer has the remaining 40%. Two stopping criteria were established for the gradient descent and Levenberg-Marquardt algorithms: training stopped when the algorithm reached 5000 epochs or an error of 0.000001. The patterns of each classification problem were divided into three subsets: 40% for training, 50% for generalization, and 10% for validation. The learning rate was set to 0.1.
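A minimal Python sketch of these two bookkeeping steps follows. MNN is passed in because the rule that produces it is defined elsewhere in the methodology; rounding the 60% share half up and drawing the 40/50/10 split from a seeded random permutation are our assumptions, since the text does not say how fractional neurons or pattern shuffling were handled.

```python
import random
from typing import List, Tuple


def pyramidal_split(mnn: int) -> Tuple[int, int]:
    """Split MNN hidden neurons into two layers with roughly a 60%/40% ratio."""
    if mnn < 2:
        raise ValueError("need at least two hidden neurons for two hidden layers")
    first = (6 * mnn + 5) // 10           # 60% of MNN, rounded half up (assumption)
    first = min(max(first, 1), mnn - 1)   # keep at least one neuron in each layer
    return first, mnn - first


def split_patterns(n_patterns: int, seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle pattern indices and split them 40% training, 50% generalization, 10% validation."""
    idx = list(range(n_patterns))
    random.Random(seed).shuffle(idx)
    n_train = n_patterns * 4 // 10        # integer arithmetic avoids float rounding surprises
    n_gen = n_patterns * 5 // 10
    return idx[:n_train], idx[n_train:n_train + n_gen], idx[n_train + n_gen:]


print(pyramidal_split(10))                # hidden neurons in layer 1 and layer 2
train, gen, val = split_patterns(150)
print(len(train), len(gen), len(val))
```

Any patterns left over by the integer division fall into the validation subset, so every pattern is used exactly once.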
Table 12 shows the average weighted recognition rate obtained with the classic training algorithms: one based on gradient descent (the backpropagation algorithm) and the other based on the Levenberg-Marquardt algorithm. From this set of experiments, we observed that the best algorithm was Levenberg-Marquardt with a single hidden layer, which achieved the best performance in eight of the ten problems (spiral, synthetic 1, synthetic 2, Iris plant, breast cancer, diabetes, liver disorders, and object recognition). For the wine problem, the best algorithm was gradient descent with a single hidden layer, and the glass problem was solved best using Levenberg-Marquardt with two hidden layers.
Considering Tables 11 and 12, the best techniques to design an ANN were the NMPSO algorithm, followed by Levenberg-Marquardt with one hidden layer. On the other hand, the basic PSO and SGPSO algorithms, as well as gradient descent and Levenberg-Marquardt with two hidden layers, did not perform as well.
Although Levenberg-Marquardt obtained better results than the PSO and SGPSO algorithms, some important points must be considered. The ANN designed with the proposed methodology includes the selection of the architecture, synaptic weights, bias, and transfer functions. With the classic techniques, the architecture must be carefully designed by hand by an expert to obtain the best results, which can be a time-consuming task. In contrast, the proposed methodology automatically designs the ANN from the input and desired patterns that encode the problem to be solved.

Conclusions
In this paper, we proposed three connection rules for generating feed-forward ANNs and guiding the connections between neurons. These rules allow connections among neurons from the input layer through to the output layer, and they also allow lateral connections among neurons of the same layer.
We also observed that some ANNs designed with the proposed methodology have input neurons without any connection. This means that the feature associated with such a neuron was not relevant to compute the output of the ANN; in effect, the methodology performs a dimensionality reduction of the input pattern.
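Detecting this effect is a simple check on the network's connectivity. The Python sketch below assumes, hypothetically, that an ANN is stored as a binary adjacency matrix in which `conn[i][j] == 1` means neuron `i` feeds neuron `j`, with the input neurons listed first; the representation actually used by the methodology may differ.

```python
from typing import List, Sequence


def unused_inputs(conn: Sequence[Sequence[int]], n_inputs: int) -> List[int]:
    """Indices of input neurons with no outgoing connection.

    conn[i][j] == 1 means neuron i sends its output to neuron j; neurons
    0 .. n_inputs-1 are the input neurons (hypothetical layout).
    """
    return [i for i in range(n_inputs) if not any(conn[i])]


# Hypothetical 5-neuron network: inputs 0-2, hidden neuron 3, output neuron 4.
conn = [
    [0, 0, 0, 1, 0],  # input 0 -> hidden 3
    [0, 0, 0, 0, 0],  # input 1 has no connections, so its feature is ignored
    [0, 0, 0, 1, 1],  # input 2 -> hidden 3 and output 4
    [0, 0, 0, 0, 1],  # hidden 3 -> output 4
    [0, 0, 0, 0, 0],  # output neuron
]
dropped = unused_inputs(conn, n_inputs=3)

# Drop the irrelevant features from an input pattern.
pattern = [5.1, 3.5, 1.4]
reduced = [x for i, x in enumerate(pattern) if i not in dropped]
print("features removed:", dropped, "reduced pattern:", reduced)
```

Since an unconnected input contributes nothing to any neuron's net input, removing its feature leaves the network's outputs unchanged.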
Eight fitness functions, which combine the MSE, the CER, the validation error, and architecture reduction (of connections and neurons), were implemented to evaluate each individual. From these experiments, we observed that the fitness functions that generated the ANN with the best weighted recognition rate were those that used the classification error (CER). The three bioinspired algorithms based on PSO were compared in terms of the average weighted recognition rate.
The NMPSO algorithm achieved the best performance, followed by the basic PSO and SGPSO algorithms.
To statistically validate the accuracy of the proposed methodology, the parameters of the three bioinspired algorithms were first selected. For basic PSO, the best fitness function was CER, with variables ranging over [−2, 2]. After tuning the parameters of each algorithm and choosing the best configuration, we observed that the parameter values differed from those proposed in the literature: for basic PSO, the inertia weight was set to 0.3 and the first and second acceleration coefficients to 1.0 and 1.5, respectively. For the SGPSO algorithm, the best fitness function was CER, with variables ranging over [−2, 2]; the third acceleration coefficient was set to 0.5 and the geometric center to 100. For the NMPSO algorithm, the best fitness function was CER, with variables ranging over [−4, 4]; the best configuration set the neighborhood-update parameter to 200, the crossover rate to 0.1, and the mutation rate to 0.1.
After tuning the parameters of the three algorithms, 30 runs were performed for each of the ten classification problems. In general, the synthetic 1, Iris plant, and object recognition problems reached a weighted recognition rate of 100%, whereas the glass and spiral problems showed lower performance.
The transfer functions most often selected were the Gaussian function for the basic PSO algorithm, the sinusoidal function for the SGPSO algorithm, and the Gaussian function for the NMPSO algorithm.
In general, the ANNs designed with the proposed methodology were very promising. The methodology automatically designs the ANN by determining the set of connections and the number of neurons in the hidden layers, adjusting the synaptic weights, and selecting the bias and transfer function of each neuron.