A Novel Multimean Particle Swarm Optimization Algorithm for Nonlinear Continuous Optimization: Application to Feed-Forward Neural Network Training

Multilayer feed-forward artificial neural networks are one of the most frequently used data mining methods for classification, recognition, and prediction problems. The classification accuracy of a multilayer feed-forward artificial neural network depends on how well it is trained. A well-trained multilayer feed-forward artificial neural network with optimum weights can correctly predict the class value of an unseen sample. Determining the optimum weights is a nonlinear continuous optimization problem that can be solved with metaheuristic algorithms. In this paper, we propose a novel multimean particle swarm optimization algorithm for training multilayer feed-forward artificial neural networks. The proposed multimean particle swarm optimization algorithm uses multiple swarms to search the solution space more efficiently, and it finds better solutions than particle swarm optimization. To evaluate its performance, experiments are conducted on ten benchmark datasets from the UCI repository. The obtained results are compared with those of particle swarm optimization and of previous research in the literature. The analysis of the results demonstrates that the proposed multimean particle swarm optimization algorithm performs well and can be adopted as a novel algorithm for training multilayer feed-forward artificial neural networks.


Introduction
Artificial neural networks (ANNs) are a vital component of artificial intelligence. Machine learning and cognitive sciences depend on ANNs to model various complex nonlinear mapping relationships [1,2]. ANNs can provide solutions to problems involving classification, prediction, optimization, and identification in various disciplines [3]. In general, ANNs are trained using the backpropagation (BP) algorithm, which determines the weights of ANNs by computing explicit gradients of an error measure such as the sum square error (SSE) [4]. However, ANNs trained with gradient descent-based learning algorithms generally converge slowly and can become trapped in local minima [5]. To overcome these problems, metaheuristic algorithms can be used for ANNs training. Determining the optimum weights of ANNs is a nonlinear optimization problem, and metaheuristic algorithms can solve this problem.
For instance, Slowik used an adaptive differential evolution algorithm with multiple trial vectors for ANNs training and increased the efficiency of data classification compared with evolutionary algorithms and BP [6]. Mohaghegi et al. trained a radial basis function neural network with BP and particle swarm optimization (PSO) algorithms for the identification of a power system. They analyzed the experimental results of these two methods in terms of convergence speed and robustness. They found that PSO has several advantages in terms of both robustness and finding the optimal weights of ANNs [7]. Montana and Davis utilized an improved genetic algorithm (GA) for training feed-forward ANNs. The experimental results showed that the improved GA enhanced the performance of feed-forward ANNs compared with the BP algorithm. In addition, the improved GA incorporated more domain-specific knowledge into ANNs [8]. Malinak and Jaska implemented evolutionary, gradient, and combined techniques for tuning the weights of ANNs. They partitioned the ANNs into two parts, the output layer and the other layers, and trained these two parts with different techniques. According to their experimental results, the combination of evolution strategies and the least mean square algorithm showed promise [9].
Carvalho and Ludermir applied the PSO algorithm to the optimization of ANNs architectures and weights. Their aim was better generalization performance, achieved through a compromise between low architectural complexity and low training errors. They used medical benchmarks to evaluate the performance of the proposed method and compared the experimental results with those of evolutionary programming and GA. Their proposed method achieved a better classification error percentage than the other algorithms [10]. Apart from PSO, researchers have also applied other swarm intelligence algorithms, such as ant colony optimization (ACO), to ANNs training. Krzysztof et al. proposed an ACO variant for continuous optimization and chose the training of feed-forward ANNs for pattern recognition as a test case. In addition, they hybridized the ACO algorithm with classical gradient techniques such as the BP algorithm. The proposed algorithms were evaluated on classification problems from the medical field and compared with the basic GA, and they performed better on the medical datasets [11]. Liang Hu et al. proposed GA-optimized ANNs (GANNs) to solve a real-life problem, the multipath ultrasonic gas flowmeter. With the GANNs, they decreased the error rate of the multipath ultrasonic flowmeter when detecting the flow rate of a complicated flow field. The GANNs produced better outcomes than ANNs and made the implementation of ANNs faster and easier [12].
Researchers have also used the bat algorithm (BA) to train ANNs for different real-life problems. The BA relies on the current optimal solution when adjusting velocities [13]. Ahmad et al. increased the training accuracy of a feed-forward multilayer perceptron (MLP) network with cuckoo search (CS). They tested the proposed algorithm on four benchmark classification problems. They also compared the obtained results with well-known metaheuristics such as PSO and guaranteed convergence particle swarm optimization (GCPSO). According to the experimental results, CS performed better than PSO and GCPSO on all benchmark problems [14]. Bolaji et al. used the fireworks optimization algorithm (FWA) for ANN training and performed the experimental tests with UCI datasets. The experimental results were compared with those obtained from the krill herd algorithm (KHA) [15], the harmony search algorithm (HSA), and GA [16]. According to the experimental results, the FWA showed better classification performance [17]. Kider et al. used the speed-constrained multiobjective particle swarm optimization (SMPSO) and NSGA-II algorithms to determine the optimal input values (head type, feed rate, rotation speed, and chip depth). These inputs minimize the force and surface roughness outputs predicted by ANNs. The experimental results showed that the SMPSO algorithm obtained better values than the NSGA-II algorithm [18].
In this study, we propose a multimean particle swarm optimization (MMPSO) algorithm, a novel PSO-based algorithm for solving continuous optimization problems. To assess its performance, the proposed MMPSO algorithm is applied to multilayer feed-forward artificial neural network (MLFNN) training and compared with PSO and with other algorithms used in previous research. Analysis of the experimental results shows that the proposed MMPSO algorithm improved the classification accuracy of MLFNNs and generally performed better than the other algorithms.
The paper is organized as follows. In Section 2, MLFNNs are explained in detail. The metaheuristic approaches used, PSO and the proposed MMPSO, are described in Section 3. The application of metaheuristic algorithms to MLFNNs training is given in Section 4. In Section 5, the computational results are given, and the paper ends with conclusions and future work.

Multilayer Feed-Forward Neural Networks
MLFNNs are computational systems modeled on the functions of the human brain. They consist of artificial neural cells linked to each other in various forms and are usually organized in layers. They can be implemented as hardware in electronic circuits or as software on computers. Much like the brain processes information, an MLFNN can store and generalize information after a learning process [19]. Some successful networks can be created with a single layer, but most applications require networks that contain at least three layers: an input layer, a hidden layer, and an output layer. A network consisting of a single layer can only represent linear functions. MLFNNs overcome this limitation of single-layer systems with hidden layers located between the input and output layers [19]. Basically, all MLFNNs have a structure similar to that shown in Figure 1. In this structure, the neurons in the input layer receive the inputs, the neurons in the output layer carry the outputs, and all neurons in the hidden layers help the system learn [20].
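Initialize all particles of the swarm with randomly generated position and velocity
Repeat
    For each particle in the swarm
        Calculate the fitness function
        Update the local best position of the particle
        Update the global best position of the swarm
    End for
    For each particle in the swarm
        Update the velocity and the position of the particle according to equations (4) and (5)
    End for
Until (Stopping criteria met)
Algorithm 1: The pseudocode of the PSO algorithm.
When we look at Figure 1, $x_1, x_2, x_3, \ldots, x_p$ represent the nodes in the input layer, $h_1, h_2, h_3, \ldots, h_q$ represent the nodes in the hidden layer, and $y_1, y_2, y_3, \ldots, y_r$ represent the nodes in the output layer. The weights $w_{1,1}, w_{1,2}, w_{1,3}, \ldots$ show the effect of the information received by a node. To calculate the output of the MLFNN, the outputs of the nodes in the hidden layers must first be calculated. This uses the addition function (NET) and the activation function (FNET), shown in (1) and (2), respectively [21].
$$NET_j = \sum_{i=1}^{m} w_{ij} x_i \tag{1}$$
$$FNET_j = \frac{1}{1 + e^{-NET_j}} \tag{2}$$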
where m is the number of nodes connected to node j, $x_i$ is the output value of the ith node, and $w_{ij}$ is the weight of the connection between the ith node and the jth node. The output value of the activation function is the output value of the node. This value can either be given to the outside world as the output of the MLFNN or be used as an input to another network [22,23]. The weights of the MLFNN are updated according to the error between the output obtained from the MLFNN during training and the target output. This error is measured as the SSE and is calculated according to (3). The SSE is also used as the fitness function of the metaheuristic algorithms.
$$SSE = \sum_{i=1}^{n} (t_i - o_i)^2 \tag{3}$$
where n represents the number of samples in a dataset, $o_i$ is the output generated from the ith input, and $t_i$ is the target output of the ith input.
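To trace (1) to (3) numerically, the following minimal Python sketch makes three assumptions. NET is a plain weighted sum with no bias. FNET is the logistic sigmoid, the activation that Section 5 states is used. SSE is the sum of squared errors over samples and output nodes. It is an illustration under these assumptions, not the authors' C# implementation, and the function names `net`, `fnet`, and `sse` are hypothetical.

```python
import math

def net(inputs, weights):
    """Equation (1): weighted sum of the m values x_i feeding node j."""
    return sum(w * x for w, x in zip(weights, inputs))

def fnet(value):
    """Equation (2), assumed to be the logistic sigmoid."""
    value = max(-500.0, min(500.0, value))  # clip so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-value))

def sse(outputs, targets):
    """Equation (3): squared errors summed over samples and output nodes."""
    return sum((t - o) ** 2
               for out, tgt in zip(outputs, targets)
               for o, t in zip(out, tgt))

print(fnet(net([1.0, 0.5], [0.2, -0.4])))  # 0.5
print(sse([[0.5, 0.5]], [[1.0, 0.0]]))      # 0.5
```

For example, a node with inputs 1.0 and 0.5 and weights 0.2 and -0.4 has NET = 0, so its sigmoid output is exactly 0.5.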

Metaheuristic Algorithms
Metaheuristic algorithms are used in many areas to solve different problems such as optimization, scheduling, training of ANNs, fuzzy logic systems, and image processing [24]. In this study, we used two metaheuristic algorithms to determine the optimum weights of MLFNNs: the original PSO and MMPSO, a novel PSO-based algorithm.
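In PSO, the velocity and the position of each particle are updated according to (4) and (5), respectively:
$$v_{id}^{t+1} = \omega v_{id}^{t} + c_1 r_1 \left(pbest_{id} - x_{id}^{t}\right) + c_2 r_2 \left(gbest_{d} - x_{id}^{t}\right) \tag{4}$$
$$x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1} \tag{5}$$
where t is the number of iterations, $\omega$ is the inertia weight, i is the index of a particle in the swarm, and d is the dimension of the problem. Each particle has a position and a velocity in each dimension. $x_{id}^{t}$ is the position of the ith particle in dimension d, and $v_{id}^{t}$ is the previous velocity of the ith particle in dimension d. $pbest_{id}$ is the position with the best fitness value found so far by the ith particle, and $gbest_{d}$ is the position with the best fitness value found so far by all particles. $c_1$ and $c_2$ are two positive constants that represent the acceleration factors, and $r_1$ and $r_2$ are two random numbers in the range [0, 1]. In (5), $x_{id}^{t}$ is the old position and $x_{id}^{t+1}$ is the new position of the ith particle in the swarm. The pseudocode of the PSO algorithm is given in Algorithm 1 [28].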
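The steps of Algorithm 1 and the updates (4) and (5) can be sketched in Python as below. This minimal sketch rests on several assumptions:

- The inertia weight is not given in this text, so ω = 0.729 is assumed, a value commonly paired with c1 = c2 = 1.49.
- Initial velocities are drawn from [-1, 1].
- A sphere function stands in for the SSE fitness used for MLFNN training.

The names `pso` and `sphere` are hypothetical.

```python
import random

def sphere(x):
    """Stand-in fitness; MLFNN training would use the training SSE instead."""
    return sum(v * v for v in x)

def pso(fitness, dim, n_particles=20, iters=100, w=0.729, c1=1.49, c2=1.49,
        lo=-10.0, hi=10.0, seed=0):
    """Minimize `fitness` following the order of steps in Algorithm 1 (w is the inertia weight)."""
    rng = random.Random(seed)
    # Random initial positions in [lo, hi]; random initial velocities in [-1, 1] (assumed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [float("inf")] * n_particles
    gbest, gbest_f = pos[0][:], float("inf")
    for _ in range(iters):  # stopping criterion: maximum number of iterations
        # First loop: evaluate fitness, update the local (pbest) and global (gbest) bests
        for i in range(n_particles):
            f = fitness(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
            if f < gbest_f:
                gbest, gbest_f = pos[i][:], f
        # Second loop: velocity update (4) followed by position update (5)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
    return gbest, gbest_f

best, best_f = pso(sphere, dim=5, iters=200)
print(best_f)
```

Replacing `sphere` with a function that decodes a position into network weights and returns the training SSE would turn this loop into the training procedure described in Section 4.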

Multimean Particle Swarm Optimization Algorithm.
Multiswarm optimization (MSO) is a technique used to find the optimal solution of nonlinear continuous optimization problems. The effectiveness and efficiency of many metaheuristic algorithms deteriorate as the dimensionality of the problem increases [29]. To overcome this problem, we propose the MMPSO algorithm for obtaining optimal solutions in a short time. The MMPSO algorithm is a multipopulation-based metaheuristic optimization algorithm developed from the PSO algorithm. In PSO, the velocity of each particle is updated with the parameters pbest and gbest according to (4) [26,27]. The proposed MMPSO algorithm has multiple swarms, and the velocity of each particle in each swarm is updated according to (6). This equation contains two parameters that differ from (4): the mean pbest of all particles of that swarm (mpbest) and the best solution of all swarms (sbest). This modification brings two advantages to PSO. Firstly, using mpbest keeps the particles from leaving the search space and reinforces the local search of each particle. Secondly, each particle takes into account not only the gbest of its own swarm but also the sbest of all swarms, so the MMPSO algorithm approaches the optimum solution faster.
where t is the number of iterations, $\omega$ is the inertia weight, i is the index of a particle in a swarm, and d is the dimension of the problem. Each particle has a position and a velocity in each dimension. $x_{id}^{t}$ is the position of the ith particle in dimension d, and $v_{id}^{t}$ is the previous velocity of the ith particle in dimension d. $mpbest_{d}$ is the mean of the pbest positions of all particles in a swarm, $gbest_{d}$ is the best position of a swarm, and $sbest_{d}$ is the best position of all swarms. $c_1$ and $c_2$ are two positive constants representing the acceleration factors, and $r_1$ and $r_2$ are two random numbers in the range [0, 1]. The pseudocode of the proposed MMPSO algorithm is given in Algorithm 2.
Initialize all particles of all swarms with randomly generated position and velocity
. . .
        For each particle in the swarm
            Update the velocity and the position of the particle according to equations (6) and (5)
        End for
    End for
Until (Stopping criteria met)
Algorithm 2: The pseudocode of the proposed MMPSO algorithm.
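Equation (6) itself is not reproduced in this text, so any MMPSO sketch has to assume its form. Based only on the description above, this Python sketch makes the following assumptions:

- The cognitive term pulls each particle toward mpbest, the component-wise mean of its swarm's pbest positions.
- The social term pulls each particle toward the midpoint of its swarm's gbest and the global sbest. This keeps to the two coefficients c1 and c2 and the two random numbers r1 and r2 listed in the definitions.
- All swarms are evaluated before any particle moves, because the middle steps of Algorithm 2 are missing here.
- The inertia weight of 0.729 and the velocity initialization are chosen for illustration.

The midpoint rule is a hypothetical choice for illustration, not the authors' published equation. Three swarms of 20 particles match the settings reported in Section 5. The name `mmpso` is hypothetical.

```python
import random

def sphere(x):
    """Stand-in fitness; MLFNN training would use the training SSE instead."""
    return sum(v * v for v in x)

def mmpso(fitness, dim, n_swarms=3, n_particles=20, iters=100, w=0.729,
          c1=1.49, c2=1.49, lo=-10.0, hi=10.0, seed=0):
    rng = random.Random(seed)
    swarms = []
    for _ in range(n_swarms):
        pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
        swarms.append({
            "pos": pos,
            "vel": [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_particles)],
            "pbest": [p[:] for p in pos],
            "pbest_f": [float("inf")] * n_particles,
            "gbest": pos[0][:],
            "gbest_f": float("inf"),
        })
    sbest, sbest_f = swarms[0]["pos"][0][:], float("inf")
    for _ in range(iters):  # stopping criterion: maximum number of iterations
        # Evaluate all particles; update pbest, each swarm's gbest, and sbest over all swarms
        for s in swarms:
            for i, x in enumerate(s["pos"]):
                f = fitness(x)
                if f < s["pbest_f"][i]:
                    s["pbest"][i], s["pbest_f"][i] = x[:], f
                if f < s["gbest_f"]:
                    s["gbest"], s["gbest_f"] = x[:], f
                if f < sbest_f:
                    sbest, sbest_f = x[:], f
        # Move every particle of every swarm
        for s in swarms:
            # mpbest: component-wise mean of this swarm's pbest positions
            mpbest = [sum(p[d] for p in s["pbest"]) / n_particles for d in range(dim)]
            for x, v in zip(s["pos"], s["vel"]):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # ASSUMED form of (6): social target is the midpoint of gbest and sbest
                    social = 0.5 * (s["gbest"][d] + sbest[d])
                    v[d] = (w * v[d]
                            + c1 * r1 * (mpbest[d] - x[d])
                            + c2 * r2 * (social - x[d]))
                    x[d] += v[d]  # position update (5)
    return sbest, sbest_f

best, best_f = mmpso(sphere, dim=5, iters=300)
print(best_f)
```

Because each swarm's loop body is independent once sbest is fixed for the iteration, the per-swarm work is a natural unit for the parallel implementation mentioned in Section 5.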

Application of Metaheuristic Algorithms to MLFNNs Training
Determining the optimal weights of MLFNNs is a nonlinear optimization problem, so metaheuristic algorithms can be used for MLFNNs training. The application of metaheuristic algorithms to MLFNNs training is explained step by step below.
Step 1 (preprocessing of the dataset). Min-max normalization, a data preprocessing technique, is applied to the dataset to be classified. This makes the dataset more regular and more suitable for MLFNNs. The min-max normalization function is shown in (7) [30].
Step 2 (organization of the dataset for classification). In this study, the datasets are organized in two different ways for two different experiments. In the first experiment, 5-fold cross-validation is used to compare the proposed MMPSO algorithm with the PSO algorithm. In the second experiment, an 80% training and 20% testing split is used to compare the proposed MMPSO algorithm with previous research in the literature.
Step 3 (modeling the structure of the MLFNNs). The number of inputs and the number of outputs are determined by the characteristics of the dataset. The number of inputs is equal to the number of attributes of the dataset.
Similarly, the number of outputs is equal to the number of classes of the dataset. The number of hidden layers is set to one for all problems. The number of nodes in the hidden layer is determined with GA, as reported in a previous study by the authors [31].
Step 4 (determining the optimum weights of the MLFNNs with metaheuristic algorithms). A well-trained MLFNN should have optimum weights, and determining the optimum weights is a nonlinear optimization problem. Metaheuristic algorithms are suited to this problem because they need only fitness evaluations, not gradients. Generally, metaheuristic algorithms start from a random population. The fitness of each individual in the population is the SSE of the MLFNN whose weights the individual encodes. The goal of the metaheuristic algorithms is to minimize the SSE. To this end, they search the problem space locally and globally and update the global best solution. The metaheuristic algorithms run until a stopping criterion is met, such as a maximum number of iterations or a target error rate.
Step 5 (testing the MLFNNs). To determine the performance of the MLFNNs trained with metaheuristic algorithms, the classification accuracy is calculated according to (8) for each test dataset.
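Steps 1, 4, and 5 can be illustrated with the following Python sketch. Several details are assumptions because (7) and (8) are not reproduced in this text:

- Min-max normalization rescales each attribute to [0, 1].
- A particle's position stores, node by node, the incoming weights followed by one bias for every hidden and output node. This layout is hypothetical.
- A test sample counts as correctly classified when the output node with the highest activation matches the position of the 1 in its one-hot target.

All function names are hypothetical.

```python
import math

def min_max_normalize(rows):
    """Step 1: rescale each attribute column to [0, 1] (assumed target range)."""
    cols = list(zip(*rows))
    mins = [min(c) for c in cols]
    maxs = [max(c) for c in cols]
    # a constant column would divide by zero, so it is mapped to 0.0
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, mins, maxs)]
            for row in rows]

def sigmoid(net):
    net = max(-500.0, min(500.0, net))  # clip so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-net))

def decode(vector, n_in, n_hidden, n_out):
    """Step 4: split a flat particle position into rows [weights..., bias], one per node."""
    expected = (n_in + 1) * n_hidden + (n_hidden + 1) * n_out
    if len(vector) != expected:
        raise ValueError(f"expected {expected} values, got {len(vector)}")
    split = (n_in + 1) * n_hidden
    hidden = [vector[k:k + n_in + 1] for k in range(0, split, n_in + 1)]
    output = [vector[k:k + n_hidden + 1] for k in range(split, expected, n_hidden + 1)]
    return hidden, output

def layer(inputs, rows):
    # weighted sum of the inputs plus the node's bias (last entry), then sigmoid
    return [sigmoid(sum(w * x for w, x in zip(row[:-1], inputs)) + row[-1]) for row in rows]

def predict(vector, x, n_in, n_hidden, n_out):
    hidden_rows, out_rows = decode(vector, n_in, n_hidden, n_out)
    return layer(layer(x, hidden_rows), out_rows)

def sse_fitness(vector, samples, targets, n_in, n_hidden, n_out):
    """Step 4: fitness of a particle = SSE of the network it encodes (lower is better)."""
    return sum((t - y) ** 2
               for x, tgt in zip(samples, targets)
               for t, y in zip(tgt, predict(vector, x, n_in, n_hidden, n_out)))

def accuracy(vector, samples, targets, n_in, n_hidden, n_out):
    """Step 5: percentage of samples whose strongest output matches the one-hot target."""
    correct = 0
    for x, tgt in zip(samples, targets):
        y = predict(vector, x, n_in, n_hidden, n_out)
        if max(range(n_out), key=y.__getitem__) == max(range(n_out), key=tgt.__getitem__):
            correct += 1
    return 100.0 * correct / len(samples)
```

In this arrangement, `sse_fitness` is the function a PSO or MMPSO loop would minimize over particle positions, and `accuracy` is evaluated once on the test data with the best position found.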

Experimental Results
The proposed MMPSO algorithm and the PSO algorithm were applied to MLFNNs training using C# in Microsoft Visual Studio Ultimate 2013. All experiments were carried out on a computer with an Intel Core i7-3840QM @ 2.00 GHz processor and 8 GB of memory running Microsoft Windows 8. Ten benchmark datasets from the UCI repository [32] are used to evaluate the performance of the metaheuristics, and the characteristics of these datasets are shown in Table 1.
In general, the structure of an MLFNN is represented as I-H-O, where I is the number of nodes in the input layer, H is the number of nodes in the hidden layer, and O is the number of nodes in the output layer. The number of weights of the MLFNN, including biases, is calculated using (9). This number is also the dimension of the optimization problem [31].
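Equation (9) is not reproduced here. Assuming one bias per hidden node and per output node, the number of weights of an I-H-O network, and hence the dimension of each particle, is (I + 1)H + (H + 1)O. The sketch below combines this count with Step 3, which sets I from the number of attributes and O from the number of classes. The function names are hypothetical, and the hidden-layer size of 5 in the example is an arbitrary illustrative value, not the one reported in Table 2.

```python
def structure(n_attributes, n_classes, n_hidden):
    """Step 3: I = number of attributes, O = number of classes; H is chosen separately."""
    return n_attributes, n_hidden, n_classes

def n_weights(i, h, o):
    """Assumed form of (9): connection weights plus one bias per hidden and output node."""
    return (i + 1) * h + (h + 1) * o

# Example: 4 attributes and 3 classes with a hypothetical H = 5 gives a 4-5-3 network
print(n_weights(*structure(4, 3, 5)))  # 43
```

A dataset with 4 attributes and 3 classes, such as iris, therefore gives a 43-dimensional search space under this hypothetical hidden-layer size.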
Furthermore, determining the optimum structure of the MLFNNs is itself an optimization problem, and the classification accuracy of the MLFNNs is directly affected by it (Ibrahim, Jihad, and Kamal, 2017). The structures of the MLFNNs determined for the ten benchmark datasets are shown in Table 2.
For both the proposed MMPSO algorithm and the PSO algorithm, the acceleration constants $c_1$ and $c_2$ are set to 1.49 [33], the random numbers $r_1$ and $r_2$ are generated in the range [0, 1], and the number of particles in a swarm is set to 20. For the proposed MMPSO algorithm, the number of swarms is set to three. For the initial population of both algorithms, the weights of the MLFNNs are generated as random numbers in the range [−10, 10]. All these experimental parameters were determined empirically. The sigmoid activation function is used in the hidden and output layers of the MLFNNs for training and testing. The maximum number of iterations is used as the stopping criterion. The classification process uses 5-fold cross-validation, which reports the mean of the SSEs obtained on five different testing subsets. The results of the 5-fold cross-validation experiment for the proposed MMPSO algorithm and the PSO algorithm are shown in Table 3.
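The 5-fold cross-validation protocol described above, in which the mean of the five test-fold SSEs is reported, can be sketched in Python as follows. The shuffling seed and the round-robin fold assignment are assumptions for illustration. So is the hypothetical `evaluate` callback, which would train on the training indices and return the test SSE.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Shuffle sample indices and deal them round-robin into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]  # fold sizes differ by at most one

def cross_validate(n, evaluate, k=5, seed=0):
    """Mean of evaluate(train_idx, test_idx) over the k folds, e.g. the mean test SSE."""
    folds = k_fold_indices(n, k, seed)
    scores = []
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        scores.append(evaluate(train, test))
    return sum(scores) / k

print(cross_validate(10, lambda train, test: len(test)))  # 2.0
```

Each sample appears in exactly one test fold, so every sample contributes once to the averaged test SSE.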
The results of the 5-fold cross-validation experiment in Table 3 show that the MLFNNs trained with the proposed MMPSO algorithm obtained better SSE, training classification accuracy (CA), and testing CA results than those trained with the PSO algorithm on all datasets. As a result, these experimental … Additional advantages of the proposed MMPSO algorithm are that it searches the global space more efficiently and converges to the optimum results more rapidly. If these advantages hold, the proposed MMPSO algorithm should minimize the fitness function SSE more rapidly than the PSO algorithm during training. Figure 2 shows the minimization of the SSE over the training iterations for the lymphography, ionosphere, glass, and diabetes datasets. For the lymphography, ionosphere, and glass datasets, the proposed MMPSO algorithm minimized the SSE better from the first iteration to the last. For the diabetes dataset, the proposed MMPSO algorithm searches the global space more efficiently after the 90th iteration. Furthermore, the proposed MMPSO algorithm provides a better SSE at the initial iteration for all datasets.
Furthermore, to analyze the computational cost of the proposed MMPSO and PSO algorithms, the CPU running time of each algorithm was measured in seconds with the Microsoft Process Explorer utility. The results are given in Table 4.
As shown in Table 4, the running time of the proposed MMPSO algorithm is shorter than that of PSO for all datasets. In addition, the proposed MMPSO algorithm is suitable for parallel implementation, so its runtime could be reduced further with parallel programming.
Finally, the performance of the proposed MMPSO algorithm is compared with the performance reported in the literature for the HSA [16], KHA, GA [15], and the fireworks algorithm (FWA) [17] on six datasets. These studies split the data into 80% training and 20% testing. To make the comparison under the same conditions, the six datasets are also split into 80% training and 20% testing for this experiment. The proposed MMPSO algorithm is executed ten times, and the best results are selected. It uses the same parameters as in the previous experiment. The results of this experiment for the MMPSO algorithm, together with the literature reports for the six datasets, are shown in Table 5.
For the iris, diabetes, and thyroid datasets, the proposed MMPSO algorithm yields better SSE, training CA, and testing CA values than the other four metaheuristics. For the ionosphere and breast cancer datasets, it obtained better SSE and training CA results than the other metaheuristic algorithms but did not obtain the best testing CA result. For the glass dataset, it obtained the best result only for the training CA. In summary, the comparison in Table 5 shows that the proposed MMPSO algorithm generally achieved better classification results than the other algorithms.
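The protocol of the second experiment, an 80%/20% split followed by ten independent runs with the best run kept, might be sketched in Python as below. The text does not say how the best run is chosen. This sketch assumes the run with the lowest score is kept, for example the training SSE. `run` is a hypothetical callback that trains with a given seed and returns (score, result).

```python
import random

def holdout_split(n, train_fraction=0.8, seed=0):
    """Shuffle sample indices and cut them into training and testing parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_fraction * n))
    return idx[:cut], idx[cut:]

def best_of_runs(run, n_runs=10):
    """Call run(seed) for seeds 0..n_runs-1 and keep the (score, result) with the lowest score."""
    results = [run(seed) for seed in range(n_runs)]
    return min(results, key=lambda r: r[0])

train, test = holdout_split(150)
print(len(train), len(test))  # 120 30
```

Selecting the best of several runs reports a favourable outcome rather than an average one, so results from this protocol are not directly comparable with mean values such as those from the cross-validation experiment.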

Conclusion and Future Work
In this paper, a novel MMPSO algorithm is proposed for MLFNNs training. Compared with the PSO algorithm, the proposed MMPSO algorithm, which is based on the MSO technique, has two advantages. Firstly, it strengthens the local search that the particles carry out within the search space. Secondly, it has multiple swarms and takes into account both the best solution of each swarm and the best solution of all swarms, and thus it gets closer to the optimum solution. To evaluate its performance, experiments were conducted on ten benchmark datasets from the UCI repository. According to the experimental results, the proposed MMPSO algorithm performed better than PSO on all datasets. Furthermore, the experimental results were compared with previous research in the literature on six datasets. According to this comparison, the proposed MMPSO algorithm showed a competitive advantage over the reported algorithms. In conclusion, the proposed MMPSO algorithm performed well and can be adopted as a novel algorithm for MLFNNs training.
For future work, the proposed MMPSO algorithm will be used in intelligent systems to solve complex real-life optimization problems in various fields such as design, identification, operational development, planning, and scheduling.

Figure 2: The minimization of the SSE over the training iterations.

Table 1: The characteristics of the ten datasets used.

Table 2: The structure of the MLFNNs.

Table 3: The results of the 5-fold cross-validation experiment for the proposed MMPSO and PSO algorithms.

Table 4: CPU running times of the PSO and MMPSO algorithms in seconds.

Table 5: The results of the 80% training and 20% testing experiment for six datasets.