An Analysis of the Operation Factors of Three PSO-GA-ED Meta-Heuristic Search Methods for Solving a Single-Objective Optimization Problem

In this study, we evaluate several nongradient (evolutionary) search strategies for minimizing mathematical functions. We implemented and tested the genetic algorithm (GA), particle swarm optimization (PSO), and differential evolution (DE) to assess their general efficacy in optimizing mathematical functions. The results are compared in terms of efficiency, which is determined by the number of iterations, the observed accuracy, and the overall run time. The optimization employs 12 functions: Easom, Holder table, Michalewicz, Ackley, Rastrigin, Rosen, Rosenbrock, Shubert, Sphere, Schaffer, Himmelblau's, and Spring Force Vanderplaats. Furthermore, the crossover rate, mutation rate, and scaling factor are evaluated to determine the effectiveness of these algorithms. According to the comparison, the DE algorithm has the lowest time complexity of the three methods, while GA has the highest. The PSO method, in contrast, produces different results when the same algorithm is repeated and shows low reliability in locating the optimal point.


Introduction
A nongradient optimization method is a stochastic method, which means that, unlike gradient optimization, the results are heavily randomized. A scenario similar to Darwinian evolution is simulated in which the point closest to a maximum or a minimum value is selected as the optimal point of a function [1][2][3][4]. Unlike gradient methods, evolutionary optimization does not rely heavily on mathematics, and the initial starting point has far less impact. Because of its random nature, evolutionary optimization is usually less efficient than gradient-based optimization, and it does not guarantee an optimal solution [5,6]. However, the method is more aggressive and considers more candidate solutions than gradient methods do, which allows it to find multiple local minima and gives it some advantages. Evolutionary optimization works as follows: first, a mathematical function is formulated to represent a scenario with specific conditions, and then various points are placed randomly throughout the function [7][8][9][10]. The results are compared and then used to converge within the function. These results then adapt and converge toward the optimized points chaotically through trial and error. By contrast, gradient-based optimization algorithms generally require a step size for updating the unknowns [11,12]. To achieve better generalization and convergence, learning rate scheduling schemes have been used in addition to a fixed learning rate. Staircase [13] and exponential decay [40] schedules are simple but popular schemes for reducing stochastic noise. AdaGrad [14], AdaDelta [15,16], RMSprop [17], and Adam [18] have also been developed for parameter-wise adaptive learning rate scheduling. So, while evolutionary optimization is not guaranteed to find the optimal point, it can at least find the potential locations of such points.
Since evolutionary optimization uses a variety of starting points, it is not subject to the same weakness as gradient optimization. Gradient optimization converges accurately on a local minimum. The algorithm, however, does not know whether it has reached the global minimum. As a result, the solutions reached are often less optimal than what is possible [18]. With evolutionary optimization, starting points are spread all across the function, which raises the probability that one of them starts near the global minimum. They all converge toward their local minima, and the results are then compared. Based on this comparison, we can more easily approximate the global minimum within the bounds of the function.
The best results, in general, can come from combining gradient and nongradient-based optimization to converge on the best solution. For this, one would start with the broad function and apply evolutionary optimization [19,20]. Although it is not very analytical, it often converges near the global minimum and thereby indicates its general location. Afterward, a gradient-based algorithm may be used with the identified area as a starting point. Using the mathematical function, it converges toward the global minimum and provides an accurate result. Combining the two algorithm types in this way makes it possible to locate the global minimum of a function accurately (see Table 1).
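The two-stage idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the global stage is a plain random search standing in for an evolutionary algorithm, the local stage is gradient descent with a finite-difference gradient and a simple step-halving safeguard, and all names (`global_stage`, `local_stage`, `hybrid_minimize`) and parameter values are assumptions chosen for illustration. Himmelblau's function is used as the test surface.

```python
import random


def himmelblau(x, y):
    # Himmelblau's function; its global minima all have value 0.
    return (x * x + y - 11) ** 2 + (x + y * y - 7) ** 2


def numerical_gradient(f, x, y, h=1e-6):
    # Central finite differences (assumed here instead of an analytic gradient).
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return dfdx, dfdy


def global_stage(f, bounds, samples, rng):
    # Stand-in for the evolutionary stage: keep the best of many random points.
    lower, upper = bounds
    candidates = [(rng.uniform(lower, upper), rng.uniform(lower, upper))
                  for _ in range(samples)]
    return min(candidates, key=lambda p: f(*p))


def local_stage(f, start, step=0.01, iterations=2000):
    # Gradient descent; a step is accepted only if it lowers f,
    # otherwise the step size is halved.
    x, y = start
    for _ in range(iterations):
        gx, gy = numerical_gradient(f, x, y)
        nx, ny = x - step * gx, y - step * gy
        if f(nx, ny) < f(x, y):
            x, y = nx, ny
        else:
            step *= 0.5
    return x, y


def hybrid_minimize(f, bounds=(-5.0, 5.0), samples=200, seed=0):
    rng = random.Random(seed)
    rough = global_stage(f, bounds, samples, rng)  # general location
    return local_stage(f, rough)                   # accurate refinement


if __name__ == "__main__":
    x, y = hybrid_minimize(himmelblau)
    print(x, y, himmelblau(x, y))
```

The global stage only needs to land in the right basin; the local stage then supplies the accuracy, which mirrors the division of labor described in the paragraph above.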

Genetic Algorithm.
The genetic algorithm is a learning program that mimics natural evolution concepts such as reproduction, crossover, and mutation to produce what the program considers optimal offspring. It is the most general type of evolutionary optimization: it takes the general ideas behind evolution and puts them into action. It starts with various points spread randomly throughout the function, which represent the possible solutions within the problem's parameters. This allows the program to consider various possible solutions and examine each of them to determine the best one. Once the algorithm has its values, it calculates the fitness of each solution generated in the function. Then a pair of solutions is selected so as to increase the chances of generating fit offspring; each parent can be used more than once per iteration. Once the points are selected, they are crossed over to create two new potential solutions. Otherwise, the new points are placed at the parent points. Finally, the new points are mutated to generate the resulting points. Selection occurs by comparing potential parents with potential partners in their local area. To better simulate evolution, values with a higher fitness are more likely to produce offspring than those with a lower fitness. Selection is often made by random chance, with high-fitness individuals being more likely to be picked. The probability of selection $p_i$ is given by equation (1), where $f_i$ is the fitness value of individual $i$ and $N$ is the local population relative to a parent. When selection is complete, the algorithm uses a crossover process to generate two new values to place into the next iteration.
These new values perturb old solutions as they try to steer away from the flaws. The general equation for the crossover stage, equation (2), is written for $y_k$ and $x_k$, respectively, with $k = 1, \ldots, n_{var}$, where $\alpha$ is the crossover blending factor and $r_k$ is a uniformly distributed random number in the interval [0, 1]. However, some highly successful members are allowed to pass into the next iteration unchanged. To prevent the new iterations from becoming the same and to promote more out-of-the-box solutions, a mutation factor is used to diversify the solutions and prevent the population from becoming stagnant. A mutation is a deviation from the crossover logic, which randomizes the generated solutions to push them closer to or further from the end goal, or toward another goal. The equations used to determine the mutation effect involve $r$, a uniformly distributed number in the interval [0, 1]; $x_k^l$ and $x_k^u$, the lower and upper bounds of $x_k$; $t$, the current generation; $T$, the maximum number of generations; $b$, the strength of the mutation operator; and the function $\Delta(t, y)$.
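The selection, crossover, and mutation steps described above can be put together in a short sketch. The exact forms of equations (1) and (2) and of the mutation equations are not reproduced in this text, so the forms below are assumptions based on common textbook variants: fitness-proportionate selection with $p_i = f_i / \sum_j f_j$, an arithmetic blend crossover with factor $\alpha$, and a non-uniform mutation with $\Delta(t, y) = y(1 - r^{(1 - t/T)^b})$. The fitness transform `1 / (1 + cost)`, the elitism count, and all parameter values are likewise illustrative.

```python
import random


def sphere(point):
    return sum(v * v for v in point)


def selection_probabilities(fitness):
    # Assumed form of equation (1): p_i = f_i / sum_j f_j.
    total = sum(fitness)
    return [f / total for f in fitness]


def blend_crossover(parent_a, parent_b, alpha):
    # Assumed arithmetic blend: two children as convex combinations.
    child_a = [alpha * a + (1 - alpha) * b for a, b in zip(parent_a, parent_b)]
    child_b = [(1 - alpha) * a + alpha * b for a, b in zip(parent_a, parent_b)]
    return child_a, child_b


def delta(t, y, max_gen, b, rng):
    # Assumed non-uniform mutation size; it shrinks to 0 as t approaches T.
    r = rng.random()
    return y * (1 - r ** ((1 - t / max_gen) ** b))


def mutate(point, t, max_gen, bounds, rate, b, rng):
    lower, upper = bounds
    mutated = []
    for x in point:
        if rng.random() < rate:
            if rng.random() < 0.5:
                x = x + delta(t, upper - x, max_gen, b, rng)  # toward upper bound
            else:
                x = x - delta(t, x - lower, max_gen, b, rng)  # toward lower bound
        mutated.append(x)
    return mutated


def genetic_algorithm(cost, dim=2, bounds=(-5.0, 5.0), pop_size=40,
                      max_gen=100, alpha=0.25, mutation_rate=0.1,
                      b=2.0, elite=2, seed=0):
    rng = random.Random(seed)
    lower, upper = bounds
    population = [[rng.uniform(lower, upper) for _ in range(dim)]
                  for _ in range(pop_size)]
    for t in range(max_gen):
        population.sort(key=cost)
        # Minimization turned into a fitness to maximize (assumes cost >= 0).
        fitness = [1.0 / (1.0 + cost(p)) for p in population]
        weights = selection_probabilities(fitness)
        # The most successful members pass on unchanged.
        next_gen = [list(p) for p in population[:elite]]
        while len(next_gen) < pop_size:
            parent_a, parent_b = rng.choices(population, weights=weights, k=2)
            for child in blend_crossover(parent_a, parent_b, alpha):
                next_gen.append(
                    mutate(child, t, max_gen, bounds, mutation_rate, b, rng))
        population = next_gen[:pop_size]
    return min(population, key=cost)


if __name__ == "__main__":
    best = genetic_algorithm(sphere)
    print(best, sphere(best))
```

Note that the population is sorted by cost in every generation, which is the sorting overhead that the comparison with particle swarm optimization refers to.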

Particle Swarm Optimization.
In 1995, electrical engineer Russell Eberhart and social psychologist James Kennedy developed this alternative to the genetic algorithm. This nongradient algorithm considers the individuality and sociability of the population members. Specifically, the idea came from watching birds look for a nesting place. Too little individuality led to too many birds trying to nest in the same place. However, too little sociability left many birds unable to find suitable nesting places. In general, the program uses social rules and individual deviations to find the ideal locations. The search is driven by the velocity vector of each particle as it travels. The vector combines the pack movement and the individual instinct that go into a particle's movement and adds them to the inertia carried over from the previous iteration. The basic equation for the particle swarm velocity vector is
$v^{(i,t+1)} = \alpha v^{(i,t)} + \beta_1 r_1^{(i)} (P^{(i)} - X^{(i,t)}) + \beta_2 r_2^{(i)} (P^{(g)} - X^{(i,t)})$,
with $\alpha$ being the inertia factor, $\beta_1$ the individuality factor, $\beta_2$ the sociability factor, $r_1^{(i)}$ and $r_2^{(i)}$ uniformly distributed numbers in the interval [0, 1], $X^{(i,t)}$ the individual's position vector, $P^{(i)}$ the best individual value, and $P^{(g)}$ the best value in the population. Within the vector equation, $\alpha v^{(i,t)}$ represents the inertia, $\beta_1 r_1^{(i)} (P^{(i)} - X^{(i,t)})$ represents the individuality, and $\beta_2 r_2^{(i)} (P^{(g)} - X^{(i,t)})$ represents the sociability. Other than this, the method functions like the genetic algorithm: it begins with many solutions on the field, and each solution is evaluated for fitness. The result is compared with the previous swarm fitness and the previous individual fitness, and the best positions are updated accordingly. The best individual fitness and position are then used to calculate the next iteration. All in all, particle swarm optimization edges out the genetic algorithm, mainly because it does not need to sort the fitness values as the genetic algorithm does. This means that swarm optimization requires less computational power.
It tends to be cheaper to use than the genetic algorithm, especially with many values.

Table 1 (excerpt).
To develop an algorithm based on horses' herding behaviors for high-dimensional optimization. The proposed algorithm proved to be highly efficient for solving high-dimensional global optimization problems, outperforming the standard algorithms used today in terms of accuracy and efficiency.
Meraihi et al. [27], 2021. Genetic algorithm optimization. To develop an algorithm based on the foraging and swarming behaviors of grasshoppers. The GOA algorithm gives superior results for most applications, having a high exploitation ability and convergence and excelling at preventing local minima stagnation.
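The velocity rule of particle swarm optimization, with its inertia, individuality, and sociability terms, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the study: the position update $X^{(i,t+1)} = X^{(i,t)} + v^{(i,t+1)}$, the clamping of positions to the bounds, the zero initial velocities, and all parameter values are assumed, and the function name `particle_swarm` is hypothetical.

```python
import random


def sphere(point):
    return sum(v * v for v in point)


def particle_swarm(cost, dim=2, bounds=(-5.0, 5.0), swarm_size=30,
                   iterations=200, inertia=0.7, individuality=1.5,
                   sociability=1.5, seed=0):
    rng = random.Random(seed)
    lower, upper = bounds
    positions = [[rng.uniform(lower, upper) for _ in range(dim)]
                 for _ in range(swarm_size)]
    velocities = [[0.0] * dim for _ in range(swarm_size)]
    personal_best = [list(p) for p in positions]      # P^(i)
    personal_cost = [cost(p) for p in positions]
    g = min(range(swarm_size), key=lambda i: personal_cost[i])
    global_best = list(personal_best[g])              # P^(g)
    global_cost = personal_cost[g]

    for _ in range(iterations):
        for i in range(swarm_size):
            # One random number per particle for each term, as in r_1^(i), r_2^(i).
            r1, r2 = rng.random(), rng.random()
            for k in range(dim):
                velocities[i][k] = (
                    inertia * velocities[i][k]
                    + individuality * r1 * (personal_best[i][k] - positions[i][k])
                    + sociability * r2 * (global_best[k] - positions[i][k])
                )
                # Assumed position update, clamped to the search bounds.
                moved = positions[i][k] + velocities[i][k]
                positions[i][k] = min(upper, max(lower, moved))
            c = cost(positions[i])
            # Only comparisons against stored bests are needed; no sorting.
            if c < personal_cost[i]:
                personal_best[i], personal_cost[i] = list(positions[i]), c
                if c < global_cost:
                    global_best, global_cost = list(positions[i]), c
    return global_best, global_cost


if __name__ == "__main__":
    best, value = particle_swarm(sphere)
    print(best, value)
```

Because each particle is only compared with its own best and the swarm best, no sorting of the population is required, which is consistent with the lower cost attributed to the method above.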

Differential Evolution.
Differential evolution was developed around 1995 as an attempt to simulate Darwinian evolution. It combines the parents' features to form a child. However, unlike the previous methods, the new value may inherit features from multiple parents. Of the methods considered in this study, it is the closest that evolutionary optimization comes to gradient optimization. It is used for multidimensional real-valued functions and does not require them to be differentiable, which makes it a robust algorithm.
Using two different parent values (P1 and P2), the method produces a series of children (C1, ..., Cn). In the governing equations, α, β, and c are random parent features, m is the mutation factor between 0.5 and 1, and δ1 and δ2 are binomial crossover coefficients. CR is the crossover rate, while R is a uniformly distributed random number in the interval [0, 1]. The algorithm accepts a change only when the combination of the parent points produces a child with better fitness. When weighing its options, it always selects the offspring with the better fitness and abandons the rest, which increases the efficiency of the evolution. Furthermore, any improvement found is included in the population immediately. As a result, the general solution is often more accurate than in either the genetic algorithm or particle swarm optimization.
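The greedy "keep the child only if it is better" behavior described above can be seen in a minimal differential evolution sketch. Since the governing equations are not reproduced in this text, the sketch assumes the widely used rand/1/bin variant (a mutant built as a + F(b - c) from three random members, followed by binomial crossover with rate CR); it may differ from the exact scheme used in the study, and the name `differential_evolution` and all parameter values are illustrative.

```python
import random


def sphere(point):
    return sum(v * v for v in point)


def differential_evolution(cost, dim=2, bounds=(-5.0, 5.0), pop_size=20,
                           generations=200, scale=0.5, crossover_rate=0.9,
                           seed=0):
    rng = random.Random(seed)
    lower, upper = bounds
    population = [[rng.uniform(lower, upper) for _ in range(dim)]
                  for _ in range(pop_size)]
    costs = [cost(p) for p in population]

    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct members other than the current one.
            others = [j for j in range(pop_size) if j != i]
            a, b, c = (population[j] for j in rng.sample(others, 3))
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for k in range(dim):
                if rng.random() < crossover_rate or k == forced:
                    value = a[k] + scale * (b[k] - c[k])   # scaled difference
                    value = min(upper, max(lower, value))  # keep within bounds
                else:
                    value = population[i][k]
                trial.append(value)
            trial_cost = cost(trial)
            # Greedy selection: the child replaces the parent only if it is
            # at least as good, and the improvement is available immediately.
            if trial_cost <= costs[i]:
                population[i], costs[i] = trial, trial_cost

    best = min(range(pop_size), key=lambda i: costs[i])
    return population[best], costs[best]


if __name__ == "__main__":
    point, value = differential_evolution(sphere)
    print(point, value)
```

Replacing a member in place, rather than building a separate next generation, is what makes improvements usable by later members within the same generation.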

Results and Discussion
In this report, we used three meta-heuristic algorithms, namely the genetic algorithm, particle swarm optimization, and differential evolution, as nongradient-based methods for the optimization of several mathematical surfaces (see Table 2). We used GA to optimize the Easom, Holder table, Michalewicz, and Ackley functions shown in Figures 1-12. Moreover, Figures 5, 7, 9, and 11 illustrate the objective function values in each generation of the genetic algorithm and plots of the population accumulating toward the optimum value.
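For readers who want to reproduce such experiments, a few of the benchmark surfaces can be written as plain functions. The definitions below are the standard two-variable textbook forms of the Ackley, Easom, Rastrigin, Sphere, and Himmelblau functions; they are assumed here and may differ in scaling or domain from the forms listed in Table 2.

```python
import math


def sphere(x, y):
    # Global minimum 0 at (0, 0).
    return x * x + y * y


def ackley(x, y):
    # Global minimum 0 at (0, 0); many shallow local minima elsewhere.
    term1 = -20.0 * math.exp(-0.2 * math.sqrt(0.5 * (x * x + y * y)))
    term2 = -math.exp(0.5 * (math.cos(2 * math.pi * x) + math.cos(2 * math.pi * y)))
    return term1 + term2 + math.e + 20.0


def easom(x, y):
    # Global minimum -1 at (pi, pi); nearly flat away from it.
    return (-math.cos(x) * math.cos(y)
            * math.exp(-((x - math.pi) ** 2 + (y - math.pi) ** 2)))


def rastrigin(x, y):
    # Global minimum 0 at (0, 0); regular grid of local minima.
    return (20.0 + x * x - 10.0 * math.cos(2 * math.pi * x)
            + y * y - 10.0 * math.cos(2 * math.pi * y))


def himmelblau(x, y):
    # Four global minima with value 0, one of them at (3, 2).
    return (x * x + y - 11) ** 2 + (x + y * y - 7) ** 2


if __name__ == "__main__":
    for name, f, point in [("sphere", sphere, (0.0, 0.0)),
                           ("ackley", ackley, (0.0, 0.0)),
                           ("easom", easom, (math.pi, math.pi)),
                           ("rastrigin", rastrigin, (0.0, 0.0)),
                           ("himmelblau", himmelblau, (3.0, 2.0))]:
        print(name, f(*point))
```

Evaluating each function at its known optimum, as in the loop above, is a quick check that a definition has been typed correctly before it is handed to an optimizer.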
Table 2 (excerpt).
$f(x, y) = \frac{1}{2} + \sin^2(x^2 + y^2) + \frac{0.5}{(1 + 0.001(x^2 + y^2))^2}$
Himmelblau: $f(x, y) = (x^2 + y - 11)^2 + (x + y^2 - 7)^2$
Spring force Vanderplaats: f(x, y) = 4(...

Based on the analysis results, the best crossover rate for optimization of the Ackley function is 0.4-0.5, and the best mutation rate is 0.6-0.7 (Table 3). Moreover, based on Figure 6(c), it can be estimated that increasing the population to 10,000 produces no significant increase or decrease in the number of generations for the Ackley function.
Therefore, for PSO, we repeat the optimization 1000 times with a specific swarm size. It can be seen that 1% of evaluations cannot find the optimum value of the Rastrigin function (see Figure 18(d)).
However, for the Rosen function, 100% of runs are accurate. The Rosenbrock function is one of the more complicated formulas in optimization, and according to Figure 19(d), many runs do not produce accurate results. Moreover, there is no relationship between swarm size and optimization accuracy, because PSO sometimes cannot find the optimum value. These results are repeated for the Shubert function in Figure 20. Based on these results, PSO is not robust in finding the optimum value of these function types, because its results are not reliable, at least for these equations.
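The reliability figures quoted above (for example, 1% of 1000 runs missing the optimum) amount to a success rate over repeated, independently seeded runs. A minimal sketch of such a harness follows. The names `success_rate` and `run_once` are hypothetical, the tolerance is an assumed choice, and the random search on the Rastrigin function is only a stand-in optimizer so that the example is self-contained; it is not the PSO used in the study.

```python
import math
import random


def rastrigin(x, y):
    return (20.0 + x * x - 10.0 * math.cos(2 * math.pi * x)
            + y * y - 10.0 * math.cos(2 * math.pi * y))


def random_search(seed, samples=2000, bound=5.12):
    # Stand-in optimizer: best objective value among random samples.
    rng = random.Random(seed)
    return min(rastrigin(rng.uniform(-bound, bound), rng.uniform(-bound, bound))
               for _ in range(samples))


def success_rate(run_once, runs, optimum, tolerance):
    # run_once(seed) must return the best objective value found in one run.
    hits = 0
    for seed in range(runs):
        if abs(run_once(seed) - optimum) <= tolerance:
            hits += 1
    return hits / runs


if __name__ == "__main__":
    rate = success_rate(random_search, runs=100, optimum=0.0, tolerance=0.5)
    print(f"fraction of runs within tolerance: {rate:.2f}")
```

Any optimizer that accepts a seed and returns its best objective value can be passed as `run_once`, so the same harness could be used to compare swarm sizes on an equal footing.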
For the analysis of the DE algorithm, four functions are used: Sphere, Schaffer, Himmelblau's, and Spring Force Vanderplaats. Figures 21-24 depict the 3D surfaces of these functions, and Figures 25-28 illustrate the DE evaluation results. We tested the effect of the crossover rate and the scaling factor on the accuracy of the DE method. Based on the results for the Sphere function, the best scaling factor is 0.3, and there is no relationship between the error and the crossover rate. Overall, one of the properties of DE is that it uses a smaller initial population with lower time complexity to find the optimum value of the functions. However, it is sensitive to the choice of the crossover rate. Based on Figure 27, the optimum crossover value is 0.3, and the optimum scaling factor is 0.45. Moreover, in DE, neither the crossover rate nor the scaling factor shows a relationship with the error for the Spring Force Vanderplaats function (see Figure 29).
Based on the comparison between the optimization methods, the DE algorithm has the lowest time complexity of the three methods, and GA has the highest. The PSO algorithm, however, is less reliable in finding the optimum point.

Conclusion and Future Works
The objective of this report is to evaluate nongradient-based methods for optimizing some mathematical surfaces by applying three meta-heuristic algorithms: the genetic algorithm, particle swarm optimization, and differential evolution. In this report, 12 functions are used for optimization: Easom, Holder table, Michalewicz, Ackley, Rastrigin, Rosen, Rosenbrock, Shubert, Sphere, Schaffer, Himmelblau's, and Spring Force Vanderplaats. We used GA to optimize the Easom, Holder table, Michalewicz, and Ackley functions, examining the number of generations versus the population and the error value as the population increases. According to the results of the analysis, the best crossover rate for optimization of the Ackley function is 0.4-0.5, and the best mutation rate is 0.6-0.7. For GA, it is estimated that increasing the population to 10,000 produces no significant increase or decrease in the number of generations for the Ackley function. Consequently, GA is able to find the optimal value with a minimum population value. Using the GA method, the global minimum of the Ackley function can be determined with a low degree of complexity. Additionally, the minimum population value for the best degree of complexity is 1000. Changing the crossover and mutation rates has no significant effect on the error value for the Easom and Holder table functions. The Michalewicz function shows that the number of generations decreases as the population increases. Because GA optimizes these functions easily, there is no optimal crossover or mutation rate for them.
To test PSO, the effects of swarm size are compared for the Rastrigin, Rosen, Rosenbrock, and Shubert functions. The optimization is repeated 1000 times with the same swarm sizes. It can be seen that 1% of evaluations are not able to determine the optimum value of the Rastrigin function. In contrast, 100% of evaluations are able to determine the optimum of the Rosen function. The Rosenbrock function is one of the most complex formulas in optimization. According to the results, there is no relationship between swarm size and optimization accuracy. These results indicate that PSO is not robust in finding the optimum values of these function types, since it does not produce reliable results, at least for these equations.
The analysis of the DE algorithm uses four functions: Sphere, Schaffer, Himmelblau's, and Spring Force Vanderplaats. To assess the accuracy of the DE method, we tested the crossover rate and the scaling factor. According to the results for the Sphere function, the best scaling factor is 0.30, and there is no relationship between the error and the crossover rate. In general, one of the characteristics of DE is that it uses a smaller initial population with lower time complexity to find the optimal values. It is sensitive to the crossover rate, however. Furthermore, neither the crossover rate nor the scaling factor shows a relationship with the error for the Spring Force Vanderplaats function in DE.
Comparing the results of the optimization methods, the DE algorithm has the lowest time complexity, and the GA algorithm has the highest. In contrast, the PSO algorithm is less reliable in finding the optimum point.
The use of meta-heuristics has enabled engineers to solve several engineering problems that could not be solved with standard optimization approaches.
Examples include the simplicity with which they can be combined with finite element software in any domain, where the combination or permutation of solutions available to each method enables the discovery of optimum designs without the need for explicit functions. The literature contains numerous examples of this. The point of contention in the literature among new algorithms is the development of a meta-heuristic that can accomplish this with smaller populations and fewer iterations (lower processing costs) and with more accuracy. Whether an algorithm is evolutionary in nature or is inspired by swarms, behaviors, or physical phenomena, these features all serve the primary purpose outlined above. In our view, time will reveal which algorithms are truly superior and distinguishable from the others. Additionally, statistical tests such as the Wilcoxon test may be requested to determine whether the meta-heuristics operate differently from one another.

Nomenclature
p_i: Probability of selection
f: Objective function
N: Number of populations
α: Crossover blending factor
r_k: Random number
t: Generation
β1: Individuality factor
β2: Sociability factor
α: PSO inertia factor
δ1 and δ2: Binomial crossover coefficients.

Data Availability
Data are available on request by e-mail from the corresponding author (amin.valizadeh@mail.um.ac.ir).