Grey Wolf Optimizer Based on Powell Local Optimization Method for Clustering Analysis

One heuristic evolutionary algorithm recently proposed is the grey wolf optimizer (GWO), inspired by the leadership hierarchy and hunting mechanism of grey wolves in nature. This paper presents an extended GWO algorithm based on the Powell local optimization method, which we call PGWO. The PGWO algorithm significantly improves on the original GWO in solving complex optimization problems. Clustering is a popular data analysis and data mining technique, and PGWO can be applied to clustering problems. In this study, the PGWO algorithm is first tested on seven benchmark functions and then used for data clustering on nine data sets. Compared to other state-of-the-art evolutionary algorithms, the benchmark and data clustering results demonstrate the superior performance of the PGWO algorithm.

In this paper, we concentrate on GWO, developed by Mirjalili et al. [7] in 2014 by simulating the hunting behavior and social leadership of grey wolves in nature. Numerical comparisons showed that the performance of GWO is competitive with that of other population-based algorithms.
Because it is simple, easy to implement, and has few control parameters, GWO has attracted much attention and has been used to solve a number of practical optimization problems [16][17][18].
However, like other stochastic optimization algorithms such as PSO and GA, GWO converges poorly in the exploitation phase as the dimension of the search space grows [19,20]. Our work therefore focuses on increasing the local search ability of the GWO algorithm. According to [21], many direct search methods are fairly fast optimizers with a strong local search ability. Powell's method [22] is one of these direct search methods. To exploit the good convergence properties of Powell's method, we propose a hybrid metaheuristic, the grey wolf optimizer based on the Powell local optimization method (PGWO). Compared to other state-of-the-art evolutionary algorithms, PGWO performs significantly better.
Cluster analysis, or clustering, is the task of grouping similar objects or multidimensional data vectors into a number of clusters or groups. Clustering is an unsupervised learning process that requires no a priori knowledge of the clusters; the clustering algorithm classifies samples automatically based on the distance or similarity between them. As a result, clustering techniques have been applied to a wide range of problems, such as data mining [23], data analysis, pattern recognition [24], and image segmentation [25]. Traditional clustering methods can be classified as partitioning methods, hierarchical methods, density-based methods, and grid-based methods [26]. In this paper, we concentrate on partitioning methods. A partitioning clustering method divides data vectors into a predefined number of clusters by optimizing a certain criterion.
K-means is the most popular partitioning clustering method [27,28]. However, K-means depends heavily on the initial state and easily falls into local optima. To overcome this problem, many methods have been proposed. For example, the K-harmonic means algorithm was proposed for clustering instead of K-means in [29]. A simulated annealing- (SA-) based approach was developed in [30]. A tabu search- (TS-) based method was introduced in [31,32]. Genetic algorithm- (GA-) based methods were presented in [33][34][35][36]. Fathian et al. [37] developed a clustering algorithm based on honey-bee mating optimization (HBMO). Particle swarm optimization (PSO) was applied to clustering in [38]. Hatamlou et al. employed a big bang-big crunch algorithm for data clustering in [39]. Karaboga and Ozturk presented a novel clustering approach based on the artificial bee colony (ABC) algorithm in [40]. Data clustering based on the gravitational search algorithm was presented in [41,42]. In 1991, Colorni et al. presented the ant colony optimization (ACO) algorithm, based on the behavior of ants seeking a path between their colony and a source of food. Shelokar and Kao then solved the clustering problem using the ACO algorithm [43,44]. Kao et al. presented a hybrid approach that combines the K-means algorithm, Nelder-Mead simplex search, and PSO for clustering analysis [45]. Niknam et al. presented a hybrid evolutionary algorithm based on PSO and SA to solve the clustering problem [46].
However, every algorithm has some drawbacks. For example, the K-means algorithm gets stuck in local optima; the convergence of the genetic algorithm is highly dependent on the initial positions; in ACO, the solution vector is affected as the number of iterations increases; and Kao et al. [45] and Niknam et al. [46] stated that PSO gives better clustering results when applied to one-dimensional and small data sets but does not give good results on large data sets. In this paper, the PGWO algorithm is used to solve the clustering problem and is tested on nine data sets. This algorithm takes advantage of both GWO and Powell's method. The process is started by GWO, which searches the whole space for a global solution. When a promising global solution is found, the clustering switches to Powell's method for faster convergence to finish the clustering process. As the simulation results show, the proposed algorithm not only converges faster but also finds better solutions than the other algorithms on the majority of data sets, whether small or large.
The rest of the paper is organized as follows: Section 2 presents a brief introduction to GWO. Section 3 discusses the basic principles of GWO based on the Powell local optimization method. The experimental results of test functions and data clustering are shown in Sections 4 and 5, respectively. Finally, Section 6 concludes the work and suggests some directions for future studies.

Grey Wolf Optimizer (GWO)
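The GWO algorithm, proposed by Mirjalili et al. (2014) [7], is inspired by the hunting behavior and social leadership of grey wolves in nature. As in other metaheuristics, the search in GWO begins with a population of randomly generated wolves (candidate solutions). To model the social hierarchy of wolves, the population is split into four groups: alpha ($\alpha$), beta ($\beta$), delta ($\delta$), and omega ($\omega$). Over the course of iterations, the three best solutions are called $\alpha$, $\beta$, and $\delta$, respectively. The rest of the candidate solutions are named $\omega$. In this algorithm, the hunting (optimization) is guided by $\alpha$, $\beta$, and $\delta$. The $\omega$ wolves are required to encircle $\alpha$, $\beta$, and $\delta$ so as to find better solutions. The encircling process can be modeled as follows [7]:
$$\vec{D} = \left|\vec{C} \cdot \vec{X}_p(t) - \vec{X}(t)\right|, \qquad \vec{X}(t+1) = \vec{X}_p(t) - \vec{A} \cdot \vec{D}, \tag{1}$$
$$\vec{A} = 2\vec{a} \cdot \vec{r}_1 - \vec{a}, \qquad \vec{C} = 2 \cdot \vec{r}_2, \tag{2}$$
where $t$ is the current iteration, $\vec{X}_p$ is the position vector of the prey, $\vec{X}$ indicates the position vector of a grey wolf, $\vec{a}$ is gradually decreased from 2 to 0, and $\vec{r}_1$ and $\vec{r}_2$ are random vectors with components in $[0, 1]$.
To simulate the hunting behavior of grey wolves mathematically, the GWO algorithm always assumes that $\alpha$, $\beta$, and $\delta$ have better knowledge about the position of the prey (optimum). Therefore, the positions of the three best solutions ($\alpha$, $\beta$, $\delta$) obtained so far are saved, and the other wolves ($\omega$) are obliged to reposition with respect to $\alpha$, $\beta$, and $\delta$. The mathematical model of readjusting the positions of the $\omega$ wolves is presented as follows [7]:
$$\vec{D}_\alpha = \left|\vec{C}_1 \cdot \vec{X}_\alpha - \vec{X}\right|, \qquad \vec{D}_\beta = \left|\vec{C}_2 \cdot \vec{X}_\beta - \vec{X}\right|, \qquad \vec{D}_\delta = \left|\vec{C}_3 \cdot \vec{X}_\delta - \vec{X}\right|, \tag{3}$$
$$\vec{X}_1 = \vec{X}_\alpha - \vec{A}_1 \cdot \vec{D}_\alpha, \qquad \vec{X}_2 = \vec{X}_\beta - \vec{A}_2 \cdot \vec{D}_\beta, \qquad \vec{X}_3 = \vec{X}_\delta - \vec{A}_3 \cdot \vec{D}_\delta, \qquad \vec{X}(t+1) = \frac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3}, \tag{4}$$
where $\vec{X}_\alpha$ is the position of the alpha, $\vec{X}_\beta$ is the position of the beta, and $\vec{X}_\delta$ is the position of the delta; $\vec{A}_1, \vec{A}_2, \vec{A}_3$ and $\vec{C}_1, \vec{C}_2, \vec{C}_3$ are generated as in (2).
In these formulas, the two vectors $\vec{A}$ and $\vec{C}$ oblige the GWO algorithm to explore and exploit the search space. With decreasing $A$, half of the iterations are devoted to exploration ($|A| \geq 1$) and the other half are dedicated to exploitation ($|A| < 1$). The range of $C$ is $0 \leq C \leq 2$; the vector $\vec{C}$ also improves exploration when $C > 1$, and exploitation is emphasized when $C < 1$. Note that $A$ is decreased linearly over the course of the iterations (through $a$). In contrast, $C$ is generated randomly, with the aim of emphasizing exploration or exploitation at any stage and avoiding local optima. The main steps of the grey wolf optimizer are given in Algorithm 1.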
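The position update described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `gwo_minimize`, the bound clipping, the per-coordinate random draws, and the default settings are assumptions made for this example.

```python
import random


def gwo_minimize(f, bounds, n_wolves=30, max_iter=200, seed=0):
    """Minimal grey wolf optimizer sketch; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best_x, best_f = None, float("inf")
    for t in range(max_iter):
        # Rank the pack: the three best wolves act as alpha, beta, and delta.
        leaders = [list(w) for w in sorted(wolves, key=f)[:3]]
        alpha_value = f(leaders[0])
        if alpha_value < best_f:
            best_x, best_f = list(leaders[0]), alpha_value
        a = 2.0 - 2.0 * t / max_iter  # decreases linearly from 2 towards 0
        new_wolves = []
        for x in wolves:
            new_x = []
            for j in range(dim):
                pulled = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a   # A = 2a*r1 - a
                    C = 2.0 * rng.random()           # C = 2*r2
                    D = abs(C * leader[j] - x[j])    # distance to this leader
                    pulled += leader[j] - A * D      # candidate X1, X2, or X3
                lo, hi = bounds[j]
                # New coordinate: mean of the three candidates, clipped to bounds
                # (clipping is an assumption of this sketch).
                new_x.append(min(max(pulled / 3.0, lo), hi))
            new_wolves.append(new_x)
        wolves = new_wolves
    # Account for the final population as well.
    for w in wolves:
        value = f(w)
        if value < best_f:
            best_x, best_f = list(w), value
    return best_x, best_f


if __name__ == "__main__":
    sphere = lambda v: sum(c * c for c in v)
    print(gwo_minimize(sphere, [(-5.0, 5.0)] * 2))
```

Because $a$ shrinks towards 0, $|A|$ eventually stays below 1 and every wolf is drawn into the region spanned by the three leaders, which is the exploitation phase discussed above.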

Grey Wolf Optimizer Based on Powell Local Optimization Method (PGWO)
Powell's method is a direct search algorithm proposed by Powell [22] for finding a local minimum of a function. The function need not be differentiable, and no derivatives are taken. The method successively performs a bidirectional search along each search vector to pursue the minimum of the function. The new position is represented as a linear combination of the search vectors. The new displacement vector is added to the search vector list as a new search vector. Correspondingly, the most successful vector, which contributed most to the new direction, is deleted from the search vector list. The algorithm iterates until no significant improvement is made. The detailed steps of Powell's method are presented in Algorithm 2 [47].
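A compact sketch of the procedure just described is given below. It is an illustration under stated assumptions, not Algorithm 2 itself: the bidirectional line search is implemented here with a golden-section search over a fixed interval `[-radius, radius]`, and the names `powell_minimize`, `radius`, and `tol` are hypothetical.

```python
import math


def _golden_section(g, lo, hi, tol=1e-8):
    """Minimize a one-dimensional function g on [lo, hi] (assumed unimodal)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while abs(hi - lo) > tol:
        if g(c) < g(d):
            hi = d
        else:
            lo = c
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
    return (lo + hi) / 2.0


def _line_min(f, x, d, radius):
    """Bidirectional search: minimize f(x + t*d) over t in [-radius, radius]."""
    t = _golden_section(lambda s: f([xi + s * di for xi, di in zip(x, d)]),
                        -radius, radius)
    candidate = [xi + t * di for xi, di in zip(x, d)]
    # Accept the step only if it actually improves the objective.
    return candidate if f(candidate) < f(x) else list(x)


def powell_minimize(f, x0, radius=10.0, tol=1e-12, max_iter=100):
    """Derivative-free local minimization in the spirit of Powell's method."""
    n = len(x0)
    # Start from the coordinate axes as search vectors.
    directions = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    start = list(x0)
    for _ in range(max_iter):
        x = list(start)
        best_drop, best_idx = 0.0, 0
        for i, d in enumerate(directions):
            before = f(x)
            x = _line_min(f, x, d, radius)
            drop = before - f(x)
            if drop > best_drop:
                best_drop, best_idx = drop, i
        if f(start) - f(x) < tol:
            return x  # no significant improvement in this sweep
        step = [xi - si for xi, si in zip(x, start)]
        norm = math.sqrt(sum(s * s for s in step))
        # Drop the most successful vector and add the normalized displacement.
        directions.pop(best_idx)
        directions.append([s / norm for s in step])
        start = _line_min(f, x, directions[-1], radius)
    return start


if __name__ == "__main__":
    quad = lambda v: (v[0] - 1) ** 2 + (v[1] - 2) ** 2 + 0.5 * v[0] * v[1]
    print(powell_minimize(quad, [0.0, 0.0]))
```

The fixed search interval keeps the sketch short; a production line search would bracket the minimum adaptively instead.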

The Proposed Algorithm.
Over the last years, many research results have been published in the field of evolutionary algorithms [7], and they show that hybridizations of metaheuristics and local search techniques are exceptionally successful. Successful hybridizations have been proposed especially for combinatorial and discrete solution spaces [48,49]. Interestingly, few results have been reported for real-valued solution spaces, which is astonishing because many direct search methods are fairly fast optimizers [21]. Therefore, to overcome the shortcomings of the GWO algorithm, including slow convergence, a tendency to fall into local optima, low computation accuracy, and a low success rate of convergence [19,20], a GWO algorithm based on the Powell local optimization method is proposed, in which Powell's method serves as a local search operator. The performing probability $p$ of Powell's method is set to 0.5. We also tried varying the number of grey wolves (population size $n$) and the performing probability $p$ of Powell's method. From our simulations, we found that $n = 30$ and $p = 0.5$ are sufficient for most optimization problems, so we use fixed $n = 30$ and $p = 0.5$ in the rest of the simulations.
In the following section, various benchmark functions (Table 1) are employed to investigate the efficiency of the PGWO algorithm (see Algorithm 3). Some parameters must be initialized before running; Table 2 lists the initial values used for these algorithms.

Experimental Results and Discussion
The experimental results are presented in Table 3. The results are averaged over 20 independent runs; bold results mean that PGWO is better, while underlined results mean that the other algorithm is better. Best, Worst, Mean, and Std. represent the optimal fitness value, worst fitness value, mean fitness value, and standard deviation, respectively. Note that the Matlab code of the GGSA algorithm is given in http://www.alimirjalili.com/Projects.html.
To improve the performance evaluation of evolutionary algorithms, statistical tests should be conducted [52]. To determine whether the results of PGWO differ statistically from the best results of the other algorithms, a nonparametric test known as Wilcoxon's rank-sum test [53,54] is performed at the 5% significance level. The $p$ values calculated in Wilcoxon's rank-sum test comparing PGWO and the other algorithms over all the benchmark functions are given in Table 4. According to [52], $p$ values < 0.05 can be considered as sufficient evidence against the null hypothesis.

Comparison of Experiment Results.
As shown in Table 3, PGWO has the best result for $f_1$. For $f_2$, PGWO provided better results than the other algorithms across Best, Worst, and Mean, while the Std. of GWO is better than that of PGWO. The unimodal benchmark functions have only one global solution without any local optima, so they are very suitable for examining exploitation. Hence, these results indicate that the proposed method provides greatly improved exploitation compared to the other algorithms.
However, multimodal benchmark functions have many local minima. For these functions the final results are more important, because they reflect the ability of the algorithm to escape from poor local optima and obtain the global optimum. We ran the experiments on $f_3$ to $f_5$. As seen from Table 3, PGWO provides the best results.
Functions $f_6$ and $f_7$ are fixed-dimension multimodal benchmark functions with only a few local minima, and their dimensions are also small. In this case, it is hard to judge the properties of the algorithms. The major difference from functions $f_3$ to $f_5$ is that functions $f_6$ and $f_7$ appear to be simpler because of their low dimension and smaller number of local minima. For $f_6$, CS provided better results than the other algorithms across Mean, Std., and Worst. For $f_7$, the majority of the algorithms can find the optimal solution, but PSO is more stable than the other algorithms in terms of Std.
The $p$ values of Wilcoxon's rank-sum test in Table 4 show that the results of PGWO on $f_5$ are not significantly better than those of PSO. However, PGWO achieves significant improvement on all remaining benchmark functions compared to the other algorithms. This is evidence that the proposed algorithm performs well on unimodal, multimodal, and fixed-dimension multimodal benchmark functions.
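To make the significance test concrete, the following sketch computes a two-sided Wilcoxon rank-sum $p$ value with the usual normal approximation. It is illustrative only: the function name `rank_sum_test` is hypothetical, no tie correction is applied to the variance, and the two samples in the usage example are made-up numbers, not results from this paper.

```python
import math


def rank_sum_test(sample_a, sample_b):
    """Two-sided Wilcoxon rank-sum test (normal approximation); returns (z, p)."""
    pooled = sorted((value, group)
                    for group, sample in enumerate((sample_a, sample_b))
                    for value in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Tied values share the average of the ranks they occupy.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n1, n2 = len(sample_a), len(sample_b)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return z, p


if __name__ == "__main__":
    # Hypothetical final fitness values from two algorithms over several runs.
    runs_a = [0.11, 0.09, 0.12, 0.10, 0.08, 0.13, 0.07, 0.10]
    runs_b = [0.31, 0.28, 0.35, 0.30, 0.29, 0.33, 0.27, 0.32]
    z, p = rank_sum_test(runs_a, runs_b)
    print(z, p, "significant at 5%" if p < 0.05 else "not significant")
```

A $p$ value below 0.05 would be read, as in the text, as sufficient evidence against the null hypothesis that both algorithms produce results from the same distribution.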
The convergence curves of the six algorithms are illustrated in Figure 1. As can be seen in the figure, PGWO has a much better convergence rate than the other algorithms on all benchmark functions.
According to this comprehensive comparative study and discussion, the proposed algorithm is able to significantly improve the performance of GWO and overcome its major shortcomings. For that reason, in the next section, we apply the PGWO algorithm to the clustering problem.

The Clustering Criterion.
Clustering is the process of grouping a set of data objects into a number of clusters or groups. The aim of clustering is to make the data within a cluster highly similar to each other while being very dissimilar to objects in other clusters. Dissimilarities and similarities are evaluated based on the attributes of the data sets, using a distance metric. The most popular distance metric is the Euclidean distance [55]. Let $x = (x_1, x_2, \ldots, x_D)$ and $y = (y_1, y_2, \ldots, y_D)$ be two objects described by $D$ numeric attributes; the Euclidean distance between objects $x$ and $y$ is
$$d(x, y) = \sqrt{\sum_{i=1}^{D} (x_i - y_i)^2}. \tag{5}$$
For $N$ given objects, the clustering problem is to minimize the sum of squared Euclidean distances between the objects and to allocate each object to one of $K$ cluster centers [55]. Clustering aims at finding the cluster centers by minimizing the objective function, which is defined as follows [26]:
$$J = \sum_{k=1}^{K} \sum_{x_i \in C_k} d(x_i, z_k), \tag{6}$$
where $K$ indicates the number of clusters, $C_k$ indicates the $k$th cluster, $z_k$ indicates the $k$th cluster center, and $d(x_i, z_k)$ indicates the distance of sample $x_i$ to the corresponding cluster center; namely, $d(x_i, z_k) = \|x_i - z_k\|$.

PGWO Algorithm on Clustering.
In clustering analysis, each element in the data set is a $D$-dimensional vector. The position of a grey wolf represents the $K$ cluster centers, so each grey wolf is a $K \times D$-dimensional vector. For each grey wolf $i$, its position is denoted as a vector $X_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,K \times D})$. In the initialization phase, we use the maximum and minimum values of each component of the data set (which is to be grouped) as the initialization search scope of the grey wolves, and the initial solution is randomly generated in this range. We use (6) to calculate the fitness of each grey wolf; the main steps of the fitness function are shown in Algorithm 4.
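The encoding and fitness evaluation described here can be sketched as follows. This is a minimal example, assuming a wolf's position is a flat list that concatenates the cluster centers; the name `clustering_fitness` and the toy data are hypothetical.

```python
import math


def clustering_fitness(position, data, k):
    """Sum of Euclidean distances from each sample to its closest center.

    `position` is a flat vector of length k * dim holding k cluster centers.
    """
    dim = len(data[0])
    centers = [position[c * dim:(c + 1) * dim] for c in range(k)]
    total = 0.0
    for x in data:
        # Assign the sample to the closest center and add that distance.
        total += min(math.dist(x, z) for z in centers)
    return total


if __name__ == "__main__":
    # Toy data: two well-separated groups in the plane.
    data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
    wolf = [0.0, 1.0, 10.0, 1.0]  # two centers: (0, 1) and (10, 1)
    print(clustering_fitness(wolf, data, k=2))
```

An optimizer then only needs to minimize this scalar function over the box defined by the per-attribute minimum and maximum of the data.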

Data Clustering Experimental Results and Discussion.
To verify the performance of the proposed PGWO approach for clustering, we compare the results of the K-means, GGSA, CS, ABC, PSO, GWO, and PGWO clustering algorithms using nine different data sets that are selected from the UCI machine learning repository [56].
Artificial data set ($N = 600$, $D = 2$, and $K = 4$) is a two-featured problem with four unique classes. A total of 600 patterns were drawn from four independent bivariate normal distributions with mean vector $\mu$ and covariance matrix $\Sigma$, where the classes are indexed by $i = 1, 2, 3, 4$ with $m_1 = -3$, $m_2 = 0$, $m_3 = 3$, and $m_4 = 6$ [45,57].
Iris data ($N = 150$, $D = 4$, and $K = 3$) is a data set with 150 random samples of flowers from the Iris species setosa, versicolor, and virginica collected by Anderson [58]. From each species there are 50 observations of sepal length, sepal width, petal length, and petal width in cm. This data set was used by Fisher [59] in his introduction of the linear-discriminant-function technique [28,56,57].
Contraceptive method choice ($N = 1473$, $D = 10$, and $K = 3$), CMC for short, is a subset of the 1987 National Indonesia Contraceptive Prevalence Survey. The samples are married women who either were not pregnant or did not know if they were at the time of the interview. The problem is to predict the current contraceptive method choice (no use, long-term methods, or short-term methods) of a woman based on her demographic and socioeconomic characteristics [28,56,57].
Seeds data ($N = 210$, $D = 7$, and $K = 3$) is a data set that consists of 210 patterns belonging to three different varieties of wheat: Kama, Rosa, and Canadian. For each variety, there are 70 observations of area $A$, perimeter $P$, compactness $C$ ($C = 4\pi A / P^2$), length of kernel, width of kernel, asymmetry coefficient, and length of kernel groove [56].
Statlog (Heart) data ($N = 270$, $D = 13$, and $K = 2$) is a heart disease database similar to a database already present in the repository (heart disease databases) but in a slightly different form [56].
Wine data ($N = 178$, $D = 13$, and $K = 3$) is also taken from the UCI repository. These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines. There are 178 instances with 13 numeric attributes in the wine data set. All attributes are continuous, and there are no missing attribute values [28,56,57].
Balance scale data ($N = 625$, $D = 4$, and $K = 3$) is a data set that was generated to model psychological experimental results. Each example is classified as having the balance scale tip to the right, tip to the left, or be balanced. The attributes are the left weight, the left distance, the right weight, and the right distance. The correct way to find the class is to take the greater of (left-distance * left-weight) and (right-distance * right-weight). If they are equal, the scale is balanced [56].
Haberman's Survival ($N = 306$, $D = 3$, and $K = 2$) is a data set that contains cases from a study conducted between 1958 and 1970 at the University of Chicago's Billings Hospital on the survival of patients who had undergone surgery for breast cancer. It records the survival status of each patient (two classes) together with the age of the patient at the time of operation, the patient's year of operation, and the number of positive axillary nodes detected [56].
The intracluster distances obtained by the seven clustering algorithms over 20 independent runs are compared in Table 5. Table 6 reports the $p$ values produced by Wilcoxon's rank-sum test comparing PGWO and the other algorithms over all the data sets.
For the Art data set, the optimum value, the worst value, and the average value of PGWO and GGSA are all 5.1390E+02, while the standard deviation of GGSA is better than that of PGWO. PSO matches this value only for the optimum solution, 5.1390E+02.
For the Iris, Cancer, and Seeds data sets, PGWO, PSO, and GGSA provide the optimum value in comparison to those obtained by the other methods. However, the worst value, the average value, and the standard deviation of PGWO are superior to those of the other methods.
For the Heart data set, PGWO, PSO, and CS all find the optimum solution 1.0623E+04, which means that they can all find the global solution. However, PGWO is slightly better in terms of the worst value, the average value, and the standard deviation.
For the CMC and Wine data sets, Table 5 shows that the average, best, worst, and standard deviation values of the fitness function for the PGWO algorithm are much smaller than those of the other six methods. The PGWO clustering algorithm is capable of providing the same partition of the data points in all runs.
For the balance scale data set, the optimum values of the fitness function for PGWO, PSO, and GGSA are 1.4238E+03, which means that they can all find the global solution. The optimum values of the fitness function for GWO and K-means are 1.4239E+03, which is close to the result of PGWO, PSO, and GGSA. However, the standard deviations of GWO, K-means, PGWO, PSO, GGSA, and CS are 6.8756E-01, 3.6565E+00, 8.6619E-01, 9.7396E-01, 1.6015E+00, and 7.3239E-01, respectively. The standard deviations show that GWO is more stable than the other methods.
For Haberman's Survival data set, the optimum value, the worst value, and the average value of the fitness function for CS and PSO are almost the same. The results of the CS and PSO algorithms are better than those of the other methods.
The $p$ values in Table 6 show that the results of PGWO are significantly better on the Art data set, the Cancer data set, ... From Table 6, we conclude that for the Iris data set PGWO performs better than the other algorithms except for GGSA. Since the $p$ value of PGWO versus GWO for the balance scale data set is greater than 0.05, there is no statistically significant difference between the two. For Haberman's Survival data set, PGWO performs significantly better in three of the six pairwise comparisons with the other algorithms. So, it can be claimed that PGWO provides better results than the other algorithms across the majority of the data sets.
Figure 2 shows the convergence curves of clustering on the data sets over 20 independent runs. As can be seen from the figure, the convergence speed of PGWO is the fastest.
Figure 3 shows the ANOVA tests of clustering on the data sets over 20 independent runs. As seen from Figure 3, PGWO is very stable for the majority of the data sets. For the Art, Seeds, and CMC data sets, PGWO, PSO, and GGSA obtain relatively stable optimal values. For the Heart and Wine data sets, the stability of PGWO, PSO, and CS is outstanding. For the Cancer data set, most of the algorithms obtain a stable optimal value, except for the ABC and PSO algorithms. For the Iris data set, PGWO is clearly better in terms of stability. For the balance scale data set, GWO obtains a relatively stable optimal value. For Haberman's Survival data set, the stability of CS and PSO is the best, but PGWO and GGSA follow them closely.
The clustering results of the PGWO algorithm on the Art, Iris, and Survival data sets are visualized in Figure 4. It can be seen from Figure 4 that the PGWO algorithm performs well on the Art, Iris, and Survival data sets.
In summary, the results show that the proposed method outperforms the other algorithms across the majority of benchmark functions. Furthermore, the test results on clustering problems show that PGWO is able to provide very competitive results. Therefore, it appears from this comparative study that the proposed method has merit in the field of evolutionary algorithms and optimization.

Conclusion and Future Works
To apply the grey wolf optimizer to complex optimization problems efficiently, this paper proposed a novel grey wolf optimizer based on the Powell local optimization method, namely, PGWO. In PGWO, the original GWO algorithm is first applied to shrink the search region to a more promising area. Thereafter, Powell's method is used as a critical complement that performs the local search, exploiting the limited area intensively to obtain better solutions. PGWO attempts to combine the merits of GWO and Powell's method in order to keep the grey wolves from getting trapped in inferior local optimal regions. PGWO gives the grey wolves more diverse exemplars to learn from, as the grey wolves are updated each generation and new grey wolves are formed to search in a larger search space. With both techniques combined, PGWO can balance exploration and exploitation and effectively solve complex problems. The experimental results show the effectiveness of Powell's method in terms of solution quality and convergence speed. The proposed algorithm is benchmarked on seven well-known test functions, and the results are compared with those of GGSA, CS, ABC, PSO, and GWO. The results show that the PGWO algorithm is capable of providing very competitive results compared to these well-known metaheuristics. Because of the superior performance of the PGWO algorithm, we use it to solve clustering problems.
The algorithm has been tested on an artificial data set and eight real data sets. To assess the performance of the PGWO algorithm on clustering problems, we compare it with the original GWO, GGSA, CS, ABC, PSO, and K-means. The results show that the PGWO algorithm significantly outperforms the others on the majority of the data sets in terms of the average value and standard deviation of the fitness function. Moreover, the experimental results demonstrate that the proposed PGWO algorithm can be considered a feasible and efficient method for solving optimization problems.
Our future work will focus on two issues. On the one hand, we will apply the proposed PGWO algorithm to higher-dimensional problems and larger numbers of patterns. On the other hand, the PGWO clustering algorithm will be extended to determine the optimal number of clusters dynamically.

Figure 1: The convergence curves of the average fitness value over 20 independent runs.

Algorithm 4: Main steps of the fitness function.
(1) For data vector $x_i$
(2) Calculate the Euclidean distance by (5)
(3) Assign $x_i$ to the closest cluster center
(4) Calculate the measure function by (6)
(5) End For
(6) Return value of the fitness function

Table 1: Benchmark functions.

The GWO algorithm based on the Powell local optimization method adopts the powerful local optimization ability of Powell's method and embeds it into GWO as a local search operator. In this way, the proposed method has the potential to provide superior results compared to other state-of-the-art evolutionary algorithms. The general steps of PGWO are presented as follows, where $p$ indicates the performing probability of Powell's method.

Algorithm 3: PGWO algorithm.
Initialize the grey wolf population $X_i$ ($i = 1, 2, \ldots, n$) and parameters
Calculate the fitness of population
Find the first three agents $X_\alpha$, $X_\beta$, $X_\delta$
While ($t$ < Max number of iterations)
  Update the position of the current search agent by (4)
  If rand > $p$, choose current best solution $X_\alpha$ as a starting point and generate a new solution $X'_\alpha$ by Powell's method as illustrated in Algorithm 2
  If $f(X'_\alpha) < f(X_\alpha)$, replace $X_\alpha$ with $X'_\alpha$; otherwise, go to Step 6
  Calculate the fitness of population
  Update $X_\alpha$, $X_\beta$, $X_\delta$
  $t = t + 1$
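The probabilistic local search step of PGWO can be isolated in a small helper, sketched below. It is a hedged illustration: `refine_best` and its `local_search` argument are hypothetical names, and any derivative-free local optimizer with the signature `local_search(f, x)` (for example, an implementation of Powell's method) is assumed to be supplied by the caller.

```python
import random


def refine_best(f, best, local_search, p=0.5, rng=random):
    """With probability 1 - p, refine the current best solution locally.

    The refined point replaces `best` only if it has a lower objective value.
    """
    if rng.random() > p:
        candidate = local_search(f, best)  # e.g. Powell's method started at best
        if f(candidate) < f(best):
            return candidate
    return best


if __name__ == "__main__":
    sphere = lambda v: sum(c * c for c in v)
    # Stand-in local search for the demo: halve every coordinate.
    shrink = lambda f, x: [c / 2.0 for c in x]
    alpha = [1.0, -2.0]
    for _ in range(10):
        alpha = refine_best(sphere, alpha, shrink, p=0.5)
    print(alpha)
```

Keeping the local search behind a plain callable makes it easy to swap Powell's method for another direct search method without touching the outer GWO loop.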

Table 2: Initial parameters of algorithms.
indicates the maximum number of interactions;  is the current iteration.

Table 3: Results comparison of different optimization algorithms for 20 independent runs.

Table 5: Comparison of intracluster distances for the seven clustering algorithms over 20 independent runs.