A Feature Weighted Fuzzy Clustering Algorithm Based on Multistrategy Grey Wolf Optimization

Traditional fuzzy clustering is sensitive to initialization and ignores the importance differences between features, so its performance is often unsatisfactory. In order to improve clustering robustness and accuracy, this paper proposes a feature-weighted fuzzy clustering algorithm based on multistrategy grey wolf optimization. This algorithm can not only improve clustering accuracy by considering the different importance of features and assigning each feature a different weight but can also more easily obtain the global optimal solution and avoid the impact of the initialization process by employing multistrategy grey wolf optimization. The multistrategy optimization includes three components: a population diversity initialization strategy, a nonlinear adjustment strategy for the convergence factor, and a generalized opposition-based learning strategy. These enhance the population diversity, better balance exploration and exploitation, and further strengthen the global search capability, respectively. To evaluate the clustering performance of our algorithm, UCI datasets are selected for experiments. Experimental results show that this algorithm achieves higher accuracy and stronger robustness.


Introduction
Clustering technology is widely used in data mining, pattern recognition, machine learning, and image processing [1][2][3][4]. Existing algorithms can be divided into hard clustering and fuzzy clustering. Fuzzy C-Means (FCM) [5] is the representative algorithm of fuzzy clustering.
This algorithm constructs an objective function based on the intraclass distance according to the principle of intraclass compactness and obtains the final clustering result by optimizing this objective function. The FCM algorithm is simple and efficient, but its disadvantages are obvious. FCM uses the Lagrange multiplier method to derive the iterative formulas for the membership degrees and centroids. This method is not only sensitive to initialization but also cannot guarantee that the clustering result is globally optimal. Accordingly, Zhang et al. [6] used the particle swarm optimization (PSO) algorithm to find the globally best centroids and obtain the clustering results. Jiang et al. [7] proposed a multiview clustering algorithm using PSO with double weights, where PSO is used to find the centroids in each iteration. The grey wolf optimizer (GWO) algorithm has the following advantages: (1) strong global search ability; (2) fast convergence speed and good local search ability; (3) a simple principle with few parameters, making it easy to implement. Gupta and Deep [8] proposed an improved algorithm, RW-GWO, based on random walk to improve the grey wolf's search ability, and the algorithm was employed to find the optimal settings for the directional overcurrent relay problem, which is highly complex. In order to alleviate the premature convergence caused by stagnation at suboptimal solutions in classical GWO, an improved leadership-based GWO called GLF-GWO [9] was proposed. The GLF-GWO algorithm enhances the search efficiency of the leading hunters in GWO and provides better guidance to accelerate the search process. Yichu [10] proposed a Fuzzy C-Means clustering algorithm based on grey wolf optimization (GWOFCM), which has better clustering stability and robustness. However, the GWOFCM algorithm, just like the traditional FCM algorithm, ignores the importance differences between features. Keller et al.
[11] proposed a feature-weighted fuzzy clustering algorithm, which gives a different weight to each feature and achieves a better clustering effect. Similarly, Zhou et al. [12] calculated the weight of each feature using the maximum information entropy principle, proposed the EWFCM algorithm, and improved clustering accuracy.
Based on the above analysis, this paper proposes a feature-weighted Fuzzy C-Means clustering algorithm based on multistrategy grey wolf optimization (MSGWO-WFCM). This algorithm gives a different weight to each feature and uses MSGWO to find the globally optimal centroids, which both achieves global optimality and resolves the sensitivity to the initialization process. The rest of this paper is organized as follows. Section 2 introduces some algorithms related to our work. In Section 3, we improve GWO and propose the multistrategy GWO. In Section 4, we detail the feature-weighted clustering algorithm based on MSGWO. Section 5 demonstrates the effectiveness of our algorithm through experiments. Finally, conclusions are given in Section 6.

The WFCM Algorithm.
Given a dataset X = {x_1, ..., x_N} with N samples, where each sample has D-dimensional features, let v_1, ..., v_K be the K centroids. Let U be a K × N matrix whose element u_ci is the membership of the ith object to the cth cluster. The WFCM algorithm introduces a weight matrix W to distinguish the importance of different features. The objective function and constraints of this algorithm are shown as the following equations, where w_cd is the weight of the dth feature in the cth category, and m and t are two parameters that control the distribution of the membership degrees and feature weights, respectively. The Lagrange multiplier method is used to solve for the membership degrees and centroids, and the Lagrange function is constructed accordingly. By calculating the partial derivatives of equation (4), the updating formulas of the membership degrees, centroids, and feature weights can be obtained, as shown in the following equations. The WFCM algorithm completes the clustering process by alternately iterating the updating formulas of the memberships, centroids, and feature weights.
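The alternating updates described above can be sketched in NumPy as follows. Since the paper's numbered equations are not reproduced in this excerpt, the formulas below follow the standard WFCM derivation (memberships from weighted distances, centroids from membership-weighted means, and feature weights normalised per cluster); the function name `wfcm_step` is illustrative.

```python
import numpy as np

def wfcm_step(X, V, W, m=2.0, t=2.0, eps=1e-10):
    """One alternating WFCM update of memberships U, centroids V, weights W.
    X: (N, D) data, V: (K, D) centroids, W: (K, D) feature weights."""
    # weighted squared distances dist[c, i] = sum_d W[c,d]^t * (x[i,d] - v[c,d])^2
    diff2 = (X[None, :, :] - V[:, None, :]) ** 2          # (K, N, D)
    dist = np.einsum('kd,knd->kn', W ** t, diff2) + eps   # (K, N)
    # membership update (standard FCM form with the weighted distance)
    inv = dist ** (-1.0 / (m - 1.0))
    U = inv / inv.sum(axis=0, keepdims=True)              # columns sum to 1
    # centroid update: membership-weighted mean of the samples
    Um = U ** m
    V = (Um @ X) / Um.sum(axis=1, keepdims=True)
    # feature-weight update, normalised over features within each cluster
    diff2 = (X[None, :, :] - V[:, None, :]) ** 2
    s = np.einsum('kn,knd->kd', Um, diff2) + eps          # (K, D)
    invw = s ** (-1.0 / (t - 1.0))
    W = invw / invw.sum(axis=1, keepdims=True)
    return U, V, W
```

Iterating this step until the change in the objective falls below a threshold reproduces the alternating scheme described in the text.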

Original Grey Wolf Optimization.
The GWO algorithm is inspired by the social hierarchy and hunting behaviour of grey wolf populations [13]. The grey wolf is a predator at the top of the food chain, and most grey wolves live in groups, with an average pack size of 5-12. A grey wolf population has a very strict social hierarchy, which is divided into four levels from top to bottom: the α wolf, β wolf, δ wolf, and ω wolves. The α wolf is mainly responsible for hunting, choosing the habitat, and other decisions; the β wolf is the second leader, mainly responsible for assisting the leader in managing the group; the δ wolf is mainly responsible for reconnaissance, taking care of the old and the weak, hunting, etc. The ω wolves are at the bottom of the population. Although the ω wolves are subordinate to the other grey wolves, they are indispensable for balancing the internal relationships of the population. The hunting process of wolves includes the following stages:

Encircling Prey.
In this stage, the grey wolves surround their prey during hunting. The mathematical formulas are D = |C·X_p − X| and X(t+1) = X_p − A·D, where t represents the current iteration, A and C are two coefficient vectors, X_p is the position vector of the prey, and X represents the position vector of a grey wolf. A and C are obtained from A = 2a·r_1 − a and C = 2·r_2, where, over the course of the iterations, the value of a decreases linearly from 2 to 0, and r_1 and r_2 are two random vectors in the interval (0, 1).
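Under the standard GWO encircling equations above, a single wolf's update can be sketched as follows (the function name is illustrative):

```python
import numpy as np

def encircle(X, X_p, a, rng):
    """One encircling-prey update for a single wolf.
    X, X_p: position vectors of the wolf and the prey estimate."""
    A = 2.0 * a * rng.random(X.shape) - a   # A = 2a*r1 - a, so A is in [-a, a]
    C = 2.0 * rng.random(X.shape)           # C = 2*r2, so C is in [0, 2]
    D = np.abs(C * X_p - X)                 # distance to the (scaled) prey estimate
    return X_p - A * D                      # new position around the prey
```

Note that as a shrinks toward 0, A shrinks with it, so late-iteration moves stay close to the prey estimate.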

Hunting.
Hunting is usually guided by the α wolf, and the β and δ wolves may also take part. In order to simulate the hunting behaviour of the grey wolf, we assume that the α (the best candidate solution), β, and δ wolves have better knowledge of the potential location of the prey. Therefore, the first three optimal solutions obtained so far are preserved, and the other grey wolves (including the ω wolves) update their own positions according to these optimal solutions. The specific mathematical model is as follows:
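Assuming the standard GWO hunting model, in which a wolf computes one guided move per leader and averages the three, a minimal sketch is:

```python
import numpy as np

def hunt_update(X, leaders, a, rng):
    """Position update guided by the alpha, beta and delta wolves.
    leaders: iterable of the three best positions found so far."""
    candidates = []
    for X_l in leaders:
        A = 2.0 * a * rng.random(X.shape) - a   # fresh A, C per leader
        C = 2.0 * rng.random(X.shape)
        D = np.abs(C * X_l - X)
        candidates.append(X_l - A * D)          # move guided by this leader
    return np.mean(candidates, axis=0)          # average of the three moves
```

Averaging the three leader-guided candidates is what pulls every wolf, including the ω wolves, toward the region the pack currently believes contains the prey.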

Attacking Prey (Exploitation).
When the prey stops moving, the wolves begin to attack. In order to approach the prey, as the value of a decreases, the fluctuation range of A also decreases. In other words, in the iterative process, as the value of a decreases from 2 to 0, A takes a random value within [-a, a]. When the value of A is within the range [-1, 1], the next position of a wolf can be anywhere between its current position and the prey's position; that is, when |A| < 1, the wolves attack the prey.

Searching for Prey (Exploration).

In this stage, the grey wolves search for prey based on the locations of the α, β, and δ wolves. At first they separate from each other, and then they gather to attack the prey. To simulate this divergence mathematically, random values of A greater than 1 or less than −1 are used to force a grey wolf to deviate from the prey, which improves the exploration ability of the algorithm and enables a global search.

Some Improved Versions of GWO.
In recent years, in order to further improve optimization accuracy and search efficiency, many researchers have tried to improve GWO. Saremi et al. [14] improved GWO by introducing a dynamic evolutionary population operator, which enhanced the local search ability and accelerated convergence. Jayabarath et al. [15] embedded crossover and mutation operators into GWO to help solve economic scheduling problems. In order to improve the population diversity of GWO, a Levy-embedded GWO (LGWO) [16] was proposed, combining Levy flight and a greedy selection strategy with an improved hunting stage. A hybrid of the biogeography-based optimization algorithm and GWO was proposed [17] to help GWO jump out of local optima. In order to overcome the premature convergence problem of GWO, Wang et al. [18] combined the basic GWO with the Gaussian estimation-of-distribution (GED) algorithm and proposed the GEDGWO algorithm. Besides, a memory-based grey wolf optimizer (mGWO) [19] has also been proposed to make the balance between exploitation and exploration more stable.

Multistrategy Grey Wolf Optimization (MSGWO)
This paper improves GWO in three aspects: (1) for population initialization, a population diversity initialization strategy is proposed to improve the population diversity; (2) a nonlinear adjustment strategy for the convergence factor is proposed to better balance exploration and exploitation; (3) a generalized opposition-based learning strategy is adopted to further enhance the global search ability.

Population Diversity Initialization Strategy.
When solving function optimization problems, GWO generates the initial population randomly. This stochastic method cannot guarantee that the initial population covers the decision space of the problem well, and population diversity is easily lost. To solve this problem, we propose a semiuniform, semirandomized initialization method. In this method, half of the population is still generated by random initialization, and the other half is generated by global homogenization followed by local randomization: the search space X_i (i = 1, 2, ..., K) is evenly divided into equal-length subspaces, whose number equals half of the population size, and each initial individual value is randomly generated within a randomly selected subspace. Each subspace has exactly one chance to generate an individual, which lets the GWO algorithm retain the randomness of the population on the basis of overall uniformity and avoids the overconcentration of initial positions caused by purely random population initialization. Therefore, the population can be distributed relatively evenly across the whole search space, which ensures the randomness and diversity of the population [20]. Algorithm 1 gives the flow of the semiuniform, semirandomized initialization.
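Algorithm 1 is not reproduced in this excerpt; a minimal sketch of the semiuniform, semirandomized initialization described above is given below. The function name and the per-dimension treatment of the subspaces are illustrative assumptions.

```python
import numpy as np

def init_population(n, d, low, high, rng):
    """Semi-uniform, semi-random initialisation (sketch of Algorithm 1).
    Half the wolves are uniform-random; the other half get exactly one wolf
    per equal-width subinterval of each dimension, in shuffled order."""
    half = n // 2
    pop = np.empty((n, d))
    pop[:half] = rng.uniform(low, high, size=(half, d))  # purely random half
    m = n - half                                         # structured half
    edges = np.linspace(low, high, m + 1)                # m equal subintervals
    for j in range(d):
        order = rng.permutation(m)                       # each subspace used once
        pop[half:, j] = rng.uniform(edges[order], edges[order + 1])
    return pop
```

The structured half guarantees coverage of every subinterval, while the random half preserves unconstrained diversity.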

Nonlinear Adjustment Strategy of Convergence Factor.
It can be seen from equation (12) that the parameter A plays a crucial role in coordinating the global exploration and local exploitation capabilities of the GWO algorithm. When |A| > 1, the pack expands the search scope, and GWO has strong exploration ability. When |A| < 1, the pack narrows the search scope and searches in local areas, so GWO has strong exploitation ability. Equation (9) shows that A changes with the convergence factor a, whose value decreases linearly from 2 to 0 as the iterations proceed. However, the optimization process of GWO is a complex nonlinear process, and a linearly changing convergence factor a obviously cannot reflect the actual search process. Wei [21] proposed a grey wolf optimization algorithm with a nonlinear adjustment strategy for the control parameter, in which the convergence factor a changes nonlinearly with the number of iterations.
Standard benchmark results show that this nonlinear strategy has better optimization performance than the linear strategy. Inspired by the inertia weight update in an improved PSO algorithm [22], we propose a nonlinearly changing convergence factor update method, where a_initial and a_final are the initial and terminal values of the convergence factor a, respectively, t is the current iteration number, t_max is the maximum number of iterations, and k (k > 0) is the nonlinear adjustment coefficient. In equation (14), the convergence factor a changes nonlinearly as the number of iterations increases, which effectively balances the global and local search abilities of our algorithm.
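Equation (14) is not reproduced in this excerpt; a plausible form consistent with the description (a decays nonlinearly from a_initial to a_final, with exponent k) is sketched below. The exact expression in the paper may differ.

```python
def convergence_factor(t, t_max, a_initial=2.0, a_final=0.0, k=1.0 / 3.0):
    """Nonlinear convergence factor (hedged sketch of equation (14)).
    Decays from a_initial to a_final over the iterations; k < 1 keeps a
    large for longer (more exploration), k > 1 shrinks it sooner."""
    return a_final + (a_initial - a_final) * (1.0 - t / t_max) ** k
```

With k = 1 this reduces to the classical linear schedule, which makes the role of k as a shape parameter easy to see.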

Generalized Opposition-Based Learning.
Opposition-based learning (OBL) [23] can improve search performance. Based on OBL, Wang et al. [23] proposed the concept of generalized opposition-based learning (GOBL), and experiments show that this strategy has further advantages.
In order to coordinate the exploration and exploitation abilities, this paper applies the GOBL strategy to all individuals in the current population. By combining the opposite population with the current population and selecting the best individuals into the next generation, the diversity of the population is enhanced, which effectively reduces the probability of the algorithm falling into a local optimum.
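A minimal sketch of the GOBL step described above, assuming the common formulation x* = k·(a + b) − x with k ~ U(0, 1) and [a, b] the current population's per-dimension bounds, followed by elitist selection over the combined population (written here for a minimisation problem):

```python
import numpy as np

def gobl(pop, fitness_fn, rng):
    """Generalised opposition-based learning step: build opposite solutions
    and keep the best n of the combined 2n candidates."""
    a = pop.min(axis=0)                        # dynamic lower bound per dimension
    b = pop.max(axis=0)                        # dynamic upper bound per dimension
    k = rng.random()                           # one random k per generation
    opp = k * (a + b) - pop                    # opposite population
    both = np.vstack([pop, opp])
    fit = np.array([fitness_fn(x) for x in both])
    best = np.argsort(fit)[: len(pop)]         # minimisation: keep the smallest
    return both[best]
```

Because selection is over the union of the current and opposite populations, the best retained fitness can never be worse than the best in the current population.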

MSGWO.
The steps of the MSGWO algorithm are as follows:

Step 1: set the parameters, including the population size N, the dimension d, the maximum number of iterations t_max, the initial value a_initial and terminal value a_final of the convergence factor, and the adjustment coefficient k, and initialize a, A, and C.

Step 2: generate N individuals as the initial population in the search space, and let t = 1.

Step 3: calculate the fitness value of each individual grey wolf, and select the best three grey wolves X_α, X_β, and X_δ.

Step 4: if t < t_max, update the position of each individual in the group; otherwise, the algorithm ends.

Step 5: implement the GOBL strategy for all individuals in the current population, update the position of each individual, and select the positions with the top three fitness values as the positions of X_α, X_β, and X_δ, respectively.

Step 6: calculate the value of the convergence factor a according to equation (14), and then calculate the values of A and C according to equations (9) and (10).

Step 7: if the convergence condition is met, the algorithm ends; otherwise, let t = t + 1 and return to Step 4.

The pseudocode of the MSGWO algorithm is listed as Algorithm 2.
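The steps above can be combined into a compact, self-contained sketch of the whole MSGWO loop (written for a minimisation problem; the bounds, the exact form of equation (14), and the parameter defaults are illustrative assumptions):

```python
import numpy as np

def msgwo(fitness, d, n=30, t_max=200, low=-5.0, high=5.0, k=1.0 / 3.0, seed=0):
    """Sketch of MSGWO: diversity initialisation, nonlinear convergence
    factor, leader-guided position update, and a GOBL selection step."""
    rng = np.random.default_rng(seed)
    half = n // 2
    pop = rng.uniform(low, high, size=(n, d))       # random half
    edges = np.linspace(low, high, (n - half) + 1)  # subspaces for the other half
    for j in range(d):
        order = rng.permutation(n - half)           # one wolf per subspace
        pop[half:, j] = rng.uniform(edges[order], edges[order + 1])
    fit = np.array([fitness(x) for x in pop])
    for t in range(t_max):
        a = 2.0 * (1.0 - t / t_max) ** k            # nonlinear convergence factor
        leaders = pop[np.argsort(fit)[:3]]          # alpha, beta, delta (copies)
        for i in range(n):
            moves = []
            for X_l in leaders:
                A = 2.0 * a * rng.random(d) - a
                C = 2.0 * rng.random(d)
                moves.append(X_l - A * np.abs(C * X_l - pop[i]))
            pop[i] = np.clip(np.mean(moves, axis=0), low, high)
        kk = rng.random()                           # GOBL over dynamic bounds
        opp = np.clip(kk * (pop.min(0) + pop.max(0)) - pop, low, high)
        both = np.vstack([pop, opp])
        bfit = np.array([fitness(x) for x in both])
        keep = np.argsort(bfit)[:n]                 # elitist selection of best n
        pop, fit = both[keep], bfit[keep]
    best = int(np.argmin(fit))
    return pop[best], fit[best]
```

On a simple unimodal function such as the sphere, this loop should drive the best fitness close to 0 within a few hundred iterations.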

MSGWO-WFCM
This paper proposes a feature-weighted fuzzy clustering algorithm based on multistrategy grey wolf optimization (MSGWO-WFCM). In the clustering process, the MSGWO algorithm is used instead of the Lagrange multiplier method to find the optimal centroids, which ensures that our algorithm not only finds the global optimal solution more easily but is also insensitive to the initialization process.

Fitness Value Function.
The fitness value function is the benchmark for evaluating the quality of individuals: the larger the function value, the better the individual, and vice versa. In GWO, this function is used to determine the grey wolf hierarchy. Specifically, the α, β, and δ wolves with the highest fitness are retained and guide the ω wolves in searching for prey. In this paper, the fitness function is set as equation (15).
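Equation (15) is not reproduced in this excerpt. One common choice consistent with "larger is better" is the reciprocal of the weighted within-cluster objective, sketched here as an assumption rather than the paper's exact definition:

```python
import numpy as np

def fitness(V, X, U, W, m=2.0, t=2.0):
    """Hedged sketch of a fitness function in the spirit of equation (15):
    the reciprocal of the weighted within-cluster objective J, so that a
    smaller J (tighter clusters) yields a larger fitness.
    V: (K, D) centroids, X: (N, D) data, U: (K, N) memberships, W: (K, D) weights."""
    diff2 = (X[None, :, :] - V[:, None, :]) ** 2           # (K, N, D)
    J = np.einsum('kn,kd,knd->', U ** m, W ** t, diff2)    # weighted objective
    return 1.0 / (1.0 + J)                                 # larger is better
```

Under this convention, centroids that sit closer to their cluster members score strictly higher.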

Algorithm Steps.

The implementation steps of the MSGWO-WFCM algorithm are as follows:

Step 1: set the parameters, including the population size N, the dimension d, the maximum number of iterations t_max, and the adjustment coefficient k, and initialize a, A, and C.

Step 2: generate N individuals as the initial population in the search space, and let t = 1.

Step 3: if t < t_max, calculate the fitness value f of each individual according to equation (15), and select the three best wolves X_α, X_β, and X_δ according to fitness; otherwise, the algorithm ends.

Step 4: update the values of the parameters a, A, and C.

Step 5: update the position of each individual according to equations (11)-(13).

Step 6: implement the GOBL strategy for all individuals in the current population, and update the position of each individual.

Step 7: recalculate the fitness value f of each individual according to equation (15).

Step 8: if the convergence condition is satisfied, the algorithm ends; otherwise, let t = t + 1 and return to Step 3.

The pseudocode of the MSGWO-WFCM algorithm is given as Algorithm 3.

Benchmark Functions.
In order to evaluate the effectiveness of the MSGWO algorithm, eight benchmark test functions (see Table 1) are selected for the experiments. In Table 1, f_1-f_5 are unimodal test functions and f_6-f_8 are multimodal test functions. Each algorithm is run 30 times independently; the average value is used to reflect the convergence accuracy, and the standard deviation is used to reflect the stability.
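The concrete definitions of f_1-f_8 are given in Table 1, which is not reproduced here; as an illustration, two benchmarks that commonly appear in such suites are the unimodal sphere function and the multimodal Rastrigin function:

```python
import numpy as np

def sphere(x):
    """Typical unimodal benchmark: smooth bowl with global minimum 0 at the origin."""
    return float(np.sum(x ** 2))

def rastrigin(x):
    """Typical multimodal benchmark: many local minima, global minimum 0 at the origin."""
    return float(10.0 * x.size + np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x)))
```

Unimodal functions probe convergence accuracy and speed, while the many local minima of multimodal functions probe an optimizer's ability to escape suboptimal regions.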

Experimental Results and Analysis.
The MSGWO algorithm is used to carry out numerical experiments on the above eight standard test functions. The results are compared with those of the PSO [24], GWO, HGSO [25], AO [26], AOA [27], and MRFO [28] algorithms in terms of the average value and standard deviation. To achieve a fair comparison, the iteration number and population size of all optimizers are set to 500 and 30, respectively. The values used for the main controlling parameters of the comparative algorithms can be seen in Table 2. The analysis was performed on the MATLAB 2018a platform on a computer running Windows 10 64-bit Professional with 16 GB RAM.

Table 3 shows the experimental results on the eight standard test functions. It can be seen from Table 3 that, for functions f_1, f_3, f_6, and f_7, MSGWO converges to the theoretical optimal value 0. For functions f_4 and f_5, the average value of the MSGWO algorithm over the 30 runs is very close to the global optimal solution. In addition, compared with the other algorithms, the standard deviations of MSGWO on the eight standard functions are the smallest, and seven of them are 0, indicating that the stability of the MSGWO algorithm is better. With the same population size and iteration count, compared with the PSO, GWO, HGSO, AO, AOA, and MRFO algorithms, the MSGWO algorithm has better averages and standard deviations and therefore has advantages in stability and optimization performance. The diversity of MSGWO can be analyzed by comparing the diversity curves of classical GWO and MSGWO.
Algorithm 2: MSGWO.
(1) set the parameters
(2) generate N individuals as the initial population in the search space according to Algorithm 1
(3) calculate the fitness value of each wolf
(4) determine the values of X_α, X_β, and X_δ, and let t = 0
(5) while t < t_max
(6) for i = 1 to N
(7) update the position of the ith grey wolf according to equations (11)-(13)
(8) end for
(9) implement the GOBL strategy for all individuals in the current population to update the position of each individual
(10) calculate the fitness value of each wolf
(11) update and save X_α, X_β, and X_δ
(12) calculate the value of the convergence factor a according to equation (14), and then calculate the values of A and C according to equations (9) and (10)

For the diversity analysis, the distance between two solutions X = (x_1, x_2, ..., x_d) and Y = (y_1, y_2, ..., y_d) is taken as the Euclidean distance between them. From the diversity curves drawn in Figure 1, it can be seen that the average distance between the search agents in MSGWO is less than in classical GWO, which indicates a better balance between exploration and exploitation and a better convergence rate for MSGWO. It can also be observed that the leading hunters are improved by the nonlinear convergence factor strategy compared to classical GWO, since MSGWO provides better solutions to these functions.
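The diversity curves of Figure 1 plot the average pairwise distance between search agents over the iterations; a minimal sketch of that measure (assuming Euclidean distance) is:

```python
import numpy as np

def population_diversity(pop):
    """Average Euclidean distance between all pairs of search agents;
    one value per iteration gives a diversity curve."""
    n = len(pop)
    dists = [np.linalg.norm(pop[i] - pop[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists))
```

A shrinking value over the iterations indicates the pack is contracting around a candidate solution, i.e. shifting from exploration to exploitation.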
Due to space limitations, Figure 2 only shows the convergence curves of MSGWO and the comparison algorithms on six functions with a fixed number of iterations. It can be clearly seen that, compared with PSO, GWO, HGSO, AO, AOA, and MRFO, our MSGWO algorithm has a faster convergence speed and higher convergence accuracy.

Influence of Parameters.
In GWO, the nonlinear adjustment parameter k controls the change of the convergence factor and therefore has a great influence on performance. In this section, five different values (k = 1/3, k = 1/2, k = 1, k = 2, and k = 3) are selected to analyse the influence on the performance of MSGWO through numerical experiments. The experimental results of MSGWO with different adjustment coefficients k are given in Table 4.
It can be seen from Table 4 that when k = 1/3, the optimization performance of MSGWO is the best. For functions f_6, f_7, and f_8, the value of k has little effect on MSGWO. For functions f_1 and f_3, with k = 1/3, 1/2, and 1, the MSGWO algorithm converges to the theoretical optimal value 0, which is better than the results of the other two settings. For functions f_2 and f_4, with k = 1/3 and 1/2, the MSGWO algorithm converges to the theoretical optimal value 0, which is better than the results of the other three settings; for those other three settings, the smaller the value of k, the closer the MSGWO algorithm converges to the theoretical optimal value. For function f_5, the smaller the value of k, the closer the MSGWO algorithm converges to the theoretical optimal value. The experimental results show that the parameter k has a clear influence on the results of MSGWO.

Experimental Preparation.
In order to evaluate the clustering effect of our MSGWO-WFCM algorithm, the FCM, GWOFCM, HPSOFCM [29], WFCM, EWFCM, and SFWFCM [30] algorithms are selected for comparative experiments. In our experiments, the values used for the main parameters of the clustering algorithms can be seen in Table 5. Six standard datasets are selected from the UCI database, and the information of each dataset is shown in Table 6. Accuracy is selected as the evaluation index for the experimental results.

Algorithm 3: MSGWO-WFCM.
Input: dataset X = {x_1, ..., x_N}, number of clusters K, parameters m and t, population size n, initial value a_initial and terminal value a_final of the convergence factor, threshold parameter ξ, maximum iterations t_max, and fitness values f_t(V_t) of the t-th generation population
Output: clustering result vector q
(1) Use the population diversity strategy to initialize n populations; the centroid matrix of the ith population is V_i (i = 1, ..., n);
(2) Initialize the weight matrix W_i corresponding to the ith individual;
(3) Let t = 1;
(4) for i = 1 to n
(5) Update u_ci with equation (5);
(6) Update w_cd with equation (7);
(7) Update the fitness value f_1(V_i) of the ith population with equation (15);
(8) end for
(9) while t < t_max
(10) Select the best three wolves x_α, x_β, and x_δ according to the fitness values;
(11) Calculate the value of the convergence factor a according to equation (14), and calculate the values of A and C according to equations (9) and (10);
(12) Update the position of each individual according to equations (11)-(13);
(13) Implement the GOBL strategy for all individuals in the current population, and update the position of each individual;
(14) t = t + 1;
(15) for i = 1 to n
(16) Update u_ci with equation (5);
(17) Update w_cd with equation (7);
(18) Update the fitness value f_t(V_i) of the ith population with equation (15);
(19) end for
(20) if (min f_t(V_i) - min f_{t-1}(V_i)) < ξ
(21) break;
Among the six datasets, Iris is a classic dataset in machine learning; it contains three categories, each of which contains 50 samples. The Hab dataset comes from a study on the survival of patients who underwent breast cancer surgery. The Ion dataset is a radar dataset collected by the system in Goose Bay, Labrador. The TSE dataset was provided by students from Gazi University in Turkey. The ecoli dataset concerns protein localization sites in E. coli.

Experimental Results and Analysis.
When initializing the centroids, K samples are randomly selected as the centroid matrix. Each algorithm is run 10 times, and the average results are taken as the final experimental results, shown in Table 7.
It can be seen from Table 7 that the accuracy of the MSGWO-WFCM algorithm is the highest on the Iris dataset, 6.8% higher than FCM, 9% higher than WFCM, 4.6% higher than EWFCM, and 9.67% higher than the SFWFCM algorithm. On the Hab dataset, the accuracy of the MSGWO-WFCM algorithm is only 1.51% higher than that of the WFCM algorithm and 1.77% higher than that of the EWFCM algorithm, but 24.97% higher than FCM and 24.22% higher than GWOFCM. On the Ion dataset, the accuracy of the MSGWO-WFCM algorithm is 1.2% higher than that of EWFCM, 4.42% higher than that of HPSOFCM, 5.1% higher than that of SFWFCM, 6.64% higher than that of GWOFCM, and 11.11% higher than that of WFCM. On the TSE dataset, the accuracy of the MSGWO-WFCM algorithm is equal to that of WFCM, 0.8% higher than EWFCM, 3.51% higher than FCM, 10.2% higher than HPSOFCM, and 14.79% higher than the GWOFCM algorithm. On the ecoli dataset, the accuracy of MSGWO-WFCM is 22.6% higher than WFCM, 2.26% higher than FCM, 0.68% higher than HPSOFCM, and 6.81% higher than the GWOFCM algorithm. On the pendigits dataset, the accuracy of MSGWO-WFCM is 2.57% higher than WFCM, 3.93% higher than GWOFCM, 3.76% higher than EWFCM, 6.6% higher than FCM, 11.06% higher than SFWFCM, and 16.79% higher than the HPSOFCM algorithm. It is obvious that the accuracy of the MSGWO-WFCM algorithm is better than that of the contrast algorithms.
In order to evaluate the ability of the MSGWO-WFCM algorithm to find the optimal solution, the results of the seven algorithms over 10 runs are shown in Figure 3. Figure 3(a) shows the results on the Iris dataset; the MSGWO-WFCM algorithm is more stable than WFCM, EWFCM, and SFWFCM and more accurate than the FCM, GWOFCM, and HPSOFCM algorithms. From Figure 3(b), it can be seen that the accuracy of the WFCM and MSGWO-WFCM algorithms is much higher than that of the FCM and GWOFCM algorithms on the Hab dataset, and the MSGWO-WFCM algorithm is more stable than HPSOFCM, EWFCM, and SFWFCM. The results on the Ion dataset are illustrated in Figure 3(c); this group of results shows that the MSGWO-WFCM algorithm is superior to the comparison algorithms in terms of both accuracy and stability. From Figure 3(d), the results on the TSE dataset, it can be seen that the accuracy of the MSGWO-WFCM algorithm is higher than that of the FCM, GWOFCM, HPSOFCM, and SFWFCM algorithms and equivalent to that of the WFCM and EWFCM algorithms. From Figure 3(e), although the MSGWO-WFCM algorithm shows some fluctuation on the ecoli dataset, its accuracy is still higher than that of the other six algorithms. In Figure 3(f), on the pendigits dataset, the MSGWO-WFCM algorithm also shows some fluctuation, but less than WFCM, and its accuracy is higher than that of the other five algorithms.

Conclusions

This paper proposes a multistrategy grey wolf optimization algorithm (MSGWO). First, a population diversity initialization strategy is introduced to enhance population diversity; second, a nonlinear adjustment strategy for the convergence factor is introduced to better balance exploration and exploitation; finally, a generalized opposition-based learning strategy further enhances the global search capability. The results show that MSGWO has better convergence speed and convergence accuracy.
In order to overcome the shortcomings of traditional fuzzy clustering, on the one hand, the importance differences between features are considered and different weights are assigned to them; on the other hand, MSGWO is used to update the centroids, ensuring the global optimality of the clustering results and effectively alleviating the impact of the initialization process. Experimental results show that the performance of MSGWO-WFCM in terms of accuracy and robustness is better than that of the comparison algorithms.
In the future, we will explore practical applications of the proposed methods in different fields, such as image segmentation, text mining, and medical problems. Furthermore, we will introduce other search strategies and/or splitting operators into GWO to enhance its guided search ability.