A New Chaotic Starling Particle Swarm Optimization Algorithm for Clustering Problems

A new method using the collective responses of starling birds is developed to enhance the global search performance of standard particle swarm optimization (PSO). The method is named chaotic starling particle swarm optimization (CSPSO). In CSPSO, the inertia weight is adjusted using a nonlinear decreasing approach, and the acceleration coefficients are adjusted using a chaotic logistic mapping strategy to avoid premature convergence of the search process. A dynamic disturbance term (DDT) is used in the velocity update to enhance the convergence of the algorithm. A local search method inspired by the behavior of starling birds uses the information of the nearest neighbors to determine a new collective position and a new collective velocity for selected particles. Two particle selection methods, based on Euclidean distance and on the fitness function, are adopted to ensure the overall convergence of the search process. Experimental results on benchmark function optimization and classic clustering problems verify the effectiveness of the proposed CSPSO algorithm.


Introduction
The particle swarm optimization (PSO) algorithm is a population-based global optimization method that uses an intelligent search strategy inspired by the flocking behavior of birds. PSO is a swarm intelligence algorithm. Each particle flies through the search space, also called the solution space, with a certain velocity. It updates its velocity and position through a linear combination of its own best position in history and the global best position in history. Compared with other evolutionary algorithms, PSO uses both the individual and global experiences of the particles and has a well-balanced mechanism between its exploitation and exploration abilities [1]. Therefore, it has been successfully applied to many difficult optimization problems [2].
PSO provides good exploitation and exploration performance for solving optimization problems. The global experience guides the direction of the particle population, and the individual experience gives a more precise direction in the search space. In this way, the particle population gradually moves closer to the global optimum. PSO also requires little computing time and is easy to implement. However, like most swarm intelligence algorithms, PSO can easily be trapped in local optima in later generations or iterations, and the search process may converge prematurely. Much work has been done to improve the standard or traditional PSO algorithm [3][4][5][6].
Data clustering has been an important technology in data analysis. The purpose of data clustering is to discover valuable patterns and implicit information [7]. Data clustering is an unsupervised classification technique. It assigns similar data to the same groups or clusters using the characteristics of the data, without any prior knowledge about the groups or clusters. Therefore, data clustering can extract potential patterns or knowledge from a dataset and can help in obtaining and understanding the valuable information in the data [8]. Because data clustering problems are NP-hard, traditional methods are sometimes ineffective in solving them [9]. Therefore, much work has been done to solve these problems with PSO, and some recent studies are summarized below.
Netjinda et al. [10] presented a new PSO procedure, called starling PSO, inspired by the collective responses of starlings. Dor et al. [11] proposed a dynamic topology DCluster algorithm based on two topologies, using a four-cluster approach and a fitness function to solve the underlying problem. Ali [12] presented a new variant of the position updating rule of PSO, in which each particle is treated as a bidimensional vector. Armano et al. [13] proposed a multiobjective clustering PSO algorithm with two objective functions defined to integrate data connectivity and cohesion. Niu et al. [14] proposed a population-based clustering algorithm that combines different PSO and k-means algorithms to help particles escape from local optima.
Exploring the topological neighborhood of the k-nearest neighbors and employing pattern search are considered useful tools for improving the performance of PSO [15]. Cormark [16] proposed a PSO procedure using Renyi entropy clustering, which contains two steps, initialization and particle removal. Bharti and Singh [17] proposed a binary PSO procedure with an opposition-based learning mechanism using chaotic mapping, dynamic inertia weight, and a mutation operator. Song et al. [18] added an environment factor to the velocity adjustment in PSO to make the behavior of the particles more robust. Liu et al. [19] proposed a modified coevolutionary multiswarm PSO procedure based on a new velocity update and similarity detection to solve multiobjective clustering problems.
Although PSO has been widely used in many fields and has shown great potential in solving optimization problems, it still has some limitations. For example, it is easily trapped in local optima and has a low convergence speed. So far, no effective methods have been developed to balance its local and global search abilities [20]. Therefore, more work is needed to enhance the performance of PSO [21]. Data clustering is one of the most popular data mining techniques for discovering potential information and knowledge from data, and it needs effective methods to obtain better clustering results. In this study, a chaotic starling particle swarm optimization (CSPSO) algorithm is proposed to obtain better clustering results by improving the performance of PSO.
CSPSO has three major parts. First, a chaotic mapping method, rather than random generation, is introduced to generate the acceleration parameters. Second, a dynamic disturbance term (DDT) is added to the velocity update to avoid trapping in local optima. Third, a local search strategy based on the behavior of starling birds is used to improve the search ability: the information of neighbors is collected to guide the direction of each particle.
The rest of this paper is organized as follows. The standard PSO and the clustering problem are described in Section 2. The CSPSO algorithm is developed and described in detail in Section 3, and its application to clustering problems is presented in Section 4. In Section 5, simulation experiments are conducted and CSPSO is compared with existing methods to analyze its effectiveness. Section 6 gives conclusions and future research directions.

Preliminary
In this section, the basic concepts of PSO and data clustering problems, related to the development of the proposed algorithm, are described in some detail.

The Standard PSO.
PSO is one of the swarm intelligence algorithms inspired by the social behavior of bird flocking [26]. In PSO, the population size is denoted by $N$ and the dimension of the search space by $D$. Each particle $i$ has three vectors:
- a position vector $X_i = \{x_{i1}, x_{i2}, \dots, x_{iD}\}$;
- a velocity vector $V_i = \{v_{i1}, v_{i2}, \dots, v_{iD}\}$;
- an individual best position in history $P_i = \{p_{i1}, p_{i2}, \dots, p_{iD}\}$.

The best position found by all particles in the swarm, called the global best, is represented by $P_g = \{p_{g1}, p_{g2}, \dots, p_{gD}\}$. At iteration $t$, the new position and velocity of particle $i$ in dimension $d$ are determined by (1) and (2), respectively:
$$x_{id}(t+1) = x_{id}(t) + v_{id}(t+1) \qquad (1)$$
$$v_{id}(t+1) = w\,v_{id}(t) + c_1 r_1 \left(p_{id} - x_{id}(t)\right) + c_2 r_2 \left(p_{gd} - x_{id}(t)\right) \qquad (2)$$
In these equations, $w$ is the inertia weight and $c_1$ and $c_2$ are the acceleration coefficients controlling the step size. $r_1$ and $r_2$ are two independently generated random numbers uniformly distributed between 0 and 1. Two termination criteria are used: a preset maximum number of iterations is reached, or a tolerance level is achieved. The standard PSO algorithm is described in Algorithm 1.
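The following Python sketch of the standard PSO loop illustrates updates (1) and (2) and the two termination criteria. It is a minimal sketch with several assumptions:
- The objective is minimized.
- The settings w = 0.7 and c1 = c2 = 1.5, the swarm size, the initialization range and the Sphere test function are illustrative choices, not values from this paper.
- r1 and r2 are drawn once per particle per iteration.
- No velocity clamping or boundary handling is applied, for brevity.

```python
import random
from typing import Callable, List, Sequence, Tuple


def sphere(x: Sequence[float]) -> float:
    """Sphere benchmark: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def standard_pso(
    f: Callable[[Sequence[float]], float],
    bounds: List[Tuple[float, float]],
    n_particles: int = 20,
    max_iter: int = 100,
    w: float = 0.7,
    c1: float = 1.5,
    c2: float = 1.5,
    tol: float = 0.0,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize f with the standard PSO updates (1) and (2)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # individual bests P_i
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # global best P_g

    for _ in range(max_iter):                      # criterion 1: iteration limit
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()    # uniform in [0, 1)
            for d in range(dim):
                # Eq. (2): inertia + cognitive + social components
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Eq. (1): move the particle with its new velocity
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        if gbest_val <= tol:                       # criterion 2: tolerance reached
            break
    return gbest, gbest_val
```

For example, `standard_pso(sphere, [(-5.0, 5.0)] * 2, seed=1)` returns a point close to the origin together with its function value.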

Chaotic Mapping.
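Chaotic maps are mapping methods used to generate random-like numbers with features such as ergodicity, nonlinearity, and pseudo-randomness [27]. There are many kinds of chaotic maps, such as the tent map, the Tchebychev map, and the logistic map. In particular, the logistic map analyzed by May [28] can explore the vicinity of a solution by oscillating within a region. One chaotic variant based on logistic mapping is given by (3) [29]:
$$h(t+1) = \mu\, h(t)\,\bigl(1 - h(t)\bigr), \quad h(t) \in (0, 1), \quad \mu \in (0, 4] \qquad (3)$$
where $h(t)$ is the value of the chaotic number at time $t$, $\mu$ is the control parameter, and $h(0) \notin \{0, 0.25, 0.5, 0.75, 1.0\}$. For $\mu = 4$, these excluded starting values lead to a fixed point or to 0 instead of a chaotic sequence. In (3), the parameter $\mu$ controls the behavior of the chaotic variable $h$. The values of $h(t)$ for varying values of $\mu$ are shown in Figure 1.

Data Clustering

Data clustering partitions a dataset $O = \{o_1, o_2, \dots, o_n\}$ of $n$ data points, each with $m$ attributes, into $K$ clusters $C_1, C_2, \dots, C_K$ [30]. A partition must satisfy the following conditions:
$$C_k \neq \emptyset, \quad k = 1, 2, \dots, K;$$
$$C_k \cap C_l = \emptyset, \quad k \neq l;$$
$$\bigcup_{k=1}^{K} C_k = O.$$
Usually, the Euclidean distance $\mathrm{ED}(o_{j_1}, o_{j_2}) = \sqrt{\sum_{d=1}^{m} (o_{j_1 d} - o_{j_2 d})^2}$ is used to measure the difference or similarity between two data points $o_{j_1}$ and $o_{j_2}$. Let $Z = \{z_k\}$, for $k = 1, 2, \dots, K$, represent the centers of the $K$ clusters $C_k$. The intracluster distance of cluster $C_k$ is the sum of the distances of all data points in the cluster to $z_k$, given by (4):
$$d(C_k) = \sum_{o_j \in C_k} \mathrm{ED}(o_j, z_k) \qquad (4)$$
The quality of the clustering result for the dataset can be measured by the sum of the intracluster distances over all clusters, given by (5) [31]:
$$F(Z) = \sum_{k=1}^{K} d(C_k) = \sum_{k=1}^{K} \sum_{o_j \in C_k} \mathrm{ED}(o_j, z_k) \qquad (5)$$
$F(Z)$, defined in (5), is used as the fitness function for clustering in what follows.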
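The following sketch shows how fitness (5) can be evaluated for a given set of cluster centers. It assumes that each data point belongs to its nearest center. The text does not state the assignment rule explicitly, and the sketch does not enforce the nonempty-cluster condition. The helper names `assign`, `intracluster_distance` and `clustering_fitness` are illustrative.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def assign(data: List[Point], centers: List[Point]) -> List[int]:
    """Label each data point with the index of its nearest center (assumed rule)."""
    return [min(range(len(centers)), key=lambda k: math.dist(o, centers[k]))
            for o in data]


def intracluster_distance(data: List[Point], centers: List[Point],
                          labels: List[int], k: int) -> float:
    """Eq. (4): sum of distances from the points of cluster k to its center z_k."""
    return sum(math.dist(o, centers[k]) for o, lab in zip(data, labels) if lab == k)


def clustering_fitness(data: List[Point], centers: List[Point]) -> float:
    """Eq. (5): sum of the intracluster distances over all K clusters."""
    labels = assign(data, centers)
    return sum(intracluster_distance(data, centers, labels, k)
               for k in range(len(centers)))
```

Consider four points on two vertical segments, with centers at the segment midpoints. Each point is at distance 1 from its center, so the fitness is 4.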

The Chaotic Starling Particle Swarm Optimization Algorithm
Some efforts have been made in order to enhance the search performance and convergence speed of PSO.In this section, a chaotic mapping method and a DDT are introduced into the PSO algorithm to improve the global search ability, and a local search technique based on starling birds is added to improve the convergence speed.This improved PSO method is the CSPSO algorithm.The major components and the details of the CSPSO algorithm are discussed in this section.
3.1. Dynamic Update of CSPSO Parameters. Three components are used to update the velocity of a particle: the velocity of the previous iteration, the individual factor, and the social factor. Three parameters, $w$, $c_1$, and $c_2$, control the relative contributions of these three components. The inertia weight $w$ controls the influence of the velocity of the previous iteration. The cognitive coefficients $c_1$ and $c_2$ balance the contributions of the individual and social factors. The coefficients $r_1$ and $r_2$ control the proportion of each factor that is retained. Each parameter plays an important role in the velocity update and consequently also affects the particle position update. In CSPSO, $r_1$ and $r_2$ are not drawn as uniformly distributed random numbers; instead, they are generated by logistic mapping in each generation. In addition, the inertia weight is adjusted using a nonlinear, exponentially decreasing function. The modifications of these parameters are given by (6) and (7), where $w_{\min}$ and $w_{\max}$ are the minimum and maximum of the inertia weight and $t_{\max}$ is the maximum number of iterations.
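The following sketch illustrates the two parameter schedules under stated assumptions. The chaotic coefficients are produced by iterating the logistic map (3) with $\mu = 4$, its fully chaotic regime. The value of $\mu$ used in CSPSO is not stated, so $\mu = 4$ is an assumption. Equation (7) is not reproduced here, so `inertia_weight` uses a hypothetical exponential decay from $w_{\max}$ toward $w_{\min}$. Its decay rate `k` is an illustrative choice, and the defaults 0.4 and 1.2 are the limits selected in the parameter experiments.

```python
import math
from typing import List

# Starting values that collapse the logistic map to a fixed point or 0 when mu = 4
_DEGENERATE = (0.0, 0.25, 0.5, 0.75, 1.0)


def logistic_sequence(h0: float, n: int, mu: float = 4.0) -> List[float]:
    """Iterate h(t+1) = mu * h(t) * (1 - h(t)) (Eq. (3)) n times from h0."""
    if not 0.0 < h0 < 1.0 or h0 in _DEGENERATE:
        raise ValueError("h0 must lie in (0, 1) and avoid 0.25, 0.5 and 0.75")
    seq, h = [], h0
    for _ in range(n):
        h = mu * h * (1.0 - h)
        seq.append(h)
    return seq


def inertia_weight(t: int, t_max: int, w_min: float = 0.4,
                   w_max: float = 1.2, k: float = 5.0) -> float:
    """HYPOTHETICAL exponential schedule: w = w_max at t = 0, decaying toward w_min."""
    return w_min + (w_max - w_min) * math.exp(-k * t / t_max)
```

Two independent chaotic streams can be obtained from different starting values, for example `r1s = logistic_sequence(0.31, 100)` and `r2s = logistic_sequence(0.67, 100)`. Element `t` of each stream then replaces the uniform random numbers in generation `t`.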

The Dynamic Disturbance Term.
PSO is an optimization algorithm based on swarm intelligence. It guides the particles by comparing fitness function values to select the individual and global best positions. As the particles approach local optima, their velocities gradually decrease during the later iterations. As a result, most particles fall into local optima and cannot easily escape from them. These phenomena are called premature convergence and evolutionary stagnation. To overcome these problems, a method based on DDT [32] is used to modify the velocity update of each particle. The DDT is embedded into the velocity update when the velocity of a particle becomes too low or when its position remains unchanged in the middle and final generations. The modified velocity is updated using (8), and the DDT is updated using (9), which contains an accommodation coefficient with a value between 0 and 1. This modification has two effects. In early generations, it helps prevent a particle from flying out of the boundary of the search space when its velocity is too high. Later, it helps the particle escape from a local optimum by increasing its velocity, which improves the overall search capability. In addition, the DDT increases linearly, which avoids oscillation and keeps the search direction relatively stable during the optimization process.
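Equations (8) and (9) are not reproduced here, so the following sketch only mirrors the logic described above. The trigger fires when a particle is too slow or has not moved, and only in the middle and final generations. The disturbance grows linearly and is scaled by an accommodation coefficient `delta` in (0, 1). The hypothetical parts are:
- the thresholds `v_eps` and `x_eps`;
- the exact ramp, which is zero at mid-run and reaches `delta` at `t_max`;
- the use of a random sign and magnitude per component.

```python
import math
import random
from typing import List, Sequence


def is_stagnant(velocity: Sequence[float], old_pos: Sequence[float],
                new_pos: Sequence[float], v_eps: float = 1e-6,
                x_eps: float = 1e-12) -> bool:
    """Trigger from the text: speed too low, or position (almost) unchanged.
    The thresholds v_eps and x_eps are hypothetical."""
    speed = math.sqrt(sum(v * v for v in velocity))
    return speed < v_eps or math.dist(old_pos, new_pos) < x_eps


def ddt(t: int, t_max: int, delta: float = 0.5) -> float:
    """HYPOTHETICAL linear ramp: 0 at t = t_max / 2, rising to delta at t = t_max."""
    return delta * (2.0 * t / t_max - 1.0)


def velocity_with_ddt(v_new: Sequence[float], stagnant: bool, t: int, t_max: int,
                      rng: random.Random, delta: float = 0.5) -> List[float]:
    """Add the disturbance to a stagnant particle's velocity in the middle and
    final generations; otherwise return the velocity unchanged."""
    if not stagnant or 2 * t < t_max:
        return list(v_new)
    d = ddt(t, t_max, delta)
    # Random sign/magnitude per component so the kick has no fixed direction
    return [v + d * rng.uniform(-1.0, 1.0) for v in v_new]
```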

Local Search Based on Starling Birds.
In nature, a starling spreads information to its nearby neighbors in order to defend its position, and the information coming from one starling can spread throughout the swarm. Inspired by this phenomenon, a local search method based on starling behavior is used in the developed PSO procedure. Each particle seeks a new position and velocity in its neighborhood through a weighting scheme. This scheme can be seen as a form of information exchange between an individual and its nearby neighbors. The mechanism seeks a better solution through local exploration around the particle and reduces the time needed for local search [25].
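Dynamic Neighborhood Selection. For each particle $i$, the neighborhood $N_i$ is a subset of the particle population. The particles in the population are listed in decreasing order of their fitness-Euclidean distance ratio (FER) with respect to particle $i$ [23]. The $L$ top-ranked particles are taken as the neighborhood of particle $i$. The FER of particle $j$ with respect to particle $i$ is determined by (10):
$$\mathrm{FER}(i, j) = \alpha \cdot \frac{F(x_j) - F(x_i)}{\mathrm{ED}(x_i, x_j)}, \qquad \alpha = \frac{\|s\|}{F(x_b) - F(x_w)} \qquad (10)$$
The symbols in (10) are defined as follows:
- $\mathrm{ED}(x_i, x_j)$ is the Euclidean distance between $x_i$ and $x_j$.
- $x_b$ and $x_w$ are the best and worst positions of the particles in the current population.
- $\|s\|$ is the size of the search space, defined as $\|s\| = \sqrt{\sum_{d=1}^{D} (x_d^{\max} - x_d^{\min})^2}$.
- $x_d^{\max}$ and $x_d^{\min}$ are the maximum and minimum values of dimension $d$, given by the corresponding attribute of the data points in the dataset.

Because the fitness is minimized, $\alpha$ is negative. $\mathrm{FER}(i, j)$ is therefore positive exactly when particle $j$ is fitter than particle $i$, and it is larger for neighbors that are both fitter and closer. The number of neighbors $L$ is the size of the neighborhood. To explore different search scopes, a dynamic strategy adjusts the size of the neighborhood as specified in (11), where $L_{\max}$ and $L_{\min}$ are the maximum and minimum sizes of the neighborhood and $\mathrm{INT}(x)$ is the integer function that keeps only the integer part of $x$.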
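The following sketch ranks particles by the FER of (10), assuming minimization. It skips particles that coincide with particle `i`, because their distance is zero. The form of (11) is not given, so `neighborhood_size` is a hypothetical linear growth from L_min to L_max truncated with `int()`, which plays the role of INT(x). Its defaults 2 and 7 are the limits selected in the parameter experiments.

```python
import math
from typing import List, Sequence


def search_space_size(lower: Sequence[float], upper: Sequence[float]) -> float:
    """||s||: Euclidean length of the search-space diagonal."""
    return math.sqrt(sum((hi - lo) ** 2 for lo, hi in zip(lower, upper)))


def fer_neighbors(i: int, positions: List[Sequence[float]],
                  fitness: List[float], s_norm: float, L: int) -> List[int]:
    """Indices of the L particles with the largest FER(i, j) (Eq. (10))."""
    f_best, f_worst = min(fitness), max(fitness)
    spread = f_best - f_worst                   # negative under minimization
    alpha = s_norm / spread if spread != 0 else 0.0
    scores = []
    for j, xj in enumerate(positions):
        dist = math.dist(positions[i], xj)
        if j == i or dist == 0.0:               # skip self and coincident particles
            continue
        scores.append((alpha * (fitness[j] - fitness[i]) / dist, j))
    scores.sort(key=lambda p: p[0], reverse=True)   # stable: ties keep index order
    return [j for _, j in scores[:L]]


def neighborhood_size(t: int, t_max: int, L_min: int = 2, L_max: int = 7) -> int:
    """HYPOTHETICAL form of Eq. (11): linear growth, truncated to an integer."""
    return int(L_min + (L_max - L_min) * t / t_max)
```

Consider particle 0 at the origin with fitness 4. A fitter particle at distance 1 (fitness 1) ranks above a slightly fitter one at distance 0.5 (fitness 3), which in turn ranks above the fittest particle at distance 5 (fitness 0). The ranking thus trades fitness against distance.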
Position Update. The position of particle $i$ is adjusted using the information of the particles in its neighborhood. The new position of particle $i$ is computed from the weighted positions of the neighbors using (12). Here $x_i$ is the current position of particle $i$, $\tilde{x}_i$ is the new position after the adjustment, and $r_3$ is a random real number uniformly distributed between −1 and 1.
Velocity Update. As with the position, the velocity of particle $i$ is updated using the velocity information of the particles in its neighborhood. The new velocity of particle $i$ is determined by (13). Here $v_i$ and $\tilde{v}_i$ are the current and updated velocities of particle $i$, and $r_4$ is a random real number uniformly distributed between 0 and 1. The collective responses of starling birds are described by the pseudocode shown in Algorithm 2.
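Equations (12) and (13) are not reproduced here, so the following sketch is a hypothetical instance of the collective response. It uses the random ranges stated in the text: $r_3$ in [−1, 1] and $r_4$ in [0, 1]. It assumes uniform weights 1/L over the neighbors:
- The new position moves the particle by $r_3$ times the offset to the mean neighbor position.
- The new velocity blends the particle's own velocity with the mean neighbor velocity, using $r_4$ as the blending weight.

```python
import random
from typing import List, Sequence, Tuple


def collective_update(x_i: Sequence[float], v_i: Sequence[float],
                      neighbor_pos: List[Sequence[float]],
                      neighbor_vel: List[Sequence[float]],
                      rng: random.Random) -> Tuple[List[float], List[float]]:
    """HYPOTHETICAL forms of Eqs. (12) and (13) with uniform neighbor weights 1/L."""
    L, dim = len(neighbor_pos), len(x_i)
    mean_pos = [sum(p[d] for p in neighbor_pos) / L for d in range(dim)]
    mean_vel = [sum(v[d] for v in neighbor_vel) / L for d in range(dim)]
    r3 = rng.uniform(-1.0, 1.0)   # r3 ~ U[-1, 1], as in the text
    r4 = rng.random()             # r4 ~ U[0, 1), as in the text
    new_x = [x_i[d] + r3 * (mean_pos[d] - x_i[d]) for d in range(dim)]
    new_v = [r4 * v_i[d] + (1.0 - r4) * mean_vel[d] for d in range(dim)]
    return new_x, new_v
```

With uniform weights, a particle whose neighbors share its own velocity keeps that velocity. Its position moves along the line toward, or away from, the neighbors' centroid.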

The CSPSO for Clustering Problems
4.1. Particle Selection. In the previous section, a local search method based on starling behavior is applied in the neighborhood of a particle to search for better solutions. This local search increases the computation time. To accelerate the search and reduce the time needed, the local search method is not applied to all particles. Instead, two methods are adopted to select particles from the population, and the local search method is then applied to all the particles selected by these two methods.
A Euclidean Distance Method. In each generation, the particles are grouped according to their Euclidean distances to the global best. The particle ranked at the top of each group is selected, so a total of $N_e$ particles are selected. The particles selected by this method diversify the search and enhance the exploration of the CSPSO procedure.
A Fitness Function Method. In each generation, all the particles are sorted in ascending order of their fitness function values, and the $N_f$ top-ranked particles are selected. The particles selected by this method focus on the promising regions of the search space and so enhance the exploitation of the CSPSO procedure. CSPSO uses both selection methods to improve its convergence and increase diversity. As a result, a total of $M = N_e + N_f$ particles are selected; in the implementation, $N_e = N_f$ is used. Particles selected through the Euclidean distance method are spread across the search space, which helps avoid premature convergence. The particles selected through the fitness function method lead the search direction. Figure 2 gives more details about the process of particle selection.
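The following sketch implements the two selection methods. The exact grouping rule for the Euclidean distance method is not spelled out above. As an assumption, the particles are sorted by ascending distance to the global best and split into $N_e$ contiguous groups of nearly equal size, and the closest particle in each group is taken. A particle picked by both methods is kept only once in the combined list; this is also a choice of this sketch.

```python
import math
from typing import List, Sequence, Tuple


def select_particles(positions: List[Sequence[float]], fitness: List[float],
                     gbest: Sequence[float], n_e: int,
                     n_f: int) -> Tuple[List[int], List[int], List[int]]:
    """Return (Euclidean picks, fitness picks, combined list without repeats)."""
    n = len(positions)
    # Euclidean distance method: ascending distance to the global best,
    # split into n_e contiguous groups (assumed), top of each group selected
    by_dist = sorted(range(n), key=lambda i: math.dist(positions[i], gbest))
    euclid = []
    for g in range(n_e):
        start, end = g * n // n_e, (g + 1) * n // n_e
        if start < end:
            euclid.append(by_dist[start])
    # Fitness function method: the n_f best particles (minimization)
    fit = sorted(range(n), key=lambda i: fitness[i])[:n_f]
    combined = list(dict.fromkeys(euclid + fit))   # preserve order, drop repeats
    return euclid, fit, combined
```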

Population Initialization.
In the implementation of CSPSO, the position of each particle encodes a complete set of cluster centers. Each cluster center represents one cluster, and different cluster centers represent different clusters. The set of cluster centers $Z = \{z_k\}$, for $k = 1, 2, \dots, K$, represents a partitioning result and also represents the position of a particle, $X_i = \{z_1, z_2, \dots, z_K\}$. In the initialization, the initial position of each particle is generated randomly in the entire search space. Each dimension of the position of particle $i$ is generated between the minimum and maximum values of the corresponding attribute over all data points $o_j$. In particular, the dimension of a particle is $D = K \times m$, where $m$ is the number of attributes of each data point.
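The following sketch shows the particle encoding and random initialization described above. Two choices are assumptions of this sketch:
- The position is laid out as $[z_1 \mid z_2 \mid \dots \mid z_K]$, so dimension `d` corresponds to attribute `d % m`.
- Each value is drawn uniformly between that attribute's minimum and maximum over the dataset.

The helpers `init_particle` and `decode` are illustrative names.

```python
import random
from typing import List, Sequence


def init_particle(data: List[Sequence[float]], K: int,
                  rng: random.Random) -> List[float]:
    """Random position of dimension D = K * m inside the data's bounding box."""
    m = len(data[0])
    lo = [min(o[a] for o in data) for a in range(m)]
    hi = [max(o[a] for o in data) for a in range(m)]
    # Layout [z_1 | z_2 | ... | z_K]: dimension d maps to attribute d % m
    return [rng.uniform(lo[d % m], hi[d % m]) for d in range(K * m)]


def decode(position: Sequence[float], K: int) -> List[List[float]]:
    """Split a particle position back into its K cluster centers."""
    m = len(position) // K
    return [list(position[k * m:(k + 1) * m]) for k in range(K)]
```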

Boundary Handling.
If a particle flies outside the boundary of the search space, it no longer represents a solution, so its position and velocity are adjusted. If the position of a particle in dimension $d$ is below the lower (above the upper) limit of that dimension, it is set to the lower (upper) limit of that dimension. Accordingly, the velocity of this particle along dimension $d$ is reversed. These adjustments limit the range of the positions and change the velocities, and hence the flight directions, of the particles. The position and velocity of such a particle are adjusted as follows:
$$\tilde{x}_{id} = \begin{cases} x_d^{\min}, & x_{id} < x_d^{\min} \\ x_d^{\max}, & x_{id} > x_d^{\max} \end{cases}, \qquad \tilde{v}_{id} = -v_{id}$$
where $\tilde{x}_{id}$ and $\tilde{v}_{id}$ are the new values of dimension $d$ of the position and velocity of particle $i$ after the adjustments.
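The boundary rule above translates directly into code. In the following sketch, any component outside its limits is clipped to the violated limit, and the velocity along that dimension is reversed. Components within their limits are left unchanged.

```python
from typing import List, Sequence, Tuple


def handle_boundary(x: Sequence[float], v: Sequence[float],
                    lower: Sequence[float],
                    upper: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Clip out-of-range position components and reverse their velocities."""
    new_x, new_v = list(x), list(v)
    for d in range(len(x)):
        if x[d] < lower[d]:
            new_x[d], new_v[d] = lower[d], -v[d]
        elif x[d] > upper[d]:
            new_x[d], new_v[d] = upper[d], -v[d]
    return new_x, new_v
```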

The Pseudocode of the Proposed CSPSO.
The pseudocode of the proposed CSPSO is presented in Algorithm 3.

Simulation Experiments
Three experiments are conducted to evaluate the performance of CSPSO. The first experiment determines appropriate parameter values for CSPSO using the Sphere and Rastrigin [33] benchmark functions. The second experiment evaluates the performance of CSPSO on four classical numerical benchmark functions [33]. These functions, all to be minimized, and their domains are presented in Table 1. The third experiment checks the effectiveness of CSPSO in data clustering using datasets from the Artificial Datasets [34] and the UCI Machine Learning Repository [35]. CSPSO is implemented in MATLAB. All the experiments are conducted on a LENOVO desktop computer with an Intel 3.20 GHz i5-4460/P4 processor and 8 GB of RAM running Windows 8.

Experiment on Parameter Settings.
As described above, the parameter values have an important influence on the performance of CSPSO. This section examines the influence of three critical parameters of CSPSO in order to find appropriate values for them: the inertia weight $w$, the size of the neighborhood $L$, and the number of selected particles $M$. The Sphere and Rastrigin functions are used for this purpose. The shapes and ranges of the Sphere and Rastrigin functions for $D = 2$ are depicted in Figure 3. The dimension of the benchmark functions is $D = 10$ in this experiment, and CSPSO was run 30 times on each of the two functions.
For valid experiments and fair comparisons, the parameters not under test are kept at the same values: the population size $N = 50$, the maximum number of iterations $t_{\max} = 100$, and the cognitive coefficients $c_1$ and $c_2$. The influence of the inertia weight $w$ is tested first. Its value is adjusted according to (7). The values of $w_{\min}$ and $w_{\max}$ directly determine how fast $w$ declines and consequently affect the convergence speed of CSPSO, so their values are varied in the experiments. While $w$ varies, the minimum and maximum sizes of the neighborhood are kept at $L_{\min} = 2$ and $L_{\max} = 5$, and the number of selected particles is kept at $M = 10$. The best and mean values of the Sphere and Rastrigin functions are reported in Table 2.
The differences among these results are clear. The optimal values of the Sphere and Rastrigin functions were obtained at different values of $w$. However, the mean values differ for the Rastrigin function because it is multimodal with many local optima. Generally, it is difficult for traditional optimization algorithms to find the global optima of multimodal functions. Based on these results, the lower and upper limits of the inertia weight are set to $w_{\min} = 0.4$ and $w_{\max} = 1.2$, where the minimum mean values are achieved.
The influence of the size of the neighborhood $L$ is examined next. In CSPSO, the size of the neighborhood $L$ increases gradually to enhance the local search ability. Large values lead to slow convergence, whereas small values restrict the local search ability; an appropriate value balances exploration and exploitation. The lower and upper limits on the size of the neighborhood are set to different values. The best and mean values of the Sphere and Rastrigin functions are reported in Table 3.
Table 3 shows that the best results are obtained when the size of the neighborhood ranges between the lower and upper limits $L_{\min} = 2$ and $L_{\max} = 7$. The influence of the number of selected particles $M$ is tested next; this number affects both convergence and search ability. Different values of $N_e$ and $N_f$ are used, and the results are reported in Table 4.
The results in Table 4 show that the best number of selected particles is $M = 10$, where both the mean values of the test functions and the computation time are low. Selecting more particles does not always lead to better results. Because of the randomness built into the algorithm, most of the particles may sometimes simply gather around local optima.
Based on these results, the following values are chosen:
- the lower and upper limits on the inertia weight, $w_{\min}$ and $w_{\max}$, are set to 0.4 and 1.2, respectively;
- the lower and upper limits on the size of the neighborhood, $L_{\min}$ and $L_{\max}$, are set to 2 and 7, respectively;
- the number of selected particles $M$ is set to 10.

These parameter values are kept unchanged in the following experiments.

Benchmark Functions.
In this section, the 4 classical numerical benchmark functions presented in Table 1 are used to validate the effectiveness of CSPSO. The dimension of the benchmark functions, which determines the dimension of the search space, is set to $D = 10$, 20, and 30. The performance of CSPSO is compared with that of PSO, fitness-Euclidean distance ratio particle swarm optimization (FER-PSO) [23], and starling particle swarm optimization (SPSO) [25]. PSO is the basic swarm intelligence algorithm developed by Kennedy and Eberhart [26]. FER-PSO replaces a particle's own individual best with the individual best of a neighbor selected based on the Euclidean distance and the fitness function value. SPSO uses the collective responses of the particles to adjust their positions and velocities when the algorithm stagnates.
The values of the adjustable parameters in the competing algorithms are the best ones reported in the respective references, as listed in Table 5. The competing algorithms were also run 50 independent times each to obtain meaningful results. The best fitness function values obtained by these algorithms are reported in Table 6.
Figure 4 shows the convergence of the three algorithms on some of the functions. Figures 4(b) and 4(c) compare CSPSO with PSO and FER-PSO. There, the function values obtained by CSPSO decline slowly at the beginning of the evolutionary process because of the time taken by local search, so CSPSO converges slowly in the early iterations. After about half of the evolutionary process, CSPSO relies on DDT and chaotic mapping to find lower fitness values than the other algorithms instead of falling into local optima. The local search strategy helps find better solutions in the nearby neighborhood of the current position of a particle. In particular, the curve of the Sphere function obtained by CSPSO disappears once the best value 0 has been found, because this value cannot be shown on the chart. As shown in Figure 4(d) for the Rosenbrock function, a multimodal function, and traditional algorithms like PSO,
As the results for the 4 classical functions in Table 6 show, CSPSO and SPSO perform better than PSO and FER-PSO on the Rastrigin and Griewank functions. CSPSO also performs better than the other three algorithms on the Sphere and Rosenbrock functions. In general, the results show that CSPSO has stronger global search ability than PSO, FER-PSO, and SPSO on both unimodal and multimodal functions. Furthermore, the results in Table 6 indicate that the modifications in CSPSO effectively enhance the performance of PSO and that CSPSO found the global optimum more often than the other algorithms in these experiments.

Clustering Problems.
In this section, clustering experiments are performed on different datasets, and the results of CSPSO and the other algorithms are reported and discussed. Eight benchmark datasets are used. Data_3_2, Data_5_2, Data_10_2, and Data_4_3 come from the work reported in [34]. Iris, Glass, Contraceptive Method Choice (CMC), and Wine come from the UCI Machine Learning Repository [35]. More details about these datasets are given in Table 7. Data_5_2 and Data_4_3 are also plotted in Figure 5.
The optimized parameter values of the other algorithms, shown in Table 5, are adopted in the experiments. For fair comparisons, each algorithm was run 50 times on every clustering dataset to reduce the effects of random factors. Four simple statistics measure the effectiveness of these clustering algorithms: best values (Best), worst values (Worst), mean values (Mean), and standard deviations (S.D.). The experimental environment is the same for all clustering algorithms.
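The following sketch computes the four statistics from the final fitness values of repeated runs. Because the clustering fitness (5) is minimized, Best is the minimum and Worst is the maximum. The text does not say whether S.D. is the sample or the population standard deviation. This sketch assumes the sample version, `statistics.stdev`; `statistics.pstdev` would give the population version.

```python
import statistics
from typing import Dict, List


def summarize(final_fitness: List[float]) -> Dict[str, float]:
    """Best/Worst/Mean/S.D. over independent runs of a minimizing algorithm."""
    return {
        "Best": min(final_fitness),
        "Worst": max(final_fitness),
        "Mean": statistics.mean(final_fitness),
        "S.D.": statistics.stdev(final_fitness),   # sample S.D. (assumption)
    }
```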
Figure 6 shows the convergence of these algorithms on four datasets for typical runs. The fitness values of CSPSO decline slowly at the beginning of the evolutionary process. In the middle of the process, they continue to decline rather than stagnating at a local optimum, possibly because DDT gives the particles higher velocities. Local search based on the Euclidean distance and fitness neighborhood structure also helps the particles find better positions. This phenomenon is more evident on high-dimensional datasets such as the Glass dataset.
For more detail, Figure 7 shows the convergence of these algorithms near the end of the search process on three of the same four datasets. CSPSO has the best performance among all these clustering algorithms. ABC is the first to converge to a local optimum, possibly because of the influence of randomly selected neighbors. GABC performs poorly on large datasets, such as Data_5_2 and Glass, possibly because of the local search behavior of the onlooker bees. FER-PSO converges noticeably faster than the other algorithms, possibly because only one neighbor is selected to guide the search of each particle. However, it is easily trapped in local optima in the later generations of the search process. SPSO performs better in searching for a global optimum, but its local search strategy does not work on high-dimensional datasets. These results reveal the differences in performance among PSO, ABC, FER-PSO, GABC, SPSO, and CSPSO and confirm the good performance of CSPSO as a clustering algorithm.
Table 8 shows that CSPSO performs well on these classical clustering problems. For the Data_3_2, Data_5_2, and Data_10_2 datasets, CSPSO obtained lower mean fitness values than the other algorithms, but some other algorithms obtained better best fitness values than CSPSO. This is possibly due to the randomness of swarm intelligence algorithms. The Data_4_3 dataset has a large number of data points, so the local search ability of CSPSO is affected by the selected neighbors. ABC is more stable on this dataset, but its random strategy makes it weaker than the others at finding the global optimum. CSPSO outperforms the other algorithms and obtains lower mean fitness values on the Iris, Glass, CMC, and Wine datasets, which suggests that it performs well on high-dimensional data. The embedded DDT and chaotic mapping give the proposed algorithm better global search ability than traditional PSO. The local search method based on starling birds makes it more stable. Furthermore, as the datasets grow in both dimension and number of data points, CSPSO performs better than PSO, ABC, GABC, FER-PSO, and SPSO. This suggests that it is well suited to the larger datasets common in current applications.

Conclusions
The performance of standard PSO depends on its parameter values. Parameter adjustment is therefore one of the most useful ways to give PSO good exploration and exploitation abilities. CSPSO, proposed in this study, enhances the overall convergence of the search process. Nonlinear adjustment of the inertia weight and the chaotic search method based on logistic mapping enhance the global search performance of traditional PSO and help particles escape from local optima. DDT added to the velocity update increases the velocities of the particles in later iterations and also helps particles escape from local optima. The local search strategy based on the behavior of starling birds uses the information of the nearest neighbors to determine a new collective position and a new collective velocity. The two particle selection methods, based on Euclidean distance and on fitness function value, maintain the population diversity of the particles. These improvements help particles avoid stagnation and find better solutions in the search space.
The results of simulation experiments on both benchmark function optimization and classical clustering problems show the effectiveness of CSPSO. Compared with traditional PSO and other evolutionary algorithms, CSPSO has better convergence properties and stronger global search ability. However, as with other swarm intelligence algorithms, CSPSO may be trapped in, and may stagnate around, local optima. It may therefore be unstable when applied to problems with multiple local optima. In future work, CSPSO will be applied to other combinatorial optimization problems, such as scheduling problems. Other metaheuristic methods and optimization techniques may also be used to improve the performance of PSO.

Figure 1: Bifurcation diagram for the logistic map.

Figure 2: A local search method in the CSPSO algorithm.

Figure 4: Convergence of the algorithms on some functions (D = 10).

Figure 5: Distributions of data points in some datasets.

Figure 6: Convergence of the algorithms on some datasets.

Table 2: Results when varying the inertia weight $w$.

Table 3: Results when varying the size of the neighborhood $L$.

Table 4: Results when varying the number of selected particles $M$.

Table 5: Parameter settings in the experiments.

Table 6: Best fitness function values obtained by different algorithms for the test functions.

Table 7: Description of the datasets used in the experiments.

Table 8: Comparison of the performance of CSPSO with other clustering algorithms.