A New Binary Adaptive Elitist Differential Evolution Based Automatic k-Medoids Clustering for Probability Density Functions

This paper proposes an evolutionary computing based automatic partitioned clustering of probability density function, the so-called binary adaptive elitist differential evolution for clustering of probability density functions (baeDE-CDFs). Herein, the k-medoids based representative probability density functions (PDFs) are preferred to the k-means one for their capability of avoiding outlier effectively. Moreover, addressing clustering problem in favor of an evolutionary optimization one permits determining number of clusters “on the run”. Notably, the application of adaptive elitist differential evolution (aeDE) algorithm with binary chromosome representation not only decreases the computational burden remarkably, but also increases the quality of solution significantly. Multiple numerical examples are designed and examined to verify the proposed algorithm’s performance, and the numerical results are evaluated using numerous criteria to give a comprehensive conclusion. After some comparisons with other algorithms in the literature, it is worth noticing that the proposed algorithm reveals an outstanding performance in both quality of solution and computational time in a statistically significant way.


Introduction
Clustering aims to divide input data into multiple groups such that elements in each group are similar and different from elements in other groups as much as possible. There are two primary objects of clustering, discrete element and probability density function (PDF). Over the past few years, discrete element is preferred in clustering with a lot of works such as [1,2]. However, with the explosion of digital era, a massive amount of data is created each day [3,4], and how to present such data well is a challenge task for discrete elements. The main reason for this is that the discrete element just presents whole data by a representative point so that it is unlikely to fully demonstrate characteristic of whole data, especially fluctuation data [5]. Meanwhile, the remaining object-PDF shows its advantage in such digital era like capturing the distribution of whole data, giving a visible look of characteristic of the estimated objects [5]. However, regardless of the advantages of PDF, works related to clustering for this interesting object are still very limited. Therefore, this paper aims to contribute new approach for clustering of probability density functions (CDFs).
Concerning CDFs, two main approaches can be found, nonhierarchical and hierarchical [6]. With respect to nonhierarchical approach, k-means and k-medoids based algorithms can be seen as the typical ones [7]. However, k-means based approach shows its disadvantage when addressing outliers or noises [7]. In contrast, k-medoids based approach can handle outliers effectively as shown in the work of [8] since it directly employs objects in input data as the centers. The k-medoids is then applied to various fields. For example, in [9], Zhang B et al. presented an improved ranked k-medoids 2 Mathematical Problems in Engineering algorithm by a specific cell-like P system and extended the application to membrane computing. In addition, in [10], Liu F et al. proposed an k-medoids clustering algorithm for testing software. In [11], Cahaya L et al. introduced an unsupervised learning algorithm, the so-called intelligent k-medoids, to predict the length of a study time of the students in universities. Moreover, in [12], Prihandoko et al. applied k-means and k-medoids algorithms to analyze the natural disaster data in Indonesia. By that way, they can reveal the hidden information so that the government could have appropriate solutions in advance. Nevertheless, these applications are still widely applied on discrete elements, not the PDFs. To the best of our knowledge, there are no works using k-medoids in clustering for probability density functions so far. As a result, it is necessary to develop a nonhierarchical crisp clustering algorithm for probability density functions (PDFs) such that the representative object is generated based on k-medoids scheme.
Over the past few years, tremendous research effort has been made to evolve the partitioned clustering algorithms in complex data sets through evolutionary computing techniques. However, not many studies concern how to determine the number of clusters at the same time, especially in the field of clustering for PDFs. For instance, just two recent evolutionary-based algorithms are proposed by author group of Vo-Van et al., but both accept a k input as the number of clusters rather than determining it "on the run" [13,14]. Moreover, these works usually employ very classical and simple evolution methods to solve the problem, genetic algorithm (GA). Although GA is an effective method in solving heuristic optimized problems, there are still large amounts of modern and efficient evolution methods that need to be mentioned. Therefore, in this paper, one of state-of-the-art evolutionary algorithms will be employed and modified to address clustering of probability density functions automatically.
Regarding evolutionary-based algorithms, the adaptive elitist differential evolution (aeDE), firstly developed by Ho-Huu et al. [15], has proved to be one of the most effective and robust algorithms. Firstly, aeDE is a global searching algorithm inherited from its original version, Differential Evolution (DE) [15], which is an advantage compared with that of Genetic algorithm (GA), easily trapping into local solution [16]. Secondly, with two modifications suggested by Ho-Huu et al., the quality of solution as well as the computational cost has been enhanced significantly. On the one hand, the adaptive mutation scheme assists the algorithms searching more efficiently and decreasing computational burden; on the other hand, the elitist selection scheme supports algorithm in recording the best individuals for next generation. For these reasons, aeDE is being increasingly applied in numerous fields. For example, in [17], Ho-Huu V et al. presented an improved DE to solve the shape and size optimization problem of truss structure with frequency constraints. Also, in [18], Ho-Huu V et al. applied aeDE to optimize the truss structures with frequency constraints based on reliability under uncertainties of loadings and material properties. In [19], Vo-Duy T et al. suggested a global numerical approach for lightweight design optimization of laminated composite plates subjected to frequency constraints by applying aeDE. Moreover, in [20], this author also proposed an aeDE based-effective approach to maximize the fundamental frequency of laminated functionally graded carbon nanotube-reinforced composite quadrilateral plates. Besides, in [21], Demertzis K and liadis L utilized aeDE to optimize extreme learning machine model, and then it is applied to create an advanced computer vision system for the phenotype based automatic recognition of invasive or other unknown species. Nevertheless, one noticeable point is that the application of such a good and robust algorithm as aeDE is being restricted in engineering problems as mentioned above, as none of the works related to aeDE have been found in clustering problem. Therefore, it would be emphasized that the aeDE is a fully expected candidate to be employed to address clustering problem regarding the evolutionary approach. However, to adapt this algorithm to the clustering problem, several works are required. For example, the variables in clustering problems are represented as discrete variables in the binary form. Therefore, we need to use a concept of binary chromosome inspired from the idea of [22] to deal with this problem. Moreover, employing binary presentation of chromosome in this paper provides advantages of coding and reducing complexity of the algorithm. Therefore, it would be more roust when dealing with automatic clustering problem. As a result, in this paper, we employ the evolutionary algorithm with binary chromosome representation to tackle the automatic partitioned clustering problem, called binary adaptive elitist differential evolution for clustering of probability density functions (baeDE-CDFs).
From what has been drawn, to fill the research gap and contribute more flexible algorithm for clustering of PDFs, this paper proposes a novel algorithm called baeDE for automatic k-medoids partitioned clustering of PDFs (baeDE-CDFs). More specifically, the clustering problem is restated and solved under the form of an optimal problem where the objective function is based on Silhouette index (SIindex), a modification from an internal validity measure index, Silhouette [23], and the optimal algorithm is baeDE derived from [15] with some modifications. Some advantages of the proposed method can be enumerated as follows. First, probability density function is considered as the main object in this paper instead of discrete element to overcome some shortages of discrete element as mentioned before. Second, by applying k-medoids scheme, the proposed method could face outlier or noise effectively. Third, the proposed algorithm can address the clustering problem without the number of clusters in advance, thus increasing adaptation in such fluctuation digital world. Finally, using new but successful population-based optimal algorithm, namely, aeDE, in clustering problem not only decreases computational burden but also enhances quality of solution. The superiority of the proposed algorithm has been shown through many numerical examples including simulated, Benchmark, and real data sets in a statistically significant manner.
Brief summarization of some discussed approaches for CDFs is illustrated in Table 1 along with the proposed method to clarify its contributions. The rest of the paper is divided into the following sections. Section 2.1 concerns distance measure  [24] Automatically defined Based on data-driven learning mechanism Not good if data is overlapping Good Low The method in [25] Given in advance k-means Not good Medium Low The proposed method Automatically defined Binary aeDE Good even for complex data Good Lower than [13,14] and states the clustering problem with respect to the evolutionary approach. Section 2.2 presents the Silhouette (SI) modified and the external validity measure index to evaluate the quality of produced partitioned clustering. Section 2.3 shows in detail the proposed algorithm. Section 2.4 then will prove the effectiveness of the proposed algorithm via multiple numerical examples in several aspects. Section 4 finally draws some conclusions of the whole work. Usually, clustering multiple objects into the separate groups depends heavily on the type of measures or distances to assess similarity level between objects [6]. Overview of clustering of PDFs and advantages of L1-distance in computing similarity between PDFs have been shown in [26]. Moreover, recent works regularly employ L1-distance in clustering problem such as [5,24,27]. Therefore, in this paper, we also use L1-distance as a criterion to evaluate whether two or more objects are similar or dissimilar.
Equation (2) implies that Apparently, the L1-distance also satisfies normal criteria of a distance measure. First, it is always greater than zeros and it should be symmetric. Moreover, the minimum value should be obtained in case of identical PDFs.

Clustering Problem as an Optimization Problem.
Since the clustering problem can be stated in form of an optimization problem, the evolutionary approach can be applied to a clustering problem. The idea is originally derived from Darwin's evolutionary theory. The main point is to use the evolutionary operator and population of clustering structure to converge to a global solution. For more details, the reader can refer to [28]. This kind of clustering problem can be briefly presented as follows.
where n is number of PDFs and k is the number of clusters. Moreover, according to [29], the clustering problem is NPhard when the number of clusters exceeds 3.

Internal and External Validity Measure Indexes.
As discussed in the previous subsection, evaluating the quality of established partitions should be handled by validity measure indexes containing internal and external measures. The two following sub-subsections describe the measures that we use in this paper as fitness function and test criterion for produced partition, respectively.

Internal Measure: Modified Silhouette (SI).
As mentioned in the above subsection, most internal validity measures that applied clustering for PDFs concentrate on building internal measures for the type of k-means clustering. Nevertheless, for kind of k-medoids clustering of PDFs, no internal validity measure is taken into consideration. Therefore, this study proposes a modified Silhouette measure to tackle the problem of k-medoids clustering. Its formulation is shown below.
. . , C }, clustering data set into k disjoint sets in terms of , such that ⋃ =1 = .
These following terms need to be computed: (i) ( , ) is distance from PDF i to the medoid of the cluster it belongs to.
(ii) ( , ) is the distance from PDF i to medoids of other clusters.
All of them are presented as follows: The local SI value of a single PDF is defined by The global SI value of a partition can be calculated as All distances here are computed by L1-distance. Further, instead of computing difference from one PDF to other PDFs in another cluster pair by pair, we calculate the distance from that PDF to the medoid of mentioned PDFs. Similarly, when assessing the distance from one PDF to another PDF in one cluster, we also calculate distance from this PDF to its medoid. This manner improves the computation cost significantly compared with the original formula. In case all PDFs in a cluster converge into the medoid of that cluster, it means that the values ( ) of all these are equal to 1, which reflects the compactness of the cluster. In contrast, if the distance from to the medoid of cluster it belongs to is high, but the distance from to the medoid of other clusters is tiny, this leads to ( ) receiving the negative value and, in the worst case, SI should equal -1. This case shows that the separation of the cluster is not good. Therefore, the value of SI is in range [−1, 1] in general and the better value of SI indicates the better result. Nevertheless, in this paper, the algorithm is built to find the minimum solution; a minus sign should be put ahead of SI to fit with the designed algorithm.

External Measure: Adjusted Rand Index (ARI).
Regardless of advantages of the internal validity measure index, one should have another criterion to validate the produced partition and the goodness of the internal measure. The external measure could be a candidate to solve this problem. The usual strategy is making the comparison between the produced partition and the "solution partition" where "ground-truth" labeling is known. The higher the value of the external measure, the better the produced partition. Around a lot of external measures known to the users, the adjusted Rand index, a modified version of Rand index, is considered to be one of the most popular; more details can be found at [30]. If the P and Q are two given partitions of data set, the formulation of ARI is defined as follows: where is the count of pairs of elements in the same cluster in P and Q; is the count of pairs of elements in the same cluster in P, but in different clusters in Q; is the count of pairs of elements in a different cluster in P, but in the same cluster in Q; is the count of pairs of elements in a different cluster in both P and Q; P and Q are two partitions of set F: one is the produced partition and the other is solution partition, respectively.

Chromosome Presentation.
As presented in the previous subsection, clustering problem can apply the evolutionary algorithm to be solved under population structure. Therefore, the chromosome must be presented to build such a population. Because the proposed algorithm is automatic k-medoids clustering, the chromosome is encoded under a binary form. More specifically, at the position where label 1 appears, it means the PDF at that position is considered as a medoid. In addition, the minimal number of clusters must be at least two since it is meaningless to group all objects into one group. Furthermore, the maximum number of clusters is still a question; in this work, the square root of number of PDFs is considered to be the maximum number of clusters. Hence, depending on random value created initially and considered as the number of medoids in each chromosome ( random ), the number of medoids may vary leading to the different number of clusters. The following example gives a visible look for encoding the chromosome. Suppose having n = 9 PDFs, = { 1 ( ), 2 ( ), . . . , 9 ( )}. Then, some chromosomes can be encoded as follows: From the two above chromosomes, we see that for chromosome 1: the initial value is created randomly random = 3, having 3 PDFs encoded as medoids. The positions 1, 3, and 4 are places having label 1, so the corresponding PDFs 1 , 3 , 4 are considered as the medoids for partitions. Based on this chromosome, the fitness function can be evaluated. The similar explanation is used for chromosome 2.

The Proposed Algorithm.
The aeDE which is one of the state-of-the-art population-based optimization algorithms is firstly introduced by [15]. However, this standard aeDE only tackles the real code chromosomes. According to the best of our knowledge, there is no version of binary aeDE used for bit string chromosomes. In addition, since our problem faces automatic k-medoids clustering, a binary chromosome should be more convenient for computation than the real code one. That is the reason why the standard aeDE should be modified in binary form to match our scope. The detailed process is introduced as follows.
Initialization. The algorithm will commence with a randomly created population, including NP individuals. Each individual is a vector containing n design variables representing n input PDFs px = [ 1 2 . . . ], = {0, 1}. The scheme to create initial individual is as below: (1) px = (1, ); (2) = ( , ([ min , max ])); where (i) n is the total number of PDFs; (ii) min is the minimum number of clusters, usually 2; (iii) max is the maximum number of clusters, usually square root of n; (iv) NP is the population size.
First, a zero vector px is created. Next, the operator randi selects a random integer according to the chosen range. Subsequently, the operator randperm will randomly select = ([ min , max ]) unique integers from 1 to n inclusive. Then, the considered vector x at position random will be replaced by bit 1. After running to NP loops, a population including NP binary individuals will be available for next phases.
Mutation Phase. By suggestion of [15], the mutation process of aeDE algorithm is modified by two adaptive phases. In the first phase, the algorithm hopes to maintain the diversity of the population and tries to prevent the population from jumping into the optimal local solution. Therefore, the operator mutation "rand/1" is employed since it is good at global search [15]. Nevertheless, this operator has its main drawbacks such as slow convergence to optimal global solution and being bad at searching the local optimal solution [31]. That is the reason why the second adaptive phase appears. This phase intends to accelerate convergence speed of the population toward the best individual after the global solution region already located. Thus, the mutation operators, which prefer local solution searching ability, are employed in this phase. They would be "current-to-best/1" or "best/1" due to their effective searching capacity for local solution.
After some experiments, we see that the operator "best/1" may be more suitable for the type of our problem. Hence, this mutation operator is employed for the second phase in the mutation scheme of our algorithm. However, there must be a value called threshold to know when the first phase terminates or the second phase starts. When the absolute value of the deviation of the objective function between the best individual and the entire population in the previous generation (denoted delta) is smaller than the threshold, the transfer will be taken. By this effective modification, the ability to search the global and the local optimal solutions as well as convergence rate is enhanced significantly in comparison with the original DE [15].
Here are some mutation operators already discussed above.
(ii) current-to-best/1: k Probability Estimation Operator. However, it seems like when employing the modification scheme from aeDE, the new mutant individual is still the real-coded one. Therefore, inspired by the idea of [22], in this paper, we employ the scheme of the probability estimation operator to generate the binary coded mutant individual. Here, in mutation process, the author creates multiple probability models at each iteration from information gained from parent individuals. For more details, the probability estimation operator is defined as follows: where (i) F is the scale factor; (ii) 1 , 2 , 3 are the jth-bits of three randomly selected individuals; (iii) b is denoted as bandwidth factor and receives a positive real constant.
By this scheme, three parent individuals will be taken into consideration to capture differential information aiming to establish the probability distribution model. The bit of the mutant individual will then be decided, based on this model, whether "1" or "0". The appearance of parameter b plays a role in tuning the range and shape of the probability distribution model. One suitable value of b will efficiently improve searching capability as well as retaining the diversity of population simultaneously [32].
Once probability estimation vector is determined, the corresponding mutant individual pk is deduced by the following equation: where rand is a uniform distributed random number in range Crossover Phase. Subsequently, the crossover phase aims to produce trial individual pu by employing the binomial crossover operation. This operation is performed by exchanging some elements between the target individual and its mutant individual. It can be illustrated by the equation below: where = 1, 2, . . . , , = 1, 2, . . . , ; rand is an integer number selected from 1 to n; and CR is the crossover rate ranging from [0.7, 1].
Selection Phase. In traditional DE, each trial individual pu generated after crossover phase will be compared with the target vector px to select a better individual for the next generation. However, by this way, some good information of ignored individuals can be forgotten. Somehow, this ignored individual is still better than other ones in the entire population although it is worse than its target individual in comparison with pair-by-pair one. Therefore, the authors Ho-Huu et al. [15] have proposed the elitist selection scheme original from [33] to maintain the good individual in the current population. This scheme will firstly combine both trial individuals in children population C and target individuals in parent population P into one population, called Q. Then, NP best individuals will be selected from this combined population Q to establish population for the next generation. By this way, all the best individuals are always stored and maintained for the next generation. As a result, the convergence rate of the algorithm will be improved, too.
Stopping Condition. As discussed in [15], the baeDE-CDFs will be stopped either when value of delta is greater than that of tolerance or when the maximum number of iteration (Maxiter) is reached. Here, the values of tolerance and Maxiter are, respectively, set to 1e-6 and 2000 for all examples.
By the improvement in aeDE algorithm combined with the probability density estimation, the baeDE-CDFs produces binary chromosome which increases the flexibility of such fast and robust optimization algorithm as aeDE in various circumstances. The flowchart in Figure 1 shows general process of the proposed algorithm baeDE-CDFs.

Simulation Strategy.
In this subsection, the proposed baeDE-CDFs in Section 2.3 is employed to solve numerous clustering problems. The outline of simulation strategy is briefly presented as follows. Firstly, the details of each data set will be presented. Secondly, the influence of the mutant factor F, crossover control parameter CR, population size NP, threshold thres, and bandwidth factor b of baeDE-CDFs on the optimal solution is examined for all examples. Nevertheless, one of these investigations is demonstrated only to avoid prolixity. Subsequently, based on the obtained results of investigations, some adequate parameters are recommended later. Thirdly, a comparison between the k-means based algorithms and k-medoids based baeDE-CDFs in terms of ARI (adjusted Rand index) is designed to measure the effectiveness of baeDE-CDFs. It would be noticed that all algorithms are performed over 50 independent loops to guarantee the stability of algorithm and the result is taken as average value plus the standard deviation. Further, statistics tests are also performed to validate whether the difference is significant or not. Next, another comparison between the evolutionary algorithms in terms of both accuracy and computational time is performed to evaluate the impact of applying the aeDE technique to clustering problem. The statistical tests are also employed there with the same purpose. Finally, some primary conclusions are drawn based on the deduced results. Besides, it is worth noticing that all the problems are implemented on 2015-version Matlab software on an Intel5 Core6 i5-2400 CPU @ 3.10 GHz with 4GB main memory in Windows Server 2010 environment.  some properties as well as effectiveness of the proposed algorithm. Then, it is extended to real application with more complex problem. It is also noticed that the size and complexity of data sets of examples gradually increase from Benchmark examples to real application. Herein, a summary introduction of each data set is presented, but more characteristics of each data set such as total of objects, number of clusters, nominal partition, or complexity level can be found in Table 2. Moreover, the data set has the explicit distribution; the PDF of that date set is estimated based on the explicit formula (example 1 for instance). Nevertheless, if the data set has not known the distribution yet, such as data sets in examples 2, 3, 4, and 5, the kernel method is employed to estimate the PDFs as follows. Let 1 , 2 , . . . , be N discrete elements including n dimensions which are employed to estimate the PDFs. The kernel formula is shown as follows:
In the above formula, there are two important parameters that need to be estimated, the bandwidth h and the kernel function K. Recently, there are various ways to choose these parameters. In this paper, we take the estimation of h according to suggestion of Scott [34] and the kernel function is the Gaussian one.
(i) Example 1. Seven univariate normal probability density functions (PDFs) are considered. This data firstly proposed by [25] is one of the simulated databases with "well-separated" class structure. The details of estimated parameters can be found in [25]. Seven PDFs estimated from these parameters can be seen in Figure 2.
(ii) Example 2. Some samples of Brodatz data set are taken into account [35], which are D4, D10, D12, and D34, respectively, as shown in Figure 3. Each of the samples is cropped randomly into ten pieces with uniform size of 256×256 pixels so that there are forty images in total. The PDFs estimated from these images are demonstrated in Figure 4.
(iii) Example 3. Thirty texture images from UIUC data set are involved in this case, separated into three categories: Floor, Wall, and Upholstery as illustrated in Figure 5 [36]. PDFs estimated from all images are demonstrated in Figure 6. It is obvious that the complexity of this data set is higher than the two previous ones because of more remarkable overlapping classes.   (iv) Example 4. Thirty images derived from the Columbia-Utrecht Reflectance and Texture Database (CUReT) are considered in the example, clustered into three groups: Human Skin, Ribbed Paper, and Insulation as shown in Figure 7. The details of CUReT data set can be found according to the following link http://www1.cs.columbia.edu/CAVE// software/curet/. Estimation of PDFs is depicted in Figure 8. In this case, it can be seen that the complexity is a little more remarkable than the previous cases due to significant overlapping area.
(v) Example 5. In this example, a real application is extended to deal with the traffic problem. Commonly, traffic congestion is always a headache problem in Asian countries, and so is Vietnam. Specially, in rush hour, this problem is likely more complicated. Concerning this problem, this paper contributes one solution by applying the proposed algorithm to determine which photos present traffic jam and which present non-traffic jam. A total of 100 digital images extracted from the short video in front of Ton Duc Thang University are considered. These are divided into two main groups: traffic jam and non-traffic jam, respectively. Some of such photos are presented in Figure 9 and distribution of PDFs estimated from the photos is shown in Figure 10.

Influence of the Parameters F, CR, NP, Thres, and b on the
Optimal Solution. One of the major problems of evolutionary problem is parameter setting. As a result, a survey of influence of parameter setting on algorithm performance in each example will be presented first. This aims to get satisfactory parameters of F, CR, NP, thres, and b of baeDE-CDFs for each problem. However, to avoid wordiness, only typical example is demonstrated in detail. Based on the obtained results, an adequate parameter set is recommended. In this paper, an investigation of how parameters in baeDE-CDFs influence the optimal solution in case of example 5 is presented carefully. In detail, the impact of the mutant factor is described in Table 3 and Figure 11, and the influence of the crossover control parameter is illustrated in Mathematical Problems in Engineering    of baeDE-CDFs would be better than those obtained by other values. Table 5 and Figure 13 depict the impact of population size NP on the optimal solution. It can be found that there is an upward trend in computational cost as the population size is increasing. However, no significant change is found in the value of fitness function, just a minor difference. Therefore, NP = 30 would be a proper choice that can satisfy both the quality solution and computational cost.
The influence of threshold thres on the optimal result is indicated in Table 6 and Figure 14. It can be realized that there is a significant increase in computational cost when the threshold is stricter, in contrast to the trend of fitness value. However, there is just a tiny fluctuation of fitness value when ℎ ∈ [1 3, 1 5]. Therefore, it can be seen that, with a larger value of thres, the algorithm converges rapidly with less iterations, but the result is unstable. In contrast, with a stricter value of thres, the algorithm converges more slowly, but the result is more stable. Hence, it would be appropriate to choose ℎ = 1 3 as the threshold for this example.
The impact of bandwidth on the solution quality is shown in Table 7 and Figure 15. It can be observed that both the computational cost and fitness value are higher than expected when value of bandwidth ranges from 0 to 10. From bandwidth of 20 or greater, the computational cost as well as the solution quality is more stable. However, to achieve a balance between computational cost and solution quality, a bandwidth of 50 is suggested in this case.

Evaluation of Scheme to Select the Representative PDF.
In this subsection, an evaluation regarding the way to choose the representative PDF is presented. In detail, there are two main schemes to select the representative PDF as mentioned before. The first is based on the k-means method, which selects the average of PDFs as the representative PDF; the second relies on the k-medoids method which picks up one of PDFs as the representative PDF. In this paper, the representative PDF is chosen with regard to second way. Therefore, we would like to measure whether the k-medoids based representative PDF is superior to the k-means based one or not. Herein, three k-means based clustering algorithms are considered to have a comparison with the proposed algorithm, baeDE-CDFs in terms of ARI. The first is an automatic clustering algorithm proposed by [24]; the second is a state-of-the-art crisp clustering algorithm suggested by Thao Trang et al., namely, GA-CDF [14]; the third is a nonhierarchical clustering algorithm derived from [25].
The evaluation strategy is outlined as follows. Firstly, the Shapiro-Wilk test is employed to test the normality of ARI derived from algorithms. The result confirms that all ARI of algorithms do not follow a normal distribution due to sig. < 0.05 as illustrated in Table 8 and Figure 16. As a result, some nonparametric tests are used in this case. Specifically, for comparison of mean difference between ARI of baeDE-CDFs and that of remaining algorithms, the Mann-Whitney test is employed; for rank of performance of algorithms, the Kruskal-Wallis test is used. The detailed result of all descriptive statistics, test, and ranking is demonstrated in Table 9. However, it should be noticed that in the case ranks of algorithms' performance are not statistically significantly different, the ranks are not presented in the table. Besides, a sign "+" indicates that there is a statistically significant difference between baeDE-CDFs and compared algorithm and vice versa for a sign "-".
From the table, one common point that can be drawn is that the proposed algorithm is always superior to the remaining algorithms in terms of accuracy. For example, for the first three examples, the baeDE-CDFs maintain the absolute value of ARI, along with the standard deviation being 0. In addition, it is also ranked the first by result of Kruskal-Wallis test and the difference between baeDE-CDFs is statistically significant. These findings confirm that baeDE-CDFs really works well in the first three examples as proved by its accuracy and stability.
In last two examples, there is a downward trend of ARI of most algorithms, including baeDE-CDFs. It is worth noticing that most k-means based algorithms do not perform well in this case. For instance, algorithm of Chen and Hung constantly produces ARI 0 in both last cases; GA-CFD is a bit better but still achieves a low ARI, just around 0.5. In contrast, baeDE-CDFs is likely to be more superior by its ARI   persistently greater than 0.9 in the last two cases. Further, the mean test by Mann-Whitney and ranking by Kruskal-Wallis authenticate the dominance of baeDE-CDFs over the remaining algorithms in last cases.
In general, from performance of algorithms in this part, it is obvious that the k-medoids based algorithm baeDE-CDFs has proved its superiority over the compared k-means algorithms in terms of ARI. In particular, in complex problem like examples 4 and 5, this superiority is more apparent. Therefore, it would be noticed that the way to choose a representative PDF in clustering really plays a pivotal role in algorithm's performance. In particular, through all examples investigated, selecting representative PDF based on kmedoids shows an outstanding performance compared with one based on k-means.

Evaluation within Evolutionary Algorithms
Only. In this subsection, one more survey is executed to measure the performance of evolutionary algorithms applied to clustering problem. To the best of our knowledge, there are two available algorithms using population-based techniques to perform the clustering problem: one is GA-CDF of [14] and the other is MI-LXPM-CDF of [13]. In this circumstance, these algorithms are considered to have a comparison with baeDE-CDFs in five examples. Moreover, all these algorithms will be performed with the same objective function: modified SI. In detail, performance of all algorithms would be measured on both aspects: computational time and accuracy. Herein, computational time is computed based on average time of 50 running loops and so does ARI. Besides, some similar tests as in previous part are also performed to confirm statistical significance of difference in ARI between algorithms.
The detailed result of all algorithms is depicted in Table 10 and Figures 17 and 18. For the first example, we witness a good performance of all algorithms by ARI values approximating 1. Therefore, a statistical nonsignificant mean difference of ARI values between baeDE-CDFs has been found in this case. Similarly, ranking by Kruskal-Wallis is also useless so that ranks are not presented here. However, regarding computational time, it is noticed that baeDE-CDFs consumes the least time and with MILXPM-CDF the opposite is true.
For image data sets, we record the worse performance of two algorithms, namely, GA-CDF and MILXPM-CDF, compared with baeDE-CDFs with regard to accuracy. Both GA-CDF and MILXPM-CDF achieve ARI value lower than 0.5, with even negative value for MILXPM-CDF in case of CUReT data set. Moreover, results from Mann-Whitney test have confirmed the superiority of baeDE-CDFs over the remaining genetic algorithms with statistically significant level of 0.05. Further, ranks derived from Kruskal-Wallis test also show that baeDE-CDFs is always ranked first in four later data sets.
In connection with speed of algorithms' performance in four later data sets, due to the image object presented by a lot of pixel points, the computational time also increases. More specifically, computation speed of GA-CDF and MILXPM-CDF is far more inferior to that of baeDE-CDFs. The baeDE-CDFs is persistently of the first rank, while second and third positions are occupied by GA-CDF and MILXPM-CDF. In particular, in real application, the computational time is Non-traffic jam Traffic jam Figure 9: Two pictures represent traffic state: non-traffic jam and traffic jam, respectively, in front of Ton Duc Thang University.    a burden for evolutionary algorithms in such a problem. Particularly for cases of GA-CDF and MILXPM-CDF, the computational time is overestimated. However, looking back into the proposed method baeDE-CDFs, although the computational time may be remarkable, it seems nothing to that of the two remaining algorithms. Therefore, one would state that aeDE, being modified in binary chromosome and combined with estimation of distribution algorithm and the fitness function Silhouette used for k-medoids, produces more valid results than other genetic algorithms.

Conclusions
In this work, two primary modifications of aeDE are proposed to give the so-called binary adaptive elitist differential evolution for clustering of probability density functions (baeDE-CDFs) for solving clustering problem. Firstly, a kmedoids based scheme is suggested instead of k-means based one to be a representative PDF. This replacement helps the algorithm address outliers well leading to expecting a more accurate result. Secondly, a binary form of aeDE is suggested for the first time, which is inspired by the concept of estimation of distribution algorithm. This helps original aeDE to be more convenient for computation problem as well as taking advantage of the probability distribution model. In addition, an internal validity measure, namely, SI, is modified to be appropriate for k-medoids based automatic clustering and  also to contribute a measure index in field of clustering for PDFs.
The superiority of baeDE-CDFs has been already proved in the numerical example subsection. In comparison with k-means based clustering algorithm, baeDE-CDFs defeats convincingly the other algorithms in the literature in a statistically significant way in terms of quality of solution. As a result, it is always of the first rank according to Kruskal-Wallis nonparametric test. In comparison with evolutionary algorithms only, baeDE-CDFs is still far superior to GA-CDF and MILXPM-CDF both in quality of solution and computational time. In particular, the accuracy and speed of baeDE-CDFs are still maintained in more complex problem as example 5; meanwhile a poor quality of solution and overestimated computational time are witnessed in the remaining algorithms.
These illustrate that the baeDE-CDFs can offer a robust, effective, and reliable optimization method for solving clustering of PDFs, even in complicated cases. In addition, baeDE-CDFs is just quite similar to original aeDE. For that reason, it is simple to understand and implement. Moreover, with fast convergence rate, useful global search ability, and automatic clustering, the baeDE-CDFs can be an attractive tool for dynamic clustering of entirely unknown data sets.   Therefore, from what has been concluded, one can extend the application of the proposed algorithm in numerous fields. For example, it is possible to reduce traffic congestion in Vietnam by applying the proposed algorithm. Further, attenuating such traffic congestion could bring benefit to the citizens because they will spend less time in transit and have more time for their business. In addition, applying the proposed algorithm in recognizing materials in factories is also achievable. Hence, these extensions would be considered in our upcoming research.

Data Availability
For examples 1, 2, 3 and 4, the data used to support the findings of this study are included within the article. For example 5, the traffic data used to support the findings of this study is available from the corresponding author upon request.