Decision-Making Support for the Evaluation of Clustering Algorithms Based on MCDM



Introduction
Clustering is widely applied in the initial stage of big data analysis to divide large data sets into smaller sections, so the data can be comprehended and mastered easily with successive analytic operations [1][2][3]. The processing of massive data relies on the selection of an appropriate clustering algorithm, and the evaluation of clustering algorithms remains an active and significant issue in many subjects, such as fuzzy sets, genomics, data mining, computer science, machine learning, business intelligence, and financial analysis [1][4][5][6]. Computer scientists, economists, political scientists, bioinformatics specialists, sociologists, and many other groups usually debate the potential costs and benefits of analyzing these data to support decision-making [7]. However, the decision-making process is extremely complex because of the competing interests of multiple stakeholders and the intricacy of systems [8][9][10].
Clustering algorithms, which are unsupervised pattern-learning algorithms requiring no a priori information, partition the original data space into smaller sections with high intergroup dissimilarities and high intragroup similarities. Clustering can be used to process various types of massive data to uncover unknown correlations, hidden patterns, and other potentially useful information. However, Naldi et al. [11] pointed out that different clustering algorithms sometimes produce different data partitions. In some situations, different algorithms produce different or even conflicting results.
Therefore, the evaluation of clustering algorithms remains a significant task and a challenging problem.
Several validity measures for assessing clustering algorithms have been presented, such as the Xie-Beni (XB) index [12], the I-index [13], the CS index [14,15], Dunn's index [16,17], and the Davies-Bouldin (DB) index [18,19]. These validity measures are often divided into the three categories of external, relative, and internal measures [20][21][22]. External measures compare the partitions produced by clustering algorithms with a given data partition [20,22]. Relative measures compare partitions produced by the same clustering algorithm with discrepant data subsets or diverse parameters [22]. Internal measures depend on computing the properties of the resulting clusters [22]. Brun et al. [20] stated that relative and internal measures fail in predicting and locating the errors produced by clustering algorithms, and that external measures for evaluating clustering results perform more effectively. Therefore, in our empirical research, we select external measures to evaluate and measure the performance of clustering algorithms. The No Free Lunch (NFL) theorem states that no single model or algorithm can achieve the best performance for every domain problem [23][24][25]. It suggests that the evaluation of clustering algorithms is complicated and challenging. Moreover, different clustering algorithms may produce different or conflicting partitions. The motivation of this paper is the evaluation of clustering algorithms and the reconciliation of their different or even conflicting evaluation results. The reconciliation of these differences or conflicts is an important problem which has not been fully investigated. In addition, the evaluation of clustering algorithms usually involves multiple criteria, so it can be modeled as an MCDM problem.
Based on MCDM, this paper therefore proposes a model, called decision-making support for the evaluation of clustering algorithms (DMSECA), to evaluate and measure the performance of clustering algorithms and to reconcile differences or even conflicts among their evaluation results during a complex decision-making process.
The proposed model consists of three steps. First, we apply the six most influential clustering algorithms to task modeling on 20 UCI data sets with a total of 18,310 instances and 313 attributes. Second, based on nine external measures, we employ four commonly used MCDM approaches to rank the performance of the clustering algorithms over the 20 UCI data sets; each MCDM method is randomly assigned to five UCI data sets. Third, based on the eighty-twenty rule, we propose a decision-making support model that generates a list of algorithm priorities to identify the best clustering algorithm among the 20 UCI data sets for secondary mining and knowledge discovery. The contributions of this article are threefold. First, our proposed DMSECA model can identify the best clustering algorithms for the given data sets by a generated list of algorithm priorities during a complex decision-making process. Second, the proposed model can reconcile differences or even conflicts to achieve agreement in terms of clustering algorithm evaluation.
Third, based on the eighty-twenty rule, expert wisdom is merged into a decision-making support model to carry out secondary knowledge discovery for information fusion in a complex decision-making environment. The rest of this article is organized as follows. Section 2 reviews the related work. Section 3 describes some preliminaries, such as clustering algorithms, MCDM methods, and external measures. Section 4 proposes our model, which merges expert wisdom to reconcile disagreements among clustering algorithm evaluations. Section 5 presents the data sets, provides the experimental design, shows the empirical results, and discusses the significance of this work. Section 6 summarizes this article.

Related Work
Cluster analysis aims to classify elements into categories on the basis of their similarity [26]. In recent years, many clustering algorithms have been proposed [26][27][28][29]. Density peak clustering (DPC) was published by Rodriguez and Laio in Science [26]. In view of the low objectivity and accuracy caused by man-made factors, a density fragment clustering method without peaks was proposed based on density peak clustering [30]. Jiang et al. [28] developed the GDPC algorithm with an alternative decision graph based on gravitation theory and nearby distance to identify centroids and anomalies accurately. To overcome the defect of the original DPC in detecting anomalies and hub nodes, Jiang et al. [29] proposed an improved recognition method for the halo node in density peak clustering (halo DPC). The proposed halo DPC improves the handling of varying densities, irregular shapes, the number of clusters, and outlier and hub node detection [29].
Clustering ensembles have become increasingly popular in recent years by consolidating several base clustering methods into a probably better and more robust one. Alizadeh et al. [31] presented a novel optimization-based method for the combination of cluster ensembles. Parvin and Minaei-Bidgoli [32] proposed a weighted locally adaptive clustering (WLAC) algorithm based on the LAC algorithm. Considering that some features carry more information than others in a data set, Parvin and Minaei-Bidgoli [27] proposed a fuzzy weighted locally adaptive clustering (FWLAC) algorithm, which is capable of handling imbalanced clustering. Abbasi et al. [33] proposed a criterion to assess the association between a cluster and a partition, called the edited normalized mutual information (ENMI) criterion. Mojarad et al. [34] presented a clustering ensemble method, named RCEIFBC, with a new aggregation function that takes into account two similarity criteria: (a) the cluster-cluster similarity and (b) the object-cluster similarity. Mojarad et al. [35] proposed an ensemble aggregator, or consensus function, called the robust clustering ensemble based on sampling and cluster clustering (RCESCC) algorithm, to obtain better clustering results. Rashidi et al. [36] proposed a new clustering ensemble approach using a weighting strategy for performing consensus clustering by exploiting the cluster uncertainty concept. Bagherinia et al. [37] proposed a novel fuzzy clustering ensemble framework based on a new fuzzy diversity measure and a fuzzy quality measure to find the base clusterings with the best performance. In a clustering ensemble, multiple clustering outputs can be combined to produce better results in terms of consistency, robustness, and performance than the basic individual clustering methods.
The evaluation of clustering algorithms is an active issue in fields such as machine learning, data mining, artificial intelligence, databases, and pattern recognition [11]. In a typical clustering scenario, three fundamental questions must be addressed: (i) identifying an effective clustering algorithm suitable for a given data set; (ii) determining how many clusters are present in the data; and (iii) evaluating the clustering [38]. This article focuses on the first problem.
Several validity measures have been proposed to evaluate clustering algorithms. Yeung et al. [39] pointed out that the figure of merit (FOM) is used on microarray data, where different biological groups represent the clusters. Halkidi et al. [40] presented the Rand statistic to measure the proportion of pairs of vectors. Roth et al. [41] presented a stability measure to evaluate the partitioning validity and to choose the number of clusters. Chou et al. [14] presented the CS relative cluster measure to assess clusters with different sizes and densities. Žalik [42] presented the CO cluster-validity measure, based on compactness and overlapping measures, to estimate the quality of partitions. Chou et al. [43] presented an area measure to evaluate the initial cluster number based on the information of cluster areas. Wani and Riyaz [44] presented a new compactness measure using a novel penalty function to describe the typical behavior of a cluster. Azhagiri and Rajesh [45] proposed a novel approach to measure the quality of clusters and to find intrusions using an intrusion unearthing and probability clomp algorithm. Validity measures are often divided into the types of internal, relative, and external measures [20,21,24,25,46]. Internal measures are based on computing properties of the resulting clusters, and these measures do not include additional information on the data [20,25,47]. Relative measures are based on the comparison of partitions produced by the same clustering algorithm with different data subsets or different parameters, and they do not demand additional information [20,25,39]. External measures compare the partitions produced by clustering algorithms with a given data partition [20,25,48]. These correspond to a kind of error measurement, so they can be supposed to offer improved correlation to the true error [20]. The results of Brun et al. [20] indicate that external measures for evaluating clustering results are more accurate than internal or relative measures. Thus, external measures are selected to assess the performance of clustering algorithms.
In addition, the evaluation of clustering algorithms involves more than one criterion; thus, it can be solved by MCDM methods. This differs from previous approaches. For example, Dudoit and Fridlyand [49] proposed a prediction-based resampling method to evaluate the number of clusters, and Sugar and James [50] chose the number of clusters by an information-theoretical approach. Peng et al. [51] developed an MCDM-based method to select the number of clusters. Peng et al. [52] also developed a framework to select the appropriate clustering algorithm and to further choose the number of clusters. Meyer and Olteanu [53] indicated that clustering in the field of multicriteria decision aid (MCDA) has seen a few adaptations of methods from data analysis, most of them, however, using concepts native to that field, such as the notions of similarity and distance measures. Besides, Chen et al. [54] pointed out that the clustering problem is one of the well-known MCDA problems and that the existing versions of the K-means clustering algorithm are only used for partitioning the data into several clusters that do not have priority relations; therefore, Chen et al. [54] proposed a complete ordered clustering algorithm, called the ordered K-means clustering algorithm, which considers the preference degree between any two alternatives. Mahdiraji et al. [55] presented marketing strategy evaluation based on big data analysis using a clustering-MCDM approach. This paper takes a new perspective by proposing a DMSECA model based on MCDM methods, merging expert wisdom through the eighty-twenty rule to select the best clustering algorithms for the given data sets during a complex decision process. Furthermore, our proposed DMSECA model can reconcile different or even conflicting evaluation results to reach a group agreement for information fusion in a complex decision-making environment. The eighty-twenty rule was proposed by Pareto [56], who researched the wealth distribution in different countries.
The eighty-twenty rule is based on the observation that, in most countries, about 80% of the wealth is controlled by about 20% of the people, which Pareto called a "predictable imbalance" [57]. The eighty-twenty rule has been extended to many fields, such as sociology and quality control [58]. In this work, the eighty-twenty rule is used to focus the analysis on the most important positions of the rankings in relation to the number of observations for predictable imbalance. The truth is often in a few hands: the views of about 20% of the people represent the more satisfactory rankings in the opinion of all participants. The decision-making process is extremely complex because of the competing interests of multiple stakeholders and the intricacy of systems [8][9][10]. In this paper, the proposed DMSECA model, based on MCDM methods and the eighty-twenty rule, presents a new perspective by merging expert wisdom to identify the most appropriate clustering algorithm for the given data sets, and the proposed model can reconcile individual differences or conflicts to achieve group agreement among clustering algorithm evaluations in a complex decision-making environment.

Preliminaries
This section presents some elementary and preparatory knowledge. It first introduces several clustering algorithms in Section 3.1; then, the classic MCDM methods are presented in Section 3.2; finally, the performance measures of clustering algorithms are described in Section 3.3.

Clustering Algorithms.
Clustering is a popular unsupervised learning technique. It aims to divide large data sets into smaller sections so that objects in the same cluster are barely distinct, whereas objects in different clusters are barely similar [21]. Clustering algorithms, based on similarity criteria, can group patterns, where groups are sets of similar patterns [54,59,60]. Clustering algorithms are widely applied in many research fields, such as genomics, image segmentation, document retrieval, sociology, bioinformatics, psychology, business intelligence, and financial analysis [61][62][63][64].
Clustering algorithms are usually grouped into the four classes of partitioning methods, hierarchical methods, density-based methods, and model-based methods [65]. Several classic clustering algorithms have been proposed and reported, such as the K-means algorithm [66], the k-medoid algorithm [67], expectation maximization (EM) [68], and frequent pattern-based clustering [65]. In this paper, the six most influential clustering algorithms are selected for the empirical study. These are the KM algorithm, the EM algorithm, filtered clustering (FC), the farthest-first (FF) algorithm, make-density-based clustering (MD), and hierarchical clustering (HC). These clustering algorithms can be implemented in WEKA [69].
The KM algorithm, a partitioning method, takes an input parameter k and partitions a set of n objects into k clusters so that the resulting intracluster similarity is high and the intercluster similarity is low. Cluster similarity is measured by the mean value of the objects in a cluster, which can be viewed as the centroid or center of gravity of the cluster [65]. The EM algorithm, which can be considered an extension of the KM algorithm, is an iterative method to find the maximum likelihood or maximum a posteriori estimates of parameters in statistical models, where the model depends on unobserved latent variables [70]. The KM algorithm assigns each object to a single cluster.
In the EM algorithm, each object is assigned to each cluster according to a weight representing its probability of membership. In other words, there are no strict boundaries between the clusters. Thus, new means can be computed based on the weighted measures [68]. The FC applied in this work can be implemented in WEKA [69]. Like the clusterer, the structure of the filter is based exclusively on the training data, and test instances are processed by the filter without changing their structure. The FF algorithm is a fast, greedy, and simple approximation algorithm for the k-center problem [67], in which a first point is selected as a cluster center and the second center is greedily selected as the point farthest from the first. Each remaining center is determined by greedily selecting the point farthest from the set of chosen centers, and the remaining points are then added to the cluster whose center is closest [66,71]. The MD algorithm is a density-based method. The general idea is to continue growing a given cluster as long as the density (the number of objects or data points) in the neighborhood exceeds some threshold; that is, for each data point within a given cluster, the neighborhood of a given radius must contain a minimum number of points [65]. The HC algorithm is a method of cluster analysis that seeks to build a hierarchy of clusters, which can create a hierarchical decomposition of the given data sets [66,72].
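As a rough illustration of the farthest-first traversal described above, the following sketch picks each new center as the point farthest from those already chosen; the function name and point-tuple input format are illustrative, not WEKA's API:

```python
import math
import random

def farthest_first(points, k, seed=0):
    """Greedy farthest-first traversal for the k-center problem.

    The first center is picked arbitrarily; each subsequent center is
    the point farthest from the set of already-chosen centers.
    """
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # the next center maximizes the distance to its nearest chosen center
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(farthest)
    # finally, assign every point to its closest center
    labels = [min(range(k), key=lambda i: math.dist(p, centers[i])) for p in points]
    return centers, labels
```

Since each center is chosen greedily, the result depends on the arbitrary first center but is guaranteed to be a 2-approximation for the k-center objective.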

MCDM Methods.
The MCDM methods, which were developed in the 1970s, are a complete set of decision analysis technologies that have evolved into an important research field of operations research [73,74]. The International Society on MCDM defines MCDM as the study of methods and procedures concerning multiple conflicting criteria that can be formally incorporated into the management planning process [73]. In an MCDM problem, the evaluation criteria are assumed to be independent [75,76]. MCDM methods aim to assist decision-makers (DMs) in identifying an optimal solution from a number of alternatives by synthesizing objective measurements and value judgments [77,78]. In this section, four classic MCDM methods, the weighted sum method (WSM), grey relational analysis (GRA), TOPSIS, and PROMETHEE II, are introduced.

WSM. The WSM [79] is a well-known MCDM method for evaluating finite alternatives in terms of finite decision criteria when all the data are expressed in the same unit [80,81]. The benefit-to-cost-ratio and benefit-minus-cost approaches [82] can be applied to problems involving both benefit and cost criteria. In this paper, the cost criteria are first transformed into benefit criteria. There is also the nominal-the-better (NB) case, in which a value closer to the objective value is better. The calculation steps of WSM are as follows. First, assume n criteria, including benefit criteria and cost criteria, and m alternatives. The cost criteria are converted to benefit criteria in the following standardization process:
(1) The larger-the-better (LB): a larger value is better, that is, the benefit criteria.
(2) The smaller-the-better (SB): a smaller value is better, that is, the cost criteria.
(3) The nominal-the-better (NB): a value closer to the objective value is better.
Finally, the total benefit of each alternative is calculated as the weighted sum of its standardized values. A larger WSM value indicates a better alternative.

GRA.
GRA is a basic MCDM method combining quantitative research and qualitative analysis for system analysis [83]. Based on grey space, it can address inaccurate and incomplete information [84]. GRA has been widely applied in modeling, prediction, systems analysis, data processing, and decision-making [83][85][86][87][88]. The principle is to analyze the similarity relationship between the reference series and the alternative series [89]. The detailed steps are as follows.
Assume that the initial matrix is R.
(1) Standardize the initial matrix.
(2) Generate the reference sequence x0′, where x0′(j) is the largest standardized value in the jth factor.
(3) Calculate the differences Δ0i(j) between the reference series and the alternative series.
(4) Calculate the grey coefficient r0i(j), where δ is a distinguishing coefficient. The value of δ is generally set to 0.5 to provide good stability.
(5) Calculate the value of the grey relational degree bi.
(6) Finally, standardize the value of the grey relational degree βi.
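The six GRA steps above can be sketched as follows, assuming all criteria are benefit-type and using min-max standardization (one common choice); the exact normalization may differ from the formulas used in this paper:

```python
import numpy as np

def grey_relational_degree(X, delta=0.5):
    """Grey relational analysis: rank alternatives against an ideal reference."""
    X = np.asarray(X, dtype=float)
    # (1) min-max standardize each criterion column
    R = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    # (2) the reference sequence is the column-wise maximum
    x0 = R.max(axis=0)
    # (3) absolute differences from the reference series
    d = np.abs(R - x0)
    # (4) grey relational coefficients; delta is the distinguishing coefficient
    r = (d.min() + delta * d.max()) / (d + delta * d.max())
    # (5) grey relational degree: average coefficient per alternative
    b = r.mean(axis=1)
    # (6) standardize the degrees so they sum to one
    return b / b.sum()
```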

TOPSIS. Developed by Hwang and Yoon [90], TOPSIS is one of the classic MCDM methods for ranking alternatives over multiple criteria. The principle is that the chosen alternative should have the shortest distance from the positive ideal solution (PIS) and the farthest distance from the negative ideal solution (NIS) [91]. TOPSIS finds the best alternative by minimizing the distance to the PIS and maximizing the distance to the NIS [92]. The alternatives can be ranked by their relative closeness to the ideal solution. The calculation steps are as follows [93]:
(1) The decision matrix A is standardized.
(2) The weighted standardized decision matrix is computed, where the wj are the criteria weights and ∑j wj = 1.
(3) The PIS V* and the NIS V− are calculated.
(4) The distances of each alternative from the PIS and the NIS are determined.
(5) The relative closeness to the ideal solution is obtained; when R is closer to 1, the alternative is closer to the ideal solution.
(6) The preference order is ranked: a larger relative closeness indicates a better alternative.
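A minimal sketch of the TOPSIS steps, assuming benefit-type criteria and vector (Euclidean) normalization, which is one common variant:

```python
import numpy as np

def topsis(X, weights):
    """TOPSIS: rank alternatives by relative closeness to the ideal solution."""
    X = np.asarray(X, dtype=float)
    # (1) vector-normalize the decision matrix
    R = X / np.linalg.norm(X, axis=0)
    # (2) apply the criteria weights (weights sum to 1)
    V = R * np.asarray(weights)
    # (3) positive and negative ideal solutions
    v_pos, v_neg = V.max(axis=0), V.min(axis=0)
    # (4) Euclidean distances to the PIS and the NIS
    d_pos = np.linalg.norm(V - v_pos, axis=1)
    d_neg = np.linalg.norm(V - v_neg, axis=1)
    # (5) relative closeness: 1 means the alternative coincides with the PIS
    return d_neg / (d_pos + d_neg)
```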

PROMETHEE II. PROMETHEE II, proposed by Brans in 1982, uses pairwise comparisons and valued outranking relations to select the best alternative [94]. PROMETHEE II can support DMs in reaching an agreement on feasible alternatives over multiple criteria from different perspectives [95,96]. In the PROMETHEE II method, a positive outranking flow reveals that the chosen alternative outranks all alternatives, whereas a negative outranking flow reveals that the chosen alternative is outranked by all alternatives [51,97]. Based on the positive and negative outranking flows, the final alternative can be selected and determined by the net outranking flow [98]. The steps are as follows:
(1) Normalize the decision matrix R.
(2) Define the aggregated preference indices. Let a, b ∈ A, where A is a finite set of alternatives a1, a2, ..., an, k is the number of criteria such that 1 ≤ k ≤ m, wj is the weight of criterion j, and ∑kj=1 wj = 1. π(a, b) represents the degree to which a is preferred to b over all criteria, and π(b, a) the degree to which b is preferred to a. Pj(a, b) and Pj(b, a) are the preference functions of the alternatives a and b.
(3) Select a preference function for each criterion. In general, there are six types of preference function, and DMs must select one type and the corresponding parameter value for each criterion [51,98].
(4) Determine the positive outranking flow and the negative outranking flow.
(5) Calculate the net outranking flow.
(6) Determine the ranking according to the net outranking flow: a larger ϕ(a) indicates a more appropriate alternative.
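The steps above can be sketched with the "usual" preference function (strict preference on any positive difference); since PROMETHEE II allows six preference function types, this is only one concrete instance, and all criteria are treated as benefit-type:

```python
import numpy as np

def promethee_ii(X, weights):
    """PROMETHEE II net outranking flows for a decision matrix X."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    w = np.asarray(weights)
    # aggregated preference index pi(a, b) over all criteria
    pi = np.zeros((n, n))
    for a in range(n):
        for b in range(n):
            # usual preference function: P_j(a, b) = 1 if a beats b on criterion j
            pi[a, b] = w[X[a] > X[b]].sum()
    phi_pos = pi.sum(axis=1) / (n - 1)   # positive outranking flow
    phi_neg = pi.sum(axis=0) / (n - 1)   # negative outranking flow
    return phi_pos - phi_neg             # net flow: larger is better
```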

Performance Measures.
Brun et al. [20] proposed that external measures for evaluating clustering results are more effective than internal and relative measures. Accordingly, in this study, nine external clustering measures are selected for evaluation. These are entropy, purity, microaverage precision (MAP), the Rand index (RI), the adjusted Rand index (ARI), the F-measure (FM), the Fowlkes-Mallows index (FMI), the Jaccard coefficient (JC), and the Mirkin metric (MM). Among them, the measures of entropy and purity are widely applied as external measures in the fields of data mining and machine learning [99,100]. The nine external measures are computed on a machine with an Intel Core i5-3210M CPU @ 2.50 GHz and 8 GB of memory. Before introducing the external measures, the contingency table is described.

The Contingency Table. Given a data set D with n objects, suppose we have a partition P = {P1, P2, ..., Pk} produced by some clustering method, where ∪ki=1 Pi = D and Pi ∩ Pj = ∅ for 1 ≤ i ≠ j ≤ k. According to the preassigned class labels, we can create another partition C = {C1, C2, ..., Cm} on D. Let nij denote the number of objects in cluster Pi with the label of class Cj. Then, the data information between the two partitions can be displayed in the form of a contingency table, as shown in Table 1 [65]. The following paragraphs define the external measures.
(1) Entropy. The measure of entropy, which originated in the information-retrieval community, measures the variance of a probability distribution. If all clusters consist of objects with only a single class label, the entropy is zero; as the class labels of objects in a cluster become more varied, the entropy increases [101]. A lower entropy value usually indicates more effective clustering.
(2) Purity. The measure of purity pays close attention to the representative class (the class with the majority of objects within each cluster) [102]. Purity is similar to entropy. A higher purity value usually represents more effective clustering.
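Both measures can be computed directly from the contingency table introduced above; this sketch uses base-2 logarithms and size-weighted cluster entropies, which is a common convention:

```python
import numpy as np

def entropy_and_purity(N):
    """Entropy and purity from a cluster-by-class contingency table N,
    where N[i][j] counts the objects of class j falling in cluster i."""
    N = np.asarray(N, dtype=float)
    n = N.sum()
    sizes = N.sum(axis=1)                       # objects per cluster
    P = N / sizes[:, None]                      # class distribution within each cluster
    with np.errstate(divide='ignore', invalid='ignore'):
        # per-cluster entropy, treating 0 * log(0) as 0
        H = -np.where(P > 0, P * np.log2(P), 0.0).sum(axis=1)
    entropy = (sizes / n) @ H                   # size-weighted average entropy
    purity = N.max(axis=1).sum() / n            # majority class within each cluster
    return entropy, purity
```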
(3) F-Measure. The F-measure (FM) is a harmonic mean of precision and recall. It is commonly considered a measure of clustering accuracy [103], and its calculation is inspired by the corresponding information-retrieval metric.

A higher value of FM generally indicates more accurate clustering.
(4) Microaverage Precision. The MAP is usually applied in the information-retrieval community [104]. It obtains a clustering result by assigning all data objects in a given cluster to the most dominant class label and then evaluating the following quantities for each class [60]: (1) α(Cj): the number of objects correctly assigned to class Cj; (2) β(Cj): the number of objects incorrectly assigned to class Cj. A higher MAP value indicates more accurate clustering.
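One common set-matching variant of the clustering F-measure can also be computed from the contingency table; the exact matching convention varies across sources, so the sketch below (each class weighted by its size and matched with its best-scoring cluster) is illustrative rather than the paper's exact formula:

```python
import numpy as np

def f_measure(N):
    """Set-matching clustering F-measure from a cluster-by-class
    contingency table N (N[i][j] = objects of class j in cluster i)."""
    N = np.asarray(N, dtype=float)
    n = N.sum()
    prec = N / N.sum(axis=1, keepdims=True)   # precision of cluster i for class j
    rec = N / N.sum(axis=0, keepdims=True)    # recall of cluster i for class j
    with np.errstate(divide='ignore', invalid='ignore'):
        # harmonic mean per (cluster, class) pair, 0 where they do not overlap
        F = np.where(N > 0, 2 * prec * rec / (prec + rec), 0.0)
    # weight each class by its size and take its best-matching cluster
    return (N.sum(axis=0) / n * F.max(axis=0)).sum()
```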

(5) Mirkin Metric. The Mirkin metric (MM) assumes the null value for identical clusterings and a positive value otherwise. It corresponds to the Hamming distance between the binary vector representations of each partition [105]. A lower value of MM implies more accurate clustering. In addition, given a data set, assume a partition C is a clustering structure of the data set and P is a partition produced by some clustering method. We refer to a pair of points from the data set as follows:
(i) SS: both points belong to the same cluster of the clustering structure C and to the same group of the partition P
(ii) SD: the points belong to the same cluster of C and to different groups of P
(iii) DS: the points belong to different clusters of C and to the same group of P
(iv) DD: the points belong to different clusters of C and to different groups of P
Assume that a, b, c, and d are the numbers of SS, SD, DS, and DD pairs, respectively, and that M = a + b + c + d, which is the maximum number of pairs in the data set. The following indicators for measuring the degree of similarity between C and P can be defined.
(6) Rand Index. The RI is a measure of the similarity between two data clusterings in statistics and data clustering [106]. A higher value of RI indicates a more accurate clustering result.
(7) Jaccard Coefficient. The JC, also known as the Jaccard similarity coefficient (originally named the "coefficient de communauté" by Paul Jaccard), is a statistic applied to compare the similarity and diversity of sample sets [107]. A higher value of JC indicates a more accurate clustering result.

(8) Fowlkes and Mallows Index. The Fowlkes and Mallows index (FMI) was proposed by Fowlkes and Mallows [108] as an alternative to the RI. A higher value of FMI indicates more accurate clustering.
(9) Adjusted Rand Index. The adjusted Rand index (ARI) is the corrected-for-chance version of the RI [106]. It ranges from −1 to 1 and expresses the level of concordance between two bipartitions [109]. A value of ARI close to 1 indicates almost perfect concordance between the two compared bipartitions, whereas a value near −1 indicates almost complete discordance [110].
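The pair-based indices (RI, JC, FMI, and ARI) can all be computed from the SS/SD/DS/DD counts a, b, c, and d defined above; the ARI line below uses the standard pair-counting form of the chance correction:

```python
from itertools import combinations

def pair_counts(labels_true, labels_pred):
    """Count SS, SD, DS, and DD point pairs between two partitions."""
    a = b = c = d = 0
    for (t1, p1), (t2, p2) in combinations(zip(labels_true, labels_pred), 2):
        same_t, same_p = t1 == t2, p1 == p2
        if same_t and same_p:
            a += 1          # SS
        elif same_t:
            b += 1          # SD
        elif same_p:
            c += 1          # DS
        else:
            d += 1          # DD
    return a, b, c, d

def pair_indices(labels_true, labels_pred):
    a, b, c, d = pair_counts(labels_true, labels_pred)
    M = a + b + c + d
    ri = (a + d) / M                              # Rand index
    jc = a / (a + b + c)                          # Jaccard coefficient
    fmi = a / ((a + b) * (a + c)) ** 0.5          # Fowlkes-Mallows index
    expected = (a + b) * (a + c) / M              # chance-expected agreement
    ari = (a - expected) / (((a + b) + (a + c)) / 2 - expected)
    return ri, jc, fmi, ari
```

The Mirkin metric also follows from the same counts as twice the number of disagreeing pairs, 2(b + c).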

Index Weights.
In this work, the index weights of the four MCDM methods are calculated by AHP. The AHP method, proposed by Saaty [111], is a widely used tool for modeling unstructured problems by synthesizing subjective and objective information in many disciplines, such as politics, economics, biology, sociology, management science, and the life sciences [112][113][114]. It can elicit a corresponding priority vector from pairwise comparison values [115] obtained from the scores of experts on an appropriate scale [116]. AHP has some problems; for example, the priority vector derived from the eigenvalue method can violate a condition of order preservation, as noted by Costa and Vansnick [117]. However, AHP is still a classic and important approach, especially in the fields of operations research and management science [118]. AHP has the following steps: (1) Establish a hierarchical structure: a complex problem can be expressed in such a structure, including the goal level, the criteria level, and the alternative level [119,120].
(2) Determine the pairwise comparison matrix: once the hierarchy is structured, the prioritization procedure starts by determining the relative importance of the criteria (index weights) within each level [119,121,122]. The pairwise comparison values are obtained from the scores of experts on a 1-9 scale [116]. (3) Calculate the index weights: the index weights are usually calculated by the eigenvector method [120] proposed by Saaty [111]. (4) Test consistency: the value of 0.1 is generally considered the acceptable upper limit of the consistency ratio (CR). If the CR exceeds this value, the procedure must be repeated to improve consistency [119,121].
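Steps (3) and (4) can be sketched as follows; the random consistency index table is Saaty's standard one for matrices of order up to 9, and the function name is illustrative:

```python
import numpy as np

# Saaty's random consistency index for matrices of order 1..9
RI_TABLE = [0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45]

def ahp_weights(A):
    """Index weights from a pairwise comparison matrix via the eigenvector
    method, plus the consistency ratio (CR <= 0.1 is acceptable)."""
    A = np.asarray(A, dtype=float)
    n = len(A)
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)               # principal eigenvalue
    w = np.abs(eigvecs[:, k].real)
    w /= w.sum()                              # normalized priority vector
    ci = (eigvals[k].real - n) / (n - 1)      # consistency index
    cr = ci / RI_TABLE[n - 1] if RI_TABLE[n - 1] > 0 else 0.0
    return w, cr
```

For a perfectly consistent matrix, the principal eigenvalue equals n, so CI and CR are zero.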

The Proposed Model
Clustering results can vary according to the evaluation method. Rankings can conflict even when abundant data are processed, and a large knowledge gap can exist between the evaluation results [123] due to the anticipation, experience, and expertise of the individual participants. The decision-making process is extremely complex, which makes it difficult to make accurate and effective decisions [124]. As mentioned in Section 1, the proposed DMSECA model consists of three steps, which are as follows. The first step usually involves modeling by clustering algorithms, which can be accomplished using one or more procedures selected from the categories of hierarchical, density-based, partitioning, and model-based methods [65]. In this step, we apply the six most influential clustering algorithms, including EM, the FF algorithm, FC, HC, MD, and KM, for task modeling by using WEKA 3.7 on 20 UCI data sets, including a total of 18,310 instances and 313 attributes. Each of these clustering algorithms belongs to one of the four categories of clustering algorithms mentioned previously; hence, all categories are represented.
In the second step, four commonly used MCDM methods (TOPSIS, WSM, GRA, and PROMETHEE II) are applied to rank the performance of the clustering algorithms over the 20 UCI data sets, using the nine external measures computed in the first step as the input. These methods are highly suitable for the given data sets; unsuitable methods were not selected. For example, we did not select VIKOR because its denominator would be zero for the given data sets. The index weights are determined by AHP based on the eigenvalue method. Three experts from the field of MCDM are selected and consulted as the DMs to derive the pairwise comparison values, which are completed from the scores of the experts. We randomly assign each MCDM method to five UCI data sets. Applying more than one MCDM method to analyze and evaluate the performance of the clustering algorithms is essential.
Finally, in the third step, we propose a decision-making support model to reconcile the individual differences or even conflicts in the evaluation performance of the clustering algorithms among the 20 UCI data sets. The proposed model can generate a list of algorithm priorities to select the most appropriate clustering algorithm for secondary mining and knowledge discovery. The detailed steps of the decision-making support model, based on the 80-20 rule, are described as follows.
Step 1. Mark two sets of alternatives in a lower position and an upper position, respectively.
It is well known that the eighty-twenty rule states that eighty percent of the results originate in twenty percent of the activity in most situations [58]. The rule is credited to Vilfredo Pareto [56], who observed that in most countries eighty percent of the wealth is controlled by twenty percent of the people [57]. The implication is that it is better to be in the top 20% than in the bottom 80%. The eighty-twenty rule, introduced in Section 5, can therefore be applied to focus the analysis on the most important positions of the rankings in relation to the number of observations for predictable imbalance; it indicates that the twenty percent of people who create eighty percent of the results are highly leveraged. In this research, based on the expert wisdom originating from these twenty percent of people, the set of alternatives is classified into two categories: the top 1/5 of the alternatives is marked in an upper position, which represents the more satisfactory rankings in the opinion of all individual participants involved in the algorithm evaluation process, and the bottom 1/5 is marked in a lower position, which represents the more dissatisfactory rankings. The element marked in the upper position is calculated as x = ⌈n × (1/5)⌉, where n is the number of alternatives. For instance, if n = 7, then x = 7 × 1/5 = 1.4 ≈ 2. Hence, the second position classifies the ranking: the first and second positions are the alternatives in the upper position, which are considered the collective group idea of the most appropriate and satisfactory alternatives. Similarly, the element marked in the lower position is calculated as y = ⌈n × (4/5)⌉, where n is the number of alternatives. For instance, if n = 7, then y = 7 × 4/5 = 5.6 ≈ 6.
Thus, the sixth position classifies the ranking: the sixth and seventh positions, in the lower position, are collectively considered the worst and most dissatisfactory alternatives.
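The marking step above can be sketched as follows; ceiling rounding is assumed, consistent with the n = 7 examples (1.4 ≈ 2 and 5.6 ≈ 6).

```python
import math

def mark_positions(n):
    """Split ranking positions 1..n by the 80-20 rule: the top fifth of
    the positions forms the upper set, the bottom fifth the lower set."""
    x = math.ceil(n * 1 / 5)           # last position of the upper set
    y = math.ceil(n * 4 / 5)           # first position of the lower set
    upper = list(range(1, x + 1))      # positions 1..x
    lower = list(range(y, n + 1))      # positions y..n
    return upper, lower

upper, lower = mark_positions(7)
```

For n = 7 this marks positions 1 and 2 as the upper set and positions 6 and 7 as the lower set, matching the worked example.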
Step 2. Grade the sets of alternatives in the lower and upper positions, respectively. A score is assigned to each position in each of the two sets.
8 Complexity

The score in the lower position can be calculated by assigning a value of 1 to the first position, 2 to the second position, . . ., and x to the last position. Finally, the score of each alternative in the lower position is totaled and marked d.
Similarly, the score in the upper position can be calculated by assigning a value of 1 to the last position, 2 to the penultimate position, . . ., and x to the first position. Finally, the score of each alternative in the upper position is totaled and marked b.
Step 3. Generate the priority of each alternative. The priority of each alternative, f_i, which represents the most satisfactory ranking according to the opinions of all individual participants, can be determined from the totaled scores b_i and d_i, where a higher value of f_i implies a higher priority.
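Steps 1-3 can be combined into a short sketch. Since the exact form of the paper's priority equation is not reproduced here, the sketch assumes the difference f_i = b_i − d_i, which satisfies the stated requirement that a higher f_i implies a higher priority; the example rankings are illustrative, not the paper's results.

```python
from collections import defaultdict

def priorities(rankings, upper_size, lower_size):
    """Aggregate individual rankings (each a list of alternatives, best
    first) into priorities f = b - d (assumed form of the priority)."""
    b = defaultdict(int)  # upper-position score: higher = more preferred
    d = defaultdict(int)  # lower-position score: higher = more disliked
    for ranking in rankings:
        n = len(ranking)
        # upper set: x points for position 1, ..., 1 point for position x
        for pos in range(upper_size):
            b[ranking[pos]] += upper_size - pos
        # lower set: 1 point for its first position, ..., x for the last
        for k, pos in enumerate(range(n - lower_size, n)):
            d[ranking[pos]] += k + 1
    return {a: b[a] - d[a] for a in set(b) | set(d)}

# Three illustrative rankings of the six algorithms (best first)
rankings = [["KM", "FC", "HC", "MD", "FF", "EM"],
            ["KM", "HC", "FC", "FF", "MD", "EM"],
            ["FC", "KM", "MD", "HC", "EM", "FF"]]
f = priorities(rankings, upper_size=2, lower_size=2)
best = max(f, key=f.get)
```

With n = 6, the upper and lower set sizes of 2 follow the 80-20 marking rule from Step 1.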

Experimental Design and Results
We now present an experiment on 20 UCI data sets, designed to test and verify our proposed DMSECA model for the performance evaluation of clustering algorithms, in order to reconcile individual differences or even conflicts in the evaluation performance of clustering algorithms based on MCDM in a complex decision-making environment. The experimental data sets, experimental design, and experimental results are as follows.

Data Sets.
A total of 20 data sets are applied for the performance evaluation of clustering algorithms in the experiment. They originate from the UCI repository (http://archive.ics.uci.edu/ml/) [125]. For these 20 data sets, the number of instances ranges from 100 to 4,601, the number of attributes from 3 to 60, and the number of classes from 2 to 10.

Experimental Design.
In this section, the experimental design is described in detail to examine the feasibility and effectiveness of our proposed DMSECA model. The model is verified by applying the four MCDM methods introduced in Section 3.2 to estimate the performance of the clustering algorithms on the 20 selected public-domain UCI machine learning data sets. Each MCDM method is randomly assigned to five UCI data sets. The experimental design is implemented as follows.
Input: 20 UCI data sets.
Output: rankings of the evaluation performance of the clustering algorithms, used to generate a list of algorithm priorities in order to select the best clustering algorithm and reconcile individual disagreements among the evaluations.
Step 1: prepare the target data sets: preprocess the original data sets by deleting their class labels.
Step 2: obtain clustering solutions: obtain the clustering solutions of the six classic clustering algorithms introduced in Section 3.1 by WEKA, based on the target data sets.
Step 3: calculate the values of nine external measures of each data set.
Step 4: obtain the weights of the external measures. In this paper, the weights of the external measures are obtained by AHP based on the eigenvalue method, with pairwise comparisons scored by three invited and consulted experts.
Step 5: use WSM, TOPSIS, PROMETHEE II, and GRA to generate rankings of the evaluation performance of the clustering algorithms. Each MCDM method is randomly assigned five of the 20 UCI data sets. The four MCDM methods are implemented in MATLAB 7.0, using the external measures as the input.
Step 6: achieve consensus. The consensus on different or even conflicting individual rankings of the evaluation performance of the clustering algorithms can be achieved by using the proposed decision-making support model in the third step, which merges expert wisdom.
Step 7: generate a list of algorithm priorities. The list can reconcile individual disagreements among the evaluation performance of the clustering algorithms.
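For illustration, two of the nine external measures used in Step 3 (Purity and the Rand index) can be computed as in the following sketch; the cluster and class label vectors are toy data, not taken from the UCI sets.

```python
from collections import Counter
from itertools import combinations

def purity(clusters, classes):
    """Fraction of instances assigned to the majority class of their cluster."""
    total = 0
    for c in set(clusters):
        members = [cls for k, cls in zip(clusters, classes) if k == c]
        total += Counter(members).most_common(1)[0][1]
    return total / len(clusters)

def rand_index(clusters, classes):
    """Fraction of instance pairs on which the clustering and the true
    class labels agree (same/same or different/different)."""
    agree = 0
    pairs = list(combinations(range(len(clusters)), 2))
    for i, j in pairs:
        same_cluster = clusters[i] == clusters[j]
        same_class = classes[i] == classes[j]
        agree += same_cluster == same_class
    return agree / len(pairs)

clusters = [0, 0, 0, 1, 1, 1]
classes = ["a", "a", "b", "b", "b", "a"]
p = purity(clusters, classes)
r = rand_index(clusters, classes)
```

Both measures lie in [0, 1] with larger values indicating better agreement, so Purity and Rand enter the MCDM matrix as benefit criteria.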

Experimental Results.
This section gives the results obtained by testing the proposed DMSECA model on the 20 UCI data sets, which include a total of 18,310 instances and 313 attributes, to reconcile the individual differences or conflicts among the evaluation performance of the clustering algorithms. The six clustering algorithms, nine external measures, and four MCDM methods are applied to illustrate and explain our model. The experimental results are as follows.

First, the values of the nine external measures on the 20 data sets are obtained using the six selected clustering algorithms. The process is implemented according to Steps 1-3 in Section 5.2. To facilitate understanding, we select the Ionosphere data set as an example to explain the computational process. The initial values of the nine external measures, which are provided in Table 3, are standardized by equations (1)-(3) to transform cost criteria into benefit criteria. The standardized data are presented in Table 4, with the optimal result of each external measure highlighted in boldface. It is clear that no clustering algorithm obtains the optimal results for all external measures, which supports the NFL theorem.
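The exact form of the standardization in equations (1)-(3) is not reproduced here; the following sketch assumes a common min-max form that maps each measure to [0, 1] and inverts cost criteria (such as the entropy measure En) so that larger is always better.

```python
import numpy as np

def standardize(X, is_benefit):
    """Min-max standardize each column to [0, 1]; cost criteria are
    inverted so that larger values are always better. This is one
    common form and may differ in detail from equations (1)-(3)."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against zero range
    Z = (X - lo) / span
    cost = ~np.asarray(is_benefit)
    Z[:, cost] = 1.0 - Z[:, cost]            # flip cost criteria
    return Z

# Hypothetical values: Purity (benefit) and entropy En (cost)
X = [[0.90, 0.40],
     [0.70, 0.10],
     [0.50, 0.60]]
Z = standardize(X, is_benefit=[True, False])
```

After this step, the best algorithm on every measure receives the value 1 in its column, which makes the boldface optima in Table 4 directly comparable.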
Second, the rankings of the clustering algorithms on the 20 data sets computed by WSM, TOPSIS, GRA, and PROMETHEE II are presented in Tables 5-8, respectively. The four MCDM methods are implemented in MATLAB 7.0, using the external measures, such as Purity, En, FM, and Rand, as the input based on Tables 3 and 4. Each group of five UCI data sets is processed by one of the four MCDM methods, which are randomly assigned. The measure weights of each expert applied in WSM, TOPSIS, GRA, and PROMETHEE II are obtained by AHP based on the eigenvalue method. The final index weights are obtained by aggregating the three experts' weights with the weighted arithmetic mean, a widely used aggregation operator in decision problems; they follow the order of the nine external measures given in Tables 4 and 5. The results in Tables 5-8 do not enable us to identify a regular pattern in the evaluation performance of the clustering algorithms.
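The aggregation of the three experts' index weights by the weighted arithmetic mean can be sketched as follows; the expert weight vectors are hypothetical, and equal expert importance is assumed by default.

```python
import numpy as np

def aggregate_weights(expert_weights, expert_importance=None):
    """Combine each expert's AHP index weights into final weights by the
    weighted arithmetic mean, then renormalize to sum to 1."""
    W = np.asarray(expert_weights, dtype=float)       # experts x criteria
    if expert_importance is None:                     # equal importance
        expert_importance = np.full(W.shape[0], 1.0 / W.shape[0])
    v = np.asarray(expert_importance, dtype=float)
    w = v @ W                                         # weighted mean
    return w / w.sum()

# Three hypothetical experts' AHP weights for three criteria
experts = [[0.5, 0.3, 0.2],
           [0.4, 0.4, 0.2],
           [0.6, 0.2, 0.2]]
final_w = aggregate_weights(experts)
```

The renormalization keeps the aggregated vector a valid weight vector even when the experts' individual vectors are inconsistent.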
The results indicate that the various MCDM methods generate conflicting rankings. On the basis of these observations, secondary mining and knowledge discovery are proposed to reconcile the disagreements.
Finally, a decision-making support model based on the eighty-twenty rule for secondary mining and knowledge discovery is applied to reconcile individual disagreements.
This model includes three steps, as follows. In Step 1, we mark two sets of alternatives in a lower position and an upper position, respectively. According to equations (31) and (32), with n = 6 alternatives, the upper position gives x = 6 × 1/5 = 1.2 ≈ 2, and the lower position gives 6 × 4/5 = 4.8 ≈ 5. The marked results are presented in Table 9, based on Tables 5-8. In Step 2, we grade the sets of alternatives in the lower and upper positions, respectively, according to Step 2 in Section 4. The scores of the alternatives in the upper position, b_i, are totaled; similarly, the scores of the alternatives in the lower position, d_i, are totaled. The results for the 20 UCI data sets are presented in Table 10.
In Step 3, the priority of each alternative is computed by equation (33), and the calculation results are reported in Table 10.

Discussion and Analysis.
The results in Tables 5-8 indicate that different MCDM methods produce different or even conflicting individual rankings. Thus, it is difficult for DMs to identify the best clustering algorithms for the given data sets. Table 10 reports a list of algorithm priorities. The numbers of clustering algorithms at each position ranking are summarized in Table 11.
In Table 11, the number of each position ranking can be determined according to Tables 5-8. For example, for ranking 1 of the upper position, the numbers of clustering algorithms are 1, 3, 9, 8, 3, and 12, and the resulting rankings of the clustering algorithms are 6, 4/5, 2, 3, 4/5, and 1, corresponding to EM, FF, FC, HC, MD, and KM, respectively. The counts in Table 11 (rows EM, FF, FC, HC, MD, and KM; columns are positions 1-6) are as follows:

EM   1  3  4  3  2  7
FF   3  2  1  5  6  3
FC   9  3  3  0  4  1
HC   8  1  1  1  3  6
MD   3  4  3  8  2  0
KM  12  1  3  1  2  1

However, in this example the rankings of the lower positions are ignored. When making decisions, the overall situation affected by the decision-making process should be considered to the maximum extent. In this work, we establish two sets of alternatives, in the lower and upper positions. After the rankings of the lower position are fully considered, the rankings of the clustering algorithms are 6, 4, 2, 5, 3, and 1, respectively. These results are basically the same, which shows that our proposed model is feasible and effective. Therefore, in this paper, from an empirical perspective, the effectiveness of our proposed model is examined and verified using six clustering algorithms, nine external measures, and four MCDM methods on 20 UCI data sets, including a total of 18,310 instances and 313 attributes. Moreover, our proposed model merges expert wisdom using the eighty-twenty rule, which states that eighty percent of the results originate from twenty percent of the activity [58] and indicates that the twenty percent of people who create eighty percent of the results are highly leveraged.
Thus, based on the expert wisdom originating from the twenty percent of the people, the set of alternatives is classified into two categories: the top 1/5 of the alternatives is marked in an upper position, and the bottom 1/5 is marked in a lower position. The empirical results also verify our proposed model and confirm its ability to reduce and reconcile individual differences among the performance of clustering algorithms by employing a list of algorithm priorities in a complex decision environment.

Conclusions
Data clustering is widely applied in the initial stage of big data analysis. Clustering analysis can be used to examine massive data sets of various types to uncover unknown correlations, hidden patterns, and other potentially useful information. However, Naldi et al. [11] pointed out that different clustering algorithms may produce different data partitions. Furthermore, the NFL theorem states that no single algorithm or model can achieve the best performance for a given domain problem [23][24][25]. Therefore, the focal question becomes how to select the best clustering algorithms for the given data sets. The decision-making process is extremely complex because of the competing interests of multiple stakeholders and the intricacy of systems [8][9][10].
This paper proposes a DMSECA model to estimate the performance of clustering algorithms and to select the most satisfactory clustering algorithm according to the decision preferences of all individual participants during a complex decision-making process. The proposed model is designed to reconcile individual disagreements in the evaluation performance of clustering algorithms.
The studies have shown that the DMSECA model, which is based on the eighty-twenty rule, can generate a list of algorithm priorities and an optimal ranking scheme that is the most satisfactory according to the decision preferences of all individual participants involved in a complex decision-making problem. An experimental study uses 20 UCI data sets, including a total of 18,310 instances and 313 attributes, six clustering algorithms, nine external measures, and four MCDM methods to test and examine our proposed model. The feasibility and effectiveness of the proposed model are illustrated and verified by a statistical analysis of the rankings for all 20 UCI data sets, which allows the results to be compared with those generated by our proposed model; the two are basically the same. The empirical results show that our proposed model can not only identify the best clustering algorithms for the given data sets but also reconcile individual differences or even conflicts to achieve group agreement on the evaluation performance of clustering algorithms in a complex decision-making environment. Finally, a decision-making support model merging expert wisdom for secondary knowledge discovery, based on the 80-20 rule, is proposed in order to focus the analysis on the most important positions of the rankings in relation to the number of observations for predictable imbalance.
In future work, a decision support system including data space, method space, model space, and knowledge space will be developed further. It will handle many more methods, models, and algorithms, such as general clustering, subspace clustering theory, fuzzy clustering, and density peak clustering, in order to form a robust and effective algorithm selection and evaluation framework and improve the universality of the application.

Data Availability
The data used to support the findings of this study are included within the article; the 20 data sets originate from the UCI repository (http://archive.ics.uci.edu/ml/).

Conflicts of Interest
The authors declare that they have no conflicts of interest.