Community Detection Based on Density Peak Clustering Model and Multiple Attribute Decision-Making Strategy TOPSIS

Community detection is one of the key research directions in complex network studies. We propose a community detection algorithm based on a density peak clustering model and the multiple attribute decision-making strategy TOPSIS (Technique for Order Preference by Similarity to an Ideal Solution). First, the network is transformed into a two-dimensional dataset by taking the density and distance of each node as its attributes; this dataset is clustered with the DBSCAN algorithm, and the outliers are identified and taken as the key nodes. Then, initial community frameworks are formed and expanded by repeatedly adding the node most similar to each community as its new member. In this process, we use TOPSIS to integrate four kinds of similarities into a single index, which serves as the criterion for selecting the most similar node. Next, we allocate the nonkey nodes that are not covered by the expanded communities. Finally, some communities are merged in one of two ways to obtain a stable partition. We evaluate the algorithm on real and synthetic networks and compare it with several popular algorithms. The experimental results demonstrate the effectiveness and accuracy of our algorithm.


Introduction
Research on complex networks [1] has been an important aspect of data mining. Complex networks are often abstracted from real systems and are composed of nodes representing entities and edges representing the connections between them. Because the mechanisms of these systems are complex, the macroscopic behavior of the networks is neither purely random nor completely regular; the networks exhibit a kind of complexity between these two extremes. In different applications, the diversity of nodes and edges gives the networks a complex topological structure. Research shows that networks abstracted from real systems often have characteristics such as the small-world property [2], the scale-free property [3], and community structure [4,5]. The small-world characteristic means that the nodes of the network are connected by short paths, and the scale-free feature means that node degrees follow a power-law distribution.
The nodes in the network can be divided into several groups such that connections within each group are dense and connections between groups are sparse; each group constitutes a so-called "community." The community structure contains organizational information about each part of the network and about the interactions between these parts, which is of great help in studying the underlying structure and potential functions of real systems. Community detection also supports other complex network studies, such as influence maximization [6][7][8], network embedding [9][10][11], feature-level fusion [12], and vulnerability assessment [13].
Therefore, finding the community structure has become an important research direction in complex networks.
Community detection can be regarded as a clustering problem on the network. The density peak clustering algorithm [14] argues that cluster centers have a larger density because they are usually surrounded by many data points. At the same time, cluster centers are separated from each other by other data points, which results in a larger distance between them. Hence, cluster centers have significantly larger density and distance than noncenter data points. Density peak clustering uses this feature to find the cluster centers in a dataset and to identify clusters of arbitrary shape.
The density peak clustering model is highly compatible with the problem of community detection. Owing to the scale-free feature of networks, the key nodes of communities are surrounded by many low-degree nodes, so their densities are relatively high, and the distance between key nodes is large because connections between communities are sparse. In recent years, many methods have successfully applied the density peak clustering model to community detection [15][16][17][18][19]; they are introduced in Section 2.
We find that traditional methods applying the density peak clustering model to community detection often use simple mathematical rules, such as the product or a simple linear combination of a node's density and distance, to distinguish key nodes from nonkey nodes. These methods have difficulty identifying key nodes whose attributes are not particularly prominent. The reason is that, because of the intricate structure of networks, some nonkey nodes that merely have a large density within a dense community can be confused with the key nodes of sparser communities.
In this paper, we approach this problem from a different angle and propose a new DPC-based community detection method. Because key nodes have large density and distance, they appear as outliers when these two properties are used as node attributes. Therefore, instead of relying on simple mathematical rules, we turn the problem into outlier detection in the two-dimensional space spanned by the nodes' density and distance.
From this perspective, key nodes can be distinguished from other nodes more accurately. In our method, the DBSCAN algorithm [20] is used for outlier detection.
After obtaining the key nodes, we generate community frameworks by taking them as seeds and expand each framework iteratively by absorbing the most similar node. During expansion, we use the T-similarity as the criterion for selecting the most similar node. This index is computed with the multiple attribute decision-making strategy TOPSIS (Technique for Order Preference by Similarity to an Ideal Solution) [21], which combines multiple similarities and thus avoids the limited adaptability of any single similarity. The expansion phase stops when adding a node would benefit the community's external edges more than its internal edges. Then, we also use the T-similarity to allocate the nonkey nodes that are not covered by the expanded communities, which yields the initial community structure. At the end of this step, the partition contains some small communities. To obtain the final result, we merge some of them using one of two strategies: an approach that optimizes modularity and a strategy based on a community metric.
Optimizing modularity [22] is a fast, quality-assured way to merge communities. Therefore, we use this strategy to merge some initial communities. However, in pursuit of modularity, this strategy may merge communities excessively, which affects the resolution of the communities. The community metric [23] is an indicator that evaluates the significance of a community. Therefore, we also provide a way to address the resolution problem by controlling the minimum community metric allowed in the network. The contributions of our work are summarized as follows:
(i) We mine the key nodes (cluster centers) of the DPC model from a new perspective by applying DBSCAN to the two-dimensional dataset transformed from the network, which provides a new way to distinguish key nodes from nonkey nodes in the DPC model accurately.
(ii) TOPSIS, a multiple attribute decision-making algorithm, is used to synthesize four similarities. On the one hand, this makes the expansion and merging of communities more stable and reasonable; on the other hand, it improves the adaptability of the proposed algorithm to different networks.
(iii) We propose two approaches for merging small communities, which handle the unstable communities produced during detection from different directions: one favors higher modularity, and the other focuses on solving the resolution limit problem. Given the preceding steps, either approach yields a high-quality partition.
The rest of this paper is organized as follows. Section 2 reviews related work on community detection, Section 3 details the proposed method, Section 4 evaluates the proposed method experimentally, and Section 5 concludes the paper.

Related Work
Research on community detection in complex networks has been ongoing for decades, and a large number of methods have emerged. Here, we introduce some of those that inspired the proposed algorithm.

2.1. Traditional Community Detection Methods.
Optimizing modularity is a classical and effective approach to community detection. Modularity was first proposed in [22], along with the GN algorithm, to evaluate the quality of a community structure; larger modularity usually means a higher-quality structure. Fast Q [24] is a classic modularity optimization algorithm proposed by Newman. It initially treats each node as a community and obtains a hierarchical community structure by repeatedly merging the two communities with the largest modularity increment. The Louvain algorithm [25] obtains a community structure with large modularity by moving each node into the neighboring community that yields the largest modularity increment, then compressing each community into a supernode, and repeating these operations until modularity can no longer be increased. ECG [26] runs one-level Louvain k times to obtain k groups of communities, weights each edge by the probability that its two endpoints fall into the same community, and then runs Louvain on the weighted network to obtain the community partition. Leiden [27] is an improvement of Louvain. Its authors argue that weakly connected or even disconnected communities may appear during the operation of Louvain. Therefore, Leiden pays more attention to the connectivity of communities when moving nodes. This overcomes the shortcomings of Louvain, and the algorithm also reduces running time through a fast local node-moving method. In addition, many evolutionary algorithms detect communities by optimizing modularity [28,29].
Optimizing modularity is not the only direction for community detection. Many methods exploit other kinds of network information to partition networks. Attractor [30] is based on network dynamics, in which the interactions between nodes and the distances between nodes affect each other. Under this interplay, intercommunity and intracommunity edges end up with clearly different distances. After the intercommunity edges with large distances are deleted, the community structure is obtained. LPA [31] is an efficient community detection algorithm based on the mechanism of information propagation. Each node is initially assigned a unique label and iteratively updates its label to the one that occurs most frequently in its neighborhood.
The label update procedure is repeated until every node's label is the most frequent one among its neighbors. At that point, nodes holding the same label form a community. Infomap [32] links community detection to information coding by encoding communities and nodes: the shorter the code length, the clearer the corresponding community structure. Following a random walk, Infomap moves each node into the neighboring community that reduces the code length the most, until the code length reaches its minimum. Walktrap [33] defines distances between nodes through random walks and repeatedly merges the two communities with the shortest distance to obtain a multilevel community structure.
The level with the largest modularity is selected as the final partition. PPC [34] is a top-down divisive method: it first generates an ordered sequence of nodes through repeated random walks, then uses modularity to determine the cut point that separates subgraphs, and keeps partitioning the network until no positive modularity gain remains.
The literature [35] proposes a community detection method based on game theory. This method defines the leadership of nodes through a sequential-move game and optimizes the community affiliation of nonleader nodes in a dynamic system.
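The LPA update rule described above is simple enough to sketch directly. The following is a minimal asynchronous version in Python, assuming an undirected graph stored as a dict of neighbour sets; the function name, the random tie-breaking, and the `max_iter` safeguard are illustrative choices of ours, not details taken from [31].

```python
import random
from collections import Counter


def label_propagation(adj, max_iter=100, seed=0):
    """Asynchronous label propagation on an undirected graph.

    adj: dict mapping each node to the set of its neighbours.
    Returns a dict node -> label; nodes sharing a label form one community.
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}              # every node starts with a unique label
    order = list(adj)
    for _ in range(max_iter):
        rng.shuffle(order)                    # visit nodes in random order each sweep
        changed = False
        for v in order:
            if not adj[v]:
                continue                      # isolated nodes keep their own label
            counts = Counter(labels[u] for u in adj[v])
            top = max(counts.values())
            best = [lab for lab, c in counts.items() if c == top]
            if labels[v] not in best:         # adopt a most frequent neighbour label
                labels[v] = rng.choice(best)  # ties broken at random
                changed = True
        if not changed:                       # every label is already a most frequent one
            break
    return labels
```

Keeping the current label whenever it is already among the most frequent ones makes the stopping test of the text ("every node's label is the most frequent one among its neighbors") equivalent to a sweep with no changes.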

Community Detection Methods Based on Density Peak Clustering.
Since the DPC (Density Peak Clustering) algorithm [14] was proposed in 2014, many methods have applied DPC to community detection. Isofdp [15] uses Isomap to map nodes into a d-dimensional space, which presents more diversity while retaining the characteristics of the original network; it then calculates the density and distance of the nodes in the low-dimensional space and selects k community centers. By repeating this process over ranges of d and k, the optimal d and k and the corresponding community structures are obtained. IDPM [16] defines a node's density and distance using the Jaccard similarity and the shortest path length to other nodes, respectively, and mines the key nodes whose density exceeds a calculated threshold; the nonkey nodes are then assigned to the corresponding key nodes. Next, IDPM iteratively merges communities and modifies the community affiliations of unstable nodes on community boundaries, finally obtaining the community structure of the network. IDPCNMF [17] uses an improved PageRank score as the density and takes the shortest path length between a node and another node with larger density as the distance. It then computes the product of density and distance and selects as center nodes those whose product exceeds the mean plus twice the standard deviation. Taking the number of center nodes as the parameter of NMF, it factorizes the adjacency matrix into vectors containing community structure information, from which IDPCNMF derives the communities. EADP [18] weights the link strength between nodes through common neighbors, regardless of whether the nodes are directly connected.
The authors use the reciprocal of the link strength as the distance and map it to a density using a Gaussian kernel function. The algorithm takes the product of density and distance, calculates the difference in this product for all adjacent pairs of nodes, and compares each difference with the value predicted by a linear regression on the other, smaller differences; in this way, the community centers are obtained automatically. In addition, the expansion phase of EADP has been extended to overlapping community detection. CDEP [19] compresses the nodes with degree 1 or 2 into their higher-degree neighbors to form super nodes. The degree of each super node is taken as its density, and the number of nodes it contains is taken as its quality. CDEP uses a second-order difference method to select some super nodes as seeds and uses a common-neighbor weighted similarity as the criterion to expand the seeds into communities.
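As a concrete contrast with the outlier-based selection proposed in this paper, the center-selection rule attributed to IDPCNMF above (density times distance above the mean plus twice the standard deviation) can be sketched as follows. Density and distance are taken as given inputs, the function name is hypothetical, and the use of the population standard deviation is an assumption.

```python
import statistics


def mean_plus_two_std_centers(density, distance):
    """Select nodes whose density * distance product exceeds mean + 2 * std.

    density, distance: dicts node -> value over the same nodes.
    The population standard deviation is used here (an assumption).
    """
    gamma = {v: density[v] * distance[v] for v in density}
    values = list(gamma.values())
    cutoff = statistics.mean(values) + 2 * statistics.pstdev(values)
    return sorted(v for v, g in gamma.items() if g > cutoff)
```

A single global cut-off like this is exactly where a key node of a sparse community, whose product is only moderately large, can fall below the threshold.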

Method
This paper proposes DPCT, a community detection algorithm based on the density peak clustering model [19] and the multiple attribute decision-making algorithm TOPSIS [21]. After transforming the network topology into a two-dimensional space of density and distance, DPCT uses the clustering algorithm DBSCAN [20] to find the key nodes of the communities. Because DPCT searches for key nodes more precisely than traditional DPC methods, which use only a simple mathematical combination of density and distance, the key nodes it finds are accurate and reasonable. Networks differ in their topological structure, and a single similarity can hardly reflect node similarity accurately in all scenarios. Therefore, DPCT uses the multiple attribute decision-making algorithm TOPSIS to compute a new similarity, the T-similarity, which combines the advantages of multiple similarities.
The steps are shown in Figure 1 and can be summarized as follows:
(1) Mine the key nodes by applying DBSCAN to a two-dimensional dataset transformed from the network.
(2) Generate the community frameworks and expand them according to the calculated T-similarity.
(3) Allocate the remaining nonkey nodes to existing communities, or generate new small communities, on the basis of the T-similarity.
(4) Merge some of the small communities.
The pseudo code of the proposed method is shown in Algorithm 1.

Key Nodes Mining.
To transform the network's topological information into a two-dimensional space, we calculate the density and distance of each node. First, a node u with degree k(u) contributes 1/k(u) to the density of each of its neighbors, so we define the density of node v by equation (1):

$$\rho(v) = \sum_{u \in N(v)} \frac{1}{k(u)} \qquad (1)$$

where N(v) represents the set of neighbors of node v and k(u) represents the degree of node u. That is, the density of a node is the sum of its neighbors' contributions. The distance of node v is defined as the minimum shortest-path length between v and the nodes with higher density, and the distance of the node with the largest density is set to the maximum distance of the other nodes:

$$\delta(v) = \begin{cases} \min\limits_{u:\,\rho(u) > \rho(v)} dis(u, v), & \text{if some } u \text{ has } \rho(u) > \rho(v), \\ \max\limits_{u \neq v} \delta(u), & \text{otherwise,} \end{cases} \qquad (2)$$

where dis(u, v) refers to the length of the shortest path between nodes u and v. To facilitate the subsequent operations, we normalize the two attributes of each node. In this way, we transform the nodes into a two-dimensional space. Figure 2, for instance, shows the visualization of the dolphin social network [44] and the transformed data points. The key nodes of the four communities are "tigger," "jet," "grin," and "sn96," and they are obvious outliers in the two-dimensional space shown in Figure 2(b). Note that "sn96" is a conspicuous key node in Figure 2(b), although it does not have a particularly high density because of the sparse connections within its community. Separating key nodes like "sn96" from nonkey nodes with larger density is a persistent problem in applying the DPC model.
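The density and distance defined above can be sketched in a few lines of Python. This is a minimal illustration, assuming an unweighted, undirected graph stored as a dict mapping each node to its neighbour set; the helper names are hypothetical. The normalization step is omitted because its exact form is not given here, and giving every node without a denser reachable node the maximal distance (which covers ties at the peak and disconnected components) is our own choice.

```python
from collections import deque


def node_density(adj):
    """Density: rho(v) = sum of 1/k(u) over the neighbours u of v."""
    return {v: sum(1.0 / len(adj[u]) for u in adj[v]) for v in adj}


def bfs_lengths(adj, source):
    """Shortest-path lengths (in hops) from source, by breadth-first search."""
    lengths = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in lengths:
                lengths[y] = lengths[x] + 1
                queue.append(y)
    return lengths


def node_distance(adj, rho):
    """Distance: hops to the nearest node of higher density. Nodes with no
    reachable denser node (the density peaks) get the maximum other distance."""
    delta = {}
    for v in adj:
        lengths = bfs_lengths(adj, v)
        higher = [d for u, d in lengths.items() if rho[u] > rho[v]]
        delta[v] = min(higher) if higher else None
    finite = [d for d in delta.values() if d is not None]
    peak = max(finite) if finite else 0
    return {v: peak if d is None else d for v, d in delta.items()}
```

In a star graph, for example, the hub collects 1 from each leaf, while each leaf receives only 1/k from the hub.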
Some of these methods draw this boundary with simple mathematical rules. However, we found that key nodes with low density, such as "sn96," whose density is only 0.2, are easily confused with nonkey nodes of higher density. Simple mathematical rules can hardly prevent such interference, so some key nodes may be missed.
We address this problem from a more reasonable and accurate perspective by turning it into an outlier detection problem. Because of the scale-free characteristic of networks, key nodes are outliers in the two-dimensional space whether or not their density values are prominent. Therefore, we use DBSCAN to cluster the transformed dataset and take the nodes that cannot be clustered as key nodes.
DBSCAN is a classic clustering algorithm with two parameters, the radius Eps and the minimal number Minpts of data points within that radius; the data points that cannot be clustered are outliers. We use this algorithm to find the key nodes in two-dimensional spaces like the one presented in Figure 2(b).
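To show how Eps and Minpts decide which points end up as outliers (noise), here is a textbook DBSCAN in Python. It is a naive sketch with linear-scan range queries, not the indexed implementation behind the O(n log n) bound quoted later; the function name and the dict-of-points input format are assumptions.

```python
import math


def dbscan(points, eps, minpts):
    """Textbook DBSCAN on a dict node -> (x, y) point.

    A point is a core point if at least `minpts` points (itself included)
    lie within distance `eps`. Returns (clusters, outliers), where clusters
    is a list of node sets and outliers is the set of noise nodes.
    """
    nodes = list(points)

    def region(v):
        # Naive linear range query; a spatial index would make this faster.
        return [u for u in nodes if math.dist(points[u], points[v]) <= eps]

    label = {}                            # node -> cluster index, or -1 for noise
    clusters = []
    for v in nodes:
        if v in label:
            continue
        neighbours = region(v)
        if len(neighbours) < minpts:
            label[v] = -1                 # noise for now; may become a border point later
            continue
        cid = len(clusters)
        clusters.append({v})
        label[v] = cid
        seeds = [u for u in neighbours if u != v]
        while seeds:
            u = seeds.pop()
            if label.get(u) == -1:        # noise reachable from a core point: border point
                label[u] = cid
                clusters[cid].add(u)
                continue
            if u in label:                # already assigned to a cluster
                continue
            label[u] = cid
            clusters[cid].add(u)
            u_neighbours = region(u)
            if len(u_neighbours) >= minpts:   # u is also a core point: keep growing
                seeds.extend(w for w in u_neighbours if label.get(w, -1) == -1)
    outliers = {v for v, c in label.items() if c == -1}
    return clusters, outliers
```

Applied to the normalized (density, distance) pairs, the returned `outliers` are the candidate key nodes.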
We intend to take the outliers that cannot be absorbed into any cluster as the key nodes for community detection. However, on large networks, some key nodes may be recognized as small clusters. In this case, if we select only outliers, some potential key nodes will be left out. If multiple clusters are recognized, some of them must be formed by key nodes. Therefore, we check the attributes of the nodes in each cluster to determine the target clusters and extract the key nodes from them. Since the nodes in a cluster have similar characteristics, if any member is unsuitable as a key node, none of the nodes in that cluster is selected. We use a density threshold θ as the criterion for examining clusters. A cluster CL is excluded if

$$\exists\, v \in CL : \rho(v) < \theta \qquad (4)$$

In this paper, θ is set to d/mρ′, where d is the average degree of the network and mρ′ is the maximal normalized density. After excluding all the clusters that contain nonkey nodes, the outliers and the nodes in the remaining clusters together are selected as the key nodes.
In the above procedure, the largest cluster tends to be formed by nonkey nodes because of the power-law degree distribution [3].
Thus, it can be excluded directly. The pseudo code for mining the key nodes is shown in Algorithm 2.
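The selection rule above (drop the largest cluster, drop any cluster containing a node whose density is below θ, keep the outliers and the remaining clusters) can be sketched as follows. The function name and the input format, DBSCAN clusters as a list of node sets plus a set of outliers, are hypothetical, and θ is computed by the caller.

```python
def select_key_nodes(clusters, outliers, rho, theta):
    """Key nodes = outliers + members of every cluster that is not the
    largest one and contains no node with density below theta."""
    keynodes = set(outliers)
    if not clusters:
        return keynodes
    largest = max(range(len(clusters)), key=lambda i: len(clusters[i]))
    for i, cluster in enumerate(clusters):
        if i == largest:
            continue        # assumed to consist of the many low-density nonkey nodes
        if any(rho[v] < theta for v in cluster):
            continue        # one unsuitable member rules out the whole cluster
        keynodes |= cluster
    return keynodes
```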
The steps of the procedure are straightforward. DBSCAN is used to cluster the dataset transformed from the network. After the clusters and the outliers are determined, we examine all the clusters except the largest one and take the members of the suitable clusters, together with the outliers, as the key nodes.

Input: the network.
(1) Transform the network into a two-dimensional dataset by calculating the density and distance of each node, and use DBSCAN to identify the outliers and the members of small clusters as the key nodes.
(2) Generate the frameworks of communities and expand them according to the T-similarity calculated by TOPSIS.
(3) Allocate the remaining nodes to existing or new communities.
(4) Merge some of the small communities.
Output: the partition of the network.
ALGORITHM 1: DPCT.

Community Expansion.
Algorithm 2 yields the key nodes. In this subsection, we generate community frameworks from them and attach most of the nonkey nodes. We initially intended to take each key node as a separate community, but we found that some key nodes are closely connected and should therefore belong to the same community. We use the condition in equation (5) to determine whether two key nodes v and u are close:

$$|N(v) \cap N(u)| > \frac{1}{2}\min\big(k(v), k(u)\big) \qquad (5)$$

That is, if the number of common neighbors of a pair of key nodes exceeds half the number of neighbors of the node with the smaller degree, the two nodes are considered closely connected and are put together; otherwise, each key node forms an initial community framework by itself.
We find that the order in which these frameworks are processed affects the expansion result, and experiments show that expanding first the frameworks formed by nodes with larger density and distance works best. Therefore, following [14], we combine the density and the distance of every node v ∈ V into their product c(v) (equation (6)). We then arrange the community frameworks in descending order of the largest product among their nodes.
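The grouping condition of equation (5) and the ordering by c(v) can be combined into one hypothetical helper. The text speaks of close pairs of key nodes; merging chains of close pairs transitively with union-find is an assumption of this sketch, as is the dict-based input format.

```python
def build_frameworks(adj, keynodes, c):
    """Group closely connected key nodes and order the resulting frameworks.

    Key nodes u, v are 'close' if |N(u) & N(v)| > min(k(u), k(v)) / 2.
    adj: dict node -> neighbour set; c: dict node -> density * distance.
    """
    parent = {v: v for v in keynodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]    # path halving
            v = parent[v]
        return v

    keys = sorted(keynodes)
    for i, u in enumerate(keys):
        for v in keys[i + 1:]:
            common = len(adj[u] & adj[v])
            if common > min(len(adj[u]), len(adj[v])) / 2:
                parent[find(u)] = find(v)    # put the close pair in one framework
    groups = {}
    for v in keys:
        groups.setdefault(find(v), set()).add(v)
    # Frameworks whose best node has the largest c(v) are expanded first.
    return sorted(groups.values(), key=lambda g: max(c[v] for v in g), reverse=True)
```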
The expansion process involves two main problems: how to choose a suitable node to add to the community, and when to terminate the procedure. For the first problem, we repeatedly find the node most similar to the community and add it as a new member. We use equation (7) to calculate the similarity between the community and other nodes, where s(v, u) is the similarity between nodes u and v.
In our experiments, we found that a single similarity may not suit diverse networks. We therefore asked whether multiple similarities can be combined to integrate their advantages and measure the similarity between nodes accurately. We propose the T-similarity, which uses the multiple attribute decision-making algorithm TOPSIS to combine four similarities: the Jaccard similarity [45], Salton's cosine similarity [46], the hub promoted index (HPI) [47], and the hub depressed index (HDI) [47]. These four similarities between nodes are defined in equation (8). The formulae show that the most similar node can only appear among the first- or second-order neighbors of the community. Therefore, we calculate similarities only between the community and the nodes in this area, which avoids unnecessary calculations.
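The following sketch shows how a T-similarity could be computed: the four similarities are evaluated for each candidate, and classic TOPSIS (vector normalization, weighting, distances to the ideal and anti-ideal solutions, closeness coefficient) ranks the candidates. We assume the standard common-neighbour forms of the four indices, equal criterion weights, and that all four similarities are benefit criteria; the paper's exact choices may differ, and the function names are hypothetical.

```python
import math


def pair_similarities(adj, u, v):
    """Jaccard, Salton cosine, HPI and HDI of nodes u and v in their standard
    common-neighbour forms (assumed; the paper's exact variants may differ)."""
    nu, nv = adj[u], adj[v]
    common = len(nu & nv)
    if common == 0:
        return [0.0, 0.0, 0.0, 0.0]
    ku, kv = len(nu), len(nv)
    return [common / len(nu | nv),          # Jaccard
            common / math.sqrt(ku * kv),    # Salton cosine
            common / min(ku, kv),           # hub promoted index (HPI)
            common / max(ku, kv)]           # hub depressed index (HDI)


def topsis(matrix, weights=None):
    """TOPSIS closeness coefficient of every row (alternative) of `matrix`.

    All columns are treated as benefit criteria; equal weights are an assumption.
    """
    m = len(matrix[0])
    weights = weights or [1.0 / m] * m
    # Vector-normalise each column (an all-zero column keeps divisor 1).
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(m)]
    scaled = [[weights[j] * row[j] / norms[j] for j in range(m)] for row in matrix]
    ideal = [max(col) for col in zip(*scaled)]
    anti_ideal = [min(col) for col in zip(*scaled)]
    scores = []
    for row in scaled:
        d_plus = math.dist(row, ideal)      # distance to the ideal solution
        d_minus = math.dist(row, anti_ideal)
        total = d_plus + d_minus
        scores.append(d_minus / total if total > 0 else 0.0)
    return scores
```

To score the candidates of a node v, one would stack `pair_similarities(adj, v, u)` for each candidate u into a matrix and pick the candidate whose row receives the largest `topsis` score.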
Input: G(V, E), the network; Eps, the radius of the DBSCAN algorithm; Minpts, the minimal number of data points contained within the radius.
(9) if ∃ v ∈ cluster with ρ(v) < θ then
(10) continue;
(11) else
(12) keynodes.add(cluster);
(13) end
(14) end
(15) return keynodes
ALGORITHM 2: Keynodes_mine (G(V, E), Eps, Minpts).

As for the termination condition, a community is a group of nodes with dense internal connections and sparse external connections. Therefore, if adding a node to the community increases the number of external edges more than the number of internal edges, the node is not suitable for the community. Accordingly, when the community reaches a node that satisfies equation (9), the expansion stops before that node enters the community.
where $e^c_{in}$ and $e^c_{out}$ are the numbers of internal and external edges of community c after a node is added, and $olde^c_{in}$ and $olde^c_{out}$ are their counterparts before the node enters the community. Adding 1 to the numerator and the denominator keeps the fraction meaningful.
The specific steps of community expansion are summarized in the pseudo code of Algorithm 3. We first generate the community frameworks from single key nodes or from close pairs of key nodes and arrange the frameworks in descending order of the key nodes' product of density and distance. Then, T-similarities are calculated, and the most similar node is repeatedly added to each framework until the termination condition in equation (9) is met.
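The expansion loop can be sketched as follows. Because the community-node similarity of equation (7) and the exact form of the stopping rule in equation (9) are not reproduced here, the sketch takes the similarity as a caller-supplied function `score` and uses one plausible reading of the stopping rule: stop when the +1-smoothed growth ratio of external edges exceeds that of internal edges. Both, as well as all function names and the exclusion of nodes already owned by other communities, are assumptions.

```python
def edge_counts(adj, community):
    """Return (internal, external) edge counts of a node set."""
    inner = sum(1 for v in community for u in adj[v] if u in community) // 2
    outer = sum(1 for v in community for u in adj[v] if u not in community)
    return inner, outer


def should_stop(adj, community, node):
    """Hypothetical reading of equation (9): stop when the external-edge count grows
    by a larger (+1 smoothed) ratio than the internal-edge count."""
    old_in, old_out = edge_counts(adj, community)
    new_in, new_out = edge_counts(adj, community | {node})
    return (new_out + 1) / (old_out + 1) > (new_in + 1) / (old_in + 1)


def expand(adj, framework, taken, score):
    """Grow one framework by repeatedly absorbing its most similar candidate.

    taken: nodes already owned by other communities (excluded from candidates).
    score(community, node): stand-in for the community-node similarity.
    """
    community = set(framework)
    while True:
        first = {u for v in community for u in adj[v]} - community
        second = {w for u in first for w in adj[u]} - community
        candidates = sorted((first | second) - taken)   # sorted for deterministic ties
        if not candidates:
            return community
        best = max(candidates, key=lambda u: score(community, u))
        if should_stop(adj, community, best):
            return community                # stop before the node enters
        community.add(best)
```

On two triangles joined by a single edge, growing from one corner absorbs its own triangle and then stops at the bridge node, because crossing the bridge would add two external edges for only one internal one.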

Nonkey Nodes Allocation.
When community expansion ends, some nonkey nodes still have not been assigned to any community. To obtain a partition of the entire network, these nonkey nodes must be allocated.
To this end, we again use the T-similarity to find the most similar node of each unassigned nonkey node. However, some nodes are not connected to their most similar nodes, and putting such nodes into the same community without special consideration may create weakly connected or even disconnected communities. Therefore, after finding the most similar node, we allocate the nonkey node according to the actual situation. For clarity, let v denote the nonkey node to be allocated, u its most similar node, and $C_u$ the community to which u belongs.
The different situations and the corresponding processing strategies are shown in Table 1.
The processing order of the nonkey nodes also affects the partition, and we found that processing them in descending order of c(v) (equation (6)) works best. The specific steps are as follows. First, c(v) is calculated for each unassigned nonkey node v, and these nodes are arranged in descending order of this value.
Then, we calculate the T-similarity between each nonkey node and its first- and second-order neighbors, select the most similar node, and allocate the nonkey node according to how the two nodes are connected.
This process is repeated until all nonkey nodes are allocated. The pseudo code of this process is shown in Algorithm 4.
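The case analysis used for allocation can be sketched in Python following the steps of Algorithm 4. The similarity function `sim` stands in for the T-similarity, the function name is hypothetical, and skipping nodes that were already placed as the partner of an earlier node is an assumption of this sketch.

```python
def allocate_nonkey(adj, partition, nonkey, c, sim):
    """Place each unassigned nonkey node, in descending order of c(v).

    partition: list of node sets, extended in place; c: dict node -> c(v);
    sim(v, u): stand-in for the T-similarity between nodes v and u.
    """
    owner = {v: i for i, com in enumerate(partition) for v in com}
    for v in sorted(nonkey, key=lambda x: c[x], reverse=True):
        if v in owner:                    # already placed, e.g. as another node's partner
            continue
        first = adj[v]
        candidates = sorted((first | {w for u in first for w in adj[u]}) - {v})
        if candidates:
            u = max(candidates, key=lambda x: sim(v, x))
            if u in first and u in owner:     # adjacent and classified: join u's community
                partition[owner[u]].add(v)
                owner[v] = owner[u]
                continue
            new = {v, u} if u in first else {v}  # adjacent pair, or a singleton otherwise
        else:
            new = {v}                         # no first- or second-order neighbours
        partition.append(new)
        for x in new:
            owner[x] = len(partition) - 1
    return partition
```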

Community Merging.
Running Algorithms 2-4 sequentially yields a preliminary partition of the entire network. However, some preliminary communities contain only one or two nodes; they are so small that their intracommunity edges are fewer than their intercommunity edges. We merge them using either modularity or the community metric as the criterion, and we call the corresponding approaches DPCT-Q and DPCT-M, respectively.

Community Merging Based on Modularity.
Modularity [22] is an important standard for measuring the quality of a community partition; it reflects how the nodes are connected within communities:

$$Q = \sum_{i=1}^{k} \left(e_{ii} - a_i^2\right) \qquad (10)$$

where k is the number of communities and $e_{ij}$ is the fraction of edges joining communities $C_i$ and $C_j$ (for $i \neq j$, each such edge is split equally between $e_{ij}$ and $e_{ji}$, as in [24]). Hence, $\sum_{i=1}^{k} e_{ii}$ is the proportion of the communities' internal edges, and $a_i = \sum_j e_{ij}$ is the fraction of edge ends attached to $C_i$, so $\sum_{i=1}^{k} a_i^2$ is the expected value of $\sum_{i=1}^{k} e_{ii}$ if edges were placed at random. As mentioned in Section 2, Fast Q [24] is a classic algorithm that optimizes modularity. Here, we replace the initial partition of the original Fast Q algorithm, in which each node forms its own community, with our preliminary partition and use the modularity increment as the basis for merging the preliminary communities.
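Modularity can be computed directly from a partition. This sketch assumes an undirected, unweighted graph given as a dict of neighbour sets and follows Newman's convention, in which $a_i$ is the fraction of edge ends attached to $C_i$; the function name is hypothetical.

```python
def modularity(adj, partition):
    """Q = sum_i (e_ii - a_i ** 2) for a list of node sets."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2        # number of edges
    q = 0.0
    for com in partition:
        inner = sum(1 for v in com for u in adj[v] if u in com) / 2
        ends = sum(len(adj[v]) for v in com)                 # edge ends in the community
        q += inner / m - (ends / (2 * m)) ** 2               # e_ii - a_i^2
    return q
```

For two triangles joined by one edge (m = 7), splitting at the bridge gives 2(3/7 - 1/4) = 5/14, while putting all nodes in one community gives 0.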
The modularity increment caused by joining communities $C_i$ and $C_j$ is

$$\Delta Q_{ij} = e_{ij} + e_{ji} - 2a_i a_j = 2\left(e_{ij} - a_i a_j\right) \qquad (11)$$

In the merging process, the two communities with the largest modularity increment are joined each time and the affected modularity increments are updated, until no pair of communities yields a positive modularity increment. Because every merge is decided by the modularity increment, the small communities are merged in a direction that improves the overall modularity.
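The merging step can be sketched as a greedy loop that starts from the preliminary partition rather than from singletons and uses ΔQ = 2(e_ij − a_i a_j) for adjacent communities. For clarity, this version recomputes all inter-community edge counts after every merge, so it does not match the running time of the heap-based Fast Q implementation; the function name is hypothetical.

```python
def fast_q_merge(adj, partition):
    """Greedy Fast Q merging that starts from a given partition instead of singletons.

    Repeatedly joins the pair of adjacent communities with the largest positive
    Delta Q = 2 * (e_ij - a_i * a_j) and stops when no merge increases Q.
    """
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    coms = [set(com) for com in partition]
    while len(coms) > 1:
        owner = {v: i for i, com in enumerate(coms) for v in com}
        between = {}                              # (i, j), i < j -> number of edges
        for v in adj:
            for u in adj[v]:
                i, j = owner[v], owner[u]
                if i < j:                         # each inter-community edge counted once
                    between[(i, j)] = between.get((i, j), 0) + 1
        a = [sum(len(adj[v]) for v in com) / (2 * m) for com in coms]
        best_gain, best_pair = 0.0, None
        for (i, j), edges in between.items():
            e_ij = edges / (2 * m)                # half of the edge fraction, as in Fast Q
            gain = 2 * (e_ij - a[i] * a[j])
            if gain > best_gain:
                best_gain, best_pair = gain, (i, j)
        if best_pair is None:                     # no positive modularity increment left
            break
        i, j = best_pair
        merged = coms.pop(j)                      # j > i, so index i stays valid
        coms[i] |= merged
    return coms
```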

Community Merging Based on the Community Metric.
Although DPCT-Q has the advantages of being parameter-free and producing large modularity, it may merge communities excessively and lose part of the community information.
The community metric [23] measures the significance of a community and thus determines which community most needs to be merged. It is used together with a minimum threshold δ: communities whose community metric is less than δ are merged into their most similar communities.
By the nature of communities, a community that is small and has many external connections needs to be merged. Therefore, the community metric of community $C_i$ is defined as the product $\alpha_i \beta_i$ of its sparseness $\alpha_i$ and its size fraction $\beta_i$ (equation (12)).

Complexity
Here, $\alpha_i$ and $\beta_i$ are calculated by equations (13) and (14), respectively:

$$\alpha_i = \frac{|E^{in}_i|}{|E^{out}_i|} \qquad (13)$$

$$\beta_i = \frac{|V_i|}{|V|} \qquad (14)$$

where $E^{in}_i$ and $E^{out}_i$ are the sets of internal and external edges of community $C_i$, $V_i$ is the set of nodes in $C_i$, and V is the set of nodes in the entire network. That is, $\alpha_i$ is the ratio of the number of internal edges to the number of external edges of $C_i$, and $\beta_i$ is the size of $C_i$ relative to the entire network. Thus, the smaller a community and the weaker its internal connections, the more it needs to be merged. Therefore, we choose the community $C_i$ with the smallest community metric, find the community $C_j$ most similar to $C_i$, and merge them. The similarity between two communities is defined by equation (15); as before, we use the T-similarity for s(u, v) in equation (15).

Input: G(V, E), the network; keynodes, the key nodes of communities.
Output: partition, the partition of the network.
Generate a new community {v}

Input: G(V, E), the network; partition, the community partition of the network; nonkeynodes, the nodes not covered by the expanded communities.
Output: partition, the partition of the network.
(1) arrange nonkeynodes in descending order of c(v);
(2) for each v in nonkeynodes do
(3)   candidate ⟵ v's first- and second-order neighbors;
(4)   simmat ⟵ the four similarities between v and each node in candidate;
(5)   T-similarity ⟵ TOPSIS(simmat);
(6)   u ⟵ the node with the maximal T-similarity;
(7)   if (v, u) ∈ E then
(8)     if u in partition then
(9)       add v to the community to which u belongs;
(10)    else
(11)      partition.add({v, u});
(12)    end
(13)  else
(14)    partition.add({v});
(15)  end
(16) end
(17) return partition
ALGORITHM 4: Nonkeynodes_allocation (G(V, E), partition, nonkeynodes).
In this process, we repeatedly merge the community with the smallest community metric into its most similar community and update the community metrics of the affected communities, until no community has a metric below δ.
Comparing the two strategies, DPCT-Q merges communities automatically and tends to achieve larger modularity, whereas DPCT-M addresses the resolution limit problem through the given threshold.
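The community-metric merging can be sketched in the same style. The metric follows the product of sparseness (internal over external edges) and size fraction described above; the inter-community similarity of equation (15) is replaced by a caller-supplied function `com_sim`, the function names are hypothetical, and treating a community with no external edges as never needing a merge (infinite metric) is an assumption.

```python
import math


def community_metric(adj, com, n):
    """alpha * beta with alpha = |E_in| / |E_out| and beta = |V_i| / |V|."""
    inner = sum(1 for v in com for u in adj[v] if u in com) / 2
    outer = sum(1 for v in com for u in adj[v] if u not in com)
    if outer == 0:
        return math.inf      # assumption: a community without external edges is never merged
    return (inner / outer) * (len(com) / n)


def metric_merge(adj, partition, delta, com_sim):
    """Merge the community with the smallest metric into its most similar community
    until every metric reaches delta. com_sim(A, B) stands in for equation (15)."""
    coms = [set(com) for com in partition]
    n = len(adj)
    while len(coms) > 1:
        scores = [community_metric(adj, com, n) for com in coms]
        i = min(range(len(coms)), key=scores.__getitem__)
        if scores[i] >= delta:
            break                                 # every community is significant enough
        small = coms.pop(i)
        j = max(range(len(coms)), key=lambda k: com_sim(small, coms[k]))
        coms[j] |= small
    return coms
```

Raising δ merges more aggressively, which is how DPCT-M trades resolution against fragmentation.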

Time Complexity.
The time complexity of our algorithm is analyzed in this subsection. From the above discussion, DPCT can be divided into four phases, which we analyze in turn.
First, we transform the network into a two-dimensional dataset. For each node, the density is calculated in O(d), where d is the mean degree. The distance can be computed by a breadth-first search for the nearest node with a larger density value; according to the small-world property, this node can be found in O(n). Therefore, the time complexity of the transformation step is O(n²). The DBSCAN step can be accomplished in O(n log n) [20].
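The transformation step can be illustrated with the following sketch. It assumes node degree as a placeholder for the density (the paper's density definition is not reproduced in this section) and min-max scaling as the normalization; both are assumptions. The distance of a node is the BFS hop count to the nearest node with strictly larger density; for a density peak with no such node, the sketch returns its BFS eccentricity, following the usual density-peak convention. In the worst case each BFS visits the whole graph, which is where the quadratic cost of this phase comes from.

```python
from collections import deque
from typing import Dict, Hashable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def hops_to_denser(g: Graph, density: Dict[Hashable, float], v: Hashable) -> int:
    """Hop distance from v to the nearest strictly denser node; if none is
    reachable (v is a density peak), return the deepest BFS level instead."""
    seen = {v}
    queue = deque([(v, 0)])
    deepest = 0
    while queue:
        x, d = queue.popleft()
        if density[x] > density[v]:
            return d
        deepest = max(deepest, d)
        for y in g[x]:
            if y not in seen:
                seen.add(y)
                queue.append((y, d + 1))
    return deepest


def min_max(values: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Scale values to [0, 1]; a constant attribute maps to 0."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (x - lo) / span if span else 0.0 for k, x in values.items()}


def to_decision_graph(g: Graph) -> Dict[Hashable, Tuple[float, float]]:
    """Map every node to a normalized (density, distance) point."""
    density = {v: float(len(g[v])) for v in g}  # ASSUMPTION: degree as density
    distance = {v: float(hops_to_denser(g, density, v)) for v in g}
    dn, dd = min_max(density), min_max(distance)
    return {v: (dn[v], dd[v]) for v in g}
```

For a star centered at node 0 with leaves 1, 2, 3 and a tail 3-4, the hub maps to (1.0, 1.0), the kind of isolated high-density, high-distance point that DBSCAN later reports as an outlier.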
If we obtain k community frameworks, each community finds its most similar node in O(log j), and the expansion phase terminates in O(i log j), where i is the average number of nodes absorbed and j is the mean number of a community's neighbors. At the end of the community expansion process, u nodes remain uncovered; finding the most similar node of each of them takes O(log w), where w is the mean number of a node's neighbors, so this process can be carried out in O(u log w).
In the community merging process, if there are c communities to be merged, the modularity optimization strategy needs O(c²) [22], and the community metric control strategy can be implemented in O(c log c) [23]. In conclusion, since k, i, j, u, w, and c are all much smaller than n, DPCT-Q and DPCT-M partition the network in O(n²) time, which is dominated by the transformation step.
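Summing the per-phase costs stated above makes the overall bound explicit (the grouping labels are ours):

$$
T(n) = \underbrace{O(n^2) + O(n\log n)}_{\text{transformation and DBSCAN}}
+ \underbrace{O(i\log j)}_{\text{expansion}}
+ \underbrace{O(u\log w)}_{\text{allocation}}
+ \underbrace{O(c^2)\ \text{or}\ O(c\log c)}_{\text{merging}}
= O(n^2),
$$

because k, i, j, u, w, c ≪ n, so every term after the first is dominated by O(n²).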

Experiments
To verify the performance of the proposed method, we design the following experiments. First, we explore the influence of DBSCAN's two parameters, Eps and Minpts, on the quality of the detected community structure, and use the results to set the parameters for the subsequent experiments. Then, we compare the proposed method with several established community detection algorithms on real and synthetic networks. Finally, we compare the results of the proposed method using each single similarity with those obtained by combining the four similarities with TOPSIS, to demonstrate the advantages of T-similarity.
In our experiments, we use twelve real-world networks and four groups of synthetic networks, which are introduced in Section 4.2. The evaluation indexes are modularity [22] and NMI [48].
The larger the modularity, the better the community structure. The larger the NMI, the closer the detected community structure is to the real structure; its maximum value is 1.
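For reference, both evaluation indexes can be computed as in the following sketch. It uses the standard Newman modularity, Q = Σ_c [L_c/m − (d_c/2m)²], where L_c is the number of internal edges and d_c the total degree of community c, and the NMI variant normalized by the arithmetic mean of the two entropies, 2I(A;B)/(H(A) + H(B)). We assume this variant matches the definition in [48], which is not restated here.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Set

Graph = Dict[Hashable, Set[Hashable]]


def modularity(g: Graph, partition: List[Set[Hashable]]) -> float:
    """Newman modularity: sum over communities of L_c/m - (d_c/2m)^2."""
    m = sum(len(nbrs) for nbrs in g.values()) / 2          # number of edges
    q = 0.0
    for c in partition:
        l_c = sum(1 for v in c for u in g[v] if u in c) / 2  # internal edges
        d_c = sum(len(g[v]) for v in c)                      # total degree
        q += l_c / m - (d_c / (2 * m)) ** 2
    return q


def nmi(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """NMI = 2 I(A;B) / (H(A) + H(B)) for two labelings of the same nodes."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum(c / n * math.log(c * n / (count_a[a] * count_b[b]))
             for (a, b), c in joint.items())
    h_a = -sum(c / n * math.log(c / n) for c in count_a.values())
    h_b = -sum(c / n * math.log(c / n) for c in count_b.values())
    return 2 * mi / (h_a + h_b) if h_a + h_b else 1.0  # both trivial: identical
```

For two triangles joined by a single edge, splitting them into their natural halves gives Q = 5/14 ≈ 0.357, and NMI is 1 for any relabeling of the same partition.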

Experiments on the Settings of Parameters.
Because we use the clustering results of DBSCAN, its parameters Eps and Minpts affect the number of key nodes and therefore the final community partition. Before performing the other experiments, we first explore the influence of these two parameters. We test as many values of them as possible on three smaller networks, namely, the karate club network [49], the Riskmap network [50], and the dolphin social network [44], to observe their impact on the quality of the resulting communities. DPCT-M's third parameter δ needs to be adjusted according to the rule explored in the literature [23] after the other two parameters are determined. Therefore, we take the results of DPCT-Q and plot them as heat maps, as shown in Figures 3-5. From the figures, we can see that the number of detected key nodes increases as Eps decreases and as Minpts increases.
These findings are logical. In DBSCAN, Eps is the neighborhood radius and Minpts is the minimum number of points a neighborhood must contain for its center to be a core point, so a smaller Eps or a larger Minpts leaves more points as noise or in small clusters. Because we choose the outliers and the nodes in the smaller clusters as key nodes, the number of key nodes increases as Eps decreases or Minpts increases. However, if these two parameters are adjusted arbitrarily to increase the number of key nodes, some inappropriate nodes may be selected, which may degrade the performance of DPCT. Although DBSCAN is highly sensitive to its parameters, the proposed method is robust because the attributes of the key nodes differ greatly from those of the nonkey nodes. From Figures 3-5, we also find that the best community structures on these three networks are obtained when Eps is 0.1 and Minpts is 3. Because we normalize the density and distance of nodes, these settings can also serve as a reference for networks of other scales: the larger the network, the smaller Eps and the larger Minpts should be.
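To make the roles of Eps and Minpts concrete, here is a small pure-Python DBSCAN over normalized (density, distance) points that returns −1 for noise. Following a common convention (scikit-learn's `min_samples` also counts the point itself), a point's neighborhood includes the point. The sketch marks only outliers; how "smaller clusters" are selected as key nodes is not specified in this section, so that part is omitted. `outlier_nodes` is a hypothetical helper name.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def dbscan_labels(points: Sequence[Point], eps: float, minpts: int) -> List[int]:
    """Minimal DBSCAN: cluster id per point, -1 for noise (outliers)."""
    n = len(points)
    # Eps-neighborhoods, including the point itself.
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
            for i in range(n)]
    labels: List[Optional[int]] = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(nbrs[i]) < minpts:
            labels[i] = -1             # provisional noise; may become a border point
            continue
        labels[i] = cluster            # i is a core point: start a new cluster
        queue = [j for j in nbrs[i] if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster    # noise reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(nbrs[j]) >= minpts:
                queue.extend(nbrs[j])  # j is also a core point: keep expanding
        cluster += 1
    return labels


def outlier_nodes(points: Sequence[Point], eps: float, minpts: int) -> List[int]:
    """Indices labeled as noise, i.e., candidate key nodes."""
    return [i for i, lab in enumerate(dbscan_labels(points, eps, minpts)) if lab == -1]
```

With four tightly packed low points and one point at (1.0, 1.0), Eps = 0.1 and Minpts = 3 flag only the isolated point; shrinking Eps to 0.01 or raising Minpts to 5 turns every point into noise, which mirrors the trend seen in the heat maps.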

Comparative Experiments.
In this section, we compare the performance of DPCT with that of the comparison algorithms on 12 real networks and 4 groups of synthetic networks. For networks with known ground-truth community structures, we use NMI and modularity as evaluation indexes; for networks whose ground truth is unknown, we use modularity only.
In these experiments, we first determine the best Eps and Minpts with DPCT-Q, then apply them to DPCT-M and tune the parameter δ. For algorithms with parameters, such as Attractor, we tune the parameters to their best values on each network. For algorithms with nondeterministic results, such as Louvain and Leiden, we run each of them 50 times on each network and take the largest value.

Real-World Networks.
The information of the real-world networks used is listed in Table 2. For the five networks with real community structures in the table, we visualize the ground-truth structures and the results obtained by DPCT in Figures 6-10.
As shown in Figure 6, both DPCT-Q and DPCT-M split the two parts of the karate network into four communities. The difference is that DPCT-M mistakenly assigns node "10" to the community of node "1." This is because the most similar node of "10" is "29," but there is no edge between nodes "10" and "29," and node "10" does not have enough neighbors in the community to which node "29" belongs. Therefore, node "10" is regarded as a singleton community. In the merging process, DPCT-Q joins node "10" to the community of node "34" because this yields the larger modularity gain, whereas in DPCT-M node "10" is more similar to the community of node "1." Both partitions have larger modularity values than the ground truth.
In Figure 7, DPCT-Q and DPCT-M obtain the same partition from the Riskmap network, and both of them split the community of node "18" in the upper right corner of the ground-truth structure into two smaller communities, because the connections between the two small communities are not close enough.
In the dolphin network, we can see from Figures 8(b) and 8(c) that because we accurately detect the key node "sn96" of the community in the upper right corner, its community has been detected successfully.
The nodes "kringer" and "thumper" are mistakenly classified into this community because of the larger modularity gain or T-similarity. Compared with the ground truth, nodes "sn89," "sn100," "zap," "ccl," and "double" form a tighter group within the community of node "grin"; DPCT-Q regards them as a new community, whereas DPCT-M merges this group into the community of node "trigger."
Figure 9 shows the partitions detected on the Santa Fe network. In pursuit of larger modularity, DPCT-Q splits the community of node "7" and the community of node "42" into new communities. In the partitions of both methods, node "83" is not correctly classified. This is because it is more similar to the community of node "102."
The football network has more connections than the other four networks. Figures 10(b) and 10(c) show the partitions of the two methods, respectively. Although both DPCT-Q and DPCT-M merge some communities excessively, which is hard to avoid for communities with relatively dense intercommunity connections, both partitions have considerable modularity.
After analyzing the results on these networks, we find that the proposed algorithm acquires high-quality partitions. To further verify its performance, we apply the comparison algorithms to the same networks and compare the modularity and NMI of their results in the bar charts shown in Figure 11. Both DPCT-Q and DPCT-M achieve the largest or the second-largest modularity. The NMI results follow almost the same pattern as the modularity results, although they are more or less affected by the misclassified nodes. Nevertheless, the NMI values remain high on the Riskmap and Santa Fe networks.
In addition, we test the effect of the comparison algorithms and the proposed algorithm on the other seven real networks, and the results are presented in Figure 12.Due to the large scale of the networks, some comparison algorithms cannot detect the community structure.For example, Attractor, Isofdp, and Walktrap cannot obtain the effective results from the Cond-mat networks.We do not plot the corresponding bars on the bar chart for the algorithms that cannot detect the corresponding results.
In these networks, the community partitions of the proposed algorithm generally have higher quality, which is reflected by the modularity of the results.It can be seen that the proposed algorithm has obtained the largest modularity on the Polbooks network, the e-mail network, the Power network, the PGP network, and the Cond-mat network, and the second largest values on other networks.

Synthetic Networks.
In the experiments on the real networks, the proposed algorithm performs well. Experiments on artificially synthesized networks further test the proposed algorithm's accuracy.
The LFR benchmark networks [59] are artificially synthesized networks whose characteristics are tuned by several parameters. We generate networks with different numbers of nodes and different community sizes to test the algorithm's performance. The main parameters for generating LFR networks are as follows: n is the number of nodes; k and maxk are the average and maximum degree in the network; minC and maxC are the minimum and maximum community sizes; and exp1 and exp2 are the power-law exponents of the degree distribution and the community size distribution, respectively. In addition, the most important parameter is the mixing parameter μ, which represents the proportion of each node's edges that connect to nodes outside its community. That is, the larger μ is, the more difficult it is to detect the community structure.
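When working with generated LFR graphs, it can be useful to check the realized mixing parameter against the requested μ. The sketch below computes it as the average, over non-isolated nodes, of the fraction of each node's edges that leave its community; `mixing_parameter` and the `membership` mapping (node to community label) are hypothetical names chosen for this illustration.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def mixing_parameter(g: Graph, membership: Dict[Hashable, Hashable]) -> float:
    """Average fraction of a node's edges that connect outside its community."""
    fractions = []
    for v, nbrs in g.items():
        if nbrs:  # isolated nodes have no edges to classify
            outside = sum(1 for u in nbrs if membership[u] != membership[v])
            fractions.append(outside / len(nbrs))
    return sum(fractions) / len(fractions) if fractions else 0.0
```

For two triangles joined by one edge, only the two bridge endpoints have an external edge (one of three each), giving μ = (1/3 + 1/3)/6 = 1/9.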
We generate four groups of networks for the experiments at two scales, 1000 nodes and 5000 nodes; for each scale, we generate one group with small communities and one with large communities, denoted 1000s, 1000b, 5000s, and 5000b, respectively. Each group is generated by varying μ from 0.1 to 0.8 in steps of 0.1, and ten networks are generated for each parameter setting. For each comparison algorithm, we take the average of its results on the ten networks as its result to reduce random variation. The specific parameters of the LFR networks are shown in Table 3.
In the experiments on the "1000" series of networks, Eps is set to 0.05 and Minpts is set to 3. And, on the "5000" series of networks, Eps is set to 0.04 or 0.03, Minpts is set to 3 as well.
For convenience of comparison, we plot the results of the algorithms as line charts in Figure 13. When μ is low, both DPCT-Q and DPCT-M obtain high NMI values. As μ increases, the ground-truth structure of the network gradually becomes indistinct. The performance of DPCT-Q, which uses the modularity increment as its merging criterion, decreases, and the NMI it obtains is relatively low. This phenomenon appears in all the modularity-optimization-based algorithms, and DPCT-Q still performs better than the others among them. In contrast, the advantages of DPCT-M gradually emerge as μ increases: on all four groups of networks, DPCT-M obtains the best NMI values even when μ is 0.8.
These results show that DPCT-M can accurately detect communities even when the community structure is ambiguous.

Through these comparative experiments, we have assessed the performance of the proposed algorithm. On both real and synthetic networks, DPCT-Q and DPCT-M obtain high-quality partitions. The accuracy of DPCT-Q decreases in some extreme situations, whereas DPCT-M adapts well to such extreme networks.

Similarity Analysis Experiments.
In our method, we use the multiple attribute decision-making algorithm TOPSIS to calculate the T-similarity. In this group of experiments, we compare our method using T-similarity with variants using each of four single similarities on the five real-world networks with ground-truth structures. For DPCT-Q, we use Jaccard similarity, Salton's cosine similarity, HPI similarity, and HDI similarity separately in the community expansion and nonkey node allocation processes, and the corresponding methods are named DPCT-Q_j, DPCT-Q_s, DPCT-Q_p, and DPCT-Q_d. For DPCT-M, we likewise replace T-similarity with each of the four single similarities in these processes; the resulting methods are denoted DPCT-M_j, DPCT-M_s, DPCT-M_p, and DPCT-M_d. From Figure 14(a), we can see that under the DPCT-Q framework, the modularity obtained by HDI on the karate club network is relatively low, whereas HDI reaches the largest value on the dolphin network. The other three similarities behave similarly, which shows that no single similarity adapts well to every network, whereas T-similarity integrates the advantages of the four similarities across different networks.
A similar pattern appears in Figure 14(c). We also observe that DPCT-Q and DPCT-M with T-similarity obtain the largest modularity on each network.
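The four single similarities compared here can be written directly in terms of neighbor sets Γ(·) and degrees k. The sketch below uses the standard definitions, Jaccard |Γ(u)∩Γ(v)|/|Γ(u)∪Γ(v)|, Salton |Γ(u)∩Γ(v)|/√(k_u k_v), HPI |Γ(u)∩Γ(v)|/min(k_u, k_v), and HDI |Γ(u)∩Γ(v)|/max(k_u, k_v), on open neighborhoods; whether the paper includes a node in its own neighborhood is not stated in this section, so that choice is an assumption. `similarity_matrix` builds one row per candidate, playing the role of `simmat` in the algorithms.

```python
import math
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def four_similarities(g: Graph, v: Hashable, u: Hashable) -> Tuple[float, float, float, float]:
    """(Jaccard, Salton, HPI, HDI) between nodes v and u."""
    nv, nu = g[v], g[u]
    common = len(nv & nu)
    kv, ku = len(nv), len(nu)
    union = len(nv | nu)
    jaccard = common / union if union else 0.0
    salton = common / math.sqrt(kv * ku) if kv and ku else 0.0
    hpi = common / min(kv, ku) if kv and ku else 0.0  # hub promoted index
    hdi = common / max(kv, ku) if kv and ku else 0.0  # hub depressed index
    return jaccard, salton, hpi, hdi


def similarity_matrix(g: Graph, v: Hashable,
                      candidates: Sequence[Hashable]) -> List[Tuple[float, float, float, float]]:
    """One row of four similarities per candidate node."""
    return [four_similarities(g, v, u) for u in candidates]
```

Because HPI divides by the smaller degree and HDI by the larger one, they rank hub-adjacent pairs differently, which is one reason a single similarity may suit some networks and not others.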
Table 3: The parameters of the LFR benchmark networks, including the number of nodes, the average and maximum degree of nodes, the power-law distribution exponents, the minimum and maximum community sizes, and the mixing parameter μ with its increment dμ.

These comparison results demonstrate the rationality of the multiple attribute decision-making method. More similarities could be used as attributes to improve accuracy further; in this paper, four node similarities are selected to improve the quality of the detected communities while keeping the computation efficient.
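TOPSIS turns a candidate-by-similarity matrix into one score per candidate. The following is a textbook TOPSIS sketch, assuming that all four similarities are benefit criteria with equal weights (the paper's weighting scheme is not restated in this section): columns are vector-normalized and weighted, each row's Euclidean distance to the ideal best (column maxima) and the ideal worst (column minima) is computed, and the relative closeness D⁻/(D⁺ + D⁻) serves as the T-similarity.

```python
import math
from typing import List, Optional, Sequence


def topsis(matrix: Sequence[Sequence[float]],
           weights: Optional[Sequence[float]] = None) -> List[float]:
    """Relative closeness of each row to the ideal solution (higher is better)."""
    cols = len(matrix[0])
    w = list(weights) if weights is not None else [1.0 / cols] * cols  # ASSUMPTION: equal weights
    # Vector-normalize each criterion (column), then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(cols)]
    v = [[w[j] * (row[j] / norms[j] if norms[j] else 0.0) for j in range(cols)]
         for row in matrix]
    # Every similarity is a benefit criterion: larger is better.
    best = [max(row[j] for row in v) for j in range(cols)]
    worst = [min(row[j] for row in v) for j in range(cols)]
    scores = []
    for row in v:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        total = d_best + d_worst
        scores.append(d_worst / total if total else 0.0)  # 0.0 when all rows coincide
    return scores
```

A candidate that is best on every similarity scores 1, one that is worst on every similarity scores 0, and a candidate halfway between them scores 0.5; the candidate with the largest score would be chosen as the most similar node.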

Conclusion
This paper presents a community detection algorithm, DPCT, based on the density peak clustering model and the multiple attribute decision-making strategy TOPSIS. Using DBSCAN, we mine the key nodes to avoid interference with the nodes that have larger density, and we use TOPSIS to calculate a new, well-adapted T-similarity to replace the traditional similarities.
The proposed method generates community frameworks based on the key nodes, expands them, and allocates the remaining nonkey nodes using T-similarity; finally, two strategies are used to merge some of the obtained communities. In the experiments, we first explore the influence of the DBSCAN parameters on community detection. After determining the best parameter values, we verify the accuracy of DPCT by comparison with other algorithms. Finally, we demonstrate the adaptability of T-similarity.
Two strategies, DPCT-Q and DPCT-M, are proposed in this paper, and each has its own advantages: the former tends to obtain high-modularity partitions and produces results more easily, whereas the latter can partition networks accurately even when the community structure is unclear. Therefore, in practical applications, we suggest using DPCT-Q to partition the network first. On the one hand, a community partition can be obtained easily; on the other hand, suitable values of the DBSCAN parameters Eps and Minpts can be found. If the result of DPCT-Q lacks the required community information, DPCT-M can then be used.
Admittedly, the time complexity of the proposed method is relatively high, mainly because the shortest path length between nodes is used as the distance attribute, which takes considerable time to compute on large-scale networks. Therefore, in future work, we will try other ways of calculating distance to improve the scalability of the proposed method.

Data Availability
The networks used in our experiments include some real-world networks and some artificial datasets. The real-world networks cited in Table 2 were taken from previously reported studies. Most of them can also be downloaded from http://www-personal.umich.edu/∼mejn/netdata/ and https://snap.stanford.edu/data/index.html. We constructed the Riskmap network manually according to the literature [50]. The artificial networks are synthesized using the LFR benchmark network generator, which is freely available at https://sites.google.com/site/santofortunato/, and the parameters used are listed in Table 3.

Figure 3: The influence of different parameters on the results in the karate club network. (a) The influence on the number of key nodes. (b) The influence on the modularity. (c) The influence on the NMI. In the figure, the closer the color of a block is to yellow, the greater its value. This illustration style applies to the following figures as well.

Figure 5: The influence of different parameters on the results in the dolphin network. (a) The influence on the number of key nodes. (b) The influence on the modularity. (c) The influence on the NMI.

Figure 6: The karate club network. (a) The ground-truth community structure. (b) The result detected by DPCT-Q. (c) The result detected by DPCT-M.

Figure 8: The dolphin network. (a) The ground-truth community structure. (b) The result detected by DPCT-Q. (c) The result detected by DPCT-M.

Figure 9: The Santa Fe network. (a) The ground-truth community structure. (b) The result detected by DPCT-Q. (c) The result detected by DPCT-M.

Figure 10: The football network. (a) The ground-truth community structure. (b) The result detected by DPCT-Q. (c) The result identified by DPCT-M.

Figure 11: Performance comparison between different community detection algorithms in the networks with ground truth: (a) comparison in modularity; (b) comparison in NMI.

Figure 13: Performance comparison between different community detection algorithms in the LFR networks. (a) The results detected in the 1000s networks. (b) The results obtained from the 1000b networks. (c) The results identified from the 5000s networks. (d) The results acquired from the 5000b networks.

Figure 14: Comparison of the effect of TOPSIS and single similarities. (a) The modularities detected by the DPCT-Q framework with T-similarity and different single similarities. (b) The NMIs detected by the DPCT-Q framework with T-similarity and different single similarities. (c) The modularities detected by the DPCT-M framework with T-similarity and different single similarities. (d) The NMIs detected by the DPCT-M framework with T-similarity and different single similarities.
candidate ⟵ c's first- and second-order neighbors;
(12) simmat ⟵ calculate the four similarities between c and candidate;
(13) while True do

Table 2: The information of the real-world networks. (The columns "|V|" and "|E|" represent the numbers of nodes and edges, respectively. The column "|C|" represents the number of communities in the ground-truth structure of the network, where "-" means that the ground-truth community structure is unknown; Eps and Minpts are the DBSCAN parameters used.)