Detecting Communities in Complex Networks using an Adaptive Genetic Algorithm and node similarity-based encoding

Detecting communities in complex networks can shed light on the essential characteristics and functions of the modeled phenomena. This topic has attracted researchers of various fields from both academia and industry. Among the different methods implemented for community detection, Genetic Algorithms (GA) have become popular recently. Considering the drawbacks of the currently used locus-based and solution-vector-based encodings to represent the individuals, in this paper, we propose (1) a new node similarity-based encoding method to represent a network partition as an individual named MST-based. Then, we propose (2) a new Adaptive Genetic Algorithm for Community Detection, along with (3) a new initial population generation function, and (4) a new adaptive mutation function called sine-based mutation function. Using the proposed method, we combine similarity-based and modularity-optimization-based approaches to find the communities of complex networks in an evolutionary framework. Besides the fact that the proposed representation scheme can avoid meaningless mutations or disconnected communities, we show that the new initial population generation function, and the new adaptive mutation function, can improve the convergence time of the algorithm. Experiments and statistical tests verify the effectiveness of the proposed method compared with several classic and state-of-the-art algorithms.


Introduction
Ever since Leonhard Euler introduced graphs in the 18th century, they have been a revolutionary tool for modeling and analyzing real-world phenomena. Many complex systems have interactional features, and graphs laid the necessary ground to model them. Graphs acquired from real-world phenomena typically have characteristics that differentiate them from random graphs. One of these characteristics is the variation of edge density across different parts of the network. The densely interconnected components can be interpreted differently depending on the context of the modeled phenomenon and provide valuable information about its essence. While these sub-graphs can represent individuals with common interests in social networks, they might indicate the proteins involved in a specific function in a protein interaction network. Therefore, finding these groups can provide meaningful information for experts or recommender systems to make sound decisions Gasparetti et al. (2020); EL-MOUSSAOUI et al. (2019). The goal of community detection is to find a partition of the network that separates these densely connected parts from each other Gupta and Singh (2020). It must be noted that community detection differs from the graph partitioning problem with respect to the number of communities and their nodes. While this information is provided in advance in graph partitioning problems, determining it is the major challenge of a community detection algorithm. Therefore, a community detection algorithm should be able to find the communities of a network without a predefined number of communities or prior knowledge of their nodes Newman and Girvan (2004).
In recent years, the popularity of community detection has increased among researchers from various fields. The need for a quality measure to evaluate the results of different methods led to a new measure called modularity. Modularity evaluates a given partition of a network by considering the non-randomness of its intra-community edges. Despite modularity's popularity, there are still methods that define the quality of a given partition based on other approaches, such as node similarity, because of modularity's scalability problem Coscia et al. (2012); Lancichinetti and Fortunato (2011). There is also another category of algorithms that attempts to combine different approaches.
Researchers have implemented different artificial intelligence and evolutionary algorithms (EAs) to maximize modularity and other measures. The ability of GAs to solve various problems has made them considerably popular for optimization. These methods start from random individuals and, by keeping and combining the fittest solutions and eliminating the weak ones, narrow the search space toward the desired solutions Cai et al. (2016). This seemingly simple logic has been shown to find remarkable results for complicated problems. In recent years, several EA-based methods have been proposed to solve community detection problems. Nearly all of the EA-based community detection algorithms can be classified in the modularity-optimization category (in section 2, we discuss different methods in a detailed literature review). However, despite its extensive use, modularity suffers from a scalability problem: it cannot detect communities smaller than a specific scale Lancichinetti and Fortunato (2011). Therefore, efforts to propose other measures continue. Node similarity-based algorithms try to solve this problem by offering alternative measures Cheng and Zhang (2016); Hesamipour and Balafar (2019). Moreover, most GA-based community detection methods use locus-based or solution-vector-based representations to encode each solution, and each of them has deficiencies (we discuss the details of these deficiencies in section 4.1).
In this paper, we introduce a new individual encoding scheme in an attempt both to overcome the deficiencies of the existing representation schemes and to take advantage of the different approaches. In the new encoding scheme, called MST-based representation, instead of using nodes and their neighbors, we first create a weighted copy of the network using node similarity measures and then form a binary chromosome by encoding the edges of its spanning tree. The zero and one values of the elements of this string indicate whether the corresponding edge of the spanning tree is connected or detached. Then, we use this representation to solve the community detection problem with a new adaptive genetic algorithm.
Compared with the other representations, the MST-based representation can reduce the search space by eliminating some rare possibilities, thereby directing the procedure towards more appealing solutions. Also, considering that running time is one of the major concerns of EAs, this representation can perform faster than the other methods, because it reduces the number of possible values of each gene from the number of neighbors of its corresponding node to two (connected or detached edge). We will show that the amount of information loss caused by this representation is negligible considering the improvements in other parameters. To further improve the convergence time of our method, we propose a new initial population generation method based on the proposed representation. The novel initial population generation function separates the network into some initial communities using a simple but effective threshold. Results confirm that this strategy can yield better initial populations and thus can reduce the convergence time of the algorithm. Finally, we introduce a new adaptive mutation function called the sine-based mutation function. The sine-based mutation function creates the mutation probability distribution based on the distance of the edges from the borders of the community and a self-adaptive control parameter. The control parameter causes the distribution to change smoothly based on the improvement of the best individual of the population pool. Experiments confirm the effectiveness of the proposed mutation function. We can summarize the contributions of this paper as follows:
• A new community detection method based on GA is proposed (Section 4);
• We propose a new method to represent the individuals for community detection problems in GAs, called MST-based representation (Section 4.1);
• We propose a new method to generate the initial population and enhance the convergence time (Section 4.2);
• A new adaptive mutation function called the sine-based mutation function is introduced, which adjusts probabilities based on a self-adaptive adjustment parameter (Section 4.5);
• Several experiments are conducted to show the effectiveness of the proposed method, and comparisons are made with other methods (Section 5).
The rest of the paper is organized as follows: in section 2 we review the existing scientific literature on community detection, in section 3 we present some preliminaries and definitions, section 4 focuses on the details of the proposed method, and section 5 provides the results of our method and compares them with other well-known algorithms. Finally, we conclude the paper in section 6.

Related works
Community detection methods can be categorized based on either their methodologies or their definitions of communities Coscia et al. (2012); Gupta and Singh (2020). Yet, it is possible to break each of these categories into more specific subcategories. From the methodological point of view, community detection algorithms are mostly separated into agglomerative and divisive classes. While agglomerative methods start from local structures and try to expand them to form the communities, divisive methods start from the full graph and try to detect the communities by dividing the network. On the other hand, because of the extreme richness of the definitions of "community" in the literature, the definition-based approach divides the algorithms into sub-categories such as density-based, vertex similarity-based, action-based, and influence propagation-based methods.
Newman and Girvan defined "community" based on the non-randomness of the edges among the nodes and proposed modularity to measure this feature in different partitions using a random null model Newman and Girvan (2004). While some methods solve the community detection problem by optimizing one of these definitions, others rely on a different one, which results in different sub-categories. Furthermore, the recently trending data-structure-based methods are based on the idea of transforming a network into another data structure, such as a tree Souravlas et al. (2021). These methods use the new data structure either to reveal some of the hidden characteristics of the network or to reduce the complexity of the problem.
Newman and Girvan introduced modularity along with a divisive modularity-optimization-based community detection algorithm Newman and Girvan (2004). In their method, they repeatedly remove the edges with high betweenness from the graph and construct a hierarchical tree as the communities start to split. This iterative edge removal process goes on until all the edges are removed and a dendrogram is formed from top to bottom. Finally, they cut the dendrogram at the level that gives the maximum modularity. Compared to other measures, the lower computational complexity and higher accuracy of modularity drew the attention of the scientific community, which resulted in numerous algorithms based on it. Another modularity-optimization method, which forms the dendrogram from bottom to top (agglomerative), was proposed in Clauset et al. (2004). This method is well known as Fastgreedy. It starts from singleton communities and merges them following a greedy strategy until no further aggregation improves the modularity. Blondel et al. proposed another modularity-based agglomerative method, known as Louvain, in Blondel et al. (2008). Their method operates in two stages: first, it starts from single-node communities and merges them as long as the modularity rises. Second, it collapses these groups of nodes into supernodes. Then it repeats the process from the beginning over the newly formed graph until no further aggregation can increase the modularity. A later work (2019) showed that the Louvain method can lead to disconnected communities and added a refinement stage to prevent it.
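Because modularity underlies most of the methods reviewed above, a small sketch of how it can be computed for a given partition may be helpful. The function below uses the standard Newman-Girvan form for undirected, unweighted graphs (for each community, the fraction of internal edges minus the squared fraction of edge ends attached to it); the function name `modularity` and the toy graph are assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Modularity of a partition of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to a community identifier.
    """
    m = 0
    internal = defaultdict(int)    # edges with both ends inside community c
    degree_sum = defaultdict(int)  # sum of the degrees of the nodes in c
    for u, v in edges:
        m += 1
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    if m == 0:
        return 0.0
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Toy graph (illustrative): two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(round(modularity(edges, split), 4))  # 0.3571
```

Placing all six nodes in one community gives a modularity of 0, which is why splitting the toy graph into its two triangles is preferred by this measure.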
In contrast to the modularity-optimization methods, there are algorithms that use the local structural information of the network to detect communities and do not focus on the optimization of a specific global measure. One of the most widely used is the Label Propagation Algorithm (LPA) proposed in Raghavan et al. (2007), which is one of the classic algorithms of community detection. LPA starts by assigning a label to each node of the network, then visits the nodes in a random sequence and updates the label of each node to the most frequent label among its neighbors. In this algorithm, the label of each node corresponds to the community that the node belongs to. This process quickly ends up in uniquely labeled sub-graphs. A domain that is mutually beneficial with community detection is link prediction. Hence, numerous similarity-based methods use link prediction indexes to measure the similarities between nodes Daud et al. (2020). Cheng et al. show that link prediction can be used to improve the accuracy of community detection methods Cheng and Zhang (2016). Narang et al. (2013) describe some of these similarity measures in the network flow framework and show that some of them are related to different kinds of network flows. As a similarity-based agglomerative method, Castrillo et al. defined a new similarity index based on the well-known cosine similarity index Castrillo et al. (2017). Then, starting from singleton communities, they merge them until no more weak communities remain. Hesamipour et al. use one of these link prediction measures, Adamic/Adar (AA), to determine the central nodes of the network and expand the communities around them Hesamipour and Balafar (2019). They select each central node based on its high similarity score and its large distance from the previously selected central nodes. After locating these nodes, they expand the communities around them by a game-theoretic agglomerative approach. Liu and Ma (2019) also propose a community detection method using the AA similarity measure. They use AA to measure the similarity of nodes and form initial communities based on this similarity. Later, they merge these initial communities based on a specific attractiveness index. Some of the other methods of this category interpret the local information in a probabilistic framework. Hajiabadi et al. (2017) use the proportion of neighbor nodes belonging to each community to estimate the dependency of each node on its own or the neighboring community and propose an overlapping community detection method. To find the communities, Infomap Rosvall and Bergstrom (2008) runs a random walk process and obtains the probabilities for a random walker to pass through each edge. Then it turns the community detection problem into a minimum-length encoding problem and detects the communities using Huffman encoding. In Nikolaev et al. (2015), a method is proposed that removes the edges of the graph iteratively to find their overall impact on the entropy of each node. They define the entropy of a specific node as the uncertainty of the destination node for a random walker that starts from that node.
With the growing acceptance of modularity and the methods based on its maximization, researchers turned their attention toward EAs, which were known to be suitable for maximization problems. To the best of our knowledge, Bingol et al. made the first attempt to use EAs to detect the communities of a network Tasgin et al. (2007); Tasgin and Bingol (2006). They implemented the solution-vector representation method to represent each individual in Tasgin and Bingol (2006). A solution vector is an n-dimensional vector (n is the number of nodes) that stores the community identifier of each node in its corresponding element. First, they form the initial population by placing some nodes and their neighbors in the same communities. Then, they use modularity as the fitness function, sort the individuals based on their fitness values, and select some individuals to merge using the crossover function. Their one-way crossover function chooses a random community from one of the parents (the source parent) and transfers it to the destination parent. Finally, they apply a uniform mutation function to change the value of a random gene. Later, the same authors proposed a measure named community variance and employed it in a clean-up stage to improve the convergence time of their former method Tasgin et al. (2007). As we mentioned in Section 1, the goal of community detection is to find sub-graphs with dense internal connections and sparse outer links. Modularity reflects both of these conditions. A multi-objective EA based on these two terms is proposed in Shi et al. (2012). Multi-objective methods attempt to optimize more than one measure simultaneously. The authors decompose modularity into two different objectives to address both terms. Finally, they return the individual with the maximum modularity as the solution. Ying et al. also used a similar multi-objective conical area EA to detect the communities Ying et al. (2019). A similar multi-objective approach was also implemented in Gong et al. (2012) and Li et al. (2020). Although most GA-based community detection methods are categorized as modularity-optimization methods, there are several other methods that attempt to optimize other measures. GA-net is one of the famous community detection algorithms based on evolutionary methods that does not use modularity as its fitness function Pizzuti (2008). This method proposes a new measure, called community score, to quantify the quality of detected communities and uses it as the fitness function Pizzuti (2008). Yet, the measure proposed in GA-net highly depends on its internal control parameter, and its results change dramatically with it. In Samie and Hamzeh (2017), Samie et al. propose another GA that detects communities using the community score measure (the measure proposed by Pizzuti in GA-net Pizzuti (2008)). Their method implements an additional clean-up step to absorb some weak communities or single nodes into the others. Zarei et al. propose a heuristic method for initial population generation, an object migration automata-based method, and a hybrid algorithm based on object migration automata and GA Zarei and Meybodi (2020). Their purpose in coupling those methods is to avoid getting stuck in local optima.
On the other hand, some other methods try to reach a balance between modularity-optimization-based and similarity-based methods. These methods usually run in several stages and take advantage of one of the strategies at each step. Saoud et al. propose a method that, in the first stage, creates initial communities using local information-based similarity indexes to minimize the uncertainty, and then, in the second stage, uses a modularity-optimization-based method to detect the final communities Saoud and Moussaoui (2018). In Li et al. (2019), researchers propose a method called EdMot. Using triangle connections, EdMot forms a new graph. Then, it finds the densely connected components of this newly formed graph and adds extra edges to them. The resulting network summarizes both lower-level and higher-level connections, and it tends to highlight the communities. Another method called WATSET Ustalov et al. (2018) proposes a meta-heuristic approach that uses a fuzzy method to create a sense graph in the first stage. In the next stage, it detects the crisp communities. CCGA Said et al. (2018) is a GA that uses the clustering coefficient to create the initial population, then uses modularity as the fitness function and attempts to detect communities. Since some of these methods transform the given network into a new data type using one of the measures, they can be categorized as data-structure-based methods. In Saoud and Moussaoui (2018), the researchers first assign a value to each edge based on the similarity of its end nodes. After forming the weighted graph, they remove the edges whose weight is lower than a specific threshold value. As a result of this process, they obtain some disconnected groups of nodes. Then, in a clean-up procedure, they join smaller components to the larger ones. Next, they merge these larger groups based on the number of intermediate edges between them for as long as modularity keeps increasing. In a similar approach in Saoud and Moussaoui (2016), they again assign weights to the edges of the graph using similarity measures. Then they find the Maximum Spanning Tree (MST) of the weighted network. They exclude the lower-weighted half of the edges of this tree and obtain $(n-1)/2$ divided components. From this step on, they employ an approach similar to the previous work Saoud and Moussaoui (2018) to combine these components for as long as modularity improves. The idea of using an MST was previously applied by Wu et al. (2013). In their work, they estimate the distance between two nodes based on a heuristic approach and allocate a value to each edge of the graph. Then, they obtain the minimum spanning tree (mST) of the network and try to cut the tree at a point that maximizes the distance between the two components.
MSTs have been used in a variety of other applications, and specifically in EAs Garza-Fabre et al. (2017). Extracting an underlying structure of the given network is a usual practice in community detection algorithms. This underlying structure is usually called a skeleton network, and it represents a summary of the main network. Here, we treat the mST of the network as a skeleton network. Our method uses similarity indexes to assign weights to the edges of the network. Then it uses MSTs as the basis to introduce a new representation for the individuals in community detection problems. Finally, using a new initial population function and a novel mutation function, it tries to detect the communities using a modularity-optimization-based GA. Therefore, our method stands on the borderline between similarity-based, modularity-based, and data-structure-based methods. As far as we know, MSTs have not been used to solve community detection problems in this way before. Even though our method is designed for undirected networks, whenever a skeleton network exists (it can be any cycle-free tree that preserves the local characteristics of the network), it can be applied to different networks such as temporal networks, directed networks, multi-layer networks, etc. Lee et al. (2014); Lu et al. (2014); Long et al. (2020). In the following section, we provide the preliminaries and definitions that are necessary to continue the discussion.

Definitions
Graph: We use $G$ to represent a graph. A graph is a tuple $(V, E)$, in which $V$ refers to the set of nodes and $E$ is the set of edges, each connecting two nodes: $E \subseteq \{(v, u) \mid v, u \in V\}$. In this paper, we focus on unweighted and undirected graphs. In these graphs, if there is an edge $e_z \in E$ between two nodes $v, u \in V$, we have $e_z = (u, v) = (v, u)$. For each node $v$, we call the nodes that share a direct edge with $v$ its set of neighbors and denote it by $\Gamma(v)$. The degree of a node is equal to the number of nodes contained in its neighborhood ($d_v = |\Gamma(v)|$). Also, we use $|V| = n$ and $|E| = m$ to represent the number of nodes and edges of a graph. The definition differs slightly for weighted networks. A weighted network is a triplet $G_w = (V, E, w)$ consisting of a set of nodes ($V$), a set of edges ($E$), and a mapping function ($w$) that maps each edge to a real number ($w : E \to \mathbb{R}$).
Community: Although there are different definitions for the notion of community Hesamipour and Balafar (2019); Fortunato (2010), there seems to be a consensus on the definition of a community as a set of nodes that are densely interconnected and have fewer outer links. Therefore, assuming $C_i$ as the $i$-th community of a network, the following should apply to it: In the cases that a node can be a member of more than one community, we call them overlapping communities. In this paper, we concentrate on non-overlapping communities ($C_i \cap C_j = \emptyset$). Additionally, since every node should be a member of at least one community, we have $\bigcup_{i=1}^{k} C_i = V$. The set of non-overlapping communities that divides a network into subgraphs is called a partition $P = \{C_1, C_2, \ldots, C_k\}$, with $C_i \cap C_j = \emptyset$ for $i \neq j$ and $i, j \in \{1, \ldots, k\}$.
Minimum/Maximum Spanning Trees: A minimum/maximum spanning tree is a tree derived from a weighted graph that includes all of the graph's nodes and a subset of its edges whose total weight is the minimum/maximum possible value. A spanning tree always has exactly $n - 1$ edges, and cutting any one of them divides the tree. In the proposed method, we take advantage of this feature, and we use the terms Broken Edge or Border Edge to denote the edges that are cut to split the spanning tree. In this paper, we denote a Maximum Spanning Tree by MST and a Minimum Spanning Tree by mST. Naturally, one can construct both of them with the same method by simply inverting the weights of the edges. Here, we use the well-known Prim algorithm to find the corresponding MST of a graph Prim (1957).
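As an illustration of the last two sentences, the sketch below obtains a maximum spanning tree by running Prim's algorithm on negated weights. It is a minimal example rather than the implementation used in the paper: the function name `maximum_spanning_tree`, the node labelling 0..n-1, the assumption that the graph is connected, and the toy weights are all hypothetical.

```python
import heapq

def maximum_spanning_tree(n, weighted_edges):
    """Prim's algorithm on negated weights.

    n: number of nodes, labelled 0..n-1 (the graph is assumed connected).
    weighted_edges: list of (u, v, w) undirected edges.
    Returns the n - 1 tree edges as (u, v, w) tuples.
    """
    adj = {v: [] for v in range(n)}
    for u, v, w in weighted_edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    visited = {0}
    # heapq is a min-heap, so weights are stored negated to pop the heaviest edge.
    heap = [(-w, 0, v) for v, w in adj[0]]
    heapq.heapify(heap)
    tree = []
    while heap and len(tree) < n - 1:
        neg_w, u, v = heapq.heappop(heap)
        if v in visited:
            continue  # this edge would close a cycle
        visited.add(v)
        tree.append((u, v, -neg_w))
        for x, w in adj[v]:
            if x not in visited:
                heapq.heappush(heap, (-w, v, x))
    return tree

# Toy weighted graph (illustrative similarity weights).
toy = [(0, 1, 1.0), (1, 2, 3.0), (2, 3, 2.0), (0, 3, 4.0), (0, 2, 0.5)]
mst = maximum_spanning_tree(4, toy)
print(len(mst), sum(w for _, _, w in mst))  # 3 9.0
```

Calling the same routine with every weight negated would return a minimum spanning tree of the original graph.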

Proposed Algorithm
In this section, we illustrate the details of the proposed method. First, we explain the precomputation and encoding procedure in section 4.1. Next, we describe the initial population generation method in section 4.2, and then we define the details of the selection, crossover, and mutation functions in sections 4.3 to 4.5, respectively.

Representation
Most GA-based methods in community detection use the locus-based representation to encode the individuals of the population Gong et al. (2012). In this representation scheme, each individual consists of $n$ genes, each one referring to one of the nodes of the network. The value of each gene ($g_v = u$) is the identifier of another node and specifies that these two nodes belong to the same community ($u, v \in C_i$). In practical implementations, the domain of each gene is often restricted to the neighborhood of the corresponding node ($g_v \in \Gamma(v)$). Although a locus-based representation may be necessary for other applications, even this domain can contain redundancies for a community detection problem. For example, in Figure 1.a, selecting different neighbors for node #2 would not affect the outcome, because all of its neighbors belong to the same community. Yet, if a mutation happens on this node, the fitness value of the individual has to be recomputed. Besides, with the locus-based representation, an additional random/educated choice has to be made among the neighbors of each node (gene). Another method, used far less than the locus-based representation, is the solution-vector representation. This method also holds a gene for each node in every individual. In this scheme, however, assuming that the graph has $k$ communities, one has to choose a value between 1 and $k$ for each gene, indicating the identifier of the community that the node belongs to Zarei and Meybodi (2020); Shi et al. (2012). Although this method eliminates the need to label the members of a community in an additional pass, it might result in disconnected communities and violate the definition of community (3). Figure 1.b shows such a scenario.
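The redundancy of the locus-based representation described above can be made concrete with a short sketch. The decoder below treats every gene as an edge between a node and the node it points to and returns the connected components as communities; the function name `decode_locus` and the five-node example (a triangle 0-1-2 and a pair 3-4) are assumptions made for illustration and do not correspond to Figure 1.

```python
def decode_locus(genes):
    """Decode a locus-based chromosome into a set of communities.

    genes[v] = u means that nodes v and u belong to the same community.
    """
    parent = list(range(len(genes)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for v, u in enumerate(genes):
        parent[find(v)] = find(u)

    groups = {}
    for v in range(len(genes)):
        groups.setdefault(find(v), set()).add(v)
    return {frozenset(members) for members in groups.values()}

# Two different chromosomes for a triangle 0-1-2 and a pair 3-4.
a = [1, 2, 0, 4, 3]
b = [2, 0, 1, 4, 3]
print(decode_locus(a) == decode_locus(b))  # True
```

The chromosomes `a` and `b` differ in three genes yet decode to the same partition, so a mutation that turns one into the other changes nothing while still triggering a fitness evaluation.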
To avoid such errors, we propose a new method to represent individuals. In this method, which we call MST-based representation, problems such as effectless mutations or disconnected communities are eliminated at the cost of sacrificing some possible combinations of the nodes in a community. Furthermore, by reducing the domain of the individual representation to binary, our method brings community detection problems closer to the classic GA representations Sampson (1976). To create such a representation, we first assign a value to each edge of the graph using one of the node similarity/link prediction measures. Table 1 lists some of the well-known and recent measures. In the resulting weighted graph, the weight of each edge denotes the similarity of the two end nodes of that edge. Each of these similarity measures focuses on a specific characteristic, so using a different measure might affect the results.

Table 1: Node similarity/link prediction measures.
Jaccard index: Brings the similarity score of two nodes between 0 and 1 by dividing the number of their common neighbors by the number of nodes in the union of their neighborhoods Jaccard (1901).
Cosine similarity: Computes the cosine of the angle between two columns of the adjacency matrix Salton and McGill (1983).
Hub Promoted Index (HPI): Designed to maximize the impact of higher-degree nodes Ravasz et al. (2002).
Adamic/Adar (AA): Reduces the impact of very high-degree nodes by summing the inverse logarithm of the degrees of the common neighbors Adamic and Adar (2003).
Resource Allocation Index (RA): Calculates the possibility that the intermediate nodes pass the resource they have taken from $u$ to $v$ Zhou et al. (2009).
Common Neighbors Degree Penalization (CNDP): Computes the similarity between two nodes $u$ and $v$ using, for each common neighbor $k$, the neighbors of $k$ that are also common neighbors of $u$ and $v$ ($|C_k|$) and the number of neighbors of $k$ ($|\Gamma_k|$) raised to the power of the average clustering coefficient ($C$) times a constant $\beta$ (we set $\beta = 1.76$) Rafiee et al. (2020).
Similarity based on Random Walk (SRW): Uses the random walk transition matrix of the graph ($\pi$) to compute the similarity between two nodes. We set $T = 5$ in our experiments Liu and Lü (2010).
Hybrid Influence of Neighbors (HIN): A generalization of the SRW measure that uses both the average degree and the average h-index of the neighbors. We set $T = 5$ in our experiments Gao and Zhu (2020).

After assigning weights to the edges of the graph, we compute an MST of it (in practice, most networks satisfy the criterion for a unique MST). As described in section 3, each MST has $n - 1$ edges, and discarding any one of them splits the tree into two parts. Instead of encoding individuals based on the nodes, we encode them based on the edges. Hence, in the MST-based representation, each individual has $n - 1$ genes with a binary domain. The value of each gene expresses the state of the corresponding edge of the MST. We use 1 to indicate a broken edge and 0 to indicate a connected one. The MST-based representation of a given partition for the example graph is shown in Figure 1.c. As described in the figure, with the MST-based representation every mutation causes a significant change: even a mutation on an internal edge such as (13, 14) results in a brand new partition. Clearly, such a representation makes some combinations impossible. For example, because there is no edge between node #5 and node #9 in the resulting MST, reaching a community consisting of just these two nodes is impossible. But in practice, considering the mutually beneficial relationship between community detection and node similarity/link prediction, communities with less similar nodes are unlikely to be desirable Daud et al. (2020); Cheng and Zhang (2016). Therefore, given that the weighting process has already highlighted the edges with more similar neighbors, we can be assured that we are only neglecting communities that might have sparse intra-community connections. Consequently, the MST-based representation can reduce the search scope of the problem, and the amount of information lost by using it is acceptable in view of its advantages over the other representation schemes. In addition, this representation can be applied to different types of networks (such as temporal networks, directed networks, etc.) by using an application-specific skeleton network instead of the mST.
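The following sketch illustrates how a binary chromosome over the spanning-tree edges can be decoded into communities, together with the Jaccard index as one possible edge weight. It is a minimal example under stated assumptions, not the authors' implementation: the names `jaccard` and `decode_mst_chromosome`, the adjacency-set input, and the six-node path used as a spanning tree are hypothetical.

```python
def jaccard(adj, u, v):
    """Jaccard similarity of the neighborhoods of u and v (adj: node -> set)."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def decode_mst_chromosome(n, tree_edges, genes):
    """Decode a binary chromosome defined over the n - 1 spanning-tree edges.

    genes[i] == 1 marks tree_edges[i] as broken, 0 keeps it connected.
    Returns a list with one community label per node (nodes are 0..n-1).
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (u, v), gene in zip(tree_edges, genes):
        if gene == 0:  # connected edge: both end nodes share a community
            parent[find(u)] = find(v)

    labels = {}
    return [labels.setdefault(find(v), len(labels)) for v in range(n)]

# A path 0-1-2-3-4-5 standing in for the spanning tree (illustrative).
tree = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
print(decode_mst_chromosome(6, tree, [0, 0, 1, 0, 0]))  # [0, 0, 0, 1, 1, 1]
print(decode_mst_chromosome(6, tree, [1, 0, 1, 0, 0]))  # [0, 1, 1, 2, 2, 2]
```

Because the genes refer to tree edges, a chromosome with $b$ broken edges always decodes to exactly $b + 1$ connected communities, so every single-bit flip changes the partition.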

Initial population generation
Although GAs usually search for an optimal (or at least a near-optimal) solution by eliminating the weakest individuals of a random pool and letting the fittest survive, starting the process from a more rational initial population can improve the convergence time Tasgin et al. (2007); Tasgin and Bingol (2006). Even though the number of nodes of each community depends on the edges among them, observations suggest that the size of communities could increase depending on the size of the network. Considering the cluster/community size equal to $\sqrt{n}$ (n is the number of nodes) is a common hypothesis in clustering and community detection algorithms Bezdek and Pal (1998); Bilal and Abdelouahab (2017). We take this assumption to form initial individuals.

First, we define a visited flag for each node. Then we start by selecting a random edge. If the end nodes of the selected edge were not visited before, we explore the other edges connected to its end nodes while keeping track of the number of nodes we have discovered so far. We keep exploring the edges in a Breadth-First Search (BFS) order until either a broken edge is reached (an edge whose corresponding gene has a value of 1) or the number of nodes of the community exceeds the threshold of $\sqrt{n}$. After exploring each edge, we set the visited flags of its end nodes as checked and consider both of them as members of the same community. After reaching the determined threshold, we set the corresponding genes of all the remaining edges in the queue to 1, meaning that these are broken edges. If the process reaches the broken edges before the number of nodes reaches $\sqrt{n}$ and cannot find a non-broken edge to expand the community, we change the gene value of one of the broken edges to 0 (meaning that we connect it again). We repeat this procedure until all of the MST edges are processed.

Figure-2 shows the process of forming an initial individual for the example network of Figure-1. The random selection order of edges to create such an individual is [(4,2), (5,8), (10,11), (9,8), (3,2), (7,8), (11,13), (12,11), (1,2), (6,10), (14,13), (5,4), (5,6)]. The pseudocode of creating an individual for the initial population is given in Algorithm-1 and Algorithm-2. The BFS-mod function in Algorithm-2 is a slightly modified version of the BFS algorithm that stops the procedure and returns the broken edges, the updated visited vector, and the updated chromosome after reaching the termination conditions. Experimental results show that individuals created using this method have better fitness values and can therefore accelerate the convergence (section 5.3).

Algorithm 1 (continued): B ← add a random edge to the queue; while B isn't empty do; edge ← B.pop(); add the new broken edges (b) to B

Algorithm 2: The pseudo-code of the modified version of the BFS algorithm used for the initial population generation explained in 4.2. Input: sNode: the direction in which we start to expand the edges; nNode: the neighbor node; $\vec{V}$: the visited vector of the nodes; $\vec{g}$: the genes vector. Output: returns the new broken edges (b), the updated visited vector ($\vec{V}$), and the updated genes vector ($\vec{g}$). $\vec{S}$ ← sNode

Figure 2: The initial population generation procedure, explained from left to right; a) The edge (4, 2) gets chosen, and its end nodes get assigned to the corresponding community. The edges of node 2 get passed, and, because 4 nodes are assigned to the community, the procedure reaches the predefined threshold ($\sqrt{14} \approx 4$) and chooses another edge; b) The edge (5, 8) gets chosen, and its end nodes get assigned to the community. The broken edge from 5 to 4 gets ignored, but node 6 gets added to the community. The (8, 9) edge gets processed, and by adding node 9, the community reaches the limit and the rest of the edges in the queue get broken; c) The (10, 11) edge gets chosen, and the community expands around node 11; d) The edge between 7 and 8 gets chosen, and because the community doesn't reach the threshold, the broken edge gets connected again; e) (13, 14) gets chosen, and because the community members fail to reach the threshold value, the broken edge gets reconnected.
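The procedure described above can be approximated by the following simplified sketch. It is not the authors' Algorithm 1 or Algorithm 2: it grows each community by BFS over the MST until roughly $\sqrt{n}$ nodes are collected, leaves all other edges broken, and reconnects one broken edge when a community cannot reach the threshold. The function name `init_individual`, the 0-based node ids, and rounding $\sqrt{n}$ to the nearest integer (suggested by the $\sqrt{14} \approx 4$ example in Figure 2) are assumptions made for illustration.

```python
import math
import random
from collections import deque


def init_individual(n_nodes, mst_edges, rng=random):
    """Build one binary gene vector (1 = broken MST edge) for the initial population."""
    limit = max(1, round(math.sqrt(n_nodes)))  # assumed rounding of sqrt(n)
    adj = [[] for _ in range(n_nodes)]
    for idx, (u, v) in enumerate(mst_edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))

    genes = [1] * len(mst_edges)  # every edge starts broken; growth reconnects edges
    visited = [False] * n_nodes
    order = list(range(len(mst_edges)))
    rng.shuffle(order)  # random edge selection order

    for seed in order:
        u, v = mst_edges[seed]
        members = [x for x in (u, v) if not visited[x]]
        if not members:
            continue  # both end nodes already belong to a community
        if len(members) == 2:
            genes[seed] = 0  # both end nodes join the same new community
        for x in members:
            visited[x] = True
        queue = deque(members)
        while queue and len(members) < limit:
            x = queue.popleft()
            for y, idx in adj[x]:
                if not visited[y] and len(members) < limit:
                    visited[y] = True
                    genes[idx] = 0
                    members.append(y)
                    queue.append(y)
        if len(members) < limit:
            # Too small and nowhere left to grow: reconnect one broken edge.
            borders = [idx for x in members for _, idx in adj[x] if genes[idx] == 1]
            if borders:
                genes[rng.choice(borders)] = 0
    return genes


# Example on a 9-node path; the result depends on the random edge order.
path = [(i, i + 1) for i in range(8)]
print(init_individual(9, path, random.Random(0)))
```

Passing a seeded `random.Random` instance makes individual runs repeatable, while calling the function repeatedly with the default generator yields a diverse initial pool.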

Selection
The goal of the selection phase is to keep the appropriate individuals and discard the weak ones. A fitness function is necessary to evaluate the quality of individuals. Modularity is one of the most used measures to determine the quality of a partition of a graph, and most GAs use it as their fitness function for community detection problems. The details of modularity are described in section 5.1. One of the important characteristics of modularity is its additivity Fortunato (2010). An additive function can be written as a sum of an elementary function. In other words, for a partition P, an additive measure Q has an elementary function q(·) such that:

$$Q = \sum_{C \in P} q(C) \qquad (2)$$

The q(C) function for modularity is the following:

$$q(C) = \frac{l_C}{m} - \left(\frac{d_C}{2m}\right)^2 \qquad (3)$$

where C denotes a community, $l_C$ denotes the number of edges joining the nodes of community C, $d_C$ represents the sum of the degrees of all of the nodes of C, and m is the number of edges of the graph. An increase in the value of this function indicates a strong community structure for the nodes of C. Now, we can simply substitute q(C) into (2) to rewrite (14) as follows:

$$Q = \sum_{C \in P} \left[ \frac{l_C}{m} - \left(\frac{d_C}{2m}\right)^2 \right] \qquad (4)$$

Using this characteristic of modularity, we avoid some unnecessary computations by keeping a complementary list for each individual. This list has a length equal to the number of communities, and each of its elements contains the modularity of one community. Therefore, when a mutation or crossover occurs, we only need to recalculate the modularity of the modified communities.
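As a small illustration of the additive form, the sketch below computes the per-community term $q(C) = l_C/m - (d_C/2m)^2$ for every community and stores the values in a dictionary that plays the role of the complementary list. The function names and the edge-list and label-list input formats are assumptions made for this example.

```python
def community_q(internal_edges, degree_sum, m):
    """Per-community modularity term: l_C / m - (d_C / 2m)^2."""
    return internal_edges / m - (degree_sum / (2 * m)) ** 2


def per_community_modularity(edges, labels):
    """Return {community: q(C)} for an undirected, unweighted edge list.

    labels[v] is the community of node v. Summing the returned values
    gives the modularity Q of the whole partition.
    """
    m = len(edges)
    internal = {}    # l_C: edges with both end nodes in C
    degree_sum = {}  # d_C: sum of degrees of the nodes in C
    for u, v in edges:
        cu, cv = labels[u], labels[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return {c: community_q(internal.get(c, 0), d, m) for c, d in degree_sum.items()}


# Two triangles joined by a single edge, split into the two triangles.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
cache = per_community_modularity(edges, [0, 0, 0, 1, 1, 1])
print(cache, sum(cache.values()))
```

After a mutation or crossover, only the entries of `cache` that belong to modified communities would need to be recomputed with `community_q`; the remaining entries can be reused in the sum.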
After obtaining the modularity value of each individual, we use the roulette wheel selection method. With this method, the probability of selecting an individual is equal to the ratio of its fitness value to the sum of the fitness values of all individuals, which increases the chance of selecting better individuals based on their fitness scores. Denoting the fitness value of individual x by f(x) and the probability of selecting it by $p_x$, we can calculate it by:

$$p_x = \frac{f(x)}{\sum_{y \in \Omega} f(y)}$$

where Ω indicates the population space. After calculating the probability of each individual, the cumulative probability is computed as follows:

$$\bar{p}_k = \sum_{i=1}^{k} p_i$$

Now, given a random value r between 0 and 1, we select the k-th individual if and only if $\bar{p}_{k-1} \le r \le \bar{p}_k$. In our method, we keep the size of the population space constant through generations. In each generation, we select |Ω|/2 pairs of individuals using the roulette wheel selection method. Then, we apply a crossover function (section 4.4) to produce new individuals. After applying the mutation function (section 4.5) to a specific ratio of these individuals, we add them to the population space. Finally, we sort the resulting |Ω| + |Ω|/2 individuals based on their fitness values and select the |Ω| fittest to transfer to the next generation.
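The roulette wheel step can be sketched as follows. The sketch assumes non-negative fitness values with a positive sum (modularity can be negative in general, so a real implementation would need to shift or clip the scores); the function names and the optional argument `r`, which allows a fixed random value to be injected, are illustrative.

```python
import bisect
import itertools
import random


def roulette_select(fitness, r=None):
    """Return the index k whose cumulative-probability interval contains r."""
    total = sum(fitness)
    # cumulative[k] is the sum of the selection probabilities of individuals 0..k.
    cumulative = list(itertools.accumulate(f / total for f in fitness))
    if r is None:
        r = random.random()
    # First index whose cumulative probability is >= r; the min() guards
    # against floating-point round-off in the last entry.
    return min(bisect.bisect_left(cumulative, r), len(fitness) - 1)


def select_pairs(fitness):
    """Select |population| / 2 parent pairs for crossover."""
    return [(roulette_select(fitness), roulette_select(fitness))
            for _ in range(len(fitness) // 2)]


print(roulette_select([1, 1, 2], r=0.3))  # cumulative = [0.25, 0.5, 1.0]
print(select_pairs([0.40, 0.42, 0.38, 0.45]))
```

Using `bisect` on the cumulative vector keeps each draw logarithmic in the population size, which matters little for 100 to 300 individuals but keeps the code short.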

Crossover
The crossover function is carried out to merge qualified individuals and create new ones. Some of the common crossover functions have been reviewed in Umbarkar and Sheth (2015). As described in section 4.3, we keep a complementary list that holds the modularity value of each community. Here we use a method similar to Žalik and Žalik (2018) as our crossover function. In this method, we first sort the communities of both parents based on their modularity scores. Then, we move the corresponding genes of each of these communities to the child individual, in that order. If, for a specific community, part of its genes have already been assigned to the child's genes by another community, then it gets fragmented. The method iterates through all of the parents' communities until all of the genes of the child individual are determined.
Figure-3 describes the crossover function step by step. It should be noted that by transmitting each community to the child individual, its modularity value also gets transmitted unless the community is fragmented. Thus we do not need to recompute the fitness of every community. Instead, once the child individual is formed, we compute the modularity only for the fragmented communities by formula (4) and add the stored values of the rest of the communities.
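A rough sketch of this community-ordered crossover on the binary MST genes is given below. It makes several assumptions that the text does not spell out: each parent is passed as a pair of node labels and a per-community modularity dictionary, the genes of a community are taken to be all MST edges with at least one end node in it, and ties in modularity keep their input order. The function name `crossover` is hypothetical.

```python
def crossover(mst_edges, parents):
    """Build a child gene vector from the parents' communities, best first.

    parents: list of (labels, q) pairs, where labels[v] is the community of
    node v in that parent and q maps each community id to its modularity.
    """
    # Rank every community of every parent by descending modularity.
    ranked = sorted(
        ((q[c], p, c) for p, (labels, q) in enumerate(parents) for c in q),
        key=lambda item: -item[0],
    )
    child = [None] * len(mst_edges)  # None = gene not determined yet
    for _, p, c in ranked:
        labels = parents[p][0]
        for i, (u, v) in enumerate(mst_edges):
            # Copy the parent's gene for every undetermined edge touching c.
            if child[i] is None and c in (labels[u], labels[v]):
                child[i] = 0 if labels[u] == labels[v] else 1
    return child


# Path 0-1-2-3: parent A splits it as {0,1},{2,3}; parent B as {0},{1,2,3}.
edges = [(0, 1), (1, 2), (2, 3)]
parent_a = ([0, 0, 1, 1], {0: 0.3, 1: 0.1})
parent_b = ([0, 1, 1, 1], {0: 0.0, 1: 0.2})
print(crossover(edges, [parent_a, parent_b]))
```

In this example the community {1, 2, 3} of the second parent is fragmented, because the edge (1, 2) was already fixed as broken by the better-ranked community {0, 1} of the first parent; only such fragmented communities would need their modularity recomputed.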

Mutation
A mutation function is used in GAs to add to the diversity of the population space. Using this function, one or more genes of each individual get modified. Here we describe three different mutation functions: two constant functions and one new adaptive function.

One of the best-known mutation functions is the uniform mutation function. This function assigns each gene an equal chance of being selected for mutation. This probability is equal to 1/(n − 1) for the genes of the MST-based representation. To select the gene, we draw a random number between 1 and n − 1. As mentioned previously, one of the advantages of binary representations is the reduction of the value domain of each gene. While in the locus-based and solution-vector-based methods we have to make another random or educated selection to choose the value of the mutated gene (selecting another neighbor in the locus-based method and another community identifier in the solution-vector-based method), for a binary representation it is enough to perform a negation.

We have also examined a non-uniform mutation function in our experiments. As its name suggests, a non-uniform function does not assign equal probabilities to the genes. We have formed this distribution based on the weights of the corresponding edges of the network's MST. To generate the probabilities, we divide the weight of each edge by the sum of the weights of all edges of the MST. Again, we negate the value of the selected gene on mutation.

Finally, we introduce a new adaptive mutation function based on the depths of the edges corresponding to each gene. Each gene in the MST-based representation refers to an edge in the graph's MST. In some cases, it might be meaningful to keep the internal edges untouched while directing most of the mutations to the border edges, or vice versa. Our aim with this function is to examine the effectiveness of this idea. In this method, we assign the probability of each gene according to its distance from the broken edges of its community. Figure-4 helps to describe the function. For example, Figure-4.a shows a distribution that puts more weight on the border edges (inter-community edges of the MST) of the communities. In this figure, the points that have a value of 1 indicate the border edges; as we move into the depth of each community the value decreases, and then it starts to rise as we move closer to the next border edge. Using a distribution similar to Figure-4.a reduces the chance of a mutation on the internal edges. The reverse occurs with a distribution similar to Figure-4.c, where the internal edges are subjected to the mutations. Figure-4.b displays a situation in which each gene has the same chance of being mutated.

To form such a distribution, we first create a vector $\vec{w}$ of length n − 1 and assign the value 1 to the elements corresponding to broken edges. Then, for each broken edge, we navigate the edges connected to its end nodes in BFS order until we reach another broken edge. For each navigated edge, we change its corresponding $\vec{w}$ value as follows: where i denotes the index of the gene, d indicates its distance from the border edge, and α is a control parameter. Assigning 1 to α produces a distribution like Figure-4.c, while α = 0 results in a distribution like Figure-4.a, and assigning 0.5 to α results in a uniform distribution. In our method, α is a self-adaptive parameter. It is modified between 0 and 1 based on the modularity of the best individual of each generation. If the fitness value of the best individual improved in the last generation, α remains unchanged; otherwise it changes as follows: where δ is a user-defined constant that determines the length of each step, and q is the generation index. After performing the described process for all of the broken edges of the individual, we rescale the values of the $\vec{w}$ vector to lie between 0 and 1 and use it as the probability distribution of the mutation function.

We hypothesize that by assigning the mutation probability of each gene based on the distance of its corresponding edge from the broken edges of the community, this method can result in mutations that separate or join the communities in a better way. Furthermore, by adjusting the slope of the mutation probability function based on the improvement of the quality of the best individual of the pool, it can increase the chance of better mutations. Figure-5 describes the effect of δ on determining α.
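The two constant mutation functions described in this section are simple enough to sketch directly. The snippet below assumes a binary gene vector and a list of MST edge weights of the same length; the function names and the optional `rng` argument are illustrative. The adaptive function is not reproduced here, but any non-negative per-gene vector, such as a normalized $\vec{w}$, could be passed as `weights` in the same way.

```python
import random


def uniform_mutation(genes, rng=random):
    """Flip one gene chosen uniformly, i.e. with probability 1 / (n - 1) each."""
    child = list(genes)
    i = rng.randrange(len(child))
    child[i] = 1 - child[i]  # negation is enough for a binary gene
    return child


def weighted_mutation(genes, weights, rng=random):
    """Flip one gene chosen with probability weight_i / sum(weights)."""
    child = list(genes)
    i = rng.choices(range(len(child)), weights=weights, k=1)[0]
    child[i] = 1 - child[i]
    return child


genes = [0, 1, 0, 0, 1]
print(uniform_mutation(genes, random.Random(1)))
print(weighted_mutation(genes, [0.9, 0.1, 0.4, 0.7, 0.2], random.Random(1)))
```

Both functions return a new list rather than modifying the parent in place, so the parent can stay in the population while the mutated copy is evaluated.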

Results and discussion
Here we present the results of the proposed method on standard datasets, measured by NMI and modularity. Then we compare them with the results of some other algorithms and discuss the advantages of the proposed method.
Both of these measures are widely used in the community detection literature. We use both real-world and synthetic networks to compare the performance of the proposed method. Finally, we present the results of statistical tests to give a better insight into the algorithms' outcomes.

Measures
NMI and modularity are commonly used measures to qualify the results of community detection algorithms. These measures are defined as follows:

Normalized Mutual Information (NMI): For the datasets that have ground-truth information, a community detection algorithm is expected to produce outputs similar to the real-world observations. NMI measures how close the results are to the ground truth. Taking $\hat{C} = \{\hat{C}_1, \hat{C}_2, ..., \hat{C}_k\}$ as the ground-truth data, and $C = \{C_1, C_2, ..., C_q\}$ as the detected communities, NMI is calculated as follows:

$$NMI(C, \hat{C}) = \frac{2\, I(C, \hat{C})}{H(C) + H(\hat{C})}$$

where $I(C, \hat{C})$ and $H(C)$ are the mutual information of C and $\hat{C}$, and the entropy of C, respectively, and can be calculated as follows (n is the number of nodes):

$$I(C, \hat{C}) = \sum_{i=1}^{q} \sum_{j=1}^{k} \frac{|C_i \cap \hat{C}_j|}{n} \log \frac{n\,|C_i \cap \hat{C}_j|}{|C_i|\,|\hat{C}_j|}, \qquad H(C) = -\sum_{i=1}^{q} \frac{|C_i|}{n} \log \frac{|C_i|}{n}$$

Modularity: This measure, first proposed by Newman and Girvan (2004), is one of the most popular quality functions applied to measure the quality of a partition. It is defined as follows:

$$Q = \frac{1}{2m} \sum_{i,j} \left(A_{ij} - P_{ij}\right) \delta(C_i, C_j) \qquad (14)$$

where A is the adjacency matrix, m is the number of edges of the graph, P is a null model matrix whose elements are computed as $P_{ij} = \frac{k_i k_j}{2m}$ with $k_i$ the degree of node i, and $\delta(C_i, C_j)$ is a function that returns 1 whenever the communities $C_i$ and $C_j$ of nodes i and j are the same and 0 otherwise.
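As a small worked example of the NMI measure, the sketch below computes it from two label vectors. It assumes the commonly used normalization by the arithmetic mean of the two entropies, $2I/(H(C) + H(\hat{C}))$; other normalizations exist, so scores from other tools may differ. The function name `nmi` and the label-list input format are illustrative.

```python
import math
from collections import Counter


def nmi(found, truth):
    """NMI between two labelings, normalized as 2*I / (H(found) + H(truth))."""
    n = len(found)
    joint = Counter(zip(found, truth))          # |C_i ∩ Ĉ_j|
    size_found, size_truth = Counter(found), Counter(truth)

    mutual_info = sum(
        nij / n * math.log(n * nij / (size_found[a] * size_truth[b]))
        for (a, b), nij in joint.items()
    )

    def entropy(sizes):
        return -sum(c / n * math.log(c / n) for c in sizes.values())

    denom = entropy(size_found) + entropy(size_truth)
    # Both partitions consist of a single community: treat them as identical.
    return 1.0 if denom == 0 else 2 * mutual_info / denom


print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, different label names
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))  # independent partitions
```

The first call shows that NMI only depends on the grouping and not on the label names, which is why detected community ids never need to be matched to ground-truth ids.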

Datasets
Nine real-world and seven synthetic datasets are used to evaluate the performance of the proposed algorithm. Table-2 summarizes the characteristics of the real-world datasets, and Table-3 provides the parameters used to generate the synthetic networks. The details of these datasets are described in the following.

Zachary Karate Club
This dataset models the interactions between the members of a karate club at an American university. Each node in this network denotes one of the members of the club, and each edge represents the existence of a relationship between two members outside of the club. This network involves 34 nodes and 78 edges. The members of the club split into two groups after a dispute between the club manager and the coach Zachary (1977).

Dolphins interaction network
Bottlenose dolphins were studied in Doubtful Sound, New Zealand, for several years by Lusseau et al. (2003). The 62 nodes of this network represent the dolphins, and an edge between two of them indicates that they have been seen together more often than expected.

American college Football network
Mostly known as the football network, this dataset is the result of modeling the matches that took place among the teams of 12 different conferences in a season of American college football games. On average, each team is more likely to have a match with the teams of its own conference (7 intra-conference and 4 inter-conference matches). The ground truth of this dataset divides the 115 nodes of the network into 12 communities, each of which represents a conference Girvan and Newman (2002).

Political Books
Amazon.com recommends similar products to its customers while they purchase products. This dataset represents the recommendation network of political books on this website. Books are categorized into three groups based on their content: liberal, neutral, and conservative Krebs (2004).

Political Blogs
Adamic and Glance conducted a study on internet blogs in the few months preceding the 2004 presidential election of the United States of America and created this dataset. In this graph, each node represents a blog, and an edge exists between two nodes if they have a hyperlink to each other. Here the aim is to divide the 1,490 nodes into two communities denoting the political inclination of each blog Adamic and Glance (2005).

Power Network
This network shows the power grid of the western states of the United States of America. In this graph, each node represents a generator, a transformer, or a substation, and each edge represents a power supply line between two points. This dataset doesn't have ground-truth information Watts and Strogatz (1998).

Jazz musicians network
This dataset represents jazz musicians based on their collaboration. Each of the 2742 edges of this graph represents the collaboration of jazz musicians in the same band or having a common musician in the band GLEISER and DANON (2003).

Pretty-Good Privacy (PGP)
This dataset is the network of the users of the Pretty-Good Privacy program, a program that is used for the transmission of encrypted e-mails and files. While each of the 10,680 nodes depicts a user, these nodes are connected by 24,316 edges, each of which depicts a bidirectional signature between two users Boguñá et al. (2004).

Collaboration network of Arxiv High Energy Physics Theory (Ca-HepTh)
This dataset models the scientific collaboration among scientists. It covers the papers submitted to the High Energy Physics - Theory category. Each node depicts a scientist, and each edge represents the existence of a co-authored paper. This dataset consists of 9,877 nodes connected by 51,971 edges Leskovec et al. (2007).

Synthetic LFR networks
Lancichinetti, Fortunato, and Radicchi proposed a method, abbreviated as LFR, to generate synthetic networks for evaluating the performance of community detection algorithms. In their method, the node degree and community size distributions follow power laws, which creates networks with characteristics similar to those of real networks. The degree and community size exponents are denoted by γ and β, respectively. They have also defined a mixing parameter (µ) that controls the ratio of the edges between communities. We have generated 7 synthetic LFR networks with different sizes for our experiments. The details of the generated networks are given in Table-3 Lancichinetti et al. (2008). Figure-6 shows two of the synthetic networks generated by the LFR benchmark method for our comparisons.

Comparisons
In this section, we evaluate the effect of each parameter of the proposed algorithm on the performance of its different phases and discuss the advantages and disadvantages of the proposed method compared with other methods. We have conducted extensive performance evaluation experiments and compared the results of the proposed method with both state-of-the-art and classic community detection algorithms. We picked the algorithms from different categories, including modularity-optimization-based, similarity-optimization-based, and GA-based methods, to give a better sense of the performance of different algorithms. The compared methods include Louvain Blondel et al. (2008), Leiden Traag et al. (2019), Fastgreedy Clauset et al. (2004), Infomap Rosvall and Bergstrom (2008), LP Raghavan et al. (2007), LocalGame Hesamipour and Balafar (2019). We have used the NMI and modularity measures in our comparisons and conducted the experiments on nine real-world and seven synthetic networks. First, we explain the effects of different internal approaches and parameters. Then we direct the discussion towards the advantages and disadvantages of our method compared to the other algorithms.
As described in section 4.1, to construct the MSTs needed for the MST-based representation, we first use node similarity measures to assign a weight to each edge of the graph. Naturally, the resulting MSTs differ depending on the applied similarity measure, because each measure highlights some characteristics, and therefore some edges might get eliminated when using a specific similarity measure. To determine the best similarity measure, we've applied the measures presented in Table-1. Specifically, by considering the edges of node #5 and comparing them with the edges of the same node in different trees, one can understand the impact of each similarity measure. In all of the following experiments, we use Jaccard similarity to generate the MSTs because of both its simplicity and its higher accuracy.
As we explained in Section-4.2, we adopt a very simple initial population generation method that splits the MST into communities of $\sqrt{n}$ nodes. Yet this method can result in very effective initial populations that involve near-optimum individuals. To evaluate the initial population generation function, we have compared the quality of the populations produced by a completely random population generation function with those of the proposed method. Figure-9 compares the modularity of the best individual and the average modularity of the initial population for each approach. It can be seen that the proposed method performs considerably better than the random method. In addition, generation 0 in Figure-11 depicts the quality of the best individuals of four different methods: 1) the proposed algorithm, 2) CCGA, 3) CACD, and 4) GA-Net. While GA-Net's initial population is random, CACD and CCGA use their own initial population generation functions. As can be seen in Figure-11, our initial population generation function produces near-optimum individuals. Splitting a graph or data into clusters of $\sqrt{n}$ nodes/points has already been used in various clustering and community detection algorithms, but as far as we know, no other community detection method has used it to create the initial population of a GA.
The next influencing factor is the mutation function. To examine the effectiveness of the proposed mutation function, we've conducted several experiments on the jazz dataset and compared the results based on the convergence time and the modularity score. The obtained results are shown in Figure-10.a. To generate the outcomes, we ran 100 experiments, setting the stopping criterion as exceeding a modularity of 0.435 or observing no improvement in the modularity of the best individual for 50 consecutive generations. At first glance, we can see that the uncertainty in the number of generations needed to reach a solution is the highest for the weight-based mutation function. This function has a higher mean and longer whiskers than both of the other functions. It also has more upper outliers. Hence we can conclude that this method performs worse than the others. Considering its higher mean value compared with the other methods, the main reason for its poor performance is that, because of its non-uniform probability distribution, it has a greater tendency to get stuck in local optima, which might result in a higher convergence time. On the other hand, the mean values (110.5 and 111) and the upper outliers of the sine-based and uniform-based functions are similar. But the box of the uniform-based function is more compact than the sine-based function's box, and the sine-based function's box is slightly shifted lower. Therefore, we can say that the deviations of the uniform function are smaller than those of the sine-based function, and its results are closer to the mean. Yet, it can be observed that the upper whisker of the sine-based method is shorter, and its lower whisker is longer, than the corresponding whiskers of the uniform-based method, which shows that the sine-based function can converge faster than the uniform-based method. Therefore, we conclude from Figure-10 that the sine-based method performs almost 25% faster than the uniform-based method.

When comparing the variations of the modularity scores of the mutation functions, we observe that all of them have a similar mean, but again the weight-based method has higher variations compared to the others. Again, this might result from the non-uniform probability distribution, which leads to getting stuck in local optima. Therefore, this method performs poorly on both convergence time and modularity score. Yet, comparing the sine-based and uniform-based functions shows that both of them have similar upper whiskers, while the lower whisker of the sine-based distribution is considerably shorter than that of the uniform distribution. This means that the likelihood of observing lower modularity is almost 25% lower when we use the sine-based method instead of the uniform-based one. This supports our hypothesis in Section-4.5: the sine-based mutation function is able to reduce the convergence time by changing the mutation probability distribution of the genes with a smoothly changing adaptive function. Its better outcomes come from the fact that the sine-based method directs the mutation probabilities to the border and the central edges of each community, therefore joining the smaller communities and breaking the larger ones. Box plots of the modularity variations are shown in Figure-10.b.
Consequently, we can conclude that the sine-based method can produce better and faster outcomes. It should be considered that this advantage does not come without a cost. For the sine-based method, we have to perform a set of calculations, such as assigning a depth value to each edge. Yet, considering that the number of edges in an MST is of the order of n, it is possible to compute these values in linear time. Therefore, using the sine-based function can be a cost-effective choice for smaller datasets.
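To illustrate why the depth values can be obtained in linear time, the sketch below runs a single multi-source BFS over the MST, starting from the end nodes of all broken edges at once. The exact depth convention is an assumption made here: broken edges get depth 0, and an unbroken edge gets one plus the smaller BFS distance of its two end nodes. The function name `edge_depths` and the 0-based node ids are illustrative.

```python
from collections import deque


def edge_depths(n_nodes, mst_edges, genes):
    """Distance of every MST edge from the nearest broken edge (gene == 1).

    Runs in O(n) time: each node and each tree edge is handled a constant
    number of times. Returns None for every edge if no edge is broken.
    """
    adj = [[] for _ in range(n_nodes)]
    for u, v in mst_edges:
        adj[u].append(v)
        adj[v].append(u)

    dist = [None] * n_nodes
    queue = deque()
    for (u, v), gene in zip(mst_edges, genes):
        if gene == 1:  # end nodes of broken edges are the BFS sources
            for x in (u, v):
                if dist[x] is None:
                    dist[x] = 0
                    queue.append(x)

    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if dist[y] is None:
                dist[y] = dist[x] + 1
                queue.append(y)

    depths = []
    for (u, v), gene in zip(mst_edges, genes):
        if gene == 1:
            depths.append(0)
        elif dist[u] is None:
            depths.append(None)  # no broken edge anywhere in the tree
        else:
            depths.append(1 + min(dist[u], dist[v]))
    return depths


# Path 0-1-2-3-4-5 with the first and last edges broken.
path = [(i, i + 1) for i in range(5)]
print(edge_depths(6, path, [1, 0, 0, 0, 1]))
```

On the example path, the depth rises toward the middle of the community and falls again near the next broken edge, which is the profile a depth-based mutation distribution would be built on.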
Table-4 and Table-5 provide the results of the comparisons of the proposed method with some of the other methods, based on the NMI and modularity measures. We have used the Jaccard similarity to assign weights to the edges and the sine-based mutation function to perform mutations in our experiments. For running all of the GA-based methods, we set the population size to 100 for small networks and 300 for larger ones. We ran all of the GA-based algorithms for at most 300 generations several times and recorded the best outcome. In our method, we set δ = 0.1 for our experiments. It can be seen that our method outperforms all of the other GA-based methods in almost all of the experiments. The proposed method's merits become more explicit as the network size increases. This is especially obvious on the LFR5 and LFR6 datasets, which have a very high number of inter-community edges. The reason for the other methods' poor performance on these datasets is that the locus-based methods keep testing different neighbors, and therefore they need both larger populations and more generations to reach better results on such datasets.

individuals with at most 0.43 modularity value. On the other hand, while our method converges at the 20th generation with 0.5234, CCAD can only beat our method at the 148th generation, reaching convergence at the 189th generation with a modularity of 0.5266, and the next two algorithms never find better partitions even at the 300th generation. Therefore, the comparisons show that the proposed algorithm can produce better results compared with other GAs. It is worth mentioning that the GA-Net method's high fluctuations are the result of its fitness function. As we explained in Section-2, this method doesn't use modularity as its fitness function (CACD and CCGA are both modularity-optimization-based methods).
Comparing our method with other state-of-the-art methods shows that the proposed method can produce competitive results on both the NMI and modularity measures. On the real-world datasets, our method results in an average modularity of 0.607, while the best modularity is produced by the Louvain algorithm, which reaches 0.617. But, while our method's average NMI value is 0.634, Louvain results in 0.618. The best NMI value belongs to CACD, with 0.826 NMI and 0.56 modularity. On the synthetic datasets, our method reaches the highest modularity of 0.62 while lagging behind the best NMI value of 0.97 with 0.92. Therefore, our method can be considered a successful GA-based community detection algorithm.
In synthetic networks, µ defines the ratio of the inter-community edges. In Figure-10, we compare the effect of the µ parameter on the quality of the outcomes of our method. As can be seen from the figure, an increase in µ decreases modularity for almost all of the methods. This is because having more inter-community edges is the exact opposite of the main definition of a community. LocalGame, a similarity-based method, falls immediately to 0 on both measures as µ goes beyond 0.5 because of its high dependency on the distance between the communities. CCGA and CACD respond almost identically to the µ parameter on modularity, but CACD results in better NMI values. Yet, even though CACD's NMI value stays the highest at µ = 0.7, its NMI and modularity values start to decrease quickly at µ = 0.1. The proposed method's NMI and modularity remain the highest while µ is below 0.5; after passing 0.5 they start to dwindle, yet the method stays the second-best for almost all of the other µ values. Comparing the results shows that the proposed algorithm can produce acceptable outcomes on both measures.
Extensive comparisons on both real-world and synthetic datasets show that the proposed method can produce competitive results with state-of-the-art algorithms. In the next section, we perform some statistical tests on these results.

Statistical tests
After presenting the results of the comparisons in 5.3, here we show the results of some statistical tests on them. We performed multiple comparisons on the results of the algorithms. Multiple comparisons can be performed with different methods Calvo and Santafé (2016). Here, we have used the Friedman test, along with the post-hoc Nemenyi test. We conducted the tests separately on the NMI and modularity results for both real-world and synthetic datasets.
The Friedman test is a non-parametric statistical test that determines whether any significant difference exists among different treatments (algorithms in our case), that is, whether at least two of the treatments are significantly different. We have used the Friedman test with the Iman & Davenport extension, which is known to be an omnibus test.

Here, the Nemenyi test hasn't detected any algorithm that outperforms the others. The highest difference is between EdMot and GA-Net, with 8.083, while our method's distance from GA-Net is 7.83.
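For readers who want to reproduce this kind of analysis, the following sketch computes the average ranks, the Friedman statistic, and its Iman and Davenport correction in plain Python. It uses the standard textbook formulas without a tie correction, and the score layout (one row per dataset, one column per algorithm, higher is better) and function names are assumptions; it is not the tooling used for the reported tests.

```python
def average_ranks(scores):
    """Average rank of each algorithm; scores[d][a] is algorithm a on dataset d.

    Rank 1 is the best (highest) score; tied scores share the mean rank.
    """
    k = len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda a: -row[a])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1  # mean of ranks pos+1 .. end+1
            for a in order[pos:end + 1]:
                totals[a] += shared
            pos = end + 1
    return [t / len(scores) for t in totals]


def friedman_iman_davenport(scores):
    """Return (average ranks, Friedman chi-square, Iman-Davenport F statistic)."""
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4)
    denom = n * (k - 1) - chi2
    # denom is 0 when every dataset ranks the algorithms identically.
    f_stat = float("inf") if denom <= 0 else (n - 1) * chi2 / denom
    return ranks, chi2, f_stat


# Hypothetical modularity scores: 4 datasets (rows) x 3 algorithms (columns).
scores = [[0.9, 0.8, 0.7], [0.9, 0.8, 0.7], [0.9, 0.8, 0.7], [0.8, 0.9, 0.7]]
print(friedman_iman_davenport(scores))
```

The F statistic is compared against the F distribution with k − 1 and (k − 1)(N − 1) degrees of freedom; a post-hoc test such as Nemenyi then compares the differences between the average ranks returned here.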
Analyzing the results and the statistical tests shows that our method is comparable with the state-of-the-art methods, and that it is capable of producing significantly better outcomes (considering the average NMI and modularity values in Table-4 and Table-5) compared with some of the other methods, especially some genetic-based ones. Some of the best-known methods that have been designed to maximize modularity, such as Louvain, Leiden, and EdMot, haven't been able to show significantly different outcomes compared with the presented methods. In the next section, we will conclude the paper.

Conclusions
In this paper, we proposed a new method to detect communities in complex networks using an adaptive evolutionary approach. We have introduced a new encoding scheme, using node similarity measures and MSTs, which avoids deficiencies such as disconnected communities and meaningless mutations that were likely to happen in locus-based and solution-vector-based representations. We have also introduced a new method to generate the initial population. This method can considerably improve the convergence time and the quality of the initial population. Furthermore, an adaptive sine-based mutation function was introduced, which changes the mutation probability of each gene based on the depth of its corresponding edge in the MST. The new mutation function can help to reach better and faster results. Several experiments were conducted on the real-world and synthetic networks, and results were compared

Figure 1: Comparison of different representation schemes and their corresponding graph; a) An individual created by the locus-based method: as can be seen, using this method might cause meaningless mutations; b) An individual created by the solution-vector method: using this method might result in separated communities; c) MST-based representation: this method reduces the domain of each element to binary and avoids meaningless mutations and separated communities.
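To make the MST-based representation more concrete, the sketch below decodes a binary individual into a partition. It assumes that the individual holds one gene per MST edge and that a gene value of 1 keeps the edge while 0 cuts it; this convention, the function name, and the integer node ids 0..n−1 are assumptions made for illustration only. Because every community is then a connected piece of the MST, no community can be disconnected.

```python
from typing import List, Sequence, Tuple


def decode_individual(
    n_nodes: int,
    mst_edges: Sequence[Tuple[int, int]],
    genes: Sequence[int],
) -> List[int]:
    """Return a community label per node for a binary MST-edge individual.

    Assumed convention: genes[i] == 1 keeps mst_edges[i], 0 removes it.
    Communities are the connected components of the kept edges.
    """
    if len(mst_edges) != len(genes):
        raise ValueError("one gene is required per MST edge")

    parent = list(range(n_nodes))  # union-find forest

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (u, v), gene in zip(mst_edges, genes):
        if gene == 1:
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                parent[root_u] = root_v

    # Relabel components 0, 1, 2, ... in order of first appearance.
    labels = {}
    result = []
    for node in range(n_nodes):
        root = find(node)
        labels.setdefault(root, len(labels))
        result.append(labels[root])
    return result
```

For a path-shaped MST over five nodes, cutting only the second edge yields two communities, `{0, 1}` and `{2, 3, 4}`; flipping any single gene always merges or splits connected groups, which is why every mutation on this encoding remains meaningful.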

Algorithm 1: The pseudo-code of initial population generation explained in 4.2
Input: MST of the network
Output: returns a vector representing an individual of initial population ($\vec{g}$)
1: procedure INITGEN(MST)
2: zero vector with the length of n, indicating if a node has been visited or not
3: $\vec{g}$ ← a vector that represents genes of an individual (n − 1 genes for n − 1 edges in the MST)
4:

Figure 3: The crossover procedure to create a new individual; a) The graph, individual, and community vector of the first parent; b) The graph, individual, and community vector of the second parent; c) New individual: to create the new individual, we first sort the communities of both parents in descending order of their modularities and then select the communities from best to worst. The $C_{2,5}$, $C_{2,1}$, and $C_{1,3}$ communities are added first. Then $C_{1,4}$ is selected, but because nodes #10, #11, and #12 are already assigned to other communities, a new singleton community is formed for node #6.
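The selection step described in this caption can be sketched as follows. The sketch works directly on communities as sets of nodes and assumes that each community is scored by its own term of the Newman modularity, l_c/m − (d_c/2m)^2; both the scoring choice and the function names are assumptions for illustration, and the sketch does not re-encode the child as an MST-based gene vector.

```python
from typing import Callable, Dict, Iterable, List, Set


def community_modularity(adj: Dict[int, Set[int]], community: Set[int], m: int) -> float:
    """Modularity term of one community: l_c / m - (d_c / (2 m))^2.

    adj maps each node to its neighbour set; m is the number of edges.
    """
    internal = sum(1 for u in community for v in adj[u] if v in community) / 2
    degree = sum(len(adj[u]) for u in community)
    return internal / m - (degree / (2.0 * m)) ** 2


def crossover(
    parent_a: Iterable[Set[int]],
    parent_b: Iterable[Set[int]],
    score: Callable[[Set[int]], float],
) -> List[Set[int]]:
    """Build a child partition from the best-scoring communities of both parents."""
    candidates = sorted(list(parent_a) + list(parent_b), key=score, reverse=True)
    assigned: Set[int] = set()
    child: List[Set[int]] = []
    for community in candidates:
        # Nodes already placed by a better community are skipped; whatever
        # remains of this community forms a (possibly singleton) community.
        remaining = set(community) - assigned
        if remaining:
            child.append(remaining)
            assigned |= remaining
    return child
```

When a selected community overlaps with earlier choices, only its unassigned nodes are kept, which reproduces the singleton behaviour described for node #6 in the caption.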

Figure 4: The effect of α on the shape of the probability distribution: a) with α = 0, the mutation probability is directed toward border edges; b) setting α = 0.5 results in a uniform probability distribution; c) with α = 1, the mutation probability of inner edges is increased.

Figure 6: Representations of the ground truth partitions of the LFR-5 and LFR-6 networks.
on the karate dataset. Again, we have used different measures from both recent and classic literature and from various categories. The resulting MSTs and the final partitions are shown in Figure-7. Bold edges represent the MST edges. Figure-8 compares the results based on the NMI and the modularity. As can be seen from Figure-7, the cosine and Jaccard similarities perform similarly to each other and better than the other similarity measures. It can also be noticed in Figure-7 that the generated results are directly related to the corresponding MSTs.

Figure 7: The impact of using different similarity measures on the resulting MST and the final partition of the algorithm on the Zachary karate club dataset.

Figure-11 shows another significant analysis of the different GA-based community detection algorithms. This figure shows the modularity of the best individual of each GA-based method on the Polbooks dataset over several generations. For this dataset, we set the generation size to 100 and let the methods run for 300 generations. Generation number 0 depicts the performance of initial population generation functions. While the best individual of our initial population

Figure 9: Comparison of the performance of the proposed initial population generation function (P) with random initial population generation function (Rand).
Figure 12: Comparison of the results of the proposed method with some of the recent community detection algorithms based on their response to different µ values of LFR networks.
Figure-13 to Figure-16 show the critical distance diagrams of the tests. The critical distance value for the Nemenyi test on the NMI values of the real-world datasets is 9.273, and the only two algorithms with a significant difference are CACD and WMW, with a distance of 9.8 (Figure-13). This outcome was predictable considering that CACD had an average NMI value of 0.826 while WMW achieved an average of 0.526. On the other hand, Figure-14 shows the critical distance diagram of the algorithms based on their Modularity results. Here, we can see that some algorithms have outperformed the others significantly. Our method differs significantly from GA-Net and WMW, with differences of 7.11 and 7.44, respectively, while the critical distance value is 6.76. A significant difference has also been detected between EdMot and GA-Net, Leiden and GA-Net, Louvain and GA-Net, Leiden and WATSET, EdMot and WMW, Leiden and WMW, Louvain and WMW, and Leiden and LocalGame. The critical distance diagram of the algorithms on their NMI values on the synthetic datasets is shown in Figure-15. Here, the only pair of algorithms that shows a significant distance is LocalGame and GA-Net, with a distance of 8.4167, while the critical distance is 8.3912. Figure-16 shows the critical distance diagram of the algorithms on the Modularities of the synthetic datasets.

Figure 13: Critical distance diagram for the Nemenyi test on the NMI outcomes of the algorithms on real-world datasets. The CD value is 9.273.

Figure 15: Critical distance diagram for the Nemenyi test on the NMI outcomes of the algorithms on synthetic datasets. The CD value is 8.39.
Later, Traag et al. proposed an improved version of Louvain called Leiden Traag et al.

Table 1: Some of the widely used node similarity measures.

Table 2: Summary of the real-world datasets used to compare the performance of the proposed method; N indicates the number of nodes of the graph, m denotes the number of edges of the graph, and µ is the average degree of each node.

Table 3: Parameters for generating benchmark LFR networks; N indicates the network size, $k_n$ is the average node degree, $k_{max}$ is the maximum node degree, µ is the value of the mixing parameter, and $C_{min}$ and $C_{max}$ denote the minimum and maximum community sizes, respectively.

Table 4: Comparison of the results of the proposed method with other methods on the real-world datasets, based on the NMI and modularity (Q) measures.
Comparison of the results of the proposed method for different similarity measures on the karate network.

Table 6: The results of the Friedman test on the NMI and Modularity outcomes of the algorithms. All of the tests show the existence of a significant difference between at least two algorithms. Columns: Test, real-world (NMI), real-world (Q), synthetic (NMI), synthetic (Q).

Table-6 shows the results of the Friedman test on the data presented in 5.3 with α = 0.05. Since all of the p-values are smaller than 0.05, we can say that the Friedman test has confirmed a significant difference between at least two algorithms in all cases. Despite being a powerful statistical test, the Friedman test cannot show exactly which treatments differ from each other. For this reason, we have conducted two post-hoc tests. One of the most famous post-hoc tests is the Nemenyi test. The Nemenyi test computes a distance value for each pair of treatments and a critical distance (CD) value. Treatments whose distance exceeds the critical distance value are considered significantly different from each other. We have performed the test on the NMI and Modularity outcomes of the algorithms on all datasets.
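The pairwise comparison described above can be sketched as follows, assuming the common formulation in which the distance between two treatments is the difference of their average ranks and CD = q_α · sqrt(k(k + 1) / (6N)) for k algorithms and N datasets. The function names are hypothetical, the critical value q_α must be supplied by the caller from a Studentized-range table, and the exact scaling used for the distances reported in this paper may differ from this sketch.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def nemenyi_cd(k: int, n_datasets: int, q_alpha: float) -> float:
    """Critical distance for k algorithms compared over n_datasets datasets."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n_datasets))


def significant_pairs(
    avg_ranks: Dict[str, float],
    n_datasets: int,
    q_alpha: float,
) -> Tuple[float, List[Tuple[str, str]]]:
    """Return the CD and the algorithm pairs whose rank distance exceeds it."""
    cd = nemenyi_cd(len(avg_ranks), n_datasets, q_alpha)
    pairs = [
        (a, b)
        for a, b in combinations(avg_ranks, 2)
        if abs(avg_ranks[a] - avg_ranks[b]) > cd
    ]
    return cd, pairs


# Hypothetical average ranks of three algorithms over ten datasets;
# 2.343 is the tabulated q value for k = 3 at alpha = 0.05.
cd, pairs = significant_pairs({"A": 1.1, "B": 2.0, "C": 2.9}, 10, 2.343)
```

In this toy example only the pair (A, C) exceeds the critical distance, which mirrors how a critical distance diagram connects the algorithms that are not significantly different.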