An Extreme Learning Machine-Based Community Detection Algorithm in Complex Networks

Community structure, one of the most popular properties in complex networks, has long been a cornerstone in the advance of various scientific branches. Over the past few years, a number of tools have been used in the development of community detection algorithms. In this paper, by means of fusing unsupervised extreme learning machines and the k-means clustering techniques, we propose a novel community detection method that surpasses traditional k-means approaches in terms of precision and stability while adding very few extra computational costs. Furthermore, results of extensive experiments undertaken on computer-generated networks and real-world datasets illustrate acceptable performances of the introduced algorithm in comparison with other typical community detection algorithms.


Introduction
As one of the most popular research fields over the past decades, complex networks have stimulated scientific advances in various fields such as biology [1], social networks [2], epidemiology [3], computer science [4], and transportation [5]. Numerous articles have explored different types of properties in complex networks. Among these, the study of community structure, in which the vertices of a given network are inherently segregated into groups whose internal connections are relatively denser than those to the outside, has been one of the most popular [6]. Finding the divisions of nodes in networks, which is called community detection or network clustering, is a hot spot for investigators because it is a good means to uncover the underlying semantic structure, mechanisms, and dynamics of certain networks [7]. Using such extracted information, internet service providers (ISPs) could set up a dedicated mirror server for intense web visits from the same geographic region to improve their customers' internet surfing experiences [8], and online retailers could provide more efficient recommendations to customers, creating a friendlier purchase environment [9].
To address the community detection problem, researchers have developed numerous algorithms. Social network scientists used to solve this problem by traditional methods such as graph partitioning, hierarchical clustering, partitional clustering, and spectral clustering [7,10]. Girvan and Newman proposed the first divisive algorithm, named after them, which is a historical milestone because it introduced more physicists and computer scientists to this field [11]. Divisive algorithms use the concept of betweenness as a criterion to judge how often an edge participates in a graph process and remove connections one by one to determine the most significant community structure [10]. A byproduct of Girvan and Newman's algorithm, called modularity Q, a quality function originally proposed as a criterion to decide when to stop the calculation, is another landmark that supports clustering methods focusing on the modularity optimization problem [12]. Although it has been proven impossible to list all the feasible divisions to determine the best strategy in deterministic polynomial time (the problem is NP-hard) [13], many alternative approximate optimization techniques, including greedy algorithms [14], random walks [15], the fast unfolding algorithm [16], an information-theoretic framework [17], belief propagation [18], extremal optimization [19], simulated annealing [20], and genetic algorithms [21], have been deployed to solve the problem. Along with these optimization tools, many other instruments have been brought into this field. For example, spectral algorithms explore the eigenvalues of the Laplacian matrices of graphs using traditional clustering techniques [22]. Similar to spectral clustering, similarity matrix factorization and blocking can also be applied [23]. Label propagation, that is, attaching labels to each node based on neighbor information, is known as a fast and effective clustering method [24].
k-means clustering has long been one of the best off-the-shelf tools, exhibiting relatively high precision and low computational complexity [25]. However, the performance of k-means relies heavily on the selection of the initial centroids; hence, many variants have been proposed to overcome this drawback. k-means++ chooses distinct initial seeds far from each other in a probabilistic manner, which leads to more stable clustering results but involves increased complexity [26]. By ranking nodes in the same manner as Google's cofounders did [27] and picking the center nodes from the highest-ranking ones, k-rank achieves small fluctuation in the community detection output, although it requires additional running time [28]. Another defect of k-means, explained by Ng et al. [29], is that it is only capable of finding clusters corresponding to convex regions. To address this problem, one could map the original data into a more suitable feature space. For example, Li et al. made use of principal component analysis (PCA) to implement k-means in a lower-dimensional space for community detection tasks [30].
The prevalence of extreme learning machines (ELMs), originally proposed by Huang et al., should be largely credited to the simplicity of their implementation [31]. It has been demonstrated that, given random input weights and biases of the hidden layer, a single-layer feedforward network (SLFN) can approximate any continuous function simply by tuning the output weights [32]. As a result, the abstracted task in ELM is equivalent to a regularized least squares problem, which can be solved in closed form using the Moore-Penrose generalized inverse [33]. Recently, semisupervised and unsupervised ELMs have been developed based on the manifold regularization framework [34]. Regarding the clustering task, the unsupervised ELM can be interpreted as an embedding process that maps the input data into a low-dimensional space.
In this paper, we propose an extreme learning machine community detection (ELM-CD) algorithm based on the combination of k-means and unsupervised ELM to fulfill the community detection task. Unsupervised ELM, inheriting the efficiency of ELM, is used as a mapping mechanism that transforms the adjacency matrix into a low-dimensional space, where k-means can be employed to label the groups. To avoid additional computational load, we prefer the original lightweight k-means to its enhanced variants. Extensive comparison trials on both artificial and real-world networks indicate that ELM-CD outperforms traditional k-means under different precision criteria. Meanwhile, the introduced algorithm has remarkably low complexity, approaching that of k-means, and outperforms all other competitors evaluated.
The remainder of this article is organized as follows. Section 2 provides details of our algorithm. In Section 3, evaluations and comparisons are made on artificial and real-world networks. Finally, we conclude our work in Section 4.

Model and Algorithm
2.1. Preliminaries. We focus on an undirected network $G(V, E)$, where $V$ represents the set of $N$ vertices and $E$ represents the set of $M$ edges. Neglecting self-loops, that is, edges that start and end at the same vertex, connections among nodes are expressed as a symmetric adjacency matrix $A \in \mathbb{R}^{N \times N}$.
According to $A$, the Laplacian matrix $L \in \mathbb{R}^{N \times N}$ is defined as
$$L = D - A, \tag{1}$$
where $D$ is the diagonal matrix with entries
$$D_{ii} = d_i = \sum_{j=1}^{N} A_{ij}, \tag{2}$$
and $d_i$ is the degree of vertex $i$, $i = 1, \ldots, N$, and $j = 1, \ldots, N$.
A community detection algorithm segregates $V$ into mutually exclusive groups by attaching a label to each node to indicate which group it belongs to. Because $A$ is symmetric, each row (or column) of $A$ can be considered an input sample, denoted by $a_i \in \mathbb{R}^{N}$, $i = 1, \ldots, N$. Instead of directly assigning each node a community label, ELM-CD first embeds $A$ into a smaller matrix $E \in \mathbb{R}^{N \times n}$ in the $n$-dimensional feature space, where k-means clustering then outputs a vector $t \in \mathbb{N}_{+}^{N}$ whose entries are the community labels of the nodes.
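As a small illustration of these preliminaries, the following sketch builds the degree vector and the Laplacian $L = D - A$ for a toy graph. The function name `laplacian` and the three-node path graph are hypothetical choices made for this example only, and plain Python lists are used to keep the sketch dependency-free.

```python
def laplacian(adjacency):
    """Return (degrees, L) for a symmetric 0/1 adjacency matrix given as nested lists.

    L = D - A, where D is the diagonal matrix of vertex degrees.
    """
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    lap = [
        [(degrees[i] if i == j else 0) - adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]
    return degrees, lap


# Toy example (an assumption for illustration): the path graph 0 - 1 - 2.
A = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
degrees, L = laplacian(A)
```

Each row of `L` sums to zero, which is a quick sanity check on any Laplacian built this way.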

Embedding Process.
Following the universal semisupervised and unsupervised ELM framework [34], given $N_h$ neurons in the hidden layer, we define $E_i \in \mathbb{R}^{n}$ as the output vector from ELM with respect to input vector $a_i$:
$$E_i = \sum_{j=1}^{N_h} \beta_j \, g(w_j \cdot a_i + b_j), \tag{3}$$
in which $w_j \in \mathbb{R}^{N}$ and $b_j \in \mathbb{R}$ represent the input weights and bias of the $j$th neuron in the hidden layer, respectively; $\beta_j \in \mathbb{R}^{n}$ is the output weight from the $j$th neuron to the $n$-dimensional output; and $g(\cdot)$ is the sigmoid function
$$g(x) = \frac{1}{1 + e^{-x}}. \tag{4}$$
Assume that the input dataset $A$ can be divided into an unlabeled set $\{a_i\}$, $i = 1, \ldots, u$, where $u$ is the number of unlabeled nodes, and a labeled set $\{(a_i, y_i)\}$, $i = 1, \ldots, l$, where $y_i \in \mathbb{R}^{n}$ is the corresponding community label and $l$ is the number of labeled nodes, so that $u + l = N$. The calculation of ELM falls into two parts. In the first part, we randomly draw the input weights and biases from a uniform distribution. In the second, let (3) be expressed as the matrix product
$$E = H\beta, \tag{5}$$
where
$$H = \begin{bmatrix} g(w_1 \cdot a_1 + b_1) & \cdots & g(w_{N_h} \cdot a_1 + b_{N_h}) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot a_N + b_1) & \cdots & g(w_{N_h} \cdot a_N + b_{N_h}) \end{bmatrix} \in \mathbb{R}^{N \times N_h}, \tag{6}$$
$$\beta = \begin{bmatrix} \beta_1^{T} \\ \vdots \\ \beta_{N_h}^{T} \end{bmatrix} \in \mathbb{R}^{N_h \times n}. \tag{7}$$
We define each row of $H$ as $h_i \in \mathbb{R}^{1 \times N_h}$, indicating the hidden-layer output with respect to $a_i$. The target of the second stage can be interpreted as a manifold-regularized optimization problem:
$$\min_{\beta} \; \frac{1}{2} \sum_{i=1}^{l} C_i \lVert y_i - h_i \beta \rVert^{2} + \frac{1}{2} \lVert \beta \rVert^{2} + \frac{\lambda}{2} \operatorname{Tr}\left(E^{T} L E\right), \tag{8}$$
in which $\lVert \cdot \rVert$ is the Euclidean norm, $\operatorname{Tr}(\cdot)$ calculates the trace of a matrix, and $\lambda$ is a trade-off parameter. The first term of (8) is the loss function over all labeled nodes, with individual penalty coefficients $C_i$. The second term is the classical regularization term against overfitting, which constrains the output weights to be as small as possible. The third term is the manifold regularization, where the unlabeled data comes into play. Concretely, the manifold regularization framework assumes that if two points are close to each other on the manifold, then they should also result in similar predicted outcomes [35]. Consider the adjacency matrix as a measure of distance: two nodes are close to (connected to) each other if $A_{ij} = 1$; otherwise, they are far away (disconnected) from each other. The approximation of the manifold regularization is provided as
$$\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} A_{ij} \lVert E_i - E_j \rVert^{2} = \operatorname{Tr}\left(E^{T} L E\right). \tag{9}$$
This regularization penalizes large differences in the predictions of two connected nodes.
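The link between the pairwise penalty on connected nodes and the trace form of the manifold term can be verified in a few lines. The derivation below is a sketch that assumes only a symmetric adjacency matrix $A$, degrees $d_i = \sum_j A_{ij}$, the Laplacian $L = D - A$, and $E_i$ denoting the $i$th row of $E$; the factor $\tfrac{1}{2}$ is the usual convention for this identity.

```latex
\begin{aligned}
\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} A_{ij}\,\lVert E_i - E_j\rVert^{2}
  &= \frac{1}{2}\sum_{i,j} A_{ij}\left(\lVert E_i\rVert^{2} + \lVert E_j\rVert^{2} - 2\,E_i E_j^{T}\right) \\
  &= \sum_{i=1}^{N} d_i\,\lVert E_i\rVert^{2} - \sum_{i,j} A_{ij}\,E_i E_j^{T}
     \qquad \text{(symmetry of } A\text{)} \\
  &= \operatorname{Tr}\!\left(E^{T} D E\right) - \operatorname{Tr}\!\left(E^{T} A E\right) \\
  &= \operatorname{Tr}\!\left(E^{T} L E\right).
\end{aligned}
```

The left-hand side is small exactly when connected nodes ($A_{ij} = 1$) receive nearby embeddings, which is why minimizing the trace term encourages neighbors to share a community.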
When there are no labeled nodes in the input dataset, substituting (5) into (8) reformulates the optimization problem as
$$\min_{\beta} \; \lVert \beta \rVert^{2} + \lambda \operatorname{Tr}\left(\beta^{T} H^{T} L H \beta\right) \quad \text{s.t.} \quad (H\beta)^{T} H\beta = I_n. \tag{10}$$
The additional constraint follows the suggestion of Belkin and Niyogi to avoid a degenerate solution [36], and $I_n$ is the $n$-dimensional identity matrix.
Let $\gamma$ and $v$ denote an eigenvalue and the corresponding eigenvector, respectively. When $N \geq N_h$ and $H$ has full column rank, it has been proven in [34] that solving (10) is equivalent to selecting the $n$ generalized eigenvectors corresponding to the $n$ smallest eigenvalues of the problem
$$\left(I_{N_h} + \lambda H^{T} L H\right) v = \gamma H^{T} H v. \tag{11}$$
To assemble the matrix $\beta$, we discard the first eigenvector because it corresponds to eigenvalue 0 and contributes little to the embedding process. Thus, given the $n + 1$ smallest eigenvalues $\gamma_1, \gamma_2, \ldots, \gamma_{n+1}$ sorted in ascending order and their corresponding eigenvectors $v_1, v_2, \ldots, v_{n+1}$, the output weights $\beta$ are
$$\beta = \left[\tilde{v}_2, \tilde{v}_3, \ldots, \tilde{v}_{n+1}\right], \qquad \tilde{v}_i = \frac{v_i}{\lVert H v_i \rVert}. \tag{12}$$
If $N < N_h$, meaning that the dimension of $A$ is smaller than the number of neurons in the hidden layer, (11) is underdetermined and has the following alternative formulation:
$$\left(I_N + \lambda L H H^{T}\right) u = \gamma H H^{T} u. \tag{13}$$
In turn, the solution is
$$\beta = H^{T} \left[\tilde{u}_2, \tilde{u}_3, \ldots, \tilde{u}_{n+1}\right], \tag{14}$$
where the normalized eigenvectors are given by
$$\tilde{u}_i = \frac{u_i}{\lVert H H^{T} u_i \rVert}, \quad i = 2, \ldots, n + 1.$$
Finally, we substitute $\beta$ into (5) to obtain the embedding matrix $E$, which is fed into a k-means clustering algorithm to determine the community labels $t$.
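The embedding step for the case $N \geq N_h$ can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions rather than the authors' MATLAB implementation: the function name `elm_embedding`, the toy graph of two four-node cliques, the small hidden layer, and the random seed are all hypothetical. The generalized eigenproblem is reduced to a standard symmetric one through a Cholesky factor of $H^{T}H$, which is one possible numerical route and requires $H$ to have full column rank. The case $N < N_h$ is deliberately left out.

```python
import numpy as np


def elm_embedding(A, n_hidden, n_dims, lam=0.1, seed=0):
    """Unsupervised ELM embedding sketch, restricted to the case N >= N_h."""
    rng = np.random.default_rng(seed)
    N = A.shape[0]
    if N < n_hidden:
        raise ValueError("this sketch covers only the case N >= N_h")

    L = np.diag(A.sum(axis=1)) - A                    # graph Laplacian L = D - A
    W = rng.uniform(-1.0, 1.0, size=(N, n_hidden))    # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)         # random biases
    H = 1.0 / (1.0 + np.exp(-(A @ W + b)))            # sigmoid hidden-layer output

    left = np.eye(n_hidden) + lam * H.T @ L @ H       # left-hand matrix of the eigenproblem
    right = H.T @ H                                   # right-hand matrix, positive definite

    # Reduce left v = gamma * right v to a standard symmetric eigenproblem
    # using the Cholesky factorization right = C C^T.
    C = np.linalg.cholesky(right)
    C_inv = np.linalg.inv(C)
    gammas, Y = np.linalg.eigh(C_inv @ left @ C_inv.T)  # eigenvalues in ascending order
    V = C_inv.T @ Y                                   # generalized eigenvectors

    V = V[:, 1:n_dims + 1]                            # drop the first eigenvector, keep the next n
    beta = V / np.linalg.norm(H @ V, axis=0)          # normalize each column v_i by ||H v_i||
    return H @ beta                                   # embedding matrix E = H beta


# Toy graph (an assumption): two 4-node cliques joined by a single edge.
A = np.zeros((8, 8))
for group in (range(0, 4), range(4, 8)):
    for i in group:
        for j in group:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0

E = elm_embedding(A, n_hidden=4, n_dims=2)
```

By construction the columns of the returned matrix satisfy the constraint $(H\beta)^{T} H\beta = I_n$, so checking `E.T @ E` against the identity is a convenient way to validate an implementation.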

Clustering Process.
In this article, an implementation of the original k-means clustering [25] has been integrated into ELM-CD owing to its low computational complexity. First, ELM-CD randomly selects $k$ rows in $E$ as the initial centroids of the $k$ clusters. Second, taking the Euclidean distance as the measure, each row in $E$, represented by $e_i$, $i = 1, \ldots, N$, is assigned to the cluster whose centroid is closest to that row. Then, for each cluster, we calculate the mean of all its members and designate this mean vector as the new centroid. We iterate the cluster assignment and centroid update steps until no row in $E$ changes its community label.
We show the entire procedure of the ELM-CD algorithm in Pseudocode 1.
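The clustering steps described in this subsection can be sketched in plain Python as follows. The function name `kmeans`, the seed, and the six sample points are hypothetical; keeping the old centroid when a cluster becomes empty is an assumption of this sketch, since the text does not say how that case is handled.

```python
import random


def kmeans(rows, k, seed=0):
    """Original (Lloyd-style) k-means on a list of equal-length numeric rows."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]  # k random rows as initial centroids
    labels = [None] * len(rows)
    while True:
        # Assignment step: each row joins the cluster with the closest centroid.
        new_labels = []
        for row in rows:
            dists = [sum((x - c) ** 2 for x, c in zip(row, cen)) for cen in centroids]
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:          # stop when no row changes its label
            return labels
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:                   # assumption: keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]


# Two well-separated groups of points (illustrative data only).
points = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
]
labels = kmeans(points, k=2)
```

In ELM-CD the rows passed to such a routine would be the rows of the embedding matrix $E$ rather than raw coordinates.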

Experiments and Results
In this section, we deploy our proposed algorithm ELM-CD on computer-generated networks as well as four well-known real-world networks to compare its performance with that of three competitors: two divisive algorithms, GN [11] and NF [37], and conventional k-means [30]. All experiments took place using MATLAB R2016a running on an AMD Athlon(tm) X4 740 3.2 GHz desktop with 4 GB RAM. The constructed ELM network has $N_h = 2000$ neurons and randomly initialized input weights between −1 and 1, with all biases ignored (set to 0). Following the empirical configuration in [34], we set the embedding feature space dimension $n = 3$ and the trade-off parameter $\lambda = 0.1$.

Modularity and NMI.
Our first criterion, the modularity $Q$ proposed by Newman and Girvan [38], is defined as the margin between the density of intracommunity edges in the estimated network and the density in a randomly reorganized network with the same number of nodes and edges:
$$Q = \frac{1}{2M} \sum_{i=1}^{N} \sum_{j=1}^{N} \left(A_{ij} - \frac{d_i d_j}{2M}\right) \delta_{ij}, \tag{15}$$
where $d_i = \sum_{j=1}^{N} A_{ij}$ still represents the degree of vertex $i$ as before and $G_i$ indicates the group label of node $i$. $\delta_{ij}$ is the Kronecker delta:
$$\delta_{ij} = \begin{cases} 1, & \text{if vertices } i \text{ and } j \text{ belong to the same group } (G_i = G_j), \\ 0, & \text{otherwise.} \end{cases} \tag{16}$$
A larger Q corresponds to a more apparent community structure in the underlying network and vice versa.
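A direct evaluation of the modularity can be sketched as below, using the standard Newman-Girvan form $Q = \frac{1}{2M}\sum_{ij}\left(A_{ij} - \frac{d_i d_j}{2M}\right)\delta_{ij}$. The function name `modularity` and the toy graph of two triangles joined by one edge are hypothetical choices for illustration.

```python
def modularity(adjacency, labels):
    """Modularity Q of a partition, for a symmetric 0/1 adjacency matrix (nested lists)."""
    degrees = [sum(row) for row in adjacency]
    two_m = sum(degrees)                  # 2M: every edge is counted twice
    n = len(adjacency)
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:    # Kronecker delta on the group labels
                q += adjacency[i][j] - degrees[i] * degrees[j] / two_m
    return q / two_m


# Toy graph (an assumption): triangles {0, 1, 2} and {3, 4, 5} joined by edge 2 - 3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

q_split = modularity(A, [0, 0, 0, 1, 1, 1])   # the two triangles as communities
q_single = modularity(A, [0, 0, 0, 0, 0, 0])  # everything in one community
```

For this graph the two-triangle partition gives $Q = 5/14$, whereas placing all nodes in a single community gives $Q = 0$, which illustrates why a larger $Q$ signals a clearer community structure.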
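The normalized mutual information (NMI), another de facto benchmark for community detection, has also been deployed to evaluate the accuracy of algorithms [39]. Based on information theory, the NMI of two partitions $A$ and $B$ of $N$ nodes is defined as
$$I(A, B) = \frac{-2 \sum_{i=1}^{C_A} \sum_{j=1}^{C_B} N_{ij} \log\left(\dfrac{N_{ij} N}{N_{i\cdot} N_{\cdot j}}\right)}{\sum_{i=1}^{C_A} N_{i\cdot} \log\left(\dfrac{N_{i\cdot}}{N}\right) + \sum_{j=1}^{C_B} N_{\cdot j} \log\left(\dfrac{N_{\cdot j}}{N}\right)}, \tag{17}$$
where $C_A$ and $C_B$ are the numbers of communities in $A$ and $B$, respectively, $N_{ij}$ denotes the number of nodes that appear both in group $i$ of $A$ and in group $j$ of $B$, and $N_{i\cdot} = \sum_{j} N_{ij}$ and $N_{\cdot j} = \sum_{i} N_{ij}$ are the corresponding row and column sums. A larger $I(A, B)$ indicates that $A$ is more similar to $B$. Given the exact community structure $B$ and the result $A$ of a candidate algorithm, $I(A, B) = 1$ means that the algorithm has found all the communities identical to the real structure, while $I(A, B) = 0$ indicates a complete failure of the algorithm.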
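The NMI of two label vectors can be computed along the following lines. This sketch assumes the widely used confusion-matrix form of NMI, in which $N_{ij}$ counts the nodes shared by group $i$ of one partition and group $j$ of the other and the row and column sums are the group sizes. The function name `nmi` is hypothetical, and returning 1.0 when both partitions consist of a single group is a convention chosen here, not something the text specifies.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information between two partitions given as label lists."""
    n = len(labels_a)
    size_a = Counter(labels_a)                 # row sums of the confusion matrix
    size_b = Counter(labels_b)                 # column sums of the confusion matrix
    joint = Counter(zip(labels_a, labels_b))   # nonzero confusion-matrix entries N_ij

    numerator = -2.0 * sum(
        n_ij * math.log(n_ij * n / (size_a[a] * size_b[b]))
        for (a, b), n_ij in joint.items()
    )
    denominator = sum(c * math.log(c / n) for c in size_a.values()) + sum(
        c * math.log(c / n) for c in size_b.values()
    )
    if denominator == 0.0:
        return 1.0  # convention for this sketch: two trivial one-group partitions agree
    return numerator / denominator


same = nmi([0, 0, 1, 1], [1, 1, 0, 0])         # identical partitions up to relabeling
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])  # partitions that share no information
```

The first call evaluates to 1 because NMI ignores how the groups are named, and the second evaluates to 0 because knowing one partition says nothing about the other.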

3.2. Computer-Generated Networks. We fabricate networks with community structures per Girvan and Newman's method [11]. In this case, 128 vertices are divided uniformly into four communities, meaning 32 nodes in each group, and an edge is added at random between each pair of nodes with probability $P_{in}$ when the two nodes come from the same community and $P_{out}$ when they belong to different groups. Subject to $P_{in} > P_{out}$ and an average vertex degree $z = 16$, by changing $P_{out}$ or the average number of intercommunity edges per vertex $z_{out}$ (the two are equivalent), we can obtain different networks with obvious (small $z_{out}$) or ambiguous (large $z_{out}$) community characteristics. These computer-generated networks with known community structure are then fed into our algorithm and its competitors to calculate their NMI, and the results are presented in Figure 1. Each point in Figure 1 represents the average NMI of a certain algorithm on 50 randomly generated scenarios with predefined $z_{out}$ varying from 0 to 8.
The error bars accompanying those points mark the normalized deviation of NMI in 1000 experiments. Lines connecting points are included solely as a guide. The community detection accuracy of all methods drops with increasing $z_{out}$; however, our algorithm outperforms all competitors, especially when $z_{out}$ approaches 8, where the community structure becomes extremely ambiguous. Even when $z_{out} = 8$, ELM-CD can still correctly classify half of the nodes. Another outstanding feature of our algorithm is its comparative robustness. Unlike ELM-CD, both GN and k-means perform competitively in terms of accuracy until $z_{out} = 6$, but their performance deteriorates noticeably after that.
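A generator for this kind of benchmark network can be sketched as follows. The conversion from $z_{out}$ to edge probabilities, $P_{in} = (z - z_{out})/31$ and $P_{out} = z_{out}/96$, is an assumption of this sketch based on each node having 31 potential neighbors inside its community and 96 outside; the function name `gn_benchmark` and the seeds are likewise hypothetical.

```python
import random


def gn_benchmark(z_out, z=16, groups=4, size=32, seed=0):
    """Random graph with planted communities: returns (adjacency, true_labels)."""
    rng = random.Random(seed)
    n = groups * size
    p_in = (z - z_out) / (size - 1)   # expected z - z_out intracommunity edges per node
    p_out = z_out / (n - size)        # expected z_out intercommunity edges per node
    labels = [v // size for v in range(n)]
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):     # each unordered pair once, no self-loops
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                A[i][j] = A[j][i] = 1
    return A, labels


A, labels = gn_benchmark(z_out=2)
A_clean, _ = gn_benchmark(z_out=0)    # no intercommunity edges at all
average_degree = sum(map(sum, A)) / len(A)
```

Sweeping `z_out` from 0 to 8 and regenerating the graph with different seeds would reproduce the kind of scenarios over which the NMI is averaged.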

Real-World Networks.
In this part, four real-world networks with known numbers of communities have been selected as test beds to measure our algorithm's accuracy as well as its running time. Table 1 shows the four datasets in detail. For each scenario, we ran 1000 trials for each of the three probabilistic algorithms NF, ELM-CD, and k-means. For the deterministic algorithm GN, a single trial is sufficient.
First, we checked each algorithm's understanding of community structures based on the definition of modularity.For GN, ELM-CD, and k-means, we need to predefine the number of communities as one of the input parameters.Practically, we varied this parameter from 1 to 10 for the first three networks (Karate, Dolphins, and Polbooks) and from 1 to 20 for the last (football).Among all partition results, the clustering scheme with the highest Q was chosen along with its corresponding NMI scores.Table 2 lists these experimental results and shows that ELM-CD exhibits obvious but not dominant superiority compared with the other algorithms.Specifically, compared with k-means, ELM-CD demonstrates a stronger capability to determine more modular structures, simultaneously with higher NMI in most instances.We cannot ignore the higher modularity scores of GN and NF obtained in the dolphin network, but we should also bear in mind that in this test, GN and NF determined five and four clusters, respectively, compared to ELM-CD's three clusters.
Second, we fixed the predefined number of communities at the inherent number for each network and calculated the NMI of each algorithm to evaluate how close the discovered community structure is to the given underlying structure. Meanwhile, the elapsed time was recorded to compare the complexity of the algorithms. We display the maximum NMI accompanied by the corresponding modularity and the calculation time in Table 3. From this table, despite its minor improvement in precision, GN is much slower than the other three algorithms. Regarding NF, although it runs fast enough, its low modularity and NMI results represent a shortfall in overall performance. ELM-CD no doubt consumes more time than k-means, whose computational complexity is $O(N \cdot k)$, where $k$ is the predefined number of communities, because ELM-CD requires the ELM calculation expenditure, which costs $O(N_h)$ time. Theoretically, once $k$ is determined and when $N \gg N_h$, the complexity of ELM-CD is asymptotically $O(N)$, the same as that of k-means. On the other hand, owing to the application of ELM, the stability of ELM-CD is much better than that of k-means. We demonstrate this by listing the average modularity and NMI results of the three fast algorithms in Table 4.
In Table 4, k-means exhibits very poor stability (even worse than that of NF), as shown by the large differences between its average results and the maximum results in Table 3. In contrast, ELM-CD overcomes this inherited shortcoming owing to the inclusion of the manifold regularization term in the equivalent optimization process, as depicted in (8). Another trend exposed is that the robustness of k-means and ELM-CD is stronger for large networks with higher numbers of communities. The ELM-CD clustering results reported in Table 2 and Table 3 are illustrated in the left and right parts of Figures 2-5, respectively.

Zachary's Karate Club Network.
From 1970 to 1972, Zachary observed a karate club as an example of a social network; the resulting dataset captures the 34 members of the club and documents 78 pairwise links between members who interacted outside the club [40]. The ground-truth community structure formed after a series of factional confrontations between the administrator "John" and the instructor "Mr. Hi" (pseudonyms) led to the split of the club into two. Half of the members formed a new club around Mr. Hi, and members from the other part found a new instructor or gave up karate.
Figure 2 shows the partition results of our algorithm corresponding to the scenarios in Table 2 and Table 3, respectively. We use triangles to mark the new group belonging to the administrator, and circles represent the instructor's. From Figure 2(a), ELM-CD converges to four groups to reach the largest modularity, although it redundantly separates each new club into two smaller ones. When the number of communities is fixed in advance, ELM-CD is able to detect the split that followed the confrontation in the club, as demonstrated in Figure 2(b).

Dolphin Social Network.
The second network we make use of here is an undirected social network of frequent associations between 62 dolphins in a community living off Doubtful Sound, New Zealand [41].These dolphins, naturally separated by their sex into two categories, were studied by Lusseau et al. for several years.
In Figure 3, we present snapshots of ELM-CD clustering results for the largest $Q$ and the maximum NMI, respectively. Again, triangle and circle markers have been used to represent the two sexes known ahead of time. For the best-effort segregation in light of the modularity criterion, ELM-CD splits the female community into two subgroups, as shown in Figure 3(a). Node 40 appears to be the key factor that has the greatest effect on the NMI results. In both images of Figure 3, node 40 is the only one that has been incorrectly classified. As a matter of fact, in Figure 3(b), the maximum NMI would be 1 rather than 0.8888 if node 40 were assigned to the correct group.

Political Book Network.
The third benchmark is a network of books on US politics published around the time of the 2004 presidential election and sold by https://www.amazon.com/ [42]. Edges between books represent frequent copurchasing of books by the same buyers. In total, 105 books, denoted as nodes, have been classified into three categories as liberal, neutral, and conservative according to their political orientations.
In Figure 4, these political tendencies have been identified by triangle (conservative), circle (liberal), and pentagon (neutral) markers, respectively. The identifications made by our algorithm, on the other hand, are shown in three different colors. Red and blue represent the conservative and liberal groups, respectively, while green indicates the neutral part, which occupies a small number of nodes. It is clear from Figure 4 that people tend to have relatively stable opinions on politics, as indicated by the obvious segregation of these books. However, it is not uncommon that some neutral books appear more liberal or conservative oriented in the eyes of some critical readers, while the political orientations of some liberal or conservative books are obscure and may be considered neutral. This explains why our algorithm can make mistakes on the boundaries between different classes. Moreover, from these two images, some books could have been classified more reasonably. For example, node 78 has four connections to conservative books while only sharing two connections with liberal ones; nevertheless, it is defined as a liberal book. Another example is the neutral node 77, which has no link to any other neutral book. Both best-effort results in Figures 4(a) and 4(b) have successfully corrected these two unreasonable classifications.

3.3.4. US College Football Network.
The last example is the US college football network. Girvan and Newman introduced this network of American football games between Division I colleges during the 2000 regular season, where 115 teams (nodes) from 12 conferences played 613 games (edges) in total [11]. On average, there are seven intraconference games and four interconference games, meaning that each team has a greater chance of confronting a competitor in the same conference. This feature gives us a good opportunity to deploy our algorithm to evaluate its accuracy.

In Figure 5, each conference has been placed in a cluster relatively far from the other conferences, and the community detection results are marked using different colors. Aiming at the highest modularity score, ELM-CD separates these teams into eight classes. From Figure 5(a), except for the Southeastern and the Atlantic Coast conferences, all others have been misidentified to some extent. Some of them, say the Pacific Ten and the Mountain West in dark blue, have been merged into a larger group. Some have been disassembled and displaced into other groups, such as the Independents and the Sun Belt. Such categorization could be rational because conferences in a larger group do compete with each other more frequently than smaller conferences. In contrast, teams in the Independent conference, except for nodes 81 and 83, share no game with each other; hence, it is no surprise to see them in other categories. When the number of communities is restricted to 12, as shown in Figure 5(b), ELM-CD performs well, with NMI = 0.8944 according to Table 3. This time, most conferences have retained the majority of their nodes to form independent communities, except for the Sun Belt, which has been fragmented, and the Mountain West, which contains two subclasses. However, we also have to note some unusual clustering behavior in this procedure, such as node 37 being placed in the same group as nodes 81 and 83, which have nothing to do with node 37 at all. Both snapshots in Figure 5 expose, on some level, the irrationality of the original conference definitions. For instance, node 111 belongs to Conference USA, but this team has always played its games with opponents from other conferences, which is contrary to the definition of a community.

Conclusions
In this article, we have proposed a community detection algorithm by combining the extreme learning machine and k-means techniques. Compared with the traditional k-means clustering method, the most evident advantages of the introduced algorithm, named ELM-CD, are the increase in precision and stability at the cost of very little additional calculation. ELM-CD has also been tested on Girvan and Newman's artificial networks and on four real-world networks and has been compared to GN, NF, and conventional k-means based on the modularity and NMI benchmarks. Through the use of ELM as an embedding procedure ahead of k-means clustering, ELM-CD demonstrates competitive performance in both accuracy and complexity.
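Pseudocode 1: Pseudocode of ELM-CD.
ELM-CD(A)
  initialize constant parameters, including λ, N_h, and n
  generate L from A
  normalize A
  randomly generate input weights and biases of neurons in the hidden layer
  calculate H according to Eq. (6)
  if N ≥ N_h
    select the first n + 1 smallest eigenvalues of Eq. (11) and assemble β using the corresponding eigenvectors as in Eq. (12)
  else
    select the first n + 1 smallest eigenvalues of Eq. (13) and assemble β using the corresponding eigenvectors as in Eq. (14)
  obtain the embedding matrix E = Hβ
  randomly select k rows from E as centroids, say c_1, c_2, …, c_k
  while t changed
    for i = 1 to N
      for j = 1 to k
        dist_j = ‖e_i − c_j‖
      find the smallest element in dist and its corresponding index s
      t_i = s
    for i = 1 to k
      update c_i to be the mean value of all members in cluster i
  return t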

Figure 1: NMI of different algorithms on computer-generated networks.

Figure 2: Communities extracted from Zachary's karate club network using ELM-CD. Different shapes indicate the inherent classification, while different colors show the detection results of ELM-CD. (a) Clustering result with the largest Q. (b) Given k = 2, the scenario with the best NMI (= 1).

Figure 3: Communities extracted from the dolphin social network using ELM-CD. Different shapes indicate the inherent classification, while different colors show the detection results of ELM-CD. (a) Clustering result with the largest Q. (b) Given k = 2, the scenario with the maximum NMI.

Figure 4: Communities extracted from the political book network using ELM-CD. Different shapes indicate the inherent classification, while different colors show the detection results of ELM-CD. (a) Clustering result with the largest Q. (b) Given k = 3, the scenario with the maximum NMI.

Figure 5: Communities extracted from the US college football network using ELM-CD. Twelve conferences have been placed in groups with conference names as labels. The detection results of ELM-CD are highlighted in different colors, each of which indicates one detected community. (a) Clustering result with the largest Q. (b) Given k = 12, the scenario with the maximum NMI.

Table 1: Information of datasets.

Table 2: Maximum Q values and corresponding NMI of different algorithms.

Table 3: Compared results for maximum NMI, corresponding Q values, and time elapsed (s) of different algorithms.

Table 4: Average NMI and Q values of the three fastest algorithms.
