CDCN: A New NMF-Based Community Detection Method with Community Structures and Node Attributes

Wireless Communications and Mobile Computing


Introduction
The science of networks is a modern discipline spanning the natural, social, and computer sciences, as well as engineering.
There are many kinds of networks in the real world, such as citation networks, social networks, and collaboration networks [1]. Community detection algorithms are important methods for analyzing a network's structure and understanding the semantics of its nodes, and they play important roles in the era of network intelligence [2][3][4][5]. First, analyzing the community structure of a network helps researchers study the composition and evolution of the whole network, and it can better explain the network's intrinsic characteristics and their causes. Furthermore, community detection algorithms are key to understanding complex network systems and have important applications in networks across many fields. For example, in social networks they can recommend friends and groups to users by analyzing the inherent structure and characteristics of the users' social network and clustering the user nodes. Community detection is also useful for analyzing disease spread, e.g., in modeling real-world epidemic spreading [6].
Most existing community detection algorithms analyze a network using only the original network topology [7][8][9][10][11][12][13][14][15]. Girvan and Newman [7] analyzed community structures in social and biological networks. Newman and Leicht [8] analyzed mixture models of networks. Rosvall and Bergstrom [9] revealed the community structure of complex networks using maps of random walks. Xie et al. [11] proposed SLPA, a method for uncovering overlapping communities in social networks via a speaker-listener interaction dynamic process. Coscia et al. [12] proposed a local-first discovery method for overlapping communities. He [13] used a Markov random field approach for community detection in a specific network. Clauset et al. [14] and Li et al. [15] analyzed the community structure of large-scale networks. Cui and Wang [16] used the key bicommunity and intimate degree to uncover overlapping community structures in bipartite networks.
However, some networks carry not only topology information but also node attribute information, which offers a semantic interpretation of the community structure. For example, papers in citation networks contain titles, abstracts, and keywords that may be represented as binary-valued vectors. We binarize such categorical input so that each node's attributes can be treated as a vector in Euclidean space (we call this an embedding of the vector in Euclidean space).
Such networks with node attributes are called attributed graphs [17, 18]. Discovering the community structure of an attributed graph effectively is a great challenge. To characterize a community, existing community detection methods mainly rely on the original network topology, and missing or uninformative links in the topology often lead to poor results. The node attributes may carry essential community information that complements the topology: even if two nodes are not directly connected, their attributes may indicate that they belong to the same community. Several algorithms that consider both structural and attribute information have been proposed [17, 19–22]. Yang and Leskovec [17] used nonnegative matrix factorization to find overlapping communities in large-scale networks. Atzmueller et al. [19] proposed an exhaustive subgroup discovery method for description-oriented community detection. Wang et al. [20] proposed a semantic community identification method to find community structures in large attributed networks. Huang et al. [21] analyzed the attributes of community networks. Yang et al. [22] proposed a discriminative approach that combines links and content for community detection. However, all of these methods assign the same value to every edge of the attributed graph, which loses information about the network. For example, edges that form densely connected subgraphs are much more likely to lie within a single community than edges that connect separate subgraphs. Using the original topology directly therefore penalizes all node pairs indiscriminately, whether or not they lie in densely connected structures. Node pairs with different structural characteristics should instead receive different values.
Nonnegative matrix factorization (NMF) is an effective method for community detection and has been studied by several scholars. Luo et al. [23] proposed a highly accurate symmetric NMF method that incorporates pointwise mutual information. Lu et al. [24] used NMF to improve density peak clustering for community detection. Zhang and Zhou [25] studied deep NMF structures for community detection. Wang et al. [26] used constrained NMF to detect communities in dynamic networks. These studies laid the foundation for our work.
State-of-the-art methods such as CDE [27] use both a community structure matrix and a node attribute matrix in the NMF framework, and CDE also accounts for densely connected subgraphs. However, when analyzing the community structure matrix, CDE only considers the relationship between directly connected nodes, which loses information about the community structure.
Scholars have also proposed important approaches to large-scale community detection based on neighborhoods, maximal subgraphs, intimate degree, and core vertices [28]; these studies provided important ideas for this paper. The main contributions of this paper are as follows.
(1) We propose a novel method for generating the community structure matrix, which retains the relationship between two nodes that are either directly connected or share the same neighbors.
(2) We combine node attribute information and community structure information in an effective way. Based on this, we propose Community Detection with Community Structure and Node Attributes (CDCN), which identifies network communities with semantic annotation and community structures using the nonnegative matrix factorization framework [29, 30].
(3) Extensive experiments on public datasets demonstrate the effectiveness of CDCN, whose accuracy and performance exceed those of the state-of-the-art methods.
The remainder of the paper is organized as follows. Section 2 briefly summarizes three types of community detection models. Section 3 describes the community detection model, derives its formulation, and presents the algorithm for solving it. Section 4 reports extensive comparative experiments evaluating the proposed CDCN model on real graph datasets with delineated ground-truth communities. Section 5 presents the conclusions of this paper and discusses future research directions.

Related Works
This section briefly summarizes three types of community detection models, which differ in the information they use to describe the network. Table 1 summarizes the three types.
The first type of community detection method focuses on the original network topology. GN [7] uses centrality indices to find community boundaries. NMM [8] uses probabilistic mixture models and the expectation-maximization algorithm to explore the structure of networks. CPM [10] is a clique percolation method with two steps: the first constructs the k-clique graph, and the second finds its connected components and takes the union of the vertices within each connected component as a new community. SLPA [11] is a general framework for detecting and analyzing both individual overlapping nodes and entire communities, in which nodes exchange labels according to dynamic interaction rules. The stochastic block model (SBM [32]) is the simplest node-based community detection model: the nodes of the network randomly fall into $K$ communities, denoted $z_i \in \{1, 2, \dots, K\}$, and edges are generated independently, with an edge between nodes $i$ and $j$appearing with probability $w_{z_i z_j}$. McDaid et al. [37] improved Bayesian inference for the stochastic block model on large networks. However, these methods directly use the original network topology and ignore other information about the inherent community structure, such as node attributes, and missing or uninformative links in the topology often lead to poor results.
Therefore, the second type of method focuses on node attributes and includes several classical and state-of-the-art clustering methods. Strictly speaking, these are not community detection methods, but they can use node attribute information to discover communities, so we also regard them as related work. CAN [34] is a clustering model that learns the data similarity matrix by assigning adaptive and optimal neighbors to each data point based on local distances.
SMR [35] uses kernelized random walks on the global KNN graph together with smooth representation clustering to improve clustering results. NC [5, 36] uses the eigenvectors of matrix representations of the network to solve the community detection problem.
The third type of community detection method considers both the original network topology and node attribute information. Several algorithms of this type have been proposed [17, 19–22]. However, all of these methods assign the same value to every edge of the attributed graph, which loses information about the network. The state-of-the-art method SCI [20] uses both the community structure matrix and the node attribute matrix in the NMF framework, and CDE [27] encodes the inherent community structures for community detection via the underlying community memberships. However, both only consider the relationship between two directly connected nodes when analyzing the community structure matrix.
Nonnegative matrix factorization (NMF) [38] is an effective means of dimensionality reduction. It can uncover hidden information and relationships in multidimensional data, laying a foundation for data mining and knowledge discovery. NMF has been widely used in data mining [39], image retrieval [40], community discovery [41], hotspot prediction [42], social network privacy protection [43], signal processing [44], and other fields. In community discovery [45, 46], NMF can find associations among network nodes based on their attributes, which makes it an important community detection technique [47]. Many scholars have studied the application of NMF to community detection, and their work provided ideas for this paper.
Unlike all of these methods, we combine node attribute information and community structure information by generating a community structure matrix that retains the relationship between two directly connected nodes or two nodes that share the same neighbors. This captures the relationships between nodes more accurately. Figure 1 compares the three approaches on an unweighted graph with two communities, where different shapes represent nodes of different communities. Figure 1(a) shows the result of using the adjacency matrix directly, and Figure 1(b) shows the community structure embedding matrix of the CDE model; both focus only on the relationship between two directly connected nodes. Figure 1(c) shows the community structure matrix of our model, where the dotted lines represent correlations between nodes that share the same neighbors but are not directly connected. In other words, compared with existing models, our model not only expresses node relationships with continuous numerical values but also describes relationships between nonadjacent nodes. Isolated nodes (cold nodes), by contrast, have no neighbors; they only increase the size of the system and can be ignored when constructing the adjacency matrix. In this paper, we do not consider the influence of isolated nodes.

CDCN: The Community Detection Model
In this section, we propose a novel algorithm for community detection that combines the community structure and node attribute information. We will introduce the community structure, node attributes, the overall model, and the algorithm in detail.
3.1. Community Structure Part. The community structure part models the network structure. Given an undirected network $G = (P, E)$ with $n$ nodes $P$ and $e$ edges $E$, we obtain a binary-valued adjacency matrix $A$ from $G$: if node $i$ and node $j$ are connected, $A_{ij} = 1$; otherwise, $A_{ij} = 0$.
Table 1: A brief summarization of community detection.
As one of the state-of-the-art methods, SCI directly uses the binary-valued adjacency matrix A as the community structure matrix. Because A is sparse, using it without any embedding degrades the effectiveness of the community detection model. CDE was proposed as a novel community structure embedding method that quantifies the structural closeness of nodes to better depict the inherent community structures in graphs. Although CDE addresses the sparsity of the adjacency matrix, it is still limited in that it only considers the relationship between two directly connected nodes when analyzing the network structure. Both SCI and CDE therefore lose the relationship information between nodes that are not directly connected.
We start from a simple and reasonable observation about whether two nodes belong to the same community: if they do, they are likely surrounded by similar environments, that is, they share similar neighbors. Therefore, we measure the similarity of their community memberships as follows:
where $P(i, j) = \mathrm{Jaccard}(i, j)/D$, and $\mathrm{Jaccard}(i, j)$ is the Jaccard index of node $i$ and node $j$.
The Jaccard index can be expressed as follows:
$$\mathrm{Jaccard}(i, j) = \frac{|N(i) \cap N(j)|}{|N(i) \cup N(j)|}, \tag{2}$$
where $i = 1, 2, \dots, n$; $j = 1, 2, \dots, n$; and $N(i)$ is the set of neighbors of node $i$. From these community membership similarities, we obtain a community structure matrix $S \in \mathbb{R}^{n \times n}$ with $S_{i,j} = \mathrm{Similarity}(i, j)$.
In daily life, if two people like the same movie, can we conclude that they share similar hobbies? That may be true, but it is weak evidence if the movie is popular and almost everyone likes it. If the movie is unpopular, however, we can be much more confident that the two people share similar hobbies. The same reasoning applies to community detection: if two nodes share a cold neighbor (a node to which few nodes are connected), that neighbor contributes strongly to the similarity between them. In other words, we should penalize hot neighbors (nodes to which many nodes are connected). We therefore replace $\mathrm{Jaccard}(i, j)$ with $L(i, j)$, so that $P(i, j) = L(i, j)/D$, where
$$L(i, j) = \frac{\sum_{p=1}^{P} \frac{1}{1 + N(t_p)}}{|N(i) \cup N(j)|},$$
$P = |N(i) \cap N(j)|$ is the number of common neighbors, $t_p \in N(i) \cap N(j)$ for $p = 1, 2, \dots, P$, and $N(t_p)$ is the count of $t_p$, that is, the number of nodes connected to it. If $N(t_p) = 0$, $\mathrm{Jaccard}(i, j)$ and $L(i, j)$ are the same.
Figure 1(a) illustrates an unweighted graph with two communities under the SCI approach: the edge value is 1 whenever two nodes are directly connected, so it cannot indicate which edges matter more for separating the two communities. Figure 1(b) shows the result of CDE's embedding method, which assigns more weight to edges that form densely connected subgraphs and less weight to edges between the two communities. Figure 1(c) shows our proposed community structure matrix, which captures more information about the network structure by considering the relationships between nodes that are directly connected or share similar neighbors.
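The neighbor-based scores above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes an undirected edge list over nodes 0..n-1, reads "the count of t_p" as the degree of the common neighbor t_p, and stops at the raw Jaccard and L scores, because the normalizer D and the final combination into Similarity(i, j) are not specified here. The function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def neighbor_sets(edges, n):
    """Build N(i), the set of neighbors of each node, from an undirected edge list."""
    nbrs = [set() for _ in range(n)]
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    return nbrs


def jaccard_and_penalized(edges, n):
    """Return (J, L) for all node pairs i != j.

    J[i, j] = |N(i) & N(j)| / |N(i) | N(j)|   (Jaccard index)
    L[i, j] = sum over common neighbors t of 1 / (1 + deg(t)), divided by |N(i) | N(j)|
    Diagonal entries are left at 0.
    """
    nbrs = neighbor_sets(edges, n)
    J = np.zeros((n, n))
    L = np.zeros((n, n))
    for i, j in combinations(range(n), 2):
        union = nbrs[i] | nbrs[j]
        if not union:
            continue  # both nodes isolated: no evidence, score stays 0
        common = nbrs[i] & nbrs[j]
        J[i, j] = J[j, i] = len(common) / len(union)
        # Hot (high-degree) common neighbors contribute less than cold ones.
        L[i, j] = L[j, i] = sum(1.0 / (1 + len(nbrs[t])) for t in common) / len(union)
    return J, L


edges = [(0, 2), (1, 2), (0, 3), (1, 3), (3, 4), (3, 5)]
J, L = jaccard_and_penalized(edges, 6)
```

In this example, nodes 4 and 5 share only the hub node 3 (degree 4), so their Jaccard index is 1 while L(4, 5) = 1/5 = 0.2, which shows how a hot common neighbor is penalized.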
We define $U \in \mathbb{R}^{N \times K}$ as the probability distribution matrix between nodes and communities, where $U_{ij}$ is the propensity of node $i$ to belong to community $j$, for $i = 1, 2, \dots, N$ and $j = 1, 2, \dots, K$. The community structure matrix $S \in \mathbb{R}^{n \times n}$ can be approximately decomposed into the product of $U$ and its transpose. Formally,
$$\min_{U \ge 0} \; \| S - U U^{\top} \|_F^2. \tag{3}$$
This decomposition implies that if node $i$ and node $j$ have similar community memberships, they have a high similarity.
3.2. Node Attribute Part. We define $T \in \mathbb{R}^{N \times F}$ as the node attribute matrix, where $N$ is the number of nodes in the network and $F$ is the feature dimension of a node. The attributes of a node form an $F$-dimensional binary-valued vector, and $T_{i*}$ denotes the attribute vector of node $i$, for $i = 1, 2, \dots, N$. The node attribute part is
$$\min_{U \ge 0,\, M \ge 0} \; \| T - U M \|_F^2, \tag{4}$$
where $M \in \mathbb{R}^{K \times F}$ and $\hat{T}_{i,j} = \sum_{k=1}^{K} U_{i,k} M_{k,j}$. The node attribute matrix $T$ is thus decomposed into two matrices, $U$ and $M$. As mentioned above, $U$ is the probability distribution matrix between nodes and communities, and $U_{i,k}$ is the propensity of node $i$ to belong to community $k$; $M$ is the probability distribution matrix between node features and communities, and $M_{k,j}$ is the weight of the $j$-th node attribute feature for community $k$.
In this way, we can use the node attribute information to divide the communities.
3.3. The CDCN Model. In this subsection, we describe the overall CDCN model, which has two parts: the community structure part and the node attribute part.
We combine the community structure part in equation (3) and the node attribute part in equation (4). Our proposed model is therefore
$$\min_{U \ge 0,\, M \ge 0} \; \| S - U U^{\top} \|_F^2 + \alpha \| T - U M \|_F^2, \tag{5}$$
where $\alpha$ is a positive parameter that balances the two terms. If $\alpha > 1$, the model leans toward the node attribute information; if $\alpha < 1$, it leans toward the network topology information. To make the model more general, we do not strictly require a symmetric decomposition of $S$. The original model in equation (3) can be equivalently converted into
$$\min_{U \ge 0,\, V \ge 0} \; \| S - U V^{\top} \|_F^2 \quad \text{s.t. } U = V, \tag{6}$$
where $V$ is a surrogate variable for $U$. For further relaxation, equation (5) can be changed to
$$\min_{U, V, M \ge 0} \; \mathcal{L}(U, V, M) = \| S - U V^{\top} \|_F^2 + \alpha \| T - U M \|_F^2 + \beta \| U - V \|_F^2, \tag{7}$$
where $\beta$ is a positive parameter that controls the closeness between $U$ and $V$: the larger $\beta$ is, the closer the two variables are.
In practice, $\beta$ is commonly set to a moderate value for real applications, which enhances the generalization ability of the model: the community structure matrix $S$ need not be decomposed into a matrix times its own transpose. In other words, we relax the matrix decomposition condition.
In summary, we use the shared variable $U$ to couple the two parts, the community structure matrix and the node attribute matrix, and we relax the symmetric decomposition of $S$ to make the model more general. This yields the objective function of the model.
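To make the roles of α and β concrete, here is a NumPy sketch of the relaxed objective. It assumes the objective is built from squared Frobenius norms, ||S − UV^T||² + α||T − UM||² + β||U − V||², which is consistent with the step sizes given in Section 3.4; the function name `cdcn_objective` is hypothetical.

```python
import numpy as np


def cdcn_objective(S, T, U, V, M, alpha, beta):
    """Relaxed CDCN objective (assumed squared-Frobenius form).

    structure  : how well U V^T reproduces the community structure matrix S
    attributes : how well U M reproduces the node attribute matrix T, weighted by alpha
    coupling   : how far the surrogate V drifts from U, weighted by beta
    """
    structure = np.linalg.norm(S - U @ V.T) ** 2   # Frobenius norm is the matrix default
    attributes = np.linalg.norm(T - U @ M) ** 2
    coupling = np.linalg.norm(U - V) ** 2
    return structure + alpha * attributes + beta * coupling


value = cdcn_objective(np.zeros((2, 2)), np.zeros((2, 1)),
                       np.ones((2, 1)), np.ones((2, 1)), np.ones((1, 1)),
                       alpha=0.5, beta=1.0)
```

Here U = V, so the coupling term vanishes and the value is 4 + 0.5 × 2 = 5. Tracking this quantity across iterations is a simple way to check convergence.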
To detect overlapping communities, we assign node $v_i$ to the $j$-th community when $U_{ij}$ is higher than a predefined threshold $\varepsilon$. Following CDE, we set $\varepsilon = 0.1$ in this paper.
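The thresholding rule can be sketched in a few lines of NumPy. The text does not say whether U is normalized before thresholding, so this sketch assumes it is applied to the raw learned U; the function name `overlapping_communities` is hypothetical.

```python
import numpy as np


def overlapping_communities(U, eps=0.1):
    """Return, for each community j, the nodes i whose propensity U[i, j] exceeds eps."""
    members = U > eps  # boolean N x K membership matrix; a node may be True in several columns
    return [np.flatnonzero(members[:, j]).tolist() for j in range(U.shape[1])]


communities = overlapping_communities(np.array([[0.9, 0.05],
                                                [0.4, 0.6],
                                                [0.02, 0.8]]))
```

Node 1 clears the threshold for both columns, so it lands in both communities: `[[0, 1], [1, 2]]`.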
3.4. The Algorithm for CDCN. In this subsection, we present the algorithm for solving the proposed model. The learning process of CDCN is given in Algorithm 1.
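The update rules for the elements $U_{mk}$, $V_{mk}$, and $M_{kf}$ in Algorithm 1 are derived as follows. Taking the derivatives of $\mathcal{L}(U, V, M)$ in equation (7) with respect to $U$, $V$, and $M$ gives
$$\frac{\partial \mathcal{L}}{\partial U} = -2SV + 2UV^{\top}V - 2\alpha T M^{\top} + 2\alpha U M M^{\top} + 2\beta (U - V), \tag{8}$$
$$\frac{\partial \mathcal{L}}{\partial V} = -2S^{\top}U + 2VU^{\top}U + 2\beta (V - U), \tag{9}$$
$$\frac{\partial \mathcal{L}}{\partial M} = -2\alpha U^{\top} T + 2\alpha U^{\top} U M. \tag{10}$$
Based on this, the updating rules for the variables are
$$U_{mk} \leftarrow U_{mk} - \rho_{mk} \left[\frac{\partial \mathcal{L}}{\partial U}\right]_{mk}, \tag{11}$$
$$V_{mk} \leftarrow V_{mk} - \theta_{mk} \left[\frac{\partial \mathcal{L}}{\partial V}\right]_{mk}, \tag{12}$$
$$M_{kf} \leftarrow M_{kf} - \phi_{kf} \left[\frac{\partial \mathcal{L}}{\partial M}\right]_{kf}, \tag{13}$$
where $\rho_{mk}$, $\theta_{mk}$, and $\phi_{kf}$ denote the gradient descent step sizes for the $(m,k)$-th element of $U$, the $(m,k)$-th element of $V$, and the $(k,f)$-th element of $M$, respectively. If we set $\rho_{mk} = U_{mk} / 2(UV^{\top}V + \alpha U M M^{\top} + \beta U)_{mk}$, $\theta_{mk} = V_{mk} / 2(VU^{\top}U + \beta V)_{mk}$, and $\phi_{kf} = M_{kf} / 2\alpha (U^{\top} U M)_{kf}$, then the following holds:
$$U_{mk} \leftarrow U_{mk} \frac{(SV + \alpha T M^{\top} + \beta V)_{mk}}{(UV^{\top}V + \alpha U M M^{\top} + \beta U)_{mk}}, \tag{14}$$
$$V_{mk} \leftarrow V_{mk} \frac{(S^{\top}U + \beta U)_{mk}}{(VU^{\top}U + \beta V)_{mk}}, \tag{15}$$
$$M_{kf} \leftarrow M_{kf} \frac{(U^{\top} T)_{kf}}{(U^{\top} U M)_{kf}}. \tag{16}$$
Algorithm 1 summarizes the optimization of problem (7). Because the updates use variable step sizes, they naturally preserve the nonnegativity constraints. Since the number of nodes $N$ is far greater than the number of node features $F$, the time complexity of each iteration is $O(KN^2)$.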
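The multiplicative rules for U, V, and M can be sketched in NumPy as below. This is an illustrative implementation under stated assumptions, not the authors' code: uniform random initialization, sequential updates of U, then V, then M within each iteration, and a small hypothetical `eps` added to every denominator to avoid division by zero. The defaults α = 0.2, β = 0.02, and 100 iterations follow the experimental setup; the function name `cdcn_fit` is hypothetical.

```python
import numpy as np


def cdcn_fit(S, T, K, alpha=0.2, beta=0.02, max_iter=100, seed=0, eps=1e-10):
    """Multiplicative updates for ||S - U V^T||^2 + alpha ||T - U M||^2 + beta ||U - V||^2."""
    rng = np.random.default_rng(seed)
    n, f = T.shape
    U = rng.random((n, K))  # node-community propensities
    V = rng.random((n, K))  # surrogate for U in the structure term
    M = rng.random((K, f))  # community-feature weights
    for _ in range(max_iter):
        # Numerators gather the negative gradient parts, denominators the positive parts,
        # so every factor is nonnegative and U, V, M stay nonnegative.
        U *= (S @ V + alpha * T @ M.T + beta * V) / (
            U @ (V.T @ V) + alpha * U @ (M @ M.T) + beta * U + eps)
        V *= (S.T @ U + beta * U) / (V @ (U.T @ U) + beta * V + eps)
        M *= (U.T @ T) / ((U.T @ U) @ M + eps)
    return U, V, M


# Toy attributed graph: two 4-node blocks whose attributes match the blocks.
S = np.kron(np.eye(2), np.ones((4, 4)))
T = np.kron(np.eye(2), np.ones((4, 3)))
U, V, M = cdcn_fit(S, T, K=2)
labels = U.argmax(axis=1)  # hard assignment, one possible read-out of U
```

One way to see why these updates do not increase the objective (in exact arithmetic, ignoring `eps`): the U update is the Lee-Seung least-squares NMF update for the stacked problem [S, √α T, √β V] ≈ U [V^T, √α M, √β I], and the V and M updates have the same structure.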

Experiments and Analysis
In this section, we have conducted extensive comparative experiments to evaluate the effectiveness of our proposed CDCN model on real graph datasets with ground-truth communities.
4.1. Datasets. We consider 7 widely used network datasets with ground-truth communities: Karate, Polbooks, Football (http://www-personal.umich.edu/~mejn/netdata/), Citeseer, WebKB (http://linqs.cs.umd.edu/projects/projects/lbc/), Facebook (http://snap.stanford.edu/data/ego-Facebook.html), and HEP-TH (http://snap.stanford.edu/data/cit-HepTh.html). The network statistics are reported in Table 2. The Citeseer network consists of 3312 scientific publications with 4732 edges, and each node has 3703 attribute features. The WebKB network includes 4 subnetworks (Cornell, Texas, Washington, and Wisconsin), each consisting of 5 communities; together they contain 877 web pages with 1608 edges, and each web page is annotated with 1703-dimensional binary-valued word attributes. Karate, Polbooks, and Football have nonoverlapping communities and no node attributes. The HEP-TH (high energy physics theory) citation graph is from the arXiv e-print archive and covers all citations within a dataset of 27770 papers with 352807 edges; we assume that papers published in the same journal belong to the same community. Some communities contain very few nodes, so we exclude communities with fewer than 10 nodes, which leaves 20048 papers with 236230 edges.
The node attributes of Cornell, Texas, Washington, Wisconsin, Citeseer, and Facebook are binary vectors whose elements are either 0 or 1. The node attributes of HEP-TH are dense 300-dimensional vectors, obtained by extracting the paper titles and abstracts and then training a word vector model [41].
We compare different methods on these networks to demonstrate the effectiveness of our community structure matrix. Detailed information about the datasets is given in Table 2.
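The HEP-TH filtering step described in the Datasets subsection can be sketched as follows. The sketch assumes each paper carries exactly one journal label and that edges touching a removed paper are dropped as well; the text implies the latter through the reduced edge count but does not state the rule. The function name `drop_small_communities` is hypothetical.

```python
from collections import Counter


def drop_small_communities(labels, edges, min_size=10):
    """Keep only nodes whose community has at least min_size members.

    labels : dict mapping node -> community label (e.g. journal)
    edges  : iterable of (u, v) pairs
    Returns the filtered labels and the edges whose endpoints both survive.
    """
    sizes = Counter(labels.values())
    kept = {node for node, comm in labels.items() if sizes[comm] >= min_size}
    kept_labels = {node: labels[node] for node in kept}
    kept_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    return kept_labels, kept_edges


labels = {i: "A" for i in range(10)}
labels.update({10: "B", 11: "B"})
kept_labels, kept_edges = drop_small_communities(labels, [(0, 1), (1, 10), (10, 11)])
```

Community "B" has only two papers, so both are removed together with the two edges that touch them.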

Evaluation Methods
The compared methods may produce nonoverlapping or overlapping communities, so we use different evaluation metrics for the two cases.
(i) For nonoverlapping communities: We use two evaluation metrics to measure the quality of nonoverlapping communities. We adopt the evaluation procedure used in [17], in which every detected community is matched with its most similar ground-truth community.
Algorithm 1: The learning process: CDCN.
Input: network graph G, node attribute matrix T, hyperparameters α and β, number of communities K, and maximum number of iterations maxIter.
Output: the probability distribution matrix U.
Begin
According to network graph G and Eq. (1), generate the community structure matrix and randomly initialize the probability distribution matrix
The first metric is accuracy (AC [48]). Given a network containing $|V|$ nodes, let $predict_i$ be the community label of node $i$ obtained by a given algorithm and $real_i$ be the ground-truth label provided by the dataset. The accuracy is defined as follows:
$$AC = \frac{\sum_{i=1}^{|V|} \delta(real_i, map(predict_i))}{|V|}, \tag{17}$$
where $\delta(x, y)$ equals 1 if $x = y$ and 0 otherwise, and $map(predict_i)$ is the mapping function that maps each detected label $predict_i$ to the equivalent label from the dataset. The best mapping can be found with the Kuhn-Munkres algorithm [49]. The second metric is normalized mutual information (NMI [48]). In clustering applications, mutual information measures the similarity of two clusterings. Given the communities $C$ discovered by a community detection method and a set of ground-truth communities $C^*$, their mutual information metric $NMI(C, C^*)$ is defined as follows:
where $p(c_i)$ and $p(c^*_j)$ denote the probabilities that a node arbitrarily selected from the network belongs to community $c_i$ and $c^*_j$, respectively, and $p(c_i, c^*_j)$ denotes the joint probability that this arbitrarily selected node belongs to both $c_i$ and $c^*_j$.
(ii) For overlapping communities: We compare a set of detected communities $M$ with the ground-truth communities $M^*$ as in [15], using the following evaluation function:
where $\delta(c^*_i, c_j)$ is the similarity measure between communities $c^*_i$ and $c_j$. We use a standard metric $\delta(\cdot)$ to quantify the similarity between a pair of communities; the resulting score lies between 0 and 1, where 1 indicates perfect recovery of the ground-truth communities.
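The evaluation metrics above can be sketched in Python as follows; every function name here is illustrative. For AC, `scipy.optimize.linear_sum_assignment` solves the same one-to-one label-matching problem that the Kuhn-Munkres algorithm solves (SciPy uses a different algorithm internally). Two details are assumptions because the formulas are not reproduced in this text: NMI divides the mutual information by the geometric mean of the two partition entropies (one common convention), and the overlapping score averages best-match similarities in both directions, a common form for matching detected and ground-truth communities.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def accuracy(real, predict):
    """AC: share of nodes whose predicted label, after the best one-to-one map, is correct."""
    real, predict = np.asarray(real), np.asarray(predict)
    pred_labels, real_labels = np.unique(predict), np.unique(real)
    # overlap[a, b] = number of nodes with predicted label a and true label b
    overlap = np.array([[np.sum((predict == p) & (real == r)) for r in real_labels]
                        for p in pred_labels])
    rows, cols = linear_sum_assignment(-overlap)  # negate costs to maximize agreement
    return float(overlap[rows, cols].sum() / real.size)


def _entropy(labels):
    """Shannon entropy (natural log) of a partition given as a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))


def nmi(real, predict):
    """Mutual information of the two partitions divided by sqrt(H(C) * H(C*))."""
    real, predict = np.asarray(real), np.asarray(predict)
    mi = 0.0
    for c in np.unique(predict):
        in_c = predict == c
        for c_star in np.unique(real):
            in_c_star = real == c_star
            p_joint = np.mean(in_c & in_c_star)  # p(c_i, c*_j)
            if p_joint > 0:
                mi += p_joint * np.log(p_joint / (np.mean(in_c) * np.mean(in_c_star)))
    h_real, h_pred = _entropy(real), _entropy(predict)
    if h_real == 0.0 or h_pred == 0.0:
        return float(h_real == h_pred)  # 1 only if both partitions are a single community
    return float(mi / np.sqrt(h_real * h_pred))


def f1(a, b):
    """F1 score between two node sets."""
    common = len(set(a) & set(b))
    if common == 0:
        return 0.0
    precision, recall = common / len(set(b)), common / len(set(a))
    return 2 * precision * recall / (precision + recall)


def jaccard(a, b):
    """Jaccard similarity between two node sets."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0


def matched_score(truth, detected, delta):
    """Average of best-match similarities from truth to detected and back (assumed form)."""
    to_detected = np.mean([max(delta(t, d) for d in detected) for t in truth])
    to_truth = np.mean([max(delta(t, d) for t in truth) for d in detected])
    return float((to_detected + to_truth) / 2)
```

For example, `accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])` is 1.0 because the two labelings differ only by a renaming, which the matching step undoes.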

Parameter Sensitivity Analysis.
In this section, we analyze the parameter sensitivity of CDCN on the Wisconsin and Washington datasets, whose moderate size makes the effects of the parameters easy to observe. Our algorithm has two hyperparameters: α is a nonnegative constant that controls the balance between the original network topology information and the node attribute information, and β is a nonnegative constant for the relaxation of matrix U. For each hyperparameter value, we repeated the experiment ten times and averaged the ten results. The results on other datasets are similar, and these parameter settings also apply to the other datasets because they have similar topological structures and network characteristics. Figures 2 and 3 show that CDCN performs better for α between 0.15 and 0.3 and achieves its highest AC and NMI scores at α = 0.2, which indicates that the original network topology information and the node attribute information are of different importance. For β, we set α = 0.2 and vary β from 0 to 0.05; Figures 4 and 5 show that CDCN achieves its highest AC and NMI scores at β = 0.02. These results indicate that using both node attribute information and community structure information yields better results on real networks.

Experimental Setups.
We compared our algorithm against six topology-based methods (SNMF, SLPA, DEMON, CPM, Louvain, and InfoMap), three node attribute-based methods (CAN, SMR, and NC), and three methods that consider both network topology and node attributes (PCL-DC, SCI, and CDE).
As noted in [27], it is hard to compare the quality of community detection results when the baseline methods detect different numbers of communities. Therefore, we set the number of detected communities K to the number of ground-truth communities. We applied our proposed method and the baseline methods to the public datasets, repeated each test ten times, and took the average of the ten results.
For all baseline methods, we use the default parameter settings that achieve their best results. For example, for CDE we set α = 1, β = 2, and κ = 5, and for SCI we set α = 50 and β = 1; see the original papers for details. For CDCN, maxIter is set to 100 to achieve convergence, and the hyperparameters α and β are set to 0.2 and 0.02, respectively. Our algorithms are implemented in Python, and all experiments are performed on a PC running Windows 7 with an Intel(R) Core(TM) i7-4790 CPU @ 3.60 GHz and 24 GB of main memory.

Evaluation on Nonoverlapping Communities
In this subsection, we evaluate the results on nonoverlapping communities. Table 3 reports the ACs and NMIs of all methods. The results indicate that CDCN outperforms all the baseline comparison methods, which include InfoMap, CPM, SLPA, Louvain, DEMON, SNMF, CAN, SMR, NC, PCL-DC, SCI, and CDE. The real datasets include Cornell, Texas, Washington, Wisconsin, and Citeseer. The datasets are independent of each other, and the ground-truth communities within each of them do not overlap. We apply our method and the baseline methods to these datasets.
Compared with the algorithms that focus on either the original network topology or node attributes, combining the original network topology with the inherent community structure information yields substantial improvements. For example, among the algorithms that focus on the original network topology or node attributes, the highest ACs on Cornell, Texas, Washington, Wisconsin, and Citeseer are 0.446, 0.545, 0.508, 0.471, 0.314, and 0.168, respectively, whereas our method achieves 0.569, 0.641, 0.695, 0.694, 0.544, and 0.215, increases of 0.123, 0.096, 0.187, 0.223, 0.230, and 0.047, respectively. Likewise, our method substantially improves the NMI.
The experimental results are shown in Table 3. PCL-DC, SCI, and CDE combine the original network topology information and node attribute information, but they lose the information on the relationships between nodes that are not directly connected, whereas our model also considers the relationships between nodes that share the same neighbors. The results in Table 3 confirm this.
4.6. Evaluation on Overlapping Communities. Some scholars have proposed effective methods for detecting overlapping communities, such as subspace decomposition, maximal cliques, maximal subgraphs, and the clustering coefficient, and Li et al. [50] proposed a method to measure the performance of overlapping communities. These are among the state-of-the-art methods of recent years. For overlapping communities, we use the F1-Score and Jaccard-Similarity to evaluate the partitions produced by all methods except the clustering methods, which cannot discover overlapping communities. The tested network is the complete Facebook dataset, which contains 10 different ego-networks with manually identified circles, and we select 4 representative ego-networks from it. The experimental results are shown in Table 4.
The baseline comparison methods include InfoMap, CPM, SLPA, Louvain, DEMON, SNMF, PCL-DC, SCI, and CDE. The real datasets include Facebook Ego-network 107, Facebook Ego-network 698, Facebook Ego-network 1912, Facebook Ego-network 3908, and the complete Facebook network. In these datasets, the ground-truth circles intersect, so the communities overlap. We apply our method and the baseline methods to these datasets.
Ego-network 107 has the most nodes, Ego-network 698 has the fewest nodes, Ego-network 1912 has the highest degree density, and Ego-network 3908 has the lowest degree density.
As shown in Table 4, CDCN achieves the best performance on all tested networks, which again shows that combining the original network topology with the inherent community structure information greatly improves community detection. For instance, on Facebook ego-network 107, the highest F1-Score among the methods that focus on network topology information is 0.517, and the highest F1-Score among the methods that combine the original network topology and the inherent community structure information is 0.474, whereas our model achieves an F1-Score of 0.539. The results on the other ego-networks are similar to those on Facebook ego-network 107. On the complete Facebook data, our model achieves an F1-Score of 0.372, which exceeds the best results of the other methods. For Jaccard-Similarity, the results in Table 4 also indicate that CDCN outperforms all compared algorithms.

Evaluation on Communities without Node Attributes.
Some networks have no node attributes, which makes them hard to divide with most methods that rely on node attributes. CDCN handles this case easily because it can use only its community structure part. In this case, our method simplifies as follows:
The update formulas change as follows:
To demonstrate the usefulness of our community structure matrix, we add two more baselines, called Adj-Mat and Emb-Mat. Adj-Mat replaces the community structure matrix S in equation (20) with the adjacency matrix of the network, and Emb-Mat replaces it with the embedding matrix of CDE. The resulting optimization algorithm is shown in Algorithm 2.
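The structure-only setting, and how Adj-Mat differs from it, can be sketched in NumPy. Because the simplified objective is not spelled out here, this sketch assumes it is the relaxed objective with the attribute term removed, ||S − UV^T||² + β||U − V||², and it reads out hard labels with an argmax over U, which is also an assumption. The function names are hypothetical, and `eps` is a small guard against division by zero.

```python
import numpy as np


def structure_only_fit(S, K, beta=0.02, max_iter=200, seed=0, eps=1e-10):
    """Multiplicative updates for ||S - U V^T||^2 + beta ||U - V||^2 (assumed simplified model)."""
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    U = rng.random((n, K))
    V = rng.random((n, K))
    for _ in range(max_iter):
        U *= (S @ V + beta * V) / (U @ (V.T @ V) + beta * U + eps)
        V *= (S.T @ U + beta * U) / (V @ (U.T @ U) + beta * V + eps)
    return U, V


def hard_labels(U):
    """Nonoverlapping read-out: each node joins its highest-propensity community."""
    return U.argmax(axis=1)


# Adj-Mat style run: factorize the raw adjacency matrix of two disjoint 4-cliques.
A = np.kron(np.eye(2), np.ones((4, 4)) - np.eye(4))
U_adj, V_adj = structure_only_fit(A, K=2)
```

Switching between Adj-Mat, Emb-Mat, and the proposed matrix only changes the S passed in; the optimizer stays the same, which is what makes the comparison isolate the effect of the community structure matrix.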
The baseline comparison methods include InfoMap, CPM, SLPA, Louvain, DEMON, SNMF, Adj-Mat, and Emb-Mat. We assessed the NMI and AC values on the Karate, Football, and Polbooks datasets.
In addition, in this part, we compare our method on four datasets with the methods that focus on the original network topology and do not use node attribute information. The four datasets have nonoverlapping communities, so we use AC and NMI to evaluate the partitions produced by all methods. The results are shown in Table 5.
As shown in Table 5, CDCN outperforms all compared algorithms on the nonoverlapping community detection task. Its NMI reaches 1.0, 0.916, and 0.570 on the Karate, Football, and Polbooks datasets, respectively, and its AC reaches 1.0, 0.909, and 0.838, respectively; both are higher than those of the other methods. Furthermore, the comparison with Adj-Mat and Emb-Mat demonstrates the usefulness of our community structure matrix.

Conclusions
Community detection has been widely used in recommendation systems, social networks, and network security, and efficient, fast community detection algorithms contribute to the development of intelligent networks. Based on an analysis of network characteristics, this paper addresses community detection in attributed graphs by proposing CDCN, a method that generates a community structure matrix retaining the relationship between two directly connected nodes or two nodes that share the same neighbors. CDCN combines node attribute information and community structure information effectively in the nonnegative matrix factorization framework. We used AC and NMI to evaluate our method on nonoverlapping communities and F1-Score and Jaccard-Similarity to evaluate it on overlapping communities. On nonoverlapping communities, CDCN achieves better AC and NMI values than the other methods, and on overlapping communities, it achieves better F1-Score and Jaccard-Similarity values than the other methods. The extensive experimental results demonstrated

Data Availability
The original dataset used in this work is available from the corresponding author on request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.