A semantic social network contains an enormous number of nodes and complex semantic information, and traditional community detection algorithms cannot produce convincing communities for it. To address community detection in semantic social networks, we present a clustering community detection algorithm based on the PSO-LDA model. Because the semantic model is the LDA model, we use Gibbs sampling to map semantic information to quantitative parameters in a semantic space. Then, we present a PSO strategy that uses the semantic relation to solve overlapping community detection. Finally, we establish semantic modularity (SimQ) for evaluating the detected semantic communities. The validity and feasibility of the PSO-LDA model and the semantic modularity are verified by experimental analysis.
With the development of society and the improvement of science and technology, semantic social networks have developed rapidly, and many of them, such as Twitter and Weibo, have had a significant impact on our lives. In these networks, different individuals have different small social “worlds” which are called communities [
Topological community detection is the pioneering work; its goal is to study the topological structure and divide a social network into several separate networks. The representative algorithms include Modular Optimization [
In the last few years, the analysis of semantic social networks has become popular. Most of the algorithms use the LDA model as the basic model. The SVM-DTW method proposed by Solera, Calderara, and Cucchiara [
In this paper, we propose a novel community detection algorithm that divides nodes into clusters. The main characteristic of the communities detected by this algorithm is that members of the same community have common or similar interests. We use the LDA model to take into account the topic and keyword information in the text produced by individuals, then quantize the semantic nodes and map them into a semantic space. Then, we obtain ideal virtual social communities by applying the Particle Swarm Optimization algorithm. Last but not least, we build a novel modular model and use the new function
Compared with other models in semantic social network, such as Louvain method model [
The rest of the paper is organized as follows: Section
The problem of community detection is NP-hard [
First, we select topics and words from individuals’ semantic information through LDA model. Then, we map semantic nodes into semantic space via Gibbs sampling method [
Every individual says different words as each node has its own information contents in semantic social network [
LDA probability model.
In this section, we study the LDA model of information contents. The relevant mathematical symbols for illustrating the LDA model are given in Table
The symbol description.
SYMBOL | DESCRIPTION |
---|---|
| Number of keywords in semantic social network |
| Set of keywords in semantic social network, |
| Node set corresponding to keywords set |
| Topic set corresponding to keywords set |
| Topic distribution probability vector |
| Keyword distribution probability vector of topic |
| A priori parameter over topic distribution probability specific to each node |
| A priori parameter over keyword distribution probability specific to a special topic |
The process of forming LDA model is shown in Algorithm
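The generative story that LDA assumes can be sketched in a few lines. This is a minimal illustration of the standard LDA generative process, not a reproduction of the authors' algorithm listing. The function names (`sample_dirichlet`, `sample_categorical`, `generate_corpus`) and all parameter values are assumptions made for the example; `alpha` and `beta` play the role of the a priori parameters over the topic and keyword distributions described in the symbol table.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw an index according to the probability vector `probs`."""
    r = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return index
    return len(probs) - 1  # guard against floating-point round-off


def generate_corpus(num_nodes, num_topics, vocab_size, words_per_node,
                    alpha=0.5, beta=0.5, seed=0):
    """Generate keywords for each node following the LDA generative story.

    All names and default values here are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)
    # One keyword distribution per topic, drawn from a symmetric Dirichlet(beta).
    phi = [sample_dirichlet([beta] * vocab_size, rng) for _ in range(num_topics)]
    corpus, thetas = [], []
    for _ in range(num_nodes):
        # One topic distribution per node, drawn from a symmetric Dirichlet(alpha).
        theta = sample_dirichlet([alpha] * num_topics, rng)
        words = []
        for _ in range(words_per_node):
            topic = sample_categorical(theta, rng)       # pick a topic for this word
            word = sample_categorical(phi[topic], rng)   # pick a keyword from that topic
            words.append(word)
        corpus.append(words)
        thetas.append(theta)
    return corpus, thetas, phi
```

Calling `generate_corpus(5, 3, 10, 20)` returns five keyword lists of length 20 together with the topic and keyword distributions that produced them.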
Gibbs sampling [
We make
The distribution probability
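To make the sampling step concrete, the following is a minimal sketch of collapsed Gibbs sampling for LDA in its standard textbook form. It is not taken from the paper: the function name `gibbs_lda`, the count-array names, and the default hyperparameters are assumptions. Each word's topic is resampled with probability proportional to (node-topic count + alpha) × (topic-keyword count + beta) / (topic total + V × beta), and the resulting per-node topic vectors are the coordinates that place nodes in the semantic space.

```python
import random


def gibbs_lda(docs, num_topics, vocab_size, alpha=0.5, beta=0.1, iters=200, seed=0):
    """Collapsed Gibbs sampler for LDA (standard form; names are hypothetical).

    docs: list of lists of keyword ids in range(vocab_size), one list per node.
    Returns (theta, phi): per-node topic distributions and per-topic keyword
    distributions estimated from the final sample.
    """
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]                 # node-topic counts
    n_kw = [[0] * vocab_size for _ in range(num_topics)]    # topic-keyword counts
    n_k = [0] * num_topics                                  # words per topic
    z = []                                                  # topic of every word

    # Random initial topic assignment.
    for d, doc in enumerate(docs):
        assignments = []
        for w in doc:
            k = rng.randrange(num_topics)
            assignments.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(assignments)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional, up to a normalizing constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the new assignment.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    theta = [
        [(n_dk[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
         for w in range(vocab_size)]
        for t in range(num_topics)
    ]
    return theta, phi
```

A single final sample is used here for brevity; averaging the estimates over several samples after burn-in is a common refinement.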
Particle Swarm Optimization (PSO) is an intelligent optimization algorithm. It was first proposed by J. Kennedy and R. C. Eberhart [
Compared with other optimization algorithms, such as Genetic Algorithm (GA), Ant Colony Optimization (ACO), and Simulated Annealing (SA), the PSO algorithm has two attractive features. First, PSO starts optimizing from local optima and runs fast, which makes the algorithm more adaptable to the evolution of networks. Second, particles in PSO can be mapped to nodes in the semantic network, so the process of finding the optimal solution in PSO is consistent with the formation of a semantic community.
PSO initializes a set of random solutions and searches iteratively for optimal solutions [
In PSO-LDA, LDA semantic features are incorporated into PSO. Each node in the semantic social network is mapped to a “particle” in PSO, and the semantic information vector of each node is mapped to the velocity of the corresponding particle. As the fitness value, we use an information similarity function. The nodes in the semantic social network simulate the behavior of a “bird flock”, in which information is shared socially and each individual gains from the discoveries and previous experience of all other nodes during the search for food [
First, we assume the search space is
In the search space, once velocity
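The velocity and position updates of PSO can be illustrated with a short, generic sketch. It uses the standard constriction-factor form of PSO and a toy objective; it is not the authors' PSO-LDA procedure, in which particles correspond to network nodes and fitness is an information similarity function. The names `constriction_factor` and `pso_minimize`, the acceleration coefficients `c1 = c2 = 2.05`, and the swarm size are assumptions chosen for the example.

```python
import math
import random


def constriction_factor(c1, c2):
    """Standard constriction factor; requires c1 + c2 > 4."""
    phi = c1 + c2
    if phi <= 4:
        raise ValueError("c1 + c2 must exceed 4")
    return 2.0 / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))


def pso_minimize(objective, dim, bounds, num_particles=30, iters=200,
                 c1=2.05, c2=2.05, seed=0):
    """Minimize `objective` with constriction-factor PSO (illustrative only)."""
    rng = random.Random(seed)
    low, high = bounds
    chi = constriction_factor(c1, c2)

    # Random initial positions, zero initial velocities.
    pos = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(num_particles)]
    vel = [[0.0] * dim for _ in range(num_particles)]
    pbest = [p[:] for p in pos]                      # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    best = min(range(num_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[best][:], pbest_val[best]   # best position seen by the swarm

    for _ in range(iters):
        for i in range(num_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull toward the particle's own best and the swarm's best.
                vel[i][d] = chi * (
                    vel[i][d]
                    + c1 * r1 * (pbest[i][d] - pos[i][d])
                    + c2 * r2 * (gbest[d] - pos[i][d])
                )
                pos[i][d] += vel[i][d]
            value = objective(pos[i])
            if value < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], value
                if value < gbest_val:
                    gbest, gbest_val = pos[i][:], value
    return gbest, gbest_val


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)
```

A similarity function to be maximized, as in the fitness described above, can be handled by passing its negation as `objective`.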
Generally speaking, performance measures for semantic social networks are mostly based on topological structure. And the
In this part, we present and discuss experiments on topic number analysis, evaluation criteria, real datasets, and different community detection algorithms, based on three datasets (the American College Football network dataset, the Krebs polbooks network dataset, and the dolphins network dataset).
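As a point of reference for topology-based measures, the sketch below computes the standard Newman modularity of a non-overlapping partition of an undirected graph. It is not the semantic modularity (SimQ) proposed in this paper, whose definition is not reproduced here; the function name `modularity` and the input format are assumptions. The score is the sum over communities of (internal edges / m) minus (community degree sum / 2m) squared, where m is the number of edges.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity of a non-overlapping partition (illustrative sketch).

    edges: list of undirected edges (u, v), each listed once, no self-loops.
    community_of: dict mapping every node to its community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)     # edges with both ends in the community
    degree_sum = defaultdict(int)   # total degree of the community's nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )
```

For two triangles joined by a single edge, splitting the graph into the two triangles gives a modularity of 5/14, while placing every node in one community gives 0.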
The number of topics
The graph of football network.
The graph of polbooks network.
The dolphins network.
In this section, we vary the topic number in experiments on three datasets (football, polbooks, and dolphins). Figure
The performance of detected communities with
To make the communities more intuitive, Figure
The communities for
In this section, we compare different optimization algorithms on the three network datasets above (dolphins, polbooks, and football). We compare the number of communities, the size of communities, the runtime, and the semantic concentration obtained with the PSO algorithm, Genetic Algorithm (GA), Ant Colony Optimization (ACO), and Simulated Annealing (SA). The result is shown in Figure
The performance of different optimization algorithms.
The comparison of different optimization algorithms on dolphins (the black nodes are overlapping nodes). Panels: PSO, GA, ACO, SA.
In this section, we compare
The diagrams of comparison on the constriction factor with
To account for possible bias in semantic community detection, we also run classical nonsemantic algorithms, using the football dataset as an example.
We choose GN, FN, LFM, and COPRA as the classical nonsemantic algorithms, of which LFM and COPRA are overlapping community detection algorithms. The
The classical nonsemantic algorithms on
Algorithms | | | |
---|---|---|---|
GN | 0.4615 | 0.3573 | 0.3873 |
FN | 0.4061 | 0.3174 | 0.4012 |
LFM | 0.3255 | 0.2331 | 0.3625 |
COPRA | 0.5407 | 0.4115 | 0.3902 |
PSO-LDA | 0.5132 | 0.4258 | 0.4842 |
The detected communities with classical nonsemantic algorithms on football. Panels: GN, FN, LFM, COPRA.
From the result in Table
In this section, we compare different real datasets, including the Quantifying Link Semantics-Publication (QLSP) dataset (805 nodes), the Academic Social Network (ASN) dataset (2500 nodes extracted) (
The results of classical nonsemantic algorithms under various datasets.
Algorithms | | QLSP | ASN | DBLP(A) | DBLP(B) | Enron |
---|---|---|---|---|---|---|
GN | | 0.3107 | 0.2103 | 0.2822 | 0.3193 | 0.3256 |
| | 0.2309 | 0.2054 | 0.2137 | 0.2863 | 0.2874 |
| | 10 | 35 | 17 | 16 | 27 |
FN | | 0.4215 | 0.2234 | 0.3191 | 0.2618 | 0.3475 |
| | 0.3134 | 0.1711 | 0.2912 | 0.2561 | 0.2994 |
| | 10 | 33 | 19 | 16 | 26 |
LFM | | 0.3668 | 0.2403 | 0.4052 | 0.3613 | 0.4153 |
| | 0.3167 | 0.2172 | 0.3317 | 0.3121 | 0.3572 |
| | 12 | 29 | 21 | 12 | 30 |
COPRA | | 0.4196 | 0.1213 | 0.383 | 0.4112 | 0.4559 |
| | 0.2891 | 0.1124 | 0.2971 | 0.3217 | 0.4007 |
| | 13 | 31 | 21 | 13 | 26 |
PSO-LDA | | 0.3248 | 0.2112 | 0.3537 | 0.2998 | 0.3401 |
| | 0.3412 | 0.2734 | 0.3641 | 0.3569 | 0.3989 |
| | 14 | 30 | 23 | 15 | 27 |
The histogram of
The histogram of
In this paper, we presented a novel community detection algorithm, PSO-LDA, that combines topological structure with semantic information. It avoids the need to preset the number and the size of communities. Because Gibbs sampling is used to solve for the hidden parameters of the proposed model, the sampling result approaches the realistic state. The main contribution of this research is how to use different similarity measures to measure the similarity between nodes based on topological structure and semantic information. In future work, we plan to apply the model in fields such as privacy protection and worm containment in semantic social networks.
The data used to support the findings of this study are available from the corresponding author upon request.
The authors declare that they have no conflicts of interest.
This work is sponsored by National Natural Science Foundation of China (61402126), Nature Science Foundation of Heilongjiang province of China (F2016024), Heilongjiang Postdoctoral Science Foundation (LBH-Z15095), University Nursing Program for Young Scholars with Creative Talents in Heilongjiang Province (UNPYSCT-2017094), Heilongjiang Province Foundation for Returned Scholars (LC2018030), and National Training Programs of Innovation and Entrepreneurship for Undergraduates (201810214020). The paper is also supported by China Natural Science Fund.