A Naming Game-Based Method for the Location of Information Source in Social Networks

We study herein the problem of the location of the information propagation source in social networks based on the network topology and a set of observations. We propose a concise and novel method to accurately locate the source of information using naming game theory. This study introduces the design of a dynamic deployment method that reduces considerably the number of observations and the time needed to locate the source. Moreover, it calculates the probability of each node that acts as a source based on the information provided by observations. This method can be potentially applied to various information propagation models. The simulation results reveal that the method is able to estimate the information source within a small number of hops from the true source.


Introduction
The popularity of social software increases user access to information and improves the speed of information propagation. Such social software not only facilitates our daily lives but also makes us vulnerable to risks. For example, misleading information can prompt people to make incorrect judgments and can even jeopardize social security and stability [1]. To control the emergence and propagation of misleading information and to reduce the damage it causes, it is important to locate the information source accurately.
Many prior studies on information propagation have concentrated on the localization of information sources. Most of them were based on limited knowledge of the network structure and the states of some of the nodes. Early methods were designed to work on treelike networks. They relied on the identification of spanning trees in networks and regarded the roots of the spanning trees as the propagation source [2][3][4]. Dariusz et al. [5] analyzed rumor source detection on three Twitter networks. The study described an empirical investigation for finding the position of the rumor teller and calculating the length of the propagation path. In addition, some methods detected rumor sources by measuring node centralities derived from the network topology. For instance, Shah and Zaman [6] proposed the rumor centrality to identify the rumor source and claimed that the node with the maximum closeness centrality was the rumor source. Zhu and Ying [7] proved that the rumor source was the Jordan infection center with the minimum infection eccentricity. Subsequently, Chen et al. [8] extended the Jordan center technique to detect multiple rumor sources. Wang [9] utilized a novel reverse dissemination method that narrowed down the scale of suspicious sources and introduced an analytical model for rumor spreading in time-varying social networks. Shelke and Attar [10] calculated the Spearman centrality based on the positive correlation between the informing time of nodes and the source-node geodesic distance. The algorithm possessed a high location accuracy with low time complexity. Jiang et al. [11] developed algorithms to estimate infection sources with a subset of timestamps where the number of sources was unknown. Furthermore, it was demonstrated that the algorithms could estimate both single and multiple sources.
To improve the localization accuracy, many of the previous methods obtained information about the propagation process from observations. The diverse range of observations led to the emergence of various methods in this field [12]. According to the literature, there are three main categories of observations [13]. (1) Complete observations: given a time t during the propagation, this type of observation presents the exact state of each node in the network [13]. (2) Snapshot observations: a snapshot provides partial knowledge of the network status at a given time t [14]. For instance, it recognizes all the infected nodes or only a set of nodes. (3) Sensor observations: the sensors record details of the propagation, including the observation states, state transition times, and propagation directions [15]. However, owing to the huge scale of social networks, it is very difficult to apply the methods mentioned above to set observations in the process of real-world information transmission.
In this study, we propose a novel observation setting method inspired by the naming game theory [16,17]. Based on interactions between nodes, the naming game simulates the process of naming an object with a single word [18,19]. Specifically, a pair of neighboring nodes is chosen for a conversation, whereby one of them is the speaker and the other is the listener. Both the speaker and the listener have memories. Initially, the speaker chooses a word from its memory and transmits it to the listener as the name of the unknown object. If the transmitted word also exists in the listener's memory, then the two nodes reach consensus, and only that word is kept by both nodes. In contrast, if the listener does not have the transmitted word in its memory, then the conversation fails, and the listener learns that word and stores it in memory. This process continues until either (a) the entire population converges to a single word or (b) the process fails after a sufficiently long period of communication.
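The speaker and listener interaction described above can be sketched in a few lines of Python. This is a minimal illustration rather than the formulation used in [16-19]: the function names, the edge-list representation, and the rule that a speaker with an empty memory invents a new word are assumptions made for the example.

```python
import random


def naming_game_round(memories, edges, rng):
    """Play one conversation; return True if the pair reaches consensus."""
    speaker, listener = rng.choice(edges)
    if rng.random() < 0.5:  # either endpoint may act as the speaker
        speaker, listener = listener, speaker
    if not memories[speaker]:
        # Assumption (not stated in the text): an empty memory invents a word.
        memories[speaker].add(f"w{speaker}")
    word = rng.choice(sorted(memories[speaker]))
    if word in memories[listener]:
        # Success: both nodes keep only the agreed word.
        memories[speaker] = {word}
        memories[listener] = {word}
        return True
    # Failure: the listener learns the word and stores it.
    memories[listener].add(word)
    return False


def run_naming_game(nodes, edges, max_rounds=10_000, seed=0):
    """Repeat conversations until one shared word remains or rounds run out."""
    rng = random.Random(seed)
    memories = {v: set() for v in nodes}
    for step in range(1, max_rounds + 1):
        naming_game_round(memories, edges, rng)
        first = memories[nodes[0]]
        if len(first) == 1 and all(m == first for m in memories.values()):
            return memories, step  # case (a): global consensus
    return memories, None  # case (b): no consensus within the budget
```

Calling `run_naming_game([0, 1, 2], [(0, 1), (1, 2), (0, 2)])` returns the final memories together with the round at which consensus was reached, or `None` if the round budget was exhausted.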
In a similar manner to the naming game, information propagation depends on the interactions between individuals. In social networks, individuals can remember whether each of their neighbors has spread the information, and if so, they know the spreading time.
Thus, in our observation setting method, if an individual is selected as an observation, it is defined as the speaker in the naming game. Accordingly, this individual reveals to the listener the neighbor who first spread the information to it. Based on the content provided by the observations, we designed a method that locates the source of information propagation over multiple iterations. We updated the probability of each node being the source at each iteration. Our method sets the observations during the process of source location rather than at the beginning of the location. The advantage of this approach is that we can use as few observations as possible to achieve a higher accuracy of source location. The purpose of this study is to accurately locate the propagation source with as few observations as possible.
Therefore, we design a method to set up observations as needed. Instead of focusing on the specific state of each observation, the study focuses on the state of its neighbors.
This method does not need to know the parameters of the propagation process and is not limited to a particular type of propagation.
This study provides a new method for the location of the propagation source and achieves good accuracy. The rest of this study is organized as follows. Section 2 introduces the model for the information source location in social networks with the naming game. Section 3 describes the simulation results with analyses and comparisons. Section 4 concludes the study.

Materials and Methods
In this section, we briefly introduce preliminary knowledge pertaining to the information source location, including the network structure, and the classical propagation models. Subsequently, we apply the naming game theory to locate the unique information source.

Network Structure and Classical Propagation Model.
In general, the network within which a process is propagating is modeled as an undirected and unweighted graph $G = (V, E)$, where $V = \{v_1, v_2, \ldots, v_N\}$ is the node set and $E$ represents the links between the nodes. Based on these contact networks, the classical propagation models, such as SI [20], SIS [21], and SIR [22], have been investigated. However, real information propagation is much more complicated than the classical models, and it is difficult to know the propagation model beforehand when locating the information source. We assume that a propagating process on $G$ follows any of the three models. We then select a node as the information propagation source (denoted by $s$). The information begins from $s$ and quickly propagates in the network. Without knowing the propagation dynamics or any parameter, our method can estimate the source of the information propagation.
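To make the setting concrete, the following sketch simulates an SI-type process on an adjacency-list graph and records, for every informed node, the neighbors that first delivered the information to it (the quantity each observation later reports). It is an illustration under stated assumptions: the discrete-time update, the per-contact probability `beta`, and the function name `simulate_si` are choices made for the example, not details taken from the paper.

```python
import random


def simulate_si(adj, source, beta, steps, seed=0):
    """Discrete-time SI spreading on an undirected graph.

    adj:    dict mapping each node to a list of its neighbors
    beta:   assumed per-contact transmission probability per step
    Returns (informed_at, local_sources), where local_sources[v] is the set
    of neighbors that informed v at the step in which v was first informed.
    """
    rng = random.Random(seed)
    informed_at = {source: 0}
    local_sources = {source: set()}  # the true source has no local source
    for t in range(1, steps + 1):
        newly = {}
        for u in sorted(informed_at):
            for v in adj[u]:
                if v not in informed_at and rng.random() < beta:
                    newly.setdefault(v, set()).add(u)
        # Commit after the sweep so that all transmissions in step t are
        # treated as simultaneous.
        for v, senders in newly.items():
            informed_at[v] = t
            local_sources[v] = senders
    return informed_at, local_sources
```

Because several neighbors can reach a node in the same step, a node may have more than one local source, which is the situation handled later when the local source set contains more than one node.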

Locating the Information Source.
In this section, we propose a naming game-based approach that can naturally locate the source node on the underlying topology based on communication interactions. We assume that the individual or organization that detects the propagation source can communicate with all nodes in the network. It can be considered as an extra node that is connected with all the nodes in the network and is denoted by $n^*$. Let $s$ represent a random variable whose prior distribution is uniform over the node set $V$. Effectively, this means that any node in the network is equally likely to be the source. Our goal is to locate the unknown source node $s$ at a certain time irrespective of which information propagation model is adopted. We denote $P(t) = \{p_i(t)\}_{i=1}^{N}$ as the probability vector, whereby $p_i(t)$ represents the probability that node $i$ is the propagation source at the $t$-th iteration. Initially, for all $i \in \{1, 2, \ldots, N\}$, $p_i(0) = 1/N$.
At each time step $t$, we design a way to update the value of $P(t)$. First, an observation is determined according to the naming game theory. Specifically, $n^*$ is always picked as the listener, and then a speaker node $ns_t$ is randomly picked from a set of candidate nodes $C = \{c_1, \ldots, c_{|C|}\} \subset V$. Initially, $C = V$. The node $ns_t$ is removed from $C$ after it has been selected as a speaker. At the same time, the speaker node $ns_t$ is also an observation. The observations help us obtain the propagation information effectively and locate the source node accurately. We define a local source as a node that first spreads the information to an observation. In the scenario of information propagation, each individual knows which neighbor is its local source. Thus, the speaker $ns_t$ tells the listener $n^*$ its local sources. We assume that all the speakers are honest. This means that the information that $ns_t$ transfers to $n^*$ is authentic. Compared with other observation setting methods, our method does not determine all the observations beforehand; instead, it determines one observation at each iteration. Then, we let $F_t = \{f_{t1}, \ldots, f_{t|F_t|}\} \subset V$ denote the set that contains the neighbors of speaker $ns_t$ and $S_t = \{s_{t1}, \ldots, s_{t|S_t|}\} \subset V$ denote the local source set of $ns_t$. If $S_t = \emptyset$, none of the nodes in $F_t$ is the source node. Otherwise, if $S_t \neq \emptyset$, none of the nodes in the set $F_t - S_t$ is the source node. Thus, we update the value of $P(t)$ according to equation (1), where $n_z$ represents the number of nodes that are not in $Q$ and that have nonzero probabilities of being the source at time $t$. In addition, $Q$ is a set of nodes whose contents differ between situations, as expressed in equation (2); in the two cases above, $Q = F_t$ when $S_t = \emptyset$ and $Q = F_t - S_t$ when $S_t \neq \emptyset$. According to the local source set $S_t$ and the network topology, the propagation direction of some edges may be determined. A direction indicates that information may spread from the start node to the end node.
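The elimination step can be illustrated with a short sketch. The exact form of the update in equation (1) is not reproduced in this text, so the code below makes an explicit assumption: nodes in the excluded set receive probability zero, and the remaining nodes with nonzero probability share the mass uniformly, each receiving $1/n_z$. The function and variable names are hypothetical.

```python
def eliminate_and_renormalise(p, neighbors, local_sources):
    """One elimination update of the source-probability vector.

    p:             dict node -> current probability of being the source
    neighbors:     neighbors of the current speaker (the set F_t)
    local_sources: local sources reported by the speaker (the set S_t)
    """
    # Excluded set Q: F_t when S_t is empty, F_t - S_t otherwise.
    # A single set difference covers both cases.
    excluded = set(neighbors) - set(local_sources)
    survivors = {i for i, pi in p.items() if pi > 0 and i not in excluded}
    n_z = len(survivors)
    if n_z == 0:
        raise ValueError("every candidate was eliminated")
    # ASSUMPTION: uniform redistribution over the n_z surviving candidates.
    return {i: (1.0 / n_z if i in survivors else 0.0) for i in p}
```

Starting from the uniform prior $p_i(0) = 1/N$, repeated calls with successive speakers shrink the set of candidate sources; a node that has been eliminated once stays at zero.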
We let $CF_t = \{cf_{t1}, \ldots, cf_{t|CF_t|}\} \subset V$ denote the set that contains the common neighbors of the observation node $ns_t$ and its local sources. If $CF_t \neq \emptyset$, the propagation direction must be from the nodes in $S_t$ to the common neighbors.
Thus, we transform the undirected graph into a directed graph by removing the edges along which the information cannot have propagated. Specifically, if $S_t = \emptyset$, the edges starting from the neighbors of $ns_t$ should be removed. Otherwise, if $S_t \neq \emptyset$, the edges whose start node is in the set $F_t - S_t$ and whose target node is in the set $S_t$ should be removed. Moreover, if $|S_t| > 1$, the edges whose start and target nodes are both in $S_t$ should be removed. A simple example illustrating the process of determining edge directions is shown in Figures 1(a)-1(d). The green node $v_1$ is a speaker, which is regarded as an observation. The orange nodes $v_3$ and $v_6$ are the local sources of node $v_1$. Nodes $v_3$ and $v_6$ propagate the information at the same time, so the edges between nodes $v_3$ and $v_6$ can be removed. Node $v_5$ is the common neighbor of nodes $v_1$ and $v_3$. Node $v_5$ cannot have spread the information before node $v_3$ (otherwise, $v_5$ would itself be a local source of $v_1$), so the direction of the edge between node $v_5$ and node $v_3$ must be from $v_3$ to $v_5$. Similarly, the direction of the edge between nodes $v_6$ and $v_4$ must be from $v_6$ to $v_4$.
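The three removal rules can be sketched as a filter over directed edge pairs. The code is an illustration only: the function names are hypothetical, the first rule is read as removing every edge that starts at a neighbor of the speaker, and the toy graph is loosely modeled on the example above rather than copied from Figure 1.

```python
def all_directed(adj):
    """Represent each undirected edge {u, v} as two directed pairs."""
    return {(u, v) for u in adj for v in adj[u]}


def orient_edges(directed, adj, speaker, local_sources):
    """Drop directed edges ruled out by one observation.

    directed:      set of (start, target) pairs still considered possible
    adj:           undirected adjacency lists
    local_sources: the set S_t reported by the speaker
    """
    F = set(adj[speaker])
    S = set(local_sources)

    def possible(u, v):
        if not S:
            # Rule 1 (interpretation): neighbors of the speaker spread nothing.
            return u not in F
        if u in F - S and v in S:
            return False  # Rule 2: a later neighbor cannot feed a local source
        if len(S) > 1 and u in S and v in S:
            return False  # Rule 3: local sources did not inform one another
        return True

    return {(u, v) for (u, v) in directed if possible(u, v)}


# Hypothetical toy graph: node 1 is the speaker, nodes 3 and 6 its local sources.
toy_adj = {1: [3, 4, 5, 6], 3: [1, 5, 6, 7], 4: [1, 6],
           5: [1, 3], 6: [1, 3, 4], 7: [3]}
toy_directed = orient_edges(all_directed(toy_adj), toy_adj, 1, {3, 6})
```

In the toy graph, the pairs `(5, 3)`, `(4, 6)`, `(3, 6)`, and `(6, 3)` are removed, so the edge between nodes 3 and 5 survives only in the direction from 3 to 5.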
We denote $[u, v]$ as the shortest path between nodes $u$ and $v$. Let $d(u, v)$ be the length of the shortest path between $u$ and $v$ in $G$. It is obvious that each node $s_{tr}$ in $S_t$ receives the information earlier than $ns_t$. When $d(i, ns_t) = \infty$ or $d(i, s_{tr}) = \infty$, node $i$ cannot propagate the information to the observation $ns_t$. Thus, node $i$ cannot be the source of propagation, and we update the value of $P(t)$ according to equation (1). When $d(i, ns_t) \neq \infty$ and $d(i, s_{tr}) \neq \infty$, we denote $d(i, S_t)$ as the distance from node $i$ to set $S_t$. It can be calculated by $d(i, S_t) = \max(d(i, s_{tr}))$ for all $s_{tr} \in S_t$. Furthermore, if the difference $d(i, ns_t) - d(i, S_t)$ is larger than zero, the nodes in $S_t$ would receive the information prior to node $ns_t$. Thus, a larger value of this difference gives node $i$ a greater probability of being the propagation source. If $S_t \neq \emptyset$, we update the probability according to equation (3), where $\varepsilon$ is the minimum value of the difference $d(i, ns_t) - d(i, S_t)$ over all nodes $i$ whose $p_i(t) \neq 0$. As shown in Figure 1(e), nodes 4 and 5 cannot propagate the information to local sources 3 and 6. Then, we calculate the difference between the distances from nodes 1, 2, 3, and 7 to the observation and to its local sources, respectively. Afterwards, we calculate the probability that they are the source according to equation (3). Finally, we conclude that node 7 is the source. Generally, we need multiple observations to accurately locate the information source. Thus, we define the set $M(t)$ to contain the indices of the nodes with the highest probabilities of being the source. Hence, we have $M(t) = \{\, i : p_i(t) = \max_{j} p_j(t) \,\}$.
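The reachability test and the distance difference can be computed with a breadth-first search on the directed graph. The sketch below stops at the difference $d(i, ns_t) - d(i, S_t)$ and deliberately does not convert it into a probability, because the form of equation (3) is not reproduced in this text. The function names and the successor-list representation `succ` are assumptions made for illustration.

```python
import math
from collections import deque


def bfs_distances(succ, start):
    """Hop distances from start along directed edges (unreached nodes omitted)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in succ.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_differences(succ, nodes, speaker, local_sources):
    """Return {i: d(i, speaker) - d(i, S_t)} for nodes that can reach both.

    Nodes that cannot reach the speaker or some local source are left out;
    in the method they are ruled out as sources.
    """
    diffs = {}
    for i in nodes:
        dist = bfs_distances(succ, i)
        d_speaker = dist.get(speaker, math.inf)
        # d(i, S_t) is the largest distance from i to any local source.
        d_set = max((dist.get(s, math.inf) for s in local_sources),
                    default=math.inf)
        if math.isinf(d_speaker) or math.isinf(d_set):
            continue
        diffs[i] = d_speaker - d_set
    return diffs
```

On a small directed graph with edges 7 to 3, 3 to 1, and 3 to 5, speaker 1 and local source 3, only nodes 7 and 3 remain candidates, each with a difference of one hop; nodes 1 and 5 cannot reach the local source and are dropped.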

The aforementioned process is repeated until the probability vector $P(t)$ satisfies the stopping condition, where $A$ is a constant integer larger than unity, defined as the number of times the iteration result is unchanged. We summarize the proposed method for the localization of the information source in Algorithm 1.
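One plausible reading of the stopping rule is sketched below. The stopping equation itself is not reproduced in this text, so the criterion coded here is an assumption: the iteration stops once the set of most probable nodes has stayed the same over the last $A$ consecutive updates. Both function names are hypothetical.

```python
def top_candidates(p):
    """Set of nodes whose probability equals the current maximum."""
    best = max(p.values())
    return {i for i, pi in p.items() if pi == best}


def should_stop(history, A):
    """history: list of top-candidate sets, oldest first.

    ASSUMPTION: stop when the latest set equals each of the A sets that
    precede it, i.e. the result was unchanged A times in a row.
    """
    if len(history) < A + 1:
        return False
    latest = history[-1]
    return all(m == latest for m in history[-(A + 1):])
```

A driver loop would append `top_candidates(p)` to `history` after every update and stop as soon as `should_stop(history, A)` returns `True`; a larger `A` demands a longer stable run and therefore more observations.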

Simulation Results
To evaluate the performance of the proposed algorithm, we considered the four real social networks listed in Table 1. The direction of some edges is determined based on the information provided by the observations. The more edge directions are determined, the more accurate the result will be. Thus, it is beneficial to select as observations the nodes that can determine more edge directions, such as nodes with a large degree and nodes with a large clustering coefficient. Without loss of generality, we randomly chose a node as the source to propagate a message and localized the source in 500 runs. Similar to the work of Jiang et al. [13], the parameters were set to $\alpha = 0.75$ and $\beta = 0.5$. Our proposed method does not focus on the state transition of the nodes. Thus, it can be applied to locate the source without considering the propagation model. To illustrate the capacity of the proposed method to localize the source accurately, we chose the SIS propagation model on the above four social networks.

Accuracy of Information Source Location.
We use the error distance and the average error distance between the real and the estimated sources to assess the accuracy of the method proposed in Section 2. We denote by $\phi$ the error distance, that is, the number of hops on the shortest path between the real and estimated sources [24]. The smaller the value of $\phi$, the better the performance of the method. Specifically, if $\phi = 0$, our method accurately identifies the real source node. Figure 2 shows the distribution of the error distance $\phi$ in four real social networks. When the parameter $A = 1$, our method achieves an accuracy of 50% in the case of the Enron network, 78% in the case of the Simmons81 network, 87% in the case of the Hamilton46 network, and 84% in the case of the Wake73 network. As the value of $A$ increases, the accuracy of the algorithm improves gradually. Figure 3 shows the average error distance versus the value of $A$. It can be seen from Figure 3 that the average error distance decreases with $A$.
Thus, increasing $A$ helps locate the source more accurately.
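The error distance and its average over repeated runs can be computed with a breadth-first search on the original undirected graph. The following is a minimal sketch; the function names and the list-of-pairs input format are assumptions made for illustration.

```python
import math
from collections import deque


def hop_distance(adj, a, b):
    """Number of hops on a shortest path from a to b (inf if unreachable)."""
    if a == b:
        return 0
    dist = {a: 0}
    queue = deque([a])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                if v == b:
                    return dist[v]
                queue.append(v)
    return math.inf


def average_error_distance(adj, runs):
    """runs: list of (real_source, estimated_source) pairs, one per run."""
    return sum(hop_distance(adj, s, e) for s, e in runs) / len(runs)
```

An exact localization contributes zero to the average, so an average error distance below one hop means that most runs identify the true source or one of its neighbors.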

Efficiency of the Proposed Method.
Our method has a time complexity of $O(N \cdot R)$, where $R$ is the number of observations. Thus, the number of observations should be used to evaluate the efficiency of our proposed information source localization method. This is because, as the number of observations increases, more iterations are required, and more time is thus expended. We expect to locate the source node accurately with fewer observations. Thus, the smaller the number of observations is, the better the efficiency is. In correspondence with the accuracy plots in Figure 2, Figure 4 shows the ratio of the average number of observations to the total number of nodes in the four social networks (in percent). As $A$ increases, the accuracy of the algorithm improves, and the average number of observations increases. Furthermore, when $A = 6$, the average proportion of nodes needed as observations is less than 0.12 in the case of the Enron network, 0.02 in the case of the Simmons81 network, 0.01 in the case of the Hamilton46 network, and 0.025 in the case of the Wake73 network. The percentage of observations when $A = 3$ in the four social networks is shown in Figure 5, where $B$ is the number of experiments. It is shown that different accuracies can be achieved even if we randomly select the same number of observations. Hence, the choice of observations has a considerable influence on the accuracy of the results. Figures 5(a)-5(d) show that the fluctuation range of each point is reduced considerably, and the number of yellow points is significantly increased. This shows that the network structure has a considerable influence on the results. The results show that a network with a larger average clustering coefficient, a larger average degree, and a smaller diameter achieves a higher accuracy and requires a smaller number of observations.
This is because an observation can determine the propagation direction of more edges in a network with a higher average clustering coefficient and average degree. The more edge directions are determined, the more accurate the results become. Moreover, the larger the diameter of the network is, the more edge directions are needed to accurately determine the location of the propagation source.

Comparison with Other Source Locating Methods.
The experimental results in Sections 3.1 and 3.2 show that our proposed method can achieve high localization accuracy with few observations. Specifically, by randomly selecting 12% of the nodes as observations, the average error distance achieved by our method was less than 0.6 hops. In contrast, the previous work in [15] theoretically proved that the average error distance approached 3.3 hops when the proportion of observations reached 20%. The work in [25] revealed that the average error distance was approximately 1.1 hops when the proportion of observations reached 20%. Therefore, our proposed method outperformed the methods proposed in [15,25] in terms of both the accuracy and the number of required observations.

Conclusions and Future Work
In this study, we investigated the problem of localizing the information source in social networks. We first introduced a novel deployment scheme that considerably reduced the number of required observations by setting observations as needed. We then proposed an information source localization method based on the naming game theory. In this method, the network was transformed from an undirected to a directed graph by determining the propagation direction of some of the edges. Accordingly, the propagation source was estimated by calculating the difference between the distances from each node to the observations and to their local sources. The experimental results showed that our method could achieve better localization accuracy with a small number of observations. Moreover, the proposed method constitutes a general source localization method for information propagation, which is not limited by the propagation model. Therefore, this method has good prospects for practical application. However, further work remains to be done on locating the information source in social networks. Firstly, to achieve better localization accuracy with fewer observations, better observation selection methods should be designed to replace random selection. Secondly, we only considered the source localization problem of single-source information propagation. Future work may therefore investigate the identification of the sources in multiple-source information propagation.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.