Extracting Backbones from Weighted Complex Networks with Incomplete Information

The backbone is a natural abstraction of a complex network and helps people understand a networked system in a simplified form. Traditional backbone extraction methods tend to include many outliers in the backbone. Moreover, they are often computationally inefficient: an exhaustive search over all nodes or edges is often prohibitively expensive. In this paper, we propose a backbone extraction heuristic with incomplete information (BEHwII) to find the backbone of a complex weighted network. First, a strict filtering rule is designed to determine which edges are preserved or discarded. Second, we present a local search model that examines only part of the edges in an iterative way and relies on local, incomplete knowledge rather than a global view of the network. Experimental results on four real-life networks demonstrate the advantage of BEHwII over the classic disparity filter method in terms of effectiveness and efficiency.


Introduction
Complex networks have become an important approach for understanding systems involving interacting objects [1]. Networked systems have permeated a wide spectrum of domains, ranging from biology and automatic control to computer science [2, 3]. As networked systems grow increasingly large, understanding and revealing the underlying phenomena taking place in them poses considerable challenges. The backbone is a signature, or an abstraction, of the nature of a complex system and can greatly help in understanding it in a more simplified form [4]. For example, detecting the backbones of criminal networks helps to better target suspects [5]. Likewise, urban planners examine the topologies of public transport systems by analyzing their backbones [6].
Recent years have witnessed an increasing interest in extracting backbones from large-scale weighted networks of various kinds [4, 7-9]. As many networks are growing to large scale and their weight distributions span several orders of magnitude, extracting backbones from them has become a critical task for research and applications. In general, the backbone should be thought of as a set of nodes and edges that interconnect various pieces of the network, providing a path for the exchange of information between different subnetworks [10]. Thus, a promising way to extract a backbone is to map the original network into a smaller network, in which the numbers of nodes and edges are small enough to be amenable to analysis and visualization.
In the literature, the existing methods can be roughly divided into two categories: those based on coarse graining and those based on filtering. The methods based on coarse graining [4, 7, 11-14] clump nodes sharing common attributes together in the same group/community and then consider the whole group as one single unit in the new network. However, there is often no clear statement on whether properties of the initial network should be preserved in the network of clusters [15].
The filter-based methods [8, 9, 16-18] typically employ a bottom-up strategy to extract the backbone. They start by defining a statistical property of a node or an edge, and this property is then used as a criterion to determine which nodes/edges are preserved or discarded. In this case, the observation scale is fixed and the representation that the network symbolizes is not changed. Instead, those elements (nodes and edges) that carry relevant information about the network structure are kept while the rest are discarded. However, the filter-based methods may include a multitude of outliers, which naturally should not be part of the backbone. Moreover, they are often computationally inefficient: the exhaustive search of all nodes or edges is often prohibitively expensive.
In this work, we attempt to design a novel filter-based method for extracting backbones from large-scale weighted networks. Unlike the exhaustive search adopted by the existing methods, the proposed approach needs only incomplete information and invokes an iterative local search scheme to improve efficiency. We therefore call this method the backbone extraction heuristic with incomplete information (BEHwII). In particular, although the probability $\alpha_{ij}$ proposed in [8] is employed as the filtering criterion, BEHwII imposes max instead of min to strengthen the filtering rule, so that extracting too many outliers into the backbone can be avoided. Our method is naturally a heuristic, since it does not examine all edges in the network. Instead, BEHwII greedily selects the best candidate edge in each iteration and adds this edge to the backbone if the predefined max filtering rule is satisfied. Extensive experiments on various real-world networks demonstrate the superiority of BEHwII over the global filtering method in terms of effectiveness and efficiency.
The remainder of this paper is organized as follows. In Section 2, we introduce the preliminaries and motivation of this work. In Section 3, we discuss the local search mechanism and then present the algorithmic details of BEHwII. Experimental results are given in Section 4. We present the related work in Section 5 and finally conclude this paper in Section 6.

Preliminaries and Motivation
Since the proposed method for backbone extraction is in essence a filter-based model, we begin by providing the preliminary knowledge about filter-based models. We then analyze some drawbacks of existing filter-based methods, which leads to a better understanding of the motivation of this paper.
Filter-based models typically employ a bottom-up strategy to extract the backbone. They start by defining a statistical property of a node or an edge, and this property is then used as a criterion to determine which nodes/edges are preserved or discarded. As a result, the preserved nodes and their links, or the preserved edges and their endpoints, compose the backbone of the network. The key step in filter-based methods is therefore how to define a reasonable filtering property for nodes/edges. For instance, the $k$-core is a well-known filtering property that is used to construct a hierarchical topological filter in [16]. However, many simple filtering properties (e.g., the $k$-core) are not suitable for weighted networks. Moreover, real-world weighted networks usually exhibit strongly disordered, heavy-tailed weight distributions [19]. That is, the probability distribution $P(w)$ that any given link carries a weight $w$ is broad, spanning several orders of magnitude. This feature makes it challenging to define a filtering property for weighted networks, due in large part to the lack of a characteristic scale. Serrano et al. [8] addressed this challenge by introducing the disparity filter, which is based on the following null hypothesis: the normalized weights that correspond to the connections of a certain node of degree $k$ are produced by a random assignment from a uniform distribution. Given a node $i$ and its associated link with weight $w_{ij}$, the normalized weight $p_{ij}$ is defined as

$$p_{ij} = \frac{w_{ij}}{\sum_{l} w_{il}}. \qquad (2)$$

Based on (2), given an edge, the probability $\alpha_{ij}$ that its normalized weight $p_{ij}$ is compatible with the null model can be defined as

$$\alpha_{ij} = 1 - (k_i - 1)\int_{0}^{p_{ij}} (1 - x)^{k_i - 2}\,dx, \qquad (3)$$

where $k_i$ is the degree of node $i$. Thus, $\alpha_{ij}$ is adopted as the filtering criterion for weighted networks in [8]. Given a significance level $\alpha$, the edges whose weights are not compatible with a random distribution can be identified with a certain statistical significance. That is, edges with $\alpha_{ij} < \alpha$ should be kept, since they reject the null hypothesis.
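As an illustration of this criterion, the following minimal Python sketch computes $\alpha_{ij}$ from both endpoints of every edge. It assumes the disparity-filter form of [8], $\alpha_{ij} = 1 - (k_i - 1)\int_0^{p_{ij}} (1 - x)^{k_i - 2}\,dx$, whose integral evaluates to $(1 - p_{ij})^{k_i - 1}$. The function name `disparity_alphas` and the `(i, j, w)` edge-list input are assumptions made for this example only.

```python
from collections import defaultdict


def disparity_alphas(edges):
    """Return {(i, j): alpha_ij} for both directions of each undirected edge.

    edges: iterable of (i, j, w) tuples with w > 0 (assumed input format).
    """
    adj = defaultdict(dict)
    for i, j, w in edges:
        adj[i][j] = w
        adj[j][i] = w

    alphas = {}
    for i, nbrs in adj.items():
        k = len(nbrs)                 # degree k_i
        s = sum(nbrs.values())        # strength of node i
        for j, w in nbrs.items():
            p = w / s                 # normalized weight p_ij
            # Closed form of 1 - (k-1) * integral_0^p (1-x)^(k-2) dx.
            # For k = 1 this gives 1.0, so a degree-1 node never rejects the null model.
            alphas[(i, j)] = (1.0 - p) ** (k - 1)
    return alphas


# Hypothetical toy graph: node "a" has one dominant link and two weak ones.
toy_edges = [("a", "b", 8.0), ("a", "c", 1.0), ("a", "d", 1.0)]
toy_alphas = disparity_alphas(toy_edges)
```

On this toy graph, the dominant link seen from node "a" gets $\alpha_{ab} \approx 0.04$, whereas the same link seen from the degree-1 node "b" gets $\alpha_{ba} = 1$, which already hints at the asymmetry between the two endpoints of an edge.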
The criterion $\alpha_{ij}$ gave rise to an effective filter-based method for backbone extraction [8]. However, two drawbacks have attracted our attention. First, the method may include a multitude of outliers, which naturally should not be part of the backbone. In what follows, we explore the cause of this drawback and give a modified scheme.
For node $i$ with degree $k_i$, the level of local heterogeneity in the weights can be calculated as

$$\Upsilon(k_i) = k_i \sum_{j} p_{ij}^{2}.$$

Thus, under perfect homogeneity, when all the links share the same amount of the strength of the node, $\Upsilon(k_i)$ equals 1 independently of $k_i$, while under perfect heterogeneity, when just one of the links carries the whole strength of the node, $\Upsilon(k_i)$ equals $k_i$. Under the predefined null model, the joint probability distribution for two intervals can be defined as

$$\rho(x, y)\,dx\,dy = (k_i - 1)(k_i - 2)(1 - x - y)^{k_i - 3}\,\Theta(1 - x - y)\,dx\,dy,$$

where $\Theta(\cdot)$ is the Heaviside step function. This distribution can be used to calculate the statistics of $\Upsilon_{\mathrm{null}}(k_i)$ for the null model. The average $\mu(\Upsilon_{\mathrm{null}}(k_i))$ and the variance $\sigma^{2}(\Upsilon_{\mathrm{null}}(k_i))$ are

$$\mu(\Upsilon_{\mathrm{null}}(k_i)) = \frac{2k_i}{k_i + 1}, \qquad \sigma^{2}(\Upsilon_{\mathrm{null}}(k_i)) = k_i^{2}\left(\frac{20 + 4k_i}{(k_i + 1)(k_i + 2)(k_i + 3)} - \frac{4}{(k_i + 1)^{2}}\right).$$

In real networks, the observed level of local heterogeneity, denoted by $\Upsilon_{\mathrm{ob}}(k_i)$, can be compared against the null model expectations. Namely, the observed values are compatible with the null hypothesis when they lie between the perfect homogeneity and $\mu(\Upsilon_{\mathrm{null}}(k_i)) + a \cdot \sigma(\Upsilon_{\mathrm{null}}(k_i))$. Local heterogeneity is recognized only if $\Upsilon_{\mathrm{ob}}(k_i)$ obeys

$$\Upsilon_{\mathrm{ob}}(k_i) > \mu(\Upsilon_{\mathrm{null}}(k_i)) + a \cdot \sigma(\Upsilon_{\mathrm{null}}(k_i)).$$

The parameter $a$ is a constant determining the confidence interval for the evaluation of the null hypothesis. The larger it is, the more restrictive the null model becomes and the more disordered the weights must be for local heterogeneity to be detected. A typical value of $a$, in analogy to Gaussian statistics, is 2. In Figure 1, we show the two regions (local heterogeneity and local compatibility) associated with different $k_i$. Nodes with small degree (e.g., $k_i < 5$) are more likely to fall into the locally compatible region, which implies that those nodes should not be preserved in the backbone.
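The heterogeneity test can be sketched in a few lines of Python. The sketch assumes the disparity-filter statistics of [8]: $\Upsilon(k) = k\sum_j p_{ij}^2$, null-model mean $2k/(k+1)$, and null-model variance $k^2\left[(20 + 4k)/((k+1)(k+2)(k+3)) - 4/(k+1)^2\right]$. The function names and the example weights are hypothetical.

```python
import math


def upsilon(weights):
    """Observed local heterogeneity k * sum_j p_ij^2 for one node's link weights."""
    k = len(weights)
    s = sum(weights)
    return k * sum((w / s) ** 2 for w in weights)


def null_mean_std(k):
    """Mean and standard deviation of Upsilon under the uniform null model."""
    mean = 2.0 * k / (k + 1)
    var = k * k * ((20.0 + 4.0 * k) / ((k + 1) * (k + 2) * (k + 3))
                   - 4.0 / (k + 1) ** 2)
    return mean, math.sqrt(max(var, 0.0))  # guard against rounding below zero


def is_locally_heterogeneous(weights, a=2.0):
    """True if the observed heterogeneity exceeds mean + a * std of the null model."""
    mean, std = null_mean_std(len(weights))
    return upsilon(weights) > mean + a * std


# Hypothetical nodes of degree 3: one with equal weights, one with a dominant link.
homogeneous = is_locally_heterogeneous([1.0, 1.0, 1.0])
dominated = is_locally_heterogeneous([98.0, 1.0, 1.0])
```

With $a = 2$, the evenly weighted node stays in the region compatible with the null model, while the node dominated by a single link is flagged as locally heterogeneous.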
In [8], the multiscale backbone is obtained by preserving all the links that beat the significance level $\alpha$ for at least one of the two nodes at the ends of the link while discarding the rest. Notice that $\alpha_{ij}$ is not symmetric; that is, in general,

$$\alpha_{ij} \neq \alpha_{ji}.$$

In the case of a node $i$ with degree $k_i < 5$ connected to a node $j$ with degree $k_j \gg 5$, we might have $\alpha_{ji} < \alpha < \alpha_{ij}$. This link will then be preserved, since $\min(\alpha_{ij}, \alpha_{ji}) < \alpha$ holds. However, as discussed above, node $i$ is likely to fall into the locally compatible region and should be kept out of the backbone. Because power-law-like degree distributions are commonly observed in real systems, low-degree nodes are abundant, and the disparity filter in [8] may therefore include a multitude of outliers. To avoid including many outliers in the backbone, one can impose max instead of min to strengthen the filtering rule, so that a connection is preserved only when its intensity is significant for both nodes involved.
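To make the difference between the two rules concrete, the sketch below compares them on a single hypothetical edge between a low-degree node and a hub. It uses the closed form $(1 - p)^{k - 1}$ of the disparity-filter probability; the degrees, weight shares, significance level, and helper names are illustrative assumptions only.

```python
def alpha_from(p, k):
    """Disparity-filter probability for a link carrying share p at a node of degree k."""
    return (1.0 - p) ** (k - 1)


def keeps(a_ij, a_ji, level, rule):
    """Apply the loose (min) or strict (max) filtering rule to one edge."""
    combine = min if rule == "min" else max
    return combine(a_ij, a_ji) < level


# Node i: degree 2, the edge carries 60% of its strength (not significant).
a_ij = alpha_from(0.6, 2)
# Node j: a hub of degree 20, the edge carries 50% of its strength (highly significant).
a_ji = alpha_from(0.5, 20)

level = 0.05
kept_by_min = keeps(a_ij, a_ji, level, "min")   # rule of the disparity filter in [8]
kept_by_max = keeps(a_ij, a_ji, level, "max")   # strict rule
```

Here $\alpha_{ji} < \alpha < \alpha_{ij}$, so the min rule keeps the edge (and with it the low-degree node), while the max rule discards it.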
Second, most of the existing filter-based methods [8, 9, 16, 17] suffer from computational inefficiency caused by the exhaustive search of all nodes or edges in a network. For example, the cost of the filtering method based on $\alpha_{ij}$ grows with the number of links. As many social networking sites evolve into very large scales, containing millions or even billions of nodes and edges, such computation becomes prohibitively expensive. According to the above analysis, this paper proposes a local method for extracting backbones from weighted networks. In particular, we try to answer the following two questions: (i) Q1: how can a filtering criterion be designed to avoid including many outliers in the backbone? (ii) Q2: how can the computational complexity of the backbone extraction algorithm be reduced?

Figure 1: the two regions (local heterogeneity; compatible with the null model), bounded by the perfect heterogeneity and the perfect homogeneity.

Backbone Extraction Heuristic with Incomplete Information (BEHwII)
Let $G = (V, E, W)$ be a given weighted graph, where $V$ is the set of nodes ($|V| = n$), $E$ is the set of edges ($|E| = m$) that connect the nodes in $V$, and $W$ gives the weight of every edge in $E$. Backbone extraction is formulated as finding a subgraph $G_b = (V_b, E_b)$, that is, the backbone, where $|E_b| \ll |E|$ and $\forall e_{ij} \in E_b$, $\alpha_{ij} < \alpha$. This implies that the backbone should significantly reduce the number of edges while preserving the most essential connections.
In this section, we propose a backbone extraction heuristic with incomplete information (BEHwII for short). First, we introduce the basic idea of BEHwII, covering the local search mechanism. Second, we present the algorithmic details of BEHwII, including the complexity analysis.

Local Search Model.
In this paper, we employ the filtering criterion $\alpha_{ij}$ proposed in [8]. However, one major drawback is that it is likely to include too many outliers in the backbone, as stated in Section 2. We argue that this drawback originates from the looseness of the filtering rule, that is, $\min(\alpha_{ij}, \alpha_{ji}) < \alpha$. Therefore, BEHwII imposes max instead of min to strengthen the filtering rule, so that a connection is preserved only when its intensity is significant for both nodes involved. In BEHwII, an edge $e_{ij}$ is preserved in the backbone if

$$\max(\alpha_{ij}, \alpha_{ji}) < \alpha, \qquad (8)$$

where $\alpha_{ij}$ is the probability derived by comparing the normalized weight $p_{ij}$ with the null model, as shown in (3). With this filtering rule, BEHwII aims to extract a certain percentage (denoted by $\%E_b$) of edges satisfying (8) as the backbone.
A straightforward way to extract the backbone is to apply an exhaustive search, that is, to examine all of the edges one by one and add an edge to the backbone whenever (8) is satisfied. This exhaustive search is computationally inefficient, especially when the network becomes large. Here, we introduce a local search model to solve this problem. We divide the explored graph into three regions: the known local area C, the boundary area B, and a larger unknown area U, as illustrated in Figure 2. Initially, we randomly select a node $v_s$ as the start node and add $v_s$ to C. Then, all neighbors of the nodes in C (e.g., of $v_s$) are added to B. The local search model selects the edge $e_{ij}$ with the minimum value of $\max(\alpha_{ij}, \alpha_{ji})$ from C ∪ B and adds it to the backbone if it satisfies (8). Areas C and B are expanded accordingly. Another edge is then selected and checked, and the process repeats until a certain number of edges have been included in the backbone.
Remark 1. The local search model is in essence a streaming and iterative scheme [20]. An iterative process examines each node along with its neighbors and performs a computation whose result is associated with the processed node. Such a scheme is a promising technique for scaling the existing method. Moreover, the local search model is independent of "global knowledge"; that is, it only needs to fetch part of the node adjacency lists into main memory. Owing to the small-world effect, our model depends only weakly on the choice of the initial node; the corresponding experimental results are given in Section 4.1.

Algorithmic Details.
In this section, we describe how BEHwII extracts the backbone starting from any randomly selected node. BEHwII initially places the randomly selected source node $v_s$ into the known local area (C ← {$v_s$}) and adds its neighbors to B. The two data structures used in BEHwII are as follows:
(i) Min-heap $H$, which stores the edge information in C ∪ B, including $e_{ij}$ and $\max(\alpha_{ij}, \alpha_{ji})$, so that every update takes $O(\log |H|)$ time;
(ii) List $E_b$, which stores the edges of the backbone, so that every insertion takes $O(1)$ time.

Algorithm 1: BEHwII algorithm.
      …
      end if
      if |C| ≥ … then
         break;
      end if
   end while
   return $E_b$;
end procedure
We describe the BEHwII algorithm step by step as follows.
Step 1. Find the edge $e_{ij}$ with the minimal value of $\max(\alpha_{ij}, \alpha_{ji})$ in C ∪ B and add it to $E_b$ if it satisfies (8).
Step 2. If any endpoint of the considered edge $e_{ij}$ is not included in C ($\exists t \in \{i, j\}$, $v_t \notin$ C), move $v_t$ from B to C; otherwise, delete edge $e_{ij}$ and return to Step 1.
Step 3. Delete edge $e_{ij}$ and remove additional nodes (…
The above process continues until it has agglomerated a certain percentage of edges or it has discovered the entire enclosing component, whichever happens first. Note that if the edge $e_{ij}$ with the minimal value of $\max(\alpha_{ij}, \alpha_{ji})$ in Step 1 does not satisfy (8), we still check its endpoints and add the corresponding edges to C ∪ B. Here, the endpoints of $e_{ij}$ serve as intermediate nodes through which the search process continues. See Algorithm 1 for more exact pseudocode.
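The steps above can be illustrated with a simplified Python sketch of a heap-driven local search. It is not the authors' Algorithm 1: the function name `local_backbone_sketch`, the `(i, j, w)` edge-list input, the stopping test on a target number of backbone edges, and the use of the closed form $(1 - p_{ij})^{k_i - 1}$ for $\alpha_{ij}$ are all assumptions made for this example. The full adjacency structure is built up front for brevity, but the search only reads the adjacency lists of nodes it has reached or that border them.

```python
import heapq
from collections import defaultdict
from itertools import count


def local_backbone_sketch(edges, source, level=0.5, target_edges=10):
    """Greedy local search; returns (backbone_edges, number_of_examined_edges)."""
    adj = defaultdict(dict)
    for i, j, w in edges:
        adj[i][j] = w
        adj[j][i] = w

    def alpha(i, j):
        # Closed form of the disparity-filter probability seen from node i.
        k, s = len(adj[i]), sum(adj[i].values())
        return (1.0 - adj[i][j] / s) ** (k - 1)

    known = set()    # area C: nodes whose adjacency lists were fetched
    seen = set()     # edges already pushed onto the heap
    heap = []        # candidate edges, keyed by max(alpha_ij, alpha_ji)
    tie = count()    # tie-breaker so node labels are never compared
    backbone = []

    def expand(v):
        # Move v into C and push its not-yet-seen incident edges.
        known.add(v)
        for u in adj[v]:
            key = frozenset((v, u))
            if key not in seen:
                seen.add(key)
                score = max(alpha(v, u), alpha(u, v))
                heapq.heappush(heap, (score, next(tie), v, u))

    expand(source)
    while heap and len(backbone) < target_edges:
        score, _, i, j = heapq.heappop(heap)
        if score < level:            # strict max rule
            backbone.append((i, j))
        for v in (i, j):             # keep exploring even if the edge is rejected
            if v not in known:
                expand(v)
    return backbone, len(seen)


# Hypothetical toy graph: a heavy triangle a-b-c with a light tail a-d-e.
toy = [("a", "b", 10.0), ("a", "c", 10.0), ("b", "c", 10.0),
       ("a", "d", 1.0), ("d", "e", 1.0)]
from_hub, examined_hub = local_backbone_sketch(toy, "a", level=0.6, target_edges=3)
from_leaf, examined_leaf = local_backbone_sketch(toy, "e", level=0.6, target_edges=3)
```

In this toy example both start nodes lead to the same three triangle edges; starting from the hub examines four of the five edges, while starting from the leaf has to traverse the tail first and examines all five.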
Computational Complexity. The main computational cost of the above algorithm is determined by the number of examined edges. For each examined edge $e_{ij}$, BEHwII needs to calculate the value of $\max(\alpha_{ij}, \alpha_{ji})$ and update the min-heap $H$. Because $\max(\alpha_{ij}, \alpha_{ji})$ depends on the degrees of nodes $v_i$ and $v_j$ and on the normalized weights $p_{ij}$ and $p_{ji}$, it takes $O(k_i + k_j)$ time to calculate it for each examined edge. The cost of updating $H$ (inserting or deleting) for each examined edge is $O(\log |H|)$. In general, the running

Experimental Results
Four real-world undirected and weighted networks, Lesmis, USAir97, OClinks, and RTNN, are used for experiments. RTNN is built from online stories about the September 11 attack, where each node represents a word and each tie means that the two words appear in the same story.

Comparison Results.
In this subsection, we compare BEHwII with the disparity filter (DF for short) proposed by Serrano et al. [8] in terms of performance and scalability. BEHwII is a local search based algorithm, which can start from any randomly selected source node. To investigate the impact of the source node $v_s$, we fix $\alpha = 0.5$ and take $v_s = v_h$ and $v_s = v_l$, respectively, where $v_h$ is a high-connected node and $v_l$ is a low-connected one. Both $v_h$ and $v_l$ are randomly selected from the original network. For convenience, we denote BEHwII starting from $v_h$ by BEHwII$_h$ and BEHwII starting from $v_l$ by BEHwII$_l$. For a given extraction goal (the percentage of edges kept in the backbone), the effectiveness of BEHwII$_h$, BEHwII$_l$, and DF is validated by measuring the average weight and the node betweenness of the extracted backbones, while the efficiency is measured by the number of examined edges and the overall running time.
Effectiveness. Figure 3 shows the average weight of the backbones extracted from the original graphs by BEHwII$_h$, BEHwII$_l$, and DF, respectively. Note that, as the only parameter of DF is $\alpha$, for a given network the fraction of extracted edges $\%E_b$ is a monotonically increasing function of $\alpha$. For convenient comparison, both DF and BEHwII use the same parameter $\%E_b$, which is gradually increased so that the number of extracted edges grows accordingly. Two observations from Figure 3 are noteworthy. First, compared with DF, BEHwII$_h$ shows slight improvements in terms of the average weight, regardless of the value of $\%E_b$. BEHwII$_l$ does not perform well when $\%E_b$ is too small. For instance, BEHwII$_l$ obtains the $\%E_b = 0.1$ backbone of the Lesmis network with an average weight lower than 10, whereas the backbones extracted by BEHwII$_h$ and DF have significantly higher average weights. Second, BEHwII$_h$ and BEHwII$_l$ converge once $\%E_b$ grows beyond a certain level. As can be seen from Figures 3(a) and 3(b), when the fraction of edges grows to around 0.25, the backbones extracted by BEHwII$_h$ and BEHwII$_l$ have the same average weight. Because BEHwII$_l$ adds the locally optimal edge to the backbone, it can reach several high-connected nodes within a limited number of steps even though it starts from a low-connected source node. Therefore, BEHwII$_l$ behaves like BEHwII$_h$ after a certain percentage of edges has been discovered.
We then explore the average node betweenness in the backbones extracted from Lesmis, USAir97, OClinks, and RTNN. Node betweenness centrality is the fraction of all shortest paths in the network that contain a given node, which reflects the connectedness of the node. Figure 4 shows the average betweenness of the extracted nodes for different fractions of edges $\%E_b$ in the backbones. Both BEHwII$_h$ and BEHwII$_l$ clearly outperform DF on all of the test graphs. This implies that the edges extracted by BEHwII tend to lie between two high-connected nodes. As for DF, the filtering rule is so loose that some outliers (nodes with degree equal to 1) are included in the backbones, which lowers the connectedness of the extracted backbone.
We then take a direct look at the extracted backbones. The Lesmis and USAir97 networks are used here as two examples. We set $\%E_b = 0.25$ and $\alpha = 0.5$ for BEHwII$_h$. In the case of Lesmis, the backbone obtained by BEHwII$_h$ is shown in Figure 5(a). The source node is colored green, the nodes and edges colored blue are those kept in the backbone, the size of a node expresses its strength ($\sum_{j} w_{ij}$), and the thickness of an edge represents its weight. Interestingly, the backbone obtained by BEHwII$_h$ preserves almost all high-connectivity nodes and essential connections. We then apply DF directly to this network and obtain the backbone shown in Figure 5(b). The clique-like pattern at the top is missed and, moreover, two outliers (highlighted by dashed circles) are kept. As for the USAir97 network, nodes are placed in the plane according to their actual geographic coordinates. The backbone extracted by BEHwII$_h$, shown in Figure 5(c), covers almost all the geographic regions of the USA. In addition, the hierarchy of the transportation system is fully highlighted, including not just the highest-flux connections but also small-weight edges that are statistically significant because they represent relevant signal at small scales. In contrast, the backbone extracted by DF includes many small airports in Alaska and on the west coast of the USA (highlighted by dashed ellipses).
The Efficiency. Figure 6 compares the efficiencies of BEHwII and DF, given the extraction goal $\%E_b = 0.25$. The numbers of edges examined by BEHwII$_h$, BEHwII$_l$, and DF for the four test networks are shown in Figure 6(a). BEHwII$_h$ and BEHwII$_l$ examine fewer edges than DF, which examines all nodes and edges in the network. Figure 6(b) verifies our analysis in Section 3.2; that is, the running time of BEHwII is determined by the number of examined edges. It is interesting to find that the running times of BEHwII$_h$ and BEHwII$_l$ remain nearly the same on relatively large, dense graphs (e.g., OClinks and RTNN). This is because those two networks exhibit the "small world" effect [23, 24], in which most nodes can be reached from each other in a small number of hops or steps. In this context, both BEHwII$_h$ and BEHwII$_l$ can rapidly reach the high-connected nodes; therefore, their overall running times are almost identical.

Inside BEHwII.
Here, we take a further step to explore several factors that affect the performance of BEHwII. We select BEHwII starting from a high-connected source node, that is, BEHwII$_h$, for the experiments. Two internal factors are investigated: the significance level $\alpha$ and the inside filtering rule.
The Significance Level $\alpha$. It is particularly interesting to analyze how the topological properties of the backbones extracted by BEHwII$_h$ behave at different values of the significance level $\alpha$. Figures 7(a) and 7(b) show the cumulative degree distributions, and Figures 7(c) and 7(d) show the weight distributions, from which we observe that the original USAir97 and OClinks networks are both heavy tailed. Interestingly, almost all scales are kept during the search process until BEHwII$_h$ becomes too restrictive, that is, until it applies a very small value of $\alpha$. A restrictive BEHwII$_h$ cuts $P(w)$ off below a certain weight, which may discard the region of small weights. Finally, we analyze the cumulative node betweenness centrality distributions of the extracted backbones. It is worth mentioning that the betweenness centrality of a node in the backbone is taken to be its value in the original network. Figures 7(e) and 7(f) give the evolution of the cumulative betweenness centrality distribution with different $\alpha$. For both test graphs, the cumulative distribution starts from a very low value if BEHwII$_h$ applies a very small value of $\alpha$, which implies that low-connected nodes are not included in the backbones.
Therefore, we can conclude that values of $\alpha$ in the range [0.4, 0.8] are optimal, in the sense that the backbones extracted by BEHwII$_h$ in this range contain a large proportion of high-connectivity nodes and essential connections and have stable degree/weight distributions compared with the original network. It is important to stress that BEHwII$_h$ also includes the connections with the largest weights present in the network. This is because the heavy tail of the $P(w)$ distribution is mainly determined by the relevant large-scale weights, as clearly illustrated in Figures 7(c) and 7(d).
The Inside Filtering Rule. We further explore the critical factor that contributes to the success of BEHwII$_h$. As discussed in Section 3.1, BEHwII$_h$ uses a strict filtering rule to absorb edges. Here, we relax this rule by imposing min instead of max, so that a connection is preserved whenever its intensity is significant for at least one of the nodes involved. In this loose variant (denoted by BEHwII$_h^*$), an edge $e_{ij}$ is preserved in the backbone if $\min(\alpha_{ij}, \alpha_{ji}) < \alpha$. We visualize the backbones of Lesmis and USAir97 extracted by BEHwII$_h^*$ in Figure 8. For each test network, we set $\alpha = 0.5$ and $\%E_b = 0.25$. In the case of the Lesmis network, six outliers (highlighted by dashed circles) are extracted by BEHwII$_h^*$, and it also fails to discover many essential connections. Comparing Figures 8(a) and 5(a) shows that its performance is worse than that of BEHwII$_h$. BEHwII$_h^*$ does better in the case of USAir97, as most regions of the USA are covered by the extracted backbone shown in Figure 8(b). However, like DF, it still includes many small airports in Alaska and on the west coast of the USA (highlighted by dashed ellipses).

Related Work
In the literature, the existing backbone extraction methods fall into two categories: methods based on coarse graining and filter-based methods. The methods based on coarse graining clump nodes sharing common attributes together in the same group/community and then consider the whole group as one single unit in the new network. Methods along this line include the box-covering technique [4], the fractal skeleton [7], and traditional community detection techniques such as the Kernighan-Lin algorithm [11], latent space models [12], stochastic block models [13], and modularity optimization [14]. The differences between these methods ultimately come down to the precise definition of a community. However, there is often no clear statement on whether properties of the initial network are preserved in the network of groups.
The filter-based methods typically employ a bottom-up strategy to extract the backbone. They start by defining a statistical property of a node or an edge, and this property is then used as a criterion to determine which nodes/edges are preserved or discarded. In this case, the observation scale is fixed and the representation that the network symbolizes is not changed. Instead, those elements (nodes and edges) that carry relevant information about the network structure are kept while the rest are discarded. A well-known example of a hierarchical topological filter is the $k$-core decomposition [16], whose filtering rule acts on the connectivity of the nodes. In the case of weighted networks, two basic reduction techniques are the extraction of the minimum spanning tree [17] and the application of a global threshold [18] on the edge weights, so that only those edges that beat the threshold are preserved. However, real-world weighted networks usually have strongly disordered, heavy-tailed weight distributions, which makes it challenging to define the filtering property. Serrano et al. [8] addressed this challenge by introducing the disparity filter based on the null hypothesis.
In summary, although backbone extraction methods based on coarse graining and on filtering have been extensively studied, they all need knowledge of the entire network. Further study is still needed to find a good balance between performance and efficiency. Our work attempts to fill this void by conducting backbone extraction with the efficient BEHwII method.

Conclusion
In this work, we propose a backbone extraction heuristic with incomplete information (BEHwII) to find the backbone of a complex weighted network. First, a strict filtering rule is designed to determine which edges are preserved or discarded. Second, we present a local search model that examines only part of the edges in an iterative way and relies on local, incomplete knowledge rather than a global view of the network. Experimental results on four real-life networks demonstrate the advantage of BEHwII over the classic disparity filter method in terms of effectiveness and efficiency.

Figure 2: Illustration for the local search.

Figure 3: Comparison in terms of the average weight.

Figure 4: Comparison in terms of the average betweenness.

Figure 5: Comparison in terms of network visualizations.
Figures 7(a) and 7(b) show the evolution of the cumulative degree distribution, $P_c(k) = \sum_{k_i \le k} P(k_i)$, with different values of $\alpha$ for USAir97 and OClinks, respectively. The backbones extracted by BEHwII$_h$ have cumulative degree distributions similar to those of the original networks. Smaller values of $\alpha$ have flat startups, indicating that the extracted backbones contain fewer low-degree nodes. The evolution of the weight distribution $P(w)$ with different values of $\alpha$ is shown in Figures 7(c) and 7(d).

Table 1: Real-world networks for experiments.

In Table 1, $|V|$ and $|E|$ indicate the numbers of nodes and edges, respectively, in the network, $\langle k \rangle$ indicates the average degree, and $\langle w \rangle$ indicates the average weight.
…alized distance among two airports. OClinks [23] is a network created from an online community, where nodes represent students at the University of California and edges are established between two students if one or more messages have been sent from one to the other. RTNN [24] is also a coappearance network including all words/terms in online