Quasi-Closeness: A Toolkit for Social Network Applications Involving Indirect Connections

We propose a penalty, in the form of exponential decay, on the number of vertices that a path passes through, which reconciles the opposing effects of geodesic length and edge weights. This core idea is the key to handling three typical applications: given an information demander, (i) choosing the optimal route to contact a single supplier, (ii) picking out the best supplier among multiple candidates, and (iii) calculating the demander's point centrality in a way that involves indirect connections. Accordingly, three concrete solutions along one logical thread are proposed. First, by adding a constraint to the Dijkstra algorithm, we limit the candidates for the optimal route to the sample space of geodesics. Second, we propose a unified standard for comparing adjacent and nonadjacent vertices: through the exponential-decay penalty, the attenuation effect caused by the number of vertices that a path passes through is offset, so adjacent vertices and penalized nonadjacent vertices can be compared directly. Finally, a new centrality index, quasi-closeness, is obtained by summing up direct and indirect connections.


Introduction
Among the many social network analysis (SNA) methods, point centrality has received particular attention from network researchers. Point centrality undertakes the task of identifying important and insignificant actors, a key application of graph theory. As early as 1934, Moreno tried to distinguish "stars" (people who draw more attention in a network) and "outsiders" (people who are neglected by others in a network) quantitatively [1]. Over the years, network researchers have developed many centrality variants [2,3]. Measuring centrality in various respects, these indices have proved to be of great value in understanding the roles of vertices in networks [4]. Numerous efforts have been made to classify centrality indices. In influential research, Freeman [5] gave particular importance to degree [5-7], closeness [8-10], and betweenness [11,12]. Along with eigenvector centrality [13], these four indices have become the most famous measures of point centrality.
In SNA terminology, a geodesic is the shortest path between a given pair of vertices, and there may exist more than one geodesic. To some extent, a geodesic is the most effective way for a vertex to communicate with another. A number of centrality indices based on shortest paths are widely used, such as closeness, betweenness, and harmonic centrality [14], the last designed to deal with unconnected graphs (graphs containing vertices with no path to some others). As a rapidly expanding interdisciplinary field, SNA has encountered a variety of new application contexts that the existing centrality indices cannot handle. Theoretically, there is no centrality index that takes both direct and indirect connections into account in undirected valued graphs (if the geodesic distance between a pair of vertices equals 1, they are directly connected; otherwise they are indirectly connected). Specifically, strength centrality can deal with valued graphs but loses indirect connection information, while closeness and betweenness involve direct and indirect connections but are not feasible in valued graphs.
We propose a quasi-closeness centrality index that is applicable to valued graphs. Bavelas [15] and Leavitt [16] showed that communication efficiency decreases with distance, which means that transmission fades with distance. Following this thread, many researchers have proposed centrality indices based on shortest paths. In line with other shortest-path centrality indices, our index is based on geodesics as well. We assume that, given a pair of unordered vertices in an undirected valued graph, the optimal path for information spreading or communication must be a geodesic. This premise stems from the idea that the information attenuation caused by a longer distance is serious enough to neutralize the information gain arising from greater weight.
The centrality algorithm we propose can be divided into three parts corresponding to three applications. Firstly, inspired by Beauchamp's [17] observation that geodesic distance seriously affects communication efficiency, we propose an algorithm that differs from Dijkstra's [18] in allowing for the effect of geodesic distance. The algorithm is applicable to finding the optimal route between given vertices, a common task in network analysis.
Secondly, as is frequently the case, an information searcher needs to choose the optimal supplier among many adjacent or nonadjacent candidates. In line with Hubbell [19] and Friedkin [20], who noted that evaluating the importance of a node in a network requires both direct and indirect connections, we advocate that indirect connections should also be considered. To achieve this, we come up with a unified standard for comparing the relative importance of adjacent and nonadjacent vertices and then obtain the priority order of all candidate information suppliers. Specifically, we set a penalty in the form of exponential decay on geodesic distance, with decay indices that are flexible values depending on the concrete application. This unified standard can also be applied to find the optimal partner in a network.
Finally, by summing up the exponentially decayed indirect connections and the direct connections, we obtain a new point centrality index, namely, quasi-closeness centrality. This centrality takes both indirect and direct connections into account and is applicable to weighted networks, such as citation networks, biological networks, and logistics networks.
The rest of the paper is organized as follows. Section 2 gives some preliminary notions closely related to our theme. Section 3 describes the algorithm. Section 4 describes simulation results. Section 5 concludes.
Preliminary Notions

An undirected graph G = (V, E) consists of a vertex (also called node) set V and a set E of undirected edges (also called links or connections). An edge represents the tie between an unordered pair of vertices. If there is an edge between vertices i and j, we say that i and j are adjacent. The graph is connected if every pair of vertices is linked by a path.
Given a pair of vertices (i, j) in a connected graph, if they are adjacent, the distance between them equals one; otherwise they must be linked through other vertices. In this paper we focus on geodesics between vertices. A geodesic is the shortest path between a pair of vertices. In Figure 1, vertices 1 and 5 are not adjacent but are linked through other vertices; the geodesic is path (1, 2, 4, 5) rather than (1, 2, 3, 4, 5). Let d(1, 5) denote the geodesic distance, or simply the distance, between vertices 1 and 5, that is, the number of edges on the geodesic; then d(1, 5) = d(5, 1) = 3.
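Computing a geodesic distance is a plain breadth-first search on the unweighted graph. A minimal sketch follows; the adjacency list is an assumed reconstruction of the Figure 1 graph based on the two paths named in the text:

```python
from collections import deque

def geodesic_distance(adj, source, target):
    """Breadth-first search for the geodesic (shortest-path) distance,
    counted as the number of edges between two vertices."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            return dist[v]
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return float("inf")  # no path: the pair is unconnected

# Assumed Figure 1 graph: (1,2,4,5) is a geodesic, (1,2,3,4,5) is longer.
adj = {1: [2], 2: [1, 3, 4], 3: [2, 4], 4: [2, 3, 5], 5: [4]}
print(geodesic_distance(adj, 1, 5))  # 3, matching d(1,5) = 3 in the text
```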
Perhaps the simplest centrality index is degree, the number of vertices adjacent to a given vertex. Degree is often calculated in a scaled form:

C_D(i) = (1 / (n - 1)) * Σ_{j ≠ i} a_ij,

where C_D(i) is the degree centrality of vertex i and n is the number of vertices in the graph; a_ij equals one if vertices i and j are adjacent and zero otherwise. This scaled degree ranges between 0 and 1.
Closeness centrality measures how closely an actor is linked to others in the network. The idea is that the centrality of a vertex decreases as geodesic distance increases. What makes closeness different from degree centrality is that it depends not only on direct ties but also on indirect ties; that is to say, closeness picks up the information on nonadjacent vertices that degree centrality loses. Sabidussi [9] proposed closeness as

C_C(i) = (n - 1) / Σ_{j ≠ i} d(i, j),

where, as mentioned above, d(i, j) is the geodesic distance between i and j. The standardized closeness ranges from 0 to 1.
Though containing more information than degree, closeness is not feasible for valued graphs. In an undirected valued graph G = (V, E; W), w_ij is the value of the corresponding edge, representing distance or strength between adjacent vertices and always positive. Strength centrality, which measures the connection strength of a given vertex to other vertices, is commonly used in valued graphs. It assumes that w_ij stands for the connection strength between unordered vertices i and j; that is to say, a higher value of w_ij represents a stronger connection, and w_ij = 0 means there is no connection between the vertices. Strength centrality can be calculated as

C_S(i) = Σ_{j ≠ i} w_ij,

where w_ij is the value of the connection between vertices i and j. Strength centrality C_S is a general form of degree centrality C_D: the only difference is that the former takes connection strength into account, but it loses the information on indirect connections as well.
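The three classical indices above can be sketched in a few lines. The 4-vertex valued graph `W` and its hand-computed geodesic distance matrix below are illustrative assumptions, not data from the paper:

```python
# A small 4-vertex valued graph: W[i][j] is the edge value (0 = no edge).
W = [[0, 2, 1, 0],
     [2, 0, 3, 0],
     [1, 3, 0, 1],
     [0, 0, 1, 0]]

def degree_centrality(W, i):
    """Scaled degree: fraction of the other n-1 vertices adjacent to i."""
    n = len(W)
    return sum(1 for j in range(n) if j != i and W[i][j] > 0) / (n - 1)

def strength_centrality(W, i):
    """Strength: sum of edge values incident to i (the diagonal is zero)."""
    return sum(W[i][j] for j in range(len(W)))

def closeness_centrality(dist, i):
    """Standardized closeness: (n-1) over the sum of geodesic distances."""
    n = len(dist)
    return (n - 1) / sum(dist[i][j] for j in range(n) if j != i)

# Geodesic distances of the unweighted version of W, worked out by hand.
dist = [[0, 1, 1, 2],
        [1, 0, 1, 2],
        [1, 1, 0, 1],
        [2, 2, 1, 0]]

print(degree_centrality(W, 2))       # 1.0: vertex 2 touches all others
print(strength_centrality(W, 2))     # 5: 1 + 3 + 1
print(closeness_centrality(dist, 0)) # 3 / (1 + 1 + 2) = 0.75
```

Note how vertex 3 illustrates the difference: its strength is only 1, yet closeness still credits it for its two-step reach to vertices 0 and 1.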
A famous method that considers both weights and indirect ties is the Dijkstra algorithm, conceived by computer scientist Dijkstra [18], a typical algorithm for finding shortest paths between vertices in undirected valued graphs, such as city distance networks. The algorithm exists in many variants; the most common finds the shortest paths from one "source" vertex to all other vertices in a graph, thus forming a shortest-path tree. There are two main differences between strength centrality and the Dijkstra algorithm. Firstly, strength is a point centrality used to evaluate a vertex's position or reputation, while the Dijkstra algorithm is used to find the optimal path between given vertices (finding the optimal path between vertices is indispensable when calculating the indirect connections of a node, which is why it is reasonable to compare the Dijkstra algorithm with strength centrality). Secondly, in the Dijkstra algorithm the weight w_ij represents a barrier to communication between vertices, such as the distance between cities, while in most centrality indices w_ij refers to communication convenience between vertices (of course, one can effortlessly convert communication barriers into communication convenience by taking reciprocals). Furthermore, the Dijkstra algorithm accounts not only for direct connections but also for indirect connections, which makes it more comprehensive than strength centrality. While the Dijkstra algorithm is commonly used in many fields where shortest paths are needed, it has a nonnegligible disadvantage: it fails to account for the cost of constructing a node, such as constructing a logistics center in a logistics network. The quasi-closeness centrality we propose in this paper is similar to the Dijkstra algorithm but overcomes this defect.
Similar to the Dijkstra algorithm, the Floyd algorithm finds the shortest paths between every pair of vertices, with an important improvement: it is applicable to directed graphs and graphs with negative weights.

Algorithm
In this section, we first give a more specific description of the quasi-closeness centrality and its calculating process, and then the main features of quasi-closeness are presented.

Optimal Route between Nonadjacent Vertices.
Consider the optimal route for an information demander to get information from a supplier in a social network like that in Figure 2, which represents an undirected valued graph. Edge values in the network stand for connection strength, that is, the convenience of getting information transmitted; a higher edge value means smoother communication between the vertices.
Our main premise is that information is attenuated as the number of involved vertices increases, which is reasonable in many applications, such as citation networks, biological networks, and communication networks. As noted by Beauchamp [17], actors with short distances to others can be very productive in the information exchange process. Hakimi [23] and Sabidussi [9] quantitatively studied the "minimum steps" of contacting other vertices. In this vein, many researchers related geodesics to centrality, based on the idea that centrality is negatively correlated with distance. As for finding geodesics (i.e., shortest paths), Flament et al. [24] and Harary et al. [25] proposed several clever algorithms, which are standard in network computing programs such as UCINET and Pajek; they can also be implemented more flexibly in programming languages such as R and Python.
As is frequently the case, there is more than one geodesic between a given pair of vertices. In this situation, a clever solution is to give each geodesic equal weight such that the weights add to unity [12, 26-28]. Newman [4] put forward another method assuming that information spreads through some kind of random walk. Though both methods have advantages, they are not applicable to valued graphs.
Another branch beyond the purely geodesic-based methods is the Dijkstra algorithm, proposed by Dijkstra in 1959 [18]. In that paper he came up with a computer program to find all shortest paths from one given vertex to the other nodes in the network, thus forming a shortest-path tree. A typical application of the Dijkstra algorithm is to find the shortest path from one source node to another in a city network. The only difference is that edge values stand for geographical distance (communication barriers) in a city network, while in Figure 2 edge values stand for communication convenience. Certainly, we can effortlessly convert communication barriers into communication convenience through the reciprocal form. Essentially, the Dijkstra algorithm is a practical solution to the optimization problem

min_r L_r(v_1, v_2),

where L_r(v_1, v_2) is the weighted distance between nodes v_1 and v_2 along route r. In Figure 2, edge values represent communication convenience, while in the typical Dijkstra setting edge values represent communication barriers; we take reciprocals to reconcile this inconsistency. In Figure 2, the distance through nodes 2 and 3 is 1/3 + 1/2 + 1 = 11/6, and the distance through nodes 6 and 5 is 1 + 1/3 + 1/3 = 5/3. The route passing through nodes 2, 4, and 5 is then the optimal one to be selected, with distance 1/3 + 1/3 + 1/3 + 1/3 = 4/3.
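The reciprocal-form arithmetic of this worked example can be checked directly. The convenience values of the three candidate routes are assumed from the fractions in the text (Figure 2 itself is not reproduced here):

```python
from fractions import Fraction

def route_distance(conveniences):
    """Dijkstra-style weighted distance of a route: edge values measure
    communication convenience, so each barrier is the reciprocal value."""
    return sum(Fraction(1, c) for c in conveniences)

# Convenience values per edge, assumed from the worked example in the text.
print(route_distance([3, 2, 1]))     # 11/6, the route through nodes 2 and 3
print(route_distance([1, 3, 3]))     # 5/3, the route through nodes 6 and 5
print(route_distance([3, 3, 3, 3]))  # 4/3, the selected optimal route
```

Using exact fractions avoids the floating-point noise that would otherwise make route comparisons such as 5/3 versus 11/6 unreliable.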
In many situations, the cost of forming a node (such as constructing a logistics center) or the attenuation of information transmitted through nodes (as in communication or citation networks) is not negligible, so it is unreasonable to ignore the number of nodes a route passes through. In accordance with Beauchamp's [17] thought, we limit the candidates for the optimal route to the sample space of geodesics. Simply put, we add another constraint to the Dijkstra algorithm; the optimization problem becomes

min_{r ∈ R(v_1, v_2)} L_r(v_1, v_2),

where R(v_1, v_2) is the route set consisting of all geodesics between nodes v_1 and v_2 and L_r(v_1, v_2) is the weighted distance along route r. We choose the optimal route, the one with the shortest weighted distance, from R(v_1, v_2). For example, there are two geodesics between the given pair of vertices in Figure 1; note that edge values represent communication convenience, and the second route is then selected.
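The constrained optimization can be sketched in two steps: enumerate all geodesics by BFS layering, then pick the geodesic with the smallest sum of reciprocal conveniences. The diamond-shaped graph and its convenience values below are hypothetical, not taken from Figure 1:

```python
from collections import deque
from fractions import Fraction

def all_geodesics(adj, s, t):
    """Enumerate every shortest path (geodesic) from s to t: a BFS assigns
    hop distances, then we backtrack along edges that advance one layer."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    paths = []
    def back(path):
        v = path[-1]
        if v == t:
            paths.append(path)
            return
        for u in adj[v]:
            if dist.get(u) == dist[v] + 1:
                back(path + [u])
    back([s])
    return paths

def best_geodesic(adj, conv, s, t):
    """Constrained Dijkstra: restrict candidates to geodesics, then choose
    the one with the smallest sum of reciprocal conveniences."""
    def cost(p):
        return sum(Fraction(1, conv[(min(a, b), max(a, b))])
                   for a, b in zip(p, p[1:]))
    return min(all_geodesics(adj, s, t), key=cost)

# Hypothetical diamond graph: two geodesics 0-1-3 and 0-2-3.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
conv = {(0, 1): 3, (1, 3): 3, (0, 2): 1, (2, 3): 1}
print(best_geodesic(adj, conv, 0, 3))  # [0, 1, 3]: cost 2/3 beats cost 2
```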

Exponential Decay for Indirect Connections.

A common situation is that the supplier is not the only information source and that there are several competing candidates for the information demander to choose from. With the optimal route found between a given pair of vertices, the next step is to find the optimal information source node among the candidates. One approach is to obtain the priority order, in terms of communication efficiency, of all the candidate source nodes. It is rather intuitive to sort the adjacent nodes by edge value, while handling nonadjacent nodes is more sophisticated: the number of nodes that the optimal routes pass through varies, and there is no existing unified standard for comparison, either between adjacent and nonadjacent nodes or among nonadjacent nodes.
Our main contribution is providing a unified standard for comparison among nodes, whether or not they are adjacent. We achieve this through a penalty, in the form of exponential decay, on the number of nodes that a route passes through. Specifically, the route between nonadjacent vertices is like a plucked rope: we first homogenize it by calculating its arithmetic average (see Figure 3) and then set exponential decay as the penalty for the information attenuation caused by additional nodes. Formally, let d denote the geodesic distance between vertices i and j; in Figure 3, d equals two. Let w_1, w_2, w_3, ..., w_d denote the values of the edges on the route. Then the connection strength is

w* = ((1/d) * Σ_{k=1}^{d} w_k) / β^(d-1),

where β is the decay index and is equal to or greater than one. The value of β depends on the concrete application and corresponds positively to the construction cost or signal attenuation caused by nodes.
In particular, when the vertices are adjacent, that is, when d equals one, w* is simply the edge value between the vertices.
As w* has been exponentially decayed, eliminating the attenuation effect or construction cost, it is reasonable to compare it directly with edge values. Thus the priority order is obtained, and the aforementioned problem of choosing the best information source node is solved.
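The decayed strength is a one-liner once the route's edge values are known. This sketch assumes the penalty form w* = mean(w) / β^(d-1), our reading of the formula above (it reduces to the bare edge value when d = 1, as the text requires); the edge values and β = 2 are illustrative:

```python
def decayed_strength(edge_values, beta):
    """Quasi-strength of a geodesic: homogenize the route by its arithmetic
    mean, then penalize each extra hop by the decay index beta (>= 1).
    Assumed penalty form: w* = mean(edge values) / beta**(d - 1)."""
    d = len(edge_values)            # geodesic distance (number of edges)
    mean = sum(edge_values) / d     # homogenized ("plucked rope") value
    return mean / beta ** (d - 1)   # exponential-decay penalty

print(decayed_strength([2.5], beta=2))   # 2.5: adjacent, no penalty
print(decayed_strength([3, 3], beta=2))  # 1.5: two hops, mean 3 halved
```

A larger β models applications where each intermediate node is costly (e.g., constructing a logistics center); β = 1 recovers the plain route average.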

Quasi-Closeness Centrality.

An important application of social network analysis is to identify the "most important" vertices in a network. As early as 1934, Moreno tried to distinguish "stars" (people who draw more attention in a network) and "outsiders" (people who are neglected by others in a network) quantitatively. Hubbell [19] and Friedkin [20] noted that evaluating the importance of a node in a network requires both direct and indirect connections, a point of view accepted and carried forward by later researchers. We propose a new centrality index that takes both direct and indirect connections into account; moreover, it is applicable to weighted networks, which are common in modern applications. The preparations in the previous subsections make it straightforward to calculate quasi-closeness centrality: given a vertex, what needs to be done is to sum up all direct and (decayed) indirect connection strengths of the vertex. The flowchart is shown in Figure 4.
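Putting the pieces together, a self-contained sketch of the pipeline might look as follows. The penalty form w* = mean(route weights) / β^(d-1) is our reading of the formula in the previous subsection, and the 4-vertex test graph is hypothetical:

```python
from collections import deque
from itertools import combinations

def quasi_closeness(W, beta=2.0):
    """Sketch of the quasi-closeness pipeline: (1) find all geodesics on the
    unweighted version of the graph, (2) score each indirect connection by
    its best decayed strength, assumed form mean(w) / beta**(d-1), and
    (3) sum direct and decayed indirect connections per vertex."""
    n = len(W)
    adj = {i: [j for j in range(n) if W[i][j] > 0] for i in range(n)}
    def geodesics(s, t):
        dist = {s: 0}
        q = deque([s])
        while q:
            v = q.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    q.append(u)
        if t not in dist:
            return []          # unconnected pair contributes nothing
        out = []
        def back(p):
            if p[-1] == t:
                out.append(p)
                return
            for u in adj[p[-1]]:
                if dist.get(u) == dist[p[-1]] + 1:
                    back(p + [u])
        back([s])
        return out
    Q = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        best = 0.0
        for p in geodesics(i, j):
            w = [W[a][b] for a, b in zip(p, p[1:])]
            d = len(w)
            best = max(best, (sum(w) / d) / beta ** (d - 1))
        Q[i][j] = Q[j][i] = best
    return [sum(row) for row in Q]

# Hypothetical 4-vertex valued graph (same shape as earlier examples).
W = [[0, 2, 1, 0],
     [2, 0, 3, 0],
     [1, 3, 0, 1],
     [0, 0, 1, 0]]
print(quasi_closeness(W, beta=2.0))  # [3.5, 6.0, 5.0, 2.5]
```

Note that for adjacent pairs d = 1, so the "decayed" score is just the edge value, and quasi-closeness never falls below strength centrality.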

Simulation

Formatting the Undirected Weighted Networks.
For the sake of simplicity, we set the number of vertices in the network to 10. Without loss of generality, the edge weights are random integers between 0 and 3, where 0 means no connection between the pair of vertices, 1 means weak connection, 2 means medium connection, and 3 means strong connection. That is to say, connection strength is positively correlated with weight, in contrast to the Dijkstra setting, where weights represent barriers. The simulated adjacency matrix is shown in Table 1.
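A minimal way to simulate such a network follows; the seed and the uniform draw over {0, 1, 2, 3} are our assumptions, so the resulting matrix will not reproduce Table 1:

```python
import random

def simulate_network(n=10, seed=42):
    """Simulate a symmetric adjacency matrix with integer weights in
    {0, 1, 2, 3}: 0 = no connection, 1 = weak, 2 = medium, 3 = strong."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    W = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            W[i][j] = W[j][i] = rng.randint(0, 3)  # undirected: mirror
    return W

W = simulate_network()
print(len(W), len(W[0]))  # 10 10
```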
A decorated visualization of the network is shown in Figure 5. The vertex layout is generated according to the circular Reingold-Tilford algorithm [29]. Vertex area is proportional to the strength centrality of the vertex, and edge width is positively correlated with the corresponding weight. In the network, vertex G has the highest degree centrality score and vertex E has the highest strength centrality score.

Finding All Geodesics.
As our quasi-closeness centrality belongs to the branch of geodesic-based centrality indices, a prerequisite for computing it is to find all geodesics between every pair of vertices. In this stage, we reset all existing edges to the same weight, so the original network becomes unweighted. For adjacent vertices there is only one geodesic, while nonadjacent vertices may have more than one. For example, vertex E is adjacent to vertices A, B, C, D, G, H, I, and J, which means they are directly connected, while vertices E and F are not adjacent and are only indirectly connected. As Figure 6 shows, there are six geodesics between vertices E and F: (E, A, F), (E, B, F), (E, C, F), (E, G, F), (E, H, F), and (E, J, F).

Exponential Decay for Indirect Connections.
A main feature of quasi-closeness is that it involves indirect connections, and what makes this reasonable is the penalty for indirectness. We set exponential decay as the penalty, which is feasible in many applications. Among the six geodesics between vertices E and F, geodesic (E, J, F) has the highest weight (2.25) after exponential decay, and this value is used to calculate the quasi-closeness centrality. Table 3 shows the network updated to include indirect connections.

Calculating Quasi-Closeness Centrality.

Similar to the algorithm for strength centrality, the quasi-closeness of a vertex is the sum of the (direct and decayed indirect) weights from the vertex to all others. Table 4 shows quasi-closeness centrality together with other common centrality indices; a higher score means the vertex plays a more important role in the network. Columns Qc, Dc, Sc, Cc, and Bc represent quasi-closeness, degree, strength, closeness, and betweenness centrality, respectively, and the values in parentheses rank the vertices on each index. Intuitively, quasi-closeness should have larger values than strength centrality because of the different ways the two indices treat indirect connections, which is reflected in Table 4. Nevertheless, quasi-closeness and strength produce similar rankings in this simulated network.

Analyzing the Freeman EIES Data.
In this section, we focus on a dataset that arose from an early experiment organized by Freeman (1979) on an Electronic Information Exchange System (EIES); the dataset is available at https://toreopsahl.com/datasets/. We focus on the 32 researchers who completed the experiment; metadata about the researchers and the messages among them during the experiment period were collected. We choose this commonly used and freely available dataset to make our analysis reproducible. The data have also been used by Wasserman. Among the several networks within the dataset, we are interested in the one whose edge values represent the total number of messages one person sent to another over the entire period of the experiment. The original network of interest is directed; we made it undirected by copying the lower triangle over the upper one in the adjacency matrix, considering that the centrality of a researcher can also be evaluated by the number of messages received. To protect privacy, the names of all researchers are replaced by numbers from 1 to 32. Due to the huge differences in vertex strength centrality and edge values, it is not appropriate to display all information in the visualization of Freeman's EIES network, so all vertices have the same area and all edges the same width; however, the relative importance in terms of degree can be identified from the coordinates. In Figure 7, researchers 29 and 31 occupy the central position, reflecting their important roles in the network.
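The symmetrization step described here, copying the lower triangle over the upper one, can be sketched as follows (the 2x2 matrix is a toy illustration, not EIES data):

```python
def undirect_lower(M):
    """Symmetrize a directed adjacency matrix by copying the lower
    triangle over the upper one, keeping the diagonal unchanged."""
    n = len(M)
    return [[M[i][j] if i > j else M[j][i] for j in range(n)]
            for i in range(n)]

# Toy example: entry [0][1] (5 messages sent) is overwritten by
# entry [1][0] (2 messages received).
print(undirect_lower([[0, 5], [2, 0]]))  # [[0, 2], [2, 0]]
```

This choice deliberately privileges the lower triangle; averaging or taking the maximum of the two directions would be equally simple variants.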
Using quasi-closeness as an anchor, all researchers are sorted in descending order. For the sake of simplicity, only the top seven researchers are listed in Table 5. It is obvious that quasi-closeness is generally larger than strength centrality, which is reasonable as quasi-closeness involves indirect connections. Figure 8 displays the scaled scores of quasi-closeness and the other popular centrality indices for the 32 researchers; the original scores are scaled to facilitate comparison. As expected, all five indices show a similar trend in cross-section comparison. The quasi-closeness index always takes moderate values relative to the other indices; as a result, its robustness is manifested.

Conclusion
We have pointed out some new applications that existing centrality indices cannot handle. Given an information demander, there are three common problems: how to choose the optimal route to contact a single supplier, how to pick out the best supplier among multiple candidates, and how to calculate the point centrality of the given demander. We proposed three solutions along one logical thread for these problems. Different from the Dijkstra algorithm, which uses edge weights as the only standard for choosing the optimal route, our solutions allow for the cost of constructing a node (as in a logistics network) or signal attenuation through nodes (as in a wireless network). With respect to the problem of choosing the best supplier among several candidates, we come up with a unified standard for comparison among the candidates, making the problem straightforward. In sum, penalties in the form of exponential decay for the number of nodes that a route passes through are set, and the decay index, which depends on the concrete application, is flexible enough to offset node effects. Finally, based on the previous steps, a new point centrality index is obtained straightforwardly by summing up direct and indirect connections. Since quasi-closeness centrality takes both direct and indirect connections into account, it can be viewed as a more comprehensive index.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.