Group Measures and Modeling for Social Networks

Social network modeling is generally based on graph theory, which allows for the study of dynamics and emergent phenomena. However, in terms of neighborhood, graphs are not necessarily suited to representing complex interactions: the neighborhood of a group of vertices can be inferred from the neighborhoods of the vertices composing that group. In our study, we consider that a group must be regarded as a complex system in which emergent phenomena can appear. In this paper, we propose a formalism to address this problem by modeling groups in social networks with pretopology, a generalization of graph theory. After giving some definitions and modeling examples, we show how some measures used in social network analysis (degree, betweenness, and closeness) can also be generalized to consider a group as a whole entity.


Introduction
Network modeling is an area of research that covers several domains, such as computer science, physics, sociology, and biology. In social network modeling, graphs are often used to describe the links representing relationships or flows between entities [1]. Because they are based on graph theory, most studies consider individuals as single elements, a group being formed by several persons interacting with each other. Most of the few works on modeling groups in social networks consider a group as a combination of persons [2], not as a whole entity. As social network analysis leads to the notion of centrality and to other sociometric features, what about group centrality?
The centrality of a vertex in a graph is widely used to determine the relative "importance" of this vertex within the network [3]. Centrality measures enable us to find users who are extensively involved in relationships with other network members. There are different centralities, such as degree centrality, betweenness centrality, and closeness centrality. The problem we face is the following: a vertex can be analyzed with this kind of measure, but if we analyze a group of persons with the same measure, no particular characteristic emerges, since in a graph the neighborhood of a group is simply the union of the neighborhoods of its members. As social networks are complex networks [4][5][6], emergent phenomena can occur [7], and the behavior of a group of persons can differ from the "sum" of the behaviors of the persons composing it. Some work has tried to capture the different scales of a network, where a group can be viewed as a community [8]; thus, in our opinion, graph theory alone is inadequate to model all the complex interactions occurring in a social network. Some graph-based group measures have been proposed before [2,9,10], and multilayer analysis, which combines multiple graphs, is a widely studied topic [11][12][13]; however, to our knowledge, there is no general framework coupling the modeling of a group of entities with a multirelational network. Based on this observation, we propose to use a mathematical framework for group modeling in social networks that includes several relation types, together with new group-based measures from a topological point of view. This paper is structured in two parts: first, we introduce the pretopology formalism and give definitions and some illustrations of group modeling in a social network, which recall a previous work [14]. Then, we define new group measures with pretopology that generalize known classical methods. The reader is assumed to be familiar with graph theory and with some concepts of mathematical topology.

Modeling Groups in Social Networks
Before giving theoretical definitions, we first illustrate the problem of realistic group modeling with a simple example: we would like to represent the interactions between four persons. A small social network with a simple symmetric friendship relation will help illustrate our approach (Figure 1).
Let us focus on John and Ed. John has three friends: Tim, Ben, and Ed, while Ed has two friends: Ben and John. As we can see, it is very easy to know who the friends of a person are. But now, we want to know who the friends of the group {John, Ed} are. Looking at the nonoriented graph induced by the network, we see Tim in relation with John, and Ben in relation with both John and Ed. As we can reach Tim and Ben from John and Ed, we could say that Tim and Ben are friends of John and Ed. However, in reality, Ed and Tim are not friends. We need to set a constraint saying, for instance, that the friends of the group {John, Ed} must be friends with each person of the group. We then have only Ben as a friend of the group {John, Ed}. The operation we performed in this small example is defining the neighborhood of an element and, more generally, of a set of elements. This example can be extrapolated to large, well-known online social networks such as Facebook or Twitter. Thus, we need a theory able to precisely model the neighborhood of sets: here, we chose the pretopology formalism.
Pretopology is a mathematical modeling tool for the concept of proximity that is suitable for discrete spaces [15,16]. It also provides powerful tools for structure analysis and automatic classification [17]. Pretopology generalizes topology and has weaker axioms, which allows us to analyze transformation procedures step by step and to handle a multirelational description. As the pretopology formalism is based on set theory, a group of individuals is considered as a set, which allows us to treat a group as a whole, independent entity.
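A pretopological space $(E, a)$ is a set $E$ endowed with a pseudoclosure map $a(\cdot): \mathcal{P}(E) \to \mathcal{P}(E)$, that is, a map such that $a(\emptyset) = \emptyset$ and $A \subseteq a(A)$ for all $A \in \mathcal{P}(E)$.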
The pseudoclosure operator is useful to model dynamic processes in the considered space $(E, a)$. In social networks, we can say that the elements belonging to $a(A)$ are close to $A$, which allows us to determine direct neighbors. Each application of the pseudoclosure absorbs new elements ($A \subseteq a(A) \subseteq a^2(A) \subseteq a^3(A) \subseteq \cdots \subseteq a^k(A)$), which are more and more "distant." Hence, we are able to model complex dynamics such as, for instance, an information diffusion process. The definition of the neighborhood can also be based on several different relations (see [16] for the complete mathematical definitions).
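To understand this approach, let us return to our network example with the four friends and define the problem with the pretopology formalism. We build a pretopological space $(E, a)$ with the four persons in $E$, and we build $a(A)$, for $A \in \mathcal{P}(E)$, as the set containing $A$ and every person who is a friend of each member of $A$:

$$a(A) = A \cup \{x \in E - A \mid \forall y \in A,\ x \in R(y)\},$$

where $R(y)$ is the set of friends of $y$. With the same assumptions, we make the same operations as in the previous example: $a(\{\text{John}, \text{Ed}\}) = \{\text{John}, \text{Ed}, \text{Ben}\}$, so Ben is the only friend of the group {John, Ed}.

In a network, the diffusion process modeled by the pseudoclosure can stop its progression. Such a configuration is called a closure [22] (Figure 4).

Definition 3 (closed subset and closure). Let $(E, a)$ be a pretopological space: (i) a subset $F$ of $E$ such that $a(F) = F$ is called a closed subset of $E$ for $a(\cdot)$. The closure of $A \subseteq E$, denoted $F(A)$, is obtained by applying $a(\cdot)$ to $A$ repeatedly until such a fixed point is reached; the notation $F(\cdot)$ corresponds to the closure operator. (ii) An elementary closed subset, denoted $F_x$, is the closure of a one-element set $\{x\}$ of $E$. The notation $F(\{x\})$ is equivalent.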
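The friendship example can be sketched in Python to contrast the graph-style neighborhood (everyone who is a friend of at least one member) with the group pseudoclosure (everyone who is a friend of every member), and to compute a closure by iterating the pseudoclosure until a fixed point. This is a minimal sketch, assuming the network contains only the friendships stated in the text (John-Tim, John-Ben, John-Ed, Ed-Ben); `FRIENDS`, `a_union`, `a_group`, and `closure` are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Set

# Hypothetical encoding of Figure 1, limited to the friendships stated in the text.
FRIENDS: Dict[str, Set[str]] = {
    "John": {"Tim", "Ben", "Ed"},
    "Ed": {"Ben", "John"},
    "Ben": {"John", "Ed"},
    "Tim": {"John"},
}
E: FrozenSet[str] = frozenset(FRIENDS)

Pseudoclosure = Callable[[FrozenSet[str]], FrozenSet[str]]


def a_union(A: FrozenSet[str]) -> FrozenSet[str]:
    """Graph-style neighborhood: A plus every friend of at least one member."""
    return A | {x for y in A for x in FRIENDS[y]}


def a_group(A: FrozenSet[str]) -> FrozenSet[str]:
    """Group pseudoclosure: A plus every person who is a friend of each member of A."""
    if not A:  # a pseudoclosure must map the empty set to itself
        return frozenset()
    return A | {x for x in E - A if all(x in FRIENDS[y] for y in A)}


def closure(a: Pseudoclosure, A: Iterable[str]) -> FrozenSet[str]:
    """Iterate a(.) until a^{k+1}(A) = a^k(A); terminates since E is finite and A ⊆ a(A)."""
    current = frozenset(A)
    while True:
        nxt = a(current)
        if nxt == current:
            return current
        current = nxt


group = frozenset({"John", "Ed"})
print(sorted(a_union(group)))           # ['Ben', 'Ed', 'John', 'Tim']: Tim wrongly included
print(sorted(a_group(group)))           # ['Ben', 'Ed', 'John']: only Ben is added
print(sorted(closure(a_group, group)))  # ['Ben', 'Ed', 'John']
```

The two maps agree on singletons but differ on sets, which is exactly the kind of group-level behavior a union of vertex neighborhoods cannot express.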
Closure is very important because of the information it gives about the "influence" or "reachability" of a set: a set $A$ can, for example, influence or reach elements of $F(A)$, but not further.

Modeling Group Interactions.
First, we have to define a social network with pretopology concepts. A social network is a social structure made of nodes (generally individuals or organizations) that are tied by one or more specific types of binary or valued relations [23].
In pretopology, we can generalize this definition: a (social) network is a family of pretopologies on a given set $E$ (Figure 5) [24]. Relations of different natures can be represented: for instance, we can model a social network containing both work-colleague and geographic relations. We can build a pseudoclosure suited to a given modeling problem, for example, one in which a person is close to another if they are friends, work together, and live close to each other (using a given distance threshold). The following paragraphs give examples of how pseudoclosures can be built depending on the data or information we want to use in our model.
Example 5 (metric space). We represent the relations between people (represented as nodes) with a Euclidean distance, considering, for instance, the geographical location of each person.
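Let $E$ be endowed with a metric defined by a distance $d$. Let $r$ be a positive real. For each element $x$ of $E$, $B(x, r)$ is the ball with center $x$ and radius $r$ defined by

$$B(x, r) = \{y \in E \mid d(x, y) \le r\}.$$

A pseudoclosure $a(\cdot)$ can be built with $B(x, r)$:

$$\forall A \in \mathcal{P}(E), \quad a(A) = \{x \in E \mid B(x, r) \cap A \neq \emptyset\}.$$

The pseudoclosure $a(A)$ is composed of all elements of $A$ and all elements $x \notin A$ such that $x$ is within a distance of at most $r$ from at least one element of $A$ (Figure 6).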
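A hedged Python sketch of the ball-based pseudoclosure, in which $x$ joins $a(A)$ when it lies within distance $r$ of at least one member of $A$. The positions, the name `ball_pseudoclosure`, and the radius are hypothetical; `math.dist` (Python 3.8+) computes the Euclidean distance. Applying the map twice shows elements being absorbed step by step, as in a diffusion process.

```python
import math
from typing import Dict, FrozenSet, Iterable, Tuple

Point = Tuple[float, float]


def ball_pseudoclosure(points: Dict[str, Point], A: Iterable[str], r: float) -> FrozenSet[str]:
    """a(A) = {x in E | B(x, r) ∩ A != ∅}: every x within distance r of some member of A."""
    members = [points[y] for y in A]
    return frozenset(
        x for x, px in points.items()
        if any(math.dist(px, py) <= r for py in members)
    )


# Hypothetical geographic positions (e.g., in kilometers) of four people.
positions = {"p": (0.0, 0.0), "q": (1.0, 0.0), "s": (3.0, 0.0), "u": (10.0, 0.0)}
step1 = ball_pseudoclosure(positions, {"p"}, r=2.0)
step2 = ball_pseudoclosure(positions, step1, r=2.0)
print(sorted(step1), sorted(step2))  # ['p', 'q'] ['p', 'q', 's']
```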
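Example 6 (binary space). In social networks, we are confronted with nonmetric relations representing qualitative information (friendship, etc.). In such a space, the elements of $E$ are bound by $n$ reflexive binary relations $(R_i)_{i = 1, \dots, n}$ with $n \in \mathbb{N}^*$; $R_i$ is not necessarily symmetric. We define, $\forall i \in \{1, \dots, n\}$ and $\forall x \in E$:

$$R_i(x) = \{y \in E \mid x R_i y\}, \qquad R_i^{-1}(x) = \{y \in E \mid y R_i x\}.$$

As the relation $R_i$ is reflexive, $x$ belongs to $R_i(x)$ and to $R_i^{-1}(x)$.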
We can construct two pseudoclosures:

$$\forall A \in \mathcal{P}(E), \quad a_i(A) = \{x \in E \mid R_i^{-1}(x) \cap A \neq \emptyset\}, \qquad a'_i(A) = \{x \in E \mid R_i(x) \cap A \neq \emptyset\}.$$

These pseudoclosures are not equivalent when $R_i$ is not symmetric. Each of them is composed of $A$ ($R_i$ is reflexive) and of all elements that are linked by $R_i$, in one direction or the other, to at least one element of $A$. A space of this kind with two binary relations is illustrated in Figure 7. In this example, an element $x$ of $E$ belongs to the pseudoclosure of $A$ if $x$ is in relation with an element of $A$ according to both relations $R_1$ and $R_2$.
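The binary-space pseudoclosures can be sketched as follows: one absorbs every $x$ reached by an arc $y R_i x$ from some member $y$ of $A$, the other every $x$ with an arc $x R_i y$ toward some member of $A$. The relations, element names, and the helpers `make_reflexive_relation`, `a_succ`, and `a_pred` are hypothetical. Intersecting the results for two relations mimics the "related through both $R_1$ and $R_2$" combination described for Figure 7, and the last line shows the two variants disagreeing on a nonsymmetric relation.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Relation = Dict[str, Set[str]]  # R[x] = R(x) = {y | x R y}


def make_reflexive_relation(E: Iterable[str], arcs: Iterable[Tuple[str, str]]) -> Relation:
    """Build R(x) for each x, adding the loop x R x so that R is reflexive."""
    R: Relation = {x: {x} for x in E}
    for x, y in arcs:
        R[x].add(y)
    return R


def a_succ(R: Relation, A: Iterable[str]) -> FrozenSet[str]:
    """{x | R^{-1}(x) ∩ A != ∅}: some y in A satisfies y R x."""
    members = frozenset(A)
    return frozenset(x for x in R if any(x in R[y] for y in members))


def a_pred(R: Relation, A: Iterable[str]) -> FrozenSet[str]:
    """{x | R(x) ∩ A != ∅}: x R y for some y in A."""
    members = frozenset(A)
    return frozenset(x for x in R if R[x] & members)


E = ["a", "b", "c", "d"]
R1 = make_reflexive_relation(E, [("a", "b"), ("a", "c")])
R2 = make_reflexive_relation(E, [("a", "b"), ("c", "d")])

# Related to A = {a} through both R1 and R2.
combined = a_succ(R1, {"a"}) & a_succ(R2, {"a"})
print(sorted(combined))                                      # ['a', 'b']
print(sorted(a_succ(R1, {"b"})), sorted(a_pred(R1, {"b"})))  # ['b'] ['a', 'b']
```

Because each map contains $A$, their intersection still contains $A$, so the combination remains a valid pseudoclosure.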
Example 7 (valued space). In order to model certain problems, binary relations are not sufficient: we need a value (integer, real, function, ...) on the links. In this kind of space, the elements of $E$ are bound by a valued relation. For instance, we can define an integer value $v$ on relations, $v: E \times E \to \mathbb{N}$, with $v(x, y) = 0$ when $x$ and $y$ are not linked. We can then build a pseudoclosure that gives different results for singletons and for sets ($E - A$ and $A^c$ are equivalent notations for the complement of $A$):

$$\forall A \in \mathcal{P}(E), \quad a(A) = A \cup \Big\{x \in A^c \;\Big|\; \sum_{y \in A} v(y, x) \ge s\Big\},$$

where $s$ is an integer. The pseudoclosure $a(A)$ is composed of $A$ and of all elements $x$ for which the sum of the valued edges between elements of $A$ and $x$ reaches the threshold $s$. Figure 8 gives an illustration of this space with $s = 4$. This kind of modeling can be used in social networks where weighted relations are necessary, and it illustrates the interest of pretopological modeling: the behavior of a group differs from the "sum" of the individuals composing it. In Figure 8, the person at the top (bold) is absorbed because he knows two persons of $A$ a little (value = 1) and another one better (value = 2), so that the total $1 + 1 + 2 = 4$ reaches $s$ and he can be considered a friend of the group $A$. If we instead take each individual of $A$ separately, this person is not absorbed, since no single link reaches the threshold $s = 4$.
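The emergent behavior of Figure 8 can be checked with a short sketch of the valued pseudoclosure, in which an outsider $x$ joins $A$ when the summed link values from members of $A$ reach the threshold $s$. The weights only mimic the figure's description; the names `valued_pseudoclosure`, `top`, and `m1` to `m3` are hypothetical, and we assume the comparison is "at least $s$", since the absorbed person's total equals $s = 4$.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Weights = Dict[Tuple[str, str], int]  # v(y, x); pairs without a link are absent (value 0)


def valued_pseudoclosure(E: Iterable[str], v: Weights, A: Iterable[str], s: int) -> FrozenSet[str]:
    """a(A) = A ∪ {x ∉ A | sum over y in A of v(y, x) >= s}."""
    members = frozenset(A)
    if not members:  # keep a(∅) = ∅ even for a non-positive threshold
        return frozenset()
    outsiders = frozenset(E) - members
    return members | {x for x in outsiders if sum(v.get((y, x), 0) for y in members) >= s}


# Hypothetical weights mimicking Figure 8: "top" knows m1 and m2 a little
# (value 1) and m3 better (value 2).
E = ["top", "m1", "m2", "m3"]
v = {("m1", "top"): 1, ("m2", "top"): 1, ("m3", "top"): 2}
group = {"m1", "m2", "m3"}

print("top" in valued_pseudoclosure(E, v, group, s=4))                     # True: 1 + 1 + 2 = 4
print(any("top" in valued_pseudoclosure(E, v, {m}, s=4) for m in group))  # False
```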

Example of a Complex Group Interactions Model.
We now show how social networks with complex interactions can be modeled. The following example is taken from the study in [25], a dataset that represents several relations between monks in a cloister and has often been used in sociology studies. It concerns social interactions collected as numerous sociometric rankings. Many relations are coded, but we focus on two of them: esteem and disesteem. Relations are nonsymmetric and weighted with three integer values, from 1 to 3 (3 indicates the highest or first choice and 1 the last choice). We built our model on the following assumptions: a person $x$ is close to another person $y$ if (i) there is an esteem relation from $x$ to $y$ according to a chosen threshold, and (ii) there is no disesteem relation from $y$ to $x$ according to a chosen threshold.
This proximity definition makes sense if we consider group coalitions: if a person asks other people "who wants to join my group?", the people who hold him in the greatest esteem should be more likely to join his group than the others. But this person will not accept in his group people he does not esteem, even if these people esteem him. We define two valued nonsymmetric relations on the set $E$ of individuals, $v_{est}$ and $v_{disest}$, with est being the esteem relation and disest the disesteem relation. We build our pseudoclosure, for all $A \in \mathcal{P}(E)$, from these two relations and two thresholds $s, t \in \mathbb{N}$, where $s$ applies to esteem and $t$ to disesteem. With $s = 3$ and $t = 1$, we put a strong restriction on the neighborhood: if we apply this pseudoclosure to a set $A$, people would be in $a(A)$ if they have esteem for one or several persons of $A$ (according to $s$) and if no person of $A$ has disesteem for them (see Figure 9).
In this model, the problem of finding the largest group that follows the rules of our neighborhood can be treated by building $\mathcal{F}_e$, the family of all elementary closed subsets of $E$ (see Definition 3). When the closure is applied to each singleton $\{x\}$ of $E$, it reveals that VICTOR 8 is the person who can rally the largest number of people in the network (Figure 10).
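Building the family of elementary closed subsets can be sketched as follows. This is a hedged illustration under explicit assumptions: the esteem and disesteem weights are hypothetical toy data (not the dataset of [25]), and we read the rule of Figure 9 as "$x$ joins $A$ if $x$ gives an esteem of at least $s$ to some member of $A$ and no member of $A$ gives $x$ a disesteem of at least $t$"; whether esteem is tested per arc, as here, or summed over $A$ is our assumption. `esteem_pseudoclosure` and `elementary_closure` are hypothetical names.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Valued = Dict[Tuple[str, str], int]  # (from, to) -> weight in {1, 2, 3}; absent means no arc


def esteem_pseudoclosure(E: FrozenSet[str], est: Valued, disest: Valued,
                         A: Iterable[str], s: int = 3, t: int = 1) -> FrozenSet[str]:
    """A plus every outsider x that esteems some member of A with weight >= s
    and receives no disesteem of weight >= t from any member of A (assumed rule)."""
    members = frozenset(A)
    return members | {
        x for x in E - members
        if any(est.get((x, y), 0) >= s for y in members)
        and all(disest.get((z, x), 0) < t for z in members)
    }


def elementary_closure(E: FrozenSet[str], est: Valued, disest: Valued, x: str) -> FrozenSet[str]:
    """F_x: iterate the pseudoclosure from {x} until a fixed point is reached."""
    current = frozenset({x})
    while True:
        nxt = esteem_pseudoclosure(E, est, disest, current)
        if nxt == current:
            return current
        current = nxt


# Hypothetical toy data, not the dataset of [25].
E = frozenset({"p", "q", "r", "u", "w"})
est = {("q", "p"): 3, ("r", "p"): 3, ("u", "q"): 3, ("w", "u"): 2}
disest = {("p", "r"): 1}

family = {x: elementary_closure(E, est, disest, x) for x in sorted(E)}
largest = max(family, key=lambda x: len(family[x]))
print(largest, sorted(family[largest]))  # p ['p', 'q', 'u']
```

Here `r` esteems `p` strongly but is refused because `p` disesteems `r`, which is the coalition behavior the two thresholds are meant to capture.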
To highlight the interest of this formalism for centrality measures, we use the network sample of Figure 9 to give the example of degree centrality (see Section 3.1 for definitions). First, the definition of degree in a weighted directed graph differs from the classical one: instead of counting incoming or outgoing arcs, we can sum the weights of the incoming or outgoing arcs. In this example, there are two relations, giving two graphs. Consider a set $A$ containing two vertices $x$ and $y$. We can easily compute degree values for each vertex with each relation (e.g., choosing incoming arcs gives $\deg_{Est\text{-}in}(x) = 6$ and $\deg_{Est\text{-}in}(y) = 5$ by summing weights, or $\deg_{Est\text{-}in}(x) = 3$ and $\deg_{Est\text{-}in}(y) = 3$ with the classical degree definition), and even for the set $A$ ($\deg_{Est\text{-}in}(A) = 8$ by summing weights, or $\deg_{Est\text{-}in}(A) = 5$ with the classical definition). However, none of these values directly answers the problem we are working on, namely a degree for the people who want to join a group. We would have to add other rules, such as "an esteem arc is taken into account only if its weight is greater than $s$", and also take disesteem into account, to obtain the results that pretopology gives directly.

We have presented general examples of social networks modeled with pretopology, together with complex interaction modeling. Building a pseudoclosure map depends on the nature of the network and on the problem representation, and several pseudoclosures may need to be combined to obtain specific results. Several previous works have used pretopology to model complex systems and have shown interesting results in modeling the impact of geographic proximity on scientific collaborations [26], structuring and clustering data [17], modeling pollution phenomena [27], analyzing the evolution of a communicable disease [28], and analyzing communities of the Web [29]. This section showed the interest of applying pretopology to model groups in multirelational social networks.
In the next section, existing measures used in social network analysis are adapted in order to give more general and relevant results for group measurement.

Group Measures for Social Network Analysis
In social network analysis, choosing the "right" measure depends on the problem we are working on and on the properties we would like to highlight. Measures are used to describe the features of actors within social networks as well as to indicate the personal importance of individuals [2]. Centralities are among these measures and are usually applied to individual actors. However, there are many situations in which it would be advantageous to apply them to a group or set of actors. The proximity of actors would then depend on each actor's neighborhood but also on the neighborhood of the group formed by the actors. It would be useful to identify emergent groups in a network (such as cliques or structural equivalence in graphs). We may also want to construct groups that have maximal centrality, for example, to maximize information diffusion through a particular group, or to optimize the efficiency of a group by removing actors whose removal would not reduce, or would only slightly reduce, the centrality of the group. Moreover, the current original and extended centrality measures have some weaknesses [30] that the group versions resolve: (i) classical centralities are rarely used in multirelational networks, although complex networks consist of more than one type of relationship, (ii) they lack applicability in weighted networks, and (iii) disconnected networks are not handled by methods based on shortest paths (if a node cannot be reached from some of the other nodes, the outcome is not meaningful).
Furthermore, if groups are seen as sets in a pretopological space with complex neighborhoods, they cannot be measured with classical centralities. Some work goes beyond simple graph theory by using the hypergraph formalism [31], but it is still not sufficient to model multirelational and dynamic networks [32,33]. In this section, group measures are not normalized because normalization can introduce bias into the measures. For instance, the normalized group degree centrality (NGDC) of [2] favors larger groups (Figure 11) and gives erroneous results. Constructing groups with maximal centrality then becomes a problem, whereas working without normalization allows such a construction, with resulting groups of a pertinent size.
Remark (notation). The difference between two sets $A$ and $B$, noted $A - B$, means that we remove $A \cap B$ from $A$.

Group Degree Centrality.
In graph theory, we simply call this measure the degree. In a nonoriented graph, the degree of a node is the number of edges incident to it. In an oriented graph, we distinguish for a vertex $v$ its indegree and its outdegree. The indegree is the number of arcs with $v$ as their target; the outdegree is the number of arcs with $v$ as their source. The degree of $v$ is equal to the sum of its indegree and outdegree (see [34]). In sociology, degree centrality locally measures the individual capacity to communicate, independently of the centralities of its neighbors [35,36]. Degree centrality is the simplest and most intuitive measure that can be used in network analysis.
Journal of Complex Systems 7
Definition 8 (group degree centrality). We define the group degree centrality in a pretopological space $(E, a)$, denoted by $\mathrm{GDC}(A)$ for $A \in \mathcal{P}(E)$, as

$$\mathrm{GDC}(A) = |a(A) - A|.$$

In pretopology, this set already exists and is called the orle, or external semifrontier, defined by $o(A) = a(A) - A$. The group degree centrality is therefore the cardinality of the orle, which equals the number of direct neighbors of the group.
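A minimal sketch of group degree centrality as the size of the orle, $|a(A) - A|$, applied to the four-friend example under two pseudoclosures: the graph-style union of neighborhoods and the pseudoclosure that keeps only people who are friends of every member. The friendship data (limited to the links stated in the text) and the names `a_union`, `a_group`, and `group_degree_centrality` are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Set

FRIENDS: Dict[str, Set[str]] = {
    "John": {"Tim", "Ben", "Ed"},
    "Ed": {"Ben", "John"},
    "Ben": {"John", "Ed"},
    "Tim": {"John"},
}
E: FrozenSet[str] = frozenset(FRIENDS)


def a_union(A: FrozenSet[str]) -> FrozenSet[str]:
    """A plus every friend of at least one member (graph-style neighborhood)."""
    return A | {x for y in A for x in FRIENDS[y]}


def a_group(A: FrozenSet[str]) -> FrozenSet[str]:
    """A plus every person who is a friend of each member of A."""
    if not A:
        return frozenset()
    return A | {x for x in E - A if all(x in FRIENDS[y] for y in A)}


def group_degree_centrality(a: Callable[[FrozenSet[str]], FrozenSet[str]],
                            A: Iterable[str]) -> int:
    """GDC(A) = |a(A) - A|, the cardinality of the orle of A."""
    members = frozenset(A)
    return len(a(members) - members)


pair = {"John", "Ed"}
print(group_degree_centrality(a_union, pair))      # 2 (Tim and Ben)
print(group_degree_centrality(a_group, pair))      # 1 (Ben only)
print(group_degree_centrality(a_group, {"John"}))  # 3, the classical degree of John
```

On a singleton both pseudoclosures reduce to the usual neighborhood, so the measure recovers the classical degree; they only diverge for groups.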

Group Betweenness Centrality.
Betweenness centrality was proposed in [37]. This measure is based on the idea that an individual can be weakly connected to others and distant from them, yet still be an important intermediary, for example, in information exchange between other individuals of the network. The stronger the intermediary role of a person, the more he or she can control communications or be independent of other members. Such a person can easily influence the network by filtering or distorting the information circulating in it. This special position also enables such a person to ensure the coordination of the network. The classical betweenness centrality is defined as follows.
Definition 10 (betweenness centrality). Let $n$ be the number of vertices of a graph, let $g_{st}$ be the number of geodesic (shortest) paths from vertex $s$ to vertex $t$, and let $g_{st}(v)$ be the number of these paths passing through vertex $v$. We note $C_B(v)$ the betweenness centrality of vertex $v$:

$$C_B(v) = \sum_{\substack{1 \le s < t \le n \\ s \neq v \neq t}} \frac{g_{st}(v)}{g_{st}}.$$
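For reference, a brute-force Python sketch of Definition 10 on an unweighted, undirected graph: breadth-first search gives distances and shortest-path counts from each vertex, and a shortest $s$-$t$ path passes through $v$ exactly when $d(s, v) + d(v, t) = d(s, t)$, in which case $g_{st}(v) = g_{sv} \cdot g_{vt}$. Pairs with no path are skipped. The graphs and function names are illustrative assumptions; for large networks an algorithm such as Brandes' would be preferable.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Set, Tuple

Graph = Dict[str, Set[str]]  # undirected adjacency sets


def bfs_counts(G: Graph, s: str) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Distances from s and numbers of shortest paths from s (unweighted BFS)."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in G[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(G: Graph, v: str) -> float:
    """C_B(v): sum over pairs s < t (both different from v) of g_st(v) / g_st."""
    info = {x: bfs_counts(G, x) for x in G}
    dist_v, sigma_v = info[v]
    total = 0.0
    for s, t in combinations(sorted(x for x in G if x != v), 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s or v not in dist_s:
            continue  # no s-t path, or v in another component
        if dist_s[v] + dist_v[t] == dist_s[t]:
            # g_st(v) = (shortest s-v paths) * (shortest v-t paths)
            total += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return total


square = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
line = {"x": {"y"}, "y": {"x", "z"}, "z": {"y"}}
print(betweenness(square, "a"), betweenness(line, "y"))  # 0.5 1.0
```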
The original measure has been extended to count the geodesic paths passing through a group [9]. It gives more general results, but the underlying problem remains: individuals in a group are treated separately. Reference [38] proposed a more precise measure, but it can only be applied to simple graphs with a single type of relation, and its framework cannot be reused for other centrality measures.
Our approach to computing a generalized betweenness centrality differs slightly from Freeman's approach, as we do not consider paths as in graph theory: there is a path from $x$ to $y$ if $y$ is entirely reachable from $x$ [15]. The group betweenness centrality of a set $A$ is then defined on the basis of these pretopological paths.

Experiments.
To illustrate the interest of the generalized group betweenness centrality, we applied it to the network model of Section 2.3, using only one relation (esteem) with the same threshold value ($s = 3$). The problem to solve becomes slightly different: we want to find the groups that give the best score. As all subsets of $E$ cannot be tested, we rely on the construction of the family of elementary closed subsets: for each singleton $\{x\}$, while computing its closure $F_x$, we measure the group betweenness centrality of the intermediate sets obtained, that is, of each $a^k(\{x\})$. The use of a single relation is justified by the fact that we want to compare our measure with the classic version [2], which cannot handle multirelational networks.
To be able to compare the results, we modified the classic group betweenness centrality (GBC) definition: for a pair $\{u, v\}$ in the network, when computing the GBC of a set $A$, we increment its value by 1 if at least one shortest path passing through $A$ exists. This follows the idea developed in Definition 12. The group betweenness centralities obtained are given in Table 1. (Remark: as the classic GBC is based on shortest paths in a graph, we inverted the values: if a link has its esteem value set to 3 in the pretopological model (max value), we change its value in the network (Section 2.3) to 1 (min value), which here corresponds to a "cost" for a shortest path algorithm.) In this table, we highlighted the three groups with the highest GBC values: the comparison between classical and generalized GBC shows that our measure is able to detect emergent persons who were not identified by the classical measure. It is interesting to find matches in the results of both methods, but also to see that the pretopological model highlights some groups that would not have been considered otherwise.
Figure 13: Pretopological path illustration.

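The modified classic GBC used for the comparison can be sketched as follows: esteem values are turned into costs (3 becomes 1, 2 stays 2, 1 becomes 3), shortest distances are computed with Dijkstra's algorithm, and each pair of vertices outside $A$ counts once if some member $w$ of $A$ satisfies $d(u, w) + d(w, v) = d(u, v)$. Treating pairs as ordered (the graph is directed), excluding members of $A$ from the pairs, and the toy data and function names are our assumptions, not details given in the text.

```python
import heapq
import math
from itertools import permutations
from typing import Dict, Iterable

Costs = Dict[str, Dict[str, int]]  # directed arcs u -> w with a positive cost


def esteem_to_cost(esteem: Dict[str, Dict[str, int]]) -> Costs:
    """Invert esteem values as described in the remark: 3 -> 1, 2 -> 2, 1 -> 3."""
    return {u: {w: 4 - e for w, e in arcs.items()} for u, arcs in esteem.items()}


def dijkstra(G: Costs, s: str) -> Dict[str, float]:
    """Shortest-path costs from s to every reachable vertex."""
    dist: Dict[str, float] = {s: 0}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for w, c in G[u].items():
            nd = d + c
            if nd < dist.get(w, math.inf):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return dist


def modified_gbc(G: Costs, A: Iterable[str]) -> int:
    """Count ordered pairs (u, v) outside A with at least one shortest u -> v path through A."""
    members = set(A)
    dist = {x: dijkstra(G, x) for x in G}
    count = 0
    for u, v in permutations([x for x in G if x not in members], 2):
        if v not in dist[u]:
            continue  # v unreachable from u
        if any(w in dist[u] and v in dist[w] and dist[u][w] + dist[w][v] == dist[u][v]
               for w in members):
            count += 1
    return count


# Hypothetical esteem arcs: p esteems q strongly, q esteems r strongly, p esteems r weakly.
G = esteem_to_cost({"p": {"q": 3, "r": 1}, "q": {"r": 3}, "r": {}})
print(modified_gbc(G, {"q"}), modified_gbc(G, {"r"}))  # 1 0
```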
Generalized Closeness Centrality.
The closeness of an element $x$ describes the total distance from $x$ to all other elements in the network [39,40]. It is computed by summing the shortest-path distances $d(x, y)$ from a node $x$ to all other nodes in the network:

$$\sum_{y \neq x} d(x, y),$$

where $d(x, y)$ is the length of the shortest path between nodes $x$ and $y$, and $n$ is the total number of nodes in the network.
Closeness can be regarded as a measure of how long information will take to spread from a given vertex to other reachable vertices in the network [41].
Definition 13 (closeness centrality). The closeness centrality $C_C(x)$ of a vertex $x$ is standardized on the size of the network:

$$C_C(x) = \frac{n - 1}{\sum_{y \neq x} d(x, y)}.$$

With pretopology, we can generalize this concept, as the pseudoclosure is a function that represents direct proximity and the closure represents the largest proximity of a set. Our measure is built on the ideas of eigenvector centrality and power centrality [42,43]: the further away an element is from a set, the less it is influenced by this set.

Definition 14 (generalized closeness centrality). We define the generalized closeness centrality in a pretopological space $(E, a)$, denoted by $\mathrm{GCC}(A)$ for $A \in \mathcal{P}(E)$, by applying the pseudoclosure to $A$ repeatedly until we reach the closure of $A$ ($a^{k+1}(A) = a^k(A)$, with $k$ a positive integer).

Closure is defined in Definition 3 and illustrated in Figure 4. The more pseudoclosure steps are needed to reach the closure, the less $A$ is close to the rest of the network; the measure also depends on the number of elements that can be reached. The distances used in classic closeness centrality (Definition 13) are represented in our generalized definition by the number of pseudoclosures needed to reach an element. The case $a(A) = A$ means that $A$ is a closed subset and cannot reach any element other than its own; in that case, $\mathrm{GCC}(A) = 0$. Here, one can see the meaning of closure: all elements in the closure of $A$ can be reached from $A$.
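The two ingredients discussed above can be sketched in Python: classical closeness (Definition 13) computed by breadth-first search, and, for a set $A$, the number of pseudoclosure applications needed to reach each element of its closure, which the text uses in place of distances. This sketch does not attempt to reproduce the exact weighting of Definition 14; the path graph and the names `closeness`, `a_graph`, and `pseudoclosure_steps` are hypothetical. With the graph-style pseudoclosure (a set plus its neighbors), the step counts coincide with shortest-path distances from $A$, which is the sense in which the generalized measure extends the classical one.

```python
from collections import deque
from typing import Callable, Dict, FrozenSet, Iterable, Set

Graph = Dict[str, Set[str]]  # undirected adjacency sets


def closeness(G: Graph, x: str) -> float:
    """Definition 13: (n - 1) / sum of shortest-path distances from x (connected graph assumed)."""
    dist = {x: 0}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        for w in G[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(G) - 1) / total if total else 0.0


def pseudoclosure_steps(a: Callable[[FrozenSet[str]], FrozenSet[str]],
                        A: Iterable[str]) -> Dict[str, int]:
    """Number of pseudoclosure applications needed to reach each element of F(A); A is step 0."""
    current = frozenset(A)
    steps = {x: 0 for x in current}
    k = 0
    while True:
        nxt = a(current)
        if nxt == current:  # closure reached: a^{k+1}(A) = a^k(A)
            return steps
        k += 1
        for x in nxt - current:
            steps[x] = k
        current = nxt


path_graph: Graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}


def a_graph(A: FrozenSet[str]) -> FrozenSet[str]:
    """Graph-style pseudoclosure: A plus its direct neighbors."""
    return A | {w for u in A for w in path_graph[u]}


print(closeness(path_graph, "b"), closeness(path_graph, "a"))  # 0.75 0.5
print(pseudoclosure_steps(a_graph, {"a"}))
```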
By this measure, we can find which elements or groups are the "closest" to others in a complex network.

Conclusion
In this paper, we presented ideas and concepts which lead to two main contributions.
First, we proposed using pretopology, a general mathematical framework, to model groups and relations in social networks. With definitions and examples, we have highlighted the interest of modeling complex interactions with the pretopology formalism, which generalizes graph theory. Furthermore, the various pseudoclosure functions introduced through examples illustrate our approach.
The second contribution is the generalization of various social network analysis measures, which are degree, betweenness, and closeness centralities. These measures are real group-based measures as we consider a group as a whole entity, not as a combination of individuals.
This paper provides tools to model group interactions in multirelational complex networks. A point not discussed in this paper, which can be seen as a perspective, is that we can also model dynamic relationships, as the neighborhood functions of a pseudoclosure can evolve through time. Future work could also focus on new methods using generalized measures to find groups that maximize a given centrality, to optimize the "efficiency" of a group, or to identify emergent groups in a network. Moreover, other centralities and measures could be generalized on the basis of the same mathematical framework.