Extraction of Multilayered Social Networks from Activity Data

The data gathered in all kinds of web-based systems, which enable users to interact with each other, provides an opportunity to extract social networks that consist of people and relationships between them. The emerging structures are very complex due to the number and type of discovered connections. In web-based systems, the characteristic element of each interaction between users is that there is always an object that serves as a communication medium. This can be, for example, an e-mail sent from one user to another or post at the forum authored by one user and commented on by others. Based on these objects and activities that users perform towards them, different kinds of relationships can be identified and extracted. Additional challenge arises from the fact that hierarchies can exist between objects; for example, a forum consists of one or more groups of topics, and each of them contains topics that finally include posts. In this paper, we propose a new method for creation of multilayered social network based on the data about users activities towards different types of objects between which the hierarchy exists. Due to the flattening, preprocessing procedure of new layers and new relationships in the multilayered social network can be identified and analysed.


Introduction
Nowadays, for the first time, we have possibility to process big data about interactions and activities of millions of individuals gathered in all sorts of web-based systems. Communication technologies allow us to form large networks, which in turn shape and catalyse our activities. Due to their scale, complexity, and dynamics, these networks are extremely difficult (or impossible) to analyse in terms of traditional social network analysis methods. The analysis of the network data is at the very early stages and requires a lot of efforts in both developing tools and approaches to tackle it as well as understanding the nature and functioning of networks extracted from this data. The process of network creation is not as straightforward as it seems to be. In the webbased systems, users can interact with each other via different communication channels and utilize various services. This implies that the relationships between users can be extracted based on both direct and indirect communications. The former one is, for example, sending emails or video calls where information is passed directly from one person (group of people) to other(s), whereas to the latter we can count in, for example, commenting objects in the multimedia sharing systems or using the same tags to describe the objects. In both situations there are objects (e.g., email, photo, and tag) that serve as medium in communication between users. The types of these objects differ depending on the web system; for example, at the Internet forum the objects are groups of topics, topics, and posts while in the email service it will be a single message. Additionally, within a single system, the hierarchies of objects can exist.
Extraction of a social network in the environment where users interact with each other using different objects, which create hierarchies, is the main contribution of this paper. In order to perform this task first the hierarchical presocial network (HPSN), where relation between users and objects and between objects exists (Section 5), must be created. After that, the flattening process, in which the hierarchy of objects is removed, is performed (Section 6). As a result, the flat presocial network (FPSN) is obtained where the only 2 The Scientific World Journal connections that exist are between users and one type of the previously chosen objects. Based on FPSN, social network, where only the connections between users exist, is created (Section 7). The whole idea is presented using simple case study of the Internet forum (Section 8). Finally, the realworld experiments were performed and their outcomes are presented in Section 9.

Related Work
There are many types of complex network systems. One of the classifications distinguishes between infrastructures and natural complex systems [1]. The former are physical systems (energy and transportation networks) and virtual systems (Internet, WWW, and telecommunication), whereas the latter are biological networks, social networks, food webs, and ecosystems.
The type of complex systems that is investigated in this paper is a social network formed by people who interact with each other or take part in common activities. The concept of social network has been described by different researchers [2][3][4][5][6] and the definition that is commonly and widely used is that a social network is a finite set of individuals, who are the nodes of the network and activities or relations between them, which are represented by edges of the network. A social network (SN) commonly represents the mutual communication and/or activity occurring between users as well as their direction, intensity, and profile [7].
It should be noted that, during analysis of social networks, researchers usually take into account only one activity type while in most cases many different types of relationships exist between users. The special type of social networks that allows the representation of many different activities is called a multilayered social network [5,[8][9][10][11][12][13][14][15], a.k.a. multidimensional network [16][17][18], or multiplex networks [7,[19][20][21][22]. Even the same authors use different names for this kind of complex networks; compare, for example, [12,21]. Overall, due to high complexity, such networks are more difficult to be extracted and analysed than simple one-layered networks.
Sociologists and psychologist typically create questionnaires and perform interviews in order to collect data which allow them to create and analyse social networks. However, nowadays, the rapid development of the Internet and telecommunication together with the ease of gathering vast amount of data have created the possibility for IT systems to provide vast amount of information about users activities. As a result, researchers have now easy access to big datasets about people's activities ready to analyse. Social networks can be extracted from, for example, bibliographic data [23], blogs [24], photos sharing systems like Flickr [25], e-mail systems [26], telecommunication data [27,28], social services like Twitter [29] or Facebook [30,31], video sharing systems like YouTube [32], Wikipedia [33], and much more. Moreover, the whole separate systems were created only for the extraction, aggregation, and visualization of social networks [34,35].
Nevertheless, as mentioned before, only few scientists have focused their research interests on multilayer social network extraction from activity data [14,16,18,20,22,[36][37][38][39]. Moreover, no one has studied the hierarchy and relationships between objects in this data. The only hierarchical dependencies in the social networks that were analysed were associated with the hierarchy between users such as employee-employer and the employee-manager [40]. Thus, the analysis of hierarchy between objects presented in this paper is a new approach to extract multilayer social networks from the activity data.

Object-Based Relationships
In web-based social systems, there is always an object that plays a role of "a middleman" in a relationship between two users ( Figure 1) [41]. In the case of direct communication, people send e-mails to each other or make videoconferences or phone calls via VoIP services. In those cases all participants are aware of the existing relationship. However, sometimes two users can be in a relationship but they do not maintain it actively and consciously, for example, people who comment on the same blog or participate in the same conference. These types of common activities can result in indirect relationships. The roles of both users in an indirect relationship towards the object can be either the same or different ( Figure 2).
Object-based relation with equal roles means that users and meet each other through the object and their role in relation to this object is the same. In other words, they participate in common activity related to the certain object with the same role ; for example, two users take part in the videoconference, two users comment on the same picture, or both of them add the same object to their favourites [25], Figure 2(a). Object-based relation with different roles , is the relation between two users and that are connected through the object but their roles and towards the object are different; for example, user comments on a photo (role , commentator) that was published by (role , author) [25], Figure 2 (i) Commentator-commentator. This relation is created between user and user when both of them have added the opinions about at least one common object; for example, they have commented on the same picture at the photo sharing system or on the same post at the forum.
(ii) Favourite-favourite. Such a relation from user to exists if both users have marked at least one common object as their favourite; for example, they have added the same film to their lists of favourites at the multimedia sharing system.  (iii) Author-author. Such a relation from user to exists when they are coauthors of at least one object; for example, they have written a scientific article together.
(iv) Membership in the group/forum. This relation from user to exists when both of them belong to at least one group together; for example, they belong to a group that gathers people who like dogs at the photo sharing system.
(v) Utilization of keywords to describe objects (tags). Such a relation exists between two users if they use at least one common tag to describe their objects; for example, two users are in relation with each other at the photo sharing system when they use a word "cat" to describe some of their photos.
On the other hand, the examples of object-based relations with different roles can be as follows.
(i) Opinion-author and author-opinion. These relations between user and exist when user commented on at least one object that is authored by user .
(ii) Favourite-author and author-favourite. These relations between users and exist when user added to its favourite list at least one object authored by user .
(iii) Citation-author and author-citation. These relations between users and exist when user quoted at least one object authored by user .

Hierarchies between Objects
In all web-based social networks analysed in the literature, the relationships between users were extracted mainly based on a given type of communication or common activity. For example, if two users send e-mails to each other, then the relationship between them in the social network may be established. However, both user communication and common activities are always related somehow to the objects which serve as a medium in interactions between users and their common activities; see Section 3. This object may be "a message" in the case of e-mail exchange, "a video" in YouTube, or "a topic" in the Internet forums. These objects connect either a pair of users (an e-mail sent to a single recipient) Level 2 (GT) Figure 3: Hierarchies between objects in the Internet forum. or many users simultaneously (an e-mail passed to multiple recipients, a video commented on by many users, and a forum with many members). Besides, the IT system may provide many different functions, which can result in various user activities towards objects of different types. For example, an Internet forum may consist of topics aggregated into groups. Topics, in turn, contain a list of posts. Thus, the objects that enable interactions between users are in hierarchical relationships, Figure 3. Depending on the functionalities of the system, users can moderate a topic group, can subscribe to a topic (be a member of the topic), or provide their opinions about posts (play the role of commentator), Figure 10. Hence, users "meet" each other by performing activities towards objects that belong (i) to one specific level in the object hierarchy (many users can comment on a post authored by another user) or (ii) to two different levels of this hierarchy; for example, a moderator of the group topic is in the indirect relation with authors of the posts.

Hierarchical Presocial Network
In order to create the hierarchical presocial network (HPSN) based on gathered activity data, first such elements as users, objects, and hierarchy between objects and relations between users and objects need to be extracted. HPSN contains information about relations between users and objects towards 4 The Scientific World Journal which users performed some activities. The main characteristic of HPSN is that there exist hierarchies between different objects. The whole process of HPSN extraction consists of the following four steps.
(1) User extraction. Users are network nodes both in the presocial network and in the final social network. Users perform different activities towards various types of objects; for example, they send e-mails to each other or comment on the photos uploaded by others. These activities are the basis to create the role of a user in relation to a specific object, for example, author, commentator, and so forth.
(2) Object extraction. Objects are the nodes in both hierarchical and in flat presocial network, that is, elements through which users communicate with each other (e.g., email and phone call) or items towards which users perform some activities (e.g., photo, video, and tag).
(3) Extraction of the hierarchy between the objects. Some objects can be in hierarchical relation with other objects; for example, an object "group of topics" contains one or more "topics" which may include many "posts" (Figure 3). The consequence of the existence of the hierarchy between different objects types is that the objects on the lower level cannot exist without objects on the higher level. These hierarchies exist within HPSN and are removed during the prenetwork flattening process (see Section 6).

(4) Extraction of the relations between users and objects.
Relation between a given user and object exists if user performed some activities towards object ; for example, user commented on photo . The type of activity that user performed towards an object is assigned to each relationship.
The concept of HPSN is presented in Figure 4 where the hierarchy between objects has three levels ( , , and ); at each level some objects exist (e.g., at the level objects OB 1 and OB 2) with which users ( , , and ) are in different types of relations ( , , and roles); that is, users performed some activities towards these objects.

Flat Presocial Network
The flattening process aims at removing relationships between objects (hierarchies). A consequence of this process is based on the knowledge about existing hierarchies; both new user roles and relationships between users are created. The transformation from the presocial network where the hierarchies between objects exist (HPSN) into the flat presocial network (FPSN) without hierarchies will be performed in the following steps.
(a) The operator chooses the level in the hierarchy to which the flattening process will be performed-the end level. Note that, after each flattening process, the only object type in FPSN will be the one that is on the end level selected by the operator and all users will be in relation only towards these objects.
(b) If there exist levels that are lower in the hierarchy than the end level ( Figure 5), then for those levels the bottom-top approach is used; that is, we get the following.
(i) Relationships between people and objects existing on the hierarchy levels that are below the end level (relation user , OB 1, and user , OB 2, in Figure 5) are changed. The relation between a user and an object from the lower lever is moved to the upper level by (1) identification of an object on the upper level that is "a father" of the object from the lower level ("child"); (2) creation of a new relation between the user and "the father" object; (3) name of the relation between user and "father" object which is created by adding to the name of the relation user "child" the word that denotes the movement from the lower level. For example, in Figure 5, the relation user , OB 1 ("child"), has a name, role, and the name of the new relation user , OB 1 ("father"), is role; (4) deletion of the relation between the user and the "child" object from the lower level.
NOTE. This process is repeated for other upper levels until the end level is reached ( Figure 5). (ii) Relationships between people and objects existing at the end level remain unchanged (relation user , OB 2).
The final FPSN presented in Figure 6 is an outcome of the bottom-up approach where the HPSN from Figure 5 is flattened to level 1 ( ).
The Scientific World Journal (c) If there exist levels that are upper in the hierarchy than in the end level (Figure 7), then the top-bottom approach is applied for these levels; that is, we get the following.
(i) Relationships between people and objects existing on the hierarchy levels that are above the end level (relation user , OB 2, and user , OB 2, in Figure 7) are changed. The relation between user and object from the upper lever is moved to the lower level by (1) identification of all objects on the lower level that are "children" of an object from the upper level ("father"); (2) creation of the relation between the user and all "child objects;" (3) name of the relation between the user and "child object" which is created by adding to the origin name information about the "child object. " For the example, in Figure 7, in the relation user , OB 2 ("father"), the relation name was role and the new relation user , OB 3 ("child"), will have the name role; (4) deletion of the relation between the user and "father object" on the upper level.
NOTE. This process is repeated until the end level is reached (Figure 8). (ii) Relationships between people and objects existing on the end level remain unchanged.
An example of top-bottom approach is presented below where the HPSN from Figure 7 is flattened to level 3 ( ), that is, to FPSN in Figure 8.
The goal of the flattening process is to facilitate the extraction of the unified structure that represents the social connections between pairs of users from user activity data and relations between objects. New types of user roles can  be identified during the flattening process, for example, or role in Figure 8. The newly obtained knowledge about these roles gives an opportunity to investigate the complex profile of user relationships in more detail and in consequence enables their more comprehensive analysis.

Social Network
The flat presocial network structure (FPSN) is used to extract the social network (SN) where the relations user-object from FPSN no longer exist. These connections are converted into direct relations between users in SN. The process consists of the following steps. multigraph, all layers are represented by a single social network and different layers are distinguished by different colours of edges (or another labelling mechanism is used).
(d) Extraction of relations user from-user to is by calculation of the relationship strengths and colours (labels) between SN nodes (users) using activity data stored in FPSN. There are many possible formulas for calculating the relationship strength. Most of them are based on the normalized quantity of shared user activities towards objects in FPSN (for some of the examples please see [25,42]).
Social network SN created from FPSN in Figure 8 according to the process described above is presented in Figure 9.
Depending on the goal of analysis, the strength of a relationship can be a static measure calculated based on all available data and taking into consideration the number of activities of a given type. On the other hand, we can take into account time factor and split the data according to the time when activities occurred. In the latter case, the whole period from which the data comes from is divided into time frames and the relationship strength is calculated for each slot separately. The time frame can be created using two approaches as follows.
(i) Sliding window. A user defines the length of the time window (e.g., ) and the time interval that is used to move the window (e.g., ). In order to extract time frames the whole period of the length is moved by . In consequence the entire dataset is divided into partly overlapping frames. Note that both time window and time interval need to be specified in a way that the period from the start date to end date should be completely covered. (ii) Equal, separate periods. A user sets the number of periods, for example, , and then the data is divided into separate, equal periods according to the dates of activity. This is equivalent to the situation (i) where equals .
After the time windows are created, the weight is assigned to each of them. Usually, the recent periods are more important than the previous ones and because of that greater weight is assigned to those recent time windows. In this paper we present how to calculate the static version of relationship strength for two different types of relations: (i) object-based relations with equal roles and (ii) object-based relations with different roles (see Figure 2).
(a) Object-Based Relationships with Equal Roles. The objectbased relationship with equal roles denotes a connection in which two users are related to each other through the object and their roles towards this object are the same; see Section 3. Note that the same formula is used in order to calculate the connection strength between user and user who (i) have commented on at least one common object, (ii) have marked at least one common object as their favourite, (iii) are coauthors of an object, (iv) are in the same group or forum, and (v) have used the same keywords to describe objects. In all of these relations, there is an object on which both users perform specific activity. To calculate the static strength of the relationship, the following formula may be applied: where is the type of activity that is performed by users towards an object, for example, membership to a group/ forum, utilization of a tag to describe objects, coauthorship of an object, commenting on an object, and so forth. is the number of common activities for users and performed together, for example, number of groups/forums to which both users and belong, the number of tags that both users and use commonly or the number of objects that were coauthored by both users and , and so forth. is the number of a given activity for user , for example, the number of groups/forums to which user belongs, the number of tags used by user or the number of objects authored by user , and so forth.
Let us consider the situation in which users of multimedia sharing system utilise tags to describe different multimedia content. In this case an object is a tag and relationship between two users is created when they utilise some common tags. Let us assume that the data obtained from the system contains the following information: user utilised 20 identical tags as user and user used 60 tags in total. Then relation strength from user to is calculated as follows: Section 3). For example, one user can comment in a forum in which another user is a moderator and the relationship between users is moderator-commentator. Thus, in the case of calculating the relations strength, we will refer to the relations activity type a-activity type b. Considering the relation activity type a-activity type b its strength will be calculated as follows: where denotes the 1st activity type; denotes the 2nd activity type (different that ); is the number of activities performed by user towards objects for which user performed activity ; is the total number of activities of type performed by user towards objects for which any other users performed activity .
As an example, let us consider the case where a user adds to the list of favourites an object authored by another user. A relationship between two users is created when one user adds to its favourites an object authored by another person. Assume that the following data is available in the system: user added to favourites 20 objects authored by user . User added to favourites 60 objects in total. The objects of user were added to favourites by other 30 times in total. Moreover, means the activity "authored by" and means the activity "added to favourite by. " The relation strengths are calculated as follows:

Example of Flattening Process
One of the examples where hierarchies between objects exist is the Internet forum where people can create their own topics that contain posts added by users. The hierarchy between objects within a forum and the activities that can be performed towards these objects are presented in Figure 10.
The hierarchy that will be used for our case study is forum-topic-post. Both relationships between objects and between users and objects in the exemplary hierarchical prenetwork are presented in Figure 11. In order to create the flat presocial network (Figure 12) we perform the flattening process after which the relationships between objects will be removed but at the same time on other levels new relationships between users and objects will be created.
Two types of flattening process can be considered: bottom-top and top-bottom (see Section 6). We present here the bottom-top flattening process in which the relationships will be moved to the highest forum level; that is, the forum will be the final level.
The bottom-top approach applied to HPSN from Figure 11 results in several new relationships and roles ( Figure 12). Some examples are enumerated as follows.    P is co m m . Figure 11: Relations between users and objects together with users' roles towards the objects.
Flat presocial network PTF is comm. PTF is comm .
F is creator PTF is comm. (iv) User is a creator of the forum 1 and user is a creator of forum 2; then the existing relationship stays unchanged: Is Creator.
Two different layers of the final multilayered SN derived from the flat presocial network FPSN (Figure 12) are presented in Figures 13 and 14. For instance, the relation between two users exists if one user was the moderator of the post that was commented on by another user, Figure 13; for example, user moderates the post commented on by user so there is a relation moderator-commentator from to in the final social network SN. This relation is an object-based relationship with different roles. In Figure 14, another layer PTF Is Author, PTF Is Author, is presented. This is a layer in which the extracted relationships are object-based with equal roles. The relationships' strengths presented on both figures are calculated using formulas from Section 7.
Note that user and user are connected only because of flattening process ( Figure 13). The same is with user and user ( Figure 13) and user and user ( Figure 14). Any of these relationships would be revealed if the flattening process did not take place.

Experiments
The real-world dataset used for experiments was obtained from the social web site extradom.pl. The analysed dataset covers the period from August 21, 2008, to January 8, 2010.  Before the hierarchical presocial network HPSN was created, the dataset was cleansed and validated. Several rules were applied in the cleansing phase. Two most important ones were as follows: (i) each object must have creation date; (ii) each object must be assigned to its creator.
There were 104,625 users registered in the portal, but only 4.25% (4,404) of them were active on forum. The number of different types of objects in the forum is shown in Table 1. One or more activity types were identified for each object type. Objects with activities that were performed towards them are shown in Table 2.
As we can see in Table 2, there are two activity types (topic group moderation and post reading) that are not present in the dataset used for experiments and thus are not included in the further analyses. Additional assumptions that have been made and which helped to detect some of activities are as follows: (i) a user who creates the first post in the topic will be treated as a topic creator; (ii) creation of the first topic in the group is simultaneously treated as the creation of the entire group; (iii) users who create their first post in the topic will be automatically subscribed to this topic. Table 3 summarizes the profile of the hierarchical presocial network (HPSN) that was created from the extradom.pl dataset.
Three distinct flattening processes (see Section 6) with three separate final object levels, topics group, topic, and  post, have been applied to the hierarchical presocial network HPSN. Some of the activities were multiplied after the flattening process (for detailed statistics please see Table 4). Such situation takes place when the hierarchical presocial network is flattened to the object type, which is not the highest level in the hierarchy. For example, when HPSN was flattened to the topic groups (level 2 in the hierarchy, see Figure 3), the forum creation activity was multiplied by 692 because there were distinct 692 groups in the dataset.

10
The Scientific World Journal  Once the flattening process has been accomplished, the separate layers in the multilayered social network were identified and relationships between users within these layers were extracted (see Section 7). Both layers and number of distinct relationships existing within each layer are presented in Table 5.
As a result of the flattening process 14 new layers in the multilayered social network were created. Additionally, percentages of new user relationships in SN are 90% in the case of topic groups as the final object, 10% for topics, and 3% for posts (see Table 5 and Figure 15). Note that these new relationships would not be visible without the flattening process. It means that the method of preprocessing with flattening of object relations reveals completely new knowledge about the complexity of connections between people.

Conclusions
The wide variety and availability of Web 2.0 systems, where users can interact with each other and perform different types of activities, give us an opportunity, by analysing the largescale data gathered in these systems, to better understand human social behaviour. A very interesting research problem is to investigate social connections that emerge between people based on their shared activities. However, the extraction of these relations is not a trivial task. The main reason is that user behaviour in such systems is often very complex due to the variety of available services and functionalities. As presented in the paper, people can perform different activities towards different objects. Additional challenge is that the relationships existing between these objects can form a hierarchical structure. In this paper, we propose the process to extract the multilayered social network from the data about both user behaviours and relations between objects. The whole method consists of three main phases: (i) extraction of the hierarchical presocial network HPSN, (ii) creation of the flat presocial network PFSN, and finally (iii) creation of the multilayered social network SN. We believe that such systematic approach to the problems is necessary to be able to cope with the massive volume of data being generated by social-based systems every day. Moreover, the proposed process is generic and robust in a way that it is able to accommodate new ways of interactions between users.
The new flattening concept enables discovering in the multilayered social network new layers with new types of relationships which otherwise would not be available for analysis. It is possible due to the presented above process in which the object hierarchy is removed. Thus, the new method of preprocessing enables revealing new information, which is invisible in the regular analysis of user activities and this in turn opens new possibilities for network analysis. The experiments confirmed that some new types of relations between users can be extracted in the flattening process. This enables a deeper insight into analysis of multilayered social networks as more information is included in the final network structure.