Social Network Analysis Based on Network Motifs

Based on the community structure characteristics of networks and on the theory and methods of frequent subgraph mining, network motif finding is introduced into social network analysis for the first time; a tendentiousness evaluation function and an importance evaluation function are proposed for effectiveness assessment. Compared with the traditional approach based on node centrality, the new approach analyzes the properties of a social network more fully and judges the roles of the nodes effectively. An application analysis shows that the approach is effective.


Introduction
A large number of systems in the real world exist as networks, such as social networks (coauthor networks, criminal networks, etc.), biological networks (protein interaction networks, metabolic networks, etc.), and technology networks (electricity networks, the Internet, etc.) [1][2][3][4][5][6][7][8][9][10][11][12]. In order to reveal their structure and principles, Milo et al. first proposed the concept of "network motifs," which can be defined as patterns of interconnections occurring in complex networks at numbers that are significantly higher than those in randomized networks [13]. Since then, research on network motifs has developed extensively. Kim et al. defined biological network motifs as biologically significant subgraphs [14]. Farina et al. identified regulatory network motifs from gene expression data and proposed a corresponding algorithm [15]. In order to identify network motifs, Ohnishi et al. analyzed an interfirm network consisting of about one million firms and four million directed links [16].
The study of social networks has long been an active research topic. To judge the importance of nodes, traditional social network analysis relies mainly on calculating the centrality of the nodes in the network [17,18]. In recent years, various new methods have been introduced into social network analysis, and network motifs are an important one among them [19,20]. Juszczyszyn first analyzed motifs in large social networks derived from email communication and found that the distribution of motifs in all analyzed real social networks is similar and can be treated as a network fingerprint. This property is most distinctive for stronger human relationships [21,22].
In this paper, we use network motifs to develop a set of network analysis methods that differ from traditional social network analysis, and we illustrate their application.
Centrality analysis is the staple method of traditional social network analysis [17,18]. In a network, an actor that has direct links to many other actors resides in the centre of the network and has more "power" [17]. The importance of a node, its point centrality, can be measured by the number of nodes it is in contact with [18]. Based on the adjacency matrix $A = (a_{ij})_{n \times n}$, the point centrality of node $v_i$ is

$$C(v_i) = \sum_{j=1}^{n} a_{ij},$$

where $i = 1, 2, \ldots, n$.
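As a minimal sketch of this measure, assuming an undirected network stored as a 0/1 adjacency matrix, the point centrality of each node is simply the sum of its row. The function name `point_centrality` and the four-node example network are illustrative and do not come from the paper.

```python
def point_centrality(adjacency):
    """Return, for each node, the number of nodes it is directly linked to.

    `adjacency` is a square 0/1 matrix given as a list of rows (assumed format).
    """
    return [sum(row) for row in adjacency]


# Hypothetical 4-node undirected network: node 0 is linked to all other nodes.
A = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]

centrality = point_centrality(A)
print(centrality)  # [3, 2, 2, 1]
```

Node 0 has the largest value and would therefore be ranked as the most central node under this measure.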

Network Motifs Finding
Based on the adjacency matrix of a subgraph, the maximum encoding is obtained as the unique identification of the subgraph. The AGM algorithm is used to mine frequent subgraphs based on the maximum encoding.
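The idea of a code that uniquely identifies a subgraph can be illustrated with a brute-force sketch. It assumes an undirected, unlabeled subgraph, reads the upper-triangular entries of the adjacency matrix column by column, and keeps the largest code over all node orderings. The exact code layout used by AGM may differ, and the function names are hypothetical. The search over all orderings is only practical for the small subgraphs considered in motif finding.

```python
from itertools import permutations


def adjacency_code(adj, order):
    """Code of `adj` after relabelling the nodes in the given order.

    The upper-triangular entries are read column by column (assumed layout).
    """
    n = len(order)
    bits = []
    for j in range(1, n):
        for i in range(j):
            bits.append(str(adj[order[i]][order[j]]))
    return "".join(bits)


def maximum_code(adj):
    """Largest code over all node orderings; isomorphic graphs share it."""
    n = len(adj)
    return max(adjacency_code(adj, p) for p in permutations(range(n)))


# Two labellings of the same 3-node path, and a triangle for contrast.
path_a = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # centre is node 1
path_b = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]   # centre is node 0
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

print(maximum_code(path_a), maximum_code(path_b), maximum_code(triangle))
# 110 110 111
```

Because both labellings of the path receive the same code, occurrences of the same structure can be counted by grouping subgraphs on this code.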

Random Network Model.
In typical network motif finding algorithms, the random network model preserves the degree sequence of the real network [25]. The exchange algorithm generates a random network from a given degree sequence as follows [26].

Algorithm A.
Input: degree sequence
Output: random network
Step 1: Construct a network according to the degree sequence.
Step 4: Cancel the exchange if the exchange has led to multiple edges or loops.
Step 5: Repeat until reaching the target number of times.
In this way, a set of random networks with the same degree sequence as the real network can be obtained.
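A degree-preserving exchange of this kind can be sketched as follows. The sketch starts from an existing simple undirected edge list with at least two edges instead of constructing a network from a degree sequence, and it assumes the usual exchange move: pick two edges (a, b) and (c, d) at random and rewire them to (a, d) and (c, b). These choices, the function names, and the `max_attempts` safeguard are assumptions and are not taken from the paper.

```python
import random


def degree_sequence(edges):
    """Map each node to its degree in an undirected edge list."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return degree


def degree_preserving_randomize(edges, n_swaps, seed=None):
    """Randomize a simple undirected graph while keeping every node's degree."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    done, attempts = 0, 0
    max_attempts = 100 * n_swaps  # safeguard against graphs with no valid swap
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if a == d or c == b:
            continue  # cancel: the exchange would create a loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in edge_set or new2 in edge_set or new1 == new2:
            continue  # cancel: the exchange would create a multiple edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list


# Hypothetical real network with six edges.
real_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
random_edges = degree_preserving_randomize(real_edges, n_swaps=10, seed=1)
print(degree_sequence(random_edges) == degree_sequence(real_edges))  # True
```

Calling the function repeatedly with different seeds yields an ensemble of random networks against which subgraph counts in the real network can be compared.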

Statistical Significance of Network Motifs.
Network motifs are frequent subgraphs with special statistical significance, which have some special functions in the network.
Network motifs satisfy the following conditions: the number of occurrences of the subgraph in the real network is not less than a minimum threshold and is significantly higher than its number of occurrences in random networks [13,27].
The statistical significance of a network motif is measured by the Z-score:

$$Z = \frac{N_{\mathrm{real}} - \langle N_{\mathrm{rand}} \rangle}{\sigma_{\mathrm{rand}}},$$

where $N_{\mathrm{real}}$ denotes the number of occurrences of a subgraph in the real network and $\langle N_{\mathrm{rand}} \rangle$ and $\sigma_{\mathrm{rand}}$ denote the mean and standard deviation of the number of occurrences of the subgraph in the random networks.

To determine the role tendentiousness of nodes whose role is unknown, we count how often each such node occurs in the network motifs that contain nodes of each known role.
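The Z-score test for motif significance described in this section can be sketched as follows. The threshold of 5 follows the note to Table 2. The minimum occurrence count `min_count`, the use of the population standard deviation, and the handling of a zero standard deviation are assumptions made for illustration, as are the example counts.

```python
from statistics import mean, pstdev


def motif_z_score(n_real, n_rand):
    """Z-score of a subgraph count against counts from random networks."""
    mu = mean(n_rand)
    sigma = pstdev(n_rand)  # population standard deviation (assumed choice)
    if sigma == 0:
        # Degenerate ensemble: every random network gave the same count.
        return float("inf") if n_real > mu else 0.0
    return (n_real - mu) / sigma


def is_motif(n_real, n_rand, z_min=5.0, min_count=1):
    """Apply both conditions: enough occurrences and a high enough Z-score."""
    return n_real >= min_count and motif_z_score(n_real, n_rand) > z_min


# Hypothetical counts: 30 occurrences in the real network,
# and 10, 12, 8, 10 occurrences in four random networks.
z = motif_z_score(30, [10, 12, 8, 10])
print(round(z, 2))                     # 14.14
print(is_motif(30, [10, 12, 8, 10]))   # True
```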

Frequency Matrix.
Based on the network motifs, the frequency matrix $F = (f_{ij})_{n \times m}$ is obtained. The element $f_{ij}$ of $F$ denotes the total number of occurrences of node $v_i$ in the network motifs that contain known-role nodes whose role is $r_j$.
The algorithm for calculating the frequency matrix is as follows.
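One way to compute such a frequency matrix is sketched below. It is not the authors' algorithm: it assumes that the motif occurrences have already been enumerated as tuples of node labels, and that a node is credited for role r_j once per occurrence in which some other member has the known role r_j. The function name and the toy data are hypothetical.

```python
def frequency_matrix(nodes, roles, known_role, motif_instances):
    """Count, per node and role, the motif occurrences shared with that role.

    nodes           -- list of node labels (row order of the matrix)
    roles           -- list of role names (column order of the matrix)
    known_role      -- dict mapping nodes with a known role to that role
    motif_instances -- iterable of tuples of nodes, one per motif occurrence
    """
    row = {v: i for i, v in enumerate(nodes)}
    col = {r: j for j, r in enumerate(roles)}
    F = [[0] * len(roles) for _ in nodes]
    for instance in motif_instances:
        members = set(instance)
        for v in members:
            # Roles of the other known-role members (assumed counting rule).
            roles_seen = {known_role[u] for u in members
                          if u != v and u in known_role}
            for r in roles_seen:
                F[row[v]][col[r]] += 1
    return F


# Toy data: "a" and "b" have known roles, "u" and "w" do not.
nodes = ["a", "b", "u", "w"]
roles = ["conspirator", "nonconspirator"]
known_role = {"a": "conspirator", "b": "nonconspirator"}
motif_instances = [("a", "u", "w"), ("a", "u", "b"), ("b", "w", "u")]

F = frequency_matrix(nodes, roles, known_role, motif_instances)
print(F)  # [[0, 1], [1, 0], [2, 2], [1, 1]]
```

Each row of the result summarizes how often a node appears alongside each known role, which is the input the evaluation functions work from.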

Tendentiousness Evaluation Function (TEF).
Based on the frequency matrix $F$, the tendentiousness of node $v_i$ with respect to role $r_j$ is evaluated by the TEF.
Obviously, the greater the TEF value, the greater the tendentiousness of node $v_i$ with respect to role $r_j$.

Figure 1: Visualization of the network model of the 83 people (nodes) and the 400 messages between them (links).

Importance Evaluation Function (IEF).
Based on point centrality and the TEF, the importance of node $v_i$ with respect to role $r_j$ in the network is evaluated by the IEF.

Application Analysis
The Intergalactic Crime Modelers (ICM) are investigating a conspiracy to commit a criminal act. The case involves 83 people and 400 messages between them, as shown in Figure 1. As known a priori from [28], Jean, Alex, Elsie, Paul, Ulf, Yao, and Harvey are conspirators, and Darlene, Tran, Jia, Ellin, Gard, Chris, Paige, and Este are nonconspirators. We now analyze the set of known conspirators and the set of known nonconspirators by using the theory and methods of network motifs.
First, let $r_1$ be "conspirator" and let $r_2$ be "nonconspirator". The links are then divided into two categories: daily topics, denoted by topic 1, and conspiracy topics, denoted by topic 2.
Based on the adjacency matrix of the network, the point centrality of the nodes is calculated by using formula (2), as shown in Table 1. The point centrality reflects the influence of a node in the network: the larger the point centrality, the greater the influence of the node.

Frequent Graph.
Frequent subgraph mining is an important method of network information mining. Frequent subgraph mining algorithms are divided into breadth-first search (BFS) algorithms and depth-first search (DFS) algorithms according to the subgraph search path [23]. As a breadth-first search algorithm, the Apriori graph mining (AGM) algorithm is an early adopter of the Apriori idea. The AGM algorithm represents a graph by its adjacency matrix. It then generates a code from the adjacency matrix and takes the minimum code as the unique identification of the graph in order to address the NP-complete problem of subgraph isomorphism [24]. The graph which is constituted by node set

The nodes of a network often have many roles, such as teacher and student. Most social networks can be modeled by the role network model (RNM). The role set is denoted by $R = \{r_j \mid j = 1, 2, \ldots, m\}$, where $m$ is the number of roles. The set of nodes whose role is $r_j$ ($j = 1, 2, \ldots, m$) is denoted by $V_{r_j}$, so the collection of these sets is denoted by $V_R = \{V_{r_1}, V_{r_2}, \ldots, V_{r_m}\}$. $S$ denotes the set of subgraphs in the network whose structure is the same as that of the network motifs.

Table 2: Samples of frequent subgraphs and network motifs. Note: S1-2 are not network motifs ($Z < 5$); M1-8 are network motifs ($Z > 5$), which are structural modules with special features (the minimum Z-score is 5).