The increase in software complexity directly leads to more software faults and incurs high costs during software development and maintenance. A complex network model is used to study the accumulation and propagation of faults in complex software as a whole. Key nodes with a high fault probability and a strong fault propagation capability can then be found, so that faults are discovered as early as possible and the severity of damage to the system is reduced effectively. In this paper, the algorithm MFS_AN (mining fault severity of all nodes) is proposed to mine key nodes from a software network. A weighted software network model is built with functions as nodes, call relationships as edges, and call times as weights. Using a recursive method, a fault probability metric FP of a function is defined according to the fault accumulation characteristic, and a fault propagation capability metric FPC of a function is proposed according to the fault propagation characteristic. Based on FP and FPC, the fault severity metric FS is put forward to identify the function nodes with larger fault severity in the software network. Experimental results on two real software networks show that MFS_AN can discover the key function nodes correctly and effectively.
With the development of computer technology and the expansion of software applications [
Knowledge of complex networks has been introduced into software engineering by using network models to represent the structural characteristics of software systems, and researchers have found many novel structural features from different points of view [
Accurately measuring the importance of nodes in a software network is a prerequisite for improving the security and reliability of software [
In complex networks, random walk models judge the importance of a node by considering both its own connectivity and the importance of its neighbouring nodes. Typical methods are PageRank, NodeRank [
In summary, this paper focuses on the call dependence relationships among functions and on fault accumulation and propagation during dynamic execution. First, we build a weighted software network model from the dynamic execution information of the software. Then, using a recursive method, the fault probability metric FP of a function is defined according to the fault accumulation characteristic, and the fault propagation capability metric FPC of a function is proposed according to the fault propagation characteristic. Finally, by combining FP and FPC, the fault severity metric FS is put forward, and the algorithm MFS_AN (mining fault severity of all nodes) is proposed to calculate FS and obtain the function nodes with larger fault severity in the software network.
The rest of this paper is organized as follows. Section
In a software network, the number of times one function calls another reflects how tightly the two nodes interact. To capture this difference, this paper constructs a weighted software network model, the weighted function execution network (WFEN):

$$WFEN = (NSet, ESet, Weight)$$

where NSet is the set of function nodes of the software network, ESet is the set of edges, i.e., the function call relationships observed during execution, and Weight denotes the number of times one function calls another.
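A minimal sketch of how the three components could be assembled, assuming the execution trace has already been reduced to a sequence of (caller, callee) call events; the function name `build_wfen` and the input format are hypothetical and not taken from the paper's tooling.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (caller, callee)


def build_wfen(call_events: Iterable[Edge]) -> Tuple[Set[str], Set[Edge], Dict[Edge, int]]:
    """Build (NSet, ESet, Weight) from observed (caller, callee) call events."""
    weight: Dict[Edge, int] = dict(Counter(call_events))  # Weight: call times per edge
    eset: Set[Edge] = set(weight)                          # ESet: distinct call relations
    nset: Set[str] = {f for edge in eset for f in edge}    # NSet: every function seen
    return nset, eset, weight


nset, eset, weight = build_wfen([("main", "A"), ("A", "B"), ("A", "B"), ("A", "C")])
print(weight[("A", "B")])  # 2
```

Repeated calls along the same edge collapse into one element of ESet, and their count becomes the edge weight.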
Figure
A portion of a weighted function execution network.
During execution, a function can be both a calling function and a called function. For a function node u, the nodes that u calls directly are its direct out-degree neighbor nodes, and the set of these nodes is called the Direct Out-degree Neighbor Set (DONS). Similarly, the set of in-degree neighbor nodes that call u directly is called the Direct In-degree Neighbor Set (DINS).
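Both neighbor sets follow directly from the edge set. The sketch below assumes edges are given as (caller, callee) pairs; `neighbor_sets` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def neighbor_sets(eset: Iterable[Tuple[str, str]]) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Return (DONS, DINS) for every function node, given (caller, callee) edges."""
    dons: Dict[str, Set[str]] = defaultdict(set)  # u -> functions that u calls directly
    dins: Dict[str, Set[str]] = defaultdict(set)  # u -> functions that call u directly
    for caller, callee in eset:
        dons[caller].add(callee)
        dins[callee].add(caller)
    return dons, dins


dons, dins = neighbor_sets({("A", "B"), ("A", "C"), ("D", "B")})
print(sorted(dins["B"]))  # ['A', 'D']
```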
Figure
Common topologies in software network.
Nodes B1 and C1 have call function sets of the same size; that is, they call the same number of nodes, but the call relationships among these nodes differ. Nodes B1 and D1 have execution routes of the same size, but D1 has a larger call function set and a more complex execution process. Therefore, nodes B1, C1, and D1 influence node A differently.
With the structure shown in Figure
where
Figure
A simple weighted function execution network.
In Figure
For nodes E and F, which belong to leaf nodes in the software network, then
Through the calculation of FP, the fault probability of a function node is identified. According to the above calculation results, the fault probability of each function node in Figure
The fault probability of a node (FP) in Definition
Via the above analysis, based on the fault accumulation characteristics of a function and the recursive method, we utilize the formula (
In Algorithm
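```
for each node U in NSet:
    if tempList contains U:                  // FP(U) was already computed during an earlier recursion
        FPList.add(U, tempList.get(U))
    else:
        value = MFP(U, outDegreeList, inDegreeList)
        FPList.add(U, value)
```

```
function MFP(U, outDegreeList, inDegreeList):
    Fa = 0
    for each V in outDegreeList[U]:          // V ranges over DONS(U)
        sum = 0
        for each I in inDegreeList[V]:       // every caller I of V
            count = Label(I, V)              // call times on edge I -> V
            sum += count
        count_UV = Label(U, V)
        P = count_UV / sum                   // share of V's incoming calls made by U
        Fa += P * MFP(V, outDegreeList, inDegreeList)
    FP = ...                                 // right-hand side lost in extraction; it incorporates Fa
    tempList.add(U, FP)
    return FP
```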
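The recursion above can be sketched in Python with a dictionary playing the role of tempList. The right-hand side of the FP assignment is not recoverable from the text, so this sketch assumes, by analogy with the FPC pseudocode, that FP(U) adds a node-local term to Fa, and takes that term as a hypothetical caller-supplied function `local_term`. The network must be acyclic (the paper removes loops before running the algorithm), and very deep call chains may hit Python's default recursion limit.

```python
from typing import Callable, Dict, Set, Tuple

Edge = Tuple[str, str]  # (caller, callee)


def compute_fp(
    nodes: Set[str],
    weight: Dict[Edge, int],
    dons: Dict[str, Set[str]],
    dins: Dict[str, Set[str]],
    local_term: Callable[[str], float],
) -> Dict[str, float]:
    """FP for every node of an acyclic WFEN, following the MFP recursion.

    local_term(u) stands in for the node-local part of FP, which is not
    recoverable from the text; it is a hypothetical, caller-supplied function.
    """
    memo: Dict[str, float] = {}  # plays the role of tempList

    def mfp(u: str) -> float:
        if u in memo:
            return memo[u]
        fa = 0.0
        for v in dons.get(u, set()):  # v in DONS(u)
            # total call times into v from all of its callers
            total = sum(weight[(i, v)] for i in dins[v])
            p = weight[(u, v)] / total  # share of v's incoming calls made by u
            fa += p * mfp(v)
        memo[u] = local_term(u) + fa  # ASSUMPTION: FP = local term + Fa
        return memo[u]

    return {u: mfp(u) for u in nodes}
```

For example, with edges A->B (2 calls), A->C (1 call), D->B (2 calls) and a hypothetical constant local term of 1, B and C are leaves with FP = 1, and FP(A) = 1 + (2/4)·1 + (1/1)·1 = 2.5.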
Similarly, this section defines the fault propagation capability metric FPC of a function according to the fault propagation characteristics.
where
Based on Figure
According to the definition of FPC, the fault propagation capability of all nodes in the software network is as follows:
Through the calculation of FPC, the fault propagation capability of a function node is identified. According to the above calculation results, the fault propagation capability of each function node in Figure
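```
for each node U in NSet:
    if tempList contains U:                  // FPC(U) was already computed during an earlier recursion
        FPCList.add(U, tempList.get(U))
    else:
        value = MFPC(U, outDegreeList, inDegreeList, inDegreeMax)
        FPCList.add(U, value)
```

```
function MFPC(U, outDegreeList, inDegreeList, inDegreeMax):
    inDegree = inDegreeList[U].size          // |DINS(U)|, number of direct callers of U
    Fp = 0
    for each V in inDegreeList[U]:           // V ranges over DINS(U)
        sum = 0
        for each I in outDegreeList[V]:      // every callee I of V
            count = Label(V, I)              // call times on edge V -> I
            sum += count
        count_VU = Label(V, U)
        P = count_VU / sum                   // share of V's outgoing calls that go to U
        Fp += P * MFPC(V, outDegreeList, inDegreeList, inDegreeMax)
    FPC = inDegree / inDegreeMax + Fp
    tempList.add(U, FPC)
    return FPC
```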
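A Python sketch of the same recursion, assuming an acyclic network and (caller, callee) edge weights; unlike FP, every term of FPC is visible in the pseudocode. The guard on `in_degree_max` (to avoid dividing by zero in a graph with no edges) is an addition of this sketch.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (caller, callee)


def compute_fpc(
    nodes: Set[str],
    weight: Dict[Edge, int],
    dons: Dict[str, Set[str]],
    dins: Dict[str, Set[str]],
) -> Dict[str, float]:
    """FPC for every node of an acyclic WFEN, following the MFPC recursion."""
    # inDegreeMax over the whole network; guarded so an edgeless graph does not divide by 0
    in_degree_max = max((len(dins.get(u, ())) for u in nodes), default=0) or 1
    memo: Dict[str, float] = {}  # plays the role of tempList

    def mfpc(u: str) -> float:
        if u in memo:
            return memo[u]
        fp_sum = 0.0
        callers = dins.get(u, set())  # DINS(u)
        for v in callers:
            # total call times out of v to all of its callees
            total = sum(weight[(v, i)] for i in dons[v])
            p = weight[(v, u)] / total  # share of v's outgoing calls that go to u
            fp_sum += p * mfpc(v)
        memo[u] = len(callers) / in_degree_max + fp_sum
        return memo[u]

    return {u: mfpc(u) for u in nodes}
```

With edges E->A (1), A->B (2), A->C (1), D->B (2), the maximum in-degree is 2, so FPC(A) = 1/2, and FPC(B) = 2/2 + (2/3)·(1/2) = 4/3: a function called by many callers accumulates a large propagation capability.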
Similarly, via the above analysis, based on the fault propagation characteristics of a function and the recursive method, we use the formula (
Some researchers believe that the type of fault determines the behaviour of transmission [
In Sections
A node with a higher FP is more likely to have faults and deserves more attention. However, if a function has faults but does not spread them, it will not cause very serious consequences for the software system, whereas a function that is both prone to failure and capable of spreading its faults to others will cause very serious consequences. Therefore, the fault severity of a function is directly proportional to both its fault probability FP and its fault propagation capability FPC. The fault severity FS is defined as follows.
where
First, we obtain the fault probability set FPList and the fault propagation capability set FPCList of the software network through Algorithms
In Algorithm
```
for each node U in NSet:
    FP(U) = FPList.get(U)
    FPC(U) = FPCList.get(U)
    FS(U) = FP(U) * FPC(U)
    FSList.add(U, FS(U))
FSList.sort()                                // descending order of FS
Knodes = FSList.get(K)                       // the top-K nodes
return Knodes
```
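The final ranking step is short. This sketch assumes FS is the product of FP and FPC, which is what the "directly proportional" argument and the use of both lists in the algorithm suggest; `top_k_by_fs` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def top_k_by_fs(fp: Dict[str, float], fpc: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Rank nodes by FS and return the K most severe (node, FS) pairs."""
    fs = {u: fp[u] * fpc[u] for u in fp}  # ASSUMPTION: FS = FP * FPC
    ranked = sorted(fs.items(), key=lambda item: item[1], reverse=True)  # descending FS
    return ranked[:k]


print(top_k_by_fs({"A": 2.5, "B": 1.0}, {"A": 0.5, "B": 2.0}, 1))  # [('B', 2.0)]
```

Because FS is a product, a node needs both a non-negligible FP and a non-negligible FPC to rank high; a node that is fault-prone but never called, or widely called but fault-free, scores low.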
In this section, we verify MFS_AN on two classic open-source tools, Tar and Cflow. Tar is a file archiving tool (commonly used together with compression), and Cflow is a C program analysis tool that traces the calling relationships among the functions of a C program. In the Linux environment, the functions of open-source software and their dependence relationships can be extracted with the pvtrace tool, which writes its results to text files (such as graph.dot); the nodes and dependence relationships can then be displayed graphically with the visualization tool Graphviz. Because the main function is inevitably important to any program, it is excluded from the following experiments. In addition, before the experiments, we preprocess the data and delete the loops in the software network so that the recursion terminates.
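The paper does not state how loops are deleted. One common option, assumed here and not necessarily the authors' procedure, is to run a depth-first search and drop every back edge (an edge to a node still on the DFS stack, including self-loops of directly recursive functions). The remaining edges form a directed acyclic graph, so the FP and FPC recursions terminate. `remove_back_edges` is a hypothetical name.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (caller, callee)


def remove_back_edges(dons: Dict[str, Set[str]]) -> Set[Edge]:
    """Return the edges to keep so that the call graph becomes acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on the DFS stack / finished
    nodes = set(dons) | {v for vs in dons.values() for v in vs}
    color = {u: WHITE for u in nodes}
    kept: Set[Edge] = set()
    for root in sorted(nodes):
        if color[root] != WHITE:
            continue
        color[root] = GRAY
        stack = [(root, iter(sorted(dons.get(root, ()))))]  # iterative DFS
        while stack:
            u, children = stack[-1]
            v = next(children, None)
            if v is None:  # all callees of u explored
                color[u] = BLACK
                stack.pop()
            elif color[v] == GRAY:
                continue  # back edge u -> v closes a cycle: drop it
            else:
                kept.add((u, v))  # tree, forward, or cross edge: keep
                if color[v] == WHITE:
                    color[v] = GRAY
                    stack.append((v, iter(sorted(dons.get(v, ())))))
    return kept


print(sorted(remove_back_edges({"A": {"B"}, "B": {"B", "C"}, "C": {"A"}})))
# [('A', 'B'), ('B', 'C')]
```

Which edge of a cycle is dropped depends on the DFS visiting order, so different orders can yield different acyclic networks.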
By tracking the execution of Tar and Cflow, we obtain their dynamic execution information and construct the weighted function execution network WFEN as the experimental data. The fault probability FP and the fault propagation capability FPC of all functions are obtained by mapping the node set and call relationships of the software network to Algorithm
FS value distribution of Tar.
FS value distribution of Cflow.
From the results distribution shown in Figures
By analysing the node criticality in the two types of software from Figures
The rank of function nodes by FS for Tar versions (each cell: rank/FS value).

Node | Tar-1.21 | Tar-1.23 | Tar-1.25 | Tar-1.27 | Tar-1.28 |
---|---|---|---|---|---|
flush_archive | 1/11.178 | 1/10.253 | 1/9.238 | 1/9.221 | 1/9.208 |
dump_file | 2/8.509 | 2/8.256 | 2/8.306 | 2/8.357 | 2/8.408 |
dump_file0 | 3/6.287 | 3/6.120 | 3/6.150 | 3/6.150 | 3/6.150 |
find_next_block | 4/5.220 | 4/4.676 | 5/4.323 | 5/4.313 | 5/4.304 |
update_archive | 5/4.460 | 5/4.412 | 4/4.556 | 4/4.729 | 4/4.830 |
gnu_flush_write | 6/3.849 | 6/3.567 | 6/3.112 | 6/3.107 | 7/3.103 |
_gnu_flush_write | 7/3.475 | 7/3.246 | 8/2.744 | 8/2.740 | 9/2.737 |
create_archive | 8/2.873 | 8/3.004 | 7/2.944 | 7/3.104 | 6/3.170 |
dump_regular_file | 9/2.621 | 9/2.642 | 9/2.509 | 9/2.629 | 10/2.629 |
open_archive | 10/2.311 | 10/2.292 | 10/2.299 | 10/2.534 | 8/2.768 |
The rank of function nodes by FS for Cflow versions (each cell: rank/FS value; "-" means the node does not appear in that version's list).

Node | Cflow-1.0 | Cflow-1.1 | Cflow-1.2 | Cflow-1.3 | Cflow-1.4 |
---|---|---|---|---|---|
nexttoken | 1/14.878 | 1/15.642 | 1/15.642 | 1/13.266 | 1/14.610 |
yylex | 2/7.282 | 2/7.728 | 2/7.728 | 3/6.341 | 3/7.037 |
get_token | 3/7.255 | 3/7.670 | 3/7.670 | 2/6.382 | 2/7.063 |
yyparse | 4/2.370 | 4/2.492 | 4/2.530 | 4/2.656 | 4/2.514 |
parse_declaration | 5/2.175 | 5/2.308 | 5/2.355 | 5/2.547 | 5/2.349 |
parse_dcl | 6/1.949 | 7/2.046 | 6/2.188 | 6/2.455 | 6/2.241 |
expression | 7/1.890 | 6/2.069 | 7/2.069 | 7/1.831 | 7/1.775 |
yyrestart | 8/1.479 | 9/1.479 | 9/1.479 | 10/1.479 | 8/1.663 |
func_body | 9/1.377 | 8/1.510 | 8/1.510 | 9/1.513 | 11/1.381 |
append_to_list | 10/1.371 | 10/1.347 | 10/1.347 | - | - |
From the data shown in Tables
In order to illustrate the correctness of the key nodes, taking Cflow-1.4 as an example, we use indegree
According to the data in Table
In-degree and out-degree of the top-5 and bottom-5 nodes by FS in Cflow-1.4 (negative ranks count from the bottom of the list).

Rank | Node | In-degree | Out-degree |
---|---|---|---|
1 | nexttoken | 14 | 2 |
2 | get_token | 1 | 1 |
3 | yylex | 1 | 5 |
4 | yyparse | 1 | 5 |
5 | parse_declaration | 1 | 4 |
-5 | clear_active | 1 | 0 |
-4 | set_active | 1 | 0 |
-3 | compare | 1 | 0 |
-2 | depmap_alloc | 1 | 0 |
-1 | register_output | 1 | 0 |
Figures
Joint distribution of FP and FPC in Tar-1.28.
Joint distribution of FP and FPC in Cflow-1.4.
In social networks, the Independent Cascade (IC) model is a probabilistic propagation model used to study the influence maximization problem. When a node u is activated, it tries to activate each of its inactive neighbor nodes v with probability
In a software network, when a failed node u is called, it propagates its fault to each neighbor node that calls it with a probability
Based on the mining results for Tar and Cflow, the top-10 and bottom-10 nodes are selected as source nodes. The IC model is used to simulate the number of nodes each source can affect after being called, which reflects the severity of its failure. Because the IC model is random, we repeated the simulation 10 times for each version of each program and averaged the results, as shown in Figures
IC Model simulation results for different Tar versions.
IC Model simulation results for different Cflow versions.
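The IC simulation described above can be sketched as follows, with faults flowing from a failed callee to each function that calls it and each newly failed node getting exactly one chance per caller. The propagation probability is not recoverable from the text, so it is a hypothetical caller-supplied function `prob(u, v)`; the names `ic_affected` and `average_affected` are also hypothetical. Callers are visited in sorted order so that a fixed seed gives reproducible runs.

```python
import random
from typing import Callable, Dict, Set


def ic_affected(
    dins: Dict[str, Set[str]],
    source: str,
    prob: Callable[[str, str], float],
    rng: random.Random,
) -> int:
    """One IC run: return how many nodes (excluding source) end up failed."""
    failed = {source}
    frontier = [source]
    while frontier:
        newly_failed = []
        for u in frontier:
            # u's fault gets one chance to spread to each function v that calls u
            for v in sorted(dins.get(u, ())):
                if v not in failed and rng.random() < prob(u, v):
                    failed.add(v)
                    newly_failed.append(v)
        frontier = newly_failed
    return len(failed) - 1


def average_affected(
    dins: Dict[str, Set[str]],
    source: str,
    prob: Callable[[str, str], float],
    runs: int = 10,
    seed: int = 0,
) -> float:
    """Average affected count over repeated runs (the paper repeats 10 times)."""
    rng = random.Random(seed)
    return sum(ic_affected(dins, source, prob, rng) for _ in range(runs)) / runs


dins = {"B": {"A", "D"}, "A": {"E"}}  # A and D call B; E calls A
print(average_affected(dins, "B", lambda u, v: 1.0))  # 3.0
```

With probability 1 every caller chain fails (A, D, then E); with a probability strictly between 0 and 1 the averaged count varies with the seed, which is why the runs are repeated.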
As can be seen from Figures
MFS_AN measures node criticality from both the out-degree and the in-degree perspective over the whole network structure. In a directed graph, degree centrality is a classical algorithm that measures node criticality from out-degree and in-degree. We therefore compare MFS_AN with the degree centrality algorithm (denoted Degree). Tables
The comparison of node rankings in Cflow-1.4.

Node | MFS_AN (rank/FS) | Degree (rank/degree) |
---|---|---|
nexttoken | 1/14.610 | 1/16 |
get_token | 2/7.063 | 10/2 |
yylex | 3/7.037 | 6/6 |
yyparse | 4/2.514 | 6/6 |
parse_declaration | 5/2.349 | 7/5 |
parse_dcl | 6/2.241 | 5/7 |
expression | 7/1.775 | 4/8 |
yyrestart | 8/1.663 | 6/6 |
tree_output | 9/1.489 | 2/14 |
linked_list_iterate | 10/1.383 | 5/7 |
The comparison of node rankings in Tar-1.28.

Node | MFS_AN (rank/FS) | Degree (rank/degree) |
---|---|---|
flush_archive | 1/9.208 | 9/4 |
dump_file | 2/8.408 | 8/5 |
dump_file0 | 3/6.150 | 3/13 |
update_archive | 4/4.830 | 1/14 |
find_next_block | 5/4.304 | 7/6 |
create_archive | 6/3.170 | 4/9 |
gnu_flush_write | 7/3.103 | 11/2 |
open_archive | 8/2.768 | 9/4 |
_gnu_flush_write | 9/2.737 | 11/2 |
dump_regular_file | 10/2.629 | 4/9 |
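The Degree columns in the two comparison tables appear consistent with ranking nodes by total degree (in-degree plus out-degree) using dense ranking, in which tied nodes share a rank and the next distinct value takes the next integer: in Cflow-1.4, yylex, yyparse, and yyrestart all have 6/6 and parse_declaration follows at 7/5, and nexttoken's degree of 16 equals its in-degree 14 plus out-degree 2. The sketch below reproduces that scheme under this reading; `degree_dense_rank` is a hypothetical name.

```python
from typing import Dict, Tuple


def degree_dense_rank(in_deg: Dict[str, int], out_deg: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Map each node to (rank, degree) by total degree, ties sharing a dense rank."""
    nodes = set(in_deg) | set(out_deg)
    degree = {u: in_deg.get(u, 0) + out_deg.get(u, 0) for u in nodes}
    distinct = sorted(set(degree.values()), reverse=True)  # highest degree first
    rank_of = {d: r for r, d in enumerate(distinct, start=1)}
    return {u: (rank_of[degree[u]], degree[u]) for u in nodes}


# In/out-degrees of the top-5 Cflow-1.4 nodes from the degree statistics table
in_deg = {"nexttoken": 14, "get_token": 1, "yylex": 1, "yyparse": 1, "parse_declaration": 1}
out_deg = {"nexttoken": 2, "get_token": 1, "yylex": 5, "yyparse": 5, "parse_declaration": 4}
ranks = degree_dense_rank(in_deg, out_deg)
print(ranks["yylex"], ranks["yyparse"])  # (2, 6) (2, 6)
```

The ranks printed here are computed over these five nodes only, so they differ from the full-network ranks in the table; only the degree values and the tie handling are comparable.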
The node ranking lists presented in Tables
In summary, the proposed MFS_AN algorithm evaluates node criticality in software networks correctly and effectively. Using MFS_AN to identify the key nodes in a software network helps reduce software fault severity and improve the robustness and stability of software.
In this paper, a novel algorithm MFS_AN is proposed to evaluate the criticality of nodes in a software network by combining two characteristics: fault probability and fault propagation capability. Function nodes with a larger fault probability and a stronger fault propagation capability are regarded as key nodes. Through experiments, we analyse the FS distribution of the nodes across different software versions, observe how it evolves, and show that MFS_AN can discover the key function nodes in a software network correctly and effectively. In addition, the criticality of a function node is not determined by its degree, although the two are positively correlated to some extent. Furthermore, the key nodes make the software structure easier to understand and can substantially reduce the workload of testing and maintenance. In future research, we will focus on how to partition software into modules based on the key nodes.
The data used to support the findings of this study are available from the corresponding author upon request.
The authors declare that they have no conflicts of interest.
This work is supported by the National Key R&D Program of China under Grant No. 2016YFB0800700, the National Natural Science Foundation of China under Grant Nos. 61472341, 61772449, 61572420, 61807028, and 61802332, the Natural Science Foundation of Hebei Province China under Grant No. F2016203330, and the Advanced Program of Postdoctoral Scientific Research under Grant No. B2017003005.