Modeling Software Systems as Complex Networks: Analysis and Their Applications

Software systems are of great importance, and their quality influences every walk of our life. However, as their scale and complexity increase, we are unable to control their quality because little is known about their actual internal structure. "We cannot control what we cannot measure." Thus, to control these complex software systems, the first task is to measure their internal structure. In recent years, researchers have applied the theories and techniques of complex networks to systematically investigate the structure of software systems by representing them as networks (i.e., software networks), and many interesting and useful results have been revealed. In this work, we briefly review some recent research advances at the intersection of complex networks and software engineering, including modeling, analysis, and applications. Specifically, we first describe some novel techniques to model the structural details of a specific software system. Then, based on these modeling techniques, we introduce some research work on characterizing the static and dynamic structural properties of software systems. Third, we describe some promising applications of software networks in real-world scenarios. Finally, we suggest some future research topics.


Introduction
Nowadays, software systems are used in almost every walk of life. Thus, how to provide high-quality software has attracted a lot of attention. However, as the scale and complexity of software systems increase, it is hard to control the quality of a specific piece of software, especially when we know very little about its internal complexity [1,2]. It is well known that we cannot control what we cannot measure. Thus, to control the quality of software systems, the first task is to measure their internal complexity [3]. Software structure, which is defined as the software elements (e.g., attributes, methods, classes, interfaces, and packages) and their couplings (e.g., "method-call" couplings between methods and "inheritance" between classes), is one of the most important factors that influence software complexity and, in turn, software quality. Thus, how to measure and even control the complexity of a software system has been a challenge faced by many researchers [4]. There is an urgent need to develop a systematic approach to explore the internal structure of software systems in depth.
Networks (or graphs) provide a natural and adequate representation of software structure: software elements are nodes, and the couplings between software elements are edges (or links). Though network representation is not novel in software engineering, its form is simple and intelligible, which makes it feasible to analyze software structure using theories and techniques from the field of complex networks, and many significant discoveries and research results have been reported in the last decade [1,2]. Note that such a network representation of the internal structure of a specific software system is usually termed a "software network," a notion similar to "complex networks" [5].
In this work, we briefly review some recent research advances in the field of software networks, highlighting different techniques in modeling, analysis, and applications. Specifically, we first describe some novel techniques to model the structural details of a specific software system. Then, based on these modeling techniques, we introduce some research work on characterizing the static and dynamic structural properties of software systems. Third, we describe some promising applications of software networks in real-world scenarios. Finally, we provide some future research topics. Note that, in this work, we only review the most recent research work, published in the last seven years (2013 to 2019). For research work published before 2013, please refer to the reviews [1,2]. We focus on a brief review of the most recent research work in the field of software networks rather than on a detailed comparison. The rest of this paper is organized as follows: Section 2 introduces the related reviews on software networks. Section 3 describes the data set we used. Section 4 introduces the related advances in software networks from three perspectives, i.e., modeling, analysis, and applications. Section 5 outlines some future research topics. Finally, in Section 6, we conclude the paper.

Related Work
To the best of our knowledge, there are a total of three reviews related to software networks.
Li et al. [6] reviewed 36 research papers related to software networks in 2008. They organized the existing work into three groups, i.e., related work on discoveries of software structural properties, related work on models of software growth, and related work on software metrics based on software networks. In the related work on discoveries of software structural properties, they discussed some research work on revealing shared structural properties in software networks. In the related work on models of software growth, they discussed the proposed evolution models to characterize the software growth. In the related work on software metrics, they discussed the metrics which are based on software networks and are used to characterize software quality.
Pan's review examined 32 research papers on software networks published before 2011 [2]. He discussed the existing research from four perspectives, i.e., characterization of software networks, modeling of software network growth, measurement of software networks, and application of software networks in software engineering. In the "characterization of software networks," he reviewed the work that aims to characterize the properties of software structures, such as scale-free and small-world properties, at different levels of granularity. In the "modeling of software network growth," he reviewed the work that aims to propose an evolution model to explain the growth of software structures. In the "application of software networks in software engineering," he reviewed the work that applied software networks in software engineering practices such as software refactoring and software selection.
Šubelj and Bajec [1] also reviewed the related work on software networks. First, they reviewed the work on discovering shared structural properties such as scale-free and small-world phenomena. Second, they reviewed the work on characterizing the dynamical properties of software networks such as bug propagation. Third, they reviewed some work on the application of software networks such as refactoring and software abstraction.
Our current review is different from those of Li, Pan, and Šubelj. We cover a different time period, from 2013 to 2019. Thus, our focus is to review the very recent research work in the field of software networks and to shed some light on future research topics.

Data Set
We searched the 7 most popular digital libraries, i.e., ACM, IEEE, Springer, Scopus, ISI, ScienceDirect, and Compendex and Inspec, to obtain a relatively complete list of the primary research work. When searching the digital libraries, we used a search string that contains the major research terms and their alternative spellings from the titles and keywords of related work on software networks. We use the Boolean operators "AND" and "OR" to connect the major research terms and their alternative spellings, respectively. Note that, in a specific digital library, the search string should be adapted slightly according to the grammar that library uses. For example, in the Springer library, the search string is written as follows: (Java OR OO OR "object-oriented" OR "object oriented" OR package OR packages OR class OR classes OR interface OR interfaces OR method OR methods OR attribute OR attributes) AND ("software network" OR "software networks" OR "complex networks" OR "complex network" OR graph OR graphs).
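As a minimal sketch of how such a Boolean search string can be assembled, the following Python snippet joins alternative spellings with "OR" and the major term groups with "AND". The function names and the quoting rule (quote any term containing a space or a hyphen) are assumptions inferred from the Springer example above, not part of the original search protocol.

```python
def quote(term):
    # Assumed rule: phrases with a space or hyphen are quoted to match as a unit.
    return f'"{term}"' if " " in term or "-" in term else term


def build_query(term_groups):
    # Alternative spellings inside a group are joined by OR; groups are joined by AND.
    clauses = (
        "(" + " OR ".join(quote(t) for t in group) + ")" for group in term_groups
    )
    return " AND ".join(clauses)


software_terms = [
    "Java", "OO", "object-oriented", "object oriented",
    "package", "packages", "class", "classes",
    "interface", "interfaces", "method", "methods",
    "attribute", "attributes",
]
network_terms = [
    "software network", "software networks",
    "complex networks", "complex network", "graph", "graphs",
]

query = build_query([software_terms, network_terms])
print(query)
```

Adapting the string to another library's grammar would then only require changing `quote` or the join operators.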
Obviously, the results returned by the digital libraries may overlap. Thus, we identified and removed the redundant results. Furthermore, we also excluded research papers based on their titles, abstracts, and full texts. Finally, our data set contains 30 research papers (see the References section).

Analysis and Discussion
The research papers in our data set can be roughly categorized into three groups, i.e., modeling, analysis, and applications. Papers in the "modeling" category focus on novel techniques to model the structural details of a specific software system. Papers in the "analysis" category focus on characterizing the static and dynamic structural properties of software systems. Papers in the "application" category apply software networks to solve real-world problems in software engineering. The three categories are detailed in the following sections.

Modeling of Software Structure.
Different types of software networks have been proposed to represent the structural details of a specific software system at different levels of granularity, such as associated software graphs [7], class diagrams [8], and cyclic dependency graphs [9] (see Table 1). These software networks can be differentiated by their level of granularity (i.e., package level, class level, and method (or attribute) level). Furthermore, they can also be differentiated by the nature of their couplings, i.e., whether the couplings are directed or weighted.
As shown in Table 1, in the method-level software networks, nodes represent methods and edges (or links) represent the method call relations between methods. The frequency of method calls can be used to weight an edge (or link) to signify the coupling intensity between the two methods. Edges can also be directed to denote the coupling direction. Neither of the two existing software networks at the method level (i.e., weighted networks [10] and FCN [12]) describes the software structure very accurately. Weighted networks ignore the reference relations between methods and attributes, while FCN ignores the coupling direction. Thus, we can combine the two software networks to build a much more accurate software network, i.e., the weighted directed feature coupling network (WDFCN). We use the term feature to denote methods and attributes. In the WDFCN, nodes denote features, edges denote method-call relations and method-attribute reference relations, weights on the edges denote the coupling frequencies, and the directions of edges denote coupling directions.
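To make the WDFCN description concrete, here is a minimal Python sketch that builds such a network from a list of observed couplings. The input format (one pair per method call or method-attribute reference), the function name `build_wdfcn`, and the sample feature names are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def build_wdfcn(couplings):
    """Build a weighted directed feature coupling network.

    couplings: iterable of (source_feature, target_feature) pairs, one per
    observed method call or method-attribute reference (assumed input format).
    Returns {source: {target: weight}}, where weight is the coupling frequency.
    """
    counts = Counter(couplings)
    network = {}
    for (src, dst), freq in counts.items():
        network.setdefault(src, {})[dst] = freq
        network.setdefault(dst, {})  # features that are only targets are nodes too
    return network


# Hypothetical couplings extracted from a small program.
couplings = [
    ("A.run()", "B.load()"),   # method call, observed twice
    ("A.run()", "B.load()"),
    ("A.run()", "A.count"),    # method-attribute reference
    ("B.load()", "B.cache"),   # method-attribute reference
]
wdfcn = build_wdfcn(couplings)
print(wdfcn)
```

Storing the network as a dictionary of dictionaries keeps both the direction (outer key to inner key) and the weight (inner value) of every coupling.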
In the class-level software networks, nodes represent classes (or interfaces) and edges (or links) represent the couplings between classes (i.e., inheritance and implement) and the couplings between the methods and attributes that the classes contain (i.e., parameter, global variable, local variable, return type, and method call). Edges can also be assigned weights to signify the coupling intensity between classes (or interfaces) and can also be directed to denote the coupling direction. Among the existing class-level software networks, CCN and MCN are much more accurate than the others. However, they ignore two important coupling types between classes, i.e., the reference relations between the methods and attributes that the classes contain and the instantiate relations between classes. Thus, CCN and MCN can be improved by considering these two coupling types.
In the package-level software networks, nodes represent packages and edges (or links) represent the couplings between packages, which are derived from the couplings between classes. Edges can also be assigned weights to signify the coupling intensity and can also be directed to denote the coupling direction. Of the two existing types of package-level software networks, PDN is much more accurate than MPN. However, as with CCN and MCN, PDN and MPN also ignore two important coupling types between classes, i.e., the reference relations between the methods and attributes that the classes contain and the instantiate relations between classes. Thus, PDN and MPN can also be improved by considering these two coupling types.
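The derivation of package-level couplings from class-level couplings can be sketched as follows. This is only an illustration under stated assumptions: weights are summed over all class pairs that cross a package boundary, and couplings inside a single package are dropped. The function name, the data layout, and the sample classes are hypothetical; the reviewed models (PDN, MPN) may aggregate differently.

```python
def lift_to_packages(class_edges, package_of):
    """Aggregate weighted directed class-level edges into package-level edges.

    class_edges: {(src_class, dst_class): weight}
    package_of:  {class_name: package_name}
    """
    package_edges = {}
    for (src, dst), weight in class_edges.items():
        p_src, p_dst = package_of[src], package_of[dst]
        if p_src == p_dst:
            continue  # a coupling inside one package does not connect packages
        key = (p_src, p_dst)
        package_edges[key] = package_edges.get(key, 0) + weight
    return package_edges


# Hypothetical class-level network.
class_edges = {
    ("ui.Button", "core.Event"): 3,
    ("ui.Panel", "core.Event"): 2,
    ("ui.Panel", "ui.Button"): 5,
    ("core.Event", "util.Log"): 1,
}
package_of = {
    "ui.Button": "ui",
    "ui.Panel": "ui",
    "core.Event": "core",
    "util.Log": "util",
}
package_edges = lift_to_packages(class_edges, package_of)
print(package_edges)
```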
Note that, compared with the research work on software networks published before 2013, the main difference of the recently built software networks is that they take into consideration much more information in the software systems, such as different coupling types, coupling frequencies, and the nature of couplings. The recently built software networks are therefore much more accurate. However, they still ignore some information, e.g., the reference relations between the methods and attributes that the classes contain and the instantiate relations between classes. If the software networks we build are not very accurate, the results or findings obtained from experimental studies may contain errors. Thus, there is still much work to do.
In fact, how accurately software networks can describe software systems depends on the tools that are used to extract the information enclosed in the software. To the best of our knowledge, many research papers only provide their software network models and show their results or findings. They usually do not mention the tools that they used to build the software networks. Pan et al. [4,5,12] developed a software network analysis platform (SNAP) to build many types of software networks at different levels of granularity. Their tools can be obtained via the URLs provided in their work. With their tools, all the abovementioned software networks can be built.

Analysis of Software Networks.
Chaikalis and Chatzigeorgiou [18] proposed a network-based prediction model to characterize the growth of software systems. Their model takes into consideration both information from past data and domain-related rules.
Wang and Xiao [10] represented the runtime structure of the Linux operating system as a weighted network, where nodes represent functions and edges represent function calls. Based on the weighted network, they explored the execution process of Linux by using theories and techniques in complex networks. They found that the weight distribution follows a power-law distribution, the process management component of Linux plays the most important role, and the reliability of Linux declines with the versions from 3.15 to 4.4.
Yang et al. [11] modeled software systems as Function Call Networks (FCNs), where nodes represent functions and edges represent function calls. Based on the FCNs, they characterized the software structure using a set of measurements from the perspective of modularity, hierarchy, complexity, and fault propagation. They also proposed a model to quantify software structural quality, which gives a better understanding of the evolution of software systems.
Trindade et al. [19] represented software at the class level as Little House, where nodes represent classes and edges represent dependencies among classes. Based on the Little House, they analyzed 81 versions of 6 software systems and found some software evolution patterns. These patterns are further applied to define a software evolution model to characterize software evolution and growth.
Mathematical Problems in Engineering
Pan et al. [16] used a multilayer network at the class level to model software systems, where nodes are classes and interfaces and edges are different coupling types between classes (or interfaces). In their model, each specific type of coupling forms a layer. They used an aggregation approach to analyze the multilayer structure of a specific software system by using a set of 10 topological measures from complex networks. It is the first work to represent software systems as multilayer networks, providing a novel perspective from which to analyze software systems.
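The idea of a multilayer network in which each coupling type forms a layer, followed by aggregation into a single network, can be sketched in Python as below. This is an illustrative assumption rather than the aggregation procedure of [16]: here an aggregated edge is simply weighted by the number of layers that contain it, and the layer names, class names, and function names are hypothetical.

```python
def aggregate_layers(layers):
    """Collapse a multilayer network into one weighted directed network.

    layers: {coupling_type: set of (src, dst) edges}
    Returns {(src, dst): number of layers that contain the edge}.
    """
    aggregated = {}
    for edges in layers.values():
        for edge in edges:
            aggregated[edge] = aggregated.get(edge, 0) + 1
    return aggregated


def out_degree(aggregated):
    # One simple topological measure computed on the aggregated network.
    degree = {}
    for src, dst in aggregated:
        degree[src] = degree.get(src, 0) + 1
        degree.setdefault(dst, 0)
    return degree


# Hypothetical layers of a small class-level multilayer network.
layers = {
    "inheritance": {("Circle", "Shape"), ("Square", "Shape")},
    "method call": {("Canvas", "Shape"), ("Canvas", "Circle")},
    "parameter": {("Canvas", "Shape")},
}
flat = aggregate_layers(layers)
print(flat)
print(out_degree(flat))
```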
Note that the abovementioned papers on the analysis of software networks followed the traditional approach of exploring the growth of software systems and their structural properties from a structural perspective, but some of them took a new perspective. Specifically, Chaikalis and Chatzigeorgiou characterized software evolution from a network perspective, Wang and Xiao built the software network from the execution process of the software, Yang et al. characterized the software structure by using the dynamic process of faults, Trindade et al. used a model to characterize software evolution and growth, and Pan et al. used a multilayer network, which is a much more accurate software network model.

Software Metrics.
Gu et al. [20] proposed metrics to quantify the class cohesion from a complex network perspective.
Pan and Chai [3] modeled software systems at the class level as a weighted directed software network, and based on the network, they proposed a metric, NIN, to quantify the coupling intensity of two classes. They further proposed a metric to quantify class stability. In [14], they further proposed a simulation-based approach to calculate software stability, which is based on the analysis of change propagation dynamics in the software structure.
In [12], Pan et al. modeled software systems as feature coupling networks (FCNs), where nodes are methods and attributes and edges are method-call relations and method-attribute reference relations. Based on the FCNs, they borrowed ideas from community detection techniques in complex networks and used the metric "modularity" to quantify the modularity of a specific software system.
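For readers unfamiliar with the modularity metric from community detection, the following Python sketch computes the standard Newman-Girvan modularity of an undirected, unweighted network for a given partition: for each community, the fraction of edges inside it minus the squared fraction of edge ends attached to it. This is an assumption for illustration; the exact formulation used in [12] may differ, and the function name and sample data are hypothetical.

```python
def modularity(edges, community_of):
    """Newman-Girvan modularity Q of an undirected, unweighted network.

    edges:        list of (u, v) pairs, each undirected edge listed once
    community_of: {node: community label}
    """
    m = len(edges)
    intra = {}       # number of edges with both ends in the community
    degree_sum = {}  # total degree of the nodes in the community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(
        intra.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree_sum.items()
    )


# Two triangles of features joined by a single coupling (hypothetical data).
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
two_modules = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
one_module = {node: 1 for node in "abcdef"}

q_two = modularity(edges, two_modules)
q_one = modularity(edges, one_module)
print(q_two, q_one)
```

In this toy network, splitting the features into the two triangles scores higher than treating everything as one module, which matches the intuition that a modular design has dense couplings inside modules and sparse couplings between them.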
Obviously, the recent research work on software metrics followed the traditional line of thought of the related work published before 2013. The only difference is that it used a much more accurate software network model and characterized the software structure from a different perspective.

Bug Prediction.
Concas et al. [7] used an Associated Software Graph (ASG) to represent software systems at the class level, where nodes represent classes and edges represent the "inheritance," "composition," and "dependence" relations between classes. Based on the ASG, they computed the number of communities, the modularity of the software network, and other network metrics such as clustering coefficient, average path length, and mean degree. Then, they analyzed the correlation between these metrics and the number of bugs in the software. They found that medium-size systems with community structures tend to be buggy.
Yang et al. [11] proposed a software class network to represent software systems at the class level. In the software network, classes are nodes, and the calling relations between the methods that every pair of classes contain constitute the edges. Based on the software network, they proposed a set of metrics to characterize the software network structure and used some machine-learning algorithms to construct defect prediction models. Their results were promising.
Zakari et al. [21] proposed a software network at the statement level, where statements are nodes and the execution traces between statements are edges. Based on the software network, they computed two centrality metrics (i.e., degree centrality and closeness centrality) for fault diagnosis. Experimental results showed that their approach is promising and better than existing fault localization techniques.
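The two centrality metrics mentioned above can be computed with a few lines of Python. The sketch below uses the textbook definitions on an undirected, unweighted network (normalized degree, and the number of reachable nodes divided by the sum of shortest-path distances to them); treating the statement network as undirected, as well as the function names and the sample statements, are assumptions made for illustration and not the setup of [21].

```python
from collections import deque


def degree_centrality(adj):
    # Degree normalized by the maximum possible degree, n - 1.
    n = len(adj)
    return {v: len(neighbors) / (n - 1) for v, neighbors in adj.items()}


def closeness_centrality(adj):
    # Reachable nodes divided by the sum of shortest-path distances to them.
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


# Hypothetical statement-level network: s1 - s2 - s3 - s4 executed in sequence.
adj = {
    "s1": {"s2"},
    "s2": {"s1", "s3"},
    "s3": {"s2", "s4"},
    "s4": {"s3"},
}
deg = degree_centrality(adj)
clo = closeness_centrality(adj)
print(deg)
print(clo)
```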
Obviously, in the existing research work on fault prediction, software networks are usually used to calculate structural metrics, which are then correlated with bugs or used in traditional prediction models to improve fault prediction performance. However, the software networks used by the existing approaches are not very accurate, which makes the resulting metrics inaccurate. Thus, in the future, a much more accurate software network can be used to compute structural metrics.

Software Refactoring.
Pan et al. [22] modeled the software structure at the method level as SFN, where nodes represent methods and attributes and edges represent method-call relations and method-attribute reference relations. Then, they applied an evolutionary algorithm to optimize the software structure and detect the methods to be moved. In their algorithm, they optimized a function that is based on software modularity. In [23], Pan et al. proposed a similar approach to identify the classes to be moved.
In [24], Wang et al. represented software at the class level as a Class-Level Multirelation Directed Network (CMDN), where nodes are classes and edges are the couplings between classes, i.e., inheritance, association, and aggregation. Based on the CMDN, they used a community detection algorithm to identify many refactoring opportunities simultaneously. Experimental results showed that their approach is better than some existing approaches.
There are many other refactorings in object-oriented software systems. However, the existing work only considered three refactorings, i.e., move method refactoring, move field refactoring, and extract class refactoring. Many other refactorings such as extract method, pull up method, and inline class need further exploration.

Key Class Identification.
Meyer et al. [25] modeled software systems at the class level as a software network and applied the coreness in the k-core decomposition to measure class importance in the software network. The coreness is further used as a criterion to rank classes.
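As a minimal sketch of how coreness can be obtained, the Python code below performs a k-core decomposition of an undirected, unweighted network by repeatedly removing a node of minimum remaining degree. It is a simple quadratic-time variant of the standard peeling procedure, written for clarity; the function name and the sample classes are hypothetical, and the network construction used in [25] is not reproduced here.

```python
def coreness(adj):
    """Return {node: coreness} for an undirected network.

    adj: {node: set of neighbor nodes}
    """
    degree = {v: len(neighbors) for v, neighbors in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel off a node with the smallest remaining degree.
        v = min(remaining, key=degree.get)
        k = max(k, degree[v])  # the core index never decreases while peeling
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


# Hypothetical class network: a triangle of tightly coupled classes plus a helper.
adj = {
    "Parser": {"Lexer", "Ast"},
    "Lexer": {"Parser", "Ast"},
    "Ast": {"Parser", "Lexer", "Printer"},
    "Printer": {"Ast"},
}
core = coreness(adj)
ranking = sorted(core, key=core.get, reverse=True)
print(core)
print(ranking)
```

Classes with the same coreness are tied in such a ranking, so a secondary criterion would be needed to order them.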
Şora and Chirila [26] recently modeled software systems as a graph, where nodes represent classes and edges represent the couplings between classes. Weights are assigned to the edges to measure the coupling intensity. Then, they applied PR-U2-W, CONN-TOTAL, and CONN-TOTAL-W, respectively, to measure class importance.
In [27], Luo et al. proposed an extended call graph to represent methods and their calling relations and utilized a VertexRank algorithm to quantify the importance of methods.
In [28], He et al. modeled software systems as a weighted software network, where nodes are methods and edges are their calling relations. Then, they applied a PageRank-like algorithm to quantify the importance of methods.
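To illustrate what a PageRank-like ranking on a weighted call network looks like, the following Python sketch implements the standard weighted PageRank by power iteration. It is not the algorithm of [28], whose details are not given here; the damping factor of 0.85, the fixed iteration count, the uniform handling of methods that call nothing, and the sample methods are all assumptions.

```python
def weighted_pagerank(edges, damping=0.85, iterations=100):
    """Rank nodes of a weighted directed network.

    edges: {(caller, callee): weight}
    Returns {node: score}; the scores sum to 1.
    """
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_weight = {node: 0.0 for node in nodes}
    for (src, _dst), weight in edges.items():
        out_weight[src] += weight

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by methods that call nothing is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0)
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        # Each caller passes rank to its callees in proportion to call weight.
        for (src, dst), weight in edges.items():
            new_rank[dst] += damping * rank[src] * weight / out_weight[src]
        rank = new_rank
    return rank


# Hypothetical weighted call network: weight = number of calls.
calls = {
    ("main", "parse"): 3,
    ("main", "log"): 1,
    ("parse", "log"): 2,
    ("render", "log"): 1,
}
scores = weighted_pagerank(calls)
most_important = max(scores, key=scores.get)
print(scores)
print(most_important)
```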
In [15], Pan et al. modeled software systems at the class level as a weighted directed software network, where nodes are classes and interfaces, edges are the 7 types of couplings between classes (or interfaces), and the weights on the edges are the coupling frequencies. Then, they proposed a generalized k-core decomposition to quantify the importance of classes. In [5], they further proposed a multilayer software network at the class level. Based on the software network, they computed the importance of classes at each layer and then combined the per-layer importance values to obtain the final importance.
The software network proposed in [5] is the most accurate one in the existing research work. However, the authors ignored two important types of couplings, i.e., the reference relations between the methods and attributes that the classes contain and the instantiate relations between classes. Thus, there is still much room for improving the existing work on key class identification. Improved ranking algorithms can also be used to improve the performance of the existing approaches. To the best of our knowledge, there is no work on identifying important software elements at other levels of granularity.

Future Research Topics
Based on this brief review of the related work on software networks, we propose the following research topics for future work:
(i) More accurate software networks at different levels of granularity: for example, none of the existing software networks considers all the coupling types that might exist between classes. Thus, much work remains to incorporate more of the information in the software.
(ii) Runtime software networks: the majority of software networks are constructed statically from the source code or bytecode of a specific software system. Only one research paper [10] reported constructing a runtime software network. Thus, much work can be carried out on runtime software network modeling, analysis, and applications.
(iii) Software evolution model: software evolution models are used to characterize software evolution and growth. They should reflect the properties enclosed in software systems. Thus, if we find more properties of software, the evolution model can be updated.
(iv) Bug prediction: we can propose more software metrics to characterize software structure and further use them to improve bug prediction models.
(v) Software refactoring: more work can be carried out to identify other refactoring opportunities such as extract method, pull up method, and inline class.
(vi) Software comprehension: identifying important software elements can help people understand a specific software system. Much more work can be carried out on identifying important software elements at other levels of granularity (i.e., package level, method level, or even statement level). Furthermore, more work can also be performed to guide the specific comprehension process of a software system.
(vii) Service-oriented system analysis: software networks have also been used in service-oriented software systems. Pan et al. [29,30] used software networks to represent APIs and their couplings in service-oriented systems and applied a community detection algorithm to organize APIs into clusters. Thus, in the future, we can also perform service-oriented software modeling, analysis, and applications.

Conclusions
This paper briefly reviewed the recent advances in the research field of software networks from 2013 to 2019. First, we described the data set we used, i.e., the research work published in the period from 2013 to 2019. Then, we briefly described the existing work from three perspectives, i.e., modeling, analysis, and applications. Specifically, we reviewed the software networks that are used to model the structural details of specific software systems and highlighted the problems in the existing models. We briefly introduced some research work on characterizing the static and dynamic structural properties of software systems. We also described some promising applications of software networks in real-world scenarios such as software metrics, bug prediction, refactoring, and key element identification. Finally, we outlined some future research topics.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.