We study the application to object-oriented software of new metrics derived from Social Network Analysis (SNA). SNA metrics, such as the EGO metrics, make it possible to identify the role of each node in the information flow through the network; here the nodes are software modules and the links are their dependencies. These metrics are compared with traditional software metrics, such as the Chidamber-Kemerer suite, and with software graph metrics. We examine the empirical distributions of all the metrics, bugs included, across the software modules of several releases of two large Java systems, Eclipse and Netbeans. We provide analytical distribution functions suitable for describing and studying the observed distributions. We also study correlations among metrics and bugs. We found that the empirical distributions systematically show fat tails for all the metrics. Moreover, the various metric distributions look very similar and consistent across all system releases, and are also very similar in the two studied systems. These features appear to be typical properties of these software metrics.
Measuring software to get information about its properties and quality is one of the main issues in modern software engineering. Limiting ourselves to object-oriented (OO) software, one of the first works dealing with this problem is the one by Chidamber and Kemerer (CK), who introduced the popular CK metrics suite for OO software systems [
Modern software systems are made of many elementary units (software modules) interconnected in order to cooperate to perform specific tasks. In particular, in OO systems the units are the classes, which are in turn interconnected with each other by relationships like inheritance and dependency. Recently, it has been shown how these software systems may be analyzed using complex network theory [
Considering software systems as graphs is not a new approach, and different authors have already investigated some of their properties, like the distribution of Fan-in or Fan-out of network nodes [
Only recently, SNA has been applied to the study of software systems. Zimmermann and Nagappan used SNA metrics to investigate a network of binary dependencies [
In this paper, we study a set of releases of two large Open Source OO systems, Eclipse [
We study the relationships between these metrics and software fault-proneness—measured as the number of Bugs affecting software modules—and between them and more traditional software metrics. We also study the possibility of estimating the metric features for the future releases. For all the observed distributions, we performed best fits, finding analytical distributions able to model the system.
The systems analyzed are written in Java. All their classes are contained in Java source files, called Compilation Units (CU). A CU generally contains just one class, but less frequently it may contain two or more classes. We extracted the Bugs affecting files by merging information found in bug-tracking repositories, specifically Bugzilla [
We found that most of the studied metrics are distributed according to the Yule-Simon distribution [
The paper is organized as follows. In Sections
Product metrics, extracted by analyzing static code of software, have been used to build models that relate these metrics to failure-proneness [
CK metrics are intended to measure the degree of coupling and cohesion of classes in OO software contexts. However, the studies using CK metrics do not consider the amount of “information” passing through a given module of the software network. Social Network Analysis (SNA) fills this gap, providing a set of metrics able to extract a new, different kind of information from software projects. Recently, this ability of SNA metrics was successfully employed to study software systems. Zimmermann and Nagappan [
The Pareto principle (80–20 rule) and the presence of power-laws in the tail of the distributions of many properties of software systems, including Bugs, have already been observed [
We investigate whether the newly proposed SNA metrics possess the same properties and have similar empirical distributions. Moreover, the new metrics might show correlations with Bugs and/or with other metrics and properties, so it is desirable to study these correlations.
We also investigate whether there are analytical distribution functions that may be used to describe such empirical distributions and possibly to forecast future properties of the software systems.
Consequently, our research questions are the following. RQ1: Are there analytical distribution functions describing the empirical data? Do these functions have power-law behavior in their tails? What is the significance level of fitting the empirical data with these distributions? RQ2: Are these distributions similar in all the releases and in different systems, or do they tend to vary significantly? RQ3: Is it possible to use these distributions to estimate the metric values in subsequent releases? RQ4: Are there SNA metrics significantly correlated with software Bugs, and to what extent? RQ5: Are there SNA metrics significantly correlated with traditional CK metrics, and to what extent?
An oriented graph can be associated to an OO software system, whose nodes are classes and interfaces, and whose edges are the relationships between classes, namely, inheritance, composition, and dependence. This approach has already been used in the literature. In [
All these studies were devoted to exploiting general dependencies among pieces of code in different software modules. With the same aim, in our study we do not distinguish among the various kinds of software relationships and, with regard to SNA metrics, for simplicity we do not even consider edge orientation, which would imply the construction of different EGO networks for the different kinds of links. Ours is a static analysis. Furthermore, since our software nodes are CUs, as explained later, many relationships among Java classes lose their original meaning at this granularity level. Our purpose is to focus on the role of the interactions among the software elements.
The number and orientation of edges allow us to study the coupling between nodes, that is, between classes. In this graph, the in-degree of a class, or Fan-in, is the number of edges directed toward the class. It measures how much this class is used by other classes of the system. The out-degree of a class, or Fan-out, is the number of edges leaving the class. It represents the level of usage the class makes of other classes in the system. Besides Fan-in and Fan-out, we also computed, for each class, four CK metrics that were observed to be significantly correlated with the number of Bugs. They are as follows. Weighted Methods per Class (WMC): a weighted sum of all the methods defined in a class. We set the weighting factor to one, to simplify our analysis. Coupling Between Objects (CBO): the number of classes to which a given class is coupled. Response For a Class (RFC): the sum of the number of methods defined in the class and the cardinality of the set of methods called by them and belonging to external classes. Lack of Cohesion of Methods (LCOM): the difference between the number of noncohesive method pairs and the number of cohesive pairs.
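As an illustration of the Fan-in and Fan-out definitions above, the following minimal sketch counts in-links and out-links from a directed edge list. The function name `fan_in_out`, the edge representation, and the choice to ignore duplicate edges and self-loops are assumptions of this sketch, not details taken from the study.

```python
from collections import Counter


def fan_in_out(edges):
    """Return (fan_in, fan_out) Counters for a directed edge list.

    Each edge is a (source, target) pair meaning "source uses target".
    Duplicate edges and self-loops are ignored (an assumption of this sketch).
    """
    fan_in, fan_out = Counter(), Counter()
    for src, dst in set(edges):
        if src == dst:
            continue
        fan_out[src] += 1  # one more edge leaving src
        fan_in[dst] += 1   # one more edge directed toward dst
    return fan_in, fan_out


# Hypothetical toy system: A and B both depend on C, and A also depends on B.
edges = [("A", "C"), ("B", "C"), ("A", "B")]
fan_in, fan_out = fan_in_out(edges)
print(fan_in["C"], fan_out["A"])  # 2 2
```

A node that never appears as a target simply has Fan-in zero, which `Counter` returns by default.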
We also computed the lines of code of the class (LOC), excluding blanks and comment lines. This is useful to keep track of the class size, because it is known that a “big” class is more difficult to maintain than a smaller class.
Every class is contained in a Java file, called CU. While most files include just one class, there are files including two or more classes. In Eclipse, about 10% of CUs host more than one class, whereas in Netbeans this percentage is about 30%.
While OO metrics and class graphs usually refer to classes, Bugs and Issues are typically associated with CUs, because the logs of coding efforts aimed at fixing Bugs are associated with changes to the source code, which are made to files (the CUs). Since the number of Bugs is of paramount importance in defining software quality, we decided to base our analysis on CUs, to make Issue tracking consistent with the source code. Consequently, we extended CK metrics from classes to CUs. CUs are therefore the main element of our study.
We defined a CU graph, whose nodes are the CUs of the system. Two nodes, CU LOC is the sum of the LOCS of the classes contained in the CU; CU CBO is the number of out-links of each node, excluding those representing inheritance. This definition is consistent with that of CBO metrics for classes; CU LCOM and CU WMC are the sum of LCOM and WMC metrics of the classes contained in the CU, respectively; CU RFC is the sum of weighted out-links of each node, each out-link being multiplied by the number of specific distinct relationships between classes belonging to the CUs connected to the related edge.
For each CU, we thus have a set of 7 metrics: In-links (Fan-in), Out-links (Fan-out), CU-LOCS, CU-LCOM, CU-WMC, CU-RFC, and CU-CBO. These metrics were computed for the CUs of all versions of Eclipse and Netbeans.
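The aggregation from class metrics to CU metrics can be sketched as follows. This is only an illustration under stated assumptions: the data layout, the function name `cu_metrics`, and the edge label `"inheritance"` are hypothetical, links between classes of the same CU are ignored by choice, and CU-RFC is omitted because it needs per-relationship weights.

```python
from collections import defaultdict


def cu_metrics(class_to_cu, class_metrics, class_edges):
    """Aggregate class-level data to CU level.

    class_to_cu:   {class_name: cu_name}
    class_metrics: {class_name: {"LOC": int, "LCOM": int, "WMC": int}}
    class_edges:   iterable of (src_class, dst_class, kind); `kind` is a
                   hypothetical label such as "inheritance" or "dependency".
    """
    result = defaultdict(lambda: {"LOC": 0, "LCOM": 0, "WMC": 0})
    # CU LOC, LCOM and WMC are sums over the classes contained in the CU.
    for cls, cu in class_to_cu.items():
        for name in ("LOC", "LCOM", "WMC"):
            result[cu][name] += class_metrics[cls][name]

    # CU CBO: number of distinct out-links, excluding inheritance.
    # Links internal to a CU are skipped (an assumption of this sketch).
    out_links = defaultdict(set)
    for src, dst, kind in class_edges:
        src_cu, dst_cu = class_to_cu[src], class_to_cu[dst]
        if kind != "inheritance" and src_cu != dst_cu:
            out_links[src_cu].add(dst_cu)
    for cu in result:
        result[cu]["CBO"] = len(out_links[cu])
    return dict(result)


# Hypothetical example: Foo.java hosts two classes, Bar.java hosts one.
class_to_cu = {"Foo": "Foo.java", "FooHelper": "Foo.java", "Bar": "Bar.java"}
class_metrics = {
    "Foo": {"LOC": 100, "LCOM": 3, "WMC": 10},
    "FooHelper": {"LOC": 20, "LCOM": 0, "WMC": 2},
    "Bar": {"LOC": 50, "LCOM": 1, "WMC": 5},
}
class_edges = [
    ("Foo", "Bar", "dependency"),
    ("FooHelper", "Bar", "dependency"),
    ("Bar", "Foo", "inheritance"),
    ("Foo", "FooHelper", "dependency"),
]
print(cu_metrics(class_to_cu, class_metrics, class_edges)["Foo.java"])
```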
Once the CU software graph is defined, we can compute on this graph the metrics used in Social Network Analysis. We restricted ourselves to the subset of SNA metrics that were found most correlated to software quality [
Other SNA metrics we considered, not directly related to the EGO network, are some centrality metrics, which determine how important a given node or edge is relative to the other nodes or edges in the network. Overall, we consider the following SNA metrics.
- Size: size of the EGO network related to the considered node (i.e., Compilation Unit); it is the number of nodes of the EGO network.
- Ties: number of edges of the EGO network related to the node.
- Brokerage: the number of pairs not directly connected in the EGO network, excluding the EGO node.
- Eff-size: effective size of the EGO network; the number of nodes in the EGO network minus one, minus the average number of ties that each node has to other nodes of the EGO network.
- Nweak-comp: normalized Number of Weak Components; the number of disjoint sets of nodes in the EGO network without the EGO node and the edges connected to it, divided by Size.
- Reach-Efficiency: the percentage of nodes within two-step distance from a node, divided by Size.
- Closeness: the sum of the lengths of the shortest paths from the node to all other nodes.
- Information Centrality: the harmonic mean of the length of paths starting from all nodes of the network and ending at the node.
- DwReach: the sum of all nodes of the network that can be reached from the node, each weighted by the inverse of its geodesic distance. The weights are thus 1/1, 1/2, 1/3, and so on.
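A few of the metrics listed above can be sketched on a small undirected graph. The sketch below makes several assumptions that the text does not settle: the EGO network is taken to be the node plus its direct neighbors, Size counts the EGO node itself, Ties counts every edge of the EGO network, and Eff-size averages only the ties among neighbors. The function names are hypothetical and the tool actually used in the study may follow different conventions.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Undirected adjacency sets from an iterable of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def ego_metrics(adj, ego):
    """Size, Ties, Brokerage and Eff-size of the EGO network of `ego`."""
    alters = adj[ego]  # direct neighbors of the EGO node
    n = len(alters)
    # Edges among neighbors only (edges that do not touch the EGO node).
    alter_ties = sum(1 for u, v in combinations(alters, 2) if v in adj[u])
    pairs = n * (n - 1) // 2  # unordered pairs of neighbors
    return {
        "size": n + 1,                    # assumption: EGO node is counted
        "ties": n + alter_ties,           # EGO-neighbor edges plus neighbor-neighbor edges
        "brokerage": pairs - alter_ties,  # neighbor pairs not directly connected
        # (size - 1) minus the average number of ties each neighbor has to other neighbors
        "eff_size": n - (2 * alter_ties / n if n else 0.0),
    }


def dw_reach(adj, node):
    """Sum over reachable nodes of 1 / geodesic distance, via breadth-first search."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sum(1.0 / d for d in dist.values() if d > 0)


# Hypothetical toy graph: a is linked to b, c, d, and b is linked to c.
adj = build_graph([("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")])
m = ego_metrics(adj, "a")
print(m["size"], m["ties"], m["brokerage"], round(m["eff_size"], 3))  # 4 4 2 2.333
print(dw_reach(adj, "d"))  # 2.0
```

In the toy graph, node d reaches a at distance 1 and b and c at distance 2, so its DwReach is 1 + 1/2 + 1/2.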
All previous metrics are computed on the CU graph and are among those studied in [
We analyze the correlations among all of these metrics, as well as with the other metrics and with Bugs. For some metrics, we analyzed the statistical distributions and performed best fits with analytical distribution functions.
Bug Tracking Systems (BTSs) are commonly used to keep track of the Bugs, enhancements, and features of software systems, collectively called “Issues”. The open source systems studied, Eclipse and Netbeans, use the BTSs Bugzilla and Issuezilla, respectively.
Each Issue inside a BTS is uniquely identified by a positive integer, the Issue-ID. For each tracked Issue, the BTS stores its characteristics, life-cycle, the software releases where it appears, and other data. In Bugzilla, a valid Bug is an Issue with a resolution of “fixed”, a status of “closed”, “resolved”, or “verified”, and a severity that is not “enhancement”, as pointed out in Eaddy et al. [
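The validity rule stated above translates directly into a predicate. In this minimal sketch the dictionary keys `resolution`, `status`, and `severity` and the lowercase comparison are assumptions about how the Issue records might be represented; they are not the actual Bugzilla export format.

```python
VALID_STATUSES = {"closed", "resolved", "verified"}


def is_valid_bug(issue):
    """Return True if an Issue record qualifies as a valid Bug.

    `issue` is a dict with the hypothetical keys 'resolution', 'status'
    and 'severity', each holding a string.
    """
    return (
        issue["resolution"].lower() == "fixed"
        and issue["status"].lower() in VALID_STATUSES
        and issue["severity"].lower() != "enhancement"
    )


# Hypothetical records: the second is an enhancement, so it is an Issue but not a Bug.
issues = [
    {"id": 141181, "resolution": "FIXED", "status": "VERIFIED", "severity": "major"},
    {"id": 141182, "resolution": "FIXED", "status": "CLOSED", "severity": "enhancement"},
]
print([i["id"] for i in issues if is_valid_bug(i)])  # [141181]
```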
Software configuration management systems like CVS (Concurrent Version System) keep track of all maintenance operations on software systems. These operations are recorded inside CVS in an unstructured way; it is not possible, for instance, to query CVS to find out which operations were done to fix Bugs or to introduce a new feature or enhancement. In order to identify the Issues (Bugs) affecting the systems' CUs, we had to match the data stored in the BTS with the data recorded in the CVS repositories of Eclipse and Netbeans.
All commit operations are recorded in the CVS log as single entries. Each entry contains various data, among which are the date, the developer who made the changes, a text message describing the reasons for the commit, and the list of CUs affected by the commit. The only way to obtain a correct mapping between Issues and the related CUs is to analyze the CVS log messages and identify the commits associated with maintenance operations in which Issues are fixed. If a maintenance operation is done on a CU to address an Issue, we consider the CU as affected by this Issue.
In our approach, we first analyzed the text of commit messages, looking for Issue-IDs. Commit messages may contain strings such as “Fixed 141181” or “bug no. 141181”, but sometimes only the Issue-ID is reported. Unfortunately, every positive integer is a potential Issue-ID, yet some numbers refer to maintenance operations not related to Issue resolution, such as branching, data, release numbers, and copyright updates.
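The search for candidate Issue-IDs in commit messages can be illustrated with a regular expression. This is a minimal sketch, not the procedure actually used: the function name is hypothetical, and it assumes that the set of Issue-IDs known to the BTS for the release (`known_ids`) is available, so that unrelated numbers such as copyright years are discarded.

```python
import re

# Any standalone run of digits is a candidate Issue-ID.
NUMBER = re.compile(r"\b\d+\b")


def candidate_issue_ids(message, known_ids):
    """Return the integers in a commit message that are also known Issue-IDs."""
    found = {int(token) for token in NUMBER.findall(message)}
    return sorted(found & set(known_ids))


# Hypothetical set of Issue-IDs tracked for one release.
known_ids = {141181, 150042}

print(candidate_issue_ids("Fixed 141181", known_ids))                       # [141181]
print(candidate_issue_ids("bug no. 141181, copyright 2006", known_ids))     # [141181]
print(candidate_issue_ids("Updated copyright notice to 2007", known_ids))   # []
```

Filtering against a known set is one simple way to reduce false positives; it does not remove them entirely, since a release number or date could coincide with a real Issue-ID.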
To avoid wrong mappings between Issue-IDs and CUs, we applied the following strategies. For each release, a CU can be hit only by Issues which are referred to in the BTS belonging to the same release. We did not consider some numeric intervals particularly prone to host false positive Issue-IDs.
The latter condition is not particularly restrictive in our study, because we did not consider the first releases of the studied projects, where Issues with “low” ID appear. All IDs not filtered out are considered Issues and are associated with the addition or modification of one or more CUs, as reported in the commit logs. This method might not completely address the problems in the mapping between bugs and CUs [
- 10% of CU-bug(s) associations (randomly chosen) for each release,
- each CU-bug association for 6 subprojects (3 for Eclipse and 3 for Netbeans) without finding any error. A bias may still remain due to lack of information on CVS [
The total number of Issues affecting a CU in each release constitutes the Issue-metric we consider in this study, while the subset of Issues satisfying the conditions as in Eaddy et al. is the Bug-metric [
We systematically analyzed several main releases of the Eclipse and Netbeans projects, namely, releases 2.0 to 3.4 of Eclipse and releases 3.2 to 6.1 of Netbeans. For each release, we computed the class graph and the resulting CU graph, and computed all the metrics described above at CU level. We analyzed the statistical distributions of the metrics among the systems' CUs, which are our graph nodes, as well as the Bugs and Issues distributions. Note that we used CU metrics to be able to study more easily their relationships with Bugs and Issues. However, we verified that the behavior of CU metrics is very similar to the behavior of the corresponding class metrics, for all considered metrics.
Tables
Number of CUs of Eclipse for each release.
Release | 2.0 | 2.1 | 3.0 | 3.1 | 3.2 | 3.3 | 3.4 |
---|---|---|---|---|---|---|---|
Number of CU | 6391 | 7545 | 10288 | 11854 | 14138 | 15439 | 17387 |
Release date | 06-2002 | 03-2003 | 06-2004 | 06-2005 | 06-2006 | 06-2007 | 05-2008 |
Number of CUs of Netbeans for each release.
Release | 3.2 | 3.3 | 3.4 | 4.0 | 6.0 | 6.1 |
---|---|---|---|---|---|---|
Number of CU | 3346 | 4383 | 6264 | 9317 | 31425 | 35034 |
Release date | 04-2001 | 11-2001 | 08-2002 | 12-2004 | 12-2007 | 04-2008 |
In the following figures, we systematically report the experimental CCDF (Complementary Cumulative Distribution Function) in log-log scale, as well as the best-fitting curves in many cases. This is convenient because, if the PDF (probability distribution function) has a power-law in the tail, the log-log plot displays a straight line for the raw data. This is a necessary but by no means sufficient condition for power-law behavior. Thus we used log-log plots only for convenience of graphical representation; all our calculations (CDF, CCDF, best-fit procedures, and the analytical distribution functions themselves) are always in normal scale.
The problems with representing the experimental PDF are that it is sensitive to the binning of the histogram used to calculate the frequencies of occurrence, and that bins with very few elements are very sensitive to statistical noise. This causes a noisy spread of the points in the tail of the distribution, where the most interesting data lie. Furthermore, because of the binning, the information relative to each single datum is lost. All these aspects make it difficult to verify the power-law behavior in the tail. Thus, we adopted the CCDF representation, which presents various advantages. With this representation, there is no dependence on the binning, nor artificial statistical noise added to the tail of the data. If the PDF exhibits a power-law, so does the CCDF, with an exponent increased by one (a PDF tail proportional to $x^{-\alpha}$ gives a CCDF tail proportional to $x^{-\alpha+1}$). Fitting the tail of the CCDF, or even the entire distribution, results in a major improvement in the quality of fit. An exhaustive discussion of all these issues may be found in [
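The binning-free nature of the CCDF can be seen in a minimal sketch of how an empirical CCDF may be computed from raw metric values. The function name and the convention $P(X \geq x)$ are assumptions of this sketch; the convention is chosen so that the largest observation keeps a nonzero probability and can be drawn on a log-log plot.

```python
def empirical_ccdf(values):
    """Return (xs, ccdf): the distinct sorted values and P(X >= x) for each.

    No histogram is built, so the result does not depend on any binning
    and every single observation contributes to the curve.
    """
    data = sorted(values)
    n = len(data)
    xs, ccdf = [], []
    for i, x in enumerate(data):
        # The first occurrence of x sits at index i, so n - i samples are >= x.
        if i == 0 or x != data[i - 1]:
            xs.append(x)
            ccdf.append((n - i) / n)
    return xs, ccdf


# Hypothetical Fan-in values for five CUs.
xs, ccdf = empirical_ccdf([1, 1, 2, 3, 10])
print(xs)    # [1, 2, 3, 10]
print(ccdf)  # [1.0, 0.6, 0.4, 0.2]
```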
We were able to obtain high quality best fits using three different distribution functions, all compatible with a power-law behavior in the tail. This approach has already been proposed in the literature to explain the power-law in the tail of various software properties [
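The CCDF is defined as $G(x) = 1 - P(x)$, where $P(x)$ is the cumulative distribution function (CDF); $G(x)$ is thus the probability that the variable takes a value greater than $x$.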
The log-normal distribution has been also proposed in the literature to explain different software properties [
It exhibits a quasi-power-law behavior for a range of values and provides high quality fits for data with a power-law distribution and a final cut-off. Since in real data the largest values are always limited and cannot actually tend to infinity, the log-normal is a very good candidate for fitting power-law distributed data with a finite-size effect. Furthermore, it does not diverge for small values of the variable, and thus may also fit well the bulk of the distribution in the small values range.
The Yule-Simon distribution is expressed through the Euler Gamma function and has two parameters:
We started the analysis by computing the empirical CCDFs of the software network metrics for the various systems studied. The empirical distributions of all considered SNA metrics show the same shape for all releases, both in Eclipse and Netbeans. Therefore, we show only the figures for some selected metrics for the last considered releases of the studied systems, namely, Eclipse-3.4 and Netbeans-6.0.
Figure
CCDF of SNA metrics for Eclipse 3.4 release. The name of each metric is at the top of its box. The power-law behavior in the tail is evident for all metrics.
CCDF of SNA metrics for Netbeans 6.0 release. The name of each metric is at the top of its box.
In order to compare the empirical distributions across the releases, we show in the same plot two SNA metrics, Effective Size and Brokerage, for both Eclipse and Netbeans, to highlight their overlap. Figure
CCDF of EffSize and Brokerage metrics for various Eclipse and Netbeans releases. A very similar behavior is evident for all metrics and across all releases of the same system.
The empirical distributions of all considered metrics largely preserve the same shape, meaning that, for each specific metric, a single distribution function may account for the empirical data for all the system releases. Moreover, the distributions of the same metric also look very similar in Eclipse and Netbeans releases. Thus, once this distribution is known for one metric in one release, it is possible to infer the properties of the same metric in other releases, provided that the number of CUs is known.
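One simple reading of this statement is that the share of CUs above a given metric value stays roughly constant across releases, so it can be rescaled by the number of CUs of another release. The sketch below illustrates that reading only; the function name, the sample values, and the target size are hypothetical, and the study itself relies on fitted analytical distributions rather than on this direct rescaling.

```python
def expected_count_above(reference_values, threshold, n_target):
    """Estimate how many CUs of a release with n_target CUs have metric >= threshold.

    Assumes the CCDF of the metric is the same in the reference release
    and in the target release (the stability observed across releases).
    """
    share = sum(1 for v in reference_values if v >= threshold) / len(reference_values)
    return share * n_target


# Hypothetical metric values for ten CUs of a reference release.
reference = [1, 1, 2, 3, 10, 1, 2, 1, 4, 25]

# Three of the ten reference CUs have a value of at least 4, i.e. a share of 0.3,
# which is then applied to a hypothetical later release with 1000 CUs.
print(expected_count_above(reference, threshold=4, n_target=1000))
```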
Regarding which specific distribution function best fits our empirical data, we experimented with the three distributions cited above: power-law, lognormal, and Yule-Simon. Figure
Empirical CCDFs of various metrics in Eclipse 3.1, with their best-fit theoretical distributions. Yule-Simon fit is shown separately.
The fit using a truncated power-law is almost always very good. Note, however, that this fit is made starting from a minimum value
This distribution is able to fit very well the bulk of the samples with small values, but in general it tends to zero too quickly with respect to empirical data. The fit with Yule-Simon distribution is sometimes very good, both for small values and in the tails. Other times, it fails to get a good fit in the tail.
In order to evaluate fit accuracy, we used the determination coefficient
Determination coefficients for the three distribution functions (Eclipse-3.1).
Metric | Yule-Simon | Lognormal | Power-law |
---|---|---|---|
Fan-in | 0.999 | 0.971 | 0.998 |
Fan-out | 0.995 | 0.989 | 0.997 |
Size | 0.987 | 0.999 | 0.998 |
Ties | 0.998 | 0.999 | 0.999 |
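A determination coefficient of the kind reported in the table can be computed on the linear scale as sketched below. The standard definition $R^2 = 1 - SS_{res}/SS_{tot}$ is assumed here, and the function name and sample values are hypothetical; the exact formula used for the table is not reproduced in this text.

```python
def r_squared(observed, predicted):
    """Coefficient of determination between observed and fitted CCDF values."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # residual sum of squares
    ss_tot = sum((o - mean) ** 2 for o in observed)                  # total sum of squares
    return 1.0 - ss_res / ss_tot


# Hypothetical empirical CCDF values and the values of a fitted curve at the same points.
observed = [1.0, 0.6, 0.4, 0.2]
fitted = [1.0, 0.5, 0.4, 0.2]
print(round(r_squared(observed, fitted), 4))
```

Because the residuals are taken on the linear scale, small absolute deviations in the tail barely affect the coefficient even when they look large on a log-log plot.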
Our purpose in this paper is, on the contrary, to provide a reasonable statistical description of the empirical data, and to find the analytical distribution function with the best fit. This allows us to make statistically reliable forecasts of the values assumed by some metrics in future system releases. In our case, the power-law is not in principle more interesting than the log-normal or Yule-Simon distributions, as long as these provide reliable estimates and good descriptions of the empirical data. Any further statistical analysis aimed at discriminating between the power-law and other distributions is beyond the scope of this work.
Note that the determination coefficients are evaluated on the linear scale, whereas all the figures are in a log-log scale. In this scale, the discrepancy between best fitting curves and empirical curves is visually enhanced, especially in the tail, whereas in the original scale the fitting curves and the empirical ones visually overlap. On the other hand, our fitting procedure does not rely on any log-log representation of the data.
Figure
Determination coefficients for the three distribution functions (Netbeans-3.2).
Metric | Yule-Simon | Lognormal | Power-law |
---|---|---|---|
Fan-in | 0.999 | 0.978 | 0.998 |
Fan-out | 0.998 | 0.982 | 0.996 |
Size | 0.980 | 0.995 | 0.998 |
Ties | 0.999 | 0.998 | 0.999 |
Empirical CCDFs of various metrics in Netbeans 3.2, with their best-fit theoretical distributions. Yule-Simon fit is shown separately.
The empirical studies presented above answer our first two research questions.
We found that all the studied metrics (traditional OO, network-based, and derived from Social Network Analysis) tend to follow precise analytical distributions with a high level of significance, according to our best-fitting criteria. These distributions are power-law, from a minimum value of data,
The fits using a truncated power-law are always very good. However, they depend on an
We found that all considered metrics have a very consistent statistical behavior across all the releases of the same system, even when these releases span over years and have very different numbers of classes (and CUs).
For completeness, we studied also other Java systems, belonging to the Qualitas Corpus [
Next, we also analyzed the metrics related to Issues and Bugs. We found that the distributions of Bugs and Issues also follow similar patterns in both Eclipse and Netbeans. In Figures
Empirical CCDFs of Bugs and Issues in Eclipse 3.3, with their best-fit theoretical distributions. Yule-Simon fit is shown separately.
Empirical CCDFs of Bugs and Issues in Netbeans 6.0, with their best-fit theoretical distributions. Yule-Simon fit is shown separately.
The distributions of these metrics are well fitted by the simple power-law, according to the determination coefficient, above a threshold
In this section, we report the correlations among SNA metrics, CK metrics, and Bugs. Since the empirical distributions of all metrics are strongly non-normal, correlations are better described using the Spearman coefficient. We also computed Pearson correlations, which are reported in only one case, for comparison. Our considerations, however, refer only to the Spearman correlation. To compute it, data must be ranked, with the correlation coefficient being given by
We report the correlations only for Eclipse-2.1 and for Netbeans-3.2, as representative of all the other releases. Tables
Eclipse 2.1. Pearson correlation among metrics.
Metric | Num. issue | Num. bug | LOCS | WMC | RFC | LCOM | CBO | Fan-in | Fan-out | Reach efficiency | Effsize | Closeness | Dwreach | Infocentrality | Size | Ties | Nweakcomp |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Numbug | 97%** | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
LOCS | 53%** | 53%** | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
WMC | 49%** | 48%** | 57%** | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
RFC | 58%** | 59%** | 68%** | 92%** | — | — | — | — | — | — | — | — | — | — | — | — | — |
LCOM | 32%** | 30%** | 19%** | 79%** | 62%** | — | — | — | — | — | — | — | — | — | — | — | — |
CBO | 54%** | 55%** | 65%** | 41%** | 70%** | 11%** | — | — | — | — | — | — | — | — | — | — | — |
Fanin | 18%** | 17%** | 62%** | 30%** | 26%** | 26%** | 4%** | — | — | — | — | — | — | — | — | — | — |
Fanout | 51%** | 52%** | 62%** | 30%** | 58%** | 3%** | 94%** | −1% | — | — | — | — | — | — | — | — | — |
Reach | 0% | 1% | 4%** | −4%** | −1% | −3%** | 9%** | −14%** | 14%** | — | — | — | — | — | — | — | — |
Effsize | 30%** | 29%** | 25%** | 36%** | 40%** | 26%** | 29%** | 96%** | 26%** | −10%** | — | — | — | — | — | — | — |
Closeness | −2% | −2% | −1% | −1%** | −2% | 0% | −3% | −1% | −3%* | −5%** | −2%* | — | — | — | — | — | — |
Dwreach | 27%** | 27%** | 23%** | 17%** | 29%** | 4%** | 46%** | 19%** | 50%** | 44%** | 32%** | −18%** | — | — | — | — | — |
Infocentrality | −2%* | −2%* | −2% | −2% | −3%* | 0% | −4%** | −1% | −5%** | −61%** | −2%* | 94%** | −25%** | — | — | — | — |
Size | 32%** | 31%** | 28%** | 38%** | 42%** | 26%** | 32%** | 95%** | 29%** | −10%** | 100%** | −2% | 34%** | −3%* | — | — | — |
Ties | 32%** | 31%** | 27%** | 43%** | 45%** | 37%** | 27%** | 87%** | 23%** | −9%** | 89%** | −1% | 21%** | −2% | 89%** | — | — |
Nweakcomp | −23%** | −23%** | −27%** | −18%** | −28%** | −2%* | −39%** | −14%** | −40%** | −2%** | −21%** | 4%** | −15%** | 5%** | −25%** | −22%** | — |
Brokerage | 16%** | 15%** | 9%** | 30%** | 26%** | 35%** | 8%** | 85%** | 3%* | −5%** | 83%** | 0% | 12%** | −1% | 82%** | 88%** | −7%** |
**Correlation is significant at the 0.01 level. *Correlation is significant at the 0.05 level.
Eclipse 2.1. Spearman correlation among metrics.
Metric | Num. issue | Num. bug | LOCS | WMC | RFC | LCOM | CBO | Fan-in | Fan-out | Reach efficiency | Effsize | Closeness | Dwreach | Infocentrality | Size | Ties | Nweakcomp |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Numbug | 95%** | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
LOCS | 46%** | 46%** | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
WMC | 38%** | 38%** | 84%** | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
RFC | 46%** | 46%** | 90%** | 89%** | — | — | — | — | — | — | — | — | — | — | — | — | — |
LCOM | 34%** | 34%** | 66%** | 85%** | 74%** | — | — | — | — | — | — | — | — | — | — | — | — |
CBO | 45%** | 45%** | 78%** | 61%** | 86%** | 50%** | — | — | — | — | — | — | — | — | — | — | — |
Fanin | 8%** | 7%** | 7%** | 26%** | 7%** | 26%** | −14%** | — | — | — | — | — | — | — | — | — | — |
Fanout | 44%** | 44%** | 77%** | 60%** | 84%** | 51%** | 95%** | −15%** | — | — | — | — | — | — | — | — | — |
Reach | 16%** | 16%** | 29%** | 17%** | 35%** | 13%** | 48%** | −29%** | 52%** | — | — | — | — | — | — | — | — |
Effsize | 41%** | 41%** | 59%** | 59%** | 68%** | 56%** | 63%** | 45%** | 66%** | 21%** | — | — | — | — | — | — | — |
Closeness | 38%** | 39%** | 51%** | 45%** | 59%** | 42%** | 62%** | 14%** | 66%** | 63%** | 72% | — | — | — | — | — | — |
Dwreach | 40%** | 40%** | 54%** | 48%** | 62%** | 45%** | 66%** | 15%** | 70%** | 68%** | 76%** | 96%** | — | — | — | — | — |
Infocentrality | −33%** | −34%** | −45%** | −35%** | −48%** | −35%** | −53%** | −2%* | −55%** | −51%** | −59%* | −79%** | −83%** | — | — | — | — |
Size | 43%** | 42%** | 63%** | 62%** | 71%** | 58%** | 66%** | 46%** | 69%** | 22%** | 98%** | 72%** | 76%** | −59%** | — | — | — |
Ties | 42%** | 42%** | 64%** | 60%** | 69%** | 56%** | 65%** | 41%** | 67%** | 17%** | 17%** | 65%** | 69%** | −65%** | 94%** | — | — |
Nweakcomp | −28%** | −28%** | −48%** | −41%** | −47%** | −37%** | −44%** | −23%** | −44%** | −3%** | −42%** | −31%** | −34%** | 51%** | −52%** | −71%** | — |
Brokerage | 42%** | 42%** | 61%** | 60%** | 69%** | 57%** | 65%** | 45%** | 67%** | 21%** | 100%** | 73%** | 76%** | −59%** | 99%** | 91%** | — |
**Correlation is significant at the 0.01 level. *Correlation is significant at the 0.05 level.
Netbeans 3.2. Spearman correlation among metrics.
Metric | Num. issue | Num. bug | LOCS | WMC | RFC | LCOM | CBO | Fan-in | Fan-out | Reach efficiency | Effsize | Closeness | Dwreach | Infocentrality | Size | Ties | Nweakcomp |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Numbug | 98%** | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
LOCS | 47%** | 46%** | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
WMC | 44%** | 42%** | 87%** | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
RFC | 44%** | 42%** | 87%** | 95%** | — | — | — | — | — | — | — | — | — | — | — | — | — |
LCOM | 41%** | 39%** | 70%** | 87%** | 81%** | — | — | — | — | — | — | — | — | — | — | — | — |
CBO | 33%** | 32%** | 58%** | 53%** | 71%** | 43%** | — | — | — | — | — | — | — | — | — | — | — |
Fanin | 13%** | 12%** | 19%** | 34%** | 30%** | 31%** | 13%** | — | — | — | — | — | — | — | — | — | — |
Fanout | 45%** | 44%** | 65%** | 58%** | 68%** | 54%** | 71%** | 3%** | — | — | — | — | — | — | — | — | — |
Reach | 17%** | 16%** | 18%** | 14%** | 18%** | 15%** | 18%** | −19%** | 49%** | — | — | — | — | — | — | — | — |
Effsize | 38%** | 36%** | 52%** | 58%** | 62%** | 54%** | 53%** | 56%** | 70%** | 20%** | — | — | — | — | — | — | — |
Closeness | 38%** | 36%** | 43%** | 43%** | 46%** | 43%** | 38%** | 14%** | 70%** | 73%** | 61%** | — | — | — | — | — | — |
Dwreach | 37%** | 35%** | 45%** | 45%** | 48%** | 44%** | 40%** | 17%** | 73%** | 77%** | 66%** | 94%** | — | — | — | — | — |
Infocentrality | −26%** | −25%** | −30%** | −23%** | −23%** | −24%** | −14%** | 9%** | −38%** | −37%** | −26%** | −54%** | −60%** | — | — | — | — |
Size | 39%** | 38%** | 55%** | 61%** | 65%** | 57%** | 56%** | 57%** | 73%** | 23%** | 98%** | 64%** | 69%** | −28%** | — | — | — |
Ties | 39%** | 37%** | 54%** | 57%** | 60%** | 53%** | 50%** | 48%** | 65%** | 10%** | 83%** | 52%** | 57%** | −47%** | 88%** | — | — |
Nweakcomp | −26%** | −25%** | −37%** | −32%** | −32%** | −31%** | −23%** | −16%** | −30%** | 12%** | −25%** | −17%** | −20%** | 60%** | −36%** | −63%** | — |
Brokerage | 38%** | 37%** | 53%** | 59%** | 63%** | 55%** | 54%** | 56%** | 71%** | 21%** | 10%** | 62%** | 67%** | −28%** | 99%** | 84%** | −30%** |
**Correlation is significant at the 0.01 level. *Correlation is significant at the 0.05 level.
The highest correlations are between Issues and Bugs, as is natural, since one is a subset of the other. This means that nodes having a high number of Issues also tend to have a high number of Bugs. In other words, the number of Bugs is roughly a constant fraction of the number of Issues. Thus only one of them will be included in the subsequent analysis.
We computed the correlation matrix among Issue, Bug, CK metrics, LOC, Fan-out, Fan-in, and EGO-metrics. Correlations are almost the same in each release, with fluctuations generally below 10%.
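The rank correlation used in this section can be sketched as the Pearson correlation of the ranked data, with tied values receiving their average rank. This is a generic stdlib-only illustration with hypothetical function names and sample data, not the statistical package used in the study.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical LOC and Bug counts for five CUs: a monotonic but non-linear relation.
loc = [10, 200, 35, 1500, 80]
bugs = [0, 3, 1, 40, 2]
print(round(spearman(loc, bugs), 3), round(pearson(loc, bugs), 3))
```

In this example the ranks of the two variables coincide, so the Spearman coefficient is 1 while the Pearson coefficient is lower, which shows why a rank-based measure suits heavy-tailed data.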
In Eclipse, CK metrics, LOCS, Fan-out, and EGO metrics generally show a moderate correlation with Issues (and Bugs). In Netbeans, we have similar correlations, though usually slightly smaller. In both cases, the predictive power of these metrics is similar for the same software system. In both systems, the LOC metric is the most correlated with Issues. This is expected, because bigger files have a larger chance of producing Issues and Bugs. However, other good predictors of Issues, comparable with LOC, are RFC, Fan-out, Size and, to a lesser extent, LCOM, Ties, and Brokerage. In general, we observe that many SNA metrics are quite correlated with the number of Issues (and Bugs), showing the importance of considering these metrics.
In both Eclipse and Netbeans, Fan-in always shows a small, though significant, correlation with Issues. The difference between the correlations of Fan-in and Fan-out with Issues indicates that, to identify a fault-prone node, it is important to take into account not only the number of links but also their direction. An out-link directed from a compilation unit A to a compilation unit B may be considered a channel easing the propagation of defects from B to A, but not vice versa. This fact highlights the importance of analyzing a software system as a directed graph.
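To make the role of link direction concrete, the following minimal sketch counts Fan-in and Fan-out from a list of directed dependency links. The function name, the edge representation, and the file names are hypothetical; a pair (source, target) is assumed to mean that the source compilation unit depends on the target one.

```python
from collections import Counter


def fan_in_out(edges):
    """Map each node to (fan_in, fan_out) for a directed dependency graph.

    edges: iterable of (source, target) pairs, where source depends on target.
    Duplicate links and self-loops are ignored.
    """
    unique_edges = {(s, t) for s, t in edges if s != t}
    fan_in = Counter()
    fan_out = Counter()
    nodes = set()
    for source, target in unique_edges:
        fan_out[source] += 1   # out-link leaving the dependent unit
        fan_in[target] += 1    # in-link arriving at the unit depended upon
        nodes.update((source, target))
    return {node: (fan_in[node], fan_out[node]) for node in nodes}


# Hypothetical toy system with four compilation units.
toy_edges = [
    ("A.java", "B.java"),
    ("A.java", "C.java"),
    ("D.java", "B.java"),
    ("A.java", "B.java"),  # duplicate link, counted once
]
degrees = fan_in_out(toy_edges)
for unit in sorted(degrees):
    print(unit, "fan-in =", degrees[unit][0], "fan-out =", degrees[unit][1])
```

In this toy example, A.java and B.java both have two links, but all of A's links are outgoing and all of B's are incoming; an undirected analysis would not distinguish them.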
CK and LOC metrics correlations with Issues are in line with results previously shown in [
In Netbeans, correlations between CK metrics and Eff-Size, Size, Ties, Brokerage are also large. Smaller correlations hold between CK metrics and Closeness, Nweakcomp, Dwreach. Only minor correlations, like in Eclipse, exist between CK metrics, Reach-Efficiency, and Info-Centrality.
In both Eclipse and Netbeans, the only metrics that are anticorrelated with the number of Issues are Info-Centrality and Nweakcomp, suggesting that a CU with a high Information Centrality and a high Normalized number of Weak Components is less prone to Issues and Bugs.
Most Eclipse and Netbeans EGO metrics are not strongly correlated with each other. For example, Reach-Efficiency has a small correlation with Eff-Size, Size, and Brokerage, and no correlation with Nweakcomp. Size is the metric most correlated with the other EGO metrics, and it shows an almost perfect correlation with Eff-Size and Brokerage. Consequently, it is sufficient to consider just one of these three metrics. We suggest using Size, which is easier to compute and, at least in the considered systems, looks slightly better correlated with Issues.
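The near-perfect correlation between Size and Brokerage can be made plausible with a small sketch. It assumes simplified, undirected versions of the ego-network measures (alters only, ego excluded): Size is the number of neighbors of the ego, Ties is the number of links among those neighbors, and Brokerage is the number of neighbor pairs that are not directly linked. The definitions used in the study may differ in detail, and the function name and toy graph are hypothetical.

```python
from itertools import combinations


def ego_metrics(adjacency, ego):
    """Compute simplified Size, Ties and Brokerage for one ego node.

    adjacency: dict mapping each node to the set of its neighbors
               (assumed symmetric, i.e. an undirected graph).
    """
    alters = adjacency[ego] - {ego}
    size = len(alters)
    # Count links between pairs of alters, ignoring links that involve the ego.
    ties = sum(
        1
        for u, v in combinations(sorted(alters), 2)
        if v in adjacency.get(u, set())
    )
    pairs = size * (size - 1) // 2  # number of unordered alter pairs
    return {"size": size, "ties": ties, "brokerage": pairs - ties}


# Hypothetical toy graph: A is linked to B, C and D; B and C are linked.
toy_graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
print(ego_metrics(toy_graph, "A"))
```

Under these assumptions, Brokerage equals the number of alter pairs minus Ties. When links among alters are sparse, Brokerage grows almost monotonically with Size, so a rank correlation between the two close to 1 is to be expected.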
These findings related to correlations answer our last two research questions.
The reported data, together with very similar data for all the other considered releases of Eclipse and Netbeans, confirm that there are significant correlations between several SNA metrics and the number of Bugs. These correlations are of the same order of magnitude as those of the more traditional CK metrics, whose power in predicting faulty classes has been studied and assessed for a long time [
The study of Tables
In this section, we discuss how it is possible to estimate some values for the metrics starting from the knowledge of the analytical fitting functions. We assume that all the data are known for one system release and assume the persistence of the distributions across releases.
Let us consider, for instance, the metric Ties, and the Eclipse releases from 2.1 to 3.3. Let us start with the lognormal distribution. If we compute the estimate of the mean values using the best fitting parameters found, using the usual formula (
It is also possible to estimate the expected maximum value for a lognormal population of finite size
If we consider the best-fit power-law distribution, its exponent
Using the power-law, however, we may provide an estimate for the maximum value, a quantity more relevant than the estimate of the mean. It is well known that the following formula holds [
The best fitting parameters for the three different distributions for the metric Ties. For each version of Eclipse, empirical first and second moment, number of CU and maximum value are also reported.
Ties | lognormal | Power-law | Yule-Simon | |||||||
Release | ||||||||||
2.1 | 2.85 | 1.54 | 2.38 | 174 | 2.23 | 20.6 | 7545 | 59.6 | 227.3 | 9799 |
3.0 | 2.80 | 1.54 | 2.39 | 141 | 2.21 | 19.1 | 10288 | 59.2 | 257.6 | 11901 |
3.1 | 2.85 | 1.56 | 2.37 | 143 | 2.16 | 18.6 | 11854 | 64.9 | 294.7 | 14711 |
3.2 | 2.82 | 1.57 | 2.35 | 141 | 2.14 | 17.7 | 14138 | 65.3 | 316.2 | 17029 |
3.3 | 2.83 | 1.57 | 2.33 | 145 | 2.13 | 17.4 | 15439 | 66.8 | 336.3 | 18819 |
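As a rough check of the lognormal mean estimate discussed above, the following minimal sketch applies the standard expression for the mean of a lognormal distribution, exp(μ + σ²/2). The column headers of the table were lost, so the reading used here is an assumption: the first two numeric columns are taken as the lognormal parameters μ and σ, and the eighth as the empirical mean of Ties.

```python
from math import exp


def lognormal_mean(mu, sigma):
    """Mean of a lognormal distribution with log-scale parameters mu and sigma."""
    return exp(mu + sigma ** 2 / 2)


# (release, mu, sigma, empirical mean), copied from the table above.
# Interpreting these columns as mu, sigma and the empirical mean is an assumption.
rows = [
    ("2.1", 2.85, 1.54, 59.6),
    ("3.0", 2.80, 1.54, 59.2),
    ("3.1", 2.85, 1.56, 64.9),
    ("3.2", 2.82, 1.57, 65.3),
    ("3.3", 2.83, 1.57, 66.8),
]
for release, mu, sigma, empirical in rows:
    estimate = lognormal_mean(mu, sigma)
    print(f"release {release}: estimated mean {estimate:.1f}, empirical {empirical}")
```

Under this reading of the table, the estimate for release 2.1 is about 56.6, against an empirical value of 59.6.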
Estimates for the extreme values of the metric Ties. In the last column, the two values refer to the estimate obtained using parameters from release 2.1, or using parameters from the immediate previous version, respectively.
Release | Actual value | |||
---|---|---|---|---|
3.0 | 11901 | 12962 | 12268 | 12609/== |
3.1 | 14711 | 13634 | 13594 | 14148/14234 |
3.2 | 17029 | 14511 | 15446 | 16327/16838 |
3.3 | 18819 | 14967 | 16463 | 17539/18363 |
The Yule-Simon distribution is a good compromise between the two other considered distributions, because it fits both the bulk and the tail of the data. We numerically estimated the average using the best fitting parameters of the Yule-Simon distribution in Table
We may now answer the third research question.
We found that mean values, as obtained from the analytical distributions, are in agreement with the empirical ones. From the knowledge of the best fitting parameters of the Yule-Simon distribution in one release, assuming persistence, we estimated the extreme values of subsequent releases using the CU number. Such estimates are in agreement with the empirical values with an error of
These results have been obtained for the metric Ties in Eclipse, but similar considerations also hold for the other metrics that are best fitted by the Yule-Simon distribution.
In this paper, we studied for the first time the distribution of SNA metrics in OO software networks, comparing their properties with those of CK metrics and other graph-related metrics. We used as a central concept the Compilation Unit and not the class, to be able to better study the impact of metrics on Bugs and Issues, which always refer to CUs and not to classes, in commonly used configuration management systems.
The empirical distributions of all the studied metrics systematically present power-laws in their tails. This property also holds for the bug distribution. It must be noted that bug distributions may be biased by the lack of information in CVS commits; our results on bug distributions are thus only as reliable as the information about bugs extracted from the CVS repositories. All metrics have very similar features and shapes across all the system releases, and they also behave very similarly in both Eclipse and Netbeans.
We found analytical distribution functions suitable for fitting the empirical data. The power-law always outperforms the other fits in the tails, whereas the Yule-Simon distribution follows the shapes of most metrics' empirical distributions very well. In particular, the Ties and Fan-in metrics are fitted by the Yule-Simon distribution from the very smallest values, with determination coefficients above 0.98. Using the metric Ties, we have shown how it is possible to provide reliable estimates for the averages and extreme values of subsequent releases from the knowledge of the best fitting parameters and the system size. The knowledge of the extreme values of metrics could be exploited to keep the quality of software systems under control, because in general high values of these metrics denote high coupling among classes.
The correlations between SNA metrics and Bugs are generally good and, when assessed with the Spearman coefficient, they are comparable to those of CK metrics. It is known that LOC is one of the metrics best correlated with the number of defects. Nevertheless, LOC, like some other complexity metrics, focuses only on single software elements, whereas SNA metrics make it possible to take into account the role of interactions between elements, and how these interactions correlate with defects. Consequently, we can state that the new SNA metrics are worth studying in greater detail, to better assess their predictive power regarding Issues and Bugs, possibly in conjunction with, and not as an alternative to, more traditional OO metrics.
Future developments of this work will include controlled experiments to better understand the effect of SNA metrics on bug proneness and whether they are able to identify different kinds of bugs, and the construction of software graphs where the link direction and type are taken into account.