Software refactoring is a software maintenance activity that improves a system's internal quality without changing its external behavior. During maintenance, structural refactoring is performed by remodularizing the source code. Software clustering is a modularization technique that remodularizes source code artifacts to improve readability and reusability. Because the clustering problem is NP-hard, evolutionary approaches such as the genetic algorithm have been used to solve it. In the structural refactoring literature, no search-based algorithm employs a hierarchical approach to modularization. In this paper, a new search-based top-down hierarchical clustering approach, named TDHC, is proposed that combines global and local search strategies to modularize a system. The output of the algorithm is a tree in which each node is an artifact composed of all artifacts in its subtrees and is a candidate software module (i.e., cluster). This tree gives a software maintainer a better view of the source code structure for deciding appropriate composition points of artifacts when creating modules (i.e., files, packages, and components). Experimental results on seven folders of Mozilla Firefox with different functionalities and on five other software systems show that TDHC produces modularizations closer to the human expert’s decomposition (i.e., the directory structure) than the existing algorithms. The proposed algorithm is expected to help software maintainers remodularize source code more effectively. The source code and dataset related to this paper can be accessed at https://github.com/SoftwareMaintenanceLab.
1. Introduction
Software maintenance is the process of modifying a software product after its release to reduce faults, improve performance, or improve the design. Software maintenance tasks are important for future software development and consume approximately 90 percent of the total software cost [1].
In software maintenance, changes such as adding, deleting, or modifying code lead to the growth of code blocks and make the code harder to understand in the future. Code smells (or bad code smells) are parts of the source code that do not cause faults in external behavior and pose no significant problem in internal behavior at the moment but may cause issues in the future development process [2]. Software refactoring modifies the source code to rectify code smells without any change in the external behavior of the system. It improves the quality of the source code by reducing the potential for bugs and keeping the code easier to maintain or extend in the future.
Fowler et al. cataloged possible code smells for object-oriented systems in their book [3] and proposed possible refactoring scenarios for them. Since then, many studies have proposed new refactoring scenarios or validated the effects of applying various scenarios to source code to achieve better quality.
Refactoring techniques fall into two major groups: conceptual and structural. For example, rename method refactoring is a conceptual refactoring scenario that changes the name of a method to better express its responsibility. Some structural refactoring scenarios concern the composition of methods or functions; for example, long code blocks usually have multiple responsibilities or duplicated blocks that should be refactored. Other structural refactoring scenarios improve the organization of code blocks. As an example, move method refactoring (MMR) moves a method from one class to the class it is most related to; the relation between methods can be structural, such as calls, or semantic. There are also composite refactorings, defined as sequences of primitive refactorings that express complex transformations.
To illustrate a structural refactoring task, Figure 1 depicts an example modularization of a small software system. In this figure, each node is a class and the edges represent collaborations between classes. The classes are separated into two modules according to their collaborations. Figure 2 shows the same software after some maintenance actions: the relations between nodes have changed, and a new class “I” has been added to the system. In Figure 2, node “G” has more relations with the nodes in the left module than with those in the right module, so it is necessary to relocate this node (and node “I”) through remodularization. The result of the remodularization is shown in Figure 3.
An example modularization for a small software system.
Some maintenance actions on Figure 1.
Remodularization of Figure 2.
Manually analyzing source code for refactoring is a costly and time-consuming process; hence, much research has been done on automatic refactoring. One approach to structural refactoring is remodularization, as shown in Figure 2, where the remodularization is performed by clustering techniques. According to [4], “The aim of the software clustering process is to partition a software system into modules (subsystems or packages), where a module is made up of a set of software artifacts which collaborate with each other to implement a high-level attribute or provide a high-level service for the rest of the software system.” The input of a clustering algorithm is an artifact dependency graph (ADG), whose nodes represent artifacts and whose edges represent the relationships between them. An artifact can be an entity such as a function, a file, a class, or even a collection of classes, i.e., a package or a source code folder. The relations between artifacts can be derived from structural features, such as calls, or nonstructural features, such as semantic relations. Figure 4 shows an example of clustering in which the artifacts of a small compiler are partitioned into four modules (clusters) according to their relations. These modules are expected to have maximum cohesion and minimum coupling with other modules [6, 7].
An example of clustering [5].
Current clustering strategies for obtaining proper modularization are based on two major families of techniques: hierarchical and nonhierarchical. In hierarchical methods, a tree of relations is constructed from the artifacts at the leaves up to the root. These techniques give developers a hierarchical view for deciding the number of modules and the appropriate cut point in the tree. Most hierarchical methods presented for software clustering are agglomerative (bottom-up). In such algorithms, each artifact starts in its own cluster; based on a criterion, e.g., Jaccard similarity, the proximity between all clusters is calculated, and the pair of clusters with the highest proximity is merged as one moves up the hierarchy [8]. The main limitations of hierarchical algorithms are as follows [8]:
Due to the presence of zigzags, the whole tree must be built to the end before modules can be identified.
There exists no well-defined criterion to decide where the clustering process should stop.
Arbitrary decisions are one of the main problems in hierarchical clustering methods. These decisions have a significant impact on the final clustering, and when a wrong choice is made, there is no possibility of reversing and correcting it.
These algorithms are greedy and hence cannot explore the problem space well. Several previous studies [9–11] have shown that these methods do not perform well in software clustering. Moreover, no hierarchical clustering algorithm proposes cut points from different levels of the dendrogram.
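For context, the Jaccard proximity criterion mentioned above can be sketched as follows. This is an illustrative definition over artifact feature sets (e.g., the entities each artifact calls or references); the exact feature extraction varies across the cited methods:

```python
def jaccard(features_a, features_b):
    """Jaccard similarity: |A ∩ B| / |A ∪ B| over two artifacts' feature sets."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 0.0  # two artifacts with no features are treated as dissimilar
    return len(a & b) / len(a | b)
```

An agglomerative algorithm would compute this for every pair of clusters and merge the pair with the highest value at each step.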
There are also nonhierarchical modularization methods based on search-based approaches, which explore the solution space with global or local search algorithms. However, these methods do not give the developer any view of the upper-level relationships between modules.
In the literature, because of the NP hardness of the clustering problem, search-based methods (such as the genetic algorithm) have been widely used [8, 12]; thanks to their exploration and exploitation abilities, they are an effective way to solve the clustering problem [13]. Current search-based work on software refactoring via remodularization operates in flat mode (i.e., nonhierarchical methods) and does not offer appropriate composition at higher levels.
1.1. The Problem
In this paper, we focus on a specific restructuring problem in the context of object-oriented and procedural programs: given an ADG constructed from existing code, decompose it into smaller, meaningful modules that have higher cohesion and lower coupling. Cohesion is defined as “the degree to which the internal contents of a module are related” [1]. Our method supports “big-bang” remodularization; i.e., all the artifacts of the software system are considered for remodularization.
The main problem addressed in this paper is to suggest a possible hierarchical remodularization for a source code while keeping it accurate in terms of proximity to the (human) expert decomposition. A hierarchical top-down clustering algorithm is proposed that structurally refactors the source code from its artifact dependency graph (ADG) with a branch and bound approach. The aim is to find an appropriate composition tree and recommend the lowest appropriate levels at which to merge artifacts into a module. It will therefore be easier for the developer to recognize the position of the different levels, such as files, packages, or components. In the proposed method, a genetic algorithm (GA) together with a neighboring search algorithm is designed to search among trees of artifact compositions. The proposed algorithm is evaluated on seven folders of Mozilla Firefox and five other open-source systems. The results indicate that the method is able to propose an acceptable refactoring by hierarchically remodularizing the artifacts, giving developers a view of the high-level relations between modules.
1.2. Contribution
The contributions of this paper are summarized as follows:
Proposing a new software refactoring method with a top-down hierarchical modularization technique. The output of the algorithm is a tree generated from the source code which helps the software maintainer gain a better view of the source code structure and decide appropriate composition points of artifacts when creating modules (i.e., files, packages, and components). It is important to note that, in the literature, no search-based algorithm employs a hierarchical approach to modularization.
The Prufer sequence is utilized in the GA for encoding trees. Existing encoding methods used in software modularization are real-valued (e.g., BUNCH [5], ECA [12], and SGA [14]) or permutation-based (e.g., DAGC [15] and E-CDGM [16]); these methods express only a flat modularization.
A new objective function is proposed to evaluate hierarchical remodularization.
The rest of the paper is organized as follows: Section 2 discusses research on software refactoring; Section 3 introduces the proposed algorithm; and Section 4 presents the experimental results. The results and the threats to validity are discussed in Sections 4 and 5, respectively. Finally, Section 6 concludes the paper and outlines future work.
2. Related Work
After the publication of Fowler’s book [3] on source code refactoring, many studies have been conducted to refine its concepts and to develop automated solutions for detecting and repairing code smells, e.g., [17–20].
Remodularization of source code artifacts is one approach to structural refactoring. Due to the large size of the solution space for modularization, many search-based studies have been conducted. In the Bunch algorithm [5, 7, 21], a GA, namely, Bunch-GA, and two hill-climbing algorithms, namely, Bunch-NAHC and Bunch-SAHC, are used to search the solution space. In this algorithm, the size of the solution space is n^n (where n is the number of artifacts), and most of these solutions represent the same modularization. Parsa and Bushehrian introduced DAGC coding [15] to solve this problem, which reduces the space of states to n!. Tajgardan et al. [22] presented an algorithm based on the estimation of distribution algorithm (EDA), which avoids the challenge of specifying GA parameters. Izadkhah et al. [16] presented the E-CDGM method, which first converts the source code to an intermediate code called mCode from the call dependency graph (CDG) and then proposes a modularization with a fitness function (using class-property, class-method, and method-method relations), a self-automata algorithm, and DAGC encoding. Amarjeet et al. [23] presented the MaABC algorithm for software modularization, a multiobjective optimization method using the bee population algorithm. They also presented PSOMC [24], a PSO-based module clustering approach, which partitions a software system by optimizing intracluster dependency, intercluster dependency, the number of clusters, and the number of modules per cluster.
Research on multiobjective search methods has expanded recently. Praditwong et al. [12] presented the equal-size cluster approach (ECA) and the maximizing cluster approach (MCA) for software modularization using a multiobjective genetic algorithm and Pareto optimality. Harman and Tratt [25] also used Pareto optimality to combine two metrics: CBO [26] and a new metric called SDMPC. Seng et al. [27] proposed a GA-based approach that suggests refactorings using a fitness function composed of coupling, cohesion, complexity, and stability. Kebir et al. [28] presented a genetic algorithm-based approach that detects component-relevant code smells and eliminates them by searching for the best sequence of refactorings with a genetic algorithm. In [29], Kumari and Srinivas proposed MHypEA (multiobjective hyperheuristic evolutionary algorithm) to suggest software module clusters that maximize cohesion and minimize coupling. It draws on different selection, crossover, and mutation operations of evolutionary algorithms, and its mechanism for selecting a low-level heuristic is based on reinforcement learning with adaptive weights.
In [30], Huang and Liu introduced a new objective function called MS to automatically guide optimization algorithms toward a good partition of a software system, considering both global modules and edge directions. They then proposed three modularization algorithms, named HC-SMCP, GA-SMCP, and MAEA-SMCP, which optimize MS for software systems.
Bavota et al. have conducted several studies on refactoring. In [31], a new technique is proposed for the automatic remodularization of packages, which uses structural and semantic measures to decompose a package into smaller, more cohesive ones. The results showed that the decomposed packages have better cohesion without deterioration of coupling and that the remodularization proposed by the tool is also meaningful from a functional point of view. In [32], they introduced a tool called R3 that automatically analyzes the latent topics inferred from identifiers, comments, and string literals in the source code classes as well as the structural dependencies among these classes. In [33], they presented a method for extract class refactoring based on three structural and semantic factors, SSM [34], CDM [35], and CSM [36], that strongly increases the cohesion of the refactored classes without significantly increasing coupling. In [37], they proposed a technique based on relational topic models to identify MMR opportunities.
Maletic and Marcus [38] proposed an algorithm which uses semantic and structural data to propose refactoring decisions. In [39], Palomba et al. presented a technique, called TACO (textual analysis for code smell detection), that exploits textual analysis to detect a family of smells of different natures and different levels of granularity.
Jalali et al. [8] proposed a new multiobjective fitness function for modularization, named MOF, which uses structural and nonstructural features with the EoD algorithm. In [40], a new deterministic clustering algorithm named the neighborhood tree algorithm is presented, which creates a neighborhood tree from the knowledge available in an ADG. Mahouachi [41] proposed a method that uses NSGA-II [42] to find the best sequence of refactorings that maximizes structural quality, maximizes the semantic cohesiveness of packages, and minimizes the refactoring effort; it produces a coherent and useful sequence of recommended refactorings both in terms of quality metrics and from the developers’ point of view. Ouni et al. [43] proposed a new refactoring recommendation approach, called MORE, to improve design quality and fix code smells using NSGA-III [42]. Dallal [44] introduced a measure to precisely predict whether a class includes methods in need of MMR. Me et al. [45] presented a new mathematical programming model for the software remodularization problem with a novel metric based on the principle of complexity balance and a hybrid genetic algorithm (HGA).
Kargar et al. have conducted studies on the remodularization of multilanguage software systems. In [14], they presented two dependency graphs called the semantic dependency graph (SDG) and the nominal similarity graph (NSG). Both graphs are constructed independently of programming language syntax: the SDG is built from all nouns in the source code, and the NSG from the similarity between artifact names. In [46], they proposed a genetic algorithm that modularizes programs by combining the constructed dependency graphs (i.e., the call dependency graph, semantic dependency graph, and nominal similarity graph).
In summary, search-based algorithms can be described along three dimensions. The first is the scope of the search (local versus global strategy): some algorithms are based on a local search strategy, whose result may not be the optimal solution, while global search techniques aim to find good solutions across the whole space. The second is the number of objectives: in multiobjective algorithms, multiple functions or metrics guide the search process. The third is the use of semantic versus structural features for clustering: semantic approaches consider lexical analysis or latent semantic analysis (or both) during the search, whereas structural approaches consider features such as function calls between two artifacts and inheritance. Some search-based clustering algorithms are shown in Table 1.
LS: local search; GS: global search; SO: single objective; MO: multiobjective; S: structural; Se: semantic.
In hierarchical methods, all the artifacts are initially considered as units of modularization, and during a repetitive process, the most similar modules are merged to create a new module. Single-linkage, complete-linkage, and average-linkage algorithms are the most common hierarchical clustering algorithms, which Maqbool et al. adapted to modularize source code [59]. Kuhn et al. proposed a new average-linkage algorithm that uses nonstructural features for modularization [60]. The authors of that work use program code properties and variable naming to recognize relations, which makes the output of the algorithm dependent on how thoroughly developers write descriptions and name variables. Andritsos and Tzerpos introduced a hierarchical method called LIMBO [61] that combines structural and nonstructural information; it is based on minimizing the loss of information during the modularization of a software system. Rathee et al. [62] proposed a new hierarchical technique for software remodularization that estimates conceptual similarity among software artifacts, using structural and semantic coupling measurements together to obtain much more accurate coupling measures. They also presented a new weighted dependency measurement scheme that combines structural, conceptual, and change history-based relations among software elements.
In addition to the search-based and hierarchical methods discussed above, there are a number of graph-based and pattern-based methods. Mohammadi and Izadkhah [40] use a neighboring tree generated from the ADG to cluster a software system; the clustering quality obtained by this algorithm is better than that of hierarchical methods but lower than that of evolutionary methods. Spectral methods [63] use algebraic properties of the graph, such as the eigenvalues and eigenvectors of the corresponding Laplacian matrix, to perform clustering. The algorithm for comprehension-driven clustering (ACDC) [64], introduced by Tzerpos and Holt, is a pattern-based algorithm that uses several patterns to cluster code artifacts.
2.1. Gaps in the Literature
Exploiting the hierarchical property is not new and has been used for many years in the remodularization field, but no previous research combines the hierarchical property with an evolutionary approach to remodularization. Because the modularization problem is NP-hard, most modularization methods use search-based clustering and evolutionary algorithms [8, 12], but these algorithms produce only a flat modularization of a program. Therefore, they cannot represent the hierarchy of a program, and there is no way for a designer to identify encapsulation levels, e.g., module, package, and component, in their output.
3. The Proposed Clustering Algorithm
Most of the work on remodularization is based on clustering techniques [31]. The hierarchical clustering algorithms proposed so far are greedy algorithms that make arbitrary decisions, which may lead to undesired results; moreover, they do not recommend an appropriate cut point in the dendrogram or modularization points at different levels of it. In this section, a new clustering algorithm with a hierarchical approach is proposed for source code remodularization that does not have these problems. To this end, we design a genetic algorithm with a new encoding and fitness function. The encoding is used to construct a tree from the source code's artifacts, and the fitness function, with a branch and bound approach, determines appropriate levels in the constructed tree whose result can be a qualified modularization. To improve the quality of the resulting modularization, we also design a hill-climbing algorithm; this local search is applied to the outcome of the genetic algorithm as a neighboring search. The algorithm's input is an ADG constructed from the source code, and its output is a modularization suggested to the software maintainer. Our method supports “big-bang” remodularization; i.e., all the artifacts of the software system are considered for modularization, and the current structure (modularization) is not taken into account.
We consider classes and files to be the smallest composition units (artifacts) for modularization in object-oriented and structured software systems, respectively. These units are combined into larger modules, such as packages or components, in which the members of each module contribute to the other parts of that module toward a single responsibility; hence, it is important to have proper upper-level compositions. We also use call dependencies to create the edges between artifacts in the ADG. Artifacts that are only called by other artifacts are utility classes or files, so they can be removed at the beginning and addressed after the algorithm completes: if all calls to such an artifact come from one module, it is added to that module; if it is used by multiple modules, it is considered a utility.
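The utility handling described above can be sketched as follows; the function name and data shapes are illustrative assumptions, not the paper's implementation:

```python
def place_utilities(modules, calls, utilities):
    """Post-processing step for utility artifacts: an artifact that is only
    called (never calls) is assigned to the single module all its callers
    live in, or kept as a shared utility otherwise.

    modules:   dict mapping clustered artifact -> module id
    calls:     list of (caller, callee) edges from the ADG
    utilities: artifacts removed before clustering
    """
    placement = {}
    for u in utilities:
        # modules of all artifacts that call u
        caller_modules = {modules[a] for a, b in calls if b == u and a in modules}
        if len(caller_modules) == 1:
            placement[u] = caller_modules.pop()  # absorbed by its only client module
        else:
            placement[u] = "utility"             # shared by several modules
    return placement
```

For example, an artifact called only from module 1 is placed in module 1, while one called from modules 1 and 2 stays a utility.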
To design a genetic-based algorithm, five features must be described: encoding (chromosomal representation), fitness function (evaluation), selection, crossover, and mutation.
3.1. Encoding
A chromosome in a GA is a collection of parameters that represents a solution to the problem; the aim of the GA is to find a chromosome with an optimal or near-optimal solution. These parameters can be a binary string or any other data structure. In this paper, the Prufer sequence [65] is employed to encode a tree as a sequence of numbers, i.e., a chromosome. The Prufer sequence is a one-to-one mapping between a sequence of numbers and a labeled tree. The steps of constructing the Prufer sequence for a tree are shown in Algorithm 1. Let p1, p2, p3, …, pn−2 denote the Prufer sequence. The corresponding tree of a Prufer sequence is constructed as in Algorithm 2.
Algorithm 1: Pseudocode for encoding a labeled tree into a Prufer sequence.
Input: a labeled Tree as T(V, E)
Output: a Prufer sequence
Comment: the nodes of T are labeled from 1 to |V|, where vi denotes the node with label i.
PruferSequence ← Empty List
while V.size is greater than 2 do
vi ← the leaf in V with the smallest label
vj ← the node vi is connected to, with label j
Remove vi from V and (vi, vj) from E
Add j to PruferSequence
end while
return PruferSequence
Algorithm 2: Pseudocode for converting a Prufer sequence to its corresponding tree (decoding).
Input: a Prufer sequence as P
Output: a labeled tree
degrees ← a list of size P.size + 2, filled with 1
for i ∈ P do
degrees[i] ← degrees[i] + 1
end for
# Now we know that node i has degree degrees[i] in the tree
i ← 1
T ← an empty list for tree edges
while at least three items in degrees are nonzero do
v ← the smallest label whose degree equals 1
degrees[v] ← degrees[v] − 1
degrees[P[i]] ← degrees[P[i]] − 1
Add (P[i], v) to T
i ← i + 1
end while
a, b ← the two remaining labels with nonzero degree
Add (a, b) to T
return T
For example, the Prufer sequence for the tree in Figure 5 is 2,1,3,3,1 and vice versa. To encode the tree to a Prufer sequence, the node with label 4 (as a leaf node with the smallest number) is removed and number 2 is added to the sequence. Then, the node labeled 2 is removed and number 1 is added to the sequence. In the two next steps, the nodes 5 and 6 are removed and number 3 is added to the sequence twice. In the final step, node 3 is removed and number 1 is added to the sequence as the last number of the sequence.
Prufer sequence examples: decoded tree for 2,1,3,3,1 as a Prufer sequence.
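The two algorithms above can be sketched in Python. This is an illustrative implementation of standard Prufer encoding and decoding (the function names are ours), checked against the worked example with sequence 2,1,3,3,1:

```python
from collections import defaultdict

def prufer_encode(edges, n):
    """Encode a labeled tree on nodes 1..n (given as an edge list) into its
    Prufer sequence: repeatedly remove the smallest-labeled leaf and record
    the label of its neighbor (Algorithm 1)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seq = []
    for _ in range(n - 2):
        leaf = min(v for v in adj if len(adj[v]) == 1)
        neighbor = next(iter(adj[leaf]))
        seq.append(neighbor)
        adj[neighbor].discard(leaf)
        del adj[leaf]
    return seq

def prufer_decode(seq):
    """Rebuild the labeled tree (as an edge list) from a Prufer sequence
    (Algorithm 2): node degrees are 1 plus the node's count in the sequence."""
    n = len(seq) + 2
    degrees = [1] * (n + 1)  # 1-indexed node degrees
    for p in seq:
        degrees[p] += 1
    edges = []
    for p in seq:
        # smallest current leaf is joined to the next sequence element
        v = next(u for u in range(1, n + 1) if degrees[u] == 1)
        edges.append((p, v))
        degrees[v] -= 1
        degrees[p] -= 1
    a, b = (u for u in range(1, n + 1) if degrees[u] == 1)
    edges.append((a, b))
    return edges
```

Decoding 2,1,3,3,1 yields a seven-node tree, and encoding that tree returns the same sequence, illustrating the one-to-one mapping.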
In the proposed method, the trees are binary trees, and the Prufer sequences follow these rules:
The trees always have n leaves numbered from 1 to n for artifacts and n−1 inner nodes numbered from n+1 to 2n−1.
All the artifacts are in the leaves of the tree and have degree one. Hence, the numbers 1 to n do not appear in the corresponding Prufer sequence.
The root of the tree (node number 2n−1) has degree 2 and, according to the rules of creating the Prufer sequence, appears exactly once in the sequence.
All inner nodes except the root have degree 3 (attached to their parent node and having two child nodes) and appear twice in the sequence.
Hence, any sequence over the numbers n+1 to 2n−1 in which each of n+1 to 2n−2 appears twice and the number 2n−1 appears once represents a hierarchical modularization tree in this algorithm. Figure 6 shows the corresponding hierarchical modularization tree for the Prufer sequence 8,6,6,8,7,9,7.
Corresponding hierarchical modularization tree for the Prufer sequence 8,6,6,8,7,9,7.
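The chromosome rules above translate into a simple validity check; a minimal sketch (the function name is ours):

```python
from collections import Counter

def is_valid_chromosome(seq, n):
    """Check the chromosome rules for n artifacts (leaves): a valid Prufer
    chromosome has length 2n-3, each inner label n+1..2n-2 appears exactly
    twice, and the root label 2n-1 appears exactly once."""
    if len(seq) != 2 * n - 3:
        return False
    counts = Counter(seq)
    root = 2 * n - 1
    if counts.get(root) != 1:
        return False
    return all(counts.get(label) == 2 for label in range(n + 1, 2 * n - 1))
```

For example, the sequence 8,6,6,8,7,9,7 from Figure 6 (n = 5 artifacts, inner nodes 6–9, root 9) satisfies all three rules.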
3.2. Evaluation
Each chromosome in the population of a GA should be evaluated to determine the quality of the solution it represents. In the following, we propose a new quality function to evaluate the chromosomes. In the proposed quality function, the fitness of a chromosome is calculated using the dependencies between the modules extracted from the corresponding tree of the chromosome. Let $\mu_i$, $\lambda_i$, and $\varepsilon_i$ represent the number of connections between the artifacts inside node (module) $i$, the number of connections with the artifacts in the sibling node, and the number of connections with all other artifacts, respectively. The fitness of a node (i.e., a module) is calculated by exCF in the following equation:

$$\mathrm{exCF}_i(\mu_i, \lambda_i, \varepsilon_i) = \begin{cases} 0, & \text{if } \mu_i + \lambda_i + \varepsilon_i = 0, \\ -1, & \text{if } \lambda_i < \varepsilon_i, \\ \dfrac{2\mu_i}{2\mu_i + \lambda_i + \varepsilon_i}, & \text{otherwise.} \end{cases} \qquad (1)$$
This function aims to increase cohesion within a module and reduce its coupling with other modules, where coupling is separated into sibling coupling ($\lambda_i$) and external coupling ($\varepsilon_i$). When $\varepsilon_i$ is greater than $\lambda_i$, the module has more connections with the rest of the system than with its sibling node in the tree; this indicates that the module (regardless of its cohesion) is not in a proper position, so the total score is penalized by assigning −1 to this node. Algorithm 3 shows the pseudocode of the evaluation step of this customized genetic algorithm. To evaluate the tree and propose a modularization according to its structure, the tree is traversed by breadth-first search (BFS) from the root. During traversal, if the sum of exCF for the two child nodes is greater than or equal to the exCF of their parent, they are added to the process queue; otherwise, the parent node is the lowest appropriate position to compose the artifacts in the leaves of its subtree as a module. When a node is partitioned into two child nodes and $\lambda_i < \varepsilon_i$ holds for one of them, that child cannot remain part of the tree because its external relations exceed its relations with its sibling; in this case, its exCF equals −1 and the child nodes are not added to the BFS process queue. The total fitness of the tree, exTMQ, is calculated by equation (2), where $K$ is the set of all nodes whose children (if any) were not processed:

$$\mathrm{exTMQ} = \sum_{i \in K} \mathrm{exCF}_i. \qquad (2)$$
Algorithm 3: Pseudocode of chromosome evaluation.
Input: a chromosome as ch
Output: update ch.fitness
ch.fitness = 0
tree ← decode the chromosome ch to tree
q ← Queue()
tree.root.cf ← 1
q.push(tree.root)
while q is not empty do
parent ← q.pop()
child1, child2 ← parent.children()
child1.cf ← exCF(child1, parent)
child2.cf ← exCF(child2, parent)
if child1.cf + child2.cf ≥ parent.cf then
q.push(child1)
q.push(child2)
else
ch.fitness ← ch.fitness + parent.cf
end if
end while
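Equation (1) translates directly into code. A minimal sketch (the counting of $\mu$, $\lambda$, and $\varepsilon$ from the ADG is assumed to happen elsewhere):

```python
def exCF(mu, lam, eps):
    """Fitness of one candidate module, following equation (1).

    mu:  connections among artifacts inside the module,
    lam: connections to artifacts in the sibling node,
    eps: connections to all other artifacts."""
    if mu + lam + eps == 0:
        return 0.0
    if lam < eps:
        # penalty: the module is better connected to the rest of the
        # system than to its sibling, so it is in the wrong position
        return -1.0
    return 2 * mu / (2 * mu + lam + eps)
```

The tree fitness exTMQ of equation (2) is then just the sum of exCF over the frontier nodes at which the BFS of Algorithm 3 stopped.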
Figure 7 shows an example of evaluating a tree in this algorithm. This tree has 55 nodes (28 artifact nodes in the leaves and 27 inner nodes numbered from 29 to 55), where the numbers in parentheses are the exCF of each node. When the evaluation starts, nodes 37 and 42 are added to the process queue because the sum of their exCF values (i.e., 0.79 + 0.62) is greater than the exCF of their parent, node 55 (i.e., 1). The tree is traversed until the nodes in the set {29, 42, 47, 52} (colored in grey) are reached. Each of these nodes contains all the artifacts in the leaves of its subtree and is the first position proposed by the algorithm for creating a module. Their child nodes are not added to the BFS queue because the sum of the exCF of the sibling children is not greater than or equal to the parent's exCF.
An example of applying the branch and bound technique on a modularization tree (see Algorithm 3).
3.3. Genetic Operations
The three operations of GA for this algorithm are described as follows:
Selection: to select the next generation of the population, the classic roulette wheel selection operator is used in the proposed algorithm.
Crossover: the cycle crossover operation (CX) [66] is used in this algorithm; it finds a cycle of genes between two parents and swaps the remaining genes. If C11,C12,C13,…,C1n and C21,C22,C23,…,C2n represent two parents, first one random position i is selected. If C1i and C2i differ, one of the positions of the value C2i in the first parent is selected and added to the list of selected positions. These selections continue until a position j is selected for which C2j equals C1i. When finished, the values at the selected positions in the first parent are a permutation of the values at the same positions in the second parent. Finally, the values at all other (unselected) positions are swapped between the two parents. Figure 8 shows an example of the crossover operation: the first position is selected randomly, and then the third and fourth positions are added to the selection list to close the cycle. The values 6, 9, and 8 in the first parent are a permutation of 8, 6, and 9 in the second. In the last step, the values at the other positions are swapped with the corresponding positions in the other chromosome.
The output of CX is a permutation of its input. Hence, it does not violate the rules stated in the Encoding section; however, the structure of the tree (the relationships between nodes) will change.
Mutation: a single-swap operation is used to mutate a chromosome, in which the values at two random positions in the sequence are exchanged. Figure 9 shows an example of the single-swap operation on a Prufer sequence. This change creates a new binary tree.
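For illustration, a minimal sketch of cycle crossover and single-swap mutation on plain permutations. Note this is an assumption-laden simplification: the paper's encoding contains repeated values, in which case "one of the locations" of a value must be chosen; `list.index` below simply takes the first occurrence, which is only guaranteed to terminate for duplicate-free permutations:

```python
import random

def cycle_crossover(p1, p2, start=None):
    """CX on two equal-length permutations: find the cycle of positions
    starting at `start`, keep those genes in place, and swap every
    position outside the cycle between the two parents."""
    n = len(p1)
    i = random.randrange(n) if start is None else start
    cycle, j = {i}, i
    while p2[j] != p1[i]:
        j = p1.index(p2[j])          # follow the value back into parent 1
        cycle.add(j)
    c1 = [p1[k] if k in cycle else p2[k] for k in range(n)]
    c2 = [p2[k] if k in cycle else p1[k] for k in range(n)]
    return c1, c2

def swap_mutation(chrom):
    """Single-swap mutation: exchange the genes at two random positions."""
    c = list(chrom)
    i, j = random.sample(range(len(c)), 2)
    c[i], c[j] = c[j], c[i]
    return c
```

Both operators return sequences with the same multiset of values as their inputs, so the chromosome validity rules of the encoding are preserved.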
An example of the crossover operation of TDHC.
An example of the mutation operation of TDHC.
3.3. Neighboring Search
A genetic algorithm performs a global search. To improve the quality of the resulting modularization, at the last step of the GA we apply a hill-climbing local search. The local search repeatedly tries to produce a neighboring modularization of better quality than the current result. We use the steepest-ascent strategy: all neighboring modularizations of the current modularization are generated, the one with the highest quality is selected as the neighbor, and it replaces the current modularization; this is repeated for the new modularization until no better modularization can be found. How the neighborhood is defined is crucial in a hill-climbing algorithm and must be chosen to suit the problem.
3.4. Definition: Neighbor of a Modularization
Let M and M′ be two modularizations of an ADG. Modularization M′ is called a neighbor of modularization M if M′ is obtained from M by moving a single artifact from module i to module j; in other words, two modularizations are neighbors if they differ only in the position of one node. Let G = (V, E) be a dependency graph, where V = {v1, v2, …, vn} represents the artifacts and E ⊆ V × V contains the pairs (vi, vj) representing dependencies between artifacts. For example, Figure 10 depicts a sample modularization, and Figure 11 shows one of its neighbors. The formal definition of this concept is as follows.
An obtained modularization.
A neighbor of the modularization in Figure 10.
Let M = {m1, m2, …, mk} represent the k modules obtained for graph G such that, for all i, j ∈ {1, …, k}: mi ⊆ V, mi ≠ ∅, ∪i mi = V, and mi ∩ mj = ∅ for i ≠ j. In M, take a node v such that v ∈ mi. The neighbor M′ = {m1, …, mi′, …, mj′, …, mk} is created such that mi′ = mi \ {v} and mj′ = mj ∪ {v}, where mj (i ≠ j) is a module with at least one relation to v. Then, M′ is better than M if exTMQ(M′) > exTMQ(M).
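The steepest-ascent search over move-one-artifact neighbors can be sketched as follows. The `intra_edges` objective is a toy stand-in for the paper's exTMQ fitness, used only to keep the example self-contained; moves that would empty a module are skipped to preserve the non-emptiness constraint:

```python
def intra_edges(modules, edges):
    """Toy quality: number of edges whose endpoints share a module
    (a stand-in for the paper's exTMQ objective)."""
    return sum(any(u in m and v in m for m in modules) for u, v in edges)

def steepest_ascent(modules, edges, quality=intra_edges):
    """Steepest-ascent hill climbing: evaluate every single-artifact
    relocation, apply the best strictly improving one, and repeat
    until a local optimum is reached."""
    modules = [set(m) for m in modules]
    improved = True
    while improved:
        improved = False
        base = quality(modules, edges)
        best_gain, best_move = 0, None
        for i, mi in enumerate(modules):
            if len(mi) == 1:
                continue                      # keep modules non-empty
            for v in mi:
                for j in range(len(modules)):
                    if i == j:
                        continue
                    cand = [set(m) for m in modules]
                    cand[i].discard(v)        # move v from module i ...
                    cand[j].add(v)            # ... into module j
                    gain = quality(cand, edges) - base
                    if gain > best_gain:
                        best_gain, best_move = gain, cand
        if best_move is not None:
            modules, improved = best_move, True
    return modules
```

On a four-artifact graph with edges (0,1) and (2,3), starting from the mixed partition {0,2}, {1,3}, the search converges to the partition that keeps each edge inside one module.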
In the following, we compute the time complexity of the algorithm. Let N, P, and G denote the number of artifacts, the population size, and the number of generations, respectively. We have the following:
To initialize the population, a chromosome of length 2N − 1 is generated in which every number between 1 and N − 1 appears twice and N appears once. Then, each chromosome is shuffled (each gene is exchanged with a randomly chosen one) to generate a new random chromosome. So, the order of this step is O((2N − 1) × P) = O(NP).
To evaluate a chromosome, the data are converted to a tree in O(N log N), and then the tree is explored in O(N). Hence, the order of evaluation is O(N log N + N) = O(N log N).
The selection step with the roulette wheel is of order O(N).
The crossover for each pair is O(N), and the mutation is a simple swap of order O(1). So, this step for the whole population is of order O(P(N + 1)) = O(PN).
Steps 2–4 are repeated G times. Hence, the total order is O(NP + G·N log N + G·N + G·P·N). In this paper, P is 300N. So, the order is O(N² + G·N log N + G·N + G·N²) = O(G·N²).
In the last step, the NAHC algorithm is applied to search the neighbors for a better solution. Each solution has at most N/2 clusters, so each iteration of NAHC costs O(N × N/2) = O(N²). Hence, for M iterations, it costs O(M·N²).
According to the analysis above, the total order is O(G·N² + M·N²); in practice, M is a small number, so the total order can be summarized as O(G·N²).
4. Experimental Setup
In this section, we outline in detail the experimental setup we carried out to empirically assess the proposed clustering algorithm.
4.1. Case Study
Mozilla Firefox, a web browser, is a large-scale, open-source application developed by the Mozilla Foundation and its subsidiary, the Mozilla Corporation. According to the Open Hub (http://www.openhub.net) report, it is the most popular project among open-source applications and has one of the largest development teams in the world, with more than 13,000 developers. We selected Mozilla Firefox 3.7, a developer preview version, for the experiments (https://ftp.mozilla.org/pub/). This version is stable and has approximately five million lines of code. Seven folders with different sizes and functionalities were chosen from this software system; their details are listed in Table 2. In addition, five medium-sized open-source software systems were chosen, whose details are given in Table 3. In all experiments, a file is considered an artifact.
Properties of selected folders.

Folder name | # of files | # of links | # of modules | Folder functionality
ACCESSIBLE | 179 | 293 | 8 | Enabling as many people as possible to use web sites, even when those people’s abilities are limited in some way; files for accessibility (i.e., MSAA (Microsoft Active Accessibility) and ATK (accessibility toolkit, used by GTK + 2) support files)
BROWSER | 45 | 45 | 4 | Contains the front-end code (in XUL, Javascript, XBL, and C++) for the Firefox browser; contains the front-end code for the DevTools (scratchpad and style editor); contains images and CSS files to skin the browser for each OS (Linux, Mac, and Windows)
DOM | 163 | 324 | 5 | IDL definitions of the interfaces defined by the DOM specifications as the parts of the connection between JavaScript and the implementations of DOM objects; implementations of a few of the core “DOM level 0” objects, such as window, window.navigator, and window.location
EXTENSIONS | 179 | 206 | 13 | Contains several extensions to Mozilla, which can be enabled at compile time: implementation of the negotiate auth method for HTTP and other protocols; code for SSPI and GSSAPI; content- and locale-pack switching user interface; permissions backend for cookies, images, etc., as well as the user interface to these permissions and other cookie features; support for the datetime protocol; support for the finger protocol; a two-way bridge between the CLR/.NET/Mono/C#/etc. world and XPCOM; implementation of W3C’s platform for privacy preferences standard; support for implementing XPCOM components in python; support for accessing SQL databases from XUL applications; support for webservices
GFX | 342 | 644 | 7 | Contains interfaces that abstract the capabilities of platform-specific graphics toolkits, along with implementations on various platforms; these interfaces provide methods for things like drawing images, text, and basic shapes; it also contains basic data structures such as points and rectangles used here and in other parts of Mozilla
INTL | 573 | 957 | 7 | Internationalization and localization support; code for “sniffing” the character encoding of web pages; code for dealing with complex text layout, related to shaping of south Asian languages; code related to determination of locale information from the operating environment; code that converts (both ways: encoders and decoders) between UTF-16 and many other character encodings; code related to implementation of various algorithms for unicode text, such as case conversion
IPC | 391 | 59 | 4 | Container for implementations of IPC (interprocess communication)
Systems used in the case study.

System | Description | Artifacts type | # of artifacts | # of edges
Mini-Tunis | Academic operating system | C file | 20 | 28
JUnit 4 | Unit testing framework | Java classes | 23 | 32
Servlet-API | Java server API | Java classes | 32 | 24
Easy mock | Dynamic mock object generator | Java classes | 84 | 118
Calculator | Microsoft calculator | C++ files | 210 | 343
The authoritative decomposition (domain-expert decomposition or ground-truth structure) is utilized to evaluate the soundness of a remodularization algorithm [67]. The proximity of the modularization generated by an algorithm to the decomposition given by a domain expert indicates the quality of the remodularization algorithm [67]. As in [14, 67], we use the directory structure as the expert decomposition of the source code. In this paper, we use Mozilla Firefox and five other software systems, whose authoritative decomposition (i.e., directory structure) is available, to assess the proposed algorithm. For example, the “extensions” folder has 179 files that have been assigned by the Mozilla Firefox developers to 13 subfolders (packages). Using a designed tool, we merged the files from the different folders into a single folder F^ so as to treat these 179 files as a flat structure. After modularizing the flattened files F^, we measure how similar the modularization achieved by the proposed algorithm is to the directory structure implemented by the Mozilla Firefox developers. In other words, the proposed algorithm is applied to F^ in order to reconstruct (or improve) the original structure.
4.2. Research Questions
To evaluate the effectiveness of TDHC, we answer the following research questions:
RQ1. Does the proposed clustering approach produce modularization having a better precision, recall, F-measure, MoJo, and MoJoFM compared to existing approaches?
RQ2. Is TDHC a stable algorithm?
RQ3. Does TDHC provide a better view of hierarchical modularization?
To answer these research questions, five software systems and the seven folders of Mozilla Firefox are remodularized by the proposed clustering algorithm and some other available clustering algorithms.
4.3. Algorithmic Parameters
Parameter setting is necessary for search-based algorithms. We obtained the implementations of five of the selected clustering techniques—ACDC (https://wiki.eecs.yorku.ca/project/cluster/protected:acdc), Bunch (https://www.cs.drexel.edu/spiros/bunch/) (SAHC and GA), SGA and SNDGA (https://github.com/Masoud-Kargar-QIAU), and EoD—from their original authors or official web sites. In addition, we obtained working implementations of DAGC, ECA, and MCA from https://github.com/Masoud-Kargar-QIAU.
The values of the crossover and mutation rates affect the exploration and exploitation of the solution space during the evolutionary process. Adding one extra artifact to the input of this problem adds two genes to the chromosomes; hence, the problem space grows exponentially, so the crossover and mutation rates are set dynamically based on the population to cover the solution space better. The crossover rate is usually chosen above 0.7, and the mutation rate is usually very low. In this research, 0.7 and 0.9 are selected as the boundaries for the crossover rate, varied in linear steps; because the mutation rate is varied in logarithmic steps, it does not increase much. Table 4 shows the parameter settings for TDHC, in which N is the number of artifacts after the preprocessing operation. For TDHC, we followed the algorithmic parameter settings used in [12, 30]; the algorithmic parameters depend on the number of artifacts (N).
As in [8, 12, 14], to reduce randomness in the results of our experiments, we collect the average and best of 30 independent runs. To perform a fair comparison, the average of runs is used, and to determine the performance of an algorithm, the best value of runs is utilized.
4.4. Assessment of Results
The comparison has been performed by comparing the modules at the leaves of the solution tree with the modules in the source code (developed by the expert team) using the precision/recall [4], MoJoFM [68], and F-measure [4] metrics. The precision/recall metric compares the modularization obtained by the proposed algorithm against the expert modularization using (3), in which TP (true positives) is the number of comodule pairs (pairs of artifacts placed in the same module) that are relevant (appear in the original modularization) and were correctly retrieved by the algorithm, FP (false positives) is the number of comodule pairs that are irrelevant but were retrieved, and FN (false negatives) is the number of comodule pairs that are relevant but were not retrieved. F-measure is defined as the harmonic mean of precision and recall (4). A high value of precision/recall and F-measure indicates more similarity between two modularizations:
(3) precision = TP / (TP + FP), recall = TP / (TP + FN),
(4) F-measure = 2 / (1/precision + 1/recall) = (2 × precision × recall) / (precision + recall).
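A minimal sketch of the pairwise precision/recall/F-measure computation, treating each modularization as a list of artifact sets and comparing their comodule pairs:

```python
from itertools import combinations

def comodule_pairs(modules):
    """All unordered artifact pairs that share a module."""
    pairs = set()
    for m in modules:
        pairs.update(frozenset(p) for p in combinations(sorted(m), 2))
    return pairs

def precision_recall_f1(found, expert):
    """Compare an obtained modularization against the expert one
    via true/false positive and false negative comodule pairs."""
    f, e = comodule_pairs(found), comodule_pairs(expert)
    tp, fp, fn = len(f & e), len(f - e), len(e - f)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For example, comparing the partition {1,2}, {3,4,5} against the expert partition {1,2,3}, {4,5} gives two shared pairs out of four on each side, so precision, recall, and F-measure are all 0.5.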
Let mno(A, B) denote the number of move or join operations needed to transform modularization A into modularization B. The MoJoFM between the extracted modularization F^ and the original modularization F is calculated as shown in (5). A high value of MoJoFM indicates more similarity between two modularizations:
(5) MoJoFM(F^, F) = (1 − mno(F^, F) / max(mno(∀F^, F))) × 100%.
To compare the overall results of TDHC against other tested algorithms in terms of precision/recall, F-measure, and MoJoFM, we utilized a nonparametric effect size statistic, namely, Cliff’s δ which is used to quantify the amount of difference between two algorithms.
When the algorithms score differently on different criteria, deciding which algorithm performs well while considering all criteria at once is not easy. In such circumstances, multicriteria decision-making (MCDM) can be utilized [69]. This technique measures the performance of the various algorithms and assigns each algorithm a value between zero and one, where zero indicates the weakest performance and one indicates the best performance. To this end, let n and m denote the number of algorithms and the number of criteria, respectively. A matrix Xn×m is created, and then, based on entropy, the efficiency of each algorithm is calculated. Algorithm 4 shows these steps.
Algorithm 4: MCDM steps.
Input: n: number of algorithms
m: number of criteria
X: an n × m matrix in which xij is the score of algorithm i on criterion j
Step 1: normalize each column: pij = xij / Σ_{p=1..n} x_pj
Step 2: calculate the entropy value Hj for criterion j: Hj = −k Σ_{i=1..n} pij ln pij, with k = 1/ln n
Step 3: calculate the importance and weight of each criterion: dj = 1 − Hj, Wj = dj / Σ_{j=1..m} dj
Step 4: calculate the ideal and anti-ideal vector of each criterion: A+ = (max_i pi1, max_i pi2, …, max_i pim) = (p1+, p2+, …, pm+); A− = (min_i pi1, min_i pi2, …, min_i pim) = (p1−, p2−, …, pm−)
Step 5: calculate the positive and negative ideal distances per entry: dij+ = pj+ − pij, dij− = pij − pj−
Step 6: calculate the positive and negative ideal distance for each algorithm: di+ = (Σ_{j=1..m} Wj (dij+)²)^{1/2}, di− = (Σ_{j=1..m} Wj (dij−)²)^{1/2}
Step 7: calculate the efficiency of each algorithm: Ei = di− / (di+ + di−)
Step 8: select the best algorithm: k = arg max_{1≤i≤n} Ei
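The steps above (entropy weighting followed by TOPSIS-style distances) can be sketched as follows; the matrix X is assumed to contain benefit criteria (higher is better), with positive column sums and at least one non-uniform column:

```python
import numpy as np

def entropy_topsis(X):
    """X: n_algorithms x m_criteria matrix of benefit scores.
    Returns a closeness score in [0, 1] per algorithm
    (entropy-weighted TOPSIS; a sketch of Algorithm 4)."""
    n, m = X.shape
    P = X / X.sum(axis=0)                     # Step 1: column-normalize
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(P > 0, P * np.log(P), 0.0)
    H = -plogp.sum(axis=0) / np.log(n)        # Step 2: entropy, k = 1/ln n
    d = 1.0 - H                               # Step 3: divergence ...
    W = d / d.sum()                           # ... and criterion weights
    pos, neg = P.max(axis=0), P.min(axis=0)   # Step 4: ideal / anti-ideal
    Dpos = np.sqrt((W * (pos - P) ** 2).sum(axis=1))  # Steps 5-6
    Dneg = np.sqrt((W * (P - neg) ** 2).sum(axis=1))
    return Dneg / (Dpos + Dneg)               # Step 7: closeness
```

An algorithm that dominates every criterion receives a score of 1, and one dominated on every criterion receives 0, matching the interpretation of Table 11.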
5. Empirical Study Results
To compare and evaluate the proposed algorithm, five software systems with different domains and sizes have been selected. Also, seven folders with different functionalities have been selected from the Mozilla Firefox application.
To answer research question RQ1, nine search-based algorithms with different characteristics—including single-objective, multiobjective, global search, local search, structure-based, and semantics-based approaches—are chosen for comparison. The algorithms selected are Bunch-GA, DAGC, ECA, MCA, Bunch-SAHC, SGA, GA-SMCP, EoD, and SNDGA. The characteristics of these algorithms are described in Table 5. We also selected ACDC as a pattern-based algorithm for comparison. Several previous studies [9–11] have shown that ACDC routinely outperforms the others. Because ACDC is a pattern-based (deterministic) method, it produces the same clustering each time it is run, so its best and average results are always the same.
Features of the selected search-based algorithms compared with the proposed algorithm.

Algorithm | # of objectives used | Search type | Structural-based/semantic-based | Encoding type | Reference
Bunch-GA | Single objective | Global (GA) | Structural | Value-based | [5]
DAGC | Single objective | Global (GA) | Structural | Permutation-based | [15]
ECA | Multiobjective | Global (two-archive GA) | Structural | Value-based | [12]
MCA | Multiobjective | Global (two-archive GA) | Structural | Value-based | [12]
Bunch-SAHC | Single objective | Local (hill climbing) | Structural | Value-based | [7, 21]
SGA | Single objective | Global (GA) | Semantic | Value-based | [14]
GA-SMCP | Single objective | Global (GA) | Structural | Value-based | [30]
EoD | Multiobjective | Global (estimation of distribution) | Semantic and structural | Value-based | [8]
SNDGA | Single objective | Global (GA) | Semantic, nominal, and structural | Value-based | [46]
The best and average results of TDHC on the seven Firefox folders and the five other software systems are compared with the results of selected state-of-the-art algorithms with different features in terms of precision, recall, F-measure, and MoJoFM. The details are reported in Tables 6–9.
Algorithms comparison in terms of MoJoFM (each cell: best/average, %).

Folder name | Bunch-GA | ECA | MCA | DAGC | Bunch-SAHC | SGA | GA-SMCP | EoD | ACDC | SNDGA | TDHC
Browser | 70/65 | 60/52 | 72/68 | 45/36 | 52/48 | 55/50 | 70/65 | 52/49 | 66/66 | 68/64 | 80/78
Dom | 58/55 | 53/48 | 54/46 | 26/25 | 39/36 | 45/42 | 58/55 | 58/55 | 83/83 | 77/75 | 74/75
Accessible | 42/39 | 37/36 | 40/36 | 27/24 | 27/26 | 55/53 | 38/36 | 42/39 | 42/42 | 65/61 | 78/54
Extensions | 50/46 | 59/46 | 53/44 | 22/21 | 28/25 | 35/33 | 48/46 | 51/46 | 77/77 | 65/62 | 84/71
Gfx | 54/50 | 61/54 | 67/57 | 29/29 | 42/41 | 57/55 | 54/50 | 60/57 | 73/73 | 84/81 | 86/75
Ipc | 81/81 | 81/81 | 81/81 | 40/38 | 81/81 | 81/80 | 80/79 | 80/79 | 61/61 | 91/90 | 67/70
Intl | 80/76 | 75/75 | 84/81 | 41/39 | 75/75 | 78/76 | 71/68 | 75/71 | 92/92 | 86/80 | 89/79
Mini-Tunis | 80/78 | 80/76 | 80/73 | 71/69 | 80/71 | 80/78 | 59/51 | 80/73 | 46/46 | 80/75 | 84/67
JUnit 4 | 60/58 | 60/58 | 60/56 | 59/51 | 60/58 | 60/59 | 60/57 | 67/65 | 63/63 | 65/60 | 60/51
Servlet-API | 90/88 | 91/90 | 88/85 | 73/68 | 78/75 | 71/61 | 83/80 | 90/84 | 89/89 | 80/71 | 93/90
Easy mock | 78/73 | 75/69 | 71/60 | 61/53 | 63/51 | 65/54 | 78/69 | 78/73 | 67/67 | 60/49 | 87/91
Calculator | 40/37 | 41/38 | 41/35 | 33/31 | 38/31 | 41/38 | 40/37 | 41/39 | 44/44 | 62/54 | 85/79
Algorithms comparison in terms of precision (each cell: best/average, %).

Folder name | Bunch-GA | ECA | MCA | DAGC | Bunch-SAHC | SGA | GA-SMCP | EoD | ACDC | SNDGA | TDHC
Browser | 83/62 | 33/321 | 100/96 | 7/5 | 49/28 | 55/48 | 79/71 | 84/76 | 95/95 | 56/41 | 73/58
Dom | 53/41 | 27/24 | 44/43 | 2/2 | 33/21 | 38/33 | 49/39 | 51/44 | 82/82 | 75/56 | 56/45
Accessible | 27/2 | 23/14 | 85/62 | 3/2 | 17/13 | 45/45 | 32/20 | 26/22 | 23/23 | 44/36 | 42/32
Extensions | 26/26 | 26/19 | 35/28 | 3/2 | 12/10 | 36/36 | 19/16 | 75/72 | 69/69 | 39/33 | 43/30
Gfx | 25/23 | 26/21 | 65/61 | 0.95/0.88 | 20/15 | 33/25 | 21/19 | 26/20 | 60/60 | 51/40 | 53/50
Ipc | 50/29 | 25/25 | 26/26 | 0.43/0.42 | 50/28 | 57/34 | 45/38 | 55/48 | 95/95 | 52/41 | 58/50
Intl | 45/32 | 15/15 | 49/47 | 0.51/0.44 | 17/15 | 72/65 | 45/42 | 40/32 | 94/94 | 26/21 | 71/69
Mini-Tunis | 81/78 | 81/79 | 81/78 | 69/63 | 81/72 | 68/65 | 81/78 | 81/80 | 35/35 | 71/54 | 84/55
JUnit 4 | 63/61 | 61/60 | 61/60 | 57/51 | 59/53 | 48/41 | 61/59 | 61/60 | 54/54 | 68/61 | 46/43
Servlet-API | 50/47 | 51/49 | 83/80 | 48/45 | 48/41 | 48/42 | 50/47 | 50/49 | 100/100 | 48/43 | 55/72
Easy mock | 41/36 | 65/61 | 60/53 | 38/30 | 35/28 | 41/35 | 41/33 | 45/38 | 57/57 | 45/40 | 41/38
Calculator | 35/31 | 37/35 | 37/32 | 14/10 | 12/8 | 29/25 | 20/17 | 40/33 | 20/20 | 40/34 | 14/12
Algorithms comparison in terms of recall (each cell: best/average, %).

Folder name | Bunch-GA | ECA | MCA | DAGC | Bunch-SAHC | SGA | GA-SMCP | EoD | ACDC | SNDGA | TDHC
Browser | 65/55 | 51/46 | 56/51 | 70/54 | 42/36 | 35/32 | 55/51 | 61/57 | 10/10 | 56/52 | 60/49
Dom | 43/40 | 40/37 | 38/35 | 48/44 | 29/29 | 31/30 | 41/33 | 46/39 | 42/42 | 71/67 | 52/71
Accessible | 31/23 | 27/25 | 29/24 | 50/37 | 21/19 | 26/26 | 31/25 | 44/40 | 30/30 | 56/51 | 63/52
Extensions | 35/30 | 51/37 | 39/27 | 36/31 | 20/17 | 18/17 | 30/24 | 28/22 | 98/98 | 48/45 | 50/49
Gfx | 48/39 | 68/48 | 62/43 | 42/40 | 31/30 | 42/39 | 33/28 | 46/42 | 25/25 | 80/72 | 75/58
Ipc | 70/69 | 70/69 | 69/68 | 63/62 | 68/68 | 72/72 | 70/65 | 69/63 | 6/6 | 84/81 | 73/85
Intl | 74/72 | 62/61 | 75/72 | 60/60 | 60/59 | 71/70 | 58/44 | 76/71 | 94/94 | 82/74 | 84/63
Mini-Tunis | 71/63 | 73/70 | 71/70 | 48/39 | 68/61 | 60/51 | 60/52 | 71/69 | 92/92 | 69/52 | 78/67
JUnit 4 | 48/41 | 48/45 | 48/41 | 28/25 | 31/22 | 45/39 | 38/32 | 53/50 | 49/49 | 50/43 | 34/31
Servlet-API | 73/65 | 78/74 | 75/68 | 51/42 | 41/30 | 59/48 | 48/41 | 78/72 | 41/41 | 75/69 | 100/98
Easy mock | 79/72 | 81/74 | 81/76 | 52/45 | 38/31 | 80/73 | 68/61 | 83/78 | 32/32 | 78/70 | 90/90
Calculator | 48/40 | 55/43 | 61/50 | 28/19 | 21/12 | 50/42 | 39/35 | 55/44 | 52/52 | 55/49 | 89/80
Algorithms comparison in terms of F-measure (each cell: best/average, %).

Folder name | Bunch-GA | ECA | MCA | DAGC | Bunch-SAHC | SGA | GA-SMCP | EoD | ACDC | SNDGA | TDHC
Browser | 73/57 | 40/37 | 71/67 | 13/9 | 41/30 | 41/39 | 65/59 | 71/65 | 18/18 | 55/44 | 66/52
Dom | 45/40 | 32/30 | 40/38 | 3/3 | 31/24 | 34/31 | 45/36 | 48/41 | 56/56 | 69/65 | 54/38
Accessible | 27/21 | 24/20 | 36/32 | 6/4 | 18/15 | 41/39 | 32/22 | 32/28 | 26/26 | 45/40 | 51/41
Extensions | 30/28 | 34/25 | 35/28 | 5/4 | 15/12 | 24/23 | 23/19 | 28/24 | 81/81 | 41/38 | 46/35
Gfx | 33/29 | 37/29 | 63/41 | 18/2 | 24/20 | 35/30 | 26/23 | 33/27 | 35/35 | 60/51 | 62/51
Ipc | 51/39 | 37/37 | 37/37 | 0.83/0.84 | 58/39 | 60/45 | 55/48 | 61/55 | 12/12 | 64/54 | 65/61
Intl | 56/44 | 24/24 | 59/57 | 1/0.87 | 30/24 | 73/66 | 51/43 | 52/44 | 96/96 | 37/33 | 77/59
Mini-Tunis | 76/70 | 77/74 | 76/75 | 57/48 | 74/66 | 64/57 | 69/62 | 76/74 | 51/51 | 70/53 | 81/60
JUnit 4 | 54/49 | 54/51 | 54/49 | 38/34 | 41/31 | 46/40 | 47/41 | 57/55 | 51/51 | 58/50 | 39/36
Servlet-API | 59/55 | 62/59 | 79/74 | 49/43 | 44/35 | 53/45 | 49/44 | 61/58 | 58/58 | 59/54 | 71/83
Easy mock | 54/48 | 72/67 | 69/62 | 44/36 | 36/29 | 54/47 | 51/43 | 58/51 | 41/41 | 51/51 | 56/53
Calculator | 40/35 | 44/39 | 46/40 | 19/13 | 15/10 | 37/31 | 26/23 | 47/38 | 29/29 | 46/40 | 24/21
In Table 6, TDHC performs best in most cases; for the “dom” and “Intl” folders, the ACDC algorithm has better best and average results, respectively. Table 7 shows that, in terms of precision, MCA and ACDC perform best among the algorithms. In Table 8, the algorithms are compared in terms of recall, where TDHC performs best in most cases. In Table 9, for F-measure, TDHC and SNDGA perform almost the same.
From Tables 6–9, we conclude that DAGC, ECA, Bunch-SAHC, GHA, and GA-SMCP, compared to the other algorithms, systematically yield very low precision/recall, F-measure, and MoJoFM. In contrast, if we set aside the precision criterion, TDHC is clearly among the best algorithms, always near the top. It often competes with ACDC, EoD, and SNDGA, which sometimes clearly outperform TDHC.
To compare the results of TDHC directly against the other algorithms, Cliff’s δ is calculated; the results are presented in Table 10. Cliff’s δ is a nonparametric effect size metric that quantifies the difference between two groups of observations (here, TDHC against each other tested algorithm). The metric ranges from −1 to 1, and a higher value indicates that the results of the first group (here, TDHC) are generally better than those of the second group (the other algorithm). For interpretation, as in [10], the following magnitudes are used: negligible (δ < 0.147), small (δ < 0.33), medium (δ < 0.474), and large (δ ≥ 0.474). The results indicate that the MoJoFM, precision, recall, and F-measure of the TDHC output are generally better than those of the other algorithms.
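Cliff's δ and the magnitude thresholds quoted above can be computed with a straightforward sketch:

```python
def cliffs_delta(a, b):
    """Cliff's delta effect size: P(x > y) - P(x < y) for x drawn from
    group `a` and y from group `b`. Ranges over [-1, 1]; positive values
    mean group `a` tends to produce larger observations."""
    gt = sum(1 for x in a for y in b if x > y)
    lt = sum(1 for x in a for y in b if x < y)
    return (gt - lt) / (len(a) * len(b))

def magnitude(delta):
    """Map |delta| to the thresholds used in the text (after [10])."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"
```

Here `a` would hold the TDHC scores across case studies and `b` the scores of one competing algorithm for the same metric.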
Cliff’s δ of TDHC against the other tested algorithms (each cell: best/average).

Metric name | Bunch | ECA | MCA | DAGC | Bunch-SAHC | SGA | GA-SMCP | EoD | ACDC | SNDGA
MoJoFM | 0.57/0.37 | 0.60/0.43 | 0.56/0.39 | 0.93/0.90 | 0.76/0.60 | 0.78/0.60 | 0.76/0.56 | 0.60/0.45 | 0.50/0.27 | 0.39/0.22
Precision | 0.22/0.26 | 0.42/0.31 | −0.01/−0.24 | 0.51/0.53 | 0.44/0.41 | 0.51/0.29 | 0.24/0.20 | 0.07/−0.02 | −0.14/−0.26 | 0.19/0.12
Recall | 0.28/0.41 | 0.22/0.43 | 0.21/0.36 | 0.51/0.72 | 0.60/0.74 | 0.49/0.69 | 0.51/0.69 | 0.20/0.35 | 0.27/0.35 | 0.17/0.17
F-measure | 0.29/0.25 | 0.44/0.31 | 0.10/−0.03 | 0.69/0.52 | 0.65/0.70 | 0.67/0.42 | 0.45/0.31 | 0.22/0.06 | 0.35/0.19 | 0.18/0.05
In addition to the above experiments, we use MCDM to compare the performance of the tested algorithms considering all the criteria employed in the experiments. Table 11 shows that the modularization quality of TDHC is better than that of the other tested algorithms in most cases, with an acceptable margin. The numbers in Table 11 indicate the relative superiority of the algorithms: the closer a value is to one, the better that algorithm performed relative to the rest across most experiments and criteria.
Multicriteria decision-making to compare the algorithms.

Algorithm | Browser | Dom | Accessible | Extensions | Gfx | Ipc | Intl | Mini-Tunis | JUnit 4 | Servlet-API | Easy mock | Calculator
Bunch-GA | 0.0334 | 0 | 0 | 0 | 0 | 0.2521 | 0 | 0 | 0.438 | 0 | 0 | 0
ECA | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.723 | 0.01
MCA | 0.8641 | 0 | 0.1953 | 0 | 0.294 | 0 | 0 | 0.03 | 0 | 0.0113 | 0 | 0.01
DAGC | 0.0481 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Bunch-SAHC | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
SGA | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
GA-SMCP | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
EoD | 0.0334 | 0 | 0 | 0.331 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0.1
ACDC | 0 | 0.826 | 0 | 0.661 | 0 | 0.010 | 1 | 0.327 | 0 | 0.190 | 0 | 0
SNDGA | 0 | 0.731 | 0.15 | 0 | 0.498 | 0.81 | 0 | 0.041 | 0.552 | 0 | 0 | 0.127
TDHC | 0.5210 | 0.125 | 1 | 0.431 | 0.391 | 0.81 | 0 | 0.985 | 0 | 0.691 | 0.821 | 0.802
To answer research question RQ2: the genetic algorithm is a stochastic optimizer, so the results may differ in each run, but the results achieved over several independent runs are expected to be close to each other. Therefore, the proposed algorithm is executed 30 times for each case, and the stability of the results is analyzed with the t-test. To apply the t-test, the results are split into two equally sized groups, named G1 and G2, and descriptive and inferential statistics are extracted from them. According to [70], having 30 rows of data is enough to assume that the distribution is normal, which is a critical condition for using the t-test. We also applied the Wilcoxon signed-rank test [71], a nonparametric statistical hypothesis test, to check the stability of the results without assuming a normal distribution.
The results are presented in Table 12. The first three column groups show the mean, the standard deviation, and the standard error of the mean of the two groups, respectively, as descriptive statistics; the remaining columns show the inferential statistics. Levene’s test assesses the equality of variances of a variable calculated for two groups: if the p value (the Sig. column) is greater than the significance level (0.05 in our tests), the null hypothesis of equal variances cannot be rejected; the same interpretation applies to the Wilcoxon signed-rank test. The t-test columns of Table 12 give the results of an independent two-sample t-test with equal sample sizes and equal variances (according to the results of Levene’s test) on two randomly separated groups of TDHC results, and the last columns give the results of the Wilcoxon signed-rank test for the case that the data are not normally distributed. All the p values are greater than 0.05, so we cannot reject the null hypothesis of equal means. Hence, the results of the different runs converge to an acceptable range.
t-test for the experiments (descriptive cells: G1/G2; test cells: statistic/Sig.).

Case study | Mean (G1/G2) | Standard deviation (G1/G2) | Standard error of mean (G1/G2) | Levene’s test (F/Sig.) | t-test (T/Sig.) | Wilcoxon signed-rank test (Z/Sig.)
Firefox-browser | 1.4714/1.5572 | 0.337/0.246 | 0.1472/0.1003 | 1.5044/0.2550 | 0.4702/0.651 | −0.73/0.465
Firefox-dom | 0.573/0.614 | 0.247/0.306 | 0.1175/0.1370 | 0.035/0.8571 | 0.281/0.8135 | −1.753/0.08
Firefox-accessible | 0.5228/0.5942 | 0.1275/0.1285 | 0.5702/0.0574 | 0.013/0.9670 | 0.882/0.4036 | −1.483/0.138
Firefox-extensions | 0.553/0.636 | 0.212/0.2391 | 0.096/0.103 | 0.268/0.6109 | 0.527/0.530 | −0.674/0.5
Firefox-gfx | 0.3344/0.2688 | 0.1468/0.1254 | 0.06456/0.05611 | 0.054/0.823 | 0.760/0.469 | −1.461/0.144
Firefox-ipc | 1.306/1.462 | 0.4081/0.436 | 0.179/0.195 | 0.348/0.572 | 0.521/0.5617 | −0.135/0.893
Firefox-intl | 0.347/0.319 | 0.364/0.1272 | 0.1628/0.0422 | 3.847/0.085 | 0.465/0.654 | −1.084/0.279
Mini-Tunis | 0.9726/1.0157 | 0.14256/0.14043 | 0.06376/0.06280 | 0.0181/0.014 | 0.674/0.643 | −0.135/0.893
JUnit 4 | 0.7950/0.7923 | 0.00759/0.01831 | 0.00339/0.00819 | 0.4701/0.2481 | 0.496/0.770 | −1.069/0.285
Servlet-API | 2.5234/2.5134 | 0.05345/0.5871 | 0.02390/0.02626 | 1.007/0.3277 | 0.690/0.785 | −0.73/0.465
Easy mock | 1.9572/2.0360 | 0.29716/0.24379 | 0.13289/0.10903 | 0.745/0.127 | 0.953/0.659 | −0.405/0.686
Calculator | 2.3094/2.3210 | 0.16658/0.29373 | 0.07449/0.13136 | 0.814/0.378 | 0.182/0.941 | −0.405/0.686
Mini-Tunis details.

Code | Name | Module
20 | Main | 5
19 | User | 5
16 | State | 5
2 | Control | 5
6 | Family | 5
8 | FileIO | 4
11 | INode | 4
12 | INodeGlobals | 4
13 | INodeTable | 4
10 | FreeINode | 4
7 | File | 6
9 | FileTable | 6
4 | Directory | 6
1 | Computer | 3
14 | Memory | 3
17 | System | 1
15 | Panic | 1
5 | Disk | 2
18 | Tty | 2
3 | Device | 2
To answer research question RQ3, the output of TDHC for Mini-Tunis is investigated. Mini-Tunis (mtunis) is an academic operating system with 20 artifacts, numbered from 1 to 20 in Figure 12. Each artifact is a file, and all artifacts with the same number in parentheses belong to the same module [5]. According to the figure, artifacts 15 and 17 are only called by other artifacts and can be discarded in the preprocessing. Figure 13 shows the tree produced by the proposed algorithm for the 18 remaining artifacts, and Figure 14 represents the corresponding hierarchical modularization in a flat view. The number in parentheses for each artifact is its module number in the expert modularization. As shown in this figure, a new arrangement is proposed for the artifacts of modules 3 and 5 of the expert modularization, but they are joined together at the upper level. Artifacts 15 and 17 have relations with several of these modules, so they are identified as a new module whose artifacts are utility libraries; these two artifacts are in a separate module in the expert modularization as well.
Call dependency graph of Mini-Tunis based on its design documentation [5].
The tree view.
The flat view.
The most important advantage of this method is that it performs very well in helping the designer specify the encapsulation levels, e.g., module, package, and component.
6. Threats to Validity
In this section, to clarify the validity of TDHC, the limitations that can affect the results of the algorithm are discussed. Several factors may bias the validity of the study; these are typically divided into two categories: external and internal validity. External validity concerns the ability to generalize the results beyond the case studies used, or to different settings for them:
The input of the algorithm is an ADG extracted from source code, and cohesion and coupling are considered as an indicator for refactoring. Candela et al. in [1] discussed that cohesion and coupling are not enough to remodularization of source code, and more indicators are probably needed. However, they did not discuss in their work what other indicators could improve the quality of the modularization.
In search-based techniques for source code remodularization, generalizing a technique to arbitrary software is an important threat to the validity of results. So, in this paper, Mozilla Firefox, a large-scale software system, is selected alongside five other medium-sized open-source systems. It is worth noting that only a few software systems have more artifacts (here, files) in a folder than Mozilla Firefox.
Internal validity is concerned with experimental treatments that affect the algorithm results, leading to poor results:
In this paper, the precision, recall, F-measure, and MoJoFM metrics have been used to compare the study results with current modularization algorithms. These metrics are not necessarily in line with the expert developer’s opinion. Also, these metrics do not evaluate the structure of the tree, and none of them considers the edges between artifacts when calculating similarity.
In the preprocessing step of TDHC, some artifacts may be selected to set aside from input of the tree generation step. In the end, it is important to suggest an appropriate position in modules for them or aggregate them as a new module.
The rates of the crossover and mutation operators used in the GA were obtained from several experiments on the Mini-Tunis, JUnit, and Servlet-API software systems and then applied to the other case studies. However, these values may not work well on other software systems.
In the proposed algorithm, the labels of inner nodes are not important, and different Prüfer sequences can generate the same modularization. For example, 5,5,6,6,7 and 6,6,5,5,7 both represent the same modularization. On the other hand, the concept of neighborhood in this encoding is not transparent, and small changes in number positions cause a great change in the structure of the output tree.
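Both properties can be checked with a small sketch (not the paper's implementation) that decodes a Prüfer sequence into tree edges and groups the artifacts by the inner node they attach to; here it is assumed, for illustration, that nodes 1-4 are artifact leaves and nodes 5-7 are inner (module) nodes.

```python
# Sketch: Prüfer decoding over nodes 1..7; leaves 1-4 are assumed to be
# artifacts, inner nodes 5-7 candidate modules (an illustrative assumption).

def prufer_to_edges(seq):
    """Decode a Prüfer sequence over nodes 1..len(seq)+2 into tree edges."""
    n = len(seq) + 2
    degree = {v: 1 for v in range(1, n + 1)}
    for s in seq:
        degree[s] += 1
    edges = []
    for s in seq:
        # A node's degree drops to 1 only after its last occurrence in seq,
        # so the smallest degree-1 node is the next leaf to attach.
        leaf = min(v for v in degree if degree[v] == 1)
        edges.append((leaf, s))
        degree[leaf] -= 1
        degree[s] -= 1
    u, v = (x for x in degree if degree[x] == 1)  # exactly two remain
    edges.append((u, v))
    return edges

def leaf_grouping(seq, leaves):
    """Group the leaf artifacts by the inner node they are attached to."""
    groups = {}
    for a, b in prufer_to_edges(seq):
        if a in leaves:
            groups.setdefault(b, set()).add(a)
        elif b in leaves:
            groups.setdefault(a, set()).add(b)
    return {frozenset(g) for g in groups.values()}

leaves = {1, 2, 3, 4}
# Swapping inner-node labels yields the same modularization of the artifacts:
assert leaf_grouping([5, 5, 6, 6, 7], leaves) == leaf_grouping([6, 6, 5, 5, 7], leaves)
# A change at a single position yields a different artifact grouping:
assert leaf_grouping([5, 5, 6, 6, 7], leaves) != leaf_grouping([6, 5, 6, 6, 7], leaves)
```

The first assertion shows the label-invariance described above ({1,2} and {3,4} in both cases), and the second shows how fragile the neighborhood structure of this encoding is.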
7. Conclusion and Future Work
During software maintenance and evolution, the structure of the software deviates from its original structure. Thus, source code refactoring plays an essential role in the software maintenance process. In this paper, a new clustering algorithm based on cohesion and coupling between artifacts is proposed. This method uses a top-down hierarchical approach with a metaheuristic algorithm (combining a genetic algorithm and hill climbing). The proposed algorithm suggests to developers a suitable point at which to start modularizing the artifacts. The input of the algorithm is an ADG, which is independent of the source code programming language; however, its preprocessing operations may depend on the programming language of the source code or the type of input artifacts (class, file, function, or low-level module). Because the proposed refactoring method is automatic, it is intended to serve as an assistant to the developer. Design decisions are often more complex and subtle than simply maximizing cohesion and minimizing coupling in the modularization process. Consequently, the derived modularization is analyzed by the software developer, who can accept the proposed remodularization as is or change it by moving artifacts from one module (package) to another. The following directions are suggested for future work:
Increasing the number of artifacts affects the quality of the optimal solution proposed by the algorithm because the search space grows exponentially with the input size. It is therefore important to improve the algorithm in this respect.
Given the size of the search space in software source code, a new preprocessing method can be devised to reduce it. For example, in source code refactoring, there is already a modularization suggested by the current developers, and artifacts within a module are usually closely related to each other, with only some of them related to other modules. Therefore, the purely internal artifacts can be ignored when calculating the relationships between modules.
Many research studies use structural or non-structural features for source code refactoring; these features can also be used in this top-down search-based algorithm.
Other heuristic or metaheuristic algorithms can be used instead of GA.
Data Availability
The data used to support the findings of this study are available at https://github.com/SoftwareMaintenanceLab.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
References
[1] I. Candela, G. Bavota, B. Russo, and R. Oliveto, "Using cohesion and coupling for software remodularization," vol. 25, no. 3, 2016, doi: 10.1145/2928268.
[2] E. V. de Paulo Sobrinho, A. De Lucia, and M. de Almeida Maia, "A systematic literature review on bad smells-5 w's: which, when, what, who, where," 2018.
[3] M. Fowler, K. Beck, J. Brant, W. Opdyke, and D. Roberts, Refactoring: Improving the Design of Existing Programs, 1999.
[4] A. Isazadeh, H. Izadkhah, and I. Elgedawy, Springer, Berlin, Germany, 2017.
[5] S. M. Brian, Drexel University, New York, NY, USA, 2002.
[6] R. S. Pressman, Palgrave Macmillan, 2005.
[7] B. S. Mitchell and S. Mancoridis, "On the automatic modularization of software systems using the bunch tool," vol. 32, no. 3, pp. 193-208, 2006, doi: 10.1109/tse.2006.31.
[8] N. S. Jalali, H. Izadkhah, and S. Lotfi, "Multi-objective search-based software modularization: structural and non-structural features," vol. 23, no. 21, pp. 11141-11165, 2019.
[9] T. Lutellier, D. Chollak, and J. Garcia, "Comparing software architecture recovery techniques using accurate dependencies," in Proceedings of the 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, IEEE, 2015, pp. 69-78.
[10] T. Lutellier, D. Chollak, and J. Garcia, "Measuring the impact of code dependencies on software architecture recovery techniques," vol. 44, no. 2, pp. 159-181, 2017.
[11] J. Garcia, I. Ivkovic, and N. Medvidovic, "A comparative analysis of software architecture recovery techniques," in Proceedings of the 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE), IEEE, 2013, pp. 486-496.
[12] K. Praditwong, M. Harman, and X. Yao, "Software module clustering as a multi-objective search problem," vol. 37, no. 2, pp. 264-282, 2011, doi: 10.1109/tse.2010.26.
[13] K. Kohmoto, K. Katayama, and H. Narihisa, "Performance of a genetic algorithm for the graph partitioning problem," vol. 38, no. 11-13, pp. 1325-1332, 2003, doi: 10.1016/s0895-7177(03)90134-8.
[14] M. Kargar, A. Isazadeh, and H. Izadkhah, "Multi-programming language software systems modularization," vol. 80, p. 106500, 2019, doi: 10.1016/j.compeleceng.2019.106500.
[15] S. Parsa and O. Bushehrian, "A new encoding scheme and a framework to investigate genetic clustering algorithms," vol. 37, no. 1, 2005.
[16] H. Izadkhah, I. Elgedawy, and A. Isazadeh, "E-CDGM: an evolutionary call-dependency graph modularization approach for software systems," vol. 16, no. 3, pp. 70-90, 2016, doi: 10.1515/cait-2016-0035.
[17] T. Mens and T. Tourwé, "A survey of software refactoring," vol. 30, no. 2, pp. 126-139, 2004, doi: 10.1109/tse.2004.1265817.
[18] S. Ducasse and D. Pollet, "Software architecture reconstruction: a process-oriented taxonomy," vol. 35, no. 4, pp. 573-591, 2009, doi: 10.1109/tse.2009.19.
[19] F. Palomba, A. De Lucia, G. Bavota, and R. Oliveto, "Anti-pattern detection: methods, challenges, and open issues," Elsevier, pp. 201-238, 2014.
[20] J. Al Dallal and A. Abdin, "Empirical evaluation of the impact of object-oriented code refactoring on quality attributes: a systematic literature review," vol. 44, no. 1, pp. 44-69, 2018, doi: 10.1109/tse.2017.2658573.
[21] B. S. Mitchell and S. Mancoridis, "On the evaluation of the bunch search-based software modularization algorithm," vol. 12, no. 1, pp. 77-93, 2008.
[22] M. Tajgardan, H. Izadkhah, and S. Lotfi, "Software systems clustering using estimation of distribution approach," vol. 8, no. 2, pp. 99-113, 2016, doi: 10.1515/jacsm-2016-0007.
[23] Amarjeet and J. K. Chhabra, "Many-objective artificial bee colony algorithm for large-scale software module clustering problem," vol. 22, no. 19, pp. 6341-6361, 2018, doi: 10.1007/s00500-017-2687-3.
[24] A. Prajapati and J. K. Chhabra, "A particle swarm optimization-based heuristic for software module clustering problem," vol. 43, no. 12, pp. 7083-7094, 2018, doi: 10.1007/s13369-017-2989-x.
[25] M. Harman and L. Tratt, "Pareto optimal search based refactoring at the design level," in Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, ACM, 2007, pp. 1106-1113.
[26] L. C. Briand, J. W. Daly, and J. K. Wust, "A unified framework for coupling measurement in object-oriented systems," vol. 25, no. 1, pp. 91-121, 1999, doi: 10.1109/32.748920.
[27] O. Seng, J. Stammel, and D. Burkhart, "Search-based determination of refactorings for improving the class structure of object-oriented systems," in Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, ACM, 2006, pp. 1909-1916.
[28] S. Kebir, I. Borne, and D. Meslati, "A genetic algorithm-based approach for automated refactoring of component-based software," vol. 88, pp. 17-36, 2017, doi: 10.1016/j.infsof.2017.03.009.
[29] A. C. Kumari, K. Srinivas, and M. Gupta, "Software module clustering using a hyper-heuristic based multi-objective genetic algorithm," in Proceedings of the 2013 3rd IEEE International Advance Computing Conference (IACC), IEEE, 2013, pp. 813-818.
[30] J. Huang and J. Liu, "A similarity-based modularization quality measure for software module clustering problems," vol. 342, pp. 96-110, 2016, doi: 10.1016/j.ins.2016.01.030.
[31] G. Bavota, A. De Lucia, A. Marcus, and R. Oliveto, "Using structural and semantic measures to improve software modularization," vol. 18, no. 5, pp. 901-932, 2013, doi: 10.1007/s10664-012-9226-8.
[32] G. Bavota, M. Gethers, R. Oliveto, D. Poshyvanyk, and A. De Lucia, "Improving software modularization via automated analysis of latent topics and dependencies," vol. 23, no. 1, 2014, doi: 10.1145/2559935.
[33] G. Bavota, A. De Lucia, A. Marcus, and R. Oliveto, "Automating extract class refactoring: an improved method and its evaluation," vol. 19, no. 6, pp. 1617-1664, 2014, doi: 10.1007/s10664-013-9256-x.
[34] G. Gui and P. D. Scott, "Coupling and cohesion measures for evaluation of component reusability," in Proceedings of the 2006 International Workshop on Mining Software Repositories, ACM, 2006, pp. 18-21.
[35] G. Bavota, A. De Lucia, and R. Oliveto, "Identifying extract class refactoring opportunities using structural and semantic cohesion measures," vol. 84, no. 3, pp. 397-414, 2011, doi: 10.1016/j.jss.2010.11.918.
[36] D. Poshyvanyk, A. Marcus, R. Ferenc, and T. Gyimóthy, "Using information retrieval based coupling measures for impact analysis," vol. 14, no. 1, pp. 5-32, 2009.
[37] G. Bavota, R. Oliveto, M. Gethers, D. Poshyvanyk, and A. De Lucia, "Methodbook: recommending move method refactorings via relational topic models," vol. 40, no. 7, pp. 671-694, 2014, doi: 10.1109/tse.2013.60.
[38] J. I. Maletic and A. Marcus, "Supporting program comprehension using semantic and structural information," in Proceedings of the 23rd International Conference on Software Engineering, IEEE Computer Society, 2001, pp. 103-112.
[39] F. Palomba, A. Panichella, A. De Lucia, R. Oliveto, and A. Zaidman, "A textual-based technique for smell detection," in Proceedings of the 2016 IEEE 24th International Conference on Program Comprehension (ICPC), IEEE, 2016, pp. 1-10.
[40] S. Mohammadi and H. Izadkhah, "A new algorithm for software clustering considering the knowledge of dependency between artifacts in the source code," vol. 105, pp. 252-256, 2019, doi: 10.1016/j.infsof.2018.09.001.
[41] R. Mahouachi, "Search-based cost-effective software remodularization," vol. 33, no. 6, pp. 1320-1336, 2018, doi: 10.1007/s11390-018-1892-6.
[42] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, "A fast and elitist multiobjective genetic algorithm: NSGA-II," vol. 6, no. 2, pp. 182-197, 2002, doi: 10.1109/4235.996017.
[43] A. Ouni, M. Kessentini, M. Ó Cinnéide, H. Sahraoui, K. Deb, and K. Inoue, "More: a multi-objective refactoring recommendation approach to introducing design patterns and fixing code smells," vol. 29, no. 5, p. e1843, 2017, doi: 10.1002/smr.1843.
[44] J. Al Dallal, "Predicting move method refactoring opportunities in object-oriented code," vol. 92, pp. 105-120, 2017, doi: 10.1016/j.infsof.2017.07.013.
[45] L. Mu, V. Sugumaran, and F. Wang, "A hybrid genetic algorithm for software architecture re-modularization," 2019.
[46] M. Kargar, A. Isazadeh, and H. Izadkhah, "Improving the modularization quality of heterogeneous multi-programming software systems by unifying structural and semantic concepts," vol. 76, 2020, doi: 10.1007/s11227-019-02995-3.
[47] K. Mahdavi, Brunel University, 2005.
[48] M. C. Monçores, A. C. F. Alvim, and M. O. Barros, "Large neighborhood search applied to the software module clustering problem," vol. 91, pp. 92-111, 2018, doi: 10.1016/j.cor.2017.10.004.
[49] M. Kargar, A. Isazadeh, and H. Izadkhah, "Semantic-based software clustering using hill climbing," in Proceedings of the 2017 International Symposium on Computer Science and Software Engineering Conference (CSSE), IEEE, 2017, pp. 55-60.
[50] J. Huang, J. Liu, and X. Yao, "A multi-agent evolutionary algorithm for software module clustering problems," vol. 21, no. 12, pp. 3415-3428, 2017, doi: 10.1007/s00500-015-2018-5.
[51] J. K. Chhabra, "Harmony search based remodularization for object-oriented software systems," vol. 47, pp. 153-169, 2017.
[52] A. C. Kumari and K. Srinivas, "Hyper-heuristic approach for multi-objective software module clustering," vol. 117, pp. 384-401, 2016, doi: 10.1016/j.jss.2016.04.007.
[53] A. Prajapati and J. K. Chhabra, "An efficient scheme for candidate solutions of search-based multi-objective software remodularization," in Proceedings of the International Conference on Human Interface and the Management of Information, Springer, 2016, pp. 296-307.
[54] J. Hwa, S. Yoo, Y.-S. Seo, and D.-H. Bae, "Search-based approaches for software module clustering based on multiple relationship factors," vol. 27, no. 7, pp. 1033-1062, 2017, doi: 10.1142/s0218194017500395.
[55] A. Ramírez, J. R. Romero, and S. Ventura, "Interactive multi-objective evolutionary optimization of software architectures," vol. 463-464, pp. 92-109, 2018, doi: 10.1016/j.ins.2018.06.034.
[56] M. Akbari and H. Izadkhah, "Hybrid of genetic algorithm and krill herd for software clustering problem," in Proceedings of the 2019 5th Conference on Knowledge Based Engineering and Innovation (KBEI), IEEE, 2019, pp. 565-570.
[57] J. K. Chhabra, "Many-objective artificial bee colony algorithm for large-scale software module clustering problem," vol. 22, no. 19, pp. 6341-6361, 2018.
[58] H. Izadkhah and M. Tajgardan, "Information theoretic objective function for genetic software clustering," 2019.
[59] O. Maqbool and H. Babri, "Hierarchical clustering for software architecture recovery," vol. 33, no. 11, 2007, doi: 10.1109/tse.2007.70732.
[60] A. Kuhn, S. Ducasse, and T. Gîrba, "Semantic clustering: identifying topics in source code," vol. 49, no. 3, pp. 230-243, 2007, doi: 10.1016/j.infsof.2006.10.017.
[61] P. Andritsos and V. Tzerpos, "Information-theoretic software clustering," vol. 31, no. 2, pp. 150-165, 2005, doi: 10.1109/tse.2005.25.
[62] A. Rathee and J. K. Chhabra, "Clustering for software remodularization by using structural, conceptual and evolutionary features," vol. 24, no. 12, pp. 1731-1757, 2018.
[63] A. Shokoufandeh, S. Mancoridis, and M. Maycock, "Applying spectral methods to software clustering," in Proceedings of the Ninth Working Conference on Reverse Engineering, 2002, pp. 3-10.
[64] V. Tzerpos and R. C. Holt, "ACDC: an algorithm for comprehension-driven clustering," in Proceedings of the Seventh Working Conference on Reverse Engineering, IEEE, 2000, pp. 258-267.
[65] J. Gottlieb, B. A. Julstrom, G. R. Raidl, and F. Rothlauf, "Prüfer numbers: a poor representation of spanning trees for evolutionary search," in Proceedings of the 3rd Annual Conference on Genetic and Evolutionary Computation, Morgan Kaufmann Publishers Inc., 2001, pp. 343-350.
[66] I. Oliver, D. Smith, and J. R. Holland, "Study of permutation crossover operators on the traveling salesman problem," in Proceedings of the Second International Conference on Genetic Algorithms and their Applications, L. Erlbaum Associates, 1987.
[67] J. Wu, A. E. Hassan, and R. C. Holt, "Comparison of clustering algorithms in the context of software evolution," in Proceedings of the 21st IEEE International Conference on Software Maintenance (ICSM'05), IEEE, 2005, pp. 525-535.
[68] Z. Wen and V. Tzerpos, "An effectiveness measure for software clustering algorithms," in Proceedings of the 12th IEEE International Workshop on Program Comprehension, IEEE, 2004, pp. 194-203.
[69] Z.-P. Tian, H.-Y. Zhang, J. Wang, J.-Q. Wang, and X.-H. Chen, "Multi-criteria decision-making method based on a cross-entropy with interval neutrosophic sets," vol. 47, no. 15, pp. 3598-3608, 2016, doi: 10.1080/00207721.2015.1102359.
[70] J. Cohen, "Things I have learned (so far)," in Proceedings of the 98th Annual Convention of the American Psychological Association, American Psychological Association, 1990.
[71] F. Wilcoxon, S. Katti, and R. A. Wilcox, "Critical values and probability levels for the Wilcoxon rank sum test and the Wilcoxon signed rank test," vol. 1, pp. 171-259, 1970.