Genetic Algorithm-Based Test Data Generation for Multiple Paths via Individual Sharing

Applying genetic algorithms to automatic test data generation has attracted broad attention and achieved notable results in recent years. However, the efficiency of genetic algorithm-based test data generation for path testing still needs to be improved. In this paper, we establish a mathematical model of generating test data for multiple-path coverage. A multipopulation genetic algorithm with individual sharing is then presented to solve the established model. We not only analyze the performance of the proposed method theoretically but also apply it to various programs under test. The experimental results show that the proposed method significantly improves the efficiency of generating test data for the coverage of many paths.


Introduction
One of the approaches to improve the quality of software is to do a large number of tests before delivery and usage in order to detect bugs or faults in software. Software testing is an expensive, tedious, and labor-intensive task and requires significant human effort [1]. If the process of testing can be automated, it will undoubtedly shorten the period of software development and improve the quality of software, so as to enhance the market competitiveness. One of the most important issues in automated software testing is the generation of effective test data satisfying the selected test adequacy criteria.
It has been proved that many software test problems can come down to those of generating test data for paths coverage [2,3], which can be described as follows: for a given path of a program under test, search for a test datum in the input domain of the program, such that the traversed path of the test datum is just the desired one.
In recent years, generating test data for complex software using the genetic algorithm (GA for short) has become a promising direction and has produced many research results [4]. However, most GA-based test data generation methods for path coverage aim to cover target paths one by one, which makes the process of test data generation inefficient.
In this study, we established a mathematical model of generating test data for multiple paths coverage, which takes each optimization problem corresponding to one target path as a subproblem, and a number of subproblems form an overall optimization problem. This model is different from those existing multiobjective problems due to the specificity of generating test data.
On this basis, we propose a multipopulation genetic algorithm to solve the proposed optimization problem. In our algorithm, each subpopulation optimizes one subproblem, so the fitness functions of different subpopulations differ from each other. All subpopulations evolve in parallel. The key step of our algorithm is individual sharing among subpopulations: each time the evolutionary operations of a generation finish, the algorithm determines not only whether an individual is an optimal solution for the subpopulation it belongs to but also whether it is an optimal solution for any other subpopulation. In this way, the efficiency of finding optimal solutions for each subproblem improves without an obvious increase in the complexity of the algorithm.
We not only analyzed the performance of the proposed method theoretically but also applied it to different programs under test for evaluation. The experimental results show that the proposed method significantly improves the efficiency of generating test data for multiple-path coverage.

GA-Based Test Data Generation.
As an efficient search-based optimization algorithm, the GA shows particular advantages in solving highly complex problems, such as those with large search spaces, multiple peaks, and nonlinearity. Automatically generating test data with GAs has therefore become a research hotspot and has produced encouraging results [13].
Gong and Yao [14] used a GA to generate test data for statement coverage based on testability transformation. Yao et al. [15] proposed an approach to reduce target statements according to their dominant relations and the test suite covering the reduced set of target statements was generated by a GA.
Miller et al. [16] used GAs to generate test data satisfying branch coverage criterion. The experimental results show that the test suite obtained by GAs can achieve or be very close to branch coverage. Baars et al. [17] presented an algorithm for constructing fitness functions that improve the efficiency of search-based testing when trying to generate branch adequate test data. Alshraideh et al. [18] proposed a multiple-population algorithm to improve the efficiency of branch coverage testing. The experimental results showed that the proposed method outperforms the single-population algorithm significantly.
Michael et al. [19] used a GA to generate test data satisfying condition coverage criterion. In their work, the problem of test data generation is reduced to a function minimization, and the function is minimized using one of two genetic algorithms in place of the local minimization techniques.
As for the works of GA-based software testing for path coverage criterion, we will introduce them individually in Section 2.3.
Besides traditional structural software testing, Bühler and Wegener [20] applied an evolutionary algorithm to functional testing. Watkins and Hufnagel [21] used two GAs to generate a couple of test data pieces and then trained a decision tree using them, in order to obtain an agent model which distinguishes the merit of test data. Ferrer et al. [22] presented a method of automatically generating test data by considering multiple objectives: maximizing the coverage and minimizing the oracle cost.

GA-Based Path Testing.
Path coverage testing is the strongest sufficiency criterion in white box testing. Automatically generating data for paths coverage remains a challenging problem [23].
Bueno and Jino [24] and Watkins and Hufnagel [25] each used a GA to obtain test data fulfilling path coverage. Mei and Wang [26] proposed a method that can automatically generate test cases for selected paths using a special genetic algorithm. In their algorithm, the best chromosome, called the queen, crosses with the selected drones, which enhances the exploitation of global optimal solutions. Hermadi and Ahmed [27] observed that existing GA-based test data generators can generate only one test datum for one test goal at a time. When there are many target paths to be covered, the generator has to be run many times. In fact, individuals generated while searching for test data covering one path may happen to be test data covering other target paths. Existing test data generators are therefore inefficient at generating test data for multiple paths.
Wegener et al. [28] developed a fully automatic GA-based test data generator for structural software testing. In their approach, all generated individuals are evaluated with regard to all unachieved partial aims. Partial aims reached by chance are identified, and the individuals with good fitness values for one or more partial aims are noted and stored for seeding the subsequent testing of uncovered targets. But they only considered one partial aim for optimization at a time, which means that they solved the test data generation problems one by one. Furthermore, they did not discuss whether multiple targets can be covered in one run. They also reported that full coverage was achieved for some programs but not for all.
Bueno and Jino [24] investigated methods to improve the performance of test data generation by using past input data to compose the initial population for the search. Although these methods can improve the quality of the initial population by reusing test data, they still cannot make full use of the test data generated in the evolutionary process.
Ahmed and Hermadi [29] proposed a GA-based test data generator for multiple paths. In their work, the problem of generating test data for multiple paths is regarded as a multiobjective optimization problem and solved by a multiobjective evolutionary algorithm. In fact, the problem of generating test data for multiple paths is strictly different from traditional multiobjective optimization problems. Therefore, it is necessary to establish an appropriate mathematical model for the problem of generating test data for multiple paths coverage according to its specificity and give a corresponding evolutionary solution.
Gong and Zhang [30] also proposed a test data generation method for multipath coverage. They represented a target path using the Huffman encoding method and designed the fitness function according to the Huffman codes of target paths. Their method is simple and performs better than Ahmed's method, but its fitness function cannot distinguish individuals well.
In order to stop searching as soon as all feasible paths have been covered, Hermadi et al. [31] proposed a method for determining when it is no longer worthwhile to continue searching for test data to cover uncovered target paths. Compared to searching for a standard number of generations, an average of 30-75% of total computation was avoided in test programs with infeasible paths, and no feasible paths were missed due to early termination. The extra computation in programs with no infeasible paths was negligible.

Mathematical Model of Test Data Generation for Multiple Paths
For ease of exposition, we first introduce several concepts. Then, an objective function is constructed in order to transform the problem of generating test data into an optimization problem. On this basis, the optimization model of generating test data for multiple-path coverage is established.

Basic Concepts
Control Flow Graph (CFG) [1]. The CFG of a program Φ is a directed graph $G = (N, E, s, e)$, where $N$ is the set of nodes, $E$ is the set of edges, and $s$ and $e$ are the unique entry and exit nodes of the graph, respectively. Each node is a statement in the program; each edge $(n_i, n_j)$ represents a transfer of control from node $n_i$ to node $n_j$.
For large-scale programs, the sequence of a path may be very long. We represent a path using a (0, 1)-string for simplicity. Suppose that there are $m$ conditional statements in path $p$, denoted as $C_1, C_2, \dots, C_m$. Define
$$b_i = \begin{cases} 1, & p \text{ includes the true branch of } C_i, \\ 0, & \text{otherwise.} \end{cases} \tag{1}$$
Thus we obtain a (0, 1)-string $b_1 b_2 \cdots b_m$ of length $m$. In program Φ, the mapping between a path and such a (0, 1)-string is one to one. Unless otherwise stated, a path is represented by such a (0, 1)-string in this study.
Let the input vector of program Φ be $x = (x_1, x_2, \dots, x_d)$, and let the domain of $x_i$ be $D_i$; then the input domain of Φ is $D(\Phi) = D_1 \times D_2 \times \cdots \times D_d$. When program Φ adopts $x$ as an input, the traversed path is denoted by $p(x)$. We call the first position at which $p$ and $p(x)$ differ their bifurcation.
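The path encoding and the bifurcation can be sketched in a few lines of Python. This is only an illustration: the function names `encode_path` and `bifurcation` are hypothetical, positions are 0-based, and the handling of strings of unequal length is an assumption that the text does not address.

```python
from typing import List, Optional


def encode_path(branch_outcomes: List[bool]) -> str:
    """Encode the outcomes of the conditional statements on a path as a (0,1)-string.

    True means the path takes the true branch of that conditional statement.
    """
    return "".join("1" if taken_true else "0" for taken_true in branch_outcomes)


def bifurcation(target: str, traversed: str) -> Optional[int]:
    """Return the 0-based index of the first differing character, or None if equal."""
    for i, (a, b) in enumerate(zip(target, traversed)):
        if a != b:
            return i
    if len(target) != len(traversed):
        # Assumption: when one string is a prefix of the other, the paths
        # part ways right after the shorter string ends.
        return min(len(target), len(traversed))
    return None
```

For instance, a target path that takes the true, false, and true branches is encoded as the string 101, and an input whose traversed path is 111 bifurcates from it at index 1.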

Structure of Objective Function.
The key problem of applying GAs to test data generation is the construction of a suitable objective function. The goodness of a candidate test datum is often expressed in terms of how close the test datum comes to fulfilling the test goal. The approach to forming an objective function typically involves two parts: approach level (AL) and branch distance (BD) [3,24,25].
The approach level assesses how close an execution comes to reaching the predicate which controls the test object. If $p \neq p(x)$, we define the approach level of input $x$ to a target path $p$ as the number of characters from the bifurcation of $p$ and $p(x)$ to the last character of $p$, denoted by $AL_p(x)$; otherwise, we define $AL_p(x) = 0$. Input $x$ covers path $p$ if and only if $AL_p(x) = 0$.
The branch distance assesses how close the predicate comes to evaluating to the desired (true or false) branch. For example, suppose that a conditional statement is "if $a \geq 12$," and the aim is to execute the true branch. Suppose that the value of $a$ is $a(x)$ when this statement is executed with input $x$; then the branch distance of $x$ for the branch condition $a \geq 12$ is zero when $a(x) \geq 12$ and otherwise grows with the shortfall $12 - a(x)$. Branch distances of different kinds of simple branch conditions are listed in Table 1. For a complex branch condition, the branch distance is a composite of those of all simple conditions included in it, as listed in Table 2.
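A minimal sketch of branch distances follows. The entries of Tables 1 and 2 are not reproduced in this text, so the forms below are the ones commonly used in the search-based testing literature and are an assumption here; some formulations add a small positive constant to the unsatisfied case. All function names are hypothetical.

```python
def bd_ge(a: float, b: float) -> float:
    """Distance to making `a >= b` true: 0 when satisfied, else the shortfall."""
    return 0.0 if a >= b else float(b - a)


def bd_eq(a: float, b: float) -> float:
    """Distance to making `a == b` true: the absolute difference."""
    return float(abs(a - b))


def bd_and(d1: float, d2: float) -> float:
    """Conjunction (assumed form): both parts must hold, so the distances add."""
    return d1 + d2


def bd_or(d1: float, d2: float) -> float:
    """Disjunction (assumed form): either part suffices, so take the smaller."""
    return min(d1, d2)
```

With the condition of the example above, an input that leaves the tested variable at 7 is at distance 5 from the true branch, while one that leaves it at 15 is at distance 0.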
We define the general objective function $f_p(x)$ of input $x$ to target path $p$ as follows:
$$f_p(x) = AL_p(x) + \mathrm{normalized}(BD_p(x)),$$
where $BD_p(x)$ refers to the branch distance of $x$ to the conditional statement corresponding to the bifurcation of $p$ and $p(x)$, and the function $\mathrm{normalized}(z) = 1 - 1.01^{-z}$.
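The objective function can be sketched as below. Two points are assumptions rather than statements from the text: the approach level and the normalized branch distance are combined additively, as is common in search-based testing, and the approach level counts the bifurcation character itself. The names `approach_level` and `objective` are hypothetical, and the branch distance at the bifurcation is passed in by the caller.

```python
def normalized(d: float) -> float:
    """Map a non-negative branch distance into [0, 1): 1 - 1.01**(-d)."""
    return 1.0 - 1.01 ** (-d)


def approach_level(target: str, traversed: str) -> int:
    """Number of characters of `target` from the bifurcation to its last character.

    Returns 0 exactly when the traversed path equals the target path.
    """
    if traversed == target:
        return 0
    for i, ch in enumerate(target):
        if i >= len(traversed) or traversed[i] != ch:
            return len(target) - i  # inclusive count (assumption)
    return 1  # target is a proper prefix of traversed; treated as one level off


def objective(target: str, traversed: str, branch_dist: float) -> float:
    """Assumed additive form: approach level plus normalized branch distance."""
    al = approach_level(target, traversed)
    if al == 0:
        return 0.0  # the input already covers the target path
    return al + normalized(branch_dist)
```

Under these assumptions the objective value is 0 exactly when the input covers the target path, and among inputs with the same bifurcation the one with the smaller branch distance scores lower.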

Computational Intelligence and Neuroscience

Table 1: Branch distances of simple branch conditions [3].

Table 2: Branch distances of complex branch conditions [3].
A necessary and sufficient condition for $f_p(x) = 0$ is that the traversed path of $x$ is $p$; that is, $p(x) = p$. Furthermore, the smaller the value of $f_p(x)$, the nearer $x$ is to the data covering $p$. So the problem of generating test data for path $p$ can be transformed into that of minimizing $f_p(x)$.
For example, see the program in Figure 1. Suppose that the target path is $p = n_1 n_2 n_3 n_4 n_5$. There are three conditional statements in $p$, namely, statements 1, 2, and 4.

Mathematical Model of Generating Test Data for Multiple Paths Coverage.
Let the set of target paths be $\Gamma = \{p_1, p_2, \dots, p_n\}$; then the problem of generating test data for $\Gamma$ can be described as follows: find a test suite $\{x_1, x_2, \dots, x_n\}$ such that $p(x_i) = p_i$, $i = 1, 2, \dots, n$. Let the objective function for path $p_i$ constructed by the method of Section 3.2 be $f_i(x)$; then the problem of generating test data for $\{p_1, p_2, \dots, p_n\}$ can be transformed into the following optimization problems:
$$\min f_i(x) \quad \text{s.t. } x \in D(\Phi), \qquad i = 1, 2, \dots, n. \tag{6}$$
Most existing GA-based test data generation methods treat these as independent optimization problems and solve them one by one. Specifically, for each optimization problem $\min f_i(x)$, a GA is run to find an optimal solution of $f_i(x)$, which is a test datum traversing target path $p_i$. This process is repeated until all optimization problems have been solved. If the number of target paths is $n$, the GA has to be run $n$ times.
This approach, however, does not take advantage of the fact that some of the required test data may be readily available as by-products of the search for other test data, because different target paths have similarities. Therefore the efficiency of these methods is low when $n$ is large.
Ahmed et al. [29] gave an algorithm for generating test data for multiple-path coverage, but they regarded this problem as a multiobjective optimization problem. Thus, their model should be
$$\min \big(f_1(x), f_2(x), \dots, f_n(x)\big) \quad \text{s.t. } x \in D(\Phi). \tag{7}$$
In fact, the problem of generating test data for multiple paths is strictly different from traditional multiobjective optimization problems. In traditional multiobjective optimization problems, the aim is to find one solution which satisfies all objectives well; the objectives often conflict, so some trade-off among them is required. But for the problem of generating test data for paths $p_1, p_2, \dots, p_n$, what we need is a test suite $\{x_1, x_2, \dots, x_n\}$ in which $x_i$ is an optimal solution of $f_i(x)$, $i = 1, \dots, n$.
In addition, the number of objective functions in traditional multiobjective optimization problems remains unchanged, while that in the proposed model gradually decreases as target paths are covered. Therefore, treating the problem of generating test data as a multiobjective optimization problem has considerable limitations.
Different from existing methods, we consider the problem of generating test data for multiple-path coverage as a unified problem, in which each optimization problem corresponding to one target path is a subproblem. We solve all subproblems at the same time. Thus the problem corresponding to test data generation for multiple-path coverage can be described as follows:
$$\begin{cases} \min f_1(x) \\ \min f_2(x) \\ \quad \vdots \\ \min f_n(x) \end{cases} \quad \text{s.t. } x \in D(\Phi), \tag{8}$$
where all subproblems are defined on the same domain. We will seek an algorithm to solve these problems simultaneously rather than independently. So problem (8) strictly differs from (6) and (7), and a suitable method is needed to solve it.

Multipopulation GA for Test Data Generation of Multiple Paths
In this section we will give a multipopulation GA to solve problem (8), which is different from traditional multipopulation GAs. The main purpose of our strategy is to expand the search range of each population by individual sharing, so as to improve the efficiency of the algorithm.

Initialization of Populations.
For the $i$th optimization problem $\min f_i(x)$, randomly generate a subpopulation of size $M$, that is, $\{x_{i,1}(1), x_{i,2}(1), \dots, x_{i,M}(1)\}$, where $x_{i,j}(1)$ refers to the $j$th individual in the $i$th population of the first generation. An individual corresponds to a string by proper encoding. Population size may influence the performance of the algorithm, but this is not a focus of this study, so we simply choose an appropriate value for it.

Genetic Operations.
As a typical GA, our method mainly includes three kinds of operations, that is, selection, crossover, and mutation.
Individuals are selected according to their fitness, so that good genes have more chances to be copied to the next generation. We adopt the objective function value $f_i(x_{i,j}(1))$ as the fitness of individual $x_{i,j}(1)$. Because we are solving minimization problems, the smaller the fitness of an individual, the better it is.
The crossover operation exchanges parts of two gene strings with a certain probability to produce two new chromosomes, while the mutation operation modifies some of the genes in a string, resulting in a new chromosome. The crossover and mutation rates are denoted by $p_c$ and $p_m$, respectively. Because parameter setting is not the focus of this work, we simply set the parameters based on experience.
Each subpopulation performs these operations independently. In this way, individuals of the $t$th generation evolve into the $(t + 1)$th generation, which can be shown as
$$\{x_{i,1}(t), \dots, x_{i,M}(t)\} \to \{x_{i,1}(t+1), \dots, x_{i,M}(t+1)\}.$$
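The three genetic operations can be sketched for bit-string individuals as follows. The text does not say which selection, crossover, or mutation variants are used, so binary tournament selection, one-point crossover, and bit-flip mutation are assumptions chosen for illustration; `pc` and `pm` stand for the crossover and mutation rates, and all function names are hypothetical.

```python
import random
from typing import Callable, List, Tuple


def tournament_select(pop: List[str], fitness: Callable[[str], float],
                      rng: random.Random) -> str:
    """Binary tournament: the individual with the smaller objective value wins."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) <= fitness(b) else b


def one_point_crossover(p1: str, p2: str, pc: float,
                        rng: random.Random) -> Tuple[str, str]:
    """With probability pc, swap the tails of two equal-length bit strings."""
    if len(p1) > 1 and rng.random() < pc:
        cut = rng.randrange(1, len(p1))
        return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
    return p1, p2


def mutate(ind: str, pm: float, rng: random.Random) -> str:
    """Flip each bit independently with probability pm."""
    return "".join(("1" if c == "0" else "0") if rng.random() < pm else c
                   for c in ind)


def next_generation(pop: List[str], fitness: Callable[[str], float],
                    pc: float, pm: float, rng: random.Random) -> List[str]:
    """Evolve one subpopulation by one generation, keeping its size fixed."""
    children: List[str] = []
    while len(children) < len(pop):
        c1, c2 = one_point_crossover(tournament_select(pop, fitness, rng),
                                     tournament_select(pop, fitness, rng),
                                     pc, rng)
        children.extend([mutate(c1, pm, rng), mutate(c2, pm, rng)])
    return children[:len(pop)]
```

Each subpopulation would call `next_generation` with its own fitness function, since each one minimizes the objective of a different target path.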

Steps of the Algorithm.
Based on the above discussion, the main steps of the proposed algorithm are shown as follows.
Step 1. Set the values of the number of subpopulations $n$, the maximum number of generations $g_{\max}$, the crossover probability $p_c$, and the mutation probability $p_m$, where $n$ is equal to the number of target paths.
Step 6. If the number of subpopulations becomes 0, or the number of generations is larger than $g_{\max}$, then stop the evolution and output the test data; otherwise, go to Step 7.
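The overall loop with individual sharing might look like the sketch below. Only Steps 1 and 6 appear in the text, so the loop structure is inferred from the description of the algorithm and is an assumption. The program under test (`traversed_path`), the toy fitness (a count of differing branch outcomes instead of approach level plus branch distance), the operators, and all parameter values are hypothetical.

```python
import random


def traversed_path(x: int, y: int) -> str:
    """Hypothetical program under test: the (0,1)-string of its two branches."""
    first = "1" if x >= 50 else "0"
    second = "1" if y < x else "0"
    return first + second


def fitness(ind, target: str) -> int:
    """Toy stand-in objective: number of branch outcomes differing from target."""
    return sum(a != b for a, b in zip(traversed_path(*ind), target))


def evolve(pop, target, rng, pc=0.9, pm=0.3):
    """One generation: binary tournament, value-swap crossover, reset mutation."""
    def pick():
        a, b = rng.sample(pop, 2)
        return a if fitness(a, target) <= fitness(b, target) else b

    children = []
    while len(children) < len(pop):
        p1, p2 = pick(), pick()
        if rng.random() < pc:
            p1, p2 = (p1[0], p2[1]), (p2[0], p1[1])
        for child in (p1, p2):
            if rng.random() < pm:
                if rng.random() < 0.5:
                    child = (rng.randint(0, 99), child[1])
                else:
                    child = (child[0], rng.randint(0, 99))
            children.append(child)
    return children[:len(pop)]


def generate(targets, pop_size=10, max_gen=200, seed=0):
    """Return a dict mapping each covered target path to a test datum."""
    rng = random.Random(seed)
    # One subpopulation per target path.
    pops = {t: [(rng.randint(0, 99), rng.randint(0, 99)) for _ in range(pop_size)]
            for t in targets}
    found = {}
    for _ in range(max_gen):
        # Individual sharing: every individual of every live subpopulation is
        # checked against every target path that is still uncovered.
        for ind in [i for pop in pops.values() for i in pop]:
            path = traversed_path(*ind)
            if path in pops and path not in found:
                found[path] = ind
        for t in found:
            pops.pop(t, None)  # a covered target's subpopulation is removed
        if not pops:  # no subpopulation left: all targets covered
            break
        pops = {t: evolve(pop, t, rng) for t, pop in pops.items()}
    return found
```

Calling `generate(["00", "01", "10", "11"])` returns one input per target path. Without the sharing loop, an individual that happens to traverse another subpopulation's target path would simply be discarded.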

Performance Analysis
We will illustrate the performance of the proposed algorithm by analyzing its efficiency and computational complexity.

Efficiency of Algorithm.
Suppose that the set of target paths is $\{p_1, p_2, \dots, p_n\}$ ($n > 1$) and $\aleph(p_i)$ is the subpopulation used to optimize the $i$th subproblem, which corresponds to generating test data for $p_i$. Let $\xi_i$ be the number of generations in which the $i$th subpopulation finds the test datum covering path $p_i$; thus $\xi_i$ is a random variable. From experience, we suppose that $\xi_i \sim N(\mu_i, \sigma_i^2)$. Let $\xi_{ij}$, $j \neq i$, be the number of generations in which the $j$th subpopulation finds the test datum covering $p_i$; then $\xi_{ij}$ is also a random variable. Suppose that the probability of $\aleph(p_j)$ finding the test datum that covers $p_i$ in a generation is $q_{ij}$; then $P\{\xi_{ij} = k\} = q_{ij}(1 - q_{ij})^{k-1}$, $k = 1, 2, \dots$. For convenience, we also denote $\xi_i$ by $\xi_{ii}$.
If we use traditional single-objective GAs to solve (3) with the same population size, the probability of $\aleph(p_i)$ finding an optimal solution within $t$ generations is $P\{\xi_i \le t\} = \Phi((t - \mu_i)/\sigma_i)$, where $\Phi(\cdot)$ is the distribution function of the standard normal distribution. Thus the probability of all subpopulations finding their optimal solutions within $t$ generations is
$$P_1(t) = \prod_{i=1}^{n} \Phi\!\left(\frac{t - \mu_i}{\sigma_i}\right).$$
If we adopt the proposed method to solve (6), then the probability of finding the test datum covering path $p_i$ within $t$ generations is
$$P\Big\{\min_{1 \le j \le n} \xi_{ij} \le t\Big\} = 1 - \left(1 - \Phi\!\left(\frac{t - \mu_i}{\sigma_i}\right)\right) \prod_{j \ne i} (1 - q_{ij})^{t}.$$
Thus the probability of finding all optimal solutions within $t$ generations is
$$P_2(t) = \prod_{i=1}^{n} \left[1 - \left(1 - \Phi\!\left(\frac{t - \mu_i}{\sigma_i}\right)\right) \prod_{j \ne i} (1 - q_{ij})^{t}\right].$$
Since each factor $(1 - q_{ij})^{t} < 1$ (when all $q_{ij}$ equal a common $q$, the product is $(1 - q)^{t(n-1)} < 1$), we obtain
$$P_2(t) > P_1(t).$$
That is to say, the probability of finding all optimal solutions using the proposed algorithm is larger than that of traditional single-objective GAs. In addition, the larger the number of target paths, the more obvious the advantage of the proposed method, as the following example shows.
Suppose that the set of target paths is $\{p_1, \dots, p_5\}$ and $\xi_i \sim N(500, 100^2)$, $q_{ij} = 1/10000$, $i, j = 1, \dots, 5$; then the probabilities of finding all optimal solutions within 500 and 600 generations using traditional single-objective GAs are 0.0313 and 0.3580, respectively, whereas those using the proposed algorithm are 0.0719 and 0.4540, respectively. If the number of target paths increases to 10, with $\xi_i \sim N(500, 100^2)$, $q_{ij} = 1/10000$, $i, j = 1, \dots, 10$, then the probabilities of finding all optimal solutions within 500 and 600 generations using traditional single-objective GAs are 0.000977 and 0.1282, respectively, whereas those using the proposed algorithm are 0.0215 and 0.3182, respectively. With 5 target paths, the probabilities for the proposed algorithm are thus 0.0719/0.0313 ≈ 2.3 and 0.4540/0.3580 ≈ 1.3 times those of traditional single-objective GAs; with 10 target paths, they are 0.0215/0.000977 ≈ 22 and 0.3182/0.1282 ≈ 2.5 times those of traditional single-objective GAs. These results clearly illustrate that the proposed algorithm is more efficient than traditional single-objective GAs; moreover, as the number of target paths increases, the advantage becomes more obvious.
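The probabilities in this example can be recomputed with a short script. It assumes, as the analysis does, that a subpopulation's own hitting time is normally distributed and that each of the other subpopulations independently hits a given path with a common per-generation probability `q`; the function names are hypothetical. The 500-generation figures quoted above are reproduced by this computation; the 600-generation figures depend on the value taken for Φ(1), so the script is best used to print its own values for comparison.

```python
from statistics import NormalDist


def p_traditional(n: int, t: float, mu: float, sigma: float) -> float:
    """Probability that n independent GAs all succeed within t generations."""
    return NormalDist(mu, sigma).cdf(t) ** n


def p_sharing(n: int, t: float, mu: float, sigma: float, q: float) -> float:
    """Probability that all n paths are covered within t generations with sharing.

    A path stays uncovered only if its own subpopulation fails (1 - Phi) and
    each of the other n - 1 subpopulations misses it in all t generations.
    """
    phi = NormalDist(mu, sigma).cdf(t)
    missed_by_others = (1.0 - q) ** (t * (n - 1))
    return (1.0 - (1.0 - phi) * missed_by_others) ** n


if __name__ == "__main__":
    for n in (5, 10):
        for t in (500, 600):
            print(n, t,
                  round(p_traditional(n, t, 500, 100), 6),
                  round(p_sharing(n, t, 500, 100, 1e-4), 6))
```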

Computational Complexity.
We compare the computational complexity of our multipopulation genetic algorithm with that of traditional ones. Suppose that the program under test has $l$ executable statements and there are $n$ target paths $\{p_1, p_2, \dots, p_n\}$. The population size is $M$. Because $M$ can be set manually, we consider $M$ a constant. We take the number of executed statements for the calculation of individual fitness and individual sharing in a generation as the measure of the computational complexity of an algorithm. If we use traditional multipopulation GAs to solve the problem, which means that there is no individual sharing among subpopulations, then the program under test will be run $nM$ times, which is equal to the number of all individuals. Since each run of the program executes roughly $l$ statements, the number of executed statements for running the program under test will be $nMl$. Taking the computation of the fitness as one statement, all these individuals need to execute $nM$ statements. So the number of executed statements in a generation using traditional multipopulation GAs is $T_1(l, n) = nMl + nM$.
If we use the proposed method to solve the problem, which means that subpopulations share all individuals, then in addition to running the program under test and computing the fitness function, we must consider the computation due to individual sharing among subpopulations. Taking the computation of the approach level as one statement, individual sharing needs to execute $n^2 M$ statements. So the number of executed statements in a generation using the proposed method is $T_2(l, n) = nMl + nM + n^2 M$. Under normal circumstances, $n$ is much smaller than $l$, so $n^2 M < nMl$ and therefore $T_2(l, n) < 2\,T_1(l, n)$.
On the other hand, each subpopulation has $M$ individuals as possible solutions in each generation in traditional multipopulation GAs. In our method, the number of possible solutions becomes $nM$ in each generation via individual sharing, which is $n$ times that of traditional methods.
In other words, individual sharing magnifies the effective population size $n$ times while less than doubling the amount of computation.
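The statement counts can be checked numerically with the small sketch below. The parameter names `l` (executable statements), `n` (target paths), and `m` (population size) and the sample values are assumptions made for illustration, as is the reading that sharing checks each of the `n * m` individuals against each of the `n` target paths.

```python
def statements_without_sharing(l: int, n: int, m: int) -> int:
    """Per generation: n*m program runs of about l statements, plus n*m fitness evaluations."""
    return n * m * l + n * m


def statements_with_sharing(l: int, n: int, m: int) -> int:
    """Adds the sharing checks: each of n*m individuals against n target paths."""
    return statements_without_sharing(l, n, m) + n * n * m


if __name__ == "__main__":
    l, n, m = 200, 10, 50  # hypothetical values
    base = statements_without_sharing(l, n, m)
    shared = statements_with_sharing(l, n, m)
    print(base, shared, shared / base)
```

With these sample values the overhead of sharing is about 5 percent, while each subproblem sees ten times as many candidate solutions per generation.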

Experiments
A group of experiments are conducted so as to investigate the performance of the proposed method. In the following section, subject programs are first introduced. Afterwards, experimental design is characterized. Finally, empirical results are presented and discussed.

Subject Programs.
In order to evaluate the proposed method, we select eighteen programs for experiments. Table 3 shows some basic information about each program, including its name, size, and description. Table 3 is sorted by the sizes of the programs. These test subjects include not only laboratory programs but also nontrivial industrial ones. In addition, their lengths and functions differ from each other. These programs have been widely used by other researchers in the literature on software testing and analysis [19, 32-34]. The number of target paths for each program is also listed in Table 3.
For each program under test, we just randomly choose a part of feasible paths to cover. If there are too many paths to be covered, we can divide them into several groups, so that the scale of paths is reasonable. In addition, if we choose infeasible paths as target ones, the performances of different methods will not be distinguished, because it is impossible for any method to generate test data covering infeasible paths. The prediction of the infeasibility of a program path is an undecidable problem, and heuristic techniques that automatically select likely feasible paths can be employed [32].

Experimental Design.
When designing the experiments, we are especially concerned with the following two issues.

Proposition 1. Can individual sharing improve the efficiency of the algorithm?
In order to verify the first proposition, we conduct two groups of experiments. In the first group of experiments, we use the proposed multipopulation GA with individual sharing to generate test data, while in the second one, different populations do not implement individual sharing but evolve independently.

Proposition 2. How is the overall performance of the proposed method?
In order to validate the overall performance of the method proposed in this study (our method, for short), we compare it with three other methods, namely, Gong's method [30], Ahmed's method [29], and the random method. We choose Gong's and Ahmed's methods for comparison because they also address the problem of generating test data for multiple paths. In addition, the random method is a basic test technique and has been widely used, so we also adopt it as a baseline. All methods (except the random one) use the same parameter values, which are listed in Table 4. There are two termination criteria: one is that the test data for all target paths have been found; the other is that the number of generations has reached the maximum.
The experimental results for individual sharing are listed in Table 5, in which Ave. and S.D. are the sample average and standard deviation of time consumption for each program and method, respectively. Sh.R. means the ratio of the number of test data obtained by individual sharing to the number of all test data.
It can be seen from Table 5 that (1) for all subject programs, the average time consumption with individual sharing is less than that without it. The least time consumption with individual sharing is 6.38 seconds (Bubble Sort), while that without individual sharing for the same program is 11.03 seconds. The greatest time consumption with individual sharing is 183.53 seconds (barcode), while that without it is 265.72 seconds.
(2) The sharing rates of all programs exceed 30% except schedule (29.8%). The average sharing rate of the eight programs is 36.87%, which means that approximately one in every three test data is obtained by individual sharing. In this way, individuals generated in the evolutionary process are used more fully, which improves the efficiency of generating test data.
We use hypothesis testing to analyze the above experimental results more rigorously. Let $X_1$ and $X_2$ denote the time consumption using and not using individual sharing, respectively (where no confusion arises, we use the same symbols for all programs under test). It can be verified that $X_1$ and $X_2$ are normally distributed random variables. Suppose that $X_i \sim N(\mu_i, \sigma_i^2)$, $i = 1, 2$. We take the sample standard deviations as estimates of the population standard deviations. Let the significance level be $\alpha = 0.01$. We compare the performance of the two methods by comparing $\mu_1$ ($= E(X_1)$) and $\mu_2$ ($= E(X_2)$).
Step 4. Calculating the value of the statistic. The values of the statistic $U$ for the different programs are listed in Table 6; $u_\alpha = 2.325$.
Step 5. Drawing conclusions. From Table 6 we conclude that the values of $U$ are all less than $-u_\alpha = -2.325$. We therefore reject the null hypothesis $H_0$ for all subject programs, which means that the time consumption using individual sharing is significantly less than that not using it.
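The test in Steps 4 and 5 can be sketched as a one-sided two-sample test with known variances. The exact statistic used in the study is not shown in this text, so the standard form below is an assumption, as are the function name and the sample sizes and standard deviations in the example; only the two means (6.38 and 11.03 seconds for Bubble Sort) come from the text.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_statistic(mean1: float, sd1: float, n1: int,
                         mean2: float, sd2: float, n2: int) -> float:
    """Standard statistic for H0: mu1 >= mu2 against H1: mu1 < mu2."""
    return (mean1 - mean2) / sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


if __name__ == "__main__":
    alpha = 0.01
    critical = NormalDist().inv_cdf(1 - alpha)  # upper alpha quantile of N(0, 1)
    # Means from the text; standard deviations and sample sizes are hypothetical.
    u = two_sample_statistic(6.38, 1.5, 30, 11.03, 2.0, 30)
    print(u, -critical, "reject H0" if u < -critical else "do not reject H0")
```

The null hypothesis is rejected when the statistic falls below the negative critical value, which matches the decision rule stated in Step 5.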

Experimental Results for Testing the Proposed Method.
The experimental results of comparing different methods are listed in Table 7. The meanings of all symbols are the same as in Table 5. We also use hypothesis testing to analyze these experimental results. The value of $U_1$ gives the hypothesis testing result of comparing our method with Gong's method, that of $U_2$ gives the result of comparing our method with Ahmed's method, and that of $U_3$ gives the result of comparing our method with the random method.
It can be seen from Table 7 that (1) for all subject programs, the average time consumption of our method is less than that of Gong's, Ahmed's, and the random methods. The least time consumption of our method is 5.85 seconds (Bubble Sort), while that of Gong's, Ahmed's, and the random methods for the same program is 8.79, 9.85, and 12.32 seconds, respectively. The largest time consumption of our method is 192.82 seconds (Barcode), while that of Gong's, Ahmed's, and the random methods is 224.87, 289.67, and 316.42 seconds, respectively. (2) Gong's and Ahmed's methods have better results than the random method but are both poorer than ours. (3) The values of $U_2$ and $U_3$ are all less than $-u_\alpha = -2.325$. The values of $U_1$ are all less than $-u_\alpha = -2.325$ except for three programs, namely, Comn, Splinge, and Printtok. For these three programs, the time consumption of our method is still less than that of Gong's method. We therefore conclude that the time consumption of our method is significantly less than that of Ahmed's and the random methods for all programs, and than that of Gong's method for all but these three programs.

Threats to Validity
The present study focuses on generating test data for multiple-path coverage. One possible threat to the validity of the proposed method is related to parameter settings. The settings of parameters in GAs influence the performance of generating test data. Appropriate choices of these values can improve the performance of an algorithm and therefore enhance its efficiency in generating test data. However, how to set proper parameters is not the emphasis of this study; thus we simply set the parameters based on our experience. The second threat to validity relates to the software systems used: possible bugs or errors, different program conversions, and test objectives may also influence the obtained results. Additionally, the selection of target paths may have influenced the obtained results.

Conclusion
We establish a mathematical model that reasonably reflects the problem of generating test data for multiple-path coverage. On this basis, a multipopulation GA is presented to solve the problem in the model. The main idea of this algorithm, which differs markedly from traditional multipopulation GAs, is to improve the search efficiency by means of individual sharing among different subpopulations. In addition, we not only analyze the efficiency of our method theoretically but also apply it to various programs under test. The experimental results show that our method has significant advantages over Ahmed's multiobjective method and the random method. The proposed algorithm enriches the theory and technique of GA-based test data generation and provides a new way to improve the efficiency of software testing.
Possible directions for future research are as follows: one is a method to generate test data when the number of target paths is very large; the other is the establishment of a test platform based on our method.