A Hybrid Intelligent Search Algorithm for Automatic Test Data Generation

The increasing complexity of large-scale real-world programs necessitates the automation of software testing. As a basic problem in software testing, the automation of path-wise test data generation is especially important; it is in essence a constraint optimization problem solved by search strategies. The constraint processing efficiency of the selected search algorithm is therefore a key factor. To increase search efficiency, a hybrid intelligent algorithm is proposed that searches the solution space of potential test data by making full use of both global and local search methods. Branch and bound is adopted for global search, which gives definite results at relatively low cost. In the search procedure for each variable, hill climbing is adopted for local search, and it is enhanced by selecting initial values heuristically based on a monotonicity analysis of the branching conditions. The two methods are tightly integrated by an efficient ordering method and the backtracking operation. To facilitate the search methods, the solution space is represented as a state space. Experimental results show that the proposed method outperformed some other methods used in test data generation. The heuristic initial value selection strategy improves the search efficiency greatly and makes the search basically backtrack-free. The results also demonstrate that the proposed method is applicable in engineering.


Introduction
With the surge of increasingly complex real-world software, software testing plays an increasingly important role in software development, as it is an important stage for guaranteeing software reliability [1], which is a significant software quality feature [2]. In 2002, the National Institute of Standards and Technology (NIST) estimated the cost of software failure to the US economy at $6 × 10^10, which was about 0.6% of GDP at the time [3]. The same report also found that over one-third of the cost of software failure could be eliminated by an improved testing infrastructure. But manual testing is time-consuming and error-prone, and it is even impracticable for large-scale real-world programs such as a Windows project with millions of lines of code (LOC) [4]. So the automation of testing is of crucial concern [5]. Furthermore, as a basic problem in software testing, path-wise test data generation is of particular importance because path-wise testing can detect almost 65 percent of the faults in the program under test (PUT) [6] and many problems in software testing can be transformed into it.
The methods of solving this problem can be categorized as dynamic and static. The dynamic methods require the actual execution of the PUT, and metaheuristic search (MHS) [7] methods such as simulated annealing (SA) [8] and the genetic algorithm (GA) [9] are very popular. They can generate test data with appropriate fault-prone ability [10,11], but their slow convergence makes the process of generating test data quite long. Recently, particle swarm optimization (PSO) [12][13][14] has become a hot research topic due to its convenient implementation and faster convergence.
The static methods utilize techniques including symbolic execution [15,16] and interval arithmetic [17,18] to analyze the PUT without executing it. The process of generating test data is definite and has relatively low cost. They abstract the constraints to be satisfied, then propagate and solve these constraints to obtain the test data. Due to their precision in generating test data and their ability to prove that some paths are infeasible, the static methods have been widely studied. DeMillo and Offutt [19] proposed a fault-based technique that used algebraic constraints to describe test data designed to find particular types of faults. Gotlieb et al. [20] introduced "static single assignment" into a constraint system and solved the system. Cadar et al. from Stanford University proposed a symbolic execution tool named KLEE [21] and employed a variety of constraint solving optimizations. They represented program states compactly and used search heuristics to reach high code coverage. In 2013, Yawen et al. [22] proposed an interval analysis algorithm using forward dataflow analysis. But no matter what techniques are adopted, the static methods require a strong constraint solver.
Aiming at constructing an efficient constraint processing engine, this paper proposes a new method for static test data generation based on the abstract memory model (AMM) [23] in the Code Test System (CTS) (http://ctstesting.cn/), which tests real-world programs written in the C programming language. The AMM underlying automatic test data generation maintains a table of memory states, and the constraints related to the structure of the data types can be represented by the table.
The main contributions of this paper are as follows. The problem of path-wise test data generation is defined as a constraint optimization problem (COP), which is often solved by search strategies. We introduce two algorithms from artificial intelligence to form a hybrid intelligent search method to solve it. Branch and bound is used as the global search method, which gives definite results at relatively low cost. Hill climbing is utilized as the local search method when searching for a fixed value for a specified variable. Specifically, the initial value selection in the process of hill climbing is based on a heuristic analysis of the monotonicity of branching conditions. To facilitate the search methods, the solution space is represented as a state space.
The rest of this paper is organized as follows. The background underlying our research is introduced in Section 2. The problem is reformulated as a COP and the solution is presented in Section 3. Section 4 illustrates the proposed hybrid search algorithm and an efficient variable ordering algorithm. Section 5 describes the local search algorithm, hill climbing, in detail. A case study is provided in Section 6 to explain thoroughly how the hybrid search algorithm works. Section 7 presents experimental analyses and empirical evaluations of the proposed algorithm, together with a coverage comparison with some existing test data generation methods. Section 8 concludes this paper and highlights directions for future research.

Background
State space search [24,25] is a process in which successive states of an instance are considered, with the goal of finding a final state with a desired property. Problems are normally modeled as a state space, that is, a set of states that a problem can be in. The set of states forms a graph in which two states are connected if there is an operation that can be performed to transform the first state into the second. State space search characterizes problem solving as the process of finding a solution path from an initial state to a final state. In state space search, the nodes of the search tree correspond to partial problem solutions and the arcs correspond to steps in the problem-solving process. State space search differs from traditional search methods because the state space is implicit; the typical state space is too large to generate and store in memory. Instead, nodes are generated as they are explored and typically discarded thereafter.
Branch and bound (BB) [26,27] is an efficient backtracking algorithm for searching the solution space of a problem and a common search technique for solving optimization problems. The advantage of the BB strategy lies in alternating branching and bounding operations on the set of active and extensive nodes of a search tree. Branching refers to partitioning the solution space (generating the child nodes); bounding refers to lowering bounds used to construct a proof of feasibility without exhaustive search (evaluating the cost of new child nodes).
Hill climbing (HC) [28,29] is a comparatively simple local search algorithm that works to improve a single candidate solution, starting from a randomly selected starting point. From the current position, the neighboring search space is investigated. If a better candidate solution is found, the search moves to that point, which replaces the current solution. The neighborhood of the new solution is then investigated. If a better solution is found, the current solution is replaced again, and the process continues until no improved neighbors can be found for the current solution. The search conducted by hill climbing depends heavily on the starting point. This progressive improvement is like climbing a hill in the "landscape" of an objective function. In this landscape, a peak signifies a solution with locally optimal objective values.
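The dependence on the starting point can be illustrated with a minimal Python sketch. The function name `hill_climb`, the integer neighborhood of plus or minus one, and the two-peak objective `two_peaks` are assumptions made for this illustration and are not taken from the paper.

```python
def hill_climb(objective, start, lo, hi):
    """Climb an integer landscape by repeatedly moving to the better neighbor."""
    current = start
    while True:
        # Neighborhood: the adjacent integers that stay inside [lo, hi].
        neighbors = [n for n in (current - 1, current + 1) if lo <= n <= hi]
        best = max(neighbors, key=objective, default=current)
        if objective(best) <= objective(current):
            return current  # no improving neighbor: a (possibly local) peak
        current = best


def two_peaks(x):
    """Hypothetical landscape: a low peak at 20 and a higher peak at 80."""
    return -abs(x - 20) if x < 50 else 5 - abs(x - 80)


low_start = hill_climb(two_peaks, start=10, lo=1, hi=100)
high_start = hill_climb(two_peaks, start=70, lo=1, hi=100)
```

Started at 10, the search stops on the lower peak at 20; started at 70, it reaches the higher peak at 80. This is the behavior that motivates choosing the starting point heuristically rather than at random.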
Interval arithmetic is an important static testing technique, which represents each value as a range of possibilities. An interval is a continuous range in the form of [min, max], while a domain is a set of intervals. A fixed value for a variable is represented as an interval with min equal to max, for example, [5,5]. Interval arithmetic has a set of arithmetic rules defined on intervals. It analyzes and calculates the ranges of variables starting from the entrance of the program and provides precise information for further program analysis efficiently and reliably. Given two intervals, each written in the form [min, max], some basic rules used for explanation in this paper are listed below.
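As a minimal sketch, the standard rules of interval arithmetic for addition, subtraction, multiplication, and intersection can be written in Python as follows. The class name `Interval` and its method names are assumptions for illustration, and the rules shown are the textbook ones, which may differ in selection or notation from the rules used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __post_init__(self):
        if self.lo > self.hi:
            raise ValueError("lower bound exceeds upper bound")

    def __add__(self, other):
        # [a, b] + [c, d] = [a + c, b + d]
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        # [a, b] - [c, d] = [a - d, b - c]
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        # The product range is spanned by the four endpoint products.
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    def intersect(self, other):
        # Returns None when the two intervals share no value.
        lo, hi = max(self.lo, other.lo), min(self.hi, other.hi)
        return Interval(lo, hi) if lo <= hi else None


domain = Interval(1, 100)
fixed = Interval(40, 40)   # a fixed value is an interval with equal bounds
difference = domain - fixed
```

An empty intersection (here `None`) is what signals a conflict when a value is checked against a branching condition.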

Reformulation of Path-Wise Test Data Generation
This section addresses the reformulation of path-wise test data generation. The problem definition and its solution are presented in Sections 3.1 and 3.2, respectively. To be specific, each constraint defined by the PUT along the path should be satisfied. In static analysis, the feasibility of a path is judged by the result of interval arithmetic. In this paper, the paths in the examples that we use are all feasible for convenience of explanation, but our work also involves the detection of infeasible paths in the process of generating test data.
An example with a program test and its corresponding CFG is shown in Figure 1, where if out 6, if out 7, if out 8, if out 9, and exit 10 are dummy nodes. Adopting branch coverage, there are five paths to be traversed, Path1 through Path5. The numbers along the paths denote nodes rather than edges of the CFG. Assume that Path5, shown in bold in Figure 1, is the path to be traversed. Our task is to select a value for each of the two input variables from its domain, so that when test is executed with these two values as input, the path traversed is Path5. There are four branching nodes, if head 1, if head 2, if head 3, and if head 4, along Path5 and four corresponding branches, T 1, T 2, T 3, and T 4, that contain the constraints to be met.

Solution to the Problem.
A COP is generally solved by search algorithms [32], which may be global or local. To date, there has been no theoretical analysis that characterizes which type of search method (global or local) is effective for path-wise test data generation as a COP. Global search aims to overcome the problem of local optima in the search space and can thereby find more globally optimal solutions.
Local search may become trapped in local optima within the solution space, but it can be far more efficient for simpler problems. In software testing, global search may achieve better coverage than local search, but at the cost of greater computational effort. However, Harman and McMinn [11] revealed that local search can be very effective and efficient, although there remain problems for which global search is the only technique that can successfully achieve coverage. The strong performance of local search, coupled with the necessity to retain global search for optimal effectiveness, naturally points to the consideration of hybrid search techniques. In view of that, we present a hybrid intelligent search method, BB-HC, which combines two search methods, branch and bound and hill climbing. We also try to find a better starting point heuristically to improve the search efficiency, rather than selecting a random value.
During the search process, variables are divided into three sets: past variables (PV for short, already instantiated), the current variable (now being instantiated), and future variables (FV for short, not yet instantiated). In addition, although the experiments were carried out on programs with different data types, integer variables are used as examples in the following algorithms to simplify the explanations.

The Hybrid Search Strategies
This section proposes the framework of the hybrid search. Specifically, the representation of state space search is described in detail in Section 4.1, followed by the hybrid intelligent search algorithm in Section 4.2. The dynamic variable ordering algorithm used in the hybrid search algorithm is explained in Section 4.3.

The Representation of State Space Search.
The state space is a quadruple consisting of a set of states, a set of arcs or connections between the states that correspond to the steps or operations of the search at different states, a nonempty subset of the state set denoting the initial state of the problem, and a nonempty subset of the state set denoting the final state of the problem.
A state is a quintuple (Precursor, Variable, Domain, Value, Type). Precursor provides a link to the previous state; Variable is the current variable, taken from the set of input variables; Domain, in the form of [min, max], is the set of possible values that may be selected to instantiate Variable; Value is a value selected from Domain; Type marks the type of the state, which may be active, extensive, or inactive.
State space search is all about finding one final state in a state space (which may be extremely large). Final means that every variable has been instantiated with a definite value successfully. At the start of the search Precursor is null, and when Variable is null the search ends. The path made up of all the extensive nodes in the search tree forms the solution path. The process of generating test data for a path takes the form of state space search. The state space needs to be searched to find a solution path from an initial state to a final state. We can decide where to go by considering the possible moves from the current state and trying to look ahead.
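The state quintuple and the reading of a solution path from the extensive states can be sketched in Python as follows. The class `State`, the field name `kind` (standing in for Type), the helper `solution_path`, and the variable names `x1` and `x2` are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class State:
    precursor: Optional["State"]   # link to the previous state, None at the start
    variable: Optional[str]        # current variable, None when the search ends
    domain: Tuple[int, int]        # [min, max] of candidate values
    value: int                     # value selected from the domain
    kind: str = "active"           # "active", "extensive", or "inactive"


def solution_path(last: Optional[State]) -> dict:
    """Follow the precursor links and collect the values of extensive states."""
    path = {}
    state = last
    while state is not None:
        if state.kind == "extensive":
            path[state.variable] = state.value
        state = state.precursor
    return path


first = State(None, "x1", (1, 100), 60, "extensive")
second = State(first, "x2", (1, 100), 40, "extensive")
test_data = solution_path(second)
```

Because each state stores only a link to its precursor, the search tree never has to be held in memory as a whole, which matches the implicit nature of the state space described in Section 2.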

The Hybrid Intelligent Search Algorithm.
The idea of our algorithm is to extend partial solutions. At each stage, a variable in FV is selected and assigned a value from its domain to extend the current partial solution. Hill climbing evaluates whether such an extension may lead to a possible solution of the COP and prunes subtrees containing no solutions based on the current partial solution. Some relevant concepts in this paper are described in Table 1.
An overview of our approach is shown in Figure 2. The path to be traversed is shown in the left part, where the circles represent nodes and the arrows represent edges of the CFG. The path contains the constraints to be met, the set of input variables, and the domains corresponding to the variables. BB-HC then works to generate the test data. All the variables in FV are permutated by DVO (see Section 4.3) to form a queue, and the head of the queue is determined as the first variable to be instantiated. Next, PTC (see Section 5.1) calculates the path tendency of each variable, and IDC (see Section 5.1) reduces the domain of the first variable, from which its initial value is selected. With all these, the initial state is constructed as (null, first variable, its reduced domain, its initial value, active), which is also the current state. Then the hill climbing process (see Section 5.2) begins for the first variable. For brevity, the following explanation refers to the hill climbing process for the current variable, that is, whichever variable in FV is being instantiated, together with its domain and value. Note that, except for the first variable, the current variable can be determined only by a DVO run that follows a successful HC.
Hill climbing utilizes interval arithmetic to judge whether the value selected for the current variable leads to a conflict. If it does not, the peak of the hill is reached with 0 as the objective value, and the type of the current state is changed to extensive, which means that the hill climbing process ends for the current variable and DVO for the next variable will begin. But if a conflict is detected, the objective function is calculated and the domain of the current variable is reduced according to its return value. After a new value is selected from the reduced domain, interval arithmetic again judges whether this value leads to a conflict. In summary, the hill climbing process ends in one of two ways. In the first, it finds the optimum for the current variable and reaches the peak of the hill, so the type of the current state is changed to extensive, indicating that the local search for the current variable ends and DVO for the next variable will begin. In the second, it fails to find the optimum and there is no more search space, so the type of the current state is changed to inactive, indicating that the local search for the current variable ends and backtracking is inevitable. BB-HC ends when the hill climbing processes for all the variables succeed and there are no more variables to be permutated. The test data is the output of BB-HC, as shown in the right part of Figure 2. The search process described above is given as pseudocode in Algorithm 1.
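The global part of this process (extend a partial solution, detect conflicts with interval arithmetic, backtrack when a variable runs out of values) can be sketched in Python under strong simplifying assumptions. This is not Algorithm 1: the hill climbing step is replaced here by plain enumeration of values from largest to smallest, constraints are restricted to linear relations encoded as `(coefficients, relation, constant)` tuples, and the function names, the variable names `x1` and `x2`, and the two example constraints are all hypothetical.

```python
def lhs_range(coeffs, box):
    """Interval of a linear expression when each variable ranges over box[var]."""
    lo = hi = 0
    for var, a in coeffs.items():
        vlo, vhi = box[var]
        lo += min(a * vlo, a * vhi)
        hi += max(a * vlo, a * vhi)
    return lo, hi


def may_hold(constraint, box):
    """False only when interval arithmetic proves a conflict."""
    coeffs, rel, c = constraint
    lo, hi = lhs_range(coeffs, box)
    return {
        ">": hi > c, ">=": hi >= c,
        "<": lo < c, "<=": lo <= c,
        "==": lo <= c <= hi,
        "!=": not (lo == hi == c),
    }[rel]


def search(box, constraints):
    """Return one assignment satisfying all constraints, or None."""
    if not all(may_hold(k, box) for k in constraints):
        return None  # prune: this partial solution cannot be extended
    future = [v for v, (lo, hi) in box.items() if lo < hi]
    if not future:
        return {v: lo for v, (lo, _) in box.items()}  # every variable is fixed
    # Smallest remaining domain first (ties fall to dictionary order here).
    var = min(future, key=lambda v: box[v][1] - box[v][0])
    lo, hi = box[var]
    for value in range(hi, lo - 1, -1):  # plain enumeration, largest first
        found = search({**box, var: (value, value)}, constraints)
        if found is not None:
            return found
    return None  # every value failed: the caller backtracks


# Hypothetical constraints: x1 + x2 == 100 and x1 - x2 == 20.
constraints = [({"x1": 1, "x2": 1}, "==", 100), ({"x1": 1, "x2": -1}, "==", 20)]
test_data = search({"x1": (1, 100), "x2": (1, 100)}, constraints)
```

With plain enumeration this sketch backtracks many times before it finds the assignment; the heuristic initial values and the domain reduction of the hill climbing step are what the paper relies on to avoid that cost.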

Dynamic Variable Ordering.
In practice, the chief goal in designing variable ordering heuristics is to reduce the size of the overall search tree. In our method, the next variable to be instantiated is the one with the minimal remaining domain size (the size of the domain after removing the values judged to be infeasible), because this choice tends to minimize the size of the overall search tree. The technique for breaking ties is important, as there are often variables with the same domain size. We use the ranks of variables to break ties: in case of a tie, the variable with the higher rank is selected. This method gives substantially better performance than picking one of the tying variables at random. Rank is defined as follows.
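Definition 1. Assume that there are a number of branches along a path, each leading from a branching node to its successor node. The rank of a branch marks its level in the sequence of the branches and is denoted by rank(·) applied to the pair formed by the branching node and its successor.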
The rank of the first branch is 1, the rank of the second one is 2, and the ranks of those following are obtained analogously. The variables appearing on a branch have the same rank as the branch. The rank of a variable on a branch where it does not appear is taken to be infinity. As a variable may appear on more than one branch, it may have different ranks. The rule for breaking ties according to the ranks of variables is based on the heuristic from interval arithmetic that the earlier a variable appears on a path, the greater influence it has on the result of interval arithmetic along the path. Therefore, if the ordering by rank is taken between a variable that appears on a given branch and a variable that does not, then the former has the higher rank. That is because, on that branch, the former has the finite rank of the branch while the latter has rank infinity, and the comparison between the two determines the ordering. The algorithm is given as pseudocode in Algorithm 2.
Quicksort is utilized when variables are permutated according to remaining domain size, and the sorted queue of variables is returned as the result. If no variables have the same domain size, then DVO returns the head of the queue. But if there are variables whose domain sizes are the same as that of the head of the queue, then the ordering by rank is under way, which terminates as soon as different ranks appear.
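The ordering rule can be sketched compactly in Python. The sketch below is not Algorithm 2: it selects the minimum directly instead of sorting with quicksort, and the function names, the encoding of branches as sets of variable names, and the variable names `x1` and `x2` are assumptions for illustration. Comparing the rank lists lexicographically stops at the first branch on which the ranks differ, which mirrors the tie-breaking described above.

```python
import math


def rank_vector(var, branches):
    """Rank of `var` on each branch: the branch position if it appears, else infinity."""
    return [pos if var in names else math.inf
            for pos, names in enumerate(branches, start=1)]


def next_variable(domains, branches):
    """Pick the future variable with the smallest domain; break ties by rank."""
    def key(var):
        lo, hi = domains[var]
        return (hi - lo + 1, rank_vector(var, branches))
    return min(domains, key=key)


# Hypothetical path with three branches; x1 appears earlier than x2.
branches = [{"x1"}, {"x1", "x2"}, {"x2"}]
chosen = next_variable({"x2": (1, 100), "x1": (1, 100)}, branches)
```

Here both domains have size 100, so the tie is broken on the first branch, where `x1` has rank 1 and `x2` has rank infinity, and `x1` is chosen.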

Hill Climbing
Hill climbing is the focus of this section. It is used to judge whether a fixed value for the current variable makes the path feasible. In other words, a value that makes the path feasible is the peak that we are searching for on behalf of the current variable.

Initial Value Selection.
Initial values of variables are of great importance to a search algorithm. On the one hand, in a backtrack-free search, the initial value of a variable is almost part of the solution. On the other hand, the selection of initial values affects whether the search will be backtrack-free. Initial values are often selected at random in MHS methods, which return different test data each time and so allow diversity, but randomness without any heuristics amounts to blind search, which causes too many iterations. Meanwhile, midvalues are selected in methods using bisection, so the same result may be returned repeatedly since the same initial value is always selected. In our method, these two approaches are combined, and the initial value of a variable is determined based on its path tendency (see Definition 3). First we give the definition of branching condition.
Definition 2. Let B be the set of Boolean values {true, false}. For the variable in question, with its domain, and for a branch that leaves a branching node, the branching condition Br maps each value in the domain to a Boolean value. It is defined as a relational expression in which rel is a relational operator, the coefficients and the constant term are constants, and the linear combination of all variables other than the variable in question is regarded as a constant.
Then we can design the value selection strategies, starting from the monotonic relation between the branching condition and the variable in question. Monotonicity describes the behavior of a function in relation to changes of the input. It indicates whether the output of the function moves in the same direction as the input or in the reverse direction. If a branching condition is decomposed into its basic functions, then the monotonicity of the branching condition as a function can be known, and in turn the direction in which the input needs to be moved to make the function true can be determined. The following definition is used for the initial value selection of variables.
Definition 3. Path tendency, which is either positive or negative, is an attribute of a variable on a path that favors the satisfaction of all the branching conditions along the path. It provides information about where to select the initial value of the variable. Positive implies that a larger initial value will work better, while negative implies that a smaller initial value is better.
The calculation of the path tendency of a variable involves the calculation of its weight on each branch along the path and its path weight, which are calculated by formula (2) and formula (3), respectively. Path tendency calculation (PTC) derives the path tendency of each variable from its path weight. Subsequently, initial domain calculation (IDC) works on the result of PTC. In this way, the initial value selection takes both diversity and heuristics into account. The algorithms are given as pseudocode in Algorithms 3 and 4.
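The idea behind PTC and IDC can be illustrated with a simplified Python sketch. It does not implement formulas (2) and (3) or Algorithms 3 and 4: as a stand-in, each branch casts an unweighted vote based on the sign of the variable's coefficient and the direction of the relational operator, ties default to positive, and the initial value is drawn at random from the favored half of the domain. All function names, the encoding of branches, and the variable names are assumptions for illustration.

```python
import random


def branch_vote(coefficient, rel):
    """+1 if a larger value helps the condition hold, -1 if a smaller one does, else 0."""
    if coefficient == 0 or rel in ("==", "!="):
        return 0  # no monotonic direction can be inferred
    larger_lhs_helps = rel in (">", ">=")
    return 1 if (coefficient > 0) == larger_lhs_helps else -1


def path_tendency(var, branches):
    """Sum the votes of all branches; a tie is treated as positive (assumption)."""
    total = sum(branch_vote(coeffs.get(var, 0), rel) for coeffs, rel in branches)
    return "positive" if total >= 0 else "negative"


def initial_value(domain, tendency, rng):
    """Pick a random value from the half of the domain favored by the tendency."""
    lo, hi = domain
    mid = (lo + hi) // 2
    half = (mid + 1, hi) if tendency == "positive" and mid < hi else (lo, mid)
    return rng.randint(*half)


# Hypothetical branches: x1 - x2 > c1 and x2 <= c2 (constants omitted).
branches = [({"x1": 1, "x2": -1}, ">"), ({"x2": 1}, "<=")]
tendencies = {v: path_tendency(v, branches) for v in ("x1", "x2")}
start = initial_value((1, 100), tendencies["x1"], random.Random(0))
```

Drawing at random within the favored half keeps the diversity of random selection while still following the monotonicity heuristic, which is the combination the text describes.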

The Hill-Climbing Process.
This part focuses on the process in which interval arithmetic judges whether the value assigned to a variable leads to a conflict. The calculating process is illustrated in Figure 3. Interval arithmetic first receives the value of the current variable, represented as an interval with equal bounds, as part of the domain of all variables before the first branching condition is evaluated. For the branching nodes along the path, all the branching conditions should be true to make the path feasible when it is traversed with this value. The value of each branching condition depends on two factors: (1) the domain of all variables that satisfies all the preceding branching conditions, which is used as input for the calculation of the current branching condition; (2) the domain that results from calculating the current branching condition with that input and that satisfies the current branching condition. To be specific, according to Definition 2, since a branching condition is in essence a relational expression, we can calculate the domain of each variable based on the specific form of the expression. A nonempty intersection of the two domains means that the intersection satisfies all the preceding branching conditions as well as the current one, ensuring that interval arithmetic can continue to calculate the remaining branching conditions.
In this process, if any branching condition along the path evaluates to false, which means that a conflict is detected, then interval arithmetic is aborted, the domain of the current variable is reduced according to the result of the corresponding objective function, a new value is selected from the reduced domain, and interval arithmetic restarts to judge whether the new value causes a conflict. This procedure is like climbing a hill. Formula (4) defines the objective function of the current value; the sum over all branches that it contains is calculated from the two domains at each branch and is a definite value. An objective value of 0 implies that no conflict is detected and that the current value is judged to be appropriate for the current variable. Otherwise, the domain of the current variable has to be reduced according to the return value of the objective function.
In the procedure of hill climbing, the absolute value of the objective function moves progressively closer to 0, which is the objective, or the peak of the hill. The algorithm is given as pseudocode in Algorithm 5.
When interval arithmetic fails, the objective function provides both the upper and the lower bounds for the reduction of the current variable's domain, determined by the sign and the absolute value of the objective function, respectively. Since the reduction is taken in two directions, the efficiency of the algorithm is improved greatly.
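One possible reading of this two-directional reduction is sketched below in Python. It is not Algorithm 5 and does not use formula (4): the signed objective is supplied by the caller, and the sketch assumes that a positive objective means the value is too large, that a negative one means it is too small, and that the absolute value is at least the distance to a conflict-free value. The function name `climb`, the midpoint choice of the next value, and the example condition are hypothetical.

```python
def climb(objective, domain, start):
    """Search `domain` for a value whose signed objective is 0.

    Assumption: the sign of the objective gives the direction of the
    conflict and its absolute value bounds the distance still to travel.
    """
    lo, hi = domain
    value, visited = start, [start]
    while True:
        f = objective(value)
        if f == 0:
            return value, visited            # peak reached: no conflict
        if f > 0:                            # too large: keep values below
            lo, hi = max(lo, value - abs(f)), value - 1
        else:                                # too small: keep values above
            lo, hi = value + 1, min(hi, value + abs(f))
        if lo > hi:
            return None, visited             # domain exhausted: backtrack
        value = (lo + hi) // 2
        visited.append(value)


# Hypothetical condition 3 * x + 40 == 220, whose only solution is x = 60.
peak, visited = climb(lambda x: 3 * x + 40 - 220, (1, 100), start=70)
```

Starting from 70 in [1,100], the first evaluation cuts the domain to [40,69] in a single step, because the sign removes everything from 70 upward and the magnitude removes everything below 40. The sketch then visits 54, 62, and 58 before reaching the peak at 60.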

Case Study
In this section, the problem mentioned in Section 3.1 is used as an example to explain how BB-HC works. The path to be traversed is Path5, shown in bold in Figure 1. We choose this example because the constraints along the path are very strict for the two variables. The assignment of 60 to the first variable and 40 to the second is the only solution to the corresponding COP. For simplicity, the input domains of both variables are set to [1,100], with size 100. The path tendency of each variable is calculated by PTC as shown in Table 2. DVO determines the first variable to be instantiated, as shown in Table 3; it is the first variable, highlighted in bold. Once the first variable is determined to be the current variable, an initial value needs to be selected from [1,100]. The retrieval of the path tendency map by IDC returns positive for the first variable, indicating that a larger value will perform better, and 70 is selected.
The calculating process of 70 for the first variable is decomposed in Figure 4. The value 70 is judged not to be a solution for the first variable, that is, not the peak of the corresponding hill, so the objective function must be calculated to determine the next search step.

Experimental Analyses and Empirical Evaluations
To observe the effectiveness of BB-HC, we carried out a large number of experiments in CTS. Within the CTS framework, the PUT is automatically analyzed, and its basic information is abstracted to generate its CFG. According to the specified coverage criteria, the paths to be traversed are generated and provided to BB-HC as input. The experiments were performed under Ubuntu 12.04 on a 32-bit Pentium 4 at 2.8 GHz with 2 GB of memory. The algorithms were implemented in Java and run on the Java Runtime Environment (JRE). The experiments comprise two parts. Section 7.1 presents the performance evaluation of BB-HC, and Section 7.2 tests the capability of BB-HC to generate test data in terms of coverage and makes comparisons with some existing static and dynamic methods.

Performance Evaluation.
The number of variables and the number of expressions (path constraints) [33,34] are two important factors that affect the performance of test data generation methods. Hence, in this part, experiments were carried out to evaluate the effectiveness of the initial value selection strategy and the hill-climbing process for varying numbers of input variables and varying numbers of expressions, and we also paid attention to the number of backtracking operations. Specifically, three methods were compared: random initial value and no hill climbing (RI&NHC), random initial value and hill climbing (RI&HC), and heuristic initial value and hill climbing, which is BB-HC. Because generation time varies widely between cases, the generation time axes in both experiments are normalized for simplicity.

Varying Number of Variables.
The relationship between the performance of the test data generation methods and the number of variables was tested by repeatedly running the three methods on generated test programs whose number of input variables varied from 1 to 50. Adopting statement coverage, in each test the program contained 50 if statements (equivalent to 50 branching conditions or 50 expressions along the path) and there was only one path of fixed length to be traversed, namely the one consisting entirely of true branches; that is, all the branching conditions were the same as the corresponding predicates. The expression of each if statement related a linear combination of all the input variables to a constant through a relational operator, where the coefficients were randomly generated numbers, either positive or negative, the relational operator was drawn from {>, ≥, <, ≤, =, ≠}, and const[] was an array of 50 randomly generated constants within [0, 1000], one for each if statement. The randomly generated coefficients and const[] were selected to make the path feasible. This arrangement constructed the tightest linear relation between the variables. In addition, we ensured that there was at least one "=" in each program to test the equation solving capability of the methods. The programs for each number of variables from 1 to 50 were each tested 50 times, and the average time required to generate the data for each test was recorded. The results are presented in Figure 7.
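A generator for such feasible constraint sets might look like the following Python sketch. The paper does not state how feasibility was ensured; the sketch assumes one simple way, which is to draw a hidden witness assignment first and then choose each constant so that the witness satisfies the relation. As a consequence the constants are not confined to [0, 1000]. The names `make_constraints`, `holds`, and `OPS`, the coefficient range, and the witness range are all hypothetical.

```python
import operator
import random

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq, "!=": operator.ne}


def make_constraints(n_vars, n_exprs, seed=0):
    """Build n_exprs random linear constraints that one hidden witness satisfies."""
    rng = random.Random(seed)
    witness = [rng.randint(1, 100) for _ in range(n_vars)]
    rels = [rng.choice(list(OPS)) for _ in range(n_exprs)]
    rels[rng.randrange(n_exprs)] = "=="      # guarantee at least one equality
    constraints = []
    for rel in rels:
        # Nonzero coefficients, either positive or negative.
        coeffs = [rng.choice((-1, 1)) * rng.randint(1, 9) for _ in range(n_vars)]
        at_witness = sum(a * x for a, x in zip(coeffs, witness))
        # Shift the constant so that the witness satisfies the relation.
        shift = {">": -1, "<": 1, "!=": 1}.get(rel, 0)
        constraints.append((coeffs, rel, at_witness + shift))
    return witness, constraints


def holds(constraint, values):
    """Evaluate one linear constraint at a concrete assignment."""
    coeffs, rel, const = constraint
    return OPS[rel](sum(a * x for a, x in zip(coeffs, values)), const)


witness, constraints = make_constraints(n_vars=5, n_exprs=50, seed=1)
feasible = all(holds(c, witness) for c in constraints)
```

Fixing the seed makes a generated program reproducible, which is convenient when the same program has to be run repeatedly and the generation times averaged.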
It can be seen that the average generation time of BB-HC is far less than that of RI&NHC and RI&HC. RI&NHC takes the longest time. The points corresponding to RI&NHC and RI&HC are not very regular, so we did not try to fit curves to them. For BB-HC, the relation between average generation time and the number of variables is represented very well by a quadratic curve, and the quadratic correlation is significant at the 95% confidence level, with a p value far less than 0.05. Besides, average generation time increases at a uniformly accelerating rate as the number of variables increases. Differentiating the fitted curve shows that the rate of increase is 1.06 times the number of variables minus 8.682. We can roughly conclude that generation time is very close for 1 to 8 variables, while it begins to increase when the number of variables is larger than 8. According to our statistics, the number of backtracking operations conducted by BB-HC was 0 in all 50 cases while those of the other methods were not, so this search was completely backtrack-free.
In the tests on the number of expressions, each program contained a varying number of if statements (equivalent to the same number of branching conditions or expressions) and there was only one path with entirely true branches to be traversed; that is, all the branching conditions were the same as the corresponding predicates. The expression of each if statement related a linear combination of 50 variables to a constant through a relational operator, where the 50 coefficients were randomly generated numbers, either positive or negative, the relational operator was drawn from {>, ≥, <, ≤, =, ≠}, and const[] was an array of randomly generated constants within [0, 1000]. The randomly generated coefficients and const[] were selected to make the path feasible. This arrangement constructed the strongest linear relation between variables. In addition, we ensured that there was at least one "=" in each program to test the equation solving capability of the methods. The programs, with the number of expressions ranging from 1 to 50, were each tested 50 times and the average time required to generate the data for each test was recorded. The results are presented in Figure 8.
It can be seen that the average generation time of BB-HC is far less than that of RI&NHC and RI&HC. RI&NHC takes the longest time. The points corresponding to RI&NHC and RI&HC are not very regular, so we did not try to fit curves to them. For BB-HC, the average generation time increases approximately linearly with the number of expressions, and the linear correlation is significant at the 95% confidence level, with a p value far less than 0.05. As the number of expressions increases, average generation time increases at an even rate. According to our statistics, the number of backtracking operations conducted by BB-HC was 0 in all 50 cases while those of the other methods were not, so this search was completely backtrack-free.
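The two fitted trends for BB-HC can be contrasted with a short derivation. The symbols are introduced here for illustration only: T is the normalized average generation time, n the number of variables (treated as continuous), m the number of expressions, and C, α, and β unknown fit constants. Taking the rate of increase reported for the experiment on the number of variables at face value:

```latex
\begin{align}
  \frac{dT}{dn} &= 1.06\,n - 8.682
    && \text{(reported rate of increase)} \\
  T(n) &= 0.53\,n^{2} - 8.682\,n + C
    && \text{(quadratic fit, by integration)} \\
  \frac{dT}{dn} = 0 &\iff n = \frac{8.682}{1.06} \approx 8.19
    && \text{(turning point of the fit)} \\
  T(m) &= \alpha\,m + \beta, \qquad \frac{dT}{dm} = \alpha
    && \text{(linear fit, constant rate)}
\end{align}
```

The turning point near 8.19 is consistent with the observation that generation time starts to grow once the number of variables exceeds 8, whereas a linear fit in the number of expressions corresponds to a constant rate of increase.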
Both of the above searches conducted by BB-HC were completely backtrack-free, which is encouraging. Nonlinear constraints do exist, and they will sometimes cause backtracking. But according to statistical data [35,36], nonlinear constraints account for only a very small proportion of the constraints in real-world programs, so BB-HC will be useful in most cases. It is safe to conclude that BB-HC functions stably given a PUT of regular structure, which lays a solid foundation for its application in engineering.

Coverage Evaluation.
To evaluate the capability of BB-HC to generate test data in terms of coverage, we used some real-world programs to compare BB-HC with both static and dynamic methods adopted in test data generation.

Comparison with a Static Method.
This part presents the results of an empirical comparison of BB-HC with the static method [22] (denoted by "method 1" for brevity), which was implemented in CTS prior to BB-HC. Three of the test beds were from an engineering project, del18i-2, at http://www.moshier.net/ with numeric data types, and two were from a TCP port scanner, Masscan, at https://github.com/robertdavidgraham/masscan with complex data types. The comparison adopted three coverage criteria: statement, branch, and MC/DC. For each test bed, the experiments were carried out 100 times and the average coverage (AC) was used for comparison, that is, the mean of the coverage achieved over the 100 runs. The details of the comparison are shown in Table 5.
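As a reading of the definition above, and assuming $c_k$ denotes the coverage achieved in the $k$th of the 100 runs on a given test bed (a symbol introduced here for illustration only), the average coverage can be written as:

```latex
\mathrm{AC} = \frac{1}{100} \sum_{k=1}^{100} c_k
```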
From Table 5, it can be seen that BB-HC reached higher coverage than method 1 in most cases, as shown in bold. That is largely due to the heuristic methods utilized in BB-HC. Method 1 was unable to handle complex data types with MC/DC as the coverage criterion. But the table also shows that there are some programs for which BB-HC could not achieve 100% coverage. By examining those programs, we found that they contain logical operators that make the static analysis harder, which in turn makes it more difficult to generate test data for the paths containing them. Handling such programs will be part of our future work.

Comparison with PSO.
This part presents the results of an empirical comparison of BB-HC with PSO, which is mentioned in Section 1 as a popular MHS method with relatively fast convergence speed. The following is a brief introduction to some parameters used in PSO.
Suppose the population size is $N$ in the $D$-dimensional search space, where a particle represents a potential solution. The velocity $v_i^d$ and position $x_i^d$ of the $d$th dimension of the $i$th particle can be updated by the following formulae:
$$v_i^d = \omega v_i^d + c_1 r_1 \left(\mathit{pbest}_i^d - x_i^d\right) + c_2 r_2 \left(\mathit{gbest}^d - x_i^d\right),$$
$$x_i^d = x_i^d + v_i^d,$$
where $X_i = (x_i^1, x_i^2, \ldots, x_i^D)$ is the position of the $i$th particle, $V_i = (v_i^1, v_i^2, \ldots, v_i^D)$ is the velocity of particle $i$, $\mathit{pbest}_i^d$ is the personal best position found by particle $i$ for dimension $d$, and $\mathit{gbest}^d$ is the global best position of dimension $d$. The parameters $c_1$ and $c_2$ are the acceleration constants reflecting the weight of the stochastic acceleration terms that pull each particle towards $\mathit{pbest}$ and $\mathit{gbest}$, respectively. $r_1$ and $r_2$ are random numbers in the range [0, 1]. A particle's velocity on each dimension is clamped to a maximum $V_{\max}$. The inertia weight $\omega$ is used to balance the global and local search abilities, and it controls the impact of history on the new velocity. The parameter setting in Table 6 is typical and customary for PSO and is used for our comparison.
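The update rule described above can be sketched in a few lines of Python. This is a minimal illustration of the standard inertia-weight PSO step with velocity clamping, not the implementation used in the comparison; the function name `pso_step` and its argument layout (lists of per-particle lists) are assumptions made for this sketch.

```python
import random


def pso_step(positions, velocities, pbest, gbest, w, c1, c2, v_max, rng=random):
    """Apply one velocity/position update to every particle, in place.

    positions, velocities, pbest: one list of D floats per particle.
    gbest: list of D floats (best position found by the whole swarm).
    w: inertia weight; c1, c2: acceleration constants; v_max: velocity clamp.
    """
    for i in range(len(positions)):
        for d in range(len(positions[i])):
            r1, r2 = rng.random(), rng.random()  # fresh random numbers in [0, 1)
            v = (w * velocities[i][d]
                 + c1 * r1 * (pbest[i][d] - positions[i][d])
                 + c2 * r2 * (gbest[d] - positions[i][d]))
            # Clamp the velocity on this dimension to [-v_max, v_max].
            v = max(-v_max, min(v_max, v))
            velocities[i][d] = v
            positions[i][d] += v
```

With the acceleration constants $c_1 = c_2 = 2$ and a clamp such as $V_{\max} = 24$, a particle can move at most 24 units per dimension in one step, however far it is from the best known positions. Updating the personal and global bests after each step is left to the caller, since it depends on the fitness function chosen for the path under test.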
We used three real-world programs, which are well-known benchmark programs that have been widely adopted by other researchers. Branch coverage was taken as the adequacy criterion. For each test bed, the experiments were carried out 100 times. The coverage achieved by the two methods might differ from run to run, so AC was used for comparison. Table 7 shows the details of the test beds and the comparison results.
BB-HC achieved 100% coverage, as shown in bold, on all three benchmark programs, which are rather simple programs for BB-HC, and it outperformed PSO. Two factors contribute to the better performance of BB-HC. One is that the initial values of variables are selected by heuristics on the path, so BB-HC reaches relatively high coverage in the first round of the search. The other is that BB-HC coordinates BB and HC flexibly to make sure that a solution can be found efficiently for each variable.

Conclusion
The increasing demand for testing large-scale real-world programs makes the automation of the testing process necessary. In this paper, the problem of path-wise test data generation, which is a basic problem in software testing, is reformulated as a constraint optimization problem (COP), and a hybrid intelligent algorithm, BB-HC, is presented to solve it by hybridizing two search methods: branch and bound (BB) and hill climbing (HC). BB is used as the global search method and HC is dedicated to local search. They are highly integrated by dynamic variable ordering (DVO) and the backtracking operation. With a heuristic rule to break ties, DVO permutes the variables to be instantiated. The monotonicity analysis of branching conditions is applied in the selection of the initial values by path tendency calculation (PTC) and initial domain calculation (IDC). Starting from the heuristically selected initial value, the process of determining a fixed value for a specified variable resembles climbing a hill, the peak of which is the value judged by interval arithmetic not to cause a conflict. To facilitate the search procedure, the solution space is represented as a state space. Empirical experiments were conducted to evaluate the performance of BB-HC. The results show that it searches in a basically backtrack-free manner for linear constraints, generates test data on programs of complex structure and strong constraints with promising performance, and outperforms some current static and dynamic methods in terms of coverage. The application of BB-HC in engineering demonstrates its effectiveness.
Our future research will focus on generating test data that reach higher coverage, with particular emphasis on the programs where BB-HC did not achieve 100% coverage. We will also study how coverage criteria, generation approach, and system structure jointly influence test effectiveness. Improving the effectiveness of the generation approach remains our primary work.

Figure 1: Program test and its corresponding CFG.

Figure 2: Overview of BB-HC for searching the test data.

Figure 7: The relationship between generation time and the number of variables.

Figure 8: The relationship between average generation time and the number of expressions.

Table 1: Some methods used in this paper and their descriptions.

Table 4: The hill-climbing process for x1.

Table 5: The details of comparison with method 1.

Table 6: Parameter setting for PSO.
Parameter | Value
Acceleration constants $c_1$ and $c_2$ | $c_1 = c_2 = 2$
Maximum velocity $V_{\max}$ | Set according to the input space of the tested program, such as $V_{\max} = 24$ for the program triangleType

Table 7: The details of comparison with PSO.