Locating Minimal Fault Interaction in Combinatorial Testing

Combinatorial testing (CT) can significantly reduce testing cost and improve software quality. Using a test suite generated by CT as input for black-box testing of a system, we can detect interactions that trigger the system's faults. In a failing test case, often only some of the parameters are relevant to the defect, and the interaction formed by those parameters is the key factor in triggering the fault. Locating those parameters accurately facilitates software diagnosis and testing. This paper proposes a novel algorithm, complete Fault Interaction Location (comFIL), which locates the interactions that cause system failures and obtains the minimal set of such interactions in a test suite produced by CT. With this method, testers can analyze and locate the factors relevant to defects more precisely, making software testing and debugging easier and more efficient. The results of our empirical study indicate that comFIL outperforms known fault location techniques in combinatorial testing in effectiveness and precision.


Introduction
Combinatorial testing can significantly reduce testing cost and improve the quality of a software system [1]. It has proved especially effective for systems whose faults arise from interactions among their parameters [2]. Combinatorial testing can detect that some parameter interaction triggers a system fault, but it does not localize that interaction. If a test case triggers a fault, the program contains one or more defects [3][4][5]. However, not all parameters in the test case are relevant to those defects. If we can identify the parameters in the test case that are relevant to the fault, we can use this information to facilitate debugging.
In combinatorial testing, fault interaction location techniques can be categorized as adaptive or nonadaptive, according to whether the additional test cases depend on earlier execution results [6,7]. In nonadaptive methods, the generation of additional test cases does not rely on the results of running the original test cases. Colbourn and McClary [8] present a nonadaptive method named Locating and Detecting Arrays (LDA). Based on basic known information such as the number of parameters, their values, and the number of faults, the method applies a $t$-way Locating and Detecting Array to locate faults in a software system. Martínez et al. [9] present a self-adaptive algorithm based on Error Locating Arrays (ELA) and analyze its complexity. However, this method can only be used when each parameter of the software has at most two values. Building on LDA and ELA, Hagar et al. [1] propose the Partial Covering Array (PCA) method, which can be used for software with known safe values and presents a new combinatorial structure to generate ELA.
The other category comprises adaptive methods [10], in which the generation of additional test cases depends on information obtained from executing the original test cases. Zeller and Hildebrandt [11] present a typical adaptive method named Delta Debugging. Its main idea is to identify the interaction relevant to a fault by modifying input parameters. For a test case that triggers the fault, some input parameters are modified; if the modified test case still triggers the fault, the modified parameters are irrelevant to the fault; otherwise, they are related to it. Based on Delta Debugging, Z. Zhang and J. Zhang [12] present a method named FIC. Similar to Delta Debugging, FIC modifies one parameter of the failing test case at a time and repeats this once for each parameter, after which the minimal fault interaction is calculated. A limitation of Delta Debugging based methods is that they apply only to test cases containing a single minimal fault interaction, whereas in real-world programs it is very common for more than one minimal fault interaction to exist. Ghandehari et al. [13] present a fault localization tool based on the failure-inducing combinations algorithm of [14]; it leverages the notion of an inducing combination to locate faults inside the source code.
To locate every interaction related to faults, this paper presents a new complete Fault Interaction Location (comFIL) method. The method locates minimal fault interactions in 2 steps. First, after the original test cases are executed, they are divided into 2 sets, FTS and PTS: the former contains the test cases that trigger faults, and the latter contains those that do not. We then identify the set of interactions covered by FTS but not by PTS and call it the candidate faulty interaction set (canFIS). Second, we generate additional test cases to test the interactions in canFIS, which finally yields the minimal fault interaction set.
For example, a model named BCSD is shown in Table 1. The model has 4 input parameters: client browser, client operating system (client OS), server operating system (server OS), and server database.
Definition 5 (minimal fault interaction). For an interaction $I$, if for any interaction $I' \in \mathrm{subSet}(I) - \{I\}$, $I'$ is not a fault interaction, then $I$ is a minimal fault interaction.
Definition 6 (additional test case function). For an interaction $I$, the additional test case function generates a test case $t$ that satisfies the following: if the result of running $t$ is fail, then $I$ is a fault interaction; otherwise, $I$ is not a fault interaction. The function is denoted addTF: {all interactions in the system} → {all test cases}.

Basic Assumptions
Assumption 1. The parameters in the system are independent of each other. In many systems there are constraints between parameters, and Cohen et al. [15] study how to generate test cases under such constraints. However, constraints among parameters are not the focus of fault location, so we do not consider dependence between parameters in this paper.
Assumption 2. If an interaction causes a fault of the system, every test case containing this interaction necessarily triggers the fault.
Assumption 3. A test case generated by the additional test case function addTF for an interaction contains no minimal fault interaction other than those among the subinteractions of that interaction.
For an interaction $I$, addTF generates a test case $t$; if $t$ triggers the fault, then this fault is caused by interactions in the set of subinteractions of $I$. If $t$ contained a minimal fault interaction outside $I$, fault location would become more complex, so we do not consider that case in this paper. This assumption is explained further in the implementation of addTF.
The comFIL algorithm presented in this paper rests on the assumptions above and is effective only when all 3 assumptions hold.

Basic Inferences
Inference 1. A test case that contains a fault interaction necessarily triggers the fault of the system, and a test case that does not trigger the fault does not contain a fault interaction.
Inference 2. If an interaction is not a fault interaction, none of its subinteractions is a fault interaction.

Theorem 3. Use MFIS which denotes all minimal fault interactions set of 𝐶𝐴(𝑚, 𝑡
Proof. This can be concluded directly from Theorem 2.

comFIL Algorithm
3.1. Description of comFIL. Theorem 3 and Definition 5 provide a screening method for obtaining the set of all minimal fault interactions from the set of candidate fault interactions; we call this method complete Fault Interaction Location, comFIL for short. When an interaction $I$ proves to be a fault interaction, we delete all parent interactions of $I$ from canFIS and keep $I$ itself; when $I$ proves not to be a fault interaction, we delete $I$ and all its child interactions from canFIS.
Under the three assumptions in Section 2, comFIL takes Theorems 2 and 3 and Definition 6 as its theoretical core. According to Theorem 2, we compute the set of candidate fault interactions from the combinatorial test cases. Then, based on Theorem 3 and Definition 6, we screen out the set of minimal fault interactions from the set of candidate fault interactions.
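As an illustration of how the candidate set can be computed, the following minimal Python sketch collects the interactions covered by failed test cases but by no passed test case. The representation of an interaction as a frozenset of (parameter index, value) pairs, the function names, and the "pass"/"fail" result strings are assumptions of this sketch rather than the authors' implementation. It enumerates every subset of every test case, so it is only practical for a small number of parameters.

```python
from itertools import combinations


def interactions_of(test_case):
    """Return all non-empty interactions of a test case.

    An interaction is modeled (assumption) as a frozenset of
    (parameter index, value) pairs.
    """
    pairs = list(enumerate(test_case))
    return {
        frozenset(combo)
        for size in range(1, len(pairs) + 1)
        for combo in combinations(pairs, size)
    }


def candidate_fault_interactions(test_suite, results):
    """Interactions covered by failed test cases (FTS) but by no passed one (PTS)."""
    covered_by_fts, covered_by_pts = set(), set()
    for test_case, result in zip(test_suite, results):
        if result == "fail":
            covered_by_fts.update(interactions_of(test_case))
        else:
            covered_by_pts.update(interactions_of(test_case))
    return covered_by_fts - covered_by_pts


# Hypothetical 3-parameter binary system with one failing test case.
suite = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
outcomes = ["pass", "fail", "pass", "pass"]
can_fis = candidate_fault_interactions(suite, outcomes)
```

In this toy suite every single value of the failing test case also occurs in a passed test case, so only the 2-way and 3-way interactions of the failing test case remain as candidates.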
The basic framework of the comFIL algorithm is shown in Algorithm 1. It has two inputs, the test case set CA and the test results, and one output, canFIS, which on termination is the minimal fault interaction set. The algorithm proceeds in 2 phases.
(1) Generate the canFIS for fault location. First, the algorithm counts how many times each interaction appears in passed and in failed test cases (steps (2) to (12)); then the canFIS is screened out from the test case set CA (steps (13) to (15)).
(2) Generate the minimal fault interaction set (steps (16) to (20)). Steps (17) and (18) state that if an interaction $I$ is a fault interaction, we delete all its superinteractions, except $I$ itself, from canFIS. Steps (19) and (20) state that if an interaction is not a fault interaction, we delete all elements of its subinteraction set from canFIS.
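The screening phase can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' Algorithm 1: interactions are frozensets of (parameter index, value) pairs, the callable `is_fault_interaction` stands in for running the additional test case produced by addTF, and interactions are tested smallest first, whereas the paper treats the order as random.

```python
def screen_minimal_fault_interactions(can_fis, is_fault_interaction):
    """Reduce a candidate set to the minimal fault interactions.

    Returns the remaining interactions and the number of interactions
    tested (one additional test case per tested interaction).
    """
    remaining = set(can_fis)
    tested = set()
    steps = 0
    while True:
        # Assumed order: fewer ways first, ties broken lexicographically.
        untested = sorted(remaining - tested, key=lambda i: (len(i), sorted(i)))
        if not untested:
            break
        current = untested[0]
        tested.add(current)
        steps += 1
        if is_fault_interaction(current):
            # Fault interaction: drop every strict superinteraction, keep it.
            remaining = {i for i in remaining if not current < i}
        else:
            # Not a fault interaction: drop it and all its subinteractions.
            remaining = {i for i in remaining if not i <= current}
    return remaining, steps


# Hypothetical example: the only minimal fault interaction is {p0=0, p1=1}.
mfi = frozenset({(0, 0), (1, 1)})
candidates = {
    mfi,
    frozenset({(0, 0), (2, 1)}),
    frozenset({(1, 1), (2, 1)}),
    frozenset({(0, 0), (1, 1), (2, 1)}),
}
result, tests_used = screen_minimal_fault_interactions(
    candidates, lambda interaction: mfi <= interaction
)
```

Testing the minimal fault interaction first removes the 3-way candidate without spending a test case on it, which is one way the test order affects the number of additional test cases.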

Implementation of Key Functions of Algorithm comFIL
(1) Implementation of subSet. The input of function subSet is an interaction $I$, and its output is the subinteraction set of $I$. For a $k$-way interaction, each subinteraction can be encoded as a binary string of length $k$ whose bits mark which parameter values are kept, and every binary string corresponds to a decimal number; so, for a $k$-way interaction, the subinteraction set corresponds to the binary strings of the integers in $[1, 2^k - 1]$.
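A possible realization of this binary-string enumeration is sketched below in Python. The name `sub_set`, the representation of an interaction as a collection of (parameter, value) pairs, and the parameter names in the example are assumptions made for illustration.

```python
def sub_set(interaction):
    """Enumerate the subinteractions of a k-way interaction via bitmasks.

    Each integer in [1, 2**k - 1] is read as a k-bit string; bit j set
    means the j-th (parameter, value) pair is kept.
    """
    pairs = sorted(interaction)
    k = len(pairs)
    subinteractions = []
    for mask in range(1, 2 ** k):
        kept = frozenset(pairs[bit] for bit in range(k) if (mask >> bit) & 1)
        subinteractions.append(kept)
    return subinteractions


# Hypothetical 3-way interaction.
example = {("p1", "a"), ("p2", "b"), ("p3", "c")}
subs = sub_set(example)
```

The last mask, with all bits set, yields the interaction itself, so the result contains the input as well as its proper subinteractions.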
(2) Implementation of addTF. The input of function addTF is an interaction $I$, and its output is an additional test case $t$. The number of ways of the input $I$ is not $n$, the number of parameters; that is, $I$ is not itself a test case. From Assumption 3 we know that $I \subset t$; moreover, $I$ is the only minimal fault interaction among all subinteractions of $t$.
From Theorem 4 we know that, when generating the additional test case $t$ for a $k$-way interaction, we must first ensure that there is no minimal fault interaction among $2^n - 2^k$ subinteractions. However, if $n$ is very large and $k$ is relatively small, the numbers of interactions to be examined and of additional test cases to be generated are very large. For convenience, we assume that each parameter has a value that is not associated with any fault (i.e., this value does not belong to any minimal fault interaction). This value is called the safe value.
In comFIL, the number of times each interaction appears in passed test cases and in failed test cases needs to be recorded. For a one-way interaction, that is, a single value of a parameter, let $p$ and $f$ be the numbers of passed and failed test cases in which it appears; the value $f/(p + f)$ is called the fault ratio of the value. We simply assume that the smaller the fault ratio of a parameter's value, the more likely this value is the safe value of the parameter.
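The fault ratio of each parameter value can be computed as in the following Python sketch, which takes the ratio to be the number of failed test cases containing the value divided by the total number of test cases containing it. The function name, the (parameter index, value) keys, and the "pass"/"fail" result strings are assumptions of this sketch.

```python
from collections import Counter


def fault_ratios(test_suite, results):
    """Map each (parameter index, value) pair to failed / (passed + failed)."""
    passed, failed = Counter(), Counter()
    for test_case, result in zip(test_suite, results):
        counter = failed if result == "fail" else passed
        counter.update(enumerate(test_case))
    return {
        key: failed[key] / (passed[key] + failed[key])
        for key in passed.keys() | failed.keys()
    }


# Hypothetical 3-parameter binary system with one failing test case.
suite = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
outcomes = ["pass", "fail", "pass", "pass"]
ratios = fault_ratios(suite, outcomes)
```

Here value 0 of the first parameter appears in one passed and one failed test case, giving a ratio of 0.5, while value 1 appears only in passed test cases, giving 0.0 and making it the preferred safe-value guess.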
When testing a real system, we cannot know the safe value of each parameter beforehand, so, according to Theorem 4, the additional test case generated for an interaction normally leaves an enormous number of interactions to be checked, and the cost is very high. However, when the software contains only a few defects (e.g., software in the Delta testing phase), the number of minimal fault interactions in the combinatorial test cases is very small, and it is therefore highly likely that the parameters have safe values.
The process of generating additional test cases is as follows.
First, if a value of a parameter $p_i$ ($1 \le i \le n$) appears in interaction $I$, then $p_i$ takes that same value in the additional test case $t$; otherwise, $p_i$ is assigned its value with the smallest fault ratio. If several values of $p_i$ share the smallest fault ratio, one of them is chosen at random.
Second, we check $t$: if it does not belong to the test suite CA, then $t$ is used as an additional test case; otherwise, $t$ is regenerated. Regeneration modifies the value of one parameter in $t$, assigning it another value whose fault ratio is the smallest or second smallest, while making sure $I$ remains a subinteraction of $t$. We repeat this process until $t$ does not belong to CA.
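The two-step generation process can be sketched in Python as below. This is a simplified, deterministic variant written for illustration only: ties between equal fault ratios are broken by domain order rather than randomly, and regeneration tries the next-best values of one free parameter at a time. The name `add_tf`, its signature, and the data layout (an interaction as a dict from parameter index to value, fault ratios keyed by (index, value)) are assumptions, not the authors' implementation.

```python
def add_tf(interaction, domains, ratios, test_suite):
    """Build an additional test case that contains `interaction`.

    interaction: dict {parameter index: value} fixed by the interaction
    domains:     list of value lists, one per parameter
    ratios:      dict {(parameter index, value): fault ratio}
    test_suite:  existing test cases (CA), which must not be reused
    Returns a tuple, or None if no unused test case is found.
    """
    # Rank each parameter's values by ascending fault ratio (stable sort).
    ranked = [
        sorted(values, key=lambda v, i=i: ratios.get((i, v), 0.0))
        for i, values in enumerate(domains)
    ]
    # Step 1: keep the interaction's values, fill the rest with likely safe values.
    candidate = [interaction.get(i, ranked[i][0]) for i in range(len(domains))]
    existing = {tuple(t) for t in test_suite}
    if tuple(candidate) not in existing:
        return tuple(candidate)
    # Step 2: regenerate by changing one free parameter to its next-best value.
    for i in range(len(domains)):
        if i in interaction:
            continue  # never change a value that belongs to the interaction
        for value in ranked[i][1:]:
            alternative = candidate.copy()
            alternative[i] = value
            if tuple(alternative) not in existing:
                return tuple(alternative)
    return None


# Hypothetical 3-parameter binary system and fault ratios.
suite = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
domains = [[0, 1], [0, 1], [0, 1]]
ratios = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0,
          (1, 1): 0.5, (2, 0): 0.0, (2, 1): 0.5}
first = add_tf({0: 0, 1: 1}, domains, ratios, suite)
second = add_tf({0: 1, 1: 1}, domains, ratios, suite)
```

In the second call the first candidate already belongs to the suite, so the free third parameter is switched to its next-best value.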
Table 3 shows the results of the first process. Its first column gives the number of each interaction, its first row gives the parameters, and the remaining rows give the values of those parameters; that is, from row 2 on, each row of Table 3 represents one interaction. For example, interaction 1 {.0,.0, .0} is shown in row 2 (I # 1).
The second process generates the minimal fault interaction set, as shown in Table 4. The second column of Table 4 lists the interactions contained in canFIS at each step.
The third column in Table 4 shows the interaction under test, and the fourth column shows the additional test case for that interaction. Columns 5 and 6 show the output of the additional test case and the set of elements deleted from canFIS, respectively. We can conclude from Table 4 that the whole process takes 14 steps, each of which generates one additional test case. The minimal fault interaction set {5, 20} is finally obtained at step (15). We can also conclude that the number of steps depends on the order in which interactions are tested. For example, if interaction 22 is tested in step (13), the elements deleted from canFIS are {21, 22}, and the minimal fault interaction set is obtained directly. The whole process then takes only 13 steps and needs only 13 test cases. Therefore, an optimized test order could reduce the number of test cases generated. In this paper, we do not discuss this further and simply treat the order as random. The method presented by Ghandehari et al. [13] returns a set containing 9 interactions, whereas the minimal fault interaction set contains only 2 interactions. This shows that comFIL is more precise than the method proposed by Ghandehari et al.

Empirical Study
In algorithm comFIL, we need to record the number $p$ of passed test cases and the number $f$ of failed test cases in which each interaction appears. For a 1-way interaction, the value $f/(p + f)$ is called the fault ratio. The smaller this value is, the more probable it is that the value is a safe value of the parameter.

Additional Test Case Generation.
In our experiment, additional test case generation for an interaction $I$ follows two steps: (I) For each input parameter $p_i$, $i \in [1, n]$, if one of its values is in interaction $I$, then $p_i$ takes the same value in the additional test case $t$; otherwise, $p_i$ in $t$ is assigned its value with the smallest fault ratio.
(II) Check whether $t$ belongs to CA; if not, $t$ is used as the additional test case; otherwise, change the value of some $p_i$ in $t$ until $t$ does not belong to CA.

Evaluation Criteria.
We use the fault ratio as our evaluation criterion. In the comFIL algorithm, we need to record the number $p$ of passed test cases and the number $f$ of failed test cases in which each interaction appears.
For a 1-way interaction, the value $f/(p + f)$ is called the fault ratio. The smaller this value is, the more probable it is that the value is a safe value of the parameter.

Test Oracle.
Since the functionality of these programs is not a concern of this paper, the standard versions are assumed to be correct. The standard and fault versions are then compiled and run with a test case $t$ as input; if their outputs differ, we consider that the test case triggers the fault, that is, the result of $t$ is fail; otherwise, the result of $t$ is pass.
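The oracle described above amounts to a differential comparison, which can be sketched in Python as follows. The experiment compiles and runs C programs; in this sketch two Python callables stand in for the standard and fault versions, and the names `oracle`, `standard_version`, and `fault_version` are hypothetical.

```python
def oracle(standard_version, fault_version, test_case):
    """Return "fail" if the two versions disagree on the test case, else "pass"."""
    expected = standard_version(*test_case)
    actual = fault_version(*test_case)
    return "fail" if actual != expected else "pass"


def standard_version(a, b):
    """Stand-in for the standard program, assumed to be correct."""
    return a + b


def fault_version(a, b):
    """Stand-in with an injected fault triggered only by a=2, b=1."""
    if (a, b) == (2, 1):
        return a - b
    return a + b


verdict_faulty_input = oracle(standard_version, fault_version, (2, 1))
verdict_other_input = oracle(standard_version, fault_version, (1, 1))
```

Only the input that reaches the injected fault produces differing outputs, so only that test case is reported as failing.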

Experiment I
(1) Experiment Objective. We use six C programs (comdline, count, nametbl, ntree, series, and tokens [12]) as test samples, with the input parameter models presented by Z. Zhang and J. Zhang [12]. Table 5 shows the basic information of these programs.
The second column gives the number of non-comment lines in each program, while column 3 gives their input models. For example, comdline $(9; (2^1, 3^4, 4^1, 6^2, 15^1))$ means comdline has 9 parameters, in which 1 parameter has 2 values, 4 parameters have 3 values, 1 parameter has 4 values, 2 parameters have 6 values, and 1 parameter has 15 values. Count $(6; (2, 2, 3, 3, 3, 3))$ can also be represented as count $(6; (2^2, 3^4))$.
(2) Result of the Experiment. Table 6 gives the detailed test results of the experiment; the data focus mainly on test steps (additional test cases) and radix. In Table 6, column 2 shows the number of test cases. Column 3 lists the different fault versions of each program. Column 4 shows the size of each canFIS. Column 5 gives the number of test cases needed for each canFIS. Columns 6∼11 give the number of $k$-way minimal fault interactions and the number of $k$-way fault interactions to be selected, respectively; $k$ is identified by the column title. For example, for the first fault version of comdline in Table 6, the size of its canFIS is 1663, and it needs 149 test cases. The numbers of 1∼5-way minimal fault interactions are 1, 0, 0, 0, and 0, respectively, and the number of minimal fault interactions of more than 5 ways is 0. The numbers of interactions tested for computing each canFIS are 1, 15, 20, 22, 27, and 64, respectively. Table 7 compares the experimental results with those of the FIC algorithm presented by Z. Zhang and J. Zhang [12].
Table 7 shows that in most situations comFIL produces a smaller minimal fault interaction set than FIC. However, to locate the minimal fault interactions in software, comFIL needs more additional test cases than the FIC algorithm. The main reason is that a test case that triggers the fault has $2^n - 1$ child interactions, where $n$ is the number of parameters; if the minimal fault interactions are spread across the test cases of the covering array, we cannot rule out, during the initial fault interaction detection, that most interactions in the child interaction set are fault interactions.
We may find that the result for ntree is not the minimal fault interaction set of CA. That is because its source code contains many "assert" statements. These statements make the program exit before the data flow reaches the bug; that is, the program does not satisfy Assumption 2. Therefore, the comFIL algorithm fails to identify the minimal fault interaction set of ntree.

Experiment II.
Generally, we consider a technique more effective if it needs fewer test cases while remaining accurate. In this section, in order to examine the complexity of comFIL and show the number of additional test cases needed under different testing models, two simulated experiments are examined. All the programs are specially designed to better match the 3 assumptions of comFIL.
(1) Simulated Experiment 1. In this simulated experiment, we developed a simple program, Animal, with the input model SUT$(7; (5^7))$; that is, the system has 7 parameters, and each of them has 5 values. We produce 5 fault versions by injecting faults at different positions; the versions have 2∼6 minimal fault interactions. The result of the experiment is shown in Figure 1. The histogram in red shows the radix of canFIS, while the data in green indicate the steps (additional test cases) needed to screen the canFIS.
The simulated experiment results presented in Figures 1 and 2 show that the number of additional test cases decreases as the program better matches the 3 assumptions of this paper. Furthermore, as the number of minimal fault interactions increases, the Ratio becomes smaller, which means a higher probability of obtaining safe values for the input parameters. However, when the number of minimal fault interactions reaches a certain value, such as 9 in Figure 2(b), the Ratio rises; this is because, when the number of input parameters becomes too large, too many additional test cases are generated, which affects efficiency and causes the Ratio to rise.

Experiment Conclusion.
From the results of both the real programs and the simulated experiments, we can draw the following conclusions about the comFIL algorithm.
(1) comFIL Could Obtain a Minimal Fault Interaction Set. From the result of Experiment I (Table 7), we can see that comFIL produces a smaller minimal fault interaction set than FIC.
(2) comFIL Has Higher Capability of Getting Safe Value for Parameters. The result of Experiment II shows that, as the number of minimal fault interactions increases, the Ratio becomes smaller, which means a higher probability of obtaining safe values for the input parameters.
(3) The Efficiency of comFIL Would Be Affected While the Number of Input Parameters Is Too Large. When the number of input parameters is too large, the number of additional test cases becomes very large; this is a limitation of comFIL. However, [14] presents an approach that leverages the notion of an inducing combination to locate faults inside the source code, and combining the two approaches could potentially benefit combinatorial testing.
(4) The Assumptions Are Common but Sometimes They Do Not Hold in Real-World Programs. Although the assumptions of comFIL are common in the combinatorial testing literature, they sometimes do not hold in real-world programs; this limits the application of the algorithm and needs further study.

Conclusion.
In this paper, we present a new combinatorial testing algorithm named comFIL, which screens out the minimal fault interaction set of a set of test cases.
The main contributions of this paper are as follows: (1) Summarizing the basic ideas of previous fault interaction location techniques, including their advantages and disadvantages.
(2) Proposing a novel fault interaction location method named comFIL (complete Fault Interaction Location), which is more capable and more precise than other fault location techniques. (3) Presenting the basic idea and theoretical model of comFIL and illustrating 2 key points of its implementation. (4) Using 6 sample programs and 2 additional simulated experiments to verify the precision and effectiveness of comFIL.
If we cannot obtain the safe value of each parameter before testing, the cost of generating an additional test case for an interaction is very high. However, if a program contains only a few bugs, the number of minimal fault interactions is small, and it is more probable that a parameter takes its safe value. When we calculate the safe values of each parameter with proper methods, it is almost impossible for the generated additional test cases to violate Assumption 3. Even for a test case generated randomly for an interaction, the probability that it violates Assumption 3 is rather low. So almost every testing method in combinatorial testing works effectively only when applied to programs with relatively few faults.
The theory and experiments indicate that comFIL is more accurate in fault localization than other combinatorial testing algorithms. However, comFIL also has deficiencies: (1) Assumption 3 is very strong and (2) the number of additional test cases to be generated is very large.

Future Work.
Future work will cover three aspects. (1) Algorithm comFIL generates many additional test cases, and the order in which interactions are tested influences how many are generated. Optimizing this order to reduce the number of additional test cases is therefore an interesting direction to explore. (2) The objective of comFIL is to generate the minimal fault interaction set; how to use this set to further locate bugs is also a significant topic for study. (3) Exploring more effective combinatorial testing methods for different types of software (such as web services) is also worth further study.

Figure 1: Test result of Simulated Experiment 1. (a) Ratio of steps and radix. (b) Step and radix.

Table 1: The input model of a simple system.
Algorithm 1: Inputs: CA (test case set) and the test results. Output: canFIS (the minimal fault interaction set of CA).

Table 2: Test result of 2-way coverage.
The program foo is tested on 2-way coverage in this paper. Test case set and test result are shown in Table 2. The algorithm comFIL uses Table

Table 3: The canFIS of CA.

Table 4: The steps of computing canFIS.

Table 6: Test result of standard program.

Table 7: Comparison with FIC.