Combining slicing and constraint solving for better debugging: the CONBAS approach

Although slices provide a good basis for analyzing programs during debugging, they cannot provide precise information about the most likely root causes of faults. Hence, a lot of work is left to the programmer during fault localization. In this paper, we present an approach that combines an advanced dynamic slicing method with constraint solving in order to reduce the number of delivered fault candidates. The approach is called Constraints Based Slicing (Conbas). The idea behind Conbas is to convert an execution trace of a failing test case into its constraint representation and to check whether it is possible to find values for all variables in the execution trace so that there is no contradiction with the test case. To do so, we make use of the correctness and incorrectness assumptions behind a diagnosis and of the given failing test case. Besides the theoretical foundations and the algorithm, we present empirical results and discuss future research. The obtained empirical results indicate an improvement of about 28% for the single-fault case and 50% for the double-fault case compared to dynamic slicing approaches.


Introduction
Debugging, that is, locating a fault in a program and correcting it, is a tedious and very time-consuming task that is mainly performed manually. Several approaches that aid the debugging process have been published. However, these approaches are hardly used by programmers, except for tools that allow setting breakpoints and observing the computation of variable values during execution. There are many reasons for this observation. In particular, most debugging tools do not integrate smoothly with software development tools. In addition, debuggers fail to identify unique root causes and still leave a lot of work for the programmer. Moreover, the approaches can also be computationally demanding, which prevents them from being used in an interactive manner. In this paper, we do not claim to solve all of the mentioned problems. We discuss a method that improves debugging results when using dependence-based approaches like dynamic slicing. Our method makes use of execution traces and the dependencies between the executed statements. In contrast to slicing, we also use symbolic execution to further reduce the number of potential root causes.
In order to introduce our method, we make use of the program numfun that implements a numeric function. The program and a test case are given in Figure 1. When executing the program using the test case, numfun returns a value of 10 for variable f, which contradicts the expectations. Since the value for variable g is computed correctly, we only need to consider the dependencies for variable f at position 6. Tracing back these dependencies, we are able to collect statements 6, 4, 3, 2, and 1 as possible fault candidates. Lines 7 and 5 can be excluded because neither line contributes to the computation of the value for f. The question now is whether we can do better. In the case of numfun, we are able to further exclude statements from the list of candidates. For example, if we assume that the statement in Line 4 is faulty, the value of y has to be computed in a different way. However, from Statement 7, the value of y = 6 can be derived using the value of z = 6 and the expected outcome g = 12. Knowing the value of y, we are immediately able to derive a value for f = 6 + 6 - 2 = 10, and again we obtain a contradiction with the expected value. As a consequence, the assumption that Line 4 alone is faulty cannot be true and must be retracted.
Using the approach of assuming correctness and incorrectness of statements and proving consistency with the expected values, we are able to reduce the diagnosis candidates to Lines 1, 2, 3, and 6. It is also worth mentioning that we would be able to remove statements 1 and 2 from the list of candidates as well. For this purpose, we only have to check whether a different value of cond would lead to a different outcome or not. For this example, assuming cond to be false also leads to an inconsistency. However, when using such an approach, possible alternative paths have to be considered. This extension makes such an approach computationally more demanding.
From the example we are able to summarize the following findings. (1) Using data and control dependences reduces the number of potential root causes. Statements that have no influence on faulty variables can be ignored. (2) During debugging, we make assumptions about the correctness or incorrectness of statements. From these assumptions, we try to predict a certain behavior, which should not be in contradiction with the expectations. (3) Backward reasoning, that is, deriving values for intermediate variables from output variables and other variables, is essential to further reduce the number of fault candidates. In this case, statements are not interpreted as functions that change the state of the program but as equations. The interpretation of statements as equations allows us to compute the input value from the output value. (4) Further reductions of fault candidates can be obtained when choosing alternative execution paths, which is computationally more demanding than considering the execution trace of a failing test case only.
In this paper, we introduce an approach that formalizes findings (1)-(3). However, finding (4) is not taken into consideration because of the resulting computational overhead. In particular, we focus on providing a methodology that allows for reducing the size of dynamic slices. Reducing the size of dynamic slices improves the debugging capabilities of dynamic slicing. We introduce the basic definitions and algorithms and present empirical results. We gain a reduction of more than 28% compared to results obtained from pure slicing. Although the approach increases the time needed for identifying potential root causes, the overhead can be neglected at least for smaller programs. It is also worth noting that we do not claim that the proposed approach is superior to all other debugging approaches. We believe that a combination of approaches is necessary in practice. Our contribution to the field of debugging is in improving dynamic slicing with respect to the computed number of bug candidates. This paper is based on previous work [1], where the general idea is explained. Now, we focus on the theoretical foundations and an extended empirical evaluation. In the remainder of this paper, we discuss related work in Section 2. We introduce the basic definitions, that is, the used language, execution traces, relevant slicing, model-based debugging, and the conversion into a constraint system, in Section 3. We formalize our approach, named constraints based slicing (Conbas), in Section 4. In Section 5, we apply Conbas to several example programs. We show that Conbas is able to reduce the size of slices for programs with single and double faults without losing the fault localization capabilities. In addition, we apply Conbas to circuits. In Section 6, we discuss the benefits and limitations of the approach as well as future work. Finally, we conclude the paper in Section 7.

Related Research
Software debugging techniques can be divided into fault localization and fault correction techniques. Fault localization techniques focus on narrowing down possible fault locations. They comprise spectrum-based fault localization, delta debugging, program slicing, and model-based software debugging.
(i) Spectrum-based fault localization techniques are based on an observation matrix. Observation matrices comprise the program spectra of both passing and failing test cases. Harrold et al. [2] give an overview of different types of program spectra. A high similarity of a statement to the error vector indicates a high probability that the statement is responsible for the error [3]. There exist several similarity coefficients to numerically express the degree of similarity, for example, Zoeteweij et al. [4] and Jones and Harrold [5]. Empirical studies [3,6] have shown that the Ochiai coefficient performs best. Several techniques have been developed that combine spectrum-based fault localization with other debugging techniques, for example, Barinel [7] and Deputo [8].
(ii) Delta debugging [9] is a technique that can be used to systematically minimize a failure-inducing input.
The basic idea of delta debugging is that the smaller the failure-inducing input, the less program code is covered. Zeller et al. [10,11] adopted delta debugging to directly use it for debugging.
(iii) Program slicing [12] narrows down the search range of potentially faulty statements by means of data and control dependencies. A slice is a subset of program statements that directly or indirectly influence the values of a given set of variables at a certain program line. A slice behaves like the original program for the variables of interest. Slices will be discussed in detail in Section 2.1.
(iv) Model-based software debugging derives from model-based diagnosis, which is used for locating faults in physical systems. Mayer and Stumptner [13] give an overview of existing model-based software debugging techniques. Some of these techniques will be discussed in detail in Section 2.2.
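The similarity coefficients mentioned under (i) can be illustrated with a small computation. The following is a minimal sketch of the Ochiai coefficient, assuming a binary observation matrix and error vector; the function name `ochiai_scores` and the example spectra are hypothetical and not taken from the cited work.

```python
import math

def ochiai_scores(matrix, errors):
    """Ochiai similarity of each statement column to the error vector.

    matrix[i][j] is 1 if test i executed statement j, else 0.
    errors[i] is 1 if test i failed, else 0.
    """
    scores = []
    for j in range(len(matrix[0])):
        # a11: executed and failed, a10: executed and passed,
        # a01: not executed and failed.
        a11 = sum(1 for row, e in zip(matrix, errors) if row[j] and e)
        a10 = sum(1 for row, e in zip(matrix, errors) if row[j] and not e)
        a01 = sum(1 for row, e in zip(matrix, errors) if not row[j] and e)
        denom = math.sqrt((a11 + a01) * (a11 + a10))
        scores.append(a11 / denom if denom else 0.0)
    return scores

# Hypothetical spectra: three tests (rows) over three statements (columns).
matrix = [[1, 1, 0],
          [1, 0, 1],
          [1, 1, 1]]
errors = [1, 0, 1]  # tests 0 and 2 fail
scores = ochiai_scores(matrix, errors)
# Statement indices ordered from most to least suspicious.
ranking = sorted(range(len(scores)), key=lambda j: -scores[j])
```

In this toy setting, the statement executed by exactly the failing tests receives the highest score, which is the intuition behind ranking statements by their similarity to the error vector.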
Fault correction techniques focus on finding solutions to eliminate an observed misbehavior. They comprise, for instance, genetic programming techniques.
Genetic programming deals with the automatic repair of programs by means of mutations on the program code. Arcuri [14] and Weimer et al. [15,16] deal with this debugging technique. Genetic programming often uses fault localization techniques as a preprocessing step.
In the following, we will discuss techniques that are related to our approach, that is, slicing and model-based debugging, in detail.

Slicing.
Weiser [12] introduced static program slicing as a formalization of reasoning backwards. The basic idea here is to start from the failure and to use the control and data flow of the program in the backward direction in order to reach the faulty location. Static program slices tend to be rather large. For this reason, Korel and Laski [17] introduced dynamic program slicing, which relies on a concrete program execution. Dynamic slicing significantly reduces the size of slices. Occasionally, statements that are responsible for a fault are absent from the slice. This happens when the fault causes the nonexecution of some parts of the program. Relevant slicing [18] is a variant of dynamic slicing that eliminates this problem. A notable alternative to relevant slicing is the method published by Zhang et al. [19]. This method introduces the concept of implicit dependencies. Implicit dependencies are obtained by predicate switching. They are the analog of potential data dependencies in relevant slicing. The obtained slices are smaller since the use of implicit dependencies avoids a large number of potential dependencies.
Sridharan et al. [20] identify data structures as the main reason for overly large slices. They argue that data structures provided by standard libraries are well tested and thus are rarely responsible for observed misbehavior. Their approach, called Thin Slicing, removes such statements from slices.
Gupta et al. [21] present a technique that combines delta debugging with forward and backward slices. Delta debugging is used to find the minimal failure-inducing input. Forward and backward slices are computed for the failure-inducing input. The intersection of the forward and backward slices results in the failure-inducing chop. This technique requires a test oracle in contrast to Conbas.
Zhang et al. [22] introduced a technique that reduces the size of dynamic slices via confidence values. These confidence values represent the likelihood that the corresponding statement computed the correct value. Statements with a high confidence value are excluded from the dynamic slice. Similar to Conbas, this approach requires only one failing execution trace. However, it requires one output variable with a wrong value and several output variables where the computed output is correct. In contrast, Conbas requires at least one output variable with a wrong value, but no correctly computed output variables.
Other notable work of Zhang et al. includes [23-25]. In [24], they discuss how to reduce the time and space required for saving dynamic slices. In [23], they evaluate the effectiveness of dynamic slicing for locating real bugs and find that most of the faults can be captured by considering only data dependencies. In [25], they deal with the problem of handling dynamic slices of long-running programs.
Jeffrey et al. [26] identify potential faulty statements via value replacement. They systematically replace the values used in statements so that the computed output becomes correct. The original value and the new value are stored as an interesting value mapping pair (IVMP). They state that IVMPs typically occur at faulty statements or statements that are directly linked via data dependences to faulty statements. They limit the search space for the value replacement to values used in other test cases. We do not limit the search space to values used in other test cases. Instead, our constraint solver determines if there exist any values for the variables in an abnormal statement so that the correct values for the test cases can be computed. On the one hand, our approach is computationally more expensive, but on the other hand it does not depend on the quality of other test cases. As Jeffrey et al. stated, the presence of multiple faults can diminish the effectiveness of the value replacement approach. In contrast, the Conbas approach is designed for handling multiple faults.
Many other slicing techniques have been published. For a deeper analysis on slicing techniques the reader is referred to Tip [27] for slicing techniques in general and to Korel and Rilling [28] for dynamic slicing techniques.

Model-Based Software Debugging.
Our work builds on the work of Reiter [29] and Wotawa [30]. Reiter describes the combination of slicing with model-based diagnosis. Wotawa proves that conflicts used in model-based diagnosis for computing diagnoses are equivalent to slices for variables where the expected output value is not equivalent to the computed one.
Nica et al. [31] suggest an approach that reduces the number of diagnoses by means of program mutations and the generation of distinguishing test cases. Our approach and [31] differ in two major aspects. First, our approach uses the execution trace instead of the source code. Thus, we do not have to explicitly unroll loops. Second, we use a constraint solver to check whether a solution can be found. The diagnosis candidates are previously computed via the hitting set algorithm. In contrast, Nica et al. [31] use a constraint solver to obtain the diagnoses directly.
Wotawa et al. [32] present a model-based debugging approach which relies on a constraint solver as well. They show how to formulate a debugging problem as a constraint satisfaction problem. Similar to [31], they use source code instead of execution traces. Other related work includes research in applying model-based diagnosis for debugging directly, for example, [33,34] and research in applying constraints for the same purpose, for example, [35].

Basic Definitions
In this section, we introduce the basic definitions that are necessary for formalizing our approach. Without restricting generality, we make some simplifying assumptions such as reducing the programming language to a C-like language and ignoring arrays and pointers. However, this language is still Turing-complete. The reason for the simplification is to focus on the underlying ideas instead of solving purely technical details. We start this section with a brief introduction of the underlying programming language L. We define execution traces and dynamic slices formally. Afterwards, we define test cases and the debugging problem. Finally, we introduce model-based software debugging and the conversion of execution traces into their constraint representation.

3.1.
Language. The syntax definition of L is given in Figure 2. The start symbol of the grammar in Backus-Naur form (BNF) is P. A program comprises a sequence of statements B. In L, we distinguish three different types of statements: (1) the assignment statement, (2) the if-then-else statement, and (3) the while statement. In the following, we will refer to if-then-else statements and while statements as conditional statements (or conditionals) and to the conditions in conditional statements as test elements. The left side of an assignment has to be a variable (id). The name of the variable can be any word except a keyword. An expression is either an integer (num), a truth value (true, false), a variable, or two expressions concatenated with an operator. An integer optionally starts with a "−" (if a negative integer is represented) followed by a sequence of digits (0, 1, . . . , 9). We do not introduce data types, but we assume that the use of Boolean and integer values follows the usual expected type rules. We further assume that comments start with // and go to the end of a line. The program numfun (Figure 1) gives an example of a program written in L.
After defining the syntax of L, we have to define its semantics. In this section, we rely on an operational definition. For this purpose, we introduce an interpretation function ⟦·⟧ : L × Σ → Σ ∪ {⊥}, which maps programs and states to new states or the undefined value ⊥. In this definition, Σ represents the set of all states. A concrete state ω ∈ Σ specifies values for the variables used in the program. We also call a state a variable environment. Hence, ω itself is a function ω : VARS → DOM, where VARS denotes the set of variables and DOM its domain comprising all possible values. Note that we also represent ω ∈ Σ as a set of pairs (x, v), where x ∈ VARS is a variable and v ∈ DOM is its value.
The definition of the semantics of L is given in Figure 3. We first discuss the semantics for conditions and expressions. For this purpose, we assume that num (id) represents the lexical value of the token num (id) used in the definition of the grammar. An integer num is evaluated to its corresponding value num* ∈ Z_M, and the truth values are evaluated to their corresponding values in B. A variable id is evaluated to its value specified by the current variable environment ω. Expressions with operators are evaluated according to the semantics of the used operator. After defining the semantics of the expressions, we define the semantics of the statements in L in a similar manner. A sequence of statements S_1 . . . S_n, that is, the program itself or a sub-block of a conditional or while statement, is evaluated by executing the statements S_1 to S_n in the given order. Each statement might change the current state of the program. An assignment statement changes the state for a given variable. All other variables remain unchanged. An if-then-else statement allows for selecting a certain path (via block B_1 or B_2) based on the evaluation of the condition. A while statement executes its block B until the condition evaluates to false. Therefore, the formal definition of its semantics is very similar to the semantics definition of an if-then-else statement without an else-branch. In order to finalize the definition of the semantics of L, we assume that if a program does not terminate or in case of a division by zero, the semantics function returns ⊥. Moreover, we assume that all variable values necessary to execute the program are known and defined in ω ∈ Σ.
When executing the program numfun (Figure 1) on the input state ω_I, the semantics function ⟦·⟧ returns a state {. . . , (e, 2)}, where the value for f contradicts the expected value for the same variable.
When obtaining a result that contradicts the expectations, one is interested in finding and fixing the fault, that is, locating the statements that are responsible for the faulty computation of a value and correcting them. Weiser [12] introduced the idea of supporting this process by using the dependence information represented in the program. Weiser's approach identifies those parts of the program that contribute to faulty computations. Weiser called these parts a slice of the program. In this paper, we use extensions of Weiser's static slicing approach and consider the dynamic case, where only statements that are executed in a particular test run are considered. In order to define dynamic slices [17] and, further on, relevant slices [18], we first introduce execution traces.
Definition 1 (execution trace). An execution trace of a program Π ∈ L and an input state ω ∈ Σ is a sequence s_1, . . . , s_k, where s_i ∈ Π is a statement that has been executed when running Π on test input ω, that is, calling ⟦Π⟧ω.
For our running example numfun, the execution trace for the input ω_I is illustrated in Figure 4 and comprises the statements 1-7.
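To make the semantics function and Definition 1 more concrete, the following is a minimal sketch of an interpreter that records an execution trace. It rests on several assumptions that are not part of the paper: programs are encoded as nested Python tuples, expressions are written in Python syntax and evaluated with `eval`, the undefined value ⊥ is represented by `None`, and nontermination is approximated by a hypothetical step budget `max_steps`. The example program is a toy loop, not numfun.

```python
def run(program, env, max_steps=10_000):
    """Execute a program of the toy language; return (state, trace).

    The state is None (standing in for the undefined value) on division
    by zero or when the step budget is exhausted.
    """
    state = dict(env)
    trace = []  # line numbers of executed statements, in order

    def record(line):
        if len(trace) >= max_steps:
            raise RuntimeError("step budget exhausted")
        trace.append(line)

    def ev(expr):
        # Assumption: expressions use Python syntax over state variables.
        return eval(expr, {"__builtins__": {}}, state)

    def exec_block(block):
        for stmt in block:
            kind, line = stmt[0], stmt[1]
            record(line)
            if kind == "assign":
                state[stmt[2]] = ev(stmt[3])
            elif kind == "if":
                exec_block(stmt[3] if ev(stmt[2]) else stmt[4])
            elif kind == "while":
                while ev(stmt[2]):
                    exec_block(stmt[3])
                    record(line)  # the condition is evaluated again

    try:
        exec_block(program)
    except (ZeroDivisionError, RuntimeError):
        return None, trace
    return state, trace

# Hypothetical program: sum of 0 .. n-1.
prog = [
    ("assign", 1, "i", "0"),
    ("assign", 2, "s", "0"),
    ("while", 3, "i < n", [
        ("assign", 4, "s", "s + i"),
        ("assign", 5, "i", "i + 1"),
    ]),
]
state, trace = run(prog, {"n": 2})
```

Note that a statement executed several times appears several times in the trace, which is why a trace can be longer than the program it stems from.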
We now define dependence relations more formally. For this purpose, we introduce the functions DEF and REF, where DEF returns a set of variables defined in a statement and REF returns a set of variables referenced (or used) in the statement. Note that DEF returns the empty set for conditional statements and a set representing the variable on the left side of an assignment statement. Using these functions we define data dependencies as follows.
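Definition 2 (data dependence). Given an execution trace s_1, . . . , s_k for a program Π and an input state ω ∈ Σ, an element of the execution trace s_j is data dependent on another element s_i with i < j, that is, s_i →_D s_j, if and only if there exists a variable x ∈ DEF(s_i) ∩ REF(s_j) and there is no element s_m with i < m < j such that x ∈ DEF(s_m).
Besides data dependences, we have to deal with control dependences representing the control flow of a given execution trace. In L, there are only if-then-else and while statements that are responsible for the control flow. Therefore, we only have to consider these two types of statements.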
Definition 3 (control dependence). Given an execution trace s_1, . . . , s_k for a program Π and an input state ω ∈ Σ, an element of the execution trace s_j is control dependent on a conditional statement s_i with i < j, that is, s_i →_C s_j, if and only if the execution of s_i causes the execution of s_j.
In the previous definition, the term cause has to be interpreted very rigorously. If the condition of the while statement evaluates to TRUE, then all statements of the outermost sub-block of the while statement are control dependent. If the condition evaluates to FALSE, no statement is control dependent because the first statement after the while statement is always executed regardless of the evaluation of the condition. Please note that we do not consider infinite loops. They are not in the scope of this paper. For if-then-else statements, the interpretation is similar. If the condition of an if-then-else statement evaluates to TRUE, the statements of the then-block are control dependent on the conditional statement. If it evaluates to FALSE, the statements of the else-block are control dependent on the conditional statement. Note that in case of nested while statements or if-then-else statements, the control dependencies are not automatically assigned for the blocks of the inner while statements or if-then-else statements. Figure 5 shows the execution trace for our running example where the data and control dependencies have been added. Alternatively, the execution trace including the dependences can be represented as a directed acyclic graph, the corresponding execution trace graph (ETG).
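The dependence relations above can be computed in a single pass over an execution trace. The following is a minimal sketch under stated assumptions: each trace element is a hypothetical `Step` record that carries its DEF and REF sets and, as a simplification, the trace index `ctrl` of the conditional that caused its execution (a real implementation would derive this from the program structure). The example trace is invented for illustration and is not numfun.

```python
from typing import NamedTuple, Optional

class Step(NamedTuple):
    line: int                   # statement identifier
    defs: frozenset             # DEF(s): variables defined by the statement
    refs: frozenset             # REF(s): variables referenced by the statement
    ctrl: Optional[int] = None  # trace index of the controlling conditional

def dependences(trace):
    """Return (data, control) as sets of (i, j) trace-index pairs."""
    last_def = {}  # variable -> trace index of its most recent definition
    data, control = set(), set()
    for j, step in enumerate(trace):
        # References are resolved before the step's own definitions,
        # so "x = x + 1" depends on the previous definition of x.
        for x in step.refs:
            if x in last_def:
                data.add((last_def[x], j))
        if step.ctrl is not None:
            control.add((step.ctrl, j))
        for x in step.defs:
            last_def[x] = j
    return data, control

# Hypothetical trace: 1: x = a + 1; 2: if x > 0; 3: y = x * 2; 5: z = y + b
trace = [
    Step(1, frozenset({"x"}), frozenset({"a"})),
    Step(2, frozenset(), frozenset({"x"})),
    Step(3, frozenset({"y"}), frozenset({"x"}), ctrl=1),
    Step(5, frozenset({"z"}), frozenset({"y", "b"})),
]
data, control = dependences(trace)
```

The resulting edge sets correspond to the arcs of an execution trace graph; input variables such as `a` and `b` have no defining step and therefore induce no edge.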
In addition to data and control dependencies, we make use of potential data dependencies in relevant slicing [18]. In brief, a potential data dependency occurs whenever the evaluation of a conditional statement causes some statements that potentially change the value of a variable not to be executed. Ignoring such potential data dependencies might lead to slices where the faulty statements are missing.
[Figure: execution trace with control dependences and data dependences; first statement: 1. cond = a > 0 and b > 0 and c > 0 and d > 0 and e > 0;]
Definition 4 (potential relevant variables). Given a conditional (while or if-then-else) statement n, the potential relevant variables are a function PR that maps the conditional statement and a Boolean value to the set of all defined variables in the block of n that is not executed because the corresponding condition of n evaluates to TRUE or FALSE.
The previous definition requires all defined variables to be elements of the set of potential relevant variables under a certain condition. This means that if there are other while statements or if-then-else statements in a sub-block, the defined variables of all their sub-blocks must be considered as well. For the sake of clarity, Table 1 summarizes the definition of potential relevant variables.
Based on the definition of potential relevant variables, the definition of potential data dependences is straightforward.
Definition 5 (potential data dependence). Given an execution trace s_1, . . . , s_k for a program Π and an input state ω ∈ Σ, an element of the execution trace s_j is potentially data dependent on a test element s_i with i < j, which evaluates to TRUE (FALSE), that is, s_i →_P s_j, if and only if there is a variable x ∈ PR(s_i, TRUE) (x ∈ PR(s_i, FALSE)) that is referenced in s_j and not redefined between i and j.
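Definitions 4 and 5 can be sketched as follows. The tuple encoding of statements, the function names, and the treatment of while statements (no potential relevant variables when the condition evaluates to TRUE) are assumptions made for this illustration; Table 1 of the paper remains the authoritative definition. The example program is hypothetical.

```python
def defined_vars(block):
    """All variables defined in a block, including nested sub-blocks."""
    out = set()
    for stmt in block:
        if stmt[0] == "assign":    # ("assign", line, var, expr)
            out.add(stmt[2])
        elif stmt[0] == "if":      # ("if", line, cond, then_block, else_block)
            out |= defined_vars(stmt[3]) | defined_vars(stmt[4])
        elif stmt[0] == "while":   # ("while", line, cond, body)
            out |= defined_vars(stmt[3])
    return out

def PR(stmt, value):
    """Variables defined in the block that is skipped for this outcome."""
    if stmt[0] == "if":
        return defined_vars(stmt[4] if value else stmt[3])
    if stmt[0] == "while":
        return set() if value else defined_vars(stmt[3])
    raise ValueError("PR is only defined for conditional statements")

def potential_dependences(trace):
    """trace: list of (defs, refs, pr) triples; pr is empty for assignments."""
    deps = set()
    for i, (_, _, pr) in enumerate(trace):
        live = set(pr)  # potential relevant variables not yet redefined
        for j in range(i + 1, len(trace)):
            if not live:
                break
            defs_j, refs_j, _ = trace[j]
            if live & refs_j:
                deps.add((i, j))
            live -= defs_j
    return deps

# Hypothetical program: 1: x = 0; 2: if c > 0 { 3: x = 1 }; 4: y = x
cond_stmt = ("if", 2, "c > 0", [("assign", 3, "x", "1")], [])
# Executed trace for c = 0: statements 1, 2, 4.
trace = [
    ({"x"}, set(), set()),                 # 1: x = 0
    (set(), {"c"}, PR(cond_stmt, False)),  # 2: condition evaluates to FALSE
    ({"y"}, {"x"}, set()),                 # 4: y = x
]
deps = potential_dependences(trace)
```

Here statement 4 is potentially data dependent on the test element in statement 2: had the condition evaluated differently, the skipped assignment to x would have changed the value read by y = x.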
After defining the dependence relations of a program that is executed on a given input state, we are able to formalize relevant slices, which are used later in our approach.
Definition 6 (relevant slice). A relevant slice S of a program Π ∈ L for a slicing criterion (ω, x, n), where ω ∈ Σ is an input state, x is a variable, and n is a line number in the execution trace, comprises those parts of Π that contribute to the computation of the value for x at the given line number n.
We assume that a statement contributes to the computation of a variable value if there is a dependence relation. Hence, computing slices can be done by following the dependence relations in the ETG. Algorithm 1 RelevantSlice (ET, Π, x, n) computes the relevant slice for a given execution trace and a given variable at the execution trace position n. The program Π is required for determining the potential data dependences.
The relevant slice is likely to be smaller than the execution trace, in which a statement might occur more than once. In our approach, we use relevant slices for restricting the search space for root cause identification.
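Following the dependence relations backwards in the ETG amounts to a reachability computation. The sketch below is not Algorithm 1 of the paper; it is a simplified illustration that assumes the data, control, and potential data dependences have already been merged into one set of `(i, j)` edges over trace positions, and that the slicing criterion has been resolved to a start position in the trace. All names and the example are hypothetical.

```python
def backward_slice(lines, edges, start):
    """Collect the statements that the trace position `start` depends on.

    lines[k] is the statement identifier of trace position k.
    edges contains pairs (i, j) meaning position j depends on position i.
    """
    preds = {}
    for i, j in edges:
        preds.setdefault(j, set()).add(i)
    seen, stack = set(), [start]
    while stack:
        j = stack.pop()
        if j in seen:
            continue
        seen.add(j)
        stack.extend(preds.get(j, ()))
    # Map trace positions back to statements; duplicates collapse here.
    return sorted({lines[k] for k in seen})

# Hypothetical trace of statements 1, 2, 3, 5, 6 with merged dependences.
lines = [1, 2, 3, 5, 6]
edges = {(0, 1), (0, 2), (1, 2), (2, 3)}
slice_for_z = backward_slice(lines, edges, start=3)
```

Statement 6 is executed but has no dependence path to the chosen position, so it does not appear in the slice.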

The Debugging Problem.
Using the definition of L together with the definition of test cases and test suites, we are able to formally state the debugging problem. Hence, we first have to define test cases and test suites. We do not discuss testing in general. Instead, we refer the interested reader to the standard textbooks on testing, for example, [36]. In the context of our paper, a test case comprises information about the values of input variables and some information regarding the expected output. In principle, it is possible to define expected values for variables at arbitrary positions in the code. For reasons of simplicity, we do not make use of such an extended definition. When using the definition of passing and failing test cases, we are able to partition a test suite into two disjoint sets comprising only passing (PASS) and only failing (FAIL) test cases, respectively, that is, TS = PASS ∪ FAIL and PASS ∩ FAIL = ∅. Formally, we define these two subsets as follows: [...] For a failing test case, we call the variables x_1, . . . , x_k whose computed values contradict the expected values conflicting variables. The set of conflicting variables for a test case t is denoted by CV(t). If the test case t is a passing test case, the set CV(t) is defined to be empty. Using these definitions, we define the debugging problem.
Definition 9 (debugging problem). Given a program Π ∈ L and a test suite TS, the problem of identifying the root cause for a failing test case t ∈ TS in Π is called the debugging problem.
A solution for the debugging problem is a set of statements in a program Π that are responsible for the conflicting variables CV(t). The identified statements in a solution have to be changed in order to turn all failing test cases into passing test cases for the corrected program.

Model-Based Debugging.
In the introduction, we mentioned that correctness assumptions are the key for fault localization. Therefore, a technique for diagnosis that is based on such assumptions would be a good starting point for debugging. Indeed, such methodology can be found in artificial intelligence. Reiter [29] introduced the theoretical foundations of model-based diagnosis (MBD) where a model that captures the correct behavior of components is used together with observations for diagnosis. The underlying idea of MBD is to formalize the behavior of each component C in the form ¬AB(C) → BEHAV(C). The predicate AB stands for abnormal and is used to state the incorrectness of a component. Hence, when C is correct, ¬AB(C) has to be true and the behavior of C has to be valid. In debugging, we make use of the same underlying idea. Instead of dealing with components, we now have statements, and the behavior of a statement is given by a formal representation of the statement's source code. We use constraints as a representation language for this purpose.
In the following we adapt Reiter's definition of diagnosis [29] for representing bug candidates in the context of debugging.
Definition 10 (diagnosis). Given a formal representation SD of a program Π ∈ L, where the behavior of each statement s i is represented as ¬AB(s i ) → BEHAV(s i ) and a failing test case (I, O), a diagnosis Δ (or bug candidate) is a subset of the set of statements of Π such that SD ∪ {I, O} ∪ {¬AB(s)|s ∈ Π \ Δ} ∪ {AB(s)|s ∈ Δ} is satisfiable.
In this definition of a diagnosis, the representation of programs (or execution traces) and failing test cases is not included. Furthermore, a formalism that allows for checking satisfiability is presupposed. However, the definition states exactly that we have to find a set of correctness assumptions that does not lead to a contradiction with respect to the given test case. We do not want to discuss all the consequences of this definition and refer the interested reader to [29,37,38]. In the following, we explain how to obtain a model for a particular execution trace and how to represent failing test cases.

Advances in Software Engineering
The representation of programs for our model-based approach is motivated by previous work [31,32]. In [31,32], all possible execution paths up to a specified size are represented as a set of constraints. In contrast, we now only represent the current execution path. In this case, the representation becomes smaller and the modeling itself is much easier, since only testing actions and assignments are part of an execution trace. On the downside, we lose information and are not able to eliminate candidates that belong to testing actions. Hence, in the proposed approach, we expect improvements of debugging results compared to slicing. Even though we cannot match the results obtained with model-based debugging approaches like that of Wotawa et al. [32], our approach requires less runtime.
Modeling for model-based debugging in the context of this paper comprises two steps. In the first step, we convert an execution trace of a program for a given test case to its static single assignment form (SSA) [39]. In the second step, we use the SSA representation and map it to a set of constraints. When using a constraint representation, checking for consistency becomes a constraint satisfaction problem (CSP). A constraint satisfaction problem is a tuple (V , D, CO) where V is a set of variables defined over a set of domains D connected to each other by a set of arithmetic and Boolean relations, called constraints CO. A solution for a CSP represents a valid instantiation of the variables V with values from D such that none of the constraints from CO is violated. We refer to Dechter [40] for more information on constraints and the constraint satisfaction problem. Now, we explain the mapping of program execution traces into their constraint representations in detail. We start with the conversion into SSA form. The SSA form is an intermediate representation of a program with the property that no two left-side variables share the same name. The SSA form can be easily obtained from an execution trace by adding an index to each variable. Every time a variable is redefined, the value of the index gets incremented such that the SSA form property holds. Every time a variable is referenced, the current index is used. Note that we always start with the index 0. Algorithm 2 formalizes the conversion of execution traces ET into their SSA form.
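The index-based renaming described above can be sketched in a few lines. This is an illustration rather than Algorithm 2 itself; it assumes that trace elements are encoded as tuples, that expressions are plain strings, and that identifiers can be found with a simple regular expression. The names `to_ssa`, `KEYWORDS`, and the example trace are hypothetical.

```python
import re
from collections import defaultdict

KEYWORDS = {"and", "or", "not", "true", "false"}
IDENT = re.compile(r"[A-Za-z_]\w*")

def to_ssa(trace):
    """Convert a trace of ("assign", var, expr) / ("test", cond) elements."""
    index = defaultdict(int)  # every variable starts with index 0

    def rename(expr):
        # Each referenced variable gets its current index as a suffix.
        def repl(match):
            name = match.group(0)
            return name if name in KEYWORDS else f"{name}_{index[name]}"
        return IDENT.sub(repl, expr)

    ssa = []
    for stmt in trace:
        if stmt[0] == "assign":
            _, var, expr = stmt
            rhs = rename(expr)  # rename uses before incrementing the index
            index[var] += 1     # a redefinition gets a fresh index
            ssa.append(("assign", f"{var}_{index[var]}", rhs))
        else:
            ssa.append(("test", rename(stmt[1])))
    return ssa

# Hypothetical trace: x = a + 1; if x > 0; x = x * 2; y = x + b
trace = [
    ("assign", "x", "a + 1"),
    ("test", "x > 0"),
    ("assign", "x", "x * 2"),
    ("assign", "y", "x + b"),
]
ssa = to_ssa(trace)
```

Because the right-hand side is renamed before the index of the defined variable is incremented, a statement such as x = x * 2 reads the old version of x and defines a new one, which keeps all left-side names unique.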
The application of the SSA algorithm on the execution trace of our running example numfun delivers the following execution trace:
In the second step, the SSA form of the execution trace is converted into constraints. Instead of using a specific language of a constraint solver, we make use of mathematical equations. In order to distinguish equations from statements, we use == to represent the equivalence relation. Algorithm 3 formalizes this conversion. In the algorithm, we make use of a global function index that maps each element of ET to a unique identifier representing its corresponding statement. Such a unique identifier might be the line number where the statement starts. Note that in Algorithm 3 we represent each statement of the execution trace using a logical formula of the form AB(...) ∨ Constraint, which is logically equivalent to ¬AB(...) → Constraint. Moreover, Constraints(ET, (I, O), ssa) also converts the given test case.
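The formula AB(...) ∨ Constraint can be illustrated with a small Python sketch. It is a simplified stand-in for Algorithm 3 under stated assumptions: constraints are kept as expression strings and evaluated with Python's eval, conditions are stored in the form in which they held on the executed path, and the names build_constraints and consistent as well as the example trace are hypothetical.

```python
def build_constraints(ssa_trace, inputs, outputs, index):
    """Return (line, text) pairs, each read as AB(line) ∨ text.

    Test-case constraints use the line None, so they can never be
    assumed abnormal.
    """
    constraints = []
    for line, target, expr in ssa_trace:
        text = expr if target is None else f"{target} == ({expr})"
        constraints.append((line, text))
    for var, value in inputs.items():    # inputs bind index 0
        constraints.append((None, f"{var}_0 == {value!r}"))
    for var, value in outputs.items():   # outputs bind the largest index
        constraints.append((None, f"{var}_{index[var]} == {value!r}"))
    return constraints

def consistent(constraints, values, abnormal):
    """Check one candidate valuation: every constraint either belongs to
    an abnormal statement or holds for the given values."""
    env = dict(values)
    return all(
        line in abnormal or eval(text, {"__builtins__": {}}, env)
        for line, text in constraints
    )

# Hypothetical SSA trace with inputs a = 1, b = 2 and expected output y = 4.
SSA_TRACE = [
    (1, "x_1", "a_0 + b_0"),
    (2, None, "x_1 > 0"),
    (3, "x_2", "x_1 * 2"),
    (4, "y_1", "x_2 - a_0"),
]
INDEX = {"a": 0, "b": 0, "x": 2, "y": 1}
cons = build_constraints(SSA_TRACE, {"a": 1, "b": 2}, {"y": 4}, INDEX)
values = {"a_0": 1, "b_0": 2, "x_1": 3, "x_2": 6, "y_1": 4}
print(consistent(cons, values, set()))  # no statement assumed abnormal
print(consistent(cons, values, {4}))    # statement 4 assumed abnormal
```

With no abnormal statement, the expected output contradicts the constraint of line 4; once line 4 is assumed abnormal, its constraint is disabled and the valuation becomes consistent.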
Applying Constraints(ET, (I, O), ssa) to the SSA form of the execution trace of the numfun program yields the following constraints:

The Conbas Algorithm
In this section, we present our approach, Conbas. The basic idea of Conbas is to reduce the size of summary slices by computing minimal diagnoses. Minimal diagnoses are computed by combining the statements of the slices of the faulty variables of a single test case as follows.
(1) Each diagnosis must contain at least one element of every slice.
(2) If a diagnosis is a proper superset of another diagnosis, the superset diagnosis is skipped.
The remaining diagnoses are further reduced with the aid of a constraint solver. For doing so, the execution trace of a failing test case is converted into constraints. The constraint solver checks for satisfiability of the converted execution trace assuming that the statements of the diagnosis are incorrect. Figure 6 gives an overview of the Conbas approach. Algorithm 4 explains the Conbas approach in detail. The function Run(Π, t) executes a test case t on a program Π.
Excerpt from Algorithm 2 (SSA conversion), applied to each statement s_j of the execution trace:
(4) If s_j is an assignment statement of the form x = E then
(5) Let E′ be E where all variables y ∈ E are replaced with y_index(y).
(6) Let index(x) be index(x) + 1.
(7) Add x_index(x) = E′ to the end of the sequence ET′.
(8) else
(9) Let s′ be the statement s_j where all variables y occurring in s_j are replaced with y_index(y).
Opening step of the subsequent algorithm listing:
(1) Let CO be the empty set.
A hitting set d w.r.t. CO is minimal if there exists no proper subset of d that is a valid hitting set w.r.t. CO. Minimal hitting sets can be computed by means of the corrected Reiter algorithm [29,41].
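To make the definition concrete, the following Python sketch enumerates minimal hitting sets by brute force. It is not the corrected Reiter algorithm [29,41], which builds a hitting set tree; it merely tests candidate sets in order of increasing size. The function name, the size bound max_size, and the example slices are assumptions made for illustration.

```python
from itertools import combinations

def minimal_hitting_sets(sets, max_size=2):
    """Enumerate all minimal hitting sets of at most max_size elements."""
    elements = sorted(set().union(*sets))
    found = []
    for size in range(1, max_size + 1):
        for candidate in combinations(elements, size):
            cand = frozenset(candidate)
            # Candidates are tested by increasing size, so any subset
            # already in `found` is a smaller hitting set.
            if any(hs <= cand for hs in found):
                continue
            if all(cand & s for s in sets):  # hits every set
                found.append(cand)
    return found

# Two hypothetical slices, given as sets of statement line numbers.
slices = [{1, 2, 4}, {1, 3, 4}]
print([sorted(hs) for hs in minimal_hitting_sets(slices)])
# prints [[1], [4], [2, 3]]
```

Statements 1 and 4 occur in both slices and therefore form single-fault diagnoses, whereas statements 2 and 3 hit both slices only together.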
Bugs causing a wrong evaluation of a condition lead to the wrong execution or non-execution of statements. In order to handle such bugs, the function ExtendControlStatements(ET, Π) adds a small overhead to each control statement c in the execution trace ET. For each variable v that could be redefined in any branch of c, the statement v = v is added to the execution trace ET. These additional statements are inserted after all statements that are control dependent on c. The inserted statements are referenced by the line number of c when calling the function index in Algorithm 3. The returned execution trace is assigned to ET_C. This extension can be compared with potential data dependencies in relevant slicing.
SSA(ET_C) (Algorithm 2) transforms the execution trace ET_C into its static single assignment form and also delivers the largest index value for each variable used in [...] The reason for this is that we cannot reason over the variables if the execution path changes. Please note that the required test oracle information can be extracted fully automatically from an existing test case.
The result set S represents the set of possibly faulty statements. At first, the result set is the empty set. For all minimal hitting sets d in the set of minimal hitting sets HS, we check if the constraint solver is able to find a solution. For this purpose, we set all AB(i) to false except those where the corresponding statements are contained in d. For all conditional statements c where SC_c and d have at least one common element, we set AB(c) to true. The function ConstraintSolver(CS ∪ AB) calls a constraint solver and returns true if the constraint solver is able to find a solution. If a solution is found, we add all elements of d to the result set S.
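The filtering loop described above can be sketched in Python as follows. The sketch replaces the external constraint solver by an exhaustive search over a small integer domain, which is feasible only for toy examples. It assumes that SC_c denotes the set of statements whose computed values are used in the condition c. The names has_solution and conbas_filter, the domain, and the example trace are hypothetical.

```python
from itertools import product

def has_solution(constraints, variables, domain, abnormal):
    """Stand-in for ConstraintSolver(CS ∪ AB): try every valuation."""
    active = [
        compile(text, "<constraint>", "eval")
        for line, text in constraints
        if line not in abnormal  # AB(line) == true disables the constraint
    ]
    for values in product(domain, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(eval(code, {"__builtins__": {}}, env) for code in active):
            return True
    return False

def conbas_filter(hitting_sets, constraints, variables, domain, sc):
    """Keep the statements of every diagnosis that the solver can satisfy.

    sc maps a condition line c to SC_c (assumed meaning: the statements
    whose computed values are used in c).
    """
    result = set()
    for d in hitting_sets:
        abnormal = set(d)
        for c, statements in sc.items():
            if statements & abnormal:
                abnormal.add(c)  # the condition cannot be trusted either
        if has_solution(constraints, variables, domain, abnormal):
            result |= set(d)
    return result

# Hypothetical trace in constraint form; line None marks the test case.
CONSTRAINTS = [
    (1, "x_1 == a_0 + b_0"),
    (2, "x_1 > 0"),
    (3, "x_2 == x_1 * 2"),
    (4, "y_1 == x_2 - a_0"),
    (None, "a_0 == 1"),
    (None, "b_0 == 2"),
    (None, "y_1 == 4"),
]
VARIABLES = ["a_0", "b_0", "x_1", "x_2", "y_1"]
candidates = conbas_filter(
    [{1}, {2}, {3}, {4}], CONSTRAINTS, VARIABLES, range(0, 8), {2: {1}}
)
print(sorted(candidates))
```

In this example, assuming statement 1 to be abnormal leaves x_2 == x_1 * 2 together with x_2 == 5 unsatisfiable over the integers, so statement 1 is discarded; statements 3 and 4 remain as fault candidates.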
We illustrate the application of the algorithm by means of our running example. The function Run(Π, t) computes the execution trace illustrated in Figure 4. There are five different configurations for the values of AB. We only indicate the AB values that are set to true; all other AB variables are set to false. Note that setting AB(1) == true implies AB(2) == true, since we cannot reason over the correctness of the condition if the computed values used in the condition are wrong. The constraint solver is able to find solutions for all configurations except AB(4). Since our approach reasons only about the execution trace and not about all possible paths, AB(1) == true and AB(2) == true are satisfiable even though taking the alternative path does not compute the correct value for f.
Algorithm Conbas terminates if the program Π terminates when executing T. The computational complexity of Conbas is determined by the computation of the relevant slices, the hitting sets, and the constraint solver. Computing relevant slices only adds a small overhead compared to the execution of the program. Hitting set computation and constraint solving are exponential in the worst case (finite case). In order to reduce the computation time, the computation of hitting sets can be simplified. We only compute hitting sets of size 1 or 2; that is, we only compute single and double fault diagnoses. Faults involving more faulty statements are unlikely in practice. Only in cases where the single and double fault diagnoses cannot explain an observed misbehavior is the size of the hitting sets increased.
Excerpt from Algorithm 4 (Conbas):
(18) ∀i ∈ d : AB(i) = true
(19) ∀i ∉ d : AB(i) = false
(20) ∀c where SC_c ∩ d ≠ {} : AB(c) = true
(21) if ConstraintSolver(CS ∪ AB) has solution then

Empirical Results
This empirical evaluation consists of two main parts. First, we show that Conbas is able to reduce the size of slices without losing the fault localization capabilities of slicing. We show this for single faults as well as for multiple faults. Second, we investigate the influence of the number of output variables on the reduction result.
We conducted this empirical evaluation using a proof-of-concept implementation of Conbas. This implementation accepts programs written in the language L (see Figure 2). In order to test existing example programs, we have extended this implementation to accept simple Java programs, that is, Java programs with integer and Boolean data types only and without method calls and object orientation. The implementation itself is written in Java and comprises a relevant slicer and an interface to the Minion constraint solver [42]. The evaluation was performed on an Intel Core2 Duo processor (2.67 GHz) with 4 GB RAM and Windows XP as the operating system. Because of the constraint solver used, only programs comprising Boolean and integer data types (including arrays of integers) could be handled directly. Note that the restriction to Boolean and integer domains is not a limitation of Conbas itself.
For this empirical evaluation, we have computed all minimal hitting sets. We did not restrict the size of the hitting sets. Since we only deal with single and double faults, hitting sets of the sizes 1 and 2 would be sufficient. This reduction would improve our results concerning the number of final diagnoses and the computation time.
For the first part of the empirical evaluation, we use the 10 example programs listed in Table 2. Most of the programs implement numerical functions using conditional statements. The programs IfExample, SumPower, TrafficLight, and WhileLoops are borrowed from the JADE project (http://www.dbai.tuwien.ac.at/proj/Jade/). The program Taste is borrowed from the Unravel project (http://hissa.nist.gov/unravel/). Table 2 depicts the obtained results. In the table, we present the following data: [...] summary slice size, and the reduced slice size for the data presented in Table 2. Figure 8 illustrates the proportion of the number of minimal diagnoses (total diag.) and the number of valid minimal diagnoses (valid diag.) for the data presented in Table 2. The constraint solver reduces the number of diagnoses by about 20%.
In order to estimate the computation time for larger programs, we have investigated whether there exists a correlation between the time (in milliseconds) required for Conbas and (1) the LOC, (2) the size of the execution trace (exec. trace), (3) the number of constraints (con.), or (4) the number of diagnoses to be tested for satisfiability (total diag.). We found that the strongest correlation is between the execution time and (4), the number of diagnoses to be tested. Figure 9 illustrates this correlation. The blue data points represent the data from Table 2. The red line represents the least squares fit as an approximation of the data.
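For readers who want to repeat this kind of analysis on their own measurements, the following Python sketch computes a Pearson correlation coefficient and a least squares line. Whether the evaluation used exactly this coefficient is an assumption. The numbers in the example are made-up placeholders and are not taken from Table 2; the function names are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def least_squares_line(xs, ys):
    """Slope and intercept of the line minimizing the squared error."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Placeholder measurements (NOT the data of Table 2).
total_diag = [2, 5, 9, 14, 20]
time_ms = [310, 620, 1150, 1600, 2400]
print(pearson(total_diag, time_ms))
print(least_squares_line(total_diag, time_ms))
```

Running pearson once per candidate factor (LOC, trace size, number of constraints, number of diagnoses) and comparing the coefficients reproduces the kind of comparison described above.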
One advantage of Conbas is that it is able to reduce slices of programs that contain two or more faults. In order to demonstrate this, we have performed a small evaluation on double faults. For this, we combined some faults used in the single fault evaluation. The faults were not combined according to a particular schema (i.e., masking of faults or avoiding masking of faults). We only made the following restriction: faulty program versions were not combined if the faults were in the same program line. The reason for this is that two faults in the same line can be seen as one single fault. It can be seen that sometimes only one of the two faults is contained in the reduced slice. The reason for this is that one fault can be masked by the other fault. Conbas guarantees that at least one of the faults is contained in the reduced slice. This is not a limitation since a programmer can fix the first bug and then apply Conbas again on the corrected program. Figure 10 shows the relation of the program size, the summary slice size, and the reduced slice size for the investigated double faults. On average, the summary slice can be reduced by 50%.
In the second part of the empirical evaluation, we investigate whether more than one faulty output variable allows for a higher reduction of the summary slice. For this purpose, we use the circuits C17 and C432 of the Iscas 85 [43] benchmark. The Iscas 85 circuits describe combinational networks. We have chosen Iscas 85 because its circuits have many input and output variables. The circuit C17 has 5 input variables and 2 output variables. The circuit C432 has 36 input variables and 7 output variables. For the evaluation, we have used test cases with different input and output combinations. We used 3 as [...] In total, we created more than 150 program variants. Table 4 presents the obtained average results for the two circuits of the Iscas 85 benchmark. The column headings are similar to those used in Table 2 and are explained in the discussion of that table. Conbas is able to reduce the size of the summary slice by 66%.
Now, we want to answer the question of whether a higher reduction of the summary slice can be achieved when there are more faulty output variables. In order to answer this question, we make use of the Reduction metric, which is defined as [...] We group the tested program variants by the number of faulty output variables and compute the Reduction metric for the program variants. Figure 11 shows the box plots for the different numbers of output variables. It can be seen that two and three faulty output variables yield a better reduction of the slice size than only one output variable. The reason for this is that it is more difficult for the constraint solver to find configurations that meet all of the specified output variables.
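The grouping step can be sketched in Python as follows. The sketch assumes that the Reduction metric is 1 minus the ratio of the reduced slice size to the summary slice size, which is consistent with the percentages reported in the text but is an assumption rather than a quoted definition. The function names and the example tuples are hypothetical placeholders, not measured data.

```python
from collections import defaultdict
from statistics import median

def reduction(summary_size, reduced_size):
    """Assumed definition: share of the summary slice that is removed."""
    return 1 - reduced_size / summary_size

def reduction_by_outputs(variants):
    """Group variants by their number of faulty output variables and
    report the median Reduction per group.

    variants: iterable of (faulty_outputs, summary_size, reduced_size).
    """
    groups = defaultdict(list)
    for faulty_outputs, summary_size, reduced_size in variants:
        groups[faulty_outputs].append(reduction(summary_size, reduced_size))
    return {k: median(v) for k, v in sorted(groups.items())}

# Placeholder variants (NOT the measured Iscas 85 results).
variants = [(1, 20, 10), (1, 10, 5), (2, 20, 5), (3, 40, 8)]
print(reduction_by_outputs(variants))
```

The median is used here only as a compact summary; a box plot as in Figure 11 would show the full distribution of each group.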

Discussion and Future Work
Although Conbas substantially reduces the number of diagnosis candidates, with a reduction of about 28% in the single fault case and 50% in the double fault case, there is still room for improvement. In particular, the current implementation is optimized neither in terms of handling different kinds of programming language constructs nor in terms of the time required for performing the analysis. It would benefit from a relevant slicer for Java programs without restrictions on the language's syntax. Currently, only dynamic slicers are available, which might cause root causes to be ignored during the debugging process. Moreover, the combination of slices and constraint solving that is currently used might be improved. Especially in cases where there are many possible faults, the calls to the external constraint solver slow down the computation, which could be improved by a closely integrated constraint solver. Apart from these technical issues, there are some open research questions. We start with discussing possible improvements of Conbas that make use of the same underlying ideas but change the way of computing the final results. Instead of computing the hitting sets of the slices, the constraint solver can be directly used to compute all solvable diagnoses of a particular size. Such an approach would restrict the number of constraint solver calls and also the time required for computing the hitting sets for the slices. Such an approach would be very similar to the approaches of Nica and colleagues [31,32], but it would work on execution traces instead of the whole program representation. The expectation is that such an approach would be more efficient. However, there have been no publications on this topic so far.
Another research challenge is to improve Conbas by using information about the evaluation of conditions. We have to analyze if taking the alternative execution path of a condition (e.g., the else path if the condition evaluates to true) could satisfy the test case. If the change leads to a consistent program behavior, a root cause is identified. Otherwise, the condition can be assumed to be correct and removed from the list of fault candidates. The underlying challenge is to make such tests only in cases where infinite loops or infeasible behaviors can be avoided. For example, executing a conditional or a recursive function not as intended might cause a non-terminating behavior. Moreover, it is also important that the computational requirements are not significantly increased.
The empirical evaluation of Conbas, especially in comparison with other approaches, has to be improved. The programs used are rather small. Larger programs that belong to different application domains have to be used for evaluation. The currently used programs implement a variety of functions from state machines to numeric algorithms. Therefore, we believe that the obtained comparison with a pure slicing approach would not change even when using larger programs, assuming that the underlying constraint problems can be solved within a reasonable amount of time. However, an empirical study that compares different approaches such as spectrum-based debugging with Conbas is needed in order to structure the general research field of automated debugging.
The integration of debugging tools into integrated development environments (IDEs) like Eclipse is another hot topic. Technically, the integration seems to be easy. However, the challenge lies in effectively integrating these tools in an interactive environment such that the time needed for the overall debugging process is reduced. For this purpose, research on human-computer interaction in the context of debugging and program development has to be done. Moreover, user studies with the aim of proving that automated debugging tools really support humans are required. Such studies should go beyond the usual student-based studies that are carried out as part of a course program. Instead, the studies should be carried out with professional programmers in their industrial environment. Unfortunately, only a few user studies on automated debugging are available, of which [44] is the most recent.
Furthermore, the relationship between testing and debugging has not been sufficiently explored. There is work on this topic that addresses the influence of the test cases used on debugging and how to construct test cases to further support debugging. An in-depth analysis of this topic and a well-established methodology are still not available. Work in the direction of combining testing and debugging includes [45] and [31]. The latter discusses an approach for actively constructing test cases that allow for distinguishing diagnosis candidates. However, a method for test case construction that optimizes the whole debugging process is not available to the best of our knowledge. Moreover, the impact of such a method on other metrics such as mutation score or coverage criteria is not known and is worth researching.

Conclusion
Dynamic program slices are a valuable aid for programmers because they provide an overview of possibly faulty statements when debugging. They are used in many automated debugging techniques as a preprocessing step. However, they are often still too large to be a valuable help.
In this paper, we have introduced the theoretical foundations for an approach which reduces the size of dynamic slices by means of constraint solving. We have formalized the approach for the reduction of slices, named constraint based slicing (Conbas). In an empirical evaluation, we have shown that the size of dynamic slices can be reduced by 28% on average for single faults and by 50% for double faults with the aid of constraint solving. Furthermore, our approach can be used even if there exist multiple faults. We have applied Conbas on circuits of the Iscas 85 benchmark. These circuits contain many data dependencies but lack control dependencies. For these types of programs, Conbas yields a reduction of 66% on average compared to the union of all slices.
The objective behind Conbas is to improve relevant slicing for debugging. Even though other approaches outperform Conbas in certain cases, we point out two application areas where Conbas should be the preferred method. First, Conbas is suited to software maintenance, where the root cause of one failing test case has to be identified. In this case, mostly limited knowledge about the program is available. Moreover, the programs themselves are usually large, which makes debugging a very hard task. In such a case, low-cost approaches that require a set of test cases might not be applicable, and the application of heavyweight approaches might be infeasible because of computational requirements.
Second, Conbas is suited to programs with a low number of control statements that need a more detailed analysis of data dependences and relationships between variables. In such a case, Conbas provides the right means for analysis because it handles data dependences and constraints between program variables, which originate from the program statements.
Even though Conbas cannot solve all debugging problems, we are convinced that Conbas is a valuable technique for improving the debugging process. Moreover, a combination with other debugging techniques may even increase its fault localization capabilities.