Counterexample-Preserving Reduction for Symbolic Model Checking

The cost of LTL model checking is highly sensitive to the length of the formula under verification. We observe that, under some specific conditions, the input LTL formula can be reduced to an easier-to-handle one before model checking. In our reduction, the two formulae need not be logically equivalent, but they share the same counterexample set w.r.t. the model. When the model is symbolically represented, the condition enabling such a reduction can be detected with lightweight effort (e.g., with SAT solving). In this paper, we tentatively name this technique "Counterexample-Preserving Reduction" (CePRe for short), and we evaluate it experimentally by adapting NuSMV.


Introduction
Linear Temporal Logic (LTL for short) [11] is one of the most frequently used specification languages in model checking (cf. [14]). It specifies properties of linear structures, each of which can be viewed as an execution of the program. The task of LTL model checking is to search the state space (explicitly or implicitly) for feasible traces violating the specification. If such traces exist, the model checker reports one of them as a "counterexample"; otherwise, it gives an affirmative answer.
It can be shown that the complexity of LTL model checking M |= ϕ is in O(|M| × 2^{|ϕ|}); moreover, the nesting depth of temporal operators may be the major factor affecting the cost of compiling LTL formulae.
Hence, it is reasonable to simplify the specification before model checking. For example, in [12], Somenzi and Bloem provided a series of rewriting schemas for simplifying LTL specifications; these schemas preserve logical equivalence.
One may argue that "a majority of LTL formulae used in real applications are simple and succinct rather than complicated", but the following facts should be noted:
- Even a formula usually considered "simple", such as F(pUq), can often be simplified further (here to Fq), and this tends to be overlooked.
- People do use complicated specifications in industrial practice, as well as in some standard benchmarks (cf. [2]).
- Last but not least, not all specifications are written manually: some are produced by specification-generation tools (e.g., ProSpec), and many of these machine-generated specifications can be simplified.
arXiv:1301.3299v1 [cs.LO] 15 Jan 2013
Symbolic model checking [10] is one of the most significant breakthroughs in model checking. Two major styles of symbolic model checking are widely used: the BDD-based approach [6] and the SAT-based approach, such as bounded model checking [1].
Instead of using an explicit representation, the symbolic approach represents the state space with a series of Boolean formulae. This enables implicit manipulation during verification and usually leads to efficient implementations [3]. Moreover, such a unified representation of the model's transitions and invariants can provide heuristic information for simplifying the specification. For example:
- The formulae pUq and (rUp)Uq can be reduced to q and (rUp) ∨ q, respectively, if we know that p → q holds everywhere in the model.
- Each occurrence of Gθ in the specification can be replaced with ⊤ (i.e., logically true) if we can inductively infer that the Boolean formula θ holds at each reachable state of the model.
These conditions can be checked as follows.
- To check whether p → q holds everywhere in the model, we may test whether p → q is an invariant of the model, i.e., whether ρ ∧ ¬(p → q) is unsatisfiable (denoted ρ ⊢ p → q in what follows), where ρ is the Boolean encoding of the model's transition relation.
- Likewise, to establish that θ holds at each reachable state, it suffices to ensure that θ_0 ⊢ θ and ρ ⊢ θ → θ′, where θ_0 is the initial condition of the model.
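Both checks reduce to propositional unsatisfiability. The following Python sketch illustrates them on a hypothetical toy model. It assumes that Boolean formulae are encoded as Python predicates over a dictionary of variable values, spells the primed copy p′ as `p_next`, and uses exhaustive enumeration in place of a real SAT solver, so it only scales to a handful of variables.

```python
from itertools import product

def entails(premise, conclusion, variables):
    """premise |- conclusion iff premise AND NOT conclusion is unsatisfiable.

    Exhaustive enumeration over all assignments stands in for a SAT solver here.
    """
    for bits in product([False, True], repeat=len(variables)):
        a = dict(zip(variables, bits))
        if premise(a) and not conclusion(a):
            return False  # found a model of premise AND NOT conclusion
    return True

# Hypothetical toy model over V = {p, q}; "p_next" and "q_next" play the role of p' and q'.
VARS = ["p", "q", "p_next", "q_next"]
rho = lambda a: a["q_next"] == (a["q"] or a["p_next"])  # q' <-> (q | p'): q latches p
theta0 = lambda a: not a["p"] and not a["q"]            # initial condition
inv = lambda a: (not a["p"]) or a["q"]                  # theta  = p -> q
inv_next = lambda a: (not a["p_next"]) or a["q_next"]   # theta' = p' -> q'

# Check 1: rho |- p -> q (p -> q at every state that has an outgoing transition).
strong = entails(rho, inv, VARS)
# Check 2: theta0 |- theta and rho |- theta -> theta' (theta is an inductive invariant).
inductive = entails(theta0, inv, VARS) and entails(
    rho, lambda a: (not inv(a)) or inv_next(a), VARS)
print(strong, inductive)  # False True
```

The first check fails here although p → q is an inductive invariant: ρ alone does not constrain the current state, so unreachable states satisfying p ∧ ¬q still have outgoing transitions. The inductive check is the weaker, and here sufficient, condition.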
Hence, the specification can be replaced with a simpler one at the price of a lightweight extra check of the enabling condition. Even if the check fails, its overhead is usually negligible.
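As a sanity check of the two example reductions (pUq to q, and (rUp)Uq to (rUp) ∨ q, when p → q holds everywhere), the sketch below evaluates future-time LTL on ultimately periodic (lasso) words and compares the formulae on every lasso with at most three letters drawn from states satisfying p → q. The tuple encoding of formulae and the size bound are our own assumptions; bounded enumeration can expose a wrong rule but does not prove a correct one.

```python
from itertools import product

# Formulas: atoms are strings; composite formulas are tuples such as
# ("not", f), ("and", f, g), ("or", f, g), ("X", f) and ("U", f, g).

def holds(f, word, loop, i):
    """Evaluate f at position i of the lasso word[:loop] (word[loop:])^omega.

    `word` is a list of sets of true atoms; the successor of the last
    position is `loop`, the start of the cycle.
    """
    n = len(word)
    succ = lambda k: k + 1 if k + 1 < n else loop
    if isinstance(f, str):
        return f in word[i]
    op = f[0]
    if op == "not":
        return not holds(f[1], word, loop, i)
    if op == "and":
        return holds(f[1], word, loop, i) and holds(f[2], word, loop, i)
    if op == "or":
        return holds(f[1], word, loop, i) or holds(f[2], word, loop, i)
    if op == "X":
        return holds(f[1], word, loop, succ(i))
    if op == "U":
        # At most n distinct positions are reachable, so n steps suffice to find a witness.
        k = i
        for _ in range(n):
            if holds(f[2], word, loop, k):
                return True
            if not holds(f[1], word, loop, k):
                return False
            k = succ(k)
        return False
    raise ValueError(f"unknown operator {op!r}")

def lassos(states, max_len):
    """Yield every lasso (word, loop) with 1..max_len letters drawn from `states`."""
    for n in range(1, max_len + 1):
        for word in product(states, repeat=n):
            for loop in range(n):
                yield list(word), loop

def agree_on_all(f, g, states, max_len=3):
    """True iff f and g have the same truth value at position 0 on every such lasso."""
    return all(holds(f, w, lp, 0) == holds(g, w, lp, 0) for w, lp in lassos(states, max_len))

atoms = ["p", "q", "r"]
all_states = [frozenset(a for a, bit in zip(atoms, bits) if bit)
              for bits in product([0, 1], repeat=len(atoms))]
good = [s for s in all_states if "p" not in s or "q" in s]  # letters satisfying p -> q

rUp = ("U", "r", "p")
print(agree_on_all(("U", "p", "q"), "q", good))               # True
print(agree_on_all(("U", rUp, "q"), ("or", rUp, "q"), good))   # True
print(agree_on_all(("U", "p", "q"), "q", all_states))          # False: needs p -> q
```

The last line shows why the condition matters: without p → q, the lasso {p}{q}... satisfies pUq but not q.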
In this paper, we systematically investigate this idea and tentatively name the technique CounterExample-Preserving REduction (CePRe for short). The reduction is performed before model checking starts, and it is orthogonal to both encoding approaches and model compression techniques.
To evaluate it, we have extended NuSMV, implementing CePRe as an upfront option for LTL model checking, and conducted experiments over both industrial benchmarks and randomly generated cases. The experimental results show that CePRe can improve efficiency significantly.
This paper is organized as follows: Section 2 revisits some basic notions. Section 3 introduces the CePRe technique and analyzes its performance. Section 4 presents experimental results over industrial benchmarks as well as randomly generated cases. Section 5 concludes the paper.

Preliminaries
We presuppose a countable set P of atomic propositions, ranging over p, q, p_1, etc. For each proposition p ∈ P, we create a primed version p′ (not belonging to P). For each set V ⊆ P, we define V′ = {p′ | p ∈ V}. We use B(V) to denote the set of Boolean formulae over V; similarly, we denote by B(V ∪ V′) the set of Boolean formulae over V ∪ V′. The prime operator is naturally lifted to Boolean formulae in B(V): θ′ is obtained from θ by replacing each proposition p with p′. An assignment is a subset V of P; intuitively, it assigns 1 (true) to the propositions belonging to V and 0 (false) to all other propositions. For V ⊆ U ⊆ P and θ ∈ B(U), we write V |= θ if θ evaluates to 1 under the assignment V.
A united assignment is a pair (V_1, V_2), where both V_1 and V_2 are subsets of P. It assigns 1 to the propositions belonging to V_1 ∪ V_2′ and 0 to all other propositions. For V_1, V_2 ⊆ U ⊆ P and θ ∈ B(U ∪ U′), we also write (V_1, V_2) |= θ if θ evaluates to 1 under the united assignment (V_1, V_2).
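A minimal Python sketch of these notions follows. It assumes that Boolean formulae are nested tuples over the illustrative operator tags `"not"`, `"and"`, `"or"` and `"imp"`, that ⊤/⊥ are Python booleans, and that the primed copy p′ of an atom `"p"` is written as the string `"p'"`.

```python
def prime(theta):
    """Lift the prime operator: rename every atom p occurring in theta to p'."""
    if isinstance(theta, bool):
        return theta
    if isinstance(theta, str):
        return theta + "'"
    return (theta[0],) + tuple(prime(t) for t in theta[1:])

def evaluate(theta, true_atoms):
    """Return whether the assignment `true_atoms` (atoms mapped to 1) satisfies theta."""
    if isinstance(theta, bool):
        return theta
    if isinstance(theta, str):
        return theta in true_atoms
    op, *args = theta
    if op == "not":
        return not evaluate(args[0], true_atoms)
    if op == "and":
        return all(evaluate(t, true_atoms) for t in args)
    if op == "or":
        return any(evaluate(t, true_atoms) for t in args)
    if op == "imp":
        return (not evaluate(args[0], true_atoms)) or evaluate(args[1], true_atoms)
    raise ValueError(f"unknown operator {op!r}")

def united(v1, v2):
    """United assignment (V1, V2): atoms in V1 and primed atoms of V2 are set to 1."""
    return set(v1) | {p + "'" for p in v2}

theta = ("imp", "p", "q")              # theta = p -> q, an element of B(V)
step = ("imp", theta, prime(theta))    # theta -> theta', an element of B(V ∪ V')
print(evaluate(theta, {"q"}))                         # True
print(evaluate(step, united({"p", "q"}, {"p"})))      # False
print(evaluate(step, united({"q"}, {"p", "q"})))      # True
```

The formula θ → θ′ evaluated here has the shape of the transition condition ρ ⊢ θ → θ′ used in the introduction: the pair ({p, q}, {p}) is a step that leaves θ, whereas ({q}, {p, q}) preserves it.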
LTL formulae can be inductively defined as follows.
- Each proposition p ∈ P is an LTL formula.
- If ϕ is an LTL formula, then Xϕ and Yϕ are LTL formulae.
For the sake of convenience, we usually write π, 0 |= ϕ simply as π |= ϕ. As usual, we employ derived Boolean connectives such as ∨ and →, and derived temporal operators such as F, G, R, O, H, Z and T. We say that ∧ and ∨, F and G, O and H, Y and Z, U and R, and T and S are pairwise dual operators.
Temporal operators such as X, U, F, G and R are called future operators, whereas Y, Z, S, O, H and T are called past operators. An LTL formula is pure future (resp. pure past) if it involves no past (resp. future) operators.
Theorem 1 states that past operators add no expressive power to LTL. Nevertheless, they allow much more succinct specifications.
Given an LTL formula ϕ, we denote by sub(ϕ) the set of subformulae of ϕ. In particular, we denote by sub_U(ϕ) and sub_S(ϕ) the sets of U-subformulae and S-subformulae of ϕ, respectively, where a U-formula (resp. S-formula) is a formula rooted at U (resp. S).
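The sets sub(ϕ), sub_U(ϕ) and sub_S(ϕ), as well as the pure-future test, are straightforward to compute over a syntax tree. The sketch below assumes that formulae are nested tuples whose first component is an operator tag and that atoms are plain strings; this encoding is illustrative only.

```python
PAST_OPS = {"Y", "Z", "S", "O", "H", "T"}

def sub(f):
    """sub(f): the set of all subformulae of f, including f itself."""
    result = {f}
    if isinstance(f, tuple):
        for child in f[1:]:
            result |= sub(child)
    return result

def sub_rooted(f, op):
    """Subformulae of f whose root operator is `op` ("U" gives sub_U(f), "S" gives sub_S(f))."""
    return {g for g in sub(f) if isinstance(g, tuple) and g[0] == op}

def is_pure_future(f):
    """True iff f contains no past operator."""
    return not any(isinstance(g, tuple) and g[0] in PAST_OPS for g in sub(f))

phi = ("U", ("S", "p", "q"), ("X", ("U", "p", "r")))  # (p S q) U X(p U r)
print(len(sub(phi)), len(sub_rooted(phi, "U")), len(sub_rooted(phi, "S")))  # 7 2 1
print(is_pure_future(phi), is_pure_future(("X", ("U", "p", "r"))))          # False True
```

Using sets means that repeated occurrences of the same subformula (here the atom p) are counted once.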
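A model is a tuple M = ⟨V, ρ, θ_0, F, C⟩, where
- V ⊆ P is a finite set of atomic propositions;
- ρ ∈ B(V ∪ V′) is the transition relation;
- θ_0 ∈ B(V) is the initial condition;
- F ⊆ B(V) is a set of fairness (justice) constraints;
- C ⊆ B(V) × B(V) is a set of compassion constraints.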
A derived linear structure of M is an infinite word π ∈ (2^V)^ω such that
1. π(0) |= θ_0;
2. (π(i), π(i+1)) |= ρ for each i ≥ 0;
3. for each ϕ ∈ F, there are infinitely many i's with π(i) |= ϕ;
4. for each (ϕ, ψ) ∈ C, if there are infinitely many i's with π(i) |= ϕ, then there are also infinitely many j's with π(j) |= ψ.
We denote by L(M) the set of derived linear structures of M.
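For ultimately periodic words u·v^ω, "infinitely many positions" simply means "some position of the loop v", so the four conditions can be checked directly. The Python sketch below does this under stated assumptions: conditions 1 and 2 are read as the initial-state and transition requirements listed above, states are sets of true atoms, constraints are Python predicates, and the request/grant model (one compassion constraint, no justice constraints) is hypothetical.

```python
def is_derived_lasso(word, loop, theta0, rho, justice, compassion):
    """Check that the lasso word[:loop] (word[loop:])^omega is a derived linear structure.

    On a lasso, "infinitely many positions" means "some position on the cycle".
    """
    n = len(word)
    succ = lambda i: i + 1 if i + 1 < n else loop
    cycle = word[loop:]
    if not theta0(word[0]):                                       # condition 1
        return False
    if not all(rho(word[i], word[succ(i)]) for i in range(n)):    # condition 2, incl. back edge
        return False
    if not all(any(j(s) for s in cycle) for j in justice):        # condition 3
        return False
    return all(not any(p(s) for s in cycle) or any(q(s) for s in cycle)
               for p, q in compassion)                            # condition 4

theta0 = lambda s: "gnt" not in s                          # no grant initially
rho = lambda cur, nxt: "gnt" not in nxt or "req" in cur    # grant only right after a request
justice = []                                               # no justice constraints here
compassion = [(lambda s: "req" in s, lambda s: "gnt" in s)]  # req inf. often -> gnt inf. often

print(is_derived_lasso([set(), {"req"}, {"gnt"}], 1, theta0, rho, justice, compassion))  # True
print(is_derived_lasso([set(), {"req"}], 1, theta0, rho, justice, compassion))           # False
print(is_derived_lasso([set(), {"gnt"}], 1, theta0, rho, justice, compassion))           # False
```

The second lasso requests forever without a grant (violating compassion); the third grants without a preceding request (violating ρ).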

Counterexample-Preserving Reduction
In this section we describe the CePRe technique. Throughout, we fix the model M = ⟨V, ρ, θ_0, F, C⟩.
For M, we are particularly concerned with formulae having the same counterexample set: we say that ϕ and ψ are inter-reducible w.r.t. M, written ϕ ≈_M ψ, if for every π ∈ L(M), π |= ϕ iff π |= ψ.
The central part of CePRe is a series of reduction rules of the form "ϕ ≈ ψ, provided Cond", where Cond is called the additional condition.
Although the relation ≈_M is actually symmetric, in reduction rules we always write the formula being reduced on the left-hand side of the "≈" sign. Since the model M is fixed, we omit it from the subscript in this section. In addition, if the additional condition holds trivially, we drop it and write the rule simply as ϕ ≈ ψ; such a rule is called model-independent, and all other rules are called model-dependent.

The Reduction Rules
First of all, we have some elementary reduction rules, depicted in Figure 1. For the rules (Init), (Ind) and (Trans), the symbol ⊢ in the condition part stands for the inference relation of propositional logic (ρ ⊢ θ iff ρ ∧ ¬θ is unsatisfiable), and we require that θ, θ_1, θ_2 ∈ B(V). Next, we define a partial order ⊑ over unary temporal operators (and their combinations). For P_w, P_s ∈ {F, FG, GF, G, O, HO, OH, H} ∪ {X^i | i < ω} with P_w ⊑ P_s, we have two model-independent rules, depicted in Figure 2. Although these rules may seem trivial, they are useful when rules are combined (see the example given in Section 3.2).
Figure 2: Reduction rules (Conj) and (Disj).
Figure 3 provides some reduction rules that can be used to simplify nested temporal operators. We may immediately obtain the "past version" of each such rule by switching U and S, R and T, etc.; for example, from (FU), i.e., F(ϕUψ) ≈ Fψ, we obtain O(ϕSψ) ≈ Oψ. We also have the Duality Principle for model-independent rules: switching each operator with its dual yields a new reduction rule. From the rules listed in Figure 3 we thus obtain rules such as (GR), (RG), (GG) and (FGF); for instance, the rule (GG) is simply GGϕ ≈ Gϕ.
Since model checking always evaluates the specification at the starting point (i.e., the goal is to check whether π, 0 |= ϕ for each π ∈ L(M)), we can sometimes "erase" the outermost past operators, as shown in Figure 4. The rules (Z), (H) and (T) can also be obtained by the Duality Principle, with one exception: the rule (Z) is Zϕ ≈ ⊤ (rather than Zϕ ≈ ⊥).
From now on, we let θ_1, θ_2, ... range over B(V) and let ϕ_1, ϕ_2, ... be arbitrary LTL formulae. We also have some model-dependent rules; the first group of them is listed in Figure 6.
Figure 7 provides another set of reduction rules, mainly concerned with LTL formulae involving adjacent U-operators. When applying the Duality Principle to model-dependent rules, besides switching the operators, we also need to exchange the antecedent and the consequent in the condition part. For example, applying the Duality Principle to (U U [2 → 3]), i.e., (θ_1Uθ_2)Uθ_3 ≈ (θ_1Uθ_2) ∨ θ_3 provided ρ ⊢ θ_2 → θ_3, yields (θ_1Rθ_2)Rθ_3 ≈ (θ_1Rθ_2) ∧ θ_3 provided ρ ⊢ θ_3 → θ_2. Lastly, Figure 8 provides some reduction rules for simplifying formulae that mix U and R. Similarly, by dualizing the operators and reversing the implication in the additional condition, one obtains reduction rules for formulae in which R appears (adjacently) outside U.
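The Duality Principle and the construction of past versions are purely syntactic, so they can be sketched as tree rewrites. In this Python sketch, formulae are nested tuples with the rule's metavariables as strings, and ⊤/⊥ are Python booleans. Two choices are ours: ⊤ and ⊥ are treated as dual constants, which reproduces the exception noted for (Z), and the future-to-past table covers only U, R, F and G. For a model-dependent condition a → b, the metavariables are implicitly negated, so by contraposition the dual condition is dual(b) → dual(a); for plain metavariables this is exactly exchanging antecedent and consequent.

```python
DUAL = {"and": "or", "or": "and", "F": "G", "G": "F", "U": "R", "R": "U",
        "O": "H", "H": "O", "Y": "Z", "Z": "Y", "S": "T", "T": "S"}
PAST = {"U": "S", "R": "T", "F": "O", "G": "H"}  # assumed future-to-past switch

def rename(f, table, flip_constants=False):
    """Replace each operator of f via `table`; metavariables (strings) stay unchanged."""
    if isinstance(f, bool):
        return (not f) if flip_constants else f
    if isinstance(f, str):
        return f
    return (table.get(f[0], f[0]),) + tuple(rename(g, table, flip_constants) for g in f[1:])

def dual_rule(lhs, rhs, cond=None):
    """Duality Principle; a condition ("imp", a, b) becomes ("imp", dual(b), dual(a))."""
    d = lambda f: rename(f, DUAL, flip_constants=True)
    new_cond = None if cond is None else ("imp", d(cond[2]), d(cond[1]))
    return d(lhs), d(rhs), new_cond

def past_rule(lhs, rhs):
    """Past version of a model-independent rule: switch U/S, R/T, F/O and G/H."""
    return rename(lhs, PAST), rename(rhs, PAST)

rUp = ("U", "r", "p")
print(dual_rule(("F", ("F", "phi")), ("F", "phi")))   # (GG): G G phi ≈ G phi
print(dual_rule(("Y", "phi"), False))                 # (Z):  Z phi ≈ ⊤
print(dual_rule(("U", rUp, "q"), ("or", rUp, "q"), ("imp", "p", "q")))
print(past_rule(("F", ("U", "phi", "psi")), ("F", "psi")))  # O(phi S psi) ≈ O psi
```

The third call dualizes the introduction's example (rUp)Uq ≈ (rUp) ∨ q under p → q, giving (rRp)Rq ≈ (rRp) ∧ q under q → p.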

Reduction Strategy
We illustrate the use of the CePRe reduction rules with the reduction process for M |= (θ_1Uθ_2)Uθ_3:
1. We may first try the rule (U U [1 → 3]) by asking the SAT solver whether ρ ⊢ θ_1 → θ_3 holds, i.e., whether ρ ∧ θ_1 ∧ ¬θ_3 is unsatisfiable.
2. If the SAT solver returns "unsatisfiable" on the input ρ ∧ θ_1 ∧ ¬θ_3, the additional condition holds, and we may replace the specification with θ_2Uθ_3.
3. Otherwise, we try the next reduction rule, such as (U U [2 → 3]).
Figure 7: Reduction rules for formulae involving adjacent U operators.
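The strategy above amounts to trying rules in a fixed order and stopping at the first one whose condition the solver confirms. The Python sketch below encodes it for (θ_1Uθ_2)Uθ_3 with only two rules: (U U [1 → 3]), whose result is given in step 2, and a rule for θ_2 → θ_3 taken from the introduction's example (rUp)Uq ≈ (rUp) ∨ q. The rule order, the tuple encoding of results, the enumeration-based `entails` (standing in for a SAT solver) and the toy transition relations, which mention only current-state variables for brevity, are all assumptions for illustration.

```python
from itertools import product

def entails(premise, conclusion, variables):
    """premise |- conclusion, i.e. premise & !conclusion is UNSAT (checked by enumeration)."""
    for bits in product([False, True], repeat=len(variables)):
        a = dict(zip(variables, bits))
        if premise(a) and not conclusion(a):
            return False
    return True

# Rules for (t1 U t2) U t3, tried in order: (name, i, j, result) with condition rho |- t_i -> t_j.
RULES = [
    ("U U [1 -> 3]", 1, 3, ("U", "t2", "t3")),                # result as given in step 2
    ("U U [2 -> 3]", 2, 3, ("or", ("U", "t1", "t2"), "t3")),  # from the introduction's example
]

def reduce_uu(theta, rho, variables):
    """theta maps 1, 2, 3 to predicates; return (applied rule or None, resulting formula)."""
    for name, i, j, result in RULES:
        implication = lambda a, i=i, j=j: (not theta[i](a)) or theta[j](a)
        if entails(rho, implication, variables):
            return name, result
    return None, ("U", ("U", "t1", "t2"), "t3")   # no rule applies: keep the specification

VARS = ["a", "b", "c"]
theta = {1: lambda s: s["a"], 2: lambda s: s["b"], 3: lambda s: s["c"]}
print(reduce_uu(theta, lambda s: (not s["a"]) or s["c"], VARS))     # ('U U [1 -> 3]', ('U', 't2', 't3'))
print(reduce_uu(theta, lambda s: (not s["b"]) or s["c"], VARS)[0])  # U U [2 -> 3]
print(reduce_uu(theta, lambda s: True, VARS)[0])                    # None
```

When no condition can be established, the specification is kept unchanged, so a failed check costs only the solver calls.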
In fact, these rules can also be applied locally to subformulae. For example, for a local application of (FU), we may replace each occurrence of F(ϕUψ) in the specification with Fψ. The only exception is the group of rules listed in Figure 4: we have Yϕ ≈ ⊥ by (Y), yet this does not imply FYϕ ≈ F⊥. Hence, these rules carry an implicit condition when applied locally: the subformula to be reduced must occur temporally outermost in the specification, i.e., it must not be in the scope of any temporal operator.
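Local application with the temporally-outermost restriction can be sketched as a single bottom-up rewrite that tracks whether the current subformula is still outside every temporal operator. This Python sketch assumes that formulae are nested tuples with atoms as strings and ⊥ as Python `False`, and it implements only two rules: (FU), which may be applied anywhere, and (Y), which may be applied only at temporally outermost positions.

```python
TEMPORAL = {"X", "U", "F", "G", "R", "Y", "Z", "S", "O", "H", "T"}

def reduce_local(f, outermost=True):
    """Apply (FU) everywhere and (Y) only where f is temporally outermost."""
    if isinstance(f, (str, bool)):
        return f
    op = f[0]
    # Below a temporal operator, no subformula is temporally outermost any more.
    child_outermost = outermost and op not in TEMPORAL
    args = tuple(reduce_local(g, child_outermost) for g in f[1:])
    if op == "F" and isinstance(args[0], tuple) and args[0][0] == "U":
        return ("F", args[0][2])     # (FU): F(phi U psi) ≈ F psi, valid in any context
    if op == "Y" and outermost:
        return False                 # (Y): Y phi ≈ ⊥, valid only at position 0
    return (op,) + args

spec = ("and", ("Y", "p"), ("G", ("F", ("U", "p", ("F", ("Y", "q"))))))
print(reduce_local(spec))  # ('and', False, ('G', ('F', ('F', ('Y', 'q')))))
```

The outer Yp becomes ⊥, whereas Yq, which lies under G and F, is left untouched, exactly as the implicit condition requires.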

Performance Analysis of the Reduction
We now try to answer the question of why verification performs better when CePRe is applied first. To give a rigorous explanation, we briefly revisit the implementation of symbolic model checking algorithms. The core procedure of BDD-based symbolic LTL model checking is to construct a tableau for the (negated) property. In what follows, we refer to the tableau of ϕ as T_ϕ and analyze its major components that affect the cost of model checking.
State space: The state space of T_ϕ consists of subsets of el(ϕ), and the set el(ϕ) can be inductively computed as follows.
With the symbolic representation, each formula ψ ∈ el(ϕ) corresponds to a proposition in the tableau. If ψ ∈ P, no new proposition needs to be introduced (it has already been introduced in the symbolic representation of M); otherwise, a fresh proposition p_ψ is required. Hence, the total number of newly introduced propositions equals |el(ϕ) \ P|. By induction over the structure of formulae, we have the following claim.
Transitions: The transition relation of T_ϕ is a conjunction of constraints, each of the form p_{Xψ} ↔ σ(ψ)′ or p′_{Yη} ↔ σ(η), where Xψ, Yη ∈ el(ϕ), and the function σ can be inductively defined as follows.
By the definition of el, each ψ ∈ sub(ϕ) rooted at a future (resp. past) temporal operator produces exactly one formula Xη (resp. Yη) in el(ϕ), and hence a new proposition p_{Xη} (resp. p_{Yη}) is introduced. Each such p_{Xη} (resp. p_{Yη}) in turn adds exactly one constraint to the transition relation. Hence, we have the following claim.
Proposition 2. The number of constraints in the transition relation of T_ϕ equals the number of temporal operators occurring in ϕ (equivalently, |el(ϕ) \ P|).
Fairness constraints: In the tableau construction, each ψ ∈ sub_U(ϕ) imposes a fairness constraint on T_ϕ. Hence, the number of fairness constraints equals |sub_U(ϕ)|. By case-by-case checking, we can show the following theorem.
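To see how a reduction shrinks the tableau, the sketch below counts the three quantities just discussed: fresh propositions |el(ϕ) \ P|, transition constraints (one per fresh X- or Y-proposition), and fairness constraints |sub_U(ϕ)|. Since the inductive definition of el is not reproduced here, the sketch assumes the usual one from tableau-based symbolic LTL model checking; formulae are nested tuples, Fψ is encoded as ⊤Uψ (with ⊤ as Python `True`), and only the operators ¬, ∧, ∨, X, U, Y and S are handled.

```python
def el(f):
    """Elementary formulae of f (the usual tableau definition, assumed here)."""
    if isinstance(f, bool):
        return set()
    if isinstance(f, str):
        return {f}
    op, args = f[0], f[1:]
    parts = set().union(*(el(g) for g in args))
    if op in ("X", "Y"):
        return {f} | parts
    if op == "U":
        return {("X", f)} | parts
    if op == "S":
        return {("Y", f)} | parts
    return parts  # "not", "and", "or" add no elementary formulae

def subformulae(f):
    """All subformulae of f, including f itself."""
    result = {f}
    if isinstance(f, tuple):
        for g in f[1:]:
            result |= subformulae(g)
    return result

def tableau_cost(f):
    """Return (fresh propositions, transition constraints, fairness constraints) of T_f."""
    fresh = [g for g in el(f) if not isinstance(g, str)]  # el(f) minus P
    sub_u = [g for g in subformulae(f) if isinstance(g, tuple) and g[0] == "U"]
    # Each fresh proposition p_{X eta} or p_{Y eta} contributes exactly one constraint.
    return len(fresh), len(fresh), len(sub_u)

print(tableau_cost(("U", True, ("U", "p", "q"))))    # F(p U q): (2, 2, 2)
print(tableau_cost(("U", True, "q")))                # F q:      (1, 1, 1)
print(tableau_cost(("U", ("U", "a", "b"), "c")))     # (a U b) U c: (2, 2, 2)
print(tableau_cost(("U", "b", "c")))                 # b U c:    (1, 1, 1)
```

On these examples, the reductions F(pUq) ≈ Fq (rule (FU)) and (θ_1Uθ_2)Uθ_3 ≈ θ_2Uθ_3 (rule (U U [1 → 3]), when its condition holds) halve every count.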
In contrast, the cost of BMC is quite sensitive to the encoding approach. Broadly, encoding approaches fall into two categories.
Syntactic encoding. Such encodings are produced inductively over the formula's structure. The original one is presented in [1], and it is improved in [4] by exploiting some properties of that encoding. In [9] (as well as in [2]), a linear incremental syntactic encoding is suggested.
Semantic encoding. In [5], an alternative BMC technique is provided: it mimics the tableau-based model checking process, but expresses fair-path detection on the product model with Boolean formulae.
For semantic encodings, the reason we benefit from CePRe is exactly the same as for the BDD-based approach: the encoding is a conjunction of a k-step unrolling of M and a k-step unrolling of T_ϕ (an unrolling is either a partial linear structure or one ending in a loop). The former usually follows a fixed pattern; for the latter we need k × |el(ϕ) \ P| new propositions, and the sizes of the Boolean formulae for the transition and fairness constraints are O(k × |el(ϕ) \ P|) and O(k² × |sub_U(ϕ)|), respectively.
For a syntactic BMC encoding, one needs to generate a Boolean formula of the form E^k_M ∧ E^k_{¬ϕ}, where E^k_M is the k-step unrolling of M and E^k_{¬ϕ} expresses that the k-step unrolling violates ϕ. In general, E^k_M is almost the same in all syntactic encodings, and the key factor affecting the cost lies in E^k_{¬ϕ}. Given a subformula ψ of ϕ, if we use ||E^k_ψ|| to denote the maximum length of the Boolean formula expressing that ψ is initially satisfied on a k-step unrolling, then it can be inductively computed as follows.
Here, L(k) is a polynomial in k that depends on the encoding approach. For example, with the techniques proposed in [1,8], L(k) ∈ O(k²), whereas L(k) ∈ O(k) in [9]. This partly explains why we prefer replacing nested temporal operators with Boolean combinations, as done in (U U [3 → 2]) etc. Another factor affecting the cost is the number of propositions occurring in the encoding. If we denote by var_k(ϕ) the set of additional propositions that take part only in the encoding of E^k_{¬ϕ}, then we have the following conclusions.
- For the techniques proposed in [1] and [4], var_k(ϕ) = ∅; in other words, all propositions required for encoding E^k_{¬ϕ} can be shared with those of E^k_M.
- For the encoding presented in [9], O(k) new propositions must be added to var_k(ϕ) for each U-subformula and each S-subformula.
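Plugging numbers into the bounds above gives a feel for the savings. The sketch keeps only the leading terms stated in the text, with all constants dropped, so it indicates trends rather than actual clause counts. The inputs for F(pUq) and its reduct Fq are the values of |el(ϕ) \ P| and |sub_U(ϕ)| obtained when Fψ is read as ⊤Uψ, and the bound k = 20 matches the maximum bound used in the experiments.

```python
def semantic_bmc_terms(n_el, n_u, k):
    """Leading terms (constants dropped) for the semantic encoding of [5], as stated above.

    n_el = |el(phi) minus P|, n_u = |sub_U(phi)|, k = unrolling bound.
    """
    return {
        "fresh_props": k * n_el,        # k x |el(phi) minus P| new propositions
        "transition_size": k * n_el,    # O(k x |el(phi) minus P|)
        "fairness_size": k * k * n_u,   # O(k^2 x |sub_U(phi)|)
    }

def incremental_extra_props(n_u, n_s, k):
    """Leading term of |var_k(phi)| for the encoding of [9]: O(k) per U- and S-subformula."""
    return k * (n_u + n_s)

k = 20
# F(p U q): |el minus P| = 2, two U-subformulae; its reduct F q: 1 and 1.
before, after = semantic_bmc_terms(2, 2, k), semantic_bmc_terms(1, 1, k)
print(before)  # {'fresh_props': 40, 'transition_size': 40, 'fairness_size': 800}
print(after)   # {'fresh_props': 20, 'transition_size': 20, 'fairness_size': 400}
print(incremental_extra_props(2, 0, k), incremental_extra_props(1, 0, k))  # 40 20
```

For the encodings of [1] and [4], var_k(ϕ) is empty, so only the size of E^k_{¬ϕ} benefits from the reduction.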

Experimental Results
We have integrated CePRe into NuSMV as an upfront option. We have conducted experiments on both industrial benchmarks and randomly generated cases, for both BDD-based and bounded model checking (for BMC, we adopt the encoding proposed in [4], which is the one currently implemented in NuSMV). The experiments were run on the following platform: CPU: Intel Core 2 Duo E4500, 2.2 GHz; memory: 2 GB; OS: Ubuntu 10.04 Linux; Cudd v2.4.1.1; Zchaff v2007.3.12.

Experiments upon Industrial Benchmarks
The benchmark we use is from [2], and most of its cases come from real hardware verification. Table 1 provides experimental results for BDD-based symbolic LTL model checking. The field #Time is the sum of user time and system time, and the field #R.S. is the total number of reachable states. It should be pointed out that both model-independent and model-dependent rules contribute to the reductions. For example, for the model srg5 and the specification Pti.g.ltl, the rules (FS) and (S) are applied, whereas for the model msi wtrans and the specification Seq.ltl, the rule (U U [¬2 → 3]) takes part in the reduction.

Experiments w.r.t. Random Models and Specifications
We have also performed experiments on randomly generated models and specifications, using the tool Lbtt [13] and the methodology suggested in [2].
Figure 9 (panel): number of reachable states (×1000) versus specification length, without and with CePRe.
For each ℓ with 3 ≤ ℓ ≤ 7, we randomly generate 40 specifications of length ℓ. Then, for each specification, we generate two models, one for BDD-based model checking and one for BMC. Hence, we have 200 specifications and 400 models in total.
For BDD-based model checking, we compare 1) the number of BDD nodes, 2) the number of reachable states, and 3) the time consumed, as shown in Figure 9. For BMC, we set the maximum bound to 20 and compare 1) the number of clauses and 2) the execution time, as shown in Figure 10. Each reported value is the average over the 40 runs.
For BDD-based model checking, 123 of the 200 specifications can be reduced; for BMC, 118 can be reduced.

Concluding Remarks
In this paper, we present CePRe, a new technique for reducing the complexity of LTL specifications for symbolic model checking. Its novelty is that the reduced formula need not be logically equivalent to the original one; it only needs to preserve the counterexample set. Moreover, the condition enabling such a reduction can usually be detected with lightweight approaches such as SAT solving, so the technique can leverage the power of SAT solvers.
The central part of CePRe is a set of reduction rules whose soundness is fairly easy to check. For the model-dependent rules, the additional conditions mainly concern invariants and transitions, and we do not yet make full use of other features of the model, such as fairness. In this paper, we mainly consider combinations of two temporal operators, covering as many cases as possible; there may well be other reduction schemas that we are not aware of. Our aim is to provide a tentative framework, which can be extended to model checking of other logics.
The experimental results show that, statistically, CePRe yields better performance and lower overhead.