An SAT-Based Method to Multithreaded Program Verification for Mobile Crowdsourcing Networks

This paper focuses on the safety verification of multithreaded programs for mobile crowdsourcing networks. A novel algorithm is proposed that applies IC3, which is typically the fastest algorithm for SAT-based finite state model checking, to the safety problem of multithreaded programs. By computing a series of overapproximations of reachability, the safety properties can be verified with SAT-based model checking algorithms. The results show that the new algorithm outperforms the recently published works it is compared with, especially on memory consumption (an advantage that comes from IC3).


Introduction
The mobile crowdsourcing network is a promising network architecture for performing tasks with human involvement and numerous mobile devices, but it suffers from security and privacy concerns [1,2]. Pthread-style multithreaded programs play an important role in crowdsourcing computing [3,4] and crowdsourcing sensing [5,6] by supporting concurrent programming. Multithreaded programming is used in much existing system-level code, such as device drivers, operating systems, and distributed computing. Mobile crowdsourcing networks [7,8] distribute tasks and collect the results, and they must ensure safe access to shared data. Therefore, it is important to verify the safety properties of multithreaded programs.
In this paper, we consider multithreaded programs with an unbounded number of threads. Each thread executes a finite, nonrecursive state machine. Mutexes, which can be expressed using Boolean variables, are used for synchronization in this kind of multithreaded program. Illegal access to mutexes leads to safety problems. We assume all mutexes are shared variables. The safety problem is to verify whether an illegal access to a mutex exists.
The safety property of multithreaded programs, expressed as an upward-closed set of target ("bad") states, can be verified by reduction to the coverability problem of well-structured transition systems (WSTS) [9,10]. WSTS are a very broad class of infinite-state systems, including thread transition systems (TTS) [11], Petri nets and their monotonic extensions [12][13][14][15][16], broadcast protocols [17,18], lossy channel systems [19], and context-free grammars [10]. Coverability is the base verification task for WSTS: the question is whether the system can reach an unsafe or illegal configuration among some subset of its (possibly unbounded number of) components. Several algorithms have been published for the WSTS coverability problem [9][10][11][20][21][22], but none performs as efficiently as finite state model checking.
The IC3 algorithm [23] is an SAT-based model checking algorithm that was introduced as an efficient technique for verifying safety properties of finite state systems, especially in hardware verification. It computes an inductive invariant by maintaining a sequence of overapproximations of the states reachable from the initial states and strengthening them incrementally. An efficient implementation of the procedure shows good performance on hardware benchmarks [24].
This paper focuses on multithreaded programs for mobile crowdsourcing networks. All multithreaded programs are pthread-style ANSI-C source code and are transformed into TTS by using predicate abstraction [25,26]. We introduce a novel, highly efficient algorithm for the coverability problem of TTS. The new algorithm applies conventional finite state IC3, which is typically the fastest algorithm for finite state model checking, to the coverability problem. IC3 is a finite state model checking algorithm whose original input is a finite state machine (FSM); we use this finite state model checker to verify infinite-state systems. The bounded TTS is transformed into an FSM and described in the input format of the IC3 engine. The significant contributions of this paper are as follows: (1) A novel algorithm based on the IC3 engine is proposed to solve the coverability problem of TTS; the approach requires intricate reasoning because IC3 produces a series of overapproximations of reachability. (2) We introduce new encoding techniques that make the verification of infinite-state systems possible with finite state algorithms. (3) We implement a combination of tools, which is a good way to improve the total rate of successfully solved instances.
The experimental results show that our new algorithm outperforms the recently published works it is compared with, uses far less memory (an advantage that comes from IC3), and solves more benchmarks successfully. The new method can solve 97.2% of the instances within 1 GB.
The rest of this paper is organized as follows. In Section 2, we review the related work. Section 3 presents the necessary preliminaries used in this paper. In Section 4, we propose our new method based on IC3 and give more details of the implementation. Section 5 shows the experimental evaluation on multithreaded programs. Section 6 concludes this paper and discusses future work.

Related Works
A general decidability result showed that the coverability problem is decidable for WSTS [9] by a procedure that explores states backward, starting from the target states. Bingham and Hu [20] proposed a new algorithm that computes fixpoints over a series of finite state systems of increasing size. A new subclass of WSTS, named nicely sliceable WSTS (NSW), was introduced. Starting from the target states, the algorithm computes the exact backward reachability by using finite state symbolic model checking [27] based on BDDs [28] to solve the coverability problem of NSW. Kaiser et al. [11,22] introduced a new algorithm for the safety properties of multithreaded programs with an unbounded number of threads executing a finite state, nonrecursive procedure. By using many inexpensive uncoverability proofs, this approach combines forward propagation of underapproximations with backward propagation of overapproximations to solve the coverability problem in TTS. Inspired by the success of the IC3 algorithm in finite state model checking, Kloos et al. [21] proposed an incremental, inductive procedure to check coverability of downward-finite WSTS, a class that contains Petri nets, broadcast protocols, and lossy channel systems. All those algorithms are based on backward reachability and suffer from high computational cost. Esparza et al. [29] introduced an incomplete but empirically efficient solution to the coverability problem. Their approach is based on classical Petri net analysis techniques, the marking equation and traps [30,31], and uses an SMT solver to implement the constraint approach. Inspired by Esparza's work, Athanasiou et al. [32] introduced an approximate coverability method using thread state equations and implemented it in a tool named TSE. TSE is very capable on Boolean programs but theoretically incomplete.
For $A \subseteq Q$, the upward-closure of $A$ is the set $\uparrow A = \{y \mid \exists x \in A,\ x \preceq y\}$. A basis of an upward-closed set $U$ is a set $B \subseteq U$ such that $U = \uparrow B$. A set $U$ is said to be $\preceq$-upward-closed (or simply upward-closed if $\preceq$ is clear from the context) if $U = \uparrow U$. It is known that if $\preceq$ is a wqo, then any $\preceq$-upward-closed set has a unique finite basis $B$ such that for all distinct $x, y \in B$ we have $x \not\preceq y$ and $y \not\preceq x$ [33]. Given upward-closed $U$, we let $\mathit{basis}(U)$ denote the unique finite basis of $U$. Moreover, it is known that any infinite increasing sequence
Definition 2 (discrete wqo). A wqo is a discrete wqo (dwqo) over $Q$ if for all $x \in Q$ there exists $k \in \mathbb{N}$ such that for any sequence $x_0 \prec x_1 \prec \cdots \prec x_\ell = x$, we have $\ell \le k$. The weight function $w : Q \to \mathbb{N}$ maps each $x$ to the minimum such $k$. For the $\preceq$-upward-closed set $U$, the base weight of $U$ is $\mathit{bw}(U) = \max\{w(x) \mid x \in \mathit{basis}(U)\}$.
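As a concrete illustration of upward-closure, basis, and base weight, the following minimal sketch instantiates the order as the componentwise order on fixed-length vectors of natural numbers. This order is a standard wqo chosen here only for illustration; it is not the cover relation used later in the paper. Under that assumption, the weight of a vector is the sum of its components, because a longest strictly increasing chain starting from the zero vector adds one unit per step. All function names are hypothetical.

```python
def leq(x, y):
    """Componentwise order on equal-length tuples of naturals (assumed wqo)."""
    return len(x) == len(y) and all(a <= b for a, b in zip(x, y))


def minimal_basis(generators):
    """Minimal elements of a finite generator set: the basis of its upward-closure."""
    gens = set(generators)
    return {x for x in gens if not any(y != x and leq(y, x) for y in gens)}


def in_upward_closure(x, basis):
    """x lies in the upward-closure iff it dominates some basis element."""
    return any(leq(b, x) for b in basis)


def weight(x):
    """Weight under the assumed order: length of a longest chain ending at x."""
    return sum(x)


def base_weight(generators):
    """Maximum weight over the minimal basis."""
    return max(weight(b) for b in minimal_basis(generators))


GENERATORS = {(1, 0), (2, 0), (0, 2), (1, 1)}
BASIS = minimal_basis(GENERATORS)  # (2, 0) and (1, 1) are dominated by (1, 0)
```

In this toy instance the basis elements are pairwise incomparable, matching the uniqueness condition stated above, and the base weight is determined by the heaviest minimal element.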
The weight function $w$ slices the state space into a countable number of finite sets $Q_0, Q_1, Q_2, \ldots$, where $Q_i = \{x \in Q \mid w(x) = i\}$. This property allows finite state model checking techniques to be applied to reachability within each weight-bounded slice $Q_i$.
Definition 3 (nicely sliceable well-structured transition systems). A nicely sliceable well-structured transition system (NSW) $N = (Q, \to, \preceq)$ is a transition system equipped with a dwqo on its states that satisfies the following properties: (1) $Q$ is the (possibly infinite) state space.

Thread Transition Systems.
Thread transition systems (TTS) are motivated by the verification of multithreaded asynchronous programs and form a subclass of NSW. Let $L$ and $S$ be finite sets of local and shared states, respectively. The elements of $T = S \times L$ are called thread states.
Let $V = \bigcup_{n=1}^{\infty} (S \times L^n)$. The elements of $V$ are called states. We write them in the form $(s \mid l_1, l_2, \ldots, l_n)$. A TTS gives rise to a transition system $M = (V, \rightarrowtail)$ with if one of the following conditions holds. Given a target state $v_f \in V$, the coverability problem asks whether $v_f$ is coverable, that is, whether there exists a path in $M$ leading to a state $v$ that covers $v_f$: $v \succeq v_f$. The safety property is described as the upward-closed set of $v_f$ and is thereby converted into a coverability analysis problem.
The cover relation $\preceq$ is neither symmetric nor antisymmetric; it is a quasi-order and in fact a well-quasi-order (wqo) on $V$: any infinite sequence $v_1, v_2, \ldots$ of elements from $V$ contains an increasing pair $v_i \preceq v_j$ with $i < j$. It is easy to see that $(M, \preceq)$ fulfills the definition of WSTS. A TTS with standard thread and spawn transitions can be expressed as a plain Petri net [22] and belongs to the class NSW [20].
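To make the cover relation and the coverability question concrete, here is a minimal explicit-state sketch. It assumes the usual TTS semantics in the style of [11], which this section does not spell out: a thread transition changes the shared state and the local state of one thread, and a spawn transition changes the shared state and adds a new thread while the spawning thread keeps its local state. States are represented as a shared state plus a sorted tuple of local states. The thread bound, the function names, and the toy mutex example are all hypothetical. A bounded search of this kind can only confirm coverability; a negative answer holds only up to the chosen bound.

```python
from collections import Counter, deque


def covers(state, target):
    """state covers target: same shared state, and target's threads form a sub-multiset."""
    (s, locs), (ts, tlocs) = state, target
    return s == ts and not (Counter(tlocs) - Counter(locs))


def successors(state, thread_ts, spawn_ts, max_threads):
    """Successor states under the assumed thread and spawn semantics."""
    s, locs = state
    for (s1, l1), (s2, l2) in thread_ts:
        if s == s1 and l1 in locs:
            rest = list(locs)
            rest.remove(l1)  # the moving thread leaves l1 ...
            yield (s2, tuple(sorted(rest + [l2])))  # ... and enters l2
    for (s1, l1), (s2, l2) in spawn_ts:
        if s == s1 and l1 in locs and len(locs) < max_threads:
            yield (s2, tuple(sorted(locs + (l2,))))  # spawner stays, new thread at l2


def bounded_coverable(init, target, thread_ts, spawn_ts, max_threads):
    """Breadth-first search over states with at most max_threads threads."""
    seen, queue = {init}, deque([init])
    while queue:
        state = queue.popleft()
        if covers(state, target):
            return True
        for nxt in successors(state, thread_ts, spawn_ts, max_threads):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Toy mutex: shared state 1 means "locked"; local 0 = idle, 1 = critical, 2 = done.
THREAD_TS = [((0, 0), (1, 1)), ((1, 1), (0, 2))]  # acquire, release
SPAWN_TS = [((0, 0), (0, 0))]                     # an idle thread spawns an idle thread
INIT = (0, (0,))
```

In this toy system the target with two threads in the critical section is not covered within the bound, which corresponds to the mutual exclusion property one would want to verify.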

Multithreaded Programs Safety Verification
In this section, we introduce a new method for the safety verification of multithreaded programs. The input source code is translated into TTS by using SatAbs [34]. Then, we propose a novel TTS coverability analysis algorithm to verify the safety properties. Finally, the implementation details are described.

The Input Languages.
Most popular programming languages, such as Java and C/C++, embrace concurrent programming via their thread class or pthread APIs, respectively. In this paper, we focus on pthread-style multithreaded ANSI-C programs. ANSI-C is one of the most popular programming languages for safety-critical embedded software.
Mobile crowdsourcing networks consist mostly of embedded devices that rely on multithreaded programs to support crowdsourcing computing and communications. SatAbs is an SAT-based model checker based on predicate abstraction and can be used to model ANSI-C programs in the TTS format. We follow the introduction on the SatAbs website (http://www.cprover.org/satabs/) to transform the ANSI-C programs into TTS. All ANSI-C programs can be completely translated into Boolean programs by SatAbs. The safety properties are preserved during the formalization process.

IC3-Based Thread Transition System Coverability Analysis Algorithm.
IC3 is SAT-based and computes inductive overapproximations of reachable sets. Let $I$ and $P$ be the initial states and the property states, respectively, and let $R$ denote the transition relation over the current and next states. IC3 maintains a trace $[F_0, F_1, \ldots, F_k]$. The first element $F_0$ is the set of initial states. For $i > 0$, $F_i$ is a set of clauses that, ANDed together, represent an overapproximation of the states reachable from the initial states in $i$ steps or fewer. $F_i \to F_{i+1}$, and the clauses of $F_{i+1}$ are a subset of those of $F_i$, except for $i = 0$. The IC3 algorithm terminates when a counterexample is found or an inductive proof $F_i$ is obtained.
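For reference, the trace properties described above are commonly stated as the following invariants of IC3. The symbols are assumptions made for this summary: $I$ for the initial states, $P$ for the property, $R$ for the transition relation, $F_i$ for the $i$-th trace element, and primes for next-state variables. This is the usual textbook formulation, not a transcription of the paper's own notation.

```latex
\begin{align*}
  & F_0 = I, \\
  & F_i \Rightarrow F_{i+1}              && 0 \le i < k, \\
  & F_i \wedge R \Rightarrow F'_{i+1}    && 0 \le i < k, \\
  & F_i \Rightarrow P                    && 0 \le i \le k.
\end{align*}
% Termination with a proof: if F_i = F_{i+1} for some i, then F_i is an
% inductive invariant that contains I and implies P.
```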
This section develops our new algorithm, TTSCov, which is based on the IC3 algorithm. For a target set $v_f$ and $i \in \mathbb{N}$, $W(v_f, i)$ denotes the set $v_f$ limited to weight $i$. Starting from the base weight of $v_f$, the algorithm computes an overapproximation $O_i$ of the backward reachable set. $O_i$ is an inductive overapproximation of the states from which $v_f$ is reachable along a path that never exceeds weight $i$. We use IC3, an SAT-based finite state model checker, to compute this weight-limited, inductive overapproximation.
As shown in Algorithm 1, the input is a TTS $M$, a set of initial states $I$, and a $\preceq$-upward-closed set of target states $v_f$. The variable $i$ is the current weight bound, which is initially the base weight of $v_f$ and increases by 1 in each loop iteration. $O_i$ is an overapproximation of $W(v_f, i)$, which is initially set to $W(v_f, \mathit{bw}(v_f))$. In each iteration, $O_i$ is computed by the IC3 engine as an overapproximation of $O_{i-1}$ bounded by the weight $i$. If $O_i$ intersects the initial states $I$, a counterexample is found and the algorithm terminates. In line (6), we check whether $O_i$ and $O_{i-1}$ are equal; if not, the variable $j$ is assigned the current $i$. If the condition of line (6) fails $k$ times consecutively, we have $O_{j+k} = O_{j+k-1} = \cdots = O_j$, and thus the verification is successful.
The condition in line (8) can be replaced by the semantic check $O_i \not\equiv O_{i-1}$, and the algorithm also works correctly with a syntactic equality check. The syntactic check is more efficient, because the IC3 algorithm builds the frames incrementally and the clauses of $O_i$ are a subset of those of $O_{i-1}$. To exploit this feature, a Boolean variable can record whether new clauses are added to $O_i$ when the overapproximation is computed at line (3). This speeds up the algorithm considerably.
The main routine in Algorithm 1 is the while-loop. In each iteration, the IC3 engine computes the overapproximation by using the SAT solver. The algorithm terminates when it finds a counterexample at line (5) or proves safety at line (13).
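The control flow of this loop can be sketched as follows. The sketch is only a skeleton under stated assumptions: the IC3 call is replaced by a caller-supplied function `refine`, the test against the initial states by `hits_initial`, and the number of consecutive unchanged rounds required for a safety verdict by `stable_rounds`. The guard `max_weight` is added here so that the sketch always terminates; none of these names come from Algorithm 1, which is not reproduced in this text.

```python
def ttscov_loop(base_weight, initial_approx, refine, hits_initial,
                stable_rounds, max_weight):
    """Skeleton of a weight-increasing loop around an overapproximation engine.

    refine(prev, i)   -> new overapproximation at weight bound i (stand-in for IC3)
    hits_initial(cur) -> True if cur intersects the initial states
    """
    i = base_weight
    prev = initial_approx
    unchanged = 0
    while i <= max_weight:
        cur = refine(prev, i)
        if hits_initial(cur):
            return ("unsafe", i)  # counterexample found at weight bound i
        # Count how many consecutive rounds left the overapproximation unchanged.
        unchanged = unchanged + 1 if cur == prev else 0
        if unchanged >= stable_rounds:
            return ("safe", i)
        prev = cur
        i += 1
    return ("unknown", max_weight)  # guard reached; not part of the paper's loop


# Toy stand-ins: overapproximations are frozensets of integers, and the
# "initial state" is the integer 0.
def contains_zero(cur):
    return 0 in cur


SAFE_RUN = ttscov_loop(1, frozenset({1}), lambda prev, i: prev | {min(i, 3)},
                       contains_zero, stable_rounds=2, max_weight=10)
UNSAFE_RUN = ttscov_loop(1, frozenset({1}), lambda prev, i: prev | {3 - i},
                         contains_zero, stable_rounds=2, max_weight=10)
```

Comparing `cur` and `prev` with `==` plays the role of the syntactic check; with a real IC3 engine the same decision could be taken from a flag that records whether any new clause was added in the current round.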

Implementation.
A TTS with thread and spawn transitions can be expressed as a plain Petri net. Zhang et al. [35] introduced a method to bound a Petri net so that it becomes a finite state machine (FSM). Inspired by Zhang's work, this section describes in detail how to bound the TTS into an FSM.
The TTS Format. The input models of multithreaded programs are encoded in the TTS format (http://www.cprover.org/bfc/). Each shared or local state is mapped to a shared or local variable. Exactly one shared variable can be assigned "1," and each local variable can be assigned an arbitrary natural number. A transition is either a thread transition or a spawn transition, which describes how the thread state changes.
The FSM Format: AIGER. AIGER is a format, library, and set of utilities for And-Inverter Graphs (AIGs) (http://fmv.jku.at/aiger/). AIGs are a good way to describe an FSM and can be translated into propositional logic for a SAT solver. The bounded TTS is encoded as an AIG model in which each shared and local state corresponds to state variables. Extra input variables are introduced to select which rule fires, and the state variables are then updated accordingly to establish the transition relation.
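To give a feel for how an FSM looks in this format, the following sketch writes the ASCII variant of AIGER (`aag`). In that variant, variable $v$ is represented by the literal $2v$ and its negation by $2v + 1$; the header lists the maximum variable index followed by the numbers of inputs, latches, outputs, and AND gates. The helper `to_aag` is hypothetical and covers only these basic sections; it is not the encoder used by TTSCov.

```python
def to_aag(inputs, latches, outputs, ands):
    """Serialize a small AIG to ASCII AIGER text.

    inputs  : list of input literals (even numbers)
    latches : list of (current_literal, next_literal) pairs
    outputs : list of output literals
    ands    : list of (lhs, rhs0, rhs1) triples, meaning lhs = rhs0 AND rhs1
    """
    defined = (list(inputs)
               + [cur for cur, _ in latches]
               + [lhs for lhs, _, _ in ands])
    max_var = max((lit // 2 for lit in defined), default=0)
    lines = [f"aag {max_var} {len(inputs)} {len(latches)} "
             f"{len(outputs)} {len(ands)}"]
    lines += [str(lit) for lit in inputs]
    lines += [f"{cur} {nxt}" for cur, nxt in latches]
    lines += [str(lit) for lit in outputs]
    lines += [f"{lhs} {r0} {r1}" for lhs, r0, r1 in ands]
    return "\n".join(lines) + "\n"


# A one-latch FSM that toggles in every step: the next state of latch 2 is
# its negation (literal 3); both polarities are exposed as outputs.
TOGGLE = to_aag(inputs=[], latches=[(2, 3)], outputs=[2, 3], ands=[])

# A purely combinational AND of two inputs.
AND_GATE = to_aag(inputs=[2, 4], latches=[], outputs=[6], ands=[(6, 2, 4)])
```

In an encoding of a bounded TTS, the latches would hold the shared and local state bits and the inputs would select the transition to fire, as described above.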

Shared and Local Variables.
An $|S|$-bit vector is used to encode all shared variables, since exactly one shared variable is assigned "1" at any time. For each local variable, a unary encoding is used to encode the natural numbers; the literature [35] shows that one-hot encoding is one possible unary encoding. As shown in Table 1, thermometer encoding is another unary encoding and performs well with incremental SAT solvers. In this paper, thermometer encoding is used to encode the local variables.
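The difference between the two unary encodings can be illustrated with a short sketch. The bit conventions below are common ones and are assumed here, since Table 1 is not reproduced in this text: one-hot sets exactly the bit at position $n$, while thermometer sets the $n$ lowest bits. The function names are hypothetical.

```python
def one_hot(n, width):
    """One-hot code: only bit n is set (positions 0 .. width-1)."""
    if not 0 <= n < width:
        raise ValueError("value does not fit in the given width")
    return [1 if k == n else 0 for k in range(width)]


def thermometer(n, width):
    """Thermometer code: the n lowest bits are set (values 0 .. width)."""
    if not 0 <= n <= width:
        raise ValueError("value does not fit in the given width")
    return [1 if k < n else 0 for k in range(width)]


def is_thermometer(bits):
    """Structural constraint: a set bit implies that every lower bit is set."""
    return all(bits[k] >= bits[k + 1] for k in range(len(bits) - 1))


def thermometer_value(bits):
    """Decode a well-formed thermometer code back to its natural number."""
    if not is_thermometer(bits):
        raise ValueError("not a thermometer code")
    return sum(bits)
```

With the thermometer code, increasing a counter changes a single bit and the bits already set stay set, which is one plausible reason it interacts well with incremental SAT solving; the structural constraint checked by `is_thermometer` is the kind of side condition that has to be added to the model.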
A full adder is used to bound the total number of threads, and the logic is the same as described in [35]. The total number of threads is the sum of the values of all local variables. Because thermometer encoding is used for the local variables, its structural information is also added to the AIG model as a constraint.
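The counting step can be sketched at the bit level as follows. The sketch assumes thermometer-coded local variables, so the value of each variable is the number of its set bits, and it accumulates all bits with a ripple-carry chain of full adders. The exact circuit of [35] is not reproduced here; the layout, the little-endian bit order, and the function names are assumptions for illustration.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder on 0/1 values: returns (sum, carry_out)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return s, carry_out


def ripple_add(x, y):
    """Add two little-endian bit lists of equal length; result has one extra bit."""
    out, carry = [], 0
    for a, b in zip(x, y):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out + [carry]


def from_bits(bits):
    """Little-endian bit list to integer."""
    return sum(b << k for k, b in enumerate(bits))


def thread_count_bits(local_codes, width):
    """Binary thread count: add every thermometer bit into a width-bit accumulator.

    width must be large enough to hold the total; overflow is truncated.
    """
    acc = [0] * width
    for code in local_codes:
        for bit in code:
            acc = ripple_add(acc, [bit] + [0] * (width - 1))[:width]
    return acc


def within_bound(local_codes, bound, width):
    """True if the total number of threads does not exceed the bound."""
    return from_bits(thread_count_bits(local_codes, width)) <= bound


# Three local variables holding 2, 1, and 0 threads in thermometer code.
CODES = [[1, 1, 0], [1, 0, 0], [0, 0, 0]]
```

In an AIG, each `full_adder` call would become a handful of AND gates and inverters, and the final comparison against the bound would be a constraint on the accumulator bits.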

Experimental Evaluation
We have implemented our algorithm in a tool named TTSCov. TTSCov is implemented in C++, and all input instances are encoded in TTS format. Petri net tools are run by converting the TTS instances into the MIST format (https://github.com/pierreganty/mist). Most crowdsourcing programs are written in pthread-style multithreaded ANSI-C. SatAbs is the front end of TTSCov, which translates the input ANSI-C programs into TTS.
To measure TTSCov's performance, we compare it with state-of-the-art tools: MIST, IIC [21], BFC [11], Petrinizer [29], and TSE [32]. All experiments are performed on a 3.4 GHz Intel processor with 16 GB of memory, running 64-bit Linux. The CPU time is limited to 1 hour and the memory to 10 GB. The second suite comes from the provenance analysis of messages in a medical system and contains 12 safe instances. The third suite comes from the provenance analysis of messages in a bug tracking application and contains 40 safe instances and 1 unsafe instance. The fourth suite contains 46 instances that are used to evaluate the BFC tool (http://www.cprover.org/bfc). Those instances are generated from concurrent C programs in TTS format. They are mostly unsafe; just 2 instances are safe. The fifth suite contains 50 instances that come from the Erlang verification tool Soter [36]; those examples can be found on Soter's website (http://mjolnir.cs.ox.ac.uk/soter). Out of the 50 instances in this suite, 38 are safe. This suite contains the largest example in the collection, with 66,950 places and 213,635 transitions.

Rate of Success on All Instances.
We run two different versions of BFC, v1.0 and v2.0. The BFC tool has 3 modes: backward, forward, and concurrent. In concurrent mode, two threads run the backward and forward searches in parallel. We find a potential bug in BFC v2.0 in concurrent mode: the tool gets stuck when switching threads. We run each instance 10 times in concurrent mode and show the median results in Figure 1.
Petrinizer is an SMT-based tool with 8 parameters, and it may return a wrong answer for some examples. If Petrinizer returns a wrong result in any one configuration, we count the result as wrong. All five algorithms in the MIST toolkit are compared.
Figure 1 shows that TTSCov performs better than the other complete tools: it solves 147 instances in total, the same number as Petrinizer, which is incomplete. For safe instances, TTSCov solves 91 out of 115, and 22 instances time out. Most importantly, just 2 instances exceed the memory limit, which supports the claim that IC3 uses less memory. Petrinizer solves 84 instances, with 1 timeout. For some unsafe instances in the fourth suite, the forward thread of BFC v2.0 in concurrent mode found the counterexample but got stuck when switching threads until the timeout. Moreover, 3 examples run out of memory when running BFC v2.0 in concurrent mode. IIC and Backward perform almost the same: each solves about 50 instances, with 1 out of memory and the rest timing out. EEC, TSI, IC4PN, and CEGAR do not perform well on unsafe instances, especially CEGAR, which solves only 2 unsafe instances.
In brief, TTSCov performs well on both safe and unsafe instances, especially in memory usage. Petrinizer performs well in time and memory usage on all instances but reports wrong answers for some safe instances. TSE performs nearly the same as TTSCov but, like Petrinizer, is incomplete. BFC v1.0 and v2.0 are good at unsafe cases, apart from the potential bug in BFC v2.0's concurrent mode. IIC and MIST perform about the same but use more memory.

Tools Combination.
Petrinizer and TSE are incomplete but perform excellently in time and memory usage. We combine Petrinizer and TSE with each of the other tools; the total number of solved instances is shown in Table 2.
Petrinizer solves 147 instances alone, but 30 instances return wrong or partly wrong results. For BFC, we compare BFC v1.0 and BFC v2.0 in all three modes and then choose the best one. MIST stands for the algorithm that performs best among the five algorithms in the MIST toolkit. When BFC is combined with Petrinizer, 159 instances are solved. IIC and MIST solve 164 and 167 instances, respectively, when working together with Petrinizer. Our tool TTSCov solves 177 instances when combined with Petrinizer.
TSE solves 146 instances in total. When it is combined with the other tools, the total number of solved instances is the same as with Petrinizer. More importantly, TTSCov can solve all collected instances when combined with Petrinizer or TSE, except one instance from the Soter suite, which none of the tools can handle.

Memory Usage Evaluation.
To show the memory usage of TTSCov, we compare it with MIST, IIC, and BFC. Figure 2 shows that TTSCov is efficient in memory usage, owing to the use of IC3 as the back-end engine. TTSCov solves nearly 97.3% of the instances within 1 GB of memory. About two-thirds of the instances can be solved within 2 GB by all tools, but TTSCov and BFC perform better than MIST and IIC on large instances. We found a bug when running BFC v2.0 on some instances from the fourth suite: the tool has a segmentation fault for two instances, and the author has confirmed the bug. The instances with segmentation faults are marked as out of memory. Petrinizer and TSE are indeed memory-efficient but incomplete; therefore, we do not compare with them. TTSCov is based on the IC3 engine, which solves the verification problem without unrolling the transition relation. This is the main reason why TTSCov performs well in memory usage. In conclusion, TTSCov is efficient in memory usage, especially for huge instances.

Conclusion and Future Works
This paper introduces an IC3-based algorithm to verify the safety properties of multithreaded programs in mobile crowdsourcing networks. The pthread-style multithreaded program is modeled as a TTS. Then a state-of-the-art SAT-based model checking algorithm is used to verify the safety properties by computing a series of overapproximations of reachability with IC3. The results show that our new approach solves more instances and compares favorably against several recently published approaches. Because it uses IC3 as the back-end engine, our method is notable for its lower memory consumption. Combining tools is a good direction for verifying multithreaded programs in more complex mobile crowdsourcing networks. Parallel programming will be a good way to speed up the TTSCov algorithm.

Figure 1: Total instances. All 178 instances are separated into safe and unsafe according to their actual results. There are 115 safe instances and 63 unsafe instances. TTSCov is compared with the state-of-the-art tools: MIST, IIC, BFC, Petrinizer, and TSE. There are two versions of the BFC tool, and each version has three modes. $\mathrm{BFC}^{c}_{2}$ denotes BFC v2.0 run in concurrent mode; $f$ and $b$ denote forward and backward, respectively. The MIST toolkit implements five algorithms: Backward, EEC, TSI, IC4PN, and CEGAR.

Figure 2: Comparison of TTSCov with MIST, IIC, and BFC for memory usage. BFC stands for BFC 2. We ran both versions of BFC in all three modes and found that BFC 2 performs the best in the memory usage test. The MIST entry represents the algorithm with the best memory usage among the five algorithms in the MIST toolkit. For all timeout instances, we also show the memory usage in this figure.
However, Petrinizer returns unsafe results for 30 of the safe instances. Of these 30 wrong results, 19 instances return a wrong answer with all 8 configurations, and the others are partly wrong. TSE solves 84 instances, with 10 out of memory and 21 timeouts. BFC 2 solves 67 examples in total, whereas 30 examples time out and 18 instances run out of memory. BFC 2 performs the same as BFC 2, solving 66 instances, but with more timeouts. BFC 1 solves 53 out of 115 safe instances, with 61 examples over the memory limit and just one timeout. BFC 1 solves 47 safe instances, with 9 over the time limit and 59 over the memory limit. The forward algorithm performs worst in both BFC v1.0 and BFC v2.0: 34 and 35 instances are solved by BFC 2 and BFC 1 in forward mode, respectively. There are 53 timeouts for both tools. BFC 2 has 28 examples out of memory, and BFC 1 runs out of memory for 27 examples. IIC solves 47 instances, with 24 timeouts and 44 out of memory. Of the five algorithms in the MIST toolkit, EEC solves 56 instances, with 13 timeouts and 46 out of memory. The Backward algorithm performs the same as EEC, with 5 more timeouts. TSI and IC4PN solve about 30 safe instances, but CEGAR solves just 8.
For unsafe instances, TTSCov solves 56 examples out of 63, with no out-of-memory cases but 7 timeouts. Petrinizer and BFC perform well on those suites. Petrinizer, BFC 1, and BFC 1 solve all 63 unsafe instances. TSE solves 62 instances, with just one out of memory. BFC 1 has 4 timeouts and 4 out of memory. BFC 2 has 8 instances over the time or memory limit, and BFC 2 has just 2 instances out of memory. BFC 2 performs worse, solving 30 instances, because of the potential bugs in concurrent mode.