An In-Place Simplification on Mixed Boolean-Arithmetic Expressions

A mixed Boolean-arithmetic (MBA) expression, which involves both bitwise operations (e.g., NOT, AND, and OR) and arithmetic operations (e.g., +, −, and ∗), is a software obfuscation scheme. In response, multiple methods have been proposed to simplify MBA expressions. Among them, table-based solutions are the most powerful. However, a fundamental limitation of table-based solutions is that the space complexity of the transformation table explodes with the number of variables in the MBA expression. In this study, we propose a novel method to simplify MBA expressions without any precomputation. First, a bitwise expression can be transformed into a unified form, and we provide a mathematical proof to guarantee the correctness of this transformation. Then, arithmetic reduction is performed to further simplify the expression and produce a concise result. We implement the proposed scheme as an open-source tool, named MBA-Flatten, and evaluate it on two comprehensive benchmarks. The evaluation results show that MBA-Flatten is a general and effective MBA simplification method. Furthermore, MBA-Flatten can assist malware analysis and boost SMT solvers' performance on solving MBA equations.


Introduction
A mixed Boolean-arithmetic (MBA) expression [1,2] is an expression that mixes bitwise operations (e.g., ∼, &, |, and ^) and arithmetic operations (e.g., +, −, and *). Several formal methods [1,3] have been designed to generate a complex MBA expression that is equivalent to a simple expression. Because it can replace a simple expression with an equivalent representation that is hard to understand, MBA expression is an advanced software obfuscation scheme [3][4][5]. MBA obfuscation has been adopted by many academic projects and industrial products to protect software [5][6][7][8][9]. Its wide practical application has attracted research on simplifying MBA expressions. Recent studies [10,11] demonstrate that existing computer algebra software has a very limited effect on MBA simplification. Consequently, multiple methods have been proposed to simplify MBA expressions, including bit blasting [12], pattern matching [13], program synthesis [14][15][16], deep learning [17,18], and table-based solutions [5,11]. Among them, table-based solutions are the state-of-the-art MBA simplification method. However, one strong limitation is that the complexity of creating and storing the precomputed table is O(2^(2^t)), where t is the number of variables in the MBA expression. Thus, producing and storing the tables incurs an overwhelming overhead for any t ≥ 5.
In this study, we propose a novel scheme to simplify an MBA expression without any precomputation. The key idea is that a transformation procedure can be used to reduce a bitwise expression to a unified form, and a mathematical proof is provided to guarantee the correctness of the transformation. Then, arithmetic reduction is performed to further simplify the expression and generate the final result. We implement the approach as an open-source tool, named MBA-Flatten. To demonstrate the capability of MBA-Flatten, we evaluate it on two comprehensive MBA benchmarks. The evaluation results show that MBA-Flatten outperforms existing tools in terms of the number of solved MBA expressions. Due to the low-cost arithmetic computation, MBA-Flatten is also an effective MBA simplification tool. In addition, the evaluation demonstrates that MBA-Flatten can assist malware analysis and boost SMT solving on MBA equations.
In summary, this study makes the following key contributions:
(1) We find that a bitwise expression can be transformed into a unified form and provide a mathematical proof to support it. To the best of our knowledge, we are the first to prove the existence of the transformation.
(2) The bitwise expression transformation paves the way for our novel in-place MBA simplification method. Our proposed scheme first replaces the bitwise expressions with the corresponding equivalent form. In this way, arithmetic reduction rules can be seamlessly applied to further produce the simplification result.
The remainder of this study is structured as follows. Section 2 shows the background of MBA expression. Section 3 illustrates the proposed scheme that can be used to simplify an MBA expression. The proof of Theorem 1 can be found in Section 4. In Section 5, we describe the experimental evaluation of the proposed approach. Section 6 discusses some limitations of our proposed scheme, and Section 7 concludes this study.

Related Work
In this section, we first introduce the background of MBA expression and its wide applications. Then, we discuss the existing research on simplifying MBA expressions and point out its limitations, which also serve as the motivation for this study.

MBA Expression.
Zhou et al. [1,2] propose the concept of the mixed Boolean-arithmetic (MBA) expression based on Boolean-arithmetic algebra, which mixes bitwise operators (e.g., NOT, AND, and OR) and arithmetic operations (e.g., +, −, and *). MBA expressions are classified as linear MBA, polynomial MBA, and non-polynomial MBA [1,11]. The formal definitions of linear and polynomial MBA expressions are given below; linear MBA expressions are a subset of polynomial MBA expressions [1]. An MBA expression that fails to satisfy Definition 1 is considered a non-polynomial MBA expression [11].
Definition 1. (Zhou [1]). A polynomial MBA expression is of the form
$$\sum_{i \in I} a_i \left( \prod_{j \in J_i} e_{i,j}(x_1, \ldots, x_t) \right) \qquad (1)$$
where a_i is an integer constant, e_{i,j} is a bitwise expression of variables x_1, ..., x_t over B^n, B = {0, 1}, n and t are positive integers, and I, J_i ⊂ Z for all i ∈ I.
Definition 2. (Zhou [1]). A linear MBA expression is a polynomial MBA expression of the form
$$\sum_{i \in I} a_i \, e_i(x_1, \ldots, x_t) \qquad (2)$$
where a_i is an integer constant, e_i is a bitwise expression of variables x_1, ..., x_t over B^n, B = {0, 1}, n and t are positive integers, and I ⊂ Z.
Zhou et al. [1] design a generator using truth tables to produce infinite linear MBA equations. Based on existing linear MBA rules, Liu et al. [3] propose several formal methods to generate an unlimited number of polynomial and non-polynomial MBA expressions. Examples of MBA expressions are shown below. In particular, (3) is a linear MBA expression, (4) is a polynomial MBA expression, and (5) is a non-polynomial MBA expression.
Due to its solid theoretical foundation and simplicity of implementation, MBA expression has been applied in multiple academic tools and industrial products to protect software [5][6][7][8][9]. For example, Cloakware, Irdeto, and Quarkslab apply MBA obfuscation in their commercial products [5,7]. Tigress [6], an academic C source code obfuscator, encodes simple expressions into complex MBA forms. Blazy et al. [8] develop a C program obfuscator in which formally verified MBA obfuscation rules are integrated. Ma et al. [9] apply MBA expressions to develop a novel dynamic software watermarking scheme. Figure 1 shows how MBA expressions are used to obfuscate software [4]. Figure 1(a) demonstrates that the expression x + y is substituted with a complex but equivalent expression. The opaque predicate [19] is shown in Figure 1(b), and the predicate (x * y == (x & y) * (x | y) + (x & ∼y) * (∼x & y)) is actually always true.
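The opaque predicate of Figure 1(b) can be checked empirically. The following is a minimal sketch, not part of the original tooling: the function names are hypothetical, and Python's unbounded integers are used in place of fixed-width machine words (an identity that holds over the integers also holds modulo 2^n).

```python
def opaque_predicate(x: int, y: int) -> bool:
    """Evaluate the predicate of Figure 1(b) for one pair of inputs."""
    return x * y == (x & y) * (x | y) + (x & ~y) * (~x & y)


def holds_for_all(bits: int = 6) -> bool:
    """Exhaustively test the predicate on all pairs of `bits`-wide values."""
    limit = 1 << bits
    return all(
        opaque_predicate(x, y)
        for x in range(limit)
        for y in range(limit)
    )


if __name__ == "__main__":
    print(holds_for_all())
```

Exhaustive testing at a small width is only evidence, not a proof; a solver-backed check is what the paper relies on for its evaluation.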

MBA Expression Simplification.
The wide practical application of MBA obfuscation has encouraged research on simplifying MBA expressions. Eyrolles' PhD thesis [10] shows that popular symbolic computation software (Maple, SageMath, Wolfram Mathematica, and Z3 [20]) fails to simplify MBA expressions. The root cause is that existing reduction rules cannot reduce expressions that mix bitwise and arithmetic operators [11]. Researchers have developed multiple solutions to simplify MBA expressions, including bit blasting [12], pattern matching [13], program synthesis [14][15][16], and deep learning [17,18]. While promising, these simplification methods are still in their infancy: they either suffer from high performance penalties or produce many incorrect simplification results.
To effectively reduce MBA expression, researchers investigate the MBA mechanism and propose table-based solutions. Liu et al. [5] prove a two-way feature in the MBA transformation and design a two-variable transformation table to simplify MBA expression. Xu et al. [11] create multiple semantic-preserving transformation tables, which enumerate all bitwise expressions and the corresponding simplified forms. Using these transformation tables, MBA-Solver can effectively simplify an MBA expression.
So far, table-based solutions are the state-of-the-art MBA simplification methods. However, the space complexity of the transformation table is O(2^(2^t)), where t is the number of variables in the MBA expression. Therefore, table-based solutions do not scale to MBA expressions involving five or more variables. Here, (6) is a linear MBA expression with five variables, and table-based solutions fail to simplify it. Note that multiple methods have been proposed to generate an unlimited number of MBA expressions [1,3], and thus, an emerging challenge for MBA simplification is the MBA expression with five or more variables.
f(x, y, z, t, a) = −((∼(x∨y∨t))∧(∼a))
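To make the growth of the transformation table concrete, the short sketch below evaluates 2^(2^t) for small t. It assumes, as an illustration only, that a table needs one entry per distinct truth table over t variables; the function name is hypothetical and not taken from any of the cited tools.

```python
def table_entries(t: int) -> int:
    """Number of distinct truth tables over t Boolean variables: 2 ** (2 ** t)."""
    return 2 ** (2 ** t)


if __name__ == "__main__":
    for t in range(1, 6):
        # Each extra variable squares the number of entries.
        print(f"t = {t}: {table_entries(t)} entries")
```

Under this assumption, moving from four to five variables raises the entry count from 65,536 to more than four billion, which is consistent with the scalability limit described above.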

The Proposed Scheme
To reduce MBA expressions, we first present a key finding: a bitwise expression can be transformed into a unified form. This finding paves the way for our novel in-place MBA simplification scheme, MBA-Flatten.

Bitwise Expression Transformation.
A bitwise expression is denoted as e(x_1, ..., x_t) of variables x_k ∈ B^n, k = 1, ..., t. The transformation T is defined in Equation (7), where x_i, x_j ∈ B^n. Equation (7) can be recursively applied to transform a bitwise expression e(x_1, ..., x_t) into an arithmetic expression denoted as T(e), given in Equation (8), where a_{i_1···i_k} and a_e are integers determined by e. After replacing every product (x_{i_1} * ··· * x_{i_k}) in T(e) with the conjunction (x_{i_1} & ··· & x_{i_k}), (8) is reduced to R(e), given in Equation (9). An instance of the above transformation procedure is shown in Example 1. One interesting observation is that there is a gap between e and R(e), because ∼x_1 | x_2 is equal to the expression −x_1 + (x_1 & x_2) − 1 rather than −x_1 + (x_1 & x_2) + 1.
Example 1. For a bitwise expression e = ∼x_1 | x_2, we have R(e) = −x_1 + (x_1 & x_2) + 1.
Moreover, Theorem 1 shows that the gap between a bitwise expression e and the corresponding R(e) is actually a constant value, 0 or −2. In other words, a bitwise expression e can be successfully reduced to a unified form, Equation (10). Theorem 1 can be proved by induction on the number of bitwise operators in the bitwise expression e(x_1, ..., x_t). For the detailed proof of the theorem, refer to Section 4 of this study.
Theorem 1. Let n, t be positive integers, e(x_1, ..., x_t) be a bitwise expression of variables x_k ∈ B^n, k = 1, ..., t, and F(e) = R(e) − 2 * a_e, which has the unified form of Equation (10). Then, F(e) ≡ e with a_e = 0 or 1.
By this theorem, Example 2 shows that the bitwise expression ∼(x | ∼y) is reduced to (y − (x & y)).
Example 2. For a bitwise expression e = ∼(x | ∼y), we have F(e) = y − (x & y). (12)
The procedures introduced so far are integrated into Algorithm 1. The algorithm takes a bitwise expression e as the input and outputs the transformation result F(e). Algorithm 1 applies arithmetic computation to transform a bitwise expression, so it does not introduce extra memory cost to maintain the heap or precomputed tables.
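The two examples and the constant gap stated by Theorem 1 can be spot-checked numerically. The sketch below is illustrative only: the function name is hypothetical, x and y stand for x_1 and x_2, and Python's unbounded integers stand in for n-bit words.

```python
def check_examples(bits: int = 6) -> bool:
    """Check Examples 1 and 2 on all pairs of `bits`-wide inputs."""
    limit = 1 << bits
    for x in range(limit):
        for y in range(limit):
            # Example 1: e = ~x | y, R(e) = -x + (x & y) + 1.
            e1 = ~x | y
            r1 = -x + (x & y) + 1
            if e1 - r1 != -2:          # the gap is -2, i.e. a_e = 1
                return False
            if e1 != r1 - 2 * 1:       # F(e) = R(e) - 2 * a_e
                return False
            # Example 2: e = ~(x | ~y), F(e) = y - (x & y), i.e. a_e = 0.
            if ~(x | ~y) != y - (x & y):
                return False
    return True


if __name__ == "__main__":
    print(check_examples())
```

Both cases of the theorem appear here: the first expression evaluates to all ones on the zero input (a_e = 1), while the second evaluates to zero (a_e = 0).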

Simplifying MBA Expression.
As noted above, Algorithm 1 can transform a bitwise expression e into a unified form. Using Algorithm 1, we will discuss how to simplify linear, polynomial, and non-polynomial MBA expressions.
We first introduce how to simplify a linear MBA expression. According to Equation (2), a linear MBA expression is essentially a linear combination of bitwise expressions. Using Algorithm 1, the bitwise expressions in (2) are first substituted with the corresponding transformation results. After combining like terms, (2) is reduced to the simple form of Equation (13), which indicates that a linear MBA expression can be simplified to a concise form with at most 2^t terms, where t is the number of variables in the MBA expression. Example 3 shows that a complex linear MBA expression can be reduced to a simple result (x + y).
Example 3. For the MBA expression in Figure 1(a), applying the procedure above yields the simple result (x + y).
Inspired by the above simplification procedure, using Algorithm 1, (1) can be transformed into an equivalent form in which every bitwise expression is replaced by its unified form. The following example shows how to simplify a polynomial MBA expression. First, every bitwise expression is substituted with the equivalent form; e.g., (x & ∼y) is replaced with (x − (x & y)). Then, arithmetic reduction rules are performed to produce the simplification result (x * y). Note that a linear MBA expression is also polynomial, so the polynomial MBA simplification method can also reduce a linear MBA expression.
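The substitute-then-combine procedure for linear and polynomial MBA expressions can be sketched in a few dozen lines. This is an illustrative reconstruction, not MBA-Flatten's code: it assumes the one-bit rules ∼a → 1 − a, a & b → a * b, a | b → a + b − a * b, and a ^ b → a + b − 2 * a * b (chosen because they reproduce Example 1), and it uses a hand-written polynomial representation where the paper uses SymPy. All function names are hypothetical. A result is a dictionary from monomials to integer coefficients; a monomial is a tuple of conjunctions, and each conjunction is a tuple of variable names.

```python
import ast


def _add(p, q, k=1):
    """Return p + k*q for polynomials stored as {monomial: coefficient}."""
    out = dict(p)
    for m, c in q.items():
        out[m] = out.get(m, 0) + k * c
    return {m: c for m, c in out.items() if c != 0}


def _mul(p, q, join):
    """Multiply two polynomials; `join` merges two monomials into one."""
    out = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            m = join(m1, m2)
            out[m] = out.get(m, 0) + c1 * c2
    return {m: c for m, c in out.items() if c != 0}


def _bitwise(node):
    """One-bit arithmetic image of a pure bitwise expression.
    A monomial is a frozenset of variables, read as their conjunction."""
    if isinstance(node, ast.Name):
        return {frozenset([node.id]): 1}
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Invert):
        return _add({frozenset(): 1}, _bitwise(node.operand), -1)  # ~a -> 1 - a
    if isinstance(node, ast.BinOp) and isinstance(
        node.op, (ast.BitAnd, ast.BitOr, ast.BitXor)
    ):
        a, b = _bitwise(node.left), _bitwise(node.right)
        ab = _mul(a, b, frozenset.union)  # x * x collapses to x, as x & x == x
        if isinstance(node.op, ast.BitAnd):
            return ab
        k = -1 if isinstance(node.op, ast.BitOr) else -2
        return _add(_add(a, b), ab, k)    # a|b -> a+b-ab, a^b -> a+b-2ab
    raise ValueError("unsupported node inside a bitwise expression")


def _unified(node):
    """Unified form of a bitwise expression: the constant a_e becomes -a_e."""
    out = {}
    for mono, c in _bitwise(node).items():
        if mono:
            out[(tuple(sorted(mono)),)] = c
        else:
            out[()] = -c                  # R(e) - 2 * a_e turns a_e into -a_e
    return out


def _simplify(node):
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return {(): node.value} if node.value else {}
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return _add({}, _simplify(node.operand), -1)
    if isinstance(node, ast.BinOp) and isinstance(
        node.op, (ast.Add, ast.Sub, ast.Mult)
    ):
        a, b = _simplify(node.left), _simplify(node.right)
        if isinstance(node.op, ast.Mult):
            return _mul(a, b, lambda m1, m2: tuple(sorted(m1 + m2)))
        return _add(a, b, 1 if isinstance(node.op, ast.Add) else -1)
    return _unified(node)                 # a maximal pure bitwise subtree


def simplify_mba(expr):
    """Simplify a polynomial MBA expression written in Python syntax."""
    return _simplify(ast.parse(expr, mode="eval").body)


def evaluate(poly, env):
    """Evaluate a simplified polynomial on concrete integer inputs."""
    total = 0
    for mono, c in poly.items():
        term = c
        for conj in mono:
            value = -1                    # all ones, the identity for &
            for name in conj:
                value &= env[name]
            term *= value
        total += term
    return total


if __name__ == "__main__":
    print(simplify_mba("2*(x|y) - (~x&y) - (x&~y)"))     # encodes x + y
    print(simplify_mba("(x&y)*(x|y) + (x&~y)*(~x&y)"))   # encodes x * y
```

The first input is an illustrative linear MBA expression, not the one in Figure 1(a); the second is the polynomial expression from the opaque predicate of Figure 1(b). No table is consulted at any point, so the memory used depends only on the expression being processed.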
For a non-polynomial MBA expression, we notice that it includes multiple sub-expressions obfuscated by polynomial MBA rules. This finding inspires us to use the polynomial MBA simplification procedure to reduce a non-polynomial MBA expression. In particular, we first simplify the inner sub-expression (a polynomial MBA expression), and the simplification result of the inner sub-expression is treated as a temporary variable to expose further reduction opportunities. An instance is shown in Example 5. During the simplification procedure, the inner polynomial MBA expressions are reduced to their simplified forms; for example, (x ^ y) + 2 * (x & y) is reduced to (x + y). By replacing (x + y) with an intermediate variable t_1, the expression can be further reduced to (t_1 + x). At the last step, all temporary variables t_i are substituted back to produce the final result (2 * x + y).

Algorithm and Implementation.
The MBA simplification scheme we have described above is illustrated in Algorithm 2. The algorithm takes an MBA expression E as input and outputs its concise form. First, it checks whether the MBA expression is a polynomial MBA expression or not. For a polynomial MBA expression, the algorithm applies Algorithm 1 to simplify the bitwise expressions. Then, an arithmetic reduction is performed to return the simplification result. For a non-polynomial MBA expression, the algorithm applies the polynomial MBA simplification procedure to recursively reduce each inner sub-expression (a polynomial MBA expression) and replace it with the simplified result. Finally, the algorithm performs the arithmetic reduction to generate the final result. Note that Algorithm 2 applies Algorithm 1 and arithmetic computation to simplify an MBA expression, so it does not introduce any additional tables or manage extra heap memory.
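The first step of Algorithm 2, deciding whether an input is a polynomial MBA expression, can be approximated by a structural check on the syntax tree. The sketch below is one possible reading of Definition 1 and is an assumption, not the tool's actual test: an expression is accepted when no arithmetic operator or constant occurs underneath a bitwise operator. The function names are hypothetical.

```python
import ast

BITWISE_BINARY = (ast.BitAnd, ast.BitOr, ast.BitXor)
ARITH_BINARY = (ast.Add, ast.Sub, ast.Mult)


def _is_pure_bitwise(node) -> bool:
    """True if the subtree uses only variables and bitwise operators."""
    if isinstance(node, ast.Name):
        return True
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Invert):
        return _is_pure_bitwise(node.operand)
    if isinstance(node, ast.BinOp) and isinstance(node.op, BITWISE_BINARY):
        return _is_pure_bitwise(node.left) and _is_pure_bitwise(node.right)
    return False


def _is_polynomial(node) -> bool:
    """True if arithmetic operators appear only above pure bitwise subtrees."""
    if isinstance(node, ast.Constant):
        return isinstance(node.value, int)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return _is_polynomial(node.operand)
    if isinstance(node, ast.BinOp) and isinstance(node.op, ARITH_BINARY):
        return _is_polynomial(node.left) and _is_polynomial(node.right)
    return _is_pure_bitwise(node)


def is_polynomial_mba(expr: str) -> bool:
    """Classify an MBA expression written in Python syntax."""
    return _is_polynomial(ast.parse(expr, mode="eval").body)


if __name__ == "__main__":
    print(is_polynomial_mba("(x&y)*(x|y) + (x&~y)*(~x&y)"))
    print(is_polynomial_mba("~(x - 1)"))
```

An expression rejected by this check, such as ∼(x − 1), would be routed to the non-polynomial branch, where inner polynomial sub-expressions are simplified first.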
We implement Algorithm 2 as an open-source tool, named MBA-Flatten. It accepts a complex MBA expression as the input and outputs the corresponding simplification result. An overview of MBA-Flatten's architecture is shown in Figure 2.
The whole framework is written in around 1,800 lines of Python code. The parser and AST traversal components are built on the Python ast library. Moreover, we leverage the Python SymPy library for arithmetic reduction.
Inside MBA-Flatten, the main program consists of three major components. First, a parser receives the MBA expression and translates it into an abstract syntax tree (AST) for the remaining process. Then, MBA-Flatten reduces the expression to a concise form. For a polynomial MBA expression, the program uses the transformation procedure to reduce each bitwise expression, and a math reduction module is adopted to further simplify the expression. The math reduction module also includes an optimization function to generate an optimal result for some expressions; e.g., x + y − 2 * (x & y) can be further reduced to (x ^ y). For a non-polynomial MBA expression, MBA-Flatten traverses the AST bottom-up and simplifies every inner subtree (a polynomial MBA expression). After each sub-expression is reduced, the simplified expression is replaced with a temporary variable. Finally, arithmetic reduction rules are applied to reduce the expression and return the final simplification result. MBA-Flatten also includes utilities for measuring the complexity metrics of MBA expressions, such as counting the number of nodes in the directed acyclic graph (DAG) representation of an MBA expression, and we will discuss the complexity measurement of MBA expressions further in Section 5.1.

Proof of Theorem 1
To prove Theorem 1, we first show that the transformation T is well defined. The definitions of value equivalence (written =_V) and form equivalence (written =_F) between two MBA expressions are as follows.
... x_t) are of the same form.
The maps in Equation (7) are identities in one-bit space. In other words, the bitwise expression e is equivalent to T(e) for x_k ∈ B, as stated in Equation (18). Proposition 1 shows that the transformation T is well defined, and one instance is shown in Example 6.

Proposition 1. Let e^n be the bitwise expression of variables
Proof. e^n_1 =_V e^n_2 induces e^1_1 =_V e^1_2. According to Equation (18), T(e^1_1) =_V T(e^1_2). By the uniqueness of T(e), T(e^1_1) =_F T(e^1_2). Since T(e^1) =_F T(e^n), we have T(e^n_1) =_F T(e^n_2). □
Example 6. For the bitwise expression e_1 = ∼(x_1 ∧ x_2) and an equivalent bitwise expression e_2 (e_1 = e_2), we have T(e_1) = T(e_2).
Next, we present the concept of the signature vector.
The signature vector of a linear MBA expression is a vector with 2^t dimensions, where t is the number of variables in the expression.
Security and Communication Networks
Table 1 shows the truth table of multiple 2-variable bitwise expressions, and the column with all "1" is encoded as "−1" [1,11]. Then, we introduce the following lemma.
We prove F(e) ≡ e using mathematical induction on the number of bitwise operators in the expression e(x_1, ..., x_t) of variables x_k ∈ B^n.
Base step: the basis is the bitwise expression e(x_1, ..., x_t) with a single bitwise operator, which is one of the following four cases: ∼x, x & y, x | y, and x ^ y, where x, y ∈ {x_1, ..., x_t}. The above four cases lead to a_e = 0 or 1 and s(e)_j ≡ s(F(e))_j, which implies s(e) ≡ s(F(e)). By Lemma 1, e ≡ F(e) holds for variables x_k ∈ B^n.
Induction step: assume F(e) ≡ e holds with r bitwise operators (r ≥ 1) in e. Applying one more bitwise operator to e, the new expression e(x_1, ..., x_t) is one of the following forms: ∼e, e∧x, x∧e, where x ∈ {x_1, ..., x_t}. Due to the commutative law of the bitwise operators &, |, ^ and the corresponding identities, we only need to show that F(e) ≡ e holds in the following four cases with r + 1 bitwise operators: ∼e, e & x, e | x, e ^ x.
The above four cases lead to a_e = 0 or 1 and s(e)_j ≡ s(F(e))_j, which implies s(e) ≡ s(F(e)). By Lemma 1, e ≡ F(e) holds for variables x_k ∈ B^n.
Assume F(e) ≡ e with a_e = 1; by a discussion similar to the above, we have e ≡ F(e) with variables x_k ∈ B^n. As discussed above, the induction is completed. Thus, we have F(e) ≡ e with variables x_k ∈ B^n and a_e = 0 or 1 determined by e.

Experimental Results
In this section, a set of experiments is conducted to evaluate the MBA simplification scheme, MBA-Flatten. We first run MBA-Flatten and existing peer tools on two comprehensive MBA benchmarks. The Z3 SMT solver [19] is used to check whether the simplified result is equivalent to the original MBA expression. The corresponding simplification results are discussed in Sections 5.2-5.4. As reported in Sections 5.5 and 5.6, MBA-Flatten can assist humans in analyzing software. Finally, Section 5.7 studies MBA-Flatten's performance data, such as running time and memory footprint.

Peer Tools for Comparison.
We collect and check existing state-of-the-art MBA simplification tools: MBA-Blast [5] and MBA-Solver [11]. MBA-Blast is a Python tool for simplifying MBA expressions via a two-variable transformation table. MBA-Solver produces multiple precomputed transformation tables, which enumerate all bitwise expressions and the corresponding concise forms. Then, MBA-Solver uses these tables to simplify an MBA expression. For a more thorough evaluation, we also check other MBA simplification tools: GraphMR [18], SSPAM [13], and Syntia [14]. GraphMR is a neural network-based solution to reduce an MBA expression. SSPAM (symbolic simplification with pattern matching) is a pattern matching method that detects and reduces MBA expressions using multiple known MBA rules. Syntia is a program synthesis framework for approximating the semantics of expressions. It uses a set of input-output samples from the expression, learns the semantics of the samples, and synthesizes a simpler expression that is equal to the original expression.

Benchmarks.
To fully expose the capability of diverse methods on simplifying MBA expressions, a large number of MBA expressions is required for evaluation. Therefore, we consider two comprehensive MBA benchmarks: Dataset 1 [14] and Dataset 2 [11]. Dataset 1 comprises 500 MBA samples generated by Tigress [6] with up to three variables. Dataset 2 collects 3,000 MBA equations with up to four variables, which contains 2,000 polynomial MBA (1,000 linear MBA) and 1,000 non-polynomial MBA expressions. Every sample in the datasets is a 2-tuple (E_c, E_g), where E_c is the complex MBA expression and E_g is the equivalent simple form. Multiple samples from the benchmarks are shown in Table 2.

MBA Complexity Metrics.
We use the following metrics to measure MBA complexity: number of DAG nodes and MBA alternation. For example, the expression ∼ (x∧y) + 3 * (x|y), whose DAG representation is shown in Figure 3, has 8 nodes and an MBA alternation (a red arrow means one MBA alternation) of 2.
The larger a metric's value, the more complex an MBA expression. We expect the metrics' values to be reduced after simplification.
(1) Number of DAG Nodes. An MBA expression is transformed into a directed acyclic graph (DAG) representation in which the nodes are operators, variables, and constants. The number of nodes in the DAG is defined as a complexity metric for an MBA expression.
(2) MBA Alternation. The MBA complexity mainly comes from mixing bitwise operations and arithmetic operations. We adopt "MBA alternation" to measure the number of edges linking different types of operations in the DAG representation of an MBA expression.
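Both metrics can be computed from a syntax tree once identical sub-expressions are merged. The following is a minimal sketch built on Python's `ast` module; it is not MBA-Flatten's utility, the function names are hypothetical, and it assumes that structurally identical subtrees share one DAG node and that an alternation is an edge between an arithmetic operator and a bitwise operator.

```python
import ast

ARITH_OPS = (ast.Add, ast.Sub, ast.Mult, ast.USub)
BITWISE_OPS = (ast.BitAnd, ast.BitOr, ast.BitXor, ast.Invert)


def _kind(node):
    """Return 'arith' or 'bitwise' for operator nodes and None for leaves."""
    if isinstance(node, (ast.BinOp, ast.UnaryOp)):
        if isinstance(node.op, ARITH_OPS):
            return "arith"
        if isinstance(node.op, BITWISE_OPS):
            return "bitwise"
        raise ValueError(f"unsupported operator: {ast.dump(node.op)}")
    if isinstance(node, (ast.Name, ast.Constant)):
        return None
    raise ValueError(f"unsupported node: {ast.dump(node)}")


def _children(node):
    if isinstance(node, ast.BinOp):
        return [node.left, node.right]
    if isinstance(node, ast.UnaryOp):
        return [node.operand]
    return []


def dag_metrics(expr: str):
    """Return (number of DAG nodes, MBA alternation) for an expression."""
    root = ast.parse(expr, mode="eval").body
    nodes = set()   # one structural key per distinct sub-expression
    edges = set()   # operator-to-operator edges that change operation type
    stack = [root]
    while stack:
        node = stack.pop()
        key = ast.dump(node)
        if key in nodes:
            continue  # shared sub-expression, already expanded
        nodes.add(key)
        for child in _children(node):
            parent_kind, child_kind = _kind(node), _kind(child)
            if parent_kind and child_kind and parent_kind != child_kind:
                edges.add((key, ast.dump(child)))
            stack.append(child)
    return len(nodes), len(edges)


if __name__ == "__main__":
    print(dag_metrics("~(x&y) + 3*(x|y)"))
```

For the expression used in the running example, the sketch reports 8 nodes and an alternation of 2, matching the values quoted for Figure 3.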

Machine Configuration.
All of our experiments are performed on a server with Intel Core i9 3.00 GHz CPU, 64 GB DDR4 RAM, 2 TB SSD Hard Drive, and running Ubuntu 20.04 OS.

Simplification on Dataset 1.
In the first experiment, we run MBA-Flatten and other peer tools on Dataset 1. The evaluation result in Table 3 shows that only MBA-Flatten successfully produces verifiable simplification outputs for all MBA expressions with negligible overhead (within 0.1 seconds).
We first study correctness, which means that an expression is semantically equivalent before and after simplification. The Z3 solver [19] is adopted to check whether the output of a simplification tool is equivalent to the input. The solver may not return a result due to the MBA's complexity, so we set 1 hour as a practical threshold for this and the following experiments. Table 3 presents the number of MBA expressions that can be reduced by the simplification tools. GraphMR is trained on the linear MBA dataset, so it can only simplify 137 of 500 MBA expressions. SSPAM outputs 168 wrong simplification results because of the limited number of MBA rules in its pattern library. Syntia uses stochastic program synthesis to generate a simple expression and successfully synthesizes 369 simplification results. MBA-Blast performs well on 2-variable MBA expressions but not on expressions with three or more variables, and therefore, it generates 416 simplification results. MBA-Solver can successfully simplify the majority of the MBA expressions (454 of 500), but it cannot process several special cases, e.g., the non-polynomial MBA expression including the sub-expression ∼(x − 1). In contrast to MBA-Solver, MBA-Flatten can successfully simplify all 500 MBA samples, and it reduces ∼(x − 1) to the expression −x.
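For quick sanity checks outside a solver, equivalence can also be tested exhaustively at a small bit width. The sketch below is a lightweight stand-in and is not the Z3-based check used in the evaluation: the function name and its parameters are hypothetical, it only covers the chosen bit width, and it evaluates expression strings with `eval`, so it should only be given trusted input.

```python
from itertools import product


def equivalent(lhs: str, rhs: str, variables, bits: int = 8) -> bool:
    """Compare two expressions on every input of the given bit width.

    Values are compared modulo 2 ** bits, mimicking fixed-width words.
    """
    mask = (1 << bits) - 1
    lhs_code = compile(lhs, "<lhs>", "eval")
    rhs_code = compile(rhs, "<rhs>", "eval")
    for values in product(range(1 << bits), repeat=len(variables)):
        env = dict(zip(variables, values))
        # A non-zero difference modulo 2 ** bits is a counterexample.
        if (eval(lhs_code, {}, env) - eval(rhs_code, {}, env)) & mask:
            return False
    return True


if __name__ == "__main__":
    print(equivalent("~(x - 1)", "-x", ["x"]))
    print(equivalent("x + y", "x | y", ["x", "y"], bits=4))
```

The first call confirms the reduction of ∼(x − 1) to −x mentioned above on all 8-bit inputs; the second shows how a wrong simplification is rejected. The cost grows exponentially with the number of variables and the bit width, which is why a solver is preferable for the full benchmarks.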
Next, we investigate effectiveness, which reflects how much complexity is reduced by the simplification methods. Table 4 reports the expression complexity before and after simplification. Two quantitative metrics are used to measure expression complexity: the number of DAG nodes and MBA alternation. Table 4 shows that all simplification tools (except SSPAM) can considerably reduce the complexity of the solved MBA expressions. SSPAM cannot effectively reduce a complex MBA expression to a simpler form due to the limited set of known MBA rules used in the software.
Algorithm 2 (excerpt): (4) Else (5) For each inner sub-expression E_i that is a polynomial MBA expression do
Table 1: Truth table of multiple bitwise expressions with 2 variables.

Simplification on Dataset 2.
As the second experiment, we run MBA-Flatten and other baseline tools on Dataset 2. As shown in Table 5, MBA-Flatten can successfully simplify 2,943 of 3,000 MBA expressions, and its average processing time is less than 0.2 seconds. Because the MBA expressions in Dataset 2 are more complex and diverse than those in Dataset 1, this experiment exposes more detailed findings. GraphMR and Syntia have limited effect on complex MBA expressions, and each correctly simplifies fewer than 450 MBA samples. SSPAM cannot generate a simpler expression, so nearly 2/3 (1,975 of 3,000) of its simplified results cannot be checked by the Z3 solver within the time threshold. Compared with MBA-Blast (1,763 simplified samples), MBA-Solver can reduce more MBA expressions with three or four variables, and it successfully simplifies 2,899 MBA samples. MBA-Flatten can reduce 2,943 MBA samples, but it fails to simplify several special cases. One exception is the non-polynomial MBA expression (∼(x − 1) & y) * (∼(x − 1) | y) + (∼(x − 1) & ∼y) * (∼(∼(x − 1)) & y). Table 6 reports that all solutions (except SSPAM) can generate a simpler equivalent expression. Overall, MBA-Flatten demonstrates its advanced capability by successfully simplifying 98.1% of the MBA samples.
Furthermore, we compare the average solving time of the simplification tools on the two benchmarks. From Tables 3 and 5, the simplification time of GraphMR and Syntia barely increases, but SSPAM takes much more time when it simplifies a more complex MBA expression. MBA-Blast takes less than 0.1 seconds to simplify a two-variable MBA expression. Compared with MBA-Solver, MBA-Flatten takes slightly more time to simplify an MBA expression. The main reason is that MBA-Solver directly fetches the bitwise expression simplification results from the transformation tables, rather than reducing the expression through multiple simplification procedures.
The other observation is that MBA-Flatten can simplify all non-polynomial MBA expressions solved by MBA-Solver, but not vice versa. This is because MBA-Solver treats the common sub-expression as an intermediate variable, rather than a sub-expression itself.
Table 2: Samples from the benchmarks (complex expression E_c and equivalent simple form E_g).
E_c: (a ^ ∼a) + 2 * (a | a) + 1; E_g: a + a
E_c: 2 * (∼(x ^ y)) + 3 * (∼x & y) + 3 * (x & ∼y) − 2 * (∼(x & y)); E_g: x + y
E_c: ∼(((x & y) * (x | y) + (x & ∼y) * (∼x & y)) − 1); E_g: −(x * y)

MBA-Powered Malware Deobfuscation.
MBA expression is often used to obfuscate code, so malware developers also adopt MBA expressions to complicate their programs. Liu et al. [5] report that MBA expressions are used in a ransomware sample to protect the encryption key, and they also observe that MBA rules are integrated into the software obfuscator VMProtect, which is widely used by malware developers.
In this experiment, we demonstrate that MBA-Flatten can assist in reverse-engineering malware obfuscated by MBA expressions. We collect all MBA expressions used in malware from existing work [5]. Then, MBA-Flatten is applied to simplify the expressions, and the Z3 solver is used to check the correctness of the simplified result. The evaluation result shows that MBA-Flatten can successfully simplify all MBA expressions collected from existing malware samples. One simplification procedure is shown as follows, and MBA-Flatten produces the final result (x − y).
Furthermore, we replace the MBA expressions used in malware with new MBA expressions involving five or more variables and produce 130 variants, such as the above expression ∼(∼x + y) & ∼(∼x + y), which is replaced with Equation (6). We apply MBA-Blast and MBA-Flatten to simplify the new MBA expressions. Unfortunately, MBA-Blast fails to simplify them. In contrast, MBA-Flatten can successfully simplify all new MBA expressions. Therefore, this experiment shows that MBA-Flatten can simplify both the MBA expressions used in existing malware and complex MBA expressions with five or more variables.

Boosting SMT Solving MBA Equations.
Satisfiability modulo theory (SMT) solvers have been widely applied in diverse software engineering areas, such as software analysis [21,22], symbolic execution [23,24], and test generation [25]. Existing work [10,11] has shown that SMT solvers struggle to solve MBA equations. However, the MBA simplification method MBA-Solver can be used to boost an SMT solver's performance on solving MBA equations [11].
In this experiment, we report that MBA-Flatten (denoted as MF) can assist SMT solvers in solving MBA equations. We consider the benchmark from work [11] and test three popular SMT solvers: Boolector [26], STP [27], and Z3 [20]. The benchmark is Dataset 2 in this study, and MBA-Solver (denoted as MS) is considered as the baseline. MBA-Flatten and MBA-Solver are used to simplify all MBA equations in the benchmark, and then, the simplification results are passed to the three SMT solvers. The evaluation result is shown in Table 7, and the solving time threshold is set to 1 hour. Before simplification, the three SMT solvers can only solve a small portion (Boolector 496 (16.5%), STP 98 (3.3%), Z3 84 (2.8%)) of the MBA equations within the time threshold, but after simplification, all three solvers can solve over 96% of the MBA equations. Compared with MBA-Solver, all SMT solvers can solve more MBA equations after MBA-Flatten's simplification. This is because MBA-Flatten can successfully simplify more MBA expressions than MBA-Solver, as shown in Table 5. After MBA-Flatten's simplification, all SMT solvers can solve 2,943 of 3,000 MBA equations, which means that the difference between the solvers' performance on solving MBA expressions becomes insignificant. These results indicate that MBA-Flatten is a generic method to boost SMT solvers' performance on solving MBA expressions.

Performance.
This section reports MBA-Flatten's performance data. Table 8 shows the time and memory cost when MBA-Flatten processes MBA expressions of different complexity, measured by the number of nodes. For every complexity level, 100 different MBA expressions are generated for the test. As some of the timings are small, we repeat every test 100 times. MBA-Flatten is effective because it only performs low-cost arithmetic computation. Our implementation adopts the Python SymPy library to efficiently perform the arithmetic reduction. Overall, MBA-Flatten is an effective tool for simplifying MBA expressions.
Table 7: Number of MBA equations solved by each SMT solver before simplification and after simplification by MBA-Solver (MS) and MBA-Flatten (MF).
Type | Boolector (before / MS / MF) | STP (before / MS / MF) | Z3 (before / MS / MF)
Polynomial | 468 / 2000 / 2000 | 70 / 2000 / 2000 | 56 / 2000 / 2000
Non-polynomial | 28 / 899 / 943 | 28 / 899 / 943 | 28 / 899 / 943
Total | 496 / 2899 / 2943 | 98 / 2899 / 2943 | 84 / 2899 / 2943

Discussion
MBA-Flatten has demonstrated the feasibility of automatically reducing MBA expressions. However, we also note some potential enhancements for future improvement. As introduced in Section 5.3, MBA-Flatten cannot simplify the non-polynomial MBA expression (∼(x − 1) & y) * (∼(x − 1) | y) + (∼(x − 1) & ∼y) * (∼(∼(x − 1)) & y). We further investigate how to reduce it, and the simplification procedure is shown below. During the simplification procedure, the sub-expression ∼(x − 1) is treated as an intermediate variable rather than the expression (x − 1). However, it is hard for an automatic tool to precisely detect and identify such a sub-expression. To mitigate this problem, one possible solution is to integrate multiple heuristic rules into MBA-Flatten, so that it can explore diverse reduction directions to generate a simpler result.
(∼(x − 1) & y) * (∼(x − 1) | y) + (∼(x − 1) & ∼y) * (∼(∼(x − 1)) & y) = (t_1 & y) * (t_1 | y) + (t_1 & ∼y) * (∼t_1 & y), where t_1 = ∼(x − 1).
It is possible that an adversary attacks MBA-Flatten by combining MBA obfuscation with other obfuscation techniques to generate an expression that does not satisfy the MBA definition in this study. Note that MBA-Flatten is designed for simplifying MBA expressions, so it may correctly handle an MBA sub-expression but cannot resolve the remaining non-MBA part. It is interesting to further investigate whether MBA-Flatten can interact with other analysis techniques (e.g., symbolic execution) to produce a better result.
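Returning to the ∼(x − 1) case discussed earlier in this section, the effect of treating ∼(x − 1) as a temporary variable t_1 can be checked numerically. The sketch below is illustrative: the function names are hypothetical, and the closing value −(x * y) is our own direct calculation (the substituted expression has the same shape as the opaque predicate for t_1 * y, and ∼(x − 1) equals −x), not a result quoted from the evaluation.

```python
def original(x: int, y: int) -> int:
    """The non-polynomial expression that MBA-Flatten fails to simplify."""
    return (~(x - 1) & y) * (~(x - 1) | y) + (~(x - 1) & ~y) * (~(~(x - 1)) & y)


def with_temp(x: int, y: int) -> int:
    """Same expression after substituting t1 = ~(x - 1)."""
    t1 = ~(x - 1)
    return (t1 & y) * (t1 | y) + (t1 & ~y) * (~t1 & y)


def agree(lo: int = -8, hi: int = 64) -> bool:
    """Check that both forms match each other and -(x * y) on a small range."""
    return all(
        original(x, y) == with_temp(x, y) == -(x * y)
        for x in range(lo, hi)
        for y in range(lo, hi)
    )


if __name__ == "__main__":
    print(agree())
```

Once t_1 is introduced, the remaining expression is polynomial in t_1 and y, which is the kind of input the polynomial procedure handles; the open problem described above is choosing such substitutions automatically.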

Conclusion
Existing work performs well on simplifying MBA expressions with very few variables. However, the state-of-the-art methods struggle to simplify a multivariable MBA expression. We address this challenge using an in-place simplification method. A transformation procedure is proposed to transform a bitwise expression into a unified form, and we provide a mathematical proof to guarantee the correctness of this transformation. Then, arithmetic reduction is used to further simplify the expression and produce a simplified result. Our large-scale experiments show that MBA-Flatten is a general and effective MBA simplification method. Furthermore, MBA-Flatten not only advances automated malware analysis but also boosts SMT solving on MBA equations.

Data Availability
The data and code presented in this study are available at https://tinyurl.com/y5l948pu.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this study.