RDFuzz: Accelerating Directed Fuzzing with Intertwined Schedule and Optimized Mutation

Directed fuzzing is a practical technique that concentrates its testing energy on the process toward target code areas, while spending little on other, unconcerned components. It is a promising way to make better use of available resources, especially in testing large-scale programs. However, by observing the state-of-the-art directed fuzzing engine (AFLGo), we argue that there are two universal limitations: the balance problem between the exploration and the exploitation, and the blindness of mutation toward the target code areas. In this paper, we present a new prototype, RDFuzz, to address these two limitations. In RDFuzz, we first introduce the frequency-guided strategy in the exploration and improve its accuracy by adopting the branch-level instead of the path-level frequency. Then, we introduce the input-distance-based evaluation strategy in the exploitation stage and present an optimized mutation to distinguish and protect the distance sensitive input content. Moreover, an intertwined testing schedule is leveraged to perform the exploration and exploitation in turn. We test RDFuzz on 7 benchmarks, and the experimental results demonstrate that RDFuzz is skilled at driving the program toward the target code areas and is not easily stuck by the balance problem between the exploration and the exploitation.


Introduction
The enormous scale of modern software makes it a difficult task to conduct thorough testing within a limited time budget and limited computing resources. Furthermore, in many practical scenarios, only some code areas need testing, such as patches, security-sensitive functions, or some user-defined positions [1]. At present, directed testing techniques are promising solutions to satisfy this requirement.
The state-of-the-art directed testing techniques mainly include fuzzing-based and symbolic execution-based groups [1][2][3][4]. However, owing to some unsolved limitations, e.g., the path explosion problem, the application of symbolic execution [5] is still narrow on large-scale software. On the contrary, fuzzing techniques are well suited to testing these large-scale programs, and they have successfully uncovered a lot of vulnerabilities so far. The study of directed fuzzing techniques is currently extensive. More specifically, fuzzing is generally regarded as a random process, so directed fuzzing can be modeled as an optimal search, starting from an arbitrary input and searching for the inputs that can hit the target code areas. Researchers have introduced some metaheuristic approaches to implement such a search and improve its performance. For example, Böhme et al. [1] presented a directed fuzzing tool, AFLGo. It leverages a distance-based evaluation to distinguish the inputs remote from the target code areas, and it uses a simulated annealing- (SA-) based schedule strategy to distribute the testing energy by the input-distance evaluation. In brief, it can steer the testing engine toward the target code areas, and its experiments show better directed performance than the symbolic execution-based methods.
We ran AFLGo (directed fuzzing) and AFL (nondirected fuzzing) a number of times and examined their results. In some runs, AFLGo reaches the target code areas faster than AFL; however, in other runs, the directed performance of AFLGo does not outperform AFL as expected. This experimental result is caused by the different user-defined configurations in the AFLGo runs. More specifically, there are two running stages in AFLGo, the exploration stage and the exploitation stage. The exploration stage is designed to uncover as much code as possible; then, based on the revealed results, the exploitation stage is invoked to drive the engine to the target code areas. AFLGo adopts a timewise splitting method to coordinate these two stages; i.e., it first runs the exploration stage and then runs the exploitation stage, and user-defined parameters settle the time budgets for these two stages. Because an inadequate exploration result usually leads to a poor exploitation result, AFLGo did not outperform AFL in some runs.
Besides, the random mutation is a common strategy in the exploration stage to discover code areas. Nevertheless, we argue that the random mutation is weak for directed testing, because it blindly drives the program toward the target code areas, and it tends to destroy some pivotal input content, undermining the directing process.
In this paper, as a framework, we introduce the frequency-guided strategy [6] in the exploration and the input-distance-based evaluation strategy [1] in the exploitation. More specifically, we first improve the exploration by utilizing the branch-level instead of the path-level frequency. Then, in the exploitation, we argue that every piece of input content plays a different role in the process toward the target code areas; for example, some input contents are just carriers of data, whereas other input contents can significantly affect the program execution. Therefore, we present a disturb-and-check method to identify and protect the distance sensitive input content, that is, our optimized mutation, in an attempt to produce inputs closer to the target code areas. During testing, the opportunity to invoke the exploration/exploitation stage is determined by the input evaluation results; the engine starts to explore when the input is helpful to uncover code, and it starts to exploit when the input is helpful to reach the target code areas; this is our intertwined testing schedule, an approach to handle the balance problem between the exploration and the exploitation.
On the basis of these techniques, we develop a prototype, dubbed RDFuzz. In the experiments, we deploy RDFuzz on 7 benchmarks, investigating the effectiveness of our optimized mutation and the directed performance of RDFuzz. The results demonstrate that our techniques are useful to speed up the testing toward the target code areas.

The rest of this paper is organized as follows. In Section 2, we present an overview of some background knowledge, the problems in directed fuzzing, and the framework of our solution in brief. Section 3 describes our proposed techniques in detail. Section 4 reports the implementation and presents experimental results. Section 5 elicits the threats to validity, and Section 6 is about our future work. Our conclusion is in Section 7.

Background

Fuzzing Algorithm. Since its introduction by Miller et al. in 1990 [7], fuzzing has been one of the most useful techniques for vulnerability detection [8]. It continuously feeds the program under test with numerous inputs, which are generated by different strategy-oriented methods, in an attempt to find bugs in the program.
Algorithm 1 provides a generic fuzzing process. It first reads an initial seed pool (SP), which can be randomly generated or crawled from the Internet [9]; then, it activates a testing iteration that terminates when the Continue() function returns False. In each testing round, on the basis of different evaluation principles, the engine selects a basal input s from SP via the Choose() function and produces a new input s′ by mutation on s; the program executes on input s′, and the BugOracle() function inspects for bug(s) on the basis of the execution information (info); thereafter, the seed pool is updated in the SPUpdate() function, and the engine launches the next testing round.
In this fuzzing algorithm, the Choose() and Mutation() functions are the two significant parts. The Choose() function determines how to pick suitable inputs for mutation, that is, the starting point of the search, and the Mutation() function determines how to produce new inputs, that is, the search process itself. As far as we know, most enhanced fuzzing techniques are proposed through research on these two parts.
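As a concrete illustration, the loop of Algorithm 1 can be sketched in Python as follows. This is a toy sketch of ours, not AFL's actual implementation: the target, the round budget standing in for Continue(), and all callback names are illustrative choices.

```python
import random

def fuzz(seed_pool, choose, mutate, execute, bug_oracle, sp_update, rounds):
    """Generic fuzzing loop following Algorithm 1 (a sketch)."""
    bugs = set()
    for _ in range(rounds):            # Continue() modeled as a round budget
        s = choose(seed_pool)          # pick a basal input from the seed pool
        s_new = mutate(s)              # produce a new input s' by mutation on s
        info = execute(s_new)          # run the program, collect execution info
        bugs |= bug_oracle(info)       # inspect the execution for bug(s)
        seed_pool = sp_update(seed_pool, s_new, info)  # update the seed pool
    return bugs

random.seed(1)  # deterministic run for the sketch
# Toy target: "crashes" whenever the input starts with 'A'.
run = lambda s: {"input": s, "crash": s.startswith("A")}
found = fuzz(
    seed_pool=["seed"],
    choose=random.choice,
    mutate=lambda s: chr(random.randrange(65, 91)) + s[1:],
    execute=run,
    bug_oracle=lambda info: {info["input"]} if info["crash"] else set(),
    sp_update=lambda sp, s, info: sp + [s],
    rounds=200,
)
```

Every enhancement discussed below plugs into this skeleton by specializing `choose` (input selection) or `mutate` (input generation).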

Exploitation and Exploration Strategies in Fuzzing.
There are many ways to classify fuzzers, such as black-box, grey-box, and white-box fuzzers [8] and mutation-based and generation-based fuzzers [10]. From another perspective, fuzzing techniques can be classified into exploitation and exploration techniques.
The exploration strategy is an attempt to perform thorough testing of the program. More specifically, many fuzzers take the testing coverage as an indication of the testing degree [11]. In order to maximize the testing coverage, many new approaches have been put forward, yielding new fuzzers such as AFLFast [6], FairFuzz [12], Angora [13], VUzzer [14], and CollAFL [15].
On the contrary, the exploitation strategy is an attempt to perform concentrated testing on parts of the program. Because modern software is typically huge, it is difficult to explore the whole program within a restricted time budget or with limited computing resources. Moreover, in many scenarios, thorough testing of the target program is not necessary; only some code areas demand testing, e.g., program patches and sensitive functions.

Directed Fuzzing and Its Problems.
In the fuzzing community, the exploitation strategy is often leveraged in directed fuzzing. It makes efforts to focus the testing on the process toward the target code areas and to reduce the executions in nontarget regions. More specifically, this mechanism is implemented by modifying the Choose() and Mutation() functions (Algorithm 1). The Choose() function picks out the inputs close to the target code areas; based on these inputs, the Mutation() function makes special manipulations to produce new inputs, in an attempt to steer the program toward the target code areas.
Furthermore, the exploitation is usually accompanied by the exploration, because exploration is to gather information and exploitation is to make decisions depending on the gathered information. In other words, the fuzzer adopts the exploration to uncover more code areas and the exploitation to find the positions closest to (or at) the target code areas.
For example, the directed fuzzer AFLGo [1] performs the exploitation strategy as follows: its Choose() function leverages an input-distance-based approach to determine the better and worse inputs, and its Mutation() function introduces a simulated annealing- (SA-) based approach to assign more testing energy to the better inputs and less to the worse inputs. Moreover, it employs a timewise splitting method to coordinate the exploitation and the exploration: a user-defined "−z" parameter sets a fixed time budget for the exploration, and the engine invokes the exploitation stage in the remaining time.
To assess the cooperation of the exploitation and exploration strategies in directed fuzzing, we examine how AFLGo works with different "−z" parameters on the same benchmark. The experiments last for 24 hours; AFLGo-1 means 1 hour of exploration and 23 hours of exploitation, and the other configurations are shown in Table 1. The "min distance" (Y label in Figure 1) means the minimum input-distance among all the generated inputs to the target code areas. A lower "min distance" denotes a better directed performance, as explained in detail in Section 4.
In Figure 1, the comparison of the directed performance is AFLGo-8 > AFLGo-4 > AFLGo-2 > AFLGo-1. This ordering is caused by their different time budgets for the exploration. Compared to AFLGo-1, which performs one hour of exploration, AFLGo-8 sustains 8 hours of exploration, so AFLGo-8 has gathered sufficient information for the following exploitation stage; i.e., the engine has uncovered a certain amount of the code areas; thus, AFLGo-8 outperforms AFLGo-1 in directed performance.

Tradeoff Problem.
We argue that coordinating exploration and exploitation in directed fuzzing is a universal tradeoff problem. On the one hand, more exploration can provide adequate information for the exploitation; on the other hand, an overfull exploration occupies many resources, and the exploitation is delayed as a result. AFLGo only employs a user-defined parameter to settle the time budgets for the exploration and exploitation; this is maladaptive, and it can bring in unexpected results. Figure 2 shows the tradeoff problem in AFLGo.
In particular, Hawkeye [2], like other improved directed fuzzing engines, is enhanced by a stronger exploitation ability to reach the target code areas. However, the tradeoff problem still constitutes a bottleneck in its performance.

Mutation Problem.
Besides, in directed fuzzing, the mutation functions should conduct some special operations to adjust the input generation and speed up the directed process. However, the commonly used random mutation brings a severe limitation: due to its blindness, subsequent mutations can destroy critical input content generated previously. This causes a deterioration in input quality, so the directed fuzzing performance is limited.

Summary.
To conclude, we summarize the two limitations in the present directed fuzzing techniques as follows: the tradeoff problem between the exploitation and the exploration, and the blindness in mutation.

Framework of Our Approach.
To address the limitations mentioned above, we present a new directed fuzzing tool, dubbed RDFuzz, and Figure 3 shows its framework. There are two testing loops: (i) the exploration loop is designed to improve the code coverage and provide sufficient discovery; (ii) the exploitation loop searches for the inputs reaching the target code areas.
There are three significant parts in the framework (marked in green). The input evaluation appraises each input, determines its input type, schedules the testing energy, and decides which testing loop to invoke (exploration or exploitation), to make the best use of the inputs from the seed pool. The distance sensitive content is distinguished by a disturb-and-check method, which relies on the input-distance calculation. The optimized mutation is aware of the distance sensitive content, and it protects these input contents from variation when producing new inputs.

Methodology
This section presents our improvements on the exploration strategy in 3.1 and on the exploitation strategy in 3.2 and describes the scheduled workflow in 3.3.

Improvements on the Exploration Strategy.
It is common knowledge that an abundant exploration result empirically leads to a better global search capability and a better ability to avoid falling into a local optimum. To achieve such a result in directed fuzzing, we introduce the frequency-guided strategy [6], which is aimed at discovering more code areas, especially code deeply buried in the program.
In this strategy, the execution frequency is counted, and the code areas are separated into high-frequency and low-frequency areas according to a certain threshold. Based on the Markov model [16], the high-frequency code areas are regarded as easily accessible, and the low-frequency positions as difficult to access. Therefore, the testing energy distribution should lean toward the low-frequency positions, in an attempt to uncover more code.

The original frequency-guided strategy is conducted on the path level; we improve its accuracy by determining the high-frequency/low-frequency code areas from branch-level statistics. Two reasons inspire this modification. The first reason is that the path-level statistic may misidentify the low-frequency code areas. We take the example in Figure 4 as an explanation. There are 4 paths and 4 branches (A, B, C, D, and E are five basic blocks). It is assumed that path B ⟶ C ⟶ E is executed 5 times, and the other three paths are executed 100 times each. On the path level, the minimum execution frequency is only 5% of the maximum, so the engine would regard the B ⟶ C ⟶ E path as a low-frequency code area and direct more testing energy to the inputs running along the B ⟶ C ⟶ E path. However, on the branch level, the ratio between the minimum execution frequency and the maximum is 50.5%, which is not a remarkable value in the fuzzing process, so none of these 4 branches would be regarded as low-frequency code areas, and the testing energy on these branches is limited to a certain range. The second reason is that a path is a sequential structure, and it is expensive to restore sequential structures in the fuzzing process. To balance efficiency and effectiveness, Zalewski [17] presents an approach of splitting a sequential path into an unordered group of branches.
Besides, the original frequency guided strategy chooses to restore parts of the discovered paths, which are selected by a corpus distillation technique [18].
According to the analysis above, we prefer to determine the high-frequency/low-frequency code areas according to the branch-level statistic. Then, the inputs running on the low-frequency code areas are prioritized and receive more testing energy. The execution number of each branch is counted during the testing to make the branch-level statistics. This is realized by a global counter and some instrumentation: the counter maintains the total execution number of each branch, and the instrumentation informs the counter to add a certain number after a branch executes. After that, the testing energy on each input is regulated by the minimum execution number over all the branches in its trace.
Formally, for a branch br, its execution number numBr[br] is accumulated as equation (1) shows. R is the record of all the executed inputs, and NumHit(br, s) is the execution number of branch br on running input s. After obtaining this statistic, for an input s, its frequency F_br(s) is chosen as the minimum branch-level frequency on its execution trace, as equation (2) shows, where T(s) denotes the set of all branches appearing when running input s.
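The two referenced equations did not survive extraction; from the definitions in the surrounding text, they can be reconstructed as follows (our reconstruction, using the symbols defined above):

```
% Eq. (1): total execution number of branch br over all recorded inputs R
\mathrm{numBr}[br] \;=\; \sum_{s \in R} \mathrm{NumHit}(br,\, s)

% Eq. (2): frequency of input s = minimum branch count on its trace T(s)
F_{br}(s) \;=\; \min_{br' \in T(s)} \mathrm{numBr}[br']
```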
Furthermore, Algorithm 2 describes the general process of the exploration stage. It iterates over the inputs in the seed pool (Ln.1-Ln.6); for each selected input, the engine calculates its minimum branch-level frequency F_br(s) (Ln.2) and assigns a suitable testing energy E(s) to it (Ln.4); then it invokes the random input generation to perform the testing (Ln.5). In practice, there are many feasible solutions for the GetEnergyByFreq() function; the distribution can be exponential, linear, quadratic [19], etc.
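A minimal sketch of this exploration stage, under two simplifying assumptions of ours: each input contributes one hit per branch on its trace (rather than per-execution counts), and GetEnergyByFreq() is an inverse-linear distribution.

```python
from collections import Counter

def explore(seed_pool, trace_of, run_random_mutation, base_energy=8):
    """Exploration stage (Algorithm 2, sketched): favor inputs whose traces
    touch low-frequency branches. trace_of(s) -> the branch set T(s)."""
    num_br = Counter()                 # global branch counter, cf. eq. (1)
    for s in seed_pool:
        for br in trace_of(s):
            num_br[br] += 1            # simplification: one hit per input
    max_hits = max(num_br.values())
    for s in seed_pool:
        f_br = min(num_br[br] for br in trace_of(s))   # cf. eq. (2)
        # GetEnergyByFreq: inverse-linear here; exponential or quadratic
        # schedules are equally feasible choices.
        energy = base_energy * max_hits // f_br
        run_random_mutation(s, energy)
```

An input whose trace contains a rarely hit branch thus receives proportionally more mutation energy than one running only through hot branches.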

Improvements on the Exploitation Strategy.
The exploitation is to drive the program toward the target code areas, and it is conducted based on the exploration results. Our exploitation strategy consists of an input-distance calculation and an optimized mutation. The former is introduced from AFLGo [1], and it can determine whether an input is close to the target codes or not. The latter attempts to avoid producing inputs whose distances are farther than that of the original input for mutation.

Distance Calculation.
The input-distance is calculated in five steps, indicating the distance from the input execution trace to the target code areas.
(i) The call graph (CG) and control flow graph (CFG) are extracted from the target program by static analysis. This can be done with the facilities provided by LLVM [20] on the program source code. (ii) On the CG, supposing the target code areas are located in the function f_T, the distance between an arbitrary function and f_T can be obtained by certain graph-based algorithms [21]. This function-level distance is represented as d_f(f_i, f_T). (iii) On the CFG, for any two arbitrary basic blocks, their distance can also be calculated by certain graph-based algorithms. This basic block level distance is represented as d_bb(b_i, b_j). (iv) For each basic block b_i, its distance to the target function f_T is computed as equation (3): d_bf(b_i, f_T) = min_{b_k ∈ S} [d_bb(b_i, b_k) + α · d_f(f_k, f_T)], where S denotes the set of the feasible b_k choices, b_k calls f_k, and α is a constant parameter.
(v) For each input, there is a series of basic blocks in its trace, and each basic block is assigned a distance value in step (iv), so the input-distance can be calculated from the distances of the basic blocks appearing in the input execution trace.
The calculation is formally shown in equation (4): d_s(s, f_T) = F({d_bf(b_i, f_T) | b_i ∈ T_b(s)}), where T_b(s) is the set of all the basic blocks in the execution trace of input s; d_bf(b_i, f_T) means the distance from basic block b_i to the target function f_T; and the F function reads all the distances of the basic blocks in T_b(s) and outputs a final value as the input-distance d_s(s, f_T).
We take an example to describe the distance calculation from a basic block to the target code areas (step (iv)). In Figure 5, b_1 is the considered basic block, and the target code areas are in the f_3 function. According to the CFG and CG, there is a reachable path from b_1 to f_3, so the distance from b_1 to f_3 is calculated as equation (5) shows, where α is a constant parameter.
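The five steps above can be sketched as follows. This is a simplified illustration of ours, not the paper's implementation: a plain BFS (hop counts) stands in for the LLVM/NetworkX tooling, the function-level distances d_f are assumed precomputed, α defaults to 1, and F is taken as the arithmetic mean, consistent with the worked example later in this section.

```python
from collections import deque

def bfs_dist(graph, src):
    """Hop-count shortest paths from src in a directed graph {node: [succs]};
    stands in for the graph-based distance algorithms on the CFG."""
    dist, frontier = {src: 0}, deque([src])
    while frontier:
        u = frontier.popleft()
        for v in graph.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                frontier.append(v)
    return dist

def input_distance(trace, cfg, call_sites, d_f, alpha=1.0):
    """d_s(s, f_T) as the mean of d_bf(b, f_T) over the trace, cf. eq. (4).
    call_sites maps a basic block b_k to the function f_k it calls;
    d_f maps a function to its CG-level distance to the target f_T."""
    def d_bf(b):
        # cf. eq. (3): min over call sites b_k reachable from b of
        #              d_bb(b, b_k) + alpha * d_f(f_k, f_T)
        d_bb = bfs_dist(cfg, b)
        choices = [d_bb[bk] + alpha * d_f[fk]
                   for bk, fk in call_sites.items()
                   if bk in d_bb and fk in d_f]
        return min(choices) if choices else float("inf")
    return sum(d_bf(b) for b in trace) / len(trace)
```

For a toy CFG b1 → b2 → b3 where b3 calls a function at CG-distance 2 from the target, the blocks score 4, 3, and 2, giving an input-distance of 3.0 for a trace covering all three.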

Optimized Mutation.
Based on the input-distance calculation, we further propose an optimized mutation to help the engine reach the target code areas. The core idea is to prevent the input generation from destroying crucial input content. The concept of crucial input content is widely employed in fuzzing research; however, it has various meanings. In our research, it particularly refers to the distance sensitive content. The distance sensitive feature indicates that the content is vital to maintaining the input-distance; once the content is altered, the input-distance becomes larger; i.e., the input becomes farther from the target code areas. As a typical configuration, the input content is represented at a bytewise granularity in this paper.
We design a disturb-and-check method to detect the distance sensitive bytes. The method is implemented based on the input-distance comparison between the original input and the mutated input. Given an original input s_o, a byte (say the No. k byte) is altered and a new input s_k is produced. By comparing the input-distances of s_o and s_k, we can determine whether the No. k byte is distance sensitive or not. After traversing the input, we obtain all the distance sensitive bytes.
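The disturb-and-check procedure can be sketched as below, assuming a bytewise granularity and an input_distance() oracle supplied by the engine; the XOR disturbance is our illustrative choice of alteration.

```python
def distance_sensitive_bytes(s, input_distance):
    """Disturb-and-check (sketch): alter each byte of input s in turn and
    mark the positions whose alteration makes the input-distance larger.
    input_distance(b: bytes) -> float is an oracle supplied by the engine."""
    d_o = input_distance(s)                   # distance of the original input
    sensitive = []
    for k in range(len(s)):
        disturbed = s[:k] + bytes([s[k] ^ 0xFF]) + s[k + 1:]
        if input_distance(disturbed) > d_o:   # distance grew: byte k is sensitive
            sensitive.append(k)
    return sensitive
```

The optimized mutation then simply excludes the returned positions from the mutable byte set.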
In directed fuzzing, the goal of the engine is to continuously produce inputs that are closer to the target code areas, until reaching them. Therefore, we argue that it is beneficial to protect the distance sensitive bytes in the input generation. By adopting this strategy, the input generation is optimized; we call it OptimizedMutation.
In addition, we take an example to explain why the distance sensitive input bytes should be protected. Figure 6 shows a code snippet; variables x and y are from the input, supposing that x is from the 5th and y is from the 6th input byte. There is a bug at Ln.6, i.e., the abort() function, and it is set as the target code. For a clearer description, we take the code line as a representation of the basic block when describing the basic block level distance d_bf(b_i, f_T). Because Ln.6 is the target code, its distance is 0, and the distances of the other code lines are shown in Table 2.

Given an input s, assuming x is 100 and y is 150 on the execution trace, the program can execute till Ln.5, and its input-distance is (4 + 3 + 2 + 1)/4 = 2.5. Once the 5th byte is altered and a new input s_n is generated, x is not 100 anymore, the program only executes till Ln.4, and the input-distance of s_n is (4 + 3 + 2)/3 = 3. As the input-distance becomes larger, the 5th input byte is distance sensitive. In the mutation, if the 5th input byte is protected, x can be kept as 100, and the program can always execute till Ln.5, increasing the possibility of reaching Ln.6 (the target code).
Finally, Algorithm 3 describes the generic process of the exploitation. It iterates over the inputs in the seed pool (Ln.1-Ln.5); for each selected input, the engine calculates its input-distance (Ln.2) and assigns a suitable testing energy to it (Ln.3); then it invokes the distance sensitive optimized mutation to perform the testing (Ln.4). The practical solutions for the GetEnergyByDis() function can be heuristic; for example, AFLGo [1] leverages a simulated annealing approach to assign the testing energy, and Hawkeye [2] leverages a balanced power function to regulate the testing energy.

Intertwined Schedule for Directed Testing.
According to the techniques mentioned above, based on the evaluation results about the frequency and input-distance, the inputs can be classified into HD/LD (high/low distance) and HF/LF (high/low frequency) types. By grouping the two evaluation criteria, the inputs can be classified into four types, shown in Table 3.
Because the LF inputs are helpful to improve the coverage, they are required in the exploration; the LD inputs are helpful to reach the target code areas, so they are required in the exploitation. Different from the timewise splitting schedule [1, 2], an intertwined testing schedule is designed for the directed testing, as Algorithm 4 shows. In a nutshell, it alternately conducts the exploration (Algorithm 2) and the exploitation (Algorithm 3); moreover, the testing energy is assigned according to the inputs' frequency and input-distance.
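Consistent with the case analysis of Algorithm 4, one round of the schedule can be sketched as follows; the classify(), e_d, and e_f callbacks are placeholders for the engine's frequency and distance evaluation.

```python
def intertwined_schedule(seed_pool, classify, e_d, e_f,
                         optimized_mutation, random_mutation):
    """One round of the intertwined schedule (Algorithm 4, sketched).
    classify(s) -> one of "HFLD", "LFHD", "LFLD", "HFHD"."""
    for s in seed_pool:
        t = classify(s)
        if t == "HFLD":                     # close to the target: exploit
            optimized_mutation(s, e_d(s))
        elif t == "LFHD":                   # rarely executed: explore
            random_mutation(s, e_f(s))
        elif t == "LFLD":                   # helpful to both loops
            optimized_mutation(s, e_d(s) + e_f(s))
        # HFHD inputs (frequent and far from the target) are skipped
```

Because classification is re-evaluated every round, an input can move between the exploration and exploitation loops as the statistics evolve.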
Overall, this schedule manifests a self-evolving characteristic: it can alternately improve the exploration results and the exploitation results, in an attempt to avoid getting stuck in a local optimum during the directed fuzzing process.

Implementation and Evaluation
Based on the proposed methods, we develop a prototype, dubbed RDFuzz, that is built on AFL [22], LLVM-based analysis [20], and a Python script using the NetworkX package. AFL provides the fundamental testing framework; we add extra code to the afl-fuzz.c module to implement our proposed methods, including the frequency statistics, the OptimizedMutation() function, and the testing energy distribution. The LLVM-based analysis and the Python script are employed in the input-distance calculation.

Experimental Infrastructure.
All experiments were conducted on a server equipped with an Intel Xeon E5-2620 v2 CPU @ 2.10 GHz (12 cores in total) and 64 GB RAM, running Ubuntu 14.04 LTS AMD64. All the experiments were launched in single mode, with the same initial seeds.
Figure 5: An example to show the distance computation from a basic block to the target code areas.

Benchmark.
In the experiments, we selected 7 benchmarks to test and verify our proposed techniques; the information is shown in Table 4. Specifically, the first benchmark is built according to the tutorial in the AFLGo documentation [23], and the other benchmarks are bug instances in real-world programs (with CVE IDs).

Target Code Areas.
The target code areas are set before testing. On the AFLGo-demo benchmark, the target code areas are set according to the documentation [23]. On the other benchmarks, the target code areas are set according to their crash traces. For example, Figure 7 shows the crash trace of CVE-2017-9050 (from the ASAN report [24]), so the target code areas are selected as dict.c:285, dict.c:926, and parser.c:3425 (not all the code in the trace is target code). All the ASAN reports of the benchmarks are taken from the web [25] or generated by executing the crash samples.
Though the target code areas are set according to the ASAN reports, none of the benchmarks in our experiments are compiled with AddressSanitizer [26]. Compiling with AddressSanitizer augments vulnerability detection, but that is not our main investigation point; in addition, it would inevitably introduce much overhead, so a much larger time budget would be needed to obtain a uniformly convincing evaluation result with AddressSanitizer.

Research Questions.
The following experiments were designed to answer two research questions: (i) RQ1: does OptimizedMutation work as expected to improve the possibility of generating inputs that are closer to the target code areas than the original input? (ii) RQ2: does RDFuzz manifest a good directed performance in reaching the target code areas?

Evaluation of OptimizedMutation (RQ1).
Due to the lack of testing feedback, the random mutation may produce inputs that are the same as, better than, or worse than the original input. On the contrary, the optimized mutation is designed to raise the likelihood of generating inputs closer to the target code areas.
In order to compare the random mutation and the optimized mutation, we present two appraisal indexes (AI1, AI2). The first index (AI1) is the proportion of the generated inputs holding an input-distance not larger than that of the original input, i.e., d_n ≤ d_o, where d_n is the input-distance of the generated input and d_o is the input-distance of the original input. The second index (AI2) is the proportion of the generated inputs holding an input-distance clearly smaller than that of the original input, i.e., d_n ≤ β · d_o, where β is a constant smaller than 1, which we set to 0.95 in the experiments.
More specifically, AI1 evaluates the ability to generate inputs that at least keep their distances to the target code areas; i.e., the generated inputs do not drift to positions far from the target code areas. However, mutating the noncritical bytes can also generate such inputs, yet this is a meaningless mutation. Therefore, we further introduce AI2 to evaluate the ability to generate inputs that are closer than the original input to the target code areas.
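The two indexes can be computed as in this small sketch (the helper name is ours):

```python
def appraisal_indexes(d_o, generated_distances, beta=0.95):
    """AI1 and AI2 for one original input (sketch): the proportions of
    generated inputs with d_n <= d_o and with d_n <= beta * d_o."""
    n = len(generated_distances)
    ai1 = sum(d_n <= d_o for d_n in generated_distances) / n
    ai2 = sum(d_n <= beta * d_o for d_n in generated_distances) / n
    return ai1, ai2
```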
Besides, as RDFuzz is built on AFL, which includes two mutation stages, Det and Havoc [27], we evaluate the Det and Havoc stages separately. In the Det stage, the input generation depends on slight mutation, and in the Havoc stage, the engine leverages heavy mutation for input generation.
The evaluation results w.r.t. the AI1 appraisal index are shown in Table 5. It can be seen that, for each benchmark under the same mutation, the proportion in the Det stage is much larger than that in the Havoc stage.
This is an expected result, because the inputs produced in Det have a strong similarity to the original input (slight mutation), and the ones produced in Havoc have only a weak similarity (heavy mutation).
For each benchmark, by comparing the proportions in the Det and Havoc stages, respectively, we conclude that the optimized mutation is better than the random mutation at generating inputs that do not stray far from the target code areas. Precisely, the optimized mutation achieves an average proportion of 90% in Det and 42% in Havoc, whereas the random mutation obtains only an average of 68% in Det and 25% in Havoc. Furthermore, Table 6 shows the evaluation results w.r.t. the AI2 appraisal index. On average, the optimized mutation still outperforms the random mutation in generating inputs with smaller input-distance; in other words, our optimized mutation manifests a stronger directed ability. Besides, the values in Table 6 are smaller than those in Table 5, indicating that generating inputs with smaller input-distance is quite difficult; to some extent, this means that driving the program to the target code areas is not easy.
Overall, the analysis indicates that the answer to RQ1 is affirmative; the optimized mutation is able to improve the possibility of generating inputs closer to the target code areas.

Evaluation of the Directed Performance (RQ2).
In this section, we investigate the directed performance of RDFuzz and compare it with other directed and nondirected fuzzers.
To represent the directed performance quantitatively, we adopt the minimum input-distance among all the generated inputs as the appraisal index, which reflects the best result of the directed testing. Furthermore, we adopt a convention: if an input hits the target code areas and the program crashes at the same time, then its input-distance is set to 0 immediately. Moreover, the exploration result plays a significant role in directed fuzzing, because an abundant exploration usually leads to a satisfactory exploitation result. Therefore, we also pay attention to the coverage result. We take the branch coverage number as an indicator of the exploration result; in fact, it is the size of the covered bitmap, a fundamental mechanism provided by AFL.
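The appraisal index described above can be sketched as follows. The record format (a list of per-input tuples) is a hypothetical stand-in for one fuzzing run's log, not RDFuzz's actual data structure; the crash-to-zero convention follows the paper.

```python
def minimum_input_distance(results):
    """Appraisal index sketch: the smallest input-distance observed.

    `results` is a hypothetical list of (distance, hit_target, crashed)
    tuples for one run. Per the paper's convention, an input that hits
    the target code areas and crashes the program scores distance 0.
    """
    best = float("inf")
    for distance, hit_target, crashed in results:
        if hit_target and crashed:
            return 0.0              # target reached and triggered: best possible
        best = min(best, distance)
    return best
```

Since the index is a running minimum, it is monotonically nonincreasing over a run, which is why the curves in Figure 8 only ever drop.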
In the performance comparison, it is necessary to take AFLGo into consideration. Since Hawkeye [2] is not yet available, we cannot run our own experimental comparison against it. Besides, we also take two nondirected fuzzers, AFL and FairFuzz, into consideration, because the exploration process alone can sometimes cover the target code areas as well; including some nondirected fuzzers thus makes the analysis more convincing. Figure 8 shows the exploration and exploitation results on the 7 benchmarks. Based on these results, we make the following observations.
(i) On the 7 benchmarks, RDFuzz (the red line) reduces the minimum input-distance faster than AFLGo (the green line). Besides, it can be seen that RDFuzz achieves better exploration results than AFLGo on 6 of the 7 benchmarks, and on the last benchmark (CVE-2015-7497), the exploration result of RDFuzz is close to that of AFLGo. In summary, this suggests that RDFuzz is a better directed testing engine than AFLGo, at least on these 7 benchmarks. It also shows that our improvements on exploration and exploitation, as well as the intertwined schedule, work well to provide a sufficient exploration result and obtain a good exploitation result. The poor directed performance of AFLGo is due to the exploration-exploitation tradeoff problem: AFLGo can get stuck in the exploitation stage with an inadequate exploration result. On the contrary, FairFuzz is always in the exploration stage, aiming to improve the coverage, and it finds some inputs close to the target code areas owing to its high exploration results.
This is precisely why we still introduce the exploration strategy in RDFuzz. As can be seen, on these three benchmarks, the directed performance of RDFuzz is similar to that of FairFuzz and better than that of AFLGo. This shows that RDFuzz can avoid being stuck by an inadequate exploration result.
Overall, the analysis indicates that the answer to RQ2 is affirmative: RDFuzz manifests a good directed performance in reaching the target code areas.

Threats to Validity
There are three threats to validity. The first threat is that the CG construction is limited by the identification of indirect calls, e.g., through function pointers [28], which can leave the CG incomplete and, in turn, make the graph-based distance calculation inaccurate.
The second threat is that it is still difficult for directed testing to discover code areas deeply buried in the program, even when improved exploration strategies are applied. This is due to an inherent limitation of the fuzzing technique, which is clumsy at discovering paths protected by complex constraints. The third threat is that our proposed approach is a heuristic solution to the balance problem between exploration and exploitation in directed testing, yet the evaluation is not extensive. Because real-world programs are variously complicated, we cannot guarantee the effectiveness of our approach on all programs.

Future Work
As future work, we are planning to investigate some lightweight methods to make the CG construction more complete, which is a significant step toward providing accurate input evaluation. Moreover, we will explore some adaptive algorithms that can bring better solutions to the tradeoff problem between exploration and exploitation. Finally, we will perform more evaluation on real-world programs to examine the effectiveness of our approach.

Conclusion
In this paper, we perform an investigation on the state-of-the-art directed fuzzing engine AFLGo and point out two main limitations. We argue that these two limitations are representative in directed fuzzing scenarios. The first limitation is the balance problem between the exploration need and the exploitation need in directed fuzzing. The second limitation is that the random mutation is blind and cannot steer the program toward the target code areas. We further propose a new tool, dubbed RDFuzz, to provide a better directed performance. In RDFuzz, based on the input evaluation and classification, we apply an intertwined testing schedule and present improvements on the exploration and exploitation stages, respectively. The evaluation results demonstrate that RDFuzz is skilled at steering the program toward the target code areas and does not easily get stuck in a local optimum.

Figure 8: For each benchmark, the left subfigure shows the exploitation result, i.e., the minimum input-distance among all the generated inputs; the right subfigure shows the exploration result, i.e., the number of covered branches.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.