A Multiple-Fault Localization Method for Embedded Software with Applications in Engineering

Embedded software is increasingly used in applications that demand high reliability. However, fault localization in embedded software still depends largely on the experience of engineers. Moreover, faults in embedded software are not independent; they are related to and affect each other, which may lead to more complex interaction behavior. These uncertainties leave traditional single-fault localization methods with limited practical value. This paper proposes a multiple-fault localization method for embedded software, with emphasis on a cache-based program spectrum-acquiring method and a hybrid clustering-based fault partition method. Case studies on 108 groups of the subject program show that the hybrid clustering-based fault partition method significantly improves the effectiveness of multiple-fault localization compared with traditional fault localization methods. Experiments on three embedded software programs in engineering show that the cache-based program spectrum-acquiring method saves nearly half of the running-time cost compared with the traditional spectrum-acquiring method based on real-time transmission. Therefore, the multiple-fault localization method proposed in this paper can be applied to embedded software debugging and testing in engineering.


Introduction
With the rapid development of modern society, computer software has been integrated into all walks of human life, including embedded systems, operating systems, databases, and games. Among them, the embedded system is widely used in key fields, such as daily electronics, home appliances, vehicle equipment, transportation systems, military equipment, and aerospace applications, due to its small size, high reliability, flexibility, and convenience [1]. The main functions in embedded system designs are customized through the embedded software. With the strong support of hardware technology, embedded software is constantly becoming more complex, more open, and larger in scale. Compared with nonembedded software, embedded software has such outstanding characteristics as strong real-time performance, tight hardware coupling, and a complicated interaction environment, and it is more prone to errors during the development process. This is why stricter requirements are placed on the reliability of embedded software [2]. Once undetected or latent faults are triggered, the embedded system can no longer perform its functions correctly, resulting in data loss, system crashes, and even threats to human life [3][4][5][6].
As in most software, faults in embedded software originate in the code. They can be divided into several categories, such as memory-related faults, initialization faults, calculation faults, input and output faults, control flow faults, and data processing faults. Among these, memory-related faults and initialization faults can be detected and located by existing tools and are therefore easier to avoid and solve. However, control flow faults, calculation faults, input and output faults, and data processing faults are difficult to detect with tools because of the complexity of their logical semantics.
In the past decades, researchers have made valuable progress in exploring fault localization methods. The general idea of fault localization is to analyze the program statements and their execution results; therefore, testing data such as execution information is an essential basis for fault localization. In embedded software, the code expansion caused by instrumentation tightens memory usage and greatly affects real-time performance. Besides, embedded software runs independently of the development environment, making it more difficult to obtain coverage data than in general software. In traditional embedded software development, developers mostly debug with assertions [7], breakpoints [8], and program logging [9]. However, these debugging methods require a deep understanding of the logic, structure, and function of the software, which makes them time-consuming and inefficient. Besides, the effect of fault localization also depends on the mastery of prior knowledge, assumptions, debugging experience, and the construction of the test case set. Therefore, a more effective method is needed to solve the problem of fault localization in embedded software.
In this paper, we put forward a fault localization method for embedded software. The paper makes the following contributions.
First, the negative impacts of multiple faults in the software are analyzed. Faults in programs are often not independent. They are related to and affect each other, which may lead to more complex behavior. For instance, faults in subroutines may infect callers through subroutine calls and may spread further through communications between processes. There are many mature methods in the existing literature for single-fault localization, but the empirical analysis in Section 3 shows that they are not fully applicable to multiple-fault software.
Second, a fault localization method for embedded software is proposed. Unlike fault localization methods for general software, it addresses the particular constraints of embedded software applications in two ways: (1) An embedded software spectrum-acquiring method based on data caching is adopted to consume minimal resources and minimize the interference with the software itself. (2) A fault localization method based on a combination of clustering techniques is designed to adapt the traditional fault localization method to multiple-fault localization.
Third, engineering practice has been carried out in embedded software with the proposed method. In most literature, only typical programs in the SIR (Software-artifact Infrastructure Repository) [10] are employed for experimental evaluation. However, the code size of these programs is relatively small, and the correlation between faults is relatively simple; therefore, the verification of the fault localization method is not sufficient. This paper uses both open-source software and real engineering software to verify the proposed methods, confirming the applicability and effectiveness of the method in practical applications.
The remaining part of the paper proceeds as follows. Section 2 gives the background of the software fault localization methods. Section 3 discusses some negative effects of traditional fault localization methods when locating faults in programs with multiple faults. Section 4 presents the multiple-fault localization method based on the clustering combination. Some empirical studies are carried out on both open-source software and real engineering software in Section 5. Finally, we discuss the threats to validity and consider our future work.

Related Work
Researchers have proposed many fault localization methods with excellent performance. Most of them draw on different fields of computer science, such as neural networks, graph theory, artificial intelligence, information theory, and automation theory. Wong et al. have classified fault localization methods into eight categories: slice-based, spectrum-based, statistics-based, program-state-based, machine-learning-based, data-mining-based, model-based, and miscellaneous methods [11][12][13]. Among them, program-spectrum-based fault localization (SFL) [14] is an effective method for software fault localization and has been widely studied and applied in engineering. A program spectrum is defined as the execution information about a program from certain perspectives, such as the execution information for conditional branches or loop-free intraprocedural paths [15]. Code coverage [16], or the Executable Statement Hit Spectrum (ESHS) [17], records which program entities have been covered during testing. Using this information, the program entities related to the failure can be identified, thus narrowing the search scope for the faulty code. Among the SFL methods, the Tarantula, Jaccard, and Ochiai methods have achieved outstanding results; they use a statistical method to calculate the suspiciousness score of the program entities and rank them in sequence [18,19].
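For reference, the three metrics are commonly written in the literature as shown below. The notation follows the terms defined in the next paragraph; $N_p(s) = N_{ep}(s) + N_{np}(s)$, the number of passed test cases, is a shorthand assumed here, and $N_f(s) = N_{ef}(s) + N_{nf}(s)$.

```latex
\begin{align*}
\text{Tarantula}(s) &= \frac{N_{ef}(s)/N_f(s)}{N_{ef}(s)/N_f(s) + N_{ep}(s)/N_p(s)} \\
\text{Jaccard}(s)   &= \frac{N_{ef}(s)}{N_{ef}(s) + N_{nf}(s) + N_{ep}(s)} \\
\text{Ochiai}(s)    &= \frac{N_{ef}(s)}{\sqrt{\bigl(N_{ef}(s) + N_{nf}(s)\bigr)\bigl(N_{ef}(s) + N_{ep}(s)\bigr)}}
\end{align*}
```

All three scores lie between 0 and 1, and a higher value marks a statement as more suspicious.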
The term $N_f(s)$ represents the total number of failed test cases. The term $N_{np}(s)$ represents the number of times when the statement $s$ is not covered and the test case passes. The term $N_{ep}(s)$ represents the number of times when the statement $s$ is covered and the test case passes. The term $N_{nf}(s)$ represents the number of times when the statement $s$ is not covered and the test case fails. The term $N_{ef}(s)$ represents the number of times when the statement $s$ is covered and the test case fails. The value range of equation (1) is between 0 (the lowest suspicion value) and 1 (the highest suspicion value). Program entities can be sorted according to the suspicion value in descending order and debugged until the fault is located.
The number of faults contained in the tested program is difficult to know in advance, and mostly there is more than one; therefore, a growing number of research studies have focused on exploring effective multiple-fault localization methods in recent years [18][20][21][22][23][24][25][26]. Jones et al. divide the program into several parts based on the execution of the test cases and then assign different developers to locate the faults in parallel [27]. Abreu et al. proposed the BARINEL method, which uses a Bayesian model to sort candidate sets representing multiple faults [28]. This method performs well in both single-fault and multiple-fault localization. However, it requires developers to keep interacting in real time during code debugging so that the candidate set ranking can be updated continuously. Steimann et al. tried to use a probability distribution to estimate the number of internal defects [29] and found that using an integer linear programming algorithm can significantly improve the parallelism of fault localization [30]. Ruizhi and Wong proposed an advanced fault localization method, Mseer, for locating multiple bugs in parallel, based on revised Kendall tau and K-medoids clustering methods. Mseer proved to be more efficient and accurate than the other two methods in the experimental results [31].
However, neither the single-fault nor the multiple-fault localization methods in the literature have seen practical application, owing to their high performance costs or low efficiency. Besides, because of the differences in research focus and experimental subjects, it is difficult to compare the above methods using a uniform evaluation.

Negative Impacts of Multiple-Faults in Software
The relationship between the software code and the fault is highly complex. When a software program is divided into several modules, relationships exist not only among modules but also among program slices or statements within a module. From the perspective of control flow and data flow analysis, the root of the associated faults is that the current state of the software is affected by the previous state. Although most software programs are developed for high cohesion and low coupling, they still cannot achieve complete independence of modules. This is especially true in object-oriented software development [32]. Inheritance and derivation relationships among classes mean that faults are inherited and propagated along the same paths. Faults in the bottom modules will be passed to the upper modules through interface calls and similar mechanisms, affecting other related objects or modules. Therefore, the environment of software with multiple faults is more complex, leading to some unexpected situations in traditional spectrum-based fault localization methods.
In the following section, the negative impacts the traditional spectrum-based fault localization methods have on software with multiple-faults will be discussed according to the empirical analysis.

The Sample Program with Multiple-Faults.
Figure 1 shows the sample program with multiple faults [21] used to illustrate the negative impacts the traditional spectrum-based fault localization methods have on software with multiple faults; it has two faults, on s7 and s20. There are ten test cases, t1 to t10, in the test suite. The execution trace of each test case is represented by the black dots. The testing results are given at the bottom of each test case, where F or P indicates whether the test case failed or passed. The suspiciousness score calculated by the Tarantula method is listed in the last column of the table.

Inspiring Our Work.
The process of locating the first bug is as follows. As statements s14, s16, s18, s21, and s24 have the highest suspiciousness score, they are examined first in a logical order, but none of them contains a bug. Similarly, statements s11 and s13 have the second-highest suspiciousness score and are examined next, but neither contains a bug. Statements s1 to s6 and s15 are examined after s11 and s13. Finally, when statements s7 and s8 are examined, a bug is found in statement s7.
As seen from the above localization process, 15 nonfaulty statements were examined before locating the first fault, showing that fault localization efficiency is greatly reduced when the traditional Tarantula method is applied to multiple-fault problems.
Besides, it is notable that the nonfaulty code s24 has a higher suspiciousness score than the faulty code s20. According to the code analysis, the fault on s20 propagates to s24 along the control flow of the true branch. However, the Tarantula method ignores fault propagation through the control flow and the data flow among program blocks, making the suspiciousness score of s24 higher than that of s20.
We have also found that the suspiciousness score of s1 to s4 is higher than that of statement s7 because they are executed in each test case. It can be inferred that in programs with multiple faults, the suspiciousness score of shared program entities such as the program entry will be greater than that of the faulty program entity, which makes the fault localization effect worse. The same conclusion follows from the equation of Tarantula. In a program with multiple faults, the $N_{ef}(s)$ value of nonfaulty code may be greater than that of the faulty code, which lowers the relative suspiciousness score of the faulty code and makes the accuracy of fault localization worse.
Furthermore, we have observed that the faulty code s7 has been executed by test cases t2, t3, t5, t8, and t10, but none of these executions fail. Thus, the suspiciousness score of the faulty code s7 is ranked third from last by the Tarantula method. Test cases t2, t3, t5, t8, and t10 are called coincidental correctness test cases [33]. According to the equation of the Tarantula method, there is an inverse relationship between $N_{ep}(s)$ and the final result of the equation. If there are a large number of coincidental correctness test cases, the value $N_{ep}(s)$ of the faulty code will increase while $N_{np}(s)$, $N_{nf}(s)$, and $N_{ef}(s)$ remain unchanged. In this condition, the denominator of the equation increases, and the suspiciousness score of the faulty code is reduced, affecting the ranking of the faulty code. It is inferred that the more coincidental correctness test cases there are in software with multiple faults, the larger the impact on the traditional Tarantula method.
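The effect can be made explicit with the standard Tarantula formula, taking $N_p(s) = N_{ep}(s) + N_{np}(s)$ as the number of passed test cases (a shorthand assumed here). The numbers in the last line are hypothetical and chosen only for illustration.

```latex
\begin{align*}
\text{Tarantula}(s) &= \frac{N_{ef}/N_f}{N_{ef}/N_f + N_{ep}/(N_{ep}+N_{np})}, \\
\frac{\partial}{\partial N_{ep}}\,\frac{N_{ep}}{N_{ep}+N_{np}} &= \frac{N_{np}}{(N_{ep}+N_{np})^{2}} > 0 \qquad (N_{np} > 0), \\
N_{ef}=2,\ N_f=4,\ N_{np}=3:\quad
N_{ep}=1 &\Rightarrow \frac{0.5}{0.5+0.25} \approx 0.667, \qquad
N_{ep}=5 \Rightarrow \frac{0.5}{0.5+0.625} \approx 0.444.
\end{align*}
```

Because the pass ratio in the denominator grows with $N_{ep}$ while the numerator stays fixed, each additional coincidentally correct test case lowers the score of the faulty statement.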
Overall, the fault localization process of the sample program shows that the traditional spectrum-based fault localization method is not fully applicable to software with multiple faults, which is consistent with the research conclusions in [34]. Therefore, a more effective fault localization method is required for software with multiple faults.

The Proposed Approach
In this section, we propose a multiple-fault localization method applied to the embedded software. Similar to most studies, the proposed approach also rests on the following assumptions raised by [34]:
(i) The faulty code can be covered by both failing and passing test cases
(ii) At least one bug can be triggered by each failed test case, which leads to the fault
(iii) The prior probability distribution of faultiness is unknown
(iv) Developers can accurately judge whether the suspect statement is a defective statement during code review, and then effectively remove the defect
The framework of the proposed method is displayed in Figure 2, which can be divided into four phases: (1) cache-based spectrum acquiring; (2) spectrum matrix constructing and preprocessing; (3) hybrid clustering-based fault partition; (4) faults locating.

Cache-Based Spectrum Acquiring.
Suppose the embedded software program $P = (s_1, s_2, \ldots, s_n)$ contains $n$ program entities and $s_i$ ($1 \le i \le n$) refers to the $i$th program entity of the program $P$. The test suite $T = (t_1, t_2, \ldots, t_m)$ corresponding to program $P$ contains $m$ test cases.
Harrold et al. generalize the spectrum and propose various types of spectrum, such as the Complete-Path Spectra (CPS), Path-Count Spectra (PCS), and Branch-Count Spectra (BCS) [15], which are acquired by implanting probe functions at appropriate locations of the software program under test. The CPS and PCS spectra have played an important role in helping developers analyze information about the execution of the program and localize faults in general software programs. However, memory resources in embedded software are extremely limited, so it would not be cost-effective to collect the traces required for the CPS and PCS spectra. The traditional way of instrumentation used in general software would inevitably bring a certain amount of code expansion and greatly affect the function and performance of the embedded software itself. Besides, many embedded software programs have no extra output channels, making it difficult to transmit the spectrum data in real time during running. To solve the above problems, we propose an embedded software spectrum-acquiring method based on data caching with the following steps:
Step 1 Add the start and end braces of each logic block, and generate the correspondence of line numbers before and after the above execution.
Step 2 Count the number of statements of the program and establish a statement array to record the number of times each statement is executed. Initialize the corresponding elements in the info array according to the analysis of the program.
Step 3 Perform lexical analysis of the program, and implant the instrumentation function.
Step 4 Run test cases on the program after instrumentation, and update the number of times the corresponding instance is executed in the array according to the result of the instrumentation function.
Step 5 Obtain the spectrum data of the original program before instrumentation according to the data in the array.
Step 6 After executing each test case, transmit the spectrum data to the host computer using the idle output channels.
The architecture of the embedded software program spectrum acquisition is demonstrated in Figure 3. The embedded software after instrumentation runs on the target board. During the execution of the test cases, the target board puts the instrumentation information into the message queue in real time and then sends the information to the host computer at the appropriate time.
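The caching idea behind Steps 2 to 6 can be sketched as follows. This is a minimal host-side simulation in Python, not the target-board implementation: the class `CachedSpectrum`, the callback `send`, and the instrumented function `instrumented_abs_sum` are hypothetical names introduced only for illustration. Each probe increments a counter in a preallocated array, and the array is transmitted once per test case instead of once per executed statement.

```python
class CachedSpectrum:
    """Caches per-statement hit counts and transmits them once per test case."""

    def __init__(self, n_statements, send):
        self.counts = [0] * n_statements  # statement array (Step 2)
        self.send = send                  # hypothetical output channel callback

    def probe(self, stmt_id):
        # Instrumentation function (Steps 3 and 4): a single in-memory increment.
        self.counts[stmt_id] += 1

    def flush(self, test_id):
        # Step 6: one transmission after the test case, then reset the cache.
        self.send((test_id, tuple(self.counts)))
        self.counts = [0] * len(self.counts)


def instrumented_abs_sum(values, spectrum):
    """Toy program under test with a probe in front of each statement."""
    spectrum.probe(0)
    total = 0
    for v in values:
        spectrum.probe(1)
        if v < 0:
            spectrum.probe(2)
            v = -v
        spectrum.probe(3)
        total += v
    spectrum.probe(4)
    return total


host_log = []  # stands in for the host computer
spectrum = CachedSpectrum(5, host_log.append)
for test_id, inputs in enumerate([[1, -2, 3], []]):
    instrumented_abs_sum(inputs, spectrum)
    spectrum.flush(test_id)

print(host_log)
```

In this sketch the first test case triggers nine probe hits but only one transmission, which is the saving the cache-based method aims for compared with sending every hit in real time.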

Spectrum Matrix Constructing and Preprocessing.
The instrumentation of program $P$ is implemented and the executable file is generated after compilation. The test suite $T$ is then loaded and executed, and the execution data of the program are used to construct the program spectrum matrix demonstrated in Figure 4.
Matrix $M$ is used to represent the coverage information of test suite $T$, where $M_{ij}$ ($M_{ij} \ge 0$) represents the number of times the $i$th ($1 \le i \le n$) program entity in program $P$ is executed by the $j$th ($1 \le j \le m$) test case. The testing results of the program $P$ are represented by the matrix $RE$, where $RE_j = 1$ indicates that the testing result of the $j$th ($1 \le j \le m$) test case in $T$ is passed, and $RE_j = 0$ indicates that the execution of the test case failed.
However, the values in matrix $M$ of program $P$ vary due to the influence of the test suite and the faulty code. To eliminate the influence that differences in the magnitude of the data values have on the accuracy of data processing, it is necessary to standardize the data so that the values are on a uniform scale. The Z-score standardization method defined in equation (4) is used. The data in matrix $M$ are converted into scores without units. In equation (4), the data $x_i$ equals $N_{ef}(s)$, $\mu$ denotes the average value of the code coverage data on $S_i$ ($1 \le i \le n$) by the failed test case, and $\sigma$ denotes the standard deviation of the code coverage data.

Mathematical Problems in Engineering
We present an example of a program spectrum to explain the method of data standardization. Assume the coverage data of the failed test case $t_x$ is $t_x = (x_1, x_2, \ldots, x_n)$, where $x_i$ represents the number of times the statement $S_i$ ($1 \le i \le n$) is covered by the test case $t_x$. The spectrum of the first failed test case t1 is given in Table 1, in which each entry represents the number of times the statement was executed by the test case. As seen from the example, the number of times a statement was executed by the test case varies greatly. For example, statement s6 was covered by test case t1 10 times, while statement s7 was covered by test case t1 9 times. After the calculation, the average value $\mu$ of the coverage data on $S_i$ ($1 \le i \le 15$) by the failed test case t1 is 2, and the standard deviation $\sigma$ of the coverage data is 3.2071. According to equation (4), the program spectrum data calculated after standardization is demonstrated in Table 2.
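The standardization step can be sketched in Python as follows, using the usual Z-score form $z_i = (x_i - \mu)/\sigma$. The function name `z_scores` and the sample counts are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which variant is used.

```python
import statistics


def z_scores(counts):
    """Standardize one test case's execution counts to unitless Z-scores."""
    mu = statistics.fmean(counts)
    sigma = statistics.pstdev(counts)  # population standard deviation (assumed)
    if sigma == 0:
        # All counts are equal, so no statement stands out from the others.
        return [0.0] * len(counts)
    return [(x - mu) / sigma for x in counts]


# Hypothetical execution counts: mean 5, population standard deviation 2.
print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))
```

After this step a statement executed many times inside a loop no longer dominates the distance computations simply because of its raw count.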

Hybrid Clustering-Based Fault Partition.
According to previous research, the execution paths of failed test cases have high similarity [35]. Failed test cases can be partitioned based on the similarity of execution paths, converting the multiple-fault localization into multiple single-fault localization processes. Data clustering, which aims to group objects into subsets that are meaningful in context, is an effective method to deal with the problems of multiple faults [36]. The K-means method is one of the simple and commonly used clustering methods that group the given dataset into k clusters. Its benefit is that it is simple and fast, and it is relatively scalable and efficient for processing large datasets. Suppose the term N represents the number of objects in the dataset, and the term K represents the number of clusters. The K-means method often ends with a local optimum when K << N in most datasets. Therefore, the clustering effect is remarkable when the difference among the clusters is obvious. However, the traditional K-means method is sensitive to the initial values, and the selection of the initial clustering centers has a great influence on the clustering results.
In this work, a hybrid clustering-based fault localization (HCFL) method is employed to reduce this sensitivity and improve the clustering efficiency and accuracy of the traditional K-means method. The HCFL method improves the K-means method in the selection of the initial clustering centers by incorporating distance-based and density-based clustering methods [37,38]. The HCFL method proceeds in two stages. The initial k cluster centers are decided in Step 1 to Step 6, and the traditional K-means method is executed based on these k cluster centers in Step 7 to Step 9.
Input 1: The failed test case sets $T_a = (a_1, a_2, \ldots, a_n)$ and $T_b = (b_1, b_2, \ldots, b_n)$, where $a_i$ and $b_i$ represent the poststandardization code coverage data of the same program entity $S_i$ ($1 \le i \le n$) under $T_a$ and $T_b$, respectively.
Input 2: The number of clusters $k$.
Output: $k$ clusters of failed test cases.
Step 1 Calculate distances between any data on the same program entity $S_i$ ($1 \le i \le n$) in the sets $T_a$ and $T_b$.
Step 2 Calculate the average distance $\mathrm{AVE}_d(T_a, T_b)$ between data objects in the sets $T_a$ and $T_b$. The term $C_n^2$ is the number of couples of failed test cases in the sets $T_a$ and $T_b$.
Step 3 If the distance between $T_a$ and $T_b$ is within $\mathrm{AVE}_d(T_a, T_b)$, then $T_b$ is considered a neighboring point of $T_a$. Calculate the set of all neighboring points of $T_a$. The term $F(z)$ is a function according to
Step 4 Count and arrange the number of neighboring points of all failed test cases, and select the one with the largest number of neighboring points as the first clustering center. Add the first clustering center TC1 to collection TC, and delete it from TR.
Step 5 Select the test case with the furthest distance from TC1 as the second clustering center TC2. Add it to collection TC, and delete it from TR.
Step 6 Select the test case with the furthest distance from both TC1 and TC2 as the third clustering center. Repeat Step 6 until the first k clustering centers are contained in the collection TC.
Step 7 Assign each test case remaining in TR to the cluster whose center in TC is nearest.
Step 8 Recalculate the clustering centers of each cluster.
Step 9 Repeat Step 7 and Step 8 until the Sum Squared Error (SSE) value of all clusters is unchanged.
The terms $x_{1ij}, x_{2ij}, \ldots, x_{nij}$ represent the coverage data of the $j$th ($1 \le j \le m$) failed test case in the $i$th ($1 \le i \le n$) cluster for program $P$, and $x_{1i}, x_{2i}, \ldots, x_{ni}$ represent the center of the $i$th ($1 \le i \le n$) cluster.
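The two stages can be sketched in Python as follows. This is an illustrative reading of the steps, not the authors' implementation, and it rests on three assumptions: the distance is Euclidean (consistent with values such as 2.828 in the running example), each further center in Step 6 is the test case whose smallest distance to the already chosen centers is largest, and iteration stops when cluster assignments no longer change, which also leaves the SSE unchanged. The names `initial_centers` and `hcfl_cluster` are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def initial_centers(vectors, k):
    """Stage 1: densest test case first, then the farthest remaining ones."""
    n = len(vectors)
    dist = [[euclidean(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    pairs = list(combinations(range(n), 2))
    avg = sum(dist[i][j] for i, j in pairs) / len(pairs)
    # Density: number of other test cases closer than the average distance.
    density = [sum(1 for j in range(n) if j != i and dist[i][j] < avg)
               for i in range(n)]
    centers = [max(range(n), key=lambda i: density[i])]
    while len(centers) < k:
        remaining = [i for i in range(n) if i not in centers]
        # Assumed rule: maximize the distance to the nearest chosen center.
        centers.append(max(remaining,
                           key=lambda i: min(dist[i][c] for c in centers)))
    return centers


def hcfl_cluster(vectors, k, max_iter=100):
    """Stage 2: ordinary K-means started from the stage 1 centers."""
    centers = [list(vectors[i]) for i in initial_centers(vectors, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: euclidean(v, centers[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the SSE is unchanged
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical standardized coverage vectors of six failed test cases.
failed = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0)]
print(hcfl_cluster(failed, k=3))
```

Because the initial centers are chosen deterministically, repeated runs on the same spectrum give the same partition, unlike K-means with random initialization.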
We present a running example of the hybrid clustering method. Suppose Table 3 is a spectrum of a sample program.
According to Step 1, the distance between every two test cases is calculated using equation (5) and listed in Table 4. Each cell of the table is the distance between the two test cases in its row and column. As an example, the distance between t5 and t2 is 2.828, and the distance between t8 and t2 is 2.646. Based on the distances between each couple of test cases, the average distance described in Step 2 can be calculated using equation (6), which gives 2.742450. Then in Step 3, the density of each test case is calculated according to equation (7). As an example, for test case t1, the test cases whose distance from t1 is less than the average distance are t3, t4, and t5; that is, the density value of t1 is 3.
The density values of all test cases are listed in Table 5. According to Step 4, the test case t3 is selected as the first clustering center TC1 because it has the highest density. The test case t5 is selected as the second clustering center TC2 because it has the largest distance (3.000) from the test case t3. In Step 5 and Step 6, calculate the distances between the first clustering center t3 and the remaining test cases, and the distances between the second clustering center t5 and the remaining test cases, which are named $d(t3, tx)$ ($x \in \{1, 2, 4, 5, 6, 7, 8, 9, 10\}$) and $d(t5, tx)$ ($x \in \{1, 2, 3, 4, 6, 7, 8, 9, 10\}$), respectively. The third clustering center TC3 is selected according to equation (9). Suppose the clustering number k is 3. All three clustering centers are demonstrated in Table 6.
Based on the three initial clustering centers, the partition results calculated according to Step 7 to Step 9 are shown in Figure 5. The three initial cluster centers, t3, t5, and t8, fall into three different final clusters after one iteration, which is more convenient for subsequent clustering.
Several studies [39,40] have revealed that the number of clusters is an important factor in the K-means method, which also applies to the proposed method. If the number of clusters is less than the number of faults, a cluster may contain two or more faults. If engineers stop debugging when the first fault is located, the method needs to be re-executed to debug the remaining faults in the program, causing too many iterations. If the number of clusters is more than the actual number of faults, two or more clusters may contain the same fault. In the parallel debugging mode [27], multiple engineers debug a program simultaneously for multiple faults. After each engineer has found and fixed a fault, the program is retested. If the program still exhibits failures, the debugging process is repeated. In this way, the waste of debugging costs caused by the same fault in two subsets is minimized and the number of execution and debugging iterations is greatly reduced. Therefore, in the HCFL method, the cluster number is suggested to be greater than or equal to the fault number.

Faults Locating.
In this phase, each subset of failed test cases is merged with the passed test cases to obtain k test case subsets. Test cases that may fail due to the same faulty code are grouped together so that the multiple-fault localization process is decomposed into multiple parallel single-fault localization processes. For each test group, calculate the program entity suspicion using the equations in Section 2 and check the code in descending order of the suspicion value until all faults are located. After a round of testing is completed, more than one bug is often discovered and fixed. Then, the fault localization method is executed repeatedly until all faults are discovered and fixed.
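A minimal Python sketch of this phase is given below, using the standard Tarantula formula as the suspiciousness metric. The function names and the small spectra are hypothetical; each row is the coverage vector of one test case, and a statement counts as covered when its entry is greater than zero.

```python
def tarantula(n_ef, n_ep, n_f, n_p):
    """Standard Tarantula score; 0.0 when no failed test covers the statement."""
    if n_ef == 0:
        return 0.0
    fail_ratio = n_ef / n_f
    pass_ratio = n_ep / n_p if n_p else 0.0
    return fail_ratio / (fail_ratio + pass_ratio)


def rank_statements(failed_rows, passed_rows):
    """Rank statements for one cluster of failed tests merged with all passed tests."""
    n_f, n_p = len(failed_rows), len(passed_rows)
    n_statements = len(failed_rows[0])
    scores = []
    for s in range(n_statements):
        n_ef = sum(1 for row in failed_rows if row[s] > 0)
        n_ep = sum(1 for row in passed_rows if row[s] > 0)
        scores.append(tarantula(n_ef, n_ep, n_f, n_p))
    # Statement indices in descending order of suspiciousness.
    order = sorted(range(n_statements), key=lambda s: -scores[s])
    return order, scores


def rank_per_cluster(failed_clusters, passed_rows):
    """One independent ranking per cluster, suitable for parallel debugging."""
    return [rank_statements(cluster, passed_rows)[0] for cluster in failed_clusters]


# Hypothetical spectra: two clusters of failed tests and two passed tests.
passed = [[1, 0, 1], [1, 0, 0]]
clusters = [[[1, 1, 0]], [[1, 0, 1]]]
print(rank_per_cluster(clusters, passed))
```

Each cluster produces its own ranking, so a statement that only the failed tests of one cluster execute rises to the top of that cluster's list even if other failures never touch it.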

Case Studies
This section evaluates the proposed fault localization method in terms of effectiveness and running cost. However, no application of fault localization methods to embedded software could be found in the existing literature, so it is virtually impossible to select a general embedded software program for cross-comparison experiments of different localization methods. Considering that embedded software is a special kind of software, a fault localization method applied to embedded software should be generally applicable apart from the acquisition of the program spectrum. Therefore, the evaluation experiments consist of two parts. First, cross-comparison experiments are conducted on open-source software to evaluate the effectiveness of the HCFL method. Second, the proposed method is applied to real embedded software to evaluate the operating cost.

Subject Program.
The subject program Flex, with its accompanying test suites obtained from the SIR library, was adopted for demonstration. Twelve versions of the subject program with different numbers of faults were obtained by artificial fault activation or injection, as demonstrated in Table 7. The following conditions should be met when injecting a fault.
(i) The injected faults must be realistic, that is, of a kind that often occurs when programming;
(ii) The injected faults must conform to the grammar rules;
(iii) The injected faults can be tested. Otherwise, it may be difficult to measure the fault localization effect accurately.
For each experimental group of the software, the traditional Jaccard, Ochiai, and Tarantula methods, as well as the proposed HCFL method, were used to locate the faults. The program spectra were collected with the aid of the GCOV (GNU code-coverage) tool.

Evaluation Criteria for Effectiveness.
The traditional way of evaluating fault localization accuracy is by calculating the percentage of statements in a program that have to be examined until the first faulty statement is reached [41][42][43].
(1) The Average Number of Statements Examined. The term AVE-S represents the total number of code lines to be examined to locate all faults in program P. The term Count_s means the total number of statements that are to be examined according to the list of suspicion values, and the term N represents the number of testing rounds. If (AVE-S)_X < (AVE-S)_Y, we define method X to be more efficient than method Y.
Assume that a slice of nonfaulty statements has the same suspiciousness as the faulty statement. If the statement examined first is exactly the faulty statement, we define this condition as the best case. If the bug is not found until the last statement has been checked, we define this condition as the worst case. It follows that in the worst case, we have to examine all the nonfaulty statements with the same suspiciousness as the faulty statement. If we examine some nonfaulty statements but not as many as in the worst case, we define it as the average case. Therefore, it is proposed to calculate the average number of statements examined in all three cases.
(2) The Average Expense Value. The average expense value means the average number of statements examined, AVE-S, as a percentage of the total executable lines of code (LOC). The smaller the value of expense is, the better the performance of the multiple-fault localization method will be.
(3) e P Value. In statistics, when the data conform to the normal distribution and the homogeneity of the variance, the parameter tests, such as the u-test and the t-test, are commonly used. However, when the data do not conform to the normal distribution or the unevenness of the variance, the nonparametric test, such as the Wilcoxon signed-rank test method, is required [44]. To prove that the HCFL method is more effective, the difference between the number of statements that need to be examined using these methods is computed. We have proposed a one-tailed assumption that other methods require more statements to be examined than HCFL. e P value reflects the significance level between the two groups of results, and P < 0.05 is indicative of a significant difference between the fault localization effectiveness of these two strategies.
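The one-tailed Wilcoxon signed-rank test in criterion (3) can be illustrated with the following standard-library sketch. It computes an exact one-sided P value for the hypothesis that the other method needs more statements than HCFL. The function name, the pairing of observations, and the sample numbers are assumptions made for illustration and are not taken from the experiments.

```python
def wilcoxon_one_sided(other, hcfl):
    """Exact one-sided P value for H1: `other` needs more statements than `hcfl`."""
    # Zero differences carry no information about direction and are dropped.
    diffs = [a - b for a, b in zip(other, hcfl) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0  # identical results: no evidence of any difference

    # Rank the absolute differences, giving tied values their average rank.
    # Ranks are stored doubled so that half-integer ranks stay integers.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2  # 2 * average of ranks i+1 .. j+1
        i = j + 1

    # Test statistic: (doubled) sum of ranks of the positive differences.
    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)

    # Exact null distribution: each rank joins the positive sum with
    # probability 1/2, so count the subsets of ranks for every possible sum.
    total = sum(ranks2)
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]

    return sum(counts[w_plus2:]) / 2 ** n

# Hypothetical paired counts of statements examined (not from the paper).
other_method = [10, 12, 9, 15, 20, 8]
hcfl_method = [7, 8, 10, 9, 11, 6]
print(wilcoxon_one_sided(other_method, hcfl_method))  # 0.03125
```

When both methods examine the same number of statements in every round, all differences are zero and the sketch returns 1, meaning no difference can be shown.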
As a whole, the fault localization effectiveness of method A can be considered better than that of method B when the following two conditions are met:
(i) The average number of statements examined by method A is less than that of method B, or the average expense value of method A is less than that of method B;
(ii) The P value of method A versus method B is less than 0.05.
Tables 8 and 9 present the AVE-S values when locating the first bug and all bugs using the Jaccard, Ochiai, Tarantula, and HCFL methods in the best, worst, and average cases. For instance, in experiment Group 1, the average number of statements that need to be examined was 5 when locating the first bug by the Jaccard and Ochiai methods in the best case, while the AVE-S value was 1 by the Tarantula method and our HCFL method.
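The best, worst, and average cases and the expense value can be sketched as follows. This is a minimal illustration that assumes the average case is the midpoint of the best and worst cases, rounded down, which agrees with Jaccard entries of Table 8 such as best 5, worst 363, average 184 and best 5, worst 142, average 73. The function names and the sample scores are hypothetical.

```python
def statements_examined(scores, faulty):
    """Return (best, worst, average) numbers of statements examined
    until the statement `faulty` is reached in the ranked list."""
    target = scores[faulty]
    higher = sum(1 for s in scores.values() if s > target)
    tied = sum(1 for s in scores.values() if s == target)  # includes `faulty`
    best = higher + 1        # the faulty statement is checked first among ties
    worst = higher + tied    # the faulty statement is checked last among ties
    return best, worst, (best + worst) // 2

def ave_s(counts):
    """Average number of statements examined over all testing rounds."""
    return sum(counts) / len(counts)

def expense(ave, loc):
    """AVE-S as a percentage of the executable lines of code."""
    return 100.0 * ave / loc

# Hypothetical suspiciousness scores; s3 is the faulty statement.
scores = {"s1": 0.9, "s2": 0.7, "s3": 0.7, "s4": 0.7, "s5": 0.2}
print(statements_examined(scores, "s3"))  # (2, 4, 3)

# Expense for an AVE-S of 184 in a program with 3453 executable lines.
print(round(expense(184, 3453), 2))
```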

Results and Analysis
Referring to the data in Table 8, the AVE-S value of the HCFL method is smaller than that of the Jaccard and Ochiai methods when locating the first bug in the program. However, in all 12 experiment groups in the best case, the AVE-S values of the Tarantula method are all 1, which is at least as good as the HCFL method in every group.
This can be explained using the Tarantula equation. In the best case, all the faulty statements are covered by failed test cases and are not covered by passing test cases. Therefore, the values of N_ep(s) and N_np(s) are 0, and the suspiciousness scores of all faulty statements are 1 as calculated by the Tarantula method. In the best case, the statement that is examined first is the faulty statement.
In Table 9, the AVE-S value of the HCFL method is much smaller than that of the Jaccard and Ochiai methods when locating all bugs in the program. However, the AVE-S values in Group 2, Group 5, Group 8, Group 9, and Group C by the Tarantula method are 3, which are better than those of the HCFL method. This is because the suspiciousness scores of the three faulty statements are 1, so they are checked in the first three places. Nevertheless, in the worst and average cases, the AVE-S values of the HCFL method are much smaller than those of the Tarantula method, indicating that the efficiency of the HCFL method is generally better than that of the Jaccard, Ochiai, and Tarantula methods.
The performance comparison among the Jaccard, Ochiai, Tarantula, and HCFL methods can be seen intuitively from the expense values in Figures 6 and 7. The bar chart in Figure 6 shows the average expense value when locating the first bug, named Expense-First, in the best, worst, and average cases. Similarly, the bar chart in Figure 7 shows the average expense value when locating all the bugs in the three cases, named Expense-All. The red bar represents the expense value when using the HCFL method, while the blue bar in each line represents the expense value when using the Jaccard, Ochiai, and Tarantula methods. The shorter the bar, the better the fault localization efficiency. The comparison results indicate that the fault localization efficiency of the HCFL method is better than that of the Jaccard and Ochiai methods, but slightly lower than that of the Tarantula method, when locating the first bug; however, when locating all the bugs, the efficiency of the HCFL method is higher than that of the Jaccard, Ochiai, and Tarantula methods in most of the groups, especially in the worst and average cases. Compared with the data in the best case, the results in the worst and average cases better reflect the process of software fault localization. Therefore, the comprehensive comparison in Figures 6 and 7 demonstrates the superiority of HCFL in fault localization efficiency.

Group  1     2     3     4     5     6     8     9     A     B     C     D
LOC    3453  3453  3453  3453  3453  3453  4008  4008  4035  4034  4035  4035

Table 8: Comparison of AVE-S value when locating the first bug in three cases.
Method     Case     1     2     3     4     5     6     8     9     A     B     C     D
Jaccard    Best     5     5     985   5     985   69    1     1     1397  1     1     1397
Jaccard    Worst    363   142   987   363   987   108   130   130   1400  15    15    1400
Jaccard    Average  184   73    986   184   986   88    65    65    1398  8     8     1398
Ochiai     Best     5     5     958   5     958   69    1     1     1336  1     1     1336
Ochiai     Worst    363   142   960   363   960   108   130   130   1339  15    15    1339
Ochiai     Average  184   73    959   184   959   88    65    65    1337  8     8     1337
Tarantula  Best     1     1     1     1     1     1     1     1     1     1     1     1
Tarantula  Worst    652   171   746   651   746   195   196   196   394   549   577   394
Tarantula  Average  326   86    373   326   373   98    98    98    197   275   289

Tables 10 and 11 report the P values of HCFL versus Jaccard, HCFL versus Ochiai, and HCFL versus Tarantula when locating the first bug and when locating all the bugs, respectively. In Table 10, the P value is 1 in some groups, such as Group 8 and Group 9 in the best case, and Group B and Group C in all three cases. Recalling the values in Table 8, this is because, when locating the first bug, the average number of statements examined by the compared methods is the same; that is, the first bug was located when examining the first statement. In the other groups, the P value is less than 0.05. In Table 11, the P values in all experimental groups are less than 0.001, revealing that the fault localization effectiveness of the HCFL method differs significantly from that of the Jaccard, Ochiai, and Tarantula methods.
Overall, the results of Experiment 1 indicate that, when locating all the bugs, the average number of statements examined by the HCFL method and the average expense value of the HCFL method are much lower than those of the Jaccard, Ochiai, and Tarantula methods in most of the cases. The P values of HCFL versus Jaccard, HCFL versus Ochiai, and HCFL versus Tarantula are much less than 0.05 in most of the cases. We believe that the fault localization effectiveness of the HCFL method is much better than that of the Jaccard, Ochiai, and Tarantula methods when locating multiple faults.

Table 12. The Main_control program was used to carry out the central control function of an application system. Therefore, the logic of the program was relatively complex, with many branches and judgment statements.
Main_control was implemented in C++ and executed on a TMS320C6000, and its number of instrumentation points was 2489. Data_commu was a program of an application payload implementing the data interaction function; it was also written in C++ and ran on a TMS320F2812. The number of instrumentation points of Data_commu was 737. The Data_process program was the software part of an SoPC (System on a Programmable Chip), which cooperated with the programmable logic part to complete the functions of command receiving and command parsing. Data_process was implemented in C and executed on the PowerPC 405 core of a Xilinx FPGA, with 237 instrumentation points.

Evaluation Criteria for the Operating Cost.
The software instrumentation used to obtain the program spectrum may reduce software performance. In this experiment, the running time of each software program is compared under the following three testing scenarios:
(i) The run-time of the software program before instrumentation;
(ii) The run-time of the software program using the spectrum-acquiring method based on real-time transmission;
(iii) The run-time of the software program using the cache-based spectrum-acquiring method.
In the second testing scenario, output channels are needed for transmitting the spectrum data in real time. According to the characteristics of each software program, we used the SPI (Serial Peripheral Interface) port to transmit the program spectrum of the Main_control program, and the serial port to transmit the spectrum data of the Data_commu and Data_process programs.
Taking the Main_control program as an example, the program run-time was measured as follows. The counter value of the DSP timer Timer0 was used to calculate the program run-time, with a timing accuracy of 0.02 microseconds. The register value of Timer0 was set to the upper limit at the beginning of the program to ensure that no overflow occurred during the measurement period. The control register was set to start the timing at the beginning of the program, and the register value of Timer0 was read at the end of the program. The run-time of the program, T_runtime, was calculated as follows:
$$T_{\mathrm{runtime}} = \left(\mathrm{Timer0}_{\mathrm{upper}} - \mathrm{Timer0}_{\mathrm{end}}\right) \times T_{\mathrm{accuracy}}, \tag{13}$$
where Timer0_upper is the upper limit value of the Timer0 register, and Timer0_end is the register value of Timer0 read at the end of the program.
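The run-time calculation can be illustrated with a short host-side sketch that converts two timer readings into elapsed time. It assumes a down-counting timer that starts from its upper limit, expresses the 0.02 microsecond accuracy as 20 nanoseconds per tick so that the arithmetic stays in integers, and uses a 32-bit upper limit purely as an example. The function name and the sample register values are hypothetical.

```python
def runtime_ns(timer_upper, timer_end, tick_ns=20):
    """Elapsed time in nanoseconds for a timer that counts down from
    `timer_upper` and reads `timer_end` when the program finishes.
    tick_ns=20 corresponds to a timing accuracy of 0.02 microseconds."""
    if timer_end > timer_upper:
        raise ValueError("end value exceeds the upper limit: overflow or wrong timer mode")
    return (timer_upper - timer_end) * tick_ns

# Hypothetical readings: the register width is an assumption for the example.
upper = 0xFFFFFFFF
end = upper - 50_000          # 50,000 ticks elapsed
print(runtime_ns(upper, end))  # 1000000 ns, i.e. 1000 microseconds
```

Working in integer nanoseconds avoids floating-point rounding when many short measurements are accumulated into maximum, minimum, and mean values.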

Results and Analysis.
The maximum, minimum, and mean run-times were recorded, as listed in Table 13. From the data in Table 13 and Figure 8, it can be concluded that, whichever program-spectrum acquisition method is used, the code instrumentation for acquiring the program spectrum increases the run-time of the program. The average run-time when using the method based on real-time transmission is more than twice the run-time without instrumentation, while with the cache-based program-spectrum acquisition method proposed in this paper, the run-time of the program is slightly more than twice that without instrumentation.
As the basis of the fault localization method, the acquisition of the program spectrum is the bottleneck that affects the performance of the whole fault localization method for software, especially for embedded software. With the cache-based program-spectrum acquisition method, the acquisition time of the program spectrum for each test case is shortened by half, which improves the efficiency of the fault localization method.

Threats to Internal Validity.
The threat to internal validity concerns the causal relationship between the independent and dependent variables in the experiment. The specific implementation of the clustering method and the test script code in Section 4 may contain defects, which may affect the experimental results. To ensure the correctness of the implementation, the manually written code was strictly reviewed and sufficiently tested.

Threats to External Validity.
The primary threat to the external validity of the experimental results lies in the selection of the subject program, which is limited in software scale and number of faults. Therefore, we carried out empirical experiments on real embedded software applications, which strengthens confidence in practical engineering application.
Besides, the quality of the test cases also has a certain impact on software fault localization. A good test case should expose as many faults as possible in the software. Efficient test cases should cover as many statements, conditions, decision conditions, and combinations of conditions as possible. The subject programs and test cases in Experiment 1 are open-source artifacts widely used in related research and are therefore reasonably representative [45][46][47][48]. In Experiment 2, Equivalence Partitioning analysis, Boundary Value analysis, Decision Table analysis, and other methods were used in test case design to ensure the sufficiency of the test cases. The final versions of the three programs in Experiment 2 had gone through multiple rounds of testing and were successfully applied in products. Therefore, the test suites can be considered sufficient, and the software bugs can be covered by the test suites.

Conclusion and Future Work
With the increase in the size and complexity of embedded software, the traditional human-based fault localization method is no longer adequate; thus, an efficient fault localization method is urgently needed to help engineers with debugging. This paper proposes a fault localization method specifically for embedded software, with improvements in the following two aspects compared with previous research. First, the program-spectrum acquisition scheme has been adapted to the characteristics of embedded software, which greatly improves the efficiency of program-spectrum acquisition. Second, the traditional K-means clustering method is improved by using a hybrid clustering method based on density and distance to find the initial clustering centers, which improves the efficiency of clustering. The experimental results demonstrate that the proposed fault localization method can accurately locate multiple bugs in embedded software, which saves debugging time for engineers.
In future work, more experiments on large-scale software programs with many more faults will be performed to strengthen the accuracy of the clustering method and to localize multiple faults at a lower expense. Besides the study of fault localization methods in these specific testing fields, applications of fault localization methods on other execution platforms of embedded software should also be considered. The scenario features of these new platforms will be analyzed to make full use of existing research results on fault localization and to propose high-quality solutions.

Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.