Experimental Evaluation of FSM-Based Testing Cost for Time-Delay Systems

As time-delay systems become more common, testing them has attracted much attention. In practice, devising a good test strategy also requires evaluating the cost and effectiveness of testing. In this paper, we take time-delay and five other factors (state number, input number, output number, completeness degree, and accessibility degree) into account and present a timer embedded FSM (TEFSM) model, which we use to design a comparative strategy for assessing coverage criteria and test suite generation methods for time-delay systems. We explore how the test suite generation methods, the coverage criteria, and the TEFSM model parameters affect the average length of test suites.


Introduction
In the real world, time-delay exists in various engineering systems such as chemical processes and long transmission lines in pneumatic systems, and the study of time-delay systems has gained considerable attention over the past years [1,2].
The attendant increase in the complexity of time-delay systems has made the timely development of reliable systems extremely challenging. To ensure the quality and reliability of time-delay systems, model-based testing techniques have been widely used. El-Fakih et al. present a method for deriving test suites with guaranteed fault coverage for deterministic, possibly partial, timed finite state machines (TFSMs) [3]. Abramovici and Stroud present a delay-fault testing approach for field programmable gate arrays based on built-in self-test (BIST) [4]. Many conformance test derivation methods are based on a specification given in the form of a finite state machine (FSM), such as the W [3,5], partial W (Wp) [6,7], HIS [8], and H [9] test derivation methods.
However, project schedules are always tight, and it is impossible to test a system completely. Thus, the efficiency of the testing method becomes a key factor in ensuring system reliability. Some researchers have studied how to evaluate the effectiveness of system testing. Briand [10] presents a critical analysis of empirical research in software testing and provides a structured overview and discussion of the validity issues most commonly encountered in empirical studies of testing techniques. This work guides how we conduct our experiments and how we analyze the validity of our research. Other researchers explore methods for empirical research. For example, the literature [11], based on statecharts, proposes a precise simulation and analysis procedure and investigates the cost effectiveness of four adequacy criteria, including all transitions (AT), all transition pairs (ATP), and all the paths in the transition tree (TT). However, that work does not further analyze the effect of the FSM parameters on the derivation methods, so testers cannot determine which derivation method to use. Moreover, no papers study these aspects in combination (generation methods with coverage criteria, and parameters with generation methods and coverage criteria).
Based on the previous discussion, and aiming at decreasing the testing cost of time-delay systems, we present an experimental evaluation method for obtaining a minimum test suite for time-delay systems. This method incorporates recent advances in the literature [10][11][12]. First, we take time-delay and five other factors (state number, input number, output number, completeness degree, and accessibility degree) into account and present a timer embedded FSM (TEFSM) model. By taking time-delay as an element of the input, the TEFSM can be converted into a normal FSM model, which makes the evaluation task much easier. Furthermore, our investigation focuses on minimizing the length of the test suite, since the shorter the test sequence is, the lower the testing cost is. We present a novel method to evaluate the efficiency of FSM-based testing methods and explore whether a test sequence generation method works relatively well for one or several particular coverage criteria and how the parameters of an FSM model affect the FSM test suite. Namely, we investigate the relationship between generation methods, coverage criteria, FSM parameters, and testing cost.
The rest of the paper is organized as follows. Section 2 describes some fundamental concepts such as FSM, TEFSM, and some classic coverage criteria of FSM-based systems. Section 3 presents the research method used to convert a TEFSM into an FSM and to reduce the size of the test sequence in order to save test cost. Section 4 presents the experiments and analyzes the results. Finally, the paper draws conclusions and presents directions for further research.

Related Concepts Definition
This section describes the basic concepts used in the proposed approach, with particular attention to the TEFSM model.

FSM Model.
Formally, an FSM is a system $M = (I, O, S, s_0, \delta, \lambda)$, where $I$ is a finite input character set, $O$ is a finite output character set, $S$ is a finite state set, $s_0 \in S$ is the initial state, $\delta$ is the transition function, and $\lambda$ is the output function [3].
If each state of an FSM responds to all inputs, it is a complete FSM (CFSM); otherwise, it is a partial FSM. That is, if for all $s_i \in S$ and $a \in I$ there exists $s_j \in S$ such that $\delta(s_i, a) = s_j$, the FSM is a CFSM.
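If some state of an FSM can reach different states on the same input, it is a nondeterministic FSM; otherwise, it is a deterministic FSM. That is, if there exist $s \in S$ and $a \in I$ for which the transition on $a$ from $s$ can lead to two states $s_i, s_j \in S$ with $i \neq j$, the FSM is nondeterministic.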
If the states of an FSM are pairwise distinguishable, it is a reduced FSM; otherwise, it is nonreduced. That is, if there exist $s_i, s_j \in S$ with $i \neq j$ such that $\lambda(s_i, \alpha) = \lambda(s_j, \alpha)$ for all $\alpha \in I^*$, where $I^*$ denotes the set of input sequences, the FSM is nonreduced.
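The completeness and reducedness conditions above can be checked mechanically. The following is a minimal sketch, not taken from the paper: the three-state machine, the dictionary encoding `(state, input) -> (next_state, output)`, and the function names are all assumptions made for illustration. Because the encoding maps each (state, input) pair to a single target, it can only represent deterministic machines.

```python
from itertools import product

# Hypothetical 3-state machine (an assumption for illustration only).
# Encoding: (state, input) -> (next_state, output); deterministic by construction.
STATES = ["s0", "s1", "s2"]
INPUTS = ["a", "b"]
TRANSITIONS = {
    ("s0", "a"): ("s1", 0),
    ("s0", "b"): ("s0", 1),
    ("s1", "a"): ("s2", 0),
    ("s1", "b"): ("s0", 0),
    ("s2", "a"): ("s2", 1),
    ("s2", "b"): ("s1", 1),
}


def is_complete(states, inputs, transitions):
    """A CFSM defines a transition for every (state, input) pair."""
    return all((s, a) in transitions for s, a in product(states, inputs))


def is_reduced(states, inputs, transitions):
    """Check pairwise distinguishability by partition refinement.

    Assumes the machine is complete and deterministic.
    """
    # Start by grouping states according to their one-step outputs.
    block = {s: tuple(transitions[(s, a)][1] for a in inputs) for s in states}
    while True:
        # Refine each block by the blocks of the successor states.
        refined = {
            s: (block[s], tuple(block[transitions[(s, a)][0]] for a in inputs))
            for s in states
        }
        if len(set(refined.values())) == len(set(block.values())):
            break  # partition is stable
        block = refined
    # Reduced means every state ends up in its own block.
    return len(set(block.values())) == len(states)


# Two states with identical behaviour: complete but not reduced.
TWIN = {("p", "a"): ("q", 0), ("q", "a"): ("p", 0)}
```

For the hypothetical machine above both checks succeed, while the two-state `TWIN` machine is complete but nonreduced because no input sequence separates its states.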

TEFSM.
The TEFSM model is an extension of the FSM model for dealing with time-delay systems. A timer is embedded in the model, hence the name timer embedded FSM. The timer represents the delay time of a state, and time-delay is considered as a factor during state transitions: timers and entry actions are associated with states, conditions and actions are associated with transitions, and time is implicitly associated with inputs and outputs [12]. A state transition occurs automatically after timeout unless the machine receives further inputs before the timeout. A TEFSM is given as $M_T = (I_T, O, S, s_0, \delta, \lambda)$, in which $I_T$ is the set of inputs that carry a time-delay. Each such input is a pair $(a, t)$, where $a \in I$ and $t$ is a nonnegative rational number giving the time-delay of the state transition; the input $a$ is applied after time $t$.

Coverage Criteria.
Coverage criteria define sufficient conditions for deciding when to end the testing process; that is, a test suite must satisfy a given kind of testing requirement [13]. If $K$ denotes a coverage criterion and $T$ a test suite, $TR_K(M)$ is the set of testing requirements of a given FSM $M$ for criterion $K$, and $TS_K(M, T)$ is the set of requirements satisfied by the test suite $T$. Obviously, $TS_K(M, T)$ is a subset of $TR_K(M)$. $C_K(M, T)$ denotes the coverage rate of $T$; that is, $C_K(M, T) = |TS_K(M, T)| / |TR_K(M)|$. If $TS_K(M, T) = TR_K(M)$, or equivalently $C_K(M, T) = 1$, then $T$ is a test suite adequate for the criterion $K$ of $M$. Figure 1 is an example FSM model with four states $s_0$, $s_1$, $s_2$, and $s_3$ and eight transitions; it is used to explain the coverage criteria in this section.
(1) State Coverage (SC) Criterion. Cover all states of $M$; hence, $TR_{SC}(M) = S$, where $S$ is the state set of $M$, and $TS_{SC}(M, T)$ is the set of states in $S$ reached by the tests in $T$. For the FSM in Figure 1, there are four states $s_0$, $s_1$, $s_2$, and $s_3$, and a test suite of two sequences is SC-adequate because all four states can be covered with it. Note that the initial state is reachable with the empty transfer sequence, while a prefix of one of the tests is a transfer sequence to state $s_1$.
(2) Transition Coverage (TC) Criterion. Cover all transitions of $M$. For the FSM in Figure 1, a test suite of five sequences is TC-adequate because it covers all 8 transitions.
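To make the coverage rate for the SC and TC criteria concrete, the sketch below computes both rates for a test suite by replaying each test from the initial state. It is only an illustration under stated assumptions: the three-state machine is hypothetical (it is not the FSM of Figure 1, whose transitions are not reproduced here), and the function names are invented for this example.

```python
# Hypothetical complete deterministic FSM (not the machine of Figure 1).
# Encoding: (state, input) -> (next_state, output).
STATES = ["s0", "s1", "s2"]
TRANSITIONS = {
    ("s0", "a"): ("s1", 0),
    ("s0", "b"): ("s0", 1),
    ("s1", "a"): ("s2", 0),
    ("s1", "b"): ("s0", 0),
    ("s2", "a"): ("s2", 1),
    ("s2", "b"): ("s1", 1),
}


def run(transitions, initial, seq):
    """Replay one test; return the visited states and traversed transitions."""
    state = initial
    visited, used = {state}, set()
    for a in seq:
        used.add((state, a))
        state = transitions[(state, a)][0]
        visited.add(state)
    return visited, used


def coverage_rates(transitions, states, initial, suite):
    """Return (SC rate, TC rate) as satisfied requirements / all requirements."""
    covered_states, covered_transitions = set(), set()
    for seq in suite:
        visited, used = run(transitions, initial, seq)
        covered_states |= visited
        covered_transitions |= used
    sc = len(covered_states) / len(states)
    tc = len(covered_transitions) / len(transitions)
    return sc, tc


sc_rate, tc_rate = coverage_rates(TRANSITIONS, STATES, "s0", ["aa"])
full_sc, full_tc = coverage_rates(TRANSITIONS, STATES, "s0", ["aaab", "bab"])
```

In this hypothetical machine the single test `aa` is SC-adequate but traverses only two of the six transitions, whereas the suite `aaab`, `bab` is adequate for both criteria. This illustrates why TC-adequate suites tend to be longer than SC-adequate ones.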
(3) Initialization Fault (IF) Coverage Criterion. This criterion avoids confusing other states with the initial state; that is, it checks whether the behavior of other states is the same as that of the initial state. The requirement set is denoted as $TR_{IF}(M) = \{s \in S \mid s \not\simeq s_0\}$. There might be no states that can be distinguished from the initial state [11]. However, for a minimal $M$, there are states that differ from the initial state. $TS_{IF}(M, T)$ indicates that there are sequences in $T$ that distinguish the initial state from other states.
For the TEFSM in Figure 1, the test suite {, , } is IF-adequate.We know that the input sequence  is a transfer sequence to state and is followed by , which distinguishes state  1 from the initial state.Similarly, the sequences  and  also satisfy the requirements related to state  3 and  4 .
(4) Transition Fault (TF) Coverage Criterion.Detect whether the output or the reachable state of one transition is correct to certain input and whether the ending state could be distinguished from other states [14].We can express TR IF () formally in mathematics as follows: TS IF (, ) defines whether there are some sequences in  that could distinguish the ending states of all transitions of  from other states.We can express it formally as follows: For the example TEFSM in Figure 1, the test suite {, , } is TF-adequate.Consider, for instance, the transition ( 3 , ) whose tail state is  1 .Test  covers this transition.States  1 and  2 are distinguished by , which follows  and ; states  1 and  3 are distinguished by , which follows  and ; and states  1 and  0 are distinguished by , which follows  and empty sequence.Thus, all requirements related to transition ( 3 , ) are satisfied.One can check the other requirements that are also satisfied.

Test Sequences Generation Methods.
There are many methods reviewed in Section 5 of the literature [15].This part describes some test sequences generation methods.
(3) HIS-Method. It utilizes a separating family, built from the identifiers of each state, to harmonize these identifiers [8]. For the example of Figure 1, the separating family consists of one set of identifying sequences for each of the four states, and the test suite is then derived in two phases.
Besides, there are other methods such as UIO, presented in the literature [8], as well as DS and another method proposed in the literature [16]. Some researchers have also studied derivation methods for nondeterministic FSMs.

Conversion between TEFSM and FSM.
According to the definition of TEFSM presented in Section 2.2, a TEFSM diagram can be described using the conventions in Table 1. A number of symbols are used to increase readability: "?" for input; "!" for output; "%" for probability; "∼" for condition and probability; and "Δ" for time-delay [12]. Figure 2 is an example of a TEFSM model with three states named $s_0$ (the initial state), $s_1$, and $s_2$.
There are transitions between each pair of states, with different transition conditions. $\Delta_1$ and $\Delta_2$ indicate the minimum delay times of the two transitions from state $s_0$ to state $s_1$. If each of these two input conditions is combined with its delay condition ($t > \Delta_1$ and $t > \Delta_2$, respectively) into a single input symbol, the TEFSM in Figure 2 can be converted into the FSM in Figure 3, which is now a normal FSM model.
In this way, the theory of normal FSMs can reasonably be applied to the TEFSM model.
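The conversion idea, folding the delay condition into the input symbol, can be sketched as follows. This is an illustrative assumption rather than the authors' implementation: the example TEFSM, the symbol format `a[t>2]`, and the helper names are hypothetical, and the sketch assumes at most one timed transition per (state, input) pair.

```python
from fractions import Fraction

# Hypothetical TEFSM (not the machine of Figure 2).
# Encoding: (state, input) -> (minimum delay, next_state, output).
TEFSM = {
    ("s0", "a"): (Fraction(2), "s1", "x"),
    ("s0", "b"): (Fraction(5), "s1", "y"),
    ("s1", "a"): (Fraction(0), "s2", "x"),
    ("s2", "b"): (Fraction(1), "s0", "y"),
}


def timed_symbol(inp, delay):
    """Merge an input and its delay condition into one plain FSM input symbol."""
    return f"{inp}[t>{delay}]"


def to_fsm(tefsm):
    """Convert the TEFSM into an ordinary FSM over merged input symbols."""
    return {
        (state, timed_symbol(inp, delay)): (nxt, out)
        for (state, inp), (delay, nxt, out) in tefsm.items()
    }


def abstract_input(tefsm, state, inp, elapsed):
    """Map a concrete timed input (inp, elapsed) to its merged FSM symbol.

    Returns None when no transition is defined or the delay condition
    elapsed > minimum delay is not met.
    """
    entry = tefsm.get((state, inp))
    if entry is None or not elapsed > entry[0]:
        return None
    return timed_symbol(inp, entry[0])


FSM = to_fsm(TEFSM)
```

After the conversion, test derivation and coverage analysis only see untimed symbols such as `a[t>2]`, so tooling written for ordinary FSMs can be reused unchanged.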

Testing Cost.
Testing cost is a main aspect for evaluating the performance of a testing technique. Briand states that testing cost involves at least two dimensions [10]: human effort and machine CPU time. In addition, Namin and Andrews describe the relationship among three properties of test suites: size, structural coverage, and fault location effectiveness [17].
We consider a technique more effective if it can generate a minimum test suite and remain highly effective; hence, we use the length of test suites as a measure of the cost. The length of a test suite means the total length of the test cases within the test suite. The study in [4] does not consider the accessibility degree and changes only one factor at a time in its experiments. Here, we set five parameters of the FSM model as the effect factors: the number of states, the number of inputs, the number of outputs, the completeness degree, and the accessibility degree.

Evaluation Method.
This paper mainly focuses on three test derivation methods (W, Wp, and HIS). The W-method, as the most classic method, should be taken into consideration. The Wp method improves on the W-method by reducing the number of test sequences and thus should also be considered. The HIS-method is the method used in the literature [4,14], so it is necessary to consider it here for comparison. Moreover, we use the same coverage criteria as those considered in the literature [13]: state coverage (SC), transition coverage (TC), initialization fault (IF), and transition fault (TF) coverage criteria. When an FSM is the specification for testing, tests covering the FSM specification target one or several elements such as inputs, outputs, states, and fragments of its transition graph. Paths are typical fragments of the transition graph considered for coverage; therefore, path coverage has to be taken into account, and we choose the traditional TC criterion. The latter two criteria (IF and TF) are directed at fault models. The IF criterion assumes that the only possible faults in an FSM implementation are related to a wrong initial state of the specification FSM, while the TF criterion assumes that implementation faults occur in transitions. These four criteria can be regarded as representative criteria for the FSM model.
There are three possible evaluation methods.
(1) Similar to the method proposed in the literature [1], first generate a test set that is adequate for all coverage criteria as a test pool; then, minimize it to get the minimum test set that is adequate for one specific coverage criterion.
(2) Directly generate, with the generation method, the test set that satisfies each kind of coverage criterion.
(3) According to the relation among the coverage criteria shown in Table 3, use an incremental method. Because the generation methods already yield a test suite adequate for some coverage criteria, this incremental method is preferable when the coverage criterion is very strict and a single generation method cannot generate a test set adequate for it.
We use the first evaluation method in this paper, which is illustrated in detail in Section 4.
After generating the final test set, we analyze the experimental data using the orthogonal testing method. The orthogonal test is useful when many factors influence one outcome. In that situation the inputs take many values, and choosing a suitable orthogonal table, which has the advantages of balance and dispersion, is effective. In this study, we choose the orthogonal table with four levels and five factors, written as $L_{16}(4^5)$, which is shown in detail in Section 4.2. The whole evaluation process is also a general process for assessment; we can choose a way to solve the problem according to the situation. Figure 4 describes the process in detail.
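An orthogonal table with 16 runs, five factors, and four levels can be built with arithmetic over the four-element finite field. The sketch below is one standard construction and is offered only as an illustration; the row order and column assignment may differ from the design scheme actually used in the experiments, and the function names are assumptions.

```python
from itertools import combinations, product

# Multiplication table of GF(4) with elements 0..3 (addition in GF(4) is XOR).
GF4_MUL = [
    [0, 0, 0, 0],
    [0, 1, 2, 3],
    [0, 2, 3, 1],
    [0, 3, 1, 2],
]


def l16_4_5():
    """Build a 16-run, 5-factor, 4-level orthogonal array.

    Columns are a, b, a+b, a+2b, a+3b computed in GF(4); levels are 0..3.
    """
    rows = []
    for a, b in product(range(4), repeat=2):
        rows.append((a, b, a ^ b, a ^ GF4_MUL[2][b], a ^ GF4_MUL[3][b]))
    return rows


def is_orthogonal(rows, levels=4):
    """Every pair of columns must contain each level combination once."""
    n_cols = len(rows[0])
    for i, j in combinations(range(n_cols), 2):
        pairs = {(r[i], r[j]) for r in rows}
        if len(pairs) != levels * levels:
            return False
    return True


DESIGN = l16_4_5()
```

Each level index 0 to 3 would then be mapped to a concrete parameter value for its factor, for example a specific number of states, so that 16 experimental groups stand in for the 1024 possible level combinations.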
How can we select a minimum test suite $T_{min}$ from the initial test suite $T$ to satisfy the test coverage criteria? The literature [13] applies a greedy algorithm to this problem. Test suite selection for the SC/TC criteria is a weighted set-cover problem, while the selection for the IF/TF criteria is defined as set-cover with pairs. However, given the definition of the IF criterion, we can also treat the selection of the minimum test suite for the IF criterion as a weighted set-cover problem.
We also define the cost $c(\alpha)$ of a sequence $\alpha$ in test suite $T$ as $|\alpha| + 1$, that is, the length of the sequence plus the reset symbol used to bring the FSM back to the initial state.
The literature [13] repeatedly picks the sequence $\alpha$ with the minimum ratio between the cost and coverage increments induced by $\alpha$ in $T_{min}$, that is, $(c(T_{min} \cup \{\alpha\}) - c(T_{min})) / |TS_{SC}(M, T_{min} \cup \{\alpha\}) \setminus TS_{SC}(M, T_{min})|$. However, this may generate redundant test sequences. For example, in the case of the SC criterion, the FSM model $M$ shown in Figure 3 has three states $s_0$, $s_1$, and $s_2$, and the initial test set contains two sequences. The first sequence covers states $s_0$ and $s_1$ with min ratio = 3/2 = 1.5, so it is added to $T_{min}$. The second sequence is then checked: it makes the system reach state $s_2$, which was not covered before, but its ratio = 3/1 = 3 > min ratio, so it is ignored in this round. In the second loop it is added as well; as a result, both sequences are selected. In fact, the second sequence alone is enough to satisfy the SC criterion.
To deal with this problem, we present a novel algorithm, MinTSByWeight (minimum test suite by weight), which selects test sequences by weight. The algorithm is based on the principle that the smaller the test suite is, the less it costs. We do not need to consider time-delay in this algorithm, since we have already presented a method to convert a TEFSM into an FSM by taking time-delay as an element of the input.
As shown in Algorithm 1, there are three inputs: the FSM model $M$, the initial test set $T$, and the test coverage criterion $K$; and one output $T_{min}$, which is the minimum test suite of the test object $M$ that satisfies the coverage criterion $K$. At step (3), the weight of each test sequence or each pair of test sequences is calculated as the cost of the sequence divided by the number of test requirements it covers. In steps (4) to (7), the minimum weight is selected from the weights of all sequences or pairs, and the corresponding sequence $\alpha$ or pair $\alpha_1, \alpha_2$ is then added into the result set $T_{min}$ in steps (8)∼(10). From the second round on, the weights of the remaining sequences or pairs in the initial test set are readjusted by removing the points already covered by $\alpha$ or $\alpha_1/\alpha_2$, and the minimum weight is selected as before. This process is repeated until the selected sequences are adequate for the criterion or all the sequences in the initial set $T$ have been checked.

Figure 4: Evaluation process (full test suite, coverage criteria, max test suite, min test suite, statistics by the orthogonal testing method, result analysis).
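The weight-based selection described above can be sketched for the SC criterion as follows. This is a simplified reading of the prose description, not a reproduction of Algorithm 1: the machine, the function names, the restriction to single sequences (no pairs), and the tie-breaking rule that prefers the sequence covering more uncovered states are all assumptions made for this illustration.

```python
# Hypothetical complete deterministic FSM (not the machine of Figure 3).
# Encoding: (state, input) -> (next_state, output).
STATES = ["s0", "s1", "s2"]
TRANSITIONS = {
    ("s0", "a"): ("s1", 0),
    ("s0", "b"): ("s0", 1),
    ("s1", "a"): ("s2", 0),
    ("s1", "b"): ("s0", 0),
    ("s2", "a"): ("s2", 1),
    ("s2", "b"): ("s1", 1),
}


def cost(seq):
    """Length of the sequence plus one reset symbol."""
    return len(seq) + 1


def states_covered(transitions, initial, seq):
    """States visited when the sequence is applied from the initial state."""
    state, seen = initial, {initial}
    for a in seq:
        state = transitions[(state, a)][0]
        seen.add(state)
    return seen


def min_suite_by_weight(transitions, states, initial, pool):
    """Pick sequences by minimum weight = cost / newly covered states."""
    remaining = set(states)
    candidates = list(pool)
    selected = []
    while remaining:
        scored = []
        for seq in candidates:
            gain = states_covered(transitions, initial, seq) & remaining
            if gain:
                # Assumed tie-break: on equal weight, prefer the larger gain.
                scored.append((cost(seq) / len(gain), -len(gain), seq))
        if not scored:
            break  # nothing left that adds coverage
        _, _, best = min(scored)
        selected.append(best)
        remaining -= states_covered(transitions, initial, best)
        candidates.remove(best)
    return selected


SELECTED = min_suite_by_weight(TRANSITIONS, STATES, "s0", ["ab", "aa"])
TOTAL_COST = sum(cost(seq) for seq in SELECTED)
```

With the pool `ab`, `aa`, the sequence `ab` has weight 3/2 and `aa` has weight 3/3, so only `aa` is selected. Recomputing the weights against the still-uncovered states in every round is what avoids keeping a sequence whose states are already covered by a later, more useful one.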

Coverage Criteria for Different Method
Premise. The FSM is a deterministic and complete finite state machine. When a generated FSM is nonreduced, the program does not proceed. In other words, although the automatic FSM generation process may produce nonreduced FSMs, such models are not considered in the experiment. Therefore, we generate deterministic and complete models directly.
Coverage Criteria and Generation Method. We measure the average length of the test suites that satisfy the four coverage criteria (SC/TC/IF/TF) and are generated by the W-method, the Wp method, and the HIS-method.
Experiment Application Scenario. In practical applications it is impossible to model the whole system; only some important parts are modeled. The experiment therefore contains eight experimental groups with the number of states set to 3, 6, 8, 10, 12, 14, 16, and 20. In addition, we set the number of inputs to 2, 5, and 7 and the number of outputs likewise to 2, 5, and 7. Each group automatically generates 50 FSMs.
Experiment Result. The results of the three types under the three generation methods are shown in Tables 2, 3, and 4.

Result Analysis from Test Suite Length. Figures 5(a) and 5(c) show that the test suite length is only slightly affected by the generation method: almost all groups obtain test suites of nearly the same average length. Figure 5(b) shows that the gap between the Wp-method and the HIS-method under TC-adequacy is small, and that the W-method can generate a test set with shorter length. Figure 5(d) shows that the impact trend of the three methods on the TF-criterion is similar, but the average length of the TF-adequate test suites generated by the W-method and the HIS-method is relatively longer. Figure 5(e) shows the difference in the average length of the initial test suite among the three generation methods. The W-method generates the longest test suite, and the length gap between the suites derived by the HIS-method and the Wp-method is insignificant, but the Wp-method can generate a smaller test suite because it does not need to harmonize the state identifiers. Moreover, the Wp method generates shorter TC-adequate and TF-adequate test suites as well as a shorter initial test suite.
Result Analysis from Effort. When the minimum set for each coverage criterion is chosen from a large pool, the reduction rate differs. If the reduction rate is low, we can use the initial test suite directly instead of choosing the minimized set, which improves test efficiency. Because the SC/IF criteria can be covered by fewer test sequences while the TC/TF criteria need more test sequences to be covered, we only consider the latter two criteria under the different derivation methods. The impact of each derivation method on the TC-criterion is shown in Figure 6(a), and the impact on the TF-criterion is shown in Figure 6(b).
From Figure 6, we know that choosing the minimized test suite requires the largest effort. For the TC-criterion, Figures 5(b) and 5(e) show the reduction intensity needed to remove the useless sequences, and Figure 6(a) reflects the high reduction rate, while there is little difference in the reduction rate between the other two generation methods (Wp-method and HIS-method) when choosing the minimum set to satisfy the TF-criterion. For the TC-criterion, however, the gap is small when the number of inputs is small and becomes clear when it is large. For example, groups 7, 8, and 9 have (number of states/number of inputs/number of outputs) of 8/2/2, 8/4/7, and 8/5/5, respectively. In group No. 7, the selection rate is 0.62637 for the Wp-method and 0.7059 for the HIS-method, while in group No. 8 it is 0.31609 for the Wp-method and 0.6409 for the HIS-method. So, if the input number is large, the Wp-method can be used to improve the quality of the test sequences.

The Impact of FSM Parameters on Coverage Criteria.
Consider five factors: the number of states, the number of inputs, the number of outputs, the completeness degree, and the accessibility degree. From the results above, for CFSMs, the minimization cost of the Wp-method is the lowest. Hence, we choose the Wp-method as the generation method here.
Using the orthogonal experimental method, we choose some representative samples from all level combinations for the experiments and then find the optimal level combination by analyzing the results of these samples, so that the full set of experiments can be understood from them. In this experiment, there are 4 levels and 5 factors. Each group generates 500 FSMs automatically. We tie the levels of the number of inputs and of the accessibility degree to the number of states so that the FSMs are closer to reality. Because each level is in the same situation across these factors, the control does not lead to inequality among the factors, even though the level values of each factor may differ.
Table 5 shows the levels for each factor and Table 6 shows the design scheme (number in brackets indicates level number).
Experimental Results Analysis. Applying the range-analysis method, we obtain the range values $R_1 = 1181.750$, $R_2 = 713.500$, $R_3 = 676$, $R_4 = 374.000$, and $R_5 = 402.5$; from these range values, we can easily see that $R_1 > R_2 > R_3 > R_5 > R_4$. Thus, the number of states has the greatest impact on the test sequence length, followed by the number of inputs and the number of outputs (outputs affect the verification of transitions). With the orthogonal design tool, we obtain the effect curve shown in Figure 7(a). The effect curve for the SC-criterion is shown in Figure 7(b).
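Range analysis itself is a small computation: for each factor, average the observed result over the runs at each level and take the difference between the largest and smallest average. The sketch below assumes the range is computed from per-level means (sums are also used in practice) and uses a tiny hypothetical two-level design with made-up results, not the data of this experiment.

```python
from collections import defaultdict


def range_analysis(design, results):
    """Return one range value per factor: max level mean minus min level mean."""
    n_factors = len(design[0])
    ranges = []
    for f in range(n_factors):
        by_level = defaultdict(list)
        for row, y in zip(design, results):
            by_level[row[f]].append(y)
        means = [sum(values) / len(values) for values in by_level.values()]
        ranges.append(max(means) - min(means))
    return ranges


# Hypothetical 4-run, 3-factor, 2-level orthogonal design and made-up results.
SMALL_DESIGN = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
SMALL_RESULTS = [10, 14, 20, 28]

RANGES = range_analysis(SMALL_DESIGN, SMALL_RESULTS)
# Factors ordered from most to least influential.
RANKING = sorted(range(len(RANGES)), key=lambda f: RANGES[f], reverse=True)
```

A larger range means that changing the level of that factor moves the result more, which is how the ordering of the factors by influence is read off the range values.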
Figure 7 shows that the number of states has a great impact on the average length, while the other three parameters have little effect on the average length of the test suites that satisfy the SC-criterion. The number of inputs has a positive influence on the average length, as Figure 7(a) shows. An increase in the number of outputs results in the growth of the length of the test sequences, because the more the inputs, the shorter the characterization sequences and the fewer such sequences there are. As each derivation method is closely related to the sequences reaching each state, the impact of the accessibility degree takes a wave form rather than being positively correlated. Figure 7(a) shows that when the accessibility degree is 0.8 times the maximum length (the number of states − 1), the test suite length nearly reaches its maximum. When it equals 60 percent of the maximum length, the test suite length nearly reaches its minimum. After all, the accessibility degree represents the length of each sequence reaching one state.
(a) If the coverage criterion is the TF-criterion, we can use the Wp-method directly and do not need to select the minimal criterion-adequate test suite. Moreover, if we only need to satisfy the TC-criterion, we can make a trade-off and consider the HIS-method, because the Wp-method is influenced by the number of inputs of the FSM in the process of selecting the minimum test set for the TC-criterion, and, when the number of test sequences derived by the HIS-method is larger than that derived by the Wp method, the rate of sequences needing to be removed is relatively stable.
(b) Among the FSM model parameters, the number of states has the most significant impact on the length of the test suites. As the number of states increases, the length increases much faster. In addition, an increase in the number of inputs causes an increase in length, but the rate of change is lower than for an increase in the number of states. As the number of outputs increases, the length decreases, because with more outputs the sequences that distinguish the states are short and the number of distinguishing sequences is limited. When the accessibility degree increases, the length changes in a wave form. So, when designing complete FSM models with the same number of states, inputs, and outputs, if we make the accessibility degree 0.6 times the maximum length (the number of states − 1), the length of the generated test set will fluctuate around the minimum.

Threats to Validity.
The FSMs considered in the experiments are generated randomly, and we only consider complete and deterministic FSMs together with the time-delay factor, so we cannot be sure that they are close to the FSM specifications of actual projects. Therefore, the conclusions drawn from random FSMs may not apply well in actual situations. Future work needs further validation in real situations and should extend our study to nondeterministic and partial FSMs.
Measuring the cost of testing in such an experimental context is also a challenge. Briand states that testing cost involves at least two dimensions, human effort and machine CPU time, details of which are in Figure 3 of the literature [10]. It is rather clear that test suite size, regardless of how it is measured, is a very rough cost measure. Moreover, the literature [11] uses the cumulative length of a test set to measure the cost because it is easy to count. Although many follow this simple way of measuring the cost, it would be better to consider other factors as well.
We can assess the ability of the test criteria not only by the average length of the test suites but also through faulty versions generated by applying mutation operators to a specification FSM together with time-delay. In this case, we compare mutation scores, which measure the number of mutants killed, and use them to assess the fault detection capabilities of the various test suite generation methods. However, this is only used for finding faults. Hence the former (the average length of the test suites) mainly evaluates the cost, and the latter (mutation scores) mainly evaluates the effectiveness. How to find the balance point of cost-effectiveness is not investigated here.

Conclusion
The main contribution of this paper is a comparative strategy for assessing the coverage criteria and test cost in time-delay system testing. A TEFSM model is adopted for generating the test suites of time-delay systems. In the future, coverage criteria or test suite generation methods that are not applied in this paper can be added directly to the comparative architecture in order to assess them. Besides, we take many factors of the FSM model into account (state number, input number, output number, completeness degree, accessibility degree, and time-delay, which are not considered in the literature [13]). Through an empirical study, we explore how the test suite generation methods, coverage criteria, and FSM model parameters influence the average length of test suites. Based on these relationships, test engineers can devise a good test strategy for conformance testing based on the TEFSM model. For example, if the coverage criterion is the TF-criterion, we can use the Wp-method directly because of its relatively low cost. When designing complete FSM models with the same number of states, inputs, and outputs, if we make the accessibility degree 0.6 times the maximum length (the number of states − 1), the length of the generated test set will fluctuate around the minimum. In industrial practice, one may design FSM models for critical components and automatically obtain test sequences by suitable methods. If, based on the requirements, we can judge in which situation the test cost is lower than in others, we know which method to choose or how to change the models for the better. However, because the FSMs for the experiments are generated randomly and we only consider complete and deterministic FSMs, the results may vary in actual situations. Next, we will extend our work to nondeterministic and partial FSMs, which requires further validation in more practical situations. The literature [13] explores the correlation between structural testing and model testing. We will benefit from these efforts if we solve problems such as how to extract what the testing phases have in common, how to understand the kinds of errors detected in each stage, and how to use the various test adequacy criteria and test suite generation methods reasonably in order to test efficiently.

Figure 5: Impact of the generation methods on the four coverage criteria (panels 5(a)∼5(e)).

Table 1: Conventions for constructing a TEFSM diagram.

Table 2: The experimental data of different coverage criteria for the W-method.

Table 3: The experimental data of different coverage criteria for the Wp-method.

Table 4: The experimental data of different coverage criteria for the HIS-method.

Figure 6(b): Effort of minimizing the test suite satisfying the TF criterion.

Table 5: Levels for each factor.

Table 6: (a) Experimental design and data recording. (b) Parameter in different level.