Multiobjective Genetic Algorithm for Class Testing using OCL Class Contract Specifications: A Framework

Modeling software, the class operation standard and Object Constraint Language (OCL) semiformal language). The presented research achieved two main goals: (1) automation of the testing process and conformance of the current test sequence generation technique to standards, bridging the gap between research and industry; (2) improvement of the state-of-the-art approach through the application of multiobjective genetic algorithms (MOGAs). A case study is also presented, together with the results achieved through the proposed technique, which reflect the significance of the proposed research.


Introduction
Specification-based testing refers to the area of software testing where software is tested against its specification. It is functional black-box testing where software is tested on its interfaces to validate the documented requirement specifications. It deals with generating test suites from the software specifications, executing the test case scenarios against the actual software, and then checking the results against test oracles. Specification-based testing has the advantage that the testing environment of the software can be built before the software exists [1][2][3][4].
In model-based testing, software is represented in terms of models. The Unified Modeling Language (UML) is one of the best-known modeling techniques; it models the software in static (e.g., class diagrams) and dynamic (e.g., sequence diagrams) structures. Model-based development lets engineers focus on domain-specific issues rather than on the technical issues of software development. The support of existing tools makes model-based development and model-based testing more attractive. The available tools help engineers to model the software, transform software models from one representation to another, and generate abstract test cases from the software model. These abstract test cases can then be transformed into actual executable tests [2,5].
Traditionally, genetic algorithms (GAs) have been used as a search heuristic for finding the optimal set of solutions for single-objective problems. Recent advances in the field suggest the use of GAs for multiobjective optimization [5]. In principle, multiobjective GAs (MOGAs) are the same as GA-based tools, but the potential solutions are evaluated for multiple parameters, and their fitness values are computed by multiple fitness functions. The MOGA evolution process involves the comparison of multiple fitness values for candidate chromosomes.
Multiobjective optimization is particularly useful in problems where one objective cannot be optimized without compromising the quality of the competing objective(s). The generated solutions are referred to as Pareto optimal solutions. Test sequence optimization involves a trade-off between testing cost and achieved test coverage, so the process is a strong candidate for multiobjective optimization [6]. In this paper, the proposed framework focuses on the following objectives: (1) to improve the unit testing of class models using OCL class contract specifications in compliance with industry standards, (2) to automate the test sequence generation process, (3) to reduce the states from infeasible test sequences, and (4) to improve the test coverage [7,8].
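The notion of Pareto optimality mentioned above can be illustrated with a minimal sketch. It is not taken from the authors' tool: the function names `dominates` and `pareto_front` are hypothetical, and it assumes that every candidate is scored by a pair of fitness values that are both to be maximized.

```python
from typing import List, Tuple

# Assumption: each candidate is scored as (first objective, second objective),
# and larger is better for both.
Fitness = Tuple[float, float]


def dominates(a: Fitness, b: Fitness) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points: List[Fitness]) -> List[Fitness]:
    """Keep only the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]
```

A point stays on the front when improving one of its objectives would require giving up some of the other, which is the kind of trade-off the paragraph above describes for testing cost and coverage.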
The presented research achieved two main goals: (1) automation of the testing process and conformance of the current test sequence generation technique to standards, bridging the gap between research and industry; (2) improvement of the state-of-the-art approach through the application of multiobjective genetic algorithms (MOGAs). A case study is also presented, together with the results achieved through the proposed technique, which reflect the significance of the proposed research. The rest of this paper is organized as follows: Section 2 presents a review of existing techniques; Section 3 describes the proposed framework; Section 4 presents the experiment, results, and discussion; and Section 5 concludes the research.

Literature Review
In this section, a review of existing methodologies is presented. The review is split into two specializations, i.e., test sequence generation and test sequence optimization.

Test Sequence Generation.
Generation of test sequences/test cases is among the challenging tasks for a test engineer. It involves trade-offs between the number of test cases and the desired test coverage, available resources, quality of test cases, achieved coverage, etc. Zhao et al. [9] aimed to develop the infrastructure of automatic test data generation for extended finite state machine (EFSM) models that produces real data to trigger feasible transition paths. They provided empirical results on the efficiency of test data generation for a set of state-based models. Derderian et al. [10] proposed automated unique input output (UIO) sequence generation for finite state models. The sequence generation problem was regarded as a search problem and was tackled through genetic algorithms (GAs). The authors used 11 real and 23 randomly generated FSMs in proof-of-concept experiments. The GA-based experimental results showed 62% better performance compared to random search.
UML diagrams model both the static and dynamic aspects of a system. Asthana et al. [11] proposed a novel technique to generate test cases from class and sequence diagrams. The research emphasized that the input diagrams were not transformed into an intermediate model; this claim seems compromised, however, because the model was represented in XML, which is itself an intermediate form.
Mingsong et al. [12] generated test cases from UML activity diagrams. They presented a comparison between the dynamic behavior of the activity diagram and actual program execution. Shah et al. [13] worked on automatic testing. They proposed a framework for extracting test cases from UML sequence and class hierarchy diagrams. They exported the UML diagrams to XML format; then, from those XML files, use cases were generated. Kumar et al. [14] proposed a methodology that generated test cases by using class, sequence, and activity diagrams. They generated an XML structure of the diagrams defined through the UML. Three types of test cases were generated, i.e., static, dynamic, and integrity test cases. Static test cases were generated from class diagrams, while activity and sequence diagrams were used for integrity and dynamic test cases. Hilken et al. [15] argued for the importance of modeling languages, e.g., OCL and UML, and their role in designing a system. They pointed out that the modeling languages provide a lot of description through a large number of constructs.
Gupta [1] proposed an approach to test the interaction of class methods through class contracts. It is a specification-based testing approach that uses class contracts specified in the form of OCL constraints (class invariants, preconditions, and postconditions). The author used a state-based approach: an abstract state configuration of the class and an initial abstract state were used to incrementally generate the reachable states, by searching for the methods that can be invoked in the current state. However, the approach did not conform to the automation and syntax of the OCL standards, and standard parsers were unable to parse its syntax. The approach used AFS traversal to generate test sequence paths and hence faced the inherent problems of finite state machines. The author used a traditional search approach for path traversal of finite state machines, with all-transition coverage as the criterion for sequence path generation.
Miller and Strooper [2] presented a case study on a specification-based implementation testing framework. The authors argued that the proposed framework produced almost the same performance as the resulting BZ-Testing tools. They emphasized that their framework was more cost effective than manual testing. Ali [18] emphasized that search-based software testing incurs high computational overhead when applied to large applications. To solve the computational overhead issue, specifically in cloud environments, the authors presented a parallel GA. The solution was implemented through Hadoop MapReduce. Harman et al. [4] proposed three search-based algorithms for test data generation and presented the results of a case study in support of their approach.
The authors emphasized that their approach maximized the coverage and minimized the required number of test cases. The software considered in the case studies was at most 144 lines of code, which may suffice for a proof of concept. Arcuri et al. [19] compared three test automation strategies, namely, random testing, adaptive random testing, and search-based testing using genetic algorithms, and presented the results of these strategies. Srivastava [20] presented a GA-based approach for test data generation which took the user input variables and generated the test data. The author claimed that GA outperforms random testing on time measures. Srivastava et al. [21] presented another search-based test sequence generation technique using the ant colony optimization algorithm. The "ants" were used to explore the CFG and find the optimized test sequences. Yano et al. [22] presented an evolutionary approach for test sequence generation from a behavioral model, in particular, an EFSM. A multiobjective evolutionary algorithm, M-GEO vsl, adapted from M-GEO, was used; it considered two objectives: to search for a test sequence that covers a target transition and to minimize the length of this test sequence [23]. The literature reveals that UML diagrams are not sufficient to specify complete class behavior; the most accurate details of a class are revealed from the OCL class specifications in the form of OCL class contracts [1].

Materials and Methods
The proposed approach uses the standard OCL syntax of the Object Management Group (OMG) and automates test sequence generation. To improve testing effectiveness, this paper applies a multiobjective approach using MOGA, in which both the test coverage and the validity of the test sequences are optimized. Our goal is to produce the test sequences that are most effective in identifying and revealing software implementation problems. The proposed approach is divided into two main phases, as shown in Figure 1: (1) standard OCL parsing is performed on the input OCL class contracts, and an abstract finite state machine (AFSM) is generated; (2) state-based test sequences, generated from the source AFSM, are optimized using a multiobjective GA.

Parsing Class Contracts and Generating Abstract Finite State Machine.
The proposed technique takes class contracts as input and generates a parse tree of them using the standard OCL parser [24]. The parser is used with Eclipse [25], an IDE for Java, to parse the OCL constraints of UML models. The input OCL class contracts are in textual form. The generated parse tree then undergoes semantic analysis and construction of domain-specific objects: a parse tree processor, implemented in Java, traverses the parse tree and extracts the objects corresponding to the domain concepts of OCL semantics. The objects shown are extracted through the implemented processor. After the parse tree has been processed, the abstract finite state machine (AFSM) is constructed by applying the rules from [1]. The abstract state model of the software is created from the specification by starting from the class constructors. For each constructor, a new initial state is created in the AFSM, and all states reachable from that initial state are then created dynamically. The proposed framework deviates from the existing research [1], which suggested the transition tree coverage criterion, i.e., test sequences are identified along simple paths. Simple path coverage misses self-referencing transitions, yet a method might fail on subsequent invocations: because of implementation faults, repeated calls may bring the object into a state in which it behaves anomalously even though the specification prescribes different behavior. Moreover, when a state has self-transitions, simple path coverage might skip a valid step in the sequence of method calls.
Therefore, it is better to obtain raw test sequences from an exhaustive search of the AFSM. The test sequences generated in this step are used as the initial population for the MOGA optimization [26].
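The following sketch illustrates how raw test sequences that include self-transitions might be enumerated from a state machine. It is an illustration under stated assumptions, not the authors' Java implementation: the `Transition` type, the function `enumerate_sequences`, and the `max_length` bound are hypothetical, and the bound is introduced here only so that loops cannot produce infinitely long paths.

```python
from typing import Dict, List, NamedTuple


class Transition(NamedTuple):
    """One AFSM edge: a method call that moves the object between abstract states."""
    from_state: str
    method: str
    to_state: str


def enumerate_sequences(transitions: List[Transition], initial_state: str,
                        max_length: int) -> List[List[Transition]]:
    """Depth-first enumeration of every transition path of length 1..max_length."""
    outgoing: Dict[str, List[Transition]] = {}
    for transition in transitions:
        outgoing.setdefault(transition.from_state, []).append(transition)

    sequences: List[List[Transition]] = []

    def extend(state: str, path: List[Transition]) -> None:
        if path:
            sequences.append(list(path))  # record a copy of the current path
        if len(path) == max_length:
            return  # assumed length bound; keeps loops from recursing forever
        for transition in outgoing.get(state, []):
            path.append(transition)
            extend(transition.to_state, path)
            path.pop()

    extend(initial_state, [])
    return sequences
```

Unlike a simple-path traversal, this enumeration revisits states, so a sequence may invoke the same self-transition several times in a row.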

Coding Test Sequences in Chromosomes and Optimization through MOGA.
After the abstract finite state machine has been built, the next phase is the generation and optimization of the test sequences. This phase involves coding the test sequences in tool-specific chromosomes, executing MOGA, and selecting the best-fit test sequences after evolution. We used the JGAP tool to evolve a population of randomly generated chromosomes. It generates chromosomes of length n, where each gene is represented by a test transition object. At first, a random state is picked out of all the states of the finite state machine generated in the previous step. Next, a gene is taken from the outgoing edges of the selected state, and this transition is again chosen randomly.

Coding Solutions in Genes and
This process continues until all n genes are coded. A potential chromosome in our solution set can be visualized as in Figure 3.
Each transition $T_i$ in the coding scheme contains a reference to the initiating state (the "from" state) from which the transition originates and a reference to the terminating state to which it leads, where n is the length of the chromosome, $T_i$ is the ith transition in the test sequence, and $i = 1, 2, 3, \ldots, n$.
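The coding scheme can be sketched as follows. This is not JGAP code and does not use the JGAP API: `Transition` and `random_chromosome` are hypothetical names, and the sketch assumes one reading of the description above, namely that every gene is produced by picking a random state and then a random outgoing transition of that state.

```python
import random
from typing import Dict, List, NamedTuple


class Transition(NamedTuple):
    """A gene: a reference to the initiating state, the method, and the terminating state."""
    from_state: str
    method: str
    to_state: str


def random_chromosome(transitions: List[Transition], n: int,
                      rng: random.Random) -> List[Transition]:
    """Build a chromosome of n genes, each a randomly chosen AFSM transition."""
    outgoing: Dict[str, List[Transition]] = {}
    for transition in transitions:
        outgoing.setdefault(transition.from_state, []).append(transition)
    # Only states with at least one outgoing edge can supply a gene.
    states = sorted(outgoing)
    chromosome = []
    for _ in range(n):
        state = rng.choice(states)                       # pick a random state
        chromosome.append(rng.choice(outgoing[state]))   # then a random outgoing edge
    return chromosome
```

Because consecutive genes are drawn independently under this reading, a random chromosome is not guaranteed to be a connected path through the AFSM, which is why an order-related objective is needed in addition to coverage.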

The Multiple Objectives.
To obtain quality test sequences, the proposed approach pursues two objectives. They are not direct opposites, but optimizing one may decrease fitness with respect to the other. The objectives of the proposed technique are as follows:
(1) Coverage Optimization. While testing, the proposed approach is interested in revealing all possible errors by applying all possible input combinations to the method interface of the class under test (CUT). Because there are infinitely many combinations of class state variables and method input parameter values, it is practically impossible to test all possibilities. The proposed approach can only improve the class test coverage as far as possible so that the quality of the testing process is ensured. Therefore, the first of the two objectives is the optimization of the generated test sequences in terms of coverage. Our fitness function evaluates the number of transitions of the finite state machine covered by the test sequence:
$$\text{Coverage fitness (CF)} = \sum_{i=1}^{n} (\text{coverage weight for call sequence}). \tag{1}$$
To calculate the coverage fitness, coverage weights are added according to three scenarios: (i) If a transition is covered once, the chromosome is given an additional positive weight, which rewards it for covering the transition. (ii) If a transition is not covered at all by a chromosome, the chromosome is given an additional negative weight, which penalizes it for not covering the transition. (iii) If a transition is covered more than twice by a chromosome, the chromosome is given an additional negative weight, which penalizes it for the repetition.
(2) Test Sequence Order Optimization. Comprehensive testing of a class involves testing both valid and invalid method interactions [1]. By its nature, MOGA searches the solution space by building random solutions through the genetic operators. In class unit testing, any sequence of method calls may be valid, but the question arises of how to obtain test sequences whose calls follow the order given by the finite state model. Our second objective is to make the test sequences as ordered as possible. The fitness value of a solution with respect to its order is often in conflict with the fitness value for the overall coverage achievable by that solution:
$$\text{Order fitness (OF)} = \text{initial state weight} + \sum_{i=1}^{n} (\text{sequence weight for call sequence}). \tag{2}$$
The weights for test sequence order fitness are calculated as follows: (i) Initial state weight: if the first gene of the chromosome has an initial state of the AFSM as its from state, then this weight is added; otherwise, it is skipped. (ii) Sequence weight for call sequence: the quality of a chromosome is calculated from its sequence of method calls, and each chromosome is rewarded as follows: (a) if a method call (gene) is in a valid sequence, a positive weight is added to the second fitness value; (b) if a method call (gene) is not in a valid sequence, a negative weight is added.
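The two fitness definitions above can be sketched in code as follows. This is a minimal illustration, not the authors' CCFA or TSOCFA algorithms. All names and weight values are hypothetical. It further assumes that the coverage sum runs over the transitions of the AFSM, that a transition covered exactly twice receives no weight (the text names no weight for that case), and that a gene is "in a valid sequence" when its from state equals the terminating state of the preceding gene.

```python
from collections import Counter
from typing import List, NamedTuple, Set


class Transition(NamedTuple):
    from_state: str
    method: str
    to_state: str


def coverage_fitness(chromosome: List[Transition], afsm_transitions: List[Transition],
                     covered_once: float = 1.0, not_covered: float = -1.0,
                     over_repeated: float = -1.0) -> float:
    """Sum one coverage weight per AFSM transition (weight values are assumptions)."""
    hits_per_transition = Counter(chromosome)
    fitness = 0.0
    for transition in afsm_transitions:
        hits = hits_per_transition[transition]
        if hits == 0:
            fitness += not_covered      # scenario (ii): never covered
        elif hits == 1:
            fitness += covered_once     # scenario (i): covered once
        elif hits > 2:
            fitness += over_repeated    # scenario (iii): covered more than twice
        # hits == 2: no weight is added (assumption, unspecified in the text)
    return fitness


def order_fitness(chromosome: List[Transition], initial_states: Set[str],
                  initial_weight: float = 2.0, in_order: float = 1.0,
                  out_of_order: float = -1.0) -> float:
    """Initial state weight plus one sequence weight per consecutive pair of genes."""
    if not chromosome:
        return 0.0
    fitness = initial_weight if chromosome[0].from_state in initial_states else 0.0
    for previous, current in zip(chromosome, chromosome[1:]):
        # Assumed validity check: the call starts where the previous call ended.
        fitness += in_order if current.from_state == previous.to_state else out_of_order
    return fitness
```

With these weights, a chromosome that repeats one transition many times scores well on order but poorly on coverage, which shows how the two objectives can pull in different directions.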

The Genetic Evolutionary Process.
The evolution process in our approach consists of the following steps. The genetic evolution of chromosomes is carried out automatically by the Java Genetic Algorithms Package (JGAP) [27].

Initialization of Test Sequence Population.
The initial population of test sequences can be generated completely at random, where transitions from the generated AFSM are picked at random to create the genes of each chromosome of the initial chromosome pool. We have instead initialized the pool by exhaustive search of the AFSM. This initialization is used to minimize the possibility of the population evolving toward local maxima.

Selection for Reproduction.
The selection process chooses the fittest individuals for mating to produce the next population. Here, each gene is passed from the genetic evolution tool to our fitness function evaluator and is then assigned fitness values based on our fitness functions.

Reproduction of Population.
The population created in step one undergoes the genetic processes of crossover and mutation and evolves over generations. After each generation, chromosomes are assigned fitness values according to the fitness functions.

Crossover.
Based on the selected crossover probability, a single-point crossover is performed on the chromosomes of the population, where parts of the chromosomes are swapped and new offspring are created for the next generation's selection. Here we have used 0.30 as the crossover probability. The crossover process is shown in Figure 4.
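A single-point crossover of the kind described above can be sketched as follows. The sketch is independent of JGAP and does not reproduce its crossover operator. The function name is hypothetical, and it assumes that the 0.30 probability is applied once per pair of parents.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Gene = TypeVar("Gene")


def single_point_crossover(parent_a: Sequence[Gene], parent_b: Sequence[Gene],
                           rng: random.Random,
                           probability: float = 0.30) -> Tuple[List[Gene], List[Gene]]:
    """Swap the tails of two equal-length chromosomes at one random cut point."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    # With probability (1 - probability), or for chromosomes too short to cut,
    # the offspring are plain copies of the parents.
    if len(parent_a) < 2 or rng.random() >= probability:
        return list(parent_a), list(parent_b)
    cut = rng.randint(1, len(parent_a) - 1)  # cut strictly inside the chromosome
    child_a = list(parent_a[:cut]) + list(parent_b[cut:])
    child_b = list(parent_b[:cut]) + list(parent_a[cut:])
    return child_a, child_b
```

Since the cut point ignores state connectivity, an offspring may join two halves that do not connect in the AFSM; the order fitness is what penalizes such joins.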

Mutation.
In this operation, the value(s) of genes are mutated based on the mutation probability, and the resulting chromosomes are constructed. In the proposed model, the mutation probability is set to 0.07. Here, some of the test transition objects in the target chromosome are replaced with randomly selected transitions from the AFSM. Our random transition selection mechanism plugs into the evolution tool and provides random transitions when required for mutation. An example of random mutation is shown in Figure 5.
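The mutation step can be sketched in the same spirit. Again, this is not the JGAP mutation operator. The function name is hypothetical, and the sketch assumes that the 0.07 probability is applied independently to each gene.

```python
import random
from typing import List, Sequence, TypeVar

Gene = TypeVar("Gene")


def mutate(chromosome: Sequence[Gene], afsm_transitions: Sequence[Gene],
           rng: random.Random, probability: float = 0.07) -> List[Gene]:
    """Replace each gene, with the given probability, by a random AFSM transition."""
    mutated = []
    for gene in chromosome:
        if rng.random() < probability:
            mutated.append(rng.choice(afsm_transitions))  # random replacement
        else:
            mutated.append(gene)                          # gene kept unchanged
    return mutated
```

Drawing replacements only from the AFSM's own transitions keeps every mutated gene a valid method call of the class under test, even if its position in the sequence is no longer in order.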

Termination Condition.
The proposed approach uses a fixed number of evolutions as its termination criterion; i.e., the population is evolved a specific number of times while reproducing the individuals.

The Fitness Functions.
To optimize the test sequences through MOGA, the role of efficiently defined fitness functions is critical. Since the adopted MOGA has two objectives, we have defined two fitness functions, i.e., (1) fitness by coverage and (2) fitness by test sequence order. The required fitness values are calculated through two algorithms: the coverage-based chromosome fitness calculation algorithm (CCFA) (Algorithm 1) and the test sequence order-based chromosome fitness calculation algorithm (TSOCFA) (Algorithm 2), which have already been used for the same problem in [28,29].

Case Study and Experiment
Generation of test sequences is a critical part of the testing phase of the software development life cycle. The test sequences for unit testing of a class can be generated from OCL class specifications [1], i.e., by mapping class specifications (OCL class contracts) to the class model (specifically, a class in the class diagram). The existing test sequence generation process [1], when applied to actual testing, reveals some critical issues.
Those issues, and how the proposed framework tackles them, are presented in the form of a case study. The CoinBox class is taken from the class diagram of a drink vending machine; this class is responsible for keeping a record of the number of available drinks and the quarters entered by the customer. The CoinBox class is used to allow a fair comparison with the existing research [1], where it is used for approach verification. Two more classes, Stack and Circle, were tested as well.
In the following block of code, the syntax shown is the one used by [1]; it deviates from the standard OCL [2] in several respects: (1) Each statement in pre- and postconditions must be joined by a logical operator, e.g., "and", which is missing in the example. (2) Standard OCL syntax does not allow curly braces "{}" around the context declarations. (3) All the OCL contexts (equivalent to classes) must be declared between a package and an endpackage statement. (4) Each constraint in the invariant declaration must be separated by "and" instead of ",". (5) Writing just the ":" operator when declaring a method signature is not enough; the signature must be fully qualified with the name of the context the method refers to. (6) Each "if" must have an accompanying "else" for the OCL statement to be valid.

Scientific Programming
As already mentioned, the above block of code from [1] does not conform to the standard OCL syntax; the proposed framework therefore modified the code to adopt all the requirements of the standard OCL syntax. The resulting OCL class contract is acceptable according to the OCL 2.0 standard. The proposed approach used mutation analysis for benchmarking its performance, and fault seeding in the classes under test was done through MuJava. It is worth noting that proper selection of the number of generations is problem specific and important; e.g., a test run of the tool over the CoinBox class gave 2 unique test sequences over 100 evolutions, but the sequences became better and more diverse with 500 and 1000 generations.
package CB
context CoinBox
inv: curQtr >= 0 and quantity >= 0 and totalQtrs >= 0
context CoinBox::CoinBox()
post: self.curQtr = 0 and self.allowVend = false and self.quantity = 0 and self.totalQtrs = 0

The seeded faults were based on predefined mutation operators. The results in Table 1 reveal that the proposed approach attains better results in identifying the seeded faults. The reason behind the low performance of the existing technique [1] lies in its transition tree coverage, which skips loops in the AFSM. The OCL syntax used by the existing approach [1] is not accepted as standard OCL syntax and cannot be parsed by the available OCL parsers. It deviates from the standard way of writing OCL statements and hence cannot be employed in practical test sequence generation scenarios. The very first impact of that nonstandard OCL is syntax errors. The proposed tool reads standard OCL constructs and automatically generates the test sequences by applying the rules used by the current approach. It also allows on-demand optimization of the test sequences if desired by the test engineer. An obvious advantage of the automation, along with the effort saved on manual work, is that the test sequences change automatically when the OCL specifications change.
In the experimental case study, exhaustive state space search generated 872 test sequences with a maximum length of 26, including redundant test sequence loops. Executing all these test sequences requires much effort. Applying MOGA with a population size of 25 and a sequence length of 15 gave 25 test sequences of length 15, each optimized for all-state coverage and ordered sequence paths over 1000 MOGA generations. It was also observed that more generations give more diverse test sequences with higher fault-revealing efficiency. Since the proposed approach draws a random population from the search-based sequences, it minimizes the chances of bad genes and of evolution in a negative direction. A mutation analysis of the class under test found that MOGA-based test sequences seem to give at least comparable defect-revealing efficiency and may considerably outperform test sequences generated by the current approach. It is important to note that proper selection of the number of generations matters; more generations might give better results but at the cost of considerable MOGA execution time.
As with all optimization techniques, an exact solution is not to be expected; an optimized solution is obtained instead. MOGAs, being a subset of evolutionary algorithms, start with a possible set of solutions and try to optimize it generation after generation. Evolution, as mimicry of the natural process, might not find suitable chromosomes (e.g., due to mutation) and might give some useless test sequences; this can be controlled using better fitness functions.
This is to be expected because, in nature, if wrong genes get to the next generations, the individuals may suffer from defects. After generation of the AFSM, the initial population can be obtained in one of two ways: (i) generate a stochastic random population where each chromosome is composed of a completely random set of genes, or (ii) draw a random population out of the test sequences generated by the state-based test sequence generation approach.
The second option seems to give better results. When specifying MOGA fitness functions for test sequence optimization, the sequence of genes must be taken into account while calculating fitness values. The proposed approach gives an improvement in terms of automation of the test sequence generation process. MOGAs are quite effective when used for test sequence optimization, but the proposed approach recommends the use of raw test sequences as the initial population. MOGA-optimized test sequences give optimized coverage within a limited test sequence length and number. Table 2 presents the comparative analysis of the proposed framework with the existing techniques in terms of six features, i.e., automation, being specification based, being coverage based, being state based, optimization, and being multiobjective. It may clearly be observed that all the techniques are either fully or partially automated except the one presented by Harman et al. [4]. The proposed technique generates the test sequences through an automated mechanism. Next, only two of the techniques use OCL class specifications for test case generation, i.e., the proposed technique and the research presented by Gupta [1]. UML diagrams were used in [22], class and sequence diagrams were used for the generation of sequences of test cases in [11], activity diagrams were used by Mingsong et al. [12], and class and sequence diagrams were used in [13]. From the existing literature, it has already been established that the most accurate details of a class are revealed from the OCL class specifications in the form of OCL class contracts. As far as state coverage is concerned, only the proposed technique and those in [1,12,22] take state coverage into account while generating the test cases; the rest of the techniques do not use this feature.
As all the techniques along with the proposed technique either use FSMs or directed graphs during the process of test case generation, all of them use state-based features.
The most optimized test cases are generated through [4,10,22] and the proposed framework, while the rest of the compared techniques use search mechanisms that have their own inherent problems. The proposed technique and the research presented in [22] are multiobjective, while all others are limited to a single objective. The multiobjective approach tries to minimize the number of states in test sequences while providing the maximum coverage. This comparison indicates that the proposed technique is better than the compared ones in terms of key features.

Conclusion
The proposed approach has improved on the existing approaches through conformance to industry-standard syntax and automation from OCL to the actual test sequence generation.
The proposed approach provides the advantage of optimizing test sequences in terms of minimum number and higher quality, along with automation of the test sequence generation process and conformance to the industry-practiced OMG standard OCL syntax. It saves the time and resources spent on the part of the testing process where test sequences are selected. Multiobjective genetic algorithms are quite effective when used for test sequence optimization, and the use of raw test sequences as the initial population appears to give better results than a completely random selection of the initial population of test sequence chromosomes. MOGA test sequences give optimized coverage (maximum transition coverage) within a limited test sequence length and number. The proposed framework can be used either by industry test engineers for creating test sequences while testing software or by researchers experimenting with FSMs, GAs, and MOGAs.
This research can be improved by using more reliable testing techniques for testing software with FSMs, GAs, and MOGAs.

Data Availability
No data are available.