Prioritizing Program Elements: A Pretesting Effort to Improve Software Quality

Improving the efficiency of a testing process is a challenging task. Prior work has shown that a small number of bugs often account for the majority of reported software failures, and that most bugs are found in a small portion of a program's source code. Prioritizing the code elements according to their criticality before conducting testing therefore helps to reveal the important bugs in the early phase of testing. Keeping this in view, we propose an efficient test effort prioritization method that allows the tester to focus on the parts of the source code that have a high influence on system failures or in which failures have a high impact on the system. We consider five factors associated with a component, namely its influence on system failures, average execution time, structural complexity, severity, and business value, and use them to estimate the criticality of the component within the system. We have experimentally shown that our proposed test effort prioritization approach is effective in revealing important bugs in the early phase of testing, as it is linked to the external measures of defect severity and business value and to the internal measures of frequency, complexity, and coupling.

Compared to a related approach that prioritizes components based on their structural complexity only, we observed from the experimental results that our approach helps to reduce the failure rate in the operational environment. The consequences of the observed failures were also less severe than with the related approach.
Priority should be established by order of importance or urgency. As the importance of a component may vary at different points of the testing phase, we propose a multi-cycle-based test effort prioritization approach, in which we assign different priorities to the same component in different test cycles.
Test effort prioritization at the initial phase of the SDLC has a greater impact than prioritization made at a later phase. As the analysis and design stage is critical compared to other stages, detecting and correcting errors at this stage is less costly than at later stages of the SDLC. Designing metrics at this stage helps the test manager in decision making for allocating resources. We propose a technique to estimate the criticality of a use case at the design level. The criticality is computed on the basis of complexity and business value. We evaluate the complexity of a use case analytically through a set of data collected at the design level. We experimentally observed that assigning test effort to various use cases according to their estimated criticality improves the reliability of a system under test.
Test effort prioritization based on risk is a powerful technique for streamlining the test effort. The tester can exploit the relationship between risk and testing effort. We proposed a technique to estimate the risk associated with various states at the component level and risk associated with use case scenarios at the system level. The estimated risks are used for enhancing the resource allocation decision.

An intermediate graph called the Inter-Component State-Dependence Graph (ISDG) is introduced for obtaining the complexity of a state of a component, which is used for risk estimation. We empirically evaluated the estimated risks. We assigned test priority to the components/scenarios within a system according to their estimated risks. We performed an experimental comparative analysis and observed that the testing team guided by our technique achieved higher test efficiency compared to a related approach.

Introduction
Testing is the process of exercising a program with the intent of detecting bugs. The basic aim is to increase confidence in the developed software. The thoroughness of testing is usually characterized in terms of the total number of test runs, the bugs revealed, and the percentage of code coverage. Verification, validation, and defect finding are the major tasks under software testing.
In the software testing literature, four terms are commonly used: (i) failure, (ii) error, (iii) fault, and (iv) defect. Though they have related meanings, they differ in important ways. An error made by a programmer results in a defect (fault or bug) in the program. The execution of a defect may cause one or more failures. As per the IEEE standard, a failure is the inability of a system or a component to perform its required functions within the specified requirements. A failure in a system is observed externally by the user. There are two main goals in software testing: (i) to achieve adequate quality, where the objective is to find the bugs in the software, and (ii) to assess the existing quality of the system, where the objective is to assess the reliability of the software system. Based on the testing strategy, software testing approaches are classified into two types: code-based testing and usage-based testing. The aim of code-based testing is to execute each and every statement in a program at least once during the test [2,3]. It attempts to cover each reachable element in the software within the available test budget. In code-based testing methodologies such as statement, branch, and path coverage, each aspect of a program is treated with equal importance [2]. The main aim is to find as many bugs as possible. Usage-based testing focuses on detecting bugs that are responsible for frequent failures of the system. Unlike code-based testing, usage-based testing does not require the tester to have any prior knowledge of the program.
In code-based testing, the aim is to execute each statement and conditional branch at least once. Criteria reported in the literature for prioritizing what to test include, among others, coverage of functions, coverage of functions not yet covered, potential for fault exposure, and the probability of fault existence/exposure adjusted for previous coverage.
All the existing techniques on test case prioritization and test case selection are purely code-based and require the information on previous usage of the system.
Hence, these techniques are mainly used at the post-implementation phase and used only for regression testing. Among the objectives of test case prioritization, the most important one is to maximize the rate of fault detection. The aim is to detect the faults from the important parts of the source code at the early phase of the testing process. Other objectives include the ability to detect important faults and the ability to reveal faults associated with specific code changes or to achieve the target coverage or reliability level as early as possible.
The distribution of test effort is important to a test organization. In this thesis, prioritization refers to test effort prioritization, in which components/scenarios are prioritized for testing according to their influence on the overall reliability of the system or the severity of their failures. Test effort prioritization is a research area under pre-testing effort, i.e. before the generation of test cases. The software industry is keen to reduce the money spent on testing. As test resources are limited, a proper analysis is needed to decide how much test effort should be given to individual elements within a system. The test manager should estimate the criticality associated with individual elements in order to decide which parts of the system should be tested thoroughly within the available test budget. For estimating criticality, the test manager should consider various internal and external factors of a component such as its complexity, dependability, severity, and business importance within the system.

Motivation
An efficient prioritization method can drastically reduce wasted effort and help to utilize the test resources effectively. Though a great deal of effort has been devoted to prioritization-based testing [4,9-11], the proposed methods are not very effective in reducing the failure rate of a system or improving the user's perception of its reliability. Limitations of some prioritization-based testing methods and the reasons for their low productivity are described below.
The techniques used for code prioritization [11,12] only measure the percentage of code coverage achieved during the testing phase of a practical system. They cannot identify the elements that have a high impact on the overall reliability of the system. Testing methods based on the operational profile alone [9,13] do not take a white-box approach to test effort prioritization. Though some researchers [14,15] have considered a white-box approach along with the operational profile, they did not consider the data dependencies among components within a system.
Test effort prioritization at an early stage of the development cycle makes the testing process effective. Several researchers [16-19] have proposed test effort estimation methods for the early phase but, to the best of our knowledge, no one has proposed a quantitative estimation of the complexity of a use case. As the complexity of a use case is a major input for test effort estimation and prioritization, there is a need to perform analytical complexity assessment at the architectural level, with little or no involvement of subjective measures from domain experts. Keeping these in view, we propose some approaches that attempt to overcome many of the limitations of the existing approaches highlighted above. Now, we discuss the motivations behind our research work.
X A bug in a critical element may cause frequent failures or severe failures of the system. The criticality of an element can be identified through the analysis of source code and the operational profile of the system.
X Some researchers [1,20,21] have observed that the return on investment in testing is increased through a value-based software testing method, where the business value that comes from the customer and the market is considered as a testing factor. Similarly, there are some components that are executed rarely but in which a bug may cause catastrophic failures. To make the criticality computation process accurate and effective, the external factors of a component, such as the business value and the severity associated with its failure modes, should be considered along with its internal factors.
X It is possible to achieve a high quality software product at an affordable cost. For this, software testing should be incorporated early into the software development process. It is desirable to identify the critical elements at the architectural level.

Overview
In order to save time and cost in the Software Development Life Cycle (SDLC), effective decision-making is required for allocating resources to various parts of the software system. In this thesis, we explore test effort prioritization issues at various phases of the software development life cycle. We propose a set of techniques to prioritize components/use case scenarios for testing at the code level and also at the design level. At the code level, the potential of a program element to cause failures is measured with a metric called the Influence Metric. Based on a graph-based representation, the affected parts of classes are determined. Within a system, we consider internal and external factors such as the class influence, average execution time, structural complexity, severity, and business value for ranking the importance of a class for testing. We propose a novel approach for reliability improvement that involves the analysis of the dynamic influence and severity of various components within a software system.
A software product can be launched on time with sufficient testing if a test plan is prepared early. As the analysis and design stage is critical compared to other stages of the SDLC, detecting and correcting errors at this stage is less costly than at later stages.
We aim to leverage the architectural complexity and business importance information to assign test priority to use cases. We first analyze the factors that have an effect on the complexity of a use case and then give a framework to compute test priority. Stakeholders and developers often consider risk a more significant measure of the quality of a software system than other factors such as the expected number of residual bugs or the failure rate. A risk assessment framework takes into account arguments about the benefits as well as the hazards associated with a system. It helps to take a valuable decision on investment at an early stage. We propose a technique to estimate the reliability-based risk at the design level. Reliability-based risk is estimated based on two factors: (i) the probability of the failure of the software product within the operational environment and (ii) the adversity of that failure.
We propose a technique to assess the risk of a component at various states within a system, which is used as the basis for establishing the test priority.
A set of experiments was conducted to compare our test effort prioritization techniques with alternative solutions. Through the experimental results, we observed that our proposed techniques guide the tester to expose the critical elements that would otherwise receive less testing attention. In addition, our approaches also help to improve the reliability of the system within the available test resources.

Focus and Contribution of the Thesis
Specifically, the thesis makes the following contributions: X We propose a framework to compute the criticality of a component within a system. We experimentally evaluated the framework and observed that our approach is effective in guiding test effort, as it is linked to both the external measures of defect severity and business value and the internal measures of frequency and complexity. Through the experimental results, we observed that our approach helps to improve the reliability of a system within the available test resources. In addition, our approach also helps to reduce the failure rate in the operational environment.

In our next work, we prioritize the program elements based on dynamic analysis of the source code. In the static influence metric, only the information regarding how many other classes request services from a given class is obtained, whereas in the dynamic influence metric, the information regarding how often these requests are executed within a scenario is obtained. In the second test cycle, we assign test priority to a component based on its failure rate in the previous test cycle. We include a value-based testing approach in the third test cycle. The effectiveness of our proposed testing approach has been validated by applying it to three moderate-sized case studies.
The proposed techniques can be used by testers in the software industry for prioritizing test effort where the source code is available. Since in many cases the source code may not be available, in our next work we develop a technique for prioritization of elements at the design level. This technique can be used by testers in the software industry where the source code is not available and/or test planning is required much earlier in the SDLC.
X Planning at a high level improves decisions on resource allocation. Estimating the criticality of an architectural element and performing test effort prioritization based on criticality at a high level helps both the system analyst and the test manager in planning suitable provision for the critical elements. If the critical elements are detected at an early phase of the SDLC, this helps in allocating resources in the later development phases. Keeping this in mind, we propose a technique to rank the use cases within a system for testing based on an internal criterion (architectural complexity) and an external criterion (business value). We first analyze the factors that have an effect on the complexity of a use case and then give a framework to compute test priority. The complexity of a use case is computed analytically through a collection of data at the architectural level with little or no involvement of subjective measures from domain experts. In our approach, a high-ranked use case may be more fault-prone or it may add value to the organization. Hence, the failure of a high-ranked use case may cause a great loss to the organization.
In all the above work, we have not considered the risk associated with a system.
In real practice, risks are associated with every system. Resolving risks at the analysis and design level will improve the quality of the system, within the available resources. In our next work, we develop an approach at the design level for prioritization of elements for testing, considering the risk associated with a system.
X Test effort prioritization based on risk is a powerful technique for streamlining the test effort and delivering the software product at the right quality level with limited resources. By exploiting the relationship between risk and testing effort, the tester can be confident of doing the best possible job with the limited resources.
Risk assessment at an early stage helps to achieve a high level of confidence in the system. We propose an analytical approach for risk assessment of a software system at the design stage. First, we propose a method to estimate the risk for various states of a component within a scenario and then estimate the risk for the whole scenario. In our previous work, we assessed severity at the code level; in this work, we assess severity at the design level. We estimate the risk of the overall system based on two inputs: the scenario risks and the Interaction Overview Diagram (IOD) of the system. Our risk analysis approach ranks the components/scenarios within a system for testing according to their estimated risks. We performed an experimental comparative analysis and observed that the testing team guided by our risk assessment approach achieves higher test efficiency compared to a related approach.
The relationships among the contributions are shown in Figure 1.1. As shown in the figure, the contributions on test effort prioritization are broadly divided into two parts.

1. Chapter 2 provides the background concepts used in the rest of the thesis.
2. Chapter 3 provides a brief review of the related work relevant to our contribution.
3. Chapter 4 presents a novel approach to obtain the influence of a component on system failures. We propose a metric, called the Influence Metric, through static analysis of the source code and use it as a factor for prioritizing program elements at the code level.
4. Chapter 5 presents a novel approach to prioritize classes according to their potential to cause failures and the severity of those failures. This is a very important and interesting problem for software testing. This chapter extends the work in Chapter 4 by adding further contributing factors (structural complexity, severity, and business value) for test effort prioritization.
5. Chapter 6 presents a multi-cycle-based test effort prioritization approach to improve the reliability of a system within the available test resources, through dynamic analysis of the source code.
6. Chapter 7 presents an approach to estimate the test effort based on the prioritization of use cases at the design level of the software development life cycle. Our approach provides a quantitative method for estimating the test effort of a software system based on use cases, and provides experimental results that appear to substantiate the method.
7. Chapter 8 presents a risk estimation approach for a software system at the architectural level. The main idea consists of using UML sequence and state diagrams in order to calculate an overall risk factor associated with a selected architecture.
8. Chapter 9 concludes the thesis with a summary of our contributions. We also briefly discuss possible future extensions to our work.

Chapter 2 Background
This chapter provides a general idea of the background used in the rest of the thesis.
For the sake of conciseness, we do not give a detailed description of the background material. Section 2.9 presents the basic concepts of the Operational Profile of a system, which is used in various testing approaches for achieving and assessing the reliability of a system. Section 2.10 briefly discusses the concepts of risk-based testing. Section 2.11 summarizes this chapter.

Object-Oriented Technology and Software Testing
It is widely accepted that the object-oriented (O-O) paradigm will significantly increase software reusability, extendibility, inter-operability, and reliability. This is also true for high assurance systems engineering, provided that the systems are tested adequately. Object-oriented software testing (OOST) [22] is an important software quality assurance activity to ensure that the benefits of object-oriented (O-O) programming will be realized. Below, we discuss different levels of testing associated with object-oriented programs.
1. Intra-method testing: Tests designed for individual methods. This is equivalent to unit testing of conventional programs.
2. Inter-method testing: Tests are constructed for pairs of methods within the same class. In other words, tests are designed to test the interactions of the methods.
3. Intra-class testing: Tests are constructed for a single entire class, usually as sequences of calls to methods within the class.
4. Inter-class testing: It is meant to test a number of classes at the same time. It is equivalent to integration testing.
The first three variations are of the unit and module testing type, whereas inter-class testing is a type of integration testing. The overall strategy for object-oriented software testing is identical to the one applied for conventional software testing but differs in the approach it uses. We begin testing in the small and work towards testing in the large. As classes are integrated into an object-oriented architecture, the system as a whole is tested to ensure that errors in the requirements are uncovered.

McCabe's Cyclomatic Complexity
Cyclomatic Complexity (v(G)) [23] is a measure of the complexity of a module's decision structure. It is the number of linearly independent paths through the module and therefore the minimum number of paths that should be tested. If the structure of the source code is complex, it is hard to understand, to change, and to reuse. The cyclomatic complexity is computed from the Control Flow Graph (CFG) of the program as v(G) = e - n + 2, where n is the number of vertices and e is the number of edges of the CFG. We present a program with its CFG in Figure 2.1. In the program, n = 7 and e = 8, so the cyclomatic complexity of the program is 8 - 7 + 2 = 3.
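The computation in the example above is easy to automate once the CFG is available. The following sketch (in Python; the adjacency-list graph is illustrative and only mirrors the counts of the figure, it is not the thesis example program) counts nodes and edges and applies v(G) = e - n + 2.

```python
# Minimal sketch: cyclomatic complexity from a CFG given as an adjacency list.
# The graph below is made up; it only reproduces the counts n = 7, e = 8.

def cyclomatic_complexity(cfg):
    """v(G) = e - n + 2 for a single-entry, single-exit control flow graph."""
    n = len(cfg)                                   # number of vertices
    e = sum(len(succs) for succs in cfg.values())  # number of edges
    return e - n + 2

cfg = {
    1: [2],        # entry
    2: [3, 4],     # decision
    3: [5],
    4: [5],
    5: [6, 7],     # decision
    6: [7],
    7: [],         # exit
}

print(cyclomatic_complexity(cfg))  # 8 - 7 + 2 = 3
```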

Halstead Complexity Metric
A program in any language is made up of declarations, definitions, and executable instructions, and its expressions are built from operators and operands.
Programs are made up of instructions written in sequence, without taking the running order into account. Halstead [24] observed that software metrics should reflect the implementation or expression of algorithms in different languages but be independent of their execution on a specific platform. The metrics proposed by Halstead are computed through static analysis of the source code.
He estimated the programming effort. The measurable and countable properties are: X n1 = number of unique or distinct operators appearing in the source code.
X n2 = number of unique or distinct operands appearing in that source code.
X N1 = total usage of all of the operators appearing in that source code.
X N2 = total usage of all of the operands appearing in that source code.
The number of unique operators and operands (n1 and n2) as well as the total number of operators and operands (N1 and N2) are calculated by collecting the frequencies of each operator and operand token of the source program. From these counts, Halstead defines the program vocabulary n = n1 + n2, the program length N = N1 + N2, the volume V = N log2(n), the difficulty D = (n1/2)(N2/n2), and the effort E = D · V.
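As a small illustration (not taken from the thesis), the Halstead measures can be computed directly from the four counts; the token counts below are made up for the example.

```python
import math

def halstead(n1, n2, N1, N2):
    """Compute the standard Halstead measures from operator/operand counts."""
    n = n1 + n2                 # program vocabulary
    N = N1 + N2                 # program length
    V = N * math.log2(n)        # volume
    D = (n1 / 2) * (N2 / n2)    # difficulty
    E = D * V                   # effort
    return {"vocabulary": n, "length": N, "volume": V, "difficulty": D, "effort": E}

# Hypothetical counts for a small routine:
print(halstead(n1=10, n2=7, N1=28, N2=22))
```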

Program Slice
A program slice is computed with respect to a slicing criterion ⟨s, v⟩, where s is a program point and v is a variable of interest; the slice captures the behavior of v at s [25]. According to Weiser [25], a program slice is a reduced and executable program obtained from a program by removing statements, such that the slice replicates part of the behavior of the program. Slicing object-oriented programs presents new challenges that are not encountered in traditional program slicing [26]. To slice an object-oriented program, features such as classes, dynamic binding, encapsulation, inheritance, message passing, and polymorphism need to be considered carefully [27]. Larson and Harrold were the first to consider these aspects in their work [28]. To address these object-oriented features, they enhanced the system dependence graph (SDG) [29] to represent object-oriented software. After the SDG is constructed, the two-phase algorithm of Horwitz et al. [29] is used with minor modifications for computing static slices. Larson and Harrold [28] reported only a static slicing technique for object-oriented programs and did not address dynamic slicing aspects. Dynamic slicing aspects have been reported by Song et al. [30] and Xu et al. [31].

Categories of program slicing
Intra-procedural Slicing and Inter-procedural Slicing: Intra-procedural slicing computes slices within a single procedure. Calls to other procedures are either not handled at all or handled conservatively. If the program consists of more than one procedure, inter-procedural slicing can be used to derive slices that span multiple procedures [29]. For object-oriented programs, intra-procedural slicing is of little use, as practical object-oriented programs contain more than one method; inter-procedural slicing is therefore more useful.

Applications of program slicing
Slicing is used by both developers and testers, before the execution of the code and during execution. A developer uses a slicing tool to understand the source code and to reduce the amount of code that must be examined. Sometimes a programmer has to read a lot of code before finding what he is actually looking for; a slicing tool improves productivity by reducing the amount of code that needs to be read. The tool is also used by the developer for debugging: some variables may show unexpected values at some point in the program, and finding the exact cause of these values is difficult and time-consuming, so the slicing tool helps a lot in this case. The tester uses the slicing tool for analyzing the test coverage of a test suite [7,34]. A dynamic slice is created for each test case of the test suite, and the union of these slices is computed to get an idea of the code coverage achieved by the test suite.
Recently, Qusef et al. [35] proposed a novel approach to maintain the traceability links between unit tests and tested classes based on dynamic slicing.
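A rough sketch of the coverage analysis described above: each dynamic slice is treated as a set of executed statement identifiers, and the union over the test suite approximates what the suite exercises. The data below are purely illustrative.

```python
# Illustrative sketch: approximate test-suite coverage from per-test dynamic slices.
# Each slice is the set of statement ids captured for one test case.

def suite_coverage(dynamic_slices, total_statements):
    covered = set().union(*dynamic_slices) if dynamic_slices else set()
    return len(covered) / total_statements

slices = [
    {1, 2, 3, 5},      # slice for test case t1
    {2, 3, 6, 7},      # slice for test case t2
    {1, 4, 7, 8},      # slice for test case t3
]
print(f"coverage = {suite_coverage(slices, total_statements=10):.0%}")  # 80%
```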

Program Representation
Various types of program representation schemes exist which include high level source code, pseudo-code, a set of machine instructions in a computer's memory, a flow chart and others. The purpose of each of these representations depends upon the exact context of use. In the context of program slicing, program representations are used to support automation of slicing. Various representation schemes have resulted from the search for ever more complete and efficient slicing techniques.

Program Dependence Graph (PDG)
The program dependence graph [36] G of a program P is a graph in which each node n ∈ N represents a statement of the program P. The graph contains two kinds of directed edges: control dependence edges and data dependence edges.
A control (or data) dependence edge (m, n) indicates that n is control (or data) dependent on m. Note that the PDG of a program P is the union of a pair of graphs: the data dependence graph and the control dependence graph of P.

System Dependence Graph (SDG)
The PDG cannot handle procedure calls. Horwitz et al. [29] introduced the System Dependence Graph (SDG) representation, which models the main program together with all associated procedures. The SDG is very similar to the PDG; indeed, the PDG of the main program is a subgraph of the SDG. In other words, for a program without procedure calls, the PDG and SDG are identical. The technique for constructing an SDG consists of first constructing a PDG for every procedure, including the main procedure, and then adding dependence edges which link the various subgraphs together.
An SDG includes several types of nodes to model procedure calls and parameter passing: X Call-site nodes represent the procedure call statements in the program.
X Actual-in and actual-out nodes represent the input and output parameters at a call site. They are control dependent on the call-site node.
X Formal-in and formal-out nodes represent the input and output parameters of the called procedure. They are control dependent on the procedure's entry node.

Control dependence edges and data dependence edges are used within an individual PDG. The additional edges that are used to link the individual PDGs together in an SDG are as follows: X Call edges link the call-site nodes with the procedure entry nodes.
X Parameter-in edges link the actual-in nodes with the formal-in nodes.
X Parameter-out edges link the formal-out nodes with the actual-out nodes.
X Summary edges connect an actual-in vertex to an actual-out vertex if the value associated with the actual-in vertex may affect the value at the actual-out vertex. They represent the transitive dependencies that arise due to procedure calls.
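The following sketch shows one possible in-memory encoding of these edge kinds; the class, field, and node names are ours and purely illustrative, not a prescribed representation.

```python
from enum import Enum, auto
from dataclasses import dataclass, field

class EdgeKind(Enum):
    CONTROL = auto()      # intra-PDG control dependence
    DATA = auto()         # intra-PDG data dependence
    CALL = auto()         # call-site node -> procedure entry node
    PARAM_IN = auto()     # actual-in -> formal-in
    PARAM_OUT = auto()    # formal-out -> actual-out
    SUMMARY = auto()      # actual-in -> actual-out (transitive dependence)

@dataclass
class SDG:
    edges: dict = field(default_factory=dict)   # node -> list of (target, kind)

    def add_edge(self, src, dst, kind):
        self.edges.setdefault(src, []).append((dst, kind))

# Linking a caller's call site to a callee's entry (illustrative node names):
g = SDG()
g.add_edge("main.call_p", "p.entry", EdgeKind.CALL)
g.add_edge("main.a_in_x", "p.f_in_x", EdgeKind.PARAM_IN)
g.add_edge("p.f_out_r", "main.a_out_r", EdgeKind.PARAM_OUT)
g.add_edge("main.a_in_x", "main.a_out_r", EdgeKind.SUMMARY)
```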

Extended System Dependence Graph (ESDG)
The ESDG models the main program together with all other methods. Each class in a given program is represented by a class dependence graph. Each method in a class dependence graph is represented by a procedure dependence graph. Each method has a method entry vertex that represents the entry into the method. The class dependence graph contains a class entry vertex that is connected with the method entry vertex of each method in the class by a special edge known as a class member edge. To model parameter passing, the class dependence graph associates each method entry vertex with formal-in and formal-out vertices.
The class dependence graph uses a call vertex to represent a method call. At each call vertex, there are actual-in and actual-out vertices to match with the formal-in and formal-out vertices present at the entry to the called method. If the actual-in vertices affect the actual-out vertices then summary edges are added at the call-site, from actual-in vertices to actual-out vertices to represent the transitive dependencies.
To represent inheritance, we construct representations for each new method defined by the derived class and reuse the representations of all other methods that are inherited from the base class. To represent a polymorphic method call, the ESDG uses a polymorphic vertex. A polymorphic vertex represents the dynamic choice among the possible destinations. The detailed procedure for constructing an ESDG is found in [28]. Each node can be a simple statement, a call statement, a class entry, or a method entry. An example of an object-oriented program with its ESDG is shown in Figure 2.3. Several researchers [4,37,38] have used this representation for slicing object-oriented programs. We use the ESDG of Larson and Harrold [28] in our work because our main aim is to get a forward slice of a method-entry vertex through the process of graph reachability. Throughout the thesis, we use the terms node and vertex interchangeably.

Unified Modeling Language (UML)
Models are intermediate artifacts between the requirement specification and the source code. Models preserve the essential information from the requirement specification and are the basis for the final implementation. UML has emerged as an industrial standard for modeling software systems [39]. It is a visual modeling language that is used to specify, visualize, construct, and document the artifacts of a software system. UML can be used to describe different aspects of a system, including the static, dynamic, and use case views of a system. UML supports object-oriented features at its core. It enables visualization of the software at an early stage of the development cycle, which helps in many ways, such as building the confidence of both the developer and the end user in the system and enabling earlier error detection through proper analysis of the design. UML also helps in producing proper documentation of the software and thus maintains consistency between the specification and the design document. UML diagrams can be divided into two broad categories: structural and behavioral diagrams. The UML structural diagrams are used to model the static organization of the different elements in the system, whereas behavioral diagrams focus on the dynamic aspects of the system.

CK Metrics
CK metrics [41] were designed to measure the complexity of the design of an object-oriented system. CK metrics measured from the source code have been related to fault-proneness, productivity, rework effort, design effort, and maintenance. They help in taking managerial decisions, such as re-designing and/or assigning extra or higher skilled resources to develop, test, and maintain the software. The suite consists of six metrics: Weighted Methods per Class (WMC), Depth of Inheritance Tree (DIT), Number of Children (NOC), Coupling Between Objects (CBO), Response For a Class (RFC), and Lack of Cohesion in Methods (LCOM).

Value-based Testing
In a value-neutral testing method, each use case is considered equally important and hence the test effort for a use case is proportional only to its complexity. A value-based testing method focuses the test effort on the features (use cases) that provide high system value [1,20,21,42]. The addition of a value dimension (say, business value) helps to maximize the return on investment in the resources allocated to testing [43].
Boehm [42] estimates the business value of a feature as follows: 1. The relative benefit that each feature provides to the customer or business is estimated on a scale from 1 to 9, where 1 and 9 indicate the minimum and the maximum possible benefit, respectively. The best people to judge these benefits are the domain experts and the customer representatives.
2. The relative penalty by not including a feature is also estimated. It represents how much the customer or business would suffer, if the feature is not included within the system. For this penalty, a scale from 1 to 9 is also used, where 1 stands for no penalty and 9 represents the highest penalty.
3. The sum of the relative benefit and penalty gives the total business value, called Value. By default, benefit and penalty are weighted equally, but the weights of these two factors can be changed. We have rated the benefit twice as heavily as the penalty ratings, as defined in [21,42].
For example, the business values for various use cases of an Automatic Teller Machine (ATM) system are shown in Table 2.1. We consider only the use cases that are used by the customer; the start-up and shut-down use cases are not considered, as they are the basic use cases needed to run the system.
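A small sketch of this calculation: benefit and penalty are rated on the 1 to 9 scales described above, and the Value is their weighted sum, with benefit weighted twice as heavily as penalty as adopted in this thesis. The use-case names and ratings below are illustrative and are not the values of Table 2.1.

```python
def business_value(benefit, penalty, w_benefit=2.0, w_penalty=1.0):
    """Value of a feature from relative benefit and penalty ratings (1..9)."""
    assert 1 <= benefit <= 9 and 1 <= penalty <= 9
    return w_benefit * benefit + w_penalty * penalty

# Illustrative ratings for ATM-style use cases (not the thesis data):
use_cases = {"Withdraw cash": (9, 9), "Deposit cash": (6, 5), "Print receipt": (3, 2)}
for name, (b, p) in use_cases.items():
    print(f"{name}: Value = {business_value(b, p)}")
```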

Operational Profile
According to Musa [9], a profile is a collection of disjoint (only one can occur at a time) alternatives with some probability assigned for each occurrence. An operational profile simply consists of a set of operations that a system is designed to perform along with its probabilities of occurrence. It predicts the possible use of the system in the operational environment in a quantitative manner. It is widely used in the field of software reliability engineering.
An operational profile assigns probability values to various high level functions (use cases) according to their probability of use by various users within a system [45-47]. Suppose we have drawn a use case diagram consisting of m types of users and n use cases for a system. Each user type is assigned a probability of using the system. Let u_i be the probability assigned to the i-th user type for accessing the system, such that ∑_{i=1}^{m} u_i = 1. Let q_ij be the probability that the i-th type of user (i = 1..m) requests the functionality of the j-th use case (j = 1..n), such that ∑_{j=1}^{n} q_ij = 1. Then the probability p(x) of a use case x, which denotes the likelihood of the use case being executed by an average user, is given by p(x) = ∑_{i=1}^{m} u_i · q_ix. We consider that the functionality of any system can be modeled through a set of scenarios derived from use cases [40]. A use case consists of one main scenario and a number of alternative scenarios. As per the domain knowledge, a scenario of a use case is assigned a frequency based on its execution in the operational environment. Let f_i(j) be the frequency of the j-th scenario of the i-th use case, such that ∑_{j=1}^{nos_i} f_i(j) = 1, where nos_i is the total number of scenarios of the i-th use case. Then, the probability of execution of the k-th scenario of the i-th use case, p(k_i), is given by p(k_i) = p(i) · f_i(k), where p(i) is the probability of the i-th use case.
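A compact sketch of the two formulas above: use-case probabilities are obtained by summing over user types, and scenario probabilities by multiplying the use-case probability with the scenario frequency. All numbers below are made up for illustration.

```python
# Illustrative operational-profile computation (data are made up).

user_prob = {"customer": 0.8, "operator": 0.2}            # u_i, sums to 1
request_prob = {                                           # q_ij per user type, each row sums to 1
    "customer": {"withdraw": 0.6, "deposit": 0.3, "transfer": 0.1},
    "operator": {"withdraw": 0.1, "deposit": 0.2, "transfer": 0.7},
}

def use_case_probability(use_case):
    """p(x) = sum_i u_i * q_ix"""
    return sum(u * request_prob[user][use_case] for user, u in user_prob.items())

def scenario_probability(use_case, scenario_freq):
    """p(k_i) = p(i) * f_i(k)"""
    return use_case_probability(use_case) * scenario_freq

print(use_case_probability("withdraw"))          # 0.8*0.6 + 0.2*0.1 = 0.50
print(scenario_probability("withdraw", 0.7))     # main scenario with frequency 0.7 -> 0.35
```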

Risk-based Testing
In order to save time and cost in the software development life cycle, effective decision-making is required for allocating resources to various high level requirements. For this, there is a need to assess quantitatively, as early as possible, all possible types of risks associated with the high level requirements. Risk is the combination of the damage that occurs due to a failure and the probability of that failure in the operational environment, as shown in Figure 2.4. Risk analysis is important for critical real-time applications; it is done to assess the damage during use and the frequency of use, and to estimate the probability of failure by examining defects [48].
There are several types of risks, such as reliability-based risk, availability-based risk, acceptance-based risk, performance-based risk, cost-based risk, and schedule-based risk. We are mainly concerned with reliability-based risk, which combines the probability that the software product will fail in the operational environment with the adversity of that failure. A risk assessment framework takes into account arguments about benefits as well as hazards. It helps to take a valuable decision on investment at an early stage.
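In its simplest quantitative form, the reliability-based risk of an element can be sketched as the product of its failure probability and the severity of the consequences; the component names and numbers below are made up for illustration.

```python
# Illustrative risk exposure: probability of failure x severity of its consequences.

def risk_exposure(failure_probability, severity):
    return failure_probability * severity

components = {
    "authorize_transaction": (0.05, 9),   # rare but catastrophic
    "print_receipt":         (0.10, 2),   # frequent but benign
}
ranked = sorted(components.items(), key=lambda kv: risk_exposure(*kv[1]), reverse=True)
for name, (p, s) in ranked:
    print(f"{name}: risk = {risk_exposure(p, s):.2f}")
```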

Summary
In this chapter, we have discussed the slicing concept and intermediate program representation that will be used later in our thesis. We have discussed the estimation of business value with an example. We have also given an introduction of risk associated with a system under test.

Related Work
In this chapter, we review the literature and present a brief summary of the work done related to prioritization-based testing at both the implementation and architectural level. The different approaches proposed in this direction by different researchers can be broadly categorized into two types: pre-testing effort prioritization (before the construction of test cases) and on-testing effort prioritization (at the time of test case selection in a test suite). Pre-testing effort prioritization methods help to prioritize the program elements for testing whereas, on-testing effort prioritization methods prioritize the test cases within a test suite. We first discuss the reported work on pre-testing effort prioritization methods in Section 3.1 followed by on-testing effort prioritization methods in Section 3.2. As our aim is to improve the reliability of a system under test through test effort prioritization, we discuss a number of reliability models for assessing and achieving the reliability of a software system in Section 3.3. We propose a test effort prioritization method at the architectural level to rank the use cases within a system for testing. For this, we discuss some early effort estimation and prioritization methods (development effort and testing effort) based on use cases in Section 3.4. For achieving a better reliability through testing, it is required to estimate the reliability-based risk for various elements within a system at the early phase and prioritize the elements according to their estimated risk. We present a brief summary of the work done on risk assessment in Section 3.5. Finally, we present the summary of the chapter in Section 3.6.

Pre-testing Effort
The basic aim of a pre-testing effort prioritization method is to prioritize the program elements for testing before test cases are constructed.

Code prioritization
Code prioritization is a testing technique that is used for improving the code coverage in coverage-based testing. Code coverage is a metric that represents how much of the source code of an application runs when the unit tests for the application are executed. It is basically used for measuring the thoroughness of software testing.
Li [11] proposed a priority calculation method that prioritizes and highlights, based on dominator analysis, the important parts of the source code that need to be tested first to quickly improve the code coverage. His approach consists of two steps. Before test construction, Li's method decides which lines of code should be tested first to quickly improve code coverage. According to his approach, first an intermediate representation of the source code, known as the Control Flow Graph (CFG), is constructed. Then, a node 1 of the CFG is prioritized by measuring quantitatively how many lines of code are covered by testing that node. A weight is calculated for each node considering only the coverage information. It does not take into account, for instance, the complexity or the criticality of a given part of the program.
A test case covering the highest-weight node will increase the coverage faster 2 . There are two kinds of code coverage: control-flow-based and data-flow-based. Li's work focuses on control flow coverage.
Li et al. [12] presented a methodology for code coverage-based path selection and test data generation, based on Li's previous work [11]. They proposed a path selection technique that considers the program priority and the call relationships among class methods to identify a set of paths through the code that covers high-priority code units. Then, a constraint analysis method is used to find object attributes and method parameter values for generating tests to traverse the selected sequence of paths. This helps to automatically generate tests that cover high-priority points and minimize the cost of unit testing.
1 A node is either a statement, a method, or a basic block in the source code.
2 The tester, based on his/her experience, may desire to first cover a node with a lower weight but a higher complexity or criticality.
Code coverage is a sensible and practical measure of test effectiveness [49]. It helps developers and vendors to indicate their confidence in the readiness of their software, but its limitation is that it gives equal importance to the discovery of each fault. Consequently, no information is gained on how much the detection and elimination of a particular fault during the testing process affects the reliability of the system, even though different faults contribute differently to the reliability of a system.

Fault-prone based testing
A fault-prone based testing approach identifies the faulty components in a system, and test effort prioritization is done accordingly. It estimates the probability of the presence of faults within a component, which helps in taking a valuable decision on testing. There has been a significant amount of research [50-54] in the software industry on identifying the fault-prone components within a system and prioritizing the testing accordingly. Different authors have focused on different characteristics associated with a component for counting faults.
Eaddy et al. [50] experimentally proved that concern-oriented metrics 3 are more appropriate predictors of software quality than structural complexity measures and there is a strong relationship between scattering and defects.
Czerwonka et al. [54] discussed the application of CRANE tool set on a large scale software product, Windows Vista, to expose the required information such as code churn, code complexity, dependencies, pre-release bugs with the purpose to make a decision for failure prediction, change analysis and test prioritization to minimize risks of further problems in changed code.
Ostrand et al. [51] proposed a novel approach to identify the faulty files at the time of the next release of an application. For prioritizing testing efforts, their approach considers factors that are obtained from the modification requests and the version control system. These factors are (i) the file size, (ii) the file status (whether the file was new to the system), (iii) the fault status in the previous release, and (iv) the number of changes made. For some initial releases, the models were customized based on the above observed factors. Based on the experimental results, the authors concluded that their methodology can be applied in the real world without extensive statistical expertise or modeling effort.
Ostrand et al. [53] proposed a negative binomial regression model. The model is used to predict the expected number of faults in each file of the next release of a system. The predictions are based on the code of the file in the current release and the fault and modification history of the file from previous releases. Similarly, Emam et al. [52] found that a class having a high export coupling value is more fault-prone. A complex program might contain more faults than a simple program [55]. As complexity is the most important defect generator, the complexity metric is used as a parameter for testing [56,57].
We now present some existing work on the prediction of faulty components through design metrics. Researchers [56,58] related the structural complexity metrics obtained through the CK metric suite [41] to the fault-proneness of a system. It has been observed that the estimated defect density computed through static analysis and the pre-release defect density computed through testing are strongly correlated. Emam et al. [52] experimentally showed that inheritance and external coupling metrics are strongly associated with fault-proneness.

On-testing Effort
The basic job of on-testing effort prioritization is to identify the important test cases within an existing test suite, with the aim of reducing the test cost. In this section, we briefly present the work done in two sub-areas: test case prioritization and test case selection. Empirical research on test case prioritization [4-6] has used statement-level and function-level coverage techniques. The basic aim of these approaches is to improve a test suite's rate of fault detection and to reduce the cost of regression testing based on total requirement coverage and additional requirement coverage.
Elbaum et al. [5] proposed a metric named the Average Percentage of Faults Detected (APFD). The metric measures the rate of fault detection achieved by a sequence of test cases ordered according to a prioritization technique. They used a greedy strategy for selecting test cases from a test suite in regression testing. In the greedy strategy, the test case with the highest statement coverage is selected first.
Each time, after selecting the best test case (the one with the highest statement coverage) for execution, the remaining test cases are re-ordered based on their coverage of statements not yet covered by the already executed test cases; statements already covered are not counted again. This iterative selection and execution continues until each statement is covered by at least one test case. Though this scheme helps the tester achieve full statement coverage of a program with as few test cases as possible, it does not ensure an improvement in the reliability of the software product for a fixed-size test suite. A limitation of these test case prioritization approaches is that the APFD metric used for prioritization gives equal importance to each detected fault: it assumes that all detected faults are of equal severity and that all test cases have equal cost, which is not true for practical applications.
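As an illustration of the ideas described above (not the authors' implementation), the sketch below orders test cases by additional statement coverage and scores an ordering with the standard APFD formula, APFD = 1 - (TF_1 + ... + TF_m)/(n·m) + 1/(2n), where TF_i is the position of the first test revealing fault i, n is the number of tests, and m the number of faults. The coverage and fault data are made up.

```python
# Illustrative sketch of greedy "additional coverage" prioritization and APFD.

def additional_coverage_order(coverage):
    """coverage: test id -> set of statements it covers. Returns a greedy ordering."""
    remaining, covered, order = dict(coverage), set(), []
    while remaining:
        best = max(remaining, key=lambda t: len(remaining[t] - covered))
        order.append(best)
        covered |= remaining.pop(best)
    return order

def apfd(order, faults_detected):
    """faults_detected: test id -> set of fault ids. APFD of the given ordering."""
    n = len(order)
    all_faults = set().union(*faults_detected.values())
    m = len(all_faults)
    first_pos = {f: next(i + 1 for i, t in enumerate(order)
                         if f in faults_detected.get(t, set()))
                 for f in all_faults}
    return 1 - sum(first_pos.values()) / (n * m) + 1 / (2 * n)

coverage = {"t1": {1, 2, 3}, "t2": {3, 4, 5, 6}, "t3": {6, 7}}
faults = {"t1": {"f1"}, "t2": {"f2"}, "t3": {"f1", "f3"}}
order = additional_coverage_order(coverage)   # ['t2', 't1', 't3']
print(order, apfd(order, faults))             # APFD = 0.5 for this toy data
```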
To solve this problem, Elbaum et al. [6] extended their previous work [5] to account for varying fault severities and test case costs. Recently, Bryce et al. [8] proposed various criteria for prioritizing test cases.
These are (i) parameter-value interaction coverage-based, (ii) count-based, and (iii) frequency-based criteria. They applied these criteria to some stand-alone GUI and Web-based applications and found that the fault detection rate improves over a random ordering of test cases.
All these discussed test case prioritization techniques are purely code-based and require the information on previous usage of the system. These techniques are mainly used at the post-implementation phase and used mainly for regression testing.
Test cases can also be prioritized based on the design model. Kundu

Empirical Work on Reliability Analysis
A number of reliability models [9,14,15,57,63] have been proposed for assessing and improving the reliability of a software system over several decades. Some researchers have considered system as a black box whereas, others have included the architecture of the system in their analysis.
Musa [9] is recognized for his work in the field of test suite design using Operational Profile. Operational Profile is a quantitative characterization of how a system will be used. An operational profile is used to guide testing. If testing is terminated and the product is shipped due to crucial schedule constraints, the tester is ensured that the most-used operations will have received the most testing effort. According to Musa, the reliability of a software product depends on how the product will be used by a customer. The testing should be conducted as if the product is in the field.
The chance of failure is high in a module with a high execution probability. If a module is executed more frequently, then the probability of activating any residual error in that module is high, which may cause frequent failures. Based on this idea, he proposed a technique to prioritize input domains or fault regions on the basis of their impact on the overall reliability of the system. His testing method is intended both for assessing and for enhancing the reliability of a system from the user's point of view.
Testing based on the operational profile is efficient and effective in revealing bugs that influence the reliability of a system in the operational environment, compared to coverage-based testing [13]. Cobb et al. [13] experimentally showed that, in terms of Mean Time Between Failures (MTBF), operational profile based testing improves the perceived reliability during operation by a factor of 21 compared to coverage-based testing. They also showed that even if the operational profile is not accurate during testing, there is a high probability that the MTBF at operation time will be much higher than that obtained with coverage-based testing or other black box testing approaches.
Not only at testing time, but also during software inspection, the Usage-Based Reading (UBR) [64] technique helps reviewers to quickly find the faults that have the most negative impact on the user's perception of system quality. The use cases are prioritized based on their execution probability and handed over to the reviewers for inspection. UBR guides the reviewers to focus on the software parts that are most important for a user. The limitation of these methods is that the failure regions of a practical program were decided using only a black-box approach, which is a challenging job.
Reliability prediction is more accurate if the internal structure of the system (the interactions among components) is considered along with the operational profile of the system. Goseva-Popstojanova et al. [65] observed that there are broadly two categories of architecture-based analysis: state-based [14,15,66] and path-based [67-70]. In a state-based analysis, the probabilistic control flow graph is mapped to a state space model, and the transition probabilities between components are decided based on the Markov property and the operational profile [9]. Cheung [14] proposed a user-oriented software reliability model, which measures the quality of service that a program provides to a user. His Markov reliability model uses a program flow graph to represent the structure of the system; the flow graph structure is obtained by analyzing the code. It uses the functional modules as the basic components whose reliabilities can be independently measured, and it uses the branching and function-calling characteristics among the modules, which are measured in the operational environment. Similar structural models have been proposed by Littlewood [15] and Booth [71] to analyze the failure rates of a program. Lyu [66] proposed a structural model for estimating the reliability of component-based programs where the software components are heterogeneous and the transfer of control between components follows a discrete time Markov process. It is assumed that the time spent in each state is exponentially distributed.
In a path-based analysis, the reliability of each path from the single entry node to the single exit node of the control flow graph is computed, and the average of the path reliabilities is used to estimate the reliability of the whole system. Path-based analysis sometimes gives incorrect results due to the infinite number of paths caused by loops. This problem was solved to some extent by Krishnamurty et al. [68], who proposed a two-phase approach.
In the first phase, the reliability of each component is predicted based on the code coverage of that component. In the second phase, the reliability of the whole system is obtained by integrating the component reliabilities obtained in the first phase. The authors of [57] stated that the complexity metric should also be a parameter, as it helps to estimate the initial software fault density and hence the failure rate.
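A sketch of the path-based idea under simplifying assumptions (independent component failures, known path probabilities): the reliability of a path is the product of the reliabilities of the components it visits, and the system estimate is the probability-weighted average over paths. The numbers are illustrative only.

```python
# Illustrative path-based reliability estimate (component independence assumed).

def path_reliability(path, component_reliability):
    r = 1.0
    for component in path:
        r *= component_reliability[component]
    return r

def system_reliability(paths, component_reliability):
    """paths: list of (probability, [components]) from entry to exit."""
    return sum(p * path_reliability(path, component_reliability) for p, path in paths)

reliability = {"A": 0.99, "B": 0.95, "C": 0.90}
paths = [(0.7, ["A", "B"]), (0.3, ["A", "C"])]
print(system_reliability(paths, reliability))   # 0.7*0.9405 + 0.3*0.891 ≈ 0.926
```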
The advantage of these early reliability estimation techniques [69,70] is that they can identify the valuable components at the analysis phase.

Early Test Effort Estimation Methods based on Use Cases
The work on early effort estimation based on use cases was first proposed by Karner [19]. He defined a metric called the Use Case Point (UCP), based on use cases, to estimate the effort of an application. Since then, continuous research has been carried out on UML-based effort estimation [19,72,73]. A number of technical complexity factors, such as distributed system, response time, end-user efficiency, ease of installation, and ease of use, are considered along with some environmental factors to adjust the Use Case Points. There is a mapping from use cases to test case generation; currently, the number of test cases for a use case is estimated based on the UCP.
In the work discussed above [19,72,73], though the complexity of a use case is considered a major attribute for effort estimation for both development and testing, the complexity is roughly categorized as simple, average, or complex based only on the number of transactions or the number of scenarios. As the factor "complexity" plays a major role in estimating the fault-proneness of a system, it is directly related to testing and development effort. Hence, the architectural details of a use case should be analyzed to obtain the complexity in a quantitative form. Another limitation of this existing work is that the estimation techniques involve a large amount of subjective input from domain experts, and hence their accuracy is doubtful.
Kim et al. [74] addressed these limitations to some extent by proposing an effort estimation approach in which the UML model is analyzed to obtain the complexity of a system; their approach collects data at the analysis stage and considers the use cases. Robiolo and Orocosco [75] proposed an effort estimation method based on the use-case diagram. In their approach, the size of a project is estimated based on two factors: the total number of use-case transactions and the total number of entity objects. Finally, the effort is estimated through a mean productivity value. First, a use case is converted to a textual description and then basic elements such as functions and data are identified. The size of the application is estimated based on the number of entity objects. They did not consider the architectural details of a use case and could not estimate the effort required for an individual function.
Similar to the UCP-based testing effort estimation methods discussed above, Zhu et al. [76] proposed a method to predict the number of test cases for a system from its use cases. They considered the number of transactions, the number of entity objects, and some special requirements not covered by transactions to estimate the number of test cases for a use case. These effort estimation methods only consider the estimation of high level effort. Furthermore, these testing effort estimation methods [16,17,77] estimate test cases for the whole system, not for each individual task unit (use case). They are too abstract for precise estimation.

Risk Analysis for Testing
Amland [78] proposed a risk-based testing approach for large projects. Some risk assessment methods [79,80] have been proposed at the requirement stage based on the knowledge of multiple experts. These two methods first identify the possible modes of failure of a high-level requirement and then try to estimate the impact of these failures on the requirement. Unlike our approach, these risk assessment methodologies do not use any architectural level information and are therefore purely subjective. Hence, these methods are more error-prone, being entirely human-intensive.
Some techniques are available for reliability-based risk assessment based on formal design models [81,82]. Yacoub and Ammar [81] first proposed a risk assessment method at the architectural level using UML models. They proposed a heuristic risk factor associated with a component and with a connector based on dynamic metrics (dynamic complexity and dynamic coupling metrics are used to estimate the complexity factor of a component and a connector). Then, the risk factor of the system is assessed based on two inputs: an abstract intermediate representation of the system called the Component Dependence Graph (CDG) [70] and the risk factors of the individual components.
Goseva-Popstojanova et al. [82] have proposed a similar approach for risk assessment. They estimate the risk factor for a scenario with the help of component and connector risk factors and a Discrete Time Markov Chain (DTMC) with a transition probability matrix P_x = [p_ij]_x, where p_ij is the conditional probability that the program will next execute component j, given that it has just completed the execution of component i. They also introduced multiple failure states that represent failure modes with different severities. These two approaches [81,82] are purely analytical and do not take any input from domain experts.
Appukkuty et al. [83] have proposed a risk assessment method that considers the possible failure modes of a scenario and computes the complexity of the scenario in each failure mode. Their method performs risk assessment at the requirement level. Similar to our approach, Cortessela et al. [84] have proposed a risk assessment method based on UML models, but their method assesses performance-based risk, whereas ours assesses reliability-based risk from UML models.
Smidts et al. [85] have added safety as a characteristic for reliability estimation and define software reliability as the probability that the software-based digital system will successfully perform its intended safety function, for all conditions under which it is expected to respond, upon demand, and with no unintended functions that might affect system safety.

Summary
We have discussed the existing work on code prioritization to improve coverage-based testing; identifying the most critical parts of the code remains a big challenge.
The user's view of the reliability of a system improves when the occurrence of bugs in the frequently executed parts of the software is reduced [9,13,86,87].
According to Musa [9], removing faults from the frequently executed parts of the source code helps the test manager achieve high reliability at low cost. However, the length of time for which a part of the source code is executed does not wholly determine its importance to the perceived reliability of the system. The result produced by an element that is executed only for a short duration may be saved and extensively used by many other elements. Sometimes, the result produced by a rarely executed element is saved and widely used by a number of frequently executed elements. Hence, an element on which many other elements depend can have a high impact on the reliability of the system, even though it is itself executed only for a short duration.
The degree of coupling is correlated with the criticality of a system [88]. An element which provides a number of services is reusable as it is independent. An element importing services may be difficult to reuse in another context because it depends on many other elements. Coupling is also related to change proneness [89].
An element which provides services to many other elements is likely to change, because it has to adjust to the evolving needs of the dependent elements [88]. The same element is also reusable and changeable. So, extra test resources are required for an element which provides services to many other elements, because bugs in it may propagate widely.
Assuming that all elements are approximately of similar size and complexity, the failure rate of a software product is disproportionately influenced by the presence or absence of bugs in a few elements. These elements either get executed more frequently than others during normal operation of the software, or the results they produce are used extensively by a large number of other elements. Hence, we estimate the criticality of an element on the basis of two important characteristics: execution probability and influence toward system failures. The first is determined through the operational profile [9] of the system and the latter with the help of coupling [90].
We introduce a new metric called Influence Metric for an element within a system.
It shows the number of elements in the system that are using the produced result of the given element, directly or indirectly. Our proposed Influence Metric provides detailed information at the statement level by marking the statements within a program that are influenced by a given element. The influence value that the Influence Metric generates for an element is used as the measure for criticality computation.
As the analysis is performed at the code level, our proposed method marks the nodes (statements) in the source code that are dependent on a given element, directly or indirectly. First, we propose an algorithm to compute the influence value of a method, and then we use it to compute the influence value of a class. We compute the criticality of a class on the basis of its influence value, which shows how many nodes are dependent on it, and its average execution time, which shows how often these dependencies are exercised at run time. We prioritize the elements within a system according to their estimated criticality. Prioritizing the elements within a system before conducting testing promotes efficient testing of software by revealing important bugs at the early phase of testing.
The rest of the chapter is organized as follows: our proposed Influence Metric and the test priority assignment using it are discussed in Section 4.1. The experimental results are given in Section 4.2 and the chapter summary in Section 4.3.

Our Approach
An object-oriented program comprises a set of classes, and a class consists of a number of methods. We have proposed an algorithm named MethodInfluence that uses a forward slicing approach [25] to compute the influence value of a method within a system. The influence value of an element shows the influence of the element toward system failure. We take the intermediate representation of the program called the Extended System Dependence Graph (ESDG) [28] as the input to our algorithm, MethodInfluence. The algorithm is applied to each method-entry vertex v of a class and marks the vertices of the ESDG that are dependent on v, directly or indirectly.
We obtain influence_set(m) for a method m, which contains the set of vertices that use the results produced by m. The union of the influence sets of all relevant methods of a class is the influence set of the class. From the influence set of the class, we get its influence value. This approach statically computes the influence of a class within the whole program; execution of the program is not necessary. Although the influence set of a class shows all possible requests to the class for service, it is unable to show how often these requests are executed in the operational environment.
The reliability of a system is not related to the number of faults existing in the system under test; it is related only to the probability that a fault leads to a failure during software execution [14]. This is because the input data supplied by the user decide which parts of the source code are executed, and a bug existing in the non-executed parts will not affect the output. So, it is not sufficient to know how many other classes request services from a class; it is also necessary to know how often these requests are executed at run time.
For this, we extract the average execution time of a class within the system.
It is obtained through the operational profile of the system. The operational profile gives the probabilities with which different high-level functions (or use cases) are executed during a typical use of the software. Once both factors of a class within a system, influence value and average execution time, are obtained, we compute the criticality of the class within the system. Test Priority (TP) is assigned to a class according to its criticality. The TP of an element shows the intensity of its testing requirement: the higher the TP value of an element, the more test resources are required for it to reduce the system failure rate.

Influence of a method
First, we represent the input program by an intermediate representation called ESDG.
Then, we apply our proposed algorithm on the ESDG to compute the influence of a method in the program. Our algorithm counts the number of nodes in the program that are marked as influenced by a method m, reached from the data-dependent set of that method's formal parameter-out nodes.
The influence value of a method m is derived from the cardinality of its influence set (Equation 4.1). In this section, we present our algorithm MethodInfluence in pseudo-code form to compute the influence value of a method. The notations used in the algorithm are presented below.

visited[i]: A Boolean variable which is set to TRUE upon visiting node i.
influence[i]: A Boolean variable which is set to TRUE when node i is marked as influenced.
queue1: It is a queue that contains the nodes which are to be processed next.
queue2: It is a queue that contains the nodes which are to be marked as influenced.
insertQueue: It is a function that adds nodes to a queue.
deleteQueue: It is a function that deletes nodes from a queue.
Type(n): A function that returns the type of node n out of all possible node types in the ESDG.
The algorithm maintains two queues, queue1 and queue2. queue1 holds the nodes that are to be traversed next; it contains the nodes at the end of the control dependence edges, data dependence edges or parameter-in edges of the node being visited. queue2 holds the nodes that are to be marked as influenced; it contains the nodes at the end of the parameter-out edges of the node being visited.

Working of the Algorithm
To illustrate how to compute the static influence of a method within a program, we consider the program and its ESDG shown in Figure 4.1. The main part of algorithm MethodInfluence proceeds as follows:

while queue1 is not empty do
    n <- deleteQueue(queue1)
    if Type(n) == method-entry vertex then
        traverse only its control edges and parameter edges
    else if Type(n) == call vertex then
        traverse its adjacent nodes
    else if Type(n) == polymorphic vertex then
        traverse only its polymorphic edges {each adjacent node of a polymorphic edge is a method-entry vertex}
    else
        traverse only its outgoing data dependence edges and control dependence edges
    end if
    for each adjacent not-visited node w do
        visited[w] <- TRUE
        if Type(w) == parameter-out vertex then
            insertQueue(queue2, w)
        else
            insertQueue(queue1, w)
        end if
    end for
end while
while queue2 is not empty do
    n <- deleteQueue(queue2)
    influence[n] <- TRUE and add node n to the influence set of the input method
    traverse the adjacent nodes of n through all types of edges except the control dependence edges
    for each not-visited node w do
        visited[w] <- TRUE
        insertQueue(queue2, w)
    end for
end while

For the example, the algorithm traverses each control dependence edge from the given method-entry vertex and adds the nodes F1_in, F2_in and 3 to queue1 and F_out to queue2. It then deletes the first element F1_in from queue1 and checks all its outgoing edges to find any dependent node not yet traversed. Next, it deletes F2_in and 3 from queue1, and the process continues until queue1 becomes empty. Once queue1 is empty, the algorithm starts deletion from queue2. It deletes the node F_out from queue2, marks it as influenced and traverses only the parameter-out edges of that node, adding all not-visited nodes to queue2. Now, queue2 contains the node A_out. Deleting A_out and marking it as influenced next, queue2 contains the nodes 10, A1_in and 12. In a similar way, the other relevant nodes are inserted in queue2 and deleted from it. At the end, the nodes F_out, A_out, A1_in, 10 and 12 are marked as influenced. This shows the contribution of the add method to the rest of the source code.
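A compact, self-contained sketch of this two-queue traversal is given below. The ESDG model used here (the vertex and edge type names, and the Vertex class) is a simplified assumption for illustration only, not the implementation used in our experiments.

import java.util.*;

// Sketch of the MethodInfluence two-queue traversal described above.
class MethodInfluenceSketch {

    enum VertexType { METHOD_ENTRY, CALL, POLYMORPHIC, PARAMETER_OUT, OTHER }
    enum EdgeType { CONTROL, DATA, PARAM_IN, PARAM_OUT, CALL, POLYMORPHIC }

    static class Vertex {
        final String name;
        final VertexType type;
        final Map<EdgeType, List<Vertex>> out = new EnumMap<>(EdgeType.class);

        Vertex(String name, VertexType type) { this.name = name; this.type = type; }

        List<Vertex> succ(EdgeType... kinds) {
            List<Vertex> result = new ArrayList<>();
            for (EdgeType k : kinds) result.addAll(out.getOrDefault(k, Collections.emptyList()));
            return result;
        }
    }

    // Returns the influence set of the method whose entry vertex is given.
    static Set<Vertex> influenceSet(Vertex methodEntry) {
        Set<Vertex> visited = new HashSet<>();
        Deque<Vertex> queue1 = new ArrayDeque<>();   // vertices still to be traversed
        Deque<Vertex> queue2 = new ArrayDeque<>();   // vertices to be marked as influenced
        Set<Vertex> influenced = new LinkedHashSet<>();

        visited.add(methodEntry);
        queue1.add(methodEntry);

        // Phase 1: forward traversal; parameter-out successors are deferred to queue2.
        while (!queue1.isEmpty()) {
            Vertex n = queue1.poll();
            List<Vertex> next;
            if (n.type == VertexType.METHOD_ENTRY) {
                next = n.succ(EdgeType.CONTROL, EdgeType.PARAM_IN, EdgeType.PARAM_OUT);
            } else if (n.type == VertexType.CALL) {
                next = n.succ(EdgeType.values());              // all adjacent nodes
            } else if (n.type == VertexType.POLYMORPHIC) {
                next = n.succ(EdgeType.POLYMORPHIC);
            } else {
                next = n.succ(EdgeType.DATA, EdgeType.CONTROL);
            }
            for (Vertex w : next) {
                if (visited.add(w)) {
                    if (w.type == VertexType.PARAMETER_OUT) queue2.add(w);
                    else queue1.add(w);
                }
            }
        }

        // Phase 2: everything reachable from the deferred parameter-out vertices
        // through non-control edges is marked as influenced.
        while (!queue2.isEmpty()) {
            Vertex n = queue2.poll();
            influenced.add(n);
            for (Vertex w : n.succ(EdgeType.DATA, EdgeType.PARAM_IN,
                                   EdgeType.PARAM_OUT, EdgeType.CALL, EdgeType.POLYMORPHIC)) {
                if (visited.add(w)) queue2.add(w);
            }
        }
        return influenced;
    }
}

The size of the returned set corresponds to the count of influenced nodes from which the influence value of the method is derived.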

Complexity Analysis
If N nodes are created in the intermediate graph ESDG for representing the object-oriented program, there can be at most N - 1 edges at each node. So, the worst-case space complexity is of the order of N x (N - 1), i.e., O(N^2). Similarly, in the ESDG any edge is visited at most once, so the time complexity is O(E), where E is the total number of edges.

Influence of a class
The set influence_set(c) for a class c is the union of the influence sets of all its methods, influence_set(c) = influence_set(m_1) ∪ influence_set(m_2) ∪ ... ∪ influence_set(m_k), where k is the total number of methods in class c. From the influence set, we get the influence value by applying Equation 4.1.

Average execution time of a class within a system
A scenario within a system is implemented by the interaction among a set of classes.
The average execution time of a class c_i within the system, denoted ET(c_i), is given by

ET(c_i) = Σ_{j=1}^{nos} p(j) · Time(c_i)_j,

where nos is the total number of scenarios in the system under test, p(j) is the probability of execution of the j-th scenario within the system and Time(c_i)_j is the total activation time of class c_i within the j-th scenario.
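As a purely illustrative example with hypothetical numbers (not taken from the case studies), consider a system with three scenarios having execution probabilities p(1) = 0.5, p(2) = 0.3 and p(3) = 0.2, in which class c_i is active for 4, 0 and 10 time units, respectively. Then

\[
ET(c_i) = 0.5 \times 4 + 0.3 \times 0 + 0.2 \times 10 = 4.0
\]

time units, so the class contributes to the operational behavior mainly through the first and third scenarios.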

Computation of criticality
Test effort is assigned to a class according to its criticality. We combine both influence value and average execution time of a class to get the criticality of that class.
The criticality of a class is computed using Equation 4.2 from its two factors: influence_val(c_i), the estimated influence of class c_i toward system failures, and ET(c_i), its average execution time within the system. Test Priority (TP) is assigned to a class according to its criticality. A class with a high TP is critical and hence requires extra test effort.
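One plausible way to combine the two factors, stated here only as an illustrative assumption and not as the exact form of Equation 4.2, is their product:

\[
\mathit{criticality}(c_i) = \mathit{influence\_val}(c_i) \times ET(c_i).
\]

A weighted sum of the normalized factors would serve the same purpose of ranking the classes.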

The TP for various classes of a small program is computed using Equation 4.2 and is shown in Table 4.1. Our second case study, SMA, is implemented in a large supermarket and provides a number of services such as find product, specify the required quantity, specify fulfillment of the product, record customer details, take payment, confirm order, print invoice and picking.
These are well explained in [2]. Our third case study, ATM, is an electronic banking outlet, which allows customers to complete basic transactions without the aid of a branch representative or teller. This is an example of a commercial application system.
We present a brief summary of these case studies in Table 4.2, so that the size of each can be well understood.

Sensitivity analysis
Using our criticality estimation method, we investigated the failure rate of an application based on the failure of individual classes with different criticality. We did this in three phases. In the first phase, we selected the highest priority class and executed a number of test cases (randomly selected scenarios) based on the operational profile. A test case is responsible for the execution of one scenario. We continued by slowly decreasing the reliability of the selected class in a step-wise manner and observed the failure rate of the system under test at each reliability point of that class for the same set of test cases. The same process and the same test cases were then applied to a class with medium priority and to the class with the lowest priority. As the observed failure rates varied for the same set of test cases at each reliability point of the selected classes (one at a time), we could analyze the sensitivity of a class toward the system failure rate. The graphs in Figure 4.2 show the failure rates of the LMS, SMA and ATM case studies. We obtained the graphs by decreasing the reliability of the highest priority class, some medium priority classes and the lowest priority class (one at a time) of each case study, in a step-wise manner. We considered six classes of each case study, including the highest and lowest priority class.
In Figure 4.2, it is clearly shown that, when the reliability decreases for a class with high TP value, the system failure rate increases at a higher rate, but this is not true for a class with low TP value.

Comparison with Musa's approach
We have argued that the classes with high tendency towards system failures are not only identified by their execution time but also by their influence values. Like average execution time, a class with high influence value is also responsible for a high failure rate of the overall system. To validate our claim, we have conducted two experiments on each case study (LMS, SMA and ATM). In the first experiment, Experiment 1, we checked the impact of execution time on system failure rate and in the second experiment, Experiment 2, we checked the impact of influence value on system failure rate.

Result Analysis and Discussion
For the LMS, SMA and ATM case studies, the failure rate of the overall system in the first experiment, Experiment 1, was 58%, 47% and 72% respectively, when the reliability of the first class, the class with the highest execution rate, was decreased from 1 to 0.5. In the second experiment, we found that the system failure rate was about 54%, 51% and 69% for the LMS, SMA and ATM case studies respectively, when the reliability of some classes out of the selected five classes (one at a time) was decreased from 1 to 0.5. The overall failure rate of the ATM case study was the highest in both experiments. In the ATM case study, the class Withdrawal has the highest execution rate and also a high influence value; so the failure rate increased at a high rate when the reliability of the Withdrawal class was decreased.
From Experiment 1, we observed that the failure rate of a system increased when the reliability of a class with high average execution time was decreased. From Experiment 2, we observed that a class with low average execution time but high influence value was also responsible for increasing the failure rate of the system. From both experiments, we concluded that the newly introduced factor, influence value, also plays a major role in identifying failure-prone classes, whereas Musa [9] stated that only the frequently executed classes should get extra test resources in the testing phase as they are more failure-prone. As our approach considers both factors, average execution time and influence value, it exposes the failure-prone classes that are exposed by Musa's approach [9]. Further, our method identifies, through the newly introduced influence value, failure-prone classes that are neglected by Musa's approach due to their low execution time. This is because some wrongly produced output of a rarely executed class may be used by frequently executed classes, which makes the failure rate of the system high.

Threats to validity of results
In order to justify the validity of the results of our experimental studies, we identified the following threats:
X Biased test set design influencing the results.
X Seeding biased errors in various classes of each case study.
X Testing only for selected failures and losing generality of results.
X Using testing methods which may be suitable only for particular bugs and may not reveal other common and frequent bugs.

Measures taken to overcome the threats
In order to overcome the above-mentioned threats and validate the results for the most common and real-life cases, we took the following corrective measures:
X We used the same test set at each reliability point of a class for observing failures.
X We used the same types of seeded bugs in the classes of each case study.
X We took care that the seeded bugs match commonly occurring bugs.
X We used class mutation operators to seed bugs. Using mutation operators, we can ensure that a wide variety of faults is inserted systematically in a somewhat impartial and random fashion. While traditional mutation operators are restricted to the unit level, class mutation operators [93] for object-oriented programs have an impact at the cluster level.
X We considered the failures that provide a basis for the user to decide how much they can trust the software.

Limitation of our approach
It is not sufficient to assign test priority to an element on the basis of its influence value and its average execution time alone. A single bug in a class with a low test priority value may cause a catastrophic failure. As some classes provide exception handling of rare but critical conditions, it is necessary to consider the severity associated with each class by checking the effect of its failure on the system operation. For efficient testing, the test priority computation should also include the severity associated with the failure of a class. The limitation of our approach is that we have not considered the severity associated with each class.
Another limitation is that although the ESDG is simple enough for representing small and moderate programs, for a large real-life program the ESDG may become too large and complex to manage [26], and the storage requirement will also be very high. For large programs, the influence value of a method may be computed by using traditional fan-in and fan-out metrics [94] in place of the ESDG. However, the advantage of using the ESDG over the traditional fan-in and fan-out method is improved accuracy: the ESDG shows the dependencies at the statement level and therefore identifies the statements that are actually affected when a method produces an incorrect result, whereas fan-in and fan-out show higher-level dependencies at the function or module level.

Summary
We have proposed a new metric called Influence Metric to identify the criticality of an element in the source code. It is based on static analysis of the source code.
The average execution time of a component within a system was estimated based on the operational profile of the system. Criticality for a component within a system is computed on the basis of its influence value and average execution time. Test priority is assigned to the components according to their criticality.
We have experimentally shown that decreasing the reliability of a high-priority class drastically increases the failure rate of the application, whereas this is not true for a low-priority class. So, the intensity with which each element should be tested is proportional to its test priority value. This helps the test manager to identify, before test case generation, the critical elements that would otherwise get less testing attention. The limitation of our criticality estimation method is that it does not consider any external factor. Our proposed test effort prioritization method would be more effective if the severity associated with the various failure modes of an element could be considered. So, we aim to consider severity in our next work.

Criticality Estimation
Prioritizing the program elements within a system according to their criticality before conducting the testing process promotes efficient testing of a software product by revealing important bugs at the early phase of testing, as discussed in Chapter 4. The user's perception is an indicator of the acceptance of a system. The user's view of the reliability of a system improves, at a comparatively low cost, when the faults that occur in the most frequently used parts of the software are removed [9,13,86,87]. The idea behind considering the average execution time of a class as a parameter is that when a class is executed for a longer time, there is a high probability that any existing error in the class will be exercised during the run, causing frequent failures of the system.
In Chapter 4, we have introduced a metric called Influence Metric for an element, that shows the degree of influence of that element toward system failures.
The idea of including structural complexity is to estimate the probability of the presence of faults within a component. The case studies discussed in [95] show that the location of residual bugs is strongly correlated with module size and complexity. For evaluating structural complexity, Chidamber and Kemerer [41] have proposed six metrics. We found that considering all six metrics at a time is complicated, time consuming and sometimes not useful for a particular purpose; at the same time, a single metric is not sufficient for complexity estimation. The use of at least two or three CK metrics gives a proper estimation of potential problems [58]. For our purpose, we use two CK metrics: RFC and WMC. RFC gives an idea of the longest sequence of method calls and WMC provides the cyclomatic complexity of each method implemented in a class. Our approach shows the complexity of a class within a program. In addition, it also shows the likelihood of the class failing in the operational environment, due to the consideration of the factor average execution time.

There are some components which exist within a system with low complexity, but the failure of any one of those components may have a catastrophic impact on the system. For example, a critical code may be called in case of an emergency, which happens infrequently but can have catastrophic impact, if an error occurs in that part.
The impact of the failure may cause severe damage to the system or a huge financial loss. So, for computing the criticality of a component, we consider the severity of the damage caused by the failure of the component within a scenario. The severity factor is dependent on the nature of the application. Hence, it is a subjective matter and is basically assessed by domain analyst, who has the knowledge of the environment in which the software will be used. The basic input for severity assessment is the costs of various failure modes. Detailed procedure of severity estimation is addressed in [96].
There is also a close relationship between testing and the business value that comes from the market or from customers [43]. The use cases of a system should not all be treated with equal importance [97]. Once the criticality of a component is estimated through our approach, exhaustive testing has to be carried out to minimize bugs in highly critical components. The total test effort is distributed among the various components within a system according to their criticality; the component with high criticality gets high priority for testing. As a result, not only are the post-release failures minimized, but the severe types of post-release failures are also minimized within the available test budget.
The rest of the chapter is organized as follows: Section 5.1 discusses the proposed methodology for estimating the criticality of a component within a system. Experimental studies are conducted to test the effectiveness of our approach, and the experimental results are presented in Section 5.2. The chapter summary is given in Section 5.3.

Our Approach
Our proposed methodology for criticality computation of a component consists of the following steps, discussed one by one in the subsections below.

Analyzing the structural complexity
Our aim is to find the complexity associated with a component by analyzing the complexity of the various services provided by the component. We consider only two CK metrics (RFC and WMC) out of the six metrics proposed in [41]. It has been experimentally shown that a component with high RFC and high WMC is fault-prone [98]. Hence, these two metrics are used as inputs to derive the complexity of a component for our purpose. RFC captures the set of member functions directly or indirectly called by the class, whereas WMC measures the complexity associated with all member functions of a class using cyclomatic complexity.
The RFC metric measures the cardinality of the set of methods that can potentially be executed in response to a message received by an object of the class [41]. In RFC, the basic unit is a method, which reflects the message-passing concept in O-O programming. The RFC value for a class is given by

RFC = |RS|, with RS = M ∪ (∪_i R_i),

where RS, M and R_i represent the response set of the class, the set of methods of the class and the set of methods called by the i-th method of the class, respectively. A class with a high RFC value indicates that the complexity of the services provided by the class is high and hence its understandability is low. When a large number of methods can be invoked from a class through messages, testing and debugging become complicated and it is also difficult to change the class because of the potential for a ripple effect. As testing and maintenance are complicated, the chance of bugs increases. We derive R_i using the intermediate representation of the source code, the ESDG. Our algorithm starts traversing from each method-entry vertex of a class, traverses only the call edges in a forward direction and generates the set of nodes called by each method of the class. This process is repeated for each method of the class and finally the sets are merged to obtain the response set RS of the class.
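A minimal sketch of this forward traversal over call edges is given below; it assumes a plain call-graph representation with hypothetical names (methodsOfClass, callGraph) instead of the ESDG used in our implementation.

import java.util.*;

// Sketch: deriving the response set RS of a class from a call graph.
// callGraph maps each method to the methods it directly calls (assumed input).
class RfcSketch {
    static Set<String> responseSet(Set<String> methodsOfClass,
                                   Map<String, Set<String>> callGraph) {
        Set<String> rs = new LinkedHashSet<>(methodsOfClass);   // M: the class's own methods
        Deque<String> work = new ArrayDeque<>(methodsOfClass);
        Set<String> seen = new HashSet<>(methodsOfClass);
        while (!work.isEmpty()) {                                // follow call edges forward
            String m = work.poll();
            for (String callee : callGraph.getOrDefault(m, Collections.emptySet())) {
                rs.add(callee);                                  // methods called directly or indirectly
                if (seen.add(callee)) work.add(callee);
            }
        }
        return rs;                                               // RFC = rs.size()
    }
}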
Luke [99] argued that there is really no way to know the software failure rate at any given point in time, because the defects have not yet been discovered; according to him, design complexity is positively and linearly correlated with the defect rate. Hence, the occurrence of software defects should be estimated based on McCabe's complexity value or Halstead's complexity measure [99]. We consider the WMC metric, which gives a rough estimate of the total complexity associated with a class and is correlated with defect rates [58]. It counts the local methods and sums the internal complexities of all local methods in a class [41], where the internal complexity of each method is its cyclomatic complexity. The WMC value for a class c is given by

WMC(c) = Σ_{i=1}^{M} W_i,

where M and W_i represent the number of methods in the class and the cyclomatic complexity of the i-th method, respectively. WMC helps to evaluate the minimum number of test cases needed for each method and is hence used as a guideline by the test manager to estimate how much time and effort are required to develop and maintain a class.
We estimate the probability of faults in a class based on the two parameters RFC and WMC. First, we assign a threshold value to each metric as defined by Rosenberg et al. [100]. For each parameter, we use only three weights: low (0.3), medium (0.5) and high (1) [100]. The assignment of points to the three weights is a rough guideline, and the threshold values assigned to the two parameters are as stated in [100]. The complexity information for the LMS and SMA case studies is shown in Tables 5.1 and 5.2, respectively. The classes that are within the preferred (acceptable) limit are low (medium) in complexity and the classes which exceed the acceptable limit are high in complexity.
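The sketch below illustrates this style of weight assignment; the numeric threshold limits and the way the two weights are combined into a single fault-proneness estimate are placeholders assumed for illustration, not the values prescribed in [100].

// Sketch of the three-level weight assignment (low = 0.3, medium = 0.5, high = 1.0).
// The numeric limits below are hypothetical placeholders, not the thresholds from [100].
class ComplexityWeightSketch {
    static double weight(int value, int preferredLimit, int acceptableLimit) {
        if (value <= preferredLimit) return 0.3;   // within preferred range: low
        if (value <= acceptableLimit) return 0.5;  // within acceptable range: medium
        return 1.0;                                // exceeds acceptable limit: high
    }

    // Example combination (assumption): average of the RFC and WMC weights.
    static double faultProneness(int rfc, int wmc) {
        double wRfc = weight(rfc, 50, 100);        // placeholder RFC limits
        double wWmc = weight(wmc, 20, 40);         // placeholder WMC limits
        return (wRfc + wWmc) / 2.0;
    }
}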

Severity analysis
Severity is a rating applied to the effect of a failure; it shows the seriousness of the effects of a failure within a system. The severity of a failure decides how a bug within a component affects the whole system. We inserted some bugs in various components within a system and executed the system for some duration in the operational environment. We observed that similar types of bugs in different components cause failures with different severity. Hence, we use the severity factor of a component as a measure of the overall quality of the product. We consider a component to be critical if its failure causes a severe effect on the whole system. In our proposed criticality evaluation method, our aim is to first reveal bugs from highly critical components and then reveal bugs from less critical components. If there is an urgency to release the system early, or the testing time is shortened due to unavoidable circumstances, the test manager should ensure that the bugs responsible for severe types of failures are revealed and fixed.
Though testing focus should be given to the parts of the code that are executed frequently [9,13,63], there is also a need for severity analysis to achieve better system quality. Some parts of the source code are executed only in case of an emergency; though these parts execute rarely, the existence of a bug in them may cause a severe failure. For example, consider a component which provides exception handling for rare but critical conditions: the component is executed rarely, but a failure in it can be severe. Now, we discuss FMEA for a software system.

Software Failure Mode and Effect Analysis (SFMEA)
The detailed SFMEA focuses on the classes or modules in which several error conditions are checked. Table 5.3 shows various types of errors that may occur within a software module/class at the design or coding stage.
The failure modes and their effects listed there include the following:

- The module may carry out estimations wrongly due to faulty requirements or wrong coding of requirements.
- Calculation underflow or overflow: The algorithm may result in a divide-by-zero state.

Error in data
- Unacceptable data: The module may accept out-of-range or wrong input data, no data, the wrong data type or size, or premature data; produce wrong or no output data; or both.
- Input data trapped at some value: A sensor may read zero, one, or some other value.
- Bulky data rates: The module may not be able to handle vast amounts of data or many input requests simultaneously.

Error in logic
- Wrong or unpredicted commands: The module may receive improper data but continue to execute a process. It may be intended to do the proper thing under an improper situation/state.
- Failed to issue a command: The module may not call a routine under certain circumstances.

Ozarin [102] has discussed the advantages of performing SFMEA at various levels: (i) method-level analysis, (ii) class-level analysis, (iii) module-level analysis and (iv) package-level analysis. According to him, the SFMEA process is most accurate and effective at the method level, which is the lowest level of analysis. The authors of [103] consider that a method within a software system is equivalent to a part of a hardware system in which there is a chance of failure under certain conditions: if a method within a class does not perform according to its pre-defined specification, there is a chance of failure of the whole system under some conditions. At the time of testing, the debugger analyzes the root cause of a bug and extracts the method within a class, and specifically the instruction within the method, which is the source of the bug. If any failure occurs at the testing phase, a significant amount of searching is conducted to find the exact faulty parts in the source code, and specifically the exact faulty instructions of a method.
As the source code is available at this stage, we conduct operation-level or method-level SFMEA. During the execution of a scenario, a number of objects communicate through message passing, which is implemented through method calls. A method within a class may or may not have formal parameters and may or may not have a return value. To identify the severity of a class within a system, we have to identify the various types of failure modes within a method of the class, and we also have to estimate the severity of each failure mode by seeding some bugs, observing the failures and estimating the impact of the failures. To estimate the severity level of a failure mode, we take the views of domain experts.

Method level failure modes and effect analysis
A method performing important tasks is generally viewed as an agent which has to fulfill a contract to perform its operation. A method may or may not have formal parameters and may or may not return a value. A method maintains pre-conditions and post-conditions that explicitly state its agreement for performing a task: a pre-condition is the entry condition to perform the task and a post-condition is a condition that must be true after the completion of the task. Similarly, a class invariant states constraints that must be true for its objects at every instant during the lifetime of an object. A method's job is divided into two parts: (i) constraint checking and (ii) the actual logic to perform the task. We assume that there is no time constraint when a method is performing its task. In this chapter, we consider four failure modes of a method as defined in [103].
These are:
1. Pre-condition Violation Failure Modes (F1): There are two sub-failure modes: (i) the pre-condition is not satisfied but its corresponding exception is not raised (F1.1), and (ii) the pre-condition is satisfied but its corresponding exception is raised (F1.2).
   ii. m1 invokes m2 with wrong parameters (when m2 has parameters; we consider only the parameter's value, not its type), F3.1.2.
(b) When a method m1 of class A invokes a method m2 of class B, the following failure modes are possible in the list of failure modes of m1:
   i. m1 fails to invoke m2 (because of the lack of an instance of an object of class B),
   ii. m1 invokes m2 in the wrong order (when the invocation of m2 is conditional),
   iii. m1 invokes m2 with wrong parameters (when m2 has parameters; we consider only the parameter's value, not its type).
According to [101], severity is classified as:
1. Catastrophic: A failure may cause death or total system loss.
2. Major: A failure may cause very serious effects; the system may lose functionality, raise security concerns, etc.
3. Marginal: A failure may cause minor injury, minor property damage, minor system damage, or a delay or minor loss of production, such as losing some data.
4. Minor: Defects that cause small or negligible consequences for the system, e.g. displaying results in a different format.
We assign severity weights of 0.25, 0.50, 0.75 and 0.95 to the Minor, Marginal, Major and Catastrophic severity classes, respectively, as defined in [81,82]. The damage may be classified into the classes mentioned above or quantified as a monetary value, whichever the analyst finds better. For example, if a large volume of data sent by mail is wrong, the cost of re-mailing will be considerable.
The column Effect shows the effect of the failure mode on the system. The column Severity records one of the four severity classes (Catastrophic, Major, Marginal or Minor) discussed above.

Business value estimation
For ATM, the main use cases are deposit, withdraw, inquiry balance and transfer money. The business value (Value) of a component is estimated in two steps: (1) constructing the Component Dependence Diagram (CDD) of the program, and (2) extracting slices of various scenarios from the CDD and using the slices to estimate the business value of a component. We view a CDD as a simplified form of a System Dependence Graph (SDG) [29], but a CDD does not have as many types of edges as an SDG [29]. Unlike an SDG, a CDD does not represent the individual statements of a program, because the inclusion of individual statements would make the graph unnecessarily complicated.

Component Dependence Diagram
In the Extended Control Flow Graph (ECFG) [104], a node refers to a method of an object-oriented program, whereas in a CDD, a node refers to a component. The aim of letting a node of the CDD refer to a component instead of a method is to keep the graph simple and easily understandable. We compare the CDD with the Component Dependence Graph (CDG) proposed by Yacoub et al. [70], where nodes also refer to components. They adapted the control flow graph principle to represent the dependency between two components and the possible execution paths. Unlike our approach, Yacoub et al. [70] considered only the control dependency between components; in their approach, the components are assumed to be independent, so the existence of a bug in one component is not responsible for the failure of another component. As we also consider the data dependency between two components, a bug in one component may have an effect on other components.
The CDD generated in our approach satisfies all of the following constraints:
X No node is isolated.
X All use cases put together cover all nodes.
X No self loops.
X The node at which a use case starts execution is not control dependent on other nodes of the graph.
X The nodes tested by any one test case are a subset of nodes belonging to slice of a scenario.
Once the intermediate graph, CDD, is constructed, we use it to extract slices with respect to various scenarios for prioritizing the components within a system.

Extracting slices of CDD with respect to various scenarios and estimating the Value of a component
Each use case has one main scenario and a number of alternative scenarios. We consider only the main scenario and do not consider the alternative scenarios. The Value of a use case is the same as the Value of its main scenario.
We compute the slice of the CDD with respect to a scenario S_i and represent it as Slice(CDD, S_i). The slice contains the set of components that are either executed during the execution of the scenario S_i, or whose results, saved in different variables, are used during the execution of S_i.

Value estimation scheme
Once the business values of all scenarios of a system are determined, we estimate the business value Value(C_i) of a component C_i as follows (Equation 5.1):

Value(C_i) = Σ_{j=1}^{nos} q_j,  where q_j = Value_j if C_i ∈ slice(S_j), and q_j = 0 otherwise.

In Equation 5.1, nos is the number of scenarios within the system, Value_j is the probability of the j-th scenario and slice(S_j) is the slice of the CDD with respect to the j-th scenario. The Value of a component intuitively indicates the priority of its being used during an actual operation of the program. We now present our business value estimation scheme for the various components within a system algorithmically.

Value estimation Algorithm

1. Construct the CDD of the program.
2. Determine the business value Value_i for each scenario S_i of the system.
3. For each component C_j of the CDD, set Value(C_j) = 0.
4. For each scenario S_i of the program:
   (a) Compute the slice of scenario S_i from the CDD.
   (b) For each component C_j in slice(S_i), set Value(C_j) = Value(C_j) + Value_i.
5. Print the Value computed for each component.
We now explain our Value estimation method using a simple example. Assume that the program shown in Figure 5.1a has two use cases, U_1 and U_2, each with only one scenario. Let the business values associated with scenarios S_1 and S_2 be 0.8 and 0.2, respectively.
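A minimal sketch of the Value estimation scheme applied to this two-scenario example is shown below; the component names and slice contents are hypothetical, and the CDD is abstracted to a map from scenarios to slices.

import java.util.*;

// Sketch of the business value estimation over CDD slices (Equation 5.1).
class ValueEstimationSketch {
    public static void main(String[] args) {
        // Business value of each scenario (from the example: S1 = 0.8, S2 = 0.2).
        Map<String, Double> scenarioValue = Map.of("S1", 0.8, "S2", 0.2);

        // Slice of the CDD with respect to each scenario (hypothetical components).
        Map<String, Set<String>> slice = Map.of(
            "S1", Set.of("C1", "C2", "C3"),
            "S2", Set.of("C2", "C4"));

        Map<String, Double> value = new HashMap<>();
        for (String s : scenarioValue.keySet()) {
            for (String c : slice.get(s)) {
                // Value(C) accumulates Value_j for every scenario whose slice contains C.
                value.merge(c, scenarioValue.get(s), Double::sum);
            }
        }
        value.forEach((c, v) -> System.out.println(c + " : " + v));
        // Here C2 belongs to both slices, so Value(C2) = 0.8 + 0.2 = 1.0.
    }
}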

Criticality computation
For assigning criticality, the commonly used method is to perform a proper weight assignment and then calculate a weighted sum for a class [105]. We assign a relative weight to each chosen factor of a class; for each factor, we assign an equal weight, though the weights may vary depending on the nature of the system. An example of criticality computation for a component is shown in Table 5.6. There are many technical, productivity and environmental complexity factors within a system; for simplicity, we consider only five factors for criticality estimation. Considering a larger number of factors would improve the accuracy of the criticality estimation method, but it would make the process complicated and confusing. In Table 5.6, we assign a weight of 1 to each factor; this may vary from application to application and is purely a subjective matter.
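For illustration, with equal weights w_k = 1 as in Table 5.6, the weighted-sum criticality of a component C_i takes the form

\[
\mathit{criticality}(C_i) = \sum_{k=1}^{5} w_k\, f_k(C_i)
= w_1\,ET(C_i) + w_2\,\mathit{infl}(C_i) + w_3\,\mathit{complexity}(C_i) + w_4\,\mathit{severity}(C_i) + w_5\,\mathit{Value}(C_i),
\]

where the factor symbols follow the chapter; any normalization of the individual factors onto a common scale is assumed here for illustration.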

Experimental Studies
We applied our proposed criticality estimation method to the LMS, SMA and ATM case studies and prioritized the test effort according to the estimated criticality. These case studies are implemented in Java and were introduced in Chapter 4. We used fault seeding to evaluate the effectiveness of our approach; fault seeding has been shown to be an effective practice for measuring the efficiency of a testing method [106]. We carefully chose some mutation operators to seed bugs randomly. The fault density was kept constant at 0.05 for each case study; this means that in a case study consisting of 1000 lines, 50 bugs were inserted. The seeded faults are either class mutation operators [93] or interface mutation operators [22,107]. The class mutation operators target object-oriented specific features which Java provides, such as class declaration and references, single inheritance, interfaces, information hiding and polymorphism. In this chapter, we have considered four class mutation operators to simulate the faults. For integration testing, we used the coupling-based approach [108], which is based on the client-server concept: when a client class calls another server class, first some method sequences of the client class are considered.
These method sequences are a subset of the set of method sequences decided at unit testing. Then, for each method sequence, the method sequences of the called (server) class are decided; one server class is considered at a time for each client class. For one client method sequence, there can be a number of server method sequences. At this level, testing is effective if the method sequences of the client class are complete. As we thoroughly tested the classes with high criticality at the unit level in the second copy of each case study, we used coupling-based integration testing [108] to cover all possible interface faults of the critical classes.
We took the help of the coverage analysis tool JaBUTi [109] to obtain the coverage report of a test case. An example of the coverage report produced by JaBUTi for two test cases at the unit level is shown in Figure 5. The mutation score S for a test set T is defined as follows.
MutationScore(S, T) = #dead mutants / (#mutants seeded − #equivalent mutants).

Table 5.7 shows the mutation scores of the test sets generated by the two different testing methods. In Table 5.7, it is observed that MS_T and MS_P are nearly equal. In the LMS case study, the mutation score of the first testing method is higher, whereas in the SMA and ATM case studies the mutation score of the second testing method, in which our approach is applied, is higher. We observed that our method is equally competent with the first testing method in finding mutants.
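As a worked example with hypothetical counts: if 50 mutants are seeded in a case study, 4 of them turn out to be equivalent and a test set kills 40, then

\[
\mathit{MutationScore} = \frac{40}{50 - 4} \approx 0.87.
\]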
As we consider average execution time, influence value and severity for test effort prioritization in addition to structural complexity, we claim that our method exposes the important bugs, which are responsible for frequent failures or severe failures. We conducted another set of experiments to check the types of failures observed in the operational environment.
After resolving the detected bugs, we found that some residual bugs remained in both copies of the case studies. A few bugs were detected toward the end of testing and could not be fixed due to the shortage of testing time. At this point, we again emphasize that our aim is not to achieve complete fault coverage with a minimal test suite size. We fixed a test budget for each case study before the testing phase, and our aim is to assess the efficiency of both testing methods within the available test budget. Therefore, after the completion of the testing phase, we observed the effect of the residual bugs in both copies of each case study by invoking random services. For this, new system-level test cases were randomly generated based on the operational profile [9] to observe the behavior of the system at the post-release stage. At this point, we did not fix any detected bug. An analytical comparison of the two testing methods was done by running the same input set on the systems produced by the two testing methods; the tested source code of each case study was executed again to observe its behavior in the operational environment.

Result analysis
The results of our simulation studies are summarized as follows. Through a detailed analysis of the results of both testing methods, we conclude that our proposed test effort prioritization method helps to minimize the post-release failures of a system and also helps to minimize the catastrophic and major types of failures in the operational environment. As a result, the user's perception of the overall reliability of the system is improved. The efficiency of our proposed method improves further if the software is run for a long duration with a number of test cases based on the operational profile. We observed in all three case studies that the performance improvement achieved by our method increases considerably when the system is executed for a long time.

Summary
We have proposed a criticality estimation method at the code level and prioritized the test effort for the various elements within a system according to their estimated criticality.

A test effort prioritization technique helps the tester to do the best possible job within the available test resources [105]; the tester gets the best possible chance to reveal the important bugs, i.e., those that reside within the critical functions and modules of the system. Our aim is to identify the criticality of a component before the testing phase and allocate test effort to the component according to its criticality. If bugs in the critical components are detected and fixed during testing, the post-release failure rate of the system will be reduced. The importance of a component may vary at different points of the testing phase. If a component has failed in the past, then there is a possibility that it will fail in the near future [54,110]. Hence, we analyze the failure history of a component within a system and use it as a factor for estimating the test priority of the component in the next phase of testing.
We propose a multi cycle-based test effort prioritization approach to test the basic functionalities of the system. We institute three different test cycles meant to focus on different aspects of the quality of the system, the first being coverage of critical components.

In the existing prioritization-based testing methods [4,5,6], priorities are assigned to the test cases and the priority assignment is done only once for the entire duration of the test. Unlike the existing approaches, we assign priorities to the program elements instead of the test cases, and we assign different priorities to the same program element at different test cycles. A stipulated time period is set for a test cycle; the duration of a test cycle may vary under certain circumstances, but the duration of the entire testing time is fixed.
In the first test cycle, we estimate the criticality of a component within a system on the basis of its influence value, severity and execution probability. In our previous work (Chapter 4), we presented a static metric to compute the influence value of a class within a system. A dynamic metric captures the dynamic behavior of an application and helps the analyst to make a good test plan; in this chapter, we propose an algorithm to obtain the influence value through dynamic analysis of the source code. We assign test effort to the components according to their estimated criticality. From a business point of view, test effort distribution based on the return on investment is effective [20]. In the third test cycle, we first prioritize the use case scenarios within a system based on their business values and then conduct only system testing.
We apply our proposed multi cycle-based test effort prioritization approach to the LMS, SMA and ATM case studies, which were introduced in Chapter 4. We illustrate our approach through the ATM case study. Figure 6.1 shows the communication diagram of the withdraw use case of ATM; we use it as a running example in the next section.
The rest of the chapter is organized as follows. We discuss our proposed multi cycle-based test effort prioritization approach in Section 6.1 and present the experimental studies in Section 6.2. We give a summary of the chapter in Section 6.3.

Our Approach
It consists of the following steps, each discussed in detail below. The run-time information collected during execution is used to compute the dynamic influence of an object. At the object level, our algorithm shows to which objects a given object provides services, and how many times to each of them, within a scenario; that is, we check how many objects use the given object, directly or indirectly, within the scenario. At the statement level, our algorithm shows how many statements are affected by the given object out of the total number of statements executed by the test case.

First test cycle
For simplicity, we assume that one use case consists of one scenario; only the main scenario is considered and the alternative scenarios are not. At run time, our Dynamic Influence Metric approach maintains all dynamic information such as the occurrence of object creations and deletions, invocations of various methods, attribute references, etc.
The successful execution of a method depends on the corresponding state of its object. For any unpredictable behavior of a method, it is required to check the consistency of the state of the corresponding object. Our object slicing approach helps the tester check the state space before and after the execution of a method through its data members. Our approach acts as an active monitor and reports the objects that are responsible for changing the state of the corresponding object within a scenario. Our dynamic slicing approach overcomes some limitations of the existing graph reachability methods for slicing [28,31,38]; the main limitation of these existing methods is that when the slicing criterion changes, the computation has to start again from the slicing point, and the slices for different variables at different nodes are obtained by traversing the graph several times starting from the slice point.
The advantage of our slicing approach is that the previous results saved in memory can be reused instead of starting from the beginning every time. The dynamic slice of an object at any execution point is the combination of the dynamic slices of its data members. Mund and Mall [111] have proposed an interprocedural dynamic slicing algorithm to compute the dynamic slices of procedural programs; its advantage is that previously computed results saved in memory are reused instead of recomputing from the beginning every time. However, they did not consider object-orientation aspects. We have extended their work to obtain the dynamic slices of object-oriented programs. With this new dynamic slicing approach, we compute the Dynamic Influence Metric, which gives the influence value of an object by checking its contribution at every execution step.
We propose an algorithm called Influence Through Dynamic Slice (ITDS) to compute the influence value of an object within a scenario. The rationale behind this algorithm is to prioritize the regions of the source code for testing because, some components of a program are more critical and sensitive to bugs than others, and thus should be tested thoroughly. In this section, we first provide the definitions used in our algorithm and then, present our proposed algorithm, ITDS. We also explain the working of our proposed algorithm through an example.

Definitions used in the algorithm
Before presenting our proposed algorithm, we first introduce a few definitions used in it. Def(var) and Use(var) represent the sets of nodes in the intermediate graph that define and use the variable var, respectively. During the execution of a program, a statement always corresponds to a node n in the CDG. In the rest of the thesis, we use the terms vertex and node interchangeably.

Def. 1. RecDefVar(v) and RecDefControl(n):
where var_1, var_2, ..., var_k are the variables used at node n and S is the most recently executed control node under which node n is executing.

Def. 5. Formal(n, f), Actual(n, a):
When a function is called at node n, some parameters may be passed by value or by reference. If at the calling node n the actual parameter is a and its corresponding formal parameter is f, then Actual(n, a) = f and Formal(n, f) = a. Examples for each of the above definitions are given later in this section through an example program (working of the ITDS algorithm).

Algorithm ITDS
The input data are provided to run the program and the name of the desired object is provided to calculate its influence; ITDS produces two outputs. First, the Control Dependence Graph (CDG) [112] of the program is constructed. We store the frequency with which a node is used by a given object at run time, as there is a difference between a node used by ten different objects and a node used by one object ten times. During the execution of a scenario, our algorithm maintains the set of objects that are dependent on a given object and computes the influence value of the given object within the scenario.
During the execution of a program, we maintain, for each variable used during program execution, a set of dependent nodes. Our algorithm, ITDS, checks whether the currently executed node uses the desired object, for which the influence value is being computed. The currently executed node is added to the influence set of the desired object if it uses any node from the dependence set of the desired object. We use a data structure named Active object set to obtain the list of currently active objects at any instant of execution. When a method is invoked, all the data members of the corresponding object are treated as passed by reference. Now, we present our algorithm, ITDS, in pseudo-code form.

Algorithm: ITDS(CDG, Object O)
Input: CDG of an object-oriented program and the desired object for which the influence value will be calculated. When a control node is executed, the algorithm maintains the set of nodes on which the control node is dependent. If a node is a call node or a return node, the algorithm performs some operations before and after the execution of the node. When a new object is created due to the execution of a call node that calls a constructor, the algorithm maintains a dependence list for each data member of the object from that execution point onward. The algorithm maintains a list of the objects that are interacting at any execution point. It maintains a stack to store the nesting of calls, which is updated before and after the execution of a call node. The algorithm updates the data structures of the formal parameters with those of the corresponding actual parameters.
Once the data structures are updated after the execution of a node, the algorithm performs a set of operations to check whether the currently executed node should be included in the influence list of the given object. When a slicing command is given to obtain the dynamic slice of an object, the algorithm computes the dynamic slice of the object by taking the union of the dynamic slices of its data members. After the execution of the program is completed, the algorithm checks how many nodes were executed and, out of those, how many nodes used the given object.
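The following is a minimal sketch, in Java, of the bookkeeping described above. It assumes a simplified execution trace in which each executed node reports the variables it defines and uses; the class name, method signatures and the fraction-based influence value are illustrative assumptions and not the thesis implementation of ITDS.

import java.util.*;

public class InfluenceSketch {
    private final Map<String, Set<Integer>> dynSlice = new HashMap<>(); // dynamic slice per variable
    private final Set<Integer> influenceSet = new HashSet<>();          // nodes influenced by the object
    private int executedNodes = 0;

    // Process one executed node of the trace: 'uses'/'defs' are variable names,
    // 'members' are the data members of the desired object.
    public void executeNode(int node, Set<String> uses, Set<String> defs, Set<String> members) {
        executedNodes++;
        Set<Integer> combined = new HashSet<>();
        combined.add(node);
        for (String u : uses) combined.addAll(dynSlice.getOrDefault(u, Collections.emptySet()));

        // The node joins the influence set if it uses a data member of the object
        // or any node already in the object's dynamic slice.
        Set<Integer> objSlice = objectSlice(members);
        boolean usesMember = !Collections.disjoint(uses, members);
        boolean usesSliceNode = !Collections.disjoint(combined, objSlice);
        if (usesMember || usesSliceNode) influenceSet.add(node);

        // Previously saved slices are reused: the slice of each defined variable
        // is simply overwritten, with no re-traversal of the graph.
        for (String d : defs) dynSlice.put(d, new HashSet<>(combined));
    }

    // Dynamic slice of the object = union of the dynamic slices of its data members.
    public Set<Integer> objectSlice(Set<String> members) {
        Set<Integer> s = new HashSet<>();
        for (String m : members) s.addAll(dynSlice.getOrDefault(m, Collections.emptySet()));
        return s;
    }

    // Influence value: fraction of executed nodes that used the object (an assumption).
    public double influenceValue() {
        return executedNodes == 0 ? 0.0 : (double) influenceSet.size() / executedNodes;
    }
}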

Working of ITDS Algorithm
Consider the example program shown in Figure 6.2a. The Control Dependence Graph (CDG) of the example program is shown in Figure 6.2b. During the initialization step, the algorithm sets CallSliceStack = ∅ and ActiveCallSlice = ∅. We ran the program with input 12 for n and computed the influence value of object bx after the execution was completed. We now have the following for some executed nodes of the program. A node is added to the influence set of an object if the node uses a node that belongs to the dynamic slice of that object. We have checked this in Statement

Criticality computation for a component
We obtain the normalized criticality of a component within a system using Equation 6.2. Now, we apply our proposed approach to the ATM case study. For simplicity, we consider only the main scenario of a use case. As we consider only one scenario of a use case, the execution probability of a use case is assigned to its main scenario. We assume the execution probabilities of the various use cases of ATM as given in Table 6.2. We have computed the criticality of the various components within the whole system using Equation 6.3 and observed that the components Session, NetworkToBank, CardReader, Withdrawal and CustomerConsole are more critical than the others within the ATM system.

Priority assignment and testing
In this cycle, we prioritize the components within a system according to their criticality. At the unit level, the percentage of code coverage for the various components is decided based on their priority values. For example, 100% statement coverage and 90% decision coverage may be conducted for the most critical component, whereas the coverage may be lower for a component with low criticality. Similarly, at the time of integration testing, 90% parameter coverage and 80% interface coverage may be conducted for a highly critical object, whereas it may be lower for others.
At the time of system testing, the test cases are selected keeping in mind that the high-priority components should be executed more times than the others.
Hence, the cost of a test case 1 is considered as the sum of the criticality of the various components that are covered by the test case. The cost of a test case T_i, denoted as Cost(T_i), is expressed as follows:
Cost(T_i) = Σ_{C_k ∈ exe_set(T_i)} priority(C_k)
where exe_set(T_i) is the set of components that are covered by T_i and priority(C_k) is the criticality of the k-th component, C_k, in the set exe_set(T_i).
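As a minimal sketch, assuming the component criticalities have already been computed (for example, via Equation 6.3) and stored in a map keyed by component name, the cost of a test case could be computed as follows; the class and method names are illustrative, not part of the thesis tooling.

import java.util.Map;
import java.util.Set;

public class TestCaseCost {
    // Cost(T_i) = sum of the criticalities of the components covered by T_i.
    public static double cost(Set<String> exeSet, Map<String, Double> criticality) {
        double sum = 0.0;
        for (String component : exeSet) sum += criticality.getOrDefault(component, 0.0);
        return sum;
    }
}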

Second test cycle
In this cycle, we first extract the objects one by one according to their priority and compute the dynamic slice of each object based on our proposed ITDS algorithm, discussed in Section 6.1.1. This gives us the dependent objects of the said object. During the testing phase, we give importance not only to the failed objects but also to the set of objects that are dependent on the failed objects, because these objects might be infected through the failed objects.
As the priority criterion is changed in this cycle, the priority values of some components may change.

Third test cycle
In this cycle, we conduct value-based testing [20] with the aim of getting a high return on investment and improving customer satisfaction with testing. To conduct value-based testing, it is required to know the business value associated with a high-level requirement or feature. A feature is a characteristic or attribute of a product for which work must be done to develop and deliver it. A feature within a software product provides some business value. A feature of a product is delivered to the customer with the hope of obtaining some benefit at a reasonable cost. For a feature, the value is roughly defined as the amount the stakeholder is willing to pay for the implementation of the feature.
Business value is estimated based on the relationship among satisfying needs, expectations and the resources required to achieve them [20]. According to Wiegers [44], Business Importance shows the business value of a feature. It is the weighted sum of two factors: the benefit of including a feature within a system and the penalty of not including the feature within the system. Benefit is associated with the requirements of the product's business. Penalty is associated with the consequences that the customer or business would suffer if the feature were not included. Both the benefit and the penalty are judged by the customer representatives of the software. For example, failing to comply with a government regulation could incur a high penalty even if the customer benefit is low. A set of requirements with a low benefit and a low penalty adds cost but little value. As benefit and penalty are the two factors associated with Business Importance, we define the weights of the features as a vector [1].

Weights of Business Importance
where W_b and W_p specify the weights associated with benefit and penalty, respectively. The Total Business Importance is defined as follows:
Total Business Importance = W_b × Benefit + W_p × Penalty (6.5)
We obtain the Total Business Importance on the basis of the individual rankings of the benefit and penalty of a working feature. We obtain the normalized Business Importance of a feature by normalizing the Total Business Importance. Figure 6.3 [1] shows the process for estimating the business importance of the various features within a product.
In the figure, Business Importance index shows the variations in business values of a product over a period of time.
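A minimal sketch of Equation 6.5 and its normalization is given below. It assumes that benefit and penalty are rated on a common relative scale by the customer representatives and that the normalization divides each feature's total by the sum over all features; the latter is an assumption, as the text does not spell out the normalization step.

public class BusinessImportance {
    // Equation 6.5: Total Business Importance = Wb * Benefit + Wp * Penalty.
    public static double total(double benefit, double penalty, double wb, double wp) {
        return wb * benefit + wp * penalty;
    }

    // Assumed normalization: each total divided by the sum over all features.
    public static double[] normalize(double[] totals) {
        double sum = 0.0;
        for (double t : totals) sum += t;
        double[] norm = new double[totals.length];
        for (int i = 0; i < totals.length; i++) norm[i] = (sum == 0.0) ? 0.0 : totals[i] / sum;
        return norm;
    }
}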
The business values for various use cases of ATM are shown in

Experimental Studies
We have implemented our proposed multi cycle-based test effort prioritization approach on three case studies and checked the effectiveness of our approach by comparing it with a related approach. We empirically evaluate our approach through ATM, LMS and SMA case studies, explained in Chapter 4.
In order to verify the effectiveness of our approach, we have carried out a series of experiments on the case studies. It has been shown that mutation testing is an effective practice for measuring the efficiency of a testing method [106]. A mutant is said to be killed when it is executed by a test case and the test case fails. We selected seven class mutation operators for our experiment from the mutant model [93]. These operators are mainly designed to modify object-oriented features such as inheritance, polymorphism, dynamic binding and encapsulation.
The mutants were selected after careful consideration of the various types of unit-level and integration-level faults that may occur during source code implementation. Table 6.5 shows the fault detection capabilities of the two prioritization approaches, our approach and Musa's approach. From Table 6.5, it is observed that more mutants were killed by our approach than by Musa's approach. As our approach considers the influence value of an object and also gives importance to the faulty objects from the test history, more faults were detected by our approach than by Musa's approach. However, it is not true that a testing approach which is effective in detecting faults is also effective in improving the reliability of a system. The reliability of a system is not related to the number of existing faults in the system under test, but to the probability that a fault leads to a failure during software execution [9,13,14]. This is because the input data supplied by the user decide which parts of the source code are executed; an error existing in the non-executed parts will not affect the output.
We conducted a series of experiments for assessing the reliability of the outcomes of the two discussed approaches after completion of the testing processes. The software reliability of a system, R, is calculated as given below,
where p_i and Θ_i represent the execution probability and failure rate of the i-th sub-domain, respectively.
Θ_i is computed as follows:
Θ_i = (Σ_{j=1}^{n_i} z_ij) / n_i
where z_ij represents the execution result of a test case selected from the i-th sub-domain for the j-th time. The value of z_ij is 1 if a failure is observed; otherwise, the value is 0. n_i is the total number of test cases selected from the i-th sub-domain, and Σ_{i=1}^{m} n_i = n, where n is the total number of test cases executed on the system and m is the total number of sub-domains of the input domain. Table 6.6 shows a subset of the test cases designed for the ATM case study. Table 6.7 shows the reliability computed for the tested source codes obtained by using our approach and Musa's approach, respectively. In this table, Copy1 is the source code tested by Musa's approach [9] and Copy2 is the source code tested by our multi cycle-based approach.
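The exact expression for R is not reproduced in the text above. The minimal sketch below therefore assumes the common Nelson-style form R = 1 − Σ p_i Θ_i, with Θ_i as defined above; this combination is an assumption, not a quotation of the thesis equation.

public class ReliabilityEstimate {
    // p[i]   : execution probability of the i-th sub-domain.
    // z[i][j]: 1 if the j-th test case drawn from sub-domain i failed, else 0.
    public static double reliability(double[] p, int[][] z) {
        double r = 1.0;
        for (int i = 0; i < p.length; i++) {
            int failures = 0;
            for (int zij : z[i]) failures += zij;
            double theta = (z[i].length == 0) ? 0.0 : (double) failures / z[i].length; // Theta_i
            r -= p[i] * theta; // assumed combination: R = 1 - sum_i p_i * Theta_i
        }
        return r;
    }
}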

Result analysis and discussion
From Table 6.7, we observed higher reliability for Copy2 than for Copy1 in each case study. We now discuss some situations in which testing based on the Operational Profile, as implemented in Musa's approach, does not give good results.
For example, consider the following situation. Suppose there is a fault in a method m which executes for a short duration. The return value of m is saved and used by some frequently executed methods of other components. If method m returns a wrong value, the impact of the failure on the system will be high. In this situation, the fault in method m may not be detected, as the method gets less attention in Musa's approach due to its low execution probability. This is better explained through the graphical representation of a simple example instead of going into the details of the case studies. Consider the sequence diagrams shown in Figure 6.6. Suppose the execution probabilities of SD1 and SD2 are 80% and 20%, respectively. The average execution time of class D is the lowest, as it is used only in SD1, but its influence value is high, as it provides services to a number of classes. As shown in Figure 6.6a, the value returned by class D is used in classes B, C and A, directly or indirectly. If class D returns an incorrect value, the highly executed classes A, B and C will be affected, which increases the failure rate of the overall system. Class D gets less attention in Musa's approach due to its low execution time. As we consider the influence value of an object as one factor for test effort prioritization, class D is not neglected in our test approach.
It therefore receives appropriate test effort. Another observation, which we have not explicitly shown, comes from the LMS case study, where certain categories of users are only allowed to access journals and transactions. Due to an ICE mutant, the system allowed a non-teaching staff member to issue a journal.
As we included severity analysis as one attribute for testing and considered the business value of a scenario, such major failures were not observed in the tested source code obtained using our approach. We now discuss one minor failure that was observed in Copy2 of the ATM case study, which was tested through our approach. We inserted an invalid card. The system opened the transaction screen and allowed a transaction instead of ejecting the card, although no transaction was actually performed with the invalid card. This is shown in Figure 6.7.
From the log file shown in Figure 6.7b, it is observed that neither a deposit nor a withdrawal transaction was performed with Card# 4, yet the system did not eject the card after recognizing it as invalid. As shown in Figure 6.7a, the card was ejected only when the user did not want to continue with any further transaction.

Summary
In this chapter, we have proposed a multi cycle-based test effort prioritization approach for improving the reliability of a system. Our aim is to minimize the critical faults in a system which are responsible for frequent or severe failures in the operational environment.

Ranking Use Cases for Testing
A use case is related to a set of requirements. Cockburn [113]  As the execution probabilities of both sub-domains are nearly equal, each will get almost equal test effort, even though the second input sub-domain is more failure-prone than the first.
A complex program might contain more faults than a simple program [55]. As complexity is the most important bug generator, the complexity metric is used as a parameter for testing [56,57]. The complexity can vary from one use case to another. In a moderate-size application, a simple use case generally takes at most five steps for its success scenario and its implementation involves few classes. A complex use case takes at least ten steps and its implementation involves a number of classes. The job of the test manager is to estimate the complexities associated with the various use cases and consider complexity as an important factor at the time of test planning. Although the estimation of complexity for high-level functions at the analysis stage is a tough task, it is better to estimate it as early as possible and refine it at the low level, rather than delaying the test estimation and proceeding in an unplanned fashion. Although the complexity of a use case is related to its fault density, the observed failures within a system are also related to the execution probabilities of the various use cases that lead a fault to a failure. The main objective of software testing is to improve reliability rather than only to detect defects. For this, the test cases should be selected based on both criteria: (i) the defect distribution and (ii) how the software is used. The defect distribution is estimated based on the complexity of the system, and the expected use of the software is decided based on the operational profile of the system.
To identify the failure-prone use cases, we consider the execution probability of a use case along with its estimated complexity and call the result the Operational Complexity (OC).
There is a close relationship between testing and the business value of a high-level function that comes from the market or from customers [43]. The use cases of a system should not all be treated with equal importance [97]. Keeping this in view, we propose a test effort prioritization method to estimate the test priority of a use case within a system on the basis of three factors: (i) complexity, (ii) execution probability and (iii) business value (Value). The use cases of a system are ranked according to their priority values. Our proposed prioritization method provides a path to discover the truly critical use cases. This ranking method helps the developer and the test manager to take decisions on test effort distribution in a critical environment, where the customer's expectation of the overall quality of the system is high, timelines are short and resources are limited. It is observed that some use cases with high complexity are less valuable to the organization. The balancing strategy is to assign less effort to low-ranked use cases, which saves resources that can then be used for high-ranked use cases. As the important use cases get a chance to be tested rigorously through our proposed approach, the reliability of the system under test is improved.
The rest of the chapter is organized as follows: we first describe the factors that influence the complexity of a use case (Section 7.1), then explain how the complexity and test priority of a use case are computed, and finally present our experimental studies and a summary of the chapter.

Complexity Factors
When the test plan is made before coding, at the design level, the test manager considers the architecturally relevant aspects. The difficulty lies in analyzing all the architecturally relevant aspects of a use case and ranking it appropriately.
We propose the following eight factors that affect the complexity of a use case.

Sum of complexities of Linearly Independent Paths within a SD
Cyclomatic complexity is defined as the number of linearly independent paths 1 in a graph. There is a strong correlation between the cyclomatic complexity measure and the number of bugs in a program [99]. In this section, we generate the Control Flow Graph (CFG) of a SD, count the number of linearly independent paths within the SD and then estimate the complexity of each path in terms of the test effort required.
The sum of the complexities of all linearly independent paths is the complexity of the SD of the use case. The test manager allocates test effort to the use cases based on their estimated complexity. The higher the complexity of a SD, the more test effort is required to test it.
First, the Control Flow Analysis (CFA) of the SD is performed to get the CFG.
The CFG is used to extract the basic individual paths within a SD and is the source of estimation for testing. It is named the Concurrent Control Flow Graph (CCFG) instead of CFG due to the presence of asynchronous and parallel messages within a SD [114]. At the execution of a synchronous message, the caller waits for the reply message from the callee and cannot initiate any other message in between, whereas for an asynchronous message, the caller does not wait for the reply message. For example, two objects, Book and LogRegister, need to be updated when a book is issued. The controller sends two messages in parallel, one to the Book object and one to the LogRegister object, and these two messages then run concurrently. So, the CCFG of a SD is affected by the interaction operator par, which causes at least two concurrent threads of control.
In the CCFG, each message of the SD is represented by a node. Once the CCFG of a SD is generated, as shown in Figure 7.2, we extract all linearly independent Concurrent Control Flow Paths (CCFPs) of the CCFG for testing. A CCFP is a control flow path with an extra feature: sub-paths are added to it due to concurrent control flow in the CCFG.
Parallel and asynchronous messages cause concurrent control flow during the execution of a scenario within a SD. The concurrency within a CCFG is identified through fork and join nodes. In a CCFP, an open and a close parenthesis represent a fork and a join node, respectively. A CCFP within a CCFG is a path which includes all sub-paths going out from a fork node; it runs from the start node to the end node and contains all nodes residing in the path. There can be a number of CCFPs in a CCFG. In our example, as there is no condition in the SD (see Figure 7.1b), we get only one linearly independent path in the CCFG shown in Figure 7.2 and call it ρ1.
It is given below.
(Figure: (a) SD with parallel message; (b) SD with asynchronous message.)
Further, after a bug is fixed, it is also difficult to ensure that the bug is truly corrected and not simply masked. Concurrency bugs are categorized as race conditions, incorrect mutual exclusions and memory reordering errors. We cannot immediately observe the consequences of a race condition; it might become visible after some time or in a totally different part of the program. There is also a need to synchronize the operations between threads, which requires extra overhead. As extra test effort is required for a concurrent node, we assign a higher weight to a concurrent node than to a simple node when calculating the complexity of a path. We assign a weight of 5 to a concurrent node (a node under a fork) and a weight of 1 to a simple node. The complexity of path ρ1 of Figure 7.2 is calculated as 1 × 5 + (3 + 2 × 5) × 5 = 70. The complexity of a CCFG is the sum of the complexities of its paths:
Complexity(CCFG) = Σ_{i=1}^{n} Complexity(CCFP_i)
where Complexity(CCFP_i) is the complexity of the i-th CCFP of the CCFG and n is the number of paths in the CCFG.
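A minimal sketch of this weighting scheme is given below: each node of a CCFP contributes weight 5 if it is a concurrent node (under a fork) and weight 1 otherwise, and the CCFG complexity is the sum over its linearly independent CCFPs. The nested-fork arithmetic of the ρ1 example above is not reproduced here; the flat per-node weighting is an illustrative simplification.

import java.util.List;

public class CcfgComplexity {
    // Weight 5 for a concurrent node (under a fork), 1 for a simple node.
    public static int pathComplexity(boolean[] concurrentNode) {
        int c = 0;
        for (boolean concurrent : concurrentNode) c += concurrent ? 5 : 1;
        return c;
    }

    // Complexity of the CCFG = sum of the complexities of its CCFPs.
    public static int ccfgComplexity(List<boolean[]> paths) {
        int total = 0;
        for (boolean[] path : paths) total += pathComplexity(path);
        return total;
    }
}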
Now, we consider some use cases of LMS case study.

Number of Test Paths generated within a SD
We consider all possible test paths within a SD as an influencing factor for complexity computation. Each test path is covered by an individual test case. The amount of test effort required for a use case is decided on the basis of the number of test paths generated within the SD of the use case. We consider the total number of possible test paths over the interacting modal classes [115], which is given by the following equation.

Number of Critical Messages transmitted within a SD
There are certain messages within a SD that are critical to the sender [116]. The failure of the services corresponding to those messages may lead to catastrophic consequences. Therefore, we should check the severity associated with each message within a SD for test effort prioritization. We check how the failure of the receiver affects the sender within a SD. The value returned by the receiver may be used by the sender for taking an important decision, or it may be used in computations in which inaccuracy may lead to catastrophic consequences. There are also messages within a scenario that provide exception handling for rare but critical conditions. Although the execution probabilities of those messages are low, the failure of any one of them may cause a severe loss to the system. Therefore, we consider the severity associated with a SD through the criticality of its messages. We assign a severity to a message within a SD on the basis of how the system operation is affected by the failure or incorrect service provided by the receiving object of the message. At the analysis stage, the critical behavior can be identified from domain experts or customers. It is traced to use cases and then to SDs to identify the elements of the system that need to be analyzed in depth and tested thoroughly.
For example, Figure 7

Length of the longest Maximum Message Sequence (MMS)
For the successful execution of a scenario, it is required to know (i) what other classes might be affected when one class does not behave properly or returns a wrong value, and (ii) which event e1 makes the event e2 trigger, either directly or indirectly. At the time of testing an event, we have to test the events which are in the dependence set of that event. We identify the interaction faults through the dependence set of an event. For this, it is required to know the transitive dependencies among objects within a SD. It is easy to detect a fault in a direct method call, but difficult in the indirect case. This indirect dependency can be extracted from the flow of messages within a SD, and the flow is well understood from message sequences.
First, we define a Message Sequence (MS) within a SD and then a Maximum Message Sequence (MMS). A MS is a concurrent sequence of messages (call messages or reply messages) within a SD in which the first message is a synchronous call and the last one is the reply message corresponding to the first [116]. A MMS is a message sequence that is not a subsequence of any other message sequence within the SD [116]. All possible MSs for the SD shown in Figure 7.1b are {m1, r1}, {m4, r4} and {m2, m3, m4, r4, m5, m6, r5, r2}, and the MMSs are {m1, r1} and {m2, m3, m4, r4, m5, m6, r5, r2}. The MS {m4, r4} is not a MMS because it is included within another MS. In the given SD (see Figure 7.1b), the length of the longest MMS is 8 ({m2, m3, m4, r4, m5, m6, r5, r2}). There exist context-sensitive dependencies among objects within a MMS, which show both direct and indirect interactions.
The longest MMS within the SD of use case Issue Item (see Figure 7.5) is as follows.
In this MMS, there are two fork nodes. The longest sub-path under the first fork node has length 4 (m17, m18, m19, m20) and under the second fork node, each concurrent sub-path has an equal length of 3. Hence, the length of the longest MMS is 9 for the said use case.
A MMS with a high value indicates that the dependency among objects is high. A fault in one object can easily propagate to other dependent objects, which increases the probability of system failure.

Number of External Links used in a SD
In a distributed system, communication reliability is critical in unsafe environments. It is required to estimate the probabilities of failure for connectors at the analysis stage for effective testing. Consider the SD shown in Figure 7.1b. Suppose objects o1 and o2 reside in node1 and objects o3 and o4 reside in node2, and node1 and node2 are linked by a network. This is shown through a deployment diagram in Figure 7.10. It is assumed that the probability of connector failure is zero for objects in the same node. The probability of connector failure between o1 and o2 is therefore zero, whereas there is a probability of connector failure between o2 and o3 and also between o2 and o4. So, the probability of failure is high when a number of messages are transmitted through connectors in a use case. For the SD shown in Figure 7.1b, out of 10 messages, 3 messages (m3, m5, r5) are transmitted through the network. When data is transmitted through the network, extra test effort is required to check for network problems. So, the complexity of a SD based on the number of connectors used within it is expressed as its Connector Complexity, given by the following equation.
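The Connector Complexity equation itself is not reproduced above. As an illustrative assumption only, the sketch below counts the messages of a SD whose sender and receiver reside on different deployment nodes (3 out of 10 for Figure 7.1b); this count could then feed whatever weighting the actual formula applies. The data structures and method names are hypothetical.

import java.util.List;
import java.util.Map;

public class ConnectorComplexitySketch {
    // 'deployment' maps an object to the node it resides on; 'messages' lists
    // (sender, receiver) pairs of the SD. A message is counted only when the
    // sender and receiver reside on different nodes, i.e. it crosses a connector.
    public static int networkMessages(List<String[]> messages, Map<String, String> deployment) {
        int count = 0;
        for (String[] m : messages) {
            String senderNode = deployment.get(m[0]);
            String receiverNode = deployment.get(m[1]);
            if (senderNode != null && !senderNode.equals(receiverNode)) count++;
        }
        return count;
    }
}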

Number of Polymorphic Calls within a SD
A polymorphic call can be identified within a SD through the Class Diagram (CD). Polymorphic behavior occurs at runtime when a sub-class overrides at least one of the methods of the base class. Testing the polymorphic behavior within a scenario requires extra test effort. The complexity of a use case depends on the number of polymorphic messages that are transmitted through the SD of the use case. The generation of test cases to test a SD which contains polymorphic calls requires manual effort [117]. In a polymorphic interaction, new test sets are generated for both inherited and overriding methods. The behavior of the program is not predictable due to run-time binding, which makes the testing process difficult [117]. Polymorphic interactions are of different types, such as simple polymorphic interaction, parameter-influenced polymorphic interaction and configuration-influenced polymorphic interaction.

Simple polymorphic interaction
In this case, an instance of a derived class is directly passed as a parameter. The parameter directly controls the polymorphic behavior, and it is easy to test. The test must consider at least one instance of each class (base and derived classes) as a parameter in the call. An example class diagram is shown in Figure 7.11, and the corresponding interaction is shown in Figure 7.12a. (Figure 7.12 is taken from [117].)
It is simple to determine the test cases for this SD. The possible test cases are an instance of a Book, an instance of a ComputerCD, an instance of a MusicCD and an instance of a DVD.

Parameter-influenced polymorphic interaction
It is explained through an example. Consider the SD shown in Figure 7.12b. The possible test cases are the identification numbers of a Book, a ComputerCD, a MusicCD and a DVD. Compared to the test cases of Figure 7.12a, these are abstract. For generating the test cases for this SD, extra information is needed to obtain the identification number of an instance of each sub-class. Manual effort, such as data from a domain expert, is required to identify the appropriate test input values. Due to this, testing this type of polymorphic interaction is less likely to be automated and hence requires extra test effort.

Configuration-influenced Polymorphic Interaction
For testing this type of polymorphic call, it is necessary to change the configuration of the system to various states. The initial system state and environment are changed repeatedly to set different configurations. Consider the SD shown in Figure 7.12c. The method getTopselling() returns an instance of the best-selling product, which could be a concrete sub-class of class Product. This polymorphic call is based on the external setup of the system; the parameters of the interaction have no effect on it. The possible test cases for this are obtained by setting the configuration of the system to different states.

Architectural Dependencies among use cases
The use cases of a system can be ordered as per the business logic of the system.
In a set of ordered use cases, one use case starts execution after the completion of its preceding use cases. There is a requirement of logical progression for tackling the use cases, which gives the sequence its meaning. If the pre-condition of a use case is the same as the post-condition of another use case, then the use cases can be ordered to execute sequentially. For example, in the LMS case study, a book cannot be deleted if it is issued. So, for deletion of that book, first the Return Item use case is called and then the Remove Title use case is called. The UML stereotyped association precedes is used for this relation. A use case may be followed or preceded by a number of use cases.
Order Flow Graph shows the dependencies among use cases within a system.
Sometimes, a preceded use case has to execute a number of times to satisfy the pre-condition of a particular use case. For example, consider the LMS case study.

Computing Complexity and Test Priority
In Section 7.1, we have discussed a list of eight factors that influence the complexity of a use case. Normally, a weight is associated with each factor, reflecting how much it affects the complexity. In this section, we compute the complexity of a use case and then, assign priority according to its estimated complexity, execution probability and business value.

Computing the complexity of a use case
The factors that affect the complexity of a use case were discussed in Section 7.1. In this section, we first compute the complexity of a use case based on these factors. The complexity of a use case is computed as follows:
Complexity(U) = Σ_{i=1}^{8} W_i × c_i (7.5)
In this equation, W_i represents the relative weight and c_i is the estimated value of the i-th complexity factor of a use case. The assignment of weights is a subjective matter and may vary from analyst to analyst. The weights are not static; they may be adjusted and re-calibrated to suit a project's specific needs. The test manager, together with key people associated with development, is responsible for deciding the weight of each complexity factor. Our approach helps to estimate the value of each complexity factor. A value of 0 indicates no influence of the complexity factor on the use case.
Once the weight and value of each complexity factor of a use case are decided, the test manager estimates the complexity of the use case by applying Equation 7.5.
Complexity is related to the fault-proneness of a system. To estimate the failure-proneness of a use case, we include its execution probability along with its complexity.
We define the Operational Complexity (OC) of a use case U_i based on its execution probability p_i and its estimated complexity. The OC of use case U_i is:
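The OC equation is not reproduced in the text above. The minimal sketch below therefore assumes the product form OC(U_i) = p_i × Complexity(U_i) together with Equation 7.5; that product form is an assumption for illustration.

public class UseCaseComplexity {
    // Equation 7.5: Complexity(U) = sum over the eight factors of W_i * c_i.
    public static double complexity(double[] weights, double[] factorValues) {
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) sum += weights[i] * factorValues[i];
        return sum;
    }

    // Operational Complexity: assumed to be execution probability times complexity.
    public static double operationalComplexity(double execProbability, double complexity) {
        return execProbability * complexity;
    }
}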

Computing test priority
We consider the Operational Complexity (which already incorporates the estimated complexity and execution probability) and the Value (the business value that comes from the customer and the market) of a use case for assigning test priority within a system. We compute the Test Priority (TP) of a use case within the system by applying the following formula.
In this equation, TP(U_i) is the test priority and OC(U_i) is the Operational Complexity associated with use case U_i. Value(U_i) is the estimated business value of use case U_i. The business value estimation process is discussed in Chapter 2 (Background).
The normalized test priority, NTP(U_i), of use case U_i is given by:
NTP(U_i) = TP(U_i) / Σ_{j=1}^{n} TP(U_j) (7.7)
In Equation 7.7, n represents the total number of use cases within the system.
Once the total number of test cases T for the system under test is decided by the testing team, the number of test cases allocated to a use case U_i is NTP(U_i) × T.
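A minimal sketch of the priority computation and allocation is given below. The TP formula itself is not reproduced in the text, so the multiplicative combination TP(U_i) = OC(U_i) × Value(U_i) used here is an assumption; the normalization follows Equation 7.7 and the allocation follows NTP(U_i) × T.

public class TestPriority {
    // Assumed TP(U_i) = OC(U_i) * Value(U_i), normalized by the sum over all use cases (Equation 7.7).
    public static double[] normalizedTestPriority(double[] oc, double[] value) {
        int n = oc.length;
        double[] tp = new double[n];
        double sum = 0.0;
        for (int i = 0; i < n; i++) { tp[i] = oc[i] * value[i]; sum += tp[i]; }
        for (int i = 0; i < n; i++) tp[i] = (sum == 0.0) ? 0.0 : tp[i] / sum;
        return tp;
    }

    // Number of test cases allocated to a use case out of a total budget of T test cases.
    public static int allocate(double ntp, int totalTestCases) {
        return (int) Math.round(ntp * totalTestCases);
    }
}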
We have implemented our approach on LMS. The use case diagram of LMS has already been shown in Figure 4.1c. The various use cases of LMS with their execution probabilities are shown in Table 7.4. For each high-level function (use case), we collected information from users such as the librarian, the library in-charge and students regarding the benefit of implementing the function and the penalty of not implementing it. We have identified sixteen use cases in the LMS case study. We have developed a prototype tool called Complexity Factor Estimator (CFE) for automating three of the eight influencing factors discussed in Section 7.1 (factor 1, factor 2 and factor 5). The other factors are estimated manually.
CFE is implemented in Java. The inputs to our tool, CFE, are the SD of a use case and the state chart diagrams of all interacting components within the SD. The design artifacts are produced in MagicDraw [118]. First, the UML diagrams are exported in XMI format through an existing XMI parser, and this XMI output is then taken as input to our tool, CFE. The high-level design of our tool, CFE, is shown in

Experimental Studies
In order to verify the effectiveness of our approach, we have carried out a series of experiments on the source code of the LMS. We have seeded 36 faults randomly into the source code of LMS. It has been shown that fault seeding is an effective practice for measuring the efficiency of a test method [106]. There are a number of interactions among components in object-oriented software, so there are opportunities for integration or interface faults. The seeded faults are integration-level faults. We assume that rigorous unit testing has already been done by the developers.
These seeded faults could not be detected through rigorous testing at the unit level.
The various types of faults that we have considered in our experiment are discussed below.
1. Three types of interface mutation operators [107] were seeded: the Direct variable replacement operator, the Indirect variable replacement operator and the Return statement operator.
2. Six types of state-based integration faults [22] were inserted: Missing transitions, Incorrect transitions, Unspecified event, Incorrect state of the sender object, Incorrect state of the receiver object, and Message passing with an incorrect/invalid value of arguments. The last one is explained here. Suppose a message is passed with an incorrect or invalid argument: an object O_i sends a message m_i(a1, a2, a3), and instead of passing the correct value a1 = x1, it is passed with the value a1 = x2, where x2 is incorrect or invalid data. Four faults of each discussed type were inserted randomly.

We made three copies of the source code and applied three different testing methods. For each testing method, the test time was fixed at 36 hours based on the test budget, the size of the source code, the total number of use cases, the total number of classes, the total number of scenarios and the total number of object-points. The first row of Table 4.2 shows a brief summary of LMS.
We conducted prioritization-based testing using our proposed use case ranking approach on the first copy of the source code and called it Ranked testing. We applied coverage-based testing without any ranking information to the second copy of the source code and called it Unranked testing, in which equal importance was given to each use case. We conducted testing based on the operational profile designed for the system on the third copy of the source code and called it Semi-ranked testing.
The aim of Unranked testing is to cover a high percentage of the source code and fix as many bugs as possible, based on the assumption that fewer bugs imply higher reliability. The aim of Ranked testing is to rigorously test the parts of the source code that implement high-priority use cases, based on the assumption that the reliability improves at a higher rate if high-priority use cases are tested thoroughly. So, in Ranked testing, we allocate effort to a use case based on its priority value as estimated by our proposed approach, whereas in Semi-ranked testing, we allocate test effort to a use case based on its execution probability. It has been shown by many researchers [9,13,87] that the user's view of the reliability of a system improves when the faults which occur in the most frequently used parts of the software are mostly removed. Keeping this in view, in Semi-ranked testing, we focus test effort on the parts of the source code that are executed frequently.
In both Ranked and Semi-ranked testing, an operational profile is used, and the operational profile is accurate as the system is an existing system. Hence, it is assumed that both testing methods (Ranked and Semi-ranked) should be able to detect, at the early phase of testing, the important bugs that are responsible for frequent failures. After the allocated test time was over, some bugs could not be fixed in each copy of the source code, because they were detected during the last stage of testing and could not be fixed within the stipulated time period. Table 7.7 shows the number of mutant bugs detected by the three testing methods, which led to different testing results. The number of faults detected through the Unranked testing method was higher than the number detected through the Semi-ranked testing method; this is because, in the Semi-ranked method, we iterated a lot over the frequently executed parts of the code and gave less attention to the others. Although the Ranked testing method could not detect as many defects as the Unranked testing method, it detected the maximum number of critical defects, as the severity of a message was considered as a factor in complexity computation. It was also found that the fault detection rate in the Unranked testing method is nearly linear, whereas in the Ranked case, the fault detection rate is high during the early stage of testing.
The number of seeded faults detected by the Ranked testing method was higher than that of the Semi-ranked testing method. Complexity is linearly proportional to defect rates [99], and our approach emphasizes the complexity of a use case as one input for ranking. In fact, our aim is to improve the reliability of a system, and a software testing method that is efficient in finding faults may not improve the reliability of a system [9,13,119]. Our next job is therefore reliability assessment.
The tested source code obtained from the three testing methods (Ranked, Semi-ranked and Unranked) was again tested for reliability assessment. Here, the assumption is that the effects of all types of failures are the same, which is practically not true: some failures have a very negative impact on the customer and on the system, and a failure can be catastrophic, critical, major or minor [101]. The reliability of a system is assessed by checking how many test cases are executed and, out of those, how many test cases failed. In each experiment, we ran the three different testing results (the tested source code obtained from the three different testing methods) n times according to the operational profile. The value of n varies from experiment to experiment.
The defects that caused failures were not fixed and the reliability was estimated.
Table 7.8 tabulates the experimental results of software reliability assessed for the three discussed methods (Ranked, Unranked and Semi-ranked) based on random testing and adaptive testing, after the source code tested by the three methods was executed for reliability assessment on LMS. In the table, R_rt and R_adpt represent the reliability assessed by random testing and by adaptive testing, respectively. From Table 7.8, we observed that in adaptive testing, the variance is much smaller than in random testing. From the reliability values assessed in the various experiments, we found that the code tested through the Ranked method showed the highest reliability and the code tested through the Unranked method showed the lowest reliability under both reliability assessment methods (adaptive and random testing) and for test suites of different sizes.

Result Analysis
The observed reliability is the lowest for the code tested through the Unranked method.
This is because some residual bugs remained in the frequently executed parts of the tested code. Some of these failure-causing bugs were detected at the time of testing, but were not fixed because they were detected at a later phase of testing. This problem was not observed in the code tested through the Semi-ranked method, but its reliability is still not as high as that of the code tested through the Ranked method. This is because the testing was based only on the operational profile; the complexity factor was not considered, and hence some seeded state-based integration faults could not be detected by the Semi-ranked testing method.
Although we have not analyzed the impact of failures on the system at the time of reliability estimation, some serious failures 3 were observed in the source code tested through the Unranked and Semi-ranked methods. This is because we considered neither the Value associated with a use case nor the criticality of the messages within the SD of a use case for testing in the Unranked and Semi-ranked methods. As these two major external factors, Value and criticality of a message, are considered for prioritizing use cases in the Ranked testing method, the faults which may cause failures with a high negative impact on the user were mostly detected and corrected at the early phase of testing through the Ranked method.

(Table legend: Code-Ranked: code tested by Ranked testing; Code-Unranked: code tested by Unranked testing; Code-Semi-ranked: code tested by Semi-ranked testing; R_i: the reliability obtained at the i-th run.)

Summary
We have proposed a test effort prioritization technique at the architectural level. Our approach ranks the use cases of an application according to their complexity and business value. For this, we first developed a technique to compute the complexity of a use case quantitatively. Our approach for complexity calculation is purely analytical.
For achieving high reliability, the degree of thoroughness with which a use case is tested is made proportional to its priority value. We have conducted experiments to check the effectiveness of our approach and experimentally showed that assigning test effort to a use case based only on its execution probability is not sufficient for ensuring the quality of a system. Considering both the structural and the behavioral dependencies within a use case, along with its execution probability, when computing test priority is a powerful way to improve the quality of the system.
In this chapter, we have not considered the risk associated with a use case. The stakeholders of a software system feel that measuring the quality of the software system through risk is more significant than other factors such as the expected number of residual bugs or the failure rate. Keeping this in mind, we propose a novel risk analysis technique, which works at the software architecture level, in the next chapter.

Chapter 8 Analyzing Risk at Architectural Level for Testing
The approach proposed in this chapter is twofold. In the first phase, the risk is estimated for components, use case scenarios and the overall system. In the second phase, risk-based testing is conducted, in which test priority is assigned to the various elements according to their estimated risk.
The existing work on software reliability estimation [9,14,15,57,63] does not consider the impact of the failures observed during the execution of a software system. Due to the availability of design models, stakeholders now have the opportunity to estimate the reliability quantitatively at the analysis and design stage, and hence the risk associated with the software before its implementation.
Risk analysis is conducted on a software application to assess the damage during use and the frequency of use, and to decide the probability of failure by looking at defects.
There are several types of risk, such as reliability-based risk, availability-based risk, acceptance-based risk, performance-based risk, cost-based risk and schedule-based risk. In this thesis, we are mainly concerned with reliability-based risk, as our aim is to improve the reliability of a system within the available test resources. Reliability-based risk combines the probability that the software product will fail in the operational environment with the adversity of that failure.
In order to save time and cost in the software development life cycle, effective decision making is required for allocating resources to the various high-level requirements. For this, there is a need to quantitatively assess the risk associated with high-level requirements as early as possible. Researchers [81,82,84] have proposed risk estimation models by gathering data at the requirements and analysis stages. As the analysis and design stage is critical compared to other stages, assessing risk at this stage is beneficial to the stakeholder, and detecting and correcting errors at this stage is less costly than at later stages of the SDLC. For risk estimation at an early stage, the important feature is to design a model that predicts the dynamic aspects of a system. Unreliability at different states of a component within a scenario may affect the failure rate of the scenario differently.
If the failure probabilities of an interacting component in its various states within a scenario are well known, it is easy to analyze their effect on the system behavior.
The risk for two different states of a component may vary within a scenario. A fault within a state of a component may be the reason for a component failure, and the failure of a component or connector may be responsible for a system-level hazard [101]. We predict the dynamic aspects of a system and assess risk through the data collected at the detailed design stage. We consider the risk associated with active resources.
As connectors are passive 1 in nature, we have not considered any connector faults; it is assumed that connectors are 100% reliable. Unlike the existing work [81,82] on risk estimation at the architectural level, we introduce the risk associated with the various states of a component, computed with the help of the State COllaboration TEst Model (SCOTEM) [115]. We are primarily motivated by the need to generate a list of scenarios ranked according to their estimated risks. This ranking technique provides a path to find the truly critical system functionalities. Assigning test effort to the various scenarios based on their estimated risks helps the tester to detect important faults at the early phase of testing. Once the risks of the various scenarios within a system are estimated, the risk of the overall system is calculated based on two parameters: (i) the estimated risks of the various scenarios within the system and (ii) the scenario transition probabilities within the system.
The rest of the chapter is organized as follows. Our proposed risk estimation method is described in Section 8.1, the efficacy of our approach is evaluated in Section 8.2 and the summary of the chapter is given in Section 8.3.

Risk Estimation Method
Our proposed risk analysis method first estimates the complexity associated with each individual state of a component. Then, it iterates over a scenario and estimates the severity associated with the various states of the interacting components within the scenario. Based on the complexity and severity, it estimates the risk. Our approach estimates the risk of the scenario with the help of an existing state-based integration model called SCOTEM [115] and the estimated risks of the various states of the components within the scenario. Our approach supports a sensitivity analysis for a scenario and generates a list of the critical components that are responsible for increasing the risk of the scenario. Finally, we estimate the overall system risk on the basis of the risks associated with the scenarios and the scenario transition probabilities.
For calculating the overall system risk, we use Interaction Overview Diagram (IOD) that represents scenario specifications. The procedure of our proposed methodology is shown in Algorithm 2 and a detailed description of the procedure is explained in subsequent sections.

Algorithm 2 Risk Analysis Procedure
1: for each component do
2:   for each state do
3:     estimate complexity through ISDG.
4:   end for
5: end for
6: for each scenario do
7:   for each active state of an interacting component do
8:     assign severity.
9:     estimate the risk of the state from its complexity and severity.
10:   end for
11:   estimate the scenario risk using SCOTEM and the estimated state risks.
12:   identify a list of critical components within the scenario through sensitivity analysis.
13: end for
14: rank scenarios based on their estimated risks.
15: estimate overall system risk using IOD and scenario risks.
16: identify a list of critical scenarios within the system through sensitivity analysis.

Quantifying the complexity for a state of a component
In this section, we propose a method to compute the complexity associated with a state of a component at the architectural level. In a sequence diagram, the interactions among components are represented through event/action pairs. An interaction within a sequence diagram is mapped to an event in a state chart diagram. When an event is invoked by a component within a scenario, it may trigger an action, which in turn may trigger another event in another component. The event/action interaction describes how the invocation of a function in one component affects other components. For example, in the LMS case study, consider the situation when a borrower reserves a book. First, a new object of the Reservation component (in the New state) is created. The newly created Reservation object triggers a message to the Borrower object and to the Book object simultaneously. On receiving the message, the Borrower object changes its state from Active to NonReservable (the business rule of our case study says that a borrower can reserve only one book) and the Book object changes its state from Issued to Reserved (see Figure 7.7).
Similarly, consider another situation, when a book is returned while it is in the Reserved state. First, there will be a transition in the Book object from the Reserved state. In the following, we draw the ISDG for our case study, LMS, and discuss it in detail.

Inter-component State Dependence Graph (ISDG)
We use the concept of Bayesian-model [121]

Complexity computation
From the architectural analysis of a software system, we found that a transition in one component may trigger transitions in other components. So, the complexity associated with Borrower.U1 is 1 + 1 + 1 + 3 + 3 = 9.

Severity analysis
In this section, we use a method based on three hazard analysis techniques [96]: Functional Failure Analysis (FFA), Software Failure Mode and Effect Analysis (SFMEA) and Software Fault Tree Analysis (SFTA). Hazard analysis is done at the functional level (top level) through FFA [122], which shows the possible ways in which the system can fail. First, we identify all possible system-level hazards 9.
SFMEA identifies the component-level failures and their effects on the system. While FFA needs only an abstract functional description, a detailed architectural design is required for SFMEA. A cause-and-effect dependency is required for predicting the likelihood of system failure from the likelihood of component failure [123]. SFTA is conducted to find how the failure of a lower-level element is responsible for the failure of an upper-level element and, finally, the failure of a scenario.
FFA is the first step of severity analysis. The input to FFA is the list of external events that occur between an external actor and the system. For this, we use the System Sequence Diagram, which consists of only those messages of a sequence diagram that occur between an external actor and the system [83]; in this case, the system is treated as a black box. We have created an entire fault tree for each use case of the LMS through the FaultCAT tool [124]. The advantage of this tool is that it allows the user to draw and edit the fault tree and calculates the probability of failure of intermediate nodes automatically. It also converts the fault tree into an XML form, which helps us to check its consistency with the SFTA through Java programs. Let us discuss the use case Issue Item. The system-level hazards associated with the Issue Item use case are as follows. The root node is "Failure to issue book" and the nodes in the first level are the four hazards described above. The second level contains the nodes which contribute to the hazards, and so on. We have not presented the entire fault tree due to space reasons.
A piece of it with its XML form is represented in Figure 8

Risk computation
In this section, we first estimate the risk for a state of a component within a scenario and then compute the risk for the whole scenario. We combine the complexity and severity associated with a state to compute the risk for that state. The heuristic risk for a scenario is computed by considering two parameters: (i) the estimated risks of the various states of the interacting components within the scenario and (ii) the SCOTEM [115] of the scenario. For the execution of a scenario S_x, all the interacting objects that are modal will be in some specified states. The probability of occurrence of a state in a component box is the sum of the probabilities of the paths from the init node to that state.
For example, in our LMS case study, for the successful execution of the scenario Issue Item (U, B), the initial state of the requested Book object B will be either the Available state or the Reserved state, and the initial state of the Borrower object U will be either the Active state or the NonReservable state. We have drawn a SCOTEM for use case Issue Item (see Table 8.3). Hence, the total
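A minimal sketch of the two-level combination described above is given below. Neither combination formula is spelled out in the text, so both the product form for a state's risk and the probability-weighted sum over the active states of a scenario are illustrative assumptions only; the actual thesis equations may differ.

public class RiskSketch {
    // Risk of a state: assumed product of its (normalized) complexity and severity.
    public static double stateRisk(double complexity, double severity) {
        return complexity * severity;
    }

    // Scenario risk: assumed probability-weighted sum of state risks, where each
    // probability is the chance that the state is active during the scenario,
    // derived from the SCOTEM paths (sum of path probabilities from init to the state).
    public static double scenarioRisk(double[] stateRisks, double[] stateProbabilities) {
        double risk = 0.0;
        for (int i = 0; i < stateRisks.length; i++) risk += stateProbabilities[i] * stateRisks[i];
        return risk;
    }
}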

Estimation of risk for the overall system
We use scenario-based specifications as input for estimating the risk of the overall system. A scenario specification is the composition of a set of scenarios, possibly obtained from a user; for details, the reader can refer to [125,126]. The software industry widely accepts scenario specifications, as they are well suited for describing the intended behavior of an application in abstract form. Rodrigues et al. [125] have modeled scenario specifications through the Interaction Overview Diagram (IOD). An IOD shows the flow of control among scenarios, along with the starting state and the end state of the flow executed by an average user. In UML 2.0, each activity node of an IOD is a sequence diagram. The IOD shows the probability of transfer of control from a scenario to each adjacent scenario. The transition probability PTS_ij between two scenarios represents the probability that the system will execute scenario S_j after executing scenario S_i. Rodrigues et al. [125] have performed a sensitivity analysis and shown that the system reliability is sensitive to (1) the component reliabilities and (2) the scenario transition probabilities. Based on this, we use the scenario risks and the scenario transition probabilities to estimate the risk of the overall system. We have already discussed our proposed method for estimating scenario risk. The information about the scenario transition probabilities is derived from the operational profile [9] of the system.
Each path from the starting point to the end point in an IOD shows the probability of invocation of a sequence of scenarios. The risk of the overall system, Risk(Sys), is estimated on this basis. Our risk estimation procedure is not amenable to full automation. Construction of the ISDG is semi-automatic in our approach; automatic construction of the ISDG is a complex activity in terms of data collection, and it is hard to extract all possible state transitions for a non-trivial system. Unfortunately, the severity analysis techniques discussed in this chapter are not fully automatic either, as they involve the user in analyzing the various ways in which components or the system can fail and determining their effects. A limitation of severity analysis is that SFTA may produce hundreds of combinations of events causing system-level failures in a complex system. The analyst or programmer is concerned with what the software is used to perform, whereas SFTA forces the analyst or programmer to estimate the possibility of undesired events within the system and their contribution to system failures. The effort to estimate these may be expensive and time consuming, and the skill of the analyst plays an important role in the severity analysis process. Finally, a considerable investment is required in order to run our analysis, as we require data from a number of UML diagrams and conduct more than one hazard analysis technique for severity analysis.

Complexity analysis of risk estimation approach
The complexity of our risk assessment procedure shown in Algorithm 2 is dependent

Experimental Validation
In this section, we report two experiments conducted to evaluate the efficacy of our approach. The aim of the first experiment is to cross-check the estimated risk against the actual failures observed in the system. The aim of the second experiment is to show that (i) our risk analysis method helps the tester increase the fault detection rate and (ii) our approach helps in detecting the important faults that are responsible for severe failures. The experiments were conducted on the source code of LMS.

Experiment 1
We performed the following steps in this experiment to cross-check the estimated risk against the actual failures observed in the system.

1. Step 1: We applied random testing to test the software. The detected defects were fixed. We recorded the defects found in various scenarios to cross-check against the estimated risk.
2. Step 2: We generated a set of test sequences based on the operational profile of the system and executed the tested software for each test sequence. We assumed that the failure rate would be high for a scenario with high risk. As our aim was to check the failure rate of a scenario, in this step we did not remove any detected defects that were responsible for failures. The failure rate of each scenario was estimated within a test sequence.
3. Step 3: We calculated the average failure rate of each scenario within the system.
The failure rate of a scenario S_j in a test sequence Seq_i is denoted Θ_ji and is computed as

    Θ_ji = (Σ z_ji) / n_ji        (8.5)

where z_ji represents the execution result of one execution of scenario S_j in the test sequence Seq_i (z_ji is 1 if a failure is observed and 0 otherwise), the sum ranges over all executions of S_j in Seq_i, and n_ji is the total number of times scenario S_j is executed in the i-th test sequence. A small computational sketch of this calculation is given after the assumptions below. The following assumptions were made for the experiment.
1. After the execution of a scenario, the system state may change. So, the same scenario may be executed a number of times, but in different system states, within a given sequence.
2. A test case that is designed for a scenario either passes or fails. If a test case is blocked, we first correct the code and then consider it in our experiment.
3. The output of a selected test case at the current time is not affected by the test results of previously executed test cases.
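The following small Python sketch illustrates the computation behind equation (8.5) and the averaging performed in Step 3; the data and function names are hypothetical.

def scenario_failure_rate(outcomes):
    """Theta_ji: fraction of the n_ji executions of scenario S_j in sequence
    Seq_i for which a failure was observed (outcome value 1)."""
    return sum(outcomes) / len(outcomes)

def average_failure_rate(per_sequence_outcomes):
    """Average of Theta_ji over all test sequences in which S_j was executed."""
    rates = [scenario_failure_rate(o) for o in per_sequence_outcomes if o]
    return sum(rates) / len(rates)

# Scenario executed in 3 test sequences: 4, 3 and 5 executions respectively.
print(average_failure_rate([[0, 1, 0, 0], [0, 0, 0], [1, 0, 0, 0, 1]]))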
We executed the system 10 times with 10 test sequences of different lengths. In the results table (Table 8.5), the column Risk gives the normalized risk and the fourth column, FR, gives the normalized failure rate of each scenario within the system.

Discussion
From Table 8.5, we observed that a larger number of faults was found in the majority of high-risk use cases, and the failure rate was also high for those use cases. This is because the number of faults detected within a scenario is related to the complexity of the scenario, and complexity is an input to the risk estimation. The scenarios in which a large number of inter-component state transitions occurred were more fault-prone than the others. From the table, we also observed that the fault detection rate was high in the use cases Issue Item, Renew Item, and Return Item. At the same time, the failure rate is not always linearly proportional to the estimated risk.
We found that the failure rate was also high for some use cases with low risk. This is because the third assumption, that the output of a selected test case at the current time is not affected by the test results of previously executed test cases, may be a threat to the validity of our approach. It was also observed from the experimental results that the failure rate was low for some high-risk use cases, such as the Return Item use case. This is because only the failure rate is considered in this experiment, whereas the risk is actually estimated as a combination of failure rate and severity of failures.

Experiment 2 (Comparison with related work)
We compare our objectives with the existing work on model-based risk analysis techniques according to the six criteria defined in Table 8.6. In our approach, the smallest individual element for which risk is assessed is a state of a class, whereas it is the class itself in the other two approaches. Another advantage of our approach is that we perform a bidirectional analysis to check the consistency of failure modes at various levels and to extract any missing failure mode that was not analyzed. We also estimate the risk of the whole system on the basis of scenario transition probabilities and scenario risks, whereas it is assessed using a CDG in [81] and as the average of the use case risks in [82]. As shown in Table 8.6, the approach proposed by Goseva-Popstojanova et al. [82] is an extended version of the approach proposed in [81]. We seeded 43 faults into the source code of LMS after the completion of unit testing. The seeded faults are integration-level faults; we assume that rigorous unit testing was conducted before the faults were seeded. The types of faults selected in our experiment are discussed below.
1. Three types of interface mutation operators [107] were seeded: IMO1, the direct variable replacement operator; IMO2, the indirect variable replacement operator; and IMO3, the return statement operator.
2. Six types of state-based integration faults [22] were inserted: SF1, missing transitions; SF2, incorrect transitions; SF3, unspecified event; SF4, incorrect state of the sender object; SF5, incorrect state of the receiver object; and SF6, message passing with incorrect or invalid argument values. These faults were discussed in an earlier section. A small illustrative sketch of the interface mutation operators follows this list.
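The sketch below gives a hedged, hypothetical reading of the three interface mutation operators applied to a small call site; the code fragments are invented for illustration and are not taken from the LMS source or from [107].

# Hypothetical fragments illustrating the three interface mutation operators;
# function and variable names are invented for this sketch.

def fine(days_late, rate):
    return days_late * rate          # original return statement

def checkout_charge(days_late, renewals, rate):
    # Original call site:
    #   return fine(days_late, rate)
    # IMO1 (direct variable replacement): a parameter at the call site is
    #   replaced by another compatible variable, e.g. fine(renewals, rate).
    # IMO2 (indirect variable replacement): a variable that feeds a parameter is
    #   replaced before the call, e.g. days_late = renewals prior to fine(...).
    # IMO3 (return statement operator): the callee's return expression is
    #   mutated, e.g. `return days_late * rate` becomes `return days_late + rate`.
    return fine(days_late, rate)

print(checkout_charge(days_late=3, renewals=1, rate=2.0))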
Details of the bugs seeded into LMS are shown in Table 8.7. We made three copies of the source code and assigned a different testing approach to each copy for testing and debugging at the higher level. Our aim was to check which testing method is most effective in minimizing post-release failures, and in particular the types of failures that have a strong negative impact on both the system and the user. The test time was fixed at 36 hours for each method on the basis of the test budget, the size of the source code, and the total numbers of use cases, classes, scenarios, and object-points. The first copy was tested by our proposed approach, called the State-based Approach, in which the supplied use cases are sorted in a prioritized order according to the risk calculated by our method. The second copy was tested by an approach called the Component-based Approach, in which the use cases are sorted according to the risk calculated by the approach of Goseva-Popstojanova et al. [82]. Both approaches allocated test effort to a use case based on its estimated risk. The third copy was tested by the Randomized Approach, in which the tester gives equal importance to each use case. For simplicity, we considered only the main scenario of a use case, which shows its successful execution. Our experiment was aimed at investigating the following queries:
1. Q1: Does our approach guide the tester toward improving test efficiency by detecting more of the important faults than the related approaches?
2. Q2: Does our approach help in improving test efficiency by increasing the fault detection capability?
3. Q3: Does our approach guide the tester in detecting certain types of faults better than the other two approaches?

Experimental Result and Discussion
The experimental results are shown in Table 8.8. From the table, we observed that the test scenarios generated in our approach uncovered several state-based integration faults that could not be detected by the other approaches. This is because we tested the components that are responsible for changing the states of other components at run time, whereas the method proposed in [82] tested the components in which a large number of intra-component state transitions occurred at run time; in [82], test priority was assigned to components based on the number of intra-component state transitions. Bugs related to intra-component transitions can be detected easily during rigorous unit testing, but extra test effort is required to identify the bugs related to inter-component state transitions. As the state transition concept was not used in the Randomized Approach, it detected the lowest number of state-based integration faults. Based on Table 8.8, we answer queries Q2 and Q3.
1. Ans2: Yes, our proposed state-based risk analysis approach guides the tester to improve test efficiency by increasing the fault detection capability.
2. Ans3: Yes, our proposed approach guides the tester in detecting certain types of seeded faults better than the other two approaches. More state-based integration faults were detected through our approach because state complexity, which considers both intra- and inter-component state transition dependencies, was taken as one input for the risk estimation of a scenario.
To answer the first question, we carried out another level of testing. After the detected faults were debugged, we ran the three tested copies to observe their behavior in the operational environment. We used the same test cases for each tested copy; this time the test cases were designed based only on the operational profile. We assume that a test case either fails or passes. In this phase, a failed test case was not followed by a fix; action was taken only to allow a blocked test case to execute. We counted the total number of post-release failures and the impact of those failures on the system and the user. Table 8.9 shows the result of our risk-based prioritization approach. The failures shown in the table were observed after the completion of the testing phase, in the operational environment. From Table 8.9, we observed that no Major failure was found in the copy of the LMS source code tested by our proposed State-based Approach, whereas 2 and 4 Major failures were observed in the copies tested by the Component-based and Randomized approaches, respectively, when the number of test cases was 300. As shown in the table, Major failures were found in the copy tested by the Component-based Approach even though risk analysis was conducted before testing. This is because the dynamic complexity of a component proposed in [82] did not help the tester detect the state-based integration faults. An example of one such Major failure observed in the LMS case study is described below.
The scenario Issue Item executed successfully when a borrower requested to issue a book that he had already reserved; however, we observed that the same borrower could not reserve any further books. Due to a seeded bug, the system did not change the state of the borrower from NonReservable back to Active after the execution of the Issue Item scenario. The severity of this failure is assumed to be Major, as the borrower cannot reserve any book thereafter.
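To make this failure concrete, the following hypothetical Python sketch (LMS is not implemented in Python, and the class and method names are invented) shows the shape of the seeded state-based integration fault: the inter-component transition that should restore the Borrower to Active is missing.

# Hypothetical sketch of the seeded state-based integration fault described
# above (of the missing-transition / incorrect-sender-state kind).

class Book:
    def __init__(self):
        self.state = "Available"

class Borrower:
    def __init__(self):
        self.state = "Active"

    def reserve(self, book):
        if self.state != "Active":
            raise RuntimeError("borrower cannot reserve in state " + self.state)
        self.state = "NonReservable"
        book.state = "Reserved"

def issue_item(borrower, book):
    """Issue a previously reserved book to the borrower."""
    book.state = "Issued"
    # Seeded fault: the inter-component state transition below is missing,
    # so the borrower remains NonReservable and cannot reserve further books.
    # borrower.state = "Active"

u, b1, b2 = Borrower(), Book(), Book()
u.reserve(b1)        # borrower becomes NonReservable, book becomes Reserved
issue_item(u, b1)    # book issued, but the borrower's state is never restored
try:
    u.reserve(b2)    # the Major post-release failure observed at run time
except RuntimeError as e:
    print("post-release failure:", e)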
Now we answer query Q1. Ans1: Yes, our approach helps to improve test efficiency by finding the bugs that are responsible for severe failures, such as those of the Catastrophic and Major types.

Applicability
Risk analysis is a part of safety engineering. Errors related to the temporal behavior of a safety-critical system are hard to detect during testing. Such errors may lead to severe failures that cause serious harm to people, equipment, or the environment. Our risk analysis approach is mainly applicable to the pre-testing analysis of safety-critical systems, such as software embedded in medical devices, nuclear power stations, telecom systems, and industrial robots. These embedded software systems vary in size and complexity. The basic principle of a safety-critical system is to keep the system as simple as possible.
Constructing the ISDG is complicated and the severity analysis process is time consuming for a large and complex system, but they help to detect the important errors at the early phase of testing and to deliver the product with the right quality within the limited budget and time. Our risk analysis approach ranks the components and scenarios within a system for testing according to their estimated risks. Some components and scenarios have high risk and low execution time; although their contribution to the overall system risk is small, they need more testing because they exercise the exception handling of critical conditions. Our approach also identifies, through sensitivity analysis, the contribution of a component's or scenario's risk to the risk of the enclosing scenario or system.

Summary
In this chapter, we have proposed an analytical method for estimating the risk of a software system at the architectural level for testing. The approach proposed in the chapter is twofold. In the first phase, the risk is estimated for components, use case scenarios, and the overall system. In the second phase, risk-based testing is conducted, in which test priority is assigned to the various elements according to their estimated risks. The data collected from UML sequence diagrams and state chart diagrams are used for risk estimation. We have also used the operational profile of the system to obtain the transition probability between any two scenarios.
Compared to the existing work on software risk estimation, our proposed method is novel in that it considers (i) the risk associated with the various states of a component, rather than the whole component, within a scenario and (ii) additional valuable information required for the severity analysis of a component, such as message criticality and a bidirectional analysis to extract the possible failure modes within a scenario. We have experimentally shown that the testing process is more efficient when the testing team is guided by our approach than by the approach proposed in [82].
We have explored test effort prioritization issues at various levels of the software development life cycle. Our proposed approaches identify the critical paths of a program in which the impact of failure is high. At the implementation level, we have exposed the critical components that are responsible for increasing the system failure rate. At the architectural level, we have proposed novel methods to compute the complexity and risk associated with the various high-level functions within a system. As our approaches expose the critical elements at the architectural level, testers and developers are guided to produce high-quality software within the available test resources.

Contribution
In this section, we summarize the important contributions of our work. There are five important contributions: (i) Computing the influence of a component toward system failures (ii) Computing the criticality of a component using both internal and external factors (iii) Improving the software quality using a multi cycle-based testing approach (iv) Estimating the criticality of a use case at the architectural level (v) Estimating the risk associated with various states of a component within a scenario, the risk of a scenario and the risk of the overall system.

Computing the influence of a component
We have proposed a framework to prioritize the components within a system according to their influence toward the system failures. For this, we introduced a metric, called the Influence Metric, which is computed through a dynamic slicing approach.

Computing the criticality of a component
Prioritizing the program elements within a system based only on influence value and average execution time may not expose all the important bugs during testing.
Therefore, we included additional factors for exposing the critical components within a system. We computed the criticality of a component by adding to our previous work two external factors, the severity associated with each failure and the business value associated with the high-level functions of the system, and one internal factor, structural complexity. From the experimental results, we observed that allocating test effort to the various components according to their estimated criticality helps decrease the failure rate of the application as well as the chance of severe failures in the operational environment.

Conducting multi cycle-based testing
We have proposed a multi cycle-based test effort prioritization approach, in which the priority values of the various components and scenarios change between test cycles for a system under test. In this work, we introduced the concept of the Influence Metric through a dynamic slicing approach and used it as one input for prioritizing components within a test cycle in a sequence of many test cycles. From the experimental results, we found that the test cases generated through our multi cycle-based testing approach could uncover some important bugs that could not be detected using Musa's approach [9]. As our approach considered the influence value of a component within a scenario at run time, the components providing many services received high test priority. We also assigned priority to components based on their failure history. These factors helped to improve the reliability of the system under test within the available test resources. We considered the business value associated with use case scenarios as a prioritization factor in the third test cycle, which helps to increase the customer's confidence in the tested system.

Estimating risk at the architectural level for testing
Risk assessment at an early stage of software development helps achieve a high level of confidence in a system and saves cost and time over the software development life cycle. We have proposed a novel risk analysis technique that works at the software architecture level. The main idea is to rank the components within a scenario, and the scenarios within a system, according to their estimated risks. Unlike the existing work on risk assessment at the architectural level [81, 82], our work assesses risk at a finer level of granularity. The efficacy of our approach was evaluated on the Library Management System case study.

Future Work
We briefly outline the following possible extensions to our work.
1. Prioritization-based testing covers two aspects: (i) prioritizing the program elements for testing and (ii) prioritizing the test cases. In the present work, we concentrated on the first one, i.e. prioritizing the program elements for testing.
Automatic selection of test cases from a pool of test cases according to the estimated priority of components can be taken up as future work.
2. We have considered complexity and failure history as defect generators. The software industry considers a number of other factors, such as change frequency, the impact of new technology, the impact of the number of people involved, and optimization. Our proposed method would be more effective if these factors were also considered; this can be explored in the future.
3. We have proposed eight factors that affect the complexity of a use case at the architectural level. We have automated only three of these factors and plan to automate the remaining factors in our future work.
4. We have proposed a risk estimation method at the architectural level. One direction for future work is to estimate risk at the requirements phase using requirement models in UML and semi-formal languages.
5. Our approach can be applied to industry-scale projects to analyze its effectiveness.