Machine Learning Approach for Software Reliability Growth Modeling with Infinite Testing Effort Function

Reliability is one of the quantifiable software quality attributes. Software Reliability Growth Models (SRGMs) are used to assess the reliability achieved at different points of testing. Traditional time-based SRGMs may not be accurate enough in situations where the test effort varies with time. To overcome this shortcoming, test effort has been used in place of time in SRGMs. The test effort functions proposed in the past are finite, which may not be realistic, since at infinite testing time the test effort will also be infinite. Hence, in this paper we propose an infinite test effort function in conjunction with a classical Nonhomogeneous Poisson Process (NHPP) model. We use an Artificial Neural Network (ANN) to train the proposed model with software failure data. With an ANN, it is possible to obtain many sets of weights for the same model that describe the past failure data equally well. We use a machine learning approach to select the set of weights that describes both the past and the future data well. We compare the performance of the proposed model with that of an existing model using practical software failure data sets. The proposed log-power TEF based SRGM describes all types of failure data equally well, improves the accuracy of parameter estimation over the existing TEF, and can be used for software release time determination as well.

Some SRGMs have been proposed with a testing effort function (TEF) [11][12][13], since fault detection and correction depend on the effort consumed, such as test cases executed, man-days expended, computer utilization time, and other resources, rather than only on testing time or calendar time. The effort-based SRGMs proposed in the past use exponential, Rayleigh, logistic, or Weibull distributions for the TEF to describe effort consumption during testing [11][12][13]. Although these functions give good results and fit well in some cases, assuming a finite total test effort at infinite time is a fallacy. Xie and Zhao proposed a Nonhomogeneous Poisson Process (NHPP) reliability growth model based on the log-power distribution, a graphical model in which the fit to the data can be visualized in a graph before parameter estimation [19].
In this paper, we propose using the log-power [19] distribution to describe the TEF in the Goel and Okumoto [1] SRGM, providing an SRGM with an infinite TEF. We use an Artificial Neural Network (ANN) for parameter estimation and apply a machine learning technique to determine the most suitable weights for the proposed model, so that it fits the past and future data equally well. We study and compare the goodness-of-fit (GoF) performance of the proposed model with a popular test effort function based SRGM. We use the ANN for parameter estimation uniformly in all cases, since ANNs improve parameter estimation accuracy and give a better goodness of fit than traditional statistical parametric models [15][16][17][18].

Models Equation Notations
Table 1: Comparison of testing effort functions.

Weibull TEF [11]: W(t) = α(1 − exp(−βt^γ)), where W(t) is the cumulative testing effort consumption in (0, t], α is the expected total testing effort consumed during testing, β is the scale parameter, and γ is the shape parameter.
Logistic TEF [12]: W(t) = N / (1 + A exp(−αt)), where N is the total testing effort consumed, A is a constant, and α is the consumption rate of testing effort expenditure.
Proposed log-power TEF: W(t) = α ln^β(1 + t), where α and β are constants.

This paper is organized in the following manner. Section 2 presents the proposed testing effort function. Section 3 presents the proposed Software Reliability Growth Model. Section 4 gives the approach to check the validity of the proposed model. Section 5 describes parameter estimation using ANN. Section 6 presents the machine learning technique used to select appropriate weights of the proposed model. Section 7 describes the performance analysis. Section 8 describes one application of the proposed model, namely, software release time determination. Summary and conclusions are given in Section 9.

Proposed Testing Effort Function
Since the resources consumed during software testing directly impact software reliability improvement, few SRGMs with testing effort functions were proposed in the past. To study the properties of testing effort functions, we compare the proposed log-power TEF with already proposed test effort functions such as Weibull and logistic. The comparison is given in Table 1.
The exponential and Rayleigh TEFs are special cases of the Weibull TEF, with shape parameter 1 and 2, respectively. The Weibull TEF displays a peaked curve as the shape parameter increases. The exponential TEF is used when effort is consumed uniformly over the testing time, whereas the Rayleigh TEF is used when the testing effort first increases to a peak and then decreases. In the case of the logistic TEF, the effort W(0) at time t = 0 is nonzero. This is unrealistic, because at the initial stage, when time is zero, no testing effort can have been consumed.
There are innumerable chances for faults to creep into software systems. Therefore, one has to adopt a strategy for generating effective test cases to minimize the error content. Achieving zero defects in software is considered possible but impractical, as it would require infinite effort. At time t = 0 the effort W(0) is not zero, since test cases and the test plan are drawn up before testing starts; thereafter the effort grows with testing. We chose the log-power TEF because of its simplicity, with just two parameters; it grows logarithmically with time and represents real testing projects better.
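As a quick numerical illustration of the comparison above, the three TEFs can be coded directly from their equations in Table 1. This is a minimal sketch with arbitrary, hypothetical parameter values; it shows the behaviors discussed here: the Weibull TEF saturates at its total-effort parameter, the logistic TEF is nonzero at t = 0, and the log-power TEF grows without bound.

```python
import math

def weibull_tef(t, alpha, beta, gamma):
    """Cumulative Weibull testing effort in (0, t]; bounded above by alpha."""
    return alpha * (1.0 - math.exp(-beta * t ** gamma))

def logistic_tef(t, n, a, alpha):
    """Cumulative logistic testing effort; note W(0) = n / (1 + a) > 0."""
    return n / (1.0 + a * math.exp(-alpha * t))

def log_power_tef(t, alpha, beta):
    """Proposed log-power testing effort; W(0) = 0 and W(t) is unbounded as t grows."""
    return alpha * math.log(1.0 + t) ** beta
```

For example, with a total-effort parameter of 100 units the Weibull TEF never exceeds 100, whereas the log-power TEF eventually passes any finite bound.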

Proposed Software Reliability Growth Model with Log-Power Testing Effort Function
Instead of proposing a brand new SRGM for its own sake, we propose building on the past good work done by researchers [1,19]. We time-transform the G-O model using the log-power testing effort function. The classical Goel-Okumoto SRGM has the mean value function

m(t) = a(1 − exp(−bt)),  (1)

in which the independent variable, that is, time t, is replaced with the log-power testing effort function W(t) by applying the time transformation applicable to NHPP models [20]. Let W(t) = α ln^β(1 + t) be the log-power testing effort spent at time t, where W(t) is the total testing effort consumed in the time interval (0, t], a is the expected number of software errors to be detected, and b, α, and β are constants. Thus, the mean value function m(t) of the SRGM with log-power TEF is as follows:

m(t) = a(1 − exp(−bα ln^β(1 + t))).  (2)
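The transformed mean value function (2) is easy to compute directly. The sketch below (all parameter values are illustrative only) checks the two properties the time transformation should preserve: m(0) = 0, and m(t) approaching the expected total number of errors a from below.

```python
import math

def log_power_tef(t, alpha, beta):
    """W(t) = alpha * ln^beta(1 + t), the log-power testing effort."""
    return alpha * math.log(1.0 + t) ** beta

def mean_value(t, a, b, alpha, beta):
    """m(t) = a * (1 - exp(-b * W(t))): the G-O model time-transformed by W(t)."""
    return a * (1.0 - math.exp(-b * log_power_tef(t, alpha, beta)))
```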

Checking Validity of the Model
We evaluate the performance of the proposed model using four practical software failure data sets, which are available in the form (t, W, y), where t is the cumulative testing time, W the cumulative effort expended, and y the corresponding cumulative number of failures. The data sets need to be normalized to the range [0, 1] before being fed to the ANNs. Table 2 provides a description of the software failure data sets. We measure and compare the goodness-of-fit (GoF) performance of the proposed model using the Mean Square Error (MSE) [22]. The MSE measures the squared difference between the actual and estimated values; a smaller MSE indicates a smaller fitting error and better performance.
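The two preprocessing and scoring steps mentioned above, min-max normalization to [0, 1] and the MSE, can be sketched in a few lines:

```python
def normalize(values):
    """Min-max scale a sequence into [0, 1] before feeding it to the ANN."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def mse(actual, estimated):
    """Mean Square Error between actual and estimated cumulative failures."""
    return sum((a - e) ** 2 for a, e in zip(actual, estimated)) / len(actual)
```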
Step 1.
(1) Normalize the input and output data set patterns.
(2) Initialize the weights to small random values.
(3) Set the error-rate stopping criterion.
(4) Derive the activation functions for the hidden layer and the output layer from the mean value function.
Step 2. Calculate the input and output values of the hidden and output neurons using the activation functions.
(1) Output of an input neuron = the input data value.
(2) Output of a hidden or output neuron o_j = f(NET_j), where NET_j = Σ_i w_ji · x_i, f is the activation function, w_ji is the weight from neuron i to neuron j, and the x_i are the input data set pattern values.
Step 3. Calculate the error δ, starting from the output layer and working backward to the hidden layers recursively:
δ_k = (d_k − o_k) · f′(NET_k), where δ_k is the error of output neuron k, d_k the desired output, and o_k the actual output. Calculate the errors for the hidden-layer neurons similarly.
Step 4. Adjust the weights of the output and hidden layers:
w_ji = w_ji + Δw_ji, where Δw_ji = η · δ_j · x_i and η is the learning-rate coefficient; the weights are adjusted by the gradient descent method, in which the weight change is proportional to the partial derivative of the error.
Repeat Steps 2 to 4 until the stopping criterion is met.
Box 1: ANN feed-forward back-propagation procedure for parameter estimation.
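The steps above can be sketched in miniature. The fragment below collapses the network to its essence: a gradient-descent fit of the two-parameter G-O-style curve m(W) = w1(1 − exp(−w2·W)), with the activation function derived from the mean value function as in Step 1(4). The data points, learning rate, and epoch count are illustrative, not from any real data set.

```python
import math

# Illustrative normalized (effort, cumulative-failures) pairs -- not a real data set.
data = [(0.1, 0.30), (0.3, 0.62), (0.5, 0.80), (0.7, 0.90), (1.0, 0.97)]

w1, w2 = 1.0, 1.0   # weights playing the roles of a and b in m(W) = w1 * (1 - exp(-w2 * W))
eta = 0.5           # learning-rate coefficient used in the Step 4 update

for _ in range(5000):                              # repeat Steps 2-4 (fixed-epoch stopping criterion)
    for w, y in data:
        out = w1 * (1.0 - math.exp(-w2 * w))       # forward pass (Step 2)
        err = y - out                              # output-layer error (Step 3)
        # gradient-descent weight adjustment (Step 4): change proportional to -dE/dw
        w1 += eta * err * (1.0 - math.exp(-w2 * w))
        w2 += eta * err * w1 * w * math.exp(-w2 * w)

train_mse = sum((y - w1 * (1.0 - math.exp(-w2 * w))) ** 2 for w, y in data) / len(data)
```

After training, w1 and w2 approximate the model parameters, and train_mse is small for data that follow the assumed curve.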

DS-1 [20]
Release-1, one of the four software releases cited from Wood for the Tandem Computers Company. It was tested for 20 weeks (t), during which 100 software failures (y) were found and 10000 CPU hours (W) were consumed.

DS-2 [20]
Release-2 from the Tandem Computers Company. It was tested for 19 weeks (t), during which 120 software failures (y) were found and 10272 CPU hours (W) were consumed.

DS-3 [20]
Release-3 from the Tandem Computers Company. It was tested for 12 weeks (t), during which 61 software failures (y) were found and 5053 CPU hours (W) were consumed.

Parameter Estimation Using Artificial Neural Network
We use a feed-forward ANN with the back-propagation algorithm to estimate the parameters of the proposed model. Thus, the mean value function of the proposed SRGM with log-power TEF (2) is expressed in terms of weights as

m(t) = w1(1 − exp(−w2 w3 ln^{w4}(1 + t))),

where w1, w2, w3, and w4 are the weights of the software reliability model, corresponding to the parameters a, b, α, and β, and their values are determined using the ANN. Here, the activation functions of the ANN are developed according to the mean value function of the selected SRGM and testing effort function [16]. To estimate the weight values, software failure data available in the form (t, W, y) is used, where t is the cumulative testing time measured in appropriate time units such as weeks or months, W is the effort expended in terms of the number of hours, and y is the corresponding cumulative number of failures.
The ANN feed-forward back-propagation procedure for parameter estimation is given in Box 1.

Machine Learning Technique to Select Appropriate Weights of the Proposed Model
The goodness-of-fit statistic indicates the quality of the fit to the past data. The objective is not only to obtain a better fit for the past data, but also to ensure that the model will describe the future data equally well. Traditionally, the predictive validity, both short-term and long-term, of software reliability models was measured to confirm that a model will describe the future data well. We apply the hold-out cross-validation approach, a conventional machine learning technique, to obtain a good fit for the past data as well as the predictive validity to describe the future data [23]. Multiple sets of weights may lead to an equally good fit when we use an ANN; different good fits are possible depending on the random start values assigned to the weights. Selecting weights based only on the minimum training error could be misleading, since the model may not describe future data with the same accuracy. If the selected weights result in a low training error but a high validation error, this is due to high variance, that is, overfitting. Hence, after arriving at the minimum training error (on the 60% training data set) with the selected weights, we carry out validation (on the nonoverlapping 20% validation data set) to ensure that the model will fit new data adequately. Box 2 describes the cross-validation procedure used to select appropriate weights of the model. Table 3 provides the Mean Squared Error values for both training and cross-validation for two trial weight sets of the proposed model.

(1) Train on the 60% training data set.
(2) Calculate the training data set accuracy by propagating the error through the network and adjusting the weights using the ANN feed-forward back-propagation algorithm.
(3) Validate on the next 20% validation data set.
(4) If the threshold validation accuracy is met, stop training; else continue training until the threshold validation accuracy is met.
(5) Test on the remaining 20% test data set to confirm the selection of the model with appropriate weights.
Box 2: Machine learning cross-validation procedure to select appropriate weight values.
It can be seen that, although the training error is more or less the same for both Trial-1 and Trial-2, the validation error is significantly higher for Trial-1 on both data sets, so the Trial-1 weights will not describe the future data well. Since both the training and validation errors are lower for the Trial-2 weights, the model with those weights will fit the future data equally well.
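A minimal sketch of the selection logic in Box 2, with hypothetical trial names and error values: the split preserves the time order of the failure data, and the candidate weight set with the lowest validation error is retained.

```python
def split_holdout(series):
    """60% training / 20% validation / 20% test split, preserving time order."""
    n = len(series)
    i, j = int(0.6 * n), int(0.8 * n)
    return series[:i], series[i:j], series[j:]

def select_weights(candidates, validation_error):
    """Among candidate weight sets with similar training error, keep the one
    whose validation error is lowest (guarding against overfitting)."""
    return min(candidates, key=validation_error)
```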

Performance Analysis
Once the appropriate weights of the proposed model are determined as above, the model is evaluated on the remaining 20% test data to confirm the selected weights. The MSE calculated on the test data is given in Table 4.
To study the relative performance of the testing effort functions, we compare the proposed log-power TEF with the previously proposed Weibull test effort function [11], both used in the G-O model [1]. The results confirm the suitability of the proposed log-power TEF.

Determining When to Stop Testing: Use of Proposed SRGM
Determining when to stop testing and release the software for operational use is one of the applications of Software Reliability Growth Models [22,24]. Since the estimation of the optimum release time based on conditional reliability does not converge [19], release time determination was carried out by Subburaj and Gopal using a minimum target failure intensity as the criterion instead of reliability [5], which converged after a few phases of testing. We adopt the same approach to determine when to stop testing using the proposed model. Box 3 describes the procedure for software release time determination using the failure intensity to stop testing. The failure intensity function of the proposed log-power TEF based SRGM is obtained by differentiating (2):

λ(t) = dm(t)/dt = abαβ (ln^{β−1}(1 + t) / (1 + t)) exp(−bα ln^β(1 + t)).  (3)

A target failure intensity of 1.663 failures per week is set for software failure data set DS-4. The target failure intensity is achieved, and testing can be stopped, at 25 weeks, by which time 1166 failures were observed, as given in Table 5. When we use an effort-based SRGM, we can not only find the optimum testing time (t_OPT) but also determine the effort needed to achieve the target reliability, as illustrated in Table 5.
(1) Set the target failure intensity for stopping testing; it depends on the software failure data set and customer requirements.
(2) In Phase I, estimate the parameters using up to 25% of the total number of failures in the software failure data set.
(3) Find the optimal testing time (t_OPT) needed to meet the target failure intensity using λ(t).
(4) In the next phase, estimate the parameters up to t_OPT with the software failure data set, and repeat step (3) with the updated parameter estimates: if the failure intensity is less than or equal to the target, stop testing; else repeat step (3) until the target failure intensity is achieved with the required t_OPT.
Box 3: Procedure for software release time determination to stop testing.
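The procedure in Box 3 can be sketched as a simple week-by-week scan of the failure intensity λ(t) derived from the proposed model. All parameter values below are illustrative, not estimates from any of the data sets.

```python
import math

def intensity(t, a, b, alpha, beta):
    """Failure intensity lambda(t) = dm/dt of the log-power-TEF SRGM."""
    w = alpha * math.log(1.0 + t) ** beta                               # W(t)
    dw = alpha * beta * math.log(1.0 + t) ** (beta - 1.0) / (1.0 + t)   # W'(t)
    return a * b * dw * math.exp(-b * w)

def optimal_release_week(target, a, b, alpha, beta, max_weeks=500):
    """First whole week at which the target failure intensity is met (Box 3, step 3)."""
    for t in range(1, max_weeks + 1):
        if intensity(t, a, b, alpha, beta) <= target:
            return t
    return None
```

In each phase of Box 3, the parameters would be re-estimated from the data collected so far and the scan repeated until the returned t_OPT stabilizes.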

Summary and Conclusions
In time-based Software Reliability Growth Models (SRGMs), we assume that the testing effort is constant over time, which may be unrealistic at times. Effort-based SRGMs are more realistic and result in a better goodness of fit; hence, some SRGMs with testing effort functions were proposed in the past. We propose the log-power TEF, which is an infinite test effort function, since logically the test effort will be infinite at infinite testing time. The proposed log-power TEF based SRGM describes all types of failure data equally well. The goodness of fit indicates the quality of the fit to the past data; it does not assure that the future data will be fitted equally well. Hence, we determine the appropriate weights using a machine learning technique so as to select the SRGM that will describe both the past and the future failures equally well. Instead of conventional parameter estimation methods, we use an ANN for parameter estimation. Although the previously proposed SRGM uses the Weibull distribution for the effort function, our study reveals the log-power TEF to be simple and equally good, and a natural choice for a TEF. The study confirms that the SRGM with log-power TEF improves the accuracy of parameter estimation over the existing TEF and can be used for software release time determination as well. It is clear that the proposed log-power TEF based SRGM, selected using the machine learning technique, improves the goodness-of-fit performance over the previously proposed Weibull TEF based SRGM.