Software Reliability Growth Model with Partial Differential Equation for Various Debugging Processes

. Most Software Reliability Growth Models (SRGMs) based on the Nonhomogeneous Poisson Process (NHPP) generally assume perfect or imperfect debugging. However, environmental factors introduce great uncertainty for SRGMs in the development and testingphase.WeproposeanovelNHPPmodelbasedonpartialdifferentialequation(PDE),toquantifytheuncertaintiesassociated withperfectorimperfectdebuggingprocess.Werepresenttheenvironmentaluncertaintiescollectivelyasanoiseofarbitrary correlation.Underthenewstochasticframework,onecouldcomputethefullstatisticalinformationofthedebuggingprocess,for example,itsprobabilisticdensityfunction(PDF).Throughanumberofcomparisonswithhistoricaldataandexistingmethods, suchastheclassicNHPPmodel,theproposedmodelexhibitsacloserfittingtoobservation.Inadditiontoconventionalfocus onthemeanvalueoffaultdetection,thenewlyderivedfullstatisticalinformationcouldfurtherhelpsoftwaredevelopersmake decisionsonsystemmaintenanceandriskassessment.


Introduction
Software reliability, defined as the probability of failure-free operation under certain conditions and by specific time [1], is one of the significant attributes of software systems development life cycle.As the software systems mature with ever growing complexity, evaluation, prediction, and improvement of their reliability become a crucial and daunting task for developers in both the development and testing phases.Numerous Software Reliability Growth Models (SRGMs) have been developed [2][3][4][5], which generally agree that the fault debugging process is a Nonhomogeneous Poisson Process (NHPP) under various diverging assumptions: perfect and imperfect debugging phenomenon [6,7].
A common assumption is that removing detected faults will not introduce new faults, usually called perfect debugging phenomenon.Based on earlier works of Jelinski and Moranda [8], Goel and Okumoto [9] developed the exponential Software Reliability Growth Models (G-O model) with a constant fault detection rate.In latter models, the smoothly changeable fault detection rates are incorporated.Yamada et al. [10] found the fault detection rate fit to a S-shape growth curve and proposed the delay S-shaped SRGM.Employing the learning phenomenon in software fault detection process, Ohba [11] later developed an alternative inflection S-shaped SRGM.Further work was conducted by Yamada and others [12] by incorporating the relationship between the work effort and the number of faults detected into the model.Huang et al. [13] stated that fault detection and repair process are different in software development and operation.They proposed a unified theory to merge multiple moves into a NHPP-based SRGM and found a regularly changeable fault detection rate in the detection process.Later on Huang and Lyu [6] also used multiple change points to deal with imperfect debugging problems.The common assumption is that the removal of the detected faults will not introduce new faults.This scenario is known as perfect debugging.However, given the complexity of the software debugging process, debugging activity may be imperfect in practical software development environment and thus those perfect models may oversimplify the underlying dynamics.
To solve such problems, imperfect debugging processes are incorporated in latter models.Whenever the software system is developed, new errors can be introduced into the 2 Mathematical Problems in Engineering software during development/testing phase and faults may not be corrected perfectly.This phenomenon is known as imperfect debugging.According to the above two phenomenons, SRGMs can be divided into two categories: perfect and imperfect debugging model.Kapur and Garg [14] discussed their fault removal phenomenon that as the team gains more experience, they may remove detected faults without causing new ones.However, the testing team may not correct a fault perfectly and the original fault may remain or generate new faults, leading to a phenomenon known as imperfect debugging.It was Goel [15] who first introduced the concept of imperfect debugging.Latter, Ohba and Chou [16] developed an error generation model based on the G-O model, named as an imperfect debugging model.Pham et al. [17] proposed a general imperfect software debugging model (P-N-Z model).Pham [18] also developed an SRGM for multiple failure types incorporating error generation.Zhang et al. [19] proposed a testing efficiency model that includes both imperfect debugging and error generation.Kapur et al. [20] proposed a flexible SRGM with imperfect debugging and error generation using a logistic function for the fault detection rate, which reflects the efficiency of the testing and removal team.Employing the learning phenomenon in software fault detection process, Yamada et al. [21] later developed an imperfect debugging model.In imperfect software debugging models, the fault introduction rate per fault is generally assumed to be constant or decreasing changes over time [22].Recently, Wang et al. [22] also proposed a model to represent the imperfect debugging process with a loglogistic distribution.To sum up, most models presume a deterministic relationship between the cumulative number of detected faults and the time span of the software fault debugging process.
However, software debugging is a stochastic and uncertain process, which can be effected by many environmental factors, such as the distribution of resources, strategy, and running environment [23].In particular, in the uncertain network environment, the aforementioned assumptions of deterministic model would become problematic.The environmental noise introduces great uncertainties that significantly affects traditional debugging process.To address such challenges, new stochastic models [24][25][26][27] were proposed, which treat the debugging process as perfect and stochastic; they assume each failure occurrence is independent and follows identical random distribution.By employing the white noise, that is, temporally uncorrelated random variables, for the environmental factors collectively, they used a flexible stochastic differential equation to model the irregular changes.Comparing to conventional models, such white-noise-based approach assumes the perfect debugging and no doubt provides closer approximation to uncertain fluctuations in reality but with great mathematical simplicity.Debugging activity is usually imperfect in piratical software development and recent data [28,29] show that the fault detection is highly susceptible to noise and is generally correlated; thus earlier assumptions become problematic.Thus, because of its mathematical simplicity, it may also considerably underestimate the imperfect debugging process and the temporal correlation in a dynamic environment.
In this study, we propose an alternative stochastic framework based on partial differential equation (PDE), to describe the perfect/imperfect debugging processes, in which the environmental noise is of arbitrary correlation.Details of the model is provided in Section 2 with an equation for the probabilistic density function (PDF) of system states.In Section 3, we validate our approach with both historical failure data and results from conventional methods; main features of the model, including the temporal evolutions of its full statistical information, are later addressed in Section 4. The final conclusions are given in Section 5.

Problem Formulation
2.1.Model Formulation.Widely used in many practical applications, NHPP model assumes the following: (1) the fault debugging process is modeled as a stochastic process within a continuous state space; (2) the system is subject to failures at random times caused by the remaining errors in the system; (3) the mean number of detected faults in a time interval is proportional to the mean number of remaining faults; (4) when a fault is detected, it may be removed with the probability ; (5) when a fault is removed, new faults may be generated with a constant probability .
In order to describe the stochastic and uncertain behavior of the fault debugging process, in a continuous state space, the temporal evolution of () is routinely described as [12] d d =  () ( +  − ) , where () is the expected number of faults detected in the time interval (0, ],  denotes the number of total faults, and () is a nonnegative function representing the fault detection rate per remaining fault at testing time .We further note that () is the number of newly introduced bugs, while () means the number of successful removal; together they represent a (perfect/imperfect) debugging process in which probabilities  and  are assigned [30][31][32].Without a loss of generality, we denote () = () − () to facilitate subsequent presentation and  = 1 indicates a perfect debugging process.Since () is subject to a number of random environmental effects [23], we follow the previous studies [24,25,27] and represent () as a sum of its mean value ⟨()⟩ and a zeromean (random) fluctuation   ():  () = ⟨ ()⟩ +   () , ⟨  ()⟩ = 0. ( By substituting (2) into (1), we obtain a stochastic differential equation: In contrast to earlier works of [24,25,27] which approximate the fluctuations term   () as a white noise, we relax such assumption and denote the noise's two-point covariance more broadly as where  2  represents the strength of the noise,  denotes its correlation function,  and  indicate two temporal points, and  is the noise correlation time.

PDF Method.
Our goal is to derive an equation for the probabilistic density function of , that is,   (; ).To be specific, we adopt the PDF method originally developed in the context of turbulent flow [33] and use the Dirac delta function (⋅) to define a "raw" PDF of system states  at time : whose ensemble average over random realisations of  yields   (; ) Following recent study on PDF method [34], we multiply the original equation ( 3) with Π/ and apply the properties of the Dirac delta function ( − )() = ( − )().This would yield the stochastic evolution equation for Π in the probability space ( ∈ Ω): Now we take the ensemble average of ( 7) and apply Definition (6); by employing a closure approximation, variously known as the large-eddy diffusivity (LED) closure [35] or the weak approximation [36], an equation of   can be obtained where the eddy diffusivity D and effective velocity V are And Green's function G  (, ; , ) satisfies the deterministic partial differential equation subject to the homogeneous initial (and boundary) conditions corresponding to their (possibly inhomogeneous) counterparts for the raw PDF (7).

Model Validation
In this section, we validate our PDE model (8a), (8b), (8c), and (8d) by computing the mean fault detection ⟨⟩ and compare it with historical data and two other methods, namely, the classic deterministic SRGMs (generalised NHPP model) and the popular stochastic SRGMs (white-noise model).As shown in Table 1, four sets of data are used: the first one (DS1) is from WebERP system [37]; the second (DS2) originated from the open source project management software [38]; and the third (DS3) and fourth (DS4) are from the first and fourth product releases of the software products at Tandem Computers [32].Among those four sets of data, the two UDP/TCP-based systems (DS1 and DS2) are more likely to be affected by environmental factors, whereas the TCP-based software development at Tandem Computers Inc. takes place in a more enclosed/stable environment.UDP is a connectionless-oriented transport protocol and will introduce larger correlation strength for system based on it, while TCP is connection-oriented and will generate smaller correlation strength.We encapsulate such difference in terms of noise variance, for example, larger  2  for UDP/TCP-based systems (DS1 and DS2).Moreover, DS1, DS2, and DS3 are from the imperfect debugging process, while DS4 is from the perfect debugging process; we demonstrate this diverse with various configurations of the subsequent presentation .The testing time of DS1, DS2, DS3, and DS4 is 60 (months), 29 (weeks), 20 (weeks), and 19 (weeks), respectively.For detailed information, please refer to the data values tabulated in the Appendix.

Model Solution for
Constant .For model validation, we adopt the popular generalized Goel-Okumoto model (G-O), as this enables us to derive an analytical solution of (3) using separation of variables.The expression for the mean fault detection rate per remaining fault using the generalised G-O model is given by [41] where  ∈ [0, 1] is a constant and  is the shape factor.Following the PDF method, we can obtain an analytical solution of   : and 2  is the variance of the random integral    = ∫  0   ()d, whose PDF follows a Gaussian distribution,     ∼ N(0,  2  ), as outlined in Table 2.
The mean fault detection ⟨⟩ is then It is noted here that the fault detection rate model (12) describes an imperfect debugging process for constant .In many circumstances, time delay, failed removal, and new faults would lead to the problem that () is a timedependent variable [6,7,13,42], so that we are not able to get an analytical solution ⟨⟩ and can only obtain a numerical solution.In the current study, our focus lies on the uncertainty quantification framework of environmental factors in imperfect debugging process; without a loss of generality, subsequent approach could also be extended to the imperfect debugging process with time-dependent ().

Performance Criteria.
To evaluate the performance of our PDE model, we employ four comparison criteria, for example, the predicted relative error (PRE), mean square error (MSE), Coefficient of Determination ( 2 ), and root mean square prediction error (RMSPE).

Predicted Relative Error (PRE).
The relative difference between observed and estimated number of faults at time , known as predictive validity [7,43], is as follows: where  is the fault number from observed data at time .
Hence for a better fit to the data, we seek PRE() → 0 [44].

Mean Square Error (MSE).
Consider where (  ) and (  ) denote the estimated and observed number of faults detected at time   , ( = 1, 2, . . ., ), respectively,  is the total number of observation data points.

Coefficient of Determination (𝑅 2
).We define this coefficient as the ratio of the sum of squares resulting from the trend model to that from constant model:

Root Mean Square Prediction Error (RMSPE).
The average of prediction errors (PEs) is known as Bias [46].The standard deviation of predicted relative variation (PRV) is known as predicted relative variation [46].The root mean square prediction error (RMSPE) is a measure of the closeness with which the model predicts the observation and is a measure of closeness with which a model predicts the observation [46]: Given The lower the values of Bias, PRV, and RMSPE, the better the goodness of fit.

Model Parameter Estimation.
There are six input parameters for our PDE method: the total number of initial faults , the fault detection rate , the shape factor , the subsequent presentation , the noise strength  2  , and correlation time .Given a data set, we used SPSS to estimate the parameters , , and , and the methods for estimating three parameters include method of moments, least square method, maximum likelihood, and Bayesian methods [47][48][49].For the subsequent presentation , we use the empirical value of [30,39,40].About the noise parameters, we adopt earlier works on white noise [24] for the strength of our correlated noise  2  , which yields the optimal solution as discussed in the sensitivity analysis Section 3.5.Regarding the noise correlation time , we take the fact that a software failure is typically repaired within days [50] and hence set  = 0.1 (month) or (week) for later analysis.A stationary Gaussian process is prescribed to the fluctuation term,   ∼ N(0,  2  ), with an exponential correlation (, ; ) = exp(−| − |/).

Validation Results.
In this section, we validate our PDE model ( 12) by computing the mean fault detection ⟨⟩ and compare it with historical data and two other methods, namely, the classic deterministic SRGMs (generalised NHPP model) and the popular stochastic SRGMs (white-noise model).Figure 1 shows expected number of faults ⟨⟩ from our PDE model, the G-O model, and the white-noise model against four data sets over the testing time, respectively.Parameters of all three models are adjusted for their best fit.As demonstrated in Figure 1, though our PDE model overestimates the number of faults for DS1 and leads to underestimation for DS2, DS3, and DS4, overall it provides the overall best match to the four data sets, compared with the G-O model and the white-noise model.
Figure 2 shows the PRE results for our PDE model, G-O model, and white-noise model over the testing time.Similar trends are exhibited for all three models: large estimation error (deviation from zero) at the beginning but gradual convergence to observation at later time.This is expected, since few data can be utilised at earlier time to gauge the models' parameters; as time elapses and more data become available, the models' accuracy improve and hence their PRE approaches zero.Overall, the PDE model yields the smallest PRE.
We tabulated the aforementioned evaluation results of our PDE model, G-O model, and white-noise model for DS1, DS2, DS3, and DS4 in Table 3.Despite a slightly higher value of PRV for DS1, the PDE model presents best goodness-to-fit with all four data sets in all performance indicators.

Sensitivity Analysis.
Having validated our PDE model with four observation real data sets, we now conduct the sensitivity analysis to study the impact of correlation time , strength  2  , and function (, ; ) on our fault prediction.We first investigate the impact of the correlation time . has a large effect on the white-noise model, while its influence on the PDE model is very limited.
Lastly we conduct the sensitivity analysis of the correlation function.Five types of correlation function, exponential, besselj, normal, cosine, and cubic, are studied.Figure 5 and Table 6 show the results on DS2 for our PDE model with parameters  = 156,  = 0.5446,  = 0.026,  2  = 0.0002228,  = 0.5092, and  = 0.1, G-O model with parameters  = 94 and  = 0.146, and white-noise model with parameters  = 97,  = 0.16246,  = 0.026,  = 0.5092, and  2  = 0.0002228.It is clear that the correlation type significantly affect our model estimation.It also demonstrates that the PDE model with exponential correlation function fits best.

Reliability Assessment
Having validated our model with historical data, we now apply our framework first to study the detection process behaviour through its full statistical information in Section 4.1 and later to assess software reliability in Section 4.2.

Temporal Evolution of PDF.
In addition to conventional focus on the mean value of fault detection, ⟨⟩, we now include full statistical information, that is, the probabilistic density function (PDF) and the cumulative density function (CDF), to study system behaviour.
Three snapshots of the PDF   (; ) and CDF   (; ) at time  = 2, 10 and 50 weeks are plotted in Figure 6, respectively, using  = 156,  = 0.005446,  = 0.026,  2  = 0.0002228,  = 0.5092, and  = 0.1.The uncertainty of our prediction, encapsulated in the width of the PDF and the slope of the CDF, initially rises from 1 to 10 weeks; it later falls at 50 weeks as most of the faults are detected.Here we propose employing the CDF as a criteria to ensure certain levels of reliability.For example, in Figure 6, for a 90% software reliability in 2, 10, and 50 weeks, the minimum number of faults detected must be 3, 10, and 44, respectively.

Software Reliability and Reliability
Growth Rate.Software reliability (; ) is broadly defined as the probability that no software fault is detected in the time interval [,  + ], given that the last fault occurred at time .The software reliability can be expressed as Thus, the software reliability growth rate, RGR(; ), a measure of reliability change rate, can be computed as where The temporal derivative of the variance d  /d is listed in Table 2. Three snapshots of software reliability, (; ), at time  = 2nd, 10th, 50th weeks, are presented in Figure 7, using  = 156,  = 0.005446,  = 0.026,  2  = 0.0002228,  = 0.5092, and  = 0.1.Reliability at all three timeframes exhibits initial drop but such deterioration gradually slows down and (; ) eventually approaches zero as  → ∞.This is expected, since, at earlier time, most of the faults are yet removed; with elapsing time, more are debugged and this leads to a fall in software reliability; when the system is cleared of all faults, (; ) reaches a stationary state.Therefore, later timeframe ( = 50th week) also presents a sharper drop to zero.For a closer examination, we now look at the software reliability growth rate, RGR(; ), at  = 2nd, 10th, 50th weeks.As shown in Figure 7, RGR in all three timeframes demonstrates an initial rise to maximum and late fall to stationarity of zero, albeit earlier timeframe ( = 2nd week) exhibits higher zenith (RGR = 600), since fewer faults are removed at earlier stage.

Conclusion
We presented a novel model to quantify uncertainties associated with the perfect or imperfect debugging process.Treating the random environmental impacts collectively as a noise with arbitrary correlation, an equation of the distribution of fault detection, that is, its probabilistic density function (PDF), is derived.Our analysis leads to the following major conclusions.
(i) Under the conventional evaluation criteria on mean value of detected fault in the perfect or imperfect debugging process, our model provides best fit to observation data comparing to the classic G-O model and white-noise model.
(ii) The proposed approach also goes beyond uncertainty quantification based on mean and variance of system states by computing its PDF and cumulative density function (CDF).This enables one to evaluate probabilities of rare events, which are necessary for probabilistic risk assessment.
(iii) Predictive uncertainty of fault detection initially increases but gradually diminishes as time elapses; hence more maintenance is needed at the initial stage.
(iv) Software developers could also use CDF as a new criteria to assess system reliability, for example, by computing the minimum number of faults detected for an prescribed confidence level.
(v) The proposed correlated model describes an imperfect debugging process for constant , which enables us to obtain an analytical solution ⟨⟩.The approach could also be extended to the imperfect debugging process with time-dependent (), to help us drive a numerical solution by computing its PDF.This would be the focus of our future work.

Figure 3 and
Table 4 present the PRE, MSE,  2 , Bias, PRV, and RMSPE results on DS2 for our PDE model with parameters  = 156,  = 0.5446,  = 0.026,  2  = 0.0002228, and  = 0.5092, G-O model with parameters  = 94 and  = 0.146, and white-noise model with parameters  = 97,  = 0.16246,  = 0.026,  2  = 0.0002228, and  = 0.5092.It demonstrates that the PDE model fits best the real data and the repair time is  = 0.1 week.Hence a correlation time  = 0.1 (month or week) is a good approximation of fault repair time in reality.Next we study the impact of correlation strength.

Figure 2 :
Figure 2: Predicted relative error (PRE) for the PDE model, G-O model, and white-noise model are compared with four data sets in (a) DS1, (b) DS2, (c) DS3, and (d) DS4, respectively.

Figure 4 :
Figure 4: Sensitivity analysis of PRE for PDE model, G-O model, and white-noise model with various configurations of  2  .

Figure 5 :
Figure 5: Sensitivity analysis of PRE for PDE model, G-O model, and white-noise model with various correlation functions.

Table 1 :
Data sets for model validation.

Table 4 :
Various configurations of  in obtaining MSE,  2 , Bias, PRV, and RMSPE values of PDE model, G-O model, and white-noise model.

Table 5 :
Various configurations of  2  in obtaining MSE,  2 , Bias, PRV, and RMSPE values of PDE model, G-O model, and white-noise model.

Table 6 :
Various correlation functions in obtaining MSE,  2 , Bias, PRV, and RMSPE values of PDE model, G-O model, and white-noise model.