A Comparative Study of Data Transformations for Wavelet Shrinkage Estimation with Application to Software Reliability Assessment

In our previous work, we proposed wavelet shrinkage estimation (WSE) for nonhomogeneous Poisson process (NHPP)-based software reliability models (SRMs), where WSE is a data-transform-based nonparametric estimation method. Among many variance-stabilizing data transformations, the Anscombe transform and the Fisz transform were employed. We showed through numerical experiments with real software-fault count data that, in spite of its nonparametric nature, WSE can provide higher goodness-of-fit performance than the conventional maximum likelihood estimation (MLE) and least squares estimation (LSE) in many cases. With the aim of improving the estimation accuracy of WSE, in this paper we introduce three other data transformations to preprocess the software-fault count data and investigate the influence of different data transformations on the estimation accuracy of WSE through goodness-of-fit testing.


Introduction
In the field of software reliability engineering, the quantitative assessment of software reliability has become one of the main issues. In particular, there is strong interest in estimating the software intensity function from the software-fault count data observed in the software testing phases, since the software intensity function in discrete time denotes the number of software faults detected per unit time. This directly makes it possible to estimate the number of remaining software faults and the quantitative software reliability, which is defined as the probability that the software system does not fail during a specified time period under a specified operational environment. Moreover, these evaluation measures can be used in decision making such as the allocation of development resources and software release scheduling. Therefore, we are interested in developing a high-accuracy estimation method for the software intensity function.
Among the hundreds of software reliability models (SRMs) [1][2][3], nonhomogeneous Poisson process (NHPP)-based SRMs have gained much popularity in actual software testing phases. In many cases, the NHPP-based SRM is formulated as a parametric model, where the mean value function, or its difference in discrete time or derivative in continuous time, called the "software intensity function," can be considered as a unique parameter governing the probabilistic properties. One class of parametric NHPP-based SRMs is concerned with modeling the number of software faults detected in testing phases, initiated by Goel and Okumoto [4]. Afterwards, many parametric NHPP-based SRMs were proposed in the literature [5][6][7][8] from various points of view. However, it is well known in the software reliability engineering community that there does not exist a uniquely best parametric NHPP-based SRM which can fit every type of software-fault count data. This fact implies that nonparametric methods, which do not assume a parametric form, should be used to describe the software debugging phenomenon, which differs in each testing phase.
Apart from the traditional Bayesian framework, some frequentist approaches based on nonparametric statistics have been introduced to estimate the quantitative software reliability. Sofer and Miller [9] used an elementary piecewise linear estimator of the NHPP intensity function from the software-fault detection time data and proposed a smoothing technique by means of quadratic programming. Gandy and Jensen [10] applied a kernel-based estimator to estimate the NHPP intensity function in a nonparametric way. Barghout et al. [11] also proposed a kernel-based nonparametric estimation for the order statistics-based SRMs, where the likelihood cross-validation and the prequential likelihood approaches were used to estimate the bandwidth. Wang et al. [12] applied a similar kernel method to the NHPP-based SRMs, where they focused on the local likelihood method with a locally weighted log-likelihood function. By combining nonparametric estimation with the Bayesian framework, El-Aroui and Soler [13] and Wilson and Samaniego [14] developed nonparametric Bayesian estimation methods. It should be noted that these nonparametric estimation methods generally require a high computational cost, which in some cases may be comparable to or greater than the effort of model selection among the parametric SRMs.
Another class of nonparametric estimation methods for NHPP-based SRMs is the wavelet analysis-based approach, initiated by Xiao and Dohi [15]. They proposed the wavelet shrinkage estimation (WSE), which does not require solving any optimization problem, so that its estimation algorithms are much easier to implement than those of the other nonparametric methods. They compared their method with the conventional maximum likelihood estimation (MLE) and the least squares estimation (LSE) through goodness-of-fit testing. It was shown through numerical experiments with real software-fault count data that WSE can provide higher goodness-of-fit performance than MLE and LSE in many cases, in spite of its nonparametric nature.
The fundamental idea of WSE is to remove the noise included in the observed software-fault count data to get a noise-free estimate of the software intensity function. It is performed through the following three-step procedure. First, the noise variance is stabilized by applying a data transformation to the data. This produces time-series data in which the noise can be treated as Gaussian white noise. Second, the noise is removed using a Haar-wavelet-based denoising algorithm. Third, an inverse data transformation is applied to the denoised time-series data, yielding the estimate of the software intensity function. Among many variance-stabilizing data transformations, the Anscombe transform [16] and the Fisz transform [17] were employed in the previous work [15]. Other well-known square-root data transformations are the Bartlett transform [18] and the Freeman transform [19]; both the Anscombe transform and the Freeman transform are natural extensions of the Bartlett transform. This paper focuses on the first step of WSE and aims at identifying the influence that the data transformation exerts on the accuracy of WSE. The remaining part of this paper is organized as follows. In Section 2, we give a preliminary on NHPP-based software reliability modeling. Section 3 introduces the data transformations. Section 4 describes the WSE for NHPP-based SRMs in detail. In Section 5, we carry out real project data analysis and illustrate numerical examples to examine the effectiveness of the proposed methods. Finally, the paper is concluded with future research directions in Section 6.
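The three-step procedure above can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes the Anscombe transform for steps 1 and 3, a full-depth Haar decomposition with hard thresholding at the universal level √(2 ln n) for step 2, and an input length that is a power of two; all function names are ours.

```python
import math

def anscombe(y):
    # Step 1: variance-stabilizing Anscombe transform of Poisson counts.
    return [2.0 * math.sqrt(v + 3.0 / 8.0) for v in y]

def inv_anscombe(s):
    # Step 3: naive algebraic inverse of the Anscombe transform.
    return [(v / 2.0) ** 2 - 3.0 / 8.0 for v in s]

def haar_denoise(s, tau):
    # Step 2: full Haar decomposition, hard-threshold the detail
    # coefficients, then reconstruct.  len(s) must be a power of two.
    details, approx = [], list(s)
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        raw = [(a - b) / math.sqrt(2.0) for a, b in pairs]
        details.append([d if abs(d) > tau else 0.0 for d in raw])
        approx = [(a + b) / math.sqrt(2.0) for a, b in pairs]
    for d in reversed(details):
        approx = [v for a, dd in zip(approx, d)
                    for v in ((a + dd) / math.sqrt(2.0), (a - dd) / math.sqrt(2.0))]
    return approx

def wse(y):
    # Universal threshold sqrt(2 ln n); the noise variance is ~1 after step 1.
    tau = math.sqrt(2.0 * math.log(len(y)))
    return inv_anscombe(haar_denoise(anscombe(y), tau))
```

Note that with the threshold set to zero the pipeline reproduces its input exactly, since the Haar transform is invertible; all of the smoothing comes from the thresholding step.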

NHPP-Based Software Reliability Modeling
Suppose that the number of software faults detected through a system test is observed at discrete times i = 0, 1, 2, . . . . Let Y_i and N_i = Σ_{k=0}^{i} Y_k denote the number of software faults detected at testing date i and its cumulative value, respectively, where Y_0 = N_0 = 0 is assumed without any loss of generality. The stochastic process {N_i : i = 0, 1, 2, . . .} is said to be a discrete nonhomogeneous Poisson process (D-NHPP) if the probability mass function at time i is given by

Pr{N_i = m} = (Λ_i^m / m!) exp(−Λ_i),  m = 0, 1, 2, . . . ,

where Λ_i = E[N_i] is called the mean value function of a D-NHPP and means the expected cumulative number of software faults detected by testing date i. The function

λ_i = Λ_i − Λ_{i−1}

is called the discrete intensity function and implies the expected number of faults detected at testing date i. MLE is one of the most commonly used parametric estimation methods. Let θ denote the vector of parameters in the mean value function Λ_i = Λ_i(θ), and let x_i (y_i) denote the realization of N_i (Y_i). When the fault counts y_1, y_2, . . . , y_n are observed, the log-likelihood function of a D-NHPP is given by

LLF(θ) = Σ_{i=1}^{n} { y_i ln λ_i(θ) − λ_i(θ) − ln(y_i!) },

where x_i = Σ_{k=0}^{i} y_k and y_0 = x_0 = 0. Then the maximum likelihood estimate of θ, say θ̂, is given by the solution of argmax_θ LLF(θ), and the estimate of the software intensity function is λ̂_i = Λ_i(θ̂) − Λ_{i−1}(θ̂). In the following sections, we consider the problem of estimating the software intensity function from the noise-involved observations y_i in a nonparametric way.
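As a concrete illustration of MLE for a D-NHPP, the sketch below evaluates the log-likelihood under a Goel-Okumoto-type mean value function Λ_i(θ) = ω(1 − p^i) and maximizes it by a crude grid search. The model choice, the parameter grid, and the function names are our assumptions for illustration; a real implementation would use a numerical optimizer.

```python
import math

def llf(y, omega, p):
    # Log-likelihood of a D-NHPP with mean value function
    # Lambda_i = omega * (1 - p**i), so lambda_i = Lambda_i - Lambda_{i-1}.
    total = 0.0
    for i in range(1, len(y) + 1):
        lam = omega * (1 - p ** i) - omega * (1 - p ** (i - 1))
        total += y[i - 1] * math.log(lam) - lam - math.lgamma(y[i - 1] + 1)
    return total

def mle_grid(y):
    # Crude grid search over (omega, p) maximizing LLF(theta);
    # omega is searched upward from the total observed fault count.
    best = (None, None, -float("inf"))
    n = sum(y)
    for omega in (n * (1.0 + 0.1 * k) for k in range(1, 30)):
        for p in (0.5 + 0.01 * k for k in range(1, 50)):
            v = llf(y, omega, p)
            if v > best[2]:
                best = (omega, p, v)
    return best
```

The estimated intensity at date i is then ω̂ p̂^{i−1}(1 − p̂), the increment of the fitted mean value function.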

Variance Stabilizing Data Transformation
It is common to make use of data transformations (DTs) to stabilize the variance of Poisson data. By using a DT, the software-fault count data which follow the D-NHPP are approximately transformed to Gaussian data. The most fundamental data-transform tool in statistics is the Bartlett transform (BT) [18]. Let η_i denote the Poisson white noise, that is,

Y_i = λ_i + η_i.     (3)

Taking the BT, the random variables

B_i = 2√Y_i     (4)

can be approximately regarded as Gaussian random variables with the normal distribution N(2√(λ_i − 1/4), 1), so that the realizations

b_i = 2√y_i  (i = 1, 2, . . . , n)     (5)

can be considered as samples from N(2√(λ_i − 1/4), 1). That is, the transformed realizations b_i (i = 1, 2, . . . , n) by the BT are the ones from the normally distributed random variables

B_i = λ̃_i + ν_i,     (6)

where λ̃_i = 2√(λ_i − 1/4) is the transformed software intensity function, and ν_i is the Gaussian white noise with unit variance.
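The variance-stabilizing effect of the BT can be checked empirically. The snippet below is a self-contained illustration (the Poisson sampler uses Knuth's multiplication method and is not part of the original): it draws Poisson samples with several means and shows that the sample variance of 2√Y stays near 1 while the variance of Y itself grows with the mean.

```python
import math
import random
import statistics

def poisson(lam, rng):
    # Poisson sampler via Knuth's multiplication method (stdlib only).
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

rng = random.Random(12345)
for mean in (5.0, 20.0, 100.0):
    raw = [poisson(mean, rng) for _ in range(20000)]
    bt = [2.0 * math.sqrt(y) for y in raw]
    print(f"mean={mean}: Var(Y)={statistics.variance(raw):.1f}, "
          f"Var(2*sqrt(Y))={statistics.variance(bt):.2f}")
```

For a Poisson variable, Var(Y) equals its mean, whereas the variance of 2√Y is close to 1 once the mean is moderately large; this is precisely the property the Gaussian denoising step relies on.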
Bartlett [18] also showed that

b′_i = 2√(y_i + 1/2)     (7)

is a better transformation, since it provides a variance closer to 1 even when the mean of Y_i is not large. The Anscombe transform (AT) [16] is a natural extension of BT and was employed in our previous work [15]. AT is of the following form:

s_i = 2√(y_i + 3/8),     (8)

where s_i can be considered as observations of the Gaussian random variable S_i = 2√(Y_i + 3/8) with the normal distribution N(2√(λ_i + 1/8), 1). Freeman and Tukey [19] proposed the following square-root transform (we call it FT), which is also an extension of BT:

f_i = √y_i + √(y_i + 1).     (9)

They showed that the variance of the Gaussian random variable F_i = √Y_i + √(Y_i + 1) is the nearest to 1 among BT, AT, and FT if the mean of Y_i is small. Recently, these variance stabilization techniques were applied to the LSE of the mean value function for the NHPP-based SRMs [20]. Table 1 summarizes the DTs mentioned above. As mentioned in Section 1, the first step of WSE is to apply the normalizing and variance-stabilizing DTs to the observed software-fault count data. In this paper, we employ BT, AT, and FT in the first and third steps of WSE. Then, the target of denoising in the second step of WSE is the transformed data b_i, s_i, or f_i (i = 1, 2, . . . , n). Letting b̂_i, ŝ_i, and f̂_i denote the denoised b_i, s_i, and f_i, respectively, the estimate of the original software intensity function λ_i can be obtained by taking the inverse DT of b̂_i, ŝ_i, and f̂_i, as given in Table 1.
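The three DTs and their algebraic inverses, as used in the first and third steps of WSE, can be written compactly as follows. The inverse formulas are the naive algebraic inversions implied by Table 1, not bias-corrected inverses; the function names are ours.

```python
import math

# Forward transforms and their naive algebraic inverses.
def bt(y):      return 2.0 * math.sqrt(y)                    # Bartlett [18]
def bt_inv(b):  return (b / 2.0) ** 2

def at(y):      return 2.0 * math.sqrt(y + 3.0 / 8.0)        # Anscombe [16]
def at_inv(s):  return (s / 2.0) ** 2 - 3.0 / 8.0

def ft(y):      return math.sqrt(y) + math.sqrt(y + 1.0)     # Freeman-Tukey [19]
def ft_inv(f):  return ((f ** 2 - 1.0) / (2.0 * f)) ** 2     # from sqrt(y)+sqrt(y+1)=f
```

The FT inverse follows from √(y+1) − √y = 1/f, so that 2√y = f − 1/f = (f² − 1)/f.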

Wavelet Shrinkage Estimation for NHPP-Based SRM
Advances in Software Engineering

The Haar father wavelet and the Haar mother wavelet are defined by

φ(x) = 1 (0 ≤ x < 1), 0 (otherwise),
ψ(x) = 1 (0 ≤ x < 1/2), −1 (1/2 ≤ x < 1), 0 (otherwise),

respectively. By introducing the scaling parameter j and the shifting parameter k, the scaled Haar father wavelet and the scaled Haar mother wavelet are defined by

φ_{j,k}(x) = 2^{j/2} φ(2^j x − k),  ψ_{j,k}(x) = 2^{j/2} ψ(2^j x − k),

respectively. Then the target function, the transformed software intensity function λ̃_i (i = 1, 2, . . . , n), can be expressed in the following equation:

λ̃_i = Σ_k α_{j0,k} φ_{j0,k}(i) + Σ_{j ≥ j0} Σ_k β_{j,k} ψ_{j,k}(i),     (12)

where

α_{j0,k} = Σ_i λ̃_i φ_{j0,k}(i),     (13)
β_{j,k} = Σ_i λ̃_i ψ_{j,k}(i)     (14)

are called the scaling coefficients and the wavelet coefficients, respectively, for any primary resolution level j0 (≥ 0). For implementability, it is reasonable to set an upper limit instead of ∞ for the resolution level j. In other words, the highest resolution level must be finite in practice. We use J to denote the highest resolution level; that is, the range of j in the second term of (12) is j ∈ [j0, J]. The mapping from the function λ̃_i to the coefficients (α_{j0,k}, β_{j,k}) is called the Haar wavelet transform (HWT). Since b_i, s_i, or f_i (i = 1, 2, . . . , n) can be considered as observations of λ̃_i, the empirical scaling coefficients c_{j0,k} and the empirical wavelet coefficients d_{j,k} of λ̃_i can be calculated by (13) and (14) with λ̃_i replaced by b_i, s_i, or f_i. The noise involved in the empirical wavelet coefficients d_{j,k} can be removed by the thresholding method that we will introduce later. Finally, the estimate of λ̃_i can be obtained by taking the inverse HWT with the denoised empirical coefficients.
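For a signal whose length is a power of two, the HWT and its inverse reduce to repeated pairwise sums and differences. The sketch below (our own indexing convention, with primary resolution level j0 = 0) computes the coefficients and reconstructs the signal exactly when no thresholding is applied.

```python
import math

def hwt(x):
    # Discrete Haar wavelet transform of a length-2^J signal: returns the
    # coarsest scaling coefficient and the detail coefficients d_{j,k},
    # ordered from coarse to fine resolution.
    details, a = [], list(x)
    while len(a) > 1:
        details.append([(a[2*i] - a[2*i+1]) / math.sqrt(2.0) for i in range(len(a)//2)])
        a = [(a[2*i] + a[2*i+1]) / math.sqrt(2.0) for i in range(len(a)//2)]
    return a[0], list(reversed(details))

def ihwt(c, details):
    # Inverse HWT: rebuild the signal level by level from the coefficients.
    a = [c]
    for d in details:
        a = [v for ai, di in zip(a, d)
               for v in ((ai + di) / math.sqrt(2.0), (ai - di) / math.sqrt(2.0))]
    return a
```

Because the Haar basis is orthonormal, the transform preserves the energy of the signal (Parseval's relation), which is what lets the unit Gaussian noise level carry over unchanged to the empirical wavelet coefficients.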

Thresholding.
In denoising the empirical wavelet coefficients, the common choices of thresholding method are the hard thresholding

δ_τ^h(u) = u · 1{|u| > τ}

and the soft thresholding

δ_τ^s(u) = sgn(u)(|u| − τ)_+

for a fixed threshold level τ (> 0), where 1_A is the indicator function of an event A, sgn(u) is the sign function of u, and (u)_+ = max(0, u). There are many methods to determine the threshold level τ. In this paper, we use the universal threshold [21]

τ_ut = √(2 ln n)     (17)

and the "leave-out-half" cross-validation threshold τ_lht [22], where n is the length of the transformed observations. Hard thresholding is a "keep or kill" rule, while soft thresholding is a "shrink or kill" rule. Both thresholding methods and both threshold levels will be employed to denoise the empirical wavelet coefficients d_{j,k}.
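The two rules and the universal threshold can be stated directly in code. This is a generic sketch of the standard estimators, with σ = 1 reflecting the unit noise variance after the DT; the "leave-out-half" threshold is omitted since it requires a data-driven cross-validation procedure.

```python
import math

def hard(u, tau):
    # "Keep or kill": keep the coefficient unchanged if |u| > tau, else zero it.
    return u if abs(u) > tau else 0.0

def soft(u, tau):
    # "Shrink or kill": shrink surviving coefficients toward zero by tau.
    sgn = (u > 0) - (u < 0)
    return sgn * max(0.0, abs(u) - tau)

def universal_threshold(n, sigma=1.0):
    # Universal threshold of Donoho and Johnstone [21].
    return sigma * math.sqrt(2.0 * math.log(n))
```

Note that both rules agree on killing small coefficients; they differ only in whether a surviving coefficient is kept intact or shrunk by τ.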

Wavelet Shrinkage Estimation for NHPP-Based SRM.
Since the software-fault count data are Poisson data, preprocessing is necessary before making use of the Haar-wavelet-based denoising procedure. Xiao and Dohi [15] combined data transformation with the standard denoising procedure to propose the wavelet shrinkage estimation (WSE) for the D-NHPP-based SRMs. They call the HWT combined with AT the Haar-Anscombe transform (HAT). Similarly, we call the HWT combined with BT and FT the Haar-Bartlett transform (HBT) and the Haar-Freeman transform (HFT), respectively. In the numerical study, we investigate the goodness-of-fit performance of the HBT- and HFT-based WSEs and compare them with the HAT-based WSE [15].

Data Sets and Measures.
We use six real project data sets cited from reference [1], where they are named J1, J3, DATA14, J5, SS1, and DATA8. These data sets are software-fault count data (group data). For convenience, we rename them DS1∼DS6 in this paper. We employ the MSE (mean squared error) as a goodness-of-fit measure:

MSE = (1/n) Σ_{i=1}^{n} (y_i − λ̂_i)²,

where λ̂_i is the WSE estimate of the software intensity function λ_i. Additionally, we calculate the LL (log-likelihood), which is defined as

LL = Σ_{i=1}^{n} y_i ln(Λ̂_i − Λ̂_{i−1}) − Λ̂_n − Σ_{i=1}^{n} ln(y_i!),

where Λ̂_i = Σ_{k=1}^{i} λ̂_k (i = 1, 2, . . . , n).
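The two measures can be computed directly from the estimated intensity. The following sketch uses the identity Λ̂_i − Λ̂_{i−1} = λ̂_i to evaluate LL term by term; the function names are ours.

```python
import math

def mse(y, lam_hat):
    # Mean squared error between observed counts y_i and estimates lam_hat_i.
    return sum((yi - li) ** 2 for yi, li in zip(y, lam_hat)) / len(y)

def log_likelihood(y, lam_hat):
    # Poisson log-likelihood LL: each increment of the cumulative estimate
    # equals lam_hat_i (assumed > 0), so the terms are summed date by date.
    return sum(yi * math.log(li) - li - math.lgamma(yi + 1)
               for yi, li in zip(y, lam_hat))
```

A better (larger) LL and a smaller MSE both indicate a closer fit of the estimated intensity to the observed fault counts.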

Goodness-of-Fit Test.
A total of 16 wavelet-based estimation methods are examined in this paper, since each WSE is applied with four thresholding schemes: hard thresholding (h) versus soft thresholding (s), combined with the universal threshold (ut) versus the "leave-out-half" cross-validation threshold (lht). Let HBT(·, ·), HAT(·, ·), and HFT(·, ·) denote the WSEs based on the Haar-Bartlett transform, the Haar-Anscombe transform, and the Haar-Freeman transform, respectively. HBT1(·, ·) and HBT2(·, ·) correspond to the transforms in (5) and (7), respectively. Additionally, the HAT-based WSE was introduced in [15], but only its results with hard thresholding were shown there. Here, we present all of the results comprehensively for further discussion.
We present the goodness-of-fit results based on the two threshold levels in Tables 3 and 4, respectively. HBT2 provides smaller MSE and larger LL than the others when "ut" is used, regardless of the thresholding method employed. It is worth mentioning that "ut" provides the same estimates with hard and soft thresholding in three data sets (DS1, DS2, and DS4). This is due to the relatively large value of "ut": in these three software-fault count data sets, "ut" exceeds the empirical wavelet coefficients in magnitude, which results in δ_ut(d_{j,k}) = 0 whichever of hard or soft thresholding is applied. HBT2 also looks better than the others when soft thresholding is applied with "lht." However, when "lht" is combined with hard thresholding, HBT1 (HAT; HFT) possesses the best results in DS3 and DS5 (DS1; DS6), respectively. Since "lht" is considered a much more proper threshold level than "ut" in analyzing software-fault count data, we suggest that HBT2 be selected as an appropriate DT for the WSE.

Comparison with MLE.
Our concern in this section is the comparison of WSE with MLE. We estimate the software intensity function by using the three parametric D-NHPP-based SRMs listed in Table 2, where MLE is applied to estimate the model parameters for comparison. The HBT2-based WSE is selected as a representative among the 16 wavelet-based estimation methods. Figures 1 and 2 depict the estimates of the cumulative number of software faults and of the software intensity function (the number of faults detected per testing date) with DS1. The observed data are plotted as a bar graph in both figures. Looking at (i) and (iii) in these figures, it is clear that the parametric D-NHPP-based SRMs with maximum likelihood estimation can fit the real data. However, since the parametric D-NHPP-based SRMs assume that the software intensity function is a smooth function, they can estimate only the average tendency of the number of software faults detected at each testing date, but cannot follow the microscopic fluctuating behavior in (ii) and (iv) of Figures 1 and 2. In other words, the estimation accuracy based on the cumulative number of faults is hidden by the "cumulative effects" in (i) and (iii). The experimental results presented here show the potential applicability of the wavelet-based estimation methods with different thresholding schemes. The methods employed here do not incur an expensive computational cost compared with MLE (less than one second to obtain an estimate). This is a powerful advantage in applying the D-NHPP-based SRMs, in addition to the fact that practitioners do not need much time and effort to implement the wavelet-based estimation algorithms.

Concluding Remarks
In this paper, we have applied wavelet-based techniques to estimate the software intensity function. Four data transformations were employed to preprocess the software-fault count data. Through the numerical evaluation, we conclude that the wavelet-based estimation method with the Bartlett transform 2√(y_i + 1/2) has much more potential applicability to software reliability assessment practice than the other data transformations, because practitioners are not required to carry out troublesome model-selection procedures or to take care of computational issues such as judging convergence and selecting initial parameter guesses for general-purpose optimization algorithms. Note that the result obtained here does not mean that the other data transformations are poor, because the performance evaluation was executed only through goodness-of-fit testing. Although the predictive ability of the proposed methods is out of the scope of this paper, the predictive performance should be considered and compared in future work.