Parameter Estimation for Type III Discrete Weibull Distribution : A Comparative Study

The type III discrete Weibull distribution can be used in reliability analysis for modeling failure data such as the number of shocks, cycles, or runs a component or a structure can overcome before failing. This paper describes three methods for estimating its parameters: two customary techniques and a technique particularly suitable for discrete distributions, which, in contrast to the two other techniques, provides analytical estimates, whose derivation is detailed here. The techniques’ peculiarities and practical limits are outlined. A Monte Carlo simulation study has been performed to assess the statistical performance of these methods for different parameter combinations and sample sizes and then give some indication for their mindful use. Two applications of real data are provided with the aim of showing how the type III discrete Weibull distribution can fit real data, even better than other popular discrete models, and how the inferential procedures work. A software implementation of the model is also provided.


Introduction
Most reliability studies assume that time is continuous, and continuous probability distributions such as exponential, gamma, Weibull, normal, and lognormal are commonly used to model the lifetime of a component or a structure.These distributions and the methods for estimating their parameters are well known.In many practical situations, however, lifetime is not measured with calendar time: for example, when a machine works in cycles or on demands and the number of cycles or demands before failure is observed; or when the regular operation of a system is monitored once per period, and the number of time periods successfully completed is observed.Moreover, reliability data are often grouped into classes or truncated according to some censoring criterion.In all these cases, lifetime is modeled as a discrete random variable (r.v.).Indeed, not much work has been done in discrete reliability.Generally, most reliability concepts for continuous lifetimes have been adapted to the discrete case; in particular, discrete analogues of continuous distributions have been introduced [1].In this context, geometric and negative binomial distributions are the corresponding discrete alternatives for the exponential and gamma distributions, respectively.Yet, discrete lifetime distributions can be defined without any reference to a continuous counterpart.Bracquemond and Gaudoin [2] provided an exhaustive survey on discrete lifetime distributions.
The (two-parameter) continuous Weibull distribution is one of the most widely used stochastic distributions for modeling the life of a component; it is very flexible since, for different chosen shape parameters, it models either increasing or decreasing failure rates [3].As a discrete alternative to the continuous Weibull distribution, three main forms have been proposed.The first one was introduced in Nakagawa and Osaki [4] and is referred to as type I discrete Weibull; it mimics the cumulative distribution function of the continuous Weibull distribution.The second one (type II) was proposed and studied in Stein and Dattero [5]; it mimics the hazard rate of its continuous counterpart.The third, which is referred to as type III and this paper considers, was introduced in Padgett and Spurrier [6]: their approach does not start from the continuous Weibull distribution but tries to generalize the notions of hazard rate and mean residual life to the discrete case [7].From a different perspective and with a different objective, Roy and Dasgupta [8] proposed a discretization method for continuous random variables for computing reliability in complex stress-strength models, with a specific application to the Weibull r.v.; however, the support of the discrete r.v.thus provided is not the set of nonnegative integers.
Type III discrete Weibull (henceforth simply discrete Weibull) r.v. is not similar in functional form to any of the functions describing a continuous Weibull distribution.It is defined by the following cumulative distribution: with  > 0 and  ≥ −1 (and not  ∈ R, as stated in Padgett and Spurrier [6]) or equivalently by the following probability mass function: which can be more compactly rewritten as follows: letting ∑  =1   = 0 if  = 0.The corresponding survival function is and the hazard rate (or failure rate) function is The complexity of the expression of the probability mass function (3) has somehow hindered the use and diffusion of this discrete model.A plot of the probability mass function of the discrete Weibull for three combinations of its parameters is given in Figure 1.Note that for  = 0, the discrete Weibull reduces to a geometric r.v. with parameter 1 − exp(−), characterized then by a constant failure rate; for  > 0, the distribution has an increasing failure rate, whereas for  < 0, it has a decreasing failure rate.The first and second moments cannot in general be expressed in a closed form but can only be expressed as an infinite series The first moment ( 6) is finite for  > −1 or for  = −1 and  > 1; the second moment ( 7) is finite for  > −1 or for  = −1 and  > 2 (see the appendix).This study first describes and discusses procedures for estimating parameters  and  of the discrete Weibull r.v., one of which is an original adaptation of a technique used for some discrete distributions, emphasizing their practical limits (Section 2).An extensive Monte Carlo study, implemented through an adhoc package developed under the R environment, assesses and compares the performance of these estimators, in terms of bias and variability, for different combinations of the parameters and sample sizes (Section 3).The estimation procedures are applied to two datasets taken from the literature (Section 4), and, finally, the summarizing remarks conclude the paper (Section 5).

Parameters Point Estimation
Focusing on the point estimation of the parameters of the discrete Weibull r.v., based on an observed simple random sample x = ( 1 ,  2 , . . .,   ) of size , three techniques are now described: the method of proportion, which is strictly related to the specific features of the distribution function of the discrete Weibull r.v.; the maximum likelihood method; and the method of moments.Below, we assume that both parameters are unknown.Special attention will be devoted to detecting samples leading to implausible estimates or to no estimate at all with either technique.
Method of Proportion.This method was originally introduced in Khan et al. [9] for type I discrete Weibull and was extended by Jazi et al. [10] to discrete inverse Weibull.It relies upon the estimation of a probability (or two or more probabilities, according to the number of parameters involved) using the corresponding sample proportion(s).For the present model, we have ( = 0) = 1 − exp(−), which can be estimated through the proportion of 0's in the sample, /, where  = ∑  =1    =0 denotes the number of 0's in the sample,   the indicator function, which equals 1 if  is true and 0 if  is false.Thus, an estimate of  is Similarly, the probability ( = 1) = exp(−)[1 − exp(−2  )] can be estimated by the proportion of 1's in the sample, /, with  = ∑  =1    =1 .Consequently, also substituting for  its estimate (8) (see the appendix),  is estimated by Though the method of proportion easily provides an analytical expression for both estimates, an apparent weakness is that this method does not exploit all the information contained in the sample, since the estimates involve only the rates of 0's and 1's.Moreover, it fails to provide feasible estimates for  if there are no 0's in the sample.In this case, ĉ = 0, which is a boundary value for , and β in (9) cannot be computed.This is particularly common for small samples, and for small values of .If in the sample there are no 1's ( = 0), then by formula (9), the estimate β is not available again.The method fails in computing an estimate for  even if the sample contains only 0's and 1's ( +  = ).
The method of proportion can also lead to "implausible" estimates of , that is, estimates that do not belong to its parameter space: β < −1.This happens when which is equivalent to say or in terms of The probability of an "implausible" estimate could be theoretically derived from the last inequality, remembering that  and  are the first two marginal distributions of a trinomial r.v. with parameters (, . Figure 2 shows the combinations of / and / leading to such implausible estimates as a subset of the triangular region (0 ≤ / ≤ 1; 0 ≤ / ≤ 1 − /).For example, in the sample x = (0, 0, 0, 1, 2, 2, 3, 5, 6, 8), with / = 0.3 and / = 0.1, the method of proportion provides, through ( 8) and ( 9), the estimates ĉ = 0.357 and β = −1.210.

Maximum Likelihood Method.
Having defined the loglikelihood function as (, ; x) = log ∏  =1 ( =   ; , ), we obtain The maximum likelihood estimates of  and  (ĉ ML , βML ) are defined as the values that maximize the log-likelihood function The two first derivatives of (, ; x) are quite easily computed, but, as already noted in Padgett and Spurrier [6], the solution to the maximization of (, ; x) (with the constraints that  and  belong to in their natural parameter spaces) cannot be derived in a closed form, by equating the expressions in (15) to zero, but can be obtained only numerically, for example, using one of the functions nlm, optim, and Rsolnp in the R programming environment, which allows the user to solve nonlinear constrained or unconstrained minimization/maximization problems.Even this method cannot be successfully applied to every possible sample; in particular, the method fails in providing a solution if  +  =  (i.e., if the sample contains only 0's and 1's).In this case, in fact, it can be easily proved that the first-order  derivative of the log-likelihood is never null; the log-likelihood does not have an absolute maximum in the parameter space.Let us consider, for example, the following sample: x = (0, 1).The corresponding likelihood function is given by and the log-likelihood The first-order derivatives are The derivative (18b) is never null but tends to zero for  → +∞; when  → +∞ the third addend in the derivative (18a) goes to zero, and solving the equation Note that this is the same estimate the method of proportion supplies.

Method of Moments.
Through this method, the parameter estimates are obtained solving, in terms of  and , the equations where  1 and  2 are the first and second sample moments: /.Since they cannot be solved analytically, as suggested in Khan et al. [9], one can numerically minimize, with respect to  and , the following quadratic loss function: The task can be carried out, for example, again using the functions nlm, optim, or solnp in the R environment [11].
The solution is the couple (ĉ  , β ), which should correspond to the value L(ĉ  , β ) = 0. Note that the minimization should be subject to the natural constraints  > 0 and  > −1.Without these constraints, the minimization algorithm may get stuck in implausible intermediate-step solutions.Indeed, the optim function under the R environment seems to provide excellent results in terms of convergence to the optimal solution, even without setting the constraints on  and .As to the launch values for  and , required by any minimization function, one can set  =  0 = 0 and  =  0 = log[(1 + )/].This pair of values is always computable and feasible (unless the sample contains all 0's) and ensures at the first iteration that  1 = E(): in fact, recalling (6), When the sample contains only 0's and 1's, the method of moments is not applicable.To see this, first consider the probability mass function of the discrete Weibull r.v., and note that by letting  tend to +∞ in (3), it degenerates into a r.v. that takes only two values: 0 with probability 1 − exp(−) and 1 with probability exp(−).It is then clear that if the sample is made up of a fraction  of 0's and a fraction (1 − ) of 1's, the equality of both first and second moments computed on the sample and on the original r.v.holds for 1 − exp(−) = ; that is,  = − log(1 − ) and  → +∞.In this case, the loss function L(, ) does not admit an absolute minimum, it only admits an inferior limit.
For some samples, the numerical minimization procedure can require a huge computation time, much larger than that required by the maximum likelihood method.This is due to the iterated calculation of the first and second moments, which is itself numerical and particularly time consuming for the negative values of  (in this case, in fact, the convergence of the series in ( 6) and ( 7) is slower).
Given the complexity of the estimators derived through the methods listed in this section (and only the method of proportion provides an analytical expression for them), not as much can be analytically derived about their statistical properties for finite sample size, that is, bias and variability.When the method of proportion can be applied, it provides a consistent estimator for ; β is consistent as well, but nothing can be said about their unbiasedness (see the analogous discussion in Khan et al. [9] for type I discrete Weibull).For large samples, the general properties of the estimators derived from the maximum likelihood method and the method of moments can be recalled.In the next section, an extensive simulation study is presented, which was performed to investigate the performance of the estimation methods and outline some practical advice for their use.

Simulation Study
The estimators presented in the previous section were investigated through an extensive Monte Carlo study; they were compared in terms of bias (), defined as ( θ ) = E( θ ) − , and root mean square error (RMSE), defined as RMSE( θ , where  denotes one of the two parameters ( or ) and θ denotes one of the three corresponding estimators, according to the method indicated by the subscript  (method of proportion, ; maximum likelihood method, ML and method of moments, ).

Simulation Design.
In this study, several parameter combinations and sample sizes ( = 20, 50, 100) were considered.The values (, ) were chosen in order to explore a large spectrum of the discrete Weibull distribution, in particular to comprise increasing, constant, and decreasing failure rates.At the same time, the parameters were set in order to keep the discrete nature of the distribution reasonable: values entailing a nonnegligible probability for a large number of integer values were deliberately excluded (in this case, a continuous r.v.should be preferred to model failure data); parameter values ensuring a nonnegligible probability for just the first integers were avoided as well (one assumes the component that is monitored is likely to last for more than 1 or 2 cycles).Table 1 shows the combinations of values of  and  examined in the simulation study along with the corresponding expected value, standard deviation, and 99% quantile.A note about the computation of the expected value and standard deviation is due: they are calculated numerically-see formulas ( 6) and ( 7)-considering the first  max integers, with  max = 2 −1 (1 − ) and  as small as possible (here,  = 0.0001, which actually proved to be a satisfactory practical choice for the considered scenarios).

Software Implementation.
The simulation study was based on 5,000 Monte Carlo replications and carried out under the R environment [11].In particular, the necessary code was developed to implement the probability mass function, the cumulative distribution function and its inverse, and the random generation for the discrete Weibull model; to compute the first and second moments; and to to realize the algorithms corresponding to each estimation method.This code, structured as an R package, DiscreteWeibull, is freely available in the CRAN repository [12].

Simulation Results.
Table 2 shows the bias and root mean square error for both estimators derived from each of the three methods, under each combination of the two parameters and for each sample size.The results took into account the applicability limits of the three methods, computing the Monte Carlo averages over the feasible samples only.For the smallest sample size ( = 20) and, more rarely, for  = 50, the method of proportion met a certain percentage of infeasible samples under several scenarios, the highest (about 24%) with  = 20,  = 0.1, and  = 0.25.The maximum likelihood method and the method of moments encountered only a few infeasible samples under some scenarios, whose rate was in any case always smaller than 1%.
Note that strictly speaking, to compare the behavior of the estimators under different scenarios, one should use their relative bias and relative root mean square error as performance indexes, that is, the bias and the root mean square error divided by the MC expected value of the estimator.This way of proceeding would provide a more correct "double" reading of the results, assessing the performance of a single estimator moving through different scenarios and comparing the performance of the three estimators under a fixed scenario.Our choice of using "absolute" instead of "relative" indexes was partially dictated by the necessity of including negative and null values for , which would represent difficulty when computing the relative indexes (the denominator could take very small values, dramatically amplifying the corresponding absolute value of the index and making the reading arduous).
Trying to synthesize the results, let us start with the largest sample size ( = 100) and the estimators of parameter .It is quite evident that all three estimators show a modest bias, regardless of the values of  and .The maximum likelihood method looks the best in this sense, while the method of moments shows the largest bias in absolute value for high values of  and negative values of , and it is often negatively biased.As to the root mean square error, the method of moments and the maximum likelihood method present very similar values under each scenario; the method of proportion shows a larger value, especially for small values of .The magnitude of RMSE seems to be much more affected by the value of  rather than by  for all the estimators.
As to the estimators of , on average, the method of proportion provides the least biased estimator, unless  is too small, while the maximum likelihood method and, to a larger extent, the method of moments are significantly biased.While  and ML estimators always overestimate the true value of the parameter, the  estimator seems to underestimate it for lower values.In absolute value, the bias of the ML estimator increases as  increases for a fixed ; it increases as  increases for a fixed .The bias of the  estimator increases as  increases for a fixed ; it increases as  decreases for a fixed .In terms of variability, the  and ML estimators perform in a similar way for each parameter combination, while the  estimator is much more variable and presents very different values moving through the scenarios: its RMSE is from 2 to 6 times larger than the corresponding value of its competitors.The worst scenarios for the estimator, in this sense, are those connected with  = 0.1, and this is quite reasonable.In fact, in this case, the probability ( = 0) equals 0.095, which corresponds to the expected fraction of zeros in the sample, and this small value leads to a modest performance of the  method, which uses only proportions of 0's and 1's of the samples and discards the other information.
Decreasing the sample size to 50 and 20, the bias in absolute value and the root mean square error of the estimators for each scenario obviously increase.However, the behaviors and trends exposed for  = 100 hold still.The bias of the estimators of  derived by the method of moments and the maximum likelihood method becomes substantial especially for large values of  and negative values of ; in these cases, the method of proportion is less biased, but its RMSE is still much larger than those of its competitors.For high values of  (viz., equal to or larger than 0.8) and  = 20, the bias in absolute value of the estimators of  for the method of proportion and the method of moments tends to become much more substantial than that for the method of maximum likelihood.
Looking at the overall values of RMSE for the estimators of  and  obtained through the three methods, it is also evident that much more uncertainty is associated with the point estimation of the second parameter.
Figure 3 shows the MC distributions of the estimators of  and  for  = 50 and three combinations of parameters.From the analysis, it emerges that the positive bias of βML and β is due to the presence of a certain number of samples providing estimates much larger than the true value of , while the MC medians of both estimators are very close to it.These boxplots also emphasize the presence of implausible values for the estimates of  yielded by the method of proportion ( β < −1).
Trying to synthesize all the results presented herewhich are not, however, exhaustive-the method of proportion, despite its straightforward analytical derivation of the estimators, performs worse overall (especially in terms of variability) than the method of moments and the maximum likelihood method, and it is competitive only for some specific scenarios, namely, for medium/high values of , where it can better exploit the information contained in the sample.

Applications to Real Data
The methods that have been illustrated and empirically investigated in the previous sections are applied to two datasets.The first one, called J1 [13], contains the number of failures of software observed over 62 weeks, and its frequency distribution is reported in Table 3.Assuming that the statistical distribution underlying the data is a type III discrete Weibull, it is possible to compute the estimates yielded by the three estimators described in Section 3, which are reported in Table 4.Note that the method of moments and the maximum likelihood method provide very similar estimates for  and ; the method of proportion supplies an estimate of  that is quite different from the other two methods (moreover, it is negative), while the estimate of , ĉ is quite close to ĉ and ĉML .If we want to test the goodness of fit of the discrete Weibull model for the data, we can use the chi-square statistic  = ∑  =1 (  −  ) 2 /(  ), where   denotes the category frequencies and   denotes the probability of an observation falling into the th category under the study model, such that ∑  =1   = 1 and ∑  =1   = .If the model holds,  is asymptotically distributed as a chi-square with  − 1− degrees of freedom, where  is the number of parameters to be estimated (in this case,  = 2).Before computing the empirical value of , under each model we have to group the empirical distribution of the data (many 0's and 1's) is favorable to the method of proportion, and the large sample size ( = 647) should ensure that all three methods are reliable.Testing the goodness-of-fit of the type III discrete Weibull model with the ML parameters, which requires grouping the last two categories (then  = 5), the value of the chi-square statistic with  − 1 −  = 2 degrees of freedom is 3.9595, and the corresponding -value is 0.1381.Note that this model fits the data better than the natural negative binomial model considered by Greenwood and Yule [14] and the generalized Poisson model proposed by Consul and Jain [15].

Conclusion
This paper examined three estimators for the parameters of the type III discrete Weibull random variable, which represents an alternative distribution to the geometric and negative binomial for modeling discrete reliability data, and can ensure increasing and decreasing failure rates.One of the methods presented (method of proportion) has recently been introduced in discrete models, and here it is newly phrased; it provides a closed form for the estimates of both parameters.The other two methods (maximum likelihood method and method of moments) are standard approaches for estimating parameters, but due to the complex expression of the probability mass function, they provide the estimates as a numerical solution to a minimization/maximization problem.It follows that not as much can be said about the statistical properties of the estimators for a finite sample size; then, an extensive Monte Carlo simulation study was carried out to assess their behavior.Far from giving a definitive solution to the problem, the study highlighted that the method of proportion, when applicable, can provide reliable estimates even for small sample sizes only under specific parameter configurations, whereas under other configurations it may provide poor results, especially in terms of the accuracy of the estimator of the second parameter.The other two standard methods can be usefully adopted under most scenarios; caution is necessary since some parameter combinations may lead to nonnegligible bias of the corresponding estimators.
The paper stressed the potential practical applicability limits of each method, also in terms of computational burden, and illustrated their use through two examples with real data.

Figure 1 :
Figure 1: Probability mass function of type III discrete Weibull distribution for three combinations of parameters.

Figure 2 :
Figure 2: "Plausible" and "implausible" regions for the estimate of  through the method of proportion.

Table 1 :
Parameter combinations and corresponding expected value, standard deviation, and 0.99-quantile for the simulation study.

Table 2 :
Simulation results: bias (B) and root mean square error (RMSE) for the estimators of  and .
P: method of proportion, M: method of moments, and ML: maximum likelihood method.

Table 3 :
Empirical frequency distributions for the first example dataset.

Table 4 :
Estimates of the parameters of the discrete Weibull for the first example dataset.

Table 5 :
Empirical and theoretical frequency distributions for the first example dataset.

Table 6 :
Empirical frequency distribution for the second example dataset.

Table 7 :
Estimates of the parameters of the discrete Weibull for the second example dataset.