A Simple Normal Approximation for Weibull Distribution with Application to Estimation of Upper Prediction Limit

We propose a simple close-to-normal approximation to a Weibull random variable (r.v.) and consider the problem of estimating an upper prediction limit (UPL) that includes at least $l$ out of $m$ future observations from a Weibull distribution at each of $r$ locations, based on the proposed approximation and the well-known Box-Cox normal approximation. A comparative study based on Monte Carlo simulations revealed that the normal approximation-based UPLs for the Weibull distribution outperform those based on the existing generalized variable (GV) approach. The normal approximation-based UPLs have markedly larger coverage probabilities than the GV approach, particularly for a small unknown shape parameter, where the distribution is highly skewed, and for small sample sizes, which are commonly encountered in industrial applications. Results are illustrated with a real dataset for practitioners.


Introduction
The Weibull distribution is widely used in reliability and survival analysis due to its flexible shape and ability to model a wide range of failure rates. It can be derived theoretically as a form of extreme value distribution, governing the time to occurrence of the "weakest link" of many competing failure processes. Its special case with shape parameter $b = 2$ is the Rayleigh distribution, which is commonly used for modeling the magnitude of radial error when the $x$ and $y$ coordinate errors are independent normal variables with zero mean and the same standard deviation, while the case $b = 1$ corresponds to the widely used exponential distribution.
Let $X$ follow a Weibull distribution with scale parameter $a$ and shape parameter $b$. The pdf of $X$ is given by

$$f(x; a, b) = \frac{b}{a}\left(\frac{x}{a}\right)^{b-1}\exp\left\{-\left(\frac{x}{a}\right)^{b}\right\}, \quad x > 0,\ a > 0,\ b > 0.$$

The GV approach works with the parameters $\eta = \log a$ and $\beta = 1/b$. The value of $u_{n,r,l,m}$ was computed using the following simulation study.
For the given values of $n$, $r$, $l$, $m$, and $1 - \alpha$, the following procedure is repeated $N$ (say 100000) times.
(ii) The $l$th order statistics $y^*_{i(l)}$, $i = 1, 2, \ldots, r$, based on these samples are computed.
Then the $100(1 - \alpha)$th percentile of the generated $N$ values of $u$ is the estimate of $u_{n,r,l,m}$. Note that the distribution of the pivotal quantity on which the UPL is based does not depend on any unknown parameters; thus the method is exact.

The Proposed Close-To-Normal Power Transformation
The proposed transformation is based on the two key features governing normality, namely, the symmetry and tail behaviour of the normal distribution.
Let $X$ follow a two-parameter Weibull$(a, b)$ distribution with known shape parameter $b$. We consider the transformation $Y = X^p$ of the r.v. $X$, where the power $p$ is chosen so that the distribution of the transformed variable $Y$ deviates very little from symmetry and simultaneously has tail behaviour very close to that of the normal distribution with the same mean and variance. Straightforward calculations show that the skewness of the distribution of $Y$ is given by

$$\gamma(p, b) = \frac{\Gamma(1 + 3\theta) - 3\,\Gamma(1 + \theta)\,\Gamma(1 + 2\theta) + 2\,\Gamma(1 + \theta)^3}{\left[\Gamma(1 + 2\theta) - \Gamma(1 + \theta)^2\right]^{3/2}}, \tag{3.1}$$

which is a function of the ratio $\theta = p/b$ and does not depend on the scale parameter $a$. Treating $\gamma(p, b)$ as a function of $\theta$, a solution of $\gamma(\theta) = 0$ is $\theta_1 = 0.2776$, for which the distribution of the variable $Y = X^p$ with $p = b\theta_1$ has zero skewness. To control the tail behaviour, note that the $k$th raw moment of the transformed r.v. $Y$ is $E(Y^k) = E(X^{kp}) = a^{kp}\,\Gamma(1 + k\theta)$, leading to the mean and standard deviation of $Y$ given by

$$\mu_y = a^p\,\Gamma(1 + \theta), \qquad \sigma_y = a^p\left[\Gamma(1 + 2\theta) - \Gamma(1 + \theta)^2\right]^{0.5}. \tag{3.2}$$

Furthermore, the $\alpha$th quantile of a normal distribution with mean $\mu_y$ and standard deviation $\sigma_y$ is given by

$$\xi_\alpha = \mu_y + \sigma_y Z_\alpha, \tag{3.3}$$

where $Z_\alpha$ is the $\alpha$th quantile of the standard normal distribution. Similarly, if $x_\alpha$ is the $\alpha$th quantile of the Weibull$(a, b)$ distribution, that is, $x_\alpha = a\left[-\log(1 - \alpha)\right]^{1/b}$, it easily follows that the $\alpha$th quantile $y_\alpha$ of the distribution of $Y$ is given by

$$y_\alpha = x_\alpha^{p} = a^p\left[-\log(1 - \alpha)\right]^{\theta}. \tag{3.4}$$

To make the tail behaviour of the distribution of $Y$ very close to that of the normal distribution with the same mean and standard deviation as $Y$, we solve the equation $\xi_\alpha - y_\alpha = 0$ for the commonly used choices $\alpha = 0.025$ and $\alpha = 0.975$ for a two-sided interval, leading to the solutions $\theta_2 = 0.2698$ and $\theta_3 = 0.2994$, respectively. To keep both the symmetry and the tail behaviour of the distribution of the transformed r.v. $Y$ close to those of the normal distribution, we suggest taking $p = \theta b$, where $\theta = (\theta_1 + \theta_2 + \theta_3)/3 = 0.2823$, as the power of a Weibull r.v. $X$. From (3.3) and (3.4) it follows that, for this choice of $\theta$, the difference between the two quantiles $\xi_\alpha$ and $y_\alpha$ is $a^{0.2823b}\,c_\alpha$, where

$$c_\alpha = \Gamma(1 + \theta) + \left[\Gamma(1 + 2\theta) - \Gamma(1 + \theta)^2\right]^{0.5} Z_\alpha - \left[-\log(1 - \alpha)\right]^{\theta}$$

is a constant depending on $\alpha$. The values of $c_\alpha$ for various commonly used choices of $\alpha$ are given in Table 1.
We note that the constant $c_\alpha$ is quite small for the commonly used level of significance $\alpha = 0.05$, and further numerical study revealed that the accuracy of the proposed transformation is very good for small values of $b$ (say $b < 4$) and for small to moderate values of $a$ (say $a < 100$), which covers a reasonable subset of the parameter space and commonly encountered real situations. We recall that the choice of $p$ is uniform for all $a > 0$ since $\gamma$ is free of $a$. When $b$ is unknown, we take $\hat{p} = \theta\hat{b}$, replacing $b$ by its mle $\hat{b}$. In the sequel we refer to this transformation as the close-to-normal power transformation (CNPT).
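The constants quoted in this section can be checked numerically. The following is a minimal sketch, assuming SciPy is available; the function names `skewness` and `c_alpha` and the root-finding bracket `[0.1, 0.5]` are illustrative choices, not part of the paper.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import gamma
from scipy.stats import norm


def skewness(theta):
    """Skewness of Y = X**p for Weibull X, written in terms of theta = p / b."""
    g1, g2, g3 = gamma(1 + theta), gamma(1 + 2 * theta), gamma(1 + 3 * theta)
    return (g3 - 3 * g1 * g2 + 2 * g1**3) / (g2 - g1**2) ** 1.5


def c_alpha(alpha, theta):
    """Scaled quantile gap (xi_alpha - y_alpha) / a**p for a given theta."""
    g1, g2 = gamma(1 + theta), gamma(1 + 2 * theta)
    normal_q = g1 + np.sqrt(g2 - g1**2) * norm.ppf(alpha)
    exact_q = (-np.log1p(-alpha)) ** theta
    return normal_q - exact_q


# Bracket [0.1, 0.5] is an assumption; each function changes sign on it.
theta1 = brentq(skewness, 0.1, 0.5)                      # zero skewness
theta2 = brentq(lambda t: c_alpha(0.025, t), 0.1, 0.5)   # lower-tail match
theta3 = brentq(lambda t: c_alpha(0.975, t), 0.1, 0.5)   # upper-tail match
theta_bar = (theta1 + theta2 + theta3) / 3

print(theta1, theta2, theta3, theta_bar)
for alpha in (0.01, 0.025, 0.05, 0.95, 0.975, 0.99):
    print(alpha, c_alpha(alpha, 0.2823))
```

Averaging the three roots is the compromise described in the text: no single power matches the symmetry condition and both tail conditions at once.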

UPL Based on the CNPT
Let $Y_1, Y_2, \ldots, Y_n$ be a random sample of size $n$ from a normal distribution with mean $\mu$ and standard deviation $\sigma$. Let $\bar{Y}$ and $S_y$ be the sample mean and sample standard deviation. Then Davis and McNichols [2] suggested a UPL that includes at least $l$ out of $m$ future observations from the same normal distribution at each of $r$ locations as

$$U = \bar{Y} + k_u S_y, \tag{3.5}$$

where the value of $k_u$ for selected values of $n$, $r$, $l$, $m$ and level $\alpha$ is the solution to an equation, (3.6), expressed in terms of the following quantities: $N_t(x; v, \delta)$, the cumulative distribution function (cdf) of a noncentral $t$ r.v. with $v$ df and noncentrality parameter $\delta$; $\Phi^{-1}(x)$, the inverse cdf of the standard normal distribution at $x$; $\beta(m, n)$, the usual beta function; and $I(x; m, n)$, the cdf of a beta distribution with parameters $m$ and $n$.
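As a rough cross-check on the factor $k_u$, one can estimate it by Monte Carlo rather than by solving the integral equation. This is a substitute technique, not the Davis and McNichols equation itself. The sketch below relies on the fact that "at least $l$ of $m$ future observations fall below $U$ at each of $r$ locations" is equivalent to "the largest of the $r$ location-wise $l$th order statistics falls below $U$". The function name `k_u_monte_carlo` and its defaults are hypothetical.

```python
import numpy as np


def k_u_monte_carlo(n, r, l, m, alpha=0.05, reps=100_000, seed=0):
    """Monte Carlo estimate of k_u for the normal-theory UPL  Ybar + k_u * S.

    The pivot (V - Ybar) / S, with V the largest lth order statistic over
    the r locations, is free of mu and sigma, so standard normal draws suffice.
    """
    rng = np.random.default_rng(seed)

    # Background sample of size n: only its mean and sd are needed.
    past = rng.standard_normal((reps, n))
    ybar = past.mean(axis=1)
    s = past.std(axis=1, ddof=1)

    # Future data: m observations at each of r locations.
    future = rng.standard_normal((reps, r, m))
    lth = np.sort(future, axis=2)[:, :, l - 1]  # lth order statistic per location
    v = lth.max(axis=1)                         # worst location

    pivot = (v - ybar) / s
    return np.quantile(pivot, 1 - alpha)


# Example call for one of the combinations used in the simulation study.
print(k_u_monte_carlo(n=10, r=4, l=2, m=6))
```

For $r = l = m = 1$ the problem reduces to the classical one-sided prediction limit for a single future observation, whose factor is $t_{n-1;1-\alpha}\sqrt{1 + 1/n}$, which gives a convenient sanity check.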
Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a Weibull$(a, b)$ distribution. Let $U$ be the normal-based UPL obtained using (3.5), based on $Y_i = X_i^p$, $i = 1, 2, \ldots, n$, where $p = 0.2823b$. Then $U^{1/p}$ is the proposed UPL, which is expected to include at least $l$ out of $m$ future observations from the Weibull$(a, b)$ distribution at each of $r$ locations, for known shape parameter $b$, with probability $1 - \alpha$. When the shape parameter $b$ is unknown, we suggest replacing it by its mle $\hat{b}$, and the proposed UPL is $U^{1/\hat{p}}$, where $\hat{p} = 0.2823\hat{b}$. Note that $\hat{p}$ is a consistent estimator of $p$. Hence, for large samples, $U^{1/\hat{p}}$ is expected to be close to $U^{1/p}$. The small-sample behavior of $U^{1/\hat{p}}$ is studied through simulation.
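The steps of the CNPT-based UPL can be sketched as follows. This is an illustration under stated assumptions: the function name `cnpt_upl` is hypothetical, the factor `k_u` must be supplied by the caller (for example from the Davis and McNichols equation), and the mle of $b$ is obtained here with SciPy's `weibull_min.fit` with the location fixed at zero, which is one of several ways to compute it.

```python
import numpy as np
from scipy.stats import weibull_min

THETA = 0.2823  # averaged power ratio theta = p / b from the text


def cnpt_upl(x, k_u, shape=None):
    """CNPT-based upper prediction limit for Weibull data x.

    shape: known shape parameter b; if None, b is replaced by its mle.
    k_u:   normal-theory factor for the chosen (n, r, l, m, alpha).
    """
    x = np.asarray(x, dtype=float)
    if shape is None:
        # Two-parameter Weibull mle: location fixed at 0, returns (shape, loc, scale).
        shape, _, _ = weibull_min.fit(x, floc=0)
    p = THETA * shape
    y = x**p                               # close-to-normal transformed sample
    u = y.mean() + k_u * y.std(ddof=1)     # normal-based UPL on the Y scale
    return u ** (1.0 / p)                  # back-transform to the X scale


# Usage on made-up numbers (not the vinyl chloride data); k_u = 2.0 is arbitrary.
sample = np.array([0.5, 1.2, 2.0, 0.8, 3.1, 1.7])
print(cnpt_upl(sample, k_u=2.0, shape=1.0))  # known shape
print(cnpt_upl(sample, k_u=2.0))             # shape estimated by mle
```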

Box-Cox Transformation and Kullback-Leibler Information-(BCKL-) Based UPL
Hernández and Johnson [5] proposed the transformation

$$Y = \frac{X^{\lambda} - 1}{\lambda}$$

of a Weibull r.v. $X$ for approximating the distribution of $Y$ by a normal distribution, and used the solution $\lambda = 0.2654b$, for known shape parameter $b$, that minimizes the Kullback-Leibler information between the distribution of $Y$ and the normal distribution with the same mean and variance as $Y$. This transformation was used by Yang et al. [6] for obtaining a prediction interval for a single future observation from a Weibull$(a, b)$ distribution. Using this transformation for the UPL problem under consideration, a $100(1 - \alpha)\%$ UPL for the Weibull distribution is $(1 + \lambda U)^{1/\lambda}$, where $U$ is the normal-based UPL obtained using (3.5). As before, an unknown value of $b$ is replaced by its mle $\hat{b}$. This UPL also enjoys the large-sample property mentioned above, and its small-sample behavior is studied through simulation.
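A short derivation may help relate the Box-Cox form of the limit to a plain power transformation. It assumes $\lambda > 0$ and uses only the fact that the normal-based limit $\bar{Y} + k_u S_y$ is equivariant under increasing linear maps. The symbols $W_i$, $\bar{W}$, $S_w$, and $U_w$ are introduced here for illustration.

```latex
\begin{align*}
W_i &= X_i^{\lambda}, \qquad Y_i = \frac{W_i - 1}{\lambda}
  \quad\Longrightarrow\quad
  \bar{Y} = \frac{\bar{W} - 1}{\lambda}, \qquad S_y = \frac{S_w}{\lambda}, \\
U &= \bar{Y} + k_u S_y
   = \frac{(\bar{W} + k_u S_w) - 1}{\lambda}
   = \frac{U_w - 1}{\lambda},
  \qquad U_w = \bar{W} + k_u S_w, \\
(1 + \lambda U)^{1/\lambda} &= U_w^{1/\lambda}.
\end{align*}
```

Under these assumptions, applying (3.5) to the simple powers $X_i^{\lambda}$ and back-transforming with the power $1/\lambda$ gives the same limit as the Box-Cox form, so the two approaches differ from the CNPT only in the multiplier of $b$ (0.2654 versus 0.2823).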

Comparison
In this section we compare the performance of the above two proposed UPLs with the UPL based on the GV method, through a simulation study, with respect to the expected lengths and expected coverages of the UPLs. In each repetition, the future observations $X_{ij}$, $i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, m$, are summarized by $X^* = \max\{X_{1(l)}, X_{2(l)}, \ldots, X_{r(l)}\}$, where $X_{i(l)}$ is the $l$th order statistic of $X_{i1}, \ldots, X_{im}$ for $i = 1, 2, \ldots, r$. This procedure is repeated 100000 times. Then the proportions of the events that $X^*$ falls below the CNPT-based UPL, that $X^*$ falls below the BCKL-based UPL, and that $X^* < \exp(\hat{\eta} + u_{n,r,l,m}\hat{\beta})$ in these 100000 repetitions are the simulated coverage probabilities of the UPLs based on the normal approximation, the BCKL transformation, and the GV method, respectively. The averages of the 100000 UPLs based on each of the three approaches discussed above are the simulated expected lengths of the corresponding UPLs. The simulated expected lengths and expected coverages for $n = 6, 10, 20$, $a = 0.1, 1, 5, 50, 100$, and $b = 0.5, 1, 3$ are reported in Tables 2 and 3, respectively. The combinations $(r = 4, l = 2, m = 6)$, $(r = 8, l = 1, m = 5)$, and $(r = 16, l = 2, m = 5)$ are chosen. The values of $k_u$ for these combinations are computed using (3.6) for $\alpha = 0.05$.
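A scaled-down version of this comparison can be sketched for the CNPT-based limit alone. The sketch makes several simplifying assumptions that differ from the paper's study: the shape parameter $b$ is treated as known, the factor $k_u$ is estimated by Monte Carlo instead of from (3.6), far fewer repetitions are used, and the GV and BCKL limits are omitted. All function names are hypothetical.

```python
import numpy as np

THETA = 0.2823  # power ratio theta = p / b used by the CNPT


def lth_max(samples, l):
    """Largest lth order statistic over locations; samples has shape (reps, r, m)."""
    return np.sort(samples, axis=2)[:, :, l - 1].max(axis=1)


def normal_k_u(n, r, l, m, alpha, reps, rng):
    """Monte Carlo stand-in for k_u (assumption: replaces solving (3.6))."""
    past = rng.standard_normal((reps, n))
    v = lth_max(rng.standard_normal((reps, r, m)), l)
    pivot = (v - past.mean(axis=1)) / past.std(axis=1, ddof=1)
    return np.quantile(pivot, 1 - alpha)


def cnpt_coverage(a, b, n, r, l, m, alpha=0.05, reps=50_000, seed=1):
    """Simulated coverage and expected length of the CNPT UPL (known shape b)."""
    rng = np.random.default_rng(seed)
    k_u = normal_k_u(n, r, l, m, alpha, reps, rng)
    p = THETA * b

    # Background samples and their UPLs, one per repetition.
    x = a * rng.weibull(b, size=(reps, n))
    y = x**p
    upl = (y.mean(axis=1) + k_u * y.std(axis=1, ddof=1)) ** (1.0 / p)

    # Future data: X* is the largest lth order statistic over the r locations.
    x_star = lth_max(a * rng.weibull(b, size=(reps, r, m)), l)
    return np.mean(x_star < upl), upl.mean()


coverage, expected_length = cnpt_coverage(a=1.0, b=1.0, n=10, r=4, l=2, m=6)
print(coverage, expected_length)
```

Because the shape is taken as known here, the output is not comparable to the tabulated results, which assume unknown scale and shape parameters; the sketch only illustrates how coverage and expected length are estimated from repeated samples.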

Results of the Simulation Study
The following prominent facts are clearly visible from Tables 2 and 3.
(1) CNPT-based UPLs have uniformly excellent coverage probabilities, even for sample sizes as small as $n = 6$, for all examined values of $a$ and $b$ and for all examined combinations of $r$, $l$, and $m$. The coverages are uniformly a little larger than those based on the GV method, and the expected lengths are a little shorter.
(2) As mentioned in Section 2, the GV method is exact, and this is reflected in the simulation study, since its coverages are very close to the nominal coverage probability.
(3) BCKL transformation-based UPLs have close to nominal coverage probabilities.
Based on these observations, we recommend the proposed CNPT- and BCKL transformation-based UPLs that include at least $l$ out of $m$ future observations from a Weibull$(a, b)$ distribution at each of $r$ locations.

Illustrative Example
Nowadays vinyl chloride is one of the fifty most produced chemicals in the world. Its production almost doubled in the last 20 years and is currently estimated to be about 27 million tons/year worldwide. A high concentration of vinyl chloride in water can cause cancer and liver damage. Because it is toxic and carcinogenic to humans, more attention has to be given to vinyl chloride as a groundwater contaminant. In this section we illustrate the methods discussed in Sections 2 and 3 with a real dataset.
Applying the Kolmogorov-Smirnov test to the dataset for fitting a Weibull distribution gave a two-tailed P value of 0.94, indicating that the Weibull distribution is a good model for the dataset. Here $\hat{a} = 1.89$, $\hat{b} = 1.01$, the sample mean is 1.88, and the sample standard deviation is 1.95. The mle $\hat{b}$ indicates that the dataset is moderately skewed. In order to compare the proposed 95% UPLs with those of Krishnamoorthy et al. [4], we chose various combinations of $r$, $l$, and $m$; the resulting UPLs are given in Table 4.
From Table 4, it seems that the proposed UPLs are a little smaller than those of Krishnamoorthy et al. [4]. We also notice that all the UPLs are well above the nominal range of vinyl chloride concentration (2.0-2.4) suggested by the US Environmental Protection Agency (USEPA), indicating that future vinyl chloride concentrations are likely to be larger than the nominal level and hence that monitoring of these wells is necessary.

Overall Conclusion
The proposed normal approximation performs markedly well, even for small sample sizes, for almost all parameter combinations when estimating a UPL that includes at least $l$ out of $m$ future observations from a Weibull distribution at each of $r$ locations. The superiority of the normal approximation is much stronger for small shape parameters and small sample sizes.

Table 2: Simulated expected lengths of 95% UPLs that contain at least $l$ of $m$ future observations at each of $r$ locations from a Weibull$(a, b)$ distribution for sample sizes $n = 6, 10, 20$ using the GV, CNPT, and BCKL methods*.

For fixed values of the parameters $a$, $b$, and sample size $n$, we generate $n$ random numbers $X_i$, $i = 1, 2, \ldots, n$, from the Weibull$(a, b)$ distribution, and set $Y_{1i} = X_i^{p}$ and $Y_{2i} = X_i^{\lambda}$, $i = 1, 2, \ldots, n$, where $p = 0.2823b$ and $\lambda = 0.2654b$. Normal-based UPLs $U_1$ and $U_2$ are obtained using (3.5) based on the transformed samples $Y_{1i}$ and $Y_{2i}$, $i = 1, 2, \ldots, n$, respectively. Then $U_1^{1/p}$ and $U_2^{1/\lambda}$ are the UPLs based on the proposed CNPT and the BCKL transformation for the Weibull$(a, b)$ distribution with known shape parameter $b$. When the shape parameter $b$ is unknown, we suggest replacing it by its mle $\hat{b}$. For the same sample $X_1, X_2, \ldots, X_n$, the GV method-based UPL is obtained using (2.1) and the procedure described in Section 2. Next we generate $r$ sets of $m$ random numbers, say $X_{ij}$, $i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, m$, from the Weibull$(a, b)$ distribution, and set $X^* = \max\{X_{1(l)}, X_{2(l)}, \ldots, X_{r(l)}\}$, where $X_{i(l)}$ is the $l$th order statistic of $X_{i1}, \ldots, X_{im}$.

Table 3: Percentage simulated coverage probabilities of 95% UPLs that contain at least $l$ of $m$ future observations at each of $r$ locations from a Weibull$(a, b)$ distribution for sample sizes $n = 6, 10, 20$ using the GV, CNPT, and BCKL methods*.
*Results are obtained assuming unknown scale and shape parameters $a$, $b$.

Table 4: 95% upper prediction limits for the vinyl chloride data.