Data Transformation for Confidence Interval Improvement: An Application to the Estimation of Stress-Strength Model Reliability

In many statistical applications, it is often necessary to obtain an interval estimate for an unknown proportion or probability or, more generally, for a parameter whose natural space is the unit interval. The customary approximate two-sided confidence interval for such a parameter, based on some version of the central limit theorem, is known to be unsatisfactory when its true value is close to zero or one or when the sample size is small. A possible way to tackle this issue is to transform the data through a suitable function that makes the approximation to the normal distribution less coarse. In this paper, we study the application of several of these transformations to the estimation of the reliability parameter for stress-strength models, with a special focus on the Poisson distribution. From this work, some practical hints emerge on which transformations most effectively improve standard confidence intervals, and under which scenarios.


Introduction
In many fields of applied statistics, it is often necessary to obtain an interval estimate for an unknown proportion or probability or, more generally, for a parameter whose natural space is the unit interval [1]. If $p$ is the unknown parameter of a binomial distribution, the customary approximate two-sided confidence interval (CI) for $p$ is known to be unsatisfactory when its true value is close to zero or one or when the sample size is small. In fact, estimation can cause difficulties because the variance of the corresponding point estimator depends on $p$ itself and because its distribution can be skewed. A number of papers have been devoted to the development of more refined CIs for $p$ (see, e.g., [1-4]). Here, we will consider the estimation of the probability $R = P(X < Y)$, where $X$ and $Y$ are two independent rv's. If $Y$ represents the strength of a certain system and $X$ the stress on it, $R$ represents the probability that the strength overcomes the stress, so that the system works ($R$ is then referred to as the "reliability" parameter). Such a statistical model is usually called the "stress-strength model" and in recent decades has attracted much interest from various fields [5,6], ranging from engineering to biostatistics. In these works, inferential issues have been dealt with mainly in the parametric context. The problem of constructing interval estimators for $R$ has been considered; when an exact analytical solution is not available, approximations based on the delta method and on the asymptotic normality of point estimators are used, some of them applying a transformation to the point estimate of $R$.
In this work, we concentrate on the case of Poisson-distributed stress and strength. Approximate large-sample CIs for the reliability have already been built and assessed for different parameter and sample size configurations and have been shown to give satisfactory results unless $R$ is close to 1 (or, symmetrically, 0) or the sample sizes are too small [7]. With the aim of improving the performance of such CIs, four transformation functions (logit, probit, arcsine root, and complementary log-log) are selected and applied to the maximum likelihood estimate of $R$, and the resulting CIs are empirically compared in terms of coverage probability and expected length.
The paper is laid out as follows: in Section 2 a brief discussion on data transformations is presented, with a special focus on those ordinarily used in connection with the estimation of the reliability parameter of a stress-strength model. Section 3 introduces the stress-strength model with independent Poisson-distributed stress and strength, recalling the formulas for reliability, maximum likelihood estimator, and standard large-sample CI and introducing refinements for the latter based on data transformations. Section 4 is devoted to the Monte Carlo simulation study, which empirically assesses the statistical performance of CIs. An example of application on real data is provided in Section 5, and Section 6 concludes the paper with some final remarks.

Transformations and Application to the Estimation of a Stress-Strength Model
In almost all fields of research, one has to deal with data that are not normal. It is common practice to transform the nonnormal data at hand in order to exploit theoretical results that strictly hold only for the normal distribution, with the objective of building plausible or more efficient estimates. Citing [8], "transformations of statistical variables are used in applied statistics mainly for two purposes." The first one is "variance stabilization." "A transformation is applied in order to make it possible to use, at any rate approximately, the standard techniques associated with continuous normal variation, for example, the methods of analysis of variance. In particular, transformations are required which stabilize the variance, that is, which make the variance of the transformed variable approximately independent of, for example, the binomial probability or the mean value of the Poisson distribution. [⋅ ⋅ ⋅ ] The constant-variance condition has led to the introduction of the inverse sine, inverse sinh, and square-root transformations which are used nowadays in many fields of applications." In linear regression models, the unequal variance of the error terms produces estimates that are unbiased but are no longer best in the sense of having the smallest variance [9].
The second purpose is "normalization." "A transformation is used in order to facilitate the computation of tail sums of the distribution by the aid of the normal probability integral [⋅ ⋅ ⋅ ]. A review of the literature shows that a considerable number of transformations of binomial, negative binomial, Poisson, and $\chi^2$ variables have been proposed." In regression models, for example, nonnormality of the response variable invalidates the standard tests of significance with small samples since they are based on the normality assumption [9]. Reference [10] noted how "approximately symmetrizing transformation of a random variable may be a more effective method of normalizing it than stabilizing its variance." Reference [11], although dated, and the more recent reference [12] provide an exhaustive review of the transformations used in statistical data analysis.
If a random variable $X$, whose probability mass function or density function depends on a parameter $\theta$, is transformed by a function $Z = g(X)$, which we suppose henceforth to be strictly monotone, the standard deviation of $Z$, according to the delta method (see, e.g., [13]), is given approximately by

$$\sigma_Z \approx \left| g'(\mu(\theta)) \right| \, \sigma(\theta), \qquad (1)$$

where $\mu(\theta)$ and $\sigma(\theta)$ are the expected value and standard deviation of $X$. Then the (approximate) standard deviation of the transformed random variable can be made equal to a constant $c$ if the function $g$ is chosen so as to satisfy the following relationship:

$$g'(\mu(\theta)) \, \sigma(\theta) = c. \qquad (2)$$

Thus, for example, if $X$ is distributed as a Poisson random variable with parameter $\lambda$, so that $\mu = \sigma^2 = \lambda$, we obtain that the function $g(x) = \int c \, x^{-1/2} \, dx = c^* \sqrt{x}$ is a variance stabilizing function, with $c$ and $c^*$ proper positive constants.
Formula (2) has been empirically shown to provide reasonable stabilization in various applications, as confirmed by its extensive use, but other criteria can be employed based on different "notions" of variance stabilization [8,12]. Otherwise, modifications to the function derived from (2) can be proposed, for example, in order to reduce or remove the bias [14]. Reference [15] studied the root transformation $g(x) = \sqrt{x + c}$ for a Poisson-distributed random variable $X$ and demonstrated that for $c = 1/4$ the root-transformed variable $\sqrt{X + c}$ has vanishing first-order bias and almost constant variance.
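The behavior of the root transformation can be checked numerically. The following is a minimal sketch, not taken from [15]: it computes the mean and variance of $\sqrt{X + 1/4}$ for a Poisson variable by direct summation of the probability mass function, so that the mean can be compared with $\sqrt{\lambda}$ and the variance with 1/4. The function names and the truncation point of the sum are illustrative assumptions.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), lam > 0, computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def moments_of_root(lam, c=0.25):
    """Mean and variance of sqrt(X + c) for X ~ Poisson(lam), by summation."""
    # Truncation point chosen generously (assumption): far in the upper tail.
    k_max = int(lam + 12 * math.sqrt(lam) + 30)
    first = second = 0.0
    for k in range(k_max + 1):
        p = poisson_pmf(k, lam)
        first += p * math.sqrt(k + c)   # E[sqrt(X + c)]
        second += p * (k + c)           # E[X + c] = E[sqrt(X + c)^2]
    return first, second - first ** 2


for lam in (2.0, 5.0, 20.0, 50.0):
    mean, var = moments_of_root(lam)
    print(f"lambda={lam:5.1f}  mean={mean:.4f}  "
          f"sqrt(lambda)={math.sqrt(lam):.4f}  variance={var:.4f}")
```

Running the loop for increasing values of $\lambda$ shows how quickly the variance settles near 1/4 and the mean near $\sqrt{\lambda}$.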
Whichever criterion is selected, the proper transformation to adopt is very often tied to the particular statistical distribution underlying the data. However, just as often the exact distribution of an estimator for a probability or proportion is not easily derivable, even if the distribution that the sample data came from is known. This sometimes happens, for example, for the maximum likelihood estimator (MLE) of the reliability parameter of a stress-strength model. In this case, we do not know a priori which transformation may best fit the data (i.e., the distribution of the MLE).
The focus of this paper is the estimation of a probability; thus we will confine ourselves to transformations of proportions. For this case, among the most used transformations, here we recall the logit, probit, arcsine root, and complementary log-log. Table 1 reports the expression, the image, the first derivative, and the inverse of these four functions.
The logit and probit transformations are widely used in the homonymous models [16] and more generally when dealing with skewed proportion distributions. They are similar since they are both antisymmetric functions around the point $x = 0.5$; that is, $g(x + 0.5) = -g(-x + 0.5)$. The difference lies in the fact that the logit function takes absolute values larger than the probit, as can be noted by comparing their graphs. For the stress-strength model, the logit transformation has been by far the most used transformation for improving the statistical performance of standard large-sample CIs; among others, it has been considered by [17][18][19], all these contributions concerning the Weibull distribution, by [20] for the Lindley distribution, by [21] when stress and strength follow a bivariate exponential distribution, and by [22,23] in a nonparametric context.
Through the years, the arcsine root transformation has gained great favor and application among practitioners [24], perhaps more than it really merits. As noted previously, it should be chosen for stabilizing binomial data but is often used to stabilize sample proportions as well (i.e., relative binomial data). However, it presents a drawback highlighted in [25], related to the fact that its codomain is the bounded interval $(0, \pi/2)$; thus its normalizing effect may turn out to be meagre, as already pointed out in some studies where it has been used for estimating the reliability of stress-strength models [19,25,26].
The complementary log-log function, which is sometimes used in binomial regression models, differs somewhat from logit and probit since it takes negative values for $x < 1 - e^{-1}$ (and positive values for $x > 1 - e^{-1}$) and takes large positive values only for values of $x$ very close to 1; for example, it takes values larger than 1 if and only if $x > 1 - e^{-e} \approx 0.934$.
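The four transformations discussed above, with their inverses, can be written down in a few lines. This is an illustrative sketch using only the Python standard library; the dictionary name `TRANSFORMS` and the short labels are assumptions made here, not notation from the paper.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: cdf and quantile function

# Each entry maps a label to the pair (g, g_inverse); x is a proportion in (0, 1).
TRANSFORMS = {
    "logit": (lambda x: math.log(x / (1 - x)),
              lambda t: 1 / (1 + math.exp(-t))),
    "probit": (_STD_NORMAL.inv_cdf, _STD_NORMAL.cdf),
    "arcsine": (lambda x: math.asin(math.sqrt(x)),
                lambda t: math.sin(t) ** 2),
    "cloglog": (lambda x: math.log(-math.log(1 - x)),
                lambda t: 1 - math.exp(-math.exp(t))),
}

# Tabulate the transformed values on a small grid of proportions.
for name, (g, g_inv) in TRANSFORMS.items():
    values = [round(g(x), 3) for x in (0.1, 0.5, 0.9, 0.99)]
    print(f"{name:8s} {values}")
```

The printed table makes the qualitative remarks visible: logit and probit change sign at 0.5, the logit grows faster than the probit toward the extremes, the arcsine root stays within $(0, \pi/2)$, and the complementary log-log is asymmetric.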
In the next section, we will apply these transformations to the estimation of a stress-strength model with both stress and strength following a Poisson distribution.

Inference on the Reliability Parameter for a Stress-Strength Model with Independent Poisson Stress and Strength
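Let $X$ and $Y$ be independent rv's modeling stress and strength, respectively, with $X \sim \text{Poisson}(\lambda_x)$ and $Y \sim \text{Poisson}(\lambda_y)$. Then, the reliability $R = P(X < Y)$ of the stress-strength model is given by (see [6, page 103])

$$R = \sum_{i=0}^{\infty} \frac{e^{-\lambda_x} \lambda_x^{i}}{i!} \left[ 1 - \sum_{j=0}^{i} \frac{e^{-\lambda_y} \lambda_y^{j}}{j!} \right]. \qquad (3)$$

If two simple random samples of size $n_x$ and $n_y$ from $X$ and $Y$, respectively, are available, the reliability parameter can be estimated by the ML estimator, obtained by substituting in (3) the MLEs of the unknown parameters $\lambda_x$ and $\lambda_y$, that is, the sample means $\bar{x}$ and $\bar{y}$:

$$\hat{R} = \sum_{i=0}^{\infty} \frac{e^{-\bar{x}} \bar{x}^{i}}{i!} \left[ 1 - \sum_{j=0}^{i} \frac{e^{-\bar{y}} \bar{y}^{j}}{j!} \right]. \qquad (4)$$

Deriving the expression of the variance of the MLE is not straightforward; however, an approximate expression has been derived through the delta method in [7]. Based upon this approximation, a large-sample $1 - \alpha$ CI for $R$ has been built. Such an interval estimator has the usual expression

$$\left( \hat{R} - z_{1-\alpha/2} \sqrt{\hat{V}(\hat{R})}, \; \hat{R} + z_{1-\alpha/2} \sqrt{\hat{V}(\hat{R})} \right), \qquad (5)$$

where $z_{1-\alpha/2}$ is the $1 - \alpha/2$ quantile of the standard normal distribution and $\hat{V}(\hat{R})$ is the sample estimate of the (asymptotic) variance of $\hat{R}$ (see [7] for details).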
Although such an estimator has been shown to behave satisfactorily in terms of coverage for several combinations of sample sizes and values of the reliability parameter, some decay in performance is observed when $R$ gets close to the extreme values 0 and 1 and when sample sizes are small. In these cases, the approximation to the normal distribution is in fact very rough, especially because of the skewness of the distribution of $\hat{R}$. Thus, transformations of the estimates $\hat{R}$ can be considered in order to "make the data more normal" and produce CIs based on transformed data that have a coverage closer to the nominal level.
For the other transformations, following the same steps just outlined, approximate "transformed" CIs for $R$ can be obtained, which are alternatives to the standard naïve CI of (5); they are summarized in Table 2.
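A generic way to build such a "transformed" interval is sketched below, under the usual delta-method recipe: compute a normal-theory interval for $g(R)$ with standard error $|g'(\hat{R})| \sqrt{\hat{V}(\hat{R})}$, then map its bounds back through $g^{-1}$. This is an assumed reading of how the intervals of Table 2 are obtained; the function name `transformed_ci` and the numerical inputs are hypothetical.

```python
import math
from statistics import NormalDist


def transformed_ci(r_hat, se, g, g_prime, g_inv, alpha=0.05):
    """CI for R computed on the g-scale and mapped back to the original scale.

    r_hat : point estimate of R, strictly inside (0, 1)
    se    : estimated standard error of r_hat
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se_g = abs(g_prime(r_hat)) * se               # delta method on the g-scale
    lower, upper = g(r_hat) - z * se_g, g(r_hat) + z * se_g
    return tuple(sorted((g_inv(lower), g_inv(upper))))


def logit(r):
    return math.log(r / (1 - r))


def logit_prime(r):
    return 1 / (r * (1 - r))


def expit(t):
    return 1 / (1 + math.exp(-t))


# Hypothetical estimate and standard error, chosen close to the boundary.
r_hat, se = 0.95, 0.04
z = NormalDist().inv_cdf(0.975)
naive = (r_hat - z * se, r_hat + z * se)
logit_ci = transformed_ci(r_hat, se, logit, logit_prime, expit)
print("naive:", naive)
print("logit:", logit_ci)
```

With these inputs the naive upper bound exceeds 1, whereas the logit-based interval stays inside the unit interval and is asymmetric around the point estimate. For the arcsine root transformation, the bounds on the transformed scale would additionally need to be clipped to $[0, \pi/2]$ before back-transforming, since $\sin^2$ is not monotone outside that range.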
An alternative way to make inference on the reliability parameter for the stress-strength model with independent Poisson random variables can be summarized as follows: (1) transform (normalize) the samples from $X$ and $Y$ according to a proper transformation for the Poisson distribution; (2) compute point and interval estimates of $R$ using the methods for a stress-strength model with independent normal variables with known variances [6, page 112]. Letting $\Xi = \sqrt{X + 1/4}$ and $\Upsilon = \sqrt{Y + 1/4}$, then $\Xi$ and $\Upsilon$ are independent and approximately distributed as normal random variables with variance $\sigma_\xi^2 = \sigma_\upsilon^2 = 1/4$ and expected value $\mu_\xi = \sqrt{\lambda_x}$ and $\mu_\upsilon = \sqrt{\lambda_y}$, respectively. Then, instead of estimating $R = P(X < Y)$ directly, one can estimate it as $P(\Xi < \Upsilon)$; a $1 - \alpha$ confidence interval is given by

$$\left( \Phi\left( \hat{\delta} - \frac{z_{1-\alpha/2}}{\sqrt{M}} \right), \; \Phi\left( \hat{\delta} + \frac{z_{1-\alpha/2}}{\sqrt{M}} \right) \right), \qquad (8)$$

with $\Phi$ the standard normal distribution function, $M = (\sigma_\xi^2 + \sigma_\upsilon^2)/(\sigma_\xi^2/n_x + \sigma_\upsilon^2/n_y)$, and $\hat{\delta} = (\bar{\upsilon} - \bar{\xi})/\sigma$ being an estimator of $\delta = (\mu_\upsilon - \mu_\xi)/\sigma$, where $\bar{\xi}$ and $\bar{\upsilon}$ are the sample means of $\Xi$ and $\Upsilon$, respectively, and $\sigma = \sqrt{\sigma_\xi^2 + \sigma_\upsilon^2} = \sqrt{1/2}$.
An alternative procedure to make inference about the reliability parameter is provided by the parametric bootstrap [27, pages 53-56]. In its basic version, it works as follows for the estimation problem at hand.
(1) Estimate the unknown parameters of the Poisson rv's $X$ and $Y$ through their sample means $\bar{x}$ and $\bar{y}$.
(2) Draw independently a bootstrap sample $x^*$ of size $n_x$ from a Poisson rv $X^*$ with parameter $\bar{x}$ and a bootstrap sample $y^*$ of size $n_y$ from a Poisson rv $Y^*$ with parameter $\bar{y}$.
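Both alternative procedures can be sketched in code. The first function applies normal theory to the root-transformed samples, following the description of the interval based on $\Xi$ and $\Upsilon$. The second implements steps (1) and (2) of the parametric bootstrap and then, as an assumption made here because the remaining steps are not spelled out above, recomputes the MLE of $R$ on each bootstrap sample and takes a simple percentile interval. All function names, the Poisson sampler, and the data are illustrative.

```python
import math
import random
from statistics import NormalDist, fmean


def poisson_pmf(k, lam):
    if lam == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def reliability(lam_x, lam_y):
    """R = P(X < Y) for independent Poisson X and Y (truncated sum)."""
    top = max(lam_x, lam_y)
    r = cdf_y = 0.0
    for i in range(int(top + 12 * math.sqrt(top) + 30) + 1):
        cdf_y += poisson_pmf(i, lam_y)
        r += poisson_pmf(i, lam_x) * max(0.0, 1.0 - cdf_y)
    return r


def root_normal_ci(x, y, alpha=0.05):
    """Normal-theory CI applied to sqrt(X + 1/4) and sqrt(Y + 1/4)."""
    xi = [math.sqrt(v + 0.25) for v in x]
    upsilon = [math.sqrt(v + 0.25) for v in y]
    sigma = math.sqrt(0.5)                        # sqrt(1/4 + 1/4)
    m = 0.5 / (0.25 / len(x) + 0.25 / len(y))
    delta_hat = (fmean(upsilon) - fmean(xi)) / sigma
    z = NormalDist().inv_cdf(1 - alpha / 2)
    phi = NormalDist().cdf
    return phi(delta_hat - z / math.sqrt(m)), phi(delta_hat + z / math.sqrt(m))


def rpois(lam, rng):
    """One Poisson(lam) draw by Knuth's multiplication method (small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k, prod = k + 1, prod * rng.random()
    return k


def bootstrap_percentile_ci(x, y, alpha=0.05, n_boot=2000, seed=0):
    """Parametric bootstrap; the percentile interval is an assumption."""
    rng = random.Random(seed)
    mean_x, mean_y = fmean(x), fmean(y)                       # step (1)
    reps = []
    for _ in range(n_boot):
        xs = [rpois(mean_x, rng) for _ in x]                  # step (2)
        ys = [rpois(mean_y, rng) for _ in y]
        reps.append(reliability(fmean(xs), fmean(ys)))
    reps.sort()
    lower = reps[math.floor(alpha / 2 * (n_boot - 1))]
    upper = reps[math.ceil((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper


# Hypothetical samples, for illustration only.
stress = [1, 2, 3, 2, 2, 0, 4, 1]
strength = [4, 5, 3, 6, 4, 2, 5, 7]
print(root_normal_ci(stress, strength))
print(bootstrap_percentile_ci(stress, strength, n_boot=500))
```

The bootstrap loop recomputes the truncated sum for every replicate, which gives a sense of why this route is more computationally demanding than the closed-form intervals.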

Scope and Design.
The simulation study aims at empirically comparing the performance of the interval estimators presented in the previous section, namely, the standard CI of (5), labeled "AN," those of Table 2 (labeled "logit," "probit," "arcsine," "loglog"), and the CIs of (8) and (9), in terms of coverage rate (and also lower and upper uncoverage rates) and expected length. In this Monte Carlo (MC) study, the parameter $\lambda_x$ of the Poisson distribution for stress is set equal to a "reference" value 2, and the parameter $\lambda_y$ of the Poisson distribution modeling strength is varied in order to obtain, according to (3), a set of target values of $R$.
Note that for the smallest sample size, that is, when $n_x = 5$ (or $n_y = 5$), a practical problem arises as the sample values of $X$ (or $Y$) may all be 0; in this case, the variance estimate $\hat{V}(\hat{R})$ cannot be computed and a standard approximate CI cannot be built. We decided to discard these samples from the 5,000 MC samples planned for the simulation study and to compute the quantities of interest only on the "feasible" samples. Indeed, the rate of such "nonfeasible" samples is in any case very low under each scenario. For the worst ones, characterized by $R = 0.1$ and $n_y = 5$, the theoretical rate of "nonfeasible" samples is about 5%.
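The quoted rate of "nonfeasible" samples can be checked with a short calculation. The sketch below assumes that the stress mean is 2, solves numerically for the strength mean that yields a reliability of 0.1, and then evaluates the probability that five independent Poisson strength observations are all zero, which is $e^{-5 \lambda_y}$. The bisection routine and function names are illustrative.

```python
import math


def poisson_pmf(k, lam):
    if lam == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def reliability(lam_x, lam_y):
    """R = P(X < Y) for independent Poisson X and Y (truncated sum)."""
    top = max(lam_x, lam_y)
    r = cdf_y = 0.0
    for i in range(int(top + 12 * math.sqrt(top) + 30) + 1):
        cdf_y += poisson_pmf(i, lam_y)
        r += poisson_pmf(i, lam_x) * max(0.0, 1.0 - cdf_y)
    return r


def solve_strength_mean(target_r, lam_x=2.0, lo=1e-9, hi=50.0, tol=1e-10):
    """Bisection on lam_y; R is increasing in lam_y for fixed lam_x."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if reliability(lam_x, mid) < target_r:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


lam_y = solve_strength_mean(0.1)
rate = math.exp(-5 * lam_y)   # P(all five strength observations equal 0)
print(f"lambda_y = {lam_y:.4f}, nonfeasible rate = {rate:.4f}")
```

The probability that all five stress observations are zero, $e^{-10}$, is negligible by comparison, so the strength sample drives the rate in this scenario.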
Since results for the coverage rates of some CIs are often close to the nominal level 0.95, we performed a test in order to check whether the actual coverage rate is significantly different from 0.95, that is, to establish whether such CIs are conservative or liberal. The null hypothesis is that the true rate $\pi$ of CIs that do not cover the real value is equal to $\pi_0 = 0.05$, that is, $H_0: \pi = \pi_0 = 0.05$, whereas the alternative hypothesis is that the rate is different from $\pi_0$, that is, $H_1: \pi \neq \pi_0$. We employ the test suggested in [28, pages 518-519], which is based on the statistic (10), a function of the number of rejections of $H_0$.
The results for the CIs in (8) and (9) are not reported here as their performances are overall unsatisfactory. Even if theoretically appealing, the procedure leading to the CI in (8) fails in practice; some of the scenarios of the simulation plan were considered, and the CI built following this alternative procedure never provides satisfactory results. For the smallest sample size (5), the coverage proves to be larger than that provided by the AN CI but still smaller than the nominal one; for the larger sample sizes (10, 20, and 50), the coverage rate decreases dramatically (even to values as low as 60%). Paradoxically, the discrete nature of the Poisson variable, and thus the quality of the normal approximation of step (1), affects the results to a greater extent as the sample size increases. As to the bootstrap procedure yielding the CI in (9), this solution is computationally cumbersome; it becomes even more time consuming if bias-corrected accelerated CIs [27, pages 184-188] have to be calculated, and it provides barely satisfactory results, as shown by a preliminary simulation study not reported here, which confirms the findings reported in [25] for parametric bootstrap inference on the reliability parameter in the bivariate normal case.
The rejection of $H_0$ based on the test statistic (10) described in the previous subsection is indicated by a "*" near the actual coverage rate of each CI.
Figures 2 and 3 graphically display the values of lower and upper uncoverage rates for the five interval estimators, with $R$ varying in {0.5, 0.6, 0.7, 0.8, 0.9, 0.95}, for $n_x = n_y = 5$ and $n_x = n_y = 50$, respectively. These results cover only the equal sample size scenarios; those for unequal sample sizes do not add much to the general discussion we are going to outline.
(i) First, the improvement provided by the four transformations to the standard naïve CI of $R$ in terms of coverage is evident. This improvement is considerable for a small sample size and extreme values of $R$ (say, 0.95), when the coverage rate of the standard naïve CI tends to decrease dramatically, even below 90%. Note that for the sample sizes 5, 10, and 20, the actual coverage rate of the standard naïve CI is always significantly different from (smaller than) the nominal level. Under some scenarios, namely, for $R$ between 0.3 and 0.7, the increase in coverage rate is also accompanied by a reduction in the CI average length. On the contrary, for the other values of $R$, there is an increase in the average width of the modified CIs, which is much more apparent for the logit transformation.
(ii) Examining Table 3 closely, one can note that the CI based on the arcsine root transformation is, except for one scenario, uniformly worse in terms of actual coverage than the other three CIs based on the logit, probit, and complementary log-log transformations. For small sample sizes (5 and 10), the coverage rate actually provided is significantly different from (smaller than) the nominal one. It can also be claimed that this unsatisfactory performance is due to its inability to symmetrize the distribution, as can be seen by glancing at the lower and upper uncoverage rates for values of $R$ getting close to 1; there is an apparent undercoverage effect on the left side (just a bit smaller than that of the "original" standard approximate CI).
(iii) Logit, probit, and complementary log-log transformed CIs have overall a satisfactory performance in terms of closeness to the nominal confidence level. Logit and probit CIs exhibit a similar behavior, as both the lower and upper uncoverage rates they provide are close to the nominal value (2.5%). On the contrary, the complementary log-log function (as well as the arcsine root) often produces a larger undercoverage on one side (here, left), which is partially balanced by an overcoverage on the other side (here, right). This is clear evidence of the higher symmetrizing capability of the logit and probit transformations. However, taking into account the results of the hypothesis test based on the statistic (10), the probit and complementary log-log functions are those that overall perform best; the statistical hypothesis that their actual coverage rate is equal to the nominal one is always accepted except for one case ($R = 0.1$ and a sample size of 5), whereas the same hypothesis is sometimes rejected (10 times out of 40) for the logit function (which tends to produce significantly larger coverage for small sample sizes).
(iv) Differences in behavior among the four "transformed" CIs tend to become more apparent for small sample sizes and for extreme values of $R$ (close to 0 or 1), as can be seen looking at the bottom right graph of Figure 2. Clearly, the same differences tend to vanish as the sample sizes increase and $R$ tends to 0.5 (see the top left graph in Figure 3).

(v) Contrary to what one would predict, the coverage performance of the logit, probit, and complementary log-log-based CIs does not seem to be negatively affected by a small sample size; in fact, as underlined before, a (sometimes significant) increase in the coverage rate of the logit CIs is noticed when the sample size is small (5), whereas the other two CIs show an almost constant trend. The CI exploiting the arcsine root transformation is the one that benefits most from the increase in sample size. Obviously, the average length of the CIs decreases moving to larger sample sizes.
(vi) As one could expect, there is a symmetry in the behavior of each CI when considering two complementary values of $R$ (i.e., values of $R$ summing to 1) keeping the sample size fixed; the values of the coverage rate and average width are similar. As to the uncoverage rates, they are nearly exchangeable for the AN, logit, probit, and arcsine CIs; that is, a lower uncoverage error for a fixed value of $R$ is similar to the upper uncoverage error for its complementary value (or at least their relative magnitude is exchangeable). This feature does not hold for the log-log CIs, which always present a left-side uncoverage error larger than the right-side one. These features are weakened for $R$ equal to 0.9 (and $1 - R$ then equal to 0.1), probably because of the presence of "nonfeasible" samples as discussed in Section 4.1, which distorts this symmetry condition.
(vii) Finally (the related results are not reported here for the sake of brevity), moving to larger values of $\lambda_x$ (i.e., larger than 2), keeping $R$ constant, seems to benefit the performance of all the CIs, as can be quantitatively noted by the reduced number of rejections of the null hypothesis of equivalence between actual and theoretical coverage. This may be explained by the fact that for a large parameter $\lambda$ the Poisson distribution tends to a normal distribution; therefore, larger values of $\lambda_x$ imply a better normal approximation to the Poisson and, presumably, to the distribution of the MLE $\hat{R}$.
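The kind of comparison discussed in the points above can be reproduced in miniature. The following sketch estimates by Monte Carlo the coverage of the naive interval and of a logit-transformed interval for one scenario with small samples and high reliability. It is not the simulation code of this study: the delta-method variance, the scenario (stress mean 2, strength mean 7, five observations per sample), the number of replications, and the seed are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

Z = NormalDist().inv_cdf(0.975)   # quantile for nominal 95% intervals


def pmf(k, lam):
    if lam == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def terms(lam_x, lam_y):
    """(P(X < Y), P(X = Y), P(Y = X + 1)) by truncated summation."""
    top = max(lam_x, lam_y)
    r = p_eq = p_next = cdf_y = 0.0
    for i in range(int(top + 12 * math.sqrt(top) + 30) + 1):
        fx, fy = pmf(i, lam_x), pmf(i, lam_y)
        cdf_y += fy
        r += fx * max(0.0, 1.0 - cdf_y)
        p_eq += fx * fy
        p_next += fx * pmf(i + 1, lam_y)
    return r, p_eq, p_next


def rpois(lam, rng):
    """One Poisson(lam) draw by Knuth's multiplication method."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k, prod = k + 1, prod * rng.random()
    return k


def coverage(lam_x, lam_y, n, reps=1000, seed=1):
    """Empirical coverage of the naive ('AN') and logit CIs on feasible samples."""
    rng = random.Random(seed)
    true_r = terms(lam_x, lam_y)[0]
    hits = {"AN": 0, "logit": 0}
    feasible = 0
    for _ in range(reps):
        mx = fmean(rpois(lam_x, rng) for _ in range(n))
        my = fmean(rpois(lam_y, rng) for _ in range(n))
        r_hat, p_eq, p_next = terms(mx, my)
        se = math.sqrt(p_next ** 2 * mx / n + p_eq ** 2 * my / n)
        if se == 0 or not 0 < r_hat < 1:
            continue                      # "nonfeasible" sample, discarded
        feasible += 1
        hits["AN"] += r_hat - Z * se <= true_r <= r_hat + Z * se
        t = math.log(r_hat / (1 - r_hat))
        se_t = se / (r_hat * (1 - r_hat))
        lo, hi = (1 / (1 + math.exp(-v)) for v in (t - Z * se_t, t + Z * se_t))
        hits["logit"] += lo <= true_r <= hi
    return {name: count / feasible for name, count in hits.items()}


print(coverage(lam_x=2.0, lam_y=7.0, n=5))
```

Changing `n`, `lam_y`, or the transformation in the last lines of the loop allows other scenarios of the kind described above to be explored.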
In Figure 4, for illustrative purposes, the MC distribution of $\hat{R}$ and of the transformed data according to the four transformations is displayed ($R = 0.8$ and $n_x = n_y = 5$). We can note at a glance that logit and probit produce distributions closer to normality than the arcsine root and complementary log-log functions, which do not seem completely able to "correct" the skewness of the original distribution. However, applying the Shapiro-Wilk test of normality to the four transformed distributions leads to very low $p$-values, practically equal to 0, except for the probit function, whose $p$-value is 0.03132, thus showing that in this case all the transformed data distributions are still far from normality.
Figure 4: Superimposed on each plot is the density function of the normal distribution with mean and standard deviation equal to the mean and standard deviation of the data. Note that neither the $x$-axis nor the $y$-axis shares the same scale in the different plots.

An Application
The application we illustrate here is based on the data described in [29] and already used in [7], to which we refer the reader for full details. On these data, we build the four 95% CIs based on the transformations of Table 2, along with the standard one. The results (lower and upper bounds and length of the CIs) are reported in Table 4. Although the five CIs do not differ much from each other, we can note that all four transformation-based intervals have both lower and upper bounds smaller than the corresponding bounds of the standard naïve interval (i.e., they are shifted to the left with respect to it); the logit transformation yields an interval a bit wider than the standard one, whereas the other three transformations produce a slight decrease in its length.

Conclusions
This work provided an empirical analysis of the convenience of data transformations for estimating the reliability parameter in stress-strength models. We focused on the model involving two independent Poisson random variables since this is a case where the distribution of the MLE is not easily derivable and exact CIs cannot be built; thus, transformation of the estimates is a viable tool to refine the standard naïve interval estimator derived from the central limit theorem. Four transformations were considered (logit, probit, arcsine root, and complementary log-log) and empirically assessed under a number of scenarios in terms of the coverage and average length of the CIs they produce. Results are in favor of the logit, probit, and complementary log-log functions, although to varying degrees, which ensure a coverage rate close to the nominal coverage even with small sample sizes. On the contrary, the arcsine root function, although improving the performance of the standard CI, often keeps its coverage rates under the nominal confidence level. These findings were to some extent predictable, since the logit and probit transformations are very popular link functions in generalized regression models, but are a bit surprising with regard to the complementary log-log function, whose use is limited. Future research will ascertain whether such results hold true for other distributions for the stress-strength model and will eventually inspect alternative data transformations.

Conflict of Interests
The author declares that there is no conflict of interests regarding the publication of this paper.