The Complementary Exponentiated Exponential Geometric Lifetime Distribution

We propose a new family of lifetime distributions, namely, the complementary exponentiated exponential geometric distribution. This new family arises in a latent competing risk scenario, where the lifetime associated with a particular risk is not observable and only the maximum lifetime value among all risks is observed. The properties of the proposed distribution are discussed, including a formal proof of its probability density function and explicit algebraic formulas for its survival and hazard functions, moments, rth moment of the ith order statistic, mean residual lifetime, and modal value. Inference is implemented via a straightforward maximum likelihood procedure. The practical importance of the new distribution is demonstrated in three applications where our distribution outperforms several earlier lifetime distributions, such as the exponential, the exponential-geometric, the Weibull, the modified Weibull, and the generalized exponential-Poisson distribution.


Introduction
Several new classes of models grounded in the simple exponential distribution have been introduced in recent years. The main idea is to propose lifetime distributions that can accommodate practical applications where the underlying hazard functions are nonconstant, presenting monotone shapes, since the exponential distribution does not provide a reasonable fit in such situations. For instance, [1] proposed a variation of the exponential distribution, the exponential geometric (EG) distribution, with decreasing hazard function; [2] introduced the exponentiated exponential distribution as a generalization of the usual exponential distribution, which can accommodate data with increasing and decreasing hazard functions; [3] proposed a generalized exponential distribution, which can also accommodate data with increasing and decreasing hazard functions; [4] proposed the exponentiated type distributions extending the Fréchet, gamma, Gumbel, and Weibull distributions; [5] proposed another modification of the exponential distribution with decreasing hazard function; [6] generalized the distribution proposed by [5] by including a power parameter, which can accommodate increasing, decreasing, and unimodal hazard functions; [7] proposed the Poisson-exponential distribution; and [8] proposed the complementary exponential geometric distribution, which is complementary to the exponential geometric distribution proposed by [1]. The last two distributions accommodate increasing hazard functions.
In this paper, following [8], we propose a new distribution family by compounding the exponentiated exponential distribution [2] with a geometric distribution, hereafter the complementary exponentiated exponential geometric distribution or simply the CE2G distribution. The new distribution arises from a complementary risk problem [9] in the presence of latent risks, in the sense that there is no information about which factor was responsible for the component failure and only the maximum lifetime value among all risks is observed. This family has one scale and two shape parameters and accommodates increasing, decreasing, and bathtub failure rates.

The CE2G Model
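Let $Y$ be a nonnegative random variable denoting the lifetime of a component in some population. The random variable $Y$ is said to have a CE2G distribution with parameters $\lambda > 0$, $\alpha > 0$, and $0 < \theta < 1$ if its probability density function (pdf) is given by
$$f(y) = \frac{\alpha\theta\lambda\, e^{-\lambda y}\left(1 - e^{-\lambda y}\right)^{\alpha-1}}{\left[1 - (1-\theta)\left(1 - e^{-\lambda y}\right)^{\alpha}\right]^{2}}, \quad y > 0, \tag{1}$$
where $\lambda$ is a scale parameter of the distribution, and $\alpha$ and $\theta$ are shape parameters. Figure 1(a) shows the CE2G probability density function for $\lambda = 1$, $\theta = 0.05, 0.5, 0.95$, and $\alpha = 0.3, 1.0, 3$; the function can be decreasing or unimodal.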
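The $u$th quantile of the CE2G distribution is given by
$$Q(u) = F^{-1}(u) = -\frac{1}{\lambda}\ln\left\{1 - \left[\frac{u}{\theta + (1-\theta)u}\right]^{1/\alpha}\right\},$$
where $u$ has the uniform (0, 1) distribution and $F(y) = 1 - S(y)$ is the distribution function of $Y$.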
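The quantile function lends itself to inverse-transform sampling. The Python sketch below is illustrative only and is not the authors' code. It assumes the distribution function $F(y) = \theta(1-e^{-\lambda y})^{\alpha}/[1-(1-\theta)(1-e^{-\lambda y})^{\alpha}]$ implied by compounding exponentiated exponential lifetimes with a geometric number of causes; the function names `ce2g_cdf`, `ce2g_quantile`, and `ce2g_sample` are hypothetical.

```python
import math
import random


def ce2g_cdf(y, alpha, theta, lam):
    """Distribution function F(y) of the CE2G model (assumed closed form)."""
    g = (1.0 - math.exp(-lam * y)) ** alpha
    return theta * g / (1.0 - (1.0 - theta) * g)


def ce2g_quantile(u, alpha, theta, lam):
    """Solve F(y) = u for y, with 0 <= u < 1."""
    g = u / (theta + (1.0 - theta) * u)      # value of (1 - exp(-lam*y))**alpha
    return -math.log1p(-g ** (1.0 / alpha)) / lam


def ce2g_sample(n, alpha, theta, lam, rng):
    """Draw n lifetimes by pushing uniform variates through the quantile."""
    return [ce2g_quantile(rng.random(), alpha, theta, lam) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    print(ce2g_sample(5, alpha=2.0, theta=0.5, lam=1.0, rng=rng))
```

Applying `ce2g_cdf` to the output of `ce2g_quantile` should return the original probability up to rounding, which gives a quick self-check of both functions.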
Consider that in reliability studies we can observe only the maximum lifetime of each component among all risks. On many occasions, the information about which risk produced the failure of the component under analysis is not available, or the true cause of failure cannot be specified. Complementary risks (CR) problems arise in several areas and an extensive literature is available; interested readers can see [10-12].
Then, in this context, our model can be derived as follows. Let $M$ be a random variable denoting the number of failure causes, $m = 1, 2, \ldots$, and consider $M$ with geometric probability distribution given by
$$P(M = m) = \theta(1-\theta)^{m-1}, \tag{5}$$
where $0 < \theta < 1$ and $m = 1, 2, \ldots$. Also consider $t_i$, $i = 1, 2, 3, \ldots$, realizations of a random variable $T_i$ denoting the failure times, that is, the time-to-event due to the $i$th CR. From [2], $T_i$ has an exponentiated exponential probability distribution with parameters $\alpha$ and $\lambda$, given by
$$f(t_i; \alpha, \lambda) = \alpha\, g(t_i; \lambda)\, G(t_i; \lambda)^{\alpha-1}, \tag{6}$$
where $g(\cdot)$ and $G(\cdot)$ are the pdf and df, respectively, of the exponential distribution with parameter $\lambda$.
In the latent complementary risks scenario, the number of causes $M$ and the lifetime $t_i$ associated with a particular cause are not observable (latent variables), and only the maximum lifetime $Y$ among all causes is usually observed. So, we only observe the random variable given by
$$Y = \max\{T_1, T_2, \ldots, T_M\}. \tag{7}$$
The following result shows that the random variable $Y$ has probability density function given by (1).
Proposition 1. If the random variable $Y$ is defined as in (7), then, considering (5) and (6), $Y$ is distributed according to a CE2G distribution, with probability density function given by (1).
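Proof. The conditional density function of (7) given $M = m$ is given by
$$f(y \mid M = m; \alpha, \lambda) = m\alpha\lambda\, e^{-\lambda y}\left(1 - e^{-\lambda y}\right)^{m\alpha - 1}, \quad y > 0.$$
Then, the marginal probability density function of $Y$ is given by
$$f(y) = \sum_{m=1}^{\infty} m\alpha\lambda\, e^{-\lambda y}\left(1 - e^{-\lambda y}\right)^{m\alpha-1}\theta(1-\theta)^{m-1} = \frac{\alpha\theta\lambda\, e^{-\lambda y}\left(1 - e^{-\lambda y}\right)^{\alpha-1}}{\left[1 - (1-\theta)\left(1 - e^{-\lambda y}\right)^{\alpha}\right]^{2}},$$
where the last equality uses $\sum_{m \ge 1} m x^{m-1} = (1-x)^{-2}$ for $|x| < 1$. This completes the proof.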
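The latent construction can be checked by simulation. The following Python sketch is an illustration under stated assumptions rather than part of the original paper: it draws a geometric number of causes with $P(M = m) = \theta(1-\theta)^{m-1}$, draws that many exponentiated exponential lifetimes, keeps only the maximum, and compares the empirical distribution function with the closed form $F(y) = \theta(1-e^{-\lambda y})^{\alpha}/[1-(1-\theta)(1-e^{-\lambda y})^{\alpha}]$. All function names are hypothetical.

```python
import math
import random


def draw_geometric(theta, rng):
    """Number of latent causes M, P(M = m) = theta * (1 - theta)**(m - 1)."""
    m = 1
    while rng.random() >= theta:
        m += 1
    return m


def draw_exp_exponential(alpha, lam, rng):
    """Exponentiated exponential lifetime by inverting (1 - exp(-lam*t))**alpha."""
    x = min(rng.random() ** (1.0 / alpha), 1.0 - 2.0 ** -53)  # keep x < 1
    return -math.log1p(-x) / lam


def draw_ce2g_latent(alpha, theta, lam, rng):
    """Observed lifetime Y = max(T_1, ..., T_M); M and the T_i stay latent."""
    m = draw_geometric(theta, rng)
    return max(draw_exp_exponential(alpha, lam, rng) for _ in range(m))


def ce2g_cdf(y, alpha, theta, lam):
    """Closed-form distribution function implied by the marginal pdf."""
    g = (1.0 - math.exp(-lam * y)) ** alpha
    return theta * g / (1.0 - (1.0 - theta) * g)


def empirical_cdf_at(y, sample):
    """Fraction of simulated lifetimes that do not exceed y."""
    return sum(1 for v in sample if v <= y) / len(sample)


if __name__ == "__main__":
    rng = random.Random(2024)
    sample = [draw_ce2g_latent(2.0, 0.5, 1.0, rng) for _ in range(20000)]
    print(empirical_cdf_at(1.0, sample), ce2g_cdf(1.0, 2.0, 0.5, 1.0))
```

With a large simulated sample the two printed values should agree to within Monte Carlo error, which is what Proposition 1 predicts.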

Some Properties
Many of the most important features and characteristics of a distribution can be studied through its moments, such as the mean and variance. A general expression for the $r$th ordinary moment $\mu'_r = E(Y^r)$ of the CE2G distribution is hard to obtain, so we summarize the mean and variance as follows.
The moment generating function of the variable $Y$ with density function given by (1) can be obtained analytically if we consider the expression given in [13, page 329]. For any real number $t$, let $\Phi_Y(t)$ be the characteristic function of $Y$, that is, $\Phi_Y(t) = E[e^{itY}]$, where $i$ denotes the imaginary unit. With the preceding notation, we state the following.

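Proposition 2. For the random variable $Y$ with CE2G distribution, we have that its characteristic function is given by
$$\Phi_Y(t) = \alpha\theta \int_0^1 \frac{u^{\alpha-1}(1-u)^{-it/\lambda}}{\left[1 - (1-\theta)u^{\alpha}\right]^{2}}\, du,$$
where $i = \sqrt{-1}$.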
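Proof. Consider the following:
$$\Phi_Y(t) = E\left[e^{itY}\right] = \int_0^\infty e^{ity} f(y)\, dy = \alpha\theta \int_0^1 \frac{u^{\alpha-1}(1-u)^{-it/\lambda}}{\left[1 - (1-\theta)u^{\alpha}\right]^{2}}\, du,$$
where the last equality follows from the change of variable $u = 1 - e^{-\lambda y}$.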
Skewness is a measure of the asymmetry of the probability distribution. The skewness value can be positive or negative, or even undefined. Qualitatively, a negative skew indicates that the tail on the left side of the probability density function is longer than the right side and the bulk of the values lie to the right of the mean. A positive skew indicates that the tail on the right side is longer than the left side and the bulk of the values lie to the left of the mean. The skewness of a random variable $Y$, say $\gamma_1$, is given by the third standardized moment
$$\gamma_1 = \frac{E\left[(Y - E(Y))^3\right]}{\left[\operatorname{Var}(Y)\right]^{3/2}}. \tag{14}$$
Kurtosis is any measure of the "peakedness" of the probability distribution of a real-valued random variable. In a similar way to the concept of skewness, kurtosis is a descriptor of the shape of a probability distribution. It is common practice to use the kurtosis to compare the shape of a given distribution to that of the normal distribution. One common measure of kurtosis, originating with Karl Pearson, say $\gamma_2$, is based on a scaled version of the fourth moment, given by
$$\gamma_2 = \frac{E\left[(Y - E(Y))^4\right]}{\left[\operatorname{Var}(Y)\right]^{2}}. \tag{15}$$
Algebraic expressions for the kurtosis and skewness are too lengthy to show, since they require the algebraic moment expressions up to order four. These moments can be obtained by algebraic manipulation to determine $E(Y)$, $E(Y^2)$, $E(Y^3)$, and $E(Y^4)$ in (14) and (15) through Equation (11). Figure 2 shows the kurtosis ($\gamma_2$) and skewness ($\gamma_1$) of the CE2G distribution as functions of $\alpha$ with $\lambda = 1$, $\theta = 0.1, 0.5, 0.9$ and as functions of $\theta$ with $\lambda = 1$, $\alpha = 0.3, 1.0, 3$.
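Because the closed-form moments are lengthy, skewness and kurtosis can also be evaluated numerically. The Python sketch below is an illustration, not the authors' procedure: it assumes the survival function $S(y) = [1 - G(y)]/[1 - (1-\theta)G(y)]$ with $G(y) = (1-e^{-\lambda y})^{\alpha}$, uses the identity $E(Y^r) = \int_0^\infty r y^{r-1} S(y)\,dy$, and approximates the integral by composite Simpson's rule on a truncated range. The truncation width, step count, and function names are assumptions chosen for illustration.

```python
import math


def ce2g_survival(y, alpha, theta, lam):
    """Survival function S(y) = 1 - F(y) of the CE2G model (assumed form)."""
    g = (1.0 - math.exp(-lam * y)) ** alpha
    return (1.0 - g) / (1.0 - (1.0 - theta) * g)


def raw_moment(r, alpha, theta, lam, width=60.0, steps=60000):
    """E[Y^r] = int_0^inf r*y^(r-1)*S(y) dy, Simpson's rule on [0, width/lam]."""
    upper = width / lam
    h = upper / steps  # steps must be even for Simpson's rule

    def integrand(y):
        return r * y ** (r - 1) * ce2g_survival(y, alpha, theta, lam)

    total = integrand(0.0) + integrand(upper)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * integrand(k * h)
    return total * h / 3.0


def skewness_kurtosis(alpha, theta, lam):
    """Third and fourth standardized moments built from raw moments."""
    m1, m2, m3, m4 = (raw_moment(r, alpha, theta, lam) for r in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3
    mu4 = m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4
    return mu3 / var ** 1.5, mu4 / var ** 2


if __name__ == "__main__":
    print(skewness_kurtosis(alpha=2.0, theta=0.5, lam=1.0))
```

A useful sanity check is the limiting case $\alpha = 1$, $\theta \to 1$, where the model reduces to the exponential distribution with skewness 2 and Pearson kurtosis 9.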

Order Statistics
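Order statistics are among the most fundamental tools in nonparametric statistics and inference. Let $Y_1, \ldots, Y_n$ be a random sample taken from the CE2G distribution and let $Y_{1:n}, \ldots, Y_{n:n}$ denote the corresponding order statistics. Then, the pdf $f_{i:n}(y)$ of the $i$th order statistic $Y_{i:n}$ is given by
$$f_{i:n}(y) = \frac{n!}{(i-1)!\,(n-i)!}\, f(y)\, [F(y)]^{i-1}\, [1 - F(y)]^{n-i}.$$
The $r$th moment of the $i$th order statistic $Y_{i:n}$ can be obtained from the following result due to [14]:
$$E\left[Y_{i:n}^{r}\right] = r \sum_{p=n-i+1}^{n} (-1)^{p-n+i-1} \binom{p-1}{n-i} \binom{n}{p} \int_0^\infty y^{r-1}\, [S(y)]^{p}\, dy. \tag{17}$$
Consider the binomial series expansion given by
$$(1 - x)^{-a} = \sum_{k=0}^{\infty} \frac{(a)_k}{k!}\, x^{k}, \tag{18}$$
where $(a)_k$ is a Pochhammer symbol, given by $(a)_k = a(a+1)\cdots(a+k-1)$, and the series converges if $|x| < 1$, together with the property (19).
Proposition 4. For the random variable $Y$ with CE2G distribution, the $r$th moment of the $i$th order statistic is given by (20).
Proof. From (2) and (18), we obtain a series expansion of $[S(y)]^{p}$. Using a logarithmic change of variable and the expansion (18), the integrand reduces to the kernel of a gamma density, which gives (22). Now considering (22) in (17) and the property (19), the result follows.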

Entropy
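The entropy of a random variable $Y$ is a measure of the variation of uncertainty. A popular entropy measure is the Rényi entropy [15].
If $Y$ has the probability density function (1), then the Rényi entropy is defined by
$$I_R(\rho) = \frac{1}{1-\rho} \log\left\{\int_0^\infty f^{\rho}(y)\, dy\right\}, \tag{23}$$
where $\rho > 0$ and $\rho \neq 1$.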
Proposition 5. If the random variable $Y$ is defined as in (7), then the Rényi entropy is given by (24).
Proof. From (23), we can calculate the integral of $f^{\rho}(y)$, which gives (25). So, using (25) in (23), the result follows.

Reliability
In the context of reliability, the stress-strength model describes the life of a component which has a random strength $Y$ that is subjected to a random stress $X$. The component fails at the instant that the stress applied to it exceeds the strength, and the component will function satisfactorily whenever $Y > X$. So, $R = \Pr(X < Y)$ is a measure of component reliability. In the area of stress-strength models there has been a large amount of work regarding estimation of the reliability $R$ when $X$ and $Y$ are independent random variables belonging to the same univariate family of distributions.
Proposition 6. If the random variable $Y$ is defined as in (7), then the reliability $R$ for $X$ and $Y$ i.i.d. is given by (26).
Proof. For $X$ and $Y$ i.i.d. CE2G r.v.'s, where $X$ is the stress and $Y$ is the strength, the reliability $R = P(X < Y)$ is given by (27). This completes the proof.

Residual Lifetime Distribution
Given that there was no failure prior to time $t$, the residual lifetime distribution of a random variable $Y$, distributed as CE2G, has the survival function given by
$$S_t(y) = \Pr(Y > t + y \mid Y > t) = \frac{S(t + y)}{S(t)}.$$
The mean residual lifetime of a continuous distribution with survival function $S(y)$ is given by
$$m(t) = E(Y - t \mid Y > t) = \frac{1}{S(t)} \int_t^\infty S(u)\, du. \tag{29}$$
Proposition 7. For the random variable $Y$ with CE2G distribution, the mean residual lifetime is given by (30).
Proof. From (29) and using $S(y)$ given by (2), we obtain an integral expression for $m(t)$. Now using (18) and making a binomial expansion in a similar way to the proof of Proposition 4 on (22), the result follows.
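The mean residual lifetime can also be approximated directly from its integral definition. The Python sketch below is illustrative and not taken from the paper: it assumes the survival function $S(y) = [1 - G(y)]/[1 - (1-\theta)G(y)]$ with $G(y) = (1-e^{-\lambda y})^{\alpha}$ and evaluates $\int_t^\infty S(u)\,du$ by composite Simpson's rule on a truncated interval. The truncation width, step count, and function names are assumptions.

```python
import math


def ce2g_survival(y, alpha, theta, lam):
    """Survival function S(y) of the CE2G model (assumed closed form)."""
    g = (1.0 - math.exp(-lam * y)) ** alpha
    return (1.0 - g) / (1.0 - (1.0 - theta) * g)


def mean_residual_life(t, alpha, theta, lam, width=60.0, steps=20000):
    """m(t) = (1/S(t)) * int_t^inf S(u) du, truncated at t + width/lam."""
    span = width / lam
    h = span / steps  # steps must be even for Simpson's rule
    total = (ce2g_survival(t, alpha, theta, lam)
             + ce2g_survival(t + span, alpha, theta, lam))
    for k in range(1, steps):
        weight = 4.0 if k % 2 else 2.0
        total += weight * ce2g_survival(t + k * h, alpha, theta, lam)
    return (total * h / 3.0) / ce2g_survival(t, alpha, theta, lam)


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0, 2.0):
        print(t, mean_residual_life(t, alpha=2.0, theta=0.5, lam=1.0))
```

At $t = 0$ the function returns the mean of the distribution, and in the exponential limit ($\alpha = 1$, $\theta \to 1$) it returns $1/\lambda$ for every $t$, reflecting the memoryless property.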

Inference
Assuming the lifetimes are independently distributed and are independent of the censoring mechanism, the maximum likelihood estimates (MLEs) of the parameters are obtained by direct maximization of the log-likelihood function given by
$$\ell(\alpha, \theta, \lambda) = \sum_{i=1}^{n} \delta_i \ln f(y_i) + \sum_{i=1}^{n} (1 - \delta_i) \ln S(y_i),$$
where $f$ is the pdf (1), $S$ is the survival function (2), and $\delta_i$ is a censoring indicator, which is equal to 0 or 1, respectively, if the observation is censored or observed. The advantage of this procedure is that it runs immediately using existing statistical packages. We have used the optim routine of R [16]. Large-sample inference for the parameters is based on the MLEs and their estimated standard errors. For $(\alpha, \theta, \lambda)$, we consider the observed Fisher information matrix given by
$$I_n(\alpha, \theta, \lambda) = \begin{pmatrix} I_{\alpha\alpha} & I_{\alpha\theta} & I_{\alpha\lambda} \\ I_{\theta\alpha} & I_{\theta\theta} & I_{\theta\lambda} \\ I_{\lambda\alpha} & I_{\lambda\theta} & I_{\lambda\lambda} \end{pmatrix},$$
where the elements of the matrix $I_n(\alpha, \theta, \lambda)$ are given in the appendix. Under conditions that are fulfilled for the parameters $\alpha$, $\theta$, and $\lambda$ in the interior of the parameter space, the asymptotic distribution of $(\hat\alpha, \hat\theta, \hat\lambda)$, centered at the true parameter values, as $n \to \infty$, is trivariate normal with zero mean and variance-covariance matrix $I_n^{-1}(\alpha, \theta, \lambda)$. In order to compare different distributions, we follow several authors in the literature, for example, [6, 17-19], who use the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), which are defined, respectively, by $-2\ell(\cdot) + 2q$ and $-2\ell(\cdot) + q\log(n)$, where $\ell(\cdot)$ is the log-likelihood evaluated at the MLE vector of the respective distribution, $q$ is the number of parameters estimated, and $n$ is the sample size. The preferred distribution is the one with the lowest AIC and BIC values.
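The censored log-likelihood and the two information criteria are simple to evaluate. The paper reports using the optim routine of R; the Python sketch below is a stand-in for illustration only and is not the authors' code. It assumes the pdf $f(y) = \alpha\theta\lambda e^{-\lambda y}(1-e^{-\lambda y})^{\alpha-1}/[1-(1-\theta)(1-e^{-\lambda y})^{\alpha}]^2$ and the corresponding survival function, and the names `ce2g_loglik`, `aic`, and `bic` are hypothetical. The maximization step itself is left to any general-purpose optimizer.

```python
import math


def ce2g_loglik(params, times, observed):
    """Log-likelihood for right-censored data.

    params   -- (alpha, theta, lam)
    times    -- positive lifetimes or censoring times
    observed -- 1 if the failure was observed, 0 if censored
    """
    alpha, theta, lam = params
    total = 0.0
    for y, d in zip(times, observed):
        e = math.exp(-lam * y)
        g = (1.0 - e) ** alpha
        denom = 1.0 - (1.0 - theta) * g
        if d:   # observed failure contributes log f(y)
            total += (math.log(alpha * theta * lam) - lam * y
                      + (alpha - 1.0) * math.log(1.0 - e)
                      - 2.0 * math.log(denom))
        else:   # censored observation contributes log S(y)
            total += math.log(1.0 - g) - math.log(denom)
    return total


def aic(loglik, n_params):
    """Akaike information criterion: -2*loglik + 2*q."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2*loglik + q*log(n)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


if __name__ == "__main__":
    times = [0.8, 1.3, 2.1, 0.4, 3.0]       # made-up illustrative values
    observed = [1, 1, 0, 1, 0]
    ll = ce2g_loglik((2.0, 0.5, 1.0), times, observed)
    print(ll, aic(ll, 3), bic(ll, 3, len(times)))
```

The data in the usage example are invented purely to show the call pattern and carry no empirical meaning.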

Simulation Study
To assess the performance of the MLEs, a study was performed based on one hundred datasets generated from the CE2G distribution with six different sets of parameters, for $n = 20, 50, 100, 200, 500$, and $1000$. In order to work with unbounded parameters, we consider the following reparameterizations in the estimation process. For the parameter $\theta$, we considered the transformation $\theta = e^{\theta^*}/(1 + e^{\theta^*})$, where $\theta^* \in \mathbb{R}$, and for $\alpha$ and $\lambda$ we consider an exponential transformation. By the invariance property of the MLEs, we can return to the original parameters through these transformations. For the calculation of their variances, we use the delta method. The values $(\alpha, \lambda, \theta) = (1, 1, 0.5)$ were used as the initial values for all numerical simulations, since $\alpha > 0$, $\lambda > 0$, and $0 < \theta < 1$.
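The reparameterization and the delta method can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: it takes $\alpha = e^{\alpha^*}$, $\lambda = e^{\lambda^*}$, and $\theta = e^{\theta^*}/(1+e^{\theta^*})$, and because each transformation acts on a single coordinate, each standard error is rescaled by the absolute derivative of its own transformation. Function names are hypothetical.

```python
import math


def to_unbounded(alpha, theta, lam):
    """Map (alpha, theta, lam) to unconstrained (alpha*, theta*, lam*)."""
    return math.log(alpha), math.log(theta / (1.0 - theta)), math.log(lam)


def to_original(a_star, t_star, l_star):
    """Inverse map: exponential for alpha and lam, logistic for theta."""
    return math.exp(a_star), 1.0 / (1.0 + math.exp(-t_star)), math.exp(l_star)


def delta_method_se(star_estimates, star_se):
    """Standard errors on the original scale via the delta method.

    d(alpha)/d(alpha*) = alpha, d(theta)/d(theta*) = theta*(1 - theta),
    d(lam)/d(lam*) = lam.
    """
    alpha, theta, lam = to_original(*star_estimates)
    derivatives = (alpha, theta * (1.0 - theta), lam)
    return tuple(abs(d) * s for d, s in zip(derivatives, star_se))


if __name__ == "__main__":
    start = to_unbounded(1.0, 0.5, 1.0)   # initial values used in the study
    print(start, to_original(*start))
```

Starting the optimizer at `to_unbounded(1.0, 0.5, 1.0)` corresponds to the initial values $(\alpha, \lambda, \theta) = (1, 1, 0.5)$ mentioned above.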
The results are summarized in Table 1, which shows the averages of the MLEs, Av($\hat\alpha$, $\hat\lambda$, $\hat\theta$), together with the coverage probabilities of the 95% confidence intervals for the parameters of the CE2G distribution, $(\alpha, \lambda, \theta)$, the bias, the mean squared error (MSE), and the standard deviations, Sd($\hat\alpha$, $\hat\lambda$, $\hat\theta$). These results suggest that the MLEs perform adequately. The standard deviations of the MLEs decrease as the sample size increases. The empirical coverage probabilities are close to the nominal coverage level, particularly as the sample size increases.

Applications
In this section, we compare the CE2G distribution fit with several usual lifetime distributions on three datasets extracted

Figure 1: (a) Probability density function of the CE2G distribution. (b) Failure rate function of the CE2G distribution. We fixed $\lambda = 1$.

Table 1: Mean of the MLEs, their standard deviations, coverage probabilities, bias, and MSE.