Maximum Likelihood and Bayes Estimation in Randomly Censored Geometric Distribution

In this article, we study the geometric distribution under randomly censored data. Maximum likelihood estimators and confidence intervals based on the Fisher information matrix are derived for the unknown parameters. Bayes estimators are also developed using beta priors under generalized entropy and LINEX loss functions, and Bayesian credible and highest posterior density (HPD) credible intervals are obtained for the parameters. Expected time on test and reliability characteristics are also analyzed. A Monte Carlo simulation study is carried out to compare the various estimates developed in the article. Finally, a randomly censored real data set is discussed for illustration.


Introduction
Lifetime experiments are conducted to collect data on items under study. The data are used to fit a suitable lifetime model and then to draw inferences about the statistical properties and survival/reliability characteristics of the items. These experiments may be expensive in terms of both cost and time. To save cost and time, lifetime experiments may be censored intentionally, or censoring may occur naturally in an experiment. Many types of censoring schemes have been studied in the literature, such as type-I, type-II, progressive, hybrid, and random censoring schemes.
Random censoring is a situation in which an item under study is lost or removed randomly from the experiment before its failure. In other words, some subjects in the study have not experienced the event of interest at the end of the study. For example, in a clinical trial or a medical study, some patients may still be untreated and leave the course of treatment before its completion. In a social study, some subjects are lost to follow-up in the middle of the survey. In reliability engineering, an electrical or electronic device on test, such as a bulb, may break before its failure. In such cases, the exact survival time (or time to the event of interest) of the subjects is unknown; therefore they are called randomly censored observations. Random censoring was introduced in the literature by Gilbert [1]. Thereafter, Breslow and Crowley [2], Koziol and Green [3], and Csörgő and Horváth [4] also discussed randomly censored data in their work. Kim [5] carried out chi-square goodness of fit tests for randomly censored data. In the last decade, studies on randomly censored data from the exponential distribution include Friesl and Hurt [6] and Saleem and Raza [7]. The Rayleigh model with randomly censored data was analyzed by Ghitany [8] and Saleem and Aslam [9]; the Burr Type XII model was analyzed by Ghitany and Al-Awadhi [10]; generalized exponential and Weibull models were analyzed, respectively, by Danish and Aslam [11,12]. Krishna et al. [13] studied the Maxwell distribution with randomly censored samples.
In all the above cases, survival time or failure time has been assumed to be a continuous variable. However, sometimes it is impossible or inconvenient to measure the life length of a device on a continuous scale.
In real-life experiments we come across situations where failure time data are discrete, either through the grouping of continuous data due to imprecise measurement or because time itself is discrete, for example, days, weeks, or months. In such circumstances, one measures the life of a device on a discrete scale. A discrete lifetime model may also consider the number of successful cycles, trials, or operations before failure of a device. Among discrete lifetime models, the one-parameter geometric distribution has an important position. The geometric distribution can be used as a discrete failure model to investigate the ability of electronic tubes to withstand successive voltage overloads and the performance of electric switches that are repeatedly turned on and off.
In reliability theory, the geometric distribution has been considered as a lifetime model by Yaqub and Khan [14], Bhattacharya and Kumar [15], Maiti [16], Krishna and Jain [17], Sarhan and Kundu [18], and others. In most of the above studies, a complete sample or right censoring is considered. No literature is available on random censoring in any discrete distribution.
In view of the above, this paper considers classical and Bayes estimation of the unknown parameters and some reliability characteristics for the geometric distribution under randomly censored data. In Section 2, the mathematical formulation for randomly censored data with failure and censoring times following geometric distributions is given. Section 3 deals with maximum likelihood estimation (MLE) of the unknown parameters along with their variances and confidence intervals. Section 4 describes the expected time on test and the observed time on test. In Section 5 we obtain the Bayes estimators of the unknown parameters under generalized entropy and LINEX loss functions using beta priors. In Section 6, we consider a Monte Carlo simulation study to explore the properties of the various estimates developed in the preceding sections. Finally, Section 7 deals with a real data example to study the application of random censoring in the geometric distribution. All computations in the paper were carried out with the statistical software R [19].

The Model and Its Assumptions
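In a life testing experiment or a clinical trial, $n$ items or patients are subjected to test. Let $X_1, X_2, \ldots, X_n$ be their discrete failure or survival times. Assume that the $X_i$'s are independently and identically distributed (i.i.d.) random variables (r.v.) with probability mass function (p.m.f.) $f_X(x)$ and cumulative distribution function (c.d.f.) $F_X(x)$. Further, suppose that these items may be censored before their failure at times $T_1, T_2, \ldots, T_n$. Again, assume the $T_i$'s to be i.i.d. discrete random variables with p.m.f. $f_T(t)$ and c.d.f. $F_T(t)$. It is also assumed that the $X_i$'s and $T_i$'s are independent. In a randomly censored experiment, the minimum of $X_i$ and $T_i$, that is, $Y_i = \min(X_i, T_i)$, will actually be observed for $i = 1, 2, \ldots, n$. Let $D_i$ be an indicator variable defined by
$$D_i = \begin{cases} 1, & X_i < T_i, \\ 0, & X_i \ge T_i. \end{cases}$$
In the present article, we consider survival and censoring time variables to follow geometric distributions Geo($\theta$) and Geo($\lambda$), respectively, with p.m.f.
$$f_X(x) = (1-\theta)\theta^{x};\ x = 0, 1, \ldots;\ 0 < \theta < 1, \qquad f_T(t) = (1-\lambda)\lambda^{t};\ t = 0, 1, \ldots;\ 0 < \lambda < 1. \tag{1}$$
The probability of failure of an item on test before its censoring is given by
$$p = P(X_i < T_i) = \sum_{x=0}^{\infty} (1-\theta)\theta^{x}\lambda^{x+1} = \frac{\lambda(1-\theta)}{1-\theta\lambda}. \tag{2}$$
The above probability of failure is tabulated in Table 1 for various values of $\theta$ and $\lambda$. From Table 1 we observe that the probability of failure decreases as the lifetime parameter $\theta$ increases, whereas it increases as the censoring parameter $\lambda$ increases. Now, for $i = 1, 2, \ldots, n$, the Bernoulli p.m.f. of $D_i$ is
$$P(D_i = d_i) = p^{d_i}(1-p)^{1-d_i};\ d_i = 0, 1. \tag{3}$$
Also, the c.d.f. of $Y_i$ is
$$F_Y(y) = 1 - P(X_i > y)\,P(T_i > y) = 1 - (\theta\lambda)^{y+1};\ y = 0, 1, \ldots, \tag{4}$$
which is a geometric distribution with parameter $\theta\lambda$. The independence of $X_i$ and $T_i$ implies the independence of $Y_i$ and $D_i$. Therefore, the joint p.m.f. of $(Y_i, D_i)$ is given by
$$f_{Y,D}(y_i, d_i) = (\theta\lambda)^{y_i}\,[\lambda(1-\theta)]^{d_i}\,(1-\lambda)^{1-d_i};\ d_i = 0, 1;\ y_i = 0, 1, 2, \ldots;\ 0 < \theta, \lambda < 1 \tag{5}$$
with $E(Y_i) = \theta\lambda/(1-\theta\lambda)$ and $E(D_i) = p = \lambda(1-\theta)/(1-\theta\lambda)$. Also, note that, for Geo($\theta$), the reliability characteristics with mission time $t$ are given by
$$R(t) = P(X \ge t) = \theta^{t}, \qquad \mathrm{MTTF} = E(X) = \frac{\theta}{1-\theta}.$$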
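The sampling scheme of this section can be sketched in a few lines of Python (used here purely for illustration; the computations reported in the paper were done in R). This is a minimal sketch under stated assumptions: the function names `rgeom`, `censored_sample`, and `failure_probability`, the parameter values, and the seed are all introduced here, and the indicator is taken as $D_i = 1$ when $X_i < T_i$, for which the probability of failure is $\lambda(1-\theta)/(1-\theta\lambda)$. Under that assumption, the empirical failure proportion and the sample mean of $Y_i$ should be close to this probability and to $\theta\lambda/(1-\theta\lambda)$, respectively.

```python
import math
import random


def rgeom(theta, rng):
    """Draw from Geo(theta) on {0, 1, 2, ...} with P(X = x) = (1 - theta) * theta**x."""
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is always finite
    return int(math.log(u) / math.log(theta))  # P(X >= x) = theta**x


def censored_sample(n, theta, lam, rng):
    """Return n pairs (y, d) with y = min(x, t) and d = 1 if x < t (assumed convention)."""
    sample = []
    for _ in range(n):
        x = rgeom(theta, rng)  # failure time
        t = rgeom(lam, rng)    # censoring time
        sample.append((min(x, t), 1 if x < t else 0))
    return sample


def failure_probability(theta, lam):
    """P(X < T) for independent Geo(theta) and Geo(lam) variables."""
    return lam * (1.0 - theta) / (1.0 - theta * lam)


rng = random.Random(2017)          # hypothetical seed
theta, lam = 0.6, 0.8              # hypothetical parameter values
data = censored_sample(20000, theta, lam, rng)
p_hat = sum(d for _, d in data) / len(data)
y_bar = sum(y for y, _ in data) / len(data)
print("failure proportion:", p_hat, "theory:", failure_probability(theta, lam))
print("mean of Y:", y_bar, "theory:", theta * lam / (1.0 - theta * lam))
```

Drawing the failure and censoring times separately, rather than drawing $(Y_i, D_i)$ directly, keeps the sketch close to the physical description of the experiment.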

Maximum Likelihood Estimation
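The likelihood function of the randomly censored sample data $(y_i, d_i)$, $i = 1, 2, \ldots, n$, from geometric distributions as discussed in Section 2, is given by
$$L(\theta, \lambda) = \prod_{i=1}^{n} (\theta\lambda)^{y_i}\,[\lambda(1-\theta)]^{d_i}\,(1-\lambda)^{1-d_i} = \theta^{s_1}(1-\theta)^{s_2}\,\lambda^{s_1+s_2}(1-\lambda)^{n-s_2},$$
where $s_1 = \sum_{i=1}^{n} y_i$ and $s_2 = \sum_{i=1}^{n} d_i$. Taking logarithms, differentiating with respect to $\theta$ and $\lambda$, and equating to zero, we get the MLEs of $\theta$ and $\lambda$ as $\hat\theta = s_1/(s_1 + s_2)$ and $\hat\lambda = (s_1 + s_2)/(n + s_1)$. By the invariance property of MLEs, we have the MLEs of the reliability function and the mean time to failure as
$$\hat R(t) = \hat\theta^{\,t}, \qquad \widehat{\mathrm{MTTF}} = \frac{\hat\theta}{1-\hat\theta}.$$
The MLEs can be viewed as the Bayes estimators under the 0-1 loss function and the uniform prior. Also, the estimates derived by the method of moments coincide with the above MLEs in this case.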
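The Fisher information matrix is given by
$$I(\theta, \lambda) = \frac{n}{1-\theta\lambda}\begin{bmatrix} \dfrac{\lambda}{\theta(1-\theta)} & 0 \\ 0 & \dfrac{1}{\lambda(1-\lambda)} \end{bmatrix}$$
so that the variances of the estimates are
$$\mathrm{Var}(\hat\theta) = \frac{\theta(1-\theta)(1-\theta\lambda)}{n\lambda}, \qquad \mathrm{Var}(\hat\lambda) = \frac{\lambda(1-\lambda)(1-\theta\lambda)}{n}.$$
The estimates of the above variances can be obtained by replacing $(\theta, \lambda)$ by $(\hat\theta, \hat\lambda)$. The geometric distribution belongs to the exponential family of distributions; therefore, most of the properties of MLEs are valid in this case. The asymptotic sampling distribution of $\hat\theta$ is normal, $N(\theta, \mathrm{Var}(\hat\theta))$. Thus, a two-sided $100(1-\gamma)\%$ confidence interval for $\theta$ is $\hat\theta \pm z_{\gamma/2}\sqrt{\widehat{\mathrm{Var}}(\hat\theta)}$. Similarly, for $\lambda$ the confidence interval is given by $\hat\lambda \pm z_{\gamma/2}\sqrt{\widehat{\mathrm{Var}}(\hat\lambda)}$. Here, $z_{\gamma/2}$ is the $100(1-\gamma/2)$th percentile of the standard normal distribution for $0 < \gamma < 1$.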
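The estimators and intervals of this section can be illustrated with a short Python sketch. It is not the authors' implementation: the function name `mle_with_wald_ci` and the toy data are hypothetical, and the variance expressions coded below are obtained by inverting a diagonal Fisher information matrix under the assumption that $d_i = 1$ when $x_i < t_i$, for which $\hat\theta = s_1/(s_1+s_2)$ and $\hat\lambda = (s_1+s_2)/(n+s_1)$ with $s_1 = \sum y_i$ and $s_2 = \sum d_i$.

```python
import math


def mle_with_wald_ci(sample, z=1.959964):
    """Return {name: (estimate, lower, upper)} for theta and lambda.

    sample is a list of (y, d) pairs; z is the standard normal percentile
    (1.959964 gives a 95% interval). Assumes d = 1 when x < t.
    """
    n = len(sample)
    s1 = sum(y for y, _ in sample)  # total observed time
    s2 = sum(d for _, d in sample)  # number of observed failures
    theta_hat = s1 / (s1 + s2)
    lam_hat = (s1 + s2) / (n + s1)
    shared = 1.0 - theta_hat * lam_hat
    # Plug-in variances from the inverse of the (diagonal) Fisher information.
    var_theta = theta_hat * (1.0 - theta_hat) * shared / (n * lam_hat)
    var_lam = lam_hat * (1.0 - lam_hat) * shared / n
    out = {}
    for name, est, var in (("theta", theta_hat, var_theta),
                           ("lambda", lam_hat, var_lam)):
        half = z * math.sqrt(var)
        out[name] = (est, est - half, est + half)
    return out


# Hypothetical toy sample of (y, d) pairs, for illustration only.
toy = [(0, 1), (3, 1), (1, 0), (5, 1), (2, 0), (4, 1), (0, 0), (2, 1)]
result = mle_with_wald_ci(toy)
for name, (est, lower, upper) in result.items():
    print(name, round(est, 4), (round(lower, 4), round(upper, 4)))
```

With such a small sample the normal approximation is rough and the interval limits may fall outside $(0, 1)$; the sketch makes no attempt to truncate them.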

Expected Time on Test
In life testing experiments, the expected time on test (ETT) gives an idea of the expected duration of the experiment. Since the time required to complete an experiment has a direct impact on the cost, this information helps an experimenter choose an appropriate sampling plan. Krishna et al. [13] developed the ETT for the Maxwell distribution under random censoring for the first time. In this section, we develop the mathematical formulation of the ETT for the randomly censored geometric distribution.
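Let $Y_{(n)} = \max(Y_1, Y_2, \ldots, Y_n)$ be the $n$th order statistic in a randomly censored sample of size $n$, denoting the time to observe the $n$th failure or censoring time. Then, the c.d.f. of $Y_{(n)}$ is given by
$$F_{Y_{(n)}}(y) = [F_Y(y)]^{n} = [1 - (\theta\lambda)^{y+1}]^{n};\ y = 0, 1, \ldots$$
For randomly censored data, ETT is given by
$$\mathrm{ETT}_{\mathrm{RCS}} = E(Y_{(n)}) = \sum_{y=0}^{\infty}\left\{1 - [1 - (\theta\lambda)^{y+1}]^{n}\right\}.$$
Similarly, in the complete case, $X_{(n)}$ denotes the time to observe the $n$th failure in a sample of size $n$. Then, the ETT for the complete sample is given by
$$\mathrm{ETT}_{\mathrm{CS}} = E(X_{(n)}) = \sum_{x=0}^{\infty}\left\{1 - [1 - \theta^{x+1}]^{n}\right\}.$$
By the invariance property of MLEs, the MLE of ETT for randomly censored data is obtained as
$$\widehat{\mathrm{ETT}}_{\mathrm{RCS}} = \sum_{y=0}^{\infty}\left\{1 - [1 - (\hat\theta\hat\lambda)^{y+1}]^{n}\right\}.$$
Also, we can obtain the observed time on test $\mathrm{OBTT} = y_{(n)}$, which is the actual observed experimental time. Sometimes, OBTT is also proposed to estimate ETT.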
One can compute the ratio of expected experiment time (REET) for comparison purposes as
$$\mathrm{REET} = \frac{\mathrm{ETT}_{\mathrm{RCS}}}{\mathrm{ETT}_{\mathrm{CS}}}.$$
For various values of $n$, $\theta$, and $\lambda$, ETT (RCS) and ETT (CS) are computed and simulated in Table 2. From the table, it is observed that, for both the randomly censored sample and the complete sample, ETT increases with an increase in $n$, $\theta$, and $\lambda$. The OBTT estimates ETT quite satisfactorily.
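The ETT values can be evaluated numerically by truncating the tail-sum representation $E[\max] = \sum_{y \ge 0} P(\max > y)$ once the terms become negligible. The Python sketch below illustrates this; the function names `expected_max_geometric` and `ett`, the tolerance, and the parameter values are assumptions made here, and REET is taken to be the ratio of the randomly censored ETT to the complete-sample ETT.

```python
def expected_max_geometric(n, rho, tol=1e-12):
    """E[max of n i.i.d. Geo(rho) variables] = sum over y >= 0 of 1 - (1 - rho**(y+1))**n."""
    total, y = 0.0, 0
    while True:
        term = 1.0 - (1.0 - rho ** (y + 1)) ** n  # P(max > y)
        total += term
        if term < tol:  # remaining tail is negligible
            break
        y += 1
    return total


def ett(n, theta, lam):
    """Return (ETT for randomly censored sample, ETT for complete sample, their ratio)."""
    ett_rcs = expected_max_geometric(n, theta * lam)  # Y ~ Geo(theta * lam)
    ett_cs = expected_max_geometric(n, theta)         # X ~ Geo(theta)
    return ett_rcs, ett_cs, ett_rcs / ett_cs


# Hypothetical design values, for illustration only.
for n in (20, 40):
    print(n, ett(n, theta=0.6, lam=0.8))
```

For $n = 1$ the sum reduces to the mean $\rho/(1-\rho)$ of a single geometric variable, which gives a quick sanity check on the truncation.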

Bayesian Estimation
In Bayes estimation, the prior knowledge is updated by conducting an experiment, and estimators are constructed to make inferences about the characteristics of interest. This technique provides valid alternatives to traditional estimation methods. Bayesian analysis is an important technique that can be carried out with various types of loss functions and a variety of prior distributions. In this paper, we use the natural conjugate beta priors for the unknown parameters under generalized entropy and linear exponential loss functions.
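Let the unknown parameters $\theta$ and $\lambda$ follow beta distributions of the first kind with parameters $(a_1, b_1)$ and $(a_2, b_2)$, respectively. Here, $\theta$ and $\lambda$ are regarded as random variables having the marginal prior distributions
$$\pi_1(\theta) = \frac{\theta^{a_1-1}(1-\theta)^{b_1-1}}{B(a_1, b_1)},\ 0 < \theta < 1; \qquad \pi_2(\lambda) = \frac{\lambda^{a_2-1}(1-\lambda)^{b_2-1}}{B(a_2, b_2)},\ 0 < \lambda < 1;\ a_1, b_1, a_2, b_2 > 0.$$
By assuming that the prior distributions of $\theta$ and $\lambda$ are independent, we have a joint prior that is combined with the likelihood to yield the following joint posterior distribution:
$$\pi(\theta, \lambda \mid \mathbf{y}, \mathbf{d}) = \frac{\theta^{\alpha_1-1}(1-\theta)^{\beta_1-1}}{B(\alpha_1, \beta_1)} \cdot \frac{\lambda^{\alpha_2-1}(1-\lambda)^{\beta_2-1}}{B(\alpha_2, \beta_2)}, \tag{16}$$
where $\alpha_1 = a_1 + s_1$, $\beta_1 = b_1 + s_2$, $\alpha_2 = a_2 + s_1 + s_2$, and $\beta_2 = b_2 + n - s_2$, with $s_1 = \sum_{i=1}^{n} y_i$ and $s_2 = \sum_{i=1}^{n} d_i$. The posterior distributions of $\theta$ and $\lambda$ are again independent beta distributions, Beta$(\alpha_1, \beta_1)$ and Beta$(\alpha_2, \beta_2)$, respectively.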
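Because the beta priors are conjugate, the posterior update amounts to adding sufficient statistics to the prior parameters. The Python sketch below illustrates this under the assumption that $d_i = 1$ when $x_i < t_i$, so that the likelihood is proportional to $\theta^{s_1}(1-\theta)^{s_2}\lambda^{s_1+s_2}(1-\lambda)^{n-s_2}$; the function name `posterior_parameters`, the toy data, and the prior values are hypothetical.

```python
def posterior_parameters(sample, a1, b1, a2, b2):
    """Return ((alpha1, beta1), (alpha2, beta2)) of the beta posteriors of theta and lambda.

    sample is a list of (y, d) pairs with d = 1 when x < t (assumed convention).
    """
    n = len(sample)
    s1 = sum(y for y, _ in sample)  # total observed time
    s2 = sum(d for _, d in sample)  # number of observed failures
    return (a1 + s1, b1 + s2), (a2 + s1 + s2, b2 + n - s2)


# Hypothetical toy sample and prior parameters, for illustration only.
toy = [(0, 1), (3, 1), (1, 0), (5, 1), (2, 0), (4, 1), (0, 0), (2, 1)]
post_theta, post_lam = posterior_parameters(toy, a1=2, b1=2, a2=2, b2=2)
flat_theta, flat_lam = posterior_parameters(toy, a1=0, b1=0, a2=0, b2=0)
print("informative prior:", post_theta, post_lam)
print("all prior parameters zero:", flat_theta, flat_lam)
```

When all four prior parameters are set to zero, the posterior means $s_1/(s_1+s_2)$ and $(s_1+s_2)/(n+s_1)$ have the same form as the maximum likelihood estimates under this convention.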

Bayes Estimates under Generalized Entropy Loss Function.
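The generalized entropy loss function (GELF) was proposed by Calabria and Pulcini [20] as
$$L(\theta, \theta^{*}) = w\left[\left(\frac{\theta^{*}}{\theta}\right)^{q} - q\log\left(\frac{\theta^{*}}{\theta}\right) - 1\right];\ q \ne 0,\ w > 0, \tag{17}$$
where $\theta^{*}$ is the estimate of $\theta$. Here $w$ is a constant that cancels out when the numerator is divided by the denominator in the procedure used to obtain the Bayes estimate. Thus, without loss of generality, we assume $w = 1$. For $q > 0$, a positive error has a more serious effect than a negative error, and for $q < 0$, a negative error has a more serious effect than a positive error. In Bayes estimation, we choose the value of $\theta^{*}$ that minimizes the risk function $R(\theta, \theta^{*}) = E[L(\theta, \theta^{*}) \mid \text{data}]$. We get the Bayes estimator of $\theta$ by differentiating $R(\theta, \theta^{*})$ with respect to $\theta^{*}$ and equating to zero. The Bayes estimator and the corresponding risk function of $\theta$ are given by
$$\theta^{*}_{GE} = \left[E(\theta^{-q} \mid \text{data})\right]^{-1/q}, \tag{18}$$
$$R(\theta^{*}_{GE}) = q\,E(\log\theta \mid \text{data}) + \log E(\theta^{-q} \mid \text{data}). \tag{19}$$
From (16), the marginal posterior distribution of $\theta$ is given by
$$\pi_1(\theta \mid \mathbf{y}, \mathbf{d}) = \frac{\theta^{\alpha_1-1}(1-\theta)^{\beta_1-1}}{B(\alpha_1, \beta_1)};\ 0 < \theta < 1, \tag{20}$$
where $(\alpha_1, \beta_1)$ are the parameters of the beta posterior of $\theta$. Thus, using (18) and (20) the Bayes estimator of $\theta$ under GELF is obtained as
$$\theta^{*}_{GE} = \left[\frac{B(\alpha_1 - q, \beta_1)}{B(\alpha_1, \beta_1)}\right]^{-1/q};\ \alpha_1 > q. \tag{21}$$
Now, using (19) the posterior risk function under GELF can be derived as
$$R(\theta^{*}_{GE}) = q\left[\psi(\alpha_1) - \psi(\alpha_1 + \beta_1)\right] + \log\frac{B(\alpha_1 - q, \beta_1)}{B(\alpha_1, \beta_1)}, \tag{22}$$
where $\psi(\cdot)$ is the digamma function defined as $\psi(x) = (d/dx)\log\Gamma(x)$. Similarly, we can obtain the Bayes estimate and the risk function for the parameter $\lambda$.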
Particular Cases of GELF. The above estimates reduce, as particular cases, to the Bayes estimates under other loss functions:
(a) For $q = -1$, they give the estimates under the popular squared error loss function (SELF).
(b) For $q = 1$, they coincide with the estimates under the entropy loss function (ELF).
(c) For $q = -2$, they reduce to the estimates under the precautionary loss function (PLF).
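For a beta posterior, the GELF estimate $[E(\theta^{-q} \mid \text{data})]^{-1/q}$ has a closed form in terms of gamma functions, and the three particular cases above can be checked numerically. The following Python sketch is illustrative only: the function name `gelf_estimate`, the symbol $q$ for the shape parameter of the loss, and the posterior parameters $(19, 7)$ are assumptions made here.

```python
import math


def gelf_estimate(alpha, beta, q):
    """Bayes estimate [E(theta**-q)]**(-1/q) for theta ~ Beta(alpha, beta).

    Uses E(theta**-q) = B(alpha - q, beta) / B(alpha, beta), which needs alpha > q.
    """
    if q == 0 or alpha <= q:
        raise ValueError("requires q != 0 and alpha > q")
    log_moment = (math.lgamma(alpha - q) + math.lgamma(alpha + beta)
                  - math.lgamma(alpha) - math.lgamma(alpha + beta - q))
    return math.exp(-log_moment / q)


alpha, beta = 19.0, 7.0  # hypothetical posterior parameters
self_est = gelf_estimate(alpha, beta, q=-1)  # squared error loss: posterior mean
elf_est = gelf_estimate(alpha, beta, q=1)    # entropy loss: 1 / E(1 / theta)
plf_est = gelf_estimate(alpha, beta, q=-2)   # precautionary loss: sqrt(E(theta**2))
print(self_est, elf_est, plf_est)
```

Working with `math.lgamma` on the log scale avoids overflow of the gamma function when the posterior parameters are large.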

LINEX Loss Function.
Varian [21] and various other authors have used the linear exponential (LINEX) loss function in different estimation problems. Under the assumption that the minimal loss occurs at $\theta^{*} = \theta$, the LINEX loss function can be expressed as
$$L(\Delta) = k\left[e^{c\Delta} - c\Delta - 1\right];\ \Delta = \theta^{*} - \theta,\ c \ne 0,\ k > 0.$$
Without loss of generality, we assume that $k = 1$. Under the LINEX loss function the Bayes estimator and posterior risk are defined as
$$\theta^{*}_{L} = -\frac{1}{c}\log E(e^{-c\theta} \mid \text{data}), \qquad R(\theta^{*}_{L}) = c\left[E(\theta \mid \text{data}) - \theta^{*}_{L}\right].$$
Sometimes, a situation occurs where no formal prior information is available. Then, an improper joint prior is combined with the likelihood to yield a joint posterior distribution. By setting $a_1 = a_2 = b_1 = b_2 = 0$ in the beta priors, the Bayes estimates of $\theta$ and $\lambda$ are obtained in the case of a noninformative prior.
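The LINEX estimate $-(1/c)\log E(e^{-c\theta} \mid \text{data})$ requires the moment generating function of the beta posterior, which is the confluent hypergeometric function ${}_1F_1(\alpha; \alpha+\beta; s)$. The Python sketch below evaluates it by direct summation of the Kummer series, which is adequate for moderate $|c|$ but loses accuracy for large $|c|$. The function names `beta_mgf` and `linex_estimate`, the symbol $c$ for the LINEX shape parameter, the number of series terms, and the posterior parameters are assumptions made here for illustration.

```python
import math


def beta_mgf(alpha, beta, s, terms=200):
    """E[exp(s * theta)] for theta ~ Beta(alpha, beta) via the series of 1F1(alpha; alpha+beta; s)."""
    total, term = 1.0, 1.0
    for k in range(terms):
        # Ratio of consecutive series terms: (alpha + k) / (alpha + beta + k) * s / (k + 1).
        term *= (alpha + k) / (alpha + beta + k) * s / (k + 1)
        total += term
    return total


def linex_estimate(alpha, beta, c):
    """Bayes estimate -(1/c) * log E[exp(-c * theta)] under LINEX loss with shape c != 0."""
    return -math.log(beta_mgf(alpha, beta, -c)) / c


alpha, beta = 19.0, 7.0  # hypothetical posterior parameters
posterior_mean = alpha / (alpha + beta)
print("c = 1:", linex_estimate(alpha, beta, 1.0))
print("c = -1:", linex_estimate(alpha, beta, -1.0))
print("posterior mean:", posterior_mean)
```

By Jensen's inequality the estimate lies below the posterior mean for $c > 0$ and above it for $c < 0$, and it approaches the posterior mean as $c \to 0$.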

Bayesian Credible and HPD Credible Intervals.
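If $\pi_1(\theta \mid \mathbf{y}, \mathbf{d})$ is the marginal posterior distribution of the parameter $\theta$, the $100(1-\gamma)\%$ credible interval $(\theta_L, \theta_U)$ for $\theta$ is obtained by
$$\int_{\theta_L}^{\theta_U} \pi_1(\theta \mid \mathbf{y}, \mathbf{d})\,d\theta = 1 - \gamma,$$
and for two-sided equal-tail credible intervals, we take
$$\int_{0}^{\theta_L} \pi_1(\theta \mid \mathbf{y}, \mathbf{d})\,d\theta = \frac{\gamma}{2}, \qquad \int_{\theta_U}^{1} \pi_1(\theta \mid \mathbf{y}, \mathbf{d})\,d\theta = \frac{\gamma}{2}.$$
Similarly, the credible interval for $\lambda$ is obtained. Chen and Shao [22] proposed a procedure for calculating a highest posterior density (HPD) credible interval for $\theta$ when the posterior distribution of $\theta$ is unimodal. In the present case, the posterior distribution of $\theta$ is a beta distribution, which is unimodal; therefore we can apply the following Chen and Shao algorithm.
Step 1. Generate a sample $\theta_1, \theta_2, \ldots, \theta_M$ from the posterior distribution of $\theta$.
Step 2. Sort the sample to obtain the ordered values $\theta_{(1)} \le \theta_{(2)} \le \cdots \le \theta_{(M)}$.
Step 3. Compute the $100(1-\gamma)\%$ credible intervals $(\theta_{(j)}, \theta_{(j + [(1-\gamma)M])})$ for $j = 1, 2, \ldots, M - [(1-\gamma)M]$, where $[\cdot]$ denotes the integer part.
Step 4. The $100(1-\gamma)\%$ HPD credible interval is the one with the smallest interval width among all credible intervals obtained in Step 3.
Similarly, we can obtain the HPD credible interval for the parameter $\lambda$. The above algorithm can be carried out with the boa package of R.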
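A sample-based version of both intervals can be sketched in Python as follows. The sketch draws directly from a beta posterior, sorts the draws, takes an equal-tail interval from the sorted sample, and then applies the shortest-window search commonly attributed to Chen and Shao. The function name `credible_and_hpd`, the number of draws, the seed, and the posterior parameters are assumptions made here; this is not the boa implementation.

```python
import random


def credible_and_hpd(alpha, beta, gamma=0.05, draws=20000, seed=1):
    """Return (equal-tail interval, HPD interval) for theta ~ Beta(alpha, beta)."""
    rng = random.Random(seed)
    sample = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    width = int((1.0 - gamma) * draws)  # number of index steps spanned by each interval
    # Equal-tail interval: leave about gamma / 2 of the draws in each tail.
    k = (draws - width) // 2
    equal_tail = (sample[k], sample[k + width])
    # Shortest-window search: slide a window of fixed coverage over the sorted
    # draws and keep the narrowest one.
    best = min(range(draws - width), key=lambda j: sample[j + width] - sample[j])
    hpd = (sample[best], sample[best + width])
    return equal_tail, hpd


equal_tail, hpd = credible_and_hpd(19.0, 7.0)  # hypothetical posterior parameters
print("equal-tail:", equal_tail)
print("HPD:", hpd)
```

Because the equal-tail window is one of the candidates in the search, the HPD interval returned here is never wider than the equal-tail interval computed from the same draws.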

Simulation Study
Since the performance of the different estimation methods cannot be compared theoretically in the present case, we perform a Monte Carlo simulation study to compare the estimates obtained from maximum likelihood and Bayes estimation techniques under various loss functions. The simulation procedure is given step by step below:
(iv) Generate a randomly censored sample $(\tilde{y}, \tilde{d})$ of size $n = 50$ from the models given in (3) and (4).
(v) Calculate the maximum likelihood estimates, their variances, confidence intervals, and coverage probabilities for the parameters and MLEs of reliability characteristics.
(vi) Obtain the Bayes estimates under GELF and LINEX loss functions for the parameters and the reliability characteristics. Derive the associated credible and HPD credible intervals for the parameters along with their average length (AL) and coverage probability (CP).
(vii) Repeat steps (iv-vi) $N = 1000$ times for different combinations of the parametric values. Compute the average values (AV) and mean square errors (MSE) of the estimates obtained in steps (v-vi).
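Steps (iv), (v), and (vii) can be sketched for the maximum likelihood estimate of $\theta$ as follows. This Python sketch is illustrative only and is not the R code used for the tables: the function name `simulate`, the seed, and the parameter values are hypothetical, $Y_i$ is drawn from Geo($\theta\lambda$) independently of a Bernoulli($p$) indicator, and the indicator convention $D_i = 1$ when $X_i < T_i$ (so that $p = \lambda(1-\theta)/(1-\theta\lambda)$) is an assumption.

```python
import math
import random


def simulate(theta, lam, n=50, reps=1000, seed=7, z=1.959964):
    """Return (average value, MSE, coverage probability) for the MLE of theta."""
    rng = random.Random(seed)
    p = lam * (1.0 - theta) / (1.0 - theta * lam)  # P(D = 1), assumed convention
    estimates, covered = [], 0
    for _ in range(reps):
        s1 = s2 = 0
        for _ in range(n):
            u = 1.0 - rng.random()                           # uniform on (0, 1]
            s1 += int(math.log(u) / math.log(theta * lam))   # Y ~ Geo(theta * lam)
            s2 += rng.random() < p                           # D ~ Bernoulli(p)
        if s1 + s2 == 0:
            continue  # estimates undefined for this replicate; skip it
        theta_hat = s1 / (s1 + s2)
        lam_hat = (s1 + s2) / (n + s1)
        var_theta = theta_hat * (1.0 - theta_hat) * (1.0 - theta_hat * lam_hat) / (n * lam_hat)
        half = z * math.sqrt(var_theta)
        covered += theta_hat - half <= theta <= theta_hat + half
        estimates.append(theta_hat)
    av = sum(estimates) / len(estimates)
    mse = sum((e - theta) ** 2 for e in estimates) / len(estimates)
    return av, mse, covered / len(estimates)


av, mse, cp = simulate(theta=0.6, lam=0.8)
print("AV:", av, "MSE:", mse, "CP:", cp)
```

The same loop can be extended to $\lambda$ and to the Bayes estimates by computing them from $s_1$ and $s_2$ inside each replicate.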
All computations were performed using the software R. The main results of the simulation study are shown in Tables 4-10. From the tables, we conclude the following:
(i) The maximum likelihood estimation method for parameters and reliability characteristics gives very good results in terms of both average values and MSEs.
(ii) The Bayes estimates are also very good with respect to bias and MSE under SELF and LINEX loss functions, but the Bayes estimates under LINEX are better than those under SELF with respect to bias. Under ELF the estimates show underestimation, and under PLF they show overestimation.
(iii) The coverage probabilities of the parameters attain their nominal levels for the confidence intervals based on MLEs and for the credible and HPD credible intervals. However, HPD credible intervals sometimes give better coverage than the others.

Real Data Example
In this section, we present a real data example to illustrate the utility of our model.The data set given in Lee and Wang [23, p-231]

Note. Censored observations are indicated by plus sign (+).
To test the goodness of fit, we consider the chi-square test and obtain the following results: chi-square observed value = 19.34912, chi-square tabulated value = 37.65248, and $p$ value = 0.2199.
Here, the observed value is less than the tabulated value at the 5% level of significance, and the $p$ value is also quite large. Thus, the model fits the data set well. We also obtain the empirical c.d.f. and the maximum likelihood estimate of the c.d.f. to compare the behavior of the data graphically. In Figure 1, the two curves are close to each other. Hence, we conclude that the model fits this data set very well.
We first obtain the estimates of the unknown parameters by maximum likelihood estimation, together with 95% confidence intervals, and by Bayes estimation under various types of loss functions with informative beta priors having parameters $(a_1, b_1)$ and $(a_2, b_2)$, taking $b_1 = b_2 = 3$, $a_1 = \hat\theta b_1/(1 - \hat\theta)$, and $a_2 = \hat\lambda b_2/(1 - \hat\lambda)$. We also estimate the parameters under noninformative priors by taking $a_1 = a_2 = b_1 = b_2 = 0$, assuming that no prior information is available. The expected time on test and the observed time on test are also calculated for this data set. The estimates of the unknown parameters are listed in Table 3.
By using all the above criteria, we see that the maximum likelihood estimates are quite close to Bayes estimates for

Concluding Remarks
The present paper deals with the estimation of parameters and reliability characteristics with a randomly censored sample from the geometric distribution. Maximum likelihood estimates along with their variances and confidence intervals for the parameters are derived. Bayes estimates under generalized entropy and LINEX loss functions are obtained. The estimates are shown to be almost unbiased and efficient through a simulation study. The concept of expected time on test in random censoring is also considered. A real data example is given to explain the methods developed in the paper.

Table 1: Probability of failure.

Table 2: Expected time on test (ETT) and its estimation. This table shows the values of ETT for the randomly censored sample; ( ) contains the ETT values of the complete experiment. In the next column, { } and [ ] show the average values and MSEs of OBTT in the randomly censored sample.

Table 3: Estimates for real data example.

Table 5: Bayes estimates under SELF.

Table 8: Bayes estimates under LINEX loss function.