Best Prediction Method for Progressive Type-II Censored Samples under New Pareto Model with Applications

This paper describes two methods for predicting the unobserved (censored) units under progressive Type-II censored samples. The lifetimes under consideration follow a new two-parameter Pareto distribution. Point and interval estimates of the unknown parameters of the new Pareto model are also obtained, using maximum likelihood and Bayesian estimation. Since the Bayes estimators cannot be expressed explicitly, Gibbs sampling and Markov Chain Monte Carlo techniques are used for the Bayesian calculations. We use the posterior predictive density of the unobserved units to construct predictive intervals. A simulation study is performed to evaluate the performance of the estimators via mean square errors and biases and to identify the best prediction method for the censored observations under the progressive Type-II censoring scheme for different sample sizes and different censoring schemes.


Introduction
Studying new lifetime models has become necessary and widespread as applications have multiplied across the natural sciences. Over the last four decades, many authors have focused on generating new lifetime distributions that fit experimental data from, for example, medicine, engineering, the social sciences, and reliability analysis. In the literature, these new models have shown good properties, and some have proved superior to the original ones. Many generalized classes of lifetime distributions have been introduced to describe various kinds of data (one may refer to Kumaraswamy [1] and Marshall and Olkin [2]).
The new family of distributions should include the original distribution as a submodel and is expected to give more flexibility than the original model. In our work, we consider a new form of the Pareto distribution, which was introduced by Bourguignon et al. [3]. The new Pareto model generalizes the original Pareto distribution, simplifies some mathematical calculations, and has new characteristics (see, for example, Almetwally and Haj Ahmad [4]).
In some life tests and reliability experiments, units may be removed or lost from the experiment before they fail. The loss can be unplanned, as in accidental damage to an experimental unit or when a unit under study drops out. Sometimes, the experiment must stop because testing facilities are unavailable. Most often, the removal of units from an experiment is preplanned and is done to reduce time and cost. The benefit of progressive censoring lies in its efficient use of the available resources: surviving units that are removed early can be used for other tests or experiments. In reliability and life testing experiments, one of the main objectives is to draw inference about the unknown parameters of the lifetime distribution under consideration. Sometimes, this is based on censored observations (see Cohen [5]). Estimation and prediction problems arise quite naturally in many real-life situations, and in many studies, researchers are interested in estimating unknown parameters and/or predicting censored (future) observations. The most commonly used censoring schemes are (i) progressive Type-I and (ii) progressive Type-II censoring schemes. One can refer to the books of Balakrishnan and Cramer [6] and Balakrishnan [7]. Recently, several authors have studied parameter inference for different distributions under the progressive Type-II censoring scheme (PC) (see, for example, Kundu [8], Pradhan and Kundu [9], Maurya et al. [10], and Bdair et al. [11]). In addition, inference under other censoring schemes has appeared in the literature for different lifetime models, such as hybrid Type-I progressive censoring, adaptive Type-II progressive censoring, Type-II hybrid censoring, and others (see, for example, Bdair and Haj Ahmad [12]; Haj Ahmad et al. [13]; Salah et al. [14]; Almetwally et al. [15]; and Sabry et al. [16]).
There is still much room for further work with different censoring schemes under new generalized models.
In this paper, we restrict our attention to censored samples under the progressive Type-II censoring scheme (PC) and find point and interval estimates of the unknown parameters of the new Pareto distribution (NPD); then, we study the prediction problem for the future (unobserved) data.
For the NPD, we can write the probability density function (PDF) as
$$f(x; \alpha, \beta) = \frac{2\alpha\beta^{\alpha}x^{\alpha-1}}{\left(x^{\alpha}+\beta^{\alpha}\right)^{2}}, \quad x \ge \beta, \tag{1}$$
where α > 0 and β > 0 are the shape and scale parameters, respectively. The cumulative distribution function (CDF) of the NPD is given by
$$F(x; \alpha, \beta) = 1 - \frac{2\beta^{\alpha}}{x^{\alpha}+\beta^{\alpha}}, \quad x \ge \beta. \tag{2}$$
In this paper, we mainly work on two objectives. First, we find point and interval estimates of the NPD parameters α and β using the maximum likelihood and Bayes methods under PC and compare the effectiveness of the two estimation methods numerically by simulation in R. Second, we consider the problem of predicting unobserved (future) data based on the observed (available) data. To that end, we consider two prediction methods: (i) the best unbiased predictor (BUP) and (ii) the Bayes predictor (BP). We construct predictive intervals (PIs) for the unobserved (future) data that are censored from the experiment. Numerical analysis and simulation are used to compare the efficiency of the prediction methods under consideration.
The PC is a generalization of the well-known Type-II right censoring scheme and has gained great attention in the last twenty years. We can describe this censoring scheme as follows: let $X_1, X_2, \ldots, X_n$ denote the lifetimes of n independent and identically distributed (i.i.d.) units under a life test experiment. Also, suppose that $R_1, R_2, \ldots, R_m$ ($m < n$) are fixed non-negative integers such that $\sum_{i=1}^{m} R_i = n - m$. We observe m units and remove the remaining n − m units progressively according to the censoring scheme $R = (R_1, R_2, \ldots, R_m)$. The censoring occurs progressively in m stages, which yield the failure times of the m observed units. When the first failure time (the first stage) $X_{1:m:n}$ occurs, $R_1$ of the n − 1 surviving units are randomly removed (censored) from the experiment. When the second failure time (the second stage) $X_{2:m:n}$ occurs, $R_2$ of the $n - 2 - R_1$ surviving units are randomly removed from the experiment. Finally, when the m-th failure time (the m-th stage) $X_{m:m:n}$ occurs, all the remaining $R_m = n - m - (R_1 + R_2 + \cdots + R_{m-1})$ units are withdrawn from the experiment. We call this the progressive Type-II right censoring scheme R. We can easily verify that the Type-II right censoring scheme and the complete sampling scheme are special cases of PC, obtained by choosing $R = (0, \ldots, 0, n - m)$ and $R = (0, \ldots, 0)$ with $m = n$, respectively.
Prediction is very important in statistics, and many authors have studied the prediction problem and its applications to real-life data (see, for example, Kaminsky and Rhodin [17]; Al-Hussaini [18]; Madi and Raqab [19]; Raqab et al. [20]; and Bdair et al. [11]). The idea of prediction is to predict future order statistics based on the observed sample data. Some authors have studied the problem of estimation and prediction under different types of censored data from different models (see, for example, Kim et al. [21]; Kundu [8]; and Kundu and Raqab [22]). Raqab et al. [23] studied the prediction of the remaining time for the generalized Pareto distribution under a progressive censored sample. Belaghi et al. [24] considered estimation and prediction problems for the Poisson-exponential distribution under Type-II censored data. Bdair et al. [11] used Bayes prediction to predict future values of a progressively censored sample under the flexible Weibull distribution.
The rest of the paper is organized as follows. In Section 2, we obtain the MLEs of the two parameters of the NPD. The Bayesian estimation method is used to estimate the unknown parameters in Section 3. In Section 4, we handle the point and interval prediction problems for the unobserved units of the censored sample using the best unbiased predictor (BUP) and the Bayes predictor (BP). In Section 5, numerical comparisons are performed via simulation analysis. Finally, some conclusions are drawn in Section 6.
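The progressive Type-II censoring mechanism described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the simulation code used in the paper: it assumes the NPD CDF $F(x) = 1 - 2\beta^{\alpha}/(x^{\alpha} + \beta^{\alpha})$, so that the quantile function is $\beta\,((1+u)/(1-u))^{1/\alpha}$, and the function names and the example scheme are hypothetical.

```python
import random

def npd_quantile(u, alpha, beta):
    """Inverse of the assumed NPD CDF F(x) = 1 - 2*beta**alpha / (x**alpha + beta**alpha)."""
    return beta * ((1.0 + u) / (1.0 - u)) ** (1.0 / alpha)

def progressive_type2_sample(n, scheme, alpha, beta, rng):
    """Return the m ordered failure times observed under the removal scheme R = scheme."""
    m = len(scheme)
    if sum(scheme) != n - m:
        raise ValueError("the scheme must satisfy sum(R) = n - m")
    alive = [npd_quantile(rng.random(), alpha, beta) for _ in range(n)]
    observed = []
    for r in scheme:
        x = min(alive)                 # next failure among the surviving units
        alive.remove(x)
        observed.append(x)
        for _ in range(r):             # withdraw r survivors chosen at random
            alive.pop(rng.randrange(len(alive)))
    return observed

rng = random.Random(1)
scheme = [2, 0, 0, 1, 2]               # hypothetical scheme with m = 5 and n = 10
sample = progressive_type2_sample(10, scheme, alpha=1.5, beta=0.5, rng=rng)
```

Simulating the removals directly keeps the code close to the verbal description; more efficient generators for progressively censored samples exist, but they are not needed to illustrate the mechanism.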

Maximum Likelihood Estimation
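In this section, we use the classical maximum likelihood method to estimate the two unknown parameters of the NPD under the PC scheme. Let $x = (x_{1:m:n}, x_{2:m:n}, \ldots, x_{m:m:n})$ with $x_{1:m:n} \le x_{2:m:n} \le \cdots \le x_{m:m:n}$ denote the m observations under PC from a sample of size n drawn from an NPD with PDF and CDF given by equations (1) and (2), respectively. Based on a progressive Type-II censored sample x, the likelihood function is given by (see Balakrishnan and Aggarwala [25])
$$L(\alpha, \beta; x) = C \prod_{i=1}^{m} f(x_{i:m:n}) \left[1 - F(x_{i:m:n})\right]^{R_i}, \tag{3}$$
where $C = n(n - 1 - R_1)(n - 2 - R_1 - R_2) \cdots (n - m + 1 - R_1 - \cdots - R_{m-1})$. Using equations (1) and (2), we obtain
$$L(\alpha, \beta; x) = C\, 2^{n} \alpha^{m} \beta^{n\alpha} \prod_{i=1}^{m} \frac{x_{i:m:n}^{\alpha-1}}{\left(x_{i:m:n}^{\alpha} + \beta^{\alpha}\right)^{2+R_i}}. \tag{4}$$
The logarithmic likelihood function of the NPD is
$$l(\alpha, \beta; x) = \ln C + n \ln 2 + m \ln \alpha + n\alpha \ln \beta + (\alpha - 1) \sum_{i=1}^{m} \ln x_{i:m:n} - \sum_{i=1}^{m} (2 + R_i) \ln\left(x_{i:m:n}^{\alpha} + \beta^{\alpha}\right).$$
We can notice that $l(\alpha, \beta; x)$ is monotonically increasing in β. Hence, since $x \ge \beta$, the MLE of β is $\hat{\beta} = x_{1:m:n}$, where $x_{1:m:n}$ is the first progressively censored order statistic.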
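From the above logarithmic likelihood function, we find the partial derivative with respect to α and equate it to zero to obtain the MLE of α; hence $\hat{\alpha}$ is the solution of
$$\frac{m}{\alpha} + n \ln \hat{\beta} + \sum_{i=1}^{m} \ln x_{i:m:n} - \sum_{i=1}^{m} (2 + R_i) \frac{x_{i:m:n}^{\alpha} \ln x_{i:m:n} + \hat{\beta}^{\alpha} \ln \hat{\beta}}{x_{i:m:n}^{\alpha} + \hat{\beta}^{\alpha}} = 0.$$
Numerical analysis and simulation are used to study the performance of the MLEs with respect to mean square errors (MSEs) and biases.
We can obtain asymptotic confidence intervals (CIs) for α and β using the asymptotic normality of the MLEs,
$$\hat{\Phi} - \Phi \xrightarrow{d} N_2\left(0, I^{-1}(\Phi)\right),$$
where $\Phi = (\alpha, \beta)$ and $I(\cdot)$ is the Fisher information matrix, i.e.,
$$I(\Phi) = -E\begin{pmatrix} \dfrac{\partial^2 l}{\partial \alpha^2} & \dfrac{\partial^2 l}{\partial \alpha\, \partial \beta} \\ \dfrac{\partial^2 l}{\partial \beta\, \partial \alpha} & \dfrac{\partial^2 l}{\partial \beta^2} \end{pmatrix} = \begin{pmatrix} I_{11} & I_{12} \\ I_{21} & I_{22} \end{pmatrix}.$$
The second partial derivatives are obtained as
$$\frac{\partial^2 l}{\partial \alpha^2} = -\frac{m}{\alpha^2} - \sum_{i=1}^{m} (2 + R_i) \frac{x_{i:m:n}^{\alpha} \beta^{\alpha} \left[\ln(x_{i:m:n}/\beta)\right]^2}{\left(x_{i:m:n}^{\alpha} + \beta^{\alpha}\right)^2},$$
$$\frac{\partial^2 l}{\partial \beta^2} = -\frac{n\alpha}{\beta^2} - \alpha \sum_{i=1}^{m} (2 + R_i) \frac{\beta^{\alpha-2} \left[(\alpha - 1) x_{i:m:n}^{\alpha} - \beta^{\alpha}\right]}{\left(x_{i:m:n}^{\alpha} + \beta^{\alpha}\right)^2},$$
$$\frac{\partial^2 l}{\partial \alpha\, \partial \beta} = \frac{n}{\beta} - \sum_{i=1}^{m} (2 + R_i) \frac{\beta^{\alpha-1} \left[x_{i:m:n}^{\alpha} + \beta^{\alpha} - \alpha x_{i:m:n}^{\alpha} \ln(x_{i:m:n}/\beta)\right]}{\left(x_{i:m:n}^{\alpha} + \beta^{\alpha}\right)^2}.$$
The variances of the MLEs can be found from the asymptotic property of the MLE, so that $V(\hat{\alpha}_{MLE}) \approx I_{22}/\mathrm{Det}(I(\Phi))$ and $V(\hat{\beta}_{MLE}) \approx I_{11}/\mathrm{Det}(I(\Phi))$, where $\mathrm{Det}(I(\Phi))$ is the determinant of the information matrix I. The $(1 - \zeta)100\%$ asymptotic confidence intervals for α and β are given as
$$\hat{\alpha}_{MLE} \pm z_{\zeta/2} \sqrt{V(\hat{\alpha}_{MLE})} \quad \text{and} \quad \hat{\beta}_{MLE} \pm z_{\zeta/2} \sqrt{V(\hat{\beta}_{MLE})},$$
respectively, where $z_{\zeta/2}$ is the $(\zeta/2)100\%$ lower percentile of the standard normal distribution.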
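As a rough illustration of how the MLEs of this section can be computed, the sketch below sets $\hat{\beta} = x_{1:m:n}$ and solves the likelihood equation for α by bisection. It assumes the NPD log-likelihood $m \ln \alpha + n\alpha \ln \beta + (\alpha - 1)\sum \ln x_{i:m:n} - \sum (2 + R_i) \ln(x_{i:m:n}^{\alpha} + \beta^{\alpha})$, whose α-derivative can be rewritten with $t_i = \ln(x_{i:m:n}/\hat{\beta})$ as $m/\alpha + \sum t_i - \sum (2 + R_i)\, t_i / (1 + e^{-\alpha t_i})$. The data, the function names, and the choice of bisection are illustrative assumptions, not the paper's implementation.

```python
import math

def score_alpha(alpha, x, R):
    """Derivative of the assumed NPD log-likelihood in alpha, with beta fixed at x[0]."""
    beta = x[0]
    total = len(x) / alpha
    for xi, ri in zip(x, R):
        t = math.log(xi / beta)                        # t >= 0 because x_i >= beta
        total += t - (2 + ri) * t / (1.0 + math.exp(-alpha * t))
    return total

def npd_mle(x, R, tol=1e-10):
    """Return (alpha_hat, beta_hat) for an ordered progressively censored sample x."""
    beta_hat = x[0]
    lo, hi = 1e-8, 1.0
    while score_alpha(hi, x, R) > 0:                   # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:                               # the score is decreasing in alpha
        mid = 0.5 * (lo + hi)
        if score_alpha(mid, x, R) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi), beta_hat

# Made-up data for illustration only: m = 6 failures out of n = 10 units.
x = [0.52, 0.61, 0.75, 0.90, 1.30, 2.10]
R = [1, 0, 1, 0, 0, 2]
alpha_hat, beta_hat = npd_mle(x, R)
```

Bisection is used here because, under the stated form of the log-likelihood, the score is strictly decreasing in α, so the root is unique; a Newton-type solver would be an equally reasonable choice.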

Bayes Estimation
In the Bayesian method, all parameters are considered as random variables with a certain distribution called the prior distribution. If prior information is not available, which is usually the case, then we need to select a prior. Since the selection of the prior distribution is important in parameter estimation, we choose the independent gamma distributions $g(a_1, b_1)$ and $g(a_2, b_2)$ as the priors of α and β, respectively.
This choice is motivated by the flexibility of the gamma prior, which behaves as a non-informative prior when the values of the hyperparameters are taken to be zero. The suggested gamma distributions have the following densities:
$$\pi_1(\alpha) \propto \alpha^{a_1 - 1} e^{-b_1 \alpha}, \quad \alpha > 0, \qquad \pi_2(\beta) \propto \beta^{a_2 - 1} e^{-b_2 \beta}, \quad \beta > 0, \tag{10}$$
where $a_1$, $a_2$, $b_1$, and $b_2$ are the hyperparameters of the prior distributions and all are positive real constants. The joint prior of α and β is
$$h(\alpha, \beta) = \pi_1(\alpha)\, \pi_2(\beta).$$
The joint posterior of α and β is
$$p(\alpha, \beta \mid x) = \frac{L(x \mid \alpha, \beta)\, h(\alpha, \beta)}{\int_0^{\infty}\!\int_0^{\infty} L(x \mid \alpha, \beta)\, h(\alpha, \beta)\, d\alpha\, d\beta},$$
where $L(x \mid \alpha, \beta)$ is the likelihood function of the NPD under progressive censored samples as in equation (4). Substituting $L(x \mid \alpha, \beta)$ and $h(\alpha, \beta)$ for the NPD under PC, the joint posterior density can be written as
$$p(\alpha, \beta \mid x) \propto \alpha^{m} \beta^{n\alpha}\, Q(\alpha, \beta)\, g(\alpha; a_1, b_1)\, g(\beta; a_2, b_2), \quad 0 < \beta \le x_{1:m:n}, \tag{13}$$
where $Q(\alpha, \beta) = \prod_{i=1}^{m} x_{i:m:n}^{\alpha-1} / \left(x_{i:m:n}^{\alpha} + \beta^{\alpha}\right)^{2+R_i}$ and $g(\cdot, \cdot)$ represents the PDF of the gamma distribution. Therefore, the Bayes estimate of any function of α and β, say $k(\alpha, \beta)$, under the quadratic loss function is $\hat{k}(\alpha, \beta) = E_{\alpha, \beta \mid \text{data}}(k(\alpha, \beta))$. Since it is difficult to compute this expected value analytically, we use the Markov Chain Monte Carlo (MCMC) technique (see Karandikar [26]).
The Gibbs sampling method is used to generate a sample from the posterior density function $p(\alpha, \beta \mid x)$ and to compute the Bayes estimates. For the purpose of generating a sample from the posterior distribution, it is assumed that the prior densities are as described in equation (10). The full conditional posterior densities of α and β given the data are
$$p(\alpha \mid \beta, x) \propto \alpha^{m} \beta^{n\alpha}\, Q(\alpha, \beta)\, g(\alpha; a_1, b_1), \qquad p(\beta \mid \alpha, x) \propto \beta^{n\alpha}\, Q(\alpha, \beta)\, g(\beta; a_2, b_2). \tag{14}$$
The full conditional distributions above cannot be reduced to well-known distributions, and hence we cannot generate α and β from them directly using standard methods. We can solve this problem by using the Metropolis-Hastings (M-H) algorithm (for further details, one may refer to Metropolis et al. [27] and Hastings [28]). The main point now is to reduce the number of rejections as much as possible. The algorithm below is the M-H algorithm with the normal distribution as the proposal distribution; it is used to find the Bayes estimators and to construct the credible intervals for α and β. The algorithm is summarized as follows:
(1) Start with initial values $(\alpha_0, \beta_0)$.
(2) Use the M-H algorithm on equation (14) to generate a posterior sample for the parameters α and β.
(4) Once the posterior sample $(\alpha_i, \beta_i)$, $i = 1, \ldots, M$, is obtained, the Bayes estimates of α and β with respect to the quadratic loss function are
$$\hat{\alpha}_{Bayes} = \frac{1}{M - M_0} \sum_{i = M_0 + 1}^{M} \alpha_i, \qquad \hat{\beta}_{Bayes} = \frac{1}{M - M_0} \sum_{i = M_0 + 1}^{M} \beta_i,$$
where $M_0$ is the burn-in period of the Markov chain.
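The sampling steps above can be sketched as a Metropolis-Hastings-within-Gibbs loop. This is only an illustration under stated assumptions: independent gamma priors in the shape-rate form with hypothetical hyperparameters, normal random-walk proposals with hand-picked step sizes, the NPD likelihood kernel $\alpha^{m} \beta^{n\alpha} Q(\alpha, \beta)$ on $0 < \beta \le x_{1:m:n}$, and made-up data. None of these tuning choices come from the paper.

```python
import math
import random

def log_posterior(alpha, beta, x, R, n, a1, b1, a2, b2):
    """Unnormalized log posterior; -inf outside alpha > 0, 0 < beta <= x[0]."""
    if alpha <= 0.0 or beta <= 0.0 or beta > x[0]:
        return -math.inf
    lp = len(x) * math.log(alpha) + n * alpha * math.log(beta)
    for xi, ri in zip(x, R):
        lx = math.log(xi)
        # log(x^alpha + beta^alpha), evaluated stably because beta <= x
        log_sum = alpha * lx + math.log1p(math.exp(alpha * (math.log(beta) - lx)))
        lp += (alpha - 1.0) * lx - (2 + ri) * log_sum
    lp += (a1 - 1.0) * math.log(alpha) - b1 * alpha    # assumed gamma(a1, rate b1) prior
    lp += (a2 - 1.0) * math.log(beta) - b2 * beta      # assumed gamma(a2, rate b2) prior
    return lp

def accept(rng, log_ratio):
    """Metropolis acceptance step for a symmetric proposal."""
    return rng.random() < math.exp(min(0.0, log_ratio))

def mh_within_gibbs(x, R, n, M=4000, burn_in=1000, steps=(0.3, 0.05),
                    hyper=(1.0, 1.0, 1.0, 1.0), seed=0):
    rng = random.Random(seed)
    alpha, beta = 1.0, 0.9 * x[0]                      # initial values (alpha_0, beta_0)
    current = log_posterior(alpha, beta, x, R, n, *hyper)
    draws = []
    for _ in range(M):
        cand = alpha + rng.gauss(0.0, steps[0])        # update alpha given beta
        lp = log_posterior(cand, beta, x, R, n, *hyper)
        if accept(rng, lp - current):
            alpha, current = cand, lp
        cand = beta + rng.gauss(0.0, steps[1])         # update beta given alpha
        lp = log_posterior(alpha, cand, x, R, n, *hyper)
        if accept(rng, lp - current):
            beta, current = cand, lp
        draws.append((alpha, beta))
    kept = draws[burn_in:]                             # discard the burn-in period M_0
    alpha_hat = sum(a for a, _ in kept) / len(kept)
    beta_hat = sum(b for _, b in kept) / len(kept)
    return alpha_hat, beta_hat, kept

# Made-up data for illustration only: m = 6 failures out of n = 10 units.
x = [0.52, 0.61, 0.75, 0.90, 1.30, 2.10]
R = [1, 0, 1, 0, 0, 2]
alpha_bayes, beta_bayes, kept = mh_within_gibbs(x, R, n=10)
```

The posterior means of the retained draws are the Bayes estimates under quadratic loss; equal-tailed credible intervals could be read off the empirical quantiles of `kept`.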

Prediction
In many fields of the life sciences, the problem of predicting unobserved, censored, or lost observations from an experiment has received considerable attention (one may refer to Kaminsky and Nelson [29]; Raqab et al. [20]; Raqab et al. [23,24]; and Bdair et al. [11]). Here we study two methods of prediction, namely, (i) the best unbiased predictor (BUP) and (ii) the Bayes predictor (BP).

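Best Unbiased Predictor.
Let $Y_{s:r_j}$ denote the s-th order statistic among the $r_j$ units removed at the j-th censoring stage ($s = 1, 2, \ldots, r_j$; $j = 1, 2, \ldots, m$). The conditional density of $Y_{s:r_j}$ given $X_{j:m:n} = x_{j:m:n}$ is
$$f(y_{s:r_j} \mid x_{j:m:n}) = c^{*}\, \frac{\left[F(y_{s:r_j}) - F(x_{j:m:n})\right]^{s-1} \left[1 - F(y_{s:r_j})\right]^{r_j - s} f(y_{s:r_j})}{\left[1 - F(x_{j:m:n})\right]^{r_j}}, \quad y_{s:r_j} > x_{j:m:n}, \tag{16}$$
where $c^{*} = r_j! / \left[(s - 1)!\,(r_j - s)!\right]$. Now substituting the PDF and CDF of the NPD into equation (16), we obtain
$$f(y_{s:r_j} \mid x_{j:m:n}) = c^{*} \alpha\, \frac{\left(x_{j:m:n}^{\alpha} + \beta^{\alpha}\right)^{r_j - s + 1} y_{s:r_j}^{\alpha s - 1}}{\left(y_{s:r_j}^{\alpha} + \beta^{\alpha}\right)^{r_j + 1}} \left(1 - \left(\frac{x_{j:m:n}}{y_{s:r_j}}\right)^{\alpha}\right)^{s-1}. \tag{17}$$
Since $y_{s:r_j} > x_{j:m:n}$, the term $\left(1 - (x_{j:m:n}/y_{s:r_j})^{\alpha}\right)^{s-1}$ in equation (17) can be expanded using the binomial theorem, so the conditional density is rewritten as
$$f(y_{s:r_j} \mid x_{j:m:n}) = c^{*} \alpha \left(x_{j:m:n}^{\alpha} + \beta^{\alpha}\right)^{r_j - s + 1} \sum_{k=0}^{s-1} (-1)^{k} \binom{s-1}{k} \frac{x_{j:m:n}^{\alpha k}\, y_{s:r_j}^{\alpha (s - k) - 1}}{\left(y_{s:r_j}^{\alpha} + \beta^{\alpha}\right)^{r_j + 1}}. \tag{18}$$
The best unbiased predictor (BUP) of $Y_{s:r_j}$ is the expected value $E(Y_{s:r_j} \mid X_{j:m:n})$, that is,
$$\hat{Y}_{s:r_j} = \int_{x_{j:m:n}}^{\infty} y\, f(y \mid x_{j:m:n})\, dy, \tag{19}$$
where $y = y_{s:r_j}$. Using integration techniques and binomial expansion in the integral part, equation (19) reduces to a series expression in k in which $u = k + r_j - s + 1$. If the parameters α and β are unknown, the BUP of $Y_{s:r_j}$ is obtained by replacing them in that expression with their MLEs $\hat{\alpha}$ and $\hat{\beta} = x_{1:m:n}$, with $\Delta = 1 + (x_{1:m:n}/x_{j:m:n})^{\hat{\alpha}}$.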
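As a cross-check on the BUP, the conditional expectation $E(Y_{s:r_j} \mid X_{j:m:n})$ can also be evaluated numerically. The sketch below does not use the paper's series expression. It relies on two assumptions: the NPD survival function $2\beta^{\alpha}/(y^{\alpha} + \beta^{\alpha})$, and the fact that, given $x_{j:m:n}$, the removed units behave like $r_j$ i.i.d. draws from the NPD left-truncated at $x_{j:m:n}$. Under these, the truncated survival probability V of $Y_{s:r_j}$ follows a Beta($r_j - s + 1$, $s$) law and $Y = \left((x_{j:m:n}^{\alpha} + \beta^{\alpha})/V - \beta^{\alpha}\right)^{1/\alpha}$. The function name and the grid size are hypothetical.

```python
import math

def bup_numeric(s, r, x_j, alpha, beta, grid=100_000):
    """Midpoint-rule approximation of E(Y_{s:r} | X_j = x_j) under the assumed NPD."""
    a, b = r - s + 1, s                          # V ~ Beta(a, b)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    c = x_j ** alpha + beta ** alpha
    total = 0.0
    for k in range(grid):
        v = (k + 0.5) / grid                     # midpoints of a uniform grid on (0, 1)
        y = (c / v - beta ** alpha) ** (1.0 / alpha)
        dens = math.exp(log_norm + (a - 1) * math.log(v) + (b - 1) * math.log1p(-v))
        total += y * dens
    return total / grid

# With unknown parameters, plug in the MLEs; the values here are illustrative only.
pred = bup_numeric(s=1, r=3, x_j=0.9, alpha=1.5, beta=0.5)
```

Under these assumptions the expectation is finite only when $\alpha (r_j - s + 1) > 1$, and the midpoint rule converges slowly as that bound is approached, so the sketch is best suited to checking the series-based predictor in well-behaved cases. For α = 1, s = 1, and r = 3 the exact value is $\tfrac{3}{2}(x_j + \beta) - \beta$, which gives a convenient sanity check.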

Bayesian Prediction.
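Bayes prediction (BP) of a censored observation depends on the actual observed sample, which is known as the informative sample. For that reason, we consider the estimation of the posterior predictive density (PPD) of the s-th order statistic $Y_{s:r_j}$. The posterior predictive density of $Y_{s:r_j}$ given the observed censored data x is given by
$$\pi_{Y_{s:r_j} \mid X}(y \mid x) = \int_0^{\infty}\!\int_0^{\infty} f_{Y_{s:r_j} \mid X}(y \mid \alpha, \beta)\, p(\alpha, \beta \mid x)\, d\alpha\, d\beta, \tag{23}$$
where $f_{Y_{s:r_j} \mid X}(y_{s:r_j} \mid \alpha, \beta)$ is the conditional density of $Y_{s:r_j}$ given α, β, and the data x, which is given in equation (18), and $p(\alpha, \beta \mid x)$ is the joint posterior given in equation (13). Now the Bayes predictor (BP) of $Y = Y_{s:r_j}$ under the squared error loss (SEL) function can be obtained as
$$\hat{Y}^{BP}_{s:r_j} = E_{\pi}(Y \mid \text{data}) = \int_{x_{j:m:n}}^{\infty} y\, \pi_{Y_{s:r_j} \mid X}(y \mid x)\, dy.$$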

Journal of Mathematics
The form of the PPD in equation (23) is not easy to compute; therefore, the Bayesian predictive estimates $E_{\pi}(Y \mid \text{data})$ are difficult to find explicitly. Thus, we use the MCMC technique described in Section 3 to generate a sample $\{(\alpha_\ell, \beta_\ell): \ell = 1, 2, \ldots, M\}$ from the joint posterior, using the M-H method within Gibbs sampling, and approximate the PPD with it. The sample-based predictor $\hat{Y}^{BP}_{s:r_j}$ of $Y = Y_{s:r_j}$ is given by
$$\hat{Y}^{BP}_{s:r_j} = \frac{1}{M} \sum_{\ell=1}^{M} \int_{x_{j:m:n}}^{\infty} y\, f_{Y_{s:r_j} \mid X}(y \mid \alpha_\ell, \beta_\ell)\, dy,$$
and hence, after integration and algebraic simplification, the sample-based predictor can be written as the average over ℓ of the BUP expression evaluated at $(\alpha_\ell, \beta_\ell)$. From the above PPD, one can obtain a two-sided predictive interval for $Y = Y_{s:r_j}$ ($s = 1, 2, \ldots, r_j$; $j = 1, 2, \ldots, m$).
For that purpose, we need the predictive survival function of $Y = Y_{s:r_j}$ at a point $y > x_{j:m:n}$, which can be defined as
$$S_{Y \mid X}(y \mid \alpha, \beta) = P(Y > y \mid x, \alpha, \beta) = \int_{y}^{\infty} f_{Y_{s:r_j} \mid X}(z \mid \alpha, \beta)\, dz.$$
Under the SEL function, the predictive survival function of $Y = Y_{s:r_j}$ is given by
$$S^{*}_{Y \mid X}(y \mid x) = \int_0^{\infty}\!\int_0^{\infty} S_{Y \mid X}(y \mid \alpha, \beta)\, p(\alpha, \beta \mid x)\, d\alpha\, d\beta. \tag{28}$$
The predictive survival function in equation (28) cannot be easily evaluated analytically, and hence a numerical approximation is preferable in this case. The MCMC sample $\{(\alpha_\ell, \beta_\ell): \ell = 1, 2, \ldots, M\}$ can be used to approximate equation (28), and the simulated estimator of the predictive survival function can be written as
$$\hat{S}^{*}_{Y \mid X}(y \mid x) = \frac{1}{M} \sum_{\ell=1}^{M} S_{Y \mid X}(y \mid \alpha_\ell, \beta_\ell).$$
Now, the $(1 - \xi)100\%$ predictive interval $(L, U)$ of $Y = Y_{s:r_j}$ is found by solving the following non-linear equations using a suitable numerical technique:
$$\hat{S}^{*}_{Y \mid X}(L \mid x) = 1 - \frac{\xi}{2}, \qquad \hat{S}^{*}_{Y \mid X}(U \mid x) = \frac{\xi}{2},$$
where L denotes the lower bound and U denotes the upper bound.
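The predictive-interval step can be sketched as follows: average the conditional survival function over posterior draws and solve for the two bounds by bisection. This is a minimal sketch under assumptions. It takes the equal-tailed targets $1 - \xi/2$ and $\xi/2$, uses the NPD truncated CDF $G(y) = 1 - (x_{j:m:n}^{\alpha} + \beta^{\alpha})/(y^{\alpha} + \beta^{\alpha})$, and writes the survival function of the s-th order statistic of $r_j$ truncated draws as a binomial sum. The posterior draws shown are hypothetical placeholders rather than output of the paper's sampler.

```python
import math

def cond_survival(y, s, r, x_j, alpha, beta):
    """P(Y_{s:r} > y | x_j, alpha, beta) for y >= x_j: fewer than s of r units fail by y."""
    g = 1.0 - (x_j ** alpha + beta ** alpha) / (y ** alpha + beta ** alpha)
    return sum(math.comb(r, k) * g ** k * (1.0 - g) ** (r - k) for k in range(s))

def predictive_survival(y, s, r, x_j, draws):
    """Monte Carlo average of the conditional survival over posterior draws."""
    return sum(cond_survival(y, s, r, x_j, a, b) for a, b in draws) / len(draws)

def solve_for(target, s, r, x_j, draws, iters=200):
    """Find y with predictive survival equal to target (survival decreases in y)."""
    lo, hi = x_j, 2.0 * x_j
    while predictive_survival(hi, s, r, x_j, draws) > target:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if predictive_survival(mid, s, r, x_j, draws) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def predictive_interval(s, r, x_j, draws, xi=0.05):
    lower = solve_for(1.0 - xi / 2.0, s, r, x_j, draws)
    upper = solve_for(xi / 2.0, s, r, x_j, draws)
    return lower, upper

draws = [(1.4, 0.50), (1.6, 0.48), (1.5, 0.51)]   # hypothetical posterior draws
L, U = predictive_interval(s=1, r=2, x_j=0.9, draws=draws)
```

Bisection is a natural fit because the averaged survival function decreases monotonically from 1 at $y = x_{j:m:n}$ to 0, so each target in (0, 1) has exactly one solution.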

Simulation Analysis
In this section, we perform a simulation analysis to check the performance of the Bayes estimators compared with the classical estimators obtained by the MLE method, based on PC with NPD lifetimes. Also, we compute the best unbiased predictor and the Bayes predictor for the missing data with respect to the observed PC sample. In Bayes estimation, we use the squared error loss function (SEL). We compute the mean square errors (MSEs) and the biases of the Bayes and MLE estimators based on 10000 replications using R. In estimation and prediction, we fix the parameter values at $(\alpha, \beta) \in \{(0.5, 0.5), (1.5, 0.5)\}$ and the sample sizes at n = 50 and 100, in order to generate progressive Type-II censored data. Also, under PC, we obtain the point predictors and the 95% prediction intervals for the missing order statistics $Y_{s:r_j}$; $s = 1, \ldots, r_j$, $j = 1, \ldots, m$.
In Tables 1 and 2, we present the biases and MSEs of the MLEs and the Bayes estimates. In Tables 3 and 4, we present numerical comparisons between the average lengths (AL) and the coverage percentages (CP) of the credible intervals and the asymptotic intervals for the NPD parameters. In Tables 5 and 6, we present the MLE-based and Bayes point predictions and the prediction intervals for the missing s-th order statistics $Y_{s:r_j}$; $s = 1, \ldots, r_j$, $j = 1, \ldots, m$, based on the observed sample of size m with censoring scheme $(r_1, r_2, \ldots, r_m)$, for all the censoring schemes considered, under the SEL function. The MCMC samples $(\alpha_i, \beta_i)$, $i = 1, \ldots, M$, with M = 10000, and the point BUP and BP for the missing order statistics $Y_{s:r_j}$ in censoring stage j, $s = 1, \ldots, r_j$, are computed. The 95% lower bound L and upper bound U of the prediction interval for the missing s-th order statistics $Y_{s:r_j}$ are also computed.
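The bias and MSE computations of the simulation can be illustrated for the simplest estimator, $\hat{\beta} = x_{1:m:n}$. Because the first failure is observed before any unit is removed, $x_{1:m:n}$ is the minimum of all n lifetimes whatever the censoring scheme, so no scheme needs to be simulated here. The sketch assumes the NPD quantile function $\beta\,((1+u)/(1-u))^{1/\alpha}$; the replication count and the function names are illustrative choices, smaller than the 10000 replications used in the study.

```python
import random

def npd_draw(rng, alpha, beta):
    """One lifetime from the assumed NPD via inverse-CDF sampling."""
    u = rng.random()
    return beta * ((1.0 + u) / (1.0 - u)) ** (1.0 / alpha)

def bias_mse_beta_mle(n, alpha, beta, reps=2000, seed=0):
    """Monte Carlo bias and MSE of beta_hat = x_{1:m:n}, the minimum of n lifetimes."""
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        beta_hat = min(npd_draw(rng, alpha, beta) for _ in range(n))
        errors.append(beta_hat - beta)
    bias = sum(errors) / reps
    mse = sum(e * e for e in errors) / reps
    return bias, mse

bias50, mse50 = bias_mse_beta_mle(50, alpha=1.5, beta=0.5)
bias100, mse100 = bias_mse_beta_mle(100, alpha=1.5, beta=0.5)
```

The same loop structure extends to the other estimators by replacing the line that computes `beta_hat` with a call to the corresponding estimation routine on a simulated censored sample.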
From Tables 1 and 2, we observe the following:
(i) The best point estimation method for the shape parameter α is the Bayesian method under SEL, since it has the minimum biases and MSEs.
(ii) For the scale parameter β, the MLE has the minimum biases and MSEs, and hence it is preferable for point estimation of β.
(iii) When comparing the efficiency of the censoring schemes with respect to biases and MSEs, scheme 3 performs well when estimating α, while scheme 1 is better than the others for estimating β.
For interval estimation of the NPD parameters, we use the asymptotic CI from the MLE method and the credible interval from the Bayesian method under SEL. The results of the simulation analysis, obtained with numerical methods and the MCMC technique, appear in Tables 3 and 4. The two intervals are compared in terms of average interval length (AL) and coverage percentage (CP):
(i) The asymptotic CI has a smaller average length for α than the credible CI, under the three suggested censoring schemes and sample sizes 50 and 100.
(ii) The credible CI has a higher CP for α than the asymptotic CI, under the three suggested censoring schemes and sample sizes 50 and 100.
(iii) The credible CI has a smaller average length for β than the asymptotic CI, under the three suggested censoring schemes and sample sizes 50 and 100.
(iv) The credible CI has a higher CP for β under censoring scheme 1, while the asymptotic CI has a higher CP than the credible CI under censoring schemes 2 and 3 (see Table 3).
The two prediction methods used in this paper are the BUP and the BP, and we conduct a simulation analysis to compare them. The numerical results for the predicted unobserved order statistics $Y_{s:r_j}$ are reported in Tables 5 and 6, which give point predictions and 95% prediction intervals under different censoring schemes and sample sizes n = 50 and 100.
From these tables, we notice that the predicted values of $Y_{s:r_j}$ lie within the corresponding prediction intervals and that the predicted values under BP are smaller than those under BUP for censoring scheme 1, while the converse is true for censoring scheme 2. Under censoring scheme 3, no fixed ordering of the predicted values is observed.
For a fixed sample size, the largest predicted value is observed under censoring scheme 3. One may also notice that the lower and upper bounds of the prediction interval for the missing s-th order statistics $Y_{s:r_j}$ increase as s ($0 < s < r_j$) increases for each $r_j$.
In order to select a suitable prediction method, one can rely on either the average interval length or the coverage percentage (CP) of the observed intervals.
This can be calculated easily from Tables 5 and 6; for example, in the case of censoring scheme 1 with sample size 50, we prefer to use BP to predict the unobserved statistic $Y_{1:1:50}$, since it has the shorter interval length, but based on the CP, BUP is preferable. To predict $Y_{2:1:50}$, we prefer to use the BUP, as it has both the shorter interval length and the higher CP (further results are found in Tables 5 and 6).

Conclusions
In this article, we estimated the unknown parameters of the NPD under progressive Type-II censored sampling in order to assess the performance of the MLE and Bayesian estimation methods and to determine the best prediction method for unobserved lifetimes. The MLE and Bayes estimation methods were used to obtain both point and interval estimates. Two methods of prediction for future observations were employed, namely, the BUP and the BP. Numerical methods and simulation analysis were used to compare the methods of estimation and the methods of prediction. We concluded that the MLE is better for estimating the scale parameter β, while Bayes estimation is better for estimating the shape parameter α. Further results are summarized from the tables in Section 5. Researchers may develop new distributions and apply different censoring schemes, such as adaptive, hybrid progressive, and other schemes, to their sample data to obtain better point and interval estimation and prediction criteria for future unobserved data.

Data Availability
The numerical data used to support the findings of this study are included within the article.

Conflicts of Interest
The author declares that there are no conflicts of interest.