On Estimation of Three-Component Mixture of Distributions via Bayesian and Classical Approaches

In this study, we model a heterogeneous population using a three-component mixture of Pareto distributions under type I censored data. In particular, we study some statistical properties (such as various entropies, different inequality indices, and order statistics) of the three-component mixture distribution. Both maximum likelihood (ML) and Bayesian estimation of the mixture parameters are performed. For ML estimation, we use the Newton-Raphson method. For Bayesian estimation, different noninformative priors are assumed to derive the posterior distributions and the Bayes estimators. Furthermore, we also discuss the Bayesian predictive intervals. We present a detailed simulation study to compare the ML and Bayes estimates, and we evaluate the performance of the different estimates for various sample sizes, mixing weights, and test termination times (a fixed point of time after which all other tests are dismissed). A real-life data application is also part of this study.


Introduction
In the last decade, finite mixture models have emerged as flexible models due to their applications in applied sciences, engineering, and physical sciences. As explained by Mendenhall and Hader [1], for practical purposes, an engineer splits the failures of a structure into more than one kind of cause. For example, to determine the proportion of failures due to a specific cause and to improve the engineering system, Acheson and McElwee [2] separated electronic tube failures into three different fault types: mechanical faults, gaseous faults, and normal deterioration of the cathode.
Moreover, mixture models can also be used when the data are presented in the form of overall mixture models. The overall mixture models are also called the direct application of mixture models, and their applications can be seen in medicine, botany, zoology, agriculture, economics, life testing, reliability, and survival analysis. The various aspects of mixture models were discussed by Li and Sedransk [3]. Interested readers can refer to the work of Harris [4], Kanji [5], and Jones and McLachlan [6] on the application of mixture models to real-life problems. Mixture models have been used extensively, in comparison to simple models, for processes of a heterogeneous nature. Many researchers have applied mixture distributions in various real-life situations and estimated their parameters using Bayesian and classical methods. For a detailed appraisal of classical estimation techniques and applications of mixture distributions, we refer to studies by Sultan et al. [7], Abu-Zinadah [8], and Kamaruzzaman et al. [9], among others. On the other hand, the estimation of parameters in the Bayesian framework for a mixture of two distributions has been considered by many researchers [10-21]. Contrary to two-component mixture modeling, some authors have discussed situations where data are assumed to follow a three-component mixture of suitable probability distributions [22-28].
Censoring is a significant characteristic of real data applications. Due to time and cost constraints, it is often difficult to continue a lifetime testing experiment until the last failure is observed. Although there are many censoring schemes, type I right censoring is the most commonly used in life testing experiments. In this scheme, we fix a censoring time t (the life test termination time), and values larger than t are recorded as censored observations. Romeu [29] and Kalbfleisch and Prentice [30] explain various censoring schemes.
To motivate the readers about mixture modeling, consider a sample of sand composed of a mixture of minerals. With mixture modeling, estimates of the proportions of the various minerals in the sand can be obtained; similarly, the grain size distributions of the different minerals can be estimated. It is worth mentioning that mixture models can be classified into type I mixtures (the component densities belong to the same family) and type II mixtures (the component densities belong to different families).
It has been noticed from the recent literature that the Pareto distribution can often model data more efficiently than other distributions. The significance of the Pareto distribution in modeling various real phenomena is evident from the following research and the references mentioned therein: Abdel-All et al. [31], Ismail [32], Sankaran and Nair [33], and Nadarajah and Kotz [34]. Inspired by the wide real-life applications of mixture distributions, the main objective of this study is to develop a new three-component mixture of Pareto distributions (TCMPD) for lifetime data modeling under the type I mixture. Furthermore, we also compare the maximum likelihood (ML) estimates, ML variances (MLVs), Bayes estimates, and their posterior variances (PVs) assuming type I right censored data.

The Population and the Three-Component Mixture Distribution

The finite k-component mixture density function can be written as f(y; Φ) = Σ_{d=1}^{k} w_d f_d(y; λ_d), with Σ_{d=1}^{k} w_d = 1. For k = 3, the TCMPD density is f(y; Φ) = w_1 f_1(y; λ_1) + w_2 f_2(y; λ_2) + (1 − w_1 − w_2) f_3(y; λ_3), y ≥ 1, where each component density f_d(y; λ_d) = λ_d y^{−(λ_d + 1)} corresponds to the distribution function F_d(y; λ_d) = 1 − e^{−λ_d ln y} = 1 − y^{−λ_d}.
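The component distribution function F_d(y; λ_d) = 1 − e^{−λ_d ln y} = 1 − y^{−λ_d} given above determines the component density λ_d y^{−(λ_d+1)} by differentiation. A minimal Python sketch of the mixture pdf and CDF (function names are mine, not from the paper):

```python
import numpy as np

def pareto_pdf(y, lam):
    """Unit-scale Pareto density lam * y**-(lam + 1) for y >= 1, else 0."""
    y = np.asarray(y, dtype=float)
    return np.where(y >= 1.0, lam * y ** (-(lam + 1.0)), 0.0)

def pareto_cdf(y, lam):
    """Unit-scale Pareto CDF 1 - y**-lam for y >= 1, else 0."""
    y = np.asarray(y, dtype=float)
    return np.where(y >= 1.0, 1.0 - y ** (-lam), 0.0)

def tcmpd_pdf(y, lam1, lam2, lam3, w1, w2):
    """Mixture density w1*f1 + w2*f2 + (1 - w1 - w2)*f3."""
    w3 = 1.0 - w1 - w2
    return (w1 * pareto_pdf(y, lam1) + w2 * pareto_pdf(y, lam2)
            + w3 * pareto_pdf(y, lam3))

def tcmpd_cdf(y, lam1, lam2, lam3, w1, w2):
    """Mixture CDF w1*F1 + w2*F2 + (1 - w1 - w2)*F3."""
    w3 = 1.0 - w1 - w2
    return (w1 * pareto_cdf(y, lam1) + w2 * pareto_cdf(y, lam2)
            + w3 * pareto_cdf(y, lam3))
```

At y = 1 the mixture density equals w1 λ1 + w2 λ2 + (1 − w1 − w2) λ3, which gives a quick consistency check.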

Properties of the TCMPD
The statistical properties, such as moments, mean, variance, and mode of the TCMPD, are derived in this section.
The r-th moment about zero: the r-th moment about zero of a TCMPD is derived as

The k-th order negative moment: the k-th order negative moment is derived as
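As a sketch of the closed forms implied by the component CDF F_d(y) = 1 − y^{−λ_d} (my derivation, valid only when every λ_d exceeds the moment order):

```latex
E\left(Y^{r}\right)
  = \sum_{d=1}^{3} w_d \int_{1}^{\infty} y^{r}\, \lambda_d\, y^{-(\lambda_d+1)}\, dy
  = \sum_{d=1}^{3} \frac{w_d\, \lambda_d}{\lambda_d - r}, \qquad \lambda_d > r,
\qquad
E\left(Y^{-k}\right) = \sum_{d=1}^{3} \frac{w_d\, \lambda_d}{\lambda_d + k},
```

with w_3 = 1 − w_1 − w_2; the mean follows at r = 1 and the variance as E(Y²) − [E(Y)]².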

Mathematical Problems in Engineering
Factorial moments: the factorial moments can be determined as

where the ξ_ω's are real numbers. The E(Y^{δ−ω}) can be obtained as

Mean: the mean of a TCMPD is evaluated as

Variance: the variance of a TCMPD is derived as

Median: the median (y) is determined by evaluating the following nonlinear equation for y.
Mode: the mode (y) is obtained by solving the following nonlinear equation for y.
Using the above expressions, the mean, median, variance, and coefficient of skewness (SK) are calculated for different parameter values and are given in Table 1.
It is observed from the entries in Table 1 and Figure 1 that the TCMPD is positively skewed, since SK > 0 for all entries in Table 1. We also note that the variance of the TCMPD is a decreasing function of the mixture distribution's parameters.
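As a quick sanity check on the tabulated moments (my simulation, not from the paper): the mean implied by the component CDF F_d(y) = 1 − y^{−λ_d} is Σ_d w_d λ_d/(λ_d − 1), which inverse-transform sampling reproduces. Parameter values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_tcmpd(n, lam, w, rng):
    """Draw n values: pick a component, then invert F_d(y) = 1 - y**-lam_d."""
    lam = np.asarray(lam, dtype=float)
    labels = rng.choice(len(lam), size=n, p=w)
    u = rng.random(n)
    return (1.0 - u) ** (-1.0 / lam[labels])

lam, w = [4.0, 5.0, 6.0], [0.3, 0.4, 0.3]
y = sample_tcmpd(200_000, lam, w, rng)
theoretical_mean = sum(wd * ld / (ld - 1.0) for wd, ld in zip(w, lam))  # = 1.26
print(theoretical_mean, y.mean())  # empirical mean should be close to 1.26
```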

Entropies
Entropy quantifies the amount of uncertainty, or unspecified information, in a density function. In this section, we derive expressions for the most commonly used entropy measures, namely Shannon's entropy, the β-entropy, and the Rényi entropy. As noted by Song [35], Shannon's entropy behaves like a measure of kurtosis when comparing the shapes of different densities and assessing the heaviness of their tails.
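For reference, these measures have the standard definitions below (my notation; f denotes the TCMPD density and the integrals run over the support y ≥ 1):

```latex
H(f) = -\int_{1}^{\infty} f(y)\,\ln f(y)\,dy,
\qquad
H_{\beta}(f) = \frac{1}{\beta - 1}\left(1 - \int_{1}^{\infty} f^{\beta}(y)\,dy\right),
\qquad
H_{\rho}^{\mathrm{R}}(f) = \frac{1}{1 - \rho}\,\ln \int_{1}^{\infty} f^{\rho}(y)\,dy .
```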

Inequality Measures
The most common income inequality indices are the Bonferroni curve, the Gini index, and the Lorenz curve.

Gini index: this index (Gini [38]) for a TCMPD is

Lorenz curve: this curve (Lorenz [39]) for a TCMPD is

Bonferroni curve: the Bonferroni [40] curve for a TCMPD is
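These three quantities have the standard forms (usual definitions, stated here for completeness; F is the TCMPD distribution function, μ its mean, and q(p) = F⁻¹(p) the quantile function):

```latex
L(p) = \frac{1}{\mu}\int_{0}^{p} q(u)\,du,
\qquad
G = 1 - 2\int_{0}^{1} L(p)\,dp,
\qquad
B(p) = \frac{L(p)}{p}, \quad 0 < p \le 1 .
```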

Order Statistics
In this section, we derive g(y_{k:n}; λ1, λ2, λ3, w1, w2), the pdf of the k-th order statistic y_{k:n}, assuming a sample of size n from the TCMPD. The r-th raw moments, along with the means and variances, of the 1st and n-th order statistics are also obtained in this section.

Probability density function of the k-th order statistic: the pdf of the k-th order statistic is

After a little simplification, the pdf (19) of the k-th order statistic can be written as

where

Probability density function of the 1st order statistic: substituting k = 1 in (21) and simplifying, the pdf of the 1st order statistic is

Probability density function of the n-th order statistic: substituting k = n in (21) and after a little algebraic simplification, the pdf of the n-th order statistic is

where

r-th moments, mean, and variance of the 1st order statistic: the r-th moment about the origin, the mean, and the variance of the 1st order statistic are

r-th moments, mean, and variance of the n-th order statistic: the r-th moment about the origin, the mean, and the variance of the n-th order statistic are
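The generic order-statistics construction behind these expressions can be sketched numerically, using the standard formula g_{k:n}(y) = [n!/((k−1)!(n−k)!)] F^{k−1}(y) [1 − F(y)]^{n−k} f(y), with f and F the TCMPD pdf and CDF (a sketch in my notation, not the paper's expanded form):

```python
import math
import numpy as np

def tcmpd_order_stat_pdf(y, k, n, lam, w):
    """pdf of the k-th order statistic from n TCMPD draws (standard formula)."""
    y = np.asarray(y, dtype=float)
    # TCMPD pdf and CDF built from the component CDF F_d(y) = 1 - y**-lam_d
    f = sum(wd * ld * y ** (-(ld + 1.0)) for wd, ld in zip(w, lam))
    F = sum(wd * (1.0 - y ** (-ld)) for wd, ld in zip(w, lam))
    c = math.factorial(n) // (math.factorial(k - 1) * math.factorial(n - k))
    return c * F ** (k - 1) * (1.0 - F) ** (n - k) * f
```

Setting k = 1 recovers the minimum's density n[1 − F(y)]^{n−1} f(y), and k = n the maximum's density n F^{n−1}(y) f(y).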

Parametric Estimation
Here, we discuss the parameter estimation methods. In particular, we use the ML and Bayesian methods of estimation for the unknown parameters under type I censored data. Before discussing parameter estimation, we construct the likelihood function. Assume n units from the TCMPD are put on a life test with fixed test termination time t. Let y_1, y_2, . . . , y_ξ be the ordered values that can be observed. The remaining n − ξ largest values are censored, i.e., their exact failure times cannot be recorded due to the time constraint. Thus, y_11, . . . , y_1ξ1, y_21, . . . , y_2ξ2, and y_31, . . . , y_3ξ3 are the failed observations belonging to subpopulations I, II, and III, respectively, where ξ1, ξ2, and ξ3 denote the numbers of failures observed from the respective subpopulations and ξ1 + ξ2 + ξ3 = ξ. The n1 − ξ1, n2 − ξ2, and n3 − ξ3 observations are censored from the corresponding subpopulations. So, the likelihood function using the type I censored sample, y = (y_11, . . . , y_1ξ1; y_21, . . . , y_2ξ2; y_31, . . . , y_3ξ3), for a TCMPD is given below. After substitution and simplification, the likelihood function of the TCMPD becomes
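Under the common labeled-failure form of such a likelihood (each observed failure contributes w_d f_d(y), and each of the n − ξ censored units contributes the mixture survival 1 − F(t)), the log-likelihood can be sketched as follows. This is my reading of the construction, with illustrative names:

```python
import numpy as np

def log_likelihood(params, y1, y2, y3, t, n):
    """Type I censored TCMPD log-likelihood sketch (labeled-failure form)."""
    lam1, lam2, lam3, w1, w2 = params
    w3 = 1.0 - w1 - w2

    def comp(y, lam, w):
        # sum of log(w * lam * y**-(lam + 1)) over the failures in one component
        y = np.asarray(y, dtype=float)
        return y.size * (np.log(w) + np.log(lam)) - (lam + 1.0) * np.log(y).sum()

    xi = len(y1) + len(y2) + len(y3)                 # total observed failures
    # mixture survival 1 - F(t) at the termination time t
    surv_t = w1 * t ** -lam1 + w2 * t ** -lam2 + w3 * t ** -lam3
    return (comp(y1, lam1, w1) + comp(y2, lam2, w2) + comp(y3, lam3, w3)
            + (n - xi) * np.log(surv_t))
```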

ML Estimators and Variances. The ML estimators of the TCMPD for the parameter vector Φ = (λ1, λ2, λ3, w1, w2) are derived from the solution of the nonlinear equations (30)-(34). These equations are obtained by partially differentiating the natural logarithm of the likelihood function with respect to each parameter, i.e., ∂ ln L(Φ|y)/∂λ_d and ∂ ln L(Φ|y)/∂w_j, and equating the derivatives to zero.

It is difficult to obtain an explicit solution of the nonlinear equations (30)-(34); therefore, to obtain the ML estimates, mathematical or statistical software such as Mathematica (Wolfram [41]) can be used to solve them by an iterative procedure.
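An illustrative Newton-Raphson iteration of the kind referred to above, shown for a single censored Pareto component (a simplified stand-in for the five-equation TCMPD system; the mixture case iterates the same update on a 5-vector with the full Hessian). Names and values are mine:

```python
import numpy as np

def newton_raphson_pareto(y_obs, t, n, lam0=1.0, tol=1e-10, max_iter=100):
    """Newton-Raphson for one censored Pareto rate.

    log-likelihood: l(lam) = xi*ln(lam) - (lam + 1)*sum(ln y) - (n - xi)*lam*ln(t)
    """
    y_obs = np.asarray(y_obs, dtype=float)
    xi = len(y_obs)
    s = np.log(y_obs).sum() + (n - xi) * np.log(t)
    lam = lam0
    for _ in range(max_iter):
        score = xi / lam - s          # first derivative dl/dlam
        hess = -xi / lam ** 2         # second derivative d2l/dlam2
        step = score / hess
        lam -= step                   # Newton update: lam - score/hess
        if abs(step) < tol:
            break
    return lam
```

For this toy case the score equation has the closed form λ̂ = ξ / (Σ ln y + (n − ξ) ln t), which the iteration reproduces; a starting value below 2ξ/s keeps the update stable.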
Let Φ = (λ1, λ2, λ3, w1, w2). By the multivariate central limit theorem, Φ̂ ∼ N(Φ, I⁻¹(Φ)) asymptotically, so the asymptotic variances appear on the diagonal of the inverted Fisher information matrix, which is the expectation of the negative Hessian of the log-likelihood, as
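In symbols, the standard asymptotic result described above reads

```latex
I(\Phi) = E\!\left[-\,\frac{\partial^{2} \ln L(\Phi \mid y)}{\partial \Phi\,\partial \Phi^{\top}}\right],
\qquad
\hat{\Phi} \overset{a}{\sim} N\!\left(\Phi,\, I^{-1}(\Phi)\right),
\qquad
\widehat{\operatorname{Var}}\!\left(\hat{\Phi}_{i}\right) = \left[I^{-1}(\Phi)\right]_{ii}.
```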

Next, we discuss Bayesian estimation of the unknown parameters.

The Joint Prior and Posterior Distributions. Now, we discuss the Bayesian estimation of the unknown parameters. This method allows us to obtain an updated form of knowledge by combining the current data with prior knowledge. In particular, we use the uniform and Jeffreys noninformative priors, which are used when little or no formal prior knowledge about the parameters of interest is available. Box and Tiao [42] describe a noninformative prior as one that provides little information relative to the experiment. Bernardo and Smith [43] define a noninformative prior as one that has the least influence relative to the data. The Jeffreys prior, suggested by Jeffreys [44], is obtained by evaluating the Fisher information.
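In the usual notation (a sketch of the standard forms, not copied from the paper), the two priors are

```latex
\pi_{\mathrm{UP}}(\Phi) \propto 1,
\qquad
\pi_{\mathrm{JP}}(\Phi) \propto \sqrt{\det I(\Phi)},
```

where I(Φ) is the Fisher information matrix; the joint posterior is then π(Φ | y) ∝ π(Φ) L(Φ | y).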

Bayes Estimators and Posterior Variances.
The Bayes estimators of the component parameters and mixing proportions, i.e., λ1, λ2, λ3, w1, and w2, using the UP are obtained as

where α, β, and c are defined as

where δ, Υ, and Λ are defined as (i) δ = 1,

To measure the accuracy and efficiency of the Bayes estimators, we calculate the PVs of the parameters.

The PVs of λ1, λ2, λ3, w1, and w2 using the UP are derived as

The Bayes estimators of λ1, λ2, λ3, w1, and w2 using the JP are obtained as
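A hedged one-parameter illustration of how such Bayes estimators and PVs arise (NOT the paper's five-parameter mixture expressions): for a single censored Pareto rate λ, the likelihood kernel λ^ξ e^{−λs}, with s = Σ ln y + (n − ξ) ln t, combines with a uniform prior into a Gamma(ξ + 1, s) posterior, and with the Jeffreys prior 1/λ into a Gamma(ξ, s) posterior, so the posterior mean and variance are closed-form:

```python
import numpy as np

def bayes_pareto(y_obs, t, n, prior="UP"):
    """(Bayes estimate, posterior variance) for one censored Pareto rate."""
    y_obs = np.asarray(y_obs, dtype=float)
    xi = len(y_obs)
    s = np.log(y_obs).sum() + (n - xi) * np.log(t)
    shape = xi + 1 if prior == "UP" else xi   # Gamma shape under UP / JP
    return shape / s, shape / s ** 2          # Gamma(shape, rate=s) mean, variance
```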

The Bayesian Prediction.
One of the main objectives of statistical modeling is the prediction of future values, and the Bayesian methodology allows us to obtain such predictions in a natural way. In particular, the posterior predictive distribution (PPD) comprises the knowledge about the future value X = Y_{n+1} given the data y. Al-Hussaini et al. [45], Bolstad [46], and Bansal [47] have discussed the usefulness of prediction and the predictive distribution in the Bayesian framework comprehensively.

The Posterior Predictive Distribution.
For the future value X = Y_{n+1}, the PPD using the UP and the JP is

(49)
In equation (49), we take v = 1 for the UP and v = 2 for the JP, respectively.
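A one-component sketch of how such a PPD arises (my simplification, not the paper's mixture formula): with a Gamma(a, s) posterior for the Pareto rate, integrating the sampling density λ x^{−(λ+1)} against the posterior gives a closed-form predictive density and survival function for X = Y_{n+1}:

```python
import numpy as np

def ppd_pdf(x, a, s):
    """p(x | data) = a * s**a / (x * (s + ln x)**(a + 1)), x >= 1."""
    x = np.asarray(x, dtype=float)
    return a * s ** a / (x * (s + np.log(x)) ** (a + 1.0))

def ppd_survival(x, a, s):
    """P(X > x | data) = (s / (s + ln x))**a, x >= 1."""
    return (s / (s + np.log(x))) ** a
```

The survival form is convenient because predictive intervals reduce to a one-line inversion.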

Bayesian Predictive Intervals.
To obtain the Bayesian predictive intervals (BPIs), let L and U be the lower and upper endpoints of the BPI, which are obtained from (49). A 100(1 − α)% BPI (L, U) using the UP and the JP can be obtained by solving the following expression:
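For a one-component sketch in which the Pareto rate has a Gamma(a, s) posterior, the predictive survival function is P(X > x | data) = (s/(s + ln x))^a, and a 100(1 − α)% equal-tail BPI inverts it at 1 − α/2 (lower endpoint) and α/2 (upper endpoint). An illustration in my notation, not the paper's mixture expression:

```python
import math

def bpi(a, s, alpha=0.10):
    """Equal-tail predictive interval from the closed-form survival function."""
    lower = math.exp(s * ((1.0 - alpha / 2.0) ** (-1.0 / a) - 1.0))
    upper = math.exp(s * ((alpha / 2.0) ** (-1.0 / a) - 1.0))
    return lower, upper
```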

Monte Carlo Simulation
Here, we present a comprehensive simulation study to assess the performance of the different estimation methods. Since an analytical comparison of the Bayes and ML estimators is not possible, a Monte Carlo simulation is performed to measure their performance under different settings. Through the following steps, we obtained the maximum likelihood estimates (MLEs), MLVs, Bayes estimates (BEs), and PVs:
(1) A sample of size n from the mixture is taken as follows:
(i) Generate a sample of w1 n values randomly from f1(y; λ1)
(ii) Generate a sample of w2 n values randomly from f2(y; λ2)
(iii) Generate a sample of (1 − w1 − w2)n values randomly from f3(y; λ3)
(2) Select the values that are larger than the fixed t as censored values
(3) Compute the MLEs, MLVs, BEs, and PVs; the results are reported in Tables 2 and 3
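Steps (1)-(2) can be sketched as follows (parameter values and function names are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(7)

def one_censored_sample(n, t, lam=(4.0, 5.0, 6.0), w=(0.4, 0.3)):
    """Draw w1*n, w2*n, (1-w1-w2)*n Pareto values, then censor at t."""
    w1, w2 = w
    sizes = [int(w1 * n), int(w2 * n), n - int(w1 * n) - int(w2 * n)]
    samples = []
    for size, ld in zip(sizes, lam):
        u = rng.random(size)
        samples.append((1.0 - u) ** (-1.0 / ld))   # F^-1(u) = (1 - u)**(-1/lam)
    y = np.concatenate(samples)
    return y[y <= t], int((y > t).sum())           # failures, censored count

failures, n_censored = one_censored_sample(500, 1.5)
print(len(failures), n_censored)  # the two counts add up to n = 500
```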
From Figures 1-4, it is observed that the parameters λ1, λ2, λ3, and w2 are overestimated, while w1 is underestimated, at different values of t and n under both estimation methods, i.e., ML and Bayesian. Also, the degree of underestimation of λ1, λ2, λ3, w1, and w2 is higher for a small n at various values of t, and the opposite behavior is observed for a large t at a given n. Furthermore, the parameters λ1, λ2, λ3, w1, and w2 are overestimated to a larger extent when the true values of λ1, λ2, and λ3 are smaller, at different values of t for a fixed n. A similar pattern is observed at different values of n for a fixed t. The difference of the MLEs and BEs of the parameters λ1, λ2, λ3, w1, and w2 from the nominal values becomes minimal as t and n increase.
It can be seen from Tables 2 and 3 that, at different values of t, the difference between the MLVs and the PVs (assuming the UP and the JP) diminishes as the sample size increases. The same remark holds for a large t at different values of n. It is also noticed that the MLVs and PVs of w1 and w2 are larger for smaller values of λ1, λ2, and λ3 at different values of t and n. Also, the Bayes estimators using the JP perform better than the Bayes estimators using the UP and the ML estimators, based on their smaller associated PVs.
Regarding the selection of an appropriate prior, Tables 2 and 3 reveal that the JP outperforms the UP because the PVs under the JP are smaller than those under the UP. Table 4 and Figure 5 showcase the 90% BPIs using the UP and the JP. The width of the 90% BPI increases as n decreases; the same conclusion holds for a smaller t at varying values of n. The 90% BPIs were observed to be narrower for larger values of λ1, λ2, and λ3 at various values of t and n. Moreover, the BPIs using the JP were wider than the BPIs obtained by assuming the UP in the simulation study.

Real Data Application
To illustrate the proposed methodology, the mixture lifetime data, z = (z11, z12, . . . , z1r1, z21, z22, . . . , z2r2, z31, z32, . . . , z3r3), in thousands of hours, were taken from Davis [48] on three components, i.e., the V805 Transmitter Tube, the Transmitter Tube, and the V600 Indicator Tube used in aircraft sets. For the exponentially distributed mixture data z, the transformation y = exp(z) gives Pareto distributed mixture data y. Thus, this transformation permits us to use the given mixture data z with the suggested ML and Bayesian estimation techniques, and the proposed mixture of Pareto distributions is a fair choice to model these data. Moreover, it is not identified which component fails when a failure arises at or before 0.6 thousand hours. The MLEs, MLVs, BEs, and PVs assuming the UP and the JP are presented in Table 5.
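The stated transformation can be verified in one line (a standard change of variables): if Z is exponential with rate λ, then Y = exp(Z) satisfies

```latex
P(Y > y) = P\!\left(e^{Z} > y\right) = P(Z > \ln y) = e^{-\lambda \ln y} = y^{-\lambda}, \qquad y \ge 1,
```

which is exactly the Pareto survival function corresponding to the component CDF F_d(y) = 1 − e^{−λ_d ln y} used throughout.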
From Table 5, it is clear that the BEs using the JP perform best compared to the MLEs, as the variances of the BEs are smaller than their counterparts. Moreover, the BEs using the UP also have smaller variances than the MLEs for estimating the unknown parameters. Also, the JP is observed to be superior to the UP due to its smaller associated PVs.

Conclusion
In this study, we proposed the TCMPD to model lifetime data. Parameter estimation under type I censoring has been considered using the ML and Bayesian estimation methods. For Bayesian estimation, we assumed noninformative priors, and expressions for the Bayes estimators and PVs of the mixing proportions (w1 and w2) and the component parameters (λ1, λ2, and λ3) were derived. To examine the relative performance of the Bayes and ML estimators under different scenarios, a Monte Carlo simulation was carried out. To illustrate a practical application of the proposed mixture distribution, a real data example was also analyzed.
From the simulation results and the depicted graphs, it is noticed that increasing t for a fixed n yields very efficient Bayes and ML estimators. It is also noted that the parameters λ1, λ2, λ3, w1, and w2 are overestimated (underestimated) to a smaller (larger) extent for a relatively larger (smaller) value of n (value of t). More specifically, the amount of overestimation (underestimation) of the parameters is smaller for a relatively large parameter value. As the value of n (value of t) increases (decreases), the PVs decrease (increase) for a fixed t (fixed n). To address the problem of selecting a suitable prior, one can observe that the JP has smaller PVs than the UP. The results based on real-life mixture data also support the Monte Carlo simulation study. Finally, it is concluded that the Bayes estimators using the JP perform better than the ML estimators because of their smaller variances.

Data Availability
The dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.