Sequential Test for a Mixture of Finite Exponential Distribution

Testing the number of components in a finite mixture is a challenging problem. In this paper, exponential finite mixtures are used to determine the number of components in a finite mixture. A sequential testing procedure is adopted based on the likelihood ratio test (LRT) statistic. The distribution of the test statistic under the null hypothesis is obtained using a resampling technique based on B bootstrap samples. The quantiles of the distribution of the test statistic are evaluated from the B bootstrap samples. The performance of the test is examined through the empirical power and an application to two real datasets. The proposed procedure is used not only for testing the number of components but also for estimating the optimal number of components in a finite exponential mixture distribution. The innovation of this paper is the sequential test, which tests the more general hypothesis of a finite exponential mixture of k components versus a mixture of k + 1 components. The special case of testing an exponential mixture of one component versus two components is the one commonly used in the literature.


Introduction
The exponential distribution, which is analytically very simple, plays a role in reliability and life testing analogous to that of the normal distribution in other areas. Consequently, the exponential distribution became a basic model for research associated with experiments on life expectancy. Applications of the exponential distribution include designing acceptance sampling plans [1], estimation of reliability in multicomponent stress-strength [2], and construction of multivariate control charts [3]. Also, neutrosophic statistics is applied when the data have uncertain parameters or values [4,5]. One reason to study this distribution in mixtures is related to its lack of memory property.
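Let x_1, x_2, ..., x_n be a random sample arising from a mixture of finite exponential distributions (MFED), whose density is

$$f(x; \Theta) = \sum_{i=1}^{k} p_i f_i(x; \theta_i), \qquad (1)$$

where k is the number of exponential components, Θ_i = (p_i, θ_i), i = 1, 2, ..., k, the mixing proportions satisfy p_i > 0 and p_1 + ... + p_k = 1, and f_i(x; θ_i) represents the density function of the i-th component,

$$f_i(x; \theta_i) = \frac{1}{\theta_i} e^{-x/\theta_i}, \quad x > 0. \qquad (2)$$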
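As a minimal sketch of the mixture density, the following Python functions evaluate the density, distribution function, and mean of a finite exponential mixture. They assume that each θ_i is the mean of its component (consistent with the sample mean being the MLE when k = 1); the function names are illustrative and do not come from the paper.

```python
import math

def mixture_pdf(x, p, theta):
    """Density of the exponential mixture at x >= 0.

    p holds the mixing proportions and theta the component means
    (mean parametrization is an assumption of this sketch).
    """
    return sum(pi / ti * math.exp(-x / ti) for pi, ti in zip(p, theta))

def mixture_cdf(x, p, theta):
    """Distribution function: 1 minus the weighted survival functions."""
    return 1.0 - sum(pi * math.exp(-x / ti) for pi, ti in zip(p, theta))

def mixture_mean(p, theta):
    """Mean of the mixture: the weighted average of the component means."""
    return sum(pi * ti for pi, ti in zip(p, theta))
```

For example, `mixture_pdf(0.0, [0.5, 0.5], [1.0, 2.0])` evaluates 0.5/1 + 0.5/2.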
Inference on the number of components can be conducted through statistical tests such as likelihood ratio tests. Several papers have dealt with bootstrapping the LRT. McLachlan [12] and Feng and McCulloch [13] used bootstrap resampling to test a single normal distribution against a mixture of two normal distributions; Feng and McCulloch [13] also noted that bootstrap resampling is a preferred method for normal mixtures with different variances. Seidel, Mosler, and Alker [14,15] used a mixture of two exponential distributions, and Sultan, Ismail, and Al-Moisheer [16] used a mixture of two inverse Weibull distributions. The discrete Poisson distribution was used by Karlis and Xekalaki [17]. Criteria for choosing the number of components in finite mixture models are discussed in, for example, McLachlan and Peel [11]. Various authors have suggested the simplest form of testing with the LRT, namely a single component against a two-component model. Here, the test procedure is proposed for k components against k + 1 components. This paper is arranged into six sections. In Section 2, an algorithm is presented to determine the number of finite exponential components by bootstrapping the LRT using R software packages. Section 3 contains the simulation results from computing the quantiles for an estimated number of finite exponential components using a sequential test. In Section 4, we evaluate the power of the sequential test when determining the number of finite exponential components. Criteria based on the likelihood are also applied to determine (estimate) the number of components in finite mixture models, namely the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the Hannan-Quinn information criterion (HQIC), and the consistent Akaike information criterion (CAIC). In Section 5, the sequential test procedures are presented. Finally, in Section 6, the conclusions on sequentially testing the number of finite exponential components are given.

Determining the Number of Components in the Finite Exponential Mixture
In this section, we use a sequential test to specify the number of components in the finite exponential mixture by using a resampling procedure called the bootstrap. McLachlan [12] and Feng and McCulloch [13] discussed the idea of bootstrapping the LRT. The general method for determining the number of components is based on the LRT, whose statistic is an appropriate test statistic for testing the hypotheses. The test statistic is defined as −2 ln λ, where λ represents the ratio between the maximized likelihood functions L_0 and L_1 under the null hypothesis (H_0) and the alternative hypothesis (H_1), respectively. Equivalently, the test statistic can be written as 2[ln L_1(Θ̂) − ln L_0(Θ̂)], where Θ̂ is the maximum likelihood estimator (MLE) of the parameter Θ under the corresponding hypothesis.
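Written out for the exponential mixture, the statistic takes the following form. This is a sketch that assumes the mean parametrization of the components, f_i(x; θ_i) = θ_i^{-1} e^{-x/θ_i}; the symbol ℓ_k for the log-likelihood of the k-component model is introduced here only for illustration.

```latex
% Log-likelihood of the k-component mixture (mean parametrization assumed)
\ell_k(\Theta) = \sum_{j=1}^{n} \ln\!\left( \sum_{i=1}^{k} \frac{p_i}{\theta_i}\, e^{-x_j/\theta_i} \right),
\qquad
-2\ln\lambda = 2\left[ \ell_{k+1}(\hat{\Theta}_{k+1}) - \ell_{k}(\hat{\Theta}_{k}) \right].

% Special case k = 1: \hat{\theta} = \bar{x}, so the null log-likelihood is closed form
\ell_1(\hat{\Theta}_1) = -n\left( \ln \bar{x} + 1 \right).
```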
Consider the hypothesis H_0: the number of components in the exponential finite mixture is k, against H_1: the number of components in the exponential finite mixture is k + 1. The testing procedure is sequential for k = 1, 2, ... using the LRT statistic. Bootstrapping the LRT is as follows.
(1) Find the MLE of the parameters Θ of the finite exponential mixture for k and k + 1 components and calculate the LRT statistic, which is referred to as L_obs. For the case k = 1, the MLE of Θ is Θ̂ = x̄, the sample mean.
(2) Generate B bootstrap samples, each of size n (n is the sample size), from the fitted exponential mixture of k components and, for each sample, calculate the value of −2 ln λ after obtaining the MLEs of Θ under k and k + 1 components.
The EM algorithm for a finite mixture of exponential distributions is used, as described in [15]; in its update equations, f(x; Θ) and f_i(x; θ_i) are the densities given in equations (1) and (2), respectively.
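The bootstrap steps and the EM fit can be sketched together in plain Python. This is an illustration under stated assumptions, not the paper's R implementation: the components are parametrized by their means, the EM updates are the standard ones for exponential mixtures, the starting values and the use of `acc` as a tolerance on the log-likelihood change are choices made here, the p value is the simple proportion of bootstrap statistics at least as large as L_obs, and all function names are hypothetical.

```python
import math
import random

def loglik(x, p, theta):
    """Mixture log-likelihood; theta holds the component means."""
    return sum(
        math.log(sum(pi / ti * math.exp(-xj / ti) for pi, ti in zip(p, theta)))
        for xj in x
    )

def em_fit(x, k, max_iter=200, acc=1e-5):
    """EM fit of a k-component exponential mixture; returns (p, theta, loglik)."""
    n, mean = len(x), sum(x) / len(x)
    p = [1.0 / k] * k
    # Assumed starting values: means spread geometrically around the sample mean.
    theta = [mean * 2.0 ** (i - (k - 1) / 2.0) for i in range(k)]
    ll = loglik(x, p, theta)
    for _ in range(max_iter):
        # E-step: responsibility of component i for observation x_j.
        dens = [[pi / ti * math.exp(-xj / ti) for pi, ti in zip(p, theta)] for xj in x]
        resp = [[d / sum(row) for d in row] for row in dens]
        # M-step: responsibility-weighted proportions and means.
        for i in range(k):
            s = sum(r[i] for r in resp)
            p[i] = s / n
            if s > 0.0:
                theta[i] = sum(r[i] * xj for r, xj in zip(resp, x)) / s
        new_ll = loglik(x, p, theta)
        converged = abs(new_ll - ll) < acc
        ll = new_ll
        if converged:
            break
    return p, theta, ll

def rmix(n, p, theta, rng):
    """Draw n observations from the exponential mixture."""
    comps = rng.choices(range(len(p)), weights=p, k=n)
    return [rng.expovariate(1.0 / theta[c]) for c in comps]

def lrt(x, k):
    """-2 ln(lambda) for k versus k + 1 components (clamped at 0), plus the null fit."""
    p0, t0, l0 = em_fit(x, k)
    _, _, l1 = em_fit(x, k + 1)
    return max(0.0, 2.0 * (l1 - l0)), p0, t0

def bootstrap_pvalue(x, k, B=100, seed=0):
    """Steps (1)-(2): observed statistic and its parametric-bootstrap p value."""
    rng = random.Random(seed)
    l_obs, p0, t0 = lrt(x, k)
    boot = [lrt(rmix(len(x), p0, t0, rng), k)[0] for _ in range(B)]
    return l_obs, sum(b >= l_obs for b in boot) / B

def sequential_k(x, alpha=0.05, k_max=4, B=100, seed=0):
    """Test k versus k + 1 for k = 1, 2, ... and return the first k not rejected."""
    for k in range(1, k_max):
        _, pval = bootstrap_pvalue(x, k, B=B, seed=seed)
        if pval > alpha:
            return k
    return k_max
```

A single EM start can stop at a local maximum, which is why the statistic is clamped at zero here; a more careful implementation would try several starting values. The paper uses B = 500, which is slow in pure Python, so smaller values of B are more practical for experimenting with this sketch.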

Simulation Results
In order to find the distribution of the LRT under the null hypothesis H_0, we use simulated data for the sequential test from a mixture of univariate exponential distributions. Accordingly, we use the mixtools package for R, which provides a set of functions to analyze a variety of finite mixture models. The rexpmix function is used to generate a random sample from a mixture of univariate exponential distributions. Then, we require the MLE of the mixing distribution; the function nlm can be used to find the best fit of Θ for the model. The test for the number of finite exponential components is applied to choose the number of components (k = 2, 3, 4). To calculate the quantiles of the LRT, we simulate the null distribution of −2 ln λ for the sample sizes n = 20, 50, 75, and 100 according to the stopping criteria described in [14], using acc = 10^−5 as the level of accuracy. Each distribution is generated for the parameters in the model with 500 bootstrap samples. Tables 1-3 present the quantiles at the significance levels 10%, 5%, 2.5%, 1%, and 2%. For k = 2 and k = 3, the test always rejected the number of components at sample size n = 20; the test was repeated five times, as shown in Tables 2 and 3. As a representative example from the simulation, for k = 3, sample sizes n = (20, 50, 75, 100), and the parameters (0.8, 0.15, 0.5, 2, 4), the best estimates of L_obs are (2.03, 3.6, 6.54, 10.3), respectively. Further, the simulation results depend on the following factors.
As shown in Tables 1-3, when the sample size increases, so do the values of the quantiles. For example, in Table 1, at k = 2 with the parameters (0.9, 0.35, 5) and sample size n = 20, we get results at α = 0.01 for the five repetitions and once at the significance level α = 0.025, while for n = 100, we get results at the significance levels α = (0.01, 0.025, 0.05). The same goes for k = 3 and k = 4 (see Figure 1, in particular k = 4, which shows how the sample size affects the values of the quantiles and the level of acceptance for α).

With respect to the initial values of the parameters, where the initial vector consists of (p_1, ..., p_(k−1), θ_1, ..., θ_k), the simulation results reveal that when there is a large difference between the initial values of the parameters θ, the number of rejections decreases and quantile results are obtained for high values of α. This is clear at k = 3 with the parameters (0.12, 2.97, 2.92), where the maximum number of accepted quantiles is at α = 0.05, while for the parameters (0.52, 2, 4), the accepted values of the LRT are at the significance levels α = 0.01 and 0.5 for sample size n = 75.

For the mixing proportion, when the initial value of p_1 or the sum of (p_1, ..., p_(k−1)) is closer to 1, the number of rejections decreases. This is clear when we compare the results at k = 3: for (p_1 = 0.6, p_2 = 0.3), the accepted significance levels are up to α = 0.025 for the large sample size n = 100, whereas for the mixing proportions (p_1 = 0.8, p_2 = 0.15), the accepted values of α are up to 0.5 for the same sample size.

Finally, regarding the number of components k, the tables for the LRT show that the maximum level of significance α increases with k: it is 0.9 at k = 4 and 0.5 at k = 3, whereas at k = 2 it is 0.1.
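How a quantile is read off from the simulated null distribution can be sketched as follows. The function takes the B bootstrap values of −2 ln λ and returns an order statistic as the critical value; the choice j = ⌈(1 − α)B⌉ is an assumption of this sketch, since the exact index used in the paper is not stated here, and the function names are hypothetical.

```python
import math

def bootstrap_critical_value(stats, alpha):
    """Upper-alpha critical value from bootstrap statistics.

    Returns the j-th order statistic with j = ceil((1 - alpha) * B);
    this indexing rule is an assumption of the sketch.
    """
    ordered = sorted(stats)
    b = len(ordered)
    # The small tolerance guards against floating-point overshoot in (1 - alpha) * B.
    j = math.ceil((1.0 - alpha) * b - 1e-9)
    return ordered[min(max(j, 1), b) - 1]

def reject_null(l_obs, stats, alpha):
    """Reject H_0 (k components) when L_obs exceeds the bootstrap critical value."""
    return l_obs > bootstrap_critical_value(stats, alpha)
```

With B = 500 and α = 0.05, for instance, this rule picks the 475th smallest bootstrap statistic.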

The Power of the Sequential Test
The empirical power of the sequential test of k components against k + 1 is evaluated for sample sizes n = 20, 50, 75, and 100 and k = 1, 2, 3, 4. The empirical power is defined as the proportion of times H_0 was rejected when the data were generated under H_1. The power is simulated with 500 bootstrap samples and different choices of parameters, and it is calculated at the significance levels 10%, 5%, 2.5%, and 1%.
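The definition of empirical power translates directly into code. In this minimal sketch, `alt_stats` is assumed to hold the values of −2 ln λ computed on samples generated under H_1, and `critical_values` maps each significance level to its critical value from the null distribution; both names are illustrative.

```python
def empirical_power(alt_stats, critical_value):
    """Proportion of samples generated under H_1 for which H_0 is rejected."""
    rejections = sum(1 for s in alt_stats if s > critical_value)
    return rejections / len(alt_stats)

def power_by_level(alt_stats, critical_values):
    """Empirical power at each significance level in critical_values."""
    return {alpha: empirical_power(alt_stats, c) for alpha, c in critical_values.items()}
```

Because the critical value grows as α shrinks, the power computed this way can only stay the same or decrease when moving from the 10% level to the 1% level.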
For each case, we study the effect on the power of the test of increasing the distance between the parameters. We also study the effect of increasing the mixing proportion and the sample size n. The power results are shown in Tables 4-6. In each case, when the sample size increases, the power improves for every number of components (k = 2, 3, 4). When testing k = 2 components against k = 3, the power increases for larger sample sizes, from n = 50 upward. When testing k = 3 components against k = 4 with the small sample size n = 20, the power always decreases. The empirical power is thus affected by the sample size, and not the reverse (see Figures 2-4).

Application
The sequential test procedure is applied to two real datasets as follows.

Application (1).
The data considered in this application are given by Maswadah [18].

Application (2).
This is an application of the sequential test for fitting exponential mixtures. According to Smith and Naylor [19], the data represent the strength of 1.5 cm glass fibers.

Application
We apply the sequential test for the number of components in an exponential finite mixture to the above two real datasets, with sample sizes n = 20 and 63, respectively. The sequential results for the two applications are given in Tables 7 and 8. Column 1 in Tables 7 and 8 contains the number of components k in the finite mixture of exponentials. Column 2 contains the values of the LRT statistic for testing k versus k + 1. The p values of the test, which lie between 0 and 1, are obtained by using 500 bootstrap samples, as described previously. The last four columns contain the four information criteria (AIC, BIC, HQIC, and CAIC). Tables 7 and 8 lead to the selection of the mixture model with one component. It can also be seen from Tables 7 and 8 that the LRT increases when the number of components decreases, while the information criteria increase as the number of components increases. Briefly, the best mixture model is the one with k = 1 because it has the largest p value and the minimum values of the four criteria used (Algorithm 1).
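The four information criteria can be computed from the maximized log-likelihood as sketched below. The formulas are the standard textbook definitions and are an assumption here, since the exact expressions used for the tables are not given in the text; in particular, CAIC is taken in Bozdogan's form, while some authors use a small-sample corrected AIC under the same abbreviation. A k-component exponential mixture is counted as having 2k − 1 free parameters (k means and k − 1 mixing proportions).

```python
import math

def information_criteria(loglik, k, n):
    """AIC, BIC, HQIC, and CAIC for a k-component exponential mixture.

    loglik is the maximized log-likelihood and n the sample size.
    The formulas are standard definitions assumed for this sketch.
    """
    m = 2 * k - 1  # free parameters: k means and k - 1 mixing proportions
    base = -2.0 * loglik
    return {
        "AIC": base + 2.0 * m,
        "BIC": base + m * math.log(n),
        "HQIC": base + 2.0 * m * math.log(math.log(n)),
        "CAIC": base + m * (math.log(n) + 1.0),
    }
```

The model with the smallest value of a criterion is preferred, which matches the way the tables are read in the text.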

Conclusion
In this paper, the sequential testing of the number of components in an exponential finite mixture is discussed. Simultaneously, the optimal number of components for a finite exponential mixture is determined to provide an appropriate fit to the data. A resampling approach based on B bootstrap samples is used to determine the number of components. Bootstrap samples are generated from the finite exponential mixture under H_0, and the value of −2 ln λ_n is evaluated for each bootstrap sample. This process is repeated for 500 bootstrap samples to obtain the j-th order statistic as an estimate of the quantile. The power results for the estimated number of finite exponential components are computed. Two applications to real data are used to illustrate the sequential test. Thus, the innovation in this sequential test is that it permits testing the hypothesis of k components in an exponential mixture against k + 1 components, along with determining the optimal exponential mixture. It therefore provides a more general method than the one commonly used for exponential mixtures, which focuses on testing one component versus a mixture of two components. The importance of this sequential test lies in cluster analysis and other applications. Finally, the sequential test, which was applied here to finite exponential mixtures, can be applied to finite mixture models from any family of mixtures.

Data Availability
The data used to support the findings of this study were obtained from Maswadah [18] and Smith and Naylor [19].

Conflicts of Interest
The author declares that there are no conflicts of interest.