Information Approach for Change Point Analysis of EGGAPE Distribution and Application to COVID-19 Data

Department of Mathematics, Pan African Institute of Basic Science, Technology and Innovation, Nairobi, Kenya; Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya; Department of Accounting, College of Business Administration in Hawtat Bani Tamim, Prince Sattam bin Abdulaziz University, Hawtat Bani Tamim, Saudi Arabia; Department of Statistics and Operations Research, Faculty of Science, King Saud University, P. O. Box 2455, Riyadh 11451, Saudi Arabia; Mathematics Department, College of Science, Jouf University, P. O. Box 2014, Sakaka, Saudi Arabia; Department of Mathematics, Faculty of Science, Minia University, Minia 61519, Egypt


Introduction
The concept of a change point is important in statistical analysis because it helps identify the points at which the distribution of a time series changes. Change point analysis is of great interest in real-life settings such as health science, finance, and survival analysis. Change point inference involves determining whether a change point exists and then estimating the number and the positions of the change points. Since the inception of change point analysis, many studies have been conducted. Sen and Srivastava [1,2] proposed a statistic for detecting a change in the mean of normally distributed variables and derived its exact and asymptotic distributions. A change point in a binomial probability model, together with the power of the likelihood ratio and cumulative sum tests, was investigated in [3]. In the context of distributions, Ngunkeng and Ning [4] studied the change point problem for the generalized lambda distribution. For an exponential distribution characterized by repeated values, change point identification was carried out in [5]. Hassan et al. [6] obtained the weighted power Lomax distribution and its length-biased version. Shrahili et al. [7] discussed the alpha power moment exponential model with applications to biomedical science. Chen and Arjun [8] carried out an extensive study of the statistical change point problem in the parametric context and gave applications in the fields of finance, medicine, and genetics. Change point detection using the information approach for regular models was explored in [9]. Arellano-Valle et al. [10] considered the change point problem for the skew normal distribution using the Bayesian approach. Alghamdi et al. [11] studied the Rayleigh Lomax distribution and used the information approach to identify potential change points in the parameters.
Almetwally [12] carried out an extensive study of the odd Weibull inverse Topp-Leone distribution with applications to COVID-19 data. Recently, Ratnasingam [13] carried out an extensive study combining the modified information approach and the confidence distribution for the skew normal distribution.
In statistics, a change point is an unknown time point such that the observations follow different distributions before and after it. Formally, let $Z_1, Z_2, \ldots, Z_n$ be a sequence of independent random variables with CDFs $G_1, G_2, \ldots, G_n$, respectively. The change point problem entails testing the null hypothesis

$$H_0: G_1 = G_2 = \cdots = G_n \tag{1}$$

against the alternative

$$H_1: G_1 = \cdots = G_{k_1} \neq G_{k_1+1} = \cdots = G_{k_2} \neq \cdots \neq G_{k_q+1} = \cdots = G_n,$$

where $1 < k_1 < k_2 < \cdots < k_q < n$, $q$ denotes the number of change points, and $k_1, k_2, \ldots, k_q$ are the unknown locations of the change points that are to be estimated. Importantly, if $G_1, G_2, \ldots, G_n$ belong to the same parametric family, the change point problem becomes a test of the null hypothesis on the population parameters $\phi_i$, $i = 1, 2, \ldots, n$,

$$H_0: \phi_1 = \phi_2 = \cdots = \phi_n$$

versus the alternative

$$H_1: \phi_1 = \cdots = \phi_{k_1} \neq \phi_{k_1+1} = \cdots = \phi_{k_2} \neq \cdots \neq \phi_{k_q+1} = \cdots = \phi_n.$$

The Exponentiated Generalized Gull Alpha Power Exponential (EGGAPE) distribution was recently developed in [14]. It is flexible, and its hazard function can take various shapes depending on the values of the shape parameters. The EGGAPE distribution has shape parameters $\alpha, a, b > 0$ and scale parameter $\lambda$. The case $a = b = \alpha = 1$ is the exponential distribution; the case $\alpha = 1$ and $a = 1$ is the exponentiated exponential distribution; and the case $\alpha = 1$ is the exponentiated generalized exponential distribution.

Several authors have investigated the change point problem for several distributions. ElSherpieny and Almetwally [15] introduced the exponentiated generalized alpha power family. Jandhyala et al. [16] developed a change point methodology for identifying changes in the parameters of the two-parameter Weibull distribution. They used the likelihood ratio test statistic to detect unknown changes in the parameters and to locate the change points. Almongy et al. [17] discussed the likelihood function for the multicomponent stress-strength model under the power Lomax distribution. Hafez et al. [18] studied the likelihood of single and multiple ramp progressive stress with binomial removal. The model was applied to temperature data. Jarušková [19] tested for the presence of a change point in a three-parameter Weibull distribution using the log-likelihood statistic. Ratnasingam [20] proposed a procedure built on the MIC and the confidence distribution for detecting and estimating changes in a three-parameter Weibull distribution. To identify and locate changes in the parameters of the four-parameter EGGAPE distribution simultaneously, we present a methodology based on the information approach, specifically the modified information criterion (MIC) and the Schwarz information criterion (SIC). The proposed method can be applied to a variety of parametric distributions as long as the regularity and Wald conditions are met.

The rest of the study is organized as follows. Section 2 presents the approaches based on the MIC and SIC for detecting simultaneous changes in all parameters. Section 3 reports simulations run under a variety of parameter values and sample sizes to examine the power of the test. Section 4 applies the procedure to three COVID-19 datasets to demonstrate change point detection. The conclusions and areas for additional research are presented in Section 5.

Information Approach.
This section presents the methodology applied to detect possible change points. The modified information criterion (MIC) and the Schwarz information criterion (SIC) are discussed. The change point problem generally involves parameter estimation and hypothesis testing. More specifically, the null hypothesis that there is no change point is tested against the alternative that at least one change point exists. Model selection criteria are among the most popular tools for change point detection. The SIC was developed in [21]. As pointed out in [9], the SIC technique does not take into account the model's complexity, which might lead to redundancy in the parameter space. To address this shortcoming, Chen et al. [9] proposed the MIC, which adjusts the penalty term of the SIC to reflect the contribution of the change point location to model complexity. Let $y_1, y_2, \ldots, y_n$ be a random sample from a density function. The SIC is

$$\mathrm{SIC}(n) = -2 \log L(\hat{\Phi}) + q \log n,$$

where $L(\hat{\Phi})$ is the maximized likelihood function of the model, $n$ is the sample size, and $q$ is the number of parameters in the model. We denote by $\Phi_B$ and $\Phi_A$ the parameters before and after the change point, and by $k$ the unknown change point location. When at least one change point is present, the SIC in Equation (7), $\mathrm{SIC}(k)$, is computed for each candidate location $1 \le k < n$ from the likelihood maximized over $\Phi_B$ and $\Phi_A$. Equation (7) does not treat the change position as a parameter, which could result in redundancy in the parameter space if the change happens near the end or the beginning of the data. The MIC under the null hypothesis is given as

$$\mathrm{MIC}(n) = -2 \log L(\hat{\Phi}) + q \log n,$$

where $\hat{\Phi}$ maximizes the log likelihood $\log L(\Phi)$.
The MIC under the alternative hypothesis is defined as

$$\mathrm{MIC}(k) = -2 \log L(\hat{\Phi}_B, \hat{\Phi}_A) + \left[ 2q + \left( \frac{2k}{n} - 1 \right)^2 \right] \log n, \quad 1 \le k < n.$$

If $\mathrm{MIC}(n) > \min_{1 \le k < n} \mathrm{MIC}(k)$, we choose the model that has a change point, and the location of the change point is estimated by $\hat{\kappa}$ such that

$$\mathrm{MIC}(\hat{\kappa}) = \min_{1 \le k < n} \mathrm{MIC}(k).$$

The MIC test statistic, which is used to determine the statistical significance of a change point, is defined as

$$S_n = \mathrm{MIC}(n) - \min_{1 \le k < n} \mathrm{MIC}(k) + q \log n.$$

Chen et al. [9] showed that, as $n \to \infty$, $S_n \to \chi^2_q$ in distribution under the null hypothesis. The SIC test statistic is given in Equation (14), and its asymptotic distribution is the type I extreme value distribution.
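As a minimal sketch of how these criteria are evaluated in practice, the following Python code computes MIC(n), MIC(k), and the MIC-based statistic for a one-parameter exponential model, whose MLE has a closed form. The penalty [2q + (2k/n − 1)²] log n follows the MIC form of Chen et al. [9]; the exponential model, the function names, and the toy data are assumptions chosen for illustration and are not taken from this study.

```python
import math
import random

def exp_loglik(xs):
    """Maximized exponential log-likelihood (the rate MLE is 1/mean)."""
    n = len(xs)
    return -n * math.log(sum(xs) / n) - n

def mic_null(xs, q=1):
    """MIC(n) = -2 log L(Phi_hat) + q log n (no change point)."""
    return -2.0 * exp_loglik(xs) + q * math.log(len(xs))

def mic_alt(xs, k, q=1):
    """MIC(k) = -2 log L(before, after) + [2q + (2k/n - 1)^2] log n."""
    n = len(xs)
    loglik = exp_loglik(xs[:k]) + exp_loglik(xs[k:])
    return -2.0 * loglik + (2 * q + (2.0 * k / n - 1.0) ** 2) * math.log(n)

def mic_statistic(xs, q=1):
    """Return (S_n, k_hat) with S_n = MIC(n) - min_k MIC(k) + q log n."""
    n = len(xs)
    k_hat = min(range(1, n), key=lambda k: mic_alt(xs, k, q))
    s_n = mic_null(xs, q) - mic_alt(xs, k_hat, q) + q * math.log(n)
    return s_n, k_hat

rng = random.Random(1)
# Toy data (assumption): the exponential rate changes from 1 to 4 after observation 60.
data = ([rng.expovariate(1.0) for _ in range(60)]
        + [rng.expovariate(4.0) for _ in range(60)])
s_n, k_hat = mic_statistic(data)
print(f"S_n = {s_n:.2f}, estimated change location = {k_hat}")
```

For a model with q parameters, S_n would be compared with a chi-square critical value on q degrees of freedom, in line with the limiting result quoted from [9].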

MIC and SIC Detection Approach for EGGAPE Distribution.
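In this section, we use the SIC and MIC approaches to detect changes in the parameters of the EGGAPE distribution of [14]. Let $Z_1, Z_2, \ldots, Z_n$ be a sequence of independent random variables from the EGGAPE distribution with scale parameter $\lambda$ and shape parameters $\alpha$, $a$, and $b$. The null hypothesis is

$$H_0: (a_1, b_1, \alpha_1, \lambda_1) = \cdots = (a_n, b_n, \alpha_n, \lambda_n) = (a, b, \alpha, \lambda)$$

versus

$$H_1: (a_1, b_1, \alpha_1, \lambda_1) = \cdots = (a_k, b_k, \alpha_k, \lambda_k) \neq (a_{k+1}, b_{k+1}, \alpha_{k+1}, \lambda_{k+1}) = \cdots = (a_n, b_n, \alpha_n, \lambda_n),$$

where $1 < k < n$ denotes the unknown location to be estimated.
For $H_0$, the SIC and MIC are defined as

$$\mathrm{SIC}(n) = \mathrm{MIC}(n) = -2 \log L(\hat{a}, \hat{b}, \hat{\alpha}, \hat{\lambda}) + 4 \log n,$$

where $\hat{\lambda}, \hat{\alpha}, \hat{a}, \hat{b}$ are the MLEs of the scale parameter $\lambda$ and the shape parameters $\alpha, a, b$, respectively, fitted to the whole dataset. The log-likelihood function under $H_0$ is $\log L(a, b, \alpha, \lambda) = \sum_{i=1}^{n} \log f(z_i; a, b, \alpha, \lambda)$, where $f$ is the EGGAPE density. To obtain the MLEs of $a, b, \alpha, \lambda$, we introduce two auxiliary functions $w$ and $m$ and write their partial derivatives as $w'_a = \partial w/\partial a$, $w'_\alpha = \partial w/\partial \alpha$, $w'_\lambda = \partial w/\partial \lambda$, $m'_a = \partial m/\partial a$, $m'_b = \partial m/\partial b$, $m'_\alpha = \partial m/\partial \alpha$, and $m'_\lambda = \partial m/\partial \lambda$. The partial derivatives of the log-likelihood function with respect to $a, b, \alpha, \lambda$ give the score equations (22)-(25). The parameter estimates for $a, b, \alpha, \lambda$ are obtained by equating equations (22)-(25) to zero and solving the resulting system of nonlinear equations.
Under $H_1$, $\mathrm{SIC}(k)$ and $\mathrm{MIC}(k)$ are computed from the likelihood maximized separately over the two segments, with

$$\mathrm{MIC}(k) = -2 \log L(a^{*}, b^{*}, \alpha^{*}, \lambda^{*}, a^{**}, b^{**}, \alpha^{**}, \lambda^{**}) + \left[ 8 + \left( \frac{2k}{n} - 1 \right)^2 \right] \log n,$$

where $\alpha^{*}, \lambda^{*}, a^{*}$, and $b^{*}$ are the MLEs of $\alpha, \lambda, a$, and $b$, respectively, fitted to the first segment of data and $\alpha^{**}, \lambda^{**}, a^{**}$, and $b^{**}$ are the MLEs of $\alpha, \lambda, a$, and $b$, respectively, fitted to the second segment of data.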
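For illustration, the EGGAPE log-likelihood and the resulting MIC(k) can be evaluated numerically. The sketch below assumes the distribution function F(z) = [1 − (1 − αG(z)/α^{G(z)})^a]^b with G(z) = 1 − e^{−λz}. This form reproduces the three special cases listed in the Introduction, but it is an assumption that should be checked against [14]. The function names and the toy data are hypothetical, and the parameter values are passed in rather than fitted, since the fitting step is outside the scope of the sketch.

```python
import math

def eggape_cdf(z, a, b, alpha, lam):
    """Assumed EGGAPE CDF: [1 - (1 - alpha*G/alpha**G)**a]**b, G = 1 - exp(-lam*z)."""
    g = 1.0 - math.exp(-lam * z)
    h = g * alpha ** (1.0 - g)  # Gull alpha power transform of G
    return (1.0 - (1.0 - h) ** a) ** b

def eggape_logpdf(z, a, b, alpha, lam):
    """Log-density from differentiating the assumed CDF with the chain rule."""
    g = 1.0 - math.exp(-lam * z)
    h = g * alpha ** (1.0 - g)
    # dh/dz; positive when 0 < alpha <= e
    dh = lam * math.exp(-lam * z) * alpha ** (1.0 - g) * (1.0 - g * math.log(alpha))
    return (math.log(a) + math.log(b) + math.log(dh)
            + (a - 1.0) * math.log(1.0 - h)
            + (b - 1.0) * math.log(1.0 - (1.0 - h) ** a))

def loglik(zs, params):
    """Log-likelihood of a segment for params = (a, b, alpha, lam)."""
    return sum(eggape_logpdf(z, *params) for z in zs)

def mic_k(zs, k, before, after, q=4):
    """MIC(k) with penalty [2q + (2k/n - 1)^2] log n, q = 4 EGGAPE parameters."""
    n = len(zs)
    ll = loglik(zs[:k], before) + loglik(zs[k:], after)
    return -2.0 * ll + (2 * q + (2.0 * k / n - 1.0) ** 2) * math.log(n)

# Toy observations (assumption) and the parameter values used in the simulation study.
zs = [0.4, 1.3, 2.1, 0.7, 3.0, 0.2, 0.9, 1.6]
before = (0.9, 0.8, 0.2, 0.5)  # (a*, b*, alpha*, lambda*)
after = (1.2, 0.9, 0.4, 0.8)   # (a**, b**, alpha**, lambda**)
value = mic_k(zs, 4, before, after)
print(f"MIC(4) at the given parameter values: {value:.3f}")
```

In an actual analysis the two parameter tuples would be replaced by the segment-wise MLEs, obtained with a numerical optimizer, before MIC(k) is minimized over k.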

Simulation
In this section, simulations are carried out to assess the test's power in two scenarios: when there is a change point and when there is not. First, we conduct the simulation for SIC and MIC when there is a change point.

Simulation Study: When There Is a Change Point.
In this section, the change point problem for the scale and shape parameters of the EGGAPE distribution is studied. To calculate the statistics SIC(n), MIC(n), SIC(k), and MIC(k), the bbmle package developed in [22] was used to fit the EGGAPE distribution to a dataset, since the score equations obtained from the first derivatives of $\log f(z_i; a, b, \alpha, \lambda)$, $\log f(z_i; a^{*}, b^{*}, \alpha^{*}, \lambda^{*})$, and $\log f(z_i; a^{**}, b^{**}, \alpha^{**}, \lambda^{**})$ have no closed-form solutions. We conduct simulations 1000 times under EGGAPE$(a, b, \alpha, \lambda)$ with different values of the shape parameters $a, b, \alpha$ and the scale parameter $\lambda$. The test statistics $T_n$ and $S_n$ are calculated and compared to the critical values corresponding to the significance level 0.05.
The powers of the SIC and MIC tests for sample sizes $n = 100, 200, 300, 400$ and different change locations are shown in Tables 1-4. The EGGAPE parameters change as $a^{*} = 0.9$, $a^{**} = 1.2$, $b^{*} = 0.8$, $b^{**} = 0.9$, $\alpha^{*} = 0.2$, $\alpha^{**} = 0.4$, and $\lambda^{*} = 0.5$, $\lambda^{**} = 0.8$. The purpose of the power simulation is to verify the accuracy of detecting the change point at different locations. As indicated in Tables 1-4, the power increases as the change point location moves toward the middle of the data. The MIC has higher power to detect the change point than the SIC.
Compared with the traditional SIC, the MIC has higher power for the EGGAPE distribution, as shown in Figure 1. The advantage of the MIC is clearest when the change point location $k$ is in the middle of the dataset.
This is because the penalty term of the MIC contains $(2k/n - 1)^2$, which differs from the constant value of 1 in the SIC.
If the change point is located near the start or the end of the data, that is, as $k \to 1$ or $k \to n$ with $n \to \infty$, this term approaches 1 and the MIC is very close to the SIC. However, when the change point is in the middle of the dataset, as $k \to n/2$, the quadratic term $(2k/n - 1)^2$ vanishes. When the change point is exactly in the middle, $(2k/n - 1)^2 = 0$ and the penalty term of the MIC is $\log n$ smaller than that of the SIC. It is easier to reject the null hypothesis and detect a change in the data when the information criterion gets smaller. The main difference between the SIC and the MIC is therefore that the MIC has higher power than the SIC to detect a change that happens in the middle of the dataset, as displayed in Figures 1 and 2. The following conclusions can be drawn from the simulation study when there is a change point:
(i) As the change point location approaches the middle of the data, the power of the test increases.
(ii) When the difference between the parameters increases, the power of the test increases.
(iii) As the sample size increases, the power of the test also increases.
(iv) Since the MIC has higher power than the SIC, we use the MIC to detect change points in the real-data applications.
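The effect of the location-dependent penalty can be checked with a few lines of arithmetic. The sketch below tabulates the MIC penalty [2q + (2k/n − 1)²] log n against an SIC penalty of (2q + 1) log n, which is the form implied by the comparison above. Here q = 4 corresponds to the four EGGAPE parameters, and the function names are illustrative.

```python
import math

def mic_penalty(k, n, q=4):
    """MIC penalty: the change location contributes (2k/n - 1)^2, a value in [0, 1)."""
    return (2 * q + (2.0 * k / n - 1.0) ** 2) * math.log(n)

def sic_penalty(n, q=4):
    """SIC penalty when the change location is counted as one extra parameter."""
    return (2 * q + 1) * math.log(n)

n = 100
for k in (1, 25, 50, 75, 99):
    gap = sic_penalty(n) - mic_penalty(k, n)
    print(f"k = {k:3d}: SIC penalty - MIC penalty = {gap:.3f}")
```

The gap is largest at k = n/2, where it equals log n, and shrinks toward zero as k approaches either end of the data.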

Simulation Study: When There Is No Change Point.
In this section, we conduct a simulation study to investigate the power of the test, that is, its rejection rate, when there is no change point in the parameters of the distribution. The EGGAPE parameters are held fixed at $a^{*} = 0.9$, $a^{**} = 0.9$, $b^{*} = 0.8$, $b^{**} = 0.8$, $\alpha^{*} = 0.2$, $\alpha^{**} = 0.2$, and $\lambda^{*} = 0.5$, $\lambda^{**} = 0.5$.
We conduct simulations 1000 times under EGGAPE$(a, b, \alpha, \lambda)$ with different values of the shape parameters $a, b, \alpha$ and the scale parameter $\lambda$. The results for both the SIC and the MIC are given in Tables 5 and 6.
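A Monte Carlo loop of the kind described in this section can be sketched as follows. To stay self-contained, the sketch replaces the EGGAPE model with a one-parameter exponential model (closed-form MLE), uses 200 replications instead of 1000, and takes the chi-square 5% critical value on one degree of freedom, 3.841, for the MIC statistic. All of these are simplifying assumptions, so the printed numbers are not comparable to Tables 1-6.

```python
import math
import random

def seg_loglik(total, m):
    """Maximized exponential log-likelihood of a segment with sum `total` and size m."""
    return -m * math.log(total / m) - m

def mic_stat(xs):
    """S_n = MIC(n) - min_k MIC(k) + q log n for q = 1, using running sums."""
    n, total = len(xs), sum(xs)
    l0 = seg_loglik(total, n)
    best, run = -math.inf, 0.0
    for k in range(1, n):
        run += xs[k - 1]
        lk = seg_loglik(run, k) + seg_loglik(total - run, n - k)
        # MIC(n) - MIC(k) + log n simplifies to 2(lk - l0) - (2k/n - 1)^2 log n
        best = max(best, 2.0 * (lk - l0) - (2.0 * k / n - 1.0) ** 2 * math.log(n))
    return best

def rejection_rate(rate_before, rate_after, n=100, k=50, reps=200, crit=3.841, seed=0):
    """Share of simulated datasets in which the MIC statistic exceeds `crit`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.expovariate(rate_before) for _ in range(k)]
        xs += [rng.expovariate(rate_after) for _ in range(n - k)]
        hits += mic_stat(xs) > crit
    return hits / reps

power = rejection_rate(1.0, 3.0)  # rate changes at k = 50
size = rejection_rate(1.0, 1.0)   # no change point
print(f"rejection rate with change: {power:.3f}, without change: {size:.3f}")
```

The first rate plays the role of the power reported for the change point scenario, and the second is the rejection rate under no change.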

Italy COVID-19 Data.
This section explains the change point analysis of COVID-19 death rate data from Italy. Table 7 gives the data.
A time-series plot of the dataset is displayed in Figure 5. All the parameters are considered changeable.
To identify a change point in the Italy mortality rate dataset, we apply the test statistic defined in Equation (10). The results are displayed in Table 8.
From Table 8, $H_0$ is rejected, and we conclude that a change point exists at MIC(40), that is, at $k = 40$, which corresponds to a mortality rate of 5.073 and the date 2020-04-06. Based on the binary segmentation method, the dataset was separated into two parts: the first part is (1 : 40) and the second part is (41 : 59). In the first part, a second change point is identified at $k = 33$, which corresponds to 2020-03-30. Repeating the procedure, a change point is located at $k = 3$, corresponding to 2020-02-29. No further change points were located in this part. Next, we analyze the second part (41 : 59). Here the change point occurs at $k = 52$, corresponding to 2020-04-18. Repeating the procedure, no further change points were located. Figure 6 shows the change point locations for the Italy COVID-19 mortality rate. The possible reasons for the change points in the Italy COVID-19 mortality data are displayed in Table 9.
The change points divide the span into three segments. The first segment, from 2020-02-29 to 2020-03-30, is characterized by high mortality rates. The second segment, from 2020-03-30 to 2020-04-06, is characterized by a decline in mortality rates. The third segment, from 2020-04-06 to 2020-04-18, is characterized by a further decline in mortality rates.
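The binary segmentation procedure used above can be sketched generically: test a segment for a single change point, split it at the estimated location if the test rejects, and repeat on each part until no further change is found. In the sketch below, the single-change test is the MIC statistic for an exponential model with the chi-square 5% critical value 3.841, and a minimum segment length is imposed. These choices, like the function names and the toy data, are assumptions for illustration rather than the EGGAPE-based test applied to the Italy data.

```python
import math
import random

def seg_loglik(xs):
    """Maximized exponential log-likelihood of one segment."""
    m = len(xs)
    return -m * math.log(sum(xs) / m) - m

def single_change(xs):
    """Return (S_n, k_hat) for one segment under an exponential model (q = 1)."""
    n, l0 = len(xs), seg_loglik(xs)

    def score(k):
        lk = seg_loglik(xs[:k]) + seg_loglik(xs[k:])
        return 2.0 * (lk - l0) - (2.0 * k / n - 1.0) ** 2 * math.log(n)

    k_hat = max(range(1, n), key=score)
    return score(k_hat), k_hat

def binary_segmentation(xs, start=0, crit=3.841, min_len=10):
    """Recursively collect change locations (observations before each change)."""
    if len(xs) < 2 * min_len:
        return []
    s_n, k_hat = single_change(xs)
    if s_n <= crit:
        return []
    left = binary_segmentation(xs[:k_hat], start, crit, min_len)
    right = binary_segmentation(xs[k_hat:], start + k_hat, crit, min_len)
    return left + [start + k_hat] + right

rng = random.Random(7)
# Toy data (assumption): the rate changes after observations 50 and 100.
data = ([rng.expovariate(1.0) for _ in range(50)]
        + [rng.expovariate(4.0) for _ in range(50)]
        + [rng.expovariate(16.0) for _ in range(50)])
cps = binary_segmentation(data)
print("estimated change locations:", cps)
```

Each recursive call sees only its own segment, so locations are shifted by `start` to report them on the scale of the full series, as in the (1 : 40) and (41 : 59) split described above.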

Change Point Analysis for UK Data.
This section describes the change point analysis of COVID-19 mortality rate data from the United Kingdom for a period of 76 days, from March 12 to July 15, 2020. The data were obtained from https://COVID-19.who.int/ and are presented in Table 10.
A time-series plot of the dataset is displayed in Figure 7. All the parameters are considered changeable.
To identify the change points in the UK mortality rate dataset, we apply the test statistic defined in Equation (10). The results are displayed in Table 11. A visual display of the change point locations in the dataset is given in Figure 8. The possible causes of the change points in the UK COVID-19 mortality rate data are given in Table 12.
Mathematical Problems in Engineering

Change Point Analysis for Mexico Data.
This section explains the change point analysis of COVID-19 death rates for Mexico over a period of 108 days, from 4 March to 19 June 2020. The data were obtained from https://COVID-19.who.int/ and are given in Table 13.
A time-series plot of the dataset is displayed in Figure 9. To detect the change points in the Mexico mortality rate dataset, we apply the test statistic defined in Equation (10). The results are displayed in Table 14.
A visual display of the change point locations in the dataset is given in Figure 10. The possible causes of the change points in the Mexico COVID-19 mortality rate data are given in Table 15.

Conclusions
Although the EGGAPE distribution is flexible and can describe data with monotonic and nonmonotonic hazard shapes, few or no studies of the change point problem for this distribution have been carried out. In this study, we present a change point detection method for the four-parameter EGGAPE distribution based on the information approach, specifically the modified information criterion (MIC). All the parameters are considered changeable. The benefit of the MIC-based test is that it avoids deriving the complicated asymptotic distributions of the likelihood ratio and cumulative sum test statistics. In addition, we applied binary segmentation to detect multiple change points and their locations. In the simulation study of the power of the test, two scenarios were considered: one with a change point and one without. When there was a change point, the power of the test was high; when there was no change point, the rejection rate was low. The testing procedure was applied to three real datasets on COVID-19 mortality rates in Italy, the United Kingdom, and Mexico, and multiple change points and their locations were successfully identified. This study considered only the case where all the parameters change; future work could address the case where at least one of the parameters does not change.

Data Availability
The data used to support the findings of the study are available within the article.

Conflicts of Interest
The authors declare no conflicts of interest.