The Analysis of the Incidence Rate of the COVID-19 Pandemic Based on Segmented Regression for Kuwait and Saudi Arabia

Since the initial detection of the novel coronavirus in Wuhan, China, at the end of 2019, the virus has spread rapidly worldwide and has become a global health threat. Rising infections have had a significant impact on society as well as the economy. Although vaccines and treatments are now available, appropriate strategies and policies are still needed to control the pandemic's spread. This study evaluates incidence, namely positive rates and mortality rates, through breakpoints, an analysis that has not previously been undertaken for Kuwait and Saudi Arabia. Our objectives are two-fold: (1) to forecast the cumulative confirmed cases and death cases and (2) to compute the incidence rate within two consecutive days and evaluate the forecasted periods. The autoregressive integrated moving average model is used to forecast the cumulative confirmed cases and death cases for two months. The segmented regression model is used to split the pandemic time series into six periods and compute the incidence rate. Our results show that cumulative confirmed cases will reach 335,733 in Kuwait and 445,805 in Saudi Arabia by the beginning of June 2021. The cumulative death cases will reach 1,830 and 7,283 in Kuwait and Saudi Arabia, respectively. The positive rate will increase during the forecasted period in both countries, while the death rate will decrease in Kuwait and increase in Saudi Arabia. The study results can help public health organizations and decision-makers to control the spread of infectious diseases.


Introduction
During the past two decades, several severe outbreaks, such as the severe acute respiratory syndrome coronavirus (SARS-CoV), H1N1 influenza, and Middle East respiratory syndrome coronavirus (MERS-CoV), have been reported in Middle East countries [1]. Most recently, an outbreak of unknown respiratory infections was reported on December 31, 2019, in Wuhan, China [1,2]. Initially, the outbreak spread rapidly from Hubei province to all provinces of China and then to other countries. By February 6, 2020, 22,112 confirmed cases had been reported in Wuhan and 31,481 in mainland China [3].
The World Health Organization (WHO) named the disease coronavirus disease 2019 (COVID-19) [4] and declared it a global pandemic on March 11, 2020 [5]. The first case of COVID-19 was directly linked to the Huanan Seafood Wholesale Market in Wuhan, where transmission from animals to humans was considered the main source. This virus can cross barriers and cause severe disease in society. Later, it was concluded that COVID-19 is transmitted from person to person and that symptomatic people are the main source of spreading the infection [1]. Shortness of breath, fever, and cough are the main symptoms of COVID-19 [6]. The new virus is highly infectious and has spread quickly worldwide [2]; the first case was diagnosed on February 24, 2020, in Kuwait [7] and on March 2, 2020, in Saudi Arabia [8]. Infections are on the rise around the world and pose a serious threat to human lives. The impact of the pandemic has not been limited to people's health; it has also had a huge impact on the world economy [5]. Many governments worldwide have introduced strict curfews, partial lockdowns, and quarantine and have called for social distancing and work-from-home options to reduce the spread of the outbreak [2].
This pandemic has raised many questions, among them the following: (i) What will be the duration of the pandemic's peak? (ii) How many people will be affected during the peak of the pandemic? (iii) What will be the infection rate in the future? Many researchers around the world have performed statistical analyses to address them. At a time when the number of reported cases is growing rapidly, forecasting is vital to curb further spread [9]. The study in [8] collected the daily confirmed, death, and recovery cases from March 2 to May 15, 2020, with the aim of forecasting the confirmed cases across Saudi Arabia using logistic growth (LG) and susceptible-infected-recovered (SIR) models. The authors predicted a total of 69,979 to 79,000 cases and stated that the pandemic would reach its final stage by the end of June 2020. It is also very important to know the spread rate of a pandemic at a specific time. The authors in [10] presented a statistical framework and analyzed the growth and doubling rates of COVID-19 for six European countries using breakpoints. The work in [11] compared the propagation characteristics of COVID-19 with those of SARS and MERS using a growth model. The daily time series of COVID-19 was collected from January 21 to March 18, 2020, for four regions of China, while the SARS and MERS data were collected from different regions of the world where these diseases had a significant influence.
The results indicate that the growth rate of COVID-19 is twice that of SARS and MERS. This means that, without intervention, the outbreak can spread and cases can double within two to three days. The authors in [12] introduced a new class of statistical models to describe COVID-19 data. They used the daily confirmed, death, and recovery cases of Asian countries up to April 18, 2020, and studied in detail a submodel of the proposed class known as the extended Weibull distribution.
Their results indicate that the proposed model may provide a good fit to the death cases of COVID-19. Another study [13] proposed the NG-Weibull distribution to describe the daily death cases of Iran and China from January 23 to April 2020. The authors suggest that the proposed model fits the daily death cases very closely. The authors in [14] used the daily infected and recovered cases from February 2 to April 15, 2020, to evaluate the impact of intervention policy on the transmission rate in South Korea. The SIR model was applied with and without join points. They observed that the spread rate fell to 0.23 after the estimated breakpoint on March 7, 2020. The researchers in [7] used deterministic and stochastic models to estimate the size of the COVID-19 outbreak before and after repatriation.
They found that the estimated reproductive number ($R_0$) in Kuwait was 2.2 before the repatriation and until April 10, 2020. Their findings also confirm that containment measures were effective in controlling transmission in the State of Kuwait even after the repatriation plan. The authors in [15] used the daily confirmed cases from February 23 to May 7, 2020, to understand the temporal and spatial patterns for two socioeconomic communities: citizen residents and migrant workers. They found that the outbreak was continuously growing ($R_0 > 2$), an indication of significant transmission. They also reported significant spread in migrant workers' areas. However, the government's intervention measures significantly reduced the infection rates in those areas.
Because COVID-19 cases continue to increase, the spread is not slowing down in Kuwait and Saudi Arabia. It is essential to know the outbreak's trend, duration, peak, and incidence rate within a specific period, none of which has been examined for both countries. In this paper, we forecast the pandemic cases and then give insight into the incidence rate within two days as well as across the periods. Analyzing the incidence rate and forecasting during the outbreak is essential for preventing further spread and can support policy decisions such as stay-at-home orders and lockdowns. The rest of the article is arranged as follows: Section 2 describes the study area and methodology. Sections 3 and 4 cover the results and discussion of forecasting and incidence rates. Section 5 contains the conclusion and recommendations.
The source website updates the global data of the outbreak country by country. The study area is shown in Figure 1. The daily cumulative time series consists of two variables, namely, confirmed cases and death cases. For Kuwait, the daily cumulative time series of confirmed and death cases covers February 24, 2020, to April 11, 2021. Likewise, for Saudi Arabia, the daily cumulative time series of confirmed and death cases covers March 2, 2020, to April 11, 2021.

Autoregressive Integrated Moving Average Model (ARIMA).
The study in [16] presented the autoregressive integrated moving average (ARIMA) model as a modification of the autoregressive moving average (ARMA) model. The ARMA model is applicable only when the series is stationary, while the ARIMA model can deal with both stationary and nonstationary series.
The ARIMA model has three parameters: $p$, $d$, and $q$. The orders $p$ and $q$ are associated with the autoregressive (AR) and moving average (MA) parts, whereas $d$ is the differencing parameter used to make the series stationary. According to [17,18], the general ARIMA model can be defined as

$$\phi(\beta)\,\nabla^{d} y_t = \varphi(\beta)\,\varepsilon_t,$$

where $\phi(\beta)$ is associated with the AR part of order $p$, $\varphi(\beta)$ is associated with the MA part of order $q$, $y_t$ is the observed series, and $\varepsilon_t$ is the error term. Moreover, the notation $\nabla^{d}$ indicates differencing of order $d$. The construction of the ARIMA model is based on three stages: identification, parameter estimation, and model diagnostics. The time series analysis was carried out in R version 3.6.1 with the "forecast" package.
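To make the role of the differencing parameter $d$ concrete, the following is a minimal Python sketch (the study itself used R) of applying the differencing operator to a cumulative series. The function name `difference` and the sample counts are hypothetical and are not taken from the study's data.

```python
def difference(series, d=1):
    """Apply first differencing d times: each pass maps y_t to y_t - y_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


# Hypothetical cumulative counts, for illustration only.
cumulative = [10, 14, 21, 31, 44, 60]

daily_new = difference(cumulative, d=1)      # daily increments
change_in_new = difference(cumulative, d=2)  # change in daily increments

print(daily_new)      # [4, 7, 10, 13, 16]
print(change_in_new)  # [3, 3, 3, 3]
```

Each pass shortens the series by one observation. In this toy series the second difference is constant, which is the kind of behavior that differencing is meant to expose before the AR and MA orders are chosen.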

Generalized Linear Model (GLM).
The generalized linear model (GLM) is a generalization of the classical linear regression model, as it permits a response variable whose error distribution is other than the normal distribution. The GLM consists of three components: a random component, a systematic component, and a link function. The link function connects the systematic component to the random component. According to [19], the general form of the GLM can be written as

$$g(\mu_i) = a + \beta_1 z_1 + \beta_2 z_2 + \cdots + \beta_k z_k,$$

where $g(\mu_i)$ represents the link function, $z_1, \ldots, z_k$ are the explanatory variables, and $a$ and $\beta_i$ ($i = 1, 2, 3, \ldots, k$) indicate the intercept and slope coefficients. Commonly used link functions include the identity, log, power, square root, and logit. When the response variable is a count, the Poisson model is frequently used as the response distribution. In the Poisson model, the logarithm link function, i.e., $\log(\mu)$, is utilized to explain the response mean $\mu$ in terms of an explanatory variable $z$. The general form of the Poisson link function is as follows:

$$\log(\mu) = a + \beta z.$$
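As an illustration of the Poisson model with a log link, the sketch below fits $\log(\mu) = a + \beta z$ with a single explanatory variable by Newton-Raphson on the Poisson log-likelihood. It is not the authors' R code: the function names, the starting values, and the sample data are assumptions made for this example. The helper `percent_change_per_step` shows one common reading of a log-link slope as a percentage change per unit of $z$; the source does not state exactly how its rates were derived.

```python
import math


def fit_poisson_loglinear(z, y, iterations=50, tol=1e-10):
    """Maximum likelihood fit of log(mu) = a + b*z for Poisson-type responses."""
    n = len(z)
    # Starting values (an assumption): least squares of log(y + 0.5) on z.
    logs = [math.log(v + 0.5) for v in y]
    z_bar = sum(z) / n
    l_bar = sum(logs) / n
    szz = sum((zi - z_bar) ** 2 for zi in z)
    b = sum((zi - z_bar) * (li - l_bar) for zi, li in zip(z, logs)) / szz
    a = l_bar - b * z_bar
    for _ in range(iterations):
        mu = [math.exp(a + b * zi) for zi in z]
        # Score vector of the Poisson log-likelihood.
        g_a = sum(yi - mi for yi, mi in zip(y, mu))
        g_b = sum(zi * (yi - mi) for zi, yi, mi in zip(z, y, mu))
        # Fisher information matrix (2 x 2).
        h_aa = sum(mu)
        h_ab = sum(zi * mi for zi, mi in zip(z, mu))
        h_bb = sum(zi * zi * mi for zi, mi in zip(z, mu))
        det = h_aa * h_bb - h_ab ** 2
        # Newton step: inverse information times score.
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a += step_a
        b += step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return a, b


def percent_change_per_step(b):
    """Percentage change in the mean for a one-unit increase in z."""
    return (math.exp(b) - 1.0) * 100.0


# Hypothetical daily counts, for illustration only.
days = list(range(10))
counts = [3, 4, 3, 5, 6, 6, 8, 9, 11, 12]
a_hat, b_hat = fit_poisson_loglinear(days, counts)
print(a_hat, b_hat, percent_change_per_step(b_hat))
```

With only two coefficients, the Newton step can be written out with the explicit inverse of the 2 x 2 information matrix, which keeps the example free of external libraries.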

Segmented Regression Model (SRM).
A nonlinear relationship can be approximated by two, three, or more straight lines connected at unknown points, which are referred to as breakpoints, join points, or change points. The relationship between the mean response $E[y_0] = \mu$ and the explanatory variable $z$ is modeled by adding segmented terms to the linear predictor. Writing $(z - \varphi_i)_{+} = \max(0, z - \varphi_i)$, the general segmented relationship of Muggeo et al. [20] can be described as

$$\beta_1 z + \sum_{i=1}^{q} \phi_i\,(z - \varphi_i)_{+},$$

Mathematical Problems in Engineering
where $\varphi_i$ ($i = 1, 2, 3, \ldots, q$) represent the breakpoints. Additionally, $\beta_1$ is the slope of the first segment and $\phi_i$ ($i = 1, 2, 3, \ldots, q$) are the differences in slope at the breakpoints. Under the segmented relationship, the time axis is split into $q + 1$ intervals based on event occurrence patterns.
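The segmented relationship can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the estimation procedure of the SRM: the breakpoints are taken as known, an intercept term is added for convenience, and the function names and numerical values are hypothetical.

```python
def segmented_predictor(z, intercept, slope, slope_changes, breakpoints):
    """Evaluate intercept + slope*z + sum_i change_i * max(z - breakpoint_i, 0)."""
    value = intercept + slope * z
    for change, point in zip(slope_changes, breakpoints):
        value += change * max(z - point, 0.0)
    return value


def segment_slopes(slope, slope_changes):
    """Slope on each of the q + 1 intervals: the first slope plus cumulative changes."""
    slopes = [slope]
    for change in slope_changes:
        slopes.append(slopes[-1] + change)
    return slopes


# Hypothetical values: two breakpoints (q = 2) give three intervals.
breaks = [2.0, 4.0]
changes = [-0.3, 0.1]
print(segmented_predictor(5.0, 1.0, 0.5, changes, breaks))
print(segment_slopes(0.5, changes))
```

Because each hinge term is zero to the left of its breakpoint, the slope on a given interval is the first slope plus all slope differences at earlier breakpoints, which is what `segment_slopes` accumulates.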

Akaike Information Criterion (AIC).
Several tools are available in the literature to measure the goodness of fit of models. One of the most commonly used is the Akaike Information Criterion (AIC), which estimates the relative quality of models. According to [21], the AIC can be described as

$$\mathrm{AIC} = -2\log(L') + 2P,$$

where $L'$ represents the likelihood and $P$ is the number of parameters of the model. Among several candidate models, the one with the minimum AIC value is preferred.
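The selection rule can be illustrated with a short Python sketch. The candidate names, log-likelihood values, and parameter counts below are hypothetical and are not results from the study.

```python
def aic(log_likelihood, n_params):
    """AIC = -2 * log-likelihood + 2 * number of parameters."""
    return -2.0 * log_likelihood + 2.0 * n_params


# Hypothetical candidates: name -> (maximized log-likelihood, parameter count).
candidates = {
    "model_a": (-512.4, 3),
    "model_b": (-510.9, 5),
    "model_c": (-515.0, 2),
}

scores = {name: aic(ll, p) for name, (ll, p) in candidates.items()}
best = min(scores, key=scores.get)
print(scores)
print(best)  # model_a
```

Here `model_b` has the highest likelihood, but its two extra parameters cost more than the improvement in fit, so the criterion prefers `model_a`.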

Accuracy Measure Tools.
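Several tools measure the accuracy of a stochastic model; the most frequently used are the mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE). According to [22], these tools are as follows:

$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|y_{0,\mathrm{obs},i} - y_{0,\mathrm{pred},i}\right|,$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_{0,\mathrm{obs},i} - y_{0,\mathrm{pred},i}\right)^{2}},$$

$$\mathrm{MAPE} = \frac{100}{m}\sum_{i=1}^{m}\left|\frac{y_{0,\mathrm{obs},i} - y_{0,\mathrm{pred},i}}{y_{0,\mathrm{obs},i}}\right|,$$

where $y_{0,\mathrm{obs},i}$ and $y_{0,\mathrm{pred},i}$ are the observed and predicted values of the time series and $m$ indicates the length of the time series. The lowest values of these measures indicate the most accurate forecast.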
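The three accuracy measures are straightforward to compute directly. The sketch below uses their standard definitions; the function names and the sample values are hypothetical.

```python
import math


def mae(observed, predicted):
    """Mean absolute error."""
    m = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / m


def rmse(observed, predicted):
    """Root mean square error."""
    m = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / m)


def mape(observed, predicted):
    """Mean absolute percentage error, in percent; observed values must be nonzero."""
    m = len(observed)
    return 100.0 / m * sum(abs((o - p) / o) for o, p in zip(observed, predicted))


# Hypothetical observed and predicted values, for illustration only.
obs = [100.0, 200.0, 400.0]
pred = [110.0, 190.0, 400.0]
print(mae(obs, pred), rmse(obs, pred), mape(obs, pred))
```

MAPE is undefined when an observed value is zero, which is worth keeping in mind for the earliest days of a cumulative death series.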

Results
In the present study, we used daily time series of cumulative confirmed and death cases for Kuwait and Saudi Arabia to describe the incidence rate within two consecutive days. Initially, the ARIMA model is applied to both cumulative time series to forecast two months ahead, from April 12 to June 11, 2021. The time series plot in Figure 2 visualizes the trend of cumulative cases. Figure 2 shows that (a) since May 24, the number of confirmed cases in Saudi Arabia has been increasing compared with that in Kuwait and (b) similarly, the number of death cases in Saudi Arabia has been rising, while that in Kuwait has dropped slightly.
In the identification phase, the Anderson-Darling (AD) test was applied to verify the stationarity assumption for each daily cumulative time series. Initially, the parameters of the ARIMA models were estimated by inspecting the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots, as described in [17]. Subsequently, several combinations of parameters were used to test the suitability of the ARIMA models, and their effectiveness was verified by the lowest values of MAE, RMSE, and MAPE. Of the suitable ARIMA models, the best model was selected based on the minimum value of AIC. The parameters of the best-fitted model were estimated by maximum likelihood estimation (MLE), and details are available in Table 1. The next step is to inspect the residuals of the model, which should be uncorrelated and normally distributed. Therefore, autocorrelation in the residuals was tested with the Ljung-Box test and by inspecting the ACF. Furthermore, the normality assumption of the residuals was confirmed by the Shapiro-Wilk test and a histogram. The estimation details of the best-fitted ARIMA models for each cumulative time series are described in Table 2.
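As a rough illustration of the residual check, the Ljung-Box statistic can be computed from the sample autocorrelations of the residuals as $Q = m(m+2)\sum_{k=1}^{h} r_k^{2}/(m-k)$, where $m$ is the number of residuals and $h$ the number of lags. The sketch below computes only the statistic; turning it into a p-value requires a chi-square distribution, which is left out here. The function names and the residual values are hypothetical.

```python
def autocorrelation(residuals, lag):
    """Sample autocorrelation of the residuals at the given lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denominator = sum(c * c for c in centered)
    numerator = sum(centered[t] * centered[t - lag] for t in range(lag, n))
    return numerator / denominator


def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(residuals)
    total = sum(
        autocorrelation(residuals, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )
    return n * (n + 2) * total


# Hypothetical residuals with strong alternation, i.e. clear autocorrelation.
residuals = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
print(ljung_box_q(residuals, max_lag=1))  # 8.75
```

A large value of Q relative to the chi-square reference indicates that the residuals are still autocorrelated, in which case the fitted model would need to be revisited.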
The residuals of the best-fitted ARIMA models satisfied all required conditions. Therefore, a two-month-ahead forecast, from April 12 to June 11, 2021, was produced for each cumulative time series. The forecasts, along with their 95% confidence intervals, are shown in Figures 3 and 4.
Figure 3 shows that, if the current strategy continues, the cumulative confirmed cases in Kuwait might reach 292,139 in May and 335,733 in June 2021, while in Saudi Arabia they might reach 422,520 and 445,805 in May and June 2021, respectively.
Figure 4 shows that the cumulative death cases in Kuwait might reach 1,612 in May and 1,830 in June 2021, while in Saudi Arabia they might reach 7,022 and 7,283 in May and June 2021, respectively.
The forecasted cumulative time series from April 12 to June 11, 2021, is merged with the historical time series to construct a new time series for analyzing the incidence rate. Incidence may refer to positive (confirmed) cases of COVID-19 or to death cases. Each country's new cumulative time series is used to fit the GLM with the Poisson link function, which is chosen because the data are counts. The complete estimation details of the GLM are described in Table 3. The next step is to split the time series into intervals, whose boundaries are the breakpoints (also called change points or join points). The SRM is used to partition the new time series of confirmed and death cases and to fit a separate line segment on each interval in order to analyze the incidence rate within two days. The aim of the splitting is to examine the incidence rate over a specified period. Finally, the SRM split the new time series at five breakpoints, selected by the lowest BIC value; details can be seen in Table 4.
We obtained five breakpoints, which are the connecting points between the lines. We treat these lines as periods within which the trend of cases is similar. The different periods are shown in different colors in the figures. The incidence rate with a 95% confidence interval is calculated for each period and shown in Figures 5 and 6.
It is important to note that the vertical black dashed line in both figures separates historical from forecast observations.
Figure 5 shows that, in Kuwait, the positive rate was highest in period-1 (February 24 to May 19), at 7.9032%, and then gradually decreased to 3.1656%, 1.3929%, 0.6649%, and 0.2167% in period-2, period-3, period-4, and period-5, respectively. The positive rate increased to 0.5559% from January 25 to April 11, 2021, and then stayed stable in the forecast period (period-6). In Saudi Arabia, the positive rate was high in period-1, at 9.5177% from March 2 to May 5, 2020, and then gradually changed, reaching 3.1886%, 1.6106%, 0.429%, and 0.0759% in period-2, period-3, period-4, and period-5, respectively. Moreover, the positive rate gradually increased after March 24, 2021, to 0.186% and remained stable over the forecast period (period-6). Figures 5(a) and 5(b) show that the positive rate was higher in Saudi Arabia than in Kuwait during period-1. After that, the positive rate decreased in the remaining periods for both countries. However, Saudi Arabia has a lower positive rate than Kuwait during period-6, which includes the forecast period. Thus, significant changes are apparent across all historical periods.
As seen in Figure 6, in Kuwait, the mortality rate was high in period-1 (10.329%); it then gradually decreased through period-2, period-3, and period-4 before increasing to 0.6136% during period-5. The mortality rate then falls to 0.42289% during period-6. In Saudi Arabia, the mortality rate was highest in period-1 (4.4165%), after which it gradually decreased to 0.0835% by period-5 and then slightly increased to 0.1241% during the forecast period.
Figures 6(a) and 6(b) demonstrate that the mortality rate decreased until period-4 in Kuwait and until period-5 in Saudi Arabia, after which abrupt changes are found in both countries. Overall, the mortality rate is lower in Saudi Arabia than in Kuwait during period-6.

Discussion
The world is currently facing the severe COVID-19 pandemic, which originated in Wuhan, China, and then spread to other countries. The confirmed cases in Kuwait and Saudi Arabia were imported by travelers from Iran and Iraq, respectively [23]. The cumulative confirmed cases are increasing day by day in both countries, yet there is a massive difference between the cumulative cases in Kuwait and Saudi Arabia (Figure 2(a)). Similarly, the cumulative death cases differ greatly between the two countries, although Kuwait shows a slight decrease in mortality rate during September 2020, followed by a gradual increase (Figure 2(b)). Many factors play an essential role in the spread of COVID-19, such as population and community lifestyle [24]. The primary factor in the higher number of cases in Saudi Arabia, however, is its larger population compared with Kuwait. A previous study predicted that the pandemic would enter its final phase at the end of June 2020 [8]. Unfortunately, cases have continued to increase gradually. Before the situation worsens, accurate predictions are needed so that serious steps can be taken to control the pandemic's spread. The authors in [25] suggest that the ARIMA model is best for forecasting daily cases, so we used the ARIMA model to predict cumulative confirmed and death cases for both countries. Our forecast results indicate that, if the current strategy of both governments remains the same, cumulative confirmed and death cases will increase further by June 12, 2021 (Figures 3 and 4). Furthermore, the SRM was used to explore the incidence rate within two days across the periods. Figure 5 shows that the positive rate is high in both countries during period-1 and then decreases to roughly one-third during period-2 in both Kuwait and Saudi Arabia.
In Kuwait, the strict curfew imposed from May 10 to May 30, 2020 [7], is the primary reason for the reduction in positive cases. Before the curfew, the Kuwaiti government had implemented a partial lockdown, from 5 pm to 4 am, which was not as effective as the strict curfew across the whole country. As with confirmed cases, the mortality rate was higher in period-1 than in other periods (Figure 6(a)). Similarly, the mortality rate fell to one-third in period-2 and then continued to decrease until an abrupt rise in period-5. Therefore, we can say that the strict lockdown reduced the peak of the pandemic in Kuwait and proved to be very useful. Like Kuwait, Saudi Arabia had a higher positive rate during period-1 (Figure 5(b)), and the government imposed a partial lockdown on March 23 from 7 pm to 6 am. However, a high number of cases were diagnosed during period-1, and the government announced a complete lockdown on April 6 [26]. The lockdown significantly reduced the pandemic's spread, as seen in period-2, where the positive rate fell to one-third. Figure 6(b) shows that the mortality rate was high in period-1 and then gradually decreased until period-5, with a slight increase in period-6.
Our forecast results indicate that, if the current strategy continues, the cumulative confirmed and death cases in both countries will increase by June 12, 2021. In terms of the incidence rate, the positive rate will increase during the forecast period (period-6) in both countries (Figure 5). The mortality rate, however, is expected to decrease in Kuwait relative to period-5 and to increase in Saudi Arabia during the forecast period (Figure 6). Overall, Kuwait's positive and mortality rates are expected to be higher than those of Saudi Arabia during the forecast period (Figures 5 and 6). Our analysis suggests that the incidence rate may decrease further in the future, but this may require strict lockdowns in both countries.
This study has some limitations. The ARIMA model is not updated automatically when new observations are merged with previous observations. Also, a long historical record may be required to obtain accurate predictions. The ARIMA model is based on a linear structure, so its accuracy may decrease if the series is highly nonlinear. There is no limit to the number of segmented variables and breakpoints in the SRM. Moreover, the segmented relationship assumes a linear pattern over each segment, whereas the response can follow a nonlinear trend.

Conclusion
This study provides deeper insight into the COVID-19 outbreak in Saudi Arabia and Kuwait during the intervention activities. According to our analysis, the pandemic peaked during period-1, spreading rapidly in both countries after the first case was diagnosed, after which transmission gradually decreased. Similarly, the mortality rate peaked during period-1 and has declined since then. Our forecast results reveal that the cumulative confirmed and death cases are expected to increase by the beginning of June in Kuwait and Saudi Arabia. The positive cases may increase in both countries if the current strategy continues, while the mortality rate may decrease in Kuwait and increase in Saudi Arabia. Both governments can implement appropriate strategies and policies to avoid further transmission in the future. The analysis presented here provides valuable information about the incidence rate and guidance for future precautions.

Data Availability
All daily cumulative time series of COVID-19 for Kuwait and Saudi Arabia were obtained from the following source: https://data.humdata.org/dataset/5dff64bc-a671-48da-aa87-2ca40d7abf02. This website updates the global data of the outbreak countrywise.

Conflicts of Interest
The authors declare that they have no conflicts of interest.