COVID-19 Pandemic Data Modeling in Pakistan Using Time-Series SIR

Pakistan is currently facing the fourth wave of the deadly coronavirus, which was first reported in Wuhan, China, in December 2019. This work uses epidemiological models to analyze Pakistan's COVID-19 data. The basic susceptible, infected, and recovered (SIR) model is studied under Bayesian and time-series SIR (tSIR) approaches. Many studies have been conducted from different perspectives, but to the best of our knowledge, no study is available using the SIR models for Pakistan. The coronavirus incubation period has been set to 14 days across the globe; however, this study found that the 14-day assumption is not suitable for Pakistan's data. Furthermore, on the basis of R0, we infer that COVID-19 is not a pandemic in Pakistan, as it was in other nations such as the United States, India, Brazil, and Italy. We attribute this to the strategy adopted by the Government of Pakistan to minimize the burden of COVID-19 cases in Pakistani hospitals. It is also noticed that the posterior-based SIR (pSIR) model with a uniform prior on R0 and a Poisson distribution (for the log-likelihood) provides better results than other distributions. From the tSIR model, we observed that the value of the reporting rate (ρ) is less than 1, which means that cases are underreported.


Introduction
There are numerous types of coronaviruses, some of which pose severe threats to human life. The coronavirus gets its name from its shape: "corona" means "crown." The virus's outer layer is coated with spike proteins, which resemble a crown. COVID-19, a respiratory illness caused by a coronavirus, first appeared in Wuhan, China, near the end of December 2019. Scientists initially treated it as pneumonia, but it spread at an alarming rate all over the world and became the first pandemic of the twenty-first century with a high reproductive rate. The International Committee on Taxonomy of Viruses named the virus severe acute respiratory syndrome coronavirus-2 [1]. On March 11, 2020, the World Health Organization declared COVID-19 a pandemic. The first case of COVID-19 in Pakistan was recorded and reported by the Ministry of Health, Government of Pakistan, on February 26, 2020, in Karachi, Sindh. On the same day, Pakistan's Federal Ministry of Health confirmed another case in Islamabad. Within fifteen days, there were twenty confirmed cases out of 471 suspected cases. Sindh province had the greatest number, followed by Gilgit-Baltistan. All of the confirmed cases had traveled from Iran, Syria, or London, and the situation in Pakistan is still alarming. On February 12, 2020, Pakistan's Ministry of National Health Services, Regulation, and Coordination unveiled a plan titled "National Action Plan for Preparedness and Response to Corona Virus Disease Pakistan," with the goal of controlling virus spread and strengthening country and community emergency preparedness to ensure a timely, convenient, and prompt response to future occurrences. The government introduced various measures over time to control the spread of coronavirus in the country.
These include containment measures, border control, quarantine houses, countrywide smart lockdowns, cordoning off areas, testing and contact tracing, establishing a field epidemiology laboratory program with the help of WHO, implementation of SOPs, initiation of an awareness campaign, production of ventilators, and other economic measures [3][4][5].
If we consider the current COVID-19 pandemic and try to develop a model that predicts how this deadly disease will behave in the upcoming months in Pakistan, it is natural to ask how many susceptible people can be infected by a single infectious person. Suppose a single infectious person can infect three others who are in contact with him. Then, the model takes the form $I_t = I_{t_0} \cdot 3^{t}$, where $I_t$ is the total number of infected people at time $t$ and $I_{t_0}$ is the initial number of infected people. This equation produces an explosive increase in the disease and predicts that the epidemic will never end. Thus, if we use this single ordinary differential equation (ODE) to model the epidemic, it would not capture the real situation and would produce curves that are coarse and very different from the observed data. That is why the SIR model is preferred. The SIR model is time-dependent and depends upon the initial values.
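As a rough illustration of why unchecked multiplication cannot describe a real epidemic, the sketch below counts how many transmission generations are needed before the number of infected people exceeds a population. The function name, the growth factor of three per generation, and the population of 220 million are illustrative assumptions, not values taken from the data used in this study.

```python
def generations_until_exceeds(initial_infected, growth_factor, population):
    """Count generations until initial_infected * growth_factor**t exceeds population."""
    infected = initial_infected
    generations = 0
    while infected <= population:
        infected *= growth_factor  # every infected person infects `growth_factor` others
        generations += 1
    return generations


# Hypothetical setting: one initial case, each case infects three others,
# and an assumed population of 220 million.
needed = generations_until_exceeds(1, 3, 220_000_000)
print("Generations needed to exceed the population:", needed)
```

Because the count of susceptible people never enters this model, growth never slows down, which is the gap that the susceptible compartment of the SIR model fills.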
Weiss [6] used the SIR model in public health, whereas Finkenstädt and Grenfell [7] used time-series SIR (tSIR) modeling for measles, a childhood disease. Grenfell et al. [8] studied the dynamics of measles using the tSIR model with the aim of providing a suitable empirical and theoretical test to address the importance of noise versus nonlinearity and temporal predictability of population wealth. Katris [9] used the tSIR, autoregressive integrated moving average (ARIMA), feedforward artificial neural network, and adaptive regression splines for forecasting the outbreak of COVID-19 in Greece and compared different countries using R0 and the mean absolute percentage error (MAPE). Pasquali et al. [10] introduced a beta distribution for the observation error in a SIR model to model the underdetection and removed individuals in the compartmental model for COVID-19.
Chen et al. [11] conducted a study on COVID-19 by assuming two scenarios, i.e., the time-wise prediction, time-dependent parameters (β and γ) and the undetectable number of infected individuals, to get a more accurate prediction using the time-dependent SIR (tSIR) model. Deo et al. [12] studied the transmission rate of COVID-19 in India using the tSIR model. They estimated that the total infections crossed 9 million with 1 million critical cases. They also estimated R0 for smart and full lockdowns in different phases. Postnikov [13] conducted a study on COVID-19 using the simple SIR model for parameter estimation and future predictions. Metcalf et al. [14] conducted a study related to rubella in Mexico to capture seasonality, stochasticity, and region-wise variation. Lavielle et al. [15] extended the SIR model to model the COVID-19 data taken from Johns Hopkins University on the basis of the daily confirmed, active, death, recovered, and cumulative number of cases for each compartment. They applied the extended model to several countries like Switzerland, Italy, and the US.
Bjornstad et al. [16] estimated the transmission rates of measles for England and Wales using the tSIR model. Deo and Grover [17] noticed that unreported cases are more threatening than reported and quarantined ones. Their proposed model was susceptible-infected (quarantined/free)-removed-deceased (SI(Q/F)RD). The estimated values of R0 of undetected cases for California and Florida were 1.464 and 1.612, respectively. On the other hand, R0 for reported cases was 0.497 and 0.359 for the respective states at the time of the study. Fang et al. [18] conducted a study on the contagion moral force of COVID-19 to imitate the dispersion of COVID-19 using the susceptible-exposed-infected-recovered (SEIR) model. Waris et al. [3] assessed COVID-19 in Pakistan using different factors like hospital capacity, isolation, quarantine, and vaccination facilities.
Brugnano et al. [19] studied the multiregional extension of the SIR model for the COVID-19 outbreak in Italy to capture the effect of misdiagnosed infected and recovered individuals. They used the susceptible, diagnosed-infected, undiagnosed-infected, diagnosed-recovered, and undiagnosed-recovered ($SI_2R_2$) model. Ferrari et al. [20] conducted a study about the seasonal incidence change.
There are different statistical and mathematical (deterministic) models for epidemiological modeling. The existing models can also be used to predict the basic reproductive number of any pandemic like COVID-19. The SIR model, which is also known as the compartmental model, is used to predict the basic reproductive ratio. The SIR is the simplest form of the compartmental models, and all other models are derived from this basic model. The SIR model has three compartments: susceptible, infected, and recovered. The basic assumption of the traditional SIR model is that the infected (I) and susceptible (S) populations are mixed uniformly and that the overall population (N) remains constant throughout time [23]. Many studies, including time-series models for forecasting and mathematical models, are available for modeling the COVID-19 epidemic and for its prediction. Yousaf et al. [24] used the ARIMA model for forecasting COVID-19 in Pakistan, which was the first study to make short-term forecasts of COVID-19 cumulative confirmed cases in Pakistan. The number of confirmed cases was increasing at a higher rate than the number of recoveries. However, the observed cases turned out to be very different from their forecasts. Thus, we prefer to use compartmental modeling for the prediction of the basic reproductive number and for forecasting, as it performs somewhat better than the statistical or time-series models used in epidemiological studies.
Thus, the primary objective of the study is to predict the reproductive number and forecast COVID-19 cases in Pakistan using the SIR model. Further objectives of the study are to estimate the parameters (contact rate (β), recovery rate (1/γ), force of infection (λ), and reproductive number (R0)) of the model using Pakistan's COVID-19 data. In addition, the study investigates whether cases are underreported or overreported and decides whether it is a pandemic or not in Pakistan.
The rest of the study is categorized as follows. Section 2 presents the simple SIR, posterior-based SIR, and tSIR models. Analyses using the tSIR, SIR, and pSIR are discussed in Sections 3-5. Section 6 presents some concluding remarks and recommendations.

The SIR Model
The SIR model [1], introduced by Kermack and McKendrick in 1927, is a simple epidemiological model. This primary and simplest deterministic model is used for modeling the trend of epidemics (infectious diseases) and for obtaining future predictions [25]. The population used in this model comprises three compartments: susceptible (S), infected (I), and recovered (R).
Computational and Mathematical Methods in Medicine
A key feature of the SIR model is that it is used to find the basic reproductive number of a pandemic, denoted by R0. R0 tells us how many susceptible people can be infected (on average) by a single infectious individual. If it is less than 1, we conclude that the pandemic will end very soon, and if it is greater than 1, then it is a pandemic and will take time to end. If R0 is greater than 3, the whole population will be infected [26].
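The mathematical form of the SIR model is
$$\frac{dS}{dt} = -\beta S I, \qquad \frac{dI}{dt} = \beta S I - \gamma I, \qquad \frac{dR}{dt} = \gamma I,$$
where $dS/dt + dI/dt + dR/dt = 0$. The basic reproductive number $R_0$ is computed by $R_0 = (\beta/\gamma)N$.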
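A minimal numerical sketch of the SIR model is given below, assuming the mass-action form dS/dt = -βSI, dI/dt = βSI - γI, dR/dt = γI, which is consistent with R0 = (β/γ)N. The forward Euler scheme, the function names, and the parameter values are illustrative assumptions and not the fitting procedure used in this study.

```python
def simulate_sir(beta, gamma, s0, i0, rec0, days, steps_per_day=100):
    """Integrate the SIR equations with forward Euler and return daily (S, I, R)."""
    dt = 1.0 / steps_per_day
    s, i, r = float(s0), float(i0), float(rec0)
    trajectory = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i * dt  # flow from S to I
            new_recoveries = gamma * i * dt     # flow from I to R
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory


def basic_reproductive_number(beta, gamma, n):
    """R0 = (beta / gamma) * N for the mass-action SIR model."""
    return beta / gamma * n


# Hypothetical example: N = 1000 people, one initial infection.
trajectory = simulate_sir(beta=0.0003, gamma=0.1, s0=999, i0=1, rec0=0, days=100)
peak_infected = max(i for _, i, _ in trajectory)
print("R0:", basic_reproductive_number(0.0003, 0.1, 1000))
print("Peak number of infected people:", round(peak_infected))
```

Because every unit leaving one compartment enters another, S + I + R stays equal to N at each step, mirroring the constraint dS/dt + dI/dt + dR/dt = 0.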

Bayesian Approach to the SIR Model.
In the Bayesian approach, the SIR model differs slightly from the simple SIR model because R0 is treated as a random variable, and one can assume a negative binomial, Poisson, normal, or log-normal distribution as the prior distribution. The model is a slight modification of the SIR model, and its interpretation is the same as that of the simple SIR model. This model has two parameters, R0 and the infectious period (P_inf). In this study, we use the uniform distribution as the prior distribution of R0.

Time-Series SIR Model (tSIR).
A time-series susceptible-infected-recovered (tSIR) model proposed by [7] is employed to study the dynamical behavior of the COVID-19 data. To this end, the data comprise two discrete variables, i.e., "infected cases" and "susceptible." The SIR model can be expressed by three ODEs in which ξ ∈ (0, 1), ξP(t) is the fraction of respiratory patients with SARS-2 infection, and ξ might be approximated as a constant near 1 (e.g., 0.9), although in general it depends on time. λ is the force of infection, which can be expressed as Iβ/N; μ is the total number of deaths at time t; and γ is the recovery parameter. In the cyclical occurrences of the epidemic, the transmission rate β would vary with time. The traditional models give us estimates of each parameter against seasonal data. The tSIR model is tractable in situations where seasonality occurs, but it also has some flaws. First, it takes only one main variable, "reported cases." Second, there might be many underreported and overreported cases. Therefore, in this study, we use the "people having respiratory problems" to fulfill the requirements of the tSIR model. It is noticed from the literature that on a daily basis, almost 13,000 persons suffer from respiratory issues (https://www.aku.edu/news/Pages/News_Details.aspx?nid=NEWS-000849).
Thus, to capture under- or overreporting, the tSIR model is modified so that P_{t+1} and I_{t+1} are the one-day-ahead forecasted numbers of people having respiratory problems and of reported cases, respectively. Similarly, β_{t+1} is the one-day-ahead forecasted contact rate given by the tSIR model. Under the tSIR framework, the R function "runtSIR" first fits a simple regression model between the cumulative reported cases and the cumulative number of people having respiratory problems. As there should be a linear relationship between reported cases and people with respiratory problems, the slope of the fitted regression line indicates whether the cases are underreported or overreported. We denote this slope by ρ_t.
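The regression step can be sketched as follows. This is a minimal illustration, assuming that the cumulative reported cases are regressed on the cumulative number of people with respiratory problems and that a single constant slope is used; the function names and the toy data are hypothetical, and the sketch is not the internal code of "runtSIR".

```python
from itertools import accumulate


def ols_slope(x, y):
    """Slope of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def reporting_rate(daily_reported, daily_respiratory):
    """Estimate rho as the slope of cumulative reported cases on cumulative respiratory cases."""
    cumulative_reported = list(accumulate(daily_reported))
    cumulative_respiratory = list(accumulate(daily_respiratory))
    return ols_slope(cumulative_respiratory, cumulative_reported)


# Toy data (assumed): reported cases are 40% of people with respiratory problems.
respiratory = [100, 120, 90, 110, 130]
reported = [40, 48, 36, 44, 52]
rho = reporting_rate(reported, respiratory)
print("Estimated reporting rate:", round(rho, 3))
```

In this toy setting the slope is below 1, which under the interpretation used in this study would point to underreporting.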
If the value of ρ is near 1, one may assume that cases are almost entirely reported, and if it is less than 1, one can conclude that cases are underreported. However, if its value is more than 1, one can assume that cases are being overreported. The "runtSIR" function also gives us S_t and β_t, the susceptible dynamics with respect to time t and the contact rate of a single infectious individual over time t, respectively. After taking the expectation of equations (5) and (6), one can get the following log-linear equation:
$$\log E[I_{t+1}] = \log \beta_{t+1} + \log(\bar{S} + Z_t) + \alpha \log I_t,$$
where $Z_t$ is the residual of the fitted regression model and α is the homogeneity parameter estimated using the generalized linear model (GLM). $\bar{S}$ is the mean number of susceptible people over the whole series, and α is a parameter which describes the intensity of the epidemic. Furthermore, "runtSIR" fits the above log-linear relationship and resimulates the SIR model (forward and backward) using the estimated parameters.
In the wave-wise fits (see Table 1), the blue curve shows the actual number of daily reported cases while the gray curve presents the behavior of the cases using the tSIR model. On the basis of prior lag daily cases, it is noticed that the first wave is better fitted by the tSIR. However, in the second wave, there is a small discrepancy between the fitted and observed numbers of cases. There is some variation in the third wave, but the fit appears better overall. However, when we look at the fourth wave, a significant increase in the fitted curve is noticed, which indicates that the disease may pose a serious threat in the upcoming days. Figure 2 shows the wave-wise behavior of ρ of COVID-19 data for Pakistan. Figure 3 reports the daily total number of susceptible people estimated by the tSIR model for four waves of COVID-19, where $\bar{S}$ is the mean number of susceptible people.
To depict this figure, initially, 1% of the population is taken as the susceptible population. Figure 4 presents the behavior of the contact rate (β) along with its estimated (by the tSIR model) intervals for the four waves of COVID-19. Furthermore, it represents α, which is the disease intensity parameter. Figure 5 depicts the daily number of cases for the first five hundred days of COVID-19. One can observe that there are four waves and that the highest number of cases occurs in the first wave. It is worth mentioning that to depict the figure, we set 1000 as the outbreak threshold. Figure 6 depicts the evolution of β for the first 500 days of COVID-19 data in Pakistan, as well as its estimated intervals. It also reflects the severity of the disease (α). The contact rate appears to be growing with time, which may lead to an increase in the daily number of cases.

Five-Hundred-Day Analysis.
The behavior of the estimated reporting rate (ρ) using the first 500 days of COVID-19 data for Pakistan is shown in Figure 7. One can observe that in the first wave, there are fewer cases that are actually positive but not reported than in all other waves of COVID-19. Figure 8 shows the tSIR model fitted to the data (500 days). The blue curve represents the observed number of daily cases, whereas the gray curve represents the estimated curve using the tSIR model. The predicted number of cases in the middle of the first wave of COVID-19 is extremely high when compared to the observed cases. Similarly, for the second wave, the model indicates more cases than were observed. However, the predicted values after 350 days seem somewhat better than those for the preceding waves. Table 2 contains the summary of the mean absolute percentage error (MAPE(t)), mean absolute error (MAE(t)), and ρ (ρ(t)) for the training data, obtained by the tSIR using different distributions for the first 500 days. The MAPE(f) and MAE(f) for the 10-day projected data are also shown in Table 2. It can be seen that the tSIR model performs better for the Gaussian distribution with the "log" link in both the training and forecasted (test) data. Similarly, Tables 3-6 exhibit the MAPE, MAE, and ρ for the four waves, respectively. Finally, we conclude that the Gaussian distribution with the "identity" link is appropriate. Table 1 lists the parameters estimated by the tSIR model for 500 days for four waves. Figure 9 presents the four-wave analysis with the SIR model using the 14-day infectious period and considering the total population of Pakistan as the susceptible population. Contrary to the tSIR model, each wave has a distinct number of starting infected people with rapid exponential growth, indicating that COVID-19 is an epidemic in Pakistan and will infect a large population.
This also indicates that the traditional SIR model should not be used for the data. Figure 10 presents the fitted SIR model for four waves with a reduced susceptible population and without the 14-day infectious period assumption stated by WHO. The fitted curves are produced using the optimal values of the parameters β and γ, which result in the smallest error sum of squares (ESS). If we assume a 14-day infectious period, the daily number of cases increases dramatically and does not decrease, as shown in Figure 9. Furthermore, the contact rate is too low in both situations (with optimal settings and with the 14-day infectious period). Thus, we may infer that although the disease exists in Pakistan, it is not a pandemic like it is in other nations. Figure 11 shows the SIR model fitted to the wave-wise data, with errors. Figure 12 depicts the wave-wise optimum values of the contact rate (β) with the minimum residual sum of squares (RSS). It can be seen that the first wave has a higher contact rate than the others. This is because the government did not take any strict steps against COVID-19 in the beginning. Due to smart and complete lockdowns, contact rates are minimal in the other waves. Similarly, Figure 13 depicts the optimum values of the recovery rate (γ) for four waves, and it is noticed that the recovery rate (γ) is greater in the first wave and steadily decreases in subsequent waves. This happened because individuals used to adhere to the SOPs and scientists also advised antibodies to combat the virus. It is also noticed from the figures reported in the supplementary text that the intensity of illness rises as the contact rate increases. Also, the ESS increased due to the rise of β and γ.
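The search for the values of β and γ that minimize the sum of squared errors can be sketched as a simple grid search. This is a minimal illustration under stated assumptions: the mass-action SIR form, the forward Euler integrator, the candidate grids, and the synthetic "observed" series are all hypothetical, and the optimizer actually used in this study may differ.

```python
def sir_infected(beta, gamma, s0, i0, days, steps_per_day=20):
    """Daily number of infected people from a forward Euler SIR simulation."""
    dt = 1.0 / steps_per_day
    s, i = float(s0), float(i0)
    infected = [i]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
        infected.append(i)
    return infected


def residual_sum_of_squares(observed, fitted):
    """Sum of squared differences between observed and fitted values."""
    return sum((o - f) ** 2 for o, f in zip(observed, fitted))


def grid_search(observed, s0, i0, betas, gammas):
    """Return (rss, beta, gamma) for the grid point with the smallest RSS."""
    best = None
    for beta in betas:
        for gamma in gammas:
            fitted = sir_infected(beta, gamma, s0, i0, len(observed) - 1)
            rss = residual_sum_of_squares(observed, fitted)
            if best is None or rss < best[0]:
                best = (rss, beta, gamma)
    return best


# Synthetic data generated from known (assumed) parameters, then recovered by the search.
observed = sir_infected(0.0003, 0.1, s0=990, i0=10, days=40)
betas = [0.0001, 0.0002, 0.0003, 0.0004]
gammas = [0.05, 0.1, 0.15, 0.2]
rss, beta_hat, gamma_hat = grid_search(observed, 990, 10, betas, gammas)
print("Best beta:", beta_hat, "Best gamma:", gamma_hat, "RSS:", rss)
```

Fixing the infectious period at 14 days corresponds to restricting the γ grid to the single value 1/14, so only β is left free in that case.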
Table 7 lists the parameters estimated using the SIR model, and one can observe that R0 in all waves of COVID-19 in Pakistan is about 1.05, which is a strong indication that COVID-19 is not endemic in Pakistan. Figure 14(a) depicts the estimated number of susceptible, infected, and recovered people obtained using the pSIR model for the first-wave data, assuming a uniform distribution as the prior distribution of R0 and a Poisson distribution for the log-likelihood function. It can be seen that this model behaves very similarly to the traditional SIR model. Here, we used the optimal parameter values (estimated from the SIR model) with the lowest RSS. Furthermore, the overall number of susceptible people gradually decreases, reaching about 2,600,000 after 188 days. Similarly, the number of recovered people increases with time, approaching 400,000 after 188 days. Furthermore, a large number of the daily cases follow a basic SIR model. Figure 15 depicts the wave-wise anticipated number of infected people, and by comparing this figure with Figure 10, one can see that there is no discernible difference between the SIR and pSIR models, except that SIR is a simple mathematical model, whereas the pSIR is a Bayesian SIR model.

With a uniform prior on R0 and a Poisson distribution for the log-likelihood, the RSS is the least for all waves as compared to other combinations of distributions.
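The combination of a uniform prior on R0 with a Poisson log-likelihood can be sketched on a grid of candidate R0 values. This is a minimal illustration under stated assumptions, not the pSIR implementation used in this study: the infectious period is held fixed, daily new cases are assumed to be Poisson distributed around the SIR incidence, β is obtained from R0 = (β/γ)N, and all function names and numbers are hypothetical.

```python
import math


def expected_daily_incidence(r0, infectious_period, n, i0, days, steps_per_day=20):
    """Expected daily new infections from a forward Euler SIR run parameterized by R0."""
    gamma = 1.0 / infectious_period
    beta = r0 * gamma / n  # from R0 = (beta / gamma) * N
    dt = 1.0 / steps_per_day
    s, i = float(n - i0), float(i0)
    incidence = []
    for _ in range(days):
        new_cases = 0.0
        for _ in range(steps_per_day):
            infections = beta * s * i * dt
            recoveries = gamma * i * dt
            s -= infections
            i += infections - recoveries
            new_cases += infections
        incidence.append(new_cases)
    return incidence


def poisson_log_likelihood(counts, means):
    """Log-likelihood of observed counts under independent Poisson means."""
    return sum(y * math.log(m) - m - math.lgamma(y + 1) for y, m in zip(counts, means))


def posterior_over_r0(counts, r0_grid, infectious_period, n, i0):
    """Normalized posterior on the grid; a uniform prior makes it proportional to the likelihood."""
    log_post = [
        poisson_log_likelihood(
            counts, expected_daily_incidence(r0, infectious_period, n, i0, len(counts))
        )
        for r0 in r0_grid
    ]
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]  # subtract the peak for stability
    total = sum(weights)
    return [w / total for w in weights]


# Synthetic counts (assumed): rounded incidence generated with R0 = 2 and a 14-day period.
r0_grid = [1.0, 1.5, 2.0, 2.5, 3.0]
counts = [round(m) for m in expected_daily_incidence(2.0, 14, 10000, 10, 60)]
posterior = posterior_over_r0(counts, r0_grid, 14, 10000, 10)
best_r0 = r0_grid[posterior.index(max(posterior))]
print("Posterior mode of R0 on the grid:", best_r0)
```

Replacing the Poisson term with another log-likelihood while keeping the same grid would allow the combinations of distributions to be compared on equal terms.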

Conclusion and Recommendations
In the past, many pandemics have occurred throughout the world, and epidemiological modeling has always been an attractive research field for predicting disease dynamics and making optimal decisions. Currently, the deadly COVID-19 pandemic has affected our lives. In this study, we focused on the COVID-19 data from Pakistan and modeled it using epidemiological models. To this end, we used the tSIR model for disease prediction using people with respiratory issues. We observed that the forward simulation of the model was very close to the observed data. Using the tSIR model, we also found that COVID-19-positive cases are underreported, which may be because people are frightened of this deadly virus and do not report their COVID-19 test results, and some test reports may even be biased or forged. As a result, the initial estimated number of susceptible people is just 1% of the total population. Next, we used two SIR models: the traditional model and the pSIR model (using a uniform distribution as the prior distribution of R0). Taken together, the COVID-19 pandemic characteristics are inconsistent with the SIR modeling paradigm, and the dynamics of this pandemic are influenced by a number of variables. The main reason for this outcome is the lack of reliable data. If we follow the assumptions of the model, the SIR models generate relatively coarse fitted curves, as we saw in the analysis. If we utilize the optimal parameter values, R0 approaches 1, which means that if we use the fixed infectious period (as determined by WHO), R0 will be small. This implies that COVID-19 is not a pandemic in Pakistan. One outcome which is often understated is that the simple SIR model, with some minor modifications, does a remarkably good job at predicting the size, extent, and shape of a single wave. That is why the many more sophisticated extensions of SIR all seem to largely agree, which has been a great advantage for the health response teams.
By the middle of the first wave, our hospitals were overrun with infections; without question, the Pakistani government responded quickly to this deadly virus by implementing smart and full lockdowns, public awareness campaigns, quarantines, and screening centers. To combat future pandemics, we recommend that the government establish additional hospitals at the district level. In addition, there should be a secure and proper data reporting mechanism.
This study has several limitations that remain unaddressed. For example, we did not use vaccination data and did not consider exposed cases. Only the persons with respiratory issues were used as a variable connected to the daily number of cases in this study. Thus, the tSIR model may be studied with "daily deaths" as a primary variable. Furthermore, the extended SIR models may be used to examine wave-by-wave data.

Data Availability
The dataset used in this article is available at https://ncoc.gov.pk/.