Models to Predict the Number of Infected Cases and Deaths from COVID-19 in Chile and Its Most Affected Regions

This paper designs and implements a methodology to model the evolution of the COVID-19 pandemic, produced by the SARS-CoV-2 virus, during what was called the first wave in Chile, which lasted from March 2 to October 31, 2020. The models are based on sigmoidal growth curves and can be used to predict the number of daily infections and deaths in future days, making them a useful tool for health authorities managing an epidemic. The methodology is applied to the entire country and to each of its most affected regions. In addition, the models can be updated with new information as it is produced and can forecast a tentative date on which there would be some control over the pandemic. Moreover, these models allow for predicting the total number of infected and deceased people at the time the pandemic is under control. However, the simplicity of these models, which consider only the accumulated data of those infected and deceased, does not contemplate an intervention analysis such as vaccinations, which, as is known, are being effective in controlling the pandemic.


Introduction
In his article, the astrophysicist Barrado [1] expresses his opinion regarding the crisis that COVID-19 is leaving and summarizes it with the significant title "Vivimos un punto de inflexión: la generación 2020 y la nueva sociedad," which translated into English is "We live at a point of inflection: the 2020 generation and the new society." The author expresses that the health crisis generated by the COVID-19 pandemic, produced by the SARS-CoV-2 virus, is not the first, and unfortunately, it will not be the last that humanity will face. He also mentions that diseases have been powerful levers of historical change, having the ability to change a society, especially when combined with other disturbing elements.
To illustrate these historical changes, recall, for example, that (i) the plagues in Egypt caused notable changes in the way of life of the population since they affected the characteristics of social relationships [2]. (ii) The appearance of the Black Death in 1347 gave rise to an epidemic that covered all of Europe, causing the death of nearly a third of its population. Its socioeconomic structure completely changed [3]. (iii) The encounter between Europeans and Native Americans caused epidemics that devastated the original society. It is said that this was one of the main causes of the destruction of the native culture [4].
Regarding the consequences of these catastrophes, it can be indicated that, for those involved, such as political structures or individuals, the change was dramatic and left innumerable victims, but it also opened up new opportunities. For example, during the birth of modern states, health statistics emerged that kept an accurate record of cases of illness and death in the population, which made it possible to study epidemic phenomena [5].
Since 2020, these studies of epidemic phenomena have become widespread. In particular, just over a year ago, González et al. [6] published an article entitled "COVID-19: Pandemia de modelos matemáticos," which translated into English is "COVID-19: pandemic of mathematical models." The authors mentioned that "the large number of mathematical models formulated to predict the evolution of the epidemic and the impact of the measures for its control are a fashionable crystal ball throughout the planet, with more or less academic and executive intention." In turn, in their article, Grillo-Ardila et al. [7] stated that "there are many mathematical models that have been developed to understand the dynamics of COVID-19 infection. However, the difference in sociocultural contexts between countries makes it necessary to specifically adjust these estimates to each scenario." To the best of our knowledge, the publications found that predict confirmed cases, accumulated confirmed cases, or deaths from COVID-19 are based on (a) SIRS models [8][9][10], (b) an ARIMA model [11], (c) hybrid approaches that include both ARIMA and wavelet models [12,13], (d) ARIMA models, cubist regression, random forest, ridge regression, support vector regression, and stacking-ensemble learning [14], (e) SARIMA models [15], (f) time series models based on growth curves [16], and (g) Gompertz curves [17].
According to Harvey and Kattuman [16], epidemiological models (such as those mentioned in the previous paragraph) that seek to estimate epidemic trajectories fall into two broad classes: (1) Compartmental models consist of deterministic models that seek to be faithful to the details of the routes and processes of disease transmission. More specifically, they project how individuals in an initially susceptible (S) population become exposed (E) to the virus, potentially infected (I), and, if infected, either recover (R) from the disease or die from it [16].
In the category of compartmental models are SIR, SIS, SEIS, SIRS, SEIR, MSIR, and MSEIR models, where M represents infants with passive immunity. These models are governed by differential equations in which the compartments S, I, R, E, and M are mutually exclusive, and the sum of all is the total population. For example, in their SIR model, Kermack and McKendrick obtained the following differential equations that describe the model:
$$\frac{dS}{dt} = -\beta S I, \qquad \frac{dI}{dt} = \beta S I - \gamma I, \qquad \frac{dR}{dt} = \gamma I,$$
where $\beta$ is the infection rate, and $1/\gamma$ is the average time of infection. (2) Time series models, such as autoregressive integrated moving average (ARIMA) and seasonal ARIMA, known as SARIMA, use historical data to make predictions.
Given the time series $X_t$, where $t$ is an integer index and the $X_t$ are real numbers, an ARIMA(p, d, q) model is given by
$$\Delta^d X_t = \phi_0 + \sum_{i=1}^{p} \phi_i\, \Delta^d X_{t-i} + \sum_{j=0}^{q} \theta_j\, \varepsilon_{t-j},$$
where $\Delta^d X_t$ is the series differenced $d$ times (for $d = 1$, $\Delta X_t = X_t - X_{t-1}$), $d$ corresponds to the number of differences that are necessary to make the original series stationary, $\phi_1, \ldots, \phi_p$ are the parameters belonging to the "autoregressive" part of the model, $\theta_1, \ldots, \theta_q$ are the parameters belonging to the "moving average" part of the model, $\theta_0 = 1$, $\phi_0$ is a constant, and $\varepsilon_t$ is the error term (also called innovation or disturbance). The publications presented in (a) correspond to class (1), and the publications identified in (b) to (f), with certain modifications and extensions, belong to class (2).
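The role of the parameter d can be illustrated with a short sketch. It only shows repeated first differencing, the operation counted by d; it is not an ARIMA fit, and the series used is hypothetical.

```python
def difference(series, d=1):
    """Apply the first-difference operator d times to a list of numbers."""
    out = list(series)
    for _ in range(d):
        # Each pass replaces the series by its consecutive increments.
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out


# Hypothetical cumulative series with a quadratic trend.
series = [1, 4, 9, 16, 25, 36]
print(difference(series, 1))  # [3, 5, 7, 9, 11]
print(difference(series, 2))  # [2, 2, 2, 2]
```

In this toy case, two differences remove the trend, so d = 2 would be the natural choice.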
On the other hand, the publication listed in (f) and the one described in (g) are the only ones that use the Gompertz function (a sigmoid curve) as a basis to model the spread of COVID-19. In turn, according to Harvey and Kattuman [16], the progress of an epidemic typically starts off with the number of cases following an exponential growth path; over time, the growth rate falls, and the total number of cases approaches a final level: the "leveling of the curve." Because sigmoid curves capture the essence of this evolution, which characterizes an epidemic such as COVID-19, they motivated the present study. In this research, the Gompertz function is used to model the process generated by COVID-19, and the approach is extended to different sigmoidal growth curves that are able to capture the evolution of the accumulated number of people infected with, and killed by, COVID-19 in Chile and in its most affected regions. Furthermore, these sigmoid curves can track progress towards an upper limit or saturation level (see [16]). Then, using optimization software, the parameters associated with each curve are estimated, and through comparison techniques, the model that best fits the collected data is selected.

Data Collection.
The data were obtained from the database managed by the Chilean government on its official page https://www.minciencia.gob.cl/covid19. It is important to note how difficult it was to obtain these data at the beginning, usually in a format that was not editable. In addition, some transformation always had to be done, because the values that could be seen on the page did not always correspond to the values obtained once the data were extracted.
The study of the number of accumulated and daily infections covered the period from March 2, 2020, when the first confirmed case of COVID-19 appeared in Chile, until October 31, 2020. Figure 1 graphically represents the data used for this investigation. On the other hand, the study of the number of accumulated and daily deaths covered the period from March 22, 2020 to November 30, 2020. The gap in dates with respect to the study on those infected is due to the fact that, at the beginning, there was no certainty whether the deaths were due to COVID-19 or another cause. Figure 2 graphically summarizes the information used for the number of deaths.

Overview.
In the beginning, the scheme suggested by Lega and Brown [18] in 2016, described in Figure 3, was used. However, with the data that were collected, good fits were not obtained when using steps 4 and 5 of the methodology proposed by these researchers.
Upon further investigation, it was found that Sánchez-Villegas and Daponte [17] used the first two steps of the Lega and Brown [18] methodology; that is, (1) fit the accumulated data to the three-parameter Gompertz curve, $G(t)$, with parameters a, b, and c, and (2) calculate the first derivative of the previous curve, $g(t) = G'(t)$, which is interpreted as the curve of daily cases. Figure 4 shows the three graphical representations of the study by Sánchez-Villegas and Daponte [17]: (a) the daily accumulated cases, given by the points in black, (b) the Gompertz curve, which predicts the accumulated information, and (c) in red, the change curve.
The two practical interpretations that are obtained from here should be highlighted.
The first is that, through the curve of daily cases in red, the expected values in future days can be calculated, which is the way used in this study to make forecasts.
The second is that the coefficient "a" of the Gompertz curve corresponds to its upper asymptote, since if b < 0, then $\lim_{t \to \infty} G(t) = a$. Parameter "a" could be interpreted as the "horizon" of the pandemic; that is, the number of cases expected when the pandemic is controlled.
As usual, when new data become available, they are incorporated into those already available, and the model is recalculated according to the scheme described.
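The two interpretations can be checked numerically with a minimal sketch. It assumes one common three-parameter form, G(t) = a·exp(−exp(b(t − c))), which may differ from the parameterization used in [17], and the parameter values are hypothetical.

```python
import math


def gompertz(t, a, b, c):
    """Assumed three-parameter form: G(t) = a * exp(-exp(b * (t - c)))."""
    return a * math.exp(-math.exp(b * (t - c)))


def gompertz_daily(t, a, b, c):
    """First derivative g(t) = G'(t), read as expected new cases per day."""
    u = math.exp(b * (t - c))
    return -a * b * u * math.exp(-u)


# Hypothetical parameter values, chosen only for illustration (b < 0).
a, b, c = 500_000.0, -0.03, 110.0

# Far in the future the accumulated curve approaches the horizon a.
print(round(gompertz(2000, a, b, c)))

# Under this assumed form, the daily curve peaks at t = c.
print(round(gompertz_daily(c, a, b, c)))
```

Forecasting the cases of a future day then amounts to evaluating gompertz_daily at that day index, and refitting with new data simply replaces a, b, and c.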

Proposed Methodology.
Considering the two approaches previously analyzed and the historical and ongoing information available for this research, the methodology that was implemented is summarized in the following steps:
S1: Collect the data on both the infected and the deceased, nationally and for each region.
S2: Fit growth curves from three families of curves, detailed in the following subsection, to the daily accumulated data. The parameters were estimated using the "drc" (Dose-Response Curves) package of the R (2021) programming language.
S3: Obtain the daily change curve, that is, the first derivative of the growth curve. With this change curve, the number of cases expected in future days was forecast.
S4: Calculate daily the evolution of the horizon (the asymptote of the accumulated growth curve), which became increasingly precise. Thus, an estimate of the number of cases expected once the pandemic is controlled was obtained.
S5: As soon as new data are published, repeat steps S2 to S4, each time obtaining a more robust model.
This scheme was used to obtain an estimate of the number of infected and another of the number of deaths at the national level and for each of the regions of Chile. Since the three-parameter Gompertz curve used by Sánchez-Villegas and Daponte [17] did not fit the data in this study well, other curves were used and are discussed below.

Sigmoidal Growth Model.
The growth curves analyzed are sigmoid curves or "S"-shaped curves, which represent a typical biological growth curve. They symbolize the growth of organisms in a new and favorable environment. These curves represent a variable that first increases slowly, then speeds up, and finally slows down, eventually growing very little or declining [19]. The three stages of the sigmoid curve are called the exponential phase, the linear phase, and the senescence phase. The sigmoid curves studied in this research are (A) the log-logistic curve, (B) the Gompertz curve, and (C) the Weibull curve. These curves are detailed as follows:
(A) The log-logistic curve is commonly used to model outbreaks, as it can capture the initial slow growth of the pandemic, followed by a period of rapid growth and a period of slowdown, such as the one shown with the blue line in Figure 5 and denoted by LL(x). The curve with 4 parameters was used since it was the one that best fits the data; it is defined by the authors of [19] as
$$LL(x) = b + \frac{c - b}{1 + e^{a(\ln(x) - \ln(d))}}.$$
The red line in Figure 5 is the corresponding change curve, expressed as
$$LL'(x) = -\frac{a\,(c - b)\, e^{a(\ln(x) - \ln(d))}}{x \left(1 + e^{a(\ln(x) - \ln(d))}\right)^{2}};$$
this change curve is multiplied by an appropriate amount so that it can be displayed on the graph. Note that, for interpretation purposes, the upper asymptote of LL(x) corresponds to the parameter "c"; that is, if a < 0, then
$$\lim_{x \to \infty} LL(x) = c.$$
(B) Another curve that was also studied was the Gompertz curve, in blue in Figure 6, denoted by G(x) and defined by the authors of [19] as
$$G(x) = b + (c - b)\, e^{-e^{a(x - d)}}.$$
It is similar to the logistic curve, with the difference that it grows faster at the beginning, which makes it more appropriate to describe biological and epidemiological growth. Furthermore, with 4 parameters, the parameter "c" is its upper asymptote, which is given by
$$\lim_{x \to \infty} G(x) = c,$$
provided that a < 0. The line in red in Figure 6 is the corresponding change curve, represented by
$$G'(x) = -a\,(c - b)\, e^{a(x - d)}\, e^{-e^{a(x - d)}},$$
magnified by an appropriate amount so that it could be seen on the graph.
(C) The third curve studied was the Weibull curve, in blue in Figure 7, denoted by W(x) and defined by the authors of [19] as
$$W(x) = b + (c - b)\, e^{-e^{a(\ln(x) - \ln(d))}}.$$
It is commonly used to model survival data in biomedical applications.
The red line in Figure 7 is the corresponding change curve, also amplified by an appropriate amount, which is given by
$$W'(x) = -\frac{a\,(c - b)}{x}\, e^{a(\ln(x) - \ln(d))}\, e^{-e^{a(\ln(x) - \ln(d))}}.$$
The respective asymptote is
$$\lim_{x \to \infty} W(x) = c,$$
provided that a < 0.
Discrete Dynamics in Nature and Society
Once the methodology described above was implemented, results began to be obtained, and they improved as more information was incorporated. The following section presents the curves fitted to the data used and a description of the results obtained.
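The behavior of the three families can be explored with a minimal sketch. It assumes the four-parameter forms of the "drc" package, with a as the slope, b the lower asymptote, c the upper asymptote, and d a location parameter; this naming and the parameter values are assumptions for illustration, not the fitted values of this study. The change curve is approximated numerically so that the same helper works for any of the three families.

```python
import math


def log_logistic(x, a, b, c, d):
    """Four-parameter log-logistic curve (assumed drc-style form)."""
    return b + (c - b) / (1.0 + math.exp(a * (math.log(x) - math.log(d))))


def gompertz4(x, a, b, c, d):
    """Four-parameter Gompertz curve (assumed drc-style form)."""
    return b + (c - b) * math.exp(-math.exp(a * (x - d)))


def weibull(x, a, b, c, d):
    """Four-parameter Weibull curve (assumed drc-style form)."""
    return b + (c - b) * math.exp(-math.exp(a * (math.log(x) - math.log(d))))


def change_curve(curve, x, h=1e-4, **params):
    """Central-difference approximation of the change curve (first derivative)."""
    return (curve(x + h, **params) - curve(x - h, **params)) / (2.0 * h)


# Hypothetical parameters; a < 0 so that c is the upper asymptote.
p_log = dict(a=-2.0, b=0.0, c=1000.0, d=150.0)   # curves on the log scale
p_lin = dict(a=-0.03, b=0.0, c=1000.0, d=150.0)  # Gompertz curve

for name, curve, p in [("LL", log_logistic, p_log),
                       ("G", gompertz4, p_lin),
                       ("W", weibull, p_log)]:
    horizon = curve(1e9, **p)                    # value far in the future
    daily = change_curve(curve, 150.0, **p)      # slope at x = d
    print(name, round(horizon, 3), round(daily, 3))
```

With a < 0, all three curves approach c for large x, which is why c is read as the horizon of the pandemic.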

Results
To estimate the parameters of the three growth curves detailed in the subsection on the sigmoidal growth model, the "drc" package of the R language was used. Moreover, to quantify the fit error, the mean absolute percentage error (MAPE) was used, which is defined by (see [20])
$$\mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{x_i - \hat{x}_i}{x_i} \right|,$$
where $n$ is the number of observations, $x_i$ is the observed value, and $\hat{x}_i$ is the estimated value. It is a measure of the accuracy of the prediction. Table 1 shows the values of the estimated parameters for each of the growth curves studied when these curves are fitted to the accumulated number of people infected by COVID-19 in Chile. In addition, their respective MAPEs are presented.
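As a minimal sketch of the error measure, the MAPE can be computed as follows. The function name and the sample values are hypothetical, and observed values are assumed to be nonzero.

```python
def mape(observed, estimated):
    """Mean absolute percentage error, in percent, of estimated vs. observed values."""
    if len(observed) != len(estimated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    # Observed values are assumed nonzero; otherwise the ratio is undefined.
    total = sum(abs((x - x_hat) / x) for x, x_hat in zip(observed, estimated))
    return 100.0 * total / len(observed)


# Hypothetical observed and fitted cumulative counts.
print(mape([100, 200], [110, 180]))
```

Each of the two hypothetical estimates is off by 10% of its observed value, so the MAPE is 10% up to floating-point rounding.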
With the estimated values of the parameters (given in Table 1), the graphs of the three growth curves that model the accumulated number of people infected by COVID-19 in Chile were made, and these curves are presented in Figure 8. The points in black correspond to the accumulated number of people infected by COVID-19 in Chile, which were already presented in blue in Figure 1. The fits obtained when using the estimated parameters carry the subscript AI (for example, $W_{AI}(x)$ for the Weibull curve), which refers to the accumulated infected and is used to differentiate these curves from those that describe the accumulated deaths (AD), which are presented later. The great utility of these curves, and the objective of this research, is that with these functions it is possible to estimate, for future days, the accumulated number of people infected by COVID-19 in Chile; that is, forecasts can be made of what could happen in the days following October 31, 2020, which was the last day considered in the study of the accumulated infected. This day corresponds to the 244th day of the pandemic in Chile, which began on March 2, 2020.
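The correspondence between day indices and calendar dates can be reproduced with a short sketch, assuming that day 1 is March 2, 2020, as stated above. The helper name is hypothetical.

```python
from datetime import date, timedelta


def day_to_date(day, start):
    """Calendar date of a 1-based pandemic day index, where day 1 is `start`."""
    return start + timedelta(days=day - 1)


FIRST_CASE = date(2020, 3, 2)  # first confirmed case in Chile

print(day_to_date(244, FIRST_CASE))  # 2020-10-31
print(day_to_date(245, FIRST_CASE))  # 2020-11-01
```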
As an illustration, using the Weibull curve, $W_{AI}(x)$, and knowing the data until October 31, 2020, the number of accumulated infected for the following day (November 1, 2020), which corresponds to day 245 of the pandemic in Chile, can be forecast. This gives an estimate of $W_{AI}(245) = 481{,}659.1$; that is, approximately 481,659 accumulated infected by COVID-19 in Chile are projected. The real value as of November 1, 2020 was 480,085 accumulated infected, yielding a prediction error of approximately 0.3%, considered acceptable. Forecasts can also be made for the following days, but the accuracy of the estimate decreases.
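For reference, the prediction error quoted above follows directly from the two reported figures, taking the error relative to the real value:

```latex
\[
\frac{\lvert 481{,}659 - 480{,}085 \rvert}{480{,}085} \times 100\%
  = \frac{1{,}574}{480{,}085} \times 100\%
  \approx 0.33\%.
\]
```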
From Table 1, it can be seen that the Weibull curve yields the lowest MAPE; therefore, the following analysis and what is illustrated in Figures 9 and 10 are carried out considering the Weibull curve. Table 1 shows that the estimate of the parameter c (upper limit) for the Weibull curve is 1,279,980.52, which means that, with the information available at the time of completing the study of accumulated infected, around 1,279,981 accumulated infections by COVID-19 were projected in Chile by the time the pandemic was under control. This value is far from reality because, at the time of completing this work, the pandemic was not yet under control, and there were already more than 2,900,000 confirmed cases of COVID-19 in Chile.
On the other hand, to get an idea of the evolution of daily cases, the first derivative of the sigmoid curves is used. Since, as already mentioned, the Weibull curve shows the lowest MAPE, the work continues with this curve, which is shown in red in Figure 9, amplified so that it can be seen on the graph. An expanded view of the evolution of daily cases is presented in Figure 10, where the data on the number of daily infected people are shown in blue, and the fit using the first derivative of the Weibull curve, $W_{AI}'(x)$, is shown in red. The utilities of this curve of daily cases are as follows:
(i) With this curve, the number of new cases in future days can be forecast.
(ii) With this curve, it is possible to forecast the date on which there would be some control over the pandemic, which corresponds to when $W_{AI}'(x)$ is less than one, that is, when the number of new cases is less than one person. This would be an indication that the pandemic is coming to an end, since there would no longer be anyone who could spread the virus.
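Using the parameters of Table 1, equation (1), and the information available at the end of the study of accumulated infected, it is obtained that
$$W_{AI}'(x) \propto \frac{1}{x}\, e^{u(x)}\, e^{-e^{u(x)}}, \qquad u(x) = -0.00132\,(\ln(x) - \ln(0.0001)),$$
where the constant of proportionality is determined by the estimates in Table 1.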
From this equation, it follows that $W_{AI}'(x) < 1$ for $x \geq 622$; that is, according to the Weibull model, it was projected that on November 13, 2021 (corresponding to day 622), the pandemic in Chile would be under control. But this was not fulfilled, because at the time of completing this investigation, contagion in Chile was still continuing. The model failed because what was called the second wave of contagion began in Chile, and the sigmoid curves only manage to capture one wave.
Next, the study carried out for the deceased is presented. The methodology is the same as that applied to estimate the number of accumulated infected. First, the "drc" package of the R language was used to estimate the parameters of the three growth models already described. Table 2 contains the estimated parameters for the different growth curves studied when these curves are fitted to the cumulative number of deaths from COVID-19 in Chile. The respective MAPEs are also presented. Figure 11 presents the graphs of the respective growth curves that are obtained by using the estimated values of the parameters given in Table 2. Visually, it seems that the curves do not fit the data in black as well. This is apparently because the vertical scale was not altered as it was in the case of the accumulated infected; consequently, the data are being viewed more closely.
In order to have an objective measurement, Table 2 shows that the three curves effectively reduced their percentage error, and as before, the lowest MAPE was obtained by using the Weibull curve, so the study continues using this curve, whose fitted equation is denoted $W_{AD}(x)$. On the other hand, from Table 2, it is obtained that the estimate of the parameter c (upper asymptote of the curve) for the Weibull curve is 26,432.28, which means that, with the information available at the time of completing the study of accumulated deaths, around 26,432 people were projected to die from COVID-19 in Chile by the time the pandemic was under control. This value is far from reality because, at the time of completing this work, the pandemic was not yet under control, and there were already more than 42,000 deaths from COVID-19 in Chile.
Analogous to the study on daily infections, to get an idea of the evolution of daily deaths, the first derivative of the Weibull curve is used (because it has the lowest MAPE), which is presented in red in Figure 12 and amplified by an appropriate amount to display it on the graph. Another view is presented in Figure 13, where the number of daily deaths is shown in blue, and the fit using the first derivative of the Weibull curve, $W_{AD}'(x)$, is shown in red. Its applications are as follows:
(i) With this curve, it is possible to forecast the number of people who will die on a certain day in the future.
(ii) It is possible to forecast the date on which there would be some control over the deaths from the pandemic, which would happen when $W_{AD}'(x) < 1$, that is, when the number of daily deaths is less than one.
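Using the parameters of Table 2, equation (1), and the information available at the end of the study of the accumulated deaths, it is obtained that
$$W_{AD}'(x) \propto \frac{1}{x}\, e^{v(x)}\, e^{-e^{v(x)}}, \qquad v(x) = -0.085\,(\ln(x) - \ln(0.172)),$$
where the constant of proportionality is determined by the estimates in Table 2.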
From this equation, it is obtained that $W_{AD}'(x) < 1$ for $x \geq 678$; that is, according to the Weibull model, it was projected that on January 28, 2022 (corresponding to day 678), the deaths in Chile would end. But this was not fulfilled, because at the time of completing this investigation, deaths from COVID-19 continued to occur in Chile.
Regarding the study of the most affected regions, the situation is similar to that which occurred in the entire country. Figure 14 presents, for some regions of Chile only, the graphs of the accumulated and daily reported cases from March 2 to October 31, 2020. Therefore, the same methodology described in the Proposed Methodology subsection was applied, obtaining results similar to those detailed for the national situation. In some regions, a 4-parameter Gompertz curve was a better fit.
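The control-date criterion used for both the infected and the deceased can be sketched as a search for the first day from which the daily curve stays below one. The Weibull derivative below assumes the "drc"-style four-parameter form, and the parameter values and helper names are hypothetical; they are not the estimates of Tables 1 and 2.

```python
import math
from datetime import date, timedelta


def weibull_daily(x, a, b, c, d):
    """First derivative of the assumed four-parameter Weibull curve."""
    u = math.exp(a * (math.log(x) - math.log(d)))
    return -a * (c - b) * u * math.exp(-u) / x


def control_day(rate, threshold=1.0, max_day=5000):
    """Smallest day from which rate(t) < threshold for every t up to max_day.

    Scanning backwards skips the early days, when the daily curve is also
    below the threshold because the outbreak has not yet taken off.
    Returns None if the rate is still at or above the threshold at max_day.
    """
    for day in range(max_day, 0, -1):
        if rate(day) >= threshold:
            return day + 1 if day < max_day else None
    return 1


# Hypothetical parameters, for illustration only.
params = dict(a=-2.0, b=0.0, c=100_000.0, d=150.0)


def rate(x):
    return weibull_daily(x, **params)


day = control_day(rate)
start = date(2020, 3, 2)  # day 1 of the infection series
print(day, start + timedelta(days=day - 1))
```

Because the search depends only on the fitted daily curve, rerunning it after each refit shows how the projected control date shifts as new data arrive.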
As for the regional analysis of the number of accumulated and daily deaths, it is necessary to highlight that the situation is similar to the country case, but, for space reasons, the results are not shown.

Conclusions
In this research, a methodology was designed and implemented to model epidemics with sigmoidal growth curves. Three different models were used and compared, instead of a single model as in the cited literature. One strength of the models used is that they only use the accumulated numbers of infected and deceased, which are then used to predict the daily numbers of infected and deceased in future days, without incorporating external variables. This, on the one hand, could be considered a weakness, since no further information is added to the model, such as periods of confinement or vaccination. On the other hand, the absence of more information means that the quality of the predictions depends heavily on the quality of the data, a fundamental aspect, since, at the beginning of the pandemic, it was not certain whether the cases were due to COVID-19. The growth models studied were fitted to the evolution of the pandemic recorded in what was called the first wave, which is undoubtedly a valuable aid for decision-making by government agencies responsible for health policies. However, these models do not contemplate an intervention analysis such as vaccinations, which, as is known, are controlling the pandemic.

Data Availability
The data used to support the findings of this study are available at the link https://www.minciencia.gob.cl/covid19/.

Conflicts of Interest
The authors declare that they have no conflicts of interest.