Identification of an Epidemiological Model to Simulate the COVID-19 Epidemic Using Robust Multiobjective Optimization and Stochastic Fractal Search

Traditionally, the identification of parameters in the formulation and solution of inverse problems considers that models, variables, and mathematical parameters are free of uncertainties. This aspect simplifies the estimation process, but does not consider the influence of relatively small changes in the design variables in terms of the objective function. In this work, the SIDR (Susceptible, Infected, Dead, and Recovered) model is used to simulate the dynamic behavior of the novel coronavirus disease (COVID-19), and its parameters are estimated by formulating a robust inverse problem, that is, considering the sensitivity of design variables. For this purpose, a robust multiobjective optimization problem is formulated, considering the minimization of uncertainties associated with the estimation process and the maximization of the robustness parameter. To solve this problem, the Multiobjective Stochastic Fractal Search algorithm is associated with the Effective Mean concept for the evaluation of robustness. The results obtained considering real data of the epidemic in China demonstrate that the evaluation of the sensitivity of the design variables can provide more reliable results.


Introduction
Since the first case of coronavirus disease 2019 (COVID-19), a total of 82,877 confirmed cases and 4,633 deaths have been reported in China, up to May 2, 2020. In order to study the dissemination of this disease, various mathematical models have been proposed and revisited. A large part of these models is based on compartments and relations between the different groups of individuals [5]. As pointed out by Adam [1], many of the mathematical models used to guide the policies of the countries to the pandemic situation are improvements of the simple compartmental SIR model (representing susceptible, infected, and recovered individuals, respectively). Some authors introduced new effects in this kind of model; for instance, Lin et al. [7] modified a SEIR model (here, E refers to exposed individuals) to introduce the effect of government actions and individual reactions to the disease. Recently, Ndaïrou et al. [11] presented a compartmental model with emphasis in super-spreader individuals, applied to the Wuhan case. Liu et al. [8] proposed the application of a four-stage modified SEIR model to the spread of the disease in Wuhan; some important factors were considered, as the isolation of the city and the construction of field hospitals. A possible vaccine administration strategy was detailed by Libotte et al. [6], considering the Wuhan case and using a multiobjective approach. This scenario indicates the large number of possibilities to enhance the predictions of compartmental models.
In all these models, no information about uncertainties are considered, that is, errors associated with modelling and estimation of parameters are neglected. In this case, optimal solutions may be influenced by small perturbations, as demonstrated by Deb and Gupta [3]. The objective of this work is to analyze the influence of noise during the solution of an inverse problem that aims to determine the parameters that characterize a SIDR (Susceptible, Infected, Dead, and Recovered) model. For this purpose, a robust inverse problem is formulated and solved using the Stochastic Fractal Search [12]. The problem proposed in this work considers the minimization of deviation between experimental and calculated values, together with the maximization of robustness parameter.

Background Information on Robustness Analysis
In many optimization problems, decision variables are subject to perturbation. When solving the problem, one must consider that the solution must be acceptable with respect to small changes in the values of the decision variables. Robust optimization is aimed at obtaining solutions that are least sensitive to such perturbations, that is, solutions that present the smallest possible deviation in relation to the objective value when subject to noise. The concepts of robust optimization were introduced by Tsutsui and Ghosh [16].
The objective function to optimize in searching robust solutions may be formulated as follows: where δ is the noise parameter, and pðδÞ represents the probability distribution function. Usually, this effective objective function is not available, since the probability distribution may not be known. Therefore, the calculation of the expected performance is usually not trivial in many applications. Deb and Gupta [3] proposed a methodology for obtaining robust solutions in the context of multiobjective optimization (more details on multiobjective optimization can be seen in Miettinen [10]). Essentially, it is proposed to optimize the mean effective objective values computed at a point by averaging the function values of a few samples in its vicinity, instead of optimizing the original objective functions. Thus, a solution x * is called a multiobjective robust solution of type I, if it is the global feasible Pareto optimal solution to the multiobjective minimization problem given by where f eff i ðxÞ is defined as follows: for i = 1, ⋯, m, and B δ ðxÞ is the hypervolume of the neighborhood. This approach is suitable for problems in which the result of the integral in Equation (3) can be obtained in a closed analytical form. For problems in which the search space is more complex, Equation (3) can be approximated by the mean value of the objective function, using a Monte Carlo integration given by In practice, a set of H points are randomly sampled (or respecting some structured manner, such as the Latin Hypercube method) in the range y k ∈ ½ð1 − δÞx, ð1 + δÞx, and the mean function value approximates Equation (3).

Stochastic Fractal Search
In the last decades, the development of optimization algorithms based on swarm intelligence has allowed the solution of complex problems in different areas of science and engineering. Inspired by the collective intelligent behavior of insects or animal groups in nature, such as flocks of birds, swarms of bees (bats, fireflies), colonies of ants, and swarms of fruit flies, various optimization algorithms have been proposed [15]. In this work, we use a promising method, recently proposed by Salimi [12], called Stochastic Fractal Search (SFS) algorithm. The main steps of the metaheuristic are presented below.
The SFS algorithm is a nature-inspired metaheuristic based on the natural growth phenomenon of fractals. Candidate solutions (particles) explore the search space considering a diffusion property which is regularly seen in random fractals. The SFS approach is based on random fractals grown by Diffusion Limited Aggregation concept [17]. This optimization strategy adopts a random walk in order to simulate the diffusion process, where the diffusing particle sticks to the seed particle. This process is repeated until a cluster has been created [12].
Two steps are applied to generate new candidate solutions at each iteration: diffusion and updating. In the first, each particle diffuses around its current position to ensure the exploitation property. The diffusion process avoids being trapped in local optimum and increases the chance of finding the global solution. In the second, a point in a group updates its location based on the locations of other points in the group. SFS considers a static diffusion process, that is, the best particle generated from the diffusion process is the only particle that is considered; the rest of the particles are ignored. In addition to efficient exploration of the feasible problem, SFS uses some random methods as updating processes.
The diffusion process uses Gaussian random walks to generate points around each particle until a predefined maximum diffusion number is reached. There are two types of Gaussian walks in the diffusion process: where ε and ε ′ are uniformly distributed random number in the range ð0, 1Þ. In turn, BP and P i are the position of the best 2 Computational and Mathematical Methods in Medicine point and the i-th point in the group, respectively. The first two Gaussian parameters are μ BP and σ, where μ BP is exactly equal to BP. The two latter parameters are μ P and σ, where μ P is equal to P i . The standard deviation σ is dynamically adjusted based on the number of the generation g.
The update process employs two statistical procedures to undertake the exploration in SFS. Initially, all the points are ranked based on the value of the fitness function, by calculating where rank ðP i Þ is the rank of the point P i among the other points of the group. In the first updating process, for each point P i in group, the j-th component of P i is updated according to where P r and P t are randomly selected points in the group. The point is updated if the condition Pa i < ε is satisfied, where Pa i is given by Equation (7). Otherwise, P i remains unchanged. In the second update process, P i ′ is updated if Pa i < ε holds for P i ′. Thus, the point is modified according to if ε ′ ≤ 0:5, where ε ′ is a random number generated by the Gaussian distribution. Otherwise, P i ′ is updated by In these cases, P t ′ and P r ′ are randomly selected points obtained from the first procedure. Details on the implementation are presented by Salimi [12].

Multiobjective Stochastic Fractal Search
In this work, the SFS strategy is extended for multiobjective optimization context. This new approach, called the Multiobjective Optimization Stochastic Fractal Search (MOSFS) algorithm, incorporates two operators to the original SFS algorithm: Fast Nondominated Sorting and Crowding Distance [2]. Briefly, MOSFS presents the following structure. An initial population of size NP is randomly generated. Then, a new population is generated from the current population, using the operators proposed in SFS. Each candidate of the new population is evaluated considering the vector of objectives. All dominated candidate solutions are removed from the population through the Fast Nondominated Sorting operator. The population is sorted into nondominated fronts (sets of vectors that are nondominated with respect to each other). This procedure is repeated until all vectors are assigned to a front. During the evolutionary process, if the number of individuals in the current population is larger than a predefined number, it is truncated according to the Crowding Distance.

Methodology
Traditionally, models based on compartments have been used to represent dynamic behavior of disease. In the literature, various applications involving this type of model can be found. The objective of this work is to determine the parameters of an epidemiological model to predict the evolution of COVID-19 epidemic considering experimental data from China, considering uncertainties. For this purpose, the SIDR (Susceptible, Infectious, Dead, and Recovered) model is adopted [5].
In this model, it is assumed that, for each infected individual/host, the disease can be transmitted to a susceptible individual. The number of susceptible individuals (S) varies with time according to where t is the time and β is the rate of disease transmission The dynamics of the number of infected individuals is calculated by where γ is the per-capita recovery rate, and ρ is the death probability. In turn, the number of dead individuals is calculated by with m = ðρ/1 − ρÞγ representing a per-capita mortality rate [5]. Finally, the number of recovered individuals is obtained from The initial conditions of the system are given by ðSð0Þ, Ið0Þ, Dð0Þ, Rð0ÞÞ = ðS 0 , I 0 , D 0 , R 0 Þ.
In this work, we used a normalized version of the SIDR model, with scaled variables defined as S n = S/N, I n = I/N, D n = D/N, and R n = R/N, where N is the population size. In this case, the following constraint must be obeyed: S n ðtÞ + I n ðtÞ + D n ðtÞ + R n ðtÞ = 1. Thus, the system represented by Equations (11)-(14) is converted into the normalized model represented by 3 Computational and Mathematical Methods in Medicine The new initial condition for the susceptible, infected, dead, and recovered populations is represented by S n0 , I n0 , D n0 , and R n0 , respectively.
In order to determine the SIDR parameters, it is required to formulate and solve an inverse problem. In general, the identification procedure consists in obtaining the model parameters through the minimization of the difference between calculated and real values. For the real data of the epidemic in China, both infected ðI e Þ and dead ðD e Þ time series are known. Thus, the merit function to be minimized is defined as follows: where I c and D c are the calculated values for infected and dead individuals, respectively. M and N are the number of data for I c and D c , respectively. In order to evaluate the influence of perturbations in this (nominal) optimization problem, the robustness of the variables I c and D c is analyzed. In this case, for each candidate solution generated by using the MOSFS algorithm, H points are sampled using the Latin Hypercube method. Each point is evaluated considering the system of ordinary differential equations given by Equations (15)- (18). For this purpose, the Fourth-Order Runge-Kutta method is used. After such simulations, the Mean Effective approach is employed. Then, the mean effective objective is associated with the candidate generated by MOSFS. This procedure is performed until the maximum number of generations is reached.

Results and Discussion
Considering the methodology presented, two cases are analyzed: in the first, the parameters of the SIDR model are estimated through the formulation and solution of an inverse problem without considering possible uncertainties. In the second one, the influence of possible perturbations on the decision variables is considered. For this purpose, the criteria below are defined.
In both problems analyzed, the parameters of the SIDR model are defined in the following intervals (obtained after preliminary runs): 0:1 ≤ β ≤ 0:6, 0:04 ≤ γ ≤ 0:6, 0 ≤ I 0 ≤ 1, and 0 ≤ ρ ≤ 1. The results correspond to 20 runs of the metaheuristics (using a different seed for each run) for a maximum of 250 generations in each run. The initial conditions of the compartmental model are given by ðSð0Þ, Ið0Þ, Dð0Þ, Rð0ÞÞ = ð1 − I 0 , I 0 , 0, 0Þ, and the COVID-19 data are retrieved from Worldometer [18]. In order to evaluate the performance of SFS, three other evolutionary algorithms are considered: Genetic Algorithm (GA) [4], Differential Evolution (DE) [14], and Firefly Algorithm (FA) [19]. For GA, we adopted a population size equals to 25, maximum number of generations equals to 250, crossover probability equals to 0.8, and mutation probability equals to 0.01. In the case of DE, the population size is equal to 25, maximum number of generations equals to 250, and probability crossover and perturbation rate both equal to 0.9. In FA, we set the population size equals to 25, maximum number of generations equals to 250, and maximum attractiveness value and absorption coefficient both equal to 0.9. In both algorithms, the stopping criterion used is the maximum number of generations.
In the deterministic inverse problem, Equation (19) must be minimized. For this purpose, SFS is employed with 25 individuals in the population. Table 1 presents the results for the best result and the standard deviation considering SFS, GA, DE, and FA algorithms in 20 runs (using a different seed for each run). In this table, we can observe that all algorithms converged to the same result. Table 1 presents the results for the best result and the standard deviation. It can be noted that the optimal value of F is relatively high. However, the corresponding standard deviation value indicates that the metaheuristic obtained very close optimizers in all runs. This is due to the fact that the population dynamics undergoes many variations in the course of the epidemic. Such variations are due to government actions, disease mitigation strategy, and capacity of the health network. All of these aspects have an impact on the time series of the number of infected and dead individuals. Therefore, the best curve fitting is not necessarily able to accurately describe the behavior of the epidemic at all times. Figure 1 presents the nonnormalized profiles considering the estimated parameters by solving the inverse problem defined by the minimization of Equation (19)-these results are weighted in relation to number of infected individuals, that is, the population size is a portion of the population that has been effectively tested. In Figures 1(b) and 1(c), we can observe the accuracy of the curve fitting. Note that the time series present sudden changes between two consecutive points that make the compartmental model not be able to describe its behavior more accurately. This is clear, mainly, from day 85 in the data referring to the number of dead individuals. Figure 1(a) presents the evolution of susceptible population during the epidemic. As expected, after the maximum value observed for the number of infected individuals (approximately between days 20 and 30), the number of susceptible individuals decreases.
The influence of each parameter of the model on the objective function is assessed by perturbing the best solutions presented in Table 1, aiming to show the variation of F in the vicinity of the optimal value of each parameter. The vector of optimal parameters is denoted by θ * = ðβ * , γ * , I * 0 , ρ * Þ. For each parameter, 100 equally spaced points are evaluated, in the range ½ð1 − τÞθ * k , ð1 + τÞθ * k , for k = 1, ⋯, 4, where τ = 0:25. Figure 2 shows the influence of each parameter in relation to the objective function defined by Equation (19). Note that, in the analyzed range, in fact, the optimal values of the corresponding parameters represent the minimum 4 Computational and Mathematical Methods in Medicine value of F. This indicates that, at least locally, the optimal parameters obtained represent the best fit in relation to the analyzed data. In addition, it is possible to notice that β is the parameter that presents greater sensitivity in relation to F, in the proposed analysis. In order to analyze the sensitivity of the design variables, a multiobjective optimization problem is formulated. In the robust inverse problem, Equation (19) must be minimized and the noise parameter ðδÞ is maximized. By solving this new problem, it is possible to assess the parameters of the compartmental model in the presence of uncertainty. Thus, the adjustment of the time series of data considers possible variations caused by external factors, such as underreporting of infected and dead individuals. Figure 3 shows the Pareto curve obtained for the robust inverse problem, calculated by the MOSFS method, using the Effective Mean approach Table 1: Results of the deterministic inverse problem obtained in 20 executions of the SFS method, with t f = 95 days.

Computational and Mathematical Methods in Medicine
with H = 50 random samples, noise parameter varying in the range 0 ≤ δ ≤ 0:1, and the same parameters adopted in the previous problem.
In order to compare the obtained results by using the MOSFS (population size equals to 25 and maximum number of iterations equals to 250), three evolutionary multiobjective optimization algorithms are considered: Nondominated Sorting Genetic Algorithm (NSGA-II) [2], Multiobjective Optimization Differential Evolution algorithm (MODE) [13], and Multiobjective Optimization Firefly Algorithm (MOFA) [9]. For NSGA-II, we set the population size equals to 25, maximum number of iterations equals to 250, crossover probability equals to 0.8, and mutation probability equals to 0.01. For MODE, we adopt population size equals to 25, maximum number of iterations equals to 250, crossover probability equals to 0.9, and perturbation rate equals to 0.9. For MOFA, the population size is equal to 25, maximum number of generations equals to 250, and maximum attractiveness value and absorption coefficient both equal to 0.9. The stopping criterion used is the maximum number of iterations. The Effective Mean approach runs with H = 50 random samples and noise parameter varying in the range 0 ≤ δ ≤ 0:1. Figure 3 shows the Pareto curve obtained for the robust inverse problem considering different multiobjective optimization strategies.
The analysis of Figure 3 shows that, as the noise parameter δ increases, the value of the objective F also increases. Initially, note that the nominal solution is equivalent to the point on the Pareto curve that δ = 0 (see Table 1). Increasing F means that the associated fitted curve departs from the actual data as δ increases. Such displacement is caused by noise in the design variables, which can be understood as the existing uncertainties in the real data. In addition, in this figure, we can observe a similar behavior for all algorithms. Thus, the proposed multiobjective strategy was able to obtain the Pareto curve when compared with other algorithms. Table 2 presents some highlighted points of the Pareto curve (points A, B, and C), as shown in Figure 3.     Figure 3: Pareto curve of the robust inverse problem. 6 Computational and Mathematical Methods in Medicine Figure 4 shows the nonnormalized profiles corresponding to the parameters of points A, B, and C, as shown in Table 2. Each of these three points presents a different compromise in relation to the objectives of the robust inverse problem. Point A, represented by a circle, presents an extreme compromise in relation to F, that is, it does not prioritize robustness. On the other hand, point C, represented by a diamond, has an extreme compromise to robustness. In turn, point B, represented by a triangle, presents an intermediate compromise between both objectives. Figure 4 also shows the profile which is equivalent to the nominal inverse problem solution ðδ = 0Þ, represented by the solid line.
Especially in relation to the curve of infected people as a function of time (Figure 4(b)), note that the solution of the robust inverse problem provides profiles which are shifted in relation to the result corresponding to the nominal case. Such displacements tend to become more pronounced when the required level of robustness increases. Thus, the profiles of the susceptible, dead, and recovered compartments are also shifted, in order to maintain NðtÞ = SðtÞ + IðtÞ + DðtÞ + RðtÞ constant.

Conclusions
In this work, we proposed and solved two inverse problems (nominal and robust) to simulate the dynamic behavior of COVID-19 considering real data from China. Stochastic Fractal Search algorithm was employed to solve the nominal Table 2: Some points of the robust inverse problem obtained with the MOSFS method. Points A, B, and C are highlighted in Figure 3.    Table 2. The profile represented by the solid line corresponds to the nominal case, whose parameters are shown in Table 1.   7 Computational and Mathematical Methods in Medicine inverse problem. We also proposed an extension of SFS in the multiobjective context to solve the robust inverse problem. In order to assess uncertainties, the mean effective concept was considered. The parameters of the compartmental SIDR model were determined and analyzed considering both nominal and robust context. In order to analyze the influence of uncertainties, a multiobjective optimization problem was formulated and solved. This problem considers the minimization of deviations associated with the experimental data and calculated values considering the proposed model and the maximization of the robustness parameter. In general, the solution of the proposed multiobjective problem demonstrates that the increase in the noise parameter implies an increase in the value of the objective function. The use of the proposed robust approach to estimate the compartmental model parameters demonstrates the importance of incorporating a methodology to assess the robustness during the solution of the proposed inverse problem. Finally, the use of a mathematical model associated with optimization tools may contribute in the future to the study and development of strategies to understand the dynamic behavior of COVID-19.

Data Availability
The data used to support the findings of this study are available from the Worldometer's COVID-19 Data (https://www .worldometers.info/coronavirus/country/china/).