A Method for Parameters Estimation in a Dynamical Model of Ebola Virus Transmission in Sierra Leone

Ebola is an infectious virus, first identified in 1976, that causes Ebola hemorrhagic fever in humans and other primates. The Ebola virus outbreak in West Africa in 2014 was the largest ever recorded. Many researchers use mathematical models to analyze the characteristics of infectious diseases. However, many parameters in such models cannot be fully estimated. To ease this difficulty, we propose an approach to parameter estimation based on the genetic algorithm (GA). GA uses natural selection of the fittest to search for the optimal solution of the model. The least residual sum of squares is used as the fitness function to measure the performance of GA in parameter estimation. Moreover, we use a dynamical model and real data on Ebola in Sierra Leone to verify the validity of GA. The experimental results indicate that GA is strongly competitive with the classical methods and is a feasible method for estimating the parameters of infectious disease models.


Introduction
Ebola virus belongs to the family Filoviridae and is considered a prototype pathogen of viral hemorrhagic fever [1]. The virus was first detected in the Ebola river basin in southern Sudan and Congo in 1976 [2][3][4][5][6][7][8][9]. Since its discovery, only four species of the virus have been found to cause human disease, namely, Zaire ebolavirus, Tai Forest ebolavirus, Sudan ebolavirus, and Bundibugyo ebolavirus [10]. The Reston virus causes disease only in animals, not in humans. The source of the Ebola virus is still unknown. Researchers found evidence of asymptomatic Ebola virus infection in three species of fruit bats, which suggests that bats are the most likely source of the deadly virus [11]. The bats could carry Ebola virus to other animals and even humans [12][13][14].
Ebola virus is transmitted through saliva, urine, and other body fluids [15,16]. People can become infected through direct contact with body fluids that carry the virus, which enters the body through the nose, the mouth, the eyes, and damaged skin [17]. Humans become infected after contact with blood, body fluids, or infected fruit bats, as well as through sexual contact [18].
Since there were no effective treatments or approved vaccines at the time, the management of Ebola virus was limited to barrier methods and palliative care to suppress transmission [19]. A large-scale Ebola outbreak occurred in West Africa in 2014, mainly in Guinea, Liberia, and Sierra Leone. The number of confirmed cases was far greater than in any previous outbreak [10]. The lack of effective preventive measures at the time resulted in more people being infected with the Ebola virus [20]. In [21], the authors investigated the effectiveness of small interfering RNA treatments for Ebola-infected patients. RNA interference can suppress the expression of viral genes and is therefore effective in suppressing Ebola virus replication. In [22], the authors developed monoclonal antibodies against Ebola glycoprotein for the treatment of people infected with Ebola virus. In addition, some researchers used Sierra Leone's disease data to study mathematical models of Ebola virus, predict the progress of the epidemic, and propose preventive control measures and recommendations [5][23][24][25].
Research on infectious diseases using dynamic models has become one of the important methods [26][27][28][29][30][31][32]. The propagation coefficients of the disease in the model directly affect the prediction results, so it is important to estimate them correctly. Classical parameter estimation methods include the Markov Chain Monte Carlo (MCMC) method and the least-squares method. The basic principle of the MCMC method is to construct a Markov chain from the joint posterior probability distribution of the model propagation coefficients, start the simulation from an arbitrary initial value, and run it until it converges to a stationary distribution, which determines the propagation coefficients [33][34][35]. There are many improved MCMC methods, such as those using sequential Monte Carlo (SMC) filter techniques to estimate the propagation coefficients in the model [36]. However, these traditional methods have two drawbacks. First, they are limited by the computational cost of high-dimensional nonlinear models and may take a long time to run; it is usually not easy to obtain high-precision results, and it is not possible to get all the propagation coefficients at once [37,38]. Second, the numerical estimation of the marginal probability distribution is difficult to achieve in a high-dimensional inversion model [39]. The least-squares method fits the simulated data to the real data by minimizing the sum of squared residuals [40]. Although it has a low computational cost and is general, it does not consider the uncertainty of the inverse problem solution, and the initial value of the propagation coefficient affects the efficiency of the algorithm. If the initial value is set close to the optimal propagation coefficient, the result is obtained quickly; if it is set far from the optimum, the running time of the algorithm increases [41].
In this paper, we present a method for solving inverse problems of differential equations based on GA. GA is widely used in parameter estimation and other fields, and it has been shown to be a reliable method for estimating parameters of nonlinear functions [42,43]. It is a powerful adaptive search technique that uses natural selection of the fittest to simulate the evolution process, and it can thereby solve optimization problems effectively [44]. When searching in high-dimensional models, GAs are superior to other traditional search techniques due to their simplicity, effectiveness, versatility, and robustness [45,46]. We use a GA with an adaptive mutation operator to estimate the parameters of differential equations. The advantage of this method is that all the parameters in a high-dimensional model can be estimated from a small amount of data, and all parameter combinations can be obtained quickly within a limited evolution process. In addition, GA can produce an effective combination of multiple propagation coefficients for reference in studying the propagation dynamics. The remainder of this paper is organized as follows. In Section 2, we introduce the transmission dynamical model of Ebola in Sierra Leone and the theory and process of GA. In Section 3, we estimate the values of the parameters in the dynamical model based on GA and validate the accuracy of the experimental results. Finally, we give the discussion and conclusions in Section 4.

A Dynamical Model of Ebola Virus Transmission in Sierra Leone. The time series of Ebola-confirmed case reports were collected from the World Health Organization (WHO) and the Ministry of Health of Sierra Leone. The data cover the Ebola outbreak in 14 regions of Sierra Leone and include the suspected cases ($I_S$), the probable cases ($I_P$), and the hospital-confirmed cases ($H$); they are thought to represent the best available data on the Ebola epidemic. Because hospitalization results from real infections, while suspected and probable cases may not all be converted into hospitalized cases, it is more accurate to use hospitalization cases to indicate the actual number of Ebola infections. We collected the newly infected cases for 34 weeks, from May 19th, 2014, to January 11th, 2015. More detailed data can be found in [47].
We used a GA based on an adaptive mutation operator. The propagation coefficients of the Ebola virus model are treated as the genetic target. They are binary-encoded, and then the genetic operators of random selection with elitism, multipoint crossover, and gene site mutation are used to simulate evolution and find the optimal solution. The evaluation index for the genetic process, that is, the fitness function, is the sum of squared differences between the fitted data and the real data. We estimated the parameters based on the dynamical model established in [6]. The optimal parameter set of each generation is recorded and compared with the optimal parameter set of the next generation; the better set is always saved, and when the evolution has completed, all the parameters in the model are obtained.
Based on analyses in the literature, this article divides the population in Ebola virus transmission into seven categories, namely, susceptible ($S$), exposed ($E$), suspected individuals who may be misdiagnosed ($I_S$), probable individuals ($I_P$), hospital-confirmed cases ($H$), individuals who may infect others at a funeral ($F$), and removed ($R$) [6]. Figure 1 depicts the transmission mechanism of the Ebola virus.
Consequently, we used the system of equations (1) to simulate the transmission dynamics of the Ebola virus in Sierra Leone; the biological meanings of the parameters are given in Table 1. We quantified the uncertainty of the parameter estimates and give the 95% confidence intervals in Table 1.

Genetic Algorithm. GA is a method for finding solutions based on the process of biological evolution [42]. The process applies random selection, crossover, and mutation in search of the individual with the best combination of genes. GA begins by initializing the propagation coefficients in model (1) in binary encoding. The encoding length is determined by the parameter range and accuracy. We use par to represent the parameter set:

$$\mathrm{par} = [\beta_1, \beta_2, \ldots, r_f].$$

Assuming that all propagation coefficients lie within [0, 1] and that the accuracy is 4 digits after the decimal point, the coding length $C_L$ is the smallest integer satisfying

$$(1 - 0) \times 10^4 \le 2^{C_L} - 1,$$

which gives $C_L = 14$, since $2^{13} - 1 = 8191 < 10^4 \le 2^{14} - 1 = 16383$. Therefore, a parameter can be represented by 14 binary bits, and par, consisting of all 21 parameters, needs to be encoded with 14 × 21 = 294 binary bits. We collected the disease status of Ebola for 34 weeks, as detailed above, so we set the initial population to 34 parameter sets. Next, we need to determine the fitness function; we take the residual sum of squares between the infected cases given by the model solution and the actual infected cases, which is to be minimized:

$$f(\mathrm{par}) = \sum_{t} \left( I(t) - \tilde{I}(t) \right)^2, \tag{3}$$

where $I(t)$ represents the infected cases in the model at time $t$, $\tilde{I}(t)$ represents the actual infected cases at time $t$, and the sum runs over the observed weeks. Since $f$ measures error, a parameter set with a smaller value of $f$ is regarded as fitter. We use the fitness function to evaluate the initial population and give the initial fitness value of each parameter set.

GA mainly includes three genetic operators: selection, crossover, and mutation. Selection applies the selection operator to the group. Its purpose is to pass the optimized parameters directly to the next generation or to generate a new par for the next generation through pairing and crossover. Selection operations are based on the fitness evaluation of the parameter sets in the population. Here we adopt random selection combined with elitism: we copy the parameter sets with higher fitness once, replace the ones with the least fitness, and retain the best parameters of each generation. This is the elite strategy; we then randomly select parameter sets for crossover and mutation.
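The encoding and fitness evaluation described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names `coding_length`, `decode`, `decode_chromosome`, and `fitness` are hypothetical, and the linear mapping from a bit string to a value in [0, 1] is one common convention that the text does not spell out.

```python
def coding_length(lower, upper, decimals):
    """Smallest bit count L with (upper - lower) * 10**decimals <= 2**L - 1."""
    steps = round((upper - lower) * 10 ** decimals)
    return steps.bit_length()


def decode(bits, lower=0.0, upper=1.0):
    """Map a list of 0/1 genes linearly onto the interval [lower, upper]."""
    value = int("".join(str(b) for b in bits), 2)
    return lower + (upper - lower) * value / (2 ** len(bits) - 1)


def decode_chromosome(chromosome, n_params, n_bits):
    """Split one long bit string into n_params decoded parameter values."""
    return [
        decode(chromosome[i * n_bits:(i + 1) * n_bits])
        for i in range(n_params)
    ]


def fitness(model_cases, observed_cases):
    """Residual sum of squares between model output and data (smaller is fitter)."""
    return sum((m - o) ** 2 for m, o in zip(model_cases, observed_cases))


n_bits = coding_length(0.0, 1.0, 4)   # 14 bits per parameter
total_bits = n_bits * 21              # 294 bits for the 21 parameters
```

With 14 bits per parameter the grid spacing is 1/16383, which is finer than the requested four decimal digits.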
The crossover operator plays a key role in GA. It is mainly divided into single-point crossover, two-point crossover, and multiple-point crossover.
The most commonly used is single-point crossover: a crossover point is set randomly in the parameter string, and when crossover is performed, the partial structures of the two parameter sets before or after this point are exchanged, generating two new parameter sets. An example of single-point crossover is shown in Figure 2. Because our binary parameter string is very long, single-point crossover cannot meet our needs, so we chose multiple-point crossover to increase the diversity of parameters. The mutation operator changes certain gene positions of the parameter strings in the population; for example, 0 becomes 1, and 1 becomes 0. An example of mutation is shown in Figure 3. We adopted gene locus mutation with an adaptive mutation operator; that is, each gene position is mutated with a certain mutation probability, and the mutation probability is adjusted adaptively according to the fitness of the parameter set. When the difference between the fitness of a parameter set and the average fitness of the population is small, the parameter sets are close to each other, which is not conducive to the next crossover. Therefore, it is necessary to increase the mutation probability in that case and to reduce it when the difference is large. The mutation probability is usually between 0.001 and 0.1; because it is small, mutation does not easily destroy the genes of the dominant parameters, yet it allows the algorithm to escape when it falls into a locally optimal solution. The mutation characteristic of GA lets the solution process search randomly over the entire space where the solution may exist and maintains the diversity of the population, so the global optimal solution can be obtained to a certain extent.
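The two operators described above can be sketched as follows. The text does not give the exact adaptation rule, so the linear rule in `adaptive_mutation_rate` is an assumption made for illustration, as are all function names and the `spread` argument (a hypothetical scale, for example the largest deviation from the mean fitness in the population). Only the bounds 0.001 and 0.1 come from the text.

```python
import random


def multipoint_crossover(parent_a, parent_b, n_points, rng):
    """Exchange alternating segments of two equal-length bit lists."""
    assert len(parent_a) == len(parent_b)
    # Distinct interior cut points, plus the end of the string as a sentinel.
    cuts = sorted(rng.sample(range(1, len(parent_a)), n_points))
    child_a, child_b = list(parent_a), list(parent_b)
    swap, prev = False, 0
    for cut in cuts + [len(parent_a)]:
        if swap:
            child_a[prev:cut] = parent_b[prev:cut]
            child_b[prev:cut] = parent_a[prev:cut]
        swap, prev = not swap, cut
    return child_a, child_b


def adaptive_mutation_rate(fit, mean_fit, spread, p_min=0.001, p_max=0.1):
    """Assumed rule: rate is high near the population mean and low far from it."""
    if spread <= 0:
        return p_max
    closeness = 1.0 - min(1.0, abs(fit - mean_fit) / spread)
    return p_min + (p_max - p_min) * closeness


def mutate(bits, rate, rng):
    """Flip each gene independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]
```

Passing an explicit `random.Random` instance keeps runs reproducible, which helps when comparing repeated experiments.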
Next, we solve model (1): we decode the parameters into decimal form and substitute them into the model solution to get the estimated number of confirmed-infection cases, that is, hospitalization cases. Fitness function (3) is used to fit the real infected cases. Then we find the parameters with the highest fitness in this generation, that is, the parameters with the least error, and keep them in the next-generation genetic process. Afterwards, we cyclically execute selection, crossover, mutation, and evaluation of the new parameter sets until the maximum number of iterations is reached. The general steps of using GA to estimate the parameters of the Ebola model are as follows:
Step 1: consider the parameter to be estimated as a gene chromosome, define the parameter using binary coding, and then initialize the population.
Step 2: assign a fitness value to each parameter set using equation (3). Starting from the second generation, the parameter sets are ranked from small to large according to their fitness values, and the two parameter sets with the greatest fitness are duplicated once to replace the two parameter sets with the smallest fitness.
Step 3 (selection process): add randomly initialized parameter sets to the population to increase population diversity, and then randomly select two parameter sets as the parent parameter sets.
Step 4 (crossover): generate two new offspring parameter sets by crossing the two parents at multiple points.
Step 5 (mutation): apply the gene locus mutation described above in combination with the adaptive mutation operator.
Step 7: convert the parameter sets from binary to decimal, substitute them into the solution of equation (1), obtain the predicted value of the diseased cases, and use equation (3) to evaluate the fitness value of each new parameter set to obtain an optimal parameter set.
Step 8: when the fitness of the new offspring produced by genetic manipulation is higher than that of the parents, the new parameter sets replace the parents and are inserted into the parent population for the next genetic manipulation. If the optimal individual remains unchanged for 30 consecutive generations, multiple-point crossover and mutation are carried out.
Step 9: save the best parameter set of this generation.
Step 10: if the maximum number of iterations has not been reached, return to Step 2.
The above steps are executed iteratively until the termination condition is reached. Parameter estimation is finished when the genetic operations are completed.
The parameter values to be used in the model are the optimal solution of the last generation of parameters, and the specific values are shown in Table 1. With these values, we can study the propagation dynamics and preventive measures of Ebola.
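The overall loop in the steps above can be illustrated with a compact sketch. It makes several simplifying assumptions that are not from the paper: model (1) is replaced by a hypothetical two-parameter curve `toy_model`, the mutation rate is fixed rather than adaptive, the random immigrants of Step 3 and the stagnation rule of Step 8 are omitted, and all names are illustrative. The population size (34), crossover probability (0.8), and initial mutation probability (0.01) follow the values reported in the text.

```python
import math
import random

N_BITS = 14  # bits per parameter, as in the text


def decode(chrom, n_params):
    """Binary to decimal: each 14-bit block becomes a value in [0, 1]."""
    top = 2 ** N_BITS - 1
    return [
        int("".join(map(str, chrom[i * N_BITS:(i + 1) * N_BITS])), 2) / top
        for i in range(n_params)
    ]


def toy_model(params, times):
    """Hypothetical stand-in for model (1): y(t) = a * exp(-b * t)."""
    a, b = params
    return [a * math.exp(-b * t) for t in times]


def error(chrom, times, observed):
    """Residual sum of squares, the quantity minimised by equation (3)."""
    predicted = toy_model(decode(chrom, 2), times)
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def crossover(pa, pb, n_points, rng):
    """Multiple-point crossover: swap alternating segments."""
    cuts = sorted(rng.sample(range(1, len(pa)), n_points)) + [len(pa)]
    ca, cb, swap, prev = list(pa), list(pb), False, 0
    for cut in cuts:
        if swap:
            ca[prev:cut], cb[prev:cut] = pb[prev:cut], pa[prev:cut]
        swap, prev = not swap, cut
    return ca, cb


def mutate(chrom, rate, rng):
    """Gene locus mutation with a fixed rate (adaptive rule omitted here)."""
    return [1 - g if rng.random() < rate else g for g in chrom]


def run_ga(times, observed, pop_size=34, generations=200,
           p_x=0.8, p_m=0.01, seed=0):
    rng = random.Random(seed)
    length = 2 * N_BITS
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Rank by error, smallest first, then copy the two best over the two worst.
        pop.sort(key=lambda c: error(c, times, observed))
        pop[-2:] = [list(pop[0]), list(pop[1])]
        children = [list(pop[0])]  # the current best always survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(pop, 2)
            if rng.random() < p_x:
                a, b = crossover(a, b, 3, rng)
            children += [mutate(a, p_m, rng), mutate(b, p_m, rng)]
        pop = children[:pop_size]
        history.append(min(error(c, times, observed) for c in pop))
    best = min(pop, key=lambda c: error(c, times, observed))
    return decode(best, 2), history


times = list(range(10))
observed = toy_model([0.6, 0.3], times)  # synthetic data, not Ebola data
params, history = run_ga(times, observed, generations=50)
```

Because the best chromosome is carried over unchanged, the recorded best error can never increase from one generation to the next; this is the behaviour the elite strategy is meant to guarantee.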

Main Results
There are many methods for parameter estimation, including the Markov Chain Monte Carlo (MCMC) method and the least-squares method. It is not easy to analyze and estimate the parameters in an infectious disease model because many of them cannot be fully estimated. Consequently, we propose to estimate the parameters in model (1) using GA, as described in Section 2. Algorithm 1 shows the scheme of GA used for parameter estimation of the Ebola model. In this algorithm, par denotes a set of parameters; $a$ and $b$ represent two parameter sets randomly selected from par and crossed according to the crossover probability $p_x$ to get $a'$ and $b'$. In mutation, $a'$ and $b'$ are changed to $a$ and $b$ by gene locus mutations, according to the mutation probability $p_m$. IC denotes the initial value of each variable in equation (1), and $t$ indicates the time of virus transmission. The counter "count" records the number of successive generations in which the optimal value has not changed.
In this study, we used different genetic operators and the fitness function (3) to fit the real hospital-diagnosed cases. We conducted 70 experiments and selected a set of parameters that performed well. The results are listed in Table 1. After experimental verification, we chose the crossover probability $p_x$ to be 0.8 and the initial mutation probability $p_m$ to be 0.01. The number of generations, Constant, is 3000. The fitting result for the cumulative number of cases is shown in Figure 4. Figure 5 shows the evolution of the optimal value of each generation in GA. As the number of generations increases, the error between the model solution and the real data gradually decreases, which means that the fitness of the model increases. When the maximum number of generations is reached, a set of near-optimal parameters is obtained. Since some error always remains, we regard this near-optimal solution as the optimal solution. The inset in Figure 5 is an enlarged view of the early generations. This set of parameters performs well and converges quickly, at around 200 generations as the figure shows; however, because of the instability of GA, convergence to the optimal value sometimes takes much longer. Therefore, we fixed the number of generations at 3000 throughout the experiments. It can be seen from the figure that GA reduces the error quickly in the early stage and converges very fast, which shows the effectiveness of our algorithm. In [5], the parameters $\beta_I = 0.0498$, $\beta_H = 0.0225$, and $\beta_F = 0.0013$ are given, and, in [23], the parameters $\beta_I = 0.128$, $\beta_H = 0.08$, $\beta_F = 0.111$, and $\delta_1 = \delta_2 = 0.75$ are given.
Meanwhile, the parameters $\beta_F = 0.489$, $\delta_1 = 0.8$, and $\delta_2 = 0.4$ are given in [6]. Some of our propagation coefficients differ considerably from those in other papers; because the deterministic models are different and our model has more compartments, we cannot directly compare them with those in other papers. We are uncertain about patients in the exposed period, so the propagation coefficient associated with it is uncertain, and we can only use GA to solve for each parameter value. Thus, we compare only some important parameters. In this paper, $\delta_1 = 0.5141$ and $\delta_2 = 0.6214$ are close to the values in the above literature to some extent. We obtain the basic reproduction number of Ebola virus in Sierra Leone, $R_0 \approx 1.895$, calculated in the same way as in Xia et al.'s work [6]; it is broadly consistent with $R_0 \approx 1.7$ given in [5] and $R_0 \approx 1.78$ given in [23]. The results show that GA can accurately estimate all the parameters in the model and that the data are fitted well. Another advantage of GA is that the parameter precision can be set freely, which is very useful for obtaining high-precision parameters, although the resulting length of the parameter set must be considered. It can be seen that GA is a feasible method for parameter estimation.

Algorithm 1
(1) Parameter length $C_L$ is determined according to the parameter range.
(2) Each parameter is represented by a chromosome of length $C_L$ to get a complete parameter set par = [$\beta_1$, $\beta_2$, ..., $r_f$].
...
Sort par by allocated fitness in ascending order.
...
(21) par(a) = par(a).
(22) end if
(23) if count > 30 do
(24) Multiple crossover and variation.
(25) end if
(26) Find the best parameter set in this generation, best_p(i), that satisfies best_p(i) = min f(par).
(27) end for
(28) Find the best parameter set par_best that satisfies par_best = best_p(Constant).
Because we use a given mathematical model, we only need to take the actual disease data as test data and apply them to model (1) to get a set of near-optimal parameters. We set the range of each parameter to [0, 1], randomly generate the initial parameters as the input of the model, and use GA to obtain a set of parameters that fit the model better. GA can provide a variety of parameter combinations for researchers' reference. However, some parameters may suffer from overfitting, so we conducted 70 experiments to select a set of parameters that is more realistic. We calculated the 95% confidence intervals for the best set of parameters in the 70 experiments, and almost all parameters are within the confidence intervals, which also supports the validity of GA for parameter estimation. The confidence intervals for all the parameters are shown in Table 1.
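The text does not state how the 95% confidence intervals were computed from the repeated runs. One simple possibility, shown here purely as an assumption, is a normal-approximation interval for the mean of one parameter's estimates across runs. The function name `normal_ci95` and the sample values are hypothetical and are not results from the paper.

```python
import math
import statistics


def normal_ci95(samples):
    """Normal-approximation 95% interval for the mean of repeated estimates."""
    mean = statistics.mean(samples)
    # Standard error of the mean, using the sample standard deviation.
    half_width = 1.96 * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean - half_width, mean + half_width


# Hypothetical estimates of one parameter from five independent GA runs.
runs = [0.50, 0.52, 0.51, 0.49, 0.53]
low, high = normal_ci95(runs)
```

A percentile interval over the runs would be an alternative when the estimates are clearly not normally distributed, which the discrete GA output may well be.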

Discussion and Conclusion
This work proposed a GA parameter estimation method based on an adaptive mutation operator, which could be applied to biomathematical models and to differential equations in other fields. Through GA's adaptive search, the parameters in various models can be found effectively, and multiple parameter combination schemes can be given, which reduces the manual adjustment of parameters by researchers and provides an effective reference for scientific research. In addition, in the study of infectious diseases, the complexity of the model often makes it difficult to obtain all of the parameters, and the basic reproduction number $R_0$, used to evaluate whether an infectious disease will break out, must be calculated from the parameters; the parameters also indicate the transmission dynamics and scale of transmission of infectious diseases. Thus, parameter estimation is of primary importance, and GA can be used to find all the parameters in the model effectively, which makes up for shortcomings of the traditional methods such as long calculation time and slow convergence. The method we have proposed was evaluated in the experiments, where its performance reached the desired level. Finally, GA can be applied not only to infectious disease models but also to other mathematical and physical models, and it offers a new idea for parameter estimation.
Since the initial population of GA is assigned randomly, the output is unstable; we cannot quantify the uncertainty due to the discrete distribution of the output parameters, which is a common problem of GA and needs to be improved. On the other hand, the fitness function is an important factor in determining the quality of genetic evolution, so it is very important to select a suitable fitness function. Finally, a very important step in GA is to find the solution of the model, so GA may not be suitable for equations that cannot be solved, but it is applicable to most models. There are many improved genetic algorithms, and we can improve the existing algorithms and expand their scope of application. GA can solve specific problems with only a small amount of data, and the corresponding fitness function can be used for searching. Therefore, it is a general-purpose algorithm used in many fields. In function optimization, GA can estimate not only the parameters of biomathematical models but also the kinetic parameters of microorganisms. It can also solve for the performance parameters of nonlinear physical problems. In addition, GA performs well in path planning, cloud computing task scheduling, communication network design, image feature extraction, and other fields. Furthermore, some studies combine GA with machine learning methods such as neural networks.
In this paper, we introduce the basic process of the adaptive mutation genetic algorithm, describe how to use GA to estimate the parameters of the Ebola virus model in Sierra Leone, and give the fitted curve for actual infections and the genetic iterative process. In addition, we offer a new idea for parameter estimation in other research fields, such as dynamical models of disease transmission [48][49][50] or predator-prey interactions [51][52][53] with spatial effects [32][54][55][56] in the form of reaction-diffusion equations.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.