The aim of the present study is to apply simple ODE models to the spread of emerging infectious diseases and to show the importance of model selection in estimating parameters, the basic reproduction number, the turning point, and the final size. To quantify the plausibility of each model given the data and a set of four candidate models (Logistic, Gompertz, Rosenzweig, and Richards), Bayes factors are calculated, and precise estimates of the parameters of the best-fitting model and of the key epidemic characteristics are obtained. In particular, for Ebola the basic reproduction numbers are 1.3522 (95% CI (1.3506, 1.3537)), 1.2101 (95% CI (1.2084, 1.2119)), 3.0234 (95% CI (2.6063, 3.4881)), and 1.9018 (95% CI (1.8565, 1.9478)), the turning points are November 7, November 17, October 2, and November 3, 2014, and the final sizes until December 2015 are 25794 (95% CI (25630, 25958)), 3916 (95% CI (3865, 3967)), 9886 (95% CI (9740, 10031)), and 12633 (95% CI (12515, 12750)) for West Africa, Guinea, Liberia, and Sierra Leone, respectively. The main results confirm that model selection is crucial in evaluating and predicting the important quantities describing emerging infectious diseases, and that arbitrarily picking a model without considering alternatives is problematic.
Emerging and reemerging infectious diseases such as severe acute respiratory syndrome (SARS) in 2003 [
Although the susceptible-infective-removal (SIR) compartmental model is commonly used to describe the transmission dynamics of an infectious disease, it cannot be used when only the cumulative infected population is considered and the aim is to capture the temporal variations of an outbreak, such as the turning point, that is, the point in time at which the rate of accumulation changes from increasing to decreasing. Several models have been proposed to estimate the basic reproduction number, turning point, and final size from cumulative case counts; some are based on purely empirical relationships, while others have a theoretical basis and are formulated as differential equations. The simplest and most commonly applied of these infectious disease models is the Richards model [
The most common approach in infectious disease data analyses with simple ODE models is to select one model, usually the Richards model, based on the shape of the desired curve and on biological assumptions. A single wave of infections, consisting of a single peak of high incidence, an S-shaped cumulative epidemic curve, and a single turning point, can then be fitted to the data using the selected model. Inference and estimation of parameters and their precision are based on the fitted model. The interesting questions are therefore as follows: Can the Richards model effectively predict the growth of the cumulative infected population? How should the best model for fitting emerging infectious disease data be selected? Is it possible to predict the turning point and final size and to estimate effectively the basic reproduction number, all of which are important in disease control and management?
The traditional approaches of hypotheses testing, when applied to model selection, have been often found to be mediocre [
In Section
We employ the data on laboratory-confirmed cases of pandemic A/H1N1 influenza admitted to the 8th Hospital of Xi’an, the Province’s Public Health Information System [
As mentioned in Section
Secondly, the turning point (or the inflection point of the cumulative case curve), defined as the time when the rate of case accumulation changes from increasing to decreasing (or vice versa), will be estimated for A/H1N1 in China and Ebola in different regions of West Africa. The turning point marks the moment at which the growth of the incidence changes from positive to negative, that is, the moment at which new cases begin to decline. Precisely estimating this point allows us to determine either the beginning of a new epidemic phase or the peak of the current epidemic phase, representing the point at which disease control activities take effect or the point at which an epidemic begins to wane naturally, as defined by Hsieh et al. [
The final size of an emerging infectious disease is another important quantity for public health, which is the likely magnitude of the outbreak, and it is often called the expected final size of the epidemic [
The principle of MCMC methods can be briefly described as follows: build a transition kernel
Given
move the chain to a new value
Take
where
with
Suppose that the observed data
A slightly more direct (and more common) approach to estimating posterior model probabilities using MCMC has been included in the model indicator
Besides the marginal posterior model probabilities
Logistic model is as follows:
Gompertz model is as follows:
Rosenzweig model is as follows:
Richards Model (the reverse Rosenzweig model) is as follows:
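As a reading aid, the four growth laws can be sketched in code. The forms below are the commonly quoted textbook versions and are an assumption here, as are the symbols `r` (intrinsic growth rate), `K` (final size), `theta` (shape exponent), and the helper names `integrate` and `richards_turning_point`; the notation used in this paper may differ.

```python
import math

# Assumed textbook right-hand sides dC/dt for the cumulative case count C(t).
# r, K and theta are illustrative symbols, not taken from the paper.
def logistic(c, r, K):
    return r * c * (1.0 - c / K)

def gompertz(c, r, K):
    return r * c * math.log(K / c)

def rosenzweig(c, r, K, theta):
    return r * c * ((K / c) ** theta - 1.0)

def richards(c, r, K, theta):
    # With theta = 1 this reduces to the logistic law.
    return r * c * (1.0 - (c / K) ** theta)

def integrate(rhs, c0, days, steps_per_day=100):
    """Forward-Euler integration; returns C at day 0, 1, ..., days."""
    dt = 1.0 / steps_per_day
    c, out = c0, [c0]
    for _ in range(days):
        for _ in range(steps_per_day):
            c += dt * rhs(c)
        out.append(c)
    return out

def richards_turning_point(r, K, theta, c0):
    """Inflection time of the assumed Richards curve started at C(0) = c0."""
    a = (K / c0) ** theta - 1.0
    return math.log(a / theta) / (r * theta)

# Hypothetical parameter values, chosen only for illustration.
curve = integrate(lambda c: richards(c, 0.3, 100.0, 0.4), 1.0, 60)
```

All four laws vanish at `C = K`, so `K` plays the role of the final size in each case; the models differ only in how quickly growth slows as `C` approaches `K`.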
For convenience, we denote, respectively, the above four models as
Let
Further, we assume that the set of parameter vectors is
The step of model selection with the Metropolis-Hastings algorithm is based on a proposal for a move from model
Let the initial value be
Propose a new model
Accept the proposed move (from
Under the usual regularity conditions, this MH algorithm will produce samples. Provided that the sampling chain for the model indicator mixes sufficiently well, the posterior probability of model
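The frequency-based estimate can be sketched as follows, assuming that the sampler has already produced a chain of model indicators. The function names, the uniform prior, and the example chain are hypothetical; the Bayes factor is computed as the ratio of posterior odds to prior odds.

```python
import math
from collections import Counter

def posterior_model_probs(chain, n_models):
    """Estimate P(model k | data) as the fraction of iterations spent in model k."""
    counts = Counter(chain)
    n = len(chain)
    return [counts.get(k, 0) / n for k in range(n_models)]

def bayes_factor(post, prior, i, j):
    """B_ij = (posterior odds of model i against model j) / (prior odds)."""
    num = post[i] * prior[j]
    den = post[j] * prior[i]
    if den == 0.0:
        # 0/0 is undefined; a positive numerator over zero is reported as infinity.
        return math.nan if num == 0.0 else math.inf
    return num / den

# Hypothetical indicator chain over four models (0..3), for illustration only.
chain = [0] * 573 + [3] * 427
prior = [0.25, 0.25, 0.25, 0.25]
post = posterior_model_probs(chain, 4)
print(bayes_factor(post, prior, 0, 3))
```

A model that is never visited has an estimated posterior probability of zero, which produces a Bayes factor of zero, infinity, or an undefined value depending on which side of the ratio it appears.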
Evidence categories for the Bayes factor (Jeffreys, 1961 [
Bayes factor  Interpretation 


Decisive evidence for model 

Very strong evidence for model 

Strong evidence for model 

Substantial evidence for model 

Anecdotal evidence for model 

No evidence 

Anecdotal evidence for model 

Substantial evidence for model 

Strong evidence for model 

Very strong evidence for model 

Decisive evidence for model 
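The evidence categories can be applied mechanically to a computed Bayes factor. The sketch below assumes the cut-offs 3, 10, 30, and 100 that are commonly quoted for this kind of Jeffreys-type scale; these thresholds, the labels "model 1" and "model 2", and the function name are assumptions and should be checked against the table.

```python
import math

def jeffreys_category(bf):
    """Map a Bayes factor B_12 to a verbal evidence category.

    The thresholds 3, 10, 30 and 100 are assumed, not taken from the text.
    """
    if math.isnan(bf) or bf < 0:
        raise ValueError("Bayes factor must be a non-negative number")
    if bf == 1:
        return "No evidence"
    favoured = "model 1" if bf > 1 else "model 2"
    # Fold values below 1 onto the same scale by taking the reciprocal.
    if bf > 1:
        strength = bf
    else:
        strength = math.inf if bf == 0 else 1.0 / bf
    if strength > 100:
        label = "Decisive"
    elif strength > 30:
        label = "Very strong"
    elif strength > 10:
        label = "Strong"
    elif strength > 3:
        label = "Substantial"
    else:
        label = "Anecdotal"
    return f"{label} evidence for {favoured}"

print(jeffreys_category(144.8))
print(jeffreys_category(0.28))
```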
Based on the above procedures, we carry out model selection as follows. First, we run the MCMC procedure with an adaptive MH algorithm to obtain Markov chains of 500000 samples for each parameter of each model. The best model is then selected dynamically from the Markov chains of all parameters as follows.
Let the initial value be
Generate a new model
Repeat for
Evaluate the acceptance probability of the move (from
with
Let
Return the values
The estimation of the corresponding Bayes factor is
In order to validate the proposed model selection algorithm, we generate the time series from a given model with known parameter values. To do this, we fix all parameter values of Richards model as
Using the simulated data set
The corresponding Bayes factor
Model  Logistic  Gompertz  Rosenzweig  Richards
Data source: Richards model
Logistic  1  —  —  —
Gompertz  Inf  1  Inf  —
Rosenzweig  Inf  —  1  —
Richards model  8866.4  Inf  144.8  1
AIC  260  241  245  231

Data source: Gompertz model
Logistic  1  —  —  —
Gompertz  Inf  1  19.8855  13.0241
Rosenzweig  Inf  —  1  —
Richards model  Inf  —  Inf  1
AIC  618  440  601  556
Data source  Parameter  Mean  Std.  MC_err  Tau  Geweke 

Richards model 

0.3095  0.0335  3.9372 
95.36  0.9923 

100.26  2.8313  0.0255  85.563  0.9991  

0.3914  0.0579  6.7027 
95.533  0.9919  


Gompertz model 

0.1504  0.0062  9.2746 
60.727  0.9983 

100  1.8169  0.0249  62.058  0.9991 
— means a very small number.
Inf indicates a sufficiently big number.
A repeat of the above procedure by using the simulated data
The above results show that the proposed model selection methods based on Bayes factor and MCMC method can help us to choose the optimal model. In Figure
Model fitting of simulated data generated from the Richards model and the Gompertz model using the four candidate models. The data in (a) and (b) are produced from the Richards model; the data in (c) and (d) are produced from the Gompertz model. The simulated data points are shown as black dots. The curves represent the fits of the four models to the data points. The grey areas are the 95% confidence intervals of each curve.
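The validation experiment on simulated data can be sketched as follows. This is a minimal illustration under stated assumptions: the closed-form Richards curve corresponds to the textbook form dC/dt = rC(1 - (C/K)^theta), the noise is additive Gaussian, and every parameter value and function name is hypothetical rather than taken from the paper.

```python
import math
import random

def richards_curve(t, r, K, theta, c0):
    """Closed-form solution of the assumed Richards law with C(0) = c0."""
    a = (K / c0) ** theta - 1.0
    return K / (1.0 + a * math.exp(-r * theta * t)) ** (1.0 / theta)

def simulate(days, r, K, theta, c0, sigma, seed=0):
    """Daily cumulative counts with additive Gaussian noise (an assumption)."""
    rng = random.Random(seed)
    return [richards_curve(t, r, K, theta, c0) + rng.gauss(0.0, sigma)
            for t in range(days + 1)]

def sse(data, fitted):
    """Sum of squared errors between data and a candidate curve."""
    return sum((d - f) ** 2 for d, f in zip(data, fitted))

# Hypothetical "true" parameters used to generate the synthetic series.
data = simulate(60, r=0.3, K=100.0, theta=0.4, c0=1.0, sigma=1.0, seed=1)
true_curve = [richards_curve(t, 0.3, 100.0, 0.4, 1.0) for t in range(61)]
# Same r, K and c0 with theta = 1, i.e. a logistic curve.
logistic_curve = [richards_curve(t, 0.3, 100.0, 1.0, 1.0) for t in range(61)]
print(sse(data, true_curve) < sse(data, logistic_curve))
```

Because the generating parameters are known, a selection procedure can be judged by whether it recovers the generating model and parameter values close to the ones used.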
The 2009 influenza A/H1N1 pandemic outbreak in Shaanxi Province of mainland China started on 3 September. The majority of reported A/H1N1 cases were initially diagnosed in colleges and universities in early September 2009, when the universities began their fall semester, and the disease then spread to the communities in the middle of October 2009. The epidemic curve in Shaanxi Province was bimodal: a first, small wave lasted from around 3 September to 21 September, and a second, larger wave followed [
In this subsection, we apply the model selection procedure to the published cumulative case numbers of A/H1N1 from the 8th Hospital of Xi’an, where the majority of the confirmed cases in the province of Shaanxi in early September 2009 were isolated. The selection results are given in the first line of Table
The corresponding Bayes factor
Model  Logistic  Gompertz  Rosenzweig  Richards
Data source: H1N1
Logistic  1  Inf  Inf  1.34
Gompertz  0  1  /  —
Rosenzweig  0  /  1  —
Richards model  0.75  Inf  Inf  1
AIC  249  362  592  254

Data source: West Africa
Logistic  1  Inf  Inf  2.1528
Gompertz  0  1  /  —
Rosenzweig  0  /  1  —
Richards model  0.4645  Inf  Inf  1
AIC  5200  49500  1872800  5400

Data source: Guinea
Logistic  1  Inf  Inf  1.25
Gompertz  0  1  /  —
Rosenzweig  0  /  1  —
Richards model  0.8  Inf  Inf  1
AIC  1991  3427  18476  1998

Data source: Liberia
Logistic  1  Inf  Inf  —
Gompertz  0  1  /  —
Rosenzweig  0  /  1  —
Richards model 

Inf  Inf  1  
AIC  6308  6547  7980  2559

Data source: Sierra Leone
Logistic  1  —  —  —
Gompertz  102310  1  2.96  0.28
Rosenzweig  34750  0.34  1  0.095
Richards model  362940  3.55  10.48  1
AIC  15432  6251  7038  5400
— means a very small number.
Inf means a sufficiently big number.
0 means the probability of being chosen for model is zero.
/ means a no number (
(a) Model selection based on the cumulative case data from the 8th Hospital of Xi’an from 3 September to 21 September, using the last 2000 groups of parameters of the Markov chain; (b) model fitting of A/H1N1 data in Xi’an, 2009. The curves represent the fits of the four models to the data. The grey areas are the 95% confidence intervals of each curve. The cyan curve represents the Logistic model, the blue curve the Gompertz model, the red curve the Rosenzweig model, and the black curve the Richards model. Note that the cyan and black curves almost coincide.
To show the results of model selection intuitively, Figure
The estimations of basic reproduction number
The estimations of all parameters with respect to the best model.
Data source  Parameter  Mean  Std.  MC_err  Tau  Geweke 

H1N1 

0.1605  9.1570 
3.3003 
6.6007  0.9999 

1013  8.6356  0.0341  6.6724  0.9997  


West Africa 

0.0251  4.8634 
1.5757 
6.6144  0.9999 

25794  83.712  0.2631  6.6658  0.9999  


Guinea 

0.0159  6.1289 
2.1429 
6.7164  0.9999 

3916  26.131  0.1143  6.6456  0.9999  


Liberia 

0.0919  6.19 
6.1022 
44.625  0.9973 

9886  74.03  0.5253  29.833  0.9999  

0.2333  0.0225  2.0514 
39.52  0.9963  


Sierra Leone 

0.0536  1.0186 
4.0645 
12.058  0.9999 

12633  59.697  0.2866  11.61  0.9999  

0.3985  0.0149  6.4121 
12.063  0.9997 
The estimations of
Data source 

95% CI 

95% CI  Final size  95% CI 

H1N1  1.9005  (1.8869, 1.9142) 

(22, 24) 

(996, 1030) 
West Africa  1.3522  (1.3506, 1.3537) 

(226, 228) 

(25630, 25958) 
Guinea  1.2101  (1.2084, 1.2119) 

(237, 241) 

(3865, 3967) 
Liberia  3.0234  (2.6063, 3.4881) 

(121, 149) 

(9740, 10031) 
Sierra Leone  1.9018  (1.8565, 1.9478) 

(157, 174) 

(12515, 12750) 
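The step from a fitted growth rate to the quantities in this table can be illustrated with a short sketch. It assumes the relation R0 = exp(rT), which is often used with Richards-type models, where T is the generation time; this relation, the value T = 12 days, the 100-day offset, and the function names are assumptions made for illustration. The growth rate 0.0251 is the West Africa estimate from the parameter table, and 25 March 2014 is the first reporting date of the WHO Ebola data used in this study.

```python
import math
from datetime import date, timedelta

def basic_reproduction_number(r, generation_time):
    """R0 = exp(r * T); an assumed relation, with T the generation time in days."""
    return math.exp(r * generation_time)

def offset_to_date(start, day_offset):
    """Convert a turning point expressed in days since `start` to a calendar date."""
    return start + timedelta(days=day_offset)

# r is the West Africa growth-rate estimate; T = 12 days is a hypothetical value.
r0 = basic_reproduction_number(0.0251, 12.0)
print(round(r0, 4))

# Hypothetical turning point 100 days after the first report on 25 March 2014.
print(offset_to_date(date(2014, 3, 25), 100))
```

Since R0 depends exponentially on the product rT, an error in the assumed generation time affects the estimate as strongly as an error in the fitted growth rate.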
On June 18, 2014, an Ebola outbreak emerged in Africa. The outbreak, first reported in Guinea in December, 2013, has spread to neighboring Sierra Leone and Liberia. Ebola, characterized by fever, severe diarrhea, and vomiting, has a high fatality rate and meets the World Health Organization (WHO) criteria for a serious disease. Therefore, the main purpose of this subsection is to use the data sets reported by the WHO for the most seriously affected regions, namely, Guinea, Liberia, and Sierra Leone, from March 25, 2014, to May 3, 2015, to carry out model selection and parameter estimation and then to obtain the estimates of
The selection results are shown in Table
Comparison of the reported and model-predicted cases of Ebola based on the Richards model on June 14, 2015.
Source of data  Reported cases (number)  Predicted cases (number)  Rate of underestimation (−) or overestimation (+) by the model

West Africa  27305  25693  −5.9% 
Guinea  3674  3778  +2.8% 
Liberia  10666  9842  −7.7% 
Sierra Leone  12965  12515  −3.5% 
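The rates in the last column follow from the reported and predicted counts as (predicted − reported) / reported. The short sketch below recomputes them from the table values; the function and variable names are illustrative.

```python
def relative_error_percent(reported, predicted):
    """Signed relative error of the prediction, in percent of the reported count."""
    return 100.0 * (predicted - reported) / reported

# (reported, predicted) cumulative cases on 14 June 2015, taken from the table.
cases = {
    "West Africa": (27305, 25693),
    "Guinea": (3674, 3778),
    "Liberia": (10666, 9842),
    "Sierra Leone": (12965, 12515),
}

errors = {region: relative_error_percent(rep, pred)
          for region, (rep, pred) in cases.items()}
for region, err in errors.items():
    print(f"{region}: {err:+.1f}%")
```

A negative value means that the model underestimates the reported count, and a positive value means that it overestimates it.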
Model selection based on the cumulative Ebola cases for (a) West Africa, (b) Guinea, (c) Liberia, and (d) Sierra Leone, using the last 2000 groups of parameters of the Markov chain. The Logistic model and the Richards model are selected in (a) and (b), and the Richards model is selected in (c) and (d).
Model fitting of the 2014-2015 Ebola outbreaks in (a) West Africa, (b) Guinea, (c) Liberia, and (d) Sierra Leone. Data of the cumulative numbers of infected cases are shown as black dots. The curves represent the fits of the four models to the data. The grey areas are the 95% confidence intervals of each curve. The cyan curve represents the Logistic model, the blue curve the Gompertz model, the red curve the Rosenzweig model, and the black curve the Richards model.
The effects of different generation times on the basic reproduction number of Ebola in West Africa, Sierra Leone, Liberia, and Guinea. Dotted lines represent the 95% confidence interval of
In particular, for West Africa, the selection results are given in the second line of Table
The estimations of the parameters for Logistic model are shown in the second line of Table
For Guinea, the selection results are given in the third line of Table
The estimations of the parameters for Logistic model are shown in the third line of Table
For Liberia, the selection results are given in the fourth line of Table
The estimation of
For Sierra Leone, it follows from Table
The estimation of
Comparing the actual reported cases with the model-predicted cases on 14 June, 2015, the relative errors of the model are −5.9%, +2.8%, −7.7%, and −3.5% for West Africa, Guinea, Liberia, and Sierra Leone, respectively (negative values indicate underestimation and positive values overestimation), as shown in Table
Using four of the simplest single-species models together with model selection and the MCMC method, we choose the best model to fit the A/H1N1 data set in China and the Ebola data sets in West Africa. This allows us to estimate the basic reproduction number, the turning point, and the final size of an emerging infectious disease more quickly and accurately than with some complex compartmental models.
Our estimate of
When we fit the data sets for Ebola cases in West Africa, the most appropriate model is the Logistic model or the Richards model. Reproductive numbers
For the previous Ebola outbreaks in Central Africa, Chowell et al. [
The turning point and final size have also been estimated. For example, the turning point for West Africa was 227 days, which corresponds to 5 November, 2014, and the turning points for Guinea, Liberia, and Sierra Leone fell in October or November, 2014. Further, the outbreak is predicted to end in September or December, 2015, with a final size of 3916 (95% CI (3865, 3967)) for Guinea, 9886 (95% CI (9740, 10031)) for Liberia, 12633 (95% CI (12515, 12750)) for Sierra Leone, and 25794 (95% CI (25630, 25958)) for West Africa, as shown in Table
For the results of model selection, the most appropriate model is Logistic model or Richards model which requires only cumulative case data from an epidemic curve (Table
Models selected for the different data sets.
Data  Xi’an (H1N1)  West Africa (Ebola)  Guinea (Ebola)  Liberia (Ebola)  Sierra Leone (Ebola)
Model  L (R)  L (R)  L (R)  R  R
L denotes Logistic model.
R represents Richards model.
L (R) means both Logistic model and Richards model.
In Figure
The estimation of
Model 

95% CI 

95% CI 

Liberia (Ebola)  

3.0223  (2.6026, 3.4855)  130  (111, 159) 

1.784  (1.7661, 1.802)  137  (135, 140) 

1.197  (1.1935, 1.2005)  128  (126, 130) 


Sierra Leone (Ebola)  

1.9016  (1.8563, 1.9475)  165  (157, 174) 

1.1507  (1.1478, 1.1536)  164  (162, 168) 

1.5894  (1.582, 1.5967)  171  (169, 173) 
The models are sorted from the best to the worst.
The estimation of
Liberia
Liberia
Sierra Leone
Sierra Leone
By the analysis of Ebola data, we get that model selection uncertainty caused a magnification of the standard error of the estimation of
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work is supported by the National Natural Science Foundation of China (NSFCs, 11171199, 11471201, and 11171268) and by the Fundamental Research Funds for the Central Universities (GK201305010 and GK201401004).