Dengue fever (DF) is a serious public health problem in many parts of the world, and, in the absence of a vaccine, disease surveillance and mosquito vector eradication are important in controlling the spread of the disease. DF is primarily transmitted by the female

The incidence of dengue fever (DF) has grown dramatically around the world in recent decades, with some 2.5 billion people now at risk of the disease [

DF is a viral vector-borne disease that is common in the tropics and subtropics and is primarily spread by the female

Autoregressive Integrated Moving Average (ARIMA) models have been used in applications such as the assessment of seasonal variation in selected medical conditions [

Time series analysis of infectious diseases within the Bayesian framework has been considered in some studies [

Studies have compared ARIMA models with dynamic models for infectious diseases (fitted via maximum likelihood methods) [

The purpose of this paper is to compare the two-component K-H model with the single-component ARIMA model in predicting weekly DF notifications. Different formulations within each model type are compared, and a sensitivity analysis is carried out for the K-H model, which is fitted within a Bayesian framework.

The Singapore Infectious Diseases Act (1977) requires medical practitioners to notify all cases of DF to the Ministry of Health (MoH) within 24 hours. We obtained data from the published “Weekly Infectious Disease Bulletin”, available from the MoH website. The bulletin uses the World Health Organization 2009 criteria for DF, which are also detailed there [

We studied weekly DF notifications in Singapore up to June 2008. Data from January 2001 to December 2006 were used to estimate the model parameters. We then performed external validation of the models using data from January 2007 to June 2008.
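The split into an estimation period and an external validation period can be sketched as follows. This is a minimal illustration only: the function name `split_weekly_series`, the weekly start dates and the placeholder counts are all assumptions made for the example and are not the MoH notification data.

```python
from datetime import date, timedelta


def split_weekly_series(week_starts, counts, cutoff):
    """Split a weekly series into (estimation, validation) parts.

    Weeks starting before `cutoff` go to the estimation set,
    weeks starting on or after `cutoff` go to the validation set.
    """
    pairs = list(zip(week_starts, counts))
    estimation = [(d, c) for d, c in pairs if d < cutoff]
    validation = [(d, c) for d, c in pairs if d >= cutoff]
    return estimation, validation


# Hypothetical series: 389 consecutive weeks starting 1 January 2001,
# with placeholder counts (NOT the real notification data).
first_week = date(2001, 1, 1)
week_starts = [first_week + timedelta(weeks=i) for i in range(389)]
counts = [50 + (i % 13) for i in range(len(week_starts))]

estimation, validation = split_weekly_series(
    week_starts, counts, cutoff=date(2007, 1, 1)
)
print(len(estimation), "weeks for estimation;", len(validation), "weeks for validation")
```

Keeping the validation weeks strictly after the estimation weeks means that the models are judged only on data they have not seen.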

If

We describe the ARIMA (3,1,1) model equation used in our analysis. The number of cases of DF at week
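For reference, a generic ARIMA (3,1,1) model can be written in backshift notation as shown below. This is the standard textbook form, not a reconstruction of the original equation: the symbols ($y_t$ for the number of cases at week $t$, $B$ for the backshift operator, $c$ for the constant, $\phi_i$ for the autoregressive coefficients, $\theta_1$ for the moving-average coefficient and $\varepsilon_t$ for the error term) are assumptions chosen for illustration.

```latex
\[
  (1 - \phi_1 B - \phi_2 B^2 - \phi_3 B^3)(1 - B)\, y_t
  = c + (1 + \theta_1 B)\, \varepsilon_t ,
  \qquad B\, y_t = y_{t-1}
\]
% Equivalently, with the first difference w_t = y_t - y_{t-1}:
\[
  w_t = c + \phi_1 w_{t-1} + \phi_2 w_{t-2} + \phi_3 w_{t-3}
        + \varepsilon_t + \theta_1 \varepsilon_{t-1}
\]
```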

The K-H model distinguishes between the endemic,

The endemic parameter,

The epidemic component is derived from the parameter sequence

The two-component model formulation is completed by specifying prior distributions for the parameters in the model as follows:^{6}, representing highly dispersed independent normal priors for each coefficient.

The K-H models were fitted using the customised Bayesian software Twins V1.0 [

We compared the ARIMA model with the K-H model, and also conducted a sensitivity analysis on the K-H model, using the mean absolute percentage error (MAPE):
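As an illustration, the MAPE in its usual form is the mean of |observed − predicted| / observed over the forecast weeks, expressed as a percentage. The sketch below computes it overall and in consecutive 4-week blocks. The function names, the handling of weeks with zero observed cases and the example numbers are assumptions made for this example, not details taken from the study.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, in percent.

    Assumption: weeks with zero observed cases are skipped, because the
    percentage error is undefined for them.
    """
    terms = [abs(o - p) / o for o, p in zip(observed, predicted) if o != 0]
    if not terms:
        raise ValueError("MAPE is undefined when no observed value is non-zero")
    return 100.0 * sum(terms) / len(terms)


def stratified_mape(observed, predicted, block=4):
    """MAPE computed separately for consecutive blocks of `block` weeks."""
    return [
        mape(observed[i:i + block], predicted[i:i + block])
        for i in range(0, len(observed), block)
    ]


# Hypothetical weekly counts and one-week-ahead forecasts (not study data).
observed = [100, 80, 50, 40, 200, 100, 50, 25]
predicted = [90, 88, 55, 36, 180, 110, 45, 30]

overall = mape(observed, predicted)
by_block = stratified_mape(observed, predicted)
print(round(overall, 2), [round(b, 2) for b in by_block])
```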

The Bayesian analyses were based on several assumptions regarding the prior distributions, and we assessed the robustness of our results in a sensitivity analysis. For the sensitivity analyses, we considered 4 different scenarios which involved varying values of

Figure

Weekly cases of dengue fever (DF) in Singapore.

The autocorrelation plots for DF (Figure

Plots of autocorrelation and partial autocorrelation for dengue fever (DF): (a) correlogram; (b) partial correlogram.

We explored various formulations of the ARIMA model, and we summarise some of the more important ones in Table

Comparison of MAPE values across various ARIMA models.

Model | Model specification | MAPE |
---|---|---|
1 | ARIMA (1,0,0) | 23.61 |
2 | ARIMA (2,0,0) | 23.09 |
3 | ARIMA (3,0,0) | 23.20 |
4 | ARIMA (4,0,0) | 23.23 |
5 | ARIMA (3,1,0) | |
6 | ARIMA (3,1,1) | 19.96 |

Parameters for the final models.

Parameter | Coefficient | Interval lower limit | Interval upper limit | P value |
---|---|---|---|---|
ARIMA model (95% confidence interval) | | | | |
Constant | 0.28 | −3.86 | 4.41 | 0.896 |
AR 1 | −0.10 | −0.16 | −0.04 | 0.001 |
AR 2 | 0.10 | 0.04 | 0.17 | 0.002 |
AR 3 | 0.23 | 0.17 | 0.29 | <0.001 |
K-H model (95% credible interval) | | | | |
 | 25.1 | 18.4 | 32.3 | |
 | 3.3 | 1.9 | 3.6 | |
 | −0.2 | −0.3 | 0.3 | |
 | −0.5 | −0.7 | −0.4 | |
 | 1.0 | 1.0 | 1.0 | |
 | 7.6 | 0.01 | 15.0 | |
 | 1.3 | 0.7 | 2.0 | |

The comparison between the ARIMA and K-H model is shown in Figure

Comparison of out-of-sample predictions (external validation) between ARIMA and K-H models.

MAPE | ARIMA | K-H |
---|---|---|
Overall | 17.54 | 17.21 |
Stratified (in 4 week intervals) | | |
2007 | | |
Weeks 1 to 4 | 17.07 | 14.27 |
5 to 8 | 28.60 | 25.62 |
9 to 12 | 33.41 | 30.63 |
13 to 16 | 32.52 | 33.09 |
17 to 20 | 21.83 | 20.53 |
21 to 24 | 20.64 | 19.76 |
25 to 28 | 12.86 | 13.22 |
29 to 32 | 11.53 | 14.40 |
33 to 36 | 8.54 | 10.26 |
37 to 40 | 5.07 | 6.50 |
41 to 44 | 18.49 | 17.42 |
45 to 48 | 8.54 | 10.35 |
49 to 52 | 15.70 | 12.44 |
2008 | | |
Weeks 1 to 4 | 11.13 | 11.16 |
5 to 8 | 29.09 | 25.63 |
9 to 12 | 16.39 | 19.41 |
13 to 16 | 15.51 | 10.77 |
17 to 20 | 19.21 | 18.38 |
21 to 24 | 9.83 | 10.07 |

Sensitivity analysis on K-H model parameters.

MAPE | Initial K-H model | Sensitivity analysis 1 | Sensitivity analysis 2 | Sensitivity analysis 3 | Sensitivity analysis 4 |
---|---|---|---|---|---|
Overall | 17.21 | 17.71 | 17.71 | 17.50 | 16.54 |
Stratified (in 4 week intervals) | | | | | |
2007 | | | | | |
Weeks 1 to 4 | 14.27 | 20.12 | 22.33 | 20.30 | 20.03 |
5 to 8 | 25.62 | 25.41 | 25.66 | 25.12 | 23.39 |
9 to 12 | 30.63 | 31.06 | 31.30 | 30.95 | 31.07 |
13 to 16 | 33.09 | 33.20 | 32.31 | 32.49 | 27.89 |
17 to 20 | 20.53 | 20.41 | 20.40 | 20.92 | 21.82 |
21 to 24 | 19.76 | 21.14 | 20.90 | 21.25 | 21.58 |
25 to 28 | 13.22 | 13.45 | 14.18 | 13.28 | 12.91 |
29 to 32 | 14.40 | 13.98 | 13.04 | 13.39 | 10.65 |
33 to 36 | 10.26 | 10.76 | 9.97 | 10.55 | 6.11 |
37 to 40 | 6.50 | 6.69 | 6.39 | 5.54 | 3.30 |
41 to 44 | 17.42 | 17.62 | 17.54 | 16.91 | 15.94 |
45 to 48 | 10.35 | 11.30 | 10.59 | 11.09 | 10.67 |
49 to 52 | 12.44 | 12.99 | 12.37 | 12.58 | 13.12 |
2008 | | | | | |
Weeks 1 to 4 | 11.16 | 11.09 | 10.85 | 11.13 | 11.31 |
5 to 8 | 25.63 | 25.53 | 26.01 | 25.49 | 25.83 |
9 to 12 | 19.41 | 20.03 | 20.25 | 19.31 | 16.28 |
13 to 16 | 10.77 | 10.72 | 10.47 | 10.72 | 11.25 |
17 to 20 | 18.38 | 19.20 | 17.97 | 18.59 | 19.27 |
21 to 24 | 10.07 | 9.33 | 10.98 | 9.95 | 9.92 |

A description of the parameters used in the sensitivity analysis is provided with the description of the sensitivity analysis scenarios in the methods.

Comparison of out-of-sample forecasts of dengue fever (DF) between ARIMA and two-component K-H model (January 2007 to June 2008).

In terms of forecasting one-week ahead DF notifications, both methods performed well (Figure

The Bayesian analysis is influenced by the prior specification. We therefore investigated the robustness of our results to different formulations of the priors. These priors represented a wide range of realistic scenarios in which the probability of an outbreak is expected to differ. As can be seen from Table

We found that the K-H model performed better than the conventional ARIMA time series model, although the improvement was only marginal. Forecasting weekly cases of DF has important implications for hospital resource planning. For an infectious disease ward, knowing the normal trend of DF, along with predictions of the following week’s DF notifications, can allow hospital planners to plan and allocate manpower and other resources more effectively. Intensive media campaigns (e.g., television advertisements) in the weeks prior to a projected increase in DF notifications may help to reduce the number of new cases.

Although we used the MAPE to compare the models, other indices are also available. The mean squared error (MSE), for instance, is calculated as the average of the squared errors. Unlike the MAPE, it is not scaled relative to the magnitude of the observations, so its values are less intuitive to interpret.
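The difference in scale dependence between the two indices can be seen in a small example. In the sketch below, two hypothetical series (invented for illustration, not study data) are both over-predicted by exactly 10%; the MAPE is identical for both, whereas the MSE grows with the size of the counts.

```python
def mse(observed, predicted):
    """Mean squared error: average of the squared forecast errors."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed values must be non-zero)."""
    return 100.0 * sum(abs(o - p) / o for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical low-count and high-count weeks, each over-predicted by 10%.
low_obs, low_pred = [20, 25, 30], [22, 27.5, 33]
high_obs, high_pred = [200, 250, 300], [220, 275, 330]

mape_low, mape_high = mape(low_obs, low_pred), mape(high_obs, high_pred)
mse_low, mse_high = mse(low_obs, low_pred), mse(high_obs, high_pred)

print("MAPE:", round(mape_low, 2), round(mape_high, 2))  # same relative error
print("MSE: ", round(mse_low, 2), round(mse_high, 2))    # grows with the counts
```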

There were several limitations in our study. Firstly, our analysis was dependent on notification data. While clinicians are required to report all cases of DF and dengue haemorrhagic fever (DHF) to the MoH, cases may have been underreported, especially since mild or asymptomatic cases of DF may not have been diagnosed. While this may have led to underestimation in the forecasts, the comparisons across the models are still valid, as the models make use of the same number of weekly cases.

In our analysis, we compared the predictive capability of the models using one-week-ahead forecasts of dengue fever notifications. It is possible to forecast over longer periods, although such predictions are inherently likely to be less accurate than a one-week forecast.
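One way to see why longer horizons are harder is that each multi-week forecast has to be built on earlier forecasts rather than on observed counts. The sketch below iterates an autoregressive model on the first-differenced series. It uses the constant and AR coefficients reported in the parameter table, but it treats the constant as an intercept of the differenced series and omits any moving-average term. Both are simplifying assumptions, so this is not the fitted model from the study, and the `forecast` function and the example history are hypothetical.

```python
# Coefficients as reported in the parameter table (constant, AR 1 to AR 3).
# Assumption: the constant acts as an intercept on the differenced series.
CONSTANT = 0.28
AR = (-0.10, 0.10, 0.23)


def forecast(history, steps=1):
    """Iterated forecasts from an AR(3) model on first differences.

    Each predicted value is appended to the series and reused as if it had
    been observed, so errors can accumulate as `steps` increases.
    """
    if len(history) < len(AR) + 1:
        raise ValueError("need at least four past observations")
    levels = list(history)
    for _ in range(steps):
        # Most recent difference first: w[t-1], w[t-2], w[t-3].
        diffs = [levels[-k] - levels[-k - 1] for k in range(1, len(AR) + 1)]
        next_diff = CONSTANT + sum(a * d for a, d in zip(AR, diffs))
        levels.append(levels[-1] + next_diff)
    return levels[len(history):]


# Hypothetical recent weekly counts (not study data).
history = [100, 110, 105, 120]
one_week = forecast(history, steps=1)
four_weeks = forecast(history, steps=4)
print(one_week, four_weeks)
```

Only the first element of `four_weeks` is based entirely on observed counts; the later elements depend on earlier predictions.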

In conclusion, we found that the final ARIMA and K-H models both predicted the future course of DF in Singapore reasonably well, with the K-H model performing marginally better in terms of overall MAPE. The ARIMA models were faster to implement and run, while the K-H model was sensitive to the choice of priors, which needs to be made carefully before the study is conducted.

Funding for this study was received from the Duke-NUS SRP block grant as well as the Merit Award from the Yong Loo Lin School of Medicine, Singapore.