Accurate incidence forecasting of infectious disease provides potentially valuable insights in its own right. It is critical for early prevention and may contribute to health services management and syndromic surveillance. This study investigates the use of a hybrid algorithm combining grey model (GM) and back propagation artificial neural networks (BPANN) to forecast hepatitis B in China based on the yearly numbers of hepatitis B and to evaluate the method’s feasibility. The results showed that the proposed method outperforms GM (1, 1) and GM (2, 1) on all the evaluation indexes.
Hepatitis B is a vaccine-preventable disease caused by the hepatitis B virus (HBV) that can induce potentially fatal liver damage. It has infected approximately 2 billion people worldwide, which represents one-third of the world population. Each year, HBV infection is responsible for about one million deaths worldwide due to liver failure and cirrhosis, and more than 75% of hepatocellular carcinomas worldwide develop from HBV infection [
Mathematical and computational models have gained importance in the public health domain in recent years, especially in infectious disease epidemiology, by providing rationales and quantitative analysis to support decision-making and policy-making processes. Many researchers advocate the use of these models as predictive tools [
Accurate forecasting of hepatitis B requires the analysis of sufficient historical data. However, in China and perhaps some other developing countries, the current public health surveillance system does not collect detailed essential epidemiological information because it is often difficult to obtain. Forecasts of hepatitis B based only on such limited data will be inaccurate. It is therefore important to have methods that can process limited data effectively.
Grey systems theory, established by Deng, chiefly comprises grey system analysis, modeling, prediction, decision-making, and control. It focuses on uncertainty problems with small samples, discrete data, and incomplete information that are difficult for probability theory and fuzzy mathematics to handle. Grey prediction is an important branch of grey systems theory that makes scientific, quantitative forecasts about the future states of grey systems. Precise prediction of a system can be achieved by generating and extracting useful information from small samples and partially known information [
Artificial neural networks (ANN) are complex and flexible nonlinear systems with properties not found in other modeling systems. They allow forecasting that captures the relationships among variables, in particular nonlinear relationships. ANN function by first learning from a known set of data for a given problem with a known solution (training); the networks, inspired by the analytical processes of the human brain, are then able to reconstruct the imprecise rules. Once a model is trained, forecasts can be generated from novel records [
The aim of this study is to investigate the use of a hybrid method combining grey model (GM) and back propagation artificial neural networks (BPANN) to forecast hepatitis B in China based on the yearly numbers of hepatitis B from 2002 to 2012 and to evaluate the method’s predictive performance.
The incidence data of hepatitis B from 2002 to 2012 were collected from the Ministry of Health of the People’s Republic of China; they are open government statistics [
The proposed method is established based on the grey systems theory and BPANN theory. MATLAB software version 2011b is used for the statistical analysis.
The incidence data are considered as the original time series
Through grey generation, or the use of sequence operators to weaken randomness, grey prediction models are designed to uncover hidden laws; through the interchange between difference equations and differential equations, the practical step of using discrete data sequences to establish continuous dynamic differential equations is achieved. Here, GM
The establishment for a GM
(1) Let nonnegative time sequence expressing
(2) The first-order accumulated generating operation (1-AGO) is used to convert
(3) Let
Then
(4) The whitenization equation is given by
(5) The forecasting model can be obtained by solving the above equation, which is shown as follows:
(6) The predicted value of the primitive data at time point
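The six steps above follow the usual GM (1, 1) construction, which can be sketched in a few lines of Python. This is only an illustrative sketch under the standard textbook formulation (1-AGO, background values taken as the means of consecutive accumulated terms, least-squares estimation of the development coefficient a and grey input b, and the exponential time response function); the function names `gm11_fit` and `gm11_series` are hypothetical, and this is not the MATLAB code used in the study.

```python
import math


def gm11_fit(x0):
    """Estimate the GM(1,1) parameters (a, b) by least squares."""
    n = len(x0)
    if n < 4:
        raise ValueError("GM(1,1) needs at least 4 observations")
    # 1-AGO: cumulative sums of the original series.
    x1 = []
    total = 0.0
    for value in x0:
        total += value
        x1.append(total)
    # Background values: means of consecutive 1-AGO terms.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = x0[1:]
    # Fit x0(k) = -a * z(k) + b, i.e. a straight line in z with slope -a.
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zk * yk for zk, yk in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    a = -slope
    b = (sy - slope * sz) / m
    return a, b


def gm11_series(x0, a, b, steps=0):
    """Return fitted values for x0 followed by `steps` forecasts."""
    if a == 0:
        raise ValueError("time response function is undefined for a = 0")
    n = len(x0)
    c = x0[0] - b / a
    # Time response function for the accumulated (1-AGO) series.
    x1_hat = [c * math.exp(-a * k) + b / a for k in range(n + steps)]
    # Inverse AGO: difference the fitted accumulated series.
    return [x0[0]] + [x1_hat[k] - x1_hat[k - 1] for k in range(1, n + steps)]
```

Calling `gm11_fit` on a yearly series and then `gm11_series(series, a, b, steps=9)` would return the fitted values for the observed years followed by nine forecast years. The first fitted value always equals the first observation, which is a property of this formulation.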
The procedure for a GM
(1) For a given sequence of original data
(2) The GM
(3) Solve the whitenization equation. If
The steps of the forecasting method can be described as follows.
In order to obtain the input of the BPANN, the GM
The GM
The method flow chart is shown in Figure
Flow chart of the hybrid method.
The metrics used are relative error (RE),
The incidence data of hepatitis B are collected year by year from 2002 to 2012 in China and taken as the original time series, which is shown in Figure
The incidence number of hepatitis B in China from 2002 to 2012.
The GM
In the threelayer BPANN, the hidden node
The topology of the proposed method.
The weights and thresholds of the proposed model are obtained by training. The number of training iterations is set to 1000, the learning rate to 0.9, the momentum factor to 0.95, and the error goal to 0.001; the Levenberg-Marquardt algorithm is used for training.
The prediction of the original time series by the GM
The forecasted incidence of hepatitis B in China from 2013 to 2021 by the proposed method.
To compare the predictions produced by the two GMs and the proposed method, predictions are made under the same conditions. The results are listed in Table
The predictions produced by GM (1, 1), GM (2, 1), and the proposed model.
Year  The observed data  GM (1, 1)  GM (2, 1)  The proposed method
2003  719011  924130  882183  736600 
2004  916396  949918  1036306  884300 
2005  982297  976426  1133758  1027700 
2006  1109130  1003674  1183850  1096300 
2007  1169946  1031682  1201811  1118200 
2008  1169569  1060471  1198178  1124300 
2009  1179607  1090065  1180237  1125900 
2010  1060582  1120484  1153018  1126400 
2011  1093335  1151751  1119982  1126500 
2012  1087086  1183892  1083509  1126500 
The scatter diagram of the relationship between the observed data and the prediction.
The RE of prediction is shown in Figure
Comparison of the RE of the predictions by the proposed method and the GMs.
The comparison of
Comparison of the evaluation indexes.
Method  (unlabeled)  MSE  MAE  RMSE  MAPE  SSE
The proposed method  0.9495  2.3649 × 10^{7}  3.9704 × 10^{4}  4.863 × 10^{3}  3.9704 × 10^{6}  1.8162 × 10^{10}
GM (1, 1) model  0.6365  2.2867 × 10^{8}  1.0492 × 10^{6}  1.5122 × 10^{4}  1.0492 × 10^{8}  1.1078 × 10^{13}
GM (2, 1) model  0.9392  1.6798 × 10^{8}  1.1173 × 10^{6}  1.5122 × 10^{4}  1.1173 × 10^{8}  1.2570 × 10^{13}
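For readers who want to recompute error indexes of this kind from the observed and predicted columns, a minimal Python sketch follows. The definitions are the conventional ones and are an assumption here, since the formulas themselves do not appear in this text: RE is taken as the signed error divided by the observed value, and MAPE is returned as a fraction rather than a percentage. The function name `evaluation_indexes` is hypothetical.

```python
import math


def evaluation_indexes(observed, predicted):
    """Conventional error indexes for paired observed/predicted values."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")
    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)
    sse = sum(e * e for e in errors)  # sum of squared errors
    mse = sse / n                     # mean squared error
    return {
        # Relative error per point (assumed definition: signed error / observed).
        "RE": [e / o for e, o in zip(errors, observed)],
        "MSE": mse,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(mse),
        # Mean absolute percentage error, expressed as a fraction.
        "MAPE": sum(abs(e) / abs(o) for e, o in zip(errors, observed)) / n,
        "SSE": sse,
    }
```

As a worked example under these assumed definitions, the 2003 row gives an RE of (736600 - 719011) / 719011, about 2.4%, for the hybrid method, against (924130 - 719011) / 719011, about 28.5%, for GM (1, 1).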
The forecasts generated by the GM
The forecasts generated by the three methods.
Year  GM (1, 1)  GM (2, 1)  The proposed method
2013  1216929.0  1045223.6  1077864.1 
2014  1250888.2  1006228.6  1074038.2 
2015  1285795.1  967267.6  1012371.7 
2016  1321676.0  928834.0  976301.8 
2017  1358558.3  891248.9  946959.7 
2018  1396469.7  854715.0  939194.1 
2019  1435439.1  819353.3  937881.5 
2020  1475496.0  785229.0  937607.7 
2021  1516670.7  752369.4  937531.5 
The forecasted incidence of hepatitis B in China from 2013 to 2021 by the three methods.
The weights and thresholds of the BPANN are initialized randomly when training begins, which introduces uncertainty into the predictions and forecasts. To characterize this uncertainty, the proposed model is run 100 times and the mean value is taken as the predicted or forecasted value. The 95% confidence intervals and the predicted or forecasted values are shown in Figures
The predicted incidence of hepatitis B in China from 2003 to 2012 by the proposed method.
The forecasted incidence of hepatitis B in China from 2013 to 2021 by the proposed method.
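The averaging over repeated runs can be illustrated with a short Python sketch. How the 95% confidence interval was computed is not stated in this text, so the sketch assumes a normal-approximation interval for the mean across runs (mean ± 1.96 × standard error); an interval based on the spread of individual runs would be wider. The function name `mean_and_ci95` is hypothetical.

```python
import math
import statistics


def mean_and_ci95(runs):
    """Summarize repeated model runs year by year.

    `runs` is a list of runs; each run is a list of predictions for the
    same sequence of years. Returns one (mean, lower, upper) per year.
    """
    if len(runs) < 2:
        raise ValueError("at least two runs are needed to estimate spread")
    summary = []
    for values in zip(*runs):  # all runs' predictions for one year
        mean = statistics.mean(values)
        # Assumed interval: normal approximation for the mean across runs.
        half_width = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
        summary.append((mean, mean - half_width, mean + half_width))
    return summary
```

In this setting each inner list would hold the output of one training run started from different random initial weights and thresholds, so 100 runs yield 100 inner lists.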
Although the predictions produced by the proposed method are more accurate than those of the two grey models, the proposed model has limitations. First, since the proposed model is built on the grey model, the sample size, namely, the number of historical data points, must be at least 4. Second, the predictions will be inaccurate if the weights and thresholds of the BPANN become trapped in a local optimum during training. Intelligent algorithms can be used to optimize the weights and thresholds of BPANN [
Epidemiological information on hepatitis B is often difficult to obtain, and forecasts of hepatitis B based on limited data tend to be inaccurate. Grey systems theory focuses on uncertainty problems with small samples and incomplete information, while the BPANN is a forecasting method that captures the relationships among variables, including nonlinear ones. This research proposes a new forecasting method that combines the GM and BPANN to forecast hepatitis B in China. Useful information can be generated and extracted from the small samples, and the BP neural network can be trained on the data more fully. The prediction results show that this method achieves better forecasting performance than GM (1, 1) and GM (2, 1).
The authors have declared that no competing interests exist.
The authors thank Zola Banh for carefully correcting grammatical errors. This work was supported by the Youth Science Foundation of Guangxi Medical University (GXMUYSF201208), the open project of Guangxi Medical Science Experimental Center (KFJJ201131), the training programs of innovation and entrepreneurship for undergraduates in Guangxi province (2012xjcxcy006), and the training programs of innovation and entrepreneurship for undergraduates in China (201210598002). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the paper.