Hand, foot, and mouth disease (HFMD) is an infection that is common in children under 5 years old. It is usually not serious, but it is one of the most widespread infectious diseases and can still be fatal, so it continues to threaten the lives and health of children and adolescents. An effective prediction model would be very helpful for HFMD control and prevention. Several methods have been proposed to predict HFMD outpatient cases. These methods tend to exploit the connection between cases and exogenous data, but exogenous data is not always available. In this paper, a novel method combining time series decomposition and local fusion is proposed. The Empirical Mode Decomposition (EMD) method is used to decompose the HFMD outpatient time series. Linear local predictors are applied to process the input data, and the predicted value is generated by fusing the outputs of the local predictors. The proposed model is evaluated on a real dataset and compared with state-of-the-art methods. The results show that our model is more accurate than the baseline models. Thus, the proposed model can be an effective method for HFMD outpatient prediction.

Hand, foot, and mouth disease (HFMD) is a common infection caused by a group of viruses. It mostly occurs in children under 5 years old and poses a serious threat to children’s health, especially in developing Asian countries, where it is more likely to cause severe harm. China is a country with a large population and a vast territory, and development across its regions is uneven. Under these conditions, it is difficult to control the spread of infectious diseases in China. HFMD has been a nationally notifiable disease since 2008, and new cases should be reported within 24 hours. However, the situation is still worsening. According to the data from the Chinese Centre for Disease Control and Prevention (CCDC) [

Many methods have been proposed to predict HFMD cases. ARIMA is one of the most general time series models, which is already used in HFMD prediction work [

However, on the one hand, the HFMD outpatient data is nonlinear and nonstationary. On the other hand, the spread of HFMD is affected by complex and diverse external factors, such as climate, living habits, and living conditions. These two characteristics make it difficult to improve performance with a single global predictor. The relationship between the target data and external factors has given researchers a new idea, and many studies have used external factors to enhance model performance. The data about external factors is called exogenous data to distinguish it from the target data. In this paper, we use new ideas to improve the accuracy of prediction: time series decomposition and local fusion.

Essentially, decomposition is the process of dividing a complex problem into subproblems that can be solved more easily. In our experiments, a classical method named Empirical Mode Decomposition (EMD) is used to decompose the HFMD outpatient data. This method decomposes a time series into several subseries, named Intrinsic Mode Functions (IMFs), and a residual. Each IMF contains a local feature. In our study, the residual is also treated as an IMF, and all IMFs are treated equally by the local predictors in the experiment.

In this paper, we propose a Concurrent Autoregression with Decomposition (CARD) model for HFMD prediction. We try to improve the accuracy of prediction as much as possible without exogenous data. CARD generates the predicted value by fusing the outputs of local predictors. The method uses two linear autoregression predictors to process the past outpatient data and the IMFs, respectively. Then, a fusion component fuses the outputs of the two linear predictors. Finally, a global predictor is introduced to generate the predicted result. In short, we propose an effective time series decomposition and local fusion method that achieves higher accuracy than several general methods that use only historical outpatient data.

The main contributions of this paper can be summarized as follows:

We propose a novel prediction model that applies time series decomposition and local fusion to the prediction of outpatient cases of HFMD

A classical decomposition method named EMD is introduced to decompose the HFMD outpatient time series. Compared with several other decomposition methods, EMD is simpler and more efficient in this study

The proposed method applies a linear weighted module to fuse the outputs of two local predictors. Each local predictor produces its output independently. Then, the fusion module is trained to generate the final predicted value from the outputs of the local predictors

The rest of the paper is organized as follows. Section

This section introduces several of the most commonly used decomposition methods and fusion methods related to our research.

A time series can be decomposed into several subseries via decomposition methods. For time series decomposition, the following methods are widely used: wavelet transform [

Wavelet transform [

RobustSTL [

EMD [

EEMD [

In recent years, time series decomposition has been used for time series prediction in several research areas. A regression model combined with wavelet transform was proposed to forecast the future value of the S&P 500 [

The HFMD outpatient time series data is used in our study. The spread of HFMD is easily affected by many external factors, which makes the time series difficult to process. The adaptive nature of EMD helps to overcome this problem, so in this paper we introduce EMD to process our input data.

Time series forecasting has been a subject of interest in several different research areas including disease control and prevention. In the practical problems of nature, things are not isolated from each other but inextricably connected. The same goes for HFMD. Many studies have fused exogenous data to improve the accuracy of prediction.

The spread of HFMD is influenced by many external factors, such as meteorological factors including temperature, humidity, rapid climate change, local policies, air quality, and population [

Several methods that use exogenous data have been collected, and these models can be classified into two categories: stochastic methods and learning methods. Stochastic methods usually combine the past data and exogenous data by a linear method and then learn a linear function to get prediction results [

The learning methods focus on temporal features and treat inputs of different categories differently, such as [

Although exogenous data can help to improve accuracy, it has some unavoidable drawbacks. Exogenous data requires a great deal of effort to collect and organize, and it is sometimes unavailable. Therefore, it is not always wise to rely on exogenous data for prediction, especially in real-time systems, where it is almost impossible to integrate the required data into the model dynamically. Considering these drawbacks, we focus on the target data itself and do not use exogenous data.

This section formulates the problem and illustrates our approach.

Figure

Schematic illustration of the proposed CARD.

In the data preprocessing stage, the input data is the HFMD outpatient time series. The outpatient data is normalized and then segmented. Finally, the segments are decomposed into a finite number of IMFs and a residual by EMD. In CARD, a softmax function is introduced to avoid unfairness in feature extraction. Two linear autoregression components are used to mine the sequence feature details and enhance the feature representation of the input data. Finally, the outputs of the two linear components are fused, and another linear component is applied to generate the predicted value. In the data postprocessing stage, the final result is generated and evaluated after denormalization.

The main notations are explained in Table

Notation and semantics.

| Notation | Semantic |
|---|---|
| | Number of points in the time series |
| | Window size |
| | Number of IMFs |
| | Input matrix, |
| | Matrix after split, |
| | Matrix after decomposition, |
| | Output matrix, |

The problem studied in this paper can be formulated as a time series prediction task. A time series is a sequence of historical observations recorded at equal time intervals. Our goal is to predict the outpatient value of the next day.
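As a minimal sketch of this one-step-ahead setting, the series can be cut into pairs of a history window and the value that follows it. The function name `make_windows`, the parameter name `window`, and the sample counts are illustrative assumptions, not names or data from the paper.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], window: int
) -> Tuple[List[List[float]], List[float]]:
    """Split a series into (history window, next value) training pairs."""
    if window < 1 or window >= len(series):
        raise ValueError("window must satisfy 1 <= window < len(series)")
    inputs: List[List[float]] = []
    targets: List[float] = []
    for start in range(len(series) - window):
        # The `window` most recent observations are the predictor input.
        inputs.append(list(series[start:start + window]))
        # The observation right after the window is the next-day target.
        targets.append(series[start + window])
    return inputs, targets


# Made-up daily counts, for illustration only.
daily_cases = [3, 5, 4, 6, 8, 7]
X, y = make_windows(daily_cases, window=3)
```

Each row of `X` is one history window and the matching entry of `y` is the value on the following day, so a series of length n yields n minus `window` samples.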

It is a mapping from the history observation time series and IMFs to the future outpatient value. The symbol

In this study,

Min-max normalization (0-1 normalization) is a widely used method in time series normalization. It is a linear transformation of the original data that maps the result into the interval [0, 1]. Because the transformation is linear, the relative differences between values are preserved. Thus, min-max normalization is suitable for the outpatient time series in our study. The formula of min-max normalization is expressed as follows:
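A minimal sketch of the standard min-max transformation, together with the inverse step needed for denormalization in postprocessing, is given below. It assumes the textbook form (value minus minimum, divided by the range); the function names and sample values are illustrative.

```python
from typing import List, Sequence, Tuple


def min_max_normalize(values: Sequence[float]) -> Tuple[List[float], float, float]:
    """Scale values linearly into [0, 1]; also return the min and max used."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled, lo, hi


def denormalize(scaled: Sequence[float], lo: float, hi: float) -> List[float]:
    """Invert min_max_normalize using the stored min and max."""
    return [s * (hi - lo) + lo for s in scaled]


# Made-up counts, for illustration only.
scaled, lo, hi = min_max_normalize([2, 4, 6, 10])
restored = denormalize(scaled, lo, hi)
```

The minimum and maximum have to be kept so that predictions made on the normalized scale can be mapped back to case counts before evaluation.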

An IMF must satisfy the following requirements:

In any local time scale, the number of extrema and the number of zero crossings must be equal or differ by at most 1

At any point, the mean value of the upper envelope defined by the local maxima and the lower envelope defined by the local minima is close to 0

The procedures of EMD algorithm are shown in Algorithm

Input: The original signal

Output: The IMFs and the residual
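The sifting idea behind EMD can be sketched as follows. This is a simplified illustration, not the procedure listed in the paper: it assumes piecewise-linear envelopes (`np.interp`) in place of the cubic splines that EMD normally uses, and the stopping threshold `tol`, the iteration caps, and the synthetic test signal are all arbitrary choices.

```python
import numpy as np


def _envelope_mean(x):
    """Mean of the upper and lower envelopes, or None if extrema are too few."""
    idx = np.arange(len(x))
    interior = idx[1:-1]
    maxima = interior[(x[1:-1] > x[:-2]) & (x[1:-1] >= x[2:])]
    minima = interior[(x[1:-1] < x[:-2]) & (x[1:-1] <= x[2:])]
    if len(maxima) < 2 or len(minima) < 2:
        return None
    # Assumption: linear interpolation stands in for cubic-spline envelopes.
    upper = np.interp(idx, maxima, x[maxima])
    lower = np.interp(idx, minima, x[minima])
    return (upper + lower) / 2.0


def emd(signal, max_imfs=8, max_sift=50, tol=0.05):
    """Return (imfs, residual) with imfs.sum(axis=0) + residual == signal."""
    residual = np.asarray(signal, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if _envelope_mean(residual) is None:
            break  # too few extrema left: treat what remains as the residual
        h = residual.copy()
        for _ in range(max_sift):
            mean = _envelope_mean(h)
            if mean is None:
                break
            # Relative size of the removed mean, used as the stopping measure.
            change = np.sum(mean ** 2) / (np.sum(h ** 2) + 1e-12)
            h = h - mean
            if change < tol:
                break
        imfs.append(h)
        residual = residual - h
    return np.array(imfs).reshape(len(imfs), len(residual)), residual


# Synthetic signal: two oscillations plus a linear trend.
t = np.linspace(0.0, 1.0, 400)
sig = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t) + 2.0 * t
imfs, res = emd(sig)
```

Because each extracted component is subtracted from the running residual, the components and the final residual always sum back to the input, which is what allows the residual to be handled like one more IMF downstream.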

Let

CARD employs a linear layer to receive a regression result of IMF. The formula is expressed as follows:

The detail steps of the proposed CARD are shown in Algorithm

Input: Observed HFMD outpatient time series

Output: Prediction value for future cases
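The two-predictor-plus-fusion structure can be illustrated with ordinary least squares. This is a hedged sketch under several assumptions rather than the CARD implementation: the series is synthetic, a causal moving average and its remainder stand in for the EMD components, each linear part is fitted in closed form, and the softmax step is omitted because its exact role is not specified here. All names (`fit_linear`, `lag_features`, `split`) are hypothetical.

```python
import numpy as np


def fit_linear(features, targets):
    """Least-squares linear predictor with a bias term; returns the weights."""
    design = np.hstack([features, np.ones((len(features), 1))])
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return weights


def predict_linear(weights, features):
    design = np.hstack([features, np.ones((len(features), 1))])
    return design @ weights


def lag_features(rows, window):
    """rows: (channels, T) -> (T - window, channels * window) lag matrix."""
    rows = np.atleast_2d(rows)
    steps = rows.shape[1] - window
    return np.stack([rows[:, s:s + window].ravel() for s in range(steps)])


rng = np.random.default_rng(0)
T, window, split = 300, 7, 200
t = np.arange(T)
series = 10 + 3 * np.sin(2 * np.pi * t / 30) + rng.normal(0, 0.5, T)  # synthetic

# Stand-in decomposition (assumption): causal moving average + remainder.
trend = np.convolve(series, np.ones(7) / 7, mode="full")[:T]
components = np.vstack([series - trend, trend])

X_raw = lag_features(series, window)       # past observations
X_cmp = lag_features(components, window)   # past component values
y = series[window:]                        # next-day targets

# Two local linear autoregressive predictors, fitted independently.
w_raw = fit_linear(X_raw[:split], y[:split])
w_cmp = fit_linear(X_cmp[:split], y[:split])

# Fusion: a linear weighting of the two local outputs.
local_train = np.column_stack([predict_linear(w_raw, X_raw[:split]),
                               predict_linear(w_cmp, X_cmp[:split])])
w_fuse = fit_linear(local_train, y[:split])

local_test = np.column_stack([predict_linear(w_raw, X_raw[split:]),
                              predict_linear(w_cmp, X_cmp[split:])])
y_hat = predict_linear(w_fuse, local_test)
mae = float(np.mean(np.abs(y_hat - y[split:])))
```

Fitting the fusion weights on the in-sample outputs of the local predictors keeps the sketch short; a trained model would more likely learn all three linear parts jointly.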

This section configures our experiments. Section

The real dataset used in our experiments is HFMD outpatient case data collected from the Xiamen Center for Disease Control and Prevention (XCDC). The dataset consists of daily records from January 1, 2012, to December 30, 2018, with a total of 2555 sample points. In Figure

The distribution of outpatient cases from Jan 1, 2012, to Dec 30, 2018.

To measure the performance of our proposed model and compare it with the selected baseline models, three widely used evaluation metrics are adopted in our experiments, and the formulas are defined as follows:
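For illustration, two of these metrics, MAE and RMSE, can be computed as in the sketch below, which assumes their standard definitions (mean absolute error, and the square root of the mean squared error). The third metric is left out, and the sample values are made up.

```python
import math
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error between observed and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Made-up observed and predicted counts, for illustration only.
observed = [10, 12, 14, 16]
forecast = [11, 12, 13, 18]
mae_value = mae(observed, forecast)
rmse_value = rmse(observed, forecast)
```

Both values are on the scale of the case counts themselves, so they should be computed after denormalization.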

In these equations, the parameter

MAE is a basic and universal metric in regression tasks. RMSE is expressed in the same units as the data and, compared with MAE, penalizes large errors more heavily. For

The comparison of CARD performance at different values of … (panels: MAE comparison; RMSE comparison).

The comparison of four methods in terms of MAE, RMSE, … (panels: MAE comparison; RMSE comparison; computation time comparison).

This section gives prediction results, comparisons, and analyses.

In this subsection, we investigate the effects of decomposition and fusion. As we can see in Figure

The comparison of eleven methods on three groups of inputs in terms of MAE, RMSE, and … (panels: MAE comparison; RMSE comparison).

Time series decomposition is an important part of this study. We decompose the HFMD outpatient time series into a finite number of multi-time-scale IMFs and a residual; then, each subsequence is modeled and predicted separately with a local linear predictor. A single IMF carries a specific physical meaning, such as seasonality or trend. Each sequence is treated equally in the model. Compared with the original data, each IMF can represent local features by itself. This means that predicting each sequence separately and then fusing the results may give better results than using only the raw data, and the experimental results support this.

The prediction accuracy of all baseline models increased after fusing the HFMD outpatient case data. This result shows the benefit of data fusion. A possible explanation is that existing models do not work well with complex time series such as IMFs, and some methods cannot capture the relation between different sequences. The data processing of CARD can be divided into two stages. In the first stage, each sequence is predicted separately, and then the results are fused. In the second stage, the predicted values are obtained from the fused data, which makes it possible to analyze the relationship between the sequences. As a result, we get better results than the other models. In addition, the IMFs may lose some features of the original data, and this defect is more obvious with short and complex time series. However, the fusion of the IMFs and the case data overcomes this shortcoming. That may explain why all models improve to various degrees after fusion.

The main results are shown in Figure

Out of all the models, CARD performs the best. In detail, MLR is the second-best model. Compared to MLR, CARD is slightly behind in MAE and RMSE, and we are slightly ahead in

In the experiments using only decomposed data as input, several baseline models showed various degrees of performance degradation, and their performance improved when the outpatient data was added to the dataset. However, as we can see in Figure

Although the CARD model does not make revolutionary advances, it is much less computationally intensive than most neural network models. Therefore, the model has relatively low hardware requirements. Moreover, the model still has good predictive performance when using only historical data, which means that the data needed to run the model is easily available. This further lowers the threshold for practical use of the model. Therefore, our proposed model has good prospects for practical applications.

Our experiments indicate that data decomposition and local fusion can improve prediction performance. In this paper, we propose a time series decomposition and local fusion model named CARD for HFMD outpatient case prediction. The main conclusions of this study are as follows:

Compared with wavelet transform and EEMD, the EMD method has advantages in prediction accuracy for HFMD outpatient prediction. Therefore, EMD is suitable for HFMD outpatient time series

The fusion model we proposed is superior to most of the general methods, which means that such a model still has great potential in infectious disease forecasting

Our study requires further research. In this paper, we do not test the prediction accuracy of multistep prediction. As a next step, we can try to extend our model to multistep time series prediction and to other diseases.

The data used to support the findings of this study are available from the corresponding author upon request.

The authors declare that they have no conflicts of interest.

This work was partly supported by Jimei University (nos. ZP2021013 and ZP2020043), the Science Project of Xiamen City (CN) (no. 3502Z20193048), the Education Department of Fujian Province (CN) (nos. JAT200277 and JAT200232), and the Natural Science Foundation of Fujian Province (CN) (no. 2019J01713). We gratefully acknowledge Xiamen Center for Disease Control and Prevention (XCDC) for sharing the data. We gratefully acknowledge the editor and anonymous reviewers for their valuable insights and suggestions which enormously benefited the paper.