Modeling Seasonal Influenza Transmission and Its Association with Climate Factors in Thailand Using Time-Series and ARIMAX Analyses

Influenza is a worldwide respiratory infectious disease that easily spreads from one person to another. Previous research has found that the influenza transmission process is often associated with climate variables. In this study, we used autocorrelation and partial autocorrelation plots to determine the appropriate autoregressive integrated moving average (ARIMA) model for influenza transmission in the central and southern regions of Thailand. The relationships between reported influenza cases and the climate data, such as the amount of rainfall, average temperature, average maximum relative humidity, average minimum relative humidity, and average relative humidity, were evaluated using cross-correlation function. Based on the available data of suspected influenza cases and climate variables, the most appropriate ARIMA(X) model for each region was obtained. We found that the average temperature correlated with influenza cases in both central and southern regions, but average minimum relative humidity played an important role only in the southern region. The ARIMAX model that includes the average temperature with a 4-month lag and the minimum relative humidity with a 2-month lag is the appropriate model for the central region, whereas including the minimum relative humidity with a 4-month lag results in the best model for the southern region.


Introduction
Influenza, commonly referred to as the flu, is a worldwide respiratory infectious disease that easily spreads from one person to another. Influenza is the cause of approximately 3-5 million cases of severe illness and 250,000-500,000 deaths annually worldwide [1]. The disease is transmitted through the air by coughs or sneezes, creating aerosols containing the virus from infectious individuals. Individuals, who come into contact with or breathe in these aerosols, will likely become infected by the virus [1]. The first reported major global pandemic, known as the "Spanish" influenza, occurred in 1918; it was caused by a novel H1N1 virus subtype [2]. This pandemic was estimated to have cost the lives of 20-40 million people from 1918 to 1919 [3].
In temperate regions, influenza has a seasonal pattern peaking during winter seasons [4], whereas, in tropical regions, the incidence is less likely to be seasonal rather than randomly patterned [5]. The important role of climate in influenza is less understood for tropical regions. However, a marked increase of influenza cases was reported during the rainy seasons, in countries such as Singapore, northeastern Brazil, and French Guiana [5][6][7]. In Dakar, Senegal, the incidence of influenza peaked during 1996-1998. This peak was 2 Computational and Mathematical Methods in Medicine attributed to high precipitation, humidity, and temperature [8]. From the laboratory surveillance data in Brazil, Alonso et al. found that temperature and humidity played an important role in driving the influenza epidemic [9]. Influenza incidences were also associated with temperature in tropical countries [10]. Moreover, in warm climates, the recording of environmental variables was shown to increase the ability to predict influenza cases [11]. An experimental study showed that the spread of the influenza virus depends upon both temperature and relative humidity [12,13]. It was found that aerosolized influenza virus is stable maximally at low relative humidity conditions and moderately stable at high relative humidity [14]. From simulated coughs, influenza viruses maintain infectivity at low relative humidity and are increasingly inactivated at high relative humidity [15].
In Thailand, few reports of the seasonality of influenza outbreaks exist. Influenza cases peaked twice per year in the 11 provinces of Thailand during 2004-2010, as monitored by passive surveillance. A major peak occurred during the rainy season (June-August), and a minor peak occurred during the winter season (October-February) [16]. Climate factors may play an important role in predicting the number of influenza cases. The links between climatic variables and influenza cases in Thailand are also less understood. Using regression analysis, Chumkiew et al. found that the temperature difference and percent of rainfall were associated with the influenza incidences in Nakhon Si Thammarat [17]. However, this research covered only one province in Thailand. Most of the studies in Thailand focused on the relation of the seasonal patterns of influenza outbreaks [18,19]. Knowledge of the effects of climatic variables on the influenza seasonality in Thailand may be important for developing efficient intervention measures that may help mitigate and/or contain the disease.
Here, we retrospectively analyze the time-series pattern of reported suspected influenza cases in 2 regions of Thailand during the years 2009 to 2014. We used autocorrelation and partial autocorrelation plots to determine the ARIMA model that predicts the future influenza cases with a linear function of time lag values (autoregressive part) plus uncorrelated random variables (moving average part). The relationships between the climate data and reported influenza cases were evaluated using cross-correlation function. We followed previous suggestions [5][6][7][8][9][10][11] and investigated each climatic variable (the amount of rainfall, average temperature, average maximum relative humidity, average minimum relative humidity, and average relative humidity) that may influence the influenza cases as input time-series in the ARIMAX model. The most appropriate ARIMAX model for each region is presented, based on the previous data and climate variables.

2.
1. Data Sources. Data of monthly influenza cases from 2009 to 2014 were extracted from the National Notifiable Disease Surveillance Report of the Bureau of Epidemiology at the Ministry of Public Health [20]. Most positive cases were suspected influenza cases, based on clinical diagnosis made by attending physicians. The clinical criteria for influenza  were fever, cough, sore throat, headache, and myalgia. Samples from some suspected influenza cases were then tested using RT-PCR for laboratory confirmation. The suspected influenza cases are reported only from the public hospitals and some private hospitals. In this work, we analyzed only the influenza cases in 2 regions, the central and southern regions, which exhibit more flu cases than other regions in Thailand.
There are three seasons in the central region: the rainy season (mid-May to mid-October), the winter season (mid-October to mid-February), and the summer season (mid-February to mid-May). However, in the southern region, the year is divided into only two seasons: the rainy season (June to February) and the summer season (March to May). Geographically, the southern region is a peninsula located between the Andaman Sea on the west and the South China Sea on the east, whereas the central region has higher latitude.
The monthly rainfall, monthly average temperature ( mean ), average maximum relative humidity (RH max ), and average minimum relative humidity (RH min ) were obtained from the Research Data Archive (RDA), which is maintained by the Computational and Information Systems Laboratory (CISL) at the National Center for Atmospheric Research (NCAR). The monthly original data can be obtained from the RDA (http://rda.ucar.edu/) in the dataset number ds512.0. We extracted the climate data from 39 weather stations, with 23 stations located in the central region and 16 stations located in the southern region ( Figure 1). The climate data extracted from each station were used to determine the average climate value that represented the regional climate data. We also obtained data on rainfall variations in Thailand from the US National Climate Data Center (NCDC). Data is available from 171 stations in Thailand, including 35 stations for the central region and 15 stations for the southern region.

Time-Series Analysis.
The time-series data from 2 regions were divided into 2 parts; the first 60 months of data (from 2009 to 2013) was used to calibrate the time-series model, and the last 12 months of data (in 2014) was used to test the model prediction. In this study, we used the autoregressive integrated moving average, ARIMA( , , )( , , ) model. For a complete description of ARIMA analysis, we refer readers to [21]. Briefly, the ARIMA univariate analysis models consisted of 3 subprocesses: (a) autoregression (AR), (b) moving average (MA), and (c) differencing forming a stationary time-series [21][22][23][24]. , , and are the orders of the AR, the differencing, and the MA process, respectively, whereas , , and are the seasonal orders of AR, differencing, and MA process, respectively; is the seasonal period. In the ARIMA model, the predicted influenza cases at time , , have been obtained by applying the weight ( ) to the uncorrelated random variables ( ) in the th order. The timeseries of may be written as a linear function of the lag value at th order, as follows: If the time-series were nonstationary, we used log transformation and/or differencing ( = − −1 ) to remove the nonstationary terms. The and orders can be obtained using the cutoff time lag of autocorrelation function (ACF) and partial autocorrelation function (PACF). If the timeseries showed a seasonal pattern with an ACF peak at time lags, then the same procedure was applied to the seasonal ARIMA model. The ARIMA model was examined using the goodness of fit process, in which the residual is likely to be white noise. The coefficients of the model were estimated using the mean square method. Akaike's Information Criterion (AIC) was used to determine the optimal model, which can avoid overparameterization. However, the root mean square error (RMSE) was also used to determine the best fit [11].
The climate variables were used as input time-series in the ARIMAX model. In the case of strongly autocorrelated data, it is difficult to determine the correlation between the time-series of climate and suspected cases. Prewhitening is a useful method to disentangle this strong linear correlation. In this work, the prewhitening process was applied to the climate time-series to avoid autocorrelation with suspected cases time-series. Then, the cross-correlation function (CCF) between the prewhitened climate and suspected influenza case time-series was calculated to identify the significant time lag. Climate time-series that do not show a significant time lag were excluded from the ARIMAX model. The predicted influenza cases at time , , from ARIMAX model were determined by = 0 + − + , where − is the climate series at lags and is the predicted influenza cases at time and corresponds to the optimal ARIMA( , , )( , , ) model from the previous step. The coefficients of the ARI-MAX model were estimated as described above. The multivariate model is used to fit series data and to predict future cases. The associated RMSE was calculated to determine the predicted model. R software, version 3.1.1 (the R foundation Rainfall in mm, temperature in ∘ C, and relative humidity in percent. for Statistical Computing, https://www.R-project.org/) was used for the time-series analysis and graphic display, and < 0.05 was considered to be statistically significant.

Climate Patterns.
We found that the southern region experienced small temperature variations and that its lowest temperature value is higher than in the central region. The regions also have different climatic variables ( Figure 2). The peak temperature in both regions occurred in the summer season. The mean climatic parameters are shown in Table 1. Monthly rainfall in the central and southern regions showed slightly different patterns, with the lowest rainfall level occurring at the end of the year in the central region. We found greater variations in the southern region, with peaks occurring from April to November each year. RH max in the central region is low, whereas RH max in the southern region is higher yearround. This pattern is similar to RH min , which shows high value year-round, with the lowest variation in the southern region. RH min and RH max peaks occurred midyear.

ARIMA Model.
The monthly number of suspected influenza cases in the central and southern regions during the period of 2009-2014 is shown in Figure 3. We found that the incidences in the central region were higher than in the southern region and that the case time-series in both regions tentatively decrease over time. We also found that the cases time-series peak once each year. We hypothesized that climate factors are correlated with suspected influenza time-series. We identified the crosscorrelations of climate factors with a time lag of max 4 months [25,26] for both regions, as shown in Tables 2 and 3. We found that the case time-series in the central region was significantly correlated with rainfall variables at lag of 0-4 months; average temperature at lags 3 and 4; maximum relative humidity at lags 0 and 1; minimum relative humidity at lags 0-2; and average relative humidity at lags 0-2. However, in the southern region, we found significant correlations only with average temperature at lags 3 and 4 and maximum relative humidity at lag 1. We also fitted the suspected influenza data with several univariate ARIMA( , , )( , , ) models using different  orders. The best models for each region are shown in Tables 4 and 5 as model 1. We then tested several seasonal ARIMAX models with one or more climate factors at significant lags to find the most appropriate models. In the central region, we found several significant multivariate models, as shown in Table 4. We first tested the ARIMAX model for one climate variable as input series. The results show that model 10 with minimum relative humidity at lag 1 had the smallest AIC and smallest fitted RMSE. However, model 10 also had the highest predicted RMSE. Model 11 with minimum relative humidity at lag 2 had the smallest predicted RMSE. We further screened the ARIMAX models including two or three climate factors for coefficients presenting highest significance. For three climate factors, no model was found to improve the AIC, fitted or predicted RMSE (data not shown). When including two climatic factors, we found that model 16 with the average temperature at lag 4 and minimum relative humidity at lag 1 has the smallest AIC but also shows the highest predicted RMSE. The difference in AIC for models 16 and 17 is about 3%. However, model 17 showed smallest fitted and predicted RMSE and a higher value coefficient for mean and RH max . Therefore, model 17 (ARIMAX(1, 0, 2)(1, 0, 0) 12 with mean at lag 4 and RH min at lag 1) is in our view the most appropriate for use. Figure 4 shows the fit and prediction results and concludes that this model can roughly demonstrate the tendency for future cases.
For the southern region, model 3 with an average temperature at lag 4 had the smallest AIC. This model also had the smallest predicted RMSE. However, this model demonstrated a high fitted RMSE, whereas model 2 had the smallest RMSE for prediction (Table 5). For two climate factors, model 6 incorporates mean at lag 4 and RH min at lag 1 and showed values of AIC and RMSE higher than that of model 3, though it had a lower value. The difference of fitted RMSE for models 3 and 5 was less than 1%, but the difference of predicted RMSE was about 15%. Hence, model 3 (ARIMAX(1, 0, 2)(0, 0, 1) 12 with mean at lag 4) which had the smallest RMSE and smallest AIC was selected for monitoring and prediction. The fitting result is shown in Figure 5

Discussion
Suspected influenza cases (exceeding approximately ten thousand cases every year) have been reported by the Bureau of Epidemiology, Thai Ministry of Public Health, on a monthly basis. This study was conducted to provide information about the seasonality of influenza and the impact of climatic variables in 2 geographical regions in Thailand. In the final multivariate model, we found that the best model for fitted and predicted influenza cases for the central region includes average temperature and minimum relative humidity; in the southern region, the best model includes average temperature only.
In this study, we explained the different correlations between the central and southern regions by the differences in climatic conditions of these two regions. In the southern region, the year is divided into two seasons: the rainy season, which lasts for 9 months, and the summer season lasting for 3 months. In the central region, seasons comprise the rainy, winter, and summer seasons, each lasting for 4 months. People in the southern region are exposed to high and relatively constant humidity all year round. Therefore, one could expect people in the southern region to be less prone to contract the influenza virus due to variations of relative humidity than people in the central region. From cross-correlation analysis, we found that only maximum relative humidity correlated with suspected influenza cases in the southern region, whereas all of the maximum, minimum, and average relative humidity correlated with suspected influenza cases in the central region.  In general, an association between influenza cases and local environmental factors, such as humidity and temperature, were found in temperate zones. Low relative humidity increases the rate of infection in guinea pig models, whereas high relative humidity blocks transmission [12]. Low influenza virus survival occurred at high temperatures, and no transmission was detected at temperatures higher than 30 ∘ C [12,13]. Based on simulated coughs, the results also showed inactivation of the virus at higher relative humidity after coughing [15]. However, the findings of this study contradict those of previous research. In the central and southern regions, the influenza peaks occurred during the rainy season which is characterized also by high average temperatures. We can assume that the climate factors may have less influence on airborne transmission in tropical regions [27]. The timeseries for the influenza cases may be associated with a minimum relative humidity of 38-64% and average temperatures of 24-32 ∘ C for the central and with average temperatures of 26-30 ∘ C for the southern regions, respectively. Our models suggest that relative humidity is associated with influenza transmission in the central region. This finding is consistent with previous studies performed in other tropical countries. For example, a negative correlation between the relative humidity and the influenza incidence rate as obtained by the ARIMA model was found in Brisbane and Singapore during 2000-2007 [27]. Furthermore, the logistic regression model for the influenza transmission in subtropical Guatemala with a relative humidity similar to the central region in Thailand also presented a negative correlation with humidity [28].
A positive correlation with temperature was found in this study. This finding is consistent with the study made in westcentral El Salvador [28]. The average temperatures in both regions of Thailand are also corresponding to those in westcentral El Salvador, which has average temperatures of 25-29 ∘ C. In the study [28], it was suggested that temperature may be a proxy for other factors. In a contact transmission experiment, Lowen et al. found that titers from exposed guinea pigs increase at similar patterns at 30 ∘ C and 20 ∘ C and at both 20% and 80% relative humidity [13]. The amount of virus shed at 30 ∘ C and 20 ∘ C was not markedly different, which indicates that the rate of transmission did not decrease with temperature [13]. These results correspond to the humidrainy conditions in tropical climates [26].
In this work, we found that the amount of rainfall was not significantly correlated with the influenza cases. This association has been suggested in several countries, including Brazil [5], Hong Kong [11], and Senegal [8]. Similarly, some studies (e.g., in Hong Kong [27,29,30] and Singapore [27]) have not observed a correlation between influenza cases and rainfall.
Our study suggests an association between suspected influenza cases and some climatic variables in the central and southern regions of Thailand. Therefore, in countries without advanced influenza surveillance systems, appropriate use of ARIMAX models may facilitate the prediction of influenza transmission at present and in the near future. Due to limitations in available and reliable data of influenza time-series in Thailand, the influenza forecasting accuracy is impaired. However, it is hoped that, with further model refinement, more reliable influenza data, and more suitable environmental data (e.g., higher spatial resolution data), these ARIMAX models may provide a sufficiently accurate reference point for public health officials to prepare for and to respond to influenza epidemics.
Admittedly, our study has some limitations. The ARIMA(X) models could only identify correlations, but not causality between influenza cases and average temperature in the central region and between influenza cases and both temperature and average minimum relative humidity in the southern region of Thailand. Consequently, those correlations may act only as proxies for factors not considered in this study. In this analysis, we used suspected influenza cases as a proxy of influenza activity, although it could not stand as a direct measure of influenza morbidity or mortality. However, the suspected influenza cases are sufficient to determine the timing of influenza activity in the central and southern region. The suspected influenza cases are reported only from public hospitals and some private hospitals. The long sampling period may have caused variations in sampling rate due to the laboratory test limitations and changing government policies. Furthermore, this study did not take into account the effects of vaccination on the correlation between influenza cases and the climatic variables [31]. During 2010-2012, approximately 8.18 million influenza vaccines were administered mostly to people of age ≥65 years, people with chronic diseases, and healthcare personnel/poultry cullers [32]. Also, other social and economic parameters were not regarded, which likewise may have played a role in affecting the correlation between influenza activity and climatic variables. Climatic parameters may also affect social behaviors, such as school closure and the tendency for people to stay indoors [33,34]. Finally, the models did not include vitamin D measurements or solar radiation level data [35,36].