Forecasting Time Series Movement Direction with Hybrid Methodology

Forecasting the tendencies of a time series is a challenging task, and doing it well gives a better understanding of the underlying phenomenon. The purpose of this paper is to present a hybrid model that combines support vector regression (SVR) with the Autoregressive Integrated Moving Average (ARIMA) model and is formulated by a hybrid methodology. The proposed model is more convenient for practical usage. This research article focuses on modeling the tendencies of time series for Thailand's south insurgency. The empirical results, using the time series of the monthly number of deaths, injuries, and incidents for Thailand's south insurgency, indicate that the proposed hybrid model yields an estimated model that is better than the classical time series model or support vector regression alone. Forecast accuracy is measured by the mean square error.


Introduction
Time series modeling and forecasting are challenging tasks that aim to describe the dynamic phenomena and pattern behavior of a time series. In recent years, the accurate description of trends in Thailand's south insurgency has been receiving more attention. Many research papers have studied the unrest in southern Thailand. Using the database of Deep South Watch [1], Jitpiromsri and Mccargo [2] and Jitpiromsri [3] reported the trends of Thailand's south insurgency with diagrams comparing the monthly number of unrest incidents. By applying a polynomial least-squares regression, they provided a forecasting model for describing the unrest incidents in the south of Thailand. This polynomial, however, does not fit the monthly number of unrest incidents well.
In this study, we would like to identify patterns and trends of Thailand's south insurgency and to evaluate the accuracy of the model used for modeling and forecasting. To do this, we use traditional regression models such as Autoregressive (AR), Moving Average (MA), Autoregressive Moving Average (ARMA), and Autoregressive Integrated Moving Average (ARIMA). These models are also called the Box-Jenkins models.
In general, time series data of Thailand's south insurgency can be categorized as nonstationary time series by using the Box-Jenkins methodology. An estimated model of these time series data can then be obtained by support vector regression (SVR). We aim to combine ARIMA and SVR to build an adequate estimated model for forecasting the time series of Thailand's south insurgency.
This paper is organized as follows. Section 2 provides some background on the mathematical theory related to time series modeling and forecasting and to SVR. The details of the proposed hybrid model are explained in Section 3. Section 4 gives the experimental results obtained with the proposed hybrid model for the first difference in the time series of Thailand's south insurgency. Finally, the main conclusions are summarized in Section 5.

Autoregressive Integrated Moving Average Modeling.
Three basic methods for forecasting time series are the naïve model, the exponential smoothing model, and the ARIMA model. The first two models relate to a random walk as the formulation of the model. In this section, the ARIMA model is reviewed.
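The Moving Average model of order $q$, abbreviated as the MA($q$) model, is
$$y_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}, \tag{2}$$
where $y_t$ is stationary, $\theta_1, \ldots, \theta_q$ are constants ($\theta_q \neq 0$), and $\varepsilon_t$ is a Gaussian white noise series with mean zero. The MA($q$) model of (2) explains the current value $y_t$ by a linear combination of the $q$ white noise terms $\varepsilon_{t-1}, \varepsilon_{t-2}, \ldots, \varepsilon_{t-q}$.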
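The Autoregressive Moving Average model, abbreviated as the ARMA($p$, $q$) model and developed by Box and Jenkins [4], is defined by combining the autoregressive and the Moving Average models. It has the form
$$y_t = \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}, \tag{3}$$
where $\phi_1, \ldots, \phi_p$ are the autoregressive coefficients ($\phi_p \neq 0$), and $\theta_1, \ldots, \theta_q$ and $\varepsilon_t$ are as in the MA($q$) model. According to the original Box-Jenkins methodology, an integrated process is a nonstationary process that becomes stationary after differencing. A process that becomes a stationary ARMA($p$, $q$) process after being differenced $d$ times is denoted by ARIMA($p$, $d$, $q$):
$$\Delta^d y_t = \phi_1 \Delta^d y_{t-1} + \cdots + \phi_p \Delta^d y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}, \tag{4}$$
where $\Delta^d y_t$ denotes the $d$th differenced time series [5]. These models serve as the foundation models for time series forecasting.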
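As a rough illustration of the differencing step and of the ARMA recursion, the following Python sketch may help. The function names `difference` and `arma_one_step`, the coefficient names `phi` and `theta`, and the sign convention (moving-average terms added) are assumptions made for this illustration rather than notation confirmed by the paper.

```python
def difference(series, d=1):
    """Apply the difference operator d times: (delta y)_t = y_t - y_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def arma_one_step(history, shocks, phi, theta):
    """Deterministic part of the next ARMA(p, q) value (no new shock added).

    history: past values, most recent last (at least p of them).
    shocks:  past white-noise terms, most recent last (at least q of them).
    phi, theta: AR and MA coefficients (hypothetical names).
    """
    ar = sum(c * history[-i] for i, c in enumerate(phi, start=1))
    ma = sum(c * shocks[-i] for i, c in enumerate(theta, start=1))
    return ar + ma


# A quadratic trend becomes constant after two differences.
squares = [1, 4, 9, 16, 25]
print(difference(squares, 1))  # [3, 5, 7, 9]
print(difference(squares, 2))  # [2, 2, 2]
```

Differencing shortens the series by one observation per application, which matters for a short series such as the 40 monthly observations used later.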

Box-Jenkins Methodology.
Plots of the autocorrelation function (acf) and the partial autocorrelation function (pacf) are the main tools for identifying the parameters of AR, MA, ARMA, and ARIMA models. AR($p$) is used to obtain an estimated model for a time series when the acf tends to die down quickly, either by an exponential decay or by a damped sine wave, whereas the pacf shows spikes (significant partial autocorrelations) for lags up to $p$ and then cuts off immediately.
In contrast to AR($p$), MA($q$) is used to obtain an estimated model of a time series when the acf shows spikes (significant autocorrelations) for lags up to $q$ and then cuts off immediately, whereas the pacf tends to die down quickly, either by an exponential decay or by a damped sine wave.
A mixed process ARMA($p$, $q$) is suggested when the acf and the pacf show spikes for lags up to $q$ and $p$, respectively, and then both die down quickly, either by an exponential decay or by a damped sine wave. Diagnostic checking is then carried out to identify the $p$ and $q$ for which the mixed process ARMA($p$, $q$) best fits the time series [6].
The identification procedure described in this section is used to diagnose the models in our study.
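The identification step relies on sample acf and pacf values, which can be computed with a short Python sketch. The pacf here is obtained with the Durbin-Levinson recursion, and the approximate significance limit of 1.96 divided by the square root of the series length is a common rule of thumb. Both are assumptions of this illustration, since the paper does not state how its plots were produced.

```python
import math


def sample_acf(y, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    c0 = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


def sample_pacf(y, max_lag):
    """Sample partial autocorrelations for lags 1..max_lag (Durbin-Levinson)."""
    r = sample_acf(y, max_lag)
    pacf, prev = [], []  # prev[j] holds phi_{k-1, j+1}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


def significance_limit(n):
    """Approximate 95% limit for a single sample (partial) autocorrelation."""
    return 1.96 / math.sqrt(n)
```

A lag whose acf or pacf value lies outside plus or minus `significance_limit(n)` would be read as a spike in the sense used above.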

Hybrid Models.
The forecasting models used in the recent literature can be classified into three categories: statistical models, artificial intelligence (AI) models, and hybrid models.
Statistical models are known as time series models and include the naïve model, the AR model, the MA model, the ARIMA model, exponential smoothing, and the generalized autoregressive conditional heteroskedasticity (GARCH) volatility model. They use time series analysis to identify the pattern of a time series and provide future values based on the obtained pattern.
The ARIMA model is known as the Box-Jenkins model [4]; it includes the AR and MA models identified by the Box-Jenkins methodology. These models are based on the assumption that the time series under study is stationary and linear, which means that the relationship between the input and output series is linear.
AI models are the second kind of time series forecasting model, in particular artificial neural networks (ANNs), the genetic algorithm (GA), and the support vector machine (SVM). AI models can capture nonlinear patterns and improve forecast performance.
Many studies introduce a hybrid model in order to capture both the linear and the nonlinear characteristics of a time series. Wang et al. [7] reported that neither a statistical model alone nor an AI model alone is adequate for forecasting stock price time series.

Hybrid Methodology.
A hybrid model is a combination of models formulated with a mixed methodology. Much of the literature suggests that a time series $y_t$ consists of a linear component $L_t$ and a nonlinear component $N_t$ in the form
$$y_t = L_t + N_t. \tag{5}$$
An estimated model of (5) is formulated as follows: a linear statistical model is first used to obtain an estimate of the linear component $L_t$, denoted by $\hat{L}_t$; the residual $y_t - \hat{L}_t$, which contains only the nonlinear relationship, is then modeled to obtain an estimate of the nonlinear component $N_t$, denoted by $\hat{N}_t$.
Zhang [8] utilized the hybrid model by introducing the estimated model of (5) in the form $\hat{y}_t = \hat{L}_t + \hat{N}_t$, where $\hat{L}_t$ is given by an ARIMA model and $\hat{N}_t$ is given by a feedforward neural network model. Modifications of Zhang's hybrid approach that estimate $\hat{N}_t$ with a support vector machine (SVM) model can be found in the literature, for example, De Oliveira and Ludermir [9], while Aladag et al. [10] estimated $\hat{N}_t$ with Elman's recurrent neural network (ERNN) model and applied it to the Canadian Lynx data.
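The two-stage procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of any cited paper: the linear stage is a least-squares AR(1) fit standing in for ARIMA, and the nonlinear stage is a leave-one-out nearest-neighbour regression on the lagged residual standing in for a neural network or SVM. All function names are hypothetical.

```python
def fit_ar1(y):
    """Least-squares fit of y[t] = c + phi * y[t-1]; returns (c, phi)."""
    x, z = y[:-1], y[1:]
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    phi = sxz / sxx
    return mz - phi * mx, phi


def knn_residual_model(resid, k=2):
    """Nonlinear stand-in: estimate resid[t] from resid[t-1] by averaging
    the targets of the k nearest lagged residuals (leave-one-out)."""
    pairs = [(resid[t - 1], resid[t]) for t in range(1, len(resid))]
    out = [0.0]  # no lagged residual is available for the first position
    for i, (x, _) in enumerate(pairs):
        others = [p for j, p in enumerate(pairs) if j != i]
        nearest = sorted(others, key=lambda p: abs(p[0] - x))[:k]
        out.append(sum(p[1] for p in nearest) / len(nearest) if nearest else 0.0)
    return out


def hybrid_fit(y, nonlinear_fit=knn_residual_model):
    """Stage 1: linear estimate. Stage 2: model the residual. Sum the two."""
    c, phi = fit_ar1(y)
    linear = [c + phi * prev for prev in y[:-1]]          # linear estimate
    resid = [obs - est for obs, est in zip(y[1:], linear)]
    nonlinear = nonlinear_fit(resid)                       # residual estimate
    fitted = [l + n for l, n in zip(linear, nonlinear)]
    return linear, nonlinear, fitted
```

Passing a different `nonlinear_fit` callable swaps the residual model without touching the linear stage, which mirrors how the cited variants differ mainly in their choice of nonlinear estimator.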

Support Vector Regression.
Let the dot product space $\mathbb{R}^n$ be our data universe with vectors $\mathbf{x} \in \mathbb{R}^n$ as objects. Let $S$ be a sample set such that $S \subset \mathbb{R}^n$. Let $f : \mathbb{R}^n \to \mathbb{R}$ be the target function. Let $D = \{(\mathbf{x}, y) \mid \mathbf{x} \in S \text{ and } y = f(\mathbf{x})\}$ be the training set.
The regression problem is to find the best approximate model $\hat{f} : \mathbb{R}^n \to \mathbb{R}$ for the true underlying function $f$ mapping input $\mathbf{x}$ to output $y$ by using $D$ such that $\hat{f}(\mathbf{x}) \cong f(\mathbf{x})$.
The regression problem is classified as linear or nonlinear. For the linear regression model, the best approximate model $\hat{f}$ can be obtained from the set of possible functions of the form
$$\hat{f}(\mathbf{x}) = \langle \mathbf{w}, \mathbf{x} \rangle + b, \tag{6}$$
where $\mathbf{w}$ is a weight vector and $b$ is a constant. Generally, in order to describe a nonlinear relationship between input and output, SVR applies a map $\Phi : \mathbb{R}^n \to F$ that transforms the nonlinear regression problem in the lower-dimensional input space $\mathbb{R}^n$ into a linear regression problem in a high-dimensional feature space $F$. In the new space $F$, a linear model $\hat{f}$ is formulated, which represents a nonlinear model in the original space:
$$\hat{f}(\mathbf{x}) = \langle \mathbf{w}, \Phi(\mathbf{x}) \rangle + b, \tag{7}$$
where $\langle \cdot, \cdot \rangle$ denotes the dot product in $F$. The linear SVR model $\hat{f}$ in (6) is obtained from (7) by using the identity function $\Phi(\mathbf{x}) = \mathbf{x}$.
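A small Python sketch can make the role of the feature map concrete. The map used here, which sends a scalar input to the pair of the input and its square, is a hypothetical choice made only for illustration; the paper does not specify its feature map.

```python
def quadratic_map(x):
    """Hypothetical feature map from R to R^2: x -> (x, x^2)."""
    return (x, x * x)


def identity_map(x):
    """Identity map, which recovers the plain linear model."""
    return (x,)


def dot(u, v):
    """Dot product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))


def model(x, w, b, feature_map):
    """Linear in feature space: <w, feature_map(x)> + b."""
    return dot(w, feature_map(x)) + b


# The same linear form gives a parabola in x under the quadratic map
# and a straight line under the identity map.
print(model(3.0, (0.0, 2.0), 1.0, quadratic_map))  # 19.0
print(model(3.0, (2.0,), 1.0, identity_map))       # 7.0
```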
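SVR fits the linear regression $\hat{f}$ to the training data by estimating $\mathbf{w}$ and $b$ in (7) through minimization of the following regularized function:
$$\frac{1}{2} \|\mathbf{w}\|^2 + C \sum_{i=1}^{l} L_\varepsilon\bigl(y_i, \hat{f}(\mathbf{x}_i)\bigr), \tag{8}$$
where both $C$ and $\varepsilon$ are user-given parameters and
$$L_\varepsilon\bigl(y, \hat{f}(\mathbf{x})\bigr) = \max\bigl(0, |y - \hat{f}(\mathbf{x})| - \varepsilon\bigr). \tag{9}$$
The constant $C$ is called the penalty constant; it sets the trade-off between margin maximization and the minimization of the slack variables. The following two propositions relate to the formulation of an estimated model. These propositions are modified from [11,12] for our study.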
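The roles of the two user-given parameters can be illustrated with a short Python sketch of an epsilon-insensitive loss and a regularized objective for a linear model. The exact scaling of the loss term (a plain sum rather than an average) and the names `C` and `eps` are assumptions of this illustration.

```python
def eps_insensitive(y_true, y_pred, eps):
    """Zero inside the tube of half-width eps, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


def regularized_risk(w, b, data, C, eps):
    """0.5 * ||w||^2 + C * (sum of eps-insensitive losses) for <w, x> + b.

    data: list of (x, y) pairs, with x a tuple of the same length as w.
    """
    norm_sq = sum(wi * wi for wi in w)
    loss = sum(
        eps_insensitive(y, sum(wi * xi for wi, xi in zip(w, x)) + b, eps)
        for x, y in data
    )
    return 0.5 * norm_sq + C * loss


data = [((1.0,), 2.0), ((2.0,), 4.5)]
# Predictions are 2.0 and 4.0; only the second error (0.5) exceeds eps.
print(regularized_risk((2.0,), 0.0, data, C=10.0, eps=0.25))  # 4.5
```

A larger penalty constant makes errors outside the tube more expensive relative to the norm of the weight vector, while a larger tube width ignores more of the small errors.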

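Proposition 2. Given a regression training set $D = \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_l, y_l)\} \subset \mathbb{R}^n \times \mathbb{R}$, the optimal weight vector is $\mathbf{w}^* = \sum_{i=1}^{l} (\alpha_i^* - \hat{\alpha}_i^*) \Phi(\mathbf{x}_i)$, where $\alpha^*$ and $\hat{\alpha}^*$ are the parameters solved by the following dual quadratic optimization problem:
$$\max_{\alpha, \hat{\alpha}} \; -\frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} (\alpha_i - \hat{\alpha}_i)(\alpha_j - \hat{\alpha}_j) \langle \Phi(\mathbf{x}_i), \Phi(\mathbf{x}_j) \rangle + \sum_{i=1}^{l} y_i (\alpha_i - \hat{\alpha}_i) - \varepsilon \sum_{i=1}^{l} (\alpha_i + \hat{\alpha}_i) \tag{11}$$
subject to $\sum_{i=1}^{l} (\alpha_i - \hat{\alpha}_i) = 0$ and $0 \le \alpha_i, \hat{\alpha}_i \le C$ for $i = 1, \ldots, l$. The parameter $b^*$ is obtained from the $\alpha_i^*$ and $\hat{\alpha}_i^*$ that satisfy optimization (11).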

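Lemma 3. The optimal regression model is
$$\hat{f}^*(\mathbf{x}) = \sum_{i=1}^{l} (\alpha_i^* - \hat{\alpha}_i^*) \langle \Phi(\mathbf{x}_i), \Phi(\mathbf{x}) \rangle + b^*, \tag{12}$$
where a training point $\mathbf{x}_i$ whose coefficient $(\alpha_i^* - \hat{\alpha}_i^*)$ is nonzero is a support vector. The optimal regression model $\hat{f}^*(\mathbf{x})$ depends only on the support vectors.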
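The statement that the optimal model depends only on the support vectors can be checked with a small Python sketch. The Gaussian (RBF) kernel and the coefficient values below are hypothetical; the coefficients would normally come from solving the dual problem, which is not done here.

```python
import math


def rbf_kernel(u, v, gamma=1.0):
    """Gaussian kernel, assumed here as the feature-space dot product."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def support_vectors(train_x, coef):
    """Training points whose dual coefficient difference is nonzero."""
    return [xi for xi, c in zip(train_x, coef) if c != 0.0]


def svr_predict(x, train_x, coef, b, kernel=rbf_kernel):
    """f(x) = sum_i coef_i * k(x_i, x) + b, where coef_i is the difference
    of the two dual variables for training point i."""
    return b + sum(c * kernel(xi, x) for xi, c in zip(train_x, coef) if c != 0.0)


train_x = [(0.0,), (1.0,), (2.0,)]
coef = [0.5, 0.0, -0.25]  # made-up values; the middle point is not a support vector
full = svr_predict((0.5,), train_x, coef, b=1.0)
reduced = svr_predict((0.5,), [(0.0,), (2.0,)], [0.5, -0.25], b=1.0)
print(full == reduced)  # True: dropping the non-support vector changes nothing
```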

Formulation of the Proposed Model
In this section, we formulate the proposed model. We begin with hybrid models, which combine several models in order to reduce the risk of using an inappropriate model, to obtain results that are more accurate than those of a single model, and to improve overall forecasting performance.
Assume that $(y_t)$ is the time series under study, taken to be linear and stationary. We then use the Box-Jenkins methodology. The time series $(y_t)$ is initially modeled by the proposed hybrid model (13), in which $\varepsilon_t$ denotes the residual of the time series model at time $t$; the two remaining residual terms are the residuals at time $t$ of the two estimated component models, respectively.

Application of the Proposed Hybrid Model to Thailand's South Insurgency Movement Direction Forecasting
4.1. Data Set. In this research, we are interested in studying the unrest in the four southern provinces of Thailand, particularly in Pattani, Yala, Narathiwat, and parts of Songkla. We consider the monthly number of deaths, injuries, and incidents in these provinces. At the time of this research, the latest data were available from Deep South Watch (DSW) [1] and the Deep South Coordination Center (DSCC) [13]. Using the proposed hybrid model, our aim is to formulate an estimated model for the trend of the number of deaths, injuries, and incidents in these regions. The data series of our study consist of 40 months of deaths, injuries, and incidents in the four southern provinces of Thailand from September 2012 to December 2015.
Figure 2 presents three graphs describing the three data series of the monthly number of deaths, injuries, and incidents. The graph of deaths is at the bottom for all periods of time, while the graph of injuries lies between the graphs of deaths and incidents in almost all periods of time. Moreover, the graph of incidents is at the top in almost all periods of time. From Figure 2, we can see that the number of incidents is not necessarily equal to the sum of the numbers of deaths and injuries. Sometimes an unrest incident causes no deaths or injuries, and sometimes a single incident causes high numbers of deaths and injuries.
The monthly numbers of injuries and incidents are apparently stationary. A candidate model for these two data series can be determined by plotting the acf and pacf. However, the monthly number of deaths exhibits a linear trend in the mean, since it has a clear downward slope.
Figure 3 shows the monthly number of deaths plotted against its first differenced series (a), together with the acf (b) and pacf (c) plots. The data series of injuries plotted against its first differenced series is shown in Figure 4, and the data series of incidents plotted against its first differenced series is shown in Figure 5.
The plots of the first differenced series (Figures 3, 4, and 5) look like stationary processes, although the acf and pacf plots for the series of deaths, injuries, and incidents do not clearly identify the parameters for an estimated ARIMA model.
The acf for the first difference in the monthly number of deaths tends to die down quickly, whereas the pacf shows a spike for lags up to 1 (ignoring the other spikes in each plot that fall outside the limits). This suggests that the first difference in the monthly number of deaths can be modeled as AR(1).
Similarly, the first differenced series of injuries and incidents can be modeled as AR(1). After the residuals are checked in the diagnosis stage, ARMA(2, 3) emerges as a candidate model for the first difference in the monthly number of deaths and injuries, and MA(1) as a candidate model for the first difference in the monthly number of incidents. In ARIMA($p$, $d$, $q$) notation, ARIMA(2, 1, 3) is the estimated model for the monthly number of deaths and injuries and ARIMA(0, 1, 1) for the monthly number of incidents.
Table 1 reports the mean square error (mse) of the three estimated models for the monthly number of deaths, injuries, and incidents formulated by ARIMA, SVR, and the hybrid model. The mean square error of the ARIMA model is calculated by choosing the best of $1 \times 10^6$ trajectories simulated by ARIMA for each series.
Figure 6 illustrates the convergence of the mean square error, calculated between the monthly numbers and the estimated model, with 2,500, 5,000, ..., $1 \times 10^6$ trajectories for the monthly number of deaths, injuries, and incidents.
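The idea of keeping the best of many simulated trajectories can be sketched in Python as follows. This is only an illustration under several assumptions: the ARIMA(0, 1, 1) recursion with made-up coefficient and noise scale, the seeded generator, and the short toy series are all hypothetical and are not the paper's data or settings.

```python
import random


def mse(actual, estimate):
    """Mean square error between two equal-length sequences."""
    return sum((a - e) ** 2 for a, e in zip(actual, estimate)) / len(actual)


def simulate_arima011(start, n, theta, sigma, rng):
    """One ARIMA(0, 1, 1) path: y_t = y_{t-1} + e_t + theta * e_{t-1}."""
    path, prev_e = [start], 0.0
    for _ in range(n - 1):
        e = rng.gauss(0.0, sigma)
        path.append(path[-1] + e + theta * prev_e)
        prev_e = e
    return path


def best_trajectory_mse(actual, n_paths, theta, sigma, seed=0):
    """Smallest mse over n_paths simulated paths, plus the running minimum."""
    rng = random.Random(seed)
    best, running = float("inf"), []
    for _ in range(n_paths):
        path = simulate_arima011(actual[0], len(actual), theta, sigma, rng)
        best = min(best, mse(actual, path))
        running.append(best)
    return best, running


toy = [30, 28, 35, 27, 25, 29, 22, 24]  # synthetic values, not insurgency data
best, running = best_trajectory_mse(toy, n_paths=200, theta=-0.5, sigma=5.0)
```

The running minimum can only decrease or stay flat as more trajectories are added, which is the kind of convergence behaviour a plot of mse against the number of trajectories would show.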
With the SVR parameters set to 0.0025, 150000, 3.25, and 2.75, and with the ARIMA(2, 1, 3) model used to select the best trajectory from $1 \times 10^6$ trial trajectories, both models are combined to formulate an estimated model for the monthly number of deaths as the sum of the two component estimates. The predictive performance of the SVR-ARIMA(2, 1, 3) hybrid model for the monthly number of deaths and injuries is shown in Figures 7 and 8, respectively.
In the same way, for the monthly number of injuries, with the SVR parameters set to 0.025, 350000, 2.755, and 0.00125, and with the ARIMA(2, 1, 3) model used to select the best trajectory from $1 \times 10^6$ trial trajectories, both models are combined to formulate an estimated model for the monthly number of injuries as the sum of the two component estimates. The predictive performance of the SVR-ARIMA(0, 1, 1) hybrid model for the monthly number of incidents is shown in Figure 9.

Conclusions
In this study, the hybrid SVR-ARIMA model has been investigated as a way to formulate a time series model of the monthly numbers for Thailand's south insurgency. In particular, we consider the first difference in the monthly number of deaths, injuries, and incidents in Pattani, Yala, Narathiwat, and Songkla provinces over the 40 months from September 2012 to December 2015. According to the hybrid methodology, the SVR-ARIMA($p$, $d$, $q$) model is obtained by combining the ARIMA($p$, $d$, $q$) and SVR models. The autocorrelation and partial autocorrelation plots indicate that the first difference in the monthly number of deaths, injuries, and incidents is linear and stationary.
The test results of the estimated model obtained from the proposed hybrid model are compared with those of the AR($p$), MA($q$), ARIMA($p$, $d$, $q$), and SVR models. The comparison shows that the proposed hybrid model performs better than the remaining models. For the time series of Thailand's south insurgency, SVR-ARIMA(2, 1, 3) is the estimated model for the monthly number of deaths and injuries, and SVR-ARIMA(0, 1, 1) is the estimated model for the monthly number of incidents. In particular, SVR-ARIMA(2, 1, 3) consists of two components: the first uses the SVR model to formulate the estimated model for the historical data, and the second uses the ARIMA model to formulate the estimated model for unseen values in the near future.

Figure 1: Number of unrest incidents in the four southern provinces of Thailand (Pattani, Yala, Narathiwat, and Songkla) from 2005 to 2015.

Figure 1 illustrates a diagram of the number of unrest incidents in the four southern provinces from 2005 to 2015. This diagram presents a high frequency of unrest incidents with a small fluctuation in the first period (2005 to 2008), a decreasing frequency in the middle period (2009 to 2012), an increasing frequency from 2013 to 2014, and the lowest frequency in 2015.

Figure 3: (a) Monthly number of deaths is plotted against its first differenced series, acf (b) and pacf (c) plots for the first difference in monthly number of deaths.

Figure 4: (a) Monthly number of injuries is plotted against its first differenced series, acf (b) and pacf (c) plots for the first difference in monthly number of injuries.

Figure 5: (a) Monthly number of incidents is plotted against its first differenced series, acf (b) and pacf (c) plots for its first difference in monthly number of incidents.

Figure 7: The actual, fitted, and forecasted series by hybrid model for series of deaths.

Figure 9: The actual, fitted, and forecasted series by hybrid model for number of incidents.