Hybrid Support Vector Regression and Autoregressive Integrated Moving Average Models Improved by Particle Swarm Optimization for Property Crime Rates Forecasting with Economic Indicators

Crime forecasting is an important area in the field of criminology. Linear models, such as regression and econometric models, are commonly applied in crime forecasting. However, real crime data commonly consist of both linear and nonlinear components, and a single model may not be sufficient to capture all the characteristics of the data. The purpose of this study is to introduce a hybrid model that combines support vector regression (SVR) and autoregressive integrated moving average (ARIMA) models for crime rate forecasting. SVR is robust with small training data and high-dimensional problems, while ARIMA is able to model several types of time series. However, the accuracy of the SVR model depends on the values of its parameters, and ARIMA is not robust when applied to small data sets. To overcome these problems, particle swarm optimization is used to estimate the parameters of the SVR and ARIMA models. The proposed hybrid model is used to forecast the property crime rates of the United States based on economic indicators. The experimental results show that the proposed hybrid model produces more accurate forecasts than the individual models.


Introduction
Quantitative forecasting methods are classified into causal and time series models. Causal models are based on the relationship between the variable to be forecasted and one or more independent variables, and this relationship is typically assumed to be linear. Regression, econometric models, and input-output models are examples of causal models. Time series models use historical data to estimate future values and can be categorized into linear and nonlinear models. Most linear models are statistical models, such as exponential smoothing, moving average, and autoregressive integrated moving average (ARIMA). Nonlinear models, in contrast, include statistical models, such as bilinear models, threshold autoregressive (TAR) models, and autoregressive conditional heteroscedastic (ARCH) models, as well as nonstatistical models, such as artificial neural networks (ANN) and support vector regression (SVR).
Causal models such as regression and econometric models are commonly used in crime rate forecasting because they can describe the causal relationship between the crime variable and other explanatory variables. However, developing a causal model is quite complex and requires theoretical assumptions about the relationships between the explanatory variables. Time series models have therefore been considered a promising alternative tool for crime rate forecasting, although their application in this area is still scarce. Standard time series models usually require a substantial number of observations, and the limited amount of available crime data makes them less suitable for crime rate forecasting. A new model that suits small data sets is therefore needed to improve crime rate forecasting performance.
The Scientific World Journal
There are two types of time series models, namely, linear and nonlinear models. Linear models can only capture linear relationships, while nonlinear models only capture nonlinear relationships. In the literature, no single model predicts well under all conditions. Therefore, many researchers have hybridized a linear model with a nonlinear model as an approach to time series forecasting [1]. Hybrid linear and nonlinear models are not only capable of modeling both linear and nonlinear relationships but are also more robust to changes in time series patterns [2]. Artificial neural networks (ANNs) and support vector regression (SVR) are the two nonlinear models most often employed. ARIMA, seasonal autoregressive integrated moving average (SARIMA), autoregression (AR), exponential smoothing, moving average, and multiple linear regression are usually used as the linear model in such hybrids. Examples of hybrid time series models proposed in the literature include:
- ARIMA and ANN [1-16]
- ARIMA and SVR [17-23]
- SARIMA and SVR [24,25]
- AR and ANN [26]
- exponential smoothing and ANN [27]
- ARIMA and genetic programming (GP) [28]
- exponential smoothing, ARIMA, and ANN [29]
- multiple linear regression (MLR) and ANN [30]
The hybridization of ARIMA and ANN as linear and nonlinear models has been studied extensively because it produces promising results. However, this hybridization requires sufficient data to produce a good model. Furthermore, ANN models suffer from several problems, such as the need to control numerous parameters, uncertainty in the solution (network weights), and the danger of overfitting. Support vector regression (SVR) was proposed by Vapnik [31] to overcome the drawbacks of ANN. SVR is a nonlinear model for solving regression problems and has been used by researchers as an alternative to ANN [17-23]. Hybrids of ARIMA and SVR have been successfully applied to time series forecasting problems such as the stock market [17,20], electricity prices [22], and power load [18,19].

Four factors contribute to the success of SVR: good generalization, a globally optimal solution, the ability to handle nonlinear problems, and the sparseness of the solution. These properties make SVR robust with small training data, nonlinear, and high-dimensional problems [32]. Despite these advantages, SVR has some limitations. For example, the SVR parameters must be set correctly because they affect the regression accuracy; inappropriate parameters may lead to overfitting or underfitting [33]. Genetic algorithms (GA) and particle swarm optimization (PSO) are among the approaches researchers have used to estimate SVR parameters. PSO is easier to implement than GA because it does not require evolutionary operators such as crossover and mutation [34].

For the linear model, ARIMA is researchers' preferred choice for hybridization with a nonlinear model because it can represent several model forms. These include the simple autoregressive (AR) model, the simple moving average (MA) model, and their combination (the ARMA model) [23].
ARIMA forecasts well when the data set is large and linear. However, it is not robust for small data sets and nonlinear data, and several modifications have therefore been proposed to improve its performance [35].
Asadi et al. [36] suggested the use of PSO to estimate the parameters of ARIMA model for small data sets.
Most hybrid models apply a linear model followed by a nonlinear model, but some apply a nonlinear model followed by a linear model. The choice depends on which component, linear or nonlinear, is more dominant, because the dominant component needs to be modeled first. However, the linear and nonlinear components in the data can interact, so it is quite difficult to determine which component is more dominant. Modeling the linear patterns with a linear model will alter the nonlinear patterns, and vice versa. Compared with the ANN model, however, SVR is able to keep the linear components undamaged [22].

A hybrid of SVR and ARIMA applied in the nonlinear-then-linear order has been found to outperform existing neural network approaches and traditional ARIMA models [22]. It has also outperformed other hybrid models, namely ARIMA and ANN (in both the linear-then-nonlinear and the nonlinear-then-linear orders) and ARIMA and SVR (in the linear-then-nonlinear order) [22]. Therefore, this study combines SVR with ARIMA in the nonlinear-then-linear order.
Hybrid linear and nonlinear models have not previously been used for crime rate forecasting; most of the models used for this purpose are linear. In real-world data, crime rates may have both linear and nonlinear components, in which case a linear model alone may not be adequate. This study therefore proposes a hybrid time series model that works well with limited data containing both linear and nonlinear components. SVR is used as the nonlinear model and ARIMA as the linear model. First, SVR is used to model the nonlinear component. The residual of the SVR model, which represents the linear component, is then modeled by ARIMA. To overcome the drawbacks of the SVR and ARIMA models, particle swarm optimization (PSO) is used to estimate the parameters of both models. PSO is able to escape from local optima, is easy to implement, and has few parameters to adjust [34]. Several factors influence property crime rates, among them economic indicators. This study therefore uses three economic indicators as inputs to the proposed hybrid model: the unemployment rate, gross domestic product, and the consumer price index.
The remainder of this study is organized as follows. Related work on crime and economic indicators is discussed in Section 2. Section 3 briefly explains support vector regression, ARIMA, particle swarm optimization, and the proposed hybrid model. Section 4 describes the data set and the model evaluation employed in this study. The determination of the model parameters and the analysis of the results are presented in Sections 5 and 6, respectively. Finally, conclusions are drawn in Section 7.

Related Work on Crimes and Economic Conditions
Economic conditions are often considered to be related to crime, especially property crime, and many studies in the literature have examined the relationship between economic conditions and property crime. Researchers often select the unemployment rate to represent economic conditions [37]. A study using a country-level panel data set from Europe found that unemployment has a positive influence on property crime [38]. Another study, based on UK annual regional data, found that unemployment is an important explanatory variable for crimes motivated by economic gain [39].

Other studies have also found a significant relationship between unemployment and property crime. Motor vehicle theft is significantly associated with the unemployment rate [40] and is cointegrated with male youth unemployment [41]. Unemployment also has a positive effect on burglary, car theft, and bike theft [42]. Unemployment, especially among youth and young adults, has also been found to influence crime. According to a study of United States arrest data, unemployment has a positive relationship with theft among youth and young adults (16-24 years) [43]. Another study investigated the relationship between crime and the unemployment of male adults (26-64 years) and youth (16-25 years) in Britain [44].
The results indicate that youth unemployment and adult unemployment are both significantly and positively related to burglary, theft, fraud, and forgery as well as total crime rates.
In addition to unemployment, researchers have studied other economic indicators, such as the consumer price index, gross domestic product, and the consumer sentiment index, to examine the relationship between economic conditions and crime. Several researchers used the consumer price index to measure inflation [45,46], which reduces purchasing power and increases the cost of living. A study of the impact of inflation on crime in the United States, using the modified Wald causality test, found that the crime rate is cointegrated with the inflation and unemployment rates [45]. Another study, which examined the linkages between inflation, unemployment, and crime rates in Malaysia, found that inflation and unemployment are positively related to the crime rate [46]. As for gross domestic product, a study of the relationship between national crime rates and social and economic variables found that robbery and homicide have a significant negative relationship with gross domestic product [47].
In summary, economic conditions do influence property crime rates, so this study uses them to forecast property crime rates. Economic conditions are represented by three economic indicators, which are used as inputs to the proposed hybrid model: the unemployment rate, the consumer price index, and gross domestic product.

Methodology
This section summarizes support vector regression, ARIMA, and particle swarm optimization as a basis for describing the proposed hybrid model.

Support Vector Regression (SVR)
Support vector regression (SVR) is a nonlinear model for solving regression problems. Training an SVR amounts to solving a linearly constrained quadratic programming problem. This problem has a unique optimal solution, so there is no local minimum problem. The solution is sparse, as only the essential data points are used to construct the regression function. Lagrange multipliers are introduced to solve the problem. The SVR model is given by [48]

$$f(x) = z^{T}\phi(x) + b, \tag{1}$$

where $z$ is the weight vector, $b$ is the bias, and $\phi(x)$ is the nonlinear mapping associated with the kernel function. SVR uses the $\varepsilon$-insensitive loss function

$$L_{\varepsilon}\left(y, f(x)\right) = \begin{cases} 0, & \left|y - f(x)\right| \le \varepsilon, \\ \left|y - f(x)\right| - \varepsilon, & \text{otherwise}, \end{cases} \tag{2}$$

where $\varepsilon$ defines the width of the $\varepsilon$-insensitive region. A loss is incurred only if the predicted value falls outside this band. The SVR model is constructed by solving the quadratic programming problem

$$\min_{z,\, b,\, \xi,\, \xi^{*}} \ \frac{1}{2} z^{T} z + C \sum_{i=1}^{n} \left(\xi_i + \xi_i^{*}\right) \quad \text{subject to} \quad \begin{cases} y_i - z^{T}\phi(x_i) - b \le \varepsilon + \xi_i, \\ z^{T}\phi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \\ \xi_i,\ \xi_i^{*} \ge 0, \end{cases} \tag{3}$$

where:
- $i = 1, 2, \ldots, n$, and $n$ is the number of training data;
- $\xi_i$ and $\xi_i^{*}$ are slack variables;
- $C \sum_{i=1}^{n} (\xi_i + \xi_i^{*})$ is the empirical risk;
- $\frac{1}{2} z^{T} z$ is the structural risk, which prevents overlearning and poor generalization;
- $C$ is the regularization parameter.

After a proper regularization parameter $C$, band width $\varepsilon$, and kernel function $K$ are selected, the optimal model parameters can be obtained through the Lagrange function. The commonly used kernels are the linear kernel (4), the polynomial kernel (5), the radial basis function (RBF) or Gaussian kernel (6) [49], and the sigmoid kernel (7) [50]:

$$K(x_i, x_j) = x_i^{T} x_j, \tag{4}$$

$$K(x_i, x_j) = \left(\gamma\, x_i^{T} x_j + r\right)^{d}, \quad \gamma > 0, \tag{5}$$

$$K(x_i, x_j) = \exp\left(-\gamma \left\lVert x_i - x_j \right\rVert^{2}\right), \quad \gamma > 0, \tag{6}$$

$$K(x_i, x_j) = \tanh\left(\gamma\, x_i^{T} x_j + r\right), \tag{7}$$

where $\gamma$, $r$, and $d$ are kernel parameters. The type of kernel function determines which kernel parameters must be set. Both the kernel function and its parameters should be chosen carefully because they affect the regression accuracy; inappropriate parameters may lead to overfitting or underfitting [33]. This study uses the RBF kernel function because it suits most forecasting problems [51]. The RBF kernel is also effective and trains quickly [52].
For the RBF kernel function, there are three important parameters to be determined [53].
(i) Regularization parameter $C$: determines the trade-off between minimizing the training error and minimizing the model complexity.
(ii) Kernel parameter $\gamma$: the parameter of the RBF kernel function.
(iii) Tube size $\varepsilon$ of the $\varepsilon$-insensitive loss function: the approximation accuracy required on the training data points.
These parameters must be set correctly to produce an accurate model. In this study, they are determined by particle swarm optimization (PSO), as explained in Section 3.3.1.
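As a minimal illustration of the $\varepsilon$-insensitive loss and the RBF kernel described above, the NumPy sketch below evaluates both for a few points. It assumes the RBF kernel is parameterized as $\exp(-\gamma\lVert x_i - x_j\rVert^2)$. The function names `eps_insensitive_loss` and `rbf_kernel` are hypothetical and are not taken from the study.

```python
import numpy as np

def eps_insensitive_loss(y, y_pred, eps):
    """Epsilon-insensitive loss: zero inside the tube, |error| - eps outside it."""
    residual = np.abs(np.asarray(y, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - eps)

def rbf_kernel(xi, xj, gamma):
    """RBF kernel exp(-gamma * ||xi - xj||^2); gamma is the kernel parameter."""
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

# An error of 0.005 lies inside a tube of width 0.01; an error of 0.03 does not.
print(eps_insensitive_loss([1.0, 1.0], [1.005, 1.03], eps=0.01))  # approximately [0, 0.02]
print(rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.0625))           # 1.0 for identical points
```

Only errors larger than $\varepsilon$ contribute to the loss, which is why small $\varepsilon$ values make the fitted function track the training data more closely.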

Autoregressive Integrated Moving Average (ARIMA).
The autoregressive integrated moving average (ARIMA) model was introduced by Box and Jenkins and has become one of the most popular forecasting models [17]. ARIMA is a stochastic time series model in which the future value of a variable is a linear function of past observations and random errors:

$$y_t = \theta_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q}, \tag{8}$$

where:
- $y_t$ is the actual value and $\varepsilon_t$ is the random error at time $t$;
- $\phi_i$ ($i = 1, 2, \ldots, p$) and $\theta_j$ ($j = 0, 1, 2, \ldots, q$) are model parameters;
- the integers $p$ and $q$ are referred to as the orders of the model.

The random errors $\varepsilon_t$ are assumed to be independently and identically distributed with a mean of zero and a constant variance $\sigma^2$ [2].
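The sketch below computes a one-step prediction from the ARMA part of the model above, given coefficients that have already been estimated. Because the current error $\varepsilon_t$ is unknown at forecast time, it is replaced by its mean of zero. The sketch assumes the series has already been differenced where necessary. The function name and the coefficient values in the example are hypothetical.

```python
def arma_one_step(y_hist, e_hist, theta0, phi, theta):
    """One-step ARMA prediction:
    y_hat_t = theta0 + sum_i phi_i * y_{t-i} - sum_j theta_j * e_{t-j}.

    y_hist and e_hist hold past observations and past errors ordered from
    oldest to newest; the current error e_t is set to its mean, zero.
    """
    ar_part = sum(p * y_hist[-i] for i, p in enumerate(phi, start=1))
    ma_part = sum(q * e_hist[-j] for j, q in enumerate(theta, start=1))
    return theta0 + ar_part - ma_part

# ARMA(2, 1) with illustrative coefficients
print(arma_one_step([1.0, 2.0], [0.5, -0.5], theta0=0.1, phi=[0.6, 0.2], theta=[0.3]))  # approximately 1.65
```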
An ARIMA model is developed using the Box-Jenkins methodology, which involves the following three iterative steps [2].
(i) Model Identification. If necessary, the data are transformed to produce a stationary time series, because the ARIMA model assumes that the time series is stationary, that is, that its mean and autocorrelation structure are constant over time. For a time series that exhibits trends and heteroscedasticity, differencing and power transformations are needed to make it stationary. Next, the sample autocorrelation function (ACF) and partial autocorrelation function (PACF) are calculated from the data. These are compared with the theoretical ACF and PACF of the various ARIMA models to identify an appropriate model form. The selected model is treated as a tentative model, and steps (ii) and (iii) determine whether it is adequate [54].
(ii) Parameter Estimation. Once the tentative model is identified, its parameters can be estimated using a nonlinear least squares procedure.
(iii) Diagnostic Checking. The last step is to check whether the model is adequate, that is, whether the model assumptions about the errors are met. Several diagnostic statistics and residual plots can be used to check the goodness of fit of the tentative model to the historical data. Useful plots include the histogram, the normal probability plot, and the time sequence plot. Residual autocorrelations should be small, and a chi-square test can be used to test the overall model adequacy. If the model is considered inadequate, a new tentative model is identified and steps (ii) and (iii) are repeated.
Once a satisfactory model is obtained, the three-step process stops and the selected model is used for forecasting. In this study, particle swarm optimization (PSO), as suggested by Asadi et al. [36], is used to estimate the parameters of the ARIMA model, as explained in Section 3.3.2.
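To make the identification step more concrete, the following NumPy sketch applies differencing and computes the sample ACF. It also returns the usual approximate 95% band $\pm 1.96/\sqrt{n}$ for judging which autocorrelations stand out. This is an illustrative assumption about how the step might be coded, not the authors' implementation. The PACF is omitted for brevity, and all function names are hypothetical.

```python
import numpy as np

def difference(y, d=1):
    """d-th order differencing, used to remove trends before fitting ARIMA."""
    return np.diff(np.asarray(y, dtype=float), n=d)

def sample_acf(y, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of a (stationary) series."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()
    denom = np.dot(dev, dev)
    return np.array([np.dot(dev[: len(y) - k], dev[k:]) / denom
                     for k in range(max_lag + 1)])

def acf_band(n):
    """Approximate 95% band; sample autocorrelations beyond it are notable."""
    return 1.96 / np.sqrt(n)

trend = np.arange(10) * 3.0 + 1.0              # linear trend: non-stationary
print(difference(trend))                       # constant series after one difference
print(sample_acf([1, -1, 1, -1, 1, -1], max_lag=2))
print(acf_band(100))
```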

Particle Swarm Optimization (PSO).
Particle swarm optimization (PSO) is a stochastic optimization method introduced by Kennedy and Eberhart [55]. It is inspired by the swarming behavior of bird flocks and fish schools. PSO is a population-based method whose population consists of particles, which are initially generated at random. Each particle has a position and a velocity, and its position represents a potential solution to the problem in a $D$-dimensional space. The position and velocity of the $i$th particle are denoted by $X_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$ and $V_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$, respectively. During the search, each particle's movement is driven by three influences: its previous direction, its own best previous position (pbest), and the best solution found by the entire population (gbest). The velocity and position of each particle are updated as follows [56]:

$$v_{id}(t+1) = w\, v_{id}(t) + c_1\, \mathrm{rand}_1 \left(p_{id}(t) - x_{id}(t)\right) + c_2\, \mathrm{rand}_2 \left(p_{gd}(t) - x_{id}(t)\right), \tag{9}$$

$$x_{id}(t+1) = x_{id}(t) + v_{id}(t+1), \tag{10}$$

where:
- $v_{id}(t)$ and $x_{id}(t)$ are the velocity and position of the $i$th particle at iteration $t$;
- $d = 1, 2, \ldots, D$, and $D$ is the dimension of the search space;
- $w$ is the inertia weight, which balances the global and local search abilities of the particles;
- $\mathrm{rand}_1$ and $\mathrm{rand}_2$ are two uniform random numbers generated independently within the range $[0, 1]$;
- $c_1$ and $c_2$ are two learning factors that control the influence of the cognitive and social components, respectively;
- $p_{id}(t)$ is the best previous position of the $i$th particle at iteration $t$, that is, the position that yielded its best fitness value;
- $p_{gd}(t)$ is the best position found by all particles at iteration $t$.

After a particle's position changes, its fitness value is evaluated. The pbest and gbest values are then updated based on the current positions of the particles. As this process is repeated, the whole population moves toward the optimum solution.
The following are the steps in PSO implementation [57].
Step 1. Initialize the positions and velocities of all the particles randomly in the D-dimensional search space using a uniform probability distribution.
Step 2. Evaluate the fitness values of the particles.
Step 3. Update pbest for each particle; if the current fitness value of the particle is better than its pbest value, set the pbest equal to the current position of the particle.
Step 4. Update gbest; if the current fitness value of the particle is better than the gbest value, then set gbest equal to the current position of the particle.
Step 5. Update the velocity and position of each particle using (9) and (10), respectively.
Step 6. Repeat Steps 2 to 5 until a stopping criterion is met, such as a sufficiently good fitness value or a maximum number of iterations.
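Steps 1 to 6, together with the updates (9) and (10), can be sketched in NumPy as follows. Several details are assumptions added for illustration:
- the linearly decreasing inertia schedule (from 1.4 to 0.4);
- the velocity clamp;
- clipping positions to the search bounds;
- the default values $c_1 = c_2 = 1$.

The function `pso_minimize` is hypothetical.

```python
import numpy as np

def pso_minimize(fitness, lower, upper, n_particles=20, max_iter=100,
                 c1=1.0, c2=1.0, w_start=1.4, w_end=0.4, seed=0):
    """Minimize `fitness` over the box [lower, upper] with a basic global-best PSO."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    v_max = 0.2 * (upper - lower)  # velocity clamp (illustrative assumption)

    # Step 1: uniform random positions and velocities
    x = rng.uniform(lower, upper, size=(n_particles, dim))
    v = rng.uniform(-v_max, v_max, size=(n_particles, dim))

    # Steps 2-4 for the initial swarm: fitness, pbest, gbest
    pbest = x.copy()
    pbest_fit = np.array([fitness(p) for p in x])
    g = int(np.argmin(pbest_fit))
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    for it in range(max_iter):  # Step 6: stop after max_iter iterations
        w = w_start - (w_start - w_end) * it / max_iter  # decreasing inertia weight
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Step 5: velocity update (9) and position update (10)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        v = np.clip(v, -v_max, v_max)
        x = np.clip(x + v, lower, upper)
        # Step 2: evaluate the fitness of the new positions
        fit = np.array([fitness(p) for p in x])
        # Step 3: update each particle's pbest
        improved = fit < pbest_fit
        pbest[improved] = x[improved]
        pbest_fit[improved] = fit[improved]
        # Step 4: update gbest
        g = int(np.argmin(pbest_fit))
        if pbest_fit[g] < gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    return gbest, float(gbest_fit)


def sphere(p):
    """Test function with its minimum of 0 at the origin."""
    return float(np.sum(p ** 2))

best, best_fit = pso_minimize(sphere, [-5.0, -5.0], [5.0, 5.0], max_iter=200)
print(best, best_fit)  # best should lie near (0, 0)
```

A large initial inertia weight favors global exploration. Shrinking it over the iterations shifts the swarm toward local refinement around gbest.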
The explanations on how PSO is used to estimate the parameters of SVR and ARIMA models are given in Sections 3.3.1 and 3.3.2, respectively.

PSO for SVR Parameters Estimation (PSOSVR).
Since three parameters are estimated, the $i$th particle is represented by a three-dimensional vector whose first, second, and third dimensions correspond to $C$, $\gamma$, and $\varepsilon$, respectively. In this study, the fitness is defined by $k$-fold cross-validation with $k = 5$, which proceeds as follows:
1. The training data set is divided into $k$ subsets of equal size.
2. One subset is held out for validation, and the regression function is built with a given parameter set $(C, \gamma, \varepsilon)$ using the remaining $k - 1$ subsets.
3. The performance of the parameter set is measured by the root mean square error (RMSE) on the validation subset.
4. Each subset is used once for validation, so the process is repeated $k$ times.
5. The average validation RMSE over the five trials is used as the fitness.

The RMSE is defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}, \tag{11}$$

where $n$ is the number of validation data, $y_i$ is the actual value, and $\hat{y}_i$ is the predicted value.
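The cross-validation fitness can be sketched as below. The sketch assumes that scikit-learn's `SVR` stands in for the SVR implementation; its RBF kernel is $\exp(-\gamma\lVert x - x'\rVert^2)$. It also assumes the folds are contiguous, unshuffled blocks, which the text does not specify. The function name `svr_cv_fitness` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVR

def svr_cv_fitness(params, X, y, k=5):
    """Mean validation RMSE of an RBF-kernel SVR over k contiguous folds.

    params is a particle position (C, gamma, epsilon); lower is better.
    """
    C, gamma, epsilon = params
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    rmses = []
    for train_idx, val_idx in KFold(n_splits=k).split(X):
        model = SVR(kernel="rbf", C=C, gamma=gamma, epsilon=epsilon)
        model.fit(X[train_idx], y[train_idx])      # train on the k - 1 other subsets
        pred = model.predict(X[val_idx])           # predict the held-out subset
        rmses.append(np.sqrt(np.mean((y[val_idx] - pred) ** 2)))
    return float(np.mean(rmses))                   # average over the k trials

# Synthetic illustration only (not the crime data)
X_demo = np.linspace(0.0, 1.0, 40).reshape(-1, 1)
y_demo = np.sin(2 * np.pi * X_demo).ravel()
print(svr_cv_fitness((60.1924, 0.0625, 0.01), X_demo, y_demo))
```

A PSO routine would evaluate this function at each particle's position, with the particle positions constrained to the search ranges listed in Table 1.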

PSO for ARIMA Parameters Estimation (PSOARIMA).
Asadi et al. [36] proposed a hybrid of PSO and ARIMA for estimating the parameters of the ARIMA model. This method is efficient when only limited historical data are available. Its implementation involves two main steps:
1. An ARIMA model is identified by applying the Box-Jenkins method.
2. PSO is used to estimate the ARIMA parameters.

The data set is divided into training and testing sets. The training set is used to estimate the ARIMA model, while the testing set is used to evaluate the estimation results. In this study, the fitness is defined by the sum of squared errors (SSE):

$$\mathrm{SSE} = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2, \tag{12}$$

where $n$ is the number of training data, $y_i$ is the actual value, and $\hat{y}_i$ is the predicted value.
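For an ARIMA model with only autoregressive terms and no constant, the SSE fitness that PSO minimizes can be computed as below. This is a sketch under that assumption: MA terms would require the past errors to be reconstructed recursively. The name `ar_sse` is hypothetical.

```python
import numpy as np

def ar_sse(coeffs, series):
    """SSE of one-step AR predictions y_hat_t = sum_i coeffs[i-1] * y_{t-i}."""
    y = np.asarray(series, dtype=float)
    p = len(coeffs)
    # y[t - p:t][::-1] lists y_{t-1}, y_{t-2}, ..., y_{t-p}
    preds = np.array([np.dot(coeffs, y[t - p:t][::-1]) for t in range(p, len(y))])
    return float(np.sum((y[p:] - preds) ** 2))

print(ar_sse([0.5], [8.0, 4.0, 2.0, 1.0]))  # 0.0: the series follows y_t = 0.5 y_{t-1}
print(ar_sse([0.0], [8.0, 4.0, 2.0, 1.0]))  # 21.0 = 4^2 + 2^2 + 1^2
```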

The Proposed Model.
The proposed hybrid model consists of a nonlinear model, SVR, and a linear model, ARIMA. According to Zhang [2], it is reasonable to consider a time series as the composition of a linear autocorrelation structure and a nonlinear component:

$$y_t = N_t + L_t, \tag{13}$$

where $N_t$ denotes the nonlinear component and $L_t$ denotes the linear component. These two components are estimated from the data in two steps:
1. The SVR model is used to model the nonlinear component.
2. The residual of the nonlinear model is modeled using the linear model, ARIMA.

Let $e_t$ denote the residual at time $t$:

$$e_t = y_t - \hat{N}_t, \tag{14}$$

where $\hat{N}_t$ is the forecast of the SVR model at time $t$. The residual represents the linear component that cannot be modeled by SVR. The SVR and ARIMA parameters are estimated by PSO, as described previously. The forecasts of the SVR and ARIMA models are then combined to give the forecast of the proposed hybrid model:

$$\hat{y}_t = \hat{N}_t + \hat{L}_t, \tag{15}$$

where $\hat{L}_t$ is the ARIMA forecast of the residual.
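The decomposition and recombination can be sketched as follows. The sketch assumes the SVR's fitted values $\hat{N}_t$ are already available and that the linear part is an AR($p$) model without a constant. For brevity, the AR coefficients are fitted by ordinary least squares, which stands in for the PSO estimation used in this study. `hybrid_fit` is a hypothetical helper.

```python
import numpy as np

def hybrid_fit(y, svr_fitted, p=2):
    """Combine SVR fitted values with an AR(p) model of the SVR residuals.

    Returns the AR coefficients and the hybrid fitted values for t = p, ..., n-1.
    """
    y = np.asarray(y, dtype=float)
    n_hat = np.asarray(svr_fitted, dtype=float)
    e = y - n_hat  # residual left by the nonlinear (SVR) model
    # Lagged design matrix: column i-1 holds e_{t-i} for t = p, ..., n-1
    lags = np.column_stack([e[p - i:len(e) - i] for i in range(1, p + 1)])
    # Ordinary least squares, used here in place of PSO for brevity
    phi, *_ = np.linalg.lstsq(lags, e[p:], rcond=None)
    l_hat = lags @ phi            # estimated linear component L_hat_t
    y_hat = n_hat[p:] + l_hat     # hybrid forecast: N_hat_t + L_hat_t
    return phi, y_hat

# Synthetic example: a "nonlinear" part plus an exact AR(2) residual
e_demo = [1.0, 0.5]
for _ in range(10):
    e_demo.append(0.5 * e_demo[-1] - 0.2 * e_demo[-2])
n_demo = np.sin(np.arange(len(e_demo)))
phi_demo, y_hat_demo = hybrid_fit(n_demo + np.array(e_demo), n_demo, p=2)
print(phi_demo)  # close to [0.5, -0.2]
```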

Data Set and Model Evaluation
This section describes the data set used and the model evaluation carried out in this study.

Model Evaluation.
The forecasting performance of the proposed hybrid model is evaluated on the test data set using four types of evaluation.
For the error measures, $n$ denotes the number of test data, $y_t$ the actual value, and $\hat{y}_t$ the predicted value.
(iv) Hypothesis Test. A hypothesis test is performed to check whether there is a significant difference between the mean of the forecast values and the mean of the actual data. A paired-sample $t$-test is used in this study. Let $\mu_1$ be the mean of the actual data, $\mu_2$ the mean of the forecast values of the forecasting model, and $\mu_d = \mu_1 - \mu_2$ the difference of the means. The null and alternative hypotheses are

$$H_0: \mu_d = 0, \qquad H_1: \mu_d \neq 0.$$

The test statistic, which follows a $t$ distribution with $n - 1$ degrees of freedom, is

$$t = \frac{\bar{d}}{s_d / \sqrt{n}}, \tag{21}$$

where $\bar{d}$ is the sample mean difference, $s_d$ is the sample standard deviation of the differences, and $n$ is the sample size. Suppose the test fails to reject the null hypothesis. Then there is no significant evidence that the mean of the forecast values differs from the mean of the actual values. This indicates that the model is appropriate for forecasting, since its forecasts are consistent with the real situation.
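The following pure-Python sketch computes the error measures named in this study and the paired $t$ statistic (21). The exact MAPE and MAD formulas are not shown in this section, so the sketch assumes the common definitions: MAPE is the mean absolute percentage error, expressed in percent, and MAD is the mean absolute deviation of the forecast errors. Only the $t$ statistic and its degrees of freedom are returned. A $p$-value would require a $t$ distribution table or a statistics library. Function names are hypothetical.

```python
import math
import statistics

def evaluate(actual, forecast):
    """RMSE, MSE, MAPE (in percent) and MAD for paired actual/forecast values."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "RMSE": math.sqrt(mse),
        "MSE": mse,
        "MAPE": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        "MAD": sum(abs(e) for e in errors) / n,
    }

def paired_t(actual, forecast):
    """Paired t statistic d_bar / (s_d / sqrt(n)) and its n - 1 degrees of freedom."""
    d = [a - f for a, f in zip(actual, forecast)]
    n = len(d)
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n)), n - 1

print(evaluate([100, 200, 300, 400], [110, 190, 310, 390]))
print(paired_t([11, 12, 13], [10, 10, 10]))  # t = 2 * sqrt(3), with 2 degrees of freedom
```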

Determination of Model Parameters
In the PSO algorithm, the population size is set to 5, the maximum number of iterations is set to 50, and the values of $c_1$ and $c_2$ are set to 1. The inertia weight $w$ is initially set to 1.4 and is decreased over the iterations according to [58]. The decrease is a function of Maxiter, the maximum number of iterations, and iter, the current iteration. The range values of the SVR parameters $(C, \gamma, \varepsilon)$ used in this study are shown in Table 1, while the search range for the ARIMA parameters is between −100 and 100.

Table 1: Range values of the SVR parameters.
Parameters | Range values
$C$ | $2^{-1}$ to $2^{7}$
$\gamma$ | $2^{-4}$ to $2^{2}$
$\varepsilon$ | 0.01 to 0.05
The optimal SVR parameters obtained by PSO are $(C, \gamma, \varepsilon) = (60.1924, 0.0625, 0.01)$. An SVR model was built with these parameters and used to forecast the property crime rates for both the training and test data. The residuals, that is, the differences between the actual and predicted values, were then calculated, and the residuals of the training data were used to build the ARIMA model. Based on the ACF and PACF, the appropriate model for the residuals is ARIMA(2,0,0):

$$\hat{e}_t = \theta_0 + \phi_1 e_{t-1} + \phi_2 e_{t-2},$$

where $\hat{e}_t$ is the forecast for period $t$, $e_{t-1}$ and $e_{t-2}$ are the residuals at time lags $t - 1$ and $t - 2$, and $\theta_0$, $\phi_1$, and $\phi_2$ are the coefficients to be estimated. Since the residuals vary around zero, the constant $\theta_0$ is not required, so only $\phi_1$ and $\phi_2$ are estimated. PSO was applied to estimate these coefficients. The $i$th particle is represented by the two-dimensional vectors $X_i = (x_{i1}, x_{i2})$ and $V_i = (v_{i1}, v_{i2})$, whose first and second dimensions correspond to $\phi_1$ and $\phi_2$, respectively. The ARIMA model with the PSO-estimated coefficients was then used to forecast the residuals of the SVR model. Finally, the forecasts of the SVR and ARIMA models were combined to give the forecasts of the proposed hybrid model.

Table 2 and Figure 2 compare the actual and forecast property crime rates for the test data set. PSOSVR-PSOARIMA, PSOSVR, and PSOSVR-ARIMA produce forecasts closer to the actual values and follow a pattern similar to the actual data. In contrast, ARIMA and PSOARIMA show unsatisfactory forecasting performance, with predicted values slightly higher than the actual values. Figure 3 shows box plots of the forecast errors for the test data set. The box plots show one outlier in the errors of the PSOSVR model and an extreme error value for each of the ARIMA and PSOARIMA models.
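The PSO estimation of $\phi_1$ and $\phi_2$ described above might look like the NumPy sketch below. The defaults follow the reported settings: 5 particles, 50 iterations, a search range of −100 to 100, and an initial inertia weight of 1.4. The following details are assumptions:
- $c_1 = c_2 = 1$;
- the linear decay of the inertia weight to 0.4;
- zero initial velocities;
- the velocity clamp.

The names `ar2_sse` and `pso_ar2` and the simulated residual series are hypothetical.

```python
import numpy as np

def ar2_sse(phi, e):
    """SSE of the no-constant AR(2) fit e_hat_t = phi1 * e_{t-1} + phi2 * e_{t-2}."""
    pred = phi[0] * e[1:-1] + phi[1] * e[:-2]
    return float(np.sum((e[2:] - pred) ** 2))

def pso_ar2(e, n_particles=5, max_iter=50, bound=100.0, c1=1.0, c2=1.0,
            w_start=1.4, w_end=0.4, seed=0):
    """Estimate (phi1, phi2) by PSO; each particle position is (phi1, phi2)."""
    rng = np.random.default_rng(seed)
    e = np.asarray(e, dtype=float)
    v_max = 0.2 * bound                                # velocity clamp (assumption)
    x = rng.uniform(-bound, bound, (n_particles, 2))   # positions in the search range
    v = np.zeros((n_particles, 2))                     # zero initial velocities (assumption)
    pbest = x.copy()
    pbest_fit = np.array([ar2_sse(p, e) for p in x])
    gbest = pbest[np.argmin(pbest_fit)].copy()
    for it in range(max_iter):
        w = w_start - (w_start - w_end) * it / max_iter   # decreasing inertia weight
        r1, r2 = rng.random((2, n_particles, 2))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        v = np.clip(v, -v_max, v_max)
        x = np.clip(x + v, -bound, bound)
        fit = np.array([ar2_sse(p, e) for p in x])
        better = fit < pbest_fit                           # update pbest
        pbest[better] = x[better]
        pbest_fit[better] = fit[better]
        gbest = pbest[np.argmin(pbest_fit)].copy()         # update gbest
    return gbest, float(pbest_fit.min())

# Simulated residual series (illustration only)
rng_demo = np.random.default_rng(1)
e_demo = np.zeros(200)
for t in range(2, 200):
    e_demo[t] = 0.5 * e_demo[t - 1] - 0.3 * e_demo[t - 2] + rng_demo.normal()
print(pso_ar2(e_demo, n_particles=30, max_iter=300, bound=5.0))
```

With only five particles and 50 iterations, the swarm may stop short of the least-squares optimum, so a larger swarm or more iterations can be tried when computation allows.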
The hybrid models, PSOSVR-PSOARIMA and PSOSVR-ARIMA, show better forecasting performance than the individual models, with no outliers or extreme error values. The proposed hybrid model has the smallest forecast errors, with a median closer to zero than those of the other forecasting models.

Table 3 compares the RMSE, MSE, MAPE, and MAD of the proposed hybrid model with those of the individual models (ARIMA, PSOARIMA, and PSOSVR) and of the hybrid model PSOSVR-ARIMA. The linear models, ARIMA and PSOARIMA, perform poorly, with relatively large errors compared with the nonlinear model, PSOSVR, and the hybrid models. The nonlinear model PSOSVR, on the other hand, performs comparably to the hybrid models, with only slight differences in error values. The proposed hybrid model has smaller errors than both the individual models and PSOSVR-ARIMA. Using PSO to estimate the ARIMA parameters therefore improves the hybrid of SVR and ARIMA relative to PSOSVR-ARIMA, which suggests that PSOARIMA is suitable for small data sets.

Table 4 shows the results of the paired-sample t-test, which is used to compare the actual data with the forecasting models. Therefore, it can be inferred that PSOSVR-PSOARIMA gives better forecasting performance than the other forecasting models. Based on the model evaluation carried out, we conclude that the individual models, ARIMA, PSOARIMA, and PSOSVR, are not sufficient to model the property crime rates. The hybrid models are more suitable for forecasting property crime rates, and the proposed hybrid model shows the best forecasting performance. These results suggest that property crime rates contain both linear and nonlinear components.
Therefore, linear or nonlinear models used individually are not sufficient for modeling property crime rates. Moreover, optimal parameters are essential for the accuracy of SVR models; PSO facilitated the search for the optimal SVR parameters and thus produced more accurate models. The results of this study also demonstrate that using PSO to estimate the parameters of the ARIMA model improved its accuracy for small data sets.

Conclusions
This paper proposes a time series model for crime rate forecasting. The proposed model is a hybrid that combines a nonlinear model, SVR, with a linear model, ARIMA, and was used to predict property crime rates. PSO is used to estimate the parameters of the SVR and ARIMA models. Three economic indicators are used as inputs: gross domestic product, the unemployment rate, and the consumer price index. The experimental results show that the proposed model, PSOSVR-PSOARIMA, produces smaller errors than the individual models and the hybrid model PSOSVR-ARIMA. In conclusion, the proposed hybrid model is an acceptable model for crime rate forecasting.