Financial Time Series Forecasting Using Directed-Weighted Chunking SVMs

Support vector machines (SVMs) are a promising alternative to traditional regression estimation approaches. However, on massive-scale data sets they suffer from several problems, such as long training times and excessive memory requirements, which make the standard SVM algorithm poorly suited to large financial time series data sets. To address these problems, a directed-weighted chunking SVMs algorithm is proposed. In this algorithm, the whole training data set is split into several chunks, and the support vectors are obtained on each subset. The weighted support vector regression is then calculated on the new working data set formed by these support vectors to obtain the forecast model. Our directed-weighted chunking algorithm provides a new method of decomposing and combining support vectors according to the importance of the chunks, which can improve the operation speed without reducing prediction accuracy. Finally, IBM daily stock closing price data are used to verify the validity of the proposed algorithm.


Introduction
Financial time series forecasting is an important aspect of financial decision-making. Financial practitioners and academic researchers have proposed many methods and techniques to improve the accuracy of predictions. Because financial time series are inherently noisy, nonstationary, and deterministically chaotic [1], their forecasting is regarded as one of the most challenging applications of modern time series forecasting.
Among artificial intelligence algorithms, SVMs are an elegant tool for solving pattern recognition and regression problems. According to Vapnik [11], SVMs implement the structural risk minimization principle, which seeks to minimize an upper bound of the generalization error rather than the training error. The regression form of SVMs, called support vector regression (SVR), has also received increasing attention for solving linear and nonlinear estimation problems. For instance, Tay and Cao [9] studied five real futures contracts on the Chicago Mercantile Market, Cao and Tay [12] studied the S&P 500 daily price index, and Kim [13] studied the daily Korea composite stock price index (KOSPI). Based on the normalized mean square error (NMSE), mean absolute error (MAE), directional symmetry (DS), and weighted directional symmetry (WDS), these studies indicate that SVMs perform better than ARMA, GARCH (ARCH), and ANN models.
According to statistical learning theory (SLT), support vector machine regression is a convex quadratic programming (QP) optimization problem with linear constraints. However, SVMs have an obvious disadvantage: the training time scales somewhere between quadratically and cubically with the number of training samples. To deal with massive data sets and improve training speed, many improved support vector machine methods have been proposed. One approach combines SVMs with other methods, such as active learning [14,15], multitask learning [16,17], multiview learning [18], and semisupervised learning [19,20]. Another develops optimization techniques for SVM training algorithms, such as sequential updating methods, like the kernel-Adatron algorithm [21] and the successive over-relaxation algorithm [22], and working set methods, like chunking SVMs [23,24], the reduced support vector machine (RSVM) [25], and the sequential minimal optimization algorithm (SMO) [26].
Chunking SVMs, the reduced support vector machine (RSVM), and SVMs with sequential minimal optimization (SMO) are effective methods for dealing with massive data sets. For example, Lee and Mangasarian [25] designed the RSVM algorithm, which greatly reduces the size of the quadratic program to be solved by reducing the data set volume, so its memory usage is much smaller than that of a conventional SVM using the entire data set. Osuna et al. [27] designed a decomposition algorithm that is guaranteed to solve the QP problem and makes no assumptions about the expected number of support vectors. Platt [26] proposed SMO, which breaks the large QP problem into a series of the smallest possible QP problems, each of which is analytically solvable, in order to speed up training. Tay and Cao [28] proposed combining SVMs with a self-organizing feature map (SOM) for financial time series forecasting, where SOM is used as a clustering algorithm to partition the whole input space into several disjoint regions. Tay and Cao [29] also proposed C-ascending support vector machines to amend the ε-insensitive errors in nonstationary financial time series.
Most improved support vector machine methods reduce memory requirements and CPU time, but their forecast accuracy declines to some extent. For financial time series prediction, a typical massive-scale data problem, prediction accuracy deserves close attention while computational complexity is reduced. In this paper, we propose a directed-weighted chunking SVMs algorithm, which can improve operation speed without reducing prediction accuracy.
This paper consists of five sections. Section 2 introduces the basic SVM algorithm. Section 3 proposes a directed-weighted chunking SVMs algorithm. Section 4 describes a series of experiments and summarizes and discusses the empirical results. Section 5 presents the conclusions and limitations of this study.

SVMs Regression Theory
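In this section, we briefly introduce support vector regression (SVR) theory. Suppose we are given a set of data points G = {(x_i, d_i)}_{i=1}^{n}, where x_i is the input vector and d_i is the desired value. SVMs approximate the function in the following form:

$$f(x) = \sum_{i=1}^{l} w_i \phi_i(x) + b, \tag{1}$$

where {φ_i(x)}_{i=1}^{l} are the features of the inputs and {w_i}_{i=1}^{l} and b are coefficients.
According to the structural risk minimization principle, the support vector regression problem can be expressed as

$$\text{Minimize: } R(w, \xi, \xi^{*}) = \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \left(\xi_i + \xi_i^{*}\right), \tag{2}$$

where x_i is mapped to a higher-dimensional space by the function φ, and ξ_i and ξ_i^* are slack variables (ξ_i is the upper training error; ξ_i^* is the lower) measured relative to the ε-insensitive tube |d_i − (w · φ(x_i) + b)| ≤ ε. The parameters that control the regression quality are the cost of error C, the width of the tube ε, and the mapping function φ.
Thus, problem (2) takes the following explicit form:

$$\begin{aligned} \min_{w, b, \xi, \xi^{*}} \quad & \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \left(\xi_i + \xi_i^{*}\right) \\ \text{subject to} \quad & d_i - w \cdot \phi(x_i) - b \le \varepsilon + \xi_i, \\ & w \cdot \phi(x_i) + b - d_i \le \varepsilon + \xi_i^{*}, \\ & \xi_i, \xi_i^{*} \ge 0, \quad i = 1, \dots, n. \end{aligned} \tag{3}$$

Then, we obtain the following form by maximizing the dual of (3):

$$\max_{\alpha, \alpha^{*}} \; \sum_{i=1}^{n} d_i \left(\alpha_i - \alpha_i^{*}\right) - \varepsilon \sum_{i=1}^{n} \left(\alpha_i + \alpha_i^{*}\right) - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \left(\alpha_i - \alpha_i^{*}\right)\left(\alpha_j - \alpha_j^{*}\right) K(x_i, x_j), \tag{4}$$

with the following constraints:

$$\sum_{i=1}^{n} \left(\alpha_i - \alpha_i^{*}\right) = 0, \qquad 0 \le \alpha_i, \alpha_i^{*} \le C, \quad i = 1, \dots, n,$$

where α_i and α_i^* are the Lagrange multipliers introduced and K(x, x_i) is called the kernel function. Its value equals the inner product of the two vectors x and x_i in the feature space, φ(x) · φ(x_i), and the resulting regression function is f(x) = Σ_{i=1}^{n} (α_i − α_i^*) K(x, x_i) + b. There are many choices of kernel function; common examples are the polynomial kernel K(x, y) = (x · y + 1)^d and the Gaussian kernel K(x, y) = exp(−‖x − y‖²/δ²). Training SVMs is equivalent to optimizing the Lagrange multipliers α_i, α_i^* subject to the constraints in (4). A good fitting function can be obtained by choosing an appropriate function space.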
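Once the multipliers are known, evaluating the model is straightforward. The short Python sketch below illustrates three quantities: the ε-insensitive loss that defines the tube, the Gaussian kernel, and the regression function f(x) = Σ(α_i − α_i^*)K(x, x_i) + b. It assumes that the Gaussian "width" plays the role of δ² in exp(−‖x − y‖²/δ²). The function names and multiplier values are hypothetical. The sketch shows how a trained model is evaluated, not how the QP itself is solved.

```python
import math
from typing import Sequence

def gaussian_kernel(x: Sequence[float], y: Sequence[float], width: float) -> float:
    """K(x, y) = exp(-||x - y||^2 / width); `width` is assumed to act as delta^2."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / width)

def eps_insensitive_loss(d: float, y: float, eps: float) -> float:
    """Zero inside the tube |d - y| <= eps, growing linearly outside it."""
    return max(0.0, abs(d - y) - eps)

def svr_predict(x, support_vectors, alpha, alpha_star, b, width):
    """f(x) = sum_i (alpha_i - alpha_i^*) K(x, x_i) + b over the support vectors."""
    return b + sum(
        (a - a_s) * gaussian_kernel(x, sv, width)
        for sv, a, a_s in zip(support_vectors, alpha, alpha_star)
    )

# Hypothetical multipliers: sum(alpha - alpha_star) = 0 and 0 <= alpha, alpha_star <= C
# hold for any C >= 0.5 (e.g. C = 8.5).
svs = [[0.0], [1.0]]
alpha, alpha_star, b = [0.5, 0.0], [0.0, 0.5], 0.1
print(svr_predict([0.0], svs, alpha, alpha_star, b, width=0.02))
```

With a width this small, each support vector influences only points very close to it, so f(0) is essentially b + 0.5.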
Many studies use straightforward approaches to construct and implement SVMs for financial time series analysis (see, e.g., [9,13,30]). Other methods, such as chunking SVMs, the reduced support vector machine (RSVM), and SVMs with sequential minimal optimization (SMO), are used to deal with massive-scale data sets. Among these methods, SVM chunking is an alternative to running a standard SVM on the whole data set: the training data are broken up, and the SVM is run on smaller chunks of data. Many decomposition and combination methods have been proposed in the previous literature.
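A rough, idealized estimate shows why chunking shortens training. It assumes that training on n samples costs time proportional to n^p, with 2 ≤ p ≤ 3 as quoted in the introduction, and that the data are split into k equal chunks. The cost of the final training on the combined support vectors is ignored:

$$T_{\text{full}} \propto n^{p}, \qquad T_{\text{chunks}} \propto k \left(\frac{n}{k}\right)^{p} = \frac{n^{p}}{k^{p-1}}.$$

For example, with k = 10 chunks and cubic scaling (p = 3), the chunked stage is about 10² = 100 times cheaper than training on the whole set at once. The cost of the final weighted regression must then be added back.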
For financial time series prediction problems, data from different periods have different effects on current forecasts, and the direction of change of past stock prices also affects current forecasts. Therefore, we propose directed-weighted chunking SVMs, which can improve operation speed without reducing prediction accuracy. In the training stage of directed-weighted chunking SVMs, the whole training set is decomposed into several chunks, and the support vectors are calculated separately on each working subset. In the prediction stage, all these support vectors are combined into a new working data set according to their importance to obtain the model, as illustrated in Figure 1. The procedure of the directed-weighted chunking SVMs algorithm can be described as follows.
Step 1. Decompose the whole training data set into k chunks G_1, ..., G_k according to the time interval Δt.
Step 2. Calculate the support vector regression on each subset G_i and obtain its support vectors SV_{G_i}.
Step 3. Calculate the weight and direction of each subset.
Step 4. Combine the support vectors SV_{G_i} of all subsets into a new working data set SV.
Step 5. Calculate the weighted support vector regression on the new working data set SV and obtain the model.
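Steps 1 to 4 can be sketched in Python as a data flow. This is a minimal illustration under stated assumptions, not the authors' R implementation. The per-chunk SVR solver of Step 2 is passed in as a function. The demo substitutes a hypothetical stand-in, `tube_outliers`, which is not an SVR. The chunk weights are hypothetical values standing in for ρ(G_i, Δt).

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]      # (input x, target d); 1-D input for brevity
Weighted = Tuple[Point, float]   # (support vector, weight of its chunk)

def split_into_chunks(data: Sequence[Point], chunk_size: int) -> List[List[Point]]:
    """Step 1: consecutive, non-overlapping chunks; the last one may be shorter."""
    return [list(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]

def build_working_set(
    data: Sequence[Point],
    chunk_size: int,
    fit_support_vectors: Callable[[List[Point]], List[Point]],
    chunk_weight: Callable[[int], float],
) -> List[Weighted]:
    """Steps 1-4: collect every chunk's support vectors, tagged with the chunk weight."""
    working_set: List[Weighted] = []
    for i, chunk in enumerate(split_into_chunks(data, chunk_size)):
        weight = chunk_weight(i)                  # Step 3: signed weight of chunk i
        for sv in fit_support_vectors(chunk):     # Step 2: per-chunk support vectors
            working_set.append((sv, weight))      # Step 4: merge into the new set
    return working_set

def tube_outliers(chunk: List[Point], eps: float = 0.5) -> List[Point]:
    """Hypothetical stand-in for a real SVR solver (NOT an SVR): keeps the
    points lying outside an eps-tube around the chunk's mean target."""
    mean_d = sum(d for _, d in chunk) / len(chunk)
    return [p for p in chunk if abs(p[1] - mean_d) > eps]

data = [(float(t), float(t % 3)) for t in range(7)]
weights = [0.2, 0.9, -0.1]   # hypothetical rho(G_i, dt) values, one per chunk
working_set = build_working_set(data, 3, tube_outliers, lambda i: weights[i])
print(working_set)
# -> [((0.0, 0.0), 0.2), ((2.0, 2.0), 0.2), ((3.0, 0.0), 0.9), ((5.0, 2.0), 0.9)]
```

Step 5 would pass the weighted working set to a solver that accepts per-sample weights. For example, scikit-learn's `SVR.fit(X, y, sample_weight=...)` rescales C for each sample. The negative weights that the text allows fall outside the usual nonnegative-weight setting of such solvers.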

Directed-Weighted Chunking SVMs.
In time series forecasting, for example in stock markets, past stock prices affect future stock prices to different degrees. Usually, the more recent the period, the greater its weight coefficient. The whole stock price series is divided into chunks of a fixed time interval, and the support vectors in each chunk are calculated separately. The weighted support vector regression is then calculated to obtain the forecast model.
The stock market is a complex system [31,32]. We can treat chunks as nodes and the relationships between chunks as edges, so that these nodes and edges form a complex network. The entire stock price time series can then be regarded as a directed-weighted network with a large number of nodes and edges. The mutual influence between chunks has a direction. For example, for chunks G1 and G2, if there is a nonzero correlation coefficient from chunk G1 to G2, we draw a directed edge from G1 to G2. Because the strength of the mutual influence differs between chunks, the edge weights, called correlation intensities, also differ. Plain chunking SVMs therefore cannot reflect the influence of each chunk on the final model, which is why we introduce a directed-weighted chunking algorithm into SVMs.
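The network view can be sketched as an edge list built from a matrix of correlation intensities. The rule used here is an assumption: edges point forward in time, from an earlier chunk i to a later chunk j. Any |ρ| above a small tolerance counts as nonzero, and the signed value is kept as the edge weight. The matrix values and the function name `directed_edges` are hypothetical.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (source chunk, target chunk, correlation intensity)

def directed_edges(rho: List[List[float]], tol: float = 1e-12) -> List[Edge]:
    """Draw an edge i -> j for every earlier chunk i and later chunk j whose
    correlation intensity is nonzero (|rho| > tol); the signed value is the weight.

    Assumption (not from the text): edges point forward in time, i < j.
    """
    edges: List[Edge] = []
    for i in range(len(rho)):
        for j in range(i + 1, len(rho)):
            if abs(rho[i][j]) > tol:
                edges.append((i, j, rho[i][j]))
    return edges

# Toy 3-chunk correlation matrix (illustrative values only).
rho = [[1.0, 0.4, 0.0],
       [0.4, 1.0, -0.3],
       [0.0, -0.3, 1.0]]
print(directed_edges(rho))  # [(0, 1, 0.4), (1, 2, -0.3)]
```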
In the traditional SVM regression optimization problem, the parameter C is a constant. To reflect the different influences of the chunks on the prediction results, we introduce a function ρ(G_i, Δt) and modify the SVM regression function accordingly, giving (6). In (6), ρ(G_i, Δt) can be understood as the weight of each chunk, and its values can be positive or negative. Positive or negative values of ρ(G_i, Δt) change the parameter C in SVMs, much as positive or negative information affects future stock prices. Following the definition of the network correlation coefficient introduced by Bonanno et al. [33], we define the correlation intensity between chunks G_i and G_j over the time interval Δt as

$$\rho_{ij}(\Delta t) = \frac{\langle r_i r_j \rangle - \langle r_i \rangle \langle r_j \rangle}{\sqrt{\left(\langle r_i^{2} \rangle - \langle r_i \rangle^{2}\right)\left(\langle r_j^{2} \rangle - \langle r_j \rangle^{2}\right)}}, \qquad r_i = r(G_i, t), \tag{7}$$

and write ρ(G_i, Δt) for the resulting influence of chunk G_i over the time interval Δt. Here ⟨·⟩ is a temporal average performed over the investigated time period, t represents time, and Δt represents a time interval. r(G_i, t) is the stock return over the time interval Δt, given by r(G_i, t) = ln P(G_i, t) − ln P(G_i, t − Δt), the logarithmic difference between the current stock price and the stock price one interval Δt earlier.
Given the data points and the time interval Δt, the original data can be decomposed into several chunks. The correlation intensity is bounded: −1 ≤ ρ(G_i, Δt) ≤ 1. If ρ(G_i, Δt) is positive, the influence of chunk G_i over the time interval Δt is positive, and vice versa. We calculate all the correlation intensities ρ(G_i, Δt) according to (7) and obtain a k × k matrix of correlation intensity. The solution of the resulting equation is the overall optimal solution of the original optimization problem.
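The correlation intensity matrix can be computed with a minimal Python sketch of log returns and the Bonanno-style coefficient. Several choices here are assumptions rather than statements from the text:
- Returns are taken with a lag of one sample inside each chunk.
- The "temporal average" is taken by aligning two chunks position by position, which requires equal-length chunks.
- The toy prices are illustrative, not IBM data.

```python
import math
from typing import List, Sequence

def log_returns(prices: Sequence[float], lag: int = 1) -> List[float]:
    """r(t) = ln P(t) - ln P(t - lag), with the lag counted in samples."""
    return [math.log(prices[t]) - math.log(prices[t - lag])
            for t in range(lag, len(prices))]

def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)

def correlation(ri: Sequence[float], rj: Sequence[float]) -> float:
    """(<ri rj> - <ri><rj>) / sqrt((<ri^2> - <ri>^2)(<rj^2> - <rj>^2))."""
    if len(ri) != len(rj):
        raise ValueError("return series must be aligned and of equal length")
    mi, mj = _mean(ri), _mean(rj)
    cov = _mean([a * b for a, b in zip(ri, rj)]) - mi * mj
    var_i = _mean([a * a for a in ri]) - mi * mi
    var_j = _mean([b * b for b in rj]) - mj * mj
    if var_i <= 0.0 or var_j <= 0.0:
        raise ValueError("constant return series: correlation undefined")
    return cov / math.sqrt(var_i * var_j)

def correlation_matrix(chunks: Sequence[Sequence[float]]) -> List[List[float]]:
    """k x k matrix of correlation intensities between price chunks."""
    returns = [log_returns(c) for c in chunks]
    return [[correlation(ri, rj) for rj in returns] for ri in returns]

# Toy price chunks (illustrative only): b moves with a, c moves against a.
chunks = [[1.0, 2.0, 1.0, 2.0], [3.0, 6.0, 3.0, 6.0], [2.0, 1.0, 2.0, 1.0]]
rho = correlation_matrix(chunks)
print([[round(v, 6) for v in row] for row in rho])
```

Every entry lies in [−1, 1], matching the bound stated above.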

Experiments
To make a fair and thorough comparison between directed-weighted chunking SVMs and ordinary SVMs, IBM daily stock closing prices are selected, as shown in Figure 2. The data cover the period from December 31, 1999, to December 31, 2013, a total of 21132 data points.
Data points from December 31, 1999, to December 31, 2007 (12075 data points) are used for training, and data points from January 1, 2008, to December 31, 2013 (9057 data points) are used for testing. We decomposed the training data into 1208 chunks using a time interval of 10 days and calculated the correlation intensity ρ(G_i, Δt) according to (7). Finally, we obtained a 1208 × 1208 matrix of correlation intensity.
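The chunk count can be checked with simple arithmetic. This assumes consecutive chunks of 10 data points, with the final 5 points kept as one shorter chunk:

$$12075 = 1207 \times 10 + 5 \;\Longrightarrow\; \left\lceil \frac{12075}{10} \right\rceil = 1208 \text{ chunks}, \qquad 1208 \times 1208 = 1\,459\,264 \text{ matrix entries}.$$

Under this assumption, 1207 chunks contain 10 points each and the last contains 5.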

Forecast Accuracy Assessment of Directed-Weighted Chunk SVMs.
The prediction performance is evaluated with three statistical metrics: the normalized mean squared error (NMSE), the mean absolute error (MAE), and the directional symmetry (DS). These criteria are calculated as in (9). NMSE and MAE measure the deviation between the actual and predicted values, so smaller values indicate more accurate predictions. DS measures the percentage of periods in which the direction of change is predicted correctly, so larger values are better. A detailed description of performance metrics in financial forecasting can be found in Abecasis [34]. The directed-weighted chunking SVMs program is implemented in R. In this paper, the Gaussian function is used as the kernel function of the SVMs. In our experiments, a Gaussian kernel width of 0.02 produced the best results. C and ε are arbitrarily chosen to be 8.5 and 10^{-3}, respectively.
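The three criteria can be written as a short Python sketch. Since (9) is not reproduced here, these follow definitions commonly used in the SVM forecasting literature, and the details are assumptions. NMSE divides the squared error by n times the sample variance of the actual series. DS is averaged over the n − 1 consecutive pairs, and a zero product of changes counts as a correct direction. The toy series are illustrative.

```python
from typing import Sequence

def nmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Squared error normalised by n times the sample variance of the actual series."""
    n = len(actual)
    mean_a = sum(actual) / n
    var = sum((a - mean_a) ** 2 for a in actual) / (n - 1)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / (var * n)

def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute deviation between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def ds(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Percentage of consecutive steps whose actual and predicted changes do not
    have opposite signs (a product >= 0 counts as a correct direction)."""
    steps = range(1, len(actual))
    hits = sum(
        1 for i in steps
        if (actual[i] - actual[i - 1]) * (predicted[i] - predicted[i - 1]) >= 0
    )
    return 100.0 * hits / len(steps)

actual = [1.0, 2.0, 3.0, 2.0]
predicted = [1.0, 2.5, 2.0, 2.5]
print(nmse(actual, predicted), mae(actual, predicted), ds(actual, predicted))
```

On the toy series, only the first step moves in the same direction in both series, so DS is one third of 100.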
We train the SVMs on the training data set (December 31, 1999, to December 31, 2007) to obtain the trained model. We then apply the trained model to the test data set (January 1, 2008, to December 31, 2013) to obtain the predictions. To compare the algorithms, the real values, the values predicted by ordinary SVMs, and the values predicted by directed-weighted chunking SVMs are plotted in Figure 3.
Figure 3 shows that both forecasting methods track the actual prices closely, but it is hard to tell which one is better. We therefore calculated the performance criteria for each method, as shown in Table 1. For directed-weighted chunking SVMs, the NMSE, MAE, and DS are 0.3760, 0.1325, and 38.29 on the training set and 1.0121, 0.2846, and 43.78 on the test set. The NMSE and MAE values are much smaller than those of ordinary SVMs, which indicates a smaller deviation between the actual and predicted values with directed-weighted chunking SVMs.

Calculation Performance of Directed-Weighted Chunk SVMs.
As is well known, the performance of an SVM depends on its parameters, and it is difficult to choose suitable parameters for different problems. The chunking algorithm reuses Hessian matrix elements from one chunk to the next, which can improve performance sharply. The computational performance of all algorithms is measured on an unloaded AMD E-350 1.6 GHz processor running Windows 7 and R 3.0.1, using the same IBM daily stock closing price data set. The results of the experiments are shown in Table 2.
The primary purpose of these experiments is to examine the differences in training time between the two methods. An overall comparison of the SVM methods is given in Table 2. Compared with traditional SVMs, directed-weighted chunk SVMs improve accuracy and reduce run times sharply. Additionally, the directed-weighted chunk SVMs method allows users to add machines to make training even faster. According to the NMSE criterion, the minimum NMSE value, 0.3173, is obtained with 2350 chunks. Thus 2350 is the best number of chunks, giving the best prediction performance.

Analysis of Optimal Number of Chunks in Directed-Weighted Chunk SVMs.
As Figure 4 shows, NMSE declines rapidly as the number of chunks increases. After NMSE reaches a certain value, however, it increases again. This upward trend is small, which indicates that the directed-weighted chunking SVM is not a fundamental transformation of the SVM but a limited improvement. From the perspective of processing massive-scale data, however, this improvement is very important.

Conclusions
In this paper, we proposed a new chunking algorithm for SVM regression, which combines the support vectors according to their importance. The proposed algorithm can improve computational speed without reducing prediction accuracy.
In our directed-weighted chunking SVMs, the chunking criterion Δt is a constant, but in practice Δt could be variable or some form of function. In addition, further studies of different kernel functions and more suitable parameters C and ε could improve the performance of directed-weighted chunking SVMs.

Figure 3: The predicted values of IBM daily stock closing prices in the test data set (January 1, 2008, to December 31, 2013).

Figure 4: NMSE values for different numbers of chunks (IBM daily stock closing price data).
3.1. Chunking Model in Support Vector Regression. Training SVMs is equivalent to solving a linearly constrained QP problem, so it depends on QP optimization techniques. Standard QP techniques cannot be applied directly to SVM problems with massive-scale data sets.

Table 1: Results of the accuracy performance criteria.

Table 2: Performance comparison between traditional SVMs and directed-weighted chunk SVMs.