Stock index prediction has been regarded as a difficult task over the past decade. To predict stock indexes accurately, this paper proposes a novel prediction method based on the S-system model. Restricted gene expression programming (RGEP) is proposed to encode and optimize the structure of the S-system. A hybrid intelligent algorithm based on brain storm optimization (BSO) and particle swarm optimization (PSO) is proposed to optimize the parameters of the S-system model. Five real stock market indexes, namely the Dow Jones Index, Hang Seng Index, NASDAQ Index, Shanghai Stock Exchange Composite Index, and SZSE Component Index, are collected to validate the performance of the proposed method. Experimental results reveal that our method performs better than the deep recurrent neural network (DRNN), flexible neural tree (FNT), radial basis function neural network (RBFNN), backpropagation neural network (BPNN), and ARIMA on 1-week-ahead and 1-month-ahead stock prediction problems. The proposed hybrid intelligent algorithm also converges faster than PSO and BSO.
The stock market plays a leading and crucial role in the market mechanism, connecting savers and investors [
Many machine learning (ML) methods, including statistical models, artificial neural networks (ANNs), and hybrid prediction models, have been proposed to model and predict stock indexes. As a classical statistical model, the ARIMA model has been applied to predict the New York Stock Exchange (NYSE) and Nigeria Stock Exchange (NSE), and the results revealed that the ARIMA model performed better for short-term prediction [
In the past decades, many ANN models have been employed to solve real problems, especially stock market price forecasting [
However, the existing methods mainly train a black-box model with the training samples. The model changes its internal structure and parameters to approximate the training samples, but the resulting model can neither display a distinct input-output relationship nor provide a deep understanding of the internal mechanisms of real-world problems. Moreover, in most of these methods all variables are fed into the model, which easily leads to overfitting. Recently, methods based on mathematical formulations have been proposed to predict time series; these clearly indicate the mathematical relationship between the input data and the output data. Zuo et al. utilized gene expression programming (GEP) to identify differential equations for time series prediction [
As a classical nonlinear differential equation model, the S-system has been used to predict time series and identify genetic networks. Zhang and Yang proposed a restricted additive tree (RAT) to represent the S-system model for stock market index forecasting [
The Dow Jones Index, Hang Seng Index, and NASDAQ Index are long-established and well-known stock indexes that are commonly used to reflect the development of the global economy. The Shanghai Stock Exchange Composite Index and SZSE Component Index represent the general trend of China’s stock market and economic development. These five stock indexes have been treated as standard datasets for evaluating the performance of stock prediction models [
Let the stock time series data be
Data structure.
The S-system model has a complex and powerful structure, which captures the dynamic nature of real systems and achieves good performance in terms of precision and flexibility [
The brain storm optimization (BSO) algorithm is a swarm intelligence optimization algorithm proposed by Shi in 2011 [
Initialize the population and generate
The
Randomly select the central individual of a class and mutate it with a random disturbance.
Update the individual with the following four methods.
Randomly select a class (with probability proportional to the number of individuals in each class). A new individual (
Randomly select a class and an individual in the selected class. A new individual is created with the selected individual and Gaussian value by equations (
Randomly select two classes and use their two central individuals as the candidate individuals
After merging the candidate individuals, the individual is updated according to the formula (
Two candidate individuals
After the new individual is generated, its fitness value is calculated and compared with the fitness values of the candidate individuals; the individuals with the better fitness values are selected for the next generation. When
When the maximum iteration number is reached, the algorithm stops; otherwise, go to step (2).
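The steps above can be sketched in a few lines of Python. Because the update equations are not reproduced here, this is only a minimal sketch under stated assumptions: the function name `bso_minimize`, the probabilities `p_disturb`, `p_one`, and `p_center`, the logistic step-size schedule with slope 20, and the nearest-seed grouping (used in place of a full clustering method) are illustrative choices, not the settings of this paper.

```python
import numpy as np

def bso_minimize(fitness, dim, n=30, k=3, iters=100, bounds=(-5.0, 5.0),
                 p_disturb=0.2, p_one=0.8, p_center=0.4, seed=0):
    """Simplified brain storm optimization; all defaults are illustrative."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pop = rng.uniform(lo, hi, size=(n, dim))             # step 1: initialize
    fit = np.array([fitness(x) for x in pop])
    b = int(np.argmin(fit))
    best_x, best_f = pop[b].copy(), float(fit[b])
    for t in range(iters):
        # Step 2: group individuals around k random seeds (stand-in for clustering).
        seeds = pop[rng.choice(n, size=k, replace=False)]
        dist = ((pop[:, None, :] - seeds[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dist, axis=1)
        clusters = [np.flatnonzero(labels == c) for c in range(k)]
        clusters = [c for c in clusters if c.size > 0]
        centers = [c[np.argmin(fit[c])] for c in clusters]  # best member = center
        # Step 3: occasionally disturb one randomly chosen class center.
        if rng.random() < p_disturb:
            j = centers[rng.integers(len(centers))]
            pop[j] = np.clip(pop[j] + rng.standard_normal(dim), lo, hi)
            fit[j] = fitness(pop[j])
        # Assumed step size: shrinks over the iterations (logistic schedule).
        xi = rng.random() / (1.0 + np.exp(-(0.5 * iters - t) / 20.0))
        sizes = np.array([c.size for c in clusters], dtype=float)
        for i in range(n):
            # Step 4: build a base individual from one class or from two classes.
            if len(clusters) == 1 or rng.random() < p_one:
                c = rng.choice(len(clusters), p=sizes / sizes.sum())
                use_center = rng.random() < p_center
                base = pop[centers[c]] if use_center else pop[rng.choice(clusters[c])]
            else:
                c1, c2 = rng.choice(len(clusters), size=2, replace=False)
                if rng.random() < p_center:
                    a, b2 = pop[centers[c1]], pop[centers[c2]]
                else:
                    a, b2 = pop[rng.choice(clusters[c1])], pop[rng.choice(clusters[c2])]
                w = rng.random()
                base = w * a + (1.0 - w) * b2             # merge the two candidates
            cand = np.clip(base + xi * rng.standard_normal(dim), lo, hi)
            # Step 5: keep the new individual only if it improves on individual i.
            f = fitness(cand)
            if f < fit[i]:
                pop[i], fit[i] = cand, f
            if f < best_f:
                best_x, best_f = cand.copy(), float(f)
    return best_x, best_f
```

For example, `bso_minimize(lambda v: float(np.sum(v ** 2)), dim=2)` searches for the minimum of a two-dimensional sphere function. In the prediction method, the fitness would instead be the training error of an S-system with a fixed structure.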
The particle swarm optimization (PSO) algorithm is a classical swarm intelligence method [
At each step, a new velocity for the particle
With the updated velocities, each particle changes its position according to the following equation:
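As a hedged illustration of the two updates, the sketch below uses the widely used inertia-weight form of PSO. The function name `pso_step` and the coefficient values `w`, `c1`, and `c2` are assumptions made for this example and are not taken from this paper.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, rng, w=0.7, c1=1.5, c2=1.5):
    """One PSO iteration for a swarm stored as arrays of shape (n, dim).

    x, v   : current positions and velocities
    pbest  : best position found so far by each particle, shape (n, dim)
    gbest  : best position found so far by the swarm, shape (dim,)
    """
    r1 = rng.random(x.shape)   # fresh random factors per particle and dimension
    r2 = rng.random(x.shape)
    # New velocity: inertia + pull toward personal best + pull toward global best.
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    # New position: move each particle along its updated velocity.
    x_new = x + v_new
    return x_new, v_new
```

After each step, `pbest` and `gbest` would be refreshed from the fitness values of `x_new` before the next call.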
Restricted gene expression programming (RGEP), an improved version of GEP, was proposed to identify the S-system model for gene regulatory network (GRN) inference [
Initialize the population. An example chromosome from the population is depicted in Figure
The phenotype of a chromosome in RGEP with parameters.
To make the chromosome resemble the S-system, each gene is allocated the corresponding parameters. For gene 1,
Evaluate the population on the training samples according to the given fitness function. In this process, the S-system model is solved by the fourth-order Runge–Kutta method [
If the optimal solution is found, RGEP terminates; otherwise, go to step (4).
Selection, recombination, and mutation are used to reproduce each chromosome, as introduced in Reference [
The expression tree of a chromosome in RGEP with parameters.
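The fitness evaluation step requires solving the S-system numerically. The following minimal sketch assumes the standard S-system form, dX_i/dt = alpha_i * prod_j X_j^g_ij - beta_i * prod_j X_j^h_ij, and a classical fourth-order Runge–Kutta step. The function names, the array layout, and the step size are illustrative assumptions rather than the implementation used in this paper.

```python
import numpy as np

def s_system_rhs(x, alpha, beta, g, h):
    """Right-hand side of an S-system with n state variables.

    x           : state vector, shape (n,), entries must be positive
    alpha, beta : rate constants, shape (n,)
    g, h        : kinetic orders, shape (n, n); row i holds the exponents for dX_i/dt
    """
    production = alpha * np.prod(x[None, :] ** g, axis=1)
    degradation = beta * np.prod(x[None, :] ** h, axis=1)
    return production - degradation

def rk4_step(f, x, dt):
    """Advance x by one step of size dt with the classical fourth-order Runge-Kutta rule."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

# Usage: a one-variable system dX/dt = 1 - X, i.e. alpha=1, g=0, beta=1, h=1.
alpha, beta = np.array([1.0]), np.array([1.0])
g, h = np.array([[0.0]]), np.array([[1.0]])
x = np.array([2.0])
for _ in range(10):                       # integrate from t = 0 to t = 1
    x = rk4_step(lambda s: s_system_rhs(s, alpha, beta, g, h), x, 0.1)
```

For this toy system the exact solution is X(t) = 1 + exp(-t), so the integrated value at t = 1 can be checked against 1 + exp(-1).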
In the initial stage of structural optimization, the symbols of the chromosome in RGEP, including function symbols and variable symbols, are randomly selected. During optimization, reproduction operators are applied with the training data to change and optimize the chromosomal symbols. The optimized S-system structure need not contain all the input variables: according to the training data, RGEP can automatically select the appropriate input variables. In Figure
The BSO algorithm is suitable for multi-peak and high-dimensional function optimization problems. The PSO algorithm has the advantages of easy implementation, high accuracy, and fast convergence. However, both methods are prone to premature convergence and to becoming trapped in local optima. To improve population diversity, a novel hybrid intelligent algorithm based on BSO and PSO (BSO-PSO) is proposed. In the BSO-PSO algorithm, half of the individuals are selected randomly and optimized by BSO, and the remaining individuals are optimized by PSO. The flowchart is described in Figure
The flowchart of the BSO-PSO algorithm.
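The population split at the heart of the hybrid algorithm can be sketched as follows. This is an illustration only: the helper names `split_population` and `hybrid_generation` are hypothetical, the two update rules are passed in as callables, and re-drawing the random split at every generation is an assumption that the text does not confirm.

```python
import numpy as np

def split_population(n, rng):
    """Randomly split indices 0..n-1 into a BSO half and a PSO half."""
    perm = rng.permutation(n)
    half = n // 2
    return perm[:half], perm[half:]

def hybrid_generation(pop, bso_update, pso_update, rng):
    """Apply one generation: a random half is updated by BSO, the rest by PSO.

    pop                    : array of shape (n, dim), one parameter vector per row
    bso_update, pso_update : callables mapping a sub-population to its updated version
    """
    bso_idx, pso_idx = split_population(len(pop), rng)
    new_pop = pop.copy()
    new_pop[bso_idx] = bso_update(pop[bso_idx])
    new_pop[pso_idx] = pso_update(pop[pso_idx])
    return new_pop, bso_idx, pso_idx
```

Keeping the two update rules as arguments makes it possible to compare the hybrid against pure BSO or pure PSO by passing the same callable twice.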
The flowchart of time series forecasting using the S-system model is described in Figure
The flowchart of time series data forecasting using the S-system.
Initialize the S-system population with the structure and parameters. Each S-system is encoded as an RGEP chromosome, which is described in Figure
With the training samples, the S-system is solved by equation (
Selection, recombination, and mutation are used to search for the optimal structure of the S-system. Go to step (2).
At some iterations in RGEP, the BSO-PSO algorithm is used to optimize the parameters of the RGEP chromosomes. In this process, the structure of the S-system model is fixed. According to the structure of the model, the number of parameters (
With the data at the previous time point, the optimal S-system model obtained in the training phase is solved and the data at the current time point are predicted. This procedure is repeated until the data at all testing time points have been predicted. The prediction error is then calculated from the predicted data and the target data.
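The testing loop described above can be sketched as follows. The name `rolling_forecast` and the callable `step_model`, which stands for "solve the trained S-system one step forward from the previous observation", are hypothetical. Using a single previous value as the model input and RMSE as the error are simplifying assumptions.

```python
import numpy as np

def rolling_forecast(step_model, series, start):
    """Predict series[t] from series[t - 1] for every t in the test range.

    step_model : callable that maps the observation at time t - 1 to a
                 prediction for time t (hypothetical trained model)
    series     : 1-D array of observations
    start      : index of the first test point (must be >= 1)
    """
    series = np.asarray(series, dtype=float)
    preds = np.array([step_model(series[t - 1]) for t in range(start, len(series))])
    targets = series[start:]
    rmse = float(np.sqrt(np.mean((preds - targets) ** 2)))
    return preds, rmse

# Usage with a toy model that doubles the previous value.
preds, rmse = rolling_forecast(lambda prev: 2.0 * prev, [1.0, 2.0, 4.0, 8.0, 16.0], start=3)
```

Each prediction uses the observed value at the previous time point, not the previous prediction, so errors do not accumulate across the test range.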
Five stock indexes, namely the Dow Jones Index (DJI), Hang Seng Index (HSI), NASDAQ Index (NASI), Shanghai Stock Exchange (SSE) Composite Index (SSEI), and SZSE Component Index (SZSEI), are used to test the performance of our method. Seventy percent of the data are used for training, and the remaining 30% are used for testing. The descriptions of the five stock indexes are listed in Table
Parameters of the five stock indexes.
Parameters  DJI  HSI  NASI  SSEI  SZSEI 

Time interval  1/2/1990–12/29/2017  1/2/1991–12/29/2017  1/2/1990–12/29/2017  1/1/1996–12/29/2017  1/2/2008–12/30/2016 
Training samples for 1-week-ahead prediction  4936  4666  4936  3866  1528 
Test samples for 1-week-ahead prediction  2115  2000  2115  1657  655 
Training samples for 1-month-ahead prediction  4918  4649  4918  3849  1511 
Test samples for 1-month-ahead prediction  2108  1992  2108  1649  647 
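The sample counts in the table are consistent with a 70/30 split in which the training size is 70% of the total rounded to the nearest integer. The sketch below reproduces that arithmetic. The rounding rule is inferred from the counts rather than stated in the text, and the function name `split_counts` is hypothetical.

```python
def split_counts(n_samples, train_fraction=0.7):
    """Return (n_train, n_test) for a split with a rounded training share."""
    n_train = int(round(train_fraction * n_samples))
    return n_train, n_samples - n_train

# Totals are the sums of the training and test counts listed in the table.
week_totals = {"DJI": 7051, "HSI": 6666, "NASI": 7051, "SSEI": 5523, "SZSEI": 2183}
week_splits = {name: split_counts(total) for name, total in week_totals.items()}
```

Because the data are time series, the first `n_train` samples would normally form the training set and the remaining samples the test set, although the text does not say so explicitly.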
RMSE (root mean square error), MAP (mean absolute percentage), and MAPE (mean absolute percentage error),
To test the performance of our method clearly, five state-of-the-art methods (deep recurrent neural network (DRNN) [
For the 1-week-ahead prediction problem, the function set is
The optimal phenotypes and expression trees for 1-week-ahead prediction with five stock indexes: DJI (a), HSI (b), NASI (c), SSEI (d), and SZSEI (e).
Optimal S-system models of five stock datasets for 1-week-ahead prediction.
Dataset  Optimal S-system model 

DJI 

HSI 

NASI 

SSEI 

SZSEI 

The predicted and actual results for 1-week-ahead prediction with five stock indexes: DJI (a), HSI (b), NASI (c), SSEI (d), and SZSEI (e).
The comparison of the prediction models’ performance on the five stock indexes is listed in Table
Comparison results of six methods for 1-week-ahead prediction.
Stock index  Method  RMSE  MAP  ARV  MAPE 

VAF (%) 

DJI  Our method 






DRNN  0.0083  8.4909  0.002735  1.1696  0.99726  99.981  
FNT  0.015427  14.907  0.00613  1.8023  0.98939  99.933  
RBFNN  0.016188  24.473  0.01057  2.2631  0.98943  99.927  
BPNN  0.049026  22.721  0.060301  5.5923  0.9397  99.328  
ARIMA  0.052472  20.924  0.071326  5.9773  0.92867  99.230  


HSI  Our method 






DRNN  0.016272  13.384  0.033924  1.8718  0.96608  99.944  
FNT  0.020128  19.261  0.065331  2.2759  0.93467  99.915  
RBFNN  0.023406  25.32  0.072545  2.7067  0.92756  99.885  
BPNN  0.035987  44.738  0.12867  4.1357  0.87133  99.729  
ARIMA  0.013361  13.817  0.026688  1.5367  0.97331  99.963  


NASI  Our method 






DRNN  0.013969  11.069  0.005465  1.757  0.99453  99.941  
FNT  0.016468  32.352  0.006859  2.5336  0.99314  99.918  
RBFNN  0.03669  37.371  0.027513  4.5327  0.97249  99.591  
BPNN  0.046  17.5  0.042533  5.973  0.95747  99.353  
ARIMA  0.049849  18.46  0.093189  5.31  0.90681  99.245  


SSEI  Our method 






DRNN  0.010107  12.959  0.008481  1.7396  0.99152  99.941  
FNT  0.014559  18.931  0.018903  2.2848  0.9811  99.878  
RBFNN  0.014681  20.06  0.018024  2.1804  0.98198  99.876  
BPNN  0.035922  32.768  0.091613  6.9046  0.90839  99.256  
ARIMA  0.020766  20.533  0.029814  3.9766  0.97019  99.752  


SZSEI  Our method 






DRNN  0.018315  19.783  0.011685  3.0419  0.98831  99.826  
FNT  0.018571  21.67  0.012189  3.0233  0.98781  99.821  
RBFNN  0.023881  31.222  0.018031  3.8187  0.98197  99.704  
BPNN  0.027297  41.441  0.027768  4.0751  0.97223  99.614  
ARIMA  0.029022  26.983  0.02844  4.8583  0.97156  99.563 
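For reference, two of the error measures reported in the table can be computed as sketched below. The definitions used are the common textbook ones, with MAPE expressed in percent; since the formulas are not reproduced in this text, treating them as identical to the ones used for the table is an assumption.

```python
import numpy as np

def rmse(target, predicted):
    """Root mean square error between target and predicted values."""
    target = np.asarray(target, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((target - predicted) ** 2)))

def mape(target, predicted):
    """Mean absolute percentage error in percent; targets must be nonzero."""
    target = np.asarray(target, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(100.0 * np.mean(np.abs((target - predicted) / target)))

# Usage on a three-point toy series.
y_true = [1.0, 2.0, 4.0]
y_pred = [1.0, 2.0, 2.0]
toy_rmse = rmse(y_true, y_pred)   # sqrt(4 / 3)
toy_mape = mape(y_true, y_pred)   # 100 * (0 + 0 + 0.5) / 3
```

RMSE depends on the scale of the data, so values are comparable across indexes only if the series are normalized in the same way. MAPE is scale-free but undefined when a target value is zero.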
For the 1-month-ahead prediction problem, the function set is
The optimal phenotypes and expression trees for 1-month-ahead prediction with five stock indexes: DJI (a), HSI (b), NASI (c), SSEI (d), and SZSEI (e).
Optimal S-system models of five stock datasets for 1-month-ahead prediction.
Dataset  Optimal S-system model 

DJI 

HSI 

NASI 

SSEI 

SZSEI 

The predicted and actual results for 1-month-ahead prediction with five stock indexes: DJI (a), HSI (b), NASI (c), SSEI (d), and SZSEI (e).
Six prediction models are used to forecast the five stock indexes, and the prediction results are listed in Table
Comparison results of six methods for 1-month-ahead prediction.
Stock index  Method  RMSE  MAP  ARV  MAPE 

VAF (%) 

DJI  Our method 

6.9911 




DRNN  0.007741 

0.002616  1.4501  0.99738  99.983  
FNT  0.012504  2.4418  0.007481  1.9062  0.99252  99.956  
RBFNN  0.013379  3.9361  0.007573  2.5368  0.99243  99.950  
BPNN  0.048029  10.1  0.13547  7.3864  0.86453  99.356  
ARIMA  0.052385  93.126  0.11662  7.3731  0.88338  99.234  


HSI  Our method 

6.95 




DRNN  0.011502 

0.027542  1.489  0.97246  99.972  
FNT  0.014388  4.3212  0.046465  1.7348  0.95353  99.957  
RBFNN  0.044134  35.289  0.41237  5.1249  0.58763  99.592  
BPNN  0.045971  55.247  0.22748  5.3369  0.77252  99.557  
ARIMA  0.061245  53.144  0.54399  7.0813  0.45601  99.214  


NASI  Our method 

7.9168 




DRNN  0.031166 

0.031964  5.3044  0.96804  99.706  
FNT  0.031894  9.3407  0.037088  3.7881  0.96291  99.692  
RBFNN  0.035863  11.232  0.04911  3.9605  0.95089  99.610  
BPNN  0.047487  9.9709  0.083254  7.5185  0.91675  99.317  
ARIMA  0.098081  91.02  0.3589  12.177  0.6411  97.086  


SSEI  Our method 






DRNN  0.008104  9.957  0.005335  1.1292  0.99467  99.962  
FNT  0.033005  34.807  0.070553  5.5303  0.92945  99.372  
RBFNN  0.04973  76.098  0.12737  8.4546  0.87263  98.574  
BPNN  0.053661  111.48  0.14219  9.4878  0.85781  98.340  
ARIMA  0.071626  87.753  0.2008  13.551  0.7992  97.043  


SZSEI  Our method 






DRNN  0.045067  19.483  0.13092  7.3945  0.88439  98.959  
FNT  0.06323  25.729  0.2267  11.628  0.7733  97.950  
RBFNN  0.071342  35.295  0.32104  11.772  0.67896  97.390  
BPNN  0.082818  37.58  0.51956  16.103  0.48044  96.483  
ARIMA  0.084487  117.1  0.32894  12.521  0.67106  96.340 
To test the performance of our proposed hybrid intelligent algorithm, we also use BSO and PSO alone to optimize the parameters of the S-system models in comparison experiments. The 1-week-ahead prediction results of the three evolutionary methods on the DJI dataset over 20 runs are listed in Table
The averaged RMSE results of three evolutionary methods for 1-week-ahead prediction.
Method  Best  Worst  Mean  SD 

Our hybrid intelligent algorithm  0.005411  0.0071  0.0061  0.00065 
BSO  0.005608  0.0085  0.0072  0.00074 
PSO  0.005475  0.0098  0.0079  0.00081 
Figure
Comparison of the error convergence characteristics of our hybrid intelligent algorithm, BSO, and PSO for 1-week-ahead prediction using the DJI dataset.
To test the performance of restricted gene expression programming for S-system optimization, the restricted additive tree (RAT) is used to optimize the structure of the S-system model in comparison experiments. The 1-week-ahead prediction results of RGEP and RAT on the five stock indexes over 20 runs are depicted in Figure
Prediction comparison of two optimization algorithms for 1-week-ahead prediction with five stock indexes.
In this paper, a novel stock prediction method based on the S-system model is proposed to forecast the stock market. An improved gene expression programming method (RGEP) is proposed to represent and optimize the structure of the S-system model. A hybrid intelligent algorithm based on BSO and PSO is used to optimize the parameters of the S-system model. The proposed method is tested on five real stock index datasets: DJI, HSI, NASI, SSEI, and SZSEI. The 1-week-ahead and 1-month-ahead prediction results reveal that our method predicts the stock indexes accurately and performs better than DRNN, FNT, RBFNN, BPNN, and ARIMA.
The convincing performance of our method is mainly due to three aspects. The first is that the S-system, a nonlinear ordinary differential equation model, has strong nonlinear modeling and forecasting ability. Table
The five stock indexes can be downloaded freely at
There are no conflicts of interest regarding the publication of this paper.
This work was supported by the Natural Science Foundation of China (no. 61702445), Shandong Provincial Natural Science Foundation, China (no. ZR2015PF007), the Ph.D. research startup foundation of Zaozhuang University (no. 2014BS13), and Zaozhuang University Foundation (no. 2015YY02).