A Hybrid Approach by Integrating Brain Storm Optimization Algorithm with Grey Neural Network for Stock Index Forecasting

Abstract and Applied Analysis

Step 2. Calculate the initial weights by

$$\omega_{11} = b_1,\quad \omega_{21} = -y_1(1),\quad \omega_{2i} = \frac{2b_i}{b_1}\ (i = 2, 3, \ldots, n),\quad \omega_{31} = \omega_{32} = \cdots = \omega_{3n} = 1 + e^{-b_1(t-1)},$$

and let $\theta = \left(y_1(1) - \lambda\right)\left(1 + e^{-b_1(t-1)}\right)$.

Step 3. For each input-output pair $(t-1, y_1(t))$, $t = 1, 2, \ldots, T$, determine the output of each layer by using the following activation functions:
(a) from the input layer to the first hidden layer, the logistic sigmoid (logsig) function is used, so the output of the first hidden layer is $z_{11} = 1/\left(1 + e^{-\omega_{11}(t-1)}\right)$;
(b) from the first hidden layer to the second hidden layer, the linear function is used, so the outputs of the second hidden layer are $z_{21} = \omega_{21} z_{11}$ and $z_{2i} = \omega_{2i} z_{11} y_i(t)$, $i = 2, \ldots, n$;
(c) from the second hidden layer to the output layer, the linear function is used, so the output of the output layer is $z_{31} = \sum_{i=1}^{n} \omega_{3i} z_{2i} + \theta$.

Step 4. Calculate the error between the target output $y_1(t)$ and the calculated output $z_{31}$, and update the weights as follows:
(a) the weights from the first hidden layer to the second hidden layer are updated by $\omega_{21} = -y_1(1)$ and $\omega_{2i} \leftarrow \omega_{2i} - \eta_i \delta_i z_{11}$, $i = 2, \ldots, n$, where $\delta_1 = \delta_2 = \cdots = \delta_n = \left(z_{31} - y_1(t)\right)\left(1 + e^{-\omega_{11}(t-1)}\right)$ and $\eta_2, \eta_3, \ldots, \eta_n$ are the learning rates;
(b) the weight from the input layer to the first hidden layer is updated by $\omega_{11} \leftarrow \omega_{11} + b_1 t\,\delta_{n+1}$, where $\delta_{n+1} = \left(1 + e^{-\omega_{11}(t-1)}\right)^{-1}\left(1 - \frac{1}{1 + e^{-\omega_{11}(t-1)}}\right)\sum_{i=1}^{n} \omega_{2i}\delta_i$;
(c) meanwhile, $\theta$ is updated as $\theta = \left(1 + e^{-\omega_{11}(t-1)}\right)\left(y_1(1) - \frac{\omega_{22}}{2} y_2(t) - \frac{\omega_{23}}{2} y_3(t) - \cdots - \frac{\omega_{2n}}{2} y_n(t)\right)$.
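As a reading aid, the following is a minimal Python sketch of Steps 2 to 4 for a single input-output pair. It is not the authors' code: the function names are hypothetical, the threshold is assumed to take the form θ = (1 + e^(−ω11(t−1)))(y1(1) − Σ_{i≥2} ω2i·yi(t)/2), and δ_{n+1} is assumed to be computed with the weights from before the update. Under these assumptions, the output at the initial weights equals the GM(1,n)-type time response λ + (y1(1) − λ)e^(−b1(t−1)) with λ = Σ_{i≥2}(bi/b1)yi(t), which the last line checks.

```python
import math

def init_weights(b, y1_first):
    """Step 2: b = [b1, ..., bn]; returns omega_11 and [omega_21, ..., omega_2n]."""
    w11 = b[0]
    w2 = [-y1_first] + [2.0 * bi / b[0] for bi in b[1:]]
    return w11, w2

def threshold(t, ys, w11, w2, y1_first):
    """Assumed form: theta = (1 + e^{-w11 (t-1)}) * (y1(1) - sum_{i>=2} w2i * yi(t) / 2)."""
    lam = sum(w2[i] * ys[i] / 2.0 for i in range(1, len(w2)))
    return (1.0 + math.exp(-w11 * (t - 1))) * (y1_first - lam)

def forward(t, ys, w11, w2, theta):
    """Step 3 for ys = [y1(t), y2(t), ..., yn(t)]; ys[0] is the target and is unused here."""
    z11 = 1.0 / (1.0 + math.exp(-w11 * (t - 1)))                 # logsig
    z2 = [w2[0] * z11] + [w2[i] * z11 * ys[i] for i in range(1, len(w2))]
    w3 = 1.0 + math.exp(-w11 * (t - 1))                         # omega_31 = ... = omega_3n
    return z11, z2, w3 * sum(z2) + theta

def update(t, ys, w11, w2, theta, b1, y1_first, eta):
    """Step 4 for one pair; eta = [eta_2, ..., eta_n]. Returns new (w11, w2, theta)."""
    z11, _, z31 = forward(t, ys, w11, w2, theta)
    delta = (z31 - ys[0]) * (1.0 + math.exp(-w11 * (t - 1)))   # delta_1 = ... = delta_n
    # delta_{n+1} uses the weights before this update (an ordering assumption)
    delta_hidden = z11 * (1.0 - z11) * sum(w * delta for w in w2)
    new_w2 = [-y1_first] + [w2[i] - eta[i - 1] * delta * z11 for i in range(1, len(w2))]
    new_w11 = w11 + b1 * t * delta_hidden
    return new_w11, new_w2, threshold(t, ys, new_w11, new_w2, y1_first)

# Check: at the initial weights the network output equals the GM(1,n)-type time response
b, y1_first, t, ys = [0.5, 0.2, 0.1], 3.0, 2, [4.0, 5.0, 6.0]
w11, w2 = init_weights(b, y1_first)
_, _, z31 = forward(t, ys, w11, w2, threshold(t, ys, w11, w2, y1_first))
lam = sum(bi / b[0] * y for bi, y in zip(b[1:], ys[1:]))
print(abs(z31 - (lam + (y1_first - lam) * math.exp(-b[0] * (t - 1)))) < 1e-9)
```

Repeating `update` over all training pairs until a stopping rule is met corresponds to the loop described in Step 5.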


Introduction
The stock market not only occupies an important position in the domain of financial investment but also plays an important role in financial market research. The stock index is a significant indicator of the stock market. It refers to the stock price index, a reference indicator compiled by stock exchanges or financial service institutions to indicate changes in overall stock prices. Stock index forecasting is an important tool for both investors and government organizations. However, the accuracy of stock index forecasting is affected by many factors, and stock index series are characterized by large volatility, high noise, and nonlinearity. Thus, accurate stock index forecasting is a complex and difficult task.
In recent years, researchers have made great efforts to develop new forecasting models that improve stock index forecasting accuracy. The application of neural network models to stock index forecasting has attracted great attention over the past few decades. de Faria et al. [1] used the artificial neural network (ANN), along with the adaptive exponential smoothing (ES) approach, to forecast the Brazilian stock market. Shen et al. [2] examined the effectiveness of a radial basis function neural network optimized by the artificial fish swarm algorithm in stock index forecasting. Lu and Wu [3] developed a cerebellar model articulation controller neural network that outperformed support vector regression (SVR) and the backpropagation neural network (BPNN) in forecasting the closing indices of the Nikkei 225 and the Taiwan Stock Exchange Capitalization Weighted Stock Index. Wang et al. [4] integrated BPNN with the ES algorithm and the autoregressive integrated moving average method to predict the closing price of a stock index. Nayak et al. [5] introduced a hybrid model combining chemical reaction optimization and ANN for stock index forecasting. Aside from neural network models, fuzzy time series approaches have also achieved considerable success in stock index forecasting. Huarng and Yu [6] applied a type-2 fuzzy time series approach to stock index forecasting in 2005, and Singh and Borah [7] enhanced this approach with particle swarm optimization. Chen et al. [8] presented the performance of a high-order fuzzy time series model with a multiperiod adaptation approach in stock market forecasting. Using both the stock index and the trading volume as forecasting factors, Chu et al. [9] employed a dual-factor fuzzy time series approach to forecast the stock index. Wong et al. [10] also proposed an adaptive time-variant fuzzy time series model for stock index forecasting. Further stock index forecasting models can be found in [11][12][13][14].
Although researchers have developed a large number of models to forecast stock indices, no single model performs well in all situations; thus, novel and effective stock index forecasting approaches still need to be explored. This paper proposes a novel hybrid stock index forecasting model. The grey model is well known for its capability in handling small-sample data, and the neural network has a strong ability to solve nonlinear fitting problems. Because combining these two methods can compensate for the deficiencies of each single model, a grey neural network (GNN) is developed in this paper to forecast the daily stock index price. However, it is found that the GNN model with randomly initialized parameters has two deficiencies: it easily falls into local optima, and its forecasting accuracy is generally low. Therefore, this paper proposes a new brain storm optimization algorithm-based grey neural network (BSO-GNN) approach, which initializes the GNN parameters with the brain storm optimization (BSO) algorithm, to address both problems. The effectiveness of the proposed BSO-GNN model is validated by 5-day-ahead daily opening price forecasts for three stock market indices: the Shanghai Stock Exchange (SSE) Composite Index, the Shenzhen Composite Index, and the HuShen 300 Index.
The remainder of this paper is organized as follows. Section 2 introduces the related methodologies, including the single GNN model, the single BSO algorithm, and the combined BSO-GNN approach, together with the performance measurement criteria. Section 3 presents the data collection and parameter initialization results. Section 4 presents the simulation results and discussion, and the last section concludes the paper.

Suppose the original data sequence is $x(1), x(2), \ldots, x(K)$, where $K$ is the number of data points in the sequence. A new data sequence $x^{(1)}(1), x^{(1)}(2), \ldots, x^{(1)}(K)$ can then be generated by the 1-accumulated generating operation (1-AGO), where $x^{(1)}(k) = \sum_{i=1}^{k} x(i)$, $k = 1, 2, \ldots, K$.
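The 1-AGO is simply a running (cumulative) sum of the sequence. A minimal Python sketch, with a hypothetical helper name `ago` and made-up example values:

```python
from itertools import accumulate

def ago(x):
    """1-AGO: x1[k] = x[0] + x[1] + ... + x[k] (0-based indices)."""
    return list(accumulate(x))

# Example with a short, made-up opening-price series
print(ago([2300.0, 2310.0, 2295.0]))  # [2300.0, 4610.0, 6905.0]
```

Accumulation smooths the raw series, which is why grey models are fitted to the accumulated sequence rather than to the original one.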
Step 5. Judge whether the termination criterion has been reached: if so, the GNN model is fully determined; otherwise, return to Step 3 and repeat the process.

Once the sequence $y_1(t)$ has been forecasted by the GNN model, the forecasted values of the original data sequence can be obtained by the 1-inverse AGO (1-IAGO), defined by $x(1) = x^{(1)}(1)$ and $x(k) = x^{(1)}(k) - x^{(1)}(k-1)$ for $k \ge 2$.
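Because 1-AGO is a cumulative sum, the 1-IAGO is the first-difference operation, so applying it to accumulated forecasts returns values on the original scale. A minimal sketch with a hypothetical helper `iago`, checked by a round trip through the cumulative sum:

```python
from itertools import accumulate

def iago(x1):
    """1-IAGO: x[0] = x1[0] and x[k] = x1[k] - x1[k - 1] for k >= 1 (0-based)."""
    return [x1[0]] + [x1[k] - x1[k - 1] for k in range(1, len(x1))]

# Round trip: IAGO undoes the 1-AGO (cumulative sum)
series = [3.0, 1.0, 4.0, 1.0, 5.0]
assert iago(list(accumulate(series))) == series
```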

Brain Storm Optimization Algorithm.
Parameter optimization is one of the effective ways to improve the accuracy of forecasting approaches [16][17][18][19]. In this paper, the brain storm optimization (BSO) algorithm [20] is used to optimize the unknown parameters in the GNN model. For a population of $N$ individuals $X_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,D})$, $i = 1, 2, \ldots, N$, where $D$ is the dimension of the individuals, the BSO algorithm consists of seven steps, described as follows.
Step 1. Initialize the population of individuals by $x_{i,d} = L_d + (U_d - L_d) \cdot \mathrm{rand}$, where $L_d$ and $U_d$ are the left and right boundaries of the $d$th parameter, and rand is a random value drawn from the unit interval $(0, 1)$.
Step 2. Divide the $N$ individuals into $M$ clusters using the k-means clustering method.
Step 3. Evaluate the objective function for each of the $N$ individuals, and in each cluster select the individual with the smallest objective function value as the cluster center.
Step 4. Draw a random value $r_1$ from the interval $(0, 1)$ and compare it with a predetermined probability $p_1$: (a) if $r_1 < p_1$, randomly select one cluster center and replace it with a randomly generated individual; (b) otherwise, go to Step 5.
Step 5. Update the individuals. First, draw a random value $r_2$ from $(0, 1)$: (a) if $r_2$ is less than a predetermined probability $p_2$, then draw a random value $r_3$ from the uniform distribution on $(0, 1)$:
Step 6. Repeat Step 5 until all $N$ individuals have been updated.
Step 7. Judge whether the termination criterion has been reached: if so, end the whole process; otherwise, go to Step 2 and repeat the process.

As observed, the individual update in Step 5 is central to the BSO process. An individual is updated by $X_{\text{updated}} = X_{\text{selected}} + \xi \cdot N(\mu, \sigma^2)$, where $X_{\text{selected}}$ and $X_{\text{updated}}$ are the selected and updated individuals, respectively, and $N(\mu, \sigma^2)$ is a normal random function with mean $\mu$ and variance $\sigma^2$. The coefficient $\xi$ is computed from four quantities: the maximum iteration number max_iteration, the current iteration number current_iteration, a value rand randomly chosen from $(0, 1)$, and a predetermined value $k$.
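To make the seven steps concrete, the following is a compact Python sketch of BSO minimizing a generic objective. It is an illustration under stated assumptions, not the implementation used in this paper: clustering uses plain k-means (Lloyd's algorithm); the selection rules inside Step 5 (probability p2 choosing one cluster or two, probability p3 choosing a cluster center or a random member, and a random convex combination for two clusters) are taken from Shi's original BSO formulation, with clusters chosen uniformly for simplicity; the step size uses the logsig form ξ = logsig((0.5 · max_iteration − current_iteration)/k) · rand from that same formulation; μ = 0 and σ = 1; and all default parameter values, the fixed iteration budget as termination criterion, and the function names are hypothetical.

```python
import math
import random

def kmeans_labels(points, m, rng, iters=10):
    """Plain k-means (Lloyd's algorithm); returns one cluster label per point."""
    centers = [list(p) for p in rng.sample(points, m)]
    labels = []
    for _ in range(iters):
        labels = [min(range(m), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        for c in range(m):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def bso(objective, lower, upper, n=20, m=3, max_iteration=50,
        p1=0.2, p2=0.8, p3=0.4, k=20.0, seed=0):
    """Minimal BSO sketch; returns the best (individual, objective value) found."""
    rng = random.Random(seed)

    def random_individual():
        return [lo + (hi - lo) * rng.random() for lo, hi in zip(lower, upper)]

    def pick(group, center):
        # with probability p3 use the cluster center, otherwise a random member
        return pop[center] if rng.random() < p3 else pop[rng.choice(group)]

    pop = [random_individual() for _ in range(n)]                     # Step 1
    fit = [objective(x) for x in pop]
    best = min(range(n), key=lambda i: fit[i])
    best_x, best_f = list(pop[best]), fit[best]
    for it in range(max_iteration):
        labels = kmeans_labels(pop, m, rng)                           # Step 2
        groups = {}
        for i, lab in enumerate(labels):
            groups.setdefault(lab, []).append(i)
        groups = list(groups.values())
        centers = [min(g, key=lambda i: fit[i]) for g in groups]      # Step 3
        if rng.random() < p1:                                         # Step 4
            c = rng.choice(centers)
            pop[c] = random_individual()
            fit[c] = objective(pop[c])
        for i in range(n):                                            # Steps 5 and 6
            if rng.random() < p2 or len(groups) == 1:                 # one cluster
                g = rng.randrange(len(groups))
                base = pick(groups[g], centers[g])
            else:                                                     # two clusters
                g1, g2 = rng.sample(range(len(groups)), 2)
                a, b, w = pick(groups[g1], centers[g1]), pick(groups[g2], centers[g2]), rng.random()
                base = [w * u + (1.0 - w) * v for u, v in zip(a, b)]
            # assumed step size: xi = logsig((0.5 * max_iteration - it) / k) * rand
            xi = rng.random() / (1.0 + math.exp(-(0.5 * max_iteration - it) / k))
            cand = [min(max(v + xi * rng.gauss(0.0, 1.0), lo), hi)
                    for v, lo, hi in zip(base, lower, upper)]
            f = objective(cand)
            if f < fit[i]:                      # keep the better of old and new
                pop[i], fit[i] = cand, f
            if f < best_f:
                best_x, best_f = cand, f
    return best_x, best_f                                             # Step 7: fixed budget

def sphere(x):
    return sum(v * v for v in x)

x_best, f_best = bso(sphere, [-5.0, -5.0], [5.0, 5.0], seed=1)
print(x_best, f_best)
```

As in the original BSO, each new individual competes only with the current individual at the same index; the sketch additionally records the best individual ever evaluated, so a center replaced in Step 4 cannot erase the best solution found so far.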

New Combined Model.
As observed, in Step 1 of constructing the GNN model, the values of $b_1, b_2, \ldots, b_n$ must be determined first. However, in previous studies, these parameters are assigned randomly or based on experience, which greatly affects the accuracy of the GNN model. Therefore, this paper develops a new BSO-GNN model, which uses the BSO algorithm to optimize these parameters and substitutes the optimal parameters into (1) to construct the GNN model. In this manner, the error caused by improper parameter values can be greatly reduced, and the overall forecasting error is expected to decrease. The flowchart of the new combined BSO-GNN model is shown in Figure 2.
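In BSO-GNN, each BSO individual is a candidate parameter vector (b1, ..., bn), so the search needs an objective that scores a candidate. A minimal sketch, assuming the score is the root mean square training error and, for brevity, using as the model the GM(1,n)-type time response y1(t) = λ + (y1(1) − λ)e^(−b1(t−1)) with λ = Σ_{i≥2}(bi/b1)yi(t). This stand-in model is chosen here for illustration and is not necessarily the fitness used by the authors; the trained GNN output could be substituted. Function names and data are hypothetical.

```python
import math

def time_response(b, y1_first, inputs, t):
    """Stand-in model (assumption): y1(t) = lam + (y1(1) - lam) * exp(-b1 (t - 1)),
    with lam = sum_{i>=2} (b_i / b1) * y_i(t); inputs = [y2(t), ..., yn(t)]."""
    lam = sum(bi / b[0] * x for bi, x in zip(b[1:], inputs))
    return lam + (y1_first - lam) * math.exp(-b[0] * (t - 1))

def objective(b, targets, inputs):
    """Training RMSE of candidate b; targets[t-1] = y1(t), inputs[t-1] = [y2(t), ..., yn(t)]."""
    errors = [time_response(b, targets[0], inputs[t - 1], t) - targets[t - 1]
              for t in range(1, len(targets) + 1)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Synthetic check: data generated from b_true is fitted (almost) exactly by b_true
b_true, X = [0.3, 0.6], [[1.0], [2.0], [3.0], [4.0]]
y = [time_response(b_true, 1.5, X[t - 1], t) for t in range(1, 5)]
print(objective(b_true, y, X), objective([0.3, 0.9], y, X))
```

A function such as `lambda b: objective(b, y, X)` is what a minimizer like BSO would receive.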

Performance Evaluation Criteria.
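To evaluate the performance of different models, three error evaluation criteria are adopted: the mean absolute error (MAE) [21], the root mean square error (RMSE) [22], and the mean absolute percentage error (MAPE) [23], defined respectively as

$$\mathrm{MAE} = \frac{1}{T}\sum_{t=1}^{T}\left|x(t) - \hat{x}(t)\right|,\qquad \mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(x(t) - \hat{x}(t)\right)^2},\qquad \mathrm{MAPE} = \frac{1}{T}\sum_{t=1}^{T}\left|\frac{x(t) - \hat{x}(t)}{x(t)}\right| \times 100\%,$$

where $x(t)$ and $\hat{x}(t)$ are the actual value and the forecasted value at time $t$, respectively, and $T$ is the number of evaluated time points.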
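The three criteria translate directly into code. A minimal sketch with hypothetical function names; MAPE is returned in percent, an assumption consistent with the magnitudes reported in the error comparison:

```python
import math

def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root mean square error; always >= MAE on the same data."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    """Mean absolute percentage error, in percent; undefined if any actual value is 0."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

actual, forecast = [100.0, 200.0], [110.0, 190.0]
print(mae(actual, forecast), rmse(actual, forecast), mape(actual, forecast))
```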

Data Collection and Parameters Initialization
In this paper, five indicators of three stock market indices, the Shanghai Stock Exchange (SSE) Composite Index (http://q.stock.sohu.com/zs/000001/lshq.shtml), the Shenzhen Composite Index (http://q.stock.sohu.com/zs/399106/lshq.shtml), and the HuShen 300 Index (http://q.stock.sohu.com/zs/000300/lshq.shtml), are collected to evaluate the performance of the proposed BSO-GNN model. The five indicators are the opening price (OP), the closing price (CP), the highest price (HP), the lowest price (LP), and the trading volume (TV). All five indicators of the three indices are collected from January 5, 2012, to April 17, 2012. Since there is no trading on weekends or national holidays, the collection period contains only 66 data points of the form $(\mathrm{OP}_t, \mathrm{CP}_t, \mathrm{HP}_t, \mathrm{LP}_t, \mathrm{TV}_t)$, where the integer $t$ runs from 1 to 66 with January 5, 2012, taken as the first day. The purpose of this paper is to forecast the 5-day-ahead daily opening price of each index.
In general, the opening price of a stock on a given day is believed to be related to the five indicators of that stock on the previous day. Therefore, the GNN model constructed in this paper is given by (10), where the coefficients $b_i$ ($i = 1, 2, \ldots, 6$) are unknown and need to be determined by the BSO algorithm, the output variable $\mathrm{OP}_t$ denotes the opening price on the $t$th day, and the input variables $\mathrm{OP}_{t-1}$, $\mathrm{CP}_{t-1}$, $\mathrm{HP}_{t-1}$, $\mathrm{LP}_{t-1}$, and $\mathrm{TV}_{t-1}$ represent the opening price, the closing price, the highest price, the lowest price, and the trading volume of the same index on the $(t-1)$th day, that is, the previous day. In this manner, there are 65 input-output pairs in all, because the subscript $t$ varies from 2 to 66. These 65 pairs are divided into a training set and a testing set containing 60 and 5 pairs, respectively: the first 60 pairs are used for training, while the last 5 pairs are used to validate the proposed model. In other words, this paper performs 5-day-ahead daily opening price forecasting. Since the GNN model introduced in Section 2.1.1 has a single output, the 5-day-ahead forecast is obtained by running the GNN model five times.
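The construction of the 65 input-output pairs and the 60/5 split can be sketched as follows. The helper names and the synthetic data are hypothetical; real inputs would be the collected (OP, CP, HP, LP, TV) records in date order.

```python
def make_pairs(days):
    """days[j] = (OP, CP, HP, LP, TV) for trading day j + 1, in date order.
    Pair j uses day j's five indicators as inputs and day j + 1's opening price as target."""
    inputs = [list(days[j]) for j in range(len(days) - 1)]
    targets = [days[j + 1][0] for j in range(len(days) - 1)]
    return inputs, targets

def split(inputs, targets, n_test=5):
    """First len - n_test pairs for training, last n_test pairs for testing."""
    cut = len(targets) - n_test
    return (inputs[:cut], targets[:cut]), (inputs[cut:], targets[cut:])

# Synthetic stand-in for the 66 trading days (OP = day number here, other fields zero)
days = [(float(d), 0.0, 0.0, 0.0, 0.0) for d in range(1, 67)]
X, y = make_pairs(days)
(train_X, train_y), (test_X, test_y) = split(X, y)
print(len(y), len(train_y), len(test_y))  # 65 60 5
```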

Basic Statistical Characteristic Analysis.
In this section, the basic statistical characteristics of the three stock indices are first analyzed and the results are provided in Table 1.
The mean values listed in Table 1 show that, across the five indicators (the opening price, the closing price, the highest price, the lowest price, and the trading volume), the value ranges of the SSE Composite Index and the HuShen 300 Index are very similar to each other but differ considerably from those of the Shenzhen Composite Index. This is clearly shown in Figure 3. For the first four indicators, the standard deviations rank from largest to smallest as the HuShen 300 Index, the SSE Composite Index, and the Shenzhen Composite Index. For the fifth indicator (the trading volume), the standard deviation is largest for the SSE Composite Index and smallest for the HuShen 300 Index. The skewness statistic describes the symmetry of a variable: the greater its absolute value, the more asymmetric the variable. As observed, for each of the five indicators, the Shenzhen Composite Index is the most symmetric and the HuShen 300 Index is the most asymmetric. In addition, the kurtosis, a statistic describing the peakedness of a distribution, is also reported. Here kurtosis is measured relative to the Gaussian distribution: values greater than 0 indicate that the distribution is more peaked than the standard Gaussian distribution, values smaller than 0 indicate that it is flatter, and a value of 0 indicates the same peakedness as the standard Gaussian distribution [24].
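For reference, the moment-based definitions of skewness and excess kurtosis used in this discussion can be sketched as below (hypothetical helper names). Statistical packages often apply small-sample bias corrections, so this population-moment version, an assumption about which estimator Table 1 uses, may give slightly different values:

```python
def moments(xs):
    """Second, third, and fourth central moments (population form)."""
    n = len(xs)
    mean = sum(xs) / n
    m2, m3, m4 = (sum((x - mean) ** p for x in xs) / n for p in (2, 3, 4))
    return m2, m3, m4

def skewness(xs):
    """m3 / m2^(3/2): 0 for symmetric data, larger |value| means more asymmetric."""
    m2, m3, _ = moments(xs)
    return m3 / m2 ** 1.5

def excess_kurtosis(xs):
    """m4 / m2^2 - 3: 0 for a Gaussian, > 0 more peaked, < 0 flatter."""
    m2, _, m4 = moments(xs)
    return m4 / m2 ** 2 - 3.0

print(skewness([1, 2, 3, 4, 5]), excess_kurtosis([1, 2, 3, 4, 5]))
```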

Results Obtained by BSO-GNN and GNN Models.
In this section, the performance of the proposed BSO-GNN model is validated by comparing the fitting results (on the training set) and the forecasting results (on the testing set) obtained by the BSO-GNN model and the individual GNN model. The BSO-GNN model is built by first running the BSO algorithm to determine the unknown parameters in (10); the optimal values of the six parameters are listed in Table 2. As seen from Table 2, apart from the negative value (−0.0108) of the parameter $b_5$ for the Shenzhen Composite Index, all parameter values are positive.
Because different indicators may differ in units and ranges, a normalization preprocessing step is often introduced into neural network models. Taking the opening price as an example, the variable is normalized by $\mathrm{OP}_t^{\text{processed}} = (\mathrm{OP}_t - \mathrm{OP}_{\min})/(\mathrm{OP}_{\max} - \mathrm{OP}_{\min})$, where $\mathrm{OP}_t$ and $\mathrm{OP}_t^{\text{processed}}$ are the raw and normalized opening prices on the $t$th day, respectively, and $\mathrm{OP}_{\min}$ and $\mathrm{OP}_{\max}$ are the minimum and maximum opening prices over all days. When normalization is used, the final network outputs must be mapped back to the original scale by the inverse operation $\mathrm{OP}_t = \mathrm{OP}_t^{\text{processed}}(\mathrm{OP}_{\max} - \mathrm{OP}_{\min}) + \mathrm{OP}_{\min}$.
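The normalization and its inverse can be sketched as follows (hypothetical helper names). The minimum maps to exactly 0 and the maximum to exactly 1:

```python
def minmax(xs):
    """Min-max normalization; returns normalized values plus (min, max) for inversion."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs], lo, hi

def inverse_minmax(ps, lo, hi):
    """Map normalized network outputs back to the original scale."""
    return [p * (hi - lo) + lo for p in ps]

p, lo, hi = minmax([2300.0, 2400.0, 2500.0])
print(p)                           # [0.0, 0.5, 1.0]
print(inverse_minmax(p, lo, hi))   # [2300.0, 2400.0, 2500.0]
```

Keeping `lo` and `hi` from the training data is what makes the inverse mapping possible once the network has produced normalized outputs.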
Figures 4 and 5 present the opening price fitting and forecasting results for the three stock market indices obtained by the BSO-GNN and individual GNN models, without and with normalization, respectively. On some days, the curves obtained by the GNN model move in the opposite direction to the actual curve, and the same occurs for the BSO-GNN curves. Overall, however, in both the without-normalization and with-normalization cases, the fitting and forecasting results of the BSO-GNN model are closer to the actual data than those of the individual GNN model on most days for all three indices.

Error Comparison.
It is difficult to judge which model is better only by observing the fitting and forecasting results presented in Figures 4 and 5. Thus, this section compares the errors of the models under the three error evaluation criteria described in Section 2.3 to give a clearer picture of the capability of the proposed BSO-GNN model.
One of the most common comparison methods is to compare the mean of each model's outputs with the mean of the actual data. Figure 6 shows the mean comparison results of the GNN and BSO-GNN models in the training and testing stages. However, there is no significant difference between the mean of the GNN outputs and the mean of the actual data, nor between the mean of the BSO-GNN outputs and the mean of the actual data. A further comparison using other indicators is therefore needed, and the error criteria described in Section 2.3 are well suited to this purpose.
As observed, when the raw data are preprocessed by normalization, $\mathrm{OP}_t^{\text{processed}}$ equals 0 whenever $\mathrm{OP}_t = \mathrm{OP}_{\min}$. In this situation, the MAPE error computed on the data sequence constructed by $\mathrm{OP}_t^{\text{processed}}$ is undefined, since the denominator in the definition of MAPE equals 0.

Figure 6: Mean comparison results of the GNN and BSO-GNN models.

Error values obtained by the MAE and RMSE criteria vary in the interval (5, 50) in the without-normalization situation, while the interval changes to (0.02, 0.12) in the with-normalization situation. For the MAPE criterion, the corresponding intervals in these two cases are (0.2, 1.6) and (4, 29), respectively. In other words, the normalization operation has a significant impact on the range of the error values. Furthermore, Table 3 shows that, for all three stock market indices, the MAE, RMSE, and MAPE values of the BSO-GNN model are smaller than those of the individual GNN model, whether or not the raw data are normalized. Without normalization, the greatest improvements of BSO-GNN over GNN under the MAE, RMSE, and MAPE criteria are all obtained for the HuShen 300 Index, with values of 25.1897 (from 36.0929 to 10.9032), 34.5939 (from 49.4477 to 14.8538), and 1.0163 (from 1.4410 to 0.4247), respectively. With normalization, the results differ: the most significant improvements under the MAE, RMSE, and MAPE criteria occur for the SSE Composite Index, the HuShen 300 Index, and the Shenzhen Composite Index, respectively.
Analogously, Figure 7 gives a clear view of the forecasting error results. Without exception, the errors of the BSO-GNN model are smaller than those of the GNN model. Consistent with the training-stage results, in the without-normalization situation the greatest improvements under the MAE, RMSE, and MAPE criteria all occur for the HuShen 300 Index, with values of 16.8303 (from 28.0607 to 11.2304), 20.3949 (from 32.3231 to 11.9282), and 0.6570 (from 1.0973 to 0.4403), respectively. Turning to the with-normalization case, the results differ somewhat from those of the training stage: the most significant improvement under the MAE criterion appears for the HuShen 300 Index, at 0.0401 (from 0.0668 to 0.0267), while those under the RMSE and MAPE criteria occur for the SSE Composite Index, at 0.0504 (from 0.0774 to 0.0270) and 6.3921 (from 10.6773 to 4.2852), respectively.
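The improvements quoted above are differences between the GNN and BSO-GNN errors. The following sketch rechecks the testing-stage figures from this paragraph; the row layout and the helper name `check` are hypothetical:

```python
# (index, preprocessing, criterion, GNN, BSO-GNN, stated improvement), testing stage
reported = [
    ("HuShen 300", "without", "MAE", 28.0607, 11.2304, 16.8303),
    ("HuShen 300", "without", "RMSE", 32.3231, 11.9282, 20.3949),
    ("HuShen 300", "without", "MAPE", 1.0973, 0.4403, 0.6570),
    ("HuShen 300", "with", "MAE", 0.0668, 0.0267, 0.0401),
    ("SSE Composite", "with", "RMSE", 0.0774, 0.0270, 0.0504),
    ("SSE Composite", "with", "MAPE", 10.6773, 4.2852, 6.3921),
]

def check(rows, tol=5e-5):
    """True for each row whose GNN minus BSO-GNN difference matches the stated improvement."""
    return [abs((gnn - bso) - stated) < tol for *_, gnn, bso, stated in rows]

print(check(reported))  # [True, True, True, True, True, True]
```

The same subtraction is a quick consistency test for any error table: an improvement must equal the difference of the two errors, and for one model on one data set the RMSE can never be smaller than the MAE.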

Conclusions
In this paper, a hybrid stock index forecasting model, BSO-GNN, is proposed based on the BSO algorithm and the GNN approach; it combines the advantages of the BSO method, the grey model, and the neural network. Because normalization is frequently applied in neural networks, this paper also examines whether normalization affects the forecasting results by evaluating the BSO-GNN model in two situations: without normalization and with normalization. In addition, unlike previous GNN models, the unknown parameters of the GNN, which must be set before the network weights can be computed, are optimized by the BSO algorithm to make the error as low as possible. To evaluate the newly developed model, three stock market indices, the SSE Composite Index, the Shenzhen Composite Index, and the HuShen 300 Index, are used. The fitting and forecasting results in the training and testing stages show that the BSO-GNN model is more effective than the individual GNN model in stock index forecasting. Moreover, the forecasting results and error comparison results demonstrate that BSO-GNN is effective and robust in stock index forecasting.

Figure 1: Structure of the GNN model.

Figure 2: Flowchart of the new proposed BSO-GNN approach.

Figure 3: Box plots of the five indicators in the three stock markets.

Figure 4: Opening price fitting and forecasting results of the three stock market indices gained by the individual GNN model and the BSO-GNN model under the nonnormalization situation: (a) SSE Composite Index, (b) Shenzhen Composite Index, and (c) HuShen 300 Index.

Figure 5: Opening price fitting and forecasting results of the three stock market indices gained by the individual GNN model and the BSO-GNN model under the normalization situation: (a) SSE Composite Index, (b) Shenzhen Composite Index, and (c) HuShen 300 Index.

Table 1: Statistical characteristics of the three stock indices.
ᵃStd. refers to the standard deviation.

Table 2: Optimal parameters obtained by the BSO algorithm.

Table 3: Fitting error results of different models under different data preprocessing operations.

To avoid the zero denominator that arises in the MAPE definition when $\mathrm{OP}_t^{\text{processed}} = 0$, we correct the MAPE error in the with-normalization case by eliminating the data satisfying $\mathrm{OP}_t = \mathrm{OP}_{\min}$ and calculating MAPE on the remaining data. Table 3 exhibits the fitting error results of the GNN and BSO-GNN models under different data preprocessing operations, that is, with and without normalization. A crucial phenomenon revealed in Table 3 is that the error values obtained with the same error evaluation criterion but different preprocessing operations differ considerably.
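The corrected MAPE described above can be sketched as follows; the helper name and the example values are hypothetical:

```python
def mape_excluding_zeros(actual, forecast):
    """MAPE (in percent) over the points whose actual value is nonzero."""
    kept = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    return 100.0 * sum(abs((a - f) / a) for a, f in kept) / len(kept)

# Normalized actuals: the minimum opening price maps to 0 and is skipped
print(mape_excluding_zeros([0.0, 0.5, 1.0], [0.1, 0.4, 1.1]))
```

Because normalized actual values are at most 1 and some are close to 0, relative errors on this scale are inflated, which is consistent with the much larger MAPE range reported for the with-normalization case.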