Integrating Independent Component Analysis and Principal Component Analysis with Neural Network to Predict Chinese Stock Market

We investigate the statistical behavior of Chinese stock market fluctuations by independent component analysis. The independent component analysis (ICA) method is integrated into the neural network model. The proposed approach uses ICA to analyze the input data of the neural network and obtain the latent independent components (ICs). After the IC that represents noise is identified and removed, the remaining ICs are used as the input of the neural network. To forecast the fluctuations of the Chinese stock market, we select and analyze data of the Shanghai Composite Index and compare the forecasting performance of the proposed model with that of a common BP model integrating principal component analysis (PCA) and a single BP model. Experimental results show that the proposed model outperforms the other two models in both relatively small and relatively large samples, and that the performance of the BP model integrating PCA is closer to that of the proposed model in the relatively large sample. Further, the predictions of all three models at points where prices fluctuate violently deviate relatively far from the corresponding real market data.


Introduction
Recently, some progress has been made in investigating the statistical behavior of financial market fluctuations (see [1-6]), and some methods for forecasting price changes have been developed and studied using the theory of artificial neural networks [7-9]. The financial market is a complex system with many influencing factors and many kinds of uncertainty, and its fluctuations often show strongly nonlinear characteristics, so forecasting financial time series has long been a focus of financial research. Unlike traditional time-series analysis methods such as exponential smoothing, GARCH [10], and ARIMA [11], artificial neural networks can handle disorderly, comprehensive information without strong model assumptions, and they have good nonlinear approximation ability as well as strong self-learning and self-adaptive abilities. Therefore, neural networks are often applied to forecast financial time series; see [12-15]. The most popular neural network training algorithm for financial forecasting is the backpropagation (BP) neural network, which has powerful problem-solving ability. When the BP neural network is applied to forecast financial time series, we need to consider related factors that are mutually correlated and contain a large amount of noise. The noise in the data can lead to overfitting or underfitting. The key to using the BP neural network is therefore to eliminate the correlation and noise in the input data as far as possible, which improves the performance and prediction accuracy of the BP model.
In the present paper, independent component analysis (ICA) and principal component analysis (PCA) are integrated into the BP neural network for forecasting financial time series; the resulting models are called the ICA-BP model and the common PCA-BP model, respectively. Independent component analysis is a relatively new feature-extraction method from blind source separation, the process of separating source signals from mixed signals when the mixing model is unknown; see [16]. The idea of ICA is that multichannel observed signals are decomposed, through an optimization algorithm and according to statistical independence, into several independent components (ICs). ICA can also be applied to financial models to extract latent factors that influence the financial market; see, for example, [17]. In the ICA-BP model, the input data are first analyzed by ICA to obtain several latent ICs. After the IC that represents noise is identified and eliminated, the remaining ICs are used as the input of the BP model, so that the noise component of the original data is removed and the inputs are made mutually independent. The PCA-BP approach extracts the principal components (PCs) from the input data by PCA and uses the PCs as the input of the BP model, which eliminates redundancy in the original information and removes the correlation between the inputs. To forecast the fluctuations of the Chinese stock market, we compare the forecasting performance of the ICA-BP model with those of the common PCA-BP model and the single BP model using data from the Shanghai Composite Index (SHCI). The SHCI plays an important role in Chinese financial markets. It reflects the activity and trend of Chinese securities markets to a large degree, so it helps us understand the status of China's macroeconomy. The data are from the Shanghai Stock Exchange; see www.sse.com.cn.
This paper is organized as follows. Section 2 gives a brief introduction to independent component analysis, the BP neural network, and principal component analysis. The forecasting models of the stock market are described in Section 3. Section 4 presents the experimental results on the SHCI datasets.

Independent Component Analysis
Independent component analysis is a statistical method developed in recent years; see, for example, [18, 19]. The ICA model has been widely applied in signal processing, face recognition, and feature extraction; see [20, 21]. Kiviluoto and Oja [22] also employed the ICA model to find the fundamental factors influencing the cash flow of 40 stores belonging to the same retail chain. They found that the cash flow of the stores was mainly influenced by holidays, seasons, and competitors' strategies. The purpose of ICA is to decompose the observed data linearly into statistically independent components, that is, to find several latent, unobserved, independent source signals. Suppose that the observed signal $x(t)$ is a zero-mean data vector observed at time $t$ and that the source signal $s(t)$ is a zero-mean vector whose components are mutually independent, such that

$$x(t) = A\,s(t),$$

where $A$ is a full-rank linear mixing matrix. To solve the ICA model, the algorithm considers the linear transformation

$$y = w^{T}x(t) = w^{T}A\,s(t) = z^{T}s(t),$$

where $z = A^{T}w$ and $w$ is an estimate of one row of $A^{-1}$, so that $y$ is a linear combination of the $s_i$ with weights $z_i$. Since a sum of two independent random variables is closer to a Gaussian distribution than either original variable, $z^{T}s$ is closer to Gaussian than any single $s_i$ unless it equals one of them. Hence, if $w$ is chosen to maximize the non-Gaussianity of $w^{T}x$, then $w^{T}x = z^{T}s$ equals one of the independent components. A quantitative measure of the non-Gaussianity of a random variable $y$ is negentropy, which is based on the concept of entropy in information theory; the entropy of a random variable can be interpreted as the degree of information carried by an observed variable. The negentropy $J$ is defined by

$$J(y) = H(y_{\text{gauss}}) - H(y),$$

where $H(y) = -\int p(y)\log p(y)\,dy$ is the entropy, $p(y)$ is the probability density, and $y_{\text{gauss}}$ is a Gaussian random vector with the same covariance matrix as $y$. The negentropy is always nonnegative, and it is zero if and only if $y$ follows a Gaussian distribution. Negentropy is usually computed approximately, as follows:

$$J(y) \approx \sum_{i} k_i\left[E\{G_i(y)\} - E\{G_i(v)\}\right]^2,$$

where the $k_i$ are positive constants, $v$ is a Gaussian variable with zero mean and unit variance, $y$ is a random variable with zero mean and unit variance, and the $G_i$ are nonquadratic functions. This approximation is not exact, but it still provides a usable measure of non-Gaussianity. Here we use a single nonquadratic function $G$, and the approximation becomes

$$J(y) \propto \left[E\{G(y)\} - E\{G(v)\}\right]^2,$$

where $G(y) = \exp\{-y^{2}/2\}$ or $G(y) = \frac{1}{\alpha}\log\cosh(\alpha y)$, with $1 \le \alpha \le 2$ an appropriate constant. If the signals are observed repeatedly, the observations form the original signal matrix $X$. When the separating matrix $W$ is the inverse of the mixing matrix $A$, the independent component matrix $Y = WX$ can be used to estimate the source signal matrix $S$, where each row of $Y$ is an independent component. In this paper, we apply the FastICA algorithm, which is based on a fixed-point iteration and is applicable to many types of data, to estimate the separating matrix.
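The estimation procedure above can be sketched in NumPy as a one-unit FastICA run with deflation. This is a minimal illustration, not the authors' code: it assumes $G(y) = \frac{1}{\alpha}\log\cosh(\alpha y)$ (so the nonlinearity is $\tanh(\alpha y)$), symmetric eigen-decomposition whitening, and random initial vectors; the names `whiten` and `fastica_deflation` and their defaults are hypothetical. In practice a library implementation such as scikit-learn's `FastICA` would normally be used.

```python
import numpy as np


def whiten(X):
    """Center the rows of X (variables x samples) and whiten them.

    Returns the whitened data Z (identity sample covariance), the whitening
    matrix V, and the row means that were removed.
    """
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean
    d, E = np.linalg.eigh(np.cov(Xc))
    V = E @ np.diag(1.0 / np.sqrt(d)) @ E.T  # symmetric whitening matrix
    return V @ Xc, V, mean


def fastica_deflation(X, alpha=1.0, max_iter=200, tol=1e-6, seed=0):
    """One-unit FastICA with G(y) = (1/alpha) log cosh(alpha y), run with deflation.

    X has one observed signal per row. Returns the ICs Y (one per row) and the
    overall separating matrix W_sep such that Y = W_sep (X - row means).
    """
    Z, V, mean = whiten(X)
    m = Z.shape[0]
    rng = np.random.default_rng(seed)
    W = np.zeros((m, m))
    for p in range(m):
        w = rng.standard_normal(m)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            g = np.tanh(alpha * (w @ Z))          # g = G'(y)
            g_prime = alpha * (1.0 - g ** 2)      # g' = G''(y)
            # Fixed-point step: w+ = E{z g(w^T z)} - E{g'(w^T z)} w
            w_new = (Z * g).mean(axis=1) - g_prime.mean() * w
            # Deflation: remove projections on the rows already found
            w_new -= W[:p].T @ (W[:p] @ w_new)
            w_new /= np.linalg.norm(w_new)
            done = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if done:
                break
        W[p] = w
    W_sep = W @ V
    return W_sep @ (X - mean), W_sep
```

Here `np.linalg.inv(W_sep)` gives the mixing matrix $A$, whose columns $a_i$ are what the reconstruction step of the ICA-BP model needs.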

Artificial Neural Network Model
A neural network is a large-scale nonlinear dynamic system [23] with the abilities of highly nonlinear computation, self-learning, and self-organization. Since Lapedes and Farber first applied neural network technology to prediction research in 1987, many researchers have studied neural network prediction methods. Azoff [24] also applied neural networks to forecast financial time series. In the financial field, a neural network is often used to predict the closing price of the next trading day from historical data. The stock data of the last trading day, including the daily open price, daily closing price, daily highest price, daily lowest price, daily volume (trading amount in shares), and daily turnover (trading amount in money), are very important indicators. We can use these historical indicators as the input of the neural network and the closing price of the next trading day as the output to predict the stock price.
In practice, a feed-forward neural network, which can be regarded as a highly nonlinear mapping from input to output, is usually adopted for prediction. Since a three-layer feed-forward neural network can approximate any continuous function on a bounded domain to arbitrary accuracy, it is suitable for time series prediction. The BP neural network, characterized by error backpropagation, is a kind of multilayer feed-forward neural network. A three-layer BP neural network containing an input layer, one hidden layer, and an output layer is chosen in this study; Figure 1 shows the corresponding topological structure. The BP network is trained as follows. For neuron $j$, its input $I_j$ and output $O_j$ are

$$I_j = \sum_{i} w_{ij} O_i + \theta_j, \qquad O_j = f(I_j),$$

where $w_{ij}$ is the weight of the connection from the $i$th neuron in the previous layer to neuron $j$, $O_i$ is the output of that neuron, $f(x) = 2/(1 + e^{-2x}) - 1$ is the activation function of the neurons, and $\theta_j$ is the bias of neuron $j$. The output error $E$ accumulates the squared differences between the output values $O_{nl}$ and the target values $T_{nl}$ over the training samples $n$ and the output nodes $l$. Training ends when $E$ falls below the threshold (tolerance level). The error $\delta_l$ in the output layer and the error $\delta_j$ in the hidden layer are computed from $T_l$, the expected output of the $l$th output neuron, $O_l$, the actual output in the output layer, and $O_j$, the actual output in the hidden layer; $\lambda$ denotes the adjustable parameter in the activation function. The weights and biases in both the output and hidden layers are updated with the backpropagated error:

$$w_{ij}(k+1) = w_{ij}(k) + \eta\,\delta_j O_i, \qquad \theta_j(k+1) = \theta_j(k) + \eta\,\delta_j,$$

where $k$ is the epoch number and $\eta$ is the learning rate.
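The training procedure can be sketched as batch gradient descent in NumPy. This is an illustration under stated assumptions rather than the authors' implementation: the error is taken as half the mean squared error, tansig units are used in both layers, inputs and targets are assumed already scaled into (-1, 1), weights start uniform in [-0.5, 0.5], and the parameter λ is not modeled. The defaults η = 0.1, 1000 epochs, and threshold 0.0001 match the settings reported for the experiments; `train_bp` and `predict` are hypothetical names.

```python
import numpy as np


def tansig(x):
    """Activation f(x) = 2 / (1 + exp(-2x)) - 1, which equals tanh(x)."""
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0


def train_bp(X, T, n_hidden, eta=0.1, max_epochs=1000, tol=1e-4, seed=0):
    """Batch backpropagation for an N-H-1 network with tansig units in both layers.

    X: (samples, N) inputs and T: (samples, 1) targets, both assumed scaled into (-1, 1).
    Returns the parameters and the error recorded at each epoch.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.uniform(-0.5, 0.5, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.uniform(-0.5, 0.5, (n_hidden, 1))
    b2 = np.zeros(1)
    history = []
    for _ in range(max_epochs):
        O_hidden = tansig(X @ W1 + b1)     # O_j = f(I_j), I_j = sum_i w_ij O_i + theta_j
        O_out = tansig(O_hidden @ W2 + b2)
        err = T - O_out
        E = 0.5 * np.mean(np.sum(err ** 2, axis=1))  # assumed error form
        history.append(E)
        if E < tol:                        # stop once the error threshold is reached
            break
        delta_out = err * (1.0 - O_out ** 2)            # f'(x) = 1 - f(x)^2 for tansig
        delta_hidden = (delta_out @ W2.T) * (1.0 - O_hidden ** 2)
        # w(k+1) = w(k) + eta * delta * input, averaged over the batch
        W2 += eta * O_hidden.T @ delta_out / len(X)
        b2 += eta * delta_out.mean(axis=0)
        W1 += eta * X.T @ delta_hidden / len(X)
        b1 += eta * delta_hidden.mean(axis=0)
    return (W1, b1, W2, b2), history


def predict(params, X):
    """Forward pass of the trained N-H-1 network."""
    W1, b1, W2, b2 = params
    return tansig(tansig(X @ W1 + b1) @ W2 + b2)
```

Batch updates are used here only to keep the sketch short; per-sample (online) updates follow the same delta rule.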

Principal Component Analysis
PCA is a well-established technique for feature extraction and dimensionality reduction. Its basic idea is to replace the many original variables with a few composite indexes that still reflect most of the original information; these composite indexes are the principal components.
Ouyang [25] used the PCA method to evaluate the ambient water quality monitoring stations located in the main stem of the LSJR; the outcome showed that the number of monitoring stations could be reduced from 22 to 19. Yang et al. [26] built a prediction model for the occurrence of paddy stem borer based on a BP neural network and applied PCA to create fewer factors as the input variables of the network. Because PCA is essentially a rotation of the coordinate axes that does not change the data structure, the obtained PCs are linear combinations of the variables, reflect the original information to the greatest degree, and are uncorrelated with each other. The specific steps are as follows. Assume a data matrix with $m$ variables $X_1, X_2, \dots, X_m$ and $n$ observations,

$$X = (x_{ij})_{n \times m}, \qquad i = 1, \dots, n, \quad j = 1, \dots, m.$$

First, we normalize the original data:

$$x_{ij}^{*} = \frac{x_{ij} - \bar{x}_j}{s_j},$$

where $\bar{x}_j$ and $s_j$ are the sample mean and standard deviation of $X_j$. Let $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m \ge 0$ be the eigenvalues of the covariance matrix of the normalized data, and let $\alpha_1, \alpha_2, \dots, \alpha_m$ be the corresponding eigenvectors.

Common PCA-BP Forecasting Model
The BP neural network model requires that the input variables be weakly correlated, because strongly correlated inputs carry more repeated information, which may increase the computational complexity and reduce the prediction accuracy of the model. The common PCA-BP forecasting model works as follows (for more details, see [14, 26]): first, PCA is used to extract the principal components from the input data of the BP neural network, and then the principal components are used as the input of the network. The following example illustrates how to extract the principal components from the input data with PCA. Six financial time series are denoted $x_1, x_2, x_3, x_4, x_5$, and $x_6$, each of size $1 \times 400$. Table 1 shows their correlations, measured by Pearson's correlation coefficient. Table 1 clearly shows that the six time series are strongly correlated, which means that they contain much repeated information.
Table 2 shows the PCA result for the six time series. The cumulative contribution rate of the first two PCs exceeds 99%, that is, the first two PCs retain more than 99% of the information in the original data. These two PCs, denoted F1 and F2, are used as the input of the PCA-BP model instead of the original data.
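A minimal NumPy sketch of the PCA steps follows. It assumes z-score normalization, so the covariance of the normalized data is exactly the Pearson correlation matrix of the kind reported in Table 1, and it keeps the first k PCs whose cumulative contribution rate reaches a threshold. `pca_components` is a hypothetical helper name; its 85% default follows the note under Table 2, and passing 0.99 corresponds to the 99% criterion described above.

```python
import numpy as np


def pca_components(X, threshold=0.85):
    """PCA on an (n observations x m variables) matrix.

    Returns the retained PC scores F (n x k), the contribution rate of every PC,
    and the cumulative contribution rates.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # z-score each variable
    R = np.cov(Z, rowvar=False)                         # equals the Pearson correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)                # ascending order
    order = np.argsort(eigvals)[::-1]                   # sort so lambda_1 >= ... >= lambda_m
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contrib = eigvals / eigvals.sum()                   # contribution rate of each PC
    cumulative = np.cumsum(contrib)                     # cumulative contribution rate
    k = min(int(np.searchsorted(cumulative, threshold)) + 1, len(eigvals))
    F = Z @ eigvecs[:, :k]                              # F_i = alpha_i^T x* for each observation
    return F, contrib, cumulative
```

Because the retained columns of `F` are projections on orthogonal eigenvectors of the correlation matrix, they are uncorrelated, which is the property the PCA-BP model relies on.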

ICA-BP Forecasting Model
In the proposed ICA-BP model, ICA is first used to extract the independent components from the original signals, which here are the observed time series. The features of the original signals are contained in the ICs, each IC representing one feature. The IC that carries the least effective information about the original signals is the noise IC; after it is identified and removed, the remaining ICs are used as the input of the BP model. PCA only removes correlation among the components it produces, whereas ICA, which relies on higher-order statistics, yields components that are also statistically independent. In statistical theory, independence is a stronger condition than uncorrelatedness. The key step of the model is to identify the noise IC after the latent ICs are obtained; in this study, the testing-and-acceptance (TnA) method [27] is used for this purpose.
Consider again the six time series $x_1, x_2, x_3, x_4, x_5$, and $x_6$; Figure 2 shows their tendencies. Taking each time series as a row, they form a matrix $X$ of size $6 \times 400$. The ICA method yields the separating matrix $W$ and the independent component matrix $Y$; each row of $Y$, $y_i$ of size $1 \times 400$, represents an IC. Figure 3 shows the tendencies of the six ICs; each IC represents different features of the original time series in Figure 2. The TnA method is now applied to identify the noise IC. Consider the $m$ obtained ICs. In each iteration one IC is excluded, and the remaining $m - 1$ ICs are used to reconstruct the original signal matrix. Let $y_k$ be the excluded IC and $X^R$ the reconstructed signal matrix; then

$$X^R = \begin{pmatrix} x_1^R \\ \vdots \\ x_m^R \end{pmatrix} = \sum_{i=1,\, i \ne k}^{m} a_i\, y_i,$$

where $x_i^R$ is the $i$th reconstructed variable, $a_i$ is the $i$th column vector of the mixing matrix $A = W^{-1}$, and $y_i$ is the $i$th IC. We consider $k = 1, 2, \dots, m$ in turn, that is, we repeat $m$ iterations so that each IC is excluded once. The reconstruction error between each reconstructed matrix $X^R$ and the original signal matrix $X$ is measured by the relative Hamming distance (RHD) [27]:

$$\mathrm{RHD} = \frac{1}{n-1}\sum_{t=1}^{n-1}\left[R_i(t) - \hat{R}_i(t)\right]^2,$$

where $R_i(t) = \operatorname{sign}(A_i(t+1) - A_i(t))$ and $\hat{R}_i(t) = \operatorname{sign}(P_i(t+1) - P_i(t))$, with $\operatorname{sign}(u) = 1$ if $u > 0$, $\operatorname{sign}(u) = 0$ if $u = 0$, and $\operatorname{sign}(u) = -1$ if $u < 0$. Here $A_i$ is the actual value (the original variable), $P_i$ is the predicted value (the reconstructed variable), and $n$ is the total number of data points.
The RHD reconstruction error assesses the similarity between the original variables and their reconstructions. An RHD value close to zero indicates high similarity, meaning that the ICs used for the reconstruction contain most features of the original variables and the eliminated IC contains little effective information. Conversely, an RHD value far from zero indicates low similarity, meaning that the eliminated IC contains much effective information. Therefore, the reconstruction whose RHD value is closest to zero identifies the noise IC: it is the IC eliminated in that reconstruction. Table 3 shows the RHD reconstruction errors of each iteration for the six financial time series.
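The TnA loop can be sketched as follows. The sketch assumes the RHD form $\frac{1}{n-1}\sum_t [R_i(t) - \hat{R}_i(t)]^2$, scores each reconstruction by averaging the per-variable RHD over all original variables (the text does not say how the per-variable values are combined), and takes `A` as the inverse of the separating matrix from ICA. `rhd` and `tna_noise_ic` are hypothetical names. Because RHD compares only directions of change, removing the row means before ICA does not affect the score.

```python
import numpy as np


def rhd(actual, predicted):
    """Relative Hamming distance between the up/down/flat sequences of two series."""
    R = np.sign(np.diff(actual))          # R(t) = sign(A(t+1) - A(t))
    R_hat = np.sign(np.diff(predicted))   # R_hat(t) = sign(P(t+1) - P(t))
    return np.mean((R - R_hat) ** 2)      # average over the n - 1 steps


def tna_noise_ic(X, Y, A):
    """Testing-and-acceptance: drop each IC in turn and rebuild X from the rest.

    X: (m, n) original signals; Y: (m, n) ICs; A: (m, m) mixing matrix.
    Returns the index of the noise IC (smallest RHD) and all RHD scores.
    """
    m = Y.shape[0]
    errors = np.empty(m)
    for k in range(m):
        keep = [i for i in range(m) if i != k]
        X_R = A[:, keep] @ Y[keep]        # X^R = sum over i != k of a_i y_i
        errors[k] = np.mean([rhd(X[i], X_R[i]) for i in range(X.shape[0])])
    return int(np.argmin(errors)), errors
```

The returned index is zero-based, so a result of 0 corresponds to IC1 in the notation of Table 3.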
Table 3 reveals that the RHD value is smallest when IC1 is eliminated and IC2, IC3, IC4, IC5, and IC6 are used for the reconstruction. We therefore conclude that IC1 contains the least information and represents the noise IC, and IC2, IC3, IC4, IC5, and IC6 are used as the input of the proposed ICA-BP model.

Selection of Datasets
To evaluate the performance of the proposed ICA-BP forecasting model and the common PCA-BP forecasting model, we compare the models on data from the Shanghai Composite Index. In the BP model, the network inputs include six kinds of data: daily open price, daily closing price, daily highest price, daily lowest price, daily volume, and daily turnover. The network output is the closing price of the next trading day, because practical experience in stock markets shows that these six kinds of data from the last trading day are very important indicators when predicting the next day's closing price at the technical level. Two datasets, Set 1 and Set 2, are used for the comparison. Set 1 contains relatively few data: the SHCI data for each trading day from April 11, 2008, to November 30, 2009. Figure 4 presents the daily SHCI closing price in this period. Set 1 includes 400 data points, of which the first 300 are used as the training set and the remaining 100 as the testing set. Set 2 contains relatively more data: the SHCI data for each trading day from January 4, 2000, to November 30, 2009. Figure 5 presents the daily SHCI closing price in this period. Set 2 includes 2392 data points, of which the 2171 points from 2000 to 2008 form the training set and the remaining 221 points form the testing set.
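One way to build the input/output pairs described above is sketched below: the six indicators of day t are paired with the closing price of day t+1, and the samples are split chronologically so that no test day precedes a training day. The function names, the one-day alignment, and the [-1, 1] min-max scaling fitted on the training set only are assumptions for illustration (the paper does not describe its scaling); scaling is included because the tansig output lies in (-1, 1).

```python
import numpy as np


def make_dataset(features, close, n_train):
    """Pair day-t indicators with the day t+1 close and split chronologically.

    features: (days, 6) array of open, close, high, low, volume, turnover;
    close: (days,) closing prices. Sample t uses features[t] to predict close[t + 1].
    """
    X = features[:-1]
    y = close[1:]
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])


def minmax_scale(train, test):
    """Map each column to [-1, 1] using the training-set range only (no look-ahead)."""
    lo, hi = train.min(axis=0), train.max(axis=0)

    def scale(a):
        return 2.0 * (a - lo) / (hi - lo) - 1.0

    return scale(train), scale(test)
```

How the paper's 400 samples map to raw trading days is not stated; with this alignment, 401 trading days would yield 400 pairs.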

Performance Criteria and Basic Settings of the Model
Prediction performance is evaluated with three measures: the mean absolute error (MAE), the root mean square error (RMSE), and the correlation coefficient (R). They are defined as

$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\lvert A_t - P_t\rvert, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}(A_t - P_t)^2},$$

$$R = \frac{\sum_{t=1}^{n}(A_t - \bar{A})(P_t - \bar{P})}{\sqrt{\sum_{t=1}^{n}(A_t - \bar{A})^2 \, \sum_{t=1}^{n}(P_t - \bar{P})^2}},$$

where $A$ is the actual value, $P$ is the predicted value, $\bar{A}$ is the mean of the actual values, $\bar{P}$ is the mean of the predicted values, and $n$ is the total number of data points. Smaller MAE and RMSE values and a larger R value indicate less deviation, that is, better forecasting performance.
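The three criteria translate directly into NumPy; the function names below are hypothetical, and the small arrays in the usage lines are made up purely to show the arithmetic.

```python
import numpy as np


def mae(A, P):
    """Mean absolute error between actual A and predicted P."""
    return np.mean(np.abs(A - P))


def rmse(A, P):
    """Root mean square error between actual A and predicted P."""
    return np.sqrt(np.mean((A - P) ** 2))


def corr_r(A, P):
    """Pearson correlation coefficient R between actual A and predicted P."""
    a, p = A - A.mean(), P - P.mean()
    return np.sum(a * p) / np.sqrt(np.sum(a ** 2) * np.sum(p ** 2))


A = np.array([1.0, 2.0, 3.0, 4.0])
P = np.array([1.0, 2.0, 3.0, 5.0])
print(mae(A, P))   # 0.25
print(rmse(A, P))  # 0.5
```

`corr_r` gives the same value as `np.corrcoef(A, P)[0, 1]`; it is written out only to mirror the formula.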
To compare the forecasting performance of the single BP model, the common PCA-BP model, and the proposed ICA-BP model, we give the BP neural network in all three models a similar architecture and the same parameters, so that differences in performance reflect how PCA and ICA preprocess the input data of the BP neural network. The BP network has a single hidden layer. The input layer has $N$ nodes ($N$ differs among the three models), the hidden layer has $2N + 1$ nodes according to an empirical formula (see [23]), and the output layer has 1 node, so the architecture can be written as $N$-$(2N+1)$-1. The maximum number of training cycles is 1000, the minimum error threshold is 0.0001, the activation function is $f(x) = 2/(1 + e^{-2x}) - 1$, and the learning rate is 0.1. In the single BP model, the input layer has 6 nodes, corresponding to the daily open price, daily closing price, daily highest price, daily lowest price, daily volume, and daily turnover; the hidden layer has 13 nodes; and the output layer has 1 node, corresponding to the closing price of the next trading day, giving the architecture 6-13-1. In the common PCA-BP model, PCA applied to the six original time series yields two PCs (see Section 3.1), so the input layer has 2 nodes, the hidden layer has 5 nodes, the output layer is the same as in the single BP model, and the architecture is 2-5-1. In the proposed ICA-BP model, ICA applied to the six time series, followed by elimination of the noise IC, yields five ICs (see Section 3.2), so the input layer has 5 nodes, the hidden layer has 11 nodes, the output layer is again the same as in the single BP model, and the architecture is 5-11-1. All three architectures thus follow the $N$-$(2N+1)$-1 pattern.
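The hidden-layer sizing rule is simple arithmetic; a hypothetical helper `bp_architecture` reproduces the three architectures listed above.

```python
def bp_architecture(n_inputs):
    """Layer sizes N-(2N+1)-1: N inputs, 2N+1 hidden nodes, one output node."""
    return (n_inputs, 2 * n_inputs + 1, 1)


for name, n in [("single BP", 6), ("PCA-BP", 2), ("ICA-BP", 5)]:
    print(name, "-".join(map(str, bp_architecture(n))))
# single BP 6-13-1
# PCA-BP 2-5-1
# ICA-BP 5-11-1
```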

Comparison of Forecasting Results
To compare the forecasting performance of the proposed ICA-BP model with that of the common PCA-BP model and the single BP model, the two datasets, Set 1 and Set 2, are used for the empirical study. (I) First, Table 4 reports the forecasting results of the three models for the daily SHCI closing price on Set 1. For the proposed ICA-BP model, the MAE is 68.5315, the RMSE is 90.3209, and R is 0.9334; its MAE and RMSE are smaller and its R is larger than those of the other two models. We conclude that the proposed ICA-BP model outperforms the other two models and that the common PCA-BP model outperforms the single BP model. Figure 6 supports the same conclusion.
Table 5 and Figure 7 show the forecasting results of the three models for the daily SHCI closing price on Set 2. The conclusion is similar to that for Set 1: the proposed ICA-BP model performs best, and the common PCA-BP model outperforms the single BP model. This suggests that the denoising ability of ICA is clearly better than that of PCA in relatively small samples, whereas the denoising ability of PCA is closer to that of ICA in relatively large samples. This may be because PCA relies on a Gaussian assumption while ICA relies on non-Gaussianity, and in small samples the distribution of the data usually deviates from a Gaussian distribution. (III) In this part, we consider the statistical behavior of the price returns in the Shanghai stock market and the relative error of the forecasting results for Set 2. The stock logarithmic return $r(t)$ and the relative error $e(t)$ are given by

$$r(t) = \ln A(t+1) - \ln A(t), \qquad e(t) = \frac{A(t) - P(t)}{A(t)},$$

where $A(t)$ and $P(t)$ denote the actual and predicted daily closing prices of SHCI on date $t$, $t = 1, 2, \dots$
Figure 8 shows the fluctuation of the daily SHCI return and the relative error of the forecasting results from the single BP model. Similarly, Figure 9 shows the daily SHCI return and the relative error of the common PCA-BP model, and Figure 10 shows those of the proposed ICA-BP model. Figures 8-10 show that all three models produce some points with large relative errors. These points appear mostly where the return volatility is large (marked in Figures 8-10). This indicates that all three models predict less satisfactorily at points where prices fluctuate violently. The marked parts of Figures 6 and 7 also support this observation.
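The return and relative-error series, and a rough numerical check of whether large errors coincide with large price moves, can be sketched as follows. The sketch assumes $r(t) = \ln A(t+1) - \ln A(t)$ and $e(t) = (A(t) - P(t))/A(t)$; the quantile threshold `q` and the helper `large_error_overlap` are hypothetical, since the paper judges the coincidence visually from Figures 8-10 rather than with a statistic.

```python
import numpy as np


def log_returns(A):
    """r(t) = ln A(t+1) - ln A(t) for a 1-D array of closing prices."""
    return np.diff(np.log(A))


def relative_error(A, P):
    """e(t) = (A(t) - P(t)) / A(t) for actual A and predicted P."""
    return (A - P) / A


def large_error_overlap(A, P, q=0.9):
    """Share of the largest |e| days that are also among the largest |r| days.

    r[i] is the move from day i to day i+1, so it is paired with e on day i+1.
    """
    r = np.abs(log_returns(A))
    e = np.abs(relative_error(A, P))[1:]
    big_r = r >= np.quantile(r, q)
    big_e = e >= np.quantile(e, q)
    return float(np.mean(big_r[big_e]))
```

A value near 1 means that almost every large-error day is also a large-move day, which is the pattern described for all three models.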

Conclusion
In the present paper, we investigate and forecast the fluctuations of the Shanghai stock market. Independent component analysis and principal component analysis are introduced into the neural network model to forecast the stock price. In the proposed ICA-BP model, the input data are first analyzed by ICA to obtain several latent ICs; after the IC that represents noise is identified and eliminated, the remaining ICs are used as the input of the BP model. Further, an empirical study compares the actual daily SHCI closing price with the predicted values of the three models,

Figure 6: The actual daily SHCI closing price and the predicted values of the three models in Set 1.

Figure 8: The daily SHCI return and the relative error of forecasting result from the single BP model.

Figure 9: The daily SHCI return and the relative error of forecasting result from the common PCA-BP model.

Figure 10: The daily SHCI return and the relative error of forecasting result from the ICA-BP model.

Table 1: Pearson's correlation coefficients of the six time series.

Table 2: The PCA result of the six time series. If the cumulative contribution rate of the first k principal components exceeds 85%, these k components contain most of the information in the m original variables.
The $i$th principal component is $F_i = \alpha_i^{T}X$, $i = 1, 2, \dots, m$. Generally, $\lambda_k / \sum_{i=1}^{m}\lambda_i$ is called the contribution rate of the $k$th principal component, and $\sum_{i=1}^{k}\lambda_i / \sum_{i=1}^{m}\lambda_i$ is called the cumulative contribution rate of the first $k$ principal components.

Table 3: The RHD reconstruction errors.

Table 4: Forecasting results of the three models for the daily SHCI closing price using Set 1.

(II) Comparing Table 4 with Table 5, we can also see that the proposed ICA-BP model distinctly outperforms the common PCA-BP model on Set 1: the MAE values are 68.5315 and 84.3123, respectively, a difference of about 16, and

Table 5: Forecasting results of the three models for the daily SHCI closing price using Set 2.

Nevertheless, on Set 2 the MAE values are 51.5165 and 56.4246, respectively, a difference of about 5, and the RMSE values are 70.8551 and 80.9682, respectively, a difference of about 10. This shows that the performance of the common PCA-BP model becomes closer to that of the proposed ICA-BP model.