Stock Market Prediction on High-Frequency Data Using Generative Adversarial Nets

Stock price prediction is an important issue in the financial world, as it contributes to the development of effective strategies for stock exchange transactions. In this paper, we propose a generic framework that employs Long Short-Term Memory (LSTM) and convolutional neural network (CNN) models in adversarial training to forecast the high-frequency stock market. The model takes the publicly available indices provided by trading software as input, which avoids complex financial theory research and difficult technical analysis and makes the approach convenient for ordinary traders without a financial background. Our study simulates the trading mode of an actual trader and uses rolling partitioning of the training set and testing set to analyze the effect of the model update cycle on the prediction performance. Extensive experiments show that the proposed approach can effectively improve stock price direction prediction accuracy and reduce forecast error.


Introduction
Predicting stock prices is an important objective in the financial world [1][2][3], since a reasonably accurate prediction can yield high financial benefits and hedge against market risks. With the rapid growth of Internet and computing technologies, the frequency of operations on the stock market has increased to fractions of a second [4,5]. Since 2009, the BM&F Bovespa (the Brazilian stock exchange) has operated at high frequency, and the share of high-frequency operations grew from 2.5% in 2009 to 36.5% in 2013. Aldridge and Krawciw [6] estimate that in 2016 high-frequency trading on average initiated 10%-40% of trading volume in equities and 10%-15% of volume in foreign exchange and commodities. These percentages suggest that high-frequency trading is a global trend.
In most cases, forecast results are assessed from two aspects. The first is the forecast error, chiefly the RMSE (Root Mean Square Error) or RMSRE (Root Mean Square Relative Error) between the real price and the forecast value. The second is the direction prediction accuracy, that is, the percentage of correct predictions of the direction of the price series, since upward and downward movements are what really matter for decision-making. Even small improvements in predictive performance can be very profitable [7,8].
However, predicting stock prices is not an easy task, due to the complexity and chaotic dynamics of the markets and the many undecidable, nonstationary stochastic variables involved [9]. Many researchers from different areas have studied the historical patterns of financial time series and have proposed various methods for forecasting stock prices. To achieve promising performance, most of these methods require careful selection of input variables, predictive models built on professional financial knowledge, and various statistical methods for arbitrage analysis, which makes it difficult for people outside the financial field to use them to predict stock prices [10][11][12].
The generative adversarial network (GAN) was introduced by Goodfellow et al. [13], where image patches are generated from random noise using two networks trained simultaneously. Specifically, in a GAN a discriminative net $D$ learns to distinguish whether a given data instance is real or not, and a generative net $G$ learns to confuse $D$ by generating high-quality data. Although this approach has been successful and has been applied to a wide range of fields, such as image inpainting, semantic segmentation, and video prediction [14][15][16], as far as we know, it has not been used for stock forecasting.
This work uses basic technical index data as the input variable, which can be acquired directly from trading software, so that people outside the financial field can easily predict stock prices with our method. This study introduces a forecast error loss and a direction prediction loss and shows that generative adversarial training [13] may be successfully employed to combine these losses and produce satisfying prediction results. We call this prediction architecture GAN-FD (GAN for minimizing forecast error loss and direction prediction loss). To conform to the practice of actual transactions, this work carries out rolling segmentation on the training set and testing set of the raw data, which we describe in detail in the experimental section.
Overall, our main contributions are twofold: (1) we adapt the generative adversarial network for the purpose of price prediction, which constitutes, to our knowledge, the first application of adversarial training to the stock market, and extensive experiments show that our prediction model achieves remarkable results; and (2) we carry out rolling segmentation on the training set and testing set of the raw data to investigate the effect of the model parameter update cycle on the stock forecast performance, and the experimental results show that a smaller model update cycle improves prediction performance.
In the remainder of this paper, we begin with a review of the literature on the algorithms that have been used for financial market prediction. Then we formulate the problem and propose our general adversarial network framework. In the experiments section, we present the experimental analysis of the proposed model, as well as a comparison between the obtained results and those given by classical prediction models. Finally, conclusions and possible extensions are discussed.

Related Work
This section introduces the related work on stock market prediction methods and on the generative adversarial network.

Stock Market Prediction Method.
According to the research developed in this field, the techniques used to solve stock market prediction problems can be classified into two categories.
The first category comprises classical econometric models for forecasting. Common methods are the autoregressive model (AR), the moving average model (MA), the autoregressive moving average model (ARMA), and the autoregressive integrated moving average model (ARIMA) [17][18][19]. Roughly speaking, these models treat each new signal as a noisy linear combination of the last few signals and independent noise terms. However, most of them rely on strong assumptions about the noise terms (such as the i.i.d. assumption or a $t$-distribution) and the loss functions, while real financial data may not fully satisfy these assumptions. By introducing a generalized autoregressive conditional heteroscedastic (GARCH) model for conditional variances, Pellegrini et al. [20] apply the ARIMA-GARCH model to the prediction of financial time series.
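For concreteness, the "noisy linear combination" can be written out for an ARMA($p$, $q$) model. The form below is the usual textbook one; the symbols $c$, $\varphi_i$, $\theta_j$, and $\varepsilon_t$ are notation assumed here and are not taken from [17][18][19]:

$$X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j},$$

where $X_t$ is the signal at time $t$ and the $\varepsilon_t$ are the independent noise terms. Setting $q = 0$ gives the AR($p$) model and setting $p = 0$ gives the MA($q$) model, while ARIMA applies the same form to a differenced series.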
The second category involves models based on soft computing. Soft computing is a term that covers artificial intelligence techniques that mimic biological processes. These techniques include artificial neural networks (ANN) [21,22], fuzzy logic (FL) [23], support vector machines (SVM) [24,25], particle swarm optimization (PSO) [26], and many others. Many authors have tried to deal with fuzziness along with randomness in option pricing models [27,28]. Carlsson and Fullér [29] were the first to study fuzzy real options, and Thavaneswaran et al. [30] demonstrated the superiority of fuzzy forecasts and then derived the membership function for the European call price by fuzzifying the interest rate, the volatility, and the initial value of the stock price. Recently there has been a resurgence of interest in deep learning, whose basic structure is best described as a multilayer neural network [31]. Several studies have established models based on deep neural networks to improve the prediction of high-frequency financial time series [32,33]. The ability of deep neural networks to extract abstract features from data is also attractive. Chong et al. [12] applied a deep feature learning-based stock market prediction model, which extracts information from the stock return time series without relying on prior knowledge of the predictors, and tested it on high-frequency data from the Korean stock market. Chen et al. [34] proposed a double-layer neural network for high-frequency forecasting, with links specially designed to capture dependence structures among stock returns within different business sectors. There are also a few studies that apply deep learning to identifying the relationship between past news events and stock market movements [35][36][37].
However, to our knowledge, most of these methods require expertise to impose specific restrictions on the input variables, such as combining related stocks together as entry data [12], feeding different index data to different layers of the deep neural network [34], and converting news text into a structured representation as input [36]. In contrast, our proposed forecasting model directly uses the data provided by the trading software as input, which lowers the barrier for ordinary investors.

Generative Adversarial Network.
The generative adversarial network (GAN) is a framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model $G$ that captures the data distribution and a discriminative model $D$ that estimates the probability that a sample came from the training data rather than from $G$. The training procedure for $G$ is to maximize the probability of $D$ making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions $G$ and $D$, a unique solution exists, with $G$ recovering the training data distribution and $D$ equal to 0.5 everywhere [13]. While $G$ and $D$ are defined by multilayer perceptrons in [13], most recent work constructs $G$ and $D$ on the basis of Long Short-Term Memory (LSTM) [38] or convolutional neural networks (CNN) [39] for a large variety of applications.
LSTM is a basic deep learning model that is capable of learning long-term dependencies. An LSTM internal unit is composed of a cell, an input gate, an output gate, and a forget gate. LSTM internal units have a hidden state augmented with nonlinear mechanisms that allow the state to propagate without modification, be updated, or be reset, using simple learned gating functions. LSTM works well on various problems, such as natural language text compression, handwriting recognition, and electric load forecasting.
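For reference, the minimax game of [13] is usually stated through a value function $V(D, G)$. The noise variable $z$ and its prior $p_z$ belong to the original image-generation setting and are shown here only for orientation; in the forecasting setting of this paper the generator takes indicator data as input rather than noise:

$$\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right].$$

At the unique solution mentioned above, the distribution of generated samples equals $p_{\mathrm{data}}$ and $D(x) = 0.5$ for every $x$.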
CNN is a class of deep, feed-forward artificial neural networks that has been successfully applied to analyzing visual imagery. A CNN consists of an input layer and an output layer, as well as multiple hidden layers. The hidden layers of a CNN typically consist of convolutional layers, pooling layers, fully connected layers, and normalization layers. CNN also has many applications, such as image and video recognition, recommender systems, and natural language processing.
Although many studies forecast stock prices with the LSTM model, to the best of our knowledge, this paper is the first to adopt GAN to predict stock prices. The experimental part (Section 4.2) compares the prediction performance of GAN-FD and LSTM.

Forecasting with High-Frequency Data
In this section, we describe the details of the generative adversarial network framework for stock market forecasting with high-frequency data.

Problem Statement.
In the high-frequency trading environment, high-quality one-step forecasting is usually of great concern to algorithmic traders, as it provides significant information to market makers for risk assessment and management. In this article, we aim to forecast the price movement of individual stocks or the market index one step ahead, based solely on their historical price information. Our problem can be mathematically formalized as follows.
Let $X_t$ represent a set of basic indicators and $Y_t$ denote the closing price of one stock for a 1-minute interval at time $t$ ($t = 1, 2, \ldots, T$), where $T$ is the maximum lag of time. Given the historical basic indicator information $X$ ($X = \{X_1, X_2, \ldots, X_T\}$) and the past closing prices $Y$ ($Y = \{Y_1, Y_2, \ldots, Y_T\}$), our goal is to predict the closing price $Y_{T+1}$ for the next 1-minute time interval. Some studies have examined the effects of different $T$ [7,12,40], but in this work we simply set $T$ to 242 because each trading day contains 242 one-minute intervals in the China stock exchanges.
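The sample construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the experiments: the function name `make_windows`, the sliding step of one minute, and the toy data are all hypothetical.

```python
from typing import List, Sequence, Tuple

# One sample: (X_1..X_T, Y_1..Y_T, Y_{T+1})
Window = Tuple[List[Sequence[float]], List[float], float]


def make_windows(indicators: Sequence[Sequence[float]],
                 closes: Sequence[float],
                 T: int) -> List[Window]:
    """Build (X, Y, y_next) samples with a window of T consecutive minutes.

    Assumption: consecutive windows are shifted by one minute.
    """
    if len(indicators) != len(closes):
        raise ValueError("indicators and closes must have the same length")
    samples: List[Window] = []
    for start in range(len(closes) - T):
        end = start + T
        X = list(indicators[start:end])       # basic indicators X_1 .. X_T
        Y = list(closes[start:end])           # closing prices Y_1 .. Y_T
        samples.append((X, Y, closes[end]))   # target Y_{T+1}
    return samples


# Toy data: 6 minutes with 2 indicators each (the paper uses 13 and T = 242).
toy_indicators = [[float(i), float(i) * 0.5] for i in range(6)]
toy_closes = [10.0, 10.1, 10.3, 10.2, 10.4, 10.5]
windows = make_windows(toy_indicators, toy_closes, T=4)
```

With six toy minutes and a window of four, two samples are produced, and the target of each is the closing price of the minute immediately after its window.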

Prediction Model.
The deep architecture of the proposed GAN-FD model is illustrated in Figure 1. Since stock data is a typical time series, we choose the LSTM model, which is widely applied to time series prediction, as the generative model $G$ to predict the output $\hat{Y}_{T+1}$ based on the input data $X$; that is,
$$\hat{Y}_{T+1} = G(X).$$
The discriminative model $D$ is based on the CNN architecture and performs convolution operations on the one-dimensional input sequence in order to estimate the probability that a sequence comes from the dataset ($Y = \{Y_1, Y_2, \ldots, Y_T, Y_{T+1}\}$) rather than being produced by the generative model $G$ ($\hat{Y} = \{Y_1, Y_2, \ldots, Y_T, \hat{Y}_{T+1}\}$).
Our main intuition for using an adversarial loss is that it can simulate the operating habits of financial traders. An experienced trader usually predicts the stock price from the available indicator data, which is the work of the generative model $G$, and then judges how likely his own forecast is to be correct given the previous stock prices, as the discriminative model $D$ does.
It is noteworthy that the structure of $G$ and $D$ in GAN-FD can be adjusted according to the specific application, and the experimental part of this paper proposes only a simple framework for $G$ and $D$ (Section 4.2) for stock prediction. It is reasonable to believe that fine-tuning the structure of $G$ and $D$ can improve the predictive performance.

Adversarial Training.
The training of the pair ($G$, $D$) consists of two alternating steps, described below. For the sake of clarity, we assume that we use pure SGD (minibatches of size 1), but the algorithm generalizes without difficulty to larger minibatches by summing the losses over the samples.
Training $G$ (let $(X, Y)$ be a sample from the dataset). In order to make the discriminative model $D$ as "confused" as possible, the generative model $G$ should reduce the adversarial loss, in the sense that $D$ will not discriminate the prediction correctly. Classifying $Y$ into class 1 and $\hat{Y}$ into class 0, the adversarial loss for $G$ is
$$L_{adv}^{G}(\hat{Y}) = L_{sce}(D(\hat{Y}), 1),$$
where $L_{sce}$ is the sigmoid cross-entropy loss, defined as
$$L_{sce}(A, B) = -\sum_{i} \left[ B_i \log(\mathrm{sigmoid}(A_i)) + (1 - B_i) \log(1 - \mathrm{sigmoid}(A_i)) \right].$$
However, in practice, minimizing the adversarial loss alone cannot guarantee satisfying predictions. Imagine that $G$ could generate samples that "confuse" $D$ without being close to $Y_{T+1}$; then $D$ will learn to discriminate these samples, leading $G$ to generate other "confusing" samples, and so on. To address this problem, the generative model $G$ ought to decrease the forecast error loss, that is, the $L_p$ loss
$$L_p(Y, \hat{Y}) = \| Y - \hat{Y} \|_p,$$
where $p = 1$ or $p = 2$. Furthermore, as mentioned above, stock price direction prediction is crucial to trading, so we define the direction prediction loss function $L_{dpl}$:
$$L_{dpl}(Y, \hat{Y}) = \left| \mathrm{sgn}(\hat{Y}_{T+1} - Y_T) - \mathrm{sgn}(Y_{T+1} - Y_T) \right|,$$
where sgn represents the sign function. Combining all the losses defined above with different weights $\lambda_{adv}$, $\lambda_p$, and $\lambda_{dpl}$, we obtain the final loss on $G$:
$$L_G(X, Y) = \lambda_{adv} L_{adv}^{G}(\hat{Y}) + \lambda_p L_p(Y, \hat{Y}) + \lambda_{dpl} L_{dpl}(Y, \hat{Y}).$$
Then we perform one SGD iteration on $G$ to minimize $L_G(X, Y)$ while keeping the weights of $D$ fixed.
Training $D$ (let $(X, Y)$ be a different data sample). Since the role of $D$ is just to determine whether the input sequence is $Y$ or $\hat{Y}$, the target loss is equal to the adversarial loss on $D$. While keeping the weights of $G$ fixed, we perform one SGD step on $D$ to minimize the target loss:
$$L_D(X, Y) = L_{sce}(D(\hat{Y}), 0) + L_{sce}(D(Y), 1).$$
We train the generator and discriminator iteratively. The entire process is summarized in Algorithm 1, which uses minibatches.
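The losses of this subsection can be sketched in plain Python as follows. Because the displayed formulas depend on typesetting, the sketch states its own assumptions: the discriminator returns a raw score that is passed through a sigmoid, the forecast error loss is the $p$-norm of the difference between the real and predicted sequences, and the direction prediction loss is the absolute difference of the signs of the real and predicted price changes. All function names are hypothetical.

```python
import math
from typing import Sequence


def sce_loss(score: float, label: float) -> float:
    """Sigmoid cross-entropy between a raw score and a 0/1 label.

    Numerically stable form max(a, 0) - a*y + log(1 + exp(-|a|)), equal to
    -[y*log(sigmoid(a)) + (1 - y)*log(1 - sigmoid(a))].
    """
    return max(score, 0.0) - score * label + math.log1p(math.exp(-abs(score)))


def lp_loss(y_true: Sequence[float], y_pred: Sequence[float], p: int = 2) -> float:
    """Forecast error loss, assumed to be the p-norm of (y_true - y_pred)."""
    return sum(abs(a - b) ** p for a, b in zip(y_true, y_pred)) ** (1.0 / p)


def sign(v: float) -> int:
    """Sign function: 1, 0, or -1."""
    return (v > 0) - (0 > v)


def direction_loss(y_last: float, y_next: float, y_hat_next: float) -> int:
    """Assumed form: |sgn(predicted change) - sgn(real change)|, in {0, 1, 2}."""
    return abs(sign(y_hat_next - y_last) - sign(y_next - y_last))


def generator_loss(d_score_fake: float,
                   y_true: Sequence[float],
                   y_pred: Sequence[float],
                   lam_adv: float = 1.0,
                   lam_p: float = 1.0,
                   lam_dpl: float = 1.0,
                   p: int = 2) -> float:
    """Weighted sum of adversarial, forecast error, and direction losses.

    y_true ends with the real next price; y_pred is the same sequence with
    its last element replaced by the prediction.
    """
    adv = sce_loss(d_score_fake, 1.0)   # G wants D to label the fake as real
    err = lp_loss(y_true, y_pred, p)
    dpl = direction_loss(y_true[-2], y_true[-1], y_pred[-1])
    return lam_adv * adv + lam_p * err + lam_dpl * dpl


def discriminator_loss(d_score_fake: float, d_score_real: float) -> float:
    """D labels the predicted sequence 0 and the real sequence 1."""
    return sce_loss(d_score_fake, 0.0) + sce_loss(d_score_real, 1.0)


# Toy example: last known price 10.0, real next price 10.5, prediction 10.2.
loss_g = generator_loss(0.0, [10.0, 10.5], [10.0, 10.2])
loss_d = discriminator_loss(0.0, 0.0)
```

In the toy example the predicted direction is correct, so the direction term vanishes and the generator loss reduces to the adversarial term plus the forecast error of 0.3. An actual implementation would compute these quantities inside an automatic differentiation framework so that the SGD steps on the two networks can be taken.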

Experiments
Dataset.
Next, we evaluate the performance of the proposed method on data from the China stock market, ranging from January 1, 2016, to December 31, 2016. There are 244 trading days in total and each day contains 242 one-minute intervals, corresponding to 59048 time points. The stocks selected for the experiment conform to three criteria: first, they are constituent stocks of the CSI 300 (the CSI 300 is a capitalization-weighted stock market index designed to replicate the performance of 300 stocks traded in the Shanghai and Shenzhen stock exchanges); second, they were not suspended during the period just mentioned, in case accidental events bring about significant impact on their price and affect the forecast results; third, their closing prices at the start time, that is, January 1, 2016, are above 30, to ensure sufficient volatility for high-frequency exchange. This leaves 42 stocks in the sample, which are listed in Table 1. The number of increasing directions and decreasing directions of each stock's closing price per minute is also shown in Table 1, and the two numbers are relatively close. The historical data was obtained from the Wind Financial Terminal, produced by Wind Information Inc. (the Wind Financial Terminal can be downloaded from http://www.wind.com.cn).
Many fund managers and investors in the stock market generally accept and use certain criteria for technical indicators as signals of future market trends [12,41]. This work selects 13 technical indicators as the feature subset, based on the review of domain experts and prior research; that is, the input data $X$ at each moment (e.g., $X_t$) consists of 13 basic indicators that can be obtained directly from almost all trading software. These basic indicators are listed in Table 2, and their parameters use the default values of the Wind Financial Terminal. As mentioned above, $Y$ is defined as the closing price at each moment.
Most of the related articles use the traditional data partitioning method; that is, the entire dataset is directly split into a training set and a testing set [12,22,40,42]. However, the trading style of the stock market changes frequently; for example, investors sometimes prefer stocks with high volatility and sometimes tend to invest in technology stocks. Therefore, we should update the model parameters regularly to adapt to the change of market style. In order to make the experiments closer to real transactions, we carry out rolling segmentation on the training set and testing set of the experimental data. As Figure 2 shows, in the beginning, we select the first $M$ days as the training set, and the next $N$ days play the role of the testing set. After the first round of experiments, we roll the time window forward by $N$ days, that is, we choose the $(N + 1)$th day to the $(M + N)$th day as the training set and the $(M + N + 1)$th day to the $(M + 2N)$th day as the testing set. This is repeated until all the data has been used. In other words, $N$ can be regarded as the model update cycle, and $M$ is the size of the corresponding training data.
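The rolling segmentation can be sketched as a small Python generator, writing $M$ for the number of training days and $N$ for the number of testing days (the model update cycle). The function name `rolling_splits` is hypothetical, and the sketch assumes that a final window too short to hold $M + N$ days is dropped, which the text does not specify.

```python
from typing import Iterator, List, Tuple


def rolling_splits(n_days: int, M: int, N: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_days, test_days) as lists of 0-based day indices.

    Each round trains on M consecutive days and tests on the next N days;
    the window then rolls forward by N days.
    Assumption: an incomplete final window is skipped.
    """
    start = 0
    while n_days - start >= M + N:
        train = list(range(start, start + M))
        test = list(range(start + M, start + M + N))
        yield train, test
        start += N


# 244 trading days with (M, N) = (20, 5), one of the settings in the experiments.
splits = list(rolling_splits(n_days=244, M=20, N=5))
first_train, first_test = splits[0]
```

Under these assumptions the 244 trading days yield 44 rounds for (M, N) = (20, 5). Consecutive testing sets never overlap, so each day is predicted at most once.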

Network Architecture.
Given that the LSTM generator takes on the role of prediction and requires more accurate calculations of values than the CNN discriminator, we set the learning rate of $G$ to 0.0004 and the learning rate of $D$ to 0.02. The LSTM cell in $G$ contains 121 internal (hidden) units, and the parameters are initialized following the normal distribution $N(0, 1)$.
The architecture of the discriminative model $D$ is presented in Table 3. We train GAN-FD with $p = 2$, weighted by $\lambda_{adv} = \lambda_p = \lambda_{dpl} = 1$.

Benchmark Methods.
To evaluate the performance of our proposed method, we include three baseline methods for comparison. The first is ARIMA(1, 1, 1)-GARCH(1, 1), in which a fitted ARIMA model forecasts future values of the stock time series and the GARCH model forecasts future volatilities [20]. The second is artificial neural networks (ANN). The parameter optimization method and model architecture are set as in [21], except that the number of input layer nodes is changed to 13 and the network outputs the predicted value instead of two patterns (0 or 1). The third is support vector machines (SVM). An RBF kernel is used and the parameters are set as in [25].
We also examine our GAN-FD model from several angles. The GAN-F model uses a GAN architecture for minimizing the forecast error loss, with $\lambda_{adv} = \lambda_p = 1$ and $\lambda_{dpl} = 0$. The GAN-D model uses a GAN architecture for minimizing the direction prediction loss, with $\lambda_{adv} = \lambda_{dpl} = 1$ and $\lambda_p = 0$. The LSTM-FD model is an LSTM model that minimizes the forecast error loss and the direction prediction loss, with 121 internal units in the LSTM. The main difference between LSTM-FD and GAN-FD is therefore the presence of adversarial training.

Evaluation Metrics.
For each stock at each time $t$, a prediction is made for the next time point $t + 1$ based on a specific method. Assuming that the total number of time points being tested is $T_0$, we use the following criteria to evaluate the performance of the different models.
(1) Root Mean Squared Relative Error (RMSRE):
$$\mathrm{RMSRE} = \sqrt{\frac{1}{T_0} \sum_{t=1}^{T_0} \left( \frac{\hat{Y}_{t+1} - Y_{t+1}}{Y_{t+1}} \right)^2}.$$
RMSRE is employed as an indicator of the predictive power or prediction agreement. A low RMSRE indicates that the prediction agrees with the real data (this paper uses RMSRE instead of RMSE because RMSRE facilitates a uniform comparison of the results of the 42 stocks).
(2) Direction Prediction Accuracy (DPA):
$$\mathrm{DPA} = \frac{1}{T_0} \sum_{t=1}^{T_0} I_t, \qquad I_t = \begin{cases} 1, & (Y_{t+1} - Y_t)(\hat{Y}_{t+1} - Y_t) > 0, \\ 0, & \text{otherwise}, \end{cases}$$
where DPA measures the share of time points at which the predicted direction of the series matches the actual direction. A high DPA promises more winning trades.
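Both metrics can be sketched in plain Python. The sketch assumes that RMSRE is the root of the mean squared relative error with respect to the real price, that a direction counts as correct only when the real and predicted price changes have the same nonzero sign, and that DPA is returned as a fraction rather than a percentage. The function names are hypothetical.

```python
import math
from typing import Sequence


def rmsre(actual_next: Sequence[float], predicted_next: Sequence[float]) -> float:
    """Root mean squared relative error between real and predicted prices."""
    n = len(actual_next)
    total = sum(((p - a) / a) ** 2 for a, p in zip(actual_next, predicted_next))
    return math.sqrt(total / n)


def dpa(current: Sequence[float],
        actual_next: Sequence[float],
        predicted_next: Sequence[float]) -> float:
    """Fraction of time points whose predicted direction matches the real one.

    Assumption: an unchanged real or predicted price counts as a miss.
    """
    hits = sum(1 for c, a, p in zip(current, actual_next, predicted_next)
               if (a - c) * (p - c) > 0)
    return hits / len(current)


# Toy example with four time points.
now = [10.0, 10.0, 10.0, 10.0]
real = [11.0, 9.0, 11.0, 9.0]
pred = [10.5, 9.5, 9.5, 10.5]
toy_dpa = dpa(now, real, pred)
toy_rmsre = rmsre([100.0, 200.0], [110.0, 180.0])
```

In the toy example two of the four predicted directions are correct, and both relative errors in the RMSRE example have magnitude 0.1. Because the error is relative to each price, stocks with very different price levels can be compared on the same scale.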

Results.
In order to investigate the effect of the model update cycle on the predictive performance, let $M \in \{10, 20, 60\}$ and $N \in \{5, 10, 20\}$. In the China stock exchange market, {5, 10, 20, 60} trading days represent one week, two weeks, one month, and one quarter, respectively. Tables 4 and 5 show the average values of RMSRE and DPA for different (M, N). The numbers clearly indicate that GAN-FD and its related methods perform better than the three baseline methods in terms of RMSRE and DPA. The targeted method GAN-F brings some improvement in RMSRE, but it does not outperform the three baseline methods in DPA. In contrast to GAN-F, GAN-D achieves better results in DPA but fails in RMSRE. LSTM-FD improves the results, since it combines the forecast error loss with the direction prediction loss for training. Finally, the combination of the forecast error loss, the direction prediction loss, and adversarial training, that is, GAN-FD, achieves the best RMSRE and DPA in the majority of scenarios.
Let us take a look at the effects of different (M, N) on the experiment. GAN-FD obtains the maximum average DPA (0.6956) and the minimum average RMSRE (0.0079) when (M, N) is (20, 5). It is interesting to note that all these methods work better when $N$ is 5 than when $N$ is 10 or 20, with smaller RMSRE and higher DPA. This implies that very short-term trends are best for predicting the next minute's price. Therefore, a shorter model update cycle (e.g., $N$ = 5) is preferred. On the other hand, for the same $N$, different $M$ bring about some changes in the prediction results. From the experimental results, we suggest that $M$ should be greater than $N$. This makes intuitive sense: if the training sample is inadequate, the model cannot be trained properly, especially in volatile stock markets. We should also notice that when the training set is small while the testing set is large (i.e., (M, N) is (10, 20)), most of these methods perform the worst, and their DPA is no better than random guessing (i.e., 50%).
Table 6 shows the number of times each method achieves the minimum RMSRE over the 42 stocks. It is noticeable that the counts of the three baseline methods are all zero. GAN-FD and its related methods are clearly better than the three baseline methods in RMSRE. Meanwhile, GAN-FD obtains the minimum RMSRE 246 times, accounting for 65.08% of the 378 scenarios (42 stocks and 9 groups of (M, N)). The best performance appears when (M, N) is (20, 5), with the minimum RMSRE of 40 stocks coming from GAN-FD.
Table 7 shows the number of times each method achieves the maximum DPA over the 42 stocks. Compared with the other six methods, GAN-FD achieves the maximum DPA 269 times, accounting for 71.16% of all scenarios. When (M, N) is (10, 5), the maximum DPA of 41 of the 42 stocks comes from GAN-FD. Even when (M, N) is (20, 20), which is the worst case for GAN-FD, GAN-FD still obtains the maximum DPA for 14 stocks. From the above analyses, GAN-FD performs markedly better than the other six methods.
The results for each stock are reported in Figures 3-11. We focus on GAN-FD. As shown in Figures 3-5, the DPA of GAN-FD ranges from 64.59% to 72.24% when $N$ is 5, and it slumps to 52.01%-62.71% when $N$ is 20, as presented in Figures 9-11. When $N$ is 5, the RMSRE of GAN-FD over the 42 stocks varies between 0.48% and 1.49%, which is lower than that of the other six methods in most cases, and its volatility is smaller. However, the RMSRE of GAN-FD increases dramatically and fluctuates violently when $N$ is 20, varying between 1.21% and 4.96%. This further shows that we should reduce the model update cycle $N$ and revise the model parameters regularly to adapt to the change of market style.

Conclusion
In this paper, we propose an easy-to-use stock forecasting model called GAN-FD to assist the growing number of ordinary investors without a financial background in making decisions. GAN-FD adopts 13 simple technical indices as input data to avoid complicated input data preprocessing. Based on a deep learning network, this model achieves prediction ability superior to the other benchmark methods by means of adversarial training and by minimizing the direction prediction loss and the forecast error loss. Moreover, the effects of the model update cycle on the predictive capability are analyzed, and the experimental results show that a smaller model update cycle yields better prediction performance. In the future, we will attempt to integrate predictive models under multiscale conditions.

Figure 2: Rolling segmentation on training set and testing set. The green bar represents the entire dataset, the blue bar represents the training set for one round of experiments, and the yellow bar represents the corresponding testing set.

Table 7: The number of times each method achieves the maximum DPA.

Figure 4: DPA and RMSRE of each stock when (M, N) is (20, 5); the $x$-axis represents the stock ID.

Figure 8: DPA and RMSRE of each stock when (M, N) is (60, 10); the $x$-axis represents the stock ID.

Figure 9: DPA and RMSRE of each stock when (M, N) is (10, 20); the $x$-axis represents the stock ID.

Figure 10: DPA and RMSRE of each stock when (M, N) is (20, 20); the $x$-axis represents the stock ID.

Figure 11: DPA and RMSRE of each stock when (M, N) is (60, 20); the $x$-axis represents the stock ID.
Figure 1: GAN-FD architecture. The generator ($G$) is founded on LSTM and is used to predict $\hat{Y}_{T+1}$. The discriminator ($D$) is based on CNN and estimates the probability that a sequence is real ($Y$) rather than predicted ($\hat{Y}$). Conv. means convolutional layer, and FC is an abbreviation for fully connected layer. The structure of $G$ and $D$ can be adjusted according to the specific application.

Table 1: The sample stocks and their number of increasing directions and decreasing directions.
the trading style of the stock market changes frequently; for example, investors sometimes prefer stocks with high volatility and sometimes tend to invest in technology stocks.Therefore, we should update the model parameters regularly to adapt to the change of market style.In order to make experiments closer to real transactions, we carry out rolling

Table 2: Basic indicators for prediction.

Table 3: Network architecture of the discriminative model $D$.

Table 4: Summary of RMSRE with different (M, N). These figures are the average values over the 42 stocks.

Table 5: Summary of DPA with different (M, N). These figures are the average values over the 42 stocks.

Table 6: The number of times each method achieves the minimum RMSRE.

Figure 5: DPA and RMSRE of each stock when (M, N) is (60, 5); the $x$-axis represents the stock ID.

Figure 6: DPA and RMSRE of each stock when (M, N) is (10, 10); the $x$-axis represents the stock ID.

Figure 7: DPA and RMSRE of each stock when (M, N) is (20, 10); the $x$-axis represents the stock ID.