Enhancing Stock Price Trend Prediction via a Time-Sensitive Data Augmentation Method



Introduction
In financial markets, the stock price trend is an important type of time series that is closely related to investment profits. Owing to the short-term microstructure of financial markets, stock price data are highly volatile and uncertain. Although such data provide investors with decision signals for maximizing investment profit, forecasting future stock prices has remained a tough task for decades.
Early methods use conventional statistical techniques to predict stock price trends. Among them, the autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) [1] models are the most popular, and many variants have been explored [2][3][4][5]. For instance, Babu and Reddy [3] proposed a linear hybrid model consisting of ARIMA and GARCH models. Li and Chiang [5] proposed a forecasting model that integrates a neurofuzzy system with ARIMA models. Such statistical methods may be too limited to handle a dynamic and complex stock market, because they fail to capture the nonlinear relationships between stock prices at different time points.
With the boom of deep learning, deep stock price prediction methods have proliferated. Benefiting from powerful layer-wise representations, deep models have come to dominate the stock market prediction field [6,7]. Nelson et al. [8] were the first to apply the vanilla LSTM [9] to stock price prediction and demonstrated its effectiveness, owing to its ability to capture long-term dependencies in input sequences. Several other LSTM-based frameworks [10][11][12][13][14][15][16] have also been investigated to improve price prediction accuracy. In [10], to discover stock price patterns, the K-means algorithm is first used to cluster stock price subsequences, and a multibranch LSTM model then makes the final prediction based on the k learned clusters. In [13], both the wavelet transform and an attention mechanism are integrated into LSTM for price prediction. In addition, Zhang et al. [12] exploited different underlying frequency patterns by combining LSTM with the discrete Fourier transform (DFT) for stock price prediction. Specifically, DFT decomposes the hidden states of the memory cells into several frequency components, and an inverse Fourier transform (IFT) then combines these components to reconstruct the hidden states.
Such deep models rely heavily on large-scale datasets to predict stock prices effectively. In practice, collecting around 2,520 daily samples takes about ten years, which falls far short of what is required to tune the large number of parameters in deep models. As a result, these models risk overfitting, which limits their performance on unseen data [17,18]. A simple and effective remedy is data augmentation, which synthesizes new data that follow a distribution similar to that of the original data. Bengio et al. [19] found that out-of-distribution examples are more beneficial to a deep learner than to a traditional shallow one. However, it is nontrivial to transfer most existing data augmentation techniques from the image processing domain to stock price data. This is because each stock price sequence fed to the prediction models is extremely short, so even a small improper operation could destroy the underlying patterns of the original data.
In this study, we focus on this issue and propose a simple yet effective data augmentation method for stock price trend prediction. Unlike conventional augmentation schemes, which directly apply transformations such as adding random noise to the original time series, our method applies the transformations to the less important patterns of the original data while preserving the underlying patterns within the dataset.
This increases data diversity. The insight behind the proposed method is that leaving the low-frequency patterns free of noise does not damage the true patterns of the original time-series data. As shown in Figure 1, low-frequency patterns are more closely related to the patterns of the original data, since they can be viewed as a substitute for the original data, whereas high-frequency patterns are more irrelevant and random. Based on this observation, a large number of new time-series samples are generated whose distribution resembles that of the original time series. Specifically, we first decompose the input time series into components of different frequencies and then apply transformations to some of these components. In this work, the discrete wavelet transform (DWT) [20] is used, which provides detailed frequency and location information about the original data. Moreover, given the time-sensitive nature of time series, we generate new data by reweighting stock price patterns at different time points in the series. This could reduce the influence of outdated historical data on the generated time series. Ablation studies and extensive experiments are carried out on a real stock price dataset of 50 corporate stocks to verify the efficacy of the proposed data augmentation method.
The main contributions of this work are twofold:
(1) An effective data augmentation method tailored for stock price data, which generates a large number of new time series by changing the high-frequency components of the original data while preserving the low-frequency components.
(2) Building on the proposed data augmentation method, a decay factor is introduced to control the scale of the noise added over the time series, which distinguishes the importance of patterns at different time points. This may eliminate the interference of outdated historical data with the generated time series.
The remainder of this paper is organized as follows. Section 2 reviews the related work. Section 3 describes the proposed data augmentation method and the transformation techniques used in our work. In Section 4, a series of comparison experiments is conducted to evaluate the effectiveness of the proposed method. Finally, Section 5 concludes the paper.

Related Work
Although numerous deep learning models have been widely used and have shown superiority over shallow learning methods in many areas, overfitting often emerges because of insufficient data in many applications, including computer vision, natural language processing, and data mining. Data augmentation is one of the most practical ways to alleviate this problem, and numerous augmentation strategies have been applied in computer vision, such as translation, rotation, scaling, flipping, and shearing. Many convolutional neural networks work well in combination with such augmentation techniques [21]. However, most of these data augmentation methods cannot be naively applied to other areas. Schlüter and Grill applied a series of data augmentation methods to singing voice detection, and the results showed that very few of them yield performance gains in this task [18]. DeVries and Taylor showed that performing transformations in the input space has limited effectiveness, whereas operating in the feature space achieves better results in many tasks [22].
When operations such as translation, flipping, and scaling are applied, the transformed image shares the same information as the original image. However, these operations do not preserve this property for time series, as it is not obvious that a similar operation retains the discriminative information [23]. Thus, how to augment time-series datasets for stock price prediction remains an open problem.
Several studies have tried to address this problem. Le Guennec et al. [24] used window slicing, window warping, and dataset mixing to improve deep CNN models for time-series classification. Fawaz et al. [25] proposed a data augmentation method based on the dynamic time warping distance to boost time-series classification. Such augmentation schemes are designed specifically for classification tasks and might be far from optimal for regression tasks such as stock price prediction. To augment datasets for stock price prediction, most efforts focus on using stocks with similar price tendencies to expand the dataset. Zhang et al. [26] proposed a two-stage solution: similar stocks were first collected according to their retracement probability density function (PDF), and the stocks in the same cluster then served as the enlarged dataset for training the model. Besides, Yujin and Young proposed ModAugNet, a model comprising an overfitting prevention LSTM module and a prediction LSTM module [27], to alleviate overfitting. During training, ten companies' stocks highly correlated with the stock market index were collected, and five of them were randomly combined and fed to the prevention LSTM each time. Finally, the prediction is made together with the features extracted from the target stock index.
Unlike these studies, which augment the dataset by collecting similar stocks [26,27], we propose a more general data augmentation method for stock prediction analysis based on the discrete wavelet transform (DWT), which requires no specific knowledge from external sources. Thus, our method is not tied to specific situations and can be combined with the augmentation methods above.

Method
This section first gives an overview of the proposed data augmentation method and then details each of its procedures.
3.1. The Overall Pipeline. As discussed above, the proposed data augmentation method augments the dataset by transforming the high-frequency components while keeping the low-frequency ones unchanged. The idea behind our method is that the low-frequency components are close to the original data; hence, leaving the low-frequency patterns uncorrupted by noise does not damage the true patterns of the time series.
Thus, the generated data are more likely to follow the same distribution as the original data. The overall process of the proposed method is shown in Figure 2.
To decompose the original data in the frequency domain, several techniques can be applied, such as the Fourier transform and the discrete wavelet transform (DWT) [20]. In this paper, DWT is used because it provides detailed frequency and location information about the original data. By changing the high-frequency components, a large number of time series can be generated. Common operations for this purpose include corruption with random noise and interpolation [28]. Such methods, however, are not optimal for time series. To this end, we design a novel transform operation for time series by introducing a decay factor that controls the scale of the noise added to the original data over different time periods. Time series are time sensitive: if all time points are treated equally and subjected to the same operations, the underlying patterns of the original data may be damaged. This can easily make the ground truth uncertain; that is, the generated time series could be nothing but noise. The proposed transform operation introduces a decay factor to keep the important underlying information.
In this way, the resultant synthetic time series are generated by combining the new transformed high-frequency components with original low-frequency ones.
To summarize, the proposed augmentation method consists of three stages. First, a time series is decomposed into its high-frequency and low-frequency components. Then, the proposed transform operation is applied to the high-frequency components, while the low-frequency components are preserved. Finally, the transformed high-frequency components and the low-frequency ones are recombined into a brand-new time series.
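The three stages can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the Haar wavelet (the document does not say which wavelet was used), assumes the series length is divisible by 2**levels, and uses plain Gaussian noise with standard deviation k·σ0 as the transform stage. The names `haar_dwt`, `haar_idwt`, and `augment` are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_dwt(x, levels):
    """Multilevel Haar DWT; returns [x_h(1), ..., x_h(L), x_l(L)], from high to low frequency."""
    approx = np.asarray(x, dtype=float)
    coeffs = []
    for _ in range(levels):
        if approx.size % 2:
            raise ValueError("series length must be divisible by 2**levels")
        even, odd = approx[0::2], approx[1::2]
        coeffs.append((even - odd) / SQRT2)  # high-frequency (detail) subseries
        approx = (even + odd) / SQRT2        # low-frequency (approximation) subseries
    coeffs.append(approx)
    return coeffs


def haar_idwt(coeffs):
    """Inverse of haar_dwt: rebuilds the series from the coarsest level upward."""
    approx = coeffs[-1]
    for detail in reversed(coeffs[:-1]):
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + detail) / SQRT2
        out[1::2] = (approx - detail) / SQRT2
        approx = out
    return approx


def augment(x, levels=2, k=0.1, rng=None):
    """Stage 1: decompose; stage 2: perturb high-frequency parts; stage 3: recombine."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    coeffs = haar_dwt(x, levels)
    sigma0 = x.std()  # standard deviation across the whole series
    noisy_highs = [d + rng.normal(0.0, k * sigma0, size=d.shape) for d in coeffs[:-1]]
    return haar_idwt(noisy_highs + [coeffs[-1]])  # low-frequency part left untouched


# Example: one augmented copy of a 24-step price window
rng = np.random.default_rng(0)
prices = 100.0 + np.cumsum(rng.normal(size=24))
new_sample = augment(prices, levels=2, k=0.1, rng=rng)
```

Because the Haar transform is orthogonal, decomposing `new_sample` again returns exactly the original low-frequency subseries; only the high-frequency parts differ.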

Data Decomposition in Frequency Domain.
In the proposed data augmentation method, the original time series must be mapped into the frequency domain. Many candidates can be used for this purpose. Among them, the discrete wavelet transform (DWT) is a classical signal decomposition method. It decomposes a time series into a set of subseries at different frequencies by applying a series of high-pass and low-pass filters level by level.
This meets the requirements of the proposed data augmentation method.
For clarity, we briefly review DWT for the subsequent sections. Given a time series $x = \{x_1, x_2, \ldots, x_T\}$, the low- and high-frequency subseries generated at the $i$th level are denoted as $x^l(i)$ and $x^h(i)$. Then, the corresponding low- and high-frequency subseries can be obtained using a low-pass filter $l = \{l_1, l_2, \ldots, l_K\}$ and a high-pass filter $h = \{h_1, h_2, \ldots, h_K\}$. The concrete functions are as follows:

$$a^l_n(i+1) = \sum_{k=1}^{K} x^l_{n+k-1}(i)\, l_k, \qquad a^h_n(i+1) = \sum_{k=1}^{K} x^l_{n+k-1}(i)\, h_k,$$

where $x^l_n(i)$ is the $n$th element of the low-frequency subseries at the $i$th level. With $x^l(0)$ set to the input time series, the low- and high-frequency subseries at the $(i+1)$th level, $x^l(i+1)$ and $x^h(i+1)$, are generated by 1/2 down-sampling of the intermediates $a^l(i+1)$ and $a^h(i+1)$, respectively. With the above transform, a set of subseries at different frequencies can be obtained from the original time series as $\mathcal{X}(L) = \{x^h(1), x^h(2), \ldots, x^h(L), x^l(L)\}$, where $L$ is the maximum level, and the frequency decreases from $x^h(1)$ to $x^l(L)$.
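The level-by-level filtering and 1/2 down-sampling described above can be written directly in NumPy. This sketch follows the summation form given for $a^l_n(i+1)$ and $a^h_n(i+1)$; it assumes "valid" filter positions with no boundary padding and keeps the even-indexed outputs when down-sampling, and the Haar filters at the bottom are only an example choice. `dwt_level` and `decompose` are hypothetical names.

```python
import numpy as np


def dwt_level(x_low, l, h):
    """One DWT level: a_n = sum_k x_{n+k-1} * filter_k, then 1/2 down-sampling."""
    x_low = np.asarray(x_low, dtype=float)
    K = len(l)
    # 'valid' positions only: no boundary padding (an assumption of this sketch)
    windows = np.array([x_low[n:n + K] for n in range(len(x_low) - K + 1)])
    a_l = windows @ np.asarray(l, dtype=float)
    a_h = windows @ np.asarray(h, dtype=float)
    return a_l[::2], a_h[::2]  # keep every second sample


def decompose(x, l, h, levels):
    """Return X(L) = [x_h(1), ..., x_h(L), x_l(L)], from high to low frequency."""
    x_low = np.asarray(x, dtype=float)  # x_l(0) is the input series
    highs = []
    for _ in range(levels):
        x_low, x_high = dwt_level(x_low, l, h)
        highs.append(x_high)
    return highs + [x_low]


# Haar filters as an example choice (the document does not name the wavelet)
r = 1.0 / np.sqrt(2.0)
HAAR_LOW, HAAR_HIGH = [r, r], [r, -r]
```

For example, `decompose([4, 2, 6, 8], HAAR_LOW, HAAR_HIGH, levels=1)` yields a high-frequency subseries proportional to the pairwise differences and a low-frequency subseries proportional to the pairwise sums. With longer filters the "valid" convention shortens the output; a library such as PyWavelets (`pywt.wavedec`) handles boundary extension for real use.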

Transform over High-Frequency Components.
To augment the dataset, each sampled time series is decomposed by DWT into subseries at different frequencies. Then, a series of transformations can be applied to the high-frequency components to generate new series. Among them, adding Gaussian noise to the high-frequency components is the most commonly used approach. The operation can be formulated as follows:

$$\hat{s} = s + X, \qquad X \sim \mathcal{N}\big(0, (k\sigma_0)^2\big),$$

where $s$ is the original series, here the high-frequency subseries, and $\hat{s}$ is the transformed one. $k$ is a constant parameter that controls the scale of the noise. $X$ is the noise matrix, whose distribution has zero mean and standard deviation $k\sigma_0$, where $\sigma_0$ is the standard deviation across the whole time series.

Unlike image or natural language data, time series such as stock prices are time sensitive. That is, the stock price at the current time point is more closely related to prices at recent time points than to those at outdated time points. To account for this, a decay factor $\lambda \in (0, 1)$ is introduced to control the scale of the noise added to each entry $s_i$ of the high-frequency subseries at different time points, where $i$ is the index of the entry and $L_0$ and $L_{sub}$ are the lengths of the original time series and the subseries, respectively. In this way, the generated data are more likely to follow the same distribution as the original data while the true underlying patterns of the original series near the ground truth are preserved.

Besides adding noise, interpolation [28], a data transformation commonly used in image processing, can also be applied here. For each sample in the dataset, we find nearby neighbors and generate new data by interpolation:

$$\hat{s} = s + \beta(\tilde{s} - s),$$

where $s$ is the high-frequency subseries of the input series and $\tilde{s}$ is its counterpart for the neighbor sequence. $\beta \in [0, 1]$ is a coefficient that controls the degree of freedom of the interpolation. For example, when $\beta = 0.5$, the original time series and the neighbor are weighted equally.
In our work, because the nearest neighbor is too similar to the original time series, we instead choose a neighbor that lies several time steps away from the target series to perform the interpolation.
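The three transformations applied to a high-frequency subseries can be sketched as follows. This is an illustrative sketch, not the authors' code. The exact decay formula, which involves $L_0$ and $L_{sub}$, is not reproduced in this text, so `decay_weights` uses a hypothetical geometric schedule: weight 1 for the oldest entry, falling to λ for the most recent entry (assumed to be the last one, closest to the prediction target). The neighbor `offset` of 3 steps is likewise a hypothetical default. All function names are illustrative.

```python
import numpy as np


def gaussian_noise(s, sigma0, k, rng):
    """s_hat = s + X, X ~ N(0, (k * sigma0)^2): the same noise scale at every time point."""
    s = np.asarray(s, dtype=float)
    return s + rng.normal(0.0, k * sigma0, size=s.shape)


def decay_weights(n, lam):
    """HYPOTHETICAL decay schedule: 1 for the oldest entry, falling geometrically to lam for the newest."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie in (0, 1)")
    return lam ** np.linspace(0.0, 1.0, n)


def decay_noise(s, sigma0, k, lam, rng):
    """Decay-scaled noise: entries near the prediction target receive the least noise."""
    s = np.asarray(s, dtype=float)
    noise = rng.normal(0.0, k * sigma0, size=s.shape)
    return s + decay_weights(s.size, lam) * noise


def interpolate(s, s_tilde, beta):
    """s_hat = s + beta * (s_tilde - s); beta = 0 returns s, beta = 0.5 weighs both equally."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    s = np.asarray(s, dtype=float)
    s_tilde = np.asarray(s_tilde, dtype=float)
    return s + beta * (s_tilde - s)


def neighbor_window(series, start, length, offset=3):
    """Window of `length` steps starting `offset` steps after `start` (offset is hypothetical)."""
    end = start + offset + length
    if end > len(series):
        raise IndexError("neighbor window runs past the end of the series")
    return np.asarray(series[start + offset:end], dtype=float)
```

In the method as described, `s` and `s_tilde` passed to `interpolate` would be the high-frequency subseries of the target window and of the neighbor window returned by `neighbor_window`, each obtained by the DWT, rather than the raw price windows.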
To give an intuitive understanding of the above transformations, Figure 3 illustrates the results of the three transformations applied to the high-frequency components of two synthetic time series. Figure 4 details how the augmented data are used.

Stock Price Prediction.
Given a time series of stock prices $\{p_t \mid t = 1, 2, \ldots, T\}$, where $T$ is the length of the sequences fed to the deep model, the model aims to predict the next price $p_{T+1}$. Numerous models have been applied to improve the accuracy of stock price prediction, with some progress. Among them, LSTM is one of the most effective, as it captures both long- and short-term dependencies in the input sequences. In this work, we choose LSTM as the base model for stock price trend prediction. The structure of LSTM can be formulated as follows:

$$
\begin{aligned}
i_t &= \mathrm{sigmoid}(W_i [h_{t-1}, x_t] + b_i),\\
f_t &= \mathrm{sigmoid}(W_f [h_{t-1}, x_t] + b_f),\\
o_t &= \mathrm{sigmoid}(W_o [h_{t-1}, x_t] + b_o),\\
\tilde{c}_t &= \tanh(W_c [h_{t-1}, x_t] + b_c),\\
c_t &= f_t \circ c_{t-1} + i_t \circ \tilde{c}_t,\\
h_t &= o_t \circ \tanh(c_t),
\end{aligned}
$$

where $x_t$ is the input value at time $t$, $h_t$ and $h_{t-1}$ are hidden states of the LSTM, $c_t$ is the memory state (with $c_{t-1}$ its previous value and $\tilde{c}_t$ the candidate memory), and $[\cdot,\cdot]$ denotes concatenation. sigmoid(·) and tanh(·) are the activation functions, and the three gating units are the input gate $i_t$, forget gate $f_t$, and output gate $o_t$. $W_*$ and $b_*$ denote weight matrices and bias vectors, respectively, and "∘" denotes point-wise multiplication. The model parameters can be learned by standard backpropagation with the mean squared error (MSE) as the objective function:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N} (\hat{y}_i - y_i)^2,$$

where $N$ is the number of training samples, and $\hat{y}_i$ and $y_i$ are the predicted value and the ground truth of the $i$th sample in the training set, respectively. On the basis of this base model, we can evaluate the efficacy of the proposed data augmentation method.
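To make the gate equations and the objective concrete, the sketch below implements one LSTM step and the MSE in NumPy, running a randomly initialised model over a 24-step price window. It is a forward pass only. The linear output head mapping $h_T$ to $p_{T+1}$ and the initialisation are assumptions, since the text does not specify them. Training would in practice use a deep learning framework with automatic differentiation, for example PyTorch's `torch.nn.LSTM` with `torch.optim.RMSprop`.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step; each W[g] has shape (H, H + D) and acts on the concatenation [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])
    i_t = sigmoid(W["i"] @ z + b["i"])      # input gate
    f_t = sigmoid(W["f"] @ z + b["f"])      # forget gate
    o_t = sigmoid(W["o"] @ z + b["o"])      # output gate
    c_hat = np.tanh(W["c"] @ z + b["c"])    # candidate memory
    c_t = f_t * c_prev + i_t * c_hat        # '*' is the point-wise product written as '∘'
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t


def predict_next(prices, W, b, w_out, b_out):
    """Run the LSTM over p_1..p_T and map h_T to p_{T+1} with a linear head (an assumption)."""
    H = b["i"].shape[0]
    h, c = np.zeros(H), np.zeros(H)
    for p in prices:
        h, c = lstm_step(np.array([p], dtype=float), h, c, W, b)
    return float(w_out @ h + b_out)


def mse(y_pred, y_true):
    """MSE = (1/N) * sum_i (y_hat_i - y_i)^2."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean((y_pred - y_true) ** 2))


# Random initialisation with hidden size 50 and scalar inputs (D = 1), as a smoke test
rng = np.random.default_rng(0)
H, D = 50, 1
W = {g: rng.normal(0.0, 0.1, size=(H, H + D)) for g in "ifoc"}
b = {g: np.zeros(H) for g in "ifoc"}
w_out, b_out = rng.normal(0.0, 0.1, size=H), 0.0
```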

Experiments
In this section, we evaluate the effectiveness of our method on a real-world dataset.

Dataset.
The dataset used is a real-life stock price dataset. It includes the daily open prices of 50 stocks from 10 sectors from 2007 to 2016. The list of stock symbols is given in Table 1. The data from 2007 to 2014 are used as the training set, while the stock prices in 2015 and 2016 serve as the validation set and the test set, respectively. The LSTM model is trained on the training set of these 50 stocks, and the average accuracy is then evaluated on the test set to validate the performance of the trained model. To augment the dataset and enhance model performance, the proposed data augmentation methods are also applied, as shown in Figure 4. For each instance in the training set, one new training sample is generated to augment the dataset.
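The data preparation described above can be sketched as a chronological split by year, followed by sliding windows of 24 prices with the next price as the target and one augmented copy per training window. This is a sketch under assumptions: each stock is taken to be a price array with a year label per trading day, windows are built within each split, augmented samples keep the original target, and `augment_fn` is a hypothetical placeholder for any of the transformations in Section 3.

```python
import numpy as np


def split_by_year(years, prices):
    """Chronological split: 2007-2014 train, 2015 validation, 2016 test."""
    years = np.asarray(years)
    prices = np.asarray(prices, dtype=float)
    train = prices[(years >= 2007) & (years <= 2014)]
    val = prices[years == 2015]
    test = prices[years == 2016]
    return train, val, test


def make_windows(prices, length=24):
    """Inputs p_1..p_T (T = length) paired with the next price p_{T+1} as target."""
    prices = np.asarray(prices, dtype=float)
    n = len(prices) - length
    if n <= 0:
        raise ValueError("series is too short for the requested window length")
    X = np.stack([prices[i:i + length] for i in range(n)])
    y = prices[length:]
    return X, y


def add_augmented(X, y, augment_fn):
    """One new sample per training window; targets are kept unchanged (an assumption)."""
    X_new = np.stack([augment_fn(w) for w in X])
    return np.concatenate([X, X_new]), np.concatenate([y, y])
```

Splitting by calendar year rather than at random keeps future prices out of the training set, which matters for a forecasting task.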
In all experiments, the hidden state dimension of the LSTM is set to 50, the length of the time series fed to the model is L = 24, and the batch size is 50. All parameters are optimized over 2,000 epochs using the RMSProp optimizer with the standard mean squared error (MSE) loss. During data preprocessing, soft thresholding is used to denoise the high-frequency components of the training samples obtained by the wavelet transform [21]. Then, a set of transformation techniques is adopted, including random noise corruption, decay-scaled noise corruption, and interpolation. For comparison, the same operations are also applied directly to the original time series. To show the effectiveness of our method, two LSTM models, trained on the augmented dataset and on the original dataset respectively, are tested on several individual stocks, and the resulting squared error curves are shown in Figure 5.

Tables 2-4 show the results of the comparison experiments. In these tables, the left columns report results obtained by applying the transformation techniques to the high-frequency patterns, while the right columns report results of the identical transformations applied to the original time series. From Table 2, the scale of the random noise has a direct influence on the performance of LSTM. When the noise scale is very low (set to 0.05), the LSTM models trained on both the augmented dataset and the original time series achieve relatively sound results. As the noise scale increases, the LSTM on the right performs worse. The reason could be that the pattern of the input sequence is damaged once the scale of the random noise exceeds a certain threshold.
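Soft thresholding, used above to denoise the high-frequency components, shrinks every wavelet coefficient toward zero by a threshold t and zeroes those smaller than t in magnitude. The text does not state how t was chosen, so the sketch below adds the Donoho-Johnstone universal threshold with a median-based noise estimate purely as one common, assumed choice.

```python
import numpy as np


def soft_threshold(coeffs, t):
    """eta_t(x) = sign(x) * max(|x| - t, 0): shrinks every coefficient toward zero by t."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)


def universal_threshold(detail):
    """Donoho-Johnstone threshold sigma * sqrt(2 ln n), with sigma estimated as median(|d|) / 0.6745."""
    detail = np.asarray(detail, dtype=float)
    sigma = np.median(np.abs(detail)) / 0.6745
    return sigma * np.sqrt(2.0 * np.log(detail.size))
```

Each high-frequency subseries would be passed through `soft_threshold(d, universal_threshold(d))` before the transformations are applied; PyWavelets offers the same operation as `pywt.threshold(data, value, mode="soft")`.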
As the scale coefficient increases, the LSTM trained on the dataset augmented from the high-frequency patterns performs better than the one trained on the counterpart augmented from the original time series. This could be attributed to the fact that adding random noise to the high-frequency patterns while preserving the low-frequency counterparts retains the primary patterns of the original time series, which supports our earlier claims.
Another observation is that, although different noise scales have been applied to the high-frequency patterns and to the original time series, the LSTM models achieve only limited performance gains. The reason is probably that data from different periods are not equally important. Treating them equally might damage the underlying patterns of the original time series and make the ground truth confusing. To verify this viewpoint, a decay factor λ is introduced to control the scale of the noise, and the results are shown in Table 3. In the experiments, λ is set to 0.1, since the trained LSTM models work well with this value. It can be observed that with λ, the LSTM models achieve performance gains, since the decay factor λ preserves, to some degree, the main patterns near the ground truth. When adding the decay-scale noise to

From Table 4, the effectiveness of the interpolation transformation can be observed. When the parameter λ stays at a low level, the LSTM trained on the dataset augmented from the high-frequency patterns performs better than the one trained on the dataset augmented from the original time series. As the noise scale increases, the former still achieves relatively sound results, while the latter does not work well. The reason could be that interpolation over the high-frequency patterns still preserves the underlying patterns of the original time series, which indicates the efficacy of our method.

Conclusion
In this paper, we propose a general data augmentation method that can be applied to time series without any domain-specific knowledge. It aims to preserve the main patterns of the original time series by operating only on the high-frequency components. To keep most of the information near the real label, a decay factor is introduced to control the scale of the noise added to the time series, which makes the generated data time sensitive. To evaluate the efficacy of the proposed data augmentation method, we conduct experiments on a real stock price dataset using a basic LSTM model. Experimental results show that the proposed data augmentation method can improve the stock price prediction performance of the basic LSTM model.

Data Availability
The dataset used in our manuscript has been widely used in many articles and can be downloaded from the Internet.

Conflicts of Interest
The authors declare that they have no conflicts of interest.