Model-based stock price prediction can give investors valuable guidance for maximizing profit. However, because financial data are highly noisy, deep neural networks trained on the raw data often fail to predict stock prices accurately. To address this problem, the wavelet threshold-denoising method, which is widely applied in signal denoising, is adopted to preprocess the training data. Preprocessing with the soft or hard threshold method noticeably suppresses noise. Building on this, a new multioptimal combination wavelet transform (MOCWT) method is proposed, in which a novel threshold-denoising function is presented to reduce the degree of distortion in signal reconstruction. The experimental results show that the proposed MOCWT outperforms the traditional methods in terms of prediction accuracy.
Stock price prediction is a typical time series forecasting problem, and new stock forecasting methods continue to appear. Its goal is to predict the stock price after a certain time so as to help investors maximize their returns. The methods proposed in the literature can be roughly divided into two categories: traditional mathematical methods and economic methods. Many previous works are based on traditional statistical methods [
Deep learning methods generally perform better than traditional statistical methods. One of the main reasons is that deep learning can map the original data directly to a nonlinear model, giving a better fit through a multilayer neural network. In addition, deep learning has the advantage of automatic feature selection in financial applications. Most financial data is highly noisy and unstable. With deep learning, events that have a significant impact on finance can be expressed by knowledge maps, and features are then selected automatically by deep networks that adjust their parameters and weights. The results obtained in this way may be more accurate and objective. Recurrent neural networks [
In this paper, the wavelet denoising method is introduced into data preprocessing, and the wavelet-preprocessed data are used as the training data. Our main contributions are as follows. First, we improve the traditional wavelet denoising method, and the improved method performs better than the traditional one. Second, we propose a new multioptimal combination wavelet transform (MOCWT) method. Compared with both the traditional and the improved wavelet methods, MOCWT achieves the best performance.
The remainder of this paper is organized as follows. Some related works are reviewed in Section
Stock forecasting has received wide attention from researchers, and many forecasting methods have been proposed. For example, some researchers use a news-oriented approach to predict stock trends.
Ziniu Hu et al. [
Currently, deep learning has become the most widely used approach to stock price prediction. Yue-Gang Song et al. [
Compared with other deep learning methods, convolutional neural networks (CNNs) have been used by relatively few researchers for stock forecasting. CNNs were originally developed for image processing, where they perform very well. Ehsan Hoseinzade et al. [
Stock price data is typical time series data. In this section, the LSTM model is used for the stock price forecasting task. First, LSTM networks with different structures are considered. Then, a new method named multioptimal combination wavelet transform (MOCWT) is proposed for data denoising.
LSTM network is a special RNN. Due to its unique structure, LSTM is suitable for handling and predicting problems with long intervals and delays in time series. LSTM is commonly used in autonomous speech recognition [
LSTM introduces the concept of gates. Through these gates, the transmission of information can be controlled, which allows long-term information to be retained and activated. The simplest way of controlling the transmission of information is to multiply two matrices of exactly the same size element by element. All the elements of the gate matrix are in the range of
Figure
LSTM Neural Unit.
Cell information transmission is controlled by the forget gate. The information that is passed on is determined by both the output of the hidden layer
The function of the input gate is to control which information among input information (
The output gate determines which information in the cell can be output. As with the forget gate, an activation function is first applied to the input information to compute the gate. The cell state is then passed through the tanh activation function. Finally, element-wise multiplication of the two determines which information is output. The formula is expressed as
In a standard RNN, the hidden layer unit usually includes only a single activation function. During training, the network is usually optimized by the backpropagation algorithm. Because gradients are multiplied repeatedly during backward parameter optimization, the vanishing gradient or exploding gradient problem easily arises when the network has many layers. The structure of LSTM largely mitigates these problems. In the LSTM model, the hidden layer unit is more complicated and usually includes multiple activation layers, addition operations, and multiplication operations.
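The gate mechanism described above can be illustrated with a single LSTM time step. The following is a minimal NumPy sketch that assumes the standard LSTM formulation; the function name `lstm_step`, the stacked weight matrix `W`, the bias `b`, and the random initial values are illustrative assumptions and are not taken from the experiments in this paper.

```python
import numpy as np

def sigmoid(x):
    """Logistic function; squashes values into (0, 1) so they can act as gates."""
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step under the standard formulation (an assumption here).

    W maps the concatenation [h_prev, x_t] to four stacked pre-activations:
    forget gate, input gate, candidate cell state, output gate.
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0:hidden])                 # forget gate: what to keep from c_prev
    i = sigmoid(z[hidden:2 * hidden])        # input gate: what new information to admit
    g = np.tanh(z[2 * hidden:3 * hidden])    # candidate cell state
    o = sigmoid(z[3 * hidden:4 * hidden])    # output gate: what to expose as h_t
    c_t = f * c_prev + i * g                 # element-wise gating of old and new information
    h_t = o * np.tanh(c_t)                   # gated, tanh-activated cell state
    return h_t, c_t

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
hidden, n_in = 4, 1
W = rng.normal(scale=0.1, size=(4 * hidden, hidden + n_in))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for price in [0.1, 0.2, 0.15]:
    h, c = lstm_step(np.array([price]), h, c, W, b)
```

Because the cell state is updated by addition and element-wise multiplication rather than by repeated matrix multiplication alone, gradients can flow through `c_t` over many time steps with less attenuation.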
In the experiments of this paper, LSTM networks with different numbers of layers were used to find the best training model. During training, the data truncation length is set to 30, and the data distribution for training is shown in Figure
Model loading training data description.
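One plausible way to build training samples with a truncation length of 30 is a sliding window over the price series. The sketch below assumes that each window of 30 consecutive prices is paired with the next price as its target; this pairing and the helper name `make_windows` are assumptions made for illustration, not details confirmed by the paper.

```python
import numpy as np

def make_windows(series, window=30):
    """Split a 1-D series into overlapping input windows and next-step targets."""
    series = np.asarray(series, dtype=float)
    n = len(series) - window
    if n <= 0:
        raise ValueError("series must be longer than the window")
    # Row k holds series[k : k + window]; its target is series[k + window].
    X = np.stack([series[k:k + window] for k in range(n)])
    y = series[window:]
    return X, y

# Synthetic stand-in for a price series.
prices = np.arange(100, dtype=float)
X, y = make_windows(prices, window=30)
```

With 100 synthetic prices, this yields 70 samples of length 30, each followed by the value the model would be asked to predict.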
The wavelet transform is a localized time-frequency analysis. Multiscale refinement is carried out by stretching (dilating) and translating the wavelet. At high frequencies, time is subdivided more finely; at low frequencies, frequency is subdivided more finely. In this way, the details of the signal can be analyzed at each scale.
Let the signal length be
The wavelet transform of a signal is a joint time domain and frequency domain transform. With the basic wavelet serving as the basic unit, useful information is extracted from the signal or noise is removed from it. To better preserve the original information, the signal can be decomposed into a set of high-frequency and low-frequency components by the wavelet transform. A traditional high-pass or low-pass filter processes the original signal directly without decomposition, which may lose some useful signal information. The wavelet transform decomposes the original signal by stretching and translating the basic wavelet, which yields a series of wavelet coefficients. Finally, through the low-pass or high-pass operation, the low-frequency information CA or the high-frequency information CD of the signal is obtained, respectively. Figure
Multilayer wavelet decomposition.
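Multilayer decomposition into CA and CD can be sketched with the Haar wavelet in plain NumPy. The code below assumes the orthonormal Haar filters and a signal whose length is divisible by 2 at every level; the function names `haar_dwt` and `haar_wavedec` are hypothetical and chosen for this illustration.

```python
import numpy as np

def haar_dwt(signal):
    """One level of the orthonormal Haar transform; returns (CA, CD)."""
    x = np.asarray(signal, dtype=float)
    if len(x) % 2:
        raise ValueError("signal length must be even")
    even, odd = x[0::2], x[1::2]
    ca = (even + odd) / np.sqrt(2.0)  # low-frequency (approximation) coefficients
    cd = (even - odd) / np.sqrt(2.0)  # high-frequency (detail) coefficients
    return ca, cd

def haar_wavedec(signal, levels):
    """Repeatedly decompose the approximation; returns (CA_n, [CD_1, ..., CD_n])."""
    details = []
    ca = np.asarray(signal, dtype=float)
    for _ in range(levels):
        ca, cd = haar_dwt(ca)
        details.append(cd)
    return ca, details

signal = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
ca3, details = haar_wavedec(signal, levels=3)
```

Each level halves the length of the approximation, so an 8-point signal yields detail vectors of lengths 4, 2, and 1 and a single final approximation coefficient. Because the transform is orthonormal, the total energy of the coefficients equals that of the signal.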
Since the effective signal is generally continuous in the time domain, its wavelet coefficients are generally large after the wavelet transform. The noise, in contrast, is generally random and discontinuous in the time domain, so its wavelet coefficients are relatively small. With a preset threshold, the low-frequency and high-frequency wavelet coefficients are filtered. The remaining coefficients are then inversely transformed, and the signal is reconstructed. The wavelet noise reduction process is shown in Figure
Wavelet noise reduction process.
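The decompose, threshold, and reconstruct steps can be sketched end to end as follows. This is a simplified single-level illustration using the Haar wavelet and a hard threshold applied only to the detail coefficients, which is a common simplification and an assumption here; the threshold value 0.5 and the function names are hypothetical.

```python
import numpy as np

def haar_dwt(x):
    """One-level orthonormal Haar decomposition into (CA, CD)."""
    x = np.asarray(x, dtype=float)
    ca = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    cd = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return ca, cd

def haar_idwt(ca, cd):
    """Inverse of haar_dwt: interleave the reconstructed even and odd samples."""
    x = np.empty(2 * len(ca))
    x[0::2] = (ca + cd) / np.sqrt(2.0)
    x[1::2] = (ca - cd) / np.sqrt(2.0)
    return x

def denoise(signal, threshold):
    """Decompose, zero the small detail coefficients, then reconstruct."""
    ca, cd = haar_dwt(signal)
    cd = np.where(np.abs(cd) < threshold, 0.0, cd)  # small details treated as noise
    return haar_idwt(ca, cd)

noisy = np.array([10.0, 10.2, 10.1, 9.9, 12.0, 15.0, 15.1, 14.9])
clean = denoise(noisy, threshold=0.5)
```

In this toy series the small fluctuations within each pair are averaged out, while the large jump from 12.0 to 15.0 produces a detail coefficient above the threshold and is therefore preserved.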
In the subsequent experiments, the discrete wavelet transform (DWT) is used to analyze the signal: the scale parameter is discretized as a power series, and then time is discretized.
The basic wavelet in the subsequent experiments is the Haar wavelet, a common basic wavelet that meets the requirements of the discrete wavelet transform. It has compact support, a support length of 1, and symmetry. These characteristics bring the following benefits:
(1) The Haar wavelet has compact support, which gives the function a sharp drop-off.
(2) The Haar wavelet has a small support length, which shortens the computation and thus reduces the data processing time and the training time.
(3) The Haar wavelet is symmetric, which reduces the distortion during signal analysis and signal reconstruction. Therefore, the real price can be largely recovered after noise reduction.
In order to reduce the fluctuations in stock prices, the threshold
The threshold function mainly takes two forms: the soft threshold-denoising method and the hard threshold-denoising method. The soft threshold-denoising method is defined as follows: a wavelet coefficient whose absolute value is less than the threshold is reset to zero; when the absolute value of the wavelet coefficient is greater than the threshold, the threshold is subtracted from the absolute value while the sign of the coefficient is kept. Expressions are as follows:
The hard threshold-denoising method is defined as follows: a wavelet coefficient whose absolute value is less than the threshold is reset to zero; otherwise, the wavelet coefficient is retained unchanged. Expressions are as follows:
The soft threshold function is continuous and introduces no additional oscillation, but it introduces a constant deviation between the thresholded and original coefficients. The hard threshold function has a smaller mean square error, but it introduces jump points and additional oscillation into the signal.
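The two traditional threshold functions can be written compactly in NumPy. The sketch below follows the verbal definitions given above; the function names, the symbol `lam` for the threshold, and the sample coefficients are illustrative assumptions.

```python
import numpy as np

def soft_threshold(w, lam):
    """Shrink every coefficient toward zero by lam; small ones become zero."""
    w = np.asarray(w, dtype=float)
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def hard_threshold(w, lam):
    """Zero coefficients with absolute value below lam; keep the rest unchanged."""
    w = np.asarray(w, dtype=float)
    return np.where(np.abs(w) < lam, 0.0, w)

coeffs = np.array([-3.0, -0.5, 0.2, 1.0, 2.5])
soft = soft_threshold(coeffs, 1.0)  # large coefficients are shifted by lam
hard = hard_threshold(coeffs, 1.0)  # large coefficients are kept as they are
```

The retained soft-thresholded coefficients differ from the originals by exactly the threshold (for example, 2.5 becomes 1.5), which is the constant deviation mentioned above. The hard-thresholded output jumps from 0 directly to the full coefficient value at the threshold, which is the source of the jump points.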
Both of the common methods above have their own drawbacks. In this paper, a new multioptimal combination wavelet transform (MOCWT) method is proposed. Compared with the two traditional threshold functions, MOCWT is continuous and avoids the jump of the hard threshold function, so the denoised signal is smoother. It also avoids the constant deviation of the soft threshold function.
When the value of the signal is changing, especially, a produced difference can obviously reduce the degree of distortion in signal reconstruction. The expression is
In this section, experimental results comparing the proposed MOCWT with the original wavelet method and the improved wavelet method are given to demonstrate its performance.
The source data used in this experiment is the opening price of the S&P 500 index over nearly eighteen years. The S&P 500 is a comprehensive stock index that tracks 500 listed companies in the United States. Because it covers many companies, it reflects changes in the broader market well. It also has the characteristics of wide sampling, strong representativeness, high precision, and good continuity.
In addition, the total error of training is employed as the evaluation criterion of experiment.
In Figure
Loss of different iterations.
| Iterations | 1 | 301 | 3301 |
|---|---|---|---|
| Loss | 0.010164 | 0.004343 | 0.000170 |
Distribution of real and predicted values.
The result of the first iteration
The results of iteration 301
The result of the 3301 iteration
In the experiment, different wavelet modes were used for data preprocessing. The preprocessed data were tested on LSTM models with 1, 2, 3, and 4 layers. With the increase of the number of layers in the model, its performance gradually deteriorated. The following five sets of experimental results show that the LSTM with a two-layer structure has the best performance.
The experimental results of the original data without wavelet preprocessing, the data with original wavelet soft threshold preprocessing, and the data with original wavelet hard threshold preprocessing were compared. The experimental results are shown in Figure
Comparison of results with and without wavelet transform preprocessing.
Figure
The experimental results as shown in Figure
The original wavelet threshold method is improved by introducing a threshold control function. The experimental results of the method without wavelet preprocessing, the original wavelet method, and the improved wavelet method were compared and are shown in Figures
Comparison of results without wavelet preprocessing, with the original wavelet soft threshold, and with the improved wavelet soft threshold
Comparison of results without wavelet preprocessing, with the original wavelet hard threshold, and with the improved wavelet hard threshold
Experimental results reveal that the improved method is effective. The improved hard threshold method improves markedly, with a minimum loss of 2.453. The improvement of the soft threshold method is weaker: its loss decreases by 5.163%.
The performance of the original wavelet soft threshold, the improved wavelet soft threshold, and MOCWT is compared. Figure
Comparison of original wavelet soft threshold, improved wavelet soft threshold, and MOCWT results
Comparison of original wavelet hard threshold, improved wavelet hard threshold, and MOCWT results
The performance of the original wavelet hard threshold, the improved wavelet hard threshold, and MOCWT is also compared. The experimental results are shown in Figure
The experimental results show that MOCWT performs best. Compared with the original and improved wavelet methods, MOCWT achieves clearly better performance, although its loss is nearly the same as that of the improved soft threshold when the number of LSTM layers is 2. Overall, MOCWT outperforms both the traditional methods and the improved methods.
The above experimental results indicate that MOCWT effectively improves the prediction accuracy.
In this paper, we improved the original wavelet denoising method, and the improved method performs better than the traditional methods.
In addition, we proposed a new multioptimal combination wavelet transform (MOCWT) method. The experimental results show that it outperforms both the traditional wavelet method and the improved method. When the original data are used without wavelet processing, the prediction results always show large oscillations and fit the real data poorly, which degrades the overall performance of the model. The experimental results illustrate that the data characteristics are of great significance to the performance of the whole model.
Several aspects of this work remain worth exploring in the future. For example, the optimization method, the structure of the neural network, the loss function, and the experimental parameters could all be optimized further.
The data used to support the findings of this study are available from the corresponding author upon request.
The authors declare that there is no conflict of interest regarding the manuscript.
This work is supported by the National Key Research and Development Program of China under Grants nos. 2017YFB1103603 and 2017YFB1103003, the National Natural Science Foundation of China under Grants nos. 61602343, 51607122, 61772365, 41772123, 61802280, 61806143, and 61502318, Tianjin Province Science and Technology Projects under Grants nos. 17JCYBJC15100 and 17JCQNJC04500, and Basic Scientific Research Business Funded Projects of Tianjin (2017KJ093, 2017KJ094).