Improving Deep Learning for Forecasting Accuracy in Financial Data



Introduction
Recently, artificial intelligence has had a great impact on the global business environment and has found applications in many different fields. Financial data are challenging to analyze because they carry a great deal of uncertainty. Artificial intelligence can be used to classify financial data for analysis, allowing teams to screen and analyze data more quickly, make more precise decisions, significantly reduce human error, bring better returns to customers, predict possible outcomes more accurately, and facilitate risk control. Financial forecasting is an important part of financial data analysis. A variety of effective analytical methods have been proposed for this purpose, including short-term prediction methods such as regression analysis, exponential smoothing, and autoregressive moving average models [1][2][3][4][5][6]. Research on recurrent neural networks (RNN), one of the effective and popular artificial intelligence methodologies, began in the late 1980s, but training a recurrent neural network requires fitting millions of parameters, which was difficult to accomplish at that time. With the development of optimization methods and parallel computing in recent years, however, computers can now complete the training of millions of parameters, which has once again made recurrent architectures such as the RNN a hot topic. Mikolov et al. and Wu et al. proposed and applied a language model based on the RNN, which has achieved great success in the field of natural language processing (NLP) [7,8]. Lipton et al. adapted the RNN approach for speech and handwriting recognition [9], achieving an accuracy rate 20% better than previous results. Bengio et al. [10] found that as the memory of the historical state gradually increased, the problems of gradient vanishing and gradient divergence occurred.
Their findings indicate that this type of neural network can only memorize the transient historical state. Long short-term memory (LSTM) [11] builds on the recurrent neural network by adding three gate control structures to address the vanishing-gradient problem, thus allowing the training of a neural network model with a longer period of memory.
Generally, support vector machine (SVM) and extreme learning machine (ELM) classifiers are suitable for classification case studies requiring high classification accuracy [12,13]. Predictions can be made based on financial training data to determine whether trends will go up or down, but not by how much. However, SVM has the following disadvantages [12]: (1) the SVM algorithm is difficult to implement for large-scale training samples, because quadratic programming is used to solve for the support vectors, which involves the calculation of an m-th order matrix; if m is large, a great deal of memory and computing time is consumed in storing and processing the matrix. (2) SVM also has difficulty with multiclass problems: the classical SVM algorithm can only handle two-class classification, whereas practical data-analysis applications generally require solving multiclass classification problems. (3) SVM struggles with small datasets: when there is little training data, the accuracy on the training set is very high but the accuracy of the actual predictions is very low, and repeated predictions give inconsistent accuracy. ELM reduces the complexity of feedforward networks by generating sparse, randomly connected hidden layers. It requires less computation time, but its actual performance depends on the task and the data [13]. Due to the aforementioned problems, in this paper we propose using deep learning (the LSTM model) to improve prediction results based on financial data. The model contains a multilayer network including an input layer, an output layer, hidden layers, and numerous neurons with nonlinear activation functions. It is hoped that the accuracy of the prediction can be improved using this approach. Some deep learning research methods are introduced below.
Scaled conjugate gradient backpropagation is a network training function in which weights and bias values are updated according to the scaled conjugate gradient method. It can train any network as long as its weight, net-input, and transfer functions have derivatives. Backpropagation is used to calculate the derivatives of performance with respect to the weight and bias variables; see Møller [14] for a more detailed discussion of the scaled conjugate gradient algorithm. The long short-term memory method is a special RNN model originally proposed to solve the gradient-divergence problem encountered with the RNN model. In the traditional RNN, the training algorithm uses backpropagation through time (BPTT). Over long time spans, the residuals that must be propagated back decrease exponentially, resulting in slow network weight updates that cannot reflect the long-term memory effect of the RNN. Therefore, a storage unit is needed to store the memory. The LSTM model was proposed to alleviate this problem; for the related theory, please refer to [11].
In 1998, Huang proposed the Hilbert-Huang transform, which has since received extensive attention from the academic community. It is an effective method for analyzing nonlinear, nonstationary time series and comprises the empirical mode decomposition (EMD) method and the Hilbert transform [15]. Zhang et al. [16] used the ensemble empirical mode decomposition (EEMD) method to study and analyze changes in, and the characteristics of, international crude oil prices, decomposing them into short-term fluctuations, medium-term fluctuations, and long-term trends. Wang et al. [17] studied the EMD-HW bagging method based on empirical mode decomposition, moving-block bootstrap, and Holt-Winters forecasting. Guhathakurta et al. [18] applied the EEMD method to analyze the relationship between the Indian stock market and the exchange rate and concluded that the impact models of the stock market and the exchange-rate market are similar. Khalid et al. [19] found that, on the mean-square-error and mean-absolute-error criteria, the empirical mode decomposition method could outperform all other models for stock market return and direction prediction. Islam et al. [20] applied the EMD method for the decomposition of data sequences and, in comparison with the wavelet decomposition method, found the EMD method to be more effective. Fang [21] applied the EEMD technique to analyze the psychological state of investors in a study of the relationship between stock prices and investor psychology. Recently, integrated approaches using multiple models have been used for better performance in prediction problems [22][23][24][25][26][27][28][29][30]. For example, it was found that wind speed can be predicted more accurately by combining EMD with different prediction techniques [24]. Liu et al. proposed a neural-network-based EMD hybrid wind speed predictor, in which each IMF/residue component is trained using appropriate backpropagation techniques [26]. In Ren et al. [27], a combination of support vector regression (SVR) and EMD was used for accurate wind energy prediction. All the above studies show that applying EMD technology to wind speed prediction improves the overall accuracy and predictive ability of conventional methods.
Financial data are classified as broadband data in the frequency domain, meaning that they contain a large range of fluctuations, so using LSTM alone is not enough to make good predictions. In this study, EMD is used to transform the raw nonlinear financial data into a limited number of intrinsic mode functions (IMFs) and a residual. The bandwidth of these IMFs is narrower, with regular cyclic, periodic, or seasonal components in the time domain, and LSTM is well suited to predicting cyclic, periodic, or seasonal behavior. The prediction result is obtained by adding all the IMF predictions together. Compared with using only the LSTM, the EMD-LSTM can reduce the root-mean-square error (RMSE) against real data. The paper contributes to a deeper understanding of the application of deep learning combined with EMD for financial data forecasting, significantly increasing the accuracy of the prediction models.

Description of the Research Data.
In the 21st century, corporate social responsibility (CSR) has become a factor that enterprises must take into consideration to ensure sustainable operations. Broadly speaking, CSR means that, in addition to pursuing the best interests of stockholders, companies must also take into account the interests of other stakeholders, including employees, consumers, suppliers, the community at large, and environmental concerns. Concern with CSR emerged during the extremely prosperous period of industrial development in the 20th century. When a developed country reaches a certain level of maturity in terms of industrial and commercial development, the population at large and companies begin to think about the relationships between the enterprise and the environment, community, labor force, and so forth. With the growth in economic globalization and the continued expansion of multinational corporations since the 1980s, labor relations in several countries have become extremely unbalanced. The protection of the rights and interests of labor has become a social issue of global concern, and the question of social responsibility has become more important. In western countries, where the labor force is often overseas, especially the United Kingdom and the United States, some have begun to argue that the cost and responsibility borne by government for social welfare should be reduced and that business enterprises should bear more of this social responsibility. The corporate social responsibility movement was initiated in the developed European and American economies and gradually evolved into a worldwide trend.
Given the global emphasis, more and more companies are paying attention to CSR; companies that implement it demonstrate better external development than those that do not, along with relatively good financial performance. Corporate governance includes mechanisms for guiding and managing enterprises and for implementing the responsibilities of business operators to protect the legitimate rights and interests of shareholders while also taking into account the interests of other stakeholders. Generally speaking, corporate governance mechanisms are divided into two types, external and internal. External governance refers to the promotion of private profits and the protection of shareholders' rights through the actions of government, judicial units, and external market forces. Internal governance, on the other hand, aims to achieve operational objectives through the ownership structure and the functions of the board of directors and management, in the best interests of the company and all shareholders, to assist in the management of the company and to provide effective monitoring mechanisms. The "Taiwan Corporate Governance 100 Index" lists companies on the Taiwan Stock Exchange (including domestic listed companies and primary-listed foreign companies, excluding Taiwan Depository Receipts). The constituent stocks are selected through several quantitative criteria, as outlined below (https://cgc.twse.com.tw): (1) Sample universe: publicly offered listed shares. First, stocks ranking in the lowest 20% by daily trading value during the most recent year are deleted. Then, stocks that meet the liquidity test standards are selected as the sample, that is, stocks that rank in the top 20% of the most recent 1-year corporate governance evaluation and whose net value per share at the end of the previous year is not less than par value.
Then, the remaining stocks are ranked according to "after-tax net profit in the most recent year" and "revenue growth rate in the most recent year," and the respective rankings are sorted. The top 100 stocks are selected as constituent stocks. In other words, there is no single-factor list for "corporate governance assessment." The "Corporate Governance 100 Index" is subject to review in July each year, covering the liquidity inspection, the corporate governance evaluation and screening, and three financial indicators (net value per share not less than par value, after-tax net profit ranking, and revenue growth rate). The data for the Taiwan CSR index were sourced from https://www.taiwanindex.com.tw for the period from June 15, 2015, to December 12, 2018, giving a total of 863 data points. These are daily data, five per week. The entire dataset is divided into two parts, with 90% (779 data points, from June 15, 2015, to August 14, 2018) used for training and the other 10% (84 data points, from August 15, 2018, to December 12, 2018) used for verification.
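As a minimal illustration of the split described above (a sketch, not the authors' code), the chronological 90/10 partition can be written as follows; the `series` array is a hypothetical stand-in for the real index values:

```python
import numpy as np

# Hypothetical stand-in for the 863 daily TW CSR index values; the real
# series comes from https://www.taiwanindex.com.tw.
series = np.linspace(5000.0, 6000.0, 863)

# Chronological split used in the paper: the first 779 points
# (2015-06-15 to 2018-08-14) train the model; the last 84
# (2018-08-15 to 2018-12-12) are held out for verification.
n_train = 779
train, test = series[:n_train], series[n_train:]
```

The split is chronological rather than random because the task is forecasting: the verification window must lie strictly after the training window.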
Next, the verification indicator, the root-mean-square error (RMSE), is calculated using the following equation:

RMSE = sqrt( (1/n) * Σ_{t=1}^{n} (y_t − ŷ_t)² ),   (1)

where y_t is the real (verification) data, ŷ_t is the prediction data, and n is the number of verification points. The smaller the RMSE, the closer the prediction data are to the real (verification) data; the larger the RMSE, the greater the difference between the predicted data and the real (verification) data.
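The RMSE defined above can be computed with a short, self-contained helper (an illustrative sketch, not the authors' implementation):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between verification data and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

A perfect prediction gives RMSE = 0; the value grows with the average squared gap between prediction and verification data.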

Long Short-Term Memory.
The long short-term memory model is a special RNN model proposed to solve the gradient-divergence problem encountered with the RNN model. In the traditional RNN, the training algorithm uses backpropagation through time (BPTT). Over long time spans, the residuals that must be propagated back decrease exponentially, resulting in slow network weight updates that cannot reflect the long-term memory effect of the RNN. Therefore, a storage unit is needed to store the memory. The LSTM model is proposed to alleviate this problem. For a discussion of the related theory, please refer to [11].
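To make the gate structure concrete, here is a minimal single-step LSTM cell in NumPy (an illustrative sketch only; the gate ordering [i, f, g, o] and the stacked weight shapes are assumptions of this example, not taken from [11]):

```python
import numpy as np

def lstm_cell_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D) input weights, U: (4H, H) recurrent
    weights, b: (4H,) bias; gates stacked in the order [i, f, g, o]."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    i = sigmoid(z[:H])            # input gate
    f = sigmoid(z[H:2 * H])       # forget gate
    g = np.tanh(z[2 * H:3 * H])   # candidate cell update
    o = sigmoid(z[3 * H:])        # output gate
    c = f * c_prev + i * g        # additive cell path preserves long-range memory
    h = o * np.tanh(c)            # hidden state passed to the next step
    return h, c
```

The additive update of the cell state c is the storage unit mentioned above: gradients can flow through it across many time steps without the exponential decay seen in plain BPTT.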

Empirical Mode Decomposition.
The Hilbert-Huang transform is a new tool for nonstationary data analysis. Financial data are nonlinear, nonstationary, complex, and irregular. After EMD is used to decompose them into multiple IMF bases, each IMF can be analyzed to discover inherent laws hidden in the data; for the related theory, please refer to [15]. The sifting procedure starts with the identification of the local minima and maxima of a time series X(t). First, identify all the local maxima; then connect them with a cubic spline to form the upper envelope e_u(t). Repeat the procedure for the local minima to produce the lower envelope e_l(t). The local mean is

m_1(t) = (e_u(t) + e_l(t)) / 2.   (2)

The difference between the data and m_1(t) gives the first component:

h_1(t) = X(t) − m_1(t).   (3)

In the subsequent sifting process, h_1(t) is treated as the data:

h_11(t) = h_1(t) − m_11(t).   (4)

The EMD repeats this sifting procedure k times, until h_1k is an IMF; then

c_1(t) = h_1k(t),   (5)

which is the first IMF component obtained from the data. The standard deviation decides when to halt the sifting procedure; it is computed from two consecutive sifting results as

SD = Σ_t [ |h_1(k−1)(t) − h_1k(t)|² / h_1(k−1)²(t) ].   (6)

When the SD falls in the range 0.2 to 0.3, the first IMF c_1 is obtained, and the residue can be written as

r_1(t) = X(t) − c_1(t).   (7)

Note that the residue r_1 still contains some useful information. We can therefore treat the residue as new data and apply the above procedure to obtain

r_2(t) = r_1(t) − c_2(t), ..., r_n(t) = r_(n−1)(t) − c_n(t).   (8)

This procedure is repeated until the last residue r_n carries no oscillatory information; what remains is the trend of the nonstationary data X(t). Combining (7) and (8) yields the EMD of the original signal:

X(t) = Σ_{i=1}^{n} c_i(t) + r_n(t).   (9)

In this manner, one can decompose the data into n empirical modes and a residue r_n, which can be either the mean trend or a constant. The IMFs c_1, c_2, ..., c_n cover distinct frequency bands ranging from high to low.
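The sifting procedure above can be sketched as follows (an illustrative, simplified implementation: the stopping criterion is a common ratio-of-sums simplification of the pointwise SD, and the minimum-extrema guard is an assumption of this sketch):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def _envelope_mean(x):
    """Mean of the upper/lower cubic-spline envelopes; None if too few extrema."""
    t = np.arange(len(x))
    maxima = np.where((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:]))[0] + 1
    minima = np.where((x[1:-1] < x[:-2]) & (x[1:-1] < x[2:]))[0] + 1
    if len(maxima) < 4 or len(minima) < 4:   # too few knots for a cubic spline
        return None
    upper = CubicSpline(maxima, x[maxima])(t)
    lower = CubicSpline(minima, x[minima])(t)
    return (upper + lower) / 2.0

def emd(x, sd_thresh=0.25, max_imfs=10, max_sift=50):
    """Sift x into IMFs plus a residue, stopping each sift when SD < sd_thresh."""
    x = np.asarray(x, dtype=float)
    residue, imfs = x.copy(), []
    for _ in range(max_imfs):
        h = residue.copy()
        for _ in range(max_sift):
            m = _envelope_mean(h)
            if m is None:                    # residue is a trend: decomposition done
                return imfs, residue
            h_new = h - m
            sd = np.sum((h - h_new) ** 2) / (np.sum(h ** 2) + 1e-12)
            h = h_new
            if sd < sd_thresh:               # SD in the 0.2-0.3 range halts sifting
                break
        imfs.append(h)
        residue = residue - h
    return imfs, residue
```

By construction, summing the extracted IMFs and the final residue reconstructs the original series exactly, which is the property (Eq. (9)) that the EMD-LSTM forecast relies on.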

Results and Discussion
The up and down movement of Taiwan's Corporate Governance 100 Index is discussed below. Figure 1 shows Taiwan's CSR index. The statistical characteristics are as follows: the mean is 5412, the standard deviation is 600.1820, and the variance is 360220. This dataset contains a total of 863 points, from June 15, 2015, to December 12, 2018. The Y-axis shows the TW CSR index, that is, the TWSE Corporate Governance 100 Index. The start date is June 15, 2015, and the index on that date is 5000. After June 2015, the index fell, mainly due to the global stock market crash that occurred in August 2015. The biggest reason for this was that China's economic slowdown was worse than expected; in addition, the US Federal Reserve raised interest rates in September 2015, and international oil prices fell below $40, causing panic in international financial markets. On August 24, 2015, the Taiwan stock market fell 583 points, or 7.5%, the biggest one-day drop in its history, to the lowest level in 33 months. Taiwan's economy improved in January 2016, and Taiwan's CSR index also rose. According to index company statistics, from December 2016 to December 2017 the Corporate Governance 100 Index rose by 18.39%, a semiannual rate of 5.63%; this exceeded the weighted index over the same period, which rose 15.99%, a semiannual rate of 4.96%. Companies showing good corporate governance were favored by investors. In October 2018, the Federal Reserve (Fed) continued to raise interest rates, while the International Monetary Fund (IMF) revised its global economic growth forecast for the following year. US economic growth has since slowed, and the trade war, which shows no sign of ending in the short term, has caused global stock markets to fall: the US Dow Jones Industrial Average fell by more than 800 points, and Asian stocks fell into a bear market. The Taiwan stock market weighted index has fallen by 1,517 points since October 2018, and the TWSE CG 100 Index also fell.
Proof that Taiwan's CSR index data are nonlinear and nonstationary [31]: the main characteristic of nonlinearity is a disproportionate relationship between the input and output of the equation describing a system; data from such a system are called nonlinear. As can be seen in Table 1, Taiwan's CSR index data are nonlinear. Here, x_t is Taiwan's CSR index, m_x = 5412 is its mean, and σ²_x is its variance. As can be seen in Figures 2(a) and 2(b), the monthly mean and the monthly variance of the Taiwan CSR index differ at each time point. Therefore, the Taiwan CSR index data can be defined as nonstationary.
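The monthly mean/variance comparison used above to argue nonstationarity can be illustrated with a simple blockwise computation (a sketch; the 21-trading-day block length standing in for one month is an assumption of this example):

```python
import numpy as np

def blockwise_stats(x, block=21):
    """Mean and variance over consecutive blocks (~21 trading days per month).
    Markedly different block statistics indicate a nonstationary series."""
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // block
    blocks = x[:n_blocks * block].reshape(n_blocks, block)
    return blocks.mean(axis=1), blocks.var(axis=1)
```

Applied to a trending series like the CSR index, the block means drift over time, which is exactly the behavior seen in Figures 2(a) and 2(b).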
This study uses an LSTM regression network for TW CSR index forecasting. The LSTM model parameters are set as follows: the LSTM layer has 200 hidden units; the adaptive moment estimation (Adam) optimizer is selected and trained for 250 epochs; to prevent divergence, the gradient threshold is set to 1; an initial learning rate of 0.005 is specified, and the learning rate is reduced by multiplying it by a factor of 0.2 after 125 epochs. The TW CSR index dataset includes a total of 863 data points, from June 15, 2015, to December 12, 2018. The data are divided into two parts, with 90% used for training (779 points, from June 15, 2015, to August 14, 2018) and the other 10% used for verification (84 points, from August 15, 2018, to December 12, 2018). Figure 3 shows the LSTM's TW CSR index forecast results, and Figure 4 shows the LSTM forecast against the actual verification data; the RMSE is 333.9627. The decomposed data can reflect fluctuation information on different time scales while retaining the characteristics of the original data. The TW CSR index is therefore decomposed into short-, medium-, and long-term time-series components; here, there are six components, labelled IMF1 to IMF6. Figure 5 shows the results for IMF1. In terms of statistical characteristics, this is high-frequency data with an average period of 0.9279 weeks; the mean is −0.0390, the standard deviation is 30.2396, the variance is 914.4325, and the Pearson correlation coefficient is 0.0598.
Figure 6 shows the LSTM prediction results for IMF1, and Figure 7 shows the prediction and actual verification results of LSTM for IMF1, with an RMSE of 2.7274. Figure 8 shows the results for IMF2. Its statistical characteristics are as follows: the average period is 2.7839 weeks, the mean is 1.0316, the standard deviation is 41.5703, the variance is 1728.1, and the Pearson correlation coefficient is 0.0806. This is the second-highest-frequency data. Figure 9 shows the prediction results obtained with LSTM for IMF2, and Figure 10 shows the prediction and actual verification results obtained with LSTM for IMF2; the RMSE is 77.4748. Figure 11 shows the results for IMF3. The statistical characteristics are as follows: the average period is 7.3394 weeks, the mean is 3.2523, the standard deviation is 64.4050, the variance is 4148.0, and the Pearson correlation coefficient is 0.5110. Figure 12 shows the LSTM prediction results for IMF3, and Figure 13 shows the LSTM prediction and actual verification results for IMF3; the RMSE is 115.9812. IMF3 comprises intermediate-frequency data. Figure 14 shows the results for IMF4. The statistical characteristics are as follows: the average period is 18.6308 weeks, the mean is 2.0298, the standard deviation is 79.5222, the variance is 6323.8, and the Pearson correlation coefficient is 0.1010. Figure 15 shows the prediction results obtained with LSTM for IMF4, and Figure 16 shows the prediction and actual verification results obtained with LSTM for IMF4; the RMSE is 51.8842. Figure 17 shows the IMF5 analysis. The statistical characteristics are as follows: the average period is 48.44 weeks, the mean is −25.8465, the standard deviation is 85.7134, the variance is 7346.8, and the Pearson correlation coefficient is 0.5485.
Figure 18 shows the LSTM prediction results for IMF5, and Figure 19 shows the prediction and actual verification results obtained with LSTM for IMF5; the RMSE is 35.5218. Figure 20 shows the results for IMF6. Figure 21 shows the prediction results obtained with LSTM for IMF6, and Figure 22 shows the prediction and actual verification results obtained with LSTM for IMF6; the RMSE is 52.6335.
These IMF predictions are added together to restore the predicted data. Figure 23 shows the result of decomposition by EMD followed by prediction with the LSTM method, after all the IMF prediction results are summed. Figure 24 compares the summed EMD-LSTM predictions with the actual verification data; the RMSE is 175.7331. Table 2 shows the statistical characteristics of the decomposed sequences. EMD is used to decompose the data into different IMFs with simpler statistical properties, and LSTM predictions are made according to the characteristics of each IMF. Table 3 shows the difference in RMSE between the real data and the predicted results for the two methods. It can be seen that the EMD plus LSTM RMSE of 175.7331 is better than the LSTM-only RMSE of 333.9627.
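The recombination step, summing the per-IMF forecasts elementwise over the verification window, can be sketched as follows (hypothetical constant forecasts stand in for the real LSTM outputs; three components suffice to illustrate the sum):

```python
import numpy as np

# Hypothetical constant per-component forecasts over the 84-point
# verification window (the paper sums IMF1..IMF6 plus the residue).
imf_forecasts = [np.full(84, v) for v in (1.0, 2.0, 3.0)]

# The final EMD-LSTM forecast is the elementwise sum of the components,
# mirroring Eq. (9): the series is the sum of its IMFs and residue.
combined = np.sum(imf_forecasts, axis=0)
```

The combined series is then compared against the held-out verification data with the RMSE helper defined earlier.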

Conclusion
This paper proposes an empirical mode decomposition method to improve deep learning prediction of financial trends and financial data. Deep learning technology (e.g., LSTM) is suitable for big-data prediction; it can also be used for small-data prediction, but only with poor accuracy. In fact, there are many practical situations where big data cannot be obtained and only small data are available for prediction. Data from Taiwan's CSR index, starting on June 15, 2015, with a total of 863 data points, were used for this study. It was found that deep learning technology alone (in this case, LSTM) does not predict accurately with small data; EMD is used in this study to improve the accuracy. The standard practice is to divide a dataset into 70% for training and 30% for testing, and the more training data, the more accurate the results. In many cases, however, the amount of data used for training is determined by the characteristics of the data; in this study, the best results were obtained when 90% of the TW CSR index dataset was used for training and 10% for testing. In IMF6, we can observe that the index is at its lowest on January 5, 2016 (4693), and at its highest on April 18, 2018 (6161); during this period, the total index rose by 1468, with the largest fall occurring after April 18, 2018. Verification and comparison of the two methods show that EMD plus LSTM produces less error than prediction with the LSTM model alone. The advantages of the proposed model are as follows: (1) EMD does not require complex mathematical operations. (2) EMD can analyze the frequency of data changes over time, decomposing complex financial data into components with simple characteristics; predictions made from these components improve prediction accuracy. (3) This research model is suitable for trending data such as economic or financial data. Many new and improved EMD variants have been proposed, and comparing their latest prediction results could be a good direction for future research.

Data Availability
The Taiwan CSR index data used to support the findings of this study have been deposited in the Taiwan Corporate Governance 100 Index repository (https://www.taiwanindex.com.tw).

Conflicts of Interest
The authors declare that they have no conflicts of interest.