DLI: A Deep Learning-Based Granger Causality Inference

Integrating an autoencoder (AE), long short-term memory (LSTM), and a convolutional neural network (CNN), we propose an interpretable deep learning architecture for Granger causality inference, named deep learning-based Granger causality inference (DLI). The two contributions of this work are to reveal the Granger causality between the bitcoin price and the S&P index and to forecast both series with higher accuracy. Experimental results show a bidirectional but asymmetric Granger causality between the bitcoin price and the S&P index. Moreover, the DLI achieves superior prediction accuracy by incorporating variables that have a causal relationship with the target variable into the prediction process.


Introduction
A time series is a sequence of observations of a variable arranged in chronological order; in the absence of exogenous variables, it reflects how a phenomenon changes over time. Generally speaking, time series analysis focuses more on predicting the future from historical data [1][2][3] than on interpreting the causal relationships that may exist among variables. Yet exploring the causalities among financial time series can be important for portfolio management [4]. As a decentralized cryptocurrency, bitcoin has attracted more and more investors and traders in recent years owing to its high investment returns [5]. From January 1, 2014, to December 31, 2018, the bitcoin price rose from $771 to $3742 (USD), which made bitcoin an attractive investment. Interestingly, Yermack [6] argued that bitcoin is not a currency because it performs poorly as a unit of account and as a store of value, and Corbet et al. [7] supported Yermack's conclusion that bitcoin is a speculative asset rather than a currency. Moreover, Dyhrberg [8] showed that bitcoin can serve as a hedge against the stock market and is a useful tool for both portfolio diversification and risk management. Therefore, it is of great importance for investors and traders to forecast the bitcoin price and to investigate the causes of its volatility.
In most circumstances, causality inference among financial time series is based on Granger causality [9]. Granger causality is a predictive notion of causality: a time series x Granger-causes y if the past values of x provide statistically significant information about future values of y, that is, if predictions of y based on both its own past values and the past values of x are better than predictions of y based only on its own past values. Traditional approaches to Granger causality inference mainly include vector autoregression (VAR) [10], the vector error correction model (VECM) [11], and their variants [12,13]. VAR and VECM are valid mainly when the input data are stationary. However, unit root tests such as the augmented Dickey-Fuller (ADF) test [14] show that most economic time series are not stationary, although they may become stationary after preprocessing. Hence, traditional Granger causality inference for nonstationary time series requires preprocessing the input into a stationary sequence, which may introduce pretesting distortions. The Wald test [15] has attracted much attention because it involves no pretesting distortion and is based on a standard asymptotic distribution, irrespective of the unit roots and cointegration properties of the data [16]. However, the Wald test may be inefficient because it intentionally overfits the VAR. Moreover, the aforementioned approaches are not good at capturing complex representations of the input data.
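To make the predictive definition concrete, the sketch below implements the classical regression-based Granger test that VAR-style approaches rely on: it fits a restricted model (lags of y only) and an unrestricted model (lags of y and x) by ordinary least squares and compares their residual sums of squares with an F statistic. This is not the DLI; it is a minimal illustration assuming stationary series, a lag order p = 2, and synthetic data, and the helper names (`lagged_design`, `granger_f_stat`, and so on) are hypothetical.

```python
import numpy as np

def lagged_design(series_list, p):
    """Intercept plus lags 1..p of each series, aligned to targets at t = p..T-1."""
    T = len(series_list[0])
    cols = [np.ones(T - p)]
    for s in series_list:
        for k in range(1, p + 1):
            cols.append(s[p - k:T - k])  # value of s at time t - k
    return np.column_stack(cols)

def residual_sum_of_squares(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def granger_f_stat(x, y, p=2):
    """F statistic for H0: the p lags of x add no information about y beyond y's own lags."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    target = y[p:]
    rss_r = residual_sum_of_squares(lagged_design([y], p), target)  # restricted model
    X_u = lagged_design([y, x], p)                                  # unrestricted model
    rss_u = residual_sum_of_squares(X_u, target)
    n, k_u = X_u.shape
    return ((rss_r - rss_u) / p) / (rss_u / (n - k_u))

# Synthetic example (hypothetical data): x drives y with a one-step lag, not vice versa.
rng = np.random.default_rng(0)
T = 500
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()

f_xy = granger_f_stat(x, y)  # tests "x Granger-causes y"
f_yx = granger_f_stat(y, x)  # tests "y Granger-causes x"
print(f"x -> y: F = {f_xy:.1f}, y -> x: F = {f_yx:.2f}")
```

A p-value could then be read from the F(p, n - k) distribution, for example with `scipy.stats.f.sf`. The need for stationary inputs in this regression is exactly the limitation discussed above.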
Deep learning-based architectures can learn more abstract representations from the input data without requiring stationarity. Chong et al. [17] proposed a deep learning-based stock market forecasting model to examine the ability of three unsupervised feature extraction methods to predict future market behaviour. Based on a deep learning model, Chen et al. [18] built a computer-aided diagnosis and decision-making system for medical data from MR images. Long et al. [19] proposed a multifilter neural network that integrates convolutional and recurrent neurons for feature extraction on economic time series samples and price volatility prediction. The aforementioned deep learning-based forecasting models achieved promising performance. Lahmiri and Bekiros [20] employed LSTM for cryptocurrency prediction and showed that deep learning is highly efficient at predicting the inherent chaotic dynamics of cryptocurrency markets.
These deep learning-based models tend to perform better than traditional econometric methods, which suggests that deep learning-based architectures are more capable of handling financial time series data.
In this paper, we construct a deep learning-based Granger causality inference architecture, named DLI, which consists of AE, CNN, and LSTM. The two contributions of our work are exploring the Granger causality between the bitcoin price and S&P index and predicting the bitcoin price and S&P index with higher accuracy. The remainder of this paper is organized as follows. The datasets we employed are presented in Section 2. The proposed DLI is described in Section 3. Experiments and results are reported in Section 4. Our contributions and future work are summarized in Section 5.

Data
We took the bitcoin price and the S&P index as experimental datasets. Both can be downloaded from the Yahoo website, and their prices are in US dollars. Without loss of generality, we take the daily closing price as the day's price. The descriptive statistics for the bitcoin price and S&P index covering the period from January 1, 2014, to December 31, 2018, can be found in Table 1. The bitcoin price and S&P index samples contain 1,826 and 1,258 data points, respectively. Because bitcoin trades every day whereas stock markets are closed on weekends, on holidays, and for other reasons, the S&P index has missing values on non-trading days; we employed AE to remove the noise caused by these missing values.
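The difference between the two sample sizes comes from the trading calendar: bitcoin is quoted on every calendar day, whereas the S&P index is only quoted on trading days. The sketch below, a hypothetical illustration using only the Python standard library, builds the daily calendar grid for the sample period and shows how a weekday-only series leaves gaps when placed on it. It ignores exchange holidays, so it yields 1,304 weekdays rather than the 1,258 S&P observations; the paper does not describe how the two series were aligned, so `calendar_days` and `align_to_calendar` are assumptions.

```python
from datetime import date, timedelta

def calendar_days(start, end):
    """Every calendar date from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]

def align_to_calendar(days, observed):
    """Place a {date: price} series on the daily grid; None marks days with no quote."""
    return [observed.get(d) for d in days]

days = calendar_days(date(2014, 1, 1), date(2018, 12, 31))
weekdays = [d for d in days if d.weekday() < 5]  # Monday=0 ... Friday=4
print(len(days), len(weekdays))  # 1826 1304

# Hypothetical stock series quoted on every weekday (exchange holidays ignored here).
toy_quotes = {d: 100.0 + i for i, d in enumerate(weekdays)}
aligned = align_to_calendar(days, toy_quotes)
missing = sum(v is None for v in aligned)
print(missing)  # 522 weekend days left empty
```

With the real S&P series, 1,826 - 1,258 = 568 calendar days would be left empty in the same way.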
To obtain a reliable model, we divide the experimental data into three parts: a 70% training set, a 10% validation set, and a 20% test set. The training set is used to fit the model, the validation set is used to further determine the parameters of the whole network, and the test set is used to evaluate the generalization ability of the model.
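A minimal sketch of the 70/10/20 split, assuming the split is chronological (the text does not say so explicitly, but keeping time order prevents future observations from leaking into training) and that boundaries are rounded down. The function name `chronological_split` is hypothetical.

```python
def chronological_split(n):
    """70/10/20 split that keeps time order; returns (train, validation, test) index ranges."""
    train_end = n * 7 // 10   # integer arithmetic avoids floating-point rounding surprises
    val_end = n * 8 // 10
    return range(0, train_end), range(train_end, val_end), range(val_end, n)

for name, n in [("bitcoin", 1826), ("S&P index", 1258)]:
    train, val, test = chronological_split(n)
    print(name, len(train), len(val), len(test))
# bitcoin 1278 182 366
# S&P index 880 126 252
```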

Model Development
An autoencoder (AE) is a simple but powerful unsupervised deep learning model. A typical AE consists of three layers: an input layer, a hidden layer, and an output layer, as shown in Figure 1.
Its output layer is an approximate reconstruction of the input layer, which makes it useful for filtering and representation learning. In the proposed DLI, we adopt AE as a filter to denoise the original input, which helps improve prediction accuracy.
Long short-term memory (LSTM) is a widely used deep learning model for processing sequence data, such as time series and speech. It extends the recurrent neural network with a gating mechanism, which gives it better performance in long-term prediction. In the proposed DLI, we introduce LSTM to achieve accurate long-term prediction.
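The gating mechanism can be sketched with one step of a standard LSTM cell in NumPy. This is a generic textbook formulation, not the DLI's configuration: the hidden size, the random weights, and the helper names `lstm_step` and `lstm_encode` are assumptions for illustration, and in practice the weights would be learned.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. Shapes: W (4H, D), U (4H, H), b (4H,). Gate order: i, f, o, g."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[:H])            # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])       # forget gate: how much of the old cell state to keep
    o = sigmoid(z[2 * H:3 * H])   # output gate: how much of the cell state to expose
    g = np.tanh(z[3 * H:])        # candidate values for the cell state
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t

def lstm_encode(sequence, W, U, b):
    """Run the cell over a univariate sequence and return the final hidden state."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for value in sequence:
        h, c = lstm_step(np.array([value]), h, c, W, U, b)
    return h

# Hypothetical sizes and random weights; a trained model would learn W, U, b.
rng = np.random.default_rng(0)
D, H = 1, 8
W = rng.normal(0, 0.3, (4 * H, D))
U = rng.normal(0, 0.3, (4 * H, H))
b = np.zeros(4 * H)
window = np.sin(np.linspace(0, 3, 30))  # e.g. 30 normalized past prices
h_T = lstm_encode(window, W, U, b)
print(h_T.shape)  # (8,)
```

The forget gate lets the cell state carry information across many steps with little attenuation, which is the property the text relies on for long-term prediction.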
A convolutional neural network (CNN) is also a widely used deep learning model [21], applied to time series data (1D CNN), images (2D CNN), and video or medical images (3D CNN). A CNN includes convolution layers and pooling layers, as shown in Figure 1, and it greatly reduces the number of parameters and speeds up training through local receptive fields and shared weights. Moreover, LeCun and Bengio [22] showed that time series have a strong 1D structure: variables that are spatially or temporally nearby are highly correlated, and CNN can effectively extract the spatial features of time series. Therefore, CNN is introduced into the proposed DLI to extract spatial features and to speed up training.

Figure 1 shows a graphical illustration of the DLI, which consists of AE, CNN, and LSTM. We assume that both the S&P index $X = (x_1, x_2, \ldots, x_T)$ and the bitcoin price $Y = (y_1, y_2, \ldots, y_T)$ are time series of length $T$, where $x_t$ is the S&P index at time $t$ and $y_t$ is the bitcoin price at time $t$. The DLI consists of three processing stages: denoising, feature extraction, and forecasting. As described in Section 2, since stock markets are usually closed for holidays or other reasons, the S&P index time series has many missing values. Therefore, at the denoising stage, AE is first used to filter the data and remove the noise in the S&P index. At the feature extraction stage, the denoised S&P index and the bitcoin price are taken as the inputs of CNN and LSTM, respectively, to extract deep representations. At the forecasting stage, we obtain the bitcoin price prediction through a fully connected layer.

The optimization of the DLI model minimizes both the reconstruction error of AE and the training error of the whole model. At the denoising stage, the output of AE is an approximate copy of the input. Therefore, we minimize the reconstruction error between the input and the output, which preserves the economic meaning of the S&P index.
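The two properties credited to CNNs above, local receptive fields and shared weights, can be seen in a few lines of NumPy. This hypothetical sketch implements a "valid" 1D convolution (computed as cross-correlation, as deep learning libraries do) followed by non-overlapping max pooling; the kernel values are made up, and `np.lib.stride_tricks.sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np

def conv1d_valid(x, kernel, bias=0.0):
    """'Valid' 1D convolution as used in CNN layers (cross-correlation, kernel not flipped).
    The same kernel weights are reused at every position (shared weights), and each output
    depends only on len(kernel) neighbouring inputs (local receptive field)."""
    windows = np.lib.stride_tricks.sliding_window_view(x, len(kernel))
    return windows @ kernel + bias

def max_pool1d(x, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    n = len(x) // size
    return x[: n * size].reshape(n, size).max(axis=1)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
kernel = np.array([0.5, 0.5])          # hypothetical kernel: a 2-point moving average
features = conv1d_valid(x, kernel)     # [1.5 2.5 3.5 4.5]
pooled = max_pool1d(features, 2)       # [2.5 4.5]

# Parameter count: the conv layer uses len(kernel) + 1 weights regardless of input length,
# whereas a dense layer mapping 5 inputs to 4 outputs needs 5 * 4 weights plus 4 biases.
conv_params = kernel.size + 1                          # 3
dense_params = x.size * features.size + features.size  # 24
```

The gap between 3 and 24 parameters grows with the input length, which is where the reduction in parameters and training time comes from.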
The reconstruction error of AE is defined as follows:

$$L_{AE} = \left\| X - g\left(W' f(WX + b) + b'\right) \right\|^2,$$

where $f(\cdot)$ and $g(\cdot)$ are activation functions, $W$ and $W'$ are weights, and $b$ and $b'$ are biases. Minimizing the training error of the whole model is also necessary to obtain a sound model. The objective function of the whole model can be described as

$$L = \sum_{t=1}^{T} \left( y_t - \hat{y}_t \right)^2,$$

where $\hat{y}_t$ denotes the predicted value.
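The denoising stage can be sketched as a one-hidden-layer autoencoder with encoder f(Wx + b) and decoder g(W'h + b'), trained by plain gradient descent on the squared reconstruction error. The choices f = tanh, g = identity, a window width of 8, 3 hidden units, the learning rate, and the synthetic noisy sine input are all assumptions, since the text does not specify them; `TinyAutoencoder` is a hypothetical name, and this sketch covers only the AE term, not the joint objective of the whole model.

```python
import numpy as np

class TinyAutoencoder:
    """One-hidden-layer AE: h = f(W x + b), x_hat = g(W' h + b'), with f = tanh, g = identity."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_in, n_hidden))  # plays the role of W'
        self.b2 = np.zeros(n_in)                          # plays the role of b'

    def encode(self, X):
        return np.tanh(X @ self.W.T + self.b)

    def decode(self, H):
        return H @ self.W2.T + self.b2

    def reconstruction_error(self, X):
        """Mean over windows of the squared Euclidean reconstruction error."""
        R = self.decode(self.encode(X)) - X
        return float(np.mean(np.sum(R ** 2, axis=1)))

    def fit(self, X, lr=0.01, epochs=2000):
        n = X.shape[0]
        for _ in range(epochs):
            H = self.encode(X)
            dXhat = 2.0 * (self.decode(H) - X) / n   # gradient of reconstruction_error
            dW2, db2 = dXhat.T @ H, dXhat.sum(axis=0)
            dZ = (dXhat @ self.W2) * (1.0 - H ** 2)  # back-propagate through tanh
            dW, db = dZ.T @ X, dZ.sum(axis=0)
            self.W2 -= lr * dW2
            self.b2 -= lr * db2
            self.W -= lr * dW
            self.b -= lr * db
        return self

# Hypothetical input: overlapping windows of a noisy sine wave.
rng = np.random.default_rng(1)
t = np.arange(300)
noisy = np.sin(2 * np.pi * t / 50) + 0.2 * rng.normal(size=t.size)
X = np.lib.stride_tricks.sliding_window_view(noisy, 8)  # shape (293, 8)

ae = TinyAutoencoder(n_in=8, n_hidden=3)
before = ae.reconstruction_error(X)
ae.fit(X)
after = ae.reconstruction_error(X)
print(f"reconstruction error: {before:.3f} -> {after:.3f}")
```

Because the hidden layer is narrower than the input window, the reconstruction has to keep the dominant structure of each window, which is the intuition behind using an AE as a filter.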

Empirical Results
In this section, we explore the Granger causality between the bitcoin price and the S&P index. To investigate whether the S&P index Granger-causes the bitcoin price, we first predict the bitcoin price without considering the S&P index, as shown in Figure 2. Then, for comparison, we take the S&P index as auxiliary information to predict the bitcoin price, as shown in Figure 3. Similarly, to investigate whether the bitcoin price Granger-causes the S&P index, we first predict the S&P index without considering the bitcoin price, as shown in Figure 4, and then take the bitcoin price as auxiliary information to predict the S&P index, as shown in Figure 5. In addition, we use the traditional ARIMA model as a baseline to demonstrate the advantage of the proposed model. Because the targets are continuous values, we employ the root mean squared error (RMSE) as the forecasting performance indicator; the smaller the RMSE, the better the prediction performance. The corresponding prediction RMSEs are shown in Table 2.
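For reference, the RMSE used as the performance indicator is the square root of the mean squared forecast error. Below is a minimal standard-library sketch (the function name `rmse` is hypothetical); in the comparison described above, it would be applied to the forecasts made with and without the auxiliary series.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and have equal length")
    squared = [(a - b) ** 2 for a, b in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

print(rmse([10.0, 12.0, 9.0, 11.0], [11.0, 11.0, 10.0, 10.0]))  # 1.0
```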
From Table 2, we can see that the bitcoin price prediction RMSE of the DLI decreases by 92.10% and 23.32% compared with that of ARIMA and LSTM, respectively, and the S&P index prediction RMSE of the DLI decreases by 98.06% and 50.96% compared with that of ARIMA and LSTM, respectively. These results show that the prediction performance for the bitcoin price improves when the S&P index is taken into account, and the prediction performance for the S&P index improves when the bitcoin price is taken into account. Moreover, the improvement for the S&P index is larger than that for the bitcoin price. Therefore, we conclude that there is a bidirectional but asymmetric Granger causality between the bitcoin price and the S&P index.
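The reasoning in this paragraph can be written out as a small sketch. The two reductions are the values reported in the text for the DLI relative to LSTM; `percent_reduction` is a hypothetical helper showing how such figures are computed from two RMSEs, and the RMSE pair passed to it is made up. The bidirectional/asymmetric labels mirror the paper's argument as a direct comparison of the two reductions.

```python
def percent_reduction(rmse_baseline, rmse_model):
    """Relative RMSE reduction of a model with respect to a baseline, in percent."""
    return 100.0 * (rmse_baseline - rmse_model) / rmse_baseline

# Reductions of the DLI relative to LSTM reported in the text (percent).
# "A -> B" means A is used as auxiliary information when forecasting B.
reductions = {"S&P index -> bitcoin price": 23.32, "bitcoin price -> S&P index": 50.96}

bidirectional = all(r > 0 for r in reductions.values())  # adding either series helps
a, b = reductions.values()
asymmetric = a != b                                      # but by different amounts
stronger = max(reductions, key=reductions.get)
print(bidirectional, asymmetric, stronger)
# True True bitcoin price -> S&P index

print(percent_reduction(2.0, 1.5))  # 25.0 for a hypothetical RMSE pair
```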

Conclusions
In this paper, we proposed an interpretable deep learning-based Granger causality inference architecture, named DLI, that integrates AE, CNN, and LSTM. As a deep learning-based model, the DLI has an advantage over traditional econometric models: it can process big data efficiently while retaining the original economic meaning of the variables after data preprocessing. Our two contributions are exploring the Granger causality between the bitcoin price and S&P index and predicting the bitcoin price and S&P index with higher accuracy. Our experiments reveal a bidirectional but asymmetric Granger causality between the bitcoin price and S&P index. Moreover, the DLI achieves superior prediction accuracy by incorporating variables that have a causal relationship with the target variable into the prediction process.
In future work, the proposed DLI can be extended to other economic variables to provide a reasonable reference for portfolio management, or it can be used for prediction in other scientific fields. Moreover, the DLI can be extended from two variables to multiple variables to determine causalities among multiple time series.

Data Availability
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation, to any qualified researcher.

Conflicts of Interest
The author declares that there are no conflicts of interest regarding the publication of this paper.