
In this study, we analyze the term structure of credit default swaps (CDSs) and predict future term structures using the Nelson–Siegel model, recurrent neural network (RNN), support vector regression (SVR), long short-term memory (LSTM), and group method of data handling (GMDH), based on CDS term structure data from 2008 to 2019. Furthermore, we evaluate changes in the forecasting performance of the models through a subperiod analysis. The empirical results confirm that the Nelson–Siegel model can be used to predict not only the interest rate term structure but also the CDS term structure. Additionally, we demonstrate that the machine-learning models, namely, SVR, RNN, LSTM, and GMDH, outperform the model-driven method (in this case, the Nelson–Siegel model). Among the machine-learning approaches, GMDH demonstrates the best performance in forecasting the CDS term structure. According to the subperiod analysis, the performance of all models varied with the data period: all models were less accurate in highly volatile periods than in less volatile ones. This study will enable traders and policymakers to invest efficiently and make policy decisions based on the current and future risk factors of a company or country.

A credit default swap (CDS) is a credit derivative whose price, like that of a bond, depends on the credit risk of the reference entity. If the reference entity carries higher risk, the CDS spread is set higher, and a CDS contract can be used to manage that credit risk. The CDS seller (protection seller) insures the protection buyer against loss in the event of a credit default, such as bankruptcy of the reference entity, debt repudiation, or, in the case of a sovereign bond, a moratorium. There are two ways for a protection seller to compensate the protection buyer's loss: the first is to buy the underlying asset at face value; the second is to pay the difference between the remaining value and the face value. In this way, the protection buyer hedges his or her credit risk and pays the CDS spread to the protection seller.

A CDS spread is an insurance fee that the protection buyer pays to the protection seller, often quarterly. Its value is determined by factors such as the probability of credit default and the recovery rate. The recovery rate is the percentage of the bond value that the reference entity offers the protection buyer when a credit default happens; therefore, a high recovery rate implies a low CDS spread. Conversely, a high default rate, indicating a high probability of credit default, implies a high CDS spread. Because the CDS spread indicates the bankruptcy risk of institutions or countries, it is an important and actively traded economic index. According to the Bank for International Settlements, the total outstanding notional amount of CDS contracts was $7809 billion in the first half of 2019.
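As a hedged aside (not taken from the paper itself), these two determinants are tied together by the standard "credit triangle" approximation, in which the fair spread is roughly the default intensity times one minus the recovery rate:

```python
def cds_spread_approx(hazard_rate: float, recovery_rate: float) -> float:
    """Credit-triangle approximation: spread = default intensity * (1 - recovery).
    A textbook back-of-the-envelope formula, not the paper's pricing model."""
    return hazard_rate * (1.0 - recovery_rate)

# A 2% annual default intensity with 40% recovery implies roughly a
# 0.012 (120 basis point) annual spread; a higher recovery rate lowers it.
wide = cds_spread_approx(0.02, 0.40)
narrow = cds_spread_approx(0.02, 0.70)
print(wide, narrow)
```

This makes the two effects in the text explicit: the spread rises with the default intensity and falls as the recovery rate increases.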

To date, numerous studies have been conducted on the prediction of financial asset values. For example, Li and Tam [

As mentioned above, several studies have attempted to predict various financial market indices with machine-learning methods; however, research on CDS term structure is limited. CDS term structure reflects the conditions for monetary policy and companies’ future risk expectations. CDS spread can be classified into two types. The first one is sovereign CDS, which has a country as its reference entity. Sovereign CDS spreads reflect the creditworthiness of a country. That is, the sovereign CDS spread can be considered as a measure of the sovereign credit risk [

The CDS term structure is important because it integrates the future risk expectations of both markets and companies by offering CDS spreads over time. Thus, we can confirm various types of information from the CDS term structure, such as firm leverage and volatility, as shown by Han and Zhou [

In this study, we analyze the CDS term structure, particularly sovereign CDS, forecast it using machine-learning models, and identify the most suitable model for predicting CDS term structure. We consider model-driven and data-driven methods: the Nelson–Siegel model, RNN, SVR, LSTM, and GMDH. The Nelson–Siegel model, as a model-driven method, was devised to fit the yield term structure; however, in this study, it was fitted to the CDS term structure to extract the term structure parameters and forecast the CDS term structure with the AR(1) model. RNN, SVR, LSTM, and GMDH are machine-learning models that specialize in predicting time-series data. RNN memorizes previous information and uses it to predict future information. LSTM is basically the same as RNN; however, it memorizes only significant information based on some calculations. SVR is derived from the structural risk minimization principle [

Machine learning is widely used in various fields to analyze data and forecast future flow. For example, Yan and Ouyang [

Methodologically, we adopt Nelson–Siegel as a model-driven method and RNN, LSTM, SVR, and GMDH as data-driven methods to predict the CDS term structure over the period 2008–2019. We optimize the data-driven models using a grid search algorithm implemented in Python. Furthermore, these tests are explored using subperiod analyses to investigate changes in model performance over the experimental period. Specifically, we split the entire sample period into two subperiods, January 2008–December 2011 (subperiod 1) and January 2012–December 2019 (subperiod 2), because subperiod 1 contains financial market turbulence due to the global financial crisis and the European debt crisis. Through this subperiod analysis, we investigate the change in the forecasting performance of all methods on both high-variance and relatively low-variance data. This kind of subperiod analysis is common in other studies [

In time-series forecasting, sequence models, either RNN, LSTM, or a combination of both, are frequently used owing to considerations of time. The sequence model recognizes time as an order and can check how it changes according to the order; therefore, it can be applied to data, such as weather and finance. According to Siami-Namini and Namin [

This paper is organized as follows: in the next section, we review our dataset and present a statistical summary of the CDS term structure; we describe our methods: Nelson–Siegel, RNN, SVR, LSTM, and GMDH, and we explain hyperparameter optimization and its application to the CDS term structure;

The CDS spread can be classified into several categories, usually depending on the definition of the credit event. The full restructuring clause is the standard term; under this condition, any restructuring event can be a credit event. The modified restructuring clause limits the scope of opportunistic behavior by sellers when restructuring agreements do not result in a loss: while restructuring agreements are still considered credit events, the clause limits the deliverable obligations to those with a maturity of less than 30 months after the termination date of the CDS contract. Under the modified contract option, any restructuring event except the restructuring of bilateral loans can be a credit event. The modified-modified restructuring term was introduced because modified restructuring was considered too severe in its limitation of deliverable obligations; under this term, the remaining maturity of deliverable assets must be less than 60 months for restructured obligations and 30 months for all other obligations. Finally, under the no restructuring contract option, all restructuring events are excluded from the contract as "trigger events."

Among these contract types, we use a full restructuring sovereign CDS spread dataset because the other types are unavailable over long periods. The sovereign CDS spread reflects market participants' perceptions of a country's creditworthiness. Our data cover the period from October 2008 to October 2019 and maturities of six months and 1, 2, 3, 4, 5, 7, 10, 20, and 30 years. All data were sourced from Datastream and correspond to the daily closing price of the CDS spread. The term structure of the CDS spread normally shows upward sloping curves, as seen in Figure

CDS term structure from 2008 to 2019.

Summary statistics for the historical CDS term structure.

| Index | Mean | Std. dev. | 1st per. | 10th per. | 25th per. | Median | 75th per. | 90th per. | 99th per. |
|---|---|---|---|---|---|---|---|---|---|
| CDS6M | 12.081 | 9.995 | 1 | 4.745 | 6.263 | 9.122 | 13.688 | 22.98 | 58 |
| CDS1Y | 12.305 | 9.718 | 2 | 4.84 | 6.54 | 9.715 | 14.158 | 23.39 | 57.43 |
| CDS2Y | 14.117 | 10.126 | 4.02 | 6.21 | 7.5 | 11.406 | 16.49 | 23.54 | 62 |
| CDS3Y | 16.083 | 10.838 | 5.2 | 7.29 | 9.063 | 13.35 | 19.48 | 26.57 | 68 |
| CDS4Y | 19.038 | 11.557 | 6.29 | 8.57 | 11.33 | 16.005 | 23.04 | 30.743 | 73 |
| CDS5Y | 22.154 | 12.35 | 7.91 | 10.38 | 13.88 | 19.16 | 27.783 | 35.561 | 78 |
| CDS7Y | 26.715 | 11.819 | 11.95 | 14.95 | 18.8 | 23.57 | 32.499 | 40.731 | 78 |
| CDS10Y | 31.02 | 11.813 | 14.12 | 17.695 | 23.790 | 27.963 | 36.783 | 44.774 | 80 |
| CDS20Y | 33.903 | 11.986 | 17.83 | 18.77 | 24.33 | 34.01 | 39.597 | 47.952 | 80 |
| CDS30Y | 34.709 | 12.214 | 16.788 | 19.04 | 24.159 | 35.485 | 40.493 | 49.265 | 80 |

per.: percentile.

Nelson and Siegel [

The Nelson–Siegel model is a simple but effective method for modeling a term structure, and various studies have used the model to predict the yield curve or other term structures. For example, Shaw et al. [

In this study, we attempted to fit the CDS curve to the Nelson–Siegel model by estimating the time-decay parameter
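As a sketch of this fitting step, the three-factor Nelson–Siegel curve can be fitted to a spread curve by nonlinear least squares. The illustrative input below is the column of sample means from the summary statistics table; the starting values and bounds are assumptions, not the paper's calibration:

```python
import numpy as np
from scipy.optimize import curve_fit

def nelson_siegel(tau, beta0, beta1, beta2, lam):
    """Nelson-Siegel curve: level (beta0), slope (beta1), curvature (beta2),
    with time-decay parameter lam."""
    x = lam * tau
    loading = (1.0 - np.exp(-x)) / x
    return beta0 + beta1 * loading + beta2 * (loading - np.exp(-x))

# Maturities in years and the mean CDS spreads from the summary table.
taus = np.array([0.5, 1, 2, 3, 4, 5, 7, 10, 20, 30])
spreads = np.array([12.081, 12.305, 14.117, 16.083, 19.038,
                    22.154, 26.715, 31.020, 33.903, 34.709])

params, _ = curve_fit(nelson_siegel, taus, spreads,
                      p0=[35.0, -23.0, 0.0, 0.5],        # assumed start values
                      bounds=([-100, -100, -100, 0.01],  # keep lambda positive
                              [100, 100, 100, 5.0]))
fitted = nelson_siegel(taus, *params)
print(np.round(params, 3))
```

Forecasting then proceeds, as the text describes, by fitting an AR(1) process to the daily series of estimated parameters and plugging the predicted parameters back into the curve.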

SVR is a machine-learning method derived from the support vector machine (SVM). SVM is an algorithm that returns a hyperplane separating the training samples into two labels, positive and negative. The distance between the closest point and the hyperplane is called the "margin," and the goal of SVM is to identify the hyperplane that maximizes this margin. There are two types of margin. The first is the hard margin, used for linearly separable datasets, in which no point violates its label; in other words, all points can be correctly classified by a hyperplane. The second is the soft margin, used for nonseparable cases, in which some points, called "outliers," are incorrectly classified. There are two ways to select a soft margin hyperplane. On the one hand, we can enlarge the margin and accept more errors (outliers); this is usually used for datasets with only a small number of outliers. On the other hand, we can choose a hyperplane with a small margin that minimizes the empirical error; this is useful for datasets with dense point distributions, where it is difficult to separate the data cleanly.

Additionally, the kernel trick can be used for linearly nonseparable datasets. A kernel is a function that maps the original data points into a higher-dimensional space in which they become separable. It is called a "trick" because, although the dimension of the dataset increases, the computational cost of the algorithm increases only slightly.

SVM originated from the statistical learning theory introduced by Vapnik and Chervonenkis. The characteristic idea of SVM is to minimize the structural risk, whereas artificial neural networks (ANNs) minimize the empirical risk. Furthermore, SVM theoretically offers better forecasting than ANNs, according to Gunn et al. [

SVR is derived from SVM. It is a nonlinear kernel-based approach, and the main idea is to identify a function whose deviation from the actual data is located within the predetermined scale. SVR is applied to a given dataset

Using Lagrange multipliers and the Karush–Kuhn–Tucker condition, the dual problem for the optimization problem (

To solve the above problem, we do not identify the nonlinear function

The selection of the kernel has a significant impact on forecasting performance. It is common practice to evaluate a range of potential settings and use cross-validation over the training set to determine the best one. In this research, we use three kernel functions: polynomial, Gaussian, and sigmoid, as presented in Table

Summary of kernels.

| Types of kernel | Kernel |
|---|---|
| Polynomial kernel | $K(x_i, x_j) = (\gamma x_i^{\top} x_j + r)^d$ |
| Gaussian kernel | $K(x_i, x_j) = \exp(-\gamma \lVert x_i - x_j \rVert^2)$ |
| Sigmoid kernel | $K(x_i, x_j) = \tanh(\gamma x_i^{\top} x_j + r)$ |
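A minimal sketch of the SVR step with these three kernels, using scikit-learn and a synthetic random-walk series as a stand-in for the CDS data (the lag length, parameter grid, and split sizes are illustrative assumptions, not the paper's settings):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

rng = np.random.default_rng(0)
# Synthetic stand-in for a daily CDS spread series: predict tomorrow's
# spread from the previous 5 observations.
series = 20.0 + np.cumsum(rng.normal(0.0, 0.3, 400))
lag = 5
X = np.array([series[i:i + lag] for i in range(len(series) - lag)])
y = series[lag:]

# The three kernels from the table, selected by time-series cross-validation.
grid = GridSearchCV(
    SVR(),
    param_grid={"kernel": ["poly", "rbf", "sigmoid"],
                "C": [1.0, 10.0], "epsilon": [0.01, 0.1]},
    cv=TimeSeriesSplit(n_splits=3),
)
grid.fit(X[:300], y[:300])
pred = grid.predict(X[300:])
print(grid.best_params_["kernel"])
```

Here `rbf` is scikit-learn's name for the Gaussian kernel; `TimeSeriesSplit` keeps training folds strictly earlier than validation folds, which matters for serially dependent data.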

Cao and Tay [

An ANN is a classification or prediction model that imitates human neurons. The output of a simple ANN is generated by multiplying the input data by assigned weights. After comparing the output with the real values to be predicted, the weights are adjusted according to the error. The step in which the weights are multiplied by the input data is called forward propagation, and the step in which the error is calculated and the weights are adjusted is called backpropagation. The final goal of the ANN is to determine the weights that minimize the error between the predicted and target values.

A CNN is a machine-learning method that uses a neural network algorithm. It consists of convolution layers, pooling layers, and neural network layers. A convolution layer uses a "filter" to analyze data, typically vectorized image data. The filter moves over the entire input, analyzing small sections, and pooling layers summarize each section into a "feature" of the data.

An RNN is another representative neural network model with a special hidden layer. While a simple neural network adjusts its weights through backpropagation to reduce prediction errors, the hidden layer of an RNN is also modified by the hidden layer of the previous state: each time the algorithm operates, the current hidden layer affects the next one. Because of this characteristic, the RNN is well suited to analyzing and predicting nonlinear time-series data, such as stock prices. It operates in sequence over input and output data and can map one or more inputs to a single output or to multiple outputs. It returns an output at every hidden time-step layer and simultaneously passes that state as input to the next layer; we demonstrate the simplified structure in Figure

RNN cell.

The greatest difference between the RNN and the CNN or multilayer perceptron (MLP) is that the CNN and MLP do not consider previous-state data in later steps, whereas the RNN considers both the output of the previous state and the input of the present state. Furthermore, because it is optimized for sequential data, the RNN is widely used in text, audio, and visual data processing.

However, the RNN suffers from a vanishing gradient problem in long backpropagation processes. The RNN algorithm is based on gradient descent and modifies its weights at each time-step after one forward propagation pass. The weights are updated using error gradients, and under repeated backpropagation through time these gradients can rapidly converge to zero; this is called the vanishing gradient problem. To solve this problem for long-term time-series data, LSTM is widely used.
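A single vanilla RNN step can be sketched in NumPy as follows (the dimensions and weights are arbitrary illustrations); the repeated multiplication by the recurrent weights inside tanh is also where the shrinking gradients originate:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One vanilla RNN step: the new hidden state mixes the current
    input with the previous hidden state, then squashes through tanh."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W_xh = rng.normal(0.0, 0.5, (n_in, n_hid))
W_hh = rng.normal(0.0, 0.5, (n_hid, n_hid))
b_h = np.zeros(n_hid)

# Unrolling over a sequence: each hidden state feeds the next step,
# which is how the network carries past information forward.
h = np.zeros(n_hid)
for x_t in rng.normal(size=(6, n_in)):
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
print(h.shape)
```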

To solve the vanishing gradient problem of RNN, Hochreiter and Schmidhuber [

LSTM cell.

Input data

The processes performed by each gate are expressed as follows:

To develop an LSTM model, we must assign the initial values of
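In outline, the gate computations described above can be sketched in NumPy; this is a generic LSTM cell with stacked gate parameters, not the paper's exact implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, and b hold the stacked parameters of the
    forget (f), input (i), candidate (g), and output (o) gates."""
    z = x_t @ W + h_prev @ U + b
    f, i, g, o = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)
    c = f * c_prev + i * np.tanh(g)   # cell state keeps long-term memory
    h = o * np.tanh(c)                # hidden state is the gated output
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.normal(0.0, 0.5, (n_in, 4 * n_hid))
U = rng.normal(0.0, 0.5, (n_hid, 4 * n_hid))
b = np.zeros(4 * n_hid)

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(6, n_in)):
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)
```

The gates decide which information to forget, which to store, and which to emit, which is how the LSTM keeps only significant information across long sequences.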

GMDH is a machine-learning method based on the principle of heuristic self-organization, proposed by Ivakhnenko [

Suppose that there is a set of

We train a GMDH network to predict the output

Now, the GMDH network is determined by minimizing the squared sum of differences between sample outputs and model predictions, that is,

The general connection between input and output variables can be expressed by a series of Volterra functions:

In this study, we use the second-order polynomial function of two variables, which is written as

The main objective of the GMDH network is to build the general mathematical relation between the inputs and output variables given in equation (

These parameters can be obtained from multiple regression using the least squares method, and we can compute them by solving some matrix equations. Refer to [

GMDH cell.
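The least-squares step for a single GMDH neuron, with the second-order polynomial of two input variables described above, can be sketched as follows (the synthetic target is an assumption chosen so that the polynomial can represent it exactly):

```python
import numpy as np

def gmdh_neuron_fit(x1, x2, y):
    """Fit the second-order GMDH polynomial
    y = a0 + a1*x1 + a2*x2 + a3*x1^2 + a4*x2^2 + a5*x1*x2
    by ordinary least squares."""
    A = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=200), rng.normal(size=200)
# Synthetic target that the polynomial can represent exactly.
y = 1.0 + 2.0 * x1 - 0.5 * x2 + 0.3 * x1**2 + 0.1 * x1 * x2
coef = gmdh_neuron_fit(x1, x2, y)
print(np.round(coef, 3))  # should recover the true coefficients
```

A full GMDH network repeats this fit for every pair of inputs in a layer, keeps the best neurons by the squared-error criterion, and stacks layers until the criterion stops improving.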

Hyperparameter optimization is the problem of determining the values of hyperparameters, which must be set before training, that maximize the generalization performance of the trained model. In a deep-learning model, for example, the learning rate and batch size are hyperparameters, and hyperparameters that determine the structure of the model, such as the number of layers and the convolution filter size, can also be included in the search. Typical approaches include manual search, grid search, and random search.

Manual search is a method in which users set hyperparameters individually and compare performances according to their intuition. After selecting candidate hyperparameter values and training with them, the performance measured on the validation dataset is recorded, and this process is repeated several times to select the hyperparameter values with the highest performance. This is the most intuitive method; however, it has some problems. First, it is difficult to ensure that the selected hyperparameter values are actually optimal, because the search is guided by the user's choices. Second, the problem becomes more complicated when several types of hyperparameters are searched at once: because some hyperparameters interact with others, intuition about each hyperparameter in isolation is difficult to apply.

Grid search is a method of selecting candidate hyperparameter values within a specific section to be searched at regular intervals, recording the performance results measured for each of them, and selecting the hyperparameter values that demonstrated the highest performance (see Hsu et al. [

Random search (see Bergstra and Bengio [
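Both strategies can be sketched with scikit-learn on a synthetic regression problem (the estimator choice and parameter ranges are illustrative assumptions):

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 5))
y = X[:, 0] + 0.1 * rng.normal(size=120)

# Grid search: evaluate every combination on a fixed lattice of values.
grid = GridSearchCV(SVR(kernel="rbf"),
                    {"C": [0.1, 1.0, 10.0], "epsilon": [0.01, 0.1]}, cv=3)
grid.fit(X, y)

# Random search: sample the same space at random points instead.
rand = RandomizedSearchCV(SVR(kernel="rbf"),
                          {"C": loguniform(0.1, 10.0),
                           "epsilon": loguniform(0.01, 0.1)},
                          n_iter=6, cv=3, random_state=0)
rand.fit(X, y)
print(grid.best_params_, rand.best_params_)
```

With the same budget, random search covers more distinct values per hyperparameter, which is why it often finds better settings when only a few hyperparameters actually matter.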

The grid search and random search algorithms are illustrated in Figure

Comparison between (a) grid search and (b) random search.

We used 2886 daily time-series observations of the CDS term structure from October 2008 to October 2019. Because international financial markets were unstable from 2008 to 2011, we divided these data into two subperiods and measured the forecasting performance of the five methods on both high-variance and relatively low-variance data. The first training dataset runs from 1st October 2008 to 22nd January 2019 (full period), the second from 1st October 2008 to 9th September 2011 (subperiod 1), and the third from 2nd January 2012 to 22nd January 2019 (subperiod 2). As the test dataset, we selected the last 200 days (23rd January 2019 to 29th October 2019, test dataset 1) for each maturity for the full period and subperiod 2 training sets, and the last 80 days (12th September 2011 to 30th December 2011, test dataset 2) for the subperiod 1 training set. There is a gap between subperiod 1 and subperiod 2 because test dataset 2 immediately follows the subperiod 1 training set. All these cases are summarized in Table

Descriptions of the training and test datasets.

| Case | Training set | Test dataset |
|---|---|---|
| Case 1 | Full period (2008/10/01–2019/01/22) | Test dataset 1 (2019/01/23–2019/10/29) |
| Case 2 | Subperiod 1 (2008/10/01–2011/09/09) | Test dataset 2 (2011/09/12–2011/12/30) |
| Case 3 | Subperiod 2 (2012/01/02–2019/01/22) | Test dataset 1 (2019/01/23–2019/10/29) |
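The three cases reduce to date slices of the daily series; the frame below uses stand-in numbers, since the Datastream data are proprietary:

```python
import numpy as np
import pandas as pd

# Hypothetical frame of daily CDS spreads indexed by business day (stand-in data).
dates = pd.bdate_range("2008-10-01", "2019-10-29")
df = pd.DataFrame({"CDS5Y": np.linspace(22.0, 9.0, len(dates))}, index=dates)

# The three training/test cases described in the text, as (train, test) slices.
cases = {
    "case1": (df.loc["2008-10-01":"2019-01-22"], df.loc["2019-01-23":"2019-10-29"]),
    "case2": (df.loc["2008-10-01":"2011-09-09"], df.loc["2011-09-12":"2011-12-30"]),
    "case3": (df.loc["2012-01-02":"2019-01-22"], df.loc["2019-01-23":"2019-10-29"]),
}
for name, (train, test) in cases.items():
    print(name, len(train), len(test))
```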

Summary statistics for test dataset 1 for the full period and subperiod 2 training sets.

| | CDS6M | CDS1Y | CDS2Y | CDS3Y | CDS4Y | CDS5Y | CDS7Y | CDS10Y | CDS20Y | CDS30Y |
|---|---|---|---|---|---|---|---|---|---|---|
| Mean | 5.18 | 5.07 | 5.42 | 6.32 | 7.52 | 9.25 | 13.22 | 15.93 | 18.39 | 18.76 |
| Std. dev. | 1.84 | 1.85 | 1.92 | 1.86 | 1.78 | 1.56 | 1.21 | 1.52 | 1.43 | 1.89 |

Summary statistics for test dataset 2 for the subperiod 1 training set.

| | CDS6M | CDS1Y | CDS2Y | CDS3Y | CDS4Y | CDS5Y | CDS7Y | CDS10Y | CDS20Y | CDS30Y |
|---|---|---|---|---|---|---|---|---|---|---|
| Mean | 13.99 | 14.44 | 19.74 | 25.55 | 30.83 | 35.83 | 40.10 | 44.76 | 48.17 | 49.23 |
| Std. dev. | 3.77 | 4.02 | 3.85 | 3.73 | 3.79 | 4.44 | 4.98 | 4.63 | 3.42 | 3.20 |

Predictions of each model and target CDS term structure from six months to five years maturity for test dataset 1 with the full period training set (case 1).

Predictions of each model and target CDS term structure from seven years to 30 years maturity for test dataset 1 with the full period training set (case 1).

Predictions of each model and target CDS term structure from six months to five years maturity for test dataset 2 with the subperiod 1 training set (case 2).

Predictions of each model and target CDS term structure from seven years to 30 years maturity for test dataset 2 with the subperiod 1 training set (case 2).

Predictions of each model and target CDS term structure from six months to five years maturity for test dataset 1 with the subperiod 2 training set (case 3).

Predictions of each model and target CDS term structure from seven years to 30 years maturity for test dataset 1 with the subperiod 2 training set (case 3).

Our main findings can be summarized as follows: first, as shown in Figures

Error of each method for all maturities from Table

Error of each method for all maturities from Table

Error of each method for all maturities from Table

Error statistics of each method for all maturities for test dataset 1 with the full period training set (case 1).

| Type | Method | 6M | 1Y | 2Y | 3Y | 4Y | 5Y | 7Y | 10Y | 20Y | 30Y | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RMSE | N-S | 2.33 | 2.06 | 2.54 | 3.07 | 3.30 | 2.88 | 1.39 | 1.47 | 1.97 | 1.92 | 2.29 |
| | RNN | 1.70 | 1.71 | 1.83 | 1.76 | 1.66 | 1.45 | 0.95 | 1.10 | 1.10 | 1.64 | 1.49 |
| | LSTM | 2.09 | 1.89 | 2.31 | 2.71 | 1.91 | 1.28 | 0.87 | 1.10 | 1.08 | 1.70 | 1.69 |
| | SVR | 1.75 | 1.78 | 1.87 | 1.81 | 1.72 | 1.49 | 0.99 | 1.19 | 1.15 | 1.70 | 1.55 |
| | GMDH | 1.63 | 1.67 | 1.73 | 1.68 | 1.59 | 1.35 | 0.90 | 1.08 | 1.08 | 1.61 | 1.43 |
| MSE | N-S | 5.41 | 4.22 | 6.45 | 9.45 | 10.88 | 8.32 | 1.94 | 2.17 | 3.86 | 3.68 | 5.64 |
| | RNN | 2.88 | 2.93 | 3.34 | 3.09 | 2.77 | 2.09 | 0.90 | 1.21 | 1.21 | 2.71 | 2.31 |
| | LSTM | 4.38 | 3.58 | 5.33 | 7.33 | 3.68 | 1.63 | 0.76 | 1.22 | 1.17 | 2.89 | 3.20 |
| | SVR | 3.08 | 3.17 | 3.50 | 3.27 | 2.97 | 2.22 | 0.99 | 1.41 | 1.31 | 2.89 | 2.48 |
| | GMDH | 2.67 | 2.80 | 2.98 | 2.81 | 2.53 | 1.82 | 0.81 | 1.16 | 1.16 | 2.60 | 2.13 |
| MAPE (%) | N-S | 37.30 | 32.47 | 46.24 | 48.08 | 43.09 | 29.50 | 7.89 | 6.82 | 8.95 | 6.51 | 26.69 |
| | RNN | 12.56 | 12.59 | 12.38 | 9.65 | 8.07 | 6.18 | 3.40 | 3.50 | 2.81 | 4.33 | 7.55 |
| | LSTM | 15.94 | 11.40 | 31.72 | 30.77 | 14.62 | 5.98 | 3.56 | 3.64 | 3.75 | 3.89 | 12.53 |
| | SVR | 12.18 | 12.89 | 11.93 | 9.78 | 8.50 | 6.34 | 3.47 | 3.81 | 3.09 | 3.67 | 7.57 |
| | GMDH | 11.88 | 12.28 | 11.03 | 9.63 | 7.96 | 5.90 | 3.48 | 3.44 | 3.06 | 4.11 | 7.28 |
| MPE (%) | N-S | −19.03 | −29.26 | −43.62 | −45.40 | −40.38 | −26.86 | −3.79 | −0.13 | −2.76 | −1.55 | −21.28 |
| | RNN | −1.91 | −4.32 | −1.39 | −2.44 | −1.49 | −0.77 | −0.41 | 0.13 | −0.77 | −2.26 | −1.56 |
| | LSTM | 3.57 | −2.54 | −6.23 | −0.68 | 2.90 | 1.16 | 0.43 | −0.42 | −0.87 | −1.22 | −0.39 |
| | SVR | −5.53 | −6.23 | −5.29 | −3.94 | −3.43 | −2.04 | −0.99 | −1.72 | 0.08 | −0.90 | −3.00 |
| | GMDH | −1.05 | −2.94 | −2.38 | −0.15 | −0.37 | −0.02 | 0.47 | 0.19 | 0.54 | 0.98 | −0.47 |
| MAE | N-S | 1.76 | 1.39 | 2.26 | 2.84 | 3.08 | 2.67 | 1.07 | 1.10 | 1.67 | 1.27 | 1.91 |
| | RNN | 0.70 | 0.67 | 0.75 | 0.69 | 0.68 | 0.62 | 0.46 | 0.56 | 0.52 | 0.82 | 0.65 |
| | LSTM | 1.08 | 0.74 | 1.22 | 1.63 | 1.11 | 0.61 | 0.48 | 0.57 | 0.69 | 0.75 | 0.89 |
| | SVR | 0.67 | 0.67 | 0.70 | 0.69 | 0.70 | 0.63 | 0.47 | 0.60 | 0.58 | 0.71 | 0.64 |
| | GMDH | 0.68 | 0.66 | 0.68 | 0.70 | 0.68 | 0.60 | 0.48 | 0.56 | 0.58 | 0.81 | 0.64 |

N-S: Nelson–Siegel.
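The five error measures in these tables can be computed as follows; the sign convention for MPE and MAPE (prediction minus actual, divided by actual) is our assumption, as the text does not spell it out:

```python
import numpy as np

def error_stats(y_true, y_pred):
    """The five error measures reported in the tables (MAPE/MPE in percent)."""
    e = y_pred - y_true
    return {
        "RMSE": float(np.sqrt(np.mean(e**2))),
        "MSE": float(np.mean(e**2)),
        "MAPE": float(np.mean(np.abs(e / y_true)) * 100),
        "MPE": float(np.mean(e / y_true) * 100),
        "MAE": float(np.mean(np.abs(e))),
    }

# Tiny worked example: errors of +1, -1, and 0 on spreads of 10, 20, and 25.
y_true = np.array([10.0, 20.0, 25.0])
y_pred = np.array([11.0, 19.0, 25.0])
stats = error_stats(y_true, y_pred)
print(stats)
```

Note that MPE keeps the sign of the errors, so the negative MPE values in the tables indicate that the models tend to underpredict under this convention.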

Error statistics of each method for all maturities for test dataset 2 with the subperiod 1 training set (case 2).

| Type | Method | 6M | 1Y | 2Y | 3Y | 4Y | 5Y | 7Y | 10Y | 20Y | 30Y | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RMSE | N-S | 3.75 | 4.10 | 4.11 | 4.00 | 4.29 | 5.27 | 5.29 | 4.95 | 4.03 | 3.71 | 4.35 |
| | RNN | 2.78 | 4.25 | 3.71 | 3.48 | 3.99 | 4.92 | 5.63 | 5.18 | 3.69 | 3.20 | 4.08 |
| | LSTM | 3.89 | 4.20 | 3.71 | 3.50 | 3.57 | 4.56 | 5.64 | 5.19 | 3.68 | 3.36 | 4.13 |
| | SVR | 4.04 | 4.39 | 3.77 | 3.46 | 4.02 | 5.10 | 5.85 | 5.37 | 3.63 | 3.30 | 4.29 |
| | GMDH | 3.92 | 4.18 | 3.63 | 3.30 | 3.78 | 4.70 | 5.44 | 5.00 | 3.62 | 3.36 | 4.09 |
| MSE | N-S | 14.03 | 16.80 | 16.90 | 16.00 | 18.36 | 27.82 | 27.99 | 24.51 | 16.25 | 13.75 | 19.24 |
| | RNN | 7.74 | 18.07 | 13.78 | 12.13 | 15.90 | 24.20 | 31.69 | 26.86 | 13.59 | 10.27 | 17.42 |
| | LSTM | 15.10 | 17.65 | 13.74 | 12.24 | 12.74 | 20.82 | 31.77 | 26.93 | 13.53 | 11.27 | 17.58 |
| | SVR | 16.33 | 19.30 | 14.22 | 11.95 | 16.16 | 25.98 | 34.26 | 28.86 | 13.18 | 10.87 | 19.11 |
| | GMDH | 15.36 | 17.43 | 13.19 | 10.88 | 14.30 | 22.11 | 29.62 | 25.03 | 13.07 | 11.32 | 17.23 |
| MAPE (%) | N-S | 28.13 | 20.83 | 13.93 | 11.42 | 12.08 | 13.47 | 11.50 | 9.20 | 6.08 | 6.83 | 13.35 |
| | RNN | 19.36 | 28.14 | 16.36 | 11.69 | 11.16 | 11.68 | 11.62 | 9.71 | 6.26 | 5.19 | 13.11 |
| | LSTM | 26.75 | 26.79 | 16.36 | 11.79 | 10.41 | 10.59 | 11.61 | 9.82 | 6.32 | 5.51 | 13.60 |
| | SVR | 27.32 | 28.52 | 16.40 | 11.48 | 11.14 | 12.07 | 12.27 | 10.11 | 6.19 | 5.36 | 14.09 |
| | GMDH | 26.44 | 26.58 | 15.65 | 10.83 | 10.48 | 11.16 | 11.48 | 9.44 | 6.16 | 5.46 | 13.37 |
| MPE (%) | N-S | −14.07 | 6.80 | 6.03 | 0.59 | −3.99 | −7.69 | −3.05 | −0.41 | 3.37 | −4.72 | −1.72 |
| | RNN | −13.08 | −8.39 | −4.84 | −3.65 | −3.37 | −2.35 | −1.26 | −1.28 | −0.74 | 0.27 | −3.87 |
| | LSTM | −6.75 | −4.96 | −5.07 | −3.58 | −4.29 | −1.35 | −1.20 | −1.85 | −1.26 | −0.71 | −3.10 |
| | SVR | −7.37 | −8.40 | −4.46 | −3.08 | −2.58 | −2.49 | −2.40 | −1.82 | −0.71 | −0.49 | −3.38 |
| | GMDH | −5.89 | −6.42 | −4.03 | −2.44 | −2.42 | −2.49 | −2.38 | −1.75 | −0.60 | −0.36 | −2.88 |
| MAE | N-S | 3.18 | 3.30 | 3.07 | 3.04 | 3.62 | 4.46 | 4.52 | 4.16 | 3.09 | 3.15 | 3.56 |
| | RNN | 2.28 | 3.58 | 3.02 | 2.84 | 3.34 | 4.13 | 4.68 | 4.34 | 2.99 | 2.54 | 3.37 |
| | LSTM | 3.33 | 3.57 | 3.01 | 2.86 | 3.11 | 3.76 | 4.68 | 4.36 | 3.00 | 2.66 | 3.43 |
| | SVR | 3.41 | 3.67 | 3.05 | 2.81 | 3.39 | 4.27 | 4.88 | 4.50 | 2.96 | 2.60 | 3.55 |
| | GMDH | 3.32 | 3.47 | 2.93 | 2.67 | 3.18 | 3.92 | 4.54 | 4.19 | 2.95 | 2.65 | 3.38 |

Error statistics of each method for all maturities for test dataset 1 with the subperiod 2 training set (case 3).

| Type | Method | 6M | 1Y | 2Y | 3Y | 4Y | 5Y | 7Y | 10Y | 20Y | 30Y | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RMSE | N-S | 2.50 | 2.18 | 2.57 | 3.06 | 3.26 | 2.83 | 1.39 | 1.54 | 1.96 | 1.94 | 2.32 |
| | RNN | 1.69 | 1.72 | 1.78 | 1.71 | 1.61 | 1.40 | 0.93 | 1.13 | 1.05 | 1.70 | 1.47 |
| | LSTM | 2.11 | 1.74 | 2.07 | 2.00 | 1.59 | 1.46 | 1.06 | 1.10 | 1.04 | 1.32 | 1.55 |
| | SVR | 1.76 | 1.78 | 1.84 | 1.77 | 1.69 | 1.48 | 0.97 | 1.14 | 1.09 | 1.68 | 1.52 |
| | GMDH | 1.45 | 1.58 | 1.63 | 1.65 | 1.57 | 1.24 | 0.84 | 1.00 | 1.04 | 1.56 | 1.36 |
| MSE | N-S | 6.24 | 4.75 | 6.60 | 9.36 | 10.62 | 8.00 | 1.93 | 2.37 | 3.83 | 3.75 | 5.74 |
| | RNN | 2.86 | 2.96 | 3.18 | 2.94 | 2.59 | 1.95 | 0.87 | 1.28 | 1.10 | 2.88 | 2.26 |
| | LSTM | 4.44 | 3.02 | 4.29 | 3.99 | 2.54 | 2.13 | 1.13 | 1.21 | 1.08 | 1.74 | 2.56 |
| | SVR | 3.11 | 3.18 | 3.41 | 3.14 | 2.84 | 2.19 | 0.95 | 1.31 | 1.20 | 2.84 | 2.42 |
| | GMDH | 2.11 | 2.51 | 2.64 | 2.73 | 2.47 | 1.55 | 0.71 | 1.00 | 1.09 | 2.42 | 1.92 |
| MAPE (%) | N-S | 30.77 | 22.06 | 31.04 | 31.76 | 29.36 | 22.11 | 7.47 | 7.49 | 9.08 | 6.22 | 19.74 |
| | RNN | 13.60 | 13.46 | 12.23 | 10.25 | 7.84 | 6.14 | 3.29 | 4.04 | 2.96 | 5.45 | 7.93 |
| | LSTM | 19.34 | 12.80 | 29.63 | 15.97 | 7.25 | 6.69 | 5.65 | 4.03 | 3.23 | 3.42 | 10.80 |
| | SVR | 12.32 | 13.05 | 11.73 | 9.49 | 8.05 | 6.28 | 3.68 | 3.39 | 2.84 | 3.97 | 7.48 |
| | GMDH | 10.88 | 10.80 | 9.69 | 9.11 | 7.94 | 5.25 | 3.22 | 3.34 | 2.69 | 3.60 | 6.65 |
| MPE (%) | N-S | 4.40 | 17.04 | 27.21 | 28.27 | 25.98 | 18.84 | 1.88 | −1.95 | 0.03 | −1.39 | 12.03 |
| | RNN | 1.00 | −1.82 | −1.04 | 0.59 | −1.36 | −0.15 | −0.75 | 1.24 | 0.47 | 3.23 | 0.14 |
| | LSTM | −0.68 | −3.77 | −16.35 | 7.06 | −0.69 | −0.96 | 2.42 | −0.66 | 0.54 | −1.07 | −1.42 |
| | SVR | −5.49 | −6.74 | −5.55 | −3.33 | −2.07 | −0.92 | −1.52 | −0.66 | −0.28 | −1.56 | −2.81 |
| | GMDH | 1.14 | −4.17 | −3.01 | −3.82 | −3.24 | −0.73 | 0.34 | −1.66 | −0.29 | 0.19 | −1.53 |
| MAE | N-S | 1.96 | 1.48 | 2.29 | 2.82 | 3.04 | 2.60 | 1.05 | 1.15 | 1.69 | 1.17 | 1.93 |
| | RNN | 0.77 | 0.72 | 0.74 | 0.74 | 0.66 | 0.62 | 0.45 | 0.66 | 0.56 | 1.09 | 0.70 |
| | LSTM | 1.01 | 0.73 | 0.95 | 1.19 | 0.64 | 0.69 | 0.75 | 0.62 | 0.60 | 0.68 | 0.79 |
| | SVR | 0.67 | 0.67 | 0.69 | 0.68 | 0.68 | 0.63 | 0.49 | 0.54 | 0.53 | 0.76 | 0.63 |
| | GMDH | 0.64 | 0.60 | 0.61 | 0.66 | 0.66 | 0.53 | 0.44 | 0.52 | 0.50 | 0.71 | 0.59 |

The purpose of this study is to compare the predictions of the CDS term structure produced by the Nelson–Siegel, RNN, LSTM, SVR, and GMDH models and to determine the most suitable model for predicting time-series data, particularly the CDS term structure. Because the CDS spread is a default risk index for a country or company, this study not only identifies the best-performing time-series forecasting model but also offers a way to predict future risk.

Existing studies on the prediction of the CDS term structure and other risk indicators using machine-learning models remain few; most focus on stock price prediction. This study is significant because it demonstrates that various machine-learning models can be applied to other time-series data, and further research on various time-series data using machine-learning models is expected. This study also confirms that data-driven methods, such as RNN, LSTM, SVR, and GMDH, outperform the model-driven Nelson–Siegel method, which is usually used to analyze the CDS term structure. The performance of model-driven methods can decline when the data contain a significant number of outliers, because such methods depend on the assumption that the dataset follows a specific functional form. In our dataset, the presence of outliers made prediction with model-driven methods difficult. By contrast, data-driven methods were not affected by outliers (see Solomatine et al. [

Some studies show that linear models such as AR are better than ANNs [

Based on the empirical findings given in

Our findings can help investors and policymakers analyze the risk of companies or countries. The CDS spread is an index that represents the probability of credit default; thus, this study offers a measure to predict future risk. For instance, Zghal et al. [

Future studies should apply this same experiment to datasets other than CDS data for comparing the forecasting performance of model-driven and data-driven methods, such as the implied volatility surface. The implied volatility surface is a fundamental concept for pricing various financial derivatives. Therefore, for a long time, many researchers have been working on it, and various models have been developed [

The data used to support the findings of this study are available from the corresponding author upon request.

The authors declare that there are no conflicts of interest regarding the publication of this paper.

The authors are grateful to the editor Baogui Xin for the valuable comments which helped to significantly improve this paper. This work was supported by the Gachon University Research Fund of 2018 (GCU-2018-0295) and by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (no. 2019R1G1A1010278).