Construction and Application Research of Isomap-RVM Credit Assessment Model

Credit assessment is the basis and premise of credit risk management systems. Accurate and scientific credit assessment is of great significance to the operational decisions of shareholders, corporate creditors, and management, and a reliable credit assessment model is the key to it. Traditional credit assessment models combine the support vector machine (SVM) with a traditional dimensionality reduction algorithm: the dimensionality reduction algorithm is first applied to the samples so that their characteristic indices are not too highly correlated, and the SVM is then trained on the reduced samples to carry out the classification assessment. To further improve the accuracy of credit assessment, this paper applies isometric feature mapping (Isomap) for dimensionality reduction and the relevance vector machine (RVM) for credit classification. The resulting Isomap-RVM model is used to analyze the financial data of China's listed companies. The empirical analysis shows that the credit assessment accuracy of the Isomap-RVM model is significantly higher than that of the Isomap-SVM model and slightly higher than that of the PCA-RVM model, and that the model can correctly identify the credit risks of listed companies.


Introduction
By constructing an accurate and reliable credit assessment model, we can conduct an in-depth analysis of the financial data of listed companies and identify the financial risks of such companies. This is of great significance to the operational decisions of shareholders, corporate creditors, and management. Along with the development of big data methods, the theory of applying machine learning methods to construct credit assessment models has become increasingly reliable. Compared with traditional multivariate discriminant analysis methods and logistic discriminant analysis models, machine learning methods impose less stringent assumptions on the data and handle nonlinear relationships better. At present, commonly used machine learning methods with good data classification results include the neural network (NN) [1] and the support vector machine (SVM). Odom and Sharda [2] built an early warning model for financial crises by applying the neural network and compared this model with the Fisher multivariate discriminant analysis model. The research results showed that the artificial neural network has higher prediction accuracy and robustness. However, the neural network has methodological defects, such as a slow convergence rate, overfitting, and falling into local minima [3,4]. Later, Cortes and Vapnik [5] proposed the support vector machine method, which is based on statistical learning theory. This method stresses structural risk minimization and is able to effectively overcome the defects of the neural network. Min and Lee [6] applied the SVM to build a financial early warning model for corporate bankruptcy prediction. The results showed that the SVM has higher discriminant analysis accuracy than the BP neural network, MDA, and logit models. However, SVM has the following main defects: the penalty parameter C must be determined in the model building process, and the selection of kernel function must comply
with "Mercer's theorem" [7]. For these reasons, this paper suggests using a relevance vector machine (RVM) to overcome the defects of SVM. RVM is another efficient supervised learning method, proposed by Tipping [8]. Because learning is carried out within a Bayesian framework, the resulting model is sparser than that of SVM and also yields a probabilistic output. While retaining the advantages of SVM, this method reduces the inaccurate assignment of key parameters, broadens the application scope of vector machines, provides a greater degree of freedom, and effectively overcomes the defects of SVM [9]. This helps to improve the accuracy of vector machine classification.
In credit risk assessment samples, there tends to be a close correlation between the selected financial risk characteristic indices. High dimensionality and high correlation of the sample characteristic indices may have a strong impact on the accuracy of risk assessment. Therefore, a data dimensionality reduction method is required for the pretreatment of the sample indices, so as to reflect the main features of the data as much as possible and reduce the correlation between the characteristic indices. At present, principal component analysis (PCA) is one of the most widely used methods [10]. However, as a linear dimensionality reduction method, PCA may not achieve satisfactory dimensionality reduction results when applied to nonlinear data. Therefore, this paper applies a nonlinear dimensionality reduction method, isometric mapping (Isomap), to conduct dimensionality reduction pretreatment on the sample data. Isomap is a nonlinear manifold learning algorithm proposed by Tenenbaum et al. [11]. By seeking a low-dimensional embedding of a high-dimensional manifold, this algorithm preserves in the embedding the neighborhood structure between the high-dimensional data points, while offering excellent robustness and global optimality. Lin et al. [12] used Isomap-SVM, PCA-SVM, and SVM separately to conduct risk assessment classification of more than one hundred listed Taiwan companies and found that Isomap-SVM has the highest prediction accuracy. This research showed that, in nonlinear data classification, Isomap can improve accuracy through reasonable dimensionality reduction. Ribeiro et al.
[13] constructed a Semi-Supervised Isomap model with Isomap and SVM. They used the Semi-Supervised Isomap model, SVM, RVM, and KNN separately to conduct bankruptcy prediction of more than one thousand French industrial companies. The results showed that the classification accuracy of the Semi-Supervised Isomap model is comparable to that of SVM and RVM. However, the study did not propose constructing a new model by combining Isomap with other machine learning methods such as RVM.
By applying the Isomap method to reduce the dimensions and the correlation between sample characteristic indices, and by using the RVM to conduct the classification, this paper builds an Isomap-RVM model and uses it to conduct a credit assessment classification analysis of China's listed companies. It then compares this model with the Isomap-SVM and PCA-RVM models and examines the discriminant analysis accuracy of the Isomap-RVM model.

Isomap-RVM Model Construction
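2.1. RVM Classification Method. Assume a training set of $N$ samples whose characteristic indices constitute the vector set $\{\mathbf{x}_i\}$, $i = 1, 2, \ldots, N$; the corresponding output target value of the training set is $\mathbf{t} = [t_1, t_2, \ldots, t_N]^T$, where each $t_i$ may only be either 0 or 1, corresponding to the two categories. As with the support vector machine, the nonlinear function of $\mathbf{x}$ is expressed as

$$y(\mathbf{x}; \mathbf{w}) = \sum_{i=1}^{N} w_i K(\mathbf{x}, \mathbf{x}_i) + w_0, \tag{1}$$

where $\mathbf{w} = [w_0, w_1, \ldots, w_N]^T$ stands for the weight, while $K(\mathbf{x}, \mathbf{x}_i)$ stands for the kernel function selected. There are four commonly used kernel functions, as follows.

Linear kernel:

$$K(\mathbf{x}, \mathbf{x}_i) = \mathbf{x}^T \mathbf{x}_i. \tag{2}$$

Polynomial kernel:

$$K(\mathbf{x}, \mathbf{x}_i) = (\gamma\, \mathbf{x}^T \mathbf{x}_i + r)^p, \quad \gamma > 0. \tag{3}$$

Neural kernel:

$$K(\mathbf{x}, \mathbf{x}_i) = \tanh(\gamma\, \mathbf{x}^T \mathbf{x}_i + r). \tag{4}$$

RBF kernel:

$$K(\mathbf{x}, \mathbf{x}_i) = \exp\left(-\frac{\|\mathbf{x} - \mathbf{x}_i\|^2}{\sigma^2}\right). \tag{5}$$

Here $\gamma$, $r$, and $p$ are kernel parameters and $\sigma$ is the kernel width. In this study, the radial basis function (RBF) kernel is adopted because, compared with other kernel functions, it can conduct nonlinear mapping without substantially increasing the complexity of the model.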
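As an illustration of the four kernel functions listed above, the following is a minimal Python sketch. The parameter names `gamma`, `r`, `p`, and `sigma`, their default values, and the exact RBF parameterization (squared distance divided by the squared kernel width) are assumptions made for this example and are not taken from the paper.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(x, z):
    return dot(x, z)

def polynomial_kernel(x, z, gamma=1.0, r=1.0, p=2):
    # gamma, r, p are illustrative kernel parameters (assumed defaults)
    return (gamma * dot(x, z) + r) ** p

def neural_kernel(x, z, gamma=1.0, r=0.0):
    # Also known as the sigmoid (tanh) kernel
    return math.tanh(gamma * dot(x, z) + r)

def rbf_kernel(x, z, sigma=1.0):
    # sigma plays the role of the kernel width (assumed parameterization)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / sigma ** 2)
```

The RBF kernel depends only on the distance between the two vectors, so a single width parameter controls how local the mapping is, which is the parameter later tuned by the genetic algorithm.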
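After constructing $y(\mathbf{x}; \mathbf{w})$, the correspondence between $\mathbf{t}$ and $\mathbf{x}$ can be expressed by the following sigmoid function:

$$p(t_i = 1 \mid \mathbf{w}) = \sigma\big(y(\mathbf{x}_i; \mathbf{w})\big) = \frac{1}{1 + e^{-y(\mathbf{x}_i; \mathbf{w})}}. \tag{6}$$

Then, provided that the weight $\mathbf{w}$ is known, the probability of $\mathbf{t}$ is as follows:

$$p(\mathbf{t} \mid \mathbf{w}) = \prod_{i=1}^{N} \sigma\big(y(\mathbf{x}_i; \mathbf{w})\big)^{t_i} \big[1 - \sigma\big(y(\mathbf{x}_i; \mathbf{w})\big)\big]^{1 - t_i}.$$

After calculating the best weight $\mathbf{w}_{MP}$, substitute $\mathbf{w}_{MP}$ and the testing-set characteristic index $\mathbf{x}_*$ into (6) to obtain the probability of the testing-set output target value $t_*$.
To avoid overlearning due to the fact that most weights are not zero, it can be assumed that each weight is subject to a zero-mean normal distribution:

$$w_i \sim N(0, \alpha_i^{-1}), \quad i = 0, 1, \ldots, N,$$

where $\boldsymbol{\alpha} = [\alpha_0, \alpha_1, \ldots, \alpha_N]^T$.
The prior probability of the weight is expressed as

$$p(\mathbf{w} \mid \boldsymbol{\alpha}) = \prod_{i=0}^{N} \sqrt{\frac{\alpha_i}{2\pi}} \exp\left(-\frac{\alpha_i w_i^2}{2}\right).$$

The posterior probability $p(\mathbf{w} \mid \mathbf{t}, \boldsymbol{\alpha})$ of the weight is expressed as

$$p(\mathbf{w} \mid \mathbf{t}, \boldsymbol{\alpha}) = \frac{p(\mathbf{t} \mid \mathbf{w})\, p(\mathbf{w} \mid \boldsymbol{\alpha})}{p(\mathbf{t} \mid \boldsymbol{\alpha})}.$$

The best weight $\mathbf{w}_{MP}$ is the weight that maximizes the posterior probability $p(\mathbf{w} \mid \mathbf{t}, \boldsymbol{\alpha})$. Provided that $\boldsymbol{\alpha}$ is known, $\mathbf{w}_{MP}$ can be expressed as follows:

$$\mathbf{w}_{MP} = \arg\max_{\mathbf{w}} p(\mathbf{w} \mid \mathbf{t}, \boldsymbol{\alpha}) = \arg\max_{\mathbf{w}} \log\big[p(\mathbf{t} \mid \mathbf{w})\, p(\mathbf{w} \mid \boldsymbol{\alpha})\big].$$

Let

$$y_i = \sigma\big(y(\mathbf{x}_i; \mathbf{w})\big).$$

Then $\log[p(\mathbf{t} \mid \mathbf{w})\, p(\mathbf{w} \mid \boldsymbol{\alpha})]$ is expressed as follows:

$$\log\big[p(\mathbf{t} \mid \mathbf{w})\, p(\mathbf{w} \mid \boldsymbol{\alpha})\big] = \sum_{i=1}^{N} \big[t_i \log y_i + (1 - t_i) \log(1 - y_i)\big] - \frac{1}{2} \mathbf{w}^T \mathbf{A} \mathbf{w} + \text{const}, \tag{14}$$

where $\mathbf{A} = \operatorname{diag}(\alpha_0, \alpha_1, \ldots, \alpha_N)$ and the constant does not depend on $\mathbf{w}$.
Use the Newton iteration method to seek $\mathbf{w}_{MP}$ in (14):

$$\Delta\mathbf{w} = \big(\boldsymbol{\Phi}^T \mathbf{B} \boldsymbol{\Phi} + \mathbf{A}\big)^{-1} \big[\boldsymbol{\Phi}^T (\mathbf{t} - \mathbf{y}) - \mathbf{A}\mathbf{w}\big], \tag{15}$$

where $\boldsymbol{\Phi}$ is the $N \times (N + 1)$ matrix whose $i$th row is $[1, K(\mathbf{x}_i, \mathbf{x}_1), \ldots, K(\mathbf{x}_i, \mathbf{x}_N)]$, $\mathbf{y} = [y_1, y_2, \ldots, y_N]^T$, and $\mathbf{B} = \operatorname{diag}\big(y_1(1 - y_1), \ldots, y_N(1 - y_N)\big)$. When $\Delta\mathbf{w}$ is obtained, $\mathbf{w}_{MP}$ and $\boldsymbol{\alpha}$ should be updated by the formula

$$\mathbf{w}_{MP}^{new} = \mathbf{w}_{MP} + \Delta\mathbf{w}, \qquad \alpha_i^{new} = \frac{1 - \alpha_i \Sigma_{ii}}{w_{MP,i}^2}, \qquad \boldsymbol{\Sigma} = \big(\boldsymbol{\Phi}^T \mathbf{B} \boldsymbol{\Phi} + \mathbf{A}\big)^{-1}.$$

Substitute the new values back into (15) to calculate $\Delta\mathbf{w}$ repeatedly until $\Delta\mathbf{w}$ falls below a preset limit; then substitute $\mathbf{w}_{MP}$ into (6) to calculate the classification probability $p(t_i = 1 \mid \mathbf{w})$ and determine the sample category based on this probability.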
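The iteration described above can be sketched in Python with NumPy as follows. This is a minimal illustration under several assumptions that are not stated in the paper: the RBF kernel is parameterized as $\exp(-\|\mathbf{x} - \mathbf{x}_i\|^2 / \text{width}^2)$, the hyperparameters are re-estimated with the standard RVM rule $\alpha_i = (1 - \alpha_i \Sigma_{ii}) / w_i^2$, a step-halving safeguard is added to the Newton update, and the hyperparameters are clipped to $[10^{-4}, 10^{6}]$ for numerical stability. The function names are hypothetical, and the sketch is not the MATLAB toolbox code used in the paper.

```python
import numpy as np

def design_matrix(X, centers, width):
    """Rows [1, K(x, c_1), ..., K(x, c_N)] with an RBF kernel (assumed form)."""
    sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.hstack([np.ones((X.shape[0], 1)), np.exp(-sq / width ** 2)])

def sigmoid(a):
    # Clip the argument to avoid overflow in exp
    return 1.0 / (1.0 + np.exp(-np.clip(a, -500, 500)))

def log_posterior(w, Phi, t, alpha):
    """Log posterior of the weights, up to a constant independent of w."""
    y = np.clip(sigmoid(Phi @ w), 1e-12, 1 - 1e-12)
    return np.sum(t * np.log(y) + (1 - t) * np.log(1 - y)) - 0.5 * np.sum(alpha * w ** 2)

def fit_rvm(X, t, width=1.0, outer_iters=20, newton_iters=25, tol=1e-6):
    Phi = design_matrix(X, X, width)
    w = np.zeros(Phi.shape[1])
    alpha = np.ones(Phi.shape[1])
    for _ in range(outer_iters):
        # Newton iterations for the most probable weights at fixed alpha
        for _ in range(newton_iters):
            y = sigmoid(Phi @ w)
            grad = Phi.T @ (t - y) - alpha * w
            hess = Phi.T @ (Phi * (y * (1 - y))[:, None]) + np.diag(alpha)
            dw = np.linalg.solve(hess, grad)
            # Step halving (an added safeguard, not part of the paper's text)
            step = 1.0
            base = log_posterior(w, Phi, t, alpha)
            while step > 1e-8 and log_posterior(w + step * dw, Phi, t, alpha) < base:
                step *= 0.5
            w = w + step * dw
            if np.max(np.abs(step * dw)) < tol:
                break
        # Re-estimate the hyperparameters from the Laplace approximation
        y = sigmoid(Phi @ w)
        cov = np.linalg.inv(Phi.T @ (Phi * (y * (1 - y))[:, None]) + np.diag(alpha))
        gamma = np.clip(1.0 - alpha * np.diag(cov), 0.0, 1.0)
        alpha = np.clip(gamma / (w ** 2 + 1e-12), 1e-4, 1e6)
    return w, alpha

def predict_proba(X_new, X_train, w, width=1.0):
    """Probability that each new sample belongs to category 1."""
    return sigmoid(design_matrix(X_new, X_train, width) @ w)
```

A sample would then be assigned to category 1 when the returned probability exceeds 0.5. Weights whose hyperparameter reaches the upper clipping bound are effectively pruned, which is the source of the sparsity of the model.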

2.2. Dimensionality Reduction Method Based on Isomap Manifold Learning.
Isomap is developed on the basis of multidimensional scaling (MDS). It seeks the low-dimensional coordinates embedded in the high-dimensional space and achieves dimensionality reduction by constructing the shortest path distance matrix for the high-dimensional sample data, under the assumption that the intrinsic geometric properties of the sample data remain unchanged [14]. The dimensionality reduction steps are shown below.
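Step 1. Construct a neighborhood graph $G$. Consider the characteristic index vector set $\{\mathbf{x}_i\}$, $i = 1, 2, \ldots, n$, for the total of $n$ samples in the training set and the testing set, where each original sample has $m$ characteristic indices, $\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{im})$. The Euclidean distance $d(\mathbf{x}_i, \mathbf{x}_j)$ between each $\mathbf{x}_i$ and each of the remaining vectors in the vector set can be calculated by the following formula:

$$d(\mathbf{x}_i, \mathbf{x}_j) = \sqrt{\sum_{l=1}^{m} (x_{il} - x_{jl})^2}.$$

When $\mathbf{x}_j$ is one of the $k$ vectors closest to $\mathbf{x}_i$, the two are deemed neighboring vectors. Each vector and its $k$ neighbors (that is, $k + 1$ vectors) are connected in the undirected graph $G$; the connection between $\mathbf{x}_i$ and a neighboring vector is an edge of $G$, and the edge weight is the distance between the two vectors that the edge joins.
Step 2. Construct the shortest path distance matrix. Apply Dijkstra's algorithm to calculate the shortest path distance between any two vectors on the neighborhood graph $G$.
First, for any two vectors $\mathbf{x}_i$ and $\mathbf{x}_j$, the edge weight is taken directly as the distance when they are joined by an edge, and the distance is set to $\infty$ when they are not; that is,

$$d_G(\mathbf{x}_i, \mathbf{x}_j) = \begin{cases} d(\mathbf{x}_i, \mathbf{x}_j), & \text{if } \mathbf{x}_i \text{ and } \mathbf{x}_j \text{ are joined by an edge}, \\ \infty, & \text{otherwise}. \end{cases}$$

Then, find all the indirect paths between $\mathbf{x}_i$ and $\mathbf{x}_j$ along the edges, where each path distance is the sum of the edge weights along the path (denoted as $p_1, p_2, \ldots, p_q$); next, compare all the path distances and take the shortest as $d_G(\mathbf{x}_i, \mathbf{x}_j)$:

$$d_G(\mathbf{x}_i, \mathbf{x}_j) = \min\{d_G(\mathbf{x}_i, \mathbf{x}_j), p_1, p_2, \ldots, p_q\}.$$

Use the squared value of the shortest path distance between each pair of vectors in $G$ to create the shortest path distance matrix

$$\mathbf{S} = \big[d_G^2(\mathbf{x}_i, \mathbf{x}_j)\big]_{n \times n}.$$

Step 3. Apply the classical MDS algorithm to compress each vector to a $d$-dimensional vector. Let

$$\mathbf{H} = \mathbf{I}_n - \frac{1}{n} \mathbf{1}\mathbf{1}^T,$$

where $\mathbf{I}_n$ is the $n \times n$ identity matrix and $\mathbf{1}$ is the all-ones column vector. Calculate the matrix $\boldsymbol{\tau}$:

$$\boldsymbol{\tau} = -\frac{1}{2} \mathbf{H} \mathbf{S} \mathbf{H}.$$

The eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_d$ corresponding to the $d$ largest eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_d$ of $\boldsymbol{\tau}$ constitute the eigenvector matrix $\mathbf{V} = [\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_d]$. Then, the dimensionality reduction result $\mathbf{Y}$ is as follows:

$$\mathbf{Y} = \operatorname{diag}\big(\lambda_1^{1/2}, \lambda_2^{1/2}, \ldots, \lambda_d^{1/2}\big) \mathbf{V}^T.$$

The dimensionality reduction error is

$$Er = 1 - R^2(\mathbf{D}_G, \mathbf{D}_Y),$$

where $R^2$ is the squared correlation coefficient; $\mathbf{D}_G = [d_G(\mathbf{x}_i, \mathbf{x}_j)]$ is the shortest path distance matrix; and $\mathbf{D}_Y$ is the Euclidean distance matrix of the vectors in the $d$-dimensional space.
The reduced dimensionality $d$ may be determined by observing the $Er$ error curve. When there is an inflection point on the error curve, or when $Er$ becomes stable and sufficiently small, the corresponding dimensionality is taken as the optimal dimensionality $d$.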
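The three steps above can be sketched in Python with NumPy as follows. This is a minimal illustration, not the IsomapR1 toolbox used in the paper. Two choices are assumptions of the sketch: the Floyd-Warshall algorithm is used in place of Dijkstra's algorithm to keep the code short (both yield the same shortest path distances), and the embedding is returned with one row per sample, that is, the transpose of the layout in the formula above. The function name `isomap` and its arguments are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    return np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))

def isomap(X, k, d):
    """Return (Y, Er): the n x d embedding and the residual error 1 - R^2.

    Assumes the k-nearest-neighbor graph is connected and that no two
    samples coincide; otherwise the result contains inf or nan values.
    """
    n = X.shape[0]
    D = pairwise_distances(X)

    # Step 1: undirected k-nearest-neighbor graph, inf where there is no edge
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nearest = np.argsort(D, axis=1)[:, 1:k + 1]  # column 0 is the sample itself
    for i in range(n):
        G[i, nearest[i]] = D[i, nearest[i]]
        G[nearest[i], i] = D[i, nearest[i]]

    # Step 2: all-pairs shortest paths (Floyd-Warshall instead of Dijkstra)
    for mid in range(n):
        G = np.minimum(G, G[:, [mid]] + G[[mid], :])

    # Step 3: classical MDS on the squared shortest path distances
    H = np.eye(n) - np.ones((n, n)) / n
    tau = -0.5 * H @ (G ** 2) @ H
    vals, vecs = np.linalg.eigh(tau)
    top = np.argsort(vals)[::-1][:d]
    Y = vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

    # Residual error between graph distances and embedded distances
    upper = np.triu_indices(n, 1)
    r = np.corrcoef(G[upper], pairwise_distances(Y)[upper])[0, 1]
    return Y, 1.0 - r ** 2
```

Calling the function for a range of values of `d` and recording the second return value gives the error curve from which the reduced dimensionality is read off.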

2.3. Isomap-RVM Model Classification Steps.
Step 1. Conduct normalized pretreatment of the sample data. Normalize the different characteristic indices to the interval [0, 1], so that the indices remain commensurable during computation and the computation is faster. For $n$ samples, the normalization formula for a characteristic index is

$$x'_i = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}},$$

where $i = 1, 2, \ldots, n$; $x'_i$ is the value of the normalized characteristic index of the $i$th sample; $x_i$ is the value of the original characteristic index of the $i$th sample; and $x_{\max}$ and $x_{\min}$ are the maximum and minimum values, respectively, of the characteristic index over all samples.
Step 2. Apply Isomap to reduce the dimensions of the normalized sample data, reduce the correlation between the sample characteristic indices, and improve the sample quality.
Step 3. Train the model on the training set data and apply the genetic algorithm to optimize the kernel width of RVM.
Step 4. Substitute the optimized kernel width, the training set data, and the characteristic indices of the testing set data into RVM for classification and obtain the final result.
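The normalization in Step 1 can be illustrated with a short Python sketch. The function name `min_max_normalize` is hypothetical, and mapping a constant index to 0 is an assumption made here to avoid division by zero; the paper does not discuss that case.

```python
def min_max_normalize(rows):
    """Scale each column (characteristic index) of rows into [0, 1].

    rows is a list of samples, each a list of index values.
    A column whose values are all equal is mapped to 0.0 (assumed convention).
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]
```

For example, a column with the values 10, 20, and 40 is mapped to 0, 1/3, and 1.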

Empirical Analysis
3.1. Samples and Indices. Special treatment (ST) is used to signify listed companies that have undergone a financial crisis in China's securities market. In research, ST companies are generally deemed to have poor credit standing, while non-ST companies are generally deemed to have normal credit standing. A listed company may be declared an ST company in year $t$ based on its financial report for year $t - 1$; hence, using the financial data of year $t - 1$ to predict whether the company will be declared an ST company in year $t$ would overestimate the assessment capability of the model. For this reason, this paper uses the financial data of year $t - 2$ to predict whether a listed company will be declared an ST company in year $t$. The raw data are derived from the CSMAR financial index database of Chinese listed companies. This paper selected 116 companies in this database that were subjected to ST between 2009 and 2013, excluding those declared ST companies for nonfinancial reasons. Non-ST sample companies in the same industry and the corresponding year were then selected at a ratio of 1 : 3, giving 348 non-ST companies. Altogether, 464 sample companies were chosen for the empirical analysis. The sample size is shown in Table 1. The sample category label is 0 or 1, where 0 means that the sample is an ST sample and 1 means that it is a non-ST sample. The sample characteristic indices are derived from the financial index data two years before the corresponding year (i.e., 2007-2011) of each sample. This paper selected 25 characteristic indices for estimating a company's credit standing (as shown in Table 2), covering the following 8 aspects of the listed companies: cash flow capacity, return on equity, earning capacity, operating capacity, growth capacity, risk level, short-term solvency, and long-term solvency [15-18].
The 464 samples were randomly grouped into the training set and testing set.This paper classified 320 samples into the training set (accounting for about 69% of the total sample size), including 80 ST companies and 240 non-ST companies.144 samples were classified into the testing set (accounting for about 31% of the total sample size), including 36 ST companies and 108 non-ST companies.
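The grouping described above can be reproduced in outline with the following Python sketch. The paper states only that the samples were grouped randomly and reports the resulting class counts; drawing a fixed number of training samples per class, the function name `stratified_split`, and the seed are assumptions of this example.

```python
import random

def stratified_split(labels, train_counts, seed=0):
    """Randomly pick train_counts[c] training indices from each class c.

    Returns (train_indices, test_indices), both sorted.
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls, n_train in train_counts.items():
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return sorted(train), sorted(test)

# Label convention from the text: 0 = ST, 1 = non-ST
labels = [0] * 116 + [1] * 348
train_idx, test_idx = stratified_split(labels, {0: 80, 1: 240})
```

With these counts the training set holds 320 samples and the testing set 144, preserving the 1 : 3 ratio of ST to non-ST companies in both sets.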

3.2. Empirical Results. This paper has established three credit assessment models (Isomap-RVM, Isomap-SVM, and PCA-RVM) using MATLAB software. The aim is to examine [...] were used when writing the PCA-RVM and Isomap-RVM models; the toolboxes IsomapR1 and libsvm-mat-2.89-3 [21] were used when writing the Isomap-SVM model. The kernel width of RVM and the two parameters of SVM were optimized using the genetic algorithm. This paper adopted a default value of 7 for the adjacency parameter $k$ of the Isomap algorithm. Applying this value to the sample data gives the residual error curve shown in Figure 1. Figure 1 shows that, from a dimensionality of 12 onward, the error $Er$ stabilizes below 0.05. Thus, a reduced dimensionality of 12 was chosen.
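The reading of the error curve described above can be expressed as a simple rule, sketched here in Python. The paper chooses the dimensionality by inspecting Figure 1; the explicit threshold rule, the function name `choose_dimensionality`, and the curve values in the usage note are illustrative assumptions, not the paper's data.

```python
def choose_dimensionality(errors, threshold=0.05):
    """Return the smallest d from which Er stays below threshold.

    errors[d - 1] is the residual error Er for dimensionality d.
    Returns None if the last value of the curve is not below threshold.
    """
    chosen = None
    # Walk backwards from the largest dimensionality while Er stays small
    for d in range(len(errors), 0, -1):
        if errors[d - 1] < threshold:
            chosen = d
        else:
            break
    return chosen
```

For a hypothetical curve `[0.4, 0.2, 0.1, 0.04, 0.03, 0.03]`, the rule selects a dimensionality of 4, because every value from the fourth onward is below 0.05.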
After the data had been normalized, they were input into SPSS, and PCA was conducted. With 12 principal components extracted, the cumulative contribution rate reaches 83.968%. Therefore, this paper retains the principal components covering 85% of the information in the original indices (i.e., a cumulative contribution rate of 85%) for PCA dimensionality reduction, so that the results can be compared with those of Isomap dimensionality reduction.
During the training process, the vector machine kernel parameters for the three models were optimized using the genetic algorithm (GA). The RVM kernel width of the Isomap-RVM model was 1.457, and the RVM kernel width of the PCA-RVM model was 0.534; for the Isomap-SVM model, the two optimized SVM parameters were 3.9174 and 1.7073.
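The paper does not describe the GA operators or the fitness function used to tune the kernel parameters. The following Python sketch shows one simple real-coded genetic algorithm for a single parameter such as the kernel width. The tournament selection, blend crossover, Gaussian mutation, elitism, population size, and seed are all assumptions of this example, and the function name `genetic_search` is hypothetical. In practice, the fitness function would typically be a validation accuracy obtained by training the model with the candidate width.

```python
import random

def genetic_search(fitness, low, high, pop_size=20, generations=40,
                   mutation_scale=0.1, seed=0):
    """Maximize fitness(x) for a scalar x in [low, high] with a simple GA."""
    rng = random.Random(seed)
    population = [rng.uniform(low, high) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_population = ranked[:2]  # elitism: keep the two best candidates
        while len(next_population) < pop_size:
            # Tournament selection of two parents
            a = max(rng.sample(population, 3), key=fitness)
            b = max(rng.sample(population, 3), key=fitness)
            # Blend crossover followed by Gaussian mutation
            mix = rng.random()
            child = mix * a + (1.0 - mix) * b
            child += rng.gauss(0.0, mutation_scale * (high - low))
            # Keep the child inside the search interval
            next_population.append(min(max(child, low), high))
        population = next_population
    return max(population, key=fitness)
```

As a usage example with a placeholder fitness, `genetic_search(lambda s: -(s - 1.5) ** 2, 0.1, 5.0)` searches the interval [0.1, 5.0] for the value that maximizes the function, whose optimum lies at 1.5.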
The prediction results of the credit assessment are determined by the prediction accuracy of the test sample and are often measured by two types of errors.Type I error occurs when mistakenly classifying any ST company as a non-ST company; Type II error occurs when mistakenly classifying any non-ST company as an ST company.The results of prediction on the testing set by the three models are shown in Table 3.
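The two error types defined above can be computed from the true and predicted labels as in the following Python sketch. It uses the label convention of this paper (0 for ST, 1 for non-ST). Expressing the Type I and Type II rates relative to the number of ST and non-ST samples, respectively, is an assumption of the example, as is the function name `error_rates`.

```python
def error_rates(y_true, y_pred):
    """Type I, Type II, and total false classification rates.

    Labels: 0 = ST company, 1 = non-ST company.
    Type I: an ST company classified as non-ST.
    Type II: a non-ST company classified as ST.
    """
    n_st = sum(1 for y in y_true if y == 0)
    n_non_st = sum(1 for y in y_true if y == 1)
    type_1 = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    type_2 = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "type_I": type_1 / n_st,
        "type_II": type_2 / n_non_st,
        "total": (type_1 + type_2) / len(y_true),
    }
```

With a testing set of 144 samples, for instance, 14 false classifications in total correspond to a total false classification rate of 14/144, or about 9.72%.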
Isomap-RVM has the lowest total false classification rate (9.72%), followed by PCA-RVM (10.41%). Isomap-SVM has the highest total false classification rate (13.89%). The probability of a Type II error is minimal for all three models, which indicates that all of them can correctly identify non-ST companies and will rarely classify one as an ST company. This is because non-ST companies have normal financial data and are more easily distinguished than ST companies, whose classification results from a significant deterioration of financial data. The gap in false classification among the three models is mainly manifested in the number of Type I errors.
Comparing the false classification results of Isomap-RVM and Isomap-SVM after dimensionality reduction shows that, for the same data, the classification accuracy of the RVM model is significantly higher than that of the SVM model. In particular, the Type I error rate of the SVM method is 13.89 percentage points higher than that of RVM. Compared with the RVM model, the SVM model is more likely to classify an ST company as a non-ST company. This underestimates the risk of financial deterioration of listed companies and may not help companies and investors effectively avoid and spread their financial risk.
Comparing the classification results of Isomap-RVM and PCA-RVM shows that the classification accuracy of these two models is quite close. The total false classification rate of Isomap-RVM is lower than that of PCA-RVM, but Isomap-RVM had one more Type I error than PCA-RVM. These results do not establish that using Isomap instead of PCA for dimensionality reduction improves the classification accuracy of the model, but they do show that, for nonlinear data, Isomap can also be deemed a reliable dimensionality reduction method for reducing data correlation.

Summary and Conclusions
This paper has constructed a credit assessment model, Isomap-RVM. Compared with existing credit assessment models, the methods applied by the new model have two major advantages: while inheriting the advantages of SVM, RVM is not restricted by the Mercer condition on the kernel function and avoids the penalty parameter selection process; and Isomap is suitable for dimensionality reduction of nonlinear samples. Isomap-RVM uses Isomap to reduce the dimensions of the samples so as to reduce the correlation between their indices and uses RVM to conduct the training and classification prediction, thereby forming a new machine learning model that achieves better credit assessment results.
The empirical results show that when the Isomap-RVM model is used to conduct a credit assessment of listed companies, the accuracy is significantly higher than with the Isomap-SVM model and slightly higher than with the PCA-RVM model. In all cases (i.e., total false classification, Type I error, or Type II error), the classification error rate of the RVM method is lower than that of the SVM method. This demonstrates that the RVM method performs better than the SVM method in credit assessment.
From the perspective of characteristic index dimensionality reduction, the total false classification rate of the RVM method using Isomap is lower than that of the RVM method using PCA, yet the Type I error rate of the former is higher than that of the latter. The classification accuracy gap between the two methods is minimal. Since the effect of dimensionality reduction depends on the structure of the sample data, a possible reason for this result is that the nonlinear structure of the sample selected in this research is not pronounced enough. However, the empirical results still show that, like PCA, Isomap can reliably achieve data dimensionality reduction and thereby improve the assessment precision of the model.
Two issues require further investigation in subsequent research. Firstly, only China's listed companies were selected as the sample in this research, so the sample data structure lacked diversity. Subsequent research may therefore select company data from other countries for analysis and further analyze the effect on model accuracy of applying the Isomap method and the PCA method with RVM. Secondly, the optimization algorithm for the RVM kernel width is not limited to GA. Other parameter optimization algorithms may be applied to study how to further improve the assessment accuracy of the model.

Table 1: Sample size of listed companies.

Table 3: False results of prediction on the testing set by the three models.