Assessment of credit risk is of great importance in financial risk management. In this paper, we propose an improved attribute bagging method, weight-selected attribute bagging (WSAB), to evaluate credit risk. Weights of attributes are first computed using an attribute evaluation method such as linear support vector machine (LSVM) or principal component analysis (PCA). Subsets of attributes are then constructed according to these weights: for each attribute subset, the larger the weight of an attribute, the larger the probability that it is selected into the subset. Next, training samples and test samples are projected onto each attribute subset. A scoring model is then constructed from each set of projected training samples. Finally, all scoring models vote on test instances. An individual model that uses only selected attributes tends to be more accurate because some redundant and uninformative attributes are eliminated. In addition, selecting attributes by probability preserves the diversity of the scoring models. Experimental results on two credit benchmark databases show that the proposed method, WSAB, outperforms analogous methods in both prediction accuracy and stability.

The assessment of credit risk has become increasingly crucial for financial institutions because high risks associated with inappropriate credit decisions may result in great losses [

Quantitative credit scoring has gained more and more attention in recent years because an improvement in accuracy, even a fraction of a percent, can translate into significant future savings for the credit institutions [

Numerous models have been developed to evaluate consumer loans and improve credit scoring accuracy [

Ensemble learning that combines outputs from multiple individual classifiers is one of the most important techniques for improving classification accuracy in machine learning [

Theoretical and experimental results suggest that combining classifiers can give effective improvement in accuracy if classifiers within an ensemble are not correlated with each other [

For attribute bagging models, the selection of optimal attribute subsets plays an important role. Usually, attributes are selected randomly to construct attribute subsets. This method is called randomly selected attribute bagging (RSAB) [

To overcome the shortcomings of RSAB, we propose a new attribute ensemble learning method, namely, weight-selected attribute bagging (WSAB). WSAB is based on the fact that some attributes are more important for the classification problem than others [

The implementation of the WSAB model contains two phases. In the first phase, weights of attributes need to be calculated using some attribute evaluation method. The weight

The rest of this paper is organized as follows. The related research work is reviewed in Section

Bagging [

Boosting [

Standard bagging and AdaBoost do not need too many rounds in training. Experimental results [

Compared to data partitioning ensemble methods, attribute partitioning ensemble methods can make individual classifiers within an ensemble more “independent” [

The bagging method based on attribute partitioning can be called attribute bagging (AB). The AB method generates attribute subsets by selecting attributes from the whole attribute set without replacement. Projections of the training examples onto the attribute subsets are then created. A child classifier is trained on each projection, and all child classifiers are aggregated by some combination strategy. During the test phase, a test instance is fed to all child classifiers simultaneously, and a collective decision is obtained from the aggregation strategy. In conventional attribute bagging methods, attribute subsets are generated by randomly selecting attributes from the whole attribute set. This method is called randomly selected attribute bagging (RSAB). In RSAB, all attributes have the same probability of being selected into an attribute subset. However, some attributes are very important for a classification problem while others are not. As mentioned before, RSAB has the deficiency that some attribute subsets may contain only attributes that contribute little to classification. Classifiers trained on such subsets tend to degrade the bagging result.
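The deficiency of RSAB can be made concrete with a small counting argument. The sketch below computes the probability that a uniformly drawn subset contains none of the important attributes. The function name and the example sizes (14 attributes, 4 important, subsets of size 3) are illustrative assumptions, not values taken from the experiments.

```python
from math import comb


def prob_subset_misses_top(n_attributes: int, n_important: int, subset_size: int) -> float:
    """Probability that a uniformly drawn subset (without replacement)
    contains none of the n_important most useful attributes."""
    if subset_size > n_attributes:
        raise ValueError("subset_size cannot exceed n_attributes")
    # Subsets drawn only from the unimportant attributes, divided by
    # the number of all subsets of the same size.
    only_unimportant = comb(n_attributes - n_important, subset_size)
    return only_unimportant / comb(n_attributes, subset_size)


# Hypothetical setting: 14 attributes, 4 of them important, subsets of size 3.
p_miss = prob_subset_misses_top(14, 4, 3)
print(f"{p_miss:.4f}")
```

Under these assumed sizes, roughly one in three uniformly drawn subsets would contain no important attribute at all, which is the situation weighted selection is meant to make less likely.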

To overcome the deficiency of RSAB, some optimization methods are used to select optimal attribute subsets. Guerra-Salcedo and Whitley use a genetic algorithm (GA) to explore the space of all possible feature subsets [

In the first phase of WSAB modeling, weights of attributes are computed using an attribute (or feature) evaluation method. In practice, many methods can be used to obtain attribute weights, such as linear SVM (LSVM), principal component analysis (PCA), correlation analysis, the F-score model, LDA, and multivariate adaptive regression splines (MARS). Different approaches to deciding attribute weights have different characteristics. In this paper, we employ LSVM and PCA, respectively, to evaluate weights of attributes.

The SVM, proposed by Vapnik [

A linear SVM in 2-dimensional space with a maximum-margin hyperplane.

Assume that there exists a training example set

Then, the optimal hyperplane decision function

SVM with linear kernel can be used to evaluate the weights of attributes. The decision function can be rewritten as

According to [

Attribute (or feature) evaluating using LSVM has a strict theoretical foundation [

For linear SVM, it turns out that

The support vector machine is based on structural risk minimization theory and is an outstanding classification method. The credit scoring problem is, in essence, also a classification problem. Hence, it is natural and reasonable to adopt a linear support vector machine to evaluate the weights of attributes. In other words, the weights decided by LSVM are closely related to classification ability.
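As a minimal sketch of how a linear decision function can yield attribute weights, the code below normalizes the magnitudes of a coefficient vector so that they sum to 1. Taking the importance of attribute j as |w_j| is an assumption made for illustration (the squared coefficient is another common choice), the coefficient values are hypothetical, and the sketch presumes the attributes were scaled to comparable ranges before training.

```python
def attribute_weights_from_linear_model(coefficients):
    """Map the coefficient vector w of a linear decision function
    f(x) = w . x + b to non-negative attribute weights summing to 1.

    Assumption: the importance of attribute j is taken as |w_j|.
    """
    magnitudes = [abs(c) for c in coefficients]
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("all coefficients are zero; weights are undefined")
    return [m / total for m in magnitudes]


# Hypothetical coefficient vector of a trained linear SVM.
w = [0.5, -1.5, 0.0, 2.0]
lsvm_weights = attribute_weights_from_linear_model(w)
print(lsvm_weights)
```

The sign of a coefficient only indicates the direction in which an attribute pushes the decision, so it is discarded here.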

As an alternative method, PCA is also used to evaluate weights of attributes in this paper. The main idea of PCA is to find the principal directions which best describe the distribution of credit samples within the entire credit sample space. The original variables are transformed to a set of new variables which are uncorrelated with each other and can be ranked from large to small in terms of variance such that the first several variables retain most of the variation in the entire original data.

Considering the set

Thus, the vectors

The eigenvectors can be ranked by their corresponding eigenvalues, from large to small, to reflect their importance in characterizing the variation of the original data. These eigenvectors span a new space, and all training samples are projected into this space.

An example

The eigenvalue

PCA and LSVM provide two different ways of evaluating weights of attributes. The LSVM method selects attributes in the original space, while the PCA method selects attributes in the transformed feature space. Additionally, the weights decided by LSVM reflect the classification ability of the corresponding attributes, and those obtained by PCA reflect how well the corresponding attributes describe the distribution of the original data.
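The PCA route can be sketched as follows: compute the covariance matrix of the samples, obtain its eigenvalues, and normalize them into weights for the new attributes. Power iteration with deflation is used here only to keep the example dependency-free; it is a didactic stand-in, and a linear algebra library would normally be used instead. All function names and the toy samples are assumptions.

```python
import math
import random


def covariance_matrix(samples):
    """Covariance matrix (divisor n) of a list of equal-length rows."""
    n, d = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in samples]
    return [[sum(r[a] * r[b] for r in centered) / n for b in range(d)]
            for a in range(d)]


def eigenvalues_by_power_iteration(matrix, iterations=500, seed=0):
    """Eigenvalues of a symmetric positive semidefinite matrix,
    found one at a time by power iteration followed by deflation."""
    rng = random.Random(seed)
    d = len(matrix)
    m = [row[:] for row in matrix]
    eigenvalues = []
    for _ in range(d):
        # A random start avoids being orthogonal to the dominant direction.
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iterations):
            mv = [sum(m[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in mv))
            if norm < 1e-12:  # remaining matrix is numerically zero
                break
            v = [x / norm for x in mv]
        mv = [sum(m[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm_sq = sum(x * x for x in v)
        lam = sum(v[a] * mv[a] for a in range(d)) / norm_sq  # Rayleigh quotient
        eigenvalues.append(max(lam, 0.0))
        # Deflation: remove the component that was just found.
        for a in range(d):
            for b in range(d):
                m[a][b] -= lam * v[a] * v[b] / norm_sq
    return sorted(eigenvalues, reverse=True)


def pca_weights(samples):
    """Share of total variance carried by each principal direction."""
    eigenvalues = eigenvalues_by_power_iteration(covariance_matrix(samples))
    total = sum(eigenvalues)
    if total == 0:
        raise ValueError("samples have no variance")
    return [lam / total for lam in eigenvalues]


# Toy data: variance 2.0 along the first axis and 0.5 along the second.
toy_samples = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
toy_weights = pca_weights(toy_samples)
```

For the toy data, the two eigenvalues are 2.0 and 0.5, so the first new attribute carries 80% of the total weight.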

After obtaining weights of attributes, we can select attributes based on probabilities decided by weights of attributes and train multiple classifiers using different attribute subsets, respectively, to perform attribute bagging.

The essence of weight-selected attribute bagging (WSAB) lies in how attributes are selected for each individual classifier in the ensemble. Concretely, weights of attributes are used to construct many different attribute subsets such that attributes with larger weights have larger probabilities of being selected into each subset. Within a single attribute subset, no attribute is repeated, but the same attribute can be chosen into different subsets. Thus, compared to randomly selected attribute bagging (RSAB), subsets containing only unimportant features are less likely, which helps keep the individual classifiers accurate. On the other hand, the diversity of individual classifiers is still maintained because attributes are chosen by probability and the attribute subsets differ from one another.

Subsequently, as in standard attribute bagging (AB), projections of the training examples onto these attribute subsets are created. An individual scoring model is trained on each projection, and all individual scoring models are aggregated by a specific strategy to classify test instances.
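The projection and aggregation steps can be sketched as below, using majority voting as the aggregation strategy. The nearest-centroid classifier is only a dependency-free stand-in for the base scoring model (the experiments use an SVM with a Gaussian kernel), and every name and data value here is an illustrative assumption.

```python
from collections import Counter


def project(samples, subset):
    """Keep only the attributes whose indices are listed in subset."""
    return [[row[j] for j in subset] for row in samples]


class NearestCentroid:
    """Stand-in base classifier: predicts the class with the closest mean."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, X):
        def dist2(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda c: dist2(x, self.centroids[c]))
                for x in X]


def attribute_bagging_predict(X_train, y_train, X_test, subsets,
                              make_model=NearestCentroid):
    """Train one model per attribute subset and combine them by majority vote."""
    votes = []
    for subset in subsets:
        model = make_model().fit(project(X_train, subset), y_train)
        votes.append(model.predict(project(X_test, subset)))
    # zip(*votes) groups the votes cast for each test instance.
    return [Counter(column).most_common(1)[0][0] for column in zip(*votes)]


# Toy data: the third attribute is noise, the first two separate the classes.
X_train = [[0, 0, 5], [0, 1, -5], [10, 10, 5], [10, 11, -5]]
y_train = [0, 0, 1, 1]
X_test = [[1, 1, 0], [9, 9, 0]]
predictions = attribute_bagging_predict(
    X_train, y_train, X_test, subsets=[(0, 1), (0, 2), (1, 2)])
print(predictions)
```

Any model exposing `fit` and `predict` in this form could be passed as `make_model`, which reflects the point that the base classifier is interchangeable.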

The appropriate size of attribute subset can be determined by cross-validation technique. The original training set is divided into two parts—a new training set and a validation set. With the attribute subset size changing from 1 to

The main steps of WSAB are as follows.

Compute weights of attributes,

Decide an appropriate attribute subset size,

Generate a series of attribute subsets through repeating the following substeps.

Construct an array with

Perform the following cycle to construct an attribute subset with

randomly select an element of the array into the subset;

delete all positions of the chosen element from the array;

if

Create projections of training examples onto the selected attribute subsets.

Train individual classification models based on each projection, respectively, and use all individual scoring models to vote for test instances.
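The subset generation step can be sketched as weighted sampling without replacement. The sketch assumes that each attribute occupies a number of array positions proportional to its weight, so that drawing a random position and then deleting all positions of the chosen attribute is equivalent to drawing among the remaining attributes with probability proportional to their weights. Function names, the seed, and the example weights are hypothetical.

```python
import random


def draw_attribute_subset(weights, subset_size, rng):
    """Draw subset_size distinct attribute indices, favoring large weights."""
    positive = sum(1 for w in weights if w > 0)
    if subset_size > positive:
        raise ValueError("not enough attributes with positive weight")
    remaining = list(range(len(weights)))
    subset = []
    for _ in range(subset_size):
        # Probability of each remaining attribute is proportional to its weight.
        chosen = rng.choices(remaining,
                             weights=[weights[j] for j in remaining], k=1)[0]
        subset.append(chosen)
        # Counterpart of deleting all array positions of the chosen element.
        remaining.remove(chosen)
    return sorted(subset)


def generate_subsets(weights, subset_size, n_subsets, seed=0):
    """Generate n_subsets attribute subsets; an attribute may recur across subsets."""
    rng = random.Random(seed)
    return [draw_attribute_subset(weights, subset_size, rng)
            for _ in range(n_subsets)]


# Hypothetical weights for five attributes.
example_weights = [0.5, 0.3, 0.1, 0.05, 0.05]
subsets = generate_subsets(example_weights, subset_size=3, n_subsets=200)
counts = [sum(j in s for s in subsets) for j in range(len(example_weights))]
print(counts)
```

In this run, the attribute with weight 0.5 appears in far more subsets than the attributes with weight 0.05, yet the low-weight attributes still appear occasionally, which is what keeps the subsets diverse.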

The modeling process of WSAB is illustrated in Figure

Weight-selected attribute bagging model.

Two datasets are described in Table

Description of credit datasets.

| | German credit | Australian credit |
|---|---|---|
| Number of observations | 1000 | 690 |
| Number of predictive attributes | 24 | 14 |
| Percentage of good credit | 70% | 44.5% |
| Percentage of bad credit | 30% | 55.5% |

We randomly divided the whole dataset into two parts—training set and test set. Training set takes two-thirds of the whole dataset and testing set takes one third of the whole dataset. The SVM with a Gaussian kernel was chosen as the basic classifier. Grid search in training set was used to decide the best parameters of SVM. The PR_tools [

The LSVM was performed on training dataset to obtain the weight of the

Importance percentage of each attribute for Australian dataset.

Importance percentage of each attribute for German dataset.

For the Australian dataset, the four most important attributes account for 61.30% of the sum of all attribute weights, while the four least important attributes account for only 3.82%. In addition, the most important attribute accounts for 28.91% of the sum and the least important one for only 0.37%. These statistics indicate that, for the Australian dataset, a few attributes are very important while some others contribute little to classification.

For the German dataset, the seven most important attributes account for 56.33% of the sum of all attribute weights, while the four least important attributes account for only 8.46%. Meanwhile, the most important attribute accounts for 12.76% of the sum and the least important one for only 0.71%. Compared to the Australian dataset, the importance of attributes in the German dataset is more evenly distributed.
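The shares quoted above are cumulative fractions of the sorted weights. A minimal sketch of that arithmetic follows; the function names are assumptions, and the weight vector is made up for illustration and is not the weights of either dataset.

```python
def top_k_share(weights, k):
    """Fraction of the total weight carried by the k largest weights."""
    ordered = sorted(weights, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


def bottom_k_share(weights, k):
    """Fraction of the total weight carried by the k smallest weights."""
    ordered = sorted(weights)
    return sum(ordered[:k]) / sum(ordered)


# Hypothetical weights, not taken from either credit dataset.
hypothetical_weights = [4, 3, 2, 1]
print(top_k_share(hypothetical_weights, 2))     # two largest weights
print(bottom_k_share(hypothetical_weights, 1))  # smallest weight
```

A large top-k share combined with a small bottom-k share indicates concentrated importance, as reported for the Australian dataset; shares closer to k divided by the number of attributes indicate a more even distribution.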

We also performed PCA to calculate weights of attributes. Just as described in Section

The eigenvalues

Importance percentage of each new attribute for Australian dataset.

Importance percentage of each new attribute for German dataset.

For the Australian dataset, the four most important attributes account for 78.3% of the sum of all attribute weights, while the four least important attributes account for only 1.80%. Additionally, the most important attribute accounts for 30.2% of the sum and the least important one for only 0.23%. The results reflect the fact that, for the Australian dataset, several main attributes contribute most to the description of the original data, while some others provide little information about it.

For the German dataset, the seven most important attributes account for 63.25% of the sum of all attribute weights, while the seven least important attributes account for only 5.85%. The most important attribute accounts for 13.5% of the sum and the least important attribute for only 0.33%. Therefore, the importance of attributes in the German dataset is more evenly distributed than in the Australian dataset.

Interestingly, the results obtained via PCA are similar to those obtained via LSVM for both the Australian and the German datasets, although LSVM and PCA take different approaches to calculating the weights of attributes: one is oriented toward data classification and the other toward data description.

The size of attribute subset is critical for attribute bagging. Hence, this section will evaluate classification accuracy of the WSAB with different sizes of attribute subsets. Meanwhile, several other related methods were also compared.

For the convenience of expression, the WSAB using LSVM to determine weights of attributes is abbreviated as LSVM-WSAB; the WSAB using PCA to calculate weights of attributes is denoted as PCA-WSAB; the randomly selected attribute bagging is written as RSAB. For each bagging method as well as for each size of attribute subset ranging from 1 to

Average accuracies of bagging methods and BS-SVM with the size of attribute subset changing based on Australian dataset.

Average accuracies of bagging methods and BS-SVM with the size of attribute subset changing based on German dataset.

From Figures

LSVM-WSAB, PCA-WSAB, and RSAB are more accurate than BS-SVM for large attribute subset sizes. The maximum accuracies of LSVM-WSAB, PCA-WSAB, and RSAB are higher than that of BS-SVM. The results indicate that attribute bagging can effectively improve the performance of a single classifier.

Additionally, PCA-WSAB needs fewer attributes than LSVM-WSAB to reach its maximum accuracy. The reason is that PCA eliminates the correlation among attributes, so the importance is concentrated in a few eigenvectors.

Moreover, an interesting finding for WSAB is that small attribute subsets achieve high accuracy on the Australian dataset, whereas on the German dataset the accuracy of WSAB rises slowly as the attribute subset size increases. As mentioned before, the importance in the Australian dataset is concentrated in only a few attributes, and the remaining attributes contribute little to classification. Hence, the WSAB model can acquire enough information from a small attribute subset for the Australian dataset. For the German dataset, however, the importance of attributes is more evenly distributed, and thus WSAB needs larger attribute subsets to obtain enough information for classification.

When computing average accuracy of 30 trials for each attribute bagging model as well as for each attribute subset size, the standard deviation of accuracy was also computed to evaluate the classification stability of attribute bagging models. The standard deviations are shown in Figure

Standard deviations for three attribute bagging methods: LSVM-WSAB, PCA-WSAB, and RSAB based on Australian dataset.

Standard deviations for three attribute bagging methods: LSVM-WSAB, PCA-WSAB, and RSAB based on German dataset.

From Figures

The highest accuracy of each model and the corresponding standard deviation (Std) are shown in Table

The highest accuracy of each model and the corresponding standard deviation (Std) for Australian dataset.

| | LSVM-WSAB | PCA-WSAB | RSAB | BS-SVM | SVM |
|---|---|---|---|---|---|
| Accuracy | 0.8912 | 0.8918 | 0.8880 | 0.8800 | 0.8750 |
| Std | 0.0013 | 0.0015 | 0.0030 | 0 | 0 |

The highest accuracy of each model and the corresponding standard deviation (Std) for German dataset.

| | LSVM-WSAB | PCA-WSAB | RSAB | BS-SVM | SVM |
|---|---|---|---|---|---|
| Accuracy | 0.7924 | 0.7930 | 0.7860 | 0.7824 | 0.7729 |
| Std | 0.0034 | 0.0033 | 0.0043 | 0 | 0 |

From Tables

In this section, each attribute bagging model adopts its optimal attribute subset size, which is calculated by cross-validation. Then, we compare the accuracies of attribute bagging models including LSVM-WSAB, PCA-WSAB, and RSAB, as well as data partitioning ensemble models including standard bagging (SB) and AdaBoost, with the number of voters changing. For each number of voters and for each model, the experiments were also repeated 30 times, and the average accuracy was computed. The results are illustrated in Figure

The average accuracies of attribute bagging, standard bagging (SB), and AdaBoost with the number of voters changing for Australian dataset.

The average accuracies of attribute bagging, standard bagging (SB), and AdaBoost with the number of voters changing for German dataset.

From Figures

Moreover, the accuracy of the standard bagging model increases gradually until the number of voters reaches 20 and then levels off. The accuracy of AdaBoost fluctuates most sharply as the number of voters changes. Meanwhile, the accuracies of the attribute bagging models increase quickly until the number of voters reaches 20 and then also level off. This is because standard bagging and AdaBoost sample from the training dataset with all attributes for each voter, whereas attribute bagging uses only part of the attributes. Therefore, attribute bagging needs more voters to “cover” all attributes, and as the number of voters increases, more information is integrated into the bagging model. The higher accuracies achieved by attribute bagging models support the conclusion that attribute bagging models are superior to data partitioning ensemble models.

For small numbers of voters, both LSVM-WSAB and PCA-WSAB perform better than RSAB. This further supports our idea that the WSAB model can exploit important attributes to obtain better classification results. For large numbers of voters, the WSAB model performs slightly better than RSAB on the Australian dataset and much better than RSAB on the German dataset. Therefore, we conclude that WSAB outperforms RSAB.

Besides computing average accuracies of 30 trials for each number of voters, we also computed the standard deviation of accuracy to evaluate the classification stability of different methods. The standard deviations of classification accuracy for each model are shown in Figure

Standard deviations of accuracy of attribute bagging, standard bagging, and AdaBoost with the number of voters changing for Australian dataset.

Standard deviations of accuracy of attribute bagging, standard bagging, and AdaBoost with the number of voters changing for German dataset.

From Figures

Furthermore, WSAB is more stable than RSAB. The reason is that WSAB tends to select important attributes for each child classifier, so the accuracies of child classifiers in WSAB fluctuate less than those in RSAB, and the ensemble as a whole is correspondingly more stable.
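Stability is measured here by the standard deviation of accuracy over repeated trials. A minimal sketch of that summary is given below. The accuracies are hypothetical, and the use of the population standard deviation (divisor n) rather than the sample standard deviation (divisor n - 1) is an assumption, since the text does not say which was used.

```python
import statistics


def summarize_trials(accuracies):
    """Return (mean accuracy, standard deviation) over repeated trials.

    Assumption: population standard deviation; swap in statistics.stdev
    for the sample version.
    """
    return statistics.mean(accuracies), statistics.pstdev(accuracies)


# Hypothetical accuracies from four repeated runs of one ensemble model.
trial_accuracies = [0.88, 0.90, 0.89, 0.89]
mean_acc, std_acc = summarize_trials(trial_accuracies)
print(f"mean={mean_acc:.4f} std={std_acc:.4f}")
```

A lower standard deviation at a comparable mean indicates that the model's accuracy depends less on the particular attribute subsets drawn in a given run.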

When the number of voters is larger than 50, the accuracy and stability of each ensemble model maintain certain levels. Therefore, in order to compare the performance of all ensemble models, we show the accuracies of all ensemble models in Table

The accuracy of each model and the corresponding standard deviation for Australian dataset.

| | LSVM-WSAB | PCA-WSAB | RSAB | SB | AdaBoost |
|---|---|---|---|---|---|
| Accuracy | 0.8919 | 0.8922 | 0.8890 | 0.8807 | 0.8779 |
| Std | 0.0011 | 0.0013 | 0.0020 | 0.0026 | 0.0114 |

The accuracy of each model and the corresponding standard deviation for German dataset.

| | LSVM-WSAB | PCA-WSAB | RSAB | SB | AdaBoost |
|---|---|---|---|---|---|
| Accuracy | 0.7918 | 0.7924 | 0.7842 | 0.7775 | 0.7731 |
| Std | 0.0032 | 0.0029 | 0.0045 | 0.0041 | 0.0121 |

From Tables

This paper presents WSAB for credit risk evaluation. The implementation of WSAB includes two steps. The first step is to determine the weights of attributes. In the second step, attributes are selected into attribute subsets according to the probabilities determined by the attribute weights. This way of modeling gives WSAB two advantages: it improves the accuracy of each individual classifier in the ensemble, and it increases the diversity among the individual classifiers. Regarding the first, the more important attributes are incorporated into each attribute subset with larger probabilities, so each individual classifier can achieve high classification accuracy. Regarding the second, selecting attributes by probability means that different attribute subsets contain different low-weight attributes, and consequently the diversity among the classifiers is still maintained. Accuracy and diversity are the two critical factors for bagging. Experimental results also confirm the superiority of WSAB over randomly selected attribute bagging (RSAB), and especially over standard bagging, AdaBoost, and an individual classifier.

Broadly speaking, the WSAB provides a framework of evaluating credit risk. In this framework, any attribute weighting method and any basis classifier can be combined. This paper adopts two completely different ways to compute weights of attributes: LSVM and PCA. The weights obtained by LSVM emphasize the classification ability of attributes, and the weights from PCA reflect the description ability of attributes for original data. However, credit scoring is just considered as a classification problem, for which LSVM seems to be more suitable than PCA, and experimental results also demonstrate the conclusion.

The next work will attempt to combine other approaches of computing weights of attributes and other basis classifiers to perform credit risk evaluation and then to compare their performances in terms of accuracy and stability. Additionally, the WSAB can also be applied to other practical systems, such as stock market prediction [

The authors would like to thank the anonymous reviewers for their constructive comments and suggestions which have led to great improvement on this paper. This work is supported by the National Natural Science Foundation of China (no. 61271374) and the Beijing Natural Science Foundation (no. 4122068).