A Novel Ensemble Credit Scoring Model Based on Extreme Learning Machine and Generalized Fuzzy Soft Sets

This paper discusses the hybrid application of ensemble learning, classification, and feature selection (FS) algorithms, combined with training data balancing, to help the proposed credit scoring model perform more effectively. The model comprises three major stages. After the collected credit data are preprocessed, an efficient feature selection algorithm based on the adaptive elastic net is first employed to remove weakly related or uncorrelated variables and obtain high-quality training data. Then, a novel ensemble strategy is proposed to make the imbalanced training data set balanced for each extreme learning machine (ELM) classifier. Finally, a new weighting method for the single ELM classifiers in the ensemble model is established with respect to their classification accuracy, based on generalized fuzzy soft sets (GFSS) theory. A novel cosine-based distance measurement algorithm for GFSS is also proposed to calculate the weight of each ELM classifier. To confirm the efficiency of the proposed ensemble credit scoring model, we conducted comparative experiments on real-world credit data sets. The analysis, outcomes, and statistical tests show that the proposed model improves classification effectiveness in terms of average accuracy, area under the curve (AUC), H-measure, and Brier's score compared with all other single classifiers and ensemble approaches.


Introduction
Nowadays, financial institutions adopt different risk assessment and credit scoring models to reduce potential risk [1]. By analyzing customer credit data to estimate the probability that potential borrowers will default on their loans, these evaluation approaches turn customer data into actionable information that supports credit decisions [2]. In this way, an effective credit scoring model can serve as a reliable decision support system that helps managers make financial decisions.
To handle the potential risk of financial services, in the past few years an increasing number of financial institutions have moved from traditional manual methods to advanced approaches that require building various types of evaluation models. For credit evaluation, three main families of methods are widely utilized: statistical approaches, nonparametric approaches, and AI methods [3][4][5][6][7]. These three families work efficiently in different circumstances. Statistical methods include discriminant analysis models, linear probability models, and probit and logit models. Nonparametric approaches tend to utilize decision trees, the K-nearest neighbor algorithm, fuzzy logic, Naïve Bayes, and so on. AI methods are more advanced and technology-dependent, such as artificial neural networks, support vector machines (SVM), particle swarm optimization (PSO), and genetic algorithms (GA).
Many studies have also indicated that ensemble approaches perform more effectively in credit evaluation than single classifiers. To avoid the downsides of single classifiers, an increasing number of researchers have switched to customized combinations of various methods instead of using individual classification models separately. The principle of the hybrid approach is to preprocess the data fed into the classifiers. The focus is to gather information from a group of classifiers trained on the same problem and then combine their strengths to obtain valid credit scoring decisions [8,9]. In recent years, research on fuzzy soft sets theory has made great progress, especially in the field of multiattribute decision making [10,11]. The development of fuzzy soft sets theory provides a new perspective for building more state-of-the-art ensemble data classification and credit evaluation models [12]. The motivation of our study is to construct a more reliable credit scoring model that can generate accurate outcomes on imbalanced data. Three main approaches are adopted to achieve this goal: (1) an improved elastic net-based feature selection, (2) a novel ensemble strategy and learning algorithm for imbalanced credit data, and (3) a dynamic weighting method for single ELM classifiers based on a newly proposed similarity measure of GFSS. In real applications of credit risk evaluation, especially in peer-to-peer lending, credit data may be gathered from many different channels, including social networking and judicial administration platforms. The data collected from these channels are usually sparse, redundant, rough, and imbalanced (good customers generally outnumber bad customers) and often contain various weakly related or even uncorrelated features [13,14].
These data characteristics make commonly used credit scoring models unstable, which causes the credit evaluation results to become unreliable and inaccurate.
Through the three approaches proposed in this article, the problems arising in credit scoring for imbalanced data can be handled effectively.
In Section 2, we describe the construction of the new ensemble credit scoring model. The experimental outcomes are discussed in Section 3. Finally, Section 4 concludes the paper.

New Ensemble Credit Scoring Model
This section describes the construction of the ensemble classification model for credit scoring.

Adaptive Elastic Net-Based Feature Selection.
A large number of researchers have studied appropriate feature selection approaches for credit scoring, such as cost-sensitive methods [15], the information gain ratio [16], and genetic algorithms [17].
The Lasso estimator shrinks the regression coefficients toward zero through an L1-norm penalty.
This method can eliminate features (variables) and select the most important ones, building simple but effective models while keeping high efficiency. Denote the historical credit scoring data as $(x_i, y_i)$, $i = 1, 2, \ldots, N$, where $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ are the variables of customer $i$ and $y_i$ is the category tag (a binary response, with 0 denoting default and 1 denoting nondefault). The regression model can be written as

$$y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i, \qquad (1)$$

where $\beta_0$ and $\beta_j$ are the intercept and the regression coefficients, respectively. Suppose that the observations are uncorrelated and that all variables are standardized. The Lasso estimate of $\beta$ is then

$$\hat{\beta}_{\mathrm{Lasso}} = \arg\min_{\beta} \left\{ \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^{2} + \lambda \sum_{j=1}^{p} |\beta_j| \right\}. \qquad (2)$$

A large $\lambda$ shrinks some of the coefficients $\beta_j$ to zero; that is, the Lasso drives coefficients to zero as $\lambda$ gradually increases. In addition, the Lasso model can handle any number of variables. Therefore, coefficient shrinkage and feature (variable) selection are carried out at the same time.
Although the Lasso has proved to be easily interpretable and effective under various circumstances, it still has some shortcomings [18]. Zou and Hastie [19] put forward an extension called the elastic net. Like the Lasso, the elastic net conducts automatic variable selection and coefficient shrinkage at the same time, and it can also select groups of correlated variables. For fixed nonnegative constants $\lambda_1$ and $\lambda_2$, the elastic net estimate $\hat{\beta}_{\mathrm{Enet}}$ is

$$\hat{\beta}_{\mathrm{Enet}} = \arg\min_{\beta} \left\{ \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^{2} + \lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{j=1}^{p} \beta_j^{2} \right\}, \qquad (3)$$

where $\lambda_1 \sum_{j=1}^{p} |\beta_j|$ is the L1-norm penalty and $\lambda_2 \sum_{j=1}^{p} \beta_j^{2}$ is the L2-norm penalty.
In addition, Zou and Zhang [20] pointed out that the elastic net does not possess the oracle property. They therefore proposed the adaptive elastic net, which combines the L2 penalty with a weighted L1 penalty on top of the squared error loss. The adaptive elastic net can thus be viewed as a combination of the adaptive Lasso and the elastic net. The adaptive elastic net estimate $\hat{\beta}_{\mathrm{AEnet}}$ is calculated as

$$\hat{\beta}_{\mathrm{AEnet}} = \arg\min_{\beta} \left\{ \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^{2} + \lambda_2 \sum_{j=1}^{p} \beta_j^{2} + \lambda_1^{*} \sum_{j=1}^{p} \hat{\omega}_j |\beta_j| \right\}, \qquad (4)$$

where $\hat{\omega}_j = (|\hat{\beta}_{\mathrm{Enet},j}|)^{-\gamma}$, $\gamma$ is a positive constant, and $\lambda_1^{*}$ is fixed and nonnegative.
Using formula (4), we can obtain the most significant attributes ("big fish") from the variable pool. Then, we can feed them into the credit scoring models to obtain more precise results at minimum computational and operational cost.
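The experiments reported later were carried out in R, but to make this selection step concrete, the following Python sketch illustrates one way to realize the two-stage adaptive elastic net of formula (4). It assumes scikit-learn; the function name, the penalty values, and the column-rescaling shortcut used to approximate the weighted L1 penalty are our illustrative choices, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import StandardScaler

def adaptive_elastic_net_select(X, y, gamma=1.0, alpha1=0.1, alpha2=0.05, eps=1e-6):
    """Two-stage adaptive elastic net feature selection (illustrative sketch).

    Stage 1 fits an ordinary elastic net to get the initial estimate beta_Enet;
    stage 2 approximates the weighted L1 penalty of formula (4) by rescaling each
    column by 1/omega_j, refitting, and undoing the rescaling on the coefficients.
    """
    X = StandardScaler().fit_transform(X)
    init = ElasticNet(alpha=alpha1, l1_ratio=0.5).fit(X, y)   # stage 1: beta_Enet
    omega = (np.abs(init.coef_) + eps) ** (-gamma)            # adaptive weights omega_j
    final = ElasticNet(alpha=alpha2, l1_ratio=0.5).fit(X / omega, y)
    beta = final.coef_ / omega                                # back to the original scale
    selected = np.flatnonzero(np.abs(beta) > 1e-8)            # indices of retained variables
    return selected, beta
```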

ELM-Based Classifier.
The ELM model, as a single-hidden-layer feedforward neural network (SLFN), selects its input weights and hidden biases randomly, and these are not adjusted during training. The Moore-Penrose generalized inverse of the hidden-layer output matrix is then used to determine the output weights analytically. ELM exhibits excellent generalization performance and greatly reduces the iterative training time, which makes it more efficient than many other ANN-type machine learning algorithms [21].
For the historical training credit data set $(x_i, y_i)$ mentioned above, the input vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^{T} \in \mathbb{R}^{p}$ is the $i$th sample with $p$-dimensional features, and $Y = [y_1, y_2, \ldots, y_N]$. Then $p$ is the number of input neurons, which equals the number of input features. Let $L$ be the number of hidden neurons and let $C$ be the number of output neurons, which equals the number of categories. Denote the input weight matrix as $K = [k_1, k_2, \ldots, k_L]$, where $k_j = [k_{j1}, k_{j2}, \ldots, k_{jp}]$ is the vector connecting the $p$ input neurons with the $j$th hidden neuron, and let $b = [b_1, b_2, \ldots, b_j, \ldots, b_L]^{T}$ be the bias vector of the hidden neurons, where $b_j$ is the bias of the $j$th hidden neuron. These parameters do not change during the whole training process. The output of the $j$th hidden neuron for an input $x$ is computed as

$$h_j(x) = G(k_j \cdot x + b_j), \qquad (5)$$

where $G(\cdot)$ is the activation function. Let $H$ be the hidden-layer output matrix of all samples:

$$H = \begin{bmatrix} G(k_1 \cdot x_1 + b_1) & \cdots & G(k_L \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ G(k_1 \cdot x_N + b_1) & \cdots & G(k_L \cdot x_N + b_L) \end{bmatrix}_{N \times L}. \qquad (6)$$

The $i$th column of $H$ is the output vector of the $i$th hidden node with respect to the inputs $x_1, \ldots, x_N$, and the $j$th row is the hidden-layer output vector with respect to the input $x_j$. The output of the ELM is

$$o_j = \sum_{i=1}^{L} \alpha_i \, G(k_i \cdot x_j + b_i), \qquad j = 1, \ldots, N, \qquad (7)$$

where $\alpha_i = [\alpha_{i1}, \alpha_{i2}, \ldots, \alpha_{iC}]^{T}$ is the weight vector connecting the $i$th hidden node with the output nodes. ELM can fit these $N$ samples with zero error, that is, $\sum_{j=1}^{N} \| o_j - y_j \| = 0$. Then the following equations hold:

$$\sum_{i=1}^{L} \alpha_i \, G(k_i \cdot x_j + b_i) = y_j, \qquad j = 1, \ldots, N. \qquad (8)$$

Equation (8) can be rewritten compactly as

$$H\alpha = Y. \qquad (9)$$

Based on (9), the output weights can be estimated by the least-squares solution

$$\hat{\alpha} = H^{\dagger} Y, \qquad (10)$$

where $H^{\dagger}$ stands for the Moore-Penrose generalized inverse of $H$. For credit scoring classification, the outcome of the ELM for a new sample $x$ is

$$\hat{y}(x) = h(x)\,\hat{\alpha}, \qquad (11)$$

where $h(x) = [h_1(x), \ldots, h_L(x)]$ is the hidden-layer output vector of $x$.
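As a minimal illustration of the training procedure, the sketch below (assuming only NumPy; the class name, the sigmoid activation, and the clipping of the raw output to [0, 1] are our own choices) draws the input weights K and biases b at random, forms the hidden-layer output matrix H of (6), and solves for the output weights with the Moore-Penrose pseudoinverse as in (10).

```python
import numpy as np

class ELMClassifier:
    """Minimal extreme learning machine for binary credit scoring (sketch)."""

    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n_features = X.shape[1]
        # Random input weights K and hidden biases b, fixed after initialization
        self.K = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)                       # hidden-layer output matrix, eq. (6)
        self.alpha = np.linalg.pinv(H) @ y        # output weights via Moore-Penrose inverse, eq. (10)
        return self

    def _hidden(self, X):
        # Sigmoid activation G applied to X K + b
        return 1.0 / (1.0 + np.exp(-(X @ self.K + self.b)))

    def predict_proba(self, X):
        # Raw ELM output, clipped to [0, 1] so it can be read as a "good applicant" score
        return np.clip(self._hidden(X) @ self.alpha, 0.0, 1.0)

    def predict(self, X, threshold=0.5):
        return (self.predict_proba(X) >= threshold).astype(int)
```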

Ensemble Strategy for Imbalanced Data.
To better handle the classification of imbalanced data, a considerable number of approaches have been proposed. They can be categorized into three types: preprocessing, cost-sensitive learning, and ensemble methodology. Preprocessing can decrease the classification bias, in the sense of the bias-variance decomposition, and thereby enhance a single classifier. Undersampling [22,23], oversampling [24,25], and strategic sampling are extensively utilized to offset imbalanced data. Ensemble methodology can be viewed as a decision-making process that combines individual learning algorithms and their outcomes in parallel to obtain the final result. The basic idea behind the ensemble methodology is that the algorithm obtains a number of single classifiers from the training set and then uses some ensemble strategy to integrate them, raising the accuracy and reliability of classification. Bagging [26], boosting [27], and stacking [28] are the most common ensemble approaches in credit scoring.
A novel ensemble strategy is designed for imbalanced data according to its imbalance ratio, which determines both the number of ELM classifiers applied as single classifiers to predict the credit scoring data and the number of samples fed into each ELM as training data.
For any given historical credit training data set containing $N$ samples, there are $N^{+}$ "good applicants" and $N^{-}$ "bad applicants," such that $N^{+} + N^{-} = N$. The imbalance ratio IR is calculated as

$$\mathrm{IR} = \frac{N^{+}}{N^{-}}. \qquad (12)$$

After obtaining IR, the number of single ELM classifiers $M$ in the ensemble model is calculated as

$$M = \lceil \mathrm{IR} \rceil, \qquad (13)$$

where $\lceil \cdot \rceil$ denotes the ceiling operation. Equation (13) not only determines the number of ELM classifiers needed in the ensemble credit scoring model but also guides us in making the imbalanced data balanced for each classifier; the proposed ensemble strategy is built around this value of $M$. In the remainder of this subsection, we elaborate the proposed strategy in detail.
Firstly, calculate the imbalance ratio IR of the given historical credit training data set from $N^{+}$ and $N^{-}$ using (12). Secondly, determine the number of ELM classifiers $M$ using (13).
Thirdly, for the first $M-1$ ELM classifiers, we feed $N^{-}$ "good applicants" samples and $N^{-}$ "bad applicants" samples into each classifier so that the training data sets of the first $M-1$ classifiers are balanced; random sampling without replacement is employed to extract $N^{-}$ samples from the $N^{+}$ "good applicants" samples for each classifier. After extracting the first $M-1$ training data sets, there are $N^{+} - (M-1) \cdot N^{-}$ "good applicants" samples that have not been extracted.
Finally, for the last ELM classifier, we put the remaining $N^{+} - (M-1) \cdot N^{-}$ "good applicants" samples into its training data set. Considering that $N^{+} - (M-1) \cdot N^{-} \le N^{-}$, we employ the SMOTE algorithm to create $M \cdot N^{-} - N^{+}$ additional "good applicants" samples from the $N^{+}$ "good applicants" samples. Thus, for the last ELM classifier, $N^{-}$ "good applicants" samples and $N^{-}$ "bad applicants" samples are again fed in as the training data set. Through the processes described above, the proposed ensemble strategy for imbalanced data is realized, and the training data for each classifier are balanced. In the next subsection, we introduce the GFSS theory-based ensemble credit scoring approach, which combines the results of the single ELM classifiers.
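A minimal sketch of this rebalancing strategy is given below, assuming NumPy and the label convention 1 = good (majority) and 0 = bad (minority). The SMOTE step is simplified to random-pair interpolation between "good" samples rather than a full k-nearest-neighbour SMOTE, so it should be read as an approximation of the procedure described above.

```python
import numpy as np

def build_balanced_training_sets(X, y, rng=None):
    """Split imbalanced credit data into M balanced training sets, one per ELM."""
    rng = np.random.default_rng(0) if rng is None else rng
    good, bad = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    n_pos, n_neg = len(good), len(bad)
    M = int(np.ceil(n_pos / n_neg))                 # eqs. (12)-(13): M = ceil(N+ / N-)

    shuffled = rng.permutation(good)
    sets = []
    # Classifiers 1..M-1: N- goods drawn without replacement plus all N- bads
    for m in range(M - 1):
        idx = np.concatenate([shuffled[m * n_neg:(m + 1) * n_neg], bad])
        sets.append((X[idx], y[idx]))

    # Last classifier: leftover goods, topped up to N- with SMOTE-style interpolation
    leftover = shuffled[(M - 1) * n_neg:]
    n_synth = n_neg - len(leftover)
    pairs = rng.choice(good, size=(n_synth, 2))     # simplified neighbour choice
    lam = rng.random((n_synth, 1))
    X_synth = X[pairs[:, 0]] + lam * (X[pairs[:, 1]] - X[pairs[:, 0]])
    idx = np.concatenate([leftover, bad])
    X_last = np.vstack([X[idx], X_synth])
    y_last = np.concatenate([y[idx], np.ones(n_synth)])
    sets.append((X_last, y_last))
    return sets
```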

GFSS Theory-Based Ensemble Credit Scoring Model.
Since we have the results of each ELM classifier, we also need to determine their weights with respect to their performance, so that the classification accuracy can be further improved. The theory of soft sets, first put forward by Molodtsov [29], can be regarded as a way of handling the uncertainties of imprecise environments (e.g., the credit scoring area). Maji et al. [30] studied both fuzzy and soft sets. We first introduce the principles of generalized fuzzy soft sets and then put forward a similarity measure of generalized fuzzy soft sets based on the angular cosine. After that, we obtain the weight of each credit scoring model using this similarity measure and the classification accuracy. Finally, we build the generalized fuzzy soft sets theory-based ensemble credit scoring model.

GFSS Theory.
Based on the theories proposed by Molodtsov [29] and Maji et al. [30], we give the following definitions for fuzzy soft sets.
Definition 1. Denote U as the initial universal set and P as a set of parameters, and let P(U) be the power set of U. A pair (F, P) is a soft set over U if F is a mapping given by F: P → P(U).

Definition 2. Denote U as the initial universal set and P as a set of parameters. Let I^U be the set of all fuzzy subsets of U and let A ⊂ P. A pair (F, A) is a fuzzy soft set over U if F is a mapping given by F: A → I^U.

Then, Maji and Samanta's [31] definition of GFSS is as follows:

Definition 3. Denote U as the initial universal set and P as a set of parameters, and let (U, P) be the soft universe. Denote F: P → I^U and let µ be a fuzzy subset of P, i.e., µ: P → I = [0, 1], where I^U is the set of all fuzzy subsets of U. Let F_µ be the mapping F_µ: P → I^U × I defined by F_µ(e) = (F(e), µ(e)), where F(e) ∈ I^U. Then F_µ is called a GFSS over the soft universe (U, P).
For every e_i, F_µ(e_i) = (F(e_i), µ(e_i)) describes both the degree to which the elements of U belong to F(e_i) and the degree of possibility of such belonging.
In this paper, U denotes the historical customer credit data X = {x_1, x_2, ..., x_n}, where x_i ∈ R^p; F(e_i) represents the classification performance of a single classifier on each single customer, and µ(e_i) denotes the overall classification performance of that single classifier.
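As a small illustrative example of Definition 3 in this setting (the numbers are ours, not taken from the paper), consider two classifiers $e_1$ and $e_2$ evaluated on three customers $x_1$, $x_2$, $x_3$:

$$F_{\mu}(e_1) = \big( \{ x_1/0.9,\; x_2/0.4,\; x_3/0.7 \},\; 0.8 \big), \qquad F_{\mu}(e_2) = \big( \{ x_1/0.6,\; x_2/0.8,\; x_3/0.5 \},\; 0.7 \big).$$

Here each fuzzy value is the classifier's accuracy degree on the corresponding customer, and the trailing scalar is the overall degree $\mu(e_i)$ attached to that classifier.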

Similarity Measure of GFSS.
The similarity measurement of GFSS plays an important role in the construction of the GFSS and in the establishment of our model. Thus, we establish a new similarity measure of GFSS for credit scoring.
Definition 4. For M single classifiers, denote $\eta_t^m$ and $\mu_m$ as the elements of $F_\mu$, defined as follows:

$$\eta_t^m = 1 - \left| y_t - \hat{y}_t^m \right|, \qquad (14)$$

where $y_t$ $(t = 1, 2, \ldots, T)$ is the category tag of the $t$th customer (a binary response, with 0 denoting default and 1 denoting nondefault) and $\hat{y}_t^m$ $(m = 1, 2, \ldots, M)$ is the forecasting result for the $t$th customer predicted by the $m$th classifier, which lies between 0 and 1. $\eta_t^m$ is the classification accuracy degree of a single classifier for a single customer and also ranges between 0 and 1, which is in line with our initial intuition. In addition, $\mu_m$ is the overall classification performance of the $m$th classifier, calculated as

$$\mu_m = \frac{TP_m + TN_m}{TP_m + FN_m + TN_m + FP_m}, \qquad (15)$$

where $TP_m$, $FN_m$, $TN_m$, and $FP_m$ are the elements of the confusion matrix (Table 1): $TP_m$ is the number of good customers correctly labeled as good, $TN_m$ is the number of bad customers correctly labeled as bad, $FN_m$ is the number of good customers falsely labeled as bad, and $FP_m$ is the number of bad customers falsely labeled as good. The greater the value of $\mu_m$, the more accurate the result produced by the $m$th classifier. Based on the above discussion, $\eta_t^m$ and $\mu_m$ from Definition 4 evaluate the classification performance of every single model. Therefore, we can build the GFSS $F_\mu^m(\eta, \mu)$ of the $m$th classifier as

$$F_{\mu}^{m} = \left\{ \left( \eta_1^m, \eta_2^m, \ldots, \eta_T^m \right), \mu_m \right\}. \qquad (16)$$
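The following Python sketch computes the GFSS components of one classifier, assuming the forms of (14)-(16) reconstructed above (per-customer accuracy degree $\eta_t^m = 1 - |y_t - \hat{y}_t^m|$ and $\mu_m$ as the overall accuracy from the confusion matrix); the function name and the 0.5 decision threshold are illustrative.

```python
import numpy as np

def classifier_gfss(y_true, y_score, threshold=0.5):
    """Build the GFSS component (eta vector, mu) for one classifier (sketch)."""
    y_true = np.asarray(y_true, dtype=float)
    y_score = np.asarray(y_score, dtype=float)
    eta = 1.0 - np.abs(y_true - y_score)          # per-customer accuracy degree, eq. (14)
    y_pred = (y_score >= threshold).astype(int)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    mu = (tp + tn) / len(y_true)                  # overall accuracy of the classifier, eq. (15)
    return eta, mu
```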

Ensemble Credit Scoring Modeling.
The determination of the weight $\varpi_m$ of every single model is the most important step in the ensemble. Based on the proposed cosine-based similarity measure of GFSS in (17), the weight of the $m$th classifier is calculated by (18), where $0 \le \varpi_m \le 1$ and $\sum_{m=1}^{M} \varpi_m = 1$. Thus, the final credit score of the $t$th customer is calculated as

$$\hat{y}_t = \sum_{m=1}^{M} \varpi_m \, \hat{y}_t^m. \qquad (19)$$

Figure 1 presents the flow-process diagram of the ensemble model. The algorithm of the "ELM and GFSS Theory-Based Hybrid Ensemble" model, referred to as EGHE in the following sections, is described in Algorithm 1.

Preparation of Dataset.
During the evaluation, we collected a variety of private and public data sets. In total, six credit data sets (three public and three private) and four additional imbalanced data sets with different IR were obtained. The public sets are available from the UCI Machine Learning Repository; they are real-world credit scoring data sets widely used by researchers. The German, Australian, and Japanese data sets are used for extra verification. The private data sets consist of the Iranian data set, which has also been widely used in many studies, and the Bene 1 and Bene 2 data sets, which were obtained from two major financial institutions in the Benelux [32]. The Iranian set contains customer data from small private Iranian banks [33,34]. Four additional imbalanced data sets are also employed: Shuttle, Skin_segment, and MiniBooNE from the UCI Machine Learning Repository, and LC2017Q1, which contains the loan data of the first quarter of 2017 from Lending Club. The characteristics of all experimental data sets are summarized in Table 2.

In this paper, we compare our proposed model with four other state-of-the-art models, namely, the C5.0 decision tree, SVM with a radial basis function kernel (SVM-R), Deep Belief Networks (DBN), and Bayes, to validate the performance of our approach. All continuous attributes are discretized into intervals. Each data set is randomly divided into a two-thirds training set and a one-third testing set. We use the open-source R statistical platform (version 3.2.2) to conduct our experiments.

Experimental Results.
Different methods are utilized as comparison models to test the validity of the EGHE credit scoring model.
Firstly, the FS algorithm based on AEnet is used to obtain the highly correlated variables after initial data gathering and preprocessing. After selection, the numbers of variables in these ten data sets are all reduced to various degrees (Table 3). In consideration of computational complexity, deleting irrelevant or weakly correlated variables is becoming increasingly important for big-data-oriented credit assessment problems.
Step 1. Preprocessing of data.
Step 2. Feature selection using the AEnet-based algorithm.
Step 3. Imbalanced data rebalancing by using the proposed ensemble strategy.
Step 4. Credit scoring of every single ELM classifier.
Step 5. Calculation of η and μ for each classifier using (14) and (15).
Step 6. Construction of the GFSS of each classifier using (16) and calculation of its similarity measure using (17).
Step 7. Calculate the weight of the mth classifier using (18).
Step 8. Get the final credit score of every customer y_t.

ALGORITHM 1: EGHE algorithm.

It is noteworthy that, compared with C5.0, SVM-R, DBN, and Bayes, ELM shows superior accuracy on the vast majority of data sets. Table 6 reports the average running time (total time for training and testing) of all models. For these experiments, we used an Intel i5-8500 CPU at 3.0 GHz with 16 GB of RAM.
From Table 6, we can see that ELM costs less time than the other single models to carry out credit scoring. This computational efficiency also makes ELM a good match for ensemble learning and modeling.
After feature selection, the ensemble strategy, and individual model classification are completed, the EGHE model is obtained. Based on (14), (15), and (17)-(19), the weights of the single ELM classifiers are calculated according to their performance. To validate the effectiveness of EGHE, we compare it with several ensemble models, split into two groups. The first group contains four FS algorithms combined with the GFSS-based combination scheme: cost-sensitive, GA, information gain ratio (IGR), and elastic net (Enet); cost-sensitive, GA, and IGR are popular feature selection approaches in the credit scoring area [16,35,36]. The second group applies four other combination approaches with AEnet-based feature selection: weighted average (WAVG) [37], majority voting (MajVot) [38], weighted voting (WVOT) [39], and fuzzy soft set (FSS).
Those methods are frequently adopted in the construction and application of combination models. They also employ ELM as the base classifier but do not adopt the ensemble strategy proposed above; instead, they only use random sampling to balance the training data sets. Table 7 displays the AUC, H-measure, and Brier's score results of all ensemble models. From Tables 5 and 7, we can see that, compared with the single classifiers, the ensemble methods show significant advantages in classification accuracy. Compared with the other single classifiers and the combined approaches in both groups, EGHE has an advantage in all metrics across all data sets. Experiments on several state-of-the-art ensemble models were also performed to verify the effectiveness of the EGHE model: the EMPNGA-based multistage hybrid model put forward by Zhang and Xia [37]; the heterogeneous ensemble credit model put forward by Xia et al. [40]; the EBCA-RF&XGB-PSO model put forward by He et al. [41]; the heterogeneous ensemble learning-based two-stage credit risk model (TSHE) proposed by Papouskova and Hajek [42]; twin neural networks (TNN) proposed by Jayadeva et al. [43]; and a rule-based knowledge extraction (RKE) method recently proposed by Mahani and Baba [44]. Table 8 gives the results of these ensemble models on the different data sets.
From Table 8, we can tell that the results of these models are very close. The accuracy of the EGHE model is better than that of the other models on all data sets except the Iranian one. The EBCA-RF&XGB-PSO model achieves a high accuracy of 0.921 on the Iranian data set because it uses the Extended Balance Cascade method, which effectively addresses class imbalance. However, the ensemble strategy and GFSS theory-based EGHE model handles the difficult problem of imbalanced data classification better on most experimental data sets; even on severely skewed data sets such as Shuttle, Skin_segment, MiniBooNE, and LC2017Q1, ideal outcomes are achieved.

Conclusion
In this paper, we proposed a novel ensemble credit scoring model called EGHE, which integrates an efficient feature selection algorithm, a novel ensemble strategy, and a GFSS-based weighting method for single ELM classifiers. In the proposed model, the adaptive elastic net-based feature selection algorithm is first utilized to obtain high-quality training data, improving evaluation efficiency without reducing predictive precision. The ELM model is employed as the base classifier, and a novel ensemble strategy is applied to balance the imbalanced training data sets for each ELM classifier. Additionally, we proposed a new weighting method to build the GFSS theory-based ensemble credit scoring model: a dual-scale classification accuracy metric based on the new similarity measure of GFSS is constructed to compute the final weight of every single classifier. The main contribution of this paper is that the proposed EGHE can predict credit risk reliably and accurately, especially for imbalanced credit data. Comparisons between EGHE and other credit scoring models were carried out on ten real-world data sets with four metrics (average accuracy, AUC, H-measure, and Brier's score). A variety of state-of-the-art ensemble models were also compared with EGHE to prove its validity. The experimental results demonstrate that the proposed EGHE model is robust and represents a positive development in credit scoring.

Data Availability
(1) The "Germany" data set used to support the findings of this study is available at http://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29.
(2) The "Australia" data set is available at http://archive.ics.uci.edu/ml/datasets/Statlog+%28Australian+Credit+Approval%29.
(3) The "Japan" data set is available at https://archive.ics.uci.edu/ml/datasets/Japanese+Credit+Screening.
(4) The "Iran" data set is included in [34,45].
(5) The "Bene 1" and "Bene 2" data sets are included within the following article: [32].
(6) The "Shuttle" data set is available at http://archive.ics.uci.edu/ml/datasets/statlog+(shuttle).
(7) The "Skin_segment" data set is available at http://archive.ics.uci.edu/ml/datasets/Skin+Segmentation.
(8) The "MiniBooNE" data set is available at http://academictorrents.com/details/7fafb101f9c7961f9b840daeb4af43039107ddef.
(9) The "LC2017Q1" data set is described in [41] and available at http://www.lendingclub.com.

Disclosure
Dayu Xu and Xuyao Zhang are co-first authors.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this paper.