Improved AHP Model and Neural Network for Consumer Finance Credit Risk Assessment

With the rapid expansion of the consumer financial market, the credit risk problem in borrowing has become increasingly prominent. Based on the analytic hierarchy process (AHP) and the long short-term memory (LSTM) model, this paper evaluates individual credit risk through an improved AHP and an optimized LSTM model. Firstly, the characteristic information is extracted, and the structure of the financial credit risk assessment index system is established. The data are input into the AHP-LSTM neural network, and the index data are fused with the AHP to obtain the risk level, which serves as the expected output of the LSTM neural network. After training, the early warning model can be used for financial credit risk assessment and early warning. Based on the LendingClub and PPDAI data sets, the experiment uses the AHP-LSTM model for classification and prediction and compares it with other classification methods. Experimental results show that this method outperforms the comparison methods on both data sets, especially on unbalanced data sets.


Introduction
Accompanied by the rapid expansion of the consumer finance industry and the continuous growth of the consumer credit scale, various financial credit problems have become relatively severe [1]. With the establishment of the public credit investigation system, the demand for personal consumption credit has become increasingly strong [2]. In order to adapt to these changes, commercial institutions gradually began to expand the personal credit investigation business, and the personal credit investigation system gradually moved toward marketization [3]. The pattern of China's personal credit investigation market has shown a trend of diversification, and the design of the personal credit risk assessment model will be its core advantage and the key to lasting management [4]. Through the use of appropriate evaluation methods, borrowers who are likely to default can be identified accurately and efficiently, reducing the bad debt losses of banks, consumer finance companies, and other lending institutions and ensuring the stable development of the social economy [5].
In view of different credit risk assessment problems, risk assessment methods are constantly updated and developed. The authors of [6] proposed an online loan borrowers' credit risk assessment method based on the AHP-LSTM model, which extracted features from personal information, constructed the AHP-LSTM model through multigranularity scanning and a forest module, and predicted borrower default. At the same time, the Gini index was used to calculate the importance scores of random forest features, and the Borda counting method was used to rank and fuse the results [7]. However, there is still room for the model to improve on the problem of unbalanced sample categories [8].
The authors of [9] adopted a personal credit assessment based on a heterogeneous ensemble algorithm to address the difficulty of assessing customers' personal credit in bank loan risk control. The AUC value of the proposed heterogeneous ensemble learning model reaches 0.916, an average increase of 7.38% over traditional machine learning models, with good generalization ability [10]. A method based on the synchronous processing of sample undersampling and feature selection by the gray wolf optimization algorithm uses the classifier as the heuristic information of the gray wolf optimizer to conduct an intelligent search, obtaining the optimal combination of samples and features [11]. A tabu list strategy was introduced into the original gray wolf algorithm to avoid local optima [12]. Compared with other methods on different data sets, this method can effectively solve the sample imbalance problem, reduce the dimension of the feature space, and improve classification accuracy.
Studies on the missing value filling method (QL-RF) based on Q-learning and random forests and the integrated classification model (QXB) based on the bagging framework, fusing quantum particle swarm optimization (QPSO) and XGBoost, have also been further optimized [13]. Among them, QL-RF is superior to the traditional RF filling method under G-means, F1-measure, and AUC, and QXB is significantly superior to SMOTE-RF and SMOTE-XGBoost [14].
The proposed method can effectively deal with missing-value and classification problems under high-dimensional unbalanced data [15]. A personal credit evaluation model has also been established using the support vector machine (SVM) [16]. A genetic algorithm is introduced to optimize the model's parameters, and validity analysis and extension analysis are performed on samples from two P2P lending platforms. Based on the empirical results, that work discusses the potential risks of credit brushing; the approach can effectively solve the problem of personal credit evaluation on P2P lending platforms and has good robustness and generalizability [17].
This paper uses the LSTM network to establish a personal credit evaluation model by improving the analytic hierarchy process. The final evaluation result of the traditional analytic hierarchy process depends on the subjective scaling of the participants, which may lead to an inconsistent judgment matrix, requiring repeated consistency testing and modification and resulting in a large workload during evaluation. The AHP-LSTM model can predict default even when positive and negative samples are unbalanced, which improves the accuracy of credit risk assessment. The improved analytic hierarchy process not only intuitively reflects the level of each individual factor in credit risk evaluation but also better reflects the comprehensive ability of credit risk evaluation, and it can solve multiobjective complex problems. The concept of the optimal matrix is used to improve the traditional analytic hierarchy process. This method makes the evaluation results automatically satisfy the consistency requirements, simplifies the consistency testing steps, and greatly reduces the evaluation workload.
This paper consists of four main parts. The first part is the related background introduction. The second part is the methodology, which introduces the improvement of the analytic hierarchy process and the LSTM model and further establishes the credit risk assessment model. The third part is the result analysis and discussion. The fourth part is the conclusion.

Analytic Hierarchy Process.
The analytic hierarchy process (AHP) is an analytical and decision-making method for solving multiobjective complex problems. Firstly, the complex problem is decomposed into several evaluation factors, and the corresponding index system is established.
Then, the evaluation factors are divided into different hierarchical structures according to their subordinate relationships, and the hierarchical structure model is constructed. Using the degrees of relative importance, the 1∼9 scale theory is introduced to obtain a quantitative judgment matrix, where the primary and secondary properties of the 1∼9 scale are defined in Tables 1 and 2, respectively. Finally, the relative weights of the factors at each level are calculated, and a consistency check is carried out.
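The classic procedure above can be sketched numerically. The judgment matrix below is an invented example, and the principal-eigenvector method together with Saaty's random index is one standard way to compute the weights and the consistency ratio; it is an illustration, not necessarily the exact variant used in this paper.

```python
import numpy as np

# A 3x3 judgment matrix built from the 1-9 scale: entry G[x][y] says how
# much more important factor x is than factor y (illustrative values).
G = np.array([
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 3.0],
    [1/5, 1/3, 1.0],
])

# Relative weights: principal eigenvector of G, normalized to sum to 1.
eigvals, eigvecs = np.linalg.eig(G)
k = np.argmax(eigvals.real)
w = np.abs(eigvecs[:, k].real)
w = w / w.sum()

# Consistency check: CI = (lambda_max - n) / (n - 1), CR = CI / RI.
n = G.shape[0]
lambda_max = eigvals.real[k]
CI = (lambda_max - n) / (n - 1)
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}[n]  # Saaty's random index
CR = CI / RI
print(w, CR)  # CR < 0.1 means the judgments are acceptably consistent
```

For an inconsistent matrix, CR exceeds 0.1 and the expert scores must be revised, which is exactly the workload the improved method below avoids.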

Improve Analytic Hierarchy Process.
It can be seen from the algorithm steps of the traditional analytic hierarchy process that the final evaluation result depends on the subjective scaling of the participants. If there is a large gap between participants' subjective cognition and objective reality, the judgment matrix may be inconsistent, requiring repeated consistency testing and correction and resulting in a large workload during evaluation. To solve this problem, an improved analytic hierarchy process is applied, with the following specific steps.
(1) According to the principle of the analytic hierarchy process, we construct the judgment matrix $G = (g_{xy})_{n \times n}$, where $g_{xy} > 0$ is the importance of factor $x$ relative to factor $y$. (2) We add each row to get the sum vector $s_x = \sum_{y=1}^{n} g_{xy}$. (3) We normalize this vector to get the weight vector $w_x = s_x / \sum_{z=1}^{n} s_z$. (4) We check consistency.
(5) In order to coordinate the evaluation factors, the concept of the optimal matrix is used to improve the traditional analytic hierarchy process. This method makes the evaluation results automatically satisfy the consistency requirements, simplifies the consistency testing steps, and greatly reduces the evaluation workload. (6) We carry out total hierarchical ranking. The importance of each factor at the bottom level relative to the factor at the top level can be obtained by calculating layer by layer along the hierarchy, completing the total ranking of the hierarchy.

Quantitative Evaluation Analysis of the Credit Risk Assessment.

Credit risk assessment adopts the improved analytic hierarchy process to carry out the independent-capability assessment. The specific algorithm steps are as follows: (1) Construct the judgment matrix $K$. Given the credit risk assessment, an expert survey table is first developed, and the relative importance of the factors at the ability level, and of the factors at the index level corresponding to the different abilities, is scored using the 1∼9 scale theory. This paper takes the scoring results of the relative importance of the competency factors as an example and gives the judgment matrix of the competency factors with respect to the target factor.
(2) Construct the antisymmetric matrix $K_1 = \lg K$ of $K$. (3) Solve the optimal transfer matrix $K_2$ of the antisymmetric matrix $K_1$, whose elements are $k_{2,xy} = \frac{1}{n} \sum_{z=1}^{n} (k_{1,xz} - k_{1,yz})$. (4) Construct the quasi-optimal consistent matrix $K^{*}$ according to $k^{*}_{xy} = 10^{k_{2,xy}}$. (5) Compute the row-sum vector $M$ of $K^{*}$ and normalize it; the normalized vector $\overline{M}$ is the weight vector among the evaluation indexes.
At this point, the relative importance weights of the competency-layer factors with respect to the target layer are obtained.
In this method, the comprehensive ability of credit risk assessment is decomposed to construct the lowest-level evaluation index set that reflects the independent ability of the credit risk assessment. Adopting a bottom-up approach, the influence of the different evaluation indicators on the comprehensive independent ability of the credit risk assessment is reflected through weights. On this basis, the paper provides a method that can quantitatively evaluate the independent ability of the credit risk assessment. The method in this paper can fully reflect the level and comprehensive autonomy of the credit risk assessment.
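The improved procedure (antisymmetric matrix, optimal transfer matrix, quasi-optimal consistent matrix) can be sketched as follows. The judgment matrix is invented, and the transfer-matrix formula is our reconstruction of the standard optimal-transfer-matrix technique, so treat this as an illustration rather than the paper's exact implementation.

```python
import numpy as np

# Illustrative judgment matrix K from the 1-9 scale.
K = np.array([
    [1.0, 2.0, 6.0],
    [0.5, 1.0, 4.0],
    [1/6, 0.25, 1.0],
])
n = K.shape[0]

K1 = np.log10(K)  # antisymmetric matrix K1 = lg K

# Optimal transfer matrix: k2[x, y] = (1/n) * sum_z (k1[x, z] - k1[y, z]).
row_sums = K1.sum(axis=1)
K2 = (row_sums[:, None] - row_sums[None, :]) / n

K_star = 10.0 ** K2  # quasi-optimal consistent matrix K*

# K_star is consistent by construction (k*[x,y] * k*[y,z] == k*[x,z]),
# so no consistency test is needed; any normalized column gives the weights.
w = K_star[:, 0] / K_star[:, 0].sum()
print(w)
```

Because $K^{*}$ is consistent by construction, the repeated test-and-revise loop of the classic AHP disappears, which is the workload reduction claimed above.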

WT-LSTM Model.
The wavelet transform is able to process nonstationary financial time series data. Trend information and fluctuation information can be separated through multilayer decomposition and reconstruction of the original signal. The decomposition process is as follows: (1) Decomposition by the fast dyadic orthogonal wavelet transform (Mallat algorithm) is given by $G_t = B\,G_{t-1}$ and $D_t = A\,G_{t-1}$, where $B$ and $A$ are the low-pass filter and the high-pass filter, respectively, $t$ is the decomposition step, and $G_0$ is the initial time series. The original data are first decomposed into the $D_1$ and $G_1$ components, and the approximate signal $G_1$ is then decomposed into $G_2$ and $D_2$. This process continues for $t$ steps until $t + 1$ signal sequences are obtained.
(2) The data loss caused by dyadic sampling is recovered, and the signal is reconstructed by the interpolation method as $G_{t-1} = B^{*} G_t + A^{*} D_t$, where $B^{*}$ and $A^{*}$ are the dual operators of $B$ and $A$, respectively, which makes the sum of the reconstructed sequences equal to the original sequence.
For the wavelet, Daubechies 4, which has the largest applicable range, was selected, and the number of decomposition layers was 4. The wavelet transform is used to decompose the price time series of the credit risk assessment. Firstly, a low-frequency approximate sequence $G_4$ and high-frequency detail sequences $D_1$, $D_2$, $D_3$, and $D_4$ are obtained by decomposition.
The interpolation method is used to reconstruct the approximate sequence $G_4$ and the detail sequences $D_1$, $D_2$, $D_3$, and $D_4$. LSTM is used to predict each reconstructed subsequence, and the final prediction result is obtained by summing the predicted subsequences.
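A minimal sketch of the Mallat decomposition and reconstruction, using the Haar wavelet instead of the paper's Daubechies 4 so the filters stay short; the function names are ours.

```python
import numpy as np

r2 = np.sqrt(2.0)

def haar_decompose(x):
    """One Mallat level: split x into approximation and detail halves."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / r2  # low-pass filter B + dyadic downsampling
    d = (x[0::2] - x[1::2]) / r2  # high-pass filter A + dyadic downsampling
    return a, d

def haar_reconstruct(a, d):
    """Dual operators B*, A*: upsample and recover the original samples."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / r2
    x[1::2] = (a - d) / r2
    return x

x = np.array([4.0, 2.0, 5.0, 7.0, 1.0, 1.0, 3.0, 9.0])
a1, d1 = haar_decompose(x)       # level 1: G1, D1
a2, d2 = haar_decompose(a1)      # level 2: decompose G1 into G2, D2
x_rec = haar_reconstruct(haar_reconstruct(a2, d2), d1)
print(np.allclose(x, x_rec))     # perfect reconstruction
```

In the WT-LSTM scheme, each reconstructed subsequence would then be predicted separately by an LSTM and the predictions summed.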
The prediction process is shown in Figure 1.
The training parameters of the LSTM model are set as follows: the number of hidden units of the LSTM layer is 200, the maximum number of training iterations is 200, the gradient threshold is set to 1, and the initial learning rate is 0.005. After 125 iterations, the learning rate is reduced by a multiplicative factor of 0.2, and the prediction step size is 1.

CEEMDAN-LSTM Model.
Complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) is an improvement of empirical mode decomposition (EMD); it, too, analyzes nonlinear and nonstationary data by breaking the sequence down into a series of intrinsic mode function (IMF) components that represent data features at different time scales. However, unimproved EMD suffers from mode aliasing, so CEEMDAN, building on the white-noise approach of ensemble empirical mode decomposition (EEMD), adds independently distributed Gaussian white noise to the original data.
The adaptive noise addition solves the problems of mode aliasing and excessive residual noise simultaneously and improves decomposition efficiency. The CEEMDAN algorithm proceeds as follows.
(1) We add Gaussian white noise with a normal distribution to the original time series: $j_x(t) = j(t) + \varepsilon_0 \omega_x(t)$, $x = 1, \ldots, n$, where $j(t)$ is the original sequence, $\omega_x(t)$ is the Gaussian white noise, $\varepsilon_0$ is the standard deviation of the noise, and $n$ is the number of noise additions. (2) The first-order modal component $xwf_1^x(t)$ of each noisy copy is obtained by the EMD method, the mean over the $n$ copies is taken as the first $xwf$ component, $\overline{xwf}_1(t) = \frac{1}{n} \sum_{x=1}^{n} xwf_1^x(t)$, and the residual after the first stage is $r_1(t) = j(t) - \overline{xwf}_1(t)$. (3) Similarly, we take the residual term as the original time series, repeat steps (1) and (2), adding adaptive Gaussian white noise before EMD decomposition, to obtain the next $xwf$ component and the corresponding residual. We repeat these steps until the residual can no longer be decomposed, that is, until the residual term has become a monotone function or a constant. When the amplitude falls below the established threshold and the next modal function can no longer be extracted, the decomposition ends. Finally, $Z$ orthogonal $xwf$ functions and the final trend term $res_z$ are obtained, so that $j(t) = \sum_{z=1}^{Z} \overline{xwf}_z(t) + res_z(t)$. Based on the advantages of CEEMDAN in sequence decomposition, this paper constructs the CEEMDAN-LSTM model to predict the price series for the credit risk assessment.
Because short-term fluctuation is reflected by the high-frequency components, which have little impact on the original sequence and whose average value is close to zero, the high-frequency components can be screened out by a t-test, and the new sequence can then be adjusted in combination with $xwf_b$ and the other components. The trend term reflects the trend of the original sequence.
When CEEMDAN was used to decompose the original sequence, Gaussian white noise with a standard deviation of 0.2 was added, the number of noise additions was 500, and the maximum number of iterations was 2000.
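The role of the 500 noise additions can be illustrated with a small numpy experiment (the sine signal and all names are ours): averaging many independent noisy copies cancels the added noise while preserving the underlying series, which is the mechanism CEEMDAN's ensemble averaging relies on.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 256)
signal = np.sin(2 * np.pi * 5 * t)  # stands in for the original series j(t)
eps0, n = 0.2, 500                  # noise std-dev and number of additions

# n noisy realizations j_x(t) = j(t) + eps0 * w_x(t), stacked as rows.
noisy = signal + eps0 * rng.standard_normal((n, t.size))
ensemble_mean = noisy.mean(axis=0)

# The residual noise in the ensemble mean shrinks like eps0 / sqrt(n).
residual_noise = np.abs(ensemble_mean - signal).max()
print(residual_noise)
```

The same cancellation is why the mean of the per-realization first IMFs in step (2) is a clean estimate of the first modal component.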

CEEMDAN-SE-LSTM Model.
Based on the above models, sample entropy is introduced as the basis for IMF component reconstruction. Starting from time series complexity, the sample entropy quantitatively describes the complexity and regularity of a system so as to judge the probability of generating new patterns. The larger the calculated entropy value, the more complex the time series and the higher the probability of generating a new pattern; conversely, the simpler the sequence, the lower that probability. The sample entropy is calculated as follows: (1) For a given time series $j(n)$ of length $t$, a set of $z$-dimensional vectors $j_z(1), \ldots, j_z(t - z + 1)$ is formed in sequence order, where $j_z(x) = \{j(x), j(x + 1), \ldots, j(x + z - 1)\}$. (2) The distance between $j_z(x)$ and $j_z(y)$ is defined as the maximum absolute difference between their corresponding elements, denoted $d[j_z(x), j_z(y)]$. (3) Given a threshold $h$, for each $x$ we count the number of $y$ with $d[j_z(x), j_z(y)] < h$, denoted $T_z(x)$.

(4) We calculate the mean of all the values defined above, denoted $H_z^h$. (5) We repeat the above steps with dimension $z + 1$ to get $H_{z+1}^h$. When $n$ is finite, the estimated value of the sample entropy is $\mathrm{SampEn} = -\ln\!\left( H_{z+1}^h / H_z^h \right)$. Based on the ability of sample entropy to judge sequence complexity and the probability of new patterns, this paper introduces the calculation of sample entropy as the basis for reconstruction. Different from the previous two models, which take low frequency and high frequency as the basis for reconstruction, this model needs to calculate the sample entropy. The closer the sample entropies of two components, the more similar the components and the more consistent their fluctuations. The prediction process of this model is shown in Figure 2.
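The steps above translate almost directly into numpy. This is our sketch of the standard sample-entropy computation; the function name and the default threshold of 20% of the standard deviation are our choices, not necessarily the paper's.

```python
import numpy as np

def sample_entropy(series, z=2, h=None):
    """SampEn = -ln(matches at dimension z+1 / matches at dimension z)."""
    x = np.asarray(series, dtype=float)
    if h is None:
        h = 0.2 * x.std()  # common default: 20% of the std-dev

    def match_count(m):
        # All m-dimensional template vectors j_m(x), stacked as rows.
        emb = np.array([x[i:i + m] for i in range(len(x) - m + 1)])
        # d[j_m(x), j_m(y)]: max absolute difference between elements.
        d = np.abs(emb[:, None, :] - emb[None, :, :]).max(axis=2)
        np.fill_diagonal(d, np.inf)  # exclude self-matches
        return (d < h).sum()

    return -np.log(match_count(z + 1) / match_count(z))

rng = np.random.default_rng(1)
regular = np.sin(np.linspace(0, 8 * np.pi, 200))  # simple, repetitive
noise = rng.standard_normal(200)                  # complex, irregular
print(sample_entropy(regular), sample_entropy(noise))
# The noisy series is more complex, so its entropy is larger.
```

Components with similar entropy values would then be grouped into one subsequence before LSTM prediction, as described above.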

Framework Design.
Based on the hierarchical structure of the credit risk assessment index system, we use the improved AHP combined with LSTM to establish the AHP-LSTM credit risk early warning model and carry out the credit risk early warning analysis. The framework of the early warning model is shown in Figure 3. The modeling process of the AHP-LSTM model proposed in this paper is as follows: (1) The AHP algorithm is used to analyze the training data set samples, the data feature components are obtained, and a new sample set is formed. (2) The LSTM network is built. We take the training set as the input of the LSTM network and the samples from step (1) as the expected output of the LSTM network. (3) We set the LSTM parameters and perform network training. (4) We take the test data as the input of the LSTM, build the early warning model according to the expected output, and carry out the credit risk assessment and warning.

Model Training.
Several credit risk data sets are selected as samples and trained with the AHP-LSTM model. Firstly, the AHP-LSTM model parameters are determined. For the adjustment of the number of hidden layer nodes and the learning rate, the method of controlled variables is adopted, and the nonkey parameters are determined first. Then, the learning rate η is set and attenuated at a fixed speed, and the results are normalized to [0, 1.0]. In order to avoid the unsatisfactory effects of random initialization, multiple training runs are conducted to finally determine the optimal parameters.
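The two mechanics mentioned here, step decay of the learning rate η and normalization of the results into [0, 1], can be sketched as follows. The function names are ours, and the 0.005/0.2/125 values echo the WT-LSTM settings reported earlier.

```python
import numpy as np

def lr_schedule(eta0, decay, step, iteration):
    """Multiply the initial rate eta0 by `decay` after every `step` iterations."""
    return eta0 * decay ** (iteration // step)

def minmax_normalize(x):
    """Min-max normalization of a sequence into [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

print(lr_schedule(0.005, 0.2, 125, 124))  # rate before the drop
print(lr_schedule(0.005, 0.2, 125, 125))  # rate after the drop
print(minmax_normalize([2.0, 4.0, 6.0]))
```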

Data Sources and Data Preprocessing.

The experiment used two data sets. The first, downloaded from the LendingClub website, covers 887,979 loans issued between 2007 and 2015; its specific information is listed in Table 3, in which the default samples account for 7.6% of the total. The second data set, from PPDAI, is mainly used for experimental verification; its specific information is listed in Table 4.
Data preprocessing mainly includes two steps: data cleaning and feature preprocessing. The first step is data cleaning of the samples. Firstly, features with more than 95% missing values are screened out, after testing whether those features are closely related to default. Then, "MISSING" is used to fill the vacancies in categorical features. For numerical features, after outliers are removed, the corresponding feature mean is used to fill the vacancies. In feature preprocessing, the original features are processed to generate derived variables. In order to reduce the amount of computation, the number of levels of categorical features with many categories is reduced. After final processing, the LendingClub and PPDAI data have 62 and 58 dimensions, respectively.
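The cleaning rules can be sketched with pandas on a toy frame. The column names are invented, the missing-value threshold is lowered from the paper's 95% to 80% so the toy column actually gets dropped, and the outlier rule is a crude stand-in for whatever the authors used.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "grade": ["A", None, "B", "A", None, "C"],        # categorical
    "income": [50.0, 60.0, np.nan, 1e9, 55.0, 58.0],  # numerical, one outlier
    "mostly_missing": [np.nan] * 5 + [1.0],           # >80% missing
})

# 1) Screen out features whose missing-value ratio exceeds the threshold.
df = df.loc[:, df.isna().mean() <= 0.80]

# 2) Categorical features: fill vacancies with the literal "MISSING".
df["grade"] = df["grade"].fillna("MISSING")

# 3) Numerical features: mask outliers, then fill vacancies with the mean.
deviation = (df["income"] - df["income"].median()).abs()
df.loc[deviation > 1e6, "income"] = np.nan  # crude outlier rule
df["income"] = df["income"].fillna(df["income"].mean())
print(df)
```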

Parameter Debugging and Comparative Experiment.
In order to ensure that the proportions of the different samples in the training set and the test set are the same as in the original data set, stratified sampling is used for the cross-validation splits, because of the large gap between the numbers of normally performing samples and default samples.
In the case of unbalanced positive and negative samples, the model makes predictions on the preprocessed data directly, and the methods of [18], [19], [20], and [21] were selected as comparison methods.
The results are listed in Table 5. Due to the imbalance of positive and negative samples in the data set, the recall rate and F1 value of all methods are relatively low, but the recall rate of the AHP-LSTM model is still 15.51% higher than that of the suboptimal method of [20]. As for accuracy and the other indicators, except for the method of [19], which has slightly higher accuracy, the proposed method is more accurate than the other methods. In addition, the average accuracy of this method is the highest among all methods, and its standard deviation is relatively small. Experimental results show that this method has better performance and stronger stability.
In addition, in order to balance positive and negative samples, the undersampling operation is performed on the preprocessed data.
The experimental results are listed in Table 6. All indexes of the AHP-LSTM model are higher than those of the other methods, except that the model of [21] has slightly higher accuracy. The average accuracy of this method is the highest among all methods, and its standard deviation is relatively small. The above two experiments show that the AHP-LSTM model retains strong stability even when positive and negative samples are unbalanced.
In the LendingClub data set, the ROC curves of the different methods before and after undersampling of the normally performing samples are shown in Figures 4 and 5. The closer an ROC curve is to the upper left corner (0, 1), the better the performance. As can be seen from Figures 4 and 5, at the same FPR, the TPR of the AHP-LSTM model is higher than that of the other compared methods, indicating that the AHP-LSTM model has better performance.
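For reference, an ROC curve and its AUC can be computed from classifier scores with a few lines of numpy. This is a simplified sketch that ignores tied scores, and the toy labels and scores are invented.

```python
import numpy as np

def roc_curve(y_true, scores):
    """Sweep the decision threshold over the scores, recording (FPR, TPR)."""
    order = np.argsort(-scores)                 # descending score
    y = y_true[order]
    tpr = np.cumsum(y) / y.sum()                # true-positive rate
    fpr = np.cumsum(1 - y) / (1 - y).sum()      # false-positive rate
    return np.r_[0.0, fpr], np.r_[0.0, tpr]

def auc(fpr, tpr):
    """Trapezoidal area under the (FPR, TPR) curve."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

y = np.array([1, 1, 0, 1, 0, 0, 0, 1])
good = np.array([0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6])  # ranks all positives first
fpr, tpr = roc_curve(y, good)
print(auc(fpr, tpr))  # 1.0: a perfect ranking hugs the upper-left corner
```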
To verify the stability and universality of the method in this paper, the model is used to evaluate the credit risk of borrowers in the PPDAI data set. The experimental results of each method are listed in Table 7. Because the gap between the experimental results of the different methods is small, the indexes of the AHP-LSTM model and the compared methods are mostly above 95%; in particular, the accuracy and the other indexes of the AHP-LSTM model reach 100%, higher than those of the compared methods.

Display and Analysis of the Feature Importance Score.
Based on the LendingClub data set, this paper constructs the credit risk assessment model of P2P online loan borrowers based on the AHP-LSTM model and solves for the feature importance scores of the model so as to explain the model to some extent. The top ten features by feature importance are selected here, and their normalized importance values are listed in Table 8. Among them, the first, "initial rating," refers to the user credit rating assessed by the platform, which is divided into three levels, A, B, and C, each further divided into categories 1, 2, and 3. A1 borrowers have the best credit rating, and the different credit ratings reflect the credit quality of the borrowers. "Certified status" indicates whether LendingClub has verified the borrower's income.
Verified income indicates that the borrower's income is real and relatively reliable. The "home state" refers to the state where the borrower lives when applying for the loan. In dealing with this feature, this paper divides the 50 states of the United States into three categories according to their level of economic development. States with high levels of economic development have many borrowers and a large number of defaults.
The purpose of the loan is mainly divided into debt consolidation, credit card repayment, house decoration, and other situations, and people with different borrowing purposes have different default rates. Finally, as for the loan interest rate, the number of repayments due this month, and other such characteristics, the higher the loan interest rate and amount, the greater the borrower's probability of default.
Similarly, in the PPDAI data set, this paper uses the credit model based on the AHP-LSTM model to predict default. The normalized values of the top 10 features in the feature importance score are listed in Table 9. In both data sets, the "initial rating" takes first place, so it can be used as an important reference index for the lender to predict whether the borrower will default. The "loan type" can be divided into the safety-standard receivable, e-commerce, the ordinary standard, etc. The ordinary standard is the most common type. The safety-standard receivable refers to a loan for which the lender's amount of safety-standard receivables exceeds a certain value and the loan credit score exceeds a certain value. E-commerce means that the borrower has passed the e-commerce certification and the store runs well. It can be seen that the division of different populations has a certain influence on the prediction results of the model. In addition, mobile phone, household registration, and other certifications reflect the authenticity of the information filled in by borrowers, which is of some importance to model prediction.
To sum up, the model can screen out the features that most affect the prediction of whether a borrower will default.

Conclusion
With the continuous development of the financial industry, consumer financial risks greatly impact the market and individuals. The accuracy of personal credit risk assessment plays a positive role in reducing the losses of banks, consumer finance companies, and other lending institutions, which is conducive to the stability of the market. Based on the analytic hierarchy process (AHP) and the LSTM model, this paper evaluates individual credit risk through the improved AHP and the optimized AHP-LSTM model. Based on the LendingClub and PPDAI data sets, the experiment uses the AHP-LSTM model for classification and prediction. It is compared with the random forest and the wide-and-deep model. Experimental results show that the performance of this method is superior to the other comparison methods on both data sets, especially on unbalanced data sets. In addition, this paper explains the prediction results of the model through feature importance measures, which is in line with people's intuitive and objective understanding.
In order to address the problem of sample class imbalance, this paper simply uses undersampling to balance the samples. In follow-up work, cost-sensitive learning or other more effective class-imbalance learning methods could be combined to further improve model performance. In addition, to enhance the practicality and stability of the model, it could be applied more fully to anticheating scenarios. However, when the data features are high-dimensional and sparse, the algorithm in this paper may not be able to find the optimal subspace, which is also a direction for further optimization.

Figure 3 :
Figure 3: Framework of the AHP-LSTM credit risk early warning model.

Figure 4 :
Figure 4: ROC curve comparison of different methods in the original data set.

Figure 5 :
Figure 5: ROC curve comparison of different methods in the undersampled data set.

Table 1 :
Definition of the primacy property of the 1∼9 scale.

Table 2 :
Definition of the secondary property of the 1∼9 scale.

According to $k^{*}_{xy} = 10^{k_{2,xy}}$, the quasi-optimal consistent matrix $K^{*}$ is constructed. A t-test with a 0.05 significance level against a zero mean was conducted successively for each $xwf_x(t)$. After the sequential test, the first component $xwf_g(t)$ with a significantly nonzero mean is obtained; the high-frequency subsequence $xwf_b$ is obtained by adding $xwf_1(t)$ through $xwf_{g-1}(t)$, the low-frequency subsequence is obtained by adding $xwf_g(t)$ through $xwf_z(t)$, and $res_z$ continues as the trend term. Parameter settings follow those of Torres et al.

Table 3 :
Information description of the LendingClub data set: amount of loan, amount of promised repayment, number of maturities, interest rate of loan, sum of interest so far, total amount of payment received recently, month of initiating loan, outstanding principal amount, etc.

Table 4 :
Information description of the PPDAI data set.

Table 5 :
Performance comparison of methods in the original data set (unit: %).

Table 6 :
Performance comparison of methods in the undersampled data set (unit: %).

Table 7 :
Performance comparison of methods in the PPDAI data set (unit: %).

Table 8 :
Feature importance scores in the LendingClub data set.

Table 9 :
Feature importance scores in the PPDAI data set.