Research on Credit Risk Identification of Internet Financial Enterprises Based on Big Data

The advent of the big data era has opened a new path for Internet financial credit investigation. Traditional methods for identifying the credit risk of Internet financial enterprises cannot capture the regional division characteristics of credit risk, which leads to large errors in the identification results. Therefore, this paper proposes a new big-data-based method of credit risk identification for Internet financial enterprises. From the big data perspective, the credit risk assessment steps of Internet financial enterprises are analyzed, the weights of the assessment indicators are calculated using the improved analytic hierarchy process (AHP), and the linear weighted synthesis method is applied to comprehensively assess the credit of clients. Using the regional division characteristics of big data credit risk, the credit risk is detected by a rule-based matching method. The eXtreme Gradient Boosting (XGBoost) machine learning algorithm is used to establish a credit risk identification model of Internet financial enterprises. The kappa coefficient and ROC curve are used to evaluate the performance of the proposed method. Experimental results show that the proposed method can accurately assess the credit risk of Internet financial enterprises.


Introduction
Internet finance refers to a new financial form that, building on traditional finance, realizes functions such as payment, financing, and credit intermediation through Internet technologies such as big data and cloud computing [1]. The forms of Internet finance generally include third-party payment, financial e-commerce, credit evaluation, Internet money funds, big data index funds, and other models. Internet finance is the merging of the Internet field and the financial field. However, it is not a simple integration of the Internet and finance but an innovative transformation of financial business that uses a safe and effective Internet network under certain market conditions; it fuses traditional financial business with the Internet spirit. Internet finance is new, its development speed and scale far exceed those of other industries, and its prospects are promising; it is therefore regarded as a sunrise industry. Risk, however, is a major obstacle to the development of the industry, and research on the risk management of Internet finance has become a hot topic in society and academia [2].
Jiang and Yuan [3] proposed a credit evaluation method for e-commerce transactions based on decision analysis. The factors influencing the credit risk of business transactions are divided into several evaluation factors by introducing the Kanno doff integral evaluation method, and a credit risk F-integral evaluation model is constructed to grade transaction credit. The interval distribution of commodity prices is used to add points for successful transactions and deduct points for failed transactions at different levels, which addresses the problem of cycle deception in transactions. Based on the credit scoring results, a risk calculation method for e-commerce transactions is designed to evaluate the credit degree of the current transaction. Yang et al. [4] took the enterprise transaction data of an Internet financial platform as the object, analyzed the propagation behavior of overdue loan defaults, and proposed a model that identifies the high-risk enterprises of the platform through propagation characteristics. After SIS and SIR models are constructed on the basis of threshold propagation and random propagation, the model is transformed into an algorithm that evaluates the enterprise value at risk, and it is verified and compared against actual default data. However, these two traditional methods cannot obtain the regional division characteristics of credit risk, resulting in large errors in the results of credit risk identification.
Gao and Xiao [5] studied the management of credit risk for consumer finance using big data. Their risk management model exhibited good prediction ability, could discriminate between normal loan customers and default loan customers, and was appropriate for practical personal credit risk control. Wang [6] preprocessed Internet financial credit data and selected the variables for an active credit tracking model based on a BP neural network using an adaptive genetic algorithm. Liu [7] studied the importance of machine learning and big data as an efficient data exploration approach for insurance risk management using the random forest algorithm. Fatao et al. [8] employed supply chain financial credit risk indicators and built an online evaluation index model for supply chain financial credit risk in commercial banks. Shen [9] established effective financial risk early warning systems and techniques and took effective actions to avoid risks and guarantee the normal operation of Internet banking; financial risk early warning systems based on big data are expected to expand quickly in the era of Internet finance. Lyu and Zhao [10] investigated the use of compressed sensing in the risk evaluation of Internet finance based on big data. Yang et al. [11] developed an Internet supply chain financial risk management model through data science. Zhang [12] constructed a financial investment risk method with the help of an intelligent fuzzy neural network. Similarly, Teles et al. [13] proposed a credit risk prediction model applying artificial neural network (ANN) and Bayesian network models. Xu et al. [14] employed a backpropagation neural network (BPNN) and information entropy to recognize and classify the risk of bank branches. A neural network can solve nonlinear problems without depending on a preset functional form and thus achieves a more precise simulation, so it can measure the estimation effect of a risk early warning model more precisely.
In this study, we propose a new credit risk identification method for Internet financial enterprises based on big data. The XGBoost algorithm is employed to develop a credit risk identification model of Internet financial enterprises. The performance of the model is evaluated using the kappa coefficient and ROC curve. Results show that the proposed risk identification model can accurately measure the credit risk of Internet financial enterprises. The rest of the paper is arranged as follows. In Section 2, the index weight calculation method is discussed. Section 3 illustrates the proposed XGBoost algorithm for the identification of credit risk. The results are given in Section 4, and Section 5 concludes the paper.

Credit Risk Detection of Internet Financial Enterprises Based on Big Data
When designing the enterprise credit risk assessment index system, it is essential to consider the defects of traditional enterprise credit risk assessment index systems [15]. In developing the index system, dynamic data and static data are integrated to complete data mining, analysis, and modeling from big data. Moreover, enterprise identity data, behavior data, and external data are used to construct the enterprise credit risk assessment index system [16]. The index system should be designed mainly from the perspectives of income information, loan information, account information, repayment and overdue information, and third-party information such as multi-end loan information, black and gray lists, and credit information. An enterprise credit risk evaluation index system based on big data is characterized by rich evaluation data items, a combination of static and dynamic data, wide data sources, and timeliness. By contrast, the traditional enterprise credit risk assessment index system has poor population coverage, relies mostly on static data, and cannot verify the authenticity of its data. The data of the big data platform for Internet enterprises come from multiple channels, such as APP product data of companies, credit information data of the PBC, credit information data of Internet credit information companies, online shopping data of e-commerce companies, data of third-party cooperative enterprises, public data collected by crawlers, and data published by public security, procuratorial, and judicial authorities. The data used in this study are desensitized in several respects to ensure data safety.

Credit Evaluation Index Weight Calculation.
The improved analytic hierarchy process (AHP) method [17] is used to compute the weights of the evaluation indicators. This method uses the concept of the optimal transfer matrix to improve AHP so that it naturally meets the consistency requirement and yields the weight values directly. The major steps are as follows:
(i) A credit risk evaluation index system is established based on market transactions, and the evaluation indexes are set according to the customer credit evaluation index system.
(ii) A judgment matrix is built. After the matrix is established according to the credit risk evaluation index system, the weight of each level of index in the customer credit risk evaluation index system is determined by AHP. By comparing elements in pairs, the relative importance of each element in a hierarchy with respect to a certain factor in the upper hierarchy is determined, and a judgment matrix is created. The pairwise comparison matrix for a given criterion is
$$Y = (y_{ij})_{n \times n} = \begin{bmatrix} y_{11} & \cdots & y_{1n} \\ \vdots & \ddots & \vdots \\ y_{n1} & \cdots & y_{nn} \end{bmatrix}$$
where $y_{ij}$ is the scale of the importance of factor $Y_i$ relative to factor $Y_j$ with respect to the criterion index.
(iii) The weights of the evaluation indexes at all levels are calculated with the improved AHP method. No consistency check is needed after the weights are calculated in this way. First, the judgment matrix is modified to obtain the quasi-optimal matrix $Y^*$, and then the square root method is used to solve for the eigenvector of $Y^*$. The elements of $Y^*$ are multiplied row by row:
$$N_i = \prod_{j=1}^{n} y^*_{ij}, \quad i = 1, 2, \ldots, n$$
Taking the $n$-th root of each product gives $P_i = N_i^{1/n}$, and the root vector $P = [P_1, P_2, \ldots, P_n]^T$ is normalized to obtain the sorting weight vector:
$$\bar{P}_i = \frac{P_i}{\sum_{k=1}^{n} P_k}, \quad i = 1, 2, \ldots, n$$
The improved AHP method is represented in Figure 1.
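The weight calculation can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the construction of the quasi-optimal matrix (base-10 logarithm, row-mean differences, exponentiation) is an assumed form of the optimal transfer matrix step, which the text does not spell out.

```python
import math

def quasi_optimal_matrix(judgment):
    """Build a consistent quasi-optimal matrix Y* from a judgment matrix Y.

    Assumed construction (optimal transfer matrix):
      b_ij = log10(y_ij)
      c_ij = (1/n) * sum_k (b_ik - b_jk) = mean(b_i.) - mean(b_j.)
      y*_ij = 10 ** c_ij
    """
    n = len(judgment)
    logs = [[math.log10(v) for v in row] for row in judgment]
    row_means = [sum(row) / n for row in logs]
    return [[10 ** (row_means[i] - row_means[j]) for j in range(n)]
            for i in range(n)]

def ahp_weights(judgment):
    """Square root method: P_i = (prod_j y*_ij)^(1/n), then normalize."""
    star = quasi_optimal_matrix(judgment)
    n = len(star)
    roots = [math.prod(row) ** (1.0 / n) for row in star]
    total = sum(roots)
    return [p / total for p in roots]

# Hypothetical 3x3 judgment matrix (already consistent, so Y* equals Y).
example = [[1.0, 2.0, 4.0],
           [0.5, 1.0, 2.0],
           [0.25, 0.5, 1.0]]
weights = ahp_weights(example)
```

For this example the three weights are proportional to 4 : 2 : 1. Because the quasi-optimal matrix is consistent by construction, no separate consistency check is required.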

Credit Evaluation Based on Linear Weighted Synthesis Method.
The linear weighted synthesis method for customer credit evaluation obtains a comprehensive evaluation value by the weighted summation of the index values. The evaluation value of the credit risk of the $j$-th customer is calculated as the weighted sum of that customer's index values, where $Q_{ji}$ is the value of the $i$-th index of the $j$-th customer, $j = 1, 2, \ldots, m$. The static client credit risk rating is classified into four levels: grade 1 denotes 89-100 points, grade 2 denotes 75-88 points, grade 3 denotes 59-74 points, and grade 4 denotes ≤58 points. The weighted credit scoring method is used in the static client credit risk rating. First, the credit investigation indicators are scored; second, the scores are combined by weighted averaging; finally, the credit risk score is obtained. The score can be computed using the following equation:
$$X = \sum_{i=1}^{n} b_i \cdot Y_i$$
where $X$ is the credit score of a customer, $b_i$ is the weight of the $i$-th credit investigation index, with the weights normalized so that $\sum_{i=1}^{n} b_i = 1$, and $Y_i$ is the evaluation score of the $i$-th evaluation index. This method is applied to the quantitative analysis of customer credit. The decision making of financial enterprises relies mainly on obtaining the credit status of an enterprise from objective data, and the static client credit rating method is simple to understand and operate.
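The scoring and grading rule can be illustrated with a short Python sketch. The function names and the example weights are hypothetical; the grade thresholds follow the four ranges given above, and the treatment of fractional scores that fall between the listed integer ranges (for example 88.5) is an assumption.

```python
def credit_score(weights, index_scores):
    """Linear weighted synthesis: X = sum_i b_i * Y_i, with sum_i b_i = 1."""
    if len(weights) != len(index_scores):
        raise ValueError("weights and index_scores must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(b * y for b, y in zip(weights, index_scores))

def credit_grade(score):
    """Map a score to the four static rating grades.

    Assumption: fractional scores are assigned to the grade of the
    lower bound they have reached (e.g. 88.5 is grade 2).
    """
    if score >= 89:
        return 1
    if score >= 75:
        return 2
    if score >= 59:
        return 3
    return 4

# Hypothetical customer with three indexes.
score = credit_score([0.5, 0.3, 0.2], [90, 80, 70])
grade = credit_grade(score)
```

In this example the weighted score is 83 points, which falls in the 75-88 range and is therefore rated grade 2.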

Credit Risk Detection Based on Big Data.
According to the regional division characteristics of big data credit risk [18], the credit risk is detected by a rule-based matching method. The centralized detection process is shown in Figure 2.
The data are divided into different packets and matched. If the matching succeeds, the output is generated. The credit risk of big data is divided into five areas, and the centralized detection problem is converted into the maximization of a target (fitness) function, whose data are read before the processing step. In the fitness function, $y_k(x)$ is the actual output of the data, $y_k'(x)$ is the expected output of the data, and $k$ is the total amount of data obtained. The matching process of the packets given in Figure 2 is as follows (Figure 3).

High Credit Risk
$H(x)$ denotes the distance by which the string $S$ is moved, $l$ denotes the length of the string, $R(x)$ denotes the rightmost position at which the data $x$ appears in the string $S$, and $i$ denotes the position at which the mismatch occurs (measured from the far left). These quantities determine the moving distance, which depends on whether the data $x$ are in the string.

Matching Rules for Low Credit Risk Data.
When different strings are aligned, the moving distance is determined by the matching rules for low credit risk data during the matching process from right to left. The specific matching process is as follows. The string $S$ is entered and the moving distance is initialized. Next, the string is traversed from right to left, and each traversal position $i$ is analyzed to update the moving distance. All the strings are aligned and matched one by one from right to left. If the comparison reaches the leftmost end of the string without a mismatch, the matching is successful. This process ensures that every moving distance is safe, so that no match is omitted. Furthermore, it achieves centralized and precise detection of big data credit risk areas at a high matching speed.
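The exact shift rule of the paper is not recoverable from the text, so the following sketch swaps in the well-known Boyer-Moore-Horspool algorithm, which shares the features described above: the pattern is compared from right to left, and the moving distance is looked up from the rightmost position at which a character occurs in the pattern. The function name and the example strings are hypothetical.

```python
def horspool_find(text, pattern):
    """Return the index of the first match of pattern in text, or -1.

    Boyer-Moore-Horspool: compare right to left; on a mismatch, shift by a
    distance derived from the rightmost occurrence of the text character
    aligned with the last pattern position. Every shift is safe, so no
    match can be skipped.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Shift table: distance from the rightmost occurrence of each character
    # (excluding the last position) to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1  # move leftward while characters agree
        if j < 0:
            return pos  # reached the leftmost end: match succeeded
        # Characters absent from the pattern allow a full-length shift.
        pos += shift.get(text[pos + m - 1], m)
    return -1

# Hypothetical rule string searched in a hypothetical data packet.
hit = horspool_find("riskAriskB", "riskB")
```

In this example the first alignment fails on the last character, the pattern is shifted by its full length because that character does not occur in it, and the second alignment matches at index 5.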

Credit Risk Identification Model of Internet Financial Enterprises
The credit risk identification model of Internet financial enterprises is constructed based on the XGBoost algorithm [19], a common and effective open-source implementation of the gradient boosted trees algorithm. The XGBoost algorithm performs well because of its robust handling of different data types, distributions, and relationships, and because of the variety of hyperparameters that can be fine-tuned [20]. It can be used for regression, binary and multiclass classification, and ranking problems [21]. The basic element of the XGBoost model is a set of trees, each of which is a binary classification and regression tree that reflects the actual result of the decision tree [22]. The structure of the decision tree has two branches, namely, "Yes" and "No," which correspond to the right and left branches. Each feature variable is partitioned by the binary tree, and the feature space is partitioned to obtain several leaf nodes.
Suppose a set $B = \{(x_i, y_i)\}$ with $m$ variables and $n$ samples. The output $\hat{B}_i$ of the prediction model for sample $i$ is obtained by summing $K$ functions from the regression tree ensemble model. Here, $\Gamma = \{f(x) = \omega_{q(x)}\}$ $(q: \mathbb{R}^m \to T,\ \omega \in \mathbb{R}^T)$ represents the regression tree space; $\omega_i$ is the score corresponding to the $i$-th leaf; $T$ is the number of leaf nodes in the tree structure; $q$ indicates the tree structure; $f_k$ represents the $k$-th tree; and $x_i$ is the independent variable corresponding to the $i$-th sample. The tree model is trained with the objective function $u$, which is the sum of a loss term and a penalty term: $o$ is the convex loss function that measures the difference between the real value $y_i$ and the predicted value $\hat{y}_i$, and $\Omega$ is the penalty term. In the penalty term, $\|\phi\|$ represents the regular term and $c$ is the leaf node penalty, which is mainly used to avoid overfitting.
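For reference, the standard XGBoost formulation of these three expressions can be written as follows. This is a sketch under stated assumptions rather than a recovery of the paper's own equations: $\hat{y}_i$ is the model output (which the text also writes as $B$), the regularization coefficient $\lambda$ is the usual XGBoost symbol and does not appear in the text, and the regular term $\|\phi\|$ is assumed to correspond to the L2 norm of the leaf scores $\omega$.

```latex
\begin{align}
  \hat{y}_i &= \sum_{k=1}^{K} f_k(x_i), \qquad f_k \in \Gamma, \\
  u &= \sum_{i=1}^{n} o\!\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k), \\
  \Omega(f) &= c\,T + \tfrac{1}{2}\,\lambda\,\lVert \omega \rVert^{2}.
\end{align}
```

The first term of $u$ rewards a good fit to the data, while $\Omega$ penalizes trees with many leaves or large leaf scores.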
In the process of credit risk identification of Internet financial enterprises, the objective function cannot be optimized directly in Euclidean space, because its parameters are functions (trees) rather than numerical vectors. Therefore, the credit risk identification model is trained through a boosting learning strategy: in Round $t$, a newly added function $f_t$ is appended to the output of Round $t-1$. Based on this process, the objective function can be rewritten with an additional constant term $a$. For the credit risk identification model of Internet financial enterprises based on the RB-XGBoost algorithm, the square loss function is substituted into the objective function, in which $y_i - B_i^{(t-1)}$ is the residual of Round $t-1$. More generally, the loss function can be approximated by a Taylor expansion, and the expansion takes a particularly simple form when the loss function used during training is the square loss.
Substituting the parameters $g_i$ and $h_i$ into the objective function yields a new expression, where $B_i^{(t-1)}$ represents the output of the model in Round $t-1$ of training and $B_i$ represents the dependent variable in the objective function. If $B_i$ is known, the objective function [20] can be simplified further, where $g_i$ and $h_i$ are the parameters in the loss function. Different loss functions give different values of these parameters; that is, the values of $g_i$ and $h_i$ are determined by the loss function. Hence, each tree is redefined by the following equation:
$$f_t(x) = \omega_{q(x)}$$
where $\omega$ represents the weights of the leaf nodes in the tree structure, $\omega_{q(x)}$ denotes the predicted values obtained by the tree model, and $q: \mathbb{R}^d \to \{1, 2, \ldots, T\}$ is the tree structure.
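The boosting step and the Taylor approximation can be written out in the standard XGBoost form as a worked derivation. The notation is an assumption where the text is silent: $\hat{y}_i^{(t-1)}$ is the Round $t-1$ output that the text writes as $B_i^{(t-1)}$, the expansion is taken to second order, and the square loss is taken as $o(y, \hat{y}) = (y - \hat{y})^2$.

```latex
\begin{align}
  \hat{y}_i^{(t)} &= \hat{y}_i^{(t-1)} + f_t(x_i), \\
  u^{(t)} &= \sum_{i=1}^{n} o\!\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + a \\
          &\approx \sum_{i=1}^{n} \left[ o\!\left(y_i, \hat{y}_i^{(t-1)}\right)
             + g_i\, f_t(x_i) + \tfrac{1}{2}\, h_i\, f_t^{2}(x_i) \right] + \Omega(f_t) + a, \\
  g_i &= \frac{\partial\, o\!\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial\, \hat{y}_i^{(t-1)}}, \qquad
  h_i = \frac{\partial^{2} o\!\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^{2}}, \\
  \text{square loss:}\quad g_i &= 2\left(\hat{y}_i^{(t-1)} - y_i\right), \qquad h_i = 2 .
\end{align}
```

Because $o(y_i, \hat{y}_i^{(t-1)})$ is fixed once Round $t-1$ is complete, it can be dropped together with $a$ when the objective is minimized over $f_t$.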
Model complexity includes two parts: the L2 regularization of the leaf node scores and the total number $T$ of leaf nodes. The model complexity $\Omega(f_t)$ is obtained from the tree definition by combining these two parts. L2 regularization smooths the leaf node scores and thus mitigates overfitting. After the complexity term is added, the objective function contains two kinds of summation, one over samples and one over leaf nodes; they are unified by defining $I_j = \{i \mid q(x_i) = j\}$, the set of samples in leaf node $j$. Adding the complexity to the objective function gives the final objective function, that is, the credit risk identification model of Internet financial enterprises. This model is used to complete risk identification.
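The role of $g_i$, $h_i$, and the sample sets $I_j$ can be illustrated with the standard XGBoost closed-form results, in which the optimal score of leaf $j$ is $-G_j / (H_j + \lambda)$ and the resulting objective is $-\tfrac{1}{2} \sum_j G_j^2 / (H_j + \lambda) + \gamma T$, with $G_j$ and $H_j$ the sums of $g_i$ and $h_i$ over $I_j$. The sketch below is illustrative only: the function names are hypothetical, `lam` and `gamma` are the usual XGBoost regularization symbols rather than symbols from the text (`gamma` plays the role of the leaf node penalty), and the square loss is assumed to be $(y - \hat{y})^2$.

```python
def square_loss_grad_hess(y_true, y_prev):
    """First and second derivatives of (y - y_hat)^2 at the previous output."""
    g = [2.0 * (p - y) for y, p in zip(y_true, y_prev)]
    h = [2.0 for _ in y_true]
    return g, h

def leaf_weight(g, h, lam):
    """Optimal score of one leaf: -G / (H + lam)."""
    return -sum(g) / (sum(h) + lam)

def structure_score(leaves, g, h, lam, gamma):
    """Objective value of a tree whose leaves are given as index sets I_j.

    Lower is better: -1/2 * sum_j G_j^2 / (H_j + lam) + gamma * T.
    """
    score = 0.0
    for indices in leaves:
        G = sum(g[i] for i in indices)
        H = sum(h[i] for i in indices)
        score += -0.5 * G * G / (H + lam)
    return score + gamma * len(leaves)

# Hypothetical targets with an initial prediction of zero for every sample.
y = [1.0, 2.0, 3.0]
g, h = square_loss_grad_hess(y, [0.0, 0.0, 0.0])
single_leaf = structure_score([[0, 1, 2]], g, h, lam=1.0, gamma=0.0)
two_leaves = structure_score([[0], [1, 2]], g, h, lam=1.0, gamma=0.0)
```

With `gamma` set to zero, the two-leaf tree attains a lower objective than the single leaf, so the split is worthwhile; a sufficiently large `gamma` reverses that decision, which is how the leaf node penalty limits overfitting.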

Experimental Results and Analysis
To examine the credit risk evaluation ability of the proposed credit risk identification model of Internet financial enterprises, help the enterprises to avoid the risk of electricity arrears, and provide a basis for urging the payment of electricity fees, five Internet financial enterprises in a certain city are selected as experimental objects, and the annual reports of these five companies for the most recent three years are selected as experimental data samples for the experimental analysis.
Mobile Information Systems

Accuracy Test of Credit Risk Assessment with Different Methods.
To evaluate the credit risk of the five selected Internet financial enterprises, the proposed method is used along with the methods given in [3,4], and the results of these methods are compared with the actual credit risk of the five companies. The experimental results are shown in Figure 4.
It can be seen that the credit risk grade scores of the five enterprises evaluated by the proposed method are closer to their actual credit risk grade scores. This is because the proposed method effectively combines the actual and objective market transaction data of the five companies to obtain the credit status of the enterprises. In addition, the proposed method accurately obtains detailed information on the credit risk of the enterprises and establishes an accurate credit risk evaluation index system, which makes the results of credit risk evaluation more scientific and accurate. The comparison results for the credit risk assessment accuracy of the different methods are shown in Figure 5.
Figures 4 and 5 show that the proposed method can effectively measure and evaluate all credit risk indicators and obtain the credit risk of enterprises. When the number of samples reaches 20000, the credit risk evaluation accuracy of the methods in references [3] and [4] is 0.50% and 0.30%, respectively, whereas that of the proposed method reaches 0.92%.

Kappa Coefficient and ROC Curve Test of Different Methods.
To verify the recognition accuracy of all three methods, the kappa coefficient and ROC curve are used. The kappa coefficient measures the agreement between the predicted result and the real result. The kappa coefficient $K$ can be computed as
$$K = \frac{p_o - p_e}{1 - p_e}$$
where $p_o$ represents the proportion of correctly identified samples in the total number of samples and $p_e$ is the proportion of agreement expected by chance. The higher the kappa coefficient $K$, the more accurate the recognition result of the method. The kappa coefficients of the proposed method, the method in reference [3], and the method in reference [4] are shown in Table 1.
It is evident that the kappa coefficients of the proposed method are higher than those of references [3] and [4] over multiple iterations, which indicates that the proposed method can accurately identify the credit risks of Internet financial enterprises. This is because the method constructs a risk identification index system based on data with high balance and completes the identification of credit risks of Internet financial enterprises based on the high-precision risk identification indexes.
Figure legend: method of reference [3], method in this paper, method of reference [4], actual credit risk.
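As an illustration of how the kappa coefficient is obtained from identification results, the following sketch computes $K = (p_o - p_e) / (1 - p_e)$ from a confusion matrix. The function name and the example counts are hypothetical and are not the experimental data of this paper.

```python
def cohen_kappa(confusion):
    """Kappa coefficient from a square confusion matrix.

    confusion[i][j] is the number of samples whose real class is i and
    whose predicted class is j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    # p_o: proportion of correctly identified samples (the diagonal).
    p_o = sum(confusion[i][i] for i in range(k)) / n
    # p_e: agreement expected by chance, from the row and column totals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical two-class result: 35 of 50 samples identified correctly.
kappa = cohen_kappa([[20, 5], [10, 15]])
```

Here $p_o = 0.7$ and $p_e = 0.5$, so the kappa coefficient is 0.4; a value of 1 would indicate perfect agreement and a value of 0 agreement no better than chance.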
In the ROC curve, the abscissa is the false-positive rate and the ordinate is the true-positive rate. The larger the area enclosed by the ROC curve and the abscissa, the higher the recognition accuracy of the method. The proposed method and the methods given in references [3] and [4] are used to identify the credit risks of different Internet financial enterprises, and the obtained ROC curves are shown in Figure 6. Figure 6 shows that the area enclosed by the ROC curve of the proposed method and the abscissa is larger than the corresponding areas for the methods proposed in [3,4], indicating that the proposed method is more accurate and can accurately identify the credit risks of Internet financial enterprises.
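The area enclosed by the ROC curve and the abscissa can be computed as sketched below. The function names, labels, and scores are hypothetical; the sketch assumes binary labels (1 for a risky enterprise, 0 otherwise) and a score for which larger values indicate higher risk.

```python
def roc_points(labels, scores):
    """ROC curve as (false-positive rate, true-positive rate) points."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores are admitted together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoidal area between the curve and the abscissa."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical labels and risk scores for four enterprises.
curve = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
auc = area_under_curve(curve)
```

For these four samples the area is 0.75. An area of 1 corresponds to perfect separation of the two classes and an area of 0.5 to random guessing.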

Conclusion
The rapid development of Internet finance provides new financing channels for small and microenterprises and individual entrepreneurship. Conventional methods of credit risk prediction for Internet financial enterprises cannot capture the regional division characteristics of credit risk, leading to large errors in the results of credit risk identification. In this study, a new method of credit risk identification based on big data for Internet financial enterprises is proposed. The risk evaluation steps of Internet financial enterprises are studied, and the weights of the assessment indicators are calculated using the improved AHP method. The linear weighted synthesis method is employed to comprehensively assess the credit of clients. Based on the regional division characteristics of big data credit risk, the credit risk is detected with a rule-based matching method. The XGBoost supervised machine learning algorithm is used to develop a credit risk prediction model of Internet financial enterprises. The performance of the model is evaluated with the kappa coefficient and ROC curve. Experimental results show that the proposed method can correctly assess the credit risk of Internet financial enterprises.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The author declares that there are no conflicts of interest.