Construction of Credit Evaluation Index System for Two-Stage Bayesian Discrimination: An Empirical Analysis of Small Chinese Enterprises

In China, small enterprises contribute directly to economic growth, but they have difficulty financing their development. To address this problem, this paper constructs a small business credit evaluation index system using a two-stage Bayesian discriminant model. In the first stage, customers are distinguished by whether they are in default; in the second stage, defaulting customers are divided into those with a high default loss rate and those with a low default loss rate. The literature to date has identified credit indicators only for the first stage; the credit evaluation index system proposed here is based on both stages, which makes it more sensitive. We then conduct an empirical analysis using credit data on 3,111 small enterprises in China with a two-stage nonparametric Bayesian discriminant model and a two-stage parametric Bayesian discriminant model, and we test the two indicator systems with discriminant accuracy and an ROC curve. The discriminant accuracy of the established index systems is 77.95% and 70.95%, respectively, and their prediction accuracy (AUC) is 0.902 and 0.866, respectively, which shows that the constructed indicator systems are robust and effective. Finally, we conduct a comparative analysis of the discriminant accuracy of three models, finding that the two-stage nonparametric model is optimal, the two-stage logistic regression model is suboptimal, and the two-stage parametric model performs worst.


Introduction
Small businesses are among the most active parts of the Chinese economy, but their development has been limited to some extent by difficulty in obtaining financing. To reduce risk, commercial banks in China often extend loans against mortgage guarantees. The financial systems of small enterprises are not standardized, their financial information is incomplete, and it is difficult for them to provide mortgage guarantees. For these and other reasons, they face problems in obtaining bank loans. Many papers on small business credit evaluation address the problems with small business loans [1,2]. At the same time, the issue facing banks is how to determine which factors explain the credit status of small enterprises. The credit indicators for small enterprises are so numerous that a problem of information duplication arises. They need a way to evaluate enterprise credit that is both more sensitive and less quantitatively intensive. Therefore, this paper proposes a two-stage discriminant method for the selection of credit evaluation indicators to construct a scientific and complete indicator system that can distinguish the default state of small enterprises. In addition, we believe that the two-stage credit evaluation index screening model can build a more sensitive credit evaluation index system, which can provide a reference for banks conducting a scientific credit evaluation of small enterprises.
Most existing systems for evaluating credit discriminate between these factors based only on whether enterprises are in default [3,4], rather than on the level of loss from that default. Credit index discrimination based only on whether default occurs cannot fully reflect changes in the default loss. Different default losses, such as partial default and total default, can also significantly affect the credit status of small enterprises. However, most studies ignore the different credit characteristics of high default loss and low default loss. Moreover, the indicators in the credit evaluation index system must be those that can identify different credit states and have high discriminability. Based on the construction principle of the credit evaluation index system, we extract the information that can identify the credit status from the perspective of two stages and then select the indicators. Therefore, in this paper, we propose a system for evaluating credit with two stages of discrimination. Two-stage credit evaluation index discrimination means that customers are divided into default and nondefault in the first stage of discrimination, and the defaulting customers are further divided into customers with a high default loss rate and customers with a low default loss rate in the second stage of discrimination. The credit evaluation index system is more sensitive because it uses a broader selection of credit indicators, and it is of great economic significance for the future selection of credit indicators and construction of credit evaluation index systems. To construct this system for evaluating small business credit, we use nonparametric and parametric Bayesian discrimination and propose combination screening based on Bayesian discrimination and cluster analysis, which can reduce the number of indicators that need to be included, thereby creating a more feasible evaluation method.
Our sample comprises credit data on 3,111 Chinese small enterprises. We test the models with discriminant accuracy and an ROC curve. The discriminant accuracy of the index systems constructed by the two-stage nonparametric Bayesian discriminant model and the two-stage parametric Bayesian discriminant model is 77.95% and 70.95%, respectively, and their prediction accuracy (AUC) is 0.902 and 0.866, respectively, which shows that the constructed index systems are robust and effective. A comparative analysis of the discriminant accuracy of three models then shows that the two-stage nonparametric model is optimal, the two-stage logistic regression model is suboptimal, and the two-stage parametric model performs worst. The two-stage nonparametric Bayesian discriminant model therefore has stronger sensitivity and a higher ability to discriminate default, can be applied in practice, and opens up a new two-stage discriminant approach to constructing credit evaluation index systems. Moreover, for the massive numbers of indicators found in big data, this study can build a more sensitive indicator system with a more compact number of indicators, which has very good application value for the selection of credit indicators. This is a point that needs special emphasis.
Small enterprises are of key importance in the Chinese economy, and traditional systems used for credit evaluation are typically based on the international 5C principles (character, capital, capacity, collateral, and business condition). Standard & Poor's credit rating is mainly based on business and financial indicators. The US credit rating agency Moody's evaluates enterprises based on their capital structure, sales growth, and other aspects. The Fitch rating agency mainly evaluates credit for enterprises based on their structure, corporate profitability, and corporate strategy. When the Industrial and Commercial Bank of China (ICBC) evaluates the creditworthiness of enterprises, it considers shareholders, economic conditions, development prospects, solvency, and other aspects. The China Construction Bank (CCB) uses a credit evaluation system that examines a small enterprise's financial risk, account behavior, operating environment, operating status, development potential, number of personnel, credit standing, and other indicators. The ratings issued by Standard & Poor's, Moody's, and Fitch mainly cover bonds, national sovereigns, and listed companies. Therefore, the credit evaluation index systems they adopt are applicable to large and medium-sized enterprises but not appropriate for small enterprises in China, which have imperfect financial information.
In addition, many papers have been written on the evaluation of small business credit. They mainly used parametric methods to evaluate credit for small enterprises. [8]. Karminsky and Khromova (2016) constructed a representative variable scale with a potential impact on scoring, based on a Bankscope database containing financial information on international banks from 1996 to 2011. Their ordered probit model leads to the conclusion that macrovariables improve the explanatory power of the model [9]. Lv et al. (2017) used a model combining single-factor analysis and logistic regression not only to select the indicators that have a significant influence on personal credit but also to calculate the degree of influence of specific indicators on the borrower's own credit status, that is, the weight of each index [10]. Dyatchkova et al. (2018) studied the relationship between the credit ratings of BRICS industrial enterprises and financial indicators and evaluated Tobit regression models for the selected group to establish credit rating models for BRICS industrial companies [11]. Louzada et al. (2018) proposed a survival credit risk model that jointly accommodates three default times in bank loan portfolios and adopted a maximum likelihood procedure for parameter estimation and Monte Carlo simulation to evaluate its finite-sample performance [12]. Dai et al. (2018) proposed a personal credit assessment method based on partial least squares for the current credit problems of commercial banks and tested it on German credit data. The study showed that this method was simple, feasible, and effective [13]. Zhang and Chi (2018) introduced multiobjective planning to establish a credit rating model and conducted an empirical analysis based on data for 6,155 enterprises. The results showed that this method could ensure a balance between the two criteria, avoid excessive concentration of debtors in a specific rating, and contribute to the establishment of a reasonable credit rating system [14].
Later, considering the limitations of parametric methods, researchers gradually advanced to nonparametric and artificial intelligence methods. Traczynski (2017) used the Bayesian model averaging method to screen default prediction indicators and established a default prediction indicator system, including indicators such as the ratio of total liabilities to total assets and the volatility of market returns. The advantage of this screening method is that aggregating information across models, or across effects on specific industries, performs better than a single model [15]. Bou-Hamad (2017) proposed a comprehensive framework based on the random forest (RF) method and Bayesian model averaging (BMA) to study the importance of ordinal variables in credit risk assessment and default prediction [16]. Sun et al. (2018) proposed a new decision tree (DT) ensemble model for imbalanced enterprise credit evaluation based on the synthetic minority oversampling technique (SMOTE) and the Bagging ensemble learning algorithm with differentiated sampling rates (DSR), which is called DTE-SBD. The results showed that the DTE-SBD model is significantly better than five other models, including pure DT and oversampling DT, and has a positive effect on imbalanced enterprise credit evaluation [17]. Shi et al. (2018) developed a new attribute reduction method and applied it to the evaluation of small business financing ability [18]. Du (2018) used a genetic algorithm to optimize the connection weights and thresholds of the backpropagation (BP) neural network, which solved problems such as the slow convergence of the BP neural network. The research showed that the genetic neural network method could be used in enterprise credit rating [19]. Hsu et al. (2018) proposed a new classification model based on a bio-inspired computing mechanism by combining the artificial bee colony (ABC) method with support vector machine (SVM) technology, using an actual ten-year dataset extracted from the Compustat credit rating database, so as to improve the forecasting of credit ratings and credit rating changes [20]. Bai et al. (2019) proposed a new bank creditworthiness assessment method combining fuzzy rough sets and fuzzy c-means clustering, using rule-based results to predict farmers' reputations. The results show that education and skills are key factors in improving farmers' reputations [21]. There are also concerns about the impact of default losses on credit ratings; Shi et al. (2019) optimized the credit rating and provided guidance on the mismatch between credit ratings and loss given default (LGD) in the existing credit rating literature [22]. They divided the credit ratings of enterprises mainly by the default loss rate but did not apply the default loss rate to the construction of the enterprise credit evaluation index system. The existing literature focuses on the parametric screening of indicators; few studies have screened credit evaluation indicators using nonparametric methods, and even fewer have done so under a comparative analysis of parametric and nonparametric methods. However, in other disciplines, nonparametric methods have achieved remarkable results: Abad and Briec (2019) analyzed the concept of pollution-generating technologies, introduced the B-disposal assumption, and illustrated the new assumption for convex and nonconvex nonparametric technologies [23].
In the existing research on discrimination in a credit evaluation index, it is a disadvantage to discriminate only whether a customer defaults [14-16,21]. On the one hand, discrimination based only on a change in default status does not reflect changes in default loss; distinguishing between different types of default loss, such as partial default and total default, reflects the credit status of small enterprises more accurately. We consider the impact of different default losses on credit evaluation index screening, which is the main contribution of this study. On the other hand, when the number of candidate indicators is massive, existing studies select many indicators, some of which cannot significantly distinguish the default status of small enterprises and therefore do not fully reflect their credit status. To solve this problem, we conduct the first-stage indicator screening based on the default state and then conduct the second-stage indicator screening based on the default loss rate, which is the second contribution of this study. Moreover, most of the existing studies are based on parametric methods [5-14]; however, most of the evaluation indicators do not follow a normal distribution, and their distribution is unknown.
Therefore, nonparametric methods are adopted in this study to screen the indicators, which is the third contribution of this study. Based on the above, we construct two complete credit evaluation index systems for small enterprises based on a two-stage nonparametric Bayesian discriminant model and a two-stage parametric Bayesian discriminant model and determine how to construct an index system with greater ability to identify the default loss rate by comparing changes in discriminant accuracy after screening the indicators. The article is organized as follows. Section 2 discusses the principles on which our models are constructed and explains the methodology used in the construction of the models, Section 3 details how the sample data are screened for use in the models, Section 4 compares our model to the traditional model, and Section 5 offers our conclusions.

Two-Stage Bayesian Discriminant
Principle.
By constructing two-class Bayesian discriminant models, first for default and nondefault customers and then for high default loss and low default loss customers, we can determine the influence of each index on discriminant accuracy. Then, the identification ability of the nonparametric and parametric Bayesian models on all samples is tested by comparing their accuracy in discriminating the default loss rate of the enterprise. The construction and comparison principles of the credit evaluation index system for small enterprises are illustrated in Figure 1.

Two-Stage Nonparametric Bayesian Discriminant Method.
(1) One-Stage Screening between Default and Nondefault. $P(G_i \mid x)$ is the posterior probability that the sample comes from population $G_i$; $G_1$ is the population of default enterprises; $G_2$ is the population of nondefault enterprises; $x$ is the sample to be determined; $p_i$ is the prior probability that the sample comes from $G_i$; $f_i(x)$ is the kernel density function of $G_i$. The Bayesian discriminant function is as follows [24]:

$$P(G_i \mid x) = \frac{p_i f_i(x)}{\sum_{j=1}^{2} p_j f_j(x)}, \quad i = 1, 2, \tag{1}$$

Mathematical Problems in Engineering
in which the posterior probability that the sample comes from $G_i$ equals the product of the prior probability of $G_i$ and the kernel density function of $G_i$, divided by the sum over both populations of the product of each population's prior probability and kernel density function. $n_i$ is the number of samples in $G_i$; then, the prior probability that the sample comes from $G_i$ is as follows [24]:

$$p_i = \frac{n_i}{n_1 + n_2}, \tag{2}$$

in which the prior probability that the sample comes from $G_i$ equals the ratio of the number of samples in $G_i$ to the total number of samples. $h_n$ is the window width; $K(\cdot)$ is the kernel function; $X_{ij}$ is the $j$th sample in $G_i$; then, the kernel density function of $G_i$ is as follows [25]:

$$f_i(x) = \frac{1}{n_i h_n} \sum_{j=1}^{n_i} K\left(\frac{x - X_{ij}}{h_n}\right). \tag{3}$$

Using this equation, with the data of the known sample, the selected kernel function, and the bandwidth, we can estimate the distribution density function of the population, and we can use a cross-validation method to obtain a reasonable window width from the existing data without making any assumptions about the estimated density function.
By substituting the results calculated by equations (2)-(3) into (1), we can obtain the posterior probability that a sample comes from each population, and the rule for judging which population the sample comes from is as follows [26]:

$$x \in G_1 \ \text{if} \ P(G_1 \mid x) > P(G_2 \mid x); \qquad x \in G_2 \ \text{if} \ P(G_1 \mid x) < P(G_2 \mid x), \tag{4}$$

in which, if $P(G_1 \mid x) > P(G_2 \mid x)$, the probability that the sample comes from $G_1$ is greater than the probability that it comes from $G_2$, and the sample to be determined is a default sample. If $P(G_1 \mid x) < P(G_2 \mid x)$, the probability that the sample comes from $G_1$ is less than the probability that it comes from $G_2$, so the sample to be determined is a nondefault sample. $D$ is the number of actual default samples that the nonparametric Bayesian discriminant judges to be default; $n_1$ is the number of actual default samples. $U$ is the number of actual nondefault samples that the nonparametric Bayesian discriminant judges to be nondefault; $n_2$ is the number of actual nondefault samples. $M$ is the discriminant accuracy of all samples. The equation is as follows [27]:

$$M = \frac{1}{2}\left(\frac{D}{n_1} + \frac{U}{n_2}\right), \tag{5}$$

in which the discriminant accuracy of all samples equals the arithmetic average of the discriminant accuracy of default samples and that of nondefault samples. The greater the discriminant accuracy of all samples, the better the discriminant effect of the indicator system.
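Equations (1)-(5) can be illustrated with a minimal sketch. It is not the SAS procedure used in the paper: it assumes a single indicator, a Gaussian kernel, and a fixed bandwidth `h` instead of a cross-validated one, and all function names are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard normal kernel K(u); the Gaussian choice is an assumption."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_density(x, sample, h):
    """Equation (3): f_i(x) = 1 / (n_i * h) * sum_j K((x - X_ij) / h)."""
    return sum(gaussian_kernel((x - xj) / h) for xj in sample) / (len(sample) * h)


def posterior_default(x, default, nondefault, h):
    """Equations (1)-(2): posterior probability that x belongs to G_1."""
    n1, n2 = len(default), len(nondefault)
    p1, p2 = n1 / (n1 + n2), n2 / (n1 + n2)  # priors from sample shares
    a = p1 * kernel_density(x, default, h)
    b = p2 * kernel_density(x, nondefault, h)
    total = a + b
    # Fall back to the prior if both densities underflow to zero.
    return a / total if total > 0 else p1


def overall_accuracy(default, nondefault, h):
    """Equation (5): average of the two class-level accuracies.

    Rule (4) is applied as P(G_1 | x) > 0.5, which is equivalent to
    P(G_1 | x) > P(G_2 | x) for two populations.
    """
    d = sum(posterior_default(x, default, nondefault, h) > 0.5 for x in default)
    u = sum(posterior_default(x, default, nondefault, h) < 0.5 for x in nondefault)
    return 0.5 * (d / len(default) + u / len(nondefault))
```

Because the two class accuracies are averaged with equal weight, the small default class counts as much as the large nondefault class, which matters for a sample with 71 defaults among 3,111 enterprises.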
Step 1. The normalized data on $n$ indicators of all samples are substituted into equations (1)-(5) so that the discriminant accuracy $M_0$ of the $n$ indicators can be obtained.
Step 2. The first indicator is deleted, and the remaining $n - 1$ indicators are substituted into equations (1)-(5) so that the discriminant accuracy $M_1$ of the $n - 1$ indicators can be obtained.
Step 3. Each of the indicators is deleted in turn, one at a time, to obtain the discriminant accuracy $M_i$ of the remaining $n - 1$ indicators.
Step 4. $C_i$ is the degree of influence of the $i$th index on the discriminant accuracy of the default state, and the equation is as follows:

$$C_i = M_i - M_0, \tag{6}$$

in which the degree of influence of the $i$th index on the discriminant accuracy is the difference between the discriminant accuracy after deleting the $i$th index and the discriminant accuracy of all indicators, which reflects the importance of the $i$th index to discriminant accuracy in the indicator system.
Step 5. $C_i$ is calculated for all the indicators, and the retention or deletion of each indicator is determined by the sign of $C_i$. If $C_i > 0$, that is, the discriminant accuracy after deleting the $i$th index is greater than that of all indicators, then the discriminant accuracy of the index system is improved by its deletion, so it should be deleted. If $C_i = 0$, that is, the discriminant accuracy after deleting the $i$th index equals the discriminant accuracy of all indicators, then the deletion of this index has no influence on the discriminant accuracy of the index system, so this index should be deleted. If $C_i < 0$, that is, the discriminant accuracy after deleting the $i$th index is less than that of all indicators, then the discriminant accuracy of the index system is lower after this index is deleted, so this index should be retained.
Step 6. $Y_i$ is the proportion of the $i$th index in the total degree of influence on discriminant accuracy; $|C_i|$ is the absolute value of the degree of influence of the $i$th index on discriminant accuracy; the sums run over the indicators for which $C_i$ is less than 0; and $Y_{(k)}$ is the cumulative proportion of the first $k$ of these indicators. The equations are as follows:

$$Y_i = \frac{|C_i|}{\sum_{j:\, C_j < 0} |C_j|}, \qquad Y_{(k)} = \sum_{i=1}^{k} Y_i,$$

where we select the indicators according to the criterion that the cumulative proportion of the degree of influence on the discriminant accuracy is greater than 95%.
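Steps 1-6 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the callable `accuracy` stands in for equations (1)-(5), the function and variable names are hypothetical, and ranking the retained candidates by descending $|C_i|$ before accumulating the proportions is an assumption about the ordering.

```python
def screen_indicators(indicators, accuracy, threshold=0.95):
    """Leave-one-out screening of indicators by their influence on accuracy.

    `accuracy` maps a list of indicator names to the overall accuracy M.
    Returns the retained indicators and the influence C_i of every indicator.
    """
    m0 = accuracy(indicators)  # Step 1: M_0 with all n indicators
    influence = {}
    for name in indicators:  # Steps 2-3: delete each indicator in turn
        rest = [k for k in indicators if k != name]
        influence[name] = accuracy(rest) - m0  # Step 4: C_i = M_i - M_0

    # Step 5: only indicators whose deletion lowers accuracy (C_i < 0) survive.
    # Assumption: candidates are ranked from largest to smallest |C_i|.
    candidates = sorted(
        (k for k in indicators if influence[k] < 0),
        key=lambda k: influence[k],
    )

    # Step 6: accumulate Y_i = |C_i| / sum |C_j| until the threshold is reached.
    total = sum(abs(influence[k]) for k in candidates)
    kept, cumulative = [], 0.0
    for name in candidates:
        kept.append(name)
        cumulative += abs(influence[name]) / total
        if cumulative >= threshold:
            break
    return kept, influence
```

With $n$ indicators the procedure evaluates the discriminant model $n + 1$ times per criterion layer, which is why the screening is applied layer by layer rather than to all 81 indicators at once.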
(2) Two-Stage Screening between High Default and Low Default.
The nonparametric clustering of the indicators retained after the first-stage screening is carried out within the same criterion layer, and the 71 default samples are divided into subsamples with a high default loss rate and a low default loss rate.
In nonparametric clustering, a class is defined by a mode of the probability density function. $x_t$ is the $t$th index; $n$ is the number of indicators; $n_t$ is the number of indicators in the neighborhood of $x_t$; $v_t$ is the volume of the neighborhood of $x_t$, in which the sphere centered at index $x_t$ is called the neighborhood of $x_t$, and an index in the neighborhood of $x_t$ is called an adjacent index of $x_t$. The equation is as follows [28]:

$$\hat{f}(x_t) = \frac{n_t}{n\, v_t},$$

in which the estimated value of the probability density is the number of indicators contained in the sphere centered at this point divided by the product of the total number of indicators and the volume of the sphere.
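The neighborhood density estimate can be illustrated in one dimension, where the "sphere" of radius `radius` is an interval of length `2 * radius`. This is a sketch under that assumption; counting the point itself as one of its own neighbors is also an assumption, and the function name is hypothetical.

```python
def neighborhood_density(points, t, radius):
    """Estimate f(x_t) = n_t / (n * v_t) for one-dimensional points.

    n_t counts the points within `radius` of points[t] (including itself),
    and v_t = 2 * radius is the length of that interval.
    """
    n = len(points)
    n_t = sum(abs(p - points[t]) <= radius for p in points)
    return n_t / (n * 2.0 * radius)
```

Points in crowded regions receive a high estimated density, so the modes of this estimate mark the cluster centers.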

Two-Stage Parametric Bayesian Discriminant Method.
(1) One-Stage Screening for Default and Nondefault. $P(G_i \mid x)$ is the posterior probability that the sample comes from the $i$th population; $G_1$ is the population of default enterprises; $G_2$ is the population of nondefault enterprises; $x$ is the sample to be determined; $p_i$ is the prior probability that the sample comes from the $i$th population; $f_i(x)$ is the density function of the $i$th population. The Bayesian discriminant function is calculated exactly as in equation (1), and the prior probability is calculated as above.
Each population follows a normal distribution with mean $\mu_i$ and covariance matrix $\Sigma_i$; with $n$ the number of indicators, the density function of the $i$th population is as follows [29]:

$$f_i(x) = (2\pi)^{-n/2}\, |\Sigma_i|^{-1/2} \exp\left(-\frac{1}{2}(x - \mu_i)^{\mathsf{T}} \Sigma_i^{-1} (x - \mu_i)\right).$$

The determination of the Bayesian discriminant rule, the measurement of discriminant accuracy, and the specific steps in Bayesian discriminant index screening are the same as in the nonparametric method described earlier.
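The parametric variant differs from the nonparametric one only in how $f_i(x)$ is obtained. A minimal univariate sketch, assuming a single indicator and using the standard library's `statistics.NormalDist` to fit the mean and standard deviation of each population (the function name is hypothetical):

```python
from statistics import NormalDist


def gaussian_posterior_default(x, default, nondefault):
    """Posterior probability of G_1 with normal densities fitted per class."""
    n1, n2 = len(default), len(nondefault)
    p1, p2 = n1 / (n1 + n2), n2 / (n1 + n2)  # priors from sample shares
    # Each class needs at least two observations to estimate a variance.
    f1 = NormalDist.from_samples(default).pdf(x)
    f2 = NormalDist.from_samples(nondefault).pdf(x)
    total = p1 * f1 + p2 * f2
    # Fall back to the prior if both densities underflow to zero.
    return p1 * f1 / total if total > 0 else p1
```

If an indicator is far from normally distributed, the fitted density can misrepresent where each class concentrates, which is the weakness of the parametric method that the paper's comparison highlights.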
(2) Two-Stage Screening between High Default and Low Default. Sample ordered clustering is carried out within the same criterion layer for the indicators retained after the first stage screening, and 71 default samples are divided into subsamples with a high default loss rate and a low default loss rate.
Sample ordered clustering is based on the sum of the squares of deviations. $S$ is the total sum of the squares of the deviations over the $k$ categories, $m_i$ is the number of indicators in the $i$th category, $X_{ij}$ is the standardized data of the $j$th index in the $i$th category, $\bar{X}_i$ is the mean of the indicators in the $i$th category, and the equation for $S$ is as follows [30]:

$$S = \sum_{i=1}^{k} \sum_{j=1}^{m_i} \left(X_{ij} - \bar{X}_i\right)^2. \tag{10}$$

In equation (10), the best combination is determined by calculating the sum of the squares of the total deviations; that is, we should find the partition that minimizes equation (10). The determination of the Bayesian discriminant rules, the measurement of discriminant accuracy, and the specific steps in Bayesian discriminant index screening are the same as the screening methods for default and nondefault described earlier. The results of Bayesian discrimination and clustering can be obtained with SAS V9 software.
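For two categories, minimizing equation (10) over an ordered sequence reduces to trying every cut point. The sketch below assumes the values are already ordered (for example, by default loss rate) and that exactly two groups are wanted; the function names are hypothetical.

```python
def within_ss(values):
    """Sum of squared deviations of `values` from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_two_way_split(ordered):
    """Return (cut, S) so that ordered[:cut] and ordered[cut:] minimize S.

    The order of the input is preserved: only contiguous groups are tried.
    """
    cut = min(
        range(1, len(ordered)),
        key=lambda c: within_ss(ordered[:c]) + within_ss(ordered[c:]),
    )
    return cut, within_ss(ordered[:cut]) + within_ss(ordered[cut:])
```

Restricting the search to contiguous groups is what distinguishes ordered clustering from ordinary clustering, and it keeps the search linear in the number of samples for a two-way split.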

ROC Curve for Validity Test of the Index System.
The purpose is to test the validity of parametric Bayesian discrimination and nonparametric Bayesian discrimination for index screening through the area under the curve (AUC) value of the receiver operating characteristic (ROC) curve. Because of the complexity of the calculation process, it can be obtained with SPSS 22 software. The samples correctly judged as high default ($y_j = 1$) are recorded as TP (true positive); the high default samples misjudged as low default are denoted as FN (false negative); the samples correctly judged as low default ($y_j = 0$) are denoted as TN (true negative); the low default samples misjudged as high default are denoted as FP (false positive). Two indicators are required for ROC curve mapping, sensitivity and specificity, which are, respectively, as follows:

$$\text{sensitivity} = \frac{TP}{TP + FN}, \qquad \text{specificity} = \frac{TN}{TN + FP}.$$
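The quantities behind the ROC curve can be computed directly. The following sketch is not the SPSS procedure used in the paper; it assumes that a higher score means "more likely high default", and it computes the AUC through its rank interpretation (the probability that a randomly chosen high default sample scores above a randomly chosen low default sample, with ties counted as one half). Function names are hypothetical.

```python
def sensitivity_specificity(labels, scores, cutoff):
    """Sensitivity TP / (TP + FN) and specificity TN / (TN + FP) at a cutoff."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutoff)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))
```

Sweeping `cutoff` over the observed scores and plotting sensitivity against 1 − specificity traces the ROC curve itself.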

Samples and Data Sources.
The empirical sample in the paper consists of data on loans for 3,111 small enterprises from the database of a Chinese commercial bank, of which 71 are in default and 3,040 are not. Based on the small business credit evaluation systems constructed by foreign financial institutions, such as S&P and Moody's, as well as Chinese financial institutions such as the ICBC and CCB, we select a total of 107 debt rating indicators divided into two primary criterion layers (repayment ability and repayment willingness) and seven secondary criterion layers (internal financial factors and nonfinancial factors). The internal financial factors comprise four criteria (solvency, profitability, operational capacity, and growth capacity), each of which has three levels. Because of the unavailability of some data, 26 indicators, such as economic environment and national policies, are excluded from the initial 107, leaving a total of 81 indicators, as shown in column 1 of rows 1 to 81 in Table 1.

Standardization of Index Data.
We obtain standardized scores for each indicator according to standardized scoring methods for different indicators [31], and then, insert standardized data into the relevant rows in Table 1.

One-Stage Screening Method between Default and Nondefault.
Taking the "solvency" three-level criterion layer as an example, this paper illustrates the specific process of nonparametric Bayesian discriminant screening to identify the indicators that distinguish enterprise default status. The 20 indicators of "solvency" in the three-level criterion layer are in column 1 of Table 2.
Using the nonparametric Bayesian discriminant method, we can obtain the default sample discrimination accuracy $M_{a0}$, the nondefault sample discrimination accuracy $M_{b0}$, and the all-sample discrimination accuracy $M_0$ of the 20 indicators. Finally, we calculate the degree of influence $C_i$ of each index on the discrimination accuracy and select the indicators according to $C_i$. The screening results are shown in Table 2.
We repeat this screening process for each criterion layer and thereby obtain the screening results of all indicators. The first stage of nonparametric Bayesian discrimination screens 81 indicators, of which 36 are retained and 45 are deleted.

Two-Stage Screening Method for High Default and Low Default.
For the 36 indicators retained after the first-stage screening, nonparametric clustering is carried out within the same criterion layer, and the 71 default samples are divided into subsamples with a high default loss rate and a low default loss rate. The second-stage screening is then carried out through nonparametric Bayesian discrimination, and the indicators that can distinguish between a high and a low default loss rate are selected. The nonparametric clustering results show that 50 of the default samples have a high default loss rate and 21 have a low default loss rate. Because the calculation process is complex, we use SAS software to perform it.
In the criterion layer for solvency, eight indicators in the three-level criterion layer are retained and two are deleted. The screening results are shown in Table 3. Finally, in the second stage of nonparametric Bayesian discrimination, 36 indicators are screened, leading to the deletion of 12 indicators and the retention of 24.

Constructing a Two-Stage Parametric Bayesian Discriminant Model

One-Stage Screening Method for Default and Nondefault.
Similarly, solvency in the three-level criterion layer is used as an example. Using the parametric Bayesian discriminant method, 8 of the 20 indicators of "solvency" in the three-level criterion layer are retained, and 12 are deleted. The screening results are shown in Table 4. Finally, in the first stage of parametric Bayesian discrimination, 81 indicators are screened, of which 50 are deleted and 31 are retained.

Two-Stage Method for Screening High Default and Low Default.
Sample ordered clustering is carried out within the same criterion layer for the indicators retained after the first-stage screening, and the 71 default samples are divided into subsamples with a high default loss rate and a low default loss rate. The results show that 50 of the default samples have a high default loss rate, and the remaining 21 have a low default loss rate. The solvency criterion layer has 20 indicators, of which 8 remain after the first-stage screening. The screening results are shown in Table 5. Finally, in the second stage of parametric Bayesian discrimination, 31 indicators are screened, of which 17 are deleted and 14 are retained.
[Table row fragment: Total outstanding loans as a percentage of total assets; −0.01%, 0.01%, 0.04%, 100.00%.]
Table 3: Screening results of the second stage of nonparametric Bayesian discrimination.
Table 4: Screening results of the first stage of parametric Bayesian discrimination.
On the basis of the first stage, the nonparametric Bayesian model is used to discriminate the remaining 36 indicators, the indicators that can significantly distinguish a high default loss rate from a low default loss rate are selected, and a two-stage credit evaluation index system is constructed. The results are shown in Table 6.

Two-Stage Parametric Bayesian Discriminant Evaluation Index System.
By discriminating 81 indicators with a parametric Bayesian model, we identify the indicators that can distinguish between default and nondefault and construct a one-stage evaluation index system of default and nondefault. This index system includes 31 indicators, such as quick ratio and product sales scope.
On the basis of the first stage, the parametric Bayesian model is used to discriminate the remaining 31 indicators, and we identify the indicators that can significantly distinguish a high default loss rate from a low default loss rate and construct a two-stage credit evaluation index system. The results are shown in Table 7.

Testing Validity of the Index System.
By selecting different critical values, multiple pairs of sensitivity and specificity can be obtained, as shown in Figure 2. The ROC curve can be obtained with SPSS software.
In Figure 2, the PER-1 curve represents the identification results of the 14 indicators screened with parametric Bayesian discrimination between a high and a low default loss rate, and the PER-2 curve represents the identification results of the 24 indicators screened with nonparametric Bayesian discrimination between a high and a low default loss rate. The AUC is 0.866 under the PER-1 curve and 0.902 under the PER-2 curve. The AUC obtained with nonparametric Bayesian discrimination is greater than that obtained with parametric Bayesian discrimination. Therefore, nonparametric Bayesian discrimination is effective at distinguishing a high from a low default loss rate, and the selected index system has a strong ability to distinguish between high and low default loss rates.

Stability Test of the Index System.
We randomly select 80% of the original data as the training set and the remaining 20% as the test set and repeat this simple cross-validation three times; the verification results are shown in Table 8.
Since the training set and the test set are randomly selected, the number of samples with high and low default loss rates in the two-stage index screening is not exactly the same in each run, and the final screened index system is not completely the same. However, the mean of the three simple cross-validations shows that the discriminant accuracy for all samples is 97.22% for the nonparametric Bayesian discrimination model, 56.90% for the parametric Bayesian discrimination model, and 74.56% for the logistic regression model, which shows that the two-stage credit evaluation index screening model is stable.
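The repeated 80/20 split described here can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the `fit` and `predict` callables, the midpoint classifier, the seed, and the toy data are all hypothetical stand-ins for the discriminant models and the bank sample.

```python
import random

def holdout_split(n_samples, test_fraction, rng):
    """Shuffle the sample indices and split them into (train, test)."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

def repeated_holdout(samples, labels, fit, predict,
                     repeats=3, test_fraction=0.2, seed=0):
    """Mean test accuracy over several independent random splits."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        train_idx, test_idx = holdout_split(len(samples), test_fraction, rng)
        model = fit([samples[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        hits = sum(predict(model, samples[i]) == labels[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / repeats, accuracies

# Hypothetical stand-in classifier: threshold at the midpoint of the class means.
def fit_midpoint(xs, ys):
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2.0

def predict_midpoint(threshold, x):
    return 1 if x > threshold else 0

# Hypothetical, well-separated toy data (not the paper's sample).
samples = list(range(10)) + list(range(100, 110))
labels = [0] * 10 + [1] * 10

mean_acc, accs = repeated_holdout(samples, labels, fit_midpoint, predict_midpoint)
print(mean_acc, accs)
```

Because each repetition draws a new random split, the class counts in the training set vary from run to run, which is why the screened index system can differ between repetitions.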

Comparing the Accuracy of Two-Stage Evaluation Index System.
Table 9 shows that, after nonparametric Bayesian discriminant screening, the discriminant accuracy of the index system composed of 24 indicators is 77.95% for all samples, 94.00% for samples with a high default loss rate, and 61.90% for samples with a low default loss rate. After parametric Bayesian discriminant screening, the discriminant accuracy of the index system composed of 14 indicators is 70.95% for all samples, 80.00% for samples with a high default loss rate, and 61.90% for samples with a low default loss rate. We also compare a two-class logistic regression model with the Bayesian discriminant models. After logistic regression screening, the discriminant accuracy of the index system composed of 9 indicators is 71.57% for all samples, 86.00% for samples with a high default loss rate, and 57.14% for samples with a low default loss rate. These results indicate that the nonparametric Bayesian method improves the accuracy for all samples and is better at judging the default loss rate of small enterprises. Among the three models, the overall accuracy of the parametric method is the lowest: it assumes that the index data are normally distributed, but in reality most of the data are not. This leads to the low discriminant accuracy of the constructed credit evaluation index system, which may cause higher misjudgment losses. Therefore, the index system constructed with the nonparametric method can not only enable banks and other financial institutions to correctly assess the credit status of small enterprises and ease the financing difficulties of small enterprises but also reduce the potential losses of banks and other financial institutions.

No.  Indicator name
4    Recovery rate of all assets in cash
5    Ratio of net assets to the loan balance at the end of year
6    Cash ratio
7    Total outstanding loans as a percentage of total assets
8    Total liabilities net cash flow ratio of operating activities
9    Operating profit ratio
10   Gross profit rate
11   Cost-profit ratio
12   Net cash flow from operating activities
13   Working capital allocation ratio
14   Rate of return on investment
15   Growth rate of operating income
16   Growth rate of total assets
Internal nonfinancial factors
17   Working experience in related industry
18   Date of establishment
External environment
19   Industry sentiment index
Debtor basic information
20   Legal representative loan default record
21   Age
22   Time in that position
Business reputation
23   Legal disputes in enterprises
Collateral guarantee factor
24   Collateral score

Table 5: Screening results of the second stage of parametric Bayesian discrimination.
No.  Indicator name
4    Velocity of current assets
5    Rate of return on investment
6    Growth rate of total assets
Internal nonfinancial factors
7    Working experience in related industry
8    Product sales scope
9    Proportion of the total amount of loans collected by the enterprise through bank
External environment
10   Balance of per capita savings of urban and rural residents at the end of year
Debtor basic information
11   Marital status
Basic credit condition
12   Corporate credit granting in the past three years
Business reputation
13   Legal disputes in enterprises
Collateral guarantee factor
14   Collateral score
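The accuracy figures reported in this subsection can be checked with a few lines of arithmetic. The text does not state how the all-sample accuracy is aggregated; as an observation rather than a statement of the authors' method, each reported all-sample figure coincides with the unweighted mean of the two class accuracies. The sketch below verifies this and reproduces the ranking of the three models. The dictionary layout and function name are illustrative.

```python
# Reported accuracies in percent: (high default loss, low default loss, all samples).
reported = {
    "nonparametric Bayesian": (94.00, 61.90, 77.95),
    "parametric Bayesian": (80.00, 61.90, 70.95),
    "logistic regression": (86.00, 57.14, 71.57),
}

def unweighted_mean(high, low):
    """Mean of the two class accuracies, giving both classes equal weight."""
    return (high + low) / 2.0

for model, (high, low, overall) in reported.items():
    recomputed = unweighted_mean(high, low)
    # Assumption being checked: overall accuracy is the unweighted class mean.
    print(f"{model}: reported {overall:.2f}, recomputed {recomputed:.2f}")

# Rank the models by reported all-sample accuracy, best first.
ranking = sorted(reported, key=lambda m: reported[m][2], reverse=True)
print(ranking)
```

An unweighted mean gives the smaller class the same influence as the larger one, so it can differ from the share of correctly classified samples when the two classes are of unequal size.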

Conclusions
Using a sample of small businesses, we construct two credit evaluation index systems and determine which method is more effective in evaluating creditworthiness. In the first stage, using Bayesian discrimination, enterprises are divided into those in default and those not in default; then, using the clustering method, default customers are divided into those with a high default loss rate and those with a low default loss rate so as to build a more sensitive index system. Finally, we construct an index system composed of 24 indicators using the nonparametric Bayesian discriminant model and an index system composed of 14 indicators using the parametric Bayesian discriminant model. We confirm the effectiveness of both models with an ROC curve, showing that a more sensitive indicator system can be built.
A comparative analysis of the discriminant accuracy of the three models shows that the two-stage nonparametric model is optimal, the two-stage logistic regression model is suboptimal, and the two-stage parametric model is poor. The index system built with the nonparametric Bayesian discrimination model is therefore the best; it has strong default discrimination ability and can be applied in practice. This study still has some limitations. The credit evaluation index system constructed in this paper is based on isolated time points, which not only ignores the potential trend of change in the samples but also can make the index system inaccurate when some sample data change abruptly. Constructing the credit evaluation index system by comprehensively considering the credit status in each period is one direction for future research.

Data Availability
The empirical sample in this paper consists of data on loans for 3,111 small enterprises from the database of a Chinese commercial bank.

Conflicts of Interest
The authors declare that they have no conflicts of interest.