Prediction of Unbalanced Financial Risk Based on GRA-TOPSIS and SMOTE-CNN

The financial status of an enterprise is related to its healthy and long-term development and determines whether the interests of investors and bank loans can be guaranteed. To improve the prediction accuracy of corporate financial risk, this paper proposes a prediction model for corporate financial risk that integrates GRA-TOPSIS and SMOTE-CNN. First, GRA-TOPSIS is used to make a comprehensive evaluation of the financial situation of listed companies. Second, the evaluation results are clustered to obtain the scientific level and interval of financial risk, which lays the foundation for the supervised learning of the convolutional neural network. Then, the SMOTE algorithm is introduced to solve the problem of data imbalance among enterprises at all levels, and the focal loss function is used instead of the cross-entropy loss function to further balance the data. Finally, A-share listed companies are randomly selected, and experiments are designed to verify the performance of the model built in this paper. The results show that the prediction accuracy of the financial risk prediction model based on GRA-TOPSIS and SMOTE-CNN is 98.57%, which indicates that the model is feasible and has certain reference value.


Introduction
With the deepening of economic reform, China's economy has developed rapidly and become the second largest economy in the world. The resulting competition among enterprises is becoming more and more intense [1,2]. Finance also faces many risks and challenges [3]. Listed companies are the representatives of enterprises and the backbone of the national economy. Therefore, to maintain the sustainable and high-quality development of the country's economy, it is necessary to ensure the healthy and long-term growth of listed companies. Since corporate financial status is a direct manifestation of the achievements of enterprise development and is the focus of all stakeholders of an enterprise, including operators, corporate creditors, and investors [4], it is particularly important to evaluate and monitor it accurately.
Financial risk prediction is an early warning mechanism and real-time monitoring method established to prevent enterprises from making mistakes and facing risks [5]. The research started in the 1930s. First, Fitzpatrick proposed the univariate financial evaluation model [6], which was simple to operate and relied on a single indicator but was not accurate enough. Altman then proposed and improved the Z-score model to predict financial risk, but its long-term evaluation ability was weak [7]. Odom and Sharda introduced a neural network for corporate bankruptcy prediction, and verification found that its accuracy was better than that of other existing models [8], but the technical requirements were high at the time. Later, Shaverdi et al. used fuzzy AHP to determine the index weights and then used fuzzy TOPSIS to determine the financial level of petrochemical enterprises [9]. Deng et al. built a dynamic rating system based on DEA and the analytic hierarchy process to determine the financial status of Chinese nuclear power enterprises [10]. Chen constructed a financial performance evaluation system from four aspects (profitability, operating ability, debt payment ability, and development ability) and measured the performance level through fuzzy comprehensive evaluation [11].
With the development of enterprises and the growth of their financial data, traditional statistical methods can no longer accurately predict financial status. Some scholars began to use machine learning methods for financial risk prediction, the most widely used being the BP neural network [12-14], SVM [15,16], and decision tree [17]. For example, Zhou et al. measured and warned of the risks of real estate companies by implementing the PSO-SVM model [18]. Feng et al. constructed a corporate financial risk warning model based on a BP neural network to predict financial crises and showed that its accuracy is at least 2% higher than that of the traditional method [19]. Liao and Liu applied the decision tree method to enterprise financial risk early warning and provided a reference for risk control decision-making [20].
With the wide application of deep learning, scholars have begun to introduce it into financial research. Chen used a convolutional neural network for quantitative financial investment and obtained an investment strategy with higher accuracy and reduced investment risk [21]. Abudureheman et al. built a performance evaluation of enterprise innovation capability based on a fuzzy system model and a convolutional neural network, which was significant for promoting enterprise development [22]. Yin et al. built a convolutional neural network model for supply chain financial risk early warning, but the enterprises were divided into only two categories according to whether they are "ST," and the sample was small [23]. Besides, more and more companies are introducing deep learning into their financial management.
In addition, regarding the construction of the financial risk evaluation index system, research shows that the quality of financial reports has a certain impact on investment efficiency [24]. Tangible resources and operational performance can promote financial performance [25], and different industries have specific factors that affect financial performance. For example, lean manufacturing plays a certain role in promoting the performance of the pharmaceutical industry [26]. Therefore, on the basis of previous research, this paper crawls financial report data from the highly professional NetEase financial website. Through correlation analysis, 28 secondary indicators belonging to 5 primary indicators are selected from 69 financial indicators for financial risk prediction and early warning of A-share companies. Because the index system does not depend on a specific industry, it can better reflect the overall development level of Chinese listed companies.
From the current research, it can be seen that there are still some problems in the research on corporate financial risk early warning: (1) Most early warning research only measures and rates the corporate financial situation and lacks intelligent prediction. When the sample size is large, it is difficult to see the financial risk status of an enterprise at a glance. (2) When measuring and scoring, only a single model is used to evaluate the financial situation, and the evaluation results are affected because different models emphasize different information. At the same time, some shallow machine learning networks are prone to overfitting, which affects the prediction accuracy. (3) When classifying enterprise risk levels, dividing enterprises into two categories depends only on whether the enterprise is designated "ST," resulting in an extremely unbalanced sample size.
Considering the above problems, this paper proposes a corporate financial risk early warning model based on GRA-TOPSIS and SMOTE-CNN. The GRA-TOPSIS fusion model is used to score the financial situation of the enterprise; according to the scores, K-means clustering is used to obtain the risk level labels, and the CNN model is trained to realize intelligent prediction. The integration of supervised learning and unsupervised learning makes up for the lack of intelligent prediction in previous research and the difficulty of obtaining enterprise financial level labels. At the same time, because the number of enterprises with financial health and heavy warning is smaller and the number of enterprises with general finance is larger, the task can be regarded as a multiclassification problem on unbalanced data. Therefore, this paper uses the SMOTE algorithm to oversample the minority classes and uses the focal loss function to replace the traditional multiclass cross-entropy loss function, which further balances the data by assigning different weights to the unbalanced classes. The rest of the paper is arranged as follows. The second chapter, data and methodology, introduces the source of the experimental data and the principles of the methods used in the experiment. The third chapter, results and analysis, obtains the experimental results through empirical analysis, analyzes which indicators deserve more attention in corporate financial risk, and verifies the effectiveness and progress of the proposed model through comparison. The fourth chapter, conclusions and prospects, presents the conclusions of this research and the aspects in which further research can be carried out.

Data Collection and Index System Construction
This experiment uses the "read_html" function of the pandas module in Python to quickly and accurately capture three years of financial data of 4,727 A-share listed companies from the financial website. These companies belong to the industries of information technology, finance, manufacturing, communication, and education. Among them, there are 146 ST enterprises. A total of 14,181 samples were obtained as research objects; samples with abnormal or missing data were eliminated directly, leaving 13,190 samples, including 401 "ST" samples. After SMOTE oversampling, the sample size is 23,770. The article divides the data into training, validation, and test sets at a ratio of 6 : 2 : 2, with sample sizes of 14,262, 4,754, and 4,754, respectively.
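The 6 : 2 : 2 division can be checked with a short sketch. The function name `split_indices`, the random seed, and the use of a plain (non-stratified) shuffle are assumptions made for illustration; the paper does not state how the split was drawn.

```python
import numpy as np

def split_indices(n_samples, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(ratios[0] * n_samples))
    n_val = int(round(ratios[1] * n_samples))
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test

# 23,770 is the sample size after SMOTE oversampling reported in the text.
train_idx, val_idx, test_idx = split_indices(23770)
print(len(train_idx), len(val_idx), len(test_idx))  # 14262 4754 4754
```

The resulting sizes agree with the 14,262, 4,754, and 4,754 samples reported above.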

Construction of Index System.
This paper establishes a sound evaluation index system for the prediction of corporate financial risk. First, 69 financial indicators are used as candidate evaluation indicators, and correlation analysis is carried out with the correlation coefficient method in IBM SPSS Statistics 22. Finally, 28 evaluation indicators are selected from five aspects (main financial indicators, solvency, growth, profitability, and operation), as shown in Table 1.

The GRA model judges the correlation of sequences mainly by the similarity of the curve trends between corresponding points of the sequences, while the TOPSIS model is a technique for approaching the ideal solution, which ranks evaluation targets by their distance from the best and worst sequences. The combined algorithm steps are as follows:

(1) Determine the multi-attribute evaluation matrix. Assuming that n factors influence corporate finance and there are m companies, the evaluation matrix is

$$A = (a_{ij})_{m \times n}, \quad i = 1, \dots, m, \; j = 1, \dots, n.$$

Standardize the indexes using max-min normalization. For a positive index,

$$b_{ij} = \frac{a_{ij} - \min_i a_{ij}}{\max_i a_{ij} - \min_i a_{ij}},$$

and for a negative index,

$$b_{ij} = \frac{\max_i a_{ij} - a_{ij}}{\max_i a_{ij} - \min_i a_{ij}}.$$

(2) The entropy method determines the index weights. The weight of the j-th index is $w_j$, and the weight matrix is $W = (w_1, w_2, \cdots, w_n)$.

(3) The TOPSIS method determines the Euclidean distance between each evaluation object and the positive and negative ideal solutions. Calculate the weighted evaluation matrix

$$Z = (z_{ij})_{m \times n}, \quad z_{ij} = w_j b_{ij}.$$

The positive and negative ideal solutions are

$$z_j^{+} = \max_i z_{ij}, \qquad z_j^{-} = \min_i z_{ij},$$

and $d_i^{+}$ and $d_i^{-}$ denote the Euclidean distances between the i-th enterprise and the positive and negative ideal solutions:

$$d_i^{+} = \sqrt{\sum_{j=1}^{n} (z_{ij} - z_j^{+})^2}, \qquad d_i^{-} = \sqrt{\sum_{j=1}^{n} (z_{ij} - z_j^{-})^2}.$$

Then the closeness between the i-th enterprise and the ideal enterprise is $d_i^{-} / (d_i^{+} + d_i^{-})$.

(4) The GRA method determines the degree of relevance. The gray correlation coefficient matrices between each comparison sequence and the best and worst reference sequences are calculated, and the gray correlation degrees $r_i^{+}$ and $r_i^{-}$ are obtained from them.

(5) Integrate the Euclidean distance and the gray correlation degree for comprehensive evaluation. A more reasonable comprehensive evaluation model is built through weighted processing, giving the weighted closeness values $S_i^{+}$ and $S_i^{-}$. Finally, the comprehensive evaluation score $C_i$ of each enterprise is calculated.
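The five steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation. Several details are assumptions because the text does not specify them: the gray relational coefficient uses the common form with distinguishing coefficient `rho = 0.5`, the gray correlation degree is the unweighted mean over indicators, distances and degrees are scaled by their maxima before being combined, and the two parts are combined with equal weight `alpha = 0.5`. The function names and the toy matrix are hypothetical.

```python
import numpy as np

def minmax_normalize(A, positive):
    """Min-max scale each column of A; negative (cost-type) columns are flipped."""
    A = np.asarray(A, dtype=float)
    lo, hi = A.min(axis=0), A.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)      # guard against constant columns
    return np.where(positive, (A - lo) / span, (hi - A) / span)

def entropy_weights(B):
    """Entropy weight method on a normalized, non-negative m x n matrix."""
    m = B.shape[0]
    col_sum = B.sum(axis=0)
    # Share of each company in each indicator; all-zero columns become uniform.
    P = np.where(col_sum > 0, B / np.where(col_sum > 0, col_sum, 1.0), 1.0 / m)
    log_P = np.log(np.where(P > 0, P, 1.0))     # treats 0 * log(0) as 0
    entropy = -(P * log_P).sum(axis=0) / np.log(m)
    diversity = 1.0 - entropy
    return diversity / diversity.sum()

def grey_degree(Z, ref, rho):
    """Mean gray relational coefficient of each row of Z against a reference row."""
    delta = np.abs(Z - ref)
    d_min, d_max = delta.min(), delta.max()
    coef = (d_min + rho * d_max) / (delta + rho * d_max)
    return coef.mean(axis=1)

def gra_topsis(A, positive, rho=0.5, alpha=0.5):
    """Return a coupled GRA-TOPSIS score for each row (company) of A."""
    B = minmax_normalize(A, np.asarray(positive, dtype=bool))
    Z = B * entropy_weights(B)                   # weighted evaluation matrix
    z_pos, z_neg = Z.max(axis=0), Z.min(axis=0)  # positive / negative ideal solutions
    d_pos = np.sqrt(((Z - z_pos) ** 2).sum(axis=1))
    d_neg = np.sqrt(((Z - z_neg) ** 2).sum(axis=1))
    r_pos = grey_degree(Z, z_pos, rho)
    r_neg = grey_degree(Z, z_neg, rho)
    # Scale by the maxima so that distances and degrees are comparable (assumption).
    d_pos, d_neg = d_pos / d_pos.max(), d_neg / d_neg.max()
    r_pos, r_neg = r_pos / r_pos.max(), r_neg / r_neg.max()
    S_pos = alpha * d_neg + (1 - alpha) * r_pos  # closeness to the ideal enterprise
    S_neg = alpha * d_pos + (1 - alpha) * r_neg  # closeness to the worst enterprise
    return S_pos / (S_pos + S_neg)

# Toy data: 4 companies, 3 indicators; the third indicator is negative (cost-type).
A = np.array([[10.0, 5.0, 0.2],
              [8.0, 4.0, 0.4],
              [6.0, 3.0, 0.6],
              [2.0, 1.0, 0.9]])
scores = gra_topsis(A, positive=[True, True, False])
print(np.round(scores, 4))
```

In the toy matrix each company is better than the next on every indicator, so the scores come out in strictly decreasing order.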

Clustering Algorithm.
After the comprehensive evaluation, financial risk is classified by clustering the unlabeled data. Clustering is a typical unsupervised learning technique that divides samples into multiple clusters according to a certain standard. Commonly used clustering methods are K-means clustering, hierarchical clustering, SOM clustering, and FCM clustering. Because K-means clustering is efficient, accurate, and highly interpretable, this paper chooses the K-means clustering algorithm to generate corporate financial risk grade labels. The algorithm steps are as follows:

(1) Initialize the cluster centers: according to experience, select K samples from the sample set as the initial cluster centers, and determine the maximum number of iterations.

(2) Calculate the Euclidean distance between each sample and each cluster center, and assign each sample to the class of its nearest center:

$$d(x_i, c_k) = \sqrt{\sum_{j} (x_{ij} - c_{kj})^2}.$$

(3) Calculate the average of the samples in each category as the new cluster center:

$$c_k = \frac{1}{|G_k|} \sum_{x_i \in G_k} x_i,$$

where $G_k$ is the set of samples currently assigned to the k-th cluster.

(4) Repeat Steps 2 and 3 until the cluster assignments no longer change or the maximum number of iterations is reached.

Scientific Programming
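The four K-means steps above can be sketched as a small NumPy routine. The function name `kmeans`, the random initialization, and the toy scores are assumptions made for illustration. The paper clusters the GRA-TOPSIS scores into five levels; the example uses two clusters only to keep the output easy to read.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain K-means; X is (n_samples,) or (n_samples, n_features)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                       # treat 1-D scores as one feature
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct samples as the initial cluster centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Step 2: assign each sample to its nearest center (Euclidean distance).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        # Step 4: stop when the assignments no longer change.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: move each center to the mean of its members.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:             # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels, centers

# Hypothetical composite scores forming two well-separated groups.
scores = np.array([0.10, 0.11, 0.12, 0.90, 0.91, 0.92])
labels, centers = kmeans(scores, k=2)
print(labels, centers.ravel())
```

With more clusters, random initialization can end in a poor local optimum, so in practice several restarts or a library implementation with a better seeding strategy is a reasonable choice.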

SMOTE Algorithm.
Among the five risk classes obtained by clustering, the enterprises with excellent and poor finance are few, so the SMOTE algorithm is used to balance the samples. SMOTE is an improved scheme based on the random oversampling algorithm [27]. Its basic idea is to analyze the minority class samples and add artificially synthesized new samples to the dataset based on them, which can effectively avoid the overfitting problem. The algorithm steps are as follows:

(1) Calculate the Euclidean distance from each minority sample x to the other samples in the minority sample set to obtain its k nearest neighbors.

(2) Determine the oversampling magnification N, randomly select a sample $x_i$ from the k nearest neighbors, take a random number between 0 and 1, and synthesize a new sample for each sample x:

$$x_{new} = x + \mathrm{rand}(0, 1) \times (x_i - x).$$

(3) Repeat Step 2 to obtain a new dataset.
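The SMOTE steps above can be illustrated with a short NumPy sketch. It is a simplified version under stated assumptions: the function name `smote` is hypothetical, base samples are drawn at random rather than each sample being expanded exactly N times, and the toy minority points are not the paper's data.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Synthesize n_new samples from the minority-class matrix X_min."""
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)                        # cannot have more neighbors than other samples
    rng = np.random.default_rng(seed)
    # Step 1: pairwise Euclidean distances within the minority class.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)           # a sample is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    # Step 2: pick a base sample x and one of its k nearest neighbors x_i.
    base = rng.integers(0, n, size=n_new)
    picked = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))             # random number in [0, 1)
    # New sample lies on the segment between x and x_i.
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Hypothetical minority-class points on the corners of the unit square.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote(X_min, n_new=10, k=2)
print(synthetic.shape)  # (10, 2)
```

Because every synthetic point is an interpolation between two existing minority samples, it always falls inside the region already spanned by the minority class.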

Convolutional Neural Network.
After labels are generated by cluster analysis, a convolutional neural network is used to predict the corporate financial risk class. The convolutional neural network is a feed-forward neural network and one of the representative algorithms of deep learning. Its most important feature is weight sharing, which reduces the number of parameters and thereby shortens the time required for learning and reduces the amount of data needed to train the model [28]. The traditional convolutional neural network is generally composed of convolutional layers, pooling layers, and fully connected layers.
Features are extracted from the input data by the convolutional layer, the output is passed to the pooling layer to further extract and filter information, and the result is finally output through the fully connected layer. The structure is shown in Figure 1. Convolutional neural networks are generally used in computer vision, natural language processing, and other fields. In the past two years, studies have introduced them into corporate bankruptcy prediction [29], indicating that it is feasible for convolutional neural networks to process financial statement information.
The data enter the convolutional layer through the input layer, and the convolution operation is

$$X_i = f(W_i \otimes X_{i-1} + b_i),$$

where $X_i$ is the output of the i-th convolutional layer, $X_{i-1}$ is the output of the previous layer, $W_i$ is the weight parameter of the i-th convolutional layer, $b_i$ is the offset, $\otimes$ denotes the convolution operation, and $f$ is the activation function. The shape of the convolution output is

$$\left\lfloor \frac{m - n + 2u}{v} \right\rfloor + 1,$$

where m is the size of the output of the upper layer, n is the size of the convolution kernel, u is the amount of edge padding, and v is the step size. The convolved data then enter the pooling layer. In this experiment, max-pooling is selected, and the shape of its output is calculated with the same formula as for the convolutional layer. The last is the fully connected layer, which acts as a classifier in the convolutional neural network.
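The output-shape rule for convolution and pooling can be checked with a one-line helper. The kernel size of 3 and the pooling window of 2 used below are assumptions for illustration; the tuned values are those reported in the paper's parameter table. The input length of 28 corresponds to the 28 selected indicators.

```python
def conv_output_length(m, n, u=0, v=1):
    """Output length for input length m, kernel size n, padding u, and stride v."""
    return (m - n + 2 * u) // v + 1

# A 1-D convolution over 28 indicators with a kernel of size 3, no padding, stride 1.
after_conv = conv_output_length(28, 3)
# Max-pooling with window 2 and stride 2 follows the same rule.
after_pool = conv_output_length(after_conv, 2, u=0, v=2)
print(after_conv, after_pool)  # 26 13
```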
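The commonly used activation functions are Softmax and sigmoid. The activation function used by the fully connected layer in this experiment is Softmax:

$$\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{k} e^{z_k}}.$$

To further reduce the impact of data imbalance on the classification accuracy, the focal loss function is used instead of the cross-entropy loss function during training:

$$FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t),$$

where $p_t$ is the predicted probability of the true class, $\alpha_t$ is the class weighting factor, and $\gamma$ is the focusing parameter.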
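The relation between Softmax, cross-entropy, and focal loss can be made concrete with a small NumPy sketch. The function names, the scalar weighting factor `alpha = 0.25`, the focusing parameter `gamma = 2.0`, and the toy logits are assumptions for illustration and are not the values tuned in the paper.

```python
import numpy as np

def softmax(logits):
    """Row-wise Softmax, shifted by the row maximum for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels, eps=1e-12):
    """Mean multiclass cross-entropy for integer class labels."""
    p_t = np.clip(probs[np.arange(len(labels)), labels], eps, 1.0)
    return float(np.mean(-np.log(p_t)))

def focal_loss(probs, labels, alpha=0.25, gamma=2.0, eps=1e-12):
    """Mean focal loss: -alpha * (1 - p_t)**gamma * log(p_t)."""
    p_t = np.clip(probs[np.arange(len(labels)), labels], eps, 1.0)
    return float(np.mean(-alpha * (1.0 - p_t) ** gamma * np.log(p_t)))

# Hypothetical logits for two samples and three classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([0, 1])
probs = softmax(logits)
print(cross_entropy(probs, labels), focal_loss(probs, labels))
```

With `alpha = 1` and `gamma = 0` the focal loss reduces to the cross-entropy loss; a larger `gamma` shrinks the contribution of samples that are already classified with high confidence, so training focuses on the hard, usually minority-class, samples.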

Model Evaluation Indicators.
In essence, the prediction of financial risk can be regarded as an unbalanced multiclassification problem. For multiclassification problems, microaveraging and macroaveraging are generally used to evaluate the performance of a model. This experiment uses macroaveraging: the precision and recall of each category are calculated separately and then averaged to give the macroprecision and the macrorecall, and macro_F1 is the harmonic mean of the macroprecision and the macrorecall. The calculation formulas are as follows:

$$\text{macro\_P} = \frac{1}{K} \sum_{i=1}^{K} P_i, \qquad \text{macro\_R} = \frac{1}{K} \sum_{i=1}^{K} R_i, \qquad \text{macro\_F1} = \frac{2 \times \text{macro\_P} \times \text{macro\_R}}{\text{macro\_P} + \text{macro\_R}},$$

where $P_i$ and $R_i$ are the precision and recall of the i-th category and K is the number of categories.

The financial risk prediction model based on GRA-TOPSIS and SMOTE-CNN is shown in Figure 2. The model is mainly divided into four parts: data preprocessing, unsupervised learning, supervised learning, and model performance evaluation. The preprocessing module deletes the missing values and outliers of the data crawled from the financial website and then removes the correlation between the indicators through Pearson correlation analysis to determine the evaluation index system. The unsupervised learning module focuses on the comprehensive evaluation of financial status and on clustering based on the evaluation results to generate class labels. The supervised learning module uses the convolutional neural network to classify and predict financial risk levels based on the samples processed by SMOTE oversampling and the labels generated by clustering. The model performance evaluation module compares accuracy, macroprecision, macrorecall, macro_F1, and other indicators with those of other models to demonstrate the advancement and effectiveness of the model proposed in this paper.
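The macro-averaged metrics described in this section can be computed from a confusion matrix, as in the following minimal sketch. The function name `macro_scores` and the toy labels are hypothetical. Following the text, macro_F1 is taken here as the harmonic mean of the macroprecision and the macrorecall, not as the mean of the per-class F1 values.

```python
import numpy as np

def macro_scores(y_true, y_pred, n_classes):
    """Return (macro precision, macro recall, macro F1, accuracy)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                        # rows: true class, columns: predicted class
    tp = np.diag(cm).astype(float)
    pred_total = cm.sum(axis=0)              # samples predicted as each class
    true_total = cm.sum(axis=1)              # samples that belong to each class
    precision = np.divide(tp, pred_total, out=np.zeros_like(tp), where=pred_total > 0)
    recall = np.divide(tp, true_total, out=np.zeros_like(tp), where=true_total > 0)
    macro_p, macro_r = precision.mean(), recall.mean()
    denom = macro_p + macro_r
    macro_f1 = 2 * macro_p * macro_r / denom if denom > 0 else 0.0
    accuracy = tp.sum() / cm.sum()
    return macro_p, macro_r, macro_f1, accuracy

# Hypothetical labels for three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(macro_scores(y_true, y_pred, n_classes=3))
```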

GRA-TOPSIS Comprehensive Evaluation and Regression Analysis.
The Euclidean distances $d_i^{+}$ and $d_i^{-}$ between each evaluation object and its positive and negative ideal solutions are calculated by formulas (5)∼(8) in Step (3). The correlation degrees $r_i^{+}$ and $r_i^{-}$ between each comparison sequence and the best and worst reference sequences are calculated by formulas (11)∼(13) in Step (4). Then, the weighted closeness values $S_i^{+}$ and $S_i^{-}$ of the coupled TOPSIS and GRA model are calculated by formulas (14)∼(16) in Step (5), and the comprehensive evaluation score $C_i$ of the coupled model is further calculated. The six companies with the highest scores and the six companies with the lowest scores in the single evaluation models and the GRA-TOPSIS coupling model are shown in Table 2. The evaluation results of the coupling model and the standardized indicator data are entered into a multiple linear regression model to further analyze the relationship between the indicators and enterprise financial risk. The coefficients of the indicators calculated by IBM SPSS Statistics 22 are shown in Table 3.
It can be seen from Table 2 that, among the top six companies, the coupled model gives exactly the same result as single TOPSIS and shares three companies with single GRA, and among the last six companies, the coupled model, single TOPSIS, and single GRA give exactly the same result. The range and coefficient of variation of the comprehensive evaluation of corporate financial risk are calculated under the single GRA, single TOPSIS, and GRA-TOPSIS models. The ranges are 0.1134, 0.0002, and 0.1233, and the coefficients of variation are 0.0081, 0.0113, and 0.0111, respectively. A larger range and coefficient of variation indicate a higher level of dispersion and discrimination of the composite evaluation scores, so the coupled model is better than a single composite evaluation model at distinguishing the financial risk of each firm. From the regression coefficients in Table 3, we can see that, in the evaluation of enterprise risk, the asset-liability ratio $X_{9}$, the equity ratio $X_{12}$, and the proportion of three expenses $X_{21}$ have a negative effect on corporate performance, while the other indicators have a positive effect, which is consistent with the indicator system proposed in this article. The coefficients of the indicators do not differ much. The more important indicators are main operating profit margin $X_{17}$, cost profit margin $X_{18}$, current ratio $X_{6}$, ratio of shareholders' equity to fixed assets $X_{10}$, operating profit margin $X_{19}$, and return on operating cash flow of assets $X_{27}$, with regression coefficients of 0.027, 0.027, 0.026, 0.026, and 0.026, respectively. The less significant indicators are cash flow ratio $X_{28}$, total assets margin $X_{16}$, total assets turnover rate $X_{25}$, cash ratio $X_{7}$, and proportion of fixed assets $X_{13}$, with regression coefficients of 0.021, 0.022, 0.023, 0.023, and 0.023, respectively.
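The two dispersion measures used above to compare the score sets, the range and the coefficient of variation (the "variable coefficient"), can be computed as in the sketch below. The function name `dispersion`, the use of the population standard deviation, and the two toy score lists are assumptions for illustration; they are not the paper's data.

```python
import numpy as np

def dispersion(scores):
    """Return (range, coefficient of variation) of a 1-D array of scores."""
    s = np.asarray(scores, dtype=float)
    score_range = s.max() - s.min()
    cv = s.std() / s.mean()   # population standard deviation; an assumption
    return score_range, cv

# Hypothetical score sets: the second one separates the companies more clearly.
narrow = [0.50, 0.51, 0.52, 0.53]
wide = [0.40, 0.48, 0.56, 0.64]
print(dispersion(narrow))
print(dispersion(wide))
```

A score set with a larger range and coefficient of variation spreads the companies further apart, which is the sense in which an evaluation model is said to discriminate better.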

Unlabeled Data Clustering Based on K-Means Clustering.
The authors chose to cluster the enterprises into five levels, "AAA," "AA," "A," "B," and "C," which represent financial health, good finance, general finance, financial light warning, and financial heavy warning. The data are substituted into formulas (17)∼(18) for clustering. For visual display, 800 samples are randomly selected to draw the clustering results, as shown in Figure 3, and the specific clustering results are shown in Table 4.
As can be seen intuitively in Figure 3, K-means clusters the samples into five levels, which are indicated by five colors in the figure. The first level, "AAA," has higher scores in the composite measure of the GRA-TOPSIS model; its financial situation is better, but it contains the fewest samples. The three categories in the middle are more concentrated and more numerous; that is, more companies have an intermediate financial status in the composite measure. Level "C" has the lowest scores and a small number of samples. Because the sample companies are clustered into five levels rather than simply divided into two categories according to whether they are "ST," the prediction results are more accurate and the purpose of early warning can be achieved.
From the clustering results in Table 4, it can be seen that the differences between the average scores of adjacent categories after clustering according to the GRA-TOPSIS model score are 0.0087, 0.0045, 0.0043, and 0.0056, respectively. That is, the average scores of the two categories of financial health and heavy warning differ markedly from those of the middle three categories. It also shows that the healthy and heavy-warning samples are widely dispersed from the middle three types of samples and that the financial status of most companies is at a medium level. This can also be seen from the final number of samples in each cluster: there are 298 samples of financial level "AAA," 3019 samples of level "AA," 4754 samples of level "A," 3877 samples of level "B," and 1242 samples of level "C." To provide targeted reference opinions for enterprises at each level, it is necessary to understand the indicators that play an important role in the different categories, so that enterprises can monitor them centrally and adjust in time. Therefore, the entropy method is further used to analyze the weights of the financial indicators of enterprises at each level, as shown in Figure 4.
The results show that (1) among the enterprises whose finances are healthy, the indicators with greater weight are inventory turnover rate $X_{24}$, accounts receivable turnover rate $X_{23}$, ratio of shareholders' equity to fixed assets $X_{10}$, revenue from main business $X_{2}$, and growth rate of main business revenue $X_{14}$, and the cumulative total weight is 0.6132; (2) among the enterprises with good finance, the indicators with greater weight are ratio of shareholders' equity to fixed assets $X_{10}$, revenue from main business $X_{2}$, accounts receivable turnover rate $X_{23}$, and inventory turnover rate $X_{24}$, and the cumulative total weight is 0.7186; (3) among the enterprises with general finance, the indicators with greater weight are inventory turnover rate $X_{24}$, revenue from main business $X_{2}$, and ratio of shareholders' equity to fixed assets $X_{10}$, and the cumulative total weight is 0.5524; (4) among the enterprises with financial light warning, the indicators with greater weight are accounts receivable turnover rate $X_{23}$, ratio of shareholders' equity to fixed assets $X_{10}$, and revenue from main business $X_{2}$, and the cumulative total weight is 0.6573; (5) among the enterprises with financial heavy warning, the indicators with greater weight are inventory turnover rate $X_{24}$, revenue from main business $X_{2}$, ratio of shareholders' equity to fixed assets $X_{10}$, and accounts receivable turnover rate $X_{23}$, and the cumulative total weight is 0.5504; these indicators have a greater impact on the finance of the fifth category of enterprises.

1DCNN Classification Prediction.
The goal of this experiment is to classify the financial status of enterprises in order to predict the risk level intelligently. In essence, this is a multiclassification problem, for which the commonly used loss function is categorical_crossentropy. Because the classification samples are unbalanced, the article chooses focal loss to replace the traditional cross-entropy loss function. Focal loss was proposed by Kaiming He and coauthors in 2017 to improve dense object detection [30] and has often been used in object detection and natural language processing in the past two years. The optimizer is Adam, which converges faster; the activation functions are ReLU and Softmax; batch_size is set to 64; and the maximum number of epochs is set to 10000. Since there are only 298 "AAA" samples and 4754 "A" samples, the ratio is close to 1 : 16, which is a typical unbalanced multiclassification problem. Therefore, before the data are fed into the neural network for training, they are first oversampled by the SMOTE algorithm. Besides, the model prevents overfitting by adding an early stopping mechanism. The patience parameter is set to 50; that is, when the loss on the validation set does not decrease significantly during 50 iterations, training is stopped. In addition, L2 regularization is added to control the complexity of the model. The hyperparameter settings have a large impact on model accuracy, and this experiment obtains the optimal parameters by adjusting them and observing the accuracy on the validation set. The more important parameters in the convolutional neural network are the learning rate, the number of hidden layers, the number of convolution kernels, and the size of the convolution kernel. Focal loss has the parameters α and γ.
The experiment adjusts the model parameters through the controlled variable method; that is, it keeps the other parameters unchanged, adjusts one parameter at a time, observes the accuracy on the validation set during model training, and selects the value that makes it optimal. The validation accuracy corresponding to different parameters is shown in Figure 5, and the optimal parameters of the final model are shown in Table 5.
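The early stopping rule described above (stop when the validation loss has not decreased for a given number of epochs) can be sketched independently of any deep learning framework. The class name `EarlyStopping`, the `min_delta` threshold standing in for "does not decrease significantly," and the toy loss sequence are assumptions for illustration. The paper uses a patience of 50; the example uses 3 so that the behavior is visible in a few steps.

```python
class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=50, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta      # smallest decrease that counts as an improvement
        self.best = float("inf")
        self.wait = 0

    def step(self, val_loss):
        """Record one epoch; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience

# Hypothetical validation losses for illustration (not the paper's training log).
losses = [1.00, 0.90, 0.95, 0.96, 0.97, 0.80]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at)  # 4
```

Training stops at epoch 4, after three consecutive epochs without improvement over the best loss of 0.90, so the later value of 0.80 is never reached; this is the trade-off that the patience setting controls.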
During the training process, the loss and accuracy curves of the training set and the validation set are shown in Figure 6. The test set is used to verify the performance of the model, and the resulting confusion matrix is shown in Figure 7. It can be seen that the recognition accuracy of samples with levels "AAA" and "C" is 100%, that of samples with levels "AA" and "B" is 98%, and that of level "A," that is, the general financial samples, is the lowest at 96%. The test performance shows that the model proposed in this paper is feasible and effective.

Multimodel Performance Comparison.
To further verify the classification effect of the fusion model proposed in this paper, it is compared with GRA-TOPSIS and SMOTE-CNN (without focal loss), GRA-TOPSIS and CNN, Kmeans-… Table 6.
It can be seen from Figure 8 that the model in this paper performs better than the other comparison models on the four evaluation indicators of macroprecision, macrorecall, macro_F1, and accuracy. From Table 6, we can see that the values of the evaluation indexes of the model in this paper are 0.9830, 0.9830, 0.9830, and 0.9857, which are 0.0218, 0.0270, 0.0231, and 0.0255 higher, respectively, than those of the model without focal loss and 0.0262, 0.0567, 0.0429, and 0.0281 higher than those of the model not processed by the SMOTE algorithm. In addition, compared with SVM, KNN, decision tree, and BPNN, which are commonly used in current research, the model constructed in this paper also performs better on each evaluation index. There are two reasons for the higher accuracy of the model in this paper. First, the comprehensive evaluation with the GRA-TOPSIS fusion model followed by clustering is more reasonable than clustering the indicators directly. Second, because corporate financial performance data are an unbalanced sample, the SMOTE algorithm and the focal loss function are introduced to balance the data, which has some influence on the classification accuracy.

Conclusion
To address the two problems that research on corporate financial risk warning only measures and rates financial status without intelligent prediction and that corporate financial data samples are extremely unbalanced, this paper randomly selects a total of 13190 samples of three years of financial data of 4727 A-share listed companies as the research object, uses the GRA-TOPSIS model to make a comprehensive evaluation of the enterprises, and combines unsupervised learning and supervised learning through K-means clustering and a convolutional neural network to achieve intelligent prediction of the risk level. The clustering results are processed by SMOTE, and the focal loss function is introduced to solve the data imbalance problem of each category and improve the prediction accuracy of the model. The specific research conclusions and prospects are as follows: (1) This paper constructs a GRA and TOPSIS fusion model to make a comprehensive evaluation of the financial status of enterprises and measures the final closeness in terms of both similarity and Euclidean distance, which is more scientific and reasonable than a single evaluation method. Among the samples, 300896(2020) Imeik has the best finance and 000820(2020) *ST energy saving has the worst finance. (2) Each indicator of enterprise financial data contributes differently to financial status. The indicator system is constructed by correlation analysis screening, and a regression model is used to further analyze the indicators that have a significant impact. It is found that the most important indicator is the profitability of the main business and the least important indicator is the cash flow ratio.

Data Availability
The experimental data in this paper are available from the NetEase Finance website (https://money.163.com).

Disclosure
Hongjiu Liu and Yanrong Hu are the joint first authors.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding this paper.