Market Risk Early Warning Based on Deep Learning and Fruit Fly Optimization

To improve the ability of the market to avoid and prevent credit risk and to strengthen awareness of market risk early warning, SMOTE is used to process the unbalanced samples, and the fruit fly optimization algorithm (FOA) is used to optimize the parameters of the support vector machine (SVM); on this basis, an improved SVM market risk early warning model is proposed. The simulation results show that the proposed model has excellent stability and generalization ability and can predict market credit risk accurately. Compared with the prediction models based on FOA-SMOTE-BP and FOA-SMOTE-Logit, the proposed model performs better on the G value, F value, and AUC value, which provides a reference for market credit risk prediction.


Introduction
Market risk early warning is an important measure to prevent market risk and unknown loss and improve market normalization. In recent years, with the development of intelligent technology, deep learning has been widely used in various fields, including market risk warning. So far, the relevant researchers achieved market risk early warning through the use of deep learning. On the basis of in-depth study of rough set theory (RST), Guan et al. proposed a financial operation risk early warning model based on BP neural network, which effectively realizes the prediction of financial operation risk and profit risk of family farm [1]. In the cross-border environment, scholars put forward the marginal expected gap, delta conditional value at risk, and conditional capital gap to measure the system risk. The feature pair method based on bilateral balance sheet data is different from the paradoxical risk measurement method based on market price. Thus, a systemic risk early warning method based on the network spectral feature pair method analyzing the core global banking system is proposed. The method provides risk early warning for the unstable financial markets based on turning points similar to R numbers in popular models [2]. Figini et al. improved the sample performance of parametric models and nonparametric models in credit risk estimation, so as to propose a multivariable outlier detection technology based on local outliers, which can support financial institutions to make decisions and avoid falling into corporate credit risks [3]. In view of the credit risk assessment of the Internet finance industry, Yang and Yuan applied RBF network to analyze the statistical data of online peer-to-peer lending platform and evaluated the credit risk of the platform, and thus a new early warning method of RBF neural network model is proposed, which can reasonably predict the credit risk status of the industry development [4].
Based on the fuzzy theory and related theories of financial risk early warning management, Ding proposed the fuzzy comprehensive evaluation method, which realizes more accurate early warning and assessment of potential and obvious risks of financial enterprises. In addition, the safety of financial enterprise management is greatly improved, and the losses caused by various risks are reduced [5]. Based on BP neural network, Li constructed a risk assessment model of knowledge transfer in transforming enterprises to realize knowledge management risk warning [6]. Dong adopted the improved K-means algorithm of quantum evolution to divide the risk warning interval by combining the given initial value and the value at risk measured by well-known Chinese online financial companies [7,8]. Zhang and Chen used the autoregressive conditional Fréchet (ACF) model to predict the tail risk of the capital market, so as to identify major crisis sources [9]. Ouyang et al. applied the deep learning algorithm to the early warning of market risk. The results show that the algorithm has high accuracy compared with the traditional BP and others [10].
Among them, SVM algorithm is widely used in the field of classification because of its nonlinear and small sample advantages, but the parameter optimization of SVM is a research hotspot. For example, Jerlin Rubini and Perumal proposed to optimize the SVM algorithm by using the Drosophila algorithm and applied the optimized algorithm to the classification of chronic kidney disease, showing a good classification effect, indicating that the Drosophila algorithm has great advantages in optimizing SVM [11]. Tian and others used the fruit fly algorithm to optimize the echo state network, which greatly improved the accuracy of prediction [12]. Lu et al. applied the fruit fly algorithm and SVM to the prediction of urban gas load, which greatly improved the accuracy of short-term prediction. It can be seen from the above that the combination of Drosophila algorithm and SVM for classification or prediction has become the focus of current research [13]. The above early warning models based on deep learning realize the early warning of market risks to a certain extent. However, the prediction accuracy needs to be improved. To solve this problem, this paper applies the SVM model with excellent predictive performance and constructs a market risk early warning model by optimizing its parameters and unbalanced samples.

SVM Model.
SVM is a generalized linear classifier proposed on the basis of statistical learning theory and the principle of structural risk minimization. Its basic principle is to construct an optimal hyperplane that maximizes the distance between samples of two different categories, as shown in Figure 1 [14]. Here, circles and squares represent the two categories, and the optimal hyperplane is the one that maximizes the margin between the two dotted lines.
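As a minimal illustration of the maximum-margin idea, the sketch below evaluates a linear decision rule sign(w·x + b) and the margin width 2/‖w‖ for a hand-picked hyperplane. The weight vector `w`, the offset `b`, and the sample points are hypothetical values chosen for illustration; they do not come from the data used in this paper.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b of a point x relative to the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def margin_width(w):
    """Distance between the two supporting lines w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane and points (illustrative only).
w, b = [3.0, 4.0], -1.0
print(classify(w, b, [1.0, 1.0]))    # point on the positive side
print(classify(w, b, [-1.0, 0.0]))   # point on the negative side
print(margin_width(w))               # 2 / ||w||
```

A smaller ‖w‖ gives a wider margin, which is why the SVM training problem minimizes the norm of the weight vector.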
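Suppose the dataset is $\{(x_i, y_i)\}$, $i = 1, 2, \ldots, n$, with $x_i \in R^d$ and $y_i \in \{-1, 1\}$, where $y_i$ is the category label. When $y_i = 1$, $x_i$ belongs to the first category; when $y_i = -1$, $x_i$ belongs to the second category. The linear discriminant function is usually expressed as [15]

$$g(x) = w'x + b, \tag{1}$$

where $w'$ is the weight vector and $b$ is a constant. The classification gap is $2/\|w'\|$. When $\|w'\|^2$ is minimum, the classification spacing is maximum. The form of the standard SVM is [16]

$$\min_{w', b, \varepsilon} \ \frac{1}{2}\|w'\|^2 + c \sum_{i=1}^{n} \varepsilon_i, \quad \text{s.t. } y_i (w'x_i + b) \ge 1 - \varepsilon_i, \ \varepsilon_i \ge 0, \ i = 1, 2, \ldots, n, \tag{2}$$

where $c$ is the penalty factor and $\varepsilon_i$ is the slack variable. For sample data that are not linearly separable, SVM performs a nonlinear transformation $\varphi: R^d \to H$, so that the sample space is mapped into a high-dimensional space. When solving, it should meet the requirement of [17]

$$\max_{a} \ \sum_{i=1}^{n} a_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j y_i y_j K(x_i, x_j), \quad \text{s.t. } \sum_{i=1}^{n} y_i a_i = 0, \ 0 \le a_i \le c, \tag{3}$$

where $a_i$ is the Lagrange multiplier. Since the above problem is constrained by inequalities, there is a unique optimal solution corresponding to the Lagrange multiplier $a_i^*$. The optimal classification discriminant function is [18]

$$f(x) = \mathrm{sgn}\left( \sum_{i=1}^{n} a_i^* y_i K(x_i, x) + b^* \right), \tag{4}$$

where $b^*$ is the value of $b$ obtained from formula (2) and $K(x_i, x_j)$ is the kernel function. Take the radial basis function (RBF) as an example, which is commonly written as [19]

$$K(x_i, x_j) = \exp\left( -\frac{\|x_i - x_j\|^2}{2\sigma^2} \right), \tag{5}$$

where $\sigma$ is the RBF kernel parameter. According to formulas (4) and (5), the optimal classification discriminant function is

$$f(x) = \mathrm{sgn}\left( \sum_{i=1}^{n} a_i^* y_i \exp\left( -\frac{\|x_i - x\|^2}{2\sigma^2} \right) + b^* \right). \tag{6}$$

Figure 1: Optimal hyperplane.

As can be seen from the above analysis, the classification effect of the SVM model mainly depends on two aspects: one is whether the numbers of samples in the classes are balanced, and the other is whether the kernel parameters and penalty factor of the model are optimal. The standard SVM model considers neither [20]. Therefore, in order to improve the classification effect of the SVM model, this paper improves the model from these two aspects.

SMOTE is used to process the unbalanced samples as follows.
(1) For each minority sample $X$, find its $k$ nearest similar samples.
(2) Randomly select one sample $X_i$ from the $k$ samples.
(3) According to formula (7), $X_i$ and $X$ are synthesized into a new sample:

$$X_{new} = X + \mathrm{rand}(0, 1) \times (X_i - X). \tag{7}$$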

(4) Combine $X_{new}$ and $X$ into a new training set for learning on the SVM model.
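The oversampling idea can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the implementation used in the experiments: the function names `k_nearest` and `smote`, the Euclidean distance, the default `k=3`, the random choice of the base sample, and the toy data are all hypothetical.

```python
import math
import random

def k_nearest(sample, pool, k):
    """Return the k points in pool closest to sample (excluding sample itself)."""
    others = [p for p in pool if p is not sample]
    return sorted(others, key=lambda p: math.dist(sample, p))[:k]

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by linear interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)                      # minority sample X
        x_i = rng.choice(k_nearest(x, minority, k))   # random neighbor X_i
        delta = rng.random()                          # random number in [0, 1)
        # New point lies on the segment between X and X_i.
        synthetic.append([a + delta * (b - a) for a, b in zip(x, x_i)])
    return synthetic

# Hypothetical 2-D minority samples (illustrative only).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_samples = smote(minority, n_new=5)
training_minority = minority + new_samples  # merged set used for training
print(len(new_samples), len(training_minority))
```

Because each synthetic point lies between two existing minority samples, the new samples stay inside the region already occupied by the minority class.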

Optimization of Model Parameters.
To optimize the kernel parameters and penalty factor of the SVM model, this paper adopts the fruit fly optimization algorithm (FOA), which has high searching accuracy. Figure 2 shows the simulated behavior of the fruit fly foraging process [22]. The basic operation is as follows.
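(1) Initialize the maximum number of iterations, the population size, the fruit fly population location range (LR), and other parameters. In 2D coordinates (X, Y), the initial position is

$$X_{axis} = \mathrm{rand}(LR), \tag{8}$$

$$Y_{axis} = \mathrm{rand}(LR). \tag{9}$$

(2) Assign a flight direction and distance to all fruit flies, and use olfactory search to update their positions [23]:

$$X_i = X_{axis} + \mathrm{rand}(FR), \tag{10}$$

$$Y_i = Y_{axis} + \mathrm{rand}(FR), \tag{11}$$

where FR represents the single flight range of a fruit fly.
(3) According to formula (12), the distance between the position of each fruit fly and the origin is calculated [24]:

$$Dist_i = \sqrt{X_i^2 + Y_i^2}. \tag{12}$$

(4) $S_i$ and $Smell_i$ are calculated according to the fitness function:

$$S_i = \frac{1}{Dist_i}, \tag{13}$$

$$Smell_i = \mathrm{fitness}(S_i), \tag{14}$$

where fitness is the discriminant function, $Smell_i$ is the flavor concentration value, and $S_i$ is the judgment value of $Smell_i$.
(5) Update bestSmell and bestIndex, that is, the best flavor concentration in the population and the index of the fruit fly that attains it.
(6) Use visual search to make the other fruit flies fly to the best position [25]:

$$X_{axis} = X(bestIndex), \quad Y_{axis} = Y(bestIndex).$$

(7) Repeat steps (2)-(6) until the algorithm reaches the set number of iterations.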
FOA guides the search by the current optimal solution and makes the result close to the optimal solution, so as to realize the parameter optimization.
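The search loop can be sketched as follows. This is a minimal sketch under several assumptions that are not stated in the text: a larger smell value is treated as better, rand(LR) is drawn from [0, LR], rand(FR) is drawn from [-FR, FR], and only a single scalar candidate S is optimized (tuning two SVM parameters would need one such search coordinate per parameter). The function name `foa` and the toy fitness function are hypothetical.

```python
import math
import random

def foa(fitness, max_iter=100, pop_size=20, lr=10.0, fr=1.0, seed=0):
    """Fruit fly optimization of a scalar candidate S; maximizes fitness(S)."""
    rng = random.Random(seed)
    # Step (1): random initial swarm location within the location range LR.
    x_axis, y_axis = rng.uniform(0, lr), rng.uniform(0, lr)
    best_smell, best_s = -math.inf, None
    for _ in range(max_iter):
        swarm = []
        for _ in range(pop_size):
            # Step (2): olfactory search, a random flight within range FR.
            x = x_axis + rng.uniform(-fr, fr)
            y = y_axis + rng.uniform(-fr, fr)
            # Step (3): distance to the origin (guarded against zero).
            dist = max(math.hypot(x, y), 1e-12)
            # Step (4): judgment value S and its smell concentration.
            s = 1.0 / dist
            swarm.append((fitness(s), s, x, y))
        # Step (5): best fruit fly of this generation.
        smell, s, x, y = max(swarm)
        # Step (6): visual search, the swarm moves only if the smell improved.
        if smell > best_smell:
            best_smell, best_s = smell, s
            x_axis, y_axis = x, y
    return best_s, best_smell

# Toy fitness with its maximum at S = 0.25 (illustrative only).
best_s, best_smell = foa(lambda s: -(s - 0.25) ** 2)
print(round(best_s, 3))
```

In a parameter-tuning setting, the fitness function would instead train and score the classifier for the candidate value, for example by cross-validated accuracy.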

Market Credit Risk Early Warning Model Based on Improved SVM
$S_{min}$ and $S_{maj}$ are used to represent the samples of market credit risk and noncredit risk, respectively, and $S$ is the set of all samples. Based on the above improvements, the construction process of the market credit risk early warning model is designed as follows:
(1) Calculate the $k$ nearest neighbor points of each sample point $(x_{smin}, y_{smin})$ in $S_{min}$, randomly select one neighbor point, subtract $(x_{smin}, y_{smin})$ from it, multiply the difference by a random number $\delta$ in the interval (0, 1), and add the result to $(x_{smin}, y_{smin})$ to obtain a new sample $x_{new}$.
(2) Repeat the above step until the number of $x_{new}$ reaches $|S_{maj} - S_{min}|/2$.
(3) Initialize the relevant parameters of SVM and FOA. In this paper, referring to reference [26], the maximum iteration number of FOA is set to 100, and the population size is set to 20.
(4) Use FOA to optimize the parameters of the SVM model and calculate the judgment value of flavor concentration.
(5) Continue to iterate until the optimal bestSmell is less than the set value; the corresponding value is then taken as the optimal parameter.
(6) Plug the optimal parameters and $x_{new}$ into the model to construct the improved SVM model and perform prediction.
The above process is illustrated in Figure 3.

Experimental Environment Construction.
This experiment is run on a 64-bit Windows 7 Professional system. The CPU is an Intel(R) Xeon(R) E5-2620 v3 at 2.40 GHz, and the GPU is a Tesla K80. The memory is 16 GB. The model was built with MATLAB 2018a.

Data Sources.
The financial data of 260 listed manufacturing enterprises in Shenzhen and Shanghai from 2018 to 2020 are selected as the experimental data.
Following references [21,27,28], a total of 20 financial indicators are selected as credit risk warning indicators of listed companies. They comprise 6 first-level indicators, such as enterprise operation capacity, growth capacity, and profitability, and 20 second-level indicators, such as total asset turnover rate, net asset growth rate, and return on net assets. The indicators are listed in detail in Table 1.

Data Preprocessing
(1) Descriptive Statistics. Since the means, standard deviations, and maximum and minimum values of the above indicator variables differ significantly, descriptive statistics of the indicator variables are computed, and the results are shown in Table 2.
(2) Normalization. Considering the different dimensional levels of the index variables, the z-score is adopted for normalization, as shown in the following formula:

$$x'_{ij} = \frac{x_{ij} - \mu(x_j)}{\sigma(x_j)},$$

where $x_{ij}$ is the value of index $j$ for sample $i$, $x'_{ij}$ is its normalized value, and $\mu(x_j)$ and $\sigma(x_j)$ represent the sample mean and standard deviation corresponding to index $j$, respectively. After normalization, descriptive statistics of each indicator variable are shown in Table 3.
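A column-wise z-score can be sketched with the Python standard library. This is a minimal sketch: the function name `zscore_columns` and the toy values are hypothetical, and the use of the sample standard deviation (divisor n - 1) is an assumption, since the text does not say which divisor is used.

```python
import statistics

def zscore_columns(rows):
    """Standardize every column: subtract its mean, divide by its std."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.stdev(col) for col in columns]  # sample std (n - 1)
    return [
        [(value - mu) / sigma for value, mu, sigma in zip(row, means, stds)]
        for row in rows
    ]

# Hypothetical indicator values: rows are companies, columns are indicators.
raw = [[1.2, 300.0], [0.8, 450.0], [1.5, 150.0], [1.0, 600.0]]
scaled = zscore_columns(raw)
for row in scaled:
    print([round(v, 3) for v in row])
```

After scaling, every indicator has mean 0 and standard deviation 1, so indicators measured on very different scales contribute comparably to the kernel distance. A constant column would have zero standard deviation and must be removed before this step.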
(3) Significance Testing. To select indicators that can distinguish between the credit risk and noncredit risk of listed companies, this paper tests them with the independent sample T-test, and the results are shown in Table 4. The P values of 7 indicators, such as net asset-liability ratio and operating profit growth rate, are more than 10%, which indicates that these indicators are unable to distinguish credit risk from noncredit risk, so they are deleted in this paper.

Evaluation Indicators.
The geometric mean accuracy (G value), F value, and AUC value are used to evaluate the prediction performance of the model. A confusion matrix is used to represent the binary classification of credit risk, as shown in Table 5.
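The model sensitivity (SE), specificity (SP), and precision (P) can be calculated as follows:

$$SE = \frac{TP}{TP + FN}, \quad SP = \frac{TN}{TN + FP}, \quad P = \frac{TP}{TP + FP},$$

where TP, FN, FP, and TN are the numbers of true positives, false negatives, false positives, and true negatives in the confusion matrix. Through the above three indicators, the following can be calculated:

$$G = \sqrt{SE \times SP}, \quad F = \frac{2 \times P \times SE}{P + SE}.$$

The larger the selected index value is, the better the model performance is.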
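The evaluation indicators can be sketched as below, using the usual definitions of sensitivity, specificity, precision, their geometric mean (G), the harmonic mean of precision and sensitivity (F), and a rank-based AUC. That these standard definitions match the paper's formulas exactly is an assumption; the function names, counts, and scores are hypothetical.

```python
import math

def confusion_metrics(tp, fn, fp, tn):
    """Return SE, SP, P, G, and F computed from confusion-matrix counts."""
    se = tp / (tp + fn)          # sensitivity (recall on the risk class)
    sp = tn / (tn + fp)          # specificity
    p = tp / (tp + fp)           # precision
    g = math.sqrt(se * sp)       # geometric mean of SE and SP
    f = 2 * p * se / (p + se)    # harmonic mean of P and SE
    return {"SE": se, "SP": sp, "P": p, "G": g, "F": f}

def auc(pos_scores, neg_scores):
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for ps in pos_scores:
        for ns in neg_scores:
            if ps > ns:
                wins += 1.0
            elif ps == ns:
                wins += 0.5      # ties count as half a correct pair
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical counts and classifier scores (illustrative only).
m = confusion_metrics(tp=40, fn=10, fp=5, tn=45)
print(round(m["G"], 4), round(m["F"], 4))
print(round(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]), 4))
```

G penalizes a model that does well on only one class, which is why it is a common choice for unbalanced samples.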

Model Verification.
The samples are divided into training sets and testing sets according to different proportions, and experiments are carried out under different kernel functions and different optimal parameter values. The results are shown in Table 6. Under the division condition of different sample proportion, models corresponding to different kernel functions and optimal parameter values perform well in G value, F value, and AUC value, and the differences are small, which indicates that the proposed model has good prediction performance and strong generalization ability [29].
In order to reflect more intuitively the prediction performance of different kernel functions and optimal parameter values under different training sample proportions, the prediction results in the above table are plotted in Figures 4 to 6. Figure 4 shows that under different ratios of training set and testing set, the fluctuation range of the G values of different kernel functions is small. Compared with the sigmoid and polynomial models, the linear and RBF models have higher G values, which indicates that the linear and RBF models have slightly better performance. On the other hand, Figure 5 shows that under different proportions of training sets and testing sets, the F values of different kernels fluctuate greatly, but the overall F value is high. There is no trend indicating that any one kernel type always has the highest F value, and thus the proposed model has good generalization ability. As can be seen from Figure 6, the AUC value of the proposed model fluctuates greatly, but the proposed model also achieves good results on this index. To sum up, the model proposed in this paper has good generalization ability and good prediction performance.
Considering that evaluating the model performance only through evaluation indexes is not sufficiently rigorous, a paired sample T test is adopted to test the prediction performance of the different kernel function models, and the results are shown in Table 7. On the G value, the statistics of the RBF and polynomial models against the linear and sigmoid models are all less than 10%, which rejects the null hypothesis, indicating that the performance of the RBF and polynomial models is significantly different from that of the linear and sigmoid models. On the F value, the statistics of all kernel function models are less than 10%, and all accept the null hypothesis, indicating that the performance of the different kernel function models differs less. In conclusion, the change of kernel function has little influence on the prediction performance of the proposed model, which means that the prediction performance of the proposed model is relatively stable.

Figure 3 (flowchart): Start; select the initial samples and preprocess them with normalization and statistical analysis methods; apply the SMOTE algorithm (find the k nearest similar samples of a minority sample, select one at random from the k samples, and construct a new sample); use FOA to optimize the SVM parameters C and g.

Table 1: Credit risk early warning indicators of listed companies.

| First-level indicator | Second-level indicators |
| --- | --- |
|  | Current ratio ($X_1$); Quick ratio ($X_2$); Net asset-liability ratio ($X_3$) |
| Operation ability | Fixed asset turnover ratio ($X_4$); Total asset turnover ratio ($X_5$); Receivable turnover ratio ($X_6$) |
| Profit ability | Return on net assets ($X_7$); Return on total assets ($X_8$); Net interest rate in sale ($X_9$) |
| Growth ability | Growth rate of basic earnings per share ($X_{10}$); Growth rate of total operating revenue ($X_{11}$); Operating profit growth rate ($X_{12}$); Growth rate of net assets ($X_{13}$) |
| Index per share | Earning per share ($X_{14}$); Net assets per share ($X_{15}$); Retained earnings per share ($X_{16}$); Operating revenue per share ($X_{17}$) |
| Asset structure | Asset-liability ratio ($X_{18}$); Rights multiple ($X_{19}$); Capital fixed ratio ($X_{20}$) |

The comparison results are shown in Table 8. Compared with FOA-SMOTE-BP and FOA-SMOTE-Logit, the proposed model has the best performance on the indicators of G value, F value, and AUC value, indicating that the model proposed in this paper has the best prediction performance.
In order to observe the prediction performance of different models intuitively, the results of the above table are drawn as shown in Figure 7. It can be seen from the figure that the G value, F value, and AUC value curves of the proposed model are significantly higher than those of the comparison model, indicating that the proposed model has better prediction performance.

Conclusion
In summary, the proposed market risk early warning method based on deep learning takes SVM as the basic model and uses SMOTE to deal with the unbalanced samples. In addition, the minority samples are oversampled, and FOA is utilized to tune the model parameters. Thus, the classification effect of the model is improved. The empirical results show that the proposed model has excellent stability and generalization ability and can accurately predict market credit risk. Compared with the FOA-SMOTE-BP and FOA-SMOTE-Logit models, the proposed model performs better on the G value, F value, and AUC value indicators and has better prediction performance, which provides a reference for market credit risk prediction research. The contribution of this study is to use a new improved SVM to predict market risk. It provides a new reference for information management and risk prevention in the market. However, owing to limited conditions, there are still some deficiencies to be improved. The market credit risk indicators are selected only on the basis of references, without considering the actual situation of China's manufacturing industry, which may affect the final market risk prediction results. To avoid the influence of index selection on prediction accuracy, future research will try to independently determine the relevant index variables affecting market credit risk.

Data Availability
The experimental data used to support the findings of this study are available from the corresponding author upon request.