Enterprise Financial Early Warning Based on Improved Whale Optimization Algorithm: Optimize the Perspective with Indicators

The key to solving the problem of redundant financial indicators in financial early warning is to reduce the dimensionality of the original financial indicators. This paper proposes a model that combines the whale optimization algorithm with mixed strategy (IWOA) and the support vector machine (SVM), namely, the IWOA-SVM early warning model, which simultaneously performs indicator optimization and dimensionality reduction and financial risk early warning identification. A total of 302 enterprises, comprising specially treated enterprises on the Shanghai and Shenzhen stock exchanges and normal enterprises of the same specification, are taken as the research sample to design the model. The results show that the improved whale optimization algorithm has better optimization speed and accuracy and improves the original algorithm's ability to search for the optimal solution. Compared with other dimensionality reduction methods, the IWOA-SVM model has the lowest indicator dimension after dimensionality reduction and a better recognition effect. The dimensionality reduction results have a certain universality across different classifiers, which provides a new idea for the selection of indicators for financial early warning.


Introduction
Since their inception in 1932, financial early warning models have evolved from univariate early warning models to multivariate early warning models, to logistic early warning models [1], and then to modern financial early warning models based on machine learning and ensemble learning. Because financial indicators are rich and complex, models built on them are prone to redundant financial indicators. Redundant financial indicators not only increase the computational effort of the model but also reduce its identification accuracy. Therefore, selecting and reducing the dimensionality of the original financial indicators to screen out the optimal financial indicator combination is very important in the construction of a financial early warning model. Fang and Yang [2] proposed the SGL-SVM method and applied it to the prediction of financial distress, reducing the original 90-dimensional indicator variables to 24 dimensions and eliminating a large amount of noisy data while obtaining good identification results. Chen [3] and Fang [4] et al. used PCA to reduce the dimensionality of the indicator data and selected the top principal components with high variance contribution in place of the original indicators. Faced with high-dimensional indicators, Huang et al. [5] first performed independence tests to eliminate insignificant early warning indicators, then used random forest and XGBoost to calculate indicator importance and eliminate insignificant indicators, and finally used KPCA to reduce 14-dimensional financial indicators to 7 dimensions, constructing a combined KPCA-WLSSVM model with high predictive power and good generalization. Li et al. [6] selected the top ten features by importance, based on the feature importance evaluation of the random forest algorithm, and showed that the overall accuracy, sensitivity, specificity, and AUC of the early warning model all improved after feature optimization. Zhou et al. [7] used grey clustering to select valid variables in their study on early warning of credit risk for listed companies, followed by logistic regression models for prediction. Luo and Wang [8] used an improved MRMR algorithm in constructing a financial early warning model, taking into account both feature relevance and redundancy in feature selection. Xiaoyan et al. [9] used significance tests and normality tests to select features with significant differences as a way to improve the accuracy of financial early warning models.
Feature selection is an important tool for feature dimensionality reduction. When dimensionality reduction is performed based on the correlation information of indicators, features that are favorable to the identification results may be excluded, and when the financial early warning model is constructed using factor analysis, the interpretability of the model is relatively poor. Feature selection using metaheuristic algorithms has been widely used in areas such as behavioral recognition [10], network intrusion detection [11], and performance prediction [12]. In the field of financial early warning, metaheuristic algorithms are mostly used for hyperparameter optimization of classifiers [13][14][15] and are less frequently applied to the selection and dimensionality reduction of financial indicators. The whale optimization algorithm (WOA) is a metaheuristic algorithm proposed by Mirjalili and Lewis [16]. Like other metaheuristics, it suffers from low accuracy in finding the optimal solution, and how to improve its ability to search for the optimal solution has received much attention from researchers [17,18]. Kaur and Arora [19] introduced a chaos mechanism to optimize the initial position of the population. Introducing adaptive weights [20] or improving the convergence factor [21] can balance the search ability between the early and late stages of the algorithm. Therefore, this paper uses an improved whale optimization algorithm with mixed strategy (IWOA) and a support vector machine (SVM) to select and reduce the dimensionality of the original financial indicators and construct an IWOA-SVM financial warning model.

Whale Optimization Algorithm.
The whale optimization algorithm is an intelligent optimization algorithm based on the special feeding behavior of humpback whales. It consists of three main parts: encircling prey, bubble-net attack, and searching for prey.

Encircling Prey.
During feeding, the whale first observes and searches for the location of the prey in order to surround it. The optimal solution to the problem is taken to be the location of the prey, and during the iteration of the algorithm, the fitness value calculated by the fitness function is used to evaluate the merit of each group of financial indicators. Therefore, the design of the fitness function is crucial. The fitness function in this paper is designed to keep the dimension of the financial indicators as small as possible while maintaining a high accuracy rate. In it, Acc represents the accuracy of five-fold cross-validation of each financial indicator feature combination in the SVM classifier, feasel is the dimension of financial indicator features included in randomly selected individuals, and numfea represents the total dimension of financial indicator features.
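One common way to express such a trade-off in wrapper-based feature selection is a weighted sum of the classification error and the fraction of selected features. The sketch below assumes that form; the weights `alpha` and `beta` and the function name `fitness` are hypothetical and are not taken from this paper.

```python
def fitness(acc: float, feasel: int, numfea: int,
            alpha: float = 0.99, beta: float = 0.01) -> float:
    """Lower is better: weighted error rate plus weighted share of selected features.

    ASSUMPTION: the weighted-sum form and the weights alpha/beta are illustrative;
    the text only states that fitness trades off Acc against feasel / numfea.
    """
    if not 0.0 <= acc <= 1.0:
        raise ValueError("acc must be in [0, 1]")
    if not 0 < feasel <= numfea:
        raise ValueError("need 0 < feasel <= numfea")
    return alpha * (1.0 - acc) + beta * (feasel / numfea)


# Two candidate subsets with equal accuracy: the smaller one scores better.
small = fitness(acc=0.876, feasel=11, numfea=34)
large = fitness(acc=0.876, feasel=13, numfea=34)
```

With equal accuracy, the candidate with fewer indicators receives the lower (better) value, which is the behavior the text asks of the fitness function.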
In the iteration process of the algorithm, the position of the individual with the optimal fitness value in the current population is taken as the optimal position, and the other individuals move toward it. In the mathematical expression of this update, $X_{\mathrm{best}}(t)$ denotes the position vector of the optimal individual of the current population, $X(t)$ denotes the position vector of each individual in the current population, and $t$ denotes the current number of iterations. In the expressions of the coefficients $A$ and $C$, $a$ is the convergence factor, $r_1$ and $r_2$ are random numbers within [0, 1], and $T_{\max}$ denotes the maximum number of iterations.
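For reference, in the standard WOA formulation of Mirjalili and Lewis [16], which uses the same symbols, the encircling update takes the form below. The intermediate distance vector $D$ is a symbol assumed for this sketch, and the paper's own equations may differ in detail.

```latex
\begin{aligned}
D &= \left| C \cdot X_{\mathrm{best}}(t) - X(t) \right|, \\
X(t+1) &= X_{\mathrm{best}}(t) - A \cdot D, \\
A &= 2a \cdot r_1 - a, \qquad C = 2 r_2, \\
a &= 2 - \frac{2t}{T_{\max}}.
\end{aligned}
```

Under this form, $a$ decays linearly from 2 to 0 over the run, so $|A|$ shrinks and the individuals contract toward $X_{\mathrm{best}}(t)$ in later iterations.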

Bubble-Net Attack.
This phase consists of two types of position update: shrinking encirclement and spiral update. In the first, the position is updated by adjusting the convergence factor $a$ in equation (3). The second simulates a whale spiraling up toward its prey; in its mathematical expression, $D_P$ denotes the distance between an individual whale and the current optimal solution, $b$ is a constant that defines the shape of the spiral, and $l$ is a random number in the range [−1, 1]. At this stage, the position of each whale is updated by randomly choosing between shrinking encirclement and the spiral update: the shrinking encirclement update is applied when $p < 0.5$, and the spiral update

$$X(t+1) = X_{\mathrm{best}}(t) + D_P \cdot e^{bl} \cos(2\pi l), \quad p \ge 0.5,$$

is applied otherwise, where $p$ is a random number in the range [0, 1].

Searching for Prey.
Individual whales search for prey by swimming randomly. The random search strategy gives the algorithm good global search performance: when $|A| \ge 1$, a whale individual is randomly selected from the population and positions are updated relative to it. In the mathematical expression, $X_{\mathrm{rand}}(t)$ denotes a randomly selected individual whale.
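The three behaviors described above can be combined into a single position update. The following is a minimal sketch that assumes the textbook continuous WOA of Mirjalili and Lewis [16]; the function name `woa_step`, the spiral constant `b = 1.0`, and the use of one scalar `A` and `C` per call are illustrative choices. How a continuous position is mapped to a subset of indicators is not shown, since the text does not specify it.

```python
import math
import random


def woa_step(x, x_best, population, t, t_max, b=1.0, rng=random):
    """Return the next position of one whale (standard continuous WOA).

    ASSUMPTION: textbook WOA update rules; b and the scalar A, C are illustrative.
    """
    a = 2.0 - 2.0 * t / t_max            # convergence factor, decays from 2 to 0
    A = 2.0 * a * rng.random() - a       # A lies in [-a, a]
    C = 2.0 * rng.random()               # C lies in [0, 2]
    p = rng.random()                     # selects between the two attack modes
    if p >= 0.5:
        # Spiral update around the best individual.
        l = rng.uniform(-1.0, 1.0)
        return [xb + abs(xb - xi) * math.exp(b * l) * math.cos(2.0 * math.pi * l)
                for xi, xb in zip(x, x_best)]
    # |A| >= 1: explore around a random whale; otherwise encircle the best one.
    target = rng.choice(population) if abs(A) >= 1.0 else x_best
    return [ti - A * abs(C * ti - xi) for xi, ti in zip(x, target)]


# Usage: a population of 4 whales in 5 dimensions, with whale 1 as the current best.
rng = random.Random(0)
pop = [[rng.random() for _ in range(5)] for _ in range(4)]
new_x = woa_step(pop[0], pop[1], pop, t=10, t_max=50, rng=rng)
```

Early in the run `a` is close to 2, so `|A| >= 1` occurs often and the random search dominates; as `a` approaches 0 the update contracts toward the best individual.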

Improvements to the Whale Optimization Algorithm.
The whale optimization algorithm suffers from poor global search ability, a tendency to fall into local optimal solutions, and poor convergence accuracy. To address these problems, this paper improves it with a hybrid strategy: Gauss mapping is used to initialize the population to improve population diversity, adaptive weights are introduced to balance global and local search ability, and a stochastic dimension-by-dimension variation strategy based on Cauchy mutation and reverse learning is used to improve the ability to jump out of local optima.

Gauss Mapping to Initialize the Population.
The original whale optimization algorithm initializes the population randomly, that is, it randomly generates combinations of financial indicators. To expand the population search range and improve population diversity, this paper initializes the population with Gauss mapping, which is more ergodic and more uniformly distributed than the random initialization of the original algorithm. In the mathematical expression of Gauss mapping, mod represents the remainder function, [ ] represents rounding, and $X_K$ represents the chaotic sequence generated by Gauss mapping.
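In its usual form, the Gauss map sets $x_{k+1} = 0$ when $x_k = 0$ and $x_{k+1} = (1/x_k) \bmod 1 = 1/x_k - [1/x_k]$ otherwise. The sketch below generates such a chaotic sequence and uses it to initialize a population; the per-individual random starting value, the linear scaling to the bounds, and the function names are assumptions made for illustration.

```python
import random


def gauss_map_sequence(x0: float, length: int) -> list[float]:
    """Chaotic sequence x_{k+1} = (1 / x_k) mod 1, with x_{k+1} = 0 when x_k = 0."""
    seq, x = [], x0
    for _ in range(length):
        x = 0.0 if x == 0.0 else (1.0 / x) % 1.0
        seq.append(x)
    return seq


def init_population(pop_size, dim, lower=0.0, upper=1.0, seed=0):
    """Map one chaotic sequence per whale onto [lower, upper].

    ASSUMPTION: a random starting value per individual and linear scaling to
    the bounds are illustrative; the text does not spell out these details.
    """
    rng = random.Random(seed)
    population = []
    for _ in range(pop_size):
        x0 = rng.uniform(0.01, 0.99)  # avoid starting at the fixed point 0
        chaos = gauss_map_sequence(x0, dim)
        population.append([lower + c * (upper - lower) for c in chaos])
    return population


# 20 individuals over 34 indicators, matching the sizes used in this paper.
population = init_population(pop_size=20, dim=34)
```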

Adaptive Weights.
Adaptive weights can effectively balance global and local search capabilities, so an adaptive weighting factor $\omega$ is introduced. In its mathematical expression, $t$ is the current number of iterations and $T_{\max}$ is the maximum number of iterations. The adaptive weight is then incorporated into the equations for updating the location of individual whales.

Stochastic Dimension-by-Dimension Variation Strategy Based on Cauchy Mutation and Reverse Learning
Cauchy Mutation. The Cauchy distribution, like the normal distribution, is one of the common distributions in probability theory; its density declines slowly from the peak toward zero, so its values are spread widely. Reverse learning improves the search performance of an algorithm by evaluating the solution opposite to the current one within the same search space and is widely used in various optimization algorithms. When only one variation strategy is used, the algorithm tends to fall into local optima, and traditional variation approaches mostly use random variation or variation in all dimensions at once, whereas dimension-by-dimension variation avoids interference between dimensions and explores the optimal solution more fully. Therefore, this paper uses a random dimension-by-dimension variation strategy based on Cauchy mutation and reverse learning to perturb the optimal individual.
Since the new solution generated is not necessarily better than the optimal position, a greedy rule is used to decide whether to adopt the new solution.
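The perturbation and the greedy rule can be sketched as follows. The standard Cauchy density is $f(x) = 1/(\pi(1 + x^2))$, and a common form of reverse learning maps $x$ to $lb + ub - x$ for bounds $lb$ and $ub$. The 50/50 choice between the two operators, the multiplicative Cauchy form, the minimization convention, and the function names are assumptions of this sketch rather than details confirmed by the text.

```python
import math
import random


def cauchy_sample(rng):
    """Draw from the standard Cauchy distribution via the inverse CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))


def perturb_best(best, fitness, lower, upper, rng=None):
    """Perturb the best individual one dimension at a time, keeping improvements.

    ASSUMPTIONS: minimisation, a 50/50 choice between the two operators, and
    the multiplicative Cauchy form x * (1 + cauchy) are illustrative choices.
    """
    rng = rng or random.Random()
    best = list(best)
    best_fit = fitness(best)
    for j in range(len(best)):
        trial = list(best)
        if rng.random() < 0.5:
            trial[j] = best[j] * (1.0 + cauchy_sample(rng))   # Cauchy mutation
        else:
            trial[j] = lower + upper - best[j]                # reverse learning
        trial[j] = min(max(trial[j], lower), upper)           # stay within bounds
        trial_fit = fitness(trial)
        if trial_fit < best_fit:                              # greedy rule
            best, best_fit = trial, trial_fit
    return best, best_fit


def sphere(x):
    """Toy objective used only to exercise the sketch."""
    return sum(v * v for v in x)


start = [0.8, -0.6, 0.9]
new_best, new_fit = perturb_best(start, sphere, -1.0, 1.0, random.Random(0))
```

Because a trial replaces the incumbent only when its fitness is strictly better, the returned fitness can never be worse than the starting one.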

IWOA-SVM Financial Early Warning Model Construction.
Because of the many linear and nonlinear relationships among the financial indicators of an enterprise, low-quality financial indicator data can lead to feature confounding and heavy computation when constructing a financial early warning model, reducing its early warning capability. Therefore, it is necessary to reduce the dimensionality of the financial indicators and select the optimal combination with good predictive power. The IWOA-SVM financial early warning model constructed in this paper improves the whale optimization algorithm through mixed strategies, which strengthens its ability to search for the optimal solution, that is, the combination of financial indicators with the best predictive ability, and thereby improves the recognition effect of the model. The flow chart of the IWOA-SVM model is shown in Figure 1.
The key steps in the IWOA-SVM financial early warning model are as follows: (1) Data standardisation: because the financial indicator values differ in magnitude and the SVM is a classification algorithm based on a distance metric, the financial indicator data are standardised to avoid harming model performance, using $x' = (x - \mu)/\delta$, where $\mu$ is the mean and $\delta$ is the standard deviation of the data.

Figure 1: Flow chart of the IWOA-SVM model (recoverable step labels: update position from (15), (16), or (17); calculate the updated fitness value).

The year in which a company is specially treated is recorded as year t, and continuous financial data are obtained for the previous two years. To eliminate the possible adverse effects of imbalanced categorical data on the model, companies with serious data deficiencies were excluded, and the positive and negative samples were then matched in a 1 : 1 ratio under the rule that matched companies belong to the same or similar industries and are of comparable overall size. The final selection of 151 ST companies and 151 normal companies yielded a total of 302 valid samples and 906 sets of observations. All data were sourced from the CSMAR database.

Selection of Indicators.
When constructing financial early warning models for listed companies, it is common to start with financial indicators that reflect the operating status of the company. In this paper, a total of 32 financial indicators are selected from four aspects: solvency, profitability, operating capacity, and growth capacity. Five indicators were selected for solvency: current ratio (X1), quick ratio (X2), gearing ratio (X3), equity multiplier (X4), and equity ratio (X5). Twelve indicators were selected for profitability: return on assets (X6), net profit margin on total assets (X7), net profit margin on current assets (X8), net profit margin on fixed assets (X9), EBITDA (X10), earnings before interest, tax, depreciation and amortization (X11), gross operating margin (X12), operating profit margin (X13), net operating margin (X14), management expense ratio (X15), financial expense ratio (X16), and cost margin (X17). Eleven indicators were selected for operating capacity: accounts receivable to revenue ratio (X18), inventory to revenue ratio (X19), operating cycle (X20), accounts payable turnover (%) (X21), current assets to revenue (X22), current assets turnover (X23), fixed assets to revenue (X24), fixed assets turnover (X25), noncurrent assets turnover (X26), capital intensity (X27), and total assets turnover (X28). Four indicators were selected for growth capacity: growth rate of total operating revenue (X29), growth rate of total operating costs (X30), growth rate of administrative expenses (X31), and sustainable growth rate (X32). Considering that nonfinancial indicators are also important in risk identification and early warning, this paper also selects two nonfinancial indicators at the level of management's governance capacity, namely, the proportion of men in management (X33) and the average age of management (X34).
The final corporate financial early warning model therefore contains 34 indicators in total, a large number of features that should be subject to dimensionality reduction.

IWOA-SVM Model Identification Results.
The data are divided into training and test sets in a ratio of 6 : 4, the initial population of the whale optimization algorithm is set to 20, and the number of iterations is set to 50. The WOA-SVM model (before improvement) and the IWOA-SVM model (after improvement) were each run twenty times under the same hardware conditions. Figure 2 shows the number of financial indicators, the number of convergence generations, and the accuracy curves after sorting the runs by accuracy in ascending order, and Table 1 shows their distribution. The comparison shows that the IWOA-SVM model differs from the WOA-SVM model in convergence speed, convergence accuracy, and the reduction of financial indicator dimension. As can be seen from Figure 2, with the runs sorted by accuracy in ascending order, the IWOA-SVM model largely outperforms the WOA-SVM model in the number of convergence generations and the number of financial indicators. As can be seen from Table 1, after the mixed-strategy improvement, the average number of convergence generations fell from 12 to 5.15, a reduction of 6.85; the average number of financial indicator dimensions fell from 14.90 to 10.95, a reduction of 3.95; the average identification accuracy rose from 84.50% to 86.74%, an increase of 2.24 percentage points; and the highest accuracy rose from 86.23% to 87.60%, an increase of 1.37 percentage points. The convergence speed and optimization-seeking ability of the mixed-strategy whale optimization algorithm proposed in this paper are therefore markedly improved over the original algorithm: it eliminates redundant indicators while maintaining good accuracy and meets the financial warning needs of this paper.
The highest accuracy identified by the IWOA-SVM model was 87.60%, and two groups of financial indicators reached this accuracy.
The first group includes current ratio (X1), quick ratio (X2), equity multiplier (X4), equity ratio (X5), total assets net profit margin (X7), EBITDA (X11), average age of management (X34), current assets to revenue ratio (X22), fixed assets to revenue ratio (X24), noncurrent assets turnover ratio (X26), and capital intensity (X27), a total of 11 indicators; the second group includes current ratio (X1), equity multiplier (X4), total assets net profit margin (X7), EBITDA (X10), EBITDA (X11), cost margin (X17), the average age of management (X34), accounts receivable to revenue (X18), inventories to revenue (X19), fixed assets to revenue (X24), noncurrent asset turnover (X26), total asset turnover (X28), and sustainable growth rate (X32), a total of 13 indicators. The identification accuracy, the type I error rate, and the type II error rate of the two groups are shown in Table 2. The type I error rate (error I) is the proportion of ST samples that the model identifies as normal samples, and the type II error rate (error II) is the proportion of normal samples that the model identifies as ST samples. Under the principle of accounting prudence, error I should be avoided as far as possible, since a high error rate on ST samples deprives financial early warning of its practical significance. Comparing the two groups, the second group had 27 ST samples predicted as non-ST and the first group had 25, so the first group has a lower error I and a better early warning effect. Therefore, the combination of financial indicators in the first group is selected as the optimal combination for financial early warning in this paper. With this group, 156 ST samples and 162 non-ST samples were correctly identified, 25 ST samples were predicted as non-ST, and 20 non-ST samples were predicted as ST, giving an error I of 13.81% and an error II of 10.99%. The meanings of the financial indicators in the first group are shown in Table 3.
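The reported rates follow directly from these four counts. The short calculation below reproduces them; the variable names are illustrative.

```python
# Counts reported for the first indicator group on the test set.
st_correct, st_missed = 156, 25            # ST identified as ST / as non-ST
normal_correct, normal_flagged = 162, 20   # non-ST identified as non-ST / as ST

total = st_correct + st_missed + normal_correct + normal_flagged
accuracy = (st_correct + normal_correct) / total
error_i = st_missed / (st_correct + st_missed)                  # ST -> normal
error_ii = normal_flagged / (normal_correct + normal_flagged)   # normal -> ST

print(f"accuracy = {accuracy:.2%}")   # 87.60%
print(f"error I  = {error_i:.2%}")    # 13.81%
print(f"error II = {error_ii:.2%}")   # 10.99%
```

The 363 test observations are consistent with a 40% split of the 906 observations.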

Comparison of Different Dimensionality Reduction Methods.
To compare the dimensionality reduction effect of the model, this paper selected four other dimensionality reduction methods: principal component analysis (PCA), mutual information and maximal information coefficient (MIC), recursive feature elimination (RFE), and XGBoost importance ranking. Figure 3 shows how the SVM recognition accuracy changes for each method as the selected dimensionality increases. For each method, the combination of indicators with the highest recognition rate was taken as its optimal combination, and the recognition results are shown in Table 4. Table 4 shows that, among the four methods, XGBoost importance ranking selects the smallest combination of financial indicators and has the lowest type II error rate, but its type I error rate is the highest. Relative to it, the IWOA-SVM model constructed in this paper improves the accuracy by 4.68 percentage points, reduces the type I error rate by 7.18 percentage points, and reduces the type II error rate by 2.2 percentage points while using one fewer indicator dimension. The MIC dimensionality reduction model has the highest accuracy and the lowest type I error rate among the four methods, but its accuracy is still 4.13 percentage points lower than that of the IWOA-SVM model, its type I error rate is 4.97 percentage points higher, and its type II error rate is 3.3 percentage points higher. Moreover, its feature dimension is as high as 25, so its dimensionality reduction effect is not obvious. In summary, the IWOA-SVM model constructed in this paper achieves the best identification results with the smallest number of financial indicators.

Algorithm Performance Comparison.
To compare the performance of the IWOA-SVM algorithm, the salp swarm algorithm (SSA), particle swarm optimization (PSO), the flower pollination algorithm (FPA), and the grey wolf optimizer (GWO) were selected. Each was run under the same hardware conditions with the same parameter settings as IWOA-SVM. After 20 runs, the maximum and average values of the recognition accuracy are shown in Figure 4. The IWOA-SVM model in this paper has the highest maximum and average recognition accuracy, and its performance is better than that of the other metaheuristic algorithms.

Table 3: Meanings of the financial indicators in the first group.
Equity multiplier (X4): Total assets/total owners' equity
Equity ratio (X5): Total liabilities/total owners' equity
Total assets net profit margin (X7): Net profit/average balance of total assets
Earnings before interest, tax, depreciation, and amortization (X11): Net profit + income tax expense + finance costs + depreciation of fixed assets, depreciation of oil and gas assets, depreciation of productive biological assets + amortization of intangible assets + amortization of long-term amortization expenses
Current assets to revenue ratio (X22): Current assets/operating income
Fixed assets to revenue ratio (X24): Fixed assets/operating income
Noncurrent assets turnover ratio (X26): Operating income/noncurrent assets
Capital intensity (X27): Total assets/operating income
Average age of management (X34): The average of the ages of all directors and supervisors of the company (directors and supervisors whose ages are not disclosed are not included in the calculation)

Comparison of Universality of Dimension Reduction Effect.
To verify the universality of the indicator selection and dimensionality reduction effect of the IWOA-SVM financial warning model, five financial warning models based on logistic regression (LG), random forest (RF), K-nearest neighbor (KNN), decision tree (DT), and SVM were constructed in this paper. Each model was used to identify both the original financial indicator combination and the optimal financial indicator combination obtained in this paper. The identification results are shown in Table 5. The table shows that, after the logistic regression, random forest, K-nearest neighbor, and decision tree models adopt the optimal combination of financial indicators obtained in this paper, their recognition accuracy increases by 4.41, 2.75, 4.13, and 2.2 percentage points, respectively, and their type I error rate decreases by 1.1, 2.21, 6.08, and 1.66 percentage points, respectively. Compared with the SVM model on the original data set, the IWOA-SVM model improves the accuracy by 6.88 percentage points to 87.60%, reduces the type I error rate by 5.53 percentage points to 13.81%, and reduces the type II error rate by 8.24 percentage points to 10.99%. Overall, the IWOA-SVM financial warning model constructed in this paper has the highest accuracy rate, the lowest type I error rate, and the best identification effect.

Discussion and Conclusions
To address the issue of indicator selection in the construction of financial early warning models, this paper proposes the IWOA-SVM financial early warning model. A sample of 151 ST companies and 151 normal companies listed on the A-share market in Shanghai and Shenzhen in 2019-2021 was used for the empirical analysis, and a mixed strategy was used to improve the original whale optimization algorithm, which suffers from poor optimization-seeking ability and low convergence accuracy. The empirical analysis shows the following. First, the IWOA-SVM algorithm reduced the original 34-dimensional indicators to 11 dimensions, with a model recognition accuracy of 87.60%, an improvement of 6.88 percentage points, and a type I error rate of 13.81%, a reduction of 5.53 percentage points; both its dimensionality reduction and its recognition are better than those of the four comparison methods such as PCA. Second, compared with the original whale algorithm, the improved whale optimization algorithm with mixed strategy reduced the average financial indicator dimension by 3.95 and the average number of convergence generations by 6.85, the highest accuracy was improved by 1.37 percentage points, and the algorithm's convergence speed and optimization-seeking ability were enhanced. Third, when logistic regression, random forest, K-nearest neighbor, and decision tree are used to identify the combinations of indicators before and after dimensionality reduction and optimization, the accuracy rate improves to different degrees, indicating that the method has a certain universality across different classifiers. Taken together, the IWOA-SVM financial early warning model proposed in this paper has the highest accuracy rate and the lowest type I error rate, and it provides a new way of thinking about the selection of indicators for the construction of financial early warning models; it is an accurate, efficient, and scientific financial early warning model.

Data Availability
The data presented in this study are available on request from the corresponding author.