A Hybrid Analysis Approach to Improve Financial Distress Forecasting: Empirical Evidence from Iran

Bankruptcy prediction is an important problem in financial decision support for the stakeholders of firms, including auditors, managers, shareholders, debt-holders, and potential investors, as well as for academic researchers. Popular discourse on financial distress forecasting focuses on developing discrete models to improve prediction. The aim of this paper is to develop a novel hybrid financial distress model that combines various statistical and machine learning methods. A multiple attribute decision making method is then used to choose the best model from those implemented. The proposed approach is applied to Iranian companies, for which earlier discrete models had been used, and the results show that prediction can be consolidated with the help of the hybrid approach.


Introduction
Predicting the financial distress of listed companies is important to both the companies and their investors. However, owing to the uncertainty of the business environment and strong competition, even companies with sound operating mechanisms face the possibility of business failure and bankruptcy. Whether the financial distress of listed companies can be forecast effectively and in time therefore bears on the companies' development, the interests of numerous investors, and the order of the capital market [1][2][3].
Most topical studies have adopted a multiple-variable approach to the prediction of financial distress by combining accounting and nonaccounting data in a variety of statistical formulas [4][5][6][7]. Whether the predictive value of accounting information was assessed on samples of industrial or nonindustrial firms, the misclassification rates were low; hence the explanatory variables had significant predictive power. Ratios based on accounting earnings, reported cash flow, and book debt figured prominently in various statistical formulas, especially those applied to the industrial sector, such as univariate analysis, multiple discriminant analysis, and logit and probit models [8][9][10][11]. Although these methods use historical samples to build a diagnostic model, they cannot learn inductively from new data, which greatly affects forecasting accuracy. More recently, many studies have demonstrated that artificial intelligence methods such as decision trees, neural networks, and support vector machines can be alternative methods for financial distress prediction [12][13][14]. Moreover, various studies compare statistical and machine learning methods in terms of their ability to predict financial data [15][16][17][18].
On the other hand, recent research has exploited multiple attribute decision making (MADM) methods in financial analysis to improve the final outputs [19][20][21][22].
This paper focuses on optimizing financial distress forecasting for companies listed on the Tehran Stock Exchange (TSE) of Iran through a hybrid approach that outperforms existing discrete models significantly. Because financial ratios are important in describing a company's situation, factor analysis is applied to summarize the effect of the financial ratios; then the combinations of all of them are exploited. Subsequently, the extracted predictors are used to forecast financial distress in a hybrid approach through traditional statistical distress modeling and machine learning methods for classifying businesses. Another important issue in this analysis is homogenizing the businesses via a clustering method to improve the prediction models. Also, an MADM method is used to distinguish the best model via different classification performance measures. Consequently, a comparison of the final results shows that the prediction of financial distress is significantly consolidated.
The paper is organized as follows: Section 2 presents a short review of the literature in the field of financial distress forecasting. Section 3 briefly describes the applied methods. Section 4 explains the proposed approach and presents the empirical evidence from Iran. The paper ends with concluding remarks in Section 5.

Review of the Literature
The early prediction of distress is essential for companies and for investors or lending institutions that wish to protect their financial investments. As a consequence, modeling, prediction, and classification of companies to determine whether they are potential candidates for financial distress have become key topics of debate and detailed research.
Corporate bankruptcy was first modeled, classified, and predicted by Beaver [23]. He defined financial distress as bankruptcy, insolvency, liquidation for the benefit of a creditor, default on loan obligations, or missed preferred dividend payments. In that study, "cash flow to total debt" had the highest discriminatory power of the ratios examined. Altman's model is perhaps the best known of the early studies [4]. He developed a Z-score bankruptcy prediction model and determined a cut point of the Z-score (2.675) to classify healthy and distressed firms. The results showed that the Z-score model had sound prediction performance one year and two years before financial distress but did not indicate good prediction utility three to five years before financial distress. A number of authors such as Taffler [24,25], Pantalone and Platt [26], Betts and Belhoul [27], and Piesse and Wood [28] followed his work and applied the Z-score model to different markets, different time periods, and different industries. Also, Deakin [29] and Blum [30] used multiple-variable statistical techniques subsequent to Altman [31].
Furthermore, most recent studies have adopted a multiple-variable approach to the prediction of financial distress by combining accounting and nonaccounting data in a variety of statistical formulas. In the reviewed literature, 64% of all authors used statistical techniques, whose overall predictive accuracy was 84%; 25% of the authors used machine learning models, whose overall accuracy was 88%; and 11% of the authors used theoretical models, whose accuracy was calculated as 85% [32]. Table 1 briefly presents some recent research in financial distress forecasting.
In general, the studies on the value of financial data for bankruptcy prediction show that accounting data are able to predict financial distress in companies. It should be noted, however, that there is no strong consensus on which financial ratios to use in predicting financial distress, and the results differ with the financial ratios and research methods used. In this research, ratios on which there is broad consensus are used [37][38][39].

Method(s) | Author
Applying support vector machines to bank bankruptcy analysis using practical steps [33] | Erdogan, 2013
Presenting particle swarm optimization techniques to obtain appropriate parameter settings for subtractive clustering and integrating the adaptive-network-based fuzzy inference system (ANFIS) [16] | Chen, 2013
Applying a simple hazard model to develop an early warning system of bank distress in the Gulf Cooperation Council countries [34] | Maghyereh and Awartani, 2014
Presenting a statistics-based wrapper for SVM-based financial distress identification by using statistical indices of ranking-order information from predictive performances on various parameters [35] | Li et al., 2014
Examining the effect of the filter and wrapper based feature selection methods and applying different classification techniques [36] | Liang et al., 2015

Methodology
In this section, the methods applied in our paper are briefly described. Factor analysis, the k-means method, discriminant analysis, the logit model, decision trees, neural networks, and TOPSIS are presented, respectively.

Factor Analysis.
Factor analysis is a dimension reduction method of multivariate statistics that explores the latent variables behind manifest variables. Two methods for factor analysis are generally in use: principal component analysis and the maximum likelihood method. The main procedure of principal component analysis can be described in the following steps when applying factor analysis [40].
Step 1. Find the correlation matrix ($R$) or variance-covariance matrix for the objects to be assessed.
Step 2. Find the eigenvalues $\lambda_k$ ($k = 1, \ldots, m$) and the corresponding eigenvectors of that matrix, which determine the factor loadings and the number of factors.
Step 3. Consider the eigenvalue ordering ($\lambda_1 > \cdots > \lambda_k > \cdots > \lambda_m$; $\lambda_m > 1$) to decide the number of common factors, and pick the number of common factors to be extracted by a predetermined criterion.
Step 4. According to Kaiser [41], use the Varimax criterion to find the rotated factor loading matrix, which provides additional insights through the rotation of the factor axes.
Step 5. Name each factor by referring to its combination of manifest variables.
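The extraction steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `extract_factors` and the synthetic data are hypothetical, the loadings are computed as the eigenvectors scaled by the square roots of their eigenvalues, and the Varimax rotation of Step 4 is not included.

```python
import numpy as np

def extract_factors(X):
    """Principal-component factor extraction with the eigenvalue > 1 rule."""
    # Step 1: correlation matrix of the variables (columns of X).
    R = np.corrcoef(X, rowvar=False)
    # Step 2: eigenvalues and eigenvectors of the symmetric matrix R.
    eigvals, eigvecs = np.linalg.eigh(R)
    # Step 3: sort in decreasing order and keep eigenvalues above 1.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    m = int(np.sum(eigvals > 1.0))
    # Unrotated loadings: each retained eigenvector scaled by sqrt(eigenvalue).
    loadings = eigvecs[:, :m] * np.sqrt(eigvals[:m])
    # Share of total variance carried by the retained factors.
    explained = eigvals[:m].sum() / eigvals.sum()
    return eigvals, loadings, explained

# Hypothetical data: 200 observations of 6 ratios driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
X = latent @ mixing.T + 0.3 * rng.normal(size=(200, 6))

eigvals, loadings, explained = extract_factors(X)
print(loadings.shape, round(explained, 2))
```

The `explained` value plays the same role as the cumulative proportion of variance reported for the extracted factors; a rotation step would be applied to `loadings` before naming the factors.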

K-Means Method.
This method clusters $n$ objects into $K$ ($K < n$) deterministic partitions by minimizing the total squared error function given by MacQueen (1967) [40,42]:
$$E = \sum_{h=1}^{K} \sum_{x \in C_h} \lVert x - c_h \rVert^{2},$$
where $K$ is the number of clusters in the data, $c_h$ is the center of cluster $h$, and $x$ is a data point in cluster $h$ (denoted $C_h$). Different solutions can be attained depending on the initial guess of cluster centers; therefore, the procedure should be repeated multiple times, and the final solution is selected as the one that gives the maximum separation between clusters.
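The restart procedure described above can be sketched as follows. This is a minimal NumPy illustration, not the study's code: the function name `kmeans`, the number of restarts, and the toy one-dimensional data are assumptions. The best run is taken to be the one with the smallest total squared error; because the total sum of squares of the data is fixed, this is the same as choosing the run with the largest between-cluster separation.

```python
import numpy as np

def kmeans(X, k, n_restarts=20, n_iter=100, seed=0):
    """Lloyd-style k-means, repeated from random starts; returns the best run."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        # Initial guess: k distinct data points chosen at random.
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Recompute each center; keep the old one if a cluster is empty.
            new_centers = np.array([
                X[labels == h].mean(axis=0) if np.any(labels == h) else centers[h]
                for h in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        # Total squared error of this run.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        sse = d2[np.arange(len(X)), labels].sum()
        if best is None or sse < best[0]:
            best = (sse, labels, centers)
    return best

# Hypothetical one-dimensional values of a single ratio for nine companies.
X = np.array([[0.5], [0.6], [0.7], [1.4], [1.5], [1.6], [3.0], [3.2], [3.1]])
sse, labels, centers = kmeans(X, k=3)
print(np.sort(centers.ravel()), sse)
```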

Discriminant Analysis.
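Let $x$ be a $p$-dimensional normal random vector belonging to class $k$ if $x \sim N_p(\mu_k, \Sigma)$, $k = 1, 2$, where $\mu_1 \neq \mu_2$ and $\Sigma$ is a positive definite symmetric matrix. If $\mu_1$, $\mu_2$, and $\Sigma$ are known, the optimal classification rule is Fisher's linear discriminant rule,
$$\delta_F(x) = I\{\delta^{T} \Sigma^{-1} (x - \bar{\mu}) \geq 0\},$$
where $\delta = \mu_1 - \mu_2$, $\bar{\mu} = (\mu_1 + \mu_2)/2$, and $I$ denotes the indicator function, with value 1 corresponding to classifying $x$ to class 1 and 0 to class 2. Fisher's rule is equivalent to the Bayes rule with equal prior probabilities for the two classes. The misclassification rate of the optimal rule is
$$R = \Phi(-\Delta_p / 2), \qquad \Delta_p = \sqrt{\delta^{T} \Sigma^{-1} \delta},$$
where $\Phi$ is the standard normal distribution function.
In practice, Fisher's rule is typically not directly applicable, because the parameters are usually unknown and need to be estimated from the samples. Let $\{x_{1i}, i = 1, \ldots, n_1\}$ and $\{x_{2i}, i = 1, \ldots, n_2\}$ be independent and identically distributed random samples from $N_p(\mu_1, \Sigma)$ and $N_p(\mu_2, \Sigma)$, respectively. The maximum likelihood estimators of $\mu_1$, $\mu_2$, and $\Sigma$ are
$$\bar{x}_k = \frac{1}{n_k} \sum_{i=1}^{n_k} x_{ki}, \quad k = 1, 2, \qquad S_n = \frac{1}{n} \sum_{k=1}^{2} \sum_{i=1}^{n_k} (x_{ki} - \bar{x}_k)(x_{ki} - \bar{x}_k)^{T},$$
where $n = n_1 + n_2$. Setting $\hat{\delta} = \bar{x}_1 - \bar{x}_2$, $\hat{\mu} = (\bar{x}_1 + \bar{x}_2)/2$, and $\hat{\Sigma}^{-1} = S_n^{-1}$ (or a generalized inverse $S_n^{-}$ when $S_n^{-1}$ does not exist), Fisher's rule becomes the classic LDA [40,43]:
$$\delta_{LDA}(x) = I\{\hat{\delta}^{T} S_n^{-1} (x - \hat{\mu}) \geq 0\}.$$

Logit Model.
Binary responses, for example, success and failure, are the most common form of categorical data, and the most popular model for them is the logit model. For a binary response $Y$ and a vector of explanatory variables $x$, let $\pi(x)$ denote the success probability when the explanatory vector takes value $x$. This probability is the parameter of the binomial distribution [44]. The logistic regression model has a linear form for the logit of this probability,
$$\operatorname{logit}[\pi(x)] = \log \frac{\pi(x)}{1 - \pi(x)} = \alpha + \beta^{T} x.$$

Decision Trees.
For a set $S$ of $s$ data samples with $m$ classes $C_i$ ($i = 1, \ldots, m$) and $s_i$ samples in class $C_i$, the expected information needed to classify a given sample is
$$I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} p_i \log_2 p_i, \tag{8}$$
where $p_i$ is the probability that a sample belongs to $C_i$, estimated by $s_i / s$. Let attribute $A$ have $v$ different values $\{a_1, a_2, \ldots, a_v\}$. Attribute $A$ divides $S$ into $v$ subsets $\{S_1, S_2, \ldots, S_v\}$, where $S_j$ contains the samples of $S$ that have value $a_j$ of $A$. If $A$ is selected as the test attribute, these subsets correspond to the branches grown from the node that contains $S$. Let $s_{ij}$ be the number of samples of class $C_i$ in subset $S_j$. The entropy, or expected information, of the division of $S$ by $A$ is given by
$$E(A) = \sum_{j=1}^{v} \frac{s_{1j} + s_{2j} + \cdots + s_{mj}}{s} \, I(s_{1j}, s_{2j}, \ldots, s_{mj}), \tag{9}$$
where the term $(s_{1j} + s_{2j} + \cdots + s_{mj})/s$ is the weight of the $j$th subset, equal to the number of samples in the subset divided by the total number of samples in $S$. Equation (10) gives the expected information for a given subset $S_j$:
$$I(s_{1j}, s_{2j}, \ldots, s_{mj}) = -\sum_{i=1}^{m} p_{ij} \log_2 p_{ij}, \tag{10}$$
where $p_{ij} = s_{ij} / |S_j|$ is the probability that a sample of $S_j$ belongs to class $C_i$. Equation (11) gives the encoding information gained by branching on $A$:
$$\mathrm{Gain}(A) = I(s_1, s_2, \ldots, s_m) - E(A). \tag{11}$$
In other words, $\mathrm{Gain}(A)$ is the expected reduction in entropy obtained by knowing the value of attribute $A$. Thus, a smaller entropy value corresponds to a higher information gain, that is, to a division into subsets of higher purity. Therefore, the decision tree selects as the test attribute the attribute with the highest information gain. This creates a node labeled with that attribute, where each value of the attribute creates a branch and the samples are divided accordingly.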
The decision tree contains leaves, which indicate the value of the classification variable, and decision nodes, which specify the test to be carried out. For each outcome of a test, a leaf or a decision node is assigned until all the branches end in the leaves of the tree [45][46][47].
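The entropy-based attribute selection used by ID3 can be illustrated with a short sketch. The function names and the toy labels below are hypothetical; the code only computes the expected information of a label set and the information gain of one categorical attribute, not a full tree.

```python
import math
from collections import Counter

def expected_information(labels):
    """Entropy of a list of class labels, in bits."""
    s = len(labels)
    return -sum((c / s) * math.log2(c / s) for c in Counter(labels).values())

def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on values."""
    s = len(labels)
    subsets = {}
    for a, c in zip(values, labels):
        subsets.setdefault(a, []).append(c)
    # Each subset is weighted by its share of the samples.
    entropy_after = sum(len(sub) / s * expected_information(sub)
                        for sub in subsets.values())
    return expected_information(labels) - entropy_after

# Hypothetical sample: four distressed and four healthy companies.
labels = ["distressed"] * 4 + ["healthy"] * 4
separating = ["low"] * 4 + ["high"] * 4   # attribute that splits the classes exactly
uninformative = ["low", "high"] * 4       # attribute unrelated to the class

gain_separating = information_gain(separating, labels)
gain_uninformative = information_gain(uninformative, labels)
print(gain_separating, gain_uninformative)
```

An attribute that separates the classes perfectly recovers the full one bit of entropy, while an attribute unrelated to the class gains nothing, so the first would be chosen as the test attribute.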

Neural Network.
Neural network is a technique that imitates the functionality of the human brain using a set of interconnected vertices. It is based on an artificial representation of the human brain, through a directed acyclic graph with nodes (neurons) organized into layers. In a typical feedforward architecture, there are a layer of input nodes, a layer of output nodes, and a series of intermediate layers. The input signals are multiplied by their corresponding weights to give the value of $z_j$ as in
$$z_j = w_0 + \sum_{i=1}^{n} w_{ij} x_i, \tag{12}$$
where $z_j$ is the weighted sum of input signals at node $j$; $w_0$ is the threshold (bias) value; $w_{ij}$ is the weight associated with the connection between node $j$ and input node $i$; $x_i$ is the value of input node $i$; and $n$ is the number of input nodes. A sigmoid activation function (13) is applied to the weighted sum:
$$f(z_j) = \frac{1}{1 + e^{-z_j}}. \tag{13}$$
The value calculated from (13) is the output signal from node $j$, which can be considered as the input signal to the next layer [48].
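The computation at one layer, a weighted sum followed by a sigmoid, can be sketched as below. The function names, the weight values, and the use of one bias per node are assumptions made for illustration; no training is shown.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation applied to the weighted sum."""
    return 1.0 / (1.0 + np.exp(-z))

def layer_output(x, W, w0):
    """Output signals of one layer.

    x  : input values, shape (n,)
    W  : W[i, j] is the weight from input node i to node j, shape (n, nodes)
    w0 : bias (threshold) of each node, shape (nodes,)
    """
    z = w0 + x @ W          # weighted sum of the inputs at every node
    return sigmoid(z)       # output signal passed to the next layer

# Hypothetical inputs and weights for a layer with two nodes.
x = np.array([0.5, -1.0])
W = np.array([[1.0, 0.0],
              [0.5, 0.0]])
w0 = np.array([0.0, 0.0])

out = layer_output(x, W, w0)
print(out)
```

Here both weighted sums are zero, so both nodes emit 0.5; stacking such layers gives the feedforward pass.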

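TOPSIS.
The TOPSIS (technique for order preference by similarity to an ideal solution) method is a popular approach to MADM (multiple attribute decision making) that has been widely used in the literature. Presented by Hwang and Yoon, it consists of the following steps [49], where $x_{ij}$ denotes the value of alternative $i$ ($i = 1, \ldots, m$) on criterion $j$ ($j = 1, \ldots, n$).
Step 1. The decision matrix is normalized through the application of
$$r_{ij} = \frac{x_{ij}}{\sqrt{\sum_{i=1}^{m} x_{ij}^{2}}}.$$
Step 2. A weighted normalized decision matrix is obtained by multiplying the normalized matrix by the weights $w_j$ of the criteria,
$$v_{ij} = w_j r_{ij}.$$
Step 3. The positive indicator score ($A^{*}$, maximum value) and the negative indicator score ($A^{-}$, minimum value) are determined by
$$A^{*} = \{v_1^{*}, \ldots, v_n^{*}\}, \quad v_j^{*} = \max_i v_{ij}, \qquad A^{-} = \{v_1^{-}, \ldots, v_n^{-}\}, \quad v_j^{-} = \min_i v_{ij}$$
(for a cost-type criterion, the maximum and the minimum are interchanged).
Step 4. The distance of each alternative from $A^{*}$ and $A^{-}$ is calculated using
$$d_i^{*} = \sqrt{\sum_{j=1}^{n} (v_{ij} - v_j^{*})^{2}}, \qquad d_i^{-} = \sqrt{\sum_{j=1}^{n} (v_{ij} - v_j^{-})^{2}}.$$
Step 5. The closeness coefficient for each alternative ($CC_i$) is calculated by applying
$$CC_i = \frac{d_i^{-}}{d_i^{*} + d_i^{-}}.$$
Step 6. At the end of the analysis, the alternatives are ranked by comparing the $CC_i$ values.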
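The six steps can be sketched compactly in NumPy. This is an illustrative sketch only: the function name `topsis`, the `benefit` flag for distinguishing benefit-type from cost-type criteria, and the toy scores are assumptions and do not reproduce the study's tables.

```python
import numpy as np

def topsis(X, weights, benefit):
    """Closeness coefficients of the alternatives (rows of X).

    benefit[j] is True when larger values of criterion j are better
    and False for a cost-type criterion such as an error rate.
    """
    X = np.asarray(X, dtype=float)
    R = X / np.sqrt((X ** 2).sum(axis=0))          # Step 1: vector normalization
    V = R * weights                                # Step 2: weighting
    best = np.where(benefit, V.max(axis=0), V.min(axis=0))    # Step 3: ideal
    worst = np.where(benefit, V.min(axis=0), V.max(axis=0))   #         negative ideal
    d_pos = np.sqrt(((V - best) ** 2).sum(axis=1))  # Step 4: distances
    d_neg = np.sqrt(((V - worst) ** 2).sum(axis=1))
    return d_neg / (d_pos + d_neg)                  # Step 5: closeness coefficient

# Hypothetical scores of three models on accuracy (benefit) and error rate (cost).
scores = np.array([[0.9, 0.1],
                   [0.8, 0.2],
                   [0.7, 0.3]])
cc = topsis(scores, weights=np.array([0.5, 0.5]), benefit=np.array([True, False]))
ranking = np.argsort(-cc)                           # Step 6: rank by closeness
print(cc, ranking)
```

In this toy case the first model is best on both criteria, so it coincides with the ideal solution and obtains a closeness coefficient of 1, while the last model coincides with the negative ideal and obtains 0.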

The Proposed Approach and Empirical Evidence
Corporate bankruptcy forecasting plays a central role in academic finance research, business practice, and government regulation as a support for financial decisions. Consequently, accurate prediction of default probability is extremely important.
The main purpose of this study is not only to improve prediction performance through a hybrid analysis approach but also to employ a multiple attribute decision making (MADM) method to choose the best alternative classifier. Figure 1 briefly presents the flowchart of the optimization approach.
As shown in Figure 1, the first step consists of listing the important and available financial ratios, including liquidity measurement ratios, profitability indicator ratios, debt ratios, operating performance ratios, cash flow indicator ratios, and investment valuation ratios. In the proposed approach, all ratios are considered for the sake of forecasting efficiency. Factor analysis is then used for dimension reduction because the number of predictors is high; without it, the models may overfit. Also, after factor analysis the predictors are more influential and significant than before because they carry more information on the listed companies.
Another suggestion in this research for building more sophisticated models is to homogenize the companies. The variety of businesses can make forecasting models inefficient. As a solution, this step clusters the businesses on an influential ratio and then applies the forecasting methods. A comparison of the performance measures shows a remarkable improvement over recent financial distress models.
Subsequently, the predictors extracted by factor analysis are used to forecast financial distress through traditional statistical distress modeling and machine learning methods in each cluster separately.
As the last but not least step, we choose the best model for the data set based on the different classification performance measures, because so far no classification method achieves the best score on all evaluation measures. Different multiple attribute decision making (MADM) methods often produce different outcomes for selecting or ranking a set of decision alternatives involving multiple attributes. TOPSIS is one of the best-known MADM methods and is used here to distinguish the best model; other MADM methods can also be applied. Consequently, the final results show that the prediction of financial distress is significantly consolidated.
In the following, empirical evidence for the proposed approach is presented. The population under study is the manufacturing companies accepted in the Tehran Stock Exchange (TSE) for the year ended on March 21, 2011. The reason for this choice is the availability of financial information on these companies. There are 461 companies listed in TSE in 37 industry groups, of which 412 are manufacturing companies and 49 are nonmanufacturing ones. Manufacturing companies outnumber the other listed companies and, owing to their extensive activities, are granted more loans. A sample of 180 companies is chosen for this research.
In the Tehran Stock Exchange, the criterion for a company's exit from the capital market is Article 141 of the commercial law. According to this article, companies whose retained losses are more than 50% of their capital are considered bankrupt. 58 companies are bankrupt under this law. The nonbankrupt companies were randomly selected from the remaining list.
In this research, ratios on which there is broad consensus were used for 180 manufacturing companies quoted on the Tehran Stock Exchange for one year (ended on March 21, 2011). The data required to calculate the ratios were gathered from the companies' balance sheets and income statements. The financial ratios used in the prediction are listed in "The Definition of Variables Used." Popular discourse on financial distress prediction deals with the selection of important variables because of the enormous number of financial ratios [27,30]. Hence, in this study, we applied factor analysis to the correlation matrix to reduce the dimension of the variables and summarize their effects in factors. Table 2 shows the four common factors obtained by the principal component method and Varimax rotation. They account for a cumulative proportion of 93 percent of the total sample variance.
It is fairly clear that the first factor represents the debt and cash flow conditions (a strong combination of DR, DER, ROS, ROE, and WCTA). The second factor largely represents the liquidity conditions (a strong combination of TA, CR, and QR), and the third approximately denotes the operating performance of the listed companies (a strong combination of SFA and STA). The last factor represents the investment conditions (a combination of SI, TLDS, CLOE, and TLOE). For example, we have [equation not recoverable].
Owing to differences in the size and type of the industries, the listed companies are not homogeneous, which makes the prediction models inefficient. As a solution, we clustered the listed companies with the k-means technique based on a single ratio. The current ratio (CR) is a popular financial ratio used to test a company's liquidity by deriving the proportion of current assets available to cover current liabilities. Since liquidity is crucial to distress prediction, the listed companies were clustered on this ratio. Table 3 shows the result: the listed companies can be divided into three levels of current ratio (CR), namely, low, average, and high.
In the next step, the usual statistical distress prediction methods are applied. First of all, the logit models are [equations not recoverable], where $F_1$ is the debt and cash flow conditions, $F_2$ is the liquidity conditions, $F_3$ is the operating performance, and $F_4$ is the investment conditions. None of the listed companies assigned to the third cluster is distressed. Also, for logistic regression, the average error count with clustering is about 7.5 percent less than that without cluster analysis.
The next statistical method for bankruptcy prediction is discriminant analysis. Table 4 presents the linear discriminant functions for financial distress in each cluster. The average error count is significantly reduced in comparison with the general function.
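How a linear discriminant function of this kind is estimated from two samples can be sketched as follows. The sketch uses synthetic two-dimensional data, not the TSE sample; the function names `fit_lda` and `predict_lda` are hypothetical, and the pseudoinverse stands in for the generalized inverse of the pooled covariance estimate.

```python
import numpy as np

def fit_lda(X1, X2):
    """Plug-in linear discriminant rule from two class samples."""
    n1, n2 = len(X1), len(X2)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pooled maximum likelihood covariance estimate (divisor n1 + n2).
    S = ((X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)) / (n1 + n2)
    # Pseudoinverse covers the case where S is singular.
    w = np.linalg.pinv(S) @ (m1 - m2)
    midpoint = (m1 + m2) / 2
    return w, midpoint

def predict_lda(X, w, midpoint):
    """Return 1 for class 1 and 0 for class 2."""
    return ((X - midpoint) @ w >= 0).astype(int)

# Synthetic, well-separated classes (assumed purely for illustration).
rng = np.random.default_rng(1)
X1 = rng.normal(loc=2.0, size=(50, 2))
X2 = rng.normal(loc=-2.0, size=(50, 2))

w, midpoint = fit_lda(X1, X2)
hits_1 = predict_lda(X1, w, midpoint).mean()        # share of class 1 classified as 1
hits_2 = (1 - predict_lda(X2, w, midpoint)).mean()  # share of class 2 classified as 0
accuracy = (hits_1 + hits_2) / 2
print(w, accuracy)
```

Fitting the rule separately on the companies of each cluster would give one discriminant function per cluster, which is the structure reported in Table 4.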
In addition, the machine learning methods, decision trees and neural network, are applied without a separate analysis for each cluster, because both methods can be used for both classification and clustering purposes. In this part, the important issue is to use all ratios and the extracted factors as predictors separately.
To classify the distressed companies through the decision tree, 100 companies were randomly selected to build the tree and the rest were used for testing (see Figure 2).
That is, if $F_1$ (the debt and cash flow conditions) is greater than 0.41 for a company, then the company is not distressed, and if $F_1$ is less than 0.23 and $F_2$ (the liquidity conditions) is less than 0.16, then the company is distressed. To evaluate the built trees, we used about 30 percent of the data. The results showed that the tree built directly on the financial ratios had a greater misclassification error than the tree built on the factors (20%). In other words, a comparison of the two trees showed that the tree built on the factors used more information and hence achieved a more reliable result.
Finally, the neural network is implemented with three hidden layers and a hyperbolic function. The final result shows an overall accuracy of 94 percent, which is 12 percent less than when all ratios are used directly.
There are thus four models to forecast financial distress (more classification methods could in fact be applied), and we have to select the best one for our data set. As mentioned above, the TOPSIS method is used to choose the best model based on the different classification performance measures.
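The measures are defined from the counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) as
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Error rate} = \frac{FP + FN}{TP + TN + FP + FN},$$
$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP},$$
and AUC is the area under a receiver operating characteristic (ROC) curve, where the ROC space is defined by the FP rate and the TP rate as the $x$- and $y$-axes, respectively, and depicts the relative trade-offs between true positives and false positives.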
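The count-based measures can be computed directly from a confusion matrix, as in the sketch below. The function name and the counts are hypothetical and are not taken from Table 5; AUC is omitted because it needs the classifier's scores rather than the four counts.

```python
def classification_measures(tp, fp, tn, fn):
    """Standard measures derived from the four confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

# Hypothetical counts for one classifier on 100 test companies.
measures = classification_measures(tp=40, fp=10, tn=45, fn=5)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Accuracy and error rate always sum to one, so in a TOPSIS ranking one of them acts as a benefit-type criterion and the other as a cost-type criterion.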
As mentioned, there are various methods for choosing an ideal alternative in MADM problems, and TOPSIS is one of them. In the TOPSIS approach, the best alternative is the one nearest to the ideal solution and farthest from the negative ideal solution. It is also assumed that all the criteria have identical weights and importance. Table 6 presents a brief calculation of this method, where $r_1, \ldots, r_6$ are the normalized criteria (accuracy, error rate, precision, sensitivity, specificity, and AUC, respectively), $d^{*}$ is the distance from the ideal alternative, $d^{-}$ is the distance from the negative ideal alternative, and CC is the relative closeness to the ideal solution.
Based on the last column of Table 6, the models rank from best to worst as decision trees, neural network, logit analysis, and discriminant analysis, whereas a decision maker who chooses the best model from the information in Table 5 alone may reach a different judgment.

Conclusion
Enterprise bankruptcy forecasting has always been an important issue in business and financial decision support. In this research, a hybrid approach is suggested to improve prediction performance and give more supportive results. First of all, factor analysis was used to determine and summarize combinations of correlated financial ratios. After that, the k-means algorithm was used to cluster the companies, homogenize them, and obtain more accurate results.
Later, to predict financial distress, multiple logistic regression analysis, multiple discriminant analysis, the decision tree, and the neural network, all well-known methods in this field, were applied. Finally, the best classifier was chosen with the help of a multiple attribute decision making (MADM) method, TOPSIS.
The proposed approach was also applied to Iranian companies on the Tehran Stock Exchange, where earlier discrete models had been employed, and the results show that prediction can be consolidated with the help of the hybrid analysis. The comparison of the methods clearly showed that the decision tree, followed by the neural network, performed remarkably well in comparison with the others.
The hybrid approach advanced here provides insight into the complex interaction of the common bankruptcy prediction methods and suggests avenues for applying MADM methods in this area in future research.

The Definition of Variables Used
TA: The total assets
CR: The ratio of the current assets to the current liabilities
QR: The ratio of the amount of cash and equivalents, short-term investments, and accounts receivable to the current liabilities
DR: The ratio of the total liabilities to the total assets
DER: The ratio of the total liabilities to the total owner's equity
SI: The ratio of the sales to the inventories
TLDS: The ratio of the total liabilities to the daily sales
SFA: The ratio of the sales to the fixed assets
STA: The ratio of the sales to the total assets
ROS: The ratio of the net income to the sales
ROE: The ratio of the net income to the average owner's equity
CLOE: The ratio of the current liabilities to the owner's equity
TLOE: The ratio of the total liabilities to the owner's equity
WCTA: The ratio of the working capital to the total assets

Figure 1: The flowchart of the optimization approach.

Figure 2: A part of decision trees by extracted factors.

Table 1: Some recent research in financial distress forecasting.
Decision Trees. A decision tree (DT) is a machine learning technique used in classification, clustering, and prediction tasks. A well-known tree-growing algorithm for generating DT is Quinlan's ID3 [45]. It starts from the root node, which is assigned one of the best attributes. A branch is then generated for each value of that attribute, and each branch generates a new node. To choose the best attribute according to the selection criterion, ID3 uses an entropy-based definition of the information gain to select the test attribute within the node. The entropy characterizes the purity of a sample set. Suppose $S$ is a set of $s$ data samples. We assume that the class label attribute has $m$ different values, which define $m$ different classes $C_i$ ($i = 1, \ldots, m$), and let $s_i$ be the number of samples in class $C_i$. Equation (8) gives the expected information needed to classify a given sample.

Table 4: The linear discriminant functions for financial distress.
Table 5 illustrates some classification performance measures for the applied methods, where TP is true positive, FP is false positive, TN is true negative, and FN is false negative.

Table 6: Calculation of the TOPSIS method.