Research of Financial Early-Warning Model on Evolutionary Support Vector Machines Based on Genetic Algorithms

A support vector machine is a new learning machine; it is based on statistical learning theory and has attracted the attention of many researchers. Recently, support vector machines (SVMs) have been applied to the problem of financial early-warning prediction (Rose, 1999). The SVMs-based method has been compared with other statistical methods and has shown good results. However, the parameters of the kernel function, which influence the results and performance of support vector machines, have not been decided. Based on genetic algorithms, this paper proposes a new scientific method to automatically select the parameters of SVMs for a financial early-warning model. The results demonstrate that the method is a powerful and flexible way to solve the financial early-warning problem.


Introduction
The development of financial early-warning prediction models has long been regarded as an important and widely studied issue in the academic and business communities. Statistical methods and data mining techniques have been used to develop more accurate financial early-warning models. The statistical methods include regression, logistic models, and factor analysis. The data mining techniques include decision trees, neural networks (NNs), fuzzy logic, genetic algorithms (GAs), and support vector machines (SVMs) [1]. However, the application of statistical methods has been limited in the real world because of their strict assumptions.
Recently, the SVM, which was developed by Vapnik (Vapnik, 1995), has become one of the methods receiving increasing attention, with remarkable results. In financial applications, the main areas for SVMs are time series prediction, such as stock price indexing, and classification, such as credit rating and financial warning [2]. However, when SVMs are applied to pattern classification problems, it is important to select the parameters of the SVMs.
This paper applies the proposed model, an evolutionary support vector machine based on genetic algorithms, to the financial early-warning problem using a real data set from listed companies in China.

Genetic Algorithm (GA)
A GA is a flexible optimization technique inspired by evolutionary notions and natural selection. A GA is based on an iterative and parallel procedure that operates on a population of individuals, each one representing an attempted solution to the problem. The population is improved in each iteration by means of crossover and mutation, which generate new individuals (attempted solutions) that are then tested [3].
There are three main questions that have become relevant topics in GA design research: (1) encoding; (2) operators; (3) control parameters. The GA starts by selecting a sample of potential solutions to the problem, either randomly or by means of any other procedure; beforehand, the problem has to be formulated in chromosome notation. In a second step, the fitness value of every chromosome (potential solution) is computed in accordance with an objective function that ranks the solutions from the best to the worst [4]. The third step applies the reproduction operator to the initial set of potential solutions. Individuals with higher fitness values are reproduced more often. One of the most common methods, which is used in this paper, is the "roulette wheel." This method is equivalent to a fitness-proportionate selection method for a sufficiently large population. There are two essential actions in the GA procedure: (1) the creation of attempted solutions or ideas to solve the problem through recombination and mutation; (2) the elimination of errors or bad solutions after testing them, by selecting the better adapted ones or those closer to the truth.
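The roulette-wheel selection described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `roulette_wheel_select` and the toy population are hypothetical, and fitness values are assumed to be non-negative.

```python
import random
from collections import Counter


def roulette_wheel_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness.

    Assumes every fitness value is non-negative.
    """
    total = sum(fitnesses)
    if total <= 0:
        # Degenerate case (an assumption of this sketch): choose uniformly.
        return rng.choice(population)
    spin = rng.random() * total  # position where the wheel stops, in [0, total)
    running = 0.0
    for individual, fitness in zip(population, fitnesses):
        running += fitness  # each individual owns a slice of width `fitness`
        if spin < running:
            return individual
    return population[-1]  # guard against floating-point round-off


# Toy usage: "b" has three times the fitness of "a", so it should be
# drawn roughly three times as often.
demo_rng = random.Random(0)
draws = Counter(
    roulette_wheel_select(["a", "b"], [1.0, 3.0], demo_rng) for _ in range(1000)
)
print(draws)
```

Drawing the spin from the half-open interval `[0, total)` and comparing with a strict `<` means an individual with zero fitness is never selected.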

Support Vector Machines (SVMs)
Since SVMs were introduced from statistical learning theory by Vapnik, a number of studies have been published concerning their theory and applications [5]. A simple description of the SVMs algorithm is provided as follows.
Given a training set $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)\} \in (X \times Y)^l$ with input vectors $x_i \in R^n$ and target labels $y_i \in \{-1, 1\}$, the support vector machines (SVMs) classifier, according to Vapnik's theory, finds an optimal separating hyperplane that satisfies the separation conditions for every training point, with the decision function $f(x) = \mathrm{sign}(\omega \cdot x + b)$. To find the optimal hyperplane $\omega \cdot x + b = 0$, the norm of the vector $\omega$ needs to be minimized; equivalently, the margin $1/\|\omega\|$ between the two classes should be maximized.
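For reference, the separation conditions and the margin maximization described above are usually written in the following hard-margin form. This is the standard textbook formulation and is an assumption here, not a formula taken from the paper; the index $i$ runs over the $l$ training points.

```latex
\begin{align}
  & y_i \,(\omega \cdot x_i + b) \ge 1, \qquad i = 1, \ldots, l, \\
  & \min_{\omega,\, b} \ \frac{1}{2}\,\|\omega\|^2
    \quad \text{subject to} \quad
    y_i \,(\omega \cdot x_i + b) \ge 1, \qquad i = 1, \ldots, l.
\end{align}
```

Under these constraints the closest points of each class lie at distance $1/\|\omega\|$ from the hyperplane, so minimizing $\|\omega\|$ and maximizing the margin are the same task.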
The solution of the primal problem is obtained after constructing the Lagrangian. From the conditions of optimality, one obtains a quadratic programming problem with Lagrange multipliers $\alpha_i$. A multiplier $\alpha_i$ exists for each training data instance. Data instances corresponding to nonzero $\alpha_i$ are called support vectors [6].
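As a worked sketch of this step, assuming the standard hard-margin primal (minimize $\frac{1}{2}\|\omega\|^2$ subject to $y_i(\omega \cdot x_i + b) \ge 1$), the Lagrangian and its optimality conditions take the following form. The derivation is the textbook one and is not reproduced from the paper.

```latex
\begin{align}
  L(\omega, b, \alpha) &= \frac{1}{2}\,\|\omega\|^2
      - \sum_{i=1}^{l} \alpha_i \bigl[\, y_i (\omega \cdot x_i + b) - 1 \,\bigr],
      \qquad \alpha_i \ge 0, \\
  \frac{\partial L}{\partial \omega} = 0
      &\;\Longrightarrow\; \omega = \sum_{i=1}^{l} \alpha_i y_i x_i, \\
  \frac{\partial L}{\partial b} = 0
      &\;\Longrightarrow\; \sum_{i=1}^{l} \alpha_i y_i = 0, \\
  \min_{\alpha} \ & \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l}
      \alpha_i \alpha_j y_i y_j \,(x_i \cdot x_j) - \sum_{i=1}^{l} \alpha_i
      \quad \text{subject to} \quad
      \sum_{i=1}^{l} \alpha_i y_i = 0, \quad \alpha_i \ge 0.
\end{align}
```

The last line follows by substituting the two stationarity conditions back into $L$. In the soft-margin variant, the multipliers are additionally bounded above by the penalty parameter $C$.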
On the other hand, the above primal problem can be converted into a dual problem whose objective function and constraints are expressed in terms of the multipliers $\alpha_i$, and whose decision function depends on the training data only through dot products. Most classification problems are, however, linearly nonseparable in the real world. In the nonlinear case, we first map the data to a high-dimensional space using a mapping $\varphi : R^n \to H$. Then, instead of dot products, a "kernel function" $K$ is used such that $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$. We then find the optimal hyperplane $\omega \cdot \varphi(x) + b = 0$ with the decision function $f(x) = \mathrm{sign}\left(\sum_i \alpha_i y_i \, \varphi(x) \cdot \varphi(x_i) + b\right)$. In this paper, RBF kernel functions are used as follows: $K(x, y) = e^{-\|x - y\|^2 / \sigma^2}$. Using the dual problem, the quadratic programming problem can be rewritten as a minimization over the multipliers $\alpha_i$ in which the kernel $K(x_i, x_j)$ replaces the dot product.
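The kernel decision function above can be illustrated with a short sketch. It assumes the usual squared Euclidean distance in the RBF kernel and treats a score of exactly zero as class +1; the support vectors, multipliers, and bias in the usage lines are made-up toy values, not results from the paper.

```python
import math


def rbf_kernel(x, y, sigma2):
    """RBF kernel K(x, y) = exp(-||x - y||^2 / sigma2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / sigma2)


def decision(x, support_vectors, labels, alphas, b, sigma2):
    """Evaluate sign(sum_i alpha_i * y_i * K(x, x_i) + b).

    Returns +1 or -1; a score of exactly zero is mapped to +1
    (a convention chosen for this sketch).
    """
    score = b + sum(
        alpha * y * rbf_kernel(x, sv, sigma2)
        for sv, y, alpha in zip(support_vectors, labels, alphas)
    )
    return 1 if score >= 0 else -1


# Toy usage with hypothetical values: one support vector per class.
svs = [[0.0, 0.0], [2.0, 2.0]]
ys = [1, -1]
alphas = [1.0, 1.0]
print(decision([0.1, 0.1], svs, ys, alphas, b=0.0, sigma2=1.0))
print(decision([1.9, 1.9], svs, ys, alphas, b=0.0, sigma2=1.0))
```

Only the kernel values between the query point and the support vectors are needed, so the mapping $\varphi$ never has to be computed explicitly.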

Evolutionary Support Vector Machines Based on Genetic Algorithms
When SVMs are applied to pattern classification problems, it is important to select the parameters of the SVMs [7]. This paper applies genetic algorithms to determine the parameters of SVMs. The steps of evolutionary support vector machines based on genetic algorithms are given as follows.
Step 1. Define the string (chromosome) and encode the parameters of SVMs into chromosomes. In this paper, the radial basis function (RBF) is used as the kernel function for financial warning prediction. There are two parameters when using RBF kernels: $C$ and $\delta^2$ (the kernel parameter, written $\sigma^2$ in the kernel definition above). In this study, $C$ and $\delta^2$ are encoded as binary strings and optimized by the GA. The length of the GA chromosome used in this paper is 18: the first 9 bits represent $C$ and the remaining 9 bits represent $\delta^2$.
Step 2. Define the population size and randomly generate a binary-coded initial population of chromosomes. The initial random population size is 40.
Step 3. Define the probability of crossover $P_c$ and the probability of mutation $P_m$, and perform the GA operations (selection, crossover, and mutation). Generate the offspring population by performing crossover and mutation on parent pairs. There are different selection methods to perform reproduction in the GA, that is, to choose the individuals that will create offspring for the next generation [8]. One of the most common methods, and the one used in this paper, is the "roulette wheel."
Step 4. Decode the chromosomes to obtain the corresponding parameters of SVMs.
Step 5. Apply the corresponding parameters to the SVMs model to compute the output $o_k$. Each new chromosome is evaluated by sending it to the SVMs model.
Step 6. Evaluate the fitness of the chromosomes using $o_k$ (fitness function: predictive accuracy) and judge whether the stopping condition is true; if true, end; if false, return to Step 3. The fitness of an individual of the population is based on the performance of SVMs.
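The encoding, decoding, crossover, and mutation in Steps 1 to 4 can be sketched as follows. The chromosome length (18 bits, 9 per parameter) and the population size (40) come from the text; everything else is an assumption made for illustration: the search ranges `C_RANGE` and `DELTA2_RANGE`, the linear bit-to-value mapping, one-point crossover, bit-flip mutation, and the example values of $P_c$ and $P_m$ are not stated in the paper.

```python
import random

BITS_PER_PARAM = 9  # from the text: 9 bits for C, 9 bits for delta^2
# Hypothetical search ranges; the paper does not state them.
C_RANGE = (1.0, 512.0)
DELTA2_RANGE = (0.1, 51.2)


def random_chromosome(rng):
    """Step 2: one random 18-bit chromosome."""
    return [rng.randint(0, 1) for _ in range(2 * BITS_PER_PARAM)]


def bits_to_value(bits, low, high):
    """Map a bit list linearly onto [low, high] (assumed decoding rule)."""
    as_int = int("".join(str(bit) for bit in bits), 2)
    return low + (high - low) * as_int / (2 ** len(bits) - 1)


def decode(chromosome):
    """Step 4: first 9 bits give C, remaining 9 bits give delta^2."""
    c = bits_to_value(chromosome[:BITS_PER_PARAM], *C_RANGE)
    delta2 = bits_to_value(chromosome[BITS_PER_PARAM:], *DELTA2_RANGE)
    return c, delta2


def crossover(parent_a, parent_b, pc, rng):
    """One-point crossover applied with probability pc (assumed operator)."""
    if rng.random() < pc:
        point = rng.randint(1, len(parent_a) - 1)
        return (parent_a[:point] + parent_b[point:],
                parent_b[:point] + parent_a[point:])
    return parent_a[:], parent_b[:]


def mutate(chromosome, pm, rng):
    """Flip each bit independently with probability pm (assumed operator)."""
    return [1 - bit if rng.random() < pm else bit for bit in chromosome]


# Toy usage: a population of 40 chromosomes and one crossover/mutation step.
rng = random.Random(0)
population = [random_chromosome(rng) for _ in range(40)]
child_a, child_b = crossover(population[0], population[1], pc=0.8, rng=rng)
child_a = mutate(child_a, pm=0.05, rng=rng)
print(decode(child_a))
```

With 9 bits per parameter, each parameter can take 512 distinct values, so the grid resolution depends directly on the chosen search range.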
Considering the real problem, we define the predictive accuracy on the testing set as the fitness function. It is represented mathematically as follows: $\text{fitness} = \frac{1}{n} \sum_{i=1}^{n} Y_i$, where $n$ is the number of samples in the testing set and $Y_i$ is one if the actual output equals the predicted value of the SVMs model, and zero otherwise.
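The fitness described above, the share of testing samples whose predicted label matches the actual label, can be computed with a few lines. The function name `predictive_accuracy` and the toy labels are hypothetical.

```python
def predictive_accuracy(actual, predicted):
    """Fraction of samples where the predicted label equals the actual one.

    Each match contributes Y_i = 1 and each mismatch Y_i = 0.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)


# Toy usage: two of the four predictions are correct.
print(predictive_accuracy([1, -1, 1, 1], [1, 1, 1, -1]))
```

In the GA loop, this value would be computed once per chromosome after training an SVM with the decoded parameters.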

The Selection of Input Variables
There are many financial ratios that can represent the profitability of a company, and the differences between industries, such as household appliances and the pharmaceutical industry, are obvious. The horizontal comparability of many financial ratios is therefore not reasonable. This paper focuses on the profitability of a company and selects six ratios as the input variables: (1) sales profit rate; (2) assets earning ratio; (3) net asset earning ratio; (4) profit growth rate of main business; (5) net profit growth rate; (6) total profit growth rate [9].

The Selection of Output Variable
We assume that the economic environment is similar, and select ROE (Rate of Return on Common Stockholders' Equity) as the standard for selecting the output variable, because ROE is one of the important ratios used to judge profitability [10]. The method is as follows. We select those companies whose ROE is greater than 0 in year $n - 1$ and year $n$, and we divide them into two kinds according to the value of ROE in year $n + 1$: for the first kind, ROE is greater than 0 and the output is $y = 1$; for the second kind, ROE is less than or equal to 0 and the output is $y = -1$.
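The labeling rule above can be written as a small function. The name `label_company` and the example ROE values are hypothetical; returning `None` for companies that fail the selection condition is a choice made for this sketch.

```python
def label_company(roe_prev, roe_curr, roe_next):
    """Label a company from its ROE in years n-1, n, and n+1.

    Returns +1 or -1 for selected companies, or None when the company
    is not selected (ROE not positive in both year n-1 and year n).
    """
    if roe_prev <= 0 or roe_curr <= 0:
        return None  # outside the sample defined in the text
    return 1 if roe_next > 0 else -1


# Toy usage with made-up ROE values.
print(label_company(0.05, 0.08, 0.02))
print(label_company(0.05, 0.08, -0.10))
```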

Research Data and Experiments
The research data we employ are from listed companies in China and consist of 50 medium-size firms from 1999 to 2001. The data set is arbitrarily split into two subsets: about 50% of the data is used as a training set and 50% as a testing set. The training data for SVMs is used in its entirety to construct the model. The testing data, which is not utilized to develop the model, is used to test the results. The training set is shown in Table 1. The results in Table 3 show that the overall prediction performance of the second model on the testing set is consistently good. Moreover, the accuracy and the generalization using evolutionary support vector machines are better than those of the first model.
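An arbitrary half-and-half split of the kind described above might look like the following sketch. The shuffling procedure, the seed, and the integer stand-ins for the 50 firms are assumptions; the paper does not say how the split was drawn.

```python
import random


def split_half(samples, seed=0):
    """Shuffle a copy of the samples and cut it into two halves."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(samples)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


# Toy usage: 50 placeholder firm identifiers.
train, test = split_half(range(50))
print(len(train), len(test))
```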

Conclusions
In this paper, we applied evolutionary support vector machines based on genetic algorithms to the financial early-warning problem and showed their attractive prediction power compared to the pure SVMs method. We utilize genetic algorithms to choose optimal values of the upper bound $C$ and the kernel parameter $\delta^2$, which are the most important parameters in SVMs model selection. To validate the prediction performance of this model, we statistically compared its prediction accuracy with that of the pure SVMs model. The results of the empirical analysis showed that the proposed model outperformed the pure SVMs model.
In a classification problem, the selection of features is important for many reasons: good generalization performance, running time requirements, and constraints imposed by the problem itself [11]. While this study used six ratios as the feature subset of the SVMs model, it should be noted that the appropriate features can be problem-specific; hence it remains an interesting topic for further study to select proper features according to the type of classification problem.
After the application of genetic algorithms, there is a significant improvement in accuracy. That is just what we need when selecting the parameters of SVMs for a financial early-warning model.

Figure 1: The results of SVMs with various $C$ where $\delta^2$ is fixed at 10.

Table 1: The training set.

Table 2: Classification accuracies of various parameters in the first model.

Table 3: Prediction accuracy of the second model.