The support vector machine (SVM) is a learning machine based on statistical learning theory that has attracted wide attention from researchers. Recently, SVMs have been applied to the problem of financial early-warning prediction (Rose, 1999). SVM-based methods have been compared with other statistical methods and have shown good results. However, the parameters of the kernel function, which influence the results and performance of SVMs, are not chosen in any systematic way. Based on genetic algorithms, this paper proposes a method to automatically select the parameters of SVMs for a financial early-warning model. The results demonstrate that the method is a powerful and flexible way to solve the financial early-warning problem.
The development of financial early-warning prediction models has long been regarded as an important and widely studied issue in the academic and business communities. Statistical methods and data mining techniques have been used to develop more accurate financial early-warning models. The statistical methods include regression, logistic models, and factor analysis. The data mining techniques include decision trees, neural networks (NNs), fuzzy logic, genetic algorithms (GAs), and support vector machines (SVMs) [
Recently, the SVM, which was developed by Vapnik (1995), has been receiving increasing attention with remarkable results. In financial applications, the main areas for SVMs are time series prediction, such as stock price index prediction, and classification, such as credit rating and financial warning [
This paper applies the proposed evolutionary support vector machine, which is based on genetic algorithms, to the financial early-warning problem using a real data set from companies listed on the stock market in China.
A GA is a flexible optimization technique inspired by evolutionary notions and natural selection. A GA is based on an iterative and parallel procedure that consists of a population of individuals (each one representing an attempted solution to the problem) which is improved in each iteration by means of crossover and mutation, generating new individuals or attempted solutions which are then tested by [
There are three main questions that have become relevant topics in GA design research: (1) encoding; (2) operators; (3) control parameters. The GA starts by selecting a sample (randomly or by means of any other procedure) of potential solutions to the problem to be solved; beforehand, the problem has to be formulated in chromosome notation. In a second step, the fitness value of every chromosome (potential solution) is computed in accordance with an objective function that ranks the solutions from the best to the worst [
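The loop described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the tournament selection, the elitism step, the default parameter values, and the toy objective (counting 1-bits) are all assumptions chosen for brevity.

```python
import random

def run_ga(fitness, n_bits=16, pop_size=20, generations=30,
           p_crossover=0.8, p_mutation=0.02, seed=0):
    """Maximize `fitness` over bit strings; returns (best, history)."""
    rng = random.Random(seed)
    # Step 1: random initial population of binary chromosomes.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Step 2: rank every chromosome by the objective function.
        scored = sorted(population, key=fitness, reverse=True)
        best = scored[0]
        history.append(fitness(best))

        def tournament():
            # Assumed selection scheme: the fitter of two random picks.
            a, b = rng.sample(scored, 2)
            return a if fitness(a) >= fitness(b) else b

        # Elitism (an assumption): keep the best chromosome unchanged.
        next_population = [best[:]]
        while len(next_population) < pop_size:
            p1, p2 = tournament(), tournament()
            child = p1[:]
            if rng.random() < p_crossover:
                # Single-point crossover between the two parents.
                cut = rng.randint(1, n_bits - 1)
                child = p1[:cut] + p2[cut:]
            # Bit-flip mutation applied gene by gene.
            child = [1 - g if rng.random() < p_mutation else g
                     for g in child]
            next_population.append(child)
        population = next_population
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history

if __name__ == "__main__":
    # Toy objective: the number of 1-bits in the chromosome.
    best, history = run_ga(sum)
    print(best, history[-1])
```

Because the best chromosome is carried over in every generation, the recorded best fitness never decreases from one generation to the next.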
Since SVMs were introduced from statistical learning theory by Vapnik, a number of studies have been published concerning their theory and applications [
Given a training set
To find the optimal hyperplane:
The solution of the primal problem is obtained after constructing the Lagrangian. From the conditions of optimality, one obtains a quadratic programming problem with Lagrange multipliers
On the other hand, the above primal problem can be converted into the following dual problem with objective function and constraints:
Most classification problems in the real world are, however, not linearly separable. In the nonlinear case, we first map the data to a high-dimensional space, using a mapping,
In this paper, RBF kernel functions are used as follows:
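For reference, the standard soft-margin SVM formulation that matches the description in this section can be written as follows. The notation ($w$, $b$, $\xi_i$, $\alpha_i$, $\phi$, $\sigma^2$) and the $2\sigma^2$ scaling of the RBF kernel are assumptions that follow a common textbook convention; they are not necessarily the authors' own symbols.

```latex
% Training set
\{(x_i, y_i)\}_{i=1}^{N}, \qquad x_i \in \mathbb{R}^{n}, \quad y_i \in \{-1, +1\}

% Primal problem (optimal hyperplane with upper bound C)
\min_{w,\, b,\, \xi} \quad \frac{1}{2}\, w^{T} w + C \sum_{i=1}^{N} \xi_i
\qquad \text{s.t.} \quad
y_i \left( w^{T} \phi(x_i) + b \right) \ge 1 - \xi_i, \quad
\xi_i \ge 0, \quad i = 1, \dots, N

% Dual problem in the Lagrange multipliers \alpha_i
\max_{\alpha} \quad \sum_{i=1}^{N} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N}
    \alpha_i \alpha_j\, y_i y_j\, K(x_i, x_j)
\qquad \text{s.t.} \quad
\sum_{i=1}^{N} \alpha_i y_i = 0, \quad
0 \le \alpha_i \le C, \quad i = 1, \dots, N

% Kernel induced by the mapping \phi, and the RBF kernel
K(x_i, x_j) = \phi(x_i)^{T} \phi(x_j), \qquad
K(x_i, x_j) = \exp\!\left( -\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}} \right)

% Resulting classifier
f(x) = \operatorname{sign}\!\left( \sum_{i=1}^{N} \alpha_i\, y_i\, K(x_i, x) + b \right)
```

In this form the two quantities that must be chosen before training are the upper bound $C$ on the multipliers and the kernel width $\sigma^2$.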
When SVMs are applied to pattern classification problems, it is important to select the parameters of SVMs [
Define the string (or chromosome) and encode the parameters of SVMs into chromosomes. In this paper, the radial basis function (RBF) is used as the kernel function for financial warning prediction. There are two parameters when using RBF kernels: C and
Define population size and generate binary-coded initial population of chromosomes randomly. The initial random population size is 40.
Define the probability of crossover (Pc) and the probability of mutation (Pm), and perform the GA operations (selection, crossover, and mutation).
Generate offspring population by performing crossover and mutation on parent pairs. There are different selection methods to perform reproduction in the GA to choose the individuals that will create offspring for the next generation [
Decode the chromosomes to obtain the corresponding parameters of SVMs.
Apply the corresponding parameters to the SVMs model to compute the output
Evaluate fitness of the chromosomes using
Considering the real problem, we define the predictive accuracy of the testing set as the fitness function. It is represented mathematically as follows:
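The encoding, decoding, and fitness-evaluation steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the search ranges `C_RANGE` and `KERNEL_RANGE`, the 10 bits per parameter, and the `train_and_predict` callable (a stand-in for training an RBF-kernel SVM and predicting the testing set) are hypothetical and not taken from the paper. Only the population size of 40 and the use of testing-set accuracy as fitness come from the text.

```python
import random

# Hypothetical search ranges and resolution (not given in the source).
C_RANGE = (1.0, 250.0)
KERNEL_RANGE = (1.0, 100.0)
BITS_PER_PARAM = 10

def bits_to_value(bits, low, high):
    """Map a list of 0/1 genes linearly onto the interval [low, high]."""
    as_int = int("".join(map(str, bits)), 2)
    return low + (high - low) * as_int / (2 ** len(bits) - 1)

def decode(chromosome):
    """Split a chromosome into the pair (C, kernel parameter)."""
    c_bits = chromosome[:BITS_PER_PARAM]
    k_bits = chromosome[BITS_PER_PARAM:]
    return (bits_to_value(c_bits, *C_RANGE),
            bits_to_value(k_bits, *KERNEL_RANGE))

def accuracy(predicted, actual):
    """Fraction of testing-set labels that are predicted correctly."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

def fitness(chromosome, train_and_predict, test_labels):
    """Fitness of a chromosome = predictive accuracy on the testing set."""
    c, kernel_param = decode(chromosome)
    return accuracy(train_and_predict(c, kernel_param), test_labels)

def random_population(size=40, seed=0):
    """Binary-coded initial population generated at random."""
    rng = random.Random(seed)
    n_genes = 2 * BITS_PER_PARAM
    return [[rng.randint(0, 1) for _ in range(n_genes)]
            for _ in range(size)]

if __name__ == "__main__":
    labels = [1, -1, 1, 1]

    def fake_svm(c, kernel_param):
        # Placeholder only: a real run would train an SVM with these
        # parameters and return its predictions for the testing set.
        return [1, -1, 1, 1] if c > 100 else [1, 1, 1, 1]

    population = random_population()
    best = max(population, key=lambda ch: fitness(ch, fake_svm, labels))
    print(decode(best), fitness(best, fake_svm, labels))
```

Keeping the classifier behind a callable means the same fitness function can be reused with any SVM library, and the GA operators only ever see bit strings and a scalar score.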
Many financial ratios can represent the profitability of a company, and the differences between industries, such as household appliances and pharmaceuticals, are obvious. Comparing many financial ratios across industries is therefore not reasonable. This paper focuses on the profitability of a company and selects six ratios as the input variables: (1) Sell profit rate; (2) Assets earning ratio; (3) Net asset earning ratio; (4) Profit growth rate of main business; (5) Net profit growth rate; (6) Total profit growth rate [
We assume that the economic environment is similar, and select ROE (Rate of Return on Common Stockholders' Equity) as the criterion for the output variable because ROE is one of the important ratios used to judge profitability [
The research data we employ are from companies listed on the stock market in China and consist of 50 medium-size firms from 1999 to 2001. The data set is arbitrarily split into two subsets; about
The training set.
Companies | Sell profit rate | Assets earning ratio | Net asset earning ratio | Profit growth rate of main business | Net profit growth rate | Total profit growth rate | Output |
---|---|---|---|---|---|---|---|
Chenming Paper | | | | | | | |
Foshan electrical and lighting | | | | | | | |
Huali Group | | | | | | | |
Gree Electric appliances | | | | | | | |
Zhuhai Zhongfu | | | | | | | |
Zijiang enterprise | | | | | | | |
Qingdao Haier | | | | | | | |
Fujian Nanzhi | | | | | | | |
ST Swan | | | | | | | |
ST Macro | | | | | | | |
ST Tianyi | | | | | | | |
ST Jizhi | | | | | | | |
ST Hushan | | | | | | | |
ST Jiangzhi | | | | | | | |
ST Ziyi | | | | | | | |
Xiaxin electronic | | | | | | | |
Chunlan Gufen | | | | | | | |
Shangfeng Industrial | | | | | | | |
Aucma | | | | | | | |
Xinjiang Tianhong | | | | | | | |
Jincheng Paper | | | | | | | |
Wuzhong Yibiao | | | | | | | |
Qingshan Paper | | | | | | | |
Hakongtiao | | | | | | | |
Additionally, to evaluate the effectiveness of the proposed model, we compare two different models.
The first model, with arbitrarily selected values of parameters, varies the parameters of SVMs to select optimal values for the best prediction performance.
The second model is designed to select the parameters automatically by optimizing them with a GA.
Based on the results proposed by Tay and Cao (2001), we set an appropriate range of parameters
Classification accuracies of various parameters in the first model.
C | 1 (train) | 1 (test) | 10 (train) | 10 (test) | 30 (train) | 30 (test) | 50 (train) | 50 (test) | 80 (train) | 80 (test) |
---|---|---|---|---|---|---|---|---|---|---|
1 | 86.87 | 87.50 | 80.00 | 70.83 | 66.67 | 70.83 | 66.67 | 70.83 | 66.67 | 70.83 |
10 | 93.33 | 75.50 | 86.67 | 70.83 | 66.67 | 70.83 | 66.67 | 70.83 | 66.67 | 70.83 |
30 | 93.33 | 75.50 | 86.67 | 70.83 | 66.67 | 70.83 | 66.67 | 70.83 | 66.67 | 75.00 |
50 | 100 | 79.17 | 86.67 | 79.17 | 66.67 | 70.83 | 66.67 | 70.83 | 66.67 | 75.00 |
90 | 100 | 79.17 | 93.33 | 87.50 | 66.67 | 70.83 | 66.67 | 70.83 | 66.67 | 75.00 |
100 | 100 | 79.17 | 93.33 | 79.17 | 66.67 | 70.83 | 66.67 | 79.17 | 66.67 | 70.83 |
150 | 100 | 79.17 | 86.67 | 79.17 | 66.67 | 70.83 | 66.67 | 79.17 | 66.67 | 70.83 |
200 | 100 | 79.17 | 86.67 | 75.00 | 66.67 | 70.83 | 66.67 | 70.83 | 66.67 | 70.83 |
250 | 100 | 79.17 | 86.67 | 79.17 | 66.67 | 70.83 | 66.67 | 70.83 | 66.67 | 70.83 |
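The manual search of the first model amounts to scanning this grid for the cell with the highest testing accuracy. A short sketch of that lookup is given below, with the testing accuracies copied from the table as printed. It assumes that the values 1, 10, 30, 50, and 80 in the header are settings of the kernel parameter; the names `KERNEL_VALUES`, `TEST_ACCURACY`, and `best_settings` are hypothetical.

```python
# Assumed meaning of the header values: kernel parameter settings.
KERNEL_VALUES = [1, 10, 30, 50, 80]

# Testing accuracies (%) from the table; keys are the values of C and
# list positions follow KERNEL_VALUES.
TEST_ACCURACY = {
    1:   [87.50, 70.83, 70.83, 70.83, 70.83],
    10:  [75.50, 70.83, 70.83, 70.83, 70.83],
    30:  [75.50, 70.83, 70.83, 70.83, 75.00],
    50:  [79.17, 79.17, 70.83, 70.83, 75.00],
    90:  [79.17, 87.50, 70.83, 70.83, 75.00],
    100: [79.17, 79.17, 70.83, 79.17, 70.83],
    150: [79.17, 79.17, 70.83, 79.17, 70.83],
    200: [79.17, 75.00, 70.83, 70.83, 70.83],
    250: [79.17, 79.17, 70.83, 70.83, 70.83],
}

def best_settings(table, kernel_values):
    """Return the top testing accuracy and every (C, kernel) cell hitting it."""
    best = max(max(row) for row in table.values())
    winners = [(c, k)
               for c, row in table.items()
               for k, acc in zip(kernel_values, row)
               if acc == best]
    return best, winners

if __name__ == "__main__":
    print(best_settings(TEST_ACCURACY, KERNEL_VALUES))
    # → (87.5, [(1, 1), (90, 10)])
```

The highest testing accuracy in the grid, 87.50, is reached at two parameter settings, which illustrates how sensitive the manual search is to the particular values tried.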
The results of SVMs with various C where
Table
Prediction accuracy of the second model.
Training | Testing |
---|---|
93.33 | 87.50 |
The results in Table
In this paper, we applied evolutionary support vector machines based on genetic algorithms to the financial early-warning problem and showed their attractive prediction power compared to the pure SVM method. We use genetic algorithms to choose optimal values of the upper bound C and the kernel parameter
In a classification problem, the selection of features is important for many reasons: good generalization performance, running time requirements, and constraints imposed by the problem itself [
After the application of genetic algorithms, the accuracy improves, which is exactly what is needed when selecting the parameters of SVMs for the financial early-warning model.