The Construction of Support Vector Machine Classifier Using the Firefly Algorithm

The setting of the parameters of support vector machines (SVMs) strongly affects their accuracy and efficiency. In this paper, we employ the firefly algorithm to train all parameters of the SVM simultaneously, including the penalty parameter, the smoothness parameter, and the Lagrangian multipliers. The proposed method is called the firefly-based SVM (firefly-SVM). This tool does not incorporate feature selection, because an SVM combined with feature selection is not well suited to multiclass classification, especially for the one-against-all multiclass SVM. In the experiments, both binary and multiclass classifications are explored. For binary classification, ten benchmark data sets from the University of California, Irvine (UCI), machine learning repository are used; in addition, the firefly-SVM is applied to the multiclass diagnosis of ultrasonic supraspinatus images. The classification performance of firefly-SVM is compared with that of the original LIBSVM method combined with the grid search method and with the particle swarm optimization based SVM (PSO-SVM). The experimental results support the use of firefly-SVM for pattern classification tasks requiring high accuracy.


Introduction
Support vector machines (SVMs) have been widely used in many applications, including decision making [1], forecasting malaria transmission [2], liver fibrosis diagnosis [3], and pattern classification [4]. Based on the Vapnik-Chervonenkis theory and the structural risk minimization principle, the SVM achieves a tradeoff between minimizing the training set error and maximizing the margin, and therefore has excellent generalization ability [5][6][7]. Essentially, the SVM solves a convex quadratic programming problem, so it finds the global rather than a local optimum. However, the setting of the SVM classifier's parameters plays a significant role; these include the penalty parameter C and the smoothness parameter γ of the radial basis function. The penalty parameter maintains the balance between fitting-error minimization and model complexity. The smoothness parameter of the kernel function determines the nonlinear mapping from the input space to the high-dimensional feature space.
In general, redundant features significantly slow down the learning process and make the designed SVM classifier overfit the training data, so an effective feature selection can alleviate the curse of dimensionality as well as decrease the computation time. In practice, the grid search [8,9] checks all combinations of the parameters C and γ over exponentially growing sequences. More precisely, the search for the two parameters is limited to the intervals 2^(-15) ≤ C ≤ 2^15 and 2^(-5) ≤ γ ≤ 2^5. In practical applications, the grid search is vulnerable to local optima; in other words, if the initial parameters C and γ are far from the global optimum, the resulting SVM classifier will not work effectively.
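The exponential grid search described above can be sketched as follows. This is a minimal illustration, not the LIBSVM implementation; the exponent step of 2 is a common coarse-grid choice assumed here, and `evaluate` stands in for a user-supplied cross-validation score.

```python
import numpy as np

def make_grid(c_lo=-15, c_hi=15, g_lo=-5, g_hi=5, step=2):
    """Candidate (C, gamma) pairs on exponentially growing sequences:
    C in [2^-15, 2^15], gamma in [2^-5, 2^5]."""
    Cs = [2.0 ** e for e in range(c_lo, c_hi + 1, step)]
    gammas = [2.0 ** e for e in range(g_lo, g_hi + 1, step)]
    return [(C, g) for C in Cs for g in gammas]

def grid_search(evaluate, grid):
    """Return the (C, gamma) pair with the highest score, where
    `evaluate(C, gamma)` is a scoring callback (e.g., CV accuracy)."""
    return max(grid, key=lambda cg: evaluate(*cg))
```

Because the grid is fixed in advance, the search can only land on the best grid point, which illustrates why it may miss a global optimum lying between grid values.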
Recently, many bioinspired optimization algorithms, such as the genetic algorithm [10] and particle swarm optimization [11], have been applied to train the parameters of the SVM classifier together with a powerful feature selection for classification. However, the Lagrangian multipliers are still estimated by using a grid search similar to the LIBSVM [12]. Moreover, the resulting support vector machines are difficult to combine into a one-against-all support vector machine classifier, because each SVM generally holds a different set of features in multiclass classification. Another algorithm, the artificial bee colony algorithm [13], was applied to train only the Lagrangian multipliers, without the parameters C and γ.
In this paper, the firefly algorithm [14] is used to search for the optimal parameters by simulating the social behavior of fireflies and their phenomenon of bioluminescent communication. The proposed algorithm is called the firefly-SVM, in which all parameters, including C, γ, and the Lagrangian multipliers, are trained concurrently. In the experiments, the proposed firefly-SVM was evaluated on both binary and multiclass classification problems. This paper is organized as follows. The proposed firefly-SVM algorithm is introduced in Section 2. In Section 3, we present our experiments and demonstrate the results. The conclusion and final remarks are given in Section 4.

Support Vector Machines.
The SVM has been one of the most widely used data learning tools in recent years. It is usually used to address binary pattern classification problems. The binary SVM constructs a separating hyperplane in a high-dimensional space, and it comes in two forms: the linear and the nonlinear SVM.
The linear SVM finds the optimal separating margin by solving the following optimization task:

  min (1/2)‖w‖² + C ∑ᵢ ξᵢ
  subject to yᵢ(w·xᵢ + b) ≥ 1 − ξᵢ, ξᵢ ≥ 0, i = 1, ..., n,   (1)

where C is a penalty value, ξᵢ are positive slack variables, w is a normal vector, and b is a scalar quantity. The minimization problem can be reduced by using the Lagrangian multipliers αᵢ, whose optimum satisfies the Karush-Kuhn-Tucker conditions. If αᵢ > 0, the corresponding datum is called a support vector (SV), and the linear discriminant function can therefore be expressed with the optimal hyperplane parameters w and b as

  f(x) = sign(w·x + b).

Equation (1) can be transformed into (4) through its unconstrained dual form:

  max ∑ᵢ αᵢ − (1/2) ∑ᵢ ∑ⱼ αᵢ αⱼ yᵢ yⱼ xᵢ·xⱼ
  subject to 0 ≤ αᵢ ≤ C, ∑ᵢ αᵢ yᵢ = 0.   (4)

Equation (4) can now be solved using quadratic programming techniques and the stationary Karush-Kuhn-Tucker conditions. The resulting solution w can be expressed as a linear combination of the training vectors, and b can be expressed as an average over all support vectors:

  w = ∑ᵢ αᵢ yᵢ xᵢ,  b = (1/N_SV) ∑_{i∈SV} (yᵢ − w·xᵢ),

where N_SV is the number of support vectors. The linear SVM can be extended to nonlinear cases by replacing x with a mapping φ(x) into the feature space; in other words, the inner product xᵢ·xⱼ can be represented as φ(xᵢ)·φ(xⱼ) in the feature space. Thus the nonlinear discriminant function can be expressed as

  f(x) = sign(∑ᵢ αᵢ yᵢ K(xᵢ, x) + b),

where K(xᵢ, xⱼ) = ⟨φ(xᵢ), φ(xⱼ)⟩ is the kernel function. The most widely used kernel function is the radial basis function (RBF), because of its accurate and reliable performance [15], defined as

  K(xᵢ, xⱼ) = exp(−γ ‖xᵢ − xⱼ‖²),

where γ is the predetermined smoothness parameter that controls the width of the RBF kernel. With this kernel, (4) is rewritten as

  max ∑ᵢ αᵢ − (1/2) ∑ᵢ ∑ⱼ αᵢ αⱼ yᵢ yⱼ K(xᵢ, xⱼ)
  subject to 0 ≤ αᵢ ≤ C, ∑ᵢ αᵢ yᵢ = 0.

To evaluate the proposed firefly-SVM, the classification results on 10 two-class data sets from the UCI machine learning repository are compared with those of the LIBSVM algorithm [12] and PSO-SVM [11] in this paper.
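The nonlinear discriminant function above can be sketched directly from a trained dual solution. The support vectors, multipliers, and bias below are illustrative placeholders, not a trained model.

```python
import numpy as np

def rbf_kernel(x, y, gamma):
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))

def decision(x, sv, alpha, y, b, gamma):
    """Evaluate f(x) = sign(sum_i alpha_i * y_i * K(x_i, x) + b)."""
    s = sum(a * yi * rbf_kernel(xi, x, gamma)
            for xi, a, yi in zip(sv, alpha, y))
    return np.sign(s + b)
```

Note that only the support vectors (the training points with αᵢ > 0) contribute to the sum, which is what makes the trained classifier sparse.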
The one-against-all (OAA) and one-against-one (OAO) strategies are widely used for multiclass classification [7].
The OAA approach compares each class with all the others put together, and thus it needs to construct k support vector machines for a k-class classification problem [16]. The OAO approach [17] constructs binary classifiers for all possible pairs of classes; therefore, it needs to construct k(k − 1)/2 support vector machines for a k-class problem. A max-wins voting scheme determines the classification of an instance. In our past studies of the classification of ultrasonic supraspinatus images [18], the OAA fuzzy SVM had the best capability in classifying the ultrasonic images into different disease groups. Therefore, in this paper, the binary firefly-SVM is used as the basic SVM to construct the OAA fuzzy SVM for a further comparison with the original fuzzy SVMs in the classification of supraspinatus images.
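The classifier counts and the max-wins voting rule for the two strategies can be sketched compactly:

```python
from collections import Counter

def num_classifiers_oaa(k):
    """One-against-all: one binary SVM per class."""
    return k

def num_classifiers_oao(k):
    """One-against-one: one binary SVM per pair of classes."""
    return k * (k - 1) // 2

def max_wins(pairwise_votes):
    """Max-wins voting over OAO outputs: each binary classifier votes
    for one class; the class with the most votes is the prediction."""
    return Counter(pairwise_votes).most_common(1)[0][0]
```

For the four disease groups considered later in this paper, OAA needs 4 machines while OAO would need 6.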

Firefly Algorithm.
The firefly algorithm is a recent bioinspired optimization approach in which the search mechanism simulates the social behavior of fireflies and their phenomenon of bioluminescent communication. Two issues are central to the firefly algorithm, namely, the variation of light intensity and the formulation of attractiveness. Yang [14] simplifies the attractiveness of a firefly by linking it to the firefly's brightness, which in turn is associated with the encoded objective function. The attractiveness is proportional to the brightness, and every member of the firefly swarm is characterized by a brightness that can be directly measured as the corresponding fitness value.
Furthermore, there are three idealized rules: (1) regardless of sex, any firefly can be attracted to any other firefly; (2) attractiveness is proportional to brightness, so of any two flashing fireflies, the less bright one moves toward the brighter one; (3) the brightness of a firefly is determined by the landscape of the given fitness function f(x); in other words, the brightness I of a firefly can be defined as its f(x).
More precisely, the attractiveness β between fireflies i and j is defined as a monotonically decreasing function of their distance r, as shown in (10):

  β(r) = β₀ e^(−γ r²),   (10)

where β₀ is the initial assigned brightness of the fireflies and γ is the light absorption coefficient; the distance r is measured over the dimensions of the candidate solutions (i.e., fireflies). The movement of a firefly i, which is attracted to another, brighter firefly j, is determined by

  xᵢ = xᵢ + β(r)(xⱼ − xᵢ) + α (rand1 − 1/2).

If no firefly is brighter than a particular firefly x_max, it moves randomly according to

  x_max = x_max + α (rand2 − 1/2),

where rand1 and rand2 are random numbers drawn from the uniform distribution U(0, 1).
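The attractiveness and movement rules above can be sketched as follows; this is a generic illustration of the update, with α, β₀, and γ as free parameters rather than the paper's tuned values.

```python
import numpy as np

def attractiveness(beta0, gamma, r):
    """beta(r) = beta0 * exp(-gamma * r**2)."""
    return beta0 * np.exp(-gamma * r ** 2)

def move_towards(xi, xj, beta0, gamma, alpha, rng):
    """Move firefly xi toward the brighter firefly xj, plus a small
    uniform random walk scaled by alpha."""
    r = np.linalg.norm(xj - xi)
    beta = attractiveness(beta0, gamma, r)
    step = alpha * (rng.uniform(size=xi.shape) - 0.5)
    return xi + beta * (xj - xi) + step
```

With γ = 0 and α = 0 the attracted firefly lands exactly on the brighter one; increasing γ shortens the effective attraction range, and α controls exploration.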

Training the Nonlinear SVM Using the Firefly Algorithm.
The training of the nonlinear SVM is essentially a constrained optimization problem. A constrained optimization usually first decides the objective function (i.e., the fitness function in the firefly algorithm) and the range of each parameter. The designed fitness function of the firefly-SVM is the dual objective of the nonlinear SVM:

  Fitness(D) = ∑ₖ αₖ − (1/2) ∑ₖ ∑ₗ αₖ αₗ yₖ yₗ K(xₖ, xₗ).

The constraints of the solution string are

  0 ≤ αₖ ≤ C, ∑ₖ αₖ yₖ = 0, −15 ≤ log₂ C ≤ 15, −5 ≤ log₂ γ ≤ 5.   (13)

From these discussions, it is clear that the firefly algorithm starts with a population of fireflies (candidate solutions) in the search space. The string representation of each firefly (solution) is an important factor for the subsequent steps of the algorithm; each solution string is a multidimensional vector comprising the optimization parameters, namely, the penalty parameter, the smoothness parameter, and the Lagrangian multipliers, as shown in Figure 1. Each firefly modifies its path during the search according to its brightness, and the best firefly performs a random walk so that it can be exchanged for a brighter solution. A population of m initial solutions is generated, each with n + 2 dimensions, denoted by

  Dᵢ = (αᵢ₁, αᵢ₂, ..., αᵢₙ, log₂ Cᵢ, log₂ γᵢ), i = 1, ..., m,

where 0 ≤ αᵢₖ ≤ Cᵢ, ∑ₖ αᵢₖ yₖ = 0, −15 ≤ log₂ Cᵢ ≤ 15, and −5 ≤ log₂ γᵢ ≤ 5. Here αᵢₖ is the multiplier of the kth training datum in the ith candidate solution, and log₂ Cᵢ and log₂ γᵢ are the penalty parameter and smoothness parameter, respectively, of the SVM constructed from the solution string Dᵢ.
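The (n + 2)-dimensional solution string and its feasibility check can be sketched as below; the layout (n multipliers followed by log₂ C and log₂ γ) follows the description above, and the tolerance on the equality constraint is an assumption for floating-point arithmetic.

```python
import numpy as np

def is_feasible(solution, y, tol=1e-9):
    """Check the firefly-SVM solution-string constraints:
    0 <= alpha_k <= C, sum_k alpha_k * y_k = 0,
    -15 <= log2(C) <= 15, -5 <= log2(gamma) <= 5."""
    alpha = solution[:-2]          # n Lagrangian multipliers
    log2C, log2g = solution[-2], solution[-1]
    C = 2.0 ** log2C
    return (np.all(alpha >= 0.0) and np.all(alpha <= C)
            and abs(np.dot(alpha, y)) <= tol
            and -15.0 <= log2C <= 15.0
            and -5.0 <= log2g <= 5.0)
```

Infeasible candidates generated during the search are discarded, exactly as Step 2 below describes.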
The details of the proposed algorithm are thus described as follows.
Step 1 (set up the parameters of the proposed system). This step assigns the parameters, including the number of fireflies (m), the maximum iteration number (T), and the light absorption coefficient (γ). The solution strings of the fireflies are randomly generated and must satisfy the constraints in (13). Let t be the iteration number, initialized to 0. The initial brightness I₀ of each firefly is assigned by its resulting fitness.
Step 2 (update all candidate solutions). The mechanism for updating a candidate solution is stochastic; that is, a solution Dᵢ randomly selects a corresponding solution Dⱼ from the population. If the fitness of Dᵢ is less than the fitness of Dⱼ, the firefly i will move toward the firefly j; as a result, the corresponding string is modified according to

  Dᵢ = Dᵢ + β₀ e^(−γ r²)(Dⱼ − Dᵢ) + α ε,

where β₀ is the sum of the fitness values of the two solutions of these two fireflies, γ is the light absorption coefficient, and ε = (ε₁, ε₂, ..., ε_{n+2}) is a random walk with −1 < εₖ < 1.
If the new solution does not satisfy the solution string constraints, then the new solution will be discarded, or else the original one will be replaced. All candidate solutions will sequentially be updated according to the previous procedure, and then the best one will be calculated for the later process. The best solution will be recorded by BestCu .
Step 3 (update the best solution). If the ith firefly is BestCu, this firefly performs a random walk to generate a new candidate solution, which must still satisfy the solution constraints. If the new solution has better fitness than the original, BestCu is replaced by the new candidate solution; otherwise the new candidate is discarded:

  BestCu = BestCu + α ε,

where ε = (ε₁, ε₂, ..., ε_{n+2}) is a random walk with −1 < εₖ < 1.
Step 4 (iterative execution and resulting vector output). Increase t by 1. If t reaches the maximum iteration number T, the algorithm terminates and outputs BestCu as the resulting solution vector; otherwise, go to Step 2.
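Steps 1 through 4 can be sketched as a compact control-flow loop for a generic maximization problem. The SVM-specific fitness and feasibility checks are abstracted into the `fitness` callback, so this only illustrates the iteration structure, not the authors' implementation.

```python
import numpy as np

def firefly_optimize(fitness, init_pop, beta0=1.0, gamma=1.0,
                     alpha=0.1, max_iter=50, seed=0):
    """Return the best solution found by a simplified firefly loop."""
    rng = np.random.default_rng(seed)
    pop = [np.array(p, dtype=float) for p in init_pop]   # Step 1
    for _ in range(max_iter):                            # Step 4: iterate
        for i in range(len(pop)):                        # Step 2
            for j in range(len(pop)):
                if fitness(pop[j]) > fitness(pop[i]):    # move to brighter
                    r = np.linalg.norm(pop[j] - pop[i])
                    beta = beta0 * np.exp(-gamma * r ** 2)
                    pop[i] = (pop[i] + beta * (pop[j] - pop[i])
                              + alpha * (rng.uniform(size=pop[i].shape) - 0.5))
        b = max(range(len(pop)), key=lambda k: fitness(pop[k]))
        cand = pop[b] + alpha * (rng.uniform(size=pop[b].shape) - 0.5)
        if fitness(cand) > fitness(pop[b]):              # Step 3: keep only
            pop[b] = cand                                # improving walks
    b = max(range(len(pop)), key=lambda k: fitness(pop[k]))
    return pop[b]
```

Because the brightest firefly never moves in Step 2 and its random walk is accepted only when it improves, the best fitness in the population is non-decreasing over iterations.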

Binary Classification of the UCI Data Sets.
The platform used to develop the firefly-SVM training algorithm was a personal computer with an Intel Pentium IV 3.0 GHz CPU and 2 GB RAM, running the Windows XP operating system with Visual C++ 6.0 and the OpenCV library. The size of the initial firefly population was set to 20 and the maximum iteration number to 200. In order to obtain unbiased classification results, ten binary-class data sets extracted from the UCI database [19] were used. In all experiments on binary classification, the fivefold cross validation method is used. In practice, all data samples are divided into five subsets with an equal number of samples from each class. One of the five subsets is selected as the test set, and the other four subsets are put together to form the training set. More precisely, every sample appears in a test set exactly once and in a training set four times. In order to verify the effectiveness of the proposed firefly-SVM, the correct classification ratio (CCR) is used for all the data sets. The CCR is defined as

  CCR = (TP + TN) / (TP + TN + FP + FN),

where TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives. The Matthews correlation coefficient (MCC),

  MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)),

returns a value between −1 and +1 and is in essence a correlation coefficient between the observed and the predicted binary classifications. A Matthews correlation coefficient of +1 indicates a perfect prediction, 0 indicates a prediction no better than random, and −1 indicates total disagreement between the prediction and the observation. Table 2 shows the CCRs of the 10 data sets of the UCI data repository using firefly-SVM, PSO-SVM without feature selection, and the original LIBSVM with the grid search method.
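The two evaluation metrics above compute directly from confusion-matrix counts:

```python
import math

def ccr(tp, tn, fp, fn):
    """Correct classification ratio: (TP + TN) over all samples."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in [-1, +1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when the denominator degenerates
    return (tp * tn - fp * fn) / denom
```

Unlike CCR, MCC stays informative on imbalanced data sets, which is why both metrics are reported for the UCI experiments.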
The PSO-SVM trains the parameters C and γ and then classifies all samples of each data set into two classes using LIBSVM, while the LIBSVM method uses the grid search to find appropriate values of C and γ. As shown in Table 2, the CCR of firefly-SVM is superior to both PSO-SVM and LIBSVM for all data sets. In particular, for the Australian credit approval and SPECTF heart data sets, the proposed firefly-SVM exceeds the other two methods by more than a 4% correct classification rate. Table 3 shows the Matthews correlation coefficients of the 10 data sets using the three different classifiers. The results in Table 3 reveal that the MCC of firefly-SVM is higher than those of the other two methods for almost all data sets, while the low MCCs for the SPECTF heart, Pima-Indians-diabetes, and liver disorder data sets reveal that firefly-SVM still has room for improvement in further study. In order to examine the convergence of the firefly-SVM and PSO-SVM algorithms, the plots of CCR versus iteration number for the SPECTF heart and Sonar data sets are shown in Figure 1. The CCRs obtained with the corresponding parameters were recorded at intervals of 10 iterations while running firefly-SVM and PSO-SVM. From the results in Figure 1, we find that the programs converge in fewer than 200 iterations. The total training time is about 6.68 seconds for firefly-SVM and 5.36 seconds for PSO-SVM.
Furthermore, we examine whether the classifier, together with a feature selection mechanism, can improve classification with the SVM. In general, irrelevant or redundant features lead to overfitting and even to poor classification accuracy. In the firefly-SVM algorithm we use mutual information as the search criterion to find the powerful features in each data set. In practice, the features of all samples of each data set are evaluated using the mutual information criterion, and the features with higher mutual information are selected as input features for the firefly-SVM algorithm. The detailed mutual information feature selection algorithms can be found in [18,21]. In contrast, the PSO-SVM in Table 4 integrates the feature selection mechanism into the objective function, which is used for searching the parameters C and γ in the training stage with the PSO search algorithm [22]. Table 4 shows the CCR results of the firefly-SVM and PSO-SVM classifiers with the feature selection mechanism. The CCRs of the 10 data sets using PSO-SVM and firefly-SVM with feature selection are better than the corresponding results in Table 2.
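A minimal sketch of mutual-information feature ranking follows. Each (discrete) feature is scored by I(feature; label) and the highest-scoring features are kept; the cited works [18,21] use more elaborate variants, so this only illustrates the criterion itself.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X; Y) in bits for two discrete sequences of equal length."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum((c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def select_features(columns, labels, k):
    """Return the indices of the k features with the highest MI."""
    scores = [mutual_information(col, labels) for col in columns]
    return sorted(range(len(columns)),
                  key=lambda i: scores[i], reverse=True)[:k]
```

A feature identical to the labels scores the full label entropy, while a constant feature scores zero, so the ranking discards uninformative inputs before training.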

Multiclass Classification of the Ultrasonic Supraspinatus Images.
In general, an injury of the supraspinatus causes shoulder pain, especially in rotator cuff diseases. Ultrasonography is the most frequently used imaging modality to assess supraspinatus damage. According to Neer's diagnostic standards [23], the impingement syndrome diseases of the supraspinatus can be divided into three disease groups, namely, tendon inflammation, calcific tendonitis, and supraspinatus tear. Similar to the experiments in a past study [21], the ultrasonic image database used was recorded from 2004 to 2008, and the ages of the patients ranged from 15 to 65 years. The 120 images in this database were captured at the National Cheng King University Hospital using an HDI Ultramark 5000 ultrasound system (ATL Ultrasound, CA, USA) with an ATL linear array probe. These images are divided into four groups, that is, normal, tendon inflammation, calcific tendonitis, and tear. In our past referenced studies [18], five multiclass support vector machine algorithms were employed to classify these images. The experimental results of that previous work showed that the OAA-FSVM achieved the best CCR for the classification of supraspinatus images. The original OAA-FSVM is composed of several binary support vector machines, each trained by LIBSVM together with a grid search. The procedures for feature extraction from the supraspinatus images, feature selection, and feature normalization were the same as described in [18]. Five powerful texture features, sum average, sum variance, mean convergence, contrast, and difference variance, were used as the features for classification. Furthermore, measurement indices such as sensitivity, specificity, and F-score are discussed in [24]. In the current experiments, we replaced the LIBSVM-based support vector machines with the firefly-SVM for comparison. Table 5 shows the performance indices of the OAA-FSVM with differently trained support vector machines based on 5-fold cross validation.
Referring to Table 5, we find that the false negative rate using the firefly-SVM is only 2.5%, which is better than that of LIBSVM. This means that the OAA-FSVM constructed from the firefly-SVM carries a lower diagnostic risk for the patient. At the same time, the 92.5% accuracy of the firefly-SVM based OAA-FSVM is superior to that of the original OAA-FSVM trained by LIBSVM. Table 6 shows the performance indices of the firefly-SVM based OAA-FSVM and the original OAA-FSVM trained by LIBSVM; it confirms that the firefly-SVM based OAA-FSVM performs better.

Conclusion
In this paper, we explored the use of the firefly-SVM for binary and multiclass classification. Based on the results of the experiments on the binary classification of 10 data sets from the UCI database and the multiclass classification of ultrasonic supraspinatus images, the following conclusions can be drawn.
(1) The firefly-SVM simultaneously trains three kinds of parameters: the penalty parameter, the smoothness parameter, and the Lagrangian multipliers. The experimental results demonstrate that firefly-SVM is capable of handling practical pattern classification applications.
(2) The firefly-SVM training algorithm performed better than the other two methods in the binary classification experiments, so it is promising to apply firefly-SVM to other practical problems.
(3) When associated with feature selection, the firefly-SVM may not converge to the optimal solution within a limited time because of the added complexity. However, the firefly-SVM without feature selection integrates easily and extensively with multiclass OAA support vector machines, such as the OAA-FSVM method. The experimental results on the classification of ultrasonic supraspinatus images reveal that using the firefly-SVM as the basic machine to construct the multiclass support vector machine can effectively improve classification performance in the multiclass classification of ultrasonic supraspinatus images.