Angle Modulated Artificial Bee Colony Algorithms for Feature Selection

Optimal feature subset selection is an important and difficult task for pattern classification, data mining, and machine intelligence applications. The objective of feature subset selection is to eliminate irrelevant and noisy features in order to select optimum feature subsets and increase accuracy. A large number of features in a dataset increases the computational complexity, thus leading to performance degradation. In this paper, to overcome this problem, the angle modulation technique is used to reduce the feature subset selection problem to a four-dimensional continuous optimization problem instead of presenting the problem as a high-dimensional bit vector. To demonstrate the effectiveness of this problem representation and to determine the efficiency of the proposed method, six variants of Artificial Bee Colony (ABC) algorithms employ angle modulation for feature selection. Experimental results on six high-dimensional datasets show that Angle Modulated ABC algorithms improved the classification accuracy using fewer features.


Introduction
Many data mining and machine learning applications suffer from the curse of dimensionality, in which a dataset usually involves a large number of features, often including both relevant and irrelevant features [1]. When the number of features is large, the cost of acquiring the data increases, the performance of the classifier may be reduced, and generalization becomes more difficult [2]. To cope with this problem, feature selection is one of the methods that eliminate redundant, uninformative, and noisy features while preserving the accuracy obtained with the feature subset.
Filter methods and wrapper methods are the two main strategies in feature selection [1]. In filter approaches, an algorithm selects the relevant feature subset based on data characteristics. Wrapper approaches, in contrast, include a classifier to evaluate candidate feature sets. Although the wrapper approaches involve the computational overhead of evaluating candidate feature subsets, they outperform filter approaches in terms of classification accuracy [3].
When a wrapper method is used, the problem of optimal feature subset selection can be seen as NP-Hard because the number of possible feature subsets in the search space is $2^n$, where $n$ is the number of features. Evolutionary computation techniques are well-known tools to tackle this kind of problem [2]. One of these techniques is the Artificial Bee Colony (ABC) algorithm, which mimics the foraging behavior of real bee colonies. In recent years, a few studies on feature selection based on ABC algorithms have been proposed. Palanisamy and Kanmani [4] used ABC for feature selection. However, the paper uses only the original ABC and does not give any information about the generation of the bit vector used in feature selection. Another binary ABC algorithm for feature selection is proposed in [5], but the search equation of that binary ABC algorithm is based on modifying a candidate solution without interacting with the other solutions. Thus, the algorithm turns into a randomized algorithm which randomly generates solutions without interaction in the population. Moreover, in these approaches, candidate solutions are presented with a bit vector of size $n$. Therefore, for large scale instances, this may take more time and decrease classification accuracy. Besides, there are also some applications which use ABC in the feature selection step. Syarifahadilah et al. [6] proposed a feature selection method for biomarker identification. Uzer et al. [7] developed an ABC-based feature selection algorithm in order to diagnose liver diseases and diabetes. Akila et al. [8] identify a user based on analysis of human typing rhythm by using an ABC-based feature selection method.
In this study, ABC algorithms employ angle modulation based bit vector generation for feature selection for the first time. In the angle modulation based approach, an ABC algorithm, called the Angle Modulated Artificial Bee Colony (AMABC) algorithm, selects candidate feature sets with a bit vector obtained by a bit string generator employing a trigonometric function. The main advantage of this approach is that an AMABC algorithm tries to optimize the trigonometric function, which has only four parameters in the continuous domain. Thus, a high-dimensional binary search space can be represented by only a 4-dimensional continuous search space for any dataset. Consequently, any ABC algorithm variant applied to continuous optimization problems in the literature can be used for the feature selection problem. To do so, we have adapted angle modulation to six ABC variants to show its significant effect on finding relevant feature subsets on datasets having many features. The comparison shows that ABC algorithms with Angle Modulated feature selection significantly improve the classification accuracy using fewer features.
This paper is organized as follows. Section 2 briefly reviews the original ABC algorithm and the variants considered here. Section 3 elaborates the application of angle modulation based ABC algorithms to feature selection. Experimental results are presented in Section 4. Finally, Section 5 concludes the paper.

Artificial Bee Colony Algorithm
2.1. The Original Artificial Bee Colony Algorithm. The Artificial Bee Colony (ABC) algorithm, which is inspired by the foraging behaviour of real bee colonies, is proposed for tackling optimization problems. It was first introduced by Karaboga [9] for bound-constrained continuous optimization problems. In the ABC algorithm, each candidate solution is regarded as a food source located in the D-dimensional search space. The nectar amount of a food source corresponds to the fitness value of a candidate solution.
Colony life is organized by division of labour. It comprises three types of bees, employed bees, onlooker bees, and scout bees, which are specialized for different tasks. The employed bees forage outside the hive and communicate with onlooker bees through a series of dances when they return to the hive with news of a discovered food source. The onlooker bees obtain remarkably accurate information about the location and the quality of the discovered food sources. The attractiveness of the dance, which is regarded as the selection probability of a food source, recruits the onlooker bees to help find new good food sources in the vicinity of the discovered one. A food source is abandoned because of its low quality. Then, an employed bee turns into a scout bee which flies around looking for food in desirable spots. Based on this phenomenon, the ABC algorithm is composed of four main steps: (1) Initialization Step, (2) Employed Bees Step, (3) Onlooker Bees Step, and (4) Scout Bees Step. Except for the Initialization Step, the algorithm repeats the other steps until a stopping criterion is satisfied. The detailed description of these steps is as follows.
(a) Initialization. A number of initial solutions are discovered or simply created within the bounds of the search space using the following formula [9]:

$$x_{i,j} = x_j^{\min} + \varphi_{i,j}\,(x_j^{\max} - x_j^{\min}), \tag{1}$$

where $x_j^{\min}$ is the lower bound and $x_j^{\max}$ is the upper bound for each decision variable, $x_{i,j}$, of a solution, $x_i$. $\varphi_{i,j}$ is a uniformly distributed random number generated between 0 and 1. Furthermore, other control parameters, such as limit, representing the maximum number of visits for each solution, are initialized in this step.
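As a minimal sketch of the Initialization Step, the following Python fragment draws each coordinate uniformly between its bounds. The function name, the population size of 5, and the bounds of [-1, 1] are illustrative assumptions and are not taken from the paper.

```python
import random

def init_population(n_solutions, lower, upper, rng=None):
    """Create n_solutions random points inside per-dimension bounds.

    Each coordinate is lower[j] + phi * (upper[j] - lower[j]) with phi
    drawn uniformly from [0, 1).
    """
    rng = rng or random.Random()
    return [
        [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
        for _ in range(n_solutions)
    ]

# Hypothetical setup: 5 food sources in a 4-dimensional space bounded by [-1, 1].
population = init_population(5, [-1.0] * 4, [1.0] * 4, random.Random(0))
trials = [0] * len(population)  # per-solution counters later compared against `limit`
```

The trial counters are created here because the later steps compare them against the limit parameter.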

(b) Employed Bees Step. At this step, each employed bee visits a solution, $x_i$, to discover a better candidate solution, $v_i$, with the formula [9]

$$v_{i,j} = x_{i,j} + \varphi_{i,j}\,(x_{i,j} - x_{r1,j}), \tag{2}$$

where $x_i$ is the position of the reference solution, $x_{r1}$ is a randomly selected solution, $j$ is a randomly selected dimension ($j \in \{1, 2, \ldots, D\}$), and $\varphi_{i,j}$ is a random number uniformly distributed in $[-1, 1]$. If the candidate solution, $v_i$, is better than $x_i$, it replaces $x_i$ and becomes the new solution. Otherwise, a counter which holds the total number of trials of $x_i$ is increased.
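The employed bee move can be sketched as below, assuming a minimisation problem. Resetting the trial counter after a successful move and omitting bound clipping are simplifying assumptions of this sketch, and the sphere function is only a placeholder objective.

```python
import random

def employed_bee_step(population, objective, trials, rng):
    """One pass of the employed-bee phase (minimisation assumed)."""
    dim = len(population[0])
    for i, x_i in enumerate(population):
        # Pick a neighbour r1 != i and a single dimension j to perturb.
        r1 = rng.choice([k for k in range(len(population)) if k != i])
        j = rng.randrange(dim)
        phi = rng.uniform(-1.0, 1.0)
        candidate = list(x_i)
        candidate[j] = x_i[j] + phi * (x_i[j] - population[r1][j])
        # Greedy selection: keep the better point, otherwise count a failed trial.
        if objective(candidate) < objective(x_i):
            population[i] = candidate
            trials[i] = 0  # assumption: counter is reset after an improvement
        else:
            trials[i] += 1

# Placeholder objective and random starting points, for illustration only.
def sphere(x):
    return sum(v * v for v in x)

rng = random.Random(1)
pop = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(6)]
before = [sphere(x) for x in pop]
counters = [0] * len(pop)
employed_bee_step(pop, sphere, counters, rng)
after = [sphere(x) for x in pop]
```

Because of the greedy selection, no solution gets worse during a pass.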

(c) Onlooker Bees Step. The onlooker bees also try to discover new solutions around the visited solutions, as the employed bees do. However, in this step, information about the quality of solutions discovered by employed bees is shared with the onlooker bees. Therefore, the solutions are not equally eligible for a visit; instead, each has a probability of selection that is defined as follows [9]:

$$p_i = \frac{\mathrm{fitness}_i}{\sum_{n=1}^{SN} \mathrm{fitness}_n}, \tag{3}$$

where $SN$ is the number of solutions and $\mathrm{fitness}_i$ is the fitness value of the solution $x_i$, which is defined as

$$\mathrm{fitness}_i = \begin{cases} \dfrac{1}{1 + f_i}, & f_i \ge 0, \\ 1 + |f_i|, & f_i < 0, \end{cases} \tag{4}$$

where $f_i$ is the objective value of solution $x_i$. If a solution has a higher quality, then the probability that it is visited by an onlooker bee becomes higher.
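A short sketch of the onlooker selection follows. It assumes the usual fitness mapping of the original ABC formulation (1/(1 + f) for non-negative objective values, 1 + |f| otherwise) and a roulette-wheel draw. The function names and the example objective values are hypothetical.

```python
import random

def fitness(objective_value):
    """Map an objective value (lower is better) to a positive fitness."""
    if objective_value >= 0:
        return 1.0 / (1.0 + objective_value)
    return 1.0 + abs(objective_value)

def selection_probabilities(objective_values):
    """Fitness-proportional probability of each solution being visited."""
    fits = [fitness(f) for f in objective_values]
    total = sum(fits)
    return [fit / total for fit in fits]

def pick_solution(probabilities, rng):
    """Roulette-wheel draw of one solution index."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]

# Three hypothetical solutions with objective values 0, 1 and 3.
probs = selection_probabilities([0.0, 1.0, 3.0])
chosen = pick_solution(probs, random.Random(0))
```

With these values the fitnesses are 1, 0.5, and 0.25, so the best solution receives 4/7 of the selection probability.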

(d) Scout Bees Step. A solution can be visited several times by employed and onlooker bees to find new solutions. After the number of unsuccessful trials equals the limit value, the solution is marked as abandoned. Then, the employed bee which is responsible for the abandoned solution turns into a scout bee. A new solution is explored randomly by the scout bee with (1), as in the Initialization Step.
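The scout phase can be sketched as follows. Replacing every exhausted solution in one pass is an assumption of this sketch (some implementations allow at most one scout per cycle), and the function name and example values are illustrative.

```python
import random

def scout_bee_step(population, trials, limit, lower, upper, rng):
    """Re-initialise every solution whose failed-trial counter reached `limit`."""
    replaced = []
    for i in range(len(population)):
        if trials[i] >= limit:
            # Same uniform sampling inside the bounds as in the initialization.
            population[i] = [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
            trials[i] = 0
            replaced.append(i)
    return replaced

# Hypothetical state: the second solution has exhausted its trials.
scout_pop = [[0.5, 0.5], [0.1, 0.2], [0.3, 0.4]]
scout_trials = [0, 5, 2]
abandoned = scout_bee_step(scout_pop, scout_trials, 5, [0.0, 0.0], [1.0, 1.0], random.Random(3))
```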

The Considered Artificial Bee Colony Variants.
In this section, we briefly describe the five ABC variants which we considered here, in addition to the original ABC, as feature selection methods on various datasets. Modified ABC (MABC) was proposed by Akay and Karaboga [10]. The MABC algorithm modifies (2) as follows:

$$v_{i,j} = \begin{cases} x_{i,j} + \varphi_{i,j}\,(x_{i,j} - x_{r1,j}), & R_{i,j} < \mathrm{MR}, \\ x_{i,j}, & \text{otherwise}, \end{cases} \tag{5}$$

where $R_{i,j}$ is a uniformly distributed random number in $[0, 1]$, $\varphi_{i,j}$ is drawn uniformly from $[-\mathrm{SF}, \mathrm{SF}]$, MR is the modification ratio, and SF is the scaling factor. While MR controls the proportion of dimensions to be changed, SF adjusts the perturbation range. Gbest guided ABC (GABC) [11,12] uses information about the best solution found so far ($x_{Gbest}$) to enhance the intensification behaviour in the search equation of the Employed Bees and the Onlooker Bees Steps. The modified search equation is as follows:

$$v_{i,j} = x_{i,j} + \varphi_{i,j}\,(x_{i,j} - x_{r1,j}) + \psi_{i,j}\,(x_{Gbest,j} - x_{r1,j}), \tag{6}$$

where $x_{Gbest,j}$ is the $j$th dimension of $x_{Gbest}$ and $\psi_{i,j}$ is a uniform random number in $[0, C]$. $C$ is a control parameter that adjusts the perturbation [11,12]. It is a positive constant that is usually set to 1 [11].
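The two moves can be sketched as below. The MABC function assumes the commonly cited form in which each dimension is perturbed with probability MR and the step is drawn from [-SF, SF]. The GABC function follows the form of (6) as printed in this paper. All function and parameter names are illustrative.

```python
import random

def mabc_candidate(x_i, x_r1, mr, sf, rng):
    """MABC-style move: perturb each dimension with probability MR, step scaled by SF."""
    return [
        xi + rng.uniform(-sf, sf) * (xi - xr) if rng.random() < mr else xi
        for xi, xr in zip(x_i, x_r1)
    ]

def gabc_candidate(x_i, x_r1, x_gbest, j, c, rng):
    """GABC-style move on dimension j, following the form of (6) in the text."""
    phi = rng.uniform(-1.0, 1.0)
    psi = rng.uniform(0.0, c)
    candidate = list(x_i)
    candidate[j] = x_i[j] + phi * (x_i[j] - x_r1[j]) + psi * (x_gbest[j] - x_r1[j])
    return candidate

rng = random.Random(0)
# With MR = 0 no dimension is ever modified, so the solution is returned unchanged.
unchanged = mabc_candidate([1.0, 2.0], [0.0, 0.0], 0.0, 1.0, rng)
# GABC changes only the chosen dimension (here j = 1).
guided = gabc_candidate([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [5.0, 5.0, 5.0], 1, 1.0, rng)
```

Setting MR to 1 perturbs every dimension, while a small MR changes only a few dimensions per move.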
GbestDist guided ABC (GDABC) [11] is an improved variant of GABC. The search equation of GABC is modified so that the neighbour solution, $x_{r1}$, is selected preferentially according to a probabilistic selection rule, while the search equation keeps the form

$$v_{i,j} = x_{i,j} + \varphi_{i,j}\,(x_{i,j} - x_{r1,j}) + \psi_{i,j}\,(x_{Gbest,j} - x_{r1,j}). \tag{7}$$

In the selection rule, $p_k$ is the probability of neighbour $x_k$ being chosen, $\mathrm{loc}_k$ is the location of a solution, and $\mathrm{dist}(\mathrm{loc}_i, \mathrm{loc}_k)$ is the Euclidean distance between two solution locations, $\mathrm{loc}_i$ and $\mathrm{loc}_k$ [11].
The Chaotic ABC (CABC) [13] algorithm has three variants. In the first variant of CABC, the canonical uniform random number generator is replaced with a chaotic random generator using seven different chaotic maps for the Initialization Step. The second variant of CABC proposes a chaotic search for the Onlooker Bees Step after the number of trials of a solution reaches limit/2. The details of the chaotic search can be found in [13]. The third version, which we considered in this study, is the combination of the first two variants of the CABC algorithm.
The Enhanced ABC (EABC) [14] algorithm proposes two separate search equations for the Employed Bees Step and the Onlooker Bees Step to improve poor convergence performance. The search equation of the Employed Bees Step is defined as follows:

$$v_{i,j} = x_{r1,j} + \alpha_{i,j}\,(x_{Gbest,j} - x_{r1,j}) + \beta_{i,j}\,(x_{r1,j} - x_{r2,j}), \tag{8}$$

where $\alpha_{i,j}$ is a random number in the range from 0 to a nonnegative constant, and $\beta_{i,j}$ is the product of $\mathrm{rand}(0, 1)$ and a random number generated from a normal distribution with a given mean and standard deviation. For the Onlooker Bees Step, in order to enhance exploitation, $x_{Gbest}$ is used in the third term of the search equation instead of $x_{r2}$ as follows:

$$v_{i,j} = x_{r1,j} + \alpha_{i,j}\,(x_{Gbest,j} - x_{r1,j}) + \beta_{i,j}\,(x_{r1,j} - x_{Gbest,j}). \tag{9}$$

Angle Modulated Artificial Bee Colony Algorithms
Angle Modulated Artificial Bee Colony (AMABC) algorithms are used for finding an optimal solution of binary optimization problems by reducing the problem to a four-dimensional continuous optimization problem. To do so, the algorithm generates bit strings by employing a trigonometric function derived from the angle modulation [15] technique, which is used in telecommunication systems. The trigonometric function is composed of sine and cosine functions as follows:

$$g(x) = \sin\bigl(2\pi (x - a) \times b \times \cos(A)\bigr) + d, \tag{10}$$

where $A = 2\pi c (x - a)$. $g(x)$ has four coefficients ($a$, $b$, $c$, and $d$) which control the frequency of the sine and cosine functions or shift the function vertically. The coefficient values let the function generate different signals for a given range. Therefore, a number of bits can be generated from the values of $g(x)$ at elements ($x$ values) taken at evenly separated intervals. Figure 1 shows bit string generation by using the trigonometric function with $a = 0.6$, $b = 0.6$, $c = -0.2$, and $d = 0$. With a range $[0, 5]$ and an interval of 1, a bit string $B = 0, 1, 0, 0, 1, 0$ can be generated by sampling the $g(x)$ result at each point $x_k$ as follows:

$$B_k = \begin{cases} 1, & g(x_k) > 0, \\ 0, & \text{otherwise}. \end{cases} \tag{11}$$

When we use angle modulation to generate bit strings, a binary problem can be presented as the task of finding the optimum coefficient values. Thus, the optimum binary vector solution to the original problem can be sampled from the resultant function at the evenly spaced intervals. The advantages of this approach for ABC algorithms are as follows:
(i) ABC algorithms were originally presented for bound-constrained continuous optimization. They perform well and are competitive with contemporary algorithms for continuous optimization problems. However, the superiority of the binary variants of ABC algorithms has not yet been proved in the literature. With this approach, ABC algorithms try to find appropriate values of the coefficients in continuous space instead of evolving bit strings in binary space.
(ii) This approach decreases the dimension of the problem. For example, a large scale binary problem instance can be represented by a four-dimensional problem instance in continuous space.
(iii) Several ABC variants proposed for continuous optimization can be applied easily to a binary optimization problem without modifying the original implementation of the algorithm.
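The bit string generator can be sketched in a few lines of Python. The sketch assumes the generating function g(x) = sin(2π(x − a)·b·cos(2πc(x − a))) + d and a threshold at zero, so that a bit is 1 where g is positive. Under these assumptions, the coefficients quoted for Figure 1 reproduce the bit string given in the text. The function names are illustrative.

```python
import math

def g(x, a, b, c, d):
    """Angle-modulation generating function."""
    return math.sin(2 * math.pi * (x - a) * b * math.cos(2 * math.pi * c * (x - a))) + d

def generate_bits(a, b, c, d, n_bits, step=1.0):
    """Sample g at 0, step, 2*step, ... and emit 1 where the value is positive."""
    return [1 if g(k * step, a, b, c, d) > 0 else 0 for k in range(n_bits)]

bits = generate_bits(0.6, 0.6, -0.2, 0.0, n_bits=6)
print(bits)  # [0, 1, 0, 0, 1, 0], the bit string quoted for Figure 1
```

Only the four coefficients are searched by the optimizer. The length of the bit string is set by the number of sample points, which is why the search space stays four-dimensional for any number of bits.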

Feature Selection with AMABC Algorithms.
Angle Modulated ABC algorithms are wrapper methods for tackling feature selection. The basic process of feature selection with AMABC algorithms is presented in Figure 2. An AMABC algorithm uses an induction algorithm itself as a part of the evaluation function and searches for a good subset of features. We have used Support Vector Machines (SVM) [16] and k-nearest neighbours (kNN) as the induction algorithms [17], which are implemented in WEKA [18]. These induction algorithms are used to induce the final classification model as well.
In an AMABC algorithm, each solution contains the four-tuple ($a$, $b$, $c$, and $d$) of (10). For each solution, $x_i$, the output of the function with the given ($a$, $b$, $c$, and $d$) values is sampled to generate a candidate bit string $B_i = (B_i^1, B_i^2, \ldots, B_i^n)$, where $n$ is the number of features. A resultant bit string is composed of 1s and 0s, in which 1 indicates a selected feature and 0 an ignored feature. For example, in the bit string in Figure 1, only the second and the fifth features are selected; the other features are unselected. Then, the feature subset obtained in this way is passed to the induction algorithm, and the estimated classification accuracy is assigned as the value of the objective function of the selected feature subset. The algorithm tries to generate a new candidate feature subset by discovering new solutions at each step. When the termination criteria are satisfied, the feature subset having the best-so-far classification accuracy is selected as the solution of the problem. Then, the selected feature set is tested on test instances by using the induction algorithm to obtain the final result.
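The evaluation of one solution can be sketched as follows. This sketch does not use WEKA: it swaps in a leave-one-out 1-nearest-neighbour classifier written in plain Python, and the toy data, function names, and the zero score for an empty subset are all assumptions made for illustration.

```python
import math

def generate_bits(coeffs, n_features):
    """Decode a 4-tuple (a, b, c, d) into a 0/1 feature mask (1 where g > 0)."""
    a, b, c, d = coeffs
    return [
        1 if math.sin(2 * math.pi * (k - a) * b * math.cos(2 * math.pi * c * (k - a))) + d > 0 else 0
        for k in range(n_features)
    ]

def loo_1nn_accuracy(rows, labels, mask):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the masked features."""
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0  # assumption: an empty feature subset gets the worst score
    correct = 0
    for i, row in enumerate(rows):
        nearest = min(
            (k for k in range(len(rows)) if k != i),
            key=lambda k: sum((row[j] - rows[k][j]) ** 2 for j in selected),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(rows)

def objective(coeffs, rows, labels):
    """Score one 4-tuple solution: decode it to a mask, then evaluate that subset."""
    mask = generate_bits(coeffs, len(rows[0]))
    return loo_1nn_accuracy(rows, labels, mask), mask

# Toy data (hypothetical): only features 2 and 5 (indices 1 and 4) separate the classes.
rows = [
    [5, 0.0, 9, 1, 0.1, 3],
    [1, 0.1, 2, 7, 0.0, 8],
    [4, 1.0, 0, 6, 0.9, 2],
    [8, 0.9, 5, 0, 1.0, 6],
]
labels = [0, 0, 1, 1]
accuracy, mask = objective((0.6, 0.6, -0.2, 0.0), rows, labels)
```

An ABC variant would call such an objective once per candidate solution, so the cost of the classifier dominates the run time of the wrapper.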

Experimental Results
In this section, we analyse the feature selection performance of Angle Modulated ABC algorithms. Six ABC algorithms are selected for performance comparison, namely, original ABC (OABC), modified ABC (MABC), enhanced ABC (EABC), Gbest guided ABC (GABC), chaotic ABC (CABC), and GbestDist guided ABC (GDABC). Six real world datasets from the University of California, Irvine (UCI) repository [19] are used for the experiments. These datasets have been used in various machine learning studies, contain different numbers of features and classes, and are summarized in Table 1.
In the experiments, the number of function evaluations for all algorithms is set to $100 \times$ the number of features. We use default parameter values in the ABC algorithms; they are given in Table 2. All experiments were performed on a PC with an Intel Core i7 2620M 2.40 GHz CPU and 8 GB of RAM. In the experiments, the number of neighbours for kNN is set to 1, and 10-fold Cross-Validation (CV) is applied to get reliable accuracy results. All algorithms are run 20 times and mean values are calculated to get the feature selection accuracy. We conducted a nonparametric test, the Wilcoxon signed-rank test, to detect significant differences between the classification accuracies with and without feature selection. The statistical analyses are performed with a significance level $\alpha = 0.05$ to reject the null hypothesis, which is that no significant difference exists between the algorithms.
Tables 3 and 4 show the mean classification accuracy of each algorithm, with the standard deviation and the average selected feature size for each dataset, for SVM and kNN as the induction algorithm, respectively. The results that have the highest accuracy value are marked in bold in the tables.
When Table 3 is examined, the general accuracy of AMABC algorithms with SVM is not significantly better than the classification results without feature selection. However, AMABC algorithms select few features and still obtain average accuracy results similar to the results without feature selection. On the other hand, on the IS, SE, and DC datasets, AMABC algorithms significantly improve the classification accuracy with few features according to the pairwise Wilcoxon test. The MABC algorithm also gives better results than the other considered ABC variants.
According to Table 4, for the Soybean and Dermatology datasets, MABC again shows better accuracy performance. Original ABC ranks first on the Sonar dataset. For Ionosphere, the GDABC algorithm gets higher accuracy than the other algorithms. For Horse Colic, CABC, GABC, GDABC, and MABC reach the maximum accuracy rate. Finally, CABC and EABC achieve the highest average classification accuracy. When Table 4 is analysed, it is seen that higher accuracy is obtained with fewer features through feature selection. When all the algorithms are compared, it seems that none of the algorithms is dominant with respect to classification accuracy. However, the conducted Wilcoxon tests show a significant improvement of classification accuracy when feature selection is applied by AMABC algorithms for each dataset ($p$ values are lower than 0.05 for all cases).
When the results in Tables 3 and 4 are compared, it is clearly seen that kNN should be chosen as the induction algorithm for AMABC algorithms, although SVM selects fewer features to obtain reasonable results.

Conclusions
In this paper, we introduced the angle modulation technique for feature subset selection. The main advantage of the angle modulation technique for feature subset selection is that high-dimensional problems can be represented by a low-dimensional continuous optimization problem, and any optimization technique working in continuous space can be applied to solve optimal feature subset selection with less effort.
As a case study, six variants of ABC algorithms employed angle modulation for feature selection. In an experimental study, the feature selection performances of the original ABC algorithm and five other ABC variants are compared on six UCI datasets. The results show that feature selection with AMABC algorithms significantly improved the classification accuracy using fewer features.
Further research will be performed in order to improve performance accuracy by applying angle modulation to other evolutionary computation approaches such as Particle Swarm Optimization and Differential Evolution.

Figure 2: Summarization of the whole AMABC process for feature selection.

Table 1: Datasets used for experiments.

Table 2: Parameter setting for ABC algorithms.

Table 3: Classification accuracy of each ABC algorithm on the tested datasets by using SVM as an induction algorithm.

Table 4: Classification accuracy of each ABC algorithm on the tested datasets by using kNN as an induction algorithm.