Optimal feature subset selection is an important and difficult task in pattern classification, data mining, and machine intelligence applications. The objective of feature subset selection is to eliminate irrelevant and noisy features in order to select an optimal feature subset and increase accuracy. A large number of features in a dataset increases computational complexity and thus degrades performance. In this paper, to overcome this problem, the angle modulation technique is used to reduce the feature subset selection problem to a four-dimensional continuous optimization problem instead of representing the problem as a high-dimensional bit vector. To demonstrate the effectiveness of this problem representation with angle modulation and to determine the efficiency of the proposed method, six variants of the Artificial Bee Colony (ABC) algorithm employ angle modulation for feature selection. Experimental results on six high-dimensional datasets show that Angle Modulated ABC algorithms improve classification accuracy with smaller feature subsets.
1. Introduction
Many data mining and machine learning applications suffer from the curse of dimensionality: a dataset usually involves a large number of features, often a mix of relevant and irrelevant ones [1]. When the number of features is large, the cost of data acquisition increases, the performance of the classifier may degrade, and generalization from the data becomes more difficult [2]. To cope with this problem, feature selection is one of the methods that eliminate redundant, uninformative, and noisy features while preserving the accuracy of the selected feature subset.
Filter methods and wrapper methods are the two main strategies in feature selection [1]. In filter approaches, an algorithm selects the relevant feature subset based on data characteristics alone. In contrast, wrapper approaches include a classifier to evaluate candidate feature subsets. Although wrapper approaches involve the computational overhead of evaluating candidate feature subsets, they outperform filter approaches in terms of classification accuracy [3].
When a wrapper method is used, the problem of optimal feature subset selection can be seen as NP-hard, because the number of possible feature subsets in the search space is 2^n, where n is the number of features. Evolutionary computation techniques are well-known tools for tackling this kind of problem [2]. One of these techniques is the Artificial Bee Colony (ABC) algorithm, which mimics the foraging behaviour of real bee colonies. In recent years, a few studies on feature selection based on ABC algorithms have been proposed. Palanisamy and Kanmani [4] used ABC for feature selection; however, their paper uses only the original ABC and gives no information about the generation of the bit vector used in feature selection. Another binary ABC algorithm for feature selection is proposed in [5], but its search equation modifies the candidate solution without interacting with the other solutions. Thus, the algorithm turns into a randomized algorithm that generates solutions without any interaction within the population. Moreover, in these approaches, candidate solutions are represented by a bit vector of size n; for large-scale instances, this may increase runtime and decrease classification accuracy. Besides, there are also some applications that use ABC in the feature selection step. Syarifahadilah et al. [6] proposed a feature selection method for biomarker identification. Uzer et al. [7] developed an ABC-based feature selection algorithm to diagnose liver diseases and diabetes. Akila et al. [8] identified users based on analysis of human typing rhythm using an ABC-based feature selection method.
In this study, ABC algorithms employ angle-modulation-based bit vector generation for feature selection for the first time. In this approach, an ABC algorithm, called the Angle Modulated Artificial Bee Colony (AMABC) algorithm, selects candidate feature sets via a bit vector obtained from a bit string generator employing a trigonometric function. The main advantage of this approach is that an AMABC algorithm only has to optimize the trigonometric function's four parameters in the continuous domain. Thus, a high-dimensional binary search space can be represented by a 4-dimensional continuous search space for any dataset. Consequently, any ABC variant devised for continuous optimization problems in the literature can be used for the feature selection problem. To this end, we have adopted angle modulation in six ABC variants to show its significant effect on finding relevant feature subsets on dataset instances with many features. The comparison shows that ABC algorithms with angle modulated feature selection significantly improve classification accuracy using fewer features.
This paper is organized as follows. Section 2 briefly reviews the original ABC algorithms and six variants considered here. Section 3 elaborates application of angle modulation based ABC algorithms to feature selection. Experimental results are presented in Section 4. Finally, Section 5 concludes the paper.
2. Artificial Bee Colony Algorithm
2.1. The Original Artificial Bee Colony Algorithm
The Artificial Bee Colony (ABC) algorithm, which is inspired by the foraging behaviour of real bee colonies, was proposed for tackling optimization problems. It was first introduced by Karaboga [9] for bound-constrained continuous optimization problems. In the ABC algorithm, each candidate solution is regarded as a food source located in the D-dimensional search space. The nectar amount of a food source corresponds to the fitness value of a candidate solution.
Colony life is organized by division of labour. The colony comprises three types of bees, employed bees, onlooker bees, and scout bees, which are specialized for different tasks. The employed bees forage outside the hive and communicate with onlooker bees through a series of dances when they return to the hive with news of a discovered food source. The onlooker bees thereby obtain remarkably accurate information about the location and quality of the discovered food sources. The attractiveness of the dance, interpreted as the selection probability of a food source, recruits onlooker bees to search for new good food sources in the vicinity of the discovered one. A food source is abandoned because of its low quality; its employed bee then turns into a scout bee, which flies around looking for food in promising spots. Based on this phenomenon, the ABC algorithm is composed of four main steps: (1) Initialization Step, (2) Employed Bees Step, (3) Onlooker Bees Step, and (4) Scout Bees Step. Except for the Initialization Step, the algorithm repeats the other steps until a stopping criterion is satisfied. The detailed description of these steps is as follows.
(a) Initialization. A number of initial solutions are discovered or simply created within the bounds of the search space using the following formula [9]:

(1) x_{i,j} = x_j^{min} + φ_{i,j}(x_j^{max} − x_j^{min}),

where x_j^{min} is the lower bound and x_j^{max} is the upper bound of each decision variable j of a solution x_i, and φ_{i,j} is a uniformly distributed random number in [0, 1]. Furthermore, other control parameters, such as limit, which represents the maximum number of unsuccessful visits allowed for each solution, are initialized in this step.
(b) Employed Bees Step. At this step, each employed bee visits a solution x_i to discover a better candidate solution v_i with the formula [9]

(2) v_{i,j} = x_{i,j} + ϕ_{i,j}(x_{i,j} − x_{r1,j}),

where x_i is the reference solution, x_{r1} is a randomly selected solution, j is a randomly selected dimension (j ∈ {1, 2, …, D}), and ϕ_{i,j} is a random number uniformly distributed in [−1, 1]. If the candidate solution v_i is better than x_i, it replaces x_i and becomes the new solution. Otherwise, a counter that holds the number of unsuccessful trials of x_i is incremented.
(c) Onlooker Bees Step. The onlooker bees also try to discover new solutions around the visited solutions, as the employed bees do. However, in this step, information about the quality of the solutions discovered by the employed bees is shared with the onlooker bees. Therefore, solutions are not visited with equal probability; each solution has a selection probability defined as follows [9]:

(3) p_i = fitness_i / Σ_{n=1}^{SN} fitness_n,

where fitness_i is the fitness value of solution x_i, defined as

(4) fitness_i = 1 / (1 + f_i) if f_i ≥ 0, and fitness_i = 1 + |f_i| if f_i < 0,

where f_i is the objective value of solution x_i. The higher the quality of a solution, the higher the probability that it is visited by an onlooker bee.
(d) Scout Bees Step. A solution can be visited several times by employed and onlooker bees attempting to find new solutions. Once the number of unsuccessful trials reaches the limit value, the solution is marked as abandoned. Then, the employed bee responsible for the abandoned solution turns into a scout bee. A new solution is explored randomly by the scout bee using (1) from the Initialization Step.
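The four steps above can be summarized in a compact sketch. This is a minimal, illustrative Python implementation of the original ABC for minimization on a toy objective, not the implementation used in this paper; population size, limit, and evaluation budget are illustrative values.

```python
import random

def abc_minimize(f, dim, bounds, n_food=10, limit=20, max_evals=5000):
    """Minimal ABC sketch: Initialization, Employed, Onlooker, and Scout steps."""
    lo, hi = bounds
    # Initialization (Eq. 1): random food sources within the bounds.
    foods = [[lo + random.random() * (hi - lo) for _ in range(dim)]
             for _ in range(n_food)]
    vals = [f(x) for x in foods]
    trials = [0] * n_food
    evals = n_food

    def fitness(v):
        # Eq. (4): map an objective value to a positive fitness.
        return 1.0 / (1.0 + v) if v >= 0 else 1.0 + abs(v)

    def try_neighbour(i):
        nonlocal evals
        # Eq. (2): perturb one random dimension using a random partner r1 != i.
        r1 = random.choice([k for k in range(n_food) if k != i])
        j = random.randrange(dim)
        cand = foods[i][:]
        cand[j] = foods[i][j] + random.uniform(-1, 1) * (foods[i][j] - foods[r1][j])
        cand[j] = min(max(cand[j], lo), hi)
        v = f(cand)
        evals += 1
        if v < vals[i]:                       # greedy replacement
            foods[i], vals[i], trials[i] = cand, v, 0
        else:
            trials[i] += 1                    # count an unsuccessful trial

    while evals < max_evals:
        for i in range(n_food):               # Employed Bees Step
            try_neighbour(i)
        fits = [fitness(v) for v in vals]
        for _ in range(n_food):               # Onlooker Bees Step, Eq. (3) selection
            i = random.choices(range(n_food), weights=fits)[0]
            try_neighbour(i)
        for i in range(n_food):               # Scout Bees Step
            if trials[i] > limit:
                foods[i] = [lo + random.random() * (hi - lo) for _ in range(dim)]
                vals[i], trials[i] = f(foods[i]), 0
                evals += 1
    best = min(range(n_food), key=lambda i: vals[i])
    return foods[best], vals[best]
```

For example, `abc_minimize(lambda p: sum(t * t for t in p), dim=4, bounds=(-5.0, 5.0))` drives the sphere function close to zero within the evaluation budget.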
2.2. The Considered Artificial Bee Colony Variants
In this section, we briefly describe the five ABC variants that, together with the original ABC algorithm, we consider here as feature selection methods on various datasets.
Modified ABC (MABC) was proposed by Akay and Karaboga [10]. The MABC algorithm modifies (2) as follows:

(5) v_{i,j} = x_{i,j} + U(−SF, SF)(x_{i,j} − x_{r1,j}) if R_{i,j} < MR, and v_{i,j} = x_{i,j} otherwise,

where MR is the modification ratio, SF is the scaling factor, U(−SF, SF) is a uniform random number in [−SF, SF], and R_{i,j} is a uniform random number in [0, 1]. While MR controls the fraction of dimensions to be changed, SF adjusts the perturbation range.
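Unlike (2), which perturbs a single dimension, the MABC rule of (5) can change several dimensions in one update. A minimal sketch follows; the MR and SF values shown are illustrative defaults, not the settings used in this paper.

```python
import random

def mabc_candidate(x_i, x_r1, mr=0.4, sf=1.0):
    """MABC search (Eq. 5): each dimension j mutates with probability MR,
    using a uniform draw in [-SF, SF]; otherwise it is copied unchanged."""
    return [
        xi + random.uniform(-sf, sf) * (xi - xr) if random.random() < mr else xi
        for xi, xr in zip(x_i, x_r1)
    ]
```

With mr=0 the candidate equals the reference solution; with mr=1 every dimension is perturbed, which illustrates how MR interpolates between no change and a full-vector move.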
Gbest-guided ABC (GABC) [11, 12] uses information from the best solution found so far (x_{Gbest}) to enhance intensification in the search equation of the Employed Bees and Onlooker Bees Steps. The modified search equation is as follows:

(6) v_{i,j} = x_{i,j} + ϕ_{i,j}(x_{i,j} − x_{r1,j}) + ψ_{i,j}(x_{Gbest,j} − x_{r1,j}),

where x_{Gbest,j} is the jth dimension of x_{Gbest} and ψ_{i,j} is a uniform random number in [0, C]. C is a control parameter for adjusting the perturbation [11, 12]; it is set to a positive constant, usually 1 [11].
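The gbest-guided update of (6) can be sketched as follows; as in (2), a single random dimension j is updated, and C defaults to the commonly used value of 1.

```python
import random

def gabc_candidate(x_i, x_r1, x_gbest, C=1.0):
    """GABC search (Eq. 6) applied to one randomly chosen dimension j:
    the usual differential term plus a gbest-guided term scaled by psi in [0, C]."""
    j = random.randrange(len(x_i))
    cand = list(x_i)
    cand[j] = (x_i[j]
               + random.uniform(-1, 1) * (x_i[j] - x_r1[j])    # phi * (x_i - x_r1)
               + random.uniform(0, C) * (x_gbest[j] - x_r1[j]))  # psi * (gbest - x_r1)
    return cand
```

The extra term biases the move toward the best-so-far solution, which is the intensification effect described above.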
GbestDist-guided ABC (GDABC) [11] is an improved variant of GABC. It uses the same search equation as GABC,

(7) v_{i,j} = x_{i,j} + ϕ_{i,j}(x_{i,j} − x_{r1,j}) + ψ_{i,j}(x_{Gbest,j} − x_{r1,j}),

but the neighbour solution x_{r1} is chosen according to a probabilistic selection rule, where p_k is the probability of choosing neighbour x_k, loc_x is the location of a solution, and dist(loc_x, loc_y) is the Euclidean distance between two solution locations loc_x and loc_y [11].
Chaotic ABC (CABC) [13] algorithm has three variants. For the first variant of CABC, canonical uniform random number generator is replaced with a chaotic random generator using seven different chaotic maps for the Initialization Step. The second variant of CABC proposes chaotic search for Onlooker Bees Step after the number of trials of a solution reaches limit/2. The details of the chaotic search can be found in [13]. The third version, which we considered in this study, is the combination of the first two variants of CABC algorithm.
Enhanced ABC (EABC) [14] proposes two separate search equations for the Employed Bees Step and the Onlooker Bees Step to improve poor convergence performance. The search equation of the Employed Bees Step is defined as follows:

(8) v_{i,j} = x_{r1,j} + α_{i,j}(x_{Gbest,j} − x_{r1,j}) + β_{i,j}(x_{r1,j} − x_{r2,j}),

where α and β are random numbers in the range [0, A] and B · rand(0, 1), respectively; A is a nonnegative constant, and B is a random number drawn from a normal distribution with mean μ and standard deviation σ. For the Onlooker Bees Step, in order to enhance exploitation, x_{Gbest} is used in the third term of the search equation instead of x_{r2}:

(9) v_{i,j} = x_{r1,j} + α_{i,j}(x_{Gbest,j} − x_{r1,j}) + β_{i,j}(x_{r1,j} − x_{Gbest,j}).
Angle Modulated Artificial Bee Colony (AMABC) algorithms find solutions of binary optimization problems by reducing the problem to a four-dimensional continuous optimization problem. To do so, the algorithm generates bit strings by employing a trigonometric function derived from the angle modulation technique [15] used in telecommunication systems. The trigonometric function is composed of sine and cosine functions as follows:

(10) g(x) = sin(A · b · cos(A · c)) + d, where A = 2π(x − a).

g(x) has four coefficients (a, b, c, and d) that control the frequencies of the sine and cosine functions or shift the function vertically. The coefficient values let the function generate different signals over a given range; therefore, a number of bits can be generated by evaluating the function (at x values) taken from evenly spaced intervals. Figure 1 shows bit string generation using the trigonometric function with a = 0.6, b = 0.6, c = −0.2, and d = 0. With the range [0, 5] and an interval of 1, the bit string s = (0, 1, 0, 0, 1, 0) is generated by sampling g(x) at each point as follows:

(11) s(x) = 1 if g(x) > 0, and s(x) = 0 otherwise.

When angle modulation is used to generate bit strings, a binary problem can be presented as the task of finding the optimal coefficient values; the optimal binary solution to the original problem is then sampled from the resulting function at evenly spaced intervals. The advantages of this approach for ABC algorithms are as follows:
ABC algorithms were originally presented for bound-constrained continuous optimization. They perform well and are competitive with contemporary algorithms on continuous optimization problems. However, the superiority of binary ABC variants has not yet been established in the literature. With this approach, ABC algorithms search for appropriate coefficient values in continuous space instead of evolving bit strings in binary space.
This approach decreases the dimension of the problem. For example, a large scale binary problem instance can be represented by a four-dimensional problem instance in continuous space.
Several ABC variants proposed for continuous optimization can be applied easily to a binary optimization problem without modification on original implementation of the algorithm.
Figure 1: Generating a bit vector by sampling the function with a = 0.6, b = 0.6, c = −0.2, and d = 0.
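The sampling process of (10) and (11) can be sketched directly. The snippet below is an illustrative Python sketch (not the authors' implementation) that samples g(x) at integer points and thresholds at zero; with the coefficients of Figure 1, it reproduces the bit string (0, 1, 0, 0, 1, 0).

```python
import math

def generate_bits(a, b, c, d, n_bits):
    """Sample g(x) = sin(A*b*cos(A*c)) + d with A = 2*pi*(x - a)  (Eq. 10)
    at x = 0, 1, ..., n_bits-1 and threshold at zero (Eq. 11)."""
    bits = []
    for x in range(n_bits):
        A = 2 * math.pi * (x - a)
        g = math.sin(A * b * math.cos(A * c)) + d
        bits.append(1 if g > 0 else 0)
    return bits

# Figure 1's coefficients over the range [0, 5] with an interval of 1:
print(generate_bits(0.6, 0.6, -0.2, 0.0, 6))  # -> [0, 1, 0, 0, 1, 0]
```

Any ABC variant can thus evolve only the four coefficients (a, b, c, d), with the bit string recovered on demand for evaluation.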
3.1. Feature Selection with AMABC Algorithms
Angle Modulated ABC algorithms are wrapper methods for feature selection. The basic process of feature selection with AMABC algorithms is presented in Figure 2. An AMABC algorithm uses the induction algorithm itself as part of the evaluation function while it searches for a good subset of features. We use Support Vector Machines (SVM) [16] and K-nearest neighbours (KNN) [17] as the induction algorithms, as implemented in WEKA [18]. These induction algorithms are also used to induce the final classification model.
Figure 2: Summary of the whole AMABC process for feature selection.
In an AMABC algorithm, each solution is a four-tuple (a, b, c, d) of (10). For each solution k, the output of the function with the given (a, b, c, d) values is sampled to generate a candidate bit string s_k = (s_{k1}, s_{k2}, …, s_{kD}), where D is the number of features. The resulting bit string is composed of 1s and 0s, where 1 indicates a selected feature and 0 an ignored feature; for example, in the bit string of Figure 1, all features except the second and the fifth are unselected. The feature subset encoded this way is then passed to the induction algorithm, and the estimated classification accuracy is assigned as the objective value of the selected feature subset. The algorithm generates new candidate feature subsets by discovering new solutions at each step. When the termination criteria are satisfied, the feature subset with the best-so-far classification accuracy is selected as the solution to the problem. The selected feature set is then tested on the test instances using the induction algorithm to obtain the final result.
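The evaluation of one AMABC solution can be sketched as follows. The paper uses WEKA's SVM and KNN; purely for illustration, a self-contained 1-nearest-neighbour classifier with leave-one-out scoring stands in for the induction algorithm here, and the helper names are hypothetical.

```python
import math

def knn1_loo_accuracy(X, y):
    """Leave-one-out accuracy of a plain 1-nearest-neighbour classifier
    (illustrative stand-in for the WEKA induction algorithm)."""
    hits = 0
    for i, xi in enumerate(X):
        j = min((k for k in range(len(X)) if k != i),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(xi, X[k])))
        hits += y[j] == y[i]
    return hits / len(X)

def evaluate_solution(coeffs, X, y):
    """Decode a four-tuple (a, b, c, d) into a feature mask via angle
    modulation (Eqs. 10-11) and score the selected subset with the classifier."""
    a, b, c, d = coeffs
    n_features = len(X[0])
    mask = []
    for xpos in range(n_features):
        A = 2 * math.pi * (xpos - a)
        mask.append(1 if math.sin(A * b * math.cos(A * c)) + d > 0 else 0)
    if not any(mask):                 # empty subset: assign the worst score
        return 0.0, mask
    Xsub = [[v for v, m in zip(row, mask) if m] for row in X]
    return knn1_loo_accuracy(Xsub, y), mask
```

The returned accuracy is what the ABC search maximizes; the mask records which features the four coefficients currently select.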
4. Experimental Results
In this section, we analyse the feature selection performance of the Angle Modulated ABC algorithms. Six ABC algorithms are selected for the performance comparison, namely, the original ABC (OABC), modified ABC (MABC), enhanced ABC (EABC), Gbest-guided ABC (GABC), chaotic ABC (CABC), and GbestDist-guided ABC (GDABC). Six real-world datasets from the UCI Machine Learning Repository [19] are used for the experiments. These datasets have been used in various machine learning studies, cover a range of feature counts and class sizes, and are summarized in Table 1.
Table 1: Datasets used for experiments.

ID   Dataset name   # Features   # Instances   # Classes
HC   Horse Colic    22           368           2
AN   Anneal         38           898           6
IS   Ionosphere     34           351           2
SR   Sonar          60           208           2
SN   Soybean        35           683           19
DE   Dermatology    34           366           6
In the experiments, the number of function evaluations for all algorithms is set to 100 × (number of features). We use the default parameter values of the ABC algorithms, given in Table 2. All experiments were performed on a PC with an Intel Core i7 2620M 2.40 GHz CPU and 8 GB of RAM. In the experiments, the number of neighbours for KNN is set to 1, and 10-fold Cross-Validation (CV) is applied to obtain reliable accuracy results.
Table 2: Parameter settings for the ABC algorithms.

Control parameter   Value   Algorithm
Population Size     20      All algorithms
limitF              1       All algorithms
mu                  −1      CABC
C                   −1      GABC, GDABC
MR                  −1      MABC
SF                  −1      MABC
All algorithms are run 20 times, and mean values are used to report feature selection accuracy. We conducted a nonparametric test, the Wilcoxon signed-rank test, to detect significant differences between the classification accuracies with and without feature selection. The statistical analyses are performed at a significance level of α = 0.05; the null hypothesis is that no significant difference exists between the algorithms.
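For reference, the Wilcoxon signed-rank statistic used in the comparison can be computed as follows. This is a minimal stdlib sketch (zero differences dropped, tied absolute differences given average ranks); in practice a library routine such as scipy.stats.wilcoxon would be used to obtain the p-value.

```python
def wilcoxon_signed_rank(before, after):
    """Return (W, n): W = min(W+, W-) over the signed ranks of the
    nonzero paired differences, and n = number of nonzero differences."""
    diffs = [b - a for a, b in zip(before, after) if b - a != 0]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # extend the tie group of equal absolute differences
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1          # average rank of the group (1-based)
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(w_plus, w_minus), len(diffs)
```

A small W relative to the critical value for the given n indicates a significant difference between the paired accuracy samples.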
Tables 3 and 4 show the mean value of classification accuracy of each algorithm with the standard deviation and average selected feature size for each dataset for SVM and KNN as the induction algorithm, respectively. The results that have the highest accuracy value among others are marked in bold in the tables.
Table 3: Classification accuracy of each ABC algorithm on the tested datasets, using SVM as the induction algorithm.

                             SN             SR             IS             DE             HC             AN
Without feature selection
  Accuracy (%)               91.20 ± 0.24   86.15 ± 0.87   87.09 ± 0.48   94.64 ± 0.39   79.10 ± 1.21   99.13 ± 0.11
  Avg. number of features    35             60             34             34             22             38
CABC
  Accuracy (%)               85.35 ± 0.72   70.67 ± 0.93   94.30 ± 1.45   97.26 ± 5.22   85.86 ± 1.37   97.77 ± 0.15
  Avg. number of features    12.3           17.6           9.2            13.6           4.5            8.4
EABC
  Accuracy (%)               85.94 ± 0.52   73.07 ± 0.01   94.01 ± 0.36   97.26 ± 0.08   85.86 ± 0.50   97.77 ± 0.21
  Avg. number of features    7.8            10.2           8.9            13.4           4.1            7.9
GABC
  Accuracy (%)               85.51 ± 0.31   72.11 ± 1.23   94.30 ± 0.55   96.45 ± 0.88   86.14 ± 0.10   97.77 ± 0.23
  Avg. number of features    9.1            18.0           6.7            8.3            5.6            13.7
GDABC
  Accuracy (%)               85.36 ± 0.59   73.55 ± 1.03   94.30 ± 0.18   96.72 ± 0.17   85.86 ± 0.93   97.77 ± 0.08
  Avg. number of features    9.4            22.2           11.8           8.9            4.2            15.8
MABC
  Accuracy (%)               86.67 ± 0.62   75.96 ± 1.22   95.16 ± 0.46   98.63 ± 0.68   86.41 ± 0.27   97.77 ± 0.01
  Avg. number of features    12.3           8.5            7.4            8.5            3.9            10.0
OABC
  Accuracy (%)               84.77 ± 0.65   73.08 ± 0.61   94.30 ± 0.47   96.99 ± 0.30   85.87 ± 1.35   97.77 ± 0.35
  Avg. number of features    5.8            10.3           5.7            4.9            5.9            15.7
Table 4: Classification accuracy of each ABC algorithm on the tested datasets, using KNN as the induction algorithm.

                             SN             SR             IS             DE             HC             AN
Without feature selection
  Accuracy (%)               91.20 ± 0.24   86.15 ± 0.87   87.09 ± 0.48   94.64 ± 0.39   79.10 ± 1.21   99.13 ± 0.11
  Avg. number of features    35             60             34             34             22             38
CABC
  Accuracy (%)               93.92 ± 0.34   88.32 ± 0.90   92.19 ± 0.46   97.43 ± 0.29   85.87 ± 0      99.44 ± 0.12
  Avg. number of features    24.3           34             9.3            20.6           9.5            25.2
EABC
  Accuracy (%)               93.66 ± 0.31   87.40 ± 0.54   91.31 ± 0.33   95.57 ± 0.31   84.46 ± 0.50   99.44 ± 0.12
  Avg. number of features    26.8           35.5           10.3           20.8           12.1           26.3
GABC
  Accuracy (%)               93.79 ± 0.37   88.89 ± 1.07   91.57 ± 0.50   96.58 ± 0.58   85.87 ± 0      99.42 ± 0.13
  Avg. number of features    24.5           33.9           11.7           21.2           9.3            29.2
GDABC
  Accuracy (%)               93.66 ± 0.29   88.51 ± 0.94   92.39 ± 0.42   96.04 ± 0.51   85.87 ± 0      99.41 ± 0.10
  Avg. number of features    24.4           32.7           9.8            22.5           10             28.2
MABC
  Accuracy (%)               94.06 ± 0.35   88.70 ± 1.09   91.54 ± 0.53   97.98 ± 0.32   85.87 ± 0      99.41 ± 0.10
  Avg. number of features    25             29.8           8.4            20.2           7.1            27.8
OABC
  Accuracy (%)               93.48 ± 0.34   92.74 ± 0.86   91.40 ± 0.35   97.43 ± 0.29   85.27 ± 0.28   99.41 ± 0.10
  Avg. number of features    25.1           33.1           11.6           7.8            7.8            25.8
When Table 3 is examined, the overall accuracy of the AMABC algorithms with SVM is not significantly better than the classification results without feature selection. However, the AMABC algorithms select far fewer features while obtaining similar average accuracy compared to the results without feature selection. Moreover, on the IS, HC, and DE datasets, the AMABC algorithms significantly improve classification accuracy with few features according to the pairwise Wilcoxon test. The MABC algorithm also gives better results than the other considered ABC variants.
According to Table 4, for the Soybean and Dermatology datasets, MABC again shows the best accuracy performance. The original ABC ranks first on the Sonar dataset. For Ionosphere, the GDABC algorithm attains higher accuracy than the other algorithms. For Horse Colic, CABC, GABC, GDABC, and MABC achieve the maximum accuracy rate. Finally, for Anneal, CABC and EABC achieve the highest classification accuracy. When Table 4 is analysed, it can be seen that higher accuracy is obtained with fewer features through feature selection. When all the algorithms are compared, none is dominant with respect to classification accuracy. However, the conducted Wilcoxon tests show a significant improvement in classification accuracy when feature selection is applied by the AMABC algorithms on each dataset (p values are below 0.05 in all cases).
When the results in Tables 3 and 4 are compared, it is clear that KNN should be chosen as the induction algorithm for the AMABC algorithms, although with SVM fewer features are selected to obtain reasonable results.
5. Conclusions
In this paper, we introduced the angle modulation technique for feature subset selection. The main advantage of the angle modulation technique for feature subset selection is that high-dimensional problems can be represented by a low-dimensional continuous optimization problem, so any optimization technique working in continuous space can be applied to optimal feature subset selection with little extra effort.
As a case study, six variants of the ABC algorithm employed angle modulation for feature selection. In an experimental study, the feature selection performance of the original ABC algorithm and five other ABC variants was compared on six UCI datasets. The results show that feature selection with AMABC algorithms significantly improved classification accuracy with smaller feature subsets.
Further research will apply angle modulation to other evolutionary computation approaches, such as Particle Swarm Optimization and Differential Evolution, in order to further improve accuracy.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References

[1] B. Xue, M. Zhang, and W. N. Browne, "Particle swarm optimisation for feature selection in classification: novel initialisation and updating mechanisms."
[2] S. Tabakhi, P. Moradi, and F. Akhlaghian, "An unsupervised feature selection algorithm based on ant colony optimization."
[3] R. Kohavi and G. H. John, "Wrappers for feature subset selection."
[4] S. Palanisamy and S. Kanmani, "Artificial Bee Colony approach for optimizing feature selection."
[5] M. Schiezaro and H. Pedrini, "Data feature selection based on Artificial Bee Colony algorithm."
[6] M. Y. Syarifahadilah, R. Abdullah, and I. Venkat, "ABC algorithm as feature selection for biomarker discovery in mass spectrometry analysis," in Proceedings of the 4th Conference on Data Mining and Optimization (DMO '12), Langkawi, Malaysia, September 2012, IEEE, pp. 67–72.
[7] M. S. Uzer, N. Yilmaz, and O. Inan, "Feature selection method based on artificial bee colony algorithm and support vector machines for medical datasets classification."
[8] M. Akila, V. Suresh Kumar, N. Anusheela, and K. Sugumar, "A novel feature subset selection algorithm using artificial bee colony in keystroke dynamics."
[9] D. Karaboga, "An idea based on honey bee swarm for numerical optimization," Tech. Rep. TR06, Erciyes University, Kayseri, Turkey, 2005.
[10] B. Akay and D. Karaboga, "A modified artificial bee colony algorithm for real-parameter optimization."
[11] K. Diwold, A. Aderhold, A. Scheidler, and M. Middendorf, "Performance evaluation of artificial bee colony optimization and new selection schemes."
[12] G. Zhu and S. Kwong, "Gbest-guided artificial bee colony algorithm for numerical function optimization."
[13] B. Alatas, "Chaotic bee colony algorithms for global numerical optimization."
[14] W. Feng Gao, S. Yang Liu, and L. Ling Huang, "Enhancing artificial bee colony algorithm using more information-based search equations."
[15] G. Pampara, N. Franken, and A. P. Engelbrecht, "Combining particle swarm optimisation with angle modulation to solve binary problems," in Proceedings of the IEEE Congress on Evolutionary Computation (CEC '05), vol. 1, Edinburgh, Scotland, September 2005, IEEE, pp. 89–96.
[16] J. Weston, S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio, and V. Vapnik, "Feature selection for SVMs," in Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS '00), vol. 12, Denver, Colo, USA, 2000, pp. 668–674.
[17] T. Cover and P. Hart, "Nearest neighbor pattern classification."
[18] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA data mining software: an update."
[19] K. Bache and M. Lichman, UCI Machine Learning Repository, 2013, http://archive.ics.uci.edu/ml.