A Comparative Analysis of Swarm Intelligence Techniques for Feature Selection in Cancer Classification

Feature selection in cancer classification is a central area of research in bioinformatics. It is used to select the informative genes from the thousands of genes measured on a microarray. The genes are ranked based on t-statistics, signal-to-noise ratio (SNR), and F-test values. A swarm intelligence (SI) technique then finds the informative genes among the top-m ranked genes, and these selected genes are used for classification. In this paper, shuffled frog leaping with Lévy flight (SFLLF) is proposed for feature selection. In SFLLF, Lévy flight is included to avoid premature convergence of the shuffled frog leaping (SFL) algorithm. The SI techniques particle swarm optimization (PSO), cuckoo search (CS), SFL, and SFLLF are used for feature selection to identify informative genes for classification, and the k-nearest neighbour (k-NN) technique is used to classify the samples. The proposed work is applied to 10 benchmark datasets and examined with each SI technique. The experimental results show that the k-NN classifier with SFLLF feature selection outperforms PSO, CS, and SFL.


Introduction
Numerous methods and techniques have been proposed for cancer classification using microarray gene expression data. Recent advances in microarray technology allow the expression levels of tens of thousands of genes to be measured simultaneously in a single experiment at a reasonable cost. Gene expression profiling by microarray has emerged as a powerful technique for the classification and diagnostic prediction of cancer.
The raw microarray data are images that are transformed into gene expression matrices. The rows of the matrix correspond to genes, and the columns represent samples or trial conditions. The number in each cell is the expression level of a particular gene in a particular sample or condition. Expression levels can be absolute or relative. Similar rows suggest that the corresponding genes are coregulated and possibly functionally related. By comparing samples, differentially expressed genes can be identified. The major limitation of gene expression data is its high dimensionality: it contains a very large number of genes but very few samples. A number of gene selection methods have been introduced to select the informative genes for cancer prediction and diagnosis. Feature (gene) selection methods remove irrelevant and redundant features to improve classification accuracy. In this work, the informative genes are identified from the microarray data based on their t-statistics, SNR, and F-test values.
PSO is an SI technique proposed by Kennedy and Eberhart [1] that simulates the flocking behaviour of birds. Yang and Deb [2] proposed CS, inspired by the breeding behaviour of cuckoos. SFL is a memetic metaheuristic that combines two search techniques: the local search of PSO and the competitive mixing of shuffled complex evolution [3]. The randomness in SFL may fail to cover an effective area of the search space, or it may repeatedly return the same worst solution. To avoid this, the proposed work adopts Lévy flight for position change. The SI techniques PSO, CS, SFL, and SFLLF are used for feature selection.

Related Work.
The Scientific World Journal
In this section, work related to gene selection and cancer classification using microarray gene expression data is discussed. Jirapech-Umpai and Stuart [4] used an evolutionary algorithm to identify a near-optimal set of predictive genes that classify the data. Vanichayobon et al. [5] used a self-organizing map, combined with an important-gene selection step, to cluster cancer data. Wang and Gotoh [6] proposed a rough set concept with dependent degrees; in this method, they screened a small number of informative single genes and gene pairs on the basis of their dependent degrees.
Martinez et al. [7] proposed a swarm intelligence feature selection algorithm based on the initialization and update of only a subset of particles in the swarm. Chopra et al. [8] introduced the concept of gene doublets, based on gene pair combinations. Liu et al. [9] applied a new ensemble gene selection method to choose multiple gene subsets for classification, where the significance of a gene was measured by conditional mutual information or its normalized form.
Chuang et al. [10] proposed a hybrid method consisting of correlation-based feature selection and Taguchi chaotic binary PSO. Dagliyan et al. [11] proposed a hyperbox enclosure (HBE) method based on mixed integer programming for the classification of some cancer types with a minimal set of predictor genes. Wang and Simon [12] explored the use of a single gene to construct a classification model. Their method first identified the genes with the most powerful univariate class discrimination ability and then constructed simple classification rules for class prediction using a single gene.
Chandra and Gupta [13] proposed an efficient feature selection approach, termed effective range based gene selection (ERGS), which is based on a statistically defined effective range of features for every class. Lee et al. [14] suggested the biomarker identifier (BMI), which identifies features able to distinguish between two data groups of interest. Li et al. [15] designed margin influence analysis (MIA) to work with SVM for selecting informative genes. Mishra and Sahu [16] proposed a model for feature selection using signal-to-noise ratio (SNR) ranking.
Huang et al. [17] presented an improved semisupervised local Fisher discriminant (iSELF) analysis for gene expression data classification. Alonso-González et al. [18] proposed a method that relaxes the maximum accuracy criterion when selecting the combination of attribute selection and classification algorithm. Maji [19] proposed a quantitative measure based on mutual information that incorporates the information of sample categories to measure the similarity between attributes. Sharma et al. [20] proposed a feature selection algorithm that divides the genes into subsets to find the informative genes.

Gene Selection
t-Statistics.
The t-statistic [21] is used to find the degree of gene expression difference between normal and tumor tissues. The top-m genes with the largest t-statistic are selected for inclusion in the discriminant analysis. Consider

$$ t = \frac{\mu_1 - \mu_2}{\sqrt{\dfrac{v_1}{n_1} + \dfrac{v_2}{n_2}}} \tag{1} $$

Here $\mu_1$: mean of normal samples, $\mu_2$: mean of tumor samples, $n_1$: normal sample size, $n_2$: tumor sample size, $v_1$: variance of normal samples, and $v_2$: variance of tumor samples.
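The ranking step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes each gene is given as a pair of lists (normal values, tumor values), uses the sample (n − 1) variance, and ranks by |t| so that genes over-expressed in either class are kept. The function names are hypothetical.

```python
import math
from statistics import mean, variance


def t_statistic(normal, tumor):
    """t-statistic of one gene following equation (1).

    statistics.variance is the sample (n - 1) variance; the text does not
    state which estimator is used, so this is an assumption.
    """
    mu1, mu2 = mean(normal), mean(tumor)
    v1, v2 = variance(normal), variance(tumor)
    n1, n2 = len(normal), len(tumor)
    return (mu1 - mu2) / math.sqrt(v1 / n1 + v2 / n2)


def top_m_by_t(genes, m):
    """Indices of the m genes with the largest |t| (absolute value is an assumption).

    genes: list of (normal_values, tumor_values) pairs, one pair per gene.
    """
    scores = [abs(t_statistic(nrm, tum)) for nrm, tum in genes]
    order = sorted(range(len(genes)), key=lambda g: scores[g], reverse=True)
    return order[:m]
```

For example, with `genes = [([1.0, 1.2, 0.9], [3.0, 3.1, 2.9]), ([1.0, 1.1, 1.0], [1.0, 1.05, 1.1])]`, the first gene separates the classes clearly (t ≈ −18.7) and `top_m_by_t(genes, 1)` selects it.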

Signal-to-Noise Ratio.
An important measure used to find the significance of genes is the Pearson correlation coefficient. Golub et al. [22] modified it to emphasize the "signal-to-noise ratio" of a gene used as a predictor, and Xiong et al. [23] used this predictor to measure the prediction strength of a particular gene. The signal-to-noise ratio $PS(g)$ of a gene $g$ is calculated by

$$ PS(g) = \frac{\mu_1 - \mu_2}{\sigma_1 + \sigma_2} \tag{2} $$

Here $\mu_1$: mean of normal samples, $\mu_2$: mean of tumor samples, $\sigma_1$: standard deviation of normal samples, and $\sigma_2$: standard deviation of tumor samples. This value reveals the difference between the classes relative to the standard deviation within the classes. Large values of $|PS(g)|$ indicate a strong correlation between the gene expression and the class distinction, while a positive or negative sign of $PS(g)$ indicates that the gene is more highly expressed in class 1 or class 2, respectively. Genes with large SNR values are informative and are selected for cancer classification.
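A minimal Python sketch of the SNR score and the corresponding top-m selection follows. It assumes sample standard deviations (`statistics.stdev`) and ranks by |PS(g)|; both choices are assumptions, and the function names are hypothetical.

```python
from statistics import mean, stdev


def snr(normal, tumor):
    """Signal-to-noise ratio PS(g) = (mu1 - mu2) / (sigma1 + sigma2), as in equation (2)."""
    return (mean(normal) - mean(tumor)) / (stdev(normal) + stdev(tumor))


def top_m_by_snr(genes, m):
    """Indices of the m genes with the largest |PS(g)|.

    genes: list of (normal_values, tumor_values) pairs, one pair per gene.
    """
    scores = [abs(snr(nrm, tum)) for nrm, tum in genes]
    return sorted(range(len(genes)), key=scores.__getitem__, reverse=True)[:m]
```

The sign convention matches the text: `snr([1.0, 3.0], [5.0, 7.0])` is negative (the gene is higher in class 2), while swapping the arguments gives a positive value.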

F-Test.
The F-test statistic is the ratio of the variances of two sets of values. It is used to test whether the standard deviations of two populations are equal or whether the standard deviation of one population is less than that of another. In this work, the two-tailed F-test value is used to compare the variances of normal samples and tumor samples. The F-test value of a gene is calculated using (3). The top-m genes with the smallest F-test value are selected for further analysis. Consider

$$ F = \frac{v_1}{v_2} \tag{3} $$

Here $v_1$: variance of normal samples and $v_2$: variance of tumor samples.

Swarm Intelligence Techniques
Particle Swarm Optimization.
In PSO, each single solution is like a "bird" in the search space and is called a "particle." All particles have fitness values, which are evaluated by the fitness function to be optimized, and velocities, which direct the flying of the particles. The particles fly through the problem space by following the particles with the best solutions so far.
The original PSO formulae define each particle as a potential solution to a problem in $D$-dimensional space. The position of particle $i$ is represented as $X_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$. Each particle also maintains a memory of its previous best position, represented as $P_i = (p_{i1}, p_{i2}, \ldots, p_{iD})$. Since a particle in a swarm is moving, it has a velocity, which can be represented as $V_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$.
Each particle knows its own best value so far (pbest) and the best value so far in the group (gbest) among all pbests. This information tells each particle how the other particles around it have performed. Each particle tries to modify its position using the following information: (i) the distance between the current position and pbest, and (ii) the distance between the current position and gbest.
This modification can be represented by the concept of velocity. The velocity of each agent is modified by (4). The inclusion of an inertia weight in the PSO algorithm was first reported by Eberhart and Shi [24]. Consider

$$ v_{id}(t+1) = w\,v_{id}(t) + c_1\,\mathrm{rand}()\,\bigl(p_{id} - x_{id}(t)\bigr) + c_2\,\mathrm{rand}()\,\bigl(p_{gd} - x_{id}(t)\bigr) \tag{4} $$

where $i$: index of the particle, $i \in \{1, \ldots, n\}$; $n$: population size; $d$: dimension, $d \in \{1, \ldots, D\}$; rand(): uniformly distributed random variable between 0 and 1; $v_{id}$: velocity of particle $i$ on dimension $d$; $x_{id}$: current position of particle $i$ on dimension $d$; $c_1$: relative influence of the cognitive component (self-confidence factor); $c_2$: relative influence of the social component (swarm confidence factor); $p_{id}$: personal best (pbest) of particle $i$; $p_{gd}$: global best (gbest) of the group; and $w$: inertia weight.
The current position, that is, the searching point in the solution space, is modified by

$$ x_{id}(t+1) = x_{id}(t) + v_{id}(t+1) \tag{5} $$

All swarm particles tend to move towards better positions; hence, the best position (i.e., the optimum solution) can eventually be obtained through the combined effort of the whole population. The PSO algorithm is simple, easy to implement, and computationally efficient.
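The velocity and position updates of equations (4) and (5) for a single particle can be sketched as below. This is an illustration under stated assumptions: the default values of `w`, `c1`, and `c2` are placeholders, not the paper's settings, and `rand()` is drawn independently for every dimension, which is a common but not universal choice.

```python
import random


def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=random):
    """One PSO update for a single particle.

    x, v, pbest: current position, velocity, and personal best (lists of floats).
    gbest: best position found by the whole swarm.
    w, c1, c2: inertia weight and acceleration factors (illustrative defaults).
    Returns the new position and new velocity.
    """
    new_v = []
    for d in range(len(x)):
        cognitive = c1 * rng.random() * (pbest[d] - x[d])  # pull towards pbest
        social = c2 * rng.random() * (gbest[d] - x[d])      # pull towards gbest
        new_v.append(w * v[d] + cognitive + social)         # equation (4)
    new_x = [xd + vd for xd, vd in zip(x, new_v)]           # equation (5)
    return new_x, new_v
```

Passing a seeded `random.Random` as `rng` makes a run reproducible, which is convenient when comparing the SI techniques on the same data.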

Cuckoo Search.
Cuckoo search is an optimization technique developed by Yang and Deb in 2009, based on the brood parasitism of some cuckoo species, which lay their eggs in the nests of other host birds. If a host bird discovers eggs that are not its own, it either throws these foreign eggs away or simply abandons its nest and builds a new one elsewhere. Each egg in a nest represents a solution, and a cuckoo egg represents a new solution. A better new solution (cuckoo) replaces a worse solution in the nest. In the simplest form, each nest has one egg. A new solution is generated by Lévy flight. The rules for CS are as follows: (i) each cuckoo lays one egg at a time and dumps it in a randomly chosen nest; (ii) the best nests with high-quality eggs carry over to the next generations; (iii) the number of available host nests is fixed, and a host can discover a foreign egg with a probability $p_a \in [0, 1]$. In this case, the host bird can either throw the egg away or abandon the nest and build a completely new nest in a new location.
When generating a new solution $x_i(t+1)$ for cuckoo $i$, a Lévy flight is performed using

$$ x_i(t+1) = x_i(t) + \alpha \oplus \text{Lévy}(\lambda) $$

where $\alpha > 0$ is the step size. The symbol ⊕ denotes entrywise multiplication. Lévy flights provide a random walk whose random steps are drawn from a Lévy distribution for large steps:

$$ \text{Lévy} \sim u = t^{-\lambda}, \quad 1 < \lambda \le 3 $$

This distribution has an infinite variance with an infinite mean. Here the consecutive jumps of a cuckoo essentially form a random walk process that obeys a power-law step-length distribution with a heavy tail.
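The paper does not say how Lévy steps are sampled. A common choice in cuckoo search implementations is Mantegna's algorithm, which is used in the sketch below as a swapped-in technique (not necessarily the authors' method). Its exponent `beta` corresponds to λ − 1 in the power law above, and `beta = 1.5` and `alpha = 0.01` are assumed defaults.

```python
import math
import random


def mantegna_sigma(beta):
    """Scale of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(beta=1.5, rng=random):
    """One heavy-tailed step u / |v|^(1/beta), with u ~ N(0, sigma^2) and v ~ N(0, 1)."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def cuckoo_move(x, alpha=0.01, beta=1.5, rng=random):
    """New solution x(t+1) = x(t) + alpha (entrywise) Levy, one step per dimension."""
    return [xi + alpha * levy_step(beta, rng) for xi in x]
```

Most steps are small, but the heavy tail occasionally produces a long jump, which is what lets a solution escape a poor region of the search space.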

Shuffled Frog Leaping.
SFL is a swarm intelligence based metaheuristic optimization algorithm proposed by Eusuff and Lansey [25] to solve discrete combinatorial optimization problems. A group of frogs leaping in a swamp is considered, and the swamp has a number of stones at distinct locations onto which the frogs can leap to find the stone with the maximum amount of available food. The frogs are allowed to communicate with each other so that they can improve their memes using each other's information. A frog's position is altered by changing its leaping step, which improves its meme. The search begins with a randomly selected population of frogs covering the entire swamp. The population is partitioned into several parallel groups (memeplexes) that are permitted to evolve independently and search the space in different directions. Within each memeplex, the frogs are infected by other frogs' ideas; hence, they experience a memetic evolution.
Memetic evolution improves the quality of an individual's meme and enhances the individual frog's performance towards a goal. To ensure that the infection process is competitive, frogs with better memes (ideas) are required to contribute more to the development of new ideas than frogs with poor ideas. Selecting frogs using a triangular probability distribution gives a competitive advantage to better ideas. During the evolution, the frogs may change their memes using information from the memeplex best or the best of the entire population. Incremental changes in memotype(s) correspond to a leaping step size, and the new meme corresponds to the frog's new position. After an individual frog has improved its position, it is returned to the community. The information gained from a change in position is immediately available to be improved upon further.
After a certain number of memetic evolution loops, the memeplexes are forced to mix, and new memeplexes are formed through a shuffling process. Shuffling enhances the quality of the memes, because frogs are infected by frogs from different regions of the swamp. Migration of frogs accelerates the search by sharing their experience in the form of infection, and it ensures that the cultural evolution towards any particular interest is free from regional bias.
Here, the population consists of a set of frogs (solutions) that is partitioned into subsets referred to as memeplexes. The different memeplexes are considered different cultures of frogs, each performing a local search. Within each memeplex, the individual frogs hold ideas that can be influenced by the ideas of other frogs and evolve through a process of memetic evolution. After a defined number of memetic evolution steps, ideas are passed among memeplexes in a shuffling process. The local search and the shuffling processes continue until the defined convergence criteria are satisfied. An initial population of frogs is created randomly. For $D$-dimensional problems ($D$ variables), frog $i$ is represented as $X_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$. Afterwards, the frogs are sorted in descending order of fitness. Then, the entire population is divided into $m$ memeplexes, each containing $n$ frogs ($m \times n$ frogs in total). In this process, the first frog goes to the first memeplex, the second frog goes to the second memeplex, frog $m$ goes to the $m$th memeplex, frog $m + 1$ goes back to the first memeplex, and so forth. Within each memeplex, the frogs with the best and the worst fitness are identified as $X_b$ and $X_w$, respectively. The frog with the global best fitness is identified as $X_g$. Then, a process similar to PSO is applied to improve only the frog with the worst fitness (not all frogs) in each cycle.
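The partitioning and worst-frog improvement of standard SFL can be sketched as follows. Assumptions of this sketch: fitness is maximised (classification accuracy); a move is accepted when it is not worse, mirroring the "<" test in the SFLLF pseudocode; the step cap $D_{max}$ of the original SFL is omitted; and the random restart draws from [0, 1]^D to match the 0.5 gene-selection threshold. Function names are hypothetical.

```python
import random


def partition_memeplexes(frogs, fitness, m):
    """Sort by descending fitness and deal frogs round-robin into m memeplexes.

    Frog 1 goes to memeplex 1, frog m to memeplex m, frog m + 1 back to memeplex 1, ...
    """
    ranked = sorted(frogs, key=fitness, reverse=True)
    return [ranked[k::m] for k in range(m)]


def improve_worst(memeplex, global_best, fitness, rng=random):
    """Move only the worst frog of one memeplex (modified in place and returned).

    Tries a leap towards the memeplex best X_b, then towards the global best X_g;
    if neither is at least as good, the worst frog is replaced by a random frog.
    """
    best = max(memeplex, key=fitness)
    worst_idx = min(range(len(memeplex)), key=lambda i: fitness(memeplex[i]))
    worst = memeplex[worst_idx]
    for leader in (best, global_best):
        r = rng.random()
        candidate = [w + r * (b - w) for w, b in zip(worst, leader)]
        if fitness(candidate) >= fitness(worst):
            memeplex[worst_idx] = candidate
            return memeplex
    memeplex[worst_idx] = [rng.random() for _ in worst]  # random restart
    return memeplex
```

With fitness `f[0]` and frogs `[[0.1], [0.9], [0.5], [0.7]]`, two memeplexes are formed as `[[0.9], [0.5]]` and `[[0.7], [0.1]]`, which is the round-robin assignment described above.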

Feature Selection Based on Swarm Intelligence Techniques
The statistical measures are used to identify the top-m genes, and these genes are then used for feature selection with PSO, CS, SFL, and SFLLF. Figure 1 gives the schematic representation of the proposed method. Figure 2 shows the candidate solution representation (the particle position for PSO, the egg for CS, and the frog for SFL and SFLLF) using the top-m informative genes obtained from the statistical techniques. The most common way of encoding feature selection is a binary string, but the above optimization techniques are designed for continuous optimization problems, so the candidate solutions are encoded as continuous values.

Candidate Solution Representation.
Random values are generated for each gene position. A gene is selected when the value in its position is greater than 0.5; otherwise, it is ignored.
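The 0.5-threshold decoding can be illustrated with a short sketch. It assumes that position $j$ of a candidate solution corresponds to the $j$-th of the top-m ranked genes and that initial values are drawn uniformly from [0, 1]. The function names are hypothetical.

```python
import random


def random_solution(m, rng=random):
    """Random initial candidate: one uniform value in [0, 1) per top-m gene."""
    return [rng.random() for _ in range(m)]


def decode(position, top_m_genes, threshold=0.5):
    """Keep the genes whose position value exceeds the threshold (0.5 in the text)."""
    return [gene for gene, value in zip(top_m_genes, position) if value > threshold]
```

For instance, `decode([0.7, 0.2, 0.5, 0.9], [12, 40, 7, 3])` returns `[12, 3]`. A value of exactly 0.5 is not greater than the threshold, so gene 7 is ignored.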

Fitness Function.
The accuracy of the k-NN classifier is used as the fitness function [26, 27] for the SI techniques. The fitness function is defined as

$$ \text{fitness}(x) = \text{Accuracy}(x) $$

Accuracy($x$) is the accuracy on the test data of the k-NN classifier built with the feature subset $x$ selected from the training data. The classification accuracy of k-NN is given by

$$ \text{Accuracy}(x) = \frac{c}{N} \times 100 $$

where $c$: number of test samples classified correctly by k-NN and $N$: total number of samples in the test data.
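The fitness evaluation can be sketched with a plain-Python k-NN. Assumptions of this sketch: Euclidean distance, majority vote with ties resolved in favour of the nearer neighbour, and a fitness of 0 for an empty gene subset. The paper does not state these details, and the function names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def fitness(genes, train_X, train_y, test_X, test_y, k=5):
    """Accuracy (%) of k-NN on the test data using only the selected gene columns."""
    if not genes:
        return 0.0  # an empty subset cannot classify anything (assumption)

    def project(row):
        return [row[g] for g in genes]

    train_proj = [project(r) for r in train_X]
    correct = sum(
        knn_predict(train_proj, train_y, project(r), k) == label
        for r, label in zip(test_X, test_y)
    )
    return 100.0 * correct / len(test_y)  # c / N x 100
```

The SI techniques only ever call `fitness` with the genes decoded from a candidate solution, so the classifier itself stays fixed while the feature subset changes.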

Experimental Setup
In order to assess the performance of the proposed work, ten benchmark datasets are used. Table 1 shows the datasets, which were collected from the Kent Ridge Biomedical Data Repository.

The PSO procedure used for feature selection is as follows:

    For each particle
        Initialize particle
    End
    Do
        For each particle
            Calculate fitness value
            If the fitness value is better than its personal best (pbest)
                Set current value as the new pbest
        End
        Choose the particle with the best fitness value of all as gbest
        For each particle
            Calculate particle velocity according to (4)
            Update particle position according to (5)
        End
    While maximum iterations or minimum error criterion is not attained

The SFLLF procedure is as follows:

    Generate a random population of solutions (frogs)
    Calculate the fitness value of each frog
    Repeat for a specified number of times
        Sort the population in descending order of fitness
        Divide the population into memeplexes
        Repeat for a specified number of iterations
            For each memeplex, determine the best and worst frogs X_b and X_w
            Identify the best frog X_g of the entire population
            Improve the worst frog position using X_w(t+1) = X_w(t) + rand() × (X_b(t) − X_w(t))
            If f(X_w(t+1)) < f(X_w(t))
                X_w(t+1) = X_w(t) + rand() × (X_g(t) − X_w(t))
                If f(X_w(t+1)) < f(X_w(t))
                    Lévy ~ u = t^(−λ) and X_w(t+1) = X_w(t) + α ⊕ Lévy(λ)
                End
            End
        End
        Combine the evolved memeplexes
    End
    Present the best frog

Here ⊕ denotes entrywise multiplication, and the Lévy flight provides a random walk whose large steps are drawn from a heavy-tailed Lévy distribution.
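The SFLLF worst-frog update differs from standard SFL only in its last stage: when neither leap improves the frog, a Lévy flight replaces the random restart. A minimal Python sketch of that three-stage update follows. It assumes fitness is maximised, uses a single scalar `rand()` per leap as written in the pseudocode, and samples Lévy steps with Mantegna's algorithm (a common choice that the paper does not specify), with assumed defaults `alpha = 0.01` and `beta = 1.5`.

```python
import math
import random


def levy_step(beta=1.5, rng=random):
    """Heavy-tailed step via Mantegna's algorithm (assumed sampling method)."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def sfllf_update(worst, local_best, global_best, fitness, alpha=0.01, beta=1.5, rng=random):
    """Return the new position of the worst frog X_w in one memeplex."""
    for leader in (local_best, global_best):  # X_b first, then X_g
        r = rng.random()
        candidate = [w + r * (b - w) for w, b in zip(worst, leader)]
        if fitness(candidate) >= fitness(worst):  # accepted unless it got worse
            return candidate
    # Both leaps failed: X_w(t+1) = X_w(t) + alpha (entrywise) Levy(lambda)
    return [w + alpha * levy_step(beta, rng) for w in worst]
```

Because the Lévy move is accepted unconditionally, a stuck worst frog is always displaced, usually by a small step and occasionally by a long jump. This is how the Lévy flight is meant to counter premature convergence.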
From the microarray data, the discriminative genes are identified and ranked based on t-statistics, signal-to-noise ratio, and F-test values. The top-m genes are used to represent the candidate solutions of the SI techniques, and m is set to 10, 50, and 100 for testing. The SI technique identifies the features (genes) for classification, and the k-NN method is used for classification. Based on empirical analysis, k is set to 5. The classification accuracy is obtained from 5-fold cross-validation. Table 3 compares the maximum classification accuracies obtained from the SI techniques with different statistical measures.
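The 5-fold cross-validation can be sketched as a small fold splitter plus an averaging loop. Assumptions of this sketch: the folds are plain shuffled (not stratified), the reported accuracy is the mean over folds, and `evaluate(train_idx, test_idx)` is a hypothetical callback that would train k-NN (k = 5) on the selected genes and return the test accuracy in percent.

```python
import random


def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle sample indices and split them into n_folds near-equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def cross_validated_accuracy(evaluate, n_samples, n_folds=5, seed=0):
    """Mean of evaluate(train_idx, test_idx) over the folds."""
    folds = k_fold_indices(n_samples, n_folds, seed)
    scores = []
    for f, test_idx in enumerate(folds):
        # Training set: every sample that is not in the current test fold.
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / n_folds
```

Each sample appears in exactly one test fold, so every sample is classified once by a model that never saw it during training.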

Conclusions
Cancer classification using gene expression data is an important task for addressing the problem of cancer prediction and diagnosis. For an effective and precise classification, the investigation of feature selection methods is essential. Feature selection methods based on swarm intelligence techniques are simple and can easily be combined with statistical feature selection methods. The proposed approach is a simple model that combines statistical measures and swarm intelligence techniques to perform two levels of feature selection and obtain the most informative genes for the classification process. The t-statistics, signal-to-noise ratio, and F-test are used to select the important genes associated with cancer. The SI techniques such as PSO, CS, SFL, and SFLLF are applied on …

Author (year) | Method | Accuracy (%)
Alonso-González et al. [18] | Combination of attribute selection and classification algorithm | 88.41
Maji (2012) [19] | Mutual information | 100
Chandra and Gupta (2011) [13] | Effective range based gene selection | 83.87
Li et al. (2011) [15] | Margin influence analysis with SVM | 100
Chopra et al. (2010) [8] | Based … |

Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.