Gene Selection via a New Hybrid Ant Colony Optimization Algorithm for Cancer Classification in High-Dimensional Data

Recent advances in microarray technology make it possible to measure the expression levels of several thousand genes simultaneously. These levels can be used to distinguish cancerous tissues from normal ones. In this work, we are interested in reducing the dimension of gene expression data for cancer classification, which is a common task in most microarray data analysis studies. This reduction plays an essential role in improving classification accuracy and in helping biologists predict cancer accurately. It is carried out by selecting a small subset of relevant genes and eliminating redundant or noisy ones. In this context, we propose a hybrid approach (MWIS-ACO-LS) for the gene selection problem. The approach combines two components. The first is a new graph-based approach for gene selection (MWIS), which seeks to minimize the redundancy between genes, measured by their correlation, while maximizing their gene-ranking (Fisher) scores. The second is a modified ACO coupled with a local search (LS) algorithm that uses the 1NN classifier to measure the quality of candidate subsets. To evaluate the proposed method, we tested MWIS-ACO-LS on ten well-replicated high-dimensional microarray datasets containing from 2308 to 12600 genes. The experimental results on these ten classification problems demonstrate the effectiveness of the proposed method.


Introduction
In recent years, DNA microarray technology has grown tremendously, thanks to its widely recognized scientific value.
This technology, developed in the early 1990s, allows researchers to measure the expression levels of several thousand genes simultaneously [1,2]. These expression levels are very important for detecting or classifying a specific tumor type. Microarray data are transformed into gene expression matrices, in which each row represents an experimental condition (sample) and each column represents a gene. Each value $x_{ij}$ is the expression level of the $j$th gene in the $i$th sample (see Table 1).
For the cancer classification problem, each row also carries the class of the sample (the type of cancer).
Thus, DNA microarray analysis can be formulated as a supervised classification task [3].
In the cancer classification task, only a small number of samples is available, while each sample is described by a very large number of genes. These characteristics make it very likely that the data contain redundant or irrelevant genes, which limit the performance of classifiers.
Thus, extracting a small subset of genes that carries valuable information about a given cancer is one of the principal challenges in microarray data analysis [4].
Gene selection has become increasingly indispensable over the last few years. Its main motivation is to identify the genes in a microarray dataset that are useful for distinguishing the sample classes. It also provides a better understanding and interpretation of the phenomena studied. In addition, it mitigates the curse of dimensionality and thereby improves the quality of classifiers.

In general, gene selection methods are divided into two classes: wrapper approaches and filter approaches [5]. In wrapper methods, selection can be seen as an exploration of the space of possible subsets. A subset of genes is generated and then evaluated, and its quality is measured by a specific classifier. The classification algorithm is therefore run many times, once for each evaluation. The accuracy obtained with the final subset of genes is generally high, because the search is biased toward the classifier used to evaluate it. Another advantage is conceptual simplicity: just generate and test.

Wrappers also have several drawbacks. They provide no theoretical justification for the selection and do not reveal the dependency relationships that may exist between genes. The selection is specific to a particular classifier, so the subsets found are not necessarily valid if the classifier changes. They typically suffer from possible overfitting and a high computational cost [5,6]. Finally, they become unfeasible when large gene subsets must be evaluated, since this is computationally very expensive [7].

In filter methods, by contrast, the final subset is selected using gene score functions and significance measures. Unlike wrappers, the selection is independent of the classifier used. These methods evaluate each gene individually to assign it a score, and gene selection keeps the best-ranked genes. Filters are generally cheaper in computing time, and their reasonable complexity makes them usable when the number of genes is very high. Their main drawback is that they do not take into account possible interactions between genes. The literature offers several individual gene-ranking (filter) methods, such as the t-test [8], Fisher score [9], signal-to-noise ratio [10], information gain [7], and ReliefF [11].
In wrapper methods, metaheuristics are commonly used to generate high-quality subsets of genes. Examples of classification algorithms used to measure the quality of each candidate solution include support vector machines (SVMs) and K-nearest neighbors (KNN) [12]. The first works on DNA microarray classification were published at the end of the 1990s [13,14]. In this context, several researchers have used metaheuristics to solve the feature selection problem, particularly gene selection, in order to facilitate the recognition of cancer cells. These include ACO [15-20], PSO [4,6,21-25], the genetic algorithm [4,26,27], an approach incorporating the imperialist competition algorithm (ICA) [28], and the binary differential evolution (BDE) algorithm [29].

The ant colony optimization algorithm (ACO) is a population-based metaheuristic [30,31]. Thanks to its efficiency, it has been used to solve many optimization problems in different fields. In ACO, each ant represents a candidate solution to the problem, and the ants build approximate solutions iteratively (step by step). Constructing a solution can be regarded as following a path on a graph, between the ants' nest and a food source. The ants' choice of the best path is influenced by two factors. The first is the amount of pheromone left on the pathways. The second is heuristic information that indicates the goodness of each decision taken by an ant.
Thus, metaheuristics are well suited to the gene selection problem, which is known to be NP-hard [32,33]. In the last decade, several researchers have also adopted graph-based techniques to select a near-optimal subset of features [34-36].
In this study, we propose a two-stage hybrid approach for solving the gene selection problem. In the first stage, we propose a new graph-based method (MWIS) that does not use any learning model. In the second stage, we develop a wrapper method based on a modified ACO and a new local search algorithm guided by the 1NN classifier. In this stage, 1NN evaluates each generated candidate gene subset. To our knowledge, this combination has not been investigated in previous work.
This paper is organized as follows. Section 2 presents the proposed gene selection method. Section 3 provides a detailed account of the experiments we performed on ten microarray datasets to evaluate our approach. Finally, we conclude the paper.

Graph Theory Approach for Gene Selection
In this work, we use $X$ to denote a dataset (Table 1). We use $g_1, g_2, \ldots, g_N$, with $g_i \in \mathbb{R}^M$, to denote the $N$ gene vectors, and $Y = (y_1, y_2, \ldots, y_M)$ to denote the class labels.
Computational and Mathematical Methods in Medicine
Graph theory provides an abstract model for representing the relationships between two or more elements (vertices) of a given system. Let $G = (V, E)$ be an undirected graph, where $V$ is a nonempty finite set of vertices and $E$ is the set of edges. We define a vertex-weighted graph $(G, W)$ as a graph $G$ together with a function $W$ (the vertex weighting function) such that $W(u) \in \mathbb{R}_{+}^{*}$ for all $u \in V$ [37,38]. The maximum weight independent set (MWIS) problem is one of the most important optimization problems, owing to its many domains of application [39]. One of these is gene selection, where the DNA microarray data can be transformed into a vertex-weighted graph (the gene-similarity graph). In this graph, each gene is a vertex whose weight is its Fisher score. An edge represents a significant correlation between two genes, measured by their degree of linear association (Pearson correlation). After transforming the DNA microarray data, we search for the maximum weight independent set. This set of genes is then used in the second stage of our proposed method.

Construction of Gene-Similarity Graph.
Constructing the gene-similarity graph requires some statistical notions, starting with the Fisher score [9], which we use to calculate the weight of each vertex (gene). The Fisher score is mainly applied in gene selection as a filter [40]. The score of a gene represents its relevance for discriminating the classes: a higher Fisher score means that the gene carries more discriminative information.

Fisher Score $F_i$
This score measures the degree of separability of the classes through a given gene $g_i$. It is defined by
$$F_i = \frac{\sum_{k=1}^{c} n_k \left(\mu_i^k - \mu_i\right)^2}{\sum_{k=1}^{c} n_k \left(\sigma_i^k\right)^2}, \quad (1)$$
where $c$, $n_k$, $\mu_i^k$, and $\sigma_i^k$ represent, respectively, the number of classes, the size of the $k$th class, and the mean and standard deviation of the $k$th class for the $i$th gene. $\mu_i$ is the global mean of the $i$th gene.
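To make definition (1) concrete, here is a minimal Python sketch. It is an illustration only, not the authors' Matlab implementation. The sketch assumes the expression matrix is stored as a list of $M$ sample rows, as in Table 1, and takes $\sigma_i^k$ to be the population standard deviation. The function names `fisher_score` and `fisher_scores` are hypothetical.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Fisher score (1) of one gene.

    values: expression levels of the gene in the M samples.
    labels: class label of each sample.
    sigma_i^k is taken as the population standard deviation (an assumption),
    so (sigma_i^k)^2 is the population variance of the class.
    """
    mu = mean(values)
    between, within = 0.0, 0.0
    for k in set(labels):
        cls = [v for v, y in zip(values, labels) if y == k]
        n_k = len(cls)
        between += n_k * (mean(cls) - mu) ** 2
        within += n_k * pvariance(cls)
    if within == 0:
        # Every class is constant: infinite separability unless the means coincide.
        return 0.0 if between == 0 else float("inf")
    return between / within


def fisher_scores(X, y):
    """Scores of all N genes; X is an M x N matrix given as a list of sample rows."""
    return [fisher_score(list(col), y) for col in zip(*X)]


X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0],
     [10.0, 1.0], [11.0, 2.0], [12.0, 3.0]]
y = [0, 0, 0, 1, 1, 1]
print(fisher_scores(X, y))  # gene 1 separates the classes, gene 2 does not
```

The first gene separates the two classes cleanly and receives a large score. The second gene has equal class means and scores 0.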

Pearson Correlation Coefficient.
The Pearson correlation coefficient measures the strength of the linear relationship between two variables (genes). Let $g_1$ and $g_2$ be two random variables. The correlation coefficient between $g_1$ and $g_2$ is defined by
$$\rho_{g_1, g_2} = \frac{\operatorname{cov}(g_1, g_2)}{\sigma_{g_1} \sigma_{g_2}}, \quad (2)$$
where $\operatorname{cov}(g_1, g_2)$ is the covariance between $g_1$ and $g_2$, and $\sigma_{g_1}$ and $\sigma_{g_2}$ are the standard deviations of $g_1$ and $g_2$. The correlation coefficient takes values from $-1$ to $+1$. Let $r_{ij} = |\rho_{g_i, g_j}|$ be the absolute value of the correlation between $g_i$ and $g_j$.

We can now define the adjacency matrix $A_G = (a_{ij})_{N \times N}$, with zeros on its diagonal, to represent $(G, W)$. Here $a_{ij} = 1$ if $(i, j) \in E$ is an edge of $G$, and $a_{ij} = 0$ if $(i, j) \notin E$. In other words, a value of 1 indicates a relationship between $g_i$ (row $i$) and $g_j$ (column $j$), while a value of 0 indicates no relationship.

Building $A_G$ requires the absolute correlation matrix $R = (r_{ij})_{N \times N}$. Let $r_0 \in [0, 1]$ be the minimum correlation value for which two genes are considered related. If $r_{ij} > r_0$, we consider the dependency between $g_i$ and $g_j$ to be high, i.e., the two vertices are adjacent. More precisely, for $i \neq j$, $A_G$ is filled according to the rule
$$a_{ij} = \begin{cases} 1, & \text{if } r_{ij} > r_0, \\ 0, & \text{otherwise}. \end{cases} \quad (3)$$
Our experiments indicate that $r_0 = 0.35$ works well on high-dimensional data. For example, consider a dataset composed of 7 genes $g_1, g_2, \ldots, g_7$. Table 2 shows the corresponding absolute correlation matrix, and Table 3 gives the adjacency matrix obtained with $r_0 = 0.35$.

We define the weight of a vertex $i$ (gene $g_i$) by its Fisher score: $W(i) = F_i$. A gene with a high score in the DNA microarray dataset thus corresponds to a vertex with a high weight in $(G, W)$. This weight carries important information about the relevance of the gene.
Accordingly, if two genes are connected by an edge in $G$, we prefer the one with the larger weight. Following these steps, we transform a given DNA microarray dataset into a vertex-weighted graph (Algorithm 1). Figure 1 shows the gene-similarity graph corresponding to the adjacency matrix of Table 3, where each gene (vertex) is weighted by its Fisher score.
In the context of gene selection for cancer classification, microarray datasets are characterized by a very large number of genes. Applying an evolutionary algorithm such as ACO directly, without a preprocessing step, is therefore highly expensive.
This is where filter methods become very useful. They extract a subset of possibly informative genes, and an evolutionary metaheuristic is then applied to that subset to select a near-optimal set of genes [19]. For example:
- the generalized Fisher score, ReliefF, and BPSO are combined in [6];
- an information gain filter and a memetic algorithm are combined in [41];
- chi-square statistics and a GA are used in [26];
- information gain and improved simplified swarm optimization are used in [42];
- ReliefF, mRMR (minimum redundancy maximum relevance), and a GA are combined in [11].

Zhao et al. proposed a hybrid approach combining the Fisher score with a GA and PSO [40]. To overcome the disadvantages of filter methods, we propose an efficient approach based on graph theory to select the first subset. Unlike individual gene ranking, this method takes possible interactions (correlations) between genes into account.
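Returning to the graph construction, equations (2) and (3) can be sketched in Python as shown below. This is a minimal illustration with the following assumptions:
- genes are given as vectors $g_i \in \mathbb{R}^M$;
- a constant gene is treated as uncorrelated with every other gene;
- the helper names `pearson` and `adjacency_matrix` are hypothetical.

The pairwise loop makes the cost of building $R$ quadratic in the number of genes $N$.

```python
import math
from statistics import mean


def pearson(a, b):
    """Pearson correlation (2); returns 0.0 for a constant gene (an assumption)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((z - mb) ** 2 for z in b))
    if sa == 0 or sb == 0:
        return 0.0
    return cov / (sa * sb)  # the 1/M factors of cov and sigma cancel


def adjacency_matrix(genes, r0=0.35):
    """Rule (3): a_ij = 1 iff |rho(g_i, g_j)| > r0 for i != j; zero diagonal.

    genes: list of N gene vectors g_i, each holding the M expression values.
    """
    n = len(genes)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # the matrix is symmetric
            if abs(pearson(genes[i], genes[j])) > r0:
                A[i][j] = A[j][i] = 1
    return A


genes = [[1.0, 2.0, 3.0, 4.0],    # g1
         [2.0, 4.0, 6.0, 8.0],    # g2: perfectly correlated with g1
         [1.0, -1.0, -1.0, 1.0]]  # g3: uncorrelated with g1 and g2
print(adjacency_matrix(genes))
```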

Gene Selection Based on the Maximum Weight Independent Set.
Let $(G, W)$ be a vertex-weighted undirected graph with $G = (V, E)$, where $V$ is the set of its vertices, $E$ is the set of edges, and $W$ is the vertex weighting function.
A subset $I \subseteq V$ is an independent set of $G$ if no two vertices in $I$ are adjacent (i.e., connected by an edge). The MWIS is the independent set with the maximum weight, where the weight of a subset of vertices is the sum of the weights of its vertices [43].
We note that filter methods based on gene ranking do not consider the correlation between the selected genes. This leads to subsets with a high level of redundancy, which penalizes classification performance. Moreover, these methods eliminate genes with a low individual score, ignoring the possibility that such genes become highly relevant when combined with others [44]. This motivates us to propose a graph-based approach that overcomes these problems.

In the first stage of our method, we treat gene selection as the search for the maximum weight independent set in the gene-similarity graph $(G, W)$. This choice is justified by two arguments. First, in the context of gene selection, maximum weight means selecting a subset of genes with maximum relevance. Second, independence ensures a subset with minimum redundancy: no two genes in this subset are highly correlated. In addition, this subset can contain genes with a low score. Therefore, this first stage provides a good subset of genes on which to apply an evolutionary algorithm such as ACO.
Finding the MWIS of a given graph is an NP-hard problem [45]. In our case the gene-similarity graph is large (several thousand vertices and edges), so an exact solution cannot be found in reasonable time. We therefore propose a greedy heuristic that obtains an approximate solution quickly. The main steps of this algorithm are presented in Algorithm 2.
We illustrate the execution of our greedy algorithm (Algorithm 2) on the graph of Figure 1, formed by $g_1, g_2, \ldots, g_7$. The algorithm proceeds in three iterations:
1. It selects the best gene $g_1$ ($W(g_1) = 1.6$) and removes its neighbors $g_2$, $g_4$, and $g_7$.
2. In the remaining graph, composed of $g_3$, $g_5$, and $g_6$, it chooses the best gene $g_5$.
3. Only one gene remains to be chosen, $g_3$.

Then $I = \{g_1, g_3, g_5\}$ (Figure 2) is an approximate maximum weight independent set. For this example, our greedy algorithm in fact gives the exact MWIS.

Table 3: Adjacency matrix.

Begin
Calculate the weight of each gene, $W(g_i) = F_i$, using the Fisher score (1).
Calculate the absolute correlation matrix $R = (r_{ij})_{N \times N}$ using (2).
Fill the adjacency matrix $A_G = (a_{ij})_{N \times N}$ associated with $G$, based on rule (3).
Create the gene-similarity graph $(G, W)$.
Return $(G, W)$.
ALGORITHM 1: Construction of a gene-similarity graph.
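The greedy procedure of Algorithm 2, as described in the worked example above, can be sketched as follows: pick the heaviest remaining vertex, then delete it and its neighbors. This is an illustration under assumptions. The tie-breaking rule (lower index first) and the function name `greedy_mwis` are ours, not the paper's.

```python
def greedy_mwis(weights, adj):
    """Greedy approximation of the MWIS in a vertex-weighted graph.

    weights: vertex weights W(g_i), e.g. Fisher scores.
    adj: 0/1 adjacency matrix of the gene-similarity graph.
    Returns the sorted indices of the selected vertices.
    """
    remaining = set(range(len(weights)))
    selected = []
    while remaining:
        # Heaviest remaining vertex; ties go to the lower index.
        best = max(remaining, key=lambda v: (weights[v], -v))
        selected.append(best)
        # Delete the chosen vertex and its neighbours from the graph.
        remaining -= {best} | {u for u in remaining if adj[best][u]}
    return sorted(selected)


# Hypothetical path graph 0 - 1 - 2 - 3
path_adj = [[0, 1, 0, 0],
            [1, 0, 1, 0],
            [0, 1, 0, 1],
            [0, 0, 1, 0]]
print(greedy_mwis([1.0, 3.0, 3.0, 1.0], path_adj))
```

The result is only an approximation. On a star whose centre weighs 3 and whose three leaves weigh 2, the greedy rule returns the centre alone (total weight 3), while the optimum is the three leaves (total weight 6).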

Ant Colony Optimization for Gene Selection.
ACO is one of the algorithms based on swarm intelligence. It was introduced as a method for solving optimization problems in the early 1990s by Dorigo et al. [30,31] and was developed further in [46,47]. ACO was initially designed to solve the traveling salesman problem, with the first ACO algorithm, "Ant System" (AS) [48]. It was subsequently applied to other problems, such as quadratic assignment [49], sum coloring [50], vehicle routing [51], constraint satisfaction [52], and gene selection [15-17, 19, 20]. ACO is inspired by the social behavior of ants. The artificial ants in ACO cooperate with each other, exchanging information via pheromones, to solve difficult problems. They do so by building approximate solutions iteratively (step by step). A feasible solution can be regarded as a path between the ants' nest and a food source. The way this path is chosen is detailed in the next subsections.

ACO for Gene Selection.
To adapt ACO to the gene selection problem, we propose a novel ACO. Denote the $p$ genes by $g_1, g_2, \ldots, g_p$. The path of each ant from the nest to the food source is coded as a $p$-dimensional binary string in which each bit (pathway) is attached to a gene. Taking pathway "1" means that the gene is selected, whereas pathway "0" means that the gene is not included in the final subset. Figure 3 illustrates this coding for $p = 10$.
The ants seek the best path, which maximizes accuracy and minimizes the number of selected genes. Figure 4 describes the gene selection procedure of our ACO. Each ant travels from the nest to the food source with the aim of finding the best path (the best subset of genes). The path is built step by step. At each step $i$, the ant decides whether or not to add gene $i$ to the candidate subset, based on the pheromone and heuristic information assigned to this gene (Figure 4). The ant completes its tour in $p$ steps and outputs a subset of selected genes when it reaches the food source.
Input: Gene-similarity graph $(G_0, W)$.
Output: An approximate maximum weight independent set $I$.
…
Choose the best vertex $v_i$ in $G_i$ (i.e., the vertex with the highest weight).
…
ALGORITHM 2: Greedy algorithm to approximate the MWIS.

Figure 3: An illustrated example with generated subset and path representation (for genes $g_1, \ldots, g_{10}$, the ant path shown selects $\{g_1, g_2, g_5, g_7, g_{10}\}$).

As indicated previously, each ant constructs a candidate subset of genes from heuristic information and pheromone, using a probabilistic decision rule. The probability of selecting a pathway is computed as
$$P_{ij} = \frac{\tau_{ij}^{\alpha} \, \eta_{ij}^{\beta}}{\tau_{i0}^{\alpha} \, \eta_{i0}^{\beta} + \tau_{i1}^{\alpha} \, \eta_{i1}^{\beta}}, \quad j \in \{0, 1\}, \quad (4)$$

where $i$ denotes the $i$th gene and $j$ takes the value 1 or 0 to indicate whether the corresponding gene is selected. $\tau_{ij}$ is the pheromone intensity, which indicates the importance of the selection decision for the $i$th gene. $\eta_{ij}$ is the heuristic information reflecting the desirability of that decision. $\alpha$ and $\beta$ are two parameters controlling the relative importance of pheromone intensity versus visibility. With $\alpha = 0$, only the visibility (heuristic information) of the gene is taken into account, and the ants decide whether to select a given gene based on $\eta_{ij}$ alone. The search experience accumulated in the pheromone is then ignored, so there is no cooperation between ants.
Conversely, with $\beta = 0$, only the pheromone trails are taken into account. To avoid premature convergence of the ACO algorithm, a compromise between these two parameters is necessary to balance diversification and intensification of the search.
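The following minimal sketch shows how one ant builds its binary path with rule (4). Equations (5) and (6) are not reproduced here, so the heuristic is passed in as a hypothetical callable `heuristic(i, n_selected)` returning $(\eta_{i0}, \eta_{i1})$. The second argument lets it depend on $N_s$, the number of genes already selected. The defaults $\alpha = \beta = 1$ are illustrative only.

```python
import random


def build_path(tau, heuristic, alpha=1.0, beta=1.0, rng=random):
    """Build one ant's binary path with the decision rule (4).

    tau: list of (tau_i0, tau_i1) pheromone pairs, one per gene.
    heuristic: hypothetical callable (i, n_selected) -> (eta_i0, eta_i1),
        standing in for equations (5)-(6).
    Returns a list of p bits; bit i = 1 means gene i is selected.
    """
    path, n_selected = [], 0
    for i, (tau0, tau1) in enumerate(tau):
        eta0, eta1 = heuristic(i, n_selected)
        w0 = tau0 ** alpha * eta0 ** beta
        w1 = tau1 ** alpha * eta1 ** beta
        p_select = w1 / (w0 + w1)  # P_i1 of rule (4); P_i0 = 1 - P_i1
        bit = 1 if rng.random() < p_select else 0
        path.append(bit)
        n_selected += bit
    return path


def cap_two(i, n_selected):
    """Made-up heuristic: force selection until two genes are chosen, then forbid it."""
    return (0.0, 1.0) if n_selected < 2 else (1.0, 0.0)


tau = [(1.0, 1.0)] * 6
print(build_path(tau, cap_two, rng=random.Random(0)))
```

Because the made-up heuristic drives the selection probability to exactly 1 and then to 0, the example path is deterministic regardless of the seed.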

The Heuristic.
Choosing a good heuristic, to be combined with the pheromone information when building solutions, is an important task in an ACO implementation [53]. In our ACO, the heuristic indicates the quality of a gene based on a scoring algorithm.
For a given ant, the heuristic information $\eta_{i1}$ is the desirability of adding gene $i$ to the subset of selected genes. We define it from two quantities (equation (5)): the Fisher score $F_i$ (1), which measures the quality of the gene, and $N_s$, the number of genes selected by the ant before it reaches gene $i$. For $\eta_{i0}$, we combine the mean Fisher score of all genes with $N_s$ (equation (6)). As a result, the ants tend to choose small subsets of highly relevant genes.

Updating the Pheromone Trail.
The pheromone update aims to increase the pheromone values associated with good solutions and to decrease those associated with bad ones. Pheromones are updated in two stages: a local update and a global update.
Once ant $k$ has finished building its path, the pheromone on all pathways is updated by a local rule. This rule involves two quantities:
- $\rho_{loc}$, the local pheromone evaporation coefficient ($0 < \rho_{loc} < 1$), which models the evaporation of the trail;
- $\Delta\tau_{ij}$, the amount of pheromone deposited by ant $k$.

In our ACO, $\Delta\tau_{ij}$ is a function of four quantities:
- the candidate solution $S$ built by the ant;
- $CA_{1NN}$, the $r$-fold cross-validation classification accuracy of the 1NN (nearest neighbor) classifier on $S$;
- $\#genes$, the number of genes selected in $S$;
- $\lambda \geq 1$, a parameter that sets the importance of the number of selected genes in $S$.
At each iteration $T$, after all ants have completed their tours, a global update is applied to all pathways chosen by the best ant (the best candidate solution) of iteration $T$. This update involves $\rho_{glob}$, the global pheromone evaporation coefficient. It also involves $\Delta\tau_{ij}(T)$, the amount of pheromone deposited by the best ant during iteration $T$, as defined by Chiang et al. [15].
To avoid search stagnation, the pheromone trails are limited to the interval $[\tau_{min}, \tau_{max}]$.
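The exact local and global update formulas are not given in this section, so the following sketch assumes a common evaporate-then-deposit form:
1. every pathway evaporates, $\tau_{ij} \leftarrow (1-\rho)\tau_{ij}$;
2. a deposit $\Delta\tau$ is added to the pathways the ant took;
3. every value is clamped to $[\tau_{min}, \tau_{max}]$, as described above.

The same function would serve the local update (with $\rho_{loc}$) and the global update (with $\rho_{glob}$ and the iteration-best ant). Treat it as an illustration only; `update_pheromone` is a hypothetical name.

```python
def update_pheromone(tau, path, rho, delta, tau_min, tau_max):
    """One pheromone update, assuming an evaporate-then-deposit rule.

    tau: list of [tau_i0, tau_i1] pairs; path: the ant's bits.
    rho: evaporation coefficient (rho_loc for the local update, rho_glob
         for the global update with the iteration-best ant).
    delta: amount deposited on the pathways in `path` (a stand-in for the
           paper's Delta tau, whose exact formula is not reproduced here).
    Every value is finally clamped to [tau_min, tau_max].
    """
    new_tau = []
    for (t0, t1), bit in zip(tau, path):
        t = [(1.0 - rho) * t0, (1.0 - rho) * t1]  # evaporation on both pathways
        t[bit] += delta                           # deposit on the pathway taken
        new_tau.append([min(tau_max, max(tau_min, v)) for v in t])
    return new_tau


tau = [[1.0, 1.0], [1.0, 1.0]]
print(update_pheromone(tau, [1, 0], rho=0.5, delta=1.0, tau_min=0.1, tau_max=1.2))
```

The lower bound keeps every pathway selectable. The upper bound stops a single path from dominating the colony.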

Fitness Function.
To guide our ACO toward a high-quality subset of genes, we need to define a fitness function $f$. The quality of a candidate subset can be measured by combining its size (number of genes) with the classification accuracy of a specific classifier. In gene selection, the aim is to maximize accuracy while minimizing the number of genes used. The classification accuracy is estimated with the chosen classifier by cross-validation. In this study, we use the K-nearest neighbor (KNN) classifier.

K-Nearest Neighbor (KNN).
The KNN method is a supervised learning algorithm introduced by Fix and Hodges in 1951 [54]. It relies on the notion of proximity (neighborhood) between samples to make a decision (classification) [55].
To determine the class of a new example, we compute its distance to every training sample. The class is then given by a majority vote among its $K$ nearest neighbors. The neighbors are determined by the Euclidean distance, defined for two samples $X = (x_j)$ and $T = (t_j)$ as
$$D(X, T) = \sqrt{\sum_{j} (x_j - t_j)^2}, \quad (10)$$
where the sum runs over the genes considered. In our method, we use the 1NN classifier, the special case of KNN with $K = 1$. Let $X$ be a new sample to classify and $T$ a sample from the training data. The class of $X$ is then
$$\mathrm{Class}(X) = \mathrm{Class}\left(\arg\min_{T} D(X, T)\right). \quad (11)$$
Genes in gene expression data have different scales, and the KNN classifier depends on the distances between samples. We therefore modify our 1NN so that it first normalizes the training data to a common scale. This transformation uses the mean and standard deviation of each gene computed on the training data. The same values are then used to scale the test data.
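The normalized 1NN rule can be sketched as follows. The sketch assumes z-score scaling with the per-gene mean and population standard deviation of the training samples. The helper names `fit_scaler`, `scale`, and `predict_1nn` are hypothetical. The test sample is scaled with the training statistics, never with its own.

```python
import math
from statistics import mean, pstdev


def fit_scaler(train):
    """Per-gene mean and standard deviation, computed on the training samples only."""
    cols = list(zip(*train))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # a constant gene is left unscaled
    return mus, sds


def scale(sample, mus, sds):
    """z-score one sample with the given per-gene statistics."""
    return [(x - m) / s for x, m, s in zip(sample, mus, sds)]


def predict_1nn(train, train_labels, x):
    """Rule (11): label of the training sample nearest to x (Euclidean distance (10))."""
    mus, sds = fit_scaler(train)
    scaled_train = [scale(t, mus, sds) for t in train]
    xs = scale(x, mus, sds)  # test data reuse the training statistics
    dists = [math.dist(xs, t) for t in scaled_train]
    return train_labels[dists.index(min(dists))]


train = [[0.0, 1000.0], [1.0, 0.0], [10.0, 1000.0], [11.0, 0.0]]
labels = ["A", "A", "B", "B"]
print(predict_1nn(train, labels, [10.5, 400.0]))
```

For clarity, the scaler is refit on every call. In a LOOCV loop, it would be fit once per fold on the $M - 1$ training samples.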

Objective Function.
The fitness value of a candidate solution $S$ in our ACO (equation (12)) aggregates two objectives: maximizing the predictive accuracy and minimizing the number of genes. It depends on three quantities:
- $w_1 \in [0, 1]$, a weight coefficient that controls the aggregation of both objectives;
- $\#genes$, the number of genes selected in $S$;
- $p$, the total number of genes.

Note that $CA_{KNN}$ is simply the average cross-validation classification accuracy of the KNN classifier, computed with leave-one-out cross-validation (LOOCV) [56]. The dataset is divided into $M$ nonoverlapping subsets (one per tissue sample). At each iteration, the KNN classifier is trained on $M - 1$ samples using the selected genes and tested on the remaining sample. The LOOCV accuracy is
$$CA_{KNN} = \frac{\text{number of correctly predicted samples}}{M}. \quad (13)$$
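Equation (12) is not reproduced here, so this sketch assumes the widely used weighted-sum form $f(S) = w_1 \, CA_{KNN} + (1 - w_1)(1 - \#genes/p)$. The actual aggregation in the paper may differ, and the default $w_1 = 0.9$ is arbitrary. The LOOCV loop follows (13) with $K = 1$. Normalization is left out to keep it short, and the function names are hypothetical.

```python
import math


def loocv_accuracy_1nn(X, y, genes):
    """CA_KNN of (13) with K = 1: hold out each sample and classify it with the rest.

    X: list of M sample rows; y: labels; genes: indices of the selected genes.
    """
    M = len(X)
    correct = 0
    for i in range(M):
        xi = [X[i][g] for g in genes]
        best_d, best_label = math.inf, None
        for j in range(M):
            if j == i:
                continue  # sample i is the held-out test sample
            d = math.dist(xi, [X[j][g] for g in genes])
            if d < best_d:
                best_d, best_label = d, y[j]
        correct += best_label == y[i]
    return correct / M


def fitness(X, y, path, w1=0.9):
    """Hypothetical weighted-sum fitness: w1 * accuracy + (1 - w1) * (1 - #genes / p)."""
    genes = [g for g, bit in enumerate(path) if bit]
    if not genes:
        return 0.0  # an empty subset cannot classify anything
    p = len(path)
    return w1 * loocv_accuracy_1nn(X, y, genes) + (1 - w1) * (1 - len(genes) / p)


X = [[0.0, 5.0], [1.0, 9.0], [10.0, 5.0], [11.0, 9.0]]
y = [0, 0, 1, 1]
print(loocv_accuracy_1nn(X, y, [0]), loocv_accuracy_1nn(X, y, [1]))
```

In the example, the first gene alone separates the classes perfectly under LOOCV, while the second misleads every prediction. Computing all $M$ leave-one-out predictions costs $O(pM^2)$ distance terms.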
Local Search.
The local search algorithm improves the solutions built by the ants and provides good solutions within a reasonable time. To this end, we draw on the framework proposed in [57], in which a local search based on a filter ranking method is used to solve the feature selection problem.
Given a candidate solution $S$ generated by an ant, we define $X$ and $Y$ as the subsets of selected and eliminated genes, respectively, both ranked by Fisher score. We further define two basic operators of the local search algorithm:
(i) Add: select a gene from $Y$ based on its ranking and add it to $S$;
(ii) Del: select a gene from $X$ based on its ranking and remove it from $S$.
In our method, the Add operator picks the gene to move from $Y$ to $S$ by roulette-wheel selection, introduced by Holland [58]. Let $Y = \{g_1, g_2, \ldots, g_{n_1}\}$ and let $F_1, F_2, \ldots, F_{n_1}$ be their Fisher scores. The selection probability $P_i$ of gene $g_i$ is then
$$P_i = \frac{F_i}{\sum_{j=1}^{n_1} F_j}, \quad i = 1, \ldots, n_1.$$
Similarly, for the Del operator, a gene $g_i$ of $X = \{g_1, g_2, \ldots, g_{n_2}\}$ is removed from $S$ with probability
$$P_i = \frac{\bar{F}_i}{\sum_{j=1}^{n_2} \bar{F}_j}, \quad i = 1, \ldots, n_2,$$
where $\bar{F}_j = \max(F_1, \ldots, F_{n_2}) - F_j$ for $j = 1, \ldots, n_2$, and $F_1, F_2, \ldots, F_{n_2}$ are the Fisher scores of $g_1, g_2, \ldots, g_{n_2}$.
These probabilities show that the Add operator favors adding high-score genes to $S$, whereas the Del operator favors removing low-score genes from $S$.
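The two roulette-wheel distributions for Add and Del can be computed as follows. This is a minimal sketch. The uniform fallback when all scores in $X$ are equal is our assumption: in that case every $\bar{F}_j$ is zero and the Del formula is undefined. The standard-library `random.choices` performs the wheel spin, and the function names are hypothetical.

```python
import random


def add_probabilities(scores):
    """Add operator: P_i = F_i / sum_j F_j over the eliminated genes Y."""
    total = sum(scores)
    return [f / total for f in scores]


def del_probabilities(scores):
    """Del operator: P_i = Fbar_i / sum_j Fbar_j, Fbar_j = max(F) - F_j, over X."""
    top = max(scores)
    fbar = [top - f for f in scores]
    total = sum(fbar)
    if total == 0:  # all scores equal: uniform fallback (an assumption)
        return [1.0 / len(scores)] * len(scores)
    return [v / total for v in fbar]


def roulette(probs, rng=random):
    """Spin the roulette wheel once and return the chosen index."""
    return rng.choices(range(len(probs)), weights=probs)[0]


print(add_probabilities([1.0, 3.0]))       # the high-score gene is favoured
print(del_probabilities([1.0, 3.0, 2.0]))  # the best gene is never removed
```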
Our local search algorithm (Algorithm 3) is characterized by three parameters:
- $n_{add}$, the maximal number of Add operations;
- $n_{del}$, the maximal number of Del operations;
- $it_{max}$, the maximal number of consecutive iterations without improvement of the best solution.

This local search is also general. For example, if $n_{add}$ is set to 0, it becomes a backward generation that tries to remove irrelevant genes at each iteration.
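The loop below is a hypothetical sketch assembled from the description of Algorithm 3. Each iteration proceeds as follows:
1. draw $n_a$ and $n_d$ with the floor rule;
2. apply that many Add and Del moves by roulette wheel;
3. keep the candidate only if its fitness improves.

The loop stops after $it_{max}$ consecutive failures. Keeping at least one gene during Del is our assumption, and `toy_fitness` is made up for the demonstration.

```python
import random


def local_search(path, scores, fitness, n_add=2, n_del=2, it_max=10, rng=random):
    """Add/Del local search in the spirit of Algorithm 3.

    path: candidate solution (bits); scores: Fisher score of every gene;
    fitness: callable on a bit list, higher is better. Stops after it_max
    consecutive iterations without improving the best solution.
    """
    best, best_fit = list(path), fitness(path)
    stall = 0
    while stall < it_max:
        cand = list(best)
        n_a = int(rng.random() * (n_add + 1))  # floor(rand * (n_add + 1))
        n_d = int(rng.random() * (n_del + 1))  # floor(rand * (n_del + 1))
        for _ in range(n_a):  # Add: roulette wheel weighted by F_i over Y
            Y = [g for g, b in enumerate(cand) if not b]
            if not Y:
                break
            w = [scores[g] for g in Y]
            g = rng.choices(Y, weights=w)[0] if sum(w) > 0 else rng.choice(Y)
            cand[g] = 1
        for _ in range(n_d):  # Del: roulette wheel weighted by max(F) - F_i over X
            X = [g for g, b in enumerate(cand) if b]
            if len(X) <= 1:  # keep at least one gene (an assumption)
                break
            top = max(scores[g] for g in X)
            w = [top - scores[g] for g in X]
            g = rng.choices(X, weights=w)[0] if sum(w) > 0 else rng.choice(X)
            cand[g] = 0
        f = fitness(cand)
        if f > best_fit:
            best, best_fit, stall = cand, f, 0
        else:
            stall += 1
    return best, best_fit


def toy_fitness(bits):
    """Hypothetical fitness: genes 0 and 2 are informative, every gene costs 0.1."""
    return bits[0] + bits[2] - 0.1 * sum(bits)


print(local_search([0, 1, 0, 1], [5.0, 1.0, 4.0, 0.5], toy_fitness,
                   it_max=300, rng=random.Random(0)))
```

Setting `n_add=0` turns the loop into pure backward elimination, matching the remark above.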

Proposed Method for Gene Selection (MWIS-ACO-LS).
Our hybrid method combines filter and wrapper approaches. It takes advantage of the low computing time of filters (MWIS) and the high quality of the subsets provided by wrapper methods (ACO and LS). The overall process of MWIS-ACO-LS is shown in Figure 5.

The process begins by transforming the initial dataset into a vertex-weighted graph (Algorithm 1), in which we search for the MWIS. Since this is a well-known NP-hard problem, we use a greedy algorithm (Algorithm 2) to find a near-optimal set of vertices (representing genes in our problem). The subset of genes selected in this first stage is the input of the second stage. That stage uses an evolutionary algorithm (ACO) combined with a local search algorithm to select the minimum number of genes that gives the maximum classification accuracy for the 1NN classifier. In this stage, artificial ants cooperate to build a high-quality subset of genes using the transition rules presented in Section 2. A local search (Algorithm 3) helps the ants achieve good results in a reasonable time.
The pseudocode of our proposed method is presented as follows.
… $O(N^2 M)$. Finally, filling the adjacency matrix $A_G$ (which implicitly constructs the gene-similarity graph) takes $O(N^2)$. In the second step of this stage (Algorithm 2), the weight of each vertex is already defined, and then we …
Input: DNA microarray data; $it_{max}$; $S$, a candidate solution given by an ant; $n_{add}$, the maximal number of Add operations; $n_{del}$, the maximal number of Del operations.
Output: A candidate solution $S_{best}$ better than $S$.
Begin
Determine the subsets $X$ and $Y$.
$n_a = \lfloor rand \cdot (n_{add} + 1) \rfloor$; $n_d = \lfloor rand \cdot (n_{del} + 1) \rfloor$, where $\lfloor \cdot \rfloor$ is the floor function.
Repeat the Add operation $n_a$ times on $S$.
Repeat …
ALGORITHM 3: Local search.

… $N$). In this stage, the fitness of each candidate subset of genes is calculated using LOOCV (leave-one-out cross-validation) with 1NN as the classifier (equation (13)). We now analyze the complexity of the fitness calculation with 1NN and LOOCV. Computing the distance between the single test sample and each of the $M - 1$ training samples requires $O(p(M - 1))$. This process is repeated $M$ times, so the fitness calculation requires $O(pM(M - 1))$, i.e., $O(pM^2)$.

Experimental Studies
This section presents the performance of our proposed approach (MWIS-ACO-LS) on ten well-known gene expression classification datasets and compares our results with those of state-of-the-art methods. The characteristics of the datasets, the parameter settings, and the numerical results are described in the following sections.
The proposed approach (MWIS-ACO-LS) was implemented in Matlab R2017a.
For the KNN classifier, we used a predefined Matlab function. Similarly, for the SVM classifier [59,60] used in the comparison, we chose a predefined binary linear classifier. In addition, we developed a multiclass SVM classifier based on the one-against-all strategy.
For the logistic regression (LR) classifier, we regularized the cost function with two penalties: the lasso ($L_1$) penalty and the elastic net penalty. The cost functions of LR-$L_1$ and LR-ElasticNet are minimized with the stochastic gradient descent (SGD) algorithm implemented in the Scikit-learn package [61]. The initial experimental parameters are given in Table 4.
Additionally, we use leave-one-out cross-validation (LOOCV) both to measure the quality of candidate gene subsets and to compare our results with those of other works.

Environment.
To evaluate our approach, we chose ten publicly available and easily accessible DNA microarray datasets for cancer recognition [62]. These datasets are used in several supervised classification works, in particular in the papers cited in the "Comparison with state-of-the-art algorithms" section.
All datasets are described in Table 5. They differ in several respects: number of genes, number of samples, and binary or multiclass labels. Some datasets have few samples (Brain_Tumor2, 9_Tumors, etc.), while others have more (Lung_Cancer, 11_Tumors, etc.). Some have binary classes (Prostate_Tumor, DLBCL), while others are multiclass (Leukemia1, Lung_Cancer, etc.). Since our method is designed for high-dimensional microarray data, all these datasets contain thousands of genes, ranging from 2308 to 12600.

Parameters.
All experiments were run on an Acer Aspire 7750G laptop with an Intel Core i5 2.30 GHz processor and 8 GB of RAM, running Windows 7 (64-bit).
Several tests were carried out to obtain an appropriate parameterization. Starting from a set of initial values, we changed the value of one parameter across different runs until the solutions could no longer be improved, and we repeated this adjustment for each parameter in turn. This tuning was carried out on a single cancer classification dataset. Table 5 presents the parameters of the proposed approach.
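The one-parameter-at-a-time tuning procedure described above can be sketched as a small coordinate-wise search. The function name, the `candidates` grid, and the toy `evaluate` function in the usage note are hypothetical. In practice, `evaluate` would run the approach on the tuning dataset and return, for example, its mean LOOCV accuracy.

```python
def tune_one_at_a_time(initial, candidates, evaluate, max_rounds=10):
    """One-parameter-at-a-time tuning.

    Starting from `initial`, each parameter in turn is varied over its
    candidate values while the others are held fixed; a value is kept only
    if it improves the score. Passes over all parameters are repeated until
    a full pass brings no improvement (or max_rounds is reached).
    """
    best = dict(initial)
    best_score = evaluate(best)
    for _ in range(max_rounds):
        improved = False
        for name, values in candidates.items():
            for v in values:
                trial = {**best, name: v}
                score = evaluate(trial)
                if score > best_score:
                    best, best_score, improved = trial, score, True
        if not improved:
            break
    return best, best_score
```

For example, with `evaluate = lambda p: -(p["a"] - 3) ** 2 - (p["b"] - 1) ** 2`, starting from `{"a": 1, "b": 0}` with candidates `{"a": [1, 2, 3, 4], "b": [0, 1, 2]}`, the search returns `({"a": 3, "b": 1}, 0)` after its second pass finds no further improvement.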

Results and Comparisons.
First, to limit the search space and speed up the convergence of our proposed approach, an initial subset of genes is selected with a graph-theoretic gene selection algorithm (MWIS). A modified ACO-1NN coupled with a local search algorithm is then applied to find a better subset of genes. The quality of a candidate subset is measured by the LOOCV performance of the KNN classifier and by the size of the subset.
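Equation (11) is not reproduced in this section. As an assumption, a common way to combine LOOCV accuracy and subset size in wrapper gene selection is a weighted sum, sketched below. The weight `0.9` and the function name are hypothetical and should be replaced by the actual terms of (11).

```python
def subset_fitness(accuracy, n_selected, n_total, weight=0.9):
    """Hypothetical weighted-sum fitness for a candidate gene subset.

    accuracy: LOOCV accuracy in [0, 1]; n_selected / n_total: fraction of the
    candidate genes kept. A larger weight favors accuracy over compactness.
    """
    if not 0 < n_selected <= n_total:
        raise ValueError("n_selected must be between 1 and n_total")
    return weight * accuracy + (1.0 - weight) * (1.0 - n_selected / n_total)
```

With this form, `subset_fitness(1.0, 10, 100)` is about 0.99. When two subsets reach the same accuracy, the smaller one always scores higher.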
Step 2: Apply the greedy algorithm (Algorithm 1) to select an initial subset of genes.
Stage 2: Application of ACO to the subset of genes selected in the first stage
Step 1: ACO combined with the local search
  Initialize the pheromone matrix with ones.
  for T ← 1; T ≤ n_max; T++ do
    for i ← 1; i ≤ m; i++ do
      Build the path (candidate solution S) of the ant based on the probabilistic decision rule defined by (4), (5), and (6).
      Calculate the fitness of the candidate solution using LOOCV in (11).
      Do a local update of pheromones based on S.
    end for
    Apply the local search (Algorithm 3) to S_best.
    Do a global update of pheromones based on S_best.
  end for
  Find the global best solution S_gbest.
Step 2: Apply a backward generation to S_gbest.
Return S_gbest.

ALGORITHM 4: Proposed approach (MWIS-ACO-LS).

The experiments carried out on the ten DNA microarray datasets have two objectives: to test the effect of gene selection on classification accuracy, and to validate the proposed method and verify its effectiveness.
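The ACO stage of Algorithm 4 can be illustrated with the simplified Python sketch below. It does not reproduce the authors' decision rule (4)-(6). Instead, it assumes that each ant draws a fixed number of genes by roulette-wheel selection with probability proportional to τ^α·η^β. Pheromones start at one, and the local update pulls used trails back toward the initial value as in Ant Colony System. The global update evaporates all trails and reinforces the iteration-best subset S_best after an optional local search. All parameter names and defaults are hypothetical.

```python
import random


def aco_gene_selection(n_genes, fitness, heuristic=None, n_ants=10, n_iter=50,
                       subset_size=5, alpha=1.0, beta=1.0, rho=0.1, phi=0.1,
                       local_search=None, seed=0):
    """Simplified ACO wrapper for gene subset selection (illustrative only).

    fitness(subset) -> float (higher is better); heuristic[g] > 0 is an
    optional desirability of gene g (e.g. a filter score); local_search(subset)
    optionally refines the iteration-best subset. Returns (best_subset, best_fitness).
    """
    rng = random.Random(seed)
    eta = heuristic if heuristic is not None else [1.0] * n_genes
    tau0 = 1.0
    tau = [tau0] * n_genes                      # pheromone trails initialized to ones
    best_sol, best_fit = None, float("-inf")
    for _ in range(n_iter):
        iter_sol, iter_fit = None, float("-inf")
        for _ in range(n_ants):
            available, sol = list(range(n_genes)), []
            while len(sol) < subset_size:       # roulette wheel on tau^alpha * eta^beta
                weights = [tau[g] ** alpha * eta[g] ** beta for g in available]
                g = rng.choices(available, weights=weights)[0]
                sol.append(g)
                available.remove(g)
            f = fitness(sol)
            for g in sol:                       # local update (Ant Colony System style)
                tau[g] = (1.0 - phi) * tau[g] + phi * tau0
            if f > iter_fit:
                iter_sol, iter_fit = sol, f
        if local_search is not None:            # refine the iteration-best ant S_best
            iter_sol = local_search(iter_sol)
            iter_fit = fitness(iter_sol)
        tau = [(1.0 - rho) * t for t in tau]    # global update: evaporation ...
        for g in iter_sol:                      # ... plus a deposit on S_best
            tau[g] += rho * max(iter_fit, 0.0)
        if iter_fit > best_fit:                 # track the global best S_gbest
            best_sol, best_fit = sorted(iter_sol), iter_fit
    return best_sol, best_fit
```

Here `fitness` would typically be an LOOCV-based score of a gene subset. `local_search` would be a refinement step, such as removing genes that do not lower the score.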
Given the nondeterministic nature of our approach and of the SGDlogistic classifier, ten independent runs were performed for each dataset to obtain more reliable results. Table 6 shows the results obtained first with our new graph-based approach (MWIS) for gene selection, then with the MWIS-ACO method, in which the ACO algorithm is applied to the subset selected by MWIS, and finally with our improved method MWIS-ACO-LS, in which the ACO is coupled with the local search (LS) method. The classification accuracy of MWIS, MWIS-ACO, and MWIS-ACO-LS is computed with the 1NN classifier. These methods are also compared with the SVM, 1NN, and penalized SGDlogistic classifiers without gene selection to demonstrate the usefulness of our selection approach. We analyze our results in three ways: (i) the classification accuracy, (ii) the number of genes used in the classification, and (iii) the execution time.

Discussion
We begin with the execution time of our proposed methods. The observed execution times are consistent with the complexity analysis. The filter-based approach MWIS runs quickly, but its accuracy is limited because the selection is independent of the classifier. MWIS-ACO and MWIS-ACO-LS take considerably longer because they are wrapper methods that evaluate the 1NN classifier at each step, but they reach a high classification accuracy.
Turning to the different stages of MWIS-ACO-LS (Figures 6 and 7 and Table 6), the ACO improves the classification accuracy and reduces the number of genes used. The local search plays a key role in refining the candidate solutions provided by the ants: it reduces the number of genes while retaining the classification accuracy achieved by the ants. The effectiveness of MWIS-ACO-LS comes from a marked improvement in classification accuracy together with a reduction in the number of genes used in the classification (shown in bold in Table 6), on all datasets (Figure 8). The "MWIS," "MWIS-ACO," and "MWIS-ACO-LS" methods all select a reduced subset of informative genes compared with the original set of genes in each dataset.
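The refinement role of the local search, which removes genes while keeping the accuracy reached by the ants, can be sketched as a greedy backward elimination. This sketch captures only the general idea, under that assumption. It is not the authors' Algorithm 3 or their backward generation step. `evaluate` stands for any subset score (for example, the LOOCV 1NN accuracy), and the function name is hypothetical.

```python
def backward_elimination(subset, evaluate):
    """Greedy backward refinement of a gene subset.

    Repeatedly tries to drop a single gene and accepts the first removal
    whose score is not lower than the current one; stops when no single
    removal is acceptable (or one gene is left). Returns (subset, score).
    """
    current = list(subset)
    score = evaluate(current)
    improved = True
    while improved and len(current) > 1:
        improved = False
        for g in current:
            candidate = [h for h in current if h != g]
            s = evaluate(candidate)
            if s >= score:                  # accuracy retained (or improved)
                current, score, improved = candidate, s, True
                break
    return current, score
```

For example, if only genes 1 and 3 carry information, `backward_elimination([0, 1, 2, 3, 4], lambda s: len(set(s) & {1, 3}) / 2)` returns `([1, 3], 1.0)`.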
From Table 6 and Figure 8, it can be observed that MWIS outperforms the 1NN classifier on the "9_Tumors," "Lung_Cancer," "SRBCT," and "DLBCL" datasets. This is notable because the role of the MWIS algorithm is only to provide a candidate subset of genes to which our modified ACO is applied. That subset can contain weak genes, and the selection in this method is independent of the classifier used.
Based on our experiments on ten cancer recognition datasets, we observe that the proposed method (MWIS-ACO-LS) outperforms all four compared algorithms in terms of both classification accuracy and the number of genes used in the classification. The improvement is most pronounced for 9_Tumors, where the classification accuracy rises from less than 60% to a perfect classification using just 40 genes. We can therefore conclude (from Table 6 and Figure 8) that MWIS-ACO-LS can successfully select a small subset of genes that yields a high classification accuracy. For all datasets, MWIS-ACO-LS reaches a classification accuracy of at least 99.42% using only a small subset of the original genes. In addition, MWIS-ACO-LS gives a perfect accuracy of 100% for the majority of datasets (9_Tumors, Brain_Tumor1, Brain_Tumor2, Leukemia1, Leukemia2, SRBCT, Prostate_Tumor, and DLBCL), using just 5 genes for Leukemia1 and 6 genes for the SRBCT and DLBCL datasets.
Regarding the MWIS approach, which is based on graph-theoretic principles, the subset of genes it selects gives an acceptable classification accuracy relative to the number of genes used. This is due to the procedure used to construct this subset, which gives genes with low scores the opportunity to be included. As detailed in Section 2, our two-stage method MWIS-ACO-LS first uses MWIS to select a small initial subset of genes that contains most of the information. It then calls our modified ACO combined with a local search algorithm. In this second stage, the algorithm tries to find the smallest subset of genes that gives the highest classification accuracy. Table 6 shows that the second stage plays a crucial role in increasing the classification accuracy for all datasets, especially Brain_Tumor2, 9_Tumors, 11_Tumors, and Leukemia1, where the improvement in classification accuracy is large. In Figures 9-12, the abscissa shows the number of generations in the second stage of MWIS-ACO-LS, and the ordinate shows the classification accuracy at each iteration, both for the best solution and for the average of all solutions, on the "9_Tumors," "Brain_Tumor1," "Brain_Tumor2," and "Leukemia1" datasets. These figures clearly show that our modified ACO and the local search algorithm play a crucial role in improving the classification accuracy. The gap between the best solution and the average solution is also small, which indicates that MWIS-ACO-LS converges quickly and reaches its best solution rapidly.
In Figures 13-16, we show the evolution of the number of selected genes (ordinate) with respect to the number of generations (abscissa) for the "9_Tumors," "Brain_Tumor1," "Brain_Tumor2," and "Leukemia1" datasets. These figures illustrate the role of our ACO-based wrapper algorithm in reducing the number of genes. Moreover, the second stage of our approach, based on the modified ACO and the local search algorithm, plays a key role in increasing the classification accuracy and minimizing the number of genes used. At each iteration, the ACO identifies the near-optimal subset of candidate genes (the best ant) that maximizes the objective function. Once this subset is found, our local search algorithm is called to improve its accuracy or to reduce the number of genes while retaining the accuracy found previously. After 100 generations of the ACO algorithm, we apply a backward local search algorithm to reduce the number of genes in the last best solution found (Figures 13-16). Next, we performed a statistical analysis with the Kruskal-Wallis test to assess the significance of the differences in classification accuracy obtained by our approach. The Kruskal-Wallis results presented in Figures 17 and 18 compare MWIS, MWIS-ACO, MWIS-ACO-LS, and the 1NN classifier. According to these figures, MWIS-ACO and MWIS-ACO-LS outperform both the MWIS method and the 1NN classifier. The test shows that the differences in classification accuracy between "1NN" and "MWIS-ACO-LS," and between "MWIS" and "MWIS-ACO-LS," are statistically significant.
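For reference, the Kruskal-Wallis H statistic can be computed as in the dependency-free sketch below. It applies the usual tie correction and approximates the p-value with a chi-square distribution. `scipy.stats.kruskal` implements the same test. The inputs are assumed to be one list of accuracies per method (for instance, per run or per dataset), and the function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Chi-square upper tail P(X >= x), i.e. 1 - P(df/2, x/2), from the power
    series of the regularized lower incomplete gamma function (moderate x)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 0
    while term > 1e-15 * total:
        k += 1
        term *= z / (a + k)
        total += term
    lower = math.exp(-z + a * math.log(z) - math.lgamma(a)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H test; returns (H, df, p_value)."""
    data = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    n = len(data)
    ranks, tie_term, i = [0.0] * n, 0.0, 0
    while i < n:                                 # assign average ranks to tie blocks
        j = i
        while j + 1 < n and data[j + 1][0] == data[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2.0 + 1.0       # 1-based average rank
        tie_term += (j - i + 1) ** 3 - (j - i + 1)
        i = j + 1
    if tie_term == n ** 3 - n:
        raise ValueError("all observations are identical")
    rank_sums = [0.0] * len(groups)
    for (_, gi), r in zip(data, ranks):
        rank_sums[gi] += r
    h = 12.0 / (n * (n + 1)) * sum(rs ** 2 / len(g) for rs, g in zip(rank_sums, groups))
    h = (h - 3.0 * (n + 1)) / (1.0 - tie_term / (n ** 3 - n))
    df = len(groups) - 1
    return h, df, chi2_sf(h, df)
```

A p-value below the chosen significance level (for example, 0.05) indicates that at least one method's accuracies differ from the others. Identifying which pairs differ requires a post hoc test.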
Table 7 lists the best subset of genes selected by our proposed approach (MWIS-ACO-LS) for the datasets on which MWIS-ACO-LS performs best compared with the other works (9_Tumors, Brain_Tumor1, Brain_Tumor2, and Prostate_Tumor). These genes are potential biomarkers for cancer identification.
Based on our experiments, we conclude that our gene selection approach (MWIS-ACO-LS) is well founded. Our method achieved a high classification accuracy on all ten datasets. More precisely, the accuracy was equal to or greater than 99.42% for every dataset, with a perfect classification (100%) for 9_Tumors, Brain_Tumor1, Brain_Tumor2, Leukemia1, Leukemia2, SRBCT, Prostate_Tumor, and DLBCL using at most 40 genes. The high classification accuracy of our methodology can be attributed to two elements. The first is the combination of a graph-theoretic method (MWIS) with the ACO metaheuristic. The second is the use of a modified 1NN classifier in which the training data are normalized to a common scale.
In the following, we compare our proposed method with several recent optimization algorithms on several classification datasets.

Comparison with State-of-the-Art Algorithms.
In this section, we compare our method with eight recent algorithms from the literature [6, 21-25, 28]. To make this comparison meaningful, the experiments are performed under the same conditions as in each algorithm. Specifically, our approach is executed ten times on each dataset, and we report the average and the best subset of genes found. Note that [22, 23] report only the best results found. Table 8 summarizes the classification accuracy and the number of selected genes (taken from the original papers) for the different approaches; the (−) symbol indicates that the result is not reported in the related work. The results obtained by our approach are very encouraging compared with previous work. For most of the datasets examined, the classification accuracies obtained by the proposed gene selection method matched or outperformed those obtained by other methods [6, 21-25, 28]. First, for the 9_Tumors dataset, we achieve a perfect classification accuracy with only 40 genes. This is the best performance for this dataset, exceeding the best previously reported accuracy [6] by 5%. By comparison, FBPSO-SVM [6] uses 71 genes to reach its reported accuracy.
Similarly, we obtain the best performance for the 11_Tumors, Brain_Tumor1, Brain_Tumor2, and Prostate_Tumor datasets. In addition, we achieve a perfect classification (100%) for Brain_Tumor1, Brain_Tumor2, and Prostate_Tumor with fewer than 21 genes. Table 9 reports the rank of the proposed method compared with other existing methods according to the average accuracy. The results in this table show that the proposed method achieves the best average accuracy on most datasets, which indicates that it is well suited to gene selection. As shown in Tables 8 and 9, we match or exceed the performance of all comparison methods, except on the Brain_Tumor2 and Prostate_Tumor datasets, where our approach ranks second after the FBPSO-SVM approach. This comparative analysis with previous gene selection methods for cancer classification leads us to conclude that our nature-inspired optimization method is useful for the gene selection problem.

Figure 13: Comparison of the evolution of the number of genes used for "9_Tumors."

Conclusion
In this work, we have presented a hybrid approach (MWIS-ACO-LS) for gene selection in DNA microarray data. The proposed two-stage approach consists of a preselection phase and a search phase. In the preselection phase, a new graph-theoretic approach selects a small initial subset of genes: we model the gene selection problem as an MWIS problem and present a greedy algorithm to approximate the MWIS of genes. The search phase, based on a modified ACO and an LS algorithm, then determines a near-optimal subset of genes for cancer classification.
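The preselection idea can be illustrated with the Python sketch below. The following choices are assumptions and are not taken from the paper's Algorithm 1. Genes are vertices weighted by their Fisher score. Two genes are joined by an edge when the absolute Pearson correlation of their expression profiles reaches a hypothetical `threshold`. The MWIS is approximated by the classic greedy rule that repeatedly picks the remaining vertex maximizing w(v)/(deg(v)+1) and deletes its neighbors. Function names and defaults are hypothetical.

```python
import math


def fisher_scores(X, y):
    """Fisher score per gene: sum_k n_k (mu_kj - mu_j)^2 / sum_k n_k var_kj."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mu = sum(col) / len(col)
        num = den = 0.0
        for c in classes:
            vals = [v for v, lab in zip(col, y) if lab == c]
            mu_c = sum(vals) / len(vals)
            num += len(vals) * (mu_c - mu) ** 2
            den += sum((v - mu_c) ** 2 for v in vals)   # equals n_k * var_kj
        scores.append(num / den if den > 0 else (math.inf if num > 0 else 0.0))
    return scores


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def greedy_mwis_genes(X, y, threshold=0.8, max_genes=None):
    """Approximate a maximum weight independent set of genes.

    Vertices: genes weighted by Fisher score. Edges: gene pairs whose
    |Pearson correlation| >= threshold (redundant pairs). Greedy rule:
    pick the remaining gene maximizing weight / (remaining degree + 1),
    then discard it and its neighbors.
    """
    w = fisher_scores(X, y)
    n = len(w)
    cols = [[row[j] for row in X] for j in range(n)]
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if abs(pearson(cols[i], cols[j])) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    remaining, chosen = set(range(n)), []
    while remaining and (max_genes is None or len(chosen) < max_genes):
        v = max(remaining, key=lambda u: (w[u] / (len(adj[u] & remaining) + 1), -u))
        chosen.append(v)
        remaining -= adj[v] | {v}
    return sorted(chosen)
```

Because an independent set never contains two adjacent vertices, strongly correlated (redundant) genes cannot be selected together. The greedy rule favors genes with high Fisher scores and few redundant neighbors.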
This approach aims to select a small subset of relevant genes from an original dataset that contains redundant, noisy, or irrelevant data.
The experimental results show that our approach compares very favorably with the reference methods in terms of classification accuracy and the number of selected genes. Although these results are interesting and encouraging, several points deserve study in future work, such as (i) modifying the MWIS method to improve the quality of the initial subset of genes and (ii) combining the MWIS filter with other metaheuristics such as VNS. This field of research will remain active as long as it is driven by advances in data collection and storage systems on the one hand and by the requirements of oncology on the other. The best way to assess this selection of genes is to collaborate with experts (biologists) for a sound interpretation of the results.