An Innovative Excited-ACS-IDGWO Algorithm for Optimal Biomedical Data Feature Selection

Finding an optimal set of discriminative features is still a crucial but challenging task in biomedical science. The complexity of the task is intensified when either of two scenarios arises: a highly dimensioned dataset or a small sample-sized dataset. The first scenario poses a big challenge to existing machine learning approaches since the search space for identifying the most relevant feature subset is too large to be explored quickly with minimal computational resources. The second scenario, on the other hand, offers too few samples to learn from. Though many hybrid metaheuristic approaches (i.e., combining multiple search algorithms) have been proposed in the literature to address these challenges, with very attractive performance compared to their standalone counterparts, superior hybrid approaches can be achieved if the individual metaheuristics within the hybrid algorithms are improved prior to hybridization. Motivated by this, we propose a new hybrid Excited- (E-) Adaptive Cuckoo Search- (ACS-) Intensification Dedicated Grey Wolf Optimization (IDGWO), i.e., EACSIDGWO. EACSIDGWO is an algorithm where the step size of ACS and the nonlinear control strategy of parameter $\vec{a}$ of the IDGWO are innovatively made adaptive via the concept of the complete voltage and current responses of a direct current (DC) excited resistor-capacitor (RC) circuit. Since the population has a higher diversity at early stages of the proposed EACSIDGWO algorithm, both the ACS and IDGWO are jointly involved in local exploitation. On the other hand, to enhance mature convergence at later stages of the proposed algorithm, the role of ACS is switched to global exploration while the IDGWO is still left conducting the local exploitation.
To prove that the proposed algorithm is superior in providing a good learning from fewer instances and an optimal feature selection from information-rich biomedical data, all these while maintaining a high classification accuracy of the data, the EACSIDGWO is employed to solve the feature selection problem. The EACSIDGWO as a feature selector is tested on six standard biomedical datasets from the University of California at Irvine (UCI) repository. The experimental results are compared with the state-of-the-art feature selection techniques, including binary ant-colony optimization (BACO), binary genetic algorithm (BGA), binary particle swarm optimization (BPSO), and extended binary cuckoo search algorithm (EBCSA). These results reveal that the EACSIDGWO has comprehensive superiority in tackling the feature selection problem, which proves the capability of the proposed algorithm in solving real-world complex problems. Furthermore, the superiority of the proposed algorithm is proved via various numerical techniques like ranking methods and statistical analysis.


Introduction
Currently, there is a growing research interest in developing and deploying population-based metaheuristics to tackle combinatorial optimization challenges. This is because they are simple, flexible with an inexpensive computational cost, and gradient-free [1].
Many researchers have applied these optimization algorithms in various research domains because of their ability to achieve best solutions.
The optimization challenge grows bigger when tackling highly dimensioned datasets. This is because these datasets have a vast feature space with many classes. Due to the presence of redundant and noninformative attributes within these datasets, the process of effective machine learning is greatly hindered. Thus, the construction of efficient classifiers with high predictive power largely depends on the selection of informative features [2].
Feature selection (FS) is one of the main steps in data preprocessing. It aims at selecting a subset of attributes out of the whole dataset, resulting in the removal of noisy, noninformative, and redundant features. This in turn increases the accuracy of a considered classifier or clustering model [3].
FS algorithms can be broadly categorized into two classes: filter and wrapper techniques [4,5]. Filters include techniques independent of classifiers and work directly on presented data. Moreover, these methods in many situations determine the correlations between features. On the contrary, wrapper approaches engage classifiers and mainly determine interactions between dataset features. From literature, wrapper approaches have proved to be superior compared to filters for classification algorithms [6,7].
To utilize wrapper-based techniques, three key factors need to be outlined: the considered classifier (e.g., k-nearest neighbor (KNN) or support vector machine (SVM)), the evaluation criteria for the identified feature subset, and the search technique utilized in determining a subset of optimal features [8].
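The three factors above can be illustrated with a minimal wrapper-style evaluation. The sketch below is only an assumption-laden example, not the evaluation used in this paper: it takes a 1-nearest-neighbour classifier as the classifier, leave-one-out accuracy as the evaluation criterion, and leaves the search technique to the caller, who supplies a binary feature mask. The function name `loo_1nn_accuracy` and the toy data are hypothetical.

```python
def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    restricted to the features switched on in `mask`."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        best_dist, best_label = None, None
        for k, xk in enumerate(X):
            if k == i:
                continue  # hold sample i out
            dist = sum((xi[j] - xk[j]) ** 2 for j in cols)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, y[k]
        correct += int(best_label == y[i])
    return correct / len(X)


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [1.0, 5.1], [1.1, 0.9]]
y = [0, 0, 1, 1]
print(loo_1nn_accuracy(X, y, [1, 0]))  # keep only feature 0
print(loo_1nn_accuracy(X, y, [0, 1]))  # keep only feature 1
```

A search algorithm would call such a function once per candidate mask, which is why wrapper approaches tend to be more accurate but also more expensive than filters.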
Many researchers have pointed out that determining an optimal subset of attributes is not only challenging but computationally expensive as well. Though, in the recent past, metaheuristics have proved to be reliable and efficient tools in tackling many optimization tasks (e.g., engineering design problems, machine learning, feature selection, and data mining), they are not efficient in solving problems with high computational complexity [5, 9-11].
In the recent past, a number of metaheuristic search algorithms have been utilized for FS using highly dimensioned datasets. Some of these metaheuristics are the grey wolf optimization (GWO) [12,13], genetic algorithm (GA) [14], particle swarm optimization (PSO) [11], ant-colony optimization (ACO) [15], differential evolution algorithm (DEA) [16], cuckoo search algorithm (CSA) [17], and dragonfly algorithm (DA) [18]. Though, many of these algorithms have already made an important contribution in the field of feature selection, in many cases, they offer acceptable solutions without a guarantee of determining optimal solutions since they do not explore the entire search space [11].
Some of the new modifications that have been proposed to improve the performance of these metaheuristics include chaotic maps [19], evolutionary methods [20], sine cosine algorithms [21], biogeography-based optimization, and local searches [22].
While designing or utilizing a metaheuristic, it should be noted that diversification (exploring the search space) and intensification (exploiting optimal solutions obtained so far) are two contradicting principles that must be balanced efficiently in order to achieve an improved performance of the metaheuristic [9].
In this regard, one promising alternative is developing a memetic algorithm whereby an integration of (at least) two algorithms is done with the aim of enhancing the overall performance.
Motivated by this, a good number of hybrid algorithms have been proposed in the recent past to solve a variety of optimizations and feature selection problems [23]. However, to enhance diversification and intensification of these hybrid algorithms, exploration and fine-tuning within their basic constituent algorithms is needed prior to hybridization [24].
This emphasizes, too, that there are a number of techniques lying within these memetic algorithms that are yet to be investigated.
Firstly, the technique of combining one or more nature-inspired algorithms (NIAs) needs to be determined. Secondly, the criterion for determining how many NIAs need to be combined within the search space has to be established. Thirdly, the application area in which the proposed memetic algorithm is to be used has to be determined. Finally, the criterion for applying the memetic algorithm in a specific domain has to be established [24].
Inspired by the aforementioned, this paper proposes a new hybrid algorithm called Excited- (E-) Adaptive Cuckoo Search- (ACS-) Intensification Dedicated Grey Wolf Optimization (IDGWO), i.e., the EACSIDGWO algorithm, to solve the feature selection problem in biomedical science. In the proposed algorithm, the concept of the complete voltage and current responses of a direct current (DC) excited resistor-capacitor (RC) circuit is innovatively utilized to make the step size of ACS and the nonlinear control strategy of parameter $\vec{a}$ of the IDGWO adaptive. Since the population has a higher diversity during early stages of the proposed algorithm, both the ACS and IDGWO are jointly utilized to attain accelerated convergence. However, to enhance mature convergence while striking an effective balance between exploitation and exploration in later stages, the role of ACS is switched to global exploration while the IDGWO is still left conducting the local exploitation. The remainder of this paper is organized as follows: Section 2 discusses the existing literature within the same research domain. Section 3 presents the background information of the CS and the GWO, respectively, where their inspirations and mathematical models are given emphasis. The continuous version of the proposed EACSIDGWO algorithm is presented in Section 4 while the details of its binary version are given in Section 5. The experimental methodology considered in this paper is presented in Section 6 while the results on feature selection are discussed in Section 7. Finally, conclusions and the suggested future works are given in Section 8.

Review of Hybridization of GWO with Other Search Algorithms.
Combining two or more metaheuristics to attain better solutions is a relatively recent direction in the area of optimization. In the literature, many researchers have utilized GWO in the field of hybrid metaheuristics. For instance, in [25], a hybrid of GWO and artificial bee colony (ABC) is proposed to improve the performance of a complex system. In [26], GWO is hybridized with ant lion optimizer (ALO) for wrapper feature selection. Alomoush et al. [27] proposed a hybrid of GWO and harmony search (HS). In this memetic, GWO updates the bandwidth and pitch adjustment rate in HS, which in turn improves the global optimization abilities of the hybrid algorithm. In [28], Arora et al. combined GWO with the crow search algorithm (CSA). The performance of the derived memetic as a feature selector is evaluated using 21 datasets. The obtained results reveal that the combined algorithm is superior in solving complex optimization problems. In [29], a novel combination of GWO and PSO is utilized as a load-balancing technique in the cloud-computing arena. The conclusions point out that the hybrid algorithm improved both the convergence speed and the simplicity in comparison with other algorithms. Zhu et al. [30] hybridized GWO with differential evolution (DE). The hybrid algorithm was tested on 23 different functions and a nondeterministic polynomial hard problem. The obtained results indicate that this combination achieved superior exploration. In [31], a new memetic combining the exploration ability of the fireworks algorithm (FWA) with the exploitation ability of GWO is proposed. Utilizing 16 benchmark functions with varied dimensions and complexities, the experimental results indicate that the hybrid algorithm attained attractive global search abilities and convergence speeds.

Review of Hybridization of CS with Other Search Algorithms.
Utilizing the concept of rand and best agents within a population, Cheng et al. [32] developed an ensemble cuckoo search variant combining three different CS approaches that coexist within the entire search domain. These CS variants actively compete to derive superior generations for numerical optimization. To maintain population diversity, they introduced an external archive. The statistical results obtained reveal that the ensemble CS attained attractive convergence speeds as well as robustness. In [33], GWO is hybridized with CS, i.e., GWOCS, for the extraction of parameters for different PV cell models situated in different conditions. Zhang et al. [34] developed an ensemble CS algorithm that first divides a population into two smaller groups and then utilizes CS and differential evolution (DE) on the derived subgroups independently. The subgroups are free to share useful information by division. Further, the CS and DE algorithms can freely utilize each other's merits to complement their weaknesses. This approach proved to balance the quality of solutions and the computation consumption. In [34], CS is hybridized with a covariance matrix adaptation evolution approach, i.e., CMA-CS, to improve the performance of CS in different optimization problems.
Despite the advantages portrayed by the aforementioned hybrid GWO and CS metaheuristics for optimization and feature selection, superior hybrid approaches can be achieved if the single GWO and CS algorithms are improved prior to hybridization. Furthermore, the no-free-lunch (NFL) theorem has logically proved that there has been, is, and will be no single metaheuristic capable of solving all optimization and feature selection problems [33]. While a given metaheuristic can show an attractive performance on specific datasets, its performance might degrade when applied to similar or different types of datasets [34]. Thus, there is still a dire need to improve existing algorithms or develop new ones to solve function optimization problems as well as feature selection problems efficiently.

Standard Cuckoo Search (CS)
3.1. Inspiration of CS
3.1.1. The Behavior of Cuckoo Birds.
To date, more than a thousand different species of birds are in existence in nature [35]. For most of these species, the female birds lay eggs in nests they have built themselves [36]. However, there exist some types of birds that do not build nests of their own, but instead lay their eggs in the nests of other species, leaving the responsibility of taking care of their eggs to the host birds. The cuckoos are the most famous of these brood parasites [37].
There are three types of brood parasitism: intraspecific brood parasitism, cooperative breeding, and nest takeover [38]. The cuckoo strategy is full of amazing traits. First, the cuckoo replaces one host egg with its own to increase the chances of its egg being hatched by the host bird. Next, it tries to mimic the pattern and color(s) of the host eggs with the aim of reducing the chances of its egg being noticed and discarded by the host bird. The timing of laying its egg is also remarkable: it cleverly selects a nest where a host bird has just laid eggs, implying that the cuckoo's egg will hatch prior to the host eggs. The first action taken by the hatched cuckoo is evicting the host eggs that are yet to hatch out of the nest by blind propelling in order to increase its chances of being fed well by the host bird [37]. In addition, this young cuckoo mimics the call of host chicks, thus gaining more access to the food provided by the host bird [39].
However, if this host bird is able to identify the cuckoo's egg, it can either discard it from the nest or quit this nest to build a completely new nest in a different location.
3.1.2. Lévy Flights.
From the literature, many researchers have shown that the behavior of many flying animals, birds, and insects can be demonstrated by a Lévy flight [40-43]. Lévy flights are evident when some birds, insects, and animals follow a long path with sudden turns in combination with random short moves [43]. These Lévy flights have been successfully applied in optimization [41, 43-45]. A Lévy flight is a random walk whose step lengths are distributed according to a heavy-tailed probability distribution.

Cuckoo Search (CS) Algorithm.
CS is a swarm-based metaheuristic for global optimization, inspired by cuckoos, that was proposed by Yang and Deb in 2009. The CS combines the obligate brood parasitic nature of cuckoos with the Lévy flight behavior observed in fruit flies and some birds [38]. There are three basic idealized rules for the CS, namely:
(i) A female cuckoo lays one egg at a time and puts it in a randomly chosen nest
(ii) The best nests with high-quality eggs (highest fitness/solutions) will carry over to the next generations
(iii) The number of available host nests is kept fixed, and the host bird can discover the egg laid by the female cuckoo (alien egg) with a probability $P_a \in [0, 1]$
Depending on the value of $P_a$, the host bird can either throw away the alien egg or abandon the nest. For simplicity, it is assumed that only a fraction $P_a$ of the nests is replaced by new ones. Based on the above rules, an illustration of the CS is shown in Algorithm 1.

Mathematical Modelling of the Standard CS.
Considering Algorithm 1, the standard CS has three major steps [46-48]:
(1) Exploitation (intensification) by the use of Lévy flight random walk (LFRW)
(2) Exploration (diversification) using biased selective random walk (BSRW)
(3) Elitist scheme via greedy selection

3.3.1. Intensification Using Lévy Flight Random Walk (LFRW).
In this phase, new solutions are generated around the current best solution, which in turn enhances the speed of the local search. This phase is achieved via the LFRW, generally presented in equation (1), where the step size is derived from the Lévy distribution:
$$X_{i,gen+1} = X_{i,gen} + \alpha \oplus \text{Lévy}(\lambda), \tag{1}$$
where $X_{i,gen}$ is the $i$th nest in the $gen$th generation and $X_{i,gen+1}$ is a new nest generated by the Lévy flight. $\oplus$ denotes entrywise multiplication, and $\alpha > 0$ is the step size formulated in equation (2). Equation (1) ensures that a new solution will be close to the current best solution.
$$\alpha = \alpha_0 \left( X_{i,gen} - X_{best} \right), \tag{2}$$
where $X_{best}$ is the current best solution and $\alpha_0$ is a scaling factor that is set to 0.01 in the standard CSA [38,49]. $\text{Lévy}(\lambda)$ is a random number derived from the Lévy distribution and is formulated in equation (3), where $\lambda$ is a constant whose value is 1.5 as suggested by Yang in the standard CS [38]. $\varepsilon$ and $\varphi$ are random numbers derived from a normal distribution whose mean is 0 and standard deviation is 1. $\partial$ is a parameter computed in equation (4), where $\Gamma$ is the gamma function. The final form of the Lévy flight random walk (LFRW) is a combination of equations (1) to (4), as presented in equation (5).
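As an illustration of the LFRW step, the following sketch draws Lévy-distributed step lengths with Mantegna's algorithm, which is the usual way such steps are generated in cuckoo search implementations. It is a sketch under stated assumptions rather than the exact formulation of equations (3) to (5): the function names are hypothetical, and the roles assigned to the two standard normal draws (`u` scaled in the numerator, `v` in the denominator) are an assumption about how $\varepsilon$ and $\varphi$ are used.

```python
import math
import random


def mantegna_sigma(lam=1.5):
    """Scale factor of Mantegna's algorithm for Levy exponent lam."""
    num = math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
    den = math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2)
    return (num / den) ** (1 / lam)


def levy_step(lam=1.5, rng=random):
    """Draw one heavy-tailed step length from two standard normal samples."""
    u = rng.gauss(0.0, 1.0) * mantegna_sigma(lam)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / lam)


def lfrw(nest, best, alpha0=0.01, lam=1.5, rng=random):
    """Levy flight random walk: each coordinate moves by a Levy step
    scaled by alpha0 and by its distance to the best nest."""
    return [x + alpha0 * levy_step(lam, rng) * (x - b)
            for x, b in zip(nest, best)]


random.seed(1)
print(lfrw([0.4, -1.2, 3.0], [0.5, -1.0, 2.5]))
```

Because the displacement is proportional to the distance from the best nest, a nest that already coincides with the best nest is left unchanged, which is what makes this phase a local search around the best solution.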

Diversification by the Use of Biased Selective Random Walk (BSRW).
In this phase, new solutions are randomly generated in locations far from the current best solution, an approach that ensures that the CSA is not trapped in a local optimum, thus enhancing suitable diversity and exploration of the entire search space [48]. This phase of the CSA is achieved by utilizing the BSRW, which is efficient in exploring the entire search space, especially when it is large, since the step size in the Lévy flight is much longer in the long run [46,48].
To find new solutions that are far from the current best solution, first, a trial solution is obtained by using a mutation of the current best solution and a differential step size from two solutions selected randomly. Then, a new solution is derived from a crossover operator between the current best solution and the two trial solutions [48]. The formulation of the BSRW is given in equation (6) [47],
where $a$ and $b$ are two random indexes, $s$ is a random number in the range [0, 1], and $P_a$ is the discovery probability whose best value is 0.25 [38,48].
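A minimal sketch of a biased selective random walk is given below. It assumes the commonly used form in which each coordinate of a nest is displaced by $s$ times the difference of two randomly chosen nests; the choice to apply the displacement with probability `p_a` (rather than the complementary condition that some implementations use) is an assumption, and the function name `bsrw` is hypothetical.

```python
import random


def bsrw(nests, p_a=0.25, rng=random):
    """Biased selective random walk over a population of nests.

    For every nest, two distinct random nests a and b are picked and a
    differential step s * (X_a - X_b) is applied coordinate-wise with
    probability p_a (assumed convention); other coordinates are kept.
    """
    n = len(nests)
    new_nests = []
    for nest in nests:
        a, b = rng.sample(range(n), 2)  # two distinct random indexes
        s = rng.random()                # random scale in [0, 1)
        trial = [
            x + s * (xa - xb) if rng.random() < p_a else x
            for x, xa, xb in zip(nest, nests[a], nests[b])
        ]
        new_nests.append(trial)
    return new_nests


random.seed(2)
population = [[0.0, 1.0], [2.0, -1.0], [4.0, 3.0]]
print(bsrw(population))
```

The step depends only on the spread of the population, not on the best nest, so it can move a nest far from the current best solution while the population is still diverse.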

Elitist Scheme via Greedy Selection.
After each random walk process, the cuckoo search algorithm utilizes the greedy strategy to select solutions with better fitness values that will be passed to the next generation. This facilitates maintenance of good solutions [48].

Grey Wolf Optimization (GWO) Algorithm
GWO is a recent nature-inspired metaheuristic algorithm that was proposed by Mirjalili et al. in 2014 [28,50,51]. The GWO imitates both the hunting and leadership traits of grey wolves. Grey wolves belong to the Canidae family and follow a very strict social hierarchy. In most cases, a pack of between 5 and 12 wolves is involved in hunting. To simulate the leadership hierarchy in the conventional GWO algorithm, four levels are considered: alpha (α), beta (β), delta (δ), and omega (ω). The alpha, which is either a male or a female, is at the top of the hierarchy and is regarded as the leader of the pack. This leader makes decisions for the pack on matters including, but not limited to, discipline and order, hunting, sleeping location, and waking-up time. The betas assist the alpha in decision-making, and their main task is to provide feedback and suggestions. The deltas act as scouts, caretakers, sentinels, hunters, and elders. They control and guide the omega wolves while obeying both the beta and alpha wolves. The omega wolves rank lowest in the hierarchy and must obey all the other wolves [28,50,51]. The GWO algorithm is modelled mathematically in four stages that are described as follows.

Leadership Hierarchy.
The mathematical model of the GWO is anchored on the social hierarchy of the grey wolves. The alpha (α) is considered the best solution in the population while beta (β) and delta (δ) are termed as the second and third best solutions, respectively. Lastly, the omega (ω) is assumed as the rest of the solutions in the population [28,50,51].

Encircling the Prey.
Equation (7) and equation (8) represent the mathematical model for the wolves' encircling trait [50]:
$$\vec{D} = \left| \vec{C} \cdot \vec{X}_p(t) - \vec{X}(t) \right|, \tag{7}$$
$$\vec{X}(t+1) = \vec{X}_p(t) - \vec{A} \cdot \vec{D}, \tag{8}$$
where $\vec{D}$ is the distance between the prey and a given wolf, $\vec{X}$ is the wolf's position vector, and $\vec{X}_p$ depicts the prey's position vector at iteration $t$. $\vec{A}$ and $\vec{C}$ are random vectors computed as shown in equations (9) and (10) [50]:
$$\vec{A} = 2\vec{a} \cdot \vec{r}_1 - \vec{a}, \tag{9}$$
$$\vec{C} = 2\vec{r}_2, \tag{10}$$
where $\vec{r}_1$ and $\vec{r}_2$ are randomly generated vectors in the range [0, 1] and $\vec{a}$ is a vector whose components linearly decrease from 2 to 0 over the iterations.
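The encircling step can be sketched in a few lines. The version below follows the standard GWO formulation (a distance to the prey scaled by a random vector, then a move relative to the prey controlled by $\vec{A}$); the function name `encircle` and the scalar treatment of $\vec{a}$ are simplifying assumptions made for illustration.

```python
import random


def encircle(wolf, prey, a, rng=random):
    """Move a wolf relative to the prey, coordinate by coordinate.

    A = 2*a*r1 - a lies in [-a, a] and C = 2*r2 lies in [0, 2],
    with r1 and r2 drawn uniformly from [0, 1).
    """
    new_pos = []
    for x, xp in zip(wolf, prey):
        A = 2 * a * rng.random() - a
        C = 2 * rng.random()
        D = abs(C * xp - x)       # distance between prey and wolf
        new_pos.append(xp - A * D)
    return new_pos


random.seed(3)
print(encircle([3.0, -1.0], [0.5, 2.0], a=2.0))  # early iterations: wide moves
print(encircle([3.0, -1.0], [0.5, 2.0], a=0.0))  # a = 0: wolf lands on the prey
```

As `a` shrinks, the range of `A` shrinks with it, so the wolf is placed progressively closer to the prey.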

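Hunting the Prey.
In the hunting stage, the alpha is considered the best candidate solution while its two assistants (beta and delta) are expected to know the possible location of the prey. Thus, the best three solutions that have been achieved until a given iteration are preserved and are used to compel the remaining wolves in the pack (i.e., omega) to update their positions in the search space consistent with the optimal location.
The mechanism utilized in updating the wolves' positions is given in equation (11):
$$\vec{X}(t+1) = \frac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3}, \tag{11}$$
where $\vec{X}_1$, $\vec{X}_2$, and $\vec{X}_3$ are defined and computed using equation (12):
$$\vec{X}_1 = \vec{X}_\alpha - \vec{A}_1 \cdot \vec{D}_\alpha, \quad \vec{X}_2 = \vec{X}_\beta - \vec{A}_2 \cdot \vec{D}_\beta, \quad \vec{X}_3 = \vec{X}_\delta - \vec{A}_3 \cdot \vec{D}_\delta, \tag{12}$$
where $\vec{X}_\alpha$, $\vec{X}_\beta$, and $\vec{X}_\delta$ are the three best wolves (solutions) in the pack at a given iteration $t$. $\vec{A}_1$, $\vec{A}_2$, and $\vec{A}_3$ are computed as in equation (9), and the distances $\vec{D}_\alpha$, $\vec{D}_\beta$, and $\vec{D}_\delta$ are computed using equation (13):
$$\vec{D}_\alpha = \left| \vec{C}_1 \cdot \vec{X}_\alpha - \vec{X} \right|, \quad \vec{D}_\beta = \left| \vec{C}_2 \cdot \vec{X}_\beta - \vec{X} \right|, \quad \vec{D}_\delta = \left| \vec{C}_3 \cdot \vec{X}_\delta - \vec{X} \right|, \tag{13}$$
where $\vec{C}_1$, $\vec{C}_2$, and $\vec{C}_3$ are computed as in equation (10). The parameter $\vec{a}$ is computed as in equation (14):
$$\vec{a} = 2 - 2 \cdot \frac{iter}{Max_{iter}}, \tag{14}$$
where $iter$ is the iteration number and $Max_{iter}$ is the total number of iterations.
When $|\vec{A}| < 1$, the wolves are forced to attack the prey, and when $|\vec{A}| > 1$, the wolves diverge from the current prey. Searching for the prey is the exploration phase while attacking it is the exploitation phase.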
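The hunting stage can be sketched as follows, assuming the standard GWO update in which a wolf's new position is the average of three moves made relative to the alpha, beta, and delta wolves, with a control parameter that decreases linearly from 2 to 0. The helper names (`linear_a`, `toward`, `gwo_update`) are hypothetical.

```python
import random


def linear_a(it, max_iter):
    """Control parameter decreasing linearly from 2 (it = 0) to 0 (it = max_iter)."""
    return 2.0 * (1.0 - it / max_iter)


def toward(wolf, leader, a, rng):
    """Candidate position of `wolf` relative to one leader."""
    out = []
    for x, xl in zip(wolf, leader):
        A = 2 * a * rng.random() - a
        C = 2 * rng.random()
        out.append(xl - A * abs(C * xl - x))
    return out


def gwo_update(wolf, alpha, beta, delta, a, rng=random):
    """New position: average of the moves relative to the three leaders."""
    x1 = toward(wolf, alpha, a, rng)
    x2 = toward(wolf, beta, a, rng)
    x3 = toward(wolf, delta, a, rng)
    return [(p + q + r) / 3 for p, q, r in zip(x1, x2, x3)]


random.seed(4)
a = linear_a(it=10, max_iter=100)
print(gwo_update([5.0, 5.0], [1.0, 1.0], [1.5, 0.5], [0.5, 1.5], a))
```

With `a = 0` the update collapses to the plain average of the three leaders, which shows why every wolf is drawn towards the same small region late in the run.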

Excited-Adaptive Cuckoo Search-Intensification Dedicated Grey Wolf Optimization (EACSIDGWO)
In general, effective balancing between diversification (global search) and intensification (local search) in a metaheuristic plays a beneficial and crucial role in achieving excellent performance of an algorithm [52][53][54]. However, it is difficult to achieve this balance with a single metaheuristic (for example, either using CSA or GWO) [52,53]. For instance, CSA is efficient at exploring the promising area of the whole search space (diversification) but ineffective at fine-tuning the end of the search space (exploitation/intensification) [55,56]. On the other hand, GWO is good at intensification (exploitation) but inefficient at diversification (exploration) [32,57]. For this reason, in trying to enhance mature convergence while ensuring that the required effective balance between diversification and intensification is met, a hybrid algorithm called Excited-Adaptive Cuckoo Search-Intensification Dedicated Grey Wolf Optimization (EACSIDGWO) utilizing the strengths of each algorithm (i.e., CSA's diversification and GWO's intensification abilities) is proposed in this paper. Moreover, the adaptability of the proposed EACSIDGWO is guided innovatively by the complete voltage and current responses of a DC excited RC circuit (whose analysis results in first order differential equations) that finds continual applications in electronics, communications, and control systems [58].

Adaptive Step Size via the Complete Voltage Response of the DC Excited RC Circuit.
From the details of the standard CS algorithm presented in Section 3, it is evident that the algorithm lacks a criterion to control its step size through the iteration process. Control of the step size is key in guiding the CS algorithm to reach its global maximum or minimum [48,59].
Inspired by the complete voltage response of a direct current (DC) excited RC circuit which increases with time, a novel mechanism to control the step size is proposed. Contrary to prior research [48,59] where the step size decays with generations, in this research, the step size grows with generations with the aim of strengthening the diversification (exploration) ability of the CS, which is a component of the proposed EACSIDGWO algorithm.
The solution to the first order differential equation of the direct current-excited RC circuit motivated the formulation of a new variant of ACS in this paper.
The complete voltage response of the RC circuit to a sudden application of a DC voltage source, with the assumption that the capacitor is initially not charged, is given in equation (15):
$$v(t) = \begin{cases} 0, & t < 0, \\ V_s \left( 1 - e^{-t/\tau} \right), & t > 0, \end{cases} \tag{15}$$
where $\tau = RC$ is the time constant, which expresses the rapidity with which the voltage $v(t)$ rises to the value of $V_s$, a constant DC voltage source. $R$ and $C$ are the equivalent resistance and capacitance in the circuit, respectively. Considering the situation when $t > 0$, equation (15) can be rewritten as presented in equation (16):
$$v(t) = V_s \left( 1 - \frac{1}{e^{t/\tau}} \right). \tag{16}$$
As $t \to \infty$, the component $1/e^{t/\tau} \to 0$, forcing $v(t \to \infty) \to V_s$. We adopt this concept, i.e., the exponential rise of $v(t)$, to control the step size of the cuckoo search algorithm by introducing the proposed equation (17), where $gen$ is the current generation (iteration), $step_{Max}$ is the upper bound of the step size $step$, and $gen_{Max}$ is the maximum number of generations (iterations).
To ensure that $step_{gen+1}$ is proportional to the fitness of a given individual nest within the search space in the current generation, the nonlinear modulation index $\tau$ is formulated in equation (18), where $\tau_{i,gen}$ is the nonlinear modulation index for the $i$th nest in generation $gen$, $\alpha\mathrm{nestf}_{gen}$ is the fitness value of the alpha (α) nest (overall best nest) in generation $gen$, $\beta\mathrm{nestf}_{gen}$ is the fitness value of the beta (β) nest (2nd best nest) in generation $gen$, $\delta\mathrm{nestf}_{gen}$ is the fitness value of the delta (δ) nest (3rd best nest) in generation $gen$, $i\mathrm{nestf}_{gen}$ is the fitness value of the $i$th nest in generation $gen$, and $worst\mathrm{nestf}_{gen}$ is the fitness value of the worst nest among the remaining omega (ω) nests (i.e., nests whose fitness values are not featured among the top three fitness values). Thus, equation (17) is further modified as equation (19), where $step_{i,gen+1}$ is the step size for the $i$th nest in generation $gen + 1$. From equation (19), the step size $step_{i,gen+1}$ increases nonlinearly from relatively small values to values close to $step_{Max}$. The reasons for proposing a nonlinearly increasing strategy are as follows. At the early stages of the proposed EACSIDGWO algorithm, of which ACS is a component, the population has a higher diversity. A higher diversity implies a stronger ability to explore the global space. Our aim at this point is to accelerate convergence. Therefore, the value of the step size $step_{i,gen+1}$ is set to a smaller value.
It is important to point out that the anticipated accelerated convergence is a joint effort attained by foremost setting the step i,gen+1 of the ACS to a small value at early stages and utilizing the IDGWO (whose details are presented in Section 4.2) whose core task is exploitation.
On the other hand, since the proposed EACSIDGWO algorithm is a hybrid algorithm where the ACS cooperatively works with the IDGWO, all the nests will be attracted to the global optimum, i.e., the alpha (α) nest, at the later stage. This will compel them to converge prematurely without being given enough room to explore the search space. To counter such a situation, a larger step size is needed to lead the nests away from a local optimum and encourage diversification. For this reason, the value of the step size $step_{i,gen+1}$ is set to a larger value, i.e., $step_{Max}$. In this paper, $step_{Max}$ is set to 1.
In other words, our main reason for proposing a nonlinearly increasing step size $step_{i,gen+1}$ is that its small values at the initial stages of the proposed EACSIDGWO algorithm facilitate "local exploitation" while its larger values in the later stages facilitate "global exploration".
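The qualitative behavior described above (a step size that starts small and saturates near its upper bound) can be sketched with the textbook charging curve of an RC circuit. The code below is only an illustration of that shape and is not the paper's equations (17) to (19): the mapping of the generation counter to "time" as `gen / gen_max`, the fixed value `tau=0.2`, and the function names are all assumptions, and the fitness-dependent modulation index $\tau_{i,gen}$ is not modelled.

```python
import math


def rc_voltage(t, v_s=1.0, tau=1.0):
    """Voltage across an initially uncharged capacitor for t >= 0."""
    return v_s * (1.0 - math.exp(-t / tau))


def step_size(gen, gen_max, step_max=1.0, tau=0.2):
    """Assumed schedule: treat gen / gen_max as time and step_max as V_s."""
    return rc_voltage(gen / gen_max, v_s=step_max, tau=tau)


for gen in (0, 10, 50, 100):
    print(gen, round(step_size(gen, 100), 4))
```

Under these assumptions a smaller `tau` makes the step size approach `step_max` sooner, which is the lever a per-nest modulation index would act on.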
The ACS can then be modeled as presented in equation (20). Equation (20) is a formulation of the new search space for the ACS from the current solution.
Moreover, if this step size is considered proportional to the global best solution, then equation (20) can be formulated as given in equation (21), where $X_{gbest,gen}$ is the global best solution among all $X_i$ for $i = 1, 2, \cdots, n$ at generation $gen$, and $n$ is the number of host bird nests. Thus, from equations (17) to (21), it is evident that the diversification ability of the ACS is heightened as the number of generations ($gen$) approaches the maximum number of generations ($gen_{Max}$). This is because the value of the step size rapidly increases towards the set maximum value of the step ($step_{Max}$). This can enhance a good balance between global diversification (exploration) and local intensification (exploitation).

Intensification Dedicated Grey Wolf Optimization (IDGWO).
In the original GWO (described in Section 3), the value of $\vec{a}$ linearly decreases from 2 to 0 (refer to equation (14)).
However, the search process of the GWO algorithm is both nonlinear and complicated, which cannot be truly reflected by the linear control strategy of $\vec{a}$ presented in equation (14).
In addition, Mittal et al. [60] proposed that an attractive performance can be attained if parameter $\vec{a}$ is nonlinearly decreased rather than decreased linearly. Inspired by the complete current response of a direct current (DC) excited RC circuit, which decays with time, a novel nonlinear adjustment mechanism of control parameter $\vec{a}$ is formulated in this paper.
The complete current response of the RC circuit to a sudden application of a DC voltage source, with the assumption that the capacitor is initially not charged, is given in equation (22):
$$i(t) = \frac{V_s}{R} e^{-t/\tau}. \tag{22}$$
As $t \to \infty$, the component $1/e^{t/\tau} \to 0$, forcing $i(t \to \infty) \to 0$. We adopt this concept, i.e., the exponential decay of $i(t)$, to formulate a novel improved strategy, i.e., equation (23), to generate the values for control parameter $\vec{a}$,
where $gen$ is the current generation (iteration), $a_o$ is the initial higher value of parameter $a$, and $gen_{Max}$ is the maximum number of generations (iterations). $\tau_{i,gen}$ is the nonlinear modulation index described earlier by equation (18).

Consequently, vector $\vec{A}$ is computed as given in equation (24). Equation (23) is a nonlinearly decreasing control strategy for parameter $\vec{a}_{i,gen}$ whose initial upper limit is equal to the value $a_o$ while its final lower limit is zero.
From the original literature of GWO, the value $|\vec{A}| < 1$ compels the grey wolves to move towards the prey (exploitation) while $|\vec{A}| > 1$ compels them to move away from the prey in search of a fitter prey (exploration). Thus, setting $a_o$ to 1 will always force the wolves to move towards the prey, which enables us to dedicate the modified GWO algorithm, a component of the proposed EACSIDGWO, to intensification.
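The argument that $a_o = 1$ keeps the wolves in the exploitation regime can be checked numerically. The sketch below is not the paper's equation (23): it assumes a plain exponential decay shaped like the RC current response, with `gen / gen_max` as time and a fixed `tau=0.2` in place of the fitness-dependent modulation index, and the function names are hypothetical. It then draws many values of $A = 2 a r_1 - a$ and confirms that none exceeds 1 in magnitude.

```python
import math
import random


def control_a(gen, gen_max, a0=1.0, tau=0.2):
    """Assumed schedule: a decays exponentially from a0 towards 0."""
    return a0 * math.exp(-(gen / gen_max) / tau)


def coefficient_A(a, rng=random):
    """Scalar form of A = 2*a*r1 - a, which lies in [-a, a)."""
    return 2 * a * rng.random() - a


random.seed(6)
largest = 0.0
for gen in range(101):
    a = control_a(gen, 100)
    for _ in range(100):
        largest = max(largest, abs(coefficient_A(a)))
print(largest <= 1.0)
```

Since $|A|$ can never exceed the current value of $a$, any schedule that starts at $a_o = 1$ and decreases keeps $|A| \le 1$ throughout the run.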

Enhanced Mature Convergence via a Fitness
Value-Based Position-Updating Criterion. Both diversification and intensification are crucial for population-based optimization algorithms [60]. However, from the detailed account of the conventional GWO (refer to Section 3), it is evident that all the other wolves are attracted towards the three leaders α, β, and δ; a scenario that will force the algorithm to converge prematurely without attaining sufficient diversification of the search space. In other words, the conventional GWO is prone to premature convergence.
In reference to the position-updating criterion of GWO described by equation (11), a new candidate individual is obtained by moving the old individual towards the best leader (α wolf), the second best leader (β wolf), and the third best leader (δ wolf). This approach will force all the other grey wolves to crowd in a reduced section of the search space that might be different from the optimal region, without giving them any leeway to escape from such a region. In an effort to overcome this major drawback, a scheme that promotes mature convergence is devised in this paper.

Instead of averaging the values of vectors $\vec{X}_1$, $\vec{X}_2$, and $\vec{X}_3$ (a form of recombining them) as a mechanism of updating the wolves' positions (refer to equation (11)), in this paper, we make full use of the information of their fitness values as a criterion for arriving at new positions for the wolves.
Foremost, the search agents of the populations $\vec{X}_1$, $\vec{X}_2$, and $\vec{X}_3$ are computed, where $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, d$; $n$ is the population size while $d$ is the dimension of the search space.
where $\vec{X}^{j}_{i,gen}$ is vector $j$ computed using search agent $i$ during iteration $gen$, and $X^{j}_{f_{i,gen}}$ is the fitness value of vector $\vec{X}^{j}_{i,gen}$.

Proposed EACSIDGWO (Continuous Version).
We cooperatively combined the proposed adaptive cuckoo search (ACS) and the intensification-dedicated grey wolf optimization (IDGWO) to develop the EACSIDGWO. In the EACSIDGWO algorithm, the ACS is actively involved in intensification (exploitation) during the early stages, when the population has higher diversity, and in diversification (exploration) at later stages. The IDGWO, on the other hand, is actively involved only in intensification in all the stages of the proposed algorithm. By doing so, an effective balance between diversification and intensification is achieved. In addition, mature convergence is enhanced, which in the end leads to high-quality solutions.

Proposed EACSIDGWO (Binary Version)
Selection of features is binary by nature [61]. Therefore, the proposed EACSIDGWO algorithm cannot be utilized in selection of features without further modifications.
In the proposed EACSIDGWO algorithm, the new positions of the search agents will have continuous solutions, which must be converted into corresponding binary values.
In this paper, this conversion is achieved by first squashing the continuous solutions in each dimension using a sigmoid (S-shaped) transfer function [61]. This will compel the search agents to move into a binary search space as depicted by equation (31).
where $X^{d}_{i,gen}$ is a continuous-valued position of the $i$th search agent in the $d$th dimension during generation $gen$.
The output $S$ of the sigmoid transfer function is still a continuous value, and thus, it has to be thresholded to obtain a binary value. Normally, the sigmoid function smoothly maps an unbounded input to a bounded output [61]. To arrive at the binary solution when a sigmoid function is used, the commonly used stochastic threshold is applied, where $y^{d}_{i,gen}$ is the binary updated position at generation $gen$ in the $d$th dimension and $rand$ is a random number drawn from a uniform distribution in $[0, 1]$. $Y_{i,gen}$ is the equivalent binary vector of the $i$th search agent at generation $gen$.
Using this approach, the original solutions remain in the continuous domain of the proposed EACSIDGWO algorithm and can be converted to binary when the need arises.
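The conversion described above can be sketched as follows. This is a minimal illustration under stated assumptions: the transfer function is taken to be the plain logistic function $S(x) = 1/(1 + e^{-x})$, and the stochastic threshold is taken to set a bit to 1 when a uniform random number falls below $S$. The exact forms of equations (31) and (32) may differ (for example, by a scaling inside the exponent), and the names `sigmoid` and `binarize` are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Logistic transfer function (assumed form of the S-shaped function)."""
    # Two branches keep the computation numerically stable for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binarize(position, rng):
    """Map a continuous position to a 0/1 feature mask.

    Each dimension is set to 1 when a uniform random number is smaller
    than the sigmoid of the continuous value (assumed threshold rule).
    """
    return [1 if rng.random() < sigmoid(x) else 0 for x in position]


rng = random.Random(42)
position = [-50.0, -0.3, 0.0, 1.7, 50.0]  # illustrative continuous position
mask = binarize(position, rng)            # 1 = feature selected, 0 = dropped
```

The continuous `position` is left untouched, so the search can continue in the continuous domain while `mask` is used only when a feature subset has to be evaluated.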
The pseudocode of the binary version of the proposed EACSIDGWO algorithm is presented in Algorithm 3.

BioMed Research International

Experimental Methodology
In this section, detailed accounts of the biomedical datasets, evaluation metrics, proposed fitness function, and the parameter setting for the considered metaheuristic algorithms are outlined.

Considered Biomedical Datasets.
To validate the performance of the considered metaheuristic algorithms, six benchmark biomedical datasets extracted from the UCI Machine Learning Repository [62] were utilized. Each dataset has two classes, and the performance of each of these algorithms is evaluated based on its ability to classify these classes correctly. Details of these datasets are given in Table 1.

Evaluation Metrics.
For the considered feature selection problem, the following evaluation metrics were utilized to compare the performance of each considered feature selection technique.
Average Accuracy (Avg_Acc). It is one of the commonly used classification metrics and represents the proportion of instances correctly classified using a particular feature set. The mathematical formulation of this metric is given in Equation (33).
where $N$ is the number of times (runs) a given metaheuristic algorithm is run, $k$ represents the number of folds utilized, and $Acc_j$ is the accuracy reported during fold $j$. $Acc_j$ is defined in equation (34).
where TP and FN denote the number of positive samples in fold $j$ that are accurately and falsely predicted, respectively, and TN and FP represent the number of negative samples in the same fold that are predicted accurately and wrongly, respectively [63].
Average Feature Length (Avg_NFeat). This metric characterizes the average number of selected features relative to the total number of features in the dataset. Equation (35) gives its mathematical formulation.
where $Sel\_Feat_i$ is the number of selected features in the testing dataset during run $i$.
Minimum Accuracy (Min_Acc). It is the least value of accuracy reported during $N$ runs. Equation (36) depicts its formulation.
where $\mathrm{Avg\_crossAcc}_i$ is given by equation (37).
Maximum Accuracy (Max_Acc). It is the largest value of accuracy reported during $N$ runs. Its mathematical formulation is given by
$$\mathrm{Max\_Acc} = \max_{j = 1, \ldots, N} \left( \mathrm{Avg\_crossAcc}_j \right). \qquad (38)$$

Pseudocode fragment (conventional GWO steps, as extracted):
Assign the values of the 1st, 2nd, and 3rd best solutions, i.e., $X_\alpha$, $X_\beta$, and $X_\delta$, respectively
repeat
  for (i = 1: i ≤ n) do
    Update each search agent in the population using Equation (11)
    Decrease the value of a using Equation (14)
    Update the coefficients A and C as shown in Equation (9) and Equation (10)

Pseudocode fragment (binary cuckoo search update steps, as extracted):
Calculate $\tau_{i,gen}$ and $step_{i,gen+1}$ using Eq. (18) and (19), respectively
Generate a new cuckoo nest $X_{i,gen+1}$ using Eq. (21)
Convert continuous values of $X_{i,gen+1}$ to binary using Eq. (31), (32) and (36)
Train a classifier to evaluate the accuracy of the equivalent binary vector of $X_{i,gen+1}$ and store the value in

Maximum Features Selected (Max_NFeat). It is the largest number of selected features during N runs. Equation (39) gives its mathematical formulation.
Minimum Features Selected (Min_NFeat). It is the least number of selected features during N runs. Equation (40) gives its mathematical formulation.
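The six metrics above can be computed from per-run records as in the following sketch. It rests on assumptions that should be checked against equations (33) to (40): fold accuracy is taken as (TP + TN) / (TP + TN + FP + FN), each run's accuracy is the mean over its k folds, and Avg_NFeat is reported as a plain average count rather than as a ratio to the total number of features. The record layout and the numbers in `runs` are hypothetical placeholders, not results from the paper.

```python
from statistics import mean


def fold_accuracy(tp, tn, fp, fn):
    """Accuracy of one fold from its confusion-matrix counts."""
    return (tp + tn) / (tp + tn + fp + fn)


def summarize(runs):
    """Aggregate per-run records into the six reported metrics.

    Each record holds 'fold_acc' (list of k fold accuracies) and
    'n_selected' (number of features selected in that run).
    """
    per_run_acc = [mean(r["fold_acc"]) for r in runs]  # Avg_crossAcc per run
    n_feat = [r["n_selected"] for r in runs]
    return {
        "Avg_Acc": mean(per_run_acc),
        "Min_Acc": min(per_run_acc),
        "Max_Acc": max(per_run_acc),
        "Avg_NFeat": mean(n_feat),
        "Min_NFeat": min(n_feat),
        "Max_NFeat": max(n_feat),
    }


# Hypothetical placeholder records: N = 2 runs, k = 2 folds each.
runs = [
    {"fold_acc": [fold_accuracy(8, 9, 1, 2), fold_accuracy(9, 9, 1, 1)],
     "n_selected": 12},
    {"fold_acc": [fold_accuracy(10, 10, 0, 0), fold_accuracy(9, 10, 0, 1)],
     "n_selected": 8},
]
summary = summarize(runs)
```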

Evaluation of the Classifier Performance.
Since the support vector machine (SVM) classifier has already made immense contributions in the field of microarray-based cancer classification [63], it was adopted in this paper to evaluate the classification accuracy using the selected subset of features returned by the various considered metaheuristic feature selection approaches. The Matlab fitcsvm function, which trains and cross-validates an SVM model, was adopted in this paper. We set the kernel scale parameter to "auto" to allow the function to select an appropriate scale factor using a heuristic procedure. With the SVM classifier, each data item is mapped to a point in an n-dimensional feature space (n = number of features), with each feature's value being the value of a given coordinate. The final output of this classifier is an optimal hyperplane which can be used to classify new cases [17,63].
However, the performance of the SVM classifier is highly dependent on the selection of its kernel function [17,63], which is why experiments were conducted using various kernels in this paper.
The choice of a suitable kernel is both dataset and problem specific and is made experimentally [17,63]. Based on the conducted experiments, suitable kernel functions were selected for the considered datasets. The considered datasets and their suitable kernel functions are presented in Table 2.
More information on selecting suitable SVM kernel functions is presented in [63].

Fitness Function.
The main aim of a feature selection exercise is to discover a subset of features from the whole set of existing features in a given dataset such that the considered optimization algorithm is able to achieve the highest possible accuracy using that subset. For instance, in datasets with many features (attributes), the objective is to minimize the number of selected features while improving the classification accuracy of the feature selection approach.
In classification tasks, there is a high chance that two feature subsets containing different numbers of features will have the same accuracy [17]. However, if a subset with a large number of features is discovered earlier by a given optimization algorithm, it is likely that the one with the fewest features will be ignored [17].
In trying to overcome this challenge, a fitness function proposed in [17] to evaluate the classification performance of optimization algorithms for feature selection tasks is adopted. In this fitness function, $|N|$ represents the total number of features within a given dataset, $|R|$ represents the number of selected features during run $i$, and $\mathrm{Avg\_crossAcc}_i$ is the average cross-validation accuracy reported during run $i$ (refer to Equation (37)). β and α are two weights corresponding to the significance of the classification quality and the subset length, respectively. In this paper, β is set to 0.8 and α to 0.2, as adopted from [17]. It is important to point out that both terms are normalized by division by their largest possible values; i.e., the number of selected features $|R|$ is divided by the total number of features $|N|$, and the average accuracy $\mathrm{Avg\_crossAcc}_i$ is divided by the value 1.
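To make the trade-off concrete, the sketch below shows one plausible form of such a weighted fitness. It is an assumption, not the formula from [17]: the fitness is treated as a quantity to maximize, the accuracy term is weighted by β, and the subset-length term rewards smaller subsets through $(|N| - |R|)/|N|$ weighted by α. The function name `fitness` and its argument names are hypothetical.

```python
def fitness(avg_cross_acc, n_selected, n_total, beta=0.8, alpha=0.2):
    """Weighted fitness combining accuracy and subset length (assumed form).

    avg_cross_acc : average cross-validation accuracy in [0, 1]
    n_selected    : number of selected features, |R|
    n_total       : total number of features, |N|
    """
    if n_total <= 0 or not 0 <= n_selected <= n_total:
        raise ValueError("need 0 <= n_selected <= n_total and n_total > 0")
    accuracy_term = avg_cross_acc / 1.0              # normalized by its maximum, 1
    length_term = (n_total - n_selected) / n_total   # 1 when no feature is kept
    return beta * accuracy_term + alpha * length_term


# Two subsets with the same accuracy: the shorter one scores higher.
larger_subset = fitness(0.9, n_selected=500, n_total=2000)
smaller_subset = fitness(0.9, n_selected=100, n_total=2000)
```

Under this assumed form, two subsets with equal accuracy are no longer tied, so the shorter subset is preferred regardless of the order in which the two are discovered.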

Parameter Setting for the Considered Feature Selection Techniques.
The performance of the proposed EACSIDGWO algorithm was compared to those of extended binary cuckoo search (EBCS), binary ant-colony optimization (BACO), binary genetic algorithm (BGA), and binary particle swarm optimization (BPSO) that were reported earlier in [17]. Table 3 indicates the selected parameter values for both the proposed BEACSIDGWO algorithm and each of the other algorithms as reported in [17].
To be consistent with the setup proposed in [17], the population size for the proposed EACSIDGWO was set to 30. Then, the algorithm was run 10 times to perform the feature selection task for each considered dataset. In addition, each run terminated when 10000 fitness function evaluations were attained. This approach allowed the proposed algorithm to use the fitness function the same number of times as the algorithms it is compared against.
In this paper, all the experiments were conducted using Matlab 2017 running on the Windows 10 operating system on an HP desktop with an Intel® Core™ i7-3770 CPU @ 3.4 GHz and 12.0 GB of RAM.

Results and Discussion
To examine the diversification and intensification of the proposed EACSIDGWO, a detailed comparative study is presented in this section.
The efficiency and the optimization performance of the proposed algorithm have been verified by comparing and analyzing its results with those of four other state-of-the-art optimization algorithms.
The experimental classification results have been probed through statistical tests, comparative analysis, and ranking methods.
Tables 4-9 provide the performance of all the considered optimization approaches for feature selection using the datasets described in Section 7.1. It is important to point out that the best result achieved in each column for all the considered biomedical datasets is highlighted in bold while the worst is italicized.
To prove that the proposed EACSIDGWO is superior to the other four optimization algorithms, the Wilcoxon rank-sum test, i.e., a nonparametric statistical test, is also performed. The statistical results for the p, h, and z values obtained from the pairwise comparisons of the four groups are tabulated in Table 10. Tables 11 and 12 present a comparison of the overall ranking of the results obtained by the considered algorithms.
As shown in Tables 4-9, the proposed algorithm performed better on all the considered datasets. In comparison with the original number of features in the considered datasets, there is a notable reduction in the number of features selected by the proposed approach. For instance, the actual number of features in the ovarian cancer, CNS, and colon cancer datasets is 4000, 7129, and 2000, respectively, whereas the number of features selected by the proposed EACSIDGWO is 274.8, 1208.1, and 538.5, respectively. This clearly indicates that the proposed algorithm is able to reduce the number of features as well as locate the most significant optimal feature subsets. The strength of the proposed EACSIDGWO lies in its well-formulated design (refer to Section 5), which enhances both its diversification and intensification capabilities and enables it to eliminate redundant (noninformative) attributes and then actively search within the high-performance regions of the feature space.

Statistical Analysis.
The superiority of the proposed EACSIDGWO algorithm has been verified via the Wilcoxon rank-sum test, i.e., a nonparametric test, at a significance level of 5%. The results obtained for the pairwise comparison of the four groups are presented in Table 10. Observations from Table 10 reveal the statistical significance of the obtained experimental results for all the considered datasets. This clearly indicates that the proposed approach has an attractive performance in relation to the other four approaches. Thus, the results of our algorithm differ significantly from those of the other four algorithms for all the considered datasets.
From the ranking (Tables 11 and 12), it is evident that the proposed EACSIDGWO algorithm obtained the best values in all these measures for all the datasets. Considering the final ranks, the proposed algorithm attained an attractive performance, with an overall rank value of 37. This clearly reveals the superiority of the EACSIDGWO algorithm in relation to the four state-of-the-art algorithms.
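For readers who want to reproduce the p, h, and z values of the statistical analysis, the sketch below implements a two-sided Wilcoxon rank-sum test with the normal approximation. It is a simplified illustration: it omits the continuity and tie corrections, and it does not use the exact small-sample distribution that statistical packages may apply, so its p values can differ slightly from those of a library routine. The two accuracy lists are hypothetical placeholders, not results from the paper.

```python
import math


def rank_sum_test(x, y, alpha=0.05):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (p, h, z), where h = 1 means the null hypothesis of equal
    medians is rejected at significance level alpha.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a block of tied values and give them the mean rank.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(x), len(y)
    mu = n1 * (n1 + n2 + 1) / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    h = 1 if p < alpha else 0
    return p, h, z


# Hypothetical accuracies over 10 runs for two algorithms.
proposed = [0.91 + 0.005 * i for i in range(10)]
baseline = [0.80 + 0.005 * i for i in range(10)]
p, h, z = rank_sum_test(proposed, baseline)
```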

Conclusion
This paper proposed a new hybrid Excited- (E-) Adaptive Cuckoo Search- (ACS-) Intensification Dedicated Grey Wolf Optimizer (IDGWO), i.e., the EACSIDGWO algorithm, to solve the feature selection problem in biomedical science. In the proposed algorithm, the concept of the complete voltage and current responses of a direct current (DC) excited resistor-capacitor (RC) circuit is innovatively utilized to make the step size of ACS and the nonlinear control strategy of parameter $\vec{a}$ of the IDGWO adaptive. Since the population has a higher diversity during the early stages of the proposed algorithm, both the ACS and IDGWO are jointly utilized to attain accelerated convergence. However, to enhance mature convergence while striking an effective balance between exploitation and exploration in later stages, the role of ACS is switched to global exploration while the IDGWO is still left conducting the local exploitation. In order to test the efficiency of the proposed EACSIDGWO as a feature selector, six standard biomedical datasets from the University of California at Irvine (UCI) repository were utilized. The experimental results obtained prove that the proposed algorithm is superior to the state-of-the-art feature selection techniques, i.e., BACO, BGA, BPSO, and EBCSA, in attaining good learning from fewer instances and optimal feature selection from information-rich biomedical data, all while maintaining a high classification accuracy of the utilized data. In the future, utilizing this hybrid algorithm as a filter-feature selection approach seeking to evaluate the generality of the selected features will be a valuable contribution.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.