A Dual Level Analysis with Evolutionary Computing and Swarm Models for Classification of Leukemia

Cancer is one of the major causes of mortality in human beings, and there is an absolute necessity for doctors to identify and treat people suffering from it. Leukemia is a group of blood cancers that usually originates in the bone marrow and results in a very high number of abnormal cells. Microarray data are an important clinical resource for cancer diagnosis and a great aid to the entire medical community. Because the dimensionality of microarray data is very high, selecting suitable genes is an important step for improving classification, and for the prediction and diagnosis of cancer it is essential to select the most informative genes. In this work, Minimum Redundancy Maximum Relevance (MRMR), Signal to Noise Ratio (SNR), Multivariate Error-Weighted Uncorrelated Shrunken Centroid (EWUSC), and multivariate Correlation-based Feature Selection (CFS) are chosen as the initial feature selection techniques. Then, to select the most informative genes, five evolutionary optimization techniques are incorporated: African Buffalo Optimization (ABO), Artificial Bee Colony Optimization (ABCO), Cockroach Swarm Optimization (CSO), Imperialist Competitive Optimization (ICO), and Social Spider Optimization (SSO). Finally, the optimized gene sets are classified, and the best result, a classification accuracy of 95.70%, is obtained when multivariate CFS with SSO is used and classified with a Probabilistic Neural Network (PNN).


Introduction
Cancer is one of the diseases that cause the most deaths in humans [1]. There are various types of cancer; in all of them, cells divide in an uncontrollable manner, resulting in tumors, breakdown of the immune system, and impairment of vital organs [2]. Some kinds of cancer cause rapid cell growth, while others cause cells to grow slowly. Some forms of cancer result in visible growths called tumors, while others, such as leukemia, do not. Leukemia is one of the three forms of blood cancer, the other two being lymphoma and myeloma [3]. Leukemia produces an abnormal number of immature white blood cells, which overwhelm the bone marrow and prevent the production of the healthy blood cells required for a balanced immune system [4]. Acute leukemia has a rapid onset and progresses very quickly, so patients need urgent treatment. Leukemia thus belongs to a broad array of cancers commonly termed hematological malignancies. Two types are considered here: Acute Myeloid Leukemia (AML) and Acute Lymphoblastic Leukemia (ALL) [5].
AML: This kind of leukemia is most prevalent in older people but can affect younger people too. The malignancy arises from the excess accumulation of immature hematopoietic cells in the blood and bone marrow. Various genetic factors are responsible for such conditions.
ALL: This is the most prevalent kind of leukemia in children. It occurs when immature lymphoid cells accumulate excessively in the bone marrow and peripheral blood.
Traditionally, leukemia cells have been categorized based on their morphological appearance. Identifying the innate differences between tumor cells in this way requires highly skilled personnel and technological resources [6]. Such a process can be very expensive, time consuming, and tedious. Cells can appear morphologically similar and yet respond quite differently to drugs and therapy [7]. Traditional techniques therefore have considerable limitations, and other parameters are needed so that cell categorization can be well framed [8]. Gene expression data provide a large amount of useful information for subclassification studies, and microarrays have played an important role in measuring the expression of thousands of genes simultaneously [9]. Over the previous decade, microarray technology was the most commonly used gene quantification method, and it is still in use because it is inexpensive [10]. With microarrays, the expression levels of tens of thousands of genes can be measured easily, giving scientists information about the functional relationship between genes and the physiological and cellular processes of biological organisms [11]. However, microarray data are very high dimensional and contain considerable noise and other disturbances, so the curse of dimensionality arises; gene selection is therefore important so that only the best genes are passed on for classification [12]. Some of the most important works in microarray-based leukemia classification are as follows. Aghamaleki et al. implemented an Artificial Neural Network (ANN) for the diagnosis of chronic lymphocytic leukemia [13]. Binet et al. derived a novel prognostic classification of chronic lymphocytic leukemia from a multivariate survival analysis [14].
Afshar et al. utilized ANN for recognizing and predicting leukemia [15]. Wisesty et al. performed microarray-based leukemia classification using momentum backpropagation with genetic algorithms as a feature selection technique [16]. Vogado et al. performed leukemia diagnosis using transfer learning in Convolutional Neural Networks (CNNs) [17]. Kumar et al. utilized an effective MapReduce-based KNN classifier for the analysis of microarray leukemia data [18]. Alrefai proposed ensemble machine learning for leukemia cancer diagnosis based on microarray datasets [19]. Dwivedi developed a framework to detect and discriminate ALL and AML from microarray gene expression profiles using supervised machine learning [20]. Cho explored various features and classifiers to classify gene expression profiles of acute leukemia [21]. Nasser et al. developed an enhanced leukemia cancer classifier algorithm [22]. Huang et al. applied a Probabilistic Neural Network (PNN) to the class prediction of leukemia [23]. Nguyen et al. classified acute leukemia from DNA microarray gene expression using the Partial Least Squares (PLS) method [24]. Golub et al. used an SNR approach to discriminate AML from ALL [25]. Sewak et al. performed gene expression-based leukemia subclassification using a committee neural network [26]. Castillo et al. performed multiclass leukemia assessment and classification by integrating microarray and RNA-sequencing technologies at the gene expression level [27].
Optimization algorithms have played a major role in gene selection. Dagliyan et al. performed optimization-based tumor classification from microarray gene expression data [28]; other examples include random cuckoo search for autism gene selection [29] and stellar mass black hole optimization for engineering problems [30]. Antonov et al. developed optimization models for cancer classification that extract gene interaction information from microarray expression data [31]. Other optimization approaches reported in the literature for cancer gene selection include a modified genetic algorithm with Levy flight [32], simplified swarm optimization [33], the chronological grasshopper optimization algorithm [34], hybrid optimization algorithms [35], adaptive ant colony optimization [36], biogeography-based optimization [37], the nondominated sorting GA [38], filter-based optimization [39], Particle Swarm Optimization (PSO) [40], Grey Wolf optimization [41], and a hybrid of the Grey Wolf and Crow search algorithms [42]. In this work, a two-level feature selection is performed, first with statistical techniques and then with optimization techniques, and the selected genes are classified with suitable classifiers. The paper is organized as follows. Section 2 discusses the experimental procedure along with the gene/feature selection techniques. Section 3 details the optimization techniques, and Section 4 the classification techniques. Section 5 presents the results and discussion, and Section 6 concludes the paper.

Materials and Methods
For leukemia classification, a publicly available dataset was used [25]. It contains samples of two types of leukemia: 25 samples of acute myeloblastic leukemia (AML) and 47 samples of acute lymphoblastic leukemia (ALL). The details of the dataset are tabulated in Table 1.
An overview of the work is shown in Figure 1.
2.1. Techniques to Select the Genes. The gene selection techniques utilized in this paper are described below. The aim of this first-level procedure is to shortlist the best 2000 of the 7129 genes.

Minimum Redundancy-Maximum Relevance (MRMR).
MRMR selects features that have maximum relevance to the class while having minimum redundancy among themselves [43]. For discrete datasets, MRMR uses a mutual information criterion to measure relevance.
For a feature $Y_j$, the F-test value is

$$F(Y_j, S) = \frac{\sum_{k=1}^{m} n_k\,(\mu_{jk} - \mu_j)^2 / (m - 1)}{\sigma^2} \qquad (1)$$

where $S = \{S_k\}$, $k = 1, 2, \cdots, m$, is the class set, $\mu_j$ is the mean of $Y_j$, $\mu_{jk}$ is the mean of $Y_j$ for class $S_k$, and $\sigma^2 = \left[\sum_k (n_k - 1)\,\sigma_k^2\right] / (n - m)$ is the pooled variance, with $n_k$ and $\sigma_k^2$ the size and variance of class $S_k$ and $n$ the total number of samples. For a feature subset $T$, the maximum relevance criterion is

$$\max V_F, \qquad V_F = \frac{1}{|T|} \sum_{Y_j \in T} F(Y_j, S) \qquad (2)$$

The first feature is selected with this criterion, and the remaining features are added with a linear incremental search based on an optimization function. For continuous variables, two popular linear incremental search schemes are MRMR-FDM (F-test distance multiplicative) and MRMR-FSQ (F-test similarity quotient).
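For MRMR-FDM, the next feature is chosen among the features not yet in $T$ by

$$\max_{Y_j \notin T} \left[ F(Y_j, S) \times \frac{1}{|T|} \sum_{Y_q \in T} d(Y_j, Y_q) \right] \qquad (3)$$

where $d(Y_j, Y_q)$ is the Euclidean distance between features $Y_j$ and $Y_q$.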
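To make the incremental search concrete, the following Python sketch computes the F-test relevance of each feature (the between-class mean square divided by the pooled within-class variance) and then greedily adds the feature that maximizes its F value times its mean Euclidean distance to the features already selected, in the spirit of MRMR-FDM. It is a minimal illustration under stated assumptions, not the authors' implementation: the rows-as-samples layout, the function names, and the assumption that features are already standardized are ours, and every class needs at least two samples.

```python
import math
from statistics import mean, variance


def f_statistic(values, labels):
    """F-test value of one feature: between-class mean square / pooled variance."""
    classes = sorted(set(labels))
    m, n = len(classes), len(values)
    mu = mean(values)
    groups = [[v for v, c in zip(values, labels) if c == k] for k in classes]
    between = sum(len(g) * (mean(g) - mu) ** 2 for g in groups) / (m - 1)
    pooled = sum((len(g) - 1) * variance(g) for g in groups) / (n - m)
    return between / pooled if pooled > 0 else math.inf


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mrmr_fdm(X, labels, k):
    """Return k feature indices chosen by the F-test distance multiplicative scheme."""
    columns = [list(col) for col in zip(*X)]          # one list per feature
    F = [f_statistic(col, labels) for col in columns]
    selected = [max(range(len(columns)), key=lambda j: F[j])]  # first: max relevance
    while len(selected) < k:
        remaining = [j for j in range(len(columns)) if j not in selected]

        def score(j):
            # relevance multiplied by mean distance (dissimilarity) to the chosen set
            return F[j] * mean(euclidean(columns[j], columns[q]) for q in selected)

        selected.append(max(remaining, key=score))
    return selected


# Toy data: feature 1 nearly duplicates feature 0; feature 2 is less relevant but distinct.
X = [[0, 0.1, 0], [1, 1.1, 2], [10, 10.1, 8], [11, 11.1, 10]]
labels = ["ALL", "ALL", "AML", "AML"]
print(mrmr_fdm(X, labels, 2))
```

Because the near-duplicate feature is very close to the first pick, its distance term is small, so the distinct feature 2 is preferred even though its F value alone is lower.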
For MRMR-FSQ optimization, the F-test value is instead divided by the average similarity between $Y_j$ and the features already in $T$, so that candidates similar to the selected set are penalized.

2.1.2. Signal to Noise Ratio. The Pearson Correlation Coefficient (PCC) is an important measure of gene significance. It is adapted into the SNR to quantify the importance of a gene as a predictor [44], that is, to measure the predictive strength of a particular gene.
For a gene $g$, the SNR is calculated as

$$\mathrm{SNR}(g) = \frac{y_1 - y_2}{sd_1 + sd_2} \qquad (5)$$

where $y_1$ and $y_2$ are the means of the normal and tumor samples, and $sd_1$ and $sd_2$ are the standard deviations of the normal and tumor samples, respectively. This value measures the difference between the class means relative to the within-class standard deviations. A large $|\mathrm{SNR}(g)|$ indicates a strong correlation between the gene expression and the class distinction; a positive or negative value indicates that the gene is more highly expressed in class 1 or class 2, respectively. Genes with a large absolute SNR value are highly informative and are therefore selected for cancer classification.
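A minimal Python sketch of SNR-based gene ranking follows. It ranks genes by the absolute SNR, matching the remark that large positive or negative values are informative. The function names, the use of the sample standard deviation, the rows-as-samples layout, and the `class1_label` parameter (playing the role of the normal class above) are assumptions for illustration.

```python
from statistics import mean, stdev


def snr(class1, class2):
    """Signal-to-noise ratio of one gene: (y1 - y2) / (sd1 + sd2)."""
    return (mean(class1) - mean(class2)) / (stdev(class1) + stdev(class2))


def rank_genes_by_snr(X, labels, class1_label, k):
    """Return the indices of the k genes with the largest |SNR| and all SNR scores."""
    scores = []
    for j in range(len(X[0])):
        c1 = [row[j] for row, y in zip(X, labels) if y == class1_label]
        c2 = [row[j] for row, y in zip(X, labels) if y != class1_label]
        scores.append(snr(c1, c2))
    order = sorted(range(len(scores)), key=lambda j: abs(scores[j]), reverse=True)
    return order[:k], scores


X = [[1, 5.0], [3, 5.1], [5, 4.9], [7, 5.0]]   # rows = samples, columns = genes
labels = ["AML", "AML", "ALL", "ALL"]
top, scores = rank_genes_by_snr(X, labels, "AML", 1)
print(top, [round(s, 3) for s in scores])
```

Gene 0 has a negative SNR (higher expression in the second class) but the larger magnitude, so it is ranked first.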

Multivariate Error-Weighted Uncorrelated Shrunken Centroid (EWUSC).
This technique was developed from the Shrunken Centroid (SC) and Uncorrelated Shrunken Centroid (USC) methods [45]. In SC, the class centroid is found by dividing the average expression of each gene in every class by the standard deviation of that gene in the same class, so that genes whose expression is similar across the samples of a class receive a higher weight. A new sample is assigned, using squared distance, to the class with the nearest average pattern. USC additionally removes redundant features by identifying highly correlated genes within the set found by SC. EWUSC uses both of these steps and adds error weights, so that redundant genes are removed and noisy genes are down-weighted.
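The two ingredients described above, class centroids scaled by the within-class standard deviation with nearest-centroid assignment by squared distance, and USC-style removal of highly correlated genes, can be sketched in Python as below. This is a simplified illustration under stated assumptions: it omits the shrinkage threshold of SC and the error weights of EWUSC, and the correlation threshold `rho = 0.9`, the zero-spread guard, and all function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def fit_centroids(X, labels):
    """Per class: mean and within-class standard deviation of every gene."""
    model = {}
    for c in sorted(set(labels)):
        cols = list(zip(*[row for row, y in zip(X, labels) if y == c]))
        model[c] = ([mean(col) for col in cols],
                    [pstdev(col) or 1.0 for col in cols])   # guard: zero spread -> 1.0
    return model


def predict(model, x):
    """Assign x to the class whose scaled centroid is nearest in squared distance.
    Genes with a small within-class spread dominate, i.e. receive a higher weight."""
    def dist(c):
        mu, sd = model[c]
        return sum(((xi - m) / s) ** 2 for xi, m, s in zip(x, mu, sd))
    return min(model, key=dist)


def pearson(a, b):
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0


def drop_correlated(X, genes, rho=0.9):
    """USC-style pruning: keep a gene only if |r| <= rho with every gene kept so far."""
    kept = []
    for g in genes:
        col = [row[g] for row in X]
        if all(abs(pearson(col, [row[k] for row in X])) <= rho for k in kept):
            kept.append(g)
    return kept


X = [[1, 2, 5], [2, 4, 1], [8, 16, 4], [9, 18, 2]]   # gene 1 = 2 x gene 0
labels = ["ALL", "ALL", "AML", "AML"]
genes = drop_correlated(X, [0, 1, 2])
model = fit_centroids([[row[g] for g in genes] for row in X], labels)
print(genes, predict(model, [1.5, 3.0]))
```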

Multivariate Correlation-Based Feature Selection (CFS).
A good feature subset contains features that are highly correlated with the class but uncorrelated with each other [46]. CFS evaluates a subset by analyzing the predictive ability of each feature individually along with the degree of redundancy among them. The main advantage of this technique is that it assigns a "heuristic merit" to a whole feature subset instead of to individual features. The search algorithm can then use this merit function to guide its progress, choosing at each step the option that maximizes it.
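The heuristic merit of CFS is commonly written (following Hall's formulation, which we assume here) as Merit = k·r_cf / sqrt(k + k(k − 1)·r_ff), where k is the subset size, r_cf the mean absolute feature-class correlation, and r_ff the mean absolute feature-feature correlation. The Python sketch below computes it with Pearson correlation against 0/1 class labels and runs a simple greedy forward search that stops when the merit no longer improves; the search strategy and the function names are illustrative assumptions.

```python
import math
from itertools import combinations
from statistics import mean


def pearson(a, b):
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0


def cfs_merit(X, y, subset):
    """k * r_cf / sqrt(k + k (k - 1) r_ff) for the feature indices in `subset`."""
    k = len(subset)
    cols = {j: [row[j] for row in X] for j in subset}
    r_cf = mean(abs(pearson(cols[j], y)) for j in subset)
    r_ff = (mean(abs(pearson(cols[a], cols[b])) for a, b in combinations(subset, 2))
            if k > 1 else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward(X, y, max_features):
    """Greedy forward search: add the feature with the best merit, stop when it stops improving."""
    selected, best = [], 0.0
    candidates = list(range(len(X[0])))
    while candidates and len(selected) < max_features:
        merit, j = max((cfs_merit(X, y, selected + [j]), j) for j in candidates)
        if merit <= best:
            break
        selected.append(j)
        candidates.remove(j)
        best = merit
    return selected, best


X = [[1, 1, 5], [2, 3, 1], [8, 8, 4], [9, 9, 2]]   # gene 1 is redundant with gene 0
y = [0, 0, 1, 1]                                   # 0 = ALL, 1 = AML
print(cfs_forward(X, y, 3))
```

Adding gene 1 would lower the merit because it is almost perfectly correlated with gene 0, which is exactly the redundancy penalty in the denominator.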

Optimization Techniques
The 2000 shortlisted genes then undergo a second feature selection step using optimization techniques, so that the best 50, 100, and 200 genes are finally retained; this two-stage procedure is what is meant by a dual level analysis in this work. The five optimization algorithms used for this feature selection are described below.

African Buffalo Optimization Algorithm.
ABO is utilized to find the best solution in the search space [47]. The buffaloes are first initialized within the herd population.

Figure 1: Illustration of the work.

BioMed Research International
The buffaloes then search for the global optimum by updating their locations, tending to follow the current best buffalo of the herd, $bz_{max}$. Each buffalo keeps track of its coordinates in the problem space in order to achieve the best fitness value. The best location found so far by buffalo $h$ is denoted $bq_{max.h}$. The location of every buffalo moves dynamically towards $bq_{max.h}$ and $bz_{max}$, and the learning parameters have a great effect on the speed of each animal.
The steps of the ABO algorithm are as follows:
(1) Initialization: the buffaloes are placed randomly at different nodes of the solution space.
(2) Buffalo fitness update: the fitness value is updated as

$$f.h \leftarrow f.h + lq_1\,(bz_{max} - v.h) + lq_2\,(bq_{max.h} - v.h) \qquad (6)$$

where $v.h$ and $f.h$ denote the exploration and exploitation moves of the $h$th buffalo ($h = 1, 2, \cdots, N$), $lq_1$ and $lq_2$ are learning factors, $bz_{max}$ is the best fitness of the herd, and $bq_{max.h}$ is the best location found by the individual buffalo $h$.
(3) The location of buffalo $h$ is updated with the location update formula, equation (7).
(4) If $bz_{max}$ has been updated, proceed to step (5); otherwise, go back to step (2).
(5) Check whether the stopping criteria are met. If they are not met, go back to step (3); otherwise, go to step (6).
(6) The best solution is taken as the output.

The update equation (6) has three parts. The first part, $f.h$, represents the memory of the buffalo's past locations: buffaloes have a good memory that helps them remember the places they have been before. This ability is important because it helps the search avoid areas that gave poor results, and each buffalo's memory provides a list of alternative solutions to the present local maximum location. The second part, $lq_1(bz_{max} - v.h)$, represents the cooperative, social nature of the buffaloes, such as guarding each other, sharing information, and sensing danger. The third part, $lq_2(bq_{max.h} - v.h)$, represents the intelligence of the individual buffalo. Equation (6) therefore combines the memory, socialization, and intelligence of a buffalo. Equation (7) moves the buffaloes in search of a better environment when the present one has been fully explored and exploited or has become unfavorable.
The main highlights of the ABO algorithm are its very fast convergence rate and its use of only a few parameters. The best buffalo $bz_{max}$ can easily be found in each iteration. Tracking the location and phase of the best buffalo ($bz_{max}$) ensures adequate exploration, and exploiting the areas found by the other buffaloes further strengthens the search.

Initialization and Update of Speed and Location.
In the initialization phase, the $h$th buffalo is placed randomly in the solution space. Prior knowledge of the problem can help the algorithm converge in fewer iterations. In each iteration, the location of every buffalo is updated based on its previous maximum location ($bq_{max}$) and on information gathered from the exploits of neighboring buffaloes. With such modelling, the algorithm tracks the buffalo movement towards an optimal solution.
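The following Python sketch strings the ABO steps together for a minimization problem. It is a hypothetical illustration, not the authors' code: the move update follows equation (6), but because the location formula of equation (7) is not reproduced in the text, the sketch assumes the simple update v.h ← v.h + f.h with clipping to the search bounds. The learning factors lq1 = 0.6 and lq2 = 0.4, the herd size, and a fixed iteration budget (in place of the restart logic of step (4)) are also assumptions. In gene selection, `fitness` would score a candidate gene subset, for example by a classifier's cross-validated error.

```python
import random


def abo_minimize(fitness, dim, lower, upper, n_buffaloes=20, iterations=100,
                 lq1=0.6, lq2=0.4, seed=0):
    """Return (bz_max location, its fitness, history of the best fitness per iteration)."""
    rng = random.Random(seed)

    def clip(x):
        return min(upper, max(lower, x))

    # Step (1): random initial locations v.h and zero moves f.h
    v = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_buffaloes)]
    f = [[0.0] * dim for _ in range(n_buffaloes)]
    bq = [row[:] for row in v]                      # bq_max.h: best location of buffalo h
    bq_fit = [fitness(row) for row in v]
    h0 = min(range(n_buffaloes), key=bq_fit.__getitem__)
    bz, bz_fit = bq[h0][:], bq_fit[h0]              # bz_max: best buffalo of the herd
    history = [bz_fit]
    for _ in range(iterations):
        for h in range(n_buffaloes):
            for i in range(dim):
                # Equation (6): memory + social (bz_max) + individual (bq_max.h) terms
                f[h][i] += lq1 * (bz[i] - v[h][i]) + lq2 * (bq[h][i] - v[h][i])
                # Assumed location update standing in for equation (7)
                v[h][i] = clip(v[h][i] + f[h][i])
            fit = fitness(v[h])
            if fit < bq_fit[h]:
                bq[h], bq_fit[h] = v[h][:], fit
                if fit < bz_fit:
                    bz, bz_fit = v[h][:], fit
        history.append(bz_fit)
    return bz, bz_fit, history


def sphere(x):
    return sum(t * t for t in x)


best, best_fit, history = abo_minimize(sphere, dim=3, lower=-5.0, upper=5.0)
print(round(best_fit, 4))
```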
3.2. Artificial Bee Colony Algorithm. This global and local search optimization procedure is based on the foraging behaviour of bees searching for nectar in a multidimensional space; its steps are given in Algorithm 1 [48]. Food sources span the entire variable space, and each food source is a point in that space. Just as bees trace the food source with the highest nectar content, the ABC method seeks the point in the variable space that maximizes the objective function. In this optimization problem, artificial bees wander through an artificial multidimensional space to find the optimal solution of the objective function $f(y)$, that is, the highest-yielding nectar source. The search simulates the food foraging procedure of a bee colony in an artificial computing environment. A random initial population of food sources $y^{(m)}$ ($m = 1, 2, \cdots, N$), where $N$ is the colony size, is generated throughout the variable space as

$$y_i^{(m)} = y_i^{min} + v_i\,(y_i^{max} - y_i^{min}) \qquad (8)$$

where $y_i^{min}$ and $y_i^{max}$ are the lower and upper bounds of the $i$th variable and $v_i$ is a random number in the range [0, 1]. The colony contains three types of bees, each performing a different task. Employed bees take a food source from their memories and then seek a new food source $w_f$ in its neighbourhood; any neighbourhood operator can be used for this purpose. Here, a food source uniformly distributed within $\pm z$ of the present memory location $y_n$ is used:

$$w_{f,i} = y_{n,i} + \phi_i\,(y_{n,i} - y_i^{(m)}) \qquad (9)$$

where $y^{(m)}$ is a randomly selected food source and $\phi_i$ is a random number in $[-z_i, z_i]$. The newly created food source $w_f$ is compared with $y_n$, and the better of the two is kept in the employed bee's memory. In our experiment, the number of employed bees is set to 60% of the total number of food sources ($S$).
The employed bees share the food source information stored in their memories with the onlooker bees, which wait in the hive observing the foraging of the employed bees. An onlooker bee chooses a food source location $w_f$ found by an employed bee probabilistically, in proportion to its nectar content, so food sources with more nectar are more likely to be chosen. The onlooker then modifies the selected food source to find $w_0$ in its neighbourhood, using the same method as for $w_f$ in equation (9), and keeps only the better of the two food sources in memory. The number of onlooker bees is generally set to half the number of food sources. Finally, scout bees, the third kind, choose a food source location randomly using equation (8) and act as global overseers. If an employed bee cannot improve its memory location within a predefined number of trials, it becomes a scout bee, and its memorized location is reinitialized randomly in the variable space. In our experiment, the number of scout bees is set to 1, and the algorithm runs for a maximum of $G$ generations. Each food source is associated with exactly one employed or onlooker bee. ABC has also been applied to other types of optimization, such as combinatorial optimization, multiobjective optimization, and integer programming.
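The employed, onlooker, and scout phases described above can be sketched in Python as follows for a minimization problem. Assumptions to note: each food source has one employed bee and there are as many onlooker bees as food sources (mirroring the S/2 and S/2 split of Algorithm 1 rather than the 60% setting of the experiments), onlookers choose sources by roulette wheel using the common transform 1/(1 + cost) as nectar, z = 1 in equation (9), and the trial limit is 20; the names and values are illustrative, not from the paper.

```python
import random


def abc_minimize(cost, dim, lower, upper, n_sources=10, limit=20,
                 generations=100, z=1.0, seed=0):
    """Return (best food source, its cost, history of the best cost per generation)."""
    rng = random.Random(seed)

    def random_source():                      # equation (8)
        return [lower + rng.random() * (upper - lower) for _ in range(dim)]

    def nectar(c):                            # larger is better (assumed transform)
        return 1.0 / (1.0 + c) if c >= 0 else 1.0 + abs(c)

    def try_neighbour(n):                     # equation (9) on one random coordinate
        m = rng.choice([k for k in range(n_sources) if k != n])
        i = rng.randrange(dim)
        w = foods[n][:]
        w[i] += rng.uniform(-z, z) * (foods[n][i] - foods[m][i])
        w[i] = min(upper, max(lower, w[i]))
        c = cost(w)
        if c < costs[n]:                      # keep the better source in memory
            foods[n], costs[n], trials[n] = w, c, 0
        else:
            trials[n] += 1

    foods = [random_source() for _ in range(n_sources)]
    costs = [cost(y) for y in foods]
    trials = [0] * n_sources
    best_cost = min(costs)
    best = foods[costs.index(best_cost)][:]
    history = [best_cost]
    for _ in range(generations):
        for n in range(n_sources):            # employed bees
            try_neighbour(n)
        weights = [nectar(c) for c in costs]
        for _ in range(n_sources):            # onlooker bees: roulette-wheel choice
            try_neighbour(rng.choices(range(n_sources), weights=weights)[0])
        for n in range(n_sources):            # record the best source found so far
            if costs[n] < best_cost:
                best, best_cost = foods[n][:], costs[n]
        n = max(range(n_sources), key=trials.__getitem__)
        if trials[n] > limit:                 # one scout bee re-initializes via eq. (8)
            foods[n] = random_source()
            costs[n] = cost(foods[n])
            trials[n] = 0
        history.append(best_cost)
    return best, best_cost, history


def sphere(x):
    return sum(t * t for t in x)


best, best_cost, history = abc_minimize(sphere, 2, -5.0, 5.0)
print(round(best_cost, 6))
```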

Cockroach Swarm Optimization Algorithm.
CSO was inspired by the behaviour of cockroaches searching for food, such as moving in swarms, escaping, and scattering away from light [49]. The CSO algorithm models collective cockroach behaviour with a set of rules. The algorithm starts by creating a set of feasible solutions, generated randomly in the search space. To solve optimization problems, CSO uses three procedures: (i) chase swarming, (ii) dispersing, and (iii) ruthless behaviour.
3.3.1. Chase-Swarming Phase. In this phase, the strongest cockroaches carry the local best solutions $S_i$ and form small swarms with the others, and the swarms then move towards the global optimum $S_o$. Each individual $A_i$ moves towards its local optimum within the range of its visibility. While the cockroaches move in small groups, an individual can become the strongest by finding a better solution. A lonely cockroach, which is its own local optimum within its visibility scope, moves towards the global best solution.

Dispersion of Individual Phase.
This phase is performed from time to time to preserve the diversity of the cockroaches. In this phase, each cockroach takes a random step in the search space.

Ruthless Behaviour Phase.
Here, the current best individual replaces a randomly chosen individual. This models cockroaches eating weaker cockroaches when food is scarce, which is why it is termed ruthless behaviour.
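A compact Python sketch of the three CSO procedures for a minimization problem is given below. It assumes commonly used update rules, which the text describes only in words: in chase swarming, each cockroach moves by step·rand·(target − x) towards the best individual within its visual range, or towards the global best if it is itself that local best; dispersion adds a uniform random step in [−1, 1] to every coordinate; and ruthless behaviour copies the global best over one random individual. The parameter values and function names are hypothetical.

```python
import math
import random


def cso_minimize(cost, dim, lower, upper, n=20, step=2.0, visual=3.0,
                 iterations=100, seed=0):
    """Return (global best S_o, its cost, history of the best cost per iteration)."""
    rng = random.Random(seed)

    def clip(x):
        return min(upper, max(lower, x))

    pop = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n)]
    values = [cost(x) for x in pop]
    best_val = min(values)
    best = pop[values.index(best_val)][:]
    history = [best_val]

    def update_best():
        nonlocal best, best_val
        for x, val in zip(pop, values):
            if val < best_val:
                best, best_val = x[:], val

    for _ in range(iterations):
        # (i) chase swarming: move towards the local best within the visual scope,
        # or towards the global best if the individual is its own local best
        new_pop = []
        for i, x in enumerate(pop):
            local = min((j for j in range(n) if math.dist(x, pop[j]) <= visual),
                        key=values.__getitem__)
            target = best if local == i else pop[local]
            new_pop.append([clip(a + step * rng.random() * (t - a)) for a, t in zip(x, target)])
        pop, values = new_pop, [cost(x) for x in new_pop]
        update_best()
        # (ii) dispersion: a random step for every individual to keep diversity
        pop = [[clip(a + rng.uniform(-1.0, 1.0)) for a in x] for x in pop]
        values = [cost(x) for x in pop]
        update_best()
        # (iii) ruthless behaviour: the best individual replaces a random one
        k = rng.randrange(n)
        pop[k], values[k] = best[:], best_val
        history.append(best_val)
    return best, best_val, history


def sphere(x):
    return sum(t * t for t in x)


best, best_val, history = cso_minimize(sphere, 2, -5.0, 5.0)
print(round(best_val, 6))
```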
The steps are as follows: (Step 1) A population of $q$ individuals is generated, and the algorithm parameters (step, search space dimension $D$, visual scope, and stopping criterion) are initialized; each individual has its own visibility range.

3.4. Imperialist Competitive Algorithm. ICA is a widely used population-based metaheuristic. Each individual in the population represents a country, and during initialization some of the best countries are selected as imperialists [50]. The imperialists and their colonies form the initial empires, and new solutions are generated through colony assimilation and revolution, competition among the imperialists, and the exchange of imperialists. The procedure is as follows. (1) Initialization: an initial population $P$ is generated, and the initial empires are formed by assigning colonies to imperialists based on the imperialist power

$$P_q = \frac{Z_q}{\sum_{w=1}^{M_{im}} Z_w}$$

where $P_q$ is the power of imperialist $q$, $Z_w = \max_q\{t_q\} - t_w$ is the normalized cost, $t_q$ is the cost of imperialist $q$, and $M_{im}$ is the number of imperialists. The number of initial colonies managed by imperialist $q$ is $\mathrm{round}\{P_q \times M_{col}\}$, where $M_{col}$ is the total number of colonies and $\mathrm{round}$ gives the nearest integer of a fractional number.
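The initialization and assimilation steps can be sketched in Python as below: the imperialists are the lowest-cost countries, their powers are P_q = Z_q / ΣZ_w with Z_w = max_q{t_q} − t_w, each receives round(P_q × M_col) randomly chosen colonies, and assimilation moves a colony a random fraction u ∈ [0, c] of the way to its imperialist, so that the moving distance u·d lies in [0, c·d]. The angular deviation θ is omitted, and the handling of rounding leftovers (given to the strongest imperialist) and all names are our assumptions.

```python
import random


def init_empires(costs, n_imperialists, rng):
    """Return {imperialist index: [colony indices]} and the imperialist powers."""
    order = sorted(range(len(costs)), key=costs.__getitem__)   # cheapest first
    imperialists, colonies = order[:n_imperialists], order[n_imperialists:]
    t = [costs[q] for q in imperialists]
    z = [max(t) - tq for tq in t]                              # normalized costs Z_q
    total = sum(z)
    power = [zq / total if total else 1.0 / n_imperialists for zq in z]
    counts = [round(p * len(colonies)) for p in power]
    counts[0] += len(colonies) - sum(counts)                   # rounding leftovers
    rng.shuffle(colonies)
    empires, start = {}, 0
    for q, k in zip(imperialists, counts):
        empires[q] = colonies[start:start + k]
        start += k
    return empires, dict(zip(imperialists, power))


def assimilate(colony, imperialist, rng, c=1.5):
    """Move the colony u * d towards the imperialist, u ~ U(0, c), d = their distance."""
    u = rng.uniform(0.0, c)
    return [x + u * (p - x) for x, p in zip(colony, imperialist)]


rng = random.Random(0)
empires, power = init_empires([5.0, 1.0, 3.0, 4.0, 2.0, 6.0], 2, rng)
print(empires, power)
```

With this normalization the weaker of two imperialists has zero power and starts with no colonies, which is why competition between empires matters later.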
The total number of colonies of imperialist $q$ is denoted $S_q$. During assimilation, each colony in an empire moves a distance $\varepsilon$ along the direction $d$ towards its imperialist. The moving distance $\varepsilon$ is a random number uniformly distributed in $[0, c \times d]$, where $c \in (1, 2)$ and $d$ is the distance between the imperialist and the colony. Choosing $c > 1$ allows the colony to move past the imperialist, so that it approaches the imperialist from both sides. However, colonies are not absorbed by the imperialist in a direct movement, so a deviation $\theta$ from the direct line is added, with $\theta$ uniformly distributed in $[-\varphi, \varphi]$, where $\varphi$ is an arbitrary parameter. Revolution changes the position of some colonies, reflecting unexpected changes in their characteristics; for instance, a colony's position could change because its language or religion changes. Similar to mutation in a genetic algorithm, revolution in ICA increases exploration and prevents early convergence to local optima. After assimilation and revolution in an empire, the cost of each colony is compared with that of its imperialist, and if a colony has a lower cost than the imperialist, the two are swapped. Imperialist competition is determined by the total power of each empire. The total cost $AT_q$ of empire $q$ is initially calculated as

$$AT_q = t_q + \zeta \cdot \mathrm{mean}\{\text{cost of the colonies of empire } q\}$$

where $\zeta$ is a positive number between 0 and 1 and close to 0. For empire $q$, the normalized total cost is

$$NAT_q = \max_i\{AT_i\} - AT_q$$

and the power is computed as

$$EP_q = \frac{NAT_q}{\sum_{i=1}^{M_{im}} NAT_i}$$

After the vector $[EP_1 - c_1, EP_2 - c_2, \cdots, EP_{M_{im}} - c_{M_{im}}]$ is defined, where each $c_i$ is a random number uniformly distributed in [0, 1], the weakest colony of the weakest empire is assigned to the empire with the largest value in this vector.

Algorithm 1
Requirement: S, H
1. Set the iteration counter g = 0.
2. Initialize S/2 employed bees to S/2 food sources.
3. Repeat
   A. Evaluate solutions y_n to find the nectar content at the employed bee locations.
   B. Move each employed bee to a new food source using equation (9).
   C. Evaluate solutions w_f to find the nectar content at the new employed bee locations.
   D. Return bees with lower nectar content at w_f to y_n and record them.
   E. Recruit S/2 onlooker bees in proportion to the nectar content.
   F. Move each onlooker bee to a new food source in its neighbourhood.
   G. Evaluate solutions w_0 to find the nectar content at the new onlooker bee locations.
   H. Move the employed bees to the best locations found by the onlooker bees.
   I. Record the best food sources among all y_n, w_f, and w_0.
   J. Convert employed bees that cannot find better food sources in H trials into scout bees.
   K. Initialize the scout bees using equation (8).
   L. g ← g + 1
4. Until the termination criterion is satisfied.
5. Declare the best source found (near-optimal solution).
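One round of the imperialist competition described in Section 3.4 can be sketched in Python as follows: the total cost AT_q = t_q + ζ·mean(colony costs), the normalized total cost max_i{AT_i} − AT_q, the power EP_q, and the vector of EP_q − c_q values whose largest entry decides which empire receives the weakest colony of the weakest empire. It is an illustrative sketch for a minimization problem; ζ = 0.1, the dictionary layout, and the function name are assumptions, and the elimination of empires that lose all their colonies is not modelled.

```python
import random
from statistics import mean


def imperialist_competition(empires, costs, rng, zeta=0.1):
    """empires: {imperialist: [colonies]}; costs: {country: cost}. Mutates and returns empires."""
    imps = list(empires)
    total = {q: costs[q] + (zeta * mean(costs[c] for c in empires[q]) if empires[q] else 0.0)
             for q in imps}                                     # total cost AT_q
    worst_total = max(total.values())
    norm = {q: worst_total - total[q] for q in imps}            # normalized total cost
    s = sum(norm.values())
    power = {q: norm[q] / s if s else 1.0 / len(imps) for q in imps}   # EP_q
    weakest = max(imps, key=total.__getitem__)
    if not empires[weakest]:
        return empires
    colony = max(empires[weakest], key=costs.__getitem__)       # weakest colony
    winner = max(imps, key=lambda q: power[q] - rng.random())   # largest EP_q - c_q
    empires[weakest].remove(colony)
    empires[winner].append(colony)
    return empires


empires = {0: [2, 3], 1: [4]}
costs = {0: 1.0, 1: 5.0, 2: 2.0, 3: 3.0, 4: 6.0}
print(imperialist_competition(empires, costs, random.Random(0)))
```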
3.5. Social Spider Optimization Algorithm. SSO is a recent metaheuristic algorithm that has attracted considerable attention [51]. In this algorithm, the search space is treated as a communal spider web, and each candidate solution in the population represents a spider. Each spider receives a weight based on its fitness value. Two different sets of evolutionary operators simulate the different cooperative behaviours in the colony. The algorithm is designed to solve a box-constrained nonlinear global optimization problem:

$$\text{minimize } f(y), \quad y = (y_1, y_2, \cdots, y_d) \in \mathbb{R}^d \quad \text{subject to } y \in Y$$

where $f: \mathbb{R}^d \to \mathbb{R}$ is a nonlinear function and $Y = \{ y \in \mathbb{R}^d \mid l_j \le y_j \le u_j,\ j = 1, \cdots, d \}$ is the feasible space bounded by the lower ($l_j$) and upper ($u_j$) limits. To solve this problem, SSO uses a population $A$ of $N$ candidate solutions. Each solution represents a spider position, and the whole web represents the search space $Y$. The population $A$ is divided into two groups of search agents: males ($M_a$) and females ($F_a$). To simulate a real spider colony, the number of females $N_f$ is selected randomly in the range of 60-70% of the population $A$, and the rest are males ($N_m = N - N_f$). The group $F_a = \{f_{a1}, f_{a2}, \cdots, f_{aN_f}\}$ is the set of female individuals, and $M_a = \{m_{a1}, m_{a2}, \cdots, m_{aN_m}\}$ is the set of male individuals. Each spider $c$ has a weight $w_c$ based on its solution fitness, calculated as

$$w_c = \frac{fit_c - worst}{best - worst}$$

where $fit_c$ is the fitness value of the $c$th spider solution, $c \in \{1, \cdots, N\}$, and $best$ and $worst$ are the best and worst fitness values of the whole population $A$. The main mechanism of SSO is the exchange of information during optimization, which is simulated through the vibrations transmitted over the communal web.
The vibration received by spider $c$ from spider $b$ is modelled as

$$v_{c,b} = w_b\, e^{-d^2}$$

where $w_b$ is the weight of the $b$th spider and $d$ is the distance between the two spiders. Each spider $c$ can perceive three types of vibrations: $v_{c,n}$, $v_{c,h}$, and $v_{c,f}$. $v_{c,n}$ is produced by the nearest spider $n$ that has a higher weight, $v_{c,h}$ is produced by the best spider $h$ in the population $A$, and $v_{c,f}$ is produced by the nearest female spider and is applied only if $c$ is a male spider. Starting at iteration $s = 0$, the population of $N$ spiders is evolved until the total number of iterations is reached. Different sets of evolutionary operators are assigned to each individual based on its gender.
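The weight and vibration formulas above translate directly into code. The Python sketch below assumes a minimization problem (so best is the minimum and worst the maximum of the fitness values) and also shows how a spider finds the source of v_{c,n}, the nearest spider with a higher weight; the function names and the equal-weight fallback are illustrative assumptions.

```python
import math


def spider_weights(fit, minimize=True):
    """w_c = (fit_c - worst) / (best - worst): 1 for the best spider, 0 for the worst."""
    best, worst = (min(fit), max(fit)) if minimize else (max(fit), min(fit))
    if best == worst:
        return [1.0] * len(fit)
    return [(f - worst) / (best - worst) for f in fit]


def vibration(w_b, d):
    """Vibration perceived from a spider of weight w_b at distance d: w_b * exp(-d**2)."""
    return w_b * math.exp(-d ** 2)


def nearest_heavier(c, positions, weights):
    """Index of the nearest spider whose weight exceeds that of spider c (source of v_cn)."""
    heavier = [b for b in range(len(positions)) if weights[b] > weights[c]]
    if not heavier:
        return None
    return min(heavier, key=lambda b: math.dist(positions[c], positions[b]))


positions = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
fit = [4.0, 1.0, 0.0]                       # costs: spider 2 is the best
w = spider_weights(fit)
n = nearest_heavier(0, positions, w)
print(w, n, vibration(w[n], math.dist(positions[0], positions[n])))
```

The exponential decay means a heavy but distant spider can transmit a weaker vibration than a lighter neighbour, which is what keeps the search partly local.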
For a female spider, the new position $f_{a,c}^{s+1}$ is obtained by modifying the current position $f_{a,c}^{s}$. A probability factor $P$ randomly controls the modification: the movement, driven by the vibrations transmitted by other spiders across the search space, is

$$f_{a,c}^{s+1} = \begin{cases} f_{a,c}^{s} + \alpha\, v_{c,n}\,(a_n - f_{a,c}^{s}) + \beta\, v_{c,h}\,(a_h - f_{a,c}^{s}) + \delta\,(rand - \tfrac{1}{2}) & \text{with probability } P \\ f_{a,c}^{s} - \alpha\, v_{c,n}\,(a_n - f_{a,c}^{s}) - \beta\, v_{c,h}\,(a_h - f_{a,c}^{s}) + \delta\,(rand - \tfrac{1}{2}) & \text{with probability } 1 - P \end{cases}$$

where $\alpha$, $\beta$, $\delta$, and $rand$ are random numbers in [0, 1], $s$ is the iteration number, and $a_n$ and $a_h$ are the nearest spider and the best spider, respectively. Male spiders are classified into two types: dominant ($U$) and nondominant ($W$). Mating takes place only between a dominant male $m_u$ and the female individuals within a specific range $r$, and the new individual $a_{new}$ is formed from the participants according to their weights, so heavier spiders have a higher probability of influencing it. Once the new spider is generated, it is compared with the rest of the population: if $a_{new}$ has a better fitness value than the worst spider, the worst spider is replaced by $a_{new}$; otherwise, $a_{new}$ is discarded.
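The female movement rule and the replacement of the worst spider can be sketched in Python as below, for a minimization problem. The random numbers α, β, δ, and rand are drawn once per move, as in the equation; the default P = 0.7 and the function names are hypothetical, and the male operators and the mating step that produces a_new are not shown.

```python
import random


def female_move(f, a_n, a_h, v_cn, v_ch, rng, P=0.7):
    """New position of a female spider: attracted (prob. P) or repelled (prob. 1 - P)
    by the nearest heavier spider a_n and the best spider a_h, plus a random jitter."""
    alpha, beta, delta, r = rng.random(), rng.random(), rng.random(), rng.random()
    sign = 1.0 if rng.random() < P else -1.0
    return [fi + sign * (alpha * v_cn * (n - fi) + beta * v_ch * (h - fi)) + delta * (r - 0.5)
            for fi, n, h in zip(f, a_n, a_h)]


def replace_worst(population, costs, a_new, cost_new):
    """Replace the worst spider by a_new if a_new is better; otherwise discard a_new."""
    worst = max(range(len(population)), key=costs.__getitem__)
    if cost_new < costs[worst]:
        population[worst], costs[worst] = a_new, cost_new
        return True
    return False


rng = random.Random(0)
print(female_move([0.0, 0.0], [1.0, 1.0], [2.0, 2.0], v_cn=0.5, v_ch=0.3, rng=rng))
```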

Classification Procedures
The optimized gene subsets are then classified with the following classifiers.

NBC.
The main assumption of the NB classifier is that each characteristic is independent of the rest of the characteristics given the class [52]. The optimized genes therefore contribute independently to the probability of a sample belonging to a specific class. These classifiers require only a small number of training samples to estimate the parameters needed for classification, and they are fast and efficient for supervised learning problems.

4.2. SVM. SVM selects the hyperplane that is equidistant from the classes, so that a maximum margin separates them [53]. The support vectors are the training samples that lie on the margin boundary once the hyperplane is defined. A hyperparameter controls how much the classifier tolerates classification errors, and thereby controls the generalization capability of the model. For a two-class problem, a new sample is classified according to the side of the hyperplane on which it falls. For a multiclass problem, SVM builds $N(N-1)/2$ classifiers, where $N$ is the number of classes, and a voting system among them assigns the new sample to the most voted class.
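A minimal Gaussian naive Bayes sketch in Python shows how the independence assumption turns the class posterior into a sum of per-gene log-likelihood terms. Modelling each gene with a class-conditional normal distribution, adding a small variance floor, and the function names are our assumptions.

```python
import math
from statistics import mean, pvariance


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Per class: log prior, and mean and variance of every gene."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == c]
        cols = list(zip(*rows))
        model[c] = (math.log(len(rows) / len(y)),
                    [mean(col) for col in cols],
                    [pvariance(col) + var_floor for col in cols])
    return model


def predict_nb(model, x):
    """Class with the largest log prior + sum of independent per-gene log densities."""
    def log_posterior(c):
        log_prior, mu, var = model[c]
        return log_prior + sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                               for xi, m, v in zip(x, mu, var))
    return max(model, key=log_posterior)


X = [[1.0, 5.0], [1.2, 4.8], [3.0, 1.0], [3.2, 1.2]]
y = ["ALL", "ALL", "AML", "AML"]
model = fit_gaussian_nb(X, y)
print(predict_nb(model, [1.1, 5.1]), predict_nb(model, [3.1, 0.9]))
```

Only a mean and a variance per gene and class are estimated, which illustrates why few training samples suffice.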

RF.
The RF algorithm builds a forest of classification trees by growing many individual trees [54]. To classify an input vector, it is presented to each tree in the forest, and each tree produces a classification. The class with the largest number of votes over all the trees is then chosen by a standard voting system.
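The voting idea can be illustrated with a deliberately small Python sketch: each "tree" is a one-split decision stump trained on a bootstrap sample using a random subset of about √p features, and the forest predicts by majority vote. Using stumps instead of fully grown trees is a simplification assumed for brevity; the names and the number of trees are hypothetical.

```python
import math
import random
from collections import Counter


def fit_stump(X, y, features):
    """Single threshold split with the fewest training errors: (feature, threshold, left, right)."""
    best = None
    for j in features:
        for thr in sorted({x[j] for x in X}):
            left = [t for x, t in zip(X, y) if x[j] <= thr]
            right = [t for x, t in zip(X, y) if x[j] > thr]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(t != l_lab for t in left) + sum(t != r_lab for t in right)
            if best is None or errors < best[0]:
                best = (errors, j, thr, l_lab, r_lab)
    if best is None:                        # no valid split: predict the majority class
        label = Counter(y).most_common(1)[0][0]
        return features[0], math.inf, label, label
    return best[1:]


def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    p = len(X[0])
    m = max(1, int(math.sqrt(p)))           # features considered per tree
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]   # bootstrap sample
        feats = rng.sample(range(p), m)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, x):
    """Each tree votes; the class with the most votes wins."""
    votes = Counter(left if x[j] <= thr else right for j, thr, left, right in forest)
    return votes.most_common(1)[0][0]


X = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
y = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
forest = fit_forest(X, y)
print(predict_forest(forest, [-0.5]), predict_forest(forest, [2.0]))
```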

PNN.
This classifier is an implementation of a statistical algorithm called kernel discriminant analysis, whose operations are organized into a multilayered feedforward network [55]. PNN requires only a single epoch of training. Its main drawback is that it must store all the training samples, which consumes a large amount of memory and slows down the recall (classification) computation as the training set grows.
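The PNN described above can be sketched as a Parzen-window classifier: "training" only stores the samples in one pass, the pattern layer applies a kernel to every stored sample, the summation layer averages the kernel outputs per class, and the output layer selects the largest value. The Gaussian kernel, the per-class average (equivalent to equal class priors), and the smoothing parameter `sigma` are assumptions; the class and method names are hypothetical.

```python
import math

class PNN:
    """Minimal Parzen-window PNN: training stores the samples in a single pass."""

    def __init__(self, sigma=1.0):
        self.sigma = sigma
        self.patterns = {}  # class label -> list of stored training vectors

    def fit(self, X, y):
        for x, label in zip(X, y):
            self.patterns.setdefault(label, []).append(list(x))
        return self

    def class_scores(self, x):
        two_s2 = 2.0 * self.sigma ** 2
        scores = {}
        for label, pats in self.patterns.items():
            # Pattern layer: one Gaussian kernel per stored sample.
            k = [math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / two_s2)
                 for p in pats]
            # Summation layer: average kernel output for this class.
            scores[label] = sum(k) / len(k)
        return scores

    def predict(self, x):
        scores = self.class_scores(x)
        return max(scores, key=scores.get)  # output layer: largest class score
```

Because every prediction loops over all stored patterns, both memory use and recall time grow linearly with the number of training samples, which is the drawback noted above.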

Results and Discussion
Classification is performed using 10-fold cross-validation, and the performance is reported in the tables below. The formulae for computing the Performance Index (PI), Sensitivity, Specificity, and Accuracy are taken from the literature and are expressed in terms of Perfect Classification (PC), Missed Classification (MC), and False Alarm (FA).

Table 2 shows the average performance of the classifiers in terms of classification accuracy with ABO for the different gene selection techniques using 50-200 selected genes. As depicted in Table 2, the PNN classifier attained the highest accuracy of 92.97%, both with 50 genes selected by SNR and with 200 genes selected by multivariate EWUSC. The SVM classifier with 100 genes selected by multivariate CFS reached the lowest accuracy of 75.96%, which is attributed to the high false alarm rate of the SVM classifier.

Table 3 presents the average performance of the classifiers in terms of classification accuracy with ABCO for the different gene selection techniques using 50-200 selected genes. As shown in Table 3, the PNN classifier with 200 genes selected by SNR exhibits the highest accuracy of 91.47%, whereas the SVM classifier with 200 genes selected by multivariate EWUSC drops to the lowest accuracy of 75.7581%.

Table 4 reveals the average performance of the classifiers in terms of classification accuracy with CSO for the different gene selection techniques using 50-200 selected genes. As identified in Table 4, the NBC classifier with 50 genes selected by SNR demonstrates the highest accuracy of 92.19%, whereas the PNN classifier with 50 genes selected by multivariate EWUSC achieves the lowest accuracy of 75.75%.

Table 5 presents the average performance of the classifiers in terms of classification accuracy with ICO for the different gene selection techniques using 50-200 selected genes. Table 5 reports that the RF classifier with 50 genes selected by multivariate CFS attained the highest accuracy of 92.45%, whereas the PNN classifier with 50 genes selected by multivariate EWUSC achieved the lowest accuracy of 75.625%.
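The PC/MC/FA-based metrics and the 10-fold protocol mentioned above can be sketched as follows. The formulas are an assumption: they follow definitions commonly paired with PC, MC, and FA in related biomedical classification studies (Sensitivity = PC/(PC + FA), Specificity = PC/(PC + MC), Accuracy = their mean, PI = (PC − MC − FA)/PC, all in percent), not necessarily the exact expressions used in this work. The fold splitter is a plain (non-stratified) shuffle-and-deal helper; both function names are hypothetical.

```python
import random

def pi_metrics(pc, mc, fa):
    """Metrics in percent from counts of perfect classifications (PC),
    missed classifications (MC) and false alarms (FA). ASSUMED definitions."""
    sensitivity = 100.0 * pc / (pc + fa)
    specificity = 100.0 * pc / (pc + mc)
    accuracy = (sensitivity + specificity) / 2.0
    performance_index = 100.0 * (pc - mc - fa) / pc
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "pi": performance_index}

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices once and deal them into k disjoint folds;
    each fold serves once as the test set while the others form the training set."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]
```

Per-fold metrics would then be averaged over the 10 folds to obtain values of the kind reported in Tables 2 to 6.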
Table 6 presents the average performance of the classifiers in terms of classification accuracy with SSO for the different gene selection techniques using 50-200 selected genes. Table 6 shows that the PNN classifier with 200 genes selected by multivariate CFS attained the highest accuracy of 95.705%, whereas the RF classifier with 50 genes selected by multivariate CFS achieved the lowest accuracy of 75.875%. Figure 2 shows the Performance Index (PI) of the four classifiers averaged over the five optimization methods. As exhibited in Figure 2

Conclusion and Future Work
Microarray-based classification of cancer is very useful for its diagnosis, analysis, and treatment. In recent years, the microarray technique has had a great impact on identifying the most informative genes associated with cancer. The curse of dimensionality is a major drawback in microarray data analysis: it causes computational instability and prevents useful information from being extracted from a dataset. Thus, when analyzing cancer microarray datasets, the selection and extraction of relevant features is an essential task for achieving effective classification. In this work, four initial feature selection techniques were applied, and the selected features were further optimized with five optimization techniques before classification. The best results are obtained when multivariate CFS feature selection with SSO is utilized and classified with the Probabilistic Neural Network (PNN), yielding a high classification accuracy of 95.70%. Future work will explore other optimization and machine learning techniques for a better analysis of microarray-based leukemia classification.

Data Availability
The program code will be made available to researchers upon request to the corresponding author.

Conflicts of Interest
The authors declare that they have no conflicts of interest.