A Holistic Performance Comparison for Lung Cancer Classification Using Swarm Intelligence Techniques

In the field of bioinformatics, feature selection in classification of cancer is a primary area of research and utilized to select the most informative genes from thousands of genes in the microarray. Microarray data is generally noisy, is highly redundant, and has an extremely asymmetric dimensionality, as the majority of the genes present here are believed to be uninformative. The paper adopts a methodology of classification of high dimensional lung cancer microarray data utilizing feature selection and optimization techniques. The methodology is divided into two stages; firstly, the ranking of each gene is done based on the standard gene selection techniques like Information Gain, Relief–F test, Chi-square statistic, and T-statistic test. As a result, the gathering of top scored genes is assimilated, and a new feature subset is obtained. In the second stage, the new feature subset is further optimized by using swarm intelligence techniques like Grasshopper Optimization (GO), Moth Flame Optimization (MFO), Bacterial Foraging Optimization (BFO), Krill Herd Optimization (KHO), and Artificial Fish Swarm Optimization (AFSO), and finally, an optimized subset is utilized. The selected genes are used for classification, and the classifiers used here are Naïve Bayesian Classifier (NBC), Decision Trees (DT), Support Vector Machines (SVM), and K-Nearest Neighbour (KNN). The best results are shown when Relief-F test is computed with AFSO and classified with Decision Trees classifier for hundred genes, and the highest classification accuracy of 99.10% is obtained.


Introduction
The number of patients diagnosed with cancer is increasing steadily [1]. Cancer is presently diagnosed with the help of biopsies, image processing techniques, and blood analysis. Cancer develops when damaged cells accumulate excessively in the human body [2]. The behavior of cancer differs for every patient, and it can be understood well only by examining its origin closely. Cancer originates in the cells, and the structure of the cell is unique to every individual. Therefore, no single specific vaccine is available to cure cancer permanently [3]. Understanding the relation between a gene and its products contributes to the genetic approach to cancer diagnosis, so that biomarker genes for targeted drug therapies can be identified [4]. With this approach, the effects of genes on some cell signaling pathways can also be understood [5]. Gene expression provides information about the activity level of a gene. One of the most widely used techniques for measuring gene expression is the microarray [6]. The gene expression values obtained by microarrays can be utilized for cancer diagnosis and for the classification of cancer types, and many studies employ microarray datasets for these purposes. Various feature selection algorithms are employed to select biomarker gene subsets [7]. Statistical machine learning techniques are applied to these microarray datasets with or without feature selection [8]. Biomarker genes are used to classify cancer types, and the aim is to identify the biomarker genes that give the highest classification accuracy.
In recent years, the advent of microarray technology has added a new dimension to cancer research. Microarray-based gene expression data has emerged as a proficient means for the classification, analysis, diagnosis, and treatment of cancer [9]. A microarray gene expression dataset contains thousands of features, termed genes. Such data has records or instances from only a few patients, and this limited availability of samples in comparison to the much larger number of genes is termed the curse of dimensionality problem [10]. As a result, (a) the training time during the classification process increases, and (b) the classification accuracy decreases [11]. These issues hinder the extraction of useful information from the dataset. The number of genes therefore has to be reduced and the highly informative genes selected so that the classification accuracy increases, which is a significant step in microarray data analysis [12]. The aim of feature selection (gene selection) in microarray data classification is to select a small subset of features from the original huge feature space [13]. Feature selection removes redundant and irrelevant features, so that the classification accuracy is increased and the classification time is reduced. The feature selection techniques proposed in the literature include filter, wrapper, embedded, and hybrid methods [14]. In this study, the primary aim is to select the optimal gene subsets for lung cancer and to classify them.
Feature selection is then implemented along with optimization techniques, and the selected genes are finally classified.
Some of the prominent works on lung cancer classification using microarray gene analysis are as follows. A cross-study comparison of gene expression studies for the molecular classification of lung cancer was done by Parmigiani et al. [15]. The classification of non-small cell lung cancer using the Significance Analysis of Microarray-Gene Set Reduction algorithm was performed by Zhang et al. [16]. An adaptive multinomial regression with overlapping groups for the multiclass classification of lung cancer was performed by Li et al. [17]. Lung cancer prediction from microarray data by gene expression programming was done by Azzawi et al. [18]. A support vector machine-based classification method for the analysis of lung cancer gene expression databases was done by Guan et al. [19]. Progress in image processing techniques and integrated analysis, and in the development of advanced devices for tissue engineering as a potential solution to treat lung diseases, has also been discussed in the literature [20,21].
As far as microarray gene selection techniques using optimization and classification are concerned, self-organizing maps [22], ensemble classification techniques [23], Taguchi chaotic binary Particle Swarm Optimization (PSO) [24], an adaptive wrapper approach combined with SVM [25], kernel-based methods [26], pattern classification methods [27], Convolutional Neural Networks (CNN) [28], fuzzy approaches [29], Analysis of Variance (ANOVA), and K-Nearest Neighbour (KNN) [30] have been proposed in the literature. Using ant colony optimization, a hybrid gene selection approach was proposed by Sharbaf et al. [31]. Chen et al. implemented PSO and DT classifiers for cancer classification on gene expression data [32]. Other gene selection techniques reported in the literature include multiobjective algorithms [33], a hybrid binary Imperialist Competition Algorithm (ICA) and tabu search approach [34], a binary differential evolution algorithm [35], a simplified swarm optimization using a Social Spider Optimization (SSO) algorithm [36], Artificial Bee Colony (ABC) [37], Binary PSO [38], a novel rule-based algorithm [39], and the Shuffled Leap Frog Algorithm (SLFA) [40]; these have been well explored. In this paper, however, other suitable swarm intelligence techniques are explored and analyzed comprehensively. The organization of the paper is as follows. In Section 2, the materials and methods followed by the gene selection techniques are explained. In Section 3, the optimization techniques for gene selection are explained, and in Section 4, the classification techniques are explained, followed by the results and discussion in Section 5 and the conclusion in Section 6.

Materials and Methods
For the lung cancer classification, the lung Harvard 2 dataset was utilized, which is publicly available online [41]. The dataset has 181 samples, with 150 Adenocarcinoma (ADCA) and 31 Malignant Pleural Mesothelioma (MPM) samples. The dataset is tabulated in Table 1.
The pictorial representation of the work is shown in Figure 1.

Gene Selection Techniques.
The gene selection techniques utilized in this paper are Information Gain, Relief-F, Chi-square statistic, and T-statistic. The attribute values are discretized before chi-square, information gain, and the other feature selection methods are applied. The main intention of the gene selection techniques is to select the most important genes from the 12,533 genes. In this work, 1000 important genes are selected by the gene selection process through the following techniques.

Information Gain.
Information Gain is generally used as an attribute selection criterion in decision trees; hence, it is used as a gene selection technique too [7]. Assume the class set $S = \{S_x\}$, where $x = 1, 2, \ldots, l$. For every feature $Y_j$, the Information Gain is expressed as
$$IG(S, Y_j) = H(S) - H(S \mid Y_j), \tag{1}$$
where $H(S) = -\sum_{s \in S} p(s) \log_2 p(s)$ and
$$H(S \mid Y_j) = -\sum_{y \in Y_j} p(y) \sum_{s \in S} p(s \mid y) \log_2 p(s \mid y). \tag{2}$$
Information Gain is widely used only for discrete features; therefore, numeric features must be discretized before Information Gain is computed. Features are selected in decreasing order of their information gain values.
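As an illustration, the information gain of a single discretized gene can be computed as the class entropy minus the class entropy conditioned on the gene value. The sketch below is a minimal Python version under that standard definition; the function names and the toy labels are hypothetical and are not taken from the authors' code.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(S), in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG = H(S) - H(S | Y_j) for one discretized feature Y_j."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature_values):
        # Class labels of the samples whose feature takes this value.
        subset = [s for y, s in zip(feature_values, labels) if y == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional


# Toy example: four samples, two classes, two discretized genes.
labels = ["ADCA", "ADCA", "MPM", "MPM"]
informative = ["low", "low", "high", "high"]    # separates the classes perfectly
uninformative = ["low", "high", "low", "high"]  # independent of the class

ig_informative = information_gain(informative, labels)      # 1.0 bit
ig_uninformative = information_gain(uninformative, labels)  # 0.0 bits
```

Ranking all genes by this score and keeping the highest-scoring ones corresponds to the selection rule described above.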

Relief-F.
Relief-F is an extension of the Relief algorithm and was introduced to deal with multiclass, noisy, and incomplete datasets [7]. A relevance weight is assigned to each feature. A sample instance $I$ is selected at random from the $n$ sample instances. The relevance weights are updated based on the differences between the selected instance $I$ and its nearest neighbors: the nearest neighbor of the same class, termed the nearest hit and represented as $Q$, and the nearest neighbor of each different class $s$, termed the nearest miss and represented by $N(s)$. Features that discriminate the instance from neighbors of other classes are given more weight. The weights are updated by averaging the contribution of the nearest misses $N(s)$, and this average takes the prior probability of each class into account. The weight of the $j$th feature $Y_j$ is updated as follows:
$$W(Y_j) = W(Y_j) - \frac{\Psi(Y_j, I, Q)}{n} + \sum_{s \neq \mathrm{class}(I)} \frac{p(s)\, \Psi(Y_j, I, N(s))}{n}, \tag{3}$$
where the function $\Psi(Y_j, I, Q)$ calculates the distance between the sample instance $I$ and the nearest hit $Q$ or nearest miss $N(s)$.

Chi-Square Statistic.
The value of the $\chi^2$ statistic is computed for each feature with respect to the classes [7]. The numeric attributes are discretized before the $\chi^2$ statistic is computed. For every feature $Y_j$, the $\chi^2$ statistic is computed as
$$\chi^2(Y_j) = \sum_{y \in Y_j} \sum_{s \in S} \frac{\left(n_{(y \in Y_j \,\&\, s \in S)} - E_{(y \in Y_j \,\&\, s \in S)}\right)^2}{E_{(y \in Y_j \,\&\, s \in S)}}, \tag{4}$$
where $n_{(y \in Y_j \,\&\, s \in S)}$ represents the number of samples of class $s$ whose value of $Y_j$ is $y$. The expected frequency is defined as
$$E_{(y \in Y_j \,\&\, s \in S)} = \frac{n_{y \in Y_j} \cdot n_{s \in S}}{n}, \tag{5}$$
where $n_{y \in Y_j}$ denotes the number of samples in $Y_j$ with value $y$, $n_{s \in S}$ indicates the number of samples of class $s$, and $n$ is the total number of samples. Features are selected based on the sorted values of the $\chi^2$ statistic.
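The chi-square score of one discretized gene can be sketched directly from observed and expected cell counts. The Python example below assumes the usual Pearson form, in which each cell contributes (observed - expected)^2 / expected and the expected count is the product of the value count and the class count divided by the number of samples; the function name and the toy data are hypothetical.

```python
from collections import Counter


def chi_square(feature_values, labels):
    """Pearson chi-square statistic of one discretized feature against the classes."""
    n = len(labels)
    value_counts = Counter(feature_values)              # samples per feature value y
    class_counts = Counter(labels)                      # samples per class s
    joint_counts = Counter(zip(feature_values, labels)) # samples per (y, s) cell
    statistic = 0.0
    for y, n_y in value_counts.items():
        for s, n_s in class_counts.items():
            expected = n_y * n_s / n
            observed = joint_counts.get((y, s), 0)
            statistic += (observed - expected) ** 2 / expected
    return statistic


labels = ["ADCA", "ADCA", "MPM", "MPM"]
chi_informative = chi_square(["low", "low", "high", "high"], labels)    # 4.0
chi_uninformative = chi_square(["low", "high", "low", "high"], labels)  # 0.0
```

A larger value indicates a stronger dependence between the gene and the class, so genes are kept from the top of the sorted list.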

T-Statistic.
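This is a well-known gene selection technique and is quite popular in two-class problems [7]. Every sample is classified into either class $S_1$ or class $S_2$. For every feature $Y_j$, the t-statistic is computed as
$$t(Y_j) = \frac{\left|\mu_{j1} - \mu_{j2}\right|}{\sqrt{\sigma_{j1}^2 / n_1 + \sigma_{j2}^2 / n_2}}, \tag{6}$$
where $\mu_{jk}$ indicates the mean of the $j$th feature for class $S_k$, $\sigma_{jk}$ is the corresponding standard deviation, and $n_k$ is the number of samples in class $S_k$. The index $k$ indicates the class, i.e., $k = 1$ or $k = 2$.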
Once the t-statistic value for each feature is computed, then it is sorted out in a descending order, so that the important features can be selected.
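The ranking step can be illustrated with a short Python sketch. It assumes a two-sample statistic of the unequal-variance form, i.e., the absolute difference of the class means divided by the square root of the summed per-class variances over the class sizes; the gene names and expression values are hypothetical.

```python
import math
from statistics import mean, variance


def t_statistic(class1_values, class2_values):
    """Absolute two-sample t-statistic of one gene (unequal-variance form, assumed here)."""
    n1, n2 = len(class1_values), len(class2_values)
    spread = math.sqrt(variance(class1_values) / n1 + variance(class2_values) / n2)
    return abs(mean(class1_values) - mean(class2_values)) / spread


# Hypothetical expression values of two genes in the two classes.
genes = {
    "gene_a": ([1.0, 2.0, 3.0], [7.0, 8.0, 9.0]),   # well-separated class means
    "gene_b": ([1.0, 5.0, 9.0], [2.0, 6.0, 10.0]),  # heavily overlapping classes
}
scores = {name: t_statistic(c1, c2) for name, (c1, c2) in genes.items()}

# Sort genes in descending order of the statistic, most discriminative first.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Only the leading entries of `ranked` would be passed on to the next stage.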

Optimization Techniques
The shortlisted 1000 genes undergo a second feature selection stage, in which optimization techniques are used to select the best 50, 100, and 200 genes. This second-level feature selection is done using the following five optimization algorithms.

Grasshopper Optimization Algorithm.
This algorithm is widely used in many engineering optimization problems. GO is a recently proposed nature-inspired algorithm modelled on grasshoppers, which form some of the biggest swarms of all creatures [42]. Grasshoppers are herbivores that cause severe damage to crops. Their swarming behavior is exhibited by both adults and nymphs. The nymphs roll continuously on the ground and feed on soft plants and succulents. The adult grasshopper can jump very high in search of food, and so it has a very large area to explore. Both slow movement and abrupt movement are therefore observed, which indicates that exploitation and exploration are both possible. The swarming behavior of the grasshopper is represented mathematically as
$$Q_j = A_j + F_j + B_j, \tag{7}$$
where $Q_j$ represents the position of the $j$th grasshopper, $A_j$ represents the social interaction, $F_j$ represents the gravity force on the $j$th grasshopper, and $B_j$ represents the wind advection. The social interaction $A_j$ is given as
$$A_j = \sum_{k=1,\, k \neq j}^{N} a(d_{jk})\, \hat{d}_{jk}, \tag{8}$$
where $d_{jk} = |q_k - q_j|$ represents the distance between the $j$th and $k$th grasshoppers and $\hat{d}_{jk} = (q_k - q_j)/d_{jk}$ represents a unit vector from the $j$th grasshopper to the $k$th grasshopper. The social forces are expressed by the function $a$ as
$$a(s) = g\, e^{-s/l} - e^{-s}, \tag{9}$$
where $g$ represents the intensity of the attraction and $l$ is the attractive length scale. In terms of social interaction, the grasshoppers in search of food create three types of regions: the attraction zone, the comfort zone, and the repulsion region. The function $a$ cannot apply strong forces when the distance between grasshoppers is large. The $F$ component in (7) is expressed as
$$F_j = -f\, \hat{e}_f, \tag{10}$$
where $f$ represents the gravitational constant and $\hat{e}_f$ indicates a unit vector pointing towards the center of the Earth.
The $B$ component is computed as follows:
$$B_j = v\, \hat{e}_u, \tag{11}$$
where $v$ represents the constant drift and $\hat{e}_u$ represents a unit vector in the wind direction. If the values of $A$, $F$, and $B$ are substituted in (7), then
$$Q_j = \sum_{k=1,\, k \neq j}^{N} a\left(|q_k - q_j|\right) \frac{q_k - q_j}{d_{jk}} - f\, \hat{e}_f + v\, \hat{e}_u, \tag{12}$$
where $a(s)$ is given by (9) and $N$ represents the number of grasshoppers. To solve optimization problems, a revised version of this formula is used:
$$Q_j^d = c \left( \sum_{k=1,\, k \neq j}^{N} c\, \frac{v_{bd} - l_{bd}}{2}\, a\left(|q_k^d - q_j^d|\right) \frac{q_k - q_j}{d_{jk}} \right) + T_d, \tag{13}$$
where $v_{bd}$ represents the upper bound and $l_{bd}$ represents the lower bound in the $D$th dimension, and $T_d$ represents the value of the $D$th dimension in the target. The decreasing coefficient $c$ shrinks the comfort, repulsion, and attraction zones. The wind direction is always assumed to point towards the target. While searching for food, adults jump in the air and nymphs roll on the ground, which creates both exploration and exploitation. The two are balanced by reducing the parameter $c$ in proportion to the number of iterations. It is computed as
$$c = c_{\max} - i\, \frac{c_{\max} - c_{\min}}{I}, \tag{14}$$
where $c_{\max}$ represents the maximum value, $c_{\min}$ represents the minimum value, $i$ denotes the current iteration, and $I$ represents the maximum number of iterations.
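Two building blocks of the grasshopper update lend themselves to a compact illustration: the social force as a function of distance and the linearly decreasing coefficient c. The Python sketch below assumes the exponential form a(s) = g exp(-s/l) - exp(-s) for the social force; the default values g = 0.5, l = 1.5, c_max = 1, and c_min = 1e-5 are assumptions chosen for illustration and are not reported in this paper.

```python
import math


def social_force(s, g=0.5, l=1.5):
    """Social force a(s): negative values repel, positive values attract.

    g (attraction intensity) and l (attractive length scale) are assumed defaults.
    """
    return g * math.exp(-s / l) - math.exp(-s)


def shrink_coefficient(i, max_iterations, c_max=1.0, c_min=1e-5):
    """Coefficient c decreasing linearly from c_max (i = 0) to c_min (i = max_iterations)."""
    return c_max - i * (c_max - c_min) / max_iterations


# With these assumed constants the force changes sign at s = 3 * ln(2):
# closer grasshoppers repel each other, more distant ones attract.
comfort_distance = 3 * math.log(2)
repulsion = social_force(1.0)   # negative: repulsion region
attraction = social_force(3.0)  # positive: attraction zone

c_start = shrink_coefficient(0, 100)
c_end = shrink_coefficient(100, 100)
```

A large c early in the run keeps the zones wide and favors exploration, while a small c late in the run concentrates the swarm around the target.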

Moth Flame Optimization Algorithm.
The Moth Flame Optimization algorithm was developed by simulating the special navigation method of moths at night [43]. Moths navigate using a mechanism termed transverse orientation. A moth flies by maintaining a fixed angle with respect to the moon, which is a very effective way of travelling long distances in a straight path because the moon is very far away from the moth. This mechanism lets the moth fly along a very straight path at nighttime. It is a general observation, however, that moths fly around lights in a spiral manner, because artificial lights can easily trick them. As an artificial light is very close compared with the moon, maintaining a similar angle to the light source causes the moth to fly in a spiral. In this algorithm, the set of moths is represented as a matrix $A$, and an array $OA$ stores the corresponding fitness values of all the moths. The second key component of the algorithm is the flames, which are represented by a matrix $B$ similar to the moth matrix, with an array $OB$ storing the corresponding fitness values of all the flames. The MFO algorithm approximates the global optimum of the optimization problem as a three-tuple:
$$\mathrm{MFO} = (C, D, K). \tag{15}$$
The function $C$ generates a random population of moths with its corresponding fitness values. Its methodical model is expressed as
$$C : \emptyset \rightarrow \{A, OA\}. \tag{16}$$
The function $D$, which is the primary function, determines the movement of the moths around the search space. It receives the matrix $A$ and eventually returns the updated one:
$$D : A \rightarrow A. \tag{17}$$
The function $K$ returns true if the termination criterion is satisfied and false if it is not:
$$K : A \rightarrow \{\text{true}, \text{false}\}. \tag{18}$$
With $C$, $D$, and $K$, the general framework of the MFO is expressed as follows: $A = C()$; while $K(A)$ is false, $A = D(A)$. That is, after the initialization, the $D$ function is run iteratively until the $K$ function returns true. To simulate the moth behavior mathematically, the position of every moth is updated with respect to a flame using the following equation:
$$A_c = F(A_c, B_g), \tag{19}$$
where $A_c$ indicates the $c$th moth, $B_g$ specifies the $g$th flame, and $F$ represents the spiral function. Any type of spiral can be used, subject to the following three conditions:
(1) The initial point of the spiral should begin from the moth
(2) The final point of the spiral should be the position of the flame
(3) The fluctuation of the range of the spiral should not exceed the search space
For the MFO algorithm, the logarithmic spiral is defined as
$$F(A_c, B_g) = J_c \cdot e^{hk} \cdot \cos(2\pi k) + B_g, \tag{20}$$
where $J_c$ specifies the distance of the $c$th moth from the $g$th flame, $h$ denotes a constant defining the shape of the logarithmic spiral, and $k$ is a random number in $[-1, 1]$. $J$ is computed as follows:
$$J_c = |B_g - A_c|, \tag{21}$$
where $A_c$ indicates the $c$th moth, $B_g$ specifies the $g$th flame, and $J_c$ specifies the distance of the $c$th moth from the $g$th flame. The spiral flying path of the moth is expressed by (20), which gives the next position of a moth with respect to a flame. In the spiral equation, the parameter $k$ defines how close the next position of the moth is to the flame. Because the position update only requires a moth to move towards a flame, the algorithm may quickly become trapped in local optima. To prevent this, each moth is obliged to update its position using only one of the flames. Updating the positions of the moths with respect to $n$ different locations in the search space may, however, degrade the exploitation of the most promising solutions. The number of flames is therefore decreased over the course of the iterations as
$$\text{flame number} = \mathrm{round}\left(N - I \cdot \frac{N - 1}{K}\right), \tag{22}$$
where $I$ denotes the current number of iterations, $N$ denotes the maximum number of flames, and $K$ specifies the maximum number of iterations. This gradual decrease in the number of flames balances the exploration and exploitation of the search space. The general steps of the $D$ function are described in Algorithm 1.
As shown in the algorithm, the D function is executed until the K function returns true. Once the D function terminates, the best moth is returned as the best approximation of the optimum obtained.
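The two updates at the core of the D function, the spiral move of a moth around its flame and the shrinking number of flames, can be sketched as follows. The Python example assumes the logarithmic spiral J exp(hk) cos(2 pi k) + B applied per dimension and a flame count that decreases linearly from the maximum to one; the function names and the value h = 1 are illustrative assumptions, and k is passed in explicitly (instead of being drawn at random from [-1, 1]) so that the example is deterministic.

```python
import math


def spiral_move(moth, flame, h=1.0, k=0.0):
    """New moth position on a logarithmic spiral around the flame, per dimension."""
    new_position = []
    for a, b in zip(moth, flame):
        distance = abs(b - a)  # distance J between moth and flame in this dimension
        new_position.append(distance * math.exp(h * k) * math.cos(2 * math.pi * k) + b)
    return new_position


def flame_count(iteration, max_flames, max_iterations):
    """Number of flames kept at a given iteration (linear decrease to 1)."""
    return round(max_flames - iteration * (max_flames - 1) / max_iterations)


near_flame = spiral_move([0.0], [2.0], k=0.25)  # cos(pi/2) is ~0, so the moth lands on the flame
far_side = spiral_move([0.0], [2.0], k=0.0)     # moves to flame + distance

flames_at_start = flame_count(0, max_flames=30, max_iterations=100)
flames_at_end = flame_count(100, max_flames=30, max_iterations=100)
```

In a full implementation, k would be sampled anew for every moth and dimension, and each moth would be paired with one flame from the sorted flame list.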

Bacterial Foraging Optimization Algorithm.
Three main mechanisms are present in the classical BFO, that is, the chemotaxis process, the reproduction process, and the elimination-dispersal process [44].

Chemotaxis Process.
Here, a tumble indicates a unit walk in a random direction, and a run indicates a unit walk in the same direction as the last step. Assume that $\theta_a(b, c, d)$ indicates the $a$th bacterium at the $b$th chemotactic, $c$th reproductive, and $d$th elimination-dispersal step. $R(a)$, the run-length unit parameter, is the chemotactic step size during every tumble or run. The movement of the $a$th bacterium in every computational chemotactic step is expressed as
$$\theta_a(b + 1, c, d) = \theta_a(b, c, d) + R(a)\, \frac{\Delta(a)}{\sqrt{\Delta^{T}(a)\, \Delta(a)}}, \tag{23}$$
where $\Delta(a)$ represents the direction vector of the $b$th chemotactic step. If the bacterial movement is a run, $\Delta(a)$ is the same as in the last chemotactic step; otherwise, $\Delta(a)$ is a random vector whose elements lie in the range $[-1, 1]$. A step fitness, indicated as $B(a, b, c, d)$, is evaluated at each step of the chemotaxis process, whether the step is a run or a tumble.
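A single chemotactic move can be sketched as a step of fixed length along a normalized direction vector, so that a tumble only changes the direction and never the step size. The Python example below is a minimal illustration under that reading; the function names and the numerical values are hypothetical.

```python
import math
import random


def tumble_direction(dimension, rng):
    """Random direction vector with every element drawn from [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(dimension)]


def chemotactic_step(theta, direction, run_length):
    """Move a bacterium by run_length along the normalized direction vector."""
    norm = math.sqrt(sum(d * d for d in direction))
    return [t + run_length * d / norm for t, d in zip(theta, direction)]


rng = random.Random(0)  # fixed seed so the illustration is repeatable
start = [0.0, 0.0]

# Tumble: a fresh random direction; run: reuse the direction of the last step.
direction = tumble_direction(2, rng)
after_tumble = chemotactic_step(start, direction, run_length=0.5)
after_run = chemotactic_step(after_tumble, direction, run_length=0.5)

# A fixed direction makes the arithmetic easy to follow: (3, 4) has norm 5.
example = chemotactic_step([0.0, 0.0], [3.0, 4.0], run_length=0.5)  # approximately [0.3, 0.4]
```

Whatever direction the tumble produces, the bacterium ends up exactly one run length away from where it started.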

Reproduction Process.
The health status of each bacterium is calculated as the sum of its step fitness over its lifetime, $\sum_{b=1}^{N_r} B(a, b, c, d)$, where $N_r$ represents the maximum number of steps in a chemotaxis process. The bacteria are sorted in reverse order of health status. Only the first half of the population survives the reproductive step. Each surviving bacterium divides into two identical ones placed at the same location, so the population of bacteria remains constant.
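The reproduction step can be sketched as a sort followed by duplicating the better half of the population. The Python example below assumes a minimization setting, in which a lower accumulated step fitness means a healthier bacterium, and an even population size; both assumptions and the function name are illustrative.

```python
def reproduce(population, health):
    """Keep the healthier half of the population and split each survivor in two.

    Assumes minimization: a lower accumulated step fitness is healthier.
    """
    order = sorted(range(len(population)), key=lambda a: health[a])
    survivors = [population[a] for a in order[: len(population) // 2]]
    # Each survivor divides into two identical bacteria at the same location.
    return [list(p) for p in survivors] + [list(p) for p in survivors]


population = [[0.0], [1.0], [2.0], [3.0]]
health = [4.0, 1.0, 3.0, 2.0]  # accumulated step fitness of each bacterium
next_generation = reproduce(population, health)  # [[1.0], [3.0], [1.0], [3.0]]
```

The population size is unchanged, but the two least healthy bacteria have been replaced by copies of the two healthiest.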

Elimination and Dispersal Process.
Chemotaxis provides a basis for local search, and the reproduction process speeds up convergence. The classical BFO simulates this situation to a large extent. However, chemotaxis and reproduction alone are not enough to search for the global optimum, because the bacteria may get stuck around local optima. The elimination-dispersal process gradually changes the diversity of the BFO population so that the bacteria avoid being trapped in local optima. The dispersion event happens only after a certain number of reproductive processes. Then, based on a probability $P_{pr}$, some bacteria are chosen to be killed and moved to another position within the environment.
The step-by-step procedure is explained in Algorithm 2.

Krill Herd Optimization Algorithm.
The KH optimization algorithm is a well-known metaheuristic for solving optimization problems that is based on simulating the herding of krill swarms [45]. The herding of krill swarms usually occurs in response to certain environmental and biological processes. In a 2D space, the time-dependent position of an individual krill is decided by three primary actions:
(i) Movement induced by other krill individuals
(ii) Foraging action
(iii) Random diffusion
In a $d$-dimensional decision space, the KH algorithm adopts the following Lagrangian model:
$$\frac{dX_j}{dt} = M_j + G_j + D_j, \tag{24}$$
where $X_j$ is the position of the $j$th krill individual, $M_j$ is the motion induced by other krill individuals, $G_j$ is the foraging motion, and $D_j$ is the physical diffusion of the $j$th krill individual. The influence of the other krill individuals on the movement is represented by the direction of motion $\alpha_j$, which is computed from the target swarm density, a local swarm density, and a repulsive swarm density. The movement of a krill individual is defined as follows:
$$M_j^{\mathrm{new}} = M^{\max} \alpha_j + v_m M_j^{\mathrm{old}}, \tag{25}$$
where $M^{\max}$ represents the maximum induced speed, $v_m$ represents the inertia weight of the induced motion in $[0, 1]$, and $M_j^{\mathrm{old}}$ represents the last induced motion. The foraging motion is estimated with the help of two main components. The first one is the food location, and the second one is the prior knowledge about the food location. The motion is approximately formulated for the $j$th krill individual as follows:
$$G_j = W_g \beta_j + v_g G_j^{\mathrm{old}}, \tag{26}$$
where
$$\beta_j = \beta_j^{\mathrm{food}} + \beta_j^{\mathrm{best}}, \tag{27}$$
and $W_g$ represents the foraging speed, $v_g$ represents the inertia weight of the foraging motion between 0 and 1, and $G_j^{\mathrm{old}}$ is the last foraging motion. The random diffusion of the krill individuals is modelled as a random process. The motion is described in terms of a maximum diffusion speed and a random directional vector. It is represented as follows:
$$D_j = D^{\max} \delta, \tag{28}$$
where $D^{\max}$ is the maximum diffusion speed and $\delta$ is the random directional vector, whose elements are random values in the range $[-1, 1]$.
Based on the above-mentioned movements and their motion parameters, the position vector of a krill individual over the time interval from $t$ to $t + \Delta t$ is given as
$$X_j(t + \Delta t) = X_j(t) + \Delta t\, \frac{dX_j}{dt}. \tag{29}$$
$\Delta t$ is one of the most important parameters and should be fine-tuned for the specific type of optimization problem, because it acts as a scale factor of the speed vector.
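The krill position update combines the three motion components and scales their sum by the time step. The Python sketch below assumes an Euler-style update, position + dt * (induced + foraging + diffusion), together with an induced motion of the form M_max * alpha + v_m * M_old; the function names and all numerical values, including the defaults, are hypothetical.

```python
def induced_motion(alpha, old_motion, max_speed=0.01, inertia=0.5):
    """Induced motion per dimension: M_max * alpha + v_m * M_old (assumed defaults)."""
    return [max_speed * a + inertia * m for a, m in zip(alpha, old_motion)]


def update_position(position, induced, foraging, diffusion, dt):
    """Move one krill by dt times the sum of its three motion components."""
    return [
        x + dt * (m + g + d)
        for x, m, g, d in zip(position, induced, foraging, diffusion)
    ]


motion = induced_motion(alpha=[1.0, -1.0], old_motion=[0.02, 0.0])  # about [0.02, -0.01]

new_position = update_position(
    position=[1.0, 2.0],
    induced=[0.1, 0.0],
    foraging=[0.2, 0.1],
    diffusion=[0.0, -0.1],
    dt=2.0,
)  # about [1.6, 2.0]
```

Because dt multiplies the whole velocity, doubling it doubles the distance moved per iteration, which is why it has to be tuned to the scale of the search space.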

Artificial Fish Swarm Optimization Algorithm.
AFSO is a well-known swarm intelligence technique that solves optimization problems by imitating the swarming, chasing, and preying behaviors of fish [46]. Assume that $A_p$ is the current position of one artificial fish and $A_w$ is the viewpoint of the artificial fish at one specific moment. The visual scope of every individual is expressed as Vis, and $A_y$ and $A_z$ are fishes within Vis of $A_p$. The largest step of an artificial fish is denoted as step, and the congestion factor of the fish swarm is expressed as $\delta$. The food concentration is proportional to the fitness function $f(A)$. The behavior patterns in the fish swarm are expressed as follows:

Swarming Behavior.
If $f(A_s) > f(A_p)$, where $A_s$ is the central point inside the Vis of the point $A_p$, the swarming behavior is executed. $A_s$ is taken as $A_w$, and so the fish at $A_p$ will progress towards the point $A_s$.

Chasing Behavior.
If the point with the best objective function value inside the visual (expressed by $A_{\max}$) satisfies the criterion $f(A_{\max}) > f(A_p)$ and the visual of $A_p$ is not crowded, the chasing behavior is executed. $A_{\max}$ is taken as $A_w$, and so the fish at $A_p$ will progress towards the point $A_{\max}$.

Preying Behavior.
The preying behavior is tried under the following condition: it is executed if $f(A_q) > f(A_p)$. $A_q$ is taken as $A_w$, and so the fish at $A_p$ will progress towards the point $A_q$. Otherwise, the fish moves a step in a random fashion within its visual limit.
In each iteration, the best solution obtained is recorded on the "board." The search process is terminated after the specified number of iterations, and the result present on the board is considered as the final solution. The position updating for the artificial preying fishes is formulated as
$$A_{\mathrm{next}} = A_p + \mathrm{rand} \cdot \mathrm{step} \cdot \frac{A_q - A_p}{\mathrm{norm}(A_q - A_p)}, \tag{30}$$
where $A_{\mathrm{next}}$ is the next position of the artificial fish, $A_p$ is its current position, and $A_q$ is the position having a better objective function value. The random number is expressed as rand, and it is in the range of $[-1, 1]$. The distance between the two position vectors is expressed as $\mathrm{norm}(A_q - A_p)$. The position updating for the artificial swarming fishes is formulated as
$$A_{\mathrm{next}} = A_p + \mathrm{rand} \cdot \mathrm{step} \cdot \frac{A_s - A_p}{\mathrm{norm}(A_s - A_p)}. \tag{31}$$
The position updating for the artificial chasing fishes is formulated as
$$A_{\mathrm{next}} = A_p + \mathrm{rand} \cdot \mathrm{step} \cdot \frac{A_{\max} - A_p}{\mathrm{norm}(A_{\max} - A_p)}. \tag{32}$$
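The three behaviors share one move: a step of bounded length from the current position towards a chosen point (the better point, the swarm center, or the best point in view). The Python sketch below illustrates this shared move and the preying rule built on it, assuming the update current + rand * step * (target - current) / norm(target - current); the function names, the objective function, and the numbers are hypothetical.

```python
import math
import random


def move_towards(current, target, step, rand):
    """Step from current towards target, scaled by rand * step along the unit direction."""
    distance = math.sqrt(sum((t - c) ** 2 for c, t in zip(current, target)))
    if distance == 0.0:
        return list(current)  # already at the target, so there is no direction to follow
    return [c + rand * step * (t - c) / distance for c, t in zip(current, target)]


def prey(current, candidate, fitness, step, rng):
    """Preying: move towards a better candidate, otherwise take a random step."""
    if fitness(candidate) > fitness(current):
        return move_towards(current, candidate, step, rng.uniform(-1.0, 1.0))
    return [c + step * rng.uniform(-1.0, 1.0) for c in current]


def fitness(point):
    """Hypothetical objective to maximize, with its optimum at the origin."""
    return -sum(x * x for x in point)


rng = random.Random(0)
halfway = move_towards([0.0, 0.0], [3.0, 4.0], step=1.0, rand=0.5)  # about [0.3, 0.4]
after_prey = prey([2.0, 0.0], [1.0, 0.0], fitness, step=0.5, rng=rng)
```

Swarming and chasing would call `move_towards` with the swarm center or the best visible point as the target, after checking the crowding condition.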

Classification Procedures
The optimized gene subsets obtained after the second-level optimization techniques are classified using the NBC, Decision Tree, SVM, and KNN algorithms. The performance of the classifiers is analyzed in terms of classification accuracy with GO, MFO, BFO, KHO, and AFSO for the different gene selection techniques using 50 to 200 selected genes.

Naïve Bayesian Classifier.
NBC is a well-known probabilistic algorithm based on Bayes' rule, in which the feature values are assumed to be conditionally independent given the class [47]. When a new sample observation is given, the classifier assigns it to the class having the maximum conditional probability estimate.

Decision Tree.
DT is a well-known rule-based classifier, where leaf nodes represent classification outcomes and nonleaf nodes represent selected attributes [48]. A classification rule is reflected by the path from the root to a leaf node. The J4.8 algorithm is used here.

Support Vector Machine.
The SVM initially treats the input data as two sets of vectors in a $p$-dimensional space [49]. A separating hyperplane is then constructed in that space, so that the margin between the two data sets is maximized. The SVM is utilized here with polynomial kernels for training purposes.

K-Nearest Neighbour.
KNN is an instance-based classifier [50]. The class label of a new testing sample is decided by the majority class of its K closest neighbors, measured by their Euclidean distance. Here, the value of K is set to 4.
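The nearest-neighbour rule with K = 4 can be sketched in a few lines of Python. The example assumes Euclidean distance and a simple majority vote; because K is even, a 2-2 tie is possible in a two-class problem, and this sketch resolves it in favor of the class of the closest neighbour, which is an assumption and not a rule stated in the paper. The training points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=4):
    """Majority vote among the k training points closest to the query."""
    neighbours = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in neighbours)
    # most_common keeps first-seen order for equal counts, so a tie goes to
    # the class of the closest neighbour.
    return votes.most_common(1)[0][0]


train_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_labels = ["ADCA", "ADCA", "ADCA", "MPM", "MPM", "MPM"]

near_first_cluster = knn_predict(train_points, train_labels, (0.2, 0.2))   # "ADCA"
near_second_cluster = knn_predict(train_points, train_labels, (5.5, 5.5))  # "MPM"
```

In the setting of this paper, each point would be the vector of selected gene expression values of one sample.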

Results and Discussion
Classification is performed with a 10-fold cross-validation method, and the performance is shown in the tables. The mathematical formulae for computing the Performance Index (PI), Sensitivity, Specificity, and Accuracy are given in the literature, and the values are computed with them and reported here. In the following expressions, PC is Perfect Classification, MC is Missed Classification, and FA is False Alarm. The sensitivity is expressed as
$$\mathrm{Sensitivity} = \frac{PC}{PC + FA} \times 100. \tag{33}$$
Specificity is expressed as
$$\mathrm{Specificity} = \frac{PC}{PC + MC} \times 100. \tag{34}$$
Accuracy is expressed as
$$\mathrm{Accuracy} = \frac{\mathrm{Sensitivity} + \mathrm{Specificity}}{2}. \tag{35}$$
Performance Index (PI) is expressed as
$$\mathrm{PI} = \frac{PC - MC - FA}{PC} \times 100. \tag{36}$$
Table 2 shows the performance analysis of classifiers for the classification accuracy parameter with the GO method for different gene selection techniques. As indicated in Table 2, the SVM classifier with 100 selected genes under the Relief-F test and NBC with 100 genes under Information Gain attained a high accuracy of 98.96%. A low accuracy of 76% is produced by the KNN classifier in all three statistical methods. Table 3 indicates the performance analysis of classifiers for classification accuracy with the MFO method for different gene selection techniques. As shown in Table 3, the DT classifier with 50 selected genes under the Relief-F test reached a high accuracy of 98.012%. A low accuracy of 78.125% is shown by the SVM classifier with 100 genes selected under the Relief-F test. The lower accuracy of SVM is attributed to the presence of outliers in the gene samples. Table 4 demonstrates the performance analysis of classifiers for classification accuracy with the BFO method for different gene selection techniques. From Table 4, it is identified that the DT classifier with 50 selected genes under the Chi-square test reached a high accuracy of 98.56%. A low accuracy of 82.24% is shown by the SVM classifier with 100 genes selected under Information Gain. Across the gene samples, all the classifiers performed well with the BFO method. Table 5 reveals the performance analysis of classifiers for classification accuracy with the KHO method for different gene selection techniques. Table 5 shows that the SVM classifier with 50 selected genes under the Relief-F test reached a high accuracy of 98.38%; as the number of selected genes given to the SVM classifier increased, its accuracy dropped to 77.47% with 200 genes selected under the Relief-F test. Table 6 reports the performance analysis of classifiers for classification accuracy with the AFSO method for different gene selection techniques. As indicated in Table 6, the DT classifier with 100 selected genes under the Relief-F test reached the highest accuracy of 99.10%. The NBC classifier settled at a low accuracy of 77.08% with 200 selected genes under the Relief-F test.
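The four performance measures can be computed from PC, MC, and FA with a short Python sketch. It assumes the percentage forms Sensitivity = PC/(PC + FA), Specificity = PC/(PC + MC), Accuracy as the mean of the two, and PI = (PC - MC - FA)/PC. The counts below are hypothetical; they are chosen because they reproduce the pairing of 76% accuracy and 7.69% PI reported for the KNN classifier with the GO method.

```python
def performance_measures(pc, mc, fa):
    """Sensitivity, specificity, accuracy, and PI (all in percent).

    pc: perfect classifications, mc: missed classifications, fa: false alarms.
    """
    sensitivity = pc / (pc + fa) * 100
    specificity = pc / (pc + mc) * 100
    accuracy = (sensitivity + specificity) / 2
    performance_index = (pc - mc - fa) / pc * 100
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "performance_index": performance_index,
    }


# Hypothetical counts (out of 100 decisions): no misses, 48 false alarms.
measures = performance_measures(pc=52, mc=0, fa=48)
# accuracy is 76.0 and performance_index is about 7.69
```

The example shows why PI is a much stricter measure than accuracy: every missed classification or false alarm is subtracted directly from the count of perfect classifications.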

Journal of Healthcare Engineering
Initialize the position of moths
While (Iterations ≤ Max_Iterations)
    Update flame number using (22)
    OA = FitnessFunction(A);
    if Iterations == 1
        B = sort(A); OB = sort(OA);
    else
        B = sort(A_{k−1}, A_k); OB = sort(A_{k−1}, A_k);
    end
    for c = 1: n
        for g = 1: j
            Update r and k
            Calculate J using (21) with respect to the corresponding moth
            Update A(c, g) using (19) and (20) with respect to the corresponding moth
        end
    end
end
ALGORITHM 1: D function execution and termination.
Step 1: Parameter initialization
    s: dimension of the search space
    N: number of bacteria
    S_r: chemotactic steps
    S_n: swim steps
    S_re: reproductive steps
    S_pr: elimination and dispersal steps
    P_pr: probability of elimination
    R(a): run-length unit
Step 2: Elimination-dispersal loop: d = d + 1
Step 3: Reproduction loop: c = c + 1
Step 4: Chemotaxis loop: b = b + 1
    (a) For a = 1, 2, ..., N, a chemotactic step for bacterium "a" is taken as follows.
    (b) Compute the fitness F(a, b, c, d).
    (c) Let F_last = F(a, b, c, d) to save this value, so that a better value can be found via the run process.
    (d) Tumble: a random vector Δ(a) ∈ R^s is generated, where each element Δ_m(a), m = 1, 2, ..., s, is a random number in [−1, 1].
    (e) Move: bacterium "a" moves in the direction of the tumble, which results in a step of size R(a).
    (f) Compute F(a, b + 1, c, d).
    (g) Swim: let q = 0. While q < S_n, let q = q + 1. If F(a, b + 1, c, d) is better than F_last, let F_last = F(a, b + 1, c, d), take another step of size R(a) in the same direction, and use the newly generated θ_a(b + 1, c, d) to compute the new F(a, b + 1, c, d). Else let q = S_n.
    (h) Proceed to the next bacterium (a + 1): if a ≠ N, go to (b) to process the next bacterium.
Step 5: If b < S_r, go to step 4. In this case, the chemotaxis is continued, as the life of the bacteria is not over.
Step 6: Reproduction
    (a) For the given "c" and "d" and for every a = 1, 2, ..., N, let B^a_health = Σ_{b=1}^{S_r+1} F(a, b, c, d) be the health of bacterium "a". The bacteria are sorted in ascending order of B_health.
    (b) The N_z bacteria with the highest B_health values die, and the other N_z bacteria with the best values split.
Step 7: If c < S_re, go to step 3. The next generation in the chemotactic loop is initiated, as the specified number of reproduction steps has not been reached.
Step 8: Elimination-dispersal: For a = 1, 2, ..., N, each bacterium is eliminated and dispersed with probability P_pr, which keeps the total number of bacteria in the population constant. When a bacterium is eliminated, another one is simply dispersed to a random location in the optimization domain. If d < S_pr, go to step 2; otherwise, end the procedure.
ALGORITHM 2: BFO.

Table 7 signifies the performance analysis of classifiers for the PI parameter with the GO method for different gene selection techniques. As shown in Table 7, the SVM classifier with 100 selected genes under the Relief-F test and NBC with 100 genes under Information Gain attained a high PI of 97.87%. A low PI of 7.69% is indicated by the KNN classifier in all three statistical methods. Table 8 demonstrates the performance analysis of classifiers for PI with the MFO method for different gene selection techniques. As shown in Table 8, the DT classifier with 50 selected genes under the Relief-F test reached a high PI of 95.85%. A low PI of 22.22% is indicated by the SVM classifier with 100 genes selected under the Relief-F test. The lower PI of SVM is attributed to the presence of outlier genes in the samples. Table 9 represents the performance analysis of classifiers for PI with the BFO method for different gene selection techniques. From Table 9, it is known that the DT classifier with 50 selected genes under the Chi-square test reached a high PI of 97.009%. A low PI of 45.09% is indicated by the SVM classifier with 100 genes selected under Information Gain. Across the gene samples, all the classifiers performed well with the BFO method.

Conclusion and Future Work
Cancer is one of the most prominent causes of death for human beings nowadays. The best chances of suitable treatment can sometimes be missed due to mistaken diagnosis. Accurate cancer diagnosis with machine learning, along with clinical tests, is very helpful in the treatment of cancer. Microarray expression data is highly redundant, and most of the genes present are uninformative with respect to the classes. Therefore, it is critically necessary to select the best feature genes for the analysis of cancer. The techniques should be capable of robustly identifying a subset of the most informative genes out of a large dataset. In this work, a comprehensive analysis of lung cancer classification with the help of feature selection and optimization techniques is done. The best results are obtained when the Relief-F test is combined with AFSO and classified with the Decision Tree classifier for one hundred genes, for which the highest classification accuracy of 99.10% is obtained. Future work aims to explore other feature selection techniques and a variety of optimization techniques, combined with deep learning techniques, for the effective classification of lung cancer.

Data Availability
The data along with the program codes will be available to genuine researchers upon request to the corresponding author.

Conflicts of Interest
The authors declare that they have no conflicts of interest.