Semisupervised Bacterial Heuristic Feature Selection Algorithm for High-Dimensional Classification with Missing Labels

Feature selection is a crucial method for discovering relevant features in high-dimensional data. However, most studies focus primarily on completely labeled data, ignoring the frequent occurrence of missing labels in real-world problems. To address high-dimensional and label-missing problems in data classification simultaneously, we propose a semisupervised bacterial heuristic feature selection algorithm. To tackle the label-missing problem, a k-nearest neighbor semisupervised learning strategy is designed to reconstruct missing labels. In addition, the bacterial heuristic algorithm is improved using hierarchical population initialization, dynamic learning, and elite population evolution strategies to enhance the search capacity for various feature combinations. To verify the effectiveness of the proposed algorithm, three groups of comparison experiments based on eight datasets are employed, including two traditional feature selection methods, four bacterial heuristic feature selection algorithms, and two swarm-based heuristic feature selection algorithms. Experimental results demonstrate that the proposed algorithm has obvious advantages in terms of classification accuracy and the number of selected features.


Introduction
The dimensionality of data, which consists of many features, is one of the most influential aspects of a classification model's effectiveness [1]. Based on feature properties, instances can be categorized into their respective classes. However, redundant, irrelevant, and noisy features in high-dimensional data hamper classification accuracy [2, 3], e.g., in medical or clinical data classification [4, 5]. In particular, it is challenging to distinguish between representative and meaningless features without prior knowledge [6]. On the other hand, due to statistical norms and personal errors, data classification in real life often faces missing labels and the loss of valid samples, which ultimately reduces the accuracy and robustness of the classification model [5, 7, 8]. To reduce feature dimensionality and improve classification performance, feature selection (FS) [9] is recommended to collect more relevant feature subsets from the original data space. As a tool for optimizing the data space, FS can make classification less complicated and improve the precision of classification models [10].
FS methods can be categorized as filter, wrapper, or embedded based on various feature evaluation criteria [11]. Filter methods use specific statistical metrics, such as information gain [12] and the Fisher score [13], to evaluate the performance of created feature subsets, while wrapper methods use learning algorithms, such as K-nearest neighbor [14], naive Bayes [15], and linear discriminant analysis [16]. Embedded approaches, such as the least absolute shrinkage and selection operator [17] and ridge regression [18], embed FS into the training process of the learner. Filter methods typically execute faster than wrapper methods, but they frequently cannot achieve as high a degree of classification precision [19]. In addition, the process of designing embedded methods is intricate and requires plenty of prior experience [20]. Since the high-dimensional classification problem with only partial labels is already a hard task, this research investigates wrapper-based FS methods to ensure higher accuracy while avoiding additional classification difficulties.
Wrappers seek to find the best subset from the feature space according to one predetermined performance assessment. However, it is unrealistic to evaluate all possible subsets of features with wrappers in high-dimensional classification problems because of the computational cost. Recently, wrappers based on population-based algorithms have been widely developed without the necessity of evaluating all possible subsets. Tran et al. [21] proposed the first variable-length particle swarm optimization representation for FS, enabling particles to have different and shorter lengths, which improves the performance of the algorithm. Considering the convergence of the population, Song et al. [22] proposed a variable-size cooperative coevolutionary strategy to optimize the searching population, which employs the idea of "divide and conquer" in the cooperative coevolutionary approach. However, since wrapper-based FS methods do not perform feature filtering in advance, their search space is the whole data space [23]. This means that in high-dimensional classification tasks, their search space is very large, so they have to use a heuristic strategy like random search to reduce the cost of computation [24]. Nevertheless, classic heuristic wrapper-based FS methods, such as particle swarm optimization-based FS [25], differential evolution-based FS [26], and genetic algorithm-based FS [27], do not account for all potential feature combinations [28].
In recent years, bacterial-based algorithms such as bacterial foraging optimization (BFO) [29] have been used to design FS methods to resolve combinatorial difficulties due to their global searching capability for control and optimization [30]. However, the intricate structure of BFO limits its computational efficiency. To achieve efficient classification results, bacterial colony optimization (BCO) [31] with a new bacterial life cycle was proposed and laid the foundation for subsequent research on bacterial-based FS algorithms and applications [6, 28, 30, 32, 33]. The majority of those studies offer algorithmic enhancements in terms of weight setting, parameter optimization, and learning strategy optimization. Nonetheless, in actual applications, the integrity of the data itself is a significant element influencing the efficiency of FS, particularly the problem of incomplete sample labels, which is the most common and complicated task. This study focuses on developing an enhanced bacterial-based FS approach with a semisupervised learning strategy to address high-dimensional medical data classification with partial labels.
According to the integrity of data labels, learning tasks can be categorized into supervised learning, semisupervised learning, and unsupervised learning [33]. In the supervised task, training data have complete label information, whereas in semisupervised learning, label information is only partially available. Unsupervised learning means that the analyzed data do not contain labels [34]. In the absence of prior knowledge, the accuracy of supervised learning is generally higher than that of semisupervised learning. Nevertheless, the cost of obtaining completely labeled data is extremely high in practical medical data collection. Moreover, unsupervised learning is usually used to disclose the initial pattern of unlabeled data [35]. Therefore, to address the high-dimensional classification difficulty and label-missing limitations in medical data, this paper investigates semisupervised medical data classification problems and optimizes the classification model by learning from partially labeled data to classify unlabeled data into the correct class.
Semisupervised learning has been widely studied in different fields. In the human activity recognition (HAR) problem, Chen et al. [36] designed a semisupervised deep learning model that is useful in solving the problem of the imbalanced distribution of labeled data over classes from multimodal wearable sensory data. In video semantic recognition problems, Luo et al. [37] proposed a novel semisupervised feature selection method to learn the optimal graph, which aims to upgrade the performance of video semantic recognition. Since the studies mentioned above are based on multimodal data, it makes more sense there to employ deep learning or graph machine learning to overcome the problem of missing data labels. Although these methods are effective for multimodal data, they incur substantial computational costs. Frequently, for a single mode of data, it is not necessary to use overly complex techniques. In contrast, feature selection methods based on wrappers need less computation, and hence, they are more suitable for single-type data. In terms of wrapper-based FS methods, certain representative semisupervised classification algorithms, such as ensemble SVM-based semisupervised FS [38] and rough set-based semisupervised FS [39], have demonstrated the ability of ensemble learning to solve label-missing problems. Nevertheless, these methods rely on ensemble classifiers to choose the best subset by voting on the results, which increases the computational cost. As a straightforward and easy-to-use technique, K-nearest neighbor (KNN)-based semisupervised learning [40] offers great promise for improving classification with missing labels. A number of studies utilize semisupervised KNN. Zhang et al. [38] demonstrated that the introduction of semisupervised learning with KNN can enhance the available training sample size, provided that K is held constant. However, Mehta et al. [41] discovered that the magnitude of K impacts the efficiency of the algorithm; the precision of their results was enhanced by an exhaustive procedure to determine a suitable value of K. Nevertheless, in partially labeled data, different learning densities of KNN may lead to biased results. When the value of K is small, model learning may not be comprehensive; when the value of K is large, the operation cost may increase. In other words, the selection of the K value is a key issue to be explored. Therefore, this study attempts to develop a new semisupervised KNN learning approach that allows for the selection of K and can be combined with bacterial-based FS to form an effective classification method.
In this study, we propose a semisupervised bacterial heuristic feature selection (SBHFS) algorithm for the medical data classification mentioned earlier, i.e., with incomplete labels and high-dimensional redundant features. The main contributions of this research are as follows:
(i) A new self-adjusted semisupervised feature selection approach is proposed to solve classification problems with missing labels and high-dimensional redundant features using a two-step self-training mechanism and an improved bacterial heuristic method.
(ii) The strategies of hierarchical population initialization, dynamic learning, and elite population evolution are proposed to enhance the capacity of the bacterial heuristic algorithm in searching for various feature combinations.
(iii) The proposed semisupervised bacterial heuristic feature selection algorithm is shown to be superior in addressing label-incomplete and high-dimensional classification tasks in comparison to several state-of-the-art semisupervised FS algorithms.
The rest of this paper is organized as follows: Section 2 gives the background of bacterial-based feature selection methods and some related work on these topics. The proposed method is introduced in Section 3. In Section 4, the experimental configuration is given. The experiments and analyses of the results are provided in Section 5. The final section presents the conclusions and a description of future work.

Related Work
The life cycle of the searching algorithm of the proposed bacterial-based FS approach in this study is inspired by BCO. Thus, this section briefly introduces its main principle and reviews bacterial-based feature selection methods. More details are as follows.

Bacterial Colony Optimization.
The life cycle of BFO is a triple-nested loop structure, which brings enormous computational complexity to solving high-dimensional problems. BCO simplifies the life cycle according to specific rules to address this computational drawback. Similar to BFO, BCO contains reproduction and elimination-dispersal processes. However, the chemotaxis steps in BCO are simplified into running and tumbling processes. Conditional controlling rules are used in place of the traditional triple-nested loop structure to improve algorithm efficiency. The pseudocode of BCO is shown in Pseudocode 1.

(1) Input: original data
(2) Initialization: P (population), MaxIt (max iterations), and C (chemotaxis step size)
(3) While maximum iterations are not satisfied do
(4)   For each bacterium do
(5)     Chemotaxis process (refer to Equation (1))
(6)     Fitness evaluation
(7)     If previous fitness < current fitness
(8)       Tumbling process (refer to Equation (2))
(9)     End // alternative mechanisms
(10)    If the reproduction condition is satisfied do
(11)      Reproduction process (refer to article [29])
(12)    End
(13)    If elimination-dispersal conditions are satisfied do
(14)      Elimination-dispersal according to Equations (3) and (4)
(15)    End
(16)  End
(17) End // life cycle
(18) Output: optimal position with the best fitness
PSEUDOCODE 1: Bacterial colony optimization (BCO).

Running Process.
The running process is designed to speed convergence toward the optimal position as

θ_i(t + 1) = θ_i(t) + C(i)[r_i(g_best − θ_i(t)) + (1 − r_i)(p_best_i − θ_i(t))],  (1)

where r_i is a learning coefficient randomly generated between [0, 1], g_best is the best position in the current bacterial colony, and p_best_i represents the individual optimal position during the chemotaxis process. In addition, communication schemes such as dynamic neighbor and group-oriented learning can be embedded into the running process.

Tumbling Process.
The tumbling process avoids being trapped in a local optimum and explores more potential solution spaces. As shown in Equation (2), a random direction vector Δ_i is generated between [−1, 1] for the ith bacterium:

θ_i(t + 1) = θ_i(t) + C(i)Δ_i.  (2)

If the randomly generated direction proves correct, i.e., the current fitness is improved, the bacterium continues to exploit in the same direction.
The reproduction and elimination mechanisms in BCO are consistent with those in BFO [29]. In the reproduction operation, the half of the population with better performance replaces the remaining half with poor performance, while elimination in BCO is realized by assigning a bacterium a new, random position within the search space. It can be formulated as follows.
If P < Ped, then

θ_i = lp + rand(0, 1) · (up − lp);  (3)

otherwise,

θ_i = θ_i,  (4)

where Ped is a constant determining the probability of the ith bacterium being assigned a new position, and up and lp are the upper and lower boundaries of the search space, respectively. In this study, all BFO- or BCO-based FS methods are referred to as bacterial-based FS, and the pertinent research is reviewed in the following section.
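To make these operators concrete, the following minimal Python sketch (not part of the original study) implements the running, tumbling, and elimination-dispersal updates as described above; the sphere objective, colony size, and parameter values are placeholder assumptions.

import numpy as np

rng = np.random.default_rng(0)
S, D = 10, 5                       # colony size and problem dimension (assumed)
lp, up = -5.0, 5.0                 # lower/upper boundaries of the search space
C, Ped = 0.1, 0.25                 # chemotaxis step size and dispersal probability (assumed)

def fitness(theta):                # placeholder objective: sphere function
    return np.sum(theta ** 2)

theta = rng.uniform(lp, up, (S, D))            # positions
pbest = theta.copy()                            # individual best positions
gbest = theta[np.argmin([fitness(t) for t in theta])].copy()

for i in range(S):
    # Running process (Equation (1)): move toward gbest and pbest
    r = rng.random()
    new = theta[i] + C * (r * (gbest - theta[i]) + (1 - r) * (pbest[i] - theta[i]))
    if fitness(new) >= fitness(theta[i]):
        # Tumbling process (Equation (2)): try a random direction in [-1, 1]
        delta = rng.uniform(-1, 1, D)
        new = theta[i] + C * delta
    theta[i] = np.clip(new, lp, up)
    if fitness(theta[i]) < fitness(pbest[i]):
        pbest[i] = theta[i].copy()
    # Elimination-dispersal (Equations (3) and (4)): random restart with probability Ped
    if rng.random() < Ped:
        theta[i] = rng.uniform(lp, up, D)

gbest = theta[np.argmin([fitness(t) for t in theta])].copy()
print("best fitness:", fitness(gbest))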

Bacterial Heuristic Feature Selection Methods.
In recent years, research on bacterial-based FS algorithm improvements and applications has gradually increased. The process of bacterial-based FS consists primarily of the following steps [42], illustrated by the sketch after this paragraph: (i) input the original data, (ii) randomly search for different features to form a small subset, (iii) use the subset to train the classifier, with the classification results guiding the bacterial search, (iv) output the optimal fitness of the current iteration, and (v) loop steps (ii)-(iv) until the maximum number of iterations is reached and output the final optimal fitness. In recent years, bacterial heuristic FS has had many applications, including health care, recommendation, recognition, and model training [6, 30, 43, 44]. To improve the classification effect, bacterial-based FS has been improved in many ways. One improvement is weight setting. Wang et al. [6] developed a weighted strategy to control the probability of different features being selected to enhance accuracy. The other is population optimization, which can be further subdivided into position updates and population updates. For position updates, Wang and Chen [43] incorporated chaotic mechanisms into the chemotaxis and position-updating stages of bacterial populations to increase their adaptability. For population updates, some studies divided bacteria into multiple groups to perform different jobs under the control of different modified population updating strategies to improve searching efficiency [32]. Furthermore, learning strategy optimization is also a common and useful method. For example, Kaur and Kadam [45] investigated multiobjective BFO to improve bacterial learning ability and the convergence speed of the algorithm. Wang et al. designed an adaptive attribute learning strategy to enhance the information communication ability among bacteria [30].
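To make steps (i)-(v) concrete, the sketch below shows one way such a wrapper loop can be assembled in Python with a KNN classifier; the random-subset search merely stands in for the bacterial search, and the synthetic dataset, subset size, and iteration budget are illustrative assumptions.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# (i) input the original data (synthetic stand-in)
X, y = make_classification(n_samples=100, n_features=200, n_informative=10, random_state=0)

def evaluate(subset):
    # (iii) train the classifier on the candidate subset; CV accuracy guides the search
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X[:, subset], y, cv=3).mean()

best_subset, best_acc = None, -1.0
for it in range(100):                        # (v) loop until the iteration budget is reached
    subset = rng.choice(X.shape[1], size=10, replace=False)   # (ii) random feature subset
    acc = evaluate(subset)                   # (iv) fitness of the current iteration
    if acc > best_acc:
        best_subset, best_acc = subset, acc

print("best accuracy:", best_acc, "selected features:", sorted(best_subset))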
In summary, bacterial-based FS research focuses mostly on algorithm enhancement and application to various situations. However, the combined effect of missing labels and high-dimensional redundant features poses significant challenges to optimizers (including bacterial heuristic algorithms) in FS, as the search space of FS problems expands exponentially and the proportion of incomplete data increases synchronously. Therefore, improving the effectiveness and efficiency of bacterial heuristic algorithms while considering semisupervised learning methods and data dimension reduction simultaneously is worth studying.
Therefore, in this study, we focus on the development of a semisupervised feature selection approach based on bacterial optimization to solve classification problems with missing labels and high-dimensional redundant features.

Proposed Approach
This section presents the proposed SBHFS approach for solving classification problems with missing labels and high-dimensional redundant features. Figure 1 shows the structure of SBHFS. From the figure, we can see that the SBHFS approach consists of two main parts. On the one hand, a self-adjusted, semisupervised KNN strategy is presented for solving the problem of missing labels. On the other hand, an improved bacterial heuristic method for FS is presented for addressing the feature redundancy problem, including three improvements: hierarchical population initialization, dynamic learning, and an elite population evolution strategy. Hierarchical population initialization is used to obtain informative searching positions for bacteria to accelerate population convergence. Dynamic learning increases the searching variety of the algorithm by adaptively changing the search step length of bacteria. Finally, an elite population evolution strategy is employed to enhance the ability of bacteria to escape from local optima.
3.1. Self-Adjusted, Semisupervised KNN. The proposed self-adjusted, semisupervised KNN is a two-step self-training method, consisting of K value determination and label reconstruction. Furthermore, to make the semisupervised learning method more adaptive to datasets of different sizes, the K value is adjusted according to Equation (5), where NS is the number of data samples: the K value increases linearly for datasets with smaller samples, while a logarithmic function is employed for datasets with larger sample sizes. The primary process of the self-adjusted, semisupervised KNN is illustrated as follows:

Step 1. K Value Determination
The samples with labeled and unlabeled classes are separately saved in the dataset L and the dataset U. As mentioned previously, the self-adjusted, semisupervised KNN is presented for labeling the data with no assigned category and finding the best K value for classification. This step determines a K value for the label reconstruction using the labeled samples in L, as provided in Pseudocode 2.

(1) Input: L (L is used to save the labeled samples in the original dataset)
(2) For each K obtained by Equation (5)
(3)   For each running time
(4)     Randomly divide L into two subsets /* half with saved labels and half with removed labels */
(5)     Use the labeled subset to predict the labels of the unlabeled subset by KNN
(6)     Record the accuracy of label prediction on each running time
(7)   End
(8)   Record the K value and the average accuracy of label prediction of each K
(9) End
(10) Output: Kbest (the K value with the maximum average accuracy)
PSEUDOCODE 2: Determination of the K value.
Step 2. Label Reconstruction
Label reconstruction is provided in Pseudocode 3. First, the self-adjusted, semisupervised KNN is used to predict the labels for the samples from the dataset U. Then, newly labeled samples are moved from the dataset U to the dataset L. As L grows in size, this self-training step increases the learning efficiency of the training model.

(1) Input: labeled dataset L, unlabeled dataset U, and Kbest
(2) For each unlabeled sample in the dataset U
(3)   Use the samples from the dataset L to predict the label by KNN (using Kbest)
(4)   Assign the predicted label to the unlabeled sample and move it from U to L
(5) End
(6) Output: the updated dataset L
PSEUDOCODE 3: Label reconstruction.
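A minimal Python rendering of the two steps is given below for illustration; because Equation (5) is not reproduced here, the candidate K grid is only an assumed stand-in for it, scikit-learn's KNeighborsClassifier plays the role of KNN, and the random data are placeholders.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def determine_k(X_l, y_l, k_candidates, runs=10, seed=0):
    # Step 1 (Pseudocode 2): score each K by hiding half of the labels in L
    best_k, best_acc = None, -1.0
    for k in k_candidates:
        accs = []
        for r in range(runs):
            X_tr, X_te, y_tr, y_te = train_test_split(
                X_l, y_l, test_size=0.5, random_state=seed + r, stratify=y_l)
            accs.append(KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr).score(X_te, y_te))
        if np.mean(accs) > best_acc:
            best_k, best_acc = k, np.mean(accs)
    return best_k

def reconstruct_labels(X_l, y_l, X_u, k_best):
    # Step 2 (Pseudocode 3): label U with the best K and merge it into L
    clf = KNeighborsClassifier(n_neighbors=k_best).fit(X_l, y_l)
    y_u = clf.predict(X_u)
    return np.vstack([X_l, X_u]), np.concatenate([y_l, y_u])

# usage with random placeholder data; k_candidates stands in for Equation (5)
rng = np.random.default_rng(0)
X_l, y_l = rng.normal(size=(60, 8)), rng.integers(0, 2, 60)
X_u = rng.normal(size=(40, 8))
k_best = determine_k(X_l, y_l, k_candidates=[1, 3, 5, 7, 9])
X_new, y_new = reconstruct_labels(X_l, y_l, X_u, k_best)
print("Kbest =", k_best, "| labeled set grew from", len(y_l), "to", len(y_new))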

Hierarchical Population Initialization.
In BCO, the population is initialized randomly in a feasible space. However, addressing feature selection with high-dimensional features might make the bacterial colony fall into a poor searching position due to the high uncertainty of population initialization. As a result, more effort will be needed to jump out from their original position, which brings redundant computational complexity. To solve this problem, we develop a hierarchical population initialization strategy to enable bacteria to start at relatively good positions and further accelerate the convergence speed of the population.
In contrast to the aforementioned variable-size cooperative coevolutionary technique, hierarchical population initialization does not use multiple populations for searching. Instead, it uses the proposed feature hierarchical division strategy to reconstruct a smaller search space before each search. The hierarchical population initialization consists of three steps. The details are as follows:

Step 1. Feature Ranking and Filtering
Initially, a symmetrical uncertainty (SU) [21] ranking is performed on the original features according to Equations (6)-(9). In this step, the correlations between features and classes are ranked, and the features' relevant significance is ordered from highest to lowest. After ranking, the worst 10 percent of features with significance below the mean are eliminated.
(i) Symmetrical uncertainty (SU): the SU index has been widely used in traditional FS methods based on information theory. SU measures the uncertainty between feature variables f ∈ F and label signals l ∈ L, given in Equations (6)-(9) based on the Shannon information entropy. In those formulas, p(f) is the prior probability for all values of f and p(f|l) is the posterior probability of f given l:

H(f) = −Σ_f p(f) log2 p(f),  (6)
H(l) = −Σ_l p(l) log2 p(l),  (7)
H(f|l) = −Σ_l p(l) Σ_f p(f|l) log2 p(f|l),  (8)
SU(f, l) = 2[H(f) − H(f|l)] / [H(f) + H(l)],  (9)

where H(f) and H(l) are the entropy of the feature variable f and the label signal l, respectively, and N is the number of observation samples x ∈ X. SU(f, l) evaluates the correlation between the feature f and the label signal l. A larger SU(f, l) indicates a higher significance of the feature f to the label l. This means that the feature f has a more robust ability to discriminate labels and needs to be selected into the feature subset.
Step 2. Feature Hierarchical Division
As shown in Figure 2, the numbers in the box are assumed to be the SU significance values. According to their significance, the sorted features are divided evenly into three layers, L1, L2, and L3. After that, 80% of the feature dimension is randomly selected from the L1 set, 15% from the L2 set, and 5% from the L3 set to form a searching position for bacteria, as sketched below. This strategy can exclude subpar features and shrink the search space when dealing with high-dimensional features.

FIGURE 2: Divide the features into three parts evenly. D is sample; F is feature; L is the layer.
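The following Python sketch illustrates Steps 1 and 2: SU is computed from discretized features via Shannon entropy (Equations (6)-(9)), the worst 10 percent of features are dropped, and an initial position is sampled with the 80/15/5 layer ratio. The quantile discretization, toy data, and subset size are illustrative assumptions rather than the paper's exact settings.

import numpy as np

def entropy(v):
    _, counts = np.unique(v, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def symmetrical_uncertainty(f, l):
    # SU(f, l) = 2 * (H(f) - H(f|l)) / (H(f) + H(l)), Equations (6)-(9)
    h_f, h_l = entropy(f), entropy(l)
    h_f_given_l = sum((l == c).mean() * entropy(f[l == c]) for c in np.unique(l))
    return 2 * (h_f - h_f_given_l) / (h_f + h_l) if h_f + h_l > 0 else 0.0

rng = np.random.default_rng(0)
X = rng.random((80, 100))                        # toy data: 80 samples, 100 features
y = rng.integers(0, 2, 80)
Xd = np.digitize(X, np.quantile(X, [0.25, 0.5, 0.75]))   # assumed discretization

su = np.array([symmetrical_uncertainty(Xd[:, j], y) for j in range(X.shape[1])])
order = np.argsort(-su)                          # Step 1: rank by significance
keep = order[: int(len(order) * 0.9)]            # drop the worst 10 percent
L1, L2, L3 = np.array_split(keep, 3)             # Step 2: three even layers

def initial_position(d=10):
    # sample 80% of the subset from L1, 15% from L2, and 5% from L3
    n1, n2 = int(0.8 * d), int(0.15 * d)
    n3 = d - n1 - n2
    return np.concatenate([rng.choice(L1, n1, replace=False),
                           rng.choice(L2, n2, replace=False),
                           rng.choice(L3, n3, replace=False)])

print("initial feature position:", sorted(initial_position()))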
Step 3. Feature Weight Updating
We assume that the feature size is H, and each feature of the ith bacterium is denoted as f_i. We define the current fitness as ft(f_i) and the historical fitness as Fit(f_i). In this paper, we adopt a weight mechanism [6] to evaluate the performance of features. The rules are as follows: if ft(f_i) < Fit(f_i), then the performance weight pf_i is increased by Equation (12); otherwise, pf_i is decreased by Equation (13).
Given that, after completing the aforementioned procedure, there are still unselected features in each feature layer. To increase these features' probability of being selected in the future, we define the unselected weight (Uweight) of f_i as uf_i and Uweight = {uf_1, uf_2, ..., uf_H}. Then, the weight of each unselected feature is updated by Equation (10) after Step 2. In each feature selection process, if one feature has been selected repeatedly in each search, then its uf_i is decreased by Equation (11), where f_i is each feature of the ith bacterium, uf_i is the unselected weight, pf_i is the performance weight, L1, L2, and L3 are the layers obtained in Step 2, ft(f_i) is the current fitness of f_i, and Fit(f_i) is the historical fitness.
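Since Equations (10)-(13) are not reproduced here, the sketch below only illustrates the mechanism described in the text with assumed additive updates: improvement raises pf_i, deterioration lowers it, repeated selection lowers uf_i, and unselected features gain uf_i.

import numpy as np

H = 100                                    # total number of features (assumed)
pweight = np.ones(H)                       # performance weight pf_i
uweight = np.ones(H)                       # unselected weight uf_i
hist_fit = np.full(H, np.inf)              # historical fitness Fit(f_i)

def update_weights(selected, fit, delta=0.1):
    # Assumed stand-ins for Equations (10)-(13); the exact update rules are
    # defined in the paper and only the described directions are kept here.
    for i in selected:
        if fit < hist_fit[i]:              # ft(f_i) < Fit(f_i): reward the feature
            pweight[i] += delta
        else:
            pweight[i] = max(pweight[i] - delta, 0.0)
        uweight[i] = max(uweight[i] - delta, 0.0)   # repeatedly selected: lower uf_i
        hist_fit[i] = min(hist_fit[i], fit)
    unselected = np.setdiff1d(np.arange(H), selected)
    uweight[unselected] += delta           # raise the chance of being picked later

update_weights(selected=np.array([3, 17, 42]), fit=0.25)
print("pf_3 =", pweight[3], "uf_3 =", uweight[3])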

Dynamic Learning.
In BCO, the chemotaxis step length of bacteria is governed by a set of fixed values denoted by C(i). However, the lack of variation in step lengths may trap bacteria within the same search space. In the long term, the diversity of feature subsets will decline. Therefore, in SBHFS, the running process is the same as that in BCO, while the tumbling process is improved by employing a dynamic learning strategy to increase the search variety.
Specifically, a dynamic learning strategy is proposed by adopting an adaptive chemotaxis step length changing strategy, denoted as aC [46], and a step length communication strategy dC. Equations (14) and (15) show that aC is affected by the bacterial size S, where S = {1, 2, ..., i}, i ∈ N+. We define the current fitness as ft, the upper bound of the step length as C_ub, and the lower bound of the step length as C_lb. z is the disturbance factor. As the iteration proceeds, the disturbance effect of z on aC becomes small. In addition, the larger the ft, the larger the value of aC. The step length can thus be changed dynamically by aC.

There is no information communication among the bacteria in BCO. To enhance the convergence speed and improve the search capability, this paper presents a step length communication strategy. Let dC be the step length after communication, and its size is S × D, where D = {1, 2, ..., d}, d ∈ N+ is the dimension of bacteria. Equation (16) shows the communication process of the ith bacterium in the tth iteration:

dC_i^t = c_p R_p (pbest_i − θ_i^t) + c_g R_g (gbest − θ_i^t),  (16)

where θ_i^t is the current position of the bacterium, c_p and c_g are constant learning factors, and R_p and R_g are random disturbance terms confined to [0, 1]. The step length in SBHFS learns from the best record of the individual bacterium (pbest) and the best record of the bacterial population (gbest). In this way, bacteria prefer to learn from the record with the larger position excursion. After updating the step length dC, bacterial population tumbling is conducted as follows:

θ_i^(t+1) = θ_i^t + dC_i^t Δ_i,  (17)

where Δ_i is a random direction vector generated between [−1, 1] for the ith bacterium. Due to varying data sizes, the range of bacterial location change is greater in large samples than in small samples. Therefore, an offset coefficient O_q is proposed in this paper to adjust the offset of the bacterial position: in the tumbling process, q = 1; in the swimming process, q = 2. The settings of O_1 and O_2 are explained in Section 4.3. We define the number of features of the whole dataset as H; the selected feature subset size is determined accordingly. After tumbling, the feature subset is formed by Equation (18), where [·] represents the rounding operator. The performance of the feature subset is measured by a classifier. We adopt the confusion matrix [47] as an evaluation metric, and the fitness is the error rate, which is updated by Equation (19):

ft = (FP + FN) / (TP + TN + FP + FN),  (19)

where FP is the false-positive result, FN is the false-negative result, TP is the true-positive result, and TN is the true-negative result. ft is the current fitness, and Fit is defined as the historical fitness. The current best fitness is fpbest, and the historical best fitness is fgbest. The main process of dynamic learning is shown in Pseudocode 4.
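For illustration, the sketch below renders the communication and tumbling steps (Equations (16) and (17)) in Python; the adaptive step length aC of Equations (14) and (15) is omitted because its exact functional form is not reproduced here, and the colony size and learning factors are assumed values.

import numpy as np

rng = np.random.default_rng(0)
S, D = 10, 20                                # colony size and bacterial dimension (assumed)
c_p = c_g = 1.5                              # constant learning factors (assumed values)
theta = rng.random((S, D))                   # current positions
pbest = theta.copy()                         # individual best records
gbest = theta[0].copy()                      # population best record

def communicate(i):
    # Equation (16): learn the step length from pbest and gbest
    R_p, R_g = rng.random(D), rng.random(D)  # random disturbance terms in [0, 1]
    return c_p * R_p * (pbest[i] - theta[i]) + c_g * R_g * (gbest - theta[i])

def tumble(i):
    # Equation (17): tumble along a random direction scaled by the step length dC
    dC = communicate(i)
    delta = rng.uniform(-1, 1, D)            # random direction vector
    theta[i] = theta[i] + dC * delta

tumble(0)
print("updated position of bacterium 0:", np.round(theta[0, :5], 3))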

Elite Population Evolution.
In most bacterial-based methods, the population randomly undergoes dispersal elimination. This means that the new searching position of bacteria could be good or bad, and a bad searching position may waste search time. To make population evolution more meaningful, this paper designs an elite population evolution mechanism using the Pweight and Uweight values introduced in Section 3.2 to guide bacteria in conducting reproduction and dispersal elimination.
In SBHFS, either reproduction or dispersal elimination is conducted per iteration. The elite population evolution mechanism is proposed to determine which operation is executed, as depicted in Figure 3. After dynamic learning, bacteria perform a swimming loop as in BFO until they meet the threshold Ns (see Table 1). In the swimming loop, each bacterium first undergoes a fitness evaluation to determine its performance. errTre is defined as the performance threshold. If the fitness exceeds errTre, it is counted in bT. When bT is larger than half of Ns, we can simply regard this bacterium as performing a bad search, and dispersal elimination is conducted based on the Uweight matrix. Otherwise, bacteria reproduce based on the Pweight matrix. Next, we calculate the fitness of the new bacteria and update the two weights (Pweight and Uweight). Finally, we repeat the preceding steps until the end of the loop. The main process of the elite population evolution mechanism is given in Pseudocode 5.

(1) Input: positions Ɵ, features F, Pweight, and Uweight
(2) For each swimming step do
(3)   Fitness evaluation
(4)   If fitness > errTre, then bT = bT + 1
(5) End
(6) If bT > control threshold // it means bad-effect searching
(7)   Do dispersal elimination
(8)   Rank F by Uweight and save as SF // SF is the sorted feature
(9)   Ɵ = randomly selecting D features from the top half of SF
(10) Else // D is the bacterial dimension
(11)   Do reproduction
(12)   For r = 1: Sr // Sr is the reproduction size of bacteria
(13)     Sort the position Ɵ of bacteria
(14)     Sort features F by Pweight and save as SF
(15)     Ɵ_{r+Sr} = SF(Sr)
(16)   End
(17) End
(18) Output: updated positions Ɵ
PSEUDOCODE 5: Elite population evolution.
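The decision logic of Pseudocode 5 can be sketched as follows; the thresholds, weights, and dimension below are assumed values, and the function illustrates the mechanism rather than the exact implementation.

import numpy as np

rng = np.random.default_rng(0)
Ns, errTre, D = 4, 0.3, 10                  # swimming times, error threshold, dimension (assumed)

def elite_evolution(fit_history, pweight, uweight, n_features):
    # count bad-effect searches accumulated in the swimming loop
    bT = sum(ft > errTre for ft in fit_history)
    if bT > Ns / 2:
        # dispersal elimination: resample D features from the seldom-selected half
        top_half = np.argsort(-uweight)[: n_features // 2]
        return rng.choice(top_half, size=D, replace=False)
    # reproduction: move toward the best-performing features of the search history
    return np.argsort(-pweight)[:D]

pweight, uweight = rng.random(100), rng.random(100)
new_pos = elite_evolution([0.4, 0.2, 0.5, 0.35], pweight, uweight, 100)
print("new searching position:", sorted(new_pos))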
The following is a description of the reproduction and dispersal elimination processes. We assume that the bacterial dimension is D. In dispersal elimination, seldom-selected features are identified first by ranking features according to their Uweight. The new searching position for bacteria is then determined by randomly selecting D features from the top half of the seldom-selected features. In reproduction, features are initially ranked by their Pweight to identify the highest-performing features of the search history. Then, the half of the bacterial population with poor performance is gradually replaced by the dimensions of the highest-performing features. The overall pseudocode of SBHFS is given in Pseudocode 6.

(1) Input: original data
(2) Label reconstruction by the self-adjusted, semisupervised KNN (Pseudocodes 2 and 3)
(3) Hierarchical population initialization (Section 3.2)
(4) While maximum iterations are not satisfied do
(5)   For each bacterium do
(6)     Dynamic learning by Pseudocode 4
(7)     Obtain fitness and feature subset
(8)     If s < Ns // Ns is the number of swimming times
(9)       s = s + 1;
(10)    Elite population evolution by Pseudocode 5
(11)    Obtain fitness and the feature subset
(12)  End
(13) End
(14) Output: fitness and the feature subset
PSEUDOCODE 6: SBHFS.

Here, we analyze the computation time of the feature selection part (lines 3 to 13) of SBHFS. We suppose that there are S bacteria in the population, the max iteration time is I, the original number of features is D, and the swimming time is M with M ≪ I. First, we analyze the initialization part. The feature ranking by SU and the feature weighting are the main time-consuming steps. The time complexities of calculating SU scores and weights for features are both O(D), which are related to the number of features. Thus, the time complexity of the initialization part is O(D) + O(D) = O(D). We further analyze the time computation of the main loop of Pseudocode 6 from lines 4 to 13. At iteration I, if the dynamic learning step in line 6 is conducted, the time complexity of this step is O(S·N_I) according to Pseudocode 4, where N_I is the number of selected features at iteration I and N_I ≤ D. In the elite population evolution part in line 10 of Pseudocode 6, if the evolution choice is dispersal elimination, the time complexity is O(S·M). If the evolution choice is reproduction, the time complexity is O(S·M) + O(s) = O(S·M), where s is the population to be updated and s ≤ S (usually, s is half of the population). Therefore, in the worst case for one iteration, the complexity of SBHFS is O(D) + O(S·N_I) + O(S·M) = O(S·N_I). Since N_I denotes the number of selected features at iteration I and N_I ≤ D, the time complexity of SBHFS at iteration I is at most O(S·D). Thus, we conclude that the time complexity of the main loop of SBHFS during I iterations is not larger than O(I·S·D).

Experimental Configuration
In this section, detailed information on the datasets, benchmark methods, and experimental design is given.

Datasets.
In this paper, we verify the proposed method on different datasets, consisting of five high-dimensional microarray datasets and three benchmark datasets [6]. The description of the selected datasets is given in Table 2. #Features defines the number of original features, #Instance denotes the number of samples, and the number of classes is given in #Class. #Smallest class is the size of the class with the fewest instances, whereas #Largest class is the size of the class with the most instances. Among these datasets, Colon, SRBCT, DLBCL, Leukemia-AllAML (LA), and Central Nervous System (CNS) are the datasets with the highest numbers of features, up to 7129. All feature values in those five datasets are normalized within [0, 1]. Besides, the number of instances relative to the number of features in the last five datasets is considerably lower. Furthermore, all datasets are significantly imbalanced. These traits present FS and classification with formidable challenges. Since the proposed method is intended to handle missing-label data, the original data are transformed into partially labeled data, as described in Section 4.3.

Comparison Methods.
The proposed SBHFS method is measured and compared with six recently proposed, widely recognized bioinspired wrapper FS algorithms, denoted as benchmark methods. The parameters of the comparison algorithms are shown in Table 1.
The adaptive chemotaxis bacterial foraging optimization algorithm (ACBFO) [42], the improved swarming and elimination-dispersal bacterial foraging optimization algorithm (ISEDBFO) [42], and the multiobjective bacterial-inspired algorithm (MOBIFS) [48] are three recently proposed BFO variants for FS with good performance. ACBFO proposed an adaptive chemotaxis strategy, and ISEDBFO adopts a hyperbolic tangent function and a roulette technique to improve the search effects of BFO in FS. MOBIFS is an effective multiobjective BFO algorithm that handles FS issues using four information exchange mechanisms. The slime mold algorithm (SMA) [49], binary manta ray foraging optimization (BMRFO) [50], and the improved binary butterfly algorithm (IBFA) [51] are three other bioinspired algorithms with good performance in FS. SMA imitates slime mold's foraging behavior and introduces a composite mutation strategy and a restart strategy. BMRFO is a manta ray heuristic algorithm for FS problem solving that uses a rational transfer function. IBFA uses a new dynamic mutation operator to increase the diversity of the searching population. Besides the abovementioned six bioinspired benchmark algorithms, to better verify the effectiveness of SBHFS, we designed two more groups of comparison experiments: comparisons with standard BFO and BCO and comparisons with semisupervised methods:
(i) Comparison with Standard BFO and BCO. Based on the basic bacterial evolutionary framework, the SBHFS method has been developed with some efficient strategies. This comparison intends to evaluate the enhanced performance of the proposed strategies.
(ii) Comparison with Semisupervised Methods. Two KNN-based semisupervised methods, the semisupervised KNN (SSKNN) [38] and the best K semisupervised KNN (BKSKNN) [41], are selected to be executed on eight incompletely labeled datasets with SBHFS.

Design.
In this study, all experiments were performed on a PC with an Intel Core i7-7700 CPU at 3.6 GHz, 8 GB of RAM, and the Windows 10 operating system. Moreover, for all algorithms, the population size was set to 30, and the number of maximum iterations (max_iter) was set to 100. All experiments were run independently 30 times. Because KNN is easy to implement, this paper used KNN as the learning algorithm to assess the classification performance after FS, as in the literature [38, 41]. In each dataset, 70% of the samples from each class were randomly selected as the training set and the remaining 30% as the testing set. To simulate partially labeled data, this paper divided the training set into half labeled samples and half unlabeled samples (see Section 3.1); a sketch of this setup follows. According to previous experiments [6], only a small subset of the features provides the ideal solution. When the number of selected features for the last five datasets in Table 2 is less than 50, it is possible to attain high classification accuracy. The desired number of features (Fno.) therefore varies between 1 and 10 for the first three datasets (with a reduced feature subset size) and between 5 and 50 for the remaining datasets. The parameters of all benchmark methods are given in Table 1.
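The data preparation can be sketched as follows with synthetic data standing in for the eight datasets; the stratified 70/30 split and the hiding of half of the training labels follow the description above.

import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = rng.random((200, 50)), rng.integers(0, 2, 200)   # placeholder dataset

# 70% training / 30% testing, stratified per class
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

# simulate partially labeled data: hide half of the training labels
mask = rng.permutation(len(y_tr)) < len(y_tr) // 2
X_l, y_l = X_tr[mask], y_tr[mask]           # labeled half
X_u = X_tr[~mask]                           # unlabeled half (labels withheld)
print(len(y_l), "labeled /", len(X_u), "unlabeled training samples")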
For the evaluation metrics (Equations (20)-(25)), the classification error rate (denoted as Err.), true-positive rate (TPR), true-negative rate (TNR), precision (Pre), G-means (GM), and F1 score (F1) are used to assess feature selection results [52]. The effectiveness of feature selection approaches can be fully reflected by these evaluation metrics. The performance of the classification result for imbalanced data is assessed using the error rate and G-means. The TPR measures a method's capacity to isolate true-positive samples (minority samples) from all other samples, whereas the TNR measures a method's ability to isolate true-negative samples (majority samples) from all other samples. Precision gauges a method's capacity to distinguish genuine positive samples from all predicted positive samples (including true positives and false positives). A thorough evaluation of TPR and precision performance is provided by the F1 score.
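These metrics follow the standard confusion matrix definitions, which Equations (20)-(25) formalize; a compact sketch is given below, with G-means taken as the geometric mean of TPR and TNR.

import numpy as np

def metrics(tp, tn, fp, fn):
    err = (fp + fn) / (tp + tn + fp + fn)          # classification error rate
    tpr = tp / (tp + fn)                           # true-positive rate
    tnr = tn / (tn + fp)                           # true-negative rate
    pre = tp / (tp + fp)                           # precision
    gm = np.sqrt(tpr * tnr)                        # G-means
    f1 = 2 * pre * tpr / (pre + tpr)               # F1 score
    return dict(Err=err, TPR=tpr, TNR=tnr, Pre=pre, GM=gm, F1=f1)

print(metrics(tp=40, tn=45, fp=5, fn=10))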
Additionally, the Wilcoxon rank-sum test [53] was performed on each approach. A result is marked as "=" when the p value is greater than 0.05, meaning there is no significant difference at the 5% significance level. If the p value is less than 0.05 and the proposed method performs better, the proposed method is considered significantly better than the comparison algorithm and the result is marked as "+"; otherwise, it is marked as "−".
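The marking rule can be sketched with SciPy's rank-sum test as follows; the error-rate samples are toy numbers for illustration.

from scipy.stats import ranksums

# per-run error rates of two methods (toy numbers; the paper uses 30 runs)
sbhfs = [0.10, 0.12, 0.09, 0.11, 0.10, 0.13, 0.08, 0.12, 0.11, 0.10]
other = [0.15, 0.14, 0.16, 0.13, 0.15, 0.17, 0.14, 0.16, 0.15, 0.14]

stat, p = ranksums(sbhfs, other)
if p >= 0.05:
    mark = "="                           # no significant difference at the 5% level
else:
    mark = "+" if sum(sbhfs) / len(sbhfs) < sum(other) / len(other) else "-"
print("p =", round(p, 4), "mark:", mark)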

Experimental Results and Analyses
This section gives the comparison results and analyses of the three experimental groups. First, the improvement of the proposed bacterial heuristic optimization algorithm is proved by making comparisons with standard BFO and BCO for feature selection. Next, the enhanced semisupervised method is verified and discussed against two KNN-based semisupervised methods. Finally, the effectiveness of the overall proposed SBHFS method for tackling incomplete data classification is demonstrated. In Tables 3-6, the value in bold represents the best value for the current indicator. When the p value is "=", there is no significant difference between the algorithms; therefore, the evaluation index score corresponding to such a p value is not bold.

Comparisons with Standard BFO and BCO.
This comparison aims to verify the effectiveness of the proposed three strategies in BHFS, including hierarchical population initialization, dynamic learning, and elite population evolution. Table 3 shows the comparison results among the proposed bacterial heuristic optimization algorithm for FS (BHFS), BFO for FS (BFOFS), and BCO for FS (BCOFS). The rows of Ave. and Std. show the average and standard deviation of the classification metrics over 30 independent runs, respectively. The rows of p show the significance values obtained by the Wilcoxon rank-sum test.
From the specific data, the feature numbers of these algorithms are consistently unchanged. This is because the controlling strategies for BHFS, BFOFS, and BCOFS are the same (see Section 4.3). Consequently, there is no difference in the significance of Fno.
On the whole, excluding Fno., BHFS obtains significantly better results in 92 out of 96 cases versus BFOFS while achieving statistically similar performance in 4 cases. The proposed three strategies of BHFS are improvements over BFOFS and BCOFS, and they are also the key interlinked modules that compose BHFS. This result proves that BHFS is better than BFOFS and BCOFS, which reflects that our improvements are effective.

From the comparison between BFOFS and BCOFS, we can see that the classification results of BCOFS are better than those of BFOFS. This demonstrates that the improved life cycle model in BCOFS performs better than the triple-nested loop structure, and its optimization capability is further enhanced. Moreover, compared with BFOFS and BCOFS, BHFS achieves significantly better performance in 86 out of 96 cases while obtaining statistically similar results in 10 cases. In particular, it almost always achieves the best classification error rate across the eight datasets. Except for the German dataset, with a 26.9% classification error rate on average, BHFS has achieved an accuracy rate of more than 90% on the other datasets, and even a 100% accuracy rate on LA and DLBCL, two microarray datasets. Thus, it is evident that BHFS outperforms both BFOFS and BCOFS. The primary reason is that BHFS has further developed the life cycle with the three proposed strategies, which improve the algorithm's ability to locate the optimal space in the population initialization step, increase the probability of individual learning in the chemotaxis stage, and enhance the quality of population evolution in the reproduction and dispersal-elimination stages.

Table 4 illustrates the average calculation time for feature selection and classification in each run. Compared to the other bacterial-based methods, BHFS achieves a superior classification effect with less computational complexity. Throughout the iteration period, the computing time of the BFO algorithm increases exponentially due to its nested structure. However, the life cycle enhancement offered by BCO streamlines this procedure, hence reducing the computing cost dramatically. Inspired by BCO, BHFS modifies the population update based on BCO so that only one of the reproduction and dispersal-elimination operations is carried out per iteration, and the algorithm is additionally programmed with a rule to stop iterating instantly when the ideal solution occurs repeatedly.

Comparisons with Semisupervised Methods.
Since BHFS demonstrated its superiority and usefulness in Section 5.1, this section evaluates the effectiveness of the proposed self-adjusted, semisupervised KNN strategy for BHFS. In the following context, we refer to BHFS with the self-adjusted, semisupervised KNN strategy as SBHFS. The two compared semisupervised techniques based on KNN are as follows. One is the semisupervised KNN (SSKNN) [38], which assigns an unlabeled sample the label of the labeled sample closest to it. The best K semisupervised KNN (BKSKNN) [41] is the other comparative technique. BKSKNN also labels unlabeled samples by learning about their neighbors. In contrast to SSKNN, BKSKNN has two steps, the first of which is to compute the accuracy of the labeling result of KNN using various K values, with K set from 1 to 51. The process finds the best K value with the highest level of labeling accuracy and then uses the best-labeled data to perform the subsequent procedure. These two semisupervised learning approaches are embedded into BHFS for the comparison of effectiveness.
Table 5 shows the average and standard deviation of the classification metrics and the statistical test results of the different semisupervised learning approaches for the benchmark datasets. Since all three methods are based on BHFS, their feature subset size control methodologies are identical (see Section 4.3). Consequently, the significance of Fno. does not change among the three methods. SBHFS outperforms BHFS-SSKNN and BHFS-BKSKNN in the majority of classification evaluation metrics, demonstrating the efficacy of the self-adjusted, semisupervised KNN technique. The self-adjusted, semisupervised KNN adaptively updates the K value to find a better label for each sample across varying data sizes, whereas SSKNN simply applies a fixed K value that limits the algorithm's performance.
Compared with BHFS-SSKNN, SBHFS obtains a lower error rate in all data cases. In other classification metrics, BHFS-SSKNN shows better performance on the true-positive rate (TPR), indicating that the SSKNN method is more capable of correctly labeling positive samples, while SBHFS achieves significantly better or similar performance on the true-negative rate (TNR). This proves that using the TPR or TNR metric alone to judge the performance of algorithms is one-sided. Therefore, it is necessary to analyze the algorithms' effects more deeply through the remaining three comprehensive evaluation indicators (Pre, GM, and F1). The results demonstrate that the Pre scores of SBHFS are superior to those of BHFS-SSKNN on all datasets, while the GM scores of SBHFS are higher for five out of eight datasets, and the F1 scores are higher for half of the datasets. Thus, the self-adjusted, semisupervised KNN does improve the performance of SBHFS.
Figure 4 shows the bar chart of the comparison results with the semisupervised methods. The horizontal axis corresponds to the evaluation metrics. Fno. is excluded since the comparison algorithms use the same feature subset size control methods. The ordinate represents each algorithm's score, with larger values indicating superior performance. From Table 5 and Figure 4, we can see that SBHFS achieves statistically significantly better classification performance for all high-dimensional datasets on most of the metrics. For the benchmark datasets, the statistical significance is not as obvious. One reason is that in a low-dimensional space, the KNN-based semisupervised learning method is less affected by the value of K. Therefore, we conclude that the proposed self-adjusted, semisupervised KNN strategy is effective.

5.3. Comparisons with Benchmark Methods. We further compare the proposed SBHFS method with other bioinspired wrapper FS algorithms. Table 6 shows the different evaluation metrics (i.e., Err., Fno., TPR, TNR, Pre, GM, and F1) of SBHFS and the benchmark methods for the test sets. In general, SBHFS achieves competitive results compared to the benchmark methods. From the specific results, SBHFS performs best for five datasets (i.e., LA, CNS, Colon, SRBCT, and DLBCL) on most evaluation metrics. However, for the Australian dataset, the proposed method does not perform best. Comparing the results of the other algorithms reveals that the effect of SBHFS may be influenced by the sparsity of features. To be specific, according to the results for the Australian dataset, we can see that smaller subsets produce better results and that the number of effective features of the Australian dataset is about 1 or 2. Nevertheless, SBHFS sets the subset size to integers between 1 and 10 when the total dataset contains fewer than 50 features (see Section 4.3). This increases the average size of the subset in each iteration, which exceeds the effective feature size of the Australian dataset. Except for this, based on the results of the statistical significance tests for all datasets, SBHFS achieves considerably enhanced efficiency in 229 of 336 cases (39 cases with the "=" p value are excluded), which illustrates that SBHFS performs well for most of the datasets, especially the high-dimensional ones.
From the perspective of the fundamental algorithm, SBHFS achieves notable significance in 150 out of 224 cases (25 cases with the "=" p value are excluded) in comparison to the four other bacterial-based FS methods, i.e., ACBFO, ISEDBFO, SMA, and MOBIFS. The results demonstrate that the improvements in SBHFS are better than those in the other bacterial-based algorithms in this study. The proposed strategies optimize the searching ability of the algorithm, achieving smaller classification error rates and better results on the other evaluation indexes. Moreover, the superiority of SBHFS is more obvious for high-dimensional datasets (i.e., #Features > 1999). For example, compared with ISEDBFO, SBHFS achieves 11.2% lower Err. for Colon (#Features = 1999) and 9.8% lower Err. for SRBCT (#Features = 2308). This demonstrates that the capability of the suggested feature selection approach to reduce dimension redundancy is satisfactory.
Furthermore, compared with BMRFO and IBFA, SBHFS achieves superior performance in 79 out of 112 cases (14 cases with the "=" p value are excluded). Specifically, compared to IBFA, SBHFS achieves 9.9% higher F1 for LA (#Features = 7129) and 1.4% higher F1 for SRBCT (#Features = 2308). As with the bacterial-based algorithms, the effects of SBHFS against the other bioinspired algorithms are better. This shows that the proposed modified bacterial-based FS algorithm is better at solving dimension reduction problems and that SBHFS can be used not only for high-dimensional datasets but also for some low-dimensional datasets.
To verify the stability of SBHFS, the boxplot in Figure 5 shows the comparison results of SBHFS with the other bioinspired FS methods. According to the boxplot, except for the Australian dataset with a 35.7% average classification error rate, SBHFS achieves the best accuracy rate compared with the other algorithms on the other datasets. Moreover, the median results show that SBHFS generally achieves lower error rates, and the width of the boxes indicates that SBHFS is more stable than the other comparison methods. This is due to the dynamic learning method, which allows the bacterial population of each iteration to move closer to the optimal solution instead of searching randomly.

Conclusions
This paper presents a semisupervised bacterial heuristic feature selection algorithm (SBHFS) to address label-incomplete and high-dimensional classification problems. The self-adjusted, semisupervised KNN strategy can reconstruct labels effectively with the help of the two-step self-training mechanism, and the improved bacterial heuristic method can enhance searching precision by increasing feature selection variety and cooperating with the hierarchical population initialization, dynamic learning, and elite population evolution strategies. To be specific, hierarchical population initialization accelerates the convergence of the algorithm with the help of the SU feature ranking method and the proposed layer mechanism. Then, the dynamic learning strategy increases the diversity of the feature subset because it promotes communication among searching bacteria. Furthermore, the proposed elite population evolution strategy changes the population update method of the bacterial-based algorithm and improves its optimization performance. The comparisons with the semisupervised methods show that the proposed semisupervised learning method is effective for labeling incomplete data, especially for high-dimensional datasets.
Although the proposed SBHFS approach has shown promise in high-dimensional classification with missing labels, the proposed semisupervised approach is based on the enhancement of the KNN semisupervised technique, and semisupervised methods based on other learners are not considered. This may limit the efficiency of the bacterial heuristic algorithm in FS for classification issues involving plenty of sparse features. Considering this information in feature selection may help bacterial heuristic algorithms achieve better results, although this is challenging to accomplish. In our future endeavors, we will consider this direction.

Figure 1: The overall structure of the proposed SBHFS approach.

Figure 4: The bar chart of the comparison results with semisupervised methods.

Figure 5: The box-line plot of comparison results with benchmark methods.

Table 2: Datasets for feature selection.

Table 3: The results of the comparisons with standard BFO and BCO.

Table 4: The average computation time (minutes) of BHFS, BFOFS, and BCOFS for each run. The bold values indicate that BHFS takes the least amount of time in each independent run.

Table 5: The results of the comparisons with semisupervised methods. The bold values for each method mean that they achieve the best results under the evaluation index.

Table 6: The results of the comparisons with benchmark methods.