Enhanced Feature Selection Based on the Integration of Containment Neighborhoods Rough Set Approximations and Binary Honey Badger Optimization

This article introduces a novel model of rough set approximations (RSA), namely, rough set approximation models built on containment neighborhoods (CRSA), which generalizes the traditional notions of RSA and obtains valuable consequences by reducing the boundary regions. To justify this extension, it is integrated with the binary version of the honey badger optimization (HBO) algorithm as a feature selection (FS) approach. The main purpose of using this extension is to assess the quality of the selected features. To evaluate the performance of BHBO based on CRSA, a set of ten datasets is used. In addition, the results of BHBO are compared with other well-known FS approaches. The results show the superiority of CRSA over the traditional RS approximations. They also illustrate the high ability of BHBO to improve the classification accuracy over all the compared methods in terms of the performance metrics.


Introduction
In recent years, high dimensionality [1] has become a major problem [2] in different fields such as human activity recognition [3], silicon-on-insulator FinFETs [4], nonlinear servo systems [5], computer vision [6], processing of IoT data [7], and feature selection. Consequently, feature selection (FS) techniques, in particular metaheuristic search techniques [8][9][10], have been applied to many problems; examples include the salp swarm algorithm (SSA) [11], the grey-wolf optimization algorithm (GWO) [12], conflict monitoring optimization [13], the runner-root algorithm (RRA) [14], the boosting arithmetic optimization algorithm [15], the electric fish-based arithmetic optimization algorithm [16], the fractional calculus-based slime mould algorithm [17], and moth-flame optimization (MFO) [18]. In [19], the salp swarm algorithm was hybridized with the particle swarm optimization algorithm; the hybridized algorithm, denoted SSAPSO, is efficient in both exploitation and exploration. Khotimah et al. [20] hybridized the genetic algorithm (GA) with the naïve Bayes classification (NBC) to perform the exploration procedure through the classification of some incomplete-data experiments. Other enhancements of metaheuristic algorithms such as the grey wolf optimization algorithm (GWO) have also been proposed; for example, in [21], the authors enhanced the exploration and exploitation behaviour of GWO and used the new algorithm to select features for galaxy images and then classify them.
They utilized opposition-based learning (OBL) and the chaotic logistic map to generate candidate solutions and avoid the drawbacks of purely random solutions. In addition, utilizing the operators of DE with GWO as local operators improves the exploitation ability of GWO, and such hybrid solutions are updated accordingly. Moreover, the disruption operator (DO) is efficient for the exploration procedure and keeps the diversity of the population of solutions. Yet, in data analysis problems, uncertainty is a major issue that may degrade the quality of the obtained solution.
Recently, rough set theory (RS) [22,23] has been used as an efficient tool for solving such problems. Many recent applications utilize RS for reducing the dimension, such as feature selection [24], pattern recognition [24], and machine learning [25]. In this context, further work exists in the literature, as in [26], in which the authors developed a filter feature selection method based on rough set theory for identifying all class documents; the class is multiplied by a parameter within which the documents are proved to exist, and the new technique was then applied to text classification. In [27], Nabwey et al. introduced using the rough set with a hypergraph for determining the relevant subset of features and utilized the proposed technique for wart treatment prediction. In [28], Zhao et al. proposed using rough set theory, especially representative entropy; the proposed method, called classified nested equivalence class (CNEC), has the ability to compute the information entropy and significance for performing feature selection. The authors tested the proposed method on the KDD Cup competition and some datasets from the UCI repository. However, rough set theory has a drawback in dealing with feature selection in that it selects only one feature in each iteration, and it has difficulty in dealing with real-world applications, which require that RS discretizes features into many partitions. Moreover, RS has been used with metaheuristic techniques that have the ability to avoid these obstacles. Among the examples in the literature, in [29], the authors introduced using RS with the binary whale optimization algorithm and tested the performance on 32 datasets taken from the UCI machine learning repository. Ibrahim et al. [30] combined the runner-root algorithm (RRA) with RS and also with NRS and utilized the proposed technique for classifying galaxy images after selecting the relevant features. Acharjya [31] merged RS with the artificial bee colony (ABC) algorithm for the application of hepatitis disease diagnosis. Jothi [32] combined RS with the firefly-based quick reduct algorithm and used the proposed technique, RS firefly-based quick reduct (TRSFFQR), for dealing with MRI brain images. Tawhid et al. [33] combined two metaheuristic algorithms, binary particle swarm optimization and the flower pollination algorithm, with RS, derived the binary version of the combination, and used it for solving some binary problems. Patra and Barman [34] proposed using RS to develop a hyperspectral band selection method. Patra et al. [35] presented a multiobjective FS method depending on the cultural algorithm combined with RS and compared their method with other methods. Sahlol et al. [36] proposed combining RS with the whale optimization algorithm (WOA) and used the modified algorithm for the recognition of handwritten Arabic optical characters. Jothi et al. [37] combined RS with Jaya optimization and used the modified method as a feature selection method, applying it afterwards to acute lymphoblastic leukemia classification. In [38], Mafarja and Mirjalili combined the ant lion optimization (ALO) with RS for solving some classification problems. Reddy et al. [39] combined RS with fuzzy rules for extracting the features from a heart disease dataset and then selecting the relevant features for performing classification. Zou et al.
[40] introduced the neighborhood RS combined with the fish swarm algorithm for solving the feature selection of some datasets, where the proposed method also depended on combining the tolerance rough set (TRS) and the firefly algorithm (FA). Such techniques used for solving feature selection problems employ RS, especially elementary sets based on classes called equivalence classes; since equivalence can be applied only to complete data, it may not be suitable for many cases. The issue of incomplete knowledge has become a pivotal problem for several researchers, essentially in the domain of information systems. There are different methods to handle indistinctness and doubtful knowledge, one of them being rough set theory (RST). The explanation of RST relies on relating one subset with two sets, called the upper and lower approximations, that are employed to determine the boundary region and accuracy degree of that subset. RST was initiated by Pawlak [41] and has been generalized by many methods, as in [42][43][44][45][46][47][48][49][50][51][52][53][54]. What interests us among those methods are the ones whose ideas are motivated by topology, for instance, the methods that construct lower and upper approximations utilizing diverse kinds of neighborhoods, such as the zR, Rz neighborhoods [53,55], the 〈z〉R, R〈z〉 neighborhoods [47], and the zRz, R〈z〉R neighborhoods [50,54]. Neighborhoods are significant approaches to decrease the boundary region and enhance the accuracy measure. The wish to maximize the accuracy degree of any subset is a prime motivating factor for introducing the C-neighborhood system [44], which exemplifies a beneficial tool to increase the lower approximation and decrease the upper approximation compared with the R-neighborhoods given in [44,46]. In addition, C-neighborhoods preserve most characteristics of Pawlak's approximations compared with the other systems of neighborhoods.
Furthermore, metaheuristic techniques (MHTs) have their own drawbacks: most MHTs depend on an evolutionary process, like the evolutionary algorithms or extensions of them, and the quality of the solutions is affected by getting stuck in local optima. In addition, these MHTs cannot solve all problems with the same efficiency according to the no-free-lunch theorem [56]. Therefore, this motivated us to propose an alternative FS method that depends on a recent efficient MHT called the honey badger optimization (HBO) [57] algorithm, which simulates the behaviour of the honey badger when catching its prey. According to the mathematical model of these behaviours, HBO has been applied to solve global optimization and engineering problems [57]. In addition, we developed an extension of RST, named containment neighborhoods RSA (CRSA), as the fitness value. To the best of our knowledge, this is the first time HBO is combined with an extension of the rough set and used as an FS method. In addition, CRSA is a new extension of RST that has not been used before. The HBOCRSA method is used for solving the feature selection problem as follows: the dataset is split into two parts, training and testing. Then, the first set of candidate solutions is initialized, and the containment neighborhoods RSA (CRSA), as a new extension of knowledge, is used as a fitness function on the training part to assess the quality of each solution. Then, the best solution is determined, and the operators of the proposed method are used to update the current agents until the stopping condition is met; finally, the optimal solution is used to remove the irrelevant features, reduce the testing set of features, and evaluate the classification performance.
The main contribution of this study can be summarized as follows: (1) Propose an extension of rough set approximation (RSA) named containment neighborhoods RSA (CRSA). The new extension generalizes the traditional concepts of RSA and obtains valuable consequences by reducing the boundary areas.
The rest of this paper is organized as follows: in Section 2, the basic notation about the multiknowledge rough set is given. Section 3 presents the extension of the rough set approximations based on containment neighborhoods (CRSA). In Section 4, the steps of the proposed feature selection method are introduced. Experimental results and discussion are given in Section 5. The conclusion and future work are discussed in Section 6.

Preliminaries
Let Λ be a universe (nonempty finite set), and R be any relation on Λ, i.e., R ⊆ Λ × Λ. A form (w, z) ∈ R means that w is in relation R with z, which is abridged as wRz.
Definition 1. Let Λ be a universe. A binary relation R on Λ is called [41]:
(1) Equivalence if it is transitive (yRw whenever yRz and zRw), symmetric (wRz if zRw), and reflexive (wRw for every w ∈ Λ).
(2) Tolerance if it is both reflexive and symmetric.
(3) Dominance if it is both reflexive and transitive.
Neighborhood systems have been adopted to characterize relationships between objects in database methods for the purpose of approximate retrieval.
Definition 2. Let R be any binary relation on a universe Λ.
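To make Definition 1 concrete, the following minimal Python sketch checks the three properties for a finite relation stored as a set of ordered pairs; the universe and the relation used here are illustrative toy values, not data from this paper.

def is_reflexive(universe, R):
    # Reflexive: (w, w) is in R for every w in the universe.
    return all((w, w) in R for w in universe)

def is_symmetric(R):
    # Symmetric: (z, w) is in R whenever (w, z) is.
    return all((z, w) in R for (w, z) in R)

def is_transitive(R):
    # Transitive: (y, w) is in R whenever (y, z) and (z, w) are.
    return all((y, w) in R for (y, z1) in R for (z2, w) in R if z1 == z2)

universe = {1, 2, 3}
R = {(1, 1), (2, 2), (3, 3), (1, 2), (2, 1)}  # toy relation

print(is_reflexive(universe, R), is_symmetric(R), is_transitive(R))
# True True True -> R is an equivalence relation;
# reflexive + symmetric only would give a tolerance relation,
# reflexive + transitive only would give a dominance relation.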
Definition 3 [42]. Let each R_ℓ, ℓ ∈ {1, 2, . . . , n}, be a binary relation on a universe Λ. If M ⊆ Λ, then the n lower and n upper approximations of M are given by
Abu-Donia [43] employed the neighborhood 〈z〉R to characterize other distinct approximations of any set with respect to reflexive, tolerance, dominance, and equivalence relations. Furthermore, he deduced the KRA (knowledge based on the rough approximation) approach, which generalizes previous RS methods.
Definition 4 [43]. Suppose each R_ℓ, ℓ ∈ {1, 2, . . . , n}, is a binary relation on a universe Λ. Then, the n lower and n upper approximations of a subset M of Λ are given by
Then, the following results hold:
The concepts of zC_ℓ, 〈z〉C_ℓ (resp. C_ℓz, C_ℓ〈z〉 and zC_ℓz, C_ℓ〈z〉C_ℓ) are incomparable.
(3) In general, the converse of Proposition 4 and Corollary 3 need not be valid, as we see in Example 2.
In view of Propositions 3 and 4, we have the following.
Proposition 5. Suppose that {R_ℓ : ℓ = 1, 2, . . . , n} is a finite class of dominance relations on Λ and z ∈ Λ. Then, for all ℓ, the following results hold.
Suppose that {R_ℓ : ℓ = 1, 2, . . . , n} is a finite class of dominance relations on Λ and z ∈ Λ. Then,
Suppose that {R_ℓ : ℓ = 1, 2, . . . , n} is a finite collection of dominance relations on Λ and x, y ∈ Λ. Then, for each ℓ, y ∈ xR_ℓx iff yR_ℓy = xR_ℓx.

In view of (3) of Proposition 5, the following corollary holds.

Depending on the containment neighborhoods generated from a finite collection of binary relations, we propose in the next part some new types of lower and upper approximations.
Definition 6. Suppose that every R_ℓ, ℓ ∈ {1, 2, . . . , n}, is a relation on Λ and z ∈ Λ. Based on the containment neighborhoods zC_ℓz = zC_ℓ ∩ C_ℓz, ℓ ∈ {1, 2, . . . , n}, the pair ((n)E⊗L(M), (n)E⊗U(M)) stands for the lower and upper approximations of a set M, respectively, defined as follows:
(iii) The n-E⊗ boundary of M is (n)E⊗B(M) = (n)E⊗U(M) − (n)E⊗L(M).
(iv) The n-E⊗ accuracy measure of any set M is (n)E⊗A(M) = |(n)E⊗L(M)| / |(n)E⊗U(M)|, where the cardinality of any set is denoted by the symbol |·|.
A set M is called n-E⊗ exact if (n)E⊗L(M) = (n)E⊗U(M), and it is a rough set otherwise.
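Since the approximation formulas of Definition 6 are not fully shown here, the following is only a rough Python sketch under an explicit assumption: it takes the containment neighborhood of a point z to be the set of points whose R-neighborhoods are contained in the R-neighborhood of z, in the spirit of the C-neighborhood system of [44], and then builds Pawlak-style lower and upper approximations, the boundary, and the accuracy from it. The relation and the set used are toy values, not the authors' exact construction.

def right_neighborhood(R, z):
    # zR = {w : (z, w) in R}
    return {w for (a, w) in R if a == z}

def containment_neighborhood(universe, R, z):
    # Assumed C-neighborhood of z: points whose R-neighborhood is contained in that of z.
    zR = right_neighborhood(R, z)
    return {w for w in universe if right_neighborhood(R, w) <= zR}

def approximations(universe, R, M):
    # Pawlak-style lower/upper approximations built on the containment neighborhoods.
    lower = {z for z in universe if containment_neighborhood(universe, R, z) <= M}
    upper = {z for z in universe if containment_neighborhood(universe, R, z) & M}
    return lower, upper

U = {"b1", "b2", "b3", "b4"}                       # toy universe
R = {("b1", "b1"), ("b2", "b2"), ("b3", "b3"), ("b4", "b4"),
     ("b1", "b2"), ("b3", "b4")}                   # toy reflexive relation
M = {"b1", "b2"}

low, up = approximations(U, R, M)
boundary = up - low
accuracy = len(low) / len(up) if up else 1.0
print(low, up, boundary, accuracy)                 # boundary is empty and accuracy is 1.0 here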
Our model retains all the essential properties of the original rough set model; thus, the main characteristics of the n-E⊗ lower and n-E⊗ upper operators are stated in the next proposition. Proof.
The characteristics of the (n)E⊗L(·) and (n)E⊗U(·) operators are dual; therefore, we examine only one of them.
□ In general, the converse of (4) of Proposition 8 need not be valid, as we show in the next example.

Example 4.
If Λ = {β_1, β_2, β_3, β_4} and △ is the identity relation on Λ, then we discuss the following cases:
(1) Suppose R_1, R_2 are two binary relations on Λ defined as
(3) Suppose R_1, R_2 are two tolerance relations on Λ defined as
Next, we have the following remark using Corollary 5.

Remark 6. According to Example 5,
(1) The converse of (1) and (2) of Theorem 1 may not be true in general, i.e.,

Proposed Feature Selection Method
In this section, the proposed FS method based on HBO and CRSA is presented. First, the basic steps of HBO are introduced in Section 4.1.
Then, we discuss the developed method.

Honey Badger Optimization Algorithm.
In this section, the mathematical notation of the honey badger optimization algorithm is introduced. In general, HBO emulates the behaviour of the honey badger when catching its prey. This process is performed through two stages, named Digging and Honey. In the Digging stage, the prey is located based on the honey badger's sense of smell, whereas in the Honey stage, the honey badger follows the honey guide bird to locate the beehive.
The steps of HBO begin by setting the initial values of the agents using the following:

Z_i = LB + r_1 × (UB − LB),    (5)

where LB is the lower boundary and UB is the upper boundary of the search space, and r_1 ∈ [0, 1] refers to a random number. Following [57], the exploration (Digging) and exploitation (Honey) phases are balanced using the density factor (α), defined as

α = C × exp(−t / T),    (6)

where C > 1 stands for a constant value, T represents the total number of iterations, and t indicates the current iteration. The next step is to update the solutions using the operators of the Digging stage.
This is performed based on the cardioid movement formulated as

Z_new = Z_b + F × B × I × Z_b + F × r_3 × α × d_i × |cos(2πr_4) × [1 − cos(2πr_5)]|.    (7)

In equation (7), Z_new stands for the new value of Z_i, Z_b represents the best solution found so far, d_i = Z_b − Z_i is the distance between them, and r_3, r_4, r_5, and r_6 are random numbers. B is a constant number. F is a parameter used to control the search direction, and its value is determined as

F = 1 if r_6 ≤ 0.5, and F = −1 otherwise.

I stands for the smell intensity of the prey (Z_b) and it is used to represent the distance between Z_b and Z_i. It is formulated as

I = r_2 × S / (4π d_i²),  S = (Z_i − Z_{i+1})²,

where r_2 is a random number and Z_{i+1} is the next agent in the population. Meanwhile, the solutions can be updated using the operators of the Honey stage. This process is achieved using the following formula:

Z_new = Z_b + F × r_7 × α × d_i,    (11)

where r_7 is a random number. The steps of HBO are given in Algorithm 1.
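As an illustration only, the following Python sketch implements the Digging and Honey updates described above in the notation used here, following the equations reported for HBO in [57]; the constants β (written B above) and C, the random numbers, and the toy agents are assumed default or illustrative values, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)

def density_factor(t, T, C=2.0):
    # Equation (6): alpha decays over the iterations, moving from exploration to exploitation.
    return C * np.exp(-t / T)

def update_agent(Z_i, Z_next, Z_b, alpha, beta=6.0):
    # One HBO update for agent Z_i, choosing the Digging or Honey stage at random.
    r = rng.random(7)
    F = 1.0 if r[5] <= 0.5 else -1.0                  # search-direction flag
    d = Z_b - Z_i                                     # distance to the best solution
    S = (Z_i - Z_next) ** 2                           # source strength used by the intensity
    I = r[1] * S / (4.0 * np.pi * d ** 2 + 1e-12)     # smell intensity of the prey
    if rng.random() < 0.5:
        # Digging stage, equation (7): cardioid-like motion around the prey.
        return (Z_b + F * beta * I * Z_b
                + F * r[2] * alpha * d
                * np.abs(np.cos(2 * np.pi * r[3]) * (1 - np.cos(2 * np.pi * r[4]))))
    # Honey stage, equation (11): follow the honey guide bird toward the beehive.
    return Z_b + F * r[6] * alpha * d

LB, UB = -1.0, 1.0
Z_i = LB + rng.random(5) * (UB - LB)                  # equation (5): random initialization
Z_next = LB + rng.random(5) * (UB - LB)               # next agent, used by the intensity term
Z_b = np.zeros(5)                                     # pretend best-so-far solution
print(update_agent(Z_i, Z_next, Z_b, density_factor(t=1, T=100)))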

Proposed HBOCRSA Framework.
In this section, we present the steps of the developed feature selection method based on the modified HBO combined with the CRSA approximations. Figure 1 depicts the structure of the enhanced method. The main aim of the binary HBO (BHBO) algorithm based on CRSA (HBOCRSA) is to identify the subset of features that are most relevant with respect to the degree of dependency between the features and the target features. In addition, to deal with the discrete FS problem, the binary HBO is applied. The initialization of HBOCRSA starts by identifying the number of features (f_k, k = 1, 2, . . . , d) of the dataset, which corresponds to the dimension of each solution. Then, the dataset is split into training and testing sets, and the initial values of the N agents Z are generated within the interval [0, 1]. Furthermore, the value of the objective function of each Z_i, i = 1, 2, . . . , N, is computed, and the best among them is determined.
Then, the agents are updated utilizing the HBO operators. The updating process is repeated until the stopping condition is met. Each phase is discussed in detail as follows.

First Stage: Generating Solutions.
Through the enhanced FS method, the dataset is split into training and testing sets with percentages of 80% and 20%, respectively. The N agents Z are initialized by the following equation:

Z_{i,j} = LB_j + r × (UB_j − LB_j),  i = 1, 2, . . . , N,  j = 1, 2, . . . , D.    (12)

In equation (12), UB_j and LB_j denote the upper and lower boundaries of dimension j, respectively, and r ∈ [0, 1] is a random number.
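A minimal sketch of this stage is given below: it splits a placeholder dataset 80%/20% and initializes N agents inside the per-dimension bounds as in equation (12); the dataset, N, and the bounds are illustrative assumptions.

import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Placeholder dataset: X holds the features and y the class labels.
X = rng.random((150, 10))
y = rng.integers(0, 2, size=150)

# 80% training / 20% testing split, as used by the enhanced FS method.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Equation (12): Z_ij = LB_j + r * (UB_j - LB_j) for N agents over D dimensions (features).
N, D = 20, X.shape[1]
LB, UB = np.zeros(D), np.ones(D)                      # agents generated inside [0, 1]
Z = LB + rng.random((N, D)) * (UB - LB)
print(Z.shape)                                        # (20, 10)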

Second Stage: Updating Solutions.
The proposed technique starts by computing the objective function value for each agent Z_i; this is performed in two steps.
The main goal of the first step is transferring Z_i into binary form, as shown in

BZ_{ij} = 1 if Z_{ij} > 0.5, and BZ_{ij} = 0 otherwise.    (13)

Such a process transfers the real-valued agents to discrete ones, which makes them suitable for the feature selection problem. The second step of the developed method consists of choosing the features that correspond to the ones in BZ and, at the same time, deleting the irrelevant features that correspond to the zeros. Furthermore, the quality of the selected features is evaluated based on the fitness function (Fit_i) given in equation (14).
This is considered an optimization problem that aims to minimize the classification error and the number of selected features.
where |Z_i| denotes the number of features selected using the current value of Z_i, as in equation (14). In addition, d denotes the number of features in the dataset. R and η are coefficients that balance the number of selected features against the dependency degree γ_C(D), given by

γ_C(D) = |POS_C(D)| / |Λ|.    (15)

In equation (15), POS_C(D) represents the positive region, defined as

POS_C(D) = ⋃_{Z ∈ Λ/D} B(Z),

where B(Z) denotes the lower approximation given in equation (1). In addition, γ_C(D) computes the approximating power of the features.
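The following Python sketch illustrates one plausible form of this fitness evaluation. It thresholds the agent into a binary mask as in equation (13), computes a dependency degree from the positive region of the selected attributes using plain equivalence classes (a stand-in for the CRSA approximations, which is an assumption), and combines it with a feature-count term using coefficients η and R; the exact form of equation (14) is not reproduced here, so the weighted sum below is only an assumed variant.

import numpy as np
from collections import defaultdict

def equivalence_classes(X, cols):
    # Group object indices by their values on the selected attribute columns.
    blocks = defaultdict(list)
    for idx, row in enumerate(X):
        blocks[tuple(row[c] for c in cols)].append(idx)
    return list(blocks.values())

def dependency_degree(X, y, cols):
    # gamma_C(D) = |POS_C(D)| / |universe|; a block lies in the positive region
    # when all of its objects share the same decision label.
    if not cols:
        return 0.0
    pos = sum(len(b) for b in equivalence_classes(X, cols) if len({y[i] for i in b}) == 1)
    return pos / len(X)

def fitness(agent, X, y, eta=0.99, R=0.01):
    # Assumed weighted form: reward the dependency degree, penalize selecting many features.
    mask = agent > 0.5                                # equation (13): real-valued agent -> binary
    cols = list(np.flatnonzero(mask))
    return eta * dependency_degree(X, y, cols) + R * (1.0 - len(cols) / X.shape[1])

# Tiny hypothetical discrete dataset: 4 objects, 3 condition attributes, binary decision.
X = np.array([[0, 1, 0], [0, 1, 1], [1, 0, 0], [1, 0, 1]])
y = np.array([0, 0, 1, 1])
agent = np.array([0.9, 0.2, 0.7])                     # selects attributes 0 and 2
print(fitness(agent, X, y))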

Finally, the third step aims to identify the best agent Z_b; after that, the agents are updated by utilizing the HBO operators given in Section 4.1.

Third Stage: Stopping Conditions.
The steps of the second stage are repeated until the stopping condition of the proposed technique is met, and then the best solution is returned. Moreover, the testing set is used to assess the relevant features contained in the best solution Z_b: classification on the reduced feature set is performed by utilizing the KNN classifier to assess the quality of the selected features.
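Something along the following lines can implement this evaluation step: keep the columns flagged in the best binary solution, train a KNN classifier on the reduced training set, and score it on the reduced testing set. The dataset, the binary solution, and k = 5 are illustrative assumptions; the paper does not state the value of k.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in dataset; not one of the ten datasets used in the paper.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

BZ_b = np.array([1, 0, 1, 1])                # hypothetical best binary solution from BHBO
selected = np.flatnonzero(BZ_b)              # indices of the kept features

knn = KNeighborsClassifier(n_neighbors=5)    # k = 5 is an assumed value
knn.fit(X_train[:, selected], y_train)
print("accuracy on the reduced testing set:", knn.score(X_test[:, selected], y_test))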

Experimental Results and Discussion Using Real-World Datasets

The performance of the developed method is justified through the modified CRSA, which is considered an RS extension, by conducting a set of experiments. In these experiments, ten datasets are used, and the results are compared with other FS approaches. The proposed BHBO is used to determine the relevant features from the given data; to conduct this, the CRSA is used as a part of the fitness function defined in equation (14).

Datasets Description.
The quality of HBOCRSA is validated by utilizing ten datasets of varying dimensionality. The datasets, gathered from various fields, are taken online from the UCI repository [58]; those used in this work are listed in Table 1, and they have various numbers of instances, features, and classes. To validate and justify the efficiency of HBOCRSA, each dataset is split into 80% training and 20% testing sets. Each algorithm has been run for 30 independent times to guarantee the quality of the comparison.

Algorithm 1: Steps of HBO.
(1) Input: the number of solutions N and the total number of iterations T.
(2) Set the initial values for the set of solutions Z using equation (5).
Update the density factor α using equation (6).
(8) for i = 1 : N do
(9) if (rand < 0.5) then
(10) Update Z_i based on equations (7)-(10)
(11) else
(12) Update Z_i based on equation (11)
(13) end if
(14) end for

The comparisons are performed against the salp swarm algorithm (SSA), self-adaptive differential evolution (SaDE), grey-wolf optimization (GWO), genetic algorithm (GA), and teaching-learning-based optimization (TLBO) as competing feature selection algorithms. Noting that the parameters of each algorithm are set according to its original implementation, the number of iterations and the population size, which are the common parameters, are set to 15 and 20, respectively.
Thereafter, each solution in the population has a dimension equal to the number of features of the corresponding dataset.

Performance Measures.
Several metrics can be utilized to assess the proposed technique, HBOCRSA. Some of these measures are the accuracy and the number of selected features; they are defined as follows:
(i) The average accuracy (AVG_Acc) represents the average of the accuracies over the number of runs (N_r = 30) and is given as

AVG_Acc = (1/N_r) Σ_{k=1}^{N_r} Acc_k.

(ii) The average number of selected features (AVG_|BZ_Best|) calculates the average number of features chosen by each algorithm over all runs and is given as

AVG_|BZ_Best| = (1/N_r) Σ_{k=1}^{N_r} |BZ^k_Best|,

where |BZ^k_Best| denotes the cardinality of the best solution at the k-th run.
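These two averages are straightforward to compute; the short sketch below assumes the per-run accuracies and the best binary solutions have already been collected (the values shown are hypothetical, and the paper uses N_r = 30 runs).

import numpy as np

def average_metrics(accuracies, best_masks):
    # AVG_Acc: mean accuracy over the runs; AVG_|BZ_Best|: mean number of selected features.
    avg_acc = float(np.mean(accuracies))
    avg_feats = float(np.mean([int(np.sum(m)) for m in best_masks]))
    return avg_acc, avg_feats

# Hypothetical results of three runs.
accs = [0.91, 0.93, 0.90]
masks = [np.array([1, 0, 1, 1]), np.array([1, 1, 0, 1]), np.array([0, 1, 1, 1])]
print(average_metrics(accs, masks))          # (0.913..., 3.0)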

Experimental Series 1: Results and Discussion Using UCI Datasets.
The comparison results between the developed method and the other competitive methods are given in Tables 2-5 and Figures 2 and 3. From these results, it can be noticed that the HBO using CRSA (i.e., HBOCRSA) has the highest accuracy on four datasets (i.e., D2-D4 and D8). For the other datasets, its results are competitive with the other methods; for example, at D1, D7, and D9, HBOCRSA ranks second. Moreover, the average of each method over the tested datasets is given in Figure 2, and it can be noticed that the difference between HBOCRSA and LSHADE, TLBO, SSA, SGA, SaDE, and bGWO is 3.527%, 2.711%, 3.799%, 4.754%, 2.638%, and 5.258%, respectively. In case the traditional RS is used as part of the fitness value (as in Table 3), HBORS provides the best accuracy at D2, D4, D7, and D9. In addition, its average over the ten tested datasets is better than the other methods, with differences of 3.780%, 3.7918%, 3.009%, 1.6491%, 3.218%, and 8.0456% when compared with LSHADE, TLBO, SSA, SGA, SaDE, and bGWO, respectively (see Figure 2).
Considering the accuracy obtained by each FS approach using either RS or CRSA, it can be noticed that the accuracy of each method is increased when CRSA is used. This indicates that CRSA has a higher ability to determine the relevant features than the traditional RS.
To further analyze the behaviour of HBO using RS and CRSA, its ability to reduce the number of features is computed, as given in Table 4. From these results, it can be seen that HBO based on the traditional RS provides the smallest number of features on 70% of the tested datasets (i.e., all datasets except D2, D3, and D8), followed by GA, which has the smallest number on three datasets. In terms of the average number of selected features over all the tested sets, it can be noticed that HBO and GA occupy the first and second ranks, respectively; however, the difference between them is not significant. They are followed by LSHADE and TLBO, whereas the worst algorithm is SSA, which selects the largest number of features.
In the case of using CRSA, the number of features selected by HBO is still the smallest, nearly 16 features.
This is followed by bGWO and LSHADE, which occupy the second and third ranks, respectively. By comparing the performance of each algorithm in terms of the number of selected features using the traditional RS and CRSA, it can be observed that RS provides a larger number of features than CRSA. For example, HBO, LSHADE, TLBO, SSA, SGA, SaDE, and bGWO based on RS select nearly 6.07, 4.18, 12.86, 3.71, −3.344, 1.22, and 18.14 more features, respectively, than when using CRSA. Figures 4 and 5 show examples of the average convergence curves on the first four datasets for the FS algorithms using RS and CRSA, respectively, as the fitness function. From these figures, it can be concluded that BHBO has a high ability to converge faster than the other methods in both the RS and CRSA cases. In addition, by comparing the behaviour of the algorithms using CRSA, it can be noticed that they have a better ability to minimize the fitness value than when using RS.
To decide whether the difference between the developed HBO using CRSA and the other methods is significant or not, a nonparametric Friedman test is applied. The statistical values obtained using the Friedman test are given in Table 5. From these values, it can be observed that the developed BHBO has the largest mean rank in terms of accuracy using both the traditional RS and the new CRSA. In addition, the smallest mean rank in terms of the number of selected features is achieved using BHBO. This indicates that the combination of BHBO and CRSA leads to an increase in the classification accuracy and a decrease in the number of selected features.
It can be seen from the previous evaluations on the UCI datasets that the new HBOCRSA approach is more applicable and efficient than the competing algorithms. However, HBOCRSA still has some drawbacks, such as a high computational complexity, especially when used to handle high-dimensional datasets; this can be remedied using parallel processing and GPU hardware. Furthermore, HBOCRSA's behaviour in balancing exploration and exploitation to find the feasible region has the greatest impact on its performance, which can be improved by combining it with additional operators.

Conclusion and Future Works
In this article, we focused on creating a novel model of rough set approximations (RSA), namely, the rough set approximation models depending on containment neighborhoods (CRSA), which generalizes the classical notions of RSA and derives a number of distinguished results. To evaluate the appropriateness of this model, it is applied to enhance the classification of different datasets by utilizing it as an objective function for a feature selection approach. This has been carried out by applying the binary version of the honey badger optimization algorithm (BHBO) as the feature selection method. The results of HBOCRSA are compared with different MH techniques, which involve GWO, LSHADE, SSA, GA, TLBO, and SaDE. A set of ten datasets is employed to assess the performance of the developed method. The experimental results clarified the high performance of the developed method as an FS approach, which achieves better accuracy than the other methods. Furthermore, the number of selected features acquired utilizing HBOCRSA is smaller than that of the other methods. Additionally, the performance of the CRSA model is better than the classical RS approach in terms of the performance measures. Based on the favourable results gained from the developed method, it can be applied in diverse scopes, for instance, cloud computing, image processing, and IoT applications. In addition, it can be reconstructed as a multiobjective technique and applied to several real-world multiobjective problems, including feature selection and engineering problems, among others.

Data Availability
The data used to support the findings of this study are available from the authors upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.