Effective Evolutionary Multilabel Feature Selection under a Budget Constraint



Introduction
Multilabel classification has emerged as a promising technique for various applications, including lifelong structure monitoring [1], functional proteomics [2], and sentiment analysis [3]. These applications produce a series of labels for describing complicated concepts, which are compounded when high-level concepts are composed of multiple subconcepts, such as the environmental and operational conditions of structures [1,4,5]. Let X ⊂ R^|F| denote a set of patterns constructed from a set of features F. Each pattern x_i ∈ X, where 1 ≤ i ≤ |X|, is assigned to a certain label subset Y_i ⊆ L, where L = {l_1, ..., l_|L|} is a finite set of labels. The task of multilabel classification is therefore to identify a function that maps a given instance to one of the 2^|L| possible label subsets based on its input feature values.
In practice, there can be a maximum number of allowed features because of limits on data acquisition rates or energy consumption [6-8]. For example, this problem can emerge in music applications on lightweight mobile devices. Such applications typically have limited computational capacity, so there is a maximum number of features that can be extracted [9,10]. An excessive number of extracted features on a mobile device degrades the user experience through unacceptable waiting times or battery consumption.
Given input data with an original feature set F and label set L, the goal of our multilabel feature selection problem is to identify a feature subset S ⊂ F with at most n features that yields the best multilabel classification accuracy [11,12]. This problem is known as budgeted feature selection [13] or feature selection with test cost constraints [8,14,15]. However, most studies have been conducted from the perspective of traditional single-label learning. It should be noted, especially when the given constraint n is small, that the multilabel setting becomes more challenging in terms of classification accuracy, because a small number of features must support multiple labels simultaneously [16-19].
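To make the budgeted objective concrete, the following sketch enumerates every feature subset of size at most n and keeps the best-scoring one. The feature names and the toy scoring function are illustrative assumptions; exhaustive enumeration is only feasible for tiny |F|, which is exactly why wrapper methods resort to evolutionary search instead.

```python
from itertools import combinations

def best_subset_under_budget(features, n, score):
    """Exhaustively search all feature subsets of size <= n and return
    the highest-scoring one (illustrative only: the number of candidate
    subsets grows combinatorially with |F|)."""
    best, best_score = frozenset(), float("-inf")
    for k in range(1, n + 1):
        for subset in combinations(features, k):
            s = score(frozenset(subset))
            if s > best_score:
                best, best_score = frozenset(subset), s
    return best, best_score

# Toy score: pretend features 'f1' and 'f3' jointly explain the labels,
# with a tiny penalty per selected feature.
toy_score = lambda s: len(s & {"f1", "f3"}) - 0.01 * len(s)
subset, _ = best_subset_under_budget(["f1", "f2", "f3", "f4"], 2, toy_score)
print(sorted(subset))  # ['f1', 'f3']
```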
Multilabel feature selection methods can be categorized according to how they assess the importance of candidate feature subsets [16,20-22]. Filter-based methods identify a final feature subset by focusing on the intrinsic discriminative power of features [21,23-25]. Some multilabel learning algorithms have a feature selection process embedded in their learning process [26,27]. In contrast, wrapper-based methods assess the importance of feature subsets through a search process that uses a multilabel classifier directly, which typically results in better classification accuracy [11,12]. For this reason, we focus on a multilabel feature wrapper based on an evolutionary search process [28].
During the search process, each chromosome represents a feature subset and selects at most n features. As a result, most features remain unselected by any chromosome in the population, which can lead to an ineffective search because important features may be continuously neglected. Without weakening the evolutionary search, this problem can be solved by adding to the population additional chromosomes that convey promising unselected features. In this study, we propose an effective multilabel feature wrapper that respects the constraint on feature subset size. Experimental results demonstrate that the proposed method identifies an effective feature subset for multilabel classification with the aid of an enhanced evolutionary search process.

Related Work
In traditional single-label feature selection, the budgeted feature selection problem is treated as a special case in which the algorithm must simultaneously consider the effectiveness of the feature subset and the acquisition cost of gathering each feature. To solve this problem, Zhang et al. [29] proposed a feature selection algorithm based on bare-bones particle swarm optimization, which addresses the algorithmic complexity introduced by additional parameters. Because the acquisition cost of each feature can be unequal, a multiobjective particle swarm optimization approach for cost-based feature selection and a return-cost-based binary firefly algorithm have also been studied [30,31]; both add a second objective of minimizing the total cost of the selected features.
In multilabel feature selection studies, one major trend is to apply a single-label feature selection method after transforming the multilabel dataset into one or more single-label datasets [32,33]. Although this strategy facilitates the use of conventional methods and is easy to apply [34], algorithm adaptation strategies that directly manage multilabel problems have also been considered [35]. In these approaches, which are largely filter-based, a feature subset is obtained by optimizing a specific criterion, such as a joint learning criterion that couples feature selection with multilabel learning [27,36], ℓ2,1-norm function optimization [37], label ranking error [26], the Hilbert-Schmidt independence criterion [23], χ² statistics [21], or mutual information [16,24,38]. However, these methods commonly suffer from low multilabel classification accuracy because of a lack of interaction with multilabel classifiers.
As a notable multilabel feature wrapper study, Zhang et al. [12] proposed a multilabel feature selection method based on a genetic algorithm (GA), the most common choice in evolutionary feature wrapper studies [28]. Specifically, their method combined instance- and label-based evaluation metrics [39] into a fitness function to capture label dependency. In the original proposal, however, a maximum number of selected features was not enforced during the genetic search process. The multilabel classification performance under a constraint on the number of selected features was later demonstrated for comparison purposes [11]. During initialization, this method creates chromosomes by selecting at most n features. During the genetic search process, the constraint is continuously satisfied by employing restrictive crossover and mutation operators [40] that immediately discard features at random whenever the number of selected features exceeds n. Although this satisfies the constraint, important features may be discarded, resulting in an ineffective feature subset.
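A minimal sketch of the repair step behind such restrictive operators, assuming chromosomes are binary masks over F; the function name is ours, not from [40]:

```python
import random

def restrictive_repair(mask, n, rng):
    """If a chromosome selects more than n features, randomly drop
    selected features until the budget is satisfied (the repair step
    described for the restrictive operators). Because the drops are
    random, important features may be discarded."""
    selected = [i for i, bit in enumerate(mask) if bit]
    rng.shuffle(selected)
    for i in selected[n:]:  # discard surplus features at random
        mask[i] = 0
    return mask

rng = random.Random(0)
child = restrictive_repair([1, 1, 1, 1, 1, 0, 0, 0], 3, rng)
print(sum(child))  # 3
```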
Recent multilabel feature wrapper methods have treated the number of selected features as a secondary objective of the evolutionary search (i.e., multiobjective optimization [28]). This is achieved through a ranking method designed for multiobjective optimization, known as nondominated sort [41], in which the rank of each chromosome is based on the number of times it dominates other chromosomes in terms of two fitness values: multilabel classification accuracy and the number of selected features. Because this yields a ranking of the chromosomes, it can be used directly in the natural selection step of a GA. Although the most common approach using nondominated sorting is NSGA-II [42], nondominated sorting has also been employed in other evolutionary search methods, including particle swarm optimization (PSO) [43]. A common drawback of these methods is that no solution may satisfy the feature number constraint if no such solution appears in the final Pareto front. Additionally, they may waste search effort on infeasible solutions that carry an unacceptable number of features.
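The dominance relation underlying nondominated sort can be sketched as follows for the two objectives mentioned (classification error and subset size, both minimized). This is a generic Pareto-front filter, not the full NSGA-II ranking; note how a front can easily contain no point meeting a budget such as n = 2.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every objective and strictly
    better in at least one (objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated_front(points):
    """Return the Pareto front: points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# (classification error, #selected features) pairs.
front = nondominated_front([(0.10, 5), (0.20, 3), (0.15, 4), (0.25, 6)])
print(sorted(front))  # [(0.1, 5), (0.15, 4), (0.2, 3)]
```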
Our review indicates that conventional multilabel feature wrappers can fail to identify a final solution that satisfies a given constraint. To remedy this limitation, in addition to the evolutionary process, we devise a new process, namely the exploration operation, which finds important features in the large set of unselected features with the aid of an effective filter and supplies them to the population to enhance the evolutionary search process. We summarize the resulting design issues and our responses as follows.
(i) The exploration operation must be able to identify promising features in a large unselected feature set of size (|F| − n) = O(|F|). To achieve this, we employ a criterion that measures the relevance score of features. (ii) The exploration operation must be computationally efficient to avoid degrading the performance of the entire search process. To achieve this, we employ a multilabel feature filter that is efficient because it only requires the dependency between two variables [16]. (iii) The exploration operation must incur no additional parameter that could cause complicated parameter control issues and increase the overall complexity of the algorithm [11,44]. Based on the number of features given by the evolutionary search, it automatically identifies an effective feature subset composed only of novel features.

Motivation and Approach.
In this study, we enhance the performance of a population-based search, such as a GA, for multilabel feature selection with a budget constraint by introducing novel chromosomes that inject promising unselected features into the population. Figure 1 illustrates several key issues that must be considered when introducing novel features into the evolutionary search-based multilabel feature selection process with a budget constraint. In the original feature set F, there may be a subset of important features that are strongly dependent on multiple labels and would therefore lend excellent discriminative power to the multilabel classifier if included in the final feature subset. After random initialization, important features such as f_1 may be unselected by every chromosome (feature subset), because each chromosome covers only a small number of features under the budget constraint n. Note that ⌈|F|/n⌉ chromosomes must be evaluated to consider every feature at least once, even if all chromosomes are forced to select disjoint feature subsets, which incurs an expensive computational cost. Instead, the proposed method identifies promising features with the help of the employed filter, without explicit evaluation of candidate feature subsets.
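The ⌈|F|/n⌉ evaluation count above is simple arithmetic to verify; the feature counts below are hypothetical examples, not taken from the experimental datasets.

```python
import math

def min_covering_chromosomes(num_features, n):
    """Even with disjoint subsets, covering every feature at least once
    requires ceil(|F|/n) chromosome evaluations."""
    return math.ceil(num_features / n)

# e.g. 294 features under a budget of n = 10 already needs 30 FFCs
print(min_covering_chromosomes(294, 10))  # 30
print(min_covering_chromosomes(100, 10))  # 10
```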
Next, genetic operators, such as crossover and mutation, are applied to the population to create new chromosomes. However, unselected important features may still not be considered, because new chromosomes are created by exchanging the alleles of their ancestors: if the ancestors commonly leave a feature unselected, their offspring will leave it unselected as well. The only chance to add neglected features during offspring creation is the mutation operation. This is computationally inefficient, because mutation selects features randomly and the mutation rate is set to a small value in order to achieve convergence. Thus, a large number of iterations or generations must be spent before important features enter the population by chance.
In the proposed method, the exploration operator is applied to each new offspring to create novel chromosomes that contain promising features that the original offspring did not consider. During each exploration operation, we calculate the dependency of the unselected features on the multiple labels (l_1, l_2, ..., l_8). After the features are ranked (e.g., f_1 → f_44 → f_32 → f_3 → ⋅⋅⋅), a new chromosome that selects the most promising features is created. Finally, the exploration-based and genetic-operation-based chromosomes are merged into a single population.
This paper presents an effective evolutionary search method that remedies the aforementioned issues. In Section 3.2, we discuss the procedural steps of the proposed method and how the issues associated with the exploration operation and the creation of new chromosomes are handled. Section 3.3 presents a mutual-information-based criterion for efficiently capturing the relationships between features and labels.

Algorithm
Algorithm 1 outlines the pseudocode for the procedures used in the proposed method. The terms used to describe the algorithm are summarized in the "Terms Used in This Study and Meanings" section. The feature selection vector in a chromosome is a binary string in which each bit represents an individual feature, with one and zero denoting selected and unselected features, respectively. In the initialization step (line (3)), the algorithm generates K chromosomes by randomly setting at most n binary bits. The selected feature subset S_c encoded in each c ∈ P(t) is then evaluated using a fitness function; we use the multilabel classification error of the selected feature subset as the fitness function. Because K chromosomes must be evaluated to obtain their fitness values, K fitness function calls (FFCs) are used in line (4).
After performing the initialization process, the proposed method performs a reproduction process that can be divided into two parts: reproduction via genetic operators and reproduction via the exploration operator. First, the proposed method creates an offspring set O_g(t) (line (6)) using restrictive crossover and mutation operators to control the number of selected features [40]. Next, the exploration operator identifies unselected promising features from the perspective of each chromosome in O_g(t) and encodes them into a new chromosome in O_e(t) (line (7)). To balance the genetic and exploration operations, we set the size of O_e(t) equal to that of O_g(t), because O_e(t) must also be evaluated to determine its fitness. These two sets of chromosomes are then combined to form the offspring set O(t) of the t-th population (line (8)). To evaluate the fitness of the offspring set, the proposed method spends 2 ⋅ |O_g(t)| FFCs per generation (line (9)). Next, O(t) is added to P(t), and the K chromosomes with the best fitness values are selected (line (11)). This procedure is repeated until the algorithm exhausts its allowed FFCs; this limit, denoted v, is chosen by the user. The output of Algorithm 1 is the best feature subset obtained during evolution.

(3) for each c ∈ O_g(t) do
(4)   E ← ∅  ⊳ initialize novel feature subset E
(5)   for i = 1 to |S_c| do  ⊳ S_c: feature subset selected by c
(6)     find the best feature f⁺ = argmax_{f⁺ ∈ F∖{S_c ∪ E}} J(f⁺, L)
(7)     add f⁺ to E
(8)   end for
(9)   add E to O_e(t) as a chromosome
(10) end for
(11) end procedure

Algorithm 2: Procedures of the exploration operator.
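The generational loop of Algorithm 1 can be sketched in Python as follows. The operator bodies are simplified placeholders (for example, `mutate` swaps a single feature), and the `explore` callback stands in for Algorithm 2, so this illustrates the FFC accounting and elitist selection rather than reproducing the paper's exact procedure.

```python
import random

def evolve(num_features, n, K, fitness, max_ffc, explore, rng):
    """Sketch of Algorithm 1: pair genetic offspring O_g(t) with
    exploration offspring O_e(t) each generation. `fitness` maps a
    feature subset to multilabel classification error (lower is better);
    `explore` builds one novel subset per genetic offspring."""
    pop = [frozenset(rng.sample(range(num_features), n)) for _ in range(K)]
    scored = {c: fitness(c) for c in pop}  # K initial FFCs
    ffc = K
    while ffc < max_ffc:
        o_g = [mutate(c, num_features, rng) for c in rng.sample(pop, 2)]
        o_e = [explore(c) for c in o_g]    # one novel subset per offspring
        for c in o_g + o_e:                # 2 * |O_g(t)| FFCs per generation
            if ffc >= max_ffc:
                break
            scored.setdefault(c, fitness(c))
            ffc += 1
        pop = sorted(scored, key=scored.get)[:K]  # keep the K fittest
    return min(scored, key=scored.get)

def mutate(c, num_features, rng):
    """Restrictive mutation placeholder: swap one selected feature."""
    out = set(c)
    new = rng.randrange(num_features)
    out.discard(rng.choice(sorted(out)))
    out.add(new)
    return frozenset(out)

# Toy run: error counts how many of the 'important' features {0,1,2} are
# missing; the explore callback simply proposes the top-ranked features.
rng = random.Random(1)
best = evolve(20, 3, 5, lambda c: 3 - len(c & {0, 1, 2}), 40,
              lambda c: frozenset(range(len(c))), rng)
print(sorted(best))  # [0, 1, 2]
```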

Exploration Operator.
Because each feature subset selects a small number of features within n and most features remain unselected, the exploration operator is needed to explore the large set of unselected features. Algorithm 2 outlines the pseudocode for the proposed exploration operator. For each offspring c generated by the genetic operators, we iteratively select relevant features that maximize the objective function and were not selected by c, until the novel subset reaches size |S_c|, where S_c is the feature subset selected by c. Thus, the proposed exploration operation incurs no additional parameter for determining the number of features to be selected.
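A sketch of this greedy loop, with `score` standing in for the objective J(f⁺, L); the toy relevance table in the usage example is an assumption for illustration.

```python
def exploration_operator(offspring_subset, all_features, labels, score):
    """Sketch of Algorithm 2: starting from an empty set E, greedily add
    the unselected feature f+ maximizing score(f+, E, labels), excluding
    features already chosen by the offspring, until |E| equals the
    offspring's subset size (so no extra size parameter is needed)."""
    E = set()
    for _ in range(len(offspring_subset)):
        candidates = set(all_features) - offspring_subset - E
        f_plus = max(candidates, key=lambda f: score(f, E, labels))
        E.add(f_plus)
    return frozenset(E)

# Toy relevance scores per feature (a stand-in for the SCLS criterion).
rel = {0: 0.9, 1: 0.8, 2: 0.1, 3: 0.2, 4: 0.3, 5: 0.5, 6: 0.4}
e = exploration_operator(frozenset({5, 6}), range(7), None,
                         lambda f, E, L: rel[f])
print(sorted(e))  # [0, 1]
```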
To implement our exploration operation, we employ an effective filter method called the scalable criterion for large label sets (SCLS) [16] as the objective function J(f⁺, L), where L is the label set. The selection of the i-th feature from the set F∖{S ∪ E}, where S is the subset of i − 1 already-selected features, is performed by identifying the candidate f_k that maximizes the following relevance evaluation [17]:

f⁺ = argmax_{f_k ∈ F∖{S ∪ E}} [Rel(f_k) − Red(f_k)], (1)

where Rel(f_k) and Red(f_k) denote the dependency of f_k on L and the dependency of f_k on the selected features of S, respectively. From [17], (1) can be reformulated as

max_{f_k} [Σ_{l∈L} I(f_k; l) − Red(f_k)], (2)

where I(X; Y) = H(X) − H(X, Y) + H(Y) is the mutual information between variables X and Y, H(X) = −Σ_x P(x) log P(x) is the entropy of the probability function P(x), and H(X, Y) is the joint entropy of the joint probability function P(x, y). Following from (2), the redundancy of a candidate such as f_2 can be calculated from its dependency on each selected feature,

Red(f_2) = Σ_{f_j∈S} I(f_2; f_j), (3)

and, analogously to the label sum in (2), from its label-conditional form,

Red(f_2) = Σ_{f_j∈S} Σ_{l∈L} I(f_2; f_j; l). (4)

In order to calculate Red(f_2) while considering adaptability against the scaling of Rel(f_2) and avoiding the repetitive calculations over f_j ∈ S and l ∈ L, Red(f_2) is instead represented as

Red(f_2) = σ ⋅ Σ_{l∈L} I(f_2; l), (5)

where 0 ≤ σ ≤ 1, which must be estimated, determines the reduction of the relevance of f_2 based on S while circumventing the repetitive calculation of the reduction against each label. According to [16], σ can be approximated as

σ ≈ Σ_{f_j∈S} I(f_2; f_j) / H(f_2). (6)

As a result, the relevance evaluation for f_2 when a single feature f_1 has been selected is performed as

Σ_{l∈L} I(f_2; l) − (I(f_2; f_1)/H(f_2)) ⋅ Σ_{l∈L} I(f_2; l). (7)

Equation (7) represents how the relevance evaluation is performed when i = 2. By considering all previously selected features, the general objective becomes

J(f_k, L) = Σ_{l∈L} I(f_k; l) − Σ_{f_j∈S} (I(f_k; f_j)/H(f_k)) ⋅ Σ_{l∈L} I(f_k; l). (8)
Equation (8) is the objective function used by our exploration operation for selecting relevant features from the set of unselected features.
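Under the stated definitions of entropy and mutual information, the scaled-redundancy criterion of Equation (8) can be sketched as follows for discrete feature and label columns. This is our reading of the criterion as described, not the reference implementation of SCLS [16].

```python
import math
from collections import Counter

def entropy(*cols):
    """(Joint) entropy H over one or more discrete columns, in bits."""
    counts = Counter(zip(*cols))
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def mutual_info(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)

def scls_score(f, S, labels):
    """Relevance of candidate column f given already-selected columns S:
    the summed label relevance, discounted by the scaled redundancy
    sigma = sum_j I(f; f_j) / H(f), following Eq. (8)."""
    relevance = sum(mutual_info(f, l) for l in labels)
    h_f = entropy(f)
    sigma = sum(mutual_info(f, s) for s in S) / h_f if h_f > 0 else 0.0
    return relevance - sigma * relevance

x = [0, 0, 1, 1]
l = [0, 0, 1, 1]
print(scls_score(x, [], [l]))   # 1.0: I(x;l) = 1 bit, no redundancy
print(scls_score(x, [x], [l]))  # 0.0: fully redundant, sigma = 1
```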

Experimental Settings.
We experimented on 20 datasets from various domains. The Birds dataset is audio data containing examples of multiple bird calls. The Emotions dataset is music data classified into six emotional clusters. The Enron, Language Log (LLog), and Slashdot datasets were generated from text mining applications, where each feature corresponds to the occurrence of a word and each label represents the relevance of a text pattern to a specific subject. The Genbase and Yeast datasets come from the biological domain and include information about the functions of genes and proteins. The Mediamill dataset is video data from an automatic detection system. The Medical dataset was sampled from a large corpus of suicide letters obtained through natural language processing of clinical free text. The Scene dataset concerns the semantic indexing of still scenes, where each scene may contain multiple objects. The TMC2007 dataset contains safety reports for complex space systems. The remaining nine datasets come from the Yahoo dataset collection. We performed unsupervised dimensionality reduction on the text datasets composed of more than 10,000 features, namely TMC2007 and the Yahoo collection: the top 2% and 5% of features with the highest document frequency were retained for TMC2007 and the Yahoo datasets, respectively [45]. In the text mining domain, existing studies report that classification performance does not suffer significantly when only 1% of features are retained based on document frequency [46].
Table 1 contains the standard statistics of the multilabel datasets employed in our experiments, including the number of patterns |X|, the number of features |F|, the type of features, and the number of labels |L|. When the feature type is numeric, we discretize the features using the supervised discretization method [47] for the multilabel naïve Bayes classifier (MLNB) [12]; each observed numeric value is assigned to one of several bins determined automatically by the discretization method. The label cardinality Card is the average number of labels per instance, and the label density Den is the label cardinality divided by the total number of labels. Distinct is the number of unique label subsets in the dataset, and Domain is the application from which each dataset was extracted.
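The statistics Card, Den, and Distinct can be computed directly from a binary label matrix; the tiny matrix below is a made-up example, not one of the benchmark datasets.

```python
def label_statistics(Y):
    """Compute the multilabel statistics from Table 1 given a binary
    label matrix Y (rows = patterns, columns = labels): cardinality
    Card, density Den, and the number of Distinct label subsets."""
    num_patterns, num_labels = len(Y), len(Y[0])
    card = sum(sum(row) for row in Y) / num_patterns
    den = card / num_labels
    distinct = len({tuple(row) for row in Y})
    return card, den, distinct

Y = [[1, 0, 1],
     [1, 0, 0],
     [0, 1, 1],
     [1, 0, 1]]
card, den, distinct = label_statistics(Y)
print(card, distinct)  # 1.75 3
```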
We measured the mean size of the feature subsets selected by the proposed method and by conventional multilabel feature selection methods (a GA with restrictive genetic operators [40] (RGA), NSGA-II [42], and MPSOFS [43]) to determine which methods select fewer than 10 features. We provide detailed parameter settings to support reproducibility as follows: (i) RGA creates K = 20 initial solutions by randomly selecting fewer than n = 10 features for each chromosome. Each solution in the initial population P(t), where t ≥ 0, is evaluated using the employed multilabel classifier. Next, the RGA creates an offspring set O(t) using genetic operators: two solutions in P(t) are randomly selected and mated by the crossover operator, and one solution in P(t) is randomly selected and mutated. We employed restrictive crossover and restrictive mutation operators with both the crossover rate and mutation rate set to 1.0; therefore, the RGA creates three new solutions per iteration to compose O(t).
Each newly created solution is evaluated using the multilabel classifier. To create P(t + 1), O(t) is added to P(t), and the 20 solutions with the best fitness values are selected. This procedure is repeated until the RGA has spent 100 FFCs. (ii) NSGA-II creates K = 20 initial solutions randomly, the same number as the RGA. The maximum number of allowed features is set to |F|, because NSGA-II naturally minimizes the number of selected features. Each solution in P(t) is evaluated using the employed multilabel classifier and the number of features. NSGA-II then creates O(t) with |O(t)| = 3, the same setting as the RGA. To create P(t + 1), O(t) is added to P(t), and the superiority of each solution in {P(t) ∪ O(t)} is determined by the nondominated sort method; the top 20 solutions are then selected to form P(t + 1). This procedure is repeated until NSGA-II has spent 100 FFCs. (iii) MPSOFS creates 20 initial solutions randomly, the same number as the RGA. Each solution in P(t) is evaluated using the employed multilabel classifier and the number of features, and ranked using the nondominated sort method. MPSOFS preserves the best solution of P(t), called the global best solution. In addition, the best solution that each chromosome has experienced is preserved as its individual best solution, so there are 20 individual best solutions. MPSOFS then updates the representation of each chromosome based on the global best solution and its own individual best solution, using a velocity update with an inertia weight of 0.7298 and two acceleration coefficients of 1.4962, as suggested in [48]. After all chromosomes in P(t) are modified, they are evaluated and regarded as P(t + 1). This procedure is repeated until MPSOFS has spent 100 FFCs.
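A generic PSO velocity update with the inertia weight and acceleration coefficients quoted from [48] can be sketched as follows; this is the textbook update for a single particle, not the MPSOFS algorithm itself, and the variable names are ours.

```python
import random

def pso_velocity_update(v, x, p_best, g_best, rng,
                        w=0.7298, c1=1.4962, c2=1.4962):
    """One per-dimension velocity update: inertia term plus cognitive
    (individual best) and social (global best) attraction terms."""
    return [w * vi
            + c1 * rng.random() * (pi - xi)
            + c2 * rng.random() * (gi - xi)
            for vi, xi, pi, gi in zip(v, x, p_best, g_best)]

rng = random.Random(0)
v = pso_velocity_update([0.0, 0.0], [0.5, 0.5], [1.0, 0.0], [1.0, 1.0], rng)
print(len(v))  # 2
```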
Although different parameter settings might yield better performance, we fixed the population size K to 20 and the number of allowed FFCs v to 100 for all methods to ensure a fair comparison. To evaluate the quality of the feature subsets obtained by each method, we used the MLNB classifier, because it outputs a predicted label subset based on the intrinsic characteristics of a given dataset without requiring a complicated parameter-tuning process that might influence the final multilabel classification performance [39]. For the sake of fairness, we used hold-out cross-validation for each experiment [11,49]: 80% of the samples in each dataset were randomly chosen as the training set for multilabel feature selection and classifier training, while the remaining 20% were used as the test set to measure multilabel classification performance. Each experiment was repeated 10 times, and the average value represents the classification performance of each feature selection method.
We employed four evaluation metrics: Hamming loss, multilabel accuracy, ranking loss, and normalized coverage. Let T = {(x_i, Y_i) | 1 ≤ i ≤ |T|} be a given test set, where Y_i ⊆ L is the correct label subset. For a given test sample x_i, a classifier such as MLNB outputs a confidence value 0 ≤ c_{i,l} ≤ 1 for each label l ∈ L. If c_{i,l} is larger than a predefined threshold, such as 0.5, the corresponding label l is included in the predicted label subset P_i. Based on the ground truth Y_i, the confidence values c_{i,l}, and the predicted label subset P_i, multilabel classification performance can be measured with each evaluation metric [33,45,50].
Multilabel accuracy is defined as

Accuracy = (1/|T|) Σ_{i=1}^{|T|} |Y_i ∩ P_i| / |Y_i ∪ P_i|.

Hamming loss is defined as

HammingLoss = (1/|T|) Σ_{i=1}^{|T|} |Y_i △ P_i| / |L|,

where △ denotes the symmetric difference between two sets. Ranking loss is defined as

RankingLoss = (1/|T|) Σ_{i=1}^{|T|} |{(l, l′) ∈ Y_i × Ȳ_i | c_{i,l} ≤ c_{i,l′}}| / (|Y_i| ⋅ |Ȳ_i|),

where Ȳ_i is the complementary set of Y_i. Thus, ranking loss measures the average fraction of (l, l′) pairs with c_{i,l} ≤ c_{i,l′} over all possible relevant and irrelevant label pairs. Finally, normalized coverage is defined as

Coverage = (1/|T|) Σ_{i=1}^{|T|} (max_{l∈Y_i} rank(c_{i,l}) − 1) / |L|,

where rank(⋅) returns the rank of the corresponding relevant label l ∈ Y_i when all labels are sorted by c_{i,l} in nonincreasing order. Thus, normalized coverage measures how many labels must be marked as positive before all relevant labels are positive. Higher multilabel accuracy and lower Hamming loss, ranking loss, and normalized coverage indicate good classification performance.
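The four metrics can be sketched directly from these definitions; the tiny test set below is a made-up example, and the normalization of coverage by |L| is our reading of "normalized" here.

```python
def multilabel_accuracy(true_sets, pred_sets):
    """Mean Jaccard overlap |Y ∩ P| / |Y ∪ P| over test patterns."""
    return sum(len(t & p) / len(t | p)
               for t, p in zip(true_sets, pred_sets)) / len(true_sets)

def hamming_loss(true_sets, pred_sets, num_labels):
    """Average fraction of label slots where prediction and truth differ."""
    return sum(len(t ^ p) for t, p in zip(true_sets, pred_sets)) / (
        len(true_sets) * num_labels)

def ranking_loss(true_sets, confidences, num_labels):
    """Average fraction of (relevant, irrelevant) label pairs ordered
    incorrectly by the confidence scores."""
    total = 0.0
    for t, conf in zip(true_sets, confidences):
        neg = set(range(num_labels)) - t
        bad = sum(1 for a in t for b in neg if conf[a] <= conf[b])
        total += bad / (len(t) * len(neg))
    return total / len(true_sets)

def normalized_coverage(true_sets, confidences, num_labels):
    """How far down the confidence ranking one must go to cover all
    relevant labels, averaged and divided by |L|."""
    total = 0.0
    for t, conf in zip(true_sets, confidences):
        order = sorted(range(num_labels), key=lambda l: -conf[l])
        rank = {l: r + 1 for r, l in enumerate(order)}  # 1-based rank
        total += (max(rank[l] for l in t) - 1) / num_labels
    return total / len(true_sets)

T = [{0, 2}, {1}]
P = [{0}, {1, 2}]
conf = [[0.9, 0.1, 0.8], [0.2, 0.7, 0.6]]
print(multilabel_accuracy(T, P))  # (1/2 + 1/2) / 2 = 0.5
print(ranking_loss(T, conf, 3))   # 0.0: every relevant label outranks the rest
```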
Additionally, because we are interested in the superiority of the proposed method over conventional multilabel feature selection methods, we perform the Wilcoxon signed-rank test [51] to validate its performance. Let d_i be the difference between the performance of the two methods on the i-th dataset; the differences are ranked according to their absolute values.

Comparison Results.
Table 2 contains the mean size and standard deviation of the feature subsets selected by the proposed method and the conventional multilabel feature selection methods when the evaluation metric is multilabel accuracy. The N symbol indicates methods that failed to satisfy the given constraint for the corresponding dataset. The proposed method and RGA both selected fewer than 10 features for all datasets. NSGA-II and MPSOFS failed to select fewer than 10 features for all datasets, apart from NSGA-II on the Mediamill dataset, despite having objective functions that minimize feature subset size. Because NSGA-II and MPSOFS failed to satisfy the constraint for most datasets, we compare only the proposed method and the RGA in subsequent experiments. Note that n can be set to a value larger than 10, such as 30 or 50; the results in Table 2 indicate that NSGA-II and MPSOFS would still fail to satisfy such constraints, because their final feature subsets comprise tens or hundreds of features in most experiments.

Tables 3 and 4 contain the experimental results for the proposed method and RGA on the 20 multilabel datasets, presented as average hold-out cross-validation performance with corresponding standard deviations. Table 3 reports multilabel accuracy and Hamming loss, and Table 4 reports ranking loss and normalized coverage. The better of the two methods is indicated by bold font and a ✓ symbol. Finally, Table 5 contains the results of the Wilcoxon signed-rank test for the proposed method against RGA on the Genbase dataset with a significance threshold of α = 0.05. For each evaluation metric, the winner of each comparison is indicated in bold, and the sum of the outperformed ranks R+ over the total rank and the p values are presented in parentheses. We observed a similar tendency in the same experiments on the other multilabel datasets.
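The signed-rank statistic R+ reported in Table 5 can be sketched in pure Python as follows (ranking |d_i| with average ranks for ties, zeros dropped, then summing the ranks of the positive differences); the sample differences are made up, and computing the p value from R+ is omitted.

```python
def wilcoxon_signed_rank(diffs):
    """Wilcoxon signed-rank statistic: rank |d_i| (average ranks for
    ties, zero differences dropped) and return the sum of ranks of the
    positive differences, R+, plus the total rank sum."""
    d = [x for x in diffs if x != 0]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j + 2) / 2  # average of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    r_plus = sum(r for x, r in zip(d, ranks) if x > 0)
    total = len(d) * (len(d) + 1) / 2
    return r_plus, total

# Hypothetical per-dataset accuracy differences (proposed minus baseline).
r_plus, total = wilcoxon_signed_rank([0.02, 0.01, -0.005, 0.03, 0.015])
print(r_plus, total)  # 14.0 15.0
```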
As shown in Tables 3 and 4, the proposed method outperformed RGA on most multilabel datasets. Specifically, it achieved the best performance on 90% of the datasets in terms of multilabel accuracy, 95% in terms of Hamming loss, 95% in terms of ranking loss, and 100% in terms of normalized coverage. The experimental results in Table 5 further demonstrate that the proposed method is statistically superior to RGA for all evaluation metrics.

Analysis.
Figure 2 shows the convergence behavior of the RGA and the proposed method as a function of the number of spent FFCs (e) in terms of multilabel accuracy; the horizontal axis represents e, and the vertical axis indicates multilabel accuracy. Because convergence behavior can differ between runs owing to the stochastic nature of population-based search methods, we used the same initial population for both algorithms and averaged the multilabel accuracy of the top elitist in the population over 10 repetitions. Figure 2 shows that multilabel accuracy improves monotonically with e. Because the initialization step consumes 20 FFCs and both methods start from the same randomly created population, both improve gradually at first. However, the multilabel accuracy of the proposed method improves dramatically once e ≥ 20, because the exploration operator is applied after initialization. Thus, Figure 2 indicates that the proposed method can efficiently locate a good feature subset among the unselected features.

The goal of our exploration operation is to introduce promising novel features that effectively improve multilabel classification performance. To validate its effectiveness, we conducted an additional experiment comparing the fitness values of offspring sets created by the proposed exploration operation and by a random operation, respectively. Specifically, 50 chromosomes, denoted C, each selecting 10 or fewer features as in the initialization procedure of the RGA, were generated, and 50 new chromosomes were created by applying the proposed exploration operation to each chromosome in C to form the first offspring set. Thereafter, for the sake of comparison, novel features for each chromosome in C were selected randomly to create the second offspring set. Finally, the fitness values of the first and second offspring sets were measured in terms of the four performance measures. Figure 3 shows box plots of the fitness values of the two offspring sets on the Genbase dataset. The results indicate that the fitness values of the first offspring set (Proposed) are much better than those of the second offspring set (Random) for all measures, indicating that the proposed exploration operation has a much better search capability than random search.

Conclusion
We proposed an effective evolutionary search-based feature selection method with a budget constraint for multilabel classification. Because each feature subset selects only a small number of features within the maximum allowed and most features remain unselected in the budgeted setting, we employ a novel exploration operation to find relevant features in the large set of unselected features. Our experiments on 20 real-world datasets demonstrated that the proposed exploration operator successfully enhances the search capability of the genetic search, improving multilabel classification. The results also showed that the proposed method finds feature subsets that do not violate the budget constraint, and statistical tests showed that it outperformed conventional methods on all four performance measures. Although the proposed exploration operation improves the effectiveness of evolutionary search without incurring additional parameters, it cannot be applied directly to certain types of evolutionary search algorithms, such as particle swarm optimization, that do not rely on offspring sets; designing an exploration operation for such cases requires additional consideration.
A future research direction is to extend this work to other evolutionary algorithms. The proposed method is based on a genetic algorithm, but the exploration operation could be applied to other evolutionary algorithms, such as the estimation of distribution algorithm. We intend to study this issue further.

Figure 1 :
Figure 1: The cooperation process between the genetic and exploration operations.

Constants

t: Number of generations
K: Size of the population, |P(t)| = K
n: Maximum number of features selected by a chromosome c
c: A chromosome in P(t)
S_c: The selected feature subset represented by c
v: Maximum number of allowed fitness function calls (FFCs)
e: Number of spent FFCs, e = K + 2 ⋅ |O_g(t)| ⋅ t.

Figure 2:
Figure 2: Comparison results of the convergence between RGA and the proposed method in terms of multilabel accuracy (a higher value indicates better classification performance).

Figure 3:
Figure 3: Box plots of the fitness values of the two offspring sets on the Genbase dataset.

Table 1 :
Standard characteristics of employed datasets.

Table 3 :
Comparison results for multilabel feature selection methods in terms of multilabel accuracy and Hamming loss (mean ± std. deviation). The ✓ symbol indicates the method that achieves the best performance for each dataset.

Table 4 :
Comparison results for multilabel feature selection methods in terms of ranking loss and normalized coverage (mean ± std. deviation). The ✓ symbol indicates the method that achieves the best performance for each dataset.

Table 5 :
Wilcoxon signed-rank test results for the proposed method against RGA on the Genbase dataset with a significance threshold of α = 0.05: sum of the outperformed ranks R+ over the total rank, and p values.