MOFSRank: A Multiobjective Evolutionary Algorithm for Feature Selection in Learning to Rank

Learning to rank has attracted increasing interest in the past decade, due to its wide applications in areas like document retrieval and collaborative filtering. Feature selection for learning to rank aims to select a small number of features from the original large set of features while ensuring high ranking accuracy, since in many real ranking applications many features are redundant or even irrelevant. To this end, in this paper, a multiobjective evolutionary algorithm, termed MOFSRank, is proposed for feature selection in learning to rank, which consists of three components. First, an instance selection strategy is suggested to choose the informative instances from the ranking training set, by which the redundant data is removed and the training efficiency is enhanced. Then, on the selected instance subsets, a multiobjective feature selection algorithm with an adaptive mutation is developed, where good feature subsets are obtained by selecting the features with high ranking accuracy and low redundancy. Finally, an ensemble strategy is also designed in MOFSRank, which utilizes these obtained feature subsets to produce a set of better features. Experimental results on benchmark data sets confirm the advantage of the proposed method in comparison with the state of the art.


Introduction
As a central issue of many applications, such as document retrieval [1], collaborative filtering [2], and expert finding [3], learning to rank has attracted much attention in the machine learning area during the last decade. Rank learning, when applied to document retrieval, is a task as follows [1]. In learning, a ranking model is constructed by using the training data, which consists of queries, their corresponding retrieved documents, and relevance levels given by human annotators. In ranking, given a new query, the documents are sorted by using the trained ranking model.
Due to this wide usage, a great number of learning to rank algorithms have been proposed, which achieve ranking models with high accuracy [4][5][6][7][8][9][10][11]. However, in several real ranking applications, such as image retrieval [12,13] and biomarker finding [14], the number of features in the training data is large. This brings great challenges to existing ranking methods, since many features in these applications are redundant or even irrelevant, which reduces the performance of ranking algorithms [15]. To tackle the issue, considerable efforts have recently been made on designing feature selection algorithms for learning to rank. For example, Geng et al. proposed the first filter based work, termed Greedy Search Algorithm (GAS), for feature selection in learning to rank [15]. In GAS, the feature that maximized total importance scores and minimized total similarity scores was iteratively selected to obtain the final feature subset. Experimental results demonstrated the effectiveness of GAS when compared with traditional ranking algorithms. Since then, many other filter based ranking algorithms have been developed [16][17][18][19]. Another type of feature selection algorithm for learning to rank belongs to the wrapper approach, where a rank learning algorithm is included in the feature selection procedure to create a good feature subset. BRTree [20], RankWrapper [21], BFS-Wrapper [22], and GreedyRankRLS [23] are the representative works of this type. Recently, embedded methods have been proposed to solve feature selection for learning to rank, where feature selection is embedded in the ranker construction by introducing a sparse regularization term. For example, RSRank [24], FenchelRank [25], and FSMRank [26] adopted an L1 regularization term, whereas in the work of [27], an embedded feature selection algorithm using a nonconvex regularization was suggested.
The existing feature selection algorithms for learning to rank have shown promising performance in obtaining a small number of features with high ranking accuracy. However, all these algorithms solve the problem with traditional optimization techniques only, such as greedy methods and gradient descent. Different from them, in this paper, we tackle the issue by using evolutionary computation as the optimization technique. To be specific, a Multi-Objective evolutionary algorithm for Feature Selection in learning to Rank, named MOFSRank, is proposed. The main contributions of this paper can be summarized as follows:
(1) A multiobjective feature selection method with an adaptive mutation is suggested, where the features with high ranking accuracy and low redundancy are selected as the feature subsets. Based on the suggested method, a multiobjective evolutionary algorithm, named MOFSRank, is proposed for feature selection in learning to rank.
(2) In MOFSRank, an instance selection strategy is developed to choose the informative instances from the training data, by which the redundant data is removed and the learning process of feature selection is sped up. In addition, an ensemble strategy is also designed in MOFSRank, where the selected feature subsets are further utilized to produce a set of better features.
(3) The effectiveness of the proposed MOFSRank is evaluated on the benchmark data sets, and the experimental results show that, compared with the existing work, the proposed algorithm has superior performance in terms of both ranking accuracy and number of selected features.
The remainder of the paper is organized as follows. In Section 2, the preliminaries and related work are presented. Section 3 gives the details of the proposed algorithm, and empirical results comparing our algorithm with several state-of-the-art methods on the benchmark data sets are reported in Section 4. Section 5 concludes the paper and discusses the future work.

Preliminaries and Related Work
[...] and $K$ is the number of ranks. There exists a total order between the ranks $r_K \succ r_{K-1} \succ \cdots \succ r_1$, where $\succ$ denotes the preference order. With the training data, learning to rank is to construct a ranking model $f$, which for a given new query $q$ can rank the documents associated with $q$ such that more relevant documents are ranked higher than less relevant ones.
To obtain accurate ranking models, different learning to rank algorithms have been proposed, which can be divided into three categories: the Pointwise approach, the Pairwise approach, and the Listwise approach [1]. The Pointwise approach uses each single document as a learning instance and defines the loss function on individual documents [4,28]. The Pairwise approach regards a pair of documents as a learning instance and transforms the ranking problem into binary classification on document pairs [5][6][7]. The Listwise approach solves the ranking problem in a straightforward fashion, taking the entire ranked list of documents as a learning instance and defining a Listwise loss function for learning [8][9][10][11]. Among these three approaches, the Pairwise one has attracted much attention, since in real ranking applications, such as search engines and recommendation systems, the training data of this category can be easily obtained from the users' click-through data [5]. More algorithms for learning to rank can be found in [1].
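The Pairwise transformation described above can be illustrated with a minimal sketch. It assumes that each document of one query is given as a feature vector with an integer relevance label, and that a pair is represented by the difference of the two feature vectors; the function name `make_pairwise_instances` and this difference representation are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def make_pairwise_instances(features, labels):
    """Turn the documents of ONE query into pairwise learning instances.

    features: list of feature vectors, one per document
    labels:   list of relevance levels, aligned with features
    Returns a list of (difference_vector, target) with target in {+1, -1}.
    """
    pairs = []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            continue  # equally relevant documents express no preference
        diff = [a - b for a, b in zip(features[i], features[j])]
        target = 1 if labels[i] > labels[j] else -1
        pairs.append((diff, target))
    return pairs


if __name__ == "__main__":
    docs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    rels = [2, 0, 0]
    for diff, target in make_pairwise_instances(docs, rels):
        print(diff, target)
```

Because every pair of documents with different labels yields one instance, the number of pairwise instances grows quadratically with the number of documents per query.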

Feature Selection Methods for Learning to Rank.
The different types of ranking algorithms have shown promising performance in achieving models with high accuracy. However, in several real ranking applications, the number of training features is large, which brings great challenges to learning to rank algorithms. To tackle the issue, researchers have recently introduced feature selection to the ranking methods, and a variety of feature selection algorithms for learning to rank have been suggested, which mainly fall into three categories: the filter approach, the wrapper approach, and the embedded approach [27,29].
The filter approach is independent of the ranking method, and one representative work is GAS proposed by Geng et al., which is also the first feature selection algorithm for learning to rank [15]. The basic idea of GAS is to select a subset of features with maximum total importance scores and minimum total similarity scores and to use the selected features to construct a ranking model. Experimental results on LETOR data sets have shown that GAS can achieve good ranking accuracy with a small number of features. Based on this work, several other filter based feature selection algorithms have been developed [16][17][18][19]30].
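The greedy idea behind filter methods of this kind can be sketched as follows. This is only an illustration in the spirit of GAS, not the published algorithm: the penalty rule (subtracting `2 * c * similarity` from the scores of the remaining features after each pick), the parameter `c`, and the function name `greedy_select` are assumptions made for the example.

```python
def greedy_select(importance, similarity, k, c=0.5):
    """Greedily pick k features with high importance and low mutual similarity.

    importance: list of per-feature importance scores
    similarity: square matrix (list of lists) of pairwise feature similarities
    c:          assumed trade-off between importance and similarity
    """
    scores = list(importance)
    remaining = set(range(len(importance)))
    selected = []
    while remaining and len(selected) < k:
        # Highest current score wins; ties go to the smallest feature index.
        best = max(remaining, key=lambda f: (scores[f], -f))
        selected.append(best)
        remaining.remove(best)
        # Penalise features that are similar to the one just selected.
        for f in remaining:
            scores[f] -= 2 * c * similarity[best][f]
    return selected


if __name__ == "__main__":
    imp = [0.9, 0.85, 0.5]
    sim = [[1.0, 0.9, 0.1],
           [0.9, 1.0, 0.2],
           [0.1, 0.2, 1.0]]
    print(greedy_select(imp, sim, k=2))
```

In this toy input, the second feature is almost as important as the first but highly similar to it, so the penalty lets the less important but complementary third feature be chosen instead.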
Different from the filter approach, the wrapper approach includes a rank learning algorithm in the feature subset evaluation step, where the ranking algorithm is used as a black box by a wrapper to evaluate the goodness (i.e., the ranking accuracy) of the selected features. Example algorithms include BRTree, which uses boosted regression trees [20], RankWrapper with Ranking SVM [21], BFS-Wrapper utilizing search [22], GreedyRankRLS with the Rank RLS algorithm [23], and LMIR using a smoothing language model [31].
Recently, the embedded approach has been suggested to solve feature selection for learning to rank, where feature selection and rank learning are integrated into one single process. (Note that some researchers categorize feature selection algorithms into two groups, where the embedded approach is included in the wrapper approach.) For example, Sun et al. [24] proposed an embedded feature selection ranking algorithm, termed RSRank, where an L1 regularization term was introduced into [...] [25], where Fenchel duality was used to solve the sparse ranking optimization. Empirical evaluations indicated that FenchelRank was not only better than the classical ranking algorithms but also provided better performance than RSRank. Following this work, Lai et al. further developed a new embedded feature selection algorithm for learning to rank, termed FSMRank [26]. The algorithm solved a joint convex optimization problem by simultaneously minimizing ranking error and conducting feature selection. Experiments on the LETOR collections demonstrated that FSMRank can obtain better results than filter approaches such as GAS. Different from the algorithms above, which all used convex L1 regularization, Laporte et al. designed a feature selection algorithm for learning to rank with a nonconvex regularization, which resulted in both good ranking accuracy and a small number of selected features [27]. Other embedded feature selection ranking algorithms can also be found in [32,33].
The algorithms mentioned above have shown the effectiveness of feature selection for learning to rank, and in this paper, we continue this research line by proposing a multiobjective evolutionary algorithm for ranking feature selection. Before giving the details of the proposed algorithm, it should be noted that multiobjective evolutionary algorithms (MOEAs) have recently been successfully applied to different problems in machine learning, such as classification [34][35][36], clustering [37,38], and pattern mining [39]. In the following, we will propose an MOEA for feature selection in learning to rank.

The Proposed Algorithm
The proposed algorithm (MOFSRank) is a multiobjective feature selection algorithm for learning to rank, where Pairwise documents are used as the learning instances. To select a feature subset from the training set of $O(n^2)$ size ($n$ is the number of training data), we first choose some informative instance subsets from the Pairwise training set, and then feature selection is performed on those selected instance subsets. Lastly, the outputs of feature selection are combined to achieve a better feature subset. The main procedure of MOFSRank is shown in Figure 1, which consists of three phases: (1) the instance selection phase, (2) the feature selection phase, and (3) the ensemble phase.
In the first phase, a multiobjective evolutionary algorithm, termed MOIS, is suggested to select the informative instances from the original Pairwise training set, which has two advantages. First, it removes the possibly noisy data in the original set and improves the quality of the training set. Second, the instance selection reduces the number of training instances and makes the feature selection more efficient.
In the second phase, the final nondominated solutions of MOIS are used for feature selection. To this end, an MOEA for feature selection (MOFS) is proposed, where ranking accuracy and the number of selected features are defined as the two optimization objectives. In addition, an adaptive mutation probability is also designed in MOFS, by which the proposed method can choose the features with high ranking accuracy and low redundancy.
In the last phase, a mixed coding based multiobjective ensemble algorithm, namely, MOEN, is developed, where the Pareto solutions of the second phase are utilized to produce a better feature subset as the final output. The framework of the proposed MOFSRank is presented in Algorithm 1.

Instance Selection Phase.
As mentioned before, in this paper, we focus on Pairwise ranking, whose training set is of $O(n^2)$ size, where $n$ is the number of training data. Thus, before feature selection, an instance selection operation is carried out on the Pairwise training set. To be specific, an MOEA named MOIS is proposed for instance selection, where the two optimization objectives are the number of selected instances and the value of $1 - RAccuracy$, where $RAccuracy$ denotes the accuracy value measured by ranking metrics, such as NDCG or MAP. (In general, a larger value of $RAccuracy$ ($0 \le RAccuracy \le 1$) means better ranking performance; however, since a multiobjective optimization problem is often described as a minimization problem, in this paper we use $1 - RAccuracy$ as the second objective.) Thus, the corresponding multiobjective instance selection problem can be described as

$$\text{MOP1:}\quad \min F(X) = \bigl(f_1(X), f_2(X)\bigr), \qquad f_1(X) = N(X), \qquad f_2(X) = 1 - RAccuracy(R_X),$$

where $X$ denotes the selected instance subset, $N(X)$ is the number of instances in $X$, and $R_X$ is a ranker learned on $X$. In this paper, we adopt linear SVM to create the ranker, which has been widely used in many feature selection [...]

For MOP1, we use a binary encoding scheme, which means that the $i$-th individual (instance subset) can be represented as $x_i = (x_{i,1}, \ldots, x_{i,M})$, where $x_{i,j} \in \{0, 1\}$, $j \in \{1, \ldots, M\}$, and $M$ is the total number of instances in the original training set. $x_{i,j} = 1$ denotes that the $j$-th instance is selected in the $i$-th individual; otherwise it is not. With this encoding scheme, the proposed MOIS adopts a framework similar to NSGA-II [40], and Algorithm 2 presents the procedure of MOIS in detail.
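The binary encoding and the two minimized objectives can be sketched in a few lines. In this sketch, `accuracy_of` is a hypothetical callback standing in for "train a linear SVM ranker on the chosen subset and measure its ranking accuracy"; it, and the helper names `objectives`, `dominates`, and `nondominated`, are assumptions for illustration and do not reproduce Algorithm 2.

```python
def objectives(individual, accuracy_of):
    """Return (number of selected items, 1 - accuracy) for a binary individual."""
    chosen = [i for i, bit in enumerate(individual) if bit == 1]
    return (len(chosen), 1.0 - accuracy_of(chosen))


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nondominated(points):
    """Keep only the objective vectors that no other vector dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Toy accuracy model (assumption): accuracy grows with subset size.
    toy_accuracy = lambda chosen: min(1.0, 0.3 * len(chosen))
    print(objectives([1, 0, 1, 0], toy_accuracy))
    print(nondominated([(2, 0.4), (3, 0.4), (1, 0.7), (3, 0.1)]))
```

The same two helpers apply unchanged when the bits index features instead of instances, since both problems minimize a subset size together with one minus the ranking accuracy.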

Feature Selection Phase.
We take the nondominated solutions of MOIS as the training data sets, and feature selection is carried out on them. To this end, a biobjective evolutionary algorithm with an adaptive mutation for feature selection (MOFS) is suggested, where the two conflicting objectives are the number of features and the value of $1 - RAccuracy$. Thus, the biobjective optimization problem for feature selection is defined as

$$\text{MOP2:}\quad \min F(S) = \bigl(f_1(S), f_2(S)\bigr), \qquad f_1(S) = N(S), \qquad f_2(S) = 1 - RAccuracy(R_S),$$

where $S$ denotes the selected feature subset, $N(S)$ is the number of features in $S$, and $R_S$ is the ranker learned with the features in $S$. Since each $S$ is evaluated on the Pareto instance subsets, we choose the largest value as the value of $RAccuracy(R_S)$. We also use the binary encoding scheme for MOP2. Thus, the $i$-th individual (feature subset) in the population is represented as $s_i = (s_{i,1}, \ldots, s_{i,D})$, where $s_{i,j} \in \{0, 1\}$, $j \in \{1, \ldots, D\}$, and $D$ is the total number of features in the original training data set. $s_{i,j} = 1$ denotes that the $j$-th feature is included in the $i$-th individual; otherwise it is not. With the binary encoding strategy, we solve MOP2 by adopting a framework similar to NSGA-II.

To further improve the performance of MOFS, an adaptive mutation strategy is also suggested, whose basic idea comes from the intuition that, during mutation, the important features should have a greater probability of being selected, whereas the redundant features should have a greater probability of being removed. Thus, the suggested adaptive mutation probability is defined as

[...]

where the function $v(i)$ denotes the value of the $i$-th bit in an individual $ind$, $P_m^{*}(i)$ is the adaptive mutation probability of the $i$-th bit in $ind$, and $P_m$ is the basic mutation probability that is used in NSGA-II. $\lambda$ is a decaying factor that, in this paper, is set as a function of the current generation number $t$ [...]. $Impo(i)$ and $Redu(i)$ represent the importance degree and the redundancy degree of the $i$-th feature in $ind$, which are formally defined as

[...]

where $RAccuracy(i)$ denotes the ranking accuracy value of the single $i$-th feature on the original training set, and $\rho_{i,j}$ is Pearson's correlation coefficient between the $i$-th feature and the $j$-th feature ($j \neq i$) in $ind$. By using the adaptive mutation strategy, we can select the features with high ranking accuracy and low redundancy. The whole procedure of MOFS is presented in Algorithm 3.

Algorithm 3: MOFS.
Input: $G_2$: maximum generations of multiobjective feature selection; $N_2$: population size of multiobjective feature selection; $p_{c2}$: crossover probability of multiobjective feature selection; $p_{m2}$: mutation probability of multiobjective feature selection; a set of nondominated instance subsets.
Output: a set of nondominated feature subsets $\mathcal{S}$ and their corresponding set of rankers $\mathcal{R}$.
(1) Initialize the population [...]
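The intuition behind the adaptive mutation can be sketched as follows. The exact formulas for the adaptive probability, the importance degree, and the redundancy degree are not reproduced here, so everything in this sketch is an assumption: an unselected feature's flip probability is raised by its importance, a selected feature's flip probability is raised by its mean absolute Pearson correlation with the other selected features, and the boost is scaled by a decay value passed in by the caller. The names `pearson`, `redundancy`, and `adaptive_mutation_prob` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def redundancy(i, individual, columns):
    """ASSUMED redundancy: mean |correlation| with the other selected features."""
    others = [j for j, bit in enumerate(individual) if bit == 1 and j != i]
    if not others:
        return 0.0
    return sum(abs(pearson(columns[i], columns[j])) for j in others) / len(others)


def adaptive_mutation_prob(i, individual, columns, importance, p_m, decay):
    """ASSUMED rule: favour adding important and removing redundant features."""
    if individual[i] == 0:
        bonus = importance[i]  # unselected bit: more likely to flip on if important
    else:
        bonus = redundancy(i, individual, columns)  # selected bit: more likely to flip off if redundant
    return min(1.0, max(0.0, p_m + decay * bonus))


if __name__ == "__main__":
    cols = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 1, 3, 2]]  # one value list per feature
    ind = [1, 1, 0]
    for bit in range(3):
        print(bit, adaptive_mutation_prob(bit, ind, cols, [0.2, 0.2, 0.4], 0.1, 0.5))
```

Passing the decay value as an argument keeps the sketch neutral about how the decaying factor depends on the generation number.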

Ensemble Phase.
After the second phase, a set of nondominated solutions (feature subsets) is obtained. To produce a better final feature subset, a biobjective ensemble algorithm, named MOEN, is proposed, where the two optimization objectives are the number of selected features and the value of $1 - RAccuracy$ with the selected features. The basic idea of MOEN is that a better feature subset can be achieved by a weighted combination of these nondominated solutions. To this end, a mixed coding strategy is developed in MOEN, which consists of two parts. The first part uses binary encoding, whose length $d$ denotes the number of different features in the nondominated solutions of MOFS; the $j$-th bit corresponds to the $j$-th feature, where 1 means that this feature is selected and 0 indicates otherwise. The second part utilizes real encoding, and its length equals $|\mathcal{S}| \times d$, where $|\mathcal{S}|$ is the number of Pareto solutions in the second phase. Figure 2 provides an example to illustrate the suggested mixed encoding scheme in detail.
In Figure 2, there is an individual $ind$. The first part of $ind$ has 4 bits, which means there are 4 different features in the nondominated solutions of MOFS. The second part consists of 3 subparts, which indicates that the number of feature subsets is 3. Let us assume they are $S_1$, $S_2$, and $S_3$; thus the $i$-th subpart denotes the ensemble weight for $S_i$ ($i \in \{1, 2, 3\}$). [...] calculate its two objectives. The value of the first objective (the number of selected features) can be easily obtained from part1 of $ind$. To get the value of the second objective (the value of $1 - RAccuracy$ of the selected features), we should first obtain the ranker $R_{ind}$ corresponding to $ind$. To this end, we utilize the nondominated solutions of the second phase and the weights in part2 of $ind$. To be specific, let us suppose $R_{ind} = (w_1, w_2, \ldots, w_d)$; thus each $w_j \in R_{ind}$ ($j = 1, \ldots, d$) is obtained by the following formula:

$$w_j = I(part1_j) \times \sum_{i=1}^{|\mathcal{S}|} part2_{i,j} \times R_{i,j},$$

where $I(part1_j)$ is an indicator function which returns 1 if the $j$-th bit in part1 is 1 and 0 otherwise. $part2_{i,j}$ denotes the value of the $j$-th bit in $part2_i$, and $part2_i$ is the $i$-th subpart of part2. $R_{i,j}$ represents the value of the $j$-th bit in the ranker $R_i$, where $R_i$ is the ranker that corresponds to the output feature subset $S_i$ (Line (15) of Algorithm 3).
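Decoding a mixed-coded individual into a single ranker can be sketched as below. The sketch assumes that each ranker from the second phase is a linear weight vector over the same distinct features (with zero for features outside its subset) and that the combined weight of a feature is the weighted sum of the rankers' weights, switched on or off by the binary part; the function name `decode_ensemble` and this summation form are assumptions for illustration.

```python
def decode_ensemble(part1, part2, rankers):
    """Combine rankers into one weight vector from a mixed-coded individual.

    part1:   d binary genes; 1 keeps feature j in the final subset
    part2:   one list of d real-valued ensemble weights per feature subset
    rankers: one linear weight vector of length d per feature subset
    """
    combined = []
    for j in range(len(part1)):
        if part1[j] == 1:
            w = sum(part2[i][j] * rankers[i][j] for i in range(len(rankers)))
        else:
            w = 0.0  # indicator is 0: feature j is switched off
        combined.append(w)
    return combined


if __name__ == "__main__":
    # Mirrors the shape of the example: 4 distinct features, 3 feature subsets.
    part1 = [1, 0, 1, 1]
    part2 = [[0.5, 0.5, 0.0, 1.0],
             [0.5, 0.5, 1.0, 0.0],
             [0.0, 0.0, 0.0, 0.0]]
    rankers = [[2.0, 1.0, 0.0, 4.0],
               [4.0, 0.0, 3.0, 0.0],
               [1.0, 1.0, 1.0, 1.0]]
    print(decode_ensemble(part1, part2, rankers))
    print("selected features:", sum(part1))
```

The first objective is simply the number of ones in the binary part, while the second objective would be obtained by evaluating the decoded weight vector with the chosen ranking metric.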

Experiments
In this section, we empirically verify the performance of the proposed MOFSRank by comparing it with several state-of-the-art ranking algorithms. To be specific, we first present the experimental setting (including the data sets, comparison algorithms, and evaluation measures) and then report the comparison results between the proposed algorithm and the baselines (including the classical ranking algorithms and the representative feature selection algorithms for learning to rank). Lastly, we discuss the effectiveness of the suggested strategies in MOFSRank.

Experiment Setting
4.1.1. Data Sets. We conduct our experiments on the publicly available LETOR data collections [41], which are considered the benchmark data sets in learning to rank. We select four data sets (NP2004, HP2004, TD2004, and OHSUMED) from LETOR 3.0 and one data set (MQ2008) from LETOR 4.0. Among them, OHSUMED is a three-level ranking set, while the others are all bilevel data sets. The detailed characteristics of those data sets are given in Table 1.
It should be noted that in the LETOR collections, each data set is divided into five folds and each fold contains a training, a validation, and a test set. In the following experiments, we adopt the same splits as LETOR provides and report the results averaged over the five folds.

Comparison Algorithms.
The comparison algorithms used in this paper can be divided into two categories. The first group consists of the classical ranking algorithms provided by LETOR. In this paper, we select RankSVM-Primal [42], RankSVM-Struct [43], ListNet [8], and AdaRank-NDCG [11] as the comparison algorithms, among which the former two belong to the Pairwise approach, while the latter two optimize Listwise loss functions. The second group consists of the recently suggested feature selection algorithms for learning to rank, which include FenchelRank [25], FSMRank [26], and a nonconvex regularization feature selection method for learning to rank proposed by Laporte et al. [27]. It is worth noting that, in [27], the authors presented three algorithms, and we choose the one termed $\ell_{0.5}$, since it has the best mean performance on the LETOR data sets.
For fair comparisons, we adopt the recommended parameter values for all comparison algorithms, as suggested by the authors in their original papers. For the proposed MOFSRank, since it is composed of three sub-MOEAs (MOIS, MOFS, and MOEN), we need to set parameters for each sub-MOEA. The population sizes, crossover probabilities, and mutation probabilities of the three sub-MOEAs are set to [...], where $L$ is the length of the individual in the sub-MOEAs. The maximum numbers of generations for MOIS, MOFS, and MOEN are set to $G_1 = 200$, $G_2 = 300$, and $G_3 = 500$, respectively. For $RAccuracy$ used in the second objective of each sub-MOEA, we adopt NDCG@10, which is a popular criterion to measure the accuracy of a ranking algorithm; we discuss this criterion in detail in the next section.

Evaluation Measures.
On the data sets above (NP2004, HP2004, TD2004, MQ2008, and OHSUMED), we compare the proposed MOFSRank with several baselines, and the results of different algorithms are reported in terms of NDCG [44] and MAP [45], which are the two most widely used metrics in learning to rank. NDCG (Normalized Discounted Cumulative Gain) is often used in the case of multilevel relevance judgments and, for a query, the DCG score at position $k$ is formally defined as

$$DCG@k = \sum_{i=1}^{k} \frac{2^{r(i)} - 1}{\log_2(1 + i)},$$

where $r(i)$ is the relevance label of the $i$-th document in the sorted list. Then the normalized DCG score at position $k$ in the ranking list of documents can be calculated as follows:

$$NDCG@k = \frac{DCG@k}{Z_k},$$

where $Z_k$ is the normalization constant chosen so that the value of NDCG ranges from 0 to 1. In the rest of this paper, we use N@k as the abbreviation of NDCG@k.
Another evaluation metric is MAP (Mean Average Precision), which deals with binary relevance judgments: relevant and irrelevant. First, we introduce the definition of precision at $k$, which denotes the proportion of relevant documents in the top $k$ positions:

$$P@k = \frac{1}{k} \sum_{i=1}^{k} l_i, \tag{11}$$

where $l_i$ is an indicator function: if the document at position $i$ is relevant, $l_i = 1$; otherwise $l_i = 0$. Then the average precision of a given query $q$ is defined as follows:

$$AP(q) = \frac{1}{n_q^{rel}} \sum_{k=1}^{n_q} P@k \cdot l_k, \tag{12}$$

where $n_q$ and $n_q^{rel}$ represent the total number of documents and the number of relevant documents associated with query $q$, respectively.
Based on (11) and (12), MAP can be formally defined as

$$MAP = \frac{1}{|Q|} \sum_{q \in Q} AP(q),$$

where $Q$ is the set of all queries.
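The two evaluation measures can be computed with a short sketch. It assumes the common gain $2^{r} - 1$ and discount $\log_2(1 + i)$ for DCG and normalizes by the DCG of the ideal ordering; the official LETOR evaluation script may differ in such details, so the numbers produced here are illustrative only. All function names are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """DCG@k for relevance labels listed in ranked order (top document first)."""
    return sum((2 ** r - 1) / math.log2(1 + i)
               for i, r in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances, k):
    """NDCG@k: DCG@k divided by the DCG@k of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def average_precision(relevant_flags):
    """AP of one query; relevant_flags[i] is 1 if the i-th ranked document is relevant."""
    hits, total = 0, 0.0
    for k, flag in enumerate(relevant_flags, start=1):
        if flag:
            hits += 1
            total += hits / k  # precision at position k, counted at relevant positions only
    return total / hits if hits else 0.0


def mean_average_precision(per_query_flags):
    """MAP: the mean of the per-query average precisions."""
    return sum(average_precision(f) for f in per_query_flags) / len(per_query_flags)


if __name__ == "__main__":
    print(ndcg_at_k([2, 1, 0], 3))
    print(ndcg_at_k([0, 1], 2))
    print(mean_average_precision([[1, 0, 1], [0, 1]]))
```

A perfectly ordered list gives an NDCG of 1, and moving a relevant document down the list lowers both NDCG and average precision.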

Comparison Results between MOFSRank and Classical Ranking Algorithms.
In the first part of the experiments, we compare our method with several classical ranking algorithms, none of which uses feature selection. Specifically, we compare MOFSRank with RankSVM-Primal, RankSVM-Struct, ListNet, and AdaRank-NDCG on five LETOR data sets. Table 2 presents the performance of the different algorithms, averaged over five folds.
From Table 2, we can find that, on all data sets, the proposed algorithm performs significantly better than the existing classical ranking methods. The comparison results show that MOFSRank achieves the best ranking accuracy on 53 of 55 statistical points, which demonstrates the superiority of MOFSRank on the LETOR data sets and indicates the effectiveness of feature selection for learning to rank.

Comparison Results between MOFSRank and Feature Selection Algorithms for Learning to Rank.
In the second part of the experiments, we are interested in how MOFSRank performs when compared with other feature selection baselines for learning to rank. To this end, we report the comparison results between the proposed algorithm and FenchelRank, FSMRank, and $\ell_{0.5}$, which are all recently suggested ranking feature selection methods with good performance. Tables 3 and 4 report the ranking accuracy and the number of selected features of the different algorithms on the LETOR data sets, averaged over five folds.
It can be observed from Table 3 that the proposed MOFSRank achieves the highest ranking values on most statistical points, which is much better than the existing feature selection baselines for learning to rank. Here, we present a few statistics on different data sets in terms of N@10. On the NP2004, HP2004, and MQ2008 data sets, MOFSRank obtains NDCG values of 0.8543, 0.8622, and 0.2406. Compared to the second best algorithm (FSMRank), its performance increases by 3.2%, 2.9%, and 3.7%, respectively. On the TD2004 data set, the value of N@10 for MOFSRank is 0.3560, which is an 11.1% improvement over the second best algorithm ($\ell_{0.5}$). Similarly, the increase of MOFSRank on the OHSUMED set is 0.1% in comparison with the second best algorithm, FenchelRank. Table 4 presents the number of features selected by the different algorithms on the LETOR data sets, averaged over five folds. From the table, we can find that, on the NP2004, HP2004, TD2004, and OHSUMED data sets, the features selected by MOFSRank are much fewer than those of the other baselines. On the MQ2008 data set, the proposed algorithm achieves the second best performance, with a number of selected features slightly larger than that of the nonconvex feature selection algorithm $\ell_{0.5}$. The statistics in Tables 3 and 4 demonstrate the competitiveness of MOFSRank when compared with other feature selection algorithms for learning to rank.
To further investigate the performance of different feature selection algorithms on the LETOR data sets, in the following, we report in detail the value of N@10 (y-axis) with respect to different numbers of selected features (x-axis), and the results are plotted in Figure 3. Note that, since the three feature selection baselines cannot directly select a given number of features, we adopt the strategy used in [26], which can choose the top $k$ ($k \geq 1$) best features from the whole feature set. From the figures, we can find that, although the NDCG accuracy of different algorithms varies with the number of selected features, our MOFSRank can always achieve the best trade-off between the accuracy and the number of selected features, which indicates the superior performance of the proposed method.

Effectiveness of the Suggested Strategies in MOFSRank.
As mentioned before, in the proposed MOFSRank, three strategies (instance selection, adaptive mutation, and Pareto based ensemble) are suggested and, in the following, we empirically investigate the influence of each of these strategies on the performance of MOFSRank for the LETOR data sets.

It can be easily observed from Table 5 that, on all the LETOR data sets, the suggested instance selection strategy does reduce the number of training instances greatly, especially on the data sets with hundreds of thousands of training instances (TD2004 and OHSUMED), where the ratios of the selected instances are only 0.04 and 0.03. Secondly, we take N@10 as the ranking measure and plot the final nondominated solutions obtained by MOFSRank and MOFSRank-NonIS in objective space in Figure 4. Note that, due to space limitations, in the following experiments we only list the results on one LETOR 3.0 data set (NP2004) and one LETOR 4.0 data set (MQ2008); the results on the other LETOR data sets are similar. As can be seen from Figure 4, on both data sets, MOFSRank obtains better nondominated solutions than MOFSRank-NonIS, which demonstrates the effectiveness of the suggested instance selection strategy in MOFSRank.

Effectiveness of the Adaptive Mutation Strategy.
In the second phase of MOFSRank, an adaptive mutation strategy is developed, which can enhance the performance of MOFSRank. To confirm this, we compare the proposed method with MOFSRank-NonAM, where the adaptive mutation strategy is removed from the original MOFSRank. The final nondominated solutions obtained by MOFSRank and MOFSRank-NonAM in objective space for the LETOR data sets are plotted in Figure 5, from which we can find that, compared with MOFSRank-NonAM, MOFSRank achieves better nondominated solutions on the experimental sets, which indicates the effectiveness of the adaptive mutation strategy.

Conclusion
In this paper, we have proposed a multiobjective evolutionary algorithm, termed MOFSRank, for feature selection in ranking. In MOFSRank, an MOEA for instance selection (MOIS) has been suggested, where the informative instances are chosen from the original training set, which makes the following feature selection more effective and efficient. Then a multiobjective feature selection (MOFS) algorithm with an adaptive mutation is performed on these chosen instance subsets, which can obtain the features with high ranking accuracy and low redundancy. Finally, a multiobjective ensemble (MOEN) algorithm has been developed to integrate the Pareto solutions of MOFS, by which the performance of MOFSRank can be further improved. Experimental results on LETOR data sets have demonstrated the competitiveness of the proposed algorithm.
There still remains some interesting work related to MOFSRank that deserves further investigation. The proposed MOFSRank has shown that MOEAs are a promising method for feature selection in learning to rank; in this paper, we mainly focus on the Pairwise ranking approach. In the future, we plan to design feature selection algorithms for other types of learning to rank approaches, such as the Listwise approach. In addition, since MOFSRank adopts NSGA-II as its framework, it is also interesting to combine the proposed method with other MOEA frameworks, such as MOEA/D [46], SPEA2 [47], and AR-MOEA [48].

Figure 3: The NDCG accuracy of four feature selection algorithms on LETOR sets with different feature numbers.

Figure 4: The final nondominated solutions obtained by MOFSRank and MOFSRank-NonIS on LETOR data sets in objective space.

Figure 5: The final nondominated solutions obtained by MOFSRank and MOFSRank-NonAM on LETOR data sets in objective space.

4.3.3. Effectiveness of Pareto Based Ensemble Strategy. In the third phase of MOFSRank, to obtain a better feature subset, a Pareto based ensemble strategy is suggested, where the Pareto solutions of the second phase are combined. In order to verify the effectiveness of this ensemble strategy, we compare MOFSRank with MOFSRank-NonPE. The only difference between them is that MOFSRank-NonPE does not include the Pareto based ensemble operation. The experimental results of the two algorithms on the LETOR data sets are plotted in Figure 6, from which we can clearly find that, with the suggested ensemble strategy, the proposed algorithm achieves better nondominated solutions than MOFSRank-NonPE. This demonstrates the effectiveness of the suggested Pareto based ensemble strategy.

Table 1: Characteristics of the LETOR data sets.

Table 3: The comparison results between MOFSRank and feature selection baselines for ranking on LETOR data sets, averaged on five folds.

Table 4: The number of selected features between MOFSRank and feature selection baselines for ranking on LETOR data sets, averaged on five folds.

Table 5: The training instances of MOFSRank and MOFSRank-NonIS on LETOR data sets.

Instance Selection Strategy. In the first phase of MOFSRank, an instance selection strategy is suggested, which can reduce the number of training instances and improve the performance of MOFSRank. To verify this, we compare the proposed algorithm with MOFSRank-NonIS, which is the same as MOFSRank except that it excludes the instance selection strategy and uses the original Pairwise instances in the training set. The comparison results on the LETOR data sets are shown from two aspects. Firstly, we present the real training instances of the two algorithms in Table 5, where #Ins of MOFSRank and #Ins of MOFSRank-NonIS denote the numbers of real training instances of MOFSRank and MOFSRank-NonIS. It can be easily observed from Table