Relevance Feedback Based Query Expansion Model Using Borda Count and Semantic Similarity Approach

Pseudo-Relevance Feedback (PRF) is a well-known method of query expansion for improving the performance of information retrieval systems. Not all terms in the PRF documents are useful for expanding the user query, so selecting proper expansion terms is essential for improving system performance. Individual query expansion terms selection methods have been widely investigated for this purpose, and each has its own strengths and weaknesses. To overcome the weaknesses and exploit the strengths of the individual methods, we use multiple terms selection methods together. In this paper, we first explore the possibility of improving overall performance using individual query expansion terms selection methods. Second, we use the Borda count rank aggregation approach to combine multiple query expansion terms selection methods. Third, after applying the Borda count rank combination, we use a semantic similarity approach to select terms that are semantically similar to the query. Our experimental results demonstrate that the proposed approaches achieve a significant improvement over the individual terms selection methods and related state-of-the-art methods.


Introduction
Retrieving relevant documents that fulfill the user's need is one of the major challenges in information retrieval (IR) systems. One of the most feasible and successful techniques for handling this problem is PRF based query expansion (QE), where some of the top documents retrieved in the first iteration are used to expand the original user query. This calls for automatic PRF based QE techniques that can reformulate the original user query without user intervention. In recent years, the volume of data available online has increased dramatically, while the number of terms in a typical search query has remained very small. According to the authors of [1], the average query length was 2.30 words, and about the same query length was reported ten years later by Rijsbergen [2]. While there has been a slight increase in the number of long queries (of five or more words), the most prevalent queries are still those of one, two, and three words. In this situation, the need and the scope for automatic query expansion (AQE) have increased, but AQE has some problems.
The main problem of AQE is that it cannot work efficiently due to the inherent sparseness of the user query terms in the high dimensional corpus. Another problem is that not all the terms of the top retrieved documents (feedback documents) are important for QE. Some of the QE terms may be redundant or irrelevant, and some may even misguide the result, especially when there are more irrelevant QE terms than relevant ones. QE terms selection aims to remove redundant and irrelevant terms from the term pool (the top retrieved documents used as feedback documents for selecting QE terms), and the selected set of QE terms should contain sufficient and reliable information about the original document. Thus, QE terms selection should not only reduce the high dimensionality of the feedback document corpus (term pool) but also provide a better understanding of the documents, in order to improve the AQE result. Different feedback based QE terms selection methods have been widely used in AQE, and it has been reported that QE terms selection methods can improve the efficiency and accuracy of the IR model.
Traditional QE terms selection methods for AQE are either corpus statistics based or term association based, 2 Computational Intelligence and Neuroscience depending on the used algorithm in the retrieval model. Term association based terms selection methods, such as mutual information [3] and cooccurrence information [2,4], estimate the goodness of each term based on the occurrence of terms in feedback documents (term pool). Corpus statistics based QE terms selection methods, such as Kullback-Leibler Divergence [5], information gain [6], and Robertson selection value [7], estimate the goodness of each term based on the distribution of terms across the corpus.
Most studies on QE terms selection have focused on improving the performance of individual QE terms selection methods. However, it remains a challenge to develop an individual QE terms selection method that outperforms the other methods in most cases. Moreover, as multiple QE terms selection methods are available, it is natural to combine them for better performance by taking advantage of their individual strengths. In the past, experiments on combining multiple query terms selection methods have been conducted, but no theoretical analysis has been done. Combinations of two uncorrelated and high-performing QE terms selection methods have been tested [8]. The author of [5] developed a naive combination of cooccurrence and probabilistic methods, which obtained better results than either method separately. This result confirms that the information provided by each approach is of a different nature and, therefore, can be used in a combined manner. The author of [9] discussed various aggregation methods for ranked search results, such as Borda and Condorcet, and confirmed the improvement in search quality.
After applying the Borda count based rank combining approach, we obtain a set of expansion terms; we observed that some of these expansion terms are not semantically related to the user query. Therefore, it became necessary to check the semantic relatedness of the selected expansion terms to the user query in order to avoid the query drifting problem. For this purpose, we use the concept of semantic similarity with the help of WordNet. Semantics based approaches have also been used in the literature; for example, concept level approaches are used in sentiment analysis [10], representation of words and phrases [11], finding patterns for concept level sentiment analysis [12], and analysis of emotions in natural language [13]. Other works have used the concept of semantic similarity for IR with QE. The authors of [14,15] proposed a QE technique using WordNet lexical chains and the hypernym/hyponym and synonymy relations in WordNet. Lexical chains were used as the basic expansion rules, and the authors confirmed that QE can improve the query performance dramatically. In [16], Liu et al. explained the use of the WordNet lexical ontology for both expanding the query and selecting the proper sense of expansion terms and achieved a reasonable performance improvement. After applying semantic filtering, we obtain a refined set of additional expansion terms. A reweighting approach is then required that gives higher weight to the original query terms than to the additional expansion terms. The authors of [17] present a new method for query reweighting for document retrieval, which reweights a user's query vector, based on the user's relevance feedback, to improve the performance of document retrieval systems.
The experiments with the proposed model are performed on two well-known benchmark datasets, namely, FIRE and TREC-3. For performance evaluation, the proposed model has been compared with Okapi-BM25 [18] and Aguera and Araujo's model [5]. The proposed methods increase the precision and recall rates of IR systems for document retrieval. They also achieve a significantly higher average recall rate, average precision rate, and $F$-measure on both datasets.
The major contributions of this work are summarized as follows:
(1) First, we present the IG, cooccurrence, KLD, and RSV terms selection methods for PRF based AQE, together with an experimental analysis of these methods in terms of the evaluation parameter scores.
(2) Second, we propose using the popular Borda rank aggregation method to combine the different ranked lists of expansion terms selected by the IG, cooccurrence, KLD, and RSV methods discussed in step (1).
(3) Third, we propose a semantic similarity approach to filter out, from the terms obtained in step (2), the expansion terms that are irrelevant or redundant in the context of the user query. The additional expansion terms are then used after applying the reweighting approach.
(4) Finally, a paired $t$-test is conducted between our proposed approaches and the other models considered as baselines.
The organization of this paper is as follows. In Section 2, we briefly introduce PRF based document selection and four individual QE terms selection methods. Section 3 explains our proposed model and its algorithm, with the Borda count based ranking approach and the semantic similarity based approach. Section 4 presents the experimental results of the different QE terms selection methods and compares them with each other; the results of our proposed approaches are then presented and compared with the baseline approaches in terms of precision, recall, and $F$-measure on both the FIRE and TREC datasets. Finally, Section 5 presents the conclusion and future research directions.

PRF Document Selection and Query Expansion Terms Selection Methods
In this section, we briefly discuss PRF based QE. In Section 2.   Tables 2 and 3).
We used the Okapi-BM25 similarity measure to select an initial set of retrieved documents, which is more efficient than the traditionally used cosine similarity measure. Figures 1 and 2 show the architecture of our proposed AQE retrieval model based on rank aggregation and semantic filtering schemes. To construct the term pool, we first retrieve a number of top documents for the query using a matching function. In our problem, we used the Okapi-BM25 matching function to retrieve the first relevant documents. The Okapi-BM25 measure is given by [18]

$$\mathrm{Okapi}(Q, D_i) = \sum_{t \in Q} w \, \frac{(k_1 + 1)\, tf}{K + tf} \cdot \frac{(k_3 + 1)\, qtf}{k_3 + qtf},$$

where $Q$ is the query that contains the terms $t$, $tf$ is the term frequency of term $t$ in the $i$th document $D_i$, and $qtf$ is the term frequency in query $Q$. Next, $k_1$, $b$, and $k_3$ are constant parameters; the values used in our experiments follow Robertson et al. [18] ($k_1 = 1.2$, $b = 0.75$, and $k_3 = 7.0$):

$$w = \log \frac{N - n + 0.5}{n + 0.5}, \qquad K = k_1 \left( (1 - b) + b \, \frac{dl}{avdl} \right),$$

where $N$ is the number of documents and $n$ is the number of documents containing the term $t$. Parameters $dl$ and $avdl$ are the document length and the average document length. Once the top relevant documents have been retrieved with the Okapi-BM25 method discussed in this section, all the unique terms of the top documents are selected to form the term pool, or candidate terms set, denoted by Tp. The terms are then ranked by one of the several scoring measures available for ranking terms on the basis of their appropriateness for expansion. These scoring measures are presented in the coming subsections.
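The first-pass retrieval described above can be sketched as follows. This is a minimal illustration that assumes the standard Robertson form of Okapi-BM25 with the parameter values quoted in the text; the function name `bm25_score`, the toy corpus, and the whitespace-tokenized documents are illustrative assumptions, not part of the original system.

```python
import math

def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avdl,
               k1=1.2, b=0.75, k3=7.0):
    """Okapi-BM25 score of one document for one query (assumed standard form)."""
    dl = len(doc_terms)                          # document length
    K = k1 * ((1 - b) + b * dl / avdl)           # length normalization
    score = 0.0
    for t in set(query_terms):
        tf = doc_terms.count(t)                  # term frequency in the document
        if tf == 0:
            continue
        qtf = query_terms.count(t)               # term frequency in the query
        n = doc_freq.get(t, 0)                   # documents containing t
        w = math.log((num_docs - n + 0.5) / (n + 0.5))
        score += w * ((k1 + 1) * tf / (K + tf)) * ((k3 + 1) * qtf / (k3 + qtf))
    return score

# Hypothetical toy corpus, used only to show the call pattern.
docs = [
    ["query", "expansion", "borda"],
    ["semantic", "similarity"],
    ["borda", "count", "rank"],
    ["news", "article"],
    ["stock", "market", "report"],
]
doc_freq = {}
for d in docs:
    for t in set(d):
        doc_freq[t] = doc_freq.get(t, 0) + 1
avdl = sum(len(d) for d in docs) / len(docs)

query = ["borda", "count"]
scores = [bm25_score(query, d, doc_freq, len(docs), avdl) for d in docs]
ranked = sorted(range(len(docs)), key=lambda i: -scores[i])
```

The unique terms of the top entries of `ranked` would then form the term pool Tp. Note that the weight $w$ becomes negative for terms occurring in more than half of the documents, which is a known property of this form of the formula.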

Kullback-Leibler Divergence Based Query Expansion.
The Kullback-Leibler Divergence (KLD) [5] is well known in information theory [4]. The KLD based approach has been used in natural language and speech processing applications based on statistical language modeling in IR (Robertson, 1990). KLD can be used as a term scoring function based on the difference between the distribution of terms in the collection of top retrieved relevant documents and in the entire document collection. Thus, the following equation can be used to find the KLD score of a candidate expansion term $t$:

$$\mathrm{KLD}(t) = P_R(t) \log \frac{P_R(t)}{P_C(t)}, \tag{3}$$

where $P_R(t)$ is the probability of presence of term $t$ in the top retrieved document collection $R$ and $P_C(t)$ is the probability of presence of term $t$ in the entire document collection $C$. Equation (3) is used to find the KLD score of the candidate expansion terms. Some top scored candidate terms are used to expand the original user query. This type of query expansion is called KLD based query expansion (KLDBQE).
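As a rough sketch of KLD based term scoring, the snippet below uses the score $P_R(t)\log(P_R(t)/P_C(t))$ and estimates both probabilities as relative term frequencies. That estimator, the function name `kld_scores`, and the toy documents are assumptions made for illustration; the paper's exact probability estimates may differ.

```python
import math
from collections import Counter

def kld_scores(feedback_docs, collection_docs):
    """Score each feedback term by P_R(t) * log(P_R(t) / P_C(t)).

    Assumption: P_R and P_C are relative term frequencies in the feedback
    set R and in the whole collection C, respectively.
    """
    r_counts = Counter(t for d in feedback_docs for t in d)
    c_counts = Counter(t for d in collection_docs for t in d)
    r_total = sum(r_counts.values())
    c_total = sum(c_counts.values())
    scores = {}
    for t, freq in r_counts.items():
        p_r = freq / r_total
        p_c = c_counts[t] / c_total
        if p_c > 0:                      # skip terms unseen in the collection
            scores[t] = p_r * math.log(p_r / p_c)
    return scores

# Hypothetical data: the feedback documents are a subset of the collection.
feedback = [["borda", "rank", "the"], ["borda", "the"]]
collection = feedback + [["the", "news"], ["the", "weather"], ["the", "sports"]]
kld = kld_scores(feedback, collection)
top_terms = sorted(kld, key=kld.get, reverse=True)
```

Terms that are concentrated in the feedback documents ("borda") score high, while terms that are equally or more frequent in the whole collection ("the") score at or below zero.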

Cooccurrence Based Query Expansion.
The most feasible method for selecting the QE terms is to initially score the terms on the basis of their cooccurrence with the original user query terms. The concept of term cooccurrence has been used since the 90s for identifying some kind of relationship among terms in a document set [4]. According to van Rijsbergen [2], the idea of using cooccurrence statistics is to find the relationship between the document corpus and the query terms, and the author used this idea to expand the original user queries.
We can use $\mathrm{co}(t_i, t_j)$ to quantify the strength of the cooccurrence based association between two terms $t_i$ and $t_j$. The following are some well-known cooccurrence coefficients; $\mathrm{co}(t_i, t_j)$ can be given by any one of them:

$$\mathrm{Jaccard}(t_i, t_j) = \frac{d_{ij}}{d_i + d_j - d_{ij}}, \qquad \mathrm{Dice}(t_i, t_j) = \frac{2\, d_{ij}}{d_i + d_j}, \qquad \mathrm{Cosine}(t_i, t_j) = \frac{d_{ij}}{\sqrt{d_i\, d_j}},$$

where $d_i$ and $d_j$ are the numbers of documents that contain terms $t_i$ and $t_j$, respectively, and $d_{ij}$ is the number of documents that contain both terms $t_i$ and $t_j$ together. We can use these cooccurrence coefficient values to measure the similarity between the user query terms and a candidate expansion term $c$. However, adding such highly similar terms to the user query terms can cause query drift. To handle this problem, one can use the concept of inverse document frequency (idf). Combining the idf value of the candidate term with its normalized cooccurrence coefficient with the user query terms gives the codegree coefficient of the candidate term, explained in (8). Consider

$$\mathrm{Codegree}(q_i, c) = \log_{10}\bigl(\mathrm{co}(q_i, c) + 1\bigr) \cdot \frac{\mathrm{idf}(c)}{\log_{10}(D)}, \qquad \mathrm{idf}(c) = \log_{10}\frac{N}{N_c}, \tag{8}$$

where $N$ is the number of documents in the corpus, $D$ is the number of top ranked retrieved documents considered, $q_i$ is the $i$th query term, $c$ is the candidate expansion term, and $N_c$ is the number of documents in the corpus that contain term $c$. Here $\mathrm{co}(q_i, c)$ is the cooccurrence coefficient between $q_i$ and $c$ in the top ranked documents, that is, $\mathrm{Jaccard}(q_i, c)$.
Equation (8) can be used for finding the similarity of a term $c$ with an individual query term $q_i$. To obtain a value measuring how good $c$ is for the whole query $Q$, its codegree values with all the individual original query terms in the query must be combined, which is done in (9). Finally, (9) is used to find the cooccurrence coefficient score of the candidate expansion terms. This type of query expansion is called Cooccurrence Based Query Expansion (CBQE).
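The cooccurrence coefficients and the codegree of a single (query term, candidate term) pair can be sketched as below. The Jaccard, Dice, and cosine forms are the standard document-count definitions, and the codegree follows the form $\log_{10}(\mathrm{co}+1)\cdot \mathrm{idf}(c)/\log_{10}(D)$; all function names and the toy data are illustrative assumptions. The combination over the whole query is deliberately left out because its exact form is not shown here.

```python
import math

def doc_counts(docs, ti, tj):
    """Return (d_i, d_j, d_ij): documents containing ti, tj, and both."""
    d_i = sum(1 for d in docs if ti in d)
    d_j = sum(1 for d in docs if tj in d)
    d_ij = sum(1 for d in docs if ti in d and tj in d)
    return d_i, d_j, d_ij

def jaccard(d_i, d_j, d_ij):
    denom = d_i + d_j - d_ij
    return d_ij / denom if denom else 0.0

def dice(d_i, d_j, d_ij):
    denom = d_i + d_j
    return 2 * d_ij / denom if denom else 0.0

def cosine(d_i, d_j, d_ij):
    denom = math.sqrt(d_i * d_j)
    return d_ij / denom if denom else 0.0

def codegree(q_i, c, feedback_docs, num_corpus_docs, corpus_doc_freq):
    """Codegree of candidate c with query term q_i, using Jaccard as co()."""
    co = jaccard(*doc_counts(feedback_docs, q_i, c))
    idf_c = math.log10(num_corpus_docs / corpus_doc_freq[c])
    D = len(feedback_docs)               # must be > 1, since log10(1) = 0
    return math.log10(co + 1) * idf_c / math.log10(D)

# Hypothetical feedback documents represented as term sets.
feedback = [
    {"query", "expansion"},
    {"query", "expansion", "term"},
    {"query", "term"},
    {"borda"},
]
counts = doc_counts(feedback, "query", "expansion")
cd = codegree("query", "expansion", feedback,
              num_corpus_docs=100, corpus_doc_freq={"expansion": 10})
```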

Information Gain Based Query Expansion (IGBQE).
The information gain (IG) coefficient measures the degree of class prediction given by the presence or absence of a term in a document set [6]. Let $C = \{c_1, c_2, \ldots, c_{|C|}\}$ be the set of classes; in our case there are two classes: first, the set of initially retrieved relevant documents for a user query, called PRF documents; second, the set of nonrelevant documents for the same query. The information gain coefficient of a term $t$ is then defined as follows:

$$\mathrm{IG}(t) = -\sum_{i=1}^{|C|} P(c_i) \log P(c_i) + P(t) \sum_{i=1}^{|C|} P(c_i \mid t) \log P(c_i \mid t) + P(\bar{t}) \sum_{i=1}^{|C|} P(c_i \mid \bar{t}) \log P(c_i \mid \bar{t}), \tag{10}$$

where $P(t)$ is the probability that term $t$ occurs, $\bar{t}$ means that term $t$ does not occur (i.e., $P(\bar{t}) = 1 - P(t)$), $P(c_i)$ is the probability of the $i$th class value, $P(c_i \mid t)$ is the conditional probability of the $i$th class value given that $t$ occurs, and $P(c_i \mid \bar{t})$ is the conditional probability of the $i$th class value given that $t$ does not occur. The value of the information gain coefficient is used to measure the importance of a term with respect to all the classes. The terms of the term pool, or top retrieved feedback documents, are ranked based on the value obtained from (10). Some candidate terms with high IG scores are selected for expanding the user query. This type of query expansion is called IG Based Query Expansion (IGBQE).
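A small sketch of the IG computation for the two-class case (PRF documents versus nonrelevant documents) is given below. It assumes the usual entropy-based form of information gain, estimates all probabilities from document counts, and uses base-2 logarithms; the log base, the function names, and the toy documents are assumptions.

```python
import math

def _plogp(p):
    """p * log2(p), with the convention 0 * log(0) = 0."""
    return p * math.log2(p) if p > 0 else 0.0

def information_gain(term, relevant_docs, nonrelevant_docs):
    """IG of a term with respect to the relevant / nonrelevant classes."""
    classes = [relevant_docs, nonrelevant_docs]
    n_total = sum(len(c) for c in classes)
    n_with = [sum(1 for d in c if term in d) for c in classes]
    n_t = sum(n_with)                    # documents containing the term
    n_not = n_total - n_t                # documents not containing it
    p_t = n_t / n_total
    # Class entropy term: -sum P(c_i) log P(c_i)
    ig = -sum(_plogp(len(c) / n_total) for c in classes)
    # Term present: P(t) * sum P(c_i | t) log P(c_i | t)
    if n_t:
        ig += p_t * sum(_plogp(k / n_t) for k in n_with)
    # Term absent: P(not t) * sum P(c_i | not t) log P(c_i | not t)
    if n_not:
        ig += (1 - p_t) * sum(
            _plogp((len(c) - k) / n_not) for c, k in zip(classes, n_with))
    return ig

# Hypothetical documents as term sets.
relevant = [{"a", "x"}, {"a"}]
nonrelevant = [{"y"}, {"z"}]
ig_a = information_gain("a", relevant, nonrelevant)   # perfect class predictor
ig_x = information_gain("x", relevant, nonrelevant)   # partial predictor
ig_q = information_gain("q", relevant, nonrelevant)   # term never occurs
```

A term that separates the two classes perfectly receives the full class entropy (1 bit for two equally sized classes), while a term that never occurs carries no information.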

Robertson Selection Value Based Query Expansion.
The RSV method [7] is based on Swets' model of IR system performance [19]. The system is assumed to retrieve items by ranking them according to some measure of association with the query. The principal idea of the Swets theory is to examine the distribution of values of this match function over the document collection. More specifically, it considers two such distributions, one for the relevant documents and one for the nonrelevant ones. If the retrieval system is any good, the two distributions will be different; in particular, the match function values will generally be higher for relevant documents than for nonrelevant ones.
In general, the more the two distributions are separated, the better the performance of the system will be. Other things being equal, the higher the difference $d = \mu_R - \mu_{NR}$ between the means of the two distributions, the better the performance. Actually, the measure of performance proposed by Swets and an alternative proposed by Brookes [20] can both be expressed as $d$ normalized by some function of the standard deviations of the distributions. However, these measures are associated with the assumption that the distributions are normal, which would not be an appropriate assumption for the present situation. So the present argument is based on the use of $d$, unnormalized, as a simple measure of performance.
If the weight of candidate term $t$ is $w_t$, then those documents that contain the term will have $w_t$ added to their match function values. For the case of query expansion, we consider the candidate term $t$ with weight $w_t$. The new means of the relevant and nonrelevant document classes are denoted by $\mu'_R$ and $\mu'_{NR}$, respectively.
If $tr$ and $tnr$ denote the probabilities that the term is present in the relevant and nonrelevant document collections, respectively, and $w_t$ is the weight of the candidate term $t$, the new mean $\mu'_R$ of the relevant documents is given as follows:

$$\mu'_R = \mu_R + tr \cdot w_t.$$

Similarly, the new mean $\mu'_{NR}$ of the nonrelevant documents is given as follows:

$$\mu'_{NR} = \mu_{NR} + tnr \cdot w_t.$$

And the effectiveness is defined as

$$d' = \mu'_R - \mu'_{NR}.$$

If differences between the two distributions are very low, then

$$d' = d + w_t\,(tr - tnr),$$

where $d = \mu_R - \mu_{NR}$ is the original difference of $\mu_R$ and $\mu_{NR}$. Finally, the weight of the candidate expansion term $t$ is given as follows:

$$\mathrm{RSV}(t) = w_t\,(tr - tnr), \tag{15}$$

where $tr$ is the probability of the expansion term in the relevant documents and $tnr$ is the probability of the expansion term in the nonrelevant documents or corpus. Equation (15) can be used to find the RSV score of the candidate expansion terms. Some top scored candidate terms are used to expand the original user query. This type of query expansion is called RSV Based Query Expansion (RSVBQE).
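The RSV score can be sketched in a few lines, assuming the form $w_t\,(tr - tnr)$. In this illustration $tr$ and $tnr$ are estimated as the fraction of feedback documents and of corpus documents that contain the term; that estimator, the function name `rsv_scores`, and the uniform term weights are assumptions.

```python
def rsv_scores(term_weights, feedback_docs, collection_docs):
    """RSV(t) = w_t * (tr - tnr) for every term with a known weight w_t.

    Assumption: tr and tnr are document fractions in the feedback set
    (treated as relevant) and in the whole collection, respectively.
    """
    scores = {}
    for t, w_t in term_weights.items():
        tr = sum(1 for d in feedback_docs if t in d) / len(feedback_docs)
        tnr = sum(1 for d in collection_docs if t in d) / len(collection_docs)
        scores[t] = w_t * (tr - tnr)
    return scores

# Hypothetical documents as term sets; all term weights set to 1.0.
feedback = [{"borda", "rank"}, {"borda"}]
collection = feedback + [{"rank"}, {"news"}]
rsv = rsv_scores({"borda": 1.0, "rank": 1.0}, feedback, collection)
```

A term that is as common in the corpus as in the feedback documents ("rank") receives a score of zero, so it would not be selected for expansion.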

Proposed Borda and Semantic Similarity Based Model
Our proposed work can be categorized mainly into two parts:
(i) First, score combination of the different individual approaches using the Borda count approach.
(ii) Second, application of the semantic similarity approach for removing noisy or irrelevant terms.
PRF based QE methods select the candidate terms for expanding the user query from an initially retrieved set of documents. We have used the efficient Okapi-BM25 similarity measure for selecting the initial set of retrieved documents, which is more efficient than the traditionally used cosine similarity measure. Figure 1 shows the architecture of our proposed AQE retrieval model based on the Borda count rank combination and semantic similarity approaches.
Initially, we use the cooccurrence approach, in which words present around the query terms in the top feedback documents are used for selecting expansion terms; we call it CBQE. In this approach, terms with high cooccurrence values form a term pool of candidate terms. Further, the information gain method is used to score the terms of the term pool, and some high scored terms are used as query expansion terms; this is called IGBQE. Next, the concepts behind the Kullback-Leibler Divergence (KLD) and the Robertson selection value (RSV) are used to score the terms of the term pool, and the high scored terms are used for expanding the original user query; these QE methods are called KLDBQE and RSVBQE.
Further, the well-known Borda count rank combining scheme is used to combine the multiple term ranks obtained from the cooccurrence, IG, KLD, and RSV methods. This rank aggregation method produces a single combined list of candidate terms ordered by Borda score from high to low. The top candidate terms selected by this approach are used to expand the user query; this is called Borda based query expansion (BBQE). The set of candidate terms obtained after applying Borda rank aggregation contains some terms that are noisy or semantically irrelevant to the query. If we include these noisy terms in the process of query expansion, it may lead to the problem of query drifting. Therefore, the concept of semantic similarity is used to filter out the semantically irrelevant terms obtained from BBQE before query reformulation or expansion; this is called Borda and semantic based query expansion (BSBQE). Finally, the reformulated query with reweighted expansion terms is submitted to the search engine, and a list of ranked documents is retrieved as the final result for the user query.
(1) Apply Okapi-BM25 similarity function for retrieving the ranked relevant document with respect to a user query.
(2) All the unique terms of top retrieved documents obtained from step (1) are selected to form term pool.
(3) Different methods are used to score the unique terms of the term pool to form candidate terms; these are listed below: (i) Calculate IG score. (ii) Calculate cooccurrence score. (iii) Calculate KLD score. (iv) Calculate RSV score.
Top scored candidate terms obtained from substeps (i) to (iv) of step (3) are used to expand the user query and called IGBQE, CBQE, KLDBQE, and RSVBQE, respectively.
(4) Borda rank aggregation methods are used to combine different candidate term ranks obtained from substeps (i) to (iv) of step (3).
(i) Borda rank aggregation produced a ranked list of candidate terms.
Some top candidate terms obtained from substep (i) of step (4) are used to expand the user query and called BBQE.
(5) Semantic filtering approach is used to filter out semantically irrelevant expansion terms from expansion terms set obtained from BBQE approach. After applying semantic filtering, this Borda and semantic based approach is called BSBQE.

Proposed Borda Based Query Expansion.
After applying the different query expansion terms selection methods, we obtain a separate ranked list of QE terms from each method. We then need a rank combination approach that can merge these ranked lists into a single list of terms, from which some top scored terms are selected as QE terms for the user query. In this section, we briefly describe the rank combination methods based on rank positions that we used in our proposed work. Social choice theory [21] is a field of study in which voting algorithms are used as a technique for making social or group decisions. The algorithms used in this section are based on voting in elections.

Borda Count Ranks Combining Approach.
According to the Borda rank combining approach, each voter has their own preference list of candidates. For each voter, the top-ranked candidate obtains $n$ points, the second-ranked candidate obtains $n - 1$ points, the third-ranked candidate obtains $n - 2$ points, and so on, where $n$ is the number of candidates. The sum of the points obtained from all voters gives the final points of each candidate. If some candidates are left unranked by a voter (candidate terms selection method), the remaining points are divided evenly among the unranked candidates. The candidate with the most points wins [22].
Example 1. Here we used an example to illustrate the working of Borda ranks combining approach. Here, we assume a combined single query expansion terms selection method with five following ranked query expansion terms selection methods, which ranked four candidate terms , , , and as follows: Candidate terms selection method 1: , , , and .
Candidate terms selection method 5: , . Now we denote the score of each candidate term by candidate score ( ).
Thus, the final ranking of candidate terms is , , , and .
Some high ranked candidate terms selected by Borda scheme are used for expanding the user query: this type of QE is called Borda based query expansion (BBQE).
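The Borda rank combination can be sketched as follows. The candidate names and the five rankings below are hypothetical values chosen only to illustrate the scoring rule, including a voter that ranks just two of the four candidates; they are not the rankings of Example 1. The function name `borda_combine` is likewise an assumption.

```python
from collections import defaultdict

def borda_combine(rankings, candidates):
    """Combine ranked lists with the Borda count.

    With n candidates, a voter gives n points to its first choice, n - 1 to
    its second, and so on; points left over for unranked candidates are
    shared evenly among them.
    """
    n = len(candidates)
    points = defaultdict(float)
    for ranking in rankings:
        for pos, cand in enumerate(ranking):
            points[cand] += n - pos
        unranked = [c for c in candidates if c not in ranking]
        if unranked:
            remaining = sum(range(1, n - len(ranking) + 1))
            share = remaining / len(unranked)
            for c in unranked:
                points[c] += share
    order = sorted(candidates, key=lambda c: (-points[c], c))
    return order, dict(points)

# Hypothetical rankings from five term selection methods (voters).
candidates = ["a", "b", "c", "d"]
rankings = [
    ["a", "c", "b", "d"],
    ["b", "c", "a", "d"],
    ["c", "a", "b", "d"],
    ["c", "a", "d", "b"],
    ["c", "b"],              # partial list: a and d share the remaining 2 + 1 points
]
order, points = borda_combine(rankings, candidates)
# order  -> ['c', 'a', 'b', 'd']
# points -> a: 13.5, b: 12.0, c: 18.0, d: 6.5
```

In BBQE the voters would be the IG, cooccurrence, KLD, and RSV ranked lists, and the top entries of `order` would be taken as expansion terms.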

Proposed Semantic Filtering Based Query Expansion.
A list of candidate terms is obtained from the rank combination module. In this list, we observed that some candidate expansion terms are not related to the original user query. If we use these candidate terms as query expansion terms, irrelevant documents may be retrieved. Thus, it is necessary to filter out these irrelevant candidate terms. To eliminate the irrelevant and redundant candidate expansion terms, we used the concept of semantic similarity, which captures the terms in the candidate list that are semantically related to the query terms and filters out the semantically unrelated terms. For applying semantic similarity, we used the linguistic ontology WordNet as background knowledge. The basic idea of semantic similarity is that if a candidate term has some kind of semantic relation (i.e., synonym, hypernym) with a query term, then it is appropriate for query expansion. A number of semantic similarity modules can be used to find the semantic similarity between two words, terms, or concepts (such as a query term and a candidate term). The popular and feasible semantic similarity modules/approaches are Leacock and Chodorow (Lch) [23], Resnik [24], and Wu and Palmer [25], each of which takes two words/concepts as input and returns the semantic similarity between them. We used the Leacock-Chodorow (Lch) semantic similarity measure in our work and found the results encouraging.
In this paper, we have also tried to handle sentiment and emotions to some extent, using the approach presented in [12]. For this purpose, sentiment words are first selected from the user query using background knowledge (Senti-WordNet), and these words are then expanded by adding other related sentiment and emotional words.
The Lch method defines a semantic similarity measure based on the shortest path length $\mathrm{length}(c_1, c_2)$ between two concepts or terms $c_1$ and $c_2$, scaled by twice the maximum depth of the hierarchy:

$$\mathrm{Sim}_{\mathrm{Lch}}(c_1, c_2) = -\log \frac{\mathrm{length}(c_1, c_2)}{2\, Dp},$$

where $Dp$ is the maximum depth (i.e., 12 in the case of WordNet-3.0); note that, in practice, we add 1 to both $\mathrm{length}(c_1, c_2)$ and $2Dp$ to avoid $\log(0)$ when the shortest path length is 0. Our semantic filtering based approach BSBQE takes the candidate terms from the BBQE approach as input and filters the semantically irrelevant terms from the candidate terms list. We give a new formula, (17), for finding the semantically similar expansion terms in the candidate terms set. It gives the semantic similarity between a candidate term and the query terms as the score $\mathrm{SemSim}(c, Q)$, where $Q$ is the set of all query terms, $c$ is a single candidate term, and $q_i$ is the $i$th term of the query. Finally, the noisy or irrelevant terms of BBQE are filtered by this semantic approach, and this semantic based approach is called BSBQE. The algorithmic steps of our proposed semantic based QE approach are listed in Algorithm 2.
(1) Once the candidate terms sets are obtained from step (4) of Algorithm 1.
(2) Input two terms/concepts $c_1$ and $c_2$; the first term, $c_1$, is obtained from step (1) and the second, $c_2$, is a query term.
(3) Words validation: If both words are present in English WordNet lexical taxonomy, go to step (4). Else, go to step (9).
(5) Hypernymy validation module: find if both trees have the same root or not.
If root is the same, go to step (6). Else, go to step (9).
(6) LCS module: find nearest common hypernym ancestor node of both words in the hypernym tree, which is called least common subsumer (LCS).
(10) Semantic similarity between candidate term and all query terms is obtained from (17).
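The validation, LCS, and path length steps above can be illustrated on a tiny hand-made hypernym tree. The sketch below does not use WordNet: the `PARENT` taxonomy, the function names, and the maximum depth of 3 are hypothetical stand-ins, and the similarity uses the Lch form with 1 added to both the path length and $2Dp$, as described in the text.

```python
import math

# Hypothetical toy hypernym taxonomy (child -> parent); this is not WordNet data.
PARENT = {
    "dog": "canine", "wolf": "canine", "canine": "mammal",
    "cat": "feline", "feline": "mammal",
    "mammal": "animal", "animal": None,
}

def ancestors(term):
    """Path from a term up to the root, the term itself included."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path

def shortest_path_length(c1, c2):
    """Edges on the path c1 -> least common subsumer -> c2, or None."""
    if c1 not in PARENT or c2 not in PARENT:     # words validation step
        return None
    steps_up_c2 = {node: i for i, node in enumerate(ancestors(c2))}
    for i, node in enumerate(ancestors(c1)):
        if node in steps_up_c2:                  # first shared ancestor = LCS
            return i + steps_up_c2[node]
    return None                                  # different roots

def lch_similarity(c1, c2, max_depth):
    """-log((length + 1) / (2 * max_depth + 1)); None if no path exists."""
    length = shortest_path_length(c1, c2)
    if length is None:
        return None
    return -math.log((length + 1) / (2 * max_depth + 1))

sim_wolf = lch_similarity("dog", "wolf", max_depth=3)   # path length 2
sim_cat = lch_similarity("dog", "cat", max_depth=3)     # path length 4
```

With a real lexical database, `PARENT` would be replaced by the hypernym relations of the chosen sense of each word; how the per-term similarities are combined into the score of (17) is not covered by this sketch.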

Methods for Reweighting the Expanded Query Terms.
After one of the QE terms selection methods described above has generated the list of candidate terms, the selected candidate terms that the system adds to the user query must be reweighted. Different methods have been proposed for QE terms reweighting. We compared these methods and tested which one is the most appropriate for our proposed AQE model. The most traditional and simple approach to expansion term reweighting is the Rocchio algorithm [26]. In this work, we used the Rocchio's beta version of Rocchio's algorithm, which requires only the $\beta$ parameter. Finally, we computed the new weight $qtw$ of the candidate terms used as expansion terms with the original user query as follows:

$$qtw = \frac{qtf}{qtf_{\max}} + \beta \, \frac{w(t)}{w_{\max}(t)}. \tag{18}$$

In (18), $w(t)$ is the old weight of candidate term $t$ and $w_{\max}(t)$ is the maximum weight of the expanded query terms; $\beta$ is a setting parameter, $qtf$ is the query term frequency, and $qtf_{\max}$ is the maximum query term frequency in the query $Q$. The value of the parameter $\beta$ is fixed to 0.1 in our experiment. Finally, the selected candidate terms are used after reweighting for expanding the user query.
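A minimal sketch of the reweighting step, assuming the Rocchio's beta form $qtw = qtf/qtf_{\max} + \beta\, w(t)/w_{\max}(t)$ with $\beta = 0.1$. The function name, the dictionaries, and the example weights are hypothetical.

```python
def rocchio_beta_reweight(query_tf, expansion_weights, beta=0.1):
    """New weight for every original query term and expansion term.

    query_tf: term -> frequency in the original query (qtf).
    expansion_weights: term -> score w(t) from the term selection step.
    Assumes both dictionaries are non-empty with positive maxima.
    """
    qtf_max = max(query_tf.values())
    w_max = max(expansion_weights.values())
    new_weights = {}
    for t in set(query_tf) | set(expansion_weights):
        qtf = query_tf.get(t, 0)                 # 0 for pure expansion terms
        w = expansion_weights.get(t, 0.0)        # 0 for terms not selected
        new_weights[t] = qtf / qtf_max + beta * w / w_max
    return new_weights

# Hypothetical query and expansion term scores.
query_tf = {"borda": 1, "count": 1}
expansion_weights = {"rank": 2.0, "vote": 1.0, "borda": 2.0}
qtw = rocchio_beta_reweight(query_tf, expansion_weights)
```

Because $\beta$ is small, an original query term always outweighs a pure expansion term, which matches the stated goal of giving higher weight to the original query terms.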

Experimental Study
All the experiments carried out in this paper are based on the model proposed in Section 3. First, the performances of the individual methods CBQE, RSVBQE, IGBQE, and KLDBQE are compared with each other and with Okapi-BM25 [18]. Second, the performance of BBQE and BSBQE is compared with Aguera and Araujo's model (combining multiple terms selection methods) [5] using different performance evaluation parameters.

Datasets.
In this section, we describe the two well-known benchmark test collections used in our experiments: TREC disks 1 and 2 and the FIRE ad hoc dataset, which differ in size and genre (TREC disks 1 and 2 are 6 Gb in size, while the FIRE dataset is 3.4 Gb). Detailed descriptions of both datasets are given in Table 1. Query numbers 126 to 175 are used for the FIRE dataset and query numbers 151 to 200 are used for the TREC dataset (a different collection of 50 queries is used for each dataset). The TREC disks 1 and 2 collections contain newswire articles from different sources, such as the Associated Press, Wall Street Journal, Financial Times, and Federal Register, which are considered high-quality text data with minimum noise. The FIRE ad hoc dataset is a medium size collection containing newswire articles from two different sources, The Telegraph and BD News 24, provided by the Indian Statistical Institute, Kolkata, India.
In our experiments, we use only title field of TREC and FIRE query sets for retrieval task, because this field is closer to the actual queries used in real time applications. The last column of Table 1 presents the average documents length in the corresponding TREC and FIRE datasets.
Based on performance, the Porter stemmer is used to stem each term during indexing and querying, and a recent list of 420 stop words is used to remove stop words. For both the FIRE and TREC datasets, the top 10, 25, and 50 retrieved documents are used to measure the average precision, recall, and mean average precision.

Parameter Tuning.
To investigate the optimal setting of parameters for fair comparisons, we used the training method explained in Diaz and Metzler [27], which is very popular in the IR field, for our proposed model. First, for the parameters of the PRF models, we tried different numbers of top feedback documents (5, 10, 15, 25, and 50) in both the baseline and proposed approaches to find the optimal number of feedback documents. We found that our proposed model performed best with the top 15 feedback documents, so we fix the top 15 feedback documents to build the term pool in our experiments. Second, we selected different numbers of top candidate terms (10, 20, 30, 50, and 75) from the candidate terms ranked by similarity with the query terms as expansion terms, for both the baseline and proposed methods, to find the optimal number of top expansion terms for reformulating the query. We found that our proposed model performed best with the top 30 candidate terms, so we fix the top 30 candidate terms to reformulate the original user query in our experiments.

Evaluation Parameters.
Recall ($R$), precision ($P$), and $F$-measure are three parameters used to evaluate the performance of an information retrieval system. Recall is given by
$$R = \frac{|D_{rr}|}{|D_{arel}|},$$
where $D_{rr}$ is the set of relevant documents retrieved and $D_{arel}$ is the set of all relevant documents, and precision is given by
$$P = \frac{|D_{rr}|}{|D_{ret}|},$$
where $D_{ret}$ is the set of retrieved documents. The average precision (AP) is used as a standard measure of the quality of a search system in information retrieval. The precision at a document $d$, $P(d)$, is defined as the fraction of relevant documents among the documents retrieved up to and including $d$. The AP for a set of relevant documents is obtained as the mean precision over all these documents:
$$\mathrm{AP} = \frac{1}{|D_{rel}|} \sum_{d \in D_{rel}} P(d),$$
where $D_{rel}$ is the relevant documents set.
In general, there is a trade-off between precision and recall, since improving one typically degrades the other. Depending on the requirement, we may be interested in higher precision or higher recall. However, if we want to evaluate accuracy considering both precision and recall, we may use the $F$-measure. The $F$-measure is the harmonic combination of the precision ($P_i$) and recall ($R_i$) values of the $i$th document set used in information retrieval.
The $F$-measure can be calculated as follows:
$$F_i = \frac{2 P_i R_i}{P_i + R_i}.$$
We use these evaluation metrics as the primary summary performance measures in our experiments; they are also the main official evaluation metrics in the corresponding TREC and FIRE evaluation forums. To further confirm the superiority of our proposed method, we also used fixed-level interpolated precision-recall (PR) curves for basic comparisons between our proposed method and the other methods.
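As a minimal sketch of the measures above, the following Python functions compute precision, recall, the $F$-measure, and average precision for a single query. The function names and the toy document identifiers are assumptions made for illustration. Average precision is computed here in the usual TREC style, dividing by the total number of relevant documents, which is an assumption about the exact variant used in the experiments.

```python
def precision_recall_f(retrieved, relevant):
    """Set-based precision, recall, and F-measure for one query."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)  # relevant documents retrieved
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def average_precision(ranking, relevant):
    """Mean of the precision values at the rank of each relevant document.

    Relevant documents that are never retrieved contribute zero.
    Assumes `ranking` contains no duplicate documents.
    """
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant_set:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant_set)


# Toy example: five retrieved documents, three relevant ones in total.
ranking = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d6"}
p, r, f = precision_recall_f(ranking, relevant)  # 2/5, 2/3, 1/2
ap = average_precision(ranking, relevant)        # (1/1 + 2/3) / 3 = 5/9
```

Mean average precision is then the mean of `average_precision` over all queries in the test set.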

Experimental Results of Individual QE Terms Selection Methods.
Tables 2 and 3 show the retrieval performance of the query expansion terms selection methods in terms of average precision and recall on the FIRE and TREC datasets, compared with the Okapi-BM25 retrieval model, a state-of-the-art probabilistic retrieval model [14].
In our experiments, the query expansion terms selection approaches IGBQE, CBQE, KLDBQE, and RSVBQE achieved a significant improvement over the basic retrieval model Okapi-BM25. We also note that the improvements achieved on TREC disks 1 and 2 are slightly greater than those on the FIRE dataset. This is probably because the disks 1 and 2 collections contain news articles, which are usually considered high-quality text data with less noise. In contrast, the FIRE ad hoc dataset contains news as well as web collections, which are more challenging: they draw on multiple heterogeneous sources and contain more noise. Tables 2 and 3 show that the KLD based query expansion terms selection method (KLDBQE) outperforms the other terms selection methods for all top retrieved document sets on both the FIRE and TREC datasets. Figure 2 shows the significant improvement of all the individual terms selection methods over Okapi-BM25 and the superiority of KLDBQE over the other individual methods on both the FIRE and TREC datasets.
The 11-point precision-recall curves of the individual terms selection methods, namely, CBQE, RSVBQE, IGBQE, and KLDBQE, together with the baseline approach Okapi-BM25, are shown in Figure 3. The 11-point precision-recall curve is a graph plotting the interpolated precision of an information retrieval (IR) system at 11 standard recall levels, that is, {0.0, 0.1, 0.2, . . . , 1.0}. The graph is widely used to evaluate IR systems that return ranked documents, which are common in modern search systems. Figure 3 also shows the significant improvement of the individual terms selection approaches over the baseline approach on both datasets, and again indicates the superiority of KLDBQE over the other individual approaches.
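The 11-point curve for a single query can be computed as in the following sketch. It uses the standard interpolation rule, in which the interpolated precision at a recall level is the highest precision observed at any recall greater than or equal to that level. The function name and the toy data are assumptions made for illustration.

```python
def eleven_point_interpolated_precision(ranking, relevant):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    relevant_set = set(relevant)
    if not relevant_set:
        return [0.0] * 11
    # Record (recall, precision) after each rank position.
    points = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant_set:
            hits += 1
        points.append((hits / len(relevant_set), hits / rank))
    # Interpolated precision: best precision at any recall >= the level.
    levels = [i / 10 for i in range(11)]
    return [
        max((p for r, p in points if r >= level), default=0.0)
        for level in levels
    ]


# Toy example: two of the three relevant documents are retrieved,
# so recall never exceeds 2/3 and the curve drops to zero from 0.7 on.
curve = eleven_point_interpolated_precision(
    ["d1", "d2", "d3", "d4", "d5"], {"d1", "d3", "d6"}
)
```

Averaging these 11 values position by position over all queries gives one curve per retrieval method, which is the form usually plotted.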

Experimental Results of Borda and Semantic Similarity Methods.
Tables 4 and 5 show the retrieval performance of our proposed Borda rank combination methods, with and without semantic similarity, in terms of average precision and recall on both the FIRE and TREC datasets. We then compared the proposed model with Aguera and Araujo's model [5], a state-of-the-art retrieval model based on combining three QE terms selection methods.
Tables 4 and 5 also present the results of Okapi-BM25 and KLDBQE (the best-performing individual method) for better comparison. Tables 4 and 5 show that our proposed Borda based QE approach, both alone (BBQE) and with semantic similarity (BSBQE), achieved a significant improvement over the Okapi-BM25 model, KLDBQE (the best individual QE terms selection method), and Aguera and Araujo's method. Figure 4 shows the significant improvement of our proposed BBQE and BSBQE over Okapi-BM25 and Aguera and Araujo's model in terms of recall, precision, and $F$-measure on both the FIRE and TREC datasets.
The 11-point precision-recall curves of the proposed approaches, namely, BBQE and BSBQE, and of the baseline approaches, Okapi-BM25 and Aguera and Araujo's model, are shown in Figure 5. Figure 5 also shows the significant improvement of both our proposed approaches over the baseline approaches. This indicates that combining the Borda rank aggregation scheme with the semantic similarity scheme has a positive effect on the quality of the expansion terms.
After observing that our proposed approach performs better than the best of the individual terms selection methods considered, a paired $t$-test was applied to show that the improvement is statistically significant. A paired $t$-test compares one set of measurements with a second set from the same sample. Given two paired sets $X$ and $Y$ of measured values, the paired $t$-test determines whether they differ from each other in a significant way, under the assumption that the paired differences are independent and identically normally distributed.
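A paired $t$-test of this kind can be sketched with the Python standard library alone. The function name and the toy scores below are assumptions made for illustration and are not results from our experiments. The sketch assumes the convention in which $h = 1$ means that the null hypothesis ("means are equal") is rejected. It also takes the two-sided critical value of the $t$-distribution as an argument rather than computing a $p$-value, because the standard library has no $t$-distribution function.

```python
import math
from statistics import mean, stdev


def paired_t_test(x, y, t_crit):
    """Two-sided paired t-test on equal-length score lists.

    `t_crit` is the two-sided critical value of the t-distribution for
    len(x) - 1 degrees of freedom (for example 2.776 for 4 degrees of
    freedom at the 5% significance level). Returns (h, t_stat, ci),
    where h = 1 means the null hypothesis "means are equal" is rejected
    and ci is the confidence interval of the mean paired difference.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    d_bar = mean(diffs)
    std_err = stdev(diffs) / math.sqrt(len(diffs))
    if std_err == 0:
        raise ValueError("all paired differences are identical")
    t_stat = d_bar / std_err
    ci = (d_bar - t_crit * std_err, d_bar + t_crit * std_err)
    h = 1 if abs(t_stat) > t_crit else 0
    return h, t_stat, ci


# Toy per-query scores for two systems (illustrative values only).
scores_proposed = [0.42, 0.45, 0.50, 0.48, 0.51]
scores_baseline = [0.40, 0.41, 0.47, 0.46, 0.46]
h, t_stat, ci = paired_t_test(scores_proposed, scores_baseline, t_crit=2.776)
```

With these toy values the confidence interval of the mean difference lies entirely above zero, so `h` is 1. Statistical packages additionally report the $p$-value of the test.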
The paired $t$-test results obtained for the FIRE and TREC datasets are tabulated in Tables 6-7. A paired $t$-test is the most commonly used hypothesis test in IR. In the present work, the paired $t$-tests are conducted to determine whether the proposed query expansion approaches are statistically different from KLDBQE (the best individual method) and from Aguera and Araujo's model. These paired $t$-tests return their results in terms of the $h$-value, the $p$-value, and the CI. An $h$-value of 1 indicates that the null hypothesis ("means are equal") is rejected at the 5% significance level ($\alpha = 0.05$), that is, the mean of our data is significantly different from that of the other approach at the 95% confidence level.
If the $h$-value is 0, then the performances are not statistically different, and the null hypothesis ("means are equal") cannot be rejected at the 5% significance level ($\alpha = 0.05$). CI is the 95% confidence interval of the mean of the paired differences, based upon the $t$-distribution. Table 6 clearly indicates that the improvement of the proposed Borda rank aggregation approaches over the KLDBQE method is statistically significant at $\alpha = 0.05$ ($p$ is almost zero for both the FIRE and TREC datasets). Table 7 shows the paired $t$-test values between both our proposed approaches and Aguera and Araujo's model. The tables contain only the proposed approaches that pass the paired $t$-test. Table 7 clearly indicates that the improvement of our proposed approaches BBQE and BSBQE over Aguera and Araujo's model is statistically significant at $\alpha = 0.05$ ($p$ is almost zero for both the FIRE and TREC datasets).

The main observations from our experiments are as follows.
(i) The individual query expansion terms selection methods, namely, IGBQE, CBQE, RSVBQE, and KLDBQE, perform better than Okapi-BM25 (a method without query expansion). Among these terms selection methods, KLDBQE performed best.
(ii) Our proposed Borda based approach BBQE achieved encouraging results and performed significantly better than the Okapi-BM25 model, KLDBQE (the best individual expansion terms selection method), and Aguera and Araujo's method.
(iii) Our proposed Borda and semantic filtering approach BSBQE also performed better than the Okapi-BM25 model, KLDBQE (the best individual expansion terms selection method), BBQE (the best rank aggregation method), and Aguera and Araujo's method.
(iv) The paired $t$-test shows the statistical significance of the improvement of our proposed approaches over the baseline approaches in terms of the $h$-value, $p$-value, and CI, as shown in Tables 6-7.

Conclusion and Future Work
In this work, we explored the power of combining multiple query expansion terms selection methods to improve the performance of an information retrieval system by using AQE. We studied the Borda rank combination of four QE terms selection methods on two real datasets, with and without the semantic similarity approach. In our experiments, we observed that applying semantic similarity after Borda rank aggregation outperformed each individual QE terms selection method in terms of average precision, recall, and $F$-measure. A likely explanation is that different query expansion terms selection methods capture different characteristics of the terms, so that the terms obtained by combining them represent the document set more accurately.
In this paper, we presented a new Borda rank combination based AQE method for document retrieval based on PRF techniques. The method mines additional QE terms: our proposed Borda count based approach combines the IG, co-occurrence, RSV, and KLD scores of candidate expansion terms and produces a single list of candidate expansion terms. The proposed Borda method uses a voting approach to infer the weights of the additional query terms and then uses these additional query terms together with the original query terms to retrieve documents, improving the performance of information retrieval systems. After the Borda step, semantic similarity algorithms are used to filter out semantically irrelevant terms from the candidate expansion terms obtained by Borda rank aggregation.
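The rank combination step summarized above can be illustrated with a minimal Borda count sketch. The scoring rule (a term at position $k$, counted from zero, in a list receives $n - k$ points, where $n$ is the size of the combined term pool, and a term absent from a list receives no points from it), the alphabetical tie-breaking, and the toy term lists are assumptions made for illustration rather than the exact configuration used in our experiments.

```python
def borda_aggregate(rankings):
    """Merge several ranked term lists (best first) into one list.

    Each list awards n - k points to the term at zero-based position k,
    where n is the number of distinct terms over all lists. Terms are
    returned by decreasing total score, ties broken alphabetically.
    """
    pool = set().union(*rankings)
    n = len(pool)
    scores = {term: 0 for term in pool}
    for ranking in rankings:
        for position, term in enumerate(ranking):
            scores[term] += n - position
    return sorted(scores, key=lambda term: (-scores[term], term))


# Toy rankings from four hypothetical term selection methods.
ig_rank = ["bank", "loan", "river", "money"]
cooc_rank = ["loan", "bank", "money", "river"]
rsv_rank = ["bank", "money", "loan", "river"]
kld_rank = ["loan", "bank", "river", "money"]

merged = borda_aggregate([ig_rank, cooc_rank, rsv_rank, kld_rank])
# Totals: bank 14, loan 13, money 7, river 6
```

The top-ranked terms of the merged list would then be passed to the semantic similarity filter before being added to the query.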
The TREC and FIRE benchmark datasets are used to validate our proposed QE methods. The experiments confirmed that both our proposed QE methods increase the values of precision, recall, and $F$-measure. The proposed methods also obtain higher average precision and average recall than Okapi-BM25 and Aguera and Araujo's QE method. A paired $t$-test is conducted for statistical analysis. This analysis confirms that the proposed Borda based QE method significantly improves retrieval effectiveness compared with Okapi-BM25 and Aguera and Araujo's approach. The robustness of the proposed QE model may be further tested on other TREC datasets.

Nomenclature
: Original user query
: Set of initially retrieved documents by Okapi-BM25 function for user query
tf: Term frequency
qtf: Query term frequency
dl: Document length
avdl: Average document length
: Number of documents in entire corpus
: Set of some top initially retrieved documents for user query or set of top PRF docs
: Entire corpus containing relevant and nonrelevant docs of user query
idf: Inverse document frequency
: Number of documents in the corpus that contain term
SemSim( , ): Semantic similarity between a candidate term and the query
Dp: Depth of WordNet or maximum depth of concepts
LCS: Least common subsumer
Codegree( , ): Co-occurrence degree between a candidate term and the $i$th query term
Sim_lch( , ): Lch module based semantic similarity between a candidate term and the $i$th query term
Tp: Term pool or candidate terms set containing all the unique terms of PRF docs
: Single candidate term
docs: Documents.