Keyword Query Expansion Paradigm Based on Recommendation and Interpretation in Relational Databases

Because keyword queries over relational databases are ambiguous and imprecise, keyword query expansion has attracted wide attention. Existing query expansion methods expose users' query intention to a certain extent, but most of them cannot balance precision and recall. To address this problem, a novel two-step query expansion approach is proposed based on query recommendation and query interpretation. First, a probabilistic recommendation algorithm is put forward by constructing a term similarity matrix and a Viterbi model. Second, by using a translation algorithm for triples and a construction algorithm for query subgraphs, query keywords are translated into query subgraphs with structural and semantic information. Finally, experimental results on a real-world dataset demonstrate the effectiveness and rationality of the proposed method.


Introduction
In recent years, keyword query in relational databases has been widely applied because it is simple and easy to use [1,2]. Although this method does not require users to know the underlying database structure or a structured query language (e.g., SQL), it is semantically fuzzy and its expressive power is limited by the lack of structure [3]. In addition, ordinary users are usually unable to specify exact keywords to describe their query intention, which makes it harder to return adequate results through keyword query [4,5]. For these reasons, the precision and recall of keyword query methods cannot be effectively guaranteed [6]. Query expansion has therefore become an important research branch: it aims to interpret the query more completely and precisely and thereby improve the recall and precision of query results [7][8][9][10].
Problem and Motivation. Query expansion provides additional description of the information need in order to improve query performance. First, we formally define the problem of query expansion as follows.
Then, we illustrate the research motivation of this paper with the following examples and analysis. Although numerous query expansion methods can be found in the literature, these studies suffer from two main limitations. First, existing approaches do not consider the relationships between keywords, which lowers query precision. Second, most existing work neglects words similar or related to the query keywords, which lowers query recall. On the one hand, Meng et al. propose a semantic approximate keyword query method based on keyword and query coupling relationships [11]. This method partly solves the problems of semantic fuzziness and limited expressiveness, but it analyzes the keyword and query coupling relationships through the query history. When the query history of the database is incomplete or missing, the method cannot conduct its semantic analysis. Additionally, the queries obtained by this method contain only content information related to the keywords, not the structure information between keywords, which hurts query precision. For example, suppose a user issues the keyword query "Machine learning Arthur Samuel" in DBLP to retrieve the paper "Machine learning" published by Arthur Samuel. With the above method, the expanded query is "Machine learning Arthur Samuel SIGMOD." The expansion extends the query with content but lacks structure information. Consequently, all tuples containing "Machine learning," "Arthur Samuel," or "SIGMOD" will be returned. Such results are not precise enough, and many of them may not be useful. The method proposed in this paper extends the initial query with both content and structure related to the keywords. An expanded query is shown in Figure 1.
It expresses the potential semantic and structural information of the original keyword query: find the paper "Machine learning" published in SIGMOD and authored by Arthur Samuel; simultaneously, find the paper "Markov" which cites the paper "Machine learning" and is authored by Arthur Samuel. In contrast to the query expansion in [11], the method proposed in this paper extends the original query to query subgraphs that carry the underlying structure of the database, and thus further improves query precision. On the other hand, Ganti et al. [12] translate keyword queries into SQL based on materialized mappings. Bergamaschi et al. [3] propose the Metadata method, which translates keyword queries into SQL based on the Munkres algorithm. Though these methods describe users' query intention to a degree, they do not extend the query with similar words and related items. Thus these methods have relatively high query precision, but their recall needs further improvement. For example, assume that a user wants to study methods of data analysis. With limited knowledge of the field, the user accesses the DBLP database and submits the keyword query "Machine learning Arthur Samuel," and the expanded query obtained via the above methods is as follows: select R3.Title from Author R1, PaperAuthor R2, Paper R3 where R2.Pid=R3.Pid and R2.Aid=R1.Aid and match(R3.Title) against ("Machine learning" in Boolean mode) and match(R1.Name) against ("Arthur Samuel" in Boolean mode). However, the user may also intend to retrieve papers on topics similar or relevant to "Machine learning," such as the paper "data mining" published by Micheline Kamber. Since the paper "data mining" cites the paper "Machine learning" and the two are closely related, the user is also interested in it. The results of the query expansion method proposed in this paper are shown in Figure 2.
Compared with the previous result, these results not only capture the structural relationships between keywords but also contain queries related or similar to the query keywords.
The analysis and examples above illustrate that the key challenge is to develop an approach that balances query precision and recall. In this paper we focus on how to tackle the above two limitations and thereby improve the performance of keyword query expansion. We propose a novel query expansion method, ReInterpretQE, based on query recommendation and interpretation, which extends a keyword query to a list of query subgraphs. These subgraphs can better capture users' information need and the possible semantics of the keyword query, and then guide users to explore related tuples in the relational database. First, we construct a term similarity matrix based on the tuple information contained in relational databases. Second, we perform query recommendation using the similarity matrix and dynamic programming to get a query list. This query list consists of the top-$k$ queries related to the initial query. Finally, we transform the keyword queries into query subgraphs. Through query recommendation and interpretation, the query expansion method improves the recall and precision of query results.
The main contributions of this paper are summarized as follows: (1) We present a keyword query expansion paradigm ReInterpretQE in relational databases, which is based on query recommendation and interpretation.
(2) We design a probabilistic recommendation algorithm based on similarity matrix and dynamic programming.
(3) We propose a keyword query interpretation method. It uses statistical information and the schema graph of the database to translate a keyword query into query subgraphs.
(4) We conduct extensive experiments on the DBLP dataset, and experimental results demonstrate the effectiveness and feasibility of the proposed method.
The paper is organized as follows: Section 1 introduces the research motivation, main contributions, and structure of this paper; Section 2 reviews the related work; Section 3 describes the architecture of ReInterpretQE and then details the algorithms for query recommendation and query interpretation; Section 4 reports experiments on a real dataset and compares the results; Section 5 concludes the paper and outlines future work.

Related Work
This section discusses related research in two parts: keyword query in relational databases and keyword query expansion.
2.1. Keyword Query in Relational Databases. Recently, keyword query methods in relational databases have been studied extensively [2][13][14][15]. According to how the database is modeled, existing query methods fall into two main categories: schema graph-based methods and data graph-based methods. In [16][17][18][19][20][21], the database is modeled as a schema graph, where nodes represent tables and edges represent primary-foreign-key relationships. These methods enumerate all possible CNs (Candidate Networks) based on the schema graph. Although they store only abstract structural information and take little memory, generating the CNs takes a huge amount of time. Correspondingly, in [22][23][24][25][26][27] the database is modeled as a data graph, where nodes represent tuples and edges represent primary-foreign-key relationships between tuples. These methods identify the minimum connection trees that contain the keywords on the data graph. A major challenge for them is maintaining the data graph: the whole graph needs to be reconstructed when the data in the database changes, which is often time-consuming. Therefore, it is important to study the dynamic update of databases and design incremental query methods based on the dynamic construction of the data graph.

Keyword Query Expansion Paradigm.
In the field of keyword query in relational databases, the majority of existing studies have focused on improving the efficiency of query algorithms, while effective query preprocessing has yet to be investigated. Since keyword queries lack structural information and users tend to select inappropriate keywords, semantic fuzziness is an urgent problem. The method in [28] expands the original query using query log mining techniques, but it is not applicable to relational databases. Reference [7] selects related query expressions using user feedback. The results obtained with these query expressions match users' query intention more closely. However, the method requires human interaction, so its efficiency is low. In [3], the keyword query is transformed into a SQL statement based on the Munkres algorithm, which provides possible semantic descriptions. This method is useful for identifying users' query intention. Nevertheless, it does not consider the multiple connections between keywords (i.e., that a keyword query can have various explanations). In addition, it does not take into account similar words and related items, which can express users' intention well. An analysis model of coupling relationships is presented in [11]. It extracts semantic relations based on query history. However, this model also has great limitations: when the query logs are missing or users' preferences change often, it cannot be applied effectively. Therefore, this paper proposes a query expansion method, ReInterpretQE, which consists of two steps: query recommendation and query interpretation. First, we construct a similarity matrix based on the structure and content information in the database and put forward a probabilistic recommendation algorithm using dynamic programming. Second, we propose a keyword query interpretation method that transforms keywords into subgraphs based on the statistics and schema graph of the database. Experimental results show that both the recall and precision of query results are improved by the proposed query expansion method.

Overview of ReInterpretQE.
To solve the two problems of semantic fuzziness and limited expressiveness, this paper proposes a novel two-step query expansion method, ReInterpretQE, based on query recommendation and query interpretation. The goal of query recommendation is to extend the initial query $Q$ to a list of related keyword queries, so that the query results are more comprehensive and better meet users' needs. Query interpretation translates the list of keyword queries into query subgraphs, which pin down the query results more precisely. Together, the two steps are designed to improve the recall and precision of query results. Figure 3 shows the architecture of ReInterpretQE, which is divided into two main phases: query recommendation and query interpretation.
Phase 1 (query recommendation). In query recommendation, the intrasimilarity and intersimilarity between terms are calculated using structure information, content information, and word cooccurrence, and a similarity matrix is constructed from the two similarities. Dynamic programming is then used to build a Viterbi model, so that a keyword query $Q = \{k_1, k_2, \ldots, k_n\}$ can be extended to a list of keyword queries. The query list produced by the query recommendation process is semantically related to the original query.
Phase 2 (query interpretation). In query interpretation, an algorithm translates keywords to triples. Query subgraphs are then built for each query in the query list using the schema graph of the database. The implementation details of the algorithms are introduced in the subsequent sections.
First, Section 3.2 describes how to recommend a list of queries with the same or similar meaning according to the structure and content information in the database. This problem can be solved effectively by constructing the term similarity matrix and applying the probabilistic recommendation algorithm. The method makes the query results contain more of the information that users want, so query recall is further improved. Section 3.3 puts forward a two-step query interpretation method. The keyword query is translated into query subgraphs with potential structural information through the following steps: Step 1, translation from keywords to triples; Step 2, construction of query subgraphs. This interpretation process effectively improves query precision.

Query Recommendation.
It is difficult for a common user to specify proper keywords to express a query intention. As a result, much information related to the query is not returned. This section presents a query recommendation method that extends the original query and derives a series of queries semantically similar to it, which improves the recall of the query results. Assume that a user submits a query $Q = \{k_1, k_2, \ldots, k_n\}$. First, Section 3.2.1 constructs the term similarity matrix to find the top-$k$ keywords related to each keyword $k_i$ in the original query $Q$. Second, Section 3.2.2 proposes a probabilistic recommendation algorithm. It uses dynamic programming to build a Viterbi model that translates the original query $Q = \{k_1, k_2, \ldots, k_n\}$ into a query list.

Construction of Term Similarity Matrix.
In the construction phase of the term similarity matrix, this paper calculates the similarity between keywords from two aspects, intrasimilarity and intersimilarity, based on structure information and content information.
(a) Intrasimilarity. In information retrieval, two keywords that often appear in the same documents are regarded as semantically related. In relational databases, tuples are generally taken as virtual documents. Similarly, the more often two keywords cooccur in the same tuples, the higher the degree of similarity between them. The intrasimilarity between two keywords is measured using the Jaccard similarity coefficient, as shown in

$$\mathrm{simi}_{in}(k_i, k_j) = \frac{|TS(k_i) \cap TS(k_j)|}{|TS(k_i) \cup TS(k_j)|}, \tag{1}$$

where $TS(k_i)$ and $TS(k_j)$ are the tuple sets containing $k_i$ and $k_j$, respectively.
We can obtain the intrasimilarity between any two keywords in the database by formula (1). Example 2 illustrates the calculation.
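The Jaccard computation of formula (1) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tuple contents in `TUPLES` are hypothetical, chosen only so that the resulting values agree with the rounded intrasimilarity matrix of Example 2, and the function names are ours.

```python
from itertools import combinations

# Hypothetical tuple contents (an assumption): keyword abbreviations per
# tuple, chosen only to be consistent with the rounded matrix in Example 2.
TUPLES = {
    "t1": {"da", "qe", "si"},
    "t2": {"qe", "sa", "ml"},
    "t3": {"da", "si", "dm"},
    "t4": {"ml", "pr", "dm"},
}

def tuple_sets(tuples):
    """Map each keyword k to TS(k), the set of tuple ids that contain it."""
    ts = {}
    for tid, words in tuples.items():
        for word in words:
            ts.setdefault(word, set()).add(tid)
    return ts

def simi_in(ts, ki, kj):
    """Jaccard coefficient of the tuple sets of ki and kj, as in formula (1)."""
    union = ts[ki] | ts[kj]
    return len(ts[ki] & ts[kj]) / len(union) if union else 0.0

def intra_matrix(tuples):
    """Symmetric intrasimilarity matrix stored as a dict keyed by keyword pairs."""
    ts = tuple_sets(tuples)
    matrix = {(k, k): 1.0 for k in ts}
    for ki, kj in combinations(sorted(ts), 2):
        matrix[(ki, kj)] = matrix[(kj, ki)] = simi_in(ts, ki, kj)
    return matrix

M_IN = intra_matrix(TUPLES)
print(round(M_IN[("da", "qe")], 2))  # 0.33, shown as 0.3 in Example 2
```

Storing the matrix as a dictionary keyed by keyword pairs keeps the sketch short; a dense array indexed by keyword position would be the more natural choice for a large vocabulary.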
Example 2. To facilitate explanation of the approach, we simplify the structure and content of the database as shown in Table 1. There is a data table with four tuples $t_1$, $t_2$, $t_3$, and $t_4$ in the DBLP database. We use da, qe, si, sa, ml, pr, and dm to represent database, query, structured information, statistical analysis, machine learning, probability, and data mining. The intrasimilarity between keywords da and qe is as follows:

$$\mathrm{simi}_{in}(\mathrm{da}, \mathrm{qe}) = \frac{|TS(\mathrm{da}) \cap TS(\mathrm{qe})|}{|TS(\mathrm{da}) \cup TS(\mathrm{qe})|} \approx 0.3.$$

Similarly, we can calculate the intrasimilarity between any two keywords and get the following intrasimilarity matrix $M_{in}$ (the matrix is symmetric):

$$
M_{in} =
\begin{array}{c|ccccccc}
 & da & qe & si & sa & ml & pr & dm \\ \hline
da & 1 & 0.3 & 1 & 0 & 0 & 0 & 0.3 \\
qe & 0.3 & 1 & 0.3 & 0.5 & 0.3 & 0 & 0 \\
si & 1 & 0.3 & 1 & 0 & 0 & 0 & 0.3 \\
sa & 0 & 0.5 & 0 & 1 & 0.5 & 0 & 0 \\
ml & 0 & 0.3 & 0 & 0.5 & 1 & 0.5 & 0.3 \\
pr & 0 & 0 & 0 & 0 & 0.5 & 1 & 0.5 \\
dm & 0.3 & 0 & 0.3 & 0 & 0.3 & 0.5 & 1
\end{array}
$$

(b) Intersimilarity. The intrasimilarity reflects the direct semantic similarity given by the cooccurrence of keywords. There is also indirect semantic similarity between keywords. For example, although da and sa do not appear in the same tuple, they are indirectly similar via keyword qe, because da and qe appear in the same tuple $t_1$, while sa and qe appear in the same tuple $t_2$. Keyword qe is called a semantic associative term; through it, keywords da and sa have indirect semantic similarity.
Next, we detail how the intersimilarity is computed. Suppose keywords $k_i$ and $k_j$ belong to different tuples and the set of semantic associative terms is $T$. If $k_c$ is any element of $T$, then the intersimilarity between $k_i$ and $k_j$ via associative term $k_c$ is

$$\mathrm{simi}_{out}(k_i, k_j \mid k_c) = \min\{\mathrm{simi}_{in}(k_i, k_c), \mathrm{simi}_{in}(k_j, k_c)\}.$$

The intersimilarity between $k_i$ and $k_j$ is then obtained from these per-term values over the associative terms in $T$.

Example 3. We again take the DBLP database in Table 1 as an example. Example 2 shows that the intrasimilarity between keywords si and ml is $\mathrm{simi}_{in}(\mathrm{si}, \mathrm{ml}) = 0$. They have intersimilarity via keywords qe and dm, where $\mathrm{simi}_{in}(\mathrm{si}, \mathrm{qe}) > 0$, $\mathrm{simi}_{in}(\mathrm{ml}, \mathrm{qe}) > 0$, $\mathrm{simi}_{in}(\mathrm{si}, \mathrm{dm}) > 0$, and $\mathrm{simi}_{in}(\mathrm{ml}, \mathrm{dm}) > 0$. Thus the intersimilarity between si and ml via qe is as follows:

$$\mathrm{simi}_{out}(\mathrm{si}, \mathrm{ml} \mid \mathrm{qe}) = \min\{\mathrm{simi}_{in}(\mathrm{si}, \mathrm{qe}), \mathrm{simi}_{in}(\mathrm{ml}, \mathrm{qe})\} = 0.3.$$

Combining the values obtained via qe and dm gives the intersimilarity between si and ml, and repeating this for every pair of keywords yields the intersimilarity matrix of the DBLP database in Table 1.

(c) Construction of Term Similarity Matrix. Formula (10) integrates the intrasimilarity and intersimilarity to calculate the similarity between any two keywords:

$$\mathrm{simi}(k_i, k_j) = \alpha\,\mathrm{simi}_{in}(k_i, k_j) + (1 - \alpha)\,\mathrm{simi}_{out}(k_i, k_j), \tag{10}$$

where $\alpha \in [0, 1]$ is the balance factor that adjusts the contribution of the two similarities to the final result. According to the parameter setting experiment in Section 4.2, the precision of the term similarity calculation reaches its maximum when $\alpha$ equals 0.5. Applying formula (10) with $\alpha = 0.5$ to every pair of keywords then yields the term similarity matrix of the DBLP database in Table 1.

Algorithm 1: Probabilistic recommendation algorithm.
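The intersimilarity step and the weighted combination of formula (10) can be sketched as below, using the si and ml rows of the intrasimilarity matrix from Example 2. The per-term rule (the minimum of the two intrasimilarities) follows Example 3. How the per-term values are aggregated into one intersimilarity is not recoverable from the text here, so the `max` used at the end is purely a placeholder assumption, as are the function names.

```python
KEYWORDS = ["da", "qe", "si", "sa", "ml", "pr", "dm"]

# Rows si and ml of the intrasimilarity matrix, rounded as printed in Example 2.
M_IN = {
    "si": dict(zip(KEYWORDS, [1, 0.3, 1, 0, 0, 0, 0.3])),
    "ml": dict(zip(KEYWORDS, [0, 0.3, 0, 0.5, 1, 0.5, 0.3])),
}

def simi_out_via(m_in, ki, kj):
    """Per-term intersimilarity of ki and kj.

    A term kc counts as associative when both ki and kj have a positive
    intrasimilarity to it; its contribution is the smaller of the two values.
    """
    per_term = {}
    for kc in m_in[ki]:
        if kc in (ki, kj):
            continue
        a, b = m_in[ki][kc], m_in[kj][kc]
        if a > 0 and b > 0:
            per_term[kc] = min(a, b)
    return per_term

def combined_similarity(s_in, s_out, alpha=0.5):
    """Formula (10): weighted sum of intrasimilarity and intersimilarity."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * s_in + (1 - alpha) * s_out

per_term = simi_out_via(M_IN, "si", "ml")
print(per_term)  # {'qe': 0.3, 'dm': 0.3}

# Placeholder aggregation (an assumption, not the paper's rule): take the
# largest per-term value as the overall intersimilarity.
s_out = max(per_term.values(), default=0.0)
print(combined_similarity(M_IN["si"]["ml"], s_out))  # 0.15
```

With `alpha = 0.5` the two similarities contribute equally, so a pair such as si and ml that never cooccurs still receives a nonzero similarity through its associative terms.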

Probabilistic Recommendation Algorithm. This section presents a probabilistic recommendation algorithm.
We build a Viterbi model using dynamic programming and generate the list of queries related to the input query, as shown in Algorithm 1.
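Since the body of Algorithm 1 is not reproduced here, the following is only a generic Viterbi-style sketch of how dynamic programming can pick an expanded query from per-keyword candidates. Everything in it is an assumption: the candidates of each keyword act as states, the similarity between a keyword and its candidate acts as an emission score, the similarity between consecutive candidates acts as a transition score, and the similarity values are made up. It returns only the single best sequence, whereas the paper produces a list of queries.

```python
def viterbi_expand(query, candidates, sim, eps=1e-6):
    """Return (score, path) of the best candidate sequence for the query.

    query      -- list of original keywords
    candidates -- dict: keyword -> non-empty list of candidate terms
    sim        -- function (a, b) -> similarity in [0, 1]
    eps        -- floor that keeps zero similarities from erasing a path
    """
    first = query[0]
    # best[c] = (score of the best path ending in candidate c, that path)
    best = {c: (max(sim(first, c), eps), [c]) for c in candidates[first]}
    for kw in query[1:]:
        layer = {}
        for c in candidates[kw]:
            emit = max(sim(kw, c), eps)
            layer[c] = max(
                ((score * max(sim(prev, c), eps) * emit, path + [c])
                 for prev, (score, path) in best.items()),
                key=lambda item: item[0],
            )
        best = layer
    return max(best.values(), key=lambda item: item[0])

# Made-up similarity values for illustration only.
PAIRS = {
    frozenset(("ml", "dm")): 0.3,
    frozenset(("da", "si")): 0.8,
    frozenset(("dm", "da")): 0.3,
    frozenset(("dm", "si")): 0.3,
}

def sim(a, b):
    return 1.0 if a == b else PAIRS.get(frozenset((a, b)), 0.0)

CANDIDATES = {"ml": ["ml", "dm"], "da": ["da", "si"]}
score, path = viterbi_expand(["ml", "da"], CANDIDATES, sim)
print(path)  # ['dm', 'da']
```

In this toy setting the original pair (ml, da) never cooccurs, so the path through dm scores higher. Keeping the top few entries per layer instead of one would be a natural way to obtain a ranked query list.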

Query Interpretation.
As a fuzzy query method, keyword query cannot reflect query intention accurately. This section translates the keyword query $Q = (k_1, k_2, \ldots, k_n)$ to a set of query subgraphs $\{G_1, G_2, \ldots, G_m\}$ by means of the translation algorithm for triples and the construction algorithm for query subgraphs, where each $G_i$ is a schema subgraph. Compared with the keyword query, a query subgraph not only contains content information but also carries structural and semantic information. Thus it reflects the users' query intention more accurately.

Translation from Keywords to Triples.
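In order to identify users' query intention, we need to know the exact role of a keyword in the database, that is, whether it is metadata or content data. Thus each keyword should be extended to include the table name, attribute names, and attribute values of the table where the keyword is located. To this end, we use three statistics tables: statistics table TN of table names, statistics table AN of attribute names, and statistics table AV of attribute values. They hold statistics on the table names, attribute names, and attribute values, respectively. These three statistics tables have similar structures, as shown in Tables 2, 3, and 4.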
Analysis of the statistics tables shows that every keyword in query $Q$ may correspond to several different semantic interpretations; that is, keyword $k_i$ can be translated to a set of triples, and query $Q$ can likewise be translated to several sets of triples. In most relational databases, the numbers of table names and attribute names are far smaller than the number of attribute values, so most ambiguities occur when translating attribute values. Therefore, we first try to match the keywords to table names and attribute names, and then match the remaining keywords to attribute values to get the final sets of triples. Algorithm 2 shows the details of this translation process.
Lines (1)-(9), (10)-(24), and (25)-(39) in Algorithm 2 match the keywords in query $Q$ to the statistics TN of table names, AN of attribute names, and AV of attribute values, respectively. This yields the triples corresponding to each keyword in $Q$. Because a keyword may correspond to more than one attribute name or attribute value, lines (40)-(46) and (47)-(52) handle these two cases, respectively: the different attribute names or attribute values corresponding to the same keyword are translated into multiple sets of triples. In this way the algorithm generates all possible sets of triples.
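The matching order described above (table names, then attribute names, then attribute values) and the enumeration of all sets of triples can be sketched as follows. This is not Algorithm 2 itself: the contents of `TN`, `AN`, and `AV` are hypothetical, a triple is assumed to be (table name, attribute name, attribute value), and matching is simplified to exact, case-insensitive lookup.

```python
from itertools import product

# Hypothetical statistics (assumptions): TN holds table names, AN maps an
# attribute name to the tables that have it, AV maps an attribute value to
# the (table, attribute) pairs where it occurs.
TN = {"paper", "author", "conference"}
AN = {"title": ["paper"], "name": ["author", "conference"]}
AV = {
    "machine learning": [("paper", "title")],
    "arthur samuel": [("author", "name")],
}

def interpretations(keyword):
    """All triples one keyword can stand for, metadata matches first."""
    kw = keyword.lower()
    if kw in TN:
        return [(kw, None, None)]
    if kw in AN:
        return [(table, kw, None) for table in AN[kw]]
    return [(table, attr, kw) for table, attr in AV.get(kw, [])]

def translate(query):
    """Every set of triples for the query: one triple per keyword.

    The result is empty when some keyword matches nothing at all.
    """
    per_keyword = [interpretations(k) for k in query]
    return [list(combo) for combo in product(*per_keyword)]

for triples in translate(["name", "Machine learning"]):
    print(triples)
```

Because "name" is an attribute of two tables in this toy schema, the query yields two sets of triples, which mirrors the point that one keyword query can have several interpretations.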

Query Subgraphs Construction.
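Before constructing query subgraphs, we combine the triples in $T[Q] = (tr_1, tr_2, \ldots, tr_m)$ according to the following merger rules.
We assume that $T[Q]$ is the set of triples corresponding to query $Q$. A triple $tr_i \in T[Q]$ is added to a new group if it satisfies one of the following three cases.
Case 1. The table name of $tr_i$ is different from all the table names of $tr_j \in T[Q]$, $j \in [1, i-1]$.
Case 2. The table name of $tr_i$ is the same as that of $tr_j \in T[Q]$, $j \in [1, i-1]$, but the attribute name and attribute value of $tr_i$ are null.
Case 3. The table name and attribute name of $tr_i$ are the same as the ones of ...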
These merger rules place the triples corresponding to different entities into different groups, which further refines the interpretation of triples. After translating keywords to triples and merging the triples, this paper translates the merged triples $T[Q] = (g_1, g_2, \ldots, g_l)$ to query subgraphs, where every $g_i \in T[Q]$ contains all the triples belonging to the same entity. Algorithm 3 describes the construction of query subgraphs in detail.
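Algorithm 3 is not reproduced here, so the sketch below only illustrates one plausible ingredient of subgraph construction: connecting the tables named in a set of triples through shortest join paths on the schema graph. The edges in `SCHEMA` are assumed primary-foreign-key links between the five tables described in Section 4.1, and the function names are ours.

```python
from collections import deque

# Assumed primary-foreign-key edges between the five DBLP tables.
SCHEMA = {
    "Author": ["Write"],
    "Write": ["Author", "Paper"],
    "Paper": ["Write", "Cite", "Conference"],
    "Cite": ["Paper"],
    "Conference": ["Paper"],
}

def join_path(schema, start, goal):
    """Shortest chain of tables from start to goal (BFS), or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in schema[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None

def query_subgraph(schema, triples):
    """Nodes and edges linking the tables of the triples, or None."""
    tables = []
    for table, _attr, _value in triples:
        if table not in tables:
            tables.append(table)
    nodes, edges = set(tables), set()
    for a, b in zip(tables, tables[1:]):
        path = join_path(schema, a, b)
        if path is None:
            return None
        nodes.update(path)
        edges.update(frozenset(edge) for edge in zip(path, path[1:]))
    return nodes, edges

triples = [("Paper", "Title", "machine learning"),
           ("Author", "Name", "arthur samuel")]
nodes, edges = query_subgraph(SCHEMA, triples)
print(sorted(nodes))  # ['Author', 'Paper', 'Write']
```

The relationship table Write is pulled in automatically because it lies on the only join path between Paper and Author, which is the kind of structural information a flat keyword query does not carry.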

Results and Discussion
Our experiments are conducted on the real dataset DBLP [29]. They address the selection of the balance factor $\alpha$ used in the term similarity calculation and evaluate the performance of our query expansion method ReInterpretQE. Section 4.1 introduces the dataset, query sets, and experimental environment. Section 4.2 compares the precision of the term similarity calculation under different parameter values to choose the optimal value of $\alpha$. Section 4.3 reports comparison experiments using Metadata [3] and -coupling [11] as the baselines. The performance metrics are precision, recall, and F-score.

Experimental Setup
4.1.1. Dataset. The paper uses the DBLP dataset released in March 2015 [29], which can be downloaded from http://dblp.uni-trier.de/. The main statistics of the dataset are shown in Table 5. DBLP is a computer science bibliography dataset widely used for query expansion in relational databases.
The dataset records information about papers published by scholars. Its original form is XML, and we use the Java SAX API to parse the XML file. We obtain five data tables, where tables Author, Paper, and Conference contain information about scholars, papers, and conferences, respectively. Tables Cite and Write are relationship tables: the former records the citation relationships between papers, and the latter records the writing relationships between scholars and papers. Figure 4 shows a sample of the five tables from the DBLP dataset. The DBLP database contains a number of tuples that are semantically related in content and connected by primary-foreign-key relationships in structure. So the DBLP dataset is well suited to testing the performance of our query expansion method.
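The authors parse the XML with the Java SAX API; the sketch below shows the same event-driven idea with Python's standard `xml.sax` module instead. It is an illustration under assumptions: the sample record and its key are invented, only `inproceedings` entries are handled, and the table layouts are simplified. The real DBLP dump also relies on a DTD for character entities, which this sketch does not address.

```python
import xml.sax

class DblpHandler(xml.sax.ContentHandler):
    """Collects rows for simplified Paper and Write tables."""

    FIELDS = ("author", "title", "booktitle")

    def __init__(self):
        super().__init__()
        self.papers = []   # (key, title, booktitle)
        self.writes = []   # (author name, paper key)
        self._record = None
        self._field = None
        self._text = []

    def startElement(self, name, attrs):
        if name == "inproceedings":
            self._record = {"key": attrs.get("key"), "authors": []}
        elif self._record is not None and name in self.FIELDS:
            self._field, self._text = name, []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._field is not None:
            self._text.append(content)

    def endElement(self, name):
        if self._record is None:
            return
        if name == self._field:
            value = "".join(self._text).strip()
            if name == "author":
                self._record["authors"].append(value)
            else:
                self._record[name] = value
            self._field = None
        elif name == "inproceedings":
            rec = self._record
            self.papers.append((rec["key"], rec.get("title"), rec.get("booktitle")))
            self.writes.extend((author, rec["key"]) for author in rec["authors"])
            self._record = None

# Invented sample record, for illustration only.
SAMPLE = b"""<dblp>
<inproceedings key="conf/x/Samuel59">
<author>Arthur Samuel</author>
<title>Machine learning</title>
<booktitle>SIGMOD</booktitle>
</inproceedings>
</dblp>"""

handler = DblpHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.papers)
print(handler.writes)
```

An event-driven parser suits a dump of this size because it never holds the whole document in memory; each record is turned into table rows as soon as its closing tag is seen.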

Query Sets.
In the experiment, we invite researchers to choose keywords from the DBLP dataset and build the keyword queries they want to perform. In this way, 6 sets of queries with lengths ranging from 1 to 6 are obtained to form the query sets. Each set contains 10 queries. Following the same procedure, the researchers continue to submit queries over a wide range of topics, and we collect 600 queries from them as the query history.
The experiments are run on a machine with an Intel Core(TM) i5-4570 CPU @ 3.20 GHz, 4 GB of RAM, and a 1 TB disk. All the algorithms are implemented in Java.

Parameter Setting.
The experiment in this section evaluates the impact of $\alpha$ on the precision of the similarity calculation and guides the choice of a good value of $\alpha$. First, we randomly select 8 keywords from the query sets in Section 4.1. For each keyword $k_i$, we obtain the corresponding top-6 related keywords by formula (10). By adjusting the parameter $\alpha$ from 0 to 1, we obtain 11 different sets of results. Second, we merge the related keywords in these results to get the set $S_i$ (set size ≤ 66, i.e., at most 11 × 6 keywords). Finally, we calculate the cooccurrence rate between $k_i$ and the keywords in set $S_i$ and mark the keywords ranked in the top 10 as the real set related to $k_i$. For keyword $k_i$ and its real set, the precision under a given $\alpha$ is the ratio between the number of keywords returned by formula (10) that are truly related to $k_i$ and the total number of returned keywords. The precisions of the 8 keywords are averaged to get the final precision. Figure 5 shows how the precision varies with $\alpha$. As Figure 5 shows, the precision of the term similarity calculation reaches its maximum, 0.87, when $\alpha$ equals 0.5. So we choose $\alpha = 0.5$ for the following experiments.

Performance Study.
In this subsection, we report the performance of the query expansion method ReInterpretQE in comparison with the state-of-the-art approaches Metadata [3] and -coupling [11]. The performance is measured by three evaluation metrics: precision, recall, and F-score. A corresponding SQL statement is generated for each query in the query set. The results obtained through the SQL statement are taken as the real query results and added to the test set Test. Given the result set Result and the real result set Test, the precision, recall, and F-score are calculated as follows:

$$\mathrm{precision} = \frac{|\mathrm{Result} \cap \mathrm{Test}|}{|\mathrm{Result}|}, \qquad \mathrm{recall} = \frac{|\mathrm{Result} \cap \mathrm{Test}|}{|\mathrm{Test}|}, \qquad \text{F-score} = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}.$$

Figures 6, 7, and 8 show the comparisons in terms of precision, recall, and F-score, respectively. Each data point on the x-axis of Figures 6, 7, and 8 corresponds to a number of keywords. The y-axis presents the corresponding precision, recall, and F-score.
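The three metrics can be computed directly from the two sets, as in the minimal sketch below. The tuple identifiers are made up for illustration, and returning zero for empty sets is our own convention rather than something stated in the paper.

```python
def evaluate(result, test):
    """Precision, recall, and F-score of a result set against the real results."""
    result, test = set(result), set(test)
    hits = len(result & test)
    precision = hits / len(result) if result else 0.0
    recall = hits / len(test) if test else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score

# Made-up tuple ids: 3 of the 4 returned tuples are among the 5 real results.
p, r, f = evaluate({"t1", "t2", "t3", "t4"}, {"t2", "t3", "t4", "t5", "t6"})
print(p, r, round(f, 3))  # 0.75 0.6 0.667
```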
As Figure 6 shows, the query expansion method ReInterpretQE proposed in this paper achieves significantly higher precision than Metadata and -coupling. For example, the average precision of ReInterpretQE reaches 0.81, which is 20.9% and 26.6% higher than Metadata and -coupling, respectively. Specifically, when the number of keywords is 4, the precision of ReInterpretQE is 0.76, while the precisions of Metadata and -coupling are 0.64 and 0.62, so ReInterpretQE increases the precision by 18.8% and 22.6%, respectively. Overall, the precision of Metadata is slightly higher than that of -coupling, and the precision of ReInterpretQE is clearly higher than both. This comparison shows the significance of the proposed query interpretation, which helps to describe the semantics of keyword queries and thus significantly improves query precision. More specifically, -coupling performs worse than Metadata and ReInterpretQE because it focuses on identifying a set of keyword queries related to the given keyword query. The expanded queries it produces are still keyword queries without structure information between keywords, and the inherent ambiguity of keyword queries directly affects query precision. Metadata and ReInterpretQE transform the initial keyword query into SQL and query subgraphs, respectively. The expansions of both methods contain structural information, which helps to improve query precision. ReInterpretQE improves on Metadata because Metadata does not consider the multiple connections between keywords and the various explanations of a keyword query, while ReInterpretQE translates the keyword query to a set of query subgraphs with structural and semantic information through the translation algorithm for triples and the construction algorithm for subgraphs. The subgraphs can locate the query results more precisely.
To further evaluate the performance of our method, we vary the number of input keywords and compare the corresponding recalls. Figure 7 plots the average recall of the different query expansion methods. The results show that ReInterpretQE generally produces expansions with higher recall than the other two methods, suggesting that query recommendation is crucial to good performance. More precisely, when the number of keywords is 4, the recalls of Metadata and -coupling are 0.63 and 0.73, respectively, while that of ReInterpretQE is 0.80. The recall of ReInterpretQE is thus approximately 9.6% higher than that of -coupling, and the margin over Metadata is larger still. We observe that -coupling and ReInterpretQE beat Metadata significantly. This is because Metadata does not take similar words and related items into account, while -coupling and ReInterpretQE do, which helps to expand the query and thus leads to higher query recall. ReInterpretQE always generates better results than -coupling. For example, ReInterpretQE achieves an average recall of 0.82, about 5.1% higher than -coupling. The reason is that -coupling only uses the keyword coupling relationship matrix to analyze the original query, while ReInterpretQE expands every keyword with similar words and related items and then expands the query to a query list using the probabilistic recommendation algorithm. ReInterpretQE builds a Viterbi model using dynamic programming and generates the query list related to the input query. As a result, the query results include more complete and comprehensive information, and the recall is further improved.
Figure 8 further illustrates that ReInterpretQE outperforms the baseline methods in terms of F-score. The overall trend is clear: in all settings we evaluated, the results produced by ReInterpretQE are significantly better than those of the other two methods. For example, when the number of keywords is 4, the F-scores of Metadata and -coupling are 0.63 and 0.67, respectively, while the F-score of ReInterpretQE is 0.78. ReInterpretQE thus increases the F-score by 23.8% and 16.4% compared with Metadata and -coupling, respectively. We investigated why ReInterpretQE outperforms the baseline methods. On the one hand, ReInterpretQE calculates the term similarity considering both intrasimilarity and intersimilarity, which makes the similarity between terms more reasonable. Algorithm 1, the probabilistic recommendation algorithm, is then built on this term similarity. It constructs the Viterbi model using dynamic programming, which improves query recall. On the other hand, ReInterpretQE uses Algorithm 2, the translation algorithm for triples, to translate keywords to triples, and Algorithm 3, the query subgraph construction algorithm, to transform the triples into query subgraphs. In constructing the query subgraphs, ReInterpretQE considers not only the expansion of the keyword query in structure and content but also the various explanations of a keyword query. So the query precision is further improved.
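The figures quoted for four keywords can be cross-checked with a few lines of arithmetic. The sketch below recomputes each F-score from the precision and recall reported above and then the relative gains; the dictionary key "coupling [11]" is just our label for the baseline of [11].

```python
def f_score(precision, recall):
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def relative_gain(ours, baseline):
    """Improvement of ours over baseline, in percent."""
    return (ours - baseline) / baseline * 100

# (precision, recall) at 4 keywords, as quoted in the text.
REPORTED = {
    "ReInterpretQE": (0.76, 0.80),
    "Metadata": (0.64, 0.63),
    "coupling [11]": (0.62, 0.73),
}

F = {name: round(f_score(p, r), 2) for name, (p, r) in REPORTED.items()}
print(F)  # {'ReInterpretQE': 0.78, 'Metadata': 0.63, 'coupling [11]': 0.67}

for base in ("Metadata", "coupling [11]"):
    print(base, round(relative_gain(F["ReInterpretQE"], F[base]), 1))
# Metadata 23.8
# coupling [11] 16.4
```

The recomputed values agree with the reported F-scores of 0.78, 0.63, and 0.67 and with the stated gains of 23.8% and 16.4%.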
Summary. Based on these observations, ReInterpretQE indeed boosts the performance of query expansion and has a clear positive effect on the quality of query results. Thus ReInterpretQE can be considered an effective and practicable algorithm for query expansion.

Conclusions
To address the problems of semantic fuzziness and limited expressiveness, this paper proposes a novel two-step query expansion method, ReInterpretQE. The method translates the keyword query to query subgraphs with potential structural and semantic information. Unlike traditional methods, it completes query expansion and analysis using only the structure and content information of the database, without requiring query logs. In addition, the method uses query recommendation and query interpretation to balance the precision and recall of query results. Experimental results on the DBLP dataset verify the effectiveness of the proposed method. Many open questions remain in the research on query expansion in relational databases, and we will study them in future work. For instance, we will take into account the influence of feedback on the performance of query expansion.

Figure 5: Precision of results for different values of $\alpha$.

Table 2: Table names statistics.


Table 5: Description of attributes in datasets.