Keywords-Driven and Popularity-Aware Paper Recommendation Based on Undirected Paper Citation Graph



Introduction
With the increasing maturity of recommender systems [1], users tend to rely on academic search and recommendation websites (e.g., Google Scholar and Baidu Academic) to find papers of interest from a set of typed keywords. A single paper usually covers only some of the keywords a user is interested in. Therefore, a paper recommender system needs to analyze the user's search requirements and return a set of papers that collectively covers all the queried keywords.
Next, we use Figure 1 to introduce the common process of paper recommendation [2]. This process mainly consists of three phases. The first phase is entering keywords: users analyze their research requirements and enter all query keywords (e.g., $k_1$, $k_2$, $k_3$, and $k_6$) into a recommender system. The second phase is paper discovery [3]: the recommender system automatically identifies diverse sets of candidate papers. The third phase is paper selection [4,5]: the recommender system recommends candidate papers containing the query keywords to users. However, the returned papers may fail to satisfy users' requirements for deep and continuous research on a certain topic, as these papers may belong to a variety of research domains.
Keyword search methods [6,7] have long been popular for searching papers, but these methods can hardly find a set of satisfactory papers. In fact, a set of satisfactory papers must satisfy the following requirements: on the one hand, the papers collectively cover the user's query keywords [8][9][10]; on the other hand, each candidate paper containing query keywords has direct or indirect correlation relationships [11] with the other candidate papers containing different query keywords. In short, recommending a set of satisfactory papers still needs in-depth analysis and study [12].
To recommend a set of satisfactory papers, we propose the $PR_{keyword+pop}$ (keywords-driven and popularity-aware paper recommendation) approach, which assists users in searching for a set of satisfactory papers, i.e., papers that not only cover all queried keywords but also have higher popularity and correlation. Moreover, $PR_{keyword+pop}$ runs on an undirected paper relationship graph, where each paper is modeled as a node and an edge indicates that a correlation relationship exists between two papers. In practice, $PR_{keyword+pop}$ may return one or multiple subgraphs of the paper relationship graph according to users' query keywords; the returned subgraphs include keyword papers covering the query keywords and bridging papers (if any) that are needed but not specified by the query keywords, and they reflect the composability and popularity of the recommended papers. Note that we speak interchangeably of a paper and its corresponding node in the remainder of this paper, both denoted as $p/v$.
In summary, we make the following contributions:
(1) We propose a novel keyword-driven and popularity-aware paper recommendation approach, which efficiently recommends a set of satisfactory papers.
(2) We build an undirected paper citation graph [13] and regard the users' keyword query problem as a Steiner tree problem. We then employ papers' popularity to find optimal solutions.
(3) We conduct large-scale experiments on the Hep-dataset [14] to evaluate the usefulness and feasibility of $PR_{keyword+pop}$.
The rest of this paper is structured as follows: Section 2 demonstrates the research motivation, Section 3 defines an undirected paper citation graph, Section 4 formulates the main research problems, Section 5 introduces how $PR_{keyword+pop}$ answers users' keyword queries on the undirected paper citation graph, Section 6 evaluates $PR_{keyword+pop}$ experimentally, Section 7 reviews related work, and Section 8 concludes this paper and points out future research directions.

Research Motivation
In this section, we use the examples of Figures 2 and 3 to demonstrate the research motivation. Figure 2 shows that a user needs to carry out the following keyword research tasks [15] before starting his own work: (1) paper recommendation (i.e., $k_1$), to study the paper recommendation process [16]; (2) keyword search (i.e., $k_2$), to study keyword search and apply it to the paper recommendation process; (3) Steiner tree (i.e., $k_3$), to study the Steiner algorithm [17] and apply it to keyword search; (4) dynamic programming (i.e., $k_6$), to study the dynamic programming technique and apply it to the Steiner tree problem. In Figure 2, the user obtains four corresponding keywords (i.e., $Q = \{k_1, k_2, k_3, k_6\}$) through a preliminary analysis of his research content [18]. Next, the user can search for corresponding papers in Figure 3. Figure 3 is a part of an undirected paper citation graph and contains 14 nodes covering diverse keywords, i.e., $v_1, \dots, v_{14}$. Furthermore, the notation $v_{13}\{k_{11}, k_{13}\}$ indicates that node $v_{13}$ offers keywords $k_{11}$ and $k_{13}$, and the edge $e(v_1, v_{10})$ indicates that nodes $v_1$ and $v_{10}$ have a correlation relationship. Thus, given the query $Q = \{k_1, k_2, k_3, k_6\}$, the user can easily pick a set of papers from Figure 3 that covers the query keywords. However, even if the user obtains such a set of papers, he may still have no idea whether these papers can support his work, as the correlation relationships among them are not visible to him. In fact, each user must manually find a set of required papers from massive candidate papers [19,20]; worse still, this process is very time consuming and challenging. To tackle these issues, we propose a novel keyword-driven and popularity-aware paper recommendation approach, named $PR_{keyword+pop}$, which is presented in detail in Section 5.

Undirected Paper Citation Graph
The citation relationships of a paper citation graph [21] can sufficiently attest to the correlation among papers' research content. If we used the directed paper citation graph as is, knowledge would be treated as flowing in one direction only. In fact, knowledge can be transferred in both directions in the paper citation graph. Thus, we use undirected citation relationships to denote the papers' correlations. For example, an undirected citation relationship $\{p_1 - p_2\}$ indicates the correlation between papers $p_1$ and $p_2$. As more citation relationships are mined and more papers are included in an undirected paper citation graph, the graph grows larger and denser [22], offering a solid base for recommending a set of satisfactory papers.
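As a minimal sketch of this idea, the snippet below turns directed (citing, cited) pairs into an undirected adjacency map. The function name, the toy paper identifiers, and the decision to skip self-citations are assumptions made for illustration; they are not taken from the paper.

```python
from collections import defaultdict


def build_undirected_graph(citations):
    """Turn directed (citing, cited) pairs into an undirected adjacency map."""
    adj = defaultdict(set)
    for citing, cited in citations:
        if citing == cited:
            continue  # assumption: self-citations carry no correlation information
        # Store the relationship in both directions, so p1 - p2 is symmetric.
        adj[citing].add(cited)
        adj[cited].add(citing)
    return dict(adj)


# Hypothetical toy data: p2 cites p1, p3 cites p1 and p2.
citations = [("p2", "p1"), ("p3", "p1"), ("p3", "p2")]
graph = build_undirected_graph(citations)
```

Storing neighbors as sets keeps duplicate citation records from producing parallel edges.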
$PR_{keyword+pop}$ runs on any undirected paper citation graph that fulfills the requirements specified by the following definitions.
Definition 1 (nodes). For each paper, an undirected paper citation graph $G_p$ has a corresponding node $v$. Each node contains multiple keywords (i.e., $k_1, \dots, k_m$).
Definition 3 (undirected paper citation graph). An undirected paper citation graph is expressed as $G_p(V_p, E_p)$, where $V_p$ and $E_p$ denote its sets of nodes and edges, respectively.
According to Definition 2, relevant papers in the same domain are connected, either directly or indirectly, forming a connected undirected paper citation graph. Note that if users enter entirely irrelevant query keywords (e.g., privacy-preserving [23][24][25] and protein engineering), we will fail to recommend a set of satisfactory papers to users.
To answer users' queries, $PR_{keyword+pop}$ prebuilds an inverted index $S(K)$ [17] on $G_p$, i.e., nodes that contain the same keyword are stored in the same inverted list. For example, nodes $v_3$, $v_4$, and $v_{11}$ all contain keyword $k_2$ in Figure 3, so $S(k_2) = \{v_3, v_4, v_{11}\}$. This way, given an individual keyword $k$, $PR_{keyword+pop}$ can easily find all papers that study keyword $k$.
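The inverted index can be sketched in a few lines. The function name and dictionary layout are illustrative assumptions; the toy data lists only the node keywords that the text mentions explicitly, and any other keywords of these nodes are omitted.

```python
from collections import defaultdict


def build_inverted_index(node_keywords):
    """Map each keyword k to S(k), the set of nodes that contain k."""
    index = defaultdict(set)
    for node, keywords in node_keywords.items():
        for keyword in keywords:
            index[keyword].add(node)
    return dict(index)


# Keywords named in the text for Figure 3 (partial, for illustration only).
node_keywords = {
    "v3": {"k2"},
    "v4": {"k2"},
    "v11": {"k2"},
    "v13": {"k11", "k13"},
}
index = build_inverted_index(node_keywords)
```

Looking up `index["k2"]` then yields the node set for $k_2$ without scanning the graph.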

Problem Formulation
The key point of our proposal is recommending a set of satisfactory papers. Specifically, answering a keyword query $Q$ mainly consists of two steps: (1) finding Steiner trees on an undirected paper citation graph $G_p$, denoted as $T(Q)$, where $T(Q)$ not only covers all query keywords but also has the fewest nodes (i.e., the highest correlation); (2) obtaining optimal Steiner trees $T_1(Q)$ from $T(Q)$, where $T_1(Q)$ has the highest popularity (i.e., the most trust [26]). For clarity, we summarize the symbols in Table 1.
Likewise, our proposal recommends papers to the user based on the undirected paper citation graph of Figure 3 and the query keywords of Figure 2 (i.e., $k_1$, $k_2$, $k_3$, $k_6$). Here, nodes $v_1$ and $v_2$ contain query keyword $k_1$; nodes $v_3$, $v_4$, and $v_{11}$ contain query keyword $k_2$; nodes $v_5$ and $v_6$ contain query keywords $k_3$ and $k_6$; node $v_{12}$ contains query keyword $k_6$. Thus, given $Q = \{k_1, k_2, k_3, k_6\}$, we are looking for a Steiner tree that connects one node from $\{v_1, v_2\}$, one node from $\{v_3, v_4, v_{11}\}$, one node from $\{v_5, v_6\}$, and one node from $\{v_5, v_6, v_{12}\}$. Furthermore, the Steiner tree may also connect nodes that do not cover any query keyword, e.g., nodes $v_{10}$, $v_{13}$, and $v_{14}$.
Therefore, the Steiner tree of Figure 4 (i.e., $R_p = \{v_1, v_{10}, v_{13}, v_{11}, v_{14}, v_{12}, v_6\}$) can satisfy users' requirements for deep and continuous research on a certain topic.
Thus, a Steiner tree is defined as follows.
Definition 4 (Steiner tree). Given an undirected paper citation graph $G_p(V_p, E_p)$ and a set of nodes $V_p' \subseteq V_p$, a connected subgraph $T_P$ of $G_p$ that covers all nodes of $V_p'$ forms a Steiner tree.
Given the query keywords $Q = \{k_1, \dots, k_l\}$, we use the inverted index of Section 3 to identify the sets of nodes $V_{p1}, \dots, V_{pl}$, where every node in $V_{pn}$ ($1 \le n \le l$) contains keyword $k_n$. Next, we need to find a group Steiner tree, which is formally defined as follows.
Definition 5 (group Steiner tree). Given $G_p(V_p, E_p)$ and multiple sets of nodes $V_{p1}, \dots, V_{pl} \subseteq V_p$, where each group $V_{pn}$ ($1 \le n \le l$) contains the nodes offering query keyword $k_n$, a Steiner tree $T_P$ that contains at least one node of each group $V_{pn}$ ($1 \le n \le l$) forms a group Steiner tree.
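The two conditions of the definition (connectivity and group coverage) can be checked mechanically. The sketch below reads the coverage condition as "at least one node per group", which is what the Figure 4 example requires, since both $v_6$ and $v_{12}$ carry $k_6$. The function names are illustrative, and the chain of edges is hypothetical: only the edge $e(v_1, v_{10})$ is stated in the text.

```python
def is_connected(adj, nodes):
    """Check that `nodes` induce a connected subgraph of the adjacency map."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj.get(u, ()):
            if w in nodes and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes


def covers_all_groups(nodes, groups):
    """Check that every keyword group has at least one node in `nodes`."""
    nodes = set(nodes)
    return all(nodes & set(group) for group in groups)


def is_group_steiner_candidate(adj, nodes, groups):
    # A connected node set always has a spanning tree, so these two
    # checks suffice for the purpose of this sketch.
    return is_connected(adj, nodes) and covers_all_groups(nodes, groups)


# Hypothetical edges: the Figure 4 nodes arranged as a simple chain.
chain = ["v1", "v10", "v13", "v11", "v14", "v12", "v6"]
adj = {}
for a, b in zip(chain, chain[1:]):
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

# Groups for k1, k2, k3, k6 as listed in the text.
groups = [{"v1", "v2"}, {"v3", "v4", "v11"}, {"v5", "v6"}, {"v5", "v6", "v12"}]
ok = is_group_steiner_candidate(adj, chain, groups)
```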
First, we may obtain multiple group Steiner trees for a query $Q$. Next, $PR_{keyword+pop}$ aims to find minimum group Steiner trees that not only cover the users' query keywords but also have higher correlation (i.e., fewer nodes). Thus, a minimum group Steiner tree is defined as follows.
Definition 6 (minimum group Steiner tree). Given a set of group Steiner trees $T_{P1}, \dots, T_{Pm}$, when $|T_{P\min}| = \min(|T_{P1}|, \dots, |T_{Pm}|)$, $T_{P\min}$ is a minimum group Steiner tree, where $|T_{Pi}|$ represents the number of nodes (papers) of $T_{Pi}$.

$PR_{keyword+pop}$ Approach
The basic steps of $PR_{keyword+pop}$ are as follows (see Figure 5): first, we generate multiple minimum group Steiner trees (i.e., $T(Q)$) by employing the DP (dynamic programming) technique [17]; then, we generate optimal solutions (i.e., $T_1(Q)$) by employing the PP (paper popularity) method.
Step 1. Minimum group Steiner trees generation based on an undirected paper citation graph.
This section mainly discusses employing the DP technique to solve the MGST (minimum group Steiner trees) problem. Specifically, the DP technique first breaks the MGST problem into a series of simpler subproblems; next, each subproblem is solved only once and its result is stored; finally, multiple solutions, i.e., $T(Q)$, are obtained by combining the stored results.
In this section, we treat all query keywords as $K$, i.e., $K = Q$. In the DP model, $T_{P\min}(v, K')$ ($K' \subseteq K$), rooted at node $v$, is a state that covers the query keywords $K'$. Moreover, $w_{DP}(T_{P\min}(v, K'))$ represents the number of nodes in $T_{P\min}(v, K')$. The state-transition equation of the DP model is specified by formulas (1)-(6). Formula (1) indicates that the weight of a tree is 1 in the DP model when it covers only one keyword node [27]. Formula (2) indicates that $T_{P\min}(v, K')$ is obtained by the following two operations: the tree growth operation (i.e., formulas (3) and (4)) and the tree merging operation (i.e., formulas (5) and (6)). In Figure 6(a), the tree growth operation generates a new tree by adding a new node $u$ (i.e., one of $v$'s neighbors) to $T_{P\min}(v, K')$. In Figure 6(b), the tree merging operation generates a new $T_{P\min}(v, K')$ by merging two trees that have the same root node. The pseudocode of these two operations is specified in Algorithm 1 and Algorithm 2, respectively. In Step 1, we repeat the tree growth and tree merging operations to obtain a queue $Q_1$. The pseudocode for obtaining $Q_1$ and $T(Q)$ is specified in the Steiner trees algorithm (Algorithm 3).
Table 1: Symbols.
$T(Q)$: minimum group Steiner trees
$T_1(Q)$: optimal solutions
$Q_1$: a queue in ascending order of the number of tree nodes
$T_{P\min}(v, K')$: minimum group Steiner trees rooted at $v$
$R_p$: paper recommendation results
Figure 5: Step 1, minimum group Steiner trees generation based on an undirected paper citation graph (DP technique); Step 2, optimal solution generation based on minimum group Steiner trees (PP method).
Next, the example of Figure 7 illustrates how $T(Q)$ is generated for $K = \{k_1, k_2, k_3, k_6\}$. The trees rooted at nodes containing $k_1$, $k_2$, $k_3$, or $k_6$ are enqueued first, i.e., $v_1, \dots, v_6$, $v_{11}$, and $v_{12}$ are added in Figure 7(b). Since each of these eight trees contains only one node, the tree growth operation is performed in Figure 7(b). All of these nodes have neighbor nodes, so trees connecting each node to a neighbor are generated in Figure 7(c). Next, the tree growth operation is performed on Figure 7(c). For example, trees $\{v_8, v_3\}$ and $\{v_8, v_{11}\}$ can generate a new tree rooted at node $v_8$, i.e., $\{v_8, v_3, v_{11}\}$, but this is not a tree merging operation, as the new tree does not contain new query keywords. Furthermore, some newly generated trees are deleted because they are of no use; e.g., while tree $\{v_8, v_{11}\}$ can generate the new tree $\{v_8, v_{11}, v_{13}\}$, the new tree covers the same query keyword $k_2$ with more nodes. Therefore, only the five required trees are retained in Figure 7(d). Next, we execute tree merging operations in Figures 7(d) and 7(f) and the tree growth operation in Figure 7(e). Finally, the user obtains four minimum group Steiner trees in Figure 7(g).
Note that, in the extreme case, the output of the Steiner trees algorithm may be the entire graph, i.e., $G_p(V_p, E_p)$; in the worst case, the algorithm fails to recommend papers to users.
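To make the interplay of tree growth and tree merging concrete, the following is a minimal best-first DP sketch in the spirit of the technique of [17]. It is not a transcription of Algorithms 1-3: the function name, the bitmask encoding of $K'$, and the choice to return a single minimum tree (rather than all of $T(Q)$) are assumptions. A state is a pair (root $v$, covered keywords $K'$), and its cost is the number of nodes, so growth adds 1 and merging two trees with the same root costs $c_1 + c_2 - 1$ because the root is counted once.

```python
import heapq
from itertools import count


def min_group_steiner_tree(adj, node_keywords, query):
    """Return (node_count, nodes) of one minimum group Steiner tree, or None."""
    query = list(query)
    bit = {k: 1 << i for i, k in enumerate(query)}
    full = (1 << len(query)) - 1
    best = {}   # (root, mask) -> best known node count
    done = {}   # root -> {mask: node set} for finalized states
    heap, tie = [], count()

    def push(cost, v, mask, nodes):
        if cost < best.get((v, mask), float("inf")):
            best[(v, mask)] = cost
            heapq.heappush(heap, (cost, next(tie), v, mask, nodes))

    # Initial states: a single keyword node is a tree of weight 1.
    for v, keywords in node_keywords.items():
        for k in keywords:
            if k in bit:
                push(1, v, bit[k], frozenset([v]))

    while heap:
        cost, _, v, mask, nodes = heapq.heappop(heap)
        if cost > best[(v, mask)] or mask in done.get(v, {}):
            continue  # stale or already finalized state
        if mask == full:
            return cost, set(nodes)
        done.setdefault(v, {})[mask] = nodes
        # Tree growth: re-root the tree at a neighbor u of v.
        for u in adj.get(v, ()):
            push(cost + 1, u, mask, nodes | {u})
        # Tree merging: combine two finalized trees rooted at v
        # that cover disjoint keyword sets.
        for other_mask, other_nodes in list(done[v].items()):
            if other_mask & mask == 0:
                merged_cost = cost + best[(v, other_mask)] - 1
                push(merged_cost, v, mask | other_mask, nodes | other_nodes)
    return None


# Hypothetical toy graph: a - b - c, where a offers k1 and c offers k2.
toy_adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
toy_keywords = {"a": {"k1"}, "b": set(), "c": {"k2"}}
result = min_group_steiner_tree(toy_adj, toy_keywords, ["k1", "k2"])
```

Because states are popped in ascending order of node count, the first state that covers every query keyword is a minimum one. Collecting all states of that same cost instead of returning immediately would be one way to approximate the multiple trees in $T(Q)$.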
Step 2. Optimal solutions generation based on the minimum group Steiner trees.
The algorithm above may return multiple qualified candidates, e.g., the output of Figure 7. To ease the burden of users' paper selection decisions, we select optimal solutions (i.e., $T_1(Q)$) from the output of Step 1. Generally, a higher citation frequency of a paper means a higher popularity of the paper. Thus, we use the PP (paper popularity) [28] method for selecting $T_1(Q)$: the popularity of a candidate is accumulated from $d(v_i, v_j)$, where nodes $v_i$ and $v_j$ belong to $T(Q)$ and $G_p$, respectively, and $d(v_i, v_j) = 1$ if $v_j$ cites $v_i$ in the paper citation graph (i.e., $v_i \longrightarrow v_j$) and 0 otherwise. Finally, we produce a ranking list in descending order of the popularity of each candidate.
Thus, $PR_{keyword+pop}$ returns the $T_1(Q)$ that has the highest popularity among the candidates. Note that the recommendation result may be the whole of $T(Q)$ when all candidates have the same popularity.
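A minimal sketch of this selection step is given below. It assumes that the popularity of a candidate tree is the plain sum of $d(v_i, v_j)$ over its nodes, i.e., the total number of papers citing the nodes of the tree; the exact normalization of the PP formula is not recoverable from the text, so this is an assumption. The function names and the `cited_by` map (paper to the set of papers citing it) are hypothetical.

```python
def tree_popularity(tree_nodes, cited_by):
    """Sum of citation counts of the nodes in a candidate tree (assumed form)."""
    return sum(len(cited_by.get(v, ())) for v in tree_nodes)


def select_most_popular(candidates, cited_by):
    """Rank candidates by popularity (descending) and keep all top-ranked ones."""
    if not candidates:
        return []
    scored = sorted(
        ((tree_popularity(t, cited_by), t) for t in candidates),
        key=lambda item: item[0],
        reverse=True,
    )
    top_score = scored[0][0]
    # Ties are kept, so the result equals all candidates
    # when every candidate has the same popularity.
    return [tree for score, tree in scored if score == top_score]


# Hypothetical toy data: a is cited twice, b once, c never.
cited_by = {"a": {"x", "y"}, "b": {"x"}, "c": set()}
best_trees = select_most_popular([{"a", "c"}, {"b", "c"}], cited_by)
```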

Experiments
To demonstrate the usefulness of $PR_{keyword+pop}$, we design and run large-scale experiments.

Experimental Settings.
The paper citation graph is extracted from the Hep-dataset [14]; the graph covers 8721 papers, and each paper contains keyword information.
Generally, an author is allowed to provide up to 6 index terms (i.e., keywords) for an article, so we generate queries with up to 6 keywords. We design three experiment sets, i.e., set A, set B, and set C. In set A, all keywords of a paper are used as a query $Q$. This scenario emulates users who provide exact query keywords for their research content. In set B, the query keywords are selected randomly from different papers (more than one paper). This scenario emulates users who provide query keywords randomly. In set C, the query keywords are selected randomly from two papers, which further verifies the feasibility of the Steiner trees algorithm. We do not execute the PP method in set C. In addition, each experiment set is repeated 50 times, and the average results are reported.
We evaluate the following criteria:
(1) Number of nodes: the fewer recommended papers in a tree (i.e., the higher the correlation of the tree), the better the recommendation approach.
(2) Success rate [17]: a recommendation result is successful if the number of recommended papers is smaller than twice the number of query keywords.
(3) Average paper popularity (APP) [29]: the APP averages the paper popularity of the recommended results, where $m$ is the number of $T_1(Q)$ and $n_z$ is the number of nodes in $T_1(Q)$.
(4) Computation time: the time consumed for generating $T_1(Q)$ in sets A and B and $T(Q)$ in set C.
(5) Precision [30]: for set A and set B, precision is calculated with respect to $TP$, where $TP$ denotes a set of papers containing query keywords.
(6) Recall [30]: recall is calculated with respect to $T_{p_a}$, the set of papers cited by $p_a$, where $|p| = 2$ in set C.
(7) F1 score: the F1 score is calculated as $F1 = 2 \cdot Precision \cdot Recall / (Precision + Recall)$.
Algorithm 1: Tree growth.
Algorithm 2: Tree merging.
To the best of our knowledge, some approaches address the paper recommendation issue by using papers' relationships. Thus, we compare $PR_{keyword+pop}$ with four approaches that are adapted from [17,31,32].
Baseline 1 (Paper-Random [17]): this approach randomly selects a set of nodes that collectively cover all query keywords. Next, the approach finds minimum spanning trees that interconnect the selected nodes. Finally, we obtain optimal minimum spanning trees by executing the PP method.
Baseline 2 (Paper-Greedy [17]): likewise, this approach randomly selects a set of nodes that collectively cover all query keywords. Next, the approach regards the selected nodes as initial root nodes and continuously grows trees until these nodes are interconnected. A greedy heuristic is applied in the tree growth process. Finally, we also use the PP method to obtain optimal solutions.
Baseline 3 (Random Walk (RW) [32]): RW runs on a 2-layer graph, i.e., the undirected paper citation graph and the constructed paper-keyword graph. In addition, each query uses only the keywords entered by users, $q = [0, q_W]$, and this approach is executed only on the keyword queries of set C.
Baseline 4 (Random Walk Restart (RWR) [31]): RWR runs on the same 2-layer graph. Furthermore, this approach is executed only on the query keywords of set C, i.e., $q = [0, q_W]$. Here, if the state vector of RWR grows linearly in the experiments, the approach achieves linear convergence.
The experiments are conducted on a machine with an Intel(R) Core(R) CPU @ 3.0 GHz and 16 GB RAM. The software environment is Windows 10 (version 1809) and Python 3.6.
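Two of the evaluation criteria listed in this subsection, the success rate and the F1 score, are simple enough to sketch directly. The function names and the (papers, keywords) tuple layout are illustrative assumptions; the success test follows the stated rule (fewer recommended papers than twice the number of query keywords), and F1 is the standard harmonic mean of precision and recall.

```python
def is_success(num_recommended, num_query_keywords):
    """A result succeeds if it has fewer papers than twice the query keywords."""
    return num_recommended < 2 * num_query_keywords


def success_rate(results):
    """Fraction of successful runs; `results` holds (papers, keywords) pairs."""
    if not results:
        return 0.0
    hits = sum(is_success(n, k) for n, k in results)
    return hits / len(results)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical runs: 3 papers for 2 keywords succeeds, 5 papers for 2 does not.
rate = success_rate([(3, 2), (5, 2)])
```

With precision fixed at 1.0, F1 is driven entirely by recall, which is why an approach that returns few papers can score a modest F1 despite perfect precision.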

Profile 1: The Number of Recommended Nodes of Different Approaches.
In this profile, we contrast the number of returned papers of $PR_{keyword+pop}$ with two approaches (i.e., Paper-Greedy and Paper-Random). As shown in Figure 8, the number of the users' query keywords ranges from 2 to 6. The quantity of papers recommended by our approach increases with the number of query keywords, because solutions that include more papers can satisfy more query keyword requirements. Paper-Greedy and Paper-Random return their largest numbers of papers when the number of query keywords equals 6 in sets A and C, or 4 in set B. In addition, the experiment results show that our proposal returns fewer recommended papers than these two approaches. As a smaller number of recommended papers implies higher correlation among papers, $PR_{keyword+pop}$ is superior to Paper-Greedy and Paper-Random.
Algorithm 3: Steiner trees algorithm, MGST($G_p$, $K$). Input: $K = \{k_1, k_2, \dots, k_l\}$. Output: $Q_1$ and $T(Q)$.

Profile 2: The Success Rate of Different Approaches.
In this profile, we compare the success rates of different approaches. As shown in Figure 9, the results of the different approaches vary considerably across the experiment sets. Across the different experiment scenarios, Figure 9 shows that our proposal can effectively answer the users' keyword queries, with a success rate of 100%. However, Paper-Random and Paper-Greedy find it increasingly difficult to obtain successful solutions as the number of query keywords increases; in particular, the success rates of these two approaches are both equal to 0 in set B. Again, the experiment results show that our proposal acquires solutions more effectively than Paper-Greedy and Paper-Random.

Profile 3: The Average Paper Popularity of Different Approaches.
In both sets A and B, we compare different approaches by their average paper popularity. As shown in Figure 10, the average paper popularity of Paper-Random and Paper-Greedy is larger than that of $PR_{keyword+pop}$. That is because the number of recommended papers obtained by Paper-Random and Paper-Greedy exceeds that of our approach; moreover, each recommended paper is cited more than once. In practice, the solutions of Paper-Random and Paper-Greedy will seldom be selected, as they cost users considerable time and energy on unnecessary reading. In our view, Figure 10 shows that the average paper popularity of $PR_{keyword+pop}$ is acceptable given that users' query keyword requirements are satisfied.

Profile 4: The Computation Time of Different Approaches.
In Profile 4, we contrast the time consumption of different recommendation approaches. As shown in Figure 11, $PR_{keyword+pop}$, Paper-Random, and Paper-Greedy spend more time finding solutions as the number of query keywords increases. Furthermore, we only measure the time of RW and RWR in set C, and their time is constant. As Paper-Random and Paper-Greedy use extremely simple heuristics for selecting papers, these two approaches spend less time than $PR_{keyword+pop}$ in most cases. In addition, RW and RWR spend much more time than our proposal, as these two approaches need a significant number of iterative and matrix operations. While our proposal takes time to obtain solutions, the time consumption of $PR_{keyword+pop}$ is acceptable for users in most real cases. This is the price to pay for letting users spend less time and energy on achieving their research goal.

Profile 5: The Precision of Different Approaches.
As shown in Figure 12, the three figures present the precision of different approaches. Our proposal accurately answers the users' keyword queries, and its precision is 100% in all three experiment sets. For Paper-Random and Paper-Greedy, precision ranges from 10% to 45%. Therefore, whether users provide query keywords accurately or randomly, our approach can accurately answer the users' keyword queries, and the recommended results better satisfy users' query requirements. Furthermore, these experiment results show that users may spend less time and energy on realizing their research aim.

Profile 6: The Recall Rate and F1 Score of Different Approaches.
In this profile, we first contrast the recall rate of different approaches in set C. According to Figure 8, the number of papers recommended by our approach does not exceed 30, so the number of papers recommended by RW and RWR is set to 10, 20, and 30, respectively; the recall rate of RW and RWR is the average over these three settings. In Figure 13(a), the recall rate of $PR_{keyword+pop}$ ranges from 4% to 21%; the recall rate of Paper-Random and Paper-Greedy ranges from 39% to 54%; and the recall rate of RW and RWR is less than 9.5%. In addition, we compare the F1 score of our approach with Paper-Random and Paper-Greedy. In Figure 13(b), the F1 score of our proposal ranges from 9% to 34%, and the F1 score of Paper-Random and Paper-Greedy ranges from 33% to 44%. As Paper-Random and Paper-Greedy return more papers than $PR_{keyword+pop}$, the recall rate and the F1 score of our proposal are lower than those of these two approaches. Furthermore, when the number of query keywords is not equal to 3, the recall rate of RW and RWR is lower than that of our approach. In conclusion, the recall rate and F1 score of Figure 13 support the feasibility of our proposal.

Related Work
Currently, recommender techniques play vital roles in many research areas. Furthermore, recommendation methods can be mainly classified into three categories: collaborative filtering (CF), content-based filtering (CBF), and graph-based approaches.

Collaborative Filtering.
The early work on paper recommendation mainly explored the use of collaborative filtering (CF) techniques. For example, McNee et al. [33] mainly focused on the rating matrices in paper citation networks. In addition, Pennock [34] proposed a personality diagnosis method based on a Bayesian network, as they considered that the rating frequency of other users made a difference to a user's ratings of items. Furthermore, McNee et al. [33] combined the CF method with the citation frequency of papers to recommend papers, because they considered that the number of citations of a paper had a vital effect on the paper's ratings. In addition, in implicit collaborative filtering, an interaction between a user and an item was recorded as 1, and its absence as 0. However, 1 or 0 did not indicate whether positive or negative factors between users and items generated the interaction [35]. According to users' query keywords, CF approaches can effectively recommend papers to users, but these approaches are generally limited by some problems, e.g., the cold start problem and the data sparsity problem [36].

Content-Based Filtering.
To further improve paper recommendation, some researchers explored content-based filtering (CBF) approaches. Generally, CBF [37] approaches attempted to retrieve papers with respect to textual content and did not use rating relationships. For example, Alzoghbi et al. [38] examined the preferences among papers with two proposed validation mechanisms and recommended interesting papers to users. Furthermore, Wang and Blei [39] combined a topic model with collaborative filtering to propose a paper recommendation approach, named CTR. CTR first used LDA to find latent topics for papers and then inferred user-item relations by using matrix factorization. In fact, the CBF approach suffers from traditional information retrieval issues, e.g., the semantic ambiguity problem. Furthermore, gathering and dealing with the relevance information of papers is often time consuming.

Graph Model.
Currently, papers' relationships can further reflect the future research trends of paper recommendation, mainly because the correlation relationships among papers can indicate the correlation of papers' research contents. For example, Meng et al. [31] regarded authors, papers, topics, and keywords as nodes and their relationships as edges, and recommended academic papers by executing a random walk on a four-layer heterogeneous graph. Furthermore, Gori and Pucci [32] proposed a graph-based PageRank-like recommendation approach that performed a biased random walk on paper citation graphs and further emphasized the correlations among citations. In addition, Wu and Sun [40] argued that three different types of paper citation networks could be constructed based on papers' citation relationships, i.e., the directly connected network, the coupling network, and the cocitation network. Liang et al. [41] showed that the cocitation relationships of the cocitation network could be employed in paper recommendation, e.g., if two papers were both cited by many of the same papers, these two papers had high relevance and were highly likely to be recommended simultaneously.
In fact, correlation relationships [42] are formed in a paper citation graph, as most papers select their references based on content similarity. Thus, we use the paper citation graph to establish an undirected paper citation graph. On this undirected paper citation graph, our proposal (i.e., $PR_{keyword+pop}$) efficiently recommends a set of satisfactory papers to users. Finally, extensive experimental results validate the usefulness and feasibility of the $PR_{keyword+pop}$ approach.

Conclusions and Future Work
Recommending a set of satisfactory papers to users involves important paper discovery and paper selection tasks, which is known as the paper recommendation problem. Here, we propose a novel keywords-driven and popularity-aware approach (i.e., $PR_{keyword+pop}$) to return a set of satisfactory papers, i.e., papers that not only collectively cover users' query keywords but also have higher correlation and popularity. Furthermore, these recommendation results support users in doing deep and continuous research on a certain topic or domain. In addition, the experiment results show the usefulness and feasibility of our proposal.
Although our work shows desirable results, some aspects are still worth further research and improvement. Since users cannot always analyze their research requirements in detail, e.g., the required data types [43][44][45], the recommendation may fail to return satisfactory results. Furthermore, we may face the sparsity problem of the existing paper citation graph. Hence, these issues remain for further study.
Data Availability
The Hep-dataset used to support the findings of this study has been deposited in "http://snap.stanford.edu/data/cit-Hep .html."