A Novel Keyword Generation Model Based on Topic-Aware and Title-Guide

Keywords are one or more words or phrases that describe the subject of a document. Traditional automatic keyword extraction methods cannot obtain keywords that do not appear in the document, and they do not consider semantic information during extraction. In this paper, we introduce a novel Keyword Generation Model based on Topic-aware and Title-guide (KGM-TT). In KGM-TT, a neural topic model identifies the latent topic words, and a hierarchical encoder with an attention mechanism encodes the title and the body content separately. Keywords are generated by a recurrent neural network with attention and copy mechanisms. The model can not only generate keywords that do not appear in the source document but also exploit the topic information and the highly summative meaning of the title to assist keyword generation. The experimental results show that the F1 score of the model is about 10% higher than that of CopyRNN and CopyCNN.


Introduction
A keyword is the smallest unit that expresses the subject of a document. Keywords are widely used in tasks such as information retrieval, document classification, and opinion mining [1]. With the rapid development of the Internet, the number of electronic documents is increasing exponentially, and various methods for automatically extracting and generating keywords have been proposed. The main step of traditional automatic keyword extraction methods (such as n-gram, TF-IDF, TextRank [2], SingleRank [3], and KEA [4]) is to identify candidate words and then rank all the candidates by an importance score computed with either unsupervised ranking approaches or supervised machine-learning approaches [5]. The keywords extracted by these methods are words that appear in the source document, such as "knowledge graph," "knowledge resolution," and "relation embedding" in Figure 1. However, according to relevant statistics, nearly half of the keywords given by the authors of scientific articles do not appear in the source document [6], such as "knowledge representation" and "entity embedding" in Figure 1.
Therefore, such extraction methods cannot generate keywords that do not appear in the source document, and they struggle to mine the deep semantic information between context words.
To overcome these drawbacks, some researchers have proposed Sequence-to-Sequence models for the keyword generation task and achieved promising results, including Seq2Seq, CopyRNN [6], CorrRNN [7], and CopyCNN [8]. First, these methods treat the title and the main body equally and simply concatenate them as the source document input. Then, an encoder maps each document word to a hidden state vector as the context representation. Finally, a decoder conditioned on the context representation generates keywords from a predefined vocabulary [9]. Because they do not require keywords to exist in the document but draw them directly from the vocabulary, they can generate keywords that do not appear in the source document. However, these generation methods still have flaws. The title is the core subject condensed by the document author, yet these methods ignore its guiding role in keyword generation and fail to exploit its strong semantic information. The title is a highly purified and refined summary of the document, which accurately reflects the nature, scope, and depth of its content. Keywords express the characteristics of the document's subject and play a similar and complementary role to the title [10], so keywords and titles carry similar semantic information. Taking an academic paper as an example, the title is usually straightforward, focused, and concise, and directly states the research object, research method, and research purpose of the paper. Therefore, one or several central words in the title can often be used directly as keywords [4]. As shown in Figure 1, "relation embedding" appears in the title but not in the body. In addition, the title also helps to indicate which parts of the body are important, such as the parts that are the same as or related to the title.
Given that existing extractive methods do not fully consider contextual semantics and existing generative models ignore the guiding role of the title, we propose in this paper a Keyword Generation Model based on the combination of Topic-aware and Title-guide (KGM-TT). First, a neural topic model based on the variational autoencoder generates the latent topics of the document, and a hierarchical encoder with an attention mechanism encodes the corpus; then, a recurrent neural network with attention and copy mechanisms decodes the corpus encoding and the topic distribution to generate keywords. The KGM-TT model not only enriches semantic features by mining latent topics but also uses the highly summative and valuable information in the title to jointly guide keyword generation.

Keyword Extraction.
At present, there are many keyword extraction methods, mainly divided into unsupervised and supervised methods [5]. For unsupervised extraction, Mihalcea and Tarau [2] proposed the TextRank graph-ranking method to calculate the correlation between candidate keywords. Liu et al. [11] used the clustering-based KeyCluster method to detect representative phrases from topic clusters. For supervised machine learning, the keyword extraction task is transformed into a binary classification problem [7]. Firoozeh et al. [4] described the KEA method, which extracts candidate phrases from documents, calculates several features such as the TF-IDF value and location of each candidate, and finally uses a Naive Bayes (NB) model to judge whether a candidate is a keyword. Turney [12] adopted a C4.5 decision tree to train the classifier. Wang and Peng [13] used a Support Vector Machine (SVM) with features including word frequency and word location. Zhang [14] used Conditional Random Fields (CRF) to extract keywords from documents. These extraction methods still cannot generate keywords that are absent from the source documents. With the development of deep learning, research has gradually evolved from keyword extraction to generative methods [5]. Liu et al. [15] proposed viewing keyword extraction from the perspective of machine translation: they modeled the extraction process as a translation from document to keywords, taking the keywords as the target output. Since then, mainstream keyword generation methods have been based on the Sequence-to-Sequence (Seq2Seq) model [4]; for example, Meng et al. [6] introduced the CopyRNN model to generate keywords for academic texts.
They framed the generation process as a Sequence-to-Sequence learning task and employed the widely used encoder-decoder framework with attention and copy mechanisms. Chen et al. [7] proposed the CorrRNN model to capture the correlation between multiple keywords. However, these Recurrent Neural Network (RNN) based models may suffer from low efficiency because of the computational dependency between the current time step and the preceding time steps [5]. Zhang et al. [8] therefore proposed the CopyCNN model based on convolutional neural networks (CNN); it employs position embeddings to obtain a sense of order in the input sequence and adopts Gated Linear Units (GLU) as the nonlinearity. In addition, some extraction methods also consider the influence of the title; for example, Li et al. [16] introduced a graph-based ranking algorithm that sets the importance of words in the title to one and all others to zero and then propagates the influence of title phrases iteratively.

Topic Modeling.
Keywords can represent the topics of a document, and a document is usually composed of multiple semantic topics. Therefore, the extracted keywords should cover all topics in the document. Li et al. [17] proposed the single topical PageRank (single TPR) method to calculate the probability of words under all topics of the document without decomposing into multiple PageRank models, which greatly reduces the amount of computation. Parveen et al. [18] proposed the salience rank model to balance the topic specificity and corpus specificity of words and avoid extracting words weakly related to the topic. Our work is closely related to topic models. Topic awareness means using a topic model to find latent topics from document-level word co-occurrence and then joining that model with the keyword generation model. Starting from latent semantic analysis (LSA) [19], topic models that reveal the underlying semantic structure of document collections have been widely used in data mining, text processing, and information retrieval [3]. Typical examples are probabilistic topic models (such as PLSA [20], LDA [21], and HDPs [22]); they provide an extensible basis for document modeling by assigning a latent topic to each variable in the topic allocation [22]. With the wide application of deep learning in natural language processing, some researchers have proposed topic models based on neural networks. These methods mainly use a neural network to reconstruct the text generation process of the topic model and add sparsity constraints on topics to generate more expressive topic words [23]. We use a topic model based on the variational autoencoder [24, 25] to infer latent topics, which also enables end-to-end training with other neural models without model-specific derivation. Such models have proved useful for citation recommendation [26] and conversation understanding [27]. Zeng et al. [28] proposed jointly training a topic model and a short-text classification model, but due to the diversity of keywords, this method cannot adapt to our scenario. Thus, this paper uses a neural topic model based on the variational autoencoder to find the latent topics, trained together with the keyword generation model.
The sequence-to-sequence (Seq2Seq) model is a deep learning framework based on an encoder and a decoder. In 2014, Cho et al. [29] proposed a Seq2Seq model based on recurrent neural networks. It uses the encoder and decoder to calculate the conditional probability of phrase pairs, which improves machine translation performance. Seq2Seq handles end-to-end sequences of unequal length and suits a variety of natural language processing tasks. Using this model, one can generate keywords that never appear in the original document, which compensates for the shortcomings of traditional keyword extraction models. The attention mechanism is also used in the encoding process of this method, but it treats the title and the body alike.

The Framework of KGM-TT.
The traditional keyword extraction methods cannot obtain words that do not appear in the document, while the existing generative methods ignore the guiding effect of the document title on keywords and do not consider the subject semantic information of the document.
The KGM-TT is a keyword generation model based on the fusion of topic awareness and title guidance, consisting of a neural topic model, a title-guide hierarchical encoder, and a topic-aware sequence decoder. The proposed KGM-TT integrates the semantic guidance of title words into the matching layer of the encoding stage; specifically, an attention mechanism aggregates the title information for each word in the document body (Title-guide).
In addition, in the decoding process, using the output of the trained neural topic model, a topic-aware decoder is designed to generate topic-sensitive keywords (Topic-aware).
The framework of the KGM-TT model is shown in Figure 2. First, the corpus is preprocessed and the document topic distribution is generated by the neural topic model. Then, the corpus is encoded by the title-guide hierarchical encoder. Finally, the topic distribution and the encoded representation are input into the topic-aware sequence decoder, which decodes them and automatically generates keywords. The training process of the neural topic model on the left of the framework obtains the topic distribution of the document. The right branch is the encoding process of the document word sequence. Unlike reference [29], we use the title to guide word encoding in the encoding stage; in addition, the document topic is integrated in the decoding stage. These Title-guide and Topic-aware measures make KGM-TT outperform other models, such as those of references [6, 8, 9].

Neural Topic Model.
The neural topic model (NVDM-GSM) used in KGM-TT is based on the variational autoencoder (VAE). As shown in Figure 3, it is composed of an encoder and a decoder that simulate the process of document reconstruction [24].
Specifically, each document X in the corpus C is taken as input and processed into a BoW vector x_bow, a V-dimensional vector over the vocabulary (V is the vocabulary size). x_bow is fed into the neural topic model and encoded into a continuous Gaussian variable z (representing the topic of X) by the BoW encoder. Then the BoW decoder, conditioned on z, reconstructs X and outputs a BoW vector x'_bow. The decoder simulates the generation process of a topic model. We describe the two components in turn.
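As a minimal illustration of the BoW preprocessing above (the toy vocabulary and the helper function are invented for illustration, not taken from the paper), a document can be mapped to its V-dimensional BoW vector as follows:

```python
def bow_vector(tokens, vocab):
    """Return the bag-of-words count vector of `tokens` over `vocab`."""
    index = {w: i for i, w in enumerate(vocab)}
    x_bow = [0] * len(vocab)
    for w in tokens:
        if w in index:          # out-of-vocabulary words are dropped
            x_bow[index[w]] += 1
    return x_bow

vocab = ["topic", "model", "keyword"]
print(bow_vector(["topic", "model", "topic"], vocab))  # [2, 1, 0]
```

Each document thus becomes a fixed-length count vector, which is the only view of the document the neural topic model sees.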
BoW Encoder. The BoW encoder estimates the prior variables μ and σ, from which the topic representation z is derived:

μ = f_μ(f_e(x_bow)), log σ = f_σ(f_e(x_bow)),

where f_*(·) is a neural perceptron with a ReLU activation function.
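The chain from the encoder's (μ, σ) to a topic mixture and word distribution can be sketched numerically. This is only an illustration of the sampling and the shapes involved, assuming a reparameterised diagonal Gaussian and randomly initialised matrices W_theta and W_phi (the trained model would learn these):

```python
import numpy as np

rng = np.random.default_rng(0)
L, K, V = 8, 4, 20          # latent dim, number of topics, vocabulary size

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

# Draw the latent topic variable z ~ N(mu, sigma^2) via reparameterisation.
mu, log_sigma = np.zeros(L), np.zeros(L)
z = mu + np.exp(log_sigma) * rng.standard_normal(L)

# Topic mixture theta = softmax(W_theta^T z): a K-dim distribution.
W_theta = rng.standard_normal((L, K))
theta = softmax(W_theta.T @ z)

# Per-word distribution over the vocabulary: softmax(W_phi^T theta).
W_phi = rng.standard_normal((K, V))
p_word = softmax(W_phi.T @ theta)

assert np.isclose(theta.sum(), 1.0) and np.isclose(p_word.sum(), 1.0)
```

The vector theta is what the decoder later consumes as the document's topic representation.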
BoW Decoder. Suppose the corpus C has K topics, and the word distribution of each topic over the vocabulary is represented by ϕ_k. Each document X ∈ C has a K-dimensional document-topic distribution θ. In the neural topic model, θ is constructed by a softmax function. The generation process of a document in the decoder is as follows:

(1) Draw the latent topic variable z ∼ N(μ, σ²).
(2) Construct the topic mixture θ = softmax(W_θ^T z).
(3) For each word w ∈ X, draw w ∼ softmax(W_ϕ^T θ).

Here N(μ, σ²) denotes a multidimensional Gaussian distribution whose covariance matrix has σ² on its diagonal. W_θ is an L × K matrix, where L is the dimension of z and K is the number of topics, and W_ϕ serves as the topic-word distributions (ϕ_1, ϕ_2, ..., ϕ_K). We adopt the topic mixture vector θ as the topic representation to guide keyword generation.

Title-Guide Hierarchical Encoder. As shown in Figure 4, the title-guide hierarchical encoder is composed of a sequence encoding layer, a matching layer, and a merging layer. The sequence encoding layer reads the title input and the main-body input and learns their context representations separately. The matching layer matches the relevant title information for each word of the document, highlighting the important words. The merging layer merges the aggregated title information into each word to produce a title-guided context representation.

Sequence Encoding Layer.
The embedding table maps each word in the body text and the title to a dense vector of fixed size d_e. Two bidirectional Gated Recurrent Units (GRU) [29] encode the body and the title, respectively, combining the context information into the representation of each word:

→u_i = GRU(x_i, →u_{i−1}), ←u_i = GRU(x_i, ←u_{i+1}), u_i = [→u_i; ←u_i],

and analogously v_j for the title, where i = 1, 2, ..., L_x, j = 1, 2, ..., L_t, x_i is the embedding of the i-th word of the document body, and t_j is the embedding of the j-th word of the title. u_i and v_j are the context vectors of the i-th body word and the j-th title word; each directional hidden vector has dimension d/2, where d is the hidden-layer dimension of the bidirectional GRU. → indicates left-to-right encoding and ← indicates right-to-left encoding.
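A toy NumPy sketch of this bidirectional encoding follows. The hand-rolled GRU cell (without bias terms) and the random weights are illustrative assumptions; the point is how forward and backward hidden states are concatenated per word:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h, P):
    """One GRU step; P holds the update/reset/candidate weight matrices."""
    xh = np.concatenate([x, h])
    zt = sigmoid(P["Wz"] @ xh)                 # update gate
    rt = sigmoid(P["Wr"] @ xh)                 # reset gate
    ht = np.tanh(P["Wh"] @ np.concatenate([x, rt * h]))
    return (1 - zt) * h + zt * ht

def bigru_encode(xs, P_fwd, P_bwd, d_half):
    """Concatenate left-to-right and right-to-left hidden states per word."""
    h = np.zeros(d_half); fwd = []
    for x in xs:
        h = gru_step(x, h, P_fwd); fwd.append(h)
    h = np.zeros(d_half); bwd = []
    for x in reversed(xs):
        h = gru_step(x, h, P_bwd); bwd.append(h)
    bwd.reverse()
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

rng = np.random.default_rng(1)
d_e, d_half = 5, 3                             # embedding dim, d/2
make = lambda: {"Wz": rng.standard_normal((d_half, d_e + d_half)),
                "Wr": rng.standard_normal((d_half, d_e + d_half)),
                "Wh": rng.standard_normal((d_half, d_e + d_half))}
xs = [rng.standard_normal(d_e) for _ in range(4)]  # 4 word embeddings
U = bigru_encode(xs, make(), make(), d_half)
assert len(U) == 4 and U[0].shape == (2 * d_half,)
```

Each u_i in U is a d-dimensional context vector; the same encoder, with its own parameters, produces the title vectors v_j.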

Matching Layer. The attention-based matching layer aggregates the relevant information from the title for each word of the body. The aggregation operation c_i = attn(u_i, [v_1, v_2, ..., v_{L_t}]; W_1) is

s_{i,j} = u_i^T W_1 v_j, α_{i,j} = exp(s_{i,j}) / Σ_{j'} exp(s_{i,j'}), c_i = Σ_j α_{i,j} v_j,

where c_i is the aggregated information vector of the i-th word of document X, s_{i,j} is the unnormalized attention between u_i and v_j, and α_{i,j} is the normalized attention. Since a document contains a title and a main body, the matching layer has two parts. The first is title-to-title self-matching, which better connects the context information of each title word; because the title contains much highly summative information, this part reinforces the importance of the title itself and plays a vital role in capturing the key information of the document. The second is body-to-title matching, in which each body word aggregates title information according to semantic relevance. This part highlights the words highly related to the title and uses the title information to reflect the importance of words in the body. Compared with previous Sequence-to-Sequence methods, this matching layer makes full use of the summary information contained in the title.

Merging Layer.
The source contextual vector u_i and the aggregated information vector c_i are the inputs of the merging layer:

m_i = λ u_i + (1 − λ) c_i,

where the u_i term acts as a residual connection and λ ∈ (0, 1) is the corresponding hyperparameter. Finally, the title-guided context representation [m_1, m_2, ..., m_{L_x}] is obtained and stored as M for the subsequent decoding process.

Topic-Aware Sequence Decoder. As shown in Figure 5, the topic-aware sequence decoder is conditioned on the encoded representation M and the latent topic θ. The generation process of the keyword Y is defined as

Pr(Y | M, θ) = Π_j Pr(y_j | Y_{<j}, M, θ),

where Y_{<j} = ⟨y_1, y_2, ..., y_{j−1}⟩ and Pr(y_j | Y_{<j}, M, θ), denoted p_j, is a word distribution over the vocabulary reflecting how likely each word is to fill the j-th slot of the target keyword. The sequence decoder adopts a unidirectional GRU. Beyond the usual state update, the j-th hidden state s_j additionally incorporates the latent topic θ of document X:

s_j = GRU([z_j; θ], s_{j−1}),

where z_j is the j-th decoder input, s_j is the hidden state at step j, s_{j−1} is the previous hidden state, and [;] denotes the concatenation operation. The decoder decodes the sequence M and obtains key information through the attention mechanism. When predicting the j-th word of the keyword, the attention weight on w_i ∈ X_seq is defined as

α'_{ij} = exp(f_α(u_i, s_j, θ)) / Σ_{i'} exp(f_α(u_{i'}, s_j, θ)), f_α(u_i, s_j, θ) = v_α^T tanh(W_α [u_i; s_j; θ] + b_α),

where v_α, W_α, and b_α are trainable parameters and f_α(·) measures the semantic relationship between the i-th source word and the j-th target word to be predicted. This relationship is also calibrated with the input latent topic information θ to explore and highlight the topic words. The topic-sensitive context vector c_j is then obtained as c_j = Σ_i α'_{ij} m_i.

In addition, conditioned on c_j, the j-th word is generated over the vocabulary according to

p_gen = softmax(W_gen [s_j; c_j] + b_gen).
A copy mechanism (See et al. [30]) is added to extract keywords directly from the source document. Specifically, λ_j ∈ [0, 1] is used as a soft switch to decide whether to copy a word from the source text as the j-th target word:
λ_j = sigmoid(W_λ [s_j; c_j; θ] + b_λ),

where W_λ and b_λ are trainable parameters and the topic information θ is also injected here to guide the switch decision. Finally, the distribution p_j of the j-th target word is predicted as

p_j = λ_j · p_gen + (1 − λ_j) · Σ_{i: w_i = y_j} α'_{ij},

where the attention scores {α'_{ij}}_{i=1}^{|X|} serve as the extractive distribution over the source input.
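A numerical sketch of the merging step and the generate/copy mixture (pointer-generator-style mixing in the spirit of See et al. [30]; the toy vocabulary, scores, and switch value are invented for illustration):

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

# Merging layer: m_i = lambda * u_i + (1 - lambda) * c_i (residual mix).
lam = 0.5
u_i = np.array([1.0, 0.0]); c_i = np.array([0.0, 1.0])
m_i = lam * u_i + (1 - lam) * c_i              # -> [0.5, 0.5]

# Decoder step: mix the generative distribution p_gen with the extractive
# (copy) distribution built from the attention scores alpha over source words.
vocab = ["graph", "embedding", "relation"]
p_gen = softmax(np.array([0.2, 1.5, 0.1]))     # over the vocabulary
alpha = softmax(np.array([2.0, 0.5]))          # over the source words
src_words = ["relation", "graph"]              # source token per position

copy_dist = np.zeros(len(vocab))
for a, w in zip(alpha, src_words):
    copy_dist[vocab.index(w)] += a             # scatter attention into vocab

lam_j = 0.7                                    # soft generate/copy switch
p_j = lam_j * p_gen + (1 - lam_j) * copy_dist
assert np.isclose(p_j.sum(), 1.0)              # convex mix of distributions
```

Because both components are probability distributions, their convex combination p_j is one as well, so the decoder can sample or argmax over it directly.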

Jointly Learning Topics and Keywords. Our model KGM-TT learns the latent topic model and the keyword generator end to end. First, the objective functions of the two modules are defined separately.
For the neural topic model, the objective is the negative variational lower bound:

L_GSM = KL(q(z | X) || p(z)) − E_{q(z | X)}[log p(X | z)],

where the first term is the Kullback-Leibler divergence loss and the second term reflects the reconstruction loss. p(z) is the prior distribution, and q(z | X) and p(X | z) correspond to the BoW encoder and BoW decoder, respectively.
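For a diagonal Gaussian posterior against a standard normal prior, both terms have simple closed forms; a sketch (the toy inputs are invented, and the reconstruction term assumes a BoW likelihood):

```python
import numpy as np

def ntm_loss(mu, log_sigma, x_bow, p_x):
    """Negative ELBO of the neural topic model:
    closed-form KL( N(mu, sigma^2 I) || N(0, I) ) plus BoW reconstruction."""
    kl = 0.5 * np.sum(np.exp(2 * log_sigma) + mu**2 - 2 * log_sigma - 1.0)
    recon = -np.sum(x_bow * np.log(p_x + 1e-12))   # -E[log p(x|z)] for BoW
    return kl + recon

mu = np.zeros(3); log_sigma = np.zeros(3)          # q(z|X) equals the prior
x_bow = np.array([2.0, 1.0, 0.0])                  # toy word counts
p_x = np.array([0.5, 0.3, 0.2])                    # decoder word probs
loss = ntm_loss(mu, log_sigma, x_bow, p_x)
assert loss > 0                                    # KL is 0 here; recon > 0
```

With mu = 0 and log_sigma = 0 the KL term vanishes, so the loss reduces to the reconstruction cross-entropy alone.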
For the keyword generation model, the cross-entropy loss over the whole training set is minimized:

L_KG = − Σ_{n=1}^{N} log Pr(Y_n | X_n, θ_n), (13)

where N is the number of instances in the training set and θ_n is X_n's latent topic induced by the GSM. Finally, a linear combination of L_GSM and L_KG defines the training objective of the whole framework:

L = L_GSM + γ · L_KG,

where the hyperparameter γ balances the effects of the neural topic model and the generation model. The two models are trained together and their parameters are updated simultaneously. After training, beam search is used to generate the ranked list of keywords.

Experimental Data.
The corpus CNKI is crawled by our own crawler from China's largest scientific and technological publications database (https://www.cnki.net). It has 18,000 papers in total and includes 56,589 words. These documents are papers published from 2000 to 2020, including the title, abstract, keywords, and publication time of each paper. To demonstrate portability and performance on large-scale data, the largest publicly available keyword generation dataset, KP20k, built by Meng et al. [6] in 2017, is also selected for testing and evaluation. KP20k consists of a large number of high-quality scientific publications from different fields of computer science. The details of CNKI and KP20k are shown in Table 1.

Parameter Setting.
The neural topic model is implemented following the design of Zeng et al. [28], and the number of topics K is set to 50. For the hierarchical encoder, the vocabulary V is defined as the 50,000 most frequent words. The embedding dimension d_e is set to 100, the number of hidden nodes to 256, and the damping coefficient λ to 0.5; the word embeddings are randomly initialized, uniformly distributed in [−0.1, 0.1]. Using the Adam optimizer [31], the batch size is 64, the initial learning rate is 10⁻⁴, the gradient clipping is set to 1, and the dropout rate is set to 0.1. The convergence of the neural topic model is much slower than that of the generative model; therefore, before joint training, the neural topic model is trained for 100 iterations and the generative model for 1 iteration. We empirically set γ = 1.0 to balance the losses of the neural topic model and the generation model, iteratively update the parameters in each module, and then update their combination in turn. At test time, the maximum depth of beam search is set to 6 and the beam size to 200.
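Beam search as used at test time can be sketched generically. The `step_probs` interface and the toy distribution below are hypothetical stand-ins; the real decoder would score continuations with the trained model:

```python
import math

def beam_search(step_probs, beam_size, max_depth):
    """Minimal beam search: `step_probs(prefix)` returns {token: prob};
    keep the `beam_size` highest log-probability prefixes at each depth."""
    beams = [([], 0.0)]                        # (token sequence, log-prob)
    for _ in range(max_depth):
        candidates = []
        for seq, score in beams:
            for tok, p in step_probs(seq).items():
                candidates.append((seq + [tok], score + math.log(p)))
        beams = sorted(candidates, key=lambda b: -b[1])[:beam_size]
    return beams

# Toy distribution: "knowledge" is always twice as likely as "graph".
toy = lambda seq: {"knowledge": 2 / 3, "graph": 1 / 3}
beams = beam_search(toy, beam_size=2, max_depth=2)
best_seq, best_score = beams[0]
assert best_seq == ["knowledge", "knowledge"]
```

The paper's setting (depth 6, beam size 200) corresponds to `max_depth=6, beam_size=200`; the surviving beams, ranked by score, form the ranked keyword list.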

Evaluating Indicator.
We use Precision (P) to measure the accuracy of the model, Recall (R) to measure its completeness, and F1-measure to evaluate the overall performance of keyword extraction methods. T_o denotes the keyword set provided by the dataset itself, T_e the keyword set extracted by the model, and T_o ∩ T_e the correctly extracted keywords. Precision, Recall, and F1-measure are defined as follows:

P = |T_o ∩ T_e| / |T_e|, R = |T_o ∩ T_e| / |T_o|, F1 = 2PR / (P + R).
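These three measures can be computed directly from the gold and predicted keyword sets; the sets below are toy examples invented for illustration:

```python
def prf1(T_o, T_e):
    """Precision, recall, and F1 over gold (T_o) and predicted (T_e) keywords."""
    correct = len(T_o & T_e)
    p = correct / len(T_e) if T_e else 0.0
    r = correct / len(T_o) if T_o else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = {"knowledge graph", "entity embedding", "relation embedding"}
pred = {"knowledge graph", "relation embedding", "graph network"}
p, r, f1 = prf1(gold, pred)   # 2 of 3 predictions correct, 2 of 3 gold found
```

Here P = R = F1 = 2/3. Metrics such as F1@5 and F1@10 in Table 2 apply the same computation to the top-5 or top-10 predictions.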

Experimental Results and Analysis.
For predicting keywords present in the source text, two unsupervised models (TF-IDF and TextRank) and a supervised model (KEA) are used as traditional extraction baselines. In addition, the following keyword generation models are considered: a Sequence-to-Sequence (Seq2Seq) model without copy mechanism, and the sequence-to-sequence models with copy mechanism, CopyRNN and CopyCNN. For predicting keywords absent from the source text, since traditional extraction methods cannot generate such keywords, the baselines are CopyRNN and CopyCNN. All baseline models use the same parameter settings as Meng et al. [6]. For keywords present in the source text, the prediction ability of the above models on the CNKI and KP20k datasets is compared; the F1-measure values of the top-5 and top-10 predictions of each model are shown in Table 2.
As the table shows, all generation models outperform the traditional baselines, and the model proposed in this paper has significant advantages on both datasets; for example, on the KP20k dataset, our model improves the F1@10 score by 15.2% over the best generative baseline, CopyCNN. For absent keywords, the recall (R) of the top-10 and top-50 predictions is used as the indicator of how many absent keywords are correctly predicted. The experimental results are shown in Figure 6.
On both datasets, our model consistently outperforms the previous sequence-to-sequence models; for example, on the KP20k dataset, its R@10 score is 10.2% higher than that of CopyCNN. In general, the results show that our model generates keywords better than the baselines and captures the latent semantic information in the context. Since the quality of keywords is debatable, in addition to the objective evaluations above, we also invited five human experts to evaluate the prediction results of the models. Each expert was given 100 papers and judged the results of the three generative keyword models (CopyRNN, CopyCNN, and our KGM-TT) as right or wrong. For fairness, the model names were hidden during the expert evaluation. We conducted this experiment on the Chinese corpus CNKI and selected five keywords for evaluation. The results are shown in Table 3.
According to the expert judgments in Table 3, the F1 scores of CopyRNN and CopyCNN are essentially close, while KGM-TT is ahead by about 5 percentage points. The reason is that KGM-TT uses the attention mechanism to aggregate title information for each word in the document body and employs a topic-aware decoder, which produces topic-sensitive keywords.

Conclusion
The traditional keyword extraction methods cannot obtain words that do not appear in the document, and the existing generative methods ignore the guiding effect of the document title on keywords and do not consider the subject semantic information of the document. We present a keyword generation model based on the combination of topic awareness and title guidance (KGM-TT). Document words are encoded under the guidance of the title semantics, and a topic-aware decoder is used in the decoding process. Therefore, the keywords generated by KGM-TT are more subject-sensitive and of higher quality. The experimental results show that the proposed KGM-TT model makes better use of the strong semantic information of the document's latent topics and title; it outperforms other methods in keyword prediction and generates keywords with high accuracy. However, the keyword generation model produces a large number of synonyms; this is a problem we need to solve in future research.
Data Availability

The data generated and analyzed during this research are available from the corresponding author on request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.