Story Generation Using Knowledge Graph under Psychological States

Story generation, which aims to generate a story that people can easily understand, has attracted increasing attention from researchers in recent years. However, a good story usually requires interesting and emotional plots, and previous works only consider a specific or binary emotion such as positive or negative. In this work, we propose a Knowledge-Aware Generation framework under Controllable CondItions (K-GuCCI). The model assigns a line of psychological state changes to the story characters, which makes the story develop following this setting. Besides, we incorporate a knowledge graph into the model to improve the coherence of the story. Moreover, we propose a metric, AGPS, to evaluate the accuracy of the generated stories' psychological states. Experiments show that the proposed model improves over standard benchmarks while also generating reliable and valid stories.


Introduction
Story generation has been an emerging theme in natural language processing [1][2][3]. Much of the research has examined the coherence, rationality, and diversity of the generated stories. Huang et al. and Song et al. [4, 5] and Luo et al. [6] argued that assigning emotions in text generation can enrich the texts and make them more varied. Figure 1 shows the fine-grained psychological states of an individual as described by psychological theories. Figure 1(a) displays human motivation according to two popular theories: Maslow's "Hierarchy of Needs" [7] on the left and Reiss's "Basic Motives" [8] on the right. Maslow's "Hierarchy of Needs" uses terms such as physiological needs, stability, love/belonging, esteem, and spiritual growth to describe the evolution of human motivation. Reiss offers nineteen fine-grained categories covering a variety of motives, which is richer than Maslow's classification. Plutchik's model [9], also called the "Wheel of Emotions," contains the eight emotions shown in Figure 1(b). Xu et al. [10] propose a model called SoCP, which uses these theories to generate emotional stories; nevertheless, it still lacks richness and coherence.
The knowledge graph, known for enabling better semantic understanding, is a key factor in the success of natural language processing. In story generation, external knowledge can be introduced to increase the richness of the texts. Zhou et al. [11] use large-scale commonsense knowledge in neural conversation generation with a graph attention approach, which can better interpret the semantics of an entity from its neighboring entities and relations.
Accordingly, we propose a model called K-GuCCI, which leverages a knowledge graph to enhance the coherence of story generation and psychological theories to enrich the emotion of stories. Table 1 shows an example of a generated story. By assigning emotional change lines to the characters in the story, our proposed model can generate stories with multiple fine-grained psychological states for multiple characters under controllable conditions. We design a Character Psychological State Controller (CPSC): at each decoder time step, it selects from the story's characters the one to be described at the current step, together with the corresponding manually assigned psychological state; the selected character's psychological state is then controlled and determined. To generate coherent stories more easily, we introduce external knowledge that facilitates language understanding and generation. ConceptNet is a commonsense semantic network consisting of triples of head, relation, and tail, which can be represented as g = (h, r, t). The head and tail are connected by their relation, and we apply this property to build a bridge between the story context and the next story sentence. Inspired by graph attention [11], we design a knowledge-enhanced method that treats the knowledge triples as a graph, from which we can better interpret the semantics of an entity from its neighboring entities and relations. Because of the particularity of our model, we also investigate an evaluation metric for the accuracy of psychological state control.
Our contributions are as follows: (i) We adopt three psychological theories as controllable conditions to describe the characters in the stories. (ii) To enhance the semantics and richness of the stories, we introduce an external knowledge graph into the generation model. (iii) We propose the model K-GuCCI, which utilizes external knowledge to enhance the coherence of story generation while ensuring the controllability of conditions, and we design a character psychological state controller that achieves fine-grained psychological state control of the characters in the story. (iv) We explore a novel evaluation metric for the accuracy of psychological state control. (v) The experimental results demonstrate superior performance on various evaluation metrics, and the model can generate more vivid and coherent stories with fine-grained psychological states for multiple characters. We also verify the effectiveness of the designed modules.

Related Work

Text Generation with External Knowledge

Introducing external knowledge into natural language tasks has been a trend in recent years. External knowledge can enhance semantic information in many tasks, and it is particularly important in story generation. Chen et al. [12] utilize external knowledge to enhance neural data-to-text models, which attend to relevant external knowledge to improve text generation. Wang et al. [13] introduce the knowledge base question answering (KBQA) task into dialogue generation, which facilitates utterance understanding and factual knowledge selection. Zhou et al. [11] make the first attempt to use large-scale commonsense knowledge in conversation generation; they design a graph attention mechanism in the encoder and decoder, which augments the semantic information and facilitates better generation. Guan et al. [14] focus on generating coherent and reasonable story endings by using an incremental encoding scheme.
All of the above works show the effectiveness of introducing external knowledge. In our work, the proposed model K-GuCCI mainly focuses on the characters' psychological states.

The authors of [17] use a sequence-to-sequence structure with topic information to produce interesting chatbot responses with rich information. Ghosh et al. [18] generate conversational text conditioned on affect categories, customizing the degree of emotional content in the generated sentences through an additional design parameter. There are also other generation tasks with emotion or sentiment [4, 19, 20], but they only use a specific or binary emotion such as positive or negative. Unlike the above works, we aim to generate a story in which different characters' psychological states change, covering multiple motivations and emotions. We use the dataset of [21], which consists of five-sentence stories; the characters' motivations and emotions change as the story plot develops. Paul and Frank [22] also use this dataset, for a sentiment classification task based on the psychological states.
Methodology

Figure 2 shows an overview of the K-GuCCI model. The proposed model can generate vivid and coherent stories under controllable conditions, where the multiple fine-grained psychological states of the characters serve as the controllable conditions. We perform story generation using a Seq2Seq structure [23] with external knowledge incorporated through a graph attention method, where a BiLSTM [24] and an LSTM [25] are used as the encoder and decoder, respectively. We design a Character Psychological State Controller (CPSC) module to control each character's fine-grained psychological state.
Problem Formulation. Formally, the input is a text sequence X = (x_1^s, ..., x_n^s), the beginning of the story, which consists of n words. We also take the story context C = (x_1^c, ..., x_m^c), consisting of m words, as input to increase the coherence of the story; it represents the story sentences preceding the input sentence X. We represent the external knowledge as K = ([h_1, r_1, t_1], ..., [h_l, r_l, t_l]), where [h_l, r_l, t_l] is a triple consisting of head, relation, and tail. Meanwhile, we quantify a psychological state score of each character for the three theories Plutchik, Maslow, and Reiss: S_pmr = ((S_p^1, S_m^1, S_r^1), ..., (S_p^j, S_m^j, S_r^j)), where j is the number of characters in the story. The output target is another text sequence Y = (y_1, ..., y_k) that consists of k words. The task is then formulated as calculating the conditional probability P(Y | [X, C, PMR, K]), where PMR represents the psychological state.
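For concreteness, one training example for this formulation might be assembled as follows. This is a hedged sketch: the field names and the ConceptNet triples are illustrative, not the authors' actual data format.

```python
# Hypothetical sketch of one training example for the K-GuCCI task.
# All field names and values are illustrative.
example = {
    "X": ["jane", "bought", "a", "new", "necklace"],           # story beginning (n words)
    "C": ["<bos>"],                                            # story context so far (m words)
    "K": [("necklace", "RelatedTo", "jewelry"),                # (head, relation, tail) triples
          ("necklace", "AtLocation", "store")],
    "PMR": {"jane": {"plutchik": {"joy": 1.0},                 # per-character state scores
                     "maslow": "esteem",
                     "reiss": "approval"}},
    "Y": ["she", "was", "very", "excited"],                    # target next sentence (k words)
}
```

The model then learns P(Y | [X, C, PMR, K]) from many such examples.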

Character Psychological State Controller

The Character Psychological State Controller is used to control which characters' psychological states are used to describe the story, and to what extent. Because a character's overall state is composed of multiple psychological states, we quantify them so that they can be accepted by the model.

Psychological State Representation. We quantify a psychological state as a PMR matrix that describes the fine-grained psychological states of the characters in the story. As shown in Figure 3, we display only the Plutchik scores of each character for the third sentence of the story, where a score of 0 denotes the absence of the corresponding emotion; the higher the score, the stronger the current emotion. We normalize these scores and build a vector for each emotion or motivation. We set a maximum number of characters, n, to handle stories with different numbers of characters, and concatenate the per-character scores into multi-character score matrices: the Plutchik score matrix S_p, the Maslow score matrix S_m, and the Reiss score matrix S_r. Word vector matrices for the three theories are randomly initialized as V_p, V_m, and V_r, respectively. Figure 3 shows the Plutchik score matrix S_p and the word vector matrix V_p. When a story has fewer than the maximum number of characters, we pad the score matrix. Each row of S_p represents a character and each column a score for one emotion; each row of V_p is the representation of one emotion. The word vector matrix is multiplied by the character score matrix, and the product is mapped into a low-dimensional space, yielding the Plutchik, Maslow, and Reiss matrices. The formulation is as follows:

P_p^i = (S_p^i V_p) W_p + b_p,
P_m^i = (S_m^i V_m) W_m + b_m,
P_r^i = (S_r^i V_r) W_r + b_r,

where W_p, W_m, and W_r are the weight matrices, b_p, b_m, and b_r are the biases, and i indexes the i-th character. The Plutchik, Maslow, and Reiss matrices are then concatenated into the characters' PMR matrix for convenience of calculation:

PMR^i = [P_p^i ; P_m^i ; P_r^i].

Controllable Psychological State. We control the psychological states of multiple characters by first selecting, at each decoder time step, the character to be described, and then controlling the selected character's psychological state with an attention method. At each decoder step t, a feed-forward layer computes a character gate vector g_t^char; a softmax activation yields a probability distribution over the characters in the story, and a one-hot mechanism picks the character with maximum probability, o_t^char. We multiply PMR^i by o_t^char to obtain the selected character's psychological state:

g_t^char = softmax(W_g [e(y_{t-1}); h_{t-1}; c_t]),
o_t^char = one_hot(argmax(g_t^char)),

where W_g is the weight matrix, y_{t-1} is the previous word, h_{t-1} is the decoder hidden state, and c_t is the context vector. After that, we calculate a psychological state vector c_t^PMR at step t, computed by attending over the c characters' PMR matrices with weight matrices W_a and U_a; c_t^PMR is taken as the final condition to control generation.
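The CPSC pipeline above can be sketched roughly as follows. This is a minimal PyTorch illustration with made-up sizes; only the Plutchik branch is shown, whereas the full model builds the Maslow and Reiss matrices the same way and concatenates all three.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
n_chars, n_emotions, emb_dim, low_dim, hid = 3, 8, 16, 8, 32  # illustrative sizes

# --- Psychological state representation: the score matrix weights the
# emotion embeddings, and the product is projected to a low dimension. ---
S_p = torch.zeros(n_chars, n_emotions)       # rows padded with zeros for absent characters
S_p[0, 2] = 1.0                              # e.g. character 0 has a high "joy" score
V_p = torch.randn(n_emotions, emb_dim)       # randomly initialized emotion embeddings
proj = nn.Linear(emb_dim, low_dim)           # plays the role of W_p, b_p
P_p = proj(S_p @ V_p)                        # (n_chars, low_dim) Plutchik matrix
PMR = P_p                                    # full model concatenates Maslow/Reiss too

# --- Character gate: a feed-forward layer scores the characters from
# [e(y_{t-1}); h_{t-1}; c_t]; a one-hot over the argmax picks one. ---
W_g = nn.Linear(3 * hid, n_chars)
e_y_prev, h_prev, c_t = torch.zeros(hid), torch.zeros(hid), torch.zeros(hid)
g_char = F.softmax(W_g(torch.cat([e_y_prev, h_prev, c_t])), dim=-1)
o_char = F.one_hot(g_char.argmax(), n_chars).float()
state_t = o_char @ PMR                       # selected character's psychological state
assert state_t.shape == (low_dim,)
```

The hard one-hot selection means exactly one character's state row of PMR is passed on to the decoder at each step.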

Knowledge-Enhanced Generation Model
Knowledge Encoding. To represent words more meaningfully and make the story more coherent, we use knowledge-aware representation and context-based attention to enhance the semantic expression in the encoder. We first calculate a knowledge graph vector c_kg, which attends to the triples of the words in the knowledge graph, and then a context vector c_con, which attends to the context information; both are fed into the encoder together with the sentence. We obtain the knowledge graph vector c_kg by using graph attention [11]: the words in the sentences have their own knowledge representations through triples, so each word can be enriched by its adjacent nodes and their relations. For the context vector c_con, we use the attention method of [26], which reflects the relation between the input sentence and its previous context.

Table 1: An example story. "Gina wanted a unicorn folder like her friend Tami. She had never seen anything like it. She had already been in trouble for talking while the teacher was. So she decided to wait till the teacher was done teaching. Once the teacher finished she asked Tami about the folder."
where h^(i) is the hidden state of the i-th sentence of the story, and c_con^(i) is the context attention vector of the i-th sentence. c_kg^(i), the knowledge graph vector of the i-th sentence, is formulated in terms of g(x)^(i-1), the graph attention vector of [11]. The whole story generation process is conditioned throughout on the knowledge graph vector and the context vector, which are what keep the story coherent.
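A rough sketch of the graph-attention idea follows, assuming illustrative random embeddings. It simplifies the scoring function of [11] and is not the authors' exact formulation: each triple is scored against its relation, and the attention-weighted [head; tail] pairs form the knowledge vector.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
emb_dim, n_triples = 16, 4  # illustrative sizes

heads = torch.randn(n_triples, emb_dim)   # embeddings of triple heads
rels  = torch.randn(n_triples, emb_dim)   # embeddings of relations
tails = torch.randn(n_triples, emb_dim)   # embeddings of triple tails

# Score each triple by how well its entities match the relation, then
# combine the [head; tail] pairs with the attention weights.
scores = (heads * rels).sum(dim=-1) + (tails * rels).sum(dim=-1)
alpha = F.softmax(scores, dim=-1)                                  # attention over triples
g_x = (alpha.unsqueeze(-1) * torch.cat([heads, tails], dim=-1)).sum(dim=0)
assert g_x.shape == (2 * emb_dim,)
```

In this way a word's representation is enriched by its neighboring entities and relations rather than by the word embedding alone.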

Incorporating the Knowledge. We concatenate the previous time step's word embedding e(y_{t-1}), the PMR context c_t^PMR, the knowledge graph vector c_t^kg, and the attention context c_t, thereby incorporating the external knowledge and the psychological state into the generation model. The LSTM hidden state is updated as

h_t = LSTM(h_{t-1}, [e(y_{t-1}); c_t^PMR; c_t^kg; c_t]),

and we minimize the negative log-likelihood objective to generate the expected sentences:

L = - Σ_{i=1}^{N} Σ_{t=1}^{T} log P(y_t^(i) | y_{<t}^(i), X^(i), C^(i), PMR^(i), K^(i)),
where N is the number of stories in the dataset and T is the number of decoder time steps for the i-th generated sentence. X^(i) represents the i-th input sentence in the dataset, and C^(i), PMR^(i), and K^(i) represent the i-th context, PMR matrix, and knowledge triples, respectively.
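The decoder update and the NLL objective can be sketched as follows. This is a hedged PyTorch illustration with arbitrary sizes, not the authors' implementation; it runs a single decoder step and computes the loss for one gold token.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
emb, pmr, kg, ctx, hid, vocab = 16, 8, 8, 16, 32, 100  # illustrative sizes

# The LSTM input is the concatenation of the previous word embedding,
# the PMR context, the knowledge graph vector, and the attention context.
cell = nn.LSTMCell(emb + pmr + kg + ctx, hid)
out = nn.Linear(hid, vocab)  # projects the hidden state to vocabulary logits

e_y_prev = torch.zeros(1, emb)
c_pmr, c_kg, c_t = torch.zeros(1, pmr), torch.zeros(1, kg), torch.zeros(1, ctx)
h, c = torch.zeros(1, hid), torch.zeros(1, hid)

h, c = cell(torch.cat([e_y_prev, c_pmr, c_kg, c_t], dim=-1), (h, c))
logits = out(h)

# Negative log-likelihood of the gold next word y_t (cross-entropy).
y_t = torch.tensor([7])
loss = nn.functional.cross_entropy(logits, y_t)
assert loss.item() > 0
```

Summing this per-token loss over all time steps and all stories gives the training objective above.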

Experiments
Dataset. The dataset of [21] is chosen as our story corpus, consisting of about 4k five-sentence stories. Each sentence in the corpus is annotated with its characters and with the three psychological theories. Figure 4 displays the statistics of the psychological states. Plutchik emotions appear more frequently than Maslow and Reiss motivations; in particular, "joy" and "anticipation" are the most frequent Plutchik states. The Reiss categories are subcategories of the Maslow categories. We use different methods to process the annotations for Plutchik, Maslow, and Reiss. The original data were annotated by three workers employed by the original authors; since the workers naturally have different viewpoints, we sum up the Plutchik scores and normalize them. Maslow and Reiss have no repeated labels, so we use one-hot vectors to represent them. We split the data into 80% for training and 20% for testing. In the test phase, we input the story's first sentence and the normalized psychological state scores. Table 2 reports the number of characters per story sentence: most sentences have 1-3 characters, and the largest number is 6. We therefore set the maximum character number to 3.

Figure 4: Statistics of the psychological states. "Anticipation" is the most frequent state in Plutchik's wheel of emotions, while "belonging" is the least frequent state in the Reiss classification.

Baselines

Inc-S2S is an incremental Seq2Seq model mentioned in [3]. Different from the implementation in [3], we incorporate the psychological states into the model; the story sentences are generated according to the beginning sentence of the story and the context. Comparison with the Inc-S2S model demonstrates the effectiveness of the Character Psychological State Controller.
Transformer [27] is a novel architecture that aims at solving natural language processing tasks while handling long-range dependencies with ease. Since the Transformer facilitates more parallelization during training, it has enabled pretrained models such as BERT [28], GPT-2 [29], and Transformer-XL [30], which are trained on huge general-language datasets.
GPT-2 [29] shows an impressive ability to write coherent and passionate essays. Its architecture is a decoder-only Transformer trained on a massive dataset, and many natural language generation works use GPT-2-based models.
SoCP [10] can generate a story according to the characters' psychological states and is the model most closely related to ours. Different from it, our model introduces a knowledge graph to enhance semantic information and promote the coherence of the story.

Experimental Settings. As described above, we fix the number of characters to three; if a story has fewer than three characters, we use "none" as a placeholder character. Pretrained 300-dimensional GloVe vectors are used as our word embeddings. We map the PMR matrix from its high dimension down to 256 dimensions. We implement the encoder as a two-layer bidirectional LSTM and the decoder as a one-layer LSTM, both with a hidden size of 256. The batch size is 8, and the dropout rate [31] is 0.2. The learning rate of the Adam optimizer [32] is initialized to 0.0003.

Evaluation Metrics. BLEU [33] is a metric that quantifies the quality of generated text by comparing a candidate sentence against one or more reference sentences. Although designed for machine translation, it is commonly used across a suite of natural language processing tasks.
ROUGE, which stands for Recall-Oriented Understudy for Gisting Evaluation, is a set of metrics for evaluating automatic text summarization and machine translation. The metrics compare the similarity between generated sentences and reference sentences.
METEOR is based on the harmonic mean of unigram precision and recall, with recall weighted higher than precision. It corrects some of the issues of the more common BLEU metric and produces a strong correlation with human judgment at the sentence or segment level.
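For illustration, a simplified unigram-level BLEU (BLEU-1 with count clipping and a brevity penalty) can be computed as follows. This is a didactic sketch, not the full n-gram BLEU of [33]:

```python
from collections import Counter
import math

def bleu1(candidate, references):
    """Unigram precision with clipping, times a brevity penalty —
    a simplified illustration of how BLEU scores a candidate
    sentence against reference sentences."""
    cand_counts = Counter(candidate)
    max_ref = Counter()
    for ref in references:
        for tok, cnt in Counter(ref).items():
            max_ref[tok] = max(max_ref[tok], cnt)
    clipped = sum(min(cnt, max_ref[tok]) for tok, cnt in cand_counts.items())
    precision = clipped / max(len(candidate), 1)
    # Brevity penalty against the closest reference length.
    ref_len = min((len(r) for r in references), key=lambda L: abs(L - len(candidate)))
    bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / max(len(candidate), 1))
    return bp * precision

ref = ["she", "was", "very", "excited", "to", "get", "it"]
cand = ["she", "was", "excited", "to", "get", "it"]
print(round(bleu1(cand, [ref]), 3))  # → 0.846 (perfect precision, short-candidate penalty)
```

The real BLEU additionally takes a geometric mean over 1- to 4-gram precisions, which rewards fluent longer matches.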
AGPS (Accuracy of Generated Psychological State) uses a pretrained classifier to evaluate the accuracy of the generated psychological states. There are many approaches to training such a classifier [34][35][36]; we pretrain a bidirectional LSTM to classify the generated sentences, analogous to sentiment classification, which measures our model's capacity to convey emotions. The character name Char and the sentence X are concatenated as input. In this fashion, several training pairs with different outputs for similar sentences can be obtained when different characters in a sentence have different psychological states. The BiLSTM produces a compact vector h_clf, and two feed-forward layers then map it to the output size.
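A minimal sketch of such a classifier follows, assuming illustrative vocabulary and label sizes. The mean pooling used to obtain h_clf is an assumption on our part; the paper does not specify how the compact vector is formed.

```python
import torch
import torch.nn as nn

class PsychStateClassifier(nn.Module):
    """Hedged sketch of the AGPS classifier: a BiLSTM encodes the
    concatenated [character name; sentence] tokens, and two
    feed-forward layers map the pooled vector to state labels."""
    def __init__(self, vocab=1000, emb=32, hid=32, n_labels=8):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.bilstm = nn.LSTM(emb, hid, bidirectional=True, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(2 * hid, hid), nn.ReLU(),
                                nn.Linear(hid, n_labels))

    def forward(self, tokens):                # tokens: (batch, seq_len)
        out, _ = self.bilstm(self.embed(tokens))
        h_clf = out.mean(dim=1)               # compact sentence vector (assumed pooling)
        return self.ff(h_clf)                 # psychological-state logits

clf = PsychStateClassifier()
logits = clf(torch.randint(0, 1000, (2, 10)))  # e.g. "jane" + sentence token ids
assert logits.shape == (2, 8)
```

AGPS is then the fraction of generated sentences whose predicted state matches the assigned condition.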
Result Analysis. We take the Seq2Seq framework, Inc-S2S, and the Transformer as baseline models and compare our model against them using automatic evaluation metrics, including the proposed AGPS. The experiments validate our components and demonstrate that the psychological states of the generated sentences are consistent with the assigned settings. As seen in Table 3, our model achieves the highest scores on all metrics, which shows the effect of our modules, and the generated sentences are coherent. We also see that Seq2Seq obtains better BLEU, ROUGE, and METEOR results than the Transformer framework.

Table 3: Automatic evaluations of the proposed model and the baseline models. Context-merge and context-independent represent the two encoder variants mentioned in [10]: the former encodes the context and the sentence together, while the latter encodes them separately and then concatenates them.

The reason may be that the Seq2Seq model is more appropriate for short texts than the Transformer. In addition, our method outperforms SoCP with both the context-merge and the context-independent encoders, which also reflects the effectiveness of external knowledge in enriching the generated stories. Our proposed model performs best overall. The training speed of the Transformer, however, is much higher than that of Seq2Seq, reflecting the Transformer's advantage in training speed due to its parallelism.
We design AGPS to assess whether the emotions of the generated sentences are consistent with the settings; intuitively, we assume that the score without input emotion will be lower. The performance of Inc-S2S lies between our model and the other models, which shows that our built components perform efficiently.
The result of the K-GuCCI model is better than that of the SoCP model, which shows that introducing knowledge can enrich the story.

Model Effect Analysis
The Effect of the Character Psychological State Controller. We display the attention weight distribution to demonstrate the relation between the generated sentences and the psychological states. As seen in Figure 5, the model provides interpretability through the Character Psychological State Controller. The brighter the square linking two words when generating the next word, the stronger the relationship between them. Visualization of the attention maps offers evidence of the model's ability to recognize which psychological state corresponds to which character. A word may have several differently colored squares, suggesting that our component can automatically read several characters' psychological states and select the appropriate state for each character. A black square indicates that no psychological state is attended to, because not all words convey feeling, such as "a" and "the". In the displayed examples, the model correctly attends to the relevant elements of the psychological states. The first example focuses on Plutchik emotions such as "fear" and "anger," while the second involves Maslow and Reiss elements such as "spiritual growth" and "indep." In the third attention map, the term "hospital" is correlated with the Plutchik emotions "fear," "surprise," and "sadness," implying that "hospital" is typically associated with a character's negative emotions. In the fourth attention map, the word "however" signals a vital turning point and the negative outcome that the character will fail the exam, which is compatible with the "depression" and "anger" emotions.

The Effect of the External Knowledge. As seen in Table 4, the evaluation metrics show that the performance of our model is better than that of the other models. In addition, Table 4 demonstrates the effect of the external knowledge in K-GuCCI.
For example, "necklace" is linked to "like it," and "losing his mind" is linked to "go to the hospital," which illustrates that the generated stories are coherent and reasonable. Meanwhile, the conditions we set can control the stories while their coherence is assured: as we assign the Plutchik emotion lines, the emotions of the stories shift accordingly, and a coherent story can be constructed under the psychological state condition we give.
We see that the Seq2Seq model typically generates repetitive sentences and cannot handle all the characters. In Table 4, example 1 produced by the baseline model describes only one character, "Jane," whereas K-GuCCI can also generate "friends." In example 2, we see that under our defined psychological state condition, the baseline model cannot vary the story and even conveys incorrect feelings, while our K-GuCCI model matches the condition correctly. The GPT-2 model is capable of generating rational phrases but produces several repetitions. Overall, by manipulating the feelings of the characters, our proposed model can generate good stories. Some generated stories are still not coherent, so this remains a challenge for us.

Figure 5: Visualization of our Character Psychological State Controller. Each row is a fine-grained emotion or motivation of the three psychological states, and each column is a word of the generated sentence. The brighter a grid cell, the more attention there is between its row and column.

Controllability

The examples in Table 5 show the controllability of the generated stories under different psychological state conditions. The first example in Table 5 compares the stories generated under various condition scores for an identical Plutchik element; specifically, we set the Plutchik "joy" element with different scores. When the score is set to 1, obvious terms such as "great," "excited," or "really liked" are produced. As the "joy" score decreases, the produced terms become more and more negative; when the score is set to 0, negative terms such as "nervous" or "not good" appear. The second example shows the stories produced with various Plutchik indicators: we assign "surprise," "fear," and "anger" in turn. The model produces words such as "was surprised" or "shocked" when Plutchik is "surprise," "was afraid of" or "scared" when Plutchik is "fear," and "angry" when Plutchik is "anger." In the third case, separate scores are assigned to multiple Plutchik elements, and several emotions are portrayed in the produced stories.
Table 4: Example stories generated by Inc-S2S, Transformer, GPT-2, SoCP, and the proposed K-GuCCI.

The above examples demonstrate the controllability of our model. On the other hand, the incoherent stories among the examples show that although the model performs well in emotion control, its coherence still needs to be improved.

Conclusion
Traditional story generation models can only generate stories with one specific emotion and lack coherence. In this paper, we propose a model called K-GuCCI, which can generate more vivid and coherent stories under controllable conditions. We take three psychological state theories as controllable conditions and design a character psychological state controller, which controls the psychological states of multiple characters in a story. We introduce an external knowledge graph to enhance the semantics and richness of the stories. In addition, we design an evaluation metric called AGPS to evaluate the accuracy of the generated psychological states. For future work, we will use advanced pretrained models to generate more coherent texts. In the field of wireless communications and mobile computing, there are many applications of recommender systems, such as [37][38][39], and of Internet technology, such as [40][41][42]. We hope to use our method to recognize users' emotions, generate high-quality text, and serve more Internet applications.

Data Availability
The data that support the findings of this study are available at https://uwnlp.github.io/storycommonsense/.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

References
[1] A. Fan, M. Lewis, and Y. Dauphin, "Hierarchical neural story generation," in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018.