A Modular System Oriented to the Design of Versatile Knowledge Bases for Chatbots

The paper illustrates a system that implements a framework oriented to the development of a modular knowledge base for a conversational agent. This solution improves the flexibility of intelligent conversational agents in managing conversations. The modularity of the system enables a concurrent and synergic use of different knowledge representation techniques, making it possible to use the most adequate methodology for managing a conversation in a specific domain, taking into account particular features of the dialogue or of the user behavior. We illustrate the implementation of a proof-of-concept prototype: a set of modules exploiting different knowledge representation methodologies and capable of managing different conversation features has been developed. Each module is automatically triggered through a component, named corpus callosum, that selects in real time the most adequate chatbot knowledge module to activate.


Introduction
Research on intelligent systems in recent years has been characterized by the growth of the Artificial General Intelligence (AGI) paradigm [1]. This paradigm focuses attention more on learning processes than on the formalization of the domain. According to this view, an intelligent system should not only solve a specific problem: it should be designed as a hybrid architecture that integrates different approaches in order to emulate features of human intelligence, like flexibility and generalization capability.
At the same time, particular interest has been specifically addressed in the development of functional and accessible interfaces between intelligent systems and their users, in order to obtain a satisfactory man-machine interaction.
In this context, an ambitious goal is the creation of intelligent systems with conversational skills. The implementation of a conversational agent, however, is a complex task since it involves language understanding and dialogue management [2,3].
The simplest way to implement a conversational agent is to use chatbots [4], which are dialogue systems based on a pattern matching mechanism between user queries and a set of rules defined in their knowledge base.
In this work we show the evolution of a previously developed model of conversational agent [5,6]. The cognitive architecture evolved into a modular knowledge representation framework for the realization of smart and versatile conversational agents. A particular module, named "corpus callosum," is dedicated to dynamically switching among the different modules. Moreover, it manages their mutual interaction, in order to activate different cognitive skills of the chatbot. This solution provides intelligent conversational agents with a dynamic and flexible behavior that better fits the context of the dialogue [7].
We have modified the ALICE (Artificial Linguistic Internet Computer Entity) [4] core and realized the implementation of a proof-of-concept chatbot prototype, which uses a set of modules exploiting different knowledge representation techniques.

ISRN Artificial Intelligence
Each module provides specific capabilities: to induce the conversation topic, to analyze the semantics of user requests, and to make semantic associations between dialogue topics.
The dynamic activation of the most adequate modules, capable of managing specific aspects of the conversation with the user, gives the conversational agent a more natural interaction.
The proposed solution tries to overcome the main limits of pattern-matching-based chatbot architectures: a rigid knowledge base, which is time-consuming to establish and maintain, and a limited dialogue engine, which does not take into account the semantic content, the context, and the evolution of the dialogue.
The modularity of the architecture makes it possible to use in a concurrent and synergic way specific methodologies and techniques, choosing and using the most adequate methodology for a specific characteristic of the domain (e.g., an ontology to represent deterministic information, Bayesian Networks to represent uncertainty, and semantic spaces to encode subsymbolic relationships between concepts).
The remainder of the paper is organized as follows: Section 2 gives an overview of related work; Section 3 illustrates the modular architecture; in Section 4 a case study is described; in Section 5 dialogue examples are reported; finally, Section 6 contains the conclusions.

Related Work
Several systems oriented to the artificial general intelligence approach have been illustrated in the literature. They combine different methodologies of representation, reasoning, and learning.
As an example, ACT-R [8] is a hybrid cognitive system which combines rule-based modules with subsymbolic units, represented by parallel processes that control many of the symbolic processes. CLARION [9] uses a dual representation of knowledge, consisting of a symbolic component to manage explicit knowledge and a low-level component to manage tacit knowledge. CLARION consists of several subsystems, each based on this dual representation: an action-centered subsystem to control actions, a non-action-centered subsystem to maintain general knowledge, a motivational subsystem providing the underlying motivations for perception, cognition, and action, and a metacognitive subsystem to monitor and manage the operations of the other subsystems. The most significant example of hybrid architecture is the OpenCog framework [10]. It is based on a probabilistic reasoning engine and an evolutionary learning engine called MOSES. These mechanisms are integrated with a representation of knowledge in both declarative and procedural form. Procedural knowledge is represented by using a functional programming language called Combo. Declarative knowledge is represented in a hypergraph labeled with different types of weights: probabilistic weights representing values of semantic uncertainty, and Hebbian weights acting as attractors of neural networks, allowing the system to make inferences about concepts which are simultaneously activated.
In the field of conversational agents, two different knowledge representation approaches are generally used by intelligent systems to extract and manage semantics in natural language: symbolic and subsymbolic.
Symbolic paradigms provide a rigorous description of the world in which the conversational agent works, exploiting ad hoc rules and grammars to make agents able to understand and generate natural language sentences. These paradigms are limited by the difficulty of defining rules and grammars that must cover all the different ways in which people express themselves. Subsymbolic approaches analyze text documents and chunks of conversations to infer statistical and probabilistic rules that model the language.
In recent years we have worked on the construction of a system that implements a hybrid cognitive architecture for conversational agents, going in the direction of the AGI paradigm. The cognitive architecture integrates both symbolic and subsymbolic approaches for knowledge representation and reasoning.
The symbolic approach is used to define the agent's background knowledge and to make it capable of reasoning about a specific domain, both in terms of deterministic and uncertain reasoning [11].
The subsymbolic approach, based on the creation of data-driven semantic spaces, makes the conversational agent capable of inferring data-driven knowledge through machine learning techniques. This choice improves the agent's competences in an unsupervised manner and allows it to perform associative reasoning about conversation concepts [5].

A Modular Architecture for Adaptive ChatBots
The illustrated system implements a framework oriented to the design and implementation of conversational agents characterized by a dynamic behavior, that is, capable of adapting their interaction with the user according to the current context of the conversation. By context we mean a set of conditions characterizing the interaction with the user, like the topic and the goal of the conversation, the profile of the user, and her speech act. The proposed work has been realized according to the AGI paradigm: a modular, easily manageable, and upgradable architecture, which integrates different knowledge representation and reasoning capabilities.
In the specific case illustrated in this paper, the architecture integrates symbolic and subsymbolic reasoning capabilities. The system architecture is shown in Figure 1, and it consists of different components. The proposed architecture is quite general and can be particularized by defining specific implementations for each component: for example, it is possible to use different models of knowledge representation, to change the planning module, and to consider different kinds of context variables. Moreover, the proposed architecture is characterized by an adaptive behavior: the dynamic activation of the modules makes it possible to obtain a chatbot behavior capable of adapting itself to the current context. This behavior depends on the modules' specific functionalities and on the corpus callosum planner.

Dialogue Engine.
The dialogue engine improves the standard ALICE [4] dialogue mechanism. The standard ALICE dialogue engine is based on a pattern-matching approach that compares, from time to time, the sentences written by the user with a set of elements named "categories," defined through the AIML (Artificial Intelligence Markup Language) language. This language defines the rules for the lexical management and understanding of sentences [4]. Each category is composed of a pattern, which is compared with the user query, and a template, which constitutes the answer of the chatbot. The main drawbacks of this approach are (a) the time-expensive design process of the AIML Knowledge Base (KB), because it is necessary to consider all the possible user requests, and (b) the dialogue management mechanism, which is too rigid.
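The pattern/template matching described above can be sketched in a few lines. The following is a minimal, hypothetical simplification of an AIML-style category table (the patterns, templates, and wildcard handling are illustrative assumptions, not the actual ALICE implementation):

```python
import re

# Each category pairs a pattern (with a "*" wildcard) with a template answer.
# Categories are scanned in order; "*" alone acts as the fallback category.
CATEGORIES = [
    ("HELLO *", "Hello! How can I help you?"),
    ("WHERE IS *", "You can find {0} on the second floor."),
    ("*", "I am not sure I understood."),
]

def match(query):
    """Return the template of the first category whose pattern matches the query."""
    q = query.upper().strip()
    for pattern, template in CATEGORIES:
        # Turn the AIML-like pattern into a regular expression: "*" -> "(.*)".
        regex = "^" + re.escape(pattern).replace(r"\*", "(.*)") + "$"
        m = re.match(regex, q)
        if m:
            # Wildcard captures can be substituted into the template.
            return template.format(*m.groups())
    return None

print(match("Where is the lab"))
```

The example also makes drawback (a) evident: every expected request must be anticipated by a category, which is exactly the design cost the modular architecture tries to reduce.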
In previous works [5,6] several approaches have been proposed in order to overcome these disadvantages. The traditional KB has been extended through the use of ad hoc tags capable of querying external knowledge repositories, like ontologies or semantic spaces, enhancing as a consequence the inferential capabilities of the chatbot. In fact, incomplete generic patterns can be defined and then completed, through a search in an ontology for concepts related to a given topic of conversation, in order to dynamically build appropriate answers. Furthermore, the KB has been modeled, in an unsupervised manner, in a semantic space, starting from a statistical analysis of documents and dialogue chunks.
The contribution of the new architecture presented in this paper is to enhance the knowledge management capabilities of the chatbot from both the declarative and the procedural point of view. The goal is reached by splitting the traditional monolithic knowledge base of the chatbot into different components, named modules, each suited to deal with particular characteristics of the dialogue. Besides, a coordination mechanism has been provided in order to select and trigger, from time to time and for efficiency reasons, only the most adequate modules to manage the conversation with the user.

Modules.
Each module of the dialogue engine has its own specific features, that make it different from the other modules. For example, the differentiation can be done on functionality, on topics, on mood recognition or emulation, on specific goals to reach, on management of specific user profiles, or on a particular combination of them.
The trivial case is to organize specific modules for given topics: each module is suited to deal with a particular subject, and from time to time the corpus callosum evaluates which are the best modules to deal with the current state of the conversation.
Even if AIML has the topic tag, the proposed approach has the advantage of separating the KB of the chatbot at the module level instead of the AIML level and, most importantly, the recognition of the topic can be realized through a semantic process instead of a lexical, pattern-matching-guided approach. We have defined a module as an ALICE extension, obtained through the insertion of specific plugins in the ALICE architecture, importing the necessary libraries for the module execution. Each module is characterized by the definition of specific AIML tags and processors to query external repositories like ontologies, linguistic dictionaries, semantic spaces, and so on. Each module (see Figure 2) is composed of (a) a metadescription (metadata that semantically characterize the module), (b) a knowledge base, composed of the standard ALICE AIML categories, which can be extended with other repositories, like ontologies or semantic spaces, and (c) an inferential engine capable of retrieving information, selecting the chatbot answers, or performing actions.
The modular knowledge base is easier to define, design, and maintain. Each module has its own inference engine, whose complexity is variable and defined by the module designer. The framework is general purpose: any new module designed according to the rules of the architecture can be connected or disconnected without affecting the source code, the core of the chatbot, or the behavior of any other module. It is possible to create a module at any time in order to manage specific cognitive activities that mix "memory-oriented" elements (modules specifically suited to manage a specific topic of the conversation) with elements oriented to specific reasoning capabilities (modules oriented to semantic retrieval of information, to the lexical analysis of the dialogue, to inferring concepts from an ontology-based knowledge representation, to the evaluation of decisional processes, and so on). The new tags defined in new modules can be used to reach higher complexity levels and to add new reasoning capabilities, with the aim of enhancing the interaction characteristics of the conversational agent.
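The three-part module structure (metadescription, knowledge base, inference engine) can be sketched as a plain data structure. All names and fields below are illustrative assumptions, not the framework's actual API:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

@dataclass
class Module:
    """Sketch of a knowledge module: (a) metadescription, (b) KB, (c) engine."""
    name: str
    metadescription: Dict[str, str]      # metadata that semantically characterize the module
    categories: List[Tuple[str, str]]    # AIML-like (pattern, template) pairs
    engine: Callable[[str, List[Tuple[str, str]]], Optional[str]]
    active: bool = False                 # toggled by the corpus callosum

def simple_engine(query, categories):
    # Trivial inference engine: return the template of the first
    # pattern contained in the query, or None if nothing matches.
    q = query.lower()
    for pattern, template in categories:
        if pattern.lower() in q:
            return template
    return None

admin = Module(
    name="administration",
    metadescription={"topic": "administration", "goal": "looking for information"},
    categories=[("enrolment", "The enrolment office is open 9-12.")],
    engine=simple_engine,
)
print(admin.engine("How does enrolment work?", admin.categories))
```

Because a module is fully described by this self-contained record, connecting or disconnecting one amounts to adding or removing an entry from the registry, without touching the chatbot core, which matches the plug-in behavior described above.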

Dialogue Analyzer.
This component is capable of capturing particular features related to the context of the dialogue and of managing them as variables. It can analyze the whole dialogue using syntactic, semantic, or pragmatic rules. The dialogue analyzer extracts from the current conversation what we define as "context variables." Possible variables are the topic of the dialogue, the goal of the conversation, the speech act, the mood of the user, the kind of user, the kind of dialogue (e.g., formal, informal), particular keywords, and so on.

Corpus Callosum.
The corpus callosum is equipped with a module selector and a planner. The module selector enables or disables the dialogue engine modules at runtime by selecting the most appropriate modules from time to time. Modules are disabled as soon as they are not useful. The planner uses the context variables in order to define the temporal evolution of the state of each module.
Let C_t be the set of the m context variables C_jt (j = 1, 2, ..., m) at time t; the planner maps the context representation onto the module states. The state s_it of the i-th module at time t can be a binary value (e.g., active, not active) or a real value in [0, 1] representing its probability of activation. C_t contains the current and all the past values of each variable:

C_t = {C_jτ : j = 1, 2, ..., m; τ = 0, 1, ..., t}.

The planner is characterized by the mapping function

S_t = f(C_t), with S_t = [s_1t, s_2t, ..., s_nt],

where n is the number of modules.
Given the metadescription of the modules, the corpus callosum must determine the function f . The corpus callosum reconfigures the mapping function when new modules are enabled/connected or disabled/disconnected. It modifies the mapping using a learning algorithm that enhances the chatbot behavior in terms of activation/deactivation of the most fitting modules. It is possible to define a training set and an evaluation feedback mechanism of the chatbot's answers. The module selector activates or deactivates the chatbot modules, checking the value of the s it state of each module i at time t: for binary values the activation is straightforward; for continuous values it is necessary to use a thresholding mechanism that can be the same for all modules or a specific threshold for each module, or dynamic, computed from time to time according to specific constraints.
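The mapping and thresholding steps described above can be sketched as follows. This is a minimal illustration in which the learned mapping function f is replaced by a hand-written score matching the context against each module's metadescription; the scoring rule, module names, and thresholds are illustrative assumptions:

```python
def planner(context, modules):
    """Map the context variables onto a state in [0, 1] for each module
    (a stand-in for the learned mapping function f)."""
    states = {}
    for name, profile in modules.items():
        # Score = fraction of the module's metadescription matched by the context.
        hits = sum(1 for k, v in profile["meta"].items() if context.get(k) == v)
        states[name] = hits / max(len(profile["meta"]), 1)
    return states

def select(states, modules):
    """Activate every module whose state reaches its own threshold."""
    return [n for n, s in states.items() if s >= modules[n]["threshold"]]

modules = {
    "friendly":    {"meta": {"user": "student", "goal": "looking for people"}, "threshold": 0.5},
    "informative": {"meta": {"goal": "looking for information"}, "threshold": 0.5},
}
context = {"user": "student", "goal": "looking for people", "topic": "generic"}
states = planner(context, modules)
print(select(states, modules))
```

Here each module carries its own threshold, corresponding to the per-module thresholding option mentioned in the text; a single global threshold or a dynamically computed one would only change the `select` function.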

A Case Study
As a proof of concept of the proposed system, we have implemented a conversational agent aimed at assisting people, typically students, in the Computer Science Engineering Department of the University of Palermo, Italy.
The conversational agent plays the role of a virtual doorman, or secretary, who is also capable of showing a different behavior according to the current dialogue context. Possible users are students, professors, researchers, or other people. Requests can vary from generic information to particular questions regarding specific people.
In the following subsections we describe the key implemented components. For the dialogue analyzer we describe the extracted context variables and the way they are extracted. Particular emphasis is given to the extraction of variables like speech acts, which are a fundamental characteristic driving conversations.

Dialogue Analyzer Component.
In this particular implementation we have chosen to extract the following context variables: the speech act, the kind of user, the goal of the user, and the topic of the conversation. Speech acts derive from John Austin's studies [12] and characterize the evolution of a conversation. According to Austin, each utterance can be considered as an action of the human being in the world. Specific sequences of interaction are common in spoken language (e.g., question-answer), and they can be recognized and evaluated in order to better understand the meaning of sentences and to generate the most appropriate answer to a user question. A conversation is also affected by the kind of interlocutor. As an example, a simple language would be adopted to speak with a child, while a refined style is more adequate to speak to a well-educated person. The register used between interlocutors is also important during a dialogue: a conversation can be more or less formal. An agent can also have, or not have, a thorough knowledge of a specific topic of conversation.
In consideration of this, we have identified four main variables that characterize the dialogue: (i) Topic: it is the topic of the conversation; it can be artificial intelligence, image processing, computer languages, administration questions, and generic.
(ii) Speech act: the kind of speech act characterizing the dialogue at a given time; it can assume the values assertive, directive, commissive, declarative, expressive with positive, neutral, or negative connotation.
(iii) User: the kind of user that is interacting with the chatbot. According to the definition of the scenario users are student, professor, and other.
(iv) Goal: the goal of the user; it can be "looking for people" or "looking for information".
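A toy dialogue analyzer producing the four context variables above can be sketched as follows. The keyword lists and surface cues are illustrative assumptions; the actual system uses the semantic and AIML-based extraction described in the next subsections:

```python
# Hypothetical keyword lists for two of the case-study topics.
TOPIC_KEYWORDS = {
    "artificial intelligence": ["neural", "agent", "learning"],
    "administration questions": ["enrolment", "office", "certificate"],
}

def analyze(sentence, user_kind="other"):
    """Return the four context variables of the case study for one sentence."""
    s = sentence.lower()
    topic = "generic"
    for t, words in TOPIC_KEYWORDS.items():
        if any(w in s for w in words):
            topic = t
            break
    # Crude surface cues standing in for the real goal and speech-act detectors.
    goal = "looking for people" if "professor" in s or "who" in s else "looking for information"
    act = "directive" if s.endswith("?") or s.startswith(("where", "can", "could")) else "assertive"
    return {"topic": topic, "speech_act": act, "user": user_kind, "goal": goal}

print(analyze("Where is the enrolment office?", user_kind="student"))
```

The returned dictionary plays the role of the context representation C_t consumed by the corpus callosum.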

Topic Extraction.
In order to detect the topic of the conversation, we semantically compare the requests of the users with a set of documents which have been previously classified according to the possible topics of conversation. The comparison is based on the induction of a semantic space. A semantic space is a vector model based on the principle that it is possible to know the meaning of a word by analyzing its context. The building of a semantic space is usually an unsupervised process that consists in the analysis of the distribution of words in the document corpus.
The result of the process is the coding of words and documents as numerical vectors, whose respective distance reflects their semantic similarity.
In particular, a large text corpus composed of microdocuments classified according to the possible topics of conversation has been analyzed. Each document used to create the space has been then associated with a very specific topic. A semantic space has been therefore built according to an approach based on latent semantic analysis (LSA) [13], reported in [5].
Given N documents of a text corpus, let M be the number of unique words occurring in the document set. Let A = {a_ij} be an M × N matrix whose (i, j)-th entry is the square root of the sample probability of finding the i-th word of the vocabulary in the j-th document. According to the truncated Singular Value Decomposition technique, the matrix A can be approximated, given a truncation integer K < min{M, N}, as the matrix A_K given by the product

A_K = U_K Σ_K V_K^T,

where U_K and V_K are composed of the first K left and right singular vectors of A, and Σ_K is the diagonal matrix of its K largest singular values. After the application of LSA the corpus documents have been coded as vectors in the semantic space.
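The truncation step can be illustrated numerically. The following toy example uses random data and hypothetical sizes; it only demonstrates the factorization A_K = U_K Σ_K V_K^T, not the paper's actual corpus processing:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((8, 5))          # toy 8-word x 5-document matrix (M=8, N=5)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

K = 2                            # truncation integer, K < min(M, N)
A_k = U[:, :K] @ np.diag(s[:K]) @ Vt[:K, :]   # rank-K approximation of A

# Each document is coded as a K-dimensional vector in the semantic space
# (here: the columns of Sigma_K V_K^T).
doc_vectors = np.diag(s[:K]) @ Vt[:K, :]
print(A_k.shape, doc_vectors.shape)
```

By the Eckart-Young theorem this A_k is the best rank-K approximation of A in the Frobenius norm, which is why the K-dimensional coordinates preserve most of the word-document co-occurrence structure.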
Let N be the number of documents used to build the semantic space, let d_i and s be the numerical vectors associated, respectively, with the i-th document of the corpus and with the current sentence of the conversation, and let T(d_i) be the topic associated with d_i. During the conversation, to evaluate the topic associated with the current sentence, we encode it as a vector in the space by means of the folding-in technique [14], obtaining the vector s, as reported below.
Let v be the vector representing the current sentence: the i-th entry of v is the square root of the sample probability of finding the i-th word of the vocabulary in the sentence. The vector v is then projected into the semantic space, obtaining s. We compare the obtained vector with all the documents encoded in the space by using the geometric similarity measure sim defined in [5], which is based on cos(s, d_i), the cosine between the vectors s and d_i. The topic of the sentence T(s) is equal to the topic of the closest document d_k:

T(s) = T(d_k), with k = arg max_{i=1,...,N} sim(s, d_i).

Speech Acts.
Exploiting the support of a linguist, we have adopted two elements of the speech act theory: the illocutionary point and the illocutionary force. An illocutionary point is the basic purpose of a speaker in making an utterance, and it is a component of the illocutionary force. The illocutionary force of an utterance is the speaker's intention in producing that utterance. An illocutionary act is an instance of a culturally defined speech act type, characterized by a particular illocutionary force, for example, promising, advising, warning, and so forth. Recalling Searle's definition [15], we have five kinds of illocutionary points: (i) assertive: to assert something; (ii) directive: to attempt to get someone to do something; (iii) commissive: to commit to doing something; (iv) declarative: to bring about a state of affairs by the utterance; (v) expressive: to express an attitude or emotion.
We have simplified the concept of illocutionary force by introducing three kinds of act connotation: positive, neutral, and negative. In our approach a directive act has a negative connotation when it is conducted with some sort of coercion. An explanation request conducted in a polite manner has a positive connotation; in other cases it is labeled with a "neutral" connotation. For assertive acts we have a negative connotation when the assertion induces an adverse mood in the interlocutor, and a positive connotation otherwise. Commissive acts are basically "neutral," apart from promises (positive connotation) and threats (negative connotation). Expressive acts like wishes have a positive connotation; greetings have a neutral connotation, and complaints have a negative connotation. Declarative acts have not been used so far, since in the dialogue schemata that we have considered they are substantially absent (a characteristic that is also reported in the literature [16,17]).
This classification is not always clear-cut; therefore we have used some heuristics that associate a positive connotation with acts that cause in the agent a more favorable behavior (e.g., empathy and understanding are kinds of favorable behaviors). Illocutionary point and illocutionary force are variables of a speech act in an absolute sense, since they do not make any reference to a previous act.
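A heuristic of this kind can be sketched as a small cue-based labeller. The cue lists below are illustrative assumptions and only mimic the rules stated above (polite requests positive, coercion and complaints negative, everything else neutral):

```python
# Hypothetical surface cues; negative cues are checked first so that
# coercive phrasing overrides politeness markers.
POSITIVE_CUES = ["please", "thank", "would you", "promise"]
NEGATIVE_CUES = ["must", "immediately", "or else", "complain", "awful"]

def connotation(utterance):
    """Label an utterance as positive, neutral, or negative."""
    u = utterance.lower()
    if any(c in u for c in NEGATIVE_CUES):
        return "negative"
    if any(c in u for c in POSITIVE_CUES):
        return "positive"
    return "neutral"

print(connotation("Could you please explain the timetable?"))
print(connotation("Tell me now or else!"))
```

In the prototype this labelling feeds the "chatbot-induced behavior" variable described in the corpus callosum implementation.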

Kind of User and Goal of the Dialogue.
At present these kinds of variables are extracted through ad hoc AIML categories suited to capture the relevant information.

Created Modules.
We have realized five kinds of modules.
(i) Module 1: it is aimed at characterizing a friendly behavior of the chatbot.
(ii) Module 2: it is oriented to characterize the chatbot with a tough behavior as a consequence of a negative evolution of the dialogue.
(iii) Module 3: it makes the chatbot empathetic with the user.
(iv) Module 4: this module is designed to induce a situation of submissiveness of the chatbot with respect to the interlocutor.
(v) Module 5: it is built to make the chatbot capable of managing informative tasks.
Each module is activated by the corpus callosum, according to specific thresholds.

Corpus Callosum Implementation.
In this section we illustrate the corpus callosum component and its relative mapping function realized with a Bayesian network. The network is shown in Figure 3.
The network makes inference on the variables extracted from the context in order to trigger the activation/deactivation of a module.
The status of a module is directly influenced by variables such as the conversation goal, the topic and the kind of user, while it is indirectly influenced by speech acts sequences (see Figure 3).
Since specific sequences of speech acts can imply a mood change in the chatbot, we have defined a "chatbot-induced behavior" variable, representing the chatbot behavior, with the aim of realizing a more realistic and plausible conversational agent. Speech acts are detected through a simple rule-based speech act classifier, whose description goes beyond the scope of this paper. Once recognized, the speech acts relating both to the current and to past time slices are used to code the current behavior of the chatbot. The corpus callosum therefore selects the most appropriate module to activate by analyzing the chatbot behavior induced by the speech acts sequence and the other context variables.
The module with the highest probability is then selected, according to the causal inference schema coded in a dynamic Bayesian network.
In particular, a variable "Modules" is defined in the network, whose states represent the possible modules to activate. The probability value associated with each state determines whether that particular module must be turned on or off. The mapping function is then obtained by evaluating the conditional probability of the variable "Modules," given its parents (see Figure 3); the relationship between the variables follows Bayes' rule. The labeled arcs are temporal arcs; for example, the arc labeled "1" represents the influence of the variable at the previous instant of time. The behavior of the chatbot is influenced by a triplet of speech acts: (i) the act of the user at time t; (ii) the act of the chatbot at time t − 1; (iii) the act of the user at time t − 1.
The act sequencing is therefore given by the last three speech acts in the dialogue. Of course, it is possible to extend the temporal window to more than three speech acts. We have restricted the behaviors of the chatbot to four: (i) friendly, (ii) determined, (iii) submissive, (iv) reassuring.
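The chain "speech-act triplet → induced behavior → module" can be sketched with two lookup tables standing in for the network's conditional probability tables. All probability values, act labels, and table entries below are illustrative assumptions, not the actual network parameters:

```python
# P(behavior | triplet): triplet = (user act at t, chatbot act at t-1, user act at t-1);
# "+"/"-" suffixes denote the act connotation.
BEHAVIOR_CPT = {
    ("expressive+", "assertive", "expressive+"):
        {"friendly": 0.7, "reassuring": 0.2, "determined": 0.05, "submissive": 0.05},
    ("directive-", "assertive", "directive-"):
        {"determined": 0.6, "submissive": 0.2, "friendly": 0.1, "reassuring": 0.1},
}

# P(module | behavior, goal): the most probable state of "Modules" wins.
MODULE_GIVEN = {
    ("friendly", "looking for people"):      {"Module 1": 0.8, "Module 5": 0.2},
    ("determined", "looking for people"):    {"Module 2": 0.7, "Module 1": 0.3},
    ("friendly", "looking for information"): {"Module 5": 0.6, "Module 1": 0.4},
}

def select_module(triplet, goal):
    """Pick the module with the highest probability given the context."""
    behavior_dist = BEHAVIOR_CPT.get(triplet, {"friendly": 1.0})
    behavior = max(behavior_dist, key=behavior_dist.get)
    module_dist = MODULE_GIVEN.get((behavior, goal), {"Module 5": 1.0})
    return max(module_dist, key=module_dist.get)

print(select_module(("expressive+", "assertive", "expressive+"), "looking for people"))
```

Extending the temporal window beyond three acts would simply enlarge the triplet key of the first table; in the real prototype the equivalent of these tables is learned and evaluated by the dynamic Bayesian network of Figure 3.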
In Figure 4 we show an example of a possible sequence that induces a positive behavior in the chatbot, that can be mapped as a friendly behavior in our domain.

Dialogue Examples
In this section we show some samples of dialogues obtained during the interaction of users with the proof-of-concept prototype that we have realized at the Department of Ingegneria Informatica of the University of Palermo, Italy.
The first example illustrates how the behavior of the chatbot changes according to different sequences of speech acts. Table 1 shows the triplets of speech acts for the previous dialogue. Time starts from "−1" since the first triplet is characterized only by the speech act of the user. As a consequence, the start of the dialogue is in the last column at time t − 1, where an expressive act of the user is detected. With this kind of dialogue we expect the activation of Module 1, which has been designed to manage friendly communications with students who have the goal of looking for people. At step 3 the goal of the user is detected, and at step 5 the kind of user. Figure 5 shows the evolution of the Bayesian network for each step of the dialogue. In this picture the probability of activation of each module is shown during the temporal evolution of the dialogue, which consists of 6 time slots. Time slots are reported on the x-axis, while the y-axis shows the probability of activation of each module. Module 1 is the prevalent one.
In the following dialogue we show how the system detects a change of the goal of the user: from a search for information, managed by Module 5, to the search for a professor, managed by Module 1. The shape of the probabilities of activation of the modules is shown in Figure 6: at time 1 the user is looking for information, and Module 5 is activated; at time 3 the goal of the user becomes the search of a professor (step 6) and Module 1 becomes the most probable.

Conclusion
We have presented a system which implements a framework capable of dynamically managing knowledge in conversational agents. As a proof of concept we have illustrated a prototype, which is characterized by a Bayesian network planner. The planner is aimed at selecting, from time to time, the most adequate knowledge modules for managing specific features of the conversation with the user. As a result, the architecture is capable of generating complex behaviors of the agent. The architecture is analogous to the structure of the human brain: two hemispheres cooperate in the management and understanding of a dialogue through a connection element like the corpus callosum. The left hemisphere is specialized in logic, reasoning, linguistic, rule-oriented, and syntactic processing of language; the right hemisphere is more oriented to intuition and emotions and processes information holistically. The corpus callosum is the bridge that connects the two hemispheres, making possible their mutual interaction through the migration of information between them. The dialogue analyzer extracts context information (the topic, the goal, the speech act, etc.), and the planner exploits this information to select and activate only the most appropriate modules, which incorporate the most adequate rules to process the specific sentences typed by the user. The conversational agent is also capable of performing analogical reasoning in order to understand the context and the structures in a general manner; the corpus callosum analyzes this information and selects the most appropriate module to properly process the specific sentence. However, it is worthwhile to point out that it is possible to realize other modules that fulfill other kinds of specific functions.