We propose an extension-by-unification method to improve the reusability of dialogue components in the development of communication functions for robots. Compared to the extension-by-connection method used in previous behavior-based communication robot developments, the extension-by-unification method can decompose a script into components, and the decomposed components can easily be recomposed to build a new application. In this paper, we first explain a reformulation we have applied to the conventional state-transition model. Second, we explain a set of algorithms to decompose and recompose components and to detect conflicts between them. Third, we explain a dialogue engine and a script management server we have developed. The script management server proposes reusable components to the developer in real time by implementing the conflict detection algorithm. The dialogue engine SEAT (Speech Event-Action Translator) has a flexible adaptor mechanism that enables quick integration into robotic systems. By applying the method to three robots, we have confirmed that development efficiency improved by 30%.
In recent years, there has been an increasing demand for robots that work in a human life environment.
Replacement of human labor by robots in the manufacturing sectors (e.g., factory production lines) has already shown success. In the case of manufacturing robots, professional operators give commands to the robot. Professional operators have expert knowledge, and they are able to command the robot in a robot-friendly manner.
However, in the case of the robots used in a life environment, the operator who gives commands to the robot only has imperfect knowledge about the robot (called a “naïve user” hereafter). Naïve users often use natural language to command the robot. To create a robot that can be easily used by naïve users, the robot not only needs to have mechanical skills but also linguistic ability to understand a variety of commands.
The biggest problem in understanding language is diversity. Words used by a naïve user to command the robot will be diverse for various reasons (described in Section
An advantage of using the machine learning method is that the developer can implement the vast patterns of language understanding without any programming effort. For example, Iwahashi used a Markov model and a stochastic context-free grammar to let the robot understand lexicons as well as associations between objects and words [
In contrast, with scripting methods, the developer can program the specific behavior of the robot as intended. The disadvantage, however, is that it is difficult to cover the diversity of language understanding required in each application, because the effort a human developer can spend is limited.
SHRDLU [
The inference-based scripting approach is useful for the developer who has deep understanding about the inference system, but this requirement is sometimes difficult to fulfill in collaborative and incremental development (discussed later in Section
Recently, the “behavior-based” scripting method has been applied in many practical robotic systems. The application presented by Brooks [
The behavior-based scripting method can also be applied to communication robots by incorporating speech input into the situation model. Its application to communication robots was first presented by Kanda et al. [
However, the existing behavior-based scripting methods for communication robots are inefficient when a script is reused to develop different types of robots (this problem is described in Section
In our approach, we focus not only on the ability of the model itself, but also on the descriptive format of the script and its operation. We show that reuse can be enhanced by reformulating the conventional descriptive format, and we show the effectiveness of the reformulation by implementing a computer-assisted development environment that supports the developer.
In Section
In Section
In Sections
In Section
A state-transition model is a modeling method in which the input and output of the system assume the following form:
The state transition function
The output function
When the system is in state
At the same time, we get output alphabet
Even if the input to the system is the same, the output of the system may be different, because the internal state
We have explained the state-transition model in equation form; however, it can also be presented as a two-dimensional diagram called a “state-transition diagram”. In the diagram, each state is represented by a circle, and each transition between states is represented by an arrow. In this paper, we annotate the transition conditions and the associated actions by placing text over each arrow. We use a black circle (called a “token”) to represent the current state.
For example, Figure
Example of state-transition model.
In the model presented in Figure
The above example is expressed as follows in the equation form:
As we have seen here, the expression in equation form has an advantage in formalization, while the expression in diagram form has an advantage in quick understanding. In later discussion, we will use both the equation and the diagram forms to explain the concept quickly and formally.
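The equation form maps directly onto a lookup table. The following is a minimal Python sketch, assuming a Mealy-style machine in which a single table plays the role of both the state-transition function and the output function. The state names, inputs, and actions are hypothetical and are not taken from the figure.

```python
from typing import Dict, Tuple

# (current state, input symbol) -> (next state, output action).
# One table stands in for both the transition and the output function.
Table = Dict[Tuple[str, str], Tuple[str, str]]

TABLE: Table = {
    ("idle", "hello"): ("greeted", "say:hello"),
    ("greeted", "hello"): ("greeted", "say:we already met"),
    ("greeted", "bye"): ("idle", "say:goodbye"),
}


def step(table: Table, state: str, symbol: str) -> Tuple[str, str]:
    """Return (next state, output); stay put with no output if no rule matches."""
    return table.get((state, symbol), (state, ""))


state = "idle"  # the token starts here
outputs = []
for symbol in ["hello", "hello", "bye"]:
    state, out = step(TABLE, state, symbol)
    outputs.append(out)
```

In this run the same input "hello" produces two different outputs, because the token has moved from `idle` to `greeted` between the two inputs.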
The state-transition model is a very simple yet very powerful modeling method and has been applied in a wide range of applications. Because its structure is so simple, it is frequently misunderstood as being able to model only simple behavior. However, it can model diverse behavior by applying some extensions (e.g., [
The original state-transition model uses a single input channel. In the case of a conversational system, the input channel is assigned to receive input from the speech recognition subsystem. However, it can accept multichannel input by formulating the transition function
The example in Figure
Example of state-transition model using multichannel input.
The state-transition model updates its internal state using external input. However, by connecting the output of the system to its input, it can realize autonomous behavior generation based on internal events (in this paper, we call such an event a “loop-back event”). Loop-back events are important in realizing the autonomous behavior of the robot (examples are presented in Section
A frame-based question is an interaction that requires answers to two or more questions in an arbitrary order. The example in Figure
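Multichannel input and loop-back events can both be expressed by keying the transition table on a (channel, symbol) pair and feeding selected outputs back into the input queue. The sketch below is one possible arrangement under that assumption; the channel names (`vision`, `speech`, `speech_out`, `loopback`) and the rules are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Event = Tuple[str, str]          # (channel, symbol)
Rule = Tuple[str, List[Event]]   # (next state, emitted events)

RULES: Dict[Tuple[str, Event], Rule] = {
    ("idle", ("vision", "person")): ("facing", [("speech_out", "hello")]),
    # Emitting on the "loopback" channel feeds an event back into the input.
    ("facing", ("speech", "come here")): ("moving", [("loopback", "arrived")]),
    ("moving", ("loopback", "arrived")): ("idle", [("speech_out", "here I am")]),
}


def run(state: str, external: List[Event]) -> Tuple[str, List[Event]]:
    """Process external events, handling loop-back events as they arise."""
    queue: Deque[Event] = deque(external)
    emitted: List[Event] = []
    while queue:
        event = queue.popleft()
        if (state, event) not in RULES:
            continue  # no rule for this event in this state: ignore it
        state, outs = RULES[(state, event)]
        for out in outs:
            if out[0] == "loopback":
                queue.appendleft(out)  # internal event, handled next
            else:
                emitted.append(out)
    return state, emitted


final_state, emitted = run("idle", [("vision", "person"), ("speech", "come here")])
```

Here the last transition is triggered by the system itself rather than by the user, which is the behavior the loop-back event is meant to provide.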
Example of state-transition model that can realize a 2-frame question. The structure of the model looks complex, but it can be generated easily using a simple algorithm.
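The generation algorithm itself is not given in the text. One simple possibility, shown here as an assumption rather than as the authors' algorithm, is to create one state per subset of answered frames and to add a transition for every frame that is still unanswered. The frame names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

State = FrozenSet[str]


def frame_automaton(frames: List[str]) -> Dict[Tuple[State, str], State]:
    """One state per subset of answered frames; answering frame f moves
    from subset S to S | {f}."""
    transitions: Dict[Tuple[State, str], State] = {}
    for size in range(len(frames)):
        for filled in combinations(frames, size):
            state = frozenset(filled)
            for frame in frames:
                if frame not in state:
                    transitions[(state, frame)] = state | {frame}
    return transitions


T = frame_automaton(["drink", "place"])
start: State = frozenset()
done: State = frozenset({"drink", "place"})
```

With two frames this yields four transitions, and either answering order leads from `start` to `done`. The number of states grows as 2^n in the number of frames, which is why such a model looks complex when drawn by hand.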
There have been many script engines implemented (e.g., [
VoiceXML [
Our script engine not only implements the above extensions but also has a function to support incremental development. In the next section, we discuss our incremental development method.
Commands given by the human to the robot are diverse. The following are the factors that cause this diversity.
Human language is ambiguous, and different expressions can be used to give instructions that carry the same meaning.
Robots working in a life environment have to accept a variety of tasks. In order to cope with this, it is necessary for them to understand a variety of commands.
The diversity is also caused by the abilities of the robot itself. A command from a human is meaningful only when the robot has the corresponding function. For example, humans do not say “walk N steps” to a robot on wheels.
The language comprehension system of the robot must be able to deal with these diversities.
In the script-based development approach, diversity has been dealt with by stacking newly developed scripts onto the existing scripts. By accumulating scripts, the developer can increase the number of commands that the system can handle.
Incremental development of the state-transition model has previously been conducted using the “extension-by-connection” method (described in the next section). In this section, we propose an “extension-by-unification” method that can cope with the diversities mentioned above (described in Section
The simplest way to extend a state-transition model is as follows: (1) add a new state to the existing state-transition model; (2) add a new transition from an existing state to the new state.
This process is illustrated in Figure
Extension of a state machine using the extension-by-connection model.
Here, we formulate the above process. Let the existing state-transition model be
As explained in Section
Here,
Similarly, we define the accumulated state-transition model
Then, the new state
Here,
The new state transition
The transition function of the accumulated part
The state-transition model is easy to understand when drawn as a state-transition diagram. Extension-by-connection can also be carried out very easily by editing this diagram. There are several GUIs that can add state-transition rules through mouse clicks (e.g., [
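Extension-by-connection amounts to adding one row to a single shared transition table. The following sketch assumes a table keyed by (state, condition); the function name and the example states are hypothetical.

```python
from typing import Dict, Tuple

Table = Dict[Tuple[str, str], Tuple[str, str]]


def extend_by_connection(table: Table, existing: str, condition: str,
                         new_state: str, action: str) -> Table:
    """Return a copy of the table with one new state reachable from `existing`."""
    states = {s for (s, _) in table} | {nxt for (nxt, _) in table.values()}
    if existing not in states:
        raise ValueError(f"unknown state: {existing}")
    if (existing, condition) in table:
        raise ValueError("transition condition already in use")
    extended = dict(table)
    extended[(existing, condition)] = (new_state, action)
    return extended


base: Table = {("menu", "hello"): ("menu", "say:hello")}
extended = extend_by_connection(base, "menu", "turn on the tv", "tv", "ir:power")
```

Note that the result is one flat table that keeps no record of which rows belong to which function, so separating a function out again has to be done by hand.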
Extension-by-connection is a useful method, but it has the following problems.
As we can see in ( Robot “ For the robot “ We have developed another robot, “
Here, the state-transition model for function
First, the state
However, the definition of state-transition function
Because states
Ideally, once a feature is developed, it would be possible to share with other robots that need the same feature. In order to achieve this, we introduce the extension-by-unification method.
In the extension-by-unification method, we extend the state-transition model by the following procedure: (1) develop a state-transition model that realizes a new function; (2) unify the states that have the same ID in the existing and the new state-transition models.
This process is illustrated in Figure
Extension of the state-transition model using the extension-by-unification method.
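The unification step can be sketched as a merge of independently written transition tables in which states with equal IDs are treated as one state. This is an illustration under assumed data structures, not the SEAT implementation; raising an error on a conflicting rule is a choice made for the sketch, and the script contents are hypothetical.

```python
from typing import Dict, Tuple

Script = Dict[Tuple[str, str], Tuple[str, str]]


def unify(scripts: Dict[str, Script]) -> Script:
    """Merge scripts; states with the same ID become one state."""
    merged: Script = {}
    for name, script in scripts.items():
        for key, rule in script.items():
            if key in merged and merged[key] != rule:
                raise ValueError(f"conflict at {key} while adding {name!r}")
            merged[key] = rule
    return merged


greet: Script = {("menu", "hello"): ("menu", "say:hello")}
tv_ctl: Script = {
    ("menu", "tv"): ("tv", "say:tv mode"),
    ("tv", "power on"): ("tv", "ir:power"),
    ("tv", "back"): ("menu", "say:ok"),
}

hrp2 = unify({"greet": greet, "tv-ctl": tv_ctl})
taizo = unify({"greet": greet})  # reuse by selecting a subset of scripts
```

Because each script stays a separate unit until it is merged, a second robot can be built by unifying a different selection of the same scripts.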
Here, we formulate the above process.
The existing state-transition model
Similarly, the new state-transition model
We accumulate the state-transition model
Here,
Next, the transition between the state
By defining initial state
As visible in (
As noted in Section
As discussed above, the extension-by-unification method overcomes a limitation of the extension-by-connection method by applying a simple reformulation. In exchange, however, we have to deal with the following problems, which do not occur in conventional methods.
First, a conflict in transition conditions may occur. For example, when we try to unify two states with one another, the states may have different actions associated with the same transition conditions. In this case, the state-transition models cannot be unified.
Second, an isolated state may occur. For example, when we try to unify state-transition models that do not have the same state IDs in common, there will be no transitions between the old and the new states. In this case, the developer cannot activate the new function as intended.
In this study, we not only implement a script engine that has a state unification function (detailed in Section
We developed a wiki-based script-management system.
The developer can write a script in XML form on a wiki page, and the documentation of the script can be written on the same page. The developer can annotate each wiki page using tags. Tags are used as identifiers to indicate multiple pages that work as a set.
Algorithm
Our run-time engine SEAT can read scripts over HTTP. Thus, the developer can directly load and run a script (or the set of scripts defined by a tag) by specifying its URL.
The script-management server uses the core functions of dokuwiki (
When we try to unify state-transition models that do not have state IDs in common, there will be no transition between the old and the new states. In this case, the developer cannot activate the new function as intended. Isolation of the state can be detected in Algorithm
When we try to unify two states with one another, the states may have different actions associated with the same transition conditions. In this case, the state-transition models cannot be unified. A conflict between the state-transition conditions can be detected in Algorithm
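The two checks described above can be sketched as set operations on the transition tables. This is a minimal illustration based on the definitions in the text, not a reproduction of the referenced algorithms; the function names and scripts are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Script = Dict[Tuple[str, str], Tuple[str, str]]


def state_ids(script: Script) -> Set[str]:
    """All state IDs that appear as a source or a destination."""
    return {s for (s, _) in script} | {nxt for (nxt, _) in script.values()}


def is_isolated(existing: Script, new: Script) -> bool:
    """True if the scripts share no state ID, so no transition links them."""
    return not (state_ids(existing) & state_ids(new))


def conflicts(existing: Script, new: Script) -> List[Tuple[str, str]]:
    """(state, condition) pairs defined in both scripts with different results."""
    return sorted(k for k in existing.keys() & new.keys() if existing[k] != new[k])


task: Script = {("menu", "hello"): ("menu", "say:hi there")}
greet: Script = {
    ("menu", "hello"): ("menu", "say:hello"),
    ("menu", "good morning"): ("menu", "say:good morning"),
}
wander: Script = {("patrol", "stop"): ("halt", "motor:stop")}
```

In this example `task` and `greet` conflict on the condition "hello" in state `menu`, while `wander` shares no state with `task` and would therefore be isolated after unification.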
An “unexecutable action” is an action that is defined in the state-transition model but cannot produce any output because the robot does not have the ability to generate the actual output. In this case, the developer cannot achieve the intended output. By using the instance ID of the adaptor mechanism (described in Section
By using the above algorithms, the possibility of unification between scripts can be identified as “Unifiable”, “Unifiable (occurrence of isolated state)”, or “Conflict”. Similarly, scripts can be classified as “Executable” or “Unexecutable”. By comparing a script and an adaptor definition for the existing scripts, we can obtain a list of scripts annotated with 6 (
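The annotation can be sketched by combining the three unification labels with the two execution labels. The code below assumes that every action is written as `<instance ID>:<command>`, which is a hypothetical format chosen for the sketch; the script database is also hypothetical.

```python
from typing import Dict, Set, Tuple

Script = Dict[Tuple[str, str], Tuple[str, str]]


def state_ids(script: Script) -> Set[str]:
    return {s for (s, _) in script} | {nxt for (nxt, _) in script.values()}


def unification_label(current: Script, other: Script) -> str:
    if any(current[k] != other[k] for k in current.keys() & other.keys()):
        return "Conflict"
    if not (state_ids(current) & state_ids(other)):
        return "Unifiable (occurrence of isolated state)"
    return "Unifiable"


def execution_label(script: Script, adaptor_ids: Set[str]) -> str:
    # Assumed action format: "<instance ID>:<command>".
    used = {action.split(":", 1)[0] for (_, action) in script.values()}
    return "Executable" if used <= adaptor_ids else "Unexecutable"


def annotate(current: Script, adaptor_ids: Set[str],
             database: Dict[str, Script]) -> Dict[str, Tuple[str, str]]:
    """Label every stored script against the script being edited."""
    return {name: (unification_label(current, s), execution_label(s, adaptor_ids))
            for name, s in database.items()}


current: Script = {("menu", "hello"): ("menu", "speech_out:hi")}
database: Dict[str, Script] = {
    "greet": {("menu", "hello"): ("menu", "speech_out:hello")},
    "tv-ctl": {("menu", "tv"): ("tv", "ir:power")},
    "wander": {("patrol", "stop"): ("halt", "speech_out:stopping")},
}
labels = annotate(current, {"speech_out"}, database)
```

A list such as `labels` is the kind of information a development interface could display next to each stored script.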
Our script-management server displays the above list at the bottom of each wiki page. By displaying the list, the developer can easily find a script that can be included in his/her current application.
Figure
Example of using the web-based interface. (a) Overview of the development interface: visualization of the state-transition model (left), XML-based editing panel (top right), and real-time annotation of existing scripts (bottom right). The developer is editing the task script. When the developer types the keyword “Hello”, an existing script in the script database is annotated as “conflict”, and the system suggests reusing it. At this step, the system accepts only 3 phrases (“Hello”, “What can you do?”, “Come here”). (b) When the developer checks the “greet” script, which already contains several greeting phrases, it is unified with the task script. As a result, the developer only has to add the application-specific vocabulary to obtain a complete script with a large vocabulary. At this step, the system accepts 7 phrases (“Hello”, “Good morning”, “Good afternoon”, “Thank you”, “Nice to meet you”, “What can you do?”, “Come here”).
SEAT consists of an adaptor mechanism, phrase matcher, automaton driver, and automaton unifier. In the next sections, we briefly overview each subsystem.
The adaptor mechanism is used to connect the run-time engine to the other subsystems of the robot.
Adaptors are configured in XML format. For each adaptor configuration, an instance ID is defined. In the body of the state-transition model, the instance ID is used to describe the actions. By using this mechanism, even if the developer has changed the hardware configuration, the same state-transition model can be used by employing an adaptor definition that has the same instance IDs.
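The role of the instance ID can be illustrated with two adaptor configurations that expose the same IDs over different transports. The XML element and attribute names below are hypothetical and are not SEAT's actual configuration schema; the action format `<instance ID>:<command>` is likewise an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical adaptor definitions: same instance IDs, different transports.
HRP2_ADAPTORS = """
<adaptors>
  <adaptor id="speech_out" type="socket" host="localhost" port="10000"/>
  <adaptor id="motion" type="socket" host="localhost" port="10001"/>
</adaptors>
"""

TAIZO_ADAPTORS = """
<adaptors>
  <adaptor id="speech_out" type="stdio"/>
  <adaptor id="motion" type="process" command="taizo-motion"/>
</adaptors>
"""


def load_adaptors(xml_text: str) -> Dict[str, Dict[str, str]]:
    """Map each instance ID to its transport settings."""
    root = ET.fromstring(xml_text.strip())
    return {a.get("id"): dict(a.attrib) for a in root.findall("adaptor")}


def route(action: str, adaptors: Dict[str, Dict[str, str]]) -> str:
    """Resolve an action "<instance ID>:<command>" to a transport type."""
    instance_id, _command = action.split(":", 1)
    return adaptors[instance_id]["type"]


hrp2 = load_adaptors(HRP2_ADAPTORS)
taizo = load_adaptors(TAIZO_ADAPTORS)
```

The same action string, for example `speech_out:hello`, resolves to a socket on one robot and to standard output on the other, so the state-transition model itself does not need to change.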
SEAT supports BSD socket communication, child process communication, UNIX standard input and output, and OpenRTM [
A speech recognition function is also important in improving the accuracy of the robots’ linguistic understanding. In the human life space, many noises occur around the robot. In such an environment, normal speech recognition algorithms are not accurate enough.
We have developed a speech recognition algorithm that works in a practical noise environment by using a signal processing technique combined with the speech recognition engine Julius [
For HumanAID application (Figure
HRP-2 during a HumanAID task.
Speech recognition accuracy depends not only on environmental noise but also on the size of the vocabulary. Because the recognizer needs to distinguish each word from the others in the given vocabulary, recognition accuracy decreases as the vocabulary grows. SEAT has a function to switch the speech recognition vocabulary depending on the situation. By using this function, the developer can increase the vocabulary of the total system while keeping speech recognition accuracy high.
The phrase matcher compares the input for each state-transition condition. To cope with the diversity of human language, we utilized a subset of regular expressions. If we write “[A]”, phrase A is omissible. If we write “(
When a match is found, the result is passed to the automaton driver. The automaton driver updates the current state and executes the commands based on the definition of the model. When a state transition occurs, switching of the speech recognition dictionary occurs at the same time.
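The interplay between the phrase matcher and the automaton driver can be sketched as follows. Only the “[A]” (omissible phrase) notation is implemented, by translating it into an optional regular-expression group; the class and function names, the script, and the action strings are hypothetical. Matching is deliberately lenient about whitespace.

```python
import re
from typing import Dict, List, Optional, Tuple

# state -> list of (phrase pattern, next state, action)
Script = Dict[str, List[Tuple[str, str, str]]]


def compile_phrase(pattern: str) -> re.Pattern:
    """Translate "[A]" (A is omissible) into an optional regex group."""
    regex = r"\s*"
    for i, part in enumerate(re.split(r"\[([^\]]*)\]", pattern)):
        words = [re.escape(w) for w in part.split()]
        if words:
            chunk = r"\s+".join(words) + r"\s*"
            # Odd indices are the bracketed, omissible parts.
            regex += f"(?:{chunk})?" if i % 2 else chunk
    return re.compile(regex, re.IGNORECASE)


class Driver:
    def __init__(self, script: Script, initial: str) -> None:
        self.script = script
        self.state = initial
        self.executed: List[str] = []

    def vocabulary(self) -> List[str]:
        """Patterns active in the current state; changes with every transition."""
        return [pattern for pattern, _, _ in self.script.get(self.state, [])]

    def on_speech(self, text: str) -> Optional[str]:
        for pattern, next_state, action in self.script.get(self.state, []):
            if compile_phrase(pattern).fullmatch(text):
                self.state = next_state
                self.executed.append(action)  # a real engine would call an adaptor
                return action
        return None


SCRIPT: Script = {
    "menu": [("[please] come here", "moving", "motion:approach"),
             ("hello [there]", "menu", "speech_out:hello")],
    "moving": [("stop", "menu", "motion:stop")],
}
driver = Driver(SCRIPT, "menu")
```

Because `vocabulary()` is derived from the current state, the set of phrases offered to the recognizer shrinks to the ones that can actually trigger a transition, which is the effect that dictionary switching is meant to achieve.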
In this section, we present the applications we have developed using the development environment.
We have implemented the HumanAID task in the HRP-2 humanoid robot. The task is designed to assist people in everyday life. In the task, the robot greets the human, and the human gives commands to the robot, such as controlling the video or the TV, carrying drinks from the refrigerator to the table (Figure
TAIZO is a health exercise demonstration robot [
TAIZO (a) and RH-1 (b) robots.
RH-1 is a mobile robot that is designed to assist humans in the office environment (Figure
The development of the HumanAID task on HRP-2 took place from 2006 to 2007. Development of TAIZO and RH-1 has been ongoing from 2007 to the present. Table
Scripts used by each application and their development periods. An asterisk marks a script used by the application.
Script name | HRP-2 | TAIZO | RH-1 | # of trans. | Period
greet | * | * | * | 3 | 1, 3 (HRP-2)
robot-ctl | * | * | | 13 | 1 (HRP-2)
tv-ctl | * | | * | 14 | 1 (HRP-2)
vtr-ctl | * | | * | 10 | 1 (HRP-2)
hrp-menu | * | | | 3 | 2 (HRP-2)
greet-taizo | | * | | 4 | 3 (TAIZO)
exercize | | * | | 17 | 3 (TAIZO)
taizo-menu | | * | | 17 | 4 (TAIZO)
wander | | | * | 15 | 5 (RH-1)
ask-who | | | * | 3 | 6 (RH-1)
Here, we list the development history.
HumanAID task functions have been developed separately. The state-transition model for demo conversation (e.g., saying “hello”, “bye”, introducing itself), the model for controlling the robot (e.g., walking, picking up objects), and the model for controlling the TV and VTR using an infrared controller were split into different scripts and developed simultaneously in this period.
After the development of each part of the function, a script was developed that defined a menu. Within this script, a central state and states corresponding to each function were defined. For states corresponding to each function, nothing was contained in this script, but it was unified with other scripts to obtain the functions. Only transitions from the central state to each function state were defined in the script.
Each automaton was developed and tested individually, and at the final stage of development the scripts could be unified simply by checking the warning messages given by the script-management server.
Script development of TAIZO was conducted by extending the scripts developed for HRP-2. The scripts for greeting and basic demo tasks were selected for reuse, and the other scripts (TV-control, VTR-control) were not selected because TAIZO has no ability to control this equipment. To add more patterns to the greeting, the “greet-taizo” script was defined, which shares the same state as “greet” but adds more transitions. Functions specific to TAIZO were developed as the script “exercise”. Finally, a menu script was developed to integrate all of the functions.
Although the composition of the subsystems (e.g., speech recognizer, behavior generation) of TAIZO and HRP-2 was different, it was possible to share the scripts with no modification by simply switching the adaptor configurations.
The script development of RH-1 was conducted simultaneously with the development of TAIZO. In RH-1, some control functions were imported from HRP-2. The script “wander” was defined as wandering around the office. This script not only uses speech input, but also uses visual information to find people. A multichannel input mechanism was used to integrate visual and speech inputs.
As a result of these developments, the number of acceptable command types has reached 43, 54, and 45 for HRP-2, TAIZO, and RH-1, respectively.
Each application shares 93%, 30%, or 60% of its scripts with the other applications, respectively. As a total, 30% (
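The sharing ratios can be checked from the transition counts in the table. The script-to-robot assignment used below is the one consistent with the reported totals of 43, 54, and 45 command types. Reading the overall 30% as the fraction of transitions that did not have to be written again is an assumption, since the formula is cut off in the text.

```python
from typing import Dict, List

TRANSITIONS: Dict[str, int] = {
    "greet": 3, "robot-ctl": 13, "tv-ctl": 14, "vtr-ctl": 10, "hrp-menu": 3,
    "greet-taizo": 4, "exercize": 17, "taizo-menu": 17, "wander": 15, "ask-who": 3,
}
USED_BY: Dict[str, List[str]] = {
    "HRP-2": ["greet", "robot-ctl", "tv-ctl", "vtr-ctl", "hrp-menu"],
    "TAIZO": ["greet", "robot-ctl", "greet-taizo", "exercize", "taizo-menu"],
    "RH-1": ["greet", "tv-ctl", "vtr-ctl", "wander", "ask-who"],
}


def total(robot: str) -> int:
    return sum(TRANSITIONS[s] for s in USED_BY[robot])


def shared(robot: str) -> int:
    """Transitions in scripts that at least one other robot also uses."""
    others = {s for r, scripts in USED_BY.items() if r != robot for s in scripts}
    return sum(TRANSITIONS[s] for s in USED_BY[robot] if s in others)


def percent(part: int, whole: int) -> int:
    return round(100 * part / whole)


totals = {r: total(r) for r in USED_BY}
ratios = {r: percent(shared(r), total(r)) for r in USED_BY}

written_once = sum(TRANSITIONS.values())    # with reuse
written_separately = sum(totals.values())   # if nothing were shared
saving = percent(written_separately - written_once, written_separately)
```

Under these assumptions the per-robot ratios come out as 40/43, 16/54, and 27/45, and the overall saving as 43/142.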
In the above discussion, we compared our method to the extension-by-connection method. Both the extension-by-connection and extension-by-unification methods belong to the same state-transition model group. Among other modeling methods, the production system model is used in some applications, especially in artificial intelligence.
A production system model is a modeling method that maintains the state of the system as a multidimensional feature vector, and controls the execution of actions by comparing the state to the pattern written in the script.
By using the same symbols as in Section
Condition
Production system models are generally known to be extensible. Our model can easily accumulate rules, as follows:
Although these are good points theoretically, in practical development there have only been a few examples of successful large-scale development. It is generally said that in a production system, the developers are required to be proficient in script development in order to ensure the extensibility of the system. We discuss this problem from the viewpoint of handling the states.
Let developer “
In the extended system, the rules
Antinomy between an existing rule and an accumulated rule. State 1 may be split into state 1-1 and state 1-2 when a new feature vector (frame B) is considered. There are no clear criteria to use to decide whether to select rule 1 or rule 1-1.
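The antinomy can be illustrated with a toy production system in which a rule fires when its pattern is contained in the feature vector. The rule names follow the caption; the feature values and actions are hypothetical, and the matching scheme is a simplification chosen for the sketch.

```python
from typing import Dict, List, Tuple

Features = Dict[str, str]
Rule = Tuple[str, Features, str]  # (name, pattern, action)


def matching(rules: List[Rule], state: Features) -> List[str]:
    """Names of rules whose pattern is contained in the feature vector."""
    return [name for name, pattern, _ in rules
            if all(state.get(k) == v for k, v in pattern.items())]


# Written first, when only frame A existed.
rule_1: Rule = ("rule 1", {"A": "on"}, "action X")
# Added later by another developer, after frame B was introduced.
rule_1_1: Rule = ("rule 1-1", {"A": "on", "B": "on"}, "action Y")

before = matching([rule_1], {"A": "on"})
after = matching([rule_1, rule_1_1], {"A": "on", "B": "on"})
```

After frame B is added, both rules match the same feature vector, and nothing in the rules themselves says which one should be selected.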
To avoid this problem, the script developer needs to project the final system before beginning to develop the rule
In contrast, the state-transition model clearly defines the model in the form of state and transition conditions in the design phase.
In the state-transition model, we need to define a new state every time we add a new feature vector to the internal state so that the system can handle it. Here, we explain this using a concrete example. In rule
In the state-transition model, the developer assesses the internal parameters of the system in the design step, but for the description, the developer needs to break down the combination of parameters into a set of states. When the developer wants to increase the internal parameters, he/she has to use a different state. Due to this restriction, the definition of a state is always clear, and is not overwritten by a script that is added later. Thus, the developer can proceed without being trapped by the issues discussed earlier in this section.
In this paper, we have proposed a development environment to enhance the reuse of dialogue components. However, the total development cost of a robot has to be calculated from both the speech communication part and the motion generation part.
Because the cost of reusing motion depends on the algorithm, we first explain the algorithms we used. There are two methods for robot motion generation: the planning-based algorithm and the motion database. In our examples, we used the planning-based algorithm for HumanAID and RH-1, and the motion database for the TAIZO robot.
In the planning-based algorithm (HumanAID and RH-1), the motion generation algorithm generates the motion automatically. We do not need to adjust the motion by hand; we only have to change parameters such as the structure of the arm and the location of the target object. Because we share the motion generation algorithms between the robots, the cost of reusing the motion is very low in this case.
When we use the motion database (TAIZO), reusing the motion is a problem. Because the motion database is created by hand, we have to create new motions by hand for a new robot with a different structure. To address this problem, we are planning to implement a motion retargeting technique. This technique was originally developed for creating motion for computer-animated creatures in movies [
The state-transition model can also model the input at the word level. An example is shown in Figure
An example of word-level input model.
Syntactic model for the “bring” command
Syntactic model for the “pick” command
Vocabulary for objects and locations (part of the command is abbreviated)
Vocabulary for additional objects
Unified automaton (commands are abbreviated in this figure)
In the example, first, the state-transition models in Figures
In our applications, we used input at the command (sentence) level. Modeling input at the word level may also be useful, especially when the developer wants to share the syntactic structure of commands among several scripts.
In VoiceXML, the industry standard for scripting general voice operation systems, there is an extension called RDC (Reusable Dialogue Component) [
Our proposed method is equivalent to the template method. As we have seen in Figure
The difference between our method and the template method is the possibility of composing a single model, as in Figure
An example of unifying three automata. The part in the dotted rectangle is the extension of the example shown in Figure
As shown in the examples in the previous section, with the proposed extension-by-unification method the developer can develop functions simultaneously and, in the final step, easily unify those functions to create an integrated system. In addition, scripts created in the past can easily be reused in a new application. In the conventional extension-by-connection method, the developer had to develop each function in turn, because the method did not support the “merging” of scripts that had been developed simultaneously. Moreover, the developer had to erase unneeded functions manually to reuse a script in another robot. The proposed method does not require this step, because each script retains information about its function even after integration and can easily be separated for reuse. In our example, the scripts created for HRP-2 could be reused for TAIZO, except for the TV and VTR control functions, which had to be removed because TAIZO does not have this capability. In the conventional extension-by-connection method, we would have had to do this manually. In the extension-by-unification method, however, this is done simply by selecting the scripts to be unified.
As a result, we have confirmed that the extension-by-unification method significantly improves the efficiency of developing conversational functions for robots.
In terms of unification problems, it was possible to prevent the occurrence of isolated states by displaying a warning message, but a problem arose when the developer isolated a state intentionally. This happens when the developer has written a script in a redundant manner, or when he/she has tried to use a pushdown automaton. We are currently trying to solve this problem using two methods. One is a more intelligent isolation detection algorithm that can reduce misdetection (e.g., [
By applying a probabilistic weight to each transition of the state-transition model, the model becomes equivalent to a Markov model. There are some robots that have realized interactions with humans using such a model (e.g., [
In this paper, we have proposed an extension-by-unification method to improve reusability and flexibility in the incremental development of state-transition models. The dialogue engine SEAT has been developed to realize the incremental development of state-transition models to give robots a dialogue ability that can cope with various kinds of speech inputs in various tasks. SEAT has a flexible adaptor mechanism that can connect to many types of robotic interfaces, and the developer can accumulate scripts by using the script server, which has a function to propose existing reusable scripts to the developer. We have confirmed that the application of this system to the development of three robots has significantly improved the efficiency of their development.