This paper presents a visual/motor behavior learning approach based on neural networks. We propose the Behavior Chain Model (BCM) as a means of behavior learning. Our behavior-based system evolution task is a mobile robot detecting a target and driving/acting towards it. First, the mapping relations between the image feature domain of the object and the robot action domain are derived. Second, a multilayer neural network is used for offline learning of the mapping relations. Through the training process, this learning structure connects visual perceptions with the motor sequence of actions required to grip a target. Last, using behavior learning through the observed action chain, we can predict mobile robot behavior for a variety of similar tasks in similar environments. The prediction results suggest that the methodology is adequate and could serve as a basis for designing various kinds of mobile robot behavior assistance.
Robotics research covers a wide range of application scenarios, from industrial or service robots to robotic assistance for disabled or elderly people. Robots in industry, mining, agriculture, space exploration, and health sciences are just a few examples of challenging applications where human attributes such as cognition, perception, and intelligence can play an important role. Inducing perception and cognition, and thus intelligence, into robotic machines is the main aim in constructing a robot able to “think” and operate in uncertain and unstructured conditions.
To successfully realize an instructed capability (e.g., object manipulation, haptically guided teleoperation, robotic surgery manipulation, etc.), a robot must extract the relevant input/output control signals from the manipulation task in order to learn the control sequences necessary for task execution [
Predictive strategies in robotics may be implemented in the following ways [
(1) Model-based reinforcement learning: the environment model is learned in addition to reinforcement values.
(2) Schema mechanism: a model is represented by rules and learned bottom-up by generating more specialized rules where necessary.
(3) The expectancy model: reinforcement is only propagated once a desired state is generated by a behavioral module, and the propagation is accomplished using dynamic programming techniques applied to the learned predictive model and sign list.
(4) Anticipatory learning classifier systems: similar to the schema mechanism and the expectancy model, they contain an explicit prediction component. The predictive model consists of a set of rules (classifiers) endowed with an “effect” part, which predicts the next situation the agent will encounter if the action specified by the rules is executed. These systems are capable of generalizing over the sensory input.
(5) Artificial neural networks (ANN): the agent controller sends outputs to the actuators based on sensory inputs. Learning to control the agent consists of learning to associate the right set of outputs with any set of inputs that the agent may experience. The most common way to perform such learning is the back-propagation algorithm (a toy sketch follows this list).
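As a concrete illustration of the last option, the toy sketch below trains a one-hidden-layer network by back-propagation to map sensor inputs to an actuator output; all data, shapes, and constants are illustrative values of our own, not taken from any of the cited systems.

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (32, 2))               # toy sensor readings
Y = np.tanh(X @ np.array([[1.0], [-1.0]]))    # toy target actuator command

W1 = rng.normal(0, 0.5, (2, 8))               # input -> hidden weights
W2 = rng.normal(0, 0.5, (8, 1))               # hidden -> output weights
lr = 0.1
for _ in range(2000):
    H = np.tanh(X @ W1)                       # forward pass
    out = H @ W2
    err = out - Y                             # prediction error
    dW2 = H.T @ err / len(X)                  # backward pass: squared-error gradients
    dH = (err @ W2.T) * (1 - H ** 2)          # tanh derivative
    dW1 = X.T @ dH / len(X)
    W1 -= lr * dW1                            # gradient-descent update
    W2 -= lr * dW2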
Learning a trajectory in the context of programming by demonstration through reinforcement learning is presented in [
Different forms of visual-based learning are presented in [
The paper [
In [
A perception-action scheme for visually guided manipulation, which includes mechanisms for visual prediction and for detecting unexpected events by comparing the anticipated feedback with the incoming feedback, is proposed in [
Artificial neural networks (ANN), as universal approximators, are capable of modeling complex mappings between the inputs and outputs of a system up to arbitrary precision. The ALVINN example illustrates the power of standard feed-forward networks, as well as their limitations. The control network solved a difficult pattern recognition task that, if programmed by a human designer, would have required complex image preprocessing, line-extraction algorithms, and so forth. However, due to its use of a feed-forward network, ALVINN is a reactive system. This means that it has no notion of the temporal aspects of its task and will always react to its visual input in the same fashion, regardless of the current context [
The situation, however, changes fundamentally as soon as artificial neural networks are used as robot controllers; that is, the network can, by means of the robot body (sensors and effectors), interact with physical objects in its environment, independently of an observer’s interpretation or mediation. In [
Various forms of neurologically inspired RNNs have been reported in the literature in recent years. For example, a continuous-time recurrent neural network (CTRNN) was implemented in a humanoid robot for object-manipulation tasks [
The most important factor in robot assistance through behavior sequence learning is the design of the interface between the neural network and the sensors/actuators. Although an ANN could theoretically adapt to different representations of sensor/actuator interfaces, it was necessary to find an interface with low cognitive complexity for the ANN [
This paper presents a behavior description that emphasizes counting the repetitions in a sequence of actions, referred to as Behavior Chain Learning. In our research, using the characteristics of neural networks, the system learns the set of actions necessary to move a mobile robot towards an object in the observed space. On the basis of the trained and tested network, the predicted set of robotic actions for new object-recognition scenarios is constructed. The learned motions can be applied in similar circumstances, and our approach is easily scalable to other applications.
Our approach focuses on a behavioral system that learns to correlate visual information with motor commands in order to guide a robot towards a target. We chose this task setting because the approach can be useful for any form of visual/motor coordination, so the task specification can be reformulated into a variety of behavioral responses.
The figure below shows the experimental mobile robot, equipped with a CMUcam1 camera and a gripper, holding the target ball.
Figure: Experimental mobile robot with gripped target.
The CMUcam1 can detect stationary and moving objects; it consists of an SX28 microcontroller interfaced with an OV6620 Omnivision CMOS camera on a chip. The mobile robot has a 12 cm long gripper, which it uses to grip the ball. The gripper length dictates that the robot must stop at a distance of 9–13 cm in front of the target.
The robot is in the center of the environment, and the ball can be at any position in front of the robot, at an angle within the 0–180° range. The interaction between visual perception and motor behavior (a sequence of actions) is obtained through real-time 2D visual tracking routines.
The figure below shows the initial target positions used in the experiments.
Figure: Initial target positions for the experimental setup.
The robot is able to find the ball by turning in place until the ball enters the camera’s field of view.
The behavior-based mobile robot scheme consists of the following stages: (1) vision processing, which involves detecting features such as color or spatial and temporal intensity gradients; (2) obtaining the fundamental relationship between visual information and robot motions by correlating visual patterns with motion commands; (3) mobile robot behavior learning to grip a target; (4) prediction of motor actions for new visual perceptions (see the figure below).
Figure: Behavior-based mobile robot scheme.
In the first phase, visual feature detection is performed on a specific data set from the camera’s streaming video sent to the mobile robot. The center of mass of the tracked color blob (RCVData(2)), the number of pixels within the tracking window (RCVData(8)), and the confidence of the color data (RCVData(9)) are the specific parameters extracted from the image.
When the object is positioned in the middle of the robot’s camera window, the variable RCVData(2) has the value 45. The possible action selections are represented in the pseudocode below (the pixel-count and confidence thresholds are determined experimentally):
Step 1: Start; Move = 0.
Step 2: Send the command “Track Color” in order to get back the array of data:
  if RCVData(2) < 45 then turn left, Move = 1;
  else if RCVData(2) > 45 then turn right, Move = 1;
  else if RCVData(8) < pixel-count threshold then move forward, Move = 1;
  else if RCVData(9) < confidence threshold then search for the target, Move = 1;
  else Move = 0.
Step 3: If Move = 0 then stop and grip the target; else go to Step 2.
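This sense-act loop can be expressed compactly in a high-level language. The sketch below is illustrative only: the robot wrappers (turn_left, turn_right, move_forward, grip) are hypothetical stand-ins for the firmware commands, and the NEAR_PIXELS and MIN_CONF thresholds are assumed values; only the center value 45 comes from the description above.

CENTER_X = 45         # RCVData(2) value when the ball is centered (from the text)
NEAR_PIXELS = 120     # assumed pixel-count threshold for "close enough"
MIN_CONF = 30         # assumed color-confidence threshold

def track_step(rcv, robot):
    """One iteration of the sense-act loop; returns False when the task is done.
    rcv maps CMUcam1 packet indices to values, e.g. rcv[2] is the blob center."""
    if rcv[9] < MIN_CONF:        # target not reliably seen: keep searching
        robot.turn_left()
    elif rcv[2] < CENTER_X:      # ball left of center
        robot.turn_left()
    elif rcv[2] > CENTER_X:      # ball right of center
        robot.turn_right()
    elif rcv[8] < NEAR_PIXELS:   # centered but still far away
        robot.move_forward()
    else:                        # centered and close: grip and stop
        robot.grip()
        return False
    return True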
The behaviors can be implemented as a Finite State Acceptor (FSA) [
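To make the control structure explicit, the same behavior can be cast as a small state machine. The sketch below is our illustration; the state and event names are our own labels, not taken from the cited FSA formalism.

# Transition table: (state, event) -> next state.
TRANSITIONS = {
    ("SEARCH",   "ball_seen"):       "ALIGN",
    ("ALIGN",    "ball_lost"):       "SEARCH",
    ("ALIGN",    "ball_centered"):   "APPROACH",
    ("APPROACH", "ball_off_center"): "ALIGN",
    ("APPROACH", "in_grip_range"):   "GRIP",
    ("GRIP",     "gripped"):         "DONE",
}

def next_state(state: str, event: str) -> str:
    # Unlisted (state, event) pairs leave the state unchanged.
    return TRANSITIONS.get((state, event), state)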
We propose the Behavior Chain Model (BCM) in order to generalize this form to cope with a variety of similar tasks in similar environments. Each change in action type represents a behavior change. For example, each time a human starts to do something new, he starts to count (we count steps in one direction before changing direction; when cooking, we count spoons before mixing; etc.). This is the inspiration for introducing a formal definition of this behavior model.
BCM consists of (1) creating a behavior chain from a sequence of actions and (2) extracting physical variables using a behavior transform function. We introduce the following definitions.
The behavior of the system is described by a chain of actions in which consecutive repetitions of the same action are grouped and counted.
We introduce a formal definition of the behavior transform function, which gives us the variables of the mathematical description of the real problem.
The system behavior transform function maps a behavior chain onto the physical variables of the task; in our case, these are the distance traveled by the robot and the angle it reaches with respect to its starting position.
For our behavior model, we introduce coefficients that count the action repetitions between changes: k1 and k2 count successive turns (positive for left, negative for right), while n1 and n2 count successive forward steps.
In the case of more action changes (longer target distances or environments with obstacles), more coefficients (k3, n3, and so on) can be introduced.
One example of creating a behavior chain is presented in the figure below.
Figure: Creating a behavior chain.
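Step (1) of BCM can be sketched in a few lines of Python: consecutive repetitions of the same action are grouped into a chain, and the signed turn counts and forward-step counts are read off the chain. The action symbols 'L', 'R', and 'F' (left turn, right turn, forward step) and the function names are our own illustrative choices.

from itertools import groupby

def behavior_chain(actions):
    """Group consecutive repetitions: ['L','L','F','F','F'] -> [('L', 2), ('F', 3)]."""
    return [(a, len(list(g))) for a, g in groupby(actions)]

def coefficients(chain):
    """Signed turn counts (left positive, right negative) and forward-step counts,
    in chain order: the k and n coefficients of the behavior model."""
    ks, ns = [], []
    for action, count in chain:
        if action in ("L", "R"):
            ks.append(count if action == "L" else -count)
        else:
            ns.append(count)
    return ks, ns

# Example: target (55 cm, 170 deg) -- 7 left turns, 1 step, 1 left turn, 6 steps.
chain = behavior_chain(["L"] * 7 + ["F"] + ["L"] + ["F"] * 6)
print(coefficients(chain))   # ([7, 1], [1, 6]) -> k1 = 7, k2 = 1, n1 = 1, n2 = 6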
The table below presents examples of correlating target positions with behavior sequences of actions.
Table: Examples of correlating behavior sequences of actions.
Position (cm, °) | Turning | Sequence of actions
---|---|---
(55, 170) | 6 |
(35, 160) | 5 |
(15, 150) | 3 |
(25, 140) | 3 |
(45, 130) | 2 |
(45, 120) | 1 |
(65, 110) | 1 |
(25, 100) | 0 |
(65, 90) | 0 |
(55, 80) | 0 |
(35, 70) | 0 |
(45, 60) | 1 |
(65, 50) | 3 |
(25, 40) | 2 |
(15, 30) | 3 |
(45, 20) | 6 |
(65, 10) | 6 |
A turn to the left has a positive value, while a turn to the right has a negative value in the action matrix. In the example of ball position (55 cm, 170°), the sequence of actions is represented by the following behavior chain: seven left turns, one forward step, one left turn, and six forward steps (k1 = 7, n1 = 1, k2 = 1, n2 = 6).
In the example of ball position (65 cm, 50°), the sequence of actions is represented by the following behavior chain: four right turns, six forward steps, one right turn, and two forward steps (k1 = −4, n1 = 6, k2 = −1, n2 = 2).
The examples of the mobile robot’s behaviors for some ball positions ((65 cm, 50°), (35 cm, 70°), and (45 cm, 120°)) are presented in the figure below.
Figure: Experimental robot scenario.
The mobile robot and target positions are represented in a polar system by a pair (d, φ): the distance d from the robot’s starting point and the polar angle φ.
In order to calculate the positions reached by the robot, we use simple planar geometry, illustrated in the figure below.
Figure: Mobile robot positioning.
In our approach, one turn is 10 degrees and one forward step is 6 cm (the step length follows from the tables, e.g., seven steps cover 42 cm).
The reached position is obtained by dead reckoning. The robot starts facing 90°, and each turn changes its heading by 10°; every forward step then displaces it 6 cm along the current heading. Summing these displacements gives the end point, whose polar coordinates are the distance from the starting point and the reached angle. For the chain of position (55 cm, 170°), for example, this yields approximately 42 cm and 169°, the values reported in the table below.
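Under these assumptions, the behavior transform function can be reconstructed as a short dead-reckoning routine. The Python below is our sketch, not the paper’s exact derivation; the 10° turn comes from the text, while the 6 cm step length is inferred from the tables.

import math

TURN_DEG = 10.0   # one turn is 10 degrees (from the text)
STEP_CM = 6.0     # one forward step, inferred from the tables (7 steps = 42 cm)

def transform(chain, start_heading_deg=90.0):
    """Dead-reckon the pose reached by executing a behavior chain.
    Returns (distance from the start in cm, polar angle of the end point in deg)."""
    heading = start_heading_deg
    x = y = 0.0
    for action, count in chain:
        if action == "L":
            heading += TURN_DEG * count
        elif action == "R":
            heading -= TURN_DEG * count
        else:                      # forward steps along the current heading
            x += STEP_CM * count * math.cos(math.radians(heading))
            y += STEP_CM * count * math.sin(math.radians(heading))
    dist = math.hypot(x, y)
    angle = math.degrees(math.atan2(y, x)) if dist > 0 else heading
    return dist, angle

# Target (55 cm, 170 deg): chain 7L, 1F, 1L, 6F gives roughly (42 cm, 169 deg),
# matching the corresponding row of the table below.
print(transform([("L", 7), ("F", 1), ("L", 1), ("F", 6)]))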
The table below describes examples of behavior sequences of actions through the extracted coefficients and the resulting distances and angles.
Table: Description examples of behavior sequences of actions.
Position (cm, °) | k1 | n1 | k2 | n2 | Distance (cm) | Angle (°)
---|---|---|---|---|---|---
(55, 170) | 7 | 1 | 1 | 6 | 42 | 169 |
(35, 160) | 7 | 3 | 0 | 0 | 18 | 160 |
(15, 150) | 5 | 0 | 0 | 0 | 0 | 140 |
(25, 140) | 4 | 1 | 1 | 1 | 12 | 135 |
(45, 130) | 4 | 5 | 0 | 0 | 30 | 130 |
(45, 120) | 3 | 5 | 1 | 0 | 30 | 120 |
(65, 110) | 2 | 7 | 1 | 2 | 54 | 112 |
(25, 100) | 1 | 2 | 0 | 0 | 12 | 100 |
(65, 90) | 0 | 9 | 0 | 0 | 54 | 90 |
(55, 80) | −1 | 6 | 0 | 0 | 36 | 80 |
(35, 70) | −2 | 3 | 0 | 0 | 18 | 70 |
(45, 60) | −3 | 4 | 0 | 0 | 24 | 60 |
(65, 50) | −4 | 6 | −1 | 2 | 47 | 45 |
(25, 40) | −5 | 1 | 0 | 0 | 6 | 40 |
(15, 30) | −6 | 1 | 0 | 0 | 6 | 30 |
(45, 20) | −8 | 4 | −1 | 1 | 30 | 8 |
(65, 10) | −8 | 5 | −1 | 3 | 48 | 6 |
The third phase in our approach is the learning of a sequence of actions, which establishes an appropriate correspondence between the perceived states and the actions. The mobile robot positions calculated from the coefficients extracted from the experimental patterns will be compared with the positions obtained from the neural network predictions.
Based on artificial cognition, a robot system can simulate goal-directed human behavior and significantly increase the conformity with human expectations [
The set of input data consists of the target positions (d, φ) from the experimental patterns, while the corresponding outputs are the four behavior coefficients (k1, n1, k2, n2).
Table: Errors for different neural network training configurations.
Hidden layers | Neurons | Iterations | Learning rate | Error (MSE) | Error test (MSE) | Error | Error test | RMS
---|---|---|---|---|---|---|---|---
1 | 10 | 500 | 0.001 | 7.74 | 7.77 | 3.29 | 4.55 | 4.00 |
1 | 10 | 1000 | 0.001 | 5.18 | 8.48 | 2.45 | 4.44 | 3.76 |
1 | 20 | 500 | 0.01 | 5.57 | 10.5 | 7.13 | 3.11 | 4.07 |
1 | 30 | 500 | 0.001 | 38.13 | 13.6 | 29.7 | 8.88 | 7.30 |
2 | 30 | 500 | 0.001 | 5.99 | 8.65 | 4.20 | 14.43 | 3.89 |
2 | 20 | 500 | 0.01 | 5.01 | 7.77 | 2.43 | 4.63 | 3.63 |
2 | 20 | 1000 | 0.01 | 7.57 | 8.07 | 3.57 | 4.43 | 4.02 |
2 | 30 | 500 | 0.1 | 8.52 | 8.93 | 4.79 | 4.30 | 4.25 |
2 | 20 | 1500 | 0.1 | 8.86 | 7.36 | 3.04 | 4.43 | 3.80
2 | 30 | 1500 | 0.01 | 17.62 | 10.6 | 3.06 | 4.30 | 5.40 |
During neural network training, we varied the number of hidden layers (1 or 2), the number of neurons per hidden layer (10, 20, or 30), the number of iterations (500, 1000, or 1500), and the learning rate (0.001, 0.01, or 0.1). The minimal RMS error (3.63) was obtained with two hidden layers of 20 neurons, 500 iterations, and a learning rate of 0.01.
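As an illustration of this offline learning step, the sketch below fits a small multilayer network with scikit-learn, used here as a stand-in for whatever tool produced the results above. The training rows are taken from the coefficient table, and the (20, 20) hidden-layer configuration mirrors the best row of the error table.

import numpy as np
from sklearn.neural_network import MLPRegressor

# Inputs: target positions (d, phi); outputs: coefficients (k1, n1, k2, n2).
X = np.array([[55, 170], [25, 140], [65, 90], [45, 60], [65, 10]], dtype=float)
Y = np.array([[7, 1, 1, 6], [4, 1, 1, 1], [0, 9, 0, 0],
              [-3, 4, 0, 0], [-8, 5, -1, 3]], dtype=float)

# Best configuration from the error table: 2 hidden layers of 20 neurons,
# 500 iterations, learning rate 0.01.
net = MLPRegressor(hidden_layer_sizes=(20, 20), max_iter=500,
                   learning_rate_init=0.01, random_state=0)
net.fit(X, Y)
print(net.predict([[45, 100]]))   # predicted coefficients for a new target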
The data collected during the experiments comprise a large amount of information. Several analyses could be carried out on these data, especially regarding the appropriateness and usefulness of the different features. However, we are more interested in the predictive capabilities that can be inferred from the data and in the methods that can make the best use of them.
In the fourth phase of our approach, we present the prediction results for selected input data, using the neural network configuration with the minimal RMS error. For each target position in the prediction set, we calculated the coefficients predicted by the network and, from them, the reached distance and angle.
Table: Selected set of mobile robot positions from the experiments and the values predicted by the neural network.
Target distance d (cm) | 15 | 25 | 35 | 45 | 55 | 65
Target angle φ (°) | 160 | 140 | 120 | 100 | 80 | 60

Real values of coefficients:
k1 | 6 | 4 | 3 | 1 | −1 | −3
n1 | 0 | 1 | 3 | 5 | 6 | 8
k2 | 0 | 1 | 0 | 0 | 0 | 0
n2 | 0 | 1 | 0 | 0 | 0 | 0

Real values of (distance, angle):
Distance (cm) | 0 | 12 | 18 | 30 | 36 | 48
Angle (°) | 150 | 135 | 120 | 100 | 80 | 60

Coefficients predicted by the neural network:
k1 | 6.66 | 5.09 | 3.05 | 1.00 | −0.68 | −2.70
n1 | −0.06 | 2.10 | 3.34 | 4.68 | 4.92 | 5.71
k2 | −0.14 | 0.09 | 0.02 | 0.04 | −0.22 | −0.32
n2 | 0.00 | −0.34 | −0.07 | 0.48 | 1.55 | 2.14

Values of (distance, angle) calculated from the predicted coefficients:
Distance (cm) | 0.37 | 10.57 | 19.65 | 30.99 | 38.83 | 47.09
Angle (°) | 149.95 | 129.83 | 119.99 | 100.04 | 79.46 | 59.12
Using the table above, we can compare the real values with those obtained from the trained network: the distances and angles computed from the predicted coefficients stay close to the experimentally measured ones across the whole prediction set.
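Assuming the reported RMS is the root-mean-square difference between the real and predicted values, it can be recomputed from the rows of the table above, as in the following minimal sketch.

import numpy as np

real_d = np.array([0, 12, 18, 30, 36, 48], dtype=float)
pred_d = np.array([0.37, 10.57, 19.65, 30.99, 38.83, 47.09])
real_a = np.array([150, 135, 120, 100, 80, 60], dtype=float)
pred_a = np.array([149.95, 129.83, 119.99, 100.04, 79.46, 59.12])

rms_d = np.sqrt(np.mean((real_d - pred_d) ** 2))   # distance error (cm)
rms_a = np.sqrt(np.mean((real_a - pred_a) ** 2))   # angle error (deg)
print(f"RMS distance error: {rms_d:.2f} cm, RMS angle error: {rms_a:.2f} deg")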
The figures below show the experimental mobile robot tracking targets at several positions from the prediction set.
Figure: Experimental mobile robot with tracked target at positions (65, 60), (45, 100), and (25, 140).
Figure: Experimental mobile robot with tracked target at positions (55, 80) and (35, 120).
For example, when the target is located at position (45, 100) (first figure above), the predicted coefficients (k1 ≈ 1.00, n1 ≈ 4.68, k2 ≈ 0.04, n2 ≈ 0.48) correspond to one left turn followed by five forward steps, which brings the robot within gripping distance of the ball.
Similarly, when the target is located at position (55, 80) (second figure above), the predicted coefficients (k1 ≈ −0.68, n1 ≈ 4.92, k2 ≈ −0.22, n2 ≈ 1.55) correspond to one right turn and six forward steps.
We proposed a methodology that tries to emulate the human action of vision in a general, conceptual way, including primary recognition of an object in the environment, visual-based mobile robot behavior learning, and prediction in new situations. This form of robot learning does not need knowledge about the environment or about the kinematics/dynamics of the robot itself, because such knowledge is implicitly embodied in the structure of the learning process.
This approach is very flexible and can be applied to a wide variety of problems, because the behavior description is “elastic” enough to adapt to various situations. In order to apply our approach to any kind of task, we have to solve two important problems: one is how to construct the behavior description of actions, and the other is how to generalize the learned form to cope with a variety of similar tasks in similar environments.
Although the neural network could theoretically adapt to different representations of sensor/actuator interfaces, it was necessary to find an interface with low cognitive complexity for the neural network, which, in our case, was a simple polar representation of the sensed target positions and of the intended robot movements through the turn and step coefficients.
We have implemented a prediction approach that uses such features to produce reliable output. The feature space data were obtained from real experiments with a mobile robot equipped with a camera and a gripper. The obtained prediction results are satisfactory enough to suggest that the methodology is adequate and that further progress should be made in this direction. In future work, more involved strategies may be developed by expanding to a set of new manipulation tasks, independent learning, adaptation in space, or multiagent behavior learning.