
We briefly survey existing obstacle avoidance algorithms and then propose a new obstacle avoidance learning framework based on learning from demonstration (LfD). The main idea is to imitate the obstacle avoidance mechanism of human beings, who learn to make decisions based on sensor information obtained by interacting with the environment. First, we endow robots with obstacle avoidance experience by teaching them to avoid obstacles in different situations, collecting a large amount of data as a training set in the process. A Gaussian mixture model (GMM) is then used to encode the training data, which is equivalent to extracting the constraints of the task. Second, a smooth obstacle-free path is generated by Gaussian mixture regression (GMR). Third, a metric of imitation performance is constructed to derive a proper control policy. The proposed framework shows excellent generalization performance: the robot can fulfill obstacle avoidance tasks efficiently in a dynamic environment. More importantly, the framework allows a wide variety of skills to be learned, such as grasping and manipulation, which makes it possible to build a robot with versatile functions. Finally, simulation experiments are conducted on a Turtlebot robot to verify the validity of our algorithms.

With the development of robot technology, and especially of computer science and cognitive science, promising technologies such as deep learning and computer vision have emerged. These technologies greatly improve the intelligence level of robots and lay the foundation for modern intelligent robots. Unlike traditional, mature industrial robots, intelligent robots such as service robots are expected to coexist with people. To achieve this, robots must have strong environmental adaptability and autonomous decision-making ability. Obstacle avoidance and planning are basic robot skills and a research focus in the field of robotics. Many mature obstacle avoidance algorithms exist, but how to develop algorithms with strong robustness, especially ones that allow robots to fulfill tasks in unstructured environments, is still worthy of further research.

Obstacle avoidance algorithms can be classified into traditional algorithms and intelligent algorithms according to their development history. In [

The core idea of heuristic algorithms is to simulate intelligent behaviors of animals or people; the algorithms are designed around bionic mechanisms. Representative examples include the Ant Colony Optimization (ACO) algorithm, the Particle Swarm Optimization (PSO) algorithm, and the Genetic Algorithm (GA). In [

In this paper, we propose an obstacle avoidance method based on teaching and learning. First, learning samples are obtained by showing the robot how to avoid obstacles; a probabilistic mixture model then extracts and encodes the task constraints autonomously. Finally, an obstacle-free path is generated by GMR. Our approach differs from previous work in the following ways: (i) it is more in line with human cognition, drawing on imitation learning; (ii) the algorithm has good generalization ability and can adapt to dynamic unstructured environments; moreover, its computational complexity is low, and in particular the path generation process can be realized online efficiently; (iii) the learning framework can be used to learn not only obstacle avoidance but also other skills, which is a key point for achieving generalized artificial intelligence (GAI).

The main contributions of the paper are as follows: (i) LfD is applied to the obstacle avoidance task for the first time, making up for the disadvantages of other methods; (ii) due to the nature of LfD, nontechnical people can program the robot to realize the obstacle avoidance function; (iii) obstacle avoidance learning using mixture models adapts to environmental changes and dynamic obstacles while keeping computational complexity low.

The remainder of the paper is organized as follows. In Section

LfD is a paradigm that allows robots to acquire new skills automatically. With traditional robots, fulfilling a task requires a human to decompose the task beforehand and program each action. LfD instead holds that a robot's skills can be acquired by observing human operation. This approach has the advantage of simple programming: even people who do not understand programming techniques can program robots through teaching. Moreover, due to the nature of LfD, robots can adapt to new environments more easily. LfD is also called robot Programming by Demonstration (PbD), Imitation Learning, Apprenticeship Learning, Learning by Demonstration (LbD), Learning by Showing, Learning by Watching, Learning from Observation, and so forth. It is worth noting that some authors consider imitation learning and LfD to have different meanings; in [

The solving process for LfD can be divided into two steps: first, acquire training data; second, use the obtained data to derive a control policy. We use

Framework of learning from demonstration (LfD).

Nehaniv and Dautenhahn [

In order to solve the above problems (mainly the former two problems), there are three main ideas:

Directly get a map function

Learn a system model, namely, the reward function and the state transition probability distribution, from the training set, and then derive a policy using MDP methods.

Learn the pre- and postconditions of each action, and then obtain the control sequence using planning methods.

For the obstacle avoidance problem, learning barrier-free paths from a teacher in order to produce a generalized path can be seen as a supervised regression problem. GMM is a probabilistic model well suited to modeling the natural variations in human and robot motion. Theoretically, a GMM can model any probability density function (PDF) by increasing the number of Gaussian components, so it has strong encoding capacity. Finally, movement recognition, prediction, and generation can be integrated within the same GMM/GMR encoding strategy. It is therefore favorable to use GMM to encode our prior demonstration information and GMR as the solution to the regression problem mentioned above. Details of the GMM theory are described below.

For a probability density estimation problem, suppose that, in addition to the known observed variables, there exist some unobservable latent variables, and that we know the distribution of the latent variables and the conditional distribution of the observed variables given the latent variables. Then we can use mixture models to estimate the parameters of the distribution and so obtain an approximate probability distribution of the known data. Before introducing GMM, we briefly review the Gaussian distribution.

Assume that we have a random variable

And

Because the marginal distribution of a Gaussian is also a Gaussian distribution, it is easy to get that
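The marginal and conditional identities above can be checked numerically. The sketch below (an illustration, not the paper's code) conditions a partitioned joint Gaussian on some of its variables, which is exactly the operation that GMR relies on later:

```python
import numpy as np

def conditional_gaussian(mu, sigma, idx_in, idx_out, x_in):
    """Condition a joint Gaussian N(mu, sigma) on x[idx_in] = x_in.

    Returns the mean and covariance of x[idx_out] given x[idx_in] = x_in:
        mu_c    = mu_o + S_oi @ inv(S_ii) @ (x_in - mu_i)
        sigma_c = S_oo - S_oi @ inv(S_ii) @ S_io
    """
    mu_i, mu_o = mu[idx_in], mu[idx_out]
    S_ii = sigma[np.ix_(idx_in, idx_in)]
    S_oo = sigma[np.ix_(idx_out, idx_out)]
    S_oi = sigma[np.ix_(idx_out, idx_in)]
    gain = S_oi @ np.linalg.inv(S_ii)
    mu_c = mu_o + gain @ (x_in - mu_i)
    sigma_c = S_oo - gain @ S_oi.T
    return mu_c, sigma_c
```

For a 2D Gaussian with unit variances and correlation 0.5, conditioning the first variable on the second equal to 1 gives mean 0.5 and variance 0.75, matching the textbook formulas.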

Assuming that we have a training set

Maximize (

Expectation Maximization (EM) is an iterative algorithm consisting mainly of two steps, the E-step and the M-step. The main idea is to find a tight lower bound of the objective function and then maximize that lower bound, so that the objective function is maximized indirectly. The lower bound is introduced because finding its extremum is much easier than finding that of the original function. For GMM, the E-step tries to guess the value of

E-step: for each

M-step: modify our model parameters:

Note that the value of
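The E-step and M-step above can be sketched in a few lines. The following is a minimal illustrative implementation; the deterministic initialization scheme and the small regularization term are our assumptions, not details from the paper:

```python
import numpy as np

def em_gmm(X, K, n_iter=100):
    """Fit a K-component Gaussian mixture to data X (shape N x D) with EM."""
    N, D = X.shape
    # Illustrative init: spread initial means across the data set and
    # start every covariance at the global covariance of the data.
    mu = X[np.linspace(0, N - 1, K).astype(int)].copy()
    S = np.stack([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(K)])
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibility r[n, k] of component k for point n
        r = np.empty((N, K))
        for k in range(K):
            diff = X - mu[k]
            inv = np.linalg.inv(S[k])
            mah = np.sum((diff @ inv) * diff, axis=1)   # Mahalanobis terms
            norm = np.sqrt((2 * np.pi) ** D * np.linalg.det(S[k]))
            r[:, k] = pi[k] * np.exp(-0.5 * mah) / norm
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and covariances
        Nk = r.sum(axis=0)
        pi = Nk / N
        mu = (r.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            S[k] = (r[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
    return pi, mu, S
```

On two well-separated synthetic clusters, the fitted means converge to the cluster centers and the mixture weights sum to one.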

Figure

System architecture of obstacle avoidance learning.

The experimental data are acquired from a Turtlebot robot. Turtlebot is a widely used research platform compatible with ROS, integrating a variety of sensors, such as an odometer and extensible vision sensors. Many kinds of applications, such as mapping and navigation, can be developed with Turtlebot. In this paper, we implement our obstacle avoidance algorithm on the basis of the ROS development kit for Turtlebot. The teaching process can be carried out in three ways: (i) using a remote joystick; (ii) with the aid of the navigation algorithms in the Turtlebot development kit, simulating trajectories as training samples; and (iii) by means of apps implemented on Turtlebot, through which the teacher can guide the robot to avoid obstacles while sensor data are recorded as samples. In this paper, we adopt the second method: we create a simulated environment in software and make the robot execute multiple obstacle avoidance tasks in it using the built-in navigation algorithms. During this process, linear velocity and angular velocity are obtained by built-in sensors such as the odometer. Using the Kinect, we obtain the distance between the robot and obstacles, as well as the distance from the robot to the target.

We assume that the important sensing information comes from (i) the states of the robot, defined as linear velocity and angular velocity; (ii) the robot's absolute location; (iii) the distance between the robot and obstacles, noting that the Kinect returns many distance readings representing the relative positions of obstacles within a specific range, of which we choose the shortest as our reference value; and (iv) the distance between the robot and the target, which can be computed indirectly. As illustrated in Figure

Useful data collected by sensors. The blue line shows the demonstrated trajectory. The fan-shaped sectors in shallow yellow represent the visual field of the Kinect.

In the experiment, demonstration is conducted

Because of noise and poor demonstration quality, the original data contain serious mismatches and invalid segments. As shown in Figure

Comparison of original data and processed data. (a) shows the original data; lines in red mark the cutting positions. (b) displays the valid data after processing.
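As a rough illustration of this cutting step, the sketch below drops the idle head and tail of a recorded trajectory. The speed-based criterion and the threshold are assumptions made for illustration; the paper does not specify its exact cutting rule:

```python
import numpy as np

def trim_demonstration(v, eps=1e-3):
    """Return the indices of the useful span of a demonstrated trajectory.

    Samples recorded before the robot starts moving and after it stops
    carry no obstacle-avoidance information, so we keep only the span
    between the first and last sample whose speed exceeds eps.
    (eps and the speed criterion are illustrative assumptions.)
    """
    moving = np.flatnonzero(np.abs(v) > eps)
    if moving.size == 0:
        return np.array([], dtype=int)
    return np.arange(moving[0], moving[-1] + 1)
```

Applied to every recorded channel with the same index set, this keeps the sensor streams aligned after trimming.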

Considering a data-point in training set

In order to obtain a general barrier-free trajectory, that is, a trajectory that produces a barrier-free path for a given input (a temporal signal or another kind of sensor data), a natural idea is to divide the variable

Based on (

Taking

Equation (

Substituting
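The GMR retrieval described above, conditioning each Gaussian component on the input and blending the component predictions with posterior weights, can be sketched as follows. This is a minimal illustration assuming a scalar input such as a time step, not the paper's implementation:

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """Scalar Gaussian density N(x | mu, var)."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def gmr(t_query, pi, mu, S, in_dim=0):
    """Gaussian mixture regression for a scalar input (e.g. time).

    For each component k:
      h_k ∝ pi_k * N(t | mu_k[in], S_k[in, in])          (posterior weight)
      y_k = mu_k[out] + S_k[out, in] / S_k[in, in] * (t - mu_k[in])
    The prediction is the weighted sum y = sum_k h_k * y_k.
    """
    K, D = mu.shape
    out = [d for d in range(D) if d != in_dim]
    h = np.array([pi[k] * gaussian_pdf(t_query, mu[k, in_dim], S[k, in_dim, in_dim])
                  for k in range(K)])
    h /= h.sum()
    y = np.zeros(len(out))
    for k in range(K):
        gain = S[k, out, in_dim] / S[k, in_dim, in_dim]
        y += h[k] * (mu[k, out] + gain * (t_query - mu[k, in_dim]))
    return y
```

Querying near one component's mean returns that component's output mean; querying between two components blends their predictions smoothly, which is what yields the smooth retrieved trajectory.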

From the previous section we can obtain a general trajectory, which solves a planning problem. How can planning be combined with control? In other words, how can we find a set of control inputs that make the robot follow the general trajectory accurately? For the time-based GMR,

Naturally, the above equation can be applied to multiple variables. For this paper, we have four variables. Let

According to performance indicator

Adding (

Substituting

Weight matrices in (
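As a hedged sketch of the idea behind the weighted imitation metric, the snippet below fuses several desired values for the same control input by minimizing a sum of quadratic costs weighted by inverse covariances; the exact weight matrices and variables used in the paper may differ:

```python
import numpy as np

def blend_targets(targets, covariances):
    """Fuse several desired values for the same control input.

    Each tracked variable i contributes a quadratic cost
        (u - u_i)^T W_i (u - u_i),  with  W_i = inv(Sigma_i),
    so targets retrieved with low variance (high confidence) dominate.
    Setting the gradient of the summed cost to zero gives the closed form
        u* = (sum_i W_i)^{-1} (sum_i W_i u_i).
    (Illustrative sketch; not the paper's exact weight matrices.)
    """
    W = [np.linalg.inv(S) for S in covariances]
    A = sum(W)
    b = sum(w @ u for w, u in zip(W, targets))
    return np.linalg.solve(A, b)
```

With equal covariances this reduces to the plain average of the targets; inflating one target's covariance shifts the solution toward the more confident targets, which is the intended effect of the variance-based weighting.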

Experiments are implemented using a Turtlebot robot. In addition to the robot's own hardware system, a Kinect is integrated to obtain distance information. The size of the map used in the simulation is

Two different maps used in our simulation. (a) and (b) are the actual maps generated by software; (c) and (d) are the simulation environments corresponding to the maps above them.

Figure

Demonstrated trajectories in two situations. (a) represents sampled data in task 1. (b) represents sampled data in task 2.

Figure

Trajectory encoding and retrieval using GMM and GMR for task 1. (a) represents model encoding with GMM; ellipses in dark blue represent projections of the Gaussian components. (b) represents the general trajectory generated by GMR; the red shading represents the generation region.

Figure

Encoding and retrieval of orientation and distance information in task 1; the charts share the same meaning as those in Figure

Figures

Trajectory encoding and retrieval using GMM and GMR for variables

Encoding and retrieval for orientation and distance information in task 2.

Experiments in this part are mainly conducted to verify the derived control policy. For a Turtlebot robot, the control input is linear velocity

Figure

Figure

Tracking performance using the derived policy.

Comparison between the expected angular velocity and the calculated value.

Linear velocity

Angular velocity

In this paper, based on the idea of imitation learning, we propose the novel idea that a robot's obstacle avoidance skill can be learned by showing the robot how to avoid an obstacle. Under this consideration, task constraint extraction, that is, making the robot understand the task, is a key issue. To achieve this, a Gaussian mixture model is used to encode the data, and GMR is used to retrieve satisfactory trajectories. Finally, to obtain the control strategy, a multiobjective optimization problem is constructed, and the control strategy is derived by solving it. Simulation experiments verify our algorithms.

For the future, improvement work mainly includes the following aspects: (i) to avoid obstacles online, an online incremental GMM algorithm should be used; (ii) in this paper, a time-based trajectory is generated by GMR, that is, a time signal is used as input; a more reasonable idea is to use state information as input, so that the robot can change its decisions according to the current environment, for example, using distance information as input and velocity as output, so that control inputs are generated directly; (iii) reinforcement learning algorithms can be integrated to strengthen the robot's exploration ability and thus achieve better generalization performance.

The authors declare that the grant, scholarship, and/or funding do not lead to any conflict of interests. Additionally, the authors declare that there is no conflict of interests regarding the publication of this manuscript.

This work is supported by the National Natural Science Foundation of China (Grant no. 51505470) and the Doctoral Startup Fund of Liaoning Province (20141152).