Navigation robots must single out the partners who require navigation while moving through cluttered environments where people walk around. Developing such robots requires two different kinds of people detection: detecting partners and detecting all moving people around the robot. For detecting partners, we design divided spaces based on spatial relationships and sensing ranges. By mapping the friendliness of each divided space from the stimuli received by multiple sensors, the robot detects people who call it positively and selects the partner in the space with the highest friendliness. For detecting moving people, we regard objects’ floor boundary points in an omnidirectional image as obstacles. We classify obstacles as moving people by comparing the movement of each point with the robot’s movement obtained from odometry data, dynamically changing the detection thresholds. Our robot detected 95.0% of partners while standing by and interacting with people, and detected 85.0% of moving people while moving, which was four times higher than a previous method.
Mobile navigation robots are expected to move smoothly in large facilities such as supermarkets, museums, and airports [
Our navigation robot system.
The people a robot needs to detect while standing by differ from the people it needs to detect while moving. While standing by and interacting with people, the robot has to detect “people who call the robot positively” in order to offer a navigation service. While moving, the robot must detect all “moving people (obstacles)” around it in order to move smoothly.
Moreover, the required characteristics of people detection also differ. One difference concerns the computation cycle. While standing by and interacting with people, slower people detection is acceptable compared with the detection needed while moving. The robot can therefore use the multiple sensors that are used naturally in human-human interaction, for example, cameras (eyes), microphones (ears), and tactile sensors (skin). While moving, the robot needs fast people detection for safety and does not have to use all the sensors used for interaction, so detection by a single sensor is desirable while moving.
The other difference concerns localization accuracy (resolution). The robot does not need very high resolution for interaction; during interaction, a resolution appropriate for interaction is sufficient. On the other hand, while moving, the robot needs high resolution in order to localize people accurately.
Recently, many works have used distance measurement devices such as the laser range finder (LRF) and stereo cameras [
In order to detect all moving people around a robot by using one sensor while the robot moves, an omnidirectional camera is useful. However, it is difficult to apply the previous methods that classify obstacles as moving people or not [
We deal with two problems related to the people detection for a mobile navigation robot. One is detecting interaction partners who call a robot positively from among multiple people by using cameras, microphones, and tactile sensors. The other is classifying all obstacles around the robot as moving people or not by only one omnidirectional camera while the robot moves.
While robots stand by and interact with people, we have developed a method for detecting an interaction partner based on the degree of friendliness as mapped onto the “space”, considering interaction distance and the range of multiple sensors for interaction.
For obstacle classification, we have also developed a new method that focuses on objects’ floor boundary points where the robot can measure the distance from itself by only one omnidirectional camera. Our robot classifies a floor boundary point as a moving person when its movement is different from the robot’s movement.
By solving these two problems, we have developed a mobile navigation robot that can select an appropriate person who calls it positively while it stands by and can detect moving people while it moves. The contribution of this paper is a people detection method for a navigation robot that covers both standing by and moving.
Section
When people interact with each other, the distance between them is associated with their degree of friendliness. Proxemics [ distinguishes the following distances:

- Intimate distance (approximately 50 cm): people can communicate via physical interaction and express strong emotions.
- Personal distance (approximately 50–120 cm): people can talk intimately.
- Social distance (approximately 120–360 cm): people do not know each other well.
- Public distance (approximately 360 cm and more): people who have no personal relationship with each other can comfortably coexist at this distance.
These distances can be used to set the degree of friendliness between the robot and each person, which shows how positively each person calls the robot. The distances shown in parentheses are only typical ones; they depend on each person’s personality and cultural background.
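As a rough illustration, these zone boundaries can be encoded as a simple lookup. This is only a sketch: the cutoff values below are the typical ones quoted above, not universal constants.

```python
def proxemic_zone(distance_m: float) -> str:
    """Map a person-robot distance (meters) to a proxemic zone.

    The boundary values (0.5 m, 1.2 m, 3.6 m) are the typical ones
    from the text; a real system should tune them per person and
    per culture.
    """
    if distance_m < 0.5:
        return "intimate"
    if distance_m < 1.2:
        return "personal"
    if distance_m < 3.6:
        return "social"
    return "public"
```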
Since most functions and devices used by a robot are not effective at all distances, we assessed their effective distances. We investigated the effective distances of tactile recognition, speech recognition, sound source localization, and face localization, which are implemented in many robots as general functions.
Tactile recognition is done using tactile sensors, which are effective when people can touch the robot. The average reach of a person’s arm is up to 50 cm, which is similar to the intimate distance.
To determine the range for speech recognition, we placed a speaker in front of a robot at 50 cm intervals from 50 cm to 3.0 m and played 200 words from the ATR phonetically balanced corpus [
A well-known sound source localization function uses the Interaural Phase Difference (IPD) and Interaural Intensity Difference (IID) [
We use MPIsearch [
Detailed discussions of the effective distances are given in [
The relationship between the interaction distance and the effective distance for the four functions is shown in Table
Relationship between distance and function.
Intimate distance | Personal distance | Social distance
---|---|---
Tactile recognition | |
Speech recognition | Speech recognition |
Face localization | Face localization | Face localization
Sound localization | Sound localization | Sound localization
The sensor functions a robot can use effectively differ depending on the distance between the robot and each person. In other related studies, the robot always used all sensors and interacted with people by focusing on the people themselves. In our study, the robot interacts with people by focusing on the “space” rather than the people. In particular, the robot acts based on the space around the robot, segmented as described in Table
Given the size of a person’s face and the accuracy of the robot’s functions, the direction element of the space must be segmented at a suitable granularity. We segmented the space every 15 degrees based on the average size of a human face (16 cm × 23 cm) and the errors of the functions within the personal distance.
To identify the intimate space for the robot to interact with, we defined polar coordinates as shown in Figure
Friendliness space map and effective area of functions.
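A minimal sketch of how such a space map can be laid out, assuming 15-degree angular bins (as above) and the proxemic distances as radial ring edges; the cell contents and indexing here are illustrative, not the paper’s exact data structure.

```python
import numpy as np

N_ANGLE = 360 // 15                 # 24 angular bins of 15 degrees each
RING_EDGES = [0.0, 0.5, 1.2, 3.6]   # intimate/personal/social ring edges (m);
                                    # the last ring covers the public distance

# friendliness value per (ring, angular bin); all cells start neutral
friendliness = np.zeros((len(RING_EDGES), N_ANGLE))

def cell_index(distance_m: float, angle_deg: float) -> tuple[int, int]:
    """Return the (ring, angle) cell of a detection given in
    robot-centered polar coordinates; detections beyond 3.6 m fall
    into the outermost (public) ring."""
    ring = int(np.searchsorted(RING_EDGES, distance_m, side="right")) - 1
    ring = min(ring, len(RING_EDGES) - 1)
    angle = int((angle_deg % 360.0) // 15)
    return ring, angle
```

The interaction partner is then simply the occupant of the cell with the highest friendliness, e.g. `np.unravel_index(friendliness.argmax(), friendliness.shape)`.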
The effects of detecting the interaction partner using this map are as follows. Since a robot can change its motion and select an interaction partner based on the friendliness of various spaces, it can attract people while it stands by. The action selection based on space can also be applied to various other objects.
In each cell on the map, the HED is calculated by taking advantage of the integrated functions. When a function
The HED calculated by integration of all functions,
The cells on the Friendliness Space Map are affected by the kind of stimulus, which shows positivity or negativity. Our robot recognizes two kinds of stimuli by using tactile recognition. One is uncomfortable stimuli, which show negativity, such as hitting the robot’s head or touching the robot’s bust. The other is comfortable stimuli, which show positivity, such as patting the robot’s head. These stimulus categories follow human-human interaction when a person selects an interaction partner: comfortable stimuli are used to call someone, whereas uncomfortable stimuli are used merely to tease.
Since tactile recognition cannot localize people precisely, we assume that the person delivering the stimulus is in the cell with the highest HED within the intimate distance. That is, it is cell (
If the stimulus occurs at time
The Friendliness Space Map is renewed and consists of both the HED and the CD obtained using the robot’s functions. The friendliness,
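As an illustration only, a per-cell update might combine the HED with a decaying stimulus value CD as in the sketch below; the product form, the clamping, the value ranges, and the decay constant are assumptions of this sketch, not the paper’s formula.

```python
DECAY = 0.95  # assumed per-step decay of the stimulus influence

def cell_friendliness(hed: float, cd: float) -> float:
    """Hypothetical friendliness of one cell: combine the HED (assumed
    here to lie in [0, 1]) with the stimulus-derived value CD (assumed
    in [-1, 1]); positive stimuli raise friendliness, negative
    (uncomfortable) stimuli lower it."""
    return max(0.0, hed * (1.0 + cd))

def decay_cd(cd: float) -> float:
    """Let the influence of a stimulus fade over time so that old
    touches stop dominating partner selection."""
    return cd * DECAY
```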
We use floor colors for floor detection because a floor generally consists of a small number of colors. Previous works use the Gaussian Mixture Model (GMM) for specific color detection [
Our robot learns the representative colors of the floor by itself, based on the distribution of the floor color data, without any prior setting. Because it considers this distribution, our floor detection method can be adjusted more easily than the GMM and detects the floor as accurately as the GMM does. In order to learn the representative colors of the floor, we assume that our robot starts up in free space. Moreover, we use Ward’s clustering [
Our robot takes an image and gets
We choose two clusters
In step 2, when
Steps 2 and 3 are repeated until all the data have been used.
Because Ward’s clustering considers the distribution of the data, each cluster is easily characterized, and membership can be tested with the Mahalanobis distance. A color datum
When a robot uses an omnidirectional camera mounted on its head, the floor is projected around the image center. Therefore, our robot classifies the pixels from the center outward by applying (
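A compact sketch of this floor-learning pipeline using SciPy’s Ward linkage and a per-cluster Mahalanobis test; the cluster count and the distance threshold here are placeholder assumptions rather than the paper’s values.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def learn_floor_colors(pixels: np.ndarray, n_clusters: int = 3):
    """Cluster sampled floor pixels (an N x 3 array of color vectors)
    with Ward's method and return a (mean, inverse covariance) model
    per representative color."""
    labels = fcluster(linkage(pixels, method="ward"),
                      n_clusters, criterion="maxclust")
    models = []
    for k in range(1, n_clusters + 1):
        members = pixels[labels == k]
        mean = members.mean(axis=0)
        inv_cov = np.linalg.inv(np.cov(members, rowvar=False))
        models.append((mean, inv_cov))
    return models

def is_floor(color: np.ndarray, models, max_dist: float = 3.0) -> bool:
    """A pixel counts as floor if its Mahalanobis distance to one of
    the learned representative colors is below a threshold (the value
    here is assumed)."""
    for mean, inv_cov in models:
        d = color - mean
        if np.sqrt(d @ inv_cov @ d) <= max_dist:
            return True
    return False
```

Scanning the image pixels from the center outward, as described above, the first non-floor pixel along each direction gives that direction’s floor boundary point.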
In the case of using an omnidirectional camera incorporating a hyperbolic mirror, a position (
Many robots are equipped with an omnidirectional camera, and they can measure or know the distance from the floor to the camera while they are moving [
In order to decide the parameters
The omnidirectional image of the cross-stripes on the floor (a) and the bird’s eye images (b).
To confirm the parameters, a bird’s-eye image is created using the decided parameters. Figure
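The projection above follows from the hyperbolic mirror geometry; as a simpler stand-in for illustration, one can fit a smooth monotonic mapping from image radius to floor distance directly to the cross-stripe measurements and use it to project pixels onto the floor. The calibration pairs below are illustrative values, not the paper’s data.

```python
import numpy as np

# (image radius in pixels, floor distance in meters) pairs read off
# the cross-stripe calibration image -- illustrative values only
calib = np.array([[40, 0.25], [80, 0.50], [115, 0.75],
                  [145, 1.00], [170, 1.25], [190, 1.50]])

# low-order polynomial approximating the mirror's radius-to-distance map
radius_to_dist = np.polynomial.Polynomial.fit(calib[:, 0], calib[:, 1], deg=3)

def pixel_to_floor(u: float, v: float, cu: float, cv: float):
    """Project an image pixel (u, v) to robot-centered floor
    coordinates in meters, given the image center (cu, cv)."""
    du, dv = u - cu, v - cv
    d = float(radius_to_dist(np.hypot(du, dv)))  # distance along the floor
    theta = np.arctan2(dv, du)                   # bearing of the pixel
    return d * np.cos(theta), d * np.sin(theta)
```

Rendering every pixel through `pixel_to_floor` yields exactly the kind of bird’s-eye image used to confirm the parameters: the cross-stripes should come out as straight, evenly spaced lines.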
A floor boundary point
It is easy to transform the coordinates of
When
The following conditions should be satisfied in order to regard (
1. Floor boundary points have to be located correctly at the boundary between obstacles and the floor in the image.
2. Floor boundary points have to be tracked correctly.
3. Camera parameters have to be decided correctly.
4. Odometry has to be calculated correctly.
Condition 4 is satisfied in the general environment, because the odometry is comparatively correct during short movement. Figure
The CE is satisfied as long as floor boundary point
An example of the classification process using floor boundary points. Triangles are tracked from the left image to the right image; circles are tracked from the right image to the left image. The robot moves by 20 cm.
If the threshold is low at the beginning of the robot’s activation, all points are classified as lying on the floor; however, they are located between the true boundary and the robot, so the free space looks very small. Our classification method therefore starts with high thresholds and detects a boundary that is slightly larger than the true one. As the robot moves and confirms the CE, it refines the threshold of each direction in which a floor boundary point classified as a moving obstacle is located. In this way, the robot adapts the threshold of each direction and becomes able to locate and classify obstacles accurately. When the illumination or the floor color changes, our robot adapts the thresholds again.
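A sketch of the odometry comparison and the threshold adaptation, assuming floor boundary points are already expressed in robot-centered floor coordinates (e.g. via a projection like `pixel_to_floor` above). The initial threshold, the floor value, and the shrink factor are assumptions of this sketch, and the adaptation rule here (tighten a direction’s threshold when its point behaves as static) is a simplification of the paper’s per-direction refinement.

```python
import numpy as np

N_DIRS = 24                          # one threshold per 15-degree direction
thresholds = np.full(N_DIRS, 0.30)   # start high (meters), then refine

def predict_static(p, dx, dy, dtheta):
    """Where a *static* point observed at robot-frame position p should
    reappear after the robot moves by odometry (dx, dy, dtheta)."""
    x, y = p[0] - dx, p[1] - dy
    c, s = np.cos(-dtheta), np.sin(-dtheta)
    return np.array([c * x - s * y, s * x + c * y])

def classify_point(p_before, p_after, dx, dy, dtheta) -> bool:
    """Classify a tracked floor boundary point as a moving obstacle if
    its observed motion disagrees with the motion predicted from the
    robot's odometry alone."""
    residual = np.linalg.norm(p_after - predict_static(p_before, dx, dy, dtheta))
    k = int((np.degrees(np.arctan2(p_after[1], p_after[0])) % 360) // 15)
    moving = residual > thresholds[k]
    if not moving:
        # the point behaved like static floor: tighten this direction's
        # threshold so the boundary estimate gradually sharpens
        thresholds[k] = max(0.05, 0.9 * thresholds[k])
    return moving
```

When illumination or floor color changes and points start failing the test systematically, the thresholds can simply be reset to their initial high value and refined again.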
Our people detection method is implemented on our robot called ApriTau as shown in Figure
ApriTau and experimental setting.
Figure
The whole system of classification.
The output of classification system.
In these experiments, the thresholds
We investigated whether our method detects the interaction partner while the robot stands by and interacts with people. We asked four people to interact with our robot freely. Our robot looked at the space with the highest friendliness and talked with people using only simple words. Two labelers observed the interactions and selected, on a one-second time base, the interaction partner whom our robot should interact with.
We evaluate our method by two values
The experimental results show that
One of the reasons why
In order to confirm the effectiveness of changing the threshold, ApriTau and another robot moved along a given route, passing each other. ApriTau continuously took images synchronized with odometry data while moving. The images and the odometry data were input to the systems of our method, a simple method, and the previous method. Note that although the same data were input to the three systems, each system processed only part of the data because of differences in processing speed. The classification ratios of our method, the simple method, and the previous method were calculated from the outputs.
In this experiment, the classification ratio is the
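If the classification ratio is taken as the harmonic mean of recall and precision (the F-measure), it reproduces the tabulated values:

$$\text{classification ratio} = \frac{2 \cdot \text{recall} \cdot \text{precision}}{\text{recall} + \text{precision}}$$

For example, for our method, $2 \cdot 0.94 \cdot 0.77 / (0.94 + 0.77) \approx 0.85$.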
The classification ratios of three methods are shown in Table
The classification ratios.
Method | Recall ratio | Precision ratio | Classification ratio
---|---|---|---
Previous | 0.18 (3/17) | 0.25 (3/12) | 0.21
Simple | 0.63 (10/16) | 0.13 (10/79) | 0.21
Ours | 0.94 (17/18) | 0.77 (17/22) | 0.85
However, the precision ratio of our method is still somewhat low for smooth robot movement. In this paper, we assume that the errors of tracked points are very small, which is largely correct in image coordinates. In an omnidirectional camera image, however, the distance resolution changes depending on the distance from the image center and is very low for distant places. Tracking errors of a few pixels there become errors of a few meters in world coordinates. Because of these errors of a few meters, (
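To make the resolution argument concrete: if $D(r)$ denotes the mapping from image radius $r$ to floor distance, a tracking error of $\Delta r$ pixels produces a metric error of roughly

$$\Delta D \approx \left|\frac{dD}{dr}\right| \Delta r,$$

and $|dD/dr|$ grows without bound as $r$ approaches the radius at which the floor projects to infinity, so a few-pixel tracking error near that radius does indeed become an error of a few meters in world coordinates.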
In order to confirm that our method detects moving people, we calculated the classification ratio in various patterns. In this experiment, a person and ApriTau move on the given route as shown in Figures
The experimental setting (Pattern 1 and 3).
The experimental setting (Pattern 2 and 4).
The experimental setting (Pattern 5).
The classification ratios in each pattern are shown in Table
The classification ratios in various patterns.
Pattern | Recall ratio | Precision ratio | Classification ratio
---|---|---|---
1 | 1.00 (21/21) | 0.64 (21/33) | 0.79
2 | 0.98 (42/43) | 0.64 (42/67) | 0.77
3 | 0.80 (16/20) | 0.64 (17/29) | 0.71
4 | 0.93 (42/45) | 0.59 (42/74) | 0.72
5 | 0.93 (13/14) | 0.57 (13/23) | 0.71
The classification ratio in the case of robot rotation (Pattern 5) is also a little low. One reason is that the tracked area in the image changes much more during rotation than during straight movement (Patterns 1–4), and such large changes make the robot fail to track the floor boundary points. Moreover, we need to synchronize the timestamps between the odometry and the images. We also think that it is effective to take uncertainty in sensing into account: the accuracy of odometry and of tracking differs according to the robot’s movement. We plan to adopt a probabilistic method in future work.
This work has dealt with two problems related to the people detection needed for a navigation robot system. One is how the robot detects a person who calls it positively while standing by, in order to select a partner. The other is how a single moving omnidirectional camera detects all moving people around the robot while it moves, in order to move safely. By changing the people detection method according to the robot’s task, we aim to select the person who needs navigation and, in particular, to detect moving people for safety while the robot moves.
In order to solve the first problem, we have developed a people detection method based on the “friendliness space map,” which focuses on the “space” rather than the person to find and select people who call our robot positively.
In order to solve the second problem, we have developed a new method that focuses on floor boundary points, whose distance from the robot can be measured with a single omnidirectional camera. The points are detected by the floor detection method, which uses Ward’s clustering to find representative colors and the Mahalanobis distance to identify floor colors. For detecting moving people, our robot tracks the floor boundary points. By comparing the robot’s movement with the floor boundary points’ movements, our robot detects moving people and dynamically changes the threshold that the floor detection uses.
We performed three experiments. The first showed that our robot detects 95% of the people who call it positively by using the friendliness space map. In the second, we confirmed that the classification ratio increased to 85%, four times higher than that of a previous method. The third showed that our method can detect a moving person in various situations. In future work, we plan to evaluate our navigation system in a crowded place such as a real supermarket. (This paper is an extended version of a conference paper [
This research was supported by New Energy and Industrial Technology Development Organization (NEDO, Japan) Project for Strategic Development of Advanced Robotics Elemental Technologies, Conveyance Robot System in the Area of Service Robots, and Robotic Transportation System for Commercial Facilities.