Obstacles Regions 3D-Perception Method for Mobile Robots Based on Visual Saliency

1 College of Electronic Information & Control Engineering, Beijing University of Technology, Beijing 100124, China
2 School of Mechanical and Electrical Engineering, Henan Institute of Science and Technology, Xinxiang 453003, China
3 Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing 100124, China
4 Engineering Research Center of Digital Community, Ministry of Education, Beijing 100124, China


Introduction
In recent years, robots have been widely applied in many fields. In particular, mobile robots have developed rapidly to help humans complete various difficult tasks. For mobile robots, efficient processing of visual information about the environment is clearly important. A vision-based obstacle avoidance system needs to recognize obstacles and describe them mathematically in order to produce an obstacle avoidance strategy, and methods for doing so have become an active research focus.
Currently, obstacle recognition and segmentation methods based on visual information fall mainly into four types [1]. Ohya et al. [2] presented a model-based approach for locating obstacles, in which obstacles are detected by computing the difference between the edges estimated from the 3D environment model and the edges detected in the actual camera image. Nevertheless, their method failed to detect the brown box in 50% of the cases. To solve these problems, we present an automatic obstacle region extraction algorithm based on the saliency of visual attention. Mimicking the remarkable capability of the human visual system, a visual saliency algorithm can readily detect salient visual subsets in images and videos. Camus et al. [3] developed a model that uses only real-time image flow divergence to avoid obstacles while driving toward a specified goal direction in a lab containing office furniture and robot and computing equipment. The major contribution of that work is the demonstration of a simple, robust, minimal system that uses flow-derived measures to control steering and speed to avoid collision in real time for extended periods. However, this approach uses only the observable information in the 2D image sequence and cannot convey adequate information about real user environments. Compared with the existing methods, our method achieves 3D perception of obstacle regions for a mobile robot by taking advantage of the Kinect sensor system. Hong et al. [4] described a local method based on feature correlation. According to certain criteria, "characteristic points" are selected from each view and paired with characteristic points in the other view by computing a measure of similarity between the two windows around the characteristic points. Bing et al. [5] built a model in which the behavior state is generated according to changes in the optical flow field of the ground while the robot is moving. However, because of noise, shadows, and occlusion, the calculated optical flow is not sufficiently precise or reliable, and computing the optical flow field is also time-consuming.
Because the robot moves, the relative position of the obstacle area in the robot's view changes; obstacle area extraction can therefore be transformed into a motion region extraction problem. Based on this, we present a new obstacle region localization method for indoor environments, applied on a mobile robot. To ensure that the mobile robot can pass along a suitable path, it needs to detect and locate obstacles in the path. First, scene images are acquired by the Kinect sensor. Then, the mobile robot system obtains the Original Salience Map (OSM) and the Intensity Feature Map (IFM) of the original image with the salience filtering (SF) algorithm. Following the improved pulse-coupled neural network (PCNN) theory, the internal activity of the PCNN neurons is multiplied with the binarized salience image of the OSM to determine the final ignition pulse input. Finally, the binarized salient region is extracted through multiple iterations of the improved PCNN. On this basis, the binarized area is mapped onto the depth map obtained by the Kinect sensor, and the mobile robot can locate the obstacle [6].
The remainder of this paper is organized as follows. In Section 2, the overall design is addressed. Section 3 introduces the ISRE method, which combines the SF algorithm with the PCNN algorithm. Section 4 describes the advantages of the ISRE algorithm on a mobile robot, followed by experimental results and analysis. Conclusions and future work are discussed in Section 5.

Overview of Our Model
To obtain obstacle information reliably, the new obstacle region localization method is applied to a mobile robot, where it provides both recognition and localization of obstacles. The details of the application are shown in Figure 1.
Broadly speaking, to accomplish the localization task, the proposed method can be divided into three parts [7]. The first is image perception, whose purpose is to acquire scene images with the Kinect sensor. The second is salient region extraction: the OSM and IFM of the original image are first obtained by the SF algorithm, the IFM is then used as the input neurons of the PCNN, and finally the binarized salient region is extracted by the improved PCNN. The last is acquiring depth information and locating the obstacle: the binarized salient area is mapped onto the depth map obtained by the Kinect sensor, which gives the depth map of the obstacle and thus its location.

The Architecture of Algorithm
In order to locate an obstacle, the mobile robot first needs to recognize it. In this section, we use the ISRE method, which combines the SF algorithm [8] with the PCNN algorithm. The method extracts the original saliency image and the intensity feature image with the SF algorithm [8]; all pixels of the intensity feature image are regarded as the input neurons of the PCNN. To obtain a more precise ignition range, the ISRE algorithm improves the inputs of the PCNN ignition pulse and completes the extraction of the binarized salient region through multiple iterations. Then, according to the depth map of the obstacle, the mobile robot obtains the distance information needed to locate the obstacle.

The ISRE Method.
As shown above, the ISRE algorithm is a hybrid model that simulates the biological process of the human eye.
The ISRE algorithm has six main steps.
Step 1. Obtain the OSM and IFM through rough segmentation of the image.
Step 2. Regard every pixel of the IFM as an input neuron of the PCNN.
Step 3. Form a local stimulus from every external stimulus through the connecting synaptic weights.
Step 6. Generate the binary saliency map through multiple iterations.
The process is shown in Figure 2. Each element is represented by the mean color of the pixels belonging to it. Two contrast measures per element are then defined, based on the uniqueness and the spatial distribution of elements. The first contrast measure implements the commonly employed assumption that image regions which stand out from other regions in certain aspects catch our attention and hence should be labeled more salient, as shown in Figure 2(c). The second contrast measure renders unique elements more salient when they are grouped in a particular image region rather than evenly distributed over the whole image, as shown in Figure 2(d). Step 1 is described in Section 3.2. Section 3.3 describes the concrete process of Steps 2 and 3. Step 4 is discussed in Section 3.4, and Steps 5 and 6 are presented in Section 3.5.
The SF algorithm [8] does not overcome the faults that the contrast difference model brings when it is based on a single global view. In other words, when a high-intensity region lies in the background, the algorithm extracts the background as the salient region, as shown in Figure 3(b). To address this problem, the ISRE model uses a coarse-to-fine segmentation idea. First, we conduct rough segmentation of the image with the SF algorithm, then fine segmentation with the improved PCNN algorithm, and finally complete the extraction of the binarized salient region, as shown in Figure 3(c).

PCNN Input Unit.
The OSM and IFM are obtained from the original image through the SF algorithm. The process is divided into four steps.
Step 1. Conduct superpixel segmentation of the input original image.
Step 2. Extract the uniqueness of each superpixel element.
Step 3. Measure the spatial distribution of elements over the whole image; that is, identify the elements concentrated in a particular region and render them more salient.
Step 4. Fuse the results of Step 2 and Step 3 to determine the OSM and IFM.
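As a rough illustration of Steps 2 to 4, the following Python sketch scores a handful of superpixel elements by a uniqueness measure and a spatial distribution measure and then fuses the two. The Gaussian weighting, the parameters `sigma_p`, `sigma_c`, and `k`, the exponential fusion, and all function names are illustrative assumptions rather than details taken from this paper or from [8]; superpixel segmentation (Step 1) and the derivation of the IFM are not covered.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def gaussian_weights(items, i, sigma):
    """Gaussian similarity of item i to every item, normalised to sum to 1."""
    raw = [math.exp(-sq_dist(items[i], x) / (2.0 * sigma ** 2)) for x in items]
    total = sum(raw)  # always >= 1 because the self-weight is exp(0) = 1
    return [r / total for r in raw]

def normalise(values):
    """Scale non-negative values so that the largest becomes 1."""
    peak = max(values)
    return [v / peak if peak > 0 else 0.0 for v in values]

def element_saliency(colors, positions, sigma_p=0.25, sigma_c=20.0, k=6.0):
    """Fuse uniqueness and distribution into one saliency score per element.

    colors:    mean colour of each element, e.g. (r, g, b)
    positions: element centres in normalised image coordinates (x, y)
    All parameter values are assumptions chosen for illustration only.
    """
    n = len(colors)
    uniqueness, distribution = [], []
    for i in range(n):
        # Uniqueness: colour contrast to other elements, weighted by proximity.
        w_pos = gaussian_weights(positions, i, sigma_p)
        uniqueness.append(
            sum(sq_dist(colors[i], colors[j]) * w_pos[j] for j in range(n)))
        # Distribution: spatial spread of the elements with a similar colour.
        w_col = gaussian_weights(colors, i, sigma_c)
        mean_pos = [sum(w_col[j] * positions[j][d] for j in range(n))
                    for d in range(2)]
        distribution.append(
            sum(sq_dist(positions[j], mean_pos) * w_col[j] for j in range(n)))
    u, d = normalise(uniqueness), normalise(distribution)
    # Unique and spatially compact elements receive the highest score.
    return [u_i * math.exp(-k * d_i) for u_i, d_i in zip(u, d)]
```

With three gray background elements spread over the image and one red element in the center, the red element obtains the highest score because it is both unique in color and spatially compact.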

Modulation Unit.
The main function of the modulation unit is the coupling modulation of the external stimulus (main input) $F_{ij}$ and the local stimulus (linking input) $L_{ij}$, as shown in [9]:

$$U_{ij}(n) = F_{ij}(n)\left[1 + \beta L_{ij}(n)\right],$$

where $U_{ij}$ is the internal activity of the neuron and $\beta$ is the linking strength coefficient among synapses. The higher the value of $\beta$, the more effect the neurons of the 8-neighborhood have on the central neuron.
The value of $\beta$ is 0.4 in this paper.
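As a minimal sketch of this modulation, the following Python fragment computes the internal activity as the main input scaled by one plus the linking strength times a linking input gathered from the 8-neighborhood. The function names, the unit synaptic weights, and the use of the previous pulse output as the linking source are illustrative assumptions; only the linking strength of 0.4 is taken from the text.

```python
def linking_input(Y_prev, i, j):
    """Sum of the previous pulse outputs in the 8-neighbourhood of (i, j).

    Unit synaptic weights are assumed here for simplicity.
    """
    rows, cols = len(Y_prev), len(Y_prev[0])
    total = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue  # skip the central neuron itself
            r, c = i + di, j + dj
            if 0 <= r < rows and 0 <= c < cols:
                total += Y_prev[r][c]
    return total

def modulate(F, Y_prev, beta=0.4):
    """Internal activity U = F * (1 + beta * L) for every neuron."""
    return [[F[i][j] * (1.0 + beta * linking_input(Y_prev, i, j))
             for j in range(len(F[0]))]
            for i in range(len(F))]
```

A neuron with no firing neighbor keeps its main input unchanged, while each firing neighbor raises its internal activity by a factor proportional to the linking strength.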

Pulse Generator Unit.
An important characteristic of human biological vision is that visual attention changes as the scene changes. To simulate this process accurately, the ISRE algorithm improves the ignition pulse unit of the traditional PCNN. The specific steps are as follows.
Step 1. Take the pixels of the OSM that are at 75% of the maximum nonzero gray value and set their value to 1 (white), so as to define the maximum range of the salient region. The values of the other pixels are set to 0 (black) to define the certain background region. This yields the binary image of the OSM, denoted $\mathrm{OSM}_C$.
Step 2. The internal activity $U_{ij}$ of the neuron is multiplied element-wise with $\mathrm{OSM}_C$ to determine the final ignition pulse input $U_{ij} \cdot \mathrm{OSM}_C$.
Step 3. The $U_{ij}$ determines the neuron's dynamic threshold $\Theta_{ij}$, where $\alpha_{\Theta}$ [9] is the exponential decay coefficient of the dynamic threshold in the traditional PCNN.
The value of $\alpha_{\Theta}$ is 0.3 in this paper.
Step 4. Compare the maximum of $U_{ij}$ with the dynamic threshold $\Theta_{ij}$ to determine the value range of all firing neurons and generate the timing pulse sequence $Y_{ij}$.
Through multiple iterations, we finally complete the extraction of the binary saliency map. The accuracy of the ISRE is related to the number of iterations: the more iterations, the higher the accuracy, but the longer the running time. After repeated tests, we chose three iterations in this paper. The iterations in the ISRE model play an inhibitory role similar to that of the human visual nervous system; that is, they effectively suppress the noise that isolated neurons or tiny ignition areas introduce into the salient target zone.

Obstacle Regions 3D-Perception.
With the advent of new sensing technology, a 2D color image (RGB) together with a depth map (D) can now be obtained using low-cost webcam-style sensors such as Microsoft's Kinect [10]. The depth map provides per-pixel depth information obtained using an infrared laser projector combined with a camera. While it does not provide a true 3D mesh, an RGB-D image provides more information about the captured object than a traditional 2D color image. RGB-D images have been applied to a variety of tasks such as general object recognition, surface modeling and tracking, modeling indoor environments, face recognition, discovering objects in scenes, and robotic vision [11][12][13]. Since an RGB-D image obtained from the Kinect and a regular 3D map are fundamentally different in nature, existing obstacle localization techniques based on visual saliency may not be directly applicable to these images. Each pixel in the Kinect's depth map has a value indicating the relative distance of that pixel from the sensor at the time of image capture. It is important to utilize both RGB and depth data for feature extraction and classification.
To locate the obstacles, we map the binarized area onto the depth map collected by the Kinect sensor to obtain the depth map of the obstacle, and then calculate the distance between the obstacle and the robot through (6), which relates the robot-obstacle distance to the values of the depth image indexed by row and column.
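The mapping from the binarized salient region to a distance can be sketched in Python as follows. The aggregation rule used here, the mean of the valid depth readings inside the region, is an assumption made for illustration and is not claimed to be the formula of (6); treating a depth value of zero as a missing reading and the function name are likewise assumptions.

```python
def obstacle_distance_mm(depth, mask):
    """Mean valid depth (in mm) inside the binary salient region.

    depth: 2D list of depth readings in millimetres (0 = no reading, assumed)
    mask:  2D list of 0/1 values, the binarized salient region
    Returns None when the region contains no valid depth reading.
    """
    values = [d
              for d_row, m_row in zip(depth, mask)
              for d, m in zip(d_row, m_row)
              if m == 1 and d > 0]
    if not values:
        return None
    return sum(values) / len(values)
```

For a region covering readings of 1400 mm and 1450 mm plus one missing reading, this sketch returns 1425 mm; a median could be substituted for the mean if outliers at the region boundary are a concern.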

Standard Database Results.
To verify that the ISRE algorithm is effective in extracting the salient regions of an image, we conduct comparison experiments in two aspects, visual effect and objective quantitative comparison, based on the database of 1000 original images with ground-truth saliency provided by [14,15] and the results of seven existing saliency extraction algorithms. The visual comparison is shown in Figure 4.
As can be seen from the figure, the IT algorithm [16] in the second column fails to fully highlight the salient area. The SR algorithm [17] in the third column focuses only on extracting salient edges and loses the interior information. The GB algorithm [18] in the fourth column and the MZ algorithm [19] in the fifth column produce blurred extractions. The FT algorithm [15] in the sixth column and the AC algorithm [20] in the seventh column perform better overall, but not when the background is too salient. For example, the grid of the image in the third row and the bamboo of the image in the fourth row are treated as salient areas and extracted. The SF algorithm [8] in the eighth column is the basis of the ISRE model; it extracts the salient area more accurately in various types of scenes, but compared with the ground truth its extraction still needs improvement when the edge and the brightness of the salient region differ greatly. The ninth column is the final ignition pulse input result of the algorithm proposed in this paper. The overall effect improves significantly on that of the SF algorithm; the extraction of the leaf areas of the image in the fourth row and of the geese wings of the image in the second row improves significantly and comes closer to the ground truth (GT) in the tenth column.
To further verify the proposed algorithm more objectively, we follow [15] and design three experimental protocols for the quantitative evaluation of the IT [16], SR [17], GB [18], MZ [19], FT [15], CA [1], AC [20], SF [8], and ISRE algorithms to obtain objective comparative data. All three protocols given by [12] compare the number of pixels in the salient area extracted by each algorithm with the number of pixels in the ground-truth area and then calculate the recall, precision, and F-Measure values, as shown in (7). The difference is that experiment 1 uses a custom threshold to binarize the saliency map and displays the result for every database image as a curve, while experiment 2 uses an adaptive threshold to binarize the saliency map and displays the result as a histogram. The results are shown in Figure 5. DSR (Detected Salient Regions) denotes the salient regions calculated by an algorithm. Following the standard in [21], to avoid an excessively high recall caused by extracting a much larger salient area, we set $\beta^2 = 0.3$ to increase the weight of precision.
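It can be seen from Figure 5(a) that the extraction results of the ISRE algorithm for each image in the database are improved or unchanged compared with those of the SF algorithm. According to Figure 5(b), the precision, recall, and F-Measure of the ISRE algorithm reach 0.891, 0.808, and 0.870, which are clearly higher than those of the other eight algorithms. The three measures are defined as

$$\mathrm{Precision} = \frac{|\mathrm{DSR} \cap \mathrm{GT}|}{|\mathrm{DSR}|}, \qquad \mathrm{Recall} = \frac{|\mathrm{DSR} \cap \mathrm{GT}|}{|\mathrm{GT}|}, \qquad F_{\beta} = \frac{(1 + \beta^{2})\,\mathrm{Precision} \cdot \mathrm{Recall}}{\beta^{2}\,\mathrm{Precision} + \mathrm{Recall}}. \tag{7}$$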
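The quantitative measures can be sketched in Python as follows, assuming the usual pixel-overlap definitions of precision and recall and the weighted F-Measure with a squared weight of 0.3. The function names and the binary-mask representation are illustrative assumptions.

```python
def f_measure(precision, recall, beta2=0.3):
    """Weighted F-Measure; beta2 is the squared weight (0.3 in the text)."""
    denom = beta2 * precision + recall
    return (1.0 + beta2) * precision * recall / denom if denom else 0.0

def precision_recall_f(dsr, gt, beta2=0.3):
    """Precision, recall and F-Measure of a binary detection against GT.

    dsr: 2D list of 0/1 values, the detected salient region
    gt:  2D list of 0/1 values, the binary ground truth
    """
    overlap = sum(1 for d_row, g_row in zip(dsr, gt)
                  for d, g in zip(d_row, g_row) if d and g)
    n_dsr = sum(1 for row in dsr for d in row if d)
    n_gt = sum(1 for row in gt for g in row if g)
    precision = overlap / n_dsr if n_dsr else 0.0
    recall = overlap / n_gt if n_gt else 0.0
    return precision, recall, f_measure(precision, recall, beta2)
```

As a consistency check, `f_measure(0.891, 0.808)` evaluates to approximately 0.870, which agrees with the F-Measure reported for the ISRE algorithm.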

Mean Absolute Error.
To demonstrate that the extraction results of the ISRE algorithm are close to the binary ground truth GT, and for a more balanced comparison, we also evaluate the mean absolute error (MAE) between the continuous saliency map $S$ (prior to thresholding) and the binary ground truth GT. The mean absolute error is defined as

$$\mathrm{MAE} = \frac{1}{W \times H} \sum_{x=1}^{W} \sum_{y=1}^{H} \left| S(x, y) - \mathrm{GT}(x, y) \right|,$$

where $W$ and $H$ are the width and the height of the respective saliency map and ground truth image. Figure 6 shows that, for the same method, the MAE measured relative to ISRE and the MAE measured relative to GT are very close. For example, for the IT algorithm the MAE relative to ISRE is 0.610 and the MAE relative to GT is 0.599. The two values are almost equal, which demonstrates that our extraction is close to GT.
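A minimal Python sketch of the MAE computation is given below, assuming that the saliency map and the ground truth are same-sized 2D lists with values in the range 0 to 1; the function name is hypothetical.

```python
def mean_absolute_error(S, GT):
    """Mean absolute difference between a saliency map and the ground truth.

    S:  2D list of continuous saliency values in [0, 1]
    GT: 2D list of binary ground-truth values (0 or 1), same size as S
    """
    height, width = len(S), len(S[0])
    total = sum(abs(s - g)
                for s_row, g_row in zip(S, GT)
                for s, g in zip(s_row, g_row))
    return total / (width * height)
```

For example, a 2 × 2 map with values 0.9, 0.1, 0.4, and 0.0 compared against the ground truth 1, 0, 1, 0 gives an MAE of 0.2.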

System Hardware Platform.
The platform used for environment exploration is a Pioneer3-DX from the American company Mobile Robots Inc., fitted with a Kinect, as illustrated in Figure 7.
The robot weighs over 9 kg, and its body measures 44 cm × 38 cm × 22 cm. It has two driven wheels and one follower wheel, and the radius of the driven wheels is 4.5 cm. The Kinect is a new and widely available device for the Xbox 360. Interest in the Kinect is increasing in computer vision because it provides 3D information about the environment at low cost. The device contains an RGB camera, a multiarray microphone, and a depth sensor. Using these sensors, the Kinect performs depth measurement by light coding and captures full-body 3D motion. Three software packages need to be installed to drive the Kinect: "OpenNI", "SensorKinect", and "NITE". The version of "OpenNI" that we used is 1.5.2.23.

Obstacles Regions 3D-Perception.
To further verify the effectiveness of the ISRE algorithm on a mobile robot, we use the algorithm to extract the salient areas of real scene images in our laboratory. Three objects are to be extracted: a basketball, bricks, and a carton. The experimental results are shown in Figure 8.
When the mobile robot is located at position A, the bricks are recognized by ISRE, and the distance between the bricks and the robot calculated by (6) is 1436 mm. We mark this position of the robot and measure the true distance with a ruler; the value is 1458 mm. In the same way, when the robot is located at positions B, C, D, and E, it recognizes the bricks, basketball, and carton, and the calculated distances are 1273 mm, 1108 mm, 1704 mm, and 2378 mm, respectively, while the true distances are 1301 mm, 1084 mm, 1741 mm, and 2359 mm, respectively. The relative error does not exceed 5%. We can also see from Figure 8 that the algorithm simulates human vision: it identifies the target object accurately, detects ground obstacles, and locates them. When the Pioneer3-DX moves, the obstacle distance information is obtained at the same time from the depth image of the obstacle. However, because the extraction may differ considerably between scenes, some salient information might be lost.
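The relative errors behind the 5% bound can be checked with a few lines of Python using the distances reported above; the dictionary layout and the function name are illustrative choices.

```python
# Calculated and ruler-measured distances (mm) at positions A to E.
calculated = {"A": 1436, "B": 1273, "C": 1108, "D": 1704, "E": 2378}
measured = {"A": 1458, "B": 1301, "C": 1084, "D": 1741, "E": 2359}

def relative_error(calculated_mm, true_mm):
    """Absolute deviation from the true distance, as a fraction of it."""
    return abs(calculated_mm - true_mm) / true_mm

errors = {pos: relative_error(calculated[pos], measured[pos])
          for pos in calculated}
worst_position = max(errors, key=errors.get)
```

With these values the largest relative error occurs at position C and is roughly 2.2%, so all five measurements lie well within the stated 5% bound.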

Contrast Experiment.
The obstacle regions 3D-perception strategy based on the ISRE method is compared with the SF method in the same scenario in our laboratory. The experimental results are shown in Figure 9.
In this experiment, the robot starts at the red node shown in the right half of Figure 9. Many obstacles were placed at random in our laboratory. The paths followed by the robot with our method and with the SF method in the same scenario are shown in black and yellow, respectively. We find that when a high-intensity region lies in the background, the SF method extracts the background as an obstacle region (the areas surrounded by red lines in the middle of Figure 9). The robot's path is affected by these wrong regions. As the right half of Figure 9 shows, the avoidance path based on our ISRE method is closer to optimal and clearly shorter than the path based on the SF method alone.

Conclusions
This paper proposed a new 3D-perception method for obstacle regions, based on visual saliency, for mobile robots in indoor environments. With it, the mobile robot can accurately extract and perceive obstacles in the experimental environment. The ISRE algorithm significantly improves the extraction effect by means of the PCNN algorithm. When a salient area of high brightness exists in the background, the SF algorithm [8] extracts the background as the salient region. To address this problem, we use the PCNN to improve the SF algorithm by simulating the biological processes of human vision. The binarized salient region is extracted through multiple iterations of the improved PCNN. To make the ignition range more exact, we use an adaptive thresholding method to improve the ignition pulse unit of the original PCNN model. To evaluate the performance of the proposed method, we carried out qualitative and quantitative experiments on a standard image database, in which the extraction results of the algorithm are closer to the ground truth. In the salient region extraction test on real scene images, the algorithm also showed high accuracy, further demonstrating its effectiveness and efficiency. The method was run on a Pioneer3-DX. Experimental results demonstrate that the presented method can locate obstacles accurately and robustly and satisfies the real-time requirement.
In a subsequent study, we will take advantage of the information about the color and texture in significant areas of the original image to recover lost significant information to enable the extraction of the real scene to be closer to

Figure 1: Frame diagram of our model's application.

Figure 2: Effect map of units in ISRE model.

Figure 2(a) is the original image. The original image is first abstracted into perceptually homogeneous elements, as shown in Figure 2(b).
Figure 2(e) is the Original Salience Map (OSM) and Figure 2(f) is the Intensity Feature Map (IFM).
Figure 2(g) is the final salience map with our ISRE method.

Figure 3: Effect contrast map of ISRE and SF.

Figure 4: Contrast experiment among the seven existing algorithms, the ground truth, and ISRE.

Figure 6: Comparison of mean absolute error relative to ISRE and GT.

Figure 7: System hardware platform for environment exploration.

Figure 8: Experiment results of real world images with our algorithm.