A person tracking algorithm that fuses multicues based on patches is proposed to address the problems of distinguishing the person, occlusion, and illumination variations. A Kinect mounted on the robot provides color images and depth maps. A detector that represents a person by fusing multicues based on patches is proposed. The detector divides the person into many patches and represents each patch with depth-color histograms and depth-texture histograms. This appearance representation, which considers depth, color, and texture information, has strong discriminative power against occlusion, illumination changes, and pose variations. Considering the motion of both the robot and the person, a tracker called the motion extended Kalman filter (MEKF) is presented to predict the person’s position. The tracker’s result is treated as a candidate sample for the detector, and the detector’s result in turn serves as prior knowledge for the tracker; the detector and tracker thus complement each other and improve tracking performance. To drive the robot towards the given person precisely, a fuzzy based intelligent gear control strategy (FZ-IGS) is implemented. Experiments demonstrate that the proposed approach can track a person in a complex environment with strong performance.
With the increasing presence of robots in human environments, detecting and tracking a person is necessary in many applications, including surveillance, search and rescue, combat, and human assistance. Person detection and tracking are challenging computer vision tasks due to automatic initialization, pose variations, high computational cost, and occlusions in complicated environments [
In real-world settings, persons are nonrigid and difficult to track. To resolve this, an efficient representation is needed for a reliable appearance model. Color is widely used for modeling a target, and one of the best-known color-based object tracking methods is the mean shift algorithm [
To further eliminate the influence of the background, depth information captured from stereo cameras is employed. Depth information makes foreground-background segmentation straightforward [
Occlusion is a difficult problem in object tracking. To cope with the problem, patches based algorithms were proposed [
For a robot system in a cluttered environment, a continuous and stable controller is important for following a person. However, to the authors’ knowledge, most works have focused on the problem of target detection and tracking and rarely addressed the problem of designing a suitable controller for driving a robot [
In the past decades, person tracking systems using robots have achieved much improvement. However, the problems of distinguishing the person, occlusion, and safe following still exist. We address them by representing a person with multicues based on patches and designing a fuzzy based intelligent gear control strategy (FZ-IGS). The person detection algorithm includes a detector and a tracker. The detector divides a person into many patches and represents each patch using multicues including depth, color, and texture. The depth information, which indicates the person’s location, is combined with the color and texture features to generate depth-color histograms and depth-texture histograms, respectively. As tracking evolves, the detector adjusts the person’s size according to the depth information. By analyzing the depth histograms and the patches’ similarity with the given person, the detector can easily recognize occlusion and then decide whether to update the person’s appearance model and change the tracking strategy. Under partial occlusion, the detector recognizes the person using the patches that are not occluded. The tracker, called MEKF, is derived from the EKF by considering the motion of the robot and the person; it predicts the person’s position as a candidate sample for the detector. Finally, the FZ-IGS is designed to change the turning gain and linear velocity of the robot according to the person’s position relative to the robot, driving the robot towards the person continuously and stably.
The paper is organized as follows: the overview of the proposed method is discussed in Section
This section details the platform and the system overview for performing the person following task.
The platform used for performing the person following task is an American Mobile Robots Inc. Pioneer 3-DX equipped with a Kinect, illustrated in Figure. The Kinect comprises an RGB camera and a depth sensor with a sensor range of 1.2 m–3.5 m.
The platform for person tracking.
Using these sensors, the Kinect provides two kinds of images: a depth image and a color image. The depth image is obtained by the depth sensor, which contains a CMOS camera and an infrared projector. The infrared projector casts a speckle pattern onto the scene; the CMOS camera records the pattern, from which the depth image is derived. The color image is produced by the RGB camera with a resolution of
Given a stream of color images and depth maps, our goal is to continuously track a person. The overview of our system is presented in Figure
The overview of the system.
It has been reported that an appearance model built on a single feature often fails when the background is similar to the target. To handle this problem, we represent a person using multicues including depth, color, and texture; the detector can still recognize the person with one feature when the others become invalid. The depth feature, which easily discriminates the person from the background, is extracted to overcome background interference. Furthermore, the detector detects occlusion by considering the depth histograms and the patches’ appearance similarity and then adjusts the online update strategy.
The depth map, captured by the Kinect, provides 3D information about the environment and is invariant to illumination [
The depth is discretely distributed in
The illustration of the depth histogram. (a) The depth image, the blue rectangle is for the target. (b) The depth histogram for the depth image. (c) The depth histogram for the target in the blue rectangle.
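Since the depth-histogram construction is only sketched in the text, a minimal NumPy version may help. The bin count and the use of the Kinect's 1.2 m–3.5 m working range are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def depth_histogram(depth_map, num_bins=16, d_min=1.2, d_max=3.5):
    """Histogram of depth readings (meters) over the sensor's working
    range; num_bins and the range bounds are illustrative choices."""
    valid = depth_map[(depth_map >= d_min) & (depth_map < d_max)]
    hist, edges = np.histogram(valid, bins=num_bins, range=(d_min, d_max))
    return hist, edges

# Toy depth map: background near 3.0 m, a person-sized region at 1.5 m.
depth = np.full((48, 64), 3.0)
depth[10:40, 20:40] = 1.5
hist, edges = depth_histogram(depth)
person_bin = int(np.argmax(hist[: len(hist) // 2]))  # nearest strong mode
```

The person appears as a strong mode well separated from the background mode, which is what the target histogram in the figure illustrates.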
Furthermore, the person’s apparent size changes as his distance from the robot varies during tracking. An appearance model built with a fixed rectangle size will introduce background interference or lose important information when the distance changes. When the distance is large, the rectangle should be small to fit the person; when the person is close to the robot, the rectangle should be large. The depth information indicates these size changes: when the person’s position changes, the bin values of the depth histograms vary correspondingly. Thus, we adaptively adjust the rectangle’s size based on the depth histograms. The person’s current size is obtained as follows:
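The size-update equation itself is not reproduced above. As one plausible reading, a pinhole-style inverse-depth scaling can be sketched; the inverse-proportional rule and the rounding are assumptions, not the paper's formula.

```python
def adapt_rect_size(w_ref, h_ref, d_ref, d_cur):
    """Scale the tracking rectangle inversely with depth, as in a
    pinhole camera model: the farther the person, the smaller the box.
    This rule stands in for the paper's omitted size-update equation."""
    s = d_ref / d_cur
    return round(w_ref * s), round(h_ref * s)

# Person modeled at 2.0 m with a 60x120 box, now detected at 3.0 m.
w, h = adapt_rect_size(60, 120, 2.0, 3.0)
```

Moving from 2.0 m to 3.0 m shrinks the box; moving closer enlarges it, matching the behavior described above.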
In order to discriminate a given person successfully, multicues are employed for representing the person. Color has proven useful for modeling a target; compared with other features, it is insensitive to scale and translation and has therefore been widely adopted for target representation. Texture, another effective descriptor, captures the spatial properties of pixels. To obtain a more powerful representation, color and texture are combined for modeling a target.
Traditional color and texture based object representation discriminates a person successfully when there is color or texture clutter in the background [
The person is represented by modeling each patch with the obtained depth-color and depth-texture histograms. The color histograms describe the target globally, while the texture histograms capture the image’s local texture; the two features complement each other. The depth information, which separates the target from the background, copes with color or texture clutter in the background.
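The exact construction of the depth-color histogram is not given above; one plausible reading is a color histogram restricted to pixels whose depth lies in the person's depth interval, sketched below. Replacing the color channels with a texture response (e.g., an LBP map) would give the analogous depth-texture histogram.

```python
import numpy as np

def depth_color_histogram(color_patch, depth_patch, d_lo, d_hi, bins=8):
    """Per-channel color histograms restricted to pixels whose depth
    lies in [d_lo, d_hi] -- an assumed reading of the 'depth-color
    histogram', since the paper's exact construction is not shown."""
    mask = (depth_patch >= d_lo) & (depth_patch <= d_hi)
    parts = []
    for c in range(color_patch.shape[-1]):
        h, _ = np.histogram(color_patch[..., c][mask], bins=bins, range=(0, 256))
        parts.append(h)
    h = np.concatenate(parts).astype(float)
    return h / max(h.sum(), 1.0)  # normalize for later cosine comparison

# A 4x4 patch: one corner pixel belongs to the background (3.0 m).
color = np.zeros((4, 4, 3), dtype=np.uint8)
color[..., 0] = 100
depth = np.full((4, 4), 1.5)
depth[0, 0] = 3.0
h = depth_color_histogram(color, depth, 1.2, 2.0)
```

The depth mask keeps the background pixel out of the person's color statistics, which is the benefit claimed for the fused representation.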
As tracking evolves, there may be occlusion which will result in tracking failure. The patches based tracking algorithm was proposed to deal with the problem [
The illustration of depth information when there is occlusion.
For a person tracking system, the person’s depth usually lies in the last bin of the depth histogram, well separated from the background. When the person is occluded, the last two bins become adjacent: the last bin belongs to the passerby, while the second-to-last bin belongs to the person. As shown in Figure
In an ideal tracking process, the bin for the person maintains stability. The depth feature similarity is calculated as
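The bin bookkeeping described above can be sketched as a simple check for a new depth mode appearing in front of the person. Lower bin index = nearer depth is assumed here, and `min_count` is an assumed significance threshold; neither convention is stated in the text.

```python
import numpy as np

def detect_occlusion(hist, person_bin, min_count=50):
    """Flag occlusion when a significant depth mode appears nearer to
    the camera than the tracked person's bin (a passerby crossing in
    front), mirroring the adjacent-bin situation described in the text."""
    nearer = hist[:person_bin]
    if nearer.size and nearer.max() >= min_count:
        return True, int(np.argmax(nearer))  # the occluder's depth bin
    return False, None

clear = np.array([0, 0, 0, 0, 500, 0, 0, 0])       # person alone, bin 4
occluded = np.array([0, 0, 300, 0, 500, 0, 0, 0])  # occluder at bin 2
```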
Compared with a single feature, representing a person with several different features improves the model’s discrimination ability: once one feature fails to discriminate the person, the others remain valid. The person is represented by many patches, which are sorted in decreasing order based on the depth-color histograms and depth-texture histograms, respectively. For a given threshold th, the detector recognizes the person according to appearance similarity. Normally, the candidate sample with the maximum overall similarity and over
The patch’s similarity between the candidate sample and the person is measured by using the cosine similarity metric:
The similarity between the candidate sample and the model is
The overall similarity is
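The cosine metric named above is standard; how the per-patch scores are combined into the overall similarity is not spelled out, so the uniform-weight mean below is an assumption.

```python
import numpy as np

def cosine_similarity(h1, h2):
    """Cosine similarity between two histogram vectors, as used for the
    patch comparison in the text."""
    n1, n2 = np.linalg.norm(h1), np.linalg.norm(h2)
    if n1 == 0.0 or n2 == 0.0:
        return 0.0
    return float(np.dot(h1, h2) / (n1 * n2))

def overall_similarity(patch_sims, weights=None):
    """Combine per-patch similarities into one score. A weighted mean
    with uniform default weights stands in for the paper's (unstated)
    combination rule."""
    sims = np.asarray(patch_sims, dtype=float)
    w = np.ones_like(sims) if weights is None else np.asarray(weights, dtype=float)
    return float(np.sum(w * sims) / np.sum(w))
```

Down-weighting patches flagged as occluded would let the detector score a candidate from the visible patches only, as described for partial occlusion.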
The EKF is a set of mathematical equations providing an efficient solution to prediction problems and is effective for handling short-term occlusion during tracking. However, for a person tracking system on a mobile robot, the EKF often fails to predict the person accurately because the robot and the person are moving simultaneously. To deal with this, we present a tracker called the motion EKF (MEKF) that combines the motion of the robot and the person:
Considering the robot’s motion, the state transition function is
The state and observation equations of the MEKF are obtained by considering the motion of both the robot and the person. Compared with the EKF, the MEKF introduces the robot’s trajectory to improve tracking robustness. Moreover, the tracking result becomes a sample in the detector’s candidate set, and the detector recognizes the person from this set, including the tracking result. The detector and tracker complement each other, improving person detection.
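The prediction step can be sketched with a constant-velocity person model whose predicted position is compensated by the robot's own displacement from odometry. The state layout, sampling period, and the omission of the robot's rotation are simplifying assumptions for illustration.

```python
import numpy as np

dt = 0.1  # illustrative sampling period
# Constant-velocity person model: state x = [px, py, vx, vy] (robot frame).
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)

def mekf_predict(x, P, Q, robot_disp):
    """EKF-style prediction with the robot's own translation (from
    odometry) subtracted from the person's predicted position -- the
    'motion' extension described in the text, with the robot's rotation
    ignored here for brevity."""
    x_pred = F @ x
    x_pred[:2] -= robot_disp  # compensate the robot's ego-motion
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

# Person 1 m ahead, walking at 0.5 m/s; robot advanced 0.05 m this step.
x, P, Q = np.array([1.0, 2.0, 0.5, 0.0]), np.eye(4), 0.01 * np.eye(4)
x_pred, P_pred = mekf_predict(x, P, Q, np.array([0.05, 0.0]))
```

Because the robot advanced exactly as far as the person walked in this step, the predicted relative position is unchanged, which a plain EKF ignoring ego-motion would get wrong.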
The proposed tracking framework is detailed in Figure.
Input: the depth image and the color image.
(1) Compute the depth histograms of the depth image.
(2) Divide the depth image and color image into patches.
(3) For a new frame, obtain candidate samples around the previous result; the MEKF predicts the person’s position, which is treated as an additional sample for the detector.
(4) Extract the candidate samples’ depth histograms and divide their depth and color images into patches; extract each patch’s depth-color histograms and depth-texture histograms and compute the patches’ similarity for both.
(5) Sort the patches in decreasing order of similarity; determine occlusion from the depth histograms and the patches’ similarity, and then detect the person accordingly.
Output: the person’s position.
Illumination changes and pose variations may cause appearance variation. To cope with this, an efficient update strategy is needed to adapt the appearance model after the person is detected. The update strategy updates the person’s appearance model according to the patches’ similarity under different tracking circumstances:
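The similarity-gated update can be sketched as a linear blend that is frozen when the detection is untrustworthy. The threshold and learning rate below are assumed values, not the paper's.

```python
def update_model(model, observation, sim, th_update=0.8, alpha=0.1):
    """Blend the stored appearance histogram towards the new observation
    only when the detection is trusted (similarity above th_update);
    under suspected occlusion the old model is kept unchanged."""
    if sim < th_update:
        return model  # unreliable (e.g., occluded): freeze the model
    return [(1 - alpha) * m + alpha * o for m, o in zip(model, observation)]
```

Freezing the model under occlusion prevents the passerby's appearance from contaminating the person's model, which is the failure the update strategy guards against.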
Normally, the
Our goal is to design an efficient controller that drives the robot towards a given person while keeping a safe distance from him. To follow the person smoothly and continuously, an intelligent gear control strategy (IGS) was presented [
The path of the robot towards the person.
For path B, the velocities of the robot’s wheels are computed as follows:
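The path-B formulas themselves are not reproduced above. As a generic illustration of how a linear velocity and a turning gain map to wheel speeds on a differential-drive base, consider the sketch below; the proportional turning law and the wheel-base value are assumptions, not the paper's geometry.

```python
def wheel_velocities(v, k_turn, theta, wheel_base=0.38):
    """Differential-drive wheel speeds for linear velocity v and an
    angular rate proportional to the bearing error theta (radians).
    k_turn plays the role of the turning gain; wheel_base is an assumed
    track width for a Pioneer 3-DX class robot."""
    w = k_turn * theta                 # angular velocity command
    v_left = v - w * wheel_base / 2.0
    v_right = v + w * wheel_base / 2.0
    return v_left, v_right
```

With the person dead ahead the wheels run at equal speed; a positive bearing error speeds up the outer wheel so the robot arcs towards the person.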
As following evolves, the turning-gain
Our task is to keep the robot at a safe distance from the person while both are moving. The distance between them varies due to their motions. To achieve successful tracking, the robot should change its linear velocity according to the distance obtained from the detector. Therefore, a fuzzy based linear velocity controller is designed to adaptively adjust the robot’s velocity.
For the controller, the distance
The membership functions for linear velocity controller and turning-gain controller.
The membership functions for the velocity controller
The membership functions for the turning-gain controller
The fuzzy logic is established based on human knowledge, as shown in Table
The fuzzy logic for velocity controller.
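As an illustration of such a rule base, a Sugeno-style controller with three distance labels can be sketched; the triangular membership breakpoints and the output speeds are assumptions, not the paper's tuned values from the table.

```python
def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzy_velocity(distance):
    """Rule base: Near -> stop, Medium -> cruise, Far -> speed up.
    Defuzzified by the weighted average of rule outputs."""
    rules = [  # (firing degree, output velocity in m/s)
        (tri(distance, 0.0, 1.2, 2.0), 0.0),   # Near: keep safe distance
        (tri(distance, 1.2, 2.0, 3.0), 0.4),   # Medium: cruise
        (tri(distance, 2.0, 3.5, 5.0), 0.8),   # Far: catch up
    ]
    den = sum(mu for mu, _ in rules)
    return sum(mu * v for mu, v in rules) / den if den > 0 else 0.0
```

Overlapping memberships make the commanded velocity vary smoothly with distance instead of switching abruptly between gears, which is the motivation for the fuzzy controller.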
As following evolves, the person often wanders from the center of the robot’s field of view. In such cases, the robot should change its turning radius in time to keep the person centered in its field of view. To implement this, a fuzzy based turning-gain controller is designed, in which the robot’s turning gain is adjusted according to the direction of the person relative to the robot and the person’s horizontal velocity.
The inputs for the fuzzy based turning-gain controller are the direction
The fuzzy logic is designed according to human knowledge to determine the robot’s turning-gain
The fuzzy logic for turning gain controller.
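The two-input rule base can be sketched in the same Sugeno style: rules fire on the magnitude of the bearing error and of the person's sideways speed, with the firing strength taken as the minimum of the two memberships. All labels, breakpoints, and gain values are illustrative assumptions, not the entries of the table above.

```python
def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzy_turn_gain(theta, vx):
    """Turning gain from bearing error theta (rad) and the person's
    horizontal velocity vx (m/s): larger error or faster sideways
    motion raises the gain so the robot recenters the person."""
    small_e = tri(abs(theta), -0.1, 0.0, 0.5)
    large_e = tri(abs(theta), 0.2, 1.0, 2.0)
    slow_v = tri(abs(vx), -0.1, 0.0, 0.6)
    fast_v = tri(abs(vx), 0.3, 1.0, 2.0)
    rules = [  # (firing degree = min of memberships, output gain)
        (min(small_e, slow_v), 0.5),  # centered, calm person: gentle gain
        (min(small_e, fast_v), 1.5),  # person cutting sideways: raise gain
        (min(large_e, slow_v), 1.5),  # large error: raise gain
        (min(large_e, fast_v), 2.5),  # large error and fast motion: max gain
    ]
    den = sum(mu for mu, _ in rules)
    return sum(mu * g for mu, g in rules) / den if den > 0 else 1.0
```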
Our person tracking algorithm is evaluated on the Pioneer 3-DX robot.
In this set of experiments, our method is compared with the color-texture based object representation algorithm [
The tracking results using the CT algorithm and our method when the user is moving but the robot is stationary.
In this section, our method is evaluated on a moving robot. As tracking evolves, there are occlusions, turns, appearance changes, and motion of both the robot and the given person. The tracking results are shown in Figure
The tracking results using the CT algorithm and our method when both the user and the robot are moving.
In this section, the performance of the presented FZ-IGS is evaluated. The given person’s 3D position (
The robot’s path for following the target in the lab.
In this paper, we developed a new person tracking algorithm for a mobile robot. The paper makes four contributions. The first is the person representation algorithm based on the fusion of multicues, including depth-color histograms and depth-texture histograms: the color and texture information complement each other, improving the appearance model’s discrimination ability, while the depth information easily separates the person from the background. The second is the patches based detection algorithm, which divides the person into many patches and handles partial occlusion by analyzing the similarity of the unoccluded patches. The third is the tracker MEKF, which considers the motion of both the robot and the person. The fourth is the fuzzy based intelligent gear control strategy (FZ-IGS), which adaptively changes the linear velocity and turning gain according to the person’s position obtained from the detector. The experimental results demonstrate that the proposed method tracks a person robustly and accurately. In the future, we will study obstacle avoidance during tracking.
The authors declare that there is no conflict of interests regarding the publication of this paper.
The research work is supported by the National Natural Science Foundation of China (61175087), the Youth Foundation of the Hebei Education Department (QN2014170), and the Hebei Province Science and Technology Support Program (14275601D).