Segmenting the human hand is important in computer vision applications such as sign language interpretation, human-computer interaction, and gesture recognition. However, hand localization systems still face serious bottlenecks, such as capturing fast hand motion, hands passing over the face, and hand occlusions, which are the focus of this paper. We present a novel method for hand tracking and segmentation based on augmented graph cuts and a dynamic model. First, an effective dynamic model for state estimation is built, which predicts the location of hands that may undergo fast motion or shape deformation. Second, new energy terms are added to the energy function to develop augmented graph cuts based on three cues: spatial information, hand motion, and chamfer distance. The proposed method achieves hand segmentation even when the hand passes over other skin-colored objects. Challenging videos covering hand over face, hand occlusions, dynamic background, and fast motion are provided. Experimental results demonstrate that the proposed method is much more accurate than other graph cuts-based methods for hand tracking and segmentation.

Recent papers distinguish four main kinds of object tracking methods: point, skeleton, contour, and silhouette tracking [

In the last decade [

In recent years, graph cuts-based methods have been applied in tracking or segmentation systems. Xu and Ahuja [

Hand tracking is a challenging problem because the hand presents 27 degrees of freedom (DOFs), including 21 DOFs for the joint angles and 6 DOFs for orientation and location [

To avoid the degeneracy problem of interest points [

An augmented graph cuts method is introduced to track and segment hand regions, with different hands labelled in different colors.

The proposed method can track and segment hands in challenging conditions, such as overlapping hands, fast hand motion, and hand over face. It can also track and segment hands against dynamic backgrounds where skin-colored objects may be present.

The framework of our method is shown in Figure

The framework of our method for hand tracking and segmentation.

The rest of the paper is organized as follows. We describe basic notions of multiobject tracking based on graph cuts in Section

Here, we describe the basic principle of graph cuts-based methods for object tracking and segmentation. We first review image segmentation via graph cuts. Then, we describe object tracking via graph cuts and a dynamic model.

We briefly outline the multilabel graph cuts technique. Detailed information can be found in [

The smooth term
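As a generic illustration of an energy of this kind, the following minimal sketch evaluates a labelling cost made of a per-pixel data term plus a pairwise smooth term. The Potts penalty on 4-connected neighbours, the function name `potts_energy`, and the parameter `smooth_weight` are assumptions chosen for illustration; they are not necessarily the terms used in this paper.

```python
def potts_energy(labels, data_cost, smooth_weight=1.0):
    """Energy of a labelling: data costs plus an assumed Potts smooth term.

    labels[y][x] is the label index of pixel (x, y);
    data_cost[y][x][l] is the cost of assigning label l to that pixel.
    """
    height, width = len(labels), len(labels[0])
    energy = 0.0
    for y in range(height):
        for x in range(width):
            # Data term: cost of the label chosen at this pixel.
            energy += data_cost[y][x][labels[y][x]]
            # Smooth term: count each 4-connected pair once (right, down)
            # and penalize pairs whose labels differ.
            if x + 1 < width and labels[y][x] != labels[y][x + 1]:
                energy += smooth_weight
            if y + 1 < height and labels[y][x] != labels[y + 1][x]:
                energy += smooth_weight
    return energy
```

A graph cuts solver searches for the labelling that minimizes such an energy; this sketch only scores a given labelling, which is useful for checking a solver's output on small inputs.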

Suppose

In [

Although the methods [

(a) Initialization at time

Suppose that

When the segmentation result

To compute unknown mean velocity

(a) Some points are out of hand range via optical flow at

To compute the unknown velocities, a set of interest points is considered. At time

And the mean velocity

From (

In order to capture fast hand motion, we can set a large value to

As time goes on, the number of interest points may go down via optical flow (see the second row in Figure
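The role of the interest points in the prediction step can be sketched as follows. This is a minimal illustration under stated assumptions: the mean velocity is taken as the plain average displacement of matched interest points between two frames, and the predicted region is the previous hand mask shifted by that velocity. The function names `mean_velocity` and `predict_region` are hypothetical, and the point matching itself (for example by optical flow) is assumed to have been done beforehand.

```python
def mean_velocity(prev_points, curr_points):
    """Average displacement (dx, dy) of matched interest points."""
    if not prev_points or len(prev_points) != len(curr_points):
        raise ValueError("need equally sized, non-empty point lists")
    n = len(prev_points)
    dx = sum(c[0] - p[0] for p, c in zip(prev_points, curr_points)) / n
    dy = sum(c[1] - p[1] for p, c in zip(prev_points, curr_points)) / n
    return dx, dy


def predict_region(pixels, velocity):
    """Shift every (x, y) pixel of the previous mask by the rounded velocity."""
    vx, vy = round(velocity[0]), round(velocity[1])
    return {(x + vx, y + vy) for x, y in pixels}


# Example: two points that both moved by (+3, +1) between frames.
v = mean_velocity([(0, 0), (2, 2)], [(3, 1), (5, 3)])
predicted = predict_region({(0, 0), (1, 0)}, v)
```

Because the average is taken over all surviving points, a shrinking point set makes the estimate less reliable, which is why the points need to be refreshed when too few remain.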

In this work, we adopt the idea of the work [

Here,

Now we explain how to define the new terms and incorporate them into the energy function. These new terms are the core of augmented graph cuts.

Owing to the similar color of human skin, it is difficult to eliminate the effect of each hand by the works [

As illustrated in Figure

Illustration of computing function

In (

Using the motion information allows us to reject some bad segmentations when hands pass over skin-colored objects. When a pixel

The above defined terms are based on motion information and the prediction set

(a) Source image and (b) result by chamfer distance transform.
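For readers unfamiliar with the chamfer distance transform, a minimal sketch of the classic two-pass 3-4 variant is given below. The 3-4 weights and the function name `chamfer_distance_transform` are assumptions for illustration; the text does not state which chamfer mask or implementation was used.

```python
INF = float("inf")


def chamfer_distance_transform(edges):
    """Two-pass 3-4 chamfer transform (assumed weights).

    edges[y][x] is truthy for feature (edge) pixels. Each output cell holds
    roughly 3 times the Euclidean distance to the nearest feature pixel.
    """
    h, w = len(edges), len(edges[0])
    dist = [[0 if edges[y][x] else INF for x in range(w)] for y in range(h)]

    def relax(y, x, offsets):
        # Take the cheapest path through already processed neighbours.
        best = dist[y][x]
        for dy, dx, cost in offsets:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                best = min(best, dist[ny][nx] + cost)
        dist[y][x] = best

    # Forward pass uses left and upper neighbours; backward pass uses
    # right and lower neighbours. Axial steps cost 3, diagonal steps cost 4.
    forward = [(0, -1, 3), (-1, -1, 4), (-1, 0, 3), (-1, 1, 4)]
    backward = [(0, 1, 3), (1, 1, 4), (1, 0, 3), (1, -1, 4)]
    for y in range(h):
        for x in range(w):
            relax(y, x, forward)
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            relax(y, x, backward)
    return dist
```

On a 3 x 3 grid with a single feature pixel at the centre, the axial neighbours receive 3 and the diagonal neighbours receive 4.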

We merge all of the terms mentioned above. The hand tracking problem then amounts to minimizing the following energy function, which consists of six terms:

Compared with the energy function equations (

We have described the principle of our method for tracking and segmenting hands in different circumstances. Hand tracking and segmentation proceeds in four steps. First, the initial segmentations of all tracked hands are provided manually at time

(i)

(ii)

(iii)

(iv) Manually initialize the sets

(v) Find interest points

For at time

(i) Find interest points

(ii) If

(iii) Compute the hands' mean velocity

(iv) Predict the sets

Build the graph and apply

If

Update interest points in the region

If

Return to Step 2.

To validate and evaluate the proposed approach, we provide four videos (three videos were captured by our webcam and one video is an American Sign Language (ASL) video provided by the Purdue ASL database [

The proposed method is implemented in Microsoft Visual Studio 2008. All videos are tested on a Core 2 Duo P8600 processor with 2 GB RAM. The initialization segmentations (at time

This video has 141 frames and the frame size is 320

In Figure

(a) Results by the method [

Now we give an example to demonstrate that our method can achieve hand segmentation even when hands pass over skin-colored objects such as the face. Video 2 was recorded outdoors and contains 106 frames. The frame size is 640

As shown in Figures

Result by [

Result by our method (initialization at time

Video 3 is from the Purdue ASL database [

Result by our method (initialization at time

To further evaluate the effectiveness of the proposed method in complex situations, we test it against a dynamic background. Video 4 was captured in a lab environment and contains 174 frames with frame size 320

Result by our method (initialization at time

The energy minimizing function in (

Here, we give some hints on adjusting the parameters of the proposed method. The spatial parameter

The parameter

Results with a small value

To perform an objective comparison, we first manually segment the hand masks (ground truth) for every frame of our test videos. Then we calculate the mean percentage error (MPE) [

When hands and other skin-colored objects are in the same scene (hand over face, hand occlusions), the MPEs of our method are much lower than those of the method [

The MPE of our method on video 4 (0.1714%) is close to that of the ground truth (0%), which indicates that the proposed method is well suited to hand tracking and segmentation in sign language video.
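To make the metric concrete, the sketch below computes an MPE under an assumed definition: for each frame, the percentage of pixels whose label differs from the manually segmented ground truth, averaged over all frames. The exact definition is given in the cited reference and may differ from this one; the function name `mean_percentage_error` is hypothetical.

```python
def mean_percentage_error(predicted_masks, truth_masks):
    """Average per-frame percentage of mislabelled pixels (assumed definition).

    Each mask is a list of rows of labels; frames are compared pairwise.
    """
    if not predicted_masks or len(predicted_masks) != len(truth_masks):
        raise ValueError("need equally sized, non-empty frame lists")
    errors = []
    for pred, truth in zip(predicted_masks, truth_masks):
        total = sum(len(row) for row in truth)
        wrong = sum(
            p != t
            for pred_row, truth_row in zip(pred, truth)
            for p, t in zip(pred_row, truth_row)
        )
        errors.append(100.0 * wrong / total)
    return sum(errors) / len(errors)
```

Under this definition a perfect segmentation scores 0%, and one wrong pixel in a 2 x 2 frame contributes 25% for that frame.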

Comparisons of the methods on MPE and AET.

Next, we give the running times of both the proposed method and the method [

In this paper, we present a method based on augmented graph cuts and a dynamic model for hand tracking and segmentation in different environments. The proposed algorithm addresses three problems: fast hand motion capture, hand occlusions, and hand over face. We reformulate the energy function by adding new energy terms that make hand tracking and segmentation more robust. These new terms also handle occlusions and yield accurate segmentation.

Several aspects of the method can still be improved. First, we can develop a method to extract the hand region automatically instead of relying on manually segmented hands in the initialization step. For instance, we can apply the AdaBoost algorithm [

The authors declare that there is no conflict of interest regarding the publication of this paper.

This work was supported in part by the National Natural Science Foundation of China (61172128), National Key Basic Research Program of China (2012CB316304), New Century Excellent Talents in University (NCET-12-0768), the Fundamental Research Funds for the Central Universities (2013JBZ003), Program for Innovative Research Team in University of Ministry of Education of China (IRT201206), Beijing Higher Education Young Elite Teacher Project (YETP0544), and Research Fund for the Doctoral Program of Higher Education of China (20120009110008).