Automatic Segmentation of Nature Object Using Salient Edge Points Based Active Contour



Introduction
Object segmentation is one of the most important and challenging issues in image analysis and computer vision research. It facilitates a number of high-level applications, such as object recognition, image retrieval, image editing, and scene reconstruction [1,2]. Most existing object segmentation systems adopt interaction-based paradigms [3,4]; that is, users are asked to provide segmentation cues manually and carefully.
Although interaction-based methods are promising, they all share a critical limitation: they require the user to express the semantic intention explicitly. Such manual labeling is time consuming and often infeasible. Moreover, the segmentation performance depends heavily on the user-specified seed locations, so additional interactions are necessary when the seeds are not provided accurately. In particular, localizing region-based active contour (LRAC) [5] is a classic interaction-based method whose results depend heavily on the selection of the initial contour: it requires a user-specified initial contour that lies close to the object boundary.
For this reason, a sophisticated, fully automatic object segmentation method is in strong demand. The human brain and visual system can effortlessly grasp salient regions in cluttered scenes. Because, in most circumstances, the salient parts of an image coincide with the objects of interest to be segmented, several methods have attempted to estimate these salient regions. In contrast with interaction-based approaches that specify the object and background seeds by manual labeling, some methods (e.g., Fu's method [6] and Achanta's method [7]) determine the seed locations based on a visual attention model. Since the accuracy of the visual attention model plays a crucial role in object segmentation, these algorithms also depend on the quality of the chosen saliency map. In other words, the worse the chosen saliency map is, the worse the final extraction result is.
Mathematical Problems in Engineering
To remedy this shortcoming, we focus on salient object edge points rather than on the saliency map itself. Once the salient object edge points are detected, we obtain the region enclosed by these points. The boundary of this region is close to the object edge, so it is used as the initial contour of the LRAC model.
In our method, salient edge points are first detected in the input image by the color boosting Harris detector. We then find salient object seeds using the core saliency map, and the salient object edge points are determined from these seeds. An initial contour is then created automatically by applying the convex hull algorithm to the salient object edge points. Finally, the object is extracted accurately by the LRAC method, starting from this initial contour.
The remainder of this paper is organized as follows. Section 2 reviews related work on saliency models and an interactive image segmentation method. Section 3 presents the proposed salient edge point based active contour algorithm for natural object segmentation. Section 4 reports extensive experimental comparisons. Finally, Section 5 draws conclusions.

The State-of-the-Art Automatic Image Segmentation Methods.
In [6], Fu et al. proposed an automatic object segmentation approach, referred to as Fu's method, that integrates saliency detection with graph cuts [8] to overcome the disadvantages of interactive graph cuts. They also explored the effect of labels on graph-based segmentation: so-called "Professional Labels" are introduced to evaluate labels, and a multiresolution framework is designed to provide such "Professional Labels" automatically. This method obtains quite complete object segmentations, comparable to interactive graph cuts with manual "Professional Labels." Achanta's method [7] is also an automatic image segmentation method. It oversegments the input image using the mean-shift algorithm and retains only those segments whose average saliency is greater than an adaptive threshold. A binary map representing the salient object is then obtained by assigning ones to the pixels of the chosen segments and zeros to the remaining pixels.
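The segment-selection step of Achanta's method can be sketched as follows. This is a minimal sketch, assuming the mean-shift oversegmentation is already available as an integer label map, and assuming the adaptive threshold is twice the mean saliency of the image (the rule used in the frequency-tuned work of [7]); the function name is hypothetical.

```python
import numpy as np


def select_salient_segments(labels: np.ndarray, saliency: np.ndarray) -> np.ndarray:
    """Binary object map from an oversegmentation and a saliency map.

    labels:   H x W integer segment ids (e.g., from mean-shift; assumed precomputed).
    saliency: H x W saliency values.
    A segment is kept when its mean saliency exceeds the adaptive threshold,
    assumed here to be twice the mean saliency of the whole image.
    """
    threshold = 2.0 * saliency.mean()
    mask = np.zeros(labels.shape, dtype=np.uint8)
    for seg_id in np.unique(labels):
        region = labels == seg_id
        if saliency[region].mean() > threshold:
            mask[region] = 1  # ones for pixels of chosen segments, zeros elsewhere
    return mask
```

Because the threshold adapts to the overall saliency level, the same rule can be applied to images whose saliency maps have very different dynamic ranges.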
These two methods are fully automatic and involve no manual interaction. Fu's method is based on graph cuts, while Achanta's method uses the mean-shift algorithm. However, LRAC has several desirable advantages over graph cuts and the mean-shift algorithm. First, LRAC can achieve subpixel accuracy of object boundaries [5]. Second, LRAC can easily be formulated within a principled energy minimization framework that allows various kinds of prior knowledge to be incorporated for robust image segmentation. Third, LRAC provides smooth, closed contours as segmentation results, which can be used directly in further applications such as shape analysis and recognition.

Localizing Region-Based Active Contour Model.
In [5], Lankton and Tannenbaum proposed a natural framework that allows any region-based segmentation energy to be reformulated in a local way.
In general, this algorithm can reliably extract the object contour if the user provides appropriate markers. However, as an interactive segmentation algorithm, it is sensitive to the position and quality of the user inputs (see an example in Figure 1).
Here, following [5], we choose a more complex energy that looks past simple means and compares the full histograms of the foreground and background. Let $P_{\text{in}}(z)$ and $P_{\text{out}}(z)$ be two smoothed intensity histograms computed from the global interior and exterior regions of a partitioned image using a fixed number of intensity bins. We adopt the histogram separation energy used by Wen et al. [9]:
$$E = \mathrm{BC}(P_{\text{in}}, P_{\text{out}}) = \int_{z} \sqrt{P_{\text{in}}(z)\, P_{\text{out}}(z)}\, dz,$$
where BC is the Bhattacharyya coefficient used to compare probability density functions. Minimizing this energy makes the interior and exterior histograms as different as possible.
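The histogram separation energy can be checked numerically with a short sketch. It assumes intensities scaled to [0, 1], 32 bins, and Gaussian smoothing along the bin axis with a standard deviation of one bin (sigma must be positive); these values and the helper names are illustrative choices, not taken from [5] or [9].

```python
import numpy as np


def smoothed_histogram(values, n_bins=32, sigma=1.0):
    """Normalized intensity histogram on [0, 1], smoothed by a Gaussian over bins."""
    hist, _ = np.histogram(values, bins=n_bins, range=(0.0, 1.0))
    hist = hist.astype(float)
    radius = int(3 * sigma)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-offsets ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    hist = np.convolve(hist, kernel, mode="same")  # keeps n_bins entries
    total = hist.sum()
    return hist / total if total > 0 else hist


def bhattacharyya(p, q):
    """Bhattacharyya coefficient: sum over bins of sqrt(p * q)."""
    return float(np.sum(np.sqrt(p * q)))


def histogram_separation_energy(image, inside):
    """E = BC(P_in, P_out) for a boolean partition `inside` of the image."""
    p_in = smoothed_histogram(image[inside])
    p_out = smoothed_histogram(image[~inside])
    return bhattacharyya(p_in, p_out)
```

Identical interior and exterior histograms give BC = 1 (maximal overlap), while histograms with no common support give 0, which is why driving E down separates the two regions.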
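In [5], Lankton and Tannenbaum introduced a function $\mathcal{B}(x,y)$ to mask local regions. It equals 1 when the point $y$ lies within a ball of radius $r$ centered at $x$, and 0 otherwise:
$$\mathcal{B}(x,y) = \begin{cases} 1, & \|x - y\| < r, \\ 0, & \text{otherwise}. \end{cases}$$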
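Accordingly, the corresponding internal energy $F$ is formed by localizing the histogram separation energy:
$$F = \int_{z} \sqrt{P_{\text{in},x}(z)\, P_{\text{out},x}(z)}\, dz,$$
where $P_{\text{in},x}(z)$ and $P_{\text{out},x}(z)$ represent the intensity histograms in the local image regions $\mathcal{B}(x,y) \cdot H\phi(y)$ and $\mathcal{B}(x,y) \cdot (1 - H\phi(y))$, respectively; $H\phi$ is the smoothed Heaviside function of the level set function $\phi$, and $\delta\phi$ is its derivative. Following the general localized framework of [5], we obtain a local region-based flow of the form
$$\frac{\partial \phi}{\partial t}(x) = \delta\phi(x) \int_{\Omega_y} \mathcal{B}(x,y)\, \delta\phi(y)\, \nabla_{\phi(y)} F\, dy + \lambda\, \delta\phi(x)\, \operatorname{div}\!\left( \frac{\nabla \phi(x)}{|\nabla \phi(x)|} \right),$$
where $\nabla_{\phi(y)} F$ is the derivative of the localized histogram separation energy with respect to $\phi(y)$, whose expression involves a Gaussian kernel $g$ and the local areas defined below; $\lambda$ is a parameter that weights the length of the curve; $\Omega_y$ denotes a bounded open subset of $\mathbb{R}^2$; and $A_{\text{in}}$ and $A_{\text{out}}$ are the areas of the local interior and local exterior regions, respectively, given by
$$A_{\text{in}} = \int_{\Omega_y} \mathcal{B}(x,y)\, H\phi(y)\, dy, \qquad A_{\text{out}} = \int_{\Omega_y} \mathcal{B}(x,y)\, (1 - H\phi(y))\, dy.$$
Figure 1: Interactive image segmentation by LRAC [5] with user-specified strokes of the object (green). First row: input image. Second row: three different user-specified inputs. Third row: the corresponding segmented objects with respect to different user-specified inputs.
As noted above, the original interactive algorithm is sensitive to the position and number of the user inputs. Even when many markers are used to cover the object features, the results in some regions are unsatisfactory (see the third row of Figure 1). Moreover, providing such inputs is tedious and time consuming in some cases.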
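The local quantities at a single contour point $x$ (the ball mask, the local areas $A_{\text{in}}$ and $A_{\text{out}}$, and the localized Bhattacharyya coefficient) can be sketched as below. The sketch assumes a sharp rather than smoothed Heaviside, passed in as a boolean `inside` mask, uses unsmoothed local histograms, and measures areas as pixel counts; it does not implement the full curve evolution, and the function names are hypothetical.

```python
import numpy as np


def ball_mask(shape, center, radius):
    """B(x, y) for fixed x = center: True where ||x - y|| < radius."""
    rows, cols = np.indices(shape)
    return (rows - center[0]) ** 2 + (cols - center[1]) ** 2 < radius ** 2


def local_region_stats(image, inside, center, radius, n_bins=32):
    """Return (A_in, A_out, local BC) at `center`.

    image:  H x W intensities in [0, 1].
    inside: H x W boolean mask playing the role of a sharp H(phi).
    BC is None when the ball does not straddle the contour.
    """
    ball = ball_mask(image.shape, center, radius)
    local_in = ball & inside     # B(x, y) * H(phi(y))
    local_out = ball & ~inside   # B(x, y) * (1 - H(phi(y)))
    a_in, a_out = int(local_in.sum()), int(local_out.sum())
    if a_in == 0 or a_out == 0:
        return a_in, a_out, None
    p_in = np.histogram(image[local_in], bins=n_bins, range=(0.0, 1.0))[0] / a_in
    p_out = np.histogram(image[local_out], bins=n_bins, range=(0.0, 1.0))[0] / a_out
    return a_in, a_out, float(np.sum(np.sqrt(p_in * p_out)))
```

Only points whose ball straddles the contour contribute to the flow, which is why the statistics are undefined away from the curve.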

Saliency Detection Models.
During the last two decades, visual saliency detection and saliency map generation, which aim to find out what attracts human attention, have attracted broad interest in computer vision, especially for detecting or recognizing objects in different scenes. Most computational models of attention follow the structure adapted from the Feature Integration Theory (FIT) [10] and the Guided Search model [11]. Saliency detection models fall into two general categories: local contrast based methods and global contrast based methods.
Local contrast based methods investigate the rarity of image regions with respect to (small) local neighborhoods. Based on the highly influential, biologically inspired early representation model introduced by Koch and Ullman [12], Itti et al. [13] define image saliency using center-surround differences across multiscale image features. Ma and Zhang [14] propose an alternative local contrast analysis for generating saliency maps, which is then extended using a fuzzy growth model. Harel et al. [15] normalize the feature maps of Itti et al. to highlight conspicuous parts and permit combination with other importance maps. Liu et al. [16] find multiscale contrast by linearly combining contrast in a Gaussian image pyramid. More recently, Goferman et al. [17] simultaneously model local low-level cues, global considerations, visual organization rules, and high-level features to highlight salient objects along with their contexts. Methods based on local contrast tend to produce higher saliency values near edges instead of uniformly highlighting salient objects.
Global contrast based methods evaluate the saliency of an image region using its contrast with respect to the entire image. Zhai and Shah [18] define pixel-level saliency based on a pixel's contrast to all other pixels. However, for efficiency, they use only luminance information, thus ignoring distinctiveness cues in other channels. Achanta et al. [7] propose a frequency-tuned (FT) method that directly defines pixel saliency using the color difference between a pixel and the average image color. This elegant approach, however, considers only the first-order average color, which can be insufficient to analyze the complex variations common in natural images. A recent, highly effective model proposed by Cheng et al. [19], named RC, computes the saliency map by evaluating global contrast differences based on histograms.
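As a rough illustration of the global-contrast idea behind FT [7], the sketch below scores each pixel by the distance between its slightly blurred color and the mean image color. Assumptions: it works directly in RGB (the original FT method uses the CIELab space) and uses a small 3 × 3 binomial blur; the function names are hypothetical.

```python
import numpy as np


def blur_binomial(channel):
    """3 x 3 binomial blur (separable [1, 2, 1] / 4) with edge padding."""
    k = (0.25, 0.5, 0.25)
    p = np.pad(channel, 1, mode="edge")
    rows = k[0] * p[:-2, :] + k[1] * p[1:-1, :] + k[2] * p[2:, :]
    return k[0] * rows[:, :-2] + k[1] * rows[:, 1:-1] + k[2] * rows[:, 2:]


def frequency_tuned_saliency(image):
    """FT-style saliency of an H x W x 3 image, normalized to [0, 1].

    Saliency = || mean image color - blurred pixel color ||, computed in RGB
    here for simplicity.
    """
    img = np.asarray(image, dtype=float)
    mean_color = img.reshape(-1, 3).mean(axis=0)
    blurred = np.stack([blur_binomial(img[..., c]) for c in range(3)], axis=-1)
    sal = np.linalg.norm(blurred - mean_color, axis=-1)
    span = sal.max() - sal.min()
    return (sal - sal.min()) / span if span > 0 else np.zeros_like(sal)
```

Because only the first-order mean color is used, a region is salient exactly when its color is far from the global average, which illustrates the limitation noted above for images with complex color variation.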
We compare five of the abovementioned state-of-the-art saliency detection methods; the results are shown in Figure 2.

The Proposed Method: LRACSEP
To address the issues pointed out in Section 2, this paper focuses on acquiring prior information automatically. In a saliency map, the intensity value of a pixel represents its saliency. In other words, for a typical image, pixels with high values in the corresponding saliency map are likely to be object pixels, whereas pixels with low values are likely to be background pixels. Inspired by this idea, we propose an approach called localizing region-based active contours via salient edge points (LRACSEP). The strategy is intended mainly to acquire prior information automatically instead of relying on user input.
Our purpose is to place the initial contour close to the object boundary. The color boosting Harris detector yields salient edge points, but these lie on both the object and the background. Consequently, we must first identify the salient object edge points among them. For this purpose, we propose the core saliency map. Because the initial contour of a level set must be a closed curve, we use a convex-hull polygon to enclose the detected salient object edge points.
A general schematic framework of the proposed method (LRACSEP) is depicted in Figure 3. The major steps are (i) detecting the salient edge points; (ii) obtaining the core saliency map; (iii) finding the core edge points corresponding to the core saliency map; (iv) detecting the salient object edge points based on the core saliency map; and (v) using the convex hull to generate the initial level set contour.

Salient Edge Points Detection via the Color Boosting Harris Detector.
Traditional luminance-based saliency detection methods tend to ignore color information completely and are therefore very sensitive to background noise. van de Weijer et al. [20] analyze the statistical distribution of color derivatives and propose a color saliency boosting function to enhance rare color edges or corners. Their goal is to incorporate color distinctiveness into salient point detection or, mathematically, to find the transformation for which vectors with equal information content have equal impact on the saliency function. The desired color saliency boosting function is
$$g(\mathbf{f}_x) = \Lambda\, h(\mathbf{f}_x),$$
where $\Lambda$ is a $3 \times 3$ diagonal matrix whose diagonal entries satisfy $\Lambda_{11}^2 + \Lambda_{22}^2 + \Lambda_{33}^2 = 1$, $h$ is one of the color transformations considered in [20] (e.g., $\tilde{S}$ or $\tilde{O}$), and $\mathbf{f} = (R, G, B)^T$ for a color image, with $\mathbf{f}_x$ its spatial derivative.
Meanwhile, the Harris corner detector [21] is a popular interest point detector because of its strong invariance to rotation and illumination variation and its robustness to image noise. The Harris detector has been shown to outperform other detectors in both "shape" distinctiveness and repeatability.
The Harris corner detector is based on the local autocorrelation function of a signal, which measures the local changes of the signal when a patch is shifted by a small amount in different directions. Because this function is computed from image derivatives, color saliency boosting can be applied to the Harris detector by replacing those derivatives with boosted color derivatives. As can be seen in Figure 4, compared with intensity-based feature detectors, the boosted color saliency points [20] are more stable and informative.
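A compact sketch of a color boosting Harris detector follows. It computes opponent-color derivatives, rescales each channel by a diagonal boosting matrix with unit-norm diagonal, builds the color structure tensor, and keeps local maxima of the Harris response. Assumption: van de Weijer et al. [20] derive the boosting weights from derivative statistics gathered over an image dataset; here, as a hypothetical simplification, the weights are the inverse per-image derivative spreads. The parameter values (k = 0.04, σ = 1.5) are common Harris defaults, not values from this paper, and the function names are hypothetical.

```python
import numpy as np


def smooth(a, sigma=1.5):
    """Separable Gaussian smoothing with edge padding (output keeps a's shape)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()

    def conv(v):
        return np.convolve(np.pad(v, radius, mode="edge"), k, mode="valid")

    a = np.apply_along_axis(conv, 0, a)
    return np.apply_along_axis(conv, 1, a)


def opponent(rgb):
    """Opponent color channels O1, O2, O3 of an H x W x 3 RGB array."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return np.stack([(r - g) / np.sqrt(2),
                     (r + g - 2 * b) / np.sqrt(6),
                     (r + g + b) / np.sqrt(3)], axis=-1)


def color_boosted_harris(rgb, k=0.04, sigma=1.5, n_points=50, rel_thresh=0.01):
    """Return up to n_points (row, col) corner locations, strongest first."""
    opp = opponent(np.asarray(rgb, dtype=float))
    gy, gx = np.gradient(opp, axis=(0, 1))  # per-channel spatial derivatives
    # Hypothetical per-image boosting: inverse derivative spread per channel,
    # normalized so that the squared diagonal entries of Lambda sum to 1.
    spread = np.sqrt(np.mean(gx ** 2 + gy ** 2, axis=(0, 1)))
    lam = np.where(spread > 1e-12, 1.0 / np.maximum(spread, 1e-12), 0.0)
    if not lam.any():
        return np.empty((0, 2), dtype=int)  # flat image: no corners
    lam /= np.linalg.norm(lam)
    gx, gy = gx * lam, gy * lam  # boosted derivatives
    # Color structure tensor: sum over channels, then Gaussian integration.
    sxx = smooth((gx * gx).sum(axis=-1), sigma)
    syy = smooth((gy * gy).sum(axis=-1), sigma)
    sxy = smooth((gx * gy).sum(axis=-1), sigma)
    response = sxx * syy - sxy ** 2 - k * (sxx + syy) ** 2
    if response.max() <= 0:
        return np.empty((0, 2), dtype=int)
    # Non-maximum suppression over 3 x 3 neighborhoods.
    h, w = response.shape
    padded = np.pad(response, 1, mode="constant", constant_values=-np.inf)
    neigh = np.max([padded[i:i + h, j:j + w] for i in range(3) for j in range(3)], axis=0)
    peaks = (response == neigh) & (response > rel_thresh * response.max())
    ys, xs = np.nonzero(peaks)
    order = np.argsort(response[ys, xs])[::-1][:n_points]
    return np.stack([ys[order], xs[order]], axis=1)
```

Summing the per-channel tensors before integration lets a corner that is visible only as a color change, not as a luminance change, still produce a strong response.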
In this paper, we adopt the color boosting Harris points as salient points (Figure 4(d)) to capture the corners and boundary points of visually salient regions in a color image. The salient points give a coarse location of the salient areas. We denote these points by $SI(i)$, where $i$ indexes the detected points. However, these points include not only salient object points but also salient background points. The salient background points (e.g., those on the tree in Figure 4) act as noise when we try to obtain an initial contour close to the object. Our objective is therefore to distinguish object points from background points, which is a binary classification problem. We solve it with a clustering method that starts from initial object seeds, so the key is to select the most appropriate initial seeds. For this purpose, we present the core saliency map, which determines the seeds among the salient points.

The Seeds Determined by the Core Saliency Map.
We choose three prominent saliency models: RC, MZ, and FT. MZ is a local contrast method, whereas RC and FT are based on global contrast. We choose two global contrast based models because FT produces desirable results with very efficient computation, while RC represents the regional contrast feature well and is insensitive to sudden local changes.
As can be seen in Figure 5, the details highlighted by the three saliency maps ($S_{RC}$, $S_{MZ}$, and $S_{FT}$) differ. Nevertheless, all three maps tend to highlight the common parts of the object; we refer to this common part as the core saliency map.
Figure 2: From left to right: original image, CA [17], FT [7], RC [19], IT [13], and GB [15] saliency maps.

(Figure 3 panels: original image; the salient edge points; the core map; the core edge points; the salient object edge points; the segmentation result; initial convex hull; improved convex hull.)
For any pixel $p$, the core saliency map $S_{\text{coresaliency}}(p)$ is computed from $S_{RC}(p)$, $S_{MZ}(p)$, and $S_{FT}(p)$. For convenient display, we also define the core map $S_{\text{core}}(p)$, which is obtained by applying a binarization operator to $S_{\text{coresaliency}}(p)$. To binarize $S_{\text{coresaliency}}$, we introduce an adaptive threshold $T$ that is computed from the core saliency values of all $h \times w$ pixels, where $h$ and $w$ are the height and width of the image, respectively. The resulting core map is shown in Figure 5(d). Pixels included in the core map are highly likely to be part of the object. Consequently, the points that lie both among the salient edge points (white dots in Figure 4(d)) and in the core map are labeled as foreground seeds. These seeds are indicated by blue dots in Figure 6.
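The seed-selection step can be sketched as follows. The exact combination rule and threshold formula of the core saliency map are not reproduced in this section, so the sketch makes two loudly labeled assumptions: the core saliency map is the pixelwise product of the three normalized maps (which keeps only regions that all three models highlight), and the adaptive threshold is twice the mean core saliency over the $h \times w$ image. Point coordinates are (row, col), and the function names are hypothetical.

```python
import numpy as np


def normalize(s):
    """Min-max normalize a saliency map to [0, 1]."""
    s = np.asarray(s, dtype=float)
    span = s.max() - s.min()
    return (s - s.min()) / span if span > 0 else np.zeros_like(s)


def core_map(s_rc, s_mz, s_ft):
    """Return (core saliency map, binary core map) under the stated assumptions."""
    # ASSUMPTION: pixelwise product as the "common part" of the three maps.
    core = normalize(s_rc) * normalize(s_mz) * normalize(s_ft)
    h, w = core.shape
    # ASSUMPTION: adaptive threshold T = 2 / (h * w) * sum of core saliency.
    t = 2.0 * core.sum() / (h * w)
    return core, (core > t).astype(np.uint8)


def foreground_seeds(points, core_binary):
    """Keep the salient edge points (row, col) that fall inside the core map."""
    pts = np.asarray(points, dtype=int).reshape(-1, 2)
    keep = core_binary[pts[:, 0], pts[:, 1]] == 1
    return pts[keep]
```

A pixel that only one model finds salient scores zero in the product, so it cannot become a seed; this mirrors the idea that the core map keeps what the three maps agree on.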

The Salient Object Edge Points Detection and Using Convex Hull.
Each superpixel is a perceptually consistent unit; that is, all pixels in a superpixel are most likely uniform in color and texture. Therefore, if a color boosting Harris point lies in the same superpixel as a foreground seed, that point is treated as a salient object edge point. The search for salient object edge points according to this strategy is shown in Figure 7. We can observe that the points in the left-hand part of Figure 7(c) are omitted in Figure 7(d).
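The superpixel rule above amounts to a label lookup. The sketch below assumes a superpixel label map has already been computed (the algorithm that produces it is not specified in this section), and the function name is hypothetical.

```python
import numpy as np


def salient_object_edge_points(points, seeds, superpixels):
    """Keep Harris points (row, col) whose superpixel also contains a seed.

    points:      candidate color boosting Harris points.
    seeds:       foreground seeds from the core map.
    superpixels: H x W integer label map (assumed precomputed).
    """
    pts = np.asarray(points, dtype=int).reshape(-1, 2)
    seed_pts = np.asarray(seeds, dtype=int).reshape(-1, 2)
    seed_labels = np.unique(superpixels[seed_pts[:, 0], seed_pts[:, 1]])
    keep = np.isin(superpixels[pts[:, 0], pts[:, 1]], seed_labels)
    return pts[keep]
```

Because membership is decided per superpixel rather than per pixel, a Harris point on the object boundary is kept even if it falls just outside the core map, provided its superpixel also contains a seed.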
The convex hull (Figure 8(a)) is then used to enclose these salient object edge points, and its contour (green line in Figure 8(b)) is chosen as the initial contour of the LRAC model. However, this initial contour is not yet sufficiently close to the object boundary.
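The convex hull and its rasterization into an initial region for the level set can be sketched as below, using Andrew's monotone chain algorithm (one standard convex hull algorithm; this section does not state which one the authors use). Points are (row, col) pixel coordinates, pixels on the hull boundary count as inside, and the function names are hypothetical.

```python
import numpy as np


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counterclockwise order."""
    pts = sorted(set(map(tuple, np.asarray(points, dtype=int).reshape(-1, 2).tolist())))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_mask(shape, hull):
    """Boolean mask of pixels inside (or on) a counterclockwise convex polygon."""
    if len(hull) < 3:
        raise ValueError("need at least three non-collinear points")
    rows, cols = np.indices(shape)
    mask = np.ones(shape, dtype=bool)
    for i in range(len(hull)):
        (r0, c0), (r1, c1) = hull[i], hull[(i + 1) % len(hull)]
        # Keep pixels on the left of (or on) each directed hull edge.
        mask &= (r1 - r0) * (cols - c0) - (c1 - c0) * (rows - r0) >= 0
    return mask
```

The resulting mask can seed the level set function, for example negative inside and positive outside, depending on the sign convention of the LRAC implementation in use.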

Improved Convex Hull by an Edge-Preserving Filter.
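Given an input image $I$ and the initial convex hull $\tilde{t}$, we want to obtain a refined convex hull. This problem is similar to image matting, so our goal can be achieved by minimizing
$$E(t) = t^T L\, t + (t - \tilde{t})^T \Lambda\, (t - \tilde{t}),$$
where $t$ denotes the desired output, $\tilde{t}$ is the initial convex hull, $\Lambda$ is a diagonal matrix encoding the weights of the constraints, and $L$ is the matting Laplacian matrix [21]. The $(i,j)$th element of $L$ is given by
$$L(i,j) = \sum_{k \mid (i,j) \in w_k} \left( \delta_{ij} - \frac{1}{|w_k|} \left( 1 + (I_i - \mu_k)^T \left( \Sigma_k + \frac{\varepsilon}{|w_k|} U_3 \right)^{-1} (I_j - \mu_k) \right) \right),$$
where $\delta_{ij}$ is the Kronecker delta, $i$ and $j$ are pixel indexes of the input image $I$, $\mu_k$ is the $3 \times 1$ mean vector of the colors in a square window $w_k$ centered at pixel $k$, $\Sigma_k$ is the $3 \times 3$ covariance matrix of the colors in $w_k$, $U_3$ is the $3 \times 3$ identity matrix, $|w_k|$ denotes the number of pixels in $w_k$, and $\varepsilon$ is a smoothness parameter.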
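Setting the gradient of $E(t)$ to zero gives the linear system $(L + \Lambda)\,t = \Lambda \tilde{t}$, since $L$ is symmetric. The sketch below builds a dense matting Laplacian for a small image with 3 × 3 windows and solves that system. Assumptions: $\Lambda$ is a uniform diagonal weight, only windows lying fully inside the image are used, and the dense matrices restrict it to tiny images (a practical version would use sparse matrices). Names and parameter values are illustrative.

```python
import numpy as np


def matting_laplacian(image, eps=1e-4, radius=1):
    """Dense matting Laplacian L of an H x W x 3 image, windows of (2r+1)^2 pixels."""
    h, w, _ = image.shape
    n = h * w
    colors_all = image.reshape(n, 3).astype(float)
    idx = np.arange(n).reshape(h, w)
    win = (2 * radius + 1) ** 2  # |w_k|
    lap = np.zeros((n, n))
    for r in range(radius, h - radius):
        for c in range(radius, w - radius):
            ids = idx[r - radius:r + radius + 1, c - radius:c + radius + 1].ravel()
            colors = colors_all[ids]                          # I_i for i in w_k
            mu = colors.mean(axis=0)                          # mu_k
            cov = np.cov(colors, rowvar=False, bias=True)     # Sigma_k
            inv = np.linalg.inv(cov + (eps / win) * np.eye(3))
            d = colors - mu
            lap[np.ix_(ids, ids)] += np.eye(win) - (1.0 + d @ inv @ d.T) / win
    return lap


def refine_hull(image, init_mask, weight=0.01, eps=1e-4):
    """Minimize t^T L t + (t - t0)^T Lambda (t - t0) with Lambda = weight * I."""
    lap = matting_laplacian(image, eps)
    t0 = np.asarray(init_mask, dtype=float).ravel()
    lam = weight * np.eye(t0.size)
    t = np.linalg.solve(lap + lam, lam @ t0)
    return t.reshape(np.shape(init_mask))
```

Each row of $L$ sums to zero, so a constant input is left unchanged; the refinement only moves the hull where the local color model indicates an edge. Thresholding the soft output (e.g., at 0.5) gives the improved hull region.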
As can be seen from Figure 9(a), the initial contour (the contour of the convex hull) is close to the object boundary. As a result, the LRAC model achieves good segmentation performance and requires fewer iterations of contour evolution.
We use more images to better illustrate the performance of the improved convex hull; they are shown in Figure 10. The improved convex hull is clearly closer to the real object than the initial convex hull.

Experiments
To verify the proposed method, we evaluated our approach on the publicly available database provided by Achanta et al. [7]. It consists of 1000 images selected from a salient object database of 5000 images that originally contained rectangles drawn by nine users around what they considered the most salient object; Achanta et al. [7] supplied accurate human-marked labels of the salient regions for these images. The images vary widely and include natural scenes, animals, and indoor and outdoor scenes. To the best of our knowledge, the database is the largest of its kind with ground truth in the form of accurate human-marked labels for salient regions. For consistency, we set $\lambda = 0.15$ in all trials to weight the influence of contour smoothness.

Comparison and Evaluation.
First, to measure the segmentation performance of the LRACSEP algorithm comprehensively, we compare it with the Grabcut [22] algorithm initialized from more saliency maps, namely, the nine abovementioned state-of-the-art saliency maps. Grabcut is very useful for image segmentation, and satisfactory results can be obtained when a very informative input is given. It enables users to roughly annotate a region of interest (e.g., with a rectangle) and then extracts a precise image mask. To initialize Grabcut automatically, we use a segmentation obtained by binarizing the saliency map with a fixed threshold, empirically set to 0.3. Once initialized, Grabcut is run iteratively 4 times to improve the segmentation result. Figure 11 shows the comparison results.
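The automatic Grabcut initialization can be sketched with OpenCV's `cv2.grabCut` in mask mode. Assumptions: the saliency map is min-max normalized before thresholding at 0.3, pixels at or above the threshold are marked probable foreground and the rest probable background, and "run 4 times" is mapped onto `iterCount=4`. Only `run_grabcut` needs OpenCV (it expects an 8-bit, 3-channel image); the label constants mirror OpenCV's `GC_*` values, and the function names are hypothetical.

```python
import numpy as np

# Values mirror cv2.GC_BGD, cv2.GC_FGD, cv2.GC_PR_BGD, cv2.GC_PR_FGD.
GC_BGD, GC_FGD, GC_PR_BGD, GC_PR_FGD = 0, 1, 2, 3


def grabcut_init_mask(saliency, threshold=0.3):
    """Binarize a normalized saliency map into a GrabCut label mask."""
    s = np.asarray(saliency, dtype=float)
    span = s.max() - s.min()
    s = (s - s.min()) / span if span > 0 else np.zeros_like(s)
    return np.where(s >= threshold, GC_PR_FGD, GC_PR_BGD).astype(np.uint8)


def run_grabcut(image_bgr, saliency, threshold=0.3, iterations=4):
    """Saliency-initialized GrabCut; returns a 0/1 object mask."""
    import cv2  # OpenCV is only needed for this step

    mask = grabcut_init_mask(saliency, threshold)
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(image_bgr, mask, None, bgd_model, fgd_model,
                iterations, cv2.GC_INIT_WITH_MASK)
    return np.isin(mask, (GC_FGD, GC_PR_FGD)).astype(np.uint8)
```

Marking both classes as "probable" lets GrabCut relabel pixels the saliency map got wrong, which is also why the final result still depends on the quality of that map.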
Here, we use precision, recall, and F-measure to evaluate the performance of the proposed model.
The F-measure, a weighted harmonic mean of precision and recall, combines the two into a single measure:
$$F_\beta = \frac{(1 + \beta^2) \times \text{Precision} \times \text{Recall}}{\beta^2 \times \text{Precision} + \text{Recall}},$$
where $\beta$ is a positive parameter that decides the importance of precision relative to recall.
For a fair comparison, we use $\beta^2 = 0.3$ as in [19]. The segmentation performance is compared in Figure 11, which shows that the proposed method significantly outperforms Grabcut with each of the nine saliency maps in precision, recall, and F-measure.
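Precision, recall, and the F-measure with $\beta^2 = 0.3$ for a binary segmentation mask against a binary ground-truth mask can be computed as follows. This is a minimal per-image sketch (how the scores are aggregated over the database is not specified here), and the function name is hypothetical.

```python
import numpy as np


def precision_recall_f(pred, gt, beta2=0.3):
    """Return (precision, recall, F_beta) for binary masks pred and gt."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = np.logical_and(pred, gt).sum()  # correctly detected object pixels
    precision = tp / pred.sum() if pred.sum() else 0.0
    recall = tp / gt.sum() if gt.sum() else 0.0
    denom = beta2 * precision + recall
    f = (1 + beta2) * precision * recall / denom if denom else 0.0
    return float(precision), float(recall), float(f)
```

For example, a prediction that hits one of two ground-truth pixels with no false positives has precision 1, recall 0.5, and $F_\beta = 0.8125$; with $\beta^2 < 1$, precision weighs more than recall.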
As seen in Figure 11, Grabcut with the RC saliency map performs better than Grabcut with the other saliency maps. For visual inspection of the segmentation performance, LRACSEP is compared with Grabcut on the RC saliency map on a group of images (see Figure 12). As shown in Figure 12, Grabcut on the RC saliency map yields high false-positive rates (background areas misclassified as object) and high false-negative rates (object areas misclassified as background). In contrast, the proposed algorithm works robustly even with complicated, cluttered backgrounds. Such favorable segmentation results are achieved because we use the localizing region-based active contour model, which can achieve subpixel accuracy of object boundaries. Additionally, for Grabcut on the RC saliency map, the quality of the saliency map affects the final segmentation result.
Second, we compare the proposed algorithm with existing competitive automatic salient object segmentation methods, namely, Fu's method [6] and Achanta's segmentation method [7]. Figure 13 shows the segmentation performance of the three methods. The figure shows that the proposed method significantly outperforms these state-of-the-art algorithms in precision, recall, and F-measure.

The Comparison of Iteration Times.
To verify the efficiency of our method, we compare LRACSEP (LR) with the two abovementioned state-of-the-art algorithms: Fu's method [6] and Achanta's segmentation method [7]. The average numbers of iterations are depicted in Figure 14. It can be observed that our method is more efficient. The reason is that our method makes use of the salient edge points, whereas the other two methods are based on saliency maps, and computing a saliency map is time consuming.

Conclusions and Future Work
In this paper, we propose a novel automatic approach to extract objects of interest from natural images. The approach uses salient edge points as prior knowledge, which turns LRAC, originally a semisupervised (interactive) segmentation method, into an unsupervised one. Our main contributions are threefold: first, the core saliency map is proposed to determine the foreground seeds; second, salient object edge points are detected from the foreground seeds; and third, the proposed framework can apply any active contour model to segment the salient object automatically. Experimental results on a public database show that our method outperforms several state-of-the-art saliency-based segmentation methods. In contrast with existing interactive segmentation approaches, which require considerable user interaction, the proposed method performs the segmentation task in a fully automatic manner.

Figure 3: A general schematic framework of LRACSEP.

Figure 4: The salient edge points. (a) Original image; (b) luminance-based Harris points; (c) color-based Harris points; (d) the color boosting Harris points.
Figure 5: Three state-of-the-art saliency maps and (d) the core map.

Figure 6: The foreground seeds determined by the core map.
Figure 7: The salient object edge points detection. (a) The image superpixels; (b) the foreground seeds; (c) the color boosting Harris points; (d) the salient object edge points.

Figure 8: The initial contour of our model.(a) The convex hull which contains the salient points in Figure 7(d) and (b) the contour of the obtained convex hull as the initial contour.

Figure 9: The segmentation results of our model.

Figure 10: Visualization of the improved convex hulls. The images from top to bottom are the original input images, the initial convex hulls, and the improved convex hulls.

Figure 11: Precision-recall bars for the proposed algorithm and the Grabcut using different saliency maps. Our method, LRACSEP (LR), shows high precision, recall, and F-measure values over the 1000-image database.

Figure 12: The segmentation results by LRACSEP and the Grabcut on RC saliency map.(a) Original image, (b) LRACSEP, and (c) the Grabcut on RC saliency map.

Figure 13: Results of object segmentation. The leftmost column is the original image; the segmentation results from the second column to the right are obtained by [6], [7], and the proposed algorithm.
Figure 14: Average numbers of iterations of Fu's method [6], Achanta's method [7], and the proposed method.