Nonlocal Variational Model for Saliency Detection

We present a nonlocal variational model for saliency detection from still images, from which various features for visual attention can be detected by minimizing the energy functional. The associated Euler-Lagrange equation is a nonlocal p-Laplacian type diffusion equation with two reaction terms, so the detection process is a nonlinear diffusion. The main advantage of our method is that it provides flexible and intuitive control over the detection procedure through the temporal evolution of the Euler-Lagrange equation. Experimental results on various images show that our model suppresses background details more effectively while preserving rich, subtle details in the foreground.


Introduction
Saliency is an important and basic visual feature for describing image content. It can be a particular location, object, or set of pixels that stands out relative to its neighbors and thus captures people's attention. Saliency detection technologies, which identify the most important areas of natural scenes, are very useful in image and video processing tasks such as image retrieval [1], video compression [2], and video analysis [3]. However, saliency detection is still a difficult task because it requires some degree of semantic understanding of the image. Furthermore, most natural images contain varied texture and color information, which adds to the difficulty. So far, a large number of algorithms and methodologies have been developed for this task. Saliency detection methods can be roughly categorized as biologically based [4,5], purely computational [6][7][8][9][10][11][12], or those that combine the two ideas [13][14][15].
Itti et al. [4] devise their method based on the biologically plausible architecture proposed by Koch and Ullman [16], in which multiple low-level visual features, such as intensity, color, orientation, texture, and motion, are extracted from images at multiple scales and used for saliency computation. They determine center-surround contrast using a difference-of-Gaussians approach. Inspired by Itti's method, Frintrop et al. [5] present a method in which they compute center-surround differences with square filters.
Unlike the biological methods, the purely computational models [6][7][8][9][10][11][12] are not explicitly based on biological vision principles. Ma and Zhang [6] and Achanta et al. [7,8] measure saliency using center-surround feature distances. Hou and Zhang [9] devise a saliency detection model based on a concept defined as the spectral residual (SR). Liu et al. [10] obtain the saliency map of an image using machine learning. The model in [11] obtains saliency maps by applying the inverse Fourier transform to a constant amplitude spectrum combined with the original phase spectrum of the image. Feng et al. [12] define multiscale contrast features as a linear combination of contrasts in the Gaussian image pyramid.
The third category of methods is based partly on biological models and partly on computational ones, that is, on a combination of the two ideas. For instance, Harel et al. [13] create feature maps following Itti's method but perform their normalization with a graph-based approach. In [14], Bruce and Tsotsos present a saliency computation method within the visual cortex, based on the premise that localized saliency computation serves to maximize the information sampled from one's environment. Fang et al. [15] propose a saliency detection model based on human visual sensitivity and the amplitude spectrum of the quaternion Fourier transform.
These methods [4][5][6][7][8][9][10][11][12][13][14][15] build elegant maps based on biological theories and/or computational frameworks. However, some key characteristics of the object are still neglected in these models. For example, the saliency maps generated by the methods [4-6, 9, 13] have low resolution. Moreover, the outputs of [4,5,13] have ill-defined boundaries, and the methods [6,9] produce higher saliency values at object edges instead of over the whole object. The methods [7,8,15] produce saliency maps of the same size as the input image. Though methods [7,8] achieve higher precision than methods [4-6, 9, 13], the information in the background cannot be well suppressed. Additionally, the method [15] has difficulty extracting subtle details (e.g., the texture within salient regions), which are very important for visual perception and are a primary visual cue for pattern recognition. Moreover, the study of the human attention mechanism is not yet mature. Therefore, for high-level applications such as image retrieval and browsing, a mechanism that produces accurate saliency is needed.
In this paper, we focus on the problem of saliency detection in the variational framework. The main advantage of variational methods for image processing is that they can be easily formulated under an energy minimization framework and allow the inclusion of constraints to ensure image regularity while preserving important features. Over the past decades, many researchers have devoted their work to the development of variational models and proposed many good algorithms for important topics in image analysis and computer vision, including anisotropic diffusion for image denoising [17], p-Laplacian evolution for image analysis [18], nonlocal p-Laplacian evolution for image interpolation [19], active contour models for image segmentation [20], and the complex Ginzburg-Landau equation for codimension-two object detection [21] and image inpainting [22]. To our knowledge, however, very few saliency detection methods take advantage of the variational framework.
Inspired by the nonlocal p-Laplacian [19,23] and the complex Ginzburg-Landau model [21,22], we propose a nonlocal p-Laplacian regularized variational model for saliency detection. Our work is a purely computational model for saliency extraction from still images. The proposed energy functional consists of a diffusion-based regularization, a phase transition term, and a reaction term for the fidelity. In the energy functional, the nonlocal p-Laplacian is introduced to penalize the intermediate values of image intensity, and the phase transition makes the background vanish while preserving visually prominent features. Our approach offers the following technical features. First, we formulate saliency detection as a phase transition over an image domain, and a variational framework for saliency selection is then developed. Various visual features can be detected by minimizing the energy functional in the variational framework. Second, a dynamical formulation follows naturally from the definition of the energy functional. The associated Euler-Lagrange equation is a nonlocal p-Laplacian type diffusion equation with a nonlinear reaction term for saliency extraction and a linear reaction term for the fidelity. It controls the information flow from original images to saliency maps, so the process of saliency extraction is a nonlinear diffusion. This makes our method quite different from the existing models for saliency detection. Third, our method employs the nonlocal p-Laplacian regularization, which restricts the features of the resulting image. Compared to the classical p-Laplacian regularization, the direction of edge curves indicated by the nonlocal p-Laplacian is more accurate than the direction indicated by the gradient in the p-Laplacian equation [19]. Due to the accuracy of our model, our saliency maps can be seen as salient objects directly. Experimental results on various images show that our model suppresses background details more effectively while preserving rich, subtle details in the foreground.
The remainder of this paper is organized as follows. In Section 2, we review the Ginzburg-Landau model and the nonlocal evolution equation. The proposed model is introduced in Section 3. Section 4 presents the numerical method, followed by experiments and results in Section 5. The paper is summarized in Section 6.

Background
2.1. The Ginzburg-Landau Models. The Ginzburg-Landau equation was originally developed by Ginzburg and Landau [24] to phenomenologically describe phase transitions in superconductors near their critical temperature. The equation has proven useful in many areas of physics and chemistry [25], and a substantial body of mathematical theory on it can be found in the literature [26]. Moreover, Ginzburg-Landau equations have already been used for image processing [21,22,27,28]. Most of these works rely on the simplified energy
$$E(u) = \frac{1}{2}\int_\Omega |\nabla u|^2\,dx + \frac{1}{4\varepsilon^2}\int_\Omega \left(1 - |u|^2\right)^2 dx \tag{1}$$
or on the associated flow governed by the following evolution equation:
$$\frac{\partial u}{\partial t} = \Delta u + \frac{1}{\varepsilon^2}\left(1 - |u|^2\right)u, \tag{2}$$
where $\varepsilon$ is a small nonzero constant and $u$ is a complex-valued function indicating the local state of the material: if $|u| \approx 1$, the material is in a superconducting phase; if $|u| \approx 0$, it is in its normal phase. A rigorous mathematical theory of the Ginzburg-Landau functional shows that there exists a phase transition between these two states [21]. Minimization of the functional (1) develops homogeneous areas which are separated by phase transition regions. In image processing, homogeneous areas correspond to domains of constant grey value intensity and phase transitions correspond to features.
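The role of the reaction term in the Ginzburg-Landau flow can be illustrated at a single point. The following minimal sketch assumes the standard form of the reaction term, $(1/\varepsilon^2)(1 - |u|^2)u$, drops the diffusion term entirely, and integrates the resulting ordinary differential equation with explicit Euler steps. The function names, step size, and number of steps are illustrative choices, not taken from the paper.

```python
def gl_reaction_step(u: complex, eps: float, dt: float) -> complex:
    """One explicit Euler step of du/dt = (1/eps^2) * (1 - |u|^2) * u."""
    return u + dt * (1.0 - abs(u) ** 2) * u / eps ** 2


def relax(u: complex, eps: float = 1.0, dt: float = 0.1, steps: int = 200) -> complex:
    """Apply the reaction step repeatedly (diffusion is ignored in this sketch)."""
    for _ in range(steps):
        u = gl_reaction_step(u, eps, dt)
    return u


# A state with small modulus is pushed towards the unit circle.
u_final = relax(0.2 + 0.1j)
print(abs(u_final))
```

Because each step multiplies $u$ by a real factor, the phase of $u$ is unchanged and only its modulus moves towards 1, which is the single-point analogue of the homogeneous areas described above.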

Nonlocal Evolution Equations.
Recently, nonlocal evolution equations have been widely used to model diffusion processes in many areas [19]. Let us briefly introduce the nonlocal problems considered in this work.
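A nonlocal evolution equation corresponding to the Laplacian equation is presented as follows:
$$u_t(x,t) = (J * u - u)(x,t) = \int_{\mathbb{R}^N} J(x-y)\,u(y,t)\,dy - u(x,t). \tag{3}$$
The kernel $J$ is a nonnegative, bounded, continuous radial function with $\operatorname{supp}(J) \subset B(0,d)$ (compact support). Equation (3) is called a nonlocal diffusion equation [23]. For the p-Laplacian equation $u_t = \operatorname{div}(|\nabla u|^{p-2}\nabla u)$, the nonlocal counterpart studied in [23] is
$$u_t(x,t) = \int_\Omega J(x-y)\,|u(y,t)-u(x,t)|^{p-2}\,\big(u(y,t)-u(x,t)\big)\,dy \tag{4}$$
with Neumann boundary condition. It was proved that the solution of (4) converges to the solution of the classical p-Laplacian evolution equation if $p > 1$, and to the total variation flow when $p = 1$, with Neumann boundary conditions, when the convolution kernel $J$ is rescaled in a suitable way [23]. The energy functional corresponding to (4) is
$$E_p(u) = \frac{1}{2p}\int_\Omega\int_\Omega J(x-y)\,|u(y)-u(x)|^{p}\,dy\,dx. \tag{5}$$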

The Proposed Model for Saliency Detection
In this section, we propose a variational model (nonlocal p-Laplacian regularized variational model) whose (local) minima extract salient objects from image background.

Nonlocal p-Laplacian Regularized Variational Model.
Let $\Omega \subset \mathbb{R}^2$ be an image domain. For a given image $I : \Omega \to \mathbb{R}$, we construct a complex-valued image $u_0$ from $I$ as follows. We first rescale the intensity image $I(x)$ into the interval $[-1, 1]$ by the formula $v_0 = (-1)^k\,((2I(x)/255) - 1)$ ($k = 0$ or $k = 1$) and set $w_0 = \sqrt{1 - v_0^2}$; then $I(x)$ is identified with the real part $v_0$ of the complex image $u_0 = v_0 + i w_0$, so that $|u_0| = 1$ for all $x \in \Omega$. In order to extract salient objects from a still image, we propose the energy functional $E(u)$ given in (6), with the potential $W(u)$ given in (7), where $p > 2$, $\lambda > 0$, $\varepsilon$ is a small constant, $u$ is a complex-valued function, and $E_p(u)$ is defined by the energy functional (5). Note that the functional $W(u)$ is slightly different from the second term of the Ginzburg-Landau model (1).
In the following, we will explain the proposed energy functional defined as (6).
(I) The functional $E_p(u)$ in (6) penalizes the spatial inhomogeneity of $u(x)$. As we know, certain penalties on intermediate densities are equivalent to restrictions on the microstructural configuration [29]. So, physically, the nonlocal p-Laplacian acts as a regularizer that restricts the features of the resulting images.
(II) The potential $W(u)$ in (6) clearly has a minimum at $|u| = 1$. Thus the minimization of the functional (6) develops homogeneous areas separated by phase transition regions, which makes $|u| \approx 1$ almost everywhere after enough diffusion, except in the regions of the visually prominent features.
(III) The third term is a fidelity term which forces $u(x)$ to be a close approximation of the original function $u_0$.
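The construction of the initial complex image described above, rescaling to $[-1, 1]$ and choosing the imaginary part so that the modulus is 1, can be sketched as follows. This is an illustration only: the function name `build_complex_image`, the nested-list image representation, and the assumption of 8-bit intensities in 0..255 are choices made here, not details from the paper.

```python
import math


def build_complex_image(intensity, k=0):
    """Map 8-bit intensities (0..255) to unit-modulus complex values.

    intensity: list of rows, each a list of numbers in [0, 255].
    k: 0 keeps the sign of the rescaled intensity, 1 flips it.
    """
    sign = -1.0 if k == 1 else 1.0
    u0 = []
    for row in intensity:
        out_row = []
        for value in row:
            v0 = sign * (2.0 * value / 255.0 - 1.0)   # real part in [-1, 1]
            w0 = math.sqrt(max(0.0, 1.0 - v0 * v0))   # imaginary part so that |u0| = 1
            out_row.append(complex(v0, w0))
        u0.append(out_row)
    return u0


u0 = build_complex_image([[0, 128, 255]])
print([abs(z) for z in u0[0]])
```

The `max(0.0, ...)` guard only protects the square root against tiny negative values caused by rounding; it does not change the formula.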

Behavioral Analysis of Our Model.
In the calculus of variations, a standard method to minimize the functional $E(u)$ is to find the steady-state solution of the gradient descent flow equation
$$\frac{\partial u}{\partial t} = -\frac{\partial E(u)}{\partial u}, \tag{8}$$
where $\partial E(u)/\partial u$ is the Gâteaux derivative of the functional $E(u)$. Equation (8) is an evolution equation of a time-dependent function $u(x, t)$ with a spatial variable $x$ in the domain $\Omega$ and an artificial time variable $t \ge 0$, and the evolution starts with a given initial function $u(x, 0) = u_0(x)$. So a dynamical formulation, the evolution equation (9), follows naturally from the definition of the energy functional (6), with the initial condition $u(x, 0) = u_0(x)$ and the Neumann boundary condition $\partial u/\partial \vec{n} = 0$ on $\partial\Omega$ (where $\vec{n}$ is the outward unit normal to $\partial\Omega$), where the nonlocal p-Laplacian is
$$P_p^J(u)(x,t) = \int_\Omega J(x-y)\,|u(y,t)-u(x,t)|^{p-2}\,\big(u(y,t)-u(x,t)\big)\,dy. \tag{10}$$
The kernel $J : \Omega \to \mathbb{R}$ in (10) is a nonnegative, bounded, continuous radial function with $\operatorname{supp}(J) \subset B(0, d)$ and satisfies properties including $J(-x) = J(x)$.
Equation (9) is a nonlocal p-Laplacian type diffusion equation with nonlinear reaction terms. Here we explain the nonlocal p-Laplacian further. The nonlocal p-Laplacian $P_p^J(u)$ in (9) acts as a regularizer that restricts the features of the output images. First, the regularizer $P_p^J(u)$ shares many properties of the classical p-Laplacian regularization. In the case of saliency detection, we can achieve a reasonable balance between penalizing irregularities (often due to noise) and preserving intrinsic image features by using regularizers with different values of $p$. Second, the regularizer $P_p^J(u)$ improves on the classical p-Laplacian regularization based on the local gradient, because the nonlocal diffusion at a point $x$ and time $t$ depends on all the values of $u$ in a larger neighborhood of $x$. The evolution process at artificial time $t$ given by (9) is viewed as an anisotropic energy dissipation process. The direction of anisotropic diffusion is indicated by $|u(y,t) - u(x,t)|^{p-2}$ in a larger neighborhood. It approximates the direction of the edge curve more accurately than the direction indicated by the gradient.
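To make the nonlocal p-Laplacian regularizer concrete, the sketch below evaluates a discrete analogue of $\int J(x-y)\,|u(y)-u(x)|^{p-2}(u(y)-u(x))\,dy$ on a one-dimensional signal and takes one explicit Euler step of the pure diffusion. Several things here are assumptions made for illustration: the one-dimensional setting, the triangular kernel `hat_kernel` (the kernel used in the paper is not reproduced), the treatment of the boundary by skipping out-of-range neighbours, and the omission of both reaction terms.

```python
def hat_kernel(radius):
    """Hypothetical symmetric kernel: triangular weights on offsets -radius..radius, summing to 1."""
    weights = {d: float(radius + 1 - abs(d)) for d in range(-radius, radius + 1)}
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}


def nonlocal_p_laplacian(u, kernel, p):
    """Discrete analogue of sum_y J(x - y) |u(y) - u(x)|^(p-2) (u(y) - u(x)).

    Neighbours that fall outside the signal are skipped, so nothing flows
    across the boundary (a simple stand-in for the Neumann condition).
    """
    n = len(u)
    out = []
    for x in range(n):
        acc = 0j
        for d, weight in kernel.items():
            y = x + d
            if 0 <= y < n:
                diff = u[y] - u[x]
                acc += weight * abs(diff) ** (p - 2) * diff
        out.append(acc)
    return out


def euler_step(u, kernel, p, dt):
    """One explicit Euler step of pure nonlocal p-Laplacian diffusion (no reaction terms)."""
    flux = nonlocal_p_laplacian(u, kernel, p)
    return [ui + dt * fi for ui, fi in zip(u, flux)]


signal = [0.0, 0.2, 1.0, 0.9, 0.1, 0.0]
print(euler_step(signal, hat_kernel(2), p=3, dt=0.1))
```

Because the kernel is symmetric and the weight $|u(y)-u(x)|^{p-2}(u(y)-u(x))$ is antisymmetric in $x$ and $y$, the contributions cancel in pairs, so the sum of the signal is preserved by each step; a constant signal is left unchanged.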
We conclude this subsection by discussing the dynamical behavior of (9). The temporal evolution of the dynamical formulation (9) makes the energy in (6) decrease monotonically in time. We suppose that the regions with less activity during the temporal evolution carry rich information and are the most likely to attract human attention. Therefore, the irrelevant information is suppressed gradually, while the visual features are preserved to the end. This controls the information flow from original images to saliency maps.

Numerical Algorithms
In this section, we briefly present the numerical algorithm and procedure to solve the evolution equation (9). In this paper, $u$ is a complex-valued function. Writing $u = (v, w)$, we obtain from (9) a pair of Euler-Lagrange equations for $v$ and $w$, with the initial conditions $v(x, 0) = v_0(x)$ and $w(x, 0) = w_0(x)$. Equation (9) can be implemented via a simple explicit finite difference scheme. Let $h$ and $\Delta t$ be the space and time steps, respectively, and let $(x_i, y_j) = (ih, jh)$ be the grid points. Let $u^n_{i,j} = u(x_i, y_j, n\Delta t)$ with $n \ge 0$. We then discretize the time variable in (9) using the explicit Euler method, which gives the iteration formulas for $v^n_{i,j}$ and $w^n_{i,j}$.
Remark 1. In all numerical experiments, we choose a kernel function $J$ whose normalizing constant $C$ is selected such that $\int_\Omega J(x)\,dx = 1$.
Remark 2. For color images, let $r$, $g$, and $b$ be the red, green, and blue channels of the input image, respectively. The intensity channel $I$, computed from $r$, $g$, and $b$, is used in our model.
Figure 1 demonstrates the effects of the proposed method on various images with objects having rich, subtle details and/or complex backgrounds. We compare our saliency maps with four state-of-the-art methods. The four saliency detectors are Hou and Zhang [9], Harel et al. [13], Achanta et al. [8], and Fang et al. [15], hereafter referred to as SR, GB, IG, and AS, respectively. The code for the AS model is taken from http://qtfm.sourceforge.net/, and the results of the SR, GB, and IG models are taken from http://ivrg.epfl.ch/supplementarymaterial/RK CVPR09/index.html. From Figure 1, we can see that our saliency maps have well-defined borders, highlight the features of whole objects, and suppress the background better than the other methods, even in the presence of complex backgrounds. In addition, our saliency maps have higher accuracy than the previous approaches. We can see from Column 6 that subtle details in the foreground are preserved very well for these test images; for example, the textures in petals, the downy flower of a dandelion, and the hairs of a dog are maintained clearly. For methods SR and GB, Col. 2 and Col. 3 show that the information retained from the original image contains very few details and represents a very blurry version of the original image. For method IG, Col. 4 shows that the high frequencies of the original image are retained in the saliency maps, but some details in the background remain clearly visible. For method AS, Col. 5 shows that the background information is suppressed better, but some subtle details in the salient regions are smoothed out, and the saliency maps suffer from "staircase" effects for salient objects with smooth texture, for example, the egg in Figure 1. Due to the accuracy of our model, our saliency maps can be seen as salient objects directly. In contrast, in order to segment a salient object, the other methods need to binarize the saliency map such that ones (white pixels) correspond to salient object pixels while zeros (black pixels) correspond to the background [8].
In order to perform an objective comparison of the quality of the saliency maps with other methods, we adopt the precision, recall, and F-measure used by Achanta et al. [8] and Fang et al. [15]. The quantitative evaluation in this experiment is based on 1000 images taken from the experimental settings of Achanta et al. [8]. This image database includes the original images and their corresponding ground-truth saliency maps. The quantitative evaluation of a saliency detection algorithm measures how much the saliency map produced by the algorithm overlaps with the ground-truth saliency map. For a ground-truth saliency map $G$ with entries $g_x$ and a detected saliency map $S$ with entries $s_x$, we have
$$\text{precision} = \frac{\sum_x g_x s_x}{\sum_x s_x}, \qquad \text{recall} = \frac{\sum_x g_x s_x}{\sum_x g_x},$$
and, with a nonnegative $\alpha$,
$$F_\alpha = \frac{(1 + \alpha)\cdot\text{precision}\cdot\text{recall}}{\alpha\cdot\text{precision} + \text{recall}}.$$
We set $\alpha = 0.3$ in this experiment, as in the literature [15], for fair comparison. The comparison results are shown in Figure 2. It is clear that the overall performance of our proposed model on the 1000 images is better than that of the other methods under comparison in terms of all three measures.
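The three measures can be computed in a few lines. The sketch below assumes the usual reading of the definitions, in which precision and recall are the overlap between the two maps divided by the total mass of the detected map and of the ground-truth map, respectively. The function name, the flat-list representation of the maps, and the handling of empty maps (returning 0) are illustrative choices rather than details from the paper.

```python
def precision_recall_f(ground_truth, saliency, alpha=0.3):
    """Return (precision, recall, F_alpha) for two maps given as flat lists.

    ground_truth: values g_x (typically 0 or 1).
    saliency: values s_x (binary or in [0, 1]).
    """
    overlap = sum(g * s for g, s in zip(ground_truth, saliency))
    total_s = sum(saliency)
    total_g = sum(ground_truth)
    precision = overlap / total_s if total_s > 0 else 0.0
    recall = overlap / total_g if total_g > 0 else 0.0
    denom = alpha * precision + recall
    f_alpha = (1 + alpha) * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f_alpha


# Half of the detected pixels are correct, and half of the true pixels are found.
print(precision_recall_f([1, 1, 0, 0], [1, 0, 1, 0]))
```

With $\alpha = 0.3$ the measure weights precision more heavily than recall, so a detector that marks the whole image as salient scores poorly even though its recall is 1.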

Conclusion
In this paper, we develop a variational model for saliency detection based on the phase transition theory from mechanics and materials science. The dynamics of the system, that is, the temporal evolution derived from the energy functional, yields the attention information, and the process of saliency extraction is an interface diffusion. Compared to the existing models for saliency detection, our method provides flexible and intuitive control over the detection procedure. Experimental results show that the proposed method is effective in extracting features that are important for human visual perception.

Figure 2: Overall mean scores of precision, recall, and F-measure from five different algorithms for 1000 images.
Equation (3) is called a nonlocal diffusion equation [23], since the diffusion of the density $u$ at a point $x$ and time $t$ depends not only on $u(x, t)$ but also on all the values of $u$ in a neighborhood of $x$ through the convolution term $J * u$. This equation shares many properties with the classical heat equation $u_t = \Delta u$. This nonlocal evolution can be thought of as nonlocal isotropic diffusion. For the p-Laplacian equation $u_t = \operatorname{div}(|\nabla u|^{p-2}\nabla u)$, a nonlocal counterpart with diffusion term $J(x-y)\,|u(y,t) - u(x,t)|^{p-2}\,(u(y,t) - u(x,t))$ was studied mathematically in the literature [23].