Fast Mumford-Shah Two-Phase Image Segmentation Using Proximal Splitting Scheme

. The Mumford-Shah model is extensively used in image segmentation. Its energy functional causes the content of the segments to remain homogeneous and the segment boundaries to become short. However, the problem is that optimization of the functional can be very slow. To attack this problem, we propose a reduced two-phase Mumford-Shah model to segment images having one prominent object. First, initial segmentation is obtained by the k-means clustering technique, further minimizing the Mumford-Shah functional by the Douglas-Rachford algorithm. Evaluation of segmentations with various error metrics shows that 70 percent of the segmentations keep the error values below 50%. Compared to the level set method to solve the Chan-Vese model, our algorithm is signi ﬁ cantly faster. At the same time, it gives almost the same or better segmentation results. When compared to the recent k-means variant, it also gives much better segmentation with convex boundaries. The proposed algorithm balances well between time and quality of the segmentation. A crucial step in the design of machine vision systems is the extraction of discriminant features from the images, which is based on low-level segmentation which can be obtained by our approach.


Introduction
Image is an entity having different objects.Image segmentation is one of the techniques to recognize the regions that show different objects in the image.
Let the function gðx, yÞ represent the intensity of light at any point ðx, yÞ of domain Ω ∈ R 2 when an image is captured.So, image is defined as g = fgðx, yÞ | ðx, yÞg ∈ Ω ⊂ R 2 .The RGB color representation of an image is mathematically represented by gðx, yÞ ∈ R 3 .The aim is to find regions Ω 1 , Ω 2 , Ω 3 , ⋯ ⋯ Ω n for n ∈ N, of a domain Ω ⊂ R 2 , corresponding to different objects or their parts or shadows in the image.These regions Ω i , i = 1, 2, ⋯ ⋯ n, of the image will have boundaries.In summary, the segmentation problem has the following definition: suppose Ω be the domain of g.Compute and find Ω 1 , Ω 2 , ⋯Ω n such that and g varies smoothly over the regions Ω i , i = 1, 2, ⋯ ⋯ n as well as g varies discontinuously over the boundaries between Ω i , i = 1, 2, ⋯n.The same problem can be represented in terms of approximation theory: find an approximation f of g in such a way that each f i represents the entire region Ω i via a piecewise smooth function.
There are many approaches to image segmentation [1] including thresholding, clustering, or classification methods [2][3][4], region-based methods [5], and edge-based active contour methods [6][7][8].Segmentation techniques based on deep convolutional neural networks have been developed for various medical images MRI, CT, and X-rays, which show promising results in comparison to conventional segmentation methods [9][10][11][12].The segmentation problem has also been considered as an energy minimization problem.But till date, obtaining a generalized solution that works without prepriori facts about the object elements, its characteristics like shape, color, texture, appearance of shadows, and overlapping of objects are an open problem.
Segmentation is essentially a clustering problem with additional connectivity constraint.k-means is widely used, its properties are well known, and it has also been applied to image segmentation with modest modifications.It is fast and simple to implement [13], which makes it suitable for large datasets like images.However, applying k-means merely based on the pixel values usually results in very fragmented segments.Regularized k-means (reg-KM) [14] attacks this by combining the intensity and the spatial information using two terms.The first term calculates the ratio of the segment perimeter and its area while the second term measures the intracluster dissimilarity as in the standard kmeans.
In 1989, Mumford and Shah introduced the Mumford-Shah model [15].Since then, it has applications in the areas such as image denoising, image restoration, and image segmentation [16].The Mumford-Shah model can be implemented using the Douglas-Rachford algorithm, ADI implicit method, and other numerical optimizations [17][18][19][20][21][22].The improvement of local minima was achieved by graph-based methods, but these methods do not allow for open boundaries [23].
A convex relaxation approach was presented in [24], using functional lifting which guarantees a globally optimal solution, is independent of initialization, and takes care of open boundaries.There are suitable algorithms to solve the simplified Mumford-Shah model which include convex boundaries for any kind of image segmentation [18,22,[25][26][27].The problems of the previous approaches are that those based on Mumford-Shah are rather slow, while many region and contour-based models provide poor segmentation result if the parameters are not carefully tuned.
In this paper, we introduce a two-phase image segmentation algorithm.Instead of using region or block-based heuristics to solve the segmentation problem, we define it as a global optimization problem.We input the number of segments, select the initial centroids randomly, and apply the k-means clustering technique to obtain the initial segmented image, which is further improved by minimizing the Mumford-Shah model.k-means is known to be sensitive to the initialization.A typical solution is to try the algorithm several times with different initialization or use some more robust initialization technique, see [13].Our main contribution is to show how the proximal splitting technique can be used to solve the two-phase Mumford-Shah model.
We compare our segmentation with ground truth data for 100 color images from the Weizmann image dataset [28].Statistically evaluating using error metrics defined in [29], we find 70% of our segmentation have an error value of 0.5 or less.The method works especially well with segments that are difficult to determine, there are multiple objects in the image, and objects have stripes, or complex objects posing so that they have artificial boundaries.For these troublesome images, we find our algorithm gives slightly better segmentation when compared to the classical Chan-Vese optimization algorithm [8], and much better than the k-means variant (reg-KM) [14].Our implementation of the Douglas-Rachford (DR) algorithm is also a lot faster than the Chan-Vese algorithm.The Chan-Vese model jointly uses the reduced Mumford-Shah model and level set method which also uses active contours for image segmentation.This is time consuming because it is solved using reinitialization of a level set function [30,31].
The modern machine vision methods have higher-level goals like extracting an anatomical object of interest for diagnosis of diseases, scene classification, detection of human activity in visual surveillance, assigning a name to a human face in an image, and classifying handwritten characters [32].A crucial step in the design of such machine vision systems is the extraction of discriminant features from the images [33,34].The application of the proposed approach is to obtain a low-level segmentation based on color features and spatial connectivity, which can be further fed to the above-mentioned higher-level semantic segmentation.
The rest of the paper is organized as follows: Section 2 introduces the Mumford-Shah image segmentation model and shows its discrete formulation.Section 3 explains the proximal splitting technique used for optimizing the Mumford-Shah functional and also contains pseudocode.In Section 4, we explain the evaluation criteria used and discuss the results.Conclusions are drawn in Section 5.

The Mumford-Shah Image Segmentation Model
The Mumford-Shah model is based on the functional Eð f ; ΓÞ which is to be minimized and it depends on two parameters μ and λ.The functional Eð f ; ΓÞ is given by where g is an image, function f approximates g which is differentiable over the regions Ω i , and where Γ is made up of smooth arcs and it creates boundaries for Ω i .The overall length of all the smooth arcs of Γ is expressed by LðΓÞ.The parameters μ and λ control the approximation quality and segment coarseness, respectively.The smaller value of E gives good segmentation ð f , ΓÞ and better representation of the image g.The first term in equation ( 2) measures how well f approximates the image g.The second term measures how much f varies on each Ω i .The third term in equation ( 2) measures the length of the boundaries and ensures that the boundaries Γ are as small as possible.Mumford and Shah mentioned that [15], dropping any of the terms in equation ( 2), we have inf fEg = 0.If the first term is not in equation ( 2), we have f = 0, Γ = ∅.If the second term is not there in equation ( 2), we take f = g, Γ = ∅.If the third term is not there in equation ( 2), we take Γ as a fine grid of N horizontal and vertical lines and hence Ω i = N 2 number of small squares.Also, f is average of g on each Ω i .
The presence of all the three terms leads to nontrivial optimal solution Eð f , ΓÞ.The optimal value ð f , ΓÞ can be visualised as a cartoon of the actual image g.Both Mumford and Shah were uncertain for the well-posedness of minimization of E. However, they conjectured that for the continuous function g, E has a minimum in the set of all pairs ð f , ΓÞ where f is differentiable on each Ω i and Γ a finite set of singular points joined by a finite set of C 1 arcs.
The energy functional Eð f , ΓÞ in equation ( 2) whose exact minimization is a nontrivial task.When f is a piecewise constant function on each Ω i , obtained as the average value of g on each Ω i , the second term in equation ( 2), Ð Ð Ω−Γ k∇f k 2 dxdy = 0. Therefore, equation ( 2) reduces to the simplified Mumford-Shah model [5,21,35] given as This is a special case of equation ( 2).This is also called the minimal partition problem [6].If λ is large, the emphasis is on the length of the boundaries Γ and the minimization of E leads to fewer boundaries and coarser segmentation.If λ is small, the emphasis is on the first term of equation ( 4) which leads to an approximation f that follows g much more closely and allows for more boundaries.Thus, a small λ leads to a finer segmentation.If λ = 0, the minimum of E forces f = g leading to finest possible segmentation, which is a trivial segmentation, where every region consists of a single point (pixel).4) is the reduced two-phase segmentation model.In the two-phase model, the image is divided into two segments, the object and the background.Chan-Vese used the level set method to implement the two-phase segmentation model [6,8].In our approach, we solve the reduced two-phase model by rewriting the problem using the Douglas-Rachford proximal splitting algorithm.We use a convexification technique similar to [18,22].

Image Segmentation Model. Equation (
In this work, we have considered RGB images.The goal is to divide the given image into two regions, the background and the foreground object.We define the foreground object as Ω 0 , background as Ω 1 , and the entire image as Ω.So, image g = Ω 0 ∪ Ω 1 ∪ Γ, here Γ represent the arcs of the segment Ω 0 .The image g contains n × m pixels. Let c 0 and c 1 denote the color values of randomly selected pixels as foreground pixel and background pixel, respectively.From simplified equation ( 4), we have where Now, our aim is to find the regions Ω 0 and Ω 1 that minimize functional Eðc 0 ; c 1 ; ΓÞ which is given in equation (5).Defining ω = ω 0 − ω 1 , we represent the domain Ω with binary indicator function f so that This function f is the expected approximation of image g .Using f , the minimization of functional Eðc 0 ; c 1 ; ΓÞ from equation ( 5) can be seen as follows: The first term in equation ( 7) is the inner product of f and ω and is defined as h f , ωi = Ð f ω.The total variation pseudonorm is k f k TV , which equals to LjΓj for binary indicator f = χ Ω by coarea formula established by Fleming and Rishel [20].

Discrete Formulation of Segmentation Model.
To discretize variational problem (7) on a grid of n × m pixels, we assume that N = n × m.Let v i ∈ R 2 be a vector field for every i and suppose then norm of v is given by The total variation pseudonorm, for f = ðf i ÞðN /i = 1Þ ∈ R N , is defined as using equation (9).Thus, we obtain We assume that the pixels are indexed on a twodimensional grid where δ 1 = ð1, 0Þ and δ 2 = ð0, 1Þ, the finite difference gradient operator is We use periodic boundary conditions for simplicity.The inner product R N is Equation ( 11) is a nonsmooth and nonconvex and usually direct minimization that leads to poor local minima.By replacing the binary constraint f i ∈ f0, 1g with a box constraint f i ∈ ½0, 1, we obtain the convex formulation of the energy functional of equation (11).This defines the following finite dimensional convex problem: One can prove that this relaxation is exact, meaning that the minimizer f , when it is unique, is binary, f ∈ f0, 1g.It means that Ω such that f = χ Ω actually solves the original segmentation problem.

Proximal Splitting Technique
Proximal algorithms are used for solving large-scale, constrained, or distributed optimization problems [36,37].Many image processing problems can be formulated as where f 1 , ⋯, f n : R N → − ∞; + ∞: These algorithms use splitting technique such that the functions f 1 , ⋯, f n in equation (15) can be used to get an algorithm which is easily implementable.Usually, these algorithms are called proximal because each function in equation ( 15) is involved via its proximity operator.We propose to use proximal splitting scheme to solve our optimization problem, i.e., simplified Mumford-Shah model of equation ( 14).
3.1.Splitting.We introduce the variable u = ∇f (http://www .numericaltours.com).The optimization problem ( 14) can be seen as follows: where FðzÞ and HðzÞ are defined as follows: Here, the constraints are defined using indicator function as follows: Here, prox γH ðtÞ = s * is the minimizer of function T. Proximal operators can be viewed as generalized projections (see Appendix A).Similarly, reflection is defined as 3.3.Douglas-Rachford Algorithm.The Douglas-Rachford (DR) algorithm minimizes functionals iteratively as in equation (16): It is given that FðzÞ and HðzÞ are convex functions.Hence, we can compute their proximal mappings prox γF and prox γH , respectively.It is not required that the functions FðzÞ and HðzÞ are smooth.A DR iteration is (see Appendix C): where We can show that for any value of γ > 0, 0 < μ < 2, and z0 , z k ⟶ z * is a minimizer of F n + H n .

Proximal
Operator of H.The proximal mapping of H is the orthogonal projection on the convex set D: The following linear system of equations gives f and ũ because Journal of Applied Mathematics where Here, by convention, Δ = div ∘ ∇ and div = −∇ * .We have where As we have discretized variational problem (7) on a grid of n × m pixels, m 1 takes values 1 ⋯ n and m 2 takes the values 1 ⋯ m.
Also, h n = f n − div ðu n Þ and where the two-dimensional discrete Fourier transform of an image h is h n .

Proximal
Operator of F. The function Fðf , uÞ can be written as where The proximal mapping of F n is We define the value of λ > 0. The proximal operator of F 0 is where The proximal operator of the l 1 − l 2 norm k:k 1 is a soft thresholding of the amplitude of the vector field:

Experimental Results
4.1.Image Dataset.The performance of our algorithm is evaluated using the partial Weizmann dataset [28] images depicting one object in the foreground.Currently, the image database contains a hundred grey level and hundred color images along with their ground truth segmentations, out of which we have used only color images.The dataset comprises of images that clearly have an object that is different from the background.Each image has three corresponding ground truth segmentations as shown in Figure 1.

Evaluation Criterion.
The performance of our algorithm is based on region-based error metrics defined in [29].Considering S 1 as segmentation obtained by an algorithm, S 2 as one of the ground-truth, we calculate a value in the range ½0, 1.For a given pixel p i , we consider segment S 1 in the solution and segment S 2 in the ground truth that contain the pixel.If one segment is a proper subset of the other, then the pixel lies in an area of refinement, and the local error should be zero.Otherwise, the two regions overlap in an inconsistent manner and we should calculate the corresponding error.Here, RðS ; p i Þ is the set of pixels corresponding to the region in segmentation S that contains pixel p i , and the local refinement error is defined as This local error measure is not symmetric, and it gives refinement in one direction only.It is to be noted that EðS 1 , S 2 , p i Þ = 0 if S 1 is a refinement of S 2 at pixel p i , but not the opposite.Considering this local refinement error in each direction at each pixel, there are two methods to combine the values into an error measure for the entire image: global consistency error ðGCEÞ that forces all local refinements to be in the same direction and local consistency error ðLCEÞ that allows refinement in different directions in different parts of the image [38].For a given n as the number of pixels, we have The region-based consistency measure LCE is tolerant to refinement in either direction at each image pixel.If we simply replace the pixelwise minimum with a maximum, we get a measure that does not tolerate refinement at all and penalizes dissimilarity between segmentations proportional to the degree of region overlap.Applying to segmentations S 1 and S 2 , the bidirectional consistency error (BCE) is defined as MS (g, segments, max iterations, lambda) → f rc 0 ⟵ rgb values of randomly selected pixel 0 rc 1 ⟵ rgb values of randomly selected pixel In addition, we can ask if segmentation S 1 obtained by our algorithm is consistent with the collection of all ground truth segmentations S k of that image.The BCE * measure is obtained by computing the minimum error at each pixel over each ground truth segmentation S k : Structural similarity index [39] measures the visual quality between two images.It is calculated as a weighted combination of three factors luminance, contrast, and structural similarity.It can be expressed as where l is the luminance, c is the contrast, s is the structural similarity, and α, β, and γ are positive constants.The three factors are calculated separately as follows: where μ x and μ y are local means, σ x and σ y are the standard deviations, and σ xy is the cross-covariance between the two images x and y, respectively.If α = β = γ = 1, then SSIM simplifies to We use SSIM to calculate the similarity between the image generated by the segmentation result, and the image generated from the ground truth segmentation.Here, we have implemented the proposed method for two-phase segmentation, which gives us foreground (object) and background.We have used SSIM to evaluate the foreground.

Results.
We have executed the DR algorithm over 100 images from the Weizmann dataset [29], for various values of λ as 0:10, 0:25, 0:50, 0:75, and 1:00:λ is a parameter which is positive, and unfortunately, there is no fitness function to adjust it automatically.This is a drawback of the method.reg-KM and Chan-Vese methods also have similar limitation.
For each of the 100 images, the segmentation results are compared with their three ground truths to obtain the error measures GCE and BCE * .The average value of the three error values is taken as the final error.Out of these λ values, we found that the results using λ = 0:75 and λ = 1:00 are the best in terms of accuracy in most of the images.We elaborate these in Figure 2. It is very important that the error values are more concentrated towards zero.Figure 2(a) illustrates that for λ = 0:75 and λ = 1:00, around 69% of the images have GCE value of 0.5 or less.Similarly, Figure 2(b) illustrates that for λ = 0:75 and λ = 1:00, around 70% of the images have BCE * value of 0.5 or less.
Figure 1 shows the original image, the three human segmentations comprising the ground truth, the initial 9 Journal of Applied Mathematics segmentation obtained by the k-means clustering technique, and the result of the Douglas-Rachford algorithm for two selected images from the dataset.It is clear that when λ is large, the minimization of E leads to fewer boundaries and coarser segmentation and when λ is small, we obtain a finer segmentation.Minimization of E is achieved when the difference of E between two iterations is less than 0.1 tolerance.We vary λ from 0.1 to 1.00 to have segmentation from finer to coarser, as established in [15].Figure 3 compares the performance and efficiency of the iterations with different λ values.It shows that the value of the minimization of energy functional keeps decreasing when iterated long enough.We observe that for λ values of 0.75 and 1.00, minimization of energy functional reaches the lowest values.At the same time for λ value of 0.75, it converges faster.
Further, for comparison with other methods, we select three images from the dataset such that their segmentation is not an easy task.Figure 4 shows these three selected images and their different ground truths.For the church image, it is difficult to determine which is the object, the church, or the signboard.This confusion is also seen in the ground truths.The same is observed in the case of a lion, where either the lion or the ball, or both, is considered as the object.The segmentation of caterpillar is the most difficult because it has stripes which can act as an artificial boundary.
Figure 5 shows segmentations obtained by the reg-KM algorithm [14], Chan-Vese algorithm [8], and the proposed DR algorithm.The results are optimized as follows.Results of reg-KM are with the regularization parameter 0.5.The Chan-Vese algorithm has been used with various initial contours like a small circle, large circle, and user-defined box.We tested all these and chose to use small circles (see Figure 6) as it provided the best segmentation for these images.In our algorithm, we have used the k-means clustering method for obtaining initial segmentation and then Mumford-Shah functional minimized by Douglas-Rachford using the parameter λ = 0:5.
Table 1 shows the numerical comparison of reg-KM, Chan-Vese, and the proposed DR algorithm.The results show that the reg-KM algorithm has worse accuracy in all three cases which we obtained through the error evaluation criteria mentioned in Section 4.2.The results of the Chan-Vese method and the proposed DR method have similar accuracy; Chan-Vese is better with caterpillar while DR is better with lion.
Table 2 summarizes the SSIM values for the three methods: reg-KM, Chan-Vese, and the proposed DR.In case of all images, the proposed DR algorithm achieves significantly higher SSIM values which indicates its superiority over the compared methods.
Table 3 summarizes the processing times of these algorithms.We can see that the proposed DR algorithm is an order of magnitude faster than Chan-Vase.The three images are segmented in about 1.5 seconds, on average, while Chan-Vese takes 40 seconds per image, on average.The reg-KM technique does not give very good segmentation but is definitely fast in terms of execution time.However, the DR algorithm uses a minimization technique and therefore it takes more time but gives much better results than reg-KM and nearly the same results when compared to the popular Chan-Vese method.

Conclusions
In this paper, we proposed a reduced two-phase Mumford-Shah model to segment images with one object.We obtained initial segmentation by the k-means clustering technique and further minimized it by our implementation of the Douglas-Rachford algorithm.In k-means, k is a userselected parameter about how many segments.If k needs to be automatically selected, then it should also be included in the Mumford-Shah functional in (2).Possible solutions how to do it, we refer to solutions in the clustering context [40].Experiments with various error metrics LCE, GCE, and BCE * show that 70 percent of the segmentations keep the error values below 0.5.We also compare our algorithm to Chan-Vese implementation which is one of the classical methods to optimize the Mumford-Shah functional.Our segmentation is slightly better in two of the three selected images, while the main benefit is that our method is significantly faster than the Chan-Vese algorithm.Secondly, we compare our algorithm with a recent k-means variant, reg-KM.According to the BCE * error metric, our segmentations are significantly better.However, reg-KM is much faster as it has a statistical approach of clustering but it does not give smooth segmentation with convex boundaries.Our algorithm balances time as well as quality of segmentation.As a future research, we consider adopting the Mumford-Shah model in k-means context.

Repeat
ðA:2Þ The proximal mapping of an indicator function i is given by ðA:3Þ which is nothing but projection of i on ½0, 1 N and is denoted by proj ½0,1 N ðtÞ: The reflection with respect to set C is a function ðB:2Þ Let h = ð f − div ðuÞÞ and ĥ denote fast Fourier transform of G then ðB:3Þ
The solution is denoted by y n+1 ; Douglas-Rachford iterations are as follows: 11 Journal of Applied Mathematics Applying the same to our minimization problem, in equation ( 16) by taking y 0 ⟵ z0 ⟵ ð f , uÞ, x k ⟵ z k ⟵ prox γF ðz k Þ and k = 0.

Figure 1 :Figure 2 :
Figure 1: Segmentation results for selected two images using the Douglas-Rachford algorithm.

Figure 4 :
Figure 4: Images and their ground truths.

Algorithm 3 :
Appendix A. The Projection, Proximal Mapping, and ReflectionLet C ⊂ R N be a closed convex set.The projection onto set C is a mappingP c : R N ⟶ C, ∀x ∈ R N , ðA:1Þdefined by

R
C : R N ⟶ C, ∀x ∈ R N , ðA:4Þ defined by R C x ð Þ = P C x ð Þ + P C x ð Þ − x ð Þ= 2P C x ð Þ − x, ∀x ∈ R N :ðA:5ÞLet R N be N-dimensional Euclidean space, k:k be norm, anddomf = fx ∈ R N j f ðxÞ<∞g, Γ 0 ðR N Þ be the class of lower semicontinuous functions from R N ⟶ ð−∞, + ∞Þ such that ð f Þ ≠ ∅ and f ∈ Γ 0 ðR N Þ.For any x ∈ R N , the minimizasolution, which is denoted by prox f ðxÞ.The operator prox f : R N → R N thus defined is the proximal operator of function f : B. Solving a Linear System of Equations to Obtain ð f , ũÞ Let C = fx = ðy, zÞ: Ax = bg = ðy, zÞ: ½B N½y/z and D = fz = ð f , uÞ: u = ∇f g Proj C ðf , uÞ is given by p = x + A T AA T À Á −1 b − Ax ð Þ: ðB:1Þ Let p = ð f , ũÞ, x = ð f , uÞ, and A = ½−∇−I, than the Proj D ð f , uÞ is given by .2. Proximal Mapping.The proximal mapping, prox γT : R N → R N , for any convex function T with γ parameter and t, s ∈ R N , rc 1 : color values of randomly selected pixel 0 and pixel 1 c 0 , c 1 : color values of pixel 0 and pixel 1 after iterations of k-means ω, ω 0 , ω 1 :indicate regions Ω, Ω 0 , and Ω 1 Implementation.The implementation is sketched in the following pseudocode.We use the following notations: g: input image of size X × Y segments : number of segments iterations: number of iterations lambda λ: control parameter in Mumford-Shah functional rc 0

Table 1 :
Comparison of reg-KM, Chan-Vese algorithm, and proposed DR algorithm in terms of BCE * error.

Table 2 :
Comparison of reg-KM, Chan-Vese algorithm, and proposed DR algorithm in terms of SSIM.

Table 3 :
Comparison of reg-KM, Chan-Vese algorithm, and proposed DR algorithm in terms of execution time (seconds).