A New Multiphase Soft Segmentation with Adaptive Variances

Soft segmentation is more flexible than hard segmentation, but the membership functions are usually sensitive to noise. In this paper, we propose a multiphase soft segmentation model for nearly piecewise constant images based on a stochastic principle, where pixel intensities are modeled as random variables with a mixed Gaussian distribution. The novelty of this paper lies in three aspects. First, unlike some existing models where the mean of each phase is modeled as a constant and the variances of the different phases are assumed to be the same, the mean of each phase in the Gaussian distribution is here modeled as the product of a constant and a bias field, and different phases are assumed to have different variances, which makes the model more flexible. Second, we develop a bidirectional projected primal dual hybrid gradient (PDHG) algorithm for the iterations of the membership functions. Third, we develop a novel algorithm for explicitly computing the projection from ℝ^K to the simplex Δ^{K−1} for any dimension K using dual theory, which is more efficient in both coding and implementation than existing projection methods.


Introduction
Image segmentation plays an important role in image processing and computer vision. Recently, variational segmentation models have attracted increasing interest [1-8]. With the level set technique [9], variational models can be solved efficiently. The technique was originally developed for two-phase segmentation and then extended to multiphase segmentation [10-13]. By carefully choosing the initial values, these methods can achieve an ideal solution for multiphase segmentation. However, the nonconvexity of the energy functional in the level set formulation is an inherent drawback of the level set method. As a result, many level set-based variational segmentation models are sensitive to initial values, especially for multiphase segmentation, and may lead to an inferior local minimum.
One way to overcome this drawback is to use a fuzzy membership function in place of the Heaviside function of a level set function for modeling a characteristic function, called soft segmentation. With this relaxation (a characteristic function can be viewed as a special case of a membership function), Chan, Bresson, et al. [1, 2, 5] proved that a global minimum can be achieved in these models due to the convexity of the energy functional. Unfortunately, this method cannot be easily applied to multiphase segmentation. Very recently, based on a variational convexification technique developed by Pock et al. [14], Brown et al. [15] and Bae et al. [16] provided different ways to find globally optimal solutions for the piecewise constant Mumford-Shah model (more generally called the continuous Potts model).
There have been many soft segmentation methods [17-22]. Mory and Ardon extended the original region competition model [8] to a fuzzy region competition method [19, 20]. The technique generalizes some existing supervised and unsupervised region-based models. The proposed functional is convex, which guarantees a global solution in the supervised case. Unfortunately, this method only applies to two-phase segmentation and is hard to extend to multiphase segmentation. Fuzzy C-means (FCM) is a method developed for pattern classification and recognition and has been applied to image segmentation [17, 18, 21]. The standard FCM model partitions a data set {x_j}_{j=1}^n ⊂ ℝ^d into K clusters by minimizing the objective function

J = ∑_{k=1}^{K} ∑_{j=1}^{n} u_{jk}^q ‖x_j − v_k‖²,   (1)

where u_{jk} ≥ 0 is the membership value of datum x_j for class k with ∑_{k=1}^{K} u_{jk} = 1, q > 1 is the fuzzifier, and v_k stands for the cluster centers [23, 24]. The original FCM method is very sensitive to noise. Pham et al. proposed an adaptive fuzzy C-means (AFCM) model [21] which is more robust to noise than the standard FCM, where the constant cluster centers v_k used in the FCM model (1) are substituted by functions that are smooth and close to the corresponding cluster centers.
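For concreteness, a minimal sketch of the standard FCM alternating updates for objective (1) on 1-D data; the fuzzifier q = 2 and the deterministic initialization are our illustrative choices, and all names are ours:

```python
import numpy as np

def fcm(x, K, q=2.0, iters=100):
    """Standard fuzzy C-means on 1-D data x, minimizing
    sum_k sum_j u_jk^q (x_j - v_k)^2 subject to sum_k u_jk = 1."""
    v = np.linspace(x.min(), x.max(), K)              # deterministic initial centers
    for _ in range(iters):
        d2 = (x[:, None] - v[None, :]) ** 2 + 1e-12   # squared distances (regularized)
        u = d2 ** (-1.0 / (q - 1.0))                  # u_jk proportional to d_jk^(-1/(q-1))
        u /= u.sum(axis=1, keepdims=True)             # normalize so memberships sum to 1
        w = u ** q                                    # center update: u^q-weighted means
        v = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)
    return u, v
```

As the surrounding text notes, these plain updates are sensitive to noise: an outlier pulls the weighted means directly.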
Another class of soft segmentations is based on stochastic approaches [18, 22, 25], in which pixel intensities are considered as samples of one or several random variables. The advantage of stochastic methods is their stronger ability to deal with random noise. In most stochastic segmentation models, likelihood functions are used to represent the fitting term in an energy functional, starting from the assumption that a reasonable segmentation should maximize the likelihood; this is the maximum likelihood (ML) method [26]. An expectation-maximization (EM) algorithm is usually employed to solve it when data are incomplete. However, simply using the likelihood to model an image is not enough, since it ignores prior knowledge about the image. In [18], a segmentation framework based on the maximum a posteriori (MAP) principle was proposed for partial volume (PV) segmentation of MRI brain images, which is a classic application of soft segmentation.
Shen proposed a general multiphase stochastic variational fuzzy segmentation model combining the stochastic principle and Modica-Mortola's phase-transition theory [22]. The image intensity was modeled as a mixed Gaussian distribution. The model assumed that the membership functions should be either close to 1 or close to 0, which simplified the model but limited its application. For example, it is not reasonable to apply the model to partial volume segmentation, since in that case the membership functions are usually neither close to 1 nor close to 0 at the boundary between different tissues.
Bias correction is an important means in soft segmentation of dealing with intensity inhomogeneity [21, 27-29]. For example, Wells et al. proposed an expectation-maximization (EM) algorithm to solve the bias correction and tissue classification problems jointly [29]: the EM algorithm iteratively estimates the posterior tissue class probabilities when the bias field is known and estimates the MAP bias field when the tissue class probabilities are known. The disadvantage of this method is that the directly computed bias field may not be smooth, which leads to poor bias correction and segmentation results. Pham and Prince proposed an adaptive fuzzy C-means algorithm formulated by modifying the objective function of the fuzzy C-means algorithm to include a multiplicative bias field, which allows the centroid of each class to vary across the image. Smoothness of the bias field is ensured by penalizing its first- and second-order derivatives, which leads to a computationally expensive smoothing procedure. Ahmed et al. proposed to add a neighborhood term that enables the class membership of a pixel to be influenced by its neighbors [27]; the neighborhood term acts as a regularizer and forces the solution toward a piecewise homogeneous labeling. Li et al. proposed a variational level set-based method for medical image segmentation and bias correction [28], in which the smoothness of the bias field is intrinsically ensured by the data term of the variational formulation.
All of the above papers involving Gaussian distributions share a common feature: they assume that the variances of the different phases are the same. However, when the variances of different phases are quite different (which is sometimes the case; experiments are provided later), the models introduced above may misclassify the phases. In this paper, we propose a stochastic variational model for multiphase soft segmentation in the presence of noise and intensity inhomogeneity, where the image intensity at each point is modeled as a mixed Gaussian distribution whose means and variances are to be optimized. Different from Shen's work [22], our model does not assume that the membership functions must be close to either 1 or 0, so it is more suitable for soft segmentation and for application to partial volume analysis. Since our model is developed under the assumption that the image intensity follows a mixed Gaussian distribution with possibly different variances for different phases, it also differs from [18, 21] in that our model adaptively corrects the intensity bias and removes noise by finding optimized means and adaptive variances. Experiments demonstrate that our model is not only robust to noise but also powerful for bias correction. The model can be implemented very fast using a bidirectional projected PDHG algorithm. The rest of the paper is organized as follows. The new model is developed in Section 2. The numerical implementation scheme is presented in Section 3. In Section 4, we show some experimental results together with explanation and analysis; both synthetic and real images are used. Finally, we summarize the paper with a short conclusion.

Model Development
Let I(x) be a 2D image defined on an open bounded domain Ω containing K phases. Let L be the phase label variable (i.e., L(x) ∈ {1, …, K} for all x ∈ Ω). At each pixel x, both I(x) and L(x) are viewed as random variables indexed by x. The probability that x belongs to the kth phase is represented by the ownership function u_k(x), 1 ≤ k ≤ K. If we denote the probability density function (PDF) of the random variable I(x) given that x belongs to the kth phase by Prob(I(x) | L(x) = k), then the PDF of I(x) is a mixed distribution given by

p(I(x)) = ∑_{k=1}^{K} u_k(x) Prob(I(x) | L(x) = k).

Suppose further that Prob(I(x) | L(x) = k) is a Gaussian PDF with mean b(x)c_k and variance σ_k² for each k = 1, …, K, where b(x) is a bias field and c_k a constant, and that all random variables {I(x) : x ∈ Ω} are independent. Then the likelihood (joint PDF) is

P(I | u, c, σ, b) = ∏_{x∈Ω} ∑_{k=1}^{K} u_k(x) (1/(√(2π) σ_k)) exp(−(I(x) − b(x)c_k)² / (2σ_k²)).

The negative log-likelihood, up to an additive constant and treating the memberships as soft phase indicators, gives the fitting term

E_G(u, c, σ, b) = ∑_{k=1}^{K} ∫_Ω u_k(x) [ (I(x) − b(x)c_k)² / (2σ_k²) + log σ_k ] dx,

where c = [c_1, …, c_K] and σ = [σ_1, …, σ_K]. By adding the L²-norm of ∇b and the total variation of u_k(x) to E_G(u, c, σ, b) as regularity terms for the bias field b(x) and the membership functions u_k(x), respectively, we get the following energy functional E(u, c, σ, b) with Gaussian mixture and bias correction:

E(u, c, σ, b) = E_G(u, c, σ, b) + (μ/2) ∫_Ω |∇b|² dx + λ ∑_{k=1}^{K} ∫_Ω |∇u_k| dx.

Remark. We want to mention that ours is not the first model to use the Gaussian distribution. On the contrary, the Gaussian distribution has been introduced into many segmentation models, such as graph cut [30] and the soft Mumford-Shah model [22]. The difference between the proposed model and the previous ones is that the previous models all assume that the different Gaussian distributions have the same, usually fixed, variance, whereas in our model different Gaussian distributions have different variances, which increases the flexibility.
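As a concreteness check on the fitting term, a minimal discrete sketch of the membership-weighted negative log-likelihood, assuming, per the model description, that the phase-k mean is the bias field times a constant, b(x)c_k, with per-phase standard deviation σ_k; the function name and the sum over pixels standing in for the integral are ours:

```python
import numpy as np

def gaussian_fit_energy(I, u, c, sigma, b):
    """Membership-weighted negative log-likelihood of a Gaussian mixture
    whose phase-k mean is b(x)*c[k] and variance sigma[k]**2.
    I: (H, W) image; u: (K, H, W) memberships summing to 1 over k;
    c: (K,) phase constants; sigma: (K,) std devs; b: (H, W) bias field."""
    E = 0.0
    for k in range(len(c)):
        r2 = (I - b * c[k]) ** 2                 # squared residual to phase-k mean
        E += np.sum(u[k] * (r2 / (2.0 * sigma[k] ** 2) + np.log(sigma[k])))
    return E
```

Note the σ_k² in the denominator: a phase with a larger variance tolerates larger residuals, which is exactly the flexibility the model exploits.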

Numerical Implementation
Note that the energy functional is convex with respect to each of its variables except the variances. For fixed variances, global minimization can be achieved from any initialization.
The Euler-Lagrange equations for the variances, means, and bias field are obtained by setting the first variation of the energy functional with respect to each variable to zero, and we use the corresponding alternating iteration schemes. The challenge in the implementation is the optimization of the membership functions u_k(x), because the constraints require u = (u_1, u_2, …, u_K) to lie in the simplex Δ^{K−1}. There have been two ways to deal with the simplex constraint. One is to use the Lagrangian multiplier method (or augmented Lagrangian multiplier method) for ∑_{k=1}^{K} u_k(x) = 1 and to add an exact penalty term for each constraint 0 ≤ u_k(x) ≤ 1 (see [1, 5, 31]). The drawback of the Lagrangian multiplier method is its low convergence rate; moreover, the so-called exact penalty term is exact only under certain conditions, is not differentiable at the endpoints, and must be replaced by a smoothed version for approximation, which finally hurts the exactness. The other way is to use the Euler-Lagrange equation of the unconstrained problem for the iterations and then project each iterate onto the simplex Δ^{K−1} [22]. The drawback of this method is that no general analytic expression can be written for all dimensions: for different dimensions, the projection functions are different and need to be written differently. In particular, when the dimension is greater than three, the projection function becomes complicated, which leads to low efficiency in both coding and implementation. In this paper, we give a novel way of computing the projection using the dual method. The projection can be expressed uniformly for all dimensions, and the analytic property is guaranteed by dual theory.
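Since the Euler-Lagrange equations above zero the first variation of the fitting term, one consistent set of closed-form updates can be sketched as follows. This is our reconstruction under a Gaussian fitting term with phase-k mean b(x)c_k; the smoothness term on b is omitted, so the b update shown is only the data-driven part that would precede a smoothing step:

```python
import numpy as np

def update_parameters(I, u, b):
    """One sweep of the closed-form minimizers of the fitting term
    sum_k u_k [ (I - b c_k)^2 / (2 sigma_k^2) + log sigma_k ]
    with respect to c_k and sigma_k^2, plus the pointwise data-driven
    bias update (smoothness of b is not enforced here).
    Assumes every phase has nonzero total membership."""
    K = u.shape[0]
    c = np.empty(K)
    sigma2 = np.empty(K)
    for k in range(K):
        # c_k: weighted least squares of I against b over phase k
        c[k] = np.sum(u[k] * b * I) / np.sum(u[k] * b ** 2)
        # sigma_k^2: membership-weighted residual variance (floored)
        sigma2[k] = max(np.sum(u[k] * (I - b * c[k]) ** 2) / np.sum(u[k]), 1e-12)
    # b(x): pointwise weighted least squares across phases
    num = sum(u[k] * c[k] * I / sigma2[k] for k in range(K))
    den = sum(u[k] * c[k] ** 2 / sigma2[k] for k in range(K))
    b_new = num / np.maximum(den, 1e-12)
    return c, sigma2, b_new
```

In an alternating scheme these updates would be interleaved with the PDHG membership step and a smoothing of b corresponding to the ‖∇b‖² term.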
The dual method has been extensively studied for dealing with the total variation, which is not differentiable at points where the first-order variation is zero. One popular example is Chambolle's dual method [32]. Recently, Zhu and Chan developed a new algorithm combining the gradient descent method and the dual method, called the primal dual hybrid gradient method (PDHG) (see [33] for details). The method integrates the advantages of both the gradient method and the dual method and is thus faster than using either alone. It is proved to be faster than using the dual method only, and its modified iteration form is guaranteed to converge when the step size satisfies certain conditions (see [34-36]). In our application, we adopt the idea of PDHG and apply it to our model with the constraint on the simplex Δ^{K−1}.
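To make the primal-dual mechanics concrete, here is a minimal PDHG sketch for the classical TV-regularized (ROF) denoising problem. This is only an illustration of the hybrid gradient idea, not the membership update of our model, and the step sizes are illustrative:

```python
import numpy as np

def grad(u):
    """Forward-difference gradient with Neumann boundary."""
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    """Discrete divergence, the negative adjoint of grad."""
    d = np.zeros_like(px)
    d[:-1, :] += px[:-1, :]; d[1:, :] -= px[:-1, :]
    d[:, :-1] += py[:, :-1]; d[:, 1:] -= py[:, :-1]
    return d

def pdhg_tv_denoise(f, lam=5.0, tau_p=0.2, tau_d=0.5, iters=300):
    """PDHG for the ROF model min_u (lam/2)||u - f||^2 + TV(u):
    a projected dual ascent step on p (|p(x)| <= 1 pointwise)
    alternating with a proximal primal step on u."""
    u = f.copy()
    px = np.zeros_like(f); py = np.zeros_like(f)
    for _ in range(iters):
        gx, gy = grad(u)                                  # dual ascent step
        px, py = px + tau_d * gx, py + tau_d * gy
        norm = np.maximum(1.0, np.sqrt(px ** 2 + py ** 2))
        px, py = px / norm, py / norm                     # project onto unit ball
        u = (u + tau_p * (div(px, py) + lam * f)) / (1.0 + tau_p * lam)  # primal step
    return u
```

The same alternation drives our membership update, with the unit-ball projection on the dual variable and a simplex projection on the primal variable.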

Optimize Membership Functions Using PDHG.
By the principle of PDHG, minimizing (7) with respect to the membership functions u_k (k = 1, …, K) under constraint (12) is equivalent to solving a discrete min-max problem in which the total variation of each u_k is replaced by its dual formulation over dual variables p_k with |p_k(x)| ≤ 1. The descent direction for min_{u∈S} ⟨u, p⟩ + f(u) is p + ∇_u f(u), so the evolution of the membership u (primal step) is a projected gradient step

u^{n+1} = Π_{Δ^{K−1}}(u^n − τ(p^n + ∇_u f(u^n))),

where the projection onto the simplex is defined by

Π_{Δ^{K−1}}(v) = argmin_{w∈Δ^{K−1}} ‖w − v‖ for all v ∈ ℝ^K,

and ‖·‖ denotes the Euclidean distance. A novel method for the projection onto the simplex is given in the next section.
Since the first variation of (13) with respect to p_k involves ∇u_k, the dual step is the projected ascent

p_k^{n+1} = Π_P(p_k^n + τ_d ∇u_k^n),

where Π_P is the projection onto the space P = {p : |p(x)| ≤ 1}, given componentwise by Π_P(p) = p / max(1, |p|).
Projection onto the Simplex Δ^{K−1}. We now deduce an explicit expression for Π_{Δ^{K−1}}(v). Let u* denote the solution. By Moreau's identity, the projection onto the simplex can be expressed through the proximal operator of the dual (support) function of Δ^{K−1}; this reduces the constrained problem to the determination of a single scalar dual variable, from which u* is obtained in closed form. The resulting solution is simple and very fast to evaluate.
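The explicit solution can be evaluated uniformly in any dimension K. As a minimal sketch, here is one standard sorting-based way to compute the single scalar dual variable, consistent with the dual derivation; the function name is ours:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v in R^K onto the simplex
    {x : x_i >= 0, sum_i x_i = 1}. The minimizer has the form
    max(v - theta, 0) for a scalar dual variable theta, which is
    found from the sorted entries of v."""
    v = np.asarray(v, dtype=float)
    s = np.sort(v)[::-1]                  # entries in descending order
    css = np.cumsum(s)
    # largest index rho with s[rho] + (1 - css[rho]) / (rho + 1) > 0
    rho = np.nonzero(s + (1.0 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)
```

In practice the same few lines apply to every pixel's membership vector, for any number of phases K, which is the coding advantage over dimension-specific projection formulas.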

Summary of the Algorithm.
In this section, we summarize the complete algorithm.
(1) Initialization: initialize the membership functions u_k, the means c_k, the variances σ_k², and the bias field b.
(2) Iteration: update the variances, means, and bias field using the iteration schemes above, and update the membership functions using the bidirectional projected PDHG steps described above.
(3) Termination: the iteration terminates when the changes of all the means are less than a preset positive threshold.
The choice of the regularization and step-size parameters depends on the specific image. Usually, at the beginning, the regularization parameter should be chosen large enough; after some iterations, decreasing it can accelerate convergence. The time step must be no greater than 2/9 to guarantee convergence [32]. The efficiency of the PDHG algorithm is documented in [37].

Experiment and Discussion
Since the main difference between our model and other Gaussian distribution-based models lies in the variable variances, we especially show the difference between varying and fixed variances. Since our model can be viewed as an extension of Shen's model [22], we present many experimental results in comparison with it. The first experiment tests robustness to noise. In Figure 1, the original image obviously contains three phases. We added mixed Gaussian noise with zero mean and an overall variance of 0.03. First, we applied Shen's model with parameter value 5, stopping the iterations with the criterion max_{1≤k≤3} |c_k^new − c_k^old| < 0.001, where c_k^old denotes the old mean before each iteration and c_k^new the new mean after it (the same for the remaining experiments). Then we applied our model (7) to the image. Obviously, the result of the new model is much better.
Explanation and Analysis. This big difference comes from the fitting terms of the two models. In Shen's model, to make the fitting term small enough, the image intensity at each point must be very close to the mean of its phase, so the model is sensitive to noise. Comparatively, in model (7), the effect of isolated noise on the energy functional can be counteracted by the variances appearing in the denominators of the fitting term, so the new model is more robust to noise.
The second experiment compares robustness to bias. In Figure 2, the first line shows the original biased image and the ground truth of all three membership functions. The second and third lines show the soft segmentations obtained using Shen's model and the proposed model, respectively. Obviously, the proposed model gives a more precise result compared with the ground truth, since there is no bias in the segmentation.
Our third experiment compares fixed variances with updated variances in the new model. In all five lines of Figure 3, from left to right are the original image, the three membership functions, and the hard segmentation, respectively. The first to fourth lines show the results with fixed variances; for example, we set σ_k² = 0.005 for all k (1 ≤ k ≤ 3) in the first line and σ_k² = 0.010 for all k in the second line. The last line shows the result with updated variances; the final variances obtained for the three phases are σ₁² = 0.0069, σ₂² = 0.0193, and σ₃² = 0.0135, respectively. Obviously, the last row gives the best result. This experiment shows that updating the variances is better than fixing them all to the same value. Since Shen's model is a special case in which all variances are fixed and equal, this experiment shows that the proposed model outperforms Shen's model.
Finally, we test our model on real images. In Figure 4, the liver is not very clear due to the presence of bias. Shen's model gives a wrong result in which a large part of the liver is incorrectly classified as background, as shown in the first line; this is easily seen from the hard segmentation. The proposed model gives a much better result, as shown in the second line, because the fitting term of the model contains the bias as well as the variances. The computed variances of the three phases are 0.013, 0.011, and 0.002, respectively. This fact also supports the assumption in our model that different phases may have different variances.
As mentioned at the beginning of the paper, one of the most important applications of soft segmentation is partial volume segmentation of MRI brain images. Figure 5 gives a comparison in MRI brain image soft segmentation. There is a big difference between the soft segmentations (the membership functions). Using the MAP-AFCM model, most pixels are classified as partial volume; that is, their membership values are neither close to 1 nor close to 0 (in the figure, brightness means close to 1, darkness means close to 0, and intermediate intensities indicate partial volume). However, this is not true, because it is well known that partial volume in MRI brain images appears mostly at the boundaries between different tissues. Comparatively, the proposed model gives more reasonable results, in which partial volume appears only at the boundaries between different tissues.
We also present some natural images for comparison. In Figure 6, the left image is the original image, and the middle and right images are the hard segmentations after thresholding using Shen's model and the proposed model, respectively. In Figure 7, the first column shows the original images; we present all membership functions and hard segmentations for the reader to compare. For all three examples, the results using our model are better than those using Shen's model.

Conclusion
In this paper, we proposed a stochastic variational model for multiphase soft segmentation based on a Gaussian mixture. Compared with previous related models, the proposed model is more robust to noise and bias. For the implementation, we developed a bidirectional projected PDHG algorithm, which is easy to carry out. Several experiments are presented to demonstrate the efficiency of the new model.

Figure 1: Experiment 1: robustness to noise. The first line: segmentation using Shen's model. The second line: segmentation using the proposed model. For each line, from left to right are the original image, three membership functions, and the hard segmentation after thresholding.

Figure 2: Experiment 2: robustness to bias. In the first line, from left to right are the original image and the three ground-truth membership functions. In the second and third lines, from left to right are the original image and the soft segmentations (three membership functions). The second line is the soft segmentation using Shen's model, and the third line is the soft segmentation using the proposed model.

Figure 3: Comparison between fixed and updated variances. In every row, the first three graphs are membership functions, and the fourth is the hard segmentation after thresholding. Row 1: result with σ² = 0.005; Row 2: result with σ² = 0.010; Row 3: result with σ² = 0.015; Row 4: result with σ² = 0.020; Row 5: result with σ² updated.

Figure 4: MRI liver segmentation. The first line: segmentation using Shen's model. The second line: segmentation using the proposed model. From left to right: the original image, three membership functions, and the hard segmentation, respectively.

Figure 5: MRI brain image segmentations. The first line: segmentation using Shen's model. The second line: segmentation using the proposed model. From left to right: the original image, three membership functions (white matter, gray matter, and CSF (cerebrospinal fluid)), and the hard segmentation, respectively.

Figure 6: Natural image segmentation after thresholding. From left to right: the original image and three phases of the hard segmentation. Line 1: Shen's model. Line 2: the proposed model.

Figure 7: Natural image segmentation after thresholding. Line 1: membership functions using Shen's model. Line 2: membership functions using the proposed model. Line 3: hard segmentations using Shen's model. Line 4: hard segmentations using the proposed model.