Segmentation of Overlapped Cervical Cells Using Asymmetric Mixture Model and Shape Constraint Level Set Method

Accurate segmentation of the nuclei and cytoplasm in Pap smear images is challenging in cervical cytological analysis. In this paper, we put forward a new fusion algorithm that combines the asymmetric generalized Gaussian and Cauchy mixture model (GGCMM) with a shape constraint level set method to segment overlapping cervical smear cells. The proposed approach starts by separating nuclei and cytoplasm clusters through the asymmetric GGCMM, where each component is a mixture of a generalized Gaussian distribution and a Cauchy distribution. The proposed asymmetric GGCMM takes into account the asymmetry of the generalized Gaussian distribution and the heavier tail of the Cauchy distribution. The new probability distribution fits different shapes of observed data more flexibly. Then, we apply a morphological operation to remove fake nuclei, which are usually much smaller than real nuclei. After that, the improved level set energy function with a distance map and a new shape prior term is applied to extract the contours of overlapping cervical cells. Owing to this new level set energy function, the segmentation of every individual cell works well, especially in overlapping areas. We evaluate the proposed method on the ISBI 2014 Challenge Dataset. The results demonstrate that our approach outperforms existing methods in extracting overlapping cervical cells and obtains accurate cell contours.


Introduction
Cervical cancer is a disease caused by malignant cells formed in the tissues of the cervix. In the early stage of cervical cancer, there are no obvious symptoms. Nevertheless, it can be detected with the Papanicolaou test (also known as the Pap smear test) combined with a regular screening program and appropriate follow-up. However, locating abnormal cells among thousands of cells is tedious and demanding work for pathologists. Moreover, the results of this manual recognition may differ between observers, which may lead to misdiagnosis. Therefore, a computer-aided diagnosis system can help to estimate accurately and efficiently whether cytopathy occurs in cervical cells. The precise segmentation of cervical nuclei and cytoplasm is a crucial part of this system.
Recent computer-aided diagnosis systems mainly aim to segment nuclei and cellular clusters. Automatic thresholding, morphological operations [1], watershed [2], and active contour models [3] are the primary methods for nuclei or cellular mass segmentation. These approaches assume that cervical cells do not overlap or overlap only slightly, and they fail to isolate individual cytoplasm well. However, real cervical cells usually have a higher degree of overlap. In addition, the segmentation accuracy of cervical cells is affected by noise, poor contrast, dust, variable staining, and similar gray levels between different cell components. These popular methods are still weak in robustness and need further improvement. Although there has recently been progress in the segmentation of overlapping cervical cells [4,5], the isolation of nuclei and cytoplasm from overlapping cervical cell clusters remains an important task.
In this paper, we present an asymmetric generalized Gaussian and Cauchy mixture model (GGCMM) to extract the different components of overlapped cervical cells and then apply a shape constraint level set energy function to obtain the individual cytoplasm inside each cellular cluster. Our method is performed in two steps: extraction of nuclei and cellular clusters, and segmentation of overlapping cytoplasm. In the first step, the Cauchy distribution used in our model is heavier tailed, and combining it with the generalized Gaussian distribution makes the distribution of the proposed model asymmetric. In addition, the shape parameter of the generalized Gaussian distribution controls the tail of the distribution. Therefore, the new probability distribution has the flexibility to fit different shapes of observed data. Our distribution provides a heavy-tailed alternative to symmetric distributions for potential outliers and can therefore produce a segmentation method that is more robust to outliers. Images of cervical cells are characterized by poor cytoplasmic contrast and noise; the proposed heavy-tailed asymmetric distribution therefore plays an important role in balancing noise against image details (outliers). Our asymmetric GGCMM can accurately partition Pap smear images into nuclei, cytoplasmic clusters, and background, which serve as the input of the modified level set framework. In the second step, we add a distance map and a shape constraint term to our level set energy function to correct the evolution of the boundary. The distance map makes the boundary evolve in an accurate direction and adjusts its evolution speed. The elliptical shape prior is defined based on the geometry of the clump and the detected centroid of the nucleus. The shape constraint prior modifies the final cell contour so that it approaches the ground truth. The experimental dataset of overlapped cervical cells comes from the ISBI 2014 Challenge Dataset [6]. The evaluation shows that the proposed algorithm is superior to other related methods in segmentation accuracy.

Related Studies
In the last few decades, many methods have been proposed to solve the overlapping cervical cell segmentation problem [7,8]. For example, Ushizima et al. [9] proposed an unsupervised method that combines superpixels with the Voronoi technique to detect cervical cells. Their method incorporated low-mean pixel clusters with adaptive histogram equalization to improve nuclei detection and applied Voronoi diagrams to implement cervical cell segmentation. Lu et al. [4] presented an algorithm that combines quick shift, the Gaussian mixture model (GMM), and the maximally stable extremal region detector (MSER) [10] with a joint level set method to segment overlapping cervical cells. Their method was also the baseline in the ISBI 2014 overlapping cervical cytology image segmentation challenge. The standard Gaussian distribution is short-tailed and symmetric, which makes it weak in the presence of heavy outliers and hard to fit to the various shapes of observed data [11]. Zhou et al. [12] divided cervical cell images into overlapping subimages and applied an adaptive threshold estimator to each of them. Tareef et al. [13] combined SLIC [14] with a support vector machine (SVM) to obtain nucleus and cytoplasm boundaries, and they applied sparse coding (SC) theory and some morphological processing to reconstruct every cell's boundary. Recently, a multipass watershed method [15] was also used in overlapping cervical cell segmentation, which applied different thresholds to obtain different regions of interest (ROIs). In addition, Chang et al. [16] introduced a method that combines morphology techniques with a double threshold to detect cervical cells, and it was able to segment every cell in a short time. An active contour model was also used in [17] to extract cell features. Chen et al. [18] presented an improved level set method based on combinations of edge, region, and prior information, but the energy function they proposed did not consider overlapping areas.
In recent years, the fully convolutional network (FCN) has shown high performance in image processing [19]. An FCN can preserve the spatial information of the input image and label all pixels separately. U-Net is a specific neural network built upon the FCN, and it can extract and classify different kinds of features to obtain good performance even when the training data are few [20]. Although the classical U-Net has achieved impressive performance [21], it does not focus on the ROI, which leads to a waste of computing resources and space, such as the excessive processing of irrelevant regions. The attention mechanism can handle this problem well [22]. This mechanism has been widely used in natural language processing (NLP) [23] and image processing [24]. More recently, Oktay et al. [25] integrated the attention gate (AG) model into the classical U-Net architecture to highlight salient features.

Dataset Description
We train our approach on the training set provided by the ISBI 2014 Challenge Dataset [6]. There are 945 synthetic cytology images in total in this dataset, including 855 training images and 90 testing images, each with a size of 512 × 512. All these images were generated by collecting nuclei and cytoplasm from extended depth of field (EDF) images, recombining them through various transforms, and placing them on synthetic images using an alpha channel with a random value (from 0.88 to 0.99). Transforms such as rotation, scaling, and random linear brightness changes are applied to the cells gathered from the EDF images before they are placed at random locations in the synthetic images. The cells in the synthetic images must have at least one overlap area, and the overlap coefficient is in one of the following options:

The Proposed Model
The proposed approach implements three steps to segment the overlapping cervical cells. Firstly, the nuclei and the contours of cervical cell clumps are extracted through the asymmetric GGCMM. Based on the hypothesis that each nucleus represents a cell, we can acquire the corresponding cytoplasm for each nucleus. Therefore, segmentation of the nuclei is an important step toward the analysis of cervical smear images. Secondly, depending upon the count of nuclei, we obtain the initial contour of each individual cell. Because the cytoplasmic boundary is usually located at an uneven radial distance from the centroid of its associated nucleus, the shape of every detected nucleus can be used to define an ellipse that represents the region of the corresponding cytoplasm. The third step focuses on cytoplasm segmentation for each cell within each clump. Each cell clump is processed individually to obtain the boundary of every single cell by means of the modified level set technique based on the distance map and shape prior. Lastly, the contour of each individual cell is updated to bring it closer to the true boundary. We start by introducing the cell component labeling.

Cell Component Labeling Based on GGCMM.
Let $x = \{x_1, \ldots, x_n, \ldots, x_N\}$ denote a set of $N$ pixels in an image, where $x_n$ stands for the observation at the $n$-th pixel, and let $\{\Omega_1, \ldots, \Omega_k, \ldots, \Omega_K\}$ represent the different labels. The finite mixture model assumes that the density function $f(x_n \mid \Theta)$ at each pixel $x_n$ can be described by

$$f(x_n \mid \Theta) = \sum_{k=1}^{K} \pi_{nk}\, p(x_n \mid \Omega_k), \quad (1)$$

where $\Theta$ represents the model parameters, and the prior probability $\pi_{nk}$ satisfies the following constraints:

$$0 \le \pi_{nk} \le 1, \qquad \sum_{k=1}^{K} \pi_{nk} = 1. \quad (2)$$

In this study, $p(x_n \mid \Omega_k)$ is the mixture of generalized Gaussian and Cauchy distributions, which is written in the following form:

$$p(x_n \mid \Omega_k) = \eta_{k1}\, GN(x_n \mid \theta_{k1}) + \eta_{k2}\, C(x_n \mid \theta_{k2}), \quad (3)$$

where $\eta_{k1}$ and $\eta_{k2}$ are called weighting factors and satisfy the following constraints:

$$0 \le \eta_{k1}, \eta_{k2} \le 1, \qquad \eta_{k1} + \eta_{k2} = 1. \quad (4)$$

$GN(x_n \mid \theta_{k1})$ conforms to a generalized Gaussian distribution with parameter set $\theta_{k1} = \{\mu_{k1}, \sigma_{k1}, \lambda_{k1}\}$, defined by (5), where $\mu_{k1}$ is the mean value, $\sigma_{k1}$ is the standard deviation, and $\lambda_{k1}$ is the shape parameter. The coefficients $A(\lambda_{k1})$ and $B(\lambda_{k1})$ are calculated by the formulae in (6), which involve the Gamma function $\Gamma(z)$, defined as

$$\Gamma(z) = \int_{0}^{\infty} t^{z-1} e^{-t}\, dt. \quad (7)$$

Figure 2 shows the probability density function of the generalized Gaussian distribution with different parameters. It can be seen that, for some parameters, the generalized Gaussian distribution has peaks that are too sharp, which makes it difficult to fit observed data of different shapes flexibly with this distribution alone. To overcome this problem, we introduce the Cauchy distribution into the generalized Gaussian distribution. Each component $C(x_n \mid \theta_{k2})$ follows a Cauchy distribution, written in the form

$$C(x_n \mid \theta_{k2}) = \frac{1}{\pi c_{k2} \left[ 1 + \left( \dfrac{x_n - \mu_{k2}}{c_{k2}} \right)^{2} \right]}, \quad (8)$$

where $\theta_{k2} = (\mu_{k2}, c_{k2})$, and $\mu_{k2}$ and $c_{k2}$ represent the location and scale parameters, respectively. Figure 2 gives the plot of the 1D Cauchy distribution with different parameters. The distributions estimated with the symmetric Gaussian distribution and with the proposed asymmetric distribution $p(x_n \mid \Omega_k)$ are shown in Figure 3. It can be seen that our asymmetric distribution has the flexibility to fit the observed data.
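The component density described above can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the generalized Gaussian is written in one common standard-deviation parameterization (shape 2 gives the normal density, shape 1 the Laplace density), which is an assumption and may not match the paper's $A(\lambda)$ and $B(\lambda)$ term by term. All function names are hypothetical.

```python
import math


def generalized_gaussian_pdf(x, mu, sigma, lam):
    """Generalized Gaussian density with mean mu, std sigma and shape lam.

    Assumed parameterization: lam = 2 gives the normal density,
    lam = 1 gives the Laplace density.
    """
    g1 = math.gamma(1.0 / lam)
    g3 = math.gamma(3.0 / lam)
    scale = sigma * math.sqrt(g1 / g3)   # convert std to the GGD scale
    norm = lam / (2.0 * scale * g1)      # normalizing constant
    return norm * math.exp(-((abs(x - mu) / scale) ** lam))


def cauchy_pdf(x, mu, c):
    """Cauchy density with location mu and scale c."""
    return 1.0 / (math.pi * c * (1.0 + ((x - mu) / c) ** 2))


def component_pdf(x, eta1, gg_params, cauchy_params):
    """One label's density: eta1 * GN + (1 - eta1) * Cauchy."""
    eta2 = 1.0 - eta1
    return (eta1 * generalized_gaussian_pdf(x, *gg_params)
            + eta2 * cauchy_pdf(x, *cauchy_params))


def mixture_pdf(x, priors, components):
    """Pixel density: sum over labels of prior * component density."""
    return sum(pi * component_pdf(x, *comp)
               for pi, comp in zip(priors, components))
```

When the two location parameters differ, the component density is no longer symmetric about the generalized Gaussian mean, which is the asymmetry the model relies on.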
The log-likelihood function of the density function is given in (9). To estimate the parameters $\Theta = \{\mu_{k1}, \sigma_{k1}, \lambda_{k1}, \mu_{k2}, c_{k2}\}$ of the mixture model, we maximize the log-likelihood function in (9). Considering the complexity of (9), the expectation-maximization (EM) algorithm [26] cannot be directly applied to maximize it. Therefore, Jensen's inequality is introduced to overcome this problem [27].
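The role of Jensen's inequality can be made explicit with a generic worked form. The symbols $a_k$ and $z_k$ below are illustrative placeholders rather than the paper's notation; identifying $a_k$ with $\pi_{nk}\, p(x_n \mid \Omega_k)$ and $z_k$ with a posterior weight is an assumption about how the bound is applied here.

```latex
\log \sum_{k=1}^{K} a_k
  \;=\; \log \sum_{k=1}^{K} z_k \,\frac{a_k}{z_k}
  \;\ge\; \sum_{k=1}^{K} z_k \log \frac{a_k}{z_k},
\qquad z_k > 0,\quad \sum_{k=1}^{K} z_k = 1 .
```

Equality holds when $z_k$ is proportional to $a_k$, so choosing the weights as posterior probabilities makes the lower bound tight at the current parameter estimate while moving the logarithm inside the sum.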
Thus, the log-likelihood function (9) can be rewritten as the objective function $E(\Theta)$ in (10), where the posterior probability $z_{nk}$ is calculated using (11). Maximizing the density function $f(x_n \mid \Theta)$ is therefore equivalent to maximizing the objective function $E(\Theta)$ in (10), where the hidden variables $y_{nk1}$ and $y_{nk2}$ are defined by (12) and (13), respectively. The derivation of parameter estimation can be found in Appendix A.
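As a hedged illustration of the quantities named above, the sketch below computes a posterior $z_{nk}$ and hidden variables $y_{nk1}$, $y_{nk2}$ for a single pixel using the usual EM responsibility formulas. These forms are an assumption, since the exact expressions (11) to (13) may differ, and the function name `e_step` and its arguments are hypothetical.

```python
def e_step(gn_vals, cauchy_vals, eta, priors):
    """Responsibilities for one pixel.

    gn_vals[k], cauchy_vals[k]: generalized Gaussian and Cauchy densities
        of label k evaluated at the pixel (assumed precomputed).
    eta[k]: pair (eta_k1, eta_k2) of weighting factors.
    priors[k]: prior probability pi_nk of label k at this pixel.
    Returns (z, y1, y2), each a list indexed by label k.
    """
    # Component density of each label: eta_k1 * GN + eta_k2 * Cauchy.
    comp = [e1 * g + e2 * c
            for (e1, e2), g, c in zip(eta, gn_vals, cauchy_vals)]
    total = sum(p * q for p, q in zip(priors, comp))
    # Posterior probability that the pixel belongs to label k.
    z = [p * q / total for p, q in zip(priors, comp)]
    # Share of each label's density explained by its two sub-distributions.
    y1 = [e1 * g / q for (e1, _), g, q in zip(eta, gn_vals, comp)]
    y2 = [e2 * c / q for (_, e2), c, q in zip(eta, cauchy_vals, comp)]
    return z, y1, y2
```

By construction the posteriors sum to one over the labels, and the two hidden variables of each label sum to one. The sketch assumes strictly positive densities, so no guard against division by zero is included.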
Step 3 (M step): update the mean $\mu_{k1}$, standard deviation $\sigma_{k1}$, and shape parameter $\lambda_{k1}$; compute the prior distribution $\pi_{nk}$ using (A.6); and update the hidden variables $y_{nk1}$ and $y_{nk2}$ using (12) and (13), respectively.
Step 4: evaluate the log-likelihood function in (9) and check its convergence. If the convergence condition is satisfied, the iteration is terminated.
Figure 4 shows the results of nuclei detection and cervical cell clump segmentation. As shown, the proposed GGCMM is flexible enough to fit the different shapes of the cervical cell clumps.

Initial Curve.
In the level set technique, the initial curve is crucial for the final results. Generally, many pixels on the cell boundary have a similar distance to the cell's nucleus, and this prior helps us to locate the initial curve. In this subsection, we infer the initial curve for each cell from the geometric structure between the nucleus and the nearby cytoplasm. The approach for obtaining the initial curve has the following three processes:
(1) Calculate the centroid for each nucleus
In practice, there are a few fake nuclei that are much smaller than normal nuclei. Therefore, we first remove them by a morphological operation. In this study, we discuss the case in which the nuclei are enclosed in the cytoplasm. Depending upon the count of nuclei, we calculate the centroid for each nucleus, as shown in Figure 5(a). These centroids are important in the distance map calculation because they also stand for the cell count.
(2) Determine junction points
A set of points is used to represent the cell boundary. We calculate the distance from every boundary point to each centroid and assign each point to its nearest centroid. As seen in Figure 5(b), each centroid is surrounded by its boundary points. The points that are equidistant from two centroids, marked in yellow, are regarded as junction points.

(3) Estimate initial curve in overlapping area
We assume that the initial contour of each cell is circular. We regard the centroid as the center of the circle and take the lines between the centroid and the junction points as radii. In this way, we get two straight lines a and b of different lengths. Let a represent a long side; the initial contour is then drawn according to the expression in (15), where side a is shorter than b, and θ is the angle between the two sides a and b with the centroid as the vertex. The length of the radius changes with θ during the rotation, as displayed in Figure 5(c). This approach makes it possible to obtain an accurate contour for the level set evolution. Figure 5(d) shows the initial curve for each overlapping cervical cell.
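The first two processes above can be sketched as follows. This is a minimal illustration under stated assumptions: each nucleus is given as a list of pixel coordinates, small fake nuclei are removed by a plain area threshold (a stand-in for the morphological operation, which the text does not specify), and "equidistant" is relaxed to a tolerance `tol`. The function names, `min_area`, and `tol` are hypothetical.

```python
import math


def remove_small_nuclei(nuclei, min_area):
    """Drop candidate nuclei whose pixel count is below min_area."""
    return [pixels for pixels in nuclei if len(pixels) >= min_area]


def centroid(pixels):
    """Mean coordinate of a list of (x, y) pixels."""
    n = len(pixels)
    return (sum(p[0] for p in pixels) / n, sum(p[1] for p in pixels) / n)


def assign_and_find_junctions(boundary, centroids, tol=0.5):
    """Label each boundary point with its nearest centroid.

    A point whose two smallest centroid distances differ by at most
    tol is also reported as a junction point.
    """
    labels, junctions = [], []
    for pt in boundary:
        dists = sorted((math.dist(pt, c), i) for i, c in enumerate(centroids))
        labels.append(dists[0][1])
        if len(dists) > 1 and dists[1][0] - dists[0][0] <= tol:
            junctions.append(pt)
    return labels, junctions
```

A tolerance is needed because, on a discrete pixel grid, very few boundary points are exactly equidistant from two centroids.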

Single Cell Contour Update Based on Elliptical Shape Prior.
This subsection presents an improved level set energy function based on the DRLSE [28] model. To prevent excessive contour evolution, we apply a distance map computed from the centroid and the initial contour, defined in (16), where $m$ is the label of the cell in the clump, $x_k$ is a point inside the $m$-th cell, and $Z$ is a constant. $L(x_k)$ is the relative distance from the centroid to the boundary points, and the range of $L(x_k)$ is [0, 1]. According to (16), one obtains the distance map of a single cell displayed in Figure 6(b). The grayscale value of a point $x_k$ in the distance map indicates its distance to the centroid. This distance map can be used as a constraint to prevent the cell boundary from evolving into regions that do not belong to this cell. Although the shape of cervical cells varies, the contours of most cervical cells are nearly elliptical. Therefore, besides the initial boundary and the distance map obtained in the previous stage, we also incorporate an elliptical shape prior constraint into our energy function. Compared with a classical level set energy function without a shape prior term, for example, the CV model [29], the proposed energy function can stabilize the evolution process by keeping the level set function close to an elliptical shape and avoiding reinitialization.
Consider $\phi_i(x, y): \Omega \to \mathbb{R}$ representing the level set function (LSF) on an image domain $\Omega \subset \mathbb{R}^2$, and suppose that $N$ cervical cells are detected, so that $\{\phi_i\}_{i=1}^{N}$ denotes the LSFs of all $N$ cells in $\Omega$. Then, this paper defines a new energy function $E(\phi_i)$ in (17), with c = 0.04, β = 3, and α = 8. $R_p(\phi_i)$ stands for the distance regularization term, which maintains the signed distance property $|\nabla \phi_i| = 1$. $R_p(\phi_i)$ is defined as

$$R_p(\phi_i) = \int_{\Omega} p(|\nabla \phi_i|)\, dx,$$

where $p$ is the potential function $p: [0, \infty) \to \mathbb{R}$, for example, $p(s) = 0.5(s - 1)^2$. $L_g(\phi_i)$ is the length term in our energy function, which is defined by

$$L_g(\phi_i) = \int_{\Omega} g\, \delta(\phi_i)\, |\nabla \phi_i|\, dx,$$

where $g = 1/(1 + |\nabla G_\sigma * I|)$ is the edge stop function with the Gaussian kernel $G_\sigma$, which is used to smooth the image and reduce the effect of noise, and $\delta(\cdot)$ is the Dirac delta function. $A_g(\phi_i, \phi_j)$ is a modified weighted region term that incorporates the distance map $D_i(x)$ calculated using (16) and the Heaviside function $H(\cdot)$. $A_g(\phi_i, \phi_j)$ represents the overlapping area. Because of the distance map $D_i(x)$, the effect of $A_g(\phi_i, \phi_j)$ diminishes as the distance between the nucleus and the overlapping area increases. If $\phi_i$ and $\phi_j$ do not intersect during the evolution, then $A_g(\phi_i, \phi_j) = 0$ because no overlapping region is found. To prevent the overlapping area from disappearing due to excessive evolution, we add the term $S(\phi_i, \phi_j, \psi)$, regarded as a shape prior term, to our energy function, where $\psi$ is the elliptical shape prior obtained from the ellipse fitting of $\phi_i$, shown in Figure 6(c). As mentioned above, although the shape of cervical cells varies, the contours of the vast majority of cervical cells are close to elliptical. Therefore, we choose the ellipse fitting method for $\phi_i$ to obtain the elliptical shape prior.
$S(\phi_i, \phi_j, \psi)$ attains its minimum when the overlapping area of the LSF approaches the elliptical shape prior $\psi$. In this way, the cell boundary in the overlapping area becomes smooth and realistic, while nonoverlapping areas are not affected by this term.
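The text does not specify which ellipse fitting method produces the shape prior $\psi$. As one possible sketch, the code below fits an ellipse to the pixels inside a contour using second-order moments; this moment-based fit is an assumed stand-in, not necessarily the authors' method, and the function name `fit_ellipse_moments` is hypothetical.

```python
import math


def fit_ellipse_moments(pixels):
    """Fit an ellipse to a filled region given as (x, y) pixels.

    Returns (center, semi_major, semi_minor, angle), where angle is the
    orientation of the major axis in radians.
    """
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    # Central second-order moments (covariance of the pixel coordinates).
    sxx = sum((x - cx) ** 2 for x, _ in pixels) / n
    syy = sum((y - cy) ** 2 for _, y in pixels) / n
    sxy = sum((x - cx) * (y - cy) for x, y in pixels) / n
    # Eigenvalues of the 2 x 2 covariance matrix in closed form.
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    lam_major = half_trace + root
    lam_minor = max(half_trace - root, 0.0)
    # A uniformly filled ellipse with semi-axis a has variance a^2 / 4
    # along that axis, so a = 2 * sqrt(variance).
    semi_major = 2.0 * math.sqrt(lam_major)
    semi_minor = 2.0 * math.sqrt(lam_minor)
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (cx, cy), semi_major, semi_minor, angle
```

In a level set setting, the input pixels would be the points where the level set function indicates the cell interior; which sign convention marks the interior is left to the caller.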

Evaluation Metrics
To assess the segmentation results of nuclei and cytoplasm, the experiment applies the pixel-level and object-level evaluation metrics developed by Gençtav et al. [2]. If a segmented nucleus area $O_d$ and its corresponding ground truth $O_{GT}$ satisfy the matching condition of [2], the nucleus area $O_d$ is considered well segmented. The threshold th in this condition is usually set to 0.6 [2]. We obtain the pixel-level true positive ($TP_P$), object-level false negative ($FN_O$), and pixel-level false positive ($FP_P$) from the segmentation results, and then compute the object-level precision ($P_O$) and recall ($R_O$) using

$$P_O = \frac{TP_O}{TP_O + FP_O}, \qquad R_O = \frac{TP_O}{TP_O + FN_O},$$

where $TP_O$ and $FP_O$ denote the object-level true positives and false positives. In the same way, the pixel-level precision ($P_P$) and recall ($R_P$) are obtained with the subscript P instead of O.
Another important metric is the Dice coefficient (DC) [30], which represents the overlapping rate between the segmentation result and the ground truth. The segmentation is deemed good when DC is higher than a certain threshold. It is computed by

$$DC = \frac{2\,|O_d \cap O_{GT}|}{|O_d| + |O_{GT}|},$$

where $O_d$ is the segmentation result, $O_{GT}$ is the ground truth of the input image, and $|\cdot|$ denotes the number of pixels in the object. The experiment uses MATLAB 2017a for nuclei and cell clump segmentation as well as cell contour initialization and evolution.
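The pixel-level metrics above can be computed directly from pixel sets, as in the following minimal sketch. The representation of a region as a Python set of (row, column) tuples and the function names are assumptions made for illustration; the experiments in the paper were run in MATLAB.

```python
def dice_coefficient(seg, gt):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) of two pixel sets."""
    seg, gt = set(seg), set(gt)
    if not seg and not gt:
        return 1.0  # convention chosen here: two empty regions agree
    return 2.0 * len(seg & gt) / (len(seg) + len(gt))


def pixel_precision_recall(seg, gt):
    """Pixel-level precision TP/(TP+FP) and recall TP/(TP+FN)."""
    seg, gt = set(seg), set(gt)
    tp = len(seg & gt)   # pixels in both segmentation and ground truth
    fp = len(seg - gt)   # segmented pixels outside the ground truth
    fn = len(gt - seg)   # ground truth pixels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For example, a three-pixel segmentation that shares two pixels with a three-pixel ground truth has a Dice coefficient of 4/6 and a precision and recall of 2/3 each.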

Parameter Analysis.
Our method has three primary parameters, α, β, and c, in (17). These parameters help the algorithm run more effectively. This subsection conducts experiments to discuss the setting of these parameters. In practice, parameter setting is a complicated fine-tuning procedure. Here, we propose a strategy that fixes the parameter β. By considering the relationship between parameter α (or c) and parameter β, parameter selection can be simplified and made more effective. For the 90 testing images of the ISBI 2014 Challenge Dataset, we fixed the coefficient β = 3 in all runs, and Table 1 reports the performance of two different settings for the parameters α and c. From the table, the average fluctuations of DC, $TP_P$, and $FP_P$ are very limited. Therefore, the following suggestions are put forward for the parameter selection of our proposed method: (i) parameter β is fixed at 3; (ii) parameter α can be set to 2 to 3 times parameter β; and (iii) parameter c is set within the range [0.01, 1].

Shape Prior Analysis.
In the next experiment, the effect of the elliptical shape prior on cytoplasmic segmentation is discussed. Two configurations are compared: one with the elliptical shape prior and one without it. Figure 7 shows the segmentation results for four cervical images in the test set. It can be seen that, by incorporating the elliptical shape prior into the level set evolution, the cytoplasmic boundary extraction is governed by the representative shape feature of the cells, and the evolution of the cytoplasmic boundary is guided toward the ground truth. We also qualitatively evaluated the cytoplasm segmentation results of these images. As reported in Figure 8, the proposed framework with the elliptical shape prior gives better segmentation for overlapping areas. Especially for highly overlapping cervical cells, our method effectively depicts the cytoplasmic boundaries.

Performance Evaluation.
Some of the visual results of the proposed methodology are illustrated in Figure 9. The nuclei and cytoplasm are properly labeled by the GGCMM despite the poor contrast between the cell clump and the background and the presence of noise. The individual cell boundaries in the overlapping area are correctly detected by the proposed level set method, which involves the distance map and shape prior. Specifically, the nuclei and cytoplasm segmentation results of our method are very close to the ground truth (Figure 9), with only a few differences. These differences arise mainly because the overlapping rate of the cervical cells is too high and the contrast of the images is poor, so it is difficult even for the naked eye to identify the boundary of each cell accurately. Furthermore, we investigate the performance of our method by comparing it with several state-of-the-art segmentation algorithms, namely those of Lu et al. [4], Ushizima et al. [9], Nosrati and Hamarneh [31], and Tareef et al. [13]. A visual comparison of these results is reported in Figure 10. This is because an inaccurate random decision forest probability map was utilized. The method of Lu et al. suffers from the same problems as that of Nosrati and Hamarneh, and these problems can be solved by the elliptical shape prior and distance map in our method. It is worth mentioning that our segmentation process has demonstrated competitive performance. This is due to the asymmetric GGCMM applied in the current study, which helps to classify the nuclei and cytoplasm more precisely. In addition, the distance map and shape prior produce stable and smooth segmentation results. The following experiment conducts quantitative evaluations of our approach for nuclei and cytoplasm segmentation performance.

Evaluation of Nuclei Segmentation.
The segmented nucleus is judged to be a true positive if this nucleus and its corresponding ground truth satisfy the matching condition described in the Evaluation Metrics section. Figure 9 presents the performance of our method on nuclei segmentation. The quantitative evaluation of nuclei segmentation is shown in Table 2, where μ and σ stand for the mean and standard deviation, respectively. As displayed in Table 2, the GGCMM performs well: the recall of our method is 1.00, which is 0.06 higher than the 0.94 of Tareef et al., indicating that the experiment detects all true-positive nuclei without missing any.

Evaluation of Cytoplasm Segmentation.
We also quantitatively analyze the cytoplasm segmentation results on the ISBI 2014 Challenge Dataset. With our method, as shown in Table 3, the pixel-based true-positive rate $TP_P$ is 0.94 (0.09), much higher than that of any other method. This means that a better performance for cytoplasm segmentation can be obtained. The DC of 0.90 (0.07) is the highest among all the above models and is 3% higher than that of Ushizima et al. Generally, as the pixel-based true-positive rate $TP_P$ increases, the false-positive rate $FP_P$ also increases, and vice versa. As shown in Table 3, the approach of Ushizima et al. had the lowest $FP_P$, but at the cost of also having the lowest $TP_P$. Our method had the highest $TP_P$ among all methods, whereas our $FP_P$ still remains at a low level. Similarly, $TP_P$ and DC are negatively correlated with the false-negative rate $FN_O$. The DC values of the methods of Lu et al. and Tareef et al. are similar to ours, but their $FN_O$ values are about three times higher than ours. Our segmentation method has better results for both cytoplasm and nuclei. This is because the shape constraint term in the energy function prevents the level set contour from overevolving, which keeps the boundary smooth and stable in overlapping areas. In addition, the distance map serves as another attractor that makes the evolution of overlapping regions approach the real shapes of the cells.

Evaluation of Segmentation for Different Cell Structures.
In the last experiment, the proposed method is analyzed by evaluating all the performance measures with varying overlap ratios and numbers of cells (Table 4). The pixel-based false-positive rate $FP_P$ remained very low throughout the experiment, with a highest value of 0.01, and its value is positively related to the overlapping rate. According to the results in Table 4, the

Conclusions
In this paper, a fusion algorithm involving the asymmetric GGCMM and a shape constraint level set method was proposed to extract overlapping cervical cells in Pap smear images. Applying the asymmetric GGCMM helps to extract the cytoplasm clumps and nuclei more precisely than previous methods. More specifically, we proposed a novel asymmetric probability distribution by combining the generalized Gaussian distribution and the Cauchy distribution so that each component of the proposed asymmetric GGCMM can model observed data of different shapes. Furthermore, the shape constraint level set energy function, which considers the distance map of the initial single cell boundary and the elliptical shape prior, makes it possible for the proposed approach to obtain a smooth cell contour in Pap images. Here, both the distance map and the shape constraint prior can prevent the excessive evolution of the level set. The evaluation of the extracted contours showed that the proposed fusion scheme is effective and reliable in overlapping cervical cell segmentation tasks.
The main conclusions of the study may be summarized as follows: (i) The nucleus is the best biomarker and plays an important role in disease diagnosis. Each cell contains a nucleus, and segmenting nuclei is generally easier than segmenting individual cytoplasm. The nuclei detection and segmentation method GGCMM proposed in this work is a general segmentation algorithm, which could be applied to various natural images. (ii) In addition to cervical smear images, the proposed approach is also applicable to other kinds of cell images. (iii) One question raised by this work is how to select proper weight factors α, β, and c in (17). (iv) Another limitation is that the Cauchy distribution is applied in the proposed cell clump extraction, which increases the complexity of the algorithm. One possible solution to this problem is to apply deep learning techniques. However, deep learning methods require a large number of cervical cell images and are time-consuming. In our future work, we will explore these possibilities. In addition, we train the method on the training set provided by the ISBI 2014 Challenge Dataset. This dataset does not contain images of serious cervical tumors. In reality, however, as the disease grows in the cell, the nucleus comes to occupy almost the whole cytoplasmic area, and sometimes the entire cytoplasm is not visible in the image. In the future, we will evaluate the effectiveness of our method on cervical cells with malignant tumors.

Appendix A
This appendix presents the derivation of parameter estimation using the EM algorithm. According to the log-likelihood function (10), to obtain the estimate of the parameter $\mu_{k2}$, we calculate the partial derivative of the objective function $E(\Theta)$ with respect to it and set $\partial E(\Theta)/\partial \mu_{k2} = 0$, which yields the update for $\mu_{k2}$. In a similar way, $c_{k2}$ is obtained by solving $\partial E(\Theta)/\partial c_{k2} = 0$.

Conflicts of Interest
The authors declare that they have no conflicts of interest.