A Spatial Shape Constrained Clustering Method for Mammographic Mass Segmentation

A novel clustering method is proposed for mammographic mass segmentation on extracted regions of interest (ROIs) by using deterministic annealing incorporating a circular shape function (DACF). The objective function reported in this study uses both intensity and spatial shape information, and the dominant dissimilarity measure is controlled by two weighting parameters. As a result, pixels having similar intensity information but located in different regions can be differentiated. Experimental results show that, by using DACF, the mass segmentation results in digitized mammograms are improved, with more accurate mass boundaries, fewer noisy patches, and higher computational efficiency. An average probability of segmentation error of 7.18% for well-defined masses (or 8.06% for ill-defined masses) was obtained by using DACF on the MiniMIAS database, an improvement of 5.86 (or 5.55) and 6.14 (or 5.27) percentage points as compared to the standard DA and fuzzy c-means methods, respectively.


Introduction
Image segmentation is a process which divides an image into several meaningful areas such that the segmented image can be further analyzed and interpreted. A segmentation algorithm, in a mammographic context, is an algorithm used to detect something, usually the whole breast or a specific kind of abnormality such as microcalcifications or masses. In digitized mammograms with low contrast, masses are embedded in various breast tissues with fuzzy margins. This variability poses a challenge for breast mass segmentation and causes the false positive detection rate to increase and the sensitivity to decrease.
In the past decades, a number of image processing techniques have been developed to segment masses from their surrounding breast tissues in digitized mammograms, as reviewed in [1-4]. Among them, clustering methods are among the most commonly used techniques for image segmentation [5] as well as for mass detection and/or segmentation [4]. Partitioning clustering and hierarchical clustering are the two main approaches to clustering. The k-means [6] and fuzzy c-means (FCM) [7] algorithms are partitioning techniques widely used in many real-world applications. For mass segmentation, k-means has been used in [8,9] to generate initial segmentation results and in [10,11] to refine an initial detection from adaptive thresholding. FCM has also been used for mass segmentation with different objectives: while [12] used it to group pixels with similar grey-level values in the original images, [13] applied it to a set of local features extracted by a multiresolution wavelet transform and Gaussian Markov random field analysis. In contrast to k-means and FCM, which are sensitive to data initialization and converge to locally optimal solutions, deterministic annealing (DA) clustering [14] is a global minimization algorithm that incorporates randomness into the energy function to be minimized, so that it is independent of the choice of the initial data configuration and is able to avoid poor local optima. The DA approach has also been used for mass segmentation in [15,16].

Computational and Mathematical Methods in Medicine
Most clustering algorithms (including k-means, FCM, and DA) perform image segmentation directly in the intensity (or color) space, with an intensity filter to enlarge the difference between normal and abnormal breast tissue. The short processing time is a prominent advantage of these algorithms. However, intensity-based methods cannot satisfactorily outline the boundary of the mass region when the image contrast and signal-to-noise ratio are low, and therefore lead to poor segmentation results. A Markov random field technique was used for mass segmentation in [17] to exploit spatial continuity in order to improve segmentation performance. It is able to reduce the segmentation error caused by intensity noise; however, its computational cost is high. Reference [18] proposed a fuzzy clustering algorithm incorporating an elliptic shape function for lip image segmentation. Its drawback is that the convergence time increases as the weighting parameter that controls the spatial shape information increases.
In this paper, we propose a novel clustering algorithm based on the DA approach to overcome the problems of most existing clustering techniques. In standard DA clustering [14,19] for image segmentation, the dissimilarity measure in the objective function is defined merely by the Euclidean distances between the image intensity and the intensity centroids, without knowledge of the spatial shape information. Using only intensity or intensity-related information, it is hard to differentiate pixels that have the same intensity but are located in unconnected regions. As a result, the cluster that contains a mass may include a large number of subregions, which may lead to a heavy computational load. Additionally, it is hard to find the fuzzy boundary when the image contrast is low. To handle these challenges, a new dissimilarity measure for DA clustering incorporating a circular shape function (DACF) is proposed. Since both intensity and spatial information are used in the optimization process, the DACF algorithm offers two advantages. First, it is robust against noise and the choice of cluster number; that is, pixels having similar intensity but located in different regions can be differentiated with just two clusters for the entire image. Second, it is computationally efficient: the convergence time decreases or stays approximately constant as the difference between the two weighting parameters increases. Experimental results have demonstrated the advantages of the DACF algorithm.
The main contributions of our current work include the following: (1) geometric shape is integrated into the intensity feature space for mass segmentation by means of a dynamically fitted circular shape function; (2) the proposed method can differentiate pixels with the same intensity values but located in different (mass and nonmass) regions, which cannot be achieved by standard clustering methods like FCM and DA; (3) the proposed method achieves better segmentation performance than FCM and DA in terms of segmentation accuracy and computational time; (4) the proposed method is general: it can be integrated into other segmentation algorithms and applied to other biomedical applications.
The rest of this paper is organized as follows. Section 2 briefly reviews the standard DA clustering approach and derives the formulation and implementation of the proposed DACF algorithm. The experimental results and related discussions on real mass images are given qualitatively and quantitatively in Section 3. The conclusion is given in Section 4.

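A Brief Review of Standard Deterministic Annealing Approach.
Suppose there are $N$ input vectors $\mathbf{x}_i \in \mathbb{R}^d$, $i = 1, \ldots, N$, which are partitioned into $K$ clusters with mass centers $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_K\} \subset \mathbb{R}^d$. The DA clustering algorithm [14,19] aims to minimize the following Lagrangian formulation:

$$F = D - TH, \qquad D = \sum_{i=1}^{N} \sum_{k=1}^{K} p(\mathbf{x}_i)\, p(\mathbf{v}_k \mid \mathbf{x}_i)\, d(\mathbf{x}_i, \mathbf{v}_k), \tag{1}$$

where $T$ is the Lagrange multiplier, which is analogous to the temperature in statistical mechanics, $D$ is the cost function, $H$ is the Shannon entropy, $p(\mathbf{x}_i)$ is the source distribution (equal to $1/N$ in [19]), $p(\mathbf{v}_k \mid \mathbf{x}_i)$ is the association probability (distribution) relating input point $\mathbf{x}_i$ with cluster center $\mathbf{v}_k$, and $d(\mathbf{x}_i, \mathbf{v}_k)$ is the squared Euclidean distance between $\mathbf{x}_i$ and $\mathbf{v}_k$ defined by

$$d(\mathbf{x}_i, \mathbf{v}_k) = \|\mathbf{x}_i - \mathbf{v}_k\|^2. \tag{2}$$

It turns out [19] that the resultant distribution is the tilted distribution given by

$$p(\mathbf{v}_k \mid \mathbf{x}_i) = \frac{p(\mathbf{v}_k) \exp\left(-d(\mathbf{x}_i, \mathbf{v}_k)/T\right)}{\sum_{j=1}^{K} p(\mathbf{v}_j) \exp\left(-d(\mathbf{x}_i, \mathbf{v}_j)/T\right)}, \tag{3}$$

where $p(\mathbf{v}_k)$ is the mass probability of the $k$th cluster. Plugging (3) back into (1), the effective cost to be minimized becomes the free energy (a well-known concept in statistical mechanics [14]) as follows:

$$F^{*} = -T \sum_{i=1}^{N} p(\mathbf{x}_i) \log \sum_{k=1}^{K} p(\mathbf{v}_k) \exp\left(-d(\mathbf{x}_i, \mathbf{v}_k)/T\right). \tag{4}$$

The expression of the cluster center is then derived by minimizing (4) with respect to $\mathbf{v}_k$; that is,

$$\mathbf{v}_k = \frac{\sum_{i=1}^{N} p(\mathbf{x}_i)\, p(\mathbf{v}_k \mid \mathbf{x}_i)\, \mathbf{x}_i}{\sum_{i=1}^{N} p(\mathbf{x}_i)\, p(\mathbf{v}_k \mid \mathbf{x}_i)}. \tag{5}$$

Alternately updating (3) and (5), with phase transitions occurring as $T$ is lowered, gives the DA algorithm. The DA approach to clustering has been demonstrated to be independent of the data initialization and able to avoid poor local optima, as discussed in [14,19].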
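The alternating updates of the association probabilities and the cluster centers can be illustrated with a minimal sketch for one-dimensional intensity data. This is not the authors' implementation: the function name `da_cluster_1d`, the starting temperature (four times the data variance), the stopping temperature, the number of inner iterations, and the small deterministic jitter that lets coincident centers separate at a phase transition are all assumptions made for illustration. The cooling factor of 0.9 mirrors the annealing factor quoted in the experiments.

```python
import math

def da_cluster_1d(xs, n_clusters=2, cooling=0.9, t_min=1e-3, n_inner=50, jitter=1e-4):
    """Mass-constrained deterministic annealing on scalar data (illustrative sketch)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    t = 4.0 * var + 1e-12              # assumed start, above the first critical temperature
    centers = [mean] * n_clusters      # all centers start at the data mean
    masses = [1.0 / n_clusters] * n_clusters
    while t > t_min:
        # Assumed device: nudge the centers apart so they can split below a critical T.
        centers = [v + jitter * (k - (n_clusters - 1) / 2.0) for k, v in enumerate(centers)]
        for _ in range(n_inner):
            post = []
            for x in xs:
                d = [(x - v) ** 2 for v in centers]   # squared Euclidean distance
                d_min = min(d)                        # shift exponents for numerical stability
                w = [max(m, 1e-12) * math.exp(-(dk - d_min) / t) for m, dk in zip(masses, d)]
                z = sum(w)
                post.append([wk / z for wk in w])     # tilted distribution p(v_k | x_i)
            for k in range(n_clusters):
                s = sum(p[k] for p in post)
                masses[k] = s / n                     # mass probability of cluster k
                if s > 0.0:
                    centers[k] = sum(p[k] * x for p, x in zip(post, xs)) / s
        t *= cooling                                  # lower the temperature
    return centers, masses
```

For example, calling `da_cluster_1d([0.10, 0.12, 0.09, 0.11, 0.90, 0.88, 0.92, 0.91])` starts with both centers at the data mean and, as the temperature falls, lets them separate toward the dark and bright intensity groups without any dependence on a random initialization.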

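Deterministic Annealing Clustering Incorporating Circular Function (DACF). Consider an image with pixels $\mathbf{x}_{m,n}$, where $(m, n)$ denotes the pixel location. The dissimilarity measure between the $(m, n)$th pixel and the $k$th cluster is defined as

$$\hat{d}_{k,m,n} = d_{k,m,n} + \lambda_k f(k, m, n, \mathbf{s}), \tag{6}$$

where $\lambda_k$ are predefined regulating parameters and $d_{k,m,n}$ stands for the squared Euclidean distance between the $(m, n)$th pixel $\mathbf{x}_{m,n}$ and the centroid $\mathbf{v}_k$ of the $k$th cluster, as

$$d_{k,m,n} = \|\mathbf{x}_{m,n} - \mathbf{v}_k\|^2, \tag{7}$$

and $f(k, m, n, \mathbf{s})$ represents the shape information, given by the circular function (8), where $\mathbf{s} = \{m_c, n_c\}$ is a unique clique, and $m_c$, $n_c$ are the physical coordinates of the center of a mass region. The dissimilarity measure $\hat{d}_{k,m,n}$ consists of a measure of the intensity dissimilarity between the $(m, n)$th pixel and the centroid $\mathbf{v}_k$ in the intensity feature space, and the spatial distance between the pixel (located at $(m, n)$) and the center (denoted by $(m_c, n_c)$) of the targeted mass region. With the inclusion of circular shape information, pixels with similar intensity but located in disjoint regions will be differentiated. The purpose of including the shape function is to obtain a large membership for the cluster associated with the mass region. To achieve this, the weighting parameter $\lambda_k$ is defined as the weight of the spatial distance against the intensity feature. According to the dissimilarity definition of the Euclidean distance, the closer a pixel is to a cluster, the smaller the distance is. Therefore, the shape distance between the location of a pixel and a specific cluster center is small if the pixel belongs to the cluster, and larger if it belongs to another cluster.
The expected distortion or objective function of the DACF incorporating spatial information is then defined as

$$\hat{D} = D + D_s, \tag{9}$$

where $D$ is the distortion measure as in the original DA method, defined by

$$D = \sum_{m,n} \sum_{k=1}^{K} p(\mathbf{x}_{m,n})\, p(\mathbf{v}_k \mid \mathbf{x}_{m,n}, \mathbf{s})\, d_{k,m,n}, \tag{10}$$

and $D_s$ is the distortion measure of spatial information,

$$D_s = \sum_{m,n} \sum_{k=1}^{K} p(\mathbf{x}_{m,n})\, p(\mathbf{v}_k \mid \mathbf{x}_{m,n}, \mathbf{s})\, \lambda_k f(k, m, n, \mathbf{s}). \tag{11}$$

We recast the optimization problem as seeking the distribution which minimizes $\hat{D}$ subject to a specified level of randomness that is measured by the Shannon entropy

$$H = -\sum_{m,n} \sum_{k=1}^{K} p(\mathbf{x}_{m,n})\, p(\mathbf{v}_k \mid \mathbf{x}_{m,n}, \mathbf{s}) \log p(\mathbf{v}_k \mid \mathbf{x}_{m,n}, \mathbf{s}). \tag{12}$$

The optimization is reformulated as minimization of the Lagrangian

$$\hat{F} = \hat{D} - TH. \tag{13}$$

Minimizing $\hat{F}$ with respect to the association probability $p(\mathbf{v}_k \mid \mathbf{x}_{m,n}, \mathbf{s})$ leads to the tilted distribution [14,19]

$$p(\mathbf{v}_k \mid \mathbf{x}_{m,n}, \mathbf{s}) = \frac{\exp\left(-\hat{d}_{k,m,n}/T\right)}{Z_{m,n}}, \tag{14}$$

where the normalizing factor is given by

$$Z_{m,n} = \sum_{j=1}^{K} \exp\left(-\hat{d}_{j,m,n}/T\right). \tag{15}$$

Taking the partial derivative of $\hat{F}$ with respect to the cluster center, we have

$$\frac{\partial \hat{F}}{\partial \mathbf{v}_k} = \sum_{m,n} p(\mathbf{x}_{m,n})\, p(\mathbf{v}_k \mid \mathbf{x}_{m,n}, \mathbf{s})\, \frac{\partial d_{k,m,n}}{\partial \mathbf{v}_k} = 0. \tag{16}$$

It can be seen that the partial derivative of the objective function with the new dissimilarity measure with respect to $\mathbf{v}_k$ is identical to that of DA, since the shape term does not depend on $\mathbf{v}_k$. Hence, the formula for computing the centroids of DACF in the intensity feature space is the same as in DA; that is,

$$\mathbf{v}_k = \frac{\sum_{m,n} p(\mathbf{x}_{m,n})\, p(\mathbf{v}_k \mid \mathbf{x}_{m,n}, \mathbf{s})\, \mathbf{x}_{m,n}}{\sum_{m,n} p(\mathbf{x}_{m,n})\, p(\mathbf{v}_k \mid \mathbf{x}_{m,n}, \mathbf{s})}. \tag{17}$$

The partial derivative of $\hat{F}$ with respect to $\mathbf{s}$ is given by

$$\frac{\partial \hat{F}}{\partial \mathbf{s}} = \sum_{m,n} \sum_{k=1}^{K} p(\mathbf{x}_{m,n})\, p(\mathbf{v}_k \mid \mathbf{x}_{m,n}, \mathbf{s})\, \frac{\partial \hat{d}_{k,m,n}}{\partial \mathbf{s}} = 0; \tag{18}$$

that is,

$$\sum_{m,n} \sum_{k=1}^{K} p(\mathbf{x}_{m,n})\, p(\mathbf{v}_k \mid \mathbf{x}_{m,n}, \mathbf{s})\, \lambda_k \frac{\partial f(k, m, n, \mathbf{s})}{\partial \mathbf{s}} = 0. \tag{19}$$

Substituting (8) into (19) yields the update (20) for the spatial parameters $m_c$ and $n_c$. Alternately updating $p(\mathbf{v}_k \mid \mathbf{x}_{m,n}, \mathbf{s})$ and $\mathbf{v}_k$ according to (14) and (17), as well as $m_c$ and $n_c$ according to (20), gives the proposed DACF algorithm.
The tilted distribution (14) is the membership of each pixel to the different clusters. Generally, the intensities in the central part of a mass region are higher than those outside the mass region. For pixels inside a mass region, the intensity dissimilarity is dominant, while spatial information plays the major role in the dissimilarity measure for pixels outside the mass region. Therefore, pixels with the same intensity values but located at different positions in an image will be differentiated, which makes DACF yield better performance for both mass and nonmass regions.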
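The way a shape term separates equally bright pixels can be illustrated with a minimal sketch of a two-cluster Gibbs membership computed from an intensity-plus-shape dissimilarity. The shape terms below are purely hypothetical stand-ins, not the circular function of the paper: the mass cluster is penalized by the normalized squared distance `u` to the assumed mass center, and the background cluster by `exp(-u)`. The function name `dacf_membership`, the `radius` scale, and the temperature value are likewise assumptions chosen only for the demonstration.

```python
import math

def dacf_membership(intensity, position, centroids, center, weights, radius, temperature):
    """Return [p_mass, p_background] for one pixel (illustrative sketch).

    centroids and weights are ordered (mass cluster, background cluster).
    """
    dm = position[0] - center[0]
    dn = position[1] - center[1]
    u = (dm * dm + dn * dn) / (radius * radius)   # normalized squared distance to center
    # Hypothetical shape terms: grows with distance for the mass cluster,
    # decays with distance for the background cluster.
    shape = (u, math.exp(-u))
    d_hat = [(intensity - v) ** 2 + w * f for v, w, f in zip(centroids, weights, shape)]
    d_min = min(d_hat)                            # shift exponents for numerical stability
    g = [math.exp(-(d - d_min) / temperature) for d in d_hat]
    z = sum(g)
    return [gk / z for gk in g]

# Two pixels with the same bright intensity, one near and one far from the center.
near = dacf_membership(0.8, (52, 50), (0.8, 0.3), (50, 50), (0.25, 10.0), 20.0, 0.1)
far = dacf_membership(0.8, (90, 50), (0.8, 0.3), (50, 50), (0.25, 10.0), 20.0, 0.1)
```

With these assumed settings, the pixel near the center is assigned almost entirely to the mass cluster, while the equally bright pixel far from the center is assigned almost entirely to the background, which an intensity-only dissimilarity could not do.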

Experimental Results
Thirty-six mammograms from the MiniMIAS database [20], containing thirty-nine masses with various backgrounds (fatty, fatty-glandular, and dense-glandular breast tissues), were examined. The mammograms mdb005, mdb132, and mdb144 each contain two mass regions. The two masses in mammogram mdb005 overlap heavily, so they were processed together as a single region. The two mass regions in each of mdb132 and mdb144 were processed independently. Therefore, thirty-eight regions of interest (ROIs) were analyzed. Instead of being extracted automatically, the ROIs in this study were taken from the mammographic images based on the information provided by the database. The size of each extracted ROI, as well as the center and radius of each mass, is listed in the appendix at the end of this paper.
According to the "class of abnormality" information provided by the database, the thirty-eight ROIs were classified into two categories: well-defined masses (twenty-three cases) and ill-defined masses (fifteen cases). The ROIs containing well-defined masses are illustrated in Figure 1, and those containing ill-defined masses are illustrated in Figure 2. The annealing factor alpha = 0.9 was used for the standard DA and proposed DACF algorithms.

Segmentation Results.
Three clustering methods, DACF, DA, and FCM, were tested on the twenty-three ROIs with well-defined masses and the fifteen ROIs with ill-defined masses to show their performance on mammographic mass segmentation. The cluster that contains the mass region is chosen based on the assumption that the suspicious mass area is brighter than its surrounding breast tissues, which is valid in most real applications [17]. For illustration purposes, the clustering results are transformed into binary images, where pixels with gray value 128 belong to the suspicious cluster, and pixels with gray value 0 belong to the nonmass cluster.
In the experiment, the cluster number was set to two for DACF and to between two and six for DA and FCM (in order to get reasonable results). Using a fixed cluster number (two in our experiment) is one of the advantages of DACF. The weighting parameters $\lambda_1 = 1/4$ (for the mass region) and $\lambda_2 = 10$ (for the background) were applied to the testing dataset; the details of the value selection can be found in the next subsection. Figures 3 and 4 show the segmentation results by the DACF algorithm for the ROIs in Figures 1 and 2, respectively, where pixels with high intensity (gray value 128) belong to the suspicious cluster. The mass region in each ROI is identified as the region with the maximum number of pixels in the suspicious cluster. Figures 5 and 6 show the segmentation results by DA, and Figures 7 and 8 show the segmentation results by FCM. From the figures, it can be seen that, owing to the incorporation of circular shape information, pixels that belong to the same intensity feature cluster but are located at different positions can be differentiated to a certain degree by DACF. In contrast, standard DA and FCM failed to differentiate them in most cases. Additionally, fewer patchy regions were found in the mass cluster by DACF (Figures 3 and 4) than by standard DA (Figures 5 and 6) and FCM (Figures 7 and 8).
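The rule of taking the mass region as the region with the maximum number of pixels in the suspicious cluster can be sketched as a largest-connected-component search on the binary clustering result. The function name `largest_region` and the use of 4-connectivity are assumptions for illustration; the paper does not state which connectivity was used.

```python
from collections import deque

def largest_region(mask):
    """Keep only the largest 4-connected foreground region of a binary mask.

    mask is a list of rows; nonzero entries mark the suspicious cluster.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            region = []
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                region.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and mask[rr][cc] and not seen[rr][cc]:
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            if len(region) > len(best):
                best = region
    out = [[0] * cols for _ in range(rows)]
    for r, c in best:
        out[r][c] = 1
    return out
```

Applied to a binary image such as those in Figures 3 to 8, a routine of this kind would discard the small patches and keep the single largest suspicious region as the mass candidate.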
In order to evaluate the segmentation performance, a quantitative technique was applied to the three clustering algorithms on mammographic masses. The methods used to evaluate the quality of image segmentation algorithms can be broadly classified into two groups: supervised and unsupervised approaches. Unsupervised evaluation does not depend on a true segmentation [21], while in supervised evaluation, the difference between a reference segmentation and the output of a segmentation algorithm is computed. Unsupervised evaluation is stand-alone and objective and does not require any user intervention. Nevertheless, we use supervised evaluation in our work for two reasons: (1) unsupervised methods may not perform well when comparing segmentations produced by different algorithms or when comparing human and machine segmentations [22]; (2) in the field of biomedical image analysis, it is common to use supervised rather than unsupervised methods for the evaluation of image segmentation. This study therefore chose a supervised evaluation method, and the mass boundaries were given by the physician. The probability of segmentation error (PSE) is formulated by [23]

$$\mathrm{PSE} = P(O)\, P(B \mid O) + P(B)\, P(O \mid B),$$

where $P(O)$ and $P(B)$ are the a priori probabilities of the object (mass region) and the background (nonmass region), respectively, $P(B \mid O)$ is the probability of classifying objects as background, and $P(O \mid B)$ is the probability of classifying background as object. Suppose the pixel number of the mass region in the reference segmentation image is $N_{\text{trueobj}}$ and the pixel number of the mass region in the image segmented by DACF, DA, or FCM is $N_{\text{calobj}}$; the probabilities are then computed from these pixel counts, where $R_{\text{trueobj}}$ is the mass region in the reference image, $R_{\text{calobj}}$ is the mass region calculated by DACF, DA, or FCM, and $R_{\text{trueobj}} \cap R_{\text{calobj}}$ is therefore the overlapped mass region between the reference image and the calculated image, whose number of pixels enters the definitions. The computed PSE values of DACF, DA, and FCM for the ROIs in Figures 1 and 2 are shown in Tables 1 and 2, respectively.
We can see that DACF performs better than the DA and FCM algorithms for almost all the ROIs, especially for cases with comparatively low contrast, such as mdb091 and mdb141 for well-defined masses and mdb030 and mdb063 for ill-defined masses. Numerically, for the well-defined cases, an average PSE of 7.18% was obtained by using DACF, as compared to 13.04% and 13.32% by using the standard DA and FCM methods, respectively, while for the ill-defined cases, DACF achieved an average PSE of 8.06%, an improvement of 5.55 and 5.27 percentage points over the standard DA and FCM methods, respectively.
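A minimal sketch of the PSE computation from two binary masks is given here. The count-based definitions used in the code are an assumption, chosen to match the verbal definitions above: the prior of the object is the fraction of reference mass pixels in the ROI, the object-as-background probability is the fraction of reference mass pixels missed by the segmentation, and the background-as-object probability is the fraction of reference background pixels labeled as mass. The function name `pse` is hypothetical.

```python
def pse(reference, segmented):
    """Probability of segmentation error from two binary masks of equal size.

    Nonzero entries mark the mass region; masks are lists of rows.
    """
    total = n_true = n_cal = n_overlap = 0
    for ref_row, seg_row in zip(reference, segmented):
        for ref, seg in zip(ref_row, seg_row):
            total += 1
            n_true += 1 if ref else 0
            n_cal += 1 if seg else 0
            n_overlap += 1 if (ref and seg) else 0
    n_background = total - n_true
    p_o = n_true / total                      # prior of object (mass)
    p_b = n_background / total                # prior of background (nonmass)
    # Object classified as background: reference mass pixels that were missed.
    p_b_given_o = (n_true - n_overlap) / n_true if n_true else 0.0
    # Background classified as object: segmented pixels outside the reference mass.
    p_o_given_b = (n_cal - n_overlap) / n_background if n_background else 0.0
    return p_o * p_b_given_o + p_b * p_o_given_b
```

Under these assumed definitions the PSE reduces to the fraction of misclassified pixels in the ROI, so a perfect segmentation gives 0 and a fully inverted one gives 1.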

Weighting Parameter Analysis.
The weighting parameter $\lambda_k$ in (6) controls the relative influence of the geometrical distance and the intensity feature in the dissimilarity measure. It is desirable that the membership of a pixel to the mass cluster is close to one if it is located near the center of a mass. For a pixel far away from the center of the mass, its membership to that cluster should be close to zero. Suppose that the two clusters in the DACF algorithm are cluster 1 and cluster 2, where cluster 1 represents the mass region and cluster 2 represents the nonmass region. For an intensity-based algorithm, like standard DA or FCM, the membership of a pixel in the nonmass region to cluster 1 may be approximately one if it has the same intensity value as the pixels in the mass region. In this situation, it is hoped that the spatial information will be dominant in the objective function of DACF; that is, the weighting parameter $\lambda_2$ of cluster 2 should be large enough that the influence of the intensity-based feature is reduced significantly. The weighting parameter $\lambda_1$ of cluster 1, in contrast, should be small enough that the intensity-based feature is dominant in the objective function.
However, it is difficult to analyze the two weighting parameters separately, since different values of $\lambda_1$ and $\lambda_2$ lead to different dissimilarity measures. According to the experimental results, the influence of the weighting parameters on the segmentation results depends on the image content of each ROI. Experimental results show that better segmentation results are obtained when the value of $\lambda_1$ is smaller than one; the values $\lambda_1 = 1/4$ and $\lambda_2 = 10$ were used for the testing dataset.

Convergence Time Analysis.
The main parameter that affects the convergence time in DACF is the weighting parameter, since the number of clusters was fixed (two clusters were used). The convergence time here refers to the total CPU time for clustering. A Duo Core 1.59 GHz laptop with MATLAB 8 was used to run the three clustering algorithms. For DA and FCM, the corresponding segmentation results were obtained with the number of clusters ranging from two to six, in order to get reasonable results. Table 3 shows the convergence time of DACF, DA, and FCM for the segmentation results shown in Figures 3, 5, and 7, respectively, while Table 4 shows the convergence time of DACF, DA, and FCM for the segmentation results shown in Figures 4, 6, and 8, respectively. It can be seen that DACF has a shorter convergence time than DA and FCM, while FCM runs faster than DA.
The convergence speed of DACF is not affected significantly by changing the values of the weighting parameters. In general, the convergence time decreases or stays approximately constant as the difference between the two weighting parameters increases. The experiments show that a relatively larger difference between the two weighting parameters makes DACF capable of handling images with more complicated content. In contrast, to handle this situation, a larger number of clusters has to be used for both DA and FCM to obtain reasonable segmentation results.

Conclusion
The proposed DACF algorithm offers two advantages for mammographic mass segmentation on extracted ROIs. First, the segmentation ability is improved. The average PSE of DACF is much smaller than those of standard DA and FCM. Additionally, fewer patchy regions were found in the mass cluster by using DACF. Second, the convergence time is reduced. In DACF, the number of clusters is two, and the segmentation results were obtained by regulating the weighting parameters, with much shorter convergence time for all thirty-eight cases as compared to DA and FCM. To summarize, DACF is robust against noisy regions and computationally efficient with a fixed number of clusters. Unlike classical clustering methods for image segmentation, the objective function of DACF contains both intensity-based information and a geometry-based circular shape function as a means to improve the image data partitions. Experimental results show that the proposed DACF improved the segmentation performance for mammographic images.
One of the major limitations of the proposed method is that the current formulation can only deal with two clusters. We will investigate the possibility of incorporating multiple circular shapes to handle more general cases in the near future. The current study also determines the weighting parameters through experiments; though this works for all the tested cases in this paper, a numerical solution is desirable to make the proposed method more automatic. There is also a need to verify the efficiency of the proposed method by performing evaluations on other mammographic image sets, such as the DDSM database [24]. These will be the subjects of our future research on mammographic mass segmentation. Moreover, it is worth mentioning that the proposed approach is general and may be applicable to other biomedical applications, such as left ventricle segmentation from cardiac magnetic resonance images. We will also investigate this topic in our future research work.
The authors thank Su Yi from IHPC, A*STAR, Singapore, for his valuable help with the modifications of this paper.
[Appendix table: size of each extracted ROI, with the center and radius of each mass. Only a fragment survives: 200, (516, 577), 62.]
* In the case of mdb144-a, the ROI center is not identical to the mass center since the latter is too close to the image border.