Segmentation of Brain Tissues from Magnetic Resonance Images Using Adaptively Regularized Kernel-Based Fuzzy C-Means Clustering

An adaptively regularized kernel-based fuzzy C-means clustering framework is proposed for segmentation of brain magnetic resonance images. The framework can be in the form of three algorithms for the local average grayscale being replaced by the grayscale of the average filter, median filter, and devised weighted images, respectively. The algorithms employ the heterogeneity of grayscales in the neighborhood and exploit this measure for local contextual information and replace the standard Euclidean distance with Gaussian radial basis kernel functions. The main advantages are adaptiveness to local context, enhanced robustness to preserve image details, independence of clustering parameters, and decreased computational costs. The algorithms have been validated against both synthetic and clinical magnetic resonance images with different types and levels of noises and compared with 6 recent soft clustering algorithms. Experimental results show that the proposed algorithms are superior in preserving image details and segmentation accuracy while maintaining a low computational complexity.


Introduction
Image segmentation is to partition an image into meaningful nonoverlapping regions with similar features. Segmentation of brain magnetic resonance (MR) images is necessary to differentiate white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). Such segmentation is essential for studying anatomical structure changes and brain quantification [1]. It is also a prerequisite for tumor growth modeling as tumors diffuse at different rates according to the surrounding tissues [2]. Due to potential existence of noise, bias field, and partial volume effect, segmentation of brain images remains challenging.
Image segmentation techniques can be roughly categorized into [3] thresholding, region growing, clustering, edge detection, and model-based methods. Clustering is an unsupervised learning strategy that groups similar patterns into clusters and can be hard or soft. Soft clustering is preferred as every pixel can be assigned to all clusters with different membership values [4,5]. The most popular soft clustering methods applied to MR images are [4] fuzzy -means (FCM) clustering [6,7], mixture modeling, and hybrid methods of both.
Although the FCM algorithm comes with good accuracy in the absence of noise, it is sensitive to noise and other imaging artifacts. Therefore, enhancements have been tried to improve its performance by including local spatial and grayscale information [8][9][10][11][12][13][14] which will be briefly elaborated in Section 2.
A mixture model composed of a finite number of Gaussians has been employed for brain MR image segmentation. The main strategy to incorporate the local information into the mixture model is to use hidden Markov random fields for more accurate segmentation [15]. Nikou et al. [16] proposed a hierarchical and spatially constrained mixture model that takes into account spatial information by imposing distinct smoothness priors on the probabilities of each cluster and 2 Computational and Mathematical Methods in Medicine pixel neighborhoods. In [17], a nonparametric Bayesian model for tissue classification of brain MR images known as Dirichlet process mixture model was explored. Nguyen and Wu [18] introduced a way to incorporate spatial information between neighboring pixels into the Gaussian mixture model (GMM).
To be more robust to noise and attain fast convergence, FCM and GMM can be combined. Chatzis and Varvarigou [19] embedded the hidden Markov random field model into the FCM objective function to explore the spatial information. Chatzis [20] introduced a methodology for training finite mixture models under fuzzy clustering principle with a dissimilarity function to incorporate the explicit information into the fuzzy clustering procedure. Recently, Ji et al. [21] employed robust spatially constrained FCM (RSCFCM) algorithm for brain MR image segmentation by introducing a factor for the spatial direction based on the posterior probabilities and prior probabilities.
In [22], Li et al. presented an algorithm for brain tissue classification and bias estimation using a coherent local intensity clustering. Later they explored multiplicative intrinsic component optimization (MICO) [23] to improve the robustness and accuracy of tissue segmentation in the presence of high level bias field.
Generally, the current brain MR image segmentation algorithms suffer from one or more of the following shortcomings: lack of robustness to outliers [8,9,13], high computational cost [8,13,14,16,21], prior adjusting of crucial or many parameters [8][9][10][11]21], limited segmentation accuracy in the presence of high level noise [8,11,19,22,23], and loss of such image details like CSF [9,13,14,21]. In this paper, a new soft clustering framework is to be explored for better handling of the aforementioned segmentation problems.
The rest of this paper is organized as follows. Related work of FCM algorithm is presented in Section 2. The proposed framework is then elaborated in Section 3. Experiments on synthetic and clinical MR images are presented in Section 4. Sections 5 and 6 are devoted to discussion and conclusion, respectively.

Related Work
The FCM algorithm in its original form assigns a membership value to each pixel for all clusters in the image space. For an image with set of grayscales at pixel ( = 1, 2, . . . , ), = { 1 , 2 , . . . , } ⊂ in -dimensional space and cluster centers V = {V 1 , V 2 , . . . , V } with being a positive integer (2 < ≪ ), there is a membership value for each pixel in the th cluster ( = 1, 2, . . . , ). The objective function of the FCM algorithm is [7] where is a weighting exponent to the degree of fuzziness, that is, > 1, and ‖ − V ‖ 2 is the grayscale Euclidean distance between pixel and center V . The membership should be constrained to the following: The membership function and cluster centers are updated iteratively in an alternating process known as alternate optimization. The membership function and cluster centers are As the objective function in (1) does not include any local information, the original FCM is very sensitive to noise and the accuracy of clustering in the presence of noise and image artifacts will decrease. To overcome this problem, Ahmed et al. [8] modified the objective function by adding a term for the spatial information of neighboring pixels. This algorithm is denoted as FCM S with the following objective function: where is a parameter to control the spatial information of neighbors with 0 < ≤ 1, is the set of pixels around pixel , and is the cardinality of . The FCM S algorithm is computationally expensive as the local neighborhood term has to be calculated in each iteration step. To overcome this drawback, Chen and Zhang [10] replaced the term where is the grayscale of a filtered image that could be calculated once in advance, and used kernel function to replace the Euclidean distance. The enhancement could be in two forms, that is, FCM S1 by using the average filter and FCM S2 by adopting the median filter. Their objective function is as follows: Although the accuracy has been improved, it is sensitive to high level noises and different types of noises. In addition, the parameter , which has a great impact on the performance, is set manually with care and requires prior information about noise.
Yang and Tsai [12] proposed a Gaussian kernel-based FCM method with the parameter calculated in every Computational and Mathematical Methods in Medicine 3 iteration to replace for every cluster. Similar to FCM S1 and FCM S2, this method has two forms: GKFCM1 and GKFCM2 for average and median filters, respectively. The parameter is estimated using kernel functions: where is the kernel function. The replacement of with could yield better results than FCM S1 and FCM S2. However, for good estimation of , cluster centers should be well separated which might not be always true; hence the algorithm has to iterate many times to converge. Moreover, the learning scheme requires a large number of patterns and many cluster centers to find the optimal value for .
To tackle the problem of parameter adjustment, Krinidis and Chatzis [13] proposed the FLICM algorithm with a fuzzy factor that combined both spatial and grayscale information of the neighboring pixels. The fuzzy factor = ∑ ∈ , ̸ = (1/(1 + ))(1 − ) ‖ − V ‖ 2 was embedded into (1) as follows: where pixel is the center of the local window, pixel is in the neighborhood, and is the spatial Euclidean distance between pixels and .
Although FLICM algorithm enhances robustness to noise and artifacts, it is slow since the fuzzy factor ( ) has to be calculated in each iteration. Moreover, is heavily affected by spatial Euclidean distance from the central pixel to its neighboring pixels to lose small image details due to the smoothing effect.
To enhance the FLICM algorithm, Gong et al. [14] developed KWFLICM algorithm with a trade-off weighted fuzzy factor to control the local neighbor relationship and replaced the Euclidean distance with kernel function. The weighted fuzzy factoŕof KWFLICM iś where is the trade-off weighted fuzzy factor of pixel in the local window around the central pixel and 1 − ( , V ) is the kernel metric function. The trade-off weighted fuzzy factor combines both the local spatial and grayscale information [14]. Because of the trade-off weighted fuzzy factor, its computational cost increases substantially. In addition, the algorithm is unable to preserve small image details.
In addition to the abovementioned shortcomings, Szilágyi [24] pointed out serious theoretical mistakes in FLICM and KWFLICM. It was shown that the iterative optimization nature of FLICM and KWFLICM did not minimize their objective functions; instead, they iterated until the partition matrices converged. Furthermore, their objective functions intended to employ local contextual information but theoretically failed and were not even suitable for creating a valid partition [24].
To this end, a new way to modify the existing FCM clustering is explored with adaptive regularization for contextual information. The proposed framework utilizes a new parameter to control the effect of pixel neighbors based on the heterogeneity of local grayscale distribution. A weighted image is devised that combines the local contextual information with respect to the heterogeneity of local grayscale distribution and the original grayscale that is calculated once in advance to reduce the computational cost. To improve segmentation accuracy and robustness to outliers, a kernel function is employed to replace the Euclidean distance metric. Validation against both synthetic and clinical MR data has been carried out to compare the proposed algorithms with 6 recent soft clustering algorithms in terms of segmentation accuracy and computational costs.

Proposed Algorithms
We introduce a regularizing parameter to enhance segmentation robustness and preserve image details, devise a weighted image, and adopt the Gaussian radial basis function (GRBF) for better accuracy.

The Introduced Regularization
Term. The parameter used in [8][9][10] is usually set in advance to control the desirable amount of contextual information. Indeed, using a fixed for every pixel is not appropriate since noise level differs from one window to another. In addition, setting such parameter needs prior knowledge about noise which is not always available in reality. Hence, adaptive calculation of is necessary according to the pixel being processed.
To be adaptive to noise level of the pixel being processed, we first calculate the local variation coefficient (LVC) to estimate the discrepancy of grayscales in the local window to be normalized with respect to the local average grayscale. In the presence of noise to have high heterogeneity between the central pixel and its neighbors, LVC will increase. Consider where is the grayscale of any pixel falling in the local window around the pixel , is the cardinality of , and is its mean grayscale. Next, LVC is applied to an exponential function to derive the weights within the local window: The ultimate weight assigned to every pixel is associated with the average grayscale of the local window:  The parameter assigns higher values for those pixels with high LVC (for pixel being brighter than the average grayscale of its neighbors, will be 2+ , and will be large when the sum of LVC within its neighborhood is large) and lower values otherwise. When the local average grayscale is equal to the grayscale of the central pixel, will be zero and the algorithm will behave as the standard FCM algorithm. The value 2 in (11) is set through experiments to balance between the convergence rate and the capability to preserve details. The proposed parameter is embedded into (5) to replace . Figure 1 shows the calculation of with different cases of noise.
Here are some remarks on the parameter . The first point to emphasize is that is only relevant to the grayscales within a specified neighborhood, which is very different from FCM S, FLICM, and KWFLICM, where the contextual information is expressed, respectively, by ∑ ∈ ‖ − V ‖ 2 , , and́, all containing a loop on the neighborhood and the cluster center V . Due to its irrelevance to clustering parameters, could be calculated in advance before the clustering process which can greatly reduce the computational cost. On the contrary, FCM S, FLICM, and KWFLICM will need to update the contextual weights at each iteration, which is the main reason why they have higher computational cost.
The second point is that the contextual information provided by is based on the heterogeneity of grayscale distribution within the local neighborhood, which is completely different from existing enhanced versions of FCM to base the contextual information on the difference between the grayscales of neighboring pixels and cluster centers. As a result, the proposed tends to yield a homogeneous clustering according to local grayscale distribution while existing enhanced FCM algorithms tend to make the clustering to have more homogeneous labels by incorporating the contextual information.

Devising a Weighted
Image. In addition to making , respectively, the grayscale of average/median filter of the original image, can also be replaced with the grayscale of the newly formed weighted image : where and are, respectively, the grayscale and neighborhood of pixel and is the cardinality of . Formula (12) is inspired by the weighted image in [9] but utilizes explicitly to make the weighted image free from parameters that are difficult to adjust.

Measuring Distance Using Kernel Function.
The Euclidean distance metric is generally simple and computationally inexpensive, but it is sensitive to perturbations and outliers. Recently, with popular usage of support vector machine, a new direction appears to use kernel functions. The kernel functions are able to project the data into higher dimensional space where the data could be more easily separated [25]. To do this, a so-called kernel trick has been adopted that can transform linear algorithm to nonlinear one using dot product [26]. Using the kernel trick, the Euclidean distance Computational and Mathematical Methods in Medicine 5 term ‖ − V ‖ 2 can be replaced with ‖ ( ) − (V )‖ 2 that is defined as where is the kernel function.
In this paper, we use GRBF kernel [27]: where is the kernel width. Using GRBF, the kernel function in (13) will be The choice of the kernel width is still a problem and has to be selected carefully. If it is large the exponential effect will be almost linear. On the contrary, if it is small, the cluster boundaries will be sensitive to outliers [27]. In [10], was set to a fixed value of 150 while in [12] the authors used sample variance to estimate . Similar to [14], we calculate based on the distance variances among all pixels: where = ‖ − ‖ is distance from the grayscale of pixel to the grayscale average of all pixels and is the average of all distances .

The Proposed
Framework. The proposed adaptively regularized kernel-based FCM framework is denoted as ARK-FCM. First, we calculate the adaptive regularization parameter associated with every pixel to control the contextual information using (11). The objective function is defined as Under the conditions specified in (2), the minimization of ARKFCM ( , V) can be calculated through an alternate optimization procedure using (derivation is given in the Appendix) When is replaced with the grayscale of the average/median filter of the original image, the algorithm is denoted as ARKFCM 1 /ARKFCM 2 . When is replaced with the weighted image defined in (12), the algorithm is denoted as ARKFCM . The main steps for the proposed algorithms are as follows: (1) Initialize threshold = 0.001, = 2, loop counter = 0, V, and (0) .

Experiments
In this section, we present the experiments on both synthetic and clinical MR images. For validation, the proposed algorithms (ARKFCM 1 , ARKFCM 2 , and ARKFCM ) are compared with 6 recent soft clustering algorithms, namely, GKFCM1 [12], GKFCM2 [12], FLICM [13], KWFLICM [14], MICO [21], and RSCFCM [23]. As we are unable to have faithful implementation of RSCFCM [23], RSCFCM is compared only against the common data with results reported from [23]. The algorithms are implemented using MATLAB software package (a demo version is freely available online (http://www.mathworks.com/matlabcentral/fileexchange/54141-arkfcm-algorithm)). All the experiments are conducted with window size of 3 × 3 pixels, maximum number of iterations = 100, and = 0.001. The accuracy of segmentation is measured using the Jaccard Similarity (JS) metric [28] which is defined as the ratio between the intersection and union of segmented volume 1 and ground truth volume 2 :

Experiments on Synthetic Brain MR Images.
The following experiments are carried out using Simulated Brain Database (SBD) [29] which contains a set of realistic MR volumes produced by an MR imaging simulator with ground truths of CSF, GM, and WM available. The first experiment is to segment a T1-weighted axial slice (number 100) with 217 × 181 pixels corrupted with 7% noise and 20% grayscale nonuniformity into WM, GM, and CSF. Figure 2 shows the segmentation results while Table 1 summarizes the JS and average running times.
The second experiment is to segment a T1-weighted sagittal slice (number 100) with 181 × 217 pixels corrupted with 7% noise and 20% grayscale nonuniformity. This image is chosen to show the capability of preserving details.    The segmentation results and JS are presented in Figure 3 and Table 2, respectively. The third experiment is to check the robustness to Rician noise that commonly affects MR images [30]. The segmentation results and the JS with average running times of a T1-weighted axial slice (number 91) with 217 × 181 pixels corrupted with 10% Rician noise are given in Figure 4 and Table 3, respectively.
From black to white are, respectively, background, CSF, GM, and WM. It should be noted that clustering is carried out only for CSF, GM, and WM, with the pathology region being considered as background.
Since images Brats1 and Brats2 come only with ground truth for the pathology and have no ground truth for normal tissues, a quantification measure other than JS should be used. To this end, an entropy-based metric proposed by Zhang et al. [31] is adopted. This is to maximize the uniformity of pixels within each segmented region and to minimize the uniformity across the regions. First, the entropy of every region is calculated: Computational and Mathematical Methods in Medicine 7  where is the set of all possible grayscales in region , ( ) is the number of pixels belonging to region with grayscale , and is the area of region . For the grayscale image , the measure is calculated as where the first term represents the expected region entropy of the segmentation and the second term is the layout entropy. The measure yields smaller value for better segmentation and higher otherwise. Table 4 summarizes the segmentation accuracy in terms of and running times for Brats1 and Brats2. It should be noted that the metric is calculated only for WM, GM, and CSF regions without considering the background.

Discussion
FCM clustering is a well-known soft clustering method that assigns a membership degree for each pixel to every cluster. As the segmentation accuracy of FCM algorithm decays in the presence of noise, artifacts, and increased number of clusters, many investigations have been carried out on using contextual information to enhance the quality of segmentation. But how the contextual information shall be used effectively is still a challenge.
We propose an adaptively regularized kernel-based FCM clustering framework, with new parameter that adaptively controls the contextual information according to the heterogeneity of grayscale distribution within the local neighborhood. The new parameter is estimated using local variation coefficient among pixels within a specified neighborhood. A weighted image is devised that combines the original 8 Computational and Mathematical Methods in Medicine   image and the parameter to represent the image contextual information embedded through the weighting procedure. Furthermore, a GRBF is adopted to replace Euclidean distance for better partitioning and to be less sensitive to outliers. The proposed framework can be in the form of 3 algorithms: ARKFCM 1 , ARKFCM 2 , and ARKFCM for the local average grayscale to be replaced with the grayscale of the average filter, median filter, and the devised weighted image, respectively. Experiments have been carefully carried out to show the superiority of the proposed algorithms in comparison with 6 recent soft clustering algorithms to be discussed further.

Segmentation
Accuracy. The differences in JS between the proposed algorithms and the other 6 algorithms are clear for the axial slice with 7% noise and 20% grayscale nonuniformity (Table 1 and Figure 2) and become more distinctive in preserving details in the sagittal slice corrupted with the same noise (Table 2 and Figure 3). The average JSs of ARKFCM 1 , ARKFCM 2 , and ARKFCM are, respectively, 0.889, 0.892, and 0.891 for the axial slice as shown in Figure 2, which are better than the other 6 algorithms with range 1.3-4.8% (Table 1). For the sagittal slice with 7% noise and 20% grayscale nonuniformity shown in Figure 3, the average JSs of ARKFCM 1 , ARKFCM 2 , and ARKFCM are, respectively, 0.824, 0.825, and 0.825, which are better than the other algorithms with range 1.4-8.9% (Table 2). For the axial slice corrupted with 10% Rician noise shown in Figure 4, the average JSs of ARKFCM 1 , ARKFCM 2 , and ARKFCM are, respectively, 0.884, 0.886, and 0.881, which are better than the other algorithms with range 1.20-3.6% (Table 3). The higher JSs may imply that the proposed algorithms achieve a better balance in preserving image details in the presence of noise and grayscale inhomogeneity.
For the clinical brain MR images with tumors (Brats1 and Brats2) shown in Figures 5 and 6, the proposed algorithms achieve smaller than other algorithms (Table 4)

Computational Cost.
In terms of computational cost, the objective function of FCM algorithm in its original form [6] contains only the difference between the grayscale of the current pixel and the cluster centers V . This is basically to cluster grayscales as there is no spatial information, so it has the smallest computational cost and can be implemented based on grayscale histogram to reduce the computational cost further [9,11]. The enhancement of original FCM is to add the local contextual information to make it robust to noise and image artifacts at the expense of increased computational cost [7][8][9][10][11][12][13][14]. The GKFCM1 and GKFCM2 algorithms [12] use the parameter to replace and need an additional loop on the number of clusters to be calculated at each pixel to update the local contextual information. So they have higher computational cost than the original FCM, FCM S1, and FCM S2 at each iteration.
The FLICM algorithm [13] introduces which needs an additional loop on the neighborhood of the current pixel to calculate the local information in every iteration; thus it has a high computational cost. As an extension of FLICM, KWFLICM [14] introduceśthat requires two additional loops on the neighborhood, so it has the highest computational cost at each iteration.
The RSCFCM algorithm [21] uses a spatial fuzzy factor that is constructed based on the posterior and prior probabilities and takes the spatial direction into account. That increases the complexity as many parameters have to be optimized and consequently the computational complexity.
MICO algorithm [23] comes with fast calculations due to the convexity of its energy function particularly in the presence of less noise but tends to have many iterations in the presence of high level noise.
The proposed algorithms incorporate the local contextual information by introducing LVC, which is a measure of grayscale heterogeneity and has nothing to do with the cluster centers. So it can be calculated once in advance and hence reduce the complexity of the clustering procedure.
The eventual computational cost will be the multiplication of the computational cost for each iteration and the number of iterations for convergence. The number of iterations will be dependent on the initialization as well as the objective function. To make fair comparisons, initializations are all set randomly and the average number of iterations for convergence is then recorded for 10 converged times. From Figure 7, it can be seen that the average iteration times will be data dependent, with ARKFCM 2 , ARKFCM , and ARKFCM 1 having the minimum number of iterations followed by FLICM, GKFCM2, and GKFCM1, respectively.
The running times for the tested images agree well with the above analysis as illustrated in Figure 8. For SBD images given in Figures 2, 3, and 4, the KWFLICM algorithm takes the longest time (124-166 seconds), followed by FLICM (3-5.4 seconds), RSCFCM (about 2.5 seconds), GKFCM1/ GKFCM2 (0.6-1.9 seconds), and the proposed algorithms (0.22 to 0.36 seconds). For the clinical MR images in Figures  5 and 6, it takes more time (due to more iterations) as the images are more complicated, but the trend remains the same with the proposed algorithms having the lowest computational cost (Table 4 and Figure 8).

Neighborhood
Size. The local neighborhood window size is a crucial factor to determine the smoothness of clustering and details to be preserved. We have experimented with different window sizes and found that a window size of 3 × 3 pixels achieves the best balance between segmentation accuracy and computational cost. Increasing window size to 5 × 5 pixels has very small impact on the JS but 7 × 7 pixels            or more will, significantly, decrease the accuracy, such as losing image details like CSF. Therefore, it is recommended to use a local window of size 3 × 3 pixels for constructing the neighborhood.

Limitation.
From the experiments, it was found that the proposed algorithms could be sensitive to severe noise in small areas between edges that have width 1 or 2 pixels (e.g., area between ventricles, Figures 4(h), 4(i), and 4(j)). Potentially, this may be solved by embedding edge detection technique into the clustering process which is yet to be explored.

Conclusion
An adaptively regularized kernel-based FCM framework has been proposed to enhance the original FCM for higher segmentation accuracy at low computational cost. The framework can be in the form of three algorithms that employ the heterogeneity of grayscales in the neighborhood employed for local contextual information. The main advantages are adaptiveness to local context, enhanced robustness, and independence of clustering parameters to decrease computational cost. GRBF kernel has been adopted as a distance metric. We validated the proposed algorithms on synthetic and clinical MR images with tumors. The proposed algorithms attain a higher JS (Tables 1, 2, and 3) and lower entropy measure (Table 4) than the 6 recent soft clustering algorithms and could preserve small image details (Figures 2, 3, 4, 5, and 6). In addition, the proposed algorithms have a low computational cost and are, to the best of our knowledge, the only algorithms that are adaptive to local context and do not include cluster centers. Therefore, they attain a trade-off between high segmentation accuracy and low computational cost. The proposed algorithms can be a potential tool for segmenting brain MR images for further processing and other images as well.

Appendix
The objective function ARKFCM given in equation (17) with conditions in (2) is a constrained minimization problem which can be solved using Lagrange multiplier method. Let the minimization problem of ARKFCM be written as follows: Similarly, by taking the derivative of ARKFCM with respect to V and setting the result to zero, we have