Brain MRI Segmentation with Multiphase Minimal Partitioning: A Comparative Study

This paper presents the implementation and quantitative evaluation of a multiphase three-dimensional deformable model in a level set framework for automated segmentation of brain MRIs. The segmentation algorithm performs an optimal partitioning of three-dimensional data based on homogeneity measures that naturally evolves to the extraction of different tissue types in the brain. Random seed initialization was used to minimize the sensitivity of the method to initial conditions while avoiding the need for a priori information. This random initialization ensures robustness of the method with respect to the initialization and the minimization set up. Postprocessing corrections with morphological operators were applied to refine the details of the global segmentation method. A clinical study was performed on a database of 10 adult brain MRI volumes to compare the level set segmentation to three other methods: “idealized” intensity thresholding, fuzzy connectedness, and an expectation maximization classification using hidden Markov random fields. Quantitative evaluation of segmentation accuracy was performed with comparison to manual segmentation computing true positive and false positive volume fractions. A statistical comparison of the segmentation methods was performed through a Wilcoxon analysis of these error rates and results showed very high quality and stability of the multiphase three-dimensional level set method.


INTRODUCTION
Segmentation of three-dimensional anatomical brain images into tissue classes has applications in both clinical and research settings. Although numerous methods to segment brain MRI for extraction of cortical white matter, gray matter, and cerebrospinal fluid (CSF) have been proposed for the past two decades, little work has been done to evaluate and compare the performance of different segmentation methods on real clinical data sets, especially for CSF.
Segmenting three-dimensional anatomical brain images into tissue classes has applications in both clinical and research settings. As a clinical example, segmentation can provide volumetric quantification of cortical atrophy and thus aid in the diagnosis of degenerative diseases. These volumetric measurements apply to the research setting as well, where segmentation can also be used to define regions of interest for quantifying the physiological responses measured with fMRI or PET acquired on the same patients and coregistered with the MRI data. Clinical studies based on quantitative measurements of cortical brain structures include Alzheimer's disease [1][2][3], epilepsy [4], schizophrenia [5], cerebrovascular deficiency [6], and multiple sclerosis [7].
Several methods have been proposed in the literature to segment brain MRI. A good review of these methods can be found in [8] and we can distinguish two general families.
(1) Statistical methods The first family of methods is based on classification of brain tissues into different classes, based on intensity values (direct values of features computed from these values). Gray values thresholding is the most intuitive classification approach [9]. One common difficulty with this method is the selection 2 International Journal of Biomedical Imaging of the threshold level. Many selection approaches have been proposed based on histograms [10,11], combination of morphological operators and region growing [12], and so forth. Derived from a statistical framework, Bayesian analysis [13,14] is a popular classification method for brain tissue where automatic segmentation is performed with expectation maximization (EM) [4]. The freely distributed statistical parametric mapping (SPM) software tool [15] is widely used by neuroradiologists for clinical research. Additional classification methods include clustering [16,17], fuzzy classification [18], neural networks [19], deterministic annealing [7]. Hybrid "neuro-fuzzy" methods, see [2,20,21], combining fuzzy logic and neural networks perform an unsupervised learning process. Another class of statistical methods is based on Markov Random Field (MRF) [22][23][24][25]. MRF model encodes spatial information through the mutual influences of neighboring voxels for class assignments. A major issue with MRF-based classification methods is the requirements for training the model and setting the MRF parameters which typically require supervised learning and a priori information from manual labeling or from an atlas. In this context, classification and nonuniform registration are sometimes combined together for more robustness [1,26]. Finally, these statistical methods can be applied to multispectral MRI (i.e., MRI data of the same patient acquired with different protocols such as T1-weighted, T2-weighted, proton diffusion) using multivariate statistics. Several applications of multispectral brain MRI classification have been presented in the literature [25][26][27].
(2) Deformable models The second family of segmentation methods deals with geometric deformable models, including active surfaces [28] and level-set-based deformable models. The level set implementation framework for surface propagation offers the advantages of easy initialization, computational efficiency, and the ability to capture deep sulcal folds. Two coupled level set surfaces were proposed by Zeng et al. [29] for cortex segmentation from 3-D MR images, assuming a constant thickness range of the cortical mantle. A combination of jointprior shape appearance models with a level set deformable model was proposed by Yang and Duncan [30]. This method was motivated by the observation that the shapes and gray levels variations in an image had some consistent relations building a MAP shape-appearance prior model provided some configurations and context information to assist the segmentation process. The model was formulated in a level set framework rather than using landmark points for parametric shape description. Goldenberg et al. [31] proposed a geometric variational formulation for the propagation of two coupled bounding surfaces, similar to Zeng et al. [29]. The authors put forward an efficient numerical scheme for the implementation of a geodesic active surface model. Several external driving forces, derived from brain MRI, have been proposed based on image gradient [32], image intensity [33], and probability density function [28] of tissue classes. Combining classification and deformable models has been proposed by Ballester et al. [34] combining Bayesian analysis and active surface method, and Shen et al. [35] combining geometric deformable model and statistical information about the shapes of interest.
Automation of the segmentation process is critical for applications in clinical research where the number of cases to process is large and the time available for experts to analyze the data is very limited. Several of the aforementioned methods were developed in a supervised or semi-automated framework still requiring operators' intervention. Full automation can be achieved using automatic parameter tuning [10], automated initialization [12], and combination with atlas information [4].
This paper presents the comparison of the three "classical" segmentation methods: histogram-based thresholding, tissue classification based on fuzzy connectedness, and maximum-likelihood classification with hidden MRF using the FSL-FAST software tool. We also present in this paper the implementation of a new three-dimensional automated segmentation method of brain MRI using a four-phase threedimensional active contour implemented with a level set framework. This multiphase level set framework was initially proposed by Vese and Chan [36] to simultaneously deform coupled level set functions without any prior models or shape constraints. This framework achieves a global partitioning of the image data into 2 N homogeneous areas using N level set curves, solely based on average gray values measures.
These four segmentation methods were applied to a set of ten clinical T1-weighted MRIs for segmentation of cortical tissues: white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). Segmentation errors are reported with comparison to manual labeling. The segmentation methods were also compared in a statistical framework to assess their relative performance and overall "quality."

METHOD
We present in this section the four segmentation methods that were applied to ten brains T1-weighted MRI and compared together. Manually labeled data was available for this comparative study. This "ground truth" was used to optimize the parameter settings of the three "classical" segmentation methods, as detailed below.

Intensity thresholding
Intensity thresholding (IT) is the easiest and fastest segmentation method, often adopted for preprocessing of medical images and preregistration problems [9]. Segmentation of the three brain cortical tissues is performed via thresholding of voxel values within adjacent intervals. The position of the interval bounds was initialized as follows: we used the manually labeled data to mask the MRI data and compute the means of the three cortical tissues of interest. These mean values were used to initialize the threshold values at the two interfaces CSF/GM and GM/WM. The Tanimoto index [37] was calculated according to the manually labeled data. Elsa D. Angelini et al. 3 Finally, the simplex method [38] was applied to maximize the Tanimoto value and identify optimal interval bounds corresponding to a minimization of the segmentation error.

Fuzzy connectedness
Classification of homogeneous objects using fuzzy connectedness (FC) was introduced by Falcão et al. [18]. This method can be used to extract homogeneous tissues in a volume given an initial set of seed points and prior statistics defining these tissues. The source code for this method was obtained from the freely distributed insight segmentation and registration toolkit (ITK) [39] sponsored by the National Library of Medicine. Segmentation of the different tissues was performed via thresholding of the real-valued fuzzy connectedness maps. The threshold levels were also determined via minimization of the segmentation error using the simplex method [38].

Hidden Markov random field-expectation maximization
The FAST software tool [40] is a publicly available automated segmentation tool for brain MRI volumes. This software tool is part of the FSL comprehensive library of functional and structural brain image analysis tools, distributed by the Oxford Centre for Functional Magnetic Resonance Imaging of the Brain (FMRIB) at the University of Oxford, UK. Brain MRI volumes were segmented into three tissue types (WM, GM, and CSF) with simultaneous correction of RF bias-field. The segmentation process is based on hidden Markov random field (HMRF) models. Fitting of the HMRF model to the data was performed via maximum likelihood with expectation maximization (EM). A thorough description of the algorithm was published in [41].

Multiphase three-dimensional level set
The multiphase three-dimensional level set (M3DLS) segmentation method performing a minimal partitioning of the image data into piecewise constant objects, based on the Mumford-Shah functional [42], was introduced by Chan and Vese [43].

Energy functional
This method uses a deformable model controlled by a homogeneity-based energy functional to segment piecewise constant or piecewise smooth volumetric data u 0 . Assuming a piecewise-constant data with an object, of value c 1 , and a background, of value c 2 , separated by the contours C, the proposed energy functional is defined as F C, c 1 , c 2 = μ length (C) + υ area (inside C) where Segmentation of the data is performed via minimization of the functional F with respect to (C, c 1 , c 2 ). This energy functional can be extended to the segmentation of multiple homogeneous objects in the image by using several curves {c 1 , c 2 , . . . , c i }. In the case of two curves the following energy functional is used: The set of parameters (λ 1 , λ 2 , λ 3 , λ 4 , μ 1 , ν 1 , μ 2 , ν 2 ) takes real positive values. The two closed curves (c 1 , c 2 ) split the domain Ω into four phases defined by their relative positions as illustrated in Figure 1. Inside these four phases, u 0 has mean intensity values (c 00 , c 01 , c 10 , c 11 ).
Minimization of this energy functional deforms simultaneously the two curves and identifies four homogeneous areas defined by the intersection of the two curves.

Level set formulation
Minimization of the functional in (1) and (2) can be performed within a level set framework. This framework, introduced by Osher and Sethian [44], provides an effective implicit representation for evolving curves and surfaces, which has found many applications in image segmentation, denoising and restoration as reviewed in [45]. In this framework, a given curve C (being now the boundary ∂w of an open set ω ∈ Ω) is represented implicitly, as the zero level set of a scalar Lipschitz function φ : Ω → R (called level set function), such that International Journal of Biomedical Imaging The level set function φ is typically defined as the signed distance function of spatial points defined on Ω to the curve C. Once φ is computed, we define its associated Heaviside function H and Dirac function δ as Using these two functions, the different components of the functional in (1), parameterized with the contour curve C, can be reformulated as integrals, parameterized with the level set function φ and defined over the entire domain Ω.
(a) Length of the curve C: (b) Area inside the curve C: (c) Homogeneity of u 0 inside and outside the curve C: Minimization of the energy functional leads to the expression of the associated Euler-Lagrange equations for derivatives with respect to (c 1 , c 2 , φ). The first two derivatives, with respect to (c 1 , c 2 ), lead to the following results, used for computation of the mean statistics: The third partial derivative of F(φ, c 1 , c 2 ), with respect to φ, leads to the following dynamic equation: This segmentation framework is extended to the detection of multiple objects via the introduction of multiple level set functions {φ 1 , φ 2 , . . . } and the computation of mean data values in areas of constant values defined via the combination of their Heaviside functions (H(φ 1 ) × H(φ 2 ) × · · · ). In this study we implemented the segmentation functional with two level set functions generating four phases. We can introduce the binary functions χ (also called characteristic functions) that define the four regions: The Euler-Lagrange equations, for the four-phase configuration, lead to the following equations defining the mean values of each phase: The partial derivatives with respect to the two level set functions define the following system of equations: We present in the next section details regarding the numerical implementation of this system.

Numerical implementation
Segmentation was performed via iterations of the system of equations defined in (12), deforming the two level set fronts (φ 1 , φ 2 ) until convergence to a stable position was reached.
The iterative scheme was organized as follows.
(1) Initialize the system for time n = 0, with (φ n 1 , φ n 2 ) defined as the distance functions from an initial set of curves.
(2) For time n > 0, while the system is unstable, (a) compute the average values in the four phases (c n 11 , c n 10 , c n 01 , c n 00 ); (b) compute the curvature and homogeneity terms of the speed, defined for each points in the spatial domain Ω; (c) compute (φ n+1 1 , φ n+1 2 ) from (12) with the speed term (from (b)) and (φ n 1 , φ n 2 ); (d) evaluate the stability of the system. Iterate at time n + 1 if the system is not stable.
Elsa D. Angelini et al.

5
(3) When the system is stable, extract the four phases as binary volumes corresponding to (χ n+1 11 , χ n+1 10 , χ n+1 01 , χ n+1 00 ). In our implementation, the segmentation was initialized with two level set functions defined as the distance function from two sets of initial curves. The curves were defined as 64 cylinders centered at regularly spaced seed locations across the entire data volume. The two sets of cylinders were slightly shifted from each other as illustrated in Figure 2.
Note that such initialization does not use any a priori information on the location of particular tissues or on the anatomy of the brain and does not require manual input by the user. As commented by Vese and Chan [36], this type of initialization also brings some robustness to the method, limiting the risks of convergence of the minimization process to local minima.
The level set algorithm was implemented with a regularized Heaviside function and a semi-implicit scheme as proposed by Vese and Chan [36] and extended to three dimensions. In this scheme, the speed term at time n was computed with the curvature derived semi-implicitly (us- 2 ) while the homogeneity force term was computed explicitly with (c n i ) i=1,2 . This semi-implicit scheme provides unconditional stability for any temporal and spatial discretization parameters. This means that we can set the time increment to an arbitrary high value to speed up the segmentation process without altering the stability of the minimization process.

Addressing the inhomogeneity issue
All four segmentation methods tested in this work perform a partitioning of the volumetric data into three tissue classes and a background relying on a strong assumption of tissue homogeneity for WM, GM, and CSF. The notion of homogeneity is then translated into a statistical framework (HMRF-EM, homogeneous tissue is characterized by its mean and variance), a fuzzy connectivity (FC, homogeneous tissue is characterized by high affinity measures to a prior seed point given a statistical model), or a distance measure (M3DLS, homogeneous tissue is characterized by its mean intensity value; IT, interval bounding values).
Even though homogeneity-based segmentation methods are widely used for brain MRI segmentation, it is well known that there are strong tissue inhomogeneities in MRI volumes of the following four origins: (1) biological tissues are inherently heterogeneous with internal structures and multimolecular components which contribution are integrated in the recorded MRI signal, (2) the MRI acquisition system is degraded by inherent statistical noise, (3) design of the MRI acquisition system suffers from inhomogeneity of the radiofrequency (RF) field (this phenomenon is also referred to as RF bias-field), (4) MRI imaging systems have limited spatial resolutions which generate partial volume effects (i.e., mixing of MRI signals from adjacent tissues) at tissue interface locations.
The first two sources are inherent to the modality but have minor impacts on the cortical structures that we are trying to segment. The other two sources generate artifacts that tend to become undetectable by simple visual inspection as MRI scanner technology improves, but that still constitute the major source of error and failure of homogeneity-based segmentation algorithms.
(a) RF bias-field: elaborated algorithm has been developed to correct RF bias-field [46][47][48][49]. We compared two methods commonly used and available in freesoftware packages. The first method was proposed by Styner et al. [47] and is available in the ITK package [50]. The second method was proposed by Ashburner and Friston [46] and is available in the FSL software package [40]. (b) Partial volume effect introduces inhomogeneities at tissue interfaces that can be modeled in a statistical framework by manipulating tissue mixture models as in [24,51,52]. Mixed classes are created and if necessary assigned to one of the "true" tissue classes constituting the mix. Neither the statistical HMRF-EM method nor the three other methods tested in this work included such correction and initial experiments on clinical data sets revealed misclassification of tissue labels for voxels located at tissue interfaces. We therefore derived a postprocessing sequence, based on morphological operators to correct for misclassifications at interfaces between GM and WM and between the CSF and WM or GM.

Postprocessing
A simple postprocessing scheme was designed to correct for pixel assignment at tissue interfaces. After the level set segmentation was completed, WM, GM, and CSF structures, corresponding to separate phases, were saved as binary volumes. These volumes were then used as masks applied to the original data and a Gaussian fit of the histograms of each phase was performed. Each phase was then characterized by the mean and standard deviation (μ phase , σ phase ) of the fitted Gaussian distribution. Interface voxels were tested against the 6 International Journal of Biomedical Imaging phase parameters as follows. First, the GM mask was dilated, to correct for the under segmentation of this thin structure. An interval of intensity values of [μ GM ± 3σ GM ] was selected.
All new voxels, from the dilated mask, with intensity values inside this interval were included in the GM phase while voxels with intensity values outside the interval were removed from the phase and assigned to the adjacent WM phase. This process was iterated until no new points were added to the GM phase. A similar process was then applied to the CSF phase with dilation of the binary mask. An interval of intensity values of [μ CSF ± 4σ CSF ] was selected. Finally, a 3D connectivity algorithm was performed to remove isolated voxels in the three phases. This simple postprocessing approach has provided very robust performance on the ten clinical MRI cases segmented for this study. A second postprocessing correction was needed to compare results from the different segmentation methods to manual labeling. Indeed, manual labeling did not include sulcal CSF and labeled the outer most layer of cortical brain tissue as gray matter. This was an arbitrary assignment made by the manual expert, that does not take into account partial volume effects in these voxels. A similar arbitrary classification had to be applied to the segmented data, before comparison to manual labels. This problem was especially important for the FAST HMRF-EM classification which is based on an anatomical model with sulcal CSF, leading to overestimation of the CSF on the MRI data. This oversegmentation of sulcal CSF lead to very high false positive errors when compared to manual labeling. Sulcal CSF was removed from the segmented data, after postprocessing of interface voxels, for all segmentation methods, as follows. Given the spatial resolution of the data (0.86 mm 2 × 3 mm slice thickness), a 2-voxels dilation of the manually labeled CSF ventricles was performed in axial views, and one 1-voxel dilation was performed in the longitudinal direction. The dilated ventricle masks were applied to the segmented CSF mask and segmented voxels outside the dilated manual mask were assigned to the GM phase.

Clinical T1-weighted MRI
We applied our segmentation to one phantom and ten T1weighted MRI volumes acquired on healthy young volunteers. Axial slices were 1.5 mm thick with an in-plane (hyphen) resolution of 0.86 mm. These images were resliced coronally (3 mm slice thickness) and labeled via a laborintensive (40 hours per brain) manual method in which expert raters with extensive training in neuroanatomy choose histogram thresholds on locally hand-drawn regions of interest. This labeled data was used as a "ground truth" for evaluation of the segmentation accuracy.
MRI volumes were preprocessed to remove all noncortical brain tissue by using the manually labeled data sets as binary masks. This preprocessing is illustrated in Figure 3.
To determine the practicality of masking out subcortical gray matter on naïve images, we constructed from a library of labeled atlases two probabilistic atlases in which each voxel was assigned a likelihood of being made of cortical gray matter or subcortical gray matter. Less than 0.1% of the voxels in the whole brain simultaneously had over 20% chance of being made of both gray matter and subcortical gray matter. Such statistical finding confirms that one can apply a population-based method for masking out subcortical gray matter for the purpose of applying cortical segmentation methods without introducing significant errors.

Mathematical brain phantom
A mathematical brain phantom was built from one manually labeled MRI data set to validate the performance of the multiphase level set segmentation algorithm in an ideal case. A constant intensity value was applied for each tissue corresponding to the average intensity value of the MRI data under the manual mask. This phantom data is illustrated in Figure 3.

Validity of homogeneity hypothesis
Prior to segmentation, we evaluated the homogeneity of the three cortical tissues: WM, GM, and CSF, computing the mean and variance statistics of the intensity distribution for each tissue within each slice. These statistical measurements were performed on axial slices across the entire data volumes for the ten MRI cases available for the study, masking the data with the manually labeled data. Results, illustrated in Figure 4 for three typical cases, showed very stable estimates of intensity mean and variance Elsa D. Angelini et al.  values for each tissue across the whole volume, corroborating the accuracy of the homogeneity assumption for the three tissues, and the absence of strong bias-field inhomogeneity. Lower mean intensity values on extremity slices were computed on small tissue samples (less than 10 voxels) corresponding to small anatomical structures with relatively high partial volume effects.
A Gaussian fit was performed on the histogram of the entire gray level distribution of the three brain tissues for each case. We observed that the three tissue types have wellseparated average values suggesting that global homogeneity measurements could separate tissue types for each patient. Nevertheless, the agreement between the volume histograms and the fitted Gaussian distribution was evaluated with a chisquared test. Results for the different tissues did not show a systematic agreement between the data and the Gaussian fit, at level 0.05, except for the gray matter. Therefore, despite reasonable homogeneity, we need further investigation before being able to introduce additional constraints based on a priori Gaussian statistics to the method as proposed, for example, by Baillard et al. [28].

Quantitative evaluation of segmentation performance
Segmentation errors were measured using the recent methodology from Udupa et al. [39] for comparison of segmenta- tion methods. Accuracy of the object contours obtained with the proposed segmentation method, referred to as C Seg , was evaluated by comparing the results to our "ground truth" segmentation of each object, using manually labeled contours, referred to as C ML . The overlap and difference between the two contours were measured via counting the true positive (TP), false positive (FP), and false negative (FN) voxels as illustrated in Figure 5. These quantities are reported as volume fractions (VF) of the true delineated object volume C ML as follows.

8
International Journal of Biomedical Imaging (i) FNVF = fraction of tissue defined in C ML that was missed by the segmentation method C Seg . (ii) FPVF = fraction of tissue defined in C Seg falsely identified by the segmentation method. (iii) TPVF = fraction of the total amount of tissue in the segmentation C Seg which overlaps with the true object C ML .
We point out here that TPVF + FNVF = 1, so that we only need to observe TPVF and FPVF measures to evaluate the segmentation method.

Evaluation of bias-field correction
In order to evaluate the two bias-field correction methods and their effects on the homogeneity of the tissues, we applied the intensity thresholding (IT) segmentation before and after bias-field correction, providing three groups of results for the ten MRI cases: segmentation of original data (RAW), segmentation of corrected data with the ITK software tool (ITK), and segmentation of corrected data with the FSL software tool (FSL). The IT segmentation method was selected as it only relies on good separation of the tissue range of intensities, without any additional constraint or model. The FSL bias-field correction did not need any parameter setting, and the corrected data was generated along with the segmentation via HMRF-EM. The bias-field correction with the ITK toolkit used the following parameters: mean and standard deviation of the three tissue classes were calculated from the "ground truth," the input and output mask with nonbrain tissues removed were also generated from the "ground truth," the degree of the method was set to 2, growing parameter was set to 1.05, shrinking parameter was set to 0.9 (suggested values), and a maximum of 2000 iterations were performed with an initial step size of 1.02.
Effects of bias-field correction on IT segmentation are illustrated with bar plots of TPVF, in Figure 6.
We observed that the bias-field correction methods did not improve and even degraded the TPVF accuracy of the IT segmentation method. The ITK method provided similar accuracy for all cases and all tissues while the FSL method tended to degrade TP performance, except for the CSF on one MRI case. Based on homogeneity results presented in Figure 4, these results and the difficulty involved with parameter settings of the bias-field correction methods, we decided to exclude bias-field correction for this study.

Comparison of segmentation methods
The level set segmentation was initialized for the phantom and MRI clinical cases with two sets of regularly spaced cylinders as illustrated in Figure 2. A stable behavior of the four phases was observed after 10 iterations.

Brain phantom
Segmentation of the brain phantom was performed with three phases corresponding to the cortical tissues and the 4th phase to the background. In Table 1, we observed almost perfect segmentation of the brain structures, demonstrating the inherent ability of the multiphase level set framework, initialized with small regular cylinders, to the following.
(a) Identify the four homogeneous phases. (b) Extract highly convoluted surfaces.
(c) Perform topology splitting and merging of the evolving front, enabling the identification of separate structures within a single phase (such as the three ventricles for the CSF) that exist as different spatial objects.
Computation of homogeneity-based speed terms for phases with small structures, such as the CSF, suffered from higher estimation inaccuracy. In the absence of a curvature term, the homogeneity-based speed terms generate a k-means classification of the image data with poor quality extraction of these small structures. In this context, the curvature term of the deformable model plays a critical role in the M3DLS to preserve small structures shapes and sizes. Partial volume effect was not simulated in the phantom and therefore no postprocessing was applied. We used this phantom to tune the parameters of the segmentation method set to The parameter associated with the curvature term is defined proportional to the data volume size (Volume size) and inversely proportional to the diagonal distance of the volume data (Diagonal distance). By doing so, we consider this diagonal distance as the unitary distance of our domain of definition Ω. Setting the constant speed term υ to zero eliminates the use of a constant inflating force on the model. This type of constant force should be used with caution as it can override the homogeneity-based speed term when driving the deformable contour and force it to move in only one direction (i.e., constant inflation or deflation).

Clinical brain MRI
We present in Figure 7 a three-dimensional rendering of each segmented structure for one MRI clinical case segmented for this study. Visual rendering of the three cortical structures confirmed the overall high performance of the multiphase segmentation method to extract homogenous objects that correspond to distinct anatomical tissues. Elsa D. Angelini et al.  The segmentation method was able to handle multiple challenges without any a priori information or shape constraints that include the extraction of highly convoluted white matter surfaces, the extraction of separate ventricular structures for the CSF, and handling of different volume sizes of the three structures in a simultaneous segmentation scheme.
Accuracy of the object contours obtained with the proposed multiphase level set deformable model and the three other segmentation methods was evaluated by comparing segmentation results to the manual ground truth. Bar plots of error measurements for the segmentation of the ten clinical cases and average errors are reported in Figures 8,  9, and 10. Because TPVF and FNVF are both relative measures with respect to the ground truth volume, we have TPVF + FNVF = 1. Therefore, we only generated plots for TPVF and FPVF.
We can make the following observations based on the results presented in these graphs for segmentation of GM, characterized as a thin structure with large surface area: HMRF generates the highest TPVF and FC the lowest. IT and M3DSL perform well (TPVF > 85%) and stably over the 10 cases. FC generates over-segmentation of the GM with high FPVF (up to 48%), followed by HMRF-EM (up to 25%). M3DSL (FPVF < 18%) generates low and stable oversegmentation errors, and IT (FPVF < 10%) does not generate this type of error. Overall, IT provides the best performance for TPVF and FPVF, followed by M3DLS, with good performance. FC showed lower sensitivity and HMRF-EM lower selectivity.
We can make the following observations based on the results presented in these graphs for segmentation of WM (characterized as a large structure): IT and M3DSL generated high stable and similar TPVF (TPVF > 90%), FC generated low TPVF on 2 cases, while HMRF-EM systematically generated lower TPVF. All segmentation methods generated very low over-segmentation errors. M3DLS and FC showed FPVF errors up to 10%. Overall, IT and M3DLS showed the highest sensitivity with very good specificity.
We can make the following observations based on the results presented in these graphs for segmentation of the CSF (characterized as small disconnected structures): HMRF-EM was the only segmentation method to show very high TPVF (> 90%). All other three methods generated variable errors over the 10 cases with M3DLS generating the smaller range of errors (50% < TPVF < 80%). In particular, IT performed very poorly, certainly due to the absence of shape constraint. HMRF showed large over-segmentation errors (above 10% and up to 30%), while M3DSL showed very low FPVF errors (< 5%). Overall, HMRF-EM showed perfect sensitivity along with poor selectivity (i.e., over segmented the CSF) and M3DLS showed the best tradeoff of sensitivity versus specificity.
The four segmentation methods were compared statistically using a characteristic index of their performance. For this task, the Tanimoto index (TI) was selected [37]. This index is a quantitative parameter used to evaluate the segmentation results and is defined as Because TI populations do not follow a normal distribution, a nonparametric analysis was performed for the four    methods over the 10 segmented cases, measuring the differences of the TI indexes. Small p values (below 0.05) indicate a significant statistical difference between the methods [53]. Distributions of the TI index over the 10 cases for each method are plotted in Figure 11.
Box limits are placed at the lower and upper quartile values. The median value is indicated by a line inside each box. Whiskers are lines extending from each end of the box to show the extent of the rest of the data (1.5 interquartile ranges). Outliers, identified with red crosses, are data with values beyond the ends of the whiskers. If there is no data outside the whisker, a dot is placed at the bottom whisker.
These results illustrate graphically the average performance and the variability of performance of individual segmentation methods. We observed that M3DLS and HMRF-EM generated the most stable performance over the three tissue types, with mean performance higher with M3DLS for WM and GM.
Evaluation of the statistical differences of results from the four segmentation methods was performed with a Wilcoxon signed rank test for paired data. Significance values for the three tissue types for each pair of segmentation methods are reported in Table 2.
From these results we see the following.
(1) For GM, the two best methods, M3DSL and IT were not statistically different. (2) For WM, the two best methods, M3DSL and IT were statistically different. The two weakest methods, FC and HMRF-EM were not statistically different. (3) For CSF, the two best methods, M3DSL and HMRF-EM were statistically different. On the other hand,

Specific problems for segmentation of the CSF phase
As observed in the error plots and illustrated in Figure 11, all methods except HMRF-EM performed significantly poorly on CSF than on WM and GM, corresponding to under segmentation of the ventricles, whose pixels were assigned to white matter. On the other hand, the HMRF-EM segmentation was very sensitive but provided poor specificity. Very low resolution at the ventricle borders explains in part this result. In addition, manual labeling of the MRI data for the ventricle can also bear some error as localizations of its borders are difficult even for an expert performing manual tracing. We illustrated in Figure 12 CSF segmentation with the different methods with a three-dimensional rendering of the two lateral ventricles and a section of the third ventricle. We also recall that manual labeling is only used as a method of reference, and does not provide a real "ground truth" to the segmentation problem. In that context, Kikinis et al. reported in [54] a variation in volumetric measurements between manual observers in the order of 15% for WM, GM, and CSF.

DISCUSSION
This study used a 3D implementation of a segmentation framework initially proposed by Chan and Vese [43]. A single illustration of the ability of the method to segment a brain MRI slice was illustrated in their paper. No group has previously implemented and tested this method on a whole-brain MRI dataset, with fixed parameter settings, which is a critical aspect for the demonstration of the ability of such a segmentation approach to be an efficient clinical tool. We chose to use a whole-brain MRI data set in order to establish a benchmark of the performance of existing brain MRI segmentation tools, as well as to develop a new automated and robust segmentation tool that would alleviate the need for a priori knowledge such as tissue statistics and the need for manual initialization.
We can compare our segmentation error to results reported by Zeng et al. [29] and Niessen et al. [55]. In Zeng et al. [29], the authors tested their algorithm for the segmentation of frontal lobes on seven high-resolution MRI datasets from a randomly chosen subset of young autistic and control adult subjects. They ran a coupled-surfaces level set algorithm to isolate the brain tissue and segment the cortex. The average TPVF and FPVF for the cortical GM in the frontal lobe were 86.7% and 20.8%. In Niessen et al. [55], a "hyperstack" segmentation method, based on multiscale pixel classification, was tested for 3D brain MRI segmentation. A supervised segmentation framework with manual postediting was applied to a probabilistic brain phantom for estimation of segmentation error. First, a binary segmentation of the brain phantom was performed to evaluate the minimal segmentation error due to partial volume effects. The study reported a volume fraction of misclassified pixels (FPVF+FNVF) around 20% for WM, GM, and CSF. "Hyperstack" segmentation was applied with and without a probabilistic framework. Optimal (FPVF + FNVF) errors were obtained with the probabilistic version reporting 10% for WM, 21% for GM, and 25% for CSF.
In our case we proposed a fully automated segmentation method with no a priori information, leading to the following conclusions.
(a) For GM, we showed that the M3DLS method was statistically equivalent to the idealized IT with average (FPVF + FNVF) equal to 11 The random initialization ensured robustness of the method to variation of user expertise and biased or erroneous input information that could be influenced by variation in image quality or user expertise. The method is automated, given that the set of parameter values selected on the brain phantom is suited for the clinical data.
Comparing the M3DLS method to the other ones, we see automation as a major advantage compared to the supervision required by FC segmentation and stability of performance over clinical cases and tissue types as a major advantage over the HMRF-EM method. The IT was set up with ideal parameters and cannot be considered as a potential segmentation method. It was just proposed to test ideal clustering based on voxel intensity.

Noncortical structures
Using two simultaneous level set functions limits the segmentation process to the extraction of four homogeneous phases, corresponding to four tissue types at most. Segmentation of several other noncortical brain structures including the thalamus, the caudate, the putamen, the palladium, the hippocampus, and the amygdale is important for detection of neurological diseases such as schizophrenia or Alzheimer's disease. As shown in a recent study for manual labeling of whole brain MRI data [56], these structures have greatly overlapping grayscale distributions which explains the failure of simple thresholding methods to perform efficient extraction of these individual more subtle structures. On the other hand, as long as the histograms show reasonable compactness (with a small standard deviation), the homogeneity-based proposed method should be capable of segmenting them into separate phases. For the present study, we only focused on cortical structures. Subcortical structures were therefore removed prior to segmentation using the manually labeled data to mask them, as illustrated in Figure 3. For future applications of the method to data sets without the benefit of manually labeled data, we can either use the labeling software used by clinicians to remove these structures or use an in-house "intelligent manual segmentation tool" such as the one designed by Barrett et al. [57].

CONCLUSION
This paper presented a novel clinical application and quantitative evaluation of a recently introduced multiphase level set segmentation algorithm using T1-weighted brain MRIs. The segmentation algorithm performed an optimal partitioning of a three-dimensional data set based on homogeneity measures that naturally evolved toward the extraction of different tissue types and cortical structures in the brain. Experimentation on ten MRI brain data sets showed that this optimal partitioning successfully identified regions that accurately matched WM, GM, and CSF areas. This suggests that by combining the segmentation results with fiducial anatomical seed points, the method could accurately extract individual structures from these tissues. Random initialization was used to speed up the numerical calculation and avoid the need for a priori information input. A regular initial partitioning of the data added some robustness to the presence of local minima during the optimization process. Comparison to three other segmentation methods was performed with individual assessment of segmentation performance, statistical comparison of the performance, and evaluation of the statistical difference between the methods. Results showed very high quality and stability of the M3DLS method. Future work will include postprocessing of the segmented volumes to extract individual structures such as the ventricles as well as incorporation of available coregistered T2-weighted MRI and PET data to improve the segmentation performance by running the algorithm on vectorial-type data. Such extension of the method has been proposed for color images but never for multimodality clinical data.