Validation of Alternating Kernel Mixture Method: Application to Tissue Segmentation of Cortical and Subcortical Structures

This paper describes the application of the alternating Kernel mixture (AKM) segmentation algorithm to high resolution MRI subvolumes acquired from a 1.5T scanner (hippocampus, n = 10 and prefrontal cortex, n = 9) and a 3T scanner (hippocampus, n = 10 and occipital lobe, n = 10). Segmentation of the subvolumes into cerebrospinal fluid, gray matter, and white matter tissue is validated by comparison with manual segmentation. When compared with other segmentation methods that use traditional Bayesian segmentation, AKM yields smaller errors (P < .005, exact Wilcoxon signed rank test) demonstrating the robustness and wide applicability of AKM across different structures. By generating multiple mixtures for each tissue compartment, AKM mimics the increased variation of manual segmentation in partial volumes due to the highly folded tissues. AKM's superior performance makes it useful for tissue segmentation of subcortical and cortical structures in large-scale neuroimaging studies.


INTRODUCTION
Current magnetic resonance image (MRI) studies investigate abnormalities of cortical and subcortical structures in neurodevelopmental and neurodegenerative disorders.These studies require a delineation of a region of interest (ROI) by manual segmentation by an expert rater. For example, studies on Alzheimer's disease and mild cognitive impairment examine the hippocampus [1] while those in schizophrenia have studied the occipital lobe and prefrontal cortex [2,3]. Once the ROI is defined, segmentation into tissue types such as gray matter (GM), white matter (WM), or cerebrospinal fluid (CSF) can assess subtle volume changes caused by disease [4,5]. While manual segmentation would provide gold standard, it is labor intensive limiting the number of subjects in any study [6]. Also, the rater needs to be trained to ensure small inter-or intrarater variation. Therefore, it is necessary to develop a method that allows for efficient processing of large number of subjects with high interor intrarater reliability, thereby increasing statistical power. Such a method will facilitate greater understanding of shape change in networks of cortical structures implicated in neuropsychiatric diseases [7,8].
A variety of methods have been proposed for the segmentation of subcortical tissue such as the hippocampus [9][10][11] and cortical tissues such as prefrontal cortex, cingulate cortex, and planum temporale [12][13][14][15][16]. However, even though tissue classification methods have been improving in their performance, relatively low accuracy (comparing with expert-rater standards) has prevented accurate structural segmentation, for example, distinguishing the hippocampus from surrounding structures in the medial temporal lobe.
Partial volume voxels containing multiple tissue types present challenges to traditional Bayesian tissue classification methods [17][18][19][20][21][22][23][24] that model each tissue type as a fixedbandwidth, single Gaussian in mixture-of-Gaussian models. Priebe et al. [25] proposed an alternating Kernel mixture (AKM) method which allowed for the flexibility of a Gaussian mixture model, with bandwidth, and the number of Gaussians selected adaptively from the data for each tissue type. The purpose of our study was to compare the performance of AKM and traditional Bayesian methods. The two methods were compared by determining which method was closer to the manual segmentation (ground truth) of cortical and subcortical structures in MRI subvolumes acquired from 1.5T and 3T scanners.

Journal of Biomedicine and Biotechnology
The manuscript is organized as follows. Section 2 describes the AKM mixture modeling methodology in detail and other Bayesian-based segmentation methods. Section 3 describes the dataset being investigated and image analysis employed. Section 4 reports the results.

Alternating Kernel mixture method
Priebe and Marchette [26] and James et al. [27] introduced a semiparametric solution to the problem of estimating the common probability density function for multiple identically distributed random variables. Their solution is an iterative one that combines parametric and nonparametric estimates with a resulting model that incorporates both the complexity and the smoothness of the data.
We applied this method to the problem of MR segmentation. Gaussian mixture modeling is a popular segmentation technique. The marginal probability density function for the observations is where C := {C, G, W} is the set of tissue types (CSF, GM, and WM), f c are the class-conditional marginals, and π c are class-conditional mixing coefficients. These coefficients are nonnegative and sum to unity. Thus, the image is the sum of the three tissue types. Each class-conditional marginal is a mixture of normals given by where π ct are the strictly positive, class-specific mixing coefficients, which sum to one, and ϕ ct are the Gaussian probabilities with a mean of μ ct and a variance of σ 2 ct . Combining these equations we see that the marginals are given by The method estimates the class-conditional mixture complexities k c , the mixing coefficients π c , and the mixture components ϕ ct . The Expectation-Maximization (EM) algorithm is used to estimate the means and variances of the components [18,19]. The mixture complexities are estimated from the data.
The method alternates between parametric finite mixture estimates and nonparametric Kernel estimates. Each estimate is based on the previous one of the opposite type. The first step of the algorithm is to find a parametric estimate and a nonparametric estimate of the data. Then, at each iteration, a parametric estimate that minimizes the distance between the two previous estimates is computed. Using the parameter estimates thus derived, a nonparametric estimate is found. This continues until the distance between two consecutive parametric estimates is smaller than a desired constant.

Case
Classification The filtered Kernel estimate (i.e., the nonparametric estimate), with bandwidth b, is where X = {X 1 , . . . , X n } is the subject's MR voxel observation, σ 2 t is the variance of the tth component of the mixture, and ϕ 0 is the standard normal with zero mean and unit variance. The nonparametric estimates are each based on the parametric estimate from the previous iteration and are given by where F k is the class of k-component Gaussian mixtures, and is the integrated squared error.
To actually classify voxels, the Bayes plug-in classifier is used: where x is the voxel to be labeled. The label is assigned to a class based on which one maximizes posterior probability of class membership. This can also be seen as a likelihood ratio test procedure given by Tissues are then classified according to Table 1. This method results in the voxels being classified into three categories. Priebe et al. [25] showed how a training set could be used to determine the number of components for each tissue. However, the focus of this paper is on how this could be done on a case-by-case basis using visual inspection. It was found that two or three components of CSF, GM, and WM produced the best result; in a couple of cases the complexity was better modeled with four components. Nayoung A. Lee et al.

Bayesian segmentation
For comparison, voxels are classified into three tissue types by Bayesian segmentation: where I n is the image intensity, h n is the anatomical label, μ n is the mean, and σ 2 n is the variance of the Gaussian density. The algorithm is where π(h n ) is the prior distribution that represents the relative amount of each of the tissue types and H := {C, G, W}. As with AKM, the EM algorithm is used to estimate the means and variances of the three tissues [18,19].

Neyman-Pearson recalibration
Bayesian segmentation can be extended to two additional classes for C/G and G/W partial volumes which are optimally determined by [28] p I n | h n = G p I n | h n = CSF at each voxel. Here, the four thresholds (θ 1 , . . . , θ 4 ) are determined by the five Gaussians. Thresholds are selected to minimize the misclassification error (Section 3.5) such that The means then are used to recalibrate the segmentations yielding new thresholds. This is referred to as Neyman-Pearson recalibration.

Data acquisition
Four different sets of ROIs were extracted from subjects scanned via the magnetization prepared rapid gradient echo sequence on different scanners. Two came from a 3T scanner (10 hippocampi [29] and 5 pairs of left and right occipital lobes [30]); two came from a 1.5T scanner (10 hippocampi [31] and 9 prefrontal cortices [31]). Processed datasets were reformatted to 8 bit and interpolated to 1 × 1 × 1 mm 3 isotropic voxels except for the prefrontal set with a resolution of 0.5 × 0.5 × 0.5 mm 3 , and are available at http://www.cis.jhu.edu/data.sets/index.html.

MRI subvolumes
To obtain a smaller ROI around a hippocampus, we manually outlined hippocampus and dilated it by 3 × 3 × 3 mm 3 cubes with three iterations to generate a mask via BLOX (http://sourceforge.net/projects/blox/). Figure 1 shows an example of the mask generated for a left hippocampus. The prefrontal cortex [31] and occipital lobe [30] subvolumes were defined by an expert neuroanatomist.

Manual tissue segmentation
The 39 subvolumes were hand segmented into CSF, GM, and WM tissue compartments by three different raters in independent studies (e.g., [30,31]) and blind to the autosegmentation. Segmentation was done by visual inspection on contiguous sagittal slices on Analyze software [32] and saved as Analyze image data with labels for CSF, GM, and WM.

Automated tissue segmentation
AKM and Bayesian segmentation were applied to the 39 subvolumes. For comparison, FreeSurfer [33] and Brain-Voyager [34] were used to segment the hippocampi and occipital lobes, respectively. Neyman-Pearson segmentation was applied to prefrontal cortex. The EM algorithm [18] ensured that computations were done in real time. The 10 hippocampus subvolumes from the 1.5T scanner were processed by FreeSurfer [33] to segment and label the volume by its anatomical structure. Each voxel was classified by a given anatomical label (i.e., hippocampus, ventricles). Then we group the structures into WM, GM, and CSF to create WM, GM, and CSF masks. Lateral ventricle and left inferior lateral ventricle were categorized as CSF. Cerebellum-exterior, hippocampus and amygdala were grouped as GM and cerebral white matter, thalamus proper, putamen, ventral diencephalon, and WM hypointensities were grouped as WM.

Quantification of segmentation accuracy
Segmentations were compared via the L 1 distance between two distributions as a measure of misclassification error. A cost is assigned to each labeled voxel. If it was labeled correctly, that cost is 0, and if labeled incorrectly, that cost is generally 1. This cost, called the L 1 distance, is where p A (h n | I n ) is the posteriori probability of hypothesis h n at voxel n for the automated, p M (h n | I n ) is the same for the manual segmentation, and m is the number of tissue types [18,19,28,35]. The distance measures agreement between segmentations based on distance between probability distributions [36]. The standard overlap measures penalize small objects assuming that most of the error occurs at the boundary of objects thus L 1 distance is more appropriate for assessing 3D segmentation [37]. Another standard measure, the Dice measure, was also used [38].    Figures 2, 3, 4, and 5 show that L 1 distances for AKM method are lower than those Nayoung A. Lee et al.  for Bayesian and other segmentation methods (P < .005, Exact Wilcoxon signed rank test). Lower L 1 distances mean that AKM segmentation have more overlap with manual segmentation than other methods. Dice measures for AKM were consistently smaller than other methods. for manual segmentation voxels are similar to those for AKM segmentation. Vertical lines are threshold intensity values calculated from AKM method. Manual segmentation histograms show that each tissue type has wide range of intensities thus resulting in large overlaps between tissue types due to partial volume problems where the boundaries between tissue types are not obvious. Figure 7 shows how the large tails for each tissue types is captured by AKM yielding more accurate threshold values compared with the single Gaussian approach. Further, Table 3 and Figure 3 shows that AKM models the partial volume better than Neyman-Pearson; note that Neyman-Pearson yielded larger errors than Bayesian since the recalibration was based on the averaged thresholds. Finally, Figures 8, 9, and 10 show views of the AKM segmentation of hippocampus, prefrontal cortex, and occipital lobe subvolumes.

CONCLUSION
This paper describes an algorithm that models each tissue type in brain MRI subvolumes as a semiparametric mixture of Gaussians. The classification method which uses this algorithm results in better segmentation than a traditional, single-component Bayesian method especially when there is not enough CSF or WM in the subvolume. Human raters are good at segmenting partial volume voxels by adapting to the high variance of intensities in these regions. AKM is also able to adaptively select the bandwidth and the number of Gaussians for each tissue type. Thus, AKM approximated the manual segmentation more closely compared to Bayesian methods. AKM can automatically delineate cortical and subcortical structures which can be distinguished by intensity information. However, there are structures that cannot be segmented by intensity alone. For example, anterior boundary of the hippocampus merges with the amygdala which has similar intensity [39] or the anatomical boundary of prefrontal cortex and occipital lobe has to be defined with spatial information. For these structures, AKM may be useful when combined with mapping and image registration approach. Also, AKM can be applied to other imaging modalities of other anatomical structures, such as segmenting myocardium, blood, and bone in a cardiac CT scan.