Segmentation of Brain MR Images by Using Fully Convolutional Network and Gaussian Mixture Model with Spatial Constraints

Accurate segmentation of brain tissue from magnetic resonance images (MRIs) is a critical task for diagnosis, treatment, and clinical research. In this paper, we present a novel algorithm (GMMD-U) that combines a modified version of the fully convolutional network U-net with a Gaussian-Dirichlet mixture model (GMMD) with spatial constraints. GMMD-U accounts for local spatial relationships by assuming that the prior probability follows a Dirichlet distribution. Specifically, GMMD is applied to extract brain tissue with distinct intensity regions, and the modified U-net is used to correct the wrongly classified areas produced by GMMD or other conventional approaches. GMMD-U is designed to combine the advantages of statistical model-based segmentation techniques and deep neural networks. We evaluate GMMD-U on a publicly available brain MRI dataset by comparing it with several existing algorithms, and the reported results show that the proposed framework accurately detects brain tissue in MRIs. The proposed learning-based integrated framework could be effective for brain tissue segmentation and may help surgeons in the diagnosis of brain disease.


Introduction
Precise segmentation of human brain tissue from magnetic resonance images (MRIs) can aid in the identification and diagnosis of neurological diseases such as Parkinson's disease and Alzheimer's disease. This is a challenging task because brain MRIs are severely affected by intensity nonuniformity, complex structure, and low contrast arising during acquisition. Over the last decades, the large majority of MRI segmentation algorithms have been based on machine learning techniques [1]. In general, machine learning methods can be divided into supervised, semisupervised, and unsupervised approaches. For example, random forests [2][3][4] and support vector machines [5] are typical supervised techniques commonly used in MRI segmentation. Recently, several new unsupervised algorithms have been proposed for extracting the required brain tissues, including grey matter (GM), white matter (WM), and cerebrospinal fluid (CSF) [6,7], among which clustering analysis is one of the most studied [8,9]. Clustering performance has been found to depend heavily on the choice of the initial cluster centres, and the segmentation results are sensitive to noise. The finite-mixture model (FMM) has become another important branch of unsupervised segmentation. It labels the observed data in an unsupervised manner and assumes that the intensity of each pixel follows a chosen probability distribution. Typical FMMs include the Gaussian mixture model (GMM) [10], Student's-t mixture model (SMM) [11], circular mixture model [12], and Rayleigh mixture model [13].
GMM is widely applied to model data uncertainty by assuming that the conditional probability is Gaussian. It is the dominant model because the classical GMM is simple to implement and has very few parameters to estimate. However, GMM has difficulty modelling data of different shapes and is sensitive to outliers. To overcome these shortcomings, segmentation techniques that adopt mixture models with heavier-tailed distributions have received wide attention. For example, SMM, whose tails are heavier, can be regarded as an alternative to GMM [14]. However, one limitation of SMM is that its degrees-of-freedom parameter cannot be estimated in closed form [15]. Another is that it does not consider the local spatial relationships between neighbouring pixels. An improvement can be obtained by combining mixture models with a Markov random field (MRF) [16]. Additionally, FMMs have been combined with level set-based methods in some studies [17]. The unsupervised methods mentioned above are subject to various uncertainties, such as tissue intensity overlaps caused by limitations of the MRI acquisition process, and these factors reduce brain MRI segmentation performance. Instead of using FMMs to model MRI tissue intensity, many recently published studies employ deep-learning-based algorithms (supervised or unsupervised) [18]. As part of the broader family of machine learning, deep learning has achieved state-of-the-art results in MRI segmentation [18] and in other computer vision tasks, including image retrieval [19], image classification [20], and object detection [21]. The convolutional neural network (CNN) is a deep learning architecture inspired by biological neural networks and related to the multilayer perceptron. A classical CNN consists of three main building blocks: convolutional layers, pooling layers, and fully connected layers.
Some recent studies have claimed that CNNs are also suitable for MRI segmentation and brain tissue detection [22,23]. However, with a classical CNN, recognizing image patches is tedious. For example, the method introduced by Ciresan [24] requires a large amount of training time and redundant images, because classifying each image patch requires a sliding-window operation. Therefore, an end-to-end segmentation method called the fully convolutional network (FCN) was recently introduced [25,26], which makes pixelwise predictions learned from the images' ground truth and outputs the label map directly [27]. The purpose of the FCN is to extract critical feature maps and to restore these maps into image labels. This feature-centred procedure can be more suitable for precise segmentation, especially for medical images [28]. Indeed, its capability for feature representation is the primary reason that FCN achieves great success in object detection, classification, and segmentation, but usually only when a sufficient amount of training data is available. In medical imaging, however, data acquisition is expensive, and other factors such as privacy and regulatory or ethical concerns further limit data availability. Recently, U-net was proposed to extend the FCN by increasing and reusing more feature maps [29], so that it can perform well with relatively few training samples. The original U-net [29] was designed for labelling cells in light microscopy images and yielded very accurate segmentation results. Such networks usually produce high-precision segmentation mainly because their architecture includes additional connections that copy the downsampling features and combine them with the upsampling path.
Motivated by the aforementioned research, in this paper we propose a novel fusion framework called GMMD-U, which incorporates the Gaussian-Dirichlet mixture model (GMMD) and a modified U-net to accurately classify brain MRIs into tissue types. The proposed algorithm differs from conventional clustering and FMM algorithms in the following respects. First, it exploits deep learning, so that a convolutional network can precisely recognize the uncertain regions. Second, this paper improves the classical U-net by adding a padding operation and batch normalization to improve convergence speed. This matters because CSF makes up a rather low proportion of the brain and its grey level is very close to the pure black background in most MRIs, so the modified U-net should learn to distinguish the shape of CSF as quickly as possible. Third, this paper develops a novel FMM, the Gaussian-Dirichlet mixture model, which is a modified version of the classical GMM. Compared with our previous work [30], GMMD takes local spatial and intensity information into account through the Dirichlet distribution, which makes its performance less sensitive to noise and outliers. Fourth, in the proposed framework, the majority of pixels belonging to WM and GM are accurately labelled by the GMMD module, while the modified U-net is used to predict CSF as well as the error-prone regions mislabelled by GMMD or other traditional methods. Experimental results on the Internet Brain Segmentation Repository (IBSR) dataset demonstrate that the proposed framework outperforms several existing unsupervised and supervised models.
The rest of this article is organized as follows. Section 2 describes the construction of the modified GMM with spatial constraints and details the proposed fusion framework. Section 3 presents the experimental results and discusses the algorithm's performance. Section 4 provides concluding remarks.

Methodology
The proposed fusion architecture consists of two kinds of components: modified U-net subnetworks and a Gaussian-Dirichlet mixture model with spatial constraints. A finite-mixture model is adopted in our fusion scheme mainly because it provides a statistical approach to modelling observed data in a probabilistic manner. Another advantage is that it assigns every pixel of an image to one of several labels, whereas the U-net structure used here only separates an object from the background. More precisely, the output of U-net has only two labels, which means that WM, GM, and CSF cannot be extracted at once. The aim of improving the U-net framework in this study is to correct wrongly labelled regions produced by conventional approaches. Therefore, the proposed approach combines the finite-mixture model with the improved U-net to extract brain tissue more accurately. We start by introducing the modified GMM, whose prior follows a Dirichlet distribution, for brain MRI segmentation.

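Gaussian-Dirichlet Mixture Model with Spatial Constraints
To consider the local spatial information between neighbouring pixels, this subsection presents a novel Gaussian-Dirichlet mixture model, a modified version of GMM based on the Dirichlet distribution. Assume that each pixel $x_i$, $i = 1, 2, \ldots, N$, of a grayscale image is an independent and identically distributed value. The classical GMM assigns each $x_i$ to one of the labels $\Omega_j$, $j = 1, 2, \ldots, K$, without regard to its neighbours. To partition an image consisting of $N$ pixels into $K$ labels, GMM assumes that the density function $f(x_i \mid \Pi, \Theta)$ at pixel $x_i$ is
$$f(x_i \mid \Pi, \Theta) = \sum_{j=1}^{K} \pi_{ij}\, \Phi(x_i \mid \Theta_j), \tag{1}$$
where $\Theta = \{\Theta_j\}$ with $\Theta_j = \{\mu_j, \sigma_j^2\}$ denotes the GMM parameters, and $\mu_j$ and $\sigma_j^2$ are the mean and variance of the $j$th Gaussian distribution, respectively. $\Pi = \{\pi_{ij}\}$ is the set of prior probabilities of pixel $x_i$ belonging to $\Omega_j$, which satisfy the constraints
$$\pi_{ij} \ge 0, \qquad \sum_{j=1}^{K} \pi_{ij} = 1. \tag{2}$$
In (1), $\Phi(x_i \mid \Theta_j)$ is a Gaussian component of the form
$$\Phi(x_i \mid \Theta_j) = \frac{1}{(2\pi\sigma_j^2)^{1/2}} \exp\left(-\frac{(x_i - \mu_j)^2}{2\sigma_j^2}\right). \tag{3}$$
The joint conditional probability density of $X = (x_1, x_2, \ldots, x_N)$ is
$$p(X \mid \Pi, \Theta) = \prod_{i=1}^{N} \sum_{j=1}^{K} \pi_{ij}\, \Phi(x_i \mid \Theta_j). \tag{4}$$
Taking the natural logarithm of (4), we obtain the log-likelihood function to be maximized:
$$L(\Pi, \Theta) = \sum_{i=1}^{N} \log \left( \sum_{j=1}^{K} \pi_{ij}\, \Phi(x_i \mid \Theta_j) \right). \tag{5}$$
Next, we introduce the Dirichlet distribution [33] into the classical GMM to model the spatial information between neighbouring pixels. First, the label probability is defined as
$$p(y_i \mid \alpha_i) = \int p(y_i \mid \pi_i)\, p(\pi_i \mid \alpha_i)\, d\pi_i, \tag{6}$$
where $y_i = (y_{i1}, y_{i2}, \ldots, y_{iK})$ is the $i$th pixel's discrete label, taking the form
$$y_{ij} \in \{0, 1\}, \qquad \sum_{j=1}^{K} y_{ij} = 1. \tag{7}$$
In (6), $\pi_i = (\pi_{i1}, \pi_{i2}, \ldots, \pi_{iK})$ is the vector of label probabilities governed by the Dirichlet distribution. It also satisfies the constraints
$$\pi_{ij} \ge 0, \qquad \sum_{j=1}^{K} \pi_{ij} = 1. \tag{8}$$
Define $\alpha_i = (\alpha_{i1}, \alpha_{i2}, \ldots, \alpha_{iK})$ with $\alpha_{ij} \ge 0$ in (6) as the vector of Dirichlet parameters. Then, following the method introduced in [31,34], the probability $p(y_i \mid \pi_i)$ can be written in the multinomial form
$$p(y_i \mid \pi_i) = \prod_{j=1}^{K} \pi_{ij}^{\,y_{ij}}. \tag{9}$$
The Dirichlet density $p(\pi_i \mid \alpha_i)$ takes the form
$$p(\pi_i \mid \alpha_i) = \frac{\Gamma\left(\sum_{j=1}^{K} \alpha_{ij}\right)}{\prod_{j=1}^{K} \Gamma(\alpha_{ij})} \prod_{j=1}^{K} \pi_{ij}^{\,\alpha_{ij} - 1}. \tag{10}$$
This paper defines the Dirichlet parameters $\alpha_{ij}$ so that they incorporate neighbourhood spatial information:
$$\alpha_{ij} = \frac{1}{N_i} \sum_{m \in \partial_i} z_{mj}^{(t)}, \tag{11}$$
where $\partial_i$ is a window centred on $x_i$, $N_i$ is the number of pixels in that window (including $x_i$ itself), and $z_{ij}^{(t)}$ is the posterior probability at iteration step $t$:
$$z_{ij}^{(t)} = \frac{\pi_{ij}^{(t)}\, \Phi(x_i \mid \Theta_j^{(t)})}{\sum_{k=1}^{K} \pi_{ik}^{(t)}\, \Phi(x_i \mid \Theta_k^{(t)})}. \tag{12}$$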
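The mixture density, the log-likelihood, the posterior, and the window-averaged Dirichlet parameters above can be made concrete with a minimal Python sketch. It is not the authors' implementation: it assumes a 1D row of grayscale pixels instead of a 2D image, clips the window at the borders (so N_i counts only the pixels that fall inside the image), and uses hypothetical helper names.

```python
import math

def gaussian_pdf(x, mu, var):
    """Gaussian component Phi(x | mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(pixels, priors, mus, variances):
    """Log-likelihood: sum_i log sum_j pi_ij * Phi(x_i | mu_j, sigma_j^2)."""
    total = 0.0
    for x, pi_i in zip(pixels, priors):
        mix = sum(p * gaussian_pdf(x, m, v) for p, m, v in zip(pi_i, mus, variances))
        total += math.log(mix)
    return total

def posteriors(pixels, priors, mus, variances):
    """Posterior z_ij: pi_ij * Phi(x_i | mu_j, sigma_j^2), normalised over j."""
    z = []
    for x, pi_i in zip(pixels, priors):
        w = [p * gaussian_pdf(x, m, v) for p, m, v in zip(pi_i, mus, variances)]
        s = sum(w)
        z.append([wj / s for wj in w])
    return z

def spatial_alpha(z, half_window=1):
    """Dirichlet parameters: mean of z over a window around each pixel.

    Assumption: pixels form a 1D row and the window is clipped at the
    borders, so N_i is the number of pixels actually inside it.
    """
    n, k = len(z), len(z[0])
    alpha = []
    for i in range(n):
        window = z[max(0, i - half_window):min(n, i + half_window + 1)]
        alpha.append([sum(row[j] for row in window) / len(window) for j in range(k)])
    return alpha

# Usage: an isolated bright pixel among dark ones, two classes.
pixels = [10.0, 10.0, 90.0, 10.0, 10.0]
uniform = [[0.5, 0.5] for _ in pixels]
z = posteriors(pixels, uniform, mus=[10.0, 90.0], variances=[25.0, 25.0])
alpha = spatial_alpha(z)
print([round(a, 3) for a in alpha[2]])  # prints [0.667, 0.333]
```

The isolated bright pixel keeps a posterior close to 1 for the bright class, but its window average assigns only about 1/3 to that class; this is how the spatial prior pulls a noisy pixel toward the label of its neighbours.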
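By (12), one can see that (11) acts as a linear filter in which the posterior of pixel $x_i$ is replaced by the mean value over its window. To suppress the effect of noise, the filter window is generally 3 × 3 or 5 × 5. Substituting (9) and (10) into (6) gives
$$p(y_i \mid \alpha_i) = \frac{\Gamma\left(\sum_{j=1}^{K} \alpha_{ij}\right)}{\prod_{j=1}^{K} \Gamma(\alpha_{ij})} \int \prod_{j=1}^{K} \pi_{ij}^{\,y_{ij} + \alpha_{ij} - 1}\, d\pi_i. \tag{13}$$
Because a probability density integrates to one, we have
$$\int \prod_{j=1}^{K} \pi_{ij}^{\,\alpha_{ij} - 1}\, d\pi_i = \frac{\prod_{j=1}^{K} \Gamma(\alpha_{ij})}{\Gamma\left(\sum_{j=1}^{K} \alpha_{ij}\right)}. \tag{14}$$
Mathematical Problems in Engineering
Substituting (11) and (14) into (13) leads to
$$p(y_i \mid \alpha_i) = \frac{\Gamma\left(\sum_{j=1}^{K} \alpha_{ij}\right)}{\Gamma\left(\sum_{j=1}^{K} (\alpha_{ij} + y_{ij})\right)} \prod_{j=1}^{K} \frac{\Gamma(\alpha_{ij} + y_{ij})}{\Gamma(\alpha_{ij})}. \tag{15}$$
The Dirichlet-based constraint thus takes the spatial information of neighbouring pixels into account through linear filtering. More specifically, using the discrete label $y_i$ in (7) and the Gamma-function identity $\Gamma(x + 1) = x\,\Gamma(x)$, the prior probability in (15) with $\sum_{j} y_{ij} = 1$ and $y_{ij} = 1$ for each pixel $x_i$ becomes
$$\pi_{ij} = p(y_{ij} = 1 \mid \alpha_i) = \frac{\alpha_{ij}}{\sum_{k=1}^{K} \alpha_{ik}}. \tag{16}$$
In this case, we have the following new log-likelihood function:
$$L(\Xi) = \sum_{i=1}^{N} \log \left( \sum_{j=1}^{K} \pi_{ij}\, \Phi(x_i \mid \mu_j, \sigma_j^2) \right), \tag{17}$$
where the parameter set is $\Xi = \{\mu_j, \sigma_j^2, \pi_{ij}\}$. Since maximizing (17) is equivalent to minimizing its negative, the new loss function is
$$E(\Xi) = -\sum_{i=1}^{N} \log \left( \sum_{j=1}^{K} \pi_{ij}\, \Phi(x_i \mid \mu_j, \sigma_j^2) \right). \tag{18}$$
$E(\Xi)$ can also be written as
$$E(\Xi) = -\sum_{i=1}^{N} \log \left( \sum_{j=1}^{K} z_{ij}^{(t)}\, \frac{\pi_{ij}\, \Phi(x_i \mid \mu_j, \sigma_j^2)}{z_{ij}^{(t)}} \right). \tag{19}$$
Since $z_{ij}^{(t)} \ge 0$, $\sum_{j=1}^{K} z_{ij}^{(t)} = 1$, and the logarithm is concave, Jensen's inequality [35] gives the upper bound
$$E(\Xi) \le -\sum_{i=1}^{N} \sum_{j=1}^{K} z_{ij}^{(t)} \log \left( \frac{\pi_{ij}\, \Phi(x_i \mid \mu_j, \sigma_j^2)}{z_{ij}^{(t)}} \right). \tag{20}$$
Thus, the negative log-likelihood (18) can be minimized by minimizing this bound, which, after dropping the terms that do not depend on $\Xi$, gives the error function
$$J(\Xi) = -\sum_{i=1}^{N} \sum_{j=1}^{K} z_{ij}^{(t)} \left[ \log \pi_{ij} + \log \Phi(x_i \mid \mu_j, \sigma_j^2) \right]. \tag{21}$$
Next, we apply the gradient descent method [36] for parameter learning:
$$\Xi^{(t+1)} = \Xi^{(t)} - \eta\, \nabla J(\Xi^{(t)}), \tag{22}$$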
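where $\eta$ is the learning rate and $\nabla J(\Xi) = (\partial J / \partial \mu_j, \partial J / \partial \sigma_j^2, \partial J / \partial \pi_{ij})$. The partial derivative of $J(\Xi)$ with respect to $\mu_j$ is
$$\frac{\partial J}{\partial \mu_j} = -\sum_{i=1}^{N} z_{ij}^{(t)}\, \frac{x_i - \mu_j}{\sigma_j^2}. \tag{23}$$
Similarly, the partial derivative of $J(\Xi)$ with respect to $\sigma_j^2$ is
$$\frac{\partial J}{\partial \sigma_j^2} = \sum_{i=1}^{N} z_{ij}^{(t)} \left[ \frac{1}{2\sigma_j^2} - \frac{(x_i - \mu_j)^2}{2\sigma_j^4} \right]. \tag{24}$$
Finally, the partial derivative of $J(\Xi)$ with respect to $\pi_{ij}$ is
$$\frac{\partial J}{\partial \pi_{ij}} = -\frac{z_{ij}^{(t)}}{\pi_{ij}}. \tag{25}$$
The computation of the parameters is summarized in the following steps.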
Step 4. Check the convergence of the log-likelihood in (17). If the convergence criterion is not satisfied, set $t \leftarrow t + 1$ and repeat Steps 2 to 4.
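The Dirichlet prior and one gradient-descent update of the GMMD parameters can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the posteriors z and priors π are held fixed during the step, only μ_j and σ_j² are updated, variances are floored at a small positive value, and the learning rate `eta` is a hypothetical choice. Because the gradient with respect to μ_j scales with 1/σ_j², a fixed step that works early on can overshoot once the variances shrink, so the step size would need tuning. `dirichlet_label_prob` evaluates the compound form (15) through `math.lgamma`, which lets one check numerically that a one-hot label reduces to α_ij / Σ_k α_ik.

```python
import math

def dirichlet_label_prob(y, alpha):
    """Compound probability of a one-hot label y given Dirichlet parameters alpha (all > 0)."""
    a0 = sum(alpha)
    log_p = math.lgamma(a0) - math.lgamma(a0 + sum(y))
    log_p += sum(math.lgamma(a + yj) - math.lgamma(a) for a, yj in zip(alpha, y))
    return math.exp(log_p)

def log_gauss(x, mu, var):
    """log Phi(x | mu, sigma^2)."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mu) ** 2 / (2.0 * var)

def error_j(pixels, z, priors, mus, variances):
    """Error function: -sum_i sum_j z_ij [log pi_ij + log Phi(x_i | mu_j, sigma_j^2)]."""
    total = 0.0
    for x, z_i, pi_i in zip(pixels, z, priors):
        for zij, pij, m, v in zip(z_i, pi_i, mus, variances):
            if zij > 0.0:  # terms with z_ij = 0 contribute nothing
                total -= zij * (math.log(pij) + log_gauss(x, m, v))
    return total

def gradients(pixels, z, mus, variances):
    """Analytic gradients of the error function with respect to mu_j and sigma_j^2."""
    k = len(mus)
    g_mu, g_var = [0.0] * k, [0.0] * k
    for x, z_i in zip(pixels, z):
        for j in range(k):
            d = x - mus[j]
            g_mu[j] -= z_i[j] * d / variances[j]
            g_var[j] += z_i[j] * (0.5 / variances[j] - d * d / (2.0 * variances[j] ** 2))
    return g_mu, g_var

def gradient_step(pixels, z, mus, variances, eta=0.1, min_var=1e-6):
    """One gradient-descent update; variances are floored so they stay positive."""
    g_mu, g_var = gradients(pixels, z, mus, variances)
    new_mus = [m - eta * g for m, g in zip(mus, g_mu)]
    new_vars = [max(v - eta * g, min_var) for v, g in zip(variances, g_var)]
    return new_mus, new_vars

# Usage with hard posteriors: three dark and three bright pixels.
pixels = [1.0, 1.2, 0.8, 3.0, 3.1, 2.9]
z = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3
priors = [[0.5, 0.5]] * 6
mus, variances = [0.5, 3.5], [1.0, 1.0]
before = error_j(pixels, z, priors, mus, variances)
mus2, vars2 = gradient_step(pixels, z, mus, variances)
after = error_j(pixels, z, priors, mus2, vars2)
print(before > after)  # prints True: the step lowers the error function
```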
where $I$ is the original brain MRI and $\mathrm{mask}_{\mathrm{CSF}}$ is the binary ground-truth mask of CSF. All original brain MRIs and their CSF masks are fed into the first modified U-net as the training dataset; this procedure is summarized as Training Model I in Figure 1. To clarify the process, Figure 2 presents several examples of the extracted CSF and the corresponding brain images without CSF. The brain MRIs without CSF are then fed into GMMD as input images. After GMMD has produced the GM and WM segmentation results, as shown in the schematic of Training Model II, the training set of wrong-classification regions is obtained from the difference between the regions labelled by GMMD and the ground truth without CSF. Figure 3 presents four examples of extracting wrong-classification areas. The term 'wrong-classification area' denotes pixels that classical segmentation methods usually fail to distinguish correctly. In the training phase, as can be seen from Figure 4, we found that the wrong-classification regions are almost always GM; more specifically, roughly 8% to 9% of GM pixels are mistakenly classified as WM (see Figure 5). In most cases, images segmented by GMMD contain few wrongly classified pixels other than GM pixels. Because WM makes up only a small proportion of the residual region, the prediction of the second modified U-net module is treated in this paper as a GM detector that corrects the wrong-classification regions produced by GMMD or other classical segmentation algorithms.
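A minimal sketch of how the Training Model II targets could be built is shown below. It assumes label maps stored as nested Python lists with the IBSR codes (0 background, 1 CSF, 2 GM, 3 WM); the helper names are hypothetical. `wrong_classification_mask` marks every pixel where the GMMD label differs from the CSF-free ground truth, and `confusion_rate` measures the kind of GM-to-WM error rate discussed above.

```python
GM, WM = 2, 3  # IBSR label codes: 0 background, 1 CSF, 2 GM, 3 WM

def wrong_classification_mask(pred, truth):
    """1 where the GMMD label differs from the ground truth, else 0."""
    return [[int(p != t) for p, t in zip(p_row, t_row)]
            for p_row, t_row in zip(pred, truth)]

def confusion_rate(pred, truth, true_label, pred_label):
    """Fraction of pixels with ground truth `true_label` that were predicted as `pred_label`."""
    total = hits = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if t == true_label:
                total += 1
                if p == pred_label:
                    hits += 1
    return hits / total if total else 0.0

truth = [[0, 2, 2, 3],
         [2, 2, 3, 3]]
pred = [[0, 3, 2, 3],
        [2, 2, 3, 3]]
print(wrong_classification_mask(pred, truth))  # [[0, 1, 0, 0], [0, 0, 0, 0]]
print(confusion_rate(pred, truth, GM, WM))     # 0.25
```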

Modified U-Net Framework
U-net [29] is a fully convolutional network that performs excellently in image segmentation. The classical U-net does not require a large training set. Moreover, U-net has a simple structure, trains in a relatively short time, and requires fewer parameters than other networks.
In this paper, we improve the classical U-net by adding a padding operation and batch normalization to improve convergence speed. This is important because batch normalization allows higher learning rates and makes training less sensitive to initialization. The whole architecture of the modified U-net is depicted in Figure 6. The network combines a contracting path that collects global features and an expanding path that locates the pixels belonging to those features. More specifically, in the network structure of Figure 6, each zero-padded 3 × 3 convolution is followed by batch normalization. The ReLU layer is used as the nonlinear activation function instead of the traditional sigmoid function, because ReLU only needs a single threshold to activate and reduces computational complexity [26]. After every two consecutive convolution operations, max-pooling with stride 2 is applied in the upper structure; correspondingly, upsampling with stride 2 is applied in the lower structure. Before each max-pooling, the feature maps are copied and transferred to the corresponding level of the upsampling path. At the output layer, a 1 × 1 convolution followed by the softmax function computes the class probability of each pixel. Cross entropy, which measures the distance between the model's predictions and the true values, is adopted as the loss function of this network [38].
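The practical effect of the zero padding can be seen by tracing feature-map sizes through the contracting path. This Python sketch assumes four pooling levels, as in the original U-net [29]; the depth used in Figure 6 may differ, and `encoder_sizes` is a hypothetical helper. With unpadded ('valid') 3 × 3 convolutions each level loses 4 pixels, so the copied feature maps must be cropped before they can be combined with the upsampled maps; with zero padding the sizes halve cleanly and match on both sides of the network.

```python
def encoder_sizes(input_size, depth=4, padded=True):
    """Spatial size after the two 3x3 convolutions at each contracting level.

    Each level applies two 3x3 convolutions and, except at the bottom,
    a 2x2 max-pooling with stride 2.
    """
    sizes = []
    size = input_size
    for level in range(depth + 1):  # `depth` pooling steps plus the bottleneck
        if not padded:
            size -= 4  # each unpadded 3x3 conv trims 1 pixel per side; two convs trim 4
        if size <= 0:
            raise ValueError("input too small for this depth")
        sizes.append(size)
        if level < depth:
            if size % 2:
                raise ValueError(f"odd size {size} cannot be pooled evenly")
            size //= 2
    return sizes

print(encoder_sizes(572, padded=False))  # original U-net: [568, 280, 136, 64, 28]
print(encoder_sizes(256))                # zero-padded:    [256, 128, 64, 32, 16]
```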
The softmax probability of the $k$th output is computed as
$$p_k = \frac{\exp(a_k)}{\sum_{k'=1}^{n} \exp(a_{k'})},$$
where $a_k$ is the logit value from the image-prediction matrix and $n$ is the size of the matrix. Substituting $p_k$ into the cross-entropy equation yields
$$H(y, p) = -\left[ y \log p + (1 - y) \log(1 - p) \right],$$
where $p$ denotes $p_k$ and $y$ is the ground-truth value. As illustrated in Figure 1, the proposed model uses two modified U-net modules, implemented with the TensorFlow framework [39], to learn the CSF and the wrong-classification areas. The first training model detects CSF, and the second predicts wrong-classification regions.
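The per-pixel loss can be illustrated with a short Python sketch. It assumes two logits per pixel (background and foreground), matching the two-label U-net output described earlier, and clips p away from 0 and 1 to avoid log(0); the helper names are hypothetical, and the actual models are trained in TensorFlow rather than with this code.

```python
import math

def softmax(logits):
    """Softmax over one pixel's logits; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]

def binary_cross_entropy(p, y, eps=1e-12):
    """H(y, p) = -[y log p + (1 - y) log(1 - p)], with p clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def mean_pixel_loss(pixel_logits, labels):
    """Average cross entropy over pixels; each pixel has (background, foreground) logits."""
    losses = [binary_cross_entropy(softmax(logits)[1], y)
              for logits, y in zip(pixel_logits, labels)]
    return sum(losses) / len(losses)

print(softmax([2.0, 0.0]))
print(mean_pixel_loss([[0.0, 0.0], [0.0, 3.0]], [1, 1]))
```

With two classes, the softmax probability of the foreground class plays the role of p in the binary cross entropy, so a confident correct logit pair yields a small loss.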
Here, the wrong-classification region consists of pixels that cannot be precisely classified by GMMD. We run the two trained U-net models on the testing set and produce heat maps to illustrate the prediction probabilities. In this paper, all testing samples are randomly selected from the IBSR 18 dataset, and the test results are presented in Figure 10. The predictions are visually similar to the ground truth, which suggests that regions GMMD fails to label precisely can be recovered by the proposed model. Furthermore, comparing the ground truth (labels) with the predicted areas in Figure 10(b) shows that almost all of the predicted pixels belong to GM, because the predicted areas basically coincide with the GM ground truth. GMMD is better able to model the shape of these pixel regions, which is significant because they have fewer blurred boundaries and patches. Finally, for each MRI, the final segmentation result is assembled from three parts: (1) CSF detected by the modified U-net I; (2) GM and WM labelled using the posterior probability of the $i$th pixel of the MRI belonging to class $j$; and (3) GM and WM recovered from uncertain pixel areas by the modified U-net II in the wrong-classification prediction module.
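One way to assemble the three parts into a single label map is sketched below in Python. The precedence order (CSF from U-net I first, then GM from U-net II, otherwise the GMMD label), the 0.5 threshold on the heat maps, and the function name `fuse_labels` are assumptions made for illustration; the text above does not specify them.

```python
BACKGROUND, CSF, GM, WM = 0, 1, 2, 3  # IBSR label codes

def fuse_labels(csf_prob, wrong_prob, gmmd_labels, threshold=0.5):
    """Combine U-net I (CSF), U-net II (GM correction), and GMMD labels per pixel.

    Assumed precedence: CSF, then the U-net II GM correction, then GMMD.
    """
    fused = []
    for c_row, w_row, g_row in zip(csf_prob, wrong_prob, gmmd_labels):
        row = []
        for c, w, g in zip(c_row, w_row, g_row):
            if c > threshold:    # part (1): CSF detected by U-net I
                row.append(CSF)
            elif w > threshold:  # part (3): U-net II treated as a GM detector
                row.append(GM)
            else:                # part (2): label from the GMMD posterior
                row.append(g)
        fused.append(row)
    return fused

csf_prob = [[0.9, 0.1, 0.0], [0.2, 0.0, 0.1]]
wrong_prob = [[0.0, 0.8, 0.1], [0.0, 0.3, 0.0]]
gmmd = [[0, 3, 3], [2, 3, 0]]
print(fuse_labels(csf_prob, wrong_prob, gmmd))  # [[1, 2, 3], [2, 3, 0]]
```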

Experimental Results
We evaluated the proposed fusion algorithm on the public IBSR 18 brain dataset. Specifically, we focus on the segmentation of clinical MRIs and aim to separate three tissue classes in the real brain image data: CSF, GM, and WM. For the experiments, all MR images from the IBSR 18 dataset are preprocessed by removing blank images and stripping the skull to extract the main brain region. In total, 1000 images of size 255 × 255 from ten randomly selected subjects are used for training, and 300 images from three other subjects are used for testing. The ground-truth images are divided into four classes: background, CSF, GM, and WM. Our experiments were implemented in MATLAB 2016a and the TensorFlow library.
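The training and testing images come from disjoint sets of subjects. A minimal Python sketch of such a subject-level split is shown below, assuming subjects numbered 1 to 18 and a fixed random seed for reproducibility (both assumptions; how the subjects were actually drawn is not stated). Selecting the individual slices within each subject is not shown.

```python
import random

def split_subjects(subject_ids, n_train, n_test, seed=0):
    """Disjoint train/test subject lists, so no subject contributes to both sets."""
    ids = list(subject_ids)
    if n_train + n_test > len(ids):
        raise ValueError("not enough subjects")
    random.Random(seed).shuffle(ids)
    return ids[:n_train], ids[n_train:n_train + n_test]

train_ids, test_ids = split_subjects(range(1, 19), n_train=10, n_test=3)
print(sorted(train_ids), sorted(test_ids))
```

Splitting by subject rather than by slice keeps neighbouring slices of the same brain from appearing in both sets, which would otherwise inflate the test scores.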
The first experiment runs the proposed framework on several brain MRIs (slices 92, 23, 40, and 96 in IBSR 18). The corresponding results for the four different patients are presented in Figure 11. These results indicate that our supervised network visually performs well in detecting CSF and wrongly classified pixels. In addition, we observe that the proposed model recovers rich detail regardless of which slice is used. In the next experiment, GMMD-U is compared with FCM, GMM, K-means, SMM-SC (see [31]), CBCLO (see [32]), and the classical U-net on a brain MRI (slice 70 of one patient) in IBSR 18, and the results are provided in Figure 12. The classical U-net used here is applied in three parts to handle the segmentation task, one each for CSF, GM, and WM. Based on visual comparison with the ground truth, GMMD-U performs better than the other methods, and SMM-SC achieves better segmentation results than CBCLO.
To compare the algorithms quantitatively, this paper uses the Dice similarity coefficient [40], which is widely accepted for evaluating segmentation algorithms in image processing. The Dice coefficient between the prediction and the ground truth of each tissue is calculated as
$$D = \frac{2\,|S_g \cap S_p|}{|S_g| + |S_p|},$$
where $S_g$ denotes the set of pixels that belong to the tissue in the ground truth of the brain image and $S_p$ denotes the set of pixels predicted as that tissue. The Dice coefficient lies in the range [0, 1], and a higher value indicates better segmentation performance. The experiment measures the performance of the compared algorithms on the testing set. Figure 13 shows the Dice coefficients of FCM, K-means, GMM, SMM-SC (see [31]), CBCLO (see [32]), the classical U-net, and GMMD-U on the brain MRI of one patient (IBSR 18). It is clear from this figure
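The Dice coefficient for each tissue can be computed from flattened label maps as in the Python sketch below. The function name is hypothetical, and returning 1.0 when a tissue is absent from both maps is an assumed convention that avoids dividing by zero.

```python
def dice(pred, truth, label):
    """Dice coefficient 2|S_g & S_p| / (|S_g| + |S_p|) for one tissue label."""
    s_p = {i for i, v in enumerate(pred) if v == label}
    s_g = {i for i, v in enumerate(truth) if v == label}
    if not s_p and not s_g:
        return 1.0  # assumption: tissue absent in both maps counts as agreement
    return 2 * len(s_p & s_g) / (len(s_p) + len(s_g))

truth = [0, 1, 2, 3, 3]
pred = [0, 1, 2, 2, 3]
for name, label in [("CSF", 1), ("GM", 2), ("WM", 3)]:
    print(name, round(dice(pred, truth, label), 3))
```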

Figure 1: The architecture of the proposed algorithm in the training phase.

Training Procedure
In this paper, all experiments are performed on the IBSR 18 dataset [37], which provides manually guided expert segmentations along with magnetic resonance brain image data. The dataset consists of MRIs and 3D ground-truth volumes of 18 brains of size 256 × 256 × 128 with a slice thickness of 1.5 mm. These volumes are provided after skull stripping, normalization, and bias field correction. The ground truth was segmented manually by experts, with tissue labels 0, 1, 2, and 3 for background, CSF, GM, and WM, respectively. Each MRI volume is read into the proposed model as 256 axial brain slices of size 256 × 128. The network structure of the training procedure consists of two modified U-net subnetworks and GMMD, as shown in Figure 1. To build the training set, we define the ground truth of CSF as a binary mask and then obtain the brain MRI without CSF, $I_{\mathrm{nc}}$, by the following elementwise relation:
$$I_{\mathrm{nc}} = I \odot (1 - \mathrm{mask}_{\mathrm{CSF}}),$$
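Removing CSF from a slice with a binary mask, as described above, can be sketched in Python as follows. It assumes each axial slice and its ground-truth label slice are stored as nested lists with the IBSR codes; the helper names are hypothetical, and removed pixels become 0 because the image is multiplied elementwise by the complement of the mask.

```python
CSF_LABEL = 1  # IBSR codes: 0 background, 1 CSF, 2 GM, 3 WM

def csf_mask(label_slice):
    """Binary CSF mask: 1 where the ground truth is CSF, else 0."""
    return [[int(v == CSF_LABEL) for v in row] for row in label_slice]

def remove_csf(image, mask):
    """Elementwise image * (1 - mask): CSF pixels are set to 0, others kept."""
    return [[px * (1 - m) for px, m in zip(i_row, m_row)]
            for i_row, m_row in zip(image, mask)]

image = [[10, 20, 30], [40, 50, 60]]
labels = [[0, 1, 2], [1, 3, 3]]
mask = csf_mask(labels)
print(mask)                     # [[0, 1, 0], [1, 0, 0]]
print(remove_csf(image, mask))  # [[10, 0, 30], [0, 50, 60]]
```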

Figure 2: CSF extraction.From left to right: the original images, ground truth, extracted CSF, and brain images (without CSF).

Figure 3: Examples of extracting wrong-classification areas. From left to right: the brain images (without CSF), segmentation results for WM and GM using GMMD, ground truth without CSF, and wrong-classification regions.

Figure 6: The architecture of the improved U-net. The number inside each block indicates the image size, and the number outside indicates the number of feature maps.


Figure 7: Network training output.The first column is CSF training epoch (Training Model I), and the second column is GM region training epoch (Training Model II).For each picture, from left to right are the original images, CSF/wrong-classification region, and prediction.(a) and (b) Epoch 10, (c) and (d) epoch 40, (e) and (f) epoch 70, and (g) and (h) epoch 90.
Figure 7 shows training outputs of Training Models I and II at several epochs. As the figure indicates, the accuracy of the training models increases with the number of epochs. The loss curves versus the number of iterations for each training procedure are depicted in Figure 8. The loss decreases steadily as the iterations increase, which indicates that both training models converge.

Figure 9 displays the flowchart of the test procedure of the proposed GMMD-U framework, which contains three parts: CSF detection (Part I), wrong-classification prediction (Part II), and remaining-region clustering (Part III).

Figure 10: Testing procedure; from left to right are the brain MRI, labels from ground truth, and prediction.(a) CSF prediction result using U-net I; (b) wrong-classification area prediction result using U-net II.

Figure 11: The results of each part of the whole algorithm architecture.From left to right are the original images, the segmentation results of CSF, prediction of the wrong-classification region, segmentation results using GMMD, proposed method, and ground truth.