Brain tumors can appear anywhere in the brain and vary vastly in size and morphology. Additionally, these tumors are often diffuse and poorly contrasted. Consequently, segmenting brain tumors and intratumor subregions from magnetic resonance imaging (MRI) data with minimal human intervention remains a challenging task. In this paper, we present a novel fully automatic segmentation method from MRI data containing
Although brain cancers are less prevalent than many other cancers, they are highly lethal. Among them, gliomas are the most common type of brain tumor. They can be graded into low-grade gliomas (LGG) and high-grade gliomas (HGG), with the latter being more aggressive and infiltrative than the former [
Compared with CT, MRI (including contrast-enhanced MRI) has become the imaging modality of choice for diagnosis and treatment planning in the brain because of its sensitivity and superior soft-tissue contrast. However, the multiplicity and complexity of brain tumors under MRI often make tumor recognition and segmentation difficult for radiologists and other clinicians [
In the past few decades, significant research efforts in the computer vision and image processing community have been devoted to developing computer-aided systems that can be used for automated tumor characterization/classification [
The primary goal of this paper was to propose a new fast and accurate computer system that, based on MRI data, first localizes the complete tumor region and then segments the more detailed intratumor structures. Our system comprises two major steps. First, by leveraging an FCN [
The paper is structured as follows: Section
In recent years, many methods have been proposed to automatically segment brain tumors based on MRI data. These methods can be largely divided into two categories: (1) hand-crafted feature and classifier methods based on traditional machine learning such as support vector machine (SVM) and random forests (RF) [
Methods in the first category use manually extracted features, and these features are input to classifiers. In other words, because these hand-crafted features are determined solely by human operators, classifiers “weigh” them during training but cannot modify them in any way. One significant concern with hand-crafted features is their potentially large inter- and intrauser variability. A brief summary of these methods can be found in Table
A summary of brain tumor segmentation methods based on traditional machine learning. Only methods using MRI data were included in this table.
Number | Publication | Database | Summary of method | Performance
---|---|---|---|---
1 | Corso et al. [ | 20 cases of | A hybrid method combining an affinity-based segmentation method with a generative model | 0.62–0.69 (Jaccard)
2 | Hamamci et al. [ | Synthetic data from Utah + | A cellular automata method combining a probability framework | 0.72 (DICE, complete tumor)
3 | Mehmood et al. [ | BrainWeb data + | A novel saliency model for lesion localization and an N-cut graph segmentation model for classification | 83%–95% (classification accuracy)
4 | Havaei et al. [ | MICCAI-BRATS 2013 dataset | Hand-crafted features + a support vector machine | 0.86 (DICE, complete tumor)
5 | Usman and Rajpoot [ | MICCAI-BRATS 2013 dataset | Automated wavelet-based features + a random forest classifier | 0.88 (DICE, complete tumor)
6 | Tustison et al. [ | MICCAI-BRATS 2013 dataset | A random forest model combined with a framework of regularized probabilistic segmentation | 0.88 (DICE, complete tumor)
7 | Zikic et al. [ | 40 multichannel MR images, including DTI | Decision forests using context-aware spatial features for automatic segmentation of high-grade gliomas | GT: 0.89; AC: 0.84 (10/30 tests)
8 | Pinto et al. [ | MICCAI-BRATS 2013 dataset | Appearance- and context-based features feeding an extremely randomized forest | 0.83 (DICE, complete tumor)
9 | Bauer et al. [ | 10 multispectral patient datasets | Support vector machine classification combined with conditional random fields | GT: 0.84; NE: 0.70 (intrapatient regularized)
In contrast, methods in the second category can self-learn feature representations adapted to a specific task from training data. Recently, deep learning neural networks, especially CNNs, have rapidly gained popularity in the computer vision community. This trend was certainly accelerated by the record-shattering performance of the CNN in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) [
A summary of brain tumor segmentation methods based on deep-learning neural networks. Only methods using MRI data were included in this table.
Number | Publication | Database | Summary of method | DICE (complete) | DICE (core) | DICE (enh)
---|---|---|---|---|---|---
1 | Urban et al. [ | MICCAI-BRATS 2013 dataset | 3D CNN with 3D convolutional kernels | 0.87 | 0.77 | 0.73
2 | Zikic et al. [ | MICCAI-BRATS 2013 dataset | A CNN applied in a sliding-window fashion in 3D space | 0.84 | 0.74 | 0.69
3 | Davy et al. [ | MICCAI-BRATS 2013 dataset | A CNN with two pathways of both local and global information | 0.85 | 0.74 | 0.68
4 | Dvorak and Menze [ | MICCAI-BRATS 2013 dataset | Structured prediction used together with a CNN | 0.83 | 0.75 | 0.77
5 | Pereira et al. [ | MICCAI-BRATS 2013 dataset | A CNN with small 3 × 3 kernels | 0.88 | 0.83 | 0.77
6 | Havaei et al. [ | MICCAI-BRATS 2013 dataset | A cascaded neural network architecture in which “the output of a basic CNN is treated as an additional source of information for a subsequent CNN” | 0.88 | 0.79 | 0.73
7 | Lyksborg et al. [ | MICCAI-BRATS 2014 dataset | An ensemble of 2D convolutional neural networks performing a volumetric segmentation in three steps | 0.80 | 0.64 | 0.59
8 | Kamnitsas et al. [ | MICCAI-BRATS 2015 dataset | A 3D CNN with two-scale extracted features and a 3D dense CRF as postprocessing | 0.85 | 0.67 | 0.63
However, the above-mentioned CNN methods were all patch-wise: (medical) images were divided into patches during training and testing. The advantage of this approach is that it can reuse existing classification models developed for natural images and alleviate the class-label imbalance in MRI images. Despite its popularity, operating on image patches is computationally expensive: for a typical image size (e.g., 256 × 256), 65,536 patches (one per pixel) are required as inputs for prediction. Furthermore, this method is not end-to-end; it performs segmentation by independently classifying the central pixel of each patch, which introduces errors and requires postprocessing. Thus, the expensive computation and postprocessing become bottlenecks for real-time clinical application.
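The arithmetic behind this bottleneck is simple to make concrete: a patch-wise classifier needs one forward pass per pixel, whereas a fully convolutional network labels the whole slice in a single pass. The function below is a trivial illustration, not part of any published implementation.

```python
# Patch-wise segmentation classifies the central pixel of one patch per
# forward pass, so labeling an H x W slice needs H * W passes; a fully
# convolutional network labels every pixel in one pass.
def count_patches(height, width):
    # One patch is centered on every pixel to be classified.
    return height * width

patches_needed = count_patches(256, 256)   # 65,536 forward passes
fcn_passes = 1                             # one pass for the whole slice
```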
Recently, Shelhamer et al
The main contribution of this work is a hybrid cascaded neural network for the segmentation of brain tumors, including intratumor subregions, from MRI data. The model consists of one FCN and one CNN. This combination enables pixel-level semantic prediction that takes advantage of both a pixel-wise method and a patch-wise method. Formally, in this cascaded network, an FCN is first used to localize the tumor region in an MRI slice, and then a CNN with a deeper architecture and smaller kernels classifies the brain tumor into multiple subregions. This approach not only achieves better segmentation accuracy but also speeds up prediction.
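The two-stage workflow described above can be sketched as follows. This is a minimal sketch, not the paper's actual implementation: `tln` and `itcn` are hypothetical stand-ins for the trained FCN and CNN, and the 33 × 33 patch size follows the ITCN input size.

```python
import numpy as np

def tln(slice_2d):
    # Placeholder for the FCN forward pass: a simple threshold stands in
    # for the network that produces a binary whole-tumor mask.
    return (slice_2d > slice_2d.mean()).astype(np.uint8)

def itcn(patch):
    # Placeholder for the patch-based CNN: always predicts label 2.
    return 2

def cascade_segment(slice_2d, patch_size=33):
    mask = tln(slice_2d)                       # stage 1: localize tumor
    labels = np.zeros_like(mask, dtype=np.uint8)
    half = patch_size // 2
    padded = np.pad(slice_2d, half, mode="edge")
    ys, xs = np.nonzero(mask)                  # stage 2: only tumor pixels
    for y, x in zip(ys, xs):
        patch = padded[y:y + patch_size, x:x + patch_size]
        labels[y, x] = itcn(patch)             # subregion label 1..4
    return labels
```

Only pixels inside the stage-1 mask are handed to stage 2, which is what makes the cascade cheaper than classifying every pixel of the slice with the patch network.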
The starting point of the proposed system is
An illustrative overview of the proposed deep cascaded convolutional neural network for a fast and accurate tumor segmentation.
More specifically, the architecture of the proposed system includes an FCN followed by a CNN which accompanies small convolution kernels (see Figure
We modified the FCN-8s architecture [
An illustration of the architecture of the TLN subnet for pixel-wise prediction.
Parameters used in the subnet TLN. In each convolutional layer, the feature maps were padded by 1 prior to convolution so that the intermediate feature maps retain their sizes through the convolution.
Number | Layer name | Filter size | Stride | Number of filters | Output
---|---|---|---|---|---
1 | Conv 1_1 + ReLU | 3 × 3 | 1 | 64 | 438 × 438
2 | Conv 1_2 + ReLU | 3 × 3 | 1 | 64 | 438 × 438
3 | Max pooling 1 | 2 × 2 | 2 | — | 219 × 219
4 | Conv 2_1 + ReLU | 3 × 3 | 1 | 128 | 219 × 219
5 | Conv 2_2 + ReLU | 3 × 3 | 1 | 128 | 219 × 219
6 | Max pooling 2 | 2 × 2 | 2 | — | 110 × 110
7 | Conv 3_1 + ReLU | 3 × 3 | 1 | 256 | 110 × 110
8 | Conv 3_2 + ReLU | 3 × 3 | 1 | 256 | 110 × 110
9 | Conv 3_3 + ReLU | 3 × 3 | 1 | 256 | 110 × 110
10 | Max pooling 3 | 2 × 2 | 2 | — | 55 × 55
11 | Conv 4_1 + ReLU | 3 × 3 | 1 | 512 | 55 × 55
12 | Conv 4_2 + ReLU | 3 × 3 | 1 | 512 | 55 × 55
13 | Conv 4_3 + ReLU | 3 × 3 | 1 | 512 | 55 × 55
14 | Max pooling 4 | 2 × 2 | 2 | — | 28 × 28
15 | Conv 5_1 + ReLU | 3 × 3 | 1 | 512 | 28 × 28
16 | Conv 5_2 + ReLU | 3 × 3 | 1 | 512 | 28 × 28
17 | Conv 5_3 + ReLU | 3 × 3 | 1 | 512 | 28 × 28
18 | Max pooling 5 | 2 × 2 | 2 | — | 14 × 14
19 | Conv 6 + ReLU | 7 × 7 | 1 | 4096 | 8 × 8
20 | Conv 7 + ReLU | 1 × 1 | 1 | 4096 | 8 × 8
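The output sizes in the TLN table can be checked with standard convolution arithmetic. The sketch below assumes a 240 × 240 input slice, a first convolution padded by 100 (the convention of the original FCN implementation), and ceil-rounded pooling (Caffe's default); these are assumptions, since the table itself does not state the input size.

```python
import math

def conv_out(size, kernel, pad, stride=1):
    # Standard convolution output-size formula (floor rounding).
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel, stride):
    # Caffe-style pooling uses ceil rounding.
    return math.ceil((size - kernel) / stride) + 1

def tln_sizes(input_size=240):
    sizes = [conv_out(input_size, 3, pad=100)]   # Conv 1_1: 240 -> 438
    for _ in range(5):                           # Pools 1-5
        sizes.append(pool_out(sizes[-1], 2, 2))  # 219, 110, 55, 28, 14
    sizes.append(conv_out(sizes[-1], 7, pad=0))  # Conv 6: 14 -> 8
    return sizes
```

Under these assumptions, the computed sequence 438, 219, 110, 55, 28, 14, 8 matches the Output column of the table.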
We observed that a significant amount of low-level feature detail, such as location and edge information, can be lost through convolution striding and pooling. However, these features are valuable for semantic segmentation. Thus, two skip connections [
The proposed ITCN includes two convolutional layer groups (3 layers each), two max pooling layers, and three fully connected layers. Recall that the TLN yields a binary tumor map for a given MRI image and the ITCN (see Figure
An illustration of the second subnet ITCN for the intratumoral classification. The classification was done in a patch-to-patch fashion.
A list of parameters used in the proposed subnet ITCN. In each convolutional layer, the feature maps were padded by 1 prior to convolution so that the convolution does not change the size of the resultant feature map.
Number | Layer name | Filter size | Stride | Number of filters | FC units | Output
---|---|---|---|---|---|---
1 | Conv 1_1 + LReLU | 3 × 3 | 1 | 64 | — | 33 × 33
2 | Conv 1_2 + LReLU | 3 × 3 | 1 | 64 | — | 33 × 33
3 | Conv 1_3 + LReLU | 3 × 3 | 1 | 64 | — | 33 × 33
4 | Max pooling 1 | 3 × 3 | 2 | — | — | 16 × 16
5 | Conv 2_1 + LReLU | 3 × 3 | 1 | 128 | — | 16 × 16
6 | Conv 2_2 + LReLU | 3 × 3 | 1 | 128 | — | 16 × 16
7 | Conv 2_3 + LReLU | 3 × 3 | 1 | 128 | — | 16 × 16
8 | Max pooling 2 | 3 × 3 | 2 | — | — | 8 × 8
9 | FC1 + dropout | — | — | — | 8192 | 256
10 | FC2 + dropout | — | — | — | 256 | 128
11 | FC3 + softmax | — | — | — | 128 | 4
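The ITCN sizes can be checked in the same way: pad-1 3 × 3 convolutions preserve the 33 × 33 patch, and the two 3 × 3, stride-2 poolings (assuming Caffe-style ceil rounding) reduce it to 16 × 16 and then 8 × 8, so FC1 receives 128 × 8 × 8 = 8192 flattened inputs.

```python
import math

def pool_out(size, kernel=3, stride=2):
    # Caffe-style ceil-rounded max pooling (an assumption of this sketch).
    return math.ceil((size - kernel) / stride) + 1

patch = 33                               # conv group 1 preserves 33 x 33
after_pool1 = pool_out(patch)            # 16 x 16
after_pool2 = pool_out(after_pool1)      # 8 x 8
fc1_inputs = 128 * after_pool2 ** 2      # 128 maps of 8 x 8 -> 8192 units
```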
In the ITCN, inspired by the work of Simonyan and Zisserman [
All numerical experiments were conducted on a Dell workstation equipped with dual Intel E5-2603 CPUs and a mid-range graphics card (GeForce GTX 1080, NVIDIA, CA, USA). The operating system of the workstation is Ubuntu (version 14.04). The proposed cascaded neural network was implemented in Python (version 2.7) under the framework of Caffe, an open-source deep learning platform (
As recommended by the literature [
For each slice of the MRI data, the top 1% and bottom 1% of intensity values were first removed. Then, for each slice of MRI data
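The normalization step can be sketched as follows. Clipping at the 1st and 99th percentiles follows the description above; the subsequent zero-mean, unit-variance scaling is an assumption of this sketch rather than necessarily the paper's exact scheme.

```python
import numpy as np

def normalize_slice(slice_2d):
    # Clip away the top and bottom 1% of intensities (outlier removal).
    lo, hi = np.percentile(slice_2d, [1, 99])
    clipped = np.clip(slice_2d, lo, hi)
    std = clipped.std()
    if std == 0:
        # Guard against flat (e.g., all-background) slices.
        return np.zeros_like(clipped, dtype=np.float64)
    # Rescale to zero mean and unit variance (assumed scheme).
    return (clipped - clipped.mean()) / std
```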
The above-mentioned preprocessing was applied to each MRI modality, including FLAIR, T1, T1c, and T2. In particular, the FLAIR images were generated using a fluid-attenuated inversion recovery protocol and are useful for differentiating the brain tumor from its normal background. Figure
Randomly selected examples of FLAIR slices before (a) and after (b) the above-mentioned intensity normalization.
Each feature map
In our study, the TLN used the rectified linear unit (ReLU) function [
In the ITCN, the leaky rectifier linear unit (LReLU) [
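For reference, the two activations can be written compactly as below; the leak coefficient of 0.01 is a common default and an assumption of this sketch, not necessarily the value used in the paper.

```python
import numpy as np

def relu(x):
    # ReLU (used in the TLN): passes positives, zeroes out negatives.
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Leaky ReLU (used in the ITCN): negatives are scaled by alpha
    # instead of being zeroed, which keeps a small gradient alive.
    return np.where(x > 0, x, alpha * x)
```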
To address the multiclass classification problem, the well-known softmax function was used to transform the neural network outputs into probability distributions. Softmax is defined as follows:
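The standard softmax referenced above can be implemented in a numerically stable way by subtracting the maximum logit before exponentiating, which leaves the result unchanged but avoids overflow:

```python
import numpy as np

def softmax(logits):
    # softmax(z)_i = exp(z_i) / sum_j exp(z_j); shifting by max(z) is
    # mathematically neutral but prevents overflow for large logits.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()
```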
Given a set of weights of the proposed neural network
In the TLN, predictions were made for each pixel of the input image so that the loss function can be written as follows:
Now referring to the ITCN, the loss function was calculated in conjunction with the concept of mini-batch. Thus, the loss function has the following form,
To achieve better generalization ability and avoid overfitting, L2 regularization terms were also added to (
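A regularized loss of the kind described above can be sketched as mean cross-entropy over a batch plus an L2 penalty on the weights. The regularization strength `lam` and the exact weighting are assumptions of this sketch, not the paper's stated hyperparameters.

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    # probs: (n, num_classes) softmax outputs; labels: (n,) class indices.
    n = labels.shape[0]
    picked = probs[np.arange(n), labels]   # probability of the true class
    return -np.mean(np.log(picked + eps))

def regularized_loss(probs, labels, weights, lam=1e-4):
    # Add an L2 penalty over all weight arrays to the data term.
    l2 = sum(np.sum(w ** 2) for w in weights)
    return cross_entropy(probs, labels) + lam * l2
```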
Equations (
In (
To suppress the SGD noise and guarantee convergence, the learning rate
The initial and final learning rates of the TLN model were set to 1
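A decaying learning-rate schedule of the kind described can be sketched with the polynomial ("poly") policy common in Caffe; the power of 0.9 is an assumption here, not necessarily the paper's setting.

```python
def poly_lr(iteration, max_iter, base_lr, power=0.9):
    # The rate starts at base_lr and decays smoothly toward zero
    # as training approaches max_iter.
    return base_lr * (1.0 - iteration / max_iter) ** power
```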
During the training of the TLN subnet, we used the transfer learning technique [
In order to train and evaluate the proposed system, numerical experiments were carried out using
The tenfold cross-validation method [
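The tenfold protocol can be sketched as follows: patient cases are shuffled once and partitioned into ten folds, and each fold serves as the test set exactly once while the remaining folds form the training set.

```python
import random

def ten_fold_splits(case_ids, seed=0):
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)       # one fixed shuffle
    folds = [ids[i::10] for i in range(10)]
    for k in range(10):
        test = folds[k]                    # fold k held out for testing
        train = [c for j, f in enumerate(folds) if j != k for c in f]
        yield train, test
```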
The quantitative evaluations were conducted for three different tumor regions: the complete tumor region (all four tumor subregions), the core tumor region (all tumor structures except edema), and the enhancing tumor region (only the enhancing tumor structure). For each type of region, we computed the DSC [
DSC measures the overlap between the ground truth and the automatic segmentation. It is defined as DSC = 2TP / (FP + 2TP + FN), where TP, FP, and FN denote the numbers of true positive, false positive, and false negative points, respectively.
PPV is the proportion of true positives among all points segmented as tumor. It is defined as PPV = TP / (TP + FP).
Sensitivity is the proportion of detected tumor points among all ground truth tumor points. It is defined as Sensitivity = TP / (TP + FN).
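These three metrics can be computed directly from binary masks, using the standard formulations DSC = 2TP / (FP + 2TP + FN), PPV = TP / (TP + FP), and sensitivity = TP / (TP + FN):

```python
import numpy as np

def segmentation_metrics(pred, truth):
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    tp = np.sum(pred & truth)    # tumor points found by both
    fp = np.sum(pred & ~truth)   # segmented but not in ground truth
    fn = np.sum(~pred & truth)   # ground truth tumor points missed
    dsc = 2.0 * tp / (fp + 2.0 * tp + fn)
    ppv = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return dsc, ppv, sensitivity
```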
The proposed system was compared with other published methods, all of which have been validated on the BRATS 2015 dataset. A one-step segmentation method based on the FCN-8s was also implemented for comparison. The FCN-8s segments the input MRI images into 5 classes in a single step.
Overall, we found that the proposed system can accurately delineate gliomas. Visual inspections were conducted for testing data to validate the segmentation results of our proposed method. Figure
Representative examples of computer segmentation results of four brain tumors. (a–d) The original FLAIR, T1, T1c, and T2 slices, respectively. (e) The ground truth overlaid with the FLAIR image. (f) Segmentation results overlaid with the FLAIR image. (e, f) Red, green, yellow, and blue colors denote necrosis, edema, nonenhancing tumor, and enhancing tumor, respectively.
Also, the proposed system led to good details around boundaries. Figure
Two slices of computer segmentation result in a testing case: (a–c) the ground truth, results of tumor localization using the TLN subnet, and the intratumor segmentation results using the ITCN subnet, respectively. (a, c) Red, green, yellow, and blue colors denote necrosis, edema, nonenhancing tumor, and enhancing tumor, respectively.
We also found that, compared to the FCN-8s with one-step segmentation, the proposed system could segment heterogeneous gliomas with better boundary detail. The results of the proposed method and FCN-8s are compared in Figure
Examples of segmentation results from five typical slices comparing the FCN-8s (b) and the proposed method (c). (a) The ground truth. In this figure, red, green, yellow, and blue colors denote necrosis, edema, nonenhancing tumor, and enhancing tumor, respectively.
The quantitative comparisons with other methods in terms of DSC are summarized in Tables
A summary of DSC quantitative comparison on BRATS 2015 combined dataset (HGG and LGG).
Method | Dataset | Grade | DSC (complete) | DSC (core) | DSC (enh)
---|---|---|---|---|---
Pereira et al. [ | BRATS 2015 Challenge | Combined | 0.78 | 0.65 | 0.75
Pereira et al. [ | BRATS 2015 Training | Combined | 0.87 | 0.73 | 0.68
Havaei et al. [ | BRATS 2015 Challenge | Combined | 0.79 | 0.58 | 0.69
Kamnitsas et al. [ | BRATS 2015 Challenge | Combined | 0.85 | 0.67 | 0.63
Kamnitsas et al. [ | BRATS 2015 Training | Combined | 0.76 | 0.73 |
Dong et al. [ | BRATS 2015 Training | Combined | 0.86 | 0.65 |
Yi et al. [ | BRATS 2015 Training | Combined | 0.89 | 0.76 |
FCN-8s | BRATS 2015 Training | Combined | 0.84 | 0.71 | 0.63
Proposed | BRATS 2015 Training | Combined | 0.89 | 0.77 |
A summary of DSC quantitative comparison on BRATS 2015 HGG dataset.
Method | Dataset | Grade | DSC (complete) | DSC (core) | DSC (enh)
---|---|---|---|---|---
Pereira et al. [ | BRATS 2015 Training | HGG | 0.87 | 0.75 | 0.75
Havaei et al. [ | BRATS 2015 Challenge | HGG | — | — | —
Kamnitsas et al. [ | BRATS 2015 Training | HGG | — | — | —
Dong et al. [ | BRATS 2015 Training | HGG | 0.88 | |
Yi et al. [ | BRATS 2015 Training | HGG | 0.89 | 0.79 | 0.80
FCN-8s | BRATS 2015 Training | HGG | 0.88 | 0.76 | 0.71
Proposed | BRATS 2015 Training | HGG | 0.81 | |
The proposed cascaded neural network obtains comparable or better DSC values on all tumor regions. Based on the combined testing dataset (see Table
As can be seen in Table
Recently, we found that Pereira et al. [
A comparison of our proposed method with hierarchical brain tumor segmentation [
Method | DSC (complete) | DSC (core) | DSC (enh) | PPV (complete) | PPV (core) | PPV (enh) | Sensitivity (complete) | Sensitivity (core) | Sensitivity (enh)
---|---|---|---|---|---|---|---|---|---
Pereira et al. [ | 0.85 | 0.76 | 0.74 | 0.80 | 0.74 | 0.79 | | |
Proposed | 0.77 | 0.87 | 0.76 | | | | | |
Additionally, the segmentation speed for testing data was also documented (see Table
Comparisons of segmentation time among six different methods. The time estimate for the proposed method is based on GPU acceleration.
Method | Time
---|---
Pereira et al. [ | 8 s–24 min
Havaei et al. [ | 8 min
Kamnitsas et al. [ | 30 s
Dong et al. [ | 2–3 s
FCN-8s | 0.98 s
Proposed | 1.54 s
In this work, a cascaded neural network was designed, implemented, and tested. The proposed system consists of two steps. In the first step, the TLN subnet localizes the brain tumor. Then, the ITCN subnet is applied to the identified tumor region to further classify the tumor into four subregions. We also adopted advanced techniques to train and optimize the proposed cascaded neural network. Numerical experiments were conducted on 274 patient
Based on quantitative and qualitative evaluations, we found that the proposed approach was able to accurately localize and segment complex brain tumors. We postulate two reasons. First, the ITCN subnet only represents and classifies the intratumoral region, whereas other methods must represent and classify all heterogeneous brain tissues. Second, intratumor subregions usually occupy a very small proportion of the entire image, so other neural networks (e.g., FCN-8s) may suffer from the imbalance of pixel labels. In the TLN subnet, our method merges the different tumor subregions into a whole tumor, which partly mitigates this imbalance. In the ITCN subnet, we used an equal number of image patches from each class to train and optimize the model. In the future, deep learning neural networks could be expanded to include histological and other data to further improve the clinical management of brain cancers [
Furthermore, the proposed cascaded neural network can, on average, complete a segmentation task within 1.54 seconds. The TLN subnet requires only a single forward computation to localize the whole tumor region in the first step. The ITCN subnet then only needs to classify candidate tumor pixels into subregion classes within the much-reduced region located by the TLN, thereby improving computing efficiency.
The authors declare that they have no conflicts of interest.
This research is funded by Chongqing Science and Technology Commission (Grant no. cstc2016jcyjA0383) and Humanity and Social Science Key Project of Chongqing Municipal Education Commission (Grant no. 16SKGH133). This research is also in part supported by Scientific and Technological Research Program of Chongqing Municipal Education Commission (Grant no. KJ1709210) and Graduate Innovation Fund of Chongqing University of Technology (Grant no. YCX2016230).