Local and Deep Features Based Convolutional Neural Network Frameworks for Brain MRI Anomaly Detection



Introduction
Computer-aided diagnosis (CAD) systems [1] have grown significantly in recent years to diagnose diseases accurately [2]. Modern artificial intelligence techniques, such as deep learning algorithms built into computer vision-based diagnostic tools, are key components of these advances [3]. One of the fundamental challenges for CAD systems is accurate brain disease detection. Accurate and early detection of brain diseases can improve the healing process and help control the condition of patients [4]. Currently, the most successful diagnostic technique is medical imaging [5]. Magnetic resonance imaging (MRI) is more effective in diagnosing brain diseases than other imaging techniques such as CT scans and X-rays because of its better resolution in soft tissue [6]. Different physicians may present conflicting diagnostic results due to environmental factors and manual interpretation, which lead to the loss of large amounts of information in MRI data. Therefore, given the fast development of computer vision and deep learning, CAD systems can be an effective tool to help physicians increase diagnostic accuracy [7]. Researchers are currently presenting different types of brain MRI classification methods.
The problem of diagnosing brain abnormalities is essentially a challenge of MRI image classification. In general, this research falls into two main groups: multilabel and single-label (binary) classification algorithms.
Multilabel classification studies focus on detecting the type of brain disease. An example is the diagnosis of Alzheimer's disease (AD) from MRI images. The research dataset for this field of study consists of MRI images with Mild Demented, Moderate Demented, Non-Demented, and Very Mild Demented classes. For instance, in [8], the authors present Alzheimer's disease detection built on deep learning algorithms. That work used a weakly supervised learning (WSL) technique in a model named ADGNET. The reported results show that the proposed system achieved a 99.61% accuracy rate in detecting the types of Alzheimer's disease. In the binary classification scenario, morphometric and deformation-based approaches are studied to draw a pattern of structural changes in the brain. These studies focus on binary classification, which distinguishes an abnormal brain pattern from a healthy one based on MRI or CT images. Because many brain diseases have no image datasets and have not been studied with deep learning algorithms, detecting an abnormal brain pattern is more widely applicable than diagnosing a specific disease. For instance, the U-Transformer-based anomaly detection framework (UTRAD) is proposed in [9] for abnormality detection in medical images such as head CT, brain MRI, and retinal OCT. The UTRAD algorithm combines reconstruction-based methods and pretrained feature-based methods. In another similar study, an attention-based deep ensemble model is proposed in [10] for brain age estimation and anomaly detection. In that study, an ensemble of attention-based residual networks with uncertainty estimation is employed for fetal brain anomaly detection from MRI images. In the same manner, our proposed approach relies on brain anomaly detection from MRI images.
According to our survey, the main challenges of deep learning methods for brain abnormality detection can be grouped into a number of categories. Because of the lack of training data and the resulting overfitting problems, studies in the literature use transfer learning, robust feature extraction, and image data generation tools to improve the accuracy of classification algorithms.
In our categorization, the first group of studies focuses on transfer learning as a primary solution to these problems. For instance, a CNN-based approach [11] built on the ResNet-50 model is proposed, employing transfer learning with crop normalization preprocessing steps. In the same paper, the Inception-v3 and VGG-16 pre-trained models are compared with the ResNet-50-based approach in terms of accuracy in detecting brain abnormalities. Another similar study on brain abnormality detection [12] uses a pre-trained MobileNet model as a deep feature extraction tool and, for classification, feedforward networks with the chaotic bat optimization algorithm. For deep feature extraction with well-known pre-trained models, another study [13] proposed a deep learning framework that includes attention and hypercolumn techniques with residual blocks. That paper presented the BrainMRNet model, which includes attention modules and the hypercolumn technique. The attention modules are employed as an image augmentation method to select the important areas of each image. In addition, the convolutional layers are used as a feature extractor, in the form of hypercolumn techniques, for brain tumor [14] detection. Furthermore, in another study [15], well-known pre-trained models such as Inception-v3 and DenseNet201 are used as feature extractors for the classification task. Transfer learning methods including AlexNet and GoogLeNet are also used in that study to improve accuracy. Disease classification methods, like research on diagnosing brain abnormalities, suffer from a lack of training data. For example, in the case of Alzheimer's disease, the lack of training data has led different studies to focus on transfer learning and deep feature extraction. For instance, in [16], the AlexNet framework is proposed to extract significant features effectively from MRI images. In another similar study [17], a temporal convolutional network is designed with multiple deep sequence-based architectures. To decrease processing cost, depthwise separable convolution (DSC) is proposed in [18]. For transfer learning, two well-known models are employed, and significant classification accuracies are obtained, demonstrating the efficacy of the proposed depthwise separable convolutional neural network.
In addition, to achieve high classification accuracy with less training data, another solution based on efficient and robust feature sets has been proposed. For instance, in [19], a radial basis function neural network is proposed that uses the 2D discrete wavelet transform (DWT) and entropy-based feature sets. In another study [20], the naive Bayes method is employed for feature extraction and classification. To extract robust and efficient features, some studies employ segmentation and classification algorithms in a pipeline. For instance, the study in [21] proposed a deep learning framework that contains a deep learning-based segmentation model followed by classification of the segmented features. Another similar study [22] is based on Gabor-like multiscale texture for segmentation and a modified AdaBoost for classification.
In our categorization of articles on the diagnosis of brain abnormalities, another solution is unsupervised brain outlier detection. For instance, the authors of [23] proposed the MADGAN model, which includes a two-step method applied to brain MRI scans for distinguishing AD. This unsupervised medical anomaly detection method uses a generative adversarial network (GAN) with a multiple adjacent brain MRI slice reconstruction technique. In a similar study [24], unsupervised anomaly detection (Ano-GAN) was examined on 1H-MRS spectra of a person's brain.
Based on the aforementioned studies, it can be concluded that the main problems of deep learning methods for brain abnormality detection are the lack of training data and the difficulty of extracting efficient and robust feature sets to improve the accuracy of classification algorithms. Studies in the literature focus on transfer learning as a first solution to these problems. However, transfer learning is efficient only when the fine-tuning dataset is similar to the original training dataset (ImageNet). In brain abnormality detection, the MRI images are grayscale, whereas the ImageNet dataset consists of RGB color images. The other solution proposed by related studies is to use dataset extension functions.
These kinds of functions increase the processing cost. Thus, our proposed approach is based on deep and robust feature extraction capabilities and on the improvement of local feature sets. This research proposes three end-to-end deep learning frameworks for brain MRI anomaly detection, namely, directional bit-planes (DBP) [25] with a deep autoencoder model (DAE) [26, 27], a dilated separable residual convolution network (DSRCN), and a multibranch approach. In the proposed DBP-DAE, we analyze the effect of directional and robust feature sets on classification accuracy. By decomposing the local binary pattern (LBP) into eight bit-planes, the local and directional feature sets are extracted.
To obtain more robust and compact feature sets, we used the DAE. In this approach, the DAE not only decreases the dimension of the feature sets but also helps to extract more robust feature sets for classification. In the second approach, we propose the DSRCN. The separable residual convolution network is inspired by an idea from [28] for face recognition. Furthermore, to extract a more enhanced feature set in this type of shallow deep learning approach, we use convolutional kernels with different dilation rates instead of standard ones. This approach achieved significant results in terms of accuracy because it extracts low- and high-level deep features from an MRI image during the training phase. To explore the effect of concatenating low- and high-level deep features with local directional feature sets, we fuse these features in an end-to-end model named the multibranch model.
The main contributions of this study are as follows: (1) We designed three CNN models to detect anomalies in brain MRI images, discuss their advantages and disadvantages, and offer possible solutions to the problems of excessive computational complexity and lack of training samples. (2) Numerous experiments are performed on diverse datasets covering different types of brain abnormalities, such as tumors and Alzheimer's disease. (3) We compared our three proposed architectures with existing methods and showed that they are competitive with state-of-the-art methods.
The structure of this paper is as follows: Section 2 introduces the proposed system; Section 3 describes the public datasets and the experimental results and analysis; and Sections 4 and 5 give the conclusion and discussion.

Proposed Method
In this part, the three proposed approaches are described in detail: directional bit-planes (DBP) with a deep autoencoder model (DAE), the dilated separable residual convolution network (DSRCN), and the multibranch approach.

DBP-DAE.
The local feature descriptor part of the proposed approach combines directional bit-planes (DBP) [25] with a deep autoencoder model; we name this combination DBP-DAE. The DAE is used to prevent duplication and redundancy in the DBP features and thereby achieve robust classification accuracy. The DBP-DAE approach is summarized below.

Directional Feature Extraction.
For each input MRI image $f(x, y)$, the LBP feature descriptor [29] is obtained by equation (1):

$$f_{LBP}(x_c, y_c) = \sum_{n=0}^{7} s(i_n - i_c)\, 2^{n}, \qquad s(u) = \begin{cases} 1, & u \ge 0 \\ 0, & u < 0 \end{cases} \tag{1}$$

where $i_c$ represents the intensity value of the center pixel $(x_c, y_c)$ of the window and $i_n$ ($n = 0, \ldots, 7$) represents the intensities of its 8 neighboring pixels. Therefore, $f_{LBP}(x, y)$ can be computed as follows:

$$f_{LBP}(x, y) = \sum_{n=0}^{7} b_n(x, y)\, 2^{n} \tag{2}$$

where $b_n(x, y)$ is the $n$th bit-plane ($n = 0, \ldots, 7$), called the DBP. The DBP feature descriptor contains the directional information of each input MRI image, as presented in Figure 1 [25, 29]. The location of each surrounding pixel relative to the center pixel is given by:

$$(x_n, y_n) = (x_c + \cos\theta_n,\; y_c + \sin\theta_n) \tag{3}$$
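The decomposition of the LBP code into directional bit-planes can be illustrated with a short NumPy sketch. It is not the authors' implementation: the neighbour ordering, the edge-replication border handling, and the function name `directional_bit_planes` are assumptions made for illustration.

```python
import numpy as np

# Offsets (dy, dx) of the 8 neighbours, counter-clockwise starting from the
# pixel to the right. This ordering is an assumption made for illustration.
NEIGHBOUR_OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
                     (0, -1), (1, -1), (1, 0), (1, 1)]

def directional_bit_planes(image):
    """Return (lbp, planes) for a 2D grayscale array.

    planes[n] is the binary map obtained by comparing neighbour n with the
    centre pixel; lbp is the 8-bit code sum_n planes[n] * 2**n.
    Borders are handled by edge replication (an assumption).
    """
    img = np.asarray(image, dtype=np.int32)
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    planes = np.zeros((8, h, w), dtype=np.uint8)
    for n, (dy, dx) in enumerate(NEIGHBOUR_OFFSETS):
        # Neighbour n of every pixel, read from the padded image.
        neighbour = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        planes[n] = (neighbour >= img).astype(np.uint8)
    weights = (1 << np.arange(8)).reshape(8, 1, 1)
    lbp = (planes * weights).sum(axis=0).astype(np.uint8)
    return lbp, planes

# Usage: a dark pixel surrounded by brighter ones sets all eight bits.
lbp, planes = directional_bit_planes([[9, 9, 9], [9, 1, 9], [9, 9, 9]])
print(lbp[1, 1])  # 255
```

Each `planes[n]` is one directional map, so a subset of planes can be selected by indexing the first axis before feeding a downstream model.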

Deep Autoencoder.
To reduce feature redundancy and decrease processing cost, we use four of the eight bit-planes in the DBP-DAE approach. To reduce the dimensions of the DBP, we use a deep autoencoder (DAE). As presented in Figure 2, the autoencoder is a type of neural network with a symmetric structure and an equal number of units in the input and output layers. The main advantage of this structure is that it extracts and learns abstract features from input images; these features are then fed to a logistic regression layer on top of the deep model. To use the DAE for dimensionality reduction and as part of the deep learning method for detecting brain abnormalities, the DAE is first trained. In this phase, the DAE model, trained on the input DBP, learns deep features hierarchically. To obtain a nonlinear mapping of the input images, the activation function $f(\cdot)$ is applied according to equation (4).
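As a rough illustration of the sigmoid encode/decode mapping and the cross-entropy reconstruction cost discussed in this subsection, the following NumPy sketch may help. It is a sketch under stated assumptions rather than the paper's model: tied weights (the decoder reuses `W.T`), a single hidden layer, and the names `autoencoder_forward` and `reconstruction_cross_entropy` are all hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def autoencoder_forward(x, W, b_y, b_z):
    """Encode then decode a mini-batch x of shape (M, D), values in [0, 1].

    Tied weights (decoder uses W.T) are an assumption of this sketch.
    """
    y = sigmoid(x @ W + b_y)      # hidden code, shape (M, R)
    z = sigmoid(y @ W.T + b_z)    # reconstruction, shape (M, D)
    return y, z

def reconstruction_cross_entropy(x, z, eps=1e-12):
    """Binary cross-entropy summed over the D features, averaged over M samples."""
    z = np.clip(z, eps, 1.0 - eps)  # avoid log(0)
    per_sample = -(x * np.log(z) + (1.0 - x) * np.log(1.0 - z)).sum(axis=1)
    return per_sample.mean()

# Usage with random data (shapes only; the values are not from the paper).
rng = np.random.default_rng(0)
x = rng.random((4, 6))
W = rng.normal(scale=0.1, size=(6, 3))
y, z = autoencoder_forward(x, W, np.zeros(3), np.zeros(6))
loss = reconstruction_cross_entropy(x, z)
```

Averaging over the mini-batch keeps the cost comparable across batch sizes; the gradient with respect to `W`, `b_y`, and `b_z` would then be obtained by backpropagation.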
To estimate the error used to update the weights, a cross-entropy cost function is applied for reconstruction. It is computed over mini-batches of input images, as presented in equation (5).
where $D$ denotes the size of the input feature vector and $M$ is the mini-batch size over the input features and reconstructions $(x_{ik}, z_{ik})$ [30]. Consequently, the partial derivatives with respect to $W$, $b_z$, and $b_y$ can be determined, where $net^{y}_{ir}$ signifies the input of sample $i$ to the $r$th node of the hidden layer and $net^{z}_{ik}$ is the $k$th feature of the hidden layer used for reconstruction. Furthermore, $f$ is the sigmoid activation function. To implement the classification task with the DAE, the output layer of the deep model (the reconstruction layer) is replaced with a logistic regression classifier, and the network is fine-tuned by backpropagation with a SoftMax activation function. The probability that an input image belongs to class $i$ is calculated as follows:

$$P(Y = i \mid R, W, b) = \frac{e^{W_i R + b_i}}{\sum_{j} e^{W_j R + b_j}}$$

[Algorithm listing, steps 13 to 16: for each pretraining epoch, for each mini-batch, compute the reconstruction z = f(w f(w x …]

In this equation, $\theta$ denotes the model parameters and $1 / \sum_{j=1}^{k} e^{\theta_j^{T} x_i}$ is the normalization term that yields a probability distribution. To improve accuracy, dilated convolution [31] is used. The dilation factor describes the stride of the dilated convolution kernel during the training phase. Let $f: \mathbb{Z}^2 \to \mathbb{R}$ be a discrete function, let $\varphi_r = [-r, r]^2 \cap \mathbb{Z}^2$, and let $k: \varphi_r \to \mathbb{R}$ be a discrete filter of size $(2r + 1)^2$. The discrete convolution operator $*$ can be described as

$$(f * k)(p) = \sum_{s + t = p} f(s)\, k(t).$$

Let $l$ represent the dilation factor. The dilated convolution operator $*_l$ then has the following definition:

$$(f *_l k)(p) = \sum_{s + l t = p} f(s)\, k(t).$$

In this case, $*_l$ is referred to as an $l$-dilated convolution.

In this approach, each block of the model contains three SeparableConv2D layers, one Conv2D layer, and a max pooling layer. After each layer, batch normalization and the ReLU activation function are applied. The blocks of the DSRCN are presented in Figure 3. Depending on the scenario, the number of blocks and filters changes. Based on the experimental results in this study, we used three blocks of the DSRCN for abnormality detection in brain images. The architecture of the proposed approach is shown in Table 1.
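The $l$-dilated convolution defined in this subsection can be written out directly in NumPy. The sketch below follows the summation literally (a true convolution, with the filter indexed by $t \in [-r, r]^2$); zero padding outside the image and the function name `dilated_conv2d` are assumptions. Deep learning libraries typically implement the cross-correlation variant, which differs only by a flip of the filter.

```python
import numpy as np

def dilated_conv2d(f, k, l=1):
    """l-dilated convolution: out[p] = sum over t of f[p - l*t] * k[t].

    f is a 2D array; k is a (2r+1) x (2r+1) filter indexed by t in [-r, r]^2.
    Values of f outside its support are treated as zero (an assumption).
    """
    f = np.asarray(f, dtype=float)
    k = np.asarray(k, dtype=float)
    r = k.shape[0] // 2
    pad = l * r
    padded = np.pad(f, pad, mode="constant")
    h, w = f.shape
    out = np.zeros_like(f)
    for ty in range(-r, r + 1):
        for tx in range(-r, r + 1):
            # f[p - l*t] for every position p, read from the padded array.
            shifted = padded[pad - l * ty:pad - l * ty + h,
                             pad - l * tx:pad - l * tx + w]
            out += k[ty + r, tx + r] * shifted
    return out

# Usage: a unit impulse spreads the 3x3 filter taps two pixels apart when l=2.
impulse = np.zeros((7, 7))
impulse[3, 3] = 1.0
taps = np.arange(9, dtype=float).reshape(3, 3)
response = dilated_conv2d(impulse, taps, l=2)
```

With `l=1` the function reduces to the ordinary discrete convolution; larger `l` widens the receptive field without adding filter weights.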

Multibranch Approach.
In the last phase of our proposed approach, a multi-input module with two inputs is designed to estimate global features and local texture features simultaneously. These models have numerous layers that are used to extract features. These layers consist of convolutional, pooling, batch normalization, rectified linear unit (ReLU), SoftMax, and fully connected layers. To extract local and deep features from the input $x_l$, the convolutional layer relies on several kernels with weights $w_l$ for each layer $l$; the resulting output feature map $C_l$ is obtained from the dot products of the kernels and the input, plus the bias $b_l$.
Two key types of pooling layer are maximum and average pooling. The output of the pooling method, $P_l$, is a downsampled version of the complete feature map $C_l$, depending on the window filter size $(m, n)$. The last and most important layer is the fully connected layer. Assume that layer $l$ is fully connected; this layer expects $m_1^{l-1}$ feature sets of size $m_1^{l-1} \times m_1^{l-1}$ as input. The output feature sets of a fully connected layer are computed with $w^{(l)}_{i,j}$, which denotes the weights connecting layer $l$ to the $j$th feature set of layer $l - 1$. As shown in Figure 5, after fine-tuning the DBP-DAE and DSRCN networks with the MRI brain images, the classification layer (SoftMax) is removed. For the DSRCN, a global average pooling 2D layer is added on top of the model for dimensionality reduction, and its output is concatenated with the compressed fully connected layer of the DBP-DAE model, which contains 200 nodes. After the concatenation layer, a fully connected layer with 4096 units is applied. In this way, the extracted features of the two deep models are fused. Finally, the last fully connected layer is attached to a SoftMax classification layer for brain anomaly detection, as described in [32], using the extracted features of the local and deep models.
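The fusion step described above (global average pooling of the DSRCN feature maps, concatenation with the 200-node DBP-DAE code, a 4096-unit fully connected layer, then SoftMax) can be sketched in NumPy as follows. The ReLU on the fused layer, the 256-channel feature map, the two-class output, and all function names are assumptions of this sketch; the weights are random placeholders, not trained values.

```python
import numpy as np

def global_average_pool(feature_maps):
    """Collapse (H, W, C) feature maps to a (C,) vector."""
    return feature_maps.mean(axis=(0, 1))

def softmax(logits):
    shifted = logits - logits.max()  # subtract max for numerical stability
    e = np.exp(shifted)
    return e / e.sum()

def fuse_and_classify(dbp_dae_code, dsrcn_maps, W_fc, b_fc, W_out, b_out):
    """Concatenate local and deep features, then classify."""
    deep = global_average_pool(dsrcn_maps)
    fused = np.concatenate([dbp_dae_code, deep])
    hidden = np.maximum(0.0, W_fc @ fused + b_fc)  # ReLU is an assumption
    return softmax(W_out @ hidden + b_out)

# Usage with placeholder shapes: 200-d local code, 256-channel deep maps.
rng = np.random.default_rng(0)
code = rng.random(200)
maps = rng.random((8, 8, 256))
W_fc = rng.normal(scale=0.01, size=(4096, 200 + 256))
W_out = rng.normal(scale=0.01, size=(2, 4096))
probs = fuse_and_classify(code, maps, W_fc, np.zeros(4096), W_out, np.zeros(2))
```

Pooling the deep branch before concatenation keeps the fused vector small (456 values here) compared with flattening the full feature maps.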

Experimental Results
Several experiments with different databases and types of anomalies were conducted to evaluate the performance of the proposed method. The experiments were carried out using an Intel Core i5-8400 CPU, 16 GB of RAM, and an NVIDIA GTX 1050 Ti with 4 GB of memory. In this study, all images are resized to (256, 256) to allow a standard comparison among the different datasets.

Dataset.
In this study, two datasets are used to detect Alzheimer's and tumor-based abnormalities. To recognize tumor-based abnormalities [33], we used a public-access database prepared by specialists, such as physicians and radiologists, and obtained from volunteer patients. The database contains 253 images, separated into 98 normal images and 155 tumor images. The quality and resolution of the images are low, and they have been converted to JPEG format. For anomaly detection in the case of Alzheimer's disease, we used a public-access dataset, namely, the Alzheimer's classification dataset (KACD) [34]. The KACD dataset includes 6400 2D MRI images separated into four groups: nondemented, very mild demented, mildly demented, and moderately demented, which contain 3200, 2240, 896, and 64 images, respectively. This dataset is separated into train, validation, and test folders. Some samples from these databases are presented in Figure 6.

Configuration of Proposed Approach.
In this test, with the help of grid search, we analyze different architectures in terms of the number of layers and the number of DAE nodes according to classification accuracy. The final parameters set for the DAE are given in Table 2.
For the DSRCN, we tried different numbers of blocks with different kernel sizes to find the optimal architecture for brain anomaly detection. The parameter configuration of the model is presented in Table 3, and the structure of the model is described in Table 1.

To provide clear results for both public-access datasets, we plot the ROC curves of the proposed approaches in Figure 8. The results show that, for tumor anomalies, the best AUC is obtained by the DSRCN with 0.998 and the lowest by the DBP-DAE with 0.92. However, in Alzheimer's anomaly detection, the lowest result is obtained by the multibranch approach, which may be caused by overfitting. From these results, it can be concluded that the proposed DSRCN method achieves stable and significant results because it extracts deep and local features during the training phase. In addition, the inability of the proposed DBP-DAE to diagnose Alzheimer's disease indicates that this approach cannot extract low-level features; this in turn affects the anomaly detection accuracy of the multibranch approach.
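For readers who want to reproduce an AUC value from classifier scores, the area under the ROC curve equals the probability that a randomly chosen abnormal image receives a higher score than a randomly chosen normal one. The sketch below computes it that way; the function name `roc_auc` and the toy labels and scores are illustrative and unrelated to the reported results.

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a pair.
    """
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Usage: 1 = abnormal, 0 = normal; three of the four pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise form is quadratic in the number of samples, which is acceptable for test sets of the sizes used here; a rank-based computation would scale better for larger sets.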

Comparison with State-of-the-Arts.
To compare the performance of the proposed approaches with existing systems, the accuracy of our proposed methods and of state-of-the-art methods on the KACD and brain tumor databases is listed in Table 5. The state-of-the-art methods listed in the table are implemented in MATLAB and Keras with the same configurations reported in the corresponding studies. Based on the experimental results, it seems that diagnosing Alzheimer's anomalies is more difficult than diagnosing brain tumors. Among the existing methods, the highest accuracies for this type of anomaly are obtained by BrainMRNet, ResNet-50 (augmentation), and Mobile Net-ELM-CBA, with accuracy rates of 0.94, 0.93, and 0.88, respectively. The DSRCN algorithm achieves the best result, with an accuracy rate of 0.95. For brain tumor detection, the highest accuracy rate is obtained by BrainMRNet, which equals that of our proposed approach (DSRCN) at 0.96. On this dataset, the multibranch approach is among the least accurate methods, which may be due to the overfitting problem. Considering the accuracy results on the KACD and brain tumor MRI images, it can be concluded that the proposed DSRCN approach, which extracts image features end to end, obtains stable results under different conditions. In addition, the proposed DBP-DAE approach, despite better classification accuracy than other methods such as naive Bayes with ELM, Mobile Net-SNN-CBA, and CNN, failed to achieve consistent results across scenarios using locally extracted features. The multibranch deep learning approach achieved the lowest accuracy on the small database (brain tumor) after the naive Bayes method with ELM. One of the reasons for the inefficiency of this approach is the overfitting problem.

Discussion and Conclusion
Most importantly, our CNN approaches achieved outstanding performance without the use of additional data or training functions, comprehensive data enhancement, or segmentation algorithms. We expect that, in the future, these approaches will also succeed in processing large databases (big data). Experimental results showed that the DSRCN method significantly improves the performance of detecting brain abnormalities across different database sizes. In addition, by extracting deep and large features from the input image, this model addresses two main problems: the lack of training data and overfitting. In the DBP-DAE approach, due to the use of robust local features, the performance of the method on the brain tumor image dataset is considerable. Its weaker results on the KACD dataset are due to its inability to extract deep features, which carry more detailed information from the images. With local and deep features combined, the proposed approach achieves remarkable results on the KACD dataset. It appears that this model, by extracting deep and local features, achieves outstanding accuracy on large datasets. However, on the small dataset, insufficient results were obtained due to overfitting.

Table 5: Comparison of the proposed approach with state-of-the-art algorithms.

Figure 1: Directional bit-planes of a brain MRI image.
In each DSRCN block (Figure 3), two layers, a separable Conv2D and a Conv2D with dilation rate (1, 1), are connected to the input layer. The separable Conv2D contains 64 (1 × 1) filters with batch normalization and the ReLU function. In the same manner, two further separable Conv2D layers follow, one with 64 (3 × 3) filters and dilation rate (2, 2) and one with 64 (1 × 1) filters, each with batch normalization and the ReLU function. The last separable Conv2D layer is connected to a max pooling 2D (3 × 3) layer. At the end of each block, the max pooling 2D output is concatenated with the normalized output of the Conv2D. The second and third blocks have the same architecture as the first block, with 128 and 256 filters, respectively. The main advantage of this architecture is the extraction of low-level features along with deep features, as presented in Figure 4.
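To give a sense of why separable convolutions reduce cost, the following sketch counts trainable weights for a standard and a depthwise separable 2D convolution. The formulas are the usual ones for a depth multiplier of 1; the choice of 64 input channels for the example is an assumption, and the dilation rate does not change either count.

```python
def conv2d_params(k, c_in, c_out, bias=True):
    """Weights of a standard k x k convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)

def separable_conv2d_params(k, c_in, c_out, bias=True):
    """Weights of a depthwise separable convolution (depth multiplier 1)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise + (c_out if bias else 0)

# Usage: a 3 x 3 layer mapping 64 channels to 64 filters.
print(conv2d_params(3, 64, 64))            # 36928
print(separable_conv2d_params(3, 64, 64))  # 4736
```

For this layer shape the separable form uses roughly one eighth of the weights of the standard convolution.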

Figure 4: Deep and low-level features extracted from the DSRCN.

Figure 7: Confusion matrices of the three proposed CNN approaches on the two public datasets. (a) KACD dataset. (b) Tumor dataset.

Table 4
concern that DBP-DAE may not be able to perform with significant accuracy in diagnosing Alzheimer's disease. On the contrary, in the diagnosis of brain tumor anomalies, acceptable results have been obtained by this algorithm. The DSRCN model achieved the best

Table 1: Architecture of the DSRCN.

Table 2: Parameters of DAE for brain anomalies.

Table 3: Parameter configuration of the DSRCN.

Table 4: Accuracy results of proposed approaches based on KACD and brain tumor.