BrainNet: Optimal Deep Learning Feature Fusion for Brain Tumor Classification

Early detection of brain tumors can save lives. This work presents a fully automated design for classifying brain tumors. The proposed scheme employs optimal deep learning features to classify FLAIR, T1, T2, and T1CE tumor images. We first normalize the dataset and pass it to the pretrained ResNet101 model, which is fine-tuned for brain tumor classification through transfer learning. The problem with this approach is that it generates redundant features, which degrade accuracy and cause computational overhead. To tackle this problem, we find optimal features using the differential evolution and particle swarm optimization algorithms. The resulting optimal feature vectors are then serially fused into a single feature vector. PCA is applied to this fused vector to obtain the final optimized feature vector, which is fed to various classifiers to classify tumors. Performance is analyzed at various stages. The results show that the proposed technique achieves a speedup of 25.5x in prediction time on the medium neural network with an accuracy of 94.4%. Compared with state-of-the-art techniques, this is a significant reduction in computational overhead while maintaining approximately the same accuracy.


Introduction
With the arrival of deep learning (DL), a remarkable change has occurred in fields such as medical imaging, for example magnetic resonance imaging (MRI), and computer vision (CV) [1]. A brain tumor is one of the leading causes of death in both males and females. The survival rate of people with a brain tumor is very low, but it can be significantly increased if tumors are detected at an early stage [2]. According to the WHO, around 700,000 people are affected by brain tumors, and since 2019, around 86,000 patients have been diagnosed. There have been 16,830 fatalities due to brain tumors since 2019, and the average survival rate is only 35% [3]. In the USA, the estimated number of brain tumor cases in 2021 is 83,570, comprising 24,530 malignant and 59,040 nonmalignant tumors. The number of deaths in 2020 was 18,600. In 2022, the estimated number of brain tumor cases in the United States is 700,000 [4].
Tumors are formed by the uncontrolled growth of cancerous cells in any part of the body; when this occurs in the brain, the result is a brain tumor [5]. Many medical imaging techniques (MITs) are used for the detection of diseases, such as computed tomography (CT) [6], magnetic resonance imaging (MRI) [7], ultrasonography [8], and many more [9]. Among these, MRI is one of the best techniques for the detection of brain tumors [10]. This is because it gives detailed information about the size, type, and position of the cell and is also very sensitive to local changes in tissue density [11,12]. The images gathered from MRI must be analyzed by an experienced neuroradiologist to check for abnormalities within the brain, which requires a lot of time and manual effort. The drawback is high cost, both because a highly skilled neuroradiologist is needed and because the process is time-consuming [13], so researchers have proposed automated procedures.
Automatic detection of brain tumors based on CV has been proposed by many researchers [14][15][16]. These techniques sometimes start with a preprocessing step, which is generally used to enhance the image in order to achieve higher accuracy [17]. However, preprocessing is not always necessary; whether it is needed depends on the situation, and many researchers skip it [18]. The images are then used for feature extraction. As explained earlier, DL has shown remarkable results in many fields, such as medical imaging and computer vision (CV). The main problem with the deep learning approach is that it requires a large amount of data and very high computational power to train the model. However, this problem has been mitigated by the arrival of transfer learning (TL) [19]. In transfer learning, the layers of a pretrained model are modified so that they can be used for a specific problem. Typically, this is done by modifying the input and output layers to suit the problem. Researchers have used several pretrained deep convolutional neural network (DCNN) models for computer vision and medical imaging, such as ResNet [20], Inception-V3 [21], VGG [22], and GoogLeNet [23].
Despite all this work, much remains to be done in detection and classification. To address these problems, in this article, we propose an optimal deep learning feature fusion for the classification of brain tumors. Our work is carried out in several steps. However, the main focus of this article is the optimization of deep learning features and their subsequent fusion into one matrix. To summarize, the contributions of our work are as follows:
(1) Preprocessing of the dataset (normalization and conversion)
(2) Selection of the optimal deep features using two algorithms, differential evolution (DE) and particle swarm optimization (PSO)
(3) Fusion of the optimal features by serial fusion to obtain the fused optimal feature vector, which is passed to classifiers for classification
This article is organized as follows: Section 2 discusses the previous relevant techniques. The proposed work, which includes database creation, selection of an optimal solution, and fusion, is presented in Section 3. Section 4 presents detailed classification results, and finally, the conclusion of this paper is discussed in Section 5.

Related Work
Brain tumor classification is an important and active research topic. Several techniques have been introduced, such as deep learning based and best feature selection based techniques, among others [24][25][26]. In [14], the authors presented an automated system for MRI-based brain tumor images using machine learning techniques. The dataset they used is BraTS2017. They carried out the whole process in four steps: preprocessing, segmentation, feature extraction, and classification. Initially, in the preprocessing phase, they manually removed the skull and reduced the noise with the help of the median filter. For segmentation, they used Chan-Vese (CV), and then, the features were extracted with the help of a gray-level co-occurrence matrix (GLCM). They used two classifiers, SVM and KNN, for the performance evaluation, which outperformed the existing methods. They achieved an accuracy of about 98.13% for the SVM and 92.30% for the KNN.
A computer-aided diagnosis (CAD) system was proposed in [26]. Its recognition scheme, a multi-level attention network (MANet) that combines cross-channel and spatial attention, achieved promising experimental results on two different types of datasets. The technique in [27] extracts the features with the help of the discrete wavelet transform (DWT). These extracted features are then passed to a CNN to classify the input MRI images. In the experimental process, they achieved an accuracy of 99.3%.
A deep learning approach is presented in [18] to classify brain tumor disease. The datasets they used were BraTS2018 and BraTS2019. To extract the features, they utilized transfer learning to fine-tune the Densenet201 model. Later, they applied the entropy-kurtosis-based high feature value (EKbHFV) and the modified genetic algorithm (MGA) to select the optimal features. Fusion is performed with the help of a nonredundant serial-based approach, and the result is classified by the cubic SVM. They achieved an accuracy of more than 95%. The authors of [17] examine the performance of multiple deep learning models, namely VGG16, AlexNet, GoogleNet, and ResNet50, in terms of their ability to examine brain tumors. For evaluation, they used accuracy and processing time as criteria. The results show that ResNet50 gave the highest accuracy of about 95.8%, and AlexNet had the fastest processing time of about 1.2 sec, which decreases to 8.3 msec on a GPU.
Brain tumor classification by the combination of both machine learning and deep learning approaches is presented in [28]. They used three different brain tumor classes, namely glioma, meningioma, and pituitary, for classification. To extract the deep features, they utilized transfer learning to fine-tune the GoogLeNet model. The extracted features are then classified with the help of the support vector machine (SVM), K-nearest neighbor (KNN), and softmax.
A computer-aided approach is presented in [16] to classify brain MRI images. The author considered two classes: the normal class and the tumor class. The proposed technique was named 2D convolution neural network. The evaluation results are compared based on recall, F1-score, and precision. Their proposed method gave an accuracy of 97%.
In [29], H. A. Khan et al. presented a convolutional neural network (CNN) model for the classification of brain tumors along with augmentation and image processing. Initially, in the image processing phase, they used Canny edge detection to crop the black portion from an image. After this, they performed data augmentation to increase the number of images in the dataset. They performed the augmentation by making minor changes to the images, such as rotation, brightness, and flipping. Then, they compared their proposed CNN model with the pretrained VGG-16, ResNet-50, and Inception-V3 models.
In conclusion, the strategies discussed above primarily aimed to strengthen the extracted features in order to improve the outcome of the presented techniques. They also demonstrated the significance of classifiers in improving classification accuracy. In the classification phase, the failure to extract the best features and the problem of overfitting are the fundamental shortcomings of these strategies. In this paper, we focus on the fusion of features for brain tumor classification and concentrate on extracting useful deep learning features. Furthermore, we address overfitting and feature selection, with the goal of reducing prediction time without sacrificing too much accuracy. We propose an end-to-end automated technique to address these gaps.

Proposed Methodology
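We present a fully automated technique for brain tumor classification in this paper. This study looks at four different types of brain tumors. The steps involved in this implementation are as follows:
(1) Preprocessing is applied to normalize the dataset and convert the images from single channel to multichannel (a requirement of the deep learning model).
(2) The pretrained ResNet101 model is fine-tuned using transfer learning, and deep features are extracted.
(3) Optimal features are selected using differential evolution (DE) and particle swarm optimization (PSO).
(4) The two optimal feature vectors are serially fused.
(5) We apply PCA to the fused vector in order to select the top high-variance features.
(6) Different classifiers are used to classify these features.
Figure 1 shows the block diagram of the proposed approach.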
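The preprocessing step can be illustrated with a minimal Python sketch. The paper does not state which normalization was used, so min-max scaling to [0, 1] is an assumption here, as are the function names `normalize` and `to_three_channels`; resizing to the network input size is not covered.

```python
def normalize(image):
    """Min-max scale a 2D grayscale image (list of rows) to [0, 1].

    Min-max scaling is an assumed choice; the paper only says "normalize".
    """
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:  # constant (e.g. blank) image: avoid division by zero
        return [[0.0 for _ in row] for row in image]
    return [[(p - lo) / span for p in row] for row in image]


def to_three_channels(image):
    """Replicate a single-channel H x W image into H x W x 3."""
    return [[[p, p, p] for p in row] for row in image]


# Tiny 2 x 2 stand-in for a 240 x 240 grayscale MRI slice.
gray = [[0, 50], [100, 200]]
rgb = to_three_channels(normalize(gray))
```

Replicating the gray value across three channels is the simplest way to satisfy a three-channel input layer without altering the image content.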

Database Preparation.
In this paper, BraTS2018 is used for evaluation. This dataset consists of clinically acquired preoperative multimodal MRI scans of glioblastoma (GBM/HGG) [30]. The dataset is divided into four categories: (a) native (T1), (b) postcontrast T1-weighted (T1CE), (c) T2-weighted (T2), and (d) T2 fluid-attenuated inversion recovery (FLAIR). T1 is composed of 28,446 images, T1CE of 28,969 images, T2 of 28,759 images, and FLAIR of 28,413 images. These images are all in grayscale format with a resolution of 240 × 240 pixels. Figure 2 shows a few examples of images. Table 1 provides a summary of the overall images.

ResNet101 Deep Model.
Deep neural networks have made significant progress in the field of image classification in recent years. A deep model, by definition, is the combination of low-level, mid-level, and high-level features, as well as a classifier. We used ResNet101 [31] to extract deep features in this paper. The VGG19 pretrained network, which is one of the deepest convolutional neural networks (CNNs), inspired this architecture. A CNN model, as previously stated, is made up of many different layers that are connected to each other. These layers are used for a variety of tasks, such as natural language processing and medical image classification. The convolutional filter size in ResNet101 is 3 × 3, and the stride value is 2. Downsampling is performed in the convolutional layers based on the stride value. This network has 347 layers and 379 connections. The input to the network has a dimension of 224 × 224 × 3. In the first convolution layer, the filter size is 7 × 7, the number of channels is 3, and the number of filters is 64. The filter size in the max pooling layers is 3 × 3, and the stride value is 2. The number of channels and filters in the second convolutional layer is 64. The number of filters in the final convolution layer is 2048, with 512 channels [32]. By extracting features from the pool5 layer, we get an output of dimension N × 2048, where N denotes the number of images. Figure 3 depicts the ResNet101 architecture in its entirety.

Transfer Learning-Based Network Training.
In deep learning, data reliance is a severe issue. In comparison to typical machine learning techniques, a large amount of data is needed to train a deep model. The fundamental reason is that a large amount of training data is necessary to learn hidden patterns. However, in some study domains, particularly medical imaging, a large amount of data is often not accessible for training a deep learning model. The concept of transfer learning (TL) [33] is to train a model with less data. In TL, it is not necessary to train the target model from scratch. Deep transfer learning is defined mathematically as follows. Given a transfer learning task with the parameters $\langle D_s, T_s, D_t, T_t, F_t(\cdot) \rangle$, $D_s$ represents the source domain, $T_s$ and $T_t$ represent the learning tasks of the source and the target, $D_t$ represents the target domain, and $F_t(\cdot)$ represents the nonlinear function that represents a deep neural network. Figure 4 depicts the model learning process using transfer learning graphically. In this figure, the original ResNet101 model is trained on the ImageNet dataset [34], and then, knowledge is transferred using deep transfer learning to retrain this model on the target database. The brain tumor database is used as the target database. We extract features from the pool5 layer and output a vector of dimension N × 2048. We started with a learning rate of 0.0001 and a mini-batch size of 32. Figure 5 depicts the deep learning model's complete training procedure, which is described below.

Training Process.
(1) We divide the images of each class (T1, T1CE, T2, and FLAIR) into 50% training and 50% testing images. A randomized procedure is used to separate these images.
(2) The ResNet101 model is trained using deep transfer learning for brain tumor classification.
We train and save our model for brain tumor classification based on the aforementioned methods. The following sections go over the optimization, fusion, PCA, and classification stages in detail.
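The randomized 50/50 split per class can be sketched as follows. This is only an illustration in Python: the helper name `split_half`, the seed, and the file names are hypothetical and do not come from the paper.

```python
import random


def split_half(items, seed=0):
    """Shuffle a list of items and split it into two halves (train, test)."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(items)         # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


# Hypothetical file names standing in for the images of each class.
dataset = {
    name: [f"{name}_{i:03d}.png" for i in range(10)]
    for name in ("T1", "T1CE", "T2", "FLAIR")
}

# Split each class separately so every class keeps a 50/50 ratio.
splits = {name: split_half(files, seed=42) for name, files in dataset.items()}
```

Splitting within each class, rather than over the pooled dataset, keeps the class proportions identical in the training and testing sets.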

Feature Optimization.
The classification accuracy is improved by selecting the most optimal set of features from the initial set of features. These features are chosen from the original features with the least amount of loss. The main advantages are that they improve accuracy, take less time, and reduce the risk of overfitting. In feature selection, the optimization process entails determining the optimum feasible values depending on the established objective function. Many evolutionary strategies for identifying the closest optimal solution have been proposed for this purpose. We implemented two algorithms in this article: differential evolution (DE) and particle swarm optimization (PSO).

Differential Evolution.
DE is an evolutionary approach for global search optimization [35]. Because it uses fewer control parameters than the genetic algorithm (GA), this technique is easier to use, which also makes it effective in the realm of medical imaging. It starts with a set of randomly generated starting values in the search space. The input data are then subjected to mutation and crossover, followed by a selection procedure to establish a new population. The steps involved are listed as follows:
Input: original N × 2048 dimensional deep feature vector.
Output: optimal feature vector of dimension N × 1119.
Step 1: We initialize the following parameters: (1) population = 50; (2) minimum bound and maximum bound of the search space.
Step 2: We calculate the fitness function, where fine KNN is used as the fitness function, and the mean square error rate (MSER) is used for the performance evaluation.
Step 3: We perform mutation.
Step 4: We perform crossover.
Step 5: We perform fitness evaluation and selection. We repeat steps 2, 3, and 4 until the required optimal feature vector is obtained. An optimal feature vector of dimension N × 1119 is obtained as a result.
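The mutation, crossover, and selection steps listed above can be sketched with the widely used DE/rand/1/bin scheme. This is a hedged Python illustration, not the authors' implementation: the scale factor `f`, the crossover rate `cr`, the 0.5 threshold that turns a continuous vector into a feature mask, and the toy error function (a stand-in for the fine KNN error) are all assumptions. Only the population size of 50 and the use of bounds come from the text.

```python
import random


def differential_evolution(fitness, dim, pop_size=50, f=0.5, cr=0.9,
                           generations=30, lower=0.0, upper=1.0, seed=0):
    """Minimize `fitness` over [lower, upper]^dim with DE/rand/1/bin."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lower, upper) for _ in range(dim)]
           for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[r1][d] + f * (pop[r2][d] - pop[r3][d])
                      for d in range(dim)]
            mutant = [min(max(v, lower), upper) for v in mutant]  # clip to bounds
            # Binomial crossover: at least one component comes from the mutant.
            forced = rng.randrange(dim)
            trial = [mutant[d] if (rng.random() < cr or d == forced) else pop[i][d]
                     for d in range(dim)]
            # Selection: keep the trial only if it is no worse than the parent.
            trial_score = fitness(trial)
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]


def to_mask(vector, threshold=0.5):
    """Interpret a continuous vector as a keep/drop decision per feature."""
    return [v > threshold for v in vector]


# Toy stand-in for the classifier error: fraction of features whose
# keep/drop decision disagrees with a made-up "informative" pattern.
INFORMATIVE = [True, False, True, True, False, False, True, False]


def toy_error(vector):
    mask = to_mask(vector)
    return sum(m != t for m, t in zip(mask, INFORMATIVE)) / len(INFORMATIVE)


best_vector, best_error = differential_evolution(toy_error, dim=len(INFORMATIVE))
selected = [i for i, keep in enumerate(to_mask(best_vector)) if keep]
```

In the setting described in the paper, `toy_error` would be replaced by the error of a KNN classifier trained on the columns selected by the mask, and `dim` would be 2048.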

Particle Swarm Optimization.
Particle swarm optimization (PSO) is inspired by swarm behavior such as bird flocks and schooling fish. PSO is a population-based metaheuristic technique [36]. It is an efficient evolutionary algorithm, which is why it is extensively used to solve single- or multiple-objective problems [37]. Furthermore, PSO is also a powerful computing tool in terms of speed and memory usage [38].
PSO works on the basis of five steps, which are as follows:
Input: original N × 2048 dimensional deep feature vector.
Output: optimal feature vector of dimension N × 1125.
Step 1: We generate the population:
$$\text{Population} = p_{gen}, \qquad gen = 0, 1, \ldots, Max_{gen},$$
where $b^{L}_{11}, b^{L}_{12}, b^{L}_{13}, \ldots, b^{L}_{1M}$ denotes the particle (candidate solution), a single individual such as $b^{L}_{11}$ is called an agent, and $P$ denotes the population.
Step 2: We calculate the fitness function, where fine KNN is used as the fitness function, and the mean square error rate (MSER) is used for the performance evaluation.
Step 3 (a): We find the local best, starting from the first candidate solution C and continuing through the remaining candidates.
Step 3 (b): We find the global best. The speed is updated as
$$V_{n+1} = \omega V_n + C_1\,\mathrm{rand}\,(lb - b_i) + C_2\,\mathrm{rand}\,(gb - b_i),$$
where $V_n$ is the old speed, $C_1$ and $C_2$ are the controlling parameters, $\omega$ is the inertia, $(lb - b_i)$ is the local updation, $(gb - b_i)$ is the global updation, $C_1\,\mathrm{rand}\,(lb - b_i)$ is the local intelligence, and $C_2\,\mathrm{rand}\,(gb - b_i)$ is the global intelligence. All these parameters are combined to generate the updated speed $V_{n+1}$. The position is updated as
$$X_{n+1} = X_n + V_{n+1},$$
where $X_n$ is the old position and $V_{n+1}$ is the updated speed. Combining these gives the updated position.
Step 5: We perform fitness evaluation and selection and repeat steps 2, 3, and 4 until the required optimal feature vector is obtained. In the output, an optimal feature vector of dimension N × 1125 is obtained.
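The speed and position updates described above can be written as a single update step for one particle. The Python sketch below is illustrative only: the values of `omega`, `c1`, and `c2` are assumed defaults (the paper does not report them), and drawing a fresh random number per dimension is one common convention rather than something the text specifies.

```python
import random


def update_particle(position, velocity, local_best, global_best,
                    omega=0.7, c1=1.5, c2=1.5, rng=random):
    """Apply one PSO update and return (new_position, new_velocity).

    new_v = omega * v + c1 * rand * (lb - x) + c2 * rand * (gb - x)
    new_x = x + new_v
    """
    new_position, new_velocity = [], []
    for x, v, lb, gb in zip(position, velocity, local_best, global_best):
        v_next = (omega * v
                  + c1 * rng.random() * (lb - x)   # pull toward the local best
                  + c2 * rng.random() * (gb - x))  # pull toward the global best
        new_velocity.append(v_next)
        new_position.append(x + v_next)
    return new_position, new_velocity


# A particle that already sits on both bests is moved by inertia alone.
pos, vel = update_particle(
    position=[1.0, 2.0], velocity=[0.5, -0.5],
    local_best=[1.0, 2.0], global_best=[1.0, 2.0],
)
```

When the particle coincides with both the local and global best, the two attraction terms vanish and only the inertia term `omega * v` remains, which makes the update easy to check by hand.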

Feature Fusion.
Feature fusion is a process in which two feature vectors are combined to get one feature vector, which is more appealing and discriminating than the two input feature vectors. One of the biggest advantages of feature fusion is that it improves the image information in terms of features. In this paper, we implement the serial-based extended (SBE) approach for feature fusion.
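Serial fusion amounts to concatenating the two feature vectors of each sample. The Python sketch below shows only this core concatenation, using the dimensions reported in the paper (1119 features from DE and 1125 from PSO); the function name `serial_fuse` and the placeholder values are assumptions, and any additional processing in the serial-based extended (SBE) approach is not reproduced.

```python
def serial_fuse(features_a, features_b):
    """Concatenate two per-sample feature matrices column-wise."""
    if len(features_a) != len(features_b):
        raise ValueError("both feature sets must describe the same N samples")
    return [row_a + row_b for row_a, row_b in zip(features_a, features_b)]


n_samples = 4                                        # small N for illustration
fv_de = [[0.0] * 1119 for _ in range(n_samples)]     # placeholder DE features
fv_pso = [[1.0] * 1125 for _ in range(n_samples)]    # placeholder PSO features
fused = serial_fuse(fv_de, fv_pso)
print(len(fused), len(fused[0]))  # 4 2244
```

The fused width is the sum of the two input widths, 1119 + 1125 = 2244, which matches the dimension of the fused vector reported in the paper.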
The two optimal feature vectors, obtained from DE (dimension N × 1119) and PSO (dimension N × 1125), are fused serially into the vector FV (FUS). At last, the fused vector is fed into the different classifiers, which produce two outputs: labeled prediction results and numerical results. Figure 6 depicts the labeled results, whereas Section 4 has the numerical results.

Principal Component Analysis.
PCA (principal component analysis) is a dimensionality reduction technique [39]. This technique is normally carried out to reduce the dimensionality of huge data/feature vectors. This is performed because smaller data are easier to understand and analyze, and machine learning and deep learning algorithms can process them much more efficiently and rapidly [40]. PCA keeps only those features that carry a large amount of information. This is accomplished by preserving just those components that have high variance [41].
The optimal feature vector obtained by the optimal feature fusion, denoted by FV (FUS), has dimension N × 2244. We passed this feature vector to the PCA and selected the high-variance features, which have dimension N × 1000. The accuracy achieved and the prediction time speedup are discussed in the results section.
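For reference, the reduction from N × 2244 to N × 1000 can be expressed with the standard PCA formulation. The symbols $X$ (the fused feature matrix with rows $x_i$), $\mu$, $C$, $w_j$, $\lambda_j$, and $Z$ are introduced here for illustration and are not the paper's notation; only the dimensions 2244 and 1000 come from the text.

```latex
\begin{aligned}
\mu &= \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \bar{X} = X - \mathbf{1}\mu^{\top} \\
C &= \frac{1}{N-1}\,\bar{X}^{\top}\bar{X} \in \mathbb{R}^{2244 \times 2244} \\
C\,w_j &= \lambda_j w_j, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_{2244} \ge 0 \\
Z &= \bar{X}\,W_k \in \mathbb{R}^{N \times 1000}, \qquad W_k = [\,w_1, \dots, w_k\,], \quad k = 1000
\end{aligned}
```

Each eigenvalue $\lambda_j$ is the variance of the data along the direction $w_j$, so keeping the first $k$ eigenvectors retains the $k$ highest-variance components.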

Results and Discussion
This section covers the detailed discussion of the numerical results obtained in this work. The dataset that we used for our experiments is BraTS2018. This dataset consists of clinically acquired preoperative multimodal MRI scans of glioblastoma (GBM/HGG) [30]. In the preprocessing phase, we cleaned the dataset by removing the blank images. After the preprocessing, we ended up with 114,587 images, which we used for training. We computed the results in multiple steps: (a) with the help of a pretrained deep learning model, we extracted the deep features, and then, we obtained the accuracy by passing them to the different classifiers; (b) to select the optimal features, we first used differential evolution (DE); (c) then, we used particle swarm optimization (PSO) and evaluated the results; (d) we fused both of these optimal feature vectors; (e) we compared the results. In the training and testing phase, 10-fold cross-validation is used.
Multiple classifiers are utilized in order to compare the accuracies. These classifiers are the fine tree, linear discriminant, cubic SVM, boosted tree, bagged tree, subspace discriminant, narrow neural network, medium neural network, and wide neural network. Various performance metrics are used to report the results, such as the accuracy (%), prediction time (sec), sensitivity, precision, FPR, FNR, and area under curve. The hyperparameters that we used for our work were as follows:
(1) Epochs = 100
(2) Learning rate = 0.05
(3) Optimizer = stochastic gradient descent
(4) Loss function = cross entropy
(5) Momentum = 0.7
To conduct our experiments, we used an Intel Core-i7 6th generation processor with 16 GB RAM and an Nvidia GeForce GTX1070 GPU with 8 GB RAM. We used Matlab 2021a for our simulations.
This section contains the numerical results that we obtained from our experiments. The prediction accuracy of brain tumor classification using the original ResNet deep features is presented in Table 2, and the corresponding prediction times are presented in Table 3. Based on the prediction time, the fine tree executes in the minimum time of 286.34 seconds. The linear discriminant is the second fastest and the bagged tree is the third fastest classifier, with prediction times of 336.99 and 774.26 seconds, respectively. The accuracies of these classifiers are 7.1%, 0.7%, and 1.2% lower than that of the cubic SVM, while their prediction times are shorter by 4175.46, 4124.81, and 3687.54 seconds, respectively. The confusion matrix of the cubic SVM on the original features is shown in Figure 7.

Results gathered after the optimal feature selection by PSO are shown in Table 2. Of all the classifiers, the cubic SVM gives the highest accuracy of about 96.7%. The negative rate of the cubic SVM is 3.3% with a prediction time of 954.87 seconds. This negative rate is the same as that on the original features, while the prediction time decreases by 3506.93 seconds. The medium neural network has a negative rate of 4.2% with a prediction time of 1457.7 seconds; compared with the original features, its negative rate increases by only 0.2%, while its prediction time decreases by 3448.8 seconds. The prediction times obtained with PSO are presented in Table 3. In terms of time, the linear discriminant executes in the minimum time of 62.48 seconds. The fine tree is the second fastest and the bagged tree is the third fastest classifier, with prediction times of 139.28 and 326.44 seconds, respectively. The accuracies of these classifiers are 1%, 7.9%, and 1.3% lower than that of the cubic SVM, while their prediction times are shorter by 892.39, 815.59, and 628.43 seconds, respectively. The confusion matrix of the cubic SVM after PSO is shown in Figure 8.
Results gathered after the optimal feature selection by DE are shown in Table 2. Of all the classifiers, the cubic SVM gives the highest accuracy of about 96.6%. The negative rate of the cubic SVM is 3.4% with a prediction time of 2088.8 seconds. Compared with the original features, this negative rate increases by only 0.1%, while the prediction time decreases by 2373 seconds. The wide neural network gives the second highest accuracy of about 96.1%. The negative rate of this classifier is 3.9% with a prediction time of 701.48 seconds. This negative rate is the same as that on the original features, while the prediction time decreases by 608.92 seconds. The medium neural network gives the third highest accuracy of about 95.8%. The negative rate of this classifier is 4.2% with a prediction time of 3636.3 seconds. Compared with the original features, this negative rate increases by only 0.2%, while the prediction time decreases by 1270.2 seconds.
The prediction times obtained with DE are presented in Table 3. In terms of time, the linear discriminant executes in the minimum time of 117.11 seconds. The fine tree is the second fastest and the bagged tree is the third fastest classifier, with prediction times of 159.25 and 460.75 seconds, respectively. The accuracies of these classifiers are 1%, 8.6%, and 1.3% lower than that of the cubic SVM, while their prediction times are shorter by 1971.69, 1929.55, and 1628.05 seconds, respectively. The confusion matrix of the cubic SVM after DE is shown in Figure 9.
Results gathered after the PSO and DE feature fusion are shown in Table 2. Of all the classifiers, the cubic SVM gives the highest accuracy of about 96.7%. The negative rate of the cubic SVM is 3.3% with a prediction time of 3901.4 seconds. This negative rate is the same as that on the original features, while the prediction time decreases by 560.4 seconds. The linear discriminant gives the second highest accuracy of about 95.7%. The negative rate of this classifier is 4.3% with a prediction time of 52.093 seconds. Compared with the original features, this negative rate increases by only 0.3%, while the prediction time decreases by 284.897 seconds. The wide neural network and the subspace discriminant give the third highest accuracy of about 95.4%. The negative rate of both classifiers is 4.6%, with prediction times of 100.38 and 552.78 seconds, respectively. Compared with the original features, their negative rates increase by only 0.7% and 0.4%, respectively, while their prediction times decrease by 1210.02 and 2510.42 seconds, respectively. The prediction times obtained with feature fusion are presented in Table 3. In terms of time, the linear discriminant executes in the minimum time of 52.093 seconds. The wide neural network is the second fastest and the fine tree is the third fastest classifier, with prediction times of 100.38 and 156.73 seconds, respectively. The accuracies of these classifiers are 1%, 1.3%, and 4.1% lower than that of the cubic SVM, while their prediction times are shorter by 3849.307, 3801.02, and 3744.67 seconds, respectively. The confusion matrix of the cubic SVM after feature fusion is shown in Figure 10. Table 2 also shows the percentage difference between the original and fused features, which is shown visually in Figure 11. The cubic SVM shows the same accuracy after fusion as on the original features. The accuracy of the fine tree is increased by around 3.3%.
In the case of the linear discriminant and the boosted tree, the decrease in accuracy is only about 0.3%. The accuracy of the subspace discriminant, wide neural network, bagged tree, medium neural network, and narrow neural network is decreased by around 0.4%, 0.7%, 1%, 1.7%, and 1.9%, respectively. From these results, it is clear that the accuracy remains approximately the same, while the prediction time decreases compared with the original features. Table 3 shows the speedup between the original and fused features, which is shown visually in Figure 12. The highest speedup, about 25.5 times, is achieved in the case of the medium neural network. In the case of the wide neural network, the speedup is 13.1 times. Speedups of 6.5, 5.5, and 3.6 times are achieved in the case of the linear discriminant, subspace discriminant, and narrow neural network, respectively. In the case of the boosted tree, bagged tree, fine tree, and cubic SVM, we achieved a speedup of 2.1x, 1.9x, 1.8x, and 1.1x, respectively.
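The reported speedups can be checked from the prediction times quoted in the text, assuming that speedup is defined as the original prediction time divided by the fused prediction time. In the Python sketch below, each original time is recovered as the fused time plus the reported decrease; the helper name `speedup` is hypothetical.

```python
def speedup(fused_seconds, decrease_seconds):
    """Speedup of fused features over original features.

    The original prediction time is the fused time plus the reported decrease.
    """
    original_seconds = fused_seconds + decrease_seconds
    return original_seconds / fused_seconds


# (fused prediction time, reported decrease), in seconds, as quoted in the text.
reported = {
    "linear discriminant": (52.093, 284.897),
    "wide neural network": (100.38, 1210.02),
    "subspace discriminant": (552.78, 2510.42),
    "cubic SVM": (3901.4, 560.4),
}

speedups = {name: round(speedup(*times), 1) for name, times in reported.items()}
print(speedups)
```

Under this definition the four values round to 6.5, 13.1, 5.5, and 1.1, in agreement with the speedups reported for these classifiers.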
Apart from the accuracy and prediction time results presented in Tables 2 and 3, respectively, we also report detailed results after feature fusion. These results are provided in Table 4, which lists the sensitivity, FNR, precision, FPR, and AUC for the various classifiers. All the classifiers perform well on all these metrics.
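For a multiclass problem, sensitivity, FNR, precision, and FPR are usually computed per class from the confusion matrix in a one-vs-rest manner. The Python sketch below assumes that convention, which may differ from how the values in Table 4 were produced; the confusion matrix is made up for illustration and is not taken from the paper, and the function does not guard against classes with no samples.

```python
def per_class_metrics(confusion, k):
    """One-vs-rest metrics for class k.

    confusion[i][j] is the number of samples of true class i predicted as j.
    """
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                  # class k predicted as another class
    fp = sum(row[k] for row in confusion) - tp   # other classes predicted as k
    tn = total - tp - fn - fp
    return {
        "sensitivity": tp / (tp + fn),
        "fnr": fn / (tp + fn),
        "precision": tp / (tp + fp),
        "fpr": fp / (fp + tn),
    }


# Illustrative 4-class confusion matrix (rows: true class, columns: predicted).
confusion = [
    [90, 5, 3, 2],
    [4, 92, 2, 2],
    [1, 3, 95, 1],
    [5, 0, 0, 95],
]
metrics_class0 = per_class_metrics(confusion, 0)
```

Note that sensitivity and FNR always sum to 1 for a given class, so reporting both is redundant but convenient for reading the table.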
Finally, we compare our results with the state-of-the-art techniques performing brain tumor classification on the BraTS2018 dataset. This comparison is provided in Table 5. Column 2 shows that the proposed technique has the highest accuracy compared to the other techniques. Though this improvement in accuracy is not very high, if we bring execution time into the picture (Column 3), we can see that most of the previous work did not focus on this aspect and did not report it. Our work achieved an improvement of about 25.5x, which is a significant reduction in the prediction time.


Conclusion
The manual procedure of brain tumor detection and classification is not a good choice, as it is tedious, time consuming, and expensive. A fully automated optimal deep feature fusion-based architecture is proposed in this work for brain tumor classification. A database of MRI images consisting of four different categories of tumors is prepared for evaluation. The proposed method achieved an accuracy of about 96.7%, which is the highest compared to the existing techniques. Based on the results presented in this work, it is observed that a few redundant and irrelevant features still remain in the extracted deep features. Therefore, it is essential to select the optimal features. It is also shown that the fusion of optimal features maintains approximately the same accuracy, while the reduction in the prediction time is quite significant, which achieves the main goal of this work. The main drawback of this work is the fusion process, which increases the computational time during the testing process. In the future, lightweight deep learning frameworks will be adopted, and we will utilize an optimized feature fusion approach for classification and detection of tumors [45][46][47][48][49].

Data Availability
The BraTS2018 dataset has been used in this work for the experimental process (http://www.med.upenn.edu/sbia/brats2018/data.html).

Conflicts of Interest
All authors declare that they have no conflicts of interest.