An Effective and Novel Approach for Brain Tumor Classification Using AlexNet CNN Feature Extractor and Multiple Eminent Machine Learning Classifiers in MRIs

A brain tumor is an uncontrolled growth of malignant cells in the brain and is regarded as one of the deadliest types of cancer in people of all ages. Early detection of brain tumors is needed for proper and accurate treatment. Recently, deep learning technology has attracted much attention from physicians for the diagnosis and treatment of brain tumors. This research presents a novel and effective approach to brain tumor classification from MRIs that uses AlexNet CNN to separate the dataset into training and test data and to extract the features. The extracted features are then fed to BayesNet, sequential minimal optimization (SMO), Naïve Bayes (NB), and random forest (RF) classifiers for classifying brain tumors as no-tumor, glioma, meningioma, and pituitary tumors. To evaluate our model's performance, we have utilized a publicly available Kaggle dataset. This paper presents ROC, PRC, and cost curves to characterize the classification performance of the models; in addition, performance evaluation parameters, such as accuracy, sensitivity, specificity, false positive rate, false negative rate, precision, f-measure, kappa statistics, MCC, ROC area, and PRC area, have been calculated for four testing options: the test data itself, cross-validation fold (CVF) 4, CVF 10, and percentage split (PS) 34% of the test data. We have achieved 88.75%, 98.15%, 86.25%, and 100% accuracy using the AlexNet CNN+BayesNet, AlexNet CNN+SMO, AlexNet CNN+NB, and AlexNet CNN+RF models, respectively, for the test data itself. The results indicate that our approach is very effective.


Introduction
The existence of any kind of abnormality in the brain may put the human nervous system, as well as overall health, in great danger [1]. Today, the brain tumor is regarded as one of the most severe of these abnormalities; it results from the uncontrolled growth of destructive cells in the human brain. According to reports of the World Health Organization and the American Brain Tumor Association [2], brain tumors can be categorized as glioma, meningioma, and pituitary tumors based on cell structure and location.
It is necessary to identify the type of brain tumor precisely for proper diagnosis and treatment. Radiologists utilize various clinical imaging techniques to visualize the structure of the brain and diagnose tumors. The most popular imaging techniques are ultrasound imaging, X-ray imaging, positron emission tomography, computed tomography, and magnetic resonance imaging (MRI). Among these, MRI provides precise information on the shape of the cerebral structure and on the location of irregularities in brain tissues. However, manual detection of brain tumors by medical specialists examining MRIs is often difficult, time-consuming, and error-prone [3]. To reduce these problems, computer-aided detection and diagnostic approaches have been developed over the last few decades [4][5][6][7][8][9][10][11]. In recent years, various efforts have been made to develop a robust and precise technique for classifying brain tumors automatically in MRIs by using machine learning [12][13][14]. Nevertheless, because of high inter- and intra-class variation in shape, texture, and contrast, this remains a challenging task. Conventional machine learning techniques rely on handcrafted features, which limits the robustness of the outcomes, while deep learning techniques automatically extract important features and thereby provide significantly better performance.
However, the results obtained using deep learning technologies are still not sufficient, and they do not always show good classification accuracy, for example, when only AlexNet is used [15]. The research is challenging, especially with regard to selecting the region of interest (ROI), extracting deep features, and handling noisy images.
To minimize the aforementioned problems, in this study, we propose a novel and effective technique for automatic brain tumor classification in MRIs that employs the AlexNet CNN feature extractor and a variety of prominent machine learning classifiers, namely BayesNet, sequential minimal optimization (SMO), Naïve Bayes (NB), and random forest (RF).
AlexNet CNN is a good tool for determining the ROI when extracting deep features, and it can readily deal with noisy and raw images, both of which are essential for the current research. Moreover, combining machine learning classification algorithms with AlexNet CNN is expected to enhance the performance of the proposed model.
Our contributions to this work are as follows:
(i) We have utilized AlexNet CNN for separating human brain MRI datasets into training and test data and also for extracting the features from those separated datasets.
(ii) Those extracted features are then used for classifying the brain tumors as no-tumor, glioma, meningioma, and pituitary tumors by using the BayesNet, SMO, NB, and RF classifiers.
This paper is organized as follows. Section 1 briefly discusses the problems, some of the existing solutions, research gaps, objectives, contributions, and organization of the paper. Section 2 presents related works on brain tumor detection and classification. It also discusses the research gaps at the end of the section. Section 3 describes the overall working methodology, the proposed architecture, and the performance metrics. Section 4 describes the results and performance of our model along with a comparison with other contemporary findings. Finally, Section 5 contains the conclusion of the entire study, its limitations, and future works.

Related Works
Over the past few years, many researchers have published works presenting different methodologies for detection and classification. Various methodologies have been tested using different clinical databases, including magnetic resonance imaging (MRI) of brain tumors. Selvaraj et al. [37] developed a binary classifier for classifying brain MRIs utilizing first- and second-order statistics and a least squares support vector machine (SVM) classifier. Sarkar et al. [38] proposed a computer-aided approach for detecting and classifying brain tumors from MRIs utilizing a genetic algorithm as the feature extractor and an SVM as the classifier, with which they obtained 98.3% classification accuracy. John [25] used techniques based on the gray level co-occurrence matrix (GLCM) and discrete wavelet transform (DWT) to detect and classify brain tumors. Ullah et al. [39] applied DWT for feature extraction and a feed-forward artificial neural network for brain MRI classification. Kharrat et al. [40] classified brain images into normal and abnormal classes through genetic algorithms and SVM classifiers. Papageorgiou et al. [33] obtained 90.26% and 93.22% accuracy for low- and high-grade glioma tumors, respectively, through fuzzy cognitive maps. Díaz-Pernas et al. [26] proposed an architecture for automatic detection and segmentation of brain tumors and acquired 97.3% accuracy. Shree and Kumar [41] used a GLCM feature extractor and a probabilistic neural network classifier for classifying MRI datasets and achieved 95% accuracy. Arunachalam and Savarimuthu [42] offered an architecture that covered image enhancement, image transformation, feature extraction, and finally, image classification. They employed shift-invariant shearlet transforms for enhancing MRIs and used several feature extractors, such as Gabor, GLCM, and DWT, to extract the features. Hossain et al. [30] utilized the traditional CNN approach to classify brain tumor images. They utilized fuzzy c-means clustering for segmenting the tumors and achieved an accuracy of 97.87%.
Several deep learning approaches have been extensively utilized by researchers to detect and classify brain tumors in the last decade. Sajid et al. [43] proposed a fully automated hybrid approach for the detection and segmentation of brain tumors in MRIs using deep learning. They utilized the BRATS2013 dataset and obtained a dice score, sensitivity, and specificity of 86%, 86%, and 91%, respectively. Saxena et al. [44] utilized ResNet-50, Inception V3, VGG-16, and transfer learning techniques to classify the images; ResNet-50 performed best with 95% accuracy. In [27], Çinar and Yildirim utilized several convolutional neural network (CNN) models, such as Inception V3, GoogLeNet, ResNet-50, AlexNet, and DenseNet-201, for classifying the MRI datasets, and they acquired respectable accuracies in every case. Khwaldeh et al. [45] customized the AlexNet architecture, employed it to classify brain MRIs, and obtained 91% accuracy. Preethi and Aishwarya [46] presented an architecture that used wavelet-based GLCM to build feature matrices. They utilized a deep neural network classifier to detect and categorize brain MRIs and obtained 92% accuracy. In [28], Hemanth et al. proposed a modified deep convolutional neural network (DCNN) to classify brain tumors, and they obtained excellent model performance in terms of accuracy. Khan et al. [47] introduced a computerized multimodal categorization technique to detect and classify brain tumor images. They utilized two pretrained CNN models, VGG16 and VGG19, to extract features. They used the BRATS2018 dataset to validate their model and obtained about 97.80% accuracy.

Journal of Sensors

In [48], Siar and Teshnehlab utilized a central clustering algorithm to extract features, then fed those features directly to the CNN model and achieved 96% accuracy. Kuang et al. [49] introduced a 3D CNN model to classify the BRATS2018 brain tumor image dataset. Preprocessing was performed using grey-level normalization and contrast adjustment. Afterward, the images were passed to the 3D CNN model for classification. They achieved a dice score of 92%. In another study [29], Zhou et al. treated 3D images as 2D cross-sections and used those as the input dataset. They employed different models, namely DenseNet-RNN, DenseNet-LSTM, and DenseNet-DenseNet, and obtained accuracies of 87%, 91%, and 92%, respectively. In [31], Krishnammal and Raja utilized a pretrained AlexNet model to classify the MRI dataset. Features were extracted through the curvelet transform and GLCM, and 100% accuracy was obtained. Updated versions of CNN architectures, namely AlexNet and VGG16, were introduced by Ali et al. [15] with 96% and 98% accuracy, respectively. Deepak and Ameer [32] proposed a fully computerized technique for classifying brain tumor images into three categories: glioma, meningioma, and pituitary tumors. Their architecture was an updated version of the GoogLeNet model and obtained 98% accuracy. In another study, Sultan et al. [34] presented a DCNN model for classifying MRI data into glioma, meningioma, and pituitary tumors, with which they achieved 96% accuracy. Abiwinanda et al. [35] designed several CNN architectures for classifying brain tumors into three classes: glioma, meningioma, and pituitary tumors. The best of these architectures achieved 98.51% accuracy. In [36], Seetha and Raja classified brain tumor images using a CNN. They extracted the features using ImageNet and obtained 97.50% accuracy.
Every method has some advantages and disadvantages. Some research gaps are identified in the above-mentioned papers and summarized briefly as follows.
Thus, an excellent model is required for feature extraction and classification of brain tumors accurately from MRIs.

Materials and Methods
3.1. Research Implementation Block Diagram. Figure 1 shows the overall working architecture of our proposed model to accurately identify and classify brain tumors from MRIs.
Each key component of this architecture, namely the dataset (including training and test data), preprocessing, feature extraction, prediction model, classified results, and performance measurement, is described in turn in the following subsections. We have used AlexNet CNN on the MATLAB platform [50] for dividing the dataset into training and test data as well as for extracting the features from it. The extracted features are then used on the WEKA platform [51] to classify the brain tumors in MRIs as no-tumor, glioma, meningioma, and pituitary tumors utilizing the BayesNet, SMO, NB, and RF classifiers and to establish a prediction model. The performance of the model is assessed from the classified results.

Data Description.
The selection of an appropriate dataset is the first and foremost concern for identifying and classifying medical images. Therefore, we collected the dataset from a trustworthy website, the Kaggle database [52, 53], a well-known web-based data source for brain MRIs ([...] per class). The detailed allocation of the data collected and used in this study is shown in Table 1. We split the dataset (3600 MRIs) into training and test datasets using AlexNet CNN: 2520 images (70% of the total dataset) are utilized as training images, and the remaining 1080 images (30% of the total dataset) as test images. Training data are used to build the prediction model, whereas test data are used to test the model's performance.
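The 70%/30% split described above can be illustrated with a short sketch. This is not the MATLAB procedure used in the study; it is a minimal Python illustration that assumes a plain random shuffle, and the image identifiers, the function name `split_dataset`, and the seed are hypothetical.

```python
import random

def split_dataset(items, train_fraction=0.7, seed=1):
    """Shuffle a copy of items and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical identifiers standing in for the 3600 MRIs of the dataset.
image_ids = [f"mri_{i:04d}" for i in range(3600)]
train_ids, test_ids = split_dataset(image_ids)

print(len(train_ids), len(test_ids))  # 2520 1080
```

A per-class (stratified) split would apply the same function to each class separately, so that the class proportions stay the same in both subsets.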

Preprocessing.
Preprocessing is a technique that improves the quality of images and makes them suitable for further steps. In certain cases, separating normal from abnormal tissue is complicated because of the high noise level. As a result, specialists may commit errors in diagnosis. Moreover, minor contrasts between normal and abnormal tissues can be concealed by the noise. Therefore, it is important to remove the noise through preprocessing. In addition, enhancing the visual quality of images is of great benefit to specialists. In this research, MRIs are preprocessed with an anisotropic filter. The images in the MRI dataset are not all the same size, whereas the input layer of the CNN expects a fixed size; therefore, the images need to be preprocessed, standardized, and resized to 227 × 227 pixels.
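The two preprocessing steps above can be sketched as follows. The text does not specify which anisotropic filter is used, so this sketch assumes the Perona-Malik diffusion scheme; the parameters `iterations`, `kappa`, and `step`, the nearest-neighbour resizing, and the toy image are all illustrative assumptions rather than the settings of the study.

```python
import math

def anisotropic_diffusion(image, iterations=10, kappa=20.0, step=0.2):
    """Perona-Malik diffusion on a 2D list of floats (rows of equal length)."""
    rows, cols = len(image), len(image[0])
    current = [list(row) for row in image]
    for _ in range(iterations):
        updated = [row[:] for row in current]
        for r in range(rows):
            for c in range(cols):
                centre = current[r][c]
                flux = 0.0
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols:
                        diff = current[nr][nc] - centre
                        # Edge-stopping weight: close to 1 for small
                        # differences (noise), close to 0 across strong edges.
                        flux += math.exp(-((diff / kappa) ** 2)) * diff
                updated[r][c] = centre + step * flux
        current = updated
    return current

def resize_nearest(image, size=227):
    """Resize a 2D list to size x size by nearest-neighbour sampling."""
    rows, cols = len(image), len(image[0])
    return [[image[r * rows // size][c * cols // size] for c in range(size)]
            for r in range(size)]

# Toy image: a dark region next to a bright region, plus one noisy pixel.
noisy = [[10.0] * 3 + [200.0] * 3 for _ in range(5)]
noisy[2][1] = 14.0
smoothed = anisotropic_diffusion(noisy)
resized = resize_nearest(smoothed)
```

In this toy case the noisy pixel is pulled towards its neighbours, while the strong edge between the two regions is left essentially unchanged, which is the behaviour that motivates anisotropic rather than uniform smoothing.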
3.4. Feature Extraction. One pretrained CNN model, AlexNet [15,31], is utilized in this investigation for feature extraction, because it is a powerful model capable of achieving high accuracy on test datasets. AlexNet is one of the best-known deep CNN structures and comprises twenty-five layers, eight of which contribute to learning by adjusting weights. Five of these are convolution layers, while the remaining three are fully connected (FC) layers. In the AlexNet design, max-pooling layers are connected in series with convolution layers. The first convolution layer and the subsequent max-pooling layers use varying kernel sizes [54]. Each max-pooling layer follows a convolution layer [54,55]. FC6 is the first FC layer and FC7 is the second; their activations are used to extract the feature vectors. The FC6 and FC7 vectors contain 4096 features each in the AlexNet CNN design [56].

Prediction Model.
Consequently, we have developed predictive models based on features extracted from the training data. A prediction model is a procedure through which future results or behaviors are predicted from past and present data. It is a statistical analysis technique that enables the likelihood of specific outcomes to be estimated. The prediction model works by gathering data, building a statistical model, and applying probabilistic methods to anticipate the possible results.
[...] pooling layers, and FC layers, as shown in Figure 2. The basic function of the convolution layer is to extract the local features of the preceding layer. The ReLU layer performs element-by-element activation. The pooling layers are introduced for downsampling. Max-pooling is commonly utilized to reduce the resolution of the feature maps by merging semantically similar features. To mitigate overfitting, some neurons are randomly dropped from the CNN architecture during training, a technique known as dropout. At the end, the FC layer yields class scores ranging from 0 to 1, which are used to reach the classification decision. A softmax layer is typically used for this purpose.
3.6.2. AlexNet. Figure 3 shows the architecture of AlexNet, which mainly comprises five convolutional layers (11 × 11, 5 × 5, 3 × 3, 3 × 3, 3 × 3), three max-pooling layers (3 × 3), and three FC layers. The first two max-pooling layers are placed after the first and second convolutional layers, respectively. The third, fourth, and fifth convolutional layers are connected directly. The third max-pooling layer is inserted after the fifth convolutional layer, and its output is fed into a series of three FC layers. The success of AlexNet is usually credited to ReLU, stochastic gradient descent (SGD), dropout, etc. ReLU is introduced to accelerate training. The values of the convolutional kernels are learned by minimizing the total cost function with the SGD algorithm. To reduce overfitting, dropout is applied in the first two FC layers. The third FC layer, followed by the softmax layer, is used to classify the objects in a CNN [57].
The BayesNet classifier is based on a directed acyclic graph that encodes a joint probability distribution over a set of random variables [58,59]. It is a pair B = (G, θ), where G represents the graph whose vertices correspond to the random variables X1, …, Xn and whose edges denote direct dependencies between the variables, and θ denotes the set of parameters that quantifies the network. For the BayesNet classification model in the present study, a simple estimator is used, and the batch size is chosen as 100 to get better classification accuracy.
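To make the AlexNet layer arrangement described above concrete, the sketch below traces the feature-map size from a 227 × 227 × 3 input through the five convolutional and three max-pooling layers. The kernel sizes come from the text; the strides, paddings, and channel counts are not stated there and are assumed to follow the standard AlexNet configuration.

```python
def output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding, output channels or None for pooling).
# Strides, paddings and channel counts are assumptions (standard AlexNet).
layers = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, None),
]

size, channels = 227, 3
shapes = {}
for name, kernel, stride, padding, out_channels in layers:
    size = output_size(size, kernel, stride, padding)
    if out_channels is not None:
        channels = out_channels
    shapes[name] = (size, size, channels)
    print(f"{name}: {size} x {size} x {channels}")

# Number of values fed into the first fully connected layer (FC6).
flattened = size * size * channels
print("input to FC6:", flattened)  # input to FC6: 9216
```

Under these assumptions the last pooling layer produces a 6 × 6 × 256 volume, which FC6 and FC7 each map to a 4096-dimensional vector.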

Sequential Minimal Optimization Classifier.
Sequential minimal optimization (SMO) is an important algorithm for solving the quadratic programming problem that arises when training SVMs. At each step, the smallest possible optimization problem involves two Lagrange multipliers, because the Lagrange multipliers must satisfy a linear equality constraint. First, the algorithm selects two multipliers, and then it optimizes them jointly. This is repeated until convergence [60]. For the SMO classification model, the penalty parameter (C) is chosen as 1 and a polynomial kernel is selected by trial and error, because they provide better model performance.
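For reference, the quadratic programming problem that SMO addresses can be written in its standard dual form. The notation below is not taken from the text and is assumed for illustration: N training samples x_i with labels y_i in {-1, +1}, Lagrange multipliers α_i, kernel K, and a polynomial kernel of unspecified exponent p.

```latex
\max_{\alpha}\; \sum_{i=1}^{N} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N}
    \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j)
\quad \text{subject to} \quad
0 \le \alpha_i \le C, \qquad \sum_{i=1}^{N} \alpha_i y_i = 0,
\qquad \text{with } K(x_i, x_j) = \left( x_i^{\top} x_j \right)^{p}.
```

The equality constraint explains why two multipliers are updated at a time: changing a single α_i would violate the constraint, whereas a pair (α_1, α_2) can move along the line y_1 α_1 + y_2 α_2 = constant while staying inside the box [0, C], here with C = 1.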

Naïve Bayes Classifier.
The Naïve Bayes (NB) classifier is a probabilistic machine learning model based on Bayes' theorem, which is very useful for classifying large amounts of data. It makes predictions based on the probability that an object belongs to each class. A common input to the model is the batch size (number of experiments carried out) [61], whose value is selected as 100 for better accuracy of the model.
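The probabilistic decision rule described above can be sketched in a few lines. This is a minimal illustration, not the WEKA implementation used in the study: it assumes Gaussian class-conditional densities for each numeric feature, and the two-dimensional feature vectors and function names are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(features, labels):
    """Estimate a log prior, per-feature means and variances for each class."""
    grouped = defaultdict(list)
    for x, y in zip(features, labels):
        grouped[y].append(x)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (math.log(n / len(features)), means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior for feature vector x."""
    def log_posterior(params):
        log_prior, means, variances = params
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
            for value, mean, var in zip(x, means, variances)
        )
        return log_prior + log_likelihood
    return max(model, key=lambda label: log_posterior(model[label]))

# Hypothetical two-dimensional feature vectors (real FC features are 4096-D).
train_x = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.0], [3.0, 0.5], [3.2, 0.4], [2.9, 0.6]]
train_y = ["glioma"] * 3 + ["no-tumor"] * 3
nb_model = fit_gaussian_nb(train_x, train_y)
print(predict_gaussian_nb(nb_model, [1.1, 2.0]))  # glioma
print(predict_gaussian_nb(nb_model, [3.1, 0.5]))  # no-tumor
```

The "naïve" part is the sum over features inside `log_posterior`: each feature contributes independently given the class, which keeps training and prediction cheap even for high-dimensional feature vectors.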

Random Forest Classifier.
The random forest (RF) classifier is a supervised machine learning algorithm that builds a forest out of many decision trees. Its working procedure consists of building simple decision trees and then making the final decision by majority vote. It looks for the best feature at every step (splitting nodes and growing trees), which generally results in a superior model. Batch size, number of features, and seed value are the common inputs of the RF model [61]. For better performance, we have set the batch size to 100 and the seed value to 1.
In this research, the final classified results are expected to be obtained as glioma, meningioma, no-tumor, and pituitary tumors from the MRI datasets. The qualitative and quantitative values of the performance measures are then obtained to evaluate the performance of the proposed model.
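The voting step of the random forest classifier described above can be illustrated with a small sketch. It shows only two ingredients, bootstrap sampling of the training set and majority voting over tree predictions; the tree-building step is omitted, and the function names and the example votes are hypothetical.

```python
import random
from collections import Counter

def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

def majority_vote(tree_predictions):
    """Return the label predicted by the most trees and its vote share."""
    counts = Counter(tree_predictions)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(tree_predictions)

rng = random.Random(1)  # seed value 1, as chosen in the text
sample = bootstrap_indices(10, rng)  # rows one tree would be trained on

# Hypothetical predictions from five trees for a single MRI feature vector.
tree_votes = ["glioma", "pituitary", "glioma", "meningioma", "glioma"]
winner, share = majority_vote(tree_votes)
print(winner, share)  # glioma 0.6
```

Because each tree sees a different bootstrap sample and a different subset of candidate features, the individual errors tend to be less correlated, and the vote is usually more reliable than any single tree.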
3.9. Performance Measurement. A confusion matrix has been utilized to calculate the performance of each classifier. The performance is described by four components of the confusion matrix, namely true positive (TP), true negative (TN), false positive (FP), and false negative (FN), which are counts rather than rates. The values of the performance parameters, such as accuracy, sensitivity, specificity, kappa statistics, false positive rate, precision, f-measure, and Matthews correlation coefficient (MCC), have been calculated for four testing options: the test data itself, the cross-validation fold (CVF) 4 [16,34], the CVF 10 [19], and percentage split (PS) 34% [51] of the test data, by employing Equations (1)-(11) to measure the performance of our proposed model.
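As an illustration of how the parameters listed above follow from the four confusion-matrix counts, the sketch below uses the standard textbook definitions; these may differ in form from Equations (1)-(11), which are not reproduced here. The counts in the example are hypothetical, and for a four-class problem the function would be applied to each class in a one-versus-rest manner.

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Standard metrics from confusion-matrix counts (no zero-division guard)."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Agreement expected by chance, used by Cohen's kappa.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "fpr": fp / (fp + tn),
        "fnr": fn / (fn + tp),
        "precision": precision,
        "f_measure": f_measure,
        "kappa": kappa,
        "mcc": mcc,
    }

# Hypothetical counts for one class versus the rest (400 test images).
metrics = binary_metrics(tp=90, tn=280, fp=20, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that FPR and FNR are the complements of specificity and sensitivity, respectively, so a better classifier has higher accuracy, sensitivity, specificity, precision, f-measure, kappa, and MCC but lower FPR and FNR.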

Different types of curves, such as the receiver operating characteristic (ROC) curve, the precision-recall curve (PRC), and the cost curve, have been plotted to understand the performance of every classifier. The ROC curve quantifies how well a signal can be distinguished from noise by plotting the true positive rate against the false positive rate [28,32]. The PRC is a well-known performance measure for assessing a binary classification model [62]. A cost curve can be defined as a visual curve that represents the performance of the classifier based on the misclassification cost [63].
We have also analyzed cost curves, which describe model performance in terms of the minimum misclassification cost. The minimum misclassification cost corresponds to the minimum area under the lower envelope of the cost curve. Therefore, a smaller area under the lower envelope means a lower misclassification cost and thus better classification performance. Figures 8-11 represent the cost curves for different testing options, namely the test data itself, CVF 4, CVF 10, and PS 34% of the test data, employing the BayesNet, SMO, NB, and RF classifiers for the glioma, meningioma, no-tumor, and pituitary classes, respectively. As seen in Figure 8, the RF classifier clearly offers the minimal misclassification cost, that is, the best classification performance. However, the SMO classifier shows a slightly lower misclassification cost in Figures 9-11.
4.4. Performance Evaluation. Finally, we have evaluated the performance of our models based on their respective confusion matrices. We have calculated the values of accuracy, sensitivity, specificity, FPR, FNR, precision, f-measure, kappa, and MCC and summarized them in Table 4, which presents the values for the test data itself, CVF 4, CVF 10, and PS 34%, respectively.

Confusion Matrix.
For the test data itself, Table 4 shows that the values of accuracy, sensitivity, specificity, precision, f-measure, kappa, and MCC are higher, and the values of FPR and FNR are lower, for the RF classifier than for the other classifiers, showing that the RF classifier gives excellent performance with 100% accuracy.
For the case of CVF 4 of test data, Table 4 shows that the values of accuracy, sensitivity, specificity, precision, f-measure, kappa, and MCC are higher and the values of FPR and FNR are lower for the SMO classifier than those values for the other classifiers, indicating that the SMO classifier gives better performance with 92.50% accuracy. Similar results are seen in the case of CVF 10 of test data, where the SMO classifier gives 92.45% accuracy.
However, for PS 34% of the test data, the table shows that the values of accuracy, sensitivity, specificity, kappa, and MCC are higher, and the values of FPR and FNR are lower, for the RF classifier than for the other classifiers. The values of precision are the same for both the SMO and RF classifiers, and the values of f-measure are very slightly lower for the RF classifier than for the SMO classifier. From these statistical values, we can infer that the RF classifier shows better performance, with 89.27% accuracy.
Table 5 presents a comparison of our proposed model with several contemporary findings in terms of accuracy. As shown in the table, our model compares favorably with these findings.

Conclusion
In this study, a novel and effective technique for feature extraction and classification of brain MRIs is presented for classifying brain tumors into glioma, meningioma, no-tumor, and pituitary classes. An AlexNet CNN combined with BayesNet, SMO, NB, and RF models is employed. The performance of the models is evaluated using the test dataset. From the respective confusion matrices, we have calculated the values of the performance parameters, namely accuracy, sensitivity, specificity, FPR, FNR, precision, f-measure, kappa statistics, MCC, ROC area, and PRC area, for four testing options: the test data itself, CVF 4, CVF 10, and PS 34% of the test data. We have also presented ROC, PRC, and cost curves to measure the classification performance. We have acquired 100%, 98.15%, 88.75%, and 86.25% accuracy using the AlexNet CNN+RF, AlexNet CNN+SMO, AlexNet CNN+BayesNet, and AlexNet CNN+NB models, respectively, for the test data itself. This study presents a promising technique for the classification of brain tumors from MRIs. However, the proposed model was evaluated using a moderately sized dataset, which is one of the limitations of the current work. It is therefore essential to evaluate the model with a larger dataset in the future to see how well it performs. Another limitation is that the proposed approach has not been tested on MRIs of real patients in Bangladesh. In the future, we will try to deploy this work in a real-time medical diagnostic system by acquiring MRIs from different hospitals and diagnostic centers in Bangladesh. We believe that our approach is applicable in the health sector, offering solutions with high precision for medical imaging, especially for diagnosing brain tumors more accurately.

Data Availability
Experimental codes and the data of this work are available from the corresponding author upon request at any time.

Conflicts of Interest
The authors declare no conflicts of interest.