An Efficient Method for Diagnosing Brain Tumors Based on MRI Images Using Deep Convolutional Neural Networks

This paper proposes a system to effectively identify brain tumors on MRI images using artificial intelligence algorithms and the ADAS optimization function. The system is developed to assist doctors in diagnosing one of the most dangerous diseases for humans. The data used in the study are patient images collected from Bach Mai Hospital, Vietnam. The proposed approach includes two main steps. First, we propose a normalization method for brain MRI images that removes unnecessary components without affecting their information content. Next, deep convolutional neural networks are used, and we propose applying the ADAS optimization function to build predictive models on the normalized dataset. The results are then compared to choose the best method. The F1-scores of the evaluated algorithms are all greater than 94%, and the highest value is 97.65%.


Introduction
The brain is a particularly important organ, the control center of the central nervous system, coordinating the activities of all organs and parts of the human body. The brain has a complex structure and is protected by the skull, a very hard bony case. However, although a rigid skull helps protect the brain parenchyma from minor trauma, it does not prevent the development of lesions and abnormal structures within the brain. One of the brain diseases of primary concern in medicine is the brain tumor, a condition in which abnormal cells grow in the brain. Brain tumors are divided into two types: benign brain tumors and malignant brain tumors (cancer) [1]. Whether benign or malignant, a brain tumor affects brain cells, causes brain damage, and can even be life-threatening. There are about 120 different types of brain tumors, most of which arise in the brain tissue; others arise in the meninges, pituitary gland, or cranial nerves. Any form of brain tumor can be dangerous for the patient. Tumors in brain tissue or benign brain tumors often progress slowly, so their symptoms also appear more slowly and insidiously. In contrast, if the brain tumor grows rapidly, the patient experiences more pronounced symptoms in both frequency and extent. With current medical capabilities, early detection of abnormal structures in the patient's brain can improve the likelihood of successful treatment and limit the sequelae of tumors for the brain in general and the patient's health in particular.
Today, the detection of brain tumors is mostly based on the ability of doctors to distinguish abnormalities on MRI images, a high-quality modality in the field of imaging [2]. Detecting and classifying brain diseases and brain tumors in this way requires a lot of experience and concentration. From brain MRI, it is possible to diagnose and recognize many different types of brain tumors and to offer appropriate treatment methods [3]. However, the increasing number of patients and the large number of images obtained have become a major challenge in diagnostic imaging, a field that requires rapid and accurate evaluation of results by doctors. Artificial intelligence technology can help classify diseases from MRI images quickly and with high accuracy. Classifying diseases from MRI images with high accuracy has become much less difficult since the introduction of GPUs (Graphics Processing Units) and image processing based on artificial intelligence (AI).
This research focuses on the application of image preprocessing techniques and the development of algorithms using convolutional neural network (CNN) models, specifically advanced deep learning models such as DenseNet201 [4], ResNet152V2 [5], MobileNetV3 [6], and VGG19 [7]. The research also focuses on developing and applying the ADAS optimization algorithm to improve the accuracy of distinguishing healthy people from brain tumor patients. The dataset in this work includes 1307 brain MRI images in JPEG format that were manually classified by specialists into two categories: brain MRI images of healthy people and brain MRI images of people with brain tumors.
Comparing all experimental results allows the effectiveness of each model to be evaluated. The article is organized as follows. Section 2 presents the previously conducted MRI brain tumor classification studies. Section 3 provides an overview of brain MRI images and the CNN algorithm models used. Section 4 presents the experimental results and gives an evaluation of each algorithm. Conclusions and future work are outlined in Section 5.

Related Work
Several technical methods for brain MRI image classification published since 2017, based on different classification models, are summarized in Table 1. They fall into two basic groups: those using a CNN architecture and those not using one. In [10], the authors divided brain MRI images into two categories: normal images and images with abnormal signs. They used the gray-level co-occurrence matrix (GLCM) to extract features from the MRI images; a probabilistic neural network (PNN) was then used to classify the brain MRI images as normal or abnormal. As a result, they obtained a classification model with an accuracy of 95%. In [14], Ullah et al. proposed a scheme to classify the brain MRI images of healthy people and patients using histogram equalization, discrete wavelet transforms, and feedforward artificial neural networks. Recently, deep learning methods have been widely used for the classification of brain tumors on MRI images [8,9]. Deep learning does not need manual extraction of image features; it combines the extraction and classification stages in a self-learning process. Deep learning requires a dataset, for which normalization of the MRI images is sometimes required, and salient features are then identified during learning [13].
A convolutional neural network (CNN), one of the well-known deep learning techniques for image data, can be used as a feature extraction tool that captures relevant features for the data classification task. Feature maps in the early layers of a CNN extract low-level features, while those in the higher layers extract specific high-level content. Feature maps in the earlier layers encode simple structural information, such as shapes, textures, and edges, while the higher layers combine these low-level features into more abstract representations that integrate local and global information.
Various researchers have proposed using CNNs to classify brain tumors based on brain MRI image datasets [11,21,22]. Deepak and Ameer [12] used a pretrained GoogLeNet to extract features from brain MRI images and a CNN architecture to classify three types of brain tumors, obtaining up to 98% accuracy. Çinar and Yildirim [15] modified the pretrained ResNet50 network by removing the last 5 layers and adding 8 new layers; that method achieved 97.2% accuracy. Saxena et al. [17] used the InceptionV3, ResNet50, and VGG16 network architectures with legacy methods to classify brain tumor data. In this study, the ResNet50 model obtained the highest accuracy, 95%. Díaz-Pernas et al. [18] presented a CNN architecture for automatic segmentation of brain tumors such as glioma, meningioma, and pituitary tumor. They evaluated their proposed model on the T1-weighted contrast-enhanced MRI dataset and obtained an accuracy of 97.3%.
Siddique et al. [16] proposed a model based on a modified VGG-16 network architecture for brain tumor image classification, which achieved an accuracy of 96% and an F1-score of 97%. Abd El Kader et al. [19] developed a differential deep-CNN-based model to classify MRI images with and without tumors. This model was still based on the basic CNN architecture but obtained an accuracy of 99.25% and an F1-score of 95.23%. In [20], the authors deployed transfer learning for several CNN variants to classify MRI images with and without brain tumors; MobileNetV2 had an accuracy of 92% and an F1-score of 92%, InceptionV3 had an accuracy of 91% and an F1-score of 90.98%, and VGG19 had an accuracy of 88% and an F1-score of 88.18%.
In summary, as observed from the above studies, the accuracy obtained by using deep learning with CNN architectures to classify brain MRI images is significantly higher than that of traditional techniques. However, deep learning models require a large amount of training data in order to perform better than traditional machine learning techniques.

Content Contained in MRI Images.
The commonly used standard for MRI images today is DICOM, an acronym for Digital Imaging and Communications in Medicine [23]. This is an industry standard developed to meet the needs of manufacturers and users in connecting, storing, exchanging, and printing medical images.
As for the DICOM image format standard, in addition to the image files, it also includes header files as in Figure 1.
Although stored in different files, the header information is displayed along with the MRI image via a "DICOM browser." Data in MRI images include demographic information, patient information, parameters acquired for imaging studies, image size, and image matrix size. The patient information displayed includes the patient's first and last name, gender, age, date of birth, and the place where the MRI scan was performed.

3.1.2. The Role of MRI Images in the Diagnosis of Brain Tumors. Magnetic resonance imaging of the brain [24] can very clearly detect and describe abnormalities in the brain parenchyma in general, such as vascular tumors, arterial occlusion, and invasion of the venous sinuses, as well as the relationship between a tumor and surrounding structures.
There are three basic image types in MRI: T1W, T2W, and T2-FLAIR. They are used in specific cases depending on the disease.
T1W imaging is mainly used to identify necrotic tumors, hemorrhage in tumors, or cysts. For example, on T1-weighted images, most meningiomas show no difference in signal intensity compared with cortical gray matter.
In the T2W phase, the received signal changes completely; it appears as a fairly homogeneous block of increased signal. T2W imaging is also helpful in evaluating hemorrhages and cysts. In particular, the T2W phase is very useful in reflecting the homogeneity of benign soft tumors or meningiomas.
For Fluid-Attenuated Inversion Recovery (T2-FLAIR), this type of phase image is very useful to evaluate the consequences and effects of edema. Although this finding is not specific for meningiomas in particular, it is very meaningful in the diagnosis as well as the long-term prognosis for the patient.
Overall, the sensitivity and specificity of MRI are very high in the diagnosis of meningiomas. MRI has been shown to be superior in tumor delineation by its relationship to surrounding structures.

Model Architectures
3.2.1. Supervised Learning. Supervised learning [25] is a class of algorithms that predict the output (outcome) of new input data based on previously known (input, outcome) pairs. Such a pair is also known as (data, label). Supervised learning is the most popular group of machine learning algorithms. Mathematically, supervised learning involves a set of input variables $X = \{x_1, x_2, \ldots, x_N\}$ and a corresponding set of labels $Y = \{y_1, y_2, \ldots, y_N\}$, where $x_i$ and $y_i$ are vectors. The data pairs $(x_i, y_i) \in X \times Y$ are called the training dataset. From this training dataset, we need to create a function that maps each element of the set $X$ to a corresponding (approximate) element of the set $Y$:

$$y_i \approx f(x_i), \quad \forall i = 1, 2, \ldots, N. \tag{1}$$

The goal is to approximate the function $f$ well enough that, given a new data point $x$, we can compute its corresponding label:

$$y = f(x). \tag{2}$$
A problem is called classification if the labels of the input data are divided into a finite number of groups.

Convolutional Neural Network Architectures.
Convolutional Neural Network (CNN) [31] is one of the most popular and most influential deep learning models in the computer vision community. CNN is used in many problems such as image recognition and video analysis or for problems in the field of natural language processing and solves most of these problems well.
A CNN includes a set of basic layers such as the convolution layer, nonlinear layer, pooling layer, and fully connected layer. These layers are linked together in a certain order. Basically, an image is passed through the convolution layer and nonlinear layer first; the calculated values are then passed through the pooling layer to reduce the number of operations while preserving the characteristics of the data. The convolution, nonlinear, and pooling layers can appear one or more times in the network. Finally, the data is passed through a fully connected network and softmax to calculate the class probabilities. Table 2 summarizes some typical CNN architectures since 2012. To evaluate and compare network structures, two measures are used: Top-1 accuracy and Top-5 accuracy. Under Top-1 accuracy, a prediction counts as correct only if the class with the highest predicted probability is the true class. Under Top-5 accuracy, a prediction counts as correct if the true class is among the 5 classes with the highest predicted probabilities.
In this study, four different network architectures are used: DenseNet201 [4], ResNet152V2 [5], MobileNetV3 [6], and VGG19 [7]. All four are developments and upgrades of the basic CNN architecture, one of the advanced deep learning models for image classification, and have been verified with high accuracy on the ImageNet image set [32]. These CNN variants are widely used in image recognition and classification problems. All four network architectures consist of two basic parts, the feature extraction layers and the classifier layers. In this research, the input to the network is a 256 × 256 brain MRI image that either does or does not show a brain tumor. The feature extraction layers extract features of brain MRI images such as white matter, gray matter, cerebrospinal fluid, cerebral cortex, and brain tumor. Then, the classification layers synthesize these features into the specific features that distinguish images with tumors from images without tumors to serve the classification process.
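The Top-1 and Top-5 criteria described above can be illustrated with a small sketch. The function name `top_k_accuracy` and the toy probabilities are hypothetical and chosen only for illustration; with four classes, k = 2 stands in for Top-5.

```python
def top_k_accuracy(probabilities, labels, k=1):
    """Fraction of samples whose true label is among the k most probable classes."""
    hits = 0
    for probs, label in zip(probabilities, labels):
        # Class indices sorted by decreasing predicted probability.
        ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical softmax outputs for three samples over four classes.
sample_probs = [
    [0.10, 0.60, 0.20, 0.10],  # true class 1: correct under top-1
    [0.40, 0.30, 0.20, 0.10],  # true class 1: wrong under top-1, correct under top-2
    [0.05, 0.15, 0.30, 0.50],  # true class 0: wrong under both
]
sample_labels = [1, 1, 0]

top1 = top_k_accuracy(sample_probs, sample_labels, k=1)
top2 = top_k_accuracy(sample_probs, sample_labels, k=2)
print(f"top-1: {top1:.3f}, top-2: {top2:.3f}")
```

For a binary problem such as the one in this study, only Top-1 accuracy is informative, since any Top-k with k ≥ 2 is trivially 100%.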
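The two-stage structure described above (feature extraction followed by classification) can be sketched on a toy scale. This is not any of the four architectures used in the study; the 5 × 5 image, the edge-detecting kernel, and the dense-layer weights are hypothetical values picked so that the forward pass can be followed by hand.

```python
import math


def conv2d(image, kernel):
    """Valid cross-correlation, which is what deep learning frameworks call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap):
    """Nonlinear layer: negative responses are set to zero."""
    return [[max(0.0, x) for x in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [
        [
            max(fmap[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(fmap[0]) - size + 1, size)
        ]
        for r in range(0, len(fmap) - size + 1, size)
    ]


def dense(features, weights, biases):
    """Fully connected layer: one weighted sum per output class."""
    return [sum(w * x for w, x in zip(row, features)) + b for row, b in zip(weights, biases)]


def softmax(scores):
    """Turn class scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 5x5 "image" with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # responds to dark-to-bright vertical edges

# Feature extraction stage: convolution -> nonlinearity -> pooling.
pooled = max_pool(relu(conv2d(image, kernel)))
features = [x for row in pooled for x in row]

# Classification stage: fully connected layer -> softmax (hypothetical weights).
weights = [[0.5, 0.0, 0.5, 0.0], [-0.5, 0.0, -0.5, 0.0]]
biases = [0.0, 0.0]
probs = softmax(dense(features, weights, biases))
print(pooled, probs)
```

In the real networks the kernels and dense weights are learned rather than fixed, and the convolution, nonlinearity, and pooling stages are repeated many times before the classifier.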

Optimal Algorithms.
The optimization algorithm is the basis for building a neural network model that "learns" the features (or patterns) of the input data, so that suitable weights and biases can be found to optimize the model. But the question is how to "learn". Specifically, how are the weights and biases found? Simply drawing random weight and bias values a finite number of times and hoping that a solution turns up after some steps is not viable. Therefore, an algorithm is needed that improves the weights and biases step by step, and that is why optimization algorithms were created.
Among optimization algorithms, those belonging to the adaptive family usually converge quickly, while those belonging to the SGD family often generalize well. This study, however, focuses only on the development and application of the ADAM and ADAS algorithms.

ADAM Algorithm: A Method for Stochastic Optimization. ADAM is a combination of Momentum and RMSProp. One of the key components of ADAM is exponentially weighted moving averages (also known as leaky averages) that estimate both the momentum and the second-order moment of the gradient. Specifically, it uses the state variables

$$v_t = \beta_1 v_{t-1} + (1 - \beta_1) g_t, \qquad s_t = \beta_2 s_{t-1} + (1 - \beta_2) g_t^2,$$

where $v$ is the first moment vector, $s$ is the second moment vector, $\beta_1$ and $\beta_2$ are the exponential decay rates of the first and second moment estimates, $t$ is the index of the update step, and $g$ is the gradient. Here $\beta_1$ and $\beta_2$ are nonnegative weight parameters. Popular choices for them are $\beta_1 = 0.9$ and $\beta_2 = 0.999$. This means that the variance estimate moves much more slowly than the momentum term.

Applied Computational Intelligence and Soft Computing
Note that if the values are initialized as $v_0 = s_0 = 0$, the algorithm has a significant initial bias towards smaller values. This problem can be solved by using the fact that $\sum_{i=0}^{t-1} \beta^i = (1 - \beta^t)/(1 - \beta)$ to normalize the terms. Accordingly, the state variables are normalized as follows:

$$\hat{v}_t = \frac{v_t}{1 - \beta_1^t}, \qquad \hat{s}_t = \frac{s_t}{1 - \beta_2^t}.$$

From these estimates, the update equations can be established. First, the gradient is rescaled, similarly to RMSProp [33], to get

$$g'_t = \frac{\eta \, \hat{v}_t}{\sqrt{\hat{s}_t} + \varepsilon},$$

where $\varepsilon$ is a constant, chosen to be $\varepsilon = 10^{-6}$ to balance numerical stability and fidelity, and $\eta$ is the learning rate. From there, the update step is defined as follows:

$$\theta_t = \theta_{t-1} - g'_t,$$

where $\theta$ denotes the model parameters. Looking at the design of ADAM, the inspiration for the algorithm is clear. Momentum and scale are clearly represented in the state variables. Moreover, given RMSProp, it is easy to see that the combination of both terms is quite simple. Finally, the learning rate $\eta$ allows us to control the update step length to solve convergence problems.
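As a minimal sketch of the ADAM procedure described above (moving averages of the gradient and its square, bias correction, rescaled update), the following applies it to a one-dimensional quadratic. The values of β1, β2, and ε follow the text; the function name `adam_minimize`, the objective (θ − 3)², the learning rate 0.1, and the step count are assumptions made only for this illustration.

```python
import math


def adam_minimize(grad, theta0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-6, steps=500):
    """Minimize a scalar function with ADAM, given its gradient `grad`."""
    theta = theta0
    v = 0.0  # first moment estimate (momentum)
    s = 0.0  # second moment estimate (uncentered variance)
    for t in range(1, steps + 1):
        g = grad(theta)
        # Exponentially weighted moving averages of g and g^2.
        v = beta1 * v + (1 - beta1) * g
        s = beta2 * s + (1 - beta2) * g * g
        # Bias correction compensates for the zero initialization of v and s.
        v_hat = v / (1 - beta1 ** t)
        s_hat = s / (1 - beta2 ** t)
        # Rescaled gradient step.
        theta -= lr * v_hat / (math.sqrt(s_hat) + eps)
    return theta


# Hypothetical objective f(theta) = (theta - 3)^2 with gradient 2 * (theta - 3).
theta_star = adam_minimize(lambda th: 2 * (th - 3.0), theta0=0.0)
print(theta_star)
```

Because of the bias correction, the very first step has magnitude close to the learning rate regardless of the gradient scale, which is one reason ADAM is relatively insensitive to how the loss is scaled.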

ADAS Algorithm: Adaptive Scheduling of Stochastic Gradients. ADAS [35] is an optimization algorithm belonging to the family of Stochastic Gradient Descent (SGD) algorithms.
The update rules for ADAS are built on SGD with momentum. In these rules, η is the learning rate, β is the ADAS gain factor, ζ is the knowledge gain hyperparameter, k is the current minibatch, t is the current epoch iteration, l is the convolution block index, G(·) is the average knowledge gain obtained from both mode-3 and mode-4 decompositions, v is the velocity term, and θ is the learnable parameter. The learning rate is calculated relative to the rate of change of the knowledge gained over the training epochs. The learning rate η(t, l) is then further updated by an exponential moving average, called the gain factor, with the hyperparameter β, to accumulate the history of the knowledge gained over the sequence of epochs. In effect, β controls the trade-off between convergence rate and training accuracy of ADAS. ADAS is an adaptive optimization tool for scheduling the learning rate in the training of a CNN. ADAS exhibits a much faster convergence speed than other optimization algorithms. ADAS has demonstrated generalization characteristics (low test loss) on par with SGD-based optimizers, improving on the poor generalization characteristics of adaptive optimizers. In addition to optimization, ADAS introduces new polling metrics for CNN layer removal (quality metrics).

Accuracy and F1-Score.
The classification problem in this study is a binary classification problem, in which one class is MRI images with a brain tumor and the other is MRI images without a brain tumor. This study considers the image class with brain tumor to be positive and the image class without brain tumor to be negative. The parameters True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN) are defined in Table 3.
In this paper, the parameters used to evaluate the effectiveness of the model are accuracy, precision, recall, and F1-score [39]. When building a classification model, the ratio of correctly predicted cases to the total number of cases is always considered. That ratio is called accuracy. Precision answers the question: how many true positives are there out of the total number of positive diagnoses? Recall measures the rate of correctly predicted positive cases across all samples in the positive group. F1-score is the harmonic mean of precision and recall. Therefore, in situations where precision and recall differ greatly, the F1-score balances both values and helps us make an objective assessment. Accuracy, precision, recall, and F1-score are defined by the following equations:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP},$$

$$\text{Recall} = \frac{TP}{TP + FN}, \qquad \text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}.$$
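The four measures can be computed directly from the confusion-matrix counts. The sketch below uses the standard definitions; the function name and the counts in the example are hypothetical and are not results from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 90 tumors found, 15 missed, 10 false alarms, 85 correct negatives.
metrics = classification_metrics(tp=90, fp=10, tn=85, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that the F1-score can equivalently be written as 2TP / (2TP + FP + FN), which makes clear that true negatives do not influence it.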

Experiments and Results
This study compares the results of the network architectures DenseNet201, ResNet152V2, MobileNetV3, and VGG19 before and after data normalization with the ADAM optimization function. Then, the study specifically compares the performance of these architectures with the ADAM and ADAS optimization functions on the same normalized dataset.

Collecting Data.
In this study, the dataset is a set of brain MRI images of 123 patients of all ages with brain tumors at Bach Mai Hospital, Hanoi, Vietnam. Initially, the MRI images were in DICOM format; to remove the patient information contained in the DICOM images and to convert them for machine learning, the DICOM format was converted to the JPEG image format. The size of the converted images is 256 × 256 pixels. The images used during training are T2 pulse sequence images as in Figure 3. Signal intensity in the T2 phase correlates very well with not only homogeneity but also tissue profile. Specifically, with low-intensity signals, the tumor has a more fibrous and stiffer character than the normal parenchyma, for example, a tumor that is fibroblastic in nature, while the more intense sections show a softer characteristic such as a vascular tumor. Therefore, the T2 pulse sequence is considered the pulse sequence that best assesses whether the patient has a brain tumor or not.
From the above 123 patients with brain tumor pathology and 100 healthy persons, 1307 T2 pulse sequence images were selected, of which 647 images showed brain tumors and 660 images did not. The images are all resized to 256 × 256 pixels to serve the training and testing processes of the algorithms.

Normalizing Data
(1) Minimizing Image Redundancy. In the raw MRI image data, there is a rather large black border; this is the air in the field of view of the machine, so it carries no information about the skull being examined. Therefore, it is necessary to remove the black border from the MRI image without affecting the image information content. The skull on an MRI image is usually surrounded by a bright white border, the outer layer of fat around the skull. The MRI image is a grayscale (single-channel) image; each element of the image matrix represents the brightness of a pixel and lies in the range [0, 255]. To remove as much of the black border as possible, the simple method implemented in this study is to find the first pixel with a nonzero value when scanning from left to right, from right to left, from top to bottom, and from bottom to top, as shown in Figure 4. After the coordinates of those pixels are determined, the outer edges are removed. Normalizing images by cutting out the parts that carry no meaning for image classification aims to increase the accuracy of the training process and reduce the training time of the algorithm.
(2) Normalizing Image Size. Normalizing the image size helps to improve the accuracy and efficiency of the algorithm. In this study, the image size is 256 × 256. This size is suitable for AI algorithms and preserves MRI image quality after resizing. Choosing a smaller size would make it difficult for AI algorithms to detect small differences between pixels, affecting the accuracy of the algorithm. A larger size would degrade the quality of the MRI image after resizing, negatively affecting the accuracy and performance of the algorithm. The data are normalized per image data file, for each type of patient MRI image, using Python. The normalized data no longer contain the parts that are insignificant for image classification, which increases the accuracy of the model training process and reduces the training time of the algorithm.
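The border-removal idea described above can be sketched as follows. This is an illustrative reading of the method, not the authors' code: it assumes the background is exactly 0 (as the "first nonzero pixel" criterion implies), represents the grayscale image as a plain list of rows to stay self-contained, and uses the hypothetical name `crop_black_border`. Real scans with background noise would likely need a small threshold instead of a strict nonzero test.

```python
def crop_black_border(image):
    """Return the smallest rectangle of `image` that contains every nonzero pixel."""
    # Rows and columns that contain at least one nonzero pixel.
    rows = [r for r, row in enumerate(image) if any(row)]
    cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
    if not rows:  # completely black image: nothing to keep
        return []
    top, bottom = rows[0], rows[-1]   # first nonzero row from the top / from the bottom
    left, right = cols[0], cols[-1]   # first nonzero column from the left / from the right
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Toy 5x6 grayscale image with a one-pixel black border.
raw = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 255, 200, 0, 0],
    [0, 180, 90, 0, 210, 0],
    [0, 0, 255, 255, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
cropped = crop_black_border(raw)
for row in cropped:
    print(row)
```

Only the outer border is removed; zero-valued pixels inside the bounding box are kept, so the image content itself is untouched.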
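The text does not say which interpolation method was used for resizing. As one possible illustration, the sketch below implements nearest-neighbour resizing on a list-of-rows image; the function name `resize_nearest` is hypothetical, and in practice an imaging library with a smoother interpolation would normally be used.

```python
def resize_nearest(image, new_h, new_w):
    """Resize a 2D image (list of rows) with nearest-neighbour sampling."""
    old_h, old_w = len(image), len(image[0])
    return [
        # Map each target pixel back to the source pixel that covers it.
        [image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


# Toy example: enlarge a 2x2 image to 4x4.
small = [[10, 20], [30, 40]]
big = resize_nearest(small, 4, 4)
for row in big:
    print(row)

# The same call would bring a cropped scan to the 256 x 256 size used in this study:
# normalized = resize_nearest(cropped_scan, 256, 256)
```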

Image Classification Process
Step 1. Preparing the training dataset and feature extraction.
This step is considered an important step in machine learning problems because it provides the input for learning the model of the problem. We must know how to select the good features, remove the bad features or noisy components of the data, and estimate how many dimensions of the data are useful, or in other words how many features to select. If the number of dimensions is so large that calculation becomes difficult, it is necessary to reduce the number of dimensions of the data while maintaining its accuracy (dimensionality reduction).
In this step, the dataset for testing the model also needs to be prepared. Usually, cross-validation is used to divide the dataset into two parts, one for training (training dataset) and the other for testing the model (testing dataset). Two approaches are commonly used in cross-validation: splitting and k-fold. For the above algorithms, the data is divided according to the ratio 6 : 2 : 2, in which 60% of the data is used for training, 20% for validation during training, and the remaining 20% for testing the model after training.
With the dataset consisting of 1307 images (T2-Images) as mentioned above, the image set has been divided according to the ratio 6 : 2 : 2 to serve the training, validation, and testing processes. Specifically, the number of images used includes 813 images for training, of which 414 images do not show brain tumors and 399 images show brain tumors; 239 images for validation, including 121 images showing brain tumors and 118 images not showing brain tumors; 255 images for the test process, including 130 images showing brain tumors and 125 images not showing brain tumors.
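A shuffled 6 : 2 : 2 split can be sketched as below. The function name, the fixed seed, and the file names are hypothetical. An exact 60/20/20 split of 1307 items gives 784/261/262 images, whereas the counts reported above (813, 239, 255) are only approximately 6 : 2 : 2, so the procedure actually used evidently differed in detail (for example, it may have been applied per class or per patient).

```python
import random


def split_dataset(items, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle `items` and split them into train, validation, and test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    n_train = round(len(items) * ratios[0])
    n_val = round(len(items) * ratios[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder, so no item is lost to rounding
    return train, val, test


# Hypothetical file names standing in for the 1307 T2 images.
paths = [f"img_{i:04d}.jpg" for i in range(1307)]
train, val, test = split_dataset(paths)
print(len(train), len(val), len(test))
```

When several slices come from the same patient, splitting at the patient level rather than the image level avoids near-duplicate images appearing in both the training and test sets.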
Step 2. Classifier model. The purpose of training the model is to find a function f(x) with which to label the data. This step is often called learning or training:

$$y = f(x),$$

where x is the feature or input of the data and y is the class label or output. The classification models used here are the supervised learning architectures described above: DenseNet201, ResNet152V2, MobileNetV3, and VGG19.
Step 3. Checking data with model to make prediction.
After finding the classification model in Step 2, in this step, new data will be added to test on the classification model.
Step 4. Evaluating the classification model and selecting the best model.
In the final step, the model is evaluated by assessing the error level on the testing data and the training data. If the results are not as expected, the parameters (tuning parameters) of the learning algorithms must be changed to find a better model, which is then tested and reevaluated. From there, it is possible to choose the best classification model for the problem. All steps mentioned above are described in Figure 5.

Evaluating the Effectiveness of Applying Data Normalization
(1) Results of Training Process. In order to appraise the effectiveness of data normalization, this work evaluates the convergence (accuracy) of network architectures in [...] 99.91%, respectively. However, the validation results of the algorithms showed a marked increase in accuracy after data normalization. Specifically, the validation accuracy of the DenseNet201 network architecture is 94.14% after normalization, higher than the 91.63% obtained before normalization. The validation accuracy of the ResNet152V2 network architecture after normalization is 93.31%, slightly better than the 92.86% before normalization, and the result after normalization is also more stable, as presented in Figure 7. Figure 8 indicates that the validation process of the MobileNetV3 network architecture has higher accuracy after normalization than before, with accuracies of 91.21% and 88.70%, respectively. The validation results of the VGG19 network architecture are similar to those of the three architectures above, with an accuracy of 92.88% after normalization compared to 89.54% before normalization, as shown in Figure 9. It can be seen that all network architectures converge to 90% accuracy after only 40 epochs when they use normalized data.
In this paper, to suit the collected brain MRI image data and to obtain, with the ADAM optimization algorithm, a training process in which the accuracy increases steadily and the loss decreases the most, a different learning rate was used for each network architecture. Specifically, for the network architectures DenseNet201, ResNet152V2, MobileNetV3, and VGG19, the initial learning rates are $\eta_0 = \{3 \times 10^{-6}, 3 \times 10^{-6}, 2 \times 10^{-5}, 3 \times 10^{-6}\}$, respectively. Using these learning rates kept the training process stable and avoided overfitting, as shown in Figures 10 and 11.
In practice, training the model for longer does not always lower the loss function. After a certain number of epochs, the loss function value saturates; it can no longer decrease and may even increase again. That is the overfitting phenomenon. To prevent it and free up computational resources, the training process should be stopped right at that saturation point. In this study, as shown in Figure 11, the values of the loss function for all architectures reach saturation when the number of epochs is 100.
When comparing processing speed on the same resources, Figure 12 shows that all 4 network architectures train faster with normalized image data than with non-normalized image data. Comparison results between algorithms on the datasets before and after normalization are shown in Table 4. Clearly, normalizing the image data enables the network architectures to classify brain tumors with higher accuracy and shorter training time.
(2) Evaluating the Accuracy of Network Architectures Based on F1-Score. After the training process, models of the respective network architectures were generated. In this part, they are evaluated on the test dataset. This dataset includes 255 images, of which 130 show brain tumors and 125 do not.
The results illustrated in Figures 13-16 and the summary data in Table 5 show that all algorithms have an F1-score greater than 92%, with the ResNet152V2 network architecture achieving the highest result.
This is expected to be implemented in practice.
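The text does not state how the saturation point was detected. One common way to automate the idea of stopping at saturation is a patience-based rule on the validation loss, sketched below; the function name, the `patience` and `min_delta` parameters, and the loss values are all hypothetical.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return the 1-based epoch at which a patience-based rule would stop training.

    Training stops once the validation loss has failed to improve on its best
    value by more than `min_delta` for `patience` consecutive epochs. If that
    never happens, the last epoch is returned.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses)


# Hypothetical validation losses: improvement stalls after epoch 4.
losses = [0.90, 0.70, 0.55, 0.50, 0.51, 0.52, 0.50, 0.53, 0.60]
stop = early_stopping_epoch(losses, patience=3)
print(stop)
```

With such a rule, the weights from the best epoch (here epoch 4) are usually the ones kept, rather than those from the epoch at which training stops.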

Comparing the Accuracy of Models Using ADAM and ADAS Optimal Function
(1) Results of Training Process. In this section, the accuracy of the classification network architectures will be evaluated and compared using the ADAM and ADAS optimization functions. e network architectures will execute the training, validation, and testing processes on the same normalized database with the same computational resources.
Similar to the ADAM optimization algorithm, in order to fit the brain MRI image data, it is suitable for the ADAS optimization algorithm and the training process has a steady increase in accuracy and the most uniform decrease in loss. Each network architecture uses its own learning coefficient. In this study, the learning coefficients of DenseNet201, ResNet152V2, MobileNetV3, and VGG network architectures are η 0 � 7e − 3; 5e − 3; 4e − 3; 1e − 2 { }, respectively. With the above input data, the experimental results of network architectures with ADAM and ADAS optimal functions are shown in Figures 17-20, respectively. ese results show that the training accuracy of the network architectures using the ADAM and ADAS optimization algorithms are almost the same with the obtained values being greater than 99%. However, for the results of the validation process of the network architectures, the accuracy when implementing the ADAS optimization algorithm has improved significantly in comparison with when using the ADAM optimal algorithm. Specifically, the accuracy of the validation process for the DenseNet201 network architecture using the ADAS optimization algorithm is 95.39% compared to 94.14% when using the ADAM optimization algorithm. And to achieve accuracy, with the ADAM optimal algorithm, the DenseNet201 network needs 40 epochs while with the ADAS optimization algorithm it only needs 10 epochs. e training validation process of ResNet152V2 network architecture using ADAS and ADAM optimization algorithms has the accuracy of 94.47% and 93.31%, respectively. To achieve 90% accuracy, ResNet152V2 network needs 30 epochs when using ADAM optimal algorithm while with ADAS optimal algorithm it only needs 20 epochs. For the MobileNetV3 network architecture, the accuracy of the validation process when using the ADAS optimization function is 95.39% compared to 91.21% when using the ADAM optimization algorithm. 
The convergence speed with the ADAS optimization function is also much higher than with the ADAM function: to achieve 90% accuracy, MobileNetV3 requires 40 epochs with the ADAM function but only 10 epochs with the ADAS function. The VGG19 architecture behaves like the architectures above, with accuracies of 94.56% and 92.88% for the ADAS and ADAM functions, respectively. To achieve 90% accuracy, the VGG19 network architecture needs 25 and 11 epochs with the ADAM and ADAS functions, respectively. The performance comparison between the ADAS and ADAM algorithms is summarized in Table 6. According to this table and the above analysis, the ADAS optimization algorithm increases the accuracy of the training process and makes the training converge faster. Figure 21 compares the training time of the two optimization functions on the same normalized dataset. For most network architectures, the model training time is shorter with the ADAS function. Only for the ResNet152V2 architecture is the training time with the ADAS function slightly longer than with the ADAM function. This is one of the problems that need to be studied in the future.
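The comparison above rests on two quantities: the gain in validation accuracy and the number of epochs needed to reach a given accuracy. The following is a minimal Python sketch of how both could be computed. The function names and the example accuracy history are illustrative assumptions, not part of the original experiments. The accuracy and epoch values are the ones reported in the text, and DenseNet201 is left out of the epoch comparison because its accuracy threshold is not stated.

```python
from typing import Optional, Sequence


def epochs_to_reach(val_acc: Sequence[float], threshold: float = 0.90) -> Optional[int]:
    """Return the first (1-based) epoch whose validation accuracy reaches
    the threshold, or None if the threshold is never reached."""
    for epoch, acc in enumerate(val_acc, start=1):
        if acc >= threshold:
            return epoch
    return None


# Validation accuracies (%) reported in the text, as (ADAM, ADAS).
reported_val_acc = {
    "DenseNet201": (94.14, 95.39),
    "ResNet152V2": (93.31, 94.47),
    "MobileNetV3": (91.21, 95.39),
    "VGG19": (92.88, 94.56),
}

# Epochs needed to reach 90% accuracy as reported in the text, as (ADAM, ADAS).
reported_epochs_to_90 = {
    "ResNet152V2": (30, 20),
    "MobileNetV3": (40, 10),
    "VGG19": (25, 11),
}

# Gain of ADAS over ADAM in percentage points.
accuracy_gain = {
    name: round(adas - adam, 2) for name, (adam, adas) in reported_val_acc.items()
}

# Ratio of ADAM epochs to ADAS epochs (values above 1 mean ADAS converges faster).
epoch_ratio = {
    name: round(adam / adas, 2) for name, (adam, adas) in reported_epochs_to_90.items()
}

if __name__ == "__main__":
    # Hypothetical accuracy history, used only to illustrate epochs_to_reach.
    example_history = [0.62, 0.81, 0.91, 0.95]
    print(epochs_to_reach(example_history))
    for name in reported_val_acc:
        print(name, accuracy_gain[name], epoch_ratio.get(name))
```

In practice, `epochs_to_reach` would be applied to the per-epoch validation accuracy logged by the training framework for each optimizer.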
(2) Evaluation of F1-Score of Network Architectures Using ADAS Optimization Algorithm. The evaluation through F1-score is performed in the same way as for the ADAM algorithm. Based on Figures 22-25, the F1-score evaluation of the network architectures using the ADAS optimization function is given in Table 7.
Comparing the summary results presented in Tables 5 and 7 shows that the ADAS optimization algorithm has significantly increased the accuracy of the aforementioned models, among which the MobileNetV3 network model gives the highest accuracy of 97.65%. Combined with the results analyzed above, for the problem of brain tumor identification on MRI-T2 images, the ADAS optimization algorithm has significantly improved the accuracy of the training, validation, and testing processes of all the models surveyed in this work and has shortened the training time of those models compared to the ADAM algorithm.
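For reference, the F1-score used in this evaluation is the harmonic mean of precision and recall. The sketch below shows the standard computation from binary labels. It is an illustration under stated assumptions, not the authors' evaluation code: the function names are hypothetical, and the counts in the usage example are made-up values, not results from this study.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int]:
    """Count true positives, false positives, and false negatives,
    treating label 1 as the positive (tumor) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1); each value is 0.0 when its denominator is 0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical labels for illustration only (1 = tumor, 0 = normal).
    y_true = [1, 1, 1, 0, 0, 1, 0, 1]
    y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
    tp, fp, fn = confusion_counts(y_true, y_pred)
    print(precision_recall_f1(tp, fp, fn))
```

Because F1 combines precision and recall, it penalizes both missed tumors and false alarms, which makes it a more informative summary than plain accuracy when the two classes are not perfectly balanced.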

Comparison of Results.
The performance of the proposed system is compared with the most recently published studies mentioned above. The results of that comparison are shown in Table 8. According to this table, the proposed system gives better results in both accuracy and F1-score than the other studies on the same subject. Although those studies use the same variants of the DCNN family, the data normalization and the ADAS optimization function significantly improve the performance of the proposed system compared to those other systems.

Conclusion
This article has focused on applying artificial intelligence algorithms to classify brain tumor patients and healthy people using human brain MRI images. The dataset used consists of MRI images of Vietnamese people, including 123 patients and 100 healthy people. The four network architectures compared experimentally in the study are DenseNet201, ResNet152V2, MobileNetV3, and VGG19. The experimental results show that normalizing the data during initial processing is very important: it significantly increases the accuracy in classifying and detecting patients and reduces the training time of the models. The paper has also shown the efficiency of the ADAS optimization function compared with the very popular ADAM optimization function. In particular, the ADAS algorithm has advantages over the ADAM function in improving accuracy and reducing model training time. Of the four architectures mentioned above, MobileNetV3 is the most efficient. This can be considered the foundation for implementing the above system in practice. However, the system also has the disadvantage that the dataset is still small. In the future, besides collecting more data to increase the accuracy of the system, the research will also develop methods to classify tumors specifically according to their characteristics (benign or malignant) or by type of disease.

Data Availability
The data used to support the findings of this study are available upon request from the corresponding author.

Ethical Approval
This study was approved by the Ethics Committee of the Radiology Center, Bach Mai Hospital (Vietnam), and Hanoi University of Science and Technology (Vietnam).

Conflicts of Interest
The authors declare no conflicts of interest.