Deep Learning-Based Ensembling Technique to Classify Alzheimer's Disease Stages Using Functional MRI

The major problems faced by elderly people are memory loss, difficulty learning new things, and poor judgment. These problems are caused by damage to brain tissue, which may lead to cognitive impairment and eventually to Alzheimer's disease. The detection of mild cognitive impairment (MCI) is therefore important. Usually, MCI is detected only after it has progressed to Alzheimer's disease (AD). AD is irreversible and cannot be cured, whereas mild cognitive impairment (MCI) can be treated. The goal of this research is to diagnose Alzheimer's patients early enough for timely treatment. For this purpose, functional MRI images from a publicly available dataset are used. The scientific community has used various deep learning models for the automatic detection of Alzheimer's subjects. Most of this work addresses binary classification of patients' scans into the MCI and AD stages, and limited work has been carried out on multiclass classification of Alzheimer's disease into up to six stages. This study is divided into two steps. In the first step, binary classification of the subject's scan is performed using a Custom CNN. The second step uses different deep learning models along with the Custom CNN for multiclass classification of a subject's scan into one of the six stages of Alzheimer's disease. The models are evaluated using different evaluation metrics, and the overall result is improved using the max-voting ensembling technique. The experimental results show that an overall average accuracy of 98.8% is achieved for Alzheimer's stage classification.


Introduction
The human brain contains about 86 billion neurons, which are responsible for establishing communication and passing information between different parts of the brain [1]. A disorder or malfunction of these neurons causes serious brain diseases. Alzheimer's is a progressive neurodegenerative brain disease that results in the death of neurons, causing a loss of the functions performed by the brain cells. Alzheimer's is characterized by the deposition of protein layers around the nerve cells. The intertwining of impaired nerve fibers within and outside the brain's nerve cells is the damage associated with the disease. Alzheimer's disease affects the hippocampus, and the ventricles of the brain start expanding. These changes are used to detect the early stages of the disease [2]. In the human brain, the cerebral cortex is responsible for logical reasoning, thinking, and dealing with social activities. The higher stages of Alzheimer's disease cause shrinkage of the cerebral cortex. Because of this shrinkage, a person becomes dependent on his or her caregivers [3]. The disorder is inevitably fatal.
The symptoms of Alzheimer's disease can be used to make a diagnosis. However, in some cases, the symptoms of AD remain hidden for about twenty years. Alzheimer's disease has several stages [4], which are as follows: (i) Control normal (CN) is the first stage, where no symptoms of the disease are shown. (ii) Significant memory concern (SMC) is the next stage, characterized by minor memory-related issues that are difficult to detect and are similar to normal age-related problems. (iii) The early mild cognitive impairment (EMCI) stage causes difficulty in arranging items and planning new things. (iv) The distinguishable symptoms of the disease become visible in the fourth stage, called the mild cognitive impairment (MCI) stage. Here, the patient has trouble solving simple math problems or managing financial tasks. By the end of the MCI stage, the volume of gray matter in the brain is reduced [5]. (v) In the late mild cognitive impairment (LMCI) stage, the person has problems remembering details, needs help from guardians to manage daily tasks, and has difficulty coping with his or her surroundings. (vi) In the last stage, Alzheimer's disease (AD), the person becomes unable to interact with his or her environment. The last stage often results in the patient's death [4]. The conversion from one stage of AD to another depends on the patient's condition, and the symptoms appearing at any particular stage may not be the same for all patients.
Alzheimer's disease primarily affects people over the age of 65 and is a leading cause of death worldwide [6]. Today, Pakistan ranks as the sixth most Alzheimer's-affected country in the world, with 0.15 million to 0.2 million cases [7]. The US is one of the countries with the highest number of Alzheimer's cases, reported to be 5.8 million. Studies show that Alzheimer's cases in the UK will reach up to 13.8 million by mid-century [8]. Currently, 47 million people are affected worldwide [9], and this number is predicted to reach 131.5 million by 2050 [7]. Since the disease affects a significant percentage of society, it is important to devise ways to detect it as early as possible. While Alzheimer's disease cannot be reversed, patients' progression from the MCI stage to the AD stage can be slowed. Early diagnosis of AD is therefore highly desirable to enhance patients' quality of life.
Neuroimaging has made it easier to assess the abnormal brain changes related to AD. Different approaches use positron emission tomography (PET), computed tomography (CT), structural magnetic resonance imaging (sMRI), electroencephalography (EEG), single-photon emission computed tomography (SPECT) images, brain scans, blood samples, and cerebrospinal fluid (CSF) biomarkers. This research uses functional magnetic resonance imaging (fMRI), which is noninvasive. fMRI is used to measure functional connectivity between different brain regions; it detects the changes in blood oxygenation levels that occur in response to neural activity in the brain. Because it can detect the effect of even the slightest body movement on the brain, fMRI is used in this work [10].
Hongfei Wang et al. proposed a 3D DenseNet together with ensembling techniques for the classification of a subject's MRI scan into the AD, MCI, and CN stages [11]. In this 3D CNN, each layer is directly connected to all subsequent layers to increase the information flow. The results of different 3D dense networks are then combined using a probability fusion-based ensembling technique. Binary classification of AD vs. MCI, AD vs. normal, and MCI vs. normal gives accuracies of 93.61%, 98.83%, and 98.42%, respectively, while ternary classification of AD vs. MCI vs. normal gives an accuracy of 97.52%. However, the proposed method is not tested for multiclass classification of all the stages of AD. Modupe Odusami et al. proposed the use of functional MRI for binary classification of seven pairs of AD stages: EMCI/LMCI, AD/CN, CN/EMCI, CN/LMCI, EMCI/AD, LMCI/AD, and MCI/EMCI [12]. A modified version of ResNet-18 with a dropout of 0.2 is used to avoid overfitting. The proposed model works best for classifying the intermediate stages of MCI; however, its accuracy for AD vs. CN is comparatively low.
This research uses the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset (https://adni.loni.usc.edu/) for training and testing. We used data from 142 subjects, with a different number of scans in each of the six classes of Alzheimer's disease. Patients aged between 55 and 65 were selected. Data augmentation techniques (such as flipping, rotation, mirroring, and padding) are applied to obtain different versions of a single image, and for fair results, an equal number of samples is taken from each class of the dataset. We employed VGG-16 as our primary framework and carried out experiments using two strategies. The first strategy randomly initializes the network weights and trains VGG-16 from scratch. The second strategy employs two transfer learning approaches that initialize the weights from a pretrained model: (i) adding an extra convolutional layer and dropout and (ii) fine-tuning the network's convolutional layers with our dataset.
VGG-16, ResNet-18, AlexNet, Inception v1, and Custom CNN are ensembled for multiclass classification of Alzheimer's disease. The results show that an accuracy of 98.8% is achieved using the max-voting ensembling technique.
The main contributions of this research work are as follows: (1) we present a Custom CNN model for the multiclass classification of Alzheimer's disease; (2) we use an ensembling approach to combine multiple models and improve their predictive performance on the multiclass classification problem; (3) we performed binary classification of 9

Literature Review
Multiclass classification of AD using different deep learning models is the main objective of this research. To this end, previous work is reviewed in two categories: (i) work using deep learning, transfer learning, and artificial neural network techniques and (ii) work using machine learning.

Deep Learning and Transfer Learning.
The method in [13] used the ResNet-18 architecture for multiclass classification of Alzheimer's stages and was tested using resting-state fMRI of 138 subjects from a publicly available dataset. A Siamese network combining two parallel VGG-16 networks was proposed in [14]. To extract maximum features from a small dataset of 382 subjects, an extra convolutional layer was added to the VGG-16 architecture. The resulting SCNN model was designed to distinguish between the four early classes of dementia. In [5], the authors used the Xception transfer learning architecture and custom CNN architectures for the binary classification of AD and MCI, comparing results across two different image modalities. A transfer learning method that trains models with the AlexNet architecture was proposed in [15]. It used both segmented (gray matter, white matter, and cerebrospinal fluid) and unsegmented images from the OASIS dataset for binary and multiclass classification of the four early stages. A comparison of the segmented and unsegmented approaches showed that the latter performed best, with an accuracy of 92.8%. Another method for AD detection used a random neural network cluster on fMRI images [16]. The technique analyzes five different neural networks (backpropagation (BP) NN, Elman NN, PNN, learning vector quantization (LVQ) NN, and competitive NN) and, based on accuracy (92.31%), selects the Elman NN as the base classifier for feature selection. Finally, 23 abnormal brain regions were identified using significant features extracted through the Elman NN. This method differentiates between two classes, healthy control and AD, and was tested on the ADNI dataset.

Machine Learning Techniques.
A graph theory-based method that used both sMRI and fMRI data to generate input features for support vector machines (SVMs) was presented in [17]. The goal was to classify subjects as having MCI that converts to AD (MCI-C), MCI that does not convert to AD (MCI-NC), healthy control, or AD. The results were computed using two feature selection algorithms, SFC (sequential feature collection) and DCA (discriminant correlation analysis), which achieved accuracies of nearly 56% and 49%, respectively, on the ADNI dataset; a limitation is that the method did not cover all the stages of AD. Bi et al. used multiple SVMs to distinguish mild cognitive impairment (MCI) from healthy controls (HC) [18]. The brain was partitioned into ninety regions of interest (ROIs) for functional connectivity analysis, using the automated anatomical labeling (AAL) template. The technique was tested on ADNI data from 93 MCI and 105 HC subjects. The researchers proposed using both rs-fMRI and graph theory for the binary classification of MCI and HC.
In [19], a technique based on a combination of features was presented for the multiclass classification of AD into three categories: AD, normal individuals, and MCI. The researchers combined clinical features, such as the Functional Activities Questionnaire (FAQ), with imaging features, such as gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) measures, to generate a hybrid feature vector, which was then processed using different feature extraction techniques. The method achieved an accuracy of 79.8% for multiclass classification but does not cover all the stages of AD. Gupta et al. proposed another feature combination technique in [20]. This method combines hippocampal shape and texture, cortical thickness measurements, and volumetric measurements to classify three stages: AD, MCI, and healthy control. Classification is performed with a linear discriminant analysis (LDA) classifier and tested on T1-weighted MRI scans from two datasets (ADNI and AIBL). Recently, another feature fusion technique based on structural MRI was presented in [21]. It combines three distinct features, namely hippocampal volume (HV), cortical and subcortical segmented areas, and voxel-based morphometry (VBM), into one feature for the classification of 326 subjects, and it is trained to distinguish healthy individuals from AD patients and MCI subjects.
Recent work in this field includes the detection of the early stages of Alzheimer's disease. Models have been designed either to distinguish normal individuals (healthy controls) from patients with AD or to classify multiple stages. Most of the time, classification is done using structural MRI or PET imaging, and AD classification using fMRI is limited. Because fMRI scans provide rich information about brain function, capture the resting state of the brain, and provide cognitive details that are helpful for AD classification, this study uses functional magnetic resonance imaging (fMRI) to classify Alzheimer's disease.

Dataset Description.
Journal of Healthcare Engineering
In this research, the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset (https://adni.loni.usc.edu/) is used because it provides good-quality images. ADNI is a well-known neuroimaging study that encourages researchers to perform comprehensive analyses with its generic dataset and to exchange valid results with researchers throughout the world. This work used data from 142 subjects, with a different number of scans in each of the six classes. Data of patients aged 55 to 65 are selected. Table 1 contains a detailed description of the subjects. The dataset contains images from three different views: axial, coronal, and sagittal. Figure 1 shows a sample image from the ADNI dataset.

Methodology.
The methodology applied in this study uses a medical image processing pipeline to classify the input image. The first step preprocesses the dataset to eliminate anomalies and noise and to make the entire dataset uniform. The next step performs classification by extracting features with a Custom CNN. In this study, we performed both binary and multiclass classification of AD. For binary classification, the Custom CNN assigns a subject's scan to one of the two classes in each of the following nine pairs: AD vs. CN, MCI vs. AD, CN vs. MCI, AD vs. SMC, EMCI vs. AD, LMCI vs. AD, CN vs. SMC, EMCI vs. CN, and LMCI vs. CN. For multiclass classification, different CNN models are used along with the Custom CNN, and the results of these models for classifying AD into one of the six possible stages are then ensembled. Figure 2 depicts the detailed methodology.
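To make the two classification settings concrete, here is a minimal Python sketch of how the labels could be organized. The stage order (CN, SMC, EMCI, MCI, LMCI, AD) follows the Introduction and the nine pairs are written as in the text; the integer indices, the (image, stage) sample format, and the names `STAGES`, `BINARY_TASKS`, `binary_subset`, and `multiclass_labels` are illustrative assumptions, not the authors' code.

```python
# Hypothetical label bookkeeping for the binary and six-class tasks.

# Six AD stages in the order given in the Introduction (index = class label).
STAGES = ["CN", "SMC", "EMCI", "MCI", "LMCI", "AD"]
STAGE_TO_INDEX = {stage: i for i, stage in enumerate(STAGES)}

# The nine binary tasks, written as (label-0 class, label-1 class).
BINARY_TASKS = [
    ("AD", "CN"), ("MCI", "AD"), ("CN", "MCI"),
    ("AD", "SMC"), ("EMCI", "AD"), ("LMCI", "AD"),
    ("CN", "SMC"), ("EMCI", "CN"), ("LMCI", "CN"),
]


def binary_subset(samples, task):
    """Keep (image, stage) pairs belonging to one binary task.

    Returns (image, label) pairs: label 0 for task[0], label 1 for task[1].
    Samples from the other four stages are dropped.
    """
    first, second = task
    subset = []
    for image, stage in samples:
        if stage == first:
            subset.append((image, 0))
        elif stage == second:
            subset.append((image, 1))
    return subset


def multiclass_labels(stages):
    """Map stage names to integer labels 0..5 for the six-class task."""
    return [STAGE_TO_INDEX[s] for s in stages]


if __name__ == "__main__":
    toy = [("img0", "CN"), ("img1", "AD"), ("img2", "EMCI"), ("img3", "CN")]
    print(binary_subset(toy, ("AD", "CN")))
    print(multiclass_labels(["CN", "AD", "MCI"]))
```

Each binary task therefore trains on only two of the six stages, while the multiclass models see all six labels at once.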

Preprocessing Techniques.
A preprocessing pipeline consisting of standardized methods is used in this study. Preprocessing is divided into a number of sequential steps. The dataset contains fMRI scans in NIfTI format. This work uses the Functional Magnetic Resonance Imaging of the Brain (FMRIB) Software Library (FSL).
(1) Reorientation. In the first step, the files are reoriented so that all images are displayed in the same way when viewed. This includes 90°, 180°, and 270° rotations of images about different axes. (2) Skull Stripping. Skull stripping is performed to remove cranial or bony parts from the scans, including the eyes and neck tissues. This is done using FSL-BET, which works on the basis of a fractional intensity threshold between 0 and 1, as shown in equation (1).
Here, I is the input image to which the FSL-BET function f is applied. Thresholding is used to separate dark pixels (background, skull, and cavities) from bright pixels (brain, skin, eyeballs, and facial tissues).
(3) Motion Correction. A major problem during fMRI data collection is the participant's motion, such as shaking the head left or right, which badly affects the quality of the collected data. To reduce the effect of the subject's motion, motion correction is performed using FSL-MCFLIRT: a reference image is selected from the series, and each image is registered in turn to this fixed reference. (4) Slice Timing Correction. An fMRI volume consists of slices acquired at different moments. Each 3D brain volume is obtained by stacking 2D slice images acquired over a period ranging from a fraction of a second to several seconds, depending on the number of slices and their resolution, so the slices within a volume are offset in time. Slice timing correction removes these offsets by temporally aligning all slices to a reference time point. The FEAT module of FSL is used for this correction; interpolation adjusts each voxel's time series temporally by estimating values between the sampled time points. (5) Spatial Smoothing and Normalization. Spatial smoothing is applied to decrease the noise level while retaining the underlying signal: each voxel's intensity is replaced by a weighted average of its own intensity and those of neighboring voxels within a set radius. Spatial smoothing is performed using a Gaussian kernel with a full width at half maximum (FWHM) of 6 mm.
To remove noise and physiological artifacts (such as those caused by breathing and heartbeat), temporal high-pass filtering with a cutoff frequency of 0.01 Hz is used. For spatial normalization, the images are registered to the MNI-152 template, which is an average of 152 MRI scans. This is done by applying a linear (affine) transformation with 12 degrees of freedom (DOF) using the FSL FLIRT module.
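The temporal and spatial steps above can be sketched in NumPy on a synthetic 4D array (x, y, z, t). This is not FSL: linear interpolation stands in for FEAT's slice timing correction, a separable Gaussian filter illustrates the 6 mm FWHM smoothing (σ = FWHM / (2√(2 ln 2)) ≈ 2.548 mm), and an FFT-based filter is swapped in for FSL's high-pass filter. The repetition time (3 s), voxel size (3 mm), sequential slice order, and all function names are hypothetical values chosen for illustration.

```python
import numpy as np


def slice_timing_correct(data, tr, slice_offsets, ref_offset=0.0):
    """Resample each slice's time series to a common reference time.

    data: 4D array (x, y, z, t). slice_offsets[z] is when slice z is acquired
    within each TR, in seconds. Linear interpolation is used for simplicity;
    np.interp holds edge values constant outside the sampled range.
    """
    nx, ny, nz, nt = data.shape
    out = np.empty(data.shape, dtype=float)
    target_times = np.arange(nt) * tr + ref_offset
    for z in range(nz):
        acquired_times = np.arange(nt) * tr + slice_offsets[z]
        series = data[:, :, z, :].reshape(-1, nt)
        corrected = np.array([np.interp(target_times, acquired_times, s) for s in series])
        out[:, :, z, :] = corrected.reshape(nx, ny, nt)
    return out


def fwhm_to_sigma(fwhm_mm, voxel_mm):
    """Convert a Gaussian FWHM in mm to a standard deviation in voxels."""
    return fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0))) / voxel_mm


def gaussian_smooth_3d(volume, sigma_vox):
    """Separable Gaussian smoothing of a 3D volume, edge-padded so shapes are kept."""
    radius = max(1, int(np.ceil(3 * sigma_vox)))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma_vox) ** 2)
    kernel /= kernel.sum()  # weights sum to 1, so a constant volume is unchanged
    out = volume.astype(float)
    for axis in range(3):
        out = np.apply_along_axis(
            lambda v: np.convolve(np.pad(v, radius, mode="edge"), kernel, mode="valid"),
            axis,
            out,
        )
    return out


def highpass_fft(data, tr, cutoff_hz=0.01):
    """Zero temporal frequencies below cutoff_hz along the last axis.

    The voxel-wise mean is added back so each voxel keeps its baseline.
    """
    nt = data.shape[-1]
    freqs = np.fft.rfftfreq(nt, d=tr)
    spectrum = np.fft.rfft(data, axis=-1)
    spectrum[..., freqs < cutoff_hz] = 0
    filtered = np.fft.irfft(spectrum, n=nt, axis=-1)
    return filtered + data.mean(axis=-1, keepdims=True)


if __name__ == "__main__":
    TR = 3.0        # hypothetical repetition time (s)
    VOXEL_MM = 3.0  # hypothetical isotropic voxel size (mm)
    rng = np.random.default_rng(0)
    bold = rng.normal(100.0, 1.0, size=(16, 16, 8, 140))
    offsets = np.linspace(0.0, TR, 8, endpoint=False)  # sequential ascending slices
    bold = slice_timing_correct(bold, TR, offsets)
    sigma = fwhm_to_sigma(6.0, VOXEL_MM)
    bold = np.stack([gaussian_smooth_3d(bold[..., t], sigma) for t in range(bold.shape[-1])], axis=-1)
    bold = highpass_fft(bold, TR)
    print(bold.shape, round(sigma, 3))
```

A 0.01 Hz cutoff corresponds to a 100 s period, so slow scanner drifts are removed while faster fluctuations in the BOLD signal are kept.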

(8) 3D to 2D Conversion. The preprocessing steps above produce fMRI data of size 64 × 64 × 48 × 140, that is, 140 volumes (time points) per scan, each volume consisting of 48 slices of 64 × 64 pixels. The first and last 10 slices of each scan are removed because they contain no functional information and are just black. Each remaining slice is then converted into 2D images along the image height and time axis; this is useful because neural networks work well with 2D images. Figure 3 shows how each slice of an image is saved as a separate layer in PNG format.
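A minimal NumPy sketch of the 3D-to-2D step follows, under the interpretation that every remaining (slice, time point) pair becomes one 64 × 64 image, which matches the 64 × 64 network input described under Classification Stage. The function name `fmri_to_slices` and the min-max scaling to 8-bit values are assumptions.

```python
import numpy as np


def fmri_to_slices(bold, drop=10):
    """Split a 4D fMRI array (x, y, z, t) into 2D 8-bit images.

    The first and last `drop` slices along z are discarded (they are assumed
    to be empty); every remaining (slice, time point) pair becomes one image.
    """
    kept = bold[:, :, drop:bold.shape[2] - drop, :]
    images = []
    for z in range(kept.shape[2]):
        for t in range(kept.shape[3]):
            img = kept[:, :, z, t].astype(float)
            lo, hi = img.min(), img.max()
            # Min-max scale to 0..255; an all-constant slice maps to zeros.
            scaled = (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)
            images.append(np.round(scaled * 255).astype(np.uint8))
    return images


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scan = rng.random((64, 64, 48, 5))  # 5 time points instead of 140, to keep it small
    slices = fmri_to_slices(scan)
    print(len(slices), slices[0].shape, slices[0].dtype)
```

With the stated dimensions, 48 − 2 × 10 = 28 slices remain, so a full scan yields 28 × 140 = 3,920 images. Each array can then be written to PNG with an imaging library, for example Pillow's `Image.fromarray(img).save(path)`.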
A balanced dataset is necessary for good classification results. Data augmentation techniques such as flipping, rotation, mirroring, and padding are applied to obtain different versions of a single image, and for fair results, an equal number of samples is taken from each class of the dataset. Table 2 shows the number of images for each class before and after data augmentation.
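The augmentation and balancing described above can be sketched as follows. The paper does not specify the exact transforms or parameters, so the choices here (vertical flip, horizontal mirror, 90° rotation, 4-pixel zero padding followed by a crop back to the original size, and downsampling every class to the size of the smallest class) and the function names are assumptions.

```python
import numpy as np


def augment(image, pad=4):
    """Return the original 2D image plus four augmented versions of the same size."""
    h, w = image.shape
    padded = np.pad(image, pad, mode="constant")  # zero padding on all sides
    return [
        image,
        np.flipud(image),      # flipping (top-bottom)
        np.fliplr(image),      # mirroring (left-right)
        np.rot90(image, k=1),  # 90-degree rotation (square images keep their shape)
        padded[:h, :w],        # pad, then crop back: content shifts down/right by `pad`
    ]


def balance(images_by_class, seed=0):
    """Randomly keep the same number of images per class (the smallest class size)."""
    rng = np.random.default_rng(seed)
    n = min(len(images) for images in images_by_class.values())
    return {
        label: [images[i] for i in rng.choice(len(images), size=n, replace=False)]
        for label, images in images_by_class.items()
    }


if __name__ == "__main__":
    img = np.random.default_rng(0).integers(0, 256, size=(64, 64), dtype=np.uint8)
    dataset = {"CN": [], "AD": []}
    for label, count in (("CN", 6), ("AD", 4)):
        for _ in range(count):
            dataset[label].extend(augment(img))
    balanced = balance(dataset)
    print({k: len(v) for k, v in balanced.items()})
```

Balancing after augmentation, rather than before, keeps as many distinct variants as possible in the smaller classes.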

Classification Stage.
Different deep learning models are used for classification. For all models, the input is a 64 × 64 grayscale image. The models are described as follows: (1) VGG-16. The Visual Geometry Group (VGG) network is a well-known convolutional neural network with convolutional, pooling, and fully connected layers. Convolutional layers detect the presence of features in the input image. Pooling layers downsample the feature maps; each pooling layer operates independently on every feature map and produces the same number of pooled feature maps. We used VGG-16, which has 16 weight layers: a stack of 13 convolutional layers followed by three fully connected layers. Each convolutional layer uses 3 × 3 filters with stride 1 and the rectified linear unit (ReLU) activation function. To reduce the size of the feature maps, max pooling with a 2 × 2 pool and stride 2 is applied after some of the convolutional layers.
The results are then flattened and passed through two fully connected layers with 4096 channels each, followed by a softmax layer with six output neurons for the six classes. The detailed architecture of the VGG-16 model used for Alzheimer's classification is shown in Figure 4. (2) ResNet-18. As a network increases in depth, accuracy degradation and vanishing gradients become a problem. Residual networks (ResNets), such as ResNet-18, were introduced to solve this problem. They differ from plain networks in their use of skip connections, which add the input of a block to the block's output. Because the input is carried forward unchanged, later layers retain access to the features learned by earlier layers. In general, a skip connection spans every two convolutions. We trained ResNet-18, which has 17 convolutional layers and 1 fully connected layer. The network uses 3 × 3 kernels with stride 1. Layers whose output feature maps have the same size use the same number of filters, and the number of filters is doubled whenever the feature map size is halved. The output of these layers is passed to an average pooling layer with pool size 8, followed by a flattening layer.
Figure 5 shows the architecture of the ResNet-18 model used for the classification of Alzheimer's disease.
(3) AlexNet. This network has 8 layers: 5 convolutional layers and 3 dense layers. It supports multi-GPU training by placing half of the neurons on a second GPU. The use of dropout and local response normalization (LRN) distinguishes this network. The architecture of the model used for Alzheimer's classification is provided in Figure 6. The input image is convolved with 96 filters of size 11 × 11, followed by a max pooling layer with a 2 × 2 pool. In the second convolutional layer, 256 filters of size 5 × 5 are convolved with the 32 × 32 input. Each convolutional layer is followed by a max pooling layer, except the last two, which are stacked. Across the three dense layers, every neuron of one layer is connected to all neurons of the next, as shown in Figure 6. Finally, the output is classified using the softmax function, which converts the outputs into probabilities that sum to 1. (4) Inception v1/GoogLeNet. Inception v1, or GoogLeNet, differs in that it broadens rather than deepens the network and is characterized as a "sparse" architecture. Instead of using a single filter size, it applies kernels of different sizes at the same level: convolution is performed with 1 × 1, 3 × 3, and 5 × 5 filters, together with 3 × 3 max pooling with stride 1, and the outputs of all branches are concatenated for the next layer. Figure 7(a) shows the naive version of the inception module. To reduce the computational cost of the network, the input is convolved with 1 × 1 filters before the larger filters, which reduces the depth (number of channels) of the feature maps. (5) Custom CNN. … the network with our dataset. In this approach, we used the weights of the VGG-16 network as the starting point for fine-tuning the layers. Figure 8 shows the architecture of the Custom CNN used for Alzheimer's disease classification.
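The feature-map sizes quoted above follow from the standard output-size formula floor((W − K + 2P) / S) + 1. The sketch below traces a 64 × 64 input through VGG-16's five pooling stages (assuming padding of 1 for the 3 × 3 convolutions so they preserve size), checks that a 32 × 32 input to AlexNet's second convolution is consistent with an 11 × 11 first convolution using stride 1 and padding 5 (an assumption; the paper does not state these values), and compares the multiplication count of a 5 × 5 convolution with and without a 1 × 1 reduction. The channel counts 192 → 16 → 32, the 8 × 8 map size, and the function names are illustrative, not taken from this paper.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer: floor((W - K + 2P) / S) + 1."""
    return (size - kernel + 2 * padding) // stride + 1


def vgg16_spatial_sizes(input_size=64):
    """Feature-map size after each of VGG-16's five blocks.

    Assumes 3x3 convolutions with stride 1 and padding 1 (size preserved),
    and 2x2 max pooling with stride 2 at the end of each block.
    """
    sizes = []
    size = input_size
    for _ in range(5):
        size = conv_output_size(size, 3, 1, 1)  # convolution keeps the size
        size = conv_output_size(size, 2, 2)     # max pooling halves it
        sizes.append(size)
    return sizes


def conv_mults(h, w, k, c_in, c_out):
    """Multiplications for a k x k convolution producing an h x w x c_out map."""
    return h * w * k * k * c_in * c_out


def bottleneck_saving(h, w, c_in, c_reduce, c_out, k=5):
    """Cost of a k x k convolution without and with a preceding 1x1 reduction."""
    naive = conv_mults(h, w, k, c_in, c_out)
    reduced = conv_mults(h, w, 1, c_in, c_reduce) + conv_mults(h, w, k, c_reduce, c_out)
    return naive, reduced


if __name__ == "__main__":
    print("VGG-16 sizes:", vgg16_spatial_sizes(64))
    after_conv1 = conv_output_size(64, 11, stride=1, padding=5)
    print("AlexNet conv1 -> pool:", after_conv1, conv_output_size(after_conv1, 2, stride=2))
    print("5x5 branch cost (naive, with 1x1 reduction):", bottleneck_saving(8, 8, 192, 16, 32))
```

Under these assumptions, VGG-16 ends with a 2 × 2 × 512 map (2,048 features) before its fully connected layers, and the 1 × 1 reduction cuts the 5 × 5 branch from 9,830,400 to 1,015,808 multiplications, roughly a tenfold saving.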

Ensemble.
Ensembling is used to reduce the variance of the deep learning models' predictions of Alzheimer's stages. During training, each model learns a distinct set of weights, which in turn produce different outputs. To overcome this variation, several models are trained, and their results are combined for the final prediction. Four ensembling methods are used in this work: stacking, blending, averaging, and max voting. For Alzheimer's disease classification, VGG-16, ResNet-18, AlexNet, Inception V1, and Custom CNN are ensembled as shown in Figure 9. The ensembles are tested with stacking, blending, averaging, and max voting; the outcomes are presented in the next section and evaluated using well-known metrics.

Results and Discussion
This work aims to classify Alzheimer's patients into one of six stages by training different models. For this purpose, we trained VGG-16, ResNet-18, AlexNet, Inception V1, and Custom CNN on the ADNI dataset. A layer-wise output of VGG-16 on the ADNI dataset is shown in Figure 10.
The performance of the trained models is evaluated using different evaluation metrics, for which the following terms are used:
True Positive (TP). The number of cases in which the model predicts a given Alzheimer's stage and the scan actually belongs to that stage.
False Positive (FP). The number of cases in which the model predicts a given Alzheimer's stage but the scan actually belongs to a different stage.
True Negative (TN). The number of cases in which the model predicts that a scan does not belong to a given stage and the scan indeed does not belong to it.
False Negative (FN). The number of cases in which the model predicts that a scan does not belong to a given stage but the scan actually belongs to it.
Using these values, the evaluation metrics for Alzheimer's disease classification are defined as follows. Accuracy. Accuracy measures how often the model correctly predicts the stage of Alzheimer's disease and is defined as follows:

$$\text{Accuracy } (A) = \frac{TP + TN}{N},$$
where $N = TP + TN + FP + FN$ is the number of samples in the ADNI dataset.
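The paper's formulas for precision, recall, specificity, and F1 score are not reproduced in this excerpt, so the sketch below uses the standard one-vs-rest definitions: for each stage k, TP, FP, FN, and TN are read off a confusion matrix, and precision = TP/(TP + FP), recall = TP/(TP + FN), specificity = TN/(TN + FP), and F1 = 2PR/(P + R). It also shows that, in the multiclass case, the per-class value (TP + TN)/N differs from overall accuracy (correct predictions / N). The function names are hypothetical.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true stages, columns are predicted stages."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def safe_div(num, den):
    """Element-wise num / den, returning 0 where den == 0."""
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)


def classification_metrics(y_true, y_pred, n_classes):
    """One-vs-rest TP/FP/FN/TN per stage and the metrics derived from them."""
    cm = confusion_matrix(y_true, y_pred, n_classes)
    n = cm.sum()
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as stage k, actually another stage
    fn = cm.sum(axis=1) - tp  # actually stage k, predicted as another stage
    tn = n - tp - fp - fn
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # also called sensitivity
    specificity = safe_div(tn, tn + fp)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {
        "overall_accuracy": tp.sum() / n,
        "ovr_accuracy": (tp + tn) / n,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


if __name__ == "__main__":
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 0]
    m = classification_metrics(y_true, y_pred, n_classes=3)
    print("overall accuracy:", m["overall_accuracy"])
    print("macro F1:", np.mean(m["f1"]))
```

Macro-averaged scores can be obtained with `np.mean` over the per-class arrays; whether the reported values are macro- or weighted-averaged is not stated in the text.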
Figure 11 compares the loss and accuracy of the different models. The aim is to minimize the loss while maximizing the accuracy. The graphs show that, for all models, accuracy generally increases and loss decreases as the number of epochs grows. Small fluctuations occur because the models are still learning the exact features of the training set. The models produce random guesses about the disease stage.
Next, we used ensemble techniques to make the final predictions. Table 5 presents the results for stacking, blending, averaging, and max-voting-based predictions. For the underlying problem, stacking achieves an accuracy of 94.3%, blending 92.5%, averaging 97.9%, and max voting 98.8%. Hence, the best results are obtained with max voting.

Comparison with State-of-the-Art Research Work.
In this research work, two different approaches are adopted. In the first step, binary classification is performed, with the AD stages divided into nine groups as given in Table 3. For multiclass classification of a subject's scan into one of six possible stages, an ensembling approach is used. Table 6 compares existing work on Alzheimer's disease classification that has used an ensembling approach. A deep learning-based ensembling technique was presented by Loddo [22]. This study performed four-class classification of dementia stages using the AlexNet, ResNet-101, and Inception-ResNet-v2 models on several datasets. The average ensembling technique was applied to each dataset, and the maximum overall accuracy achieved was 98.24%.
Another study used deep learning models for AD classification [23], with AdaBoost as the ensembling technique for merging the results of GoogLeNet, ResNet, and DenseNet. The technique performed binary classification of AD vs. HC and MCI vs. HC with an overall accuracy of 93%. Similarly, three-class classification into AD vs. MCI vs. CN using several DenseNet models was performed in [11], which used the probability fusion method for ensembling; the maximum accuracy for three-class classification with this approach was 97.52%. Karwath et al. used the majority voting technique to predict the best outcome of different classifiers [24], performing binary classification of AD vs. healthy and mild MCI vs. severe MCI with accuracies of 91% and 85%, respectively. The work proposed in this study uses different deep learning models whose results are ensembled using different techniques, of which max voting performed best. Previous work in this field has applied ensembling approaches to classification with fewer than six stages. To our knowledge, this is the only study that has used ensembling for the six-stage classification of AD.

Conclusion and Future Work
Using fMRI scans, this study presented a technique for classifying Alzheimer's disease into six stages. The Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset was used to train the classifiers on patients' fMRI images to identify the six stages of Alzheimer's disease. Data of patients aged 55 to 65 were chosen, and data augmentation techniques were applied to obtain different versions of a single image. An equal number of samples was drawn from each class to ensure fair results. Binary classification with the Custom CNN was applied to nine different pairs of AD stages. For multiclass classification of AD, the results of VGG-16, ResNet-18, AlexNet, Inception V1, and Custom CNN were combined, and the max-voting ensembling technique achieved 98.8% accuracy. As deep learning models are expected to bring breakthroughs in medical image diagnosis, the techniques used in this paper can be cross-validated on another neuroimaging dataset. The work can be extended by increasing the dataset size and validating it with recurrent neural networks. Furthermore, report generation functionality can be added to make the reports easy for non-specialists to read and understand.

Figure 2: Architecture diagram of the methodology used for AD classification.
(1) Stacking. In this technique, the outputs of multiple models are used to build a new model, which then produces the final classification. A training algorithm learns to combine the predictions of multiple base learners, and the complete training dataset is used for predicting the model output. This research uses the deep learning models discussed above as the base models and uses their predictions to create further layers of models, whose results are then combined to produce the final result. (2) Blending. This technique is similar to stacking, except that the complete training data is not used for training the base models. Instead, a portion of the training data is used for training the base models, and the test data are used for making predictions. (3) Averaging. In the averaging method, each model makes a prediction for each class, and the final outcome is the average of the model outputs. (4) Max Voting. Each base model votes for one class for each sample, and the class with the most votes becomes the final prediction. In this study, the output of all the deep learning models is finalized using the max-voting technique.
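Averaging and max voting can be sketched in a few lines of NumPy. This assumes each model outputs a (samples × classes) array of class probabilities for averaging, or a vector of predicted class indices for voting; the function names are hypothetical, and ties in max voting are broken toward the lower class index, a choice the paper does not specify.

```python
import numpy as np


def average_ensemble(prob_list):
    """Average the class-probability outputs of several models, then take argmax."""
    mean_probs = np.mean(np.stack(prob_list), axis=0)  # (n_samples, n_classes)
    return mean_probs.argmax(axis=1)


def max_vote_ensemble(pred_list, n_classes):
    """Each model votes for one class per sample; the class with most votes wins.

    Ties go to the lowest class index (np.argmax returns the first maximum).
    """
    preds = np.stack(pred_list)                    # (n_models, n_samples)
    n_samples = preds.shape[1]
    votes = np.zeros((n_samples, n_classes), dtype=int)
    for model_preds in preds:
        votes[np.arange(n_samples), model_preds] += 1
    return votes.argmax(axis=1)


if __name__ == "__main__":
    # Five hypothetical models (e.g., VGG-16, ResNet-18, AlexNet, Inception v1,
    # Custom CNN) each predicting one of six stages for three scans.
    preds = [np.array([0, 1, 5]), np.array([0, 2, 5]), np.array([1, 2, 4]),
             np.array([0, 2, 4]), np.array([3, 1, 4])]
    print(max_vote_ensemble(preds, n_classes=6))
```

Hard votes can be obtained from probability outputs with `probs.argmax(axis=1)` before calling `max_vote_ensemble`. Stacking and blending additionally require training a meta-model on the base models' predictions and are not shown here.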

Figure 8: Custom CNN architecture used for AD classification.

Figure 11: Loss and accuracy graph of different models used for AD classification.
(6) Histogram Equalization. Histogram equalization is performed to enhance image contrast. However, this method is suitable only when pixel values are similarly distributed throughout the image, whereas the MRI scans used in this study have varying pixel value distributions. (7) Contrast Limited Adaptive Histogram Equalization. This technique is used to obtain high-quality, clear images. It works by computing separate histograms for different regions of an image. In this work, the image is divided into a grid of size
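Global histogram equalization maps each gray level v through the normalized cumulative histogram, h(v) = round((cdf(v) − cdf_min) / (M·N − cdf_min) × 255) for an M × N image. A NumPy sketch for 8-bit images follows; the function name is hypothetical. CLAHE applies the same mapping per tile, with a clip limit on each tile's histogram; OpenCV exposes it as `cv2.createCLAHE`.

```python
import numpy as np


def histogram_equalize(image):
    """Global histogram equalization of an 8-bit grayscale image."""
    if image.dtype != np.uint8:
        raise TypeError("expected an 8-bit (uint8) image")
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # cumulative count at the darkest occupied level
    total = image.size
    if total == cdf_min:       # constant image: nothing to equalize
        return image.copy()
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)  # levels below the minimum map to 0
    return lut[image]


if __name__ == "__main__":
    # A low-contrast image using only gray levels 50..59.
    img = np.repeat(np.arange(50, 60, dtype=np.uint8), 10).reshape(10, 10)
    out = histogram_equalize(img)
    print(img.min(), img.max(), "->", out.min(), out.max())
```

Because the mapping is a single global lookup table, regions with very different intensity distributions share one transform, which is the limitation that motivates the tile-based CLAHE step.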

Table 1: Description of the dataset used for AD classification. The bold values show the total number of subjects and the corresponding number of fMRI images used in the experiments.

Table 2: Dataset description before and after preprocessing. The bold values show the number of images before and after applying data augmentation.
Figure 5: ResNet-18 architecture used for AD classification.
Table 3 displays the binary classification results for nine different groups: AD vs. CN, MCI vs. AD, CN vs. MCI, AD vs. SMC, EMCI vs. AD, LMCI vs. AD, CN vs. SMC, EMCI vs. CN, and LMCI vs. CN. These results were obtained with the Custom CNN. The Custom CNN classified AD vs. CN with a high accuracy of 99.6%, because there are large visible differences between the subjects' scans in these two classes. Similarly, MCI and AD are two prominent classes with significant structural and functional variations; the Custom CNN extracted the most discriminative features from these two classes and obtained an accuracy of 99.4%. For CN vs. MCI, the model classified scans correctly 99.8% of the time. For the intermediate classes SMC, EMCI, and LMCI, binary classification accuracy is not as high as for CN, MCI, and AD.
4.2. Multiclass Classification Results.
Table 4 compares the results of the different models for predicting the Alzheimer's disease stage. VGG-16 has an accuracy of 96.2%, precision of 91.4%, recall of 89.9%, specificity of 90.4%, and F1 score of 93.7%. ResNet-18 has an accuracy of 87.5%, precision of 85.4%, recall of 88.6%, specificity of 84.3%, and F1 score of 86.5%. AlexNet has an accuracy of 91.4%, precision of 88.5%, recall of 86.5%, specificity of 84.8%, and F1 score of 85.8%. Inception v1 has an accuracy of 88.6%, precision of 89.3%, recall of 90.3%, specificity of 88.2%, and F1 score of 90.1%. Custom CNN has an accuracy of 96.2%, precision of 91.4%, recall of 94.8%, specificity of 93.6%, and F1 score of 95.3%. In general, higher accuracy, precision, recall, and F1 score indicate better model performance. The results show that Custom CNN performs best among all the models for the underlying problem, owing to better training of its weights on this particular dataset. The results of all the models are ensembled using the max-voting technique, which produces an overall accuracy of 98.8% for Alzheimer's disease stage classification.

Table 3: Evaluation metrics results of different models on the ADNI dataset.

Table 4: Evaluation metrics results of different models on the ADNI dataset.

Table 5: Comparison results of different ensembling techniques.

Table 6: Comparison of state-of-the-art methods for AD classification.