Prediction of Alzheimer’s Disease Using DHO-Based Pretrained CNN Model

Detecting Alzheimer’s disease (AD) early allows patients to take preventive measures before irreversible brain damage occurs, which is a critical factor in the treatment of Alzheimer’s patients. Although computers have been used to diagnose AD in several recent studies, most automated detection methods are constrained by conventional observations. In AD, the hippocampus is usually the first part of the brain to be affected. Structural magnetic resonance imaging (sMRI) can assist in diagnosing AD by measuring the shape and volume of the hippocampus. However, the information encoded by these attributes is limited and may be affected by segmentation errors. These features are also extracted independently of the classification step, which can result in lower-than-desired classification accuracy. In this study, a deep learning framework for joint automatic hippocampus segmentation and AD classification was developed using structural MRI data. Multi-task deep learning (MTDL) is used to learn hippocampus segmentation and disease classification simultaneously. The deer hunting optimization (DHO) technique is then used to optimize the hyperparameters of the CNN model (capsule network) used for disease classification. The proposed method was tested on standardized ADNI MRI datasets and proved accurate. For segmentation, the proposed MTDL achieved 97.1% accuracy and a Dice coefficient of 93.5%, whereas for classification the proposed model achieved an accuracy of 96% for binary classification and 93% for multi-class classification.


Introduction
Alzheimer's disease is a brain disorder that gradually impairs memory and thinking abilities and, eventually, the capacity to carry out even the most basic tasks. An intracellular protein called cAMP-response element binding protein (CREB) controls the expression of key genes in dopaminergic neurons [1]. AD, the most common form of dementia, poses a significant challenge to healthcare providers in the twenty-first century. In the United States, 5.5 million people aged 65 years and older have AD, making it the sixth leading cause of death [2]. In 2018, the total cost of managing AD in the United States was $277 billion, with a significant impact on the broader economy and a strain on the country's healthcare system [3]. In the absence of a treatment proven to alter the course of the disease, a great deal of work has gone into developing procedures for early identification, particularly in presymptomatic phases [3]. Advances in neuroimaging techniques, including MRI and PET, have been used to discover AD-related structural and molecular biomarkers [4]. Because brain imaging technology has progressed at an incredible rate, it has become difficult to integrate the resulting enormous amounts of high-dimensional multimodal data. As a result, computer-aided machine learning methodologies for integrative analysis have become increasingly popular. AD progression can be predicted using well-known pattern analysis approaches such as LPBM, logistic regression, and support vector machines (SVMs) [5].
Using these machine learning techniques requires preprocessing or architectural design [6]. Machine learning classification studies commonly involve dimensionality reduction, feature extraction and selection, and the choice of a classification method based on those features. These techniques require a high level of specialized knowledge and may take a long time to optimize across numerous phases, and their reproducibility has also proved problematic [7,8]. In the feature selection process, neuroimaging modalities can be used to pick AD-related features such as mean subcortical volumes, grey matter densities, cortical thickness, brain glucose metabolism, and amyloid buildup in regions of interest (ROIs) such as the hippocampus [9,10].
Large-scale medical imaging analysis increasingly uses deep learning to generate features "on the fly" from raw neuroimaging data [11]. However, deep learning techniques for AD diagnosis are typically based on small MRI datasets, which makes it difficult to build deep CNN models with a large number of learnable parameters [12,13].

Problem Statement.
Current hippocampal analysis methods have several flaws. First, both hippocampal volumetric and shape analyses require precise segmentation of the hippocampus, which is difficult because of its irregular shape and unclear boundary in MRI. Second, handcrafted shape features may not be suitable for subsequent analysis, which affects classification performance in disease diagnosis. Third, the hippocampus alone may not be sufficient to distinguish patients with mild cognitive impairment (MCI) from healthy controls, because the amygdala and the parahippocampus are also affected in the early stages of AD. Finally, MRI images of the hippocampal region can be very helpful in the diagnosis of AD.

Contribution.
In recent years, machine learning and deep learning algorithms have been used to detect biomarkers and interpret disease aetiology. AD can be detected in a variety of ways, including analyzing specific regions of interest (ROIs) in MRI images. The hippocampus is an essential anatomical region in the pathogenesis of AD because it is one of the first brain ROIs to be affected. To address the issues listed in the problem statement, a new deep learning framework based on an MTDL model is proposed for simultaneous hippocampus segmentation and disease classification using MRI data.

Related Works
Faisal and Kwon [14] aimed to design a deep learning system that could extract useful AD biomarkers from structural MRI and classify brain images into AD, MCI, and CN groups. In this study, the researchers trained CNNs on MRI brain images from the publicly available ADNI datasets. Their proposed method merges features from multiple layers into compact high-level features, which lowers computation time because there are fewer variables to handle. The proposed convolution operation was evaluated using the most widely used AD classification metrics, such as accuracy and the area under the ROC curve (AUC).
Shanmugam et al. [15] focused on the early detection of various phases of cognitive impairment and AD using neuroimaging and transfer learning (TL). Transfer learning is used to classify images from the ADNI database into cognitively normal (CN), early mild cognitive impairment (EMCI), mild cognitive impairment (MCI), and late MCI (LMCI) classes. Three pretrained networks are used for this classification, trained and evaluated on 6000 images from the ADNI collection, and their classification performance is assessed using confusion matrices and their derived measures. GoogLeNet, AlexNet, and ResNet-18 achieve overall accuracies of 96.39%, 94.08%, and 97.51%, respectively, in detecting Alzheimer's disease. Confusion matrix parameters were also used to examine the within-class performance of the pretrained networks.
According to Samhan et al. [16], there are numerous ways to use deep learning classification to categorize Alzheimer's disease. In large trials, adopting this method could result in better patient care and lower costs. The system was developed in Python and is particularly useful for doctors in the classification of AD. Seventy percent of the images were used to train the model and 30% to validate it. On a series of held-out tests, the trained model was 100% accurate.
Tian et al. [17] investigated the retina, specifically the retinal vascular system, as a potential tool for identifying people with AD-related dementia. In addition to high classification accuracy, a saliency analysis makes the pipeline easier to interpret. The saliency analysis shows that small retinal vessels provide more information for Alzheimer's disease diagnosis than large vessels.
Divya and ShanthaSelvaKumari [18] employed several feature selection strategies and different classifiers to classify this chronic condition as AD. Feature selection makes classification much easier when only a few records with high dimensionality are available. After several attempts to select the best features, they obtained accuracy rates of 968.22%, 89.59%, and 90.40%; these higher rates were yielded by an SVM with a radial basis function kernel. In the MCI/AD classification, a 2.7% improvement was seen with the MMSE score, but it had no impact on the NC/AD and NC/MCI classifications.
An et al. [19] proposed a deep ensemble learning framework that harnesses "the wisdom of experts" to integrate multi-source data. At the voting layer, two sparse autoencoders are trained for feature learning to reduce the correlation between features and diversify the base classifiers. At the stacking layer, a nonlinear feature-weighted method based on a deep belief network is used to rank the base classifiers, which may violate conditional independence. A neural network is employed as a meta-classifier. At the optimization layer, oversampling and threshold shifting are employed to deal with cost sensitivity. Probabilistic predictions from the ensemble are combined with a similarity computation to produce optimized predictions. The new deep ensemble learning framework is used to classify Alzheimer's disease, and it outperforms six well-known ensemble techniques, including the classic stacking algorithm, in classification accuracy tests using clinical data.
Zhang et al. [20] proposed densely connected convolutional neural networks with connection-wise attention mechanisms to learn the features of brain MR images for AD classification. A dense CNN extracts multi-scale features from the preprocessed images, and a connection-wise attention mechanism integrates connections among features from different layers to transform the MR images into more compact high-level features. The spatial information in MRI is captured by extending the convolution operation to 3D. The features of all previous layers were combined with those of the 3D convolution layer in various ways before being used for classification. Using baseline MRI scans of 968 ADNI participants, the authors tested the technique on distinguishing AD patients from healthy subjects, MCI converters from healthy subjects, and MCI converters from MCI non-converters.

Challenges in Brain MRI Segmentation
(i) Brain structures differ greatly among individuals because of genetics, age, gender, and illness, so using a single segmentation algorithm across all phenotypic subgroups is problematic.
(ii) Cytoarchitectural variations, such as tissue thickness, sulcal depth, and smooth boundaries between tissue types, are difficult to handle and can blur the distinction between tissue types. Even human experts have difficulty with this.
(iii) MRI modalities have low anatomical contrast, which leads to poor segmentation performance.
(iv) Manual segmentation is tedious and subjective and requires a deep understanding of brain anatomy, so it is challenging to acquire sufficient labeled data for building a segmentation model.
(v) In a typical image to be segmented, the noisy background makes it difficult to assign an appropriate label to each pixel or voxel based on learned features.
(vi) Although the hippocampus is one of the most important biomarkers for AD, it is difficult to segment because of its small size and volume, structural heterogeneity, partial volume effects, low contrast, and low signal-to-noise ratio.

Proposed Model
One way to diagnose AD is shown in Figure 1.
First, the MRI slices are obtained. Preprocessing removes irrelevant information from the data and reorients the images so that they can be interpreted more easily. The preprocessed data are then segmented using deep learning, and features are extracted from the brain MRI. Finally, a classifier uses features such as surface area, center of gravity, intensity, and standard deviation to determine whether the patient is developing AD.
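The descriptor step can be illustrated with a short sketch. It assumes a single 2D slice stored as nested lists together with a binary mask produced by the segmentation; `region_descriptors` is a hypothetical helper, the area is reported as a pixel count, and the SD is the population SD. The paper does not specify how these features are computed.

```python
import math

def region_descriptors(image, mask):
    """Area, centroid, mean intensity, and intensity SD of a segmented 2D region.

    image: 2D list of intensities; mask: 2D list of 0/1 labels with the same shape.
    """
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        raise ValueError("mask has no foreground pixels")
    area = len(coords)  # pixel count; multiply by pixel area for physical units
    # Center of gravity = mean row and mean column of the foreground pixels.
    centroid = (sum(r for r, _ in coords) / area, sum(c for _, c in coords) / area)
    values = [image[r][c] for r, c in coords]
    mean = sum(values) / area
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / area)  # population SD
    return {"area": area, "centroid": centroid, "mean": mean, "sd": sd}
```

For example, `region_descriptors([[1, 2], [3, 4]], [[1, 0], [0, 1]])` describes the two diagonal pixels, giving an area of 2, a centroid of (0.5, 0.5), a mean of 2.5, and an SD of 1.5.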

Dataset Description.
The Alzheimer's Disease Neuroimaging Initiative (ADNI) uses MRI, PET, and other biomarkers to track the progression of MCI and early Alzheimer's disease. Written informed consent for the collection of imaging and genetic samples was obtained from the subjects at enrolment, and the study was approved by the Institutional Review Board (IRB) at each participating site. A total of 449 participants were randomly selected for this study. MMSE stands for Mini-Mental State Examination, and CDR stands for Clinical Dementia Rating. We used 1.5 T MR images obtained with the ADNI acquisition protocol [21]; the image acquisition procedures are explained in greater detail on the ADNI website. The images were resampled to a voxel size of 1 × 1 × 1 mm³, skull-stripped, and stripped of the cerebellum. The FMRIB Software Library (FSL) 5.0 (https://fsl.fmrib.ox.ac.uk/) was used to align all MR images to a template image using a registration with 12 degrees of freedom and default parameters.
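A registration with 12 degrees of freedom estimates an affine transform built from three translations, three rotations, three scales, and three shears. The sketch below assembles such a 4 × 4 matrix from those 12 parameters; the composition order (scale, then shear, then rotations about x, y, z, then translation) and all names are illustrative assumptions and do not reproduce FSL's internal parameterization.

```python
import math

def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def affine_12dof(translation, rotation, scale, shear):
    """Build M = T @ Rz @ Ry @ Rx @ H @ S from 3 + 3 + 3 + 3 = 12 parameters.

    translation: (tx, ty, tz); rotation: (rx, ry, rz) in radians;
    scale: (sx, sy, sz); shear: (hxy, hxz, hyz).
    """
    tx, ty, tz = translation
    rx, ry, rz = rotation
    sx, sy, sz = scale
    hxy, hxz, hyz = shear
    T = [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]
    Rx = [[1, 0, 0, 0], [0, math.cos(rx), -math.sin(rx), 0],
          [0, math.sin(rx), math.cos(rx), 0], [0, 0, 0, 1]]
    Ry = [[math.cos(ry), 0, math.sin(ry), 0], [0, 1, 0, 0],
          [-math.sin(ry), 0, math.cos(ry), 0], [0, 0, 0, 1]]
    Rz = [[math.cos(rz), -math.sin(rz), 0, 0], [math.sin(rz), math.cos(rz), 0, 0],
          [0, 0, 1, 0], [0, 0, 0, 1]]
    H = [[1, hxy, hxz, 0], [0, 1, hyz, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    S = [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]
    M = S
    for A in (H, Rx, Ry, Rz, T):  # left-multiply, so S acts first and T last
        M = matmul(A, M)
    return M

def apply(M, point):
    """Map a 3D point (x, y, z) through the affine matrix M."""
    v = (point[0], point[1], point[2], 1)
    return tuple(sum(M[i][k] * v[k] for k in range(4)) for i in range(3))
```

With all rotations and shears at zero and unit scales, the matrix reduces to a pure translation; the registration tool's job is to search these 12 values so that the moved image best matches the template.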
Table 1 shows the demographic and clinical information of the ADNI participants (mean ± standard deviation). Alzheimer's disease, mild cognitive impairment, and normal control are abbreviated as "AD," "MCI," and "NC," respectively.

Preprocessing.
Gradwarp corrects the geometric distortion of an image caused by gradient nonlinearity, which differs between gradient models [22]; correcting the geometry improves the information content of the image. B1 non-uniformity correction rectifies image intensity non-uniformity caused by non-uniform radio-frequency transmission. N3 bias field correction reduces the intensity non-uniformity caused by dielectric effects during acquisition [23]. N3 bias field correction is applied to the 1.5 T images to reduce this non-uniformity, although these effects are most widespread on 3 T scanners. Gradwarp and B1 corrections were applied before the N3 correction.
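For reference, N3 (as described by Sled et al.) models intensity non-uniformity as a smooth multiplicative bias field f applied to the true signal u, plus noise n, and works in the log domain, where the field becomes additive and can be estimated and subtracted. This background is not stated in the paper and is included only to clarify what the correction estimates:

```latex
v(\mathbf{x}) = u(\mathbf{x})\, f(\mathbf{x}) + n(\mathbf{x}),
\qquad
\hat{v}(\mathbf{x}) = \log v(\mathbf{x}) \approx \hat{u}(\mathbf{x}) + \hat{f}(\mathbf{x}),
\quad \hat{u} = \log u,\ \hat{f} = \log f \ \ (\text{noise neglected}).
```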
Image segmentation has been used in the literature to improve classification accuracy [24,25]. The images were preprocessed using the segmentation module of Statistical Parametric Mapping (SPM), available at http://www.fil.ion.ucl.ac.uk/spm. SPM maps MRI scans onto tissue probability maps and uses these maps to extract the corresponding regions. In this module, the MRI scan is segmented into three tissue classes, with bias correction and spatial normalization. Registration then aligns the output of the mapping with the orientation of the image. The goal of brain registration is to minimize the influence of external elements, such as the scalp, on the segmented images of the cerebral cortex.
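SPM's segmentation yields one probability map per tissue class, conventionally grey matter, white matter, and cerebrospinal fluid. The sketch below shows only a simplified final step, turning three such maps into a hard label per voxel by picking the most probable class; it is not SPM's segmentation model, and `label_tissues` and its `min_prob` cutoff are hypothetical.

```python
def label_tissues(gm, wm, csf, min_prob=0.5):
    """Assign each voxel the most probable tissue class.

    gm, wm, csf: equal-length flat sequences of per-voxel probabilities.
    Returns 1 = grey matter, 2 = white matter, 3 = CSF, or 0 (background)
    when even the best class falls below the hypothetical min_prob cutoff.
    """
    labels = []
    for probs in zip(gm, wm, csf):
        best = max(range(3), key=lambda k: probs[k])  # first class wins ties
        labels.append(best + 1 if probs[best] >= min_prob else 0)
    return labels
```

For instance, `label_tissues([0.8, 0.1, 0.2, 0.3], [0.1, 0.7, 0.2, 0.3], [0.1, 0.2, 0.6, 0.3])` labels the four voxels as grey matter, white matter, CSF, and background.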

Joint Hippocampus Segmentation Using MTDL.
The hippocampus is a small area in the medial temporal lobe of the human brain. It contains a disproportionately small number of voxels compared with the rest of the brain, which results in a highly unbalanced dataset. After preprocessing and registration, the next step is to create 3D image patches using hippocampus-specific bounding cubes, whose 3D axes are used to extract the patches from the MR images. The size of the bounding cube affects how well the hippocampus is segmented: a large bounding cube aggravates the class imbalance problem and increases computation time, whereas a small bounding cube can impede segmentation of the hippocampus. An empirical study found that a bounding cube of 64 × 48 × 64 voxels gave the best trade-off. These patches form the basis of our deep learning model for hippocampus segmentation and disease classification.
Jointly learning hippocampus segmentation and disease classification is a novel approach that differs from standard methods, in which the two tasks are performed separately. CNNs are frequently used to classify images and identify objects. V-Net, a volumetric fully convolutional neural network for prostate segmentation in MRI, has been proposed previously. Inspired by the success of V-Net in prostate segmentation, we propose a multi-task deep CNN model for joint hippocampus segmentation and disease classification.
The deep CNN learns residual functions at its convolutional stages to achieve fast convergence. As illustrated in Figure 2, it contains two residual blocks, "ResNet Block 1" and "ResNet Block 2," each consisting of 3D convolution, batch normalization (BN), parametric rectified linear unit (PReLU) activation, and dropout layers. In ResNet Block 1, the input is added to the output of the second convolutional layer to learn a residual function. In ResNet Block 2, the input is added to the outputs of both convolutional layers. The kernels are trained on batches of MRI data. Small kernels make fast inference easier because there are fewer parameters to train, whereas larger kernels can learn more complicated patterns and offer greater expressiveness; stacking layers of small kernels achieves a similar effect. The kernel size is fixed at 3 × 3 × 3 for all convolutions. A nonlinear PReLU activation is applied to the learned filters, and a feature map is then constructed for each one.
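A minimal sketch of computing the 64 × 48 × 64 bounding cube is shown below. It assumes the cube is centred on a given hippocampus location (for example, a centroid in template space) and shifted, if necessary, so that it stays inside the volume; the helper name, the centring rule, and the 182 × 218 × 182 example volume shape are assumptions rather than details from the paper.

```python
def bounding_cube(center, volume_shape, cube_shape=(64, 48, 64)):
    """Return one (start, stop) index pair per axis for a cube of cube_shape
    centred on `center`, shifted where needed to lie fully inside the volume."""
    bounds = []
    for c, dim, size in zip(center, volume_shape, cube_shape):
        if size > dim:
            raise ValueError("bounding cube is larger than the volume")
        start = c - size // 2
        start = max(0, min(start, dim - size))  # clamp inside [0, dim - size]
        bounds.append((start, start + size))
    return tuple(bounds)

# Example with a hypothetical centre in a 182 x 218 x 182 volume.
cube = bounding_cube((90, 100, 80), (182, 218, 182))
# cube == ((58, 122), (76, 124), (48, 112)); slicing the volume with these
# ranges yields one 64 x 48 x 64 patch.
```

Keeping the cube size fixed and clamping it, rather than shrinking it at the borders, ensures that every patch has the same shape, which the CNN input requires.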
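The trade-off between small and large kernels can be checked with simple arithmetic. The sketch below uses a hypothetical channel count of 16 and ignores biases; it shows that two stacked 3 × 3 × 3 convolutions cover the same 5 × 5 × 5 receptive field as one 5 × 5 × 5 convolution while using fewer than half the weights.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Edge length of the receptive field of num_layers stacked stride-1 convolutions."""
    return num_layers * (kernel - 1) + 1

def conv3d_weights(kernel, c_in, c_out):
    """Number of weights in one 3D convolution with a cubic kernel (bias excluded)."""
    return kernel ** 3 * c_in * c_out

C = 16  # hypothetical number of input and output channels
two_small = 2 * conv3d_weights(3, C, C)  # two stacked 3x3x3 layers
one_large = conv3d_weights(5, C, C)      # a single 5x5x5 layer
print(stacked_receptive_field(2), two_small, one_large)  # prints: 5 13824 32000
```

Stacking also inserts an extra PReLU nonlinearity between the two small convolutions, which adds expressiveness that a single large kernel does not have.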

Mathematical Problems in Engineering
During the compression stage, downsampling reduces the size of the feature maps and enlarges the receptive field of the features in the following layers. It is implemented as a convolution with 2 × 2 × 2 kernels and stride 2. During the decompression stage, the spatial support of the lower-resolution feature maps is expanded to generate a volumetric segmentation mask; the upsampling is performed by deconvolution with 2 × 2 × 2 kernels and stride 2. For probabilistic segmentation of the hippocampus regions, a convolutional layer with a 1 × 1 × 1 kernel and stride 1 is applied, and its outputs are converted to voxel-wise probabilities with a softmax. As the last step, the probability output is converted into a binary mask by setting the threshold to 0.
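The feature-map sizes along the compression and decompression paths follow from the standard output-size formulas for an unpadded convolution and an unpadded transposed convolution. The sketch below applies them to the 64 × 48 × 64 patch with 2 × 2 × 2 kernels and stride 2; the choice of two resolution levels is an assumption for illustration only.

```python
def conv_out(n, kernel=2, stride=2):
    """Output length of an unpadded convolution along one axis."""
    return (n - kernel) // stride + 1

def deconv_out(n, kernel=2, stride=2):
    """Output length of an unpadded transposed convolution along one axis."""
    return (n - 1) * stride + kernel

shape = (64, 48, 64)
down1 = tuple(conv_out(n) for n in shape)    # (32, 24, 32)
down2 = tuple(conv_out(n) for n in down1)    # (16, 12, 16)
up1 = tuple(deconv_out(n) for n in down2)    # (32, 24, 32)
up2 = tuple(deconv_out(n) for n in up1)      # (64, 48, 64), back to patch size
```

Because every patch dimension is divisible by 4, each stride-2 step halves the size exactly and the decompression path restores the original patch size, so the output mask aligns voxel by voxel with the input.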
For the hippocampal segmentation of subject m, the goal is to optimize the Dice loss function, which measures how well the model separates hippocampus voxels from the background. The loss is computed from the segmentation prediction p_i and the ground truth label q_i of each voxel i, and a small constant is added to avoid division by zero. The Dice loss is well suited to cases in which foreground and background voxels are highly unbalanced. Fully connected layers are used as decompression components to increase classification accuracy. For classification, the categorical cross-entropy loss compares the predicted and ground-truth labels of subject m.
Here, M is the total number of subjects, and y_m and ŷ_m are the ground truth label and the predicted label for subject m, respectively. The segmentation and classification losses are balanced during training by the weighting parameter a ∈ [0, 1]. Segmentation is more critical than classification in the early stages of training a multi-task deep CNN model, so an initial warm-up emphasizes segmentation by setting a to 1. The value of a is then reduced to 0.5 for multi-task training. Finally, a is set to 0 so that classification takes precedence. The multi-task network is optimized jointly with the Adam method, and the network gradients are computed with the backpropagation algorithm.
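The loss equations did not survive extraction, so the sketch below is a hedged illustration only. It uses one common soft Dice form, 1 − (2Σp_i q_i + ε)/(Σp_i + Σq_i + ε), the standard categorical cross-entropy averaged over M subjects, and an assumed combined loss a·L_seg + (1 − a)·L_cls, which matches the described behavior (a = 1 emphasizes segmentation, a = 0 classification). V-Net's original Dice loss uses squared terms in the denominator, and the epoch boundaries of the warm-up schedule are hypothetical placeholders.

```python
import math

def dice_loss(p, q, eps=1e-6):
    """Soft Dice loss 1 - (2*sum(p*q) + eps) / (sum(p) + sum(q) + eps).
    p: predicted foreground probabilities; q: 0/1 ground-truth labels (flattened)."""
    intersection = sum(pi * qi for pi, qi in zip(p, q))
    return 1.0 - (2.0 * intersection + eps) / (sum(p) + sum(q) + eps)

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean over M subjects of -sum_k y_true[m][k] * log(y_pred[m][k])."""
    total = 0.0
    for t, pr in zip(y_true, y_pred):
        total -= sum(tk * math.log(max(pk, eps)) for tk, pk in zip(t, pr))
    return total / len(y_true)

def segmentation_weight(epoch, warmup_epochs=10, joint_epochs=40):
    """Weight a: 1 during warm-up, 0.5 for joint training, then 0 (epoch counts hypothetical)."""
    if epoch < warmup_epochs:
        return 1.0
    if epoch < warmup_epochs + joint_epochs:
        return 0.5
    return 0.0

def multitask_loss(seg_loss, cls_loss, a):
    """Assumed combination a * L_seg + (1 - a) * L_cls."""
    return a * seg_loss + (1.0 - a) * cls_loss
```

A perfect mask gives a Dice loss of 0, a fully disjoint one a loss close to 1, and the ε term only matters when both the prediction and the label are empty.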
Figure 3 shows the hippocampal image patches before and after manual correction of the segmentation results. The mean, standard deviation, and range of hippocampal volumes before and after manual correction are reported for each participant group. After correction, both the mean and the SD of hippocampal volume decreased. Figure 3 also depicts the scatterplots for the AD, MCI, and NC groups [21,22].
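Hippocampal volumes and the per-group summaries can be computed from binary masks as sketched below. This assumes the 1 mm isotropic voxels produced by preprocessing (so each voxel is 1 mm³) and uses the sample standard deviation; both the helper names and the SD convention are assumptions.

```python
import statistics

def hippocampal_volume(mask, voxel_volume_mm3=1.0):
    """Volume in mm^3: number of foreground voxels times the volume of one voxel.
    mask: flattened sequence of 0/1 labels."""
    return sum(1 for v in mask if v) * voxel_volume_mm3

def summarize_group(volumes):
    """Mean, sample standard deviation, and (min, max) range of one group's volumes."""
    return {
        "mean": statistics.mean(volumes),
        "sd": statistics.stdev(volumes),
        "range": (min(volumes), max(volumes)),
    }
```

Running `summarize_group` separately on the AD, MCI, and NC volumes, once with the automatic masks and once with the manually corrected masks, yields the before-and-after comparison described above.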

Classification.
Sabour et al. [25] overcame limitations of CNNs by using a higher-dimensional vector, called a "capsule," rather than an individual neuron to represent an entity. The neuronal activity of an active capsule reflects the properties of a specific entity in the image. For each visual entity, a capsule learns its likelihood of existence and a set of instantiation parameters such as albedo (color), texture, or deformation. In CapsNet, both the input and output are vectors: the direction of a vector encodes the entity's attributes, and its norm encodes the likelihood that the entity exists. To improve AD prediction, the model predicts the instantiation parameters of a higher-level capsule from lower-level capsules through a transformation matrix. The natural logarithm base e is used as a constant to define the shape of the spiral. The spiral update is used to save the best solutions and to update the position of each search agent, for example, using P→s(x). The DHO presented here begins with a random population. At each iteration, a search agent may move closer to or farther from the best search agent. Because DHO has strong exploitation and exploration capacity and controls the transition between exploration and exploitation, it behaves as a global optimizer.
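Two CapsNet building blocks mentioned above can be sketched in a few lines: the prediction vector obtained by multiplying a lower-level capsule's output by a transformation matrix, and the "squash" nonlinearity from Sabour et al., which keeps a vector's direction while mapping its length into [0, 1) so that the length can be read as a probability. This is a minimal illustration, not the full dynamic-routing procedure or the network used in the paper.

```python
import math

def squash(s):
    """Capsule nonlinearity v = (|s|^2 / (1 + |s|^2)) * s / |s|."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0 for _ in s]  # a zero vector stays zero (entity absent)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]

def predict(W, u):
    """Prediction vector u_hat = W u of a higher-level capsule from a lower-level one."""
    return [sum(w * x for w, x in zip(row, u)) for row in W]
```

For example, `squash([3.0, 4.0])` keeps the direction (3, 4) but shrinks the length from 5 to 25/26, so long input vectors map to probabilities close to 1 and short ones to values close to 0.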

DHO-Based Hyperparameter Tuning.
DHO is a recent metaheuristic, inspired by the way a group of hunters hunts deer, that is used here for hyperparameter tuning. When hunting, the hunters use a variety of strategies to surround the deer and approach it as closely as possible, taking the deer's position and the wind angle into account. Cooperation among the hunters is another crucial element of successful hunting: by following the leader and the successor, the hunters achieve their final goal. The DHO method also relies on the deer's particular abilities to elude hunters. The process starts from a randomly generated population of hunter vectors. The population of hunters (or their "weight") can be expressed in two ways when the strategy is optimized. Next, important parameters such as weight, position, and wind angle are defined. Because the entire search area is regarded as a circle, the wind angle can be defined in terms of the circle's circumference.
where a is a random value in the range [0, 1] and J is the current iteration. Position propagation uses the leader position X_l and the successor position X_s: the leader position determines the primary position of the hunter, whereas the successor position determines the placement of the following hunters. X_l is broadcast to the other hunters, and once an optimal position has been established, every hunter tries to reach it. The position update begins by simulating the encircling behavior, where X_j is the position at the current iteration and X_{j+1} is the position at the next iteration. This process is aided by the coefficient vectors Z and K. To take wind speed into account, a random value p in the range [0, 2] is generated. The coefficient vectors Z and K are estimated from the current iteration j and the maximum number of iterations j_max, using a random variable b in the range [−1, 1] and a second random variable in the range [0, 1]. (X, Y) is the initial position of the hunter, which is updated according to the position of the prey, and X_b and Y_b are recalculated using Z and K. When p is less than 1, the position update lets the hunter move in any direction regardless of the angle. The next phase, propagation through the position angle, is expected to expand the search space, because the angle of the hunter's position is critical to the success of the hunting strategy. The ideal position is expressed in terms of the iteration j + 1, the position X_b^j, and the random value p. The angle position is opposite to the individual position, so the prey is not aware of the hunter's presence. In the propagation through the successor position, the vector K from the encircling behavior is used for exploration.
K is first taken to be less than 1 to perform a random search. Finally, the successor position, rather than the best position, is used in the position update, so that a global search is conducted.
Position updates continue until the optimal position is found (i.e., until the termination condition is met). By optimizing the weight parameters of the pretrained CNN model, DHO is effectively used to determine whether a patient has AD or is normal, and multi-class AD classification is also performed.
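The tuning loop can be illustrated with a deliberately simplified, hypothetical DHO-style search. The text defines X_l, X_j, X_{j+1}, Z, K, p, b, and j_max, but the equations themselves were lost, so the forms used below, X_{j+1} = X_l − Z·p·|K·X_l − X_j| with Z = ¼·log(j + 1/j_max)·b and K = 2c (c a random value in [0, 1]), follow our reading of the published DHO formulation and should be checked against the original DHO paper. The sketch uses only the leader-based encircling update (no angle or successor propagation) and scalar Z and K per hunter; in practice the objective would be the CapsNet's validation loss for a candidate hyperparameter vector, and a toy quadratic stands in for it here.

```python
import math
import random

def coefficient_vectors(j, j_max, b, c):
    """Assumed DHO coefficients: Z = (1/4)*log(j + 1/j_max)*b, K = 2*c,
    with b drawn from [-1, 1] and c from [0, 1]."""
    return 0.25 * math.log(j + 1.0 / j_max) * b, 2.0 * c

def encircle(x_leader, x_current, z, k, p):
    """Assumed encircling update X_{j+1} = X_l - Z*p*|K*X_l - X_j|, per dimension."""
    return [xl - z * p * abs(k * xl - xj) for xl, xj in zip(x_leader, x_current)]

def dho_minimize(objective, bounds, n_hunters=10, j_max=50, seed=0):
    """Simplified DHO-style search using only leader-based encircling.

    objective: maps a hyperparameter vector to a loss to minimize.
    bounds: list of (low, high) pairs, one per hyperparameter.
    """
    rng = random.Random(seed)
    hunters = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_hunters)]
    leader = min(hunters, key=objective)
    leader_f = objective(leader)
    for j in range(1, j_max + 1):
        for i, x in enumerate(hunters):
            p = rng.uniform(0.0, 2.0)  # wind-related random factor in [0, 2]
            z, k = coefficient_vectors(j, j_max, rng.uniform(-1.0, 1.0), rng.uniform(0.0, 1.0))
            moved = encircle(leader, x, z, k, p)
            # Clip each coordinate back into its allowed range.
            hunters[i] = [min(max(v, lo), hi) for v, (lo, hi) in zip(moved, bounds)]
            f = objective(hunters[i])
            if f < leader_f:  # a better position becomes the new leader
                leader, leader_f = hunters[i], f
    return leader, leader_f

def toy_loss(v):
    """Stand-in for a validation loss over (log10 learning rate, dropout rate)."""
    return (v[0] + 3.0) ** 2 + (v[1] - 0.2) ** 2

best, best_f = dho_minimize(toy_loss, [(-5.0, -1.0), (0.0, 0.5)])
```

Because the leader is replaced only when a hunter finds a strictly lower loss, the returned loss is never worse than the best member of the random initial population; the full DHO adds the angle-based and successor-based updates described above to improve exploration.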

Results and Discussion
The segmentation and classification models were built with Keras, a high-level neural networks API, using TensorFlow as the backend. Keras was chosen for its ease of use and its ability to run on a GPU.

Evaluation Metrics.
Figure 3: Hippocampal image patches: (a) patches without segmentation labels, (b) patches overlaid with segmentation labels before manual correction, and (c) patches overlaid with segmentation labels after manual correction. Subj1, Subj2, and Subj3 are three randomly chosen subjects, one from each of the three study groups [21,22].
The segmentation performance of our method is assessed with the challenge evaluation measures accuracy (AC), Jaccard index (JSI), and Dice coefficient (DSC), and the classification performance with AC, specificity (SP), and sensitivity (SE). These criteria are computed from tp, tn, fp, and fn, which denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
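The metric formulas did not survive extraction, but AC, SE, SP, DSC, and JSI have standard definitions in terms of these four counts. A short sketch of those usual definitions follows; `evaluation_metrics` is a hypothetical helper name. For a binary segmentation mask, the voxel counts give the overlap-based DSC and JSI, which are related by DSC = 2·JSI/(1 + JSI).

```python
def evaluation_metrics(tp, tn, fp, fn):
    """Standard confusion-count metrics for segmentation and classification."""
    return {
        "AC": (tp + tn) / (tp + tn + fp + fn),  # accuracy
        "SE": tp / (tp + fn),                   # sensitivity (recall)
        "SP": tn / (tn + fp),                   # specificity
        "DSC": 2 * tp / (2 * tp + fp + fn),     # Dice similarity coefficient
        "JSI": tp / (tp + fp + fn),             # Jaccard similarity index
    }
```

For example, with tp = 8, tn = 10, fp = 2, and fn = 0, the sketch gives AC = 0.9, SE = 1.0, SP ≈ 0.833, DSC ≈ 0.889, and JSI = 0.8.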

Comparative Analysis for Proposed Segmentation (MTDL).
In this section, the proposed model is compared with existing techniques: fuzzy c-means (FCM), adaptively regularized kernel FCM (ARKFCM), and fast and robust FCM (FRFCM). Table 2 and Figure 4 present the experimental comparison of MTDL with these existing models [26].
Table 2 presents the validation results of the different segmentation techniques. FCM achieved an accuracy of 84.8%, FRFCM achieved a better accuracy of 92.6%, and ARKFCM reached 96.5%. The proposed MTDL reached the highest accuracy, 97.1%, outperforming the other methods. In terms of DSC, FCM achieved 82.1%, FRFCM 91%, ARKFCM 92%, and the proposed model 93.5%. Finally, the JSI of MTDL (87.8%) is higher than those of the existing FCM models (69% to 85%).

Comparative Analysis of Proposed Classification.
Two types of analysis are carried out, binary classification (normal or AD) and multi-class classification (AD/MCI/NC); Table 3 and Figure 5 show the experimental comparison of the proposed classifier with existing techniques. For better performance, all techniques are implemented with DHO. Table 3 compares the binary classification performance of different models: the recursive neural network (RNN), the recurrent neural network, and CapsNet. The recursive neural network reached a sensitivity of 93.00% and an accuracy of 90.00%, the recurrent neural network reached an accuracy of 91.00%, and CapsNet reached an accuracy of 96.00%. CapsNet therefore achieved better accuracy and overall performance than the other two classifiers.
Table 4 and Figure 6 present the multi-class classification results, comparing the recursive neural network (RNN), the recurrent neural network, and the pretrained CNN model (CapsNet). The recursive neural network reached a sensitivity of 93.00% and an accuracy of 89.00%, the recurrent neural network reached an accuracy of 84.00%, and CapsNet reached an accuracy of 93.00%. Again, CapsNet achieved better accuracy and overall performance than the other two classifiers.
Table 5 and Figure 7 compare the accuracy of various pretrained CNN models for binary classification, and Figure 8 shows that the proposed model achieves better accuracy in binary classification than in multi-class classification. Table 5 reports the accuracy of binary classification using different classifier models such as UNet, ResNet, VGG-16, EfficientNet,

Conclusion
This research study developed and analyzed a deep learning framework for joint automatic hippocampus segmentation and AD classification from MRI data. Multi-task deep learning (MTDL) is used to learn hippocampus segmentation and disease classification simultaneously, and the deer hunting optimization (DHO) technique is then used to optimize the hyperparameters of the CNN model (capsule network) for disease classification. The proposed method was tested on standardized ADNI MRI datasets and proved accurate. For segmentation, the proposed MTDL achieved 97.1% accuracy, outperforming the other methods, and a Dice coefficient of 93.5%, compared with 82.1% for FCM, 91% for FRFCM, and 92% for ARKFCM; its JSI (87.8%) is also higher than those of the existing FCM models (69% to 85%). For classification, the proposed model achieved an accuracy of 96% for binary classification and 93% for multi-class classification, and in the accuracy evaluation for binary classification, CapsNet-DHO achieved better accuracy than the other classifier models. The model was validated on only one dataset; in future work, real-time data will be collected and used for verification. In addition, the efficiency of the pretrained CNN model will be validated, and a hybrid DL model will be designed to identify AD in the collected real-time images [27].

Data Availability
The data used to support the findings of this study are included within the article and are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.