Preemptive Diagnosis of Alzheimer's Disease in the Eastern Province of Saudi Arabia Using Computational Intelligence Techniques

Alzheimer's Disease (AD) is a silent disease that causes brain cells to die progressively, affecting consciousness, behavior, planning ability, and language, to name a few. The prevalence of AD increases exponentially with age, doubling every 5-6 years, and leads to profound complications such as swallowing difficulties and loss of the ability to speak before death. According to the Ministry of Health in Saudi Arabia, the number of AD patients will triple by 2060 to reach 14 million patients worldwide. The rapid rise in patients is attributed to the silent progress of the disease, which leads to late diagnosis because the symptoms cannot be distinguished from the effects of normal aging. Moreover, with current medical capabilities, it is impossible to confirm AD with 100% certainty via specific medical examinations. The literature review revealed that most recent publications used images to diagnose AD, an approach that is unsuitable for local hospitals with limited imaging capabilities. Other studies that used clinical and demographic data failed to achieve adequate results. Consequently, this study aims to preemptively predict AD in Saudi Arabia by employing machine learning (ML) techniques. The dataset was acquired from King Fahad Specialist Hospital (KFSH) in Dammam, Saudi Arabia, and contains standard clinical tests for 152 patients. Four ML algorithms, namely, support vector machine (SVM), k-nearest neighbors (k-NN), Adaptive Boosting (AdaBoost), and eXtreme Gradient Boosting (XGBoost), were employed to preemptively diagnose the disease. The empirical results demonstrated the robustness of SVM in the preemptive diagnosis of AD, with accuracy, precision, recall, and area under the receiver operating characteristic curve (AUROC) of 95.56%, 94.70%, 97.78%, and 0.97, respectively, using 13 features after applying the sequential forward feature selection technique. This model can assist medical staff in controlling the progression of the disease at low cost.


Introduction
Alzheimer's disease (AD) is a prevalent form of dementia in which brain cells progressively die, influencing consciousness, behavior, planning ability, language, and others [1]. It develops silently over a long period that may range from three to ten years before reaching a high level of severity [2].
This may lead to late diagnosis, as the symptoms cannot be distinguished from the effects of normal aging. Additionally, AD patients are significantly more likely to suffer from several chronic diseases than healthy older adults [3]. The authors of [4] claimed that the AD incidence rate increases exponentially with aging and doubles every 5-6 years. The Ministry of Health in Saudi Arabia reported that 130 thousand people in the population were diagnosed with AD in 2020.
The Ministry of Health also anticipated that the number of patients would triple by 2060 to reach 14 million patients worldwide [5]. The resources required to deliver the best nursing services for these patients are tremendous and cost-intensive [6]. As a result, numerous studies were conducted to investigate potential treatment procedures to cure AD or manage its complications. Most of these studies emphasized the importance of early diagnosis and its relevance to executing the optimal treatment plan [7]. Furthermore, according to the World Health Organization (WHO), the development of the disease has been shown to be connected with environmental factors [6]. Consequently, this study aims to establish an early AD prediction assistant by employing machine learning (ML) techniques on a Saudi dataset obtained from King Fahad Specialist Hospital.
Currently, it is impossible to confirm AD with 100% certainty via specific medical examinations. Screening and other available diagnostic tools used to identify patients with AD are expensive and not suitable for hospitals with limited resources [6]. Therefore, involving the latest technologies such as ML can assist doctors in achieving a more accurate and efficient diagnosis. With the increasing amount of medical data and the evolving capabilities of ML, physicians will be able to identify hidden patterns that humans are incapable of detecting [8].
Therefore, ML can significantly improve the lifestyle of AD patients and boost their prevention chances. In line with Saudi Arabia's recent policy of adopting the latest technology, numerous studies were conducted to support the health industry. Researchers in reference [9] obtained a dataset from King Abdulaziz University (KAU) Hospital, Saudi Arabia, to diagnose AD earlier using cerebral catheter angiogram neuroimaging, achieving an accuracy of 99.14%. Furthermore, researchers in references [10][11][12][13] utilized various ML techniques to perform a pre-emptive diagnosis of diabetes mellitus, chronic kidney disease, schizophrenia, and thyroid cancer, attaining the highest accuracies of 98.00%, 98.00%, 90.70%, and 90.91%, respectively. The promising results of these studies have encouraged us to expand the work and build a prediction model for AD using simple clinical and demographic data.
As health care began to adopt ML, several algorithms were established to achieve adequate results. In this study, a total of four ML algorithms were utilized, namely, support vector machine (SVM), k-nearest neighbor (k-NN), Adaptive Boosting (AdaBoost), and eXtreme Gradient Boosting (XGBoost), each with robust characteristics. Aside from SVM's flexibility in addressing classification problems, it has recently been applied in precision psychiatry, the diagnosis of brain disorders, and many other areas, achieving superior performance [14]. k-NN, in turn, is well known for its simplicity, rapidity, and scalability in handling complex data [15], while AdaBoost is also popular due to its high compatibility and low complexity [16]. XGBoost exceeds AdaBoost in terms of scalability and speed. This method leverages parallelized techniques to accelerate computation by a factor of ten [17].
Empirical results demonstrated that SVM outperformed all the other classifiers, achieving an accuracy of 95.56%, followed by k-NN, AdaBoost, and then XGBoost. With only six features and a negligible difference in accuracy, k-NN exceeded the performance of SVM by reducing the required number of clinical tests. However, it is crucial to favor a higher recall value, which focuses on reducing false negatives, as the misdiagnosis of AD induces severe complications. Hence, it is concluded that the optimal model for preemptively diagnosing AD is SVM, attaining the highest recall of 97.78% and an area under the receiver operating characteristic curve (AUROC) of 0.97 using 13 features after applying the sequential forward feature selection technique. The remaining sections in this paper are arranged as follows. Section 2 covers the reviewed literature, whereas Section 3 includes the technical description of the four utilized machine learning algorithms. The empirical study is presented in Section 4, containing the dataset description, statistical analysis, experimental setup, performance measures, and optimization strategy. Section 5 illustrates and discusses the experiment outcomes, while the last section presents the conclusion and future work recommendations.

Review of Related Literature
Several publications utilized various ML and deep learning (DL) algorithms to develop early diagnosis models for AD. The studies reviewed below were examined to identify gaps in the literature and to observe the best-performing classifiers. Table 1 summarizes the reviewed literature along with its major limitations.
The Alzheimer's Disease Neuroimaging Initiative (ADNI) provides several datasets with different data types for researchers to benefit from while employing ML and DL techniques [31]. The authors in reference [18] utilized the ADNI dataset to enhance the AD prediction accuracy by using a convolutional neural network (CNN) model with the VGG-16 architecture for feature extraction. The empirical results showed that SVM, linear discriminant, and k-NN achieved the highest accuracy of 100% when trained using the functional magnetic resonance imaging (fMRI) images, while k-NN attained the highest accuracy of 76.56% when trained using the positron emission tomography (PET) images.
In another study [19], the authors developed a novel architecture for the early prediction of AD. The proposed 3DMgNet architecture was trained with two magnetic resonance imaging (MRI) datasets. The first was obtained from the ADNI database, and the other was collected from a hospital located in China. The empirical results revealed that the proposed model achieved the highest classification accuracy of 92.133%.
Similarly, the study [20] utilized the ADNI dataset to diagnose AD using three ML classifiers. The features in the dataset were standardized and normalized. The authors concluded that logistic regression (LR) achieved the best results with accuracy, specificity, and sensitivity of 98.12%, 95%, and 90%, respectively. The authors in reference [21] also used one of the ADNI datasets to propose a state-of-the-art technique that automates the early detection of the three stages of AD. Various methods were employed for feature extraction, followed by principal component analysis (PCA) to reduce the number of features.
The empirical results showed that random forest (RF) achieved the highest accuracy of 98.42%. The authors in reference [22] also proposed prediction and detection tools for the early diagnosis and classification of AD using MRI. The CNN algorithm was employed to classify 1512 mild, 2633 normal, and 2480 AD patients, achieving a 99% accuracy.
In another study [23], the authors proposed a technique for detecting AD early using blood plasma proteins. The dataset was acquired from the ADNI portal, containing 146 blood plasma proteins from three clinical groups. A correlation-based feature selection technique based on brute-force search was applied to the first dataset to form feature panels. Ten-fold cross-validation was used on the first dataset. Then, the resulting models were tested with the second dataset. The empirical results showed that SVM achieved the best results with a sensitivity higher than 80%, specificity above 70%, and an AUROC of at least 0.80 for both datasets.
Similarly, the authors in reference [24] used ML with psychological parameters, including education, age, number of visits, and the mini mental state examination, to predict AD. The empirical results indicated that SVM achieved the highest accuracy of 85%.
The Open Access Series of Imaging Studies (OASIS) datasets were also employed in several studies [32]. The authors in reference [25] detected AD in its early stages using a dataset obtained from OASIS, containing MRI data of 373 imaging sessions for 62 men and 88 women. Feature selection was applied through the Boruta algorithm. The results indicated that RF achieved the best performance among the 12 proposed models with accuracy, sensitivity, specificity, and AUROC of 94.39%, 88.24%, 100.00%, and 94.44%, respectively.
Furthermore, the authors in reference [26] used a data augmentation method to collect AD brain images and implemented a robust computer-vision-based prediction model using an 8-layer CNN with leaky ReLU and max pooling. The model attained an accuracy, sensitivity, and specificity of 97.65%, 97.96%, and 97.35%, respectively.
On the other hand, the method proposed in the study [27] used spectrogram features extracted from speech data to detect the early stages of AD. The technique utilized two datasets, where the first dataset consisted of 36 subjects, while the second dataset consisted of 32 subjects. LR-CV proved to be the best classification model on both datasets. The results after parameter optimization reached a precision, recall, F1 score, and accuracy of 87.5%, 91.3%, 89.4%, and 86.1%, respectively. The authors in reference [28] utilized four ML algorithms to classify AD patients. Three datasets were used, with each dataset involving 28 mild AD or cognitive impairment patients and 50 cognitively healthy adults. Five-fold cross-validation and leave-one-out were used for evaluation. Moreover, SelectKBest was employed for feature selection, where SVM attained the best outcomes with accuracy, specificity, and sensitivity of 91.08%, 94%, and 85.71%, respectively.
More recently, the authors in reference [29] used clinical data for the early prediction of AD. To increase the prediction accuracy, a two-stage categorization technique was applied. The first stage entailed investigating the impact of diabetes and high blood pressure on cognitive decline; for this purpose, they trained SVM and RF. After that, cognitive function with possible dementia was examined in the second stage using a neuropsychological test called the cognitive ability test (CAT). The multinomial LR approach was used to analyze CAT findings to forecast the likelihood of cognitive deterioration later in life. The proposed work achieved an accuracy of 0.86 with SVM for the first stage, while RF attained an accuracy of 0.71 for the same stage. Multinomial LR attained an accuracy of 0.89 for the second stage. Furthermore, 3D-CNN was utilized in reference [30] for the early prediction of AD. The data were taken from the MIRIAD dataset and contained MRI scans of 46 AD patients and 23 non-AD controls. The 3D-CNN model achieved a testing accuracy of 100% with a loss of 12.74%, a validation accuracy of 98.08% with a loss of 14.59%, a training accuracy of 100% with a loss of 9.5%, and a precision, recall, and F1 score of 100%.
Based on the literature reviewed for this study, it was observed that several classifiers achieved good performance, such as convolutional neural networks (CNN), support vector machine (SVM), random forest (RF), and logistic regression (LR). It was also observed that most of the reviewed papers utilized imaging datasets, which increases the workload of collecting data and makes the resulting high-complexity assembled models inconvenient for nontechnical individuals to use. Additionally, it was noted that most of the studies achieved low sensitivity rates, whereas other studies achieved insufficient classification accuracy. To overcome these limitations, this work aimed to construct a model using simple clinical features and to obtain high accuracy and recall rates with minimal computation time using a dataset collected from a Saudi hospital. Therefore, local hospitals with limited equipment can benefit from the pre-emptive diagnosis models. In addition, this work sought to resolve the problem of imbalanced data, which may reduce sensitivity, by applying the Synthetic Minority Oversampling Technique (SMOTE).

Materials and Methods
The Python programming language was used in this experiment to develop the pre-emptive diagnosis models for AD using Google Colaboratory. The Sklearn library was utilized to perform data preprocessing and modeling. Four ML algorithms were used, including support vector machine (SVM), eXtreme Gradient Boosting (XGBoost), k-nearest neighbor (k-NN), and Adaptive Boosting (AdaBoost). Other steps, including balancing the dataset, optimizing the algorithms using GridSearchCV, and feature selection using Sequential Forward Feature Selection, were performed using the Sklearn and Mlxtend libraries with stratified 10-fold cross-validation. The random state was set to 0 for all the implemented steps. Figure 1 illustrates the study's framework, and the following sections explain the flow of the study in detail.

Dataset Collection and Cleaning.
The Saudi Alzheimer's disease dataset was obtained from King Fahad Specialist Hospital (KFSH), Dammam, Saudi Arabia. It initially consisted of 11 features and 53522 instances and had to be filtered so that the laboratory tests of each patient appear in a single row. Therefore, filtering code was developed for the Alzheimer's dataset, and parallelism techniques were employed to reduce the computation time. The parallel computing paradigm executes several instructions simultaneously, which shortens the overall runtime.
The multiprocessing library is one of Python's most commonly used modules for implementing parallel computing. It performs process-based parallelism; on Unix-like systems, worker processes are typically created through a fork system call, so each worker starts from a copy of the parent process's memory [33]. The module provides various approaches to execute code blocks concurrently, including Process, Pool/map, Pool/apply_async, and others.
To conduct this experiment, we utilized the Pool class, which assembles a pool of worker processes, along with its map method, which applies a function to the elements of the input list in parallel [34]. Assuming a system with N processors or nodes, the speedup in comparison with one processor can be estimated as follows:

$$S = \frac{T_1}{T_N}, \tag{1}$$

where $T_1$ is the code runtime for one processor and $T_N$ is the runtime for N processors. $T_N$ depends on the number of computing nodes and the proportion of serial code. Hence, the total runtime of the parallelized code can be measured using Amdahl's law:

$$T_N = T_S + \frac{T_P}{N} + T_{Sync}, \tag{2}$$

where $T_S$ represents the execution time of the serial code, $T_P$ represents the runtime of the parallel code, and $T_{Sync}$ represents the time needed for synchronization operations such as I/O operations [34]. In this study, patients' records in the dataset were organized using the proposed pseudocode presented in Algorithm 1. The map function was employed with seven cores instead of the complete number of processors to avoid freezing the program.
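As a rough numerical illustration of the speedup ratio and the runtime decomposition described above, the sketch below assumes the runtime on N workers is the serial time plus the parallel time divided by N plus a synchronization overhead. The timing values are hypothetical placeholders, not measurements from this study; only the seven-worker setting comes from the text.

```python
def parallel_runtime(t_serial, t_parallel, n_workers, t_sync=0.0):
    """Estimated runtime on n_workers: serial part + divided parallel part + sync overhead."""
    return t_serial + t_parallel / n_workers + t_sync


def speedup(t_one, t_n):
    """Speedup of N workers relative to a single worker."""
    return t_one / t_n


# Hypothetical timings in seconds (assumed for illustration only).
t_s, t_p, t_sync = 10.0, 70.0, 2.0
t_1 = parallel_runtime(t_s, t_p, 1)          # single worker, no synchronization cost
t_7 = parallel_runtime(t_s, t_p, 7, t_sync)  # seven workers, as in the study's setup
print(f"T_1 = {t_1:.1f} s, T_7 = {t_7:.1f} s, speedup = {speedup(t_1, t_7):.2f}")
```

Under these assumed numbers the serial portion dominates the seven-worker runtime, which is why adding more workers yields diminishing returns.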
As demonstrated in Algorithm 1, we began by defining the dataset, creating the pool, and mapping it to the data filtering function with the list of intended columns. The data filtering function presented in Algorithm 2 traverses every patient record to fill in the patient's information in parallel. After filling the columns, the pool was closed, and the results obtained were transferred to the dataset variable. Table 2 outlines the attributes used in this study.
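The pool-and-map filtering flow described above can be sketched as follows. This is a minimal illustration rather than the authors' code: plain Python tuples stand in for the DataFrame, and the record layout, column names, and function names (`filter_patient`, `build_rows`) are all hypothetical.

```python
from functools import partial
from multiprocessing import Pool

# Hypothetical long-format records: (patient_id, test_name, value).
RECORDS = [
    (1, "age", 71), (1, "pulse", 80),
    (2, "age", 65), (2, "platelet", 250),
]


def filter_patient(patient_id, records, columns):
    """Collect one patient's tests into a single row, writing "nan" where a test is absent."""
    row = [patient_id]
    for column in columns:
        found = False
        for pid, test, value in records:
            if pid == patient_id and test == column:
                row.append(value)
                found = True
                break
        if not found:
            row.append("nan")
    return row


def build_rows(records, columns, processes=7):
    """Map filter_patient over the unique patient ids using a pool of worker processes."""
    patient_ids = sorted({pid for pid, _, _ in records})
    worker = partial(filter_patient, records=records, columns=columns)
    # The context manager shuts the pool down once the mapped results are collected.
    with Pool(processes=processes) as pool:
        return pool.map(worker, patient_ids)
```

Calling `build_rows(RECORDS, ["age", "pulse", "platelet"])` should be done under an `if __name__ == "__main__":` guard, since some platforms start worker processes by re-importing the main module.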

Statistical Analysis.
Statistical analysis provides essential tools for visualizing and understanding the data patterns for better preprocessing and modeling. Table 3 displays the statistical analysis for the numerical features of the Saudi AD dataset, including the mean, standard deviation (STD), minimum (Min), first quartile (25%), median (50%), third quartile (75%), and maximum (Max) values for each feature. Moreover, it outlines the count of missing values in each column. Statistical analysis tools can show the presence of outliers in the data. For instance, the significant difference between the minimum values and the first quartile of the BP-systolic, respiratory rate, BP-diastolic, and pulse ox attributes shows the presence of outliers. Additionally, the considerable difference between the maximum value and the third quartile of the platelet attribute indicates the presence of an outlier. The standard deviation shows a wide spread of data in the platelet, BP-systolic, pulse, and BP-diastolic attributes. In contrast, it shows a considerably narrower spread of the other attributes around the mean.

Dataset Preprocessing.
The data fed into ML algorithms require several preprocessing steps to enhance the learning process of the algorithms. In our study, the Sklearn library was utilized to preprocess the data, starting by transforming all entries into numerical values, either float or integer; the platelet values recorded as "normal limit" were set to 250. The following preprocessing steps included treating outliers, filling the missing values, and scaling the data.
(1) Outlier removal: The outliers were treated by replacing the minimum values of the pulse ox and respiratory rate attributes and the maximum value of the platelet attribute with their mean:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \tag{3}$$

where $x_i$ denotes the data points and n is the number of points.

(2) Missing values imputation: A k-NN imputer replaces each missing value with the mean of that feature over the nearest neighbors, which are determined by a specific similarity measure [35]. In our study, the missing values were imputed with the number of nearest neighbors set to 3 and the metric set to Euclidean distance. The k-NN imputer operates as follows:

(a) Define n_neighbors.
(b) Calculate the Euclidean distance:

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}, \tag{4}$$

where p and q correspond to different points and n denotes the dimension of the space.

(3) Data scaling: The MinMaxScaler operates by scaling the values of the attributes to a specific range. In our study, the attributes were scaled to the range of 0 to 1 as

$$v_i' = \frac{v_i - \min_A}{\max_A - \min_A}\,\left(\text{new\_max}_A - \text{new\_min}_A\right) + \text{new\_min}_A, \tag{5}$$

where $v_i$ represents the data point, $\max_A$ and $\min_A$ represent the maximum and minimum values of an attribute, and $\text{new\_max}_A$ and $\text{new\_min}_A$ denote the values 1 and 0, respectively.

(1) Define the dataset as a dataFrame
(2) [...]
(11) EndIf
(12) EndFor
(13) If found != true then
(14) Add "nan" in temp
(15) EndIf
(16) Extend temp in temp_lst
(17) EndFor
(18) Return temp_lst
(19) EndFunction
ALGORITHM 2: Data filter process pseudocode.
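The three preprocessing steps (mean replacement of flagged outliers, nearest-neighbor imputation with three neighbors, and min-max scaling to the range 0 to 1) can be sketched in plain Python as below. The study itself used the Sklearn library; this dependency-free version is a simplified stand-in, and in particular the distance over co-observed coordinates is an assumption that does not reproduce Sklearn's exact handling of missing values. All data values are hypothetical.

```python
import math

NAN = float("nan")


def replace_with_mean(column, index):
    """Return a copy in which the flagged outlier at `index` is replaced by the column mean."""
    out = list(column)
    out[index] = sum(column) / len(column)
    return out


def shared_distance(p, q):
    """Euclidean distance over the coordinates that are observed in both rows."""
    shared = [(a, b) for a, b in zip(p, q) if not (math.isnan(a) or math.isnan(b))]
    if not shared:
        return math.inf
    return math.sqrt(sum((a - b) ** 2 for a, b in shared))


def knn_impute(rows, k=3):
    """Fill each missing entry with the mean of that feature over the k nearest rows."""
    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if math.isnan(value):
                # Only rows that actually observe feature j can donate a value.
                donors = [r for t, r in enumerate(rows) if t != i and not math.isnan(r[j])]
                donors.sort(key=lambda r: shared_distance(row, r))
                nearest = donors[:k]
                filled[i][j] = sum(r[j] for r in nearest) / len(nearest)
    return filled


def min_max_scale(column, new_min=0.0, new_max=1.0):
    """Rescale a column linearly so its minimum maps to new_min and its maximum to new_max."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in column]


# Hypothetical values, for illustration only.
platelet = replace_with_mean([10.0, 20.0, 30.0, 1000.0], index=3)
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, NAN], [50.0, 100.0]]
filled = knn_impute(rows, k=3)
scaled = min_max_scale([row[1] for row in filled])
print(platelet, filled[3], scaled)
```

The sketch assumes every column has at least one donor row and a nonzero value range; a production pipeline would need to guard both cases.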

Description of the Utilized Machine Learning Techniques.
The sections below provide a theoretical background of the classifiers utilized to pre-emptively predict the possible incidence of Alzheimer's disease.

Support Vector Machine (SVM).
Support vector machine (SVM) is a popular statistical supervised machine learning algorithm introduced in the 1990s by Vapnik, Cortes, and Boser to solve classification and regression problems [36]. It has a distinctive ability to model complex relationships between variables and to combine generalization control with a procedure that addresses the curse of dimensionality, which arises when a model examines a dataset with many features and limited samples, allowing the algorithm to perform very well with limited instances [14]. For linearly separable data, the SVM algorithm explores the training instances belonging to a particular class and then constructs a boundary that divides the training data into two separated classes. This boundary is known as the hyperplane, which is a subspace with dimension p-1 [37]. Equations (6) and (7) represent the formulas for finding the maximum margin hyperplane, where $\vec{w}$ represents a vector of m dimensions, b denotes a scalar, $\xi_i$ represents the slack variable, C denotes the penalty parameter, $\vec{x}_i$ is a training instance, and $y_i$ is its class label [38].

$$\min_{\vec{w},\, b,\, \xi}\ \frac{1}{2}\lVert \vec{w} \rVert^2 + C \sum_{i=1}^{n} \xi_i, \tag{6}$$

subject to:

$$y_i\,(\vec{w} \cdot \vec{x}_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, n. \tag{7}$$

In order to find the optimal hyperplane, the margin, which is the distance between the hyperplane and the support vectors, should be maximized. On the other hand, the separation of nonlinear data is enabled by introducing kernel functions: the SVM algorithm utilizes the kernel trick, which implicitly maps the data into a higher-dimensional space where an optimal separating boundary can be found.
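To make the soft-margin formulation concrete, the sketch below evaluates the primal objective for a fixed, hand-picked hyperplane, assuming the standard form: half the squared norm of w plus C times the sum of the slacks, where each slack is the amount by which a point violates the margin. It does not train an SVM; the data points, weights, and the function name are hypothetical.

```python
def soft_margin_objective(w, b, X, y, C):
    """Return (objective, slacks) of the primal soft-margin SVM at a given (w, b)."""
    slacks = []
    for x_i, y_i in zip(X, y):
        # Functional margin y_i * (w . x_i + b); values below 1 violate the margin.
        margin = y_i * (sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b)
        slacks.append(max(0.0, 1.0 - margin))
    objective = 0.5 * sum(w_j ** 2 for w_j in w) + C * sum(slacks)
    return objective, slacks


# Hypothetical two-feature points with labels in {-1, +1}.
X = [[2.0, 0.0], [0.0, 2.0], [0.5, 0.0]]
y = [1, -1, -1]
objective, slacks = soft_margin_objective(w=[1.0, -1.0], b=0.0, X=X, y=y, C=1.0)
print(objective, slacks)
```

The third point lies on the wrong side of the hyperplane, so it is the only one with a nonzero slack; a larger C would penalize that violation more heavily.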

K-Nearest Neighbor (K-NN).
K-nearest neighbor (K-NN) is a supervised machine learning algorithm developed by Evelyn Fix and Joseph Hodges in 1951 and then expanded by Thomas Cover [39]. It is often referred to as a nonparametric, instance-based, memory-based, and lazy learner technique, as each term represents a characteristic of K-NN. The principle of this classifier is to use the nearest data points (neighbors) to predict classes or continuous values for unseen data. The quality of the predictions depends on the distance measurement. Executing K-NN begins with choosing the number of neighbors (K) and identifying the K data points with the minimum distance to the target point. Subsequently, a majority voting mechanism is implemented to choose an appropriate class or value for the new data. Equation (8) represents the Minkowski distance measure, which can be generalized to calculate other distance measures:

$$d(x, y) = \left(\sum_{i=1}^{n} \lvert x_i - y_i \rvert^{p}\right)^{1/p}, \tag{8}$$

where x and y correspond to different points and p turns the formula into the Manhattan distance if set to 1 and the Euclidean distance if set to 2.
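The nearest-neighbor procedure described above (measure distances, keep the K closest points, take a majority vote) can be sketched as follows. The training points and labels are hypothetical, and the function names are illustrative rather than taken from any library.

```python
from collections import Counter


def minkowski(x, y, p=2):
    """Minkowski distance: p=1 gives Manhattan distance, p=2 gives Euclidean distance."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def knn_predict(train_X, train_y, query, k=3, p=2):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: minkowski(pair[0], query, p))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: label 0 clusters near the origin, label 1 near (5, 5).
train_X = [[0, 0], [0, 1], [1, 0], [5, 5], [6, 5]]
train_y = [0, 0, 0, 1, 1]
print(knn_predict(train_X, train_y, [0.5, 0.5], k=3))
print(knn_predict(train_X, train_y, [5, 6], k=3))
```

Because every prediction scans the whole training set, this lazy approach has no training cost but its prediction cost grows with the dataset size.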

Adaptive Boosting (AdaBoost).
Adaptive Boosting (AdaBoost) is an ensemble technique that implements boosting to construct a robust algorithm from a series of weak algorithms. This method was first introduced by Freund and Schapire in 1997 [16]. It is mainly used to enhance the performance of decision trees on binary classification problems. In the training process, each model concentrates specifically on the weaknesses of the previous model by adjusting the weights of the data points. The weights of the misclassified samples are boosted, whereas the weights of the correctly classified samples are decreased. After applying the same procedure to each model, the results are aggregated into a robust classifier [16]. The steps of the AdaBoost algorithm are explained briefly below, along with the mathematical equations. The dataset is represented in the following equation, where n represents the number of features, x is the set of instances, and y is the binary target attribute:

$$D = \{(x_1, y_1), \ldots, (x_N, y_N)\}, \quad x_i \in \mathbb{R}^n, \quad y_i \in \{-1, +1\}. \tag{9}$$

All instances have the same sample weight w at the beginning, where N is the total number of instances:

$$w(x_i, y_i) = \frac{1}{N}, \quad i = 1, \ldots, N. \tag{10}$$

Then, the classifier's influence in classifying the instances is calculated using the equation below, where α represents the influence and TotalError represents the error rate:

$$\alpha = \frac{1}{2}\ln\left(\frac{1 - \text{TotalError}}{\text{TotalError}}\right). \tag{11}$$

After calculating α, the sample weights are updated using the following equation, where the product $y_i h(x_i)$ is +1 if the instance was classified correctly and -1 if it was misclassified:

$$w_i^{\text{new}} = w_i^{\text{old}} \, e^{-\alpha\, y_i h(x_i)}. \tag{12}$$

The sample weight is thus reduced if the classification was accurate and increased if it was inaccurate, to avoid repeating the misclassification [40]. The final equation for AdaBoost classification is shown below, where $h_t$ represents the t-th weak classifier, $\alpha_t$ represents its corresponding weight, and T is the number of weak classifiers [41]:

$$H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right). \tag{13}$$

eXtreme Gradient Boosting (XGBoost).
eXtreme Gradient Boosting (XGBoost) is an ensemble algorithm introduced by Chen and Guestrin to solve classification and regression problems [42]. It extends gradient boosting trees, which train trees sequentially so that each tree reduces the error of the prior model. The output $\hat{y}_i$ is calculated by summing the scores predicted by the trees. Equation (14) represents the formula for calculating the outcome [43]:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F, \tag{14}$$

where K represents the number of trees, $f_k(x_i)$ denotes the k-th tree's score, and F is the function space comprising the trees. XGBoost refines gradient boosting trees by introducing several algorithmic enhancements. It improves the convergence speed of the model's training by utilizing the second derivative through a second-order Taylor expansion, allowing more effective model exploration and efficient usage of memory resources. Additionally, XGBoost effectively reduces the overfitting problem by employing regularization techniques and introducing column sampling. Equation (15) represents the formula for optimizing the objective function with the added regularization term [44]:

$$\text{Obj} = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k), \tag{15}$$

where l denotes the differentiable loss function, $\sum_{i=1}^{n} l(y_i, \hat{y}_i)$ is the training loss over the n instances, and $\sum_{k=1}^{K} \Omega(f_k)$ signifies the added regularization term.
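Returning to the AdaBoost procedure described in the Adaptive Boosting subsection, a single boosting round can be sketched as below. It assumes labels in {-1, +1}, computes the weighted error and the influence α, and then rescales the weights so that misclassified samples gain weight; the final normalization step is a common convention added here as an assumption. The predictions stand in for a hypothetical weak learner.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One boosting round: return (alpha, updated weights normalized to sum to 1)."""
    # Weighted error rate of the weak learner (must lie strictly between 0 and 1).
    total_error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    alpha = 0.5 * math.log((1.0 - total_error) / total_error)
    # t * p is +1 for a correct prediction (weight shrinks) and -1 otherwise (weight grows).
    updated = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    norm = sum(updated)
    return alpha, [w / norm for w in updated]


# Hypothetical example: five samples with equal initial weights, last one misclassified.
y_true = [1, 1, -1, -1, 1]
y_pred = [1, 1, -1, -1, -1]
weights = [1 / 5] * 5
alpha, new_weights = adaboost_round(weights, y_true, y_pred)
print(alpha, new_weights)
```

After this round the single misclassified sample carries as much weight as the four correct ones combined, so the next weak learner is pushed to classify it correctly.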

SMOTE.
The Synthetic Minority Oversampling Technique (SMOTE) is an oversampling approach introduced by Chawla, Bowyer, Hall, and Kegelmeyer in 2002. It is considered one of the most effective data sampling techniques in classification tasks due to its simplicity and robustness. SMOTE aims to reduce data imbalance in the training data and enhance models' generalization on the testing data. It operates by generating synthetic samples of the minority class until they are approximately equal in number to the majority class's samples [45]. An outline of the method is shown in equation (16), where $x_i$ is a minority class sample, $x_i^{p}$ are randomly chosen samples among its nearest neighbors with $p = 1, \ldots, N$, u is a random value in [0, 1], and $x_i^{*p}$ is the new synthetic sample:

$$x_i^{*p} = x_i + u\,(x_i^{p} - x_i). \tag{16}$$

The previous steps are applied when the features contain numeric values. In the case of nominal values, one of the two values is selected at random.
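The interpolation step at the core of SMOTE for numeric features can be sketched as follows: a synthetic point is placed at a random position on the segment between a minority sample and one of its nearest neighbors. The neighbor list is assumed to be precomputed, and the sample values and function name are hypothetical; the seed of 0 mirrors the random state used in this study.

```python
import random


def smote_sample(x_i, neighbors, rng):
    """Create one synthetic point between x_i and a randomly chosen nearest neighbor."""
    x_p = rng.choice(neighbors)  # one of the precomputed nearest minority neighbors
    u = rng.random()             # random interpolation factor in [0, 1)
    return [a + u * (b - a) for a, b in zip(x_i, x_p)]


rng = random.Random(0)
x_i = [1.0, 2.0]                       # hypothetical minority-class sample
neighbors = [[2.0, 4.0], [0.0, 2.0]]   # hypothetical nearest minority neighbors
synthetic = smote_sample(x_i, neighbors, rng)
print(synthetic)
```

Repeating this call until the minority class reaches roughly the size of the majority class yields the balanced training set.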

Performance Measures.
This study employed three performance measures, namely accuracy, precision, and recall, to evaluate the models' performance. Precision computes the proportion of positive predictions that truly belong to the positive class:

$$\text{Precision} = \frac{TP}{TP + FP}. \tag{17}$$

Recall calculates the proportion of true positive predictions out of all positive examples:

$$\text{Recall} = \frac{TP}{TP + FN}. \tag{18}$$

Accuracy is the primary performance measure and assesses the proportion of correct predictions. Equation (19) presents the mathematical representation for calculating the accuracy:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{19}$$

where true positive (TP) indicates the instances classified with AD correctly, false positive (FP) represents the instances classified with AD incorrectly, true negative (TN) denotes the instances classified with nonAD correctly, and false negative (FN) represents the instances classified with nonAD incorrectly. In addition to the aforementioned performance measures, the receiver operating characteristic (ROC) curve was plotted to compare the performance of the classifiers across the whole range of class distributions. In other words, the ROC curve measures how well a model can differentiate between two classes [46].
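The three measures follow directly from the four confusion-matrix counts, as the short sketch below shows. The counts are hypothetical and are not taken from the study's results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (precision, recall, accuracy) computed from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, accuracy


# Hypothetical counts, for illustration only.
precision, recall, accuracy = classification_metrics(tp=45, fp=5, tn=40, fn=10)
print(f"precision={precision:.4f} recall={recall:.4f} accuracy={accuracy:.4f}")
```

In this example recall is lower than precision because false negatives outnumber false positives, which is the situation a diagnostic model for AD should avoid.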

Optimization Strategy.
Performing hyperparameter optimization helps in improving the model to obtain better results. Therefore, GridSearchCV with stratified 10-fold cross-validation has been utilized to identify the optimal hyperparameters that yield the highest possible accuracy for each model. Table 4 outlines the optimal hyperparameters produced by the GridSearchCV algorithm applied using the original and oversampled datasets to compare their results.
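The idea behind an exhaustive grid search with stratified cross-validation can be sketched without any library as below. The study used GridSearchCV from Sklearn; this dependency-free version only mirrors the logic. The scoring callback `toy_score`, the parameter names, and the labels are hypothetical stand-ins for fitting and evaluating a real model.

```python
from collections import defaultdict
from itertools import product


def stratified_folds(labels, n_folds):
    """Assign sample indices to folds so each fold keeps roughly the same class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


def grid_search(param_grid, labels, n_folds, score_fn):
    """Return (best_params, best_mean_score) over every combination in param_grid.

    score_fn(params, train_idx, test_idx) is a user-supplied callback that fits a
    model on train_idx and returns its accuracy on test_idx.
    """
    folds = stratified_folds(labels, n_folds)
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for held_out in range(n_folds):
            test_idx = folds[held_out]
            train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
            scores.append(score_fn(params, train_idx, test_idx))
        mean_score = sum(scores) / n_folds
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def toy_score(params, train_idx, test_idx):
    """Stand-in scorer (hypothetical): ignores the data and rewards kernel='rbf' with C near 1."""
    base = 1.0 if params["kernel"] == "rbf" else 0.5
    return base - abs(params["C"] - 1.0) / 100


labels = [0] * 20 + [1] * 10  # hypothetical imbalanced labels
folds = stratified_folds(labels, 10)
grid = {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}
best_params, best_score = grid_search(grid, labels, 10, toy_score)
print(best_params, best_score)
```

Replacing `toy_score` with a function that trains and evaluates an actual classifier turns the sketch into a working, if slow, hyperparameter search.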

Results and Discussion
To assess the utility of performing SMOTE, the GridSearchCV algorithm was applied to the algorithms while training them on the sampled and original data to obtain the optimal hyperparameters for each experiment. Stratified 10-fold cross-validation was utilized to assess the models' performance in terms of the previously outlined performance measures. Table 5 compares the results of the proposed models before and after applying the SMOTE technique.
The results demonstrate an inconsequential difference in the accuracy and precision before and after applying the SMOTE algorithm. In contrast, they reveal a significant difference in the recall after applying the SMOTE sampling technique. This is a consequence of the disproportion between the number of positive and negative patients in the dataset, which biases the models toward predicting the negative class correctly and mispredicting the positive class. Since AD is a critical disease that must be identified pre-emptively to avoid further complications, it is necessary to take into account the increase in the recall rate, which reflects a reduction in the number of FN. Therefore, it is concluded that the SMOTE algorithm positively impacted the performance of the models, where SVM attained the best outcomes with an accuracy of 93.33%, precision of 93.32%, and recall of 95.56%. The subsequent experiments were conducted with the models trained on the sampled data.

Feature Selection.
Feature selection plays a vital role in reducing the computational cost of building models, as it dismisses irrelevant features that can negatively impact a model's performance. In this study, the Sequential Forward Feature Selection algorithm was utilized to automatically choose the feature subset that results in the best performance for each algorithm [47].
The Sequential Forward Feature Selection algorithm operates by training an algorithm using the best feature decided by a specific criterion function. In each sequential forward step, the feature that improves the accuracy is added to the feature subset, as represented by Algorithm 3, where J denotes the criterion function.

(1) Create a null set: $Y_0 = \emptyset$, $n \leftarrow 0$
(2) Select the optimal remaining feature: $y^+ = \arg\max_{y \notin Y_n} J(Y_n + y)$
(3) If model accuracy($Y_n + y^+$) > model accuracy($Y_n$)
    (a) Update $Y_{n+1} \leftarrow Y_n + y^+$
    (b) $n \leftarrow n + 1$
    (c) Continue with step 2
ALGORITHM 3: Sequential forward feature selection algorithm.

Table 6 exhibits the best feature subset yielded by the Sequential Forward Feature Selection algorithm. The authors in reference [48] aimed to investigate the significance of two dimensionality reduction techniques, namely, principal component analysis (PCA) and linear discriminant analysis (LDA). It was concluded that datasets with low dimensionality produce better results when ML algorithms are used without dimensionality reduction. Despite the low dimensionality of the dataset, our study showed that the Sequential Forward Feature Selection algorithm improved the ML algorithms' performance.
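The greedy loop of Algorithm 3 can be sketched as below: at each step the feature that gives the best score is tentatively added, and the search stops as soon as no remaining feature improves the score. The study used the Mlxtend library for this step; here the scoring function and its per-feature gains are hypothetical stand-ins for cross-validated model accuracy.

```python
def sequential_forward_selection(features, score_fn):
    """Greedily add the feature that most improves score_fn until no feature helps."""
    selected = []
    remaining = list(features)
    best_score = float("-inf")  # score of the empty subset is treated as undefined
    while remaining:
        candidate = max(remaining, key=lambda f: score_fn(selected + [f]))
        candidate_score = score_fn(selected + [candidate])
        if candidate_score <= best_score:
            break  # no remaining feature improves the current subset
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score


# Hypothetical additive gains standing in for a model's cross-validated accuracy.
GAINS = {"age": 0.30, "sex": 0.10, "pulse": 0.05, "noise": -0.02}


def toy_accuracy(subset):
    """Stand-in criterion function: baseline 0.5 plus the assumed gain of each feature."""
    return 0.5 + sum(GAINS[f] for f in subset)


selected, score = sequential_forward_selection(list(GAINS), toy_accuracy)
print(selected, round(score, 2))
```

With a real model the gains are not additive, so the order in which features are added can change which subset is ultimately selected.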
It is concluded from Table 6 that SVM achieved the highest accuracy of 95.56% with 13 features, followed by k-NN with an accuracy of 95.53% using six features. Since the difference in accuracy is insignificant, it is concluded that k-NN outperformed SVM in terms of the number of clinical tests needed to achieve high accuracy, which can reduce lab test costs. Additionally, the critical attributes for pre-emptively predicting AD appear to be "age" and "sex", as they were utilized in all models. Table 7 evaluates the proposed models with the optimal hyperparameters and feature subsets in terms of the previously defined performance measures. SVM achieved the highest accuracy of 95.56%, followed by k-NN, which almost reached the same accuracy with a difference of only 0.03%. In contrast, XGBoost obtained the lowest accuracy of 94.38%. The highest precision rate of 96% was attained by AdaBoost, while XGBoost achieved the lowest precision rate of 94.18%. Correspondingly, the highest recall rate of 97.78% was achieved by SVM, whereas AdaBoost attained the lowest recall rate of 94.44%. Figure 2 illustrates the confusion matrices of the final selected models.
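As a worked illustration of how these measures follow from a confusion matrix, the sketch below computes accuracy, precision, and recall from four counts. The FN = 2 and FP = 6 values are those that the discussion of Figure 2 attributes to SVM; the per-class totals of 90 (and hence TP = 88 and TN = 84) are an assumption made for illustration and are not stated in this section.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Assumed counts: 90 positive and 90 negative samples (hypothetical totals),
# combined with FN = 2 and FP = 6.
metrics = classification_metrics(tp=88, fp=6, fn=2, tn=84)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed totals, accuracy and recall come out at roughly 95.56% and 97.78%. Precision computed from pooled counts need not coincide with a figure averaged over cross-validation folds, so it should not be expected to match Table 7 exactly.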

Further Discussion of the Results.
It is observed from Figure 2 that the lowest FN and FP values of 2 and 4 were achieved by SVM and AdaBoost, respectively. On the other hand, the highest FN value of 5 was obtained by AdaBoost, whereas SVM and XGBoost attained the highest FP value of 6. Because the proposed models achieved similar outcomes, it is challenging to identify the underperforming model from the observed values. Nevertheless, FN result in the misdiagnosis of Alzheimer's disease, leading to severe consequences such as losing the opportunity for early treatment and potentially escalating the disease's progression [49]. Hence, it is critical to attain the lowest possible FN value when considering the best-performing model, which leads to the conclusion that SVM outperformed the other models in pre-emptively diagnosing Alzheimer's disease. Figure 3 shows the Area Under the Receiver Operating Characteristics (AUROC) generated using stratified 10-fold cross-validation to evaluate the models' capability to distinguish between patients with AD and those without. The SVM, AdaBoost, and XGBoost models attained the same AUROC of 0.97, followed by k-NN with a slightly lower AUROC value of 0.96. In general, the achieved results indicate the proposed models' strong discrimination ability.
This study aims to employ machine learning algorithms for the pre-emptive diagnosis of AD by utilizing simple clinical data. The study demonstrated SVM's superior performance over the other proposed models with respect to accuracy and recall rates of 95.56% and 97.78%, respectively, followed by k-NN with a slightly lower accuracy of 95.53% and a lower recall of 96.67%. Although k-NN achieved comparable accuracy using only six features, leading to fewer clinical tests and potentially lower costs, it falls short regarding recall rate. A high recall rate means fewer FN, which matters because FN predictions lead to severe consequences due to misdiagnosis. For further analysis of the best two models, AUROC was generated to evaluate the ability of the proposed models to classify Alzheimer's patients. SVM outperformed k-NN with an AUROC of 0.97, whereas k-NN obtained a slightly lower AUROC of 0.96. Overall, SVM is concluded to be the best-performing model in the early diagnosis of Alzheimer's with superior accuracy, recall, and AUROC.
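To make the AUROC figures concrete, the following sketch computes the measure directly from its probabilistic interpretation: the fraction of positive/negative pairs in which the positive sample receives the higher score, with ties counted as half. The labels and scores are hypothetical toy values, and the sketch evaluates a single set of predictions rather than the stratified 10-fold procedure used in the study.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions: 1 = AD, 0 = no AD.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]

toy_auroc = auroc(labels, scores)
perfect_auroc = auroc([1, 1, 0, 0], [0.9, 0.7, 0.2, 0.1])
print(round(toy_auroc, 3), perfect_auroc)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a sense of scale for the reported values of 0.96 and 0.97.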
A delay in AD diagnosis results in rapid disease progression and potentially risks the patients' lives as the disease gradually develops. The utilization of simple clinical data in the proposed models aids in the early diagnosis of Alzheimer's because it carries fewer risks and lower costs than MRI scans. For instance, patients with implanted medical devices, such as a pacemaker, are restricted from taking MRI scans due to the danger of burns, undesirable device movements, and device malfunctions. Moreover, MRI scans can potentially harm the patient's body if used improperly, along with other safety concerns [50]. Therefore, developing an accurate machine learning model using clinical data can benefit hospitals in the pre-emptive prediction of AD at a lower cost.

Conclusion and Recommendations
This project was undertaken to develop an ML model that pre-emptively predicts AD to enhance the prevention procedures and decrease the mortality rate induced by this illness. Four algorithms were trained using a Saudi dataset that was obtained from King Fahad Specialist Hospital: support vector machine (SVM), k-nearest neighbor (k-NN), Adaptive Boosting (AdaBoost), and eXtreme Gradient Boosting (XGBoost). Results confirmed the robustness of machine learning in predicting AD using simple clinical features. SVM achieved the best performance among the classifiers, with an accuracy of 95.56%, recall of 97.78%, and AUROC of 0.97. Although the number of features required to attain these results was almost double the number needed for k-NN to accomplish an accuracy of 95.53%, recall is prioritized in the medical field, which favors SVM. Consequently, future work may include investigating approaches to reduce the number of features while maintaining high accuracy and recall. The experiment can also be expanded to address other chronic disorders. Moreover, it is recommended to use the least computationally demanding techniques and the simplest datasets available when experimenting with medical problems. This would facilitate adopting the pre-emptive prediction tool in most healthcare facilities while addressing economic concerns.

Data Availability
Alzheimer's disease clinical data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare no conflicts of interest.