The Role of Medication Data to Enhance the Prediction of Alzheimer's Progression Using Machine Learning

Early detection of Alzheimer's disease (AD) progression is crucial for proper disease management. Most studies concentrate on neuroimaging data from baseline visits only, ignoring the fact that AD is a chronic disease and that patients' data are naturally longitudinal. In addition, there are no studies that examine the effect of dementia medications on the course of the disease. In this paper, we propose a machine learning-based architecture for early detection of AD progression based on multimodal data comprising AD drugs and cognitive scores. We compare the performance of five popular machine learning techniques, namely support vector machine, random forest, logistic regression, decision tree, and K-nearest neighbor, in predicting AD progression after 2.5 years. Extensive experiments are performed on an ADNI dataset of 1036 subjects. Fusing the drug and cognitive score data improved the cross-validation performance of most algorithms. The results indicate the important role of the drugs taken by patients in the progression of AD.


Introduction
Alzheimer's disease (AD) is considered one of the most severe diseases that destroy the brain (Zheng and Xu [1]). According to the Alzheimer's Association report by Huber-Carol et al. [2], more than sixty million people around the globe will suffer from AD in the next fifty years. Moreover, the report estimates that a new case of dementia occurs every three seconds, and the number of people with dementia could reach 152 million worldwide by 2050 [3]. Dementia has several stages; mild cognitive impairment (MCI) is the stage between healthy aging and AD. Most people with MCI gradually progress to dementia within five years (Ye et al. [4]), and Qiu et al. [5] estimate that 10% to 20% of MCI patients convert to AD each year. Therefore, early-stage discovery of AD could provide an opportunity for treatment that slows down AD symptoms and improves the patient's life (Gray et al. [6]). Early identification of patients with AD or progressive MCI (pMCI), as distinct from stable MCI (sMCI), is a complex problem because these patients often present similar signs (Lee et al. [7]). Machine learning (ML) techniques play an essential role in many areas such as engineering, physics, mathematics, marketing, and computer science (Liu et al. [8, 9]), and they have great potential to address this medical challenge (Liu et al. [10]). Because AD is a chronic disease, the data collected from patients are multimodal time series. Furthermore, AD patients' data are heterogeneous across patient profiles. Recently, several ML models such as K-nearest neighbor (KNN), support vector machine (SVM), multilayer perceptron (MLP), and logistic regression (LR) have been employed to classify a patient as cognitively normal (CN), MCI, or AD (Moradi et al. [11]; Park et al. [12]).
These studies focus primarily on single modalities, including magnetic resonance imaging (MRI) (Liu et al. [10]), fluorodeoxyglucose positron emission tomography (FDG-PET) (Hinrichs et al. [13]), diffusion tensor imaging (DTI), and cerebrospinal fluid (CSF). However, relying on a single modality degrades model performance because useful complementary information from other biomarker modalities is omitted (Ye et al. [4]). Some studies have combined multiple modalities for AD classification and achieved better performance than single-modality methods (Gray et al. [6]; Zhang et al. [14]). In this context, Wee et al. [15] used both DTI and MRI to distinguish ten patients with MCI from 17 matched CN subjects; the accuracy was 7.4% higher than that of the single-modality method. Bouwman et al. [16] distinguished CN subjects from MCI patients using two modalities, MRI and CSF. Fellgiebel et al. [17] used PET and CSF to predict cognitive decline in MCI. Zhang et al. [14] classified AD and MCI versus CN by integrating three modalities: MRI, FDG-PET, and CSF. Gray et al. [6] applied a random forest (RF) algorithm to four modalities (genetics, MRI, CSF, and FDG-PET) to classify AD versus MCI versus CN. On the other hand, some works have used time-series approaches to detect AD progression. Moradi et al. [11] used semisupervised learning on the MRI modality to predict MCI-to-AD conversion within one to three years. El-Sappagh et al. [18] used a two-layer ensemble of RF-based classifiers on multimodal AD datasets. Venugopalan et al. [19] used several models, including SVM, DT, RF, and KNN, for early detection of the AD stage, and compared multimodal with single-modality models. Moore et al. [20] used RF to study the relationship between pairs of data points at various time separations, using three modalities: demographic, physical, and cognitive data.
Using time-series data with multimodal inputs improves model performance for AD progression detection. The resulting models are expected to be more stable and medically acceptable because they mimic the procedures that medical experts actually follow. In addition to MRI, PET, and CSF, there is a crucial data source that has not been studied in the AD literature: the dementia medications taken during the patient's observation period. These drugs contain chemical substances that may accumulate in the body in some form and increase the probability of disease progression, or they may improve the patient's condition and decrease that probability. Thus, it is necessary to study the impact of these drugs on disease progression (Zimmerman [21]). Furthermore, no study in the literature has discussed this issue. In this work, we provide an ML-based model to predict AD progression after 2.5 years. To do so, we implemented and tested a set of ML techniques on patients' multimodal time-series data. The study is based on cognitive score and Alzheimer's medication (AM) data. For every patient, these modalities are collected over 1.5 years (baseline, month 6, month 12, and month 18) and used to predict the patient's state at month 48. We used the ADNI dataset; because ADNI contains real clinical data, our results have potential practical applications. Extensive experiments showed that adding AM data improves the cross-validation (CV) performance of most algorithms. All models were optimized using grid search, and we also studied the effect of feature selection on model performance. The rest of this paper is structured as follows: Section 2 presents the architecture of the proposed system for predicting Alzheimer's progression, Section 3 describes the experimental results, and Section 4 concludes the paper.

The Proposed System of Predicting Alzheimer's Progression
The proposed system of predicting Alzheimer's progression is described in Figure 1. It consists of the following steps: data collection, data preprocessing, data fusion and splitting, data balancing, classifier optimization and training, and model evaluation. Each step is described in detail in the following subsections.

Data Collection.
Data used in this work were collected from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database [22]. Subjects have been enrolled at over 57 sites in the United States and Canada (El-Sappagh et al. [18]). The study was carried out in accordance with Good Clinical Practice (GCP) principles, the Declaration of Helsinki, and US 21 CFR Part 50 (Protection of Human Subjects) and Part 56 (Institutional Review Boards). Subjects were willing and able to participate in test procedures such as neuroimaging and follow-up, and they gave written informed consent. All data are publicly available through the ADNI website [22]. The collected dataset has 1036 subjects categorized into four groups, as shown in Table 1.
The study is based on two time-series modalities: cognitive scores (CS) and Alzheimer's medication (AM). The CS dataset includes eight features: CDRSB, GDTOTAL, FAQ, ADAS13, CDG, MMSE, MOCA, and NPISCORE. Based on the ADNI dataset, we designed a drug dataset with nine features, including antidepressant, Cognex, Aricept, Namenda, Exelon, Razadyne, Other, and None. Ordered by prevalence in our dataset, the drugs are Aricept (42.18%), Namenda (25.77%), antidepressants (23.84%), Exelon (6.18%), and Cognex (0.09%). Because most CN subjects (85.94%) did not take any drugs, we removed this class from the dataset. The resulting datasets contain 787 patients in three classes (sMCI, pMCI, and AD). Table 1 shows the patients' demographics.
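The fusion step is not detailed in this section, so the following is only a minimal sketch of early (feature-level) fusion under stated assumptions: the CS scores are kept per visit as a 4 × 8 array (bl, M06, M12, M18 by the eight CS features), the AM indicators are assumed to be already aggregated into one binary vector per patient, and the two are concatenated into a single AM-CS feature vector. The helper name `early_fusion` and the visit-by-visit flattening order are hypothetical choices, not the authors' code.

```python
import numpy as np

# CS feature names as listed in the text; VISITS are the four input visits.
CS_FEATURES = ["CDRSB", "GDTOTAL", "FAQ", "ADAS13", "CDG", "MMSE", "MOCA", "NPISCORE"]
VISITS = ["bl", "M06", "M12", "M18"]


def early_fusion(cs_series, am_vector):
    """Hypothetical early fusion of one patient's CS time series and AM indicators.

    cs_series: array-like of shape (len(VISITS), len(CS_FEATURES)), one row per visit.
    am_vector: 1-D array-like of binary drug indicators (assumed already aggregated).
    Returns one flat vector: all CS values visit by visit, followed by the AM values.
    """
    cs = np.asarray(cs_series, dtype=float)
    if cs.shape != (len(VISITS), len(CS_FEATURES)):
        raise ValueError("expected one row of CS features per visit")
    am = np.asarray(am_vector, dtype=float).ravel()
    # Row-major ravel keeps the visits in order: bl features, then M06, M12, M18.
    return np.concatenate([cs.ravel(), am])


# Usage with placeholder values (not ADNI data): 4 visits x 8 CS features + 3 drug flags.
x = early_fusion(np.zeros((4, 8)), [1, 0, 1])
print(x.shape)  # (35,)
```

Concatenating before training is what makes this "early" fusion: every classifier sees the CS-only, AM-only, or AM-CS vector as one flat input, so the same models can be compared across the three datasets.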

Computational Intelligence and Neuroscience

Data Preprocessing.
We prepared both the drug and cognitive score datasets collected from ADNI, as shown in Figure 2. The drug dataset goes through the following preprocessing steps.
Time filtering: we kept only the first four visits (bl, M06, M12, and M18), corresponding to baseline, month 6, month 12, and month 18. These visits are used with and without the drug data to explore the effect of drugs on predicting a patient's progression 2.5 years later (at month 48).
Code separation: the drug dataset includes a column containing multiple values separated by the ":" delimiter. We split each such row into multiple rows, one per value. The Cognex feature was removed because only 0.09% of the patients used it.
Data encoding: the dataset includes a column with the names of each patient's drugs; we split these names and created a new dataset with nine columns, one per drug name. Each column holds a binary value (0 or 1) indicating whether the patient takes that drug.
Aggregation: the resulting dataset contains multiple rows per patient. We convert them into one row per patient by grouping the rows on the RID column and taking the maximum value of each column.
The CS dataset was preprocessed as follows. We checked the pattern of missing values and found that the data are missing at random.
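The four drug-dataset steps described above (time filtering, code separation, data encoding, and aggregation) can be sketched in plain Python. This is a minimal illustration, not the authors' code: the record layout (keys `RID`, `VISCODE`, `DRUGS`), the visit-code spellings, and the drug labels in `DRUG_COLUMNS` are assumptions based on the names used in the text. Cognex is deliberately left out of `DRUG_COLUMNS`, so a Cognex entry contributes no indicator, mirroring its removal.

```python
from collections import defaultdict

# Hypothetical labels based on the drug names in the text (Cognex excluded).
DRUG_COLUMNS = ["Antidepressant", "Aricept", "Namenda", "Exelon", "Razadyne", "Other", "None"]
KEPT_VISITS = {"bl", "M06", "M12", "M18"}  # time filtering: first four visits


def preprocess_drugs(records):
    """Return {RID: {drug: 0/1}} from raw visit-level drug records.

    Each record is assumed to be a dict with keys "RID", "VISCODE", and "DRUGS",
    where "DRUGS" may hold several names separated by ":".
    """
    # 1. Time filtering: keep only the baseline, M06, M12, and M18 visits.
    kept = [r for r in records if r["VISCODE"] in KEPT_VISITS]

    # 2. Code separation: split "A:B" into one row per drug name.
    exploded = [(r["RID"], name.strip()) for r in kept for name in r["DRUGS"].split(":")]

    # 3. Data encoding: one binary indicator per drug column.
    encoded = [(rid, {col: int(name == col) for col in DRUG_COLUMNS}) for rid, name in exploded]

    # 4. Aggregation: group rows by RID and keep the maximum of each column,
    #    so a drug counts as taken if it appears at any of the kept visits.
    per_patient = defaultdict(lambda: dict.fromkeys(DRUG_COLUMNS, 0))
    for rid, row in encoded:
        agg = per_patient[rid]
        for col, value in row.items():
            agg[col] = max(agg[col], value)
    return dict(per_patient)


records = [
    {"RID": 1, "VISCODE": "bl", "DRUGS": "Aricept:Namenda"},
    {"RID": 1, "VISCODE": "M06", "DRUGS": "Exelon"},
    {"RID": 1, "VISCODE": "M24", "DRUGS": "Razadyne"},  # dropped by time filtering
]
print(preprocess_drugs(records)[1])
# {'Antidepressant': 0, 'Aricept': 1, 'Namenda': 1, 'Exelon': 1, 'Razadyne': 0, 'Other': 0, 'None': 0}
```

Taking the maximum during aggregation turns the per-visit rows into "ever taken during bl to M18" flags; in this sketch a patient whose only drug was Cognex ends up with all indicators at 0.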
To minimize the negative effect of missing data, we deleted any case with missing baseline scores and any feature with more than 30% missing values. We used forward filling to handle missing time-series values: a missing value was replaced by the value from the previous visit if the diagnosis did not change at that time step. The synthetic minority oversampling technique (SMOTE) proposed by Chawla et al. [23] was used to handle class imbalance and avoid biased models. SMOTE is applied to the training set only.
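Because SMOTE drives the balancing step, here is a minimal NumPy sketch of the standard SMOTE idea from Chawla et al. [23]: each synthetic sample lies on the line segment between a minority-class sample and one of its k nearest same-class neighbors. The function name `smote`, oversampling every class up to the majority count, and the default k = 5 are illustrative assumptions; in practice a maintained implementation such as imbalanced-learn's `SMOTE` would normally be used. As stated in the text, it is called on the training split only.

```python
import numpy as np


def smote(X, y, k=5, random_state=0):
    """Oversample every minority class up to the majority-class count (SMOTE sketch)."""
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        n_new = int(target - count)
        if n_new == 0:
            continue  # majority class: nothing to synthesize
        X_c = X[y == cls]
        if len(X_c) < 2:
            raise ValueError("SMOTE needs at least two samples per class")
        k_eff = min(k, len(X_c) - 1)
        # Pairwise Euclidean distances within the class; a sample is not its own neighbor.
        d = np.linalg.norm(X_c[:, None, :] - X_c[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)
        neighbors = np.argsort(d, axis=1)[:, :k_eff]
        # Pick a random base sample and one of its k nearest neighbors for each new point.
        base = rng.integers(0, len(X_c), size=n_new)
        nbr = neighbors[base, rng.integers(0, k_eff, size=n_new)]
        # Interpolate at a random position along the segment base -> neighbor.
        gap = rng.random((n_new, 1))
        X_parts.append(X_c[base] + gap * (X_c[nbr] - X_c[base]))
        y_parts.append(np.full(n_new, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Toy training split (placeholder values): class 0 is the minority.
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0],
                    [10.0, 10.0], [11.0, 11.0], [12.0, 12.0], [13.0, 13.0], [14.0, 14.0]])
y_train = np.array([0, 0, 0, 1, 1, 1, 1, 1])
X_bal, y_bal = smote(X_train, y_train, k=2)
print(np.bincount(y_bal))  # [5 5]
```

Resampling only the training split keeps the test set made of real patients, so the reported test scores are not inflated by synthetic points.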

Classifiers Optimization and Training.
The optimal hyperparameter values of the ML models were selected using grid search with stratified 10-fold CV. The following five classifiers were applied to each dataset.
Support vector machine (SVM) is a supervised learning approach that analyzes data for classification or regression. SVM is a discriminative algorithm defined by an optimal separating hyperplane, which is used to classify unknown instances; the training samples that lie closest to the hyperplane and determine its position are called support vectors. Selecting the optimal hyperplane is difficult because it must be robust to noise and generalize well to unseen data. SVM therefore searches for the hyperplane that maximizes the margin, that is, the minimum distance between the hyperplane and the training samples.
Decision tree (DT) (Sweety and Jiji [24]) is one of the most widely used machine learning classifiers. It is popular because it can be adapted to nearly all data types. It is a supervised learning technique that recursively partitions the training data into smaller subsets to extract classification patterns. The learned knowledge is represented as a tree, which is easy to understand. The tree is built top-down, beginning with the root node: the root node tests the most significant predictor, and the leaf nodes hold the final classification.
K-nearest neighbor (KNN) is a supervised algorithm. To classify a new instance, KNN locates the k training instances that are most similar to it in the feature space and assigns the majority class among them. The KNN classifier can handle a dependent variable with more than two categories (for example, low, medium, and high risk). Moreover, KNN performs better when the classes contain similar numbers of samples, and the choice of k also affects its performance.
Random forest (RF) (Alickovic et al. [25]) is a tree-based machine learning classifier that combines the decisions of multiple decision trees. Each tree in the forest chooses its splitting features using a bootstrap sample of the training set. RF offers several advantages: its classification is accurate, fast, and robust to noise. RF combines bagging with random feature selection: each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest.
Logistic regression (LR) (Mirzaei et al. [26]) is a supervised machine learning classifier that predicts the probability of a target variable. It is a multivariate technique that models the functional relationship between several predictor variables and a single output. The LR output variable is categorical because it can take only a limited number of classes. LR is a powerful ML algorithm because it produces class probabilities and can classify new data using both discrete and continuous features.
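The tuning step (grid search with stratified 10-fold CV over the five classifiers) can be sketched with scikit-learn, the library named in the Results section. The parameter grids below are hypothetical placeholders, since the searched values are not given in this section, and `tune_all` is an illustrative helper name rather than the authors' code.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical search spaces; the paper's actual grids are not listed here.
MODELS = {
    "SVM": (SVC(), {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}),
    "RF": (RandomForestClassifier(random_state=0),
           {"n_estimators": [100], "max_depth": [None, 5]}),
    "LR": (LogisticRegression(max_iter=1000), {"C": [0.1, 1, 10]}),
    "DT": (DecisionTreeClassifier(random_state=0), {"max_depth": [3, 5, None]}),
    "KNN": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}),
}


def tune_all(X_train, y_train, n_splits=10, scoring="accuracy"):
    """Grid-search each classifier with stratified k-fold CV on the training set.

    Returns {model name: (best parameters, mean CV score, refitted best estimator)}.
    """
    # Stratification keeps the sMCI/pMCI/AD proportions similar in every fold.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    results = {}
    for name, (estimator, grid) in MODELS.items():
        search = GridSearchCV(estimator, grid, cv=cv, scoring=scoring)
        # With the default refit=True, the best setting is retrained on all of X_train.
        search.fit(X_train, y_train)
        results[name] = (search.best_params_, search.best_score_, search.best_estimator_)
    return results
```

One design caveat: if SMOTE is applied to the whole training set before this search, synthetic samples built from one fold's points can fall into another fold's validation data and inflate the CV scores; resampling inside each fold (for example with imbalanced-learn's `Pipeline`) avoids that. Which ordering was used is not stated in this section.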

Evaluation Metrics.
Models are evaluated using four standard metrics: accuracy, precision, recall, and F1-score, where TP stands for true positive, TN for true negative, FP for false positive, and FN for false negative, as shown in equations (1)-(4).

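$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$
$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$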
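The four metrics can be computed directly from the confusion-matrix counts TP, TN, FP, and FN. The helper below is an illustrative sketch for a single positive class; how the scores are averaged over the three classes is not stated in this section, so any per-class averaging (for example macro-averaging) would be an extra assumption. Returning 0.0 when a denominator is zero is also a convention chosen here.

```python
def binary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Worked example with made-up counts (not results from the paper):
# accuracy = 85/100, precision = 40/45, recall = 40/50, F1 = 80/95.
print(binary_metrics(tp=40, tn=45, fp=5, fn=10))
```

The F1 value in the example also equals 2TP / (2TP + FP + FN), an equivalent form of the harmonic mean.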

Results
Python 3.7.3 distributed with Anaconda 4.7.7 (64-bit) was used to run the experiments, and the models were implemented with the scikit-learn 0.20.0 library (Pedregosa et al. [27]). The performance of the SVM, LR, KNN, DT, and RF models was recorded on three datasets: CS, AM, and AM-CS. Three experiments were conducted. Each experiment was repeated six times, and the average accuracy (A), precision (P), recall (R), and F1-score (F1) were recorded. In the first experiment, we evaluated the ability of the ML models to distinguish AD, pMCI, and sMCI patients based on either cognitive scores or Alzheimer's medication. Then, we asked: to what extent does fusing the CS and AM features affect the performance of the ML models? Table 2 presents the results of the first experiment. In the second experiment, we evaluated the effect of AM on detecting pMCI among MCI patients, asking: to what extent does AM-CS fusion contribute to the overall performance of the ML models within MCI patients? Table 3 presents the results of the second experiment. The third experiment is similar to the second, but it asks: to what extent does AM-CS fusion contribute to the overall performance of the ML models in separating MCI from AD patients? Table 4 presents the results of the third experiment. In the last two experiments, we evaluate the ML models on MCI patients, who have similar cognitive scores and Alzheimer's medication, and on sMCI versus AD patients, who have medically different cognitive scores and Alzheimer's medication.
Table 2 shows that the ML models achieved their best CV performance on the fused dataset. For example, the RF, DT, LR, SVM, and KNN models achieved accuracies of 92.74%, 84.96%, 88.4%, 82.89%, and 82.43%, respectively.
RF is an ensemble classifier, which may be the main reason for its high performance. On the test set, three of the five ML models achieved their highest performance using the fused dataset, with accuracies of 88.54%, 85.42%, and 74.22% for the RF, LR, and KNN models, respectively. This indicates the importance of the AM data for the AD progression detection task. Table 2 also shows that AM features alone are insufficient and that CS-based models can be improved by AM-CS fusion.

Experiment 2: sMCI vs. pMCI.
The results of this experiment, shown in Table 3, confirm the crucial role of AM-CS fusion in enhancing the ML models' performance. In terms of CV results, the RF, DT, LR, and SVM models performed best with the AM-CS dataset, with accuracies of 87.90%, 89.54%, 87.07%, and 87.10%, respectively. The test results of these four models also improved with the AM-CS dataset, with accuracies of 85.11%, 89.36%, 87.23%, and 86.57% for the RF, DT, LR, and SVM models, respectively. These models achieved test AUCs of 0.878, 0.815, 0.910, and 0.897 for RF, DT, LR, and SVM, respectively. The ML models based on the AM dataset alone achieved better performance than recent studies such as Ye et al. [4]. For example, the KNN and SVM models achieved testing accuracies of 75.69% and

Conclusion
This paper studies the role of dementia drugs in improving progression detection for AD patients based on multimodal time-series data. The algorithm uses each patient's time-series data from four time steps and predicts AD within 2.5 years after M18. The model is based on early fusion of the CS and AM time-series modalities. We optimized and tested five ML models using the real-world ADNI dataset. The results showed the crucial role of drug features in enhancing the performance of these ML models. In future work, we will extend this study by investigating the interpretability of these models.

Data Availability
The data were collected from ADNI (http://adni.loni.usc.edu/). The ADNI data had previously been gathered from 50 different study sites. Requests for data access should be made to http://adni.loni.usc.edu/data-samples/access-data/.

Ethical Approval
All procedures in the study involving human participants complied with the ethical standards of the institutional and/or national research committee and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards. The ethics committees/institutional review boards that approved the ADNI study are listed in the Supplementary file.

Consent
To participate in the study, each subject gave written informed consent at the time of enrollment for imaging and genetic sample collection and completed questionnaires approved by each participating site's Institutional Review Board (IRB).