A New Artificial Intelligence-Based Model for Amyotrophic Lateral Sclerosis Prediction



Introduction
Amyotrophic lateral sclerosis (ALS) is caused by the gradual loss of motor neurons in the brain or the spinal cord [1-4]. The development of unknown genes or pathophysiological processes is considered the main cause of the disease [3, 5, 6]. ALS is a complex disorder since it affects the whole body and causes paralysis. The disease is very rare and, unfortunately, is often diagnosed late. Physicians rely on various signs, such as behavioral deficits or cognitive dysfunctions, to identify the disease in its early stages [1, 7-10]. A late diagnosis can negatively affect the treatment plan [2, 5]. Efficient ways to predict and diagnose ALS are to look for related biomarkers and to perform robust clinical evaluations using biological data [1, 11, 12]. Physicians have found that genes play a substantial role, since they are believed to be a cause [3, 13-16]. In addition, the disease can arise from composite interrelations between various factors, such as genes, age, and sex [3, 16, 17].
ALS affects the upper motor neuron (UMN) and lower motor neuron (LMN) networks, which results in dysfunctions in the bulbar, thoracic, and cervical segments [1, 3, 6]. These dysfunctions cause an increasing weakness in the skeletal muscles, which are involved in limb movements [3, 18-20]. Bulbar onset, spinal onset, and cervical onset are phenotypes of ALS [3, 5]. Patients who are diagnosed with ALS suffer from the loss of speaking memory. These patients are unlikely to see a neurologist at the beginning of the diagnostic phase, since the disease is hard to predict early if no proper clinical evaluations are performed [1-4]. The clinical evaluations should spot signs of dysfunctions in the bulbar, thoracic, and cervical segments [2, 3].
ALS is a rare disease that occurs globally and is most common among people aged between 40 and 70 [21]. It is found that 5%-10% of positively diagnosed cases occurred due to mutations in the C9orf72, SOD1, and FUS genes, while the remaining cases were sporadic [21, 22]. The disease affects people of all ethnicities and races [21, 22]. Numerous signs and symptoms can be associated with ALS, such as muscle weakness, twitching, atrophy, and cramps [22]. In addition, difficulty in speaking and swallowing, hyperreflexia, emotional and cognitive changes, and respiratory symptoms are common signs of the disease [21]. Physicians use various clinical assessments to diagnose ALS: electromyography, nerve conduction analyses, magnetic resonance imaging (MRI), blood and urine tests, lumbar puncture (spinal tap), genetic testing, and muscle biopsy [22-24].
Risk factors for ALS include increasing age; genetics; environmental factors, such as exposure to pesticides, herbicides, lead, and mercury; tobacco smoking; physical trauma; and medical conditions, such as primary lateral sclerosis, autoimmune diseases, and frontotemporal dementia [23, 24]. Healthcare providers and physicians apply different methods to treat ALS: medications, such as riluzole, baclofen, and tizanidine, to manage symptoms; physical and occupational therapy; speech and swallowing therapy; breathing support; nutrition support; psychological and emotional support; and hospice and palliative care [21, 22, 24]. Currently, physicians and researchers face several challenges that can affect patients directly. These challenges are difficulties in timely diagnosis, rapid progression, the lack of a cure, inadequate treatment alternatives, the difficulty of care, multifaceted genetics, inadequate research funding, and limited access to clinical trials and rehabilitation services [24]. Table 1 provides general medical information about ALS.
Various articles have recently been published that use artificial intelligence (AI) technologies to address ALS prediction and the stratification of patients [2-4]. These articles have reported favorable outcomes; however, the use of these approaches in healthcare facilities is limited by several challenges, such as generalizing the methods to work with unseen subjects [1-5]. It is crucial to have a suitable method that can be applied to or deployed on any dataset. Magnetic resonance imaging (MRI) is one of the technologies used in clinical evaluations to diagnose ALS, as stated in Table 1. The biggest challenge in diagnosing ALS is the limited availability of datasets [2]. This research offers a new deep-learning approach that uses a developed UNET architecture to predict ALS.

Research Motivations and Contributions.
The motivations of this research are to align with the Saudi Vision 2030 and to provide a reliable diagnostic tool to predict ALS. This study aims to predict ALS using the UNET architecture on the utilized dataset. The contributions of this research are as follows:
(1) A new deep-learning approach based on the UNET model is developed to predict ALS and its development rate.
(2) The developed approach is integrated with data preprocessing tools to make the outcomes more robust.
(3) The implemented model is evaluated on a dataset using various characteristics.
This article is organized as follows: the related work is given in Section 2, and the suggested approach is described in Section 3. Section 4 provides an in-depth evaluation and discussion. Section 5 concludes the article.

Literature Review
Researchers have developed various solutions to either identify ALS or estimate its progression rate. In this section, several of these works are covered and discussed.
In [6], Pancotti et al. explored the advantages of using deep-learning methods to predict the ALS development rate. The authors performed the investigation on a dataset using three architectures: a feed-forward neural network (FFNN), a convolutional neural network (CNN), and a recurrent neural network (RNN). In the first architecture, the authors used three hidden layers with a dropout regularization layer. The hidden layers took their inputs from selected static and longitudinal features. In addition, a linear activation function was deployed, and the mean squared error (MSE) was used as the loss function. For the second architecture, the inputs were divided into two parts: a longitudinal part and a static residual part. Every input to the CNN had a size of 11 × 3. The last architecture was used for the longitudinal data only. The authors evaluated their models on a dataset from the Pooled Resource Open-Access ALS Clinical Trials (PRO-ACT) database using two parameters: the root mean squared deviation (RMSD) and the Pearson correlation coefficient (PCC). In contrast, the proposed approach applies the UNET architecture to the same dataset to predict the disease and evaluate its progression rate. Various performance quantities are utilized for evaluation purposes. The proposed method reached an acceptable level of accuracy, which was found to range from 82% to 87%.
Faghri et al. in [7] applied unsupervised, semisupervised, and supervised machine-learning models to ALS patients to find the number of ALS subtypes, in order to better understand this disease and study its heterogeneity. The authors obtained data from ALS patients in Italy between 1995 and 2015. In total, 2858 records were studied. Uniform manifold approximation and projection (UMAP) was the unsupervised model, and neural network UMAP was the semisupervised method, while an ensemble learning model based on LightGBM was the supervised model. This method identified subtypes and provided useful insight into the ALS substructure, while the approach proposed in this study is able to determine whether a patient has ALS or not. Moreover, it estimates the ALS development rate.
In [9], Huang et al. developed a model to predict ALS using a pattern analysis method. This model was implemented based on comorbidities and indicators from electronic medical records (EMRs). The authors analyzed these EMRs and then performed a comparison with healthy controls to find and select the associated comorbidities. The selected comorbidities were used to build a machine-learning model and to construct a new Weighted Jaccard Index (WJI) for a prediction system that uses two levels of comorbidities: single disease codes and clustered codes. The authors used the WJI in four different machine-learning methods to predict ALS. These four models achieved 83.7% accuracy. In addition, other performance indicators were evaluated. The authors used a dataset from the National Health Insurance Research Database (NHIRD) in Taiwan. These data were collected between 1996 and 2013. The authors defined two groups, positive and negative, to represent ALS patients and healthy people, respectively. The healthy people were used to select healthy control parameters to build the prediction approach. The negative records were collected by matching the gender and age attributes of the selected healthy controls. Various experiments were performed to select the associated comorbidities, and statistical analysis was applied to these comorbidities to find the best healthy controls for implementing the prediction model. The developed model categorized 162 ALS patients accurately. The approach proposed in this article uses the UNET architecture to extract features from the utilized dataset and compute a weight for every characteristic to predict ALS and measure its progression rate. The suggested method achieved a good accuracy between 82% and 87%, which is better than what was achieved by the method in [9].
Imamura et al. in [12] implemented an artificial intelligence-based approach to diagnose ALS using induced pluripotent stem cells (iPSCs). The authors used images of spinal motor neurons (SMNs) to develop the model and analyzed them using a convolutional neural network (CNN). This method reached an area under the curve (AUC) of 97%, which was the main performance indicator. The authors trained their model using a VGG-16 neural network. This approach achieved an average accuracy of nearly 84%, while the technique proposed in this article utilizes an artificial intelligence-based method to predict ALS using the UNET structure and reached an accuracy between 82% and 87%. This range is better than what was reached in [12].

Problem Statement.
Various artificial intelligence-based solutions to identify ALS or predict its development rate have been developed, such as those in [6, 7, 9, 12, 13]. These works either identified the disease or predicted its progression rate; none did both. In addition, some works provided no information about accuracy. For these reasons, this article proposes a model that identifies ALS and predicts its progression rate using an artificial intelligence solution based on the UNET structure.

Dataset.
The dataset utilized in this study was obtained from a GitHub repository [25]. This dataset contains over 1,500 records of ALS patients and healthy people. These records are split into more than 30 columns. The columns hold various information, such as the patients' IDs, gender, time of visits and diagnosis, and laboratory results. Many values were missing, so the dataset was cleaned and preprocessed before being utilized in the proposed approach. Several tables were constructed for the training, testing, and analysis stages. Table 2 provides details about the data used in this research and the number of records that were assigned for training, validation, and testing.

The Proposed Methodology
This part provides a full explanation of the proposed approach. The approach takes its inputs from the constructed tables and performs several preprocessing operations to prepare the data for use. Figure 1 presents a block diagram of the proposed model. The block diagram shows that the developed method consists of three main phases: the preprocessing phase, the neural network (i.e., UNET), and the evaluation of the implemented method through the performance quantities. The internal architecture of the developed UNET is shown in Figures 2 and 3. Initially, the input data are segmented as shown in Figure 2, and then the segmented data are processed to extract features and categorize results to produce outputs, as illustrated in Figure 3.
The Amyotrophic Lateral Sclerosis Functional Rating Scale-Revised (ALSFRS-R) is the most common technique worldwide to evaluate ALS. It measures 12 daily activities based on scores from 0 to 4, where 0 refers to the complete loss of the ability to perform an activity and 4 represents normal ability. The sum over all activities gives the score on the scale; since each of the 12 activities is scored from 0 to 4, the maximum value is 48. This scale is used in this study to predict the development rate of ALS. Various characteristics are used, such as the number of visits after being diagnosed with ALS, onset type, and onset age. Eighteen features, also referred to as characteristics, are utilized in this research to categorize data and determine the progression rate of ALS. These features include the twelve measured activities and other factors in the utilized dataset. Several statistical parameters are used in this study, such as the maximum, minimum, mean, variance, covariance, and standard deviation. These parameters were computed for healthy people to determine the healthy controls (parameters) in the proposed method and are compared with those of ALS patients in the developed method.
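As a small illustration of the scoring rule described above, the following Python sketch sums twelve item scores into an ALSFRS-R total. The function name `alsfrs_r_total` and the sample visit are hypothetical; they are not taken from the study's implementation, which was carried out in MATLAB.

```python
from typing import Sequence

NUM_ITEMS = 12       # ALSFRS-R rates twelve daily activities
MAX_ITEM_SCORE = 4   # each activity is scored from 0 (no ability) to 4 (normal)


def alsfrs_r_total(item_scores: Sequence[int]) -> int:
    """Return the ALSFRS-R total (0 to 48) for a single visit."""
    if len(item_scores) != NUM_ITEMS:
        raise ValueError(f"expected {NUM_ITEMS} item scores, got {len(item_scores)}")
    for score in item_scores:
        if not 0 <= score <= MAX_ITEM_SCORE:
            raise ValueError(f"item score {score} is outside 0..{MAX_ITEM_SCORE}")
    return sum(item_scores)


# Hypothetical visit record (illustrative values only, not from the dataset).
visit = [4, 3, 4, 2, 3, 3, 4, 2, 1, 4, 4, 4]
print(alsfrs_r_total(visit))
```

A patient with normal ability in every activity would score 12 × 4 = 48, which matches the maximum stated in the text.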
As shown in Figure 1, the data go through several operations in the preprocessing phase so that they can be used without any issues and to avoid overfitting, which could occur due to high dimensionality. The utilized data are divided into 70% for training, 10% for validation, and the rest for testing, to avoid bias. During the training session, the 5-fold cross-validation technique was deployed to speed up the process, confirm the model's robustness, and optimize the hyperparameters of the proposed approach. A total of 7,500 bootstraps were applied to compute the confidence intervals. Table 3 lists the hyperparameter settings applied in the proposed method.
After the required tables were constructed, the remaining clean and useful data were divided into two classes. One class was allocated for ALS patients and another for healthy people. These two classes underwent data incorporation to produce complete sets of medical records.
In addition, a statistical analysis was performed on the different disease codes, after counting them one by one, to support the development of the proposed method. A threshold representing the minimum number of ALS patients was set. During the segmentation stage, as shown in Figure 2, the proposed model measures a weight for each characteristic and feeds these weights to the categorization stage to predict ALS and compute its progression rate. Figure 4 illustrates the distribution of the ALSFRS-R score within a year for the training set. This distribution represents the slope versus the counted score. The slope is utilized to evaluate the progression rate of ALS. Features with high weights receive more attention and are inserted into a group called important characteristics. This group is used in the validation and testing sets to predict the disease. Table 4 shows a sample of the obtained weights for 5 records. The first column refers to the patients' IDs and the second column to the calculated weights.
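The data handling described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the study's code: the helper names, the fixed random seed, the round-robin fold assignment, and the use of a percentile bootstrap are all assumptions, since the text specifies only the 70/10/20 split, 5-fold cross-validation, and 7,500 bootstraps.

```python
import random
from statistics import mean
from typing import Iterator, List, Sequence, Tuple


def split_indices(n: int, train: float = 0.7, val: float = 0.1,
                  seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle record indices and split them 70/10/20 (train/val/test)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * train)
    n_val = round(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def k_fold(indices: Sequence[int], k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (fit, held_out) index lists for k-fold cross-validation."""
    folds = [list(indices[i::k]) for i in range(k)]  # round-robin assignment
    for i in range(k):
        fit = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield fit, folds[i]


def bootstrap_ci(values: Sequence[float], n_boot: int = 7500,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical per-record correctness flags (1 = correct, 0 = wrong).
flags = [1.0] * 85 + [0.0] * 15
train_idx, val_idx, test_idx = split_indices(len(flags))
print(len(train_idx), len(val_idx), len(test_idx))
print(bootstrap_ci(flags, n_boot=1000))
```

The fold generator is applied to the training indices only, so the validation and testing records stay untouched during hyperparameter tuning.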
Various performance indicators were utilized to evaluate the developed approach: accuracy, precision, sensitivity, F-score, cross-entropy loss (CEL), Dice, and Jaccard. Four metrics were required to compute these indicators: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). The following equations show how the performance indicators are determined in the proposed model.

$$\mathrm{CEL} = -\sum_{i=1}^{N} q \log(P), \tag{1}$$

where N refers to the number of classes being evaluated (in this study, N = 2), q represents a binary indicator, which is computed in the proposed system, and P is the probability. This quantity shows how far the proposed model is from the desired results. Hence, the lower the value, the better the results.

Precision (PRC) is computed as displayed in the following equation:

$$\mathrm{PRC} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}. \tag{2}$$

Sensitivity (SEN) is evaluated as shown in the following equation:

$$\mathrm{SEN} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}. \tag{3}$$

Accuracy (ACR) is computed using the following equation:

$$\mathrm{ACR} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}}. \tag{4}$$

F-score is determined via the following equation:

$$\text{F-score} = \frac{2 \times \mathrm{PRC} \times \mathrm{SEN}}{\mathrm{PRC} + \mathrm{SEN}}. \tag{5}$$

Dice (DIC) is calculated as shown in the following equation:

$$\mathrm{DIC} = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}. \tag{6}$$

The Jaccard Index (JI) determines the overlap between the predicted label and the ground-truth label. This quantity is computed as illustrated in the following equation:

$$\mathrm{JI} = \frac{|\mathrm{TL} \cap \mathrm{PL}|}{|\mathrm{TL} \cup \mathrm{PL}|}, \tag{7}$$

where TL refers to the true label and PL represents the predicted label. The numerator refers to the intersecting objects, while the denominator denotes the size of the union of the two groups.
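The indicators defined above can be computed from the four confusion counts as in the following sketch. It uses the standard textbook definitions; the function names and the example counts are hypothetical, and the Jaccard index is expressed for the positive (ALS) class as TP / (TP + FP + FN), which is an assumption about how the set-based definition maps onto the counts.

```python
import math
from typing import Dict, Sequence


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute the performance indicators from the confusion counts."""
    if min(tp, tn, fp, fn) < 0 or tp == 0:
        raise ValueError("counts must be non-negative and tp must be positive")
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "f_score": 2 * precision * sensitivity / (precision + sensitivity),
        "dice": 2 * tp / (2 * tp + fp + fn),
        # Overlap of true and predicted positives divided by their union.
        "jaccard": tp / (tp + fp + fn),
    }


def cross_entropy(q: Sequence[float], p: Sequence[float]) -> float:
    """Cross-entropy loss for one sample: -sum_i q_i * log(p_i).

    `q` is the one-hot (binary indicator) vector over the classes and `p`
    holds the predicted class probabilities; p_i must be positive wherever
    q_i is non-zero.
    """
    if len(q) != len(p):
        raise ValueError("q and p must have the same length")
    return -sum(qi * math.log(pi) for qi, pi in zip(q, p) if qi > 0)


# Hypothetical confusion counts for illustration only.
for name, value in classification_metrics(tp=80, tn=90, fp=10, fn=20).items():
    print(f"{name}: {value:.4f}")
print(f"CEL: {cross_entropy([1, 0], [0.9, 0.1]):.4f}")
```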

Results and Discussion
This section provides an analysis of the prediction of ALS and its development rate through several experiments. In addition, an evaluation of any signs that affect the proposed approach is provided. To confirm the association and correlation between the data and their actual classes, the data were distributed into three sets as shown in Table 2. The developed deep-learning-based approach was examined and evaluated using the MATLAB platform, which was installed on a machine running Windows 11 Pro (64-bit) with an 8th-generation Intel Core i7 processor at 2 GHz and 16 GB of RAM.

Predicting Results
Due to the difficulty of finding ALS datasets, we trust the utilized data and work with them with confidence. A hundred healthy people from the training and testing sets were selected as the control group in this study. All related data for the control group were identified and counted as well. The implemented method was trained using 1231 records, as listed in Table 2. A comparison of similarities between the two constructed groups was conducted using statistical analysis. This procedure shortened the inputs of the two groups by deleting unwanted values. The estimated average values of all considered performance indicators are shown in Table 5. The model achieved 85.21% accuracy and an 86.05% F-score, while precision and sensitivity were 84.86% and 84.43%, respectively. These outcomes were obtained using 6500 iterations; increasing the number of iterations enhanced the model's accuracy by nearly 6.8%. However, the required processing time increased significantly, which is considered a side effect. The developed approach calculated the individual accuracy for the three main onsets, namely, spinal, bulbar, and limb. These results are illustrated in Figure 5. The implemented approach identified the bulbar type more often than the other two types due to its data availability in the utilized dataset.
During the training stage, the running time was nearly 27 minutes, which was significantly high because the proposed model went through three main stages: preprocessing, segmentation, and identification. The last two stages consumed most of the running time. To minimize the execution time, the patch size of each data segment was decreased by 30%-50%, which noticeably improved the running time: the execution time went down from 27 minutes to 18 minutes. Figure 6 reveals the maximum attained results of all the considered performance indicators.
The running time (in seconds) needed by the developed approach to categorize the input data, the number of variables within the method, and the number of floating-point operations per second (FLOPS) were crucial; thus, they were measured and evaluated. These assessments express the computational complexity of the presented model. Both the FLOPS and the number of parameters were in millions. Table 6 shows these results. The approach produced a massive number of FLOPS and variables due to the internal structure of the model and the number of used characteristics. Nevertheless, the final outcomes were favorable and promising. The running time refers to the time achieved after shortening the patch size by nearly 45%. Tables 7 and 8 present the classification outcomes on the testing set and a comparison between several developed models [6, 7, 9, 12, 13, 15, 18] and the proposed approach, respectively. The identification results are ALS and healthy. The comparative assessment involves the deployed tool, accuracy, F-score, and Dice. Table 8 shows that the algorithm presented in this study produces promising results and surpasses some methods implemented in the literature. The results in Table 7 reveal that the suggested method identified nearly 84% of the data correctly.

Estimation of the Progression Rate.
The developed approach estimates the development rate of ALS if a patient is predicted to have the disease. This is performed by plotting the slope of the ALSFRS-R score for the predicted ALS patients only. Figure 8 illustrates the slope diagram. It shows that the survival probability decreases as time goes on. In addition, by the end of the first year, the survival rate becomes 30%, and death becomes certain by the end of the following years.
The effect of decreasing the number of utilized features was also explored in this research. The number of characteristics was reduced to seven features only. These features were selected based on the achieved values of the ALSFRS-R score: Q1 speech, Q3 swallowing, Q4 handwriting, Q6 dressing, Q7 turning in bed, Q8 walking, and Q9 climbing. We noticed that the considered performance indicators went down dramatically, by more than 40%. This shows that the number of utilized characteristics plays a considerable role.
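One common way to obtain a slope of the ALSFRS-R score over time is an ordinary least-squares fit of the total score against the visit time. The sketch below illustrates this idea only; the text does not state how the slope is fitted in the proposed model, so the least-squares choice, the function name `alsfrs_slope`, the time unit (months), and the sample visits are all assumptions.

```python
from typing import Sequence


def alsfrs_slope(months: Sequence[float], totals: Sequence[float]) -> float:
    """Least-squares slope of ALSFRS-R total versus time (points per month).

    A more negative slope indicates a faster functional decline.
    """
    n = len(months)
    if n < 2 or n != len(totals):
        raise ValueError("need at least two visits with matching lengths")
    mean_t = sum(months) / n
    mean_s = sum(totals) / n
    sxx = sum((t - mean_t) ** 2 for t in months)
    if sxx == 0:
        raise ValueError("all visits share the same time stamp")
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(months, totals))
    return sxy / sxx


# Hypothetical patient: five visits over one year (illustrative values only).
visit_months = [0, 3, 6, 9, 12]
visit_totals = [44, 41, 38, 35, 32]
print(alsfrs_slope(visit_months, visit_totals))
```

In this illustrative record the total drops by three points every three months, so the fitted slope is one point lost per month.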

Discussion.
In this research, an artificial intelligence-based solution to predict ALS and its development rate is presented using one dataset from a GitHub repository. It should be mentioned that this dataset does not represent a typical distribution. Nevertheless, it supported this study and provided favorable information. The presented algorithm generated promising outcomes, since its accuracy lies in an acceptable range from 82% to 87%. This range is better than what was achieved in [9, 18]. In addition, the utilized features contributed to the prediction system and the estimation of the progression rate.
Explaining and interpreting AI structures is difficult. However, these methods can be deployed and used to support and assist physicians in their diagnosis to provide good treatment plans. Identifying ALS and its progression rate were the main aims of this study. Various deep-learning technologies were applied; however, their results were unsatisfactory and were therefore discarded. We believe that this occurred due to the limited data availability and the way the methods were deployed and interacted with the used features. To prove the efficacy of the presented approach and its suitability, several statistical tools and performance indicators were applied and evaluated. Furthermore, the prediction algorithm was analyzed using different configurations. To improve the findings, the Adam optimizer was adopted, and it played a key role in enhancing accuracy by 4.78% and reducing the execution time by less than 7%. Among the implemented works, the authors in [13] achieved the highest accuracy, while the proposed model attained moderate outputs; however, no specific solution could provide an absolute ALS diagnosis. The method presented in this study can be deployed to identify the disease early, and it is cost-effective. However, the execution time is considered high and can be seen as a disadvantage.

Conclusion
In this article, an artificial intelligence-based algorithm to predict ALS and its development rate is presented. The system's accuracy increases if the quality of the utilized data is good enough to let the model extract features without any issues. The quality of the data can be improved by performing required operations, such as cleaning and removing all entries associated with missing values. Increasing the number of features used in the prediction algorithm enhances its findings if these characteristics are trained well. Even though the applied dataset was small, the outputs of the prediction model are higher than 80%, which is acceptable. These outputs were compared with other AI solutions and showed promising results. The presented approach is very cost-effective; however, its running time is a drawback, which can be minimized by reducing the number of utilized layers and their associated parameters in the segmentation and learning phases. Moreover, the computed false positive rate increases if the utilized dataset contains symptoms that are similar to those of ALS. The proposed algorithm shows that the detection of the disease in its early stage can be realized. This detection can support a good treatment plan and a better quality of life for diagnosed patients. In addition, the implemented approach can be applied by healthcare providers to support and aid physicians in diagnosing ALS properly.
Future work is projected to enhance the identification outputs and minimize the running time of the whole process. In addition, decreasing the complexity of the prediction algorithm is another aim of the projected future work.

Figure 4: The slope distribution of the ALSFRS-R score.

Figure 7 demonstrates two sample graphs of outcomes, which are a chart of cross-entropy and the receiver operating characteristic (ROC) curve for all three sets, namely, training, validation, and testing.

Figure 7: (a) The chart of the cross-entropy result. (b) The achieved ROC curve.

Table 2: Details of the utilized data.

Table 4: Sample of the calculated weights.

Table 5: The results of the performance indicators.

Table 6: The assessment results of the computation complexity.

Table 8: The conducted assessment results.