Predicting Parkinson's Disease Progression: Evaluation of Ensemble Methods in Machine Learning

Parkinson's disease (PD) is a complex neurodegenerative disease. Accurate diagnosis of this disease in the early stages is crucial for its initial treatment. This paper aims to present a comparative study on the methods developed by machine learning techniques in PD diagnosis. We rely on clustering and prediction learning approaches to perform the comparative study. Specifically, we use different clustering techniques for PD data clustering and support vector regression ensembles to predict Motor-UPDRS and Total-UPDRS. The results are then compared with the other prediction learning approaches, multiple linear regression, neurofuzzy, and support vector regression techniques. The comparative study is performed on a real-world PD dataset. The prediction results of data analysis on a PD real-world dataset revealed that expectation-maximization with the aid of SVR ensembles can provide better prediction accuracy in relation to decision trees, deep belief network, neurofuzzy, and support vector regression combined with other clustering techniques in the prediction of Motor-UPDRS and Total-UPDRS.


Introduction
Parkinson's disease (PD) is the second most common and complex neurodegenerative disorder worldwide [1][2][3][4]. Both polygenic and environmental factors can cause PD [5]. In about 1%-2% of PD cases (mainly familial), the disease develops through a single gene [5]. The main symptoms of PD are bradykinesia (motor features), muscle stiffness, and tremor, along with other symptoms such as sleep disorders (nonmotor features), cardiac arrhythmia, and constipation. Alteration of voice and speech is one of the features of PD. The Unified Parkinson's Disease Rating Scale (UPDRS), which shows symptoms' presence and severity, is mainly used in tracking PD symptom progression [6][7][8]. UPDRS is considered a well-validated test and the most widely used clinical rating scale for patients with PD [6][9][10][11]. UPDRS includes 4 sections: UPDRS I, UPDRS II, UPDRS III, and UPDRS IV are used to evaluate, respectively, psychiatric symptoms in PD, activities of daily living, motor symptoms recognized by physical exam, and complications of treatment [10]. In many studies, this scale is considered based on Total-UPDRS with the range of 0-176 (176 representing total disability and 0 representing healthy) and Motor-UPDRS, which indicates the UPDRS motor section with the range of 0-108 (108 indicating severe motor impairment and 0 indicating healthy state) [6].
Machine learning (ML) approaches have demonstrated the capability of handling large volumes of medical datasets and have provided insightful directions [12]. The use of ML-based tools could enhance the safety of individuals [13][14][15], enhance the quality of medical care [16][17][18], minimize the costs of medical care [19][20][21], and support physicians' efforts by processing big data of patients' records. ML approaches have been broadly utilized for disorders' classification and prediction [22][23][24][25][26][27][28][29][30]. Gadekallu et al. [31] investigated the use of machine learning techniques for the prediction of diabetic retinopathy. The authors used a PCA-based Deep Neural Network (DNN) model with the Grey Wolf Optimization (GWO) algorithm for the classification of the extracted features of the diabetic retinopathy dataset.
The method was evaluated through the accuracy, recall, sensitivity, and specificity evaluation metrics and compared with the support vector machine (SVM), naïve Bayes classifier, decision tree (DT), and XGBoost. Overall, their method achieved higher accuracy compared with the SVM, DT, and XGBoost techniques. Bhattacharya et al. [32] developed a method for the classification of an imbalanced multimodal stroke dataset. The authors implemented the Antlion optimization algorithm on the DNN model to select optimal hyperparameters with minimal time consumption. A positive aspect of their method was that it consumed only 38.13% of the training time on the stroke dataset. The artificial neural network is among the most significant approaches for disease classification and prediction [33][34][35][36][37][38]. According to Berner [39], clinical decision support systems (CDSSs) are special tools that are developed to aid medical specialists in their decision-making for particular disorders or diseases. ML approaches can be utilized for designing effective CDSSs [36] to aid medical specialists in reaching accurate and timely predictions. CDSSs designed using machine learning approaches have played a significant part in evaluating the existence or the severity of the disease.
In machine learning methods, unsupervised approaches are used to lower the dimensionality of data, which allows the detection of the disease. Besides, these approaches allow manipulating the data, removing the noise from data, calculating the similarity, and segmenting the data [40]. On the other hand, supervised learning approaches are used to enable the final classification, prediction, and diagnosis of the disease [41]. While ML has proven its benefits, the effective deployment of ML needs a great effort from human specialists, considering that no particular approach can present acceptable results in all possible scenarios [12]. Although clinical data are available to researchers to explore, the lack of experience in handling big sources of data might restrict the optimum utilization of these sources. Besides, even though several approaches have been used in disease prediction using various real-world medical datasets, the choice of the deployed approach should consider enhancing the accuracy of the prediction and minimizing the time of computation [42]. The goal of this paper is to present a comparison of machine learning approaches for remote tracking of Parkinson's disease progression. The comparative study is based on clustering and prediction learning approaches. To further improve the accuracy of UPDRS prediction, this study uses ensemble learning in the final stage of the proposed method. Ensemble learning approaches have proven to be effective in prediction tasks [25]. Few studies have incorporated ensemble learning approaches for the development of disease diagnosis systems [43][44][45]. Further investigations are needed into the effectiveness of these approaches in UPDRS prediction. Accordingly, we use ensembles of support vector regression and different clustering techniques for PD data clustering.
The results are then compared with other prediction learning approaches: deep belief network (DBN), support vector regression, multiple linear regression, and neurofuzzy techniques. The rest of this paper is organized as follows. We introduce a summary of related works on Parkinson's disease in Section 2. In this section, the results of previous works are discussed. In Section 3, we introduce a new hybrid method for PD diagnosis. In Section 4, we present the method evaluation through a PD dataset. Finally, we present the conclusion and recommendations for future study in Section 5. For convenience, a list of the abbreviations used in this research is presented in Table 1.

Related Work
Previous literature has presented several approaches that allow PD detection, classification, and severity prediction. In Table 2, we present these studies along with the adopted approaches in each study. In the following, we summarize recent research in this field.
Prashanth et al. [68] concentrated on utilizing nonmotor signals in the early diagnosis of PD by deploying NB, SVM, Boosted Trees, and RF. The findings indicated that SVM presented the highest accuracy value of 96.40%. Abiyev and Abizade [69] presented a new methodology for PD diagnosis using FNS and NN. The outcomes of the study presented efficient performance of the FNS compared to other approaches. Singh et al. [70] presented a new approach for PD detection using SVM and reported an overall accuracy of 100%. Çimen and Bolat [60] focused on the vocal signals for PD diagnosis using ANN, MLP, and GRNN. The outcomes indicated that the best performance was achieved using GRNN. Shetty and Rao [71] focused on gait signals in the diagnosis of PD and other neurological disorders using SVM. The presented approach achieved an accuracy of 83.33%. Nilashi, Ibrahim, and Ahani [61] presented a hybrid methodology by using EM, PCA, ANFIS, and SVR techniques. The findings of the study indicated that the presented methodology can detect the severity of the disease accurately. Ozkan [72] concentrated on vocal indicators by using a hybrid methodology based on PCA with K-NN. The result of the study indicated the robustness of the presented approach with an accuracy of 99.1%. Avci and Dogantekin [62] presented a new methodology for PD detection based on GA, wavelet kernel, and ELM. The findings of the study indicated that the hybrid approach presented better prediction accuracy than other state-of-the-art related approaches.
Three ML methods, namely SVM, NB, and RF, were deployed by Rovini et al. [74] and presented an encouraging outcome with a specificity value of 0.967. The vocal signals of PD were assessed by Pahuja and Nagabhushan [73] by using ANN, K-NN, and SVM. The outcomes of the evaluation indicated that ANN presented the most accurate performance with an overall accuracy of 95.89%. In a study by Nilashi et al. [63], SOM, NIPALS, and ISVR approaches were used in PD prediction and presented a robust performance in forecasting UPDRS while minimizing the time of prediction. Parisi et al. [64] presented a hybrid approach based on SVM (MLP-LSVM). The deployed method presented the highest accuracy in comparison with other techniques for PD diagnosis. Prince and De Vos [65] deployed several approaches for PD diagnosis, focusing on severity detection. The deployed approaches entail LR, RF, DNN, and CNN. Among the used approaches, CNN presented a better prediction accuracy of 62.1%. Another study that concentrated on PD's severity prediction was presented by Zou and Huang [66]. In the deployed method, rTL, LASSO, and ebTL approaches were used, among which ebTL presented the best prediction accuracy.
Grover et al. [67] adopted the DNN for severity prediction among PD patients. The deployed method presented encouraging results with an overall prediction accuracy of 81.66% based on the Motor-UPDRS score. Several supervised approaches were presented by Khoury et al. [59] for PD diagnosis, focusing on gait signals, such as K-NN, NB, SVM, RF, and CART. These approaches were combined with other unsupervised approaches to meet the goal of the study. Among the deployed methods, K-NN, RF, and SVM presented the highest accuracy result. Nilashi et al. [46] presented a new approach based on DL and clustering methods. Particularly, they used DBN and SVR for UPDRS prediction. SOM was used as a clustering method to enhance the prediction accuracy. The method was assessed based on a real-world dataset, and the proposed approach of clustering, DBN, and SVR presented prediction accuracy that outperformed other learning approaches. Ghaderyan and Fathi [47] concentrated on analyzing gait signals for PD detection. The basic method is based on separating various parts of the signal, choosing the most related parts that are utilized to measure interlimb divergence in singular value space. The proposed method presented an average accuracy of 95.59% and 97.22%. Nilashi et al. [75] presented a hybrid approach that utilized the clustering method, SVD, and ANFIS. They indicated that the presented methodology outperformed other state-of-the-art approaches in terms of detection accuracy and minimizing the time of computation. Ashour, El-Attar, Dey, Abd El-Kader, and Abd El-Naby (2020) utilized the LSTM, focusing on the FOG signals collected from several sensors worn on parts of the body. Besides, SVM and ANN were used in the classification process. Although they achieved an overall accuracy of 83%, one shortcoming related to the size of the data can impact the generalizability of the outcomes.
Paragliola and Coronato [48] investigated the performance of hybrid NN, which entails CNN for reducing the dimensionality and LSTM for the diagnosis. The study concentrated on walking patterns by utilizing gait data and presented an accuracy of 95%. Still, the applied methodology focused on binary classification issues, in which the severity of the disease was not explored.
In a study by Mohammed, He, and Lin [53], a CNN model for discriminating PD patients from healthy controls, based on SPECT modalities, was proposed. The model was assessed based on 10-fold cross-validation and presented an accuracy of 99.34%. In a study by Balaji et al. [49], a DL approach based on LSTM was developed for severity classification of the PD and presented encouraging results with an overall accuracy value of 98.6%. De Souza et al. [50] presented a Fuzzy OPF for PD diagnosis. RBM was used to extract the features and outperformed other baseline models. Senturk [51] developed an ML method for PD diagnosis. In this study, SVM was used for the classification task and presented an overall accuracy of 93.84%. De Vos et al. [52] used a novel method to discriminate between PSP and PD by utilizing two approaches of LR and RF. RF presented higher accuracy results compared to LR.
In a study by Goyal et al. [54], a hybrid method for feature extraction that integrates RSSD and T-F algorithms was presented. The deep learning approach, particularly CNN, was utilized in PD diagnosis based on speech impairments. The hybrid method presented accurate outcomes in classifying PD patients (99.37%). Based on handwriting assessment, a study by Xu and Pan [55] adopted an ensemble learning model based on RF in the PD diagnosis. Dimensionality reduction was performed using PCA. The presented approach has an overall accuracy of 89.4%, which outperformed LR and SVM based on the adopted evaluation method. Another study, which concentrated on handwriting indicators, was presented by Ribeiro et al. [56]. The study adopted a PD diagnosis approach based on RNN by utilizing the bag of samplings to measure several compact representations and presented acceptable performance compared to other methods in the literature. Another study that concentrated on handwriting was proposed by Parziale et al. [57] based on CGP.
This approach presented an explicit decision measure that was used in the detection of PD. The authors also compared various AI approaches for PD detection and indicated that CGP outperformed these approaches in terms of accuracy. Tsuda et al. [58] adopted an approach based on NNs to distinguish between PD and MSA. The adopted methodology presented an improved recognition of the patterns compared to other approaches.

Methodology
In the prediction of diseases, ML techniques have proven to be effective [25][26][27][42][61][77][78][79]. This study uses both unsupervised and supervised learning techniques to diagnose PD through UPDRS prediction. Several approaches, covering clustering, dimensionality reduction, and prediction learning, are used to create the PD diagnosis method. Figure 1 depicts the proposed method with its main stages. Data preprocessing, dimensionality reduction using PCA, clustering using ensembles of EM, and prediction using ensembles of SVR are all stages of the method that are utilized to predict UPDRS through a set of real-world PD data.
Step 1 (data preprocessing). As suggested by previous studies, the data is preprocessed [80,81] to have a more accurate prediction of UPDRS. The goal of data preprocessing in this study is to handle the dataset's null values. In general, we included the preprocessing stage in the proposed method because it is typically completed during the first step of data analysis. The data is then deployed in the data analysis stages, such as clustering and prediction. The datasets are created with null values for method evaluation. Before clustering and classification tasks, these values must be imputed. In this study, SVD is used for missing value prediction.
Step 2 (data clustering). We use an unsupervised learning technique in this stage for clustering the PD data. The objective of this step was to increase patient record readability by grouping patients into different clusters. We used ensembles of EM to have a better cluster analysis of the data.
Step 3 (dimensionality reduction). To remove the noise of data, the PCA method was used in this phase to lower the dimensionality of the data [82]. Multicollinearity has a considerable impact on the accuracy of predictors and is a major issue in the field of disease diagnosis [46]. The accuracy of SVR predictors is affected by the multicollinearity of the data. We therefore use PCA, the most popular technique for noise removal, to address the multicollinearity problem.
Step 4 (UPDRS prediction). This stage was performed to predict UPDRS according to the input features. In contrast to the previous prediction methods for PD diagnosis, we used ensembles of SVR to perform this task. SVR is trained to build prediction models with training datasets. In various clinical settings, it is a common practice to seek the advice of several doctors who are experts in the field. The ultimate decision for a specific therapy is thus normally made through consultation and a combination of opinions of a committee of specialists. Ensemble learning systems serve a similar function in the machine learning context [83]. In general, ensemble learning systems can be utilized effectively in classification and regression problems and provide more reliable predictions than any individual learning model [84]. In fact, several weak hypotheses are combined in ensemble learning systems to form a stronger one. Note that the success and effectiveness of ensemble learning approaches are heavily dependent on the diversity of the individual predictors that construct the ensemble. The total error can be reduced by combining the output of different prediction models through an algebraic expression (e.g., mean value of the predictions), as the various errors of the prediction models are averaged out.
Journal of Healthcare Engineering

SVR Ensembles.
Cortes and Vapnik [85] developed SVM as a machine learning technique for forecasting problems with the potential to be extensively used as a benchmark. Support vector classification (SVC) and SVR are the two main branches of SVM. SVR performs the prediction of a new sample by training on data with target values. This is done by finding a function Φ(x) to map the data to a flat space. SVR can effectively solve complex prediction problems through linear and nonlinear regression. The kernel functions are used to transform the data into a high-dimensional feature space. Radial basis functions (RBF) and polynomial functions are the most widely utilized kernel functions in SVR.
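Let us have a training dataset of length $N$: $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, where $y_k \in \mathbb{R}$ and $x_k \in \mathbb{R}^n$, $k = 1, 2, \ldots, N$. To model a single output ($y$) in the original SVR, a linear model formulation is presented as
$$\hat{y} = \langle w, x \rangle + b,$$
where $\hat{y}$ indicates the predicted output, $b$ is a bias term, $w$ is a weight vector, and $\langle \cdot, \cdot \rangle$ indicates the vector inner product. To solve nonlinear problems, SVR can use a nonlinear mapping $\Phi(x)$, and we have
$$\hat{y} = \langle w, \Phi(x) \rangle + b.$$
One of the most widely employed nonlinear SVR kernels is the radial basis function. In SVR, to have good generalization performance, the weight vector $w$ is required to be as flat as possible. Thus, we need to minimize the norm ($\|\cdot\|$) of $w$ while fitting every data point in the dataset to within $\varepsilon$:
$$\min_{w, b} \; \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad |y_i - \langle w, \Phi(x_i) \rangle - b| \le \varepsilon, \quad i = 1, \ldots, N.$$
To guarantee feasible constraints, slack variables ($\xi_i$, $\xi_i^*$) are introduced, as well as the $\varepsilon$-insensitive loss function
$$L(\varepsilon, y, \hat{y}) = \max(0, |y - \hat{y}| - \varepsilon).$$
Thus, the following cost function needs to be minimized by estimating the parameters $w$ and $b$:
$$\min_{w, b, \xi, \xi^*} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} (\xi_i + \xi_i^*) \quad \text{subject to} \quad \begin{cases} y_i - \langle w, \Phi(x_i) \rangle - b \le \varepsilon + \xi_i, \\ \langle w, \Phi(x_i) \rangle + b - y_i \le \varepsilon + \xi_i^*, \\ \xi_i, \xi_i^* \ge 0, \end{cases}$$
where $C$ is the regularization parameter. The above optimization problem can be solved in its dual by handling the constraints with the Lagrange multipliers $\eta$, $\eta^*$, $\beta$, and $\beta^*$. Thus, the Lagrangian ($L$) is presented as
$$L = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} (\xi_i + \xi_i^*) - \sum_{i=1}^{N} (\eta_i \xi_i + \eta_i^* \xi_i^*) - \sum_{i=1}^{N} \beta_i \left(\varepsilon + \xi_i - y_i + \langle w, \Phi(x_i) \rangle + b\right) - \sum_{i=1}^{N} \beta_i^* \left(\varepsilon + \xi_i^* + y_i - \langle w, \Phi(x_i) \rangle - b\right).$$
Applying the Karush-Kuhn-Tucker conditions to find the minimum yields the following quadratic programming problem:
$$\max_{\beta, \beta^*} \; -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\beta_i - \beta_i^*)(\beta_j - \beta_j^*) K_{ij} - \varepsilon \sum_{i=1}^{N} (\beta_i + \beta_i^*) + \sum_{i=1}^{N} y_i (\beta_i - \beta_i^*) \quad \text{subject to} \quad \sum_{i=1}^{N} (\beta_i - \beta_i^*) = 0, \quad 0 \le \beta_i, \beta_i^* \le C.$$
Here, $K_{ij} = \langle \Phi(x_i), \Phi(x_j) \rangle$ indicates the kernel function. Finally, in SVR, the model form in the dual space can be written as
$$\hat{y} = \sum_{i=1}^{N} (\beta_i - \beta_i^*) K(x_i, x) + b.$$

PCA. Pearson [86] introduced PCA as a statistical technique to simplify the complexity of high-dimensional data. This can be accomplished by the orthogonal projection of the correlated variables into uncorrelated variables. The uncorrelated variables are then known as principal components (PCs). We tested the Pearson Correlation Coefficient (PCC) for the interdependencies in the data. If $y$ is the output and $x_i$ is the $i$th observation in the dataset, the PCC is defined as
$$\mathrm{PCC}(x_i, y) = \frac{\mathrm{cov}(x_i, y)}{\sqrt{\mathrm{var}(x_i)\,\mathrm{var}(y)}}.$$
Here, cov is the covariance and var indicates the variance of a variable.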
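As an illustration of the PCC test described above, the following minimal sketch computes the coefficient from the covariance and the two variances. The function name `pearson_cc` and the toy values are hypothetical and are not taken from the PD dataset.

```python
import math

def pearson_cc(x, y):
    """Pearson correlation: cov(x, y) / sqrt(var(x) * var(y))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and variances (the n - 1 factors cancel in the ratio).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy values: one input feature and a UPDRS-like score.
feature = [0.2, 0.4, 0.6, 0.8]
score = [10.0, 20.0, 30.0, 40.0]
r = pearson_cc(feature, score)  # perfectly linear relation, so r is 1 up to rounding
```

Coefficients close to 1 or -1 between input features point to the kind of interdependence that motivates decorrelating the inputs with PCA before regression.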

EM Clustering. EM is a probabilistic and iterative algorithm that alternates between the expectation (E) and maximization (M) phases in a sequential fashion.
In the E-phase, EM calculates the expected value of the likelihood function. In the M-phase, EM obtains an estimate of the parameters that maximizes the likelihood function. The parameters obtained in the M-phase are then used in the subsequent E-phase. This process is repeated until convergence occurs (i.e., it converges to the final values of the parameters). In the E-phase, to calculate the probability that each input in the dataset belongs to each mixture model or cluster, EM uses the following formula:
$$P(m \mid X_n)^t = \frac{w_m^t \, h(X_n; \mu_m^t, \sigma_m^t)}{\sum_{k=1}^{M} w_k^t \, h(X_n; \mu_k^t, \sigma_k^t)},$$
where $h$ indicates the probability density function of input $X_n$ in the dataset for the cluster $m$ with standard deviation (SD) $\sigma_m^t$ and mean $\mu_m^t$ at iteration $t$. Here, a normal univariate, bivariate, or multivariate probability density function may be employed, according to the dimensionality of the data. In addition, under the constraint $\sum_{k=1}^{M} w_k^t = 1$, the allocation of data to mixture models is influenced by a weighting factor $w^t$. Accordingly, in the M-phase, to maximize the likelihood of the data, EM needs the following computations for each cluster:
$$w_m^{t+1} = \frac{1}{N} \sum_{n=1}^{N} P(m \mid X_n)^t, \qquad \mu_m^{t+1} = \frac{\sum_{n=1}^{N} P(m \mid X_n)^t \, X_n}{\sum_{n=1}^{N} P(m \mid X_n)^t}, \qquad \sigma_m^{t+1} = \sqrt{\frac{\sum_{n=1}^{N} P(m \mid X_n)^t \, (X_n - \mu_m^{t+1})^2}{\sum_{n=1}^{N} P(m \mid X_n)^t}}.$$
The above procedure is repeated by executing the E and M phases until convergence occurs.
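The alternation between the two phases can be sketched for the simplest case, a univariate Gaussian mixture. This is a minimal illustration under stated assumptions rather than the implementation used in the study: the function names, the initial values, the stopping rule (change in the means below `tol`), and the toy data are all hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate normal density h(x; mu, sigma)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_gmm_1d(data, mus, sigmas, weights, max_iter=200, tol=1e-8):
    """Fit a univariate Gaussian mixture by alternating E and M phases."""
    mus, sigmas, weights = list(mus), list(sigmas), list(weights)
    n, m = len(data), len(mus)
    for _ in range(max_iter):
        # E-phase: probability that each input belongs to each cluster.
        resp = []
        for x in data:
            dens = [weights[k] * normal_pdf(x, mus[k], sigmas[k]) for k in range(m)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-phase: re-estimate weight, mean, and SD of every cluster.
        old_mus = list(mus)
        for k in range(m):
            r_k = sum(row[k] for row in resp)
            weights[k] = r_k / n
            mus[k] = sum(row[k] * x for row, x in zip(resp, data)) / r_k
            var = sum(row[k] * (x - mus[k]) ** 2 for row, x in zip(resp, data)) / r_k
            sigmas[k] = max(math.sqrt(var), 1e-6)  # floor keeps the density finite
        # Stop once the means no longer move (assumed convergence criterion).
        if max(abs(a - b) for a, b in zip(mus, old_mus)) < tol:
            break
    return mus, sigmas, weights

# Hypothetical toy data with two well-separated groups.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
mus, sigmas, weights = em_gmm_1d(data, mus=[0.0, 6.0], sigmas=[1.0, 1.0], weights=[0.5, 0.5])
```

On this toy data the two estimated means settle near 1 and 5 and the mixture weights sum to one. The sketch does not guard against a cluster receiving no responsibility at all, which a production implementation would need to handle.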

SOM Clustering.
Kohonen's SOM is an unsupervised machine learning method. This learning method is widely used in the visualization of complex data, image processing, speech recognition, data mining, process control, and diagnostics. Based on the characteristics of features, the SOM algorithm tries to map m-dimensional input vectors x_j to two-dimensional maps. SOM aims to reduce the dimensions of the data, which aids in the understanding of high-dimensional data. In this way, SOM can present the data in similar groups. Two layers make up a basic SOM. The nodes of the input space are included in the first layer, and the nodes of the output space are included in the second layer. SOM's idea is to adjust the nodes to represent the distribution of the data. The nodes represent clusters that reflect the distribution of the data. The SOM algorithm starts by assigning random weights to the variables. The SOM algorithm in three main stages is shown in Figure 2.
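A compact sketch of SOM training may help make the three stages (competitive, cooperative, adaptive) concrete. It assumes a Gaussian neighbourhood and exponentially decaying radius and learning rate; the function names, the default values of `sigma0` and `alpha0`, the choice of time constants, and the toy data are hypothetical.

```python
import math
import random

def closest_node(x, weights):
    """Node whose weight vector has the smallest squared distance to x."""
    return min(weights, key=lambda node: sum((xi - wi) ** 2 for xi, wi in zip(x, weights[node])))

def train_som(data, rows, cols, n_iter=200, sigma0=1.0, alpha0=0.5, seed=0):
    """Train a rows x cols map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Random initial weights, one vector per map node.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    lam1 = lam2 = float(n_iter)  # time constants (assumed equal to n_iter)
    for t in range(n_iter):
        x = rng.choice(data)
        # Competitive process: find the best-matching unit (BMU).
        bmu = closest_node(x, weights)
        # Cooperative process: Gaussian neighbourhood centred on the BMU.
        sigma = sigma0 * math.exp(-t / lam1)
        alpha = alpha0 * math.exp(-t / lam2)
        for node, w in weights.items():
            d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2  # lateral distance on the map
            h = math.exp(-d2 / (2 * sigma ** 2))
            # Adaptive process: pull the node's weights toward the input.
            for i in range(dim):
                w[i] += alpha * h * (x[i] - w[i])
    return weights

# Hypothetical toy data scaled to [0, 1]; a 2 x 3 map gives up to six clusters.
data = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
som = train_som(data, rows=2, cols=3)
cluster_of_first = closest_node(data[0], som)
```

After training, each record is assigned to the cluster of its closest node, so a map such as SOM2×3 yields at most six clusters.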

Cluster Ensembles.
A cluster ensemble combines several clustering approaches that partition the initial dataset and aims to improve the clustering outcome by merging the results of the different clusterings. This is performed to overcome the instability of clustering methods. The fusion approach begins with the clusters formed during the combination phase and determines the optimal number of clusters in the dataset based on certain predetermined criteria. Next, we describe the cluster ensemble approaches, hypergraph partitioning and majority voting. They are also called consensus functions.

Hypergraph Partitioning.
For hypergraph partitioning, the cluster label vectors are transformed into a hypergraph representation. A hypergraph consists of vertices and hyperedges. The clusters are represented as hyperedges, whose vertices correspond to the clustered objects. Each hyperedge describes a set of objects that belong to the same group. There are three common functions for transforming the cluster set into the hypergraph representation; two of them, CSPA and HGPA, are considered here. CSPA induces a graph from an association matrix and clusters it using the METIS algorithm [25]. In CSPA, a clustering indicates a link between objects in the same segment of data and can be employed for measuring pairwise similarity. Then, the similarity is deployed to recluster the items in order to generate an integrated clustering. In the HGPA algorithm, the problem of cluster ensembles is viewed as a partitioning problem of a suitably defined hypergraph, which represents the clusters. Minimal cut algorithms are mainly used to control the partition size and to find good hypergraph partitions when they are combined with the proper objective functions.

PCA procedure:
(1) Computation of the mean feature vector, $\mu = \frac{1}{n} \sum_{t=1}^{n} x_t$, for $n$ patterns $x_t$ ($t = 1$ to $n$).
(2) Computation of the covariance matrix $C$ using $C = \frac{1}{n} \sum_{t=1}^{n} (x_t - \mu)(x_t - \mu)^T$, where $T$ indicates matrix transposition.

Figure 2: The SOM algorithm in three main stages.
Competitive Process. Given a set of patterns $X$, select $x = (x_1, x_2, \ldots, x_d)^T$ at random from the dataset $X$. Using a distance measure and the weights $w_i = (w_{i1}, w_{i2}, \ldots, w_{id})^T$ of node $i$, the node closest to the input vector, the best-matching unit (BMU), is determined.
Cooperative Process. A topological neighbourhood is defined, locating the BMU in the center of the neighbourhood. To do so, a Gaussian function is used, $h_{c,j}(t) = \exp\left(-d_{c,j}^2 / (2\sigma^2(t))\right)$, where $d_{c,j}$ is the lateral distance between the BMU $c$ and node $j$, $t$ is the current iteration, and $\sigma(t)$ is the radius of the neighbourhood function, defined as $\sigma(t) = \sigma_0 \exp(-t/\lambda_1)$, where $\sigma_0$ is the radius at time $t_0$ and $\lambda_1$ is a time constant. For two dimensions, the lateral distance is defined as $d_{c,j}^2 = \|r_c - r_j\|^2$, where $r_c$ and $r_j$ are the positions of nodes $c$ and $j$ on the map.
Adaptive Process. The BMU and all nodes within the neighbourhood are updated by $w_i(t+1) = w_i(t) + \alpha(t)\, h_{c,i}(t)\, (x(t) - w_i(t))$, where $w_i(t)$ and $x(t)$ indicate, respectively, the weight and input vector at time $t$. $\alpha(t)$ is defined as $\alpha(t) = \alpha_0 \exp(-t/\lambda_2)$, where $\lambda_2$ is a time constant and $\alpha_0$ is the learning rate at time 0.
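The pairwise similarity that CSPA builds on can be illustrated with a co-association matrix: for every pair of objects, the fraction of base clusterings that place them in the same cluster. The sketch below stops at this matrix and does not perform the subsequent graph partitioning step; the function name and the three toy labelings are hypothetical.

```python
def co_association(labelings):
    """Fraction of clusterings in which each pair of objects shares a cluster."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        if len(labels) != n:
            raise ValueError("every labeling must cover the same objects")
        for i in range(n):
            for j in range(n):
                # Labels are only compared within one clustering, so the
                # label names need not agree across clusterings.
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    r = len(labelings)
    return [[c / r for c in row] for row in counts]

# Three hypothetical clusterings of four patients.
labelings = [
    ["A", "A", "B", "B"],
    ["x", "x", "x", "y"],
    [0, 0, 1, 1],
]
similarity = co_association(labelings)
```

Here the first two patients are always grouped together (similarity 1.0), while the first and last never are (similarity 0.0). Reclustering the objects on this matrix is what produces the integrated clustering.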

Majority Voting. In the majority voting mechanism, the cluster with the most votes is selected. Assume a dataset contains four Parkinson's disease patients (PDP1, PDP2, PDP3, and PDP4), and there are three clustering algorithms (Algorithms 1-3). Assume that Algorithms 1 and 2 have both assigned PDP1 to cluster A, while Algorithm 3 has assigned it to cluster B. Cluster A would then be chosen as the best cluster for PDP1 based on the majority vote.
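The PDP1 example can be written out in a few lines. This sketch assumes that cluster labels have already been aligned across the three algorithms, so that "A" means the same group in each; the votes for PDP2 to PDP4 are hypothetical and only fill out the example.

```python
from collections import Counter

def majority_vote(assignments):
    """Pick, for every patient, the cluster label chosen by most algorithms."""
    consensus = {}
    for patient, labels in assignments.items():
        # most_common(1) returns the top label; ties go to the label seen first.
        consensus[patient] = Counter(labels).most_common(1)[0][0]
    return consensus

# One label per algorithm (Algorithms 1-3) for each patient.
votes = {
    "PDP1": ["A", "A", "B"],  # the case described in the text
    "PDP2": ["B", "B", "B"],  # hypothetical
    "PDP3": ["A", "B", "B"],  # hypothetical
    "PDP4": ["A", "A", "A"],  # hypothetical
}
consensus = majority_vote(votes)
```

With these votes, PDP1 is assigned to cluster A, as in the example above.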

Imputing Missing Value.
In this research, we use SVD for missing value imputation. The procedure for missing value prediction by SVD is provided in Algorithm 2. Five steps are used in the SVD algorithm to predict the missing values in the PD dataset. In the first step, the data is converted to a dense matrix B_{m,n}. In the second step, we perform a normalization procedure on B_{m,n} to obtain the matrix Z. In the third step, we apply the SVD technique to the matrix produced in the second step. Then, we compute Z_d as an approximation of Z. In the last step, the missing value is predicted.
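The five steps can be sketched with NumPy as follows. Several details are assumptions made for illustration and are not stated in the text: missing entries start at the column mean when the dense matrix is built, the approximation keeps the leading `rank` singular values, and the prediction is mapped back to the original scale by undoing the normalization. The function name `svd_impute` is hypothetical.

```python
import numpy as np

def svd_impute(data, rank):
    """Fill NaN entries from a low-rank SVD approximation of the normalized data."""
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data)
    # Step 1: dense matrix B (assumption: missing entries start at the column mean).
    col_mean = np.nanmean(data, axis=0)
    B = np.where(missing, col_mean, data)
    # Step 2: column-wise normalization to obtain Z.
    mu = B.mean(axis=0)
    sd = B.std(axis=0)
    sd[sd == 0] = 1.0  # constant columns are left unscaled
    Z = (B - mu) / sd
    # Step 3: singular value decomposition of Z.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    # Step 4: rank-d approximation Z_d.
    Z_d = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    # Step 5: read the missing values off Z_d, mapped back to the original scale.
    filled = data.copy()
    filled[missing] = (Z_d * sd + mu)[missing]
    return filled

# Hypothetical toy matrix with one missing entry.
X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, np.nan]])
X_filled = svd_impute(X, rank=1)
```

Observed entries are returned unchanged; only the missing cells are overwritten. A single pass is shown here, although repeating steps 2 to 5 with the newly filled values is a common refinement.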

Methods Evaluations and Results
In the first step of our data analysis, EM and SOM algorithms were performed on the PD dataset. We implemented EM and SOM to generate different numbers of clusters and to build ensembles of them for the final clustering results. Specifically, we ran EM for k = 8, 10, 12, and 13 and SOM for different map sizes: SOM2×3, SOM2×4, SOM3×3, and SOM3×4. The results of EM clustering for 13 clusters (k = 13) and SOM clustering for 9 clusters (SOM3×3) are provided in Figure 3. These clusters are then used for ensemble learning in UPDRS prediction by the SVR technique.
SVR was trained with the RBF kernel. The SVR parameters are the penalty factor $C$ and the $\varepsilon$ of the loss function, and the parameter of the RBF kernel is $\gamma$. These parameters can have a significant impact on the prediction quality of SVR. The radial basis function is defined as
$$K(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^2\right),$$
where the single parameter $\gamma$ indicates the spread of the function. We used 5-fold cross-validation with a grid search mechanism to select the best parameters of each SVR. In the grid search, $\gamma$ was explored on $\gamma \in [0.01, 0.1]$ at an interval of 0.005, $\varepsilon$ was explored on $\varepsilon \in [0.0001, 0.002]$ at an interval of 0.0001, and the penalty factor $C$ was explored on $C \in [3, 4.1]$ at an interval of 0.005. We developed the ensembles on bootstrap samples drawn from the selected data points. The training was repeated 8 times to get 8 SVRs.
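To make the search ranges and the bootstrap ensemble concrete, the sketch below enumerates the three parameter grids quoted above (the RBF kernel parameter is written `gammas` here) and averages the predictions of 8 learners fitted on bootstrap samples. It is a structural illustration only: `mean_learner` is a hypothetical stand-in for an SVR tuned by 5-fold cross-validation, averaging is assumed as the combination rule, and the toy training pairs are invented.

```python
import random
from itertools import product

def inclusive_range(start, stop, step):
    """Evenly spaced values from start to stop inclusive (rounded to tame float drift)."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 10) for i in range(count)]

# Search ranges quoted in the text.
gammas = inclusive_range(0.01, 0.1, 0.005)
epsilons = inclusive_range(0.0001, 0.002, 0.0001)
costs = inclusive_range(3.0, 4.1, 0.005)
grid = list(product(gammas, epsilons, costs))  # every (gamma, epsilon, C) candidate

def bagged_predict(train, x_new, fit, n_models=8, seed=0):
    """Average the predictions of n_models learners fitted on bootstrap samples."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_models):
        sample = [rng.choice(train) for _ in train]  # draw with replacement
        model = fit(sample)                          # fit returns a callable predictor
        predictions.append(model(x_new))
    return sum(predictions) / len(predictions)

def mean_learner(sample):
    """Hypothetical stand-in for a tuned SVR: always predicts the sample's mean target."""
    mean_target = sum(y for _, y in sample) / len(sample)
    return lambda x: mean_target

# Hypothetical (features, UPDRS-like target) pairs.
train = [([0.1], 20.0), ([0.2], 22.0), ([0.3], 24.0), ([0.4], 26.0)]
estimate = bagged_predict(train, [0.25], mean_learner)
```

With the stated intervals the grids hold 19, 20, and 221 values, so a joint search over all three parameters would cover 83,980 candidates per SVR. In practice `fit` would run that search on each bootstrap sample and return the best SVR's prediction function.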
Algorithm 2: Missing value prediction by SVD.
Step 1. The data in each cluster is converted to the dense matrix $B_{m,n}$.
Step 2. The normalization procedure is performed on $B_{m,n}$ through $Z_{i,j} = (B_{i,j} - \bar{B}_j)/\sigma_j$, where $\bar{B}_j$ is the average value and $\sigma_j$ is the SD of column $B_j$.
Step 3. The SVD technique is applied to $Z$.
Step 4. $Z_d$ is computed for the approximation of $Z$.
Step 5. The missing value is provided using $Z_d$.
To measure the performance of the presented methodology, we use several metrics: the adjusted coefficient of determination ($R^2$), prediction accuracy (PA), Index of Agreement (IA), MAE, and RMSE, each computed over $n$ observations. In these metrics, $\overline{\text{Predicted}}$ is the mean of the predicted values, $\overline{\text{Observed}}$ is the mean of the observed values, $S_{\text{Predicted}}$ is the standard deviation of the predicted values, $S_{\text{Observed}}$ is the standard deviation of the observed values, and $m$ is the number of independent variables.
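Because the formulas for the five metrics did not survive in this copy, the sketch below uses commonly used definitions as an assumption: PA is taken as the Pearson correlation between predicted and observed values (which matches the means and SDs named in the text), IA as Willmott's index of agreement, and the adjusted $R^2$ as the usual correction for $m$ independent variables. The function name `evaluate` and the toy values are hypothetical, and the paper's exact formulas may differ.

```python
import math

def evaluate(observed, predicted, m):
    """Return MAE, RMSE, PA, IA, and adjusted R^2 for n observations and m inputs."""
    n = len(observed)
    if n != len(predicted) or n <= m + 1:
        raise ValueError("need equal-length inputs with n > m + 1")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    s_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed) / (n - 1))
    s_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / (n - 1))
    pairs = list(zip(observed, predicted))
    sq_err = sum((p - o) ** 2 for o, p in pairs)
    mae = sum(abs(p - o) for o, p in pairs) / n
    rmse = math.sqrt(sq_err / n)
    # PA as Pearson correlation of predicted and observed values (assumed form).
    pa = sum((p - mean_p) * (o - mean_o) for o, p in pairs) / ((n - 1) * s_p * s_o)
    # IA as Willmott's index of agreement (assumed form).
    ia = 1 - sq_err / sum((abs(p - mean_o) + abs(o - mean_o)) ** 2 for o, p in pairs)
    # Adjusted R^2 penalizes the number of independent variables m.
    r2 = 1 - sq_err / sum((o - mean_o) ** 2 for o in observed)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - m - 1)
    return {"MAE": mae, "RMSE": rmse, "PA": pa, "IA": ia, "adj_R2": adj_r2}

# Hypothetical observed and predicted UPDRS-like scores.
observed = [10.0, 20.0, 30.0, 40.0, 50.0]
predicted = [12.0, 18.0, 33.0, 39.0, 48.0]
scores = evaluate(observed, predicted, m=2)
```

Under these definitions a better model has higher $R^2$, PA, and IA and lower MAE and RMSE.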
In Table 3, we present the performances of EM and SOM ensembles by majority voting, CSPA, and HGPA. The results are provided for different numbers of clusters and their ensembles. From the results, it is found that, on average, SOM and EM ensembles using the HGPA approach perform best, providing the best values of R 2 , PA, IA, MAE, and RMSE (highest for R 2 , PA, and IA; lowest for MAE and RMSE). The best accuracies are provided by ensemble size 3 (SOM2 × 4 + SOM3 × 3 + SOM3 × 4) for SOM and ensemble size 4 (k = 8, 10, 12, 13) for EM. In addition, majority voting has provided the lowest accuracy for EM and SOM ensembles in UPDRS prediction.
To evaluate the deployed methodology compared with previous methods in the literature, we performed several experiments on the PD dataset and presented the results in Table 4. The proposed method was compared with the SVR, ANFIS [87], MLR, HSLSSVR [88], neural network (NN), and DBN [89]. From Table 4, it is found that clustering and dimensionality reduction techniques have significantly reduced the computation time of the proposed method for UPDRS prediction.
To assess the performance of the deployed method on the PD dataset with null values, we randomly set 10% of patients' data to null values and predicted them with the SVD algorithm. The dense matrices are then used in EM and SOM for clustering and ensemble learning. We finally predict the UPDRS using the dataset with predicted values. The results are provided for the HGPA + SOM + SVR ensemble and the HGPA + EM + SVR ensemble in Figure 4.

Discussion
Efficient detection of PD is crucial, as timely diagnosis and appropriate medication can delay the development of symptoms and difficulties resulting from this disorder [90]. Despite the significance of fast detection of PD, it is not an easy task because current detection measures are usually based on subjective indicators [91]. Besides, in the initial phases of PD, nonmotor signs, like depression, sleep disorder, and rapid eye movement, are more recognized than motor signs, which hinders the fast detection of PD [92,93]. ML has lately been used for medical disease detection, and particularly in PD treatment [51]. This can be explained by the convenient performance and accurate results of ML techniques [94]. Classification of diseases is a significant type of predictive modeling. It is considered an important data mining approach because it groups the population according to a predetermined criterion. It is vital to compare the outcomes of various classification methods to decide which approach presents the best performance [95]. Hence, the main goal of this research is to assess several approaches that are utilized for PD prediction and classification. Even though ML methods have been assessed separately in several studies, those evaluations rely on different datasets, which makes an accurate comparison among the deployed methodologies difficult. Hence, it is vital to evaluate these methods in one comparative study based on a single chosen dataset.
In this study, in the first step of data analysis, the EM and SOM algorithms were implemented to produce various numbers of segments, whose ensembles gave the final clustering outcomes. The resulting clusters were then used for ensemble learning of UPDRS prediction by the SVR technique, and the results are provided for different numbers of clusters and their ensembles. Referring to the outcomes, we can conclude that the performance of the SOM and EM ensembles obtained with HGPA is the best among the deployed approaches based on the R², PA, IA, MAE, and RMSE measures. Besides, the proposed methodology was evaluated against previous methods in the literature through various experiments on the PD dataset. The result of the deployed approach was compared with other approaches (SVR, ANFIS, MLR, HSLSSVR, and NN). Based on the findings, the HGPA + EM + SVR ensemble provided better accuracy compared with the other ensemble learning approaches.
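As an illustration of three of the error measures named above, the following minimal sketch computes MAE, RMSE, and R² for a vector of predicted UPDRS scores. It uses the standard textbook definitions, with R² taken as the coefficient of determination; this is an assumption, since the exact formulas used in the experiments are not restated in this section. The function names and the sample scores are hypothetical and are not taken from the PD dataset.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted scores."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination (assumed definition of R^2)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted Total-UPDRS values (illustrative only).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print("MAE :", mae(observed, predicted))
print("RMSE:", rmse(observed, predicted))
print("R^2 :", r2(observed, predicted))
```

Lower MAE and RMSE and a higher R² indicate a better fit, which is the sense in which the ensembles are ranked in the comparison above.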

Conclusion
Most of the presented methods for PD prediction depend strongly on human proficiency [96]. The benefit of deploying ML in the medical sector is that it provides objective, context-independent, and data-driven analysis [97]. ML approaches have been utilized effectively in disease diagnosis and severity prediction [27,42,54,61,[77][78][79][98][99][100]. In particular, ML has also been utilized in analyzing the data collected from wearable IMU sensors for automated evaluation of motor disorders like PD [101][102][103]. Hence, the practical aim of this study is to provide supplementary, quick, and accurate methods that can aid experts in reaching more objective medical decisions regarding PD diagnosis. Deploying these methods in the appropriate systems can bring several gains, including reduced expenses of manual diagnosis and shorter diagnosis time.
Continuing this line of research and supporting previous literature, this study uses both unsupervised and supervised learning techniques to diagnose PD through UPDRS prediction. Clustering, dimensionality reduction, and prediction learning techniques are used to create the PD diagnosis method. The basic aim of this paper is to conduct a comparative study of the ML approaches for PD diagnosis, concentrating on clustering and prediction learning methods. In particular, several clustering approaches were used for PD data segmentation, and SVR ensembles were used to predict Motor-UPDRS and Total-UPDRS. The findings are then compared with other prediction learning methods, namely MLR, neurofuzzy, and SVR techniques, on a real-world PD dataset. The findings of the study indicated the superiority of deploying EM with SVR ensembles over decision trees, neurofuzzy, and SVR combined with other clustering approaches in the prediction of Motor-UPDRS and Total-UPDRS.
Many previous works have focused on patients' classification, severity prediction, and remote monitoring. Still, there are future routes in each field to be investigated. Several sensors, such as the magnetometer, accelerometer, and gyroscope, have been utilized and assessed. Additionally, MRI, EEG signals, fMRI, and DATSCAN images were utilized to present accurate predictions of the disease. Other research directions can be followed by utilizing other physiological signals such as ECG, EMG, and PCG. Other sensing modalities can be explored and combined to present a more accurate classification of the disease.
Even though ML methods in previous literature have presented high classification accuracy for PD detection, there are still some obstacles related to feature extraction and selection which need to be addressed [104]. The utilization of several features can increase the computation time [105,106]. On the other hand, if fewer features are utilized, the complexity of extracting the features increases, which in turn impacts the computation time. This paper has some shortcomings which should be considered in future research. The study is based on a real-world dataset to assess the proposed approaches, which has one limitation regarding the number of features used in the prediction process. Other PD datasets with a larger number of features can be utilized in the evaluation of the deployed approaches. Large datasets can present more generalized outcomes. Emerging technologies can be used to collect data from patients using particular applications, as suggested by Bot et al. [107], in which the authors developed an application to collect data from PD patients using their iPhones. This approach can ease data collection from the public because of the availability of smartphones and help to present more generalizable outcomes. Furthermore, this study can be extended by incremental machine learning approaches to improve the computation time of previous PD diagnosis methods in processing large datasets [44,89].

Data Availability
The data are freely available at https://archive.ics.uci.edu/ml/datasets.

Conflicts of Interest
The authors declare that they have no conflicts of interest.