Optimization-Based Ensemble Feature Selection Algorithm and Deep Learning Classifier for Parkinson's Disease

PD (Parkinson's disease) is a severe, painful, and incurable malady that mainly affects older people. Identifying PD early and precisely is critical for the lengthened survival of patients, where DMTs (data mining techniques) and MLTs (machine learning techniques) can be advantageous. Prior studies have examined the accuracy of DMTs on Parkinson's datasets and analyzed feature relevance. Recent studies have used FMBOAs (fuzzy monarch butterfly optimization algorithms) for feature selection and relevance analysis, where the selection aims to find the optimal subset of features for classification tasks. EFSs (ensemble feature selections) are viable solutions for combining the benefits of multiple algorithms while balancing their drawbacks. This work uses OBEFSs (optimization-based ensemble feature selections) to select appropriate features based on agreements. Ensembles combine results from multiple feature selection approaches, here FMBOAs, LFCSAs (Lévy flight cuckoo search algorithms), and AFAs (adaptive firefly algorithms). Each approach selects an optimized feature subset, and the resulting three subsets are subsequently matched for correlations by the ensemble. The optimum features generated by OBEFSs are then used to train FCBi-LSTMs (fuzzy convolution bi-directional long short-term memories) for classification. The suggested model uses data from the UCI (University of California-Irvine) machine learning repository, and the methods are evaluated using LOPO-CVs (leave-one-person-out cross-validations) in terms of accuracy, F-measure, and MCC (Matthews correlation coefficient).


Introduction
Parkinson's is a neurologic disorder that involves tremors, rigidity, and problems with movement, balance, and coordination. The signs of the disease normally appear slowly and continue to worsen. PD is a neurological malady classified as a motor system dysfunction, and patients' activities deteriorate as it progresses. Fundamental bodily systems are affected, including breathing, balance, movement, and heart functioning [1], and at the initial stages the patient's flow of speech gets hindered. Early diagnosis of PD leads to a longer life for patients, and the diagnostics require high-precision, robust health informatics tools. Such solutions aim at assisting clinicians [2][3][4] who detect PD's severity using a range of sensors. This research work uses different speech signal processing methodologies to obtain clinically relevant characteristics of PD, which are then processed by learning algorithms to provide reliable detection. The performance of computational algorithms is inextricably linked to the quality of input data. The manual identification of speech or voice features is a complex and intricate task that can be executed efficiently by MLTs. Important features of voice signals can be identified by computer-based techniques, which fall into three categories, namely supervised, unsupervised, or semisupervised, based on the labeling of data. Filtering, wrapping, and embedding are examples of supervised feature selection approaches.
Filtering strategies choose features independently of any classifier, while wrappers use the accuracies projected by a learning algorithm to estimate feature subsets. Embedded approaches, like filter models, begin by selecting multiple candidate feature subsets of specific cardinalities using statistical criteria, and the subsets with the highest classification accuracies are finally selected. Unsupervised feature selections work on unlabeled data; however, evaluating the relevance of features is difficult for them. Semisupervised feature selections can evaluate feature relevance using both labeled and unlabeled data.
Computational methods based on biological evolution provide a strong basis for solving problems or taking decisions [5,6]. EFSs boost the stability of feature selection as they take advantage of single approaches while overcoming their flaws. The analysis of features from datasets can be based on individual assessments or on the evaluation of subsets [7,8]. Individual assessments rank characteristics by relevance, while subset approaches employ search strategies to generate a series of feature subsets. These subsets are assessed iteratively using optimality criteria until a final subset of selected characteristics is reached [9].
This work's OBEFS framework guides the construction of EFSs that combine the benefits of several feature selection methods, avoid their biases, and cover their drawbacks. The hierarchical layers of DNNs (deep neural networks), which are DLTs (deep learning techniques), generate deep abstract representations of input features. DLTs have been exploited in many applications, including speech recognition, image categorization, medication development, and genetic research [10]. Researchers have used DNNs for PD categorization mainly because of their effectiveness [11,12]. DNNs are very helpful classifiers for PD as they model complex and nonlinear data linkages. Previous research on PD classification used single feature types like EEG data [11] and sensor activities [12] as inputs for CNNs (convolution neural networks), where the usage of unique parallel layers for classification had not been tried. The study in [13] used multiple voice recordings of the same individuals in both training and testing procedures with CVs (cross-validations), resulting in biased performance evaluations. Since the data had voice recordings of healthy persons and PD patients, LOPO-CVs were used to assess the performance of the proposed framework: in each iteration, LOPO-CVs place all instances of one individual in the test set while using the other instances for training. The suggested OBEFS framework of this work selects features based on agreements. Instead of employing a single feature selection approach, the ensemble integrates numerous feature selection methods, namely FMBOAs, LFCSAs, and AFAs, and the optimum features selected by OBEFSs are used to train FCBi-LSTM classifiers.
The proposed technique was trained using datasets from the UCI machine learning repository, and its performance was validated using LOPO-CVs in terms of accuracy, F-measure, and MCC.

Literature Review
This section outlines some current works on PD classification that make use of machine learning techniques and discusses contemporary deep learning methods for PD classification. To evaluate speech recordings for PD classification, Alqahtani et al. [14] proposed classifiers based on NNges (non-nested generalized exemplars), which, in spite of their capabilities, had not been examined thoroughly. The study's experiments categorized healthy and PD subjects using NNges with optimized parameters. Furthermore, the data was balanced using the SMOTE (synthetic minority oversampling technique) method. Finally, using the balanced data, NNge and ensemble algorithms, notably AdaBoostM1, were developed.
Using sets of vocal data, Gunduz [15] used dual CNN frameworks for identifying PD, where different feature sets were generated and merged. The first architecture combined several feature sets before feeding them as inputs to 9-layered CNNs, while the second fed each feature set directly to parallel convolution layers. Each parallel branch's deep features were obtained before their merger in a merge layer. The second architecture showed highly promising results in tests, as it learned deep features via parallel convolutions. The extracted deep features were efficient in increasing the discriminative power of classifiers and in differentiating patients with PD from healthy people.
PD was classified by Li et al. [16] by combining CART (classification and regression trees) and ensemble learning. The study used CART to iteratively identify optimal training speech samples with high levels of differentiation.
The study used ensembles, including RFs (random forests), SVMs (support vector machines), and ELMs (extreme learning machines), for learning from the optimal training data, and classified test data using the trained ensemble-learning systems. The study found that CART and RF combinations were stable compared to other strategies and also improved PD predictions on speech data categorizations. Caliskan et al. [17] projected the diagnosis of PD from speech impairments, the first indication of the disease. They used DNNs with stacked autoencoders and the softmax function for classification. Their simulation results across two databases demonstrated the efficiency of DNN classifiers in comparison with other classification techniques.
For quickly detecting PD, Cai et al. [18] proposed enhanced FKNNs (fuzzy K-nearest neighbors) combined with CBFOs (chaotic bacterial foraging optimizations) with Gauss mutations on voice data. Their CBFO-FKNN was an evolutionary instance-based learning methodology, where FKNN's parameters were tuned effectively by CBFOs. The study evaluated the suggested approach exhaustively on PD datasets in terms of classification accuracies, sensitivities, specificities, and AUCs (areas under the receiver operating characteristic curves), aiding physicians in making better clinical diagnostic judgments.
Castro et al. [19] classified PD on UCI machine learning repository datasets with ANNs (artificial neural networks) using MLPs (multilayer perceptrons). Their collections included voice recordings of patients with PD along with control groups. The study trained several networks with 10 to 6000 neurons in the hidden layers, increased tenfold at each step. Their ANN analyses of speech-related characteristics could be used to assess the impact of PD on patients, and MLTs can identify other neurological disorders when biological data is made available. Disorders were classified by Abdurrahman and Sintawati [20], who used well-known speech characteristics from PD research, including jitters, shimmers, fundamental frequency parameters, and harmonicity parameters, and assessed PD using RPDEs (recurrence period density entropies), DFAs (detrended fluctuation analyses), and PPEs (pitch period entropies). PD was classified using the XGBoost algorithm on the identified baseline features, followed by feature selection based on feature importance plots to enhance the model's performance. The locShimmer features were eliminated from the model, and XGBoost's assessments of feature importance improved the efficacy of the features and increased classification accuracies.
Karabayir et al. [21] examined PD data with multiple MLTs, including LGBs (light gradient boosts), EGBs (extreme gradient boosts), RFs, SVMs, KNNs (K-nearest neighbors), least absolute shrinkage and selection operator regressions, and LRs (logistic regressions). The study also conducted variable significance analyses to find important factors in people diagnosed with PD, and found that LGBs outperformed the other MLTs in benchmarks and could be utilized to screen huge patient groups for PD at low cost. Patra et al. [22] employed MLTs to assess the voices in patient datasets and identify PD. The study's base classifiers were DTs (decision trees), LRs, and KNNs, whose performances were compared to ensembles like bagging, RFs, and boosting. Furthermore, the most important traits associated with PD classification were discovered and prioritized, depending on feature importance, with the aim of differentiating PD-affected patients by detecting dysphonia.
Parisi et al. [23] proposed the use of hybrid AI (artificial intelligence) for examining cases of PD. The study used UCI's databases, where the dysphonic measures and clinical ratings of 68 patients were considered for processing. The study's feature selection ranked input features based on MLP weights, where physiological and pathological patterns were given different weight values. This strategy reduced the examinable features from 27 to 20, effectively reducing the dimensions for the learning of LSVMs (Lagrangian support vector machines). The proposed hybrid MLP-LSVMs performed well in benchmarks against existing and previously proposed schemes and could be used in clinical environments for the detection of PD.
Datasets with rich features were examined by Hasan and Hasan [24] using ANOVA (analysis of variance) F-score values to extract the top 50 features. Several MLTs were applied, and their results were compared to prior studies. Their experiments found that feeding the selected characteristics to RFs resulted in the greatest accuracy scores; their use of ANOVA for feature extraction successfully retrieved important characteristics that distinguish PD patients from healthy persons while improving classification accuracy. Qasim et al. [25] suggested hybrid feature selection approaches for processing unbalanced PD datasets. The SMOTE approach was used to balance the dataset. Subsequently, RFEs (recursive feature eliminations) and PCAs (principal component analyses) were used to remove contradictions found in the dataset's features and reduce processing times. Their classifiers included bagging, KNNs, MLPs, and SVMs, which worked on acoustic recordings of PD patients along with the patients' individual characteristics.
Their idea of using SMOTE with RFEs and PCAs in preprocessing datasets was also compared with other identifiers for PD and general medical disorders, making the study an asset to healthcare organizations.
In the dual-architecture study [15] described above, the first system integrates the distinct selected feature sets prior to feeding them to a 9-layered CNN, whereas the second model feeds the feature sets to concurrent input layers that are directly connected to convolution layers. Deep features from each parallel branch are extracted simultaneously before being integrated in the merge layer. The models were trained using data from the UCI machine learning repository, and their results were verified using Leave-One-Person-Out Cross Validation (LOPO CV). Because of the imbalanced class distribution, the F-measure and Matthews correlation coefficient, as well as accuracy, were employed in the evaluation. The second model proved quite promising, as it is capable of learning representations from each set of features via concurrent convolution layers, according to the experimental data.

Proposed Methodology
This research work proposes a new feature selection and classification framework for identifying PD. The work has five major steps, namely the extraction of voice-based features, dimensionality reduction using KPCAs (kernel-based principal component analyses), the proposed OBEFSs built from FMBOAs, LFCSAs, and AFAs, and classification with FCBi-LSTMs. Subsequently, the assessments are evaluated using LOPO-CVs. Figure 1 depicts the general flowchart of the proposed system.

PD Dataset.
The PD dataset encompasses speech samples used by prior studies to diagnose PD, obtained from UCI's machine learning repository [13].
The data, gathered at the Department of Neurology of Istanbul University's Cerrahpasa Faculty of Medicine, comprised 188 PD patients (107 men and 81 women) aged 33 to 87 years and 64 healthy persons (23 men and 41 women) aged 41 to 82 years. The voices were recorded with the microphone set to 44.1 kHz, and three repetitions of sustained vowel phonations were collected from each individual after the doctor's examination.

Feature Extractions.
The dataset had baseline and time-frequency features, MFCCs (Mel frequency cepstral coefficients), WTs (wavelet transforms), TQWTs (tunable Q-factor wavelet transforms), and vocal fold features:

(i) Baseline features: since PD impedes the speech of patients even in the early stages, speech characteristics have been successfully used to evaluate PD and to track the disease's development following medicinal therapies. The fundamental frequency parameters (#5), harmonicity parameters (#2), RPDEs (recurrence period density entropies) (#1), DFAs (detrended fluctuation analyses) (#1), and PPEs (#1) have been extensively utilized in speech-based PD research [24,26] and form the baseline features [13].

(iii) MFCCs: MFCC-based extraction uses triangular overlapped filter banks to combine cepstral analysis with spectral domain partitioning. MFCCs can detect rapid deterioration in the movement of articulators directly affected by PD, such as the tongue and lips. The dataset had 84 MFCC-related characteristics for identifying PD effects in the vocal tract (#84), generated from the mean and standard deviation of the first 13 MFCCs, the signal's log energy, and their 1st/2nd order derivatives [13].

(iv) WTs: generally, WTs are used to make decisions about signals, specifically signals with minor fluctuations on regional scales. Several studies have utilized WT features obtained from a speech sample's raw fundamental frequency (F0) to diagnose PD. This work produced 182 WT characteristics from both approximation and detail coefficients, including energies, Shannon's and log-energy entropies, and Teager-Kaiser energies.

(v) TQWTs: feature extraction using TQWTs improves signal quality by adjusting three parameters, namely the Q-factor (Q), the redundancy (r), and the number of levels (J), based on the signal's behavior. The oscillations in the time-domain signal are proportional to the Q-factor, while J stands for the decomposed layer count: on decomposition, J high-pass filters output J + 1 sub-bands plus one final low-pass filtered output. Ringing, controlled by r, allows the wavelet's localization with respect to time [27]. This study's tests yielded 432 TQWT-related characteristics from the dataset [13].

(vi) Vocal fold features: the effects of noise on the vocal folds were also investigated using features based on vocal fold vibrations. The study extracted the following from the data [13]: glottis quotients (GQs) (#3), glottal-to-noise excitations (GNEs) (#6), vocal fold excitation ratios (VFERs) (#7), and empirical mode decompositions (EMDs).
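As a concrete illustration of the arithmetic behind the 84 MFCC-related features (14 coefficient trajectories, i.e., 13 MFCCs plus log energy, together with their 1st/2nd order derivatives, each summarized by mean and standard deviation: 14 × 3 × 2 = 84), the following minimal numpy sketch assumes a precomputed coefficient matrix; the function name and matrix shape are illustrative, not from the dataset's own tooling:

```python
import numpy as np

def mfcc_summary_features(coeffs):
    """Summarize a (14, n_frames) matrix of 13 MFCCs + log energy into
    84 statistics: mean and std of the coefficients and of their
    1st/2nd order derivative trajectories."""
    deltas = np.gradient(coeffs, axis=1)           # 1st-order derivatives
    delta2 = np.gradient(deltas, axis=1)           # 2nd-order derivatives
    stacked = np.vstack([coeffs, deltas, delta2])  # (42, n_frames)
    return np.concatenate([stacked.mean(axis=1), stacked.std(axis=1)])

rng = np.random.default_rng(0)
feats = mfcc_summary_features(rng.normal(size=(14, 100)))  # synthetic input
```

The same mean/std summarization pattern also explains the counts of the WT and TQWT feature groups.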

Feature Selections Using OBEFSs.
The proposed OBEFSs integrate the normalized results of multiple feature selectors to arrive at quantitative feature sets with ensemble significance. In the initial phase, a series of feature selectors is run to produce different outputs, followed by the aggregation of the single models' results. The aggregation is accomplished using correlations or consensus on feature ranks, or by counting the most frequently selected features, to determine consensus-based feature subsets. The proposed OBEFSs generate final consensus ranks by combining the feature ranks supplied by the single feature selectors: FMBOAs, LFCSAs, and AFAs.
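One simple way to realize a consensus over per-selector ranks is mean-rank aggregation; the sketch below is a hypothetical illustration of that idea, not the authors' exact correlation-based procedure:

```python
import numpy as np

def consensus_rank(rank_lists):
    """Aggregate per-selector feature ranks (lower = better) into one
    consensus ordering by averaging the ranks across selectors."""
    mean_ranks = np.mean(rank_lists, axis=0)
    return np.argsort(mean_ranks)  # feature indices, best first

# Hypothetical ranks from three selectors (e.g., FMBOA, LFCSA, AFA)
# over 5 features
ranks = np.array([[0, 2, 1, 4, 3],
                  [1, 0, 2, 3, 4],
                  [0, 1, 3, 2, 4]])
order = consensus_rank(ranks)
```

Taking the top-k indices of `order` would then yield the consensus-based feature subset.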

FMBOAs.
This work uses FMBOAs for the selection of feature subsets, where sample characteristics are weighted according to the effect of a feature's presence on PD. Classifiers then use these selected attributes from the samples (m denotes the number of voice samples), forecast class labels, and the predictions are evaluated for the ultimate selection.
The original characteristics are given feature weights that indicate their significance to classification, and the features with the highest weights are chosen. MBOs (monarch butterfly optimizations) are built on migration trends, where the fitness and importance of selections are rated. Even without modifications, MBOAs show good classification accuracy, indicating that they balance global and local searches. The global search components of MBOAs were tweaked in this study to provide more precise results and boost the effectiveness of locating the right characteristics before resorting to local searches. Individual butterflies analyze attributes that interact with one another at local levels, disseminating information across swarms and increasing the system's capabilities [29][30][31]. The searches are carried out with the help of two operators, namely the migration operator and the butterfly adjusting operator.
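A minimal sketch of the migration operator, following the standard monarch butterfly optimization formulation (peri and p are the usual migration period and ratio); the fuzzy weighting of the authors' FMBOA variant is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(42)

def mbo_migration(pop1, pop2, peri=1.2, p=5 / 12):
    """Migration operator sketch: each dimension of a child solution is
    copied from a random member of subpopulation 1 when
    rand * peri <= p, otherwise from subpopulation 2."""
    n, d = pop1.shape
    child = np.empty_like(pop1)
    for i in range(n):
        for k in range(d):
            if rng.random() * peri <= p:
                child[i, k] = pop1[rng.integers(n), k]
            else:
                child[i, k] = pop2[rng.integers(len(pop2)), k]
    return child

a = rng.normal(size=(4, 3))  # subpopulation 1 (feature-weight vectors)
b = rng.normal(size=(4, 3))  # subpopulation 2
c = mbo_migration(a, b)
```

In the feature selection setting, each position vector encodes candidate feature weights, and fitness is the classification accuracy of the induced subset.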

LFCSAs.
CSAs (cuckoo search algorithms) are motivated by the unusual habit of some cuckoo species known as obligate interspecific brood parasitism [32]. These behavioral patterns are based on the fact that certain birds use suitable hosts to rear their progeny, which here maps to optimizing the selection of characteristics from datasets. CSAs avoid parental commitments in rearing their offspring while limiting the danger of egg loss (irrelevant traits) to other species. The final characteristics are chosen by placing eggs (features) in a variety of nests, and the method's purpose is to replace the eggs (irrelevant features) previously placed in a nest with new solutions associated with cuckoo eggs (candidate features). This iterative replacement increases the quality of the solution over iterations, finally leading to a very good feature solution. In particular, CSA is based on three idealized rules [33,34]: (1) each cuckoo lays one egg (feature) at a time and dumps it in a randomly chosen nest; (2) the best nests with high-quality eggs (the most accurate features) are carried over to the next generation; and (3) the number of available host nests is fixed, and a host bird discovers an alien egg with a probability, in which case that egg (feature) is abandoned.
New candidate solutions are initialized within the feature bounds as f_j^k = low_j^k + µ(up_j^k − low_j^k), where up_j^k is the k-th feature's upper bound, low_j^k is the k-th feature's lower bound, and µ is a uniform random variable in the range (0, 1). These bounds ensure that feature values stay within their feature spaces. For a feature (egg) i selected at random in an iteration, a new solution f_i^{t+1} is generated. The algorithm uses Lévy flights in place of plain random walks for efficient random searches; these flights, like random walks, are characterized by step sizes following a probability distribution, with isotropic random orientations. A Lévy flight is given by f_i^{t+1} = f_i^t + α ⊕ Lévy(λ), where the superscript t denotes the current generation, the symbol ⊕ denotes entry-wise multiplication, and α > 0 denotes the step size. This step size specifies how far a particle (feature) may move in a certain number of iterations of the random walk, and the Lévy distribution modulates the transition probability of the Lévy flights. From a computational standpoint, the production of Lévy-distributed random numbers has two basic phases: first, a random direction is drawn from a uniform distribution; then, based on the chosen Lévy distribution, a sequence of step lengths is constructed.
For symmetric Lévy stable distributions, Mantegna's approach is employed [34]. This method computes the factor ϕ = [Γ(1 + β)·sin(π·β/2) / (Γ((1 + β)/2)·β·2^{(β−1)/2})]^{1/β}, where Γ denotes the Gamma function; since β = 3/2 was utilized in a recent study [34], this work uses the same value. The factor ϕ is used in Mantegna's procedure to compute the step lengths as ς = u/|v|^{1/β}, where u and v are drawn from zero-mean normal distributions with deviations σ_u and σ_v, respectively; σ_v = 1 and σ_u = ϕ follows from the Lévy distribution above. The obtained ς yields the step size ζ, which changes the value of dimension x as f ← f + ζ·Ψ, where Ψ is a random vector drawn from the normal distribution in the range (0, 1). LFCSA approaches identify new solutions (feature selections) whose fitness (accuracy) is compared with existing solutions, and new solutions replace older ones on improvement. Nests with the worst values are discarded in further iterations and replaced with randomized new solutions, where the replacement rate is based on the probability prb_a, which is tuned for optimality. Thus, in each iteration, existing solutions (feature selections) are rated based on their fitness values (accuracies), and the best solutions (features) are stored as the feature vector f_best. Iterations continue until the defined stopping criteria are met. LFCSA's pseudocode is depicted as Algorithm 1.
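Mantegna's procedure for β = 3/2 can be sketched as follows (σ_u computed from the Gamma-function factor above, σ_v = 1, step lengths u/|v|^{1/β}); this is a generic sketch of the sampler, with illustrative names:

```python
import math
import numpy as np

def levy_step(beta=1.5, size=1, rng=None):
    """Mantegna's algorithm for Lévy-stable step lengths, as used by
    cuckoo search (beta = 3/2 in the text)."""
    rng = rng or np.random.default_rng(0)
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)     # the factor phi
    u = rng.normal(0.0, sigma_u, size)      # numerator samples
    v = rng.normal(0.0, 1.0, size)          # denominator samples
    return u / np.abs(v) ** (1 / beta)      # heavy-tailed step lengths

steps = levy_step(size=1000)
```

The heavy tail of the resulting step distribution is what lets LFCSA mix frequent local moves with occasional long jumps.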

AFAs.
The firefly algorithm (FA) is based on the idealized flashing behavior of fireflies [35]. For the core formulation of FA, the three idealized rules are as follows: (i) all fireflies are unisex, so they attract each other regardless of gender for the best feature selection from the dataset; (ii) the brightness (accuracy) of a firefly determines its attractiveness, which decreases as the distance between two fireflies grows; and (iii) the brightness of a firefly is controlled by the objective function (accuracy). The light intensity (In) varies exponentially and monotonically with distance, as explained by equation (7):
In = In_0·e^{−γr}, where In_0 is the initial light intensity and γ is the light absorption coefficient. As a firefly's attractiveness is proportional to the light intensity seen by neighboring fireflies (features), the attractiveness β of a firefly is defined as β = β_0·e^{−γr²}, where β_0 = 1 is the attractiveness at r = 0. The movement of a firefly (feature) i attracted to a more attractive firefly (feature) j is determined by x_i ← x_i + β_0·e^{−γr²}(x_j − x_i) + α·ε, where the third term is the randomization with step α, and ε is drawn from a Gaussian distribution.
FAs generically use (9) for iterative randomization, with uniform distributions over the interval [0, 1]. Their step determinations are static or linear and are defined for a fixed maximum number of generations: FAs begin with the same step, whose value keeps decreasing over iterations. As a result, the search may get stuck at a local optimum, causing premature convergence. Conversely, taking too large a stride may lead a firefly to miss the best option while it is still in that firefly's neighborhood during the early phases of the search, harming search performance. Thus, (9) embodies a trade-off in exploration for FAs: larger steps favor convergence toward the global optimum, while small step values considerably influence the exploration and convergence of the algorithm. The step values decline slowly over many iterations but faster when iterations are few. These issues are overcome in this study by the usage of self-adaptive steps, where each firefly's unique experience helps in selecting the best features from the data.
Step settings should remedy the difficulties listed above: a firefly's step should be large when it is far away from the ideal solution and small when it is close, and the fireflies in between are utilized to balance the global and local searches for the best feature selection from the dataset. As a result, a firefly's stride must be informed by both its previous data and its current circumstances.
This work introduces the firefly's history data, which contains the optimal values of the previous two iterations. Based on the observations above and many experiments, the step α of each firefly is calculated by (10) and (11), respectively, where h_i(t) is the history data of the i-th firefly over the past two iterations, f_pi is the fitness value of the i-th firefly's best solution, f_best is the fitness value of the best solution of the population found so far, and f_i is the fitness value of the i-th firefly, which reflects the current data. The firefly's next steps are self-adaptive and are decided by the gap between the current fitness value and the population's best fitness value. As a result, the firefly steps change with the iterations, and each firefly's step is likewise changed at the same time.
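The basic movement rule can be sketched as follows; note this is the generic FA update with an ordinary Gaussian step, not the self-adaptive step of equations (10) and (11), whose exact form the text does not reproduce:

```python
import numpy as np

rng = np.random.default_rng(1)

def firefly_move(xi, xj, beta0=1.0, gamma=1.0, alpha=0.2):
    """One movement step of firefly i toward a brighter firefly j:
    attractiveness beta0 * exp(-gamma * r^2) pulls i toward j, and a
    Gaussian random term scaled by alpha adds exploration."""
    r = np.linalg.norm(xi - xj)
    beta = beta0 * np.exp(-gamma * r ** 2)
    return xi + beta * (xj - xi) + alpha * rng.normal(size=xi.shape)

x_new = firefly_move(np.zeros(3), np.ones(3))
```

In the adaptive variant, `alpha` would be recomputed per firefly and per iteration from its fitness history instead of being a fixed constant.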
The agreement between the feature subsets is measured with the Pearson correlation, r = (N·Σxy − Σx·Σy) / √[(N·Σx² − (Σx)²)·(N·Σy² − (Σy)²)], where x and y are the attribute values under consideration and N is the total number of instances. The feature set selected by the correlation-based ensemble feature selector is given as the input to the classification.
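A direct transcription of this correlation into numpy, applied to two hypothetical attribute vectors:

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two attribute vectors, the measure
    the ensemble selector uses to match the candidate feature subsets."""
    n = len(x)
    num = n * np.sum(x * y) - np.sum(x) * np.sum(y)
    den = np.sqrt((n * np.sum(x ** 2) - np.sum(x) ** 2) *
                  (n * np.sum(y ** 2) - np.sum(y) ** 2))
    return num / den

# Perfectly linearly related attributes give r = 1
r = pearson(np.array([1.0, 2.0, 3.0, 4.0]),
            np.array([2.0, 4.0, 6.0, 8.0]))
```

Features whose selections agree strongly across FMBOA, LFCSA, and AFA would score high under such a measure and survive into the final subset.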

Classification of PDs Using FCBi-LSTMs.
This work used FCBi-LSTMs for the classification of PD. The suggested approach computes fuzzy weights with membership values that are adjusted to extract the features most relevant to PD. FCBi-LSTMs and CNNs analyze the selected characteristics from the PD datasets [36]. CNNs, made of convolution and pooling layers, convolve and pool their inputs, and the outputs are fed to subsequent convolution layers. CNNs offer significant advantages for feature extraction as they use partial filters for convolutions, modeled on the local perception of biological vision cells. The convolution layer is separated into many output matrices using filters to offer a better representation of the selected features from the PD dataset, with each output matrix having a size of (N − m + 1). The pooling layer of a CNN reduces the dimension of a matrix while keeping the fundamental links between the features; here, average pooling layers take their inputs from the convolution layers. In the Bi-LSTM data analysis technique, the output of the last convolution layer is used as an intermediate variable [37]; as a result, the LSTM does more than just add a nonlinear element to the input and loop cell transformation. Fuzzy weights are computed using Gaussian membership functions, and Bi-LSTMs outperform unidirectional LSTMs as they capture more structural information.
The final outputs of the Bi-LSTMs are processed by the CNN's convolution layers for diagnosing PD. To combine the features processed by the CNN and the features processed by the Bi-LSTM, MFB (multimodal factorized bilinear) pooling is utilized.
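The output size quoted for the convolution layer corresponds to a "valid" 1-D convolution of an input of length N with a filter of length m, as this toy numpy sketch illustrates (illustrative only; the actual model stacks many such filters plus pooling and Bi-LSTM layers):

```python
import numpy as np

def conv1d_valid(x, w):
    """'Valid' 1-D convolution: for input length N and filter length m,
    the output has length N - m + 1, the size quoted in the text."""
    N, m = len(x), len(w)
    return np.array([np.dot(x[i:i + m], w) for i in range(N - m + 1)])

out = conv1d_valid(np.arange(10.0), np.ones(3))  # N = 10, m = 3
```

Each learned filter produces one such output vector, and the average-pooling layer then shrinks these vectors before they reach the Bi-LSTM.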

Experimental Results
This section describes the experimental findings achieved by the proposed FCBi-LSTM classifier and compares them to approaches such as FCLSTM-CNN (fuzzy convolution long short-term memory-based convolution neural network), CNN, and SVM. In LOPO-CV, the test set in each fold contains only one individual's instances, and performance is evaluated against a model trained on the remaining individuals' instances; as each individual had three recordings, the class labels assigned to these recordings were used to establish the individual's class label.
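The LOPO splitting logic can be sketched as follows, grouping instances by a hypothetical subject identifier so that all of one person's recordings land in the test fold together:

```python
def lopo_splits(subject_ids):
    """Leave-One-Person-Out splits: each unique subject's recordings
    form the test set once; all other recordings form the training set."""
    subjects = sorted(set(subject_ids))
    for s in subjects:
        test = [i for i, sid in enumerate(subject_ids) if sid == s]
        train = [i for i, sid in enumerate(subject_ids) if sid != s]
        yield train, test

# Five recordings from three hypothetical subjects
splits = list(lopo_splits(["a", "a", "b", "b", "c"]))
```

Keeping a subject's recordings out of training avoids the optimistic bias noted for ordinary cross-validation in [13].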
The investigations were conducted in MATrix LABoratory R2016a (MATLAB R2016a). The implementation used the following system specifications: Intel (R) Core ™ i3-4160T CPU @ 3.10 GHz processor, 4.00 GB RAM, Windows 8.1 Pro 64-bit operating system, and a 1 TB hard disk.

Evaluation Metrics.
To test the predictive ability of the classifiers, evaluation metrics are required. Although accuracy is a widely used statistic, it can produce deceptive findings when data has an imbalanced class distribution. Even when there is class imbalance, evaluation measures like the F-measure and MCC may be used to assess how effectively a classifier discriminates between distinct classes. Let the confusion matrix in Table 1 represent the numbers of properly and erroneously categorized occurrences per class for binary classification. The letters tp, fp, fn, and tn in the confusion matrix denote true positives, false positives, false negatives, and true negatives, respectively. Precision, recall, F-measure, accuracy, and error were calculated using the standard formulae based on these counts.
MCCs, which take into consideration the tp, fp, fn, and tn counts, are frequently recognized as a balanced measure that may be employed even if the class distribution is uneven, and are another statistic for evaluating the validity of binary classifications. An MCC is simply a correlation coefficient, ranging from −1 to +1, between the actual and predicted occurrences: a score of +1 indicates a perfect prediction, whereas a value of −1 indicates complete discrepancy between the forecast and the actual labeling. Figure 2 compares the F-measure outcomes of four distinct feature-level combinations using the various classifiers. The proposed FCBi-LSTM with the first feature-level combination achieved a higher F-measure of 98.3100%, better than SVM, CNN, and FCLSTM-CNN, which achieved F-measures of 82.9150%, 85.8697%, and 94.2258%, respectively, at the first feature-level combination.
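All of these metrics follow directly from the Table 1 counts; the sketch below uses hypothetical counts for illustration:

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Precision, recall, F-measure, accuracy, and MCC computed from
    the confusion-matrix counts of Table 1."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    mcc = ((tp * tn - fp * fn) /
           math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return precision, recall, f_measure, accuracy, mcc

# Hypothetical counts, not results from the paper
p, r, f, acc, mcc = binary_metrics(tp=50, fp=5, fn=10, tn=35)
```

Unlike accuracy, the MCC's numerator goes negative as soon as the off-diagonal counts dominate, which is why it remains informative under class imbalance.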

Conclusion and Future Work
PD is the second most prevalent neurological ailment, causing considerable impairment, lowering quality of life, and having no cure. It is critical to diagnose PD early in order to use neuroprotective and early treatment techniques. In this research, feature selection is used to address a multiclass classification challenge for PD analysis, for which OBEFS and FCBi-LSTM are presented. The proposed OBEFS method is based on a number of algorithms, including FMBOA, LFCSA, and AFA; to execute OBEFS, a correlation function is utilized to choose the optimum features from the three feature subsets.
The FCBi-LSTM classifier is then used for PD diagnosis. It is an effective and accurate model for properly diagnosing the condition at an early stage, which might help doctors aid the cure and recovery of PD patients. The classification algorithms were tested with UCI's machine learning repositories, and their performance was measured using precision, recall, F-measure, accuracy, and MCC. The results were compared to other existing techniques, and the findings show that the suggested model's accuracy is higher than that of the other current approaches. Deep learning has a bright future in engineering and medicine. In terms of future work, the goal is to extend the existing research in novel ways: the proposed CNN's parallel convolution layers allow different data types to be fed into the network as inputs at the same time, giving the chance to utilize multimodal data in PD classification. The authors also plan to use different deep learning models in the classification process.
Data Availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest
The authors declare no conflicts of interest.