Assessment of Acoustic Features and Machine Learning for Parkinson's Detection

This article presents a machine learning approach for Parkinson's disease detection. Multiple potentially discriminative acoustic signal features of Parkinson's and control subjects are identified. A collaborative feature bank is created through correlated feature selection, Fisher score feature selection, and mutual information-based feature selection schemes. A detection model built on top of the feature bank uses the traditional Naïve Bayes classifier, which outperformed the other classifiers evaluated. The Naïve Bayes detector on collaborative acoustic features detects the presence of Parkinson's with a detection accuracy of 78.97% and a precision of 0.926 under hold-out validation. The collaborative feature bank with Naïve Bayes yielded results that compare favorably with many other recently proposed approaches. The simplicity of Naïve Bayes makes the system robust and effective throughout the detection process.


Introduction
Parkinson's disease (PD) is a prevalent disease among elderly individuals. The disease appears when the number of dopamine-producing neurons in the brain falls significantly [1,2]. PD symptoms start with voice impairments at an early stage, followed by tremor and memory loss, after which the subject becomes unable to walk, run, or even perform regular day-to-day duties. The situation worsens at a later age, when the subject suffers substantial memory loss and cannot move or bend to perform minor activities. Worst of all, the disease is neither curable nor reversible [3], so efforts focus on its early detection and on preventive measures that suppress its adverse effects. Medical science reveals that Parkinson's disease mainly involves a gradual reduction of dopamine in the brain, where dopamine acts as a transmitter of signals among neurons [4]. An insufficient amount of dopamine prevents the transmission of these signals and gives rise to various neurological disorders and symptoms, Parkinson's disease being one of them. Symptoms of PD can be motor or nonmotor. Nonmotor symptoms include sleep disorders, speech variation, difficulty in swallowing, and loss of smell, whereas motor symptoms include slowness of movement (bradykinesia), tremor, rigidity, and postural instability [5]. These symptoms vary from patient to patient and over time, and patients often notice them late because early symptoms are casually ignored. Not every PD patient shows all the symptoms, and they may not appear in the same order or in the same combination. However, subjects suffering from idiopathic rapid eye movement sleep behavior disorder (iRBD) are more prone to PD. Speech changes are the first motor symptom and can appear as much as ten years before the actual diagnosis [6]. 
Therefore, assessing speech signals offers better scope for detecting Parkinson's at an early stage. For instance, the time domain amplitudes of both controls and Parkinson's subjects are visualized in Figure 1. Each block of Figure 1 represents a subject, where the green plots represent controls and the red plots represent subjects suffering from Parkinson's. The subject-specific plots are generated from the pronunciation of the sustained vowel /a/ in Italian [7].
In Figure 1, the amplitude of Parkinson's subjects appears abnormal, which is where the disorder can be identified, whereas the amplitude of the non-Parkinson's subjects decreases uniformly. The disordered signal of Parkinson's subjects reflects the dysphonia and hypokinetic dysarthria that a subject suffers at various stages of PD [8]. Dysphonia refers to the inability to produce normal phonation due to impaired functioning of the phonatory system. Hypokinetic dysarthria typically combines reduced pitch variation (monotonous speech), reduced loudness, a breathy voice, and diminished speech formation [9], and approximately 90% of PD patients are affected by this combined sign [9]. In acoustic voice analysis, slight variations in a sound wave are difficult to identify by ear. Machine learning techniques can therefore be employed to discriminate Parkinson's signals from control signals [10,11].
As PD is a nonreversible disease, the only option left to clinical practitioners is to slow its progression, and starting the diagnosis early helps the subject face the disease with confidence. On the other hand, PD shows only a few symptoms at the early stage, such as voice disorder and mild tremors, and these symptoms also resemble those of an average person. This is why diagnostic technicians and clinical practitioners are now exploring machine learning and artificial intelligence approaches [12][13][14] to predict the presence and severity of the disease among their subjects. The main contributions of this article are as follows: (a) A collaborative feature bank of seven vocal features has been created from Baseline Features (BF), Vocal Fold Features (VFF), and Time Frequency Features (TFF) with the help of Correlated Feature Selection (CFS) [15], Fisher Score Feature Selection (FSFS) [16], and Mutual Information-Based Feature Selection (MIFS) [17]. (b) The traditional Naïve Bayes classifier has been trained and tested on the seven features of the collaborative feature bank, demonstrating the robustness and effectiveness of our system compared with other recent approaches to Parkinson's disease detection.
The rest of the article is organized as follows. Section 2 reviews the related literature, Section 3 outlines the materials and methods, Section 4 discusses the results, and Section 5 concludes the article.

Literature Review
Many recent machine learning techniques, including Naïve Bayes, have proved useful in distinguishing subjects suffering from PD from controls. For instance, Avuçlu and Elen [18] proposed Parkinson's detection through multiple classifiers.
Their experiment was conducted on various training and testing instances spanning 22 … The authors used the dataset proposed by Sakar et al. [24], which contains replicated speech information of 252 subjects, resulting in 756 instances. Machine learning methods cannot be applied directly to these instances, as each subject has three readings of the speech signal; the instances need to be consolidated before classification starts. Moreover, building a Parkinson's detection system on 754 features is hard to justify, and the performance of the DNN claimed by the authors may vary on consolidated instances. Further, their system may not be effective in practice because it relies on synthetic samples generated by SMOTE. Similarly, Polat and Nour [25] use an ensemble of multiple classifiers to detect Parkinson's, in which the One Against All (OAA) sampling technique plays a pivotal role; Logistic Regression (LR) on OAA samples proved to be a strong Parkinson's detector. Multiple supervised classifiers have also been applied to vocal features selected through the Adaptive Grey Wolf Optimization Algorithm (AGWOA) and a Sparse Auto Encoder (SAE) [26], where the Naïve Bayes classifier on AGWOA and SAE features reaches a detection accuracy of 72%. In the recent past, decision trees have gained popularity in biomedical data classification [27]. Classification and Regression Tree (CART) has been used to detect the presence of Parkinson's [28], where the CART detector achieves 75.19% accuracy using 8 optimal features of the vowel /a/.

Dataset.
The idea behind the proposed approach is feature collaboration for detecting Parkinson's disease.
Journal of Healthcare Engineering
Features describing the acoustic signals of both Parkinson's and control subjects have been considered. All the BF, VFF, and TFF are extracted from a recent Parkinson's detection database publicly available at the UCI machine learning repository [24], prepared at the Department of Neurology in Cerrahpaşa, Faculty of Medicine, Istanbul. The database contains 752 acoustic features of 252 subjects, including controls and Parkinson's patients. The data were recorded with the microphone set to 44.1 kHz, in conjunction with a physician's examination. The sustained phonation of the vowel /a/ was collected from each subject with three repetitions. Among the 752 features are 22 VFF, 11 TFF, and 21 BF, all extracted using the Praat acoustic analysis software [24]. The number of features available under VFF, TFF, and BF is presented in Table 1, and the gender-specific control and Parkinson's subjects are outlined in Table 2. Detailed characteristics of these feature segments and their features can be found in [24,27].
The Istanbul acoustic database [24] used here comprises 252 subjects, of whom 64 are controls and 188 suffer from Parkinson's. It contains vocal information of 122 female subjects (41 controls and 81 Parkinson's) and 130 male subjects (23 controls and 107 Parkinson's).

Feature Selection.
For effective collaboration, a Feature Bank (FB) is created from the best features of BF, VFF, and TFF. The best features are identified through three prominent feature selection techniques [29,30]: Correlated Feature Selection (CFS) [15], Fisher Score Feature Selection (FSFS) [16], and Mutual Information-based Feature Selection (MIFS) [17]. These schemes first rank the features by their contribution towards classification and then select the most suitable features from the ranked list, i.e., those with the highest contribution to the classification process. CFS, FSFS, and MIFS each use a distinct, well-established mechanism for ranking features. CFS calculates the correlation among attributes to measure their similarity. For two attributes $A = \{a_1, a_2, a_3, \dots, a_n\}$ and $B = \{b_1, b_2, b_3, \dots, b_n\}$, CFS calculates the correlation $r$ as follows:

$$r = \frac{\sum_{i=1}^{n}(a_i - \bar{a})(b_i - \bar{b})}{\sqrt{\sum_{i=1}^{n}(a_i - \bar{a})^2}\,\sqrt{\sum_{i=1}^{n}(b_i - \bar{b})^2}} \qquad (1)$$

where $\bar{a}$ is the mean of attribute $A$ and $\bar{b}$ is the mean of attribute $B$. The higher the value of $r$, the more strongly the two attributes are correlated; the lower the value, the more they deviate from each other. After the correlation score of each attribute is calculated, the attributes are arranged in ascending order of this score. This ordering moves highly uncorrelated attributes to the front and strongly correlated attributes to the rear, which supports the classifier in achieving better detection.

Similarly, FSFS calculates the Fisher score of each feature of the underlying Parkinson's dataset, where the feature weights depend on the sample size and the number of class labels. FSFS has been tested on both binary and multiclass datasets but is most widely used for binary datasets [31]; hence, it is a suitable feature ranker for the current (binary) detection task. For a set of features $F = \{f_1, f_2, f_3, \dots, f_p\}$ and a set of classes $K = \{k_1, k_2, k_3, \dots, k_c\}$, the Fisher score $S$ of feature $f_i$ can be estimated as follows:

$$S(f_i) = \frac{\sum_{j=1}^{c} n_j\,(\mu_{ij} - \mu_i)^2}{\sum_{j=1}^{c} n_j\,\rho_{ij}} \qquad (2)$$

where $n_j$ is the number of instances in the $j$th class, $\mu_i$ is the mean of the $i$th feature, and $\mu_{ij}$ and $\rho_{ij}$ are the mean and variance of the $i$th feature within the $j$th class, respectively. In this way, the Fisher score of every feature in the Parkinson's dataset is calculated, and the features are ranked by their scores. Note that the Fisher score evaluates each feature individually; no two features are considered together when computing a score [32]. Consequently, the Fisher score cannot detect redundancy between features. However, since the final features are selected iteratively through Naïve Bayes classification, this limitation does not affect the evaluation process.
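The two ranking scores above can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' Weka configuration: `pearson_r` follows Eq. (1), and `fisher_score` follows Eq. (2) with population variances for $\rho_{ij}$. The text does not say how CFS turns pairwise correlations into a single per-attribute score, so `cfs_rank` assumes that score is the mean absolute correlation with all other attributes, which places weakly correlated attributes first as described. All function names are hypothetical.

```python
import math
from statistics import mean


def pearson_r(a, b):
    """Pearson correlation of two equal-length numeric sequences (Eq. 1)."""
    a_bar, b_bar = mean(a), mean(b)
    num = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    den = math.sqrt(sum((x - a_bar) ** 2 for x in a)) * math.sqrt(
        sum((y - b_bar) ** 2 for y in b)
    )
    return num / den if den else 0.0  # constant attribute: treat as uncorrelated


def cfs_rank(columns):
    """Rank attribute indices by mean |r| with the other attributes, ascending.

    Assumption: the per-attribute correlation score is the mean absolute
    correlation with all remaining attributes.
    """
    scores = []
    for i, col in enumerate(columns):
        others = [abs(pearson_r(col, other)) for j, other in enumerate(columns) if j != i]
        scores.append(mean(others) if others else 0.0)
    return sorted(range(len(columns)), key=lambda i: scores[i])


def fisher_score(values, labels):
    """Fisher score of one feature (Eq. 2): between-class over within-class spread."""
    mu = mean(values)
    num = den = 0.0
    for k in set(labels):
        cls = [v for v, y in zip(values, labels) if y == k]
        n_j = len(cls)
        mu_ij = mean(cls)
        rho_ij = sum((v - mu_ij) ** 2 for v in cls) / n_j  # population variance
        num += n_j * (mu_ij - mu) ** 2
        den += n_j * rho_ij
    if den == 0:
        return float("inf") if num > 0 else 0.0
    return num / den


def fsfs_rank(columns, labels):
    """Rank attribute indices by Fisher score, highest first."""
    return sorted(range(len(columns)), key=lambda i: fisher_score(columns[i], labels), reverse=True)
```

Here `columns` holds one list of values per attribute; ranking the VFF, TFF, and BF segments separately would mean calling these functions once per segment.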
Following a similar guideline to CFS, the MIFS ranking algorithm estimates the relationship among features through mutual information and ranks the features by their mutual information scores. For two attributes $a$ and $b$ taking values in $\{1, \dots, p\}$ and $\{1, \dots, q\}$, respectively, let $\pi_{uv}$ denote the joint probability that $(a, b) = (u, v)$ for $(u, v) \in \{1, \dots, p\} \times \{1, \dots, q\}$, with marginals $\pi_{u\cdot} = \sum_{v} \pi_{uv}$ and $\pi_{\cdot v} = \sum_{u} \pi_{uv}$. The dependency between $a$ and $b$ can then be estimated [17] through their mutual information:

$$I(a; b) = \sum_{u=1}^{p} \sum_{v=1}^{q} \pi_{uv} \log \frac{\pi_{uv}}{\pi_{u\cdot}\,\pi_{\cdot v}} \qquad (3)$$

Like the correlation score, mutual information plays a crucial role in feature ranking. All three ranking algorithms (CFS, FSFS, and MIFS) can also be extended to select a subset of features. After every feature segment has been ranked, the ranked features are passed to Naïve Bayes incrementally, one feature at a time. This incremental classification allows the number of features taken from each segment to be set at the point where Naïve Bayes shows the highest detection accuracy.
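A plug-in estimate of Eq. (3) from value counts can be sketched as below. The continuous acoustic features would first have to be discretized; the equal-width binning in `discretize` and the choice to score each attribute against the class label in `mifs_rank` are assumptions for illustration, not details given in the text. Natural logarithms are used, so scores are in nats, and all names are hypothetical.

```python
import math
from collections import Counter


def discretize(values, bins=5):
    """Equal-width binning into integer bins 0..bins-1 (assumed scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant feature: everything lands in bin 0
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(a, b):
    """Plug-in estimate of I(a; b) in nats for two discrete sequences (Eq. 3)."""
    n = len(a)
    joint = Counter(zip(a, b))             # n * pi_uv for each observed (u, v)
    count_a, count_b = Counter(a), Counter(b)  # n * pi_u. and n * pi_.v
    mi = 0.0
    for (u, v), c in joint.items():
        p_uv = c / n
        mi += p_uv * math.log(p_uv / ((count_a[u] / n) * (count_b[v] / n)))
    return mi


def mifs_rank(columns, labels, bins=5):
    """Rank attribute indices by I(discretized attribute; class), highest first."""
    scores = [mutual_information(discretize(col, bins), labels) for col in columns]
    return sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
```

Only pairs that actually occur contribute to the sum, which is consistent with Eq. (3) because terms with $\pi_{uv} = 0$ vanish.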
In a nutshell, the three feature selection techniques CFS, FSFS, and MIFS work jointly to assign a goodness score to each attribute of the underlying Parkinson's dataset. The idea behind the incremental feature selection is to keep only attributes that are closely related to the class attribute but not closely related to each other. Rather than relying on the ranking scores alone, the final attributes are chosen through incremental classification, which identifies useful attributes empirically. The features of BF, VFF, and TFF selected through CFS, FSFS, and MIFS form the most relevant collaborative features for Parkinson's disease detection. The entire Parkinson's detection process is depicted in Figure 2.
The process of detecting subjects affected by Parkinson's follows three steps: Feature Selection, Feature Collaboration, and Parkinson's Detection. As noted earlier, in the feature selection stage, BF, TFF, and VFF are ranked separately using CFS, FSFS, and MIFS, yielding nine ranked feature blocks (three segments ranked by three schemes). These ranked blocks are passed to the feature collaboration stage, where Naïve Bayes plays a crucial role in identifying suitable features. Features from each ranked block are fetched incrementally and sent to Naïve Bayes for classification, and this process continues until all features of every ranked block have been fetched. Classifying incrementally growing feature sets identifies the minimum number of features required to achieve the maximum detection accuracy, and the number of ranked features at which this maximum is reached is recorded. For each feature segment, i.e., VFF, TFF, and BF, the best features are then identified by comparing the three ranking schemes (CFS, FSFS, and MIFS).
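The incremental selection loop can be sketched with a small Gaussian Naïve Bayes written from scratch. This is a hedged illustration rather than Weka's `NaiveBayes`: it assumes normally distributed numeric features, adds a small variance floor (an assumption) to avoid division by zero, and evaluates each prefix of a ranking on a separate test split. `incremental_selection` returns the smallest number of top-ranked features that reaches the maximum accuracy, which corresponds to the feature threshold described in the text; all names are hypothetical.

```python
import math
from statistics import mean, pvariance


def fit_gaussian_nb(X, y):
    """Estimate class priors and per-feature means/variances (Gaussian Naive Bayes)."""
    model = {}
    for k in set(y):
        rows = [x for x, label in zip(X, y) if label == k]
        cols = list(zip(*rows))
        prior = len(rows) / len(X)
        means = [mean(c) for c in cols]
        variances = [pvariance(c) + 1e-9 for c in cols]  # assumed floor, avoids zero variance
        model[k] = (prior, means, variances)
    return model


def log_posterior(params, x):
    """Unnormalized log posterior of one instance under one class."""
    prior, means, variances = params
    lp = math.log(prior)
    for v, m, s2 in zip(x, means, variances):
        lp -= 0.5 * math.log(2 * math.pi * s2) + (v - m) ** 2 / (2 * s2)
    return lp


def predict(model, x):
    """Return the class with the highest log posterior."""
    return max(model, key=lambda k: log_posterior(model[k], x))


def select_columns(X, cols):
    return [[row[c] for c in cols] for row in X]


def incremental_selection(ranking, X_train, y_train, X_test, y_test):
    """Classify with the top-1, top-2, ... ranked features and record accuracy.

    Returns (best_k, best_accuracy, curve). max() keeps the first maximum,
    so ties resolve to the smallest number of features.
    """
    curve = []
    for k in range(1, len(ranking) + 1):
        cols = ranking[:k]
        model = fit_gaussian_nb(select_columns(X_train, cols), y_train)
        X_k = select_columns(X_test, cols)
        correct = sum(predict(model, x) == label for x, label in zip(X_k, y_test))
        curve.append(correct / len(y_test))
    best_k = max(range(1, len(curve) + 1), key=lambda k: curve[k - 1])
    return best_k, curve[best_k - 1], curve
```

Running this loop once per ranker (CFS, FSFS, MIFS, and the original order) for each segment and comparing `best_k` and the best accuracy mirrors how the text chooses which ranked features enter the feature bank.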

Classification.
The selected features are combined and sent to Naïve Bayes for the detection of Parkinson's. The entire detection process therefore relies on a small number of collaborative features, which makes it a practical method of Parkinson's detection. The detection approach has been developed using the Weka machine learning toolkit [33,34], and the implementation settings of the proposed model are outlined in Table 3. The Naïve Bayes predictive model uses estimator classes for prediction [35], and the precision of the numeric estimators is chosen based on an analysis of the training data. The batch size indicates the desired number of instances to process when predicting testing samples in batches. The supervised discretization option converts numeric attributes to nominal ones; because all attributes are kept numeric, this option was disabled during training and testing.

Results and Discussion
The results of the proposed work have been analyzed in three stages. First, the efficiency of the feature ranking schemes (CFS, FSFS, and MIFS) is analyzed; the individual ranking of features by each selector helps identify the most useful VFF, TFF, and BF segments for effective collaboration. Second, the performance of Naïve Bayes is evaluated alongside several other traditional supervised classifiers for Parkinson's detection. Finally, the proposed collaborative feature-based Parkinson's detection system is compared against other recent Parkinson's detection mechanisms.

Collaborative Features Identification.
As the first stage of the collaborative Parkinson's detection scheme, a bank of collaborative features is prepared. The detection accuracy of Naïve Bayes on the vocal fold, time frequency, and baseline features ranked through CFS, FSFS, and MIFS is presented in Figures 3-5, respectively. The classification accuracy of Naïve Bayes on the original (unranked) features was also recorded to gauge the benefit of the feature ranking techniques.
Note that both the original and the ranked acoustic features are processed incrementally through Naïve Bayes to observe how performance changes with the number of features. Naïve Bayes performs satisfactorily on the CFS-, FSFS-, and MIFS-ranked features compared with the original features. Figure 3 shows that, on the vocal fold features, CFS reaches the highest detection accuracy with just ten features, whereas Naïve Bayes needs 12 original features to produce a similar accuracy. The first three FSFS-ranked features let Naïve Bayes attain the same accuracy as CFS, and six MIFS-ranked features give the same accuracy as well. Thus, CFS, FSFS, and MIFS bring Naïve Bayes to its peak performance with 10, 3, and 6 features, respectively, and the 3 FSFS features have been sent to the feature bank for collaboration.
Following a similar guideline, when the original TFF features and the CFS-, FSFS-, and MIFS-ranked TFF features are processed incrementally, the first 3 CFS features raise the performance of Naïve Bayes to 75.79%, the highest among the rankers. FSFS and MIFS also improve Naïve Bayes, but less than CFS, with detection accuracies of 73.4% and 73.81%, respectively. Although Naïve Bayes needs only 1 MIFS feature, the first 3 CFS features have been sent to the feature bank for collaboration because they yield the highest detection accuracy.
On the baseline features, the performance of Naïve Bayes was found to degrade with the CFS, FSFS, and MIFS rankings. Nevertheless, the rankers give results similar to the original arrangement with fewer features. Naïve Bayes yields its highest accuracy of 76.59% with 3 FSFS features. Instead of FSFS, however, we prefer the single top CFS-ranked baseline feature, with which Naïve Bayes matches the accuracy of the original feature order while using fewer features. Therefore, the first baseline feature ranked by CFS has been shortlisted and sent to the feature bank for collaboration. The performance of Naïve Bayes on the CFS, FSFS, MIFS, and original orderings of the VFF, TFF, and BF features is presented in Table 4, where the feature threshold column indicates the minimum number of features that produces the maximum detection accuracy under the given settings. In total, 3 FSFS-ranked vocal fold features, 3 CFS-ranked time frequency features, and 1 CFS-ranked baseline feature are identified for feature collaboration.

Performance Analysis of Collaborative Parkinson's Detection.
In the first stage of the collaborative Parkinson's detection scheme, a bank of 7 collaborative features comprising VFF, TFF, and BF was prepared. These 7 features were evaluated with the Naïve Bayes classifier under 10-fold cross validation, and the results obtained for both Parkinson's and control subjects are presented in Table 5. According to Table 5, the sensitivity for Parkinson's subjects and the specificity for control subjects are satisfactory. The specificity of 0.926 for control subjects indicates that the collaborative Parkinson's detection model correctly returns a negative result for 92.6% of the control subjects tested. Similarly, the sensitivity of 0.926 for Parkinson's subjects indicates that the model correctly returns a positive result for 92.6% of the diseased subjects. Of the subjects predicted as Parkinson's, 174 actually have the disease, corresponding to a precision of 0.817 (81.7% of the positive predictions are correct), which is encouraging in the context of medical diagnosis. The area under the Receiver Operating Characteristic (ROC) curve (AUC) exceeds 71%, and the area under the Precision-Recall Curve (PRC) is 0.905, which is again in an acceptable range. The ROC curves and PRCs of subjects predicted as control or Parkinson's are presented in Figure 6.
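The per-class measures discussed above can be computed from a binary confusion matrix as sketched below, treating Parkinson's as the positive class. The function name is hypothetical, and the counts in the usage line are made up for illustration; they are not the values behind Table 5.

```python
def detection_metrics(tp, fn, fp, tn):
    """Detection metrics from a 2x2 confusion matrix (Parkinson's = positive class)."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),  # share of Parkinson's subjects detected
        "specificity": tn / (tn + fp),  # share of controls correctly reported negative
        "precision": tp / (tp + fp),    # share of Parkinson's predictions that are correct
        "false_positive_rate": fp / (fp + tn),
    }


# Hypothetical counts, for illustration only.
print(detection_metrics(tp=8, fn=2, fp=1, tn=9))
```

Precision and specificity use different denominators (all positive predictions versus all actual controls), so one cannot be read off from the other.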
According to Figure 6(a), the ROC curves of both the control and Parkinson's subjects are satisfactory: the curves rise towards high true positive rates and enclose 76.2% of the plot area for both classes. The PRC, on the other hand, is convincing for Parkinson's subjects but not for control subjects (Figure 6(b)).
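The area under an ROC curve, such as the 76.2% reported here, equals the probability that a randomly chosen positive subject receives a higher score than a randomly chosen negative one. A minimal sketch of that rank-based (Mann-Whitney) computation follows; it assumes the classifier outputs a score such as the predicted probability of Parkinson's, and the function name is hypothetical.

```python
def roc_auc(scores, labels):
    """ROC AUC via pairwise comparison (Mann-Whitney statistic); ties count 0.5.

    scores: classifier outputs, higher meaning more likely positive.
    labels: 1 for the positive class, 0 for the negative class.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic, which is harmless for a few hundred subjects; a sort-based version would be preferable for large test sets.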

Performance Comparison with Other State-of-the-Art Models.
This section compares the proposed work with other classifiers for Parkinson's disease detection. The seven collaborative features are also passed to the C4.5 decision tree, k-Nearest Neighbor, Logistic Regression, Neural Network, and Random Forest classifiers.
The hold-out validation method has been employed to compare the proposed model with other state-of-the-art approaches. For hold-out validation, 30% of the subjects were randomly selected as training instances and the remaining 70% as testing instances. Naïve Bayes on the collaborative features excels with 78.97% detection accuracy and the lowest training time. k-Nearest Neighbor performs worst on the collaborative features, with a detection accuracy of 67.46%, although its training time is on par with that of Naïve Bayes. Logistic Regression comes closest to Naïve Bayes, with a training time of 0.03 s. The detailed performance outcomes of the proposed approach and the other classifiers are presented in Table 6.
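A random subject-level hold-out split with 30% of subjects for training and 70% for testing, as described above, can be sketched as follows. The fixed seed is an arbitrary assumption for reproducibility, and because the dataset holds three recordings per subject, the hypothetical helper `rows_for` keeps all recordings of a subject on the same side of the split; the text does not state how recordings were grouped.

```python
import random


def holdout_split(subject_ids, train_fraction=0.3, seed=42):
    """Randomly split unique subject IDs into training and testing sets."""
    ids = sorted(set(subject_ids))
    random.Random(seed).shuffle(ids)  # seeded, so the split is reproducible
    n_train = round(len(ids) * train_fraction)
    return set(ids[:n_train]), set(ids[n_train:])


def rows_for(subject_ids, chosen):
    """Indices of all rows (recordings) whose subject belongs to `chosen`."""
    return [i for i, s in enumerate(subject_ids) if s in chosen]
```

With 252 subjects, this puts 76 subjects in the training set and 176 in the testing set.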
Next, the errors generated by the proposed collaborative Parkinson's detection system were compared with those of the peer supervised classifiers. The comparison is inconclusive: the collaborative PDS achieves a better Mean Absolute Error (MAE), but its Root Mean Squared Error (RMSE), Relative Absolute Error (RAE), and Root Relative Squared Error (RRSE) are on par with those of the other classifiers. The values of MAE, RMSE, RAE, and RRSE are presented in Table 7.
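The four error measures can be sketched as follows, assuming the actual class is coded as 1 for Parkinson's and 0 for control and the prediction is the classifier's probability of Parkinson's. This is a simplification: Weka computes RAE and RRSE relative to a baseline built from the class priors of the training data, whereas this sketch uses the mean of the actual values as the baseline. The function name is hypothetical, and the actual values must not all be identical.

```python
import math


def error_measures(actual, predicted):
    """MAE, RMSE, RAE (%) and RRSE (%) of predictions against actual values.

    RAE and RRSE are relative to a baseline that always predicts the mean
    of the actual values (a simplification of Weka's prior-based baseline).
    """
    n = len(actual)
    a_bar = sum(actual) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    base_abs = sum(abs(a - a_bar) for a in actual)
    base_sq = sum((a - a_bar) ** 2 for a in actual)
    return {
        "MAE": abs_err / n,
        "RMSE": math.sqrt(sq_err / n),
        "RAE": 100 * abs_err / base_abs,
        "RRSE": 100 * math.sqrt(sq_err / base_sq),
    }
```

Because MAE and RMSE weight large errors differently, a classifier can lead on one and merely tie on the other, which is the pattern described above.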
Similarly, Naïve Bayes on the collaborative features is compared with the other classifiers through ROC and PRC areas. The results for the various classifiers are outlined in Table 8.
In Table 8, Naïve Bayes achieves ROC and PRC areas of 76% and 81%, respectively. These results are considerably better than those of k-Nearest Neighbor and the C4.5 decision tree; Logistic Regression is the only classifier that competes closely with Naïve Bayes. The ROC curves and PRCs of all classifiers, including Naïve Bayes, are shown in Figure 7 for control and Parkinson's subjects.
The ROC curves of all the classifiers, including Naïve Bayes, lean towards high true positive rates. However, C4.5 and k-Nearest Neighbor perform poorly for controls, although they show marginal results for Parkinson's subjects. In addition, as the false positive rate increases, k-Nearest Neighbor shows low true positive rates and thus a low AUC. In terms of PRC, Naïve Bayes outperforms the others with superior precision. Therefore, the proposed collaborative features with Naïve Bayes form a practical approach to Parkinson's detection. At the final stage of analysis, the proposed collaborative feature-based Parkinson's detection system has been compared with current state-of-the-art function-based methods, viz., Avuçlu and Elen [18], Bourouhou et al. [19], Zhang et al. [20], Meghraoui et al. [21], Kadiri et al. [22], Polat and Nour [25], Xiong and Lu [26], and Mekyska et al. [28]. Since the proposed approach is function-based, most of the methods chosen for comparison are also function-based, such as Naïve Bayes and Support Vector Machine (SVM). The comparison uses two sets of performance metrics. First, the standard detection accuracy is compared (Table 9). Then, the Naïve Bayes-based Parkinson's detection mechanisms are compared and analyzed using several additional performance metrics, presented in Table 10.
Table 9 lists the detection results of five recent Parkinson's disease detection (PDD) schemes along with the proposed collaborative PDD scheme. All of these methods use function-based approaches. The proposed collaborative approach achieves the highest detection accuracy with relatively few vocal features.
Although the SVM approach of Kadiri et al. [22] achieves 73.32% detection accuracy, which is close to ours, the number of vocal features it uses is not clearly reported.
A detailed comparison through additional performance measures highlights the capability of the proposed approach relative to other Naïve Bayes approaches. For this comparison, the methods of Avuçlu and Elen [18] and Bourouhou et al. [19] are considered. According to Table 10, the Avuçlu and Elen [18] method has the highest sensitivity of 0.949, meaning that it detects 94.9% of all Parkinson's subjects. The proposed PD detection model, on the other hand, is more precise, with a precision of 0.926, and it has the lowest false positive rate, i.e., it least often misclassifies control subjects as Parkinson's.

Discussion, Limitations, and Future Works
Like any other detection model, the proposed method has a few limitations. It is based on a voice signal dataset provided by the Department of Neurology in Cerrahpaşa, Faculty of Medicine, Istanbul, and the pronunciation accent of the sustained vowel /a/ differs across geographical regions. As a result, the model may generate significant false positives or false negatives on voice signals of subjects from other regions, so further evaluation on other voice signal datasets is essential. As future work, the proposed model can be extended into a graphical user interface tool that can be trained on different Parkinson's voice datasets. The gender and age of subjects are other aspects that need detailed investigation, which the proposed approach lacks. Gender and age play a significant role in vocal performance for both control and Parkinson's subjects [36,37], and a dataset that is unbalanced in age and gender across the disease classes poses considerable issues for the detection process [36][37][38][39]. Therefore, the number of participants should be balanced by gender and age in both the Parkinson's and control classes. The assessment of gender and age is missing from this work and remains a limitation. Disease severity is another factor, as it allows a detector to determine the stage of PD; in the future, the proposed work can be extended to predict disease severity.
A good Parkinson's detection dataset of acoustic features needs to address various factors, such as the balance of gender with respect to age, microphone quality, noise, the robustness of the analysis procedure, the number of subjects, disease severity, and the influence of medication. Recently, Rusz et al. [40] presented guidelines for speech recording that can be followed to prepare acoustic datasets for Parkinson's detection. The dataset considered here addresses almost all of the parameters stated above. However, it does not report disease severity, which is a critical issue for any Parkinson's detection system that relies on this dataset. Therefore, the proposed work needs to be validated for disease severity prediction before it can be practical for clinical use. Similarly, incorporating event-driven methods may improve the computational effectiveness, compression, and power consumption of the suggested solution [41][42][43][44]. Future work may investigate these aspects.

Conclusion
In this article, a collaborative PDD model has been proposed. The model relies on the vocal fold, time frequency, and baseline features of both control and Parkinson's subjects. These vocal features are first ranked through correlation-, Fisher score-, and mutual information-based feature selection schemes. The ranked features have been passed sequentially to several classifiers, among which Naïve Bayes emerged as the best classifier for the proposed model. The feature cut-off points are identified from the highest detection accuracy reported by Naïve Bayes, and the relevant features are selected at these points. A total of 7 ranked features has been selected from the vocal fold, time frequency, and baseline feature segments. The detection model based on these 7 features shows a promising detection accuracy of 78.97% and a precision of 0.926 under hold-out validation. The proposed model has also been compared with other function-based detection models, against which it proved more accurate and precise. Finally, the shortcomings and future directions of the proposed Parkinson's detection model have been discussed in detail.