Parkinson’s Disease Diagnosis in Cepstral Domain Using MFCC and Dimensionality Reduction with SVM Classifier

Department of Computer Science, University of Science and Technology Bannu, Bannu, Pakistan
Raptor Interactive (Pty) Ltd., Eco Boulevard, Witch Hazel Ave, Centurion 0157, South Africa
Department of Software Engineering, Foundation University Islamabad, Islamabad 44000, Pakistan
Department of Electronics, University of Buner, Buner, Pakistan
Department of Software, Ajou University, Suwon, Republic of Korea


Introduction
After Alzheimer's disease (AD), Parkinson's disease (PD) is the world's second most prevalent neurodegenerative disorder [1][2][3]. PD has a reported prevalence of 0.3% in the entire population of industrialized countries, rising to 1% in the elderly population (aged 60 or above) [1]. Voice impairments have been reported to be early biomarkers of the disease. Additionally, the proposed intelligent system has the potential to be used as an instrument for prodromal diagnosis. Notably, patients with REM sleep behaviour disorder (RBD) represent a good model, as they develop PD with a high probability. It has been shown that slight speech and voice impairment may be a sensitive marker of preclinical PD [4][5][6][7].
People with PD face numerous symptoms, including movement impairments (gait and tremors), poor balance, bradykinesia (slowness of movement), and rigidity [8][9][10][11][12]. The lack of reliable diagnostic tests makes the diagnosis of PD a challenging task [13][14][15]. Recent research has reported that PD patients manifest impairments in voice and speech; however, these voice defects cannot be detected in clinics by medical practitioners. Hence, automated signal processing tools are required to capture these impairments and to detect PD in its early stages. Recent research also shows that machine learning and signal processing algorithms are successful in automated disease detection through automated extraction and classification of risk factors [16][17][18][19]. Motivated by these studies, in this paper we develop an automated model for PD detection that uses signal processing algorithms for feature extraction from voice signals and machine learning algorithms for classification. To this end, we collected a voice dataset, namely, Pak-Voice-PD, that contains multiple types of vowel phonations from two types of subjects, i.e., healthy subjects and PD patients. Numerical features are extracted using mel-frequency cepstral coefficients (MFCCs). In order to obtain better PD detection performance, we project the MFCC features onto a lower-dimensional space using linear discriminant analysis (LDA). Finally, numerous machine learning models are developed with the goal of obtaining an optimal learning model. Through performance analysis, we found that support vector machines (SVMs) with linear and radial basis function (RBF) kernels provide the best performance. Hence, in this study, we propose automated PD detection based on a hybrid MFCC-LDA-SVM approach.
The working of the proposed MFCC-LDA-SVM model is depicted in Figure 1.
The main contributions of this study are as follows:
(1) Collection of a relatively larger dataset: the collected database has a relatively larger number of voice phonations (samples) of multiple types.
(2) Construction of unbiased machine learning models for the automated detection of PD.
(3) Development of the MFCC-LDA-SVM model for the PD detection problem. To the best of our knowledge, no previous study has explored an MFCC-LDA-SVM model for PD detection based on voice data.
(4) The proposed method, namely, MFCC-LDA-SVM, performs better than ten other machine learning models and many recently published studies.
The remainder of the manuscript is organized as follows. Section 2 presents related work, and Section 3 describes the materials and methods. The evaluation and validation methods are briefly discussed in Section 4. Section 5 presents and discusses the results of the proposed model. Section 6 concludes the paper.

Related Work
During the last decade, various machine learning systems have been proposed for the automated diagnosis of Parkinson's disease (PD) [20]. Resul [13] conducted a comparative study of different classification methods for effective diagnosis of PD. Decision Tree, Regression, DMneural, and Neural Networks were evaluated for PD detection on the basis of performance scores. The neural network obtained the highest classification score, 92.9%, among the classifiers. Tsanas et al. [21] presented speech signal processing algorithms for the prediction of PD symptom severity using random forests and support vector machines. The proposed algorithms were reported to achieve a classification accuracy of 99% using 10 dysphonia features. Kaya et al. [22] developed an entropy-based discretization method in which support vector machines, C4.5, k-nearest neighbors, and naive Bayes were used as classifiers for the detection of PD. The proposed method was developed without using any preprocessing method. The discretization method improved the classification performance for PD diagnosis by 4.1% to 12.8%.
Manda and Sairam [23] proposed a method for the early diagnosis of PD based on the detection of dysphonia. A novel inference system measures the severity of the disease through a feature selection method based on support vector machines and the ranker search method. Hariharan et al. [24] presented a hybrid intelligent system that consists of preprocessing through model-based clustering, feature selection using sequential forward selection, and linear discriminant analysis. For classification, least-square support vector machine (LS-SVM), probabilistic neural network (PNN), and general regression neural network (GRNN) were deployed. The proposed method achieved a maximum classification accuracy of 100% on the Parkinson's dataset. Bhalchandra et al. [25] designed a system for early detection of PD using image processing to compute shape-based features. The Parkinson's progression markers initiative (PPMI) dataset was used along with the striatal binding ratio (SBR) to differentiate between the two types of subjects using discriminant analysis (DA) and support vector machine (SVM). The newly developed system achieved a classification accuracy of 99.42%.
Saloni and Gupta [26] developed an algorithm for the detection of PD using clinical voice data. Voice features were classified using support vector machines. The proposed algorithm achieved an accuracy of 100% for a subset of features derived from the algorithm. Huang et al. [27] presented a framework for the prediction of Alzheimer's disease (AD) using a nonlinear supervised sparse regression-based random forest (RF). Probabilistic paths are assigned to a test sample in the RF using the proposed soft-split technique for more accurate prediction. The proposed soft-split sparse regression-based RF helped to estimate the missing scores and demonstrated superior performance compared with traditional RF and regression models. Al-Fatlawi et al. [28] adopted a deep belief network (DBN) for automated diagnosis of PD, using voice data of PD patients for the experiments. The DBN classifier was composed of two stacked restricted Boltzmann machines (RBMs). The first stage is unsupervised learning that uses the RBMs to eliminate the problems caused by random initial weight values. The second stage is supervised learning based on the backpropagation algorithm for fine-tuning. The accuracy reported for the proposed method was 94%.
Benba et al. [29] studied the discrimination between two groups of people (patients with PD and healthy subjects) based on multiple types of voice samples. Human factor cepstral coefficients (HFCC) were used in the study. The voiceprint of each voice recording was calculated as the average value of the extracted HFCC. SVMs with various kernels (RBF, polynomial, linear, and MLP) were deployed for classification. The best accuracy, 87.5%, was achieved with the linear kernel. Vaiciukynas et al. [30] used phonations of multiple types of vowels together with speech tasks in which short sentences were pronounced in the Lithuanian language. The random forest (RF) algorithm was utilized for the individual feature sets and for decision-level fusion. It was found that decision-level fusion provides better performance. Naranjo et al. [31] proposed a method for tracking PD through a Bayesian linear regression approach. The proposed method was suitable for handling replicated measurements. Li et al. [32] designed a hybrid feature learning algorithm for the classification of PD. Hybrid features were developed by combining features and segments. Different methods were deployed for the selection of efficient hybrid features, and classification was performed on the basis of the selected hybrid features.
Zhang et al. [33] proposed a telediagnosis method for PD detection based on smartphones and machine learning. Time-frequency features, stacked autoencoders (SAE), and k-nearest neighbor were used for the automated classification of PD. The classification accuracy reported for the proposed method ranged from 94.00% to 98.00%. In another study, Upadhya et al. [34] adopted Single Taper Smooth (STS) window and Thomson Multitaper (TMT) windowing techniques for MFCC and PLP voice feature extraction. A neural network classifier was deployed for the classification of subjects at the early stage of PD. Wu et al. [35] designed a feature learning technique for automatically learning from the extracted voice features. A spherical k-means model was deployed to train on the two-class sample space (PD patients and healthy subjects). The proposed method obtained a mean pooling accuracy of 95.35%. Ali et al. [20] studied the detection of hand tremor abnormality associated with the risk of developing PD using Chi2-based feature selection and AdaBoost-based classification. Khan et al. [36] proposed a method for the prediction of cancer and PD. The proposed method utilized wavelet-based neural networks for the prediction of cancer. The proposed evolutionary wavelet neural network was deployed on various biomedical benchmark datasets for breast cancer and Parkinson's disease, and a 10-fold cross-validation scheme was used for performance evaluation. The accuracy achieved by the proposed method was 90%.
Braga et al. [37] presented a methodology for early detection of PD using free-speech recordings in uncontrolled background conditions. Machine learning (ML) algorithms along with signal and speech processing techniques were used for the early detection of the disease. For classification, support vector machine (SVM) and random forest (RF) were deployed. The reported accuracy was 92.38% for SVM (RBF) and 99.94% for RF. Recently, Ali [3] developed a hybrid intelligent system that carries out acoustic analysis of voice signals for automatically detecting PD. Linear discriminant analysis (LDA) was adopted for dimensionality reduction and a genetic algorithm (GA) for fine-tuning the parameters of the neural network. The leave-one-subject-out (LOSO) validation scheme was used to avoid subject overlap. The proposed intelligent system achieved a classification accuracy of 80%. Mostafa et al. [38] presented a Multiple Feature Evaluation Approach (MFEA) combined with machine learning classifiers (neural networks, decision tree, SVM, and random forest) based on the analysis of voice disorders. The performance of the proposed method was evaluated through 10-fold cross-validation.
The accuracy reported by the proposed system for SVM was 95.43%. Eskidere et al. [39] proposed a novel random subspace classifier ensemble and obtained 74.17% accuracy under 10-fold CV. Vadovský and Parali [40] utilized decision tree-based methods, namely, C4.5, C5.0, Random Forest, and CART, and obtained a PD detection accuracy of 66.5% under 4-fold cross-validation. Kraipeerapun and Amornsamankul [41] proposed stacking of complementary neural networks (CMTNN) and obtained a classification accuracy of 75% under 10-fold cross-validation. The main problem in these studies was the inappropriate validation scheme, which causes artificial subject overlap and bias in the developed models [2,42]. Hence, the obtained results are biased due to the subject overlap between the training and testing datasets. In order to develop unbiased machine learning models, Sarkar et al. proposed a more practical validation scheme, namely, leave-one-subject-out (LOSO) cross-validation [42]. Under the LOSO approach, they trained and tested KNN and SVM classifiers on multiple types of speech data collected from two classes, i.e., healthy subjects and PD patients, and achieved a PD detection accuracy of 55%, which is an unbiased and more practical result. The same LOSO approach was adopted by Canturk and Karabiber [43]. In order to improve PD detection while developing unbiased machine learning methods, they explored the integration of four different feature selection methods with six different machine learning models. They obtained a best performance of 57.5% using the LOSO approach. Recently, Ali et al. [44] proposed a multimodal approach under the LOSO scheme and obtained an unbiased classification accuracy of 70% using time-frequency features.

Data Acquisition.
In this study, we collected a voice- and handwriting-based database from two types of populations, i.e., PD patients and healthy subjects. The phone was kept at a distance of 10 cm from each subject during recording of the voice phonations. Each of the 160 subjects was asked to pronounce the sustained phonations "a," "o," and "u." Consequently, the database contains 160 × 3 = 480 voice samples. Of these 480 samples, 300 belong to healthy subjects and the remaining 180 belong to the patient group.
Statistical information about the collected data is reported in Table 1. Moreover, apart from using our own collected data, we also performed experiments on a benchmark dataset, namely, the "Multiple Types of Speech Dataset" [2].

Proposed Method.
In this paper, we propose a three-stage automated approach for PD detection. The first stage uses MFCC for feature extraction. The second stage performs dimensionality reduction through LDA, while the third stage performs classification. In order to obtain better results, we explore the feasibility of various machine learning models at the third stage of the system; to this end, we developed ten different machine learning models. Based on the performance analysis, we found that the proposed MFCC-LDA-SVM approach provides the best PD detection performance. The proposed approach is depicted in Figure 1. The working of each stage of the proposed learning system is briefly discussed as follows.

Feature Extraction through MFCC.
For extracting numerical features from the voice samples, we utilized the MFCC method. The MFCC algorithm relates the perceived frequency, or pitch, of a pure tone to its acoustic frequency. Subjective pitch is measured on the mel scale, in units called mels. The mel value for a given frequency $f_{\mathrm{Hz}}$ in Hz can be calculated using the following approximate formula [45]:

$$f_{\mathrm{mel}} = 2595 \times \log_{10}\left(1 + \frac{f_{\mathrm{Hz}}}{700}\right). \tag{1}$$

Framing: according to [46], voice signals cannot be treated as stationary when examined over a long period of time. Hence, it is necessary to carry out a short-time analysis (generally, over 10 ms to 30 ms). The rate of movement of the voice articulators is limited by physiological constraints, so the signal can be considered stable within an interval of 10 to 30 ms. Therefore, the analysis of the voice signal is carried out within uniform frames of this duration. In frame blocking, the voice signal is divided into frames of N samples, with neighboring frames separated by M samples (M < N).
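The mel mapping and frame blocking described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: the function names, the 16 kHz sampling rate, and the 25 ms frame length with a 10 ms shift are assumptions chosen only for the example.

```python
import math


def hz_to_mel(f_hz):
    """Approximate mel value of a frequency in Hz: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def frame_signal(signal, frame_len, frame_shift):
    """Split a sequence into frames of frame_len samples (N) whose start
    points are frame_shift samples apart (M, with M < N).
    Trailing samples that do not fill a whole frame are dropped."""
    if not 0 < frame_shift < frame_len:
        raise ValueError("expected 0 < frame_shift < frame_len")
    last_start = len(signal) - frame_len
    return [list(signal[start:start + frame_len])
            for start in range(0, last_start + 1, frame_shift)]


# Illustrative settings (assumptions, not taken from the paper).
sample_rate = 16000                      # samples per second
frame_len = sample_rate * 25 // 1000     # 25 ms -> 400 samples
frame_shift = sample_rate * 10 // 1000   # 10 ms -> 160 samples

one_second = [0.0] * sample_rate         # placeholder signal of silence
frames = frame_signal(one_second, frame_len, frame_shift)
print(len(frames), "frames of", len(frames[0]), "samples")
print("1000 Hz is about", round(hz_to_mel(1000.0), 1), "mel")
```

A 25 ms frame falls inside the 10 ms to 30 ms interval over which the signal is treated as stationary; other choices within that interval would work the same way.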
Pre-emphasis: in this step, we emphasize the higher frequencies by applying a first-order difference equation to the voice samples. This increases the energy of the high-frequency components of the voice signal. The difference equation applied to the voice signal $\{s_n, n = 1, \ldots, N\}$ is given in equation (2) [47]:

$$s'_n = s_n - k\, s_{n-1}, \tag{2}$$

where $k$ is the pre-emphasis coefficient, which should lie in the range $0 \le k < 1$. Following the approach of [29], in this work we used a pre-emphasis coefficient of $k = 0.97$.

Windowing: in order to minimize the discontinuities at the edges of each frame, so that the end of a frame connects smoothly with its beginning, windowing must be applied. Several window functions exist (flat top window, Hamming window, and rectangular window); the Hamming window is used in our study. It tapers the signal toward zero at the beginning and end of each frame and is represented as follows:

$$s'_n = \left\{0.54 - 0.46 \cos\left(\frac{2\pi (n - 1)}{N - 1}\right)\right\} s_n, \tag{3}$$

where $s_n$ denotes the voice samples and $n = 1, \ldots, N$.

Fast Fourier transform: the main purpose of the FFT is to obtain frequency-domain information from a signal that is given in the time domain. For this purpose, each frame of N samples is converted into the frequency domain. The FFT is a fast algorithm for computing the discrete Fourier transform (DFT) of a given set of N samples [46,47]:

$$X_k = \sum_{n=0}^{N-1} s_n\, e^{-j 2\pi k n / N}, \tag{4}$$

where $n = 0, 1, 2, \ldots, N - 1$ indexes the samples of the frame and $k = 0, 1, \ldots, N - 1$ indexes the frequency bins.
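The three per-frame operations just described (pre-emphasis, Hamming windowing, and the Fourier transform) can be sketched as follows. This is an illustrative sketch under stated assumptions: it uses the standard first-order pre-emphasis filter, the standard symmetric Hamming window, and a direct O(N²) DFT instead of an FFT for readability. The function names and the 64-sample test tone are hypothetical, and passing the first sample through unchanged is one common boundary convention, not something stated in the paper.

```python
import cmath
import math


def pre_emphasis(samples, k=0.97):
    """First-order difference s'_n = s_n - k * s_(n-1).
    The first sample has no predecessor and is kept as is (assumption)."""
    if not samples:
        return []
    return [samples[0]] + [samples[n] - k * samples[n - 1]
                           for n in range(1, len(samples))]


def hamming(frame):
    """Multiply a frame by 0.54 - 0.46 * cos(2*pi*(n-1)/(N-1)), n = 1..N."""
    n_samples = len(frame)
    if n_samples < 2:
        raise ValueError("a frame needs at least two samples")
    return [(0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n_samples - 1))) * s
            for i, s in enumerate(frame)]  # i plays the role of n - 1


def dft_magnitudes(frame):
    """Magnitude of X_k = sum_n s_n * exp(-j*2*pi*k*n/N) for k = 0..N-1."""
    n_samples = len(frame)
    return [abs(sum(s * cmath.exp(-2j * math.pi * k * n / n_samples)
                    for n, s in enumerate(frame)))
            for k in range(n_samples)]


# Hypothetical test frame: a sine that completes 4 cycles in 64 samples.
tone = [math.sin(2.0 * math.pi * 4 * n / 64) for n in range(64)]
spectrum = dft_magnitudes(hamming(pre_emphasis(tone)))
peak_bin = max(range(33), key=lambda k: spectrum[k])  # search bins 0..N/2
print("strongest bin:", peak_bin)
```

In practice an FFT routine from a numerical library would replace `dft_magnitudes`; the direct sum is used here only so that the correspondence with the DFT definition is visible.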
Mel scale/filter bank analysis: here, the energy present in each frequency band is estimated. Thus, the spectra calculated above are mapped onto the mel scale using triangular overlapping windows, i.e., a triangular filter bank (FB). The FB consists of a number of band-pass filters whose spacing and bandwidth are determined by a constant mel-frequency interval. The mel frequency scale uses linear spacing for frequency values below 1000 Hz and logarithmic spacing for values above 1000 Hz. To convert a given frequency (f) to a mel frequency (m_f), we used the approximate equation (1) [29].
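One way to build such a filter bank is sketched below. It is an assumption-laden illustration rather than the filter bank used in the study: the number of filters (20), the 0 Hz to 8000 Hz range, and the choice to make the triangles linear in Hz between mel-spaced edge frequencies are all illustrative, and the helper names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(f_mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)


def filter_edges_hz(n_filters, f_min_hz, f_max_hz):
    """Return n_filters + 2 edge frequencies (Hz), equally spaced in mel.
    Filter j rises from edges[j], peaks at edges[j + 1], falls to edges[j + 2]."""
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]


def triangular_weight(f_hz, left, centre, right):
    """Weight of one triangular band-pass filter at frequency f_hz."""
    if left < f_hz <= centre:
        return (f_hz - left) / (centre - left)
    if centre < f_hz < right:
        return (right - f_hz) / (right - centre)
    return 0.0


def filter_bank_energies(power_spectrum, sample_rate, n_filters):
    """power_spectrum holds bins 0..N/2 of one frame (0 Hz to sample_rate/2)."""
    n_bins = len(power_spectrum)
    bin_hz = [b * (sample_rate / 2.0) / (n_bins - 1) for b in range(n_bins)]
    edges = filter_edges_hz(n_filters, 0.0, sample_rate / 2.0)
    return [sum(p * triangular_weight(f, edges[j], edges[j + 1], edges[j + 2])
                for p, f in zip(power_spectrum, bin_hz))
            for j in range(n_filters)]


edges = filter_edges_hz(20, 0.0, 8000.0)
widths = [b - a for a, b in zip(edges, edges[1:])]
print("first band width (Hz):", round(widths[0], 1))
print("last band width (Hz):", round(widths[-1], 1))

flat = [1.0] * 257  # hypothetical flat power spectrum of a 512-point frame
energies = filter_bank_energies(flat, 16000, 20)
```

Because the edges are equally spaced in mel, the bands are narrow at low frequencies and progressively wider at high frequencies, which is the behaviour the paragraph describes.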
Logarithm/DCT: to convert the log mel spectrum back into the cepstral domain, the discrete cosine transform (DCT) is applied to obtain coefficients from the spectrum. Thus, we calculate the MFCCs from the log filter bank amplitudes [15]:

$$c_i = \sqrt{\frac{2}{N_f}} \sum_{j=1}^{N_f} m_j \cos\left(\frac{\pi i}{N_f}\,(j - 0.5)\right), \tag{5}$$

where $m_j$ is the log amplitude of the $j$th filter bank channel and $N_f$ is the number of filter bank channels.

Liftering: the key advantage of the cepstral coefficients is that they are largely uncorrelated. Their main problem, however, is that the higher-order cepstral coefficients are fairly small. Hence, the coefficients must be rescaled so that they have similar magnitudes [29,45]. This is done by applying liftering to the cepstral coefficients using the following equation:

$$c'_i = \left(1 + \frac{L}{2} \sin\frac{\pi i}{L}\right) c_i, \tag{6}$$

where L is the cepstral sine lifter parameter.
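The last two steps can be sketched as below, assuming the commonly used DCT of the log filter bank amplitudes, c_i = sqrt(2/N_f) Σ_j m_j cos(πi(j − 0.5)/N_f), followed by a sinusoidal lifter with parameter L. The function names are hypothetical, and the value L = 22 is only an illustrative default, not a setting reported in the paper.

```python
import math


def dct_cepstra(log_energies, n_ceps):
    """Cepstral coefficients c_1..c_n_ceps from log filter bank amplitudes."""
    n_f = len(log_energies)
    scale = math.sqrt(2.0 / n_f)
    return [scale * sum(m * math.cos(math.pi * i / n_f * (j - 0.5))
                        for j, m in enumerate(log_energies, start=1))
            for i in range(1, n_ceps + 1)]


def lifter(cepstra, lifter_param=22):
    """Rescale c_i by 1 + (L / 2) * sin(pi * i / L), for i = 1..len(cepstra)."""
    return [(1.0 + lifter_param / 2.0 * math.sin(math.pi * i / lifter_param)) * c
            for i, c in enumerate(cepstra, start=1)]


# Hypothetical log amplitudes from a 20-channel filter bank.
log_energies = [math.log(1.0 + j) for j in range(1, 21)]
cepstra = dct_cepstra(log_energies, n_ceps=12)
liftered = lifter(cepstra)
print([round(c, 3) for c in liftered[:4]])
```

The lifter weight grows with the coefficient index over the first L/2 coefficients, which lifts the small higher-order coefficients toward the magnitude of the lower-order ones.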

Linear Discriminant Analysis (LDA).
LDA is a supervised ML technique that is mostly used for classification and dimensionality reduction. LDA works by linearly transforming the data (features) into a lower-dimensional space in which the discrimination between classes is maximal [48]. Specifically, LDA searches for vectors, formed as linear combinations of the features, that best separate two or more classes. The original data values are then projected onto these vectors to evaluate the class separation. When the classes overlap in the original feature space, this transformation provides a better separation of the classes. To achieve better separation between the classes, LDA uses a criterion known as the Fisher ratio; a larger Fisher ratio means a larger separation between the two classes. Equation (7) is the formulation of the Fisher ratio:

$$F = \frac{(\upsilon_1 - \upsilon_2)^2}{\rho_1^2 + \rho_2^2}, \tag{7}$$

where $\rho_1^2$ and $\rho_2^2$ denote the variances of the 1st and 2nd class, $(\upsilon_1 - \upsilon_2)$ is the difference between the means of the two classes, and $\rho_1^2 + \rho_2^2$ is the sum of the class scatters. Maximizing the ratio therefore acts on two terms: the between-class term $\delta_m = (\upsilon_1 - \upsilon_2)^2$ is made large so that the class means lie far apart, and the within-class term $\delta_s = \rho_1^2 + \rho_2^2$ is made small so that each class is compact. For detailed formulation and discussion about LDA, readers can refer to [3].
LDA has the following two benefits. Firstly, LDA enhances the performance of the predictive model by transforming the original feature space into a reduced-dimensional space in which the class separation is maximized. Secondly, LDA substantially reduces the time complexity of the predictive model. The dimensionality-reduced data produced by LDA are supplied to the SVM for classification.
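To make the Fisher ratio concrete, the sketch below evaluates the squared difference of the class means divided by the sum of the class variances for two one-dimensional projections. The data values and the function name are hypothetical and chosen only so that the arithmetic is easy to follow; population variance is used as the scatter measure, which is an assumption.

```python
from statistics import mean, pvariance


def fisher_ratio(class_1, class_2):
    """(mean_1 - mean_2)^2 / (var_1 + var_2) for two lists of projected values."""
    between = (mean(class_1) - mean(class_2)) ** 2
    within = pvariance(class_1) + pvariance(class_2)
    return between / within


# Two hypothetical projections of the same two classes.
well_separated = fisher_ratio([1.0, 2.0, 3.0], [7.0, 8.0, 9.0])
overlapping = fisher_ratio([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(well_separated, overlapping)
```

Both projections have the same within-class scatter, so the much larger ratio of the first one comes entirely from the larger distance between the class means. LDA prefers the projection direction with the larger ratio.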

Support Vector Machine.
Support vector machines (SVMs) are considered powerful learning methods and have been widely used in different biomedical and health informatics problems [49]. During the training process, the output of an SVM model is an optimal hyperplane that maximizes the distance to the nearest training data points of any class. The major reasons that motivate machine learning researchers to use SVMs for their problems are as follows. (1) SVMs have powerful generalization capabilities to unseen data. (2) SVMs depend on a very small number of hyperparameters [50,51].

Consider a training set of S instances $\{(x_i, y_i) \mid x_i \in \mathbb{R}^Q,\ y_i \in \{-1, 1\}\}_{i=1}^{S}$, where $x_i$ stands for the $i$th instance, Q represents the dimension of the original feature space of the PD data, and $y_i$ denotes the class label, i.e., presence or absence of PD. The Q value is 20 for the PD dataset considered in this paper. The SVM model determines a hyperplane given by $f(x) = \theta^T x + \delta$, where $\delta$ represents the bias and $\theta$ denotes the weight vector. Based on the training data, the hyperplane $f(x)$ maximizes the margin while minimizing the classification error. The margin is defined as the sum of the distances from the hyperplane to the closest negative and the closest positive instances. That is, the hyperplane maximizes the margin distance $2/\|\theta\|_2$, which is equivalent to minimizing $\|\theta\|_2^2$. SVM uses a set of slack variables $\xi_i$, $i = 1, \ldots, S$, and a penalty parameter C, and attempts to balance the minimization of $\|\theta\|_2^2$ against the minimization of the misclassification errors. This is formulated as follows:

$$\min_{\theta, \delta, \xi}\ \frac{1}{2}\|\theta\|_2^2 + C \sum_{i=1}^{S} \xi_i \quad \text{s.t.} \quad y_i\left(\theta^T x_i + \delta\right) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, \ldots, S. \tag{8}$$

In equation (8), $\xi_i$ is the slack variable that measures the degree of misclassification, and the Euclidean norm (L2-norm) is the penalty term.
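The trade-off in the soft-margin formulation can be illustrated by evaluating the objective for a fixed hyperplane. The sketch below assumes the standard soft-margin form, in which the smallest feasible slack for instance i is max(0, 1 − y_i(θᵀx_i + δ)) and the objective is ½‖θ‖² + C Σ ξ_i. The function name and the toy data are hypothetical, and no optimization is performed.

```python
def soft_margin_objective(theta, bias, features, labels, penalty):
    """Return (objective, slacks) for the hyperplane f(x) = theta.x + bias.
    Each slack is the smallest value satisfying the margin constraint."""
    slacks = []
    for x_i, y_i in zip(features, labels):
        f_x = sum(t * v for t, v in zip(theta, x_i)) + bias
        slacks.append(max(0.0, 1.0 - y_i * f_x))
    objective = 0.5 * sum(t * t for t in theta) + penalty * sum(slacks)
    return objective, slacks


# Hypothetical one-dimensional data: labels are +1 (PD) and -1 (healthy).
features = [[2.0], [-2.0], [0.5]]
labels = [1, -1, -1]
objective, slacks = soft_margin_objective([1.0], 0.0, features, labels, penalty=1.0)
print(objective, slacks)
```

The first two instances lie outside the margin on the correct side and need no slack, while the third is on the wrong side of the hyperplane and contributes a positive slack. A larger penalty C would make that violation more costly relative to the margin term.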

Validation and Evaluation of the Proposed Approach
In order to validate the effectiveness of the proposed approach, we utilized the leave-one-subject-out (LOSO) validation scheme, in which all samples of one subject are left out for testing and the proposed framework is trained on the remaining data. The process is repeated until every subject has been tested. Finally, the accuracy of the model is evaluated by calculating the mean accuracy over all subjects.
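The LOSO splitting logic can be sketched as follows. The function name is hypothetical; the subject and sample counts (160 subjects with three phonations each) follow the data description in this paper. With scikit-learn, `sklearn.model_selection.LeaveOneGroupOut` produces equivalent splits when the subject identifier is passed as the group, although the paper does not state which routine was used.

```python
def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one per subject.
    All samples of the held-out subject go to the test set, so no subject
    contributes samples to both training and testing."""
    for subject in sorted(set(subject_ids)):
        test_idx = [i for i, s in enumerate(subject_ids) if s == subject]
        train_idx = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train_idx, test_idx


# 160 subjects, each with three phonations ("a", "o", "u").
subject_ids = [subject for subject in range(160) for _ in range(3)]
splits = list(loso_splits(subject_ids))
print(len(splits), "folds;",
      len(splits[0][1]), "training and", len(splits[0][2]), "test samples per fold")
```

Keeping all three phonations of a subject on the same side of the split is what prevents the subject overlap that biases ordinary k-fold cross-validation on this kind of data.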
To evaluate the performance of the proposed framework, we utilize some well-known statistical metrics, namely, Matthews correlation coefficient (MCC), sensitivity, specificity, and classification accuracy. Classification accuracy measures how well the proposed method classifies all subjects (both patients and healthy subjects). Specificity measures how accurately the model classifies healthy subjects, whereas sensitivity measures how accurately the developed model classifies patients. If A denotes the number of true positives, B denotes the number of true negatives, C denotes the number of false positives, and D denotes the number of false negatives, then the formulation of these evaluation metrics is given in equations (9)-(12):

$$\text{Accuracy} = \frac{A + B}{A + B + C + D}, \tag{9}$$

$$\text{Sensitivity} = \frac{A}{A + D}, \tag{10}$$

$$\text{Specificity} = \frac{B}{B + C}, \tag{11}$$

$$\text{MCC} = \frac{A \times B - C \times D}{\sqrt{(A + C)(A + D)(B + C)(B + D)}}, \tag{12}$$

where MCC takes a value in the range −1 to 1, with −1 denoting the worst case and 1 denoting the best case.
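These metrics follow directly from the four confusion-matrix counts. The sketch below uses the standard definitions, with the counts implied by the results reported later in the paper for the best model (44 of 60 PD patients and 80 of 100 healthy subjects correctly classified); the function name is hypothetical.

```python
import math


def evaluation_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, and MCC from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # fraction of patients detected
    specificity = tn / (tn + fp)   # fraction of healthy subjects recognised
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, sensitivity, specificity, mcc


# 60 PD patients: 44 detected, 16 missed; 100 healthy: 80 correct, 20 false alarms.
accuracy, sensitivity, specificity, mcc = evaluation_metrics(tp=44, tn=80, fp=20, fn=16)
print(f"accuracy={accuracy:.4f} sensitivity={sensitivity:.4f} "
      f"specificity={specificity:.4f} mcc={mcc:.4f}")
```

Returning 0 for MCC when the denominator vanishes is a common convention for degenerate confusion matrices and is an assumption here.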

Experiment Results
In this section, we discuss the implementation details and the performance obtained by the different machine learning models developed for PD detection based on the voice data. All the experiments were performed on an Intel(R) Core(TM) m3-7Y30 CPU @ 1.00 GHz 1.61 GHz with 8 GB of memory and a 64-bit Windows operating system, using the Python programming language and the scikit-learn library. The first experiment was performed by extracting the MFCC features from the voice phonations. The extracted MFCCs formed a matrix for each voice phonation. The matrix contained 20 columns, which act as the MFCC features. Following the approach of previous studies, we computed the mean of each column (MFCC feature) along the rows of the matrix. In this way, we obtained a feature vector of size 20 for each voice phonation. Next, we used iterative feature selection before applying LDA for dimensionality reduction. After dimensionality reduction through the LDA model, we applied the resultant feature vectors to the input of the machine learning models. The results for each of the developed machine learning models are given in Table 2.
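The column-wise averaging that turns a per-frame MFCC matrix into one 20-dimensional feature vector can be sketched as below. The matrix values are synthetic and the function name is hypothetical; only the shape (frames as rows, 20 MFCC columns) follows the description above.

```python
def mean_mfcc_vector(mfcc_matrix):
    """Average each MFCC column over all frames (rows) of one phonation."""
    n_frames = len(mfcc_matrix)
    return [sum(column) / n_frames for column in zip(*mfcc_matrix)]


# Synthetic example: 3 frames, 20 coefficients per frame.
mfcc_matrix = [[float(frame + coeff) for coeff in range(20)] for frame in range(3)]
feature_vector = mean_mfcc_vector(mfcc_matrix)
print(len(feature_vector), feature_vector[:3])
```

Averaging over frames makes the feature vector independent of the duration of the phonation, so recordings of different lengths map to vectors of the same size.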
From the results given in Table 2, it can be seen that the worst performance was produced by the GNB model and the SVM with sigmoid kernel, with accuracies of 48.12% and 46.87%, respectively, while the best performance was produced by the SVM model with RBF kernel, with 77.5% accuracy, 80% specificity, and 73.33% sensitivity. This means that the proposed MFCC-LDA-SVM model correctly classifies 124 of the 160 subjects. Similarly, the specificity of 80% means that 80 of the 100 healthy subjects are correctly classified, while the sensitivity of 73.33% means that the proposed model correctly detects 44 of the 60 PD patients. These statistical results are depicted more clearly in the confusion matrix given in Figure 2.
The performance of the MFCC-LDA-SVM model is further evaluated in terms of the area under the curve (AUC) metric, which was calculated from the receiver operating characteristic (ROC) curve. The ROC curves for the two models with the worst performance and for the two models with the best performance are given in Figures 3 and 4, respectively. A model with a higher AUC is considered better than models with lower AUC values. Based on this evaluation criterion, Figures 3 and 4 show that the proposed MFCC-LDA-SVM is the best model among the developed models. Additionally, for further validation, the proposed approach is compared with recently published studies in Table 3. The data were collected by different individuals who used different smartphones for recording the voice data. It is well known that the spectral characteristics of the microphone can strongly influence the results, especially given that MFCCs were used in this study. These factors can degrade the performance of the proposed intelligent system. Furthermore, the shorter length of phonations in PD could be another factor influencing the cepstral analysis. To check the robustness of our model, we evaluated the same model on a publicly available dataset, namely, the "Multiple Types of Speech Dataset" [42]. The proposed intelligent system, i.e., MFCC-LDA-SVM, obtained outstanding results on this publicly available dataset. Using LOSO CV on the training dataset of the "Multiple Types of Speech Dataset," we obtained 97.5% accuracy, 100% sensitivity, and 95% specificity. Similarly, the proposed intelligent method produced an accuracy of 89.28% on the testing dataset of the "Multiple Types of Speech Dataset."

Conclusion
In this study, we considered the challenge of PD detection based on multiple types of voice signals. From each subject, we recorded three different voice phonations. A signal processing algorithm (MFCC) was utilized to extract numerical features from the voice phonations. The dimensionality of the extracted MFCC features was reduced by applying the linear discriminant analysis (LDA) model. At the final stage, numerous machine learning models were developed. The MFCC-LDA-SVM method was found to produce the best performance in terms of PD detection. The performance comparison was carried out using different evaluation criteria, including classification accuracy, area under the curve (AUC), and the receiver operating characteristic curve. The proposed method produced an AUC of 87%, PD detection accuracy of 77.5%, sensitivity of 73.33%, and specificity of 80%. Moreover, the proposed intelligent system was also evaluated on the publicly available dataset. The obtained results were promising compared to previous work.

Data Availability
The data used to support the findings of the study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.