A novel hybrid method named SCFW-KELM, which integrates effective subtractive clustering features weighting (SCFW) and a fast kernel-based extreme learning machine (KELM) classifier, has been introduced for the diagnosis of PD. In the proposed method, SCFW is used as a data preprocessing tool that aims at decreasing the variance within features of the PD dataset in order to further improve the diagnostic accuracy of the KELM classifier. The impact of the type of kernel function on the performance of KELM has been investigated in detail. The efficiency and effectiveness of the proposed method have been rigorously evaluated on the PD dataset in terms of classification accuracy (ACC), sensitivity, specificity, area under the receiver operating characteristic (ROC) curve (AUC), f-measure, and kappa statistic value.
Parkinson’s disease (PD) is a degenerative disease of the nervous system, characterized by a large group of neurological conditions called motor system disorders that result from the loss of dopamine-producing brain cells. The main symptoms of PD include tremor, rigidity, bradykinesia (slowness of movement), and impaired balance and coordination.
Previous studies on the PD problem based on machine learning methods have been undertaken by various researchers. Little et al. [
From these works, it can be seen that most of the common classifiers from the machine learning community have been applied to PD diagnosis. For nonlinear classification problems, data preprocessing methods such as feature weighting, normalization, and feature transformation can improve the performance of a standalone classifier. It is therefore obvious that the choice of an efficient feature preprocessing method and an excellent classifier is of significant importance for the PD diagnosis problem. Aiming at improving the efficiency and effectiveness of classification for the diagnosis of PD, this paper examines an efficient feature weighting method called subtractive clustering features weighting (SCFW) and a fast classification algorithm named kernel-based extreme learning machine (KELM). The SCFW method maps the features according to the data distributions in the dataset and transforms a linearly nonseparable dataset into a linearly separable one. In this way, similar data within each feature tend to come together, so that the distinction between classes is increased and the PD dataset can be classified correctly. It has been reported that the SCFW method can help improve the discrimination ability of classifiers in many applications, such as traffic accident analysis [
The main contributions of this paper are summarized as follows. To the best of our knowledge, this is the first time that the SCFW approach has been integrated with the KELM classifier to detect PD in an efficient and effective way. In the proposed system, the SCFW method is employed as a data preprocessing tool to strengthen the discrimination between classes and thereby further improve the distinguishing performance of the KELM classifier. Compared with existing methods in previous studies, the proposed diagnostic system achieves excellent classification results.
The rest of the paper is organized as follows. Section
Subtractive clustering is an improved version of the mountain clustering algorithm. The problem with mountain clustering is that its computation grows exponentially with the dimension of the problem. Subtractive clustering solves this problem by using the data points themselves as candidates for cluster centers, instead of grid points as in mountain clustering, so the computational cost is proportional to the problem size rather than the problem dimension [
Consider a collection of
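In the standard subtractive clustering formulation (a well-known result, restated here for completeness), the density measure at data point $x_i$ is defined as

$$D_i = \sum_{j=1}^{n} \exp\!\left(-\frac{\|x_i - x_j\|^2}{(r_a/2)^2}\right),$$

where $r_a$ is a positive constant defining a neighborhood radius; points with many close neighbors receive a high density value.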
After the density measure of each data point has been calculated, the data point with the highest density measure is selected as the first cluster center. Let
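In the standard formulation, once the first cluster center $x_{c_1}$ with density $D_{c_1}$ is selected, the density of every point is revised as

$$D_i \leftarrow D_i - D_{c_1}\exp\!\left(-\frac{\|x_i - x_{c_1}\|^2}{(r_b/2)^2}\right),$$

where $r_b$ is a positive radius, typically larger than $r_a$ (commonly $r_b = 1.5\,r_a$), so that closely spaced cluster centers are avoided.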
After the density calculation for each data point is revised, the next cluster center
For SCFW method, firstly the cluster centers of each feature are calculated by using subtractive clustering. After calculating the centers of features, the ratios of means of features to their cluster centers are calculated and these ratios are multiplied with the data of each feature [
(1) Load the PD dataset and represent the data as a matrix; initialize the corresponding parameter values;
(2) Calculate the cluster centers of each feature using the subtractive clustering method;
(3) Calculate the mean value of each feature in the dataset;
(4) Calculate the ratios of the feature means to their cluster centers and multiply the data of each feature by these ratios to obtain the weighted dataset (a code sketch of this procedure is given below).
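A minimal Python sketch of the SCFW procedure is given below. This is an illustration, not the authors' MATLAB code: the radii `ra` and `rb`, the stopping rule, and the use of the mean of the per-feature cluster centers are our own assumptions.

```python
import numpy as np

def subtractive_cluster_centers(x, ra=0.5, rb=0.75, stop_ratio=0.15, max_centers=10):
    """Subtractive clustering on a single feature (1-D).

    ra, rb and the stopping ratio are assumed defaults, not values from the paper.
    """
    x = np.asarray(x, dtype=float).ravel()
    # density measure D_i of each point (see the equation above)
    d = np.exp(-np.subtract.outer(x, x) ** 2 / (ra / 2) ** 2).sum(axis=1)
    centers, d_first = [], None
    for _ in range(max_centers):
        k = int(np.argmax(d))
        if d_first is None:
            d_first = d[k]
        elif d[k] < stop_ratio * d_first:   # remaining density too low: stop
            break
        centers.append(x[k])
        # revise densities around the newly selected center
        d = d - d[k] * np.exp(-(x - x[k]) ** 2 / (rb / 2) ** 2)
    return np.array(centers)

def scfw_weights(X):
    """Weight of feature j = mean(feature j) / mean(cluster centers of feature j)."""
    w = np.ones(X.shape[1])
    for j in range(X.shape[1]):
        c = subtractive_cluster_centers(X[:, j]).mean()
        if c != 0:                 # guard: some features cluster at 0 (e.g., F5)
            w[j] = X[:, j].mean() / c
    return w

# usage: X_weighted = X * scfw_weights(X)
```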
The flowchart of the SCFW algorithm.
ELM is an algorithm originally developed for training single hidden layer feed-forward neural networks (SLFNs) [
The structure of ELM.
For given
The determination of the output weights is calculated by the least square method:
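For completeness, the standard least-squares solution of ELM (a well-known result, not specific to this paper) is

$$\hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger}\mathbf{T},$$

where $\mathbf{H}$ is the hidden-layer output matrix, $\mathbf{H}^{\dagger}$ is its Moore-Penrose generalized inverse, and $\mathbf{T}$ is the matrix of training targets.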
To improve the generalization capabilities of ELM in comparison with the least square solution-based ELM, Huang et al. [
Therefore, the output function is expressed as follows:
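In the regularized formulation of Huang et al., the output function takes the standard form

$$f(\mathbf{x}) = \mathbf{h}(\mathbf{x})\mathbf{H}^{T}\left(\frac{\mathbf{I}}{C} + \mathbf{H}\mathbf{H}^{T}\right)^{-1}\mathbf{T},$$

where $C$ is the penalty parameter and $\mathbf{h}(\mathbf{x})$ is the hidden-layer feature mapping.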
When the hidden feature mapping function
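In the standard kernel substitution for ELM (again a well-known result), when $\mathbf{h}(\mathbf{x})$ is unknown to the user, a kernel matrix $\boldsymbol{\Omega}_{\mathrm{ELM}}$ is defined with entries $\Omega_{\mathrm{ELM}(i,j)} = \mathbf{h}(\mathbf{x}_i)\cdot\mathbf{h}(\mathbf{x}_j) = K(\mathbf{x}_i,\mathbf{x}_j)$, and the output function becomes

$$f(\mathbf{x}) = \begin{bmatrix} K(\mathbf{x},\mathbf{x}_{1}) \\ \vdots \\ K(\mathbf{x},\mathbf{x}_{N}) \end{bmatrix}^{T}\left(\frac{\mathbf{I}}{C} + \boldsymbol{\Omega}_{\mathrm{ELM}}\right)^{-1}\mathbf{T}.$$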
This work proposes a novel hybrid method for PD diagnosis. The proposed model comprises two stages, as shown in Figure
(1) Weight the features using the subtractive clustering algorithm;
(2) For each fold, Training set = the remaining k − 1 subsets; Test set = the held-out subset;
(3) Train the KELM classifier in the weighted training feature space and store the best parameter combination;
(4) Test the trained KELM model on the test set using the achieved best parameter combination;
(5) Return the average classification results of KELM over the k folds (an end-to-end code sketch follows the figure caption below).
The overall procedure of the proposed hybrid diagnosis system.
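To make the two-stage pipeline concrete, the following minimal Python sketch combines the `scfw_weights()` helper from the SCFW sketch above with a small RBF-kernel KELM and stratified 10-fold cross-validation. The class layout, the parameter defaults, and the use of scikit-learn's `StratifiedKFold` are illustrative assumptions, not the authors' MATLAB implementation.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

class KELM:
    """Kernel ELM: solve (I/C + Omega) alpha = T, predict by argmax."""
    def __init__(self, C=2.0**5, gamma=2.0**-3):   # hypothetical defaults
        self.C, self.gamma = C, gamma

    def fit(self, X, y):
        self.X, self.classes = X, np.unique(y)
        T = (y[:, None] == self.classes[None, :]).astype(float)  # one-hot targets
        omega = rbf_kernel(X, X, self.gamma)
        self.alpha = np.linalg.solve(np.eye(len(X)) / self.C + omega, T)
        return self

    def predict(self, Xnew):
        k = rbf_kernel(Xnew, self.X, self.gamma)
        return self.classes[(k @ self.alpha).argmax(axis=1)]

def evaluate(X, y, C=2.0**5, gamma=2.0**-3, n_splits=10, seed=0):
    """Stage 1: SCFW weighting (learned on training folds only); stage 2: KELM."""
    accs = []
    for tr, te in StratifiedKFold(n_splits, shuffle=True, random_state=seed).split(X, y):
        w = scfw_weights(X[tr])            # from the SCFW sketch above
        model = KELM(C, gamma).fit(X[tr] * w, y[tr])
        accs.append(np.mean(model.predict(X[te] * w) == y[te]))
    return float(np.mean(accs)), float(np.std(accs))
```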
In this section, we report experiments on the PD dataset taken from the University of California Irvine (UCI) machine learning repository (
The details of the whole 22 features of the PD dataset.
Label | Feature | Description
---|---|---
F1 | MDVP: Fo (Hz) | Average vocal fundamental frequency
F2 | MDVP: Fhi (Hz) | Maximum vocal fundamental frequency
F3 | MDVP: Flo (Hz) | Minimum vocal fundamental frequency
F4 | MDVP: Jitter (%) | Several measures of variation in fundamental frequency
F5 | MDVP: Jitter (Abs) |
F6 | MDVP: RAP |
F7 | MDVP: PPQ |
F8 | Jitter: DDP |
F9 | MDVP: Shimmer | Several measures of variation in amplitude
F10 | MDVP: Shimmer (dB) |
F11 | Shimmer: APQ3 |
F12 | Shimmer: APQ5 |
F13 | MDVP: APQ |
F14 | Shimmer: DDA |
F15 | NHR | Two measures of the ratio of noise to tonal components in the voice
F16 | HNR |
F17 | RPDE | Two nonlinear dynamical complexity measures
F18 | D2 |
F19 | DFA | Signal fractal scaling exponent
F20 | Spread1 | Three nonlinear measures of fundamental frequency variation
F21 | Spread2 |
F22 | PPE |
The proposed SCFW-KELM classification model was implemented on the MATLAB 7.0 platform. The SCFW algorithm was implemented from scratch. For KELM and ELM, the implementation available from
For SVM, LIBSVM implementation was used, which was originally developed by Chang and Lin [
In order to guarantee valid results,
In order to evaluate the prediction performance of the SCFW-KELM model, we used six performance metrics: ACC, sensitivity, specificity, AUC, f-measure, and kappa statistic value.
The confusion matrix.
 | Predicted patients with PD | Predicted healthy persons
---|---|---
Actual patients with PD | True positive (TP) | False negative (FN)
Actual healthy persons | False positive (FP) | True negative (TN)
In the confusion matrix, TP is the number of true positives, representing cases with the PD class correctly classified as PD. FN is the number of false negatives, representing cases with the PD class classified as healthy. TN is the number of true negatives, representing cases with the healthy class correctly classified as healthy, and FP is the number of false positives, representing cases with the healthy class classified as PD. ACC is a widely used metric to determine the class discrimination ability of classifiers. The receiver operating characteristic (ROC) curve is plotted as the true positive rate versus the false positive rate while the discrimination threshold of the classification algorithm is varied. The area under the ROC curve (AUC) is widely used in classification studies and is a good summary of the performance of a classifier [
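From these counts the reported metrics are computed in the standard way:

$$\mathrm{ACC} = \frac{TP+TN}{TP+FP+TN+FN}, \qquad \mathrm{Sensitivity} = \frac{TP}{TP+FN}, \qquad \mathrm{Specificity} = \frac{TN}{TN+FP},$$

$$\text{f-measure} = \frac{2\,TP}{2\,TP + FP + FN}, \qquad \kappa = \frac{p_o - p_e}{1 - p_e},$$

where $p_o$ is the observed agreement (i.e., ACC) and $p_e$ is the agreement expected by chance.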
In this experiment, we first evaluated KELM in the original feature space without SCFW. It is known that different types of kernel activation functions greatly influence the performance of KELM. Therefore, we investigated the influence of different types of kernel functions and assigned initial values for their parameters. Four types of kernel functions were considered: the radial basis function kernel (RBF_kernel), the wavelet kernel (Wav_kernel), the linear kernel (Lin_kernel), and the polynomial kernel (Poly_kernel). Table
To investigate whether the SCFW method can improve the performance of KELM, we further evaluated the model on the PD dataset in the weighted feature space obtained by SCFW. The proposed system consists of two stages. First, the SCFW approach was used to weight the features of the PD dataset, thereby constructing the weighted feature space. Table
Results of KELM with different types of kernel functions in the original PD dataset without SCFW.
Kernel type | Performance metrics | Mean | SD | Max | Min
---|---|---|---|---|---
RBF_kernel | ACC (%) | 95.89 | 4.66 | 100 | 89.74
 | Sensitivity (%) | 96.35 | 5.19 | 100 | 88.89
 | Specificity (%) | 95.72 | 5.93 | 100 | 88.00
 | AUC (%) | 96.04 | 4.06 | 100 | 90.43
 | f-measure | 0.9724 | | |
 | Kappa | 0.8925 | | |
Wav_kernel | ACC (%) | 94.36 | 4.59 | 100 | 87.18
 | Sensitivity (%) | 91.24 | 6.02 | 100 | 83.33
 | Specificity (%) | 95.15 | 5.23 | 100 | 86.21
 | AUC (%) | 93.19 | 4.56 | 100 | 88.10
 | f-measure | 0.9622 | | |
 | Kappa | 0.8425 | | |
Lin_kernel | ACC (%) | 89.23 | 7.99 | 97.44 | 79.49
 | Sensitivity (%) | 66.07 | 22.33 | 90.91 | 41.67
 | Specificity (%) | 97.32 | 2.80 | 100 | 93.33
 | AUC (%) | 81.70 | 12.22 | 95.45 | 68.89
 | f-measure | 0.9316 | | |
 | Kappa | 0.6333 | | |
Poly_kernel | ACC (%) | 90.77 | 4.29 | 97.44 | 87.18
 | Sensitivity (%) | 87.73 | 11.54 | 100 | 75.00
 | Specificity (%) | 91.83 | 5.73 | 96.77 | 82.76
 | AUC (%) | 89.78 | 5.78 | 98.39 | 82.66
 | f-measure | 0.9375 | | |
 | Kappa | 0.7547 | | |
The cluster centers of the features of the PD dataset using the SCFW method.

Feature | Cluster center using SCFW (normal case) | Cluster center using SCFW (PD case)
---|---|---
F1 | 154.229 | 181.938 |
F2 | 197.105 | 223.637 |
F3 | 116.325 | 145.207 |
F4 | 0.006 | 0.006 |
F5 | 0 | 0 |
F6 | 0.003 | 0.003 |
F7 | 0.003 | 0.003 |
F8 | 0.01 | 0.01 |
F9 | 0.03 | 0.03 |
F10 | 0.282 | 0.276 |
F11 | 0.016 | 0.015 |
F12 | 0.018 | 0.018 |
F13 | 0.024 | 0.013 |
F14 | 0.047 | 0.045 |
F15 | 0.025 | 0.028 |
F16 | 21.886 | 24.678 |
F17 | 0.499 | 0.443 |
F18 | 0.718 | 0.696 |
F19 | −5.684 | −6.759 |
F20 | 0.227 | 0.161 |
F21 | 2.382 | 2.155 |
F22 | 0.207 | 0.123 |
The box plot representation of the original and weighted PD datasets.
Three-dimensional distribution of the two classes in the original and weighted feature spaces using the best three principal components obtained with the PCA method.
The detailed results obtained by SCFW-KELM with the four different types of kernel functions are presented in Table
Results of SCFW-KELM with different types of kernel functions in the PD dataset.
Kernel type | Performance metrics | Mean | SD | Max | Min
---|---|---|---|---|---
RBF_kernel | ACC (%) | 99.49 | 1.15 | 100 | 97.44
 | Sensitivity (%) | 100 | 0 | 100 | 100
 | Specificity (%) | 99.39 | 1.36 | 100 | 96.97
 | AUC (%) | 99.69 | 0.68 | 100 | 98.48
 | f-measure | 0.9966 | | |
 | Kappa | 0.9863 | | |
Wav_kernel | ACC (%) | 96.92 | 2.15 | 100 | 94.87
 | Sensitivity (%) | 98.46 | 3.44 | 100 | 92.31
 | Specificity (%) | 96.54 | 2.39 | 100 | 93.33
 | AUC (%) | 97.50 | 2.18 | 100 | 94.23
 | f-measure | 0.9793 | | |
 | Kappa | 0.9194 | | |
Lin_kernel | ACC (%) | 96.92 | 2.15 | 100 | 94.87
 | Sensitivity (%) | 90.43 | 8.85 | 100 | 81.82
 | Specificity (%) | 99.29 | 1.60 | 100 | 96.43
 | AUC (%) | 94.86 | 3.99 | 100 | 90.91
 | f-measure | 0.9798 | | |
 | Kappa | 0.9147 | | |
Poly_kernel | ACC (%) | 97.43 | 2.56 | 100 | 94.87
 | Sensitivity (%) | 96.67 | 7.45 | 100 | 83.33
 | Specificity (%) | 97.37 | 3.61 | 100 | 93.10
 | AUC (%) | 97.02 | 3.42 | 100 | 91.67
 | f-measure | 0.9828 | | |
 | Kappa | 0.9323 | | |
Table
Confusion matrix of KELM with RBF kernel function in the original and weighted PD dataset.
Method | Actual class | Predicted patients with PD | Predicted healthy persons
---|---|---|---
KELM | Patients with PD | 141 | 6
 | Healthy persons | 2 | 46
SCFW-KELM | Patients with PD | 146 | 1
 | Healthy persons | 0 | 48
For the SVM classifier, we used SVM with the RBF kernel. It is known that the performance of SVM is sensitive to the combination of the penalty parameter C and the kernel parameter γ.
For the original ELM, it is known that the classification performance of ELM with the sigmoid additive function is sensitive to the number of hidden neurons.
The effects of the number of hidden neurons on original ELM in the classification of the original and weighted PD dataset.
For the KNN classifier, the influence of the neighborhood size k was investigated.
The effects of the neighborhood size on KNN in the original and weighted PD dataset.
For the KELM classifier, there were two parameters to tune, the penalty parameter C and the kernel parameter γ, as sketched after the following figure.
Test accuracy surface with parameters in KELM in the original and weighted PD dataset.
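A minimal sketch of such a grid search over (C, γ), reusing the `evaluate()` helper defined in the pipeline sketch above; the exponent ranges are illustrative assumptions rather than the grid actually used in the paper.

```python
# hypothetical grid search over (C, gamma) for KELM with RBF kernel
import itertools

def grid_search(X, y):
    best = (-1.0, None, None)
    for c_exp, g_exp in itertools.product(range(-5, 16, 2), range(-15, 4, 2)):
        acc, _ = evaluate(X, y, C=2.0**c_exp, gamma=2.0**g_exp)
        if acc > best[0]:
            best = (acc, 2.0**c_exp, 2.0**g_exp)
    return best   # (best mean accuracy, best C, best gamma)
```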
Table
The results obtained from four algorithms in the original and weighted PD dataset.
Methods | Performance metrics | Original feature space without SCFW method | Weighted feature space with SCFW method
---|---|---|---
KELM-RBF | ACC (%) | 95.89 ± 4.66 | 99.49 ± 1.15
 | Sensitivity (%) | 96.35 ± 5.19 | 100 ± 0
 | Specificity (%) | 95.72 ± 5.93 | 99.39 ± 1.36
 | AUC (%) | 96.04 ± 4.06 | 99.69 ± 0.68
 | f-measure | 0.9724 | 0.9966
 | Kappa | 0.8925 | 0.9863
 | Time (s) | 0.00435 | 0.0126
SVM | ACC (%) | 95.38 ± 1.15 | 97.95 ± 2.15
 | Sensitivity (%) | 85.09 ± 10.45 | 96.67 ± 7.45
 | Specificity (%) | 98.67 ± 2.98 | 98.71 ± 1.77
 | AUC (%) | 91.88 ± 4.14 | 97.69 ± 3.46
 | f-measure | 0.9699 | 0.9863
 | Kappa | 0.8711 | 0.9447
 | Time (s) | 1.24486 | 1.29817
KNN | ACC (%) | 95.38 ± 5.25 | 97.43 ± 3.14
 | Sensitivity (%) | 92.73 ± 11.85 | 97.78 ± 4.97
 | Specificity (%) | 96.50 ± 4.38 | 97.38 ± 4.10
 | AUC (%) | 94.61 ± 6.95 | 97.58 ± 2.60
 | f-measure | 0.9692 | 0.9828
 | Kappa | 0.8765 | 0.9431
 | Time (s) | 1.2847 | 1.3226
ELM | ACC (%) | 89.23 ± 6.88 | 96.92 ± 4.21
 | Sensitivity (%) | 73.94 ± 13.18 | 95.78 ± 5.79
 | Specificity (%) | 93.35 ± 6.27 | 97.19 ± 4.51
 | AUC (%) | 83.64 ± 9.06 | 96.48 ± 4.36
 | f-measure | 83.64 ± 9.06 | 0.9863
 | Kappa | 0.7078 | 0.9447
 | Time (s) | 1.1437 | 1.2207
In comparison with SVM, SCFW-SVM achieved results of 97.95%, 96.67%, 98.71%, and 97.69% in terms of ACC, sensitivity, specificity, and AUC, improving the performance by 2.57%, 11.58%, 0.04%, and 5.81%, respectively. KNN was also significantly improved by the SCFW method. For the ELM classifier, the best results in the original feature space were achieved with 36 hidden neurons, while the best performance of SCFW-ELM was achieved with far fewer hidden neurons (only 26). This means that the combination of SCFW and ELM not only significantly improved the performance but also compacted the network structure of ELM. Moreover, the sensitivity results of SVM and ELM were significantly improved, by 11.58% and 21.84%, respectively. Whether in the original or the weighted feature space, KELM with the RBF kernel was superior to the other three models by a large margin in terms of ACC, sensitivity, specificity, AUC,
Additionally, it is interesting to find that the standard deviation of SCFW-KELM was much lower than that of KELM, and it had the smallest SD among all of the models, which means that SCFW-KELM becomes more robust and reliable by means of the SCFW method. In addition, the reason why the SCFW method outperforms FCM is that SCFW may be more suitable for nonlinearly separable datasets: it considers the density measure of data points, which reduces the influence of outliers, whereas FCM tends to select outliers as initial centers.
For comparison purposes, the classification accuracies achieved by previous methods that studied the PD diagnosis problem are presented in Table
Classification accuracies achieved with our method and other methods.
Study | Method | Accuracy (%)
---|---|---
Little et al. [ ] | Preselection filter + exhaustive search + SVM | 91.40 (bootstrap with 50 replicates)
Shahbaba and Neal [ ] | Dirichlet process mixtures | 87.70 (5-fold CV)
Das [ ] | ANN | 92.90 (hold out)
Sakar and Kursun [ ] | Mutual information + SVM | 92.75 (bootstrap with 50 replicates)
Psorakis et al. [ ] | Improved mRVMs | 89.47 (10-fold CV)
Guo et al. [ ] | GP-EM | 93.10 (10-fold CV)
Luukka [ ] | Fuzzy entropy measures + similarity | 85.03 (hold out)
Ozcift and Gulten [ ] | CFS-RF | 87.10 (10-fold CV)
Li et al. [ ] | Fuzzy-based nonlinear transformation + SVM | 93.47 (hold out)
Åström and Koker [ ] | Parallel NN | 91.20 (hold out)
Spadoto et al. [ ] | PSO + OPF | 73.53 (hold out)
Daliri [ ] | SVM with chi-square distance kernel | 91.20 (50-50% training-testing)
Polat [ ] | FCMFW + KNN | 97.93 (50-50% training-testing)
Chen et al. [ ] | PCA-FKNN | 96.07 (average 10-fold CV)
Zuo et al. [ ] | PSO-FKNN | 97.47 (10-fold CV)
This study | SCFW-KELM | 99.49 (10-fold CV)
Besides the PD dataset, two benchmark datasets, namely, the Cleveland Heart and Wisconsin Diagnostic Breast Cancer (WDBC) datasets from the UCI machine learning repository, have been used to further evaluate the efficiency and effectiveness of the proposed method. We followed the same experimental flow as for the PD dataset. The weighted feature space of each dataset was constructed using SCFW, and the weighted features were then evaluated with the four aforementioned algorithms. For brevity, only the classification results of the four algorithms are given. Table
Results of SCFW-KELM with different types of kernel functions in Cleveland heart dataset.
Kernel type | Performance metrics | Mean | SD | Max | Min
---|---|---|---|---|---
RBF_kernel | ACC (%) | | 0.91 | 100 | 98.33
 | Sensitivity (%) | 100 | 0 | 100 | 100
 | Specificity (%) | | 1.72 | 100 | 96.67
 | AUC (%) | | 0.86 | 100 | 98.33
 | f-measure | | | |
 | Kappa | | | |
Wav_kernel | ACC (%) | 99.01 | 0.90 | 100 | 98.36
 | Sensitivity (%) | 100 | 0 | 100 | 100
 | Specificity (%) | 97.84 | 2.02 | 100 | 95.83
 | AUC (%) | 98.92 | 1.01 | 100 | 97.92
 | f-measure | 0.9891 | | |
 | Kappa | 0.98 | | |
Lin_kernel | ACC (%) | 93.07 | 93.07 | 93.07 | 93.07
 | Sensitivity (%) | 98.77 | 98.77 | 98.77 | 98.77
 | Specificity (%) | 87.05 | 87.05 | 87.05 | 87.05
 | AUC (%) | 92.91 | 92.91 | 92.91 | 92.91
 | f-measure | 0.9195 | | |
 | Kappa | 0.8591 | | |
Poly_kernel | ACC (%) | 98.35 | 2.33 | 100 | 95.08
 | Sensitivity (%) | 100 | 0 | 100 | 100
 | Specificity (%) | 96.60 | 5.01 | 100 | 88.89
 | AUC (%) | 98.30 | 2.50 | 100 | 94.44
 | f-measure | 0.9817 | | |
 | Kappa | 0.9667 | | |
Results of SCFW-KELM with different types of kernel functions in WDBC dataset.
Kernel type | Performance metrics | Mean | SD | Max | Min
---|---|---|---|---|---
RBF_kernel | ACC (%) | | 0.79 | 100 | 98.23
 | Sensitivity (%) | | 2.13 | 100 | 95.24
 | Specificity (%) | 100 | 0 | 100 | 100
 | AUC (%) | | 1.06 | 100 | 97.62
 | f-measure | | | |
 | Kappa | | | |
Wav_kernel | ACC (%) | 99.65 | 0.48 | 100 | 99.12
 | Sensitivity (%) | 99.10 | 1.24 | 100 | 97.62
 | Specificity (%) | 100 | 0 | 100 | 100
 | AUC (%) | 99.54 | 0.66 | 100 | 98.65
 | f-measure | 0.9958 | | |
 | Kappa | 0.9925 | | |
Lin_kernel | ACC (%) | 98.07 | 1.69 | 100 | 95.61
 | Sensitivity (%) | 94.70 | 5.27 | 100 | 86.11
 | Specificity (%) | 100 | 0 | 100 | 100
 | AUC (%) | 97.35 | 2.63 | 100 | 93.06
 | f-measure | 0.9848 | | |
 | Kappa | 0.9582 | | |
Poly_kernel | ACC (%) | 99.40 | 0.88 | 99.12 | 97.37
 | Sensitivity (%) | 95.33 | 2.07 | 97.73 | 93.48
 | Specificity (%) | 100 | 0 | 100 | 100
 | AUC (%) | 97.67 | 1.04 | 98.86 | 96.74
 | f-measure | 0.9944 | | |
 | Kappa | 0.962 | | |
In this work, we have developed a new hybrid diagnosis method for addressing the PD problem. The main novelty of this paper lies in the proposed approach: the combination of the SCFW method and KELM with different types of kernel functions allows the detection of PD in an efficient and fast manner. Experimental results have demonstrated that the proposed system performs significantly well in discriminating patients with PD from healthy persons. Meanwhile, comparisons were conducted among KELM, SVM, KNN, and ELM. The experimental results have shown that the SCFW-KELM method performs advantageously over the other three methods in terms of ACC, sensitivity, specificity, AUC, f-measure, and kappa statistic value.
The authors declare that there is no conflict of interests regarding the publication of this paper.
This research is supported by the Natural Science Foundation of China (NSFC) under Grant nos. 61170092, 61133011, 61272208, 61103091, 61202308, and 61303113. This research is also supported by the open project program of Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, under Grant no. 93K172013K01.