Can a Smartphone Diagnose Parkinson Disease? A Deep Neural Network Method and Telediagnosis System Implementation

Parkinson's disease (PD) is primarily diagnosed by clinical examinations, such as walking tests, handwriting tests, and MRI. In this paper, we propose a machine learning based PD telediagnosis method for smartphones. Classifying PD from speech records is challenging because classification accuracy is still below doctor level. Here we demonstrate automatic classification of PD using time frequency features, stacked autoencoders (SAE), and a K nearest neighbor (KNN) classifier. The KNN classifier produces promising classification results from the representations learned by the SAE. Empirical results show that the proposed method achieves better performance than the compared methods in all tested cases, demonstrating that machine learning can classify PD with a level of competence comparable to that of a doctor. We conclude that a smartphone can potentially provide low-cost PD diagnostic care. This paper also describes an implementation on a browser/server system and reports its running time. Both advantages and disadvantages of the proposed telediagnosis system are discussed.


Introduction
Parkinson's disease (PD) is a nervous system disorder of the brain that can cause partial or full loss of movement, behavior, and mental processing, and especially of speech function [1]. PD is generally observed in elderly people and causes speech disorders [2]. At present, about 1% of the worldwide population over the age of fifty suffers from PD [3]. Many effective methods and medicines [4][5][6] have been developed to relieve the symptoms of PD, so timely early diagnosis followed by available treatment can improve the prognosis of PD [7]. However, many patients are diagnosed late, which delays their treatment [8]. Moreover, many patients with Parkinson's disease in the community remain undiagnosed, and more patients get worse because of poor medical conditions in low income areas [9].
Even though many clinical examinations and diagnostic methods for PD have been proposed [10][11][12][13][14], more effort should be put into automated diagnosis and telediagnosis [15] in the real world. Esteva et al. [16] argued that billions of smartphones have the potential to provide medical care for skin cancer diagnosis. Inspired by this idea, we consider that a smartphone also has the potential to diagnose PD. Many clinical reports reveal that dysphonia is an important reference index in diagnosis [17]. Our method is therefore based on the study of vocal impairment symptoms (dysphonia), which affect 90% of people with PD [18].
The purpose of this research is to design a machine learning based PD telediagnosis system that patients can use with a smartphone. We found that a deep neural network [19] combined with the K nearest neighbor (KNN) [20] method achieves better performance on available speech datasets than the comparative methods. The proposed telediagnosis system is compared with other researchers' methods, and its running time is also tested on a browser/server system. The contributions of this paper are (1) achieving high performance and (2) proposing a feasible implementation with an empirical test. This paper is organized as follows: Section 2 presents background and related work. Section 3 describes the proposed method and telediagnosis system. Section 4 shows the experimental results. Conclusions are drawn in Section 5.

Parkinson's Disease and Speech Disorders.
Many researchers have devoted much effort to PD. In 2006, Rao et al. [23] discussed the diagnosis and treatment of PD. The authors noted that psychosis is usually drug induced and can be managed initially by reducing antiparkinsonian medications. Jankovic and Aguilar [24] reviewed approaches to the treatment of PD and concluded that new treatments are not necessarily better than established conventional therapy and that treatment options must be tailored to the needs of each individual patient. In 2010, Varanese et al. [25] examined the treatment of advanced PD and concluded that supportive care, including physical and rehabilitative interventions, speech therapy, occupational therapy, and nursing care, has a key role in the late stage of the disease. In 2016, Yitayeh and Teshome [26] reviewed the effectiveness of physiotherapy treatment on balance dysfunction and postural instability in persons with Parkinson's disease and presented meta-analysis results. It has been reported that 89% of PD patients have voice and speech disorders [27,28], and many patients consider the reduced ability to communicate to be one of the most important aspects of PD [29]. The common perceptual features of reduced loudness (hypophonia), reduced pitch variation (monotone), breathy and hoarse voice quality, and imprecise articulation [30], together with lessened facial expression (masked faces), contribute to limitations in communication in the majority of people with PD [31]. Figure 1 compares the speech waveforms of people with PD and healthy people. The waveform of healthy people is visibly smooth and continuous, whereas the waveform of people with PD contains unexpected vibrations. The reason is that people with PD lose the ability to control their muscles precisely [32][33][34].
These vibrations can be detected by time frequency analysis, so machine learning based audio analysis methods are appropriate for diagnosing PD.
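As a rough illustration of how time frequency analysis can expose such vibrations, the following Python sketch computes a short-time Fourier transform with NumPy and compares a steady tone with a tone whose amplitude wavers. The signals are synthetic, and the sampling rate, frame length, tremor rate, and the `energy_fluctuation` measure are assumptions made for illustration only; they are not the features or the data used in this paper.

```python
import numpy as np

def stft_magnitude(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: one row per Hann-windowed frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop:i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))

def energy_fluctuation(signal):
    """Relative frame-to-frame variation of spectral energy (hypothetical measure)."""
    energy = np.sum(stft_magnitude(signal) ** 2, axis=1)
    return float(np.std(energy) / np.mean(energy))

fs = 8000                                  # assumed sampling rate in Hz
t = np.arange(fs) / fs                     # one second of samples
steady = np.sin(2 * np.pi * 150 * t)       # smooth 150 Hz tone
# Same tone with an assumed 5 Hz amplitude wobble standing in for vocal tremor.
tremor = (1 + 0.3 * np.sin(2 * np.pi * 5 * t)) * steady

print("steady:", energy_fluctuation(steady))
print("tremor:", energy_fluctuation(tremor))
```

In this toy setting the wavering tone shows a much larger frame-to-frame energy variation than the steady one, which is the kind of difference a classifier can learn from time frequency features.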

Machine Learning for Parkinson's Disease Diagnostic Care.
Many researchers have proposed effective machine learning based methods for automated diagnosis. In 2014, Shahbakhi et al. [43] proposed a method that uses a genetic algorithm and a support vector machine to analyze speech for the diagnosis of PD. Little et al. [44] used a support vector machine (SVM) with a Gaussian radial basis kernel to diagnose PD. Shahbaba and Neal [35] presented a nonlinear model for PD classification based on Dirichlet mixtures. Sakar and Kursun [45] combined a mutual information measure with an SVM. These methods achieve high classification accuracy, but PD telediagnosis needs a method with still higher classification performance.
Recently, deep neural networks have shown potential in speech recognition tasks, where their classification and recognition accuracy is superior to that of conventional machine learning methods. We propose a method that uses a deep neural network (stacked autoencoders, SAE) to reduce dimensions and a nearest neighbor classifier to diagnose PD. We also implemented a telediagnosis system based on the proposed method. Results of an empirical test on a smartphone are presented in Section 4.

Structure of PD Telediagnosis Method and System
Structure. The proposed structure of the PD telediagnosis method is shown in Figure 2. A patient provides personal information and speech records by following instructions on the smartphone. The personal information includes gender, age, and a brief health history. The patient is also asked to read a given text; time frequency based features are then extracted from the recorded voice samples. After processing by the SAE and KNN, the patient receives the diagnosis result. Figure 3 shows the workflow of the proposed method from the machine learning point of view: an appropriate set of time frequency features, the SAE, and the classifier together determine diagnostic accuracy. The most important task is therefore to build a high accuracy diagnostic method. Figure 4 briefly illustrates the architecture of the proposed method on a B/S (browser/server) structure; details are given in Section 4.4. The server runs an operating system on which an appropriate version of web service software is deployed. A smartphone usually has an embedded Internet browser (such as the Google Chrome app), so it can exchange data (text and voice records) with the server through the browser. The connection between smartphone and server can be a 2G/3G/4G mobile network or Wi-Fi. The server receives and processes the audio files as shown in Figures 1 and 2, and the PD diagnosis is displayed on the patient's smartphone. This B/S structure is not limited to smartphones; it is also suitable for other electronic devices, such as an iPad, a notebook, or even a smart watch. Moreover, an automated voice service system (such as a bank's telephone service) could in theory fulfill the same task as the B/S structure.

Speech Features.
Dysphonia [17] is a typical speech problem of people with PD. Dysphonia is a human vocal problem which includes five major clinical features: loudness, decrease, breathiness, roughness, and exaggerated vocal tremor in voice. All of these indications can be detected by time frequency analysis of speech records. A set of 26 features is selected based on previous work in this field [21,46]. Table 1 lists the 26 features, grouped into 6 types of parameters: frequency, pulse, amplitude, voicing, pitch, and harmonicity parameters.

Stacked Autoencoders.
Autoencoders [47] have been widely used in unsupervised feature learning and speech recognition tasks. An autoencoder can be built as a special three-layer neural network consisting of the input layer, the hidden layer, and the reconstruction layer (as shown in Figure 5).
An autoencoder has two parts: (1) The encoder maps an input $x_0 \in \mathbb{R}^{d_0}$ to the hidden layer (latent representation feature) $x_1 \in \mathbb{R}^{d_1}$ via a mapping $f_{\mathrm{encoder}}$:

$$x_1 = f_{\mathrm{encoder}}(W_1^{T} x_0 + b_1), \tag{1}$$

where $f_{\mathrm{encoder}}$ is the activation function, whose input is called the activation of the hidden layer, and $\{W_1, b_1\}$ is the parameter set with a weight matrix $W_1 \in \mathbb{R}^{d_0 \times d_1}$ and a bias value vector $b_1 \in \mathbb{R}^{d_1}$.
(2) The decoder maps the hidden representation $x_1$ back to a reconstruction $x_2 \in \mathbb{R}^{d_0}$ via a mapping function $f_{\mathrm{decoder}}$:

$$x_2 = f_{\mathrm{decoder}}(W_2^{T} x_1 + b_2), \tag{2}$$

where $f_{\mathrm{decoder}}$ is the activation function of the decoder with parameters $\{W_2, b_2\}$, $W_2 \in \mathbb{R}^{d_1 \times d_0}$, $b_2 \in \mathbb{R}^{d_0}$. The input of $f_{\mathrm{decoder}}$ is defined as the activation of the reconstruction layer. Parameters are learned through backpropagation by minimizing the loss $L(x_0, x_2)$:

$$L(x_0, x_2) = L_r(x_0, x_2) + \lambda \left( \lVert W_1 \rVert_2^2 + \lVert W_2 \rVert_2^2 \right). \tag{3}$$

In (3), $L(x_0, x_2)$ consists of the reconstruction error $L_r$ and the $L_2$ regularization of $W_1$ and $W_2$, weighted by $\lambda$. By minimizing the reconstruction error $L_r$, the hidden feature should be able to reconstruct the original input $x_0$ as closely as possible.
The stacked autoencoders (SAE) [48] are multiple layers of autoencoders. They are a deep learning approach for dimensionality reduction and feature learning. As Figure 6 shows, there are n autoencoders which are trained one by one. The input vectors are fed to the first autoencoder. After the first autoencoder has been trained, its output hidden representation is propagated to the second autoencoder. The sigmoid function is a typical choice for the activation functions $f_{\mathrm{encoder}}$ and $f_{\mathrm{decoder}}$. After this pretraining stage, the whole SAE is fine-tuned [48] based on a predefined objective. The last hidden layer of the SAE encoder can then feed other components, such as an SVM for a classification task.
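The layer-by-layer pretraining described above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the Matlab implementation used in this paper: the class name `Autoencoder`, the function `pretrain_stack`, the squared-error reconstruction term, the learning rate, the regularization weight `lam`, and the synthetic input data are all hypothetical, and the final fine-tuning stage is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Autoencoder:
    """One sigmoid autoencoder; rows of X are samples, so X @ W equals W^T x per sample."""

    def __init__(self, d0, d1, lam=1e-4, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, size=(d0, d1))   # encoder weights
        self.b1 = np.zeros(d1)
        self.W2 = rng.normal(0.0, 0.1, size=(d1, d0))   # decoder weights
        self.b2 = np.zeros(d0)
        self.lam = lam                                   # assumed L2 weight

    def encode(self, X0):
        return sigmoid(X0 @ self.W1 + self.b1)

    def decode(self, X1):
        return sigmoid(X1 @ self.W2 + self.b2)

    def loss(self, X0):
        """Squared reconstruction error (assumed form) plus L2 penalty on both weight matrices."""
        X2 = self.decode(self.encode(X0))
        recon = 0.5 * np.mean(np.sum((X0 - X2) ** 2, axis=1))
        reg = self.lam * (np.sum(self.W1 ** 2) + np.sum(self.W2 ** 2))
        return recon + reg

    def train_step(self, X0, lr=0.5):
        """One gradient-descent step of backpropagation on a mini-batch."""
        n = X0.shape[0]
        X1 = self.encode(X0)
        X2 = self.decode(X1)
        delta2 = (X2 - X0) * X2 * (1.0 - X2) / n         # reconstruction layer
        delta1 = (delta2 @ self.W2.T) * X1 * (1.0 - X1)  # hidden layer
        grad_W2 = X1.T @ delta2 + 2.0 * self.lam * self.W2
        grad_W1 = X0.T @ delta1 + 2.0 * self.lam * self.W1
        self.W2 -= lr * grad_W2
        self.b2 -= lr * delta2.sum(axis=0)
        self.W1 -= lr * grad_W1
        self.b1 -= lr * delta1.sum(axis=0)

def pretrain_stack(X, layer_sizes, epochs=100, batch_size=20, seed=0):
    """Greedy pretraining: train one autoencoder, then feed its codes to the next."""
    rng = np.random.default_rng(seed)
    stack, H = [], X
    for d1 in layer_sizes:
        ae = Autoencoder(H.shape[1], d1, seed=seed)
        for _ in range(epochs):
            order = rng.permutation(len(H))
            for start in range(0, len(H), batch_size):
                ae.train_step(H[order[start:start + batch_size]])
        stack.append(ae)
        H = ae.encode(H)          # hidden representation for the next layer
    return stack, H

# Synthetic stand-in for 22-dimensional feature vectors scaled to [0, 1].
demo_rng = np.random.default_rng(1)
X_demo = demo_rng.random((60, 3)) @ demo_rng.random((3, 22)) / 3.0
sae, codes = pretrain_stack(X_demo, layer_sizes=(9, 7))
print(codes.shape)                # reduced representation, one row per sample
```

The reduced codes returned by `pretrain_stack` are what a downstream classifier would consume in place of the original feature vectors.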

KNN Classifier.
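The KNN [49] classifier is quite simple: given a speech record $x$ of an undiagnosed patient, the system finds the $k$ nearest neighbors of $x$ to give the diagnosis result. Formally, the decision rule can be written as

$$y(x) = \arg\max_{c_j} \sum_{x_i \in \mathrm{KNN}(x)} \delta(x_i, c_j).$$

Above, $\mathrm{KNN}(x)$ indicates the set of $k$ nearest neighbors of speech record $x$. $\delta(x_i, c_j)$ is the classification of neighbor $x_i$ with respect to class $c_j$, and

$$\delta(x_i, c_j) = \begin{cases} 1, & x_i \in c_j, \\ 0, & \text{otherwise}. \end{cases}$$

For an undiagnosed patient $x$, the class with the largest sum is given as the diagnosis result. The $k$ of KNN is 1 in this paper.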
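The nearest neighbor vote can be sketched as follows. This is an illustrative sketch rather than the implementation used in this paper: the function name `knn_predict` is hypothetical, and Euclidean distance is an assumption, since the distance measure is not stated here. With `k=1` the vote reduces to copying the label of the single closest training record.

```python
import numpy as np

def knn_predict(train_X, train_y, query_X, k=1):
    """Label each query row by majority vote among its k nearest training rows."""
    train_X = np.asarray(train_X, dtype=float)
    train_y = np.asarray(train_y)
    query_X = np.asarray(query_X, dtype=float)
    # Squared Euclidean distances (assumed metric), shape (n_query, n_train).
    diff = query_X[:, None, :] - train_X[None, :, :]
    dist = np.sum(diff ** 2, axis=2)
    nearest = np.argsort(dist, axis=1, kind="stable")[:, :k]
    predictions = []
    for neighbor_idx in nearest:
        labels, counts = np.unique(train_y[neighbor_idx], return_counts=True)
        predictions.append(labels[np.argmax(counts)])   # class with the most votes
    return np.array(predictions)

# Toy example with made-up 2-D points: label 1 = PD, label 0 = healthy.
train_X = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
train_y = [0, 0, 1, 1]
print(knn_predict(train_X, train_y, [[0.1, 0.2], [0.8, 0.9]], k=1))
```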

Speech Datasets, Evaluation Criteria, and Classifiers.
For comparison, we chose two research papers [21,50] and the PD speech datasets they used. We use Matlab as the programming tool, and the parameters of all classification algorithms are determined by grid search.
In the first paper, Sakar et al. [21] collected a speech dataset for the diagnosis of PD and donated it to the UCI Machine Learning Repository for other researchers. This speech dataset contains a training set file, a testing set file, and a ZIP package of WMA files. Sakar et al. designed a novel speech test in which PD patients were asked to say only the sustained vowels "a" and "o" three times. The dataset was collected at Istanbul University, and we call it the "Istanbul Dataset" in the following experiments.
In the second paper, Ma et al. [50] proposed a kernel extreme learning machine with a subtractive clustering features weighting approach. They compared their method with the methods of 15 other studies; among the 16 methods compared in their paper, the method of Ma et al. ranked first. The dataset used in their research was created by Little et al. [44] of the University of Oxford, in collaboration with the National Centre for Voice and Speech, Denver, Colorado, who recorded the speech signals. We call it the "Oxford Dataset" in the following experiments.
Accuracy, sensitivity, and other performance indexes are compared among the classifiers. These performance indexes are defined as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN},$$

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN},$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP},$$

$$F\text{-score} = \frac{2\,TP}{2\,TP + FP + FN},$$

where TP, FP, TN, and FN are the true positive, false positive, true negative, and false negative classifications of a classifier, defined as shown in Table 2.
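For reference, these indexes can be computed directly from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name `diagnostic_metrics` and the example counts are hypothetical and are not results from this paper.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard performance indexes from confusion-matrix counts (PD is the positive class)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),         # share of PD cases that are detected
        "specificity": tn / (tn + fp),         # share of healthy cases that are cleared
        "f_score": 2 * tp / (2 * tp + fp + fn),
    }

# Made-up counts for illustration only.
metrics = diagnostic_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The function assumes that each denominator is nonzero, that is, the test set contains at least one positive and one negative case.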
The SAE of the proposed method has two hidden layers: layer 1 and layer 2. The size of layer 1 is set to 10, 9, or 8 neurons, and the size of layer 2 is set to 8, 7, or 6 neurons. The batch size of the SAE is 20 in the training and fine-tuning steps. All comparative classifiers are optimized by grid search. We use a 50-50% training-testing split, as Polat [40] and Daliri [39] did. This protocol tests the comparative approaches with much less training data than 10-fold cross validation. Each classifier was tested 10 times, and the results are presented in the tables.
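The evaluation protocol (a 50-50% split repeated 10 times, reporting max, mean, and min accuracy) can be sketched as below. This is a sketch under assumptions: the function names are hypothetical, the splits are drawn uniformly at random without stratification (how the split was drawn is not stated here), and the data are synthetic clusters rather than either speech dataset.

```python
import numpy as np

def one_nn(train_X, train_y, test_X):
    """1-NN with squared Euclidean distance (assumed metric)."""
    dist = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    return train_y[dist.argmin(axis=1)]

def repeated_holdout(X, y, classify, runs=10, train_fraction=0.5, seed=0):
    """Return (max, mean, min) test accuracy over repeated random splits."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train_fraction * len(X)))
    accuracies = []
    for _ in range(runs):
        order = rng.permutation(len(X))
        train_idx, test_idx = order[:n_train], order[n_train:]
        predicted = classify(X[train_idx], y[train_idx], X[test_idx])
        accuracies.append(float(np.mean(predicted == y[test_idx])))
    return max(accuracies), float(np.mean(accuracies)), min(accuracies)

# Two well-separated synthetic clusters stand in for healthy (0) and PD (1) records.
data_rng = np.random.default_rng(3)
X = np.vstack([data_rng.normal(0.0, 0.1, (20, 4)),
               data_rng.normal(5.0, 0.1, (20, 4))])
y = np.array([0] * 20 + [1] * 20)

best, mean, worst = repeated_holdout(X, y, one_nn)
print(f"max={best:.4f} mean={mean:.4f} min={worst:.4f}")
```

Any classifier with the same `(train_X, train_y, test_X)` call signature can be passed in place of `one_nn`, which makes it straightforward to compare several classifiers under the same splits by reusing the seed.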

Classifiers without Deep Neural Network on Oxford Dataset.
Table 3 summarizes the detailed classification accuracy over 10 runs. The table shows that KNN clearly stands out: it outperforms the other 7 classifiers with a max, mean, and min accuracy of 90.53%, 82.76%, and 76.84%, respectively. All comparative classifiers give low classification results because the input samples are 22-dimensional data. Tables 4 and 5 present the comparison of classification accuracy and the other performance indexes. Table 4 lists the details of accuracy. For the SAE, the influence of the two layers on dimension reduction has been investigated; in this study, the sizes of the two hidden layers are tested as a subgrid search. As seen from Table 4, the 1NN classifier gives more correct diagnosis results than the other methods.

Classifiers with Deep Neural Network on Oxford Dataset.
Comparing Tables 3 and 4, the other 7 classifiers gave more correct classification results when the SAE was used (hidden layer 1 contains 9 neurons; hidden layer 2 contains 7 neurons). KELM obtained a max classification accuracy of 98.81%, which is about 16 percentage points higher than the 82.23% in Table 3. LSVM gives 96.71% classification accuracy after dimension reduction by SAE 8-7. In the SAE 10-7 row, the classification accuracy of MSVM increased by 11%. RSVM gives 97.75% classification accuracy in the SAE 9-7 row. Similarly, CART produced an average classification accuracy of 96% when the SAE was applied. LDA and NB also perform better than without the SAE; the classification accuracy of these 7 classifiers improves by 10-15% with the SAE.
28 PD patients are asked to say only the sustained vowels "a" and "o" three times each, which makes a total of 168 recordings. The same 26 features are extracted from the voice samples of this dataset. The researchers of Istanbul University stated [22] that this PD dataset can be used as an independent test set to validate the results obtained on the training set. The training file contains 1040 recordings and the testing file contains 168 recordings. Therefore, we used the training file and the testing file to compare the proposed method with other methods.

Experiment Results: Istanbul Dataset and Parameters
The SAE of the proposed method has two hidden layers: layer 1 and layer 2. Layer 1 is set to 10, 9, or 8 neurons, and layer 2 is set to 8, 7, or 6 neurons. The batch size of the SAE is 20 in the training and fine-tuning steps.

Classifiers without Deep Neural Network on Istanbul Dataset.
Table 6 summarizes the classification accuracy. Following the declaration of Istanbul University, fixed training and testing sets are used, so a single run yields the final result. The table shows that RSVM clearly stands out: it outperforms the other 7 classifiers with a classification accuracy of 76.41%. All comparative classifiers give low classification results because the input samples are 26-dimensional data. The Naive Bayesian method gives the worst result among all classifiers.

Classifiers with Deep Neural Network on Istanbul Dataset.
Tables 7 and 8 present the comparison of classification accuracy and the other 4 performance indexes. Table 7 lists the details of classification accuracy. For the SAE, the two hidden layers are set as in the Oxford Dataset experiment. As seen from Table 7, the KNN classifier gives more correct diagnosis results than the other methods. The max, mean, and min classification accuracy of the KNN classifier are not affected by the network structure of the SAE. Comparing Tables 6 and 7, the other 7 classifiers gave more correct classification results when the SAE was used, and LSVM, RSVM, and CART show much more stable classification accuracy. However, KELM still did not obtain 100.00% classification accuracy (Table 6). LSVM gives 92.00% classification accuracy after dimension reduction by SAE 9-7. In the SAE 10-7, SAE 10-6, and SAE 9-6 rows, LSVM did not give perfect performance. MSVM, LDA, and NB increased in classification accuracy but still lack stability. In conclusion, every classifier performs better with the SAE than without it.

Implementation on B/S System.
As Figure 7 shows, our smartphone runs the Android system with the Google Chrome web browser installed. The server runs the Windows operating system and Internet Information Services. We also installed Matlab 2016a and Visual Studio 2013 on this server. The programming techniques include Matlab, C#, and HTML5. We used Chrome to connect to the server via a 4G mobile network in the street and sent 28 test speech records (WMA files) of the Istanbul Dataset; the transmission of 12.1 MB of data to our laboratory took less than 5 seconds. The server received the speech records and ran the empirical experiments presented in Sections 4.2 and 4.3. Within 2 minutes, the results of all comparative classifiers were displayed on our smartphone. We also recorded our own speech (as healthy samples) and sent it to the server for testing, but it should be noted that a quiet environment is necessary: it is hard to achieve satisfactory classification accuracy in noisy circumstances.

Discussions and Future Work.
Table 9 summarizes the comparative results reported in related research. The proposed method achieves better results than the other methods while using relatively fewer training samples. The reduced training dataset is meaningful when applying the proposed method in practice. The performance of SAE and KNN is robust to the number of hidden neurons of the deep neural network. We focus on classification accuracy, sensitivity, and other performance indexes to evaluate the machine learning based PD diagnosis method; we then chose a B/S structure to test the proposed architecture. Its advantages include time saving, convenience, and low cost. However, we did not achieve 100% classification accuracy on the smartphone. We deduce that there are still potential problems: (1) A good enough microphone should be taken into consideration. (2) Speech denoising may be achieved with multiple microphones. (3) An evolutionary system could cope with large amounts of testing data in the real world. Future investigation will focus on an evolutionary PD telediagnosis system and on solving the above issues.
Moreover, we also noticed that two public datasets cannot guarantee that a machine learning based telediagnosis system is trained very well, and it is hard to satisfy hospitals, clinics, and other medical institutions. The stability and robustness of a telediagnosis system also still need to be established urgently.

Table 9 (partial): comparative results from related research.
Study                    Test method   Accuracy (%)   Training data
[36]                     10-fold CV    89.47          90%
Guo et al. [37]          10-fold CV    93.10          90%
Ozcift and Gulten [38]   10-fold CV    87.10          90%

Conclusion
To build a convenient and feasible PD telediagnosis service via smartphone, we proposed a machine learning based method on a browser/server architecture. The proposed method uses stacked autoencoders and a KNN classifier to process speech records, and it remaps time frequency features into a low dimensional space. Results show that the proposed method with the KNN classifier can give doctor-level classification results on public PD speech records. An experimental system was also built for testing; we expect that telediagnosis of PD on a smartphone will become available in the future.