Diagnosing Parkinson's Diseases Using Fuzzy Neural System

This study presents the design of a recognition system that discriminates between healthy people and people with Parkinson's disease. The diagnosis of Parkinson's disease is performed by fusing a fuzzy system with neural networks. The structure and learning algorithms of the proposed fuzzy neural system (FNS) are presented. The approach described in this paper enhances the capability of the designed system and allows it to distinguish healthy individuals efficiently. This was demonstrated through simulations performed using data obtained from the UCI machine learning repository. A comparative study was carried out, and the simulation results demonstrated that the proposed fuzzy neural system improves the recognition rate of the designed system.


Introduction
Many people around the world suffer from Parkinson's disease (PD). The disease most often appears after the age of 60 [1]. Parkinson's disease is a chronic disorder of the central nervous system that causes the death of nerve cells in the brain. Parkinson's disease is progressive, and the number of people suffering from the disease is expected to rise. The disease usually develops slowly and persists over a long period of time.
The symptoms of PD continue and worsen over time. The basic symptoms of PD are movement related. These are tremor, rigidity or stiffness of the limbs and trunk, bradykinesia or slow movement, and problems with balance or walking [2,3]. Tremor is a basic symptom that may appear as shaking or trembling of the legs, arms, hands, jaw, or face. As these symptoms become more pronounced, patients may have difficulty talking, walking, or completing other simple tasks. Other symptoms are related to behavioural problems, depression, thinking, sleep, and emotional problems. A person with Parkinson's may have trouble speaking, swallowing, and chewing. Nonmotor features, such as dementia and dysautonomia, occur frequently, especially in advanced stages of the disease.
Diagnosis and timely treatment are important for managing the symptoms of PD. The diagnosis is based on a neurological examination and the medical history of the patient. Diagnosing the disease in its early stages is difficult [3]. Diagnosis of PD depends on the presence of two or more of the above symptoms.
Vocal symptoms that include impairment of vocal sounds (dysphonia) and problems with the normal articulation of speech (dysarthria) are important in the diagnosis of PD [4]. The research paper [5] shows that the most important symptom of PD is dysphonia, a disorder of the voice. Dysphonic symptoms typically include reduced loudness, roughness, breathiness, decreased energy in the higher parts of the harmonic spectrum, and exaggerated vocal tremor. The treatment of these symptoms is difficult for people with Parkinson's disease. In [4,5,6] it was shown that approximately 90% of people with Parkinson's disease have dysphonia. Dysphonia includes any pathological or functional problem with the voice [6]. The voice will sound hoarse, strained, or effortful, and it may be difficult to understand the voice of people with PD. The method used for diagnosing PD is based mainly on speech measurements for general voice disorders [4,7,8,9]. Specialist doctors need to analyse many factors in order to diagnose PD accurately. Usually, decisions are based on evaluating the current test results of patients.
The problem becomes very difficult if the number of attributes that the specialist has to evaluate is high. Recently, various computational tools have been developed in order to improve the accuracy of the diagnosis of PD. These tools provide valuable help to doctors and medical specialists in making decisions about patients. Different Artificial Intelligence (AI) techniques, expert systems, and decision making systems have been designed for the diagnosis or classification of diseases, and they serve as good supportive tools for the expert or doctor. The development of efficient recognition systems in medical diagnosis is becoming more important. Nowadays various AI techniques such as expert systems, fuzzy systems, and neural networks are actively applied to the diagnosis of Parkinson's disease using voice signals. Reference [4] introduces a new measure of dysphonia, pitch period entropy (PPE), which is robust to many uncontrollable confounding effects, including noisy acoustic environments, and separates healthy people from people with PD. Nonlinear dynamical systems theory [4,10] and statistical learning theory, such as linear discriminant analysis (LDA) and support vector machines (SVMs) [5,11], are preferred for classifying people as healthy or as having PD on the basis of measures of dysphonia. Different techniques, such as SVM [12], SVM with an RBF (radial basis function) kernel [13], SVM with a Multilayer Perceptron (MLP), and a Radial Basis Function Network (RBFN) [14], are used for the diagnosis of PD. In [15] an integration of the Kohonen self-organizing map (KSOM) and the least squares support vector machine (LS-SVM) is applied for diagnosing PD, and in [3,16] nonlinear time series analysis tools are applied for the same purpose. Reference [17] uses the fuzzy c-means algorithm, and [18] uses four independent classification schemas (neural networks, DMneural, regression, and decision tree) for classification and carries out a comparative study.
The above methods are used in order to increase the classification accuracy for PD. Classification systems can help to increase the accuracy and reliability of diagnoses, minimize possible errors, and make diagnoses more time efficient. Success in the discovery of knowledge depends on the ability to explore different classes of specific data and to apply appropriate methods in order to extract the main features. This paper deals with the fusion of fuzzy systems and neural networks for designing a recognition system for PD.
Fuzzy systems can handle uncertainties associated with information or data in knowledge bases [19] and are widely used to solve different real world problems. A fuzzy system uses data and knowledge specific to the chaotic dynamics of the process and increases the performance of the system. In the literature, different neural and fuzzy structures are proposed for solving various problems [20,21,22,23,24,25,26]. In [22,23] a clustering algorithm and a gradient descent algorithm are applied to the design of a multi-input and single-output FNS. The well known ANFIS (adaptive neurofuzzy inference system) structure is used for cervical cancer recognition [27], for optimizing chiller loading [28], and for distinguishing ESES (electrical status epilepticus) and normal EEG (electroencephalography) signals [29]. The use of multiple ANFIS structures in [27] increases the number of parameters of the network. The systems used in these papers are designed for special purposes, and most of them are based on Mamdani type rules. The performance of these systems is determined by measuring the classification rate. In this paper, in order to improve the performance of the classification system, a multi-input and multi-output fuzzy neural system (FNS) based on TSK rules is proposed for identification of PD.
The paper is organized as follows. Section 2 describes the structure of proposed fuzzy neural system used for recognition of PD. The parameter update rule of the proposed system is presented in Section 3. Section 4 describes the simulation results. The conclusions are given in Section 5.

FNS Based Recognition
The fuzzy neural system combines the learning capabilities of neural networks with the linguistic rule interpretation of fuzzy inference systems. The design of the FNS includes the development of fuzzy rules that have if-then form. This can be achieved by optimally defining the premise and consequent parts of the fuzzy if-then rules for the classification system through the training capability of neural networks. The two basic types of if-then rules used in fuzzy systems are Mamdani and Takagi-Sugeno-Kang (TSK) type fuzzy rules. The first type consists of rules whose antecedent and consequent parts both utilize fuzzy values. The second type uses fuzzy rules that have fuzzy antecedent and crisp consequent parts. In this paper we use TSK type fuzzy rules for system design. This type of fuzzy system approximates a nonlinear system with linear systems and has the following form:

$$\text{If } x_1 \text{ is } A_{1j} \text{ and } x_2 \text{ is } A_{2j} \text{ and } \ldots \text{ and } x_m \text{ is } A_{mj} \text{ then } y1_j = b_j + \sum_{i=1}^{m} a_{ij} x_i, \tag{1}$$

where $x_i$ and $y1_j$ are the input signals and the rule outputs of the system, respectively, $i = 1, \ldots, m$ with $m$ the number of input signals, and $j = 1, \ldots, r$ with $r$ the number of rules. $A_{ij}$ are input fuzzy sets; $b_j$ and $a_{ij}$ are coefficients.
The structure of the fuzzy neural network used for the classification of PD is based on TSK type fuzzy rules and is given in Figure 1. The FNS includes six layers. In the first layer, the input signals $x_i$ ($i = 1, \ldots, m$) are distributed. The second layer includes membership functions; each node corresponds to one linguistic term. For each input signal entering the system, the membership degree to which the input value belongs to a fuzzy set is calculated. The Gaussian membership function is used to describe linguistic terms:

$$\mu1_j(x_i) = \exp\left(-\frac{(x_i - c_{ij})^2}{\sigma_{ij}^2}\right), \quad i = 1, \ldots, m, \; j = 1, \ldots, r, \tag{2}$$

where $c_{ij}$ and $\sigma_{ij}$ are the center and width of the Gaussian membership functions, respectively, and $\mu1_j(x_i)$ is the membership function of the $i$th input variable for the $j$th term.
The third layer is the rule layer. The number of nodes in this layer is equal to the number of rules, and $R_1, R_2, \ldots, R_r$ represent the rules. The output signals of this layer are calculated using the t-norm min (AND) operation:

$$\mu_j(x) = \prod_{i} \mu1_j(x_i), \quad i = 1, \ldots, m, \; j = 1, \ldots, r, \tag{3}$$

where $\Pi$ is the min operation. These $\mu_j(x)$ signals are input signals for the fifth layer. The fourth layer is the consequent layer. It includes linear systems.
Here the output values of the rules are determined using linear functions (LF):

$$y1_j = b_j + \sum_{i=1}^{m} a_{ij} x_i. \tag{4}$$

In the fifth layer, the output signals of the third layer are multiplied by the output signals of the fourth layer. The output of the $j$th node is calculated as $y_j = \mu_j(x) \cdot y1_j$.
The output signals of the FNS are determined as

$$u_k = \frac{\sum_{j=1}^{r} w_{jk} y_j}{\sum_{j=1}^{r} \mu_j(x)}. \tag{5}$$

Here $u_k$ are the output signals of the FNS ($k = 1, \ldots, n$) and $w_{jk}$ are the weight coefficients of the connections between layers 5 and 6. After calculating the output signal, the training of the network starts.
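The six-layer computation described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `fns_forward`, the array shapes, the Gaussian form exp(-(x - c)^2 / sigma^2), and the normalisation of the output by the sum of firing strengths are all assumptions made for the example.

```python
import numpy as np

def fns_forward(x, centers, widths, a, b, w):
    """Forward pass of a TSK-type fuzzy neural system for one input vector.

    Shapes (m inputs, r rules, n outputs), all assumed for this sketch:
      x (m,), centers (m, r), widths (m, r), a (m, r), b (r,), w (r, n)
    """
    # Layer 2: Gaussian membership degree of every input in every rule
    mu1 = np.exp(-((x[:, None] - centers) ** 2) / widths ** 2)  # (m, r)
    # Layer 3: rule firing strength via the min t-norm over the inputs
    mu = mu1.min(axis=0)                                        # (r,)
    # Layer 4: linear (TSK) consequent of every rule
    y1 = b + x @ a                                              # (r,)
    # Layer 5: firing strength times consequent
    y = mu * y1                                                 # (r,)
    # Layer 6: weighted sum, normalised by total firing strength (assumed)
    u = (y @ w) / (mu.sum() + 1e-12)                            # (n,)
    return u, mu
```

With 23 inputs, 16 rules, and 2 outputs, `centers`, `widths`, and `a` would have shape (23, 16), `b` shape (16,), and `w` shape (16, 2).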

Fuzzy Classification.
The design of the FNS (Figure 1) includes determination of the unknown parameters of the antecedent and the consequent parts of the fuzzy if-then rules (1). In fuzzy rules the antecedent part represents the input space by dividing the space into a set of fuzzy regions, and the consequent part describes the system behaviour in those regions. As mentioned above, a number of different approaches have recently been used for designing fuzzy if-then rules. Some of them are based on clustering [20,21,22,23,24,26], the least squares method (LSM) [20,22,30], gradient algorithms [14,20,21,22,23,26], genetic algorithms [24,25,28], and particle swarm optimization (PSO) [31].
In this paper, fuzzy clustering and a gradient technique are used for the design of the FNS. First, fuzzy clustering is used to design the antecedent (premise) parts, and then the gradient algorithm is used to design the consequent parts of the fuzzy rules. Fuzzy clustering is an efficient technique for constructing the antecedent structures. The aim of clustering methods is to identify a certain group of data from a large data set, such that a concise representation of the behaviour of the system is produced. Each cluster center can be translated into a fuzzy rule for identifying the class. Different clustering algorithms have been developed [32,33,34]. Recently, fuzzy c-means [32] and subtractive clustering [33,34] algorithms have been developed for fuzzy systems. Subtractive clustering is an unsupervised algorithm [33] that extends grid-based mountain clustering [34]. Here the number of clusters for the input data points is determined by the clustering algorithm. Sometimes we need to control the number of clusters in the input space. In these cases, supervised clustering algorithms are of primary concern, and fuzzy c-means clustering is one of them. It can be used efficiently for fuzzy systems [32] with a simple structure and sufficient accuracy. In this paper, the fuzzy c-means (FCM) clustering technique is used for structuring the premise part of the fuzzy system.
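Learning of the FNS starts with the update of the parameters of the antecedent part of the if-then rules, that is, the parameters of the second layer of the FNS. For this purpose FCM classification is applied in order to partition the input space and construct the antecedent part of the fuzzy if-then rules. The following objective function is used in the FCM algorithm:

$$J_q = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{q} \, \lVert x_i - c_j \rVert^{2}, \quad 1 \le q < \infty, \tag{6}$$

where $q$ is any real number greater than 1, $u_{ij}$ is the degree of membership of $x_i$ in the cluster $j$, $x_i$ is the $i$th of the $d$-dimensional measured data, $c_j$ is the $d$-dimensional center of the cluster, and $\lVert * \rVert$ is any norm expressing the similarity between any measured data and the cluster centers. In this subsection $x_i$ denotes a whole data vector, $N$ is the number of data vectors, and $C$ is the number of clusters.
The fuzzy classification of the input data is carried out through an iterative optimization of the objective function (6), with the update of the memberships $u_{ij}$ and the cluster centers $c_j$. The algorithm consists of the following steps:
(1) Initialize the membership matrix $U = [u_{ij}]$, $U^{(0)}$.
(2) Calculate the center vectors $C^{(t)} = [c_j]$ with $U^{(t)}$:

$$c_j = \frac{\sum_{i=1}^{N} u_{ij}^{q} x_i}{\sum_{i=1}^{N} u_{ij}^{q}}. \tag{7}$$

(3) Update $U^{(t)}$ and $U^{(t+1)}$:

$$u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \lVert x_i - c_j \rVert / \lVert x_i - c_k \rVert \right)^{2/(q-1)}}. \tag{8}$$

(4) If $\max_{ij} \{ | u_{ij}^{(t+1)} - u_{ij}^{(t)} | \} < \varepsilon$ then stop; otherwise set $t = t + 1$ and return to Step (2).
As a result of partitioning, the cluster centers are determined. These cluster centers correspond to the centers of the membership functions used in the input layer of the FNS. The width of each membership function is determined using the distance between cluster centers.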
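The clustering stage can be sketched as follows. This is a minimal NumPy version of the standard fuzzy c-means iteration, not the authors' code. The function names, the default fuzzifier `q=2.0`, the tolerance, and the seed are assumptions. The text states only that widths come from the distance between cluster centers, so `widths_from_centers` uses one hypothetical heuristic: the per-dimension distance to the nearest other center.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, q=2.0, eps=1e-5, max_iter=300, seed=0):
    """Standard fuzzy c-means on data X of shape (N, d)."""
    rng = np.random.default_rng(seed)
    # Step 1: random initial membership matrix whose rows sum to one
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Step 2: cluster centers as membership-weighted means of the data
        Uq = U ** q
        centers = (Uq.T @ X) / Uq.sum(axis=0)[:, None]          # (C, d)
        # Step 3: new memberships from distances to the centers
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)                              # avoid 0-division
        inv = dist ** (-2.0 / (q - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        # Step 4: stop once the largest membership change is below eps
        converged = np.max(np.abs(U_new - U)) < eps
        U = U_new
        if converged:
            break
    return centers, U

def widths_from_centers(centers, floor=1e-3):
    """Hypothetical width heuristic; needs at least two clusters."""
    diff = np.abs(centers[:, None, :] - centers[None, :, :])    # (C, C, d)
    idx = np.arange(centers.shape[0])
    diff[idx, idx, :] = np.inf                                   # ignore self
    return np.maximum(diff.min(axis=1), floor)                   # (C, d)
```

The returned `centers` have shape (C, d), so they would need to be transposed to serve as an (inputs, rules) array of membership centers.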
After the antecedent parts are designed by fuzzy clustering, the parameter update rules are derived for training the parameters of the consequent parts of the fuzzy rules. In this paper, we apply gradient learning with an adaptive learning rate. The adaptive learning rate guarantees convergence and speeds up the learning of the network.

Learning Using Gradient Descent.
At the beginning, the parameters of the FNS are generated randomly. To generate a proper FNS model, the parameters are then trained. For generality, we give the learning procedure for all parameters of the FNS using the gradient descent algorithm. The parameters are those of the membership functions of the linguistic values in the second layer of the network and the parameters of the fourth and fifth layers. In the design of the FNS the cross validation technique is used to separate the data into training and testing sets. Training consists of adjusting the parameter values. In this paper, gradient learning with an adaptive learning rate is applied to update the parameters. The adaptive learning rate guarantees convergence and speeds up the learning of the network. In addition, momentum is used to speed up the learning process.
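The error on the output of the network is calculated as

$$E = \frac{1}{2} \sum_{k=1}^{n} \left( u_k^{d} - u_k \right)^{2}. \tag{9}$$

Here $n$ is the number of output signals of the network, and $u_k^{d}$ and $u_k$ are the desired and current output values of the network, respectively. The parameters $w_{jk}$, $a_{ij}$, and $b_j$ of the consequent part are adjusted using

$$w_{jk}(t+1) = w_{jk}(t) - \gamma \frac{\partial E}{\partial w_{jk}} + \lambda \left( w_{jk}(t) - w_{jk}(t-1) \right),$$
$$a_{ij}(t+1) = a_{ij}(t) - \gamma \frac{\partial E}{\partial a_{ij}} + \lambda \left( a_{ij}(t) - a_{ij}(t-1) \right),$$
$$b_{j}(t+1) = b_{j}(t) - \gamma \frac{\partial E}{\partial b_{j}} + \lambda \left( b_{j}(t) - b_{j}(t-1) \right), \tag{10}$$

and the parameters $c_{ij}$ and $\sigma_{ij}$ of the membership functions are adjusted using

$$c_{ij}(t+1) = c_{ij}(t) - \gamma \frac{\partial E}{\partial c_{ij}} + \lambda \left( c_{ij}(t) - c_{ij}(t-1) \right),$$
$$\sigma_{ij}(t+1) = \sigma_{ij}(t) - \gamma \frac{\partial E}{\partial \sigma_{ij}} + \lambda \left( \sigma_{ij}(t) - \sigma_{ij}(t-1) \right). \tag{11}$$

Here $\gamma$ is the learning rate, $\lambda$ is the momentum, $m$ is the number of input signals of the network (input neurons), $r$ is the number of fuzzy rules (hidden neurons), and $n$ is the number of output neurons; $i = 1, \ldots, m$, $j = 1, \ldots, r$, and $k = 1, \ldots, n$.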
The derivatives in (10) and (11) are computed using formulas (12)-(14), after which the correction of the parameters of the FNS is carried out.
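As a worked illustration, the derivatives can be obtained by the chain rule once a concrete model is fixed. The sketch below is not a reproduction of the paper's formulas (12)-(14). It assumes a squared-error criterion $E$ over the outputs $u_k$ with targets $u_k^d$, a normalised output with weights $w_{jk}$, rule firing strengths $\mu_j(x)$, linear consequents $y1_j = b_j + \sum_i a_{ij} x_i$, and Gaussian memberships $\mu1_j(x_i) = \exp(-(x_i - c_{ij})^2/\sigma_{ij}^2)$.

```latex
\begin{align*}
E &= \tfrac{1}{2}\sum_{k=1}^{n}\bigl(u_k^{d}-u_k\bigr)^{2},
\qquad
u_k=\frac{\sum_{j=1}^{r} w_{jk}\,\mu_j(x)\,y1_j}{S},
\qquad
S=\sum_{l=1}^{r}\mu_l(x),\\[4pt]
\frac{\partial E}{\partial w_{jk}} &= \bigl(u_k-u_k^{d}\bigr)\,\frac{\mu_j(x)\,y1_j}{S},\\
\frac{\partial E}{\partial b_{j}} &= \sum_{k=1}^{n}\bigl(u_k-u_k^{d}\bigr)\,\frac{w_{jk}\,\mu_j(x)}{S},\\
\frac{\partial E}{\partial a_{ij}} &= \sum_{k=1}^{n}\bigl(u_k-u_k^{d}\bigr)\,\frac{w_{jk}\,\mu_j(x)\,x_i}{S},\\
\frac{\partial E}{\partial \mu_j} &= \sum_{k=1}^{n}\bigl(u_k-u_k^{d}\bigr)\,\frac{w_{jk}\,y1_j-u_k}{S},\\
\frac{\partial \mu1_j(x_i)}{\partial c_{ij}} &= \mu1_j(x_i)\,\frac{2\,(x_i-c_{ij})}{\sigma_{ij}^{2}},
\qquad
\frac{\partial \mu1_j(x_i)}{\partial \sigma_{ij}} = \mu1_j(x_i)\,\frac{2\,(x_i-c_{ij})^{2}}{\sigma_{ij}^{3}}.
\end{align*}
```

With the min t-norm, $\mu_j(x)$ equals the smallest $\mu1_j(x_i)$ over the inputs, so under these assumptions $\partial E/\partial c_{ij}$ and $\partial E/\partial \sigma_{ij}$ are the product of $\partial E/\partial \mu_j$ with the last two expressions for the input $i$ that attains the minimum, and zero for the other inputs.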
Convergence is a very important problem in the learning of the FNS model. The convergence of the learning algorithm using gradient descent depends on the selection of the initial value of the learning rate. Usually, the initial value of the learning rate is selected in the interval [0, 1]. A large value of the learning rate may lead to unstable learning; a small value of the learning rate results in a slow learning speed. In this paper an adaptive approach is applied for updating this parameter. The learning of the FNS parameters is started with a small value of the learning rate $\gamma$. During learning, $\gamma$ is increased if the change of error $\Delta E = E(t) - E(t+1)$ is positive and decreased if it is negative. This strategy ensures stable learning for the FNS. In addition, a momentum term is used to speed up the learning process. The optimal value of the learning rate for each time instance can be obtained using a Lyapunov function [22,23]. The derivation of the convergence is given in [22,23].
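The adaptive strategy described above can be sketched generically as gradient descent with momentum, where the learning rate grows after an epoch that lowers the error and shrinks after one that raises it. The function name `train_adaptive` and the factors `up=1.05` and `down=0.7` are hypothetical choices for illustration; the paper does not state the adaptation factors. The defaults `lr=0.02` and `momentum=0.625` follow the initial values reported in the simulation section.

```python
import numpy as np

def train_adaptive(grad_fn, loss_fn, theta, lr=0.02, momentum=0.625,
                   up=1.05, down=0.7, epochs=200):
    """Gradient descent with momentum and an error-driven learning rate."""
    prev_step = np.zeros_like(theta)   # theta(t) - theta(t-1)
    prev_loss = loss_fn(theta)
    history = [prev_loss]
    for _ in range(epochs):
        # Update rule: -lr * gradient plus momentum times the previous step
        step = -lr * grad_fn(theta) + momentum * prev_step
        theta = theta + step
        loss = loss_fn(theta)
        # Increase lr if E(t) - E(t+1) > 0, otherwise decrease it
        lr = lr * up if prev_loss - loss > 0 else lr * down
        prev_step, prev_loss = step, loss
        history.append(loss)
    return theta, lr, history
```

In an FNS setting, `theta` would hold the trainable parameters flattened into one vector, and `grad_fn` would return the corresponding derivatives of the error.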

Simulation Studies
The FNS described above is applied to the classification of Parkinson's disease. The people are divided into two classes: normal and PD. For this purpose, the database is taken from the University of California at Irvine (UCI) machine learning repository. The data set was donated by hospitals and has been studied by many researchers. The data set includes biomedical voice measurements of 31 people, 23 of whom were diagnosed with PD. Each row contains the values of the 23 voice parameters, and each column contains 195 items of data for one parameter. The main aim of the data is to discriminate healthy people from those with PD. The parameters that are used for recognition of PD are given in Table 1. These are the parameters of the voice signals recorded directly on the computer using the Computerized Speech Laboratory. During modelling the input data are preprocessed and normalized to the interval [0, 1]. This scaling makes the training of the system easier. After normalization, the data are fed as input signals to the FNS.
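The normalization to [0, 1] can be sketched as a column-wise min-max scaling. The paper does not state which scaling formula was used, so min-max is an assumption here, and `min_max_normalize` is a hypothetical helper name.

```python
import numpy as np

def min_max_normalize(X):
    """Scale every column of X to the interval [0, 1]."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Constant columns get a span of 1 so that they map to 0 instead of NaN
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span
```

When cross validation is used, the minimum and maximum can also be taken from the training fold only and reused for the test fold.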
To design the classification model, the FNS structure with 23 input and 2 output neurons is generated first. If we use a traditional neurofuzzy structure (e.g., [20] or [26]) for 23 inputs and 2 cluster centers, $2^{23} = 8388608$ rules should be generated, because the rules are constructed using all possible combinations of inputs and cluster centers. This is a very large number. In this paper the number of rules is selected according to the clustering results and is equal to the number of cluster centers.
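The size of the grid-partitioned rule base can be checked with a line of arithmetic. The variable names below are illustrative only, and the cluster-based count of 16 is the largest rule count reported in this study.

```python
n_inputs = 23          # input neurons of the FNS
terms_per_input = 2    # cluster centers (linguistic terms) per input

# Grid partitioning needs one rule per combination of terms
grid_rules = terms_per_input ** n_inputs

# Cluster-based design: one rule per cluster center
cluster_rules = 16

print(grid_rules, cluster_rules)
```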
In the design of the FNS, fuzzy classification is applied in order to partition the input space and select the parameters of the premise parts, that is, the parameters of the Gaussian membership functions used in the second layer of the FNS. FCM clustering is used for the input space with 16 clusters for each input. 16 fuzzy rules are constructed using a different combination of these clusters for 22 inputs. After clustering the input space, the gradient descent algorithm is used for learning the consequent parts of the fuzzy rules, that is, the parameters of the 4th layer of the FNS. Learning is implemented using cross validation. Cross validation generates two independent data sets: training and testing. It is applied to find an accurate classifier model. In this paper 10-fold cross validation is used to separate the data into training and testing sets and to evaluate the classification accuracy. A set of experiments is needed in order to achieve the required accuracy at the FNS output. The simulation is performed using different numbers of neurons in the hidden layer. The design steps of the FNS for diagnosing PD are given below:
(1) Read PD data set. Select input and output (target) signals from statistical data; the target is the health status of the subject (one, Parkinson's; zero, healthy). Apply normalization.
(2) Enter the values of learning rate and momentum. Set the number of clusters. Generate network parameters. Set a maximal number of epochs for learning.
(3) Apply classification algorithm to the input signals and determine the cluster centers.
(4) Use cluster centers to determine the centers of membership functions of layer 2.
(5) Use the centers of membership functions to determine the widths of membership functions.
(6) Using input statistical data define a random partition for 10-fold cross validation.
(7) Initialize current number of learning epochs to 1.
(8) Use PD data set and cross validation and determine training and testing data sets.
(9) Determine the numbers of rows in training and testing data sets.
(10) Initialize the number of iterations to 1.
(11) According to the number of iterations select input data from training data set and send them to the input of FNS.
(13) Determine the values of errors using network output and target output signals. Use these error values to compute the sum of the squared errors (SSE).
(14) Using error values update the network parameters (learning of network).
(15) Apply adaptive strategy for updating the learning rate using current and previous values of SSE.
(16) Compute sum of SSE obtained on each iteration and save as the training error. Repeat Steps (11)-(16) for the remaining training data sets. If the current number of iterations is less than the number of rows in the training set then go to Step (11), otherwise go to Step (17).
(17) Select test data set.

The training input/output data for the classification system form a structure whose first component is the twenty-three-dimensional input vector and whose second component is the two-dimensional vector of output clusters. Table 2 depicts a fragment of the PD data set. The FNS structure is generated with 23 input and two output neurons. After generation, fuzzy c-means clustering and gradient descent algorithms are applied for training the parameters of the FNS. In the first step, using fuzzy clustering, cluster centers are determined from the input data. These cluster centers are used to organize the membership functions of the inputs in the antecedent part of each fuzzy rule; the membership functions are located in the second layer. The consequent parts of the fuzzy rules are organized using linear functions, which are determined in the fourth layer. After clustering and designing the antecedent part, the learning of the parameters of the consequent part starts. The initial values of the coefficients of the linear functions of the consequent part are selected in the interval [0, 0.2]. The initial values of the learning rate and momentum are selected as 0.02 and 0.625, respectively. During learning the coefficients of the linear functions of each rule are updated. As a result of learning, the fuzzy rules are constructed. The cluster centers obtained from the classification operation become the centers of the Gaussian membership functions used in the antecedent parts of the fuzzy rules. The consequent parts are constructed on the basis of learning the parameters of the linear functions.
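The random 10-fold partition used in the design steps can be sketched as below. This is a generic NumPy illustration, not the authors' code. The helper name `k_fold_indices` and the fixed seed are assumptions.

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for a random k-fold partition."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)      # random order of the rows
    folds = np.array_split(perm, k)        # k nearly equal folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx
```

For the 195 rows of the PD data set this gives ten folds of 19 or 20 rows each; every row appears in exactly one test fold.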
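The simulation results of the FNS are compared with the simulation results of other models used for the classification of PD. For evaluation of the outcomes of the models the Root Mean Square Error (RMSE) is used:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i^{d} - y_i \right)^{2}}. \tag{15}$$

Here $y_i^{d}$ are the desired values of the output, $y_i$ are the actual values of the system output, and $N$ is the number of samples.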
To estimate the performance of the FNS clustering systems, the recognition rates and the RMSE values of the errors between clusters and the current output signal are taken. RMSE is computed using the formula given above. The recognition rate is computed as the number of items correctly classified divided by the total number of items:

$$\text{Recognition rate} = \frac{\text{Number of items correctly classified}}{\text{Total number of items}} \cdot 100\%. \tag{16}$$

Table 3 shows that increasing the number of rules (or the number of hidden neurons) decreases the RMSE values for the training and testing cases and increases the recognition rate. The use of clustering and gradient techniques for learning allows a low RMSE value to be obtained quickly and improves the performance of the FNS in the training and testing stages.
In the second simulation a comparative analysis of the classification of PD has been performed. The result of the simulation of the FNS classification model is compared with the results of simulations of different classification models, such as support vector machine (SVM), neural networks (NN), regression model, decision tree, and FCM based feature weighting. To estimate the performance of the NN, SVM, and FNS clustering systems, the recognition rates and the RMSE values of the errors between clusters and the current output signal are compared. In Table 4, the comparative results of simulations of different models are given. As shown in the table, the performance of the FNS classification system is better than the performance of the other models.
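The two evaluation measures used in this section, RMSE and recognition rate, can be sketched as follows. The function names are illustrative, and the recognition rate is assumed to compare hard class labels, for example the index of the larger of the two FNS outputs.

```python
import numpy as np

def rmse(desired, actual):
    """Root mean square error between desired and actual outputs."""
    desired = np.asarray(desired, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((desired - actual) ** 2)))

def recognition_rate(desired_labels, predicted_labels):
    """Percentage of items whose predicted label matches the desired one."""
    desired_labels = np.asarray(desired_labels)
    predicted_labels = np.asarray(predicted_labels)
    return 100.0 * float(np.mean(desired_labels == predicted_labels))
```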

Table 4: Comparative results of different models for classification of PD.

Models                              Accuracy (testing), %
Decision tree [18]                  84.3
Regression [18]                     88.6
DMneural [18]                       84.3
Neural network [18]                 92.9
FCM based feature weighting [17]    97.93
SVM                                 93.846154
FNS                                 100

Conclusion
The paper presents the diagnosis of Parkinson's disease using fuzzy neural structures. The structure and learning algorithms of the FNS are presented. Fuzzy clustering and gradient descent learning algorithms are applied for the development of the FNS. Learning is performed using the 10-fold cross validation data set. The design of the classification system is carried out using different numbers of fuzzy rules in the FNS. A recognition rate of 100% is obtained with 16 hidden neurons. For comparative analysis, the classification of PD is performed using different models. The obtained results demonstrate that the performance of the FNS is better than that of the other models used for classification of PD.