Usage of Probabilistic and General Regression Neural Network for Early Detection and Prevention of Oral Cancer

In India, oral cancers usually present at an advanced stage of malignancy. It is critical to establish the diagnosis in order to initiate the most advantageous treatment of suspicious lesions. The main hurdle in the appropriate treatment and control of oral cancer is the cost-effective identification and risk assessment of early disease in the community. The objective of this research is to design a data mining model using the probabilistic neural network and general regression neural network (PNN/GRNN) for the early detection and prevention of oral malignancy. The model is built using an oral cancer database that has 35 attributes and 1025 records. All the attributes pertaining to clinical symptoms and history are considered to classify malignant and nonmalignant cases. Subsequently, the model attempts to predict the particular type of cancer, its stage, and its extent with the help of attributes pertaining to symptoms, gross examination, and investigations. The model also aims to anticipate the survivability of a patient on the basis of treatment and follow-up details. Finally, the performance of the PNN/GRNN model is compared with that of other classification models. The classification accuracy of the PNN/GRNN model is 80%, and hence it is better suited to the early detection and prevention of oral cancer.


Introduction
Oral cancer is one of the ten most frequent cancers worldwide, and its incidence is increasing with every decade. Two decades ago, the annual incidence of oral cancer was over 3,00,000 cases [1]; this rose to 5,75,000 new cases globally in the last decade [2]. A more recent study indicates that oral cancer-related mortality has declined worldwide from 3,20,000 to approximately 2,00,000 deaths in less than half a decade, owing to improved health system infrastructure [3,4]. Developing countries have the highest rates of oral cavity cancer and developed countries the lowest, for both males and females [5]. The age-adjusted rates of oral cancer range from over 20 per 1,00,000 population in India to 10 per 1,00,000 in the US and less than 2 per 1,00,000 in the Middle East [4,6], a striking contrast between regions of the world. In the US, oral cavity cancer accounts for only about 3% of all malignancies, whereas in India it accounts for over 30% [6]. Head and neck cancers are the sixth most common malignancy and a major cause of cancer morbidity and mortality worldwide. In India, head and neck cancers comprise approximately 24.1% of all cancers reported at Tata Memorial Centre, Mumbai, and about 3.2% of these are from the oral cavity [7], which ranks among the top three types of cancer in the country [8]. Oral cancer is of tremendous public health importance in India, where an estimated 83,000 new cases [9,10] and 46,000 deaths [4] occur each year. The problem is compounded because the disease is usually diagnosed at a late stage, which results in poor treatment outcomes and considerably high treatment costs for patients who cannot afford such treatment [11].
2 The Scientific World Journal
Hence, early diagnosis and treatment are among the most important means of improving patient survival. The objective of this paper is therefore to bring the assistance of data mining to the medical fraternity, with a special focus on the early detection and prevention of oral cancer.
According to Fayyad et al. [12,13], data mining can formally be defined as the process of extracting nontrivial and potentially useful information from large datasets, providing explicit knowledge in a readable form that can be used to diagnose, classify, or forecast problems [12][13][14][15][16][17]. In this paper, we use the classification technique of data mining. The classification model is built using the probabilistic neural network and general regression neural network (PNN/GRNN). Although it is a very powerful model, it has not been used much in the past. Our endeavour is to build a PNN/GRNN model for the early detection and prevention of oral cancer. Such a model can help practitioners with the following decisions:
(a) To diagnose malignant patients and the type of malignancy on the basis of demographic information, clinical symptoms, medical and personal history, and gross examination.
(b) To predict the stage and extent of oral cancer on the basis of symptoms which are confirmed with the help of relevant tests and investigations.
(c) To predict the survivability of patients after appropriate treatments and follow-ups.
The framework presented in Figure 1 is used to build a PNN/GRNN model that classifies cases as malignant or nonmalignant and predicts the type and stage of malignancy; all malignant cases are then further classified to predict the survivability of the patients.
The rest of the paper is organized as follows. Section 2 discusses the material and method adopted for the research, and Section 3 briefly discusses the probabilistic neural network and general regression neural network. Sections 4 and 5 present the experimental results and discussion, respectively, comparing the performance of the PNN/GRNN model with that of the classification models developed previously. Section 6 concludes the paper.

Material and Method
Identifying the right source and selecting the relevant data are very important, because data mining learns and discovers hidden patterns from the available data, and the correctness and accuracy of the data have a great impact on the analysis. Hence, a retrospective chart review of data from the ENT (Ear-Nose-Throat) and Head-Neck Departments of three Tertiary Care Hospitals in Pune, Maharashtra, India, has been carried out to collect data related to oral cancer. The records are drawn from the Cancer Registries of the Tertiary Care Centers; from the OPD (Out-Patient Department) datasheet, which records the clinical details, personal history, habits, and so forth of the patients; and from the archives of the Departments of Histopathology, Surgery, and Radiology.
The information was manually collected to complete the patients' datasheets. The data of 1025 patients were collected by a nonrandomized (nonprobabilistic) method, as all the data in the registries for a period of five years have been considered. The dataset is based on the records of all the patients who were reported with a lesion and treated at the centres from June 2004 to June 2009. The dataset thus collected has been transformed, cleaned, and integrated to make it ready for analysis, as presented in our previous paper [18].
Further, the dataset is reduced using feature selection, one of the data reduction strategies, so that classification can be performed at various levels. Feature selection methods fall into two basic categories: filter and wrapper [19]. The filter approach applies an independent test to a data subset and has low computational cost, whereas the wrapper approach applies a predetermined learning algorithm and requires considerable computational effort [20]. The wrapper is considered more reliable for data classification, whereas the filter scales to high-dimensional datasets, is computationally fast, and is independent of the learning algorithm [21,22]. Our requirement is not only to select a subset of attributes, but also to know the ranking of the attributes, so as to design a model for early detection and prevention of oral cancer. Therefore, we have applied the filter method, which selects attributes on the basis of their own characteristics.
WEKA 3.7.9 has been used for feature selection. The attribute evaluation method is InfoGainAttributeEval, which evaluates the worth of an attribute by measuring its information gain with respect to the class. The search method is Ranker, which ranks the attributes by their individual evaluations. The information gain (IG) of an attribute $A$ in dataset $D$ is defined as follows:
$$\mathrm{IG}(A) = \mathrm{Info}(D) - \sum_{j=1}^{v} \frac{|D_j|}{|D|}\,\mathrm{Info}(D_j)$$
where $A$ partitions $D$ into $v$ subsets $D_1, \ldots, D_v$ according to its values, and $\mathrm{Info}(D)$ is the entropy, the expected information needed to classify a record in $D$, defined as follows:
$$\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i$$
where $p_i$ is the probability that an arbitrary record in $D$ belongs to class $C_i$ and is estimated by $|C_{i,D}|/|D|$.
Feature selection is applied to the dataset of 1025 oral cancer patients, which initially had 35 attributes, and a subset of the attributes is chosen. The PNN/GRNN model is then built using the selected attributes, and the leave-one-out method is used to validate the model.
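The information-gain ranking described above can be sketched in a few lines of Python. This is a minimal illustration, assuming nominal (already discretized) attributes and records held as dictionaries; the function names and the toy attribute values are hypothetical, and the sketch is not WEKA's InfoGainAttributeEval code, only the same two formulas followed by a Ranker-style sort.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Info(D) = -sum_i p_i * log2(p_i), with p_i estimated as |C_i,D| / |D|."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """IG(A) = Info(D) - sum_j |D_j|/|D| * Info(D_j) for a nominal attribute A."""
    n = len(labels)
    partitions = defaultdict(list)  # D_j: class labels of records sharing the j-th value of A
    for value, label in zip(values, labels):
        partitions[value].append(label)
    expected = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - expected


def rank_attributes(records, attributes, target):
    """Score every attribute independently and sort by IG, highest first."""
    labels = [r[target] for r in records]
    scores = {a: information_gain([r[a] for r in records], labels) for a in attributes}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy records with hypothetical attribute values, for illustration only.
records = [
    {"neck_nodes": "yes", "sex": "M", "diagnosis": "malignant"},
    {"neck_nodes": "yes", "sex": "F", "diagnosis": "malignant"},
    {"neck_nodes": "no", "sex": "M", "diagnosis": "benign"},
    {"neck_nodes": "no", "sex": "F", "diagnosis": "benign"},
]
print(rank_attributes(records, ["sex", "neck_nodes"], "diagnosis"))
# → [('neck_nodes', 1.0), ('sex', 0.0)]
```

In this toy data the hypothetical "neck_nodes" attribute separates the classes perfectly and gains a full bit, while "sex" carries no information about the class; a filter method keeps the ranked list and the top attributes are retained.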
This validation method is a simple form of cross-validation that uses a single observation from the original sample as the validation data and the remaining observations as the training data; the process is repeated so that each observation serves once as the validation data. Further, … The NPV for a perfect model would be 1.0. Precision is the proportion of cases selected by the model that have the true value; precision is equal to PPV. Recall is the proportion of the true cases that are identified by the model; recall is equal to sensitivity. The F-measure is the harmonic mean of precision and recall; it combines the two into an overall measure of the quality of the prediction. The Receiver Operating Characteristic (ROC) curve of a model plots sensitivity against one minus specificity, and ROC analysis is used to estimate the predictive ability of a model: the closer the area under the ROC curve is to 1.0, the better the model. To build the classification model and analyze the data, the DTREG statistical analysis program is used; it is a robust application that can easily be installed on any Windows system and reads comma-separated values (CSV) data files, which are easily created from almost any data [23].
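All of the measures above follow from a 2×2 confusion matrix plus, for the ROC curve, a ranking score per case. The following minimal sketch assumes, as in the experiments, that "malignant" is the positive class; the function names and the toy scores are hypothetical, it is not the DTREG implementation, and the area under the ROC curve is computed with the rank (Mann-Whitney) formulation rather than by tracing the curve point by point.

```python
import math


def _ratio(num, den):
    """Safe division: returns NaN when the denominator is zero."""
    return num / den if den else float("nan")


def performance(y_true, y_pred, positive="malignant"):
    """Confusion-matrix measures, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    sensitivity = _ratio(tp, tp + fn)  # also called recall
    specificity = _ratio(tn, tn + fp)
    ppv = _ratio(tp, tp + fp)          # also called precision
    npv = _ratio(tn, tn + fn)
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "gmean_sens_spec": math.sqrt(sensitivity * specificity),
        "gmean_ppv_npv": math.sqrt(ppv * npv),
        "precision": ppv,
        "recall": sensitivity,
        # F-measure: harmonic mean of precision and recall
        "f_measure": _ratio(2 * ppv * sensitivity, ppv + sensitivity),
    }


def roc_auc(y_true, scores, positive="malignant"):
    """Area under the ROC curve via the Mann-Whitney statistic: the probability
    that a random positive case scores higher than a random negative case
    (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


y_true = ["malignant", "malignant", "benign", "benign"]
y_pred = ["malignant", "benign", "benign", "benign"]
scores = [0.9, 0.4, 0.3, 0.1]  # hypothetical predicted probabilities of malignancy
print(performance(y_true, y_pred)["f_measure"], roc_auc(y_true, scores))
```

Here one malignant case is missed, so sensitivity is 0.5 while specificity and PPV are 1.0; the scores still rank every malignant case above every benign one, so the area under the ROC curve is 1.0 even though the hard predictions are imperfect.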

Probabilistic Neural Network and General Regression Neural Network
The probabilistic neural network and general regression neural network (PNN/GRNN) model consists of two networks integrated in a single architecture to handle different types of target variable: the probabilistic neural network performs classification for a categorical target variable, and the general regression neural network performs regression for a continuous target variable. The PNN/GRNN model is usually much faster to train than multilayer perceptron networks, often more accurate, and relatively insensitive to outliers, and it generates accurate predicted target probability scores by approaching Bayes-optimal classification. However, it is slower at classifying new cases and requires more memory to store the model [24][25][26]. The PNN/GRNN proposed by Specht [24] has four layers: the input layer, hidden layer, pattern/summation layer, and decision layer, as shown in Figure 2. The input and hidden layers are the same for PNN and GRNN, whereas the pattern/summation layer and the decision layer differ. The input layer has one neuron for each predictor variable ($x_1, x_2, \ldots$), whose value is fed to every neuron in the hidden layer. The hidden layer has one neuron for each case in the training dataset, which stores that case's predictor values together with its target value. When a test case is presented, each hidden neuron computes the Euclidean distance between the test case and the neuron's centre point (its stored training case) and passes this distance through its activation function, a radial basis function (RBF) kernel. The output of the hidden layer is fed to the next layer, which differs between PNN and GRNN. For PNN, this layer is the pattern layer, with one pattern neuron for each category of the target variable. Each pattern neuron receives from the hidden layer the weighted values of the training cases that belong to its target category.
The pattern neurons add the values for the class they represent, giving a weighted vote for that category. For GRNN, this layer is the summation layer, which has only two neurons: the numerator summation unit (NS) and the denominator summation unit (DS). The denominator summation unit adds up the weight values coming from each hidden neuron, and the numerator summation unit adds up the weight values multiplied by the actual target value of each hidden neuron [26]. The fourth layer of the network is the decision layer, which again differs between PNN and GRNN. For PNN, the decision layer compares the weighted votes for each target category accumulated in the pattern layer and uses the largest vote to predict the target category. For GRNN, the decision layer divides the value accumulated in the numerator summation unit by the value in the denominator summation unit and uses the result as the predicted target value [26].
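To make the four-layer description concrete, here is a minimal sketch of PNN classification, GRNN regression, and leave-one-out validation in plain Python. It assumes numeric, already-normalized predictors, a Gaussian RBF kernel, and a single hand-chosen smoothing parameter `sigma` shared by all variables; all names and toy values are hypothetical, and the sketch does not reproduce how DTREG chooses its kernel widths.

```python
import math
from collections import defaultdict


def rbf(x, center, sigma):
    """Hidden neuron: Gaussian RBF kernel of the Euclidean distance to its stored case."""
    dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-dist_sq / (2.0 * sigma ** 2))


def pnn_classify(train_x, train_y, x, sigma=1.0):
    """PNN: one hidden neuron per training case, one pattern neuron per category."""
    votes = defaultdict(float)
    for center, category in zip(train_x, train_y):
        votes[category] += rbf(x, center, sigma)  # pattern layer: weighted vote
    return max(votes, key=votes.get)              # decision layer: largest vote


def grnn_predict(train_x, train_t, x, sigma=1.0):
    """GRNN: numerator (NS) and denominator (DS) summation units, then NS / DS."""
    ns = ds = 0.0
    for center, target in zip(train_x, train_t):
        w = rbf(x, center, sigma)
        ns += w * target  # weight times the case's actual target value
        ds += w           # plain sum of weights
    return ns / ds if ds > 0 else float("nan")


def loo_accuracy(xs, ys, sigma=1.0):
    """Leave-one-out: classify each case using all the other cases as training data."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += pnn_classify(train_x, train_y, xs[i], sigma) == ys[i]
    return hits / len(xs)


# Toy, already-normalized predictor vectors (hypothetical values).
xs = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]]
ys = ["benign", "benign", "malignant", "malignant"]
print(pnn_classify(xs, ys, [0.05, 0.0], sigma=0.3))  # → benign
print(grnn_predict(xs, [0.0, 0.0, 1.0, 1.0], [1.05, 1.0], sigma=0.3))
print(loo_accuracy(xs, ys, sigma=0.3))  # → 1.0
```

Because training a PNN amounts to storing the cases, leave-one-out validation needs no retraining loop: each held-out case is simply scored against the remaining stored cases. The same storage is also why classifying new cases is slower and needs more memory, as noted above.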

Experimental Results
The database used to build the PNN/GRNN data mining model is stored in comma-separated values (.csv) format. It contains a total of 1025 records described by 35 attributes. The model is built using the DTREG tool.

Classification Model to Diagnose Malignant and Benign Cases
Twelve predictor attributes have been selected by applying the feature selection method for attribute reduction explained in the previous section and by consulting practitioners. The predictors are sex, socioeconomic status, clinical symptom, history of addiction, history of addiction1, comorbid condition, comorbid condition1, gross examination, site, predisposing factor, neck nodes, and tumor size. The attribute "Diagnosis" is selected as the target variable, which may be malignant or benign. Of the patients, 75.5% have been classified as malignant and 24.4% as benign. The malignant cases are treated as positive cases and the benign cases as negative. The classification accuracy of the model is 99.02%, and its sensitivity and specificity are also very high. The overall performance of the model for classification of … Table 3.
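The data preparation for this experiment can be sketched as a loader that reads the CSV file and separates the twelve predictors from the "Diagnosis" target. The column header names below are hypothetical spellings of the attributes listed above, since the actual CSV headers are not given; values are returned as strings, so categorical predictors would still need a numeric encoding before use with a distance-based model such as the PNN.

```python
import csv

# Hypothetical CSV header names for the 12 predictors and the target listed in the text.
PREDICTORS = [
    "sex", "socioeconomic_status", "clinical_symptom",
    "history_of_addiction", "history_of_addiction1",
    "comorbid_condition", "comorbid_condition1",
    "gross_examination", "site", "predisposing_factor",
    "neck_nodes", "tumor_size",
]
TARGET = "diagnosis"  # malignant or benign


def parse_rows(lines, predictors=PREDICTORS, target=TARGET):
    """Split CSV text lines into predictor rows X and target labels y."""
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    missing = [c for c in list(predictors) + [target] if c not in header]
    if missing:
        raise ValueError(f"columns not found in CSV header: {missing}")
    X, y = [], []
    for row in reader:
        X.append([row[c] for c in predictors])
        y.append(row[target])
    return X, y


def load_dataset(path, predictors=PREDICTORS, target=TARGET):
    """Read a CSV file from disk and return (X, y)."""
    with open(path, newline="", encoding="utf-8") as f:
        return parse_rows(f, predictors, target)
```

Checking the header up front makes a misspelled column fail loudly instead of silently producing an incomplete predictor set.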
… the appropriate treatment and follow-ups are initiated, and the survival rate of the patient is predicted using the PNN/GRNN model, as shown in Table 5. The 14 predictor attributes are stage, surgery, radiotherapy, chemotherapy, 1st-5th follow-up symptoms, and 1st-5th follow-up examination; they have been selected by applying the feature selection method for attribute reduction explained in the previous section. The target variable is "Survival." The proportion of dead cases is 40.2% and that of alive cases is 59.7%. The overall accuracy of the model in predicting survivability using these 14 attributes is 69.95% for both the training and the validation data. However, when 34 predictors were considered, the classification accuracy was 80% for the training data and 73.76% for the validation data. Thus, the experimental results show that the PNN/GRNN data mining approach is appropriate for developing a model for the early detection and prevention of oral cancer.

Discussions
Data mining has been used in healthcare for quite some time. However, its more advanced techniques, such as neural networks, have not been explored much for developing decision-making methods [27]. Bruins et al. [28] used a decision algorithm and a decision tree to propose a model for developing and testing evidence-based guidelines for pretherapy oral screening and dental management of patients with head and neck cancer. This model, tested using a probabilistic sensitivity analysis with 10,000 second-order Monte Carlo simulations, reports that the decision … [33] apply a neural network (multilayer perceptron (MLP)) to genetic data for oral cancer detection. Thus, data mining has not been optimally applied to oral cancer data to support practitioners' decision making towards the early detection and prevention of oral cancer. In the existing studies, either the datasets are too small or the number of attributes considered is limited. Linear regression and logistic regression have been used in the literature, but mostly with only two or three inputs [34][35][36][37][38][39]. Classification trees have also not been explored much. Advanced data mining techniques, that is, artificial neural networks such as the multilayer perceptron (MLP) and radial basis function (RBF) networks, have been applied by some researchers to predict oral cancer; however, other popular and more effective neural networks, such as the cascade correlation neural network (CCNN), the group method of data handling (GMDH) neural network, and the probabilistic neural network and general regression neural network (PNN/GRNN), have hardly been applied. Therefore, in this research work we have attempted to create a PNN/GRNN model and compare it with other classification models built previously [40][41][42][43][44].
The classification models developed include a logistic regression analysis model [43]; classification tree models, namely the decision tree, decision tree forest, and TreeBoost models [40,44]; and artificial neural networks, namely the multilayer perceptron (MLP) model [42], the cascade correlation neural network (CCNN) model [41], and the probabilistic neural network and general regression neural network (PNN/GRNN) model [43]. The performance of all the classification models developed using various data mining techniques is compared in Tables 6 and 7 for the training and validation data, respectively. Based on this comprehensive comparison, Table 8 presents the best model for each performance parameter for the training and validation data.

Conclusion
In this paper, we have discussed the probabilistic neural network and general regression neural network (PNN/GRNN) model for the early diagnosis of the disease, prediction of the cancer stage, and prediction of the chances of survival of oral cancer patients. This model can be of good help to practitioners in improving the accuracy of diagnosis and the effectiveness of treatment. We have also critically analyzed all the data mining models and observed that the PNN/GRNN model displays competitive results for both the training and the validation data. The experimental results show that the PNN/GRNN model displays the best classification accuracy, the highest specificity and sensitivity, and better results in terms of the geometric mean of sensitivity and specificity, positive predictive value, negative predictive value, geometric mean of PPV and NPV, average gain, precision, recall, F-measure, and area under the ROC curve among all the models, which makes it a robust model. Thus, the PNN/GRNN model is more suitable for predicting the survival rate of oral cancer patients.