A Deep Belief Network and Dempster-Shafer-Based Multiclassifier for the Pathology Stage of Prostate Cancer

Objective The pathologic stage of prostate cancer can be predicted before surgery from biopsy information, indicating whether the cancer has spread beyond the prostate. Because the biopsy variables associated with pathology carry uncertainty arising from individual patient differences, a classification method that accounts for this uncertainty is needed. Method We propose a deep belief network and Dempster-Shafer- (DBN-DS-) based multiclassifier for the pathologic prediction of prostate cancer. The DBN-DS learns prostate-specific antigen (PSA), Gleason score, and clinical T stage information using three DBNs, one per variable. The uncertain outputs of the three DBNs are then combined using DS to reach a final decision. Result The new method was validated on pathology data from 6345 patients with prostate cancer. The pathology stages consisted of organ-confined disease (OCD; 3892 patients) and non-organ-confined disease (NOCD; 2453 patients). The accuracy of the proposed DBN-DS was 81.27%, higher than the 64.14% of the Partin table. Conclusion The proposed DBN-DS is more effective than the other methods tested in predicting pathology stage. Its high performance results from combining the predictions obtained from the individual pathology-related features. The proposed method may be effective in decision support for prostate cancer treatment.


Introduction
Prostate cancer is one of the most common cancers in men, with around 1.1 million cases diagnosed and approximately 309,000 deaths worldwide in 2012 [1]. It is estimated that 40-50% of men may also have potentially extraprostatic disease [2].
Surgical resection and radiotherapy are the typical treatments for prostate cancer [3]. Choosing among them requires extensive experience and analysis of previous treatment cases. Cancer staging is evaluated both before and after the tumor is removed, yielding the clinical and pathological stages, respectively [4]. The clinical stage is based on data from clinical tests available before treatment or surgical removal of the tumor, whereas the pathological stage is determined after surgery from the removed tumor tissue. Pathological staging is likely to be more accurate than clinical staging because it examines the disease directly. Pathological stage prediction, in turn, estimates before treatment how likely it is that the disease has spread in a given patient. Therefore, predicting the pathological stage from clinical data is an important factor in the treatment of prostate cancer [5].
Pathologic stage prediction is very important because it helps physicians choose optimal treatment and management strategies. For example, radical prostatectomy (RP), the surgical removal of the prostate gland, offers the best chance of cure when prostate cancer is localized, so an accurate prediction of the pathology stage helps identify the most beneficial treatment approach [6][7][8]. Currently, Partin tables, which are based on statistical methods such as logistic regression, are used to predict the clinical outcome of prostate cancer [9,10]. The Partin tables use clinical test data, namely prostate-specific antigen (PSA) level, Gleason score, and clinical T stage, to predict the pathology stage. Although the Partin tables were validated on data from 2001 to 2011, their applicability to current patients has been questioned because of changes in the clinical environment [11]. Thus, a new classification method based on machine learning is needed to predict the pathology stage accurately [12].
Deep belief networks (DBNs) are a deep learning technique that is effective for classification and prediction [13,14]. Because a DBN supports both unsupervised and supervised learning, it can effectively learn uncertain relationships in data [15,16]. Because the PSA level, Gleason score, and clinical T stage used for stage prediction each carry patient-specific uncertainty, the evidence provided by each variable must be combined. The Dempster-Shafer theory (DS) is a technique for fusing information based on degrees of belief [17,18]. DS combines evidence from different sources to arrive at a degree of belief (represented by a mathematical object called a "belief function") that takes all available evidence into account [19,20]. It fuses information by calculating belief values over the possible hypotheses [21]. This allows the classification results obtained from each variable to be fused into a single pathology stage prediction.
In this paper, we propose a DBN-DS-based multiclassifier for pathologic stage prediction of prostate cancer. The proposed DBN-DS trains three DBNs, one on each of the patient's PSA level, Gleason score, and clinical T stage, and predicts the pathology stage by combining the predictions of these classifiers. Once each trained DBN classifier has produced its output, the final prediction is obtained by combining the three outputs using DS. The remainder of this paper is organized as follows: Section 2 presents the proposed technique and its process. Section 3 describes the experiments and their outcomes. Finally, Section 4 presents the conclusions.

Materials and Methods
2.1. Data Set. The study data comprised 6345 male patients from the Korean Prostate Cancer Registry (KPCR), an extension of the Smart Prostate Cancer Data Base (SPCDB) covering six tertiary medical centers in Korea [22]. The input variables consist of initial PSA, Gleason score, TRUS volume, and clinical T stage; the classifiers described below use three of them: initial PSA, Gleason score, and clinical T stage. Two output variables were used: pathologic T stage (pT2a, pT2b, pT3a, pT3b, and pT3c) and N stage (pN1). Following the guidelines of the American Joint Committee on Cancer (AJCC), the output variables were transformed into the pathologic stage: organ-confined disease (OCD; pT2) or non-organ-confined disease (NOCD; pT3+ or N+) [23]. For the experiments, the KPCR data were divided into a training set (70%, 4039 patients) and a validation set (30%, 2306 patients).
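The OCD/NOCD labeling rule and the data split can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `pathology_label` and `split_train_validation` are hypothetical, nodal status is assumed to be a boolean, and a seeded random shuffle is assumed because the text does not say how the split was drawn (for example, whether it was stratified).

```python
import random

# Assumption: every pT2 substage without nodal involvement is organ-confined.
OCD_T_STAGES = {"pT2", "pT2a", "pT2b", "pT2c"}


def pathology_label(pt_stage, node_positive):
    """Map pathologic T/N stage to the binary OCD/NOCD target (hypothetical helper)."""
    if node_positive:
        return "NOCD"  # N+ is non-organ-confined regardless of T stage
    if pt_stage in OCD_T_STAGES:
        return "OCD"
    if pt_stage.startswith("pT3"):
        return "NOCD"  # all pT3 substages are non-organ-confined
    raise ValueError(f"unexpected pathologic T stage: {pt_stage!r}")


def split_train_validation(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of the records and split it into training/validation lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```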

Deep Belief Network.
A deep belief network (DBN) is a generative graphical model, or a type of deep neural network, composed of multiple layers of latent variables, with connections between the layers but not between the units within each layer. A DBN is built from stacked restricted Boltzmann machines (RBMs) [24], each consisting of a visible and a hidden unit layer. Training proceeds one layer at a time: first, the visible layer and hidden layer 1 are trained as a single RBM. Once this is complete, the values of hidden layer 1 are given as new input to a second RBM formed by hidden layers 1 and 2, and learning continues in this way sequentially up to the last layer [25]. For classification, back propagation is applied with an output layer configured on top of the DBN [26]. This approach shows better results than an artificial neural network (ANN) whose connection weights are initialized arbitrarily.
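The greedy layer-wise pretraining described above can be sketched in plain Python. This is a minimal sketch under stated assumptions: the RBMs are Bernoulli RBMs trained with one-step contrastive divergence (CD-1), a common choice that the text does not confirm; the learning rate, weight initialization, and the names `RBM` and `pretrain_dbn` are hypothetical; and the supervised back-propagation stage is omitted.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class RBM:
    """Bernoulli RBM trained with one step of contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        # Assumed initialization: small Gaussian weights, zero biases.
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_v = [0.0] * n_visible
        self.b_h = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_h[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b_h))]

    def visible_probs(self, h):
        return [sigmoid(self.b_v[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b_v))]

    def cd1_update(self, v0, lr=0.1):
        """Positive phase on the data, negative phase on a one-step reconstruction."""
        ph0 = self.hidden_probs(v0)
        h0 = [1.0 if self.rng.random() < p else 0.0 for p in ph0]  # sample hidden states
        v1 = self.visible_probs(h0)  # mean-field reconstruction of the visible layer
        ph1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(ph0)):
                self.W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            self.b_v[i] += lr * (v0[i] - v1[i])
        for j in range(len(ph0)):
            self.b_h[j] += lr * (ph0[j] - ph1[j])


def pretrain_dbn(data, layer_sizes, epochs=100, lr=0.1, seed=0):
    """Greedy layer-wise pretraining: RBM k is trained on the hidden
    probabilities of RBM k-1 (on the raw data for k = 0)."""
    rbms = []
    layer_input = [list(map(float, x)) for x in data]
    for k, (n_in, n_out) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
        rbm = RBM(n_in, n_out, seed=seed + k)
        for _ in range(epochs):
            for v in layer_input:
                rbm.cd1_update(v, lr)
        rbms.append(rbm)
        layer_input = [rbm.hidden_probs(v) for v in layer_input]
    return rbms, layer_input  # top-layer features for a supervised output layer
```

In a full DBN classifier, the returned top-layer features would feed an output layer, and the whole stack would then be fine-tuned with back propagation.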
In this study, we built a multiclassifier for three input variables and two output classes, as shown in Figure 1, creating one classifier for each input variable [27]. The predictions of these classifiers are then combined using DS [28]. Because each classifier receives only a single variable, that variable must be expanded into several input values. As PSA levels are continuous, they were converted into binary numbers, with each bit forming one input node. Because Gleason score and clinical T stage are categorical, they were encoded in flag (one-hot) form, with one input node per category.
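The input encoding described above can be sketched as follows, using the widths given in the Results section (nine PSA bits, sufficient for the maximum of 440 ng/mL, and flags for T1a to T3b). Rounding PSA to an integer, clipping out-of-range values, and the exact Gleason category list are assumptions: the text reports nine Gleason flags for scores 3 to 10, so the list below may need adjusting to the registry's coding. All function names are hypothetical.

```python
# Assumed category lists; adjust to the registry's actual coding.
GLEASON_SCORES = list(range(3, 11))  # 3, 4, ..., 10
CLINICAL_T_STAGES = ["T1a", "T1b", "T1c", "T2a", "T2b", "T2c", "T3a", "T3b"]


def encode_psa(psa_ng_ml, n_bits=9):
    """Binary-encode PSA, most significant bit first. Nine bits cover 0-511 ng/mL;
    the value is rounded to an integer and clipped (both assumptions)."""
    value = max(0, min(int(round(psa_ng_ml)), 2 ** n_bits - 1))
    return [(value >> k) & 1 for k in reversed(range(n_bits))]


def one_hot(value, categories):
    """Flag (one-hot) encoding: one input node per category."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if c == value else 0 for c in categories]


def encode_patient(psa, gleason, t_stage):
    """Return three separate input vectors, one for each DBN classifier."""
    return (encode_psa(psa),
            one_hot(gleason, GLEASON_SCORES),
            one_hot(t_stage, CLINICAL_T_STAGES))
```

For example, 440 ng/mL (256 + 128 + 32 + 16 + 8) encodes as `[1, 1, 0, 1, 1, 1, 0, 0, 0]`.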

Dempster-Shafer-Based Information Fusion. Dempster-Shafer (DS) theory, introduced by Arthur Dempster and Glenn Shafer, is a mathematical theory for dealing with uncertainty and imprecision [29]. DS provides an effective way to establish evidential intervals for a data set using belief and plausibility values. It also supports the combination of information: a combination rule allows different pieces of information to be treated as evidence and the result of all the evidence to be calculated [30].
DS expresses the degree of certainty as an interval and, as in probability theory, works with a set of mutually exclusive hypotheses. The set of all hypotheses is called the environment and is denoted by Θ = {θ1, θ2, θ3, …, θk}; it has 2^k subsets. Because its elements are mutually exclusive, Θ is also called the frame of discernment (identification frame). The set of all 2^k subsets is called the power set and is denoted by 2^Θ. The degree to which each subset is supported by the evidence is given by the basic probability assignment (mass) function m in (1). The mass of the empty set is 0, and the masses of all subsets of Θ sum to 1 (2):
$$ m: 2^{\Theta} \rightarrow [0, 1] \qquad (1) $$
$$ m(\emptyset) = 0, \qquad \sum_{A \subseteq \Theta} m(A) = 1 \qquad (2) $$
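The power set and the two conditions on the mass function can be checked mechanically. In this minimal sketch, subsets are represented as Python `frozenset`s, and the names `power_set` and `is_valid_mass` are hypothetical.

```python
from itertools import combinations


def power_set(frame):
    """All 2**k subsets of the frame of discernment, as frozensets."""
    items = sorted(frame)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_valid_mass(m, frame, tol=1e-9):
    """Check that m is defined on subsets of the frame, is non-negative,
    assigns zero mass to the empty set, and sums to 1."""
    frame = frozenset(frame)
    if any(not A <= frame for A in m):
        return False
    if any(v < 0 for v in m.values()):
        return False
    if abs(m.get(frozenset(), 0.0)) > tol:
        return False
    return abs(sum(m.values()) - 1.0) <= tol
```

For the two-class problem here, Θ = {OCD, NOCD}, and its power set has 2^2 = 4 elements: ∅, {OCD}, {NOCD}, and Θ itself.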
Bel(H), the belief value for a hypothesis H given the evidence, is the sum of the masses of all subsets contained in H, as shown in (3):
$$ \mathrm{Bel}(H) = \sum_{A \subseteq H} m(A) \qquad (3) $$
The degree of belief depends on the reliability of the given evidence and on the overall environment; the ratio of this degree is expressed by e.
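where r is a value between 0 and 1; the evidence is true if r = 0 and false if r = 1. DS calculates a new belief value by fusing different pieces of evidence. The combination of two mass functions m1 and m2 (Dempster's rule) can be expressed as (5):
$$ (m_1 \oplus m_2)(A) = \frac{\sum_{X \cap Y = A} m_1(X)\, m_2(Y)}{1 - \sum_{X \cap Y = \emptyset} m_1(X)\, m_2(Y)}, \qquad A \neq \emptyset \qquad (5) $$
If X ∩ Y = ∅, the product m1(X) m2(Y) supports no hypothesis, so the combined mass of the empty set is zero.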
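Dempster's rule in (5) can be written directly as a double loop over focal elements. This minimal sketch represents mass functions as dictionaries keyed by `frozenset`; the function name `dempster_combine` is hypothetical.

```python
def dempster_combine(m1, m2, eps=1e-12):
    """Dempster's rule of combination for two mass functions keyed by frozensets.
    Products with X ∩ Y = ∅ are summed into the conflict K, and the remaining
    masses are renormalized by 1 - K, so the empty set keeps zero mass."""
    combined, conflict = {}, 0.0
    for X, mx in m1.items():
        for Y, my in m2.items():
            A = X & Y
            if A:  # non-empty intersection supports A
                combined[A] = combined.get(A, 0.0) + mx * my
            else:  # conflicting evidence
                conflict += mx * my
    if 1.0 - conflict <= eps:
        raise ValueError("sources are in total conflict; Dempster's rule is undefined")
    return {A: v / (1.0 - conflict) for A, v in combined.items()}
```

For example, with m1 = {OCD: 0.6, Θ: 0.4} and m2 = {NOCD: 0.5, Θ: 0.5}, the conflict is K = 0.3, and the combined masses are 0.3/0.7 for OCD and 0.2/0.7 each for NOCD and Θ.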
DS expresses the confidence measure for H as the interval [Bel(H), Pls(H)], which is called the "evidential interval." The plausibility Pls(H) measures the extent to which the evidence fails to refute H, that is, the maximum degree of belief that H could receive; the gap between Bel(H) and Pls(H) is the mass committed neither for nor against H. Both Bel and Pls take values in [0, 1], and Pls can be defined as in (7):
$$ \mathrm{Pls}(H) = 1 - \mathrm{Bel}(\neg H) = \sum_{A \cap H \neq \emptyset} m(A) \qquad (7) $$
Likewise, plausibility values can be fused across multiple pieces of evidence in the same way as belief values:
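Equations (3) and (7) translate directly into code. In this minimal sketch, `belief`, `plausibility`, and `evidential_interval` are hypothetical names, and mass functions are dictionaries keyed by `frozenset`.

```python
def belief(m, H):
    """Bel(H): total mass of the non-empty focal elements contained in H."""
    return sum(v for A, v in m.items() if A and A <= H)


def plausibility(m, H):
    """Pls(H): total mass of the focal elements that intersect H."""
    return sum(v for A, v in m.items() if A & H)


def evidential_interval(m, H):
    """The evidential interval [Bel(H), Pls(H)]."""
    return belief(m, H), plausibility(m, H)
```

With m = {OCD: 0.5, NOCD: 0.2, Θ: 0.3}, the interval for OCD is [0.5, 0.8], and Pls(OCD) = 1 − Bel(NOCD) = 0.8, as (7) requires.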
$$ \mathrm{Pls}_U = \mathrm{Pls}_1 \oplus \mathrm{Pls}_2 \oplus \mathrm{Pls}_3 \oplus \cdots \oplus \mathrm{Pls}_n \qquad (8) $$
In this study, the three outputs predicted by the multiclassifier were fused using DS. As shown in the figure, the output of DBN#1 (initial PSA) was set to m1, that of DBN#2 (Gleason score) to m2, and that of DBN#3 (clinical T stage) to m3. For each output, the empty set is assigned zero mass:
$$ m_1(\emptyset) = m_2(\emptyset) = m_3(\emptyset) = 0 $$
After m1, m2, and m3 are obtained, they are combined into m4 using (5):
$$ m_4 = m_1 \oplus m_2 \oplus m_3 $$
Next, the evidential intervals of the two hypotheses are summarized as
$$ [\mathrm{Bel}_4(\mathrm{OCD}), \mathrm{Pls}_4(\mathrm{OCD})], \qquad [\mathrm{Bel}_4(\mathrm{NOCD}), \mathrm{Pls}_4(\mathrm{NOCD})] $$
The evidential interval is thus constructed for both OCD and NOCD, and whichever of OCD and NOCD has the higher value is taken as the final output.
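The fusion step can be sketched end to end as below. The text does not state how each DBN's two output probabilities become a mass function, so `mass_from_output` is a hypothetical mapping: the probabilities become singleton masses, and an optional reliability below 1 (also an assumption) leaves the remaining mass on Θ as ignorance. The function names are hypothetical, and Dempster's rule is repeated here so that the block is self-contained.

```python
from functools import reduce

OCD, NOCD = frozenset({"OCD"}), frozenset({"NOCD"})
THETA = OCD | NOCD  # the whole frame: "OCD or NOCD" (ignorance)


def mass_from_output(p_ocd, p_nocd, reliability=1.0):
    """Hypothetical mapping from one DBN's output probabilities to a mass function."""
    total = p_ocd + p_nocd
    m = {OCD: reliability * p_ocd / total, NOCD: reliability * p_nocd / total}
    if reliability < 1.0:
        m[THETA] = 1.0 - reliability
    return m


def dempster_combine(m1, m2, eps=1e-12):
    """Dempster's rule: keep non-empty intersections, renormalize by 1 - K."""
    combined, conflict = {}, 0.0
    for X, mx in m1.items():
        for Y, my in m2.items():
            A = X & Y
            if A:
                combined[A] = combined.get(A, 0.0) + mx * my
            else:
                conflict += mx * my
    if 1.0 - conflict <= eps:
        raise ValueError("total conflict")
    return {A: v / (1.0 - conflict) for A, v in combined.items()}


def fuse_and_decide(outputs, reliabilities=None):
    """outputs: [(p_ocd, p_nocd)] from DBN#1 (PSA), DBN#2 (Gleason), DBN#3 (T stage)."""
    if reliabilities is None:
        reliabilities = [1.0] * len(outputs)
    masses = [mass_from_output(po, pn, r) for (po, pn), r in zip(outputs, reliabilities)]
    m4 = reduce(dempster_combine, masses)  # m4 = m1 ⊕ m2 ⊕ m3
    scores = {"OCD": m4.get(OCD, 0.0), "NOCD": m4.get(NOCD, 0.0)}
    return max(scores, key=scores.get), scores  # higher combined value wins
```

With full reliability and no mass on Θ, this reduces to a normalized product of the three classifiers' probabilities. For instance, one classifier giving NOCD 0.9 and two giving OCD only 0.55 yields a fused NOCD value of about 0.86.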
Handling uncertain data is a critical issue in the data fusion process. DS and Bayesian methods have been compared as ways of dealing with this uncertainty. Unlike Bayesian inference, DS can assign a different level of reliability to each source. DS has also become a popular approach to data fusion because, unlike the Bayesian method, belief can be assigned to any subset of the hypothesis set rather than only to individual hypotheses, making it possible to form distributions over all subsets [31].

Result
3.1. Dataset Description. The characteristics of the initial PSA variable in the OCD and NOCD groups are shown in Table 1. Among the 6345 men, the average PSA levels in the OCD and NOCD groups of the training set were 9.535 and 18.606 ng/mL, respectively. In general, the level was higher in the NOCD group, and the validation set showed a similar difference (9.377 and 17.899 ng/mL in the OCD and NOCD groups, respectively). The values of the training and validation sets did not differ greatly. Although some patients had very high maximum values, these outliers were few relative to the whole cohort and do not pose a problem for the analysis.
The Gleason scores in the OCD and NOCD groups are shown in Table 2. Patients with OCD most frequently had a Gleason score of 6, while the NOCD group had scores of 6 or more. The difference between the OCD and NOCD groups was significant. For scores below 5, OCD patients outnumbered NOCD patients, whereas for scores of 9 or more, NOCD patients predominated.
The clinical T stages in the OCD and NOCD groups are shown in Table 3. Most patients were T2+. T1a occurred only in patients with OCD. In addition, patients up to T1+ were mostly in the OCD group, whereas patients with T3+ belonged to the NOCD group. Although every variable differs between the OCD and NOCD groups, many patients in the two groups share the same values.
The procedure is outlined in Figure 2. The training set was first converted to binary form. The initial PSA values were expressed as nine binary digits, enough to represent the highest value (440 ng/mL). The Gleason score was encoded as nine flags ranging from 3 to 10, and the clinical T stage as eight flags from T1a to T3b. The binary data of each variable were learned by a separate DBN classifier; for example, the first DBN, which receives the binary-encoded initial PSA, has nine input nodes. Every classifier has two output nodes so that the probabilities of OCD and NOCD can be calculated. Each DBN consisted of three RBM layers, each with as many nodes as its input layer. Unsupervised learning was performed for 100 iterations in total, and supervised learning using back propagation for 1000 iterations. Finally, the output probabilities were fused with DS, and m4(OCD) and m4(NOCD) were taken as the final outputs.
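The per-variable architecture described above can be summarized as a small configuration object. This is a sketch, not the authors' code: the class name `PerVariableDBNConfig` and its fields are hypothetical, and the input widths are taken as stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerVariableDBNConfig:
    """Settings for one per-variable DBN classifier (hypothetical structure)."""
    variable: str
    n_inputs: int                    # width of the binary/flag encoding
    n_rbm_layers: int = 3            # three stacked RBMs
    n_outputs: int = 2               # OCD and NOCD
    pretrain_iterations: int = 100   # unsupervised RBM training
    finetune_iterations: int = 1000  # supervised back propagation

    def layer_sizes(self):
        """Input layer, RBM hidden layers as wide as the input, then the output layer."""
        return [self.n_inputs] * (self.n_rbm_layers + 1) + [self.n_outputs]


# Input widths as stated in the text: 9 PSA bits, 9 Gleason flags, 8 clinical T flags.
CONFIGS = [
    PerVariableDBNConfig("initial PSA", 9),
    PerVariableDBNConfig("Gleason score", 9),
    PerVariableDBNConfig("clinical T stage", 8),
]
```

For the PSA classifier, `layer_sizes()` gives `[9, 9, 9, 9, 2]`: the visible layer, three hidden layers of the same width, and the two-node output layer.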

Experiments.
To evaluate the DBN-DS-based multiclassifier, the entire data set was divided into a 70% training set and a 30% testing set. The comparison models were Decision Tree C4.5, naive Bayesian (NB), logistic regression (LR), back propagation (BP), support vector machine (SVM), random forest (RF), deep belief network (DBN), and the Partin tables. The experiments compared sensitivity, specificity, accuracy, and area under the curve (AUC) using confusion matrix [31] and receiver operating characteristic (ROC) curve analysis [32]. The confusion matrix results are shown in Table 4. In general, the results on the training set were better than those on the validation set because of differences in data set size. Sensitivity was defined as the probability of correctly identifying NOCD. Because there are fewer NOCD than OCD cases, NOCD is harder to identify. The proposed method achieved a sensitivity of 61.77%, higher than those of the other models. Correctly identifying NOCD is particularly important because it amounts to predicting a higher-risk pathology stage. Specificity was defined as the probability of correctly identifying OCD. NB had the highest specificity, 93.78%, but its sensitivity was low. The proposed method achieved a specificity of 93.56%, higher than those of the other models except NB. Accuracy was defined as the proportion of all patients, NOCD and OCD, who were correctly classified. The proposed model had the highest accuracy, 81.27%. The AUCs are shown in Figure 3 and Table 5.
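The three confusion-matrix metrics, with NOCD as the positive class as defined above, can be sketched as follows. Because accuracy is the prevalence-weighted mean of sensitivity and specificity, the reported figures can also be cross-checked. The function names are hypothetical, and the check assumes that the validation set has roughly the same NOCD proportion as the full cohort (2453 of 6345).

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy with NOCD as the positive class.
    tp/fn: NOCD patients predicted NOCD/OCD; tn/fp: OCD patients predicted OCD/NOCD."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Accuracy as the prevalence-weighted mean of sensitivity and specificity."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


# Cross-check with the reported figures, assuming the cohort's NOCD share.
nocd_share = 2453 / 6345
approx_accuracy = accuracy_from_rates(0.6177, 0.9356, nocd_share)
```

With a sensitivity of 61.77% and a specificity of 93.56%, the weighted accuracy comes to about 81.3%, in line with the reported 81.27%.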
DBN-DS had the highest area under the ROC curve, 0.777. The standard error for all models was about 0.01, and all p values were reported as 0.000 (p < 0.001), so the ROC results can be considered reliable. DBN-DS makes a separate prediction with each of the three classifiers, one per variable, and combines these predictions into one. Each of these is a two-class (OCD vs. NOCD) classifier. Because DS combines the classifiers' probabilities, if one classifier predicts NOCD with a high value while the other two predict OCD with only low values, NOCD is the final prediction based on the belief values of the DS algorithm.
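The AUC has a convenient equivalent form: it is the probability that a randomly chosen NOCD case receives a higher predicted NOCD score than a randomly chosen OCD case, with ties counting one half (the Mann-Whitney formulation). This minimal sketch uses that identity. The function name is hypothetical, the pairwise loop favors clarity over speed, and the sketch does not compute the standard errors or p values reported above.

```python
def auc_mann_whitney(scores, labels):
    """AUC as P(score of a NOCD case > score of an OCD case); ties count 1/2.
    labels: 1 for NOCD (positive class), 0 for OCD."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one NOCD and one OCD case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For scores `[0.9, 0.8, 0.4, 0.3]` with labels `[1, 0, 1, 0]`, three of the four NOCD/OCD pairs are ranked correctly, giving an AUC of 0.75.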
Next, DBN-DS was evaluated in more detail by examining its individual classifiers. The confusion matrix results for DBN-DS are shown in Table 6, and the ROC curve analysis in Figure 4 and Table 7. DBN#1 learned the initial PSA, DBN#2 the Gleason score, and DBN#3 the clinical T stage.
Among the three variables, the initial PSA level had the highest prediction rate. The PSA level is closely related to pathologic stage and is the most important parameter in prostate cancer. Combinations of variables that included PSA also showed high prediction rates, which indicates that the Gleason score and clinical T stage also carry information about the pathology stage. However, the combination of Gleason score and clinical T stage had lower accuracy than the initial PSA level alone. These two variables are more uncertain because they are assessed on the basis of the clinician's experience. When combined with PSA level, however, they yielded much higher performance. In this study, we found that initial PSA was the most important predictor and that the Gleason score and clinical T stage were also important predictors.

Discussion and Conclusion
Prediction models for the pathologic staging of prostate cancer are based on clinical tests and can be used to predict the spread of cancer before surgery. At the postoperative, pathological stage, the cancer can be diagnosed more precisely and the extent of prostate cancer metastasis can be determined.
We proposed a DBN-DS-based multiclassifier approach to predict the pathologic stage of prostate cancer. The proposed method improves predictive accuracy by combining deep learning with information fusion over the data measured in clinical tests. The inputs are the initial PSA level, Gleason score, and clinical T stage. The output is either OCD or NOCD in pathological staging (pT). The approach was evaluated on an existing validated dataset of 6345 patient records from the KPCR database, which collected data from six tertiary medical institutions.
The performance of the proposed DBN-DS was compared with that of the C4.5 decision tree, NB, LR, BP, SVM, RF, DBN, and the Partin tables. The results showed that the proposed DBN-DS had better sensitivity and accuracy than all other methods.
In a recent study of pathological staging methodology, Cosma et al. [4] used a neuro-fuzzy model, an approach similar to ours. Their results also indicated that the neuro-fuzzy computational intelligence approach is suitable for prostate cancer staging and outperforms the Partin tables. Both the neuro-fuzzy model and our proposed method aim to predict whether a patient has OCD (pT2) or NOCD (pT3+). Both methods use the initial PSA level, Gleason score, and clinical T stage to predict the pathologic stage of prostate cancer, but the … [4], although different data sets were used in each study; nevertheless, the results are highly consistent with those of the present study. Currently, the proposed DBN-DS method is implemented as a research tool. Once clinical evaluation is complete, it will be developed into an easy-to-use clinical decision support system accessible to clinicians.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Acknowledgments
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean Government (NRF-2016R1A2B4015922).