Development and Verification of Prognostic Nomogram for Penile Cancer Based on the SEER Database

Aim We aimed to establish a prognostic nomogram for penile cancer (PC) patients based on the Surveillance, Epidemiology, and End Results Program (SEER) database.
Methods Data from 1643 patients between 2010 and 2015 were extracted from the SEER database. The patients were randomly divided into a development group (70%) and a verification group (30%), and univariate and multivariate Cox proportional hazards regression were used to explore the possible risk factors of PC. The factors significantly related to overall survival (OS) and cancer-specific survival (CSS) were used to establish the nomogram, which was assessed via the concordance index (C-index), receiver operating characteristic (ROC) curve, and calibration curve. An internal validation was conducted to test the accuracy and effectiveness of the nomogram. Kaplan–Meier analysis was used to estimate the OS and CSS of these patients.
Results On multivariate Cox proportional hazards regression, the independent prognostic risk factors associated with OS were age, race, marital status, N/M stage, surgery, surgery of lymph nodes, and histologic type, with a moderate C-index of 0.737 (95% confidence interval (CI): 0.713–0.760) and 0.766 (95% CI: 0.731–0.801) in the development and verification groups, respectively. The areas under the ROC curve (AUC) of 3- and 5-year OS were 0.749 and 0.770, respectively. Marital status, N/M stage, surgery, surgery of lymph nodes, and histologic type were significantly linked to CSS, with a better C-index of 0.802 (95% CI: 0.771–0.833) and 0.82 (95% CI: 0.775–0.865) in the development and verification groups, respectively; the AUC of 3- and 5-year CSS were 0.766 and 0.787. The calibration curves of 3- and 5-year OS and CSS all showed high consistency between predicted and observed survival.
Conclusion Our study produced a satisfactory nomogram for predicting the survival of PC patients, which could help clinicians assess the condition of PC patients and plan further treatment.

Because of its high mortality, a clinical model for predicting the prognosis of PC patients is necessary [11]. Although the TNM stage and pathological classification systems from the 8th American Joint Committee on Cancer (AJCC) and the Union for International Cancer Control are widely used to predict the survival of PC patients [12,13], they have many limitations.
Prognosis differs from patient to patient, and multiple factors other than TNM staging independently affect it, including age, race, marital status, surgical information, count and density of examined lymph nodes, and histologic type [14,15]. A nomogram can quantify and analyze these factors more comprehensively than the TNM staging system and yield a more specific estimate of event probabilities [16]. It can help clinicians and patients obtain prognostic information that is more individualized, reliable, and convenient [17,18].
It should be noted that the applicability of a nomogram is limited by the population on which it was established, and caution is required when it is applied to different populations. However, this bias can be reduced by increasing the sample size. The Surveillance, Epidemiology, and End Results Program (SEER) database has collected detailed information on PC patients from multiple centers in the United States, which allowed us to build a reliable prognostic nomogram.

Methods
After registering an account and signing a Data Agreement on the SEER database website, we were authorized to download all the data of PC patients using the SEER*Stat software, version 8.3.5. All available data on the patients' age, race, marital status, TNM stage (AJCC 7th standard 2010+), tumor primary site, surgery, surgery of lymph nodes, radiation, chemotherapy, histologic type, survival time, cancer-specific death, and vital status were collected. Cases with unknown, undefined, or missing data were excluded. Patients' prognosis was mainly evaluated by overall survival (OS) and cancer-specific survival (CSS). The "caret" package of the R software, version 3.6.0, was used to randomize the patients into the development group (70%) and the verification group (30%). Notably, in our study, the patients' histologic type was limited to squamous cell neoplasms based on the International Classification of Diseases for Oncology, 3rd Edition (ICD-O-3) (805-808). Patients with a follow-up of less than 1 month were removed. Patients whose race is not black or white are described as "others," and patients who are divorced, separated, widowed, or unmarried but have a domestic partner are considered "single."
The R software, version 3.6.0, with the "foreign," "survival," "survminer," and "rms" packages was used in all statistical analyses, and a P value of < 0.05 was regarded as statistically significant. Every parameter was first analyzed using univariate and multivariate Cox proportional hazards regression models to calculate the hazard ratio and 95% confidence interval (CI). Then, possible risk factors linked to OS and CSS were identified. Finally, on the basis of those independent prognostic risk factors, the prognostic nomogram was developed to predict patients' OS and CSS.
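The 70%/30% random grouping described above can be sketched as follows. This is only an illustration in Python, not the code used in the study (which relied on the R "caret" package); the function name, the seed, and the use of integer patient IDs are assumptions made for the example.

```python
import random

def split_development_verification(patient_ids, development_fraction=0.7, seed=2020):
    """Randomly partition patient IDs into development and verification groups.

    The seed value is arbitrary (hypothetical); it only makes the split repeatable.
    """
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    n_development = round(len(shuffled) * development_fraction)
    return shuffled[:n_development], shuffled[n_development:]

# 1643 eligible patients, identified here by placeholder integers
development, verification = split_development_verification(range(1643))
print(len(development), len(verification))
```

With plain rounding this sketch yields 1150 and 493 patients, whereas the study reports 1151 and 492; the source does not state how the "caret" routine arrived at its group sizes, so the small difference is left unexplained here.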
The concordance index (C-index), area under the receiver operating characteristic curve (AUC), and calibration curves of 3- and 5-year OS and CSS were calculated to verify the accuracy of the nomogram. A higher C-index and a higher AUC indicated better discrimination. We used bootstrapping with 1000 resamples to ensure the precision of the 3- and 5-year calibrations comparing the predicted and observed OS and CSS. Furthermore, Kaplan-Meier analysis was used to illustrate patients' OS and CSS.
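To make the C-index concrete, the following minimal Python sketch computes Harrell's concordance index for right-censored data. It is an assumption-laden illustration rather than the routine used in the study: pairs with tied survival times are simply skipped, and the toy follow-up times, event indicators, and risk scores are invented.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C: share of usable pairs in which the higher-risk patient fails first.

    times: follow-up in months; events: 1 = death observed, 0 = censored;
    risk_scores: higher value means higher predicted risk.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when patient i has an observed event
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable

# Invented toy data: 6 of the 7 usable pairs are ordered correctly
c_index = harrell_c_index(
    times=[5, 10, 12, 20, 30],
    events=[1, 0, 1, 1, 0],
    risk_scores=[0.9, 0.4, 0.2, 0.7, 0.1],
)
print(round(c_index, 3))  # 0.857
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why a higher C-index is read as better discrimination.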

Results
According to the screening criteria, 1943 males between 2010 and 2015 were identified, of whom 300 were excluded because of incomplete clinical information. Eventually, 1643 patients were included and randomly divided into the development group (1151 patients) and the verification group (492 patients). The characteristics of these patients are summarized in Table 1.
In the development group, the median follow-up time was 42 (95% CI: 40-46) months, the median OS was 66 (95% CI: 57-NA) months, and the median CSS was unavailable (Figures 1(a) and 1(b)). Univariate and multivariate Cox regression analyses were carried out to identify the independent prognostic factors for OS and CSS (Tables 2 and 3). On univariate Cox proportional hazards regression, age, marital status, TNM stage, surgery, radiation, chemotherapy, and histologic type were all significantly related to the OS of PC patients, while marital status, TNM stage, primary site, surgery, surgery of lymph nodes, radiation, chemotherapy, and histologic type were associated with CSS. On multivariate Cox regression, the independent prognostic factors shared by OS and CSS were marital status, N/M stage, surgery, surgery of lymph nodes, and histologic type. Age and race were independent prognostic factors for OS only. No significant association with OS or CSS was found for T stage, tumor primary site, radiation, or chemotherapy.
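For readers less familiar with Cox regression output, the hazard ratio and its Wald 95% CI follow directly from a fitted coefficient and its standard error. The Python sketch below shows the arithmetic; the coefficient and standard error are hypothetical values, not estimates from Tables 2 and 3.

```python
import math

def hazard_ratio_with_ci(beta, standard_error, z=1.959963984540054):
    """Hazard ratio and Wald 95% CI from a Cox log-hazard coefficient.

    z is the 97.5th percentile of the standard normal distribution.
    """
    hazard_ratio = math.exp(beta)
    lower = math.exp(beta - z * standard_error)
    upper = math.exp(beta + z * standard_error)
    return hazard_ratio, lower, upper

# Hypothetical coefficient and standard error, for illustration only
hr, lower, upper = hazard_ratio_with_ci(beta=0.5, standard_error=0.2)
print(f"HR = {hr:.2f} (95% CI: {lower:.2f}-{upper:.2f})")
```

A factor is conventionally read as significant at P < 0.05 when this interval excludes 1.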
The prognostic nomograms involving all risk factors related to patients' OS or CSS, based on the data of the development group, are shown in Figures 2(a) and 2(b). A score was assigned to each factor, and the sum of the scores reflected the 3- and 5-year OS and CSS and mortality of patients. The C-index of the nomogram for predicting OS was 0.737 (95% CI: 0.713-0.760) in the development group, whereas that in the verification group was higher, at 0.766 (95% CI: 0.731-0.801). The nomogram for predicting CSS showed better reliability and stability, with a C-index of 0.802 (95% CI: 0.771-0.833) and 0.82 (95% CI: 0.775-0.865) in the development and verification groups, respectively. In the development group, the AUC of 3-year OS and CSS were 0.749 and 0.766 and those of 5-year OS and CSS were 0.770 and 0.787, respectively (Figures 3(a) and 3(b)), which indicated the reliability of these two nomograms. The 3- and 5-year calibration curves of OS and CSS in the development group also showed satisfactory consistency between the observed and predicted outcomes (Figures 3(c) and 3(d)).
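The way a nomogram assigns a score to each factor can be illustrated with a short Python sketch. It assumes the common convention in which the factor with the widest effect range spans 0 to 100 points and all other factors are scaled proportionally; the function name and the coefficients are hypothetical and are not the fitted values behind Figures 2(a) and 2(b).

```python
def nomogram_points(coefficients):
    """Map per-level Cox log-hazard coefficients to a 0-100 point scale.

    coefficients: {factor: {level: beta}}; reference levels have beta 0.0.
    """
    # Range of each factor's contribution to the linear predictor
    ranges = {factor: max(levels.values()) - min(levels.values())
              for factor, levels in coefficients.items()}
    widest = max(ranges.values())
    if widest == 0:
        raise ValueError("all coefficients are identical")
    return {factor: {level: 100.0 * (beta - min(levels.values())) / widest
                     for level, beta in levels.items()}
            for factor, levels in coefficients.items()}

# Hypothetical coefficients, NOT the values estimated in this study
example_betas = {
    "M stage": {"M0": 0.0, "M1": 1.6},
    "marital status": {"married": 0.0, "single": 0.4},
}
points = nomogram_points(example_betas)
total_points = points["M stage"]["M1"] + points["marital status"]["single"]
print(points, total_points)
```

In a full nomogram the total is then read against a second axis that converts points into 3- and 5-year survival probabilities using the baseline survival of the fitted model, a step omitted from this sketch.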
The Kaplan-Meier analyses showed the differences in OS and CSS more intuitively. The coxph function of the "survival" package was used to build the proportional hazards model. Patients were divided at the median risk into high- and low-risk groups, of which the high-risk group had both lower 3-year OS [...] (Figure 4(a)), while younger patients had no obvious advantage in CSS (Figure 4(b)). Marital status also seemed to affect survival, as single patients were found to have a lower OS than others (P < 0.0001), and patients who never married unexpectedly had a worse CSS (P = 0.025); married patients had significant superiority over others in OS and CSS (Figures 4(c) and 4(d)). Although patients with more than four lymph nodes removed had a slight but insignificant advantage in OS, lymph node surgery did not contribute significant benefit (P = 0.059) (Figure 4(e)) and was even associated with poorer CSS (P < 0.0001) (Figure 4(f)). The impact of pathological differences on prognosis was obvious, with verrucous and papillary carcinoma having significantly better OS and CSS (Figures 4(g) and 4(h)). No significant difference in OS and CSS was found among patients of different races or tumor primary sites; nevertheless, patients seemed to have better survival when the primary tumor was located at the prepuce (Figures 4(i)-4(l)).
Significant differences in OS and CSS were also observed in the Kaplan-Meier curves for TNM stage, surgery, radiation, and chemotherapy (Figures 5(a)-5(l)). However, the OS and CSS of patients receiving radical surgery, radiotherapy, or chemotherapy did not improve significantly, although patients who received preoperative radiotherapy seemed to have a better CSS.
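The two steps behind these curves, dichotomizing patients at the median risk and estimating survival with the Kaplan-Meier product-limit method, can be sketched in Python as below. This is an illustration under stated assumptions: patients exactly at the median are placed in the low-risk group (the source does not say how ties were handled), and all data are invented.

```python
import statistics

def split_by_median_risk(risk_scores):
    """Label each patient 'high' or 'low' relative to the median risk score."""
    cutoff = statistics.median(risk_scores)
    # Assumption: scores equal to the median are treated as low risk
    return ["high" if score > cutoff else "low" for score in risk_scores]

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Invented follow-up times (months) and event indicators (1 = death, 0 = censored)
groups = split_by_median_risk([0.2, 1.5, 0.7, 2.1])
curve = kaplan_meier(times=[6, 12, 12, 20, 30, 36], events=[1, 1, 0, 1, 0, 0])
print(groups)
print([(t, round(s, 3)) for t, s in curve])
```

Applying kaplan_meier separately to the high- and low-risk groups gives the two curves whose difference is then compared, typically with a log-rank test.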
C-index, AUC, and calibration curves were used to evaluate the accuracy of the two nomograms. The verification group had a C-index of 0.766 (95% CI: 0.731-0.801) and 0.82 (95% CI: 0.775-0.865) when predicting OS and CSS, respectively, both higher than those of the development group. Meanwhile, the AUC of 3-year OS and CSS in the verification group were 0.754 and 0.771 and those of 5-year OS and CSS were 0.723 and 0.756 (Figures 6(a) and 6(b)). The observed-predicted calibration curves at 3 and 5 years also showed similar results (Figures 6(c) and 6(d)). All of these results support the accuracy of these nomograms.

Discussion
Although patients differ in hygienic, social, and religious practices [19], PC, mostly squamous cell carcinoma [20], has been a rare disease in the past decades [21][22][23]. In most developed areas, the incidence of PC has been gradually decreasing [24,25]. However, because clinical cases are uncommon and reliable prognostic tools are lacking, clinicians have limited means of understanding and predicting the prognosis of PC.
As a tool for predicting patients' prognosis, the nomogram is widely used in oncology, such as in bladder, prostatic, and breast cancer [26][27][28]. It can provide a more individualized prognostic assessment for patients by combining various prognostic risk factors which have been widely recognized [29]. Our prognostic nomogram was based on the SEER database, which includes the detailed information of approximately 34.6% of the U.S. population [30].
In our study, elderly patients, especially those older than 80 years, had a significantly lower 3-year (38.9%, 95% CI: 32.9%-42.1%) and 5-year (22.7%, 95% CI: 16.8%-30.8%) overall survival (P < 0.0001). Multivariate Cox analyses also revealed the risk of advanced age; these patients were assigned more points than others in the nomogram. Furthermore, the Kaplan-Meier curve of age showed only slight differences in OS among all groups younger than 70 years (597/1151 of the development group). These findings suggest that older age may be an independent risk factor for the prognosis of PC patients, which is consistent with most studies [23]. However, age did not affect patients' CSS in our study, as also reported by Shao et al. [31].
According to a study by Sharma et al. that included 5412 patients from the SEER database who had penile squamous cell carcinoma between 1998 and 2011, black males with PC had a worse OS. However, they excluded all patients in the M1 stage and included only 183 black patients, most of whom were diagnosed with a higher T stage of disease, lacked private insurance, and had lower median income [32]. Similarly, Slopnick et al. reported that African-American (AA) PC patients probably had a higher risk of death than white patients. Compared to white patients, surgical treatment was significantly delayed in AA patients. Meanwhile, a higher incidence of medical comorbidities such as heart disease, hypertension, and diabetes might also reduce their OS [33]. In our study, the white race was also an independent prognostic factor of OS but not CSS based on the Cox regression, although Kaplan-Meier curve analyses demonstrated that white patients had only a slight advantage in long-term OS compared to black patients and those of other races. Thus, it may be worthwhile to [...]
We also focused on marital status in our study. Cox regression analyses revealed that marital status independently affects OS and CSS; married males with PC had better survival than single or unmarried patients, which might be related to their relatively fixed sexual partners or regular and clean sexual practices. Both intentional and unintentional examinations by the married male or his spouse before and after intercourse could also allow penile abnormalities to be detected at an early stage. Unexpectedly, although unmarried men were younger on average, they had no clear survival advantage, and their CSS was even worse than that of single patients.
Furthermore, the results of Cox regression analyses suggested the importance of the cancer stage in the prognostic evaluation of patients. However, the T stage did not show sufficient prognostic value in the multivariate Cox regression, which was consistent with the result from Gao et al. [14]. In the K-M analysis, by contrast, the T stage was clearly related to patients' survival. Wu et al. included 234 patients from Sun Yat-Sen University Cancer Hospital and reported that the pathological T stage was an independent risk factor of lymph node metastases [34]. Similar to most studies, lymph node involvement and distant metastasis were found to be independent risk factors for prognosis [35]. Notably, our results also showed that the absence of significantly enlarged lymph nodes in the groin was very important for prognosis.
The tumor primary site did not affect the prognosis of PC patients; clinicians should thus decide on the appropriate surgical method based more on the stage of the tumor, to preserve the patients' sexual ability and improve sexual satisfaction. As expected, patients who underwent surgery showed generally better CSS; in particular, the therapeutic effects of electrocautery, fulguration, cryosurgery, and laser ablation were worthy of recognition. However, the prognosis of patients after radical surgery did not improve significantly. These patients generally had a worse TNM stage, which might be the cause of their poor prognosis. There may also be a statistical bias because only a few patients (0.2%) underwent debulking surgery. Nevertheless, considering the integrity of the data, we still retained this category in our calculation. The recent guidelines on PC from the European Association of Urology strongly affirmed the prognostic importance of the number of positive lymph nodes found on physical examination and pathological biopsy. The OS of patients with three or more inguinal lymph nodes would drop sharply to below 60% [36,37].
However, in our study, the K-M curve showed no significant difference in OS between patients with different numbers of lymph nodes surgically removed, and patients whose lymph nodes were not removed had a better CSS. Hakenberg et al. claimed that 25% of patients might have micrometastases; even without obviously swollen inguinal lymph nodes, they might already have early metastasis [38]. Thus, surgeons should prospectively focus on sentinel lymph node biopsy or dynamic sentinel lymph node biopsy to determine lymph node metastasis as accurately as possible, rather than simply predicting the prognosis and formulating treatment plans based on the number of enlarged lymph nodes found on physical examination or removed in surgery [39,40].
Most studies have found that adjuvant chemotherapy could improve the disease-free survival rate and median survival of PC patients with positive lymph nodes after radical inguinal lymph node dissection [41,42], and this might also reduce their clinical stage [43,44]. However, our results showed that chemotherapy was not an independent prognostic risk factor. In the development group, patients receiving chemotherapy (141 patients) had significantly wider lymph node infiltration on average (stage N2: 37 patients, stage N3: 48 patients), which is also considered a high-risk factor for recurrence after chemotherapy [45]. Since the data did not specify the chemotherapy regimens given, we interpret this result conservatively. Patients who received radiotherapy either before or after surgery did not have significant benefits and even had worse survival. Patients who received preoperative radiotherapy seemed to have a better CSS, but this may be biased by the limited number of patients (2/1151). Radiation therapy would increase the difficulty and the risk of complications in the dissection of positive inguinal lymph nodes and resection of the primary tumor. Some studies also claimed that radiation therapy cannot significantly prolong the OS of PC patients [46,47]. Our study also found that the pathological characteristic of patients was an important prognostic factor, among which verrucous carcinoma, verrucous papilloma, squamous cell papilloma, and papillary [...] [38].
Some limitations in our study must be taken into consideration. First, the SEER database is a retrospective resource including patients from the USA over a long period of time. Most of these patients were white, which might have introduced a bias and limited the application of the nomogram. Second, data about habits, customs (especially sexual activity), HPV infection, average income, religion, smoking, education, Charlson comorbidity index, complications, and other information were not available in the SEER database, which could also affect the quality of our results. Finally, no additional data about PC patients from other sources or institutions could be used for external verification, which might have caused a selection bias.

Conclusions
Our results demonstrated that our nomogram model is feasible and reliable. This could be helpful for clinicians to evaluate the prognosis of PC patients faster and more accurately. However, because of the limitations in our study, more prospective studies are required to verify the accuracy of this nomogram.

Data Availability
The dataset supporting the conclusions of this study is available in the SEER database.

Disclosure
An earlier version of this article has been submitted as a preprint paper (https://www.researchsquare.com/article/rs-76991/v1).

Conflicts of Interest
The authors declare that they have no conflict of interest.