Combining Phenotypes of Nucleotide Excision Repair Pathway to Predict the Risk of Head and Neck Squamous Cell Carcinomas in a Chinese Population

Background Nucleotide excision repair (NER) is pivotal in the development of smoking-related malignancies. Nine core genes (XPA, XPB, XPC, XPD, XPF, XPG, ERCC1, DDB1, and DDB2) are highly involved in the NER process. We combined two phenotypes of NER pathway (NER protein and NER gene mRNA expression) and evaluated their associations with the risks of the head and neck squamous cell carcinomas (HNSCCs) in a Chinese population. Methods We conducted a case-control study of 337 HNSCC patients and 285 cancer-free controls by measuring the expression levels of nine core NER proteins and NER gene mRNA in cultured peripheral lymphocytes. Results Compared with the controls, cases had statistically significantly lower protein expression levels of XPA (P < 0.001) and lower mRNA expression levels of XPA and XPB (P = 0.005 and 0.001, respectively). After dividing the subjects by controls' medians of expression levels, we found an association between increased risks of HNSCCs and low XPA protein level (Ptrend = 0.031), as well as low mRNA levels of XPA and XPB (Ptrend = 0.024 and 0.001, respectively). Subsequently, we correlated the two phenotypes and found associations between the NER mRNA and protein levels. Finally, the sensitivity of the expanded model with protein and mRNA expression levels, in addition to demographic variables, on HNSCCs risk was significantly improved. Conclusions Combining two phenotypes of NER pathway may be more effective than the model only including one single phenotype for the assessment of risks of HNSCCs.


Introduction
Head and neck squamous cell carcinomas (HNSCCs), which originate in the epithelial cells of the mucosal linings of the upper airway and food passages (the oral cavity, oropharynx, hypopharynx, and larynx), are among the most common malignancies worldwide [1][2][3][4]. In the United States, the estimated number of new HNSCC cases increased from 59,260 in 2010 to 65,410 in 2019, according to the American Cancer Society [5][6][7]. In China, there were 74,500 new HNSCC cases in 2015, according to the Chinese Cancer Society [8]. It is well established that tobacco smoking and excessive alcohol use are major risk factors for smoking-related HNSCCs, and that prior high-risk human papillomavirus (HPV) infection is a major risk factor for HPV-positive HNSCCs, especially oropharyngeal cancer in the western world [9][10][11]. Although these risk factors contribute to HNSCCs, only a small fraction of those with a history of smoking, excessive alcohol use, or HPV infection develop one of these cancers in their lifetime, suggesting heterogeneity in susceptibility to HNSCCs [12,13].
Numerous carcinogenic chemicals in cigarette smoke can damage cellular DNA [14]. One example is benzo(a)pyrene diol epoxide (BPDE), which is found in cigarette smoke as well as in the environment as a result of fuel combustion. BPDE can induce irreversible DNA damage by forming DNA adducts that block transcription of essential genes [15,16]. Several DNA repair processes have evolved to repair such damage, among which nucleotide excision repair (NER) is a major and well-studied one [16][17][18][19]. NER essentially uses nine core proteins (XPA, XPB, XPC, XPD, XPF, XPG, ERCC1, DDB1, and DDB2) to restore damaged DNA to its normal state in living cells [20][21][22]. Functional mutations in any of these proteins may lead to abnormal NER and subsequently increase susceptibility to cancers, including cancers of the skin, lung, and head and neck [23][24][25][26][27].
So far, several studies have reported associations between polymorphisms (genotypes) in NER pathway genes and the risks of different cancers [28,29]. One Chinese population study indicated that the ERCC1 rs2298881 CA variant genotype was associated with an increased gastric cancer risk [30], and another showed that five SNPs in NER genes were correlated with neuroblastoma susceptibility [31]. Associations between NER gene polymorphisms and risks of HNSCCs have been reported as well [22]. XPA rs1800975 23A>G and XPD rs13181 have both been reported to be associated with HNSCCs [32]. In another large case-control study, the XPC Ala499Val SNP was associated with the risk of HNSCCs [33]. However, the associations of these biomarkers with cancer risks cannot be fully elucidated if the analyses contain only genotype data. Thus, it is necessary to evaluate the associations between NER phenotypes (mRNA and protein levels) and risks of HNSCCs. In a non-Hispanic white population study, patients with reduced expression levels of NER proteins were reported to have a higher risk of HNSCCs than those with higher protein levels [34]. Later, the same group validated these results in a study with a much larger sample size and more NER proteins [35]. To date, no study has explored these associations in the Chinese population, in which the composition of HNSCCs is quite different from that in the non-Hispanic white population. Specifically, oropharyngeal cancer accounts for most HPV-positive HNSCCs in the United States, and HPV-positive cases made up about 91.4% of all oropharyngeal cancer cases in the previous non-Hispanic white study, whereas in the Chinese population most cases of oropharyngeal cancer are HPV-negative, meaning they are primarily caused by cigarette smoking [35][36][37][38][39]. In addition, the etiology of smoking-related HNSCCs is different from that of HPV-positive HNSCCs [40][41][42].
Although the NER translational and transcript levels have been studied separately for the risks of HNSCCs in a non-Hispanic white population, the effect of a single expression level of the NER pathway is quite limited for the prediction of HNSCC risk. Therefore, it is necessary to combine both NER translational and transcript levels to predict risks of HNSCCs. Subsequently, in the current study, we first validate the association of NER proteins and HNSCC risks in a Chinese population. Then, we test the effectiveness of combining two NER phenotypes in the HNSCC risk prediction model.

Study Subjects.
We recruited 337 HNSCC patients and 285 cancer-free controls from the First Affiliated Hospital of Xi'an Jiaotong University between 2013 and 2018. Cases were selected based on the following criteria: newly diagnosed, histologically confirmed HNSCC with no other cancers. Controls were recruited among visitors accompanying patients to the same hospital. They were biologically unrelated to the cases, frequency matched with cases by age and sex, and had no history of prior malignancies. All subjects included in the current study were Han Chinese. Written informed consent was obtained from cases and controls. Participants who had smoked more than 100 cigarettes during their lifetime were defined as ever smokers, of whom those who had quit smoking for at least one year were defined as former smokers and the remainder as current smokers; all others were considered never smokers. Participants who drank alcoholic beverages at least weekly for one year were considered ever drinkers, of whom those who had quit drinking for more than one year were considered former drinkers and the remainder current drinkers; all others were defined as never drinkers. Each subject donated a 15 ml blood sample. The HPV status of all subjects was tested by RT-PCR assay. In the previous study, the expression levels of NER proteins were not correlated with HPV status in the non-Hispanic white population [35]. Because only five HNSCC cases were identified as HPV-positive, we could not infer whether NER protein levels were correlated with HPV status. Thus, HPV-positive HNSCC subjects were excluded to avoid further heterogeneity in this study. The study protocol was approved by the First Affiliated Hospital of Xi'an Jiaotong University Institutional Review Board.

Reverse-Phase Protein Lysate Microarrays.
Details regarding the current methods have been reported previously [35]. Briefly, we isolated T-lymphocytes from whole peripheral blood by Ficoll gradient centrifugation. Cellular proteins were extracted from the cells and prepared for reverse-phase protein lysate microarray (RPPA) analysis. Serially diluted lysates were applied to nitrocellulose-coated slides (Schleicher & Schuell BioScience, Inc., USA) by an Aushon Arrayer (Aushon BioSystems, USA). Each sample containing the antigens (the NER proteins) to be detected was spotted in duplicate, with additional positive and negative controls prepared from mixed cell lysates or dilution buffer, respectively. Each slide was probed with a validated primary antibody plus a biotin-conjugated secondary antibody. Mouse anti-goat or anti-rabbit polyclonal or anti-human monoclonal antibodies were used against XPA, XPB, XPC, XPD, and ERCC1 (Santa Cruz, USA); XPF (Abcam, USA); XPG (Proteintech, USA); and DDB1 and DDB2 (Invitrogen, USA). The arrays were incubated with individual antibodies for 1 h at room temperature. The secondary antibodies were then added to the slides and incubated at room temperature for 30 min.
Signals were amplified using a Dako system according to the protocol previously described [35]. We then incubated the slides with a secondary conjugated streptavidin for 30 min and visualized the signals by DAB colorimetric reaction. The signals on the microarrays were processed using the Array-Pro Analyzer software (Media Cybernetics, USA) to determine spot intensities, which were then analyzed with a logistic model using an R package. A fitted curve was plotted with the relative log2 concentration of each protein on the X-axis and the signal intensities on the Y-axis using the B-spline model as previously described [33]. Protein concentrations were determined from the fitted curve for each lysate and normalized by the median value for protein loading as described [43,44]. RPPA_CF is the RPPA correction factor. A sample was considered an outlier if its correction factor was below 0.25 or above 2.5.
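The outlier rule stated above can be illustrated with a minimal Python sketch. The sample identifiers and correction-factor values are hypothetical, and the function name `is_outlier` is introduced here only for illustration; the thresholds 0.25 and 2.5 are the ones given in the text.

```python
def is_outlier(correction_factor, low=0.25, high=2.5):
    """Flag a sample whose RPPA correction factor is below 0.25 or above 2.5."""
    return correction_factor < low or correction_factor > high

# Hypothetical correction factors keyed by sample ID (not study data).
correction_factors = {"S1": 0.90, "S2": 0.20, "S3": 2.70, "S4": 2.50}

# Keep only the samples that pass the rule; boundary values are retained
# because the text says "below 0.25 or above 2.5".
kept = {sid: cf for sid, cf in correction_factors.items() if not is_outlier(cf)}
```

Whether flagged samples were dropped or re-assayed is not stated in the text, so the filtering step here is only one possible handling.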

Quantitative Real-Time PCR.
Details of this test have been reported previously. Briefly, total RNA was extracted with the TRIzol reagent (Invitrogen, Carlsbad, CA), and mRNA expression levels were measured on the ABI 7900HT Sequence Detection System (Applied Biosystems, Foster City, CA). The thermal cycling conditions were as follows: 95°C for 5 min, followed by 40 cycles of denaturation at 95°C for 15 s and annealing/extension at 60°C for 1 min. Each sample was analyzed in triplicate. The Ct value, or threshold cycle, was the PCR cycle at which a significant increase in fluorescent signal was first detected. 18S expression was used as an internal control. The expression levels of NER genes relative to that of 18S were calculated as delta Ct (ΔCt), that is, the Ct value of the target gene minus the Ct value of 18S. Therefore, the higher the ΔCt value, the lower the expression level of the target mRNA.
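As a worked illustration of the ΔCt definition, the following Python sketch computes ΔCt for one sample from triplicate Ct values. The Ct values are hypothetical. Averaging the triplicates before subtraction and converting ΔCt to a relative quantity with 2^(-ΔCt) are common conventions assumed here, not steps stated in the text.

```python
from statistics import mean

def delta_ct(target_ct, reference_ct):
    """Return delta Ct = mean Ct(target gene) - mean Ct(18S) for one sample.

    Both arguments are lists of replicate Ct values. Averaging replicates
    first is an assumption; the text only says samples were run in triplicate.
    """
    return mean(target_ct) - mean(reference_ct)

# Hypothetical triplicate Ct values for one sample (not study data).
xpa_ct = [27.9, 28.1, 28.0]
rrna_18s_ct = [12.0, 12.1, 11.9]

d_ct = delta_ct(xpa_ct, rrna_18s_ct)

# A larger delta Ct means less target mRNA relative to 18S.
# 2 ** (-delta Ct) assumes roughly 100% amplification efficiency.
relative_expression = 2 ** (-d_ct)
```

Because ΔCt and expression move in opposite directions, any comparison of "lower expression" between cases and controls corresponds to a higher ΔCt.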

Statistical Analysis.
The distribution of demographic variables was evaluated between cases and controls by the chi-square test. The differences in the relative expression levels of NER proteins and mRNA levels were compared by Wilcoxon rank-sum test between cases and controls.
The medians of expression values in the controls were used as the cutoff values for calculating crude odds ratios (ORs) and their 95% confidence intervals (CIs). The associations between expression levels of protein and mRNA and HNSCC risk were estimated by computing ORs and CIs from multivariate logistic regression models. Further stratification analyses were used to evaluate effect modification between the relative expression levels of NER proteins and demographic variables. A multiplicative interaction was defined as $\mathrm{OR}_{11} > \mathrm{OR}_{01} \times \mathrm{OR}_{10}$, in which $\mathrm{OR}_{11}$ was the OR when both factors were present, $\mathrm{OR}_{10}$ was the OR when only factor 1 was present, and $\mathrm{OR}_{01}$ was the OR when only factor 2 was present [45].
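The median-cutoff odds ratio and the multiplicative interaction criterion can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the expression values are hypothetical, "low" is taken as at or below the control median (the text does not say how ties at the median were handled), and the confidence interval uses the standard Woolf (log-OR) approximation, which may differ from the software output used in the study. The adjusted ORs from multivariate logistic regression are not reproduced here.

```python
import math
from statistics import median

def crude_odds_ratio(case_values, control_values):
    """Crude OR for low vs. high expression, with a Woolf 95% CI.

    The cutoff is the median of the controls, as described in the text.
    Treating values equal to the cutoff as 'low' is an assumption.
    """
    cutoff = median(control_values)
    a = sum(v <= cutoff for v in case_values)      # cases, low expression
    b = len(case_values) - a                       # cases, high expression
    c = sum(v <= cutoff for v in control_values)   # controls, low expression
    d = len(control_values) - c                    # controls, high expression
    if min(a, b, c, d) == 0:
        raise ValueError("empty cell: OR is undefined without a correction")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, (lower, upper)

def exceeds_multiplicative(or_11, or_10, or_01):
    """True when OR11 > OR10 x OR01, the definition given in the text."""
    return or_11 > or_10 * or_01

# Hypothetical relative expression values (not study data).
cases = [0.6, 0.7, 0.8, 1.4]
controls = [0.8, 0.9, 1.1, 1.3]
odds_ratio, ci = crude_odds_ratio(cases, controls)
```

With real data, the same 2 × 2 counts would be computed within each stratum (for example, ever and never smokers) to obtain the stratum-specific ORs that enter the interaction check.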
To assess the effects of the two NER phenotypes (protein and mRNA) on HNSCC risk prediction, four risk models were constructed and compared by the area under the receiver operating characteristic (ROC) curve (AUC): the baseline model including only demographic variables, the protein model including expression levels of proteins in addition to these demographic variables, the gene model including mRNA expression levels of the genes in addition to these demographic variables, and the combined model including the expression levels of both mRNA and protein in addition to these demographic variables. All tests were two-sided, and P <

Differences in NER Protein or mRNA Expression Levels between Cases and Controls.
The cases showed lower relative mean expression levels than did controls in six of the nine core NER proteins analyzed, the exceptions being XPC, XPG, and ERCC1 (Table 2). In the Wilcoxon rank-sum test for differences in NER protein expression levels between cases and controls, only XPA levels were statistically significantly lower in cases than in controls (P = 0.001; Figure 1(a)). Because the expression levels of the nine NER proteins were measured at the same time, they were likely to be correlated with each other. As shown in Supplementary Table 1, expression levels of XPA were statistically significantly correlated with those of XPB, XPC, XPD, and ERCC1 (P = 0.019, P = 0.050, P < 0.001, and P = 0.012, respectively). Moreover, mRNA expression levels of XPA, XPB, and XPF were statistically significantly lower in cases than in controls (P = 0.005, P = 0.001, and P = 0.035, respectively, Table 2).
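For readers unfamiliar with the Wilcoxon rank-sum comparison used for the case-control differences, the following self-contained Python sketch shows the large-sample (normal approximation) form of the test with a tie correction. It is an illustration only: the function names and input values are hypothetical, no continuity correction is applied, and the statistical software used in the study may compute the P value differently.

```python
import math

def midranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1   # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    w = sum(midranks(pooled)[:n1])        # rank sum of the first group
    expected_w = n1 * (n + 1) / 2
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values()) / (n * (n - 1))
    variance_w = n1 * n2 / 12 * ((n + 1) - tie_term)
    if variance_w == 0:
        raise ValueError("all observations are identical")
    z = (w - expected_w) / math.sqrt(variance_w)
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail
    return z, p_value

# Hypothetical relative expression values (not study data).
case_levels = [0.61, 0.72, 0.80, 0.85, 0.90]
control_levels = [0.95, 1.02, 1.10, 1.18, 1.25]
z_stat, p_val = rank_sum_test(case_levels, control_levels)
```

A negative `z_stat` indicates that the first group (here, cases) tends to have lower values than the second group.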

Stratification Analyses of Expression Levels of XPA.
Stratification analyses of XPA expression levels revealed that patients in the subgroups of age ≤ 59, age > 59, male, female, former and current smokers, and former and current drinkers exhibited significantly lower mean expression levels of XPA than did controls (all P < 0.001, Table 3). In cases, women had lower expression levels of XPA than did men, whereas in controls, women had higher expression levels of XPA than did men; however, the sex differences were not significant in either cases or controls (P = 0.249 and P = 0.889, respectively, Table 3). Moreover, ever smokers and ever drinkers had significantly lower expression levels of XPA than did never smokers and never drinkers, respectively (all P < 0.001, Table 3). There were no significant differences in the expression levels of XPA by tumor site, suggesting that XPA expression may not differ among HNSCC tumor sites (Supplementary Table 2). We further assessed possible interactions on a multiplicative scale between expression levels of XPA and selected variables listed in Table 1. The multiplicative interaction was tested by including the interaction term (i.e., relative expression levels of XPA × each of the risk factors) in a multivariate regression model that also included the main effects of NER protein expression levels and other covariates. We found that smoking status and drinking status each had a significant multiplicative interaction with relative expression levels of XPA in association with HNSCC risk (P = 0.005 and P = 0.044, respectively, Table 3). To further unravel these multiplicative interactions, we stratified the adjusted ORs by smoking status and drinking status. The ORs for relative expression levels of XPA, dichotomized at the median, were greater in ever smokers than in never smokers (Figure 1(b)) and, likewise, greater in ever drinkers than in never drinkers (Figure 1(c)).

Associations between Expression Levels of Protein and mRNA and Risks of HNSCCs.
We first correlated expression levels of NER proteins with mRNA. As shown in Table 4, the expression level of XPA protein was statistically significantly correlated with mRNA expression levels of XPA and XPB (P < 0.001 for both). Then, to estimate HNSCC risks, the expression levels of proteins and mRNA were dichotomized at the median values of the controls (Tables 5 and 6). The crude OR for HNSCC risk associated with lower relative expression levels of XPA was 1.43 (95% CI, 1.04-1.97), compared with high expression levels of XPA. After adjusting for age, sex, smoking status, and alcohol consumption in multivariate logistic regression analysis, the OR of XPA remained essentially unchanged. When continuous expression values were used in the logistic regression model with adjustment for all covariates, there was also a dose-response relationship between reduced expression levels of XPA and increased HNSCC risk (Ptrend = 0.031). Furthermore, when continuous mRNA expression values were used in the logistic regression model with adjustment for all covariates, there were also dose-response relationships between reduced mRNA expression levels and increased HNSCC risks for XPB and XPA (Ptrend < 0.001 and Ptrend = 0.024, respectively, Table 6).

Prediction of the Risk of HNSCCs by NER Protein and mRNA Expression.
We assessed the performance of NER protein expression levels in HNSCC risk prediction using ROC curves. The AUC was significantly improved in the model that included the effect of XPA expression, compared with the model that did not (Figure 2(a), P = 0.004). Furthermore, among former and current smokers, the AUC was significantly improved in the model that included the effect of XPA expression, compared with the model that did not (Figures 2(c) and 2(d), P < 0.001 and P < 0.001, respectively), whereas the improvement was not significant among never smokers (Figure 2(b), P = 0.462).
Similarly, among former and current drinkers, the AUC was significantly improved in the model that included the effect of XPA expression, compared with the model that did not (Supplementary Figure 1B and 1C, P = 0.001 and P = 0.001, respectively), whereas the improvement was not significant among never drinkers (Supplementary Figure 1A, P = 0.404).
We further assessed the prediction performance of models combining expression levels of NER mRNA and protein on HNSCC risks. Compared with the model that included only NER protein (XPA), the AUC was significantly improved in the model including both NER protein and mRNA (XPA and XPA; P = 0.010, Figure 3(a)). Similarly, compared with the model that included only NER mRNA (XPA), the AUC was significantly improved in the model including both NER protein and mRNA (XPA and XPA; P = 0.002, Figure 3(b)). Compared with the model that included two mRNA levels (XPA and XPB), the AUC was also improved in the model including both NER protein and mRNA (XPA and XPA; P = 0.056, Figure 3(c)).
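To make the AUC comparisons concrete, the sketch below computes the AUC of a risk model directly from its predicted risks, using the rank interpretation of the AUC (the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, counting ties as one half). The predicted risks are hypothetical stand-ins for the outputs of a baseline model and a combined model; the statistical test used in the study to compare AUCs is not specified in the text and is not reproduced here.

```python
def auc(case_scores, control_scores):
    """AUC as P(case score > control score) + 0.5 * P(tie)."""
    wins = 0.0
    for case_score in case_scores:
        for control_score in control_scores:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical predicted risks from two models (not study data).
baseline_cases = [0.55, 0.60, 0.45, 0.50]
baseline_controls = [0.50, 0.40, 0.58, 0.45]
combined_cases = [0.70, 0.65, 0.55, 0.60]
combined_controls = [0.40, 0.35, 0.58, 0.30]

auc_baseline = auc(baseline_cases, baseline_controls)
auc_combined = auc(combined_cases, combined_controls)
```

The pairwise loop is quadratic in sample size, which is acceptable for a few hundred cases and controls; a rank-based formula would be preferable for much larger data sets.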

Discussion
In this study, we first confirmed that reduced expression levels of NER protein were associated with an increased risk of HNSCCs in a Chinese population. We then assessed interactions between XPA expression levels and selected variables and found that smoking as well as drinking had significant multiplicative interactions with XPA expression on HNSCC risk. As the effect of a single phenotype in the NER pathway on HNSCC risk prediction is quite limited, we combined expression levels of NER protein and mRNA in the ROC model. Our results showed that the model combining both NER protein and mRNA levels may be more effective for HNSCC risk prediction than a model that includes only one phenotype.
In an early study, an association was reported between an increased risk of HNSCCs and reduced expression levels of XPD, XPF, XPA, and XPC in a non-Hispanic white population, at a time when appropriate antibodies for DDB1 and XPB were not available [34]. Later, the same group validated these results with more available antibodies for essential proteins and found that the risks of HNSCCs were associated with lower expression levels of XPA and XPB [35]. As the composition of HNSCCs in the Chinese population is quite different from that in the non-Hispanic white population, we tested the associations between expression levels of nine core NER proteins and risk of HNSCCs in a Chinese population and found that reduced expression levels of XPA, but not XPB, were associated with HNSCC risk. These results further support the notion that altered translational levels of the NER pathway, which have a more direct effect on NER capacity than transcript levels, may contribute to the risks of HNSCCs. Moreover, our previous work at the transcript level suggested that mRNA expression levels of XPA and XPB were statistically significantly lower in cases than in controls, and that reduced mRNA expression levels of XPB were associated with an increased risk of HNSCCs in a Chinese population [46]. However, we did not find this association with XPB at the translational level. One possible reason for this discrepancy is that the sample size of the current study is still not large enough; future studies with more cases and controls are warranted to validate the current results. Another possible reason is that the transcript levels and translational levels of NER genes may not be directly correlated.
Although the mRNA of an NER gene is ultimately translated into an NER protein, the relationship between transcription and translation is far from a simple linear correlation [47]. The underlying mechanism is likely that cis-acting and trans-acting processes create a series of systems that promote or inhibit the synthesis of proteins from a given copy number of mRNA molecules, and translational levels are more directly involved in the NER process [48].
A previous study suggested a modification effect of smoking status on XPB, indicating that the association between reduced expression levels of XPB and increased risk of HNSCCs may differ by smoking status [35]. In the current study, we observed that smoking status and drinking status had significant multiplicative interactions with expression levels of XPA, rather than XPB, on HNSCC risk. Subsequently, we stratified the ORs of XPA by smoking and drinking status and found that the adjusted ORs for XPA in ever smokers or ever drinkers were greater than those in never smokers or never drinkers, indicating that ever smokers or ever drinkers with reduced XPA expression levels might have a higher risk of developing HNSCCs.
The XPA protein consists of several domains: the C-terminal domain, which interacts with transcription factor IIH; the N-terminal domain, which contains the RPA34 and ERCC1 binding sites; and the central domain, which is responsible for DNA binding [32]. Variation in XPA's functions may lead to an aberrant NER process and subsequently increase susceptibility to cancer. Our data suggested an increased risk of HNSCCs associated with reduced expression levels of XPA in a Chinese population, and the current results were consistent with previously published studies of HNSCC risk in non-Hispanic whites, suggesting that XPA may serve as a general biomarker for HNSCCs across the two racial groups.
Although analysis of NER mRNA expression is easy to perform, the results from the PCR assay may fluctuate across cell stages. Thus, transcript-level data may be unstable for predicting HNSCC risks, and another assay is warranted to confirm the results from the PCR assay. The RPPA assay is a rapid, cost-effective, and, most importantly, efficient method to measure the expression levels of NER proteins, and the current study is the first to measure the associations between NER proteins and risks of HNSCCs in a Chinese population. Previously, we assessed the performance of NER proteins in HNSCC risk prediction by AUC in a non-Hispanic white population and found that the AUC was significantly improved by including the effects of XPB and XPA expression, compared with the model that did not [35]. In the current study, we found that the AUC was significantly improved by XPA expression levels, suggesting that suboptimal XPA expression levels may play an important role in the risk of HNSCCs in two different races. Furthermore, we included NER mRNA expression in the protein-gene model and found that the model including both NER protein and mRNA levels significantly improved the AUC, compared with models that included NER protein or mRNA alone. These results further imply that combining two phenotypes of the NER pathway could significantly improve the effectiveness of NER-based HNSCC risk prediction.
Although this epidemiological study is the first large population study combining NER protein and mRNA levels to investigate risks of HNSCCs, several limitations remain to be addressed. As in previous hospital-based studies, the control group may not be representative of the general population. Future studies may need a much larger sample size, include more HPV-positive cases, and recruit controls from a community-based population.

Conclusion
Reduced XPA expression levels were associated with an increased risk of HNSCCs in a Chinese population, and combining NER protein and mRNA may build a novel risk assessment model for HNSCC risk prediction.