A Novel Pseudogene Methylation Signature to Predict Temozolomide Outcome in Non-G-CIMP Glioblastomas

Objective Alterations in the methylation state of pseudogenes may serve as clinically useful biomarkers of glioblastomas (GBMs) that do not have glioma-CpG island methylator phenotype (G-CIMP). Methods Non-G-CIMP GBM datasets were included for evaluation, and a RISK-score signature was determined from the methylation state of pseudogene loci. Both bioinformatic and experimental analyses were performed for biological validation. Results By integrating clinical information with DNA methylation microarray data, we screened a panel of eight CpGs from discovery cohorts of non-G-CIMP GBMs. Each CpG could accurately and independently predict the prognosis of patients under a treatment regime that combined radiotherapy (RT) and temozolomide (TMZ). The 8-CpG signature appeared to show opposite prognostic correlations between patients treated with RT/TMZ and those treated with RT monotherapy. The analyses further indicated that this signature had predictive value for TMZ efficacy because different survival benefits between RT/TMZ and RT therapies were observed in each risk subgroup. The incorporation of other risk factors, such as age and O-6-methylguanine-DNA methyltransferase (MGMT) promoter methylation status, with our pseudogene methylation signature could provide precise risk classification. In vitro experimental data revealed that two locus-specific pseudogenes (ZNF767P and CLEC4GP1) may modulate TMZ resistance via distinct mechanisms in GBM cells. Conclusion The biologically and clinically relevant RISK-score signature, based on pseudogene methylation loci, may offer information for predicting TMZ responses of non-G-CIMP GBMs, that is independent from, but complementary to, MGMT-based approaches.


Introduction
Glioblastoma multiforme (GBM) with a glioma-CpGs island methylator phenotype (G-CIMP) are the most frequent and devastating glioma subtype [1]. Intra-or intertumoral molecular heterogeneity has been a major obstacle when developing treatment strategies against this deadly disease [2]. Identification of novel biologically and clinically relevant biomarkers may assist in the stratification of GBM subsets with distinct molecular features and allow for the development of precision medicines [2].
Compelling data have linked pseudogene alterations with glioma biology and response to treatment [3][4][5]. DNA methylation is a critical layer of control for the pseudogene transcriptome [6] and has long been regarded as an ideal cancer biomarker [7]; therefore, the identification of clinical relevant alterations in DNA methylation of pseudogenes is of great importance. In the present study, we integrate in silico and experimental approaches to examine the clinical and biological implications of pseudogene methylation in non-G-CIMP GBMs.

Patient Cohort from Rennes and Angers University
Hospitals. A French patient cohort of seventy-seven primary non-G-CIMP GBMs from Rennes and Angers University Hospitals (RAUH) has previously been reported [7]. All patients underwent combination radiotherapy (RT) and concurrent and adjuvant temozolomide (TMZ). Snapfrozen surgical samples were profiled using Infinium Human Methylation450k BeadChip (Illumina Inc.) as described in Ref [7]. The G-CIMP subtype was determined by a Kmeans clustering algorithm [8], and the O-6-methylguanine-DNA methyltransferase (MGMT) promoter methylation status was calculated using DNA methylation data from two Illumina probes (cg12434587 and cg12981137) [9].

Patient Cohorts from Public Databases.
Genomic DNA methylation and gene expression microarray data from 106 patients with an integrative diagnosis of non-G-CIMP GBM were downloaded from the Cancer Genome Atlas (TCGA) together with clinical annotations (RT/TMZ, n = 73; RT monotherapy, n = 13; and unknown regimens, n = 20) [10]. A further collection of 59 non-G-CIMP GBM samples with Illumina 450k DNA methylation microarray data was obtained from GSE60274 deposited in Gene Expression Omnibus (GEO; RT/TMZ, n = 32, and RT monotherapy, n = 27) [11]. Finally, Infinium450k DNA methylation microarray data from G-CIMP GBMs in TCGA [10] and nontumor brains (NTBs) in GSE63347 [12], together with RNA sequencing data from primary GBMs and NTBs in the Chinese Glioma Genome Atlas (CGGA) [13], were included for comparative analysis.

Probe Selection and RISK-Score Modeling.
A discoveryvalidation approach was employed to develop a multimarker prediction model. Data of patients treated with RT/TMZ were collected from TCGA and GSE60274 datasets for use during the discovery phase. Illumina 450k probes, that did not match regions with single-nucleotide polymorphisms and regions on X and Y chromosomes, were crossmatched with a list of 13603 pseudogenes downloaded from HGNC (HUGO Gene Nomenclature Committee;http://www .genenames.org/). 3210 CpGs located within the genomic regions of approximately 660 pseudogenes were identified and those with a standard deviation of inter value >0.1 from TCGA were further selected to correlate them with survival data using a univariate Cox regression model (Figure 1(a)). Inconsistent results from each discovery cohort were removed, and 15 overlapping candidates (permutation P < 0:2) were inputted into a multivariate Cox regression model that incorporated age, dataset source, and MGMT methyla-tion status (Figure 1(a)). Finally, a panel of eight CpGs targeting seven pseudogenes was identified and combined using a RISK-score formula (Figure 1(a) and Table 1), which was calculated as the sum of β values of each CpG weighted by their multivariate Cox coefficients. The optimal cutoff for stratification of risk subgroups was determined using the maxstat R package [14]. Batch effect across datasets was adjusted using a nonparametric empirical Bayes approach (combat R package) [15].
2.4. Bioinformatic Analysis. Gene set enrichment analysis (GSEA) was performed on the gene sets of the gene ontology biological processes from molecular signature database (MSigDB) to evaluate the functional profiles of each risk subgroup [16]. Tumor mutation burden (TMB) was calculated using mutation annotation format (MAF) flies from TCGA and defined as the total amount of coding variants/ the length of exons (38 million; maftools R package [17]).

Cell
Culture and Drugs. The human GBM cell lines (A172, U251, U373, and DBTRG-05MG) were obtained from the American Type Culture Collection and were maintained in Dulbecco's modified Eagle's medium supplemented with 10% fetal bovine serum at 37°C in 5% CO 2 . TMZ (MedChemExpress) was reconstituted in dimethysulfoxide (DMSO, Sigma-Aldrich) at a concentration of 100 mM.

TMZ Cytotoxicity Assay.
The chemosensitivity of TMZ was tested by the Cell Counting Kit-8 (CCK-8) kit. GBM cells were seeded in 96-cell plates at a density of 5 × 10 3 cells per well and exposed at indicated concentrations of TMZ (7.5, 15, 30, 60, 120, 240, and 480 μM) for 48 h. CCK-8 reagent was then added to wells (10 μl/well) and incubated for 1 h at 37°C. The OD 450nm value was measured for calculating half maximal inhibitory concentration (IC50).
2.10. Statistical Analysis. Frequency data were examined using Fisher's exact or Chi-square test. Continuous data were examined using an unpaired t test or Mann-Whitney test. Time-to-event data (for example, overall survival (OS) or progression-free survival (PFS)) were compared using Kaplan-Meier curves and the log-rank test. The prognostic influence and independence of each variable were evaluated using univariate and multivariate Cox regression models. Results from each cohort or subgroup were pooled using meta-analysis, where an inverse-variance approach was applied using either fixedor random effect models based on the heterogeneity test, with P < 0:1 or I 2 > 50% considered to be statistically significant. Differences between subgroups were tested using subgroup analysis. The performance of risk variables was assessed using area under the curve (AUC) from a time-dependent receiver operating characteristic curve (survcomp R package) [18]. All statistical tests were done within SPSS (SPSS software Inc.) and R software. Statistical significance was defined as a twosided P < 0:05.
The optimal cutoff was calculated as -6.6904 for the combined discovery cohort.

The Performance of the Pseudogene Methylation
Signature in Non-G-CIMP GBMs. Using the calculated cutoff above, we divided all patients from the discovery cohorts into low-risk and high-risk groups. Both pooled analysis of patient-level data and risk classification for each discovery cohort showed that OS of patients with high-risk tumors was significantly shorter than patients with low-risk tumors (Figure 1(b)). Similar results were observed for OS and PFS with an independent French cohort (Figure 1(c)). Cox regression analyses confirmed this signature as an independent risk indicator for RT/TMZ-treated patients (Table 2). Conversely, when RT monotherapy-treated cohorts were examined, we found that high-risk patients appeared to be associated with longer OS than low-risk patients (Figure 1(d)). Meta-analysis and Cox regression models both reported inverse prognostic correlations among patients with different treatments (Figure 1(e) and Table 2). These data suggested that our pseudogene methylation signature may not be a potent prognostic indicator for general GBM prognosis, but may be a predictive indicator for the survival benefits of additional TMZ treatment.

The Predictive Value of the Pseudogene Methylation
Signature for TMZ Response. To investigate whether our pseudogene methylation signature could provide predictive information about tumor response to TMZ, subgroup analyses were carried out between the risk and treatment subgroups. Patient baseline characteristics, such as age, gender, and MGMT methylation status, did not appear to differ from between above subgroups (data not shown). Subgroup analyses showed that low-risk patients benefited from RT/ TMZ treatment over RT monotherapy (Figure 2(a)); however, OS differed very little between treatment types in high-risk patients (Figure 2(b)). Meta-analysis and Cox regression model confirmed those findings (Figure 2(c) and Table S1). The data indicated that our pseudogene methylation signature may predict TMZ efficacy and help identify patient subpopulations likely to benefit from TMZ treatment.

Patient Classification in Clinically and Molecularly
Stratified Subcohorts. To further evaluate the performance of the pseudogene methylation signature, we examined our risk classifications in a combined cohort from TCGA, GSE60274, and RAUH collectively, stratified by MGMT promoter methylation status and age. The 8-CpG signature was able to discriminate OS among patients with each MGMT promoter methylation status (Figures 3(a) and 3(b)) or within each age subgroup (Figures 3(c) and 3(d)). Moreover, AUC comparison showed that the signature had predictive value similar to MGMT promoter methylation status for patients of all ages treated with RT/TMZ and was superior to the MGMT-based approach for elderly populations (≥60 years; Figure 3(e)).

Clinical and Molecular Correlations of the Pseudogene
Methylation Signature in TCGA Samples. Correlation with known clinical and molecular features showed that the 8-CpG signature subgroups were not associated with age, gender, MGMT promoter methylation status, gene expression subtypes, and TMBs in TCGA samples (Figures 4(a) and  4(b)). GSEA on transcriptome data, however, revealed differential functional profiles between risk subgroups. In particular, high-risk tumors appeared to be more enriched for gene sets related to cell cycle regulation, DNA repair, and ncRNA processing (Figure 4(c) and Table S2).

A Preliminary Experimental Study of Two
Pseudogenes on TMZ Resistance in GBM Cells. CGGA RNA sequencing data showed that two locus-specific pseudogenes were differentially expressed between GBMs with the wild-type isocitrate dehydrogenase gene (IDHwt), a surrogate marker for the G-CIMP phenotype, and NTBs; specifically, ZNF767P was found to be upregulated and CLEC4GP1 downregulated in IDHwt GBMs ( Figure S2). Functional assays on these two pseudogenes showed that ZNF767P was relatively highly expressed in DBTRG-05MG cells and CLEC4GP1 in both DBTRG-05MG and U251 cells ( Figure S3). TMZ knockdown was associated with reduced activation of PAPR, a key mediator of base excision repair (BER) [19], while CLEC4GP1 knockdown resulted in decreased levels of mismatch repair (MMR) proteins such as MLH1 and MSH2 [19] (Figure 5(d)). Finally both pseudogenes had      Journal of Oncology significant impacts on NF-κB activation in DBTRG-05MG cells ( Figure 5(d)). These data show that disruption of these two pseudogenes led to a disruption of the molecular mechanisms associated with TMZ resistance.

Discussion
Pseudogenes, characteristic by high sequence similarity to functional parent genes, have long been regarded as "junk DNA" due to the presence of a variety of disabling mutations (e.g., insertions, deletions, and stop codons) that result in loss of function [20]. The advent of large-scale, pan-cancer studies has prompted many to reexamine the function of pseudogenes, highlighting their multifaceted roles in cancer biology [3][4][5]. Integrative analysis of multiomics data showed that pseduogenes can be transcribed and translated, and that pseudogenic RNA and protein can regulate the function of key cancer genes, including their parent genes [3][4][5]. An increasing number of transcribed or translated pseudogenes have been discovered with diagnostic, prognostic, and predictive potentials in cancers [3][4][5]. Those pesudo-gene biomarkers have many disadvantages, namely, over standard biomarkers, due to issues inherent with the examination of expression data, such as unreliable RNA sampling and unstable altered patterns [21], and issues specific to the detection of pseudogene expression, such as complexities in designing reliable expression profiling approaches to distinguish the expression of parent gene with high sequence similarity, difficulties in defining which pseudogenes are transcribed, and a high dependence on the quality of reference genome sequences and annotation [3][4][5] . Epigenetic marks and DNA methylation, in particular, represent critical layers of control of pseudogene transcription [22]. DNA methylation can, therefore, provide information about pseudogene transcription while avoiding issues with genetic sequence similarities. Pseudogene methylation information has many advantages over expression information, since DNA sampling is reliable, patterns are stable, and drug-induced changes are reversible [23]. Biomarkers that take advantage of pseduogene methylation patterns can often, therefore, be more practical and informative. In the present study, we constructed a RISK-score signature based  9 Journal of Oncology on the methylation pattern of eight pseudogene-related CpGs by employing a multistep selection pipeline. The pseudogene methylation signature was found to exhibit opposite prognostic correlations for patients treated with RT/TMZ vs. RT monotherapy. Specifically, a high-risk score may be indicative of a poor outcome in RT/TMZ-treated patients, but a better outcome in those treated with RT alone. Subgroup analyses highlighted a predictive potential when making decisions about TMZ usage; the addition of TMZ appeared to be beneficial for low-risk but not high-risk counterparts. Correlation analyses indicated that the pseudogene methylation signature may not be an alternative manifestation of known clinical or molecular characteristics. Together, these data suggest that our pseudogene methylation signature may serve as a novel and promising predictive biomarker for TMZ response in non-G-CIMP GBMs, rather than as a general, treatment-independent prognostic biomarker [24].  Table S2.   The most informative biomarker for predicting TMZ outcome is the promoter methylation status of MGMT [25]. However, it is unlikely that TMZ would be withheld in patients with unmethylated MGMT promoters, since there is a lack of effective alternative therapies, and this agent has some benefits for this subpopulation [26]. Therefore, the use of MGMT promoter methylation as a biomarker for TMZ response may have limited value. The use of TMZ is also controversial for elderly subpopulations which are characterized by general poor health, reduced tolerance to anticancer therapy, and increased expectation for a better quality of life [27]. In fact, TMZ may not be a cost-effective option for GBM in health resource-limited countries, such as China [28]. The overuse of TMZ could result in overconsumption of health resources, raise medical cost to families and caregivers, and increase risk of drug toxicity. Selecting patients that are likely to respond well to TMZ and have favorable prognostic biomarkers may represent an effective approach for optimizing TMZ usage and increasing cost-effectiveness of treatment [26,27]. The data reported here show that the pseudogene methylation signature could provide more refined risk classification in subpopulations determined by age or MGMT methylation status and may be helpful in identifying subsets of elderly patients, or