Prostate cancer is a clinically and biologically heterogeneous disease. Deregulation of splice variants has been shown to contribute significantly to this complexity. High-throughput technologies such as oligonucleotide microarrays allow for the detection of transcripts that play a role in disease progression at a transcriptome-wide level. In this study, we use a publicly available dataset of normal adjacent, primary tumor, and metastatic prostate cancer samples (GSE21034) to detect differentially expressed coding and non-coding transcripts between these disease states. To achieve this, we focus on transcript-specific probe selection regions, that is, those probe sets that correspond unambiguously to a single transcript. This allows us to pinpoint individual transcripts that are differentially expressed throughout prostate cancer progression. We confirm previously reported cases and find novel transcripts that have not previously been implicated in prostate cancer progression. Furthermore, we show that transcript-specific differential expression has unique prognostic potential and provides a clinically significant source of biomarker signatures for prostate cancer risk stratification. The results presented here serve as a catalog of differentially expressed transcript-specific markers throughout prostate cancer progression that can be used as a basis for further development and translation into the clinic.
Alternative splicing is a fundamental cellular process by which a multiexon gene generates different transcripts from the same primary sequence, thereby increasing the functional diversity of the expressed genome. The central dogma of “one gene, one mRNA, and one protein” is outmoded now that the ubiquitous nature of gene splice variation and its complexity throughout normal development, cell differentiation, and disease are better understood [
The biological and clinical significance of differential expression of isoform variants is illustrated, for example, by the bcl-2 apoptotic gene family member bcl-x [
Recent advances in genome annotation and high-throughput technologies have led to the design of splicing-specific microarrays (e.g., exon, exon-junction, and tiling arrays) and RNA-sequencing (RNA-Seq), which allow transcriptome-wide expression profiling of coding and non-coding transcripts. While RNA-Seq is the technology of highest (i.e., single basepair) resolution and is especially powerful for the discovery of specific splice variants and novel transcripts, its utility in routine clinical testing remains to be proven. High-density microarrays, on the other hand, are already established tools in routine clinical testing (e.g., use of paraffin-embedded solid tumor specimens) [
The publicly available genomic and clinical data were generated as part of the Memorial Sloan-Kettering Cancer Center (MSKCC) Prostate Oncogenome Project, previously reported by Taylor and colleagues [
The normalization and summarization of the 179 microarrays were done with the frozen Robust Multiarray Average (fRMA) algorithm using custom frozen vectors [
The normalized and summarized data were partitioned into three groups, one per pairwise comparison. The first group contains the primary localized prostate cancer tumor samples and the normal adjacent samples (used for the normal versus primary comparison). The second group contains all the samples from metastatic tumors and all the localized prostate cancer specimens (used for the primary versus metastasis comparison). The third group contains all the samples from metastatic tumors and all the normal adjacent samples (used for the normal adjacent versus metastasis comparison).
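The partitioning described above can be sketched as follows. This is a minimal illustration rather than the code used in the study; the tissue labels, the sample identifiers, and the `build_comparison_groups` helper are hypothetical.

```python
from typing import Dict, List

# Hypothetical tissue labels; the annotation used in the dataset may differ.
NORMAL, PRIMARY, METASTATIC = "normal_adjacent", "primary", "metastatic"


def build_comparison_groups(sample_types: Dict[str, str]) -> Dict[str, List[str]]:
    """Return the sample IDs that enter each of the three pairwise comparisons."""
    pairs = {
        "N_vs_P": (NORMAL, PRIMARY),
        "P_vs_M": (PRIMARY, METASTATIC),
        "N_vs_M": (NORMAL, METASTATIC),
    }
    return {
        name: sorted(s for s, tissue in sample_types.items() if tissue in pair)
        for name, pair in pairs.items()
    }


# Toy example with made-up sample IDs.
samples = {"s1": NORMAL, "s2": PRIMARY, "s3": PRIMARY, "s4": METASTATIC}
groups = build_comparison_groups(samples)
```

Each sample appears in exactly two of the three groups, which is why the comparisons are not independent of one another.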
Using the xmapcore R package [
PSRs annotated as “unreliable” by the xmapcore package [
The multiple testing correction was applied using the p.adjust function of the stats package in R.
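The text does not state which correction method was passed to `p.adjust`. As an illustration only, the sketch below assumes the Benjamini-Hochberg procedure (the `"BH"` option of `p.adjust`) and reimplements it in Python; the function name `bh_adjust` and the example p-values are hypothetical.

```python
from typing import List


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (assumed method, for illustration)."""
    n = len(pvalues)
    # Visit p-values from largest to smallest so a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = n - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values for four probe selection regions.
adjusted = bh_adjust([0.01, 0.04, 0.03, 0.20])
```

If a different method was used in practice (for example Bonferroni or Holm), the adjusted values would be more conservative than those produced here.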
For any given transcript with two or more transcript-specific PSRs significantly differentially expressed the one with lowest
A
Biochemical recurrence endpoint is used as defined by the “BCR Event” column of the supplementary material provided by Taylor and colleagues [
The list of differentially expressed genes was queried for previously reported association with prostate cancer by two means: (i) using E-utils PubMed Search, where a gene is considered associated with prostate cancer if a PubMed search for the official gene symbol or any of its aliases, together with the phrase “prostate cancer” in the title or abstract, returns one or more hits, and (ii) using a previously reported set of genes known to be differentially expressed in prostate cancer [
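A PubMed query of the kind described in (i) might be assembled as in the sketch below. The exact search string used in the study is not given, so the query shape, the helper names, and the example alias are assumptions; the code only builds the request URL for the NCBI E-utilities ESearch endpoint and does not contact the service.

```python
from typing import Iterable
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(symbol: str, aliases: Iterable[str] = ()) -> str:
    """Combine a gene symbol and its aliases with the phrase 'prostate cancer'."""
    names = [symbol, *aliases]
    gene_clause = " OR ".join(f'"{name}"' for name in names)
    # Assumed query shape: restrict the disease phrase to title or abstract.
    return f'({gene_clause}) AND "prostate cancer"[Title/Abstract]'


def build_esearch_url(symbol: str, aliases: Iterable[str] = ()) -> str:
    """Return an ESearch URL for PubMed; no network request is made here."""
    params = {
        "db": "pubmed",
        "term": build_pubmed_query(symbol, aliases),
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Example gene and alias, chosen only to illustrate the query format.
query = build_pubmed_query("ABCC4", ["MRP4"])
url = build_esearch_url("ABCC4", ["MRP4"])
```

Under the criterion in the text, a gene would count as previously associated with prostate cancer when such a search returns at least one record.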
Additionally, evidence for androgen regulation was pursued using the Androgen Responsive Gene Database, ARGDB [
High-density Affymetrix human exon (“HuEx”) microarrays provide a unique platform to test the differential expression of the vast majority of exonic regions in the genome. Based on Ensembl v62 and xmapcore [
In this study, we use the publicly available HuEx data set generated as part of the MSKCC Prostate Oncogenome Project [
Assessment of the defined set of TS-PSRs yielded 881 transcripts differentially expressed in at least one pairwise comparison of normal adjacent, primary tumor, and metastatic samples (see Section
Venn diagram distribution of differentially expressed transcripts across pairwise comparisons. N versus P: normal adjacent versus primary tumor comparison. P versus M: primary tumor versus metastatic sample comparison. N versus M: normal adjacent versus metastatic sample comparison.
Interestingly, 371 (42%) of the differentially expressed transcripts are non-coding. Inspection of their annotation reveals that they fall into several non-coding categories, the most frequent being “retained_intron” (
In addition to the ncRNA genes, various coding genes present one or more non-coding transcripts differentially expressed. Many of these genes have been shown to be involved in prostate cancer and present evidence of androgen regulation (Table
Androgen-regulated genes known to play a role in prostate cancer with non-coding transcripts differentially expressed. All these genes present evidence of being androgen sensitive, based on ARGDB.
Gene | Transcript | Comparison |
---|---|---|
ABCC4 | ABCC4-002 | NvsP |
ABCC4 | ABCC4-004 | NvsP, PvsM |
ACADL | ACADL-001* | PvsM, NvsM |
ACADL | ACADL-004 | PvsM |
ACPP | ACPP-001* | PvsM |
ACPP | ACPP-005 | PvsM, NvsM |
ADAMTS1 | ADAMTS1-001* | NvsM |
ADAMTS1 | ADAMTS1-002 | NvsM |
ADAMTS1 | ADAMTS1-003* | NvsM |
ANO7 | ANO7-006 | PvsM, NvsM |
ANO7 | ANO7-007 | PvsM, NvsM |
ANXA1 | ANXA1-001 | PvsM, NvsM |
ANXA1 | ANXA1-005 | PvsM, NvsM |
AR | AR-001* | NvsM |
AR | AR-005 | PvsM, NvsM |
AR | AR-203* | NvsM |
BNC2 | BNC2-001 | NvsM |
BTG3 | BTG3-005 | PvsM, NvsM |
BTG3 | BTG3-006 | NvsM |
CACNA1C | CACNA1C-016 | NvsM |
CACNA1C | CACNA1C-018* | NvsM |
CACNA1C | CACNA1C-201* | NvsM |
CACNA1D | CACNA1D-004* | NvsM |
CACNA1D | CACNA1D-006 | PvsM, NvsM |
CACNA1D | CACNA1D-007* | PvsM, NvsM |
CACNA1D | CACNA1D-201* | PvsM, NvsM |
CALD1 | CALD1-005* | PvsM, NvsM |
CALD1 | CALD1-008 | PvsM, NvsM |
CALD1 | CALD1-012* | PvsM |
CD40 | CD40-005 | NvsP, NvsM |
CD40 | CD40-201* | NvsM |
CD44 | CD44-014 | NvsM |
CEACAM1 | CEACAM1-004* | PvsM, NvsM |
CEACAM1 | CEACAM1-010 | PvsM, NvsM |
COL1A2 | COL1A2-002 | NvsM |
COL1A2 | COL1A2-005 | NvsM |
COL1A2 | COL1A2-006 | NvsM |
COL1A2 | COL1A2-012 | NvsM |
DPP4 | DPP4-001* | PvsM, NvsM |
DPP4 | DPP4-006 | PvsM, NvsM |
DST | DST-006 | NvsM |
DST | DST-010* | PvsM, NvsM |
DST | DST-015* | PvsM, NvsM |
DST | DST-032 | NvsM |
FBLN1 | FBLN1-001* | PvsM, NvsM |
FBLN1 | FBLN1-016 | NvsM |
FGFR1 | FGFR1-005 | NvsM |
FGFR2 | FGFR2-008 | NvsP, PvsM, NvsM |
FGFR2 | FGFR2-016* | NvsP, PvsM, NvsM |
FGFR2 | FGFR2-201* | PvsM, NvsM |
GOLM1 | GOLM1-008 | NvsP |
GSN | GSN-011 | PvsM, NvsM |
HSPA8 | HSPA8-008 | PvsM |
HSPA8 | HSPA8-013 | PvsM, NvsM |
HSPA8 | HSPA8-025 | PvsM |
IFI16 | IFI16-003* | PvsM, NvsM |
IFI16 | IFI16-008 | PvsM, NvsM |
INSIG1 | INSIG1-004 | NvsM |
IRS1 | IRS1-001* | PvsM, NvsM |
IRS1 | IRS1-002 | NvsM |
KHDRBS3 | KHDRBS3-003 | PvsM |
MAT2A | MAT2A-012 | NvsP |
MME | MME-001* | PvsM, NvsM |
MME | MME-003* | NvsM |
MME | MME-010* | NvsM |
MME | MME-011* | NvsM |
MME | MME-013 | NvsM |
NAMPT | NAMPT-006 | NvsP, PvsM |
NAMPT | NAMPT-007 | NvsP, PvsM |
NAMPT | NAMPT-008 | PvsM |
NAMPT | NAMPT-009 | PvsM |
NCAPD3 | NCAPD3-004 | PvsM |
NCAPD3 | NCAPD3-006* | PvsM, NvsM |
NCAPD3 | NCAPD3-011 | PvsM |
NCAPD3 | NCAPD3-015 | PvsM, NvsM |
NCAPD3 | NCAPD3-016 | PvsM |
PALLD | PALLD-015 | PvsM, NvsM |
PART1 | PART1-001 | PvsM |
PBX1 | PBX1-003 | NvsM |
PDE4B | PDE4B-008* | PvsM |
PDE4B | PDE4B-016 | PvsM |
PDE4D | PDE4D-005 | PvsM |
PDE4D | PDE4D-013 | PvsM, NvsM |
PDE4D | PDE4D-016* | PvsM, NvsM |
PDE4D | PDE4D-020* | PvsM, NvsM |
PDE4D | PDE4D-021* | PvsM |
PDE4D | PDE4D-022 | NvsM |
PDE4D | PDE4D-026 | NvsM |
PDLIM5 | PDLIM5-010* | PvsM |
PDLIM5 | PDLIM5-017 | PvsM, NvsM |
PIK3R1 | PIK3R1-008 | NvsM |
PPP2CB | PPP2CB-003 | NvsM |
RAN | RAN-006 | PvsM, NvsM |
SEMA3C | SEMA3C-001* | PvsM, NvsM |
SEMA3C | SEMA3C-008 | PvsM, NvsM |
SVIL | SVIL-004 | PvsM, NvsM |
TBC1D1 | TBC1D1-005 | NvsM |
TBC1D1 | TBC1D1-010 | NvsM |
TBC1D1 | TBC1D1-013 | NvsM |
TGM4 | TGM4-001* | NvsP, PvsM, NvsM |
TGM4 | TGM4-008 | NvsP, NvsM |
THBS1 | THBS1-001* | PvsM |
THBS1 | THBS1-004 | PvsM |
THBS1 | THBS1-008 | PvsM |
TNC | TNC-002 | PvsM, NvsM |
TNC | TNC-010* | PvsM, NvsM |
TPM2 | TPM2-002* | PvsM, NvsM |
TPM2 | TPM2-003 | PvsM, NvsM |
TPM2 | TPM2-005* | PvsM, NvsM |
TSC22D1 | TSC22D1-004 | PvsM |
VCL | VCL-005 | PvsM, NvsM |
VEGFA | VEGFA-005 | NvsM |
VEGFA | VEGFA-007 | NvsM |
WSB1 | WSB1-003 | PvsM |
XBP1 | XBP1-005 | PvsM, NvsM |
XRCC2 | XRCC2-002 | PvsM, NvsM |
*indicates a protein-coding transcript. NvsP: normal adjacent versus primary tumor comparison. PvsM: primary tumor versus metastatic sample comparison. NvsM: normal adjacent versus metastatic sample comparison.
The set of non-coding transcripts in both coding and non-coding genes reported here adds to the current stream of evidence showing that non-coding RNA molecules may play a significant role in cancer progression [
Overall, of the 680 genes with one or more transcripts found differentially expressed, 281 have a previously reported association with prostate cancer (see Section
The majority of the 881 differentially expressed transcripts originate from the comparison between normal adjacent and metastatic samples, in agreement with previous analyses of differential expression in the MSKCC dataset [
Transcripts found differentially expressed across all pairwise comparisons (top) and across both the normal versus primary tumor and the primary tumor versus metastatic sample comparisons (bottom).
Transcript | Mean fold difference, P versus N | Mean fold difference, M versus P | Mean fold difference, M versus N |
---|---|---|---|
ACOT11-001 | 0.79 | 0.77 | 0.61 |
AOX1-001 | 0.79 | 0.56 | 0.44 |
C19orf46-002 | 1.24 | 1.23 | 1.53 |
C8orf84-001 | 0.76 | 0.75 | 0.57 |
COCH-202 | 0.76 | 0.83 | 0.63 |
CTA-55I10.1-001 | 0.83 | 0.68 | 0.56 |
DMD-024 | 0.74 | 0.82 | 0.60 |
FGF10-002 | 0.83 | 0.64 | 0.53 |
FGFR2-008 | 0.76 | 0.79 | 0.60 |
FGFR2-016 | 0.74 | 0.67 | 0.49 |
GABRE-006 | 0.79 | 0.83 | 0.66 |
GNAL-001 | 0.82 | 0.69 | 0.57 |
GNAO1-002 | 0.78 | 0.75 | 0.58 |
HEATR8-006 | 0.80 | 0.80 | 0.64 |
ISL1-002 | 0.80 | 0.81 | 0.65 |
NR2F2-202 | 0.82 | 0.82 | 0.68 |
PCP4-004 | 0.81 | 0.72 | 0.58 |
PDE5A-005 | 0.74 | 0.79 | 0.59 |
PDZRN4-202 | 0.80 | 0.71 | 0.57 |
RSRC2-017 | 1.27 | 1.28 | 1.63 |
TGM4-001 | 0.68 | 0.62 | 0.42 |
TSPAN2-001 | 0.80 | 0.77 | 0.61 |
| |||
ABCC4-004 | 1.35 | 0.81 | N.A. |
ALK-001 | 1.24 | 0.83 | N.A. |
ATP1A1-002 | 1.23 | 0.71 | N.A. |
NAMPT-006 | 1.34 | 0.73 | N.A. |
NAMPT-007 | 1.75 | 0.57 | N.A. |
RP11-627G23.1-004 | 1.38 | 0.78 | N.A. |
N: Normal, P: Primary and M: Metastatic samples. N.A.: not applicable.
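As a quick consistency check on the table above, fold differences that are ratios of group means should compose multiplicatively, so that (P versus N) × (M versus P) is approximately (M versus N). How the mean fold difference was computed is not stated in the text, so this is only a sketch under that assumption; the helper name `implied_m_vs_n` is hypothetical, and the values are copied from four rows of the table.

```python
# (P versus N, M versus P, M versus N) as reported for four transcripts.
rows = {
    "ACOT11-001": (0.79, 0.77, 0.61),
    "AOX1-001": (0.79, 0.56, 0.44),
    "C19orf46-002": (1.24, 1.23, 1.53),
    "TGM4-001": (0.68, 0.62, 0.42),
}


def implied_m_vs_n(p_vs_n: float, m_vs_p: float) -> float:
    """M versus N fold difference implied by chaining the two stepwise ratios."""
    return p_vs_n * m_vs_p


# Largest absolute gap between the implied and the reported M versus N value.
max_gap = max(
    abs(implied_m_vs_n(p_vs_n, m_vs_p) - m_vs_n)
    for p_vs_n, m_vs_p, m_vs_n in rows.values()
)
```

For these rows the implied and reported values agree to within rounding of the two-decimal entries, and values below 1 in every column indicate expression that decreases at each step from normal to primary to metastatic tissue.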
For FGFR2 and NAMPT, the transcripts happen to be differentially expressed in the same direction as the tumor progresses, suggesting that both transcripts are functioning in a cooperative manner. In order to determine if this is a general pattern of the transcripts analyzed here, all the genes for which at least two transcripts presented differential expression were inspected (Figure
Heat map of genes with two or more transcripts differentially expressed across any pairwise comparison. Transcript names are provided as annotated in Ensembl. Heat map is colored according to median expression values for normal, primary and metastatic samples. “*” indicates that the transcript is protein-coding. Background indicates the expression value considered as background level based on control probe sets on the HuEx array.
A particularly interesting group of genes for detection of differential expression is the one for which all annotated transcripts for a given gene can be tested individually (Supplementary Figure 2). Of the 7,867 genes for which one or more transcripts were assessed in this analysis, 1,041 genes are such that all of their transcripts have at least one TS-PSR (Supplementary Table 2). Of these, 92 genes have at least one of their transcripts differentially expressed in any pairwise comparison between normal adjacent, primary tumor, and metastatic samples. As depicted in Figure
Heat map of genes for which all transcripts were assessed with one or more transcripts differentially expressed across any pairwise comparison. Transcript names are provided as annotated in Ensembl. Gene names are annotated based on their gene symbol. Heat map is colored according to median expression values for normal, primary and metastatic samples. “*” indicates that the transcript is protein-coding. “+” indicates significant differential expression of a given transcript or gene. Background indicates the expression value considered as background level based on control probe sets on the HuEx array.
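Selecting the genes for which every annotated transcript can be tested individually amounts to a subset check between each gene's transcripts and the set of transcripts that have at least one TS-PSR. The sketch below illustrates this with made-up identifiers; the function name and the toy annotation are hypothetical and do not reproduce the Ensembl data used in the study.

```python
from typing import Dict, List, Set


def fully_covered_genes(
    gene_transcripts: Dict[str, List[str]], covered: Set[str]
) -> List[str]:
    """Genes whose annotated transcripts all have at least one TS-PSR."""
    return sorted(
        gene
        for gene, transcripts in gene_transcripts.items()
        if transcripts and set(transcripts) <= covered
    )


# Toy annotation: GENE_A is fully covered, GENE_B is only partially covered.
annotation = {
    "GENE_A": ["GENE_A-001", "GENE_A-002"],
    "GENE_B": ["GENE_B-001", "GENE_B-002", "GENE_B-003"],
}
covered = {"GENE_A-001", "GENE_A-002", "GENE_B-001"}
testable = fully_covered_genes(annotation, covered)
```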
In addition to the expression profile of each transcript for these 92 genes, Figure
In order to assess the prognostic significance of the differentially expressed transcripts, the corresponding TS-PSRs were used to train a K-nearest neighbor (KNN) classifier on normal adjacent and metastatic samples. This KNN classifier was subsequently validated on the primary tumor subset, such that each primary tumor sample was classified as “normal-like” or “metastatic-like” based on its distance to the normal and metastatic groups. As shown in Figure
Multivariable logistic regression analysis of transcripts and genes for prediction of BCR progression adjusted for Kattan nomogram.
Classifier | Transcripts: OR | Transcripts: OR CI (95%) | Transcripts: P value | Genes: OR | Genes: OR CI (95%) | Genes: P value |
---|---|---|---|---|---|---|
KNN positive** | 13 | [2.5–99] | <0.005 | 3.8 | [1.0–14.3] | 0.05 |
Nomogram* | 6.6 | [2.3–20] | <0.001 | 7.9 | [2.9–22.6] | <0.0001 |
**: metastatic-like. *: greater than 50% probability of BCR used as cut-off. OR: odds ratio. CI: confidence interval.
Kaplan-Meier plots of primary tumor samples classified by KNN (“normal-like” versus “metastatic-like”) using the BCR endpoint. (a) Transcripts, (b) Kattan nomogram, and (c) genes. The blue curve indicates “metastatic-like” patients; the green curve indicates “normal-like” patients. For the nomogram, a probability of BCR greater than 50% was used as the cut-off to classify patients as “metastatic-like” rather than “normal-like.”
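The classification scheme behind these plots, training on normal adjacent and metastatic samples and then labeling each primary tumor by its nearest neighbors, can be sketched as follows. The number of neighbors, the Euclidean distance, and the two-feature toy expression values are assumptions made for illustration; the text does not specify them.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Labeled = Tuple[Sequence[float], str]


def knn_classify(train: List[Labeled], sample: Sequence[float], k: int = 3) -> str:
    """Label a sample by majority vote among its k nearest training samples."""
    distances = sorted(
        (math.dist(features, sample), label) for features, label in train
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy training set: normal adjacent and metastatic samples (made-up values).
train = [
    ([0.0, 0.1], "normal-like"),
    ([0.2, 0.0], "normal-like"),
    ([0.1, 0.3], "normal-like"),
    ([5.0, 5.1], "metastatic-like"),
    ([4.8, 5.2], "metastatic-like"),
    ([5.3, 4.9], "metastatic-like"),
]

# Primary tumors are never used for training, only classified.
primary = {"p1": [0.3, 0.2], "p2": [4.9, 5.0]}
calls = {sample_id: knn_classify(train, x) for sample_id, x in primary.items()}
```

Because the primary tumors play no part in training, the resulting “normal-like” and “metastatic-like” calls can then be compared against the BCR endpoint without reusing the outcome data.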
Transcriptome-wide detection of molecular markers for the development of better diagnostics and personalized medicine approaches has been facilitated by high-throughput technologies such as microarrays and, more recently, next-generation sequencing. Additionally, appreciation of the fact that most of the transcriptome is non-coding in both normal and cancer tissues [
Gene expression profiling efforts in prostate cancer have not yet become mainstream. The argument against the use of gene-based biomarker signatures is that, despite numerous efforts, none have been shown to perform significantly better than established clinical variables and predictive models such as nomograms. Here, we demonstrate that improved predictive models can be obtained in prostate cancer by leveraging the complexity of transcript-specific isoforms.
In this study, we show that HuEx arrays are populated with thousands of probe selection regions (PSRs) that hybridize to a specific transcript (TS-PSRs). Given their unambiguous nature, these TS-PSRs become a useful and reliable tool to test the differential expression of individual transcripts across benign and cancerous tissues. Still, some of these TS-PSRs could be hybridizing to more than one transcript if additional transcripts of a given gene exist but have not been discovered yet and, hence, are missing from the genomic annotation. Even though we focus our analysis on a subset of TS-PSRs that correspond to 22,517 transcripts (from 7,867 genes) and that shed light on the behaviour of two or more transcripts within the same gene, the same approach can be generalized to 49,302 transcripts corresponding to 34,599 genes. The 881 transcripts found differentially expressed across normal adjacent, primary tumor, and metastatic prostate samples from the MSKCC Oncogenome Project [
Beyond identifying genes with multiple differentially expressed transcripts and genes for which every individual transcript was probed, this study demonstrates that transcript-specific differential expression provides a clinically significant and unique source of biomarker signatures for prostate cancer risk stratification. We demonstrate that these biomarker signatures segregate patients into groups with significant differences in BCR-free survival and are significant prognostic factors for BCR prediction in multivariable analysis after adjusting for established prognostic factors such as the Kattan nomogram [
More datasets with associated clinical outcomes are needed to further validate these findings. However, the results presented here serve as a catalog of differentially expressed transcript-specific markers throughout the prostate cancer progression continuum that can be used as a basis for further exploration of disease biology and translation into clinical practice as novel diagnostics and therapeutics.
N. Erho, C. Buerki, T. J. Triche, E. Davicioni, and I. A. Vergara are employees of GenomeDx Biosciences Inc.
The authors would like to thank all employees of GenomeDx Biosciences Inc. for valuable input regarding this project. This paper was supported in part by the National Research Council of Canada Industrial Research Assistance Program.