RNA-seq Reveals the Overexpression of IGSF9 in Endometrial Cancer

We performed RNA-seq on an Illumina platform for 7 patients with endometrioid endometrial carcinoma for which both tumor tissue and adjacent noncancer tissue were available. A total of 66 genes were differentially expressed with significance level at adjusted p value < 0.01. Using the gene functional classification tool in the NIH DAVID bioinformatics resource, 5 genes were found to be the only enriched group out of that list of genes. The gene IGSF9 was chosen for further characterization with immunohistochemical staining of a larger cohort of human endometrioid carcinoma tissues. The expression level of IGSF9 in cancer cells was significantly higher than that in control glandular cells in paired tissue samples from the same patients (p = 0.008) or in overall comparison between cancer and the control (p = 0.003). IGSF9 expression is higher in patients with myometrium invasion relative to those without invasion (p = 0.015). Reanalysis of RNA-seq dataset from The Cancer Genome Atlas shows higher expression of IGSF9 in endometrial cancer versus normal control and expression was associated with poor prognosis. These results suggest IGSF9 as a new biomarker in endometrial cancer and warrant further studies on its function, mechanism of action, and potential clinical utility.


Introduction
Precision medicine calls for better characterization of diseases at molecular level. As the cost of next-generation sequencing (NGS) technology becomes increasingly affordable, more and more efforts have been invested on genome wide observation of genetic changes and gene expression profiles in cancer. NGS-based RNA-sequencing (RNA-seq) is an established and effective screening method in identifying new biomarkers and better understanding cancer biology [1]. Nationwide and international consortia like The Cancer Genome Atlas (TCGA) and the International Cancer Genomics Consortium (ICGC) have generated tremendous amounts of new data and new insights regarding the molecular landscape of many major types of cancer using various high throughput profiling technologies including RNA-seq [2][3][4].
Endometrial cancer of the uterus is responsible for about 74,000 deaths of women each year worldwide [5,6]. It is the most common gynecological malignancy in America and other western countries. Among the histologic subtypes, endometrioid carcinoma is the most prevalent, accounting for 75%-85% of all patients. Other types, like serous, mucinous, clear cell and squamous cell carcinomas, are much less common. According to the data from SEER (Surveillance, Epidemiology, and End Results) program at the National Cancer Institute (https://seer.cancer.gov/), although the current survival rate of women with endometrial cancer 2 Journal of Oncology is as high as 81%, annual deaths from endometrial cancer still numbered over 10,000 patients in the United States alone. Furthermore, for over three decades, the age-adjusted death rate of endometrial cancer in the general population has not decreased. Thus, it is imperative to better characterize the molecular details of endometrial cancer using the latest high throughput profiling technology to identify new biomarkers that may enhance our understanding of disease progression and may impact clinical management of this cancer.
The largest effort in interrogating the transcriptome of endometrial cancer is from TCGA [7]; however both the initial comprehensive report from TCGA on endometrial cancer and the follow-up report by others who focused on the RNA-seq data of endometrial cancer in TCGA did not provide transcriptome-wide information on the differentially expressed genes in the scenario of endometrial cancer versus normal endometrial tissue [7,8]. Another effort by Xiong et al. examined the transcriptome profiles of endometrial cancer versus adjacent noncancer tissue, but with 3 patients only [9]. In the current study, we screened paired human endometrioid carcinoma tissues and the adjacent nontumor tissues from 7 patients using RNAseq technology and compared their gene expression profiles. Further validation was performed by quantitative PCR and immunohistochemistry. Focusing on the overexpressed gene IGSF9, we explored its associations with clinicopathologic characteristics in endometrioid endometrial carcinoma.

Human Tissue Specimens.
Two different approaches were used in human tissue specimen procurement. For RNA-seq, freshly harvested, RNAlater-preserved cancer and adjacent noncancer tissues were obtained from 7 patients through frozen section rooms at The Medical Foundation (South Bend, IN). For the follow-up immunohistochemistry study, formalin-fixed and paraffin-embedded (FFPE) blocks from a total of 56 patients were obtained from The Medical Foundation surgical archive. All specimens were collected during the period from 2013 to 2015 via the Harper Cancer Research Institute Tissue Biorepository project. All collection of human tissues from surgically removed uterine endometrial cancer specimens were conducted by The Medical Foundation pathologists. Pathologic diagnoses and classification were initially made by the pathologist on-duty and further reviewed by board-certified pathologists. Informed consent was obtained from all patients. Protocols were approved by the University of Notre Dame Institutional Review Board (IRB) and local hospital IRBs.

RNA-Sequencing and Data
Processing. For RNA preparation, ∼30 mg tissue from each specimen was homogenized using an Omni-TH tissue homogenizer (OMNI International, Kennesaw, GA) in prechilled lysis buffer from the AllPrep RNA/DNA/protein mini kit (Qiagen, Germantown, MD) according to the manufacturer's instructions. RNA elution was then quantified with a Nanodrop spectrophotometer (Thermo Fisher) and further analyzed by BioAnalyzer (Agilent) for quality control. Samples with RIN > 7 were used in RNA-seq library preparation by the Notre Dame Genomics Core Facility. RNA-seq libraries were prepared with nonstranded TruSeq RNA kit from Illumina (San Diego, CA) and sequenced on a HiSeq 4000 sequencer (BGI Hong Kong). The reads generated were paired-end and of length 100-nt. Raw reads were cleaned up by BGI with SOAPnuke to remove adapters and low-quality reads. All other command line software tools were installed and used on the high performance computer cluster at University of Notre Dame Center for Research Computing. FastQC (version 0.11.2) was used to examine the read quality. No further read trimming or filtering was performed. Splicing aware mapping tool, STAR (version 2.4.2a) was used for alignment in 2-pass mode. The resulting BAM files were examined by Qualimap (version 2.2) and then used to count reads per gene with the htseq-count function from HTSeq software (version 0.6.1).
R/Bioconductor packages including DESeq2 were used for gene expression analysis [10][11][12]. Dealing with paired samples, we used a multifactor design which includes the sample information by putting the condition of interest (cancer or noncancer) at the end of the design. When the data was treated as from two independent groups, only single factor design was used. For comparison, we also downloaded and reanalyzed the raw RNA-seq data published by Xiong et al. [9] from SRA (NIH Sequence Read Archive) with our own data processing protocol as mentioned above. To identify the most overrepresented biological terms and genes, we made use of the gene classification tool from DAVID Bioinformatics Resources 6.8 [13] (https://david.ncifcrf.gov). High classification stringency was used.
To access tumor subgroup gene expression and survival data in The Cancer Genome Atlas (TCGA) project, we used the web interface of UALCAN [14] and part of the processed data from UALCAN. UALCAN uses TCGA level 3 RNA-seq and clinical data from 33 cancer types. The expression level was normalized as transcripts per million reads (TMP) for comparison across groups and individuals.

Quantitative PCR.
Taqman5 gene expression assays were used in quantitative PCR. The primers and probe set for the IGSF9 gene and reference control gene ACTB were bought from Life Technologies (Carlsbad, CA). RNA sample preparation and cDNA synthesis used TRIzol reagent and SuperScript II reagent, respectively, from Life Technologies. The q-PCR reaction was run on an ABI StepOnePlus real time thermal cycler (Life Technologies). Quantification was calculated with the ΔΔCt method using control samples and the expression of the reference gene ACTB for normalization [15].

Image Analysis and Quantification.
All immunohistochemistry slides were scanned and analyzed with the digital pathology system, including Aperio ScanScope CS, eSlide manager, and image analysis tools, made by Leica Biosystem (Chicago, Illinois). Quantification of IGSF9 used a macro built from the pixel counting algorithm (version 1), which reports the percentage of total number of positively stained pixels over total number of pixels analyzed. Cellular quantification nuclear stain algorithm (version 9) was used to make a macro to count positive cell percentage of total number of tumor cells analyzed in Ki67 immunostains.
2.6. Statistical Analysis. Rstudio and related R packages were used to perform all the statistical analyses involving RNAseq data downstream processing, clinical pathological data, and IHC results. Significance level was set at value < 0.05, while in screening the long gene lists from RNA-seq, adjusted values were described in the results. Principle component analysis was used in analyzing the variance in overall RNAseq counts. Wald test was used when comparing differential gene expression as implemented in DESeq2 package. When comparing means of two groups in clinical and IHC, nonparametric method Wilcoxon rank sum test (also known as Mann-Whitney test) was used. Correlation tests used Pearson's method as implemented in base R and the psych package. Survival analysis used Kaplan-Meier method and log-rank test.

Gene Expression Profiles of Endometrioid Endometrial
Carcinoma and Benign Adjacent Tissues via RNA-seq. RNAseq was performed for a total of 7 patients with both cancer tissues and adjacent benign endometrial tissues available. Patient ages ranged from 48 to 78. All specimens were from the same major histological type, endometrioid endometrial carcinoma; two patients were at histological grade I and five at grade II, all at clinical stage I (FIGO, 2009). On average, 17.2 million reads were produced from each of the 14 libraries. Sequencing read quality was examined by FastQC, all with a Phred score above 20. Alignment or mapping quality metrics as produced by Qualimap over the output BAM files from alignment with STAR are listed in Suppl. Table 1.
While exploring the gene count data, plotting the top two components from principle component analysis did not reveal any clustering trend in the 7 cancer samples versus the 7 controls (Figure 1(a)). Using DESeq2 as the main tool, we sought to find the differential expression of genes in cancer versus the control with two different approaches: (1) using the pairing information in the samples, (2) without considering the pairing information.
With the first approach, when comparing cancer tissues with noncancer controls, a total of 665 genes (357 up and 308 down) were identified as differentially expressed at the statistical significance level of adjusted (for multiple comparison) value < 0.05. The entire list of differentially expressed genes is given in Suppl. Table 2. On the top of gene  list ranked by ascending adjusted value there are EGR1,  FOSB, PTPRT, SELE, FOS, ENPP2, TFPI, ZFP36, and others. Those with adjusted value < 0.01 and log2 fold change (L2FC) >2 were labeled in the volcano plot in Figure 1(b). TUBB3 is the most increased (L2FC = 2.177). With a cutoff at adjusted value < 0.01, we obtained a list of 66 genes ( Figure 2), which was used for analysis with NIH DAVID as described below.
Using the second method, the 14 samples were treated as two independent groups, cancer versus control, with pairing information ignored, then only 4 genes were found to be differentially expressed at adjusted value < 0.05, including IGSF9, c10orf35, and ZNF710 genes overexpressed and the HHATL gene downregulated in cancer versus control. The normalized counts of these 4 genes were plotted in Figures 1(c)-1(f) and the full list of genes is given in Suppl. Table 3. Reanalysis of the RNA-seq study by Xiong et al. on paired cancer and normal tissues from 3 endometrial cancer patients produced another list of differentially expressed genes (Suppl. Table 4) [9].
With the aforementioned list of top 66 genes that were taken from the differentially expressed genes cutoff at adjusted value < 0.01, we performed the analysis with the gene function classification tool from NIH DAVID [17], which has the capability of addressing the enriched and redundant relationships among many-genes-to-many-terms and can reduce the complexity of gene list. Out of the 66 genes, 5 genes, that is, VSIG2, PTPRT, PRTG, IGSF9, and PTPRF (Figure 3), were found to share functional similarity and form the only enriched cluster, with an enrichment sore of 0.96. Among them, IGSF9 is the one that increased most in terms of log2 folds. This result from the gene function classification assay with DAVID plus the short list of differentially expressed genes we obtained when treating the samples as 2 independent groups prompted us to further evaluate the expression of IGSF9 with immunohistochemistry on human endometrial cancer tissues.

IGSF9 Expression in Endometrial
Tissue and Endometrioid Carcinoma. The first study on IGSF9 expression was on the developing nervous system in mouse and human [18]; however to date there are no published data detailing IGSF9 expression in endometrial tissue. We first performed quantitative PCR to confirm the differential expression of IGSF9 in cancer versus noncancer endometrial tissues. With 3 paired RNA samples from the RNA-seq project, the relative expression in cancer tissue was found to be over 6 folds compared to adjacent tissue (Figure 4(a)). IGSF9 is also one of differentially expressed genes identified by our reanalysis of the RNA-seq data from Xiong et al. For side-by-side comparison, we plotted their normalized counts (Figure 4(b)).
As further characterization of IGSF9 expression in relation to cancer microscopic features requires immunohistochemistry, we evaluated IGSF9 expression at protein level

Correlation of IGSF9 Expression with Clinical Pathologic
Features. The association of IGSF9 expression with clinicopathologic characteristics of these patients was systematically examined in our collection of 56 patients with endometrial cancer, which were aged from 39 to 94 years, with a median of 64. All patients were diagnosed as endometrioid carcinoma, 28 at grade I, and 28 at grade II in histologic grading (    (Figure 6(a)). Comparison of the all the cancer tissues with the 8 control tissues showed a similar trend (Figure 6(b)) with statistical significance ( value = 0.002 by the Wilcoxon test). A trend was observed in which IGSF9 positivity is higher in histological grade II (G2) relative to grade I (G1) (Figure 6(c)), but the results are not statistically significant. No association was found in relation to clinical staging and lymphovascular invasion (Figures 6(d) and 6(e)). Interestingly, however, statistically significant difference was obtained when comparing the IGSF9 positivity between patients without any degree of myometrial invasion and those with myometrial invasion (0.151 versus 0.395; value = 0.013 by the Wilcoxon test) (Figure 6(f)).
The correlation of IGSF9 positivity with other continuous variables, such as age, tumor size in cm, depth of myometrium invasion as percentage of total myometrium thickness, and Ki67 labeling index were plotted in Figure 7. IGSF9 expression has somewhat of a correlation with Ki67 labeling index, but not statistically significant. However the positive correlation of tumor size with depth of myometrium invasion (coefficient = 0.412, value = 0.002) and that of Ki67 with depth of myometrium invasion (coefficient = 0.360, value = 0.006) were statistically significant.

IGSF9 Expression Status in RNA-seq Data from TCGA.
As a way of in silico validation, we surveyed the expression of IGSF9 in TCGA RNA-seq dataset via the web portal of UALCAN [14]. As of November 2017, normalized RNA expression data from 33 types of cancer are available, but not all have proper amount of normal controls for comparison. IGSF9 had higher expression in uterine corpus endometrial cancer (UCEC) tissues and 7 other types of cancers compared to their corresponding normal controls ( Table 2). In contrast, Journal of Oncology IGSF9 expression is lower in cancer versus normal control in rectum adenocarcinoma and colon adenocarcinoma ( Table 2).
UALCAN contains reprocessed RNA-seq data from 546 UCEC tissues and 35 normal endometrial tissues. Out of that, 23 patients provided both cancer and normal tissues. IGSF9 mRNA expression was found to be higher in both the comparison of all cancer versus normal control ( < 0.001 by Wilcoxon test) and comparison of paired cancer versus normal from the same patients ( < 0.001 by Wilcoxon test) (Figures 8(a) and 8(b)).
A total of 543 endometrial cancer patients with RNAseq and overall outcome data were used for Kaplan-Meier survival analysis. Overexpression of IGSF9 (top 25%) was associated with poor survival (Figure 8(c), log-rank test, = 0.017). A statistically significant association of IGSF9 overexpression and poor overall survival was also observed in thymoma, skin cutaneous melanoma, and brain lower grade glioma ( Table 2).

Discussion
To find differentially expressed genes between groups of specimens, RNA-seq is a very powerful tool as it is high throughput and reliable in quantification [10,20]. For gene expression profiling with RNA-seq, TCGA has data from the largest number of patients [2]. The reports and datasets from TCGA are a rich resource for evaluation of genetic variations or molecular markers of endometrial cancer. Based on their comprehensive findings on somatic copy number alterations and exome sequencing, a new molecular classification system was proposed for endometrial cancer [7]. However, limited evaluation of the mRNA expression data was provided [7,8]. By design, the raw RNA-seq data and clinicopathologic characteristics in all the TCGA projects are open to researchers who are interested in further analyses. Noticeably, many secondary online resources based on TCGA datasets (e.g., the UALCAN website) have recently appeared and greatly eased the use of these data [14,21]. The report by Dellinger and colleagues utilized TCGA data to show that L1CAM is an independent predictor of poor survival in endometrial cancer [22]. The lack of information from normal tissues or benign lesions in RNA-seq data from the initial TCGA endometrial cancer publication piqued our interest and others as well. Xiong and colleagues used RNA-seq to evaluate both mRNA and microRNA expression profiles in three patients with both cancer and noncancer tissues [9]. In addition to mRNA and microRNA expression profiles, their study also performed genetic variation calling from RNA-seq data, which is a feasible but less established approach [23].
In the current study, our RNA-seq with paired cancer and control tissues from 7 endometrioid carcinoma patients discovered a unique set of differentially expressed genes. For example, our reanalysis of Xiong's data identified WISP2 (CCN5) (log2 fold change −7.12; Suppl. Table 4) as a highly differentially expressed gene. It is reported to be a tumor suppressor; the deficiency of WISP2 promotes breast cancer growth [24] and WISP2 RNA expression was decreased in 79% of human colon cancers [25], but no studies of WISP2 in endometrial cancer have been reported. The TUBB3 gene is the most increased gene on our list of genes (Figure 1(b)) from our own patient materials. TUBB3 encodes class IIItubulin and its overexpression has been linked to resistance to paclitaxel and correlated with poor survival in ovarian, breast, gastric, and non-small-cell lung cancers and unknown primary tumors [26,27]. This finding was not confirmed in endometrial cancer and class III -tubulin expression was also not correlated with clinicopathologic characteristics [27,28].
Facing the long list of differentially expressed genes, tools for functional profiling are often used to extract biological meaning from such data. RNA-seq specific analysis packages like GOseq and SeqGSEA are available [29]. Our use of the gene functional classification tool in the NIH DAVID Journal of Oncology package and our top 66 genes provided only one cluster of genes, and treating our paired samples as two independent groups further narrowed the gene list. As IGSF9 is the only common gene from both analyses, we focused on IGSF9 and explored the association of IGSF9 protein expression with clinicopathological characteristics in endometrioid endometrial carcinoma.
IGSF9 was first cloned and characterized as a member of the immunoglobulin superfamily expressed in a wide variety of tissues in human and mouse [18]. Structurally, it contains five Ig domains, two fibronectin III domains, a single transmembrane domain, and an intracellular C-terminal and it was found to be related to dendrite arborization in the brain [30]. To date, there are no published reports on IGSF9 in any type of cancer tissue. Existing studies of IGSF9 are very limited (10 papers in PubMed as of January 2018). Our study is the first to investigate the association of its expression with cancer, but its expression and mutation profiles could be assessed via searching online databases, such as the Human Protein Atlas project (http://www.proteinatlas.org), GTEx (https://GTExPortal.org), and the Catalog of Somatic Mutations in Cancer (COSMIC, http://cancer.sanger.ac.uk/cosmic).
The expression pattern as revealed by our immunohistochemical analyses in endometrioid carcinoma confirms the localization of IGSF9 on the cellular membrane and in the cytoplasm. We confirmed the differential expression of IGSF9 in cancer versus noncancer tissues with materials from a cohort of 56 patients. The expression of IGSF9 is higher in cancer than that in normal or benign endometrial tissue. The positivity of IGSF9 is also higher in cancer with myometrium invasion than those without, likely an indicator of cancer aggressiveness. However, as these observations were based on the small and retrospective cohort of patients available, further studies on larger and prospective cohorts were needed. Analysis of the expression data from TCGA as we surveyed with UALCAN provided the necessary support. These data confirmed that IGSF9 is overexpressed in cancerous versus normal endometrial tissue as well as in 7 additional major cancer types as well. Interestingly we observed the opposite relationship in colorectal cancer, wherein IGSF9 was expressed at lower levels in cancer versus normal, suggesting a topic to be further studied.
A limitation of the current study is our lack of access to survival data for our local cohort of 56 patients. However, analysis of publicly available TCGA RNA-seq data via UALCAN provided further support for our initial finding and enabled survival analyses. Overexpression of IGSF9 is an indicator of poor prognosis in endometrial cancer. Moreover, Notes. (1) The number of normal versus tumor samples, except for LGG. (2) "Up" or "Down" indicates significant expression level changes, tumor versus normal; dash indicates no significant change. (3) Kaplan-Meier analysis; "Yes" indicates that higher expression of IGSF9 is related to poor overall survival.
IGSF9 overexpression is also associated with poor prognosis in several other cancer types, including brain lower grade glioma, skin cutaneous melanoma, and thymoma, suggesting that the significance of IGSF9 in cancer pathology is likely not limited to endometrial cancer.

Conclusions
We identified via RNA-seq a list of differentially expressed genes in endometrioid endometrial carcinomas versus noncancer controls. Focusing on IGSF9, an adhesion molecule that has not been characterized in connection with cancer pathology, we confirmed the overexpression of IGSF9 in endometrial cancer and found an association with myometrial invasion and poor outcome. These studies provide preliminary support for consideration of IGSF9 as a new biomarker in endometrial cancer with potential utility in diagnostics, prognostics, and therapeutics. Further studies on the role of IGSF9 in cancer pathology are warranted.

IGSF9:
Immunoglobulin Super Family 9 NGS: Next-generation sequencing RNA-seq: RNA-sequencing SEER: Surveillance, Epidemiology, and End Results ICGC: International Cancer Genomics Consortium TCGA: The Cancer Genome Atlas FFPE: Formalin-fixed and paraffin-embedded IHC: Immunohistochemistry Q-PCR: Quantitative Polymerase Chain Reaction COSMIC: Catalog of Somatic Mutations in Cancer.

Ethical Approval
Protocols were approved by the University of Notre Dame Institutional Review Board (IRB).

Consent
Informed consent was obtained from all patients.

Conflicts of Interest
The authors declare that they have no conflicts of interest.