Next-Generation Sequencing of MicroRNAs for Breast Cancer Detection

Different microRNA (miRNA) profiles have been reported in the blood of cancer patients. We investigated whether key serum miRNAs could discriminate patients with breast cancer from those without. The study was divided into three parts: (1) miRNA marker discovery using SOLiD sequencing-based miRNA profiling of cancerous and adjacent noncancerous breast tissue from one breast cancer patient; (2) marker selection and validation by real-time PCR on a small set of serum samples; (3) gene ontology analysis of the key miRNA target genes. In the genome-wide tissue miRNA expression analysis, five miRNAs were altered more than fivefold by SOLiD sequencing (miR-29a, miR-23a, miR-23b, miR-192, and miR-21). All five miRNAs were tested in serum from 20 breast cancer patients and 20 controls. miR-29a and miR-21 were significantly increased in the serum of breast cancer patients (P < .05). Gene ontology analysis of the target genes revealed enrichment for specific biological process categories, namely signal transduction, development, apoptosis, cell proliferation, and cell adhesion. SOLiD sequencing provides a promising method for cancer-related miRNA profiling. Serum miRNAs may be useful biomarkers for breast cancer detection.


Introduction
Recently, microRNAs (miRNAs), small noncoding RNAs of ∼22 nucleotides (nt) in length, have been implicated in several carcinogenic processes, acting either as tumor suppressors or as oncogenes. Studies have shown that miRNA expression profiles can classify human cancers [1][2][3][4][5][6][7]. Furthermore, some reports suggest that cell-free circulating miRNAs exist in serum and plasma [8][9][10][11][12][13][14]. While cancer-specific miRNAs are important for understanding the molecular basis of cancer, blood-based miRNA biomarkers could be more vital for early detection, diagnosis, and follow-up of cancer patients. Earlier work comparing circulating miRNA expression in diffuse large B-cell lymphoma (DLBCL) patients with that in healthy controls found that cancer can affect serum miRNA levels [8]. Later, Mitchell et al. screened multiple miRNA biomarker candidates (miR-100, miR-125b, miR-141, miR-143, miR-205, and miR-296) in serum collected from 25 individuals with metastatic prostate cancer and 25 healthy age-matched male controls and found that, in the sera of metastatic prostate cancer patients, miR-141 was very highly overexpressed and moderately correlated with prostate-specific antigen (PSA) level. The data showed that miR-141 levels could identify prostate cancer patients with high sensitivity and perfect accuracy [10]. Chen et al. used Solexa sequencing to profile serum miRNAs of patients with non-small cell lung carcinoma (NSCLC). Compared with healthy subjects, 28 miRNAs were missing and 63 new miRNAs were detected in the lung cancer patients. More specifically, this study determined that miR-25 and miR-223 were highly expressed in lung cancer sera [12]. Ng et al. investigated whether plasma miRNAs could discriminate patients with and without colorectal cancer (CRC).
Of the panel of 95 miRNAs analyzed by real-time PCR-based miRNA profiling, miR-92 differentiated CRC from gastric cancer, inflammatory bowel disease (IBD), and normal subjects, suggesting that miR-92 can be a potential noninvasive molecular marker for CRC screening [14]. These findings raise the possibility of using circulating miRNAs as novel blood-based molecular markers for cancer detection. In this study, we used genome-wide screening with Applied Biosystems' next-generation sequencing system, sequencing by oligonucleotide ligation and detection (SOLiD), to analyze differential genome-wide miRNA expression profiles in breast cancer. Our hypothesis is that some key miRNAs are detectable in both serum and breast cancer biopsies and may be useful for breast cancer detection.

Patient Samples.
In part one (miRNA screening), miRNA profiles were generated from cancerous and adjacent noncancerous breast tissue from one patient by SOLiD sequencing. Two differential miRNA expression patterns were established. By comparing the genome-wide miRNA expression from cancerous and adjacent noncancerous breast tissue, upregulated miRNAs in breast cancer tissue were identified for further analysis in part two.
In part two (selection and validation of key miRNAs), serum was collected from 20 breast cancer patients before surgery and from 20 age-matched controls. To be selected, a miRNA had to be significantly elevated in the serum of the 20 breast cancer patients.
Informed consent was obtained from participants for the use of blood and tissue samples in this study. This project was approved by Southeast University and Nanjing Medical University Clinical Research Ethics Committee, Nanjing, China. No patients received chemotherapy or radiotherapy before tissue and blood collection.

Sample Processing and RNA Extraction.
Tumor tissues were ground to powder in liquid nitrogen, and RNA ( 200 nt) was immediately extracted using the mirVana miRNA Isolation Kit (Ambion) following the manufacturer's instructions. Whole blood was centrifuged at 1600 rpm for 5 min, and the serum was transferred to new tubes and centrifuged again at 12000 rpm for 10 min to completely remove cell debris. Total RNA containing small RNA was extracted from 500 μl of serum using Trizol LS reagent (Invitrogen) and the miRNeasy Mini Kit (Qiagen) according to the modified method of Ng et al. [14].

MicroRNA Profiling by SOLiD Sequencing.
Libraries for SOLiD sequencing were prepared according to the manufacturer's protocol (Small RNA Expression Kit, Applied Biosystems). Briefly, miRNA samples (100 ng) were hybridized and ligated overnight with adapter mix, reverse transcribed, RNase H-treated, and PCR amplified. PCR products were cleaned up and size-selected (105-150 bp) on agarose gels. Template bead preparation, emulsion PCR, and deposition were performed using the SOLiD V2 sequencing system (Applied Biosystems) at the State Key Lab of Bioelectronics Laboratory, Southeast University of China.

Sequence Analysis.
SOLiD reads were mapped using the SOLiD system small RNA analysis pipeline tool (RNA2MAP). First, seed alignments with up to three mismatches within the maximum seed length (18 nt) were identified to locate the initial seed positions. Fewer than 6 mismatches were then allowed in full-length mapping. To increase signal quality, only alignments whose beads occurred at least 5 times in any of the libraries were conservatively retained. After the length distribution was calculated and rRNA, tRNA, snRNA, and snoRNA reads were filtered out, the remaining reads were mapped to miRBase (release 15.0 at http://microRNA.sanger.ac.uk/), and the total copy number of each sample was normalized to 100 000. The Mann-Whitney test was used to determine the statistical significance of differences in expression levels between breast cancer tissue and normal adjacent breast tissue. Fold change was calculated from the normalized counts. A candidate meeting the following criteria was taken as a putative miRNA: (1) fold change >5 and (2) at least 5 copies by SOLiD sequencing.
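The normalization and candidate-selection steps described above can be illustrated with a minimal Python sketch. It is not the RNA2MAP pipeline; the function names, the example miRNA labels and counts, the choice to apply the 5-copy rule to raw tumor counts, and the skipping of miRNAs absent from the normal library are all assumptions made for illustration.

```python
def normalize_counts(counts, target_total=100_000):
    """Scale raw read counts so that the library sums to target_total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no mapped reads")
    return {name: n * target_total / total for name, n in counts.items()}


def select_candidates(tumor_raw, normal_raw, min_fold=5.0, min_copies=5):
    """Return {miRNA: fold change} for miRNAs passing both criteria.

    Assumptions (not stated in the text): the copy-number rule is applied
    to raw tumor counts, and miRNAs with zero normalized count in the
    normal library are skipped because their fold change is undefined.
    """
    tumor = normalize_counts(tumor_raw)
    normal = normalize_counts(normal_raw)
    selected = {}
    for name, raw in tumor_raw.items():
        if raw < min_copies:
            continue
        baseline = normal.get(name, 0.0)
        if baseline == 0:
            continue
        fold = tumor[name] / baseline
        if fold > min_fold:
            selected[name] = fold
    return selected


# Hypothetical counts, for illustration only.
tumor_raw = {"miR-A": 600, "miR-B": 300, "miR-C": 100}
normal_raw = {"miR-A": 100, "miR-B": 500, "miR-C": 400}
candidates = select_candidates(tumor_raw, normal_raw)
print(candidates)
```

Normalizing both libraries to the same total before taking the ratio keeps the fold change from being driven by the large difference in sequencing depth between the two samples.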

MicroRNAs Quantification by Real-Time PCR.
MicroRNA quantification was performed by SYBR Green qRT-PCR assay. In brief, serum RNA containing miRNA was polyadenylated by poly(A) polymerase and reverse transcribed to cDNA using the miScript Reverse Transcription Kit (Qiagen) according to the manufacturer's instructions. Real-time qPCR was performed using the miScript SYBR Green PCR Kit (Qiagen) with the manufacturer-provided miScript universal primer and miRNA-specific forward primers on an ABI PRISM 7300 real-time PCR system (Applied Biosystems). The cycle threshold (Ct) is defined as the number of cycles required for the fluorescent signal to cross the threshold in qPCR. Expression was normalized to RNU6B: ΔCt = Ct(miRNA) − Ct(RNU6B), and ΔΔCt = ΔCt(case) − ΔCt(control). The fold change was calculated as 2^(−ΔΔCt) [15]. The forward primer for RNU6B was 5′-ACGCAAATTCGTGAAGCGTT-3′; the other primers were designed from the miRNA sequences in the miRBase database (http://microrna.sanger.ac.uk/). Each reaction was performed in a final volume of 20 μl containing 2 μl of cDNA, 10 μl of 2× SYBR Green PCR Master Mix, and 2 μl each of 10× miScript universal primer and 10× miScript primer assay. The amplification profile was denaturation at 95 °C for 15 min, followed by 40 cycles of 94 °C for 15 s, 55 °C for 30 s, and 70 °C for 34 s. Each sample was run in duplicate.
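As a worked illustration of the relative quantification formula, the following Python sketch computes ΔCt, ΔΔCt, and the fold change 2^(−ΔΔCt). The Ct values are invented, and averaging ΔCt within each group before taking the difference is an assumption; the text does not state how group values were combined.

```python
from statistics import mean


def delta_ct(ct_mirna, ct_reference):
    """dCt = Ct(miRNA) - Ct(reference), here with RNU6B as the reference."""
    return ct_mirna - ct_reference


def fold_change(case_pairs, control_pairs):
    """Fold change 2^(-ddCt) from (Ct_miRNA, Ct_RNU6B) pairs.

    Assumption: ddCt is the difference between the mean dCt of the cases
    and the mean dCt of the controls.
    """
    dct_case = mean(delta_ct(m, r) for m, r in case_pairs)
    dct_control = mean(delta_ct(m, r) for m, r in control_pairs)
    ddct = dct_case - dct_control
    return 2 ** (-ddct)


# Hypothetical Ct values, for illustration only.
cases = [(24.0, 20.0), (25.0, 21.0)]      # dCt = 4 for both samples
controls = [(27.0, 21.0), (26.0, 20.0)]   # dCt = 6 for both samples
fc = fold_change(cases, controls)
print(fc)  # ddCt = -2, so the fold change is 4.0
```

A lower ΔCt in cases than in controls gives a negative ΔΔCt and therefore a fold change above 1, which corresponds to higher expression in the cases.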

Target Prediction and GO Analysis.
Predicted targets of the miRNAs differentially expressed in this study were determined using miRBase Targets (http://www.mirbase.org/) and PicTar (http://pictar.mdc-berlin.de/). In addition, we used the Capital-Bio Molecule Annotation System V3.0 to perform gene ontology (GO) analysis on the target genes and to identify enriched biological process categories.
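The text does not state which statistical test the annotation system applies. One common way to score enrichment of a GO category among a set of target genes is the one-sided hypergeometric test, sketched below in Python under that assumption. The function name, parameter names, and gene counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, drawn, hits):
    """P(X >= hits) under the hypergeometric distribution.

    population: number of genes in the background set
    annotated:  background genes annotated to the GO category
    drawn:      number of predicted target genes
    hits:       target genes annotated to the GO category
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10 background genes, 4 in the category,
# 3 predicted targets of which 2 fall in the category.
p = hypergeom_enrichment_p(10, 4, 3, 2)
print(p)
```

A small P value indicates that the category contains more target genes than expected by chance; when many categories are tested, a multiple-testing correction would normally be applied as well.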

Statistical Analysis.
The significance of differences in serum miRNA levels was determined by the Mann-Whitney, Kruskal-Wallis, or χ² test, as appropriate. All P values were two-sided, and values less than .05 were considered statistically significant. All statistical calculations were performed with SPSS software (version 16.0) and GraphPad Prism 5 Demo software (GraphPad Software, San Diego, Calif, USA).
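For readers unfamiliar with the Mann-Whitney test, the following Python sketch computes the U statistic and a two-sided P value from the normal approximation with tie correction. It is an illustration only, not the SPSS or GraphPad Prism procedure used in the study; the function names and example values are assumptions, and statistical packages may instead report exact P values or apply a continuity correction for small samples.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, two-sided P) using the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    n = n1 + n2
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return u, 1.0
    z = (u - n1 * n2 / 2) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical relative expression values, for illustration only.
controls = [1.2, 1.5, 1.9, 2.3]
patients = [3.1, 3.4, 3.8, 4.0]
u_stat, p_value = mann_whitney_u(controls, patients)
print(u_stat, p_value)
```

Because the test works on ranks rather than raw values, it makes no assumption that expression levels are normally distributed, which suits small serum cohorts.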

Patients.
Twenty-one breast cancer patients (BC) and 20 controls (C) were included in the study. There was no significant difference in age between BC patients (53 ± 4.5 years) and controls (51 ± 6.9 years) (P = .87, ANOVA). The patient in part one had TNM stage II disease. Of the subjects in part two, 5 had stage I, 5 stage II, 5 stage III, and 5 stage IV disease. Tumors were staged according to the tumor-node-metastasis (TNM) classification for breast cancer, AJCC (American Joint Committee on Cancer), 7th edition.

SOLiD Sequencing of miRNAs from Cancerous and Adjacent Noncancerous Breast Tissue.
In this key miRNA screening step, SOLiD sequencing-based miRNA expression profiling was performed to identify differential expression patterns of miRNAs between cancerous and adjacent noncancerous breast tissue. In total, 560,054 effective reads were obtained from cancerous tissue and 3,300,900 effective reads from adjacent noncancerous tissue. After rRNA, tRNA, snRNA, and snoRNA reads were filtered out, the remaining effective reads were mapped to the miRNA Precursor Library-Human, yielding 41,225 and 203,889 reads from cancerous and adjacent noncancerous tissue, respectively. Among these reads, the most abundant size class was 22 nt in both cancerous and adjacent noncancerous tissue, accounting for 29.8% and 23.4% of reads, respectively (Figure 1). Compared with miRBase (15.0), 546 of the 940 known precursor miRNAs were identified in cancerous tissue and 364 in adjacent noncancerous tissue. To study the differential expression profile of genome-wide tissue miRNAs between cancerous and adjacent noncancerous breast tissue, the Mann-Whitney test and the fold change of sequenced miRNAs were used. With a 2-fold expression difference as the cutoff, 19 upregulated miRNAs were identified (Table 1). Among these 19 upregulated miRNAs, five were altered more than fivefold by SOLiD sequencing (miR-29a, miR-23a, miR-23b, miR-192, and miR-21) and were selected as candidates.

Validation of Key miRNAs in Serum of Breast Cancer Patients.
To validate the putative markers identified in part one, real-time PCR assays were developed to quantify miRNAs in serum. Using RNU6B as the normalization control, expression levels of the marker miRNAs (miR-29a, miR-23a, miR-23b, miR-192, and miR-21) were assessed by real-time PCR in 40 serum samples from 20 breast cancer patients and 20 controls. Our data indicated that miR-29a and miR-21 were significantly higher in the serum of breast cancer patients than in controls (P < .001, Mann-Whitney test, Figures 2(a) and 2(b)). Spearman rank correlation showed that serum levels of miR-29a and miR-21 were not correlated (R² = 0.298, P = .202, Figure 3). We also examined whether serum miR-29a and miR-21 levels were associated with cancer stage by stratifying the 20 breast cancer patients by TNM stage. The results indicated that miR-29a (P = .86) and miR-21 (P = .50) levels did not differ among stages (Kruskal-Wallis test, Figure 4), but statistically significant differences in miR-29a were obtained when each tumor stage was compared with the controls (stage I, P = .002; stage II, P = .004; stage III, P = .004; stage IV, P = .01, Mann-Whitney test).
Journal of Biomedicine and Biotechnology

Gene Ontology Analysis.
miR-29a and miR-21 each have a broad range of predicted target genes (Table 2). Target genes predicted by miRBase Targets for all assayed miRNAs were used as the reference. Furthermore, we performed GO analysis on the target genes of these miRNAs and found that specific biological process categories were enriched, such as signal transduction, development, apoptosis, cell proliferation, and cell adhesion (Table 3).

Discussion
This is the first report of a genome-wide miRNA expression profile of a breast cancer patient by SOLiD sequencing. We found five miRNAs altered more than fivefold by SOLiD sequencing, and two of them, miR-29a and miR-21, were significantly increased in the serum of breast cancer patients. These key miRNAs, if validated in a large set of serum samples, may serve as noninvasive markers for breast cancer detection. In addition, miRNAs are very stable in plasma and serum, where they are protected from RNases and remain stable even under harsh conditions [10,12]. This stability makes miRNA expression well suited to testing in such samples. Because blood samples are simple to obtain reproducibly, these noninvasive and easily detectable biomarkers may have great potential in cancer therapy and prognosis. A full understanding of miRNA expression profiles as potential biomarkers for diagnosis and prognosis is necessary. Of the two key miRNAs, miR-21 has been extensively evaluated in the literature. miR-21 is overexpressed in a wide variety of cancers and has been causally linked to cellular proliferation, apoptosis, and migration. Si et al. [16] transfected breast cancer MCF-7 cells with anti-miR-21 oligonucleotides and found that anti-miR-21 suppressed both cell growth in vitro and tumor growth in the xenograft mouse model. This suppression was associated with increased apoptosis and decreased cell proliferation, which could be owing in part to downregulation of the antiapoptotic Bcl-2. Zhu et al. [17] found that suppression of miR-21 in metastatic breast cancer MDA-MB-231 cells significantly reduced invasion and lung metastasis. Several potential target genes directly regulated by miR-21 have been identified in breast cancer cells, including the tumor suppressor tropomyosin 1 (TPM1) [18] and programmed cell death 4 (PDCD4) [19,20].
As for miR-29a, its overexpression suppressed the expression of tristetraprolin (TTP), a protein involved in the degradation of messenger RNAs with AU-rich 3′-untranslated regions, and led to epithelial-to-mesenchymal transition (EMT) and metastasis in cooperation with oncogenic Ras signaling [21]. Interestingly, one of the predicted target genes of miR-29a is PTEN, which was previously identified as a target of miR-21 in hepatocellular carcinoma, suggesting that cell and tissue type-specific differences might result in different functional miR targets [22]. Further investigation of the regulatory mechanisms of these miRNAs and their targets may improve our understanding of the molecular pathogenesis of breast cancer as well as its therapy and prognosis.
Notably, we demonstrated that SOLiD sequencing can provide a more accurate screening method for genome-wide miRNA profiling of breast cancer. Deep sequencing technology, a recent introduction, enables the simultaneous sequencing of up to millions of DNA or RNA molecules and provides a promising option for profiling miRNAs. Deep sequencing overcomes many of the disadvantages of microarrays, which suffer from background and cross-hybridization problems and measure only the relative abundances of previously discovered miRNAs. Deep sequencing measures absolute abundance (over a wider dynamic range than is possible with microarrays) and is not limited by array content, allowing for the discovery of novel miRNAs or other small RNA species. Several next-generation sequencing technologies are currently in widespread use. Pyrosequencing (454 sequencing, Roche) provides up to 400,000 sequences of up to 250 nt in length per read, whereas Illumina/Solexa and AB SOLiD generate shorter reads (35 bp) but yield >1 Gbp of sequence data per run. Chen et al. sequenced all serum miRNAs of healthy Chinese subjects using Solexa and found over 100 and 91 serum miRNAs in male and female subjects, respectively. Further studies revealed that serum miRNAs contain fingerprints for various diseases, such as lung cancer, colorectal cancer, and diabetes [12]. Hu et al. also used Solexa sequencing to compare serum miRNA expression profiles between patients with non-small-cell lung cancer (NSCLC) who had longer survival and those who had shorter survival. Levels of four miRNAs (miR-486, miR-30d, miR-1, and miR-499) were significantly associated with overall survival [23]. In addition, using SOLiD sequencing, Qi et al. [24] analyzed the miRNA expression profiles of human laryngeal epithelial (Hep2) cells infected with adenovirus type 3 (AD3), and Schulte et al. [25] revealed differential expression of microRNAs in favorable versus unfavorable neuroblastoma. Deep sequencing thus provides a powerful approach for rapid identification of miRNA expression profiles, although the results should be confirmed with independent validation methods such as RT-qPCR.
In recent research, Zhu et al. analyzed miR-16, miR-145, and miR-155 in archived serum specimens from 13 subjects with breast cancer and 8 subjects without. The results showed that miR-155 may be differentially expressed in the serum of women with progesterone receptor-positive tumors compared with those whose tumors were receptor negative [26]. In another study, Wang et al. examined 6 miRNAs (miRNA-21, 106a, 126, 155, 199a, and 335) in the serum of 68 patients with breast tumors and 40 healthy subjects. They found that the expression of miR-21, miR-126, miR-155, miR-199a, and miR-335 was closely associated with clinicopathologic features of breast cancer (P < .05), such as histological tumor grade and sex hormone receptor expression [27]. Compared with the above research, our study has some advantages: (1) genome-wide miRNA expression analysis by SOLiD sequencing was employed; (2) the key miRNAs were selected and validated in two steps. However, our study also has several limitations: (1) serum miRNA profiling should be analyzed and compared with tissue miRNA profiling to identify coregulated miRNAs; (2) the sample size was small, and further validation of the key miRNAs in a large group of subjects is necessary; (3) despite the significant increase of miR-29a and miR-21 in the serum of breast cancer patients, it remains desirable to identify correlations with specific breast cancer biopathologic features, such as estrogen and progesterone receptor expression, vascular invasion, or proliferation index. Further validation would add value to the use of these markers for breast cancer detection.

Conclusion
Our study benefited from SOLiD sequencing for genome-wide miRNA expression profiling followed by qRT-PCR validation. This novel application provides an important advance in miRNA detection and profiling and has the capacity to substantially improve our understanding of the role of miRNAs in disease pathogenesis.