Comprehensive Analysis of Transcriptome Sequencing Data in the Lung Tissues of COPD Subjects

Background and Objectives. Chronic obstructive pulmonary disease (COPD) is a complex disease characterized by airflow limitation. Although airway inflammation and oxidative stress are known to be important in the pathogenesis of COPD, the mechanism underlying airflow obstruction is not fully understood. Gene expression profiling of lung tissue was performed to define the molecular pathways that are dysregulated in COPD. Methods. RNA was isolated from lung tissues obtained from 98 subjects with COPD and 91 control subjects with normal spirometry. The RNA samples were processed with RNA-seq using the HiSeq 2000 system. Genes expressed differentially between the two groups were identified using Student's t-test. Results. After filtering for genes with zero counts and noncoding genes, 16,676 genes were evaluated. A total of 2312 genes were differentially expressed between the lung tissues of COPD and control subjects (false discovery rate corrected q < 0.01). The expression of genes related to oxidative phosphorylation and protein catabolism was reduced and genes related to chromatin modification were dysregulated in lung tissues of COPD subjects. Conclusions. Oxidative phosphorylation, protein degradation, and chromatin modification were the most dysregulated pathways in the lung tissues of COPD subjects. These findings may have clinical and mechanistic implications in COPD.


Introduction
Chronic obstructive pulmonary disease (COPD) is characterized by chronic airflow limitation that is not fully reversible and is highly prevalent worldwide [1]. Although cigarette smoking is a major risk factor for COPD, the mechanism by which inhaled smoke contributes to airflow obstruction is not fully understood. Current theories for this include persistent airway inflammation that is modified by oxidative stress, excess proteinases, autoimmunity, and apoptosis of alveolar cells [2].
Gene expression studies of diseased lungs can generate high-throughput results to shed light on the molecular processes underlying COPD pathogenesis. Whole-genome expression of COPD in humans has been studied using airway epithelium and resected lung tissue in several groups [3][4][5][6][7][8][9]. These studies mostly used microarrays to profile gene expression patterns in COPD. Next-generation sequencing technology was recently applied to transcriptomics. RNA-seq technology provides read counts of RNA fragments in each gene [10]. Background and cross-hybridization are not issues in RNA-seq and the technology can quantify both lowly and highly abundant transcripts [11]. In addition, this method can provide information about splice variants, allele-specific expression, and fusion transcripts [12]. RNA-seq data of 2 International Journal of Genomics airways was recently published [13,14]; however, the number of subjects in these studies was relatively small.
We performed gene expression profiling using RNA-seq of lung tissues resected from large numbers of subjects with COPD or with control subjects to better understand the molecular mechanisms responsible for the pathogenesis of COPD.

Study Populations.
Subjects were patients who required resection for lung cancer and who were registered in the Asan Biobank from January 2008 to November 2011. The inclusion criteria were a postbronchodilator FEV 1 /FVC ratio (ratio of forced expiratory volume in the first second to forced vital capacity) of less than 0.7 for the COPD group and normal spirometry for the control group in accordance with American Thoracic Society/European Respiratory Society criteria [15]. This study was approved by the institutional review board of Asan Medical Center (2011-0711) and written informed consent was obtained from all patients.

RNA Preparation and Sequencing.
Total RNA was isolated from apparently normal fresh frozen lung tissue that was remote from the lung cancer. RNA integrity was assessed using an Agilent Bioanalyzer and RNA purity was assessed using a NanoDrop spectrophotometer.
One g of total RNA was used to generate cDNA libraries using the TruSeq RNA library kit. The protocol consisted of poly A-selected RNA extraction, RNA fragmentation, reverse transcription using random hexamer primers, and 100 bp paired-end sequencing using the Illumina HiSeq 2000 system. All data have been deposited in the NCBI Gene Expression Omnibus (GEO) public repository and can be accessed through the accession number GSE57148.

Quality Control and Data
Management. For quality control, read quality was verified using FastQC and read alignment was verified using Picard. Differential gene expression (DEG) analysis was performed using TopHat and Cufflinks software [16]. To estimate expression levels, the RNA-seq reads were mapped to the human genome using TopHat (version 1.4.1) [17] and quantified using Cufflinks software 2.0.0 [18]. Cufflinks software was run with the UCSC hg19 human genome and transcriptome references. The numbers of isoform and gene transcripts were calculated and the relative abundance of transcripts was measured in fragments per kilobase of exon per million fragments mapped (FPKM).
Expression levels were extracted as a FPKM value for each gene of each sample using Cufflinks software. Genes with FPKM values of 0 across all samples were excluded. Filtered data were subject to upper quantile normalization. Statistical significance was determined using Student's -test. The false discovery rate (FDR) was controlled by adjusting values using the Benjamini-Hochberg algorithm. The analysis steps used are summarized in Figure 1.
To investigate whether DEGs are related to clinical phenotypes, we performed a linear regression analysis for 5 clinical phenotypes with respect to its gene expression. We considered each clinical phenotype as the responder for regression and each gene expression as the predictor.

Quantitative Real-Time PCR (qRT-PCR).
Several genes whose expression level was found to be related to COPD status by RNA-seq were validated using TaqMan real-time PCR. The results were normalized to GAPDH Ct values. Primer sequences for the genes of interest are given in Table 2.

Pathway
Analysis. Functional enrichment analysis was performed using gene set enrichment analysis (version 2.0.8), which combines information from previously defined gene sets obtained from the Molecular Signature Database (version 3.1). Biological gene functional annotation analysis was performed using DAVID (version 6.7) with a list of DEGs. BioLattice (version 1.1) was used to annotate coexpressed gene groups to GO biological process terms and visualize their relations [19].
2.6. Differential Alternative Splicing. To detect differential alternative splicing between the two groups, subjects from each group were evaluated using a multivariate Bayesian algorithm called "multivariate analysis of transcript splicing" [20]. Differential alternative splicing, including exon skipping, mutually exclusive exons, alternative 5 or 3 splice site usage, and intron retention, was investigated. Exon usage of VIM between two groups was visualized using DEXSeq [21].

Connectivity Map.
A connectivity map [22] was used to identify potential drugs that might reverse the gene International Journal of Genomics 3 expression pattern associated with the pathogenesis of early COPD. The connectivity map is a collection of genome-wide transcriptional expression data from cultured human cells treated with bioactive small molecules. The basic assumption of the connectivity map is that transcriptional perturbation can occur or be treated by certain drugs that intrigue similar changes.

Demographic Characteristics.
The demographic characteristics of 98 subjects with COPD and 91 control subjects are shown in Table 1. All subjects were male and the mean age and the mean pack years of cigarette smoking history were higher in the COPD group than in the control group. As expected, pulmonary function was significantly lower in the COPD group than in the control subjects. Most of COPD subjects were in early stage. In COPD group, 28 subjects took inhaled corticosteroid, 40 subjects took tiotropium, and 22 subjects took short-acting beta-agonist. None in control group took bronchodilator.

Quality Control and DEGs.
The total number of reads produced from each sample was 38,742,474 ± 7,332,014 reads (mean ± standard deviation). The difference in the number of reads between COPD samples and control samples was not statistically significant. After read alignment with TopHat and read quantification with Cufflinks using UCSC hg19 transcriptome reference, a total of 189 samples and 23,146 genes were analyzed. A total of 248 genes had zero FPKM values in all samples. After filtering for genes with zero counts in whole samples, noncoding genes, and low variance genes, 16,676 genes were analyzed. Out of these genes, 2,312 genes were differentially expressed between the two groups (FDR corrected < 0.01) ( Table 3). There were many overlaps between -test and EdgeR (see supplementary Table 1 in the Supplementary Material available online at http://dx.doi.org/10.1155/2015/206937). Regression analysis of the DEGs with clinical phenotypes revealed that genes were more associated with FEV 1 (forced expiratory volume in the first second) and FEV 1 /FVC ratio than with age and smoking history ( Figure 2).

Validation.
To validate the RNA-seq results, we performed qRT-PCR of the FGG, MCL1, PDE4A, S100A6, SER-PINE1, SFTPC, and TMSB4X genes using the same RNA samples that were used for RNA-seq. We used a subset of 156 samples out of 189 samples for validation. The RNA-seq and qRT-PCR results were in good agreement ( Figure 3).

Clustering and Gene Functions.
A heat map shows the genes that were differentially expressed between the two groups ( Figure 4). DAVID revealed that the expression of genes involved in protein catabolism, oxidative phosphorylation, and chromatin modification differed most significantly between the two groups ( Table 4). Expression of genes encoding proteasome components including PSMA2, PSMB1, PSMC5, and PSMD4 was lower in the COPD group than in the control group. Gene set enrichment analysis revealed that the most significantly downregulated pathways  in the COPD group were oxidative phosphorylation and biosynthetic process (FDR < 0.25).
We used -means clustering to construct 30 groups of coexpressed genes for BioLattice. After matching with hypergeometric distributions to annotate those gene sets to GO terms, concept lattice for coexpressed gene clusters with GO biological process terms was visualized showing that genes related to transcription and oxidative phosphorylation are enriched in the major clusters ( Figure 5).
In the COPD group, the heat map shows hierarchical clustering of two subgroups using Euclidean distance ( Figure 6). Four hundred and four genes that were differentially expressed between these two subgroups in COPD subjects ( < 0.05) are enriched in the mitochondrial and steroid biosynthesis pathway. Group 1 showed tendency for lower FEV 1 and lower DLCO (carbon monoxide diffusing capacity). When comparing each group with the control, DEGs between group 1 and the controls only consisted of 18 genes, and their values were negligible. Meanwhile, DEGs between group 2 and the controls consisted of 4072 genes and their values were even higher than those of DEGs between the COPD group and the controls.
International Journal of Genomics 5  There was no difference in medication history between the two groups.

Isoform and Alternative
Splicing. Isoforms that were differentially expressed between the COPD and control groups were evaluated by Cufflinks. Pathway analysis results of the DEIs were similar to those of the DEGs. However, among the DEIs, 310 were not in the DEG list, which are enriched in genes encoding proteins that function in cell junctions and focal adhesions. The multivariate analysis of transcript splicing program (MATS) revealed that specific alternative splicing events were significantly more common in COPD subjects than in control subjects. Five categories of differentially expressed isoforms were identified by MATS: skipped exon, alternative 5 splice site, alternative 3 splice site, mutually exclusive exons, and retained introns. Significantly different events between the COPD group and the control group are shown in the supplementary Table 2 (FDR < 0.01). Intron retention of HNRNPH1 and VIM occurred significantly more in the COPD group than in the control group. Mutually exclusive exons occurred more frequently in the COPD group than in the control group in 78 genes. Figure 7 shows exon usage of VIM between the two groups visualized using DEXSeq [21].

Connectivity Map.
A connectivity map [22] was used to identify potential drugs that might reverse the gene expression pattern associated with the pathogenesis of early COPD. Gene expression changes arising from treatment with several drugs were negatively correlated with the expression patterns 6 International Journal of Genomics P oxidative phosphorylation P protein targeting P transport P intracellular transport P localization P establishment of localization P cellular localization P establishment of cellular localization P electron transport P generation of precursor metabolites and energy P regulation of transcription DNA-dependent P regulation of metabolism P regulation of cellular metabolism P regulation of biological process P regulation of physiological process P regulation of cellular process P regulation of cellular physiological process that differ in the COPD group compared to the control group. Gene expression changes arising from treatment with MG-262 ( = 0.00004) and puromycin ( < 0.00001) were most significantly negatively correlated.

Discussion
In this study, we used RNA-seq to identify genes whose expression differs between COPD and control subjects. In total, 2312 genes were identified with FDR < 0.01. We validated a subset of RNA-seq data with qRT-PCR, and the results were in good agreement.
Previous studies have investigated the gene expression profiles of COPD patients using microarray, serial analysis of gene expression, and RNA-seq. Microarray data indicates that MICAL2 and NOTCH2 are upregulated in the resected lung tissue of COPD patients [5]; the expression of these genes was higher in the COPD group than in the control group in the present study. The expression of genes encoding ribosomal proteins and S100A6 was lower in the COPD group than in the control group, and RNA-seq previously indicated that the expression of these genes is reduced in the small airway epithelium of smokers [14]. However, there is little overlap between studies in terms of the genes that are differentially expressed between people with reduced lung function and those with normal lung function [23]. This may be because different methods were used or because different phenotypes were examined. Of interest, 582 differentially expressed genes were not in the Affymetrix microarray (U133a).
Previous studies of the COPD transcriptome reported a high degree of overlap in the biological processes affected. In the current study, the most altered pathway in COPD patients was mitochondrial oxidative phosphorylation. The expression of mitochondrial genes was previously shown to be reduced in the lung tissues of COPD subjects using serial analysis of gene expression [3]. A recent report showed that expression of the mitochondrial membrane protein PHB1 is downregulated in lung tissue of COPD patients, suggesting that mitochondrial stability is reduced [24]. Another report revealed that mitochondrial mass is reduced in airway epithelial cells exposed to particulate matter, and this defect is also observed in the daughter cells [25]. One possible explanation may be related to the fact that an increase in the CO 2 level can reduce oxidative phosphorylation [26]. However, considering that our COPD subjects were in relatively early stages of the disease, they are not expected to have a clinically meaningful increase in CO 2 levels.
International Journal of Genomics In the current study, the protein catabolism pathway was dysregulated in the lung tissues of COPD subjects. Many genes related to the 20S proteasome including PSMA2, PSMB1, PSMC5, PSMD4, and PSMD13, as well as ubiquitin ligase complex genes including STUB1, SELS, and DERL2, were downregulated in COPD subjects. The ubiquitinationproteasome pathway is dysregulated in COPD, but the mechanism by which this occurs is not fully understood [27]. The expression of 26S proteasome-associated genes is lower in lung tissues of moderate COPD subjects than in those of the smoker control subjects with normal lung function [3]. The expression and activity of the proteasome are reduced in lung tissues of COPD subjects due to dysregulation of Nrf2 [28]. Impairment of proteasomal activity/expression may be important in the pathogenesis of COPD. Interestingly, the proteasome inhibitor MG-262 was on top of a list of drugs that reverse the gene expression pattern of COPD. In recent experiments in mice, a proteasome inhibitor was suggested to be a therapeutic agent for pulmonary arterial hypertension via inhibition of pulmonary vascular smooth muscle cell proliferation and correction of endothelial dysfunction [29]. Inhibition of proteasome inhibitors can reverse diaphragmatic function in a COPD mouse model [30].
In the current study, chromatin modification genes were upregulated in the COPD group. An epigenetic mechanism is reportedly important in the pathophysiology of COPD [31]. Chromatin modification is an important mechanism in epigenetics. In the current study, the expression of MLL, which plays an important role in H3K4 methylation [32], and CHD, which is important for chromatin modification and opening of chromatin to allow transcription [33], was increased in COPD subjects. These changes may lead to the upregulation of transcription. The expression of HDAC10 was decreased 8 International Journal of Genomics in the lung tissue of COPD subjects, while the expression of HDAC2 was previously reported to be decreased in COPD [34]. Several studies report the mechanism by which the expression of HDAC2 is reduced in COPD, but further studies are required to explain how the expression of several genes related to chromatin modification is increased in COPD.
In a recent study investigating genes associated with the severity of emphysema, the major pathways affected were inflammation and tissue repair [35]. However, the inflammation pathway was not majorly affected in COPD subjects in the current study, probably because the subjects were at relatively early stages of the disease.
One of the advantages of RNA-seq is that DEIs can be identified. Pathway analysis results of the DEIs were similar to those of the DEGs; however, genes encoding proteins that function in cell junctions and cell migration were found specifically in the DEIs and not in the DEGs. This suggests that specific isoforms of these genes may function in the lung. Alternative splicing of genes is tissue-specific and has important roles in development, physiological responses, and the pathogenesis of diseases [36]. Interestingly, the gene encoding vimentin, which is a structural protein, had more retained introns in the COPD group than in the control group, which may alter the sequence of the protein.
There are several limitations of this study. COPD subjects were older and had more pack years of smoking than the control group. However, regression analysis results showed that most of DEGs had stronger correlation with lung function than age or smoking amount. All subjects were smoking or ex-smoking men. The results of this study may not be applicable to nonsmoking or female COPD subjects. Normal looking tissue adjacent to the lung cancer tissue was used for analysis. Lung tissue consists of many cell types including macrophages, epithelium, and endothelium. Microdissection of lung tissue or single cell sequencing would be required to determine whether the differential expression is altered in all lung cells or only in a specific subset of cells. Finally, it is difficult to determine whether the dysregulated pathways identified in this study are a cause or a consequence of the pathogenesis of COPD. Experiments in which the increase/decrease of the DEGs is reversed and shown to slow disease progression are needed to confirm that these pathways are causally involved in the pathogenesis of COPD.
In conclusion, reduced oxidative phosphorylation, modulation of protein catabolism, and dysregulation of transcription are important molecular features of early stages of COPD. Genes and splicing variants were identified that were differentially expressed between COPD subjects and control subjects. RNA-seq was useful tool to increase understanding of the pathophysiology of COPD.

Abbreviations
COPD: Chronic obstructive pulmonary disease FPKM: Fragments per kilobase per million DEG: Differentially expressed gene DEI: Differentially expressed isoform.