The Effects of Age, Cigarette Smoking, Sex, and Race on the Qualitative Characteristics of Lung Transcriptome

The within-sample relative expression orderings (REOs) of genes, which are stable qualitative transcriptional characteristics, can provide abundant information about a disease. Methods based on REO comparisons have been proposed for identifying differentially expressed genes (DEGs) at the individual level and for detecting disease-associated genes from one-phenotype disease data by reusing normal samples from other sources. Here, we evaluated the effects of common potential confounding factors, including age, cigarette smoking, sex, and race, on the REOs of gene pairs within the transcriptome of normal lung tissues. Our results showed that age has little effect on REOs within lung tissues. We found that about 0.23% of the significantly stable REOs of gene pairs in nonsmokers' lung tissues are reversed in smokers' lung tissues. These reversals are introduced by 344 DEGs between the two groups of samples (RankCompV2, FDR <0.05), which are enriched in metabolism of xenobiotics by cytochrome P450, glutathione metabolism, and other pathways (hypergeometric test, FDR <0.05). Comparison between the normal lung tissue samples of males and females revealed fewer reversal REOs, introduced by 24 DEGs between the sex groups, among which 19 DEGs are located on sex chromosomes and 5 DEGs, involved in spermatogenesis and oocyte regulation, are located on autosomes. Between the normal lung tissue samples of white and black people, we identified 22 DEGs (RankCompV2, FDR <0.05), which introduced a few reversal REOs between the two races. In summary, REO-based studies should take into account the confounding factors of cigarette smoking, sex, and race.


Background
Recently, we revealed an important biological phenomenon: despite high variation in gene expression levels among individuals, the within-sample relative expression orderings (REOs) of genes are highly stable in a particular type of normal human tissue, which might be an intrinsic mechanism that keeps genes functioning coordinately in normal tissues. In contrast, the stable REOs of normal tissues are widely disturbed in diseased tissues [1,2], providing abundant information for characterizing diseases [1,3]. As qualitative characteristics of transcriptomes, REOs are highly robust against measurement variations and experimental batch effects [4][5][6]. Taking advantage of these properties, REO-based methods such as TSP [7], K-TSP [8], and others [9,10] have been developed for discriminating cancer subtypes. In particular, many REO-based prognostic signatures have been proposed to address specific medical problems in various cancers, such as non-small cell lung cancer [3,11], colorectal cancer [4,12], and other cancers [13][14][15].
Based on REO analysis, we have proposed an algorithm named RankComp [1] to detect differentially expressed genes (DEGs) in an individual disease sample compared with its previous normal state. RankComp analyzes which genes' up- or downregulation may lead to the reversal REOs in the disease sample, taking the stable REOs predetermined in a large collection of normal tissue samples as the normal background [1,16]. The individual-level analysis of DEGs allows us to identify subtype-specific genes, which can provide novel perspectives for understanding the mechanisms of carcinogenesis [16]. In contrast, for a DEG detected at the population level, we cannot know whether it is differentially expressed in a particular cancer sample because of the heterogeneity of cancer. REO analysis can also be applied to identify disease-associated genes or pathways from one-phenotype disease data when normal tissues are unavailable or insufficient, as for some vital organs such as the brain and heart [1,17-19]. In this situation, it is of great value to reuse the normal control data accumulated in other studies. We have proposed a REO-based algorithm, named DRFunc [19], to identify disease-associated pathways from one-phenotype data by comparing the stable REOs in the one-phenotype disease samples with the normal stable REO background predetermined in previously accumulated normal samples from other studies. Based on REO analysis, we have also proposed a method named RankCompV2 for identifying DEGs at the population level by comparing the stable REOs of two phenotypes [20].
The above-mentioned differential expression analysis methods based on REO comparisons all depend on the normal stable REO background predetermined from previously accumulated normal samples. However, some confounding factors such as age, cigarette smoking, sex, and race may affect the gene expression levels in normal samples. Studies have shown that sex-biased gene expression is widespread across genomes on both sex chromosomes and autosomes [21,22]. Several studies have also reported that cigarette smoking [23] and race [24] can alter gene expression levels and that gene expression levels change with age in many organ tissues, including lung tissues [25]. However, whether these confounding factors affect the REOs of gene pairs is still unknown.
Thus, in this paper, using the normal lung tissue samples from three different laboratories, we evaluated the effects of four confounding factors, including age, cigarette smoking, sex, and race, on the REOs within normal lung tissues.

Data and Preprocessing.
The gene expression profiles analyzed in this study are described in Table 1. All the datasets were measured on the Affymetrix GPL570 platform, and the processed data were downloaded directly from the Gene Expression Omnibus database. For the downloaded data, each probe ID was mapped to an Entrez gene ID using the corresponding platform file. If a probe was mapped to multiple genes or to no gene, its data were discarded. If multiple probes were mapped to the same gene, the expression value of the gene was defined as the arithmetic mean of the values of these probes. Notably, nonsmokers in this study denote individuals with no history of cigarette smoking.
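The probe-to-gene collapsing rule described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `collapse_probes` and the dictionary layouts for the expression values and the platform annotation are assumptions made only for illustration.

```python
from collections import defaultdict


def collapse_probes(probe_values, probe_to_genes):
    """Collapse probe-level values to gene-level values for one sample.

    probe_values:   {probe_id: expression value}
    probe_to_genes: {probe_id: list of Entrez gene IDs} (hypothetical layout)
    """
    per_gene = defaultdict(list)
    for probe, value in probe_values.items():
        genes = probe_to_genes.get(probe, [])
        # Discard probes mapped to zero genes or to multiple genes.
        if len(genes) != 1:
            continue
        per_gene[genes[0]].append(value)
    # Arithmetic mean over all probes that map to the same gene.
    return {gene: sum(vals) / len(vals) for gene, vals in per_gene.items()}
```

Applied once per sample, this yields one value per Entrez gene ID, which is the input needed for the pairwise REO comparisons.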

Evaluation of Confounding Factors on REO of Gene Pairs.
Within a sample, the REO of two genes, A and B, is denoted as A > B (or A < B) if the expression level of gene A is higher (or lower) than that of gene B. For each of the three binary confounding factors, cigarette smoking, sex, and race, we first divided the samples into two groups and then identified the gene pairs with significantly stable REOs in each group. The significance of a gene pair with a stable REO in a group of samples was determined by the binomial test as follows:
$$P = 1 - \sum_{i=0}^{k-1} \binom{n}{i} p_0^{i} (1 - p_0)^{n-i},$$
where the REO pattern (A > B or A < B) is consistent in k out of n samples in total and $p_0$ ($p_0 = 0.5$) is the probability of observing one of the two possible REO outcomes in a sample by chance. The P-values were then adjusted using the Benjamini and Hochberg method [26]. A gene pair that has stable REOs in both groups of samples but with opposite REO directions is called a reversal gene pair. Otherwise, if the REO directions are consistent in both groups, it is called a concordant gene pair. If the two lists of stable gene pairs identified above have m pairs in common, among which r have opposite REO directions, the reversal ratio is calculated as r/m.
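The stable-REO screening and reversal ratio described above can be sketched in pure Python as follows. This is a hedged illustration, not the authors' implementation. The function names, the list-of-dicts sample layout, and the choice to test the majority direction with a one-sided binomial tail are all assumptions. Enumerating every pair is only practical for small gene sets.

```python
from itertools import combinations
from math import comb


def binomial_p(k, n, p0=0.5):
    """One-sided tail probability P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def stable_reos(samples, fdr=0.05):
    """Map each significantly stable gene pair (a, b) to '>' or '<'.

    samples: list of {gene: expression} dicts sharing the same genes
    (an assumed layout for this sketch).
    """
    genes = sorted(samples[0])
    n = len(samples)
    pairs, directions, pvals = [], [], []
    for a, b in combinations(genes, 2):
        greater = sum(s[a] > s[b] for s in samples)
        less = sum(s[a] < s[b] for s in samples)
        pairs.append((a, b))
        directions.append(">" if greater >= less else "<")
        # Test the majority direction against chance (p0 = 0.5).
        pvals.append(binomial_p(max(greater, less), n))
    adj = benjamini_hochberg(pvals)
    return {p: d for p, d, q in zip(pairs, directions, adj) if q < fdr}


def reversal_ratio(stable_1, stable_2):
    """Fraction of commonly stable pairs whose REO direction differs."""
    common = stable_1.keys() & stable_2.keys()
    reversed_pairs = [p for p in common if stable_1[p] != stable_2[p]]
    return len(reversed_pairs) / len(common) if common else 0.0
```

Calling `stable_reos` on each group and passing the two results to `reversal_ratio` reproduces the quantity reported in the Results as the percentage of reversal REO patterns among commonly stable pairs.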
For the two groups of samples classified by a binary confounding factor, the distribution of each of the other confounding factors between the groups was tested by Fisher's exact test to ensure that there was no significant difference in the other confounding factors. For the age factor, the samples were divided into two groups based on the REO pattern of each gene pair, and we then tested whether there was a significant difference in age between the two groups of samples using the Mann-Whitney U-test. The REO of a gene pair is considered significantly correlated with age if age is significantly different between the two groups.
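The age analysis for a single gene pair can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, and the P-value uses the tie-corrected normal approximation without continuity correction, which may differ from the exact test used in a given statistics package, especially for small groups.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for x, approximate P-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j for tied values
        t = j - i
        tie_term += t**3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u, 1.0
    z = (u - mean) / sqrt(var)
    return u, erfc(abs(z) / sqrt(2))


def age_vs_reo(ages, expr_a, expr_b):
    """Split samples by the REO of one gene pair and compare their ages.

    Returns None when every sample shows the same REO pattern.
    """
    a_higher = [age for age, a, b in zip(ages, expr_a, expr_b) if a > b]
    b_higher = [age for age, a, b in zip(ages, expr_a, expr_b) if a < b]
    if not a_higher or not b_higher:
        return None
    return mann_whitney_u(a_higher, b_higher)
```

The resulting P-values, one per gene pair, would then be adjusted for multiple testing in the same way as the binomial-test P-values.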

Identification of Differentially Expressed Genes.
Focusing on the stable gene pairs commonly identified in the two groups of samples, we identified the concordant and the reversal REOs between the two groups for a specific factor. RankCompV2 [20] was applied to detect differentially expressed genes (DEGs) between the two groups of samples. The details of the RankCompV2 algorithm have been described in ref. [20]. Briefly, Fisher's exact test was applied to identify whether a gene may disrupt the gene correlation structure in one group compared with the other group based on the concordant and the reversal REOs between the two groups. For a particular gene, to minimize the potential effect of other genes' expression changes on Fisher's exact test, an iterative filter process [27] was conducted.

The Influence of Age on REOs within Normal Lung Tissues.
From three datasets (GSE31210, GSE19804, and GSE20257, as shown in Table 1), we selected 65 samples of nonsmoking Asian females with age ranging from 37 to 80 years old for the analysis. Based on the REO pattern of each gene pair, the samples were divided into two groups, and then, the Mann-Whitney U-test was used to test whether there is significant difference in age between the two groups of samples. We could not find any gene pair whose REO was significantly correlated with age with FDR <0.05 or even with FDR <0.2 (Methods). Similarly, using 34 samples for Caucasian males with age ranging from 27 to 80 years old, collected from the dataset GSE4115, no significant gene pair was found with either FDR <0.05 or FDR <0.2.
The above results indicated that the influence of age on the REOs of gene pairs is negligible. Accordingly, the age factor was not taken into account in the subsequent analyses.

The Influence of Cigarette Smoking on REOs within Normal Lung Tissues.
We compared the gene expression profiles of normal lung tissue samples for 49 smokers and 44 nonsmokers from the GSE20257 dataset. The detailed information on the sample composition is shown in Table 2.
There is no significant difference in sex or race distribution between the smoker group and the nonsmoker group (Fisher's exact test, P > 0.1).
With FDR <0.05, we identified the gene pairs with significantly stable REOs in the smoker group and the nonsmoker group, respectively. We found 187,875,560 gene pairs that have significantly stable REOs (binomial test, FDR <0.05) in both groups, among which 0.227% showed reversal REO patterns. With RankCompV2, we identified 344 DEGs, including 210 upregulated and 134 downregulated genes in the smoker group compared with the nonsmoker group (FDR <0.05). The 210 upregulated genes and 134 downregulated genes were enriched in 7 pathways and 1 pathway, respectively (hypergeometric test, FDR <0.05), as shown in Figure 1. For the pathway "metabolism of xenobiotics by cytochrome P450," cytochromes P450 are known to be responsible for the metabolism of compounds present in cigarette smoke, including nicotine, benzene, polycyclic aromatic hydrocarbons (PAHs), and tobacco-specific nitrosamines (TSNAs) [28]. As for the "glutathione metabolism" pathway, it has been found that cigarette smoking can induce the deregulation of glutathione metabolism in bronchial epithelial cells [29]. It has also been reported that "metabolic pathways" [30], "steroid hormone biosynthesis" [31], "pentose phosphate pathway" [32], "arachidonic acid metabolism" [33], and "mineral absorption" [34] are affected by cigarette smoking.
The above results indicated that cigarette smoking can alter the REOs in normal lung tissues and disturb some important biological pathways.
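The hypergeometric test used for the pathway enrichment analysis can be sketched as follows. This is a minimal illustration assuming a one-sided over-representation test. The function name and the choice of background gene set are assumptions, since the text does not specify them, and multiple-testing correction across pathways would still be applied afterwards.

```python
from math import comb


def hypergeom_enrichment_p(background, pathway, degs):
    """P(overlap >= observed) when drawing the DEGs at random from the background.

    background: all genes considered (assumed here to be the measured genes)
    pathway:    genes annotated to one pathway
    degs:       differentially expressed genes
    """
    background = set(background)
    pathway = set(pathway) & background
    degs = set(degs) & background
    n_bg, n_path, n_deg = len(background), len(pathway), len(degs)
    k = len(pathway & degs)
    total = comb(n_bg, n_deg)
    # Sum the hypergeometric probabilities for overlaps of size k or larger.
    tail = sum(
        comb(n_path, i) * comb(n_bg - n_path, n_deg - i)
        for i in range(k, min(n_path, n_deg) + 1)
    )
    return tail / total
```

For example, with a background of 10 genes, a pathway of 5 genes, and 5 DEGs that all fall in the pathway, the function returns 1/252.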

The Influence of Sex on REOs within Normal Lung Tissues.
We compared the gene expression profiles of normal lung tissue samples for 64 males and 29 females from the dataset GSE20257. The detailed information on the sample composition is shown in Table 3. There is no significant difference in smoking rate or race distribution between the male group and the female group (Fisher's exact test, P > 0.2).
We identified the gene pairs with significantly stable REOs in the male and female groups, respectively, and found 187,481,246 gene pairs with significantly stable REOs (binomial test, FDR <0.05) in both groups, among which 0.074% showed reversal REO patterns. With RankCompV2, we identified 35 DEGs in the male group compared with the female group (FDR <0.05). In another dataset, GSE71181, which includes 201 male samples and 80 female samples, all from smokers, 25 of the above 35 DEGs were also found (t-test, FDR <0.05), and 96% (24 genes) have the same dysregulation directions in the male group compared with the female group. Among the 24 DEGs, 6 out of the 10 genes upregulated in the male group are located on the Y chromosome, and 12 out of the 14 genes upregulated in the female group are located on the X chromosome; the cytobands of these genes are shown in Table 4. In particular, DDX43, CRISP2, and PRDM7, which are upregulated in the male group, are located on autosomes and are involved in spermatogenesis and male fertility [35,36]. Of the other two genes, NLRP2 and C3orf79, which are located on autosomes but upregulated in females, NLRP2 is known to be a critical regulator of oocyte [37].

The Influence of the Race Factor on REOs within Normal Lung Tissues.
Because of the limited sample sizes for other races, we compared the gene expression profiles of normal lung tissues only between the white and black races. From the GSE20257 dataset, we obtained 34 samples for white people and 59 samples for black people. The detailed information on the sample composition is shown in Table 5. There is no significant difference in cigarette smoking rate or sex distribution between the two groups (Fisher's exact test, P > 0.1).
Because of the small number of DEGs, we found no pathway significantly enriched with the up- or downregulated DEGs with FDR <0.05. With P < 0.05, the 10 upregulated and 12 downregulated genes were each enriched in 4 pathways, as shown in Figure 2. The result indicates that there are some differences in metabolism and immunity of the normal lung tissues between the white and black races [38,39].

Discussion
Among the four confounding factors investigated in this paper, cigarette smoking alters the REOs within lung tissues most widely, and sex and race also alter the REOs but only slightly, whereas there is no evidence that age affects the REOs of gene pairs. Therefore, REO-based studies should take into account the confounding factors of cigarette smoking, sex, and race. When building the normal stable REO background from previously accumulated normal samples from other studies, the normal samples should include sufficient samples with the same factors that are present in the one-phenotype disease samples analyzed in a study.
Our results showed that cigarette smoking disrupts "metabolism of xenobiotics by cytochrome P450," "glutathione metabolism," and other pathways [28,29] and that there are some differences in metabolism and immunity between different races. The sex factor affects some genes located on the sex chromosomes and some genes located on the autosomes, which are involved in spermatogenesis and male fertility [35] or are critical regulators of oocyte [37]. Because cigarette smoking, sex, and race can affect the REOs of gene pairs, the influence of these factors should be taken into account in REO-based analyses of lung tissue.
This study has some limitations. Because normal tissue samples and clinical information are limited for many other organs, we systematically analyzed the influence of the four common confounding factors (age, cigarette smoking, sex, and race) on REOs only in normal lung tissues. The effects of the confounding factors on the REOs might be tissue specific. We have preliminarily analyzed the influence of sex on the REOs of gene pairs in normal stomach tissues and esophagus tissues, respectively, and found that all the DEGs are located on sex chromosomes, as described in Supplementary files 1. The effects of confounding factors on the REOs of gene pairs in tissues of other organs need to be examined in future studies.

Conclusions
Our results show that the confounding factors of cigarette smoking, sex, and race can alter the REOs within lung tissues. Thus, REO-based studies should consider these confounding factors. In contrast, the effect of age on the REOs of gene pairs appears to be negligible.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no competing interests.

Authors' Contributions
QZG and JZ conceived the study, analyzed the data, made figures, performed the statistical analysis, and drafted the manuscript. YG, JX, JHZ, JJX, and HC searched the data and participated in the statistical analysis. HDY participated in discussing and revising the manuscript. ZG and XLW conceived of the study, participated in its design and coordination, helped to draft the manuscript, and supervised the work. All authors read and approved the final manuscript. Qingzhou Guan and Juan Zhang contributed equally to this work.