Coexpression Analysis of Transcriptome on AIDS and Other Human Disease Pathways by Canonical Correlation Analysis

Acquired immune deficiency syndrome (AIDS) is a severe human disease caused by human immunodeficiency virus (HIV). Several human genes have been characterized as host genetic factors that affect the progression of AIDS. Recent studies of AIDS patients have revealed a series of diseases that occur as complications of AIDS. To resolve gene interactions between AIDS and its complicating diseases, canonical correlation analysis was used to identify global correlations between the expression of AIDS restriction genes (ARGs) and other disease pathway genes. The results showed that HLA-B, HLA-A, MYH9, ZNRD1, IRF1, TLR8, TSG101, NCOR2, and GML are the key ARGs highly correlated with other disease pathway genes. Furthermore, pathway genes in several diseases such as asthma, autoimmune thyroid disease, and malaria were globally correlated with ARGs. This suggests that AIDS patients are at high risk of these diseases as complications.


Introduction
Human immunodeficiency virus (HIV) causes a serious disease that affects people's health and lives. Millions of people have died from HIV infections in the 30 years since its identification. Over the past decades, a large number of studies have focused on every aspect of this virus, including virology, immunology, treatment, and genetics. The problem of AIDS complications became prominent after HIV was found to severely damage the lymphoid system.
Several diseases are associated with HIV infection and antiretroviral therapy. Tuberculosis occurs in a large proportion of HIV infection cases in developing countries [1]. After research revealed that HIV-induced immune deficiency was the most common risk factor for cancer, HIV infection-related cancer was recognized as a complication of HIV infection [2]. HIV-associated sensory neuropathy is also a complication of HIV infection [3]. Recently, venous thrombosis has been described as a disease associated with HIV-positive patients [4]. Pulmonary arterial hypertension is a life-threatening complication of HIV infection [5]. A well-described complication of HIV and antiretroviral therapy is pancreatitis, which has exceedingly high rates in the HIV-positive population [6]. During antiretroviral therapy, classical Hodgkin lymphoma (HIV-cHL) and rhabdomyolysis are also important complications of HIV disease [7,8]. One report showed that HIV patients frequently had neutropenia [9]. To the best of our knowledge, AIDS complications span almost all major categories of human disease.
Our understanding of genetic restriction factors targeting AIDS has been greatly improved by advances in genome research, such as sequencing of the whole human genome through physical and functional analyses. Many methods have been developed to study the underlying mechanisms of diseases at the whole-genome level, such as genome-wide association studies, which can identify host genetic factors that affect HIV infection and the host restriction response. Nearly 40 AIDS restriction genes (ARGs) have been identified from a wide range of biological pathways, from the HIV entry receptor on lymphoid cells to oncogenes in human glioblastomas.
Many web-based databases, such as KEGG, have been established as tools to collect human disease pathway genes using genomic and molecular methods. For example, comparative transcriptome analysis can isolate marker genes that are highly differentially expressed in patients. Molecular biology has discovered several pathways that play major roles in human diseases, and hundreds of genes have been characterized as members of human disease pathways. Hence, human disease genes are available for the analysis of their effects on ARGs.
Expression correlation between genes based on a gene coexpression model can reveal the molecular mechanisms underlying gene regulation. For instance, mitochondrial pathways are coexpressed with muscle system pathway genes and neurodegenerative disease pathway genes [10]. The expression correlation between ARGs and other disease pathways could also explain the relationship between ARGs and AIDS complicating diseases. Canonical correlation analysis (CCA) is a powerful approach for detecting coexpression between gene sets because it is not limited to correlations between pairs of genes. For example, CCA can be used to perform a coexpression analysis of glioma pathway genes from glioblastoma transcriptomes.
In this study, we used CCA to determine coexpression between ARGs and other human disease pathway genes. We discuss the most significant coexpression patterns, which could imply susceptibility or sensitization to AIDS complicating diseases. The ARGs are listed in Table S1 (available online at https://doi.org/10.1155/2017/9163719). From human genome expression datasets, two expression datasets were generated to include disease pathway gene and ARG expression data, respectively.

Canonical Correlation Analysis.
Canonical correlation analysis is a statistical method that extracts statistically independent pairs of canonical variables based on the correlation between two sets of original variables. The original variables are linear combinations of the canonical variables. In this study, the expression of disease pathway genes under diverse conditions was described by the vector $a = (a_1, a_2, \ldots, a_m)$ and ARG expression by the vector $b = (b_1, b_2, \ldots, b_m)$. The respective canonical variables $c = (c_1, c_2, \ldots, c_m)$ and $d = (d_1, d_2, \ldots, d_m)$ have canonical coefficient vectors $s = (s_1, s_2, \ldots, s_m)$ and $s' = (s'_1, s'_2, \ldots, s'_m)$, such that $a = c's$ and $b = d's'$. The vector of eigenvalues was calculated as the magnitude of the correlation between each pair of canonical variables. The variance-covariance matrices were used to analyze the covariances between variables.
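For reference, the standard textbook formulation of CCA can be written as follows. This is a sketch in generic notation rather than the authors' exact derivation: the weight vectors $w_a$ and $w_b$ and the covariance blocks $\Sigma_{aa}$, $\Sigma_{bb}$ (within the disease pathway genes and within the ARGs) and $\Sigma_{ab} = \Sigma_{ba}^{\top}$ (between the two sets) are symbols introduced here for illustration.

```latex
\rho_1 = \max_{w_a,\, w_b}
  \frac{w_a^{\top} \Sigma_{ab}\, w_b}
       {\sqrt{w_a^{\top} \Sigma_{aa}\, w_a}\;\sqrt{w_b^{\top} \Sigma_{bb}\, w_b}},
\qquad
\Sigma_{aa}^{-1} \Sigma_{ab} \Sigma_{bb}^{-1} \Sigma_{ba}\, w_a = \rho^{2}\, w_a .
```

The eigenvalues $\rho^{2}$ are the squared canonical correlations, and each eigenvector $w_a$ gives the coefficients of one canonical variable for the first gene set; the matching $w_b$ is obtained from the analogous equation with the roles of the two sets exchanged.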
2.3. The Study Design and Software Tools. The R platform (http://www.r-project.org/) was used for canonical correlation analysis of expression data. After the canonical variables were produced, the top correlated canonical variables (r > 0.95) were identified to analyze the coexpressed individual genes. Two thresholds were set to isolate correlated integrated disease pathways: r values > 0.5 and standard deviations > 0.2. The web-based DAVID tool (http://david.abcc.ncifcrf.gov/) was used for functional annotation and enrichment analysis, with the Homo sapiens genome as background. The "KEGG_PATHWAY" category was selected for disease pathway enrichment analysis. Other parameters were left at the values generated automatically by DAVID.
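The authors ran CCA in R; the sketch below only illustrates the underlying computation in Python, restricted to two genes per set so that the matrix algebra stays readable. The function names (`canonical_correlations_2x2` and its helpers) and the toy expression values are hypothetical and are not taken from the study.

```python
import math


def mean(values):
    return sum(values) / len(values)


def cov(x, y):
    """Sample covariance of two equally long expression profiles."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def cov_block(rows_a, rows_b):
    """Covariance matrix between every gene in rows_a and every gene in rows_b."""
    return [[cov(a, b) for b in rows_b] for a in rows_a]


def inv2(m):
    """Inverse of a 2x2 matrix (assumes a non-zero determinant)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(p, q):
    return [
        [sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
        for i in range(len(p))
    ]


def canonical_correlations_2x2(set_a, set_b):
    """Canonical correlations between two gene sets of two genes each.

    Each set is a list of two expression profiles (one list of values per gene).
    The squared canonical correlations are the eigenvalues of
    Saa^-1 * Sab * Sbb^-1 * Sba.
    """
    saa = cov_block(set_a, set_a)
    sbb = cov_block(set_b, set_b)
    sab = cov_block(set_a, set_b)
    sba = cov_block(set_b, set_a)
    m = matmul(matmul(inv2(saa), sab), matmul(inv2(sbb), sba))
    # Eigenvalues of a 2x2 matrix from its trace and determinant.
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(max(trace * trace / 4 - det, 0.0))
    eigenvalues = [trace / 2 + disc, trace / 2 - disc]
    # Clip rounding noise before taking the square root.
    return [math.sqrt(min(max(e, 0.0), 1.0)) for e in eigenvalues]


# Toy data: the second set is a linear recombination of the first
# (sum and difference of the two genes), so both canonical
# correlations are expected to be close to 1.
pathway_genes = [[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 4.0, 3.0, 6.0]]
arg_genes = [[3.0, 3.0, 7.0, 7.0, 11.0], [-1.0, 1.0, -1.0, 1.0, -1.0]]
print(canonical_correlations_2x2(pathway_genes, arg_genes))
```

For real gene sets with many genes per pathway, a general-purpose routine such as the CCA function in R used by the authors is the appropriate tool; the closed-form 2x2 case is shown only to make the eigenvalue step concrete.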
Functional annotations were generated, and enrichment analyses were performed for the metabolic pathway genes using the web-based DAVID tool (http://david.abcc.ncifcrf.gov/). For the pathway enrichment analyses, the "KEGG_PATHWAY" was selected. The pathways with a P value < 0.01 were considered significant.
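DAVID reports enrichment with its EASE score, a modified Fisher exact test. The sketch below shows only the plain hypergeometric upper tail that such tests build on, so its P values will not match DAVID's output exactly. The function name `hypergeom_upper_tail` and the example counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(population, pathway_size, list_size, hits):
    """P(X >= hits) when list_size genes are drawn without replacement
    from a background of `population` genes, `pathway_size` of which
    belong to the pathway of interest."""
    upper = min(pathway_size, list_size)
    total = comb(population, list_size)
    tail = sum(
        comb(pathway_size, i) * comb(population - pathway_size, list_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 background genes, a 100-gene pathway,
# and a 40-gene input list with 5 genes falling in the pathway.
p_value = hypergeom_upper_tail(20000, 100, 40, 5)
print(p_value < 0.01)  # compared against the P < 0.01 cutoff used in the text
```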

Results and Discussion
3.1. ARGs. Nearly 40 AIDS restriction genes (ARGs) have been considered as host genes that impact the progression of HIV infection from virus entry to the development of AIDS (Table S1). For example, PPIA, TSG101, TRIM5α, APOBEC3G, and CUL5 encode HIV-1 postentry cellular viral cofactors that have been described in recent research. PPIA plays a role in cyclosporin A-mediated immunosuppression as a member of the peptidyl-prolyl cis-trans isomerase (PPIase) family [11], which can interact with HIV viral proteins. Cell growth and differentiation are regulated by the interaction of TSG101 with stathmin [12]. TRIM5α is an E3 ubiquitin ligase that is involved in retroviral restriction [13]. Other ARGs are involved in many cellular processes, such as DEFB1, which has been implicated in cystic fibrosis pathogenesis [14], HLA-A, which is expressed in nearly all cells [15], CCL, which has been implicated in immunoregulatory and inflammatory processes [16], CXCR6, which is a chemokine (C-X-C motif) receptor [17], LY6D, which is a member of the lymphocyte antigen 6 complex [18], and APOBEC3B, which is a cytidine deaminase. However, these ARGs have been characterized only in the absence of AIDS-related complications. The relationship between ARGs and other human diseases is unknown.
3.2. The General CCA Results. The application of CCA to a transcriptome can identify coexpression between genes. Whether coexpression involves individual genes or groups of genes can be judged from the standard deviations of the genes on the canonical variables. Hence, we used two strategies to determine two types of correlation. First, the top canonical correlations (r > 0.95) with low standard deviations were isolated as coexpression between individual genes. In this case, coexpression is expected to have less of an impact on whole disease pathways, and the relationship between complicating diseases and AIDS depends on the roles of a few genes. Second, canonical correlations with high standard deviations (s > 0.2) and r values (>0.5) were selected as coexpression between gene groups. In this case, coexpression affects the whole disease pathway because more than 20% of the genes contribute to the canonical correlation, and most genes in the disease pathway are involved in the cross talk between the complicating disease and AIDS.
As shown in Table 2, 21 top canonical correlations (r > 0.95) were determined between the ARGs and human disease pathway gene transcriptomes using the CCA approach. The canonical variables originated from disease pathways including HTLV-I infection, herpes simplex infection, Epstein-Barr virus infection, viral carcinogenesis, viral myocarditis, type I diabetes mellitus, graft-versus-host disease, autoimmune thyroid disease, allograft rejection, pathways in cancer, influenza A, proteoglycans in cancer, tuberculosis, transcriptional misregulation in cancer, Huntington's disease, toxoplasmosis, hepatitis B, measles, microRNAs in cancer, hepatitis C, and Alzheimer's disease. However, these canonical variables could not represent total ARG expression and human disease pathway gene expression because the standard deviations of the ARGs (Sa) and the human disease pathway genes (Sb) were less than 0.05.
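The two selection strategies can be sketched as a simple filter over per-pathway CCA summaries. This is a minimal illustration, assuming each result is stored as a record holding the correlation r and the standard deviations Sa and Sb; the `CcaResult` type, the function names, and the example values are hypothetical. The sketch also treats "low standard deviations" as Sa and Sb below 0.05, following the values reported for the top correlations, which is an assumption rather than a cutoff stated by the authors.

```python
from dataclasses import dataclass


@dataclass
class CcaResult:
    pathway: str
    r: float   # canonical correlation
    sa: float  # standard deviation on the ARG side
    sb: float  # standard deviation on the disease pathway side


def gene_level(results, r_min=0.95, sd_max=0.05):
    """Strategy 1: very high r with low Sa and Sb (a few genes drive the correlation)."""
    return [x.pathway for x in results
            if x.r > r_min and x.sa < sd_max and x.sb < sd_max]


def pathway_level(results, r_min=0.5, sd_min=0.2):
    """Strategy 2: moderate r with high Sa and Sb (many genes contribute)."""
    return [x.pathway for x in results
            if x.r > r_min and x.sa > sd_min and x.sb > sd_min]


# Hypothetical summaries, not values from Table 2 or Table 4.
results = [
    CcaResult("pathway A", 0.98, 0.01, 0.04),
    CcaResult("pathway B", 0.66, 0.25, 0.30),
    CcaResult("pathway C", 0.40, 0.30, 0.30),
]
print(gene_level(results))     # ['pathway A']
print(pathway_level(results))  # ['pathway B']
```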
Sa and Sb indicate that the canonical variables can explain less than 5% of the genes among the ARGs and human disease pathway genes. For example, the HTLV-I infection pathway has a correlation of 0.988 with the ARGs, and the Wilks and Chisq values give the test statistics for this correlation. However, it has Sa < 0.01 and Sb < 0.05, which suggests that only a few genes in the whole HTLV-I infection pathway were correlated with ARGs. In Table 3, the genes with the highest correlation with the canonical variables from the ARGs and human disease pathway genes are collected to show the coexpression relationships.
(Table 3) [19]. HLA polymorphisms are significantly correlated with the time to AIDS in HIV-infected individuals [20]. HLA-A and HLA-B are major AIDS restriction genes.

Coexpressed ARGs and Human Disease Genes.
HLA-A and HLA-B were coexpressed with 3135 (HLA-G; major histocompatibility complex, class I, G) and 3134 (HLA-F; major histocompatibility complex, class I, F), genes shared by many disease pathways, including HTLV-I infection, herpes simplex infection, Epstein-Barr virus infection, viral carcinogenesis, viral myocarditis, type I diabetes mellitus, graft-versus-host disease, autoimmune thyroid disease, and allograft rejection (Table 3). As a marker of T cell activation, HLA-DR induction was associated with HTLV-I seropositivity [21]. HTLV-I infection leads to the induction of HLA [22]. Herpes simplex virus type 1 (HSV1) can upregulate HLA-G expression in human neurons after acute neuron infection [23], whereas HLA-G is the MHC class I molecule that is induced in B cells after Epstein-Barr virus transformation [24]. During viral carcinogenesis, HLA is abnormally expressed to enable cancer cells to escape from immune surveillance [25]. HLA-G polymorphisms and expression were suggested as diagnostic markers due to their involvement in breast carcinogenesis [26]. The increased occurrence of HLA antigens was shown to be associated with viral myocarditis [27]. The HLA complex has been reported to contribute to type 1 diabetes because HLA polymorphisms introduce genetic susceptibility to type 1 diabetes [28,29]. There is evidence that HLA gene polymorphisms are potent risk factors for severe acute graft-versus-host disease [30]. It was also shown that an HLA variant conferred a high risk for autoimmune thyroid disease [31]. Furthermore, allelic-induced abnormal expression of the HLA-G gene has been suggested to be associated with acute allograft rejection [32]. Generally, coexpression of the HLA genes indicates cross talk between HIV infection and other diseases, and this cross talk provides a potent mechanism by which these diseases become AIDS complicating diseases.
VHL is a well-known tumor suppressor that is involved in hereditary cancer syndromes [36]. Inhibition of STAT1 can block cancer cell proliferation and invasion [37]. Overexpression of PTPN11 can enhance the progression of many proteoglycan cancers, such as liver cancer [38]. Inhibition of PIK3CD signaling can abrogate transitions in cancer cells [39]. Downregulation of PML, which is involved in the cell cycle, survival, and apoptosis, was identified in cancer cells [40]. ETV7 plays important roles in chromosomal translocations in human cancer [41]. Induction of SOS2 was demonstrated in cancer cells, such as triple negative breast cancer [42]. Toxoplasma gondii can downregulate CIITA to inhibit MHC class II expression [43]. The PCNA promoter can recruit the hepatitis B virus preS protein to become active [44]. Additionally, the Fas ligand gene can be induced by the hepatitis B virus X protein [45]. Thus, the coexpression of MYH9, ZNRD1, and other disease pathway genes suggests probable AIDS complicating diseases.

IRF1 and ZNRD1.
IRF1 is an interferon regulatory factor that can affect productive HIV infection and support natural resistance against HIV infection [46]. Hence, IRF1 is a well-studied human AIDS restriction gene.
As shown in Table 3, IRF1 and ZNRD1 are coexpressed with 356 (FASLG; Fas ligand) and 5609 (MAP2K7; mitogen-activated protein kinase kinase 7) in the influenza A pathways. FASLG was described above as a hepatitis B pathway gene and also plays an important role in the influenza A virus pathway because influenza virus infection induces Fas ligand expression when the infected cells contact one another [47].
3.3.4. TLR8 and ZNRD1. TLR8 is a human toll-like receptor that plays a role in signaling pathways that modulate the innate immune response to HIV infection and reduce HIV replication [48]. Genetic polymorphisms in TLR8 have been determined to be host cell factors associated with HIV status [49].
As shown in Table 3, TLR8 and ZNRD1 are coexpressed with 6772 (STAT1; signal transducer and activator of transcription 1) and 3587 (IL10RA; interleukin 10 receptor subunit alpha) in the tuberculosis pathways. STAT1 plays a role in a signaling pathway to control intracellular killing of phagocytosed Mycobacterium tuberculosis [50]. IL-10 is a key factor that mediates the immunopathogenesis of tuberculosis in combination with interferon gamma and adiponectin [51].
As shown in Table 3, NCOR2 and ZNRD1 are coexpressed with 356 (FASLG; Fas ligand) and 6772 (STAT1; signal transducer and activator of transcription 1) in the measles pathways. Fas ligand and STAT1 were previously described in other disease pathways. Moreover, the measles virus phosphoprotein can prevent STAT1 phosphorylation [55].
As shown in Table 3, NCOR2 and ZNRD1 are coexpressed with 4035 (LRP1; LDL receptor-related protein 1) and 9377 (COX5A; cytochrome c oxidase subunit 5A) in the Alzheimer's disease pathways. Recent studies showed that the LRP1 levels were reduced in Alzheimer's disease [56]. COX family genes can impair anxiety-like behavior in an Alzheimer's disease model [57].
3.3.7. GML and ZNRD1. GML is a member of the LY6 family of glycosylphosphatidylinositol- (GPI-) anchored proteins, which have conserved cysteine-rich domains with specific disulfide bonding patterns. LY6H was upregulated by HIV infection and suggested to play a role in innate immunity to HIV-1 via an interferon-like mechanism. Gene polymorphism analysis results support the hypothesis that GML is a susceptibility locus for HIV-1 infection [58].
GML and ZNRD1 are coexpressed with 6772 (STAT1; signal transducer and activator of transcription 1) and 6655 (SOS2; SOS Ras/Rho guanine nucleotide exchange factor 2) in the hepatitis C pathways. STAT1 plays an important role in hepatitis C infection in addition to cancer, tuberculosis, and measles. For example, hepatic STAT1 undergoes nuclear translocation during hepatitis C virus infection [59]. SOS2 has major functions in hepatitis C disease and cancer.
As shown in Table 4, canonical correlations with large Sa and Sb (>0.2) and r values (>0.5) were selected to analyze coexpression between integral disease pathways and global ARGs. Unlike the top canonical correlations (r > 0.95) shown in Table 2, these canonical variables showed weaker correlation between each disease pathway and the ARGs (Table 4). The maximum r, 0.66, was detected between the asthma pathway and the ARGs. However, these canonical variables had larger standard deviations (at least 0.2) than the top canonical correlations described previously. The largest Sa, 0.37, was found for the ARGs correlated with the acute myeloid leukemia pathway, and the largest Sb, 0.47, for the bacterial invasion of epithelial cells pathway. Large standard deviations indicate that more ARGs or disease pathway genes are involved in the canonical variable. Additionally, these canonical correlations show a certain level of Pearson correlation, which determines coexpression. This result suggests that most of the genes in disease pathways are coexpressed with most of the ARGs at the whole pathway level. For example, the renal cell carcinoma pathway showed coexpression with ARGs with r = 0.5, Sa = 0.26, and Sb = 0.21, which indicates that 26% of the ARGs were coexpressed with 21% of the renal cell carcinoma pathway genes. Rather than finding a single gene correlated with ARGs, CCA can identify a group of genes in certain pathways that is correlated with ARGs. Such group-level coexpression is potentially more significant than coexpression of a single gene.
These pathways include renal cell carcinoma, endometrial cancer, bladder cancer, acute myeloid leukemia, non-small-cell lung cancer, asthma, autoimmune thyroid disease, allograft rejection, graft-versus-host disease, primary immunodeficiency, nicotine addiction, type I diabetes mellitus, bacterial invasion of epithelial cells, pathogenic Escherichia coli infection, and malaria. The coexpression identified by CCA is best taken as describing a general relationship, since the transcriptome datasets include not only HIV-related experiments but also diverse biological experiments. Because the samples in the datasets are not restricted, diseases in other human organs could also be characterized as pathways coexpressed with AIDS. This provides an opportunity to design precise experiments to determine this relationship.

Coexpressed ARGs and Human Disease Pathways.
Among these diseases, asthma is prevalent in populations with HIV infections: in some clinical investigations, 20% of individuals infected with HIV have asthma [60]. There is evidence to support the hypothesis that asthma is one of the major causes of morbidity and mortality in HIV patients during antiretroviral therapy [61]. Recently, some studies have indicated that endometrial cancer has a favorable risk post-HIV infection [62]. One study found that some HIV drugs could inhibit in vitro bladder cancer migration and invasion [63]. In some clinical cases, HIV MDS transformed to acute myeloid leukemia [64]. A famous HIV-positive patient was cured of acute myeloid leukemia by allogeneic hematopoietic cell transplantation from a graft that carried the HIV-resistant CCR5 mutation [65]. The association between autoimmune thyroid disease and HIV infection has been suggested by several studies [66]. Although the HIV-related immunodeficiency state does not cause allograft rejection, anti-CD4 antibodies have considerable functions in the treatment of allograft rejection and the blockade of HIV infection [67,68]. Recently, studies found that patients with HIV HAART-associated lipodystrophy syndrome had an increased risk of diabetes [69]. The impaired immune response to pneumococcal antigen pneumolysin due to HIV infection facilitates bacterial invasion [70]. Finally, HIV and AIDS are well-known subject diseases followed by malaria [71].

Conclusion
In this study, the correlation between the expression of ARGs and disease pathway genes was analyzed by CCA. The results showed that among the ARGs, HLA-B, HLA-A, MYH9, ZNRD1, IRF1, TLR8, TSG101, NCOR2, and GML are the genes most significantly correlated with other disease pathway genes. They are potential cross-links between AIDS and other diseases. Furthermore, gene pathways involved in several diseases such as asthma, autoimmune thyroid disease, and malaria were identified as integrated pathways correlated with integrated ARGs. This suggests that these diseases are at risk of occurring as AIDS complicating diseases.

Conflicts of Interest
The authors declare no financial interest related to this work.

Authors' Contributions
Yahong Chen, Jinjin Yuan, and Xianlin Han contributed equally to this work.