Inequalities and Duality in Gene Coexpression Networks of HIV-1 Infection Revealed by the Combination of the Double-Connectivity Approach and the Gini's Method

Symbiosis (Sym) and pathogenesis (Pat) constitute a duality problem in microbial infections, including HIV/AIDS. Statistical analysis of inequalities and duality in gene coexpression networks (GCNs) of HIV-1 infection may provide novel insights into AIDS. In this study, we analyzed GCNs of uninfected subjects and of HIV-1-infected patients at three different stages of viral infection, based on data deposited in the GEO database of NCBI. The inequalities and duality in these GCNs were analyzed by combining the double-connectivity (DC) approach and Gini's method. DC analysis reveals significant differences between positive and negative connectivity in HIV-1 stage-specific GCNs. From the HIV-1-uninfected to the AIDS stage, the inequality measures of negative connectivity and edge weight in GCNs change more significantly than those of positive connectivity and edge weight. Using a permutation test, we identified a set of genes with significant changes in the inequality and duality measures of edge weight. Functional analysis shows that these genes are highly enriched for immune system genes, and the immune system plays an essential role in the Sym-Pat duality (SPD) of microbial infections. Understanding the SPD of HIV-1 infection may provide novel intervention strategies for AIDS.


Introduction
Gene coexpression networks (GCNs), which provide a system-level understanding of gene function, have been used to study the pathogenesis of various diseases, including Alzheimer's disease [1,2], cardiac hypertrophy and failure [3], obesity [4], and schizophrenia [5]. In a GCN, each node represents a gene, and each edge links two coexpressed genes. The edge weight is usually determined from the similarity of gene expression profiles using the Pearson correlation coefficient (PCC) [6][7][8]. The connectivity of a gene is typically defined as the number of its edges and can be decomposed into two components, positive and negative connectivity, according to the algebraic sign of the PCC value [6]. Statistical analysis of the inequality and duality properties of GCNs can be highly valuable for discovering novel biological insights [8][9][10][11][12].
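As a minimal illustration of the double-connectivity decomposition described above, the Python sketch below splits each gene's connectivity into positive and negative edge counts from a signed weight matrix. The matrix layout, the toy weights, and the convention that a weight of 0.0 means "no edge" are assumptions of this sketch, not details taken from [6].

```python
def double_connectivity(weights):
    """Split each gene's connectivity into positive and negative edge counts.

    `weights` is a symmetric matrix (list of lists) of signed edge weights,
    such as PCC values; 0.0 is taken to mean "no edge" (an assumption of this
    sketch). The diagonal (self-correlation) is ignored.
    """
    n = len(weights)
    positive = [0] * n
    negative = [0] * n
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # skip the gene's correlation with itself
            w = weights[i][j]
            if w > 0:
                positive[i] += 1
            elif w < 0:
                negative[i] += 1
    return positive, negative


# Toy 4-gene network with hypothetical weights.
toy = [
    [1.0, 0.8, -0.6, 0.0],
    [0.8, 1.0, 0.5, -0.7],
    [-0.6, 0.5, 1.0, 0.0],
    [0.0, -0.7, 0.0, 1.0],
]
pos, neg = double_connectivity(toy)
total = [p + q for p, q in zip(pos, neg)]  # overall connectivity = positive + negative
print(pos, neg, total)  # [1, 2, 1, 0] [1, 1, 1, 1] [2, 3, 2, 1]
```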
Microbial infections, including HIV-1/AIDS, always involve symbiosis (Sym) and pathogenesis (Pat), which are two sides of the same story [13]. Inequalities or imbalance in the Sym-Pat duality may be key problems in microbial infection. HIV-1/AIDS is a very complex disease affecting millions of individuals throughout the world. Although substantial progress has been made in fighting this disease since 1981, the mechanisms of HIV-1 infection are still not fully understood [14][15][16]. In this study, we propose to define the inequalities and duality in gene expression patterns of HIV-1 infection using Gini's method. Based on a previously published microarray dataset [17], we analyzed GCNs of uninfected subjects and of HIV-1-infected patients at the acute, asymptomatic, and AIDS stages. With Gini's method, we quantified the inequalities of connectivity and edge weight in these HIV-1 stage-specific GCNs. The results show that there are significant differences between positive and negative coexpression links in these GCNs. With the proposed permutation test, we further considered the changes in the Gini coefficients of positive and negative edge weight (denoted ΔG⁺ and ΔG⁻, respectively) between two different GCNs. We finally identified a set of genes with significant ΔG⁺ or ΔG⁻ among the GCNs of uninfected subjects and HIV-1 patients at three different stages (i.e., the acute, asymptomatic, and AIDS stages). These genes might be highly involved in the pathogenesis of HIV-1 disease. More importantly, several patterns of duality in the inequalities of GCNs are also revealed with Gini's method. Some of these duality patterns might be related to the Sym-Pat duality (SPD) in HIV-1 infection [13].

Journal of Biomedicine and Biotechnology

Microarray Dataset.
The microarray dataset used in this study is a published HIV-1 microarray dataset (Gene Expression Omnibus, GEO GSE16363) containing Affymetrix gene expression profiles of human lymphatic tissues from uninfected (unin) subjects and from infected patients at different stages of HIV-1 infection (the acute (acut), asymptomatic (asym), and AIDS stages). In total, this dataset consists of 52 samples measuring 54630 probe sets. Details about this dataset are available in the original paper [17]. Differences in gene expression between settings were analyzed with the two-sample t-test and fold-change methods. Using the criteria P-value ≤ 0.05 and fold change ≥ 1.7, Li and colleagues had already selected 962 probes with significantly different expression [17]. These probes were further grouped into several functional categories using annotation information from the NetAffx Analysis Center (http://www.affymetrix.com/analysis/index.affx), Ingenuity Pathways Analysis (Ingenuity Systems, http://www.ingenuity.com/), and literature examination. To avoid underestimating inequalities in GCNs, we further removed probes that were not annotated with Entrez gene identifiers or that were mapped to multiple Entrez gene identifiers. We finally obtained 908 probes (704 genes) for the construction of GCNs.
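The two filtering steps above (the t-test and fold-change criteria applied by Li and colleagues, and the removal of probes without exactly one Entrez gene identifier) can be sketched in Python as follows. This is not the original pipeline: the record layout and the toy probes are hypothetical, P-values are assumed to be precomputed (for example with `scipy.stats.ttest_ind`), and the fold change is assumed to be derived from group means of log2-transformed values.

```python
import statistics


def fold_change_from_log2(group1, group2):
    """Unsigned linear fold change between two groups of log2 expression values.

    Because the values are log2-transformed, the difference of group means is
    a log2 ratio; 2 ** |difference| converts it back to a fold change >= 1.
    """
    diff = statistics.mean(group1) - statistics.mean(group2)
    return 2 ** abs(diff)


def select_probes(probes, p_cutoff=0.05, fc_cutoff=1.7):
    """Return {probe_id: entrez_id} for probes passing all filters.

    `probes` maps probe_id -> dict with keys "p_value" (precomputed t-test
    P-value), "group1"/"group2" (log2 expression lists) and "entrez" (list of
    Entrez gene IDs). This record layout is hypothetical.
    """
    selected = {}
    for probe_id, rec in probes.items():
        if rec["p_value"] > p_cutoff:
            continue  # fails the t-test criterion
        if fold_change_from_log2(rec["group1"], rec["group2"]) < fc_cutoff:
            continue  # change too small
        if len(rec["entrez"]) != 1:
            continue  # unannotated or multi-mapped probe
        selected[probe_id] = rec["entrez"][0]
    return selected


probes = {
    "p1": {"p_value": 0.01, "group1": [5.0, 5.2], "group2": [4.0, 4.2], "entrez": ["1234"]},
    "p2": {"p_value": 0.01, "group1": [5.0, 5.0], "group2": [4.8, 4.8], "entrez": ["5678"]},
    "p3": {"p_value": 0.20, "group1": [6.0, 6.2], "group2": [4.0, 4.2], "entrez": ["9999"]},
    "p4": {"p_value": 0.01, "group1": [6.0, 6.2], "group2": [4.0, 4.2], "entrez": []},
    "p5": {"p_value": 0.03, "group1": [3.0, 3.1], "group2": [4.0, 4.1], "entrez": ["1234"]},
}
kept = select_probes(probes)
print(kept, "genes:", len(set(kept.values())))
```

As in the text, several retained probes can map to the same gene (here p1 and p5), which is why the probe count exceeds the gene count.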

GCN Construction.
With the log2-transformed gene expression values of these 704 genes, we constructed four GCNs (denoted N_unin, N_acut, N_asym, and N_AIDS) for the uninfected subjects and for the HIV-1-infected patients at three different stages (acute, asymptomatic, and AIDS), respectively. The PCC method is used to compute the similarity of expression profiles between each pair of genes. Taking the gene expression of the uninfected subjects as an example, the PCC value between genes A and B is computed as

$$\mathrm{PCC}(A,B)=\frac{\sum_{i=1}^{n}(a_i-a_m)(b_i-b_m)}{\sqrt{\sum_{i=1}^{n}(a_i-a_m)^2}\,\sqrt{\sum_{i=1}^{n}(b_i-b_m)^2}}, \tag{1}$$

where a_i and b_i are the log2-transformed expression values of genes A and B in the ith subject, and a_m and b_m are the means of the log2-transformed expression values of genes A and B, respectively. The significance level of the PCC value is estimated from the statistic

$$t=\mathrm{PCC}\sqrt{\frac{n-2}{1-\mathrm{PCC}^2}}$$

under Student's t-distribution with df = n − 2, where n is the number of samples. The PCC value is assigned as the edge weight between the two genes.
For genes with multiple probes, only the highest absolute value of PCC is chosen for the edge weight.
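A minimal Python sketch of the edge-weight computation follows: the PCC of two profiles, the t statistic used to judge its significance, and the rule of keeping the largest-magnitude PCC when a gene is measured by several probes. Keeping the sign of that PCC and the toy profiles are assumptions for illustration. A two-sided P-value could then be obtained from the t statistic with, for example, `2 * scipy.stats.t.sf(abs(t), n - 2)`.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(a)
    a_m = sum(a) / n
    b_m = sum(b) / n
    cov = sum((x - a_m) * (y - b_m) for x, y in zip(a, b))
    ss_a = sum((x - a_m) ** 2 for x in a)
    ss_b = sum((y - b_m) ** 2 for y in b)
    return cov / math.sqrt(ss_a * ss_b)


def pcc_t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), compared against Student's t with n - 2 df."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)  # perfect correlation: t is unbounded
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def gene_edge_weight(probes_a, probes_b):
    """Edge weight between two genes that are each measured by one or more probes.

    Keeps the probe-pair PCC with the largest absolute value; retaining its
    sign is an assumption of this sketch.
    """
    best = 0.0
    for pa in probes_a:
        for pb in probes_b:
            r = pearson(pa, pb)
            if abs(r) > abs(best):
                best = r
    return best


# Hypothetical log2 profiles over 4 samples: gene A has one probe, gene B two.
gene_a = [[1.0, 2.0, 3.0, 4.0]]
gene_b = [[1.0, 2.0, 3.0, 5.0], [4.0, 3.0, 2.0, 1.0]]
w = gene_edge_weight(gene_a, gene_b)
print(w, pcc_t_statistic(0.5, 10))
```

Here the second probe of gene B is perfectly anticorrelated with gene A, so its PCC (−1.0) outweighs the first probe's PCC of about 0.98.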

Inequality Measurements.
The inequalities in GCNs are measured with Gini's method, which is commonly used in economics and the social sciences [18][19][20]. A basic measure of Gini's method is the Gini coefficient (also known as the Gini index), a well-established measure of the inequality of a variable in a population. For a given variable X, the Gini coefficient can be computed with the formula [21]

$$G=\frac{\sum_{i=1}^{n}(2i-n-1)\,X_{(i)}}{(n-1)\sum_{i=1}^{n}X_{(i)}}, \tag{2}$$

where n (n ≥ 2) is the number of values of the considered variable in the population, and X_(i) is the ith value of the variable sorted in increasing order, 0 ≤ X_(1) ≤ X_(2) ≤ ⋯ ≤ X_(n). The Gini coefficient ranges from 0.0 (complete equality) to 1.0 (complete inequality). If n is one, the Gini coefficient is set to 0.0. In this study, the Gini coefficient was used to measure several kinds of inequality in GCNs, such as the positive and negative edge weight inequality of each gene and the positive and negative connectivity inequality of the whole GCN. For the positive edge weight inequality, X is the positive edge weight of the analyzed gene in the GCN, whereas for the negative edge weight inequality, X is the absolute value of the negative edge weight. For the positive (or negative) connectivity inequality, X is the positive (or negative) connectivity of the analyzed genes. As mentioned above, the connectivity in a GCN has two components: the positive connectivity and the negative connectivity. The contributions of the positive and negative connectivity to the overall inequality of connectivity in a GCN can be quantified with the Gini correlation [22]. Let (P_i, N_i) denote the positive and negative connectivity of the ith gene in a given GCN. The Gini correlation of the positive connectivity (R_P) can be calculated with the following formula [23]:

$$R_P=\frac{\sum_{i=1}^{k}(2i-k-1)\,P_{[i]}}{\sum_{i=1}^{k}(2i-k-1)\,P_{(i)}}, \tag{3}$$

where k is the number of analyzed genes, and P_(i) and P_[i] are obtained in two different ways.
For P_(i), the positive connectivities of the analyzed genes are first sorted in ascending order, and P_(i) denotes the ith positive connectivity in this order. For P_[i], the total connectivities (P_i + N_i) of the analyzed genes are first sorted in ascending order, and P_[i] denotes the concomitant positive connectivity, that is, the positive connectivity of the gene with the ith smallest total connectivity. The Gini correlation ranges from −1.0 to 1.0. If the Gini correlation is greater than zero, the positive connectivity increases the overall inequality of connectivity in the GCN; if it is less than zero, the positive connectivity decreases that inequality. The Gini correlation of the negative connectivity (R_N) can be calculated similarly with formula (3).

For each gene, the changes in the Gini coefficients of its positive and negative edge weight between two GCNs (N_1 and N_2) are denoted ΔG⁺ and ΔG⁻, respectively.
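The Gini coefficient and the Gini correlation defined above can be sketched in Python as follows. The sketch reads P_[i] as the positive connectivity of the gene with the ith smallest total connectivity P_i + N_i; tie handling and the 0.0 returned for degenerate inputs are choices of this sketch, and all connectivity values are hypothetical. The last lines check the standard Lerman-Yitzhaki decomposition (overall Gini = sum over components of share × Gini correlation × Gini), which is why the Gini correlation can be read as a component's contribution to the overall inequality.

```python
def gini(values):
    """Gini coefficient: sum((2i - n - 1) * x_(i)) / ((n - 1) * sum(x)).

    Values must be non-negative (use |w| for negative edge weights). Returns
    0.0 for fewer than two values, as in the text, and also for an all-zero
    input (a choice of this sketch).
    """
    x = sorted(values)
    n = len(x)
    total = sum(x)
    if n < 2 or total == 0:
        return 0.0
    num = sum((2 * i - n - 1) * xi for i, xi in enumerate(x, start=1))
    return num / ((n - 1) * total)


def gini_correlation(component, other):
    """Gini correlation of one connectivity component (R_P; swap args for R_N).

    component[g], other[g]: positive and negative connectivity of gene g.
    Genes are ranked by total connectivity; ties keep their input order
    (an assumption of this sketch).
    """
    k = len(component)
    total = [c + o for c, o in zip(component, other)]
    p_sorted = sorted(component)                          # P_(i)
    order = sorted(range(k), key=lambda g: total[g])      # genes by total connectivity
    p_concomitant = [component[g] for g in order]         # P_[i]
    weights = [2 * i - k - 1 for i in range(1, k + 1)]
    denom = sum(w * p for w, p in zip(weights, p_sorted))
    if denom == 0:
        return 0.0  # component is perfectly equal; correlation undefined
    return sum(w * p for w, p in zip(weights, p_concomitant)) / denom


pos = [1, 2, 3, 4]  # hypothetical positive connectivity
neg = [6, 1, 0, 2]  # hypothetical negative connectivity
print(gini(pos), gini(neg))
print(gini_correlation(pos, neg), gini_correlation(neg, pos))

# Lerman-Yitzhaki decomposition of the overall Gini of total connectivity.
tot = [p + q for p, q in zip(pos, neg)]
share_p = sum(pos) / sum(tot)
share_n = sum(neg) / sum(tot)
decomposed = (share_p * gini_correlation(pos, neg) * gini(pos)
              + share_n * gini_correlation(neg, pos) * gini(neg))
print(gini(tot), decomposed)  # the two values agree up to rounding
```

In this toy example R_P is negative (−0.2), so positive connectivity reduces the overall inequality, while R_N is strongly positive.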

Estimation of Significance Levels of ΔG⁺ and ΔG⁻.
For a given gene i in two GCNs (N_1 and N_2), ΔG⁺ and ΔG⁻ are computed, respectively, with the following formulas:

$$\Delta G^{+}(N_2/N_1,i)=G^{+}(N_2,i)-G^{+}(N_1,i),\qquad \Delta G^{-}(N_2/N_1,i)=G^{-}(N_2,i)-G^{-}(N_1,i), \tag{4}$$

where G⁺(N_j, i) and G⁻(N_j, i) are the Gini coefficients of the positive and negative edge weight of gene i in N_j (j = 1 or 2), respectively. In this study, N_1 is N_unin, and N_2 is N_acut, N_asym, or N_AIDS. Genes with significant ΔG⁺ or ΔG⁻ might play important roles in the pathogenesis of HIV-1 infection. We used a formal permutation test to determine the statistical significance of ΔG⁺ and ΔG⁻. Taking N_unin and N_acut as an example, we first generated 2000 randomized GCNs each for the uninfected (unin) subjects and for the HIV-1-infected patients at the acute stage (acut). The expression values of genes in a randomized N_unin (or N_acut) were randomly selected from all the gene expression values of the uninfected subjects (or of the patients at the acute stage) on the chip. We then obtained 2 001 000 (2000 × (2000 + 1)/2) permuted values each of ΔG⁺(N_acut/N_unin) and ΔG⁻(N_acut/N_unin). We considered ΔG⁺ (or ΔG⁻) significantly changed if the observed value fell in the upper or lower 0.5% tail of the permutation distribution, that is, if it was larger than 99.5% or smaller than 99.5% of the permuted values (two-sided P-value < 0.01). The significance levels of ΔG⁺ and ΔG⁻ for genes changing from N_unin to N_asym (or N_AIDS) were estimated in the same way.

Statistical analysis of the positive and negative connectivity in these GCNs would be helpful for further understanding the pathogenic mechanisms of HIV-1 infection.
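The permutation test described in the significance-estimation section above can be sketched in Python at toy scale. Several details are assumptions rather than facts from the text: every other gene is treated as a neighbour (no PCC significance cut-off), random values are drawn with replacement from the values of the input matrix rather than from the whole chip, and randomized networks are paired as (j, k) with j ≤ k, which reproduces the reported count of 2000 × 2001/2 pairs. The expression data are simulated, so the significance flags printed at the end carry no biological meaning.

```python
import math
import random
from itertools import combinations_with_replacement


def gini(values):
    """Gini coefficient with the (n - 1) normalization; 0.0 for n < 2."""
    x = sorted(values)
    n, total = len(x), sum(x)
    if n < 2 or total == 0:
        return 0.0
    return sum((2 * i - n - 1) * v for i, v in enumerate(x, 1)) / ((n - 1) * total)


def pearson(a, b):
    """PCC; returns 0.0 for a constant profile (possible in random data)."""
    n = len(a)
    a_m, b_m = sum(a) / n, sum(b) / n
    cov = sum((x - a_m) * (y - b_m) for x, y in zip(a, b))
    ss = sum((x - a_m) ** 2 for x in a) * sum((y - b_m) ** 2 for y in b)
    return cov / math.sqrt(ss) if ss > 0 else 0.0


def edge_weight_ginis(expr, gene):
    """(G+, G-) of one gene: Gini of its positive PCCs and of |negative PCCs|."""
    pos, neg = [], []
    for other, profile in enumerate(expr):
        if other != gene:
            r = pearson(expr[gene], profile)
            if r > 0:
                pos.append(r)
            elif r < 0:
                neg.append(-r)
    return gini(pos), gini(neg)


def observed_delta_g(expr1, expr2, gene):
    """(dG+, dG-) = G(N2, i) - G(N1, i)."""
    g1, g2 = edge_weight_ginis(expr1, gene), edge_weight_ginis(expr2, gene)
    return g2[0] - g1[0], g2[1] - g1[1]


def random_expression(pool, n_genes, n_samples, rng):
    """Randomized expression matrix drawn with replacement from `pool`."""
    return [[rng.choice(pool) for _ in range(n_samples)] for _ in range(n_genes)]


def null_delta_g(expr1, expr2, gene, n_random, rng):
    """Null values of dG+ and dG- from n_random*(n_random+1)/2 randomized GCN pairs."""
    def ginis_for(expr):
        pool = [v for row in expr for v in row]
        return [edge_weight_ginis(random_expression(pool, len(expr), len(expr[0]), rng), gene)
                for _ in range(n_random)]

    rand1, rand2 = ginis_for(expr1), ginis_for(expr2)
    null_plus, null_minus = [], []
    for j, k in combinations_with_replacement(range(n_random), 2):
        null_plus.append(rand2[k][0] - rand1[j][0])
        null_minus.append(rand2[k][1] - rand1[j][1])
    return null_plus, null_minus


def is_significant(observed, null, alpha=0.01):
    """Two-sided test: observed is above, or below, more than (1 - alpha/2) of the null."""
    below = sum(v < observed for v in null) / len(null)
    above = sum(v > observed for v in null) / len(null)
    return below > 1 - alpha / 2 or above > 1 - alpha / 2


rng = random.Random(0)
n1 = [[rng.gauss(8, 1) for _ in range(6)] for _ in range(5)]  # toy "uninfected" data
n2 = [[rng.gauss(8, 1) for _ in range(6)] for _ in range(5)]  # toy "infected" data
obs_plus, obs_minus = observed_delta_g(n1, n2, gene=0)
null_plus, null_minus = null_delta_g(n1, n2, gene=0, n_random=20, rng=rng)
print(len(null_plus), is_significant(obs_plus, null_plus), is_significant(obs_minus, null_minus))
```

With 20 randomizations per group the sketch yields 210 null values; the study's 2000 randomizations give the 2 001 000 values reported above, at a much higher computational cost.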

Connectivity Inequality in GCNs of HIV-1 Infection.
In this study, we statistically analyzed the inequalities of the positive and negative connectivity with the Gini coefficient. As shown in Figure 2(a), there are also remarkable differences between the Gini coefficients of positive and negative connectivity in N_acut and N_asym. Furthermore, the dynamic changes in negative connectivity inequality during HIV-1 infection differ from those in positive connectivity inequality. From N_unin to N_AIDS, the Gini coefficient of negative connectivity first increased from 0.39 to 0.70 and then decreased to 0.33, whereas the Gini coefficient of positive connectivity first decreased from 0.34 to 0.19 and then increased to 0.31 (Figure 2(a)). Differences in the dynamic changes in the Gini coefficients of positive and negative connectivity are also observed for genes with different functions (Figure S1; supplementary material is available online at doi:10.1155/2011/926407).
With the Gini correlation measure, we further quantified the contribution of positive and negative connectivity to the overall inequality of connectivity in GCNs (Figure 2(b)).

Edge Weight Inequality in GCNs of HIV-1 Infection.
The differences between positive and negative coexpression links are also revealed by analyzing the edge weight inequality with Gini's method (Figure 3). With the proposed permutation test, we further identified a set of genes with significant ΔG⁺ or ΔG⁻ between the GCNs of uninfected subjects and of infected patients at different stages (Figure 4). The number of genes with significant ΔG⁻ is larger than the number of genes with significant ΔG⁺, which further indicates differences in inequality between positive and negative coexpression links.
Examining the functional annotations of the genes with significant ΔG⁺ or ΔG⁻, we find that they are enriched for immune genes (Figure 5). Among these immune genes, 5 immune activation genes (e.g., WDHD1, CDC45L, …) … [24,25], indicating that these selected genes might be highly involved in HIV-1 infection.

Discussion
This is the first report on the analysis of the inequalities of dual connectivity and edge weight in GCNs of HIV-1 infection using the PCC-based double-connectivity approach [6] and Gini's method [21,22]. We not only found differences at the system level between the uninfected subjects and patients at different stages of HIV-1 infection, but also identified a set of genes that might be highly involved in HIV-1 infection. These results also demonstrate the importance of the inequalities in GCNs for the analysis of HIV-1 disease. Most importantly, changes in duality patterns are revealed in this study (Figures 1-4), suggesting that inequalities or imbalance in the SPD may contribute to the pathogenesis of HIV-1/AIDS. The SPD, which extends along a dynamic continuum from antagonism to cooperation, is the most common fundamental feature of microbial infections [13]. When Sym is much more dominant than Pat, the relationship between the host and the microbial community is cooperative; when Pat is much more dominant than Sym, the relationship is antagonistic. Therefore, Sym and Pat are two sides of the same coin in microbial infections, reflecting the relationships between microorganisms and their hosts. About 1% of HIV-1-infected people worldwide (long-term nonprogressors) maintain high CD4+ and CD8+ T-cell counts without progressing to AIDS [26]. Natural infection of African nonhuman primates with simian immunodeficiency viruses (SIVs) also does not progress to AIDS [27]. The benign nature of HIV infection in long-term nonprogressors and of SIV infection in natural hosts suggests a well-tethered connection between Sym and Pat. The immune system plays an essential role in the modulation of the SPD [13], and it has a double-sided function. On the one hand, it protects the host against invasion by microbial pathogens.
On the other hand, an imbalanced immune system may cause tissue damage and disturbance of the microbiota. Gut microbial translocation and persistent immune activation, which lead to progressive depletion of Th17 and CD4+ T cells, are key factors driving HIV-1 disease progression [28,29]. We therefore further examined the correlation between the expression values of immune genes and the CD4+ T-cell count. The average PCC values of immune activation and immune defense genes at the HIV-1 infection stages are shown in Figure 6. For immune activation genes, the average PCC values in HIV-1-infected patients at different stages are higher than those in uninfected subjects and are greater than zero (Figures 6(a) and 6(c)). In contrast, for immune defense genes, the average PCC values at the HIV-1-infected stages are lower than zero, except for genes with significant ΔG⁺ at the AIDS stage (Figures 6(b) and 6(d)).

Figure 6: Average Pearson correlation between gene expression and CD4+ T-cell count. "HIV-1 infected stage" indicates immune activation (or defense) genes at the HIV-1-infected stages (i.e., the acute, the asymptomatic, and the AIDS stages). "Control" represents the corresponding immune genes in the HIV-1-uninfected subjects.

Most interestingly, the duality patterns of the changes in immune genes are opposite or significantly different between patients at the acute and AIDS stages. Immune activation genes with significant ΔG⁺ and ΔG⁻ show a highly positive correlation with CD4 cell counts at the acute and AIDS stages, respectively. However, immune defense genes with significant ΔG⁺ exhibit opposite correlations with CD4 cell counts at the acute (negative) and AIDS (positive) stages. These findings suggest that these immune activation and defense genes may play important roles in the pathogenesis of HIV-1/AIDS. Consistent with the present findings, some of these genes, including C1QBP (p32), CD28, CD44, APOBEC3F (A3F), and ISG15, are known to contribute to the pathogenesis of this disease [25][30][31][32][33]. Further studies of these genes may provide more insights into HIV/AIDS.
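The averaging behind Figure 6 can be sketched in Python as follows: one PCC per gene between its expression and the CD4+ T-cell counts of the same patients, then the mean over a gene set. Averaging raw PCC values (rather than, say, Fisher z-transformed values), the data layout, and the gene names and counts are all assumptions of this sketch.

```python
import math
import statistics


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(a)
    a_m, b_m = sum(a) / n, sum(b) / n
    cov = sum((x - a_m) * (y - b_m) for x, y in zip(a, b))
    ss = sum((x - a_m) ** 2 for x in a) * sum((y - b_m) ** 2 for y in b)
    return cov / math.sqrt(ss)


def mean_cd4_correlation(expr, genes, cd4):
    """Average PCC between each gene's expression and the CD4+ T-cell counts.

    expr: gene -> expression values across patients (hypothetical layout);
    cd4: CD4+ T-cell counts of the same patients, in the same order.
    """
    return statistics.mean([pearson(expr[g], cd4) for g in genes])


# Hypothetical genes and counts for four patients at one stage.
expr = {"g1": [1.0, 2.0, 3.0, 4.0], "g2": [4.0, 3.0, 2.0, 1.0], "g3": [2.0, 1.0, 4.0, 3.0]}
cd4 = [350.0, 420.0, 510.0, 640.0]
print(mean_cd4_correlation(expr, ["g1", "g3"], cd4))
```

Computing this average separately for each stage and for the uninfected controls gives the kind of stage-versus-control comparison summarized in Figure 6.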

Conclusion
This study provides a novel view of coexpression network characteristics in HIV-1 infection. The selected genes might be highly involved in the pathogenesis of HIV-1 infection. Our results also indicate that there might be a duality in HIV infection. Together, these results show the effectiveness of GCN analysis and Gini's method in investigating the mechanisms of HIV infection.