The Variation of Transcriptomic Perturbations is Associated with the Development and Progression of Various Diseases

Background. Although transcriptomic data have been widely applied to explore various diseases, few studies have investigated the association between transcriptomic perturbations and disease development across a wide variety of diseases.
Methods. Based on a previously developed algorithm for quantifying intratumor heterogeneity at the transcriptomic level, we defined the variation of transcriptomic perturbations (VTP) of a disease relative to the health status. Using publicly available transcriptome datasets, we compared VTP values between the disease and health status and analyzed correlations between VTP values and disease progression or severity in various diseases, including neurological disorders, infectious diseases, cardiovascular diseases, respiratory diseases, liver diseases, kidney diseases, digestive diseases, and endocrine diseases. We also identified the genes and pathways whose expression perturbations correlated positively with VTP across diverse diseases.
Results. VTP values were upregulated in various diseases relative to their normal controls. VTP values were significantly greater in definite than in possible or probable Alzheimer's disease. VTP values were significantly larger in intensive care unit (ICU) COVID-19 patients than in non-ICU patients, and in COVID-19 patients requiring mechanical ventilatory support (MVS) than in those not requiring MVS. VTP correlated positively with viral loads in acquired immune deficiency syndrome (AIDS) patients. Moreover, the AIDS patients treated with abacavir or zidovudine had lower VTP values than those without such therapies. In pulmonary tuberculosis (TB) patients, VTP values followed the pattern: active TB > latent TB > normal controls. VTP values were greater in clinically apparent than in presymptomatic malaria. VTP correlated negatively with the cardiac index of left ventricular ejection fraction (LVEF). In chronic obstructive pulmonary disease (COPD), VTP showed a negative correlation with forced expiratory volume in the first second (FEV1). VTP values increased with H. pylori infection and were upregulated in atrophic gastritis caused by H. pylori infection. The genes and pathways whose expression perturbations correlated positively with VTP scores across diseases were mainly involved in the regulation of immune, metabolic, and cellular activities.
Conclusions. VTP is upregulated in the disease versus health status, and its upregulation is associated with disease progression and severity in various diseases. Thus, VTP has potential clinical implications for disease diagnosis and prognosis.


Introduction
With the recent development of next-generation sequencing (NGS) technologies, a substantial amount of multiomics data has been produced for various diseases, including cancer, neurological disorders, cardiovascular disease, respiratory disease, digestive system disease, metabolic disease, endocrine disease, kidney and urinary system disorders, and infectious disease. In a previous study [1], we developed an algorithm, termed DEPTH, to quantify the variation of transcriptomic perturbations (VTP) in cancer, namely intratumor heterogeneity. We found that VTP values were significantly higher in cancer than in normal controls. Moreover, VTP values increased with cancer advancement, and this increase was associated with worse clinical outcomes in cancer patients [1]. In this study, we generalized this algorithm to a wide variety of diseases and explored the association between VTP and prognosis-associated clinical features. The disease types we analyzed included neurological disorders, infectious diseases, cardiovascular diseases, respiratory diseases, liver diseases, kidney diseases, digestive diseases, and endocrine diseases. We compared VTP values between the disease state and normal controls and analyzed correlations between VTP and disease progression or severity.

Methods
Algorithm.
The algorithm is described as follows. Given a transcriptome dataset involving g genes, m disease samples, and n normal control samples, the variation of transcriptomic perturbations (VTP) of a disease sample DS is defined from two quantities: ex(G_i, DS), the expression value of gene G_i in DS, and ex(G_i, NS_j), the expression value of G_i in the normal sample NS_j.
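The following is a minimal Python sketch of how a VTP-style score could be computed from ex(G_i, DS) and ex(G_i, NS_j). It rests on two assumptions that are not stated in the text here: that the perturbation of a gene is its squared deviation from the mean expression in normal samples, and that the score is the standard deviation of these perturbations across genes. The function name `vtp_score` is hypothetical.

```python
import math


def vtp_score(disease_sample, normal_samples):
    """Illustrative VTP-style score for one disease sample.

    disease_sample: list of g expression values, ex(G_i, DS).
    normal_samples: list of n lists, each with g values, ex(G_i, NS_j).
    ASSUMPTION: perturbation = squared deviation from the normal mean;
    score = sample standard deviation of perturbations over genes.
    """
    g = len(disease_sample)
    n = len(normal_samples)
    if g < 2 or n < 1:
        raise ValueError("need at least two genes and one normal sample")

    # Mean expression of each gene across the n normal samples.
    normal_mean = [sum(ns[i] for ns in normal_samples) / n for i in range(g)]

    # Assumed per-gene perturbation of the disease sample.
    delta = [(disease_sample[i] - normal_mean[i]) ** 2 for i in range(g)]

    # Spread of the perturbations across genes (divisor g - 1).
    mean_delta = sum(delta) / g
    return math.sqrt(sum((d - mean_delta) ** 2 for d in delta) / (g - 1))


normals = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
uniform = vtp_score([1.0, 1.0, 1.0], normals)   # no gene is perturbed
uneven = vtp_score([3.0, 1.0, 1.0], normals)    # one gene is perturbed
```

Under these assumptions, a sample whose genes are all perturbed to the same degree scores zero, and the score grows as perturbations become more uneven across genes.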

Datasets.
We downloaded transcriptome datasets for various diseases from the NCBI Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/) and analyzed these datasets with the algorithm. The datasets were associated with various types of diseases, including neurological disorders (Alzheimer's disease (AD) and schizophrenia (SCZ)), infectious diseases (COVID-19, acquired immune deficiency syndrome (AIDS), hepatitis B virus (HBV) infection, tuberculosis (TB), and malaria), cardiovascular diseases (acute myocardial infarction, dilated cardiomyopathy, idiopathic or ischemic cardiomyopathy, and heart failure), respiratory diseases (chronic obstructive pulmonary disease), liver diseases (chronic hepatitis B and liver cirrhosis), kidney diseases (nephrotic syndrome, uremia, focal segmental glomerulosclerosis, and glomerular disease), digestive diseases (inflammatory bowel disease and Helicobacter pylori infection), and endocrine diseases (diabetes). A description of these datasets is shown in Table 1.

Data Preprocessing.
For RNA-Seq gene expression values, we normalized them by the TPM method. For microarray gene expression values, we used the normalization methods recommended by the related platforms. A description of the normalization methods for the datasets analyzed is provided in Supplementary Table S1. All normalized expression values were transformed by log2(x + 1) before subsequent analyses.
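As an illustration of the two preprocessing steps for RNA-Seq data, the sketch below applies the standard TPM definition (length-normalized counts rescaled to sum to one million) followed by the log2(x + 1) transform. The function names and the toy inputs are hypothetical, and gene lengths in kilobases are assumed to be available.

```python
import math


def tpm(counts, lengths_kb):
    """Transcripts per million for one sample.

    counts: raw read counts per gene.
    lengths_kb: gene lengths in kilobases (assumed input, same order).
    """
    # Reads per kilobase for each gene.
    rates = [c / length for c, length in zip(counts, lengths_kb)]
    total = sum(rates)
    # Rescale so the values in a sample sum to one million.
    return [r / total * 1e6 for r in rates]


def log2_plus_one(values):
    """Apply the log2(x + 1) transform to normalized expression values."""
    return [math.log2(v + 1) for v in values]


sample_tpm = tpm([10, 20], [1.0, 2.0])
transformed = log2_plus_one([0, 1, 3])
```

The added 1 keeps genes with zero expression finite after the logarithm, mapping them to 0.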

Statistical Analysis and Visualization.
We employed the Mann-Whitney U test (one-tailed) to compare VTP values between two classes of samples, and the Kruskal-Wallis test to compare VTP values among more than two classes of samples. We utilized the Spearman method to assess the correlation between VTP values and other variables and reported the correlation coefficients (ρ) and P values. To correct for P values in multiple tests, we utilized the Benjamini and Hochberg method to calculate the false discovery rate (FDR) [2]. All statistical analyses were performed in the R programming environment (version 4.1.2). The R packages "ggplot2", "ggpubr", and "ggstatsplot" were used for data visualization.
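The analyses above were run in R. Purely as an illustration of two of the named procedures, the Python sketch below computes a Spearman correlation coefficient (Pearson correlation of average ranks) and Benjamini-Hochberg adjusted P values. The function names are hypothetical, and the sketch does not compute P values for the correlation or reproduce the Mann-Whitney U and Kruskal-Wallis tests.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


rho = spearman_rho([1, 2, 3, 4], [40, 30, 20, 10])
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

A negative ρ, as in this toy input, corresponds to the kind of inverse relationship reported between VTP and measures such as FEV1.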

Identifying Genes and Pathways whose Expression Perturbations Have Significant Positive Correlations with VTP across Diverse Diseases.
In each dataset, we identified the genes for which |Δ(G_i, DS, NS_j)| correlated significantly and positively with VTP values across all disease samples, using a threshold of FDR < 0.1 in the Spearman correlation test. For each disease with n datasets analyzed, we identified the genes that satisfied this condition in at least n − 1 datasets. These genes were defined as the genes having significant positive correlations of expression perturbations with VTP in specific diseases. Among them, the genes identified in common in at least 5 specific diseases were defined as the genes whose expression perturbations had significant positive correlations with VTP across diseases. By inputting the genes associated with VTP across diseases into the GSEA web tool [3], we obtained the corresponding KEGG pathways [4].
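The two-stage selection rule described above can be sketched in Python as follows. The function names, the toy gene labels, and the input layout (one set of significant genes per dataset) are hypothetical. The handling of a disease with a single dataset, where n − 1 would be zero, is an assumption, since the text does not address it.

```python
from collections import Counter


def genes_for_disease(per_dataset_hits):
    """Genes significant in at least n - 1 of a disease's n datasets.

    per_dataset_hits: list of sets, one per dataset, each holding the genes
    whose perturbation correlated positively with VTP at FDR < 0.1.
    """
    n = len(per_dataset_hits)
    counts = Counter(gene for hits in per_dataset_hits for gene in hits)
    # ASSUMPTION: with a single dataset, require the gene in that dataset.
    needed = max(n - 1, 1)
    return {gene for gene, c in counts.items() if c >= needed}


def genes_across_diseases(per_disease_genes, min_diseases=5):
    """Genes selected in at least `min_diseases` specific diseases.

    per_disease_genes: dict mapping a disease name to its selected gene set.
    """
    counts = Counter(g for genes in per_disease_genes.values() for g in genes)
    return {gene for gene, c in counts.items() if c >= min_diseases}


# Toy example: gene "A" passes in all three datasets, "B" and "C" in one each.
one_disease = genes_for_disease([{"A", "B"}, {"A"}, {"A", "C"}])

# Toy example: "A" is selected in five diseases, "B" in four.
selected = {f"disease_{k}": {"A", "B"} for k in range(4)}
selected["disease_4"] = {"A"}
shared = genes_across_diseases(selected)
```

Allowing one dataset to miss (n − 1 rather than n) makes the per-disease selection tolerant of a single noisy or underpowered dataset.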

Neurological Disorder.
AD is a progressive neurodegenerative disease [5]. In four transcriptome datasets for AD (GSE63063 [6], GSE118553 [7], GSE140831, and GSE84422 [8]), the VTP values were significantly larger in AD patients than in normal controls (P < 0.001) (Figure 1(a)). In GSE84422, VTP values were significantly larger in definite than in possible or probable AD (P = 0.02) (Figure 1(a)). In addition, we analyzed correlations between VTP and several measures of the degree of AD progression in GSE84422, including clinical dementia rating, Braak neurofibrillary tangle score, average neuritic plaque density, sum of Consortium to Establish a Registry for Alzheimer's Disease (CERAD) rating scores in multiple brain regions, and sum of neurofibrillary tangle density in multiple brain regions. Notably, VTP displayed significant positive correlations with these measures (P < 0.01) (Figure 1(a)). Mutations in PSEN2 may result in early-onset AD. In GSE158233 [9], Barthelson et al. generated transcriptomes of two types of PSEN2-mutated (psen2 T141_L142delinsMISLISV and psen2 N140fs) lines of zebrafish brains and transcriptomes of their wild-type siblings. We observed that VTP values were markedly greater in PSEN2-mutated zebrafish brains than in their wild-type controls (P < 0.03) (Figure 1(a)). Schizophrenia (SCZ) is a severe psychotic disorder characterized by relapsing episodes of psychosis [10]. In four transcriptome datasets (GSE38484 [11], GSE87610 [12], GSE93577 [13], and GSE93987 [14]) generated from SCZ patients and normal controls, VTP values were consistently greater in SCZ patients than in normal controls (P < 0.02) (Figure 1(b)).
Taken together, these results indicate that VTP is augmented in certain neurological disorders (such as AD and SCZ) and grows with disease progression.
Collectively, these results support that VTP is upregulated in infectious diseases and increases with disease severity.

Respiratory Disease.
Respiratory diseases are diseases affecting the organs and tissues involved in gas exchange in air-breathing animals [46]. Some of the most common respiratory diseases include obstructive lung disease, restrictive lung disease, and respiratory tract infections. In many transcriptome datasets of respiratory diseases, such as GSE112811, GSE42057 [47], GSE55962 [48], GSE103174, and GSE151052, VTP values were significantly larger in patients than in normal controls (P < 0.05) (Figure 4(a)). In chronic obstructive pulmonary disease (COPD), forced expiratory volume in the first second (FEV1) and the ratio of FEV1 to forced vital capacity (FVC) are crucial in evaluating the severity of disease [49]. In GSE103174, which is a transcriptome dataset for COPD, VTP showed negative correlations with both FEV1 (P = 0.018; ρ = −0.39) and FEV1/FVC (P = 0.067; ρ = −0.31) (Figure 4(b)). The transcriptome dataset GSE32147 [50] contains gene expression profiles in lung samples of rats exposed to crystalline silica. We observed that VTP values increased steadily with the progression of silica-induced pulmonary toxicity: 1 week of exposure to crystalline silica < 2 weeks < 4 or 8 weeks < 16 weeks (Figure 4(c)).
Collectively, these results support that VTP is upregulated in respiratory diseases and is negatively associated with their clinical outcomes.

Digestive Disease.
In two transcriptome datasets (GSE16879 [56] and GSE27411 [57]) for digestive disease, VTP values were significantly larger in patients than in normal controls (P < 0.01) (Figure 7(a)). GSE27411 is a transcriptome dataset for patients with different stages of Helicobacter pylori (H. pylori) infection. Interestingly, we found that VTP values were significantly different among different stages of H. pylori infection and followed the pattern: without current H. pylori infection < H. pylori-infected without corpus atrophy < with current or past H. pylori infection with corpus-predominant atrophic gastritis (Figure 7(b)). These results collectively support that VTP is upregulated in digestive diseases and increases with disease severity.

Endocrine Disease.
Diabetes is a metabolic disease in which high blood sugar leads to many chronic health problems, such as cardiovascular diseases, vision damage, and kidney disease [58]. In two transcriptome datasets (GSE9006 [59] and GSE19420 [60]) for diabetes, VTP values were significantly greater in patients than in normal controls (P < 0.05) (Figure 8(a)). Moreover, in the transcriptome dataset GSE35725 [61] for diabetes, VTP values were significantly greater in recent-onset diabetes patients than in longstanding diabetes patients (P < 0.001) (Figure 8(b)).

Genes and Pathways whose Expression Perturbations Correlate Positively with VTP across Diseases.
We identified 369 genes whose expression perturbations showed significant positive correlations with VTP values across diseases (Supplementary Table S2). Notably, many of these genes are involved in immune regulation (such as CD2, CD247, CD300A, CD2AP, CD28, CD47, CD53, CD7, and CXCR2), cell cycle (such as CCND2, CDK4, and SKP2), and metabolism (such as LDHA, LDHB, PDHA1, GLO1, and ME2). Furthermore, we identified 58 KEGG pathways showing significant positive correlations of expression perturbations with VTP across diseases. Notably, many of these pathways are immune pathways, including natural killer cell-mediated cytotoxicity, T cell receptor signaling, B cell receptor signaling, chemokine signaling, cell adhesion molecules, Fc gamma R-mediated phagocytosis, leukocyte transendothelial migration, Fc epsilon RI signaling, hematopoietic cell lineage, Toll-like receptor signaling, Jak-STAT signaling, cytokine-cytokine receptor interaction, intestinal immune network for IgA production, and NOD-like receptor signaling (Figure 9). The 58 pathways also included many metabolism-related pathways, such as pyruvate metabolism, inositol phosphate metabolism, propanoate metabolism, cysteine and methionine metabolism, fructose and mannose metabolism, riboflavin metabolism, β-alanine metabolism, and nicotinate and nicotinamide metabolism. Moreover, the 58 pathways included many pathways regulating cell growth and division, such as MAPK signaling, Wnt signaling, calcium signaling, ErbB signaling, oocyte meiosis, and cell cycle.
In addition, the 58 pathways included many disease-specific pathways, such as Leishmania infection, AD, Vibrio cholerae infection, epithelial cell signaling in Helicobacter pylori infection, amyotrophic lateral sclerosis, viral myocarditis, pathogenic Escherichia coli infection, arrhythmogenic right ventricular cardiomyopathy, pancreatic cancer, non-small-cell lung cancer, acute myeloid leukemia, colorectal cancer, glioma, and chronic myeloid leukemia.

Discussion
Although transcriptomic data have been widely applied to biomedical science, few studies have explored the association between transcriptomic perturbations and disease development and progression across a wide variety of diseases. For the first time, we investigated the association between VTP and the onset and progression of various diseases. Our analysis suggests that VTP values are upregulated in various diseases relative to their normal controls, and that VTP values increase with disease progression. Thus, this analysis uncovers a common characteristic of transcriptomic perturbations across various human diseases. In fact, the VTP measure reflects the asynchronous degree of transcriptomic perturbations in a disease status relative to the health status.
Our results indicate that the asynchronous degree of transcriptomic perturbations is positively associated with disease progression or severity. That is, a higher asynchronous degree of transcriptomic perturbations suggests more unfavorable clinical outcomes in disease. This is consistent with the findings in cancer [1]. An intriguing question is whether the variation of perturbations at other molecular levels, such as the genome, proteome, and metabolome, has similar associations with disease development and progression.
We identified numerous genes and pathways whose expression perturbations correlated positively with VTP scores across diseases. These genes and pathways are mainly involved in the regulation of immune, metabolic, and cellular activities. This is plausible since deregulated immune, metabolic, and cellular activities have been associated with various diseases. Our data suggest that the disordered perturbations of the molecules modulating immune, metabolic, or cellular activities are associated with the development and progression of various diseases. Interestingly, by searching the database of publicly available GWAS summary statistics (https://www.ebi.ac.uk/gwas/), we found that many of the 369 genes whose expression perturbations correlated significantly with VTP values across diseases had genetic variants that are statistically associated with the risk of the diseases we analyzed (Supplementary Table S3). For example, there were 16 genes, including RDX, PIP4K2A, PILRA, LPXN, LILRB2, ITGAX, IQGAP2, FOXN2, CR1, CELF2, CDC42SE2, CD2AP, PDK4, PARP8, HSPA6, and BNIP3, whose genetic variants are statistically associated with the risk of AD. Six genes (TKT, TCF4, SWAP70, DDHD2, ARHGAP31, and LTB) showed significant associations of genetic variants with the risk of cardiovascular disease. Notably, FOXN2 had genetic variants statistically associated with the risk of both AD and SCZ, and NOTCH2 displayed genetic variants that are statistically associated with the risk of both endocrine disease and kidney disease. These data support the relevance of many of these genes to the diseases. This study has several limitations. First, although we have analyzed numerous datasets for various diseases, more datasets need to be analyzed to bolster the validity of this analysis. Second, the mechanism underlying the association between VTP and disease development and progression needs to be explored. Finally, the prospect of translating the present findings into clinical practice remains unclear. We plan further investigations to address these limitations.

Conclusions
VTP is upregulated in the disease relative to health status, and its upregulation is associated with disease progression and severity in various diseases. The molecules whose abundance perturbations correlate positively with VTP are mainly involved in the regulation of immune, metabolic, and cellular activities. Thus, VTP has potential clinical value in disease diagnosis and prognosis.

Data Availability
All data associated with this study are available within the paper and the database of NCBI Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/).

Ethical Approval
Ethical approval and consent to participate were waived since we used only publicly available data and materials in this study.

Consent
Not applicable.

Conflicts of Interest
The authors declare that they have no competing interests.