ATAXIC: An Algorithm to Quantify Transcriptomic Perturbation Heterogeneity in Single Cancer Cells

The single-cell RNA sequencing (scRNA-seq) has recently been widely utilized to quantify transcriptomic profiles in single cells of bulk tumors. The transcriptomic profiles in single cells facilitate the investigation of intratumor heterogeneity that is unlikely confounded by the nontumor components. We proposed an algorithm (ATAXIC) to quantify the heterogeneity of transcriptomic perturbations (TPs) in single cancer cells. ATAXIC calculated the TP heterogeneity level of a single cell based on the standard deviations of the absolute z-scored gene expression values for tens of thousands of genes, reflecting the asynchronous degree of transcriptomic alterations relative to the central (mean) tendency. By analyzing scRNA-seq datasets for eight cancer types, we revealed that ATAXIC scores were likely to correlate positively with the enrichment scores of various proliferation and oncogenic signatures, DNA damage repair, treatment resistance, and unfavorable phenotypes and outcomes in cancer. The ATAXIC scores varied among different cancer types, with lung cancer and melanoma having the lowest average scores and clear cell renal cell carcinoma having the highest average scores. The low TP heterogeneity in lung cancer and melanoma could bestow relatively higher response rates to immune checkpoint inhibitors on both cancer types. In conclusion, ATAXIC is a useful algorithm to quantify the TP heterogeneity in single cancer cells, as well as providing new insights into tumor biology.


Introduction
Single-cell technology is rapidly evolving to analyze molecular characteristics at the single cell level [1]. In particular, the single-cell RNA sequencing (scRNA-seq) has shown its power in analyzing gene expression profiles in thousands of individual cells in one test [2,3]. A major advantage of the scRNA-seq technology is that it enables characterization of sophisticated biological processes at single-cell resolution to capture transcriptional snapshots of rare events [4,5]. us, this technology has recently been widely used to quantify transcriptomic profiles in different components of bulk tumors, including cancer cells, immune cells, stromal cells, and normal cells [6][7][8][9][10][11][12][13]. Apparently, scRNA-seq demonstrates the advantage of uncovering transcriptomic heterogeneity among different tumor cells as compared to bulk RNA sequencing, which measures the average expression of genes among different types of cells [14]. erefore, scRNA-seq has evolved into a popular tool to dissect cellular heterogeneity in an unbiased manner without requiring any prior information of the cell population [2,15,16]. Most of the scRNA-seqdata-based studies aimed to characterize the tumor microenvironment, the cross-talk among the different components of bulk tumors, and intratumor heterogeneity (ITH). Previous studies on bulk tumors have demonstrated that the transcriptomic perturbations (TPs) are observed in a wide variety of cancers and are heterogeneous within a tumor and across different tumors [17,18]. In fact, the heterogeneity of TPs is an intrinsic feature of many cancers and is due to the presence of different subclones within one tumor, each of which has its own unique features of gene expression profiles [19,20]. Because the heterogeneity of TPs provides the fuel for drug resistance, a precise measure of ITH resulting from it is essential for the development of effective therapies [21]. However, the investigation into the TP profiles in single cancer cells remains lacking to date.
In this study, we developed an algorithm (ATAXIC) to quantify the TP heterogeneity in single cancer cells. e TP heterogeneity score indicates the asynchronous degree of transcriptomic alterations relative to the central tendency in single cells. Using ATAXIC, we scored the TP heterogeneity for single cells from eight cancer types, including glioma, melanoma, lung cancer, prostate cancer, renal cell carcinoma (RCC), liver cancer, breast cancer, and sarcoma. By analyzing correlations of ATAXIC scores with various oncogenic signaling, clinical features, DNA damage repair signatures, and treatment responses in these cancer types, we demonstrated that the ATAXIC index likely increases with tumor proliferation, progression, invasion, and metastasis and thus is a risk factor for tumor development.

Data Collection and Preprocess.
We obtained the data of scRNA-seq gene expression profiles in single cells for eight caner types from the Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/), including glioma [13], melanoma [12], lung cancer [10], prostate cancer [8], RCC [6], liver cancer [9], breast cancer [7], and sarcoma (https:// www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc�GSE131309). We excluded the patients with less than ten cancer cells reported in this study. Low-quality cells were discarded if the number of expressed genes was smaller than 300. Cells were also removed if their mitochondrial gene expression was larger than 10 percent. e numbers of patients and their cancer single cells for each cancer type are presented in Supplementary Table S1. We normalized all TPM normalized gene expression values by adding 1 and then log2 transformation. In addition, we downloaded the data of drug sensitivity of 578 cancer cell lines to 4686 compounds from the Dependency Map (DepMap) (https://depmap.org/ portal/download/). We obtained the marker genes of proliferation, differentiation, invasion, metastasis, stemness, and damage repair signatures from the CancerSEA database (http://biocc.hrbmu.edu.cn/CancerSEA/home.jsp).
We downloaded the pathway genes for the TGF-β, Wnt, PI3K-Akt, JAK-STAT, Notch, Hedgehog, mismatch repair, and homologous recombination pathways from KEGG [22]. e marker or pathway gene sets are given in Supplementary  Table S2.

ATAXIC Algorithm.
Given a normalized single cell gene expression matrix, which contains m genes and t samples (single cancer cells), the ATAXIC score of a single cell SC is defined as where ex (G i , SC) denotes gene G i expression level in SC and ex (G i , SS j ) denotes G i expression level in the single cell sample SS j . ATAXIC calculated the TP heterogeneity level of a single cell based on the standard deviations of the absolute z-scored gene expression values for tens of thousands of genes. In a single cancer cell, when most genes show close absolute zscored expression values, the single cancer cell will have a low ATAXIC score, namely, low TP heterogeneity level; otherwise, the single cancer cell will have a relatively high ATAXIC score. Hence, the ATAXIC score reflects the asynchronous degree of transcriptomic alterations relative to the central (mean) tendency normalized by standard deviation in all single cancer cells for all genes in the gene expression matrix (Figure 1). We present the R function for the ATAXIC algorithm at the GitHub (https://github.com/ XS-Wang-Lab/ATAXIC/) under a GNU GPL open-source license.

Enrichment Scores of Molecular Signature or Pathways.
For a molecular signature or pathway, we defined its enrichment score in a single cell as the average expression level of its marker or pathway genes in the single cell.

Correlations of ATAXIC Scores with Drug Sensitivity.
We assessed the Spearman correlation of ATAXIC scores with viability values to each of the 4686 compounds in 578 cancer cell lines. e ATAXIC scores of cancer cell lines were calculated based on their gene expression profiles. e significant correlations were identified using a threshold of false discovery rate (FDR) < 0.05.

Statistical Analysis.
Because ATAXIC scores did not follow the Gaussian distribution in most cases (Shapiro test, P < 0.05), we used the one-tailed Mann-Whitney U test to compare ATAXIC scores between two classes of samples. We evaluated correlations between ATAXIC scores and other variables using the Spearman method and reported the correlation coefficient (ρ) and adjusted P values. e adjusted P values were the FDRs calculated by the Benjamini and Hochberg method [23]. We performed all the statistical analyses in the R programming environment (version 4.0.2).

ATAXIC Scores Likely Correlate Positively with Proliferation and Differentiation Signatures in Cancer.
We defined the proliferation signature score in a single cell as the average expression levels of its marker genes [24]. We found that ATAXIC scores had a significant positive correlation with proliferation signature scores in 5 of the 9 breast cancers, 9 of the 10 gliomas, 9 of the 10 prostate cancers, 7 of the 8 RCCs, 8 of the 12 sarcomas, 7 of the 12 melanomas, 10 of the 20 lung cancers, and 13 of the 15 liver cancers (Spearman correlation, FDR < 0.05) (Figure 2(a)). Only in 1 sarcoma and 2 lung cancers, ATAXIC scores had a significant negative correlation with proliferation signature scores. e cell cycle is a process of cell growth and division. We found that ATAXIC scores were significantly and positively correlated with the enrichment scores of the cell cycle pathway in 7 breast cancers, 6 gliomas, 10 prostate cancers, 7 RCCs, 8 sarcomas, 4 melanomas, 10 lung cancers, and all the 15 liver cancers (Figure 2(b)). Only in 1 sarcoma, 2 melanomas, and 2 lung cancers, ATAXIC scores had a significant negative correlation with cell cycle scores. In addition, ATAXIC scores had a significant positive correlation with cell differentiation signature scores in most of these cancers, including 9 breast cancers, 10 gliomas, 10 prostate cancers, 7 RCCs, 10 sarcomas, 9 melanomas, 11 lung cancers, and 15 liver cancers (Figure 2(c)). Together, these results indicate that ATAXIC scores likely correlate positively with proliferation and differentiation potential in cancer.

ATAXIC Scores Likely Correlate Positively with DNA Damage Repair Signatures in Cancer.
We analyzed correlations of ATAXIC scores with DNA damage repair signatures in single cancer cells. First, we found that ATAXIC scores had a significant positive correlation with the scores of DNA damage in 7 breast cancers, 7 gliomas, 10 prostate cancers, 6 RCCs, 5 sarcomas, 5 melanomas, 10 lung cancers, and 15 liver cancers (Spearman correlation, FDR < 0.05) ( Figure 5(a)), compared to the significant negative correlation between them in 2 sarcomas, 1 melanoma, and 2 lung cancers. Second, ATAXIC scores were significantly and positively correlated with the enrichment scores of the mismatch repair pathway in 5 breast cancers, 7 gliomas, 10 prostate cancers, 5 RCCs, 6 sarcomas, 2 melanomas, 7 lung cancers, and 15 liver cancers ( Figure 5(b)), compared to the significant negative correlation between them in 1 sarcoma, 2 melanomas, and 2 lung cancers. ird, ATAXIC scores had a significant positive correlation with the enrichment scores of the homologous recombination pathway in 7 breast cancers, 8 gliomas, 10 prostate cancers, 4 RCCs, 9 sarcomas, 1 melanoma, 9 lung cancers, and 15 liver cancers ( Figure 5(c)), compared to the significant negative correlation between them in 1 sarcoma, 1 melanoma, and 3 lung cancers. e mismatch repair and homologous recombination are two key pathways involved in DNAdamage repair processing in cancer [25]. us, our results suggest that ATAXIC scores likely correlate positively with DNA damage repair signatures in cancer.

ATAXIC Scores Likely Correlate Negatively with Clinical
Outcomes in Cancer. We found that ATAXIC scores were significantly higher in metastatic than in primary tumor in sarcoma and lung cancer (one-tailedMann-WhitneyU test, P < 0.01) (Figure 6(a)). is is consistent with the positive association between ATAXIC scores and metastasis signature scores in cancer. In breast cancer and lung cancer, ATAXIC scores tended to be higher in late-stage than in early-stage tumors (P < 0.1) (Figure 6(b)). In RCC, ATAXIC scores were significantly higher in higher-grade (G4) than in lower-grade (G3) tumors (P < 0.001) (Figure 6(c)). In lung cancer and RCC, ATAXIC scores displayed marked correlations with treatment outcomes. For example, ATAXIC scores were significantly higher in progressive than in regressive or stable lung cancers after treatment (P < 0.001) (Figure 6(d)). ATAXIC scores were lower in the tumors treated with VEGF inhibitors than those not treated with them in RCC (P < 0.09) (Figure 6(e)). In addition, ATAXIC scores were lower in the tumors treated with immune checkpoint inhibitors (ICIs) than those not treated with them in RCC (P < 0.001) (Figure 6(e)). Within the RCC tumors treated with ICIs, ATAXIC scores were lower in the tumors responsive to ICIs versus the tumors not responsive to ICIs (P < 0.001) (Figure 6(f )). Overall, these results suggest that the ATAXIC TP heterogeneity is associated with unfavorable clinical outcomes in cancer. Interestingly, ATAXIC scores were significantly lower in smoker than in nonsmoker lung cancers (P < 0.001) (Figure 6(g)).
e variation of ATAXIC scores of single cells varied among these cancer types, with the largest value of 0.134 in prostate cancer and the smallest value of 0.019 in breast cancer (Figure 7(c)). Interestingly, breast cancer had a relatively large mean value but the smallest variation of ATAXIC scores. In contrast, lung cancer had the smallest mean value but a relatively large variation of ATAXIC scores. Again, these results indicate the intertumor heterogeneity among different cancer types.  Table S3).

ATAXIC Scores Likely Correlate Negatively with Drug
ese results indicate that ATAXIC scores likely have a significant negative association with drug sensitivity in cancer, supporting that tumor heterogeneity promotes cancer therapeutic resistance [21].

Discussion
For the first time, we developed an algorithm to evaluate the TP heterogeneity in single cancer cells. It should be noted that this is the first algorithm to measure the heterogeneity of gene expression perturbations in single cancer cells. We revealed that the TP heterogeneity was associated with proliferation and differentiation signatures, various oncogenic signaling, DNA damage repair, treatment resistance, and unfavorable phenotypes and outcomes in cancer. ese characteristics in single tumor cells are in line with those displayed in the bulk tumors [27]. Furthermore, we demonstrated that the TP heterogeneity of single cells varied    DNA damage repair deficiency often causes genomic instability to drive intratumor heterogeneity [28,29]. We observed the significant correlations of the single cell TP heterogeneity with DNA damage and its repair signaling pathways, such as mismatch repair and homologous recombination. It confirms the marked associations among DNA damage repair, genomic instability, and tumor heterogeneity at the single cell level. at is, DNA damage repair deficiency as well as genomic instability could be the major source of the TP heterogeneity in single cancer cells.
is study has several limitations. First, because there are relatively small numbers of cancer patient samples in most single scRNA-seq datasets, the analysis of correlations between the TP heterogeneity level and clinical parameters was restricted. Second, due to lack of the data on genomic alterations (such as somatic mutations and copy number alterations) in single cells, the analysis of correlations between the TP heterogeneity level and genomic alterations was missing. Finally, this study investigated only eight cancer types, although the analysis of a wider variety of cancers would strengthen the validity of conclusions.

Conclusions
ATAXIC is an algorithm to evaluate TP heterogeneity in single cancer cells. e TP heterogeneity by ATAXIC has significant associations with increased cell proliferation, oncogenic signaling, DNA damage repair, treatment resistance, and unfavorable outcomes in cancer.
us, ATAXIC is a useful algorithm to quantify the TP heterogeneity in single cancer cells and provide new insights into cancer biology as well as potentially valuable markers for cancer diagnosis and treatment.

Data Availability
e authors declare that all data supporting the findings of this study are available within the paper and its supplementary information files. e ATAXIC scores of single cells of eight cancer types are available at GitHub (https://github. com/XS-Wang-Lab/ATAXIC/).

Conflicts of Interest
e authors declare that they have no conflicts of interest.

Authors' Contributions
QL performed the research and data analyses. QL provided help in manuscript preparation and data collection. XW conceived this study, designed the algorithm and analysis strategies, and wrote the manuscript. All the authors read and approved the final manuscript. Table S1: e numbers of patients and their cancer single cells in each cancer type. Table S2: e marker or pathway gene sets of the signatures and pathways analyzed in this study. Table S3: Significant correlations between the viability values and ATAXIC scores in the 578 cancer cell lines for 728 compounds. e Spearman correlation coefficients, P values, and adjusted P values (FDR) are shown. Figure S1 Spearman correlations between ATAXIC scores and the enrichment scores of the invasion signature in single cells from breast cancer (BC), glioma (GBM), prostate cancer (PC), renal cell carcinoma (RCC), sarcoma, melanoma, and lung cancer.

Supplementary Materials
e Spearman correlation coefficients and adjusted P values (FDR) are shown. Figure S2