Mechanisms of Core Chinese Herbs against Colorectal Cancer: A Study Based on Data Mining and Network Pharmacology

Colorectal cancer (CRC) is now the second most deadly cancer globally. Chinese herbal medicine (CHM) plays an indispensable role in CRC treatment in China. However, the core herbs (the CHs) in the treatment of CRC and their underlying therapeutic mechanisms remain unclear. This study aims to uncovering the CHs and their mechanisms of action of CRC treatment, applying data mining and network pharmacology approach. First, CHM prescriptions treating CRC were collected from clinical studies from the Chinese National Knowledge Infrastructure (CNKI) and MEDLINE databases, and the CHs were identified through data mining. Then, the bioactive compounds and the corresponding putative targets of the CHs were obtained from three traditional Chinese medicine (TCM) databases. CRC related targets were acquired from three disease databases; the overlapping targets between the CHs and CRC were identified as the therapeutic targets. Subsequently, functional enrichment analysis was performed to elucidate the mechanisms of the CHs on CRC. Moreover, networks were constructed to screen the major bioactive compounds and therapeutic targets. Finally, prognostic values of the major target genes were evaluated by survival analysis, and molecular docking simulation was performed to assess the binding affinity of key targets and major bioactive compounds. It came out that 10 the CHs from 113 prescriptions and 190 bioactive compounds with 118 therapeutic targets were identified. The therapeutic targets were mainly enriched in the biological progress of transcription, apoptosis, and response to cytokine. Various cancer-associated signaling pathways, including microRNAs, TNF, apoptosis, PI3K-Akt, and p53, were involved. Furthermore, 15 major bioactive compounds and five key target genes (VEGFA, CASP3, MYC, CYP1Y1, and NFKB1) with prognostic significance were identified. Additionally, most major bioactive compounds might bind firmly to the key target proteins. This study provided an overview of the anti-CRC mechanisms of the CHs, which might refer to the regulation of apoptosis, transcription, and inflammation.


Introduction
Colorectal cancer (CRC), including colon and rectal cancer, is the third most frequently diagnosed cancer and the second leading cause of cancer-related deaths worldwide [1]. Furthermore, the rising incidence of CRC at younger ages (<50 years-old) is a merging trend [2]. Hereditary accounts for approximately 10-20% of all CRC patients, while 60-65% of CRC cases arise from modifiable risk factors, such as smoking, red and processed meat intake dietary, excessive alcohol intake, and obesity [3]. Currently, surgical resection is still the only curative treatment for CRC patients. Nevertheless, due to its occult onset, most CRC patients are diagnosed in advanced stages when surgery is unavailable [4]. Another issue is that major complications occur on up to 15% patients, and postoperative recurrence and metastasis are quite common [4,5]. Neoadjuvant and adjuvant chemoradiotherapy is now generally accepted standard treatment for locally advanced CRC, with the aim of reducing recurrence and metastasis [6][7][8]. Disappointing, long-course chemoradiotherapy brings adverse reactions, such as myelosuppression, cumulative neuropathy, gastrointestinal tract reaction, and organs damage, which badly reduce the quality of life of the patients, and even leads to interruption of therapies [9]. Additionally, chemoradiotherapy is often confronted with treatment resistance [10].
Chinese herbal medicine (CHM) plays an indispensable role in integrative therapy of malignancies in China. It has been widely reported that CHM prescriptions could improve survival and quality of life of CRC patients through the following effects: (1) preventing tumorigenesis, suppressing tumor growth, and reducing metastasis and recurrence [11][12][13][14]; (2)increasing sensitivity and alleviating side effects of chemo-or radiotherapy; (3) relieving tumor-related symptoms or surgery complications, such as fatigue, pain, loss of appetite, diarrhea, nausea, and vomiting; and (4) improving immunity, lessening the damages induced by conventional treatments, and ameliorating bone marrow suppression [9,[15][16][17][18]. However, CHM prescriptions in the treatment of CRC differ a lot, due to varied educational backgrounds and personal clinical experiences of different traditional Chinese medicine (TCM) doctors. e underlying patterns or the core herbs (the CHs) of the prescriptions and the bioactive compounds and therapeutic mechanisms of the CHs are still in veil. e application of data mining and machine learning can help people better understand the patterns of herbs use from abundant clinical prescriptions [19]. CHM effects in a multicomponent, multitarget, and multipathway mode, which makes it unique superiority in treating complex diseases, but causes difficulty in clarifying mechanisms of action; meanwhile. Integrated systems biology, bioinformatics, and poly-pharmacology, network pharmacology provides an effective solution to uncover the synergistic effects and underlying mechanisms of multicomponent and multitarget agents through network analysis [20,21]. Molecular docking is an in silico structure-based method to predict ligand-target interactions at a molecular level, which has been widely used in drug discovery [22].
In the present study, the CHs in the treatment of CRC were identified through data mining from prescriptions in clinical studies. en, the bioactive compounds, the putative targets, and mechanisms of the CHs acting on CRC were investigated by network pharmacology approach. Finally, survival analysis and molecular docking simulation were performed to strengthen the results of network pharmacology analysis. A flowchart of this study is described in Figure 1.

Inclusion and Data Mining of CHM Prescriptions.
Clinical CHM prescriptions in the treatment of CRC were collected from studies from the Chinese National Knowledge Infrastructure (CNKI) and MEDLINE databases. e literature search was conducted in topic, title, and abstract, using the following search terms: ("traditional Chinese medicine") OR ("Chinese herb" * ) OR (decoction) OR (prescription) AND (colorectal cancer) AND (clinical); the timespan was set from database creation to 1st June 2020. e inclusion criteria for studies were as follows: (1) the first diagnosis of patients which was CRC, (2) clinical studies about oral CHM prescriptions treating CRC, combined with (chemo)radiotherapy or not, and (3) the prescriptions should be effective. Effectiveness is defined as statistically significant protective effects in the CHM treatment group, compared with the control group. e protective effects included but are not limited to the alleviation of cancerrelated symptoms and/or adverse reactions of (chemo)radiotherapy, prolongation of survival, improvement of quality of life, and immunity. e exclusion criteria were as follows: (1) review articles and laboratory experimental studies, (2) the constituent herbs in prescriptions not published in full, (3) studies in which the patients were treated with extract (s) of a single herb or Chinese patent medicine, and (4) the number of patients in any single group being less than 20. e CHM prescriptions were extracted from the eligible studies, and a database was formed. en, frequency analysis was performed with SPSS software (version 21.0, IBM Corp., Armonk, NY, USA). And association rule analysis was conducted by a priori algorithm with SPSS Clementine software (version 12.0, SPSS Inc., Chicago, IL, USA), to identify the CHs from all prescriptions. In this study, association rules were set under the condition of support degree ≥20% and confidence degree ≥50%.
Based on the TCMSP database, oral bioavailability (OB) ≥ 30% and drug-likeness (DL) ≥ 0.18 were set as the screening criteria of bioactive compounds, which were labeled as "Mol ID" in the TCMSP database in the further study [26]. e putative targets corresponding to the bioactive compounds were collected from the above three TCM databases, and all target names were converted into gene symbols using the UniProt database (https://www.uniprot. org/) for the subsequent study.

Functional Enrichment Analysis.
To elucidate the therapeutic mechanisms of the CHs on CRC, gene ontology (GO) functional enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of therapeutic targets were performed using the 2 Evidence-Based Complementary and Alternative Medicine Database for Annotation, Visualization, and Integrated Discovery database (David, https://david.ncifcrf.gov/) [30]. e results with P value <0.05 and false discovery rate (FDR) < 0.05 were considered significant, which were visualized using the GraphPad Prism software (version 6.0, San Diego, CA, USA) and "ggplot2" package in R.

Network Construction, Major Compounds, and Major
Targets Screening. Two networks were constructed in this study. (1) A herb-bioactive compound-therapeutic target network was established to show the interactions among the CHs, bioactive compounds, and therapeutic targets. (2) A protein-protein interaction (PPI) network was built to explore the interactions among therapeutic targets. e PPI data of therapeutic targets were analyzed using Search Tool for the Retrieval of Interacting Genes/Proteins platform (STRING, https://string-db.org/) [31]. e species was set as "Homo sapiens", and the minimum confidence score was set as 0.9. e diagrammatic networks were constructed using Cytoscape software version 3.7.2 [32], and the major bioactive compounds and targets were screened by network  Evidence-Based Complementary and Alternative Medicine topological parameters calculated with Network Analyzer plug-in. e topological parameter of degree value was used to describe the importance of a node in a network, which is defined as the number of edges linked to a certain node. A major node must have a degree value two times greater than the median degree value of all nodes in the network [33].

Survival Analysis of the Major Target Genes.
e prognostic value of major target genes was assessed by survival analysis using a web server named Long-Term Outcome and Gene Expression Profiling Database of Pan-Cancers (LOGpc, http://bioinfo.henu.edu.cn/DatabaseList.jsp), which provides 13 types of survival terms for 28,098 patients from 26 malignancies [34]. All cases were categorized into high and low expression groups by the median expression of a certain therapeutic target gene; then overall survival (OS) and disease free survival (DFS) were analyzed by log-rank test. Kaplan-Meier (KM) curves, hazard ratio (HR), 95% confidence intervals (CI), and log-rank P value were generated online.
e results with P < 0.05 were considered statistically significant, and the targets correlated with patients' survivals were considered as the key targets.

Molecular Docking Simulation.
To evaluate the binding potential of the major bioactive compounds of the CHs with the key target proteins, molecular docking simulation was performed using GEMDOCK software (version 2.1, National Chiao Tung University, Hsinchu, Taiwan) [35]. e empirical scoring function of GEMDOCK is as follows: Fitness � van der Waal energy + hydrogen bonding energy + electro statistic energy. A fitness value was used to estimate the binding affinity of a protein and a ligand and the fitness value of the corresponding protein-original ligand complex was used as a comparison. e generic evolutionary method parameters were set as population size � 200, generations � 70, and number of solutions � 2. All 3D crystal structures of target protein-original ligand complexes were downloaded as ".pdb" files from Protein Data Bank (PDB, http://www.rcsb.org/pdb/) [36]. All 3D molecular structures of original ligands and major bioactive compounds were downloaded as ".mol2" files from TCMSP database or ZINC database (http://zinc.docking.org) [37].

Results of Data
Mining. 1130 and 131 studies were obtained from CNKI and MEDLINE databases, respectively. According to the inclusion and exclusion criteria above, 108 and 3 clinical studies were included from CNKI (published in Chinese) and MEDLINE (published in English) databases, respectively. A list of the clinical studies included in this study is provided in Supplementary Table 1. en, a total of 113 CHM prescriptions treating CRC were extracted from the 111 clinical studies. ere were 196 different Chinese herbs used in all prescriptions, and the frequency analysis of herbs revealed that the ten most frequently used Chinese herbs were Atractylodis Macrocephalae Rhizoma (Bai-Zhu), Poria (Fu-Ling), Radix Astragali (Huang-Qi), Glycyrrhizae Radix et Rhizoma (Gan-Cao), Codonopsis Radix (Dang-Shen), Coicis Semen (Yi-Yi-Ren), Citrus Reticulatae Pericarpium (Chen-Pi), Hedyotis Diffusae Herba (Bai-Hua-She-She-Cao), Angelicae Sinensis Radix (Dang-Gui), and Scutellariae Barbatae Herba (Ban-Zhi-Lian), which were recognized as the CHs in the treatment of CRC. e usage frequency of each CH is shown in the second and third columns of Table 1. Besides, the association rule analysis revealed that nine herbs out of the CHs were often used together, whose associations are shown in Figure 2(a).

Bioactive Compounds and Putative erapeutic Targets of the CHs.
e eligible bioactive compounds of every herb were merged from three TCM databases, bioactive compounds reiterated, or without putative targets were removed. Totally, 190 bioactive compounds and 252 putative targets of the CHs were included. e final counts of bioactive compounds of each CH are shown in the fourth column of Table 1, and duplicates existed among different the CHs.
In terms of CRC related targets, 838 targets were obtained from MalaCards, and the top 1000 targets were collected from DisGeNET and CTD databases, according to their rank of inference score, respectively. A total of 1860 CRC related targets were found after removing the duplicates. Subsequently, 118 overlapping targets between the CHs and CRC were identified using the Venn diagram ( Figure 2(b)), which were considered as the potential therapeutic targets of the CHs for CRC.

GO and KEGG Pathway Enrichment Analysis.
Functional enrichment analysis was performed for 118 therapeutic targets. In total, 17 GO terms were identified, including 8 biological process (BP) terms, 6 molecule function (MF) terms, and 3 cellular component (CC) terms (P value and FDR both <0.05). For BP, the targets mainly participated in the regulation of transcription, apoptosis, and response to cytokine. For MF, the targets were mainly responsible for the activity of steroid hormone receptor and transcription factor, as well as binding of chromatin, DNA, and steroid. For CC, the targets were mainly distributed in nucleus, extracellular space, and cytoplasm. e five most significantly enriched GO terms are shown as bar plots in Figure 3(a).
Besides, 44 KEGG pathway terms were enriched, many of which were pathways of specific kinds of cancer. Moreover, microRNAs in cancer and tumor necrosis factor (TNF), apoptosis, PI3K-Akt, and p53 signaling pathways, and so on were enriched. e top fifteen most significantly enriched KEGG terms are shown as a bubble chart in Figure 3(b).

e Network of Herb-Bioactive Compound-erapeutic
Target.
e network of herb-bioactive compound-therapeutic target was composed of 277 nodes (10 herbs, 149 bioactive compounds, and 118 targets) and 1240 edges ( Figure 3). Some bioactive compounds were absent in the network, in default of interaction with any therapeutic target. Among 149 compounds, quercetin might act on the most targets (degree � 59), followed by wogonin (degree � 33) and luteolin (degree � 25), the cumulative number of targets of the above three compounds was 82 (over 55% of all therapeutic targets). e information of the top 15 bioactive compounds with the most interactive therapeutic targets is shown in Table 2.

PPI Network and Major Targets
Screening. Based on the PPI analysis from the STRING, 12 targets were not involved in protein interactions; thus, the final PPI network consisted of 106 nodes (targets) and 364 edges (Figure 4). e median degree value of the PPI network was five, and 25 targets met the criteria of major targets in this network. e major targets interacted extensively with other proteins, suggesting their crucial roles in the development and progression of CRC. Table 3 lists the detailed information on these major targets.

Prognostic Significance of Major Target Genes.
Survival analysis was performed for every major target gene using the LOGpc server. e results showed that none of the major target genes showed significant association with OS in CRC patients. Nevertheless, high expression of VEGFA was associated with shorter DFS; high expression of vascular endothelial growth factor A (VEGFA), caspase 3 (CASP3), Myc protooncogene protein (MYC), cytochrome P450 enzymes 1A1 (CYP1A1), and NF-κB p105/p50 subunit (NFKB1) were associated with longer DFS in CRC patients. So, these five targets were chosen as the key targets of the CHs treating CRC; the KM survival curves of them are presented in Figure 5. And they were selected to perform molecular docking simulation.

Results of Molecular Docking.
Because the NFKB1 protein-original ligand complex was unavailable from the      (Table 4). Furthermore, it is easy to find out that CYP1A1 protein has the best binding affinities with the 15 compound ligands. Figure 6 shows the docking model of each target protein and the bioactive compound with the firmest binding.

Discussion
e current study could be divided into two major parts, which realized the inference from clinical experience to molecular mechanisms. In the first part, we identified the effective CHs in the treatment of CRC from abundant   Transcription factor p65 RELA 21 4 Retinoic acid receptor RXR-alpha RXRA 20 5 Tumor necrosis factor TNF 19 6 Nuclear receptor coactivator 1 NCOA1 19 7 Estrogen receptor ESR1 19 8 Glucocorticoid receptor NR3C1 18 9 Nuclear factor NF-kappa-B p105 subunit NFKB1 16 10 Epidermal growth factor receptor EGFR 15 11 Mitogen-activated protein kinase 8 MAPK8 15 12 Proto-oncogene c-Fos FOS 15 13 Interleukin-6 IL6 15 14 G1/S-specific cyclin-D1 CCND1 14 15 Cytochrome P450 1A1 CYP1A1 13 16 Retinoblastoma-associated protein RB1 13 17 Myc protooncogene protein MYC 13 18 Androgen receptor AR 13 19 Caspase-8 CASP8 12 20 Caspase-3 CASP3 12 21 Amyloid beta A4 protein APP 12 22 Cytochrome c oxidase subunit 2 MT-CO2 11 23 Peroxisome proliferator-activated receptor gamma PPARG 11 24 Vascular endothelial growth factor A VEGFA 11 25 Peroxisome proliferator-activated receptor alpha PPARA 11       CHM prescriptions through data mining. In the second part, we deduced the therapeutic targets and mechanisms of the CHs acting on CRC by network pharmacology approach. Firstly, ten Chinese herbs were identified as the CHs in the treatment of CRC, according to their high frequency of use and association rules with each other (Table 1 and Figure 2(a)). In CHM theory, Atractylodis Macrocephalae Rhizoma, Radix Astragali, Glycyrrhizae Radix et Rhizoma, Codonopsis Radix, and Angelicae Sinensis Radix are classified as tonic herbs to replenish the body's Qi (akin to the body's healthy energy) and blood. Poria, Coicis Semen, and Citrus Reticulatae Pericarpium are classified as dampness-dispelling herbs. Hedyotis Diffusae Herba and Scutellariae Barbatae Herba are classified as heat-clearing and detoxifying herbs. In the TCM theory of pathophysiology, deficiency of Qi and excess of toxic heat are the two most internal causes of CRC. What is more, deficiency of Qi usually leads to extra dampness, which is similar to the accumulation of metabolites in the body. us, the effects of the CHs were accorded with the TCM pathogenesis of CRC. Congruously, previous studies had demonstrated the benefits of the most CHs on CRC patients, including inhibiting side effects of chemotherapy and improving tumor response [38][39][40][41].
en, the bioactive compounds and putative targets of the CHs were collected. e overlapping targets between the CHs and CRC were considered as the potential therapeutic targets of CHs for CRC (Figure 2(b)). e functional enrichment analysis uncovered the therapeutic targets might mainly participate in the regulation of transcription, binding of DNA, reacting to inflammation, and regulation of apoptosis (Figure 3(a)). And the involved signaling pathways were mostly cancer-related, including microRNAs, TNF, apoptosis, PI3K-Akt, and p53 pathways (Figure 3(b)).
MicroRNAs (miRNAs) participate in tumorigenesis, progression, invasion, and drug resistance in cancers, including CRC. e activation of the PI3K/Akt pathway in cancer promotion had been widely learned, and miRNAs can affect CRC by regulating PI3K/Akt pathway in dual ways. For example, miRNA21 and miRNA200c could promote CRC by downregulating phosphatase and tensin homolog (PTEN), a negative regulator of PI3K, while miRNA106a and miRNA1 could suppress CRC by activating PTEN or directly blocking PI3K/Akt pathway [42,43]. Chronic inflammation is one of the characteristics of CRC. In the microenvironment of CRC, dense infiltrate of immune cells stimulates the secretion of proinflammatory cytokines, such as interleukin-(IL-) 6, IL-8, IL-1β, and TNF-α, which can synergistically activate signal transducer and activator of transcription 3 (STAT3), nuclear factor kappa-B (NF-κB), and hypoxia-inducible factor 1α (HIF-1α) pathways, resulting in progression of CRC [44,45]. As a well-known transcription factor and tumor suppresser, p53 protein drive cell apoptosis to avoid possibly cancer-inducing damaged DNA passing on to daughter cells. Not surprisingly, inactivation of the p53 pathway is often observed in CRC; binding of DNA to mutant p53 can amplify downstream protumor pathways [46].
Following, 15 compounds were recognized as the major bioactive compounds of the CHs in the treatment of CRC, with their importance in the herb-bioactive compoundtherapeutic target network (Figure 7 and Table 2). Most major bioactive compounds belong to the flavonoid family, including quercetin, wogonin, luteolin, kaempferol, 5,7,4′trihydroxy-8-methoxyflavone, formononetin, carthamidin, pinocembrin, and baicalein, all of which could be similarly used as chemopreventive agents, for their capacity of inducing cytotoxic apoptosis, anti-inflammation, antioxidation, and antiangiogenesis [47].
Quercetin was reported to induce cell cycle arrest and apoptosis, modulate estrogen receptors, alleviate oxidation, and inhibit angiogenesis in CRC [48]. Luteolin might suppress the proliferation and transformation of human CRC cells through antioxidation and miRNAs related pathways [49,50]. Kaempferol could stimulate apoptosis in CRC cells, in which p53 mediated caspases activation might play critical roles [51]. And kaempferol should have a synergistic effect with 5-fluorouracil (5-FU) in CRC cell lines through PI3K/Akt inactivation [52]. Wogonin was observed to stimulate apoptosis through facilitating cytoplasmic localization of p53 in vitro, and its effects of reducing tumor multiplicity and preserving colon length were verified in vivo [53]. Another study demonstrated wogonin might simultaneously induce autophagy, apoptosis, and cell cycle arrest via blocking PI3K/Akt and STAT3 pathways [54], which was similar to formononetin [55]. Moreover, pinocembrin could trigger Bax-dependent mitochondrial apoptosis in CRC cells [56,57]. Pachymic acid was also reported to induce cell cycle arrest and apoptosis in many kinds of digestive cancers [58][59][60][61].
Next, VEGFA, CASP3, MYC, CYP1Y1, and NFKB1 were identified as key targets of the CHs in the treatment of CRC, with their contributions in the PPI network ( Figure 4) and prognostic significance in CRC patients ( Figure 5). Angiogenesis is responsible for tumor vascularization degree, growth, and metastasis. VEGFA is acknowledged as a significant proangiogenic factor, which is frequently overexpressed in metastatic CRC patients [68]. Besides, miRNAs, such as miR-203, miR-497, and miR-26a, might resist CRC by targeting VEGFA [69,70]. CASP3 activation triggers anticancer apoptosis and could be used as a marker of Evidence-Based Complementary and Alternative Medicine apoptosis-targeted treatment response [71]. MYC is a family of transcriptional regulators, containing c-Myc, n-Myc, and l-Myc in mammals. Deregulation of c-Myc exists in 70% of CRC, which could partly explain genome instability, tumorigenesis, and other malignant behaviors of CRC cells [72,73]. CYP1A1 is used to be infamous for producing severe carcinogens. Although still controversial, CYP1A1 had been proposed more importance in detoxification, which might provide protective effects against some oral carcinogens in CRC patients [74]. NFKB1 belongs to NF-κB family, which consists of a class of transcription factors working as central regulators of inflammatory pathways, cell proliferation, and apoptosis [75]. Activation of NF-κB had been frequently observed in CRC patients, associated with worse outcomes [76]. Based on the above discussions, we could summarize VEGFA as an unfavorable factor, while CASP3, MYC, and CYP1Y1 as favorable factors in CRC, which was consistent with the results of survival analyses. In the contrary, our survival analysis showed that NFKB1 was correlated with longer DFS in CRC patients; for this, further researches are still needed.
In the end, the molecular docking simulation (Table 3 and Figure 6) showed a great majority of the 15 major bioactive compounds might bind firmly to VEGFA, CASP3, MYC, and especially CYP1Y1 proteins and thus afforded the possibility for them to exert their pharmacological activities. e molecular docking simulation strengthened all the above investigations to some extent, but experimental validations are still needed.

Conclusion
In summary, this study combined data mining and network pharmacology approach to identify ten the CHs, fifteen major bioactive compounds (quercetin, wogonin, luteolin, etc.), and five key therapeutic targets (VEGF, CASP3, MYC, CYP1Y1, and NFKB1) in the treatment of CRC. Additionally, the underlying mechanisms of action of the CHs might be apoptosis induction, transcription modulation, and inflammation suppression, and microRNAs, TNF, apoptosis, PI3K-Akt, and p53 signal pathways are participated. is study threw light on the anti-CRC mechanisms of the CHs in a holistic manner, which might provide novel insights for Figure 7: e herb-bioactive compound-therapeutic target network of the CHs. e hexagonal nodes represent the CHs, the triangle nodes represent the bioactive compounds, the circular nodes represent the therapeutic targets, and the edges represent the interactions among them. 12 Evidence-Based Complementary and Alternative Medicine therapeutic strategies and further studies. However, further experiments are still necessary.
Data Availability e information of the clinical studies included in this study is provided in Supplementary Table 1. All data that support the findings of this study are publicly available from the databases mentioned in the Materials and Methods section.

Conflicts of Interest
e authors declare that there are no conflicts of interest regarding the publication of this paper.