dTGS: Method for Effective Components Identification from Traditional Chinese Medicine Formula and Mechanism Analysis

Because of the complexity of the components in Traditional Chinese Medicine formula (TCM formula), it is still a challenge to identify its effective components, to elucidate the mechanism of the components, and to discover the relationship between components and therapy objectives. In this paper, a method called directed TCM grammar systems (dTGS) for effective component identification was proposed using entity grammar systems (EGS) as the theoretical framework. The component-disease relationship of a TCM formula (i.e., Bai-Hu decoction plus Wasting-Thirsting formula, BHDWT) and one disease (i.e., type 2 diabetes mellitus) treated with it was studied, and the effective component groups (ECGs) were identified. 19 compounds were found acting on 20 proteins in type 2 diabetes mellitus (T2D) disease network, and 15 compounds were determined as the candidate effective components. Results indicated that this method can be used to identify the effective components and provide an innovative way to elucidate the molecular mechanism of TCM formulas.


Introduction
The components of Traditional Chinese Medicine formulas (TCM formulas) are very complex, and their molecular mechanisms are unclear. For the treatment of a given disease, some components may be favorable and others may not. Identifying the favorable components and analyzing their mechanisms of action will benefit the optimization of cultivation conditions, processing technology, and extraction processes, as well as new drug development. At present, experimental screening is still the main method for identifying effective components and effective component groups (ECGs). For example, high-performance liquid chromatography-mass spectrometry (HPLC-MS) was used to analyze the active constituents of Xiao-Xu-Ming decoction [1]; Drosophila transgenic models were used to identify drug combinations, such as suberoylanilide hydroxamic acid (SAHA) and geldanamycin, for the treatment of Huntington's disease [2]; and cell-based assay technology was used to screen two-component combinations for the treatment of cancer, infectious diseases, and CNS disorders [3]. However, the results obtained through experimental screening are limited by the complexity of the components and the high cost of experiments.
Recently, computational systems biology has been used to study TCM because of its technical advantages in studying large and complex systems and its relatively lower cost compared to experimental screening. The applications of complex network analysis techniques, in particular, have led to many new findings. For instance, microarray technology and connectivity maps were integrated into the research of molecular mechanisms of Si-Wu decoction (composed of four herbs: Radix Rehmanniae preparata, Radix Angelicae Sinensis, Rhizoma Ligustici Chuanxiong, and Radix Paeoniae Alba) [4]; a multilayer map of "Phenotype network-Biological network-Herb network" was applied to uncover the underlying network systems of TCM syndromes and herb formulas [5]; and a drug-target network was implemented to elucidate the mechanism of one TCM formula for the treatment of T2D [6]. Although those applications of computational systems biology in the study of TCM formulas are still in the exploratory stages, they demonstrate the feasibility of integrating the biological network and the experiences in traditional medicines for the analysis of TCM formulas.
Evidence-Based Complementary and Alternative Medicine
To date, graph theory has been the primary approach to network research [7]. A network graph composed of dozens or hundreds of nodes can be studied through visual inspection. However, it is not practical to analyze in this way a network containing a very large number of nodes or complex relations between them, even with the help of three-dimensional display techniques. Most methods developed for complex networks, such as the path-length method and the nodes-distribution method [8], focus on the topological structure instead of the specific relationships between nodes. Therefore, it is still challenging to study a complex disease network with intercrossing pathways and to understand the final effects of the components of TCM formulas based on biological signal pathways. In addition, those biological effects may be ambiguous (positive through one pathway but negative through another), and some proteins affected by the components of TCM formulas have not been identified as disease targets. To address these issues and to discover components with positive effects in TCM formulas, we propose a new approach called directed TCM grammar systems (dTGS), which identifies effective components of a TCM formula based on an entity grammar system (EGS). In dTGS, the TCM component-protein network and the disease network are viewed as grammar systems, and the ECGs can be identified through syntax rules. Bai-Hu decoction plus Wasting-Thirsting formula (BHDWT) was selected as an example in this paper to illustrate the basic idea of the method.

Definition of dTGS Model in the Framework of EGS.
EGS is a formal grammar system that aims at complex biological system modeling [9]. Because of its scalable feature, EGS has already been used to establish the flow graph models of chemical processes [10] and to illustrate the mechanisms of TCM [11]. The details for establishing a specific EGS were described in [9] and are briefly summarized here. An entity grammar system is a quintuple, $G = (V_N, V_T, F, P, S)$, where $V_N \cup V_T = V$, $V_N$ is a finite set of nonterminal symbols, $V_T$ is a finite set of terminal symbols, and $V_N \cap V_T = \Phi$; $F$ is a finite set of relations for $V$; $P$ is a set of rules to deduce relationships between entities; and $S$ is the starting entity. dTGS has the same structure as EGS: set $V$ contains different types of nodes (compounds, proteins, T2D, apoptosis, inflammation, etc.); set $F$ contains different types of relationships between adjacent nodes; set $P$ defines the rules to derive the relationship of nodes, as described by the following: $V_1$ is the set of the compounds in TCM formulas, $V_2$ is the set of proteins in the disease network on which the compounds in TCM formulas act directly, $V_3$ is the set of the remaining proteins in the disease network, and $V_4$ is the ...

Figure 1: Network used in dTGS for deduction. Red triangle node: chemical compound; blue circle node: protein; "→": one chemical component or one protein enables the expression of the next protein, raises the expression, or enhances the activity of the next protein; "⊣": one chemical component or one protein inhibits the expression of the next protein, lowers the expression, or weakens the activity of the next protein.
The modes of action between nodes include positive (pos) and negative (neg) effects. If we define the positive effect as "1" and the negative effect as "−1," the ultimate influence of an intervention depends on the product of the effects of each step along the whole signal pathway. For example, compound A in Figure 1 influences protein T through three paths. The effect of A on T is negative through the path "A-c-d-e-f-T" and the path "A-b-h-k-i-m-T." The effect through the third path, "A-g-h-k-i-m-T," is uncertain if the negative effect of the feedback path from m to g is taken into account. In this paper, we neglected the effects produced through feedback because feedback is expected to regulate the magnitude of an effect but not to alter its overall mode of action. Therefore, the effect through the third path is taken to be positive. After the effects of one compound on the ultimate node (i.e., T in Figure 1) through all signal pathways are determined, we can select the effective compounds based on the desired effect. If the desired effect on the ultimate node is positive, the compounds with positive effects through all pathways are selected as active components, and the compounds with negative effects through all pathways are ruled out. The components with both effects through different pathways need further analysis of their molecular mechanism. If their undesired effects can be countered by other compounds, they may also be selected as effective components to be used together with the countering compounds. The opposite analysis is done if the desired effect on the ultimate node is negative.

Consider $S_1$ to be the set of entities with structure cp(·, ·, ·) or pp(·, ·, ·) in the biological network of the disease, which form the background for deduction. $S_2$ is the set of labeled compounds or proteins, expressed by tag(·), and provides the initial conditions for deduction.
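The sign-product rule described above can be illustrated with a minimal sketch. The toy network below is hypothetical (it is not the network of Figure 1), and restricting the search to simple paths is one assumed way of neglecting feedback; the rules of dTGS themselves are not reproduced here.

```python
# Hypothetical toy network (NOT the network of Figure 1). Each edge carries
# +1 for activation ("→") or -1 for inhibition ("⊣").
EDGES = {
    ("A", "b"): +1,
    ("b", "T"): -1,
    ("A", "c"): -1,
    ("c", "d"): -1,
    ("d", "T"): +1,
    ("d", "c"): -1,  # feedback edge; never followed because paths are simple
}


def successors(node):
    """Return (next_node, edge_sign) pairs for all edges leaving `node`."""
    return [(dst, sign) for (src, dst), sign in EDGES.items() if src == node]


def path_effects(source, target):
    """Map every simple path from source to target to the product of its signs."""
    results = {}
    stack = [(source, (source,), 1)]
    while stack:
        node, path, sign = stack.pop()
        if node == target:
            results["-".join(path)] = sign
            continue
        for nxt, edge_sign in successors(node):
            if nxt not in path:  # no revisits: feedback loops are ignored
                stack.append((nxt, path + (nxt,), sign * edge_sign))
    return results


effects = path_effects("A", "T")
for path, sign in sorted(effects.items()):
    print(path, "positive" if sign > 0 else "negative")
```

In this toy example the compound reaches the target through one negative and one positive path, so it would fall into the group that needs further analysis.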

Data for Construction of Component-Protein Network of BHDWT Formula.
For decades, BHDWT has been used to treat T2D at the Beijing Guang-An-Men Hospital [12]. BHDWT has a positive effect on blood glucose control and symptom control for some patients in the early stages of T2D. The BHDWT formula consists of eight herbs, including gypsum, Anemarrhena asphodeloides Bunge, rehmannia dried rhizome, radix trichosanthis, Ophiopogon japonicus Ker Gawl, Coptis chinensis Franch, Scutellaria baicalensis Georgi, and Glycyrrhiza uralensis.
The components of the BHDWT formula came from the Traditional Chinese Medicines Database (TCMD) [13], the State Administration of Traditional Chinese Medicine Basic Information Database (http://dbshare.cintcm.com/ZhongYaoJiChu/), and A Handbook on the Analysis of the Active Composition in Traditional Chinese Medicine [14].
The compound-targeted proteins were derived from the STITCH system (http://stitch.embl.de/) [15]. Given the names or identifiers of the compounds or proteins of interest, STITCH returns a list of interacting proteins whose confidence score is at or above a user-specified threshold, up to a user-specified maximum number. The confidence score represents the likelihood of interaction between the entities. To obtain more general results, the required confidence score was set to any value higher than 0 and the maximum number of interacting entities was set to 500. Only the interacting entities with a clear mode of action (positive or negative) were chosen for further analysis.
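The selection criteria above (score higher than 0, at most 500 interacting entities, clear mode of action) can be sketched as a filter over already downloaded interaction records. The record layout, field names, and values below are hypothetical and are not the STITCH export format; the sketch only shows the filtering logic.

```python
# Hypothetical records: field names and values are illustrative only.
records = [
    {"compound": "c1", "protein": "P1", "score": 0.92, "mode": "activation"},
    {"compound": "c1", "protein": "P2", "score": 0.40, "mode": None},
    {"compound": "c2", "protein": "P3", "score": 0.15, "mode": "inhibition"},
    {"compound": "c2", "protein": "P4", "score": 0.0, "mode": "activation"},
]

MIN_SCORE = 0.0      # keep only scores strictly above this threshold
MAX_PARTNERS = 500   # cap on interacting entities retrieved per compound
SIGN = {"activation": +1, "inhibition": -1}  # clear modes of action


def select_edges(rows):
    """Return (compound, protein, sign) triples that pass all three criteria."""
    kept, retrieved = [], {}
    for row in sorted(rows, key=lambda r: r["score"], reverse=True):
        if row["score"] <= MIN_SCORE:
            continue
        retrieved[row["compound"]] = retrieved.get(row["compound"], 0) + 1
        if retrieved[row["compound"]] > MAX_PARTNERS:
            continue
        if row["mode"] in SIGN:  # drop interactions without a clear mode
            kept.append((row["compound"], row["protein"], SIGN[row["mode"]]))
    return kept


edges = select_edges(records)
print(edges)
```

The resulting signed triples have the same shape as the edges of a signed compound-protein network.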

Data for the Construction of T2D Biological Network.
To construct the T2D network, we used the data collected from the Kyoto Encyclopedia of Genes and Genomes (KEGG) and the Therapeutic Targets Database (TTD). KEGG lists the signal pathways related to T2D, and TTD lists the chemical components used to treat T2D. The biological network of T2D (Figure 3) was constructed using the signal pathways from KEGG, the chemical components from TTD, and the positive or negative relationships between targets and T2D from STITCH. The networks were visualized with the software Cytoscape [16].

The effects of each compound in BHDWT on T2D were derived using the rules $P_1$, $P_2$, $P_3$, and $P_4$ defined in Section 2.1.

The Effect of Chemical Components on T2D.
The ultimate effects of each compound of BHDWT on T2D can be found by combining Figures 2 and 3 through dTGS. We applied rule $P_6$ to each compound node in Figure 3 together with its linked protein. As a result, all the compounds in Figure 2 that have clear modes of action on their linked proteins were labeled. The proteins present in both Figures 2 and 3 were also labeled by applying $P_6$. Starting from each labeled protein and its connected node, the pathway describing their relationship can be extracted by applying $P_5$. The ultimate effects of those compounds on T2D (Figure 4) can then be derived through $P_1$, $P_2$, $P_3$, and $P_4$. In total, 45 compounds from 7 of the herbs (all except gypsum) showed effects on 61 proteins in the T2D biological network. Among these 45 compounds, 19 (additional file 2) have a clear mode of action (positive or negative) recorded in STITCH. Three kinds of effect were found: positive, negative, and bidirectional. The desired effect on T2D is negative. Among these 19 compounds, two (β-sitosterol and isoliquiritigenin) had a positive effect on T2D (i.e., a negative effect on the treatment of T2D) and four (scutellarin, catalpol, mangiferin, and acteoside) had negative effects on T2D (i.e., a positive effect on the treatment of T2D). The remaining 13 compounds showed bidirectional effects and are studied further in the next section.
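As a minimal sketch of this three-way grouping, assume each compound has already been reduced to the set of signs it produces on T2D across its pathways. The sign sets below follow the classes reported above for three of the compounds; the function name and labels are hypothetical.

```python
DESIRED = -1  # the desired effect on T2D is negative

# Sign sets consistent with the classes reported in the text; the individual
# pathways behind them are not reproduced here.
path_signs = {
    "beta-sitosterol": {+1},
    "scutellarin": {-1},
    "berberine": {+1, -1},
}


def classify(signs, desired=DESIRED):
    """Label a compound by comparing its pathway signs with the desired sign."""
    if not signs:
        raise ValueError("compound has no pathway to the disease node")
    if signs == {desired}:
        return "favorable"       # same sign as desired through all pathways
    if signs == {-desired}:
        return "unfavorable"     # opposite sign through all pathways
    return "bidirectional"       # mixed signs: needs subnetwork analysis


labels = {name: classify(signs) for name, signs in path_signs.items()}
print(labels)
```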

Extraction of the Subnet and Effective Components.
For each of the 13 bidirectional compounds, we extracted the subnetwork it affects in order to study its effects in more detail. The method of extracting the subnetwork has been explained in Section 2.1. The subnetworks showed that the negative effect of rutin and phenylacetic acid on T2D originates from feedback pathways, which are not expected to override the positive effect through direct pathways. Therefore, these two compounds were not considered as candidate effective components. Some other bidirectional components have a complex subnetwork. For example, berberine acts on six proteins. The derived subnetwork including all six proteins is too complex for further analysis (Figure 5). Therefore, we derived six subnetworks, each including one or two proteins interacting with berberine. Two of those subnetworks are shown in Figures 5(b) and 5(c). By analyzing these six subnetworks, we found that berberine's negative effect on T2D arises from the direct pathway. Finally, all 13 bidirectional components except rutin and phenylacetic acid were selected as candidate effective components. Together with the four negative compounds (i.e., scutellarin, catalpol, mangiferin, and acteoside), they form 15 candidate effective components, whose combinations are screened further in the next section.

Combination of Candidate Effective Components.
In this section, we determine which proteins each of the 15 candidate effective components affects. This helps us decide how to combine those components to achieve optimal results. According to the literature, insulin resistance and impaired insulin secretion are two major etiological factors of T2D [17], and β-cell apoptosis is considered one reason for the impaired insulin secretion [18]. Therefore, we divided the proteins into four categories. The first category is related only to insulin resistance; the second category is related only to apoptosis; the third category is related to both insulin resistance and apoptosis; the fourth category is related only to insulin secretion. The effects of each active component were then screened according to the category of proteins (Figure 6). For instance, mangiferin acts on an insulin resistance related protein (PPARa) and catalpol acts on an apoptosis related protein (BCL-2); hence, the combination of mangiferin and catalpol was predicted to treat T2D by ameliorating insulin resistance and inhibiting apoptosis. It is worth noting that Figure 6 lacks one protein listed in additional file 2, namely CASP9. The reason is that β-sitosterol, the only compound that enables CASP9, was excluded because of its positive effect on T2D. None of the other compounds in Figure 6 act on CASP9.
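The idea of combining components by the protein categories they cover can be sketched as a set union. Only the two assignments stated above (mangiferin and catalpol) are used; the category identifiers and the function name are hypothetical.

```python
# Assignments stated in the text; category identifiers are hypothetical labels.
component_categories = {
    "mangiferin": {"insulin_resistance"},  # acts on PPARa
    "catalpol": {"apoptosis"},             # acts on BCL-2
}


def covered_categories(combination):
    """Return the union of protein categories affected by a combination."""
    covered = set()
    for component in combination:
        covered |= component_categories[component]
    return covered


combo = covered_categories(["mangiferin", "catalpol"])
print(sorted(combo))
```

A combination covering more than one category is a candidate for acting on several etiological factors at once, which is the rationale given for pairing mangiferin with catalpol.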
Some of the findings revealed in Figure 6 are consistent with published studies on the treatment of T2D. Ferulic acid showed antidiabetic effects in experiments on diabetic mice [19]. Mangiferin exhibited the potential to improve blood lipids in T2D [20]. Baicalein was demonstrated to protect pancreatic beta-cells from apoptosis and to ameliorate hyperglycemia in a mouse model of T2D [21]. An in vitro experiment indicated that berberine can increase glucose consumption (GC) by more than 30% when its concentration is above $5 \times 10^{-6}$ mol/L; at the same concentration, berberine also depressed cell growth remarkably [22]. This finding is consistent with our analysis that berberine promotes cell apoptosis by promoting caspase 3 and inhibiting BCL-2.
The compounds that have multiple and opposing pathways in Figure 6 were still selected as candidate effective components to treat T2D because their unfavorable effects may not be dominant or may be counteracted by other compounds, as demonstrated in clinical practice. For example, some physicians in China have used berberine as an antihyperglycemic agent for many years [23]. Figure 6 also discloses information useful for designing or analyzing component combinations. Although hepatotoxicity or pancreotoxicity (typically resulting from enhanced cell apoptosis) induced by berberine has never been observed in clinics, Figure 6 indicates that any such toxicity may be counteracted by combining berberine with components that can inhibit cell apoptosis, such as catalpol, scutellarin, acteoside, and baicalein. In clinical practice, BHDWT was used to treat the early stages of T2D, when the main disease factor is insulin resistance [8]. This can be explained by several effective components acting on green nodes (i.e., proteins related to insulin resistance) in Figure 6. According to Figure 6, similar effects in suppressing insulin resistance can also be achieved by combinations of some of these effective components, such as (i) the combination of berberine and mangiferin, (ii) the combination of berberine and catalpol (or scutellarin or acteoside or baicalein), and (iii) the combination of berberine, mangiferin, and catalpol (or scutellarin or acteoside or baicalein). Some of these combinations have been validated by the work of other researchers: the combination of berberine and mangiferin was granted a patent [24], and a patent application has been filed for the combination of berberine and catalpol [25]. All of these results indicate that a TCM formula plays its role through the synergistic effects of multiple components.

Conclusions
This paper proposed dTGS as an innovative method to study TCM formulas. It integrates the research achievements from three fields: TCM chemistry, drug discovery, and network biology. The findings include the action trends of chemical components against one disease (T2D) and the active component combinations from the BHDWT formula. The method can also be applied to other TCM formulas to benefit research on the mechanisms of TCM formulas.
In addition, our work would benefit the development of fixed-dose combinations. Nowadays, drug combinations or fixed-dose combinations (FDCs) are widely used in the treatment of complex diseases because of the low cost and the clinical efficiency. TCM formulas, due to their characteristics of multicomponents, multitargets, and multipath effects, may embody some component combinations or combination principles beneficial to the design of drug combination. Our method provides a systemic approach to reveal those principles.
Last but not least, our method provides a novel idea for network analysis. It differs from the primary approach in network research (e.g., graph theory) in that we proposed a series of inference rules derived from the relationships between nodes and provided a new theoretical framework for analyzing complex networks. The feasibility of this theoretical framework was demonstrated by its success in identifying the effective component combinations in a TCM formula.