Research on core and effective formulae (CEF) not only summarizes traditional Chinese medicine (TCM) treatment experience but also helps to reveal the underlying knowledge in the formulation of a TCM prescription. In this paper, CEF discovery from tumor clinical data is discussed. The concepts of confidence, support, and effectiveness of the CEF are defined. A genetic algorithm (GA) is applied to find the CEF in a lung cancer dataset with 595 records from 161 patients. The search yielded 9 CEF with positive fitness values, comprising 15 distinct herbs. All of the CEF had relatively high average confidence and support. A herb-herb network was constructed, and it shows that all the herbs in the CEF are core herbs. The dataset was divided into a CEF group and a non-CEF group; the effective proportions of the former are significantly greater than those of the latter. A synergy index (SI) was defined to evaluate the interaction between two herbs. Four pairs of herbs had high SI values, indicating synergy between the herbs. All the results agreed with TCM theory, which demonstrates the feasibility of our approach.
Traditional Chinese medicine (TCM) has been developed and practiced in China for thousands of years, and herbal prescriptions have played a key role in medical treatment. A large number of herbal prescriptions have been recorded over the years, in which valuable TCM knowledge is hidden. It is urgent and critical to analyze these data so that TCM models can be developed to modernize this ancient knowledge. Although TCM is still in practice and more countries consider it as an alternative treatment method [
KDD allows TCM researchers to find interesting patterns efficiently, and they may direct further laboratory work that leads to discovery [
Generally, KDD research in TCM falls into two main categories. The first attempts to extend our understanding using existing TCM knowledge, while the second attempts to identify core knowledge from existing TCM data, so that each piece of extracted knowledge can be further validated using scientific evidence. This paper belongs to the latter category and, in particular, focuses on the study of TCM formulae from clinical data.
The efficacy of a formula can be interpreted as a collaboration of its member herbs. Most prescriptions are built on some relatively small fixed composition(s), which can be called a core formula (CF). Herbs are usually added to and/or subtracted from a CF in order to personalize the treatment. For example, although there are 113 prescriptions in one of the greatest TCM classics, “Shang Han Lun”, only 8 CFs exist, such as Gui zhi Tang, which forms the basis of Guizhi Jia Gui Tang, Guizhi Xinjia Tang, Gegeng Tang, and Dang gui Si ni Tang [
Research on CFs not only summarizes TCM treatment experience but also helps to reveal the underlying knowledge in the formulation of a TCM prescription. Several computational models were proposed in the past decade to mine the TCM formulae, such as factor analysis [
Integrated tumor treatment using Chinese and western medicine is getting standardized in China and has become an important method of prevention and treatment. Many clinical studies [
Therefore, a major goal of this paper is to discuss approaches and strategies for the discovery of core and effective formulae (CEF) in tumor clinical data. Genetic algorithm (GA) was applied, which is a search heuristic that mimics the process of natural evolution [
This paper is organized as follows. The Materials and Dataset section contains the process of the data acquisition and a description of the data. The Methods section is the methodological part of this paper. It contains definitions related to the assessment of CEF and a description of the genetic algorithm including the definition of fitness function. Complex network is presented in order to address the core herbs analysis in combination of prescriptions, and the analysis of herb-herb interactions is performed in the Results section.
The dataset used in this paper came from the inpatient lung cancer (LC) records of Shanghai Longhua Hospital of TCM. A total of 161 patients at different stages of LC (both early and metastatic) who received only TCM therapy were included. Their prescriptions and symptoms were recorded from February 2010 to August 2012. Traditional Chinese medical herbs were taken as decoctions, and fifteen LC symptoms were recorded: cough, expectoration, shortness of breath, chest tightness, chest pain, fatigue, loss of appetite, bloody sputum, dry mouth and throat, fever, spontaneous and night sweating, insomnia, diarrhea, nocturia, and five upset hot.
A 4-point response scale (0: not at all, 1: a little, 2: quite a bit, 3: very much) was used to indicate the severity of the symptoms. Because the efficacy of a prescription can only be known when the patient is seen again at the next consultation, only patients with multiple records (visits) were chosen, both to evaluate the efficacy of a prescription and to find the TCM treatment principles.
There were 595 transaction records for the 161 patients, with 1 to 9 visits per patient and an average of nearly 4 records per patient. Each record contains the symptom information and the corresponding prescription. The interval between two visits was one or two weeks, during which the patient took the same prescription. After excluding patients who had only one visit, 586 transaction records for 152 patients remained, in which a total of 230 different herbs were used. In the next stage, the symptom score in each record was calculated as follows:
An illustrative example for data format and SCV calculation is shown in Figure
An illustrative example for data format and SCV calculation.
After removing missing values, 419 SCVs for the 150 patients were obtained. According to the TCM theory, the criterion to be effective requires the SCV to be greater than or equal to 30% [
At the end of this step, 93 of the 419 records had a positive outcome, making it an imbalanced dataset with 22.2% of records being effective. Statistical information on the number of herbs is shown in Table
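The preprocessing described above can be sketched as follows. This is a minimal illustration, not the authors' code: the SCV formula is not reproduced in this text, so the sketch assumes SCV is the relative drop in total symptom score between one visit and the next, and the record layout and the function name `label_records` are hypothetical. Under that pairing assumption, 586 records from 152 patients yield at most 586 − 152 = 434 consecutive pairs, which is compatible with the 419 SCVs reported after removing missing values.

```python
from collections import defaultdict

EFFECTIVE_THRESHOLD = 0.30  # criterion stated in the text: SCV >= 30%


def label_records(visits):
    """Pair consecutive visits of each patient and label each pair.

    visits: iterable of (patient_id, visit_no, symptom_score, herbs).
    Returns a list of (patient_id, herbs, scv, effective) tuples.
    ASSUMPTION: SCV = (score_before - score_after) / score_before.
    """
    by_patient = defaultdict(list)
    for pid, visit_no, score, herbs in visits:
        by_patient[pid].append((visit_no, score, herbs))

    labeled = []
    for pid, rows in by_patient.items():
        if len(rows) < 2:
            continue  # single-visit patients cannot be evaluated
        rows.sort(key=lambda r: r[0])
        for (_, before, herbs), (_, after, _) in zip(rows, rows[1:]):
            if before is None or after is None or before == 0:
                continue  # missing or undefined SCV is dropped (assumption)
            scv = (before - after) / before
            labeled.append((pid, herbs, scv, scv >= EFFECTIVE_THRESHOLD))
    return labeled


# Toy data, not taken from the study.
visits = [
    ("p1", 1, 10, {"AR", "AF"}),
    ("p1", 2, 6, {"AR"}),
    ("p1", 3, 6, {"AR"}),
    ("p2", 1, 4, {"CS"}),  # single visit: excluded
]
labeled = label_records(visits)
```

Each labeled tuple attaches the outcome to the prescription taken before the follow-up visit, which mirrors the statement that efficacy is only known at the next consultation.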
Statistical information on the number of herbs.
| | Per record | Per patient | Average number per patient per record |
---|---|---|---|
Minimum | 9 | 9 | 9 |
Average | 23 | 29 | 22 |
Maximum | 36 | 73 | 33 |
Top 50 frequently used herbs.
Rank | Herb (record based) | Frequency (record based) | Herb (patient based) | Frequency (patient based) |
---|---|---|---|---|
1 | Chinese sage herb | 395 | Chinese sage herb | 147 |
2 | Doederlein's spikemoss herb | 393 | Doederlein's spikemoss herb | 146 |
3 | Akebia fruit | 359 | Akebia fruit | 133 |
4 | Herba | 332 | Herba | 129 |
5 | | 321 | | 127 |
6 | Rice-grain sprout | 268 | Astragalus root | 116 |
7 | Malt | 268 | | 107 |
8 | | 266 | Chicken gizzard membrane | 107 |
9 | Astragalus root | 263 | Rice-grain sprout | 106 |
10 | Chicken's gizzard-membrane | 252 | Malt | 106 |
11 | Common selfheal spike | 239 | Common selfheal spike | 104 |
12 | Rhizoma batatatis | 235 | Rhizoma batatatis | 97 |
13 | Coix seed | 223 | Coix seed | 96 |
14 | Tangerine peel | 195 | Coastal glehnia root | 86 |
15 | Coastal glehnia root | 195 | Oysters | 85 |
16 | Oysters | 183 | Tangerine peel | 80 |
17 | Rhizoma amorphophalli | 172 | Pericarpium trichosanthis | 79 |
18 | Pericarpium trichosanthis | 162 | Asparagus cochinchinensis | 72 |
19 | Asparagus cochinchinensis | 158 | Rhizoma amorphophalli | 68 |
20 | Edible tulip | 139 | Edible tulip | 64 |
21 | | 133 | | 62 |
22 | | 122 | Tatarian aster root and rhizome | 58 |
23 | Chinese date | 120 | Shorthorned epimedium herb | 58 |
24 | | 119 | Pilose asiabell root | 58 |
25 | Pilose asiabell root | 119 | | 57 |
26 | Tatarian aster root and rhizome | 112 | | 55 |
27 | Chinese taxillus herb | 111 | Chinese taxillus herb | 53 |
28 | Shorthorned epimedium herb | 109 | Chinese date | 53 |
29 | | 99 | | 51 |
30 | Baikal skullcap root | 98 | Heartleaf | 48 |
31 | Suberect spatholobus stem | 97 | Suberect spatholobus stem | 46 |
32 | Heartleaf | 95 | Glossy privet fruit | 46 |
33 | Glossy privet fruit | 91 | Baikal skullcap root | 45 |
34 | Noble | 89 | Balloon flower root | 43 |
35 | Chekiang fritillary bulb | 82 | Chekiang fritillary bulb | 42 |
36 | Paris polyphylla smith | 79 | | 40 |
37 | Almond | 78 | Paris polyphylla smith | 40 |
38 | | 76 | Barbary wolfberry fruit | 40 |
39 | Balloon flower root | 75 | Almond | 40 |
40 | Barbary wolfberry fruit | 73 | Noble | 39 |
41 | | 65 | Cherokee rose fruit | 35 |
42 | Cherokee rose fruit | 63 | Chinese dodder seed | 32 |
43 | Reed rhizome | 57 | | 31 |
44 | Toad skin | 57 | Reed rhizome | 30 |
45 | Radix | 55 | Toad skin | 30 |
46 | Fingered citron fruit | 55 | Common macrocarpa fruit | 29 |
47 | Chinese dodder seed | 54 | Immature bitter orange | 28 |
48 | Common macrocarpa fruit | 53 | Radix | 25 |
49 | Dragon's bones | 51 | Radish seed | 24 |
50 | Immature bitter orange | 51 | Dragon's bones | 24 |
The aim of this paper is to find the core and effective formula (CEF). The measure of effectiveness of a formula helps to determine the efficacy of the herbal interaction in TCM, while the coreness of the prescriptions helps us summarize the TCM treatment principle. The identification of CEF takes place in a high-dimensional search space of symptoms and herbs; hence, the discovery of CEF can be described as a complicated combinatorial optimization problem. The analytic process of this paper has three steps: recognizing and defining the problem, constructing and solving a model for the problem, and validating the obtained solutions.
The following sections discuss the different process steps in detail.
Our problem focuses on how to choose the best combination of herbs. The typical data format is shown in the accompanying figure. The problem is specified by four elements: a set of herbs, an outcome variable, herb domains, and an objective function.
Data format for combinatorial optimization problem of discovery of CEF.
The set of possible feasible combinations is
where
The higher the value
Constructing and solving a model for combinatorial optimization is a difficult task: in general, we start with a realistic, feasible solution and then optimize it iteratively. As a computational model of evolutionary processes, GA can solve nonparametric combinatorial optimization problems and, in contrast to most other algorithms, which find one solution at a time, it can find multiple Pareto-optimal solutions in parallel. This is compatible with TCM treatment, in which multiple formulae are applicable to a set of symptoms, that is, equifinality: there are many alternative ways of attaining the same objective. Using the previous definitions in Section
Flow chart of GA and explanations of the sequence of GA steps for the discovery of CEF.
The herb combination to be optimized is represented by a chromosome in which each herb is encoded as a binary gene according to the original herb space. Since there were 230 distinct herbs, the chromosome was a string of 230 binary characters, with the values “0” and “1” indicating the absence or presence of each herb in a prescription. A population, which consisted of a given number of chromosomes, was initially created by randomly assigning “1” to all genes with probability
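The encoding above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the initial selection probability is not given in this text, so `p_init=0.05` is a hypothetical value, the herb names are placeholders, and the function names are invented for this example. The population size of 1000 follows the GA parameter table.

```python
import random

N_HERBS = 230  # number of distinct herbs reported for the dataset


def random_chromosome(n_genes, p_init, rng):
    """Binary chromosome: gene i is 1 if herb i is in the prescription."""
    return [1 if rng.random() < p_init else 0 for _ in range(n_genes)]


def init_population(size, n_genes, p_init, rng):
    """Create `size` independent random chromosomes."""
    return [random_chromosome(n_genes, p_init, rng) for _ in range(size)]


def decode(chromosome, herb_names):
    """Map a binary chromosome back to the herb names it selects."""
    return [name for gene, name in zip(chromosome, herb_names) if gene]


rng = random.Random(0)
herb_names = [f"herb_{i}" for i in range(N_HERBS)]  # placeholder names
# p_init is a HYPOTHETICAL value; the paper's setting is not shown here.
population = init_population(size=1000, n_genes=N_HERBS, p_init=0.05, rng=rng)
first_formula = decode(population[0], herb_names)
```

A small `p_init` keeps the initial candidates short, which suits a search for compact core formulae, but the appropriate value depends on the fitness function.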
A crucial point in using GA is the design of the fitness function, which determines what the GA should optimize. The goal of this study is to find CEF, that is, a small subset of herbs that are frequently used and most significant for effectiveness. Fitness was measured by two criteria of CEF: one is coreness, which is represented by
After evaluating the fitness of the population, chromosomes were selected by tournament selection, which runs several “tournaments” among a few chromosomes chosen at random from the population. The winner of each tournament (the one with the best fitness) was more likely to be selected. Child chromosomes were then created from parent chromosomes by a multipoint crossover operator. After that, the chromosomes were mutated with a three-way swap of three randomly chosen genes, which could produce new chromosomes in the search space and sometimes lead to better results. In search terms, crossover helps to refine solutions toward a local optimum, while mutation helps to discover new and better optima.
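The three operators described above can be sketched as follows. This is a generic illustration rather than the Matlab toolbox routines used in the study: the function names are hypothetical, the tournament winner is chosen deterministically (a simplification of "more likely to be selected"), and the three-way swap is implemented as a cyclic rotation of the values at three random positions, which is one reading of the description.

```python
import random


def tournament_select(population, fitnesses, k, rng):
    """Pick k chromosomes at random and return the fittest of them."""
    contenders = rng.sample(range(len(population)), k)
    winner = max(contenders, key=lambda i: fitnesses[i])
    return population[winner]


def multipoint_crossover(a, b, n_points, rng):
    """Exchange alternating segments of two parents at n_points cut points."""
    cuts = sorted(rng.sample(range(1, len(a)), n_points))
    child1, child2 = [], []
    swap, start = False, 0
    for cut in cuts + [len(a)]:
        src1, src2 = (b, a) if swap else (a, b)
        child1.extend(src1[start:cut])
        child2.extend(src2[start:cut])
        swap, start = not swap, cut
    return child1, child2


def three_way_swap(chromosome, rng):
    """Rotate the gene values at three randomly chosen positions."""
    i, j, k = rng.sample(range(len(chromosome)), 3)
    mutated = list(chromosome)
    mutated[i], mutated[j], mutated[k] = chromosome[k], chromosome[i], chromosome[j]
    return mutated


rng = random.Random(1)
parent_a, parent_b = [0] * 10, [1] * 10
child1, child2 = multipoint_crossover(parent_a, parent_b, n_points=3, rng=rng)
mutant = three_way_swap([1, 0, 0, 1, 0, 1, 0, 0], rng)
best = tournament_select([[0], [1], [2]], [0.1, 0.9, 0.5], k=3, rng=rng)
```

Note that a swap-type mutation never changes the number of selected herbs in a chromosome; only crossover can change formula size in this sketch.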
GA is an iterative search method that approaches the optimal region but may not reach the optimal solution, so a termination condition is needed. Here, we terminated the GA process after a predefined number of generations. The chromosomes of the last generation with the highest value of
After finding optimal or near-optimal solutions (prescriptions), we had to evaluate them. Based on the meaning of CEF, solutions were evaluated on both coreness and effectiveness. In this study, the measurements of confidence and support serve as proxies for the coreness property of a formula. The definitions
The results in the following discussion were averaged over three executions using the same parameters. To compare the effect of changing the parameters on GA efficiency and results, we needed to fix all the parameters of the fitness function. The fixed configuration used for the fitness function is as follows:
Parameters for GA.
Parameter | Value |
---|---|
Population size | 1000 |
Initial herb selection probability | |
Crossover probability | 0.7 |
Tournament selection size | 15 |
Generation | 200 |
Parameter selection in GA.
As for the parameters of fitness function, in order to get CEF with the highest confidence,
Traditional indications and biological effects of herbs.
Herb | Abbreviation | Traditional indications# | Effects of cancer treatment | Reference |
---|---|---|---|---|
Astragalus root | AR | To reinforce | Immune stimulating effect. Improving quality of life for patients with nonsmall cell lung cancer. | [ |
Akebia fruit | AF | To regulate | Popularly used for primary liver cancer treatment in China. | [ |
Atractylis ovata | AO | To invigorate the function of the spleen and replenish | Antiangiogenic activity. Inhibiting the growth of B16 cancer cells. | [ |
Chinese date | CD | To tonify the spleen, replenish | Antiproliferative activity in human breast cancer cells. | [ |
Chinese sage herb | CS | To remove toxic heat and blood stasis and relieve pain. | Antiangiogenic activity. | [ |
Coix seed | CSE | To transform dampness and promote water metabolism, to strengthen the spleen, and to clear heat and eliminate pus. | Affecting cellular pathways in neoplasia: to inhibit NFkappaB and protein kinase C signaling. | [ |
Doederlein's spikemoss herb | DS | To remove toxic heat and dampness and to promote blood circulation and remove blood stasis. | Antiproliferative activity in three types of human cancer cells in vitro. | [ |
Herba | HO | To eliminate heat and toxic material, to promote blood circulation and remove blood stasis, and to clear dampness heat. | Antiproliferative activity in eight cancer cell lines. Strengthening the patient's resistance. | [ |
Malt | MA | To invigorate the function of the spleen, to regulate the function of the stomach, and to promote the flow of milk. | Proliferative function of colonic epithelial cells. | [ |
| | PC | To cause diuresis, to invigorate the spleen function, and to calm the mind. | Inhibiting the growth of nonsmall cell lung cancer cells. | [ |
| | PL | To induce diuresis, relieve dysuria, remove heat, and arrest bleeding. | Its active components: isomangiferin has capability of inhibiting virus replication within cells, and fumaric acid has chemopreventive potential for tobacco-nitrosamine-induced lung tumors. | [ |
| | PT | To remove damp and phlegm, to relieve nausea and vomiting, and to eliminate stuffiness in the chest and the epigastrium. | Antiproliferative activity in five cancer cell lines in vitro. | [ |
Rhizoma batatatis | RB | To replenish the spleen and stomach, to promote fluid secretion, and to benefit the lung. | Inhibiting the cancer cell line of melanoma B16 and Lewis lung cancer in mice in vivo. | [ |
Rice-grain sprout | RS | To promote digestion, invigorate the function of the spleen, and improve appetite. | Popularly used for strengthening function of the spleen and the stomach during cancer treatment in China. | [ |
Tangerine peel | TP | To regulate the flow of | Antioxidative and anti-inflammatory functions. Antiproliferative activity in human gastric cancer cells. | [ |
CEF obtained by GA.
No. | Number of herbs | Composition | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AR | AF | AO | CD | CS | CSE | DS | HO | MA | PC | PL | PT | RB | RS | TP | ||
1 | 10 | X | X | X | X | X | X | X | X | X | X | |||||
2 | 9 | X | X | X | X | X | X | X | X | X | ||||||
3 | 8 | X | X | X | X | X | X | X | X | |||||||
4 | 9 | X | X | X | X | X | X | X | X | X | ||||||
5 | 8 | X | X | X | X | X | X | X | X | |||||||
6 | 10 | X | X | X | X | X | X | X | X | X | X | |||||
7 | 9 | X | X | X | X | X | X | X | X | X | ||||||
8 | 11 | X | X | X | X | X | X | X | X | X | X | X | ||||
9 | 10 | X | X | X | X | X | X | X | X | X | X | |||||
6 | 8 | 9 | 1 | 5 | 2 | 7 | 4 | 4 | 7 | 1 | 9 | 9 | 4 | 8 |
Fitness value by GA with different
A herb-herb network was constructed using a cooccurrence frequency-based method. The degree of a node (herb) was defined as the number of other nodes (herbs) to which it is connected; it is a simple but important property of any complex network. A node with a higher degree plays a more significant role. The importance of a herb was studied according to its degree and its frequency in the dataset. These values were sorted in descending order and are shown in Table
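The degree computation described above can be sketched as follows. This is a minimal illustration assuming that two herbs are linked when they appear together in at least `min_cooccurrence` prescriptions; that threshold and the function name are assumptions, since the exact construction rule is not spelled out in this text. The herb abbreviations are used only as toy labels.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_degrees(prescriptions, min_cooccurrence=1):
    """Return (herb, degree) pairs sorted by descending degree.

    An edge links two herbs that co-occur in at least
    `min_cooccurrence` prescriptions (ASSUMED rule).
    """
    pair_counts = Counter()
    for herbs in prescriptions:
        for a, b in combinations(sorted(set(herbs)), 2):
            pair_counts[(a, b)] += 1

    neighbours = {}
    for (a, b), count in pair_counts.items():
        if count >= min_cooccurrence:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)

    degree = {herb: len(ns) for herb, ns in neighbours.items()}
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy prescriptions, not taken from the study.
prescriptions = [{"DS", "CS", "AF"}, {"DS", "CS"}, {"DS", "HO"}]
ranking = cooccurrence_degrees(prescriptions)
strict_ranking = cooccurrence_degrees(prescriptions, min_cooccurrence=2)
```

Raising the threshold prunes incidental pairings, so the degree ranking then reflects only habitual combinations.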
Core herb identification.
Herb | Degree | Degree rank | Frequency (record based) | Frequency rank (record based) | Frequency (patient based) | Frequency rank (patient based) |
---|---|---|---|---|---|---|
DS | 225 | 1 | 393 | 2 | 146 | 2 |
CS | 225 | 2 | 395 | 1 | 147 | 1 |
AF | 223 | 3 | 359 | 3 | 133 | 3 |
AO | 220 | 4 | 321 | 5 | 127 | 5 |
HO | 219 | 5 | 332 | 4 | 129 | 4 |
PC | 207 | 6 | 266 | 8 | 107 | 7 |
AR | 198 | 9 | 263 | 9 | 116 | 6 |
CSE | 197 | 10 | 223 | 13 | 96 | 13 |
RB | 194 | 11 | 235 | 12 | 97 | 12 |
RS | 191 | 12 | 268 | 6 | 106 | 9 |
MA | 191 | 13 | 268 | 7 | 106 | 10 |
TP | 184 | 16 | 195 | 14 | 80 | 16 |
CD | 158 | 30 | 120 | 23 | 53 | 28 |
PT | 152 | 34 | 99 | 29 | 51 | 29 |
PL | 127 | 47 | 65 | 41 | 31 | 43 |
Average confidence of the prescription (
Confidence and support of CEF.
No. | | | | | | PBS |
---|---|---|---|---|---|---|
1 | 0.549 | 0.396 | 0.248 | 0.084 | 0.021 | 0.040 |
2 | 0.707 | 0.499 | 0.239 | 0.041 | 0.041 | 0.067 |
3 | 0.598 | 0.375 | 0.198 | 0.043 | 0.043 | 0.073 |
4 | 0.653 | 0.356 | 0.191 | 0.050 | 0.050 | 0.073 |
5 | 0.675 | 0.489 | 0.294 | 0.084 | 0.084 | 0.120 |
6 | 0.616 | 0.461 | 0.265 | 0.122 | 0.036 | 0.060 |
7 | 0.670 | 0.406 | 0.196 | 0.041 | 0.041 | 0.067 |
8 | 0.678 | 0.501 | 0.310 | 0.167 | 0.041 | 0.067 |
9 | 0.637 | 0.511 | 0.332 | 0.181 | 0.041 | 0.067 |
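To make the record-based versus patient-based distinction concrete, here is a minimal sketch of the two support measures. It assumes that a record supports a CEF when its prescription contains every herb of the CEF and that a patient supports a CEF when at least one of their records does; the paper's exact definitions are not reproduced in this text, and the function names are hypothetical.

```python
def record_support(cef, records):
    """Fraction of records whose prescription contains every CEF herb."""
    cef = set(cef)
    hits = sum(1 for _, herbs in records if cef <= set(herbs))
    return hits / len(records)


def patient_support(cef, records):
    """Fraction of patients with at least one record containing the CEF."""
    cef = set(cef)
    patients = {pid for pid, _ in records}
    users = {pid for pid, herbs in records if cef <= set(herbs)}
    return len(users) / len(patients)


# Toy records as (patient_id, herbs); not taken from the study.
records = [
    ("p1", {"A", "B", "C"}),
    ("p1", {"A", "B"}),
    ("p2", {"A", "C"}),
    ("p3", {"B"}),
]
rec_sup = record_support({"A", "B"}, records)   # 2 of 4 records
pat_sup = patient_support({"A", "B"}, records)  # 1 of 3 patients
```

In the toy data the formula looks common by record count but is used by a single patient, which is the situation patient-based support is meant to expose.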
To test the effectiveness of CEF, the dataset was divided into two groups, namely, the CEF group and non-CEF group. In this study,
No. | EP of non-CEF group | EP of CEF group | |
---|---|---|---|
1 | 0.210 | 0.778 | 0.000 |
2 | 0.206 | 0.588 | 0.002 |
3 | 0.204 | 0.611 | 0.000 |
4 | 0.206 | 0.524 | 0.004 |
5 | 0.203 | 0.429 | 0.009 |
6 | 0.210 | 0.533 | 0.013 |
7 | 0.206 | 0.588 | 0.002 |
8 | 0.206 | 0.588 | 0.002 |
9 | 0.206 | 0.588 | 0.002 |
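The group comparison in the table above can be illustrated with a short sketch. Two things here are assumptions: the statistical test used in the paper is not shown in this text, so a one-sided Fisher's exact test is swapped in purely for illustration, and the counts (7 of 9 effective CEF records, 86 of 410 effective non-CEF records) are one combination back-calculated from the proportions reported for CEF 1 and the 93 effective records out of 419, not figures quoted by the authors.

```python
from math import comb


def effective_proportion(n_effective, n_total):
    """Share of records in a group with an effective outcome."""
    return n_effective / n_total


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    a: effective CEF records, b: ineffective CEF records,
    c: effective non-CEF records, d: ineffective non-CEF records.
    Returns P(X >= a) under the hypergeometric null distribution.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(
        comb(col1, x) * comb(n - col1, row1 - x) for x in range(a, upper + 1)
    )
    return tail / comb(n, row1)


# ILLUSTRATIVE counts consistent with the values reported for CEF 1.
ep_cef = effective_proportion(7, 9)
ep_non_cef = effective_proportion(86, 410)
p_value = fisher_one_sided(7, 2, 86, 324)
```

Because this test may differ from the one used in the study, the resulting p value should not be expected to match the table exactly.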
Sampling is a simple and well-known method for parameter studies and robustness evaluations [
Leave one (patient) out analysis to test the robustness of effectiveness (total 150 times).
No. | EP of non-CEF group (mean) | EP of non-CEF group (range) | EP of CEF group (mean) | EP of CEF group (range) | −log (mean) | −log (range) |
---|---|---|---|---|---|---|
1 | 0.209 | | 0.778 | | 4.215 | |
2 | 0.206 | | 0.589 | | 2.764 | |
3 | 0.204 | | 0.612 | | 3.273 | |
4 | 0.206 | | 0.524 | | 2.360 | |
5 | 0.203 | | 0.429 | | 2.036 | |
6 | 0.210 | | 0.533 | | 1.859 | |
7 | 0.206 | | 0.589 | | 2.764 | |
8 | 0.206 | | 0.589 | | 2.764 | |
9 | 0.206 | | 0.589 | | 2.764 | |
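The leave-one-(patient)-out procedure behind this table can be sketched as follows. It is a minimal illustration assuming that a record belongs to the CEF group when its prescription contains every herb of the CEF; the record layout and function names are hypothetical, and the toy data are not from the study.

```python
def group_eps(records, cef):
    """Return (EP of CEF group, EP of non-CEF group) for a set of records.

    records: list of (patient_id, herbs, effective) tuples.
    """
    cef = set(cef)
    in_cef = [eff for _, herbs, eff in records if cef <= set(herbs)]
    not_cef = [eff for _, herbs, eff in records if not cef <= set(herbs)]

    def ep(flags):
        return sum(flags) / len(flags) if flags else float("nan")

    return ep(in_cef), ep(not_cef)


def leave_one_patient_out(records, cef):
    """Recompute both EPs once per patient, with that patient removed."""
    patients = sorted({pid for pid, _, _ in records})
    return [
        group_eps([r for r in records if r[0] != left_out], cef)
        for left_out in patients
    ]


def summarize(values):
    """Mean and (min, max) range of a list of numbers."""
    return sum(values) / len(values), (min(values), max(values))


# Toy records, not taken from the study.
records = [
    ("p1", {"A", "B"}, True), ("p1", {"A", "B"}, True),
    ("p2", {"A", "B"}, False), ("p2", {"C"}, False),
    ("p3", {"C"}, True), ("p3", {"C"}, False),
]
results = leave_one_patient_out(records, {"A", "B"})
cef_mean, cef_range = summarize([cef_ep for cef_ep, _ in results])
```

With 150 patients this loop runs 150 times, matching the count in the table caption; a narrow range across runs indicates that no single patient drives the result.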
The GA process generated 9 CEF comprising 15 core herbs. Since the number of distinct herbs across all CEF was relatively small, we wanted to find out whether a CEF consisting of all 15 core herbs exists and, if so, to check its effectiveness. Such a combination of herbs was found in the dataset. Its coreness and effectiveness were evaluated (Table
| | | EP of non-CEF group | EP of CEF group | |
---|---|---|---|---|
0.605 | 0.009 | 0.217 | 0.750 | 0.014 |
A herb combination is chosen to promote desirable herb-herb interaction; the efficacy of a TCM formula comes from the synergistic effects of its constituent herb pairs. Therefore, practitioners are interested in identifying the potentially interacting herbs in a prescription. Based on the previous work [
Analysis of herb-herb interactions in CEF.
No. | Herb pair | | SI | |
---|---|---|---|---|
1 | PL | PT | 1.673 | 0.004 |
2 | CD | PT | 1.419 | 0.012 |
3 | CSE | PL | 1.363 | 0.028 |
4 | PT | TP | 1.077 | 0.025 |
Distribution of SI.
Prescribing for a diagnosis is a complicated and flexible procedure that integrates the knowledge of TCM theory. TCM practitioners place heavy emphasis on individuality when prescribing formulae in clinical practice. This is very different from modern western medical therapies, which usually comply with a common and operational clinical guideline. Revealing the regularity in prescriptions is an important step toward revealing the underpinning TCM theory, and discovering such regularity has attracted much research interest. Although computational models have been applied to reveal core herbs and herb-collaboration patterns, little effort has been devoted to studying their effectiveness. Discovering hidden patterns that are both core and effective herbal formulae is therefore a critical research task.
Mathematically, the discovery of CEF can be described as a complicated combinatorial optimization problem, which is concerned with the efficient combination of herbs to meet given requirements. The purpose of this study is to set the stage and outline the properties of optimization problems that are relevant to the discovery of CEF in TCM. We described how to define a problem model that can be solved by GA. In brief, the analytic process consisted of recognizing and defining the problem, constructing and solving models, and evaluating solutions. Furthermore, we looked at important properties of CEF, which could be used as validation criteria. For CEF, there are two key questions to be answered: how to evaluate the coreness of a TCM formula and how to assess its clinical effectiveness.
In this study, the measurements of confidence and support serve as proxies for the coreness property of a formula; the greater these two values are, the more widely the constituent herbs are used in the prescriptions and the more frequently the formula is used. The definitions
Regarding the assessment of clinical effectiveness, the primary outcome measurement in our study was to quantify information related to the symptoms changes in a cancer treatment. In an internal panel meeting of TCM cancer experts, the most common LC symptoms were identified and they were consistent with the literature [
GA has the ability to solve combinatorial optimization problems, which was reported by the literature [
In this study, we outlined how a genetic algorithm works. A crucial point in using GA is the design of the fitness function, which determines what the GA should optimize. We designed the fitness function based on two evaluation criteria of CEF: coreness, which is represented by the confidence and support defined in the present paper, and effectiveness, which is evaluated by the statistical difference in effective proportion between the CEF group and the non-CEF group. The proposed fitness function is flexible and suitable for both binary and continuous outcomes. To apply a penalty constant
Parameter tuning is always a challenging task for GA. The GA toolbox for Matlab developed by the University of Sheffield was used in these experiments. We implemented and ran the algorithm using different configurations and compared the results. The results show that some parameters need careful selection of settings, such as population size, generation, and
In particular, for multiple records data, which can be also regarded as longitudinal data, there are three types of correlation effects: (1) correlation between variables (herbs), (2) correlation within individual (patient), and (3) correlation between individuals (patients).
As for the research of TCM formula, the first one can be seen as the herb-herb relationship; such relationships are meaningful patterns of herb combination, which provokes many researchers to develop methods to uncover the underlying rules. For this purpose, support- and confidence-based association rules algorithms are generally introduced. Motivated by the idea of association algorithm, we presented the support- and confidence-based criteria (
The second correlation is hard to tackle and may undermine the evaluation of a herb combination. For example, when a CEF is used for only one patient who visits frequently, its support may be relatively high because of the large number of visits, yet such a CEF is meaningless. This disadvantage can be reduced by choosing a large sample size, and individual- (patient-) based support analysis can help to identify correlation within patients. In this paper, we reported patient-based support and carried out a robustness analysis of the CEF's effectiveness by the leave-one-(patient)-out method. The results showed no concentrated use of any CEF at the patient level, and the good robustness implied that the effectiveness evaluation was stable under small perturbations of the sample (at the patient level). Hence, correlation within patients did not undermine our evaluation of the effectiveness of CEF in this study, and our sample size was appropriate for discovering reliable solutions.
The last correlation is related to the individual’s factors, such as age, gender, pathology, family history, pulmonary function, and TCM syndrome. In order to reveal the relationship between patient pattern and CEF, another mathematical pattern recognition model needs to be established, which will be in our future work.
A total of 9 CEF were reported with good
In the theory of TCM, deficiency is the important cause and pathogenesis during the occurrence and development of tumor. Lack of vital
All these functions complement each other to treat the disease by addressing both its root cause and its symptoms. Moreover, because lung cancer is a consumptive disease, digestive function declines over time, so RS, MA, TP, and CD help to resolve food stagnation and promote herb absorption.
One interesting observation is the similarity among the CEF; this can help understand underlying TCM therapeutic principles for LC. Since it is fairly common for the doctors in the same hospital to use similar sets of herbs for the same disease (LC), it is necessary and beneficial to compare the results of CEF with an LC dataset from another hospital. It is also worthwhile to observe what CEF are discovered if a larger dataset with higher supports is used.
The herb-herb interactions in CEF were also studied and reported. Four herb pairs had high and significant SI values, indicating that they were synergistic. Some of them are present in classic TCM formulae. For example, PT and TP are in Ban xia Chen pi Tang, which contributes to relieving cough and reducing sputum.
Therefore, all the results conformed with TCM theory, which indicates the feasibility and validity of the proposal. However, dosage, which is a key aspect of CEF, was not considered in this work and should be addressed in future work. GA is capable of representing its chromosomes in real numbers, and a reformulation of the fitness function can accommodate this change. A mathematical model of dose-effect needs to be defined. This may increase the complexity of the definition of the fitness function, but the valuable results will make the effort worthwhile.
After the confidence, support, and effectiveness values related to a CEF were introduced, GA was used to discover the CEF from a TCM cancer clinical dataset. Results indicated that GA is suitable for the discovery of CEF that can be interpreted from the TCM principles. This is just an attempt and exploration of data mining to discover CEF from TCM clinical data. More work is still required to explore the strength, limitation, and appropriateness of the measures if they are relevant to other types of diseases.
The authors are grateful to the anonymous reviewers and the editors for their helpful comments and suggestions, which substantially improved the quality of this paper. This study was supported by the National Natural Science Foundation of China (81173226), 2013 of Chinese medicine industry special (201307006), Clinical Study of Traditional Chinese Medicine of Malignant Tumor Base, State Administration of Traditional Chinese Medicine, Chinese Medicine Cancer Diseases Key Disciplines, Longhua Medical Project, and China Scholarship Council (student no. 201206740061s). There is no conflict of interest involved in this paper.