Coronary artery disease (CAD) is the leading cause of death in the world. The differentiation of syndrome (ZHENG) is the criterion for diagnosis and therapy in Traditional Chinese Medicine (TCM). Therefore, syndrome prediction
Coronary artery disease (CAD), a narrowing of the small blood vessels that supply the heart with blood, oxygen, and nutrients, is the most common cardiovascular disease (CVD) and the leading cause of death in the world. According to the latest World Health Organization reports, an estimated 17.3 million people died from CVD in 2008, representing 30% of all global deaths [
CAD is caused by many factors such as genetics, the environment, harmful use of alcohol, unhealthy diet, tobacco, and others. In western medicine, CAD is treated by surgical operation, pharmaceutical drugs, physical activity, and other interventional therapies. These treatments typically lead to good outcomes by decreasing rates of death related to CAD. However, they generally focus on the structure and function of the heart and ignore differences in systemic function, curative response, and the individual patient. Since western medicine faces problems such as high cost and significant side effects, Traditional Chinese Medicine (TCM) can be a complementary alternative that helps overcome these shortcomings. In TCM, CAD belongs to the scope of chest heartache and cardiodynia [
In order to achieve an effective and objective standard of syndrome prediction, many researchers have used a data mining approach to construct the classifier for the TCM dataset [
In recent years, there has been remarkable progress in the syndrome prediction of TCM. Research has focused on two aspects: feature selection (symptom selection) and syndrome prediction (syndrome classification). Jie et al. investigated syndrome factors of CAD by using the support vector machine (SVM) method on the basis of 15 typical medical records from prominent TCM doctors. Eight syndromes were drawn, including blood stasis, turbid phlegm, Qi deficiency, Yang insufficiency, Yin deficiency, inner heat, blood deficiency, and Qi stagnation [
Though many achievements have been made in syndrome prediction, there are still some problems left, which deserve discussion [
In this paper, 987 CAD cases were used for selecting related symptoms and building the prediction model of CAD syndrome. Based on symptoms, we propose a syndrome prediction method that integrates SVM feature selection and a Bayesian network classifier to improve the predictive performance of the classifier.
The rest of this paper is organized as follows. Section
In this paper, the cases were collected from 5 clinical centers in two provinces from June 2005 to October 2008, where patients who suffered from CAD were surveyed. Each patient was diagnosed by western medicine doctors by means of coronary artery angiography.
Inclusion criteria are as follows [ (1) Each case must have been diagnosed with CAD as defined by the American College of Cardiology (ACC) together with the American Heart Association (AHA) in 2002. (2) Each case was verified by coronary artery angiography as having at least one main branch of the coronary artery with stenosis larger than 70% or coronary artery left diameter stenosis greater than 50%. (3) Each case must have included an attached informed consent form signed by the patient. (4) Each patient was older than 35 years of age.
In western medicine, the diagnosis of patients was in accordance with the “Guidelines for the diagnosis and management of chronic angina pectoris, unstable angina pectoris, and non-ST elevation myocardial infarction” released by the ACC/AHA, and “Recommendation about Diagnosis of Diagnosing Unstable Angina Pectoris” released by the Chinese Society of Cardiology in 2000. In TCM, syndrome diagnosis was in accordance with the foundational theory of TCM. For example, the diagnosis of blood stasis was judged by “Standard of Blood Stasis Diagnosis” (1986.11, Guangzhou); the diagnosis of deficiency was based on “Standard of TCM Syndrome Differentiation of Deficiency” (1986.5); the diagnosis of turbid phlegm was decided by “Classification Code of TCM Diseases”; the others depended on the teaching materials (“
There were two exclusion criteria [ any patient with acute ST-segment elevation myocardial infarction, and any patient who also suffered from serious concomitant diseases such as liver or kidney disease.
Each symptom has four levels: none, light, middle, and severe. Each case was diagnosed as a syndrome by experienced TCM experts. Each symptom was considered a feature; the diagnosed syndrome was taken as a response.
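The mapping from graded symptoms to features can be illustrated with a minimal sketch. The integer codes 0 to 3, the function name `encode_case`, and the example case are assumptions made here for illustration; the source does not state how the four levels were coded.

```python
# Hypothetical ordinal coding of the four symptom levels (not given in the source).
LEVELS = {"none": 0, "light": 1, "middle": 2, "severe": 3}

def encode_case(symptoms, syndrome):
    """Turn one case into (feature_vector, response).

    symptoms: dict mapping symptom name -> level string; the key order
              defines the feature order.
    syndrome: the syndrome label assigned by the TCM experts.
    """
    features = [LEVELS[level] for level in symptoms.values()]
    return features, syndrome

# Invented example case.
case = {"chest pain": "severe", "palpitation": "light", "cough": "none"}
x, y = encode_case(case, "blood stasis")
print(x, y)
```

Treating the levels as ordered integers keeps the grading information; a one-hot coding would be an equally plausible choice.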
In total, we evaluated 1,008 patient cases, each including the diagnosis results of western medicine and TCM and over 100 symptoms of both western medicine and TCM. Data were compiled according to the characteristics of the syndromes of CAD, which are divided into sthenia and asthenia syndromes. According to the foundational theory and practice of TCM, sthenia syndromes include Qi stagnation, blood stasis, cold coagulation, phlegm turbidity, heat accumulation, water retention, and dampness pathogen; asthenia syndromes include Qi deficiency, blood deficiency, Yin deficiency, Yang deficiency, heart deficiency, liver deficiency, spleen deficiency, kidney deficiency, and lung deficiency.
Every case contained over 70 TCM diagnostic symptoms and over 30 lab-measured indexes from western medicine. The TCM diagnoses included Qi stagnation, blood stasis, cold coagulation, phlegm turbidity, heat accumulation, Qi deficiency, Yin deficiency, and so on. A histogram of syndromes of TCM diagnosis results is shown in Figure
Histogram of syndromes of TCM.
Missing data are inevitable in medical surveys. Cases were discarded if the missing-data rate of their symptoms exceeded 70%. Symptoms that could not be handled by data mining techniques were removed. A case was also discarded if its syndrome was not among the top six syndromes. Overall, there were 113 features including 78 TCM symptoms and 35 lab-measured indexes. Details of the symptoms are shown in Table
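A minimal sketch of the case-level filtering is given below. It assumes that the 70% rule applies per case, that missing values are stored as `None`, and that the missing-data filter runs before the six most frequent syndromes are determined; none of these details is confirmed by the source, and `filter_cases` is a hypothetical name.

```python
from collections import Counter

def filter_cases(cases, max_missing=0.70, top_k=6):
    """Drop cases with too many missing values, then keep only the
    cases whose syndrome is among the `top_k` most frequent ones.

    cases: list of (symptoms, syndrome); symptoms is a list in which
           a missing value is represented by None (an assumption).
    """
    kept = [
        (s, y) for s, y in cases
        if sum(v is None for v in s) / len(s) <= max_missing
    ]
    frequent = {label for label, _ in Counter(y for _, y in kept).most_common(top_k)}
    return [(s, y) for s, y in kept if y in frequent]

# Invented toy data; top_k=2 is used only to keep the example small.
cases = [
    ([1, None, 0, 2], "blood stasis"),
    ([None, None, None, 1], "blood stasis"),   # 75% missing, dropped
    ([2, 1, 0, 0], "blood stasis"),
    ([0, 1, 1, 0], "Qi deficiency"),
    ([2, 2, None, 1], "Qi deficiency"),
    ([3, 0, 0, 0], "cold coagulation"),        # rare syndrome, dropped
]
kept = filter_cases(cases, top_k=2)
print(len(kept))
```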
Symptom list.
Symptoms of comprehensive subset | | | | | |
Symptoms of TCM subset | | | | Symptoms of western medicine | |
(1) Chest pain | (21) Sighing | (41) Frothy sputum | (61) Red eye | (79) ST normal | (97) Ef |
(2) Oppression in chest | (22) Depression | (42) Pharyngeal foreign body | (62) Deep-colored eye weeks | (80) ST lower than | (98) A/e |
(3) Shortness of breath | (23) Inappetence | (43) Thirst without large fluid intake | (63) Eyelids swelling | (81) ST greater than 0.1 | (99) Wall motion |
(4) Palpitation | (24) Abdominal distension | (44) Tastelessness | (64) Dark red lip and gingivitis | (82) ST limb breast high | (100) Valve regurgitation |
(5) Cough | (25) Ruffian of epigastrium | (45) Bitter taste in mouth | (65) Light-colored lip and methyl | (83) ECG | (101) Regurgitant degree |
(6) Chilly sensation and the cold limbs | (26) Belching | (46) Sweet taste in mouth | (66) Deep-colored palate mucosa | (84) Q wave | (102) Leukocyte |
(7) Tiredness and fatigue | (27) Nausea and vomiting | (47) Salty taste in mouth | (67) Less abdominal pressure | (85) Frequent extrasystole | (103) Neutral % |
(8) Spontaneous sweating | (28) Loose stool | (48) Sticky and greasy sensation in mouth | (68) Lower extremity edema | (86) High left ventricular voltage | (104) Lymph % |
(9) Night sweating | (29) Constipation | (49) Morning diarrhea | (69) Faint low voice | (87) T wave | (105) Erythrocyte |
(10) Dysphoria with feverish sensation in chest, palms, and | (30) Soreness and weakness of waist and knees | (50) Powerless in defecation | (70) Atrophy | (88) Diameter of main root | (106) Hemoglobin |
(11) Dry eyes | (31) Frequent urination at night | (51) Deep-colored urine | (71) Tongue quality | (89) Main pulmonary | (107) Platelet |
(12) Dry mouth | (32) Limb numbness | (52) Clear urine in large amounts | (72) Patchy petechia and ecchymosis | (90) Left atrial dimension | (108) Fasting plasma glucose |
(13) Dizziness | (33) Heel pain | (53) Residual urine | (73) Tongue body | (91) Interventricular septum thickness | (109) TG |
(14) Amnesia | (34) Hemiplegic limbs | (54) Coldness in abdomen and waist | (74) Quality of tongue coating | (92) Pulsatile range | (110) TG |
(15) Vertigo | (35) Subcutaneous ecchymosis | (55) Heavy limbs | (75) Color of tongue coating | (93) End-diastolic diameter | (111) HDL |
(16) Tinnitus | (36) Rough skin | (56) Pale complexion | (76) Body fluid on tongue coating | (94) Systolic diameter | (112) LDL |
(17) Facial flush | (37) Obesity | (57) Suddenly white complexion | (77) Vein color | (95) Right ventricular diameter | (113) Fibrinogen |
(18) Insomnia | (38) White phlegm | (58) Darkish complexion | (78) Vein type | (96) Outflow tract | |
(19) Fussy temper and irascibility | (39) Yellow phlegm | (59) Sallow complexion | |||
(20) Distending pain in the hypochondria | (40) Blood in the sputum | (60) Flushing |
In general, syndrome prediction of CAD comprises a symptom selection phase and a syndrome prediction phase. In data mining terms, symptom selection is a feature selection problem and syndrome prediction is a supervised pattern classification problem. In the feature selection phase, mingling symptoms, that is, TCM symptoms together with western medicine symptoms, were selected as features of the syndrome prediction model. In the syndrome prediction phase, every case was classified as blood stasis, phlegm turbidity, Qi deficiency, Yin deficiency, Yang deficiency, or kidney deficiency by the syndrome prediction model.
Symptoms are essential for diagnosing CAD, for TCM doctors and western medicine doctors alike. A strong syndrome prediction model therefore rests on key symptoms. In this phase, we investigated which symptoms influence the predicted syndromes most. To identify the optimal symptom subset, we used two feature selection methods: SVM feature selection and Markov blanket feature selection.
SVMs are an established tool for data classification, with high accuracy and efficiency. The basic idea is to map data into a high dimensional space and find a separating hyperplane with the maximal margin [
Combined with a penalty function or an optimization objective, SVMs can be used to select appropriate features or optimal feature groups. As for the feature selection problem, there are two alternative situations [
Two types of methods can be distinguished for solving this problem: filter methods and wrapper methods [
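For reference, the maximal-margin idea is usually written as the soft-margin primal problem below. This is the standard textbook form and not necessarily the notation of the original paper; the symbols $x_i$, $y_i \in \{-1, +1\}$, $w$, $b$, $\xi_i$, $C$, the mapping $\phi$, and the number of training cases $l$ are introduced here for illustration.

```latex
\min_{w,\,b,\,\xi}\quad \frac{1}{2}\,w^{\top}w + C\sum_{i=1}^{l}\xi_i
\qquad\text{subject to}\qquad
y_i\bigl(w^{\top}\phi(x_i)+b\bigr)\ \ge\ 1-\xi_i,\quad \xi_i\ \ge\ 0,\quad i=1,\dots,l .
```

The margin between the two classes is $2/\lVert w\rVert$, so minimizing $w^{\top}w$ maximizes the margin, while $C$ controls the penalty on cases that violate it.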
Several existing strategies have been combined with SVM for feature selection. Given training vectors
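We selected features with high scores by the following procedure.
(1) Calculate the score of every feature.
(2) Pick possible thresholds as cutoffs for the scores.
(3) For each threshold, complete the following: (a) drop features with scores below the threshold; (b) randomly split the training data into a training part and a validation part; (c) let the training part be the new training data, train the SVM on it, and predict the validation part; (d) repeat the steps above five times and then calculate the average validation error.
(4) Choose the threshold with the lowest average validation error.
(5) Drop features with scores below the chosen threshold.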
Finally, the features with efficient prediction power were selected.
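The threshold-selection loop described above can be sketched as follows. This is only an illustration under stated assumptions: the per-feature scores are taken as given (hypothetical values), the split is half and half, and a nearest-centroid classifier stands in for the SVM so that the example needs no external library. The names `centroid_error` and `select_threshold` are invented here.

```python
import random

def centroid_error(train, valid):
    """Stand-in classifier (NOT an SVM): assign each validation case to the
    nearest class centroid of the training part and return the error rate."""
    sums, counts = {}, {}
    for x, label in train:
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    cents = {c: [v / counts[c] for v in acc] for c, acc in sums.items()}

    def predict(x):
        return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, cents[c])))

    return sum(predict(x) != label for x, label in valid) / len(valid)

def select_threshold(X, y, scores, thresholds, repeats=5, seed=0):
    """Return (average validation error, threshold, kept feature indices)
    for the threshold with the lowest average validation error."""
    rng = random.Random(seed)
    best = None
    for t in thresholds:
        keep = [j for j, s in enumerate(scores) if s >= t]  # drop low-score features
        if not keep:
            continue
        data = [([row[j] for j in keep], label) for row, label in zip(X, y)]
        errors = []
        for _ in range(repeats):
            rng.shuffle(data)                 # random split into two halves
            cut = len(data) // 2
            errors.append(centroid_error(data[:cut], data[cut:]))
        avg = sum(errors) / repeats
        if best is None or avg < best[0]:
            best = (avg, t, keep)
    return best

# Invented toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0, 5], [0, 1], [0, 4], [0, 2], [3, 5], [3, 1], [3, 4], [3, 2]] * 3
y = ["A", "A", "A", "A", "B", "B", "B", "B"] * 3
scores = [0.9, 0.1]  # hypothetical per-feature scores
avg_error, threshold, kept = select_threshold(X, y, scores, thresholds=[0.05, 0.5])
print(avg_error, threshold, kept)
```

Replacing `centroid_error` with a function that trains and evaluates an SVM would bring the sketch closer to the procedure in the text.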
For comparison with SVM feature selection, we also implemented Markov blanket feature selection, which was first proposed by Koller and Sahami in 1996 [
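As background, the Markov blanket of a node in a directed acyclic graph consists of its parents, its children, and the other parents of its children; given these, the node is independent of all remaining variables. The sketch below only computes this set for a known graph. It is not the Koller and Sahami selection procedure itself, and the toy graph and the name `markov_blanket` are assumptions made for illustration.

```python
def markov_blanket(parents, target):
    """Markov blanket of `target` in a DAG.

    parents: dict mapping every node -> set of its parent nodes.
    Returns parents, children, and the children's other parents.
    """
    children = {n for n, ps in parents.items() if target in ps}
    spouses = {p for c in children for p in parents[c]}
    return (set(parents.get(target, ())) | children | spouses) - {target}

# Hypothetical toy network, not learned from the CAD data.
dag = {
    "syndrome": set(),
    "age": set(),
    "chest pain": {"syndrome"},
    "palpitation": {"syndrome", "age"},
    "cough": {"chest pain"},
}
mb = markov_blanket(dag, "syndrome")
print(sorted(mb))
```

In this toy graph, "cough" lies outside the blanket of "syndrome", so it would carry no extra information about the syndrome once the blanket variables are known.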
Syndrome prediction is important for doctors. In this study we present a Bayesian network framework to construct a high-confidence syndrome predictor by integrating a comprehensive list of mingling symptoms. Syndrome prediction is a classification problem, a basic task in data analysis and pattern recognition that requires construction of a classifier, that is, a function that assigns a class label to instances described by a set of features [
A Bayesian network, one of the most effective classification methods for graphically representing and processing feature interdependencies, represents a joint probability distribution over a dataset [
Bayesian network learning algorithms differ mainly in the way in which they search through the space of possible networks. The search involves two steps: model evaluation and model optimization. There are many model evaluation methods, such as Akaike Information Criterion (AIC), Minimum Description Length (MDL), and Cross-Validation Likelihood (CVL). In this paper, we adopted a simple estimator [
For model optimization, we adopted K2, a simple and very fast learning algorithm that starts with a given ordering of the features. It then processes each node in turn and greedily considers adding edges from previously processed nodes to the current one. In each step it adds the edge that maximizes the network’s score. When there is no further improvement, attention turns to the next node [
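In standard notation, which is assumed here because the paper's own formulas are not preserved, a Bayesian network over features $X_1,\dots,X_n$ and a class variable $C$ factorizes the joint distribution over the graph, and classification picks the most probable class:

```latex
P(C, X_1,\dots,X_n) \;=\; P\bigl(C \mid \mathrm{Pa}(C)\bigr)\,\prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr),
\qquad
\hat{c} \;=\; \arg\max_{c}\; P(c \mid x_1,\dots,x_n) \;=\; \arg\max_{c}\; P(c, x_1,\dots,x_n).
```

Here $\mathrm{Pa}(\cdot)$ denotes the set of parents of a node in the network; Naïve Bayes is the special case in which $C$ has no parents and every feature has $C$ as its only parent.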
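The greedy search described above can be sketched as follows. The scoring function is an assumption: the sketch uses the log of the Cooper-Herskovits (K2) metric, whereas the source does not state which score was used. The parent limit `max_parents`, the function names, and the toy data are also hypothetical.

```python
from collections import Counter
from math import lgamma

def k2_log_score(data, child, parents, arity):
    """Log Cooper-Herskovits score of `child` given `parents` (assumed metric).

    data: list of rows indexed by node id; arity[node] = number of states.
    """
    r = arity[child]
    n_ijk = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    n_ij = Counter()
    for (config, _), n in n_ijk.items():
        n_ij[config] += n
    score = sum(lgamma(r) - lgamma(n + r) for n in n_ij.values())
    return score + sum(lgamma(n + 1) for n in n_ijk.values())

def k2(data, order, arity, max_parents=2, score=k2_log_score):
    """Greedy K2 search; returns a dict node -> list of chosen parents."""
    structure = {}
    for pos, node in enumerate(order):
        parents = []
        best = score(data, node, parents, arity)
        candidates = list(order[:pos])        # only previously processed nodes
        while candidates and len(parents) < max_parents:
            trials = [(score(data, node, parents + [c], arity), c) for c in candidates]
            new_best, choice = max(trials, key=lambda t: t[0])
            if new_best <= best:              # no further improvement
                break
            best = new_best
            parents.append(choice)
            candidates.remove(choice)
        structure[node] = parents
    return structure

# Invented toy data: column 1 copies column 0, column 2 is independent of both.
data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)] * 5
structure = k2(data, order=[0, 1, 2], arity={0: 2, 1: 2, 2: 2})
print(structure)
```

The result depends on the given ordering, since a node can only receive parents that precede it.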
Symptoms are selected to reduce the dimensionality of the data used in predicting syndromes of CAD and to find the most relevant symptom subsets, thereby improving the precision of syndrome prediction. In this experiment, the data were grouped into three subsets: the TCM subset, the western medicine subset, and the comprehensive subset. Every case was labeled with asthenia, sthenia, or mingling syndrome. We collected 78 TCM symptoms in the TCM subset, 35 lab-measured indexes in the western medicine subset, and 113 mingling symptoms in the comprehensive subset. We quantitatively assessed the relevance of each feature for syndrome prediction by SVM feature selection on the basis of tenfold cross-validation tests. The symptom rankings obtained by SVM feature selection for the three subsets are shown in Table
Ranked symptoms by means of SVM feature selection.
Dataset | Ranked list of symptom numbers |
---|---|
TCM | 75, 8, 73, 52, 36, 50, 22, 54, 40, 31, 13, 26, 30, 42, 23, 74, 71, 6, 49, 27, 7, 25, 78, 11, 20, 35, 4, 60, 34, 65, 10, 72, 33, 32, 59, 63, 9, 3, 67, 61, 57, 17, 18, 66, 64, 43, 5, 45, 76, 19, 38, 77, 16, 24, 2, 28, 14, 44, 62, 56, 70, 55, 1, 68, 53, 29, |
WM | 17, 27, 26, 30, 13, 20, 18, 15, 11, 29, 16, 14, 12, 10, 7, 35, 33, 24, 22, 31, 5, 28, 34, 25, 19, 23, 4, 9, 32, 8, 3, 6, 1, 2, 21 |
Comprehensive | 95, 71, 102, 108, 92, 78, 107, 101, 73, 7, 97, 40, 27, 8, 82, 22, 85, 75, 31, 23, 74, 109, 103, 42, 30, 5, 10, 35, 106, 50, 6, 52, 65, 11, 57, 20, 89, 18, 13, 81, 113, 111, 79, 77, 36, 54, 9, 104, 67, 60, 44, 25, 72, 64, 83, 16, 3, 59, 24, 32, 21, 49, |
The performance of symptom selection was estimated by the classifier. In this experiment, we adopted seven classifiers: Naïve Bayes, Bayesian network, C4.5, Logistic, RBF Network, SMOSVM, and KNN. These seven classifiers are implemented in Weka [
The relationships between the AUC and symptom number in TCM subset are shown in Figure
Relationship between AUC and symptom number in the TCM subset.
Relationship between AUC and symptom number in the western medicine subset.
Relationship between AUC and symptom number in the comprehensive subset.
For comparison with SVM feature selection, we also applied the Markov blanket method, which is known to perform well in feature selection. After Markov blanket feature selection, we observed 28 symptoms in the TCM subset, 10 in the western medicine subset, and 35 in the comprehensive subset. We selected the top 25, 10, and 35 symptoms from the ranked lists of the three subsets. These results are shown in Figure
Comparative results of weighted AUC by using SVM and Markov blanket methods.
TCM subset
WM subset
Comprehensive subset
In all results, the optimum feature subset is essential to predict syndromes of CAD. From Figures
Comparative results of syndrome prediction using the Bayesian network classifier.
All 35 symptoms above were collected for predicting syndromes of CAD. According to the foundational theory of TCM, sthenia can be divided into Qi stagnation, blood stasis, cold coagulation, phlegm turbidity, heat accumulation, water retention, and dampness pathogen, while asthenia can be divided into Qi deficiency, blood deficiency, Yin deficiency, Yang deficiency, heart deficiency, liver deficiency, spleen deficiency, kidney deficiency, and lung deficiency. In this paper, we constructed syndrome prediction models for six syndromes: blood stasis, phlegm turbidity, Qi deficiency, Yin deficiency, Yang deficiency, and kidney deficiency. On the dataset with the optimum symptoms, a prediction model of the Bayesian network was built as described in Section
Results of syndrome prediction based on Bayesian network.
Syndrome | Weighted precision | Weighted recall | Weighted F-measure | Weighted AUC |
---|---|---|---|---|
Blood stasis | 0.763 | 0.761 | 0.762 | 0.811 |
Phlegm turbidity | 0.740 | 0.746 | 0.742 | 0.791 |
Qi deficiency | 0.750 | 0.747 | 0.748 | 0.766 |
Yin deficiency | 0.656 | 0.663 | 0.640 | 0.589 |
Yang deficiency | 0.926 | 0.926 | 0.926 | 0.946 |
Kidney deficiency | 0.735 | 0.728 | 0.731 | 0.766 |
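Weighted scores of this kind are typically computed per class and then averaged with weights proportional to the number of cases in each class. The sketch below assumes this support-weighted averaging and includes an F-measure alongside precision and recall; the confusion matrix is invented and the name `weighted_metrics` is hypothetical.

```python
def weighted_metrics(confusion):
    """Support-weighted precision, recall, and F-measure.

    confusion[i][j] = number of cases of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(map(sum, confusion))
    w_prec = w_rec = w_f = 0.0
    for c in range(n):
        tp = confusion[c][c]
        support = sum(confusion[c])                        # true cases of class c
        predicted = sum(confusion[r][c] for r in range(n))  # cases predicted as c
        prec = tp / predicted if predicted else 0.0
        rec = tp / support if support else 0.0
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = support / total
        w_prec += weight * prec
        w_rec += weight * rec
        w_f += weight * f
    return w_prec, w_rec, w_f

# Invented two-class confusion matrix.
p, r, f = weighted_metrics([[8, 2], [1, 9]])
print(round(p, 3), round(r, 3), round(f, 3))
```

Under this averaging, the weighted recall equals the overall accuracy, and the weighted F-measure is an average of per-class values rather than the harmonic mean of the weighted precision and recall.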
We compared the Bayesian network predictor with four other methods: C4.5, Logistic, Naïve Bayes, and RBF network. All five methods were run in Weka with default parameters. ROC curve analyses were used to estimate the performance of the five classifiers. Comparative results are shown in Figure
Comparative results of syndrome prediction with five classifiers.
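The area under the ROC curve used in such comparisons equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it from this pairwise definition and, as an assumption about how a weighted AUC is formed, averages one-vs-rest AUCs weighted by class frequency. The scores and labels are invented, and every class is assumed to occur at least once without covering all cases.

```python
def auc(scores, labels):
    """Binary AUC: fraction of (positive, negative) pairs ranked correctly,
    with ties counted as one half. labels are 1 (positive) or 0 (negative)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def weighted_auc(prob_rows, labels, classes):
    """One-vs-rest AUC per class, averaged with class-frequency weights.

    prob_rows[i][k] = predicted probability of classes[k] for case i.
    """
    total = len(labels)
    result = 0.0
    for k, c in enumerate(classes):
        binary = [1 if l == c else 0 for l in labels]
        result += binary.count(1) / total * auc([row[k] for row in prob_rows], binary)
    return result

# Invented scores for five cases.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 1, 0]
binary_auc = auc(scores, labels)
two_class = weighted_auc([[1 - s, s] for s in scores], labels, classes=[0, 1])
print(binary_auc, two_class)
```

The pairwise form is quadratic in the number of cases, which is acceptable for datasets of about a thousand cases; a rank-based formula gives the same value faster.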
Figure
In this paper, we attempted to predict patient syndromes with a prediction model built on the related symptoms in TCM and western medicine. Instead of using all of the symptoms in diagnosis, SVM feature selection can be used to select 35 of the 113 symptoms by assessing their predictive power for syndrome prediction. The prediction process with feature selection achieved better predictive performance. In addition, feature selection reduced the dimensionality of the dataset, so that the complexity of the syndrome predictor was decreased. The 35-symptom subset is significant for diagnosis in clinical practice. A Bayesian network was employed to construct the prediction models of six syndromes for CAD in TCM. It resulted in better performance than the four other classifiers by means of ROC curve analyses without affecting the distribution of classes. We conclude that our methods may be used for predicting the syndromes of CAD. Further research is under way to incorporate doctors’ experience and knowledge into the construction of the Bayesian network structure.
P. Lu, J. Chen, and H. Zhao contributed equally to this work.
This work was supported by the National Basic Research Program of China (973 Program) under Grant no. 2011CB505106, the Creation for Significant New Drugs under Grant no. 2009ZX09502-018, the International Science and Technology Cooperation of China under Grant no. 2008DFA30610, the National Science Foundation of China under Grant nos. 81173463, 30902020, and 81102730, the New Century Excellent Talent Support Plan of the Ministry of Education under Grant no. NCET-11-0607, the Beijing Science and Technology Star under Grant no. 2011069, the Beijing Common special construction projects, and the Foundation of Beijing University of Chinese Medicine of Education Ministry of China under Grant nos. 2011-CXTD-06 and 2011JYBZZ-JS090.