Analysis and Recognition of Traditional Chinese Medicine Pulse Based on the Hilbert-Huang Transform and Random Forest in Patients with Coronary Heart Disease

Objective. This research provides objective and quantitative parameters of traditional Chinese medicine (TCM) pulse conditions for distinguishing between patients with coronary heart disease (CHD) and normal people by using a classification approach based on the Hilbert-Huang transform (HHT) and random forest. Methods. Energy and sample entropy features were extracted by applying the HHT to TCM pulse signals, which were treated as time series. The two types of features and their combination were each used as input to a random forest classifier to establish a classification model. Results. Statistical results showed that there were significant differences in the pulse energy and sample entropy between the CHD group and the normal group. Moreover, when the energy features, sample entropy features, and their combination were used as pulse feature vectors, the corresponding average recognition rates were 84%, 76.35%, and 90.21%, respectively. Conclusion. The proposed approach could be appropriately used to analyze pulses of patients with CHD, which can lay a foundation for research on objective and quantitative criteria for disease diagnosis or Zheng differentiation.


Introduction
Traditional Chinese medicine (TCM) is an ancient medical practice system which emphasizes regulating the integrity of the human body and its interrelationship with natural environments [1]. Zheng (meaning syndrome or pattern) is a unique TCM concept. It is the overall physiological and/or pathological pattern of the human body in response to given internal and external conditions, and it is usually an abstraction of internal disharmony defined by a comprehensive analysis of the clinical symptoms and signs gathered by a practitioner using inspection, auscultation, olfaction, interrogation, and palpation of the pulses [2]. Chinese practitioners diagnose diseases through "Zheng differentiation." The Zheng differentiation of TCM considers the etiology, location, nature, and condition of a disease during a specific stage of the disease process based on clinical symptoms and signs. Pulse taking is one of the key methods by which a practitioner gathers the signs and symptoms of patients. During pulse taking, TCM practitioners place their fingers on the radial artery, from which various physiological and pathological conditions can be detected. Traditional pulse taking has important clinical value for the diagnosis and prognosis of diseases, especially angiocardiopathy. However, accurate pulse taking can only be done by TCM practitioners with years of experience. Therefore, objective and quantitative pulse diagnosis is highly desirable and would help to establish objective and quantitative criteria for disease diagnosis or Zheng differentiation. Coronary heart disease (CHD) is considered a primary cause of death in developed countries and is predicted to be one of the most common causes of death worldwide by 2020 [3]. Early diagnosis and prevention of CHD are essential and have critical public health implications. Identifying vascular lesions in their early stages, so as to reverse and prevent CHD, stroke, sudden death, and other malignant vascular events, is crucial [4].
Previous studies have been conducted to clarify and identify subclinical vascular disease. Thus, a noninvasive, convenient, and efficient method should be developed to detect vascular lesions. In TCM, visceral pathological changes and other information can be obtained by detecting pulses. For instance, pathological changes in CHD are clearly reflected in pulse diagnosis information. Therefore, CHD could be considered a breakthrough point for providing a theoretical basis for investigating the TCM pulse.
Modern medical research has shown that the arterial pulse is caused by heart contraction. The left ventricle ejects blood into the aorta through the aortic valve, causing the velocity, pressure, and diameter in the arterial tree to pulsate [5]. The signal acquired from the radial artery is the comprehensive reflection of the wave form (shape), velocity (fast or slow), period (rhythm), and swing (intensity) of pulse waves in the radial artery [6]. Pulse waves contain human physiological and pathological information, and pulse diagnosis is conducted to investigate this information. Thus, pressure pulse charts, which show pulses in TCM, can be used noninvasively and conveniently to provide insight into visceral diseases, particularly cardiovascular diseases such as CHD. However, the judgment and description of pulse in TCM are subjective and rely on the experience of doctors, which restricts the clinical application and development of this technique. Thus, further study should be conducted using various modern information-processing methods, including time domain analysis [7]; frequency domain analysis, such as the Fourier transform [8]; and combined time-frequency analysis, such as wavelet decomposition [9]. Many quantifiable parameters can be obtained; moreover, fuzzy clustering, the Bayes classifier, support vector machine, artificial neural networks, and other methods can be used to classify and recognize pulse [10][11][12][13]. For feature extraction of pulse, although the Fourier transform provides the average distribution of signal energy, this technique fails to characterize time-varying information of signal frequency and cannot describe the time domain local features of signals. To represent the nonstationary signals of pulse waves, Fourier analysis requires additional high-frequency harmonic components. Because these harmonics are not inherent to the signal, they do not support a correct and reasonable interpretation of it [14]. Although the wavelet transform is essentially a Fourier transform with an adjustable window, the signal within each wavelet window must still be stationary, so the wavelet transform cannot eliminate the limitations of Fourier analysis.
The Hilbert-Huang transform (HHT) is a relatively new self-adapting time-frequency analysis method. It performs self-adapting time-frequency decomposition according to the local time-varying characteristics of a signal and thereby avoids the physically meaningless harmonic components that traditional methods introduce when representing nonstationary and nonlinear signals. Moreover, this method provides high time-frequency resolution and good time-frequency concentration; hence, it is suitable for nonstationary and nonlinear signal analysis [15]. HHT is composed of empirical mode decomposition (EMD) and the Hilbert transform. The core of this technique is EMD, which is performed on the basis of the local time-scale features of the data. Therefore, EMD is more suitable for nonstationary and nonlinear data processing than the Fourier and wavelet methods, which decompose a signal onto a priori basis functions. We used the random forest algorithm as the classifier. Its training and prediction are fast, and it generates an internal unbiased estimate of the generalization error. It can also detect interactions between features and their degree of importance, and it is resistant to overfitting. For an unbalanced classification data set, this algorithm can balance the error, and it can be easily parallelized. Therefore, in this study we analyzed and recognized the pulse condition of patients with CHD by using the EMD time series analysis method and the random forest recognition algorithm.

Clinical Material
TCM pulse refers to the pulse sensed by doctors as they palpate the examinee's radial artery with their fingers. To imitate TCM doctors, measurement equipment (cooperatively developed by our research team and Shanghai Asia-Pacific Computer Co. Ltd) was employed to acquire pulse recordings, which provided the basis for objective pulse analysis.
Pulse recordings used in this study were acquired from 342 volunteers for 60 sec with a sampling rate of 720 Hz. Two groups without respiratory system and nervous system disorders were studied. Each subject was instructed to relax for more than 5 min before pulse was recorded.
Group 1 included 225 inpatients with CHD aged 64.8 ± 10.57 years from Longhua Hospital and Shuguang Hospital, which are affiliated to Shanghai University of Traditional Chinese Medicine.
Group 2 included 117 normal subjects aged 52.17 ± 11.00 years, who were selected as control subjects. The subjects were players in the "2010 Zhangjiang ball game competition for the elderly" and staff from Shanghai University of Traditional Chinese Medicine. These subjects had no documented history of cardiovascular disorders.

Evidence-Based Complementary and Alternative Medicine

HHT. Huang et al. proposed the HHT method [16]. The proposed technique is based on the intrinsic mode function (IMF) and EMD. EMD involves decomposing a given signal from a small scale to a large scale to obtain the component signals (IMFs) according to the local characteristic time scale. An IMF obtained through decomposition must satisfy two conditions: (1) over the entire signal length, the number of extreme points and the number of zero crossings of an IMF component must be equal or differ by at most 1; (2) at any time, the average value of the upper envelope defined by the local maxima and the lower envelope defined by the local minima is 0.
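The first IMF condition can be checked directly on a sampled signal. The following is a minimal sketch in plain Python; the function names are hypothetical, extrema are taken as strict interior local maxima and minima, and samples that are exactly zero are not counted as crossings (both are simplifying assumptions).

```python
import math


def count_extrema(x):
    """Count strict local maxima and minima in the interior of the sequence."""
    count = 0
    for i in range(1, len(x) - 1):
        is_max = x[i] > x[i - 1] and x[i] > x[i + 1]
        is_min = x[i] < x[i - 1] and x[i] < x[i + 1]
        if is_max or is_min:
            count += 1
    return count


def count_zero_crossings(x):
    """Count sign changes between consecutive samples (exact zeros are ignored)."""
    return sum(1 for a, b in zip(x, x[1:]) if a * b < 0)


def satisfies_extrema_condition(x):
    """IMF condition (1): the two counts are equal or differ by at most 1."""
    return abs(count_extrema(x) - count_zero_crossings(x)) <= 1


# A sine oscillating around zero passes; the same sine shifted upward does not.
sine = [math.sin(2 * math.pi * 3 * n / 100 + 0.1) for n in range(100)]
shifted = [v + 2.0 for v in sine]
print(satisfies_extrema_condition(sine), satisfies_extrema_condition(shifted))
```

The second condition (zero mean of the envelopes) requires fitting the envelopes and is not covered by this sketch.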
EMD was performed as follows.
(1) Cubic spline interpolation was used to fit the upper and lower envelope curves of the signal $x(t)$, and the average value of the upper and lower envelope curves at each point was calculated to obtain the mean curve $m_1(t)$.
(2) The mean curve was subtracted from the signal to obtain $h_1(t) = x(t) - m_1(t)$. If $h_1(t)$ satisfied the two conditions of the IMF, then $h_1(t)$ was the first-order IMF component. Otherwise, the difference between $h_1(t)$ and the mean of its own envelopes was calculated again. The first IMF component $c_1(t)$ was obtained only once the difference sequence satisfied the two conditions of the IMF.
(3) The component $c_1(t)$ was subtracted from the original signal to obtain the residual signal $r_1(t)$. The signal $r_1(t)$ was then redefined as the original signal.
(4) Steps (1) to (3) were repeated to obtain further IMF components until $r_n(t)$ became a monotone function or reached a constant value.
By performing EMD, $n$ IMF components $c_i(t)$ and a residual signal $r_n(t)$ can be obtained. Thus, the original signal can be represented using the following equation:
$$x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t).$$
Each IMF component $c_i(t)$ and its Hilbert transform $\hat{c}_i(t)$ were used to construct an analytic signal as shown in the following equation:
$$z_i(t) = c_i(t) + j\,\hat{c}_i(t) = a_i(t)\,e^{j\theta_i(t)},$$
where $a_i(t)$ is the amplitude function, which shows the instantaneous amplitude energy of the signal at each sampling point, and $\theta_i(t)$ is the phase function, which shows the instantaneous phase of the signal at each sampling point; the instantaneous frequency $\omega_i(t) = d\theta_i(t)/dt$ can be obtained by calculating the derivative of the phase. Thus, the amplitude and frequency of the signal are functions of time and are plotted on the time-frequency plane to obtain the Hilbert spectrum $H(\omega, t)$. The Hilbert spectrum shows how the signal amplitude varies globally with time and frequency; to a certain extent, this corresponds to the distribution of signal energy over the various characteristic scales (time or frequency).
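The Hilbert step can be illustrated on a single component. The sketch below is an assumption-laden illustration, not the processing chain used in the study: it builds the discrete analytic signal with a direct DFT (slow, but dependency-free), the function names are hypothetical, and a pure 5 Hz cosine sampled at 64 Hz stands in for an IMF.

```python
import cmath
import math


def analytic_signal(x):
    """Discrete analytic signal via a direct DFT (O(N^2), fine for short examples)."""
    n = len(x)
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            gain = 1.0   # keep DC (and Nyquist for even n) unchanged
        elif k < (n + 1) // 2:
            gain = 2.0   # double the positive frequencies
        else:
            gain = 0.0   # remove the negative frequencies
        spectrum[k] *= gain
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def instantaneous_amplitude_and_frequency(x, fs):
    """Return a(t) for every sample and the frequency in Hz between neighbouring samples."""
    z = analytic_signal(x)
    amplitude = [abs(v) for v in z]
    # The phase increment is read from z[t+1] * conj(z[t]), which avoids phase unwrapping.
    frequency = [cmath.phase(z[t + 1] * z[t].conjugate()) * fs / (2 * math.pi)
                 for t in range(len(z) - 1)]
    return amplitude, frequency


fs = 64  # toy sampling rate for the example, not the 720 Hz used for the recordings
x = [math.cos(2 * math.pi * 5 * t / fs) for t in range(64)]
amp, freq = instantaneous_amplitude_and_frequency(x, fs)
print(round(amp[10], 6), round(freq[10], 6))
```

For this test signal the amplitude stays at 1 and the instantaneous frequency at 5 Hz; for a real IMF both vary with time, and plotting them against time gives the Hilbert spectrum.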

Energy
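For subsequent statistical analysis, the energy of each IMF was normalized.

Sample Entropy
(1) For a time series $x(1), x(2), \ldots, x(N)$, vectors of $m$ consecutive samples are formed: $X_m(i) = [x(i), x(i+1), \ldots, x(i+m-1)]$.
(2) The distance between two such vectors is defined as the maximum difference in their corresponding scalar components:
$$d[X_m(i), X_m(j)] = \max_{k} |x(i+k) - x(j+k)|,$$
where $k = 0, 1, \ldots, m-1$.
(3) Given the threshold $r$, for every value of $i$ from 1 to $N-m$, the number of vectors $X_m(j)$ with $d[X_m(i), X_m(j)] < r$ is counted, with $j \neq i$ to exclude self-matches. The ratio of this number to $N-m-1$ is defined as follows:
$$B_i^m(r) = \frac{1}{N-m-1}\,\mathrm{num}\{d[X_m(i), X_m(j)] < r\},$$
where $1 \le j \le N-m$. The average value of $B_i^m(r)$ is defined as follows:
$$B^m(r) = \frac{1}{N-m}\sum_{i=1}^{N-m} B_i^m(r).$$
(4) For the vectors $X_{m+1}(i)$, set
$$A_i^m(r) = \frac{1}{N-m-1}\,\mathrm{num}\{d[X_{m+1}(i), X_{m+1}(j)] < r\},$$
where $1 \le j \le N-m$ and $j \neq i$. The average value of $A_i^m(r)$ is then obtained using the following equation:
$$A^m(r) = \frac{1}{N-m}\sum_{i=1}^{N-m} A_i^m(r).$$
(5) $B^m(r)$ is the probability that two sequences match for $m$ points, whereas $A^m(r)$ is the probability that two sequences match for $m+1$ points. The parameter SampEn($m$, $r$) is defined as follows:
$$\mathrm{SampEn}(m, r) = \lim_{N \to \infty}\left\{-\ln\frac{A^m(r)}{B^m(r)}\right\}.$$
The sample entropy of a pulse signal with finite length is estimated using the following statistic:
$$\mathrm{SampEn}(m, r, N) = -\ln\frac{A^m(r)}{B^m(r)}.$$
The parameters $m$ and $r$ are used to estimate SampEn [18]. Pincus suggested $m = 2$ and $r = 0.1$ to $0.25\,\mathrm{SD}$, where SD is the standard deviation of the original signal $x(i)$, $i = 1, \ldots, N$. SampEn($m = 2$, $r$, $N$) reflects the rate of information production as $m$ increases from 2 to 3. The higher the SampEn value is, the higher the rate of information production, indicating that the signal is more complex.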
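The sample entropy steps can be sketched in a few lines of plain Python. This is an illustrative implementation under stated assumptions: the function name is hypothetical, the tolerance is taken as 0.2 times the population standard deviation (one value inside the suggested 0.1 to 0.25 range), and the statistic is reported as infinity when no template matches are found.

```python
import math
import random


def sample_entropy(x, m=2, r_factor=0.2):
    """Estimate SampEn(m, r, N) with r = r_factor * SD of the signal."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    r = r_factor * sd

    def match_count(length):
        # Count ordered pairs (i, j), i != j, among the first n - m templates
        # whose maximum component-wise distance is below r.
        count = 0
        for i in range(n - m):
            for j in range(n - m):
                if i != j and max(abs(x[i + k] - x[j + k]) for k in range(length)) < r:
                    count += 1
        return count

    b = match_count(m)       # proportional to B^m(r)
    a = match_count(m + 1)   # proportional to A^m(r), same normalizing constant
    if a == 0 or b == 0:
        return math.inf      # assumption: report infinity when the ratio is undefined
    return -math.log(a / b)


periodic = [0.0, 1.0] * 20
rng = random.Random(0)
noise = [rng.random() for _ in range(200)]
print(sample_entropy(periodic), sample_entropy(noise))
```

Because $A^m(r)$ and $B^m(r)$ share the same normalizing factors, the ratio of the raw match counts is sufficient. A strictly alternating signal gives an entropy of zero, whereas an irregular signal gives a positive value.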

Random Forest Recognition Method.
A random forest [19] is composed of numerous decision trees, which are formed using a stochastic method. Thus, it is also called a random decision forest. The trees in a random forest are not correlated with one another. When test data are input to a random forest, each decision tree classifies them, and the category that receives the most votes among all decision trees is selected as the final result. Therefore, a random forest is a classifier that contains multiple decision trees, and its output category is the mode of the output categories of the individual trees.
A random forest resampling technique uses the bootstrap method, which entails repeatedly and randomly selecting samples from the original training sample set to generate new training sample sets. Subsequently, classification trees are generated according to the bootstrap sample set to construct random forests. The classification results of the new data rely on the score formed by the vote of the classification tree. The algorithm is presented as follows.
(1) The original training set is $N$, and the bootstrap method is used to randomly select $k$ new bootstrap sample sets, with replacement, and to construct $k$ classification trees. The samples not drawn each time constitute the $k$ out-of-bag data sets.
(2) In total, $m_{\mathrm{all}}$ variables are set, and $m_{\mathrm{try}}$ variables ($m_{\mathrm{try}} \ll m_{\mathrm{all}}$) are randomly selected at each node of each tree. Among these, the variable with the greatest classification ability is selected. The classification threshold of the variable is determined by examining each candidate split point.
(3) Each tree grows to the maximum size, with no pruning.
(4) The generated multiple classification trees constitute random forests. New data are obtained and classified according to the random forest classifier. The classification results depend on the number of votes provided by the tree classifiers.
Random forests are an improvement of the decision tree algorithm, in which multiple decision trees are merged. Each tree is established on the basis of an independently extracted sample, and all of the trees in the forest are identically distributed. Classification error depends on the classification ability of each tree and the correlation among these trees. The features used to split each node are selected by using a stochastic method, and the errors generated in the various circumstances are then compared. Internal estimates of the error, the classification ability, and the correlation are monitored to determine the number of selected features. The classification capability of a single tree may be low. However, after a large number of decision trees are randomly generated, the most likely classification of a test sample is selected by tallying the classification result of each tree.
The number of decision trees in the random forest in this study was 500, and $m_{\mathrm{try}}$ was set to the square root of $m_{\mathrm{all}}$.
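The procedure can be sketched compactly in plain Python. The sketch is deliberately simplified and is not the implementation used in this study: each "tree" is a one-split stump rather than a fully grown, unpruned tree, the split is chosen by training error rather than an impurity measure, and all function names are hypothetical. It does keep the three ingredients described above: bootstrap resampling, a random subset of the square root of the number of features per tree, and majority voting.

```python
import math
import random
from collections import Counter


def fit_stump(rows, labels, feature_ids):
    """Choose the (feature, threshold) among the candidates with the fewest training errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]


def fit_forest(rows, labels, n_trees=500, seed=0):
    rng = random.Random(seed)
    m_all = len(rows[0])
    m_try = max(1, round(math.sqrt(m_all)))  # square root of the number of features
    forest = []
    for _ in range(n_trees):
        # Bootstrap: draw len(rows) samples with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        feature_ids = rng.sample(range(m_all), m_try)
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature_ids))
    return forest


def predict(forest, row):
    """Majority vote over the individual stumps."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Toy data: class 0 features lie in [0, 1), class 1 features in [4, 5).
rng = random.Random(1)
rows = [[rng.random() + 4 * c for _ in range(4)] for c in (0, 1) for _ in range(20)]
labels = [c for c in (0, 1) for _ in range(20)]
forest = fit_forest(rows, labels, n_trees=51)
print(predict(forest, [0.0] * 4), predict(forest, [5.0] * 4))
```

In practice an established library implementation with fully grown trees and out-of-bag error estimation would normally be preferred over a hand-written sketch like this one.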

Statistical Analysis of Energy and Sample Entropy of Pulse Signal Based on EMD. EMD adaptively decomposes a signal into a series of IMFs ordered from high frequency to low frequency. The IMF at each level adaptively shows the signal characteristics at a different resolution. The amplitude of the IMF at each level differs from moment to moment and shows the change in signal strength within that mode. Most pulse signals could be decomposed to level 7 or higher through EMD (IMF1-IMFn, n ≥ 7), and only two pulse signals of patients with CHD could be decomposed only to level 6 (IMFn = 0, n ≥ 7). In EMD, the energy of the modes at high levels was low, and their effects on the entire system were weak. Therefore, we further analyzed only the first seven modes, IMF1-IMF7. The IMFs at each level of one pulse signal from a patient with CHD and one normal pulse signal after EMD are shown in Figure 1. Figure 1 shows that the components, including IMF1-IMF7 and the residual component (res.), were obtained after EMD of the pulse signal. The frequencies of IMF1-IMF7 decreased successively, and the amplitudes of IMF1-IMF7 increased progressively. Differences between the IMFs in Figures 1(a) and 1(b) can be observed. For example, Figure 1(b) shows that the components IMF1-IMF7 had higher morphological variation than those in Figure 1(a), reflecting the irregularity of a normal pulse signal, and had more high-frequency parts, especially in IMF3-IMF7, than those in Figure 1(a). In order to quantitatively describe the differences between the CHD patients and the healthy subjects, we extracted the IMF energy and the IMF sample entropy of the pulse signals for analysis.
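As a minimal sketch of the energy feature, the following assumes that the energy of each IMF is the sum of its squared samples and that normalization divides by the total energy of the retained IMFs. Both are common conventions adopted here as assumptions, and the function name is hypothetical.

```python
def normalized_imf_energy(imfs):
    """Return each IMF's share of the total energy (assumed definition, see lead-in)."""
    energies = [sum(v * v for v in imf) for imf in imfs]
    total = sum(energies)
    return [e / total for e in energies]


# Two toy "IMFs": energies 4 and 16, so the shares are 0.2 and 0.8.
toy_imfs = [[1.0, -1.0, 1.0, -1.0], [2.0, 2.0, 2.0, 2.0]]
shares = normalized_imf_energy(toy_imfs)
print(shares)
```

With seven retained IMFs this yields a seven-element energy vector per recording, which can be concatenated with the sample entropy values to form the combined feature vector.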
Using IBM SPSS 20.0 statistical software, we observed that the variances of the IMF energy and the IMF sample entropy of the pulse signals were nonhomogeneous between the CHD and normal groups. Thus, we used a nonparametric test for statistical analysis. For the nonparametric test of independent samples of the two groups, we used the rank sum test to calculate the statistical difference between the two groups. Table 1 shows the statistical difference in the average rank of IMF energy between the two groups. The average rank of the IMF normalized energy in the normal group was significantly greater than that in the CHD group. Table 2 shows the statistical difference in the average rank of the IMF sample entropy between the two groups. Because the high-frequency modes IMF1 and IMF2 were caused by interference, they were excluded from the subsequent analysis. The sample entropy of the intermediate-frequency modes IMF3, IMF4, IMF5, and IMF6 in the CHD group was significantly lower than that in the normal group. No statistically significant difference was observed in IMF7 between the two groups.
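The rank sum comparison of two independent groups can be illustrated as follows. This is a sketch only: it uses the normal approximation without tie or continuity correction, so its p values may differ slightly from those reported by a statistics package, and the function name is hypothetical.

```python
import math


def rank_sum_test(a, b):
    """Return (mean rank of a, mean rank of b, z, two-sided p) for two independent samples."""
    pooled = sorted((v, g) for g, sample in enumerate((a, b)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # tied values share the average of their ranks
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n1, n2 = len(a), len(b)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the normal approximation
    return r1 / n1, (sum(ranks) - r1) / n2, z, p


low = list(range(1, 11))    # toy values standing in for one group
high = list(range(11, 21))  # toy values standing in for the other group
mean_low, mean_high, z, p = rank_sum_test(low, high)
print(mean_low, mean_high, round(z, 2), p < 0.001)
```

The mean ranks returned here correspond to the average ranks compared in Tables 1 and 2.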

Pulse Recognition Based on the Random Forest Classifier.
We classified and recognized the energy and sample entropy characteristics of the IMFs of the two groups of pulses by using the random forest classifier, and the recognition results are shown in Table 3. Table 3 shows that the average recognition rate was 76.35% when we used only the sample entropy of the IMFs as the feature vector to recognize pulses in the two groups. Furthermore, the average recognition rate was 84% when we used only the IMF energy as the feature vector to recognize the pulses in the two groups. When the IMF energy and IMF sample entropy were jointly used as the feature vector to recognize pulses in the two groups, the average recognition rate reached 90.21%.

Discussion
Adaptive HHT exhibits a high time-frequency resolution and local orthogonal self-adaption; moreover, this technique is superior to the wavelet transform and other signal analysis methods. Adaptive HHT can be used to examine instantaneous frequency and energy at multiple scales and is a powerful tool for analyzing biomedical signals [20]. The IMF sequence of pulse signals obtained using EMD is directly separated from the original time series data. The number of decomposition levels need not be determined in advance and is not affected by human factors. Thus, the IMF sequence can effectively reflect the inherent physical characteristics of the original data. Its decomposition is objective, inherent, and adaptive. The IMF at each level represents band information with a specific meaning. Table 1 shows that the IMF energy at each level in the CHD group was significantly lower than that in the normal group, which indicates that the energy of the cardiovascular system in patients with CHD was likely lower than that in the normal subjects. Human biological waves such as the pulse wave can directly reflect physical function and health status. Human biological wave energy can be produced through the metabolism of water, air, sunlight, food, and other substances, which are absorbed by molecules and cells of the human body and can be naturally supplied with nutrients and oxygen. High pulse wave energy in healthy people may result when qi of the viscera and meridians moves harmoniously. As Table 2 shows, the sample entropy of pulse signals in patients with CHD in modes IMF3-IMF6, which showed statistically significant differences, was lower than that in the normal subjects. Entropy is the rate at which new information is generated; the entropy value reflects the complexity of a system. A higher entropy value indicates that the signal generated by the system is more random and irregular and that the system adaptability is stronger.
A lower entropy value indicates that the signal generated by the system is simpler and more regular and that the system adaptability is lower. The result in Table 2 indicated that the adaptive ability of the cardiovascular system of patients with CHD was likely lower than that of the normal subjects. This finding is consistent with the relatively weaker physiological complexity of the human body in the pathological state, in which signals become more regular. The energy and sample entropy of IMFs are valuable features for characterizing the pathological state of pulse in patients with CHD. The HHT method can be used to analyze pulse signals with nonstationary dynamic changes. This method can sensitively capture the primary features of the various pulse signal components as they change dynamically in time and frequency. The energy and sample entropy of IMFs provide a new basis for the feature extraction and pattern recognition of various pulses. Random forest is a classifier that produces highly accurate results with high training and prediction speeds. This classifier can generate an internal unbiased estimate of the generalization error during classification. Furthermore, it is resistant to overfitting as the number of trees increases. Random forest can also balance the errors in unbalanced data sets. In this study, we classified and recognized the IMF energy and IMF sample entropy of pulse signals by using the random forest classifier. When the energy of IMFs, the sample entropy of IMFs, and their combination were used as pulse feature vectors, the corresponding average recognition rates were 84%, 76.35%, and 90.21%, respectively. Compared with the separate use of IMF energy or IMF sample entropy as a feature vector, the combined use of IMF energy and IMF sample entropy as a feature vector improved the classification accuracy for the CHD group and the normal group. Although the sample size of the CHD group differed from that of the normal group, random forest handles unbalanced samples well, and we achieved satisfactory classification results.

Conclusion
Pulse diagnosis is a characteristic diagnostic method in TCM. Pulse detection is a noninvasive technique that is simple and easy to operate, performs stably, and does not require expensive equipment. A pressure pulse signal that corresponds to the pulse condition in TCM can be detected noninvasively and conveniently to obtain the pathological and physiological status of the cardiovascular system. We extracted the energy and sample entropy of IMFs of pulse signals as pulse features and used random forests as the classifier of pulse signals. The results illustrate that the proposed methods for pulse-signal processing and classification are effective and efficient. This study offers a new method for developing and promoting a noninvasive pulse diagnostic technique. Furthermore, this research provides objective and quantitative parameters of TCM pulse and lays a foundation for research on objective and quantitative criteria for disease diagnosis or Zheng differentiation.