Pulse Waveform Classification Using Support Vector Machine with Gaussian Time Warp Edit Distance Kernel

Advances in signal processing techniques have provided effective tools for quantitative research in traditional Chinese pulse diagnosis. However, because of the inevitable intraclass variations of pulse patterns, the automatic classification of pulse waveforms has remained a difficult problem. Utilizing the new elastic metric, that is, time wrap edit distance (TWED), this paper proposes to address the problem under the support vector machines (SVM) framework by using the Gaussian TWED kernel function. The proposed method, SVM with GTWED kernel (GTWED-SVM), is evaluated on a dataset including 2470 pulse waveforms of five distinct patterns. The experimental results show that the proposed method achieves a lower average error rate than current pulse waveform classification methods.


Introduction
Pulse diagnosis is one of the most valuable and widely used diagnostic methods in traditional Chinese medicine (TCM) [1]. In pulse diagnosis, physicians palpate the pulse on the radial artery at the styloid process of the radius with fingertips. By recognizing the pulse patterns of patients which are related to different syndromes and diseases with TCM, physicians can customize the scheme of treatment. Pulse diagnosis is a convenient, noninvasive, and effective diagnostic method. However, as the diagnosis result highly depends on physician's sense and experience, pulse diagnosis is a skill that requires considerable training and practice and, for different physicians, may produce significant variations in diagnosis results. Over the last several decades, pulse diagnosis has attracted an increasing amount of attention in both clinical medicine [2][3][4] and biomedicine [5][6][7]. Many techniques developed for measuring, processing, and analyzing the physiological signal [7][8][9][10][11] have been considered in quantitative pulse diagnosis to improve the reliability and consistency of diagnoses.
As an important step in the quantification research of Chinese pulse diagnosis, the automatic classification of pulse waveforms has attracted much recent attention [7,[10][11][12][13]. Pulse waveform classification aims to assign a pulse pattern to a pulse waveform according to the criteria of shape, regularity, force, and rhythm [1]. However, because of the complicated intraclass variations in pulse patterns and the inevitable influence of local time shifts in pulse waveforms, conventional classification methods, such as artificial neural networks [12,13], decision trees [14], and wavelet networks [15], usually cannot achieve satisfactory classification accuracy. Moreover, as most of the previous results are tested on datasets with a small sample size, the effectiveness of these methods still requires further verification on large scale datasets.
Previously, by using edit distance with real penalty (ERP) [16], we proposed an elastic kernel function, Gaussian ERP (GERP) kernel [17], and incorporated it with a kernel difference-weighted -nearest neighbor classifier (KDF-WKNN) [18] for pulse waveform classification, and the experimental result on a dataset with 2470 samples has preliminarily shown its effectiveness. In this paper, we further extend this kind of elastic kernel-based approach by proposing a support vector machine (SVM) with a Gaussian time warping edit distance (GTWED) kernel method  (GTWED-SVM). The difference between GTWED-SVM and the method in [17] can be summarized as follows.
(1) The TWED distance in the proposed elastic kernel function, the GTWED kernel, is more promising for time series classification in comparison with ERP distance [19] and is thus expected to be more effective in enhancing the accuracy of pulse waveform classification. (2) The proposed new method for pulse waveform classification embeds the GTWED kernel in the SVM framework, while the method in [17] incorporates the GERP kernel into the KDF-WKNN classifier. Our experimental results show that, for pulse waveform classification, the GTWED-SVM method can achieve an average error rate (AER) of 9.43% and is more suitable than the KDF-WKNN with GERP kernel (GERP-KDF) and other pulse waveform classification methods.
The remainder of this paper is organized as follows. Section 2 describes the proposed method, that is, GTWED-SVM. Some basic modules for pulse waveform classification, including pulse waveform acquisition and preprocessing, are also introduced in this section. Section 3 provides the experimental results and discussion. Finally, Section 4 concludes this paper.

Method
In this section, we first briefly introduce some background knowledge, that is, pulse waveforms acquisition and preprocessing. Then, we define the GTWED kernel function and propose a new method for pulse waveforms classification, that is, GTWED-SVM.

Pulse Waveform Acquisition and Preprocessing.
The procedure of pulse waveform acquisition and preprocessing is summarized in Figure 1. The first step is to acquire the digital pulse waveforms. This work is performed by our pressure sensor-based pulse waveforms acquisition system [20], which simulates pulse palpation by attaching pressure sensors on the surface of the radial artery at the styloid process of radius. Then, the pulse signals caught by the pressure sensors are transformed to digital pulse waveforms with a sampling frequency of 150 Hz. Finally, the digital pulse waveforms are stored into a PC through the USB interface.
Because of the inevitable powerline interference and different types of artifacts, the acquired pulse waveforms always suffer from the problems of noise and baseline drift as shown in Figure 1. These problems could significantly distort the shapes of the pulse waveforms and, finally, reduce the classification accuracy. Thus, it is necessary to remove the noise and the baseline drift before further analysis. In this paper, we employ a Daubechies 4 wavelet transform to remove noise by empirically comparing the performance of several wavelet functions and adopt wavelet-based methods [9] to remove the baseline drift. After that, each pulse waveform is split into several single-period segmentations according to the onsets, and only one of them is selected for normalization. By using the bilinear interpolation method, all the selected segmentations are normalized to the equal length; that is, each has 150 data points, for pulse waveform classification. Figure 2 shows the typical normalized pulse waveforms of five different pulse patterns, namely, moderate, slippery, taut, hollow, and unsmooth pulses, which are acquired by our pulse waveforms acquisition system.
Pulse waveforms classification suffers from complicated intraclass variations. For example, as shown in Figure 3, the waveform of a moderate pulse with an unnoticeable tidal wave is similar to that of a slippery pulse, and for taut pulses, there are three typical shapes as shown in Figure 4. Moreover, as a common problem in time series classification, local time shifting also has influence on pulse waveforms classification accuracy. Nevertheless, our previous work has preliminarily shown the effectiveness of kernel-based methods in addressing the problems of pulse waveform classification [17]. In this paper, we further extend this kind of method and propose an elastic kernel function, GTWED, for kernel machine-based pulse waveform classification. The details are provided in the following sections.

Gaussian Time Warp Edit Distance Kernel Function.
By utilizing the development in time series matching, namely, TWED [19], we propose an elastic kernel function, GTWED kernel, for pulse waveforms classification. In the following, we first present related work in TWED and then the proposed GTWED kernel function.

Time Warp Edit Distance.
Motivated by the success of dynamic time warping (DTW) [21] in handling time shifting problems, elastic similarity measures are widely used in time series matching. Generally speaking, elastic similarity measures can be grouped into two categories: (1) nonmetric such as DTW and longest common subsequence (LCSS) [22] and (2) elastic metric, which, namely, satisfies the triangle inequality, such as ERP [16]. TWED [19] is a newly developed elastic metric with the following definition. , are time stamps of A 1 and B 1 , respectively, and subject to < , < , whenever < , < . The TWED metric between A 1 and B 1 , denoted by twed (A 1 , B 1 ), is recursively defined as where A (or B ) is the subtime series that consists of the th (or th) to the th ( th) samples of A 1 (or B 1 ), (⋅, ⋅) denotes the -norms, and , ] are two nonnegative parameters which are used to adjust the stiffness of TWED distance. TWED satisfies the triangle inequality and is a metric [19].
TWED metric is effective in handling the problem of local time shifting in time series classification. Moreover, it is also appealing to use the TWED metric for time series retrieval, because many data structures and algorithms have been optimized for efficient indexing and retrieval in metric space [23]. In the following, we show another potential advantage of the TWED metric, that is, in the construction of elastic kernel functions.

Gaussian Time Warp Edit Distance Kernel Function.
By utilizing the TWED metric, we propose a new elastic kernel function, the GTWED, which is defined as follows. Definition 2. Let be a nonempty time series set, and the dimension of each element is less than or equal to ( ≥ 1). Then, the GTWED kernel on is defined as where , are the lengths of times series A 1 and B 1 with , ≤ , twed (⋅, ⋅) denotes the TWED metric, and is the standard deviation of the Gaussian function.
GTWED is an elastic kernel function, which can be regarded as embedding TWED metric into the Gaussian function. Actually, motivated by the effectiveness of elastic measures in handling the time shifting problem, it is tempting to use elastic measures to construct elastic kernel functions for kernel machine-based time series classification. By using the DTW distance, the Gaussian DTW (GDTW) kernel is first proposed and embedded into an SVM for online handwriting recognition with a reported performance comparable to hidden Markov model [24]. Counterexamples, however, have reported the SVM with GDTW kernel (GDTW-SVM) cannot outperform either the SVM with Gaussian radial basis function (RBF) kernel or the nearest neighbor classifier with DTW distance [25] and is not suitable for time series classification [26].
We argue that the poor performance of GDTW-SVM should be attributed to the nonpositive definite symmetric (PDS) property of GDTW kernel function [26]. For SVM, a PDS kernel is required to satisfy Mercer's condition [27], which is essential to ensure the convexity of the optimization problem [28]. Otherwise, the solution to the optimization problem may only be local optimal and may not even converge at all. This may explain why GDTW-SVM may perform well for several tasks, but very poorly for most time series classification applications [25]. 6 Computational and Mathematical Methods in Medicine Actually, for any nonmetric similarity measure, nom (⋅, ⋅) (either elastic or nonelastic), the kernel function (⋅, ⋅) defined in the following form: is definitely not a PDS kernel function [29], where > 0 is a user-specified parameter. That is to say, the necessary condition for (⋅, ⋅) to be a PDS kernel is that nom (⋅, ⋅) is a metric. This can also prove that GDTW is not PDS, because we can easily get GDTW kernel by replacing nom (⋅, ⋅) with the nonmetric measure, that is, DTW distance. In contrast to the GDTW kernel function, the proposed GTWED is constructed by embedding TWED in a Gaussian form as (2). Since TWED is an elastic metric [19], we suppose that GTWED would be more suitable for time series classification than GDTW. In our previous work, we proposed another elastic kernel function, the GERP [17], by embedding an elastic metric, that is, ERP distance into a Gaussian function. Compared with ERP distance, by incorporating a nonnegative parameter ] on the time stamps, the TWED metric provides an easy way to adjust its own elasticity, which makes it more robust to time shifting. Also, experimental results on the UCR time series datasets show that the classification performance of TWED is better than that of ERP [19]. Based on this, we suppose GTWED is more effective in time series classification than GERP and propose to use GTWED-SVM for pulse waveforms classification.

Pulse Waveforms Classification by Using GTWED-SVM.
In this subsection, we first briefly present a survey on SVM. Then, we will describe the pulse waveforms classification method by using GTWED-SVM.

Support Vector Machine.
As a state of the art classifier, SVM has been widely used in many applications [30,31]. Let {( , )} =1 be a set of training samples, where is the th sample in the input space x, and ∈ {+1, −1} is the class label of . In the nonlinear SVM, by using a nonlinear operator Φ(⋅), the input space x is mapped into a Hilbert inner product space Η, as ⋅ → Φ( ) ⋅ Φ( ) = ( , ), where (⋅, ⋅) is a kernel function, and in Η, the two classes samples can be separated by a hyperplane: where w is a weight vector and is bias [28]. For a given training set, there may be many hyperplanes that satisfy (4). SVM classifier finds the optimal hyperplane that maximizes the separating margin between two classes as shown in Figure 5. Mathematically, this hyperplane can be obtained by solving the following optimization problem: Hyperplane M a rg in where is the regularization parameter that controls the tradeoff between margin maximization and classification error. { } =1 is the slack variable that is related to classification errors [31]. By using the technique of Lagrange multipliers [28], the optimization problem can be transformed to an equivalent dual problem: where is the Lagrange multiplier and (⋅, ⋅) denotes the kernel function which should satisfy Mercer's condition. In practice, this optimization problem can be numerically solved through quadratic programming. Then, the decision function of SVM can be represented as where is an unclassified sample.

Pulse Waveforms Classification Framework.
In this paper, we propose to use GTWED-SVM for pulse waveforms classification. Generally, there are mainly two steps in GTWED-SVM, including a training step, which involves Computational and Mathematical Methods in Medicine 7 training the structure of the SVM to obtain the hyperplane and the decision function, and a testing step, which involves using the obtained decision function to obtain the class labels of unclassified pulse waveforms. Let gtwed (⋅, ⋅) denote the GTWED kernel function and let {(x , ) | ∈ {1, −1}} =1 denote a training set of pulse waveforms. By using (2), (6), and (7), for each element z in the test set of pulse waveforms, we can get its class label (z) as where twed (x , z) can be calculated by using (1).

Experimental Results
In this section, we evaluate the effectiveness of GTWED-SVM for pulse waveforms classification. First, a description of the dataset and experimental setup is provided in Section 3.1. Then, in Section 3.2, we present the experimental results of the proposed method.
3.1. Experimental Setup. By using the method described in Section 2.1, we construct a dataset with 2470 pulse waveforms of five patterns which can be classified by their shapes. They are moderate, slippery, taut, hollow, and unsmooth pulses. Table 1 summarizes the information of the dataset. All subjects are patients in the hospital between 20 and 60 years old. Clinical data, for example, biomedical data and past medical history, are also obtained for reference. For each subject, only the pulse signal of the left hand is acquired, and three experts are asked to determine the pulse pattern according to their pulse signal and the clinical data. If the diagnosis results of the experts are the same, the sample is kept in the dataset, else it is abandoned. To the best of our knowledge, this dataset is the largest dataset used for pulse waveform classification. In the following, we use this dataset to evaluate the performance of GTWED-SVM. We adopt the 10-folder cross validation [32] to evaluate the proposed method. This procedure is as follows.
(1) We randomly divide the pulse waveform dataset into 10 subsets. (2) For each subset, repeat the following process: use the subset as the testing set and the other 9 subsets as the training set T. Each training set T is divided into two parts T 1 and T 2 . The dataset T 1 is used for training, and T 2 is used for tuning the parameters. That is to say, we can use T 2 to adjust the parameters of evaluated methods until we find the optimal parameters. Then, we rerun the training step on the larger dataset T by using the optimized parameters. Finally, the classification error rate is measured on testing subset.  [19]. The values of and are selected from [10 −2 , 10 −1 , 1, 10, 10 2 , 10 3 , 10 4 ] and [10 −3 , 10 −2 , 10 −1 , 1, 10, 10 2 , 10 3 , 10 4 , 10 5 ], respectively. In this paper, we adopt the grid search to find the values of , ], , and , because it will always find the optimal values of these parameters. In the following, we use the above methods to evaluate the performance of proposed method.

Performance of GTWED-SVM in Pulse Waveform
Classification. In our previous work, we proposed another method for pulse waveform classification, namely, GERP-KDF [17], which has the best performance in current pulse waveforms classification methods. So, in this paper, we will compare the performance of GTWED-SVM with that of GERP-KDF. As we have employed two distinct kernel functions in the two methods, that is, GTWED and GERP, which are constructed by embedding two similarity measures, TWED and ERP, respectively, we divide the comparison into two parts: similarity measures comparison, that is, TWED versus ERP, and AER comparison, that is, GTWED-SVM versus GERP-KDF.

Comparison of Similarity
Measures. The comparison between TWED and ERP is performed by using 10-folder cross validation under the framework of one nearest neighbor classifier (1NN). Figure 6 plots the error rates obtained by using one nearest neighbor classifier with a TWED metric (1NN-TWED) and one nearest neighbor classifier with an ERP metric (1NN-ERP), while the results of one nearest neighbor classifier with Euclidean distance (1NN-ED) are also plotted for reference. In TWED, the optimal values of and ] are 0.01 and 0.25, respectively. Table 2 shows the average error rates (AERs) of each method.
As we can see that both the AERs of 1NN-ERP and 1NN-TWED are much lower than that of 1NN-ED, which indicates that, compared with the nonelastic metric, that is, Euclidean distance, elastic metrics, that is, TWED and Error rate (%) ERP, are more effective in handling the problem of local time shifting in pulse waveforms and more suitable for pulse waveform classification. Furthermore, among the two elastic metrics, TWED achieves an AER of 0.1084, which is slightly lower than that of the ERP, that is, 0.1128. The comparison results show that, in terms of AER, the TWED metric is better than the ERP metric in the task of pulse waveforms classification.

Performance Comparison of GTWED-SVM and GERP-KDF.
We run 10-folder cross validation on GTWED-SVM and GERP-KDF, respectively, and plot the error rates of two methods in Figure 7. For comparison, we also plot the result of 1NN-ED, 1NN-ERP, and 1NN-TWED in Figure 7. In terms of AER, the kernel based methods, that is, GTWED-SVM and GERP-KDF, are better than those similarity measures, that is, 1NN-ED, 1NN-ERP, and 1NN-TWED.
To give a comprehensive comparison of GTWED-SVM and GERP-KDF, we also count the correctly classified and misclassified samples in each class of the pulse waveforms dataset. The results are represented in the form of confusion matrices as shown in Tables 3 and 4. In the two tables, each column represents the instances in a predicted class, while each row represents the instances in an actual class. In this way, it makes it easy to see if the method is confusing different classes, that is, mislabeling one as another. It is apparent that the bold data on the diagonal of the tables are the numbers of correctly classified samples. Table 5 shows the average error rates (AERs) of the two methods in each pulse pattern, while the bold number denotes the minimum AERs of each row. For all the pulse patterns, GTWED-SVM is able to achieve error rate better than or comparable to GERP-KDF.

Conclusion
By incorporating one of the state-of-the-art time series matching methods, that is, TWED, we propose to use the GTWED kernel and SVM classifier for pulse waveform classification. By using an elastic kernel function, that is, GTWED, the proposed method is promising in addressing intraclass variations and the problem of local time shifting in pulse waveforms classification and thus can achieve lower classification error rates in comparison with other methods. The experimental results on a dataset with 2470 pulse waveforms show that the GTWED-SVM achieves an AER of 9.43%, which is lower than that of other state-of-the-art pulse waveform classification methods. The GTWED kernel in the proposed method can be regarded as the distance substituting kernels by embedding TWED elastic distance into the Gaussian. Commonly, the positive definite symmetric property of this kind of kernel could not be always guaranteed. However, this problem could be solved by using the recursive time warp kernel construction method [33]. In the future, we will further study the effectiveness of elastic kernel function in pulse waveform and other time series classification.