Blood Pressure Estimation Using Photoplethysmography Only: Comparison between Different Machine Learning Approaches

Introduction Blood pressure (BP) has been a potential risk factor for cardiovascular diseases. BP measurement is one of the most useful parameters for early diagnosis, prevention, and treatment of cardiovascular diseases. At present, BP measurement mainly relies on cuff-based techniques that cause inconvenience and discomfort to users. Although some of the present prototype cuffless BP measurement techniques are able to reach overall acceptable accuracies, they require an electrocardiogram (ECG) and a photoplethysmograph (PPG) that make them unsuitable for true wearable applications. Therefore, developing a single PPG-based cuffless BP estimation algorithm with enough accuracy would be clinically and practically useful. Methods The University of Queensland vital sign dataset (online database) was accessed to extract raw PPG signals and its corresponding reference BPs (systolic BP and diastolic BP). The online database consisted of PPG waveforms of 32 cases from whom 8133 (good quality) signal segments (5 s for each) were extracted, preprocessed, and normalised in both width and amplitude. Three most significant pulse features (pulse area, pulse rising time, and width 25%) with their corresponding reference BPs were used to train and test three machine learning algorithms (regression tree, multiple linear regression (MLR), and support vector machine (SVM)). A 10-fold cross-validation was applied to obtain overall BP estimation accuracy, separately for the three machine learning algorithms. Their estimation accuracies were further analysed separately for three clinical BP categories (normotensive, hypertensive, and hypotensive). Finally, they were compared with the ISO standard for noninvasive BP device validation (average difference no greater than 5 mmHg and SD no greater than 8 mmHg). 
Results In terms of overall estimation accuracy, the regression tree achieved the best accuracy for SBP (mean and SD of difference: −0.1 ± 6.5 mmHg) and DBP (mean and SD of difference: −0.6 ± 5.2 mmHg). MLR and SVM achieved an overall mean difference of less than 5 mmHg for both SBP and DBP, but their SD of difference was >8 mmHg. Regarding the estimation accuracy in each BP category, only the regression tree met the ISO standard for SBP (−1.1 ± 5.7 mmHg) and DBP (−0.03 ± 5.6 mmHg) in the normotensive category. MLR and SVM did not achieve acceptable accuracies in any BP category. Conclusion This study developed and compared three machine learning algorithms to estimate BPs using PPG only and revealed that the regression tree algorithm was the best approach, with overall accuracy acceptable under the ISO standard for BP device validation. Furthermore, this study demonstrated that the regression tree algorithm achieved acceptable measurement accuracy only in the normotensive category, suggesting that future algorithm development for BP estimation should be more specific for different BP categories.


Introduction
Blood pressure (BP) is one of the main risk factors for cardiovascular diseases. Abnormal BP has been a potent issue that causes strokes, heart attacks, and kidney failure [1]. At present, cuff-based BP measurement devices have been widely used in hospital settings to detect abnormal BP [2]. However, they are not convenient and comfortable for the users.
In the past few years, various research groups have attempted numerous techniques in order to achieve cuffless BP measurement. The key measuring principle for cuffless BP estimation is based upon the time taken by a pulse to travel from the heart to the finger, known as pulse transit time (PTT) or pulse arrival time (PAT) [3][4][5][6][7][8][9][10]. Other researchers used vascular transit time (VTT), which was calculated from the time difference between the photoplethysmograph (PPG) measured at the fingertip and the phonocardiograph measured at the chest [11]. Cuffless BPs were also measured using the tonometry technique based on the information from multiple pressure sensors on the radial artery tree [6,12]. Another group of researchers introduced a cuffless BP measurement technique using modified normalised pulse volume and heart rate [13]. Multiple magnetic sensors have also been used to measure pulse wave velocity (PWV) for the estimation of cuffless BP [14]. Although some of the cuffless BP devices achieved overall acceptable accuracies, the above-mentioned algorithms required at least two sensors [15], making them unsuitable for true wearable applications. Therefore, developing a single PPG-based cuffless BP estimation algorithm with enough accuracy would be clinically and practically useful.
Recently, machine learning algorithms, including support vector machine (SVM), multiple linear regression (MLR), and neural network algorithms, have been used to estimate cuffless BP. Zhang and Feng applied the SVM algorithm to waveform features that were extracted from PPG signal segments collected from the University of Queensland Vital Signs dataset [16]. Nevertheless, their study only achieved SBP and DBP measurement accuracies of 11.6 ± 8.2 mmHg and 7.6 ± 6.7 mmHg [16]. Kawanaka et al. tested the MLR algorithm with their own collected dataset. Their training data included old individuals, while the testing data were gathered from young individuals [17]. Visvanathan et al. also used PPG signal features with both linear regression and SVM algorithms to estimate cuffless BP [18]. However, these studies failed to meet the ISO accuracy requirement for noninvasive BP devices (average difference no greater than 5 mmHg and SD no greater than 8 mmHg). Other researchers also developed a cuffless BP measurement device with acceptable accuracy in terms of mean difference (3.8 mmHg for SBP and 4.6 mmHg for DBP), but unfortunately, their measurement techniques have not been described in detail [19]. Furthermore, in all the published studies, the measurement accuracies have not been evaluated specifically in different clinical BP categories (normotensive, hypertensive, and hypotensive).
This research aimed to develop and compare three machine learning algorithms (regression tree, MLR, and SVM) to estimate BPs using only pulse waveform features derived from good quality PPG signals. In addition, their estimation accuracy would be evaluated for three different clinical BP categories (normotensive, hypertensive, and hypotensive).

Methods
The overall flow diagram of the proposed research methodology is presented in Figure 1, which is summarised in the following steps: (1) Extract PPG signal segments and reference BPs (SBP and DBP). Only 5 s data segments of acceptable quality were saved. [20]. The length of each extracted segment was 5 seconds. During data segmentation, a manual check was performed to exclude PPG segments of unacceptable quality containing movement artefacts and segments without corresponding reference SBP and DBP data. The manual check ensured that the machine learning models being developed were not affected by bad signals, making the BP results from the different machine learning approaches more comparable. The numbers of unacceptable signal segments and of segments without reference SBP and DBP data were 9772 and 5572, respectively. Figure 2 illustrates some examples of bad quality PPG segments. In total, as given in Table 1, 8133 signal segments with both good quality PPG and reference NIBP data were collected from the 23617 signal segments in the online database. Next, each of the good quality segments was assigned to one of three BP categories according to its reference BPs and the BP classification chart, as shown in Figure 3(a). The normotensive category included 6482 segments, which were about 80% of the total good quality segments. The hypertensive and hypotensive categories contained 1015 (12%) and 636 (8%) segments, respectively, as shown in Figure 3(b). Since the BPs varied during the long period of recording, each case included segments under different BP categories, as shown in Table 1.

PPG Signal Preprocessing.
Each PPG segment was first processed with a 4th-order Savitzky-Golay filter with a frame length of 19. This filter smooths the PPG signal by fitting a polynomial within a moving window, and it was selected because it preserves sharp edges [21]. Baseline wandering caused by respiratory activity was also removed from the segments. The 2-dimensional normalization (in both width and amplitude) was then performed. Figure 4 shows how a raw PPG segment is transformed to a normalised pulse. Since the reference NIBP was constant during the 5-second period of the segment, no further preprocessing of reference NIBP was required.
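The 2-dimensional normalization step can be illustrated with a minimal sketch. It assumes, since the text does not give the details, that width normalization means resampling one pulse to a fixed number of samples by linear interpolation and that amplitude normalization means min-max scaling to [0, 1]. The function name `normalise_pulse` and the default of 100 samples are hypothetical choices.

```python
def normalise_pulse(pulse, n_points=100):
    """Rescale one PPG pulse to n_points samples (width) and to [0, 1] (amplitude).

    Assumptions (not taken from the paper): linear interpolation for the
    width normalization and min-max scaling for the amplitude normalization.
    """
    if len(pulse) < 2 or n_points < 2:
        raise ValueError("need at least two input samples and two output points")
    lo, hi = min(pulse), max(pulse)
    if hi == lo:
        raise ValueError("a flat pulse cannot be amplitude-normalised")
    last = len(pulse) - 1
    resampled = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)      # fractional index into the original pulse
        i = min(int(pos), last - 1)          # left neighbour, clamped at the end
        frac = pos - i
        value = pulse[i] * (1 - frac) + pulse[i + 1] * frac   # linear interpolation
        resampled.append((value - lo) / (hi - lo))            # min-max scaling
    return resampled


# Toy pulse (hypothetical values, arbitrary units)
normalised = normalise_pulse([2.0, 4.0, 6.0, 4.0, 2.0], n_points=9)
```

After this step every pulse has the same length and amplitude range, so features such as area and width become comparable across segments.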

Features Extraction and Selection.
Five different waveform features were initially extracted from each of the preprocessed PPG segments: pulse area, pulse rising time, width 25%, width 50%, and width 75%. The "pulse area" feature of the PPG segment reflects the vascular tone changes [22]. Pulse rising time is associated with BP changes. It has been reported that it appeared earlier in younger than in older individuals [23]. Sinha et al. included this important feature in their algorithm to estimate cuffless BP [18]. The PPG pulse widths are associated with the systemic vascular resistance [24].
To select the most significant features, the multicollinearity test was applied in this study. The presence of multicollinearity among the predictor variables affects the generalizability of the algorithm, causing a high estimated mean square error. The variance inflation factor (VIF), an important diagnostic tool for multicollinearity, was used to determine the presence of collinearity among predictors [25]. A VIF larger than 10 indicates that the predictor is highly collinear with another predictor. The most significant features were identified with the multicollinearity test on the basis of their VIF. After the multicollinearity test, width_50% and width_75% were eliminated from the training dataset because their VIF was > 10.
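The features listed above could be computed from a normalised pulse roughly as follows. This is a sketch under stated assumptions, because the text does not define the features precisely: pulse area is taken as the trapezoidal area under the pulse, rising time as the time from the first sample to the peak, and width at a given percentage as the time during which the pulse stays at or above that fraction of its height (at sample resolution). The names `width_at` and `pulse_features` and the sample spacing `dt` are hypothetical.

```python
def width_at(pulse, fraction, dt=1.0):
    """Duration for which the pulse is at or above `fraction` of its height."""
    base, peak = min(pulse), max(pulse)
    level = base + fraction * (peak - base)
    above = [i for i, v in enumerate(pulse) if v >= level]
    return (above[-1] - above[0]) * dt


def pulse_features(pulse, dt=1.0):
    """Return the five waveform features for one pulse (assumed definitions)."""
    peak_index = max(range(len(pulse)), key=lambda i: pulse[i])
    # Trapezoidal rule for the area under the pulse
    area = sum((pulse[i] + pulse[i + 1]) / 2 * dt for i in range(len(pulse) - 1))
    return {
        "pulse_area": area,
        "pulse_rising_time": peak_index * dt,
        "width_25": width_at(pulse, 0.25, dt),
        "width_50": width_at(pulse, 0.50, dt),
        "width_75": width_at(pulse, 0.75, dt),
    }


# Toy normalised pulse (hypothetical values)
features = pulse_features([0.0, 0.5, 1.0, 0.75, 0.5, 0.25, 0.0])
```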
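The VIF of a predictor is 1 / (1 − R²), where R² comes from regressing that predictor on all the others. The sketch below illustrates this with a small ordinary least-squares solver written in plain Python. The helper names and the toy data are hypothetical, and it is not known which software the authors used for this test.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def r_squared(target, predictors):
    """R^2 of an ordinary least-squares fit of target on predictors plus intercept."""
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(target))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * t for r, t in zip(rows, target)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    mean = sum(target) / len(target)
    ss_res = sum((t - f) ** 2 for t, f in zip(target, fitted))
    ss_tot = sum((t - mean) ** 2 for t in target)
    return 1 - ss_res / ss_tot


def vif(features):
    """Map each feature name to its variance inflation factor."""
    result = {}
    for name, column in features.items():
        others = [c for n, c in features.items() if n != name]
        result[name] = 1 / (1 - r_squared(column, others))
    return result


# Toy data (hypothetical): f2 is nearly a copy of f1, f3 is unrelated
toy = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "f2": [1.1, 1.9, 3.2, 3.8, 5.1, 5.9, 7.2, 7.8],
    "f3": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0],
}
flagged = [name for name, value in vif(toy).items() if value > 10]
```

In the toy data the two nearly identical predictors exceed the threshold of 10, while the unrelated one stays close to 1.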

Machine Learning Algorithms to Estimate BPs.
The training and testing dataset consisted of the three most significant PPG waveform features (pulse area, pulse rising time, and width_25%) from each of the 8133 PPG segments and their corresponding reference BPs (SBP and DBP). Due to the continuous nature of the data, three commonly used regression-based machine learning algorithms were applied in this study as follows.

Multiple Linear Regression (MLR).
MLR is a machine learning algorithm that has been widely used by previous researchers to estimate cuffless BP [3,7,26]. The algorithm started with the random selection of coefficients of the linear algorithm (θ0, θ1, θ2, and θ3). Each predictor was associated with a coefficient as shown in a virtual box in Figure 5(a). After each iteration, the coefficients and random error (ε, the difference between the estimated and reference BP) were updated. The least square algorithm was used to minimize the squared error as shown in Equation (1). Iterative minimization of the squared error continued until it converged, at which point the BP estimation was generated: where m = total number of training data (90% of 8133), ε = random error, θ0-3 = coefficients, h(x) = estimated BP, and y = reference BP.
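The iterative least-squares fit described above can be sketched with plain gradient descent. The sketch assumes a cost equal to the sum of squared differences between h(x) and y over the m training samples, scaled by 1/(2m); that scaling, the learning rate, the stopping tolerance, and the zero initialisation (used instead of random coefficients so that runs are reproducible) are all assumptions rather than details from the paper.

```python
def fit_mlr(xs, ys, rate=0.1, tol=1e-12, max_iter=100000):
    """Fit h(x) = theta0 + theta1*x1 + ... by gradient descent on squared error."""
    n_features = len(xs[0])
    theta = [0.0] * (n_features + 1)   # assumption: start from zeros, not random values
    m = len(xs)
    previous_cost = float("inf")
    for _ in range(max_iter):
        preds = [theta[0] + sum(t * v for t, v in zip(theta[1:], x)) for x in xs]
        errors = [p - y for p, y in zip(preds, ys)]
        cost = sum(e * e for e in errors) / (2 * m)
        if abs(previous_cost - cost) < tol:   # converged: cost no longer changes
            break
        previous_cost = cost
        # Gradient step for the intercept and for each coefficient
        theta[0] -= rate * sum(errors) / m
        for j in range(n_features):
            theta[j + 1] -= rate * sum(e * x[j] for e, x in zip(errors, xs)) / m
    return theta


# Toy data (hypothetical) generated from y = 1 + 2*x1 + 3*x2
toy_x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 2.0]]
toy_y = [1.0 + 2.0 * a + 3.0 * b for a, b in toy_x]
coefficients = fit_mlr(toy_x, toy_y)
```

In the study the inputs would be the three selected features and the target would be SBP or DBP, with one model fitted per target.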

Support Vector Machine (SVM).
SVM is a nonparametric algorithm that uses a kernel function. SVM regression has a goal similar to that of the least square method of MLR: to minimize the error function (squared error between the estimated and reference BP). However, its approach to minimizing the function differs from that of MLR, as it uses epsilon (ε), and the goal is to find a function whose error is no greater than ε. In this study, linear epsilon SVM (ε-SVM) regression, which is also called L1 loss, was implemented. ε-SVM has two boundaries across the hyperplane (regression line), as shown in the line across the hyperplane in Figure 5(b). However, in reality, not all residuals lie within the epsilon boundary. Therefore, slack variables (another boundary) were introduced to cover all the remaining residuals, as shown in a dashed line across the hyperplane in Figure 5(b). Slack variables were added to make a dual objective. Each iteration updated the vectors existing in a dual objective, and the equation was analytically solved by the Lagrangian function.
In SVM, the convergence criteria were based on the following equation: where J(β) is called the primal objective and L(α) is a dual objective that was solved by the Lagrangian function. The goal was to minimize the Lagrangian function to get BP estimations. Δ represents the feasibility gap. For the algorithm to converge, the feasibility gap should be less than the gap tolerance [27].
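The role of ε and of the slack variables can be illustrated with the ε-insensitive loss: residuals inside the ε boundary cost nothing, and only the part of a residual beyond ε is penalised. This is a minimal sketch. The function name and the example BP values are hypothetical, and the ε used in the study is not stated.

```python
def epsilon_insensitive_loss(reference, estimated, epsilon):
    """Per-sample loss: zero inside the epsilon tube, linear outside it."""
    return [max(0.0, abs(r - e) - epsilon) for r, e in zip(reference, estimated)]


# Hypothetical BP values in mmHg with an assumed epsilon of 5 mmHg
losses = epsilon_insensitive_loss([120.0, 118.0, 130.0], [122.0, 118.0, 121.0], 5.0)
```

Each non-zero entry corresponds to the slack needed for a residual that falls outside the ε boundary.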
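For orientation, one form of the feasibility gap that matches the symbols used above is the definition given in the MATLAB documentation for SVM regression. Whether the study used exactly this expression is an assumption.

```latex
\Delta = \frac{J(\beta) + L(\alpha)}{J(\beta) + 1}
```

Under this definition the algorithm is considered converged once Δ falls below the gap tolerance.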

Regression Tree.
The regression tree algorithm is another nonparametric machine learning approach for making predictions. It is relatively fast to train compared with the SVM algorithm. It carries decisions from the root node to the leaf nodes. Regression trees are binary trees whose leaves contain numeric responses [28]. The algorithm splits the data on each predictor (pulse area, pulse rising time, and width_25%) using the best optimization criterion, subject to tree depth (α) and minimum leaf size (β). The criterion for stopping the split to make a pure node, based on the mean square error (MSE), is shown as follows:
A pure node indicates that the MSE of the observed response is less than the MSE of the observed response from all the data multiplied by the tolerance [28]. For optimization, the algorithm splits the branches of trees to minimize the prediction error as shown in Figure 5(c).
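The split search and the purity test described above can be sketched for a single predictor. The sketch assumes that a split is scored by the size-weighted MSE of its two children and that a node is pure when its MSE is below the tolerance times the MSE of all the data. The function names and the default tolerance are hypothetical.

```python
def mse(values):
    """Mean squared deviation of the responses from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(x, y, min_leaf=1):
    """Return (threshold, weighted_mse) of the best binary split on one predictor."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                      # cannot split between identical values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        score = (len(left) * mse(left) + len(right) * mse(right)) / len(pairs)
        if best is None or score < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, score)
    return best


def is_pure(node_y, all_y, tolerance=1e-6):
    """Stop splitting when the node MSE is below tolerance * MSE of all data."""
    return mse(node_y) < tolerance * mse(all_y)


# Toy data (hypothetical): the response jumps between x = 3 and x = 4
toy_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
toy_y = [100.0, 100.0, 100.0, 140.0, 140.0, 140.0]
threshold, score = best_split(toy_x, toy_y)
```

A full tree repeats this search over all predictors at every node and stops at pure nodes or when the depth and leaf-size limits are reached.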

Tenfold Cross-Validation.
In total, 8133 × 3 good quality PPG signal features and reference BPs were used to train and test the above three machine learning algorithms with 10-fold cross-validation. In each iteration, 9 folds were used to train an algorithm, and the remaining fold was used to test that algorithm. The process continued until 10 iterations were completed. In the end, there was one estimated SBP and one estimated DBP for each of the 8133 signal segments.
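The fold rotation described above can be sketched as an index generator. How the authors assigned segments to folds is not stated, so the shuffling and the fixed seed here are assumptions; the function name is hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # assumption: random assignment
    folds = [indices[i::k] for i in range(k)]     # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Every one of the 8133 segments appears in exactly one test fold
splits = list(k_fold_indices(8133, k=10))
```

Because each segment is tested exactly once, concatenating the test-fold predictions gives one estimate per segment.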

Data Analysis to Evaluate Overall Measurement Accuracy.
The three machine learning algorithms (regression tree, MLR, and SVM) were firstly evaluated in terms of overall BP estimation accuracy. After the 10-fold cross-validation of all available segments, each segment contained reference BPs (mmHg), estimated BPs (mmHg), and the difference (mmHg) between reference and estimated BP. The averaged BPs (including both reference and estimated BPs) were calculated for each case based on all the available segments in that case. The final mean and SD of estimated BPs were then calculated for all 32 cases as an overall estimation for SBP and DBP, separately for the three machine learning algorithms. They were then compared with their reference BPs in each case to obtain overall estimation accuracy (mean difference and SD of difference).
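The per-case averaging and the overall accuracy figures can be sketched as follows. The sketch assumes the difference is taken as reference minus estimated and that the SD is the sample standard deviation; neither convention is stated explicitly in the text. The function names and example values are hypothetical.

```python
from statistics import mean, stdev


def per_case_means(segments):
    """segments: iterable of (case_id, reference_bp, estimated_bp) tuples."""
    grouped = {}
    for case_id, ref, est in segments:
        grouped.setdefault(case_id, []).append((ref, est))
    return {
        case_id: (mean(r for r, _ in rows), mean(e for _, e in rows))
        for case_id, rows in grouped.items()
    }


def difference_stats(case_means):
    """Mean difference and SD of difference across cases (reference - estimated)."""
    diffs = [ref - est for ref, est in case_means.values()]
    return mean(diffs), stdev(diffs)


# Hypothetical segments from three cases (mmHg)
toy_segments = [
    ("case1", 120.0, 118.0),
    ("case1", 124.0, 122.0),
    ("case2", 110.0, 113.0),
    ("case3", 130.0, 129.0),
]
mean_diff, sd_diff = difference_stats(per_case_means(toy_segments))
```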

Data Analysis to Evaluate Measurement Accuracy in Each BP Category.
For the categorical evaluation, the estimated BPs for each of the available PPG segments in each case were separated into three groups according to their reference BP category (normotensive, hypertensive, and hypotensive). For each case, the averaged BPs were then calculated from all the available segments under each category, which were used to obtain overall BPs across all the 32 cases, separately for each BP category. Finally, the mean difference and SD of difference between the reference and estimated BPs were calculated for each BP category and plotted using the Bland-Altman method.
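The quantities plotted in a Bland-Altman analysis can be sketched as below: the pairwise average on the horizontal axis, the difference on the vertical axis, and the bias with its limits of agreement. The 1.96 × SD limits are the conventional choice and are assumed here, as is the reference minus estimated sign. The function name and example values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(reference, estimated):
    """Return the points and summary lines of a Bland-Altman plot."""
    differences = [r - e for r, e in zip(reference, estimated)]
    averages = [(r + e) / 2 for r, e in zip(reference, estimated)]
    bias = mean(differences)
    sd = stdev(differences)
    return {
        "averages": averages,          # x-axis values
        "differences": differences,    # y-axis values
        "bias": bias,                  # mean difference
        "lower": bias - 1.96 * sd,     # lower limit of agreement
        "upper": bias + 1.96 * sd,     # upper limit of agreement
    }


# Hypothetical per-case BPs for one category (mmHg)
summary = bland_altman([122.0, 110.0, 130.0], [120.0, 113.0, 129.0])
```

Running this once per BP category gives the category-specific mean difference and SD of difference.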

Comparison of Overall BP Measurement Accuracy.
The overall BP measurement accuracy, as shown in Figures 6(a) and 6(b) and Table 2, showed that the regression tree achieved the smallest mean difference of SBP (−0.1 mmHg between reference and estimated SBP) and SD of difference (6.5 mmHg) when compared with the MLR and SVM algorithms. Similarly, the regression tree achieved an acceptable mean difference (−0.6 mmHg between reference and estimated DBP) and SD of difference (5.2 mmHg) for DBP. It was also observed that only the regression tree method achieved overall accuracy acceptable under the ISO standard for NIBP device validation, with an average difference no greater than 5 mmHg and SD no greater than 8 mmHg. Figures 6(c)-6(h) show the Bland-Altman plots between the reference and estimated BPs from the three machine learning algorithms.
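The ISO criterion quoted above reduces to a two-part check, sketched here. Applying the 5 mmHg limit to the absolute value of the mean difference is an assumption about how the sign is handled; the function name is hypothetical. The example inputs are the values reported in this paper.

```python
def meets_iso(mean_difference, sd_difference, max_mean=5.0, max_sd=8.0):
    """True when |mean difference| <= 5 mmHg and SD of difference <= 8 mmHg."""
    return abs(mean_difference) <= max_mean and sd_difference <= max_sd


# Values reported in the text (mmHg)
tree_sbp_ok = meets_iso(-0.1, 6.5)        # regression tree, overall SBP
tree_dbp_ok = meets_iso(-0.6, 5.2)        # regression tree, overall DBP
prior_svm_sbp_ok = meets_iso(11.6, 8.2)   # SVM result of Zhang and Feng [16], SBP
```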

BP Measurement Accuracy under Each BP Category.
The estimation accuracies of the three machine learning algorithms under each BP category are presented in Figure 7. It can be seen that only the regression tree achieved acceptable accuracy to meet the ISO standard for device evaluation, and only in the normotensive BP category. Its mean differences and SDs of difference for SBP and DBP were −1.1 ± 5.7 mmHg and −0.3 ± 5.6 mmHg. The detailed results from the regression tree for each BP category are presented in Tables 3 and 4. It can be seen that the regression tree algorithm produced higher mean differences and SDs of difference under both the hypertensive and hypotensive BP categories in comparison with the normotensive category. It was also observed that, although the mean differences for the MLR and SVM algorithms were acceptable in the normotensive category, they did not meet the ISO standard for device evaluation in terms of SD of difference, as shown in Figure 7.

Discussion
In this study, the overall BP estimation accuracy from three supervised machine learning algorithms (regression tree, MLR, and SVM) was compared to determine which algorithm was better at estimating cuffless BPs using PPG signals only. To prevent the selection of an overfitted algorithm, 10-fold cross-validation was used to test the overall measurement accuracy of the algorithms. The results showed that the regression tree achieved better overall accuracy in terms of mean and SD of BP difference as required by the ISO [29].
Researchers have attempted to develop the MLR algorithm for PTT-based cuffless BP estimation [7,30]. Although the MLR algorithm in those studies achieved acceptable measurement accuracy, their research was still susceptible to the practical issues of using two sensors for the measurement. Measurements from multiple wearable sensors could cause restricted movement and discomfort to the users [31]. Another group also used the MLR algorithm with tonometry for the estimation of cuffless BP, and they succeeded in passing the ISO requirement [12]. However, MLR is sensitive to outliers, as shown in Figure 6(e), suggesting that MLR is probably not an ideal algorithm for BP estimation [32]. In this study, the SD of BP difference for MLR was higher than the required limit of 8 mmHg, and this was partially due to the presence of outliers.
The SVM algorithm has been used to estimate cuffless BP using heart sound signals, where acceptable BP measurement accuracy was achieved [33]. Similarly, in our study, the SVM algorithm was applied to PPG signal features to estimate cuffless BP. However, the SVM algorithm did not achieve acceptable accuracy because of its high SD of BP difference. The performance of the SVM algorithm is mostly based on the selection of the kernel. Three different kernels (linear, Gaussian, and polynomial) have been widely used [34]. In this study, the linear kernel was used to get the estimation output because the selected signal features and their corresponding BPs were in linear relationships. Zhang and Feng used the same database (University of Queensland) but with different PPG signal features to test three machine learning algorithms (MLR, neural network, and SVM). In their study, SVM achieved the best measurement accuracy for SBP (11.6 ± 8.2 mmHg) and DBP (7.6 ± 6.7 mmHg), which did not meet the ISO standard [16]. Therefore, there is a need to better understand the potential reasons in order to improve algorithm development.

Journal of Healthcare Engineering
The regression tree algorithm is robust to noisy data and able to produce a better-fitted algorithm for discrete target data [28]. Researchers used the regression tree algorithm for PTT-based cuffless BP estimation and achieved acceptable results [35]. In this study, the regression tree was the best algorithm for BP estimation. A possible reason behind the success of the regression tree is its robustness to outliers. Another strong characteristic of this algorithm is that it also produces a well-fitted algorithm in the presence of slight nonlinearity within the data [28].
Most importantly, this study further analysed the estimation accuracy of the three machine learning algorithms under different BP categories (normotensive, hypertensive, and hypotensive) and found that most of the algorithms exhibited better accuracy in the normotensive category. Previous research only presented overall BP accuracies (overall mean of difference ± SD of difference) rather than individual categorical BP accuracies [3,9,36]. Some studies only included normotensive subjects [10,17,37]. In our study, the regression tree showed higher mean BP difference and SD of difference in the hypertensive and hypotensive categories in comparison with the normotensive group. This could be caused by the small amount of data within the hypertensive and hypotensive categories of the online database. To make an accurate algorithm for each BP category, it is therefore suggested that a specific algorithm approach for different BP categories should be considered in a future study.
This study has some limitations. Firstly, a manual check to determine the quality of PPG signal segments is not practical in a real scenario. The development of advanced preprocessing algorithms to automatically determine signal quality is important. It is also worth investigating the effect of noise on the estimation accuracy of machine learning models. Secondly, the training and testing of the three machine learning algorithms were limited to the database of the University of Queensland. It would be useful to test the algorithms in a new database. Thirdly, due to the lack of basic clinical variables (e.g., BMI, gender, weight, and height) in the dataset, these variables were not included to train the machine learning algorithms, although they may improve the measurement accuracy of some of the algorithms [12]. Finally, the BP estimation was performed on the basis of each segment, and only noninvasive intermittent BPs were available to be used as reference BPs to train the algorithms. In a future study, using continuous BP as reference BPs may improve the algorithms, allowing beat-to-beat BP estimation.

Conclusions
This study developed and compared three machine learning algorithms to estimate BPs using PPG only and revealed that the regression tree algorithm was the best approach, with overall measurement accuracy acceptable under the ISO standard for device validation. Furthermore, this study demonstrated that the regression tree algorithm achieved acceptable measurement accuracy only in the normotensive category, suggesting that future algorithm development for BP estimation should be more specific for different BP categories.

Data Availability
The database used in this study is available to access via the link: https://outbox.eait.uq.edu.au/uqdliu3/uqvitalsignsdataset/index.html.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication.