The Prediction of Atherosclerosis Index Based on Photoplethysmograph

Current atherosclerosis (AS) assessment devices are inconvenient for users to carry around. In response to this shortcoming, we propose to collect the wrist photoplethysmograph (PPG) signal and build models to predict two indicators of atherosclerosis: cardiovascular age and the right brachial-ankle pulse wave velocity (baPWV). This study uses the maximum correlation coefficient method for feature selection and establishes multiple models to predict cardiovascular age and the right baPWV. The results show that the backpropagation (BP) neural network model predicts cardiovascular age best. Its Pearson correlation coefficient (PCC) is 0.9501 (P < 0.05), and the model's best six physiological features are crest time (CT), crest time ratio (CTR), slope K, stiffness index (SI), reflection index (RI), and heart rate (HR). For predicting the right baPWV, we propose a hybrid method, MLR_BP, which gives better experimental results than BP and multiple linear regression (MLR). The MLR_BP model improves the prediction accuracy: its predicted PCC value is 0.9204 (P < 0.05), and it needs only two features, HR and cardiovascular age. This study further verifies the results of related literature and supports the relationship between AS and related physiological parameters. The proposed method can be applied to wearable devices and has application value for diagnosing AS and preventing cardiovascular diseases.


Introduction
According to the World Heart Federation report at the World Heart Conference, approximately 20 million people die from various cardiovascular diseases (CVD) each year worldwide. It is estimated that the number of deaths from CVD will exceed 30 million in 2025 [1]. The "China Cardiovascular Health and Disease Report 2018" pointed out that China's CVD prevalence and fatality rate are still on the rise. It is estimated that there are 270 million CVD patients, and CVD deaths account for more than 40% of the residents' disease mortality, which is higher than that of tumors and other diseases [2]. Arteriosclerosis (AS) is a significant predictor of CVD. Therefore, the prevention of AS is the key to reducing the risk of CVD [3]. Clinically, AS can be diagnosed by magnetic resonance imaging, ultrasound, and other methods. However, these methods require professional equipment, which is costly and complicated to operate, and they cannot dynamically obtain AS status at any time. The development of portable, noninvasive wearable devices for diagnosing AS therefore has positive significance for early screening and diagnosis of CVD.
The Framingham heart disease study proposed cardiovascular age and cardiovascular risk as new indicators to measure AS. The study used factors such as gender, age, and systolic blood pressure to predict cardiovascular age and cardiovascular risk [4]. In recent years, many studies have held that physiological parameters such as the ankle-brachial index (ABI) and pulse wave velocity (PWV) are important indicators for evaluating AS [5,6]. PWV measurements at different sites, such as brachial-ankle pulse wave velocity (baPWV) and carotid-femoral artery PWV (cfPWV), have essential value in assessing AS. However, these PWV measurement methods are generally complicated to operate and inconvenient to carry around. Because of the critical role of cardiovascular age and baPWV in predicting AS, the goal of our research is to predict these two physiological indexes from the wrist PPG.
PWV is an independent predictor of cardiovascular risk [7]. Cardiovascular age is also one of the gold standards for assessing AS. Existing studies use PPG signals to evaluate blood pressure, arterial stiffness, and so on, and the collected signals are concentrated on the finger. In daily life, collecting finger signals reduces user comfort. The wrist has thick skin and fewer blood vessels, and its signal is weaker than that of the finger. Therefore, feature analysis based on the PPG signal of the wrist is more complicated, but it is more comfortable for users. This research first collects PPG signals at the wrist, extracts relevant features, and then establishes a cardiovascular age prediction model. The correlation between baPWV and aortic PWV is high. At the same time, the correlation coefficient between left and right baPWV is high. Therefore, this study took the right baPWV as an example to establish the baPWV prediction model and uses the prediction of cardiovascular age and baPWV to monitor AS noninvasively and dynamically.
The main contributions of this paper are as follows: (1) Through many experimental analyses, we found the best feature subset and the best model for predicting cardiovascular age. (2) When predicting the right baPWV, we proposed the MLR_BP model, which improves the prediction accuracy and further verifies that AS and cardiovascular age are correlated. (3) Our study found that heart rate (HR) plays an essential role in predicting the right baPWV, indicating that HR has a certain relationship with AS. It may provide another convenient method for monitoring AS. (4) The proposed model has few feature parameters and low computational resource overhead, can be embedded in wearable devices, and improves the comfort of the user experience. It has reference value for detecting AS and preventing CVD.
There are six sections in the paper, and the content of each section is as follows.
In the first section "Introduction," we first introduce the background and significance of the research. Then, the existing methods for evaluating AS are introduced, including ABI, PWV, and cardiovascular age, together with the disadvantages of related research. Finally, the research content and innovation of this paper are explained.
In the second section "Related Work," we mainly introduce related research that extracts various characteristic parameters from PPG signals to evaluate cardiovascular function, AS, blood pressure, and other physiological indicators.
In the third section "Materials and Methods," there are four parts. The data acquisition subjects are introduced first, and the PPG signal processing procedure is presented second. Then, the methods of feature extraction and feature selection are introduced. The last part is the construction of the models, which include MLR, SVR, BP, and our proposed MLR_BP model.
In the fourth section "Results," there are four parts. Firstly, we introduce the results of model feature selection in predicting cardiovascular age. Secondly, we show the accuracy of model performance in predicting cardiovascular age. Then, we present the results of model feature selection in predicting the right baPWV. Finally, we report the accuracy of model performance in predicting the right baPWV. At the same time, we analyze the results to verify our contributions.
In the fifth section "Discussion," we discuss the relationship between the experimental results and the conclusions in the relevant literature. The experimental results further verify the findings in the relevant literature. At the same time, there are new findings from the experimental results.
In the sixth section "Conclusion," we summarize the work of the paper and related conclusions, as well as deficiencies and future work.

Related Work
PPG contains cardiovascular-related physiological and pathological information such as cardiac pulsation function, hemodynamics, and vascular conditions [8,9]. The PPG tracing method has the advantages of noninvasiveness, convenience, and low cost. At the same time, PPG signals are relatively easy to obtain among all medical-biological features. Therefore, more and more scholars use PPG waveforms to detect human physiological indexes, for example, blood pressure, blood sugar, blood oxygen saturation, and heart-related indicators. Pulse wave analysis (PWA) is a typical method for studying AS, and it is widely used in both Chinese and Western medicine [10,11].
Nidigattu et al. applied feature engineering to PPG signals to find the best feature subset for predicting heart rate and blood pressure [12]. Zhang et al. extracted features from PPG signals and built a machine learning model to predict blood glucose, thereby changing the existing invasive measurement methods [13]. Couceiro Ricardo et al. proposed evaluating cardiovascular function from the finger PPG signal through multi-Gaussian fitting [14]. Zhao et al. established a joint framework for heart rate monitoring by PPG signals during exercise. It can be used for PPG heart rate monitoring in high-intensity physical activities and can be applied to fitness tracking and health information tracking on smart wearable devices [15]. Shen et al. recorded PPG signals in a free-moving environment and used deep learning algorithms to detect the onset of atrial fibrillation (AF) [16]. Tjahjadi et al. accurately classified blood pressure types based on bidirectional long short-term memory and time-frequency analysis of PPG signals [17]. The above studies extract different feature parameters from the PPG signal to study the relevant conditions of the cardiovascular system, which is of great significance for realizing rapid and noninvasive monitoring of CVD. The PPG signal collected at the wrist is greatly affected by motion artifacts. The above research did not analyze and process the signal collected at the wrist, and it has not been applied to wearable devices for real-time and noninvasive detection of cardiovascular disease-related indicators. Therefore, the evaluation of AS indicators based on the PPG signal at the wrist is of great significance.

Materials and Methods

Selection and Data Collection of Research Objects.
In this experiment, a self-developed watch is used to collect the PPG signal of the wrist. The PPG signal is transmitted to a smartphone via Bluetooth and can be sent to a computer. Standard cardiovascular testing equipment measures the subjects' cardiovascular age, and AS testing equipment measures the left lower limb ABI, right lower limb ABI, left baPWV, and right baPWV.
Thirty-seven subjects were recruited for this experiment, 20 males and 17 females aged 24-66. The population selection is representative.
In the experiment, each subject was required to use the wristwatch to collect ECG and PPG signals simultaneously for 1 minute. The sampling frequency was 500 Hz, so 30,000 signal points were accumulated in 1 minute. To make up for the small number of participants, we randomly selected 3000-point signal segments for denoising and feature extraction and used the other signal points for data augmentation, bringing the simulated experimental population to 370 people. To reduce computing resources and facilitate integrating the algorithms into wearable devices, we only analyze PPG signals. Participants did not engage in moderate or high-intensity exercise for more than 10 minutes within 1 hour before the experiment.
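The segment arithmetic above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each 30,000-point recording is cut into consecutive, non-overlapping 3000-point segments; the text does not specify how the segments were chosen, and the function name `split_into_segments` and the synthetic recording are hypothetical.

```python
import numpy as np

FS_HZ = 500          # sampling frequency stated in the text
SEGMENT_LEN = 3000   # points per segment stated in the text


def split_into_segments(signal, segment_len=SEGMENT_LEN):
    """Split a 1-D recording into consecutive, non-overlapping segments.

    Assumption: contiguous segmentation; the paper only states that
    3000-point segments were selected.
    """
    signal = np.asarray(signal, dtype=float)
    n_segments = signal.size // segment_len
    # Drop trailing samples that do not fill a whole segment.
    return signal[: n_segments * segment_len].reshape(n_segments, segment_len)


# Synthetic stand-in for a 1-minute recording (60 s x 500 Hz = 30,000 points).
t = np.arange(60 * FS_HZ) / FS_HZ
recording = np.sin(2 * np.pi * 1.2 * t)

segments = split_into_segments(recording)
print(segments.shape)  # (10, 3000)
```

Under this reading, ten segments per subject and 37 subjects give 370 samples, which matches the simulated population size reported above.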

PPG Signal Processing.
We use the discrete wavelet transform to filter high-frequency noise and a maximum point recognition algorithm on the differential signal to identify the starting point of each pulse. We use a baseline calibration algorithm based on the starting points to remove baseline drift and use the "time-domain analysis method" and the "derivative function analysis method" to extract feature points. We use the threshold method to extract the typical feature points and display them on the waveform graph. The signal processing flow is shown in Figure 1, and Figure 2 shows the original PPG waveform collected at the wrist and the waveform after denoising.
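As an illustration of wavelet-based removal of high-frequency noise, the following is a minimal sketch, not the processing chain used in the study. The Haar wavelet, the three decomposition levels, and the choice to zero all detail coefficients are assumptions made here for brevity; the text does not state the wavelet family or thresholding rule.

```python
import numpy as np


def haar_dwt(x):
    """One level of the orthonormal Haar DWT (len(x) must be even)."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x


def denoise(signal, levels=3):
    """Discard the detail (high-frequency) coefficients of `levels` scales."""
    x = np.asarray(signal, dtype=float)
    if x.size % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = x
    for _ in range(levels):
        approx, _discarded = haar_dwt(approx)
    # Reconstruct with every detail band set to zero.
    for _ in range(levels):
        approx = haar_idwt(approx, np.zeros_like(approx))
    return approx


def rms(x):
    return float(np.sqrt(np.mean(np.square(x))))


# Synthetic 3000-point segment at 500 Hz: slow pulse-like wave plus noise.
fs_hz = 500
t = np.arange(3000) / fs_hz
clean = np.sin(2 * np.pi * 1.2 * t)
noisy = clean + 0.2 * np.random.default_rng(0).normal(size=t.size)
smooth = denoise(noisy, levels=3)
print(rms(noisy - clean), rms(smooth - clean))
```

In practice a smoother wavelet and soft thresholding of the detail coefficients would usually be preferred over zeroing them outright; the Haar version is used here only because it fits in a few lines.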

Feature Extraction and Feature Selection.
In the denoised PPG signal, the AS-related indexes are extracted according to the "time-domain analysis method" and the "derivative function analysis method." A total of 10 feature indexes are extracted, and their detailed definitions are shown in Table 1. Figure 3 shows the typical features of PPG. A is the starting point, B is the primary wave, C is the descending middle gorge wave, D is the dicrotic wave, T is the duration of a complete pulse wave, and K is the slope. In the relevant index calculation formula, Q(t) is the collected PPG, Q max is the peak of Q(t), Q min is the valley of Q(t), and T is the pulsation period. The best feature subset is selected for training to reduce the number of features and make the model generalize better. The experiment uses the Pearson correlation coefficient (PCC) method for feature selection. The best feature subset is found for each machine learning model, and the whole process is shown in Figure 4.
Feature selection methods are divided into three types: wrapper, embedded, and filter. There are multiple methods in each class. The wrapper method takes the performance of the learning model as the evaluation criterion of the feature subset, and its purpose is to select a "tailor-made" feature subset for the learning model. In the embedded method, feature selection is embedded in the training process of the learning model and is not clearly distinguished from it. Therefore, these two methods take the subsequent learning model into account. The filter method, however, does not need to consider the learning model to be used later when selecting features. It selects features based on their general properties, such as target correlation, autocorrelation, and divergence. One of the primary purposes of our research is to explore the relationship between AS or baPWV and related features, so the filter method is chosen for feature selection.
The filter class also includes a variety of methods, such as variance selection and correlation coefficient selection. The variance selection method selects features whose variance exceeds a set threshold. It considers only the feature's variance without considering the target value, and it is not easy to determine an appropriate threshold. Intuitively, the larger the correlation coefficient between a feature and the target value (AS; baPWV), the more critical the feature is. Compared with other methods, the correlation coefficient is simpler and more interpretable.
In addition, the computational overhead of the wrapper and embedded methods is larger than that of the filter method. The filter method can quickly explore the best model and feature subset. Therefore, taking all of the above into consideration, we use the maximum correlation coefficient method based on PCC for feature selection.
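To make the time-domain features concrete, the sketch below computes three of them from a single pulse. The definitions used here are common ones and are assumptions, since Table 1 is not reproduced in this text: crest time (CT) is taken as the time from the starting point A to the primary-wave peak B, the crest time ratio (CTR) as CT divided by the pulse duration T, and heart rate (HR) as 60/T. The function name `beat_features` and the triangular test pulse are hypothetical.

```python
import numpy as np


def beat_features(beat, fs_hz=500):
    """Compute CT, CTR, and HR for one pulse.

    `beat` holds the samples from one starting point (A) up to, but not
    including, the next starting point. Definitions are assumed, see lead-in.
    """
    beat = np.asarray(beat, dtype=float)
    period_s = beat.size / fs_hz        # T: duration of the complete pulse
    peak_idx = int(np.argmax(beat))     # B: primary-wave peak
    crest_time_s = peak_idx / fs_hz     # CT: time from A to B
    return {
        "CT": crest_time_s,
        "CTR": crest_time_s / period_s,
        "HR": 60.0 / period_s,          # beats per minute
    }


# Synthetic triangular pulse: 400 samples (0.8 s) with the peak at sample 100.
rise = np.linspace(0.0, 1.0, 101)
fall = np.linspace(1.0, 0.0, 300)[1:]
beat = np.concatenate([rise, fall])

features = beat_features(beat)
print(features)
```

For this synthetic pulse the crest time is 0.2 s of a 0.8 s period, so CTR is about 0.25 and HR about 75 beats per minute.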
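The filter step described above can be sketched as a simple ranking by correlation with the target. This is an illustrative sketch on synthetic data, not the study's code; ranking by the absolute value of the PCC is an assumption, and the function and feature names are hypothetical.

```python
import numpy as np


def rank_features_by_pcc(X, y, names):
    """Rank features by |PCC| with the target, largest first."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    order = np.argsort(scores)[::-1]
    return [(names[j], float(scores[j])) for j in order]


# Synthetic data: the target depends strongly on the first feature,
# moderately on the second, and not at all on the third.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + 0.1 * rng.normal(size=200)

ranked = rank_features_by_pcc(X, y, ["feature_a", "feature_b", "feature_c"])
for name, score in ranked:
    print(f"{name}: |PCC| = {score:.3f}")

top_k = [name for name, _ in ranked[:2]]  # keep the k most correlated features
```

Because each feature is scored independently of any learning model, the ranking is computed once and then reused for every model, which is the source of the low computational overhead noted above.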

Model Construction.
We use linear and nonlinear modeling algorithms to build cardiovascular age prediction models, including MLR, Ridge Regression (RR), Lasso Regression (LR), BP, Random Forest (FR), and Support Vector Regression (SVR).
In the experiment, the BP neural network and the MLR model were used to predict the right baPWV value. At the same time, we propose a hybrid method, MLR_BP, based on the MLR and BP models. Its accuracy in predicting the right baPWV is better than that of the BP neural network and MLR models.
(1) MLR establishes the relationship model between the response variable and the explanatory variables by fitting a linear formula. In this study, $x_1, x_2, \ldots, x_n$ are the extracted PPG signal features, and the response variable is the predicted cardiovascular age or baPWV value. The formula is as follows:

$$f(x) = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n$$

Among them, $w = (w_1, w_2, \ldots, w_n)$ are the regression coefficients, which are fitted on the training set to minimize the loss function. The loss function adopts the least squares method, as shown in the following formula:

$$L(w) = \sum_{i} \left( y_i - f(x_i) \right)^2$$

where $y_i$ is the actual value and $f(x_i)$ is the predicted value for the $i$-th training sample, which is the predicted cardiovascular age or baPWV value in this study. A fundamental problem in linear regression is overfitting, which means that the training error of the model is tiny but the test error is large. Two methods are generally used to reduce overfitting. One is to reduce the number of features; in this study, the feature selection method is used for this purpose. The other is regularization. Lasso and Ridge Regression add L1 and L2 regularization, respectively, to standard linear regression. Therefore, the LR and RR models are also used in the experiment.

(2) The SVR model can address overfitting, nonlinearity, the curse of dimensionality, and local minima. It can handle both linear and nonlinear relational data and has good generalization ability. In this study, the feature variables extracted from the PPG signal may have a nonlinear relationship with the target, so this model is used to explore the nonlinear relationship.

(3) The BP neural network is composed of two parts: forward propagation and backpropagation of the error signal. Forward propagation goes from the input layer to the hidden layer and then to the output layer. The state of each layer of neurons only affects the next layer of neurons. If the output layer result does not reach the expected result, the backpropagation of the error signal is performed. This process continues to iterate until the error is minimized, using a gradient descent method. The output of the $j$-th neural unit of the hidden layer is shown as follows:

$$h_j = \sigma\left( \sum_{i} w_{ji} x_i + b_j \right)$$

Among them, $w_{ji}$ represents the weight between the input node and the hidden node, $x_i$ represents the input, which in this research is the value of the extracted feature, $b_j$ is the bias, and $\sigma(\cdot)$ denotes the layer's activation function. The output layer node is expressed as follows:

$$y_k = \sigma\left( \sum_{j} v_{kj} h_j + b_k \right)$$

where $v_{kj}$ is the weight between the hidden node and the output node and $b_k$ is the bias.

(4) To improve the accuracy of predicting baPWV, we propose the MLR_BP model. This model is a fusion of the MLR and BP neural network models. The algorithm steps are as follows:
(a) We use the BP model to find the best feature subset to predict cardiovascular age.
(b) We take the cardiovascular age predicted by the BP model as a new feature. This feature and the ten features extracted from the PPG signal form a new, larger feature set.
(c) Based on the MLR model and the maximum correlation coefficient method, the best feature subset is found from the newly constructed feature set to predict baPWV.
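The MLR_BP steps above can be sketched end to end as follows. This is a minimal illustration on synthetic, standardised data, not the authors' implementation: the network size, the tanh hidden activation with a linear output, the learning rate, the intercept term in the linear fit, and all function names are assumptions, and the BP model here uses all ten features rather than a selected subset for brevity.

```python
import numpy as np


def add_bias(X):
    return np.column_stack([X, np.ones(len(X))])


def fit_mlr(X, y):
    """Least-squares linear fit; an intercept column is appended (assumption)."""
    w, *_ = np.linalg.lstsq(add_bias(X), y, rcond=None)
    return w


def predict_mlr(w, X):
    return add_bias(X) @ w


def fit_bp(X, y, hidden=8, lr=0.05, epochs=2000, seed=0):
    """One-hidden-layer network trained by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    n = len(y)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)              # forward pass: hidden layer
        err = H @ W2 + b2 - y                 # forward pass: output error
        dH = np.outer(err, W2) * (1 - H**2)   # error propagated back to hidden layer
        gW1, gb1 = X.T @ dH / n, dH.mean(axis=0)
        gW2, gb2 = H.T @ err / n, err.mean()
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return W1, b1, W2, b2


def predict_bp(params, X):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2


def top_k_by_pcc(X, y, k):
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]


# Synthetic stand-in data in standardised units (not the study's data).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 10))                       # ten PPG features
age = np.tanh(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=n)
bapwv = 0.8 * age + 0.8 * X[:, 5] + 0.1 * rng.normal(size=n)
train, test = np.arange(160), np.arange(160, n)    # 80% / 20% split

# (a) The BP model predicts cardiovascular age from the PPG features.
bp = fit_bp(X[train], age[train])
# (b) The predicted age is appended as an eleventh feature.
X_aug = np.column_stack([X, predict_bp(bp, X)])
# (c) MLR on the k features most correlated with baPWV (k = 2 here).
cols = top_k_by_pcc(X_aug[train], bapwv[train], k=2)
w = fit_mlr(X_aug[train][:, cols], bapwv[train])
pred = predict_mlr(w, X_aug[test][:, cols])
pcc = np.corrcoef(pred, bapwv[test])[0, 1]
print(f"selected columns: {sorted(cols.tolist())}, test PCC: {pcc:.3f}")
```

The design point is that the second-stage model stays linear and small: the nonlinear relationship is absorbed into the predicted-age feature, so the final baPWV predictor needs only a handful of coefficients.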
In the experiment, 80% of the collected data are used as the training set and 20% as the test set. We used Python 3.6, and the best parameters were found by grid search with 5-fold cross-validation.
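The split and parameter search can be sketched as below. This is an illustrative sketch only: it tunes the regularization strength of a ridge regression written directly in NumPy, whereas the text does not say which library or which parameter grids were used. The synthetic data, the grid of `alphas`, and the function names are assumptions.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression with an unpenalised intercept."""
    A = np.column_stack([X, np.ones(len(X))])
    penalty = alpha * np.eye(A.shape[1])
    penalty[-1, -1] = 0.0  # do not shrink the intercept
    return np.linalg.solve(A.T @ A + penalty, A.T @ y)


def ridge_predict(w, X):
    return np.column_stack([X, np.ones(len(X))]) @ w


def grid_search_cv(X, y, alphas, k=5, seed=0):
    """Return the alpha with the lowest mean validation MSE over k folds."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, k)
    best_alpha, best_mse = None, np.inf
    for alpha in alphas:
        fold_mse = []
        for i in range(k):
            val = folds[i]
            trn = np.concatenate([folds[j] for j in range(k) if j != i])
            w = ridge_fit(X[trn], y[trn], alpha)
            fold_mse.append(np.mean((ridge_predict(w, X[val]) - y[val]) ** 2))
        mean_mse = float(np.mean(fold_mse))
        if mean_mse < best_mse:
            best_alpha, best_mse = alpha, mean_mse
    return best_alpha, best_mse


# Synthetic data with 370 samples, matching the simulated population size.
rng = np.random.default_rng(0)
n = 370
X = rng.normal(size=(n, 6))
y = X @ np.array([1.0, -0.5, 0.3, 0.0, 0.0, 0.8]) + 0.2 * rng.normal(size=n)

# 80% training set, 20% test set.
perm = rng.permutation(n)
n_train = int(round(0.8 * n))
train_idx, test_idx = perm[:n_train], perm[n_train:]

alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
best_alpha, cv_mse = grid_search_cv(X[train_idx], y[train_idx], alphas, k=5)
w = ridge_fit(X[train_idx], y[train_idx], best_alpha)
test_mse = float(np.mean((ridge_predict(w, X[test_idx]) - y[test_idx]) ** 2))
print(best_alpha, cv_mse, test_mse)
```

The test set is held out of the search entirely, so the cross-validation score selects the parameter and the test score reports how the chosen model performs on unseen data.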

Results

The Results of Model Feature Selection in Predicting Cardiovascular Age.
Firstly, feature selection is performed based on PCC, and the 10 features are ranked by correlation coefficient. Then, the top 1 to 10 features, in order of correlation coefficient, are input into the models. The PCC between the predicted values of the different models and the standard values for each number of selected features is shown in Figure 5.

Scientific Programming
We can see from Figure 5 that the BP model has the best effect when selecting six features. These six features are crest time (CT), crest time ratio (CTR), slope K, stiffness index (SI), reflection index (RI), and heart rate (HR). The three linear models, MLR, LR, and RR, have better results when selecting four features. Only representative models are shown in Figure 5, and the MLR model represents these three linear models. The SVR model also works best when the number of features is six, and the FR model works best when nine features are chosen.

Accuracy of Model Performance in Predicting Cardiovascular Age.
The accuracy of the models is evaluated by PCC, Mean Deviation (MD), Residual Standard Deviation (RSD), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). PCC represents the correlation between the predicted value and the measured value. MD is the average deviation between the predicted value and the measured value, reflecting the bias between the two. RSD represents the dispersion of the residuals. RMSE measures the deviation between the predicted value and the measured value. MAE measures the average absolute error between the predicted value and the measured value. Table 2 shows the accuracy of each model. Among the six models, the BP model is the best, and the worst is the random forest model. P < 0.05 indicates that the prediction results of the models are statistically significant. Figure 6 shows the density plot of the residuals in cardiovascular age between the model predictions and the measured values. Figures 6(a) and 6(b) represent the BP model and MLR, respectively. The residuals between the predicted and measured values of the BP model follow a normal distribution. From the comparison shown in Figure 6, the prediction result of the BP model is better than that of the MLR, and the prediction result is more credible.
Based on the above experimental results, the best model for predicting cardiovascular age is BP, and its best feature subset is CT, CTR, slope K, SI, RI, and HR.
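The five accuracy measures can be computed as in the sketch below. The formulas are the usual ones and are stated as assumptions, since the text does not give them explicitly; in particular, RSD is taken here as the sample standard deviation of the residuals, and the residual is defined as predicted minus measured. The numbers are made up for illustration and are not results from the study.

```python
import numpy as np


def accuracy_metrics(y_pred, y_true):
    """PCC, MD, RSD, RMSE, and MAE between predictions and measurements."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    resid = y_pred - y_true
    return {
        "PCC": float(np.corrcoef(y_pred, y_true)[0, 1]),
        "MD": float(resid.mean()),                    # mean deviation (bias)
        "RSD": float(resid.std(ddof=1)),              # spread of the residuals
        "RMSE": float(np.sqrt(np.mean(resid ** 2))),  # root mean square error
        "MAE": float(np.mean(np.abs(resid))),         # mean absolute error
    }


# Made-up example values (years), for illustration only.
measured = [40.0, 50.0, 60.0, 70.0]
predicted = [42.0, 49.0, 63.0, 68.0]

metrics = accuracy_metrics(predicted, measured)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For these values the residuals are 2, -1, 3, and -2, so MD is 0.5 and MAE is 2.0, which shows how MD can stay small through cancellation while MAE and RMSE still reflect the size of the individual errors.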

The Results of Model Feature Selection in Predicting Right baPWV.
Building on the prediction of cardiovascular age, the right baPWV was selected as a further prediction target for the atherosclerosis index. The MLR model and the BP model were used for comparison. The top 1 to 10 features were selected by the maximum correlation coefficient method. The result of feature selection is shown in Figure 7. Both the MLR and BP models have the best predictive effect when the number of features is 1, and that feature is HR.
To further improve the accuracy of predicting baPWV, we propose the MLR_BP model. The experimental results are shown in Figure 8. The MLR_BP model works best when the number of features is 2, and the proposed model is clearly better than the BP and MLR models. These two features are HR and cardiovascular age.

Accuracy of Model Performance in Predicting Right baPWV.
Table 3 shows the prediction performance of the three models. The prediction accuracy of the BP model and the MLR model is similar, and MLR_BP is better than these two models on the various accuracy indicators. For all three models, P < 0.005 between the predicted and measured values. Figure 9 is a density plot of the residuals in right baPWV between the values predicted by two models and the measured values. Figures 9(a) and 9(b) represent the MLR_BP and MLR models, respectively. Because the accuracy indicators of the MLR model and the BP model in Table 3 are similar, Figure 9 only draws the residual density plots of the MLR and MLR_BP models. Figure 9 shows that the residuals between the predicted and measured values of both the MLR_BP model and the MLR model follow a normal distribution. However, the residuals of the MLR_BP prediction are more concentrated around 0. Therefore, the MLR_BP model fits better than the MLR model. In summary, the MLR_BP model has better performance than the MLR and BP models. The above experimental results and analysis show that our proposed MLR_BP model is the best for predicting the right baPWV, and its best feature subset is HR and cardiovascular age. At the same time, the experimental results show that, for the BP and MLR models, the best feature subset for predicting the right baPWV is HR. Thus, whichever of these three models is used, HR plays an essential role in predicting the right baPWV. This further suggests that HR and AS have a certain relationship. HR can be easily measured by many devices, which may provide a more convenient method for monitoring AS.
In addition, the two models we propose use few feature parameters and require few computing resources. They can be easily embedded in wearable devices and improve the user's comfort during AS diagnosis. This has reference value for detecting AS and preventing CVD.

Discussion
The baPWV is a widely used clinical index for assessing AS and is a gold standard; it is measured at the brachial and ankle arteries. This study uses the PPG signal at the wrist to predict the value of baPWV with high accuracy, indicating that the proposed model provides a reference for the simple measurement of baPWV. Millasseau [...] plays an essential role in evaluating arterial elasticity [18]. Based on the CT study, Wu et al. confirmed that parameters such as CTR have a significant positive correlation with SI, RI, and cfPWV and confirmed that, even in the absence of obvious reflected waves, such parameters can still be used to assess the degree of AS [19]. The results of this study further verify these conclusions.
In the study of predicting cardiovascular age, the MLR model has the best prediction effect when using four features (K; SI; RI; HR). The BP neural network has the best effect when selecting six features (CT; CTR; K; SI; RI; HR). Experiments show that the addition of the CT and CTR features improves the accuracy of the model's prediction. This indicates that CT and CTR have value in predicting cardiovascular age, which is also consistent with the research conclusions of Wu et al.
The MLR and BP neural network models need only the HR feature to achieve their best prediction effect for the right baPWV. The MLR_BP model needs the two features of HR and cardiovascular age to achieve the best accuracy. This shows that cardiovascular age and baPWV are correlated, which is consistent with the views of related literature [20].
The study results indicate that HR plays an essential role in the prediction of cardiovascular age and baPWV. This suggests that HR may also be a crucial indicator in assessing AS.

Conclusion
This research collected PPG signals at the wrist, extracted relevant features, and established several models to predict cardiovascular age and the right baPWV. We found the best feature subsets for the different models, which further verifies the influence of related physiological indicators on AS. In the prediction of cardiovascular age, the BP model has the best accuracy. When predicting the right baPWV, the MLR_BP model has the best accuracy, and it only includes the two features of HR and cardiovascular age. The experimental results further verified the correlation between AS and SI, RI, CT, and CTR, which is consistent with the research conclusions of related scholars. At the same time, the research results show that there is a certain correlation between HR and AS. HR monitoring methods are relatively mature and convenient, so HR may provide a new and convenient way for AS detection. Of course, this requires many more experiments for further verification.
In this research, based on the wrist PPG signal, models were established to assess the cardiovascular age and right baPWV indicators of AS. The models have high prediction accuracy and few feature parameters and can be applied to wearable devices. They can make the detection of AS-related indexes more convenient, fast, and noninvasive, and they have reference value for diagnosing AS and preventing cardiovascular diseases. The PPG signal is subject to many external interference factors, such as motion artifacts and temperature. The subjects were in a relatively quiet environment when we collected the data in this study. We need to further study the robustness of the model during people's daily exercise. In the future, we will explore wearable devices that apply our proposed model, collect data in different environments, and make the model more robust and general.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.