Study on PPG Biometric Recognition Based on Multifeature Extraction and Naive Bayes Classifier

Nowadays, methods based on simple-feature extraction have been extensively studied and applied in PPG biometric recognition, and some promising results have been reported. However, useful information is often lost in the process of PPG signal denoising; the time-domain, frequency-domain, or wavelet feature extracted is often partial and cannot fully express the raw PPG signal; and it is also difficult to choose an appropriate matching method. Therefore, to make up for these shortcomings, a method of PPG biometric recognition based on multifeature extraction and a naive Bayes classifier is proposed. First, in the preprocessing of the raw PPG data, the sliding window method is used to rerepresent the raw data. Second, feature-extraction methods based on the time domain, frequency domain, and wavelets are analysed in detail; these methods are then used to extract the time-domain, frequency-domain, and wavelet features, which are concatenated into a multifeature. Finally, the multifeature is normalized and combined with classifiers and the Euclidean distance for matching and decision-making. Extensive experiments are conducted on three PPG datasets; the proposed method achieves recognition rates of 98.65%, 97.76%, and 99.69% on the respective sets, and the results demonstrate that it is not inferior to several state-of-the-art methods.


Introduction
Since biometric systems based on human physiological or behavioural characteristics are more reliable and safer than traditional identification technologies, certain distinctive features of our body or behavioural attributes, such as fingerprint [1], face [2], voice [3], iris [4], lip print [5], gait motion [6], palm vein [7], finger vein [8], electroencephalography (EEG) [9], and electrocardiography (ECG) [10,11], are viewed as means of human identification. These applications based on biometric approaches promise a convincing future for human recognition. However, a fingerprint can be forged with latex, face recognition can be fooled with an artificial disguise, a voice can be imitated, and acquiring the biosignals for EEG- and ECG-based methods is somewhat complicated. Therefore, photoplethysmography (PPG) biometric recognition has received considerable attention in authentication for personal privacy and fraud prevention.
PPG is a noninvasive electrooptical methodology that provides much physiological information about the human body. In recent years, PPG has been proven capable of distinguishing individuals [12]. As a new biometric technology, PPG has been verified by studies for its universality, uniqueness, robustness, and adaptability [13,14]. Compared to ECG, EEG, and so on, the PPG signal can be acquired at low cost with a more accessible and more portable device. In addition, the PPG signal can be collected from different positions on the human body, such as the fingertips, wrist, or earlobes. Moreover, PPG measurements only need to be acquired from one side of the body, allowing PPG to be used in a larger number of human recognition scenarios. Therefore, the PPG signal is more practical and appealing in application [15,16].
To the best of our knowledge, Gu et al. [1] were the first to study PPG for human verification, considering four feature parameters and achieving 94% accuracy. Since then, many scholars have researched PPG biometric recognition. For example, some scholars applied the low pass filter (LPF), Butterworth filter (BWF), first derivative (FD), second derivative (SD), finite impulse response (FIR) filter, peak detection (PD), and sliding window to signal preprocessing; some applied time-domain feature (TDF) analysis, frequency-domain feature (FDF) analysis, the discrete wavelet transform (DWT), continuous wavelet transform (CWT), linear discriminant analysis (LDA), Karhunen-Loève transform (KLT), discrete cosine transform (DCT), statistical curve width (SCW), and three-layer features based on sparse softmax vectors to feature extraction; and some applied the k-nearest neighbor (k-NN), majority voting (MV), linear discriminating classifier (LDC), support vector machine (SVM), Bayes network (BN), Pearson's distance (PearsD), Manhattan distance, Euclidean distance, naive Bayes (NB), radial basis function (RBF), multilayer perceptron (MLP), decision tree (DT), random forest (RF), gradient boosted trees (GBT), isolation forest (IsFrst), and so on to classification. The technologies or methods in the related literature are summarized by preprocessing, feature extraction, and classification in Table 1. In addition, some scholars have proposed PPG biometric methods based on deep learning [16, 31-34] and achieved good recognition performance. However, deep learning is not easy to train on small-scale data, and it has many hyperparameters that need intricate adjustment, which requires powerful computational resources. Therefore, this paper does not consider deep learning methods.
Table 1 shows that, in current PPG biometrics research, firstly, low pass and Butterworth filters are mainly used to reduce PPG signal noise, or the first and second derivatives are used for PPG signal preprocessing, but these methods often cannot remove motion artifact, baseline wander, and power line interference noise at the same time and often cause the signal to lose useful information; secondly, time-domain or frequency-domain feature analysis and the wavelet transform are mainly used for feature extraction, but the simple features extracted by these methods are often partial and cannot fully express the raw PPG signal; thirdly, in terms of matching or classification, there are many candidate technologies or methods, but choosing the one that achieves the best performance is also a difficult problem. Therefore, to make up for these shortcomings, a method of PPG biometrics based on multifeature extraction and a naive Bayes classifier (NBC) is proposed herein. The contributions of this paper are summarized as follows.
First, we use the sliding window method to rerepresent the PPG raw data and analyse the influence of sliding window size and sliding step size on the recognition rate.
Second, we propose a novel multifeature-extraction method to efficiently capture time-domain, frequency-domain, and wavelet signal characteristics and combine them to form a multifeature.
Finally, extensive experiments are conducted on three PPG datasets; the results demonstrate that the proposed method is not inferior to several state-of-the-art methods.
The remainder of this paper is organized as follows. Section 2 describes the proposed method in detail. Section 3 reports the experimental results and provides a comprehensive analysis. Finally, Section 4 presents the conclusions and future work.

Proposed Methods
To perform PPG biometric recognition, a PPG biometrics framework was designed. First (Section 2.1), the raw PPG data are rerepresented by the sliding window method. Second (Section 2.2), the multifeature is generated from the rerepresented PPG data. Finally (Section 2.3), matching and decision-making are performed after the multifeature is normalized. The block diagram of the proposed PPG biometrics process is shown in Figure 1. The following subsections detail the preprocessing of the raw PPG data, multifeature extraction, matching, and so on.

Rerepresenting Data Using Sliding Window Method.
There are various sources of artifacts that interfere with PPG signal acquisition, including motion artifacts, baseline wander, and power line interference, so many scholars have used various denoising methods to preprocess the data. However, some useful information may be removed in the process of denoising. Therefore, to avoid this situation, this paper uses the sliding window method [28] to rerepresent the raw PPG data, and denoising is no longer necessary [28]. Figure 2 shows the working process of the sliding window method, where W_size is the size of the sliding window and S_step is the sliding step size. After sliding window scanning, a large number of samples are produced from the raw PPG data of each subject, which also makes up for the lack of samples in the training process; these samples constitute the rerepresented matrix.
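As a minimal sketch of this rerepresentation step, the following Python function slides a window over a raw record and stacks the windows into a sample matrix; the names w_size and s_step are our own stand-ins for the paper's W_size and S_step.

```python
# Illustrative sketch of the sliding-window rerepresentation (Section 2.1).
# w_size / s_step correspond to the paper's W_size and S_step.

def sliding_window(signal, w_size, s_step):
    """Rerepresent a raw 1-D PPG record as a matrix of overlapping windows."""
    samples = []
    start = 0
    while start + w_size <= len(signal):
        samples.append(signal[start:start + w_size])
        start += s_step
    return samples  # each row is one training/testing sample

raw = list(range(10))                 # stand-in for one subject's raw PPG record
windows = sliding_window(raw, w_size=4, s_step=2)
# -> [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

With s_step = 1 (as chosen later in Section 3.3), adjacent windows overlap almost entirely, which multiplies the number of samples per subject.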

Influence Factor.
In the process of rerepresenting the raw PPG data using the sliding window method, the sliding window size and the sliding step size are the main factors influencing the recognition rate. As the sliding window size increases, the recognition rate also improves, but it reaches a relatively stable state once the window size grows beyond a certain extent. As the sliding step size increases, the recognition rate decreases; the smaller the sliding step size, the better the recognition effect. We analyse W_size and S_step in detail in Section 3.3.

Feature Extraction.
To the best of our knowledge, simultaneously extracting time-domain, frequency-domain, and wavelet features for PPG biometrics has not yet been attempted. Therefore, this section studies the extraction of multiple features and their concatenation.

TDF.
TDF has wide application in PPG biometrics research [14,19,23-25,27,29]. It is simple to extract, requires no transformation, and is obtained directly from the preprocessed PPG signals. Table 2 lists the formulas for extracting the 17 TDF of a sample. Let x_n = [x_n^0, x_n^1, x_n^2, ..., x_n^i, ..., x_n^(N-1)]^T denote the n-th subject, where x_n^i is the amplitude corresponding to the i-th sample of the n-th subject, N is the total number of sample points after time-domain segmentation, and max and min are the functions taking the maximum and minimum values; the relevant calculation formulas are shown in Table 2. For convenience, the TDF extracted from the i-th sample of the n-th subject can be expressed as TDF_n^i = [F_n^t1, F_n^t2, F_n^t3, ..., F_n^t17]^T. In general, time-domain analysis is more intuitive, but the discriminability of TDF alone is weak.
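To make this concrete, the sketch below computes a few common time-domain statistics. The paper's Table 2 defines 17 features; the subset here (mean, standard deviation, peak-to-peak amplitude, skewness, kurtosis) is our own illustrative choice and may differ from the table's exact list.

```python
# Illustrative time-domain feature extraction for one windowed sample.
import math
import statistics

def time_domain_features(x):
    n = len(x)
    mu = statistics.fmean(x)
    sd = statistics.pstdev(x)                 # population standard deviation
    p2p = max(x) - min(x)                     # peak-to-peak amplitude
    skew = sum((v - mu) ** 3 for v in x) / (n * sd ** 3) if sd else 0.0
    kurt = sum((v - mu) ** 4 for v in x) / (n * sd ** 4) if sd else 0.0
    return [mu, sd, p2p, skew, kurt]

feats = time_domain_features([1.0, 2.0, 3.0, 4.0])
```

Each windowed sample yields one such vector, so feature extraction in the time domain is a single pass over the sample with no transform.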

FDF.
The frequency-domain representation (spectrum) refers to breaking a signal down into its sinusoids; that is, the spectrum of a signal is a representation of its frequency content [35]. Four FDF parameters commonly used by researchers are the gravity frequency, mean frequency, root mean square frequency, and frequency standard deviation. Therefore, in this paper, the spectrum of the PPG signal is first extracted by the fast algorithm of the discrete Fourier transform (FFT), and then the four FDFs are extracted according to the formulas given in Table 3. Let x_n = [x_n^0, x_n^1, x_n^2, ..., x_n^i, ..., x_n^(N-1)]^T denote the n-th subject, where x_n is time-domain segmented and absolutely integrable, x_n^i is the amplitude corresponding to the i-th sample of the n-th subject, and N is the total number of sample points of the segmentation. The FFT of x_n^i can be expressed as

X_n^i(k) = sum_{m=0}^{N-1} x_n^i(m) e^(-j 2 pi k m / N), k = 0, 1, ..., L - 1,

where L = N/2 and X_n^i is a frequency amplitude of the i-th sample of the n-th subject. The main disadvantage of FDF compared to TDF is the higher computational cost. For convenience, the FDF extracted from the i-th sample of the n-th subject can be expressed as FDF_n^i = [F_n^f1, F_n^f2, F_n^f3, F_n^f4]^T.

WF.
The wavelet transform provides a multiresolution analysis of signals [36]; however, it should be noted that successive details are never reanalysed in the orthogonal wavelet decomposition procedure [37]. Therefore, this section discusses the wavelet packet, which can be used for multiresolution decomposition, and proposes a feature-extraction approach based on wavelet packet decomposition. Wavelet packet decomposition is a generalization of wavelet analysis that offers a more precise analysis method for signals; wavelet packet atoms are waveforms indexed by three parameters: frequency, scale, and position [37]. For a given orthogonal wavelet function, a set of wavelet packet bases can be generated, each of which offers a particular way of coding signals, preserving global energy, and reconstructing exact features; wavelet packets can be used for numerous decompositions of a given signal.
In the process of wavelet packet decomposition, the one-dimensional coefficients are split into two parts at each step: one part is the approximation coefficient vector, and the other comprises the detail coefficient vectors. In the next step, the approximation coefficient vector is further split into two parts, and the detail coefficient vector is also decomposed into two parts using the same approach as for the approximation vector. In this way, the signal is decomposed level by level until the requirements are met. However, it should be pointed out that not all wavelet bases are suitable for wavelet decomposition, so it is necessary to select an appropriate wavelet basis. Figure 3 shows the schematic diagram of a three-level wavelet packet decomposition of a one-dimensional signal [37], where S is the raw signal before decomposition, A denotes the approximation (low-frequency) component after decomposition, D denotes the detail (high-frequency) component, and the subscript is the decomposition level. The wavelet feature, comprising the frequency-band energy ratio, energy entropy, scale entropy, and singular entropy, is extracted as follows: (1) The eight frequency subbands are obtained for each sample by three-level wavelet packet decomposition based on the wavelet basis db8 (to this end, mother wavelets such as sym8, db8, coif5, bior3.9, and dmey were studied, and experiments show that db8 is the best), and the wavelet packet decomposition coefficients are extracted from the subbands and reconstructed; they are then denoted by RWPDC_n^i (reconstructed wavelet packet decomposition coefficients of the i-th sample of the n-th subject).
Scientific Programming

(2) Energy ratios (ER) of the eight frequency subbands are calculated as follows:

E_n^i(j) = sum_k |RWPDC_n^i(j, k)|^2, ER_n^i(j) = E_n^i(j) / sum_{j=1}^{8} E_n^i(j), j = 1, 2, ..., 8,

where E_n^i(j) is the energy of the j-th subband.

(3) Energy entropy (EE) is calculated as follows:

EE_n^i = - sum_{j=1}^{8} ER_n^i(j) ln ER_n^i(j).

(4) Scale entropy (SE) of the eight frequency subbands is obtained from the wavelet packet decomposition coefficients without reconstruction and is denoted as SE_n^i = {y | y = SE_n^i(j), j = 1, 2, 3, ..., 8} (SE is obtained directly through the wentropy function of MATLAB).

(5) Computing the singular spectral entropy (SSE): firstly, the singular spectral vector (SSV) is obtained from the reconstructed wavelet packet decomposition coefficients and is denoted as SSV_n^i = {y | y = SSV_n^i(j), j = 1, 2, 3, ..., 8}; then the singular spectral ratio (SSR) is calculated as follows:

SSR_n^i(j) = SSV_n^i(j) / sum_{j=1}^{8} SSV_n^i(j).

Finally, the SSE is calculated as follows:

SSE_n^i = - sum_{j=1}^{8} SSR_n^i(j) ln SSR_n^i(j).
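Steps (2) and (3) above, the subband energy ratios and the energy entropy, can be sketched as follows. The eight coefficient vectors would come from a three-level db8 wavelet packet decomposition (e.g., via a wavelet library); the arrays below are illustrative stand-ins, and the entropy follows the standard Shannon form.

```python
# Subband energy ratios and energy entropy over 8 wavelet-packet subbands.
import math

def energy_ratios(subbands):
    """ER(j) = E(j) / sum_j E(j), with E(j) the squared-coefficient energy."""
    energies = [sum(c * c for c in band) for band in subbands]
    total = sum(energies)
    return [e / total for e in energies]

def energy_entropy(ratios):
    """Shannon entropy of the subband energy distribution."""
    return -sum(p * math.log(p) for p in ratios if p > 0)

bands = [[1.0]] * 8                 # 8 stand-in subbands with equal energy
er = energy_ratios(bands)           # equal energy -> each ratio is 1/8
ee = energy_entropy(er)             # maximal entropy, ln(8)
```

The singular spectral entropy of step (5) has the same entropy form, applied to the normalized singular values instead of the subband energies.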

Multifeature Concatenating.
As previously mentioned, in past research, methods of simple-feature extraction in the time, frequency, or wavelet domain were often preferred, but methods based on simple-feature analysis have their own shortcomings. Therefore, to reduce the impact of these shortcomings, make the extracted features represent the raw PPG signal more closely, and make PPG biometric recognition more accurate, after TDF_n^i, FDF_n^i, and WF_n^i are extracted separately, the next step is to concatenate them into one feature vector to form the multifeature (MF). For the i-th sample of the n-th subject, the multifeature vector can be represented as MF_n^i = [TDF_n^i; FDF_n^i; WF_n^i].

Normalization.
Due to the different dimensions of the indexes, if the various extracted features are sent directly to the classification algorithm, the weights of the algorithm will fluctuate during convergence, and it is easy to converge to a local optimum. Therefore, to avoid this situation, MF_n^i is normalized as follows:

MF_n^i' = (MF_n^i - mean(MF_n^i)) / (max(MF_n^i) - min(MF_n^i)),

where mean is the function taking the mean value, MF_n^i is the multifeature vector extracted from the i-th sample of the n-th subject, and MF_n^i' is the normalized result; MF_n^i' is in [-1, 1], which greatly improves the convergence ability of the algorithm.
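A sketch of this mean-range normalization follows; since the paper's equation is garbled in our copy, the exact formula (subtract the mean, divide by the range) is an assumption consistent with the stated mean function and the [-1, 1] output interval.

```python
# Mean-range normalization of a multifeature vector (assumed formula:
# (v - mean) / (max - min), which always lands in [-1, 1]).
def normalize(mf):
    mu = sum(mf) / len(mf)
    span = max(mf) - min(mf)
    if span == 0:                    # constant vector: nothing to scale
        return [0.0 for _ in mf]
    return [(v - mu) / span for v in mf]

norm = normalize([2.0, 4.0, 6.0, 8.0])   # -> [-0.5, -1/6, 1/6, 0.5]
```

Because every deviation from the mean is bounded by the range, the output is guaranteed to lie in [-1, 1].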

Matching.
In this paper, matching uses the following classifiers and the Euclidean distance to produce decisions.

Classification.
Classification is a fundamental task in pattern recognition that involves the formation of a classifier. In this study, we use two commonly applied classifiers (NBC and LDC) to identify the class labels of subjects.
Naive Bayes [38] uses the well-known Bayes theorem to build a probabilistic model of the subject's features. The intuitive idea of an NBC is that future observations of a feature vector belonging to a subject will follow the same probabilistic distribution as the feature vectors given for training for that subject, and that the value of a feature is independent of the values taken by the other features. The basic idea of a linear discriminant classifier is to minimize the difference between samples of the same class and maximize the difference between samples of different classes by linear projection. In other words, after projection, samples of the same class are gathered together, while samples of different classes are as far apart as possible.
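The naive Bayes idea above can be sketched as a Gaussian naive Bayes classifier: each feature of each subject is modelled with an independent Gaussian, and a test vector is assigned to the subject maximizing the log-likelihood. This is a generic illustration, not the paper's MATLAB implementation, and the tiny training data are stand-ins.

```python
# Minimal Gaussian naive Bayes: per-class, per-feature Gaussians.
import math
import statistics

def fit_nb(train):                       # train: {label: [feature vectors]}
    model = {}
    for label, vectors in train.items():
        dims = list(zip(*vectors))       # transpose to per-feature columns
        model[label] = [(statistics.fmean(d), statistics.pstdev(d) or 1e-6)
                        for d in dims]
    return model

def log_gauss(v, mu, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - (v - mu) ** 2 / (2 * sd * sd)

def predict_nb(model, x):
    # Independence assumption: the joint log-likelihood is a sum over features.
    def score(label):
        return sum(log_gauss(v, mu, sd) for v, (mu, sd) in zip(x, model[label]))
    return max(model, key=score)

model = fit_nb({"A": [[0.0, 0.1], [0.2, 0.0]],
                "B": [[5.0, 5.1], [5.2, 4.9]]})
```

In the paper's pipeline, the feature vectors would be the normalized multifeatures and the labels the subject identities.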
Here, the normalized multifeature ([MF_1, MF_2, ..., MF_M], where M is the number of subjects) is divided into a training set and a testing set, the two sets are directly input into the classifier, and the recognition rate is finally output.
Figure 3: Wavelet packet decomposition tree at level 3.

Euclidean Distance.
In this study, the Euclidean distance is used to compute the similarity between two extracted PPG features, and a decision is produced from it. In other words, this section describes the matching procedure for the testing data. In general, the testing data are divided into two separate sets: an enroll set and a probe set. The enroll set is regarded as the matching template, and the probe set is regarded as the query samples [39].
First, as illustrated in Section 2.1, the enroll and probe sets should be rerepresented. Then, the multifeatures of the two sets are extracted, and the normalized multifeatures are denoted as MF_enroll and MF_probe. Finally, for each probe, the Euclidean distance, which estimates the similarity between the probe and the enroll sample, is defined as follows:

d_euclidean(i, j) = sqrt( sum_{k=1}^{N} (MF_probe^i(k) - MF_enroll^j(k))^2 ),

where MF_probe^i(k) is a multifeature point of the i-th probe sample, MF_enroll^j(k) is a multifeature point of the j-th enroll sample, d_euclidean(i, j) is the distance between the i-th probe sample and the j-th enroll sample, and N is the dimension of one multifeature vector. After d_euclidean(i, j) is acquired, we can determine whether the probe is an impostor or genuine by comparing d_euclidean(i, j) to a threshold. Varying the threshold adjusts the false acceptance rate (FAR) and false rejection rate (FRR), generating what is called a receiver operating characteristic (ROC) curve. The equal error rate (EER) is the point on the ROC curve at which the FAR is equal to the FRR. In general terms, the lower the EER, the better the acceptance and the protection against circumvention.
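The distance-and-threshold decision described above can be sketched as follows; the threshold value is illustrative only, since in the experiments it is swept to trace the ROC curve.

```python
# Euclidean-distance matching between a probe and an enroll template.
import math

def euclidean(probe, enroll):
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(probe, enroll)))

def decide(probe, enroll, threshold):
    """Return True (genuine) when the distance falls below the threshold."""
    return euclidean(probe, enroll) < threshold

d = euclidean([1.0, 2.0, 2.0], [1.0, 0.0, 2.0])   # -> 2.0
```

Sweeping the threshold over the observed distances yields the FAR/FRR pairs from which the ROC curve and EER are read off.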

Experiments and Results
In this section, all experiments were conducted on a personal computer with an Intel® Core™ i7 processor (2.7 GHz) and 8 GB RAM.

Data Acquisition.
The performance of the proposed method was tested on three PPG datasets: Beth Israel Deaconess Medical Center (BIDMC) [40,41], Multiparameter Intelligent Monitoring for Intensive Care (MIMIC) [28,42], and CapnoBase [43,44]. The BIDMC dataset comprises 53 recordings of 8 min each of ECG, PPG, and other signals (sampling frequency fs = 125 Hz) acquired from adult patients (aged 19-90+, 32 females). The patients were randomly selected from a larger cohort admitted to medical and surgical intensive care units at the BIDMC, Boston, Mass., USA. The MIMIC database collects recordings of PLETH, ABP, RESP, and other signals of patients in ICUs and is published on PhysioBank ATM for free. PLETH is the required PPG signal, and its sampling frequency is 125 Hz. The partial recordings of 32 patients of 10 min duration downloaded by [28] are used in this work.
The CapnoBase dataset contains PPG, ECG, and other signals for 42 cases of 8 min duration, with a sampling frequency of 300 Hz.

Performance Evaluation Metrics.
To evaluate the performance of the proposed method, we conducted experiments on two problems. For the identification problem, the recognition rate is used as the evaluation criterion, i.e., the percentage of correctly classified testing samples, defined as follows:

recognition rate = N_correct / N_testing x 100%,

where N_testing is the total number of testing samples and N_correct is the number of testing samples correctly classified. For the verification problem, the EER is the measure. The EER is acquired from the FAR and FRR, which are defined as follows:

FAR = FP / (FP + TN), FRR = FN / (FN + TP),

where FAR is the false accept rate, FRR is the false reject rate, FN is the number of false negatives, TN the number of true negatives, FP the number of false positives, and TP the number of true positives. The EER is defined as the value at which the FAR equals the FRR.
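The metrics above are simple ratios; the sketch below computes them from illustrative counts (the FAR/FRR forms follow the standard confusion-matrix definitions, which the garbled equations in our copy appear to use).

```python
# Evaluation metrics for identification (recognition rate) and
# verification (FAR / FRR).
def recognition_rate(n_correct, n_testing):
    return n_correct / n_testing * 100.0

def far(fp, tn):
    return fp / (fp + tn)      # fraction of impostor attempts accepted

def frr(fn, tp):
    return fn / (fn + tp)      # fraction of genuine attempts rejected

rr = recognition_rate(197, 200)    # -> 98.5
```

The EER is then found by sweeping the decision threshold until far(...) and frr(...) coincide.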

W_size and S_step Analysis.
Since the sliding window size (W_size) and the sliding step size (S_step) have an important influence on the recognition rate, two experiments based on the multifeature were performed on the BIDMC, MIMIC, and CapnoBase datasets to choose suitable values. The first experiment examined the influence of W_size on the recognition rate. First, we set S_step = 1, rerepresented the raw PPG data with varying W_size, and produced 10 groups of experimental samples. Then, 80% of the samples of each subject in each group were taken as a training set, and 20% as a testing set. Finally, the two sets were sent directly to the k-NN, RF, LDC, and NBC classifiers, and the recognition rates were output. Figure 4 shows that, as W_size increases, the recognition rate improves until it stabilizes. The second experiment examined the influence of S_step on the recognition rate. First, we selected the sample points of 1.5 cycles as W_size, rerepresented the raw PPG data by the sliding window method with S_step = 1, 2, 3, ..., 20, respectively, and produced 20 groups of experimental samples. Then, 80% of the samples of each subject in each group were used as a training set, and 20% as a testing set. Finally, the two sets were sent directly to the k-NN, RF, LDC, and NBC classifiers, and the recognition rates were output. Figure 5 shows that, as S_step increases, the recognition rate decreases on the BIDMC dataset; experiments show the same trend on MIMIC and CapnoBase.
As can be seen from Figure 4, when the sliding window size is chosen as the sample points of 1.5 cycles, the proposed method achieves a relatively high and stable recognition rate with the k-NN, RF, LDC, and NBC classifiers. In Figure 4(c), when the recognition rate becomes relatively stable, the number of sample points is relatively large, which should be caused by the high sampling frequency of the CapnoBase dataset. Figure 5 shows that the smaller the sliding step size, the better. Therefore, in the following experiments, we choose the sample points of 1.5 cycles as W_size, and S_step = 1.
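The 80%/20% per-subject split used in the experiments above can be sketched as follows; the function name and the toy data are our own, and a fixed seed stands in for however the original experiments shuffled the samples.

```python
# Per-subject 80/20 train/test split of rerepresented samples.
import random

def split_per_subject(data, train_frac=0.8, seed=0):
    """data: {subject: [samples]} -> (train, test) with the same structure."""
    rng = random.Random(seed)
    train, test = {}, {}
    for subject, samples in data.items():
        shuffled = samples[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_frac)
        train[subject], test[subject] = shuffled[:cut], shuffled[cut:]
    return train, test

train, test = split_per_subject({"s1": list(range(10)), "s2": list(range(10))})
```

Splitting within each subject guarantees that every subject is represented in both the training and testing sets.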

Performance of the Proposed Method.
Because the testing data are sent directly to the classification methods provided by MATLAB, it is not easy to access the internal details of the classification methods and report the EER. Therefore, in this section, we conducted experiments based not only on classifiers but also on the Euclidean distance.

Experiments Based on Classifiers.
MATLAB (R2018b) was used as the experimental programming environment. To avoid special cases, each experiment was run multiple times, and the average value was finally reported.
To evaluate the effectiveness of the features from the various feature-extraction methods, this experiment uses the LDC and NBC. Firstly, we used the method proposed in Section 2.1 to preprocess the raw PPG data. Then, the preprocessed data were divided into a training set (80% of the total data) and a testing set (20%), and the various features were extracted from the training set and testing set, respectively. Finally, the training set and the testing set were sent directly to the LDC and NBC, and the recognition rates were output. Table 4 shows the experimental results on the three datasets. Table 4 shows that the proposed method for extracting TDF is effective and the features are distinguishable, so it achieves a high recognition rate with the LDC and NBC. The frequency-domain and wavelet feature-extraction methods alone are not ideal, but the wavelet feature and FDF cannot be ignored in PPG biometrics, since they yield better recognition performance when combined with other features (such as TDF).
The results also show that the performance of combined features is better than that of a single feature; in particular, the extracted multifeature performs well in PPG biometrics. On the three datasets, NBC is better than LDC with the same extracted features. It should be noted that the recognition rate of the single wavelet feature with LDC and NBC is not high, which may be due to the overlapping of useful-signal and noise frequency bands in the process of raw data preprocessing.

Experiments Based on Euclidean Distance.
To account for the usability-security trade-off, we conducted this experiment in Python 3.6 and report the ROC and EER curves, among other results. Firstly, we used the method proposed in Section 2.1 to preprocess the raw PPG data. Secondly, 10 subjects were randomly selected from the preprocessed data, and 10 samples of each were randomly selected. Thirdly, various features were extracted from all samples of the selected subjects and divided into an enroll set and a probe set. Fourthly, the Euclidean distance between the enroll set and the probe set was calculated by formula (7). Finally, the threshold was varied to calculate the EER and recognition rate. Figures 6 and 7 show the ROC and EER curves of randomly selected testing data on the three datasets, respectively. The ROC curve represents the trade-off between FAR and FRR, while the EER is generally adopted as a single measure characterizing the performance level of a biometric system. The EER can be seen in the figure where the FAR and FRR curves cross. Table 5 shows the recognition rate and EER of the proposed method on the three datasets. These experimental results further show that the extracted multifeature is highly discriminative, and the recognition rate and EER based on the multifeature are the best on the three datasets. As shown in Table 5, in PPG biometrics, the performance based on TDF and FDF and on TDF and WF is superior to that of TDF alone, that based on TDF and WF is slightly superior to TDF and FDF, and that based on MF is optimal, which confirms that extracting the multifeature is very effective for PPG representation. The results of this experiment are consistent with those based on the LDC and NBC, which also demonstrates the effectiveness of the proposed method.

Friedman Test of the Multifeature.
The Friedman test [45,46] is a two-way analysis of variance that can be used to test whether multiple related samples come from the same population. In this section, we use the Friedman test to analyse the significance of the multifeatures extracted.
To this end, we ran two experiments with the Friedman test. The first concerned intersubject differences: firstly, we randomly chose 10 subjects from a dataset and extracted a multifeature for each subject by our method to produce 10 samples; then, the 10 samples were tested with the "K related samples. . ." procedure of IBM SPSS Statistics; finally, the significance was reported. The second concerned intrasubject differences: firstly, we randomly chose one subject from a dataset and then randomly chose 10 samples from that subject's extracted multifeature samples; then, the 10 samples were tested with the same procedure; finally, the significance was reported. Table 6 reports the test results of the two experiments on the three datasets BIDMC, MIMIC, and CapnoBase.
Generally, when the significance is less than 0.05 or 0.01, the difference among samples is considered significant or extremely significant; when the significance is greater than 0.05, it is considered not significant. As shown in Table 6, on the three datasets, the intersubject difference is extremely significant, while the intrasubject difference is not significant; this conclusion is highly consistent with our other experiments.

Comparisons with the State-of-the-Art Methods.
We compare our method with state-of-the-art methods on the three datasets. Karimian et al. [18] used a nonfiducial approach for PPG with a DWT and k-NN and reported a 99.84% recognition rate with an EER of 1.31% on the CapnoBase dataset. Jorge et al. [22] studied several feature extractors (e.g., cycle average, multicycle based on the time domain, and the Karhunen-Loève transform average) and matching metrics (Manhattan and Euclidean distances) tested on CapnoBase, and an optimal EER of 1.0% was achieved. Lee et al. [30] used a DCT for extracting features from the preprocessed PPG data, and the extracted features were used as input variables for an RF; the recognition rate was 99% on CapnoBase. Yang et al. [28] extracted a three-layer feature based on the SSV on the three datasets, used the features as input variables for an NBC, and achieved recognition rates of 98.66%, 97.15%, and 99.71% on BIDMC, MIMIC, and CapnoBase, respectively. The experimental results of [18,22,28,30] are taken directly from the original works and are summarized in Table 7.
As shown in Table 7, although our method is not the best in every case, it achieves a high recognition rate on the three datasets. Compared with [18], the recognition rate of their method is slightly superior, which may be because the two-step feature selection based on the Kolmogorov-Smirnov (KS) test and kernel principal component analysis (KPCA) used in their feature-extraction module [18] is more effective. In contrast, our method only makes a simple combination of time-domain, frequency-domain, and wavelet features, so the multifeature is not sufficiently distinguishable; this is what we will improve in the future. Compared with [28], although the recognition rate of their method is slightly higher, they obtain the better result at the cost of more feature-extraction time, mainly because [28] uses a deep cascade model [47,48] in the process of three-layer feature extraction. Our method, by contrast, involves mainly simple mathematical calculations, so its time cost is much lower.

Time-Cost Analysis.
To verify the efficiency of the proposed framework, the preprocessing, feature-extraction, and matching times (in seconds) of the PPG biometrics procedure on the MIMIC dataset are further summarized in Table 8. The experimental environment was an HP EliteBook 8570w notebook with an Intel® Core™ i7 processor (2.7 GHz), 8 GB RAM, the 64-bit Windows 7 operating system, and MATLAB (R2018b). Since the source code of the methods other than [28] is not available to us, Table 8 only lists the average preprocessing time per sample, the average feature-extraction time per sample, and the average matching time per sample pair for the method of [28] and ours. From the experiments, the preprocessing and matching times of the two methods are almost the same, while the feature-extraction time of the proposed method is much less than that of [28], a reduction of more than 0.4 seconds. It is evident that our method is more efficient in feature extraction.

Conclusions and Future Work
In this study, we propose a novel PPG biometrics framework. Firstly, we use the sliding window method to rerepresent the raw PPG data, avoiding the loss of useful information through denoising, and analyse the influence of the sliding window size and sliding step size on the recognition rate. Secondly, the time-domain, frequency-domain, and wavelet features are extracted and combined to form the multifeature, which makes up for the shortcomings of any single extracted feature. Finally, the normalized multifeature is matched by the LDC, the NBC, and the Euclidean distance to make a decision. The extensive experimental results demonstrate that the method based on the multifeature and NBC can achieve high recognition rates of 98.65%, 97.76%, and 99.69% on the BIDMC, MIMIC, and CapnoBase datasets, respectively, showing that our method is not inferior to several state-of-the-art methods. It should be noted that the multifeature can also achieve a high recognition rate with the LDC on the three datasets. The extensive experimental results on the three datasets also demonstrate that PPG biometrics based on the multifeature and Euclidean distance can provide a low EER and a high recognition rate. In particular, our method is suitable for small-scale PPG data biometrics and consumes fewer resources.
Despite the satisfactory performance achieved by our method, there is still room for improvement in the proposed PPG biometrics framework. For example, the time cost of multifeature extraction needs to be reduced, especially the wavelet feature-extraction time, which accounts for more than 95% of the multifeature-extraction time; the fusion method of time-domain, frequency-domain, and wavelet features needs to be improved; and how to apply PPG biometrics to large-scale data remains an open question. In addition, our method can serve as a reference for other biometrics or for more in-depth biometrics research. In future work, we will further explore the attributive information of the PPG signal to improve performance [49].

Data Availability
The simulated data used to support the simulation part of this study are available from the corresponding author upon request, and the real-world PPG data can be obtained from https://www.physionet.org/physiobank/database/bidmc, https://archive.physionet.org/cgi-bin/ATM?database=mimic2db, and https://dataverse.scholarsportal.info/dataverse/capnobase.