Simple-feature extraction has been studied extensively and is widely used in PPG biometric recognition, and some promising results have been reported. However, useful information is often lost when the PPG signal is denoised; a time-domain, frequency-domain, or wavelet feature extracted on its own is often partial and cannot fully express the raw PPG signal; and choosing an appropriate matching method is difficult. To make up for these shortcomings, a PPG biometric recognition method based on multifeature extraction and a naive Bayes classifier is proposed. First, in preprocessing, the sliding window method is used to rerepresent the raw PPG data. Second, feature-extraction methods based on the time domain, frequency domain, and wavelets are analysed in detail and then used to extract time-domain, frequency-domain, and wavelet features, which are concatenated into a multifeature. Finally, the multifeature is normalized and combined with classifiers and Euclidean distance for matching and decision-making. Extensive experiments on three PPG datasets show that the proposed method achieves recognition rates of 98.65%, 97.76%, and 99.69% on the respective datasets, and the results demonstrate that it is not inferior to several state-of-the-art methods.
Because biometric systems based on human physiological or behavioural characteristics are more reliable and secure than traditional identification technologies, certain distinctive bodily features or behavioural attributes, such as fingerprint [
Photoplethysmography (PPG) is a noninvasive electrooptical technique that provides rich physiological information about the human body. In recent years, PPG signals have been shown to be capable of distinguishing individuals [
To the best of our knowledge, Gu et al. [
Summary of technologies or methods used for PPG biometrics.
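Stage | Technology or method | Literature |
---|---|---|
Preprocessing | LPF | [ |
Preprocessing | BWF | [ |
Preprocessing | FD and SD | [ |
Preprocessing | LPF, FD, and SD | [ |
Preprocessing | FIR, FD, and SD | [ |
Preprocessing | PD | [ |
Preprocessing | Sliding window | [ |
Feature extraction | TDF | [ |
Feature extraction | TDF and FDF | [ |
Feature extraction | DWT | [ |
Feature extraction | CWT | [ |
Feature extraction | LDA | [ |
Feature extraction | TDF and KLT | [ |
Feature extraction | SCW, FDF, and TDF | [ |
Feature extraction | Sparse softmax vector | [ |
Matching or classification | k-NN | [ |
Matching or classification | k-NN and MV | [ |
Matching or classification | LDC | [ |
Matching or classification | SVM | [ |
Matching or classification | BN and k-NN | [ |
Matching or classification | PearsD | [ |
Matching or classification | Manhattan distance and Euclidean distance | [ |
Matching or classification | BN, NB, RBF, and MLP | [ |
Matching or classification | DCT | [ |
Matching or classification | SVM, GBT, and IsFrst | [ |
Matching or classification | k-NN, NB, RF, and LDC | [ |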
Table
First, we use the sliding window method to rerepresent the raw PPG data and analyse the influence of the sliding window size and sliding step size on the recognition rate.
Second, we propose a novel multifeature-extraction method to efficiently capture time-domain, frequency-domain, and wavelet signal characteristics and combine them to form a multifeature.
Finally, extensive experiments are conducted on three PPG datasets; the results demonstrate that the proposed method is not inferior to several state-of-the-art methods.
The remainder of this paper is organized as follows. Section
To perform PPG biometric recognition, a PPG biometrics framework was designed. First (mentioned in Section
Block diagram of the proposed PPG biometrics process.
Various sources of artifacts interfere with PPG signal acquisition, including motion artifacts, baseline wander, and power line interference, so many researchers have preprocessed the data with various denoising methods. However, denoising may also remove useful information. Therefore, to avoid this loss, this paper uses the sliding window [
Working process of the sliding window.
When the raw PPG data are rerepresented using the sliding window method, the sliding window size and the sliding step size are the main factors that influence the recognition rate. As the sliding window size increases, the recognition rate improves and then remains relatively stable once the window size reaches a certain extent. As the sliding step size increases, the recognition rate decreases; that is, a smaller step size gives better recognition. We analyse
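The rerepresentation step described above can be illustrated with a minimal sketch. The function name `sliding_windows`, its parameters, and the choice to drop an incomplete trailing window are assumptions made for illustration; they are not taken from the paper.

```python
def sliding_windows(signal, window_size, step_size):
    """Split a 1-D sequence into equal-length, possibly overlapping windows."""
    if window_size <= 0 or step_size <= 0:
        raise ValueError("window_size and step_size must be positive")
    last_start = len(signal) - window_size
    # Assumption of this sketch: a trailing window shorter than
    # window_size is dropped rather than padded.
    return [list(signal[start:start + window_size])
            for start in range(0, last_start + 1, step_size)]


# Toy example: 10 raw samples, window of 4 samples, step of 2 samples.
samples = sliding_windows(list(range(10)), window_size=4, step_size=2)
print(len(samples))
```

A smaller step size produces more, and more strongly overlapping, samples from the same recording, which is one plausible reason why it tends to help recognition.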
To the best of our knowledge, the simultaneous extraction of time-domain, frequency-domain, and wavelet features for PPG biometrics has not yet been reported. This section therefore studies the extraction of these features and their concatenation.
Time-domain features (TDFs) are widely used in PPG biometrics research [
Calculation formula of the time-domain feature (TDF) of a sample.
| Label of the feature | Name of the feature | Formula |
|---|---|---|
| | Max value | |
| | Min value | |
| | Peak value (PV) | |
| | Peak to peak value | |
| | Mean value | |
| | Square root amplitude (SRA) | |
| | Mean amplitude (MA) | |
| | Variance | |
| | Standard deviation (STD) | |
| | Root mean square (RMS) | |
| | Skewness | |
| | Kurtosis | |
| | Waveform index | |
| | Pulse index | |
| | Peak index | |
| | Margin index | |
| | Clearance index | |
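Because the formulas of the table are not available in this text, the following sketch computes most of the listed statistics using definitions that are common in signal analysis. These definitions (for example, population variance, SRA as the squared mean of the square roots of the absolute values, and the waveform, pulse, and peak indices as RMS/MA, PV/MA, and PV/RMS) are assumptions and may differ from the paper's formulas. The margin and clearance indices are omitted because their definitions vary between sources. The function name `time_domain_features` is hypothetical.

```python
import math


def time_domain_features(x):
    """Common time-domain statistics of one window (a non-constant sequence)."""
    n = len(x)
    mean = sum(x) / n
    abs_x = [abs(v) for v in x]
    peak = max(abs_x)                                   # peak value (PV)
    mean_amp = sum(abs_x) / n                           # mean amplitude (MA)
    sra = (sum(math.sqrt(a) for a in abs_x) / n) ** 2   # square root amplitude (SRA)
    var = sum((v - mean) ** 2 for v in x) / n           # population variance (assumed)
    std = math.sqrt(var)
    rms = math.sqrt(sum(v * v for v in x) / n)
    return {
        "max": max(x),
        "min": min(x),
        "peak": peak,
        "peak_to_peak": max(x) - min(x),
        "mean": mean,
        "sra": sra,
        "mean_amplitude": mean_amp,
        "variance": var,
        "std": std,
        "rms": rms,
        "skewness": sum((v - mean) ** 3 for v in x) / (n * std ** 3),
        "kurtosis": sum((v - mean) ** 4 for v in x) / (n * std ** 4),
        "waveform_index": rms / mean_amp,   # assumed: RMS / MA
        "pulse_index": peak / mean_amp,     # assumed: PV / MA
        "peak_index": peak / rms,           # assumed: PV / RMS
    }


features = time_domain_features([0.0, 1.0, 2.0, 3.0, 4.0])
print(features["variance"], features["kurtosis"])
```

The sketch assumes a window that is neither constant nor all zeros; otherwise several of the ratios are undefined.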
The frequency-domain representation (spectrum) of a signal is its decomposition into sinusoidal components. That is, the spectrum of a signal is a representation of its frequency content [
Calculation formula of each FDF.
| Label of the feature | Name of the feature | Formula |
|---|---|---|
| | Gravity frequency (GF) | |
| | Mean frequency | |
| | RMS frequency | |
| | Frequency standard deviation | |
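As a rough illustration of spectral features of this kind, the sketch below computes the gravity frequency, RMS frequency, and frequency standard deviation of one window from its one-sided power spectrum. The power-weighted definitions used here are common ones and are assumptions, not the paper's formulas; the mean frequency is left out because its definition cannot be recovered from this text. The function names are hypothetical, and a direct DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def power_spectrum(x, fs):
    """One-sided power spectrum of x via a direct DFT (O(n^2), short windows only)."""
    n = len(x)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(x))
        freqs.append(k * fs / n)        # frequency of bin k in Hz
        power.append(abs(coeff) ** 2)
    return freqs, power


def frequency_domain_features(x, fs):
    """Power-weighted spectral statistics (assumed definitions)."""
    freqs, power = power_spectrum(x, fs)
    total = sum(power)
    gravity = sum(f * p for f, p in zip(freqs, power)) / total
    rms_freq = math.sqrt(sum(f * f * p for f, p in zip(freqs, power)) / total)
    freq_std = math.sqrt(
        sum((f - gravity) ** 2 * p for f, p in zip(freqs, power)) / total)
    return {"gravity_frequency": gravity,
            "rms_frequency": rms_freq,
            "frequency_std": freq_std}


# Toy example: one second of a 5 Hz sinusoid sampled at 50 Hz.
fs = 50
tone = [math.sin(2 * math.pi * 5 * t / fs) for t in range(fs)]
print(frequency_domain_features(tone, fs))
```

For a pure tone, the gravity and RMS frequencies coincide with the tone frequency and the frequency standard deviation is close to zero. A real PPG window has a large DC component, so removing the mean before the transform may be advisable.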
Time-domain or frequency-domain features alone cannot describe how a signal varies over time and cannot localize signal content for analysis; that is, their joint time-frequency resolution is limited. Wavelet functions, by contrast, are localized in both the time domain and the frequency domain [
The wavelet packet decomposition is a generalization of wavelet analysis that offers a more precise analysis method for signals; wavelet packet atoms are waveforms indexed by three parameters: frequency, scale, and position [
Wavelet packet decomposition tree at level 3.
The wavelet feature, which comprises the frequency-band energy ratio, energy entropy, scale entropy, and singular spectral entropy, is extracted as follows. First, eight frequency subbands are obtained for each sample by three-level wavelet packet decomposition with the db8 wavelet basis (the mother wavelets sym8, db8, coif5, bior3.9, and dmey were studied, and experiments show that db8 performs best); the wavelet packet decomposition coefficients are extracted from the subbands and reconstructed. Second, the energy ratio (ER) of each of the eight frequency subbands is calculated. Third, the energy entropy (EE) is calculated. Fourth, the scale entropy (SE) of the eight frequency subbands is obtained from the wavelet packet decomposition coefficients without reconstruction. Fifth, the singular spectral entropy (SSE) is computed: the singular spectral vector (SSV) is first obtained from the reconstructed wavelet packet decomposition coefficients, and the SSE is then calculated from it. Finally, the wavelet feature (WF) of the i-th sample of the n-th subject is formed from these quantities.
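The energy ratio and energy entropy steps can be sketched as follows, assuming the usual definitions: the energy of a subband is the sum of its squared coefficients, the energy ratio is that energy divided by the total over all subbands, and the energy entropy is the Shannon entropy of the ratios. These definitions and the name `subband_energy_features` are assumptions. The wavelet packet decomposition itself, the scale entropy, and the singular spectral entropy are not shown; the subband coefficients would come from a wavelet library.

```python
import math


def subband_energy_features(subbands):
    """Energy ratios and energy entropy of a list of subband coefficient lists."""
    energies = [sum(c * c for c in band) for band in subbands]
    total = sum(energies)
    ratios = [e / total for e in energies]
    # Shannon entropy with the natural logarithm; empty bands contribute 0.
    entropy = -sum(r * math.log(r) for r in ratios if r > 0)
    return ratios, entropy


# Toy example with two subbands of equal energy instead of eight real ones.
ratios, entropy = subband_energy_features([[1.0, 1.0], [1.0, -1.0]])
print(ratios, entropy)
```

With eight subbands, the entropy ranges from 0 (all energy in one subband) to ln 8 (energy spread evenly), so it summarizes how the signal energy is distributed across frequency bands.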
As previously mentioned, past research has often preferred simple-feature extraction based on time-domain, frequency-domain, or wavelet analysis, but each of these simple-feature methods has its own shortcomings. Therefore, in order to reduce the impact of these shortcomings, make the extracted features represent the raw PPG signal more closely, and make PPG biometric recognition more accurate, after
Because the features have different scales, feeding the extracted features directly to the classification algorithm makes the algorithm's weights fluctuate during convergence and makes it prone to converging to a local optimum. Therefore, in order to avoid this situation,
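The normalization scheme itself is not recoverable from this text. As one common possibility, the sketch below applies z-score normalization to each feature column so that all features share a comparable scale; both the choice of z-score and the name `zscore_columns` are assumptions.

```python
import math


def zscore_columns(rows):
    """Scale every feature column to zero mean and unit (population) variance."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
            for col, m in zip(columns, means)]
    # A constant column carries no information and is mapped to 0.0.
    return [[(v - m) / s if s > 0 else 0.0
             for v, m, s in zip(row, means, stds)]
            for row in rows]


# Toy example: two samples whose two features differ in scale by a factor of 10.
print(zscore_columns([[1.0, 10.0], [3.0, 30.0]]))
```

In practice the column means and standard deviations would be estimated on the training data and reused for the testing data.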
In this paper, matching decisions are produced by the following classifiers and by Euclidean distance.
Classification is a fundamental task in pattern recognition that involves the formation of a classifier. In this study, we use two commonly applied classifiers (NBC and LDC) to identify the class labels for subjects.
Naive Bayes [
Here, the multifeature (
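To make the naive Bayes decision rule concrete, the following is a minimal sketch of a Gaussian naive Bayes classifier: each feature is modelled per class by an independent normal distribution, and a sample is assigned to the class with the largest log posterior. The Gaussian likelihood, the variance floor `eps`, and the function names are assumptions; the paper uses the classifier provided by MATLAB, whose internal settings are not described here.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate the log prior and per-feature mean and variance for every class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # eps keeps the variance positive when a feature is constant in a class.
        variances = [sum((v - m) ** 2 for v in col) / n + eps
                     for col, m in zip(columns, means)]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class label with the highest log posterior for one sample."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(row, means, variances))
        return log_prior + log_likelihood
    return max(model, key=log_posterior)


# Toy example: two subjects, two samples each, two features per sample.
X = [[1.0, 2.0], [1.2, 1.8], [5.0, 6.0], [5.2, 6.1]]
y = ["subject_a", "subject_a", "subject_b", "subject_b"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 2.0]))
```

Working with log probabilities avoids numerical underflow when the multifeature has many dimensions.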
In this study, Euclidean distance is used to compute the similarity between two extracted PPG features and to produce the matching decision. This section describes the matching procedure for the testing data. In general, the testing data are divided into two separate sets: an enroll set and a probe set. The enroll set is regarded as the matching template, and the probe set is regarded as the query sample [
First, as illustrated in Section
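The enroll/probe matching described above can be sketched as a nearest-template search under Euclidean distance. The dictionary layout (one template vector per subject) and the names `euclidean` and `match_probe` are assumptions for illustration; the paper's exact procedure is not reproduced here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def match_probe(enroll, probe):
    """Return (subject, distance) for the enrolled template closest to the probe."""
    best = min(enroll, key=lambda subject: euclidean(enroll[subject], probe))
    return best, euclidean(enroll[best], probe)


# Toy example: two enrolled templates and one probe sample.
enroll = {"subject_1": [0.0, 0.0], "subject_2": [3.0, 4.0]}
print(match_probe(enroll, [2.9, 4.1]))
```

For identification, the probe is assigned to the closest template; for verification, the returned distance would instead be compared with a threshold.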
In this section, all experiments were conducted on a personal computer with
Performance of the proposed method was tested on the three PPG datasets including Beth Israel Deaconess Medical Center (BIDMC) [
The BIDMC dataset comprises 53 eight-minute recordings of ECG, PPG, and other signals (sampling frequency fs = 125 Hz) acquired from adult patients (aged 19–90+, 32 females). The patients were randomly selected from a larger cohort admitted to medical and surgical intensive care units at the BIDMC, Boston, Mass., USA. The MIMIC database collects recordings of PLETH, ABP, RESP, and other signals from patients in ICUs and is published for free on PhysioBank ATM. PLETH is the PPG signal needed here, and its sampling frequency is 125 Hz. The partial recordings of 32 patients of 10 min duration downloaded by [
To evaluate the performance of the proposed method, we conducted experiments on two methods. For the identification problem, the recognition rate is used as the evaluation criterion, which is the percentage of correctly classified testing samples, defined as follows:
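Following the verbal definition above (the percentage of correctly classified testing samples), the recognition rate can be computed as in this small sketch. The function name `recognition_rate` and the label lists are illustrative assumptions.

```python
def recognition_rate(true_labels, predicted_labels):
    """Percentage of testing samples whose predicted label equals the true label."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


# Toy example: three of four testing samples are classified correctly.
print(recognition_rate(["a", "b", "c", "a"], ["a", "b", "a", "a"]))
```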
Since the size of sliding window (
Influence of the
Influence of
As can be seen from Figure
Because the testing data are sent directly to the classification functions provided by MATLAB, it is not easy to access the internal details of those functions and report the equal error rate (EER). Therefore, in this section, we conducted experiments based not only on classifiers but also on Euclidean distance.
MATLAB (R2018b) was used as the experimental programming environment. To avoid special cases, each experiment was run multiple times, and the average value is reported.
To evaluate the effectiveness of the features produced by the various feature-extraction methods, this experiment uses the LDC and NBC. Firstly, we used the method proposed in Section
Recognition rates of the various features combined with LDC and NBC on the three datasets.
Dataset | Feature vector | Recognition rate with LDC (%) | Recognition rate with NBC (%) |
---|---|---|---|
BIDMC | TDF | 87.33 | 96.73 |
BIDMC | FDF | 33.24 | 27.23 |
BIDMC | WF | 43.15 | 40.63 |
BIDMC | TDF and FDF | 89.37 | 97.48 |
BIDMC | TDF and WF | 92.86 | 98.33 |
BIDMC | FDF and WF | 64.84 | 55.13 |
BIDMC | MF | 93.90 | 98.65 |
MIMIC | TDF | 89.38 | 93.85 |
MIMIC | FDF | 38.23 | 28.18 |
MIMIC | WF | 57.82 | 65.94 |
MIMIC | TDF and FDF | 92.24 | 94.17 |
MIMIC | TDF and WF | 94.32 | 97.40 |
MIMIC | FDF and WF | 70.94 | 74.27 |
MIMIC | MF | 95.42 | 97.76 |
CapnoBase | TDF | 99.04 | 99.12 |
CapnoBase | FDF | 43.25 | 37.86 |
CapnoBase | WF | 70.48 | 78.75 |
CapnoBase | TDF and FDF | 99.12 | 99.21 |
CapnoBase | TDF and WF | 99.26 | 99.34 |
CapnoBase | FDF and WF | 79.53 | 82.98 |
CapnoBase | MF | 99.47 | 99.69 |
Table
To account for the usability-security trade-off, we conducted this experiment in Python 3.6 and report the ROC and EER curves. Firstly, we used the proposed method in Section
ROC curve.
EER curve.
Recognition rates and EERs based on the various features and Euclidean distance on the three datasets.
Dataset | Feature vector | Recognition rate (%) | EER (%) |
---|---|---|---|
BIDMC | TDF | 96.08 | 3.52 |
BIDMC | TDF and FDF | 96.47 | 3.48 |
BIDMC | TDF and WF | 96.71 | 3.29 |
BIDMC | MF | 98.06 | 1.94 |
MIMIC | TDF | 96.67 | 3.43 |
MIMIC | TDF and FDF | 96.86 | 3.18 |
MIMIC | TDF and WF | 96.87 | 3.14 |
MIMIC | MF | 96.99 | 3.11 |
CapnoBase | TDF | 97.59 | 2.43 |
CapnoBase | TDF and FDF | 97.59 | 2.41 |
CapnoBase | TDF and WF | 97.60 | 2.39 |
CapnoBase | MF | 97.62 | 2.36 |
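For readers unfamiliar with the EER, the sketch below shows one simple way to estimate it from distance scores: sweep a decision threshold over all observed distances, compute the false acceptance rate (FAR) on impostor pairs and the false rejection rate (FRR) on genuine pairs, and report the point where the two are closest. The function name, the toy score lists, and the averaging of FAR and FRR at the closest point are assumptions; the paper's exact procedure is not given in this text.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER from genuine and impostor distance scores."""
    best_gap, eer = None, None
    for threshold in sorted(set(genuine) | set(impostor)):
        # A pair is accepted when its distance does not exceed the threshold.
        far = sum(d <= threshold for d in impostor) / len(impostor)
        frr = sum(d > threshold for d in genuine) / len(genuine)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy example: distances between same-subject and different-subject pairs.
genuine = [0.1, 0.2, 0.3, 0.9]
impostor = [0.25, 0.8, 1.0, 1.2]
print(equal_error_rate(genuine, impostor))
```

Raising the threshold trades a lower FRR (better usability) for a higher FAR (weaker security), which is the trade-off that the ROC and EER curves visualize.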
These experimental results further show that the extracted multifeature is highly discriminative: on all three datasets, the multifeature yields the highest recognition rate and the lowest EER. As shown in Table
Friedman test [
Significances of the intersubject and intrasubject on the three datasets.
Dataset | Intersubject significance | Intrasubject significance |
---|---|---|
BIDMC | 0.000 | 0.291 |
MIMIC | 0.000 | 0.054 |
CapnoBase | 0.000 | 0.147 |
Generally, when the significance is less than 0.05 or 0.01, the difference among samples is considered to be significant or extremely significant, respectively; when the significance is greater than 0.05, the difference is considered not significant. As shown in Table
We perform comparisons between our method and the state-of-the-art methods on the three datasets. Karimian et al. [
Comparison of the proposed method and state-of-the-art methods on the three datasets.
Dataset | Method | Matching | Recognition rate (%) | EER (%) |
---|---|---|---|---|
BIDMC | [ | NBC | 98.66 | - |
BIDMC | Proposed | NBC | 98.65 | - |
MIMIC | [ | NBC | 97.15 | - |
MIMIC | Proposed | NBC | 97.76 | - |
CapnoBase | [ | k-NN | 99.84 | 1.31 |
CapnoBase | [ | RF | 99.00 | - |
CapnoBase | [ | Manhattan and Euclidean distance | - | 1.00 |
CapnoBase | [ | NBC | 99.71 | - |
CapnoBase | Proposed | NBC | 99.69 | - |
As shown in Table
To verify the efficiency of the proposed framework, the preprocessing time, feature-extraction time, and matching time (all in seconds) of the PPG biometrics procedure on the MIMIC dataset are summarized in Table
Comparison of the proposed method and [
Method | Process | Time (s) |
---|---|---|
Proposed | Preprocessing | 0.000116 |
Proposed | Feature extraction | 0.026431 |
Proposed | Matching | 0.000264 |
[ | Preprocessing | 0.000107 |
[ | Feature extraction | 0.451316 |
[ | Matching | 0.001217 |
In this study, we propose a novel PPG biometrics framework. Firstly, we use the sliding window method to rerepresent the raw PPG data, which avoids losing useful information to denoising, and analyse the influence of the sliding window size and sliding step size on the recognition rate. Secondly, time-domain, frequency-domain, and wavelet features are extracted and combined into a multifeature to make up for the limitations of any single extracted feature. Finally, the normalized multifeature is matched using LDC, NBC, and Euclidean distance to make a decision. The extensive experimental results demonstrate that the method based on the multifeature and NBC achieves high recognition rates of 98.65%, 97.76%, and 99.69% on the BIDMC, MIMIC, and CapnoBase datasets, respectively, and that our method is not inferior to several state-of-the-art methods. The multifeature also achieves a high recognition rate with LDC on the three datasets. The results further demonstrate that PPG biometrics based on the multifeature and Euclidean distance provides a lower EER and a high recognition rate. In particular, our method is suitable for small-scale PPG data biometrics and consumes fewer resources.
Despite the satisfactory performance achieved by our method, the proposed PPG biometrics framework still has room for improvement. For example, the time cost of multifeature extraction needs to be reduced, especially the wavelet feature-extraction time, which accounts for more than 95% of the multifeature-extraction time; the method for fusing time-domain, frequency-domain, and wavelet features needs to be improved; and how to apply PPG biometrics to large-scale data remains to be studied. In addition, our method can serve as a reference for other biometrics or for more in-depth biometrics research. In our future work, we will further explore the attributive information of the PPG signal to improve the performance [
The simulated data used to support the simulation part of this study are available from the corresponding author upon request, and the real-world PPG data can be obtained from
The authors declare no conflicts of interest.
This work was supported in part by the NSFC-Xinjiang Joint Fund under Grant U1903127 and in part by the Key Research and Development Project of Shandong Province under Grant 2018GGX101032.