Investigation of Five Algorithms for Selection of the Optimal Region of Interest in Smartphone Photoplethysmography

1 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
2 Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen 518055, China
3 Key Lab for Health Informatics of Chinese Academy of Sciences (HICAS), Shenzhen 518055, China
4 Department of Physics and Materials Science, City University of Hong Kong, Kowloon, Hong Kong
5 Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong


Introduction
Photoplethysmography (PPG) is an optical technique that detects blood volume changes in the microvascular bed of tissue by using a light source and a detector [1]. The light source emits light of certain wavelengths that propagates through the microvascular bed and is received by a photoelectric detector. According to the Beer-Lambert law, the amount of light absorbed by blood depends on the blood volume. Hence, the intensity of the light received by the detector varies in synchrony with the blood volume over each heartbeat. This technique is easy to use and low in cost. It is used in medical devices such as pulse oximeters and has been widely used in clinical applications to measure heart rate and blood oxygen saturation [2]. With the development of modern digital signal processing techniques, PPG can also be used to estimate respiratory rate [3,4], blood pressure [5,6], cardiac output [7,8], arterial stiffness [9,10], heart rate variability [11], and other underlying physiological information [12].
In recent years, a new kind of PPG technique based on the smartphone has been proposed. This smartphone PPG (sPPG) acquires signals from the smartphone's built-in camera. One only needs to place a finger on the camera lens and record a video with the built-in LED flash turned on; several physiological parameters such as heart rate [13][14][15][16][17], respiratory rate [15,18], pulse volume [17], and oxygen saturation [15] can then be estimated from the sPPG signals. With the help of the microphone to detect heart sounds, blood pressure can also be estimated [19]. The sPPG technique requires no dedicated hardware other than a smartphone: it needs only an application installed on the phone, so it can be used anywhere, at any time, by anyone. As smartphones become ubiquitous, the sPPG technique shows promise for remote medicine and home healthcare services.
The sPPG is much the same as traditional PPG (tPPG, e.g., a pulse oximeter), except that the light source and the detector are replaced by an LED flash and a camera, respectively. However, sPPG signals are videos with three dimensions (two spatial dimensions of the image and one of time), whereas tPPG signals are one-dimensional time series. The sPPG video must therefore be reduced to a one-dimensional signal before digital signal processing. The general approach is to select a region of interest (ROI) where the light intensity changes markedly across the video frames and then to calculate the average intensity of the ROI in each frame, producing a time-series waveform.
The selection of the ROI is an important factor affecting the quality of the waveform and hence the accuracy of the subsequent physiological measurements. However, to the best of our knowledge, it has not been well studied in the literature. Matsumura et al. averaged all the pixels of each video frame; that is, they used the whole frame as the ROI [20]. Jonathan and Leahy and Scully et al. chose a fixed central region of each frame as the ROI [13,15]. Chandrasekaran et al. split video frames into four quadrants and empirically selected the first quadrant as the ROI [19]. Karlen et al. selected the best ROI with the maximal pulsatile amplitude after comparing 88 blue channels [21]. To improve reliability, two other methods were developed: Kurylyak et al. calculated the radius of a fitted circle after binarizing each frame [22], and Po et al. developed a frame-adaptive ROI method to avoid color saturation and cut-off distortion [23]. Both of these methods, however, use a time-varying ROI, whose changes can be confounded with the time-varying pixel intensity that carries the pulse signal.
Furthermore, in our previous work on extracting heart rate variability from smartphone photoplethysmograms [24], we found that the choice of ROI affected the quality of the waveform and that the conventional fixed ROI was not satisfactory. We therefore proposed five algorithms to investigate how to determine the optimal ROI: variance (VAR), spectral energy ratio (SER), template matching (TM), temporal difference (TD), and gradient (GRAD). Their performance was evaluated in a 50-subject experiment.

Variance (VAR).
Every video frame is divided into $M$ rows and $N$ columns, giving $M \times N$ blocks in total; $M$ and $N$ are chosen so that every block has a suitable size. The average intensity of each block is then calculated for every frame, producing one time-series waveform per block. Each waveform passes through a fourth-order Butterworth band-pass filter with a passband of 0.5-8 Hz to remove baseline wander and high-frequency noise [25], and the variance of the filter output is calculated. Finally, the block generating the waveform with the maximal variance is selected as the optimal ROI, because the PPG signal is approximately sinusoidal and, for a zero-mean waveform, the maximal variance corresponds to the maximal signal power.
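A minimal, dependency-free Python sketch of the VAR selection rule follows. It assumes each frame is a 2-D list of red-channel intensities; `block_waveforms` and `select_roi_var` are hypothetical helper names, `m` and `n` stand for the row and column counts, and the 0.5-8 Hz Butterworth band-pass step is assumed to be applied by the caller (it is not reproduced here). It illustrates the decision rule only and is not the authors' implementation.

```python
import math
import statistics


def block_waveforms(frames, m, n):
    """Average intensity of each of the m x n blocks in every frame.

    frames: list of 2-D lists (height x width) of red-channel intensities.
    Returns m * n waveforms; block (row r, column c) has index r * n + c.
    """
    height, width = len(frames[0]), len(frames[0][0])
    bh, bw = height // m, width // n  # leftover edge pixels are ignored
    waves = [[] for _ in range(m * n)]
    for frame in frames:
        for r in range(m):
            for c in range(n):
                total = sum(
                    frame[y][x]
                    for y in range(r * bh, (r + 1) * bh)
                    for x in range(c * bw, (c + 1) * bw)
                )
                waves[r * n + c].append(total / (bh * bw))
    return waves


def select_roi_var(waves):
    """Index of the block whose waveform has the largest variance.

    The waveforms are assumed to be band-pass filtered (0.5-8 Hz) already.
    """
    variances = [statistics.pvariance(w) for w in waves]
    return max(range(len(waves)), key=variances.__getitem__)


# Toy example: 2 s of 4 x 4 frames at 30 fps; only the top-right 2 x 2
# block pulsates at 1.2 Hz (72 bpm), the rest is a constant 100.
# (Unfiltered here: a constant offset does not change the variance.)
fs = 30
frames = []
for k in range(2 * fs):
    pulse = 10 * math.sin(2 * math.pi * 1.2 * k / fs)
    frame = [[100.0] * 4 for _ in range(4)]
    for y in range(2):
        for x in range(2, 4):
            frame[y][x] += pulse
    frames.append(frame)

print(select_roi_var(block_waveforms(frames, 2, 2)))  # 1 (row 0, column 1)
```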

Spectral Energy Ratio (SER).
The frame division and waveform generation are the same as described in Section 2.1, but without filtering, and the SER rather than the variance of the waveform is calculated. The SER was first introduced by Lee and Wei for spectral analysis of pulse signals [26]. We modified its definition to the ratio of the energy in the range 0.5-3 Hz to the total energy of the waveform:
$$\mathrm{SER} = \frac{\int_{0.5}^{3} P(f)\,df}{\int_{0}^{f_s/2} P(f)\,df}, \qquad (1)$$
where $P(f)$ is the power spectrum and $f_s$ is the frame rate of the camera. The range 0.5-3 Hz is chosen because heartbeat frequencies usually lie in it, corresponding to normal heart rates from 30 to 180 beats per minute (bpm). A higher SER indicates that cardiac activity accounts for a larger proportion, and noise and interference for a smaller proportion, of the total energy. Thus the block with the maximal SER is selected as the optimal ROI.
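The SER rule can be sketched in the same style. This sketch assumes a direct DFT in place of an FFT library, a discrete sum over frequency bins in place of the integrals in the SER definition, and removal of the mean before the DFT (the text does not say how the DC component is handled); `power_spectrum`, `spectral_energy_ratio`, and `select_roi_ser` are hypothetical names.

```python
import cmath
import math


def power_spectrum(x):
    """One-sided power spectrum |X[k]|^2, k = 0..len(x)//2, via a direct DFT."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def spectral_energy_ratio(x, fs, f_lo=0.5, f_hi=3.0):
    """Energy in [f_lo, f_hi] Hz divided by the total energy up to fs / 2.

    Assumption: the mean is removed first so that the DC bin does not
    dominate the denominator.
    """
    n = len(x)
    mean = sum(x) / n
    p = power_spectrum([v - mean for v in x])
    band = sum(pk for k, pk in enumerate(p) if f_lo <= k * fs / n <= f_hi)
    total = sum(p)
    return band / total if total > 0 else 0.0


def select_roi_ser(waves, fs):
    """Index of the block waveform with the largest SER."""
    ratios = [spectral_energy_ratio(w, fs) for w in waves]
    return max(range(len(waves)), key=ratios.__getitem__)


fs = 30
t = [k / fs for k in range(2 * fs)]  # 2 s, so DFT bins are 0.5 Hz apart
clean = [math.sin(2 * math.pi * 1.5 * s) for s in t]  # 90 bpm pulse
noisy = [c + 2 * math.sin(2 * math.pi * 7 * s) for c, s in zip(clean, t)]
print(round(spectral_energy_ratio(clean, fs), 3))  # 1.0
print(round(spectral_energy_ratio(noisy, fs), 3))  # 0.2
print(select_roi_ser([clean, noisy], fs))          # 0
```

In the noisy block the 7 Hz component has twice the amplitude, hence four times the energy, of the in-band pulse, so the SER drops to 1/(1 + 4) = 0.2.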

Template Matching (TM).
The frame division, waveform generation, and filtering of the TM algorithm are the same as described in Section 2.1. The waveform is then cross-correlated with a template, a typical PPG signal shown in Figure 1, to measure the similarity between them.
The cross-correlation is realized by a matched filter [27]:
$$y(n) = \sum_{k} h(k)\,x(n-k),$$
where $x(n)$ is the input, $y(n)$ is the output, and $h(n)$ is the impulse response, which is the template flipped left to right. The similarity is then quantified as the amplitude of $y(n)$: the higher the similarity, the better the waveform matches the template. Thus the block with the maximal similarity is selected as the optimal ROI.
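For the TM rule, the sketch below implements the matched filter as a direct convolution with the reversed template and, as an assumption, scores each block by the peak absolute output. Because Figure 1 is not reproduced here, the template is a hypothetical PPG-like pulse (linear upstroke, exponential decay, made zero-mean), and `matched_filter` and `select_roi_tm` are illustrative names.

```python
import math


def matched_filter(x, template):
    """y(n) = sum_k h(k) x(n - k), with h = template reversed (full convolution).

    Convolving with the reversed template is the same as cross-correlating
    the waveform with the template.
    """
    h = template[::-1]
    y = []
    for n in range(len(x) + len(h) - 1):
        acc = 0.0
        for k, hk in enumerate(h):
            if 0 <= n - k < len(x):
                acc += hk * x[n - k]
        y.append(acc)
    return y


def select_roi_tm(waves, template):
    """Index of the block whose matched-filter output has the largest peak |y(n)|.

    Assumption: "amplitude" is taken as the peak absolute output.
    """
    scores = [max(abs(v) for v in matched_filter(w, template)) for w in waves]
    return max(range(len(waves)), key=scores.__getitem__)


# Hypothetical PPG-like template (one beat, 25 samples at 30 fps).
raw = [t / 6 if t < 6 else math.exp(-(t - 6) / 6) for t in range(25)]
mean = sum(raw) / len(raw)
template = [v - mean for v in raw]

pulses = template * 3  # three identical beats (75 samples)
ripple = [0.2 * math.sin(2 * math.pi * 5 * k / 30) for k in range(75)]
print(select_roi_tm([ripple, pulses], template))  # 1
```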

Temporal Difference (TD).
TD is a commonly used algorithm for separating moving objects from the background [28], and it can also be applied to sPPG video processing. First, the TD is calculated as the absolute difference in intensity of each pixel between two adjacent frames, which reflects the intensity variation of that pixel. In most cases, a single TD value is too small and sensitive to noise. Therefore, the TDs of all adjacent frame pairs within a time interval are summed to reduce the effect of noise. The interval can be set to 2 s or longer so that it covers at least one cardiac cycle (2 s is one cycle at 30 bpm, the lower end of the normal range) and thus fully reflects the intensity variation caused by cardiac activity. The summed TD map is then divided into $M$ rows and $N$ columns, giving $M \times N$ blocks in total. The average TD value in each block is computed, and the block with the greatest average is selected as the optimal ROI.
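A sketch of the TD rule in the same style follows, assuming frames that span at least 2 s of video and a synthetic input; `temporal_difference_map`, `block_means`, and `select_roi_td` are illustrative names. Unlike the VAR, SER, and TM sketches, no per-block waveform is generated before the ROI is chosen.

```python
import math


def temporal_difference_map(frames):
    """Per-pixel sum over adjacent frame pairs of |I_{k+1}(y, x) - I_k(y, x)|."""
    h, w = len(frames[0]), len(frames[0][0])
    td = [[0.0] * w for _ in range(h)]
    for prev, cur in zip(frames, frames[1:]):
        for y in range(h):
            for x in range(w):
                td[y][x] += abs(cur[y][x] - prev[y][x])
    return td


def block_means(img, m, n):
    """Mean of img over each of the m x n blocks, row-major (index r * n + c)."""
    bh, bw = len(img) // m, len(img[0]) // n
    return [
        sum(img[y][x] for y in range(r * bh, (r + 1) * bh)
            for x in range(c * bw, (c + 1) * bw)) / (bh * bw)
        for r in range(m)
        for c in range(n)
    ]


def select_roi_td(frames, m, n):
    """Index of the block with the largest average summed temporal difference."""
    means = block_means(temporal_difference_map(frames), m, n)
    return max(range(len(means)), key=means.__getitem__)


# Toy example: 2 s at 30 fps; only the bottom-left 2 x 2 block pulsates.
fs = 30
frames = []
for k in range(2 * fs):
    pulse = 10 * math.sin(2 * math.pi * 1.2 * k / fs)
    frame = [[100.0] * 4 for _ in range(4)]
    for y in range(2, 4):
        for x in range(2):
            frame[y][x] += pulse
    frames.append(frame)

print(select_roi_td(frames, 2, 2))  # 2 (row 1, column 0)
```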

Gradient (GRAD).
From preliminary results of the four algorithms described above (Figure 2), we observed that the optimal ROI was neither the brightest block nor the darkest one; it was often located in blocks of medium intensity between the brightest and the darkest, that is, in the transition region where pixel intensity changes markedly across the image. In light of this observation, we proposed the GRAD algorithm. First, a single frame is chosen from the sPPG video and its gradient is calculated. The gradient map is then divided into $M$ rows and $N$ columns, giving $M \times N$ blocks in total. The average gradient of each block is calculated, and the block with the greatest average gradient is selected as the optimal ROI.
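The GRAD rule needs only one frame. The sketch below assumes forward differences as the gradient operator and averages the gradient magnitude, since the text names neither; `gradient_magnitude` and `select_roi_grad` are hypothetical names, and the 6 × 6 frame is synthetic.

```python
import math


def gradient_magnitude(img):
    """|grad I| from forward differences; the last row/column uses a zero difference."""
    h, w = len(img), len(img[0])
    g = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = img[y][x + 1] - img[y][x] if x + 1 < w else 0.0
            gy = img[y + 1][x] - img[y][x] if y + 1 < h else 0.0
            g[y][x] = math.hypot(gx, gy)
    return g


def select_roi_grad(img, m, n):
    """Index (r * n + c) of the block with the largest mean gradient magnitude."""
    g = gradient_magnitude(img)
    bh, bw = len(img) // m, len(img[0]) // n
    means = [
        sum(g[y][x] for y in range(r * bh, (r + 1) * bh)
            for x in range(c * bw, (c + 1) * bw)) / (bh * bw)
        for r in range(m)
        for c in range(n)
    ]
    return max(range(len(means)), key=means.__getitem__)


# Toy 6 x 6 frame: flat at 100 except an intensity ramp (a transition
# region) inside the bottom-right 3 x 3 block.
img = [[100.0] * 6 for _ in range(6)]
for y in range(3, 6):
    for x in range(3, 6):
        img[y][x] = 100.0 + 20.0 * (x - 3)

print(select_roi_grad(img, 2, 2))  # 3 (row 1, column 1)
```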

Evaluation
We evaluated the effectiveness of the five algorithms described above in a 50-subject experiment. The experiment was approved by the Institutional Review Board of Shenzhen Institutes of Advanced Technology (registration number: SIAT-IRB-140215-H0040). The subjects were 34 males and 16 females, aged 20-31 years, with height 150-183 cm and weight 40-90 kg. All were healthy with no known diseases, and written informed consent was obtained from each subject.
In the experiment, all the subjects were instructed to lie on a mattress and to place the right index finger on the camera lens of an HTC S510e smartphone with the built-in LED flash turned on. A camera application (APP) on the smartphone recorded a 1-minute video of the fingertip with a resolution of 320 × 240 pixels at 30 frames per second (fps). Simultaneously, a Finometer MIDI (Model II, Finapres Medical Systems B.V., The Netherlands) collected electrocardiogram (ECG) signals at a sampling rate of 200 Hz, which were automatically stored on a computer by the BeatScope Easy software (Finapres Medical Systems B.V., The Netherlands). The subjects were asked to keep as still as possible throughout the recording.
The five algorithms introduced in Section 2 were employed to determine the optimal ROI. For comparison, a sixth algorithm that uses a fixed central region (FCR) of the frame as the ROI was also employed, since it is the approach most commonly used in the literature [13,15]. For each algorithm, the time-series waveform of the selected ROI was generated as the average red-channel intensity of the ROI in each frame. The waveform was then filtered and transformed by the Fast Fourier Transform (Figure 3), and the heart rate was estimated from the spectral peak at the heartbeat frequency [13], searched within 0.5-3 Hz, which corresponds to normal heart rates of 30-180 bpm.
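The spectral heart-rate step can be illustrated as follows, under the assumption that the heart rate is read from the largest spectral peak within 0.5-3 Hz. The sketch evaluates only the in-band DFT bins directly instead of calling an FFT routine, skips the filtering step (only the mean is removed), and uses a synthetic 1-minute signal; `heart_rate_fft` is a hypothetical name.

```python
import cmath
import math


def heart_rate_fft(x, fs, f_lo=0.5, f_hi=3.0):
    """Heart rate (bpm) from the largest DFT peak between f_lo and f_hi Hz.

    Only in-band bins are evaluated (direct DFT, no zero padding), so the
    resolution is fs / len(x) Hz, i.e. 1 bpm for a 1-minute recording.
    """
    n = len(x)
    mean = sum(x) / n
    k_lo = math.ceil(f_lo * n / fs)
    k_hi = math.floor(f_hi * n / fs)
    best_k, best_p = k_lo, -1.0
    for k in range(k_lo, k_hi + 1):
        coef = sum((x[t] - mean) * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(coef) ** 2 > best_p:
            best_k, best_p = k, abs(coef) ** 2
    return 60.0 * best_k * fs / n


# 1 min at 30 fps: a 1.2 Hz fundamental plus a weaker second harmonic.
fs = 30
x = [math.sin(2 * math.pi * 1.2 * t / fs) + 0.3 * math.sin(2 * math.pi * 2.4 * t / fs)
     for t in range(60 * fs)]
print(heart_rate_fft(x, fs))  # 72.0
```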
On the other hand, the "true" heart rate was calculated by ECG analysis. R-wave peaks of the ECG were detected using Pan and Tompkins' algorithm [29], and the heart rate was determined as 60 times the inverse of the mean R-to-R interval (RRI):
$$\mathrm{HR} = \frac{60}{\overline{\mathrm{RRI}}},$$
where HR is in bpm and $\overline{\mathrm{RRI}}$ is the mean RRI in seconds.
To evaluate the accuracy of the six ROI selection algorithms, the heart rates estimated from the sPPG were compared with those estimated from the ECG by statistical analysis (note that two subjects were excluded for the FCR algorithm, as explained below). As shown in Table 1, the Pearson correlation coefficients for VAR, TM, TD, and GRAD were all greater than 0.95, whereas those for SER and FCR were not. All six algorithms had a standard error of estimate (SEE) below 5 bpm, and the SEE of TM and TD was below 2 bpm. Bland-Altman analysis showed that all six algorithms had a bias of less than 1 bpm. VAR, TM, TD, GRAD, and FCR had limits of agreement (LA) below 5 bpm, whereas SER had an LA greater than 5 bpm. The Bland-Altman plots in Figure 4 show that more than 95% of the data points fell within the LA for all six algorithms.
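The ECG reference and the agreement statistics can be sketched as below. The SEE definition used here (residual standard error of a least-squares regression of the sPPG estimates on the ECG values, with n - 2 degrees of freedom) is an assumption, since the text does not define it; the LA follows the Table 1 legend (1.96 × standard deviation of the differences). Function names and the toy numbers are hypothetical.

```python
import math
import statistics


def heart_rate_from_r_peaks(r_times):
    """HR (bpm) = 60 / mean R-to-R interval, with R-peak times in seconds."""
    rri = [b - a for a, b in zip(r_times, r_times[1:])]
    return 60.0 / statistics.mean(rri)


def agreement(ref, est):
    """Pearson r, SEE, Bland-Altman bias, and LA (1.96 x SD of differences).

    Assumption: SEE is the residual standard error of the least-squares
    regression of est on ref, with n - 2 degrees of freedom.
    """
    n = len(ref)
    mx, my = statistics.mean(ref), statistics.mean(est)
    sxy = sum((x - mx) * (y - my) for x, y in zip(ref, est))
    sxx = sum((x - mx) ** 2 for x in ref)
    syy = sum((y - my) ** 2 for y in est)
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = my - slope * mx
    resid_ss = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(ref, est))
    see = math.sqrt(resid_ss / (n - 2))
    diffs = [y - x for x, y in zip(ref, est)]
    bias = statistics.mean(diffs)
    la = 1.96 * statistics.stdev(diffs)
    return r, see, bias, la


ecg_hr = [heart_rate_from_r_peaks(p) for p in ([0, 1, 2, 3], [0, 0.8, 1.6, 2.4])]
print([round(v, 1) for v in ecg_hr])  # [60.0, 75.0]
print(agreement([60, 70, 80, 90], [61, 69, 81, 89]))
```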
Previous research has suggested that, for heart rate monitors, the correlation coefficient should be greater than 0.90 and the SEE should be less than 5 bpm [16]. By these criteria all six algorithms are valid, which indicates that the sPPG technique can measure heart rate accurately. However, the algorithms perform differently. The FCR algorithm failed in two subjects because the intensity in the central region of the video was saturated and no signal could be extracted, whereas the five proposed algorithms always found the optimal ROI and calculated the heart rate.

Figure 4: Bland-Altman plots between heart rates measured by the electrocardiogram and those measured by the smartphone using the six algorithms for selection of region of interest. Note that two subjects were excluded for the FCR algorithm; namely, the number of subjects was 48 for the FCR algorithm and 50 for the other algorithms. VAR: variance; SER: spectral energy ratio; TM: template matching; TD: temporal difference; GRAD: gradient; FCR: fixed central region; HR: heart rate; bpm: beats per minute.

Among the five proposed algorithms, the TM and TD algorithms were in general better than the other three because they had higher correlation coefficients, smaller SEE, and smaller LA. The SER was worse than the other four because it had the lowest correlation coefficient, the largest SEE, the largest bias, and the largest LA. The performance of the VAR and GRAD algorithms was in between.

Advantages and Disadvantages of the Five Algorithms.
All five algorithms are simple in principle and easy to implement, and each selects the optimal ROI according to its own decision rule. The VAR, SER, and TM algorithms are based on waveform processing, so they must first perform frame division and waveform generation, which is time-consuming. In contrast, the TD and GRAD algorithms are based on image processing, so they can select the ROI before generating the waveform. This saves time and suits smartphones with limited processing power, but the selected ROI may not be the one that yields the best-quality waveform. This weakness is acceptable because the ultimate objective of the sPPG technique is the extraction of physiological parameters such as heart rate, and a suboptimal ROI still works well (see Table 1).
The VAR algorithm selects the block generating the waveform with the maximal variance as the optimal ROI. The maximal variance is equivalent to the maximal power if the waveform is zero-mean. However, the VAR algorithm ignores whether the power comes from the signal or from noise, so it may select the wrong ROI when the waveform is contaminated by square waves, triangular waves, or other fluctuations. Fortunately, such cases rarely occur in practice, and the VAR algorithm works well, with an SEE below 3 bpm and an LA below 5 bpm.
The SER algorithm was expected to perform well, but it did not. A possible reason is that the sPPG waveform is periodic and has harmonics outside the 0.5-3 Hz range (Figure 3). Consequently, the SER is not a good index of the ratio of cardiac energy to total energy.
The TM algorithm needs a preset reference template and measures the similarity between the waveform and the template. Any PPG-like wave can serve as the template; its exact shape is not important because an exact match is not required. From the standpoint of filtering, a matched filter is the linear filter that maximizes the output signal-to-noise ratio for a known signal shape in additive white noise, which may explain why the TM algorithm works better than the VAR algorithm.
The TD algorithm is commonly used to separate moving objects from the background, and it also captures the intensity variation caused by heartbeats in sPPG videos. It works much better than VAR, SER, and GRAD and slightly better than TM. Another advantage is that it can recognize incorrect placement, when the finger does not cover the camera lens. Its drawback is sensitivity to finger movement; nevertheless, this can be avoided if the subject keeps still.
The GRAD algorithm is empirical. It also works well, with an SEE below 3 bpm and an LA below 5 bpm. However, it may mistake the edge of the finger for the optimal ROI when the finger only half covers the lens.
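A short worked example makes this failure mode concrete. Ignoring the band-pass filter and averaging over whole periods, a sinusoidal pulse of amplitude $A$ and a square-wave artifact of amplitude $B$ have variances

$$\operatorname{Var}\left[A\sin(2\pi f t)\right] = \frac{A^{2}}{2}, \qquad \operatorname{Var}\left[B\,\operatorname{sgn}\left(\sin 2\pi f' t\right)\right] = B^{2},$$

so the VAR rule prefers a block dominated by the artifact whenever $B > A/\sqrt{2}$; for instance, an artifact of amplitude 0.8 outscores a genuine pulse of amplitude 1.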
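A simple worked case illustrates the effect. Consider a noise-free pulse at 120 bpm (2 Hz) whose second harmonic at 4 Hz has relative amplitude $a$. Since each sinusoid contributes energy proportional to its squared amplitude,

$$x(t) = \sin(2\pi \cdot 2t) + a\sin(2\pi \cdot 4t) \;\Rightarrow\; \mathrm{SER} = \frac{1}{1 + a^{2}},$$

so $a = 0.5$ gives $\mathrm{SER} = 0.8$ even though the block contains no noise at all, whereas at 60 bpm the same harmonic (2 Hz) would fall inside the band and the SER would be 1.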
To summarize, the TM and the TD are better than the other three algorithms. The TD algorithm is slightly better than the TM and more suitable for smartphone applications.

Spatial Resolution and Temporal Resolution.
The spatial resolution has a negligible impact on the sPPG waveform, since it describes the camera's ability to resolve fine details, which are irrelevant to the sPPG technique, and the video frames are effectively blurred by averaging all the pixels in the ROI. Indeed, compared with the sPPG, the tPPG can be regarded as a camera with only one pixel.
In contrast, the temporal resolution has a significant impact on the sPPG waveform. According to the Nyquist sampling theorem, the sampling rate must be higher than twice the highest signal frequency. Therefore, to detect heart rates up to the normal maximum of 3 Hz (180 bpm), the frame rate of the camera should be greater than 6 fps, a requirement that most commercial smartphones meet. However, if details of the waveform such as peaks and dicrotic notches are also needed, the frame rate should be greater than 40 fps to reconstruct the complete pulse wave after digitization, as the maximum frequency of the pulse wave is less than 20 Hz [30].
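The Nyquist argument can be checked with a small aliasing calculation. `apparent_frequency` is a hypothetical helper that folds a single pure tone into the range [0, fs / 2]; it assumes ideal sampling and ignores the harmonics of a real pulse waveform.

```python
def apparent_frequency(f, fs):
    """Frequency (Hz) at which a component at f Hz appears after sampling at fs Hz.

    Components above fs / 2 fold back (alias) into the range [0, fs / 2].
    """
    return abs(f - round(f / fs) * fs)


# A 150 bpm (2.5 Hz) pulse sampled at several frame rates.
for fs in (4, 8, 30):
    print(f"{fs} fps -> {apparent_frequency(2.5, fs) * 60} bpm")
# 4 fps -> 90.0 bpm   (aliased: 4 fps is below 2 x 2.5 Hz)
# 8 fps -> 150.0 bpm
# 30 fps -> 150.0 bpm
```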

Shape and Size of the ROI.
Theoretically, the ROI should be round, because light propagates isotropically. In practice, a rectangular shape is more convenient for computer processing.
The size of the ROI is a trade-off between computational load and noise immunity. If the ROI is large, the waveform generated by averaging it is less sensitive to noise, but more computation time is needed. If it is small, computation time is reduced but the generated waveform contains more noise. Nam et al. suggested an ROI size of 50 × 50 pixels and found that a larger ROI did not provide better signal quality [18]. As a rule of thumb, we consider sizes between 20 × 20 and 100 × 100 pixels workable.
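The trade-off can be quantified under an idealized assumption of independent, zero-mean pixel noise with standard deviation $\sigma$ (real camera noise may be spatially correlated, which weakens the benefit). Averaging a $W \times H$ ROI then gives

$$\sigma_{\mathrm{ROI}} = \frac{\sigma}{\sqrt{WH}},$$

while the per-frame cost grows in proportion to $WH$. Going from 20 × 20 to 100 × 100 pixels thus multiplies the work by 25 but lowers the noise only by a further factor of 5 (from $\sigma/20$ to $\sigma/100$).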

Conclusion
The sPPG technique is easy to use and low in cost. It has great potential for remote medicine and home healthcare services, especially in rural districts and developing countries. However, determining the optimal ROI is an important practical problem in processing sPPG videos, and in the present study we proposed five algorithms to address it. The results showed that the TM and TD algorithms were better than the other three, as they had a smaller SEE and smaller LA. The TD algorithm was slightly better than the TM algorithm and more suitable for smartphone applications. Therefore, the TD algorithm can be used in smartphones to improve the practicability of the sPPG technique, and it may also help improve the accuracy of physiological parameter measurements.

Figure 1: The template of a typical photoplethysmographic signal.

Figure 2: A video frame illustrating the positions of the optimal regions of interest (ROI) selected by the five algorithms. A, B, C, and D indicate the optimal ROIs selected by the VAR, SER, TM, and TD algorithms, respectively. Note that C is also the optimal ROI selected by the GRAD algorithm.

Figure 3: Heart rate estimation from the smartphone photoplethysmographic signal. (a) The filtered average intensity of the optimal region of interest selected by the VAR algorithm. (b) The frequency spectrum of (a).

Table 1: Comparison between heart rates measured by the electrocardiogram and those measured by the smartphone using the six algorithms for selection of region of interest. Note that two subjects were excluded for the FCR algorithm; namely, the number of subjects was 48 for the FCR algorithm and 50 for the other algorithms. Correlation is expressed as the Pearson correlation coefficient; SEE: standard error of estimate; LA: limits of agreement = 1.96 × standard deviation of the differences; VAR: variance; SER: spectral energy ratio; TM: template matching; TD: temporal difference; GRAD: gradient; FCR: fixed central region; bpm: beats per minute.