Unifying frame rate and temporal dilations for improved remote pulse detection

Abstract

Remote photoplethysmography (rPPG) is the monitoring of blood volume pulse from a camera at a distance. Three-dimensional convolutional neural networks (3DCNNs) have shown promising performance on the rPPG task, but it is critical to understand how both video and model parameters affect that performance. In this paper, we explore the effect of frame rate, temporal kernel width, and, more generally, temporal receptive field on the reliability of heart rate and waveform estimation by 3DCNNs. We train and evaluate thirty-two 3DCNNs with different temporal parameters on a new large-scale database for physiological monitoring in an interview scenario. We show that previous studies reporting null effects of frame rate changes on pulse estimators may no longer hold when using CNNs, and that decreasing the frame rate may actually improve performance. In particular, we found that models trained on videos with frame rates as low as 12.9 frames per second (fps) perform better than those trained on videos recorded at the full 90 fps, perhaps because the temporal receptive field spans more time as the frame rate decreases. Using this insight, we propose RemotePulseNet, a novel 3DCNN architecture that uses temporally dilated convolutions with increasing dilation rates to drastically enlarge the receptive field. We compare its performance with that of recent state-of-the-art pulse estimation methods and show that both RemotePulseNet and the low-frame-rate 3DCNNs produce high-quality pulse signals from faces captured in a challenging interview scenario. The source code and instructions for obtaining a copy of the test data are made available with this paper.
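To make the link between frame rate, kernel width, and dilation concrete: for a stack of stride-1 temporal convolutions, each layer with temporal kernel width k and dilation d adds (k - 1) * d frames to the receptive field, and dividing that frame count by the frame rate gives the span of video in seconds. This is a minimal generic sketch, not RemotePulseNet itself; the helper names, the four-layer stack, the kernel width of 3, and the dilation schedule 1, 2, 4, 8 are hypothetical values chosen only for illustration.

```python
from typing import Sequence


def temporal_receptive_field(kernel_widths: Sequence[int],
                             dilations: Sequence[int]) -> int:
    """Receptive field, in frames, of stacked stride-1 temporal convolutions.

    Hypothetical helper: each layer with kernel width k and dilation d
    widens the receptive field by (k - 1) * d frames.
    """
    if len(kernel_widths) != len(dilations):
        raise ValueError("need exactly one dilation per layer")
    rf = 1  # a single output frame depends on at least one input frame
    for k, d in zip(kernel_widths, dilations):
        rf += (k - 1) * d
    return rf


def receptive_field_seconds(rf_frames: int, fps: float) -> float:
    """Convert a receptive field measured in frames into seconds of video."""
    return rf_frames / fps


# Illustrative (hypothetical) settings: four layers, temporal kernel width 3.
plain = temporal_receptive_field([3, 3, 3, 3], [1, 1, 1, 1])    # 9 frames
dilated = temporal_receptive_field([3, 3, 3, 3], [1, 2, 4, 8])  # 31 frames

for fps in (90.0, 12.9):
    print(f"{fps:5.1f} fps: plain {receptive_field_seconds(plain, fps):.3f} s, "
          f"dilated {receptive_field_seconds(dilated, fps):.3f} s")
```

With these illustrative settings, the undilated stack spans only 0.1 s at 90 fps, well under one cardiac cycle at typical resting heart rates, while the same stack spans roughly 0.7 s at 12.9 fps. Doubling the dilation at each layer makes the receptive field grow exponentially with depth rather than linearly, which widens the temporal context without discarding frames.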

Article history: Received 31 January 2021, Revised 16 May 2021, Accepted 7 July 2021, Available online 10 July 2021, Version of Record 17 July 2021.

Paper URL: https://doi.org/10.1016/j.cviu.2021.103246