I see it in your eyes: Training the shallowest-possible CNN to recognise emotions and pain from muted web-assisted in-the-wild video-chats in real-time

Authors:

Highlights:

• We propose the shallowest-possible, and perhaps the shallowest-ever, convolutional neural network model that can predict emotions in real time from real-life, noisy, laggy, internet-based (in-the-wild) videos, capturing the nuances of emotion, i.e. value- and time-continuous affect prediction (a minimal sketch of such an architecture appears after this list). The research presented in this paper is directly relevant to healthcare, for applications such as real-time patient monitoring and AI-assisted doctor-patient consultations.

• The proposed models are computationally inexpensive and can be embedded into devices such as smartglasses.

• We use a novel feature selection paradigm driven by feature attribution score computations (see the attribution sketch after this list).

• We investigate and reason about the model's performance, presenting computations of how exactly it utilises the input features to make affect-related predictions (Explainable AI).

• We compute the relevance and utilisation of facial action unit (FAU)-derived features by the model, comparing them against human perception of emotion expression.

• We extend this FAU-based 'affect' prediction approach to the FAU-based 'pain-intensity' prediction problem.

• As FAUs can be extracted in near real time, and because the models we developed are exceptionally shallow, this study paves the way for robust, cross-cultural, end-to-end, in-the-wild, real-time affect and pain prediction that is also nuanced, i.e. value- and time-continuous.
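
As a concrete illustration of the two method-related highlights above, the following is a minimal, hypothetical PyTorch sketch of (1) what a "shallowest-possible" CNN for value- and time-continuous affect prediction over FAU features could look like, and (2) how attribution scores could drive feature selection. All settings here are assumptions for illustration (17 FAU features such as those produced by tools like OpenFace, a single 5-wide temporal convolution, valence/arousal targets, gradient-times-input attribution, a top-8 cut-off), not the authors' exact configuration.

import torch
import torch.nn as nn

# --- Sketch 1: a minimal temporal CNN over FAU features (assumed sizes) ---
N_FAUS = 17        # FAU intensities per frame, e.g. as extracted by OpenFace
N_TARGETS = 2      # valence and arousal, predicted for every frame

class ShallowAffectCNN(nn.Module):
    def __init__(self, n_features=N_FAUS, n_targets=N_TARGETS, kernel=5):
        super().__init__()
        # A single temporal convolution maps the FAU time series straight to
        # continuous affect values; padding preserves the frame count, so the
        # predictions stay time-aligned with the input video.
        self.conv = nn.Conv1d(n_features, n_targets,
                              kernel_size=kernel, padding=kernel // 2)

    def forward(self, x):
        # x: (batch, n_features, n_frames) -> (batch, n_targets, n_frames)
        return torch.tanh(self.conv(x))   # tanh bounds outputs to [-1, 1]

model = ShallowAffectCNN()
frames = torch.randn(1, N_FAUS, 300)     # ~10 s of video at 30 fps
valence_arousal = model(frames)          # frame-wise continuous predictions
print(valence_arousal.shape)             # torch.Size([1, 2, 300])

# --- Sketch 2: attribution-driven feature selection (assumed scoring rule) ---
def attribution_scores(model, x):
    # Gradient-times-input attribution, averaged over batch, frames, and
    # targets: a crude proxy for how strongly each feature drives the output.
    x = x.clone().requires_grad_(True)
    model(x).sum().backward()
    return (x.grad * x).abs().mean(dim=(0, 2))   # one score per FAU feature

scores = attribution_scores(model, frames)
top_k = scores.topk(8).indices     # keep the 8 most-attributed FAUs (assumed k)
pruned = frames[:, top_k, :]       # retrain a new shallow model on these only

The one-layer model and the gradient-times-input rule are stand-ins; the paper's actual depth, feature count, and attribution method may differ.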

Keywords: Affect recognition, Healthcare, Real-time, Explainable, Feature selection, In-the-wild

Article history: Received 18 November 2019, Revised 19 June 2020, Accepted 20 June 2020, Available online 16 July 2020, Version of Record 16 July 2020.

DOI: https://doi.org/10.1016/j.ipm.2020.102347