I see it in your eyes: Training the shallowest-possible CNN to recognise emotions and pain from muted web-assisted in-the-wild video-chats in real-time

Authors:

Highlights:

• We propose the shallowest-possible, and perhaps the shallowest-ever, convolutional neural network model that can predict emotions in real time from real-life, noisy, laggy, internet-based (in-the-wild) videos, capturing the nuances of emotion, i.e. value- and time-continuous affect prediction (a minimal sketch of such an architecture appears after this list). The research presented in this paper is directly relevant to healthcare, for applications such as real-time patient monitoring and AI-assisted doctor-patient consultations.

• The proposed models are computationally inexpensive and can be embedded into devices such as smartglasses.

• We use a novel feature selection paradigm driven by feature attribution score computations (see the attribution sketch after this list).

• We investigate and reason about the model's performance, presenting computations of how exactly it utilises the input features to make affect-related predictions (Explainable AI).

• We compute the relevance and utilisation of facial action unit (FAU)-derived features by the model, comparing them against human perception of emotion expression.

• We extend this FAU-based 'affect' prediction approach to the FAU-based 'pain-intensity' prediction problem.

• As FAUs can be extracted in near real time, and because the models we developed are exceptionally shallow, this study paves the way for robust, cross-cultural, end-to-end, in-the-wild, real-time affect and pain prediction that is also nuanced, i.e. value- and time-continuous.
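
As a concrete illustration of the two method-related highlights above, the following is a minimal, hypothetical PyTorch sketch of (1) what a "shallowest-possible" CNN for value- and time-continuous affect prediction over FAU features could look like, and (2) how attribution scores could drive feature selection. All settings here are assumptions for illustration (17 FAU features such as those produced by tools like OpenFace, a single 5-wide temporal convolution, valence/arousal targets, gradient-times-input attribution, a top-8 cut-off), not the authors' exact configuration.

import torch
import torch.nn as nn

# --- Sketch 1: a minimal temporal CNN over FAU features (assumed sizes) ---
N_FAUS = 17        # FAU intensities per frame, e.g. as extracted by OpenFace
N_TARGETS = 2      # valence and arousal, predicted for every frame

class ShallowAffectCNN(nn.Module):
    def __init__(self, n_features=N_FAUS, n_targets=N_TARGETS, kernel=5):
        super().__init__()
        # A single temporal convolution maps the FAU time series straight to
        # continuous affect values; padding preserves the frame count, so the
        # predictions stay time-aligned with the input video.
        self.conv = nn.Conv1d(n_features, n_targets,
                              kernel_size=kernel, padding=kernel // 2)

    def forward(self, x):
        # x: (batch, n_features, n_frames) -> (batch, n_targets, n_frames)
        return torch.tanh(self.conv(x))   # tanh bounds outputs to [-1, 1]

model = ShallowAffectCNN()
frames = torch.randn(1, N_FAUS, 300)     # ~10 s of video at 30 fps
valence_arousal = model(frames)          # frame-wise continuous predictions
print(valence_arousal.shape)             # torch.Size([1, 2, 300])

# --- Sketch 2: attribution-driven feature selection (assumed scoring rule) ---
def attribution_scores(model, x):
    # Gradient-times-input attribution, averaged over batch, frames, and
    # targets: a crude proxy for how strongly each feature drives the output.
    x = x.clone().requires_grad_(True)
    model(x).sum().backward()
    return (x.grad * x).abs().mean(dim=(0, 2))   # one score per FAU feature

scores = attribution_scores(model, frames)
top_k = scores.topk(8).indices     # keep the 8 most-attributed FAUs (assumed k)
pruned = frames[:, top_k, :]       # retrain a new shallow model on these only

The one-layer model and the gradient-times-input rule are stand-ins; the paper's actual depth, feature count, and attribution method may differ.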

Keywords: Affect recognition, Healthcare, Real-time, Explainable, Feature selection, In-the-wild

Article history: Received 18 November 2019, Revised 19 June 2020, Accepted 20 June 2020, Available online 16 July 2020, Version of Record 16 July 2020.

DOI: https://doi.org/10.1016/j.ipm.2020.102347