Speech-driven facial animation with spectral gathering and temporal attention

Authors: Yujin Chai, Yanlin Weng, Lvdi Wang, Kun Zhou

Abstract

In this paper, we present an efficient algorithm that generates lip-synchronized facial animation from a given vocal audio clip. By combining a spectral-dimensional bidirectional long short-term memory (LSTM) with a temporal attention mechanism, we design a lightweight speech encoder that learns useful and robust vocal features from the input audio without resorting to pre-trained speech recognition modules or large training data. To learn subject-independent facial motion, we use deformation gradients as the internal representation, which allows nuanced local motions to be synthesized better than with vertex offsets. Compared with state-of-the-art methods based on automatic speech recognition, our model is much smaller yet achieves similar robustness and quality in most cases, and noticeably better results in certain challenging ones.
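The temporal attention mentioned in the abstract can be illustrated with a minimal sketch: per-frame audio features are pooled over time with softmax weights produced by a scoring vector. This is not the paper's actual implementation; the function name, feature dimensions, and the scoring vector `w` are all hypothetical, and a real encoder would learn `w` jointly with the BiLSTM.

```python
import numpy as np

def temporal_attention_pool(frames, w):
    """Attention pooling over time (hypothetical sketch, not the paper's code).

    frames: (T, D) array of per-frame audio features.
    w:      (D,) scoring vector (would be learned in practice).
    Returns a (D,) context vector: a softmax-weighted sum of the frames.
    """
    scores = frames @ w                     # (T,) one scalar score per frame
    scores = scores - scores.max()          # subtract max for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()       # softmax over the time axis
    return weights @ frames                 # weighted sum -> (D,) context vector

# Toy usage: 50 frames of 16-dimensional features.
rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 16))
w = rng.standard_normal(16)
ctx = temporal_attention_pool(frames, w)
print(ctx.shape)  # (16,)
```

Because the weights sum to one, the pooled vector stays on the same scale as the per-frame features regardless of clip length, which is one reason attention pooling is preferred over plain summation for variable-length audio.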

Keywords: speech-driven facial animation, spectral-dimensional bidirectional long short-term memory, temporal attention, deformation gradients

Paper link: https://doi.org/10.1007/s11704-020-0133-7