A novel deep pixel restoration video prediction algorithm integrating attention mechanism

Authors: Muxuan Yuan, Qun Dai

Abstract

Rapid progress in deep learning has produced many strong models for video frame prediction in recent years. Most of them generate the predicted target frames directly, but frames obtained this way are often blurry and unrealistic. To address this problem, this paper first integrates the attention mechanism with Convolutional Long Short-Term Memory and proposes a new deep learning model, abbreviated as AttConvLSTM. A distinguishing feature of this model is that each of its layers computes attention weights over the information it receives, so that it focuses on the key parts of that information. Although AttConvLSTM effectively improves prediction accuracy, it still does not solve the problem that frames generated directly by classical deep learning models are often blurry and unrealistic. Therefore, inspired by the concept of optical flow, this work further develops a novel Deep Pixel Restoration AttConvLSTM (DPRAConvLSTM) model. This model exploits both the input frames and the end-to-end nature of deep learning: it restores the pixels of the input frames to the predicted frames, thereby avoiding the defects that typical deep learning models tend to introduce when generating predicted frames directly. Experimental results confirm that the resulting DPRAConvLSTM model not only improves prediction accuracy but also yields clearer and more realistic predicted frames.
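
The idea of restoring input pixels to the predicted frame, rather than synthesizing the frame from scratch, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `restore_pixels`, the flow layout `(dy, dx)`, and the nearest-neighbour sampling are all assumptions made for illustration, and in the actual model a network would have to learn the displacement field end to end.

```python
import numpy as np

def restore_pixels(frame, flow):
    """Build a predicted frame by copying pixels from an input frame.

    Hypothetical helper, not taken from the paper.
    frame: (H, W) array, e.g. the last observed frame.
    flow:  (H, W, 2) array; flow[y, x] = (dy, dx) means output pixel (y, x)
           is copied from input position (y + dy, x + dx).
    """
    h, w = frame.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Round to the nearest source pixel and clamp to the image border.
    src_y = np.clip(np.rint(ys + flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + flow[..., 1]).astype(int), 0, w - 1)
    # Every output value is an existing input pixel, so no new
    # intensities are invented by this step.
    return frame[src_y, src_x]

# Toy example: one bright pixel, and a flow field (assumed, not learned)
# in which every output pixel looks one column to its left.
frame = np.zeros((4, 4))
frame[1, 1] = 1.0
flow = np.zeros((4, 4, 2))
flow[..., 1] = -1.0
pred = restore_pixels(frame, flow)
```

In this toy case the bright pixel moves one column to the right in `pred`. Because each output value is copied from the input, the result stays as sharp as the source frame, which is the motivation the abstract gives for pixel restoration.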

Keywords: Video prediction, Deep learning, Attention mechanism, Pixel restoration

Official paper URL: https://doi.org/10.1007/s10489-021-02631-9