ViXNet: Vision Transformer with Xception Network for deepfakes based video and image forgery detection

Authors:

Highlights:

• Proposes a deep-learning-based model for deepfake image and video detection.

• The model includes a patch-wise self-attention module that learns local image artifacts.

• It also includes a vision transformer that learns correlations among masked patches.

• Xception-based global image features are stacked with patch-based local features.

• The model achieves good results on some standard video forgery detection datasets.
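
The two-branch idea in the highlights (patch-level local features stacked with global image features) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the function names are hypothetical, the attention uses identity projections instead of learned weights, patch masking is omitted, and `global_features` is a trivial stand-in (mean and standard deviation) for an Xception backbone.

```python
import math


def split_into_patches(image, patch):
    """Split an H x W image (list of rows) into flattened patch x patch blocks."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h - patch + 1, patch):
        for left in range(0, w - patch + 1, patch):
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys and values are the tokens themselves
    (identity projections); a trained model would learn these projections.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def mean_pool(tokens):
    """Average a list of equal-length vectors into one vector."""
    n = len(tokens)
    return [sum(t[i] for t in tokens) / n for i in range(len(tokens[0]))]


def global_features(image):
    """Placeholder for a CNN backbone such as Xception: returns [mean, std]."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return [mean, math.sqrt(var)]


def vixnet_like_features(image, patch):
    """Concatenate pooled patch-attention features with global features."""
    local = mean_pool(self_attention(split_into_patches(image, patch)))
    return local + global_features(image)  # list concatenation = feature stacking
```

For a 4 x 4 image with 2 x 2 patches, the local branch yields a 4-dimensional vector and the placeholder global branch a 2-dimensional one, so the stacked feature has 6 entries; a classifier head would then consume this vector.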

Keywords: Deepfakes, FaceSwap, Soft attention, Vision transformer, Forgery detection, Xception model

Review history: Received 27 January 2022, Revised 5 July 2022, Accepted 3 August 2022, Available online 8 August 2022, Version of Record 16 August 2022.

Official paper URL: https://doi.org/10.1016/j.eswa.2022.118423