Are 3D convolutional networks inherently biased towards appearance?

Authors:

Highlights:

• We confirm the clear connection between activities and their locations in 3D convolutional networks.

• We define new temporality measurements for 3D video networks.

• On the Kinetics dataset, we show appearance bias in the later layers of a network.

• We present two new datasets, in which we explicitly decouple motion and appearance.

• Appearance bias is not inherent to 3D models, but rather to the datasets.

• We test various real-world videosets and identify the one with the least appearance bias.
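
The paper's own temporality measurements are not defined on this page. Purely as an illustration of the general idea of probing whether video features depend on motion or only on appearance, the sketch below uses a frame-shuffling test: compare a feature vector computed on a clip with the features of reordered copies of the same clip. This is a swapped-in, commonly used probe, not the authors' definition; `shuffle_temporality`, `appearance_features`, and `motion_features` are hypothetical names, and the two toy feature extractors stand in for layers of a real 3D network.

```python
import itertools
import math
from typing import Callable, Iterable, List, Optional, Sequence

Frame = List[float]  # one frame, flattened to a list of pixel values
Clip = List[Frame]   # a clip is a list of frames in temporal order


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; two all-zero vectors count as identical."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0 if na == nb else 0.0
    return dot / (na * nb)


def appearance_features(clip: Clip) -> List[float]:
    # Hypothetical "appearance" extractor: per-pixel mean over time.
    # It ignores frame order entirely.
    t = len(clip)
    return [sum(frame[i] for frame in clip) / t for i in range(len(clip[0]))]


def motion_features(clip: Clip) -> List[float]:
    # Hypothetical "motion" extractor: per-pixel sum of absolute
    # differences between consecutive frames, so it depends on frame order.
    return [
        sum(abs(clip[k + 1][i] - clip[k][i]) for k in range(len(clip) - 1))
        for i in range(len(clip[0]))
    ]


def shuffle_temporality(
    feature_fn: Callable[[Clip], List[float]],
    clip: Clip,
    perms: Optional[Iterable[Sequence[int]]] = None,
) -> float:
    """Mean (1 - cosine) between features of the clip and of reordered copies.

    0 means the features ignore frame order; larger values mean more
    order sensitivity. By default every frame ordering (including the
    original one) is tried, which is only feasible for very short clips.
    """
    if perms is None:
        perms = itertools.permutations(range(len(clip)))
    ref = feature_fn(clip)
    scores = [1.0 - cosine(ref, feature_fn([clip[i] for i in p])) for p in perms]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy clip: a bright dot moving one pixel per frame across 4 pixels.
    clip = [[1.0 if i == k else 0.0 for i in range(4)] for k in range(4)]
    print(f"{shuffle_temporality(appearance_features, clip):.3f}")  # → 0.000
    print(f"{shuffle_temporality(motion_features, clip):.3f}")      # → 0.100
```

The appearance-only extractor scores exactly zero because a time average cannot see frame order, while the motion extractor scores above zero. For an actual 3D network one would, under these assumptions, substitute per-layer activations for the toy extractors and pass a small random sample of orderings through `perms` instead of enumerating them all.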


Keywords: 3D models, Temporality measure, Motion analysis, Large-scale videosets

Article history: Received 14 July 2021, Revised 27 March 2022, Accepted 13 April 2022, Available online 27 April 2022, Version of Record 11 May 2022.

Official paper URL: https://doi.org/10.1016/j.cviu.2022.103437