Physical Representation Learning and Parameter Identification from Video Using Differentiable Physics

Authors: Rama Krishna Kandukuri, Jan Achterhold, Michael Moeller, Joerg Stueckler

Abstract

Representation learning for video is gaining increasing attention in computer vision. For instance, video prediction models enable activity and scene forecasting as well as vision-based planning and control. In this article, we investigate the combination of differentiable physics and spatial transformers in a deep action-conditional video representation network. Through this combination, our model learns a physically interpretable latent representation and can identify physical parameters. We propose supervised and self-supervised learning methods for our architecture. In experiments, we consider simulated scenarios with pushing, sliding, and colliding objects, for which we also analyze the observability of the physical properties. We demonstrate that our network can learn to encode images and to identify physical properties such as mass and friction from videos and action sequences. We evaluate the accuracy of our training methods and demonstrate the ability of our method to predict future video frames from input images and actions.
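
As a rough illustration of what identifying a physical parameter through differentiable physics can mean, the following minimal sketch recovers a friction coefficient from an observed sliding trajectory by gradient descent. It is not the architecture described in the abstract: there are no images, no encoder, and no spatial transformer, and the gradient is derived by hand rather than obtained from a differentiable physics engine. The 1D pushed-block setup, the function names `simulate` and `identify_friction`, and all numeric values are assumptions made for this example only.

```python
# Hypothetical 1D example: a block of known mass pushed by a known constant force.
# Only the friction coefficient mu is unknown and is estimated from positions.
G = 9.81  # gravitational acceleration in m/s^2


def simulate(mu, force=5.0, mass=1.0, dt=0.05, steps=40):
    """Roll out block positions and their sensitivities d(position)/d(mu).

    Assumes the block keeps sliding in the +x direction, so Coulomb
    friction always acts in the -x direction.
    """
    x, v = 0.0, 0.0
    dx_dmu, dv_dmu = 0.0, 0.0
    positions, sensitivities = [], []
    for _ in range(steps):
        # Semi-implicit Euler step: update velocity, then position.
        v += dt * (force / mass - mu * G)
        x += dt * v
        # Differentiate the same two updates with respect to mu.
        dv_dmu += -dt * G
        dx_dmu += dt * dv_dmu
        positions.append(x)
        sensitivities.append(dx_dmu)
    return positions, sensitivities


def identify_friction(observed, mu_init=0.1, lr=0.005, iters=200):
    """Estimate mu by gradient descent on the mean squared position error."""
    mu = mu_init
    for _ in range(iters):
        positions, sens = simulate(mu)
        # Chain rule: d(loss)/d(mu) = mean(2 * residual * d(position)/d(mu)).
        grad = sum(
            2.0 * (p - o) * s for p, o, s in zip(positions, observed, sens)
        ) / len(observed)
        mu -= lr * grad
    return mu


if __name__ == "__main__":
    # Synthetic "observation" generated with an assumed true mu of 0.3.
    observed, _ = simulate(0.3)
    print(identify_friction(observed))
```

In this toy setting the position is linear in the friction coefficient, so the loss is quadratic and a small fixed step size suffices. Note also that the mass is fixed here: with a known force and only positions observed, mass and friction enter the acceleration together, which hints at why the observability of physical properties is worth analyzing.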

Keywords: Physical scene understanding, Video representation learning, Differentiable physics

Paper URL: https://doi.org/10.1007/s11263-021-01493-5