Learning to transfer focus of graph neural network for scene graph parsing

Authors:

Highlights:

• A new neural network architecture, the graphical focal network, is proposed to improve the recognition rate of semantic relationships in the scene graph parsing task.

• The proposed graphical focal loss shifts the focus of network learning toward high-value semantic relationship types that have only a limited number of instances.

• The proposed relative depth encoding module and regional layout encoding module introduce effective 3D spatial layout information.

• On the two evaluation metrics of the scene graph parsing task, our method achieves new state-of-the-art performance on the Visual Genome dataset.
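
The highlights do not state the formula of the graphical focal loss. As a point of orientation only, the sketch below shows the standard multi-class focal loss, which down-weights examples the model already classifies confidently so that rare, hard classes contribute relatively more to training. It is an illustration of that general idea under assumed names (`focal_loss`, `gamma`, `class_weights` are all hypothetical here) and should not be read as the paper's formulation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, class_weights=None):
    """Standard focal loss for a single example.

    NOTE: this is the generic focal loss, not the paper's graphical
    focal loss, whose exact form is not given in the highlights.

    logits        : raw scores, one per relationship class
    target        : index of the ground-truth class
    gamma         : focusing parameter; gamma = 0 gives plain cross-entropy
    class_weights : optional per-class weights (assumed, e.g. larger for rare classes)
    """
    p_t = softmax(logits)[target]  # predicted probability of the true class
    weight = 1.0 if class_weights is None else class_weights[target]
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy example (true class scored high) and a hard one (true class scored low)
easy = focal_loss([5.0, 0.0, 0.0], target=0)
hard = focal_loss([0.0, 5.0, 0.0], target=0)
print(f"easy example loss: {easy:.6f}")
print(f"hard example loss: {hard:.6f}")
```

With `gamma=0` the function reduces to ordinary cross-entropy; increasing `gamma` suppresses the contribution of easy examples more strongly, which is the mechanism usually used to counter class imbalance.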


Keywords: Semantic relationship, Graphical focus, Scene graph, Class imbalance, Image understanding

Review history: Received 29 November 2019, Revised 25 August 2020, Accepted 14 October 2020, Available online 17 October 2020, Version of Record 30 January 2021.

Official paper URL: https://doi.org/10.1016/j.patcog.2020.107707