Structural fragmentation in scene graphs
Authors:
Abstract
Despite continuous performance improvements, contemporary Scene Graph (SG) systems tend to generate ‘fragmented’ graphs. A central problem is that standard metrics measure similarity to ground truth graphs only at the triplet level and may not fully capture image relevance or semantic correctness. In particular, multiple triplet predictions are usually made for the same ground truth regions, which amounts to a trivial way of improving the standard evaluation metric, i.e. recall. The central purpose of our work is to reveal this inherent drawback of current SG evaluation methods and the redundancy issue that results from it. We investigate different types of graph artifacts in SGs generated by existing models and propose two graph quality metrics to evaluate the level of fragmentation. A detailed analysis shows how SG model architectures contribute to graph fragmentation. We study these problems in the context of graph semantic quality assessment. A qualitative human study is conducted to evaluate the consistency between the proposed metrics and human perception. To further establish the validity of this new source of error, we present a simple but effective method that targets graph fragmentation. Systematic experiments are conducted on the standard Visual Genome (VG) dataset and the Visual Relationship Detection (VRD) dataset. Experimental results show that our proposed system significantly improves scene graph quality in terms of the new metrics as well as the traditional Top-N recall values.
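The recall issue described in the abstract can be illustrated with a toy sketch. This is not the authors' code and not either of the two metrics they propose: `recall_at_k` is a simplified triplet-level recall that matches objects by label rather than by image region, and `duplicate_pair_ratio` is a hypothetical helper introduced here only to make the redundancy visible. All data below is made up for illustration.

```python
def recall_at_k(predictions, ground_truth, k):
    """Fraction of ground-truth triplets matched by the top-k predictions.

    predictions: list of (score, (subject, predicate, object)) tuples.
    ground_truth: set of (subject, predicate, object) triplets.
    Simplifying assumption: objects are matched by label, not by region overlap.
    """
    top_k = sorted(predictions, key=lambda p: p[0], reverse=True)[:k]
    hits = {triplet for _, triplet in top_k if triplet in ground_truth}
    return len(hits) / len(ground_truth)


def duplicate_pair_ratio(predictions, k):
    """Hypothetical helper: share of the top-k predictions that reuse an
    already-seen (subject, object) pair."""
    top_k = sorted(predictions, key=lambda p: p[0], reverse=True)[:k]
    seen, repeats = set(), 0
    for _, (subj, _pred, obj) in top_k:
        if (subj, obj) in seen:
            repeats += 1
        seen.add((subj, obj))
    return repeats / len(top_k) if top_k else 0.0


ground_truth = {("man", "riding", "horse"), ("horse", "on", "grass")}

# One predicate guess per subject-object pair.
single = [
    (0.9, ("man", "on", "horse")),
    (0.8, ("horse", "on", "grass")),
]

# Several predicate guesses for the same (man, horse) pair.
redundant = [
    (0.9, ("man", "on", "horse")),
    (0.8, ("horse", "on", "grass")),
    (0.7, ("man", "riding", "horse")),
    (0.6, ("man", "sitting on", "horse")),
]

print(recall_at_k(single, ground_truth, 4), duplicate_pair_ratio(single, 4))
print(recall_at_k(redundant, ground_truth, 4), duplicate_pair_ratio(redundant, 4))
```

In this toy case the redundant list reaches higher recall only by stacking alternative predicates on one subject-object pair, so half of its top-k entries describe a pair that has already been covered.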
Keywords: Scene graph, Graph coherence, Clustering, Semantic quality, Human study
Review history: Received 20 April 2020, Revised 4 October 2020, Accepted 6 October 2020, Available online 22 October 2020, Version of Record 27 October 2020.
Paper URL: https://doi.org/10.1016/j.knosys.2020.106504