Learning language to symbol and language to vision mapping for visual grounding

Authors:

Highlights:

• We propose a novel method to extract symbolic features for cross-modality mapping.

• We propose a residual attention language parser to process variable expressions.

• We achieve competitive performance on the RefCOCO, RefCOCO+, and RefCOCOg datasets.

Abstract:


Keywords: Cross modality, Visual grounding, Neural symbolic reasoning

Article history: Received 19 December 2021, Revised 6 April 2022, Accepted 11 April 2022, Available online 14 April 2022, Version of Record 23 April 2022.

DOI: https://doi.org/10.1016/j.imavis.2022.104451