Computational models for integrating linguistic and visual information: A survey

Author: Rohini K. Srihari

Abstract

This paper surveys research on developing computational models for integrating linguistic and visual information. It begins with a discussion of systems that have actually been implemented and continues with computationally motivated theories of human cognition. Since existing research spans several disciplines (e.g., natural language understanding, computer vision, knowledge representation), as well as several application areas, an important contribution of this paper is to categorize existing research based on inputs and objectives. Finally, some key issues related to integrating information from two such diverse sources are outlined and related to existing research. Throughout, the key issue addressed is the correspondence problem, namely how to associate visual events with words and vice versa.

Keywords: natural language understanding, computer vision, diagram understanding, spatial reasoning, multimedia

Paper URL: https://doi.org/10.1007/BF00849725