aiTPR: Attribute Interaction-Tensor Product Representation for Image Caption

Author: Chiranjib Sur

Abstract

Regional visual features enhance the generative capability of feature-based captioning models. However, they lack proper interaction-based attentional perception and end up producing biased or uncorrelated sentences or pieces of misinformation. In this work, we propose Attribute Interaction-Tensor Product Representation (aiTPR), a convenient way of gathering more information through orthogonal combination, learning the interactions as physical entities (tensors), and thereby improving the captions. In previous works, features add up to undefined feature spaces; by contrast, TPR helps keep the combinations well-defined, and orthogonality helps define familiar spaces. We introduce a new concept layer that defines the objects and their interactions, which can play a crucial role in determining different descriptions. The interaction components contribute heavily to better caption quality, and the approach outperforms various previous works in this domain and on the MSCOCO dataset. For the first time, we introduce the notion of combining regional image features with an abstracted interaction likelihood embedding for image captioning.
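
As a rough illustration of the orthogonal combination the abstract refers to, the following is a minimal sketch of generic tensor product binding and unbinding. It is not the aiTPR architecture: the helper names (`outer`, `add`, `bind`, `unbind`) and the toy filler and role vectors are assumptions made for this example only.

```python
# Generic tensor product representation (TPR) sketch, NOT the paper's model.
# Fillers (e.g. attribute embeddings) are bound to orthonormal role vectors
# by outer products, summed into one tensor, and recovered by contraction.

def outer(filler, role):
    """Outer product of filler and role as a len(filler) x len(role) matrix."""
    return [[f * r for r in role] for f in filler]

def add(a, b):
    """Element-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def bind(fillers, roles):
    """Superpose all bindings: T = sum_i outer(f_i, r_i)."""
    dim_f, dim_r = len(fillers[0]), len(roles[0])
    tensor = [[0.0] * dim_r for _ in range(dim_f)]
    for f, r in zip(fillers, roles):
        tensor = add(tensor, outer(f, r))
    return tensor

def unbind(tensor, role):
    """Contract T with a role vector; exact when the roles are orthonormal."""
    return [sum(t * r for t, r in zip(row, role)) for row in tensor]

# Hypothetical toy values: two 3-d fillers and two orthonormal 2-d roles.
fillers = [[1.0, 2.0, 3.0], [-1.0, 0.5, 4.0]]
roles = [[1.0, 0.0], [0.0, 1.0]]

T = bind(fillers, roles)
recovered = [unbind(T, r) for r in roles]
```

Because the role vectors are orthonormal, contracting `T` with a role returns the filler bound to it without interference from the other binding, which is the sense in which orthogonality keeps the combined space interpretable. With non-orthogonal roles the recovered vectors would mix contributions from several fillers.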

Keywords: Language modeling, Representation learning, Tensor product representation, Image description, Sequence generation, Image understanding, Automated textual feature extraction

Paper URL: https://doi.org/10.1007/s11063-021-10438-5