AMAM: An Attention-based Multimodal Alignment Model for Medical Visual Question Answering
Authors:
Highlights:
• An attention-based multimodal alignment model is proposed for medical VQA.
• Attention focuses on the question by using visual and textual content simultaneously.
• A composite loss aligns text-based and image-based attention to locate question keywords (see the sketch after this list).
• We construct an enhanced dataset based on the VQA-RAD dataset to improve data quality.
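The composite loss named in the third highlight can be illustrated with a minimal sketch. Assuming the alignment term is a symmetric KL divergence between the two attention distributions over question tokens (the paper's exact formulation is not given here, and the names composite_loss and align_weight are hypothetical):

```python
# Hypothetical sketch: composite loss = answer-classification loss
# plus an alignment term pulling the text-based and image-based
# attention distributions over question tokens toward each other.
# (Assumed formulation; not the paper's exact loss.)
import torch
import torch.nn.functional as F

def composite_loss(logits, labels, text_attn, image_attn, align_weight=0.5):
    """
    logits:     (B, num_answers) answer scores
    labels:     (B,) ground-truth answer indices
    text_attn:  (B, T) attention over question tokens from the text branch
    image_attn: (B, T) attention over question tokens guided by the image
    """
    task_loss = F.cross_entropy(logits, labels)
    # Symmetric KL divergence: both branches are encouraged to focus
    # on the same question keywords.
    log_t = torch.log(text_attn.clamp_min(1e-8))
    log_i = torch.log(image_attn.clamp_min(1e-8))
    align = (F.kl_div(log_t, image_attn, reduction="batchmean")
             + F.kl_div(log_i, text_attn, reduction="batchmean"))
    return task_loss + align_weight * align

# Toy usage with random tensors
B, T, A = 4, 12, 100
loss = composite_loss(
    torch.randn(B, A),
    torch.randint(0, A, (B,)),
    torch.softmax(torch.randn(B, T), dim=-1),
    torch.softmax(torch.randn(B, T), dim=-1),
)
print(loss.item())
```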
Abstract:
Keywords: Attention mechanism, Deep learning, Medical Visual Question Answering, Multimodal fusion, Medical images
Article history: Received 24 December 2021; Revised 18 August 2022; Accepted 18 August 2022; Available online 27 August 2022; Version of Record 5 September 2022.
DOI: https://doi.org/10.1016/j.knosys.2022.109763