AMAM: An Attention-based Multimodal Alignment Model for Medical Visual Question Answering
Authors:
Highlights:
• An attention-based multimodal alignment model is proposed for medical VQA.
• Attention focuses on the question by using visual and textual content simultaneously.
• A composite loss aligns text-based and image-based attention to locate question keywords (see the sketch after this list).
• We construct an enhanced dataset based on the VQA-RAD dataset to improve data quality.
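The composite loss named in the third highlight can be illustrated with a minimal sketch. Assuming the alignment term is a symmetric KL divergence between the two attention distributions over question tokens (the paper's exact formulation is not given here, and the names composite_loss and align_weight are hypothetical):

```python
# Hypothetical sketch: composite loss = answer-classification loss
# plus an alignment term pulling the text-based and image-based
# attention distributions over question tokens toward each other.
# (Assumed formulation; not the paper's exact loss.)
import torch
import torch.nn.functional as F

def composite_loss(logits, labels, text_attn, image_attn, align_weight=0.5):
    """
    logits:     (B, num_answers) answer scores
    labels:     (B,) ground-truth answer indices
    text_attn:  (B, T) attention over question tokens from the text branch
    image_attn: (B, T) attention over question tokens guided by the image
    """
    task_loss = F.cross_entropy(logits, labels)
    # Symmetric KL divergence: both branches are encouraged to focus
    # on the same question keywords.
    log_t = torch.log(text_attn.clamp_min(1e-8))
    log_i = torch.log(image_attn.clamp_min(1e-8))
    align = (F.kl_div(log_t, image_attn, reduction="batchmean")
             + F.kl_div(log_i, text_attn, reduction="batchmean"))
    return task_loss + align_weight * align

# Toy usage with random tensors
B, T, A = 4, 12, 100
loss = composite_loss(
    torch.randn(B, A),
    torch.randint(0, A, (B,)),
    torch.softmax(torch.randn(B, T), dim=-1),
    torch.softmax(torch.randn(B, T), dim=-1),
)
print(loss.item())
```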
Abstract:
Keywords: Attention mechanism, Deep learning, Medical Visual Question Answering, Multimodal fusion, Medical images
Article history: Received 24 December 2021; Revised 18 August 2022; Accepted 18 August 2022; Available online 27 August 2022; Version of Record 5 September 2022.
DOI: https://doi.org/10.1016/j.knosys.2022.109763