Accuracy vs. complexity: A trade-off in visual question answering models
Authors:
Highlights:
• A systematic investigation of the accuracy vs. complexity trade-off for VQA models.
• Additional complexity often does not translate into higher VQA accuracy.
• SeNet features are more generalizable than ResNet features.
• Superior bilinear fusion with visual attention results in higher VQA accuracy.
Keywords: Visual question answering, Visual feature extraction, Language features, Multi-modal fusion, Speed-accuracy trade-off
Review history: Received 8 January 2020, Revised 17 May 2021, Accepted 5 June 2021, Available online 12 June 2021, Version of Record 1 July 2021.
Paper URL: https://doi.org/10.1016/j.patcog.2021.108106