Accuracy vs. complexity: A trade-off in visual question answering models

Authors:

Highlights:

• Systematic investigation of the accuracy vs. complexity trade-off for VQA models.

• Additional complexity often does not guarantee higher VQA accuracy.

• SENet features are more generalizable than ResNet features.

• Superior bilinear fusion with visual attention results in higher VQA accuracy.

Keywords: Visual question answering, Visual feature extraction, Language features, Multi-modal fusion, Speed-accuracy trade-off

Review history: Received 8 January 2020, Revised 17 May 2021, Accepted 5 June 2021, Available online 12 June 2021, Version of Record 1 July 2021.

Paper URL: https://doi.org/10.1016/j.patcog.2021.108106