Accuracy vs. complexity: A trade-off in visual question answering models

Authors:

Highlights:

• Systematic investigation of Accuracy vs. Complexity trade-off for VQA Models.

• Often additional complexity does not guarantee higher VQA accuracy.

• SeNet features are more generalizable than ResNet features.

• Superior bilinear fusion with visual attention results in higher VQA accuracy.

Keywords: Visual question answering, Visual feature extraction, Language features, Multi-modal fusion, Speed-accuracy trade-off

Review history: Received 8 January 2020, Revised 17 May 2021, Accepted 5 June 2021, Available online 12 June 2021, Version of Record 1 July 2021.

Official paper URL: https://doi.org/10.1016/j.patcog.2021.108106