Improving scene attribute recognition using web-scale object detectors
Authors:
Abstract
Semantic attributes enable a richer description of scenes than basic category labels. While traditionally scenes have been analyzed using global image features such as Gist, recent studies suggest that humans often describe scenes in ways that are naturally characterized by local image evidence. For example, humans often describe scenes by their functions or affordances, which are largely suggested by the objects in the scene. In this paper, we leverage a large collection of modern object detectors trained at the web scale to derive effective high-level features for scene attribute recognition. We conduct experiments using two modern object detection frameworks: a semi-supervised learner that continuously learns object models from web images, and a state-of-the-art deep network. The detector response features improve the state of the art on the standard scene attribute benchmark by 5% average precision, and also capture intuitive object-scene relationships, such as the positive correlation of castles with “vacationing/touring” scenes.
Keywords:
Review history: Received 16 July 2014, Revised 28 January 2015, Accepted 15 May 2015, Available online 10 June 2015, Version of Record 10 July 2015.
Official paper URL: https://doi.org/10.1016/j.cviu.2015.05.012