Real-time Lexicon-free Scene Text Retrieval

Authors:

Highlights:

• An improved method achieves state-of-the-art performance in image-based text retrieval.

• Results are presented for different CNN backbones modified to predict a PHOC for each detected text instance.

• The effect of different PHOC dimensions is explored and analyzed.

• The PHOC embedding allows retrieval of out-of-vocabulary words unseen at training time.

• The proposed method achieves state-of-the-art results on a multilingual dataset whose samples were unseen at training time.

• The method is faster than the state of the art, allowing real-time retrieval in videos.
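The highlights do not say how a PHOC (Pyramidal Histogram Of Characters) is built. The following is a minimal sketch, assuming the commonly used construction: the word is split into 2, 3, 4 and 5 regions, and each region records which characters fall mostly inside it. The function name `phoc`, the 36-symbol alphabet, the choice of levels and the omission of bigram levels are all assumptions made for illustration, not details confirmed by the paper.

```python
import string

# Assumed alphabet: lowercase letters plus digits (36 symbols).
ALPHABET = string.ascii_lowercase + string.digits


def phoc(word, levels=(2, 3, 4, 5), alphabet=ALPHABET):
    """Return a binary unigram PHOC vector for `word` (illustrative sketch)."""
    word = word.lower()
    n = len(word)
    vec = []
    for level in levels:
        for region in range(level):
            bits = [0] * len(alphabet)
            for i, ch in enumerate(word):
                if ch not in alphabet:
                    continue  # symbols outside the alphabet are ignored
                # Work in integer units of 1 / (n * level) to avoid float error:
                # character i spans [i*level, (i+1)*level],
                # the region spans [region*n, (region+1)*n].
                overlap = min((i + 1) * level, (region + 1) * n) - max(i * level, region * n)
                # Assign the character if at least half of it lies in the region.
                if 2 * overlap >= level:
                    bits[alphabet.index(ch)] = 1
            vec.extend(bits)
    return vec


def similarity(a, b):
    """Cosine similarity between two PHOC vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Because the vector is computed from the characters of the string alone, a query word never seen during training still has a well-defined PHOC. That is one way to read the out-of-vocabulary claim above: retrieval reduces to comparing the query vector with the vectors predicted for detected text, for example with `similarity(phoc("hotel"), predicted_vector)`, where `predicted_vector` is a hypothetical network output.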

Keywords: Image retrieval, Scene text detection, Scene text recognition, Word spotting, Convolutional neural networks, Region proposal networks, PHOC

Review history: Received 6 May 2019, Revised 19 August 2020, Accepted 9 September 2020, Available online 10 September 2020, Version of Record 1 November 2020.

Official paper URL: https://doi.org/10.1016/j.patcog.2020.107656