Learning bag-of-embedded-words representations for textual information retrieval

Authors:

Highlights:

• A novel Bag-of-Features (BoF) based model is proposed for efficiently representing text documents.

• A weighting mask (similar to the traditional bag-of-words (BoW) weighting schemes) is learned.

• The Bag-of-Embedded-Words (BoEW) model is optimized end-to-end (from the word embeddings to the weighting mask).

• The learned representation can be efficiently fine-tuned using relevance feedback.

• The proposed method is evaluated using three text collections from different domains.
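The highlights only name the components of the model. As a rough illustration, here is a minimal sketch of a bag-of-embedded-words style encoder. It assumes pre-trained word embeddings, a fixed codebook of codewords, an RBF-style soft assignment with a single width `sigma`, and a per-codeword weighting mask applied to the averaged histogram. The function names `soft_assign`, `boew_histogram`, and `cosine` are hypothetical. The paper's exact membership function, normalization, and training procedure (including relevance-feedback fine-tuning) are not reproduced here.

```python
import math
from typing import Dict, List, Sequence


def soft_assign(vec: Sequence[float], codewords: Sequence[Sequence[float]], sigma: float) -> List[float]:
    """Soft-quantize one word embedding against the codebook.

    Assumption: membership to codeword k is proportional to
    exp(-||x - c_k|| / sigma), normalized to sum to 1.
    """
    dists = [math.dist(vec, c) for c in codewords]
    # Shift by the smallest distance so exp() cannot underflow to all zeros;
    # the shift cancels out after normalization.
    d_min = min(dists)
    sims = [math.exp(-(d - d_min) / sigma) for d in dists]
    total = sum(sims)
    return [s / total for s in sims]


def boew_histogram(
    tokens: Sequence[str],
    embeddings: Dict[str, Sequence[float]],
    codewords: Sequence[Sequence[float]],
    sigma: float,
    mask: Sequence[float],
) -> List[float]:
    """Encode a tokenized document as a weighted histogram over codewords."""
    k = len(codewords)
    hist = [0.0] * k
    known = [t for t in tokens if t in embeddings]  # skip out-of-vocabulary words
    for t in known:
        for i, u in enumerate(soft_assign(embeddings[t], codewords, sigma)):
            hist[i] += u
    n = max(len(known), 1)  # avoid division by zero for empty documents
    # Average the memberships, then apply the (hypothetical) per-codeword weighting mask.
    return [mask[i] * hist[i] / n for i in range(k)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity used here to rank documents against a query."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


if __name__ == "__main__":
    # Toy 2-D embeddings and a two-codeword codebook (illustrative data only).
    emb = {"cat": [0.0, 0.0], "dog": [0.1, 0.0], "car": [5.0, 5.0]}
    codebook = [[0.0, 0.0], [5.0, 5.0]]
    weights = [1.0, 1.0]
    query = boew_histogram(["cat"], emb, codebook, 1.0, weights)
    docs = {"d1": ["dog"], "d2": ["car"]}
    ranked = sorted(
        docs,
        key=lambda d: cosine(query, boew_histogram(docs[d], emb, codebook, 1.0, weights)),
        reverse=True,
    )
    print(ranked)
```

In an end-to-end setup, the embeddings, codewords, and mask would be trainable parameters updated by gradient descent. This sketch keeps them fixed and shows only the forward pass that turns a document into a fixed-length vector.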

Keywords: Word embeddings, Bag-of-words, Bag-of-features, Dictionary learning, Relevance feedback, Information retrieval

Article history: Received 23 October 2017, Revised 9 February 2018, Accepted 8 April 2018, Available online 10 April 2018, Version of Record 18 April 2018.

Official page: https://doi.org/10.1016/j.patcog.2018.04.008