TEII: Topic enhanced inverted index for top-k document retrieval
作者:
Highlights:
•
摘要
In recent years, topic modeling is gaining significant momentum in information retrieval (IR). Researchers have found that utilizing the topic information generated through topic modeling together with traditional TF-IDF information generates superior results in document retrieval. However, in order to apply this idea to real-life IR systems, some critical problems need to be solved: how to store the topic information and how to utilize it with the TF-IDF information for efficient document retrieval. In this paper, we propose the Topic Enhanced Inverted Index (TEII) to incorporate the topic information into the inverted index for efficient top-k document retrieval. Specifically, we explore two different types of TEIIs. We first propose the incremental TEII, which includes the topic information into the traditional inverted index by adding topic-based inverted lists. The incremental TEII is beneficial for legacy IR systems, since it does not change the existing TF-IDF-based inverted lists. As a more flexible alternative, we propose the hybrid TEII to incorporate the topic information into each posting of the inverted index. In the hybrid TEII, two relaxation methods are proposed to support dynamic estimation of the upper bound impact of each posting. The hybrid TEII is highly extensible for incorporating different ranking factors and we show an extension of the hybrid TEII by considering the static quality of the documents in the corpus. Based on the incremental and hybrid TEIIs, we develop several query processing algorithms to support efficient top-k document retrieval on TEIIs. Empirical evaluation on the TREC dataset verifies the effectiveness and efficiency of the proposed index structures and query processing algorithms.
论文关键词:Topic model,Search engine,Information retrieval
论文评审过程:Received 20 October 2014, Revised 28 June 2015, Accepted 7 July 2015, Available online 23 July 2015, Version of Record 19 October 2015.
论文官网地址:https://doi.org/10.1016/j.knosys.2015.07.014