Memory efficient ranking

作者:

Highlights:

摘要

Fast and effective ranking of a collection of documents with respect to a query requires several structures, including a vocabulary, inverted file entries, arrays of term weights and document lengths, a set of partial similarity accumulators, and address tables for inverted file entries and documents. Of all of these structures, the array of document lengths and the set of accumulators are the components accessed most frequently in a ranked query, and it is crucial to acceptable performance that they be held in main memory. Here we describe an approximate ranking process that makes use of a compact array of in-memory, low-precision approximations for the lengths. Combined with another simple rule for reducing the memory required by the partial similarity accumulators, the approximation heuristic allows the ranking of large document collections using less than one byte of memory per document, an eight-fold reduction compared with conventional techniques. Moreover, in our experiments retrieval effectiveness was largely unaffected by the use of these heuristics.

论文关键词:

论文评审过程:Available online 13 July 2002.

论文官网地址:https://doi.org/10.1016/0306-4573(94)90002-7