Universal indexes for highly repetitive document collections

作者：

Highlights：

• We study how existing indexes perform in highly repetitive document collections.

• We design new inverted index variants for this kind of collections.

• We implement, adapt, and/or tune existing self-indexes for this case.

• We obtain significant space reductions, at a moderate price in query time.

• We obtain larger reductions on self-indexes, but at a higher price in query time.

摘要

Highlights•We study how existing indexes perform in highly repetitive document collections.•We design new inverted index variants for this kind of collections.•We implement, adapt, and/or tune existing self-indexes for this case.•We obtain significant space reductions, at a moderate price in query time.•We obtain larger reductions on self-indexes, but at a higher price in query time.

论文关键词：Repetitive collections,Inverted index,Self-index

论文评审过程：Received 22 March 2016, Revised 8 April 2016, Accepted 9 April 2016, Available online 27 April 2016, Version of Record 19 May 2016.

论文官网地址：https://doi.org/10.1016/j.is.2016.04.002