Grammar-compressed indexes with logarithmic search time
作者:
Highlights:
•
摘要
Let a text T[1..n] be the only string generated by a context-free grammar with g (terminal and nonterminal) symbols, and of size G (measured as the sum of the lengths of the right-hand sides of the rules). Such a grammar, called a grammar-compressed representation of T, can be encoded using GlgG bits. We introduce the first grammar-compressed index that uses O(Glgn) bits (precisely, Glgn+(2+ϵ)Glgg for any constant ϵ>0) and can find the occ occurrences of patterns P[1..m] in time O((m2+occ)lgG). We implement the index and demonstrate its practicality in comparison with the state of the art, on highly repetitive text collections.
论文关键词:Highly repetitive text collections,Grammar compression,Text indexing,Compressed full-text self-indices
论文评审过程:Received 25 March 2020, Revised 11 November 2020, Accepted 1 December 2020, Available online 30 December 2020, Version of Record 30 December 2020.
论文官网地址:https://doi.org/10.1016/j.jcss.2020.12.001