Structural properties of the string statistics problem

作者:

Highlights:

摘要

A suitably weighted Index Tree such as a B-tree or a Suffix Tree can be easily adapted to store, for a given string x and for all substrings w of x, the number of distinct instances of w along x. The storage needed is seen to be linear in the length of x: moreover, the whole statistics can itself be derived in linear time, off-line of a RAM. If the substring w has nontrivial periods, however, the number of distinct instances might differ from that of distinct non-overlapping occurrences along x. It is shown here that O(n log n) storage units—n standing for the length of x—are sufficient to organize this second kind of statistics, in such a way that the maximum number of nonoverlapping instances for arbitrary w along x can be retrieved in a number of character comparisons not exceeding the length of w.

论文关键词:

论文评审过程:Received 9 September 1985, Revised 3 July 1986, Available online 2 December 2003.

论文官网地址:https://doi.org/10.1016/0022-0000(85)90060-1