Hierarchies of indices for text searching

作者:

Highlights:

摘要

We present an efficient implementation of a recently known index for text databases, when the database is stored on secondary storage devices such as magnetic or optical disks. The implementation is built on top of a new and simple index for texts called pat array (or suffix array). Considering that text searching in a large database spends most of the time accessing external storage devices, we propose additional index structures and searching algorithms for pat arrays that reduce the number of disk accesses. We present two index structures: a two-level hierarchy model that uses main memory and one level of external storage (magnetic or optical devices) and a three-level hierarchy model that uses main memory and two levels of external storage (magnetic and optical devices). Performance improvement is achieved in both models by storing most of higher index levels in faster memories, thus reducing accesses in the slowest devices in the hierarchy. Analytical and experimental results are presented for both models. For 160 megabytes of text stored on cd-rom disk the two-level model using 2 megabytes of main memory costs 20% of the pat array used as a single level.

论文关键词:Index Hierarchy,Memory Hierarchy,Text Searching,pat Arrays,Suffix Arrays,Magnetic Disks,Read-Only Optical Disks,cd-rom

论文评审过程:Received 17 August 1994, Revised 2 August 1996, Available online 11 June 1999.

论文官网地址:https://doi.org/10.1016/0306-4379(96)00025-7