An algebra for hierarchically organized text-dominated databases

Authors:

Abstract

Structured documents usually consist of nested text elements: for example, reports contain chapters, chapters contain sections, …, and sentences contain words. The containment relationships among these text elements define a text hierarchy that can be exploited during search activities such as database browsing and full-text retrieval. During a database load, the system typically constructs concordance lists, each of which records the locations of all occurrences of a particular type of text element. Although not necessarily constructed in practice, a complete set of concordance lists would constitute an equivalent representation of the database, namely its inverted form. This paper describes an algebra built from primitive operators that take concordance lists as operands. These primitives can be used to define higher-level filter operators that specify whether a contiguous text extent is selected or rejected during a search. The main contribution of the paper is the presentation of this algebra as a theoretical model that can be used to define a conceptual schema for the database. The model provides both a mathematically well-defined abstraction of the database and a basis for its implementation, since it can be used to formally define the search protocols between the database query facilities and the underlying retrieval engine.
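As a rough illustration of the idea, the sketch below models a concordance list as a sorted list of (start, end) word-position extents and builds two containment filters on top of it. The abstract does not give the paper's actual operators, so the names `containing` and `contained_in`, the extent encoding, and the sample data are all assumptions made for this example only.

```python
from typing import Dict, List, Tuple

# Assumption: an extent is an inclusive (start, end) pair of word positions.
Extent = Tuple[int, int]


def contains(outer: Extent, inner: Extent) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def containing(outer: List[Extent], inner: List[Extent]) -> List[Extent]:
    """Hypothetical filter: keep extents of `outer` that contain some extent of `inner`."""
    return [o for o in outer if any(contains(o, i) for i in inner)]


def contained_in(inner: List[Extent], outer: List[Extent]) -> List[Extent]:
    """Hypothetical filter: keep extents of `inner` that lie inside some extent of `outer`."""
    return [i for i in inner if any(contains(o, i) for o in outer)]


# Made-up concordance lists for a tiny document of 12 word positions (0..11).
concordance: Dict[str, List[Extent]] = {
    "section": [(0, 5), (6, 11)],
    "sentence": [(0, 2), (3, 5), (6, 8), (9, 11)],
    "word:algebra": [(4, 4), (10, 10)],
}

# Sentences that mention the word "algebra".
sentences_with_algebra = containing(
    concordance["sentence"], concordance["word:algebra"]
)

# Sentences that fall inside the first section.
first_section_sentences = contained_in(
    concordance["sentence"], concordance["section"][:1]
)
```

Because both filters take concordance lists and return a list of the same shape, their results can be fed into further filters, which is the sense in which such operators compose. The quadratic scan is chosen for brevity; it is not a claim about how the paper's retrieval engine works.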

Keywords:

Article history: Received 14 May 1991, Accepted 10 November 1991, Available online 19 July 2002.

Official paper URL: https://doi.org/10.1016/0306-4573(92)90079-F