Sequences, Datalog, and Transducers

作者:

Highlights:

摘要

This paper develops a query language for sequence databases, such as genome databases and text databases. The language, calledSequence Datalog, extends classical Datalog with interpreted function symbols for manipulating sequences. It has both a clear operational and declarative semantics, based on a new notion called theextended active domainof a database. The extended domain contains all the sequences in the database and all their subsequences. This idea leads to a clear distinction between safe and unsafe recursion over sequences: safe recursion stays inside the extended active domain, while unsafe recursion does not. By carefully limiting the amount of unsafe recursion, the paper develops a safe and expressive subset of Sequence Datalog. As part of the development, a new type of transducer is introduced, called ageneralized sequence transducer. Unsafe recursion is allowed only within these generalized transducers. Generalized transducers extend ordinary transducers by allowing them to invoke other transducers as “subroutines.” Generalized transducers can be implemented in Sequence Datalog in a straightforward way. Moreover, their introduction into the language leads to simple conditions that guarantee safety and finiteness. This paper develops two such conditions. The first condition expresses exactly the class ofptimesequence functions, and the second expresses exactly the class of elementary sequence functions.

论文关键词:

论文评审过程:Received 16 December 1995, Available online 25 May 2002.

论文官网地址:https://doi.org/10.1006/jcss.1998.1562