Parameterized Pattern Matching: Algorithms and Applications

作者:

Highlights:

摘要

The problem of finding sections of code that either are identical or are related by the systematic renaming of variables or constants can be modeled in terms ofparameterized strings(p-strings) andparameterized matches(p-matches). P-strings are strings over two alphabets, one of which represents parameters. Two p-strings are aparameterized match(p-match) if one p-string is obtained by renaming the parameters of the other by a one-to-one function. In this paper, we investigate parameterized pattern matching via parameterized suffix trees (p-suffix trees). We give two algorithms for constructing p-suffix trees: one (eager) that runs in linear time for fixed alphabets, and another that uses auxiliary data structures and runs inO(n log(n)) time for variable alphabets, wherenis input length. We show that using a p-suffix tree for a pattern p-stringP, it is possible to search for all p-matches ofPwithin a text p-stringTin space linear in |P| and time linear in |T| for fixed alphabets, orO(|T| log(min(|P|, σ)) time andO(|P|) space for variable alphabets, whereσis the sum of the alphabet sizes. The simpler p-suffix tree construction algorithmeagerhas been implemented, and experiments show it to be practical. Since it runs faster than predicted by the above worst-case bound, we reanalyze the algorithm and show thateagerruns in timeO(min(t |S|+m(t, S) ∣ t>0) log σ)), where for an input p-stringS, m(t, S) is the number of maximal p-matches of length at leasttthat occur withinS, andσis the sum of the alphabet sizes. Experiments with the author's programdup(B. Baker,in“Comput. Sci. Statist.,” Vol. 24, 1992) for finding all maximal p-matches within a p-string have foundm(t, S) to be less than |S| in practice unlesstis small.

论文关键词:

论文评审过程:Received 28 September 1993, Available online 25 May 2002.

论文官网地址:https://doi.org/10.1006/jcss.1996.0003