Approximate sequence comparison: A study with histograms

作者:

Highlights:

摘要

A dynamic programming algorithm is developed for computing the distance between two sequences, i.e. the minimum cost to transform a sequence to another. The alphabet of the sequences may be infinite. The allowed operations to edit a sequence are insertion, deletion and substitution. The cost function to weigh the editing operations must have at least two of the metric properties: non-negative values and the triangle inequality property. The time complexity of the algorithm is O(n · min(z + 1 + s/Δ, m)), where n is the length of the longer input sequence, z is the number of items that can be deleted or inserted in the sequences with zero cost, s is the distance between the sequences. Δ is the smallest positive insertion or deletion cost and m is the length of the shorter input sequence. The algorithm needs linear space. It is especially designed for sequences where the costs of different deletions and insertions vary considerably. Experiments with integer number sequences show that our algorithm works at least as well as two other dynamic programming algorithms.

论文关键词:Pattern matching,Dynamic programming,Analysis of algorithms

论文评审过程:Received 25 May 1988, Revised 18 January 1989, Available online 19 May 2003.

论文官网地址:https://doi.org/10.1016/0031-3203(90)90056-Q