Good edit similarity learning by loss minimization
Authors: Aurélien Bellet, Amaury Habrard, Marc Sebban
Abstract
Similarity functions are a fundamental component of many learning algorithms. When dealing with string or tree-structured data, measures based on the edit distance are widely used, and there exist a few methods for learning them from data. However, these methods offer no theoretical guarantee as to the generalization ability and discriminative power of the learned similarities. In this paper, we propose an approach to edit similarity learning based on loss minimization, called GESL. It is driven by the notion of (ϵ,γ,τ)-goodness, a theory that bridges the gap between the properties of a similarity function and its performance in classification. Using the notion of uniform stability, we derive generalization guarantees that hold for a large class of loss functions. We also provide experimental results on two real-world datasets which show that edit similarities learned with GESL induce more accurate and sparser classifiers than other (standard or learned) edit similarities.
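To make the notion of an edit similarity concrete, the following is a minimal sketch, not the GESL algorithm itself. It computes a weighted Levenshtein distance by dynamic programming and maps it into (-1, 1] with an exponential transform. The function names `edit_distance` and `similarity`, the `sub_cost` and `indel_cost` parameters, and the choice of transform are assumptions made for illustration. In a learned edit similarity the operation costs would be fitted from data rather than fixed by hand.

```python
import math


def edit_distance(x, y, sub_cost=None, indel_cost=1.0):
    """Weighted Levenshtein distance between sequences x and y.

    sub_cost(a, b) is a hypothetical cost function for substituting a by b;
    it defaults to 0 for a match and 1 for a mismatch.
    indel_cost is the cost of a single insertion or deletion.
    """
    if sub_cost is None:
        sub_cost = lambda a, b: 0.0 if a == b else 1.0
    m, n = len(x), len(y)
    # prev[j] holds the distance between x[:i-1] and y[:j].
    prev = [j * indel_cost for j in range(n + 1)]
    for i in range(1, m + 1):
        curr = [i * indel_cost] + [0.0] * n
        for j in range(1, n + 1):
            curr[j] = min(
                prev[j] + indel_cost,                        # delete x[i-1]
                curr[j - 1] + indel_cost,                    # insert y[j-1]
                prev[j - 1] + sub_cost(x[i - 1], y[j - 1]),  # substitute or match
            )
        prev = curr
    return prev[n]


def similarity(x, y, **kwargs):
    """Map the edit distance into (-1, 1]; identical inputs score 1.

    The exponential transform is an illustrative assumption.
    """
    return 2.0 * math.exp(-edit_distance(x, y, **kwargs)) - 1.0
```

Passing a different `sub_cost` (for example, one that looks costs up in a table) changes which string pairs count as similar, and those costs are the quantity an edit similarity learning method would adjust.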
Keywords: Similarity learning, Edit distance, Good similarity function, Loss minimization
Official paper URL: https://doi.org/10.1007/s10994-012-5293-8