W-TSV: Weighted topological signature vector for lexicon reduction in handwritten Arabic documents

Abstract

This paper proposes a holistic lexicon-reduction method for ancient and modern handwritten Arabic documents. Each word shape is represented by a weighted topological signature vector (W-TSV), which encodes graph data in a low-dimensional vector space. Three directed acyclic graph (DAG) representations of Arabic word shapes are proposed, based on topological and geometrical features. Lexicon reduction is performed by a nearest-neighbor search in the W-TSV space. The framework was tested on the IFN/ENIT and Ibn Sina databases, achieving degrees of reduction of 83.5% and 92.9%, respectively, at an accuracy of reduction of 90%.
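The reduction step and its two evaluation measures can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the W-TSV of every word has already been computed, uses Euclidean distance as a stand-in metric, and takes the degree of reduction to be 1 - k/|lexicon| and the accuracy of reduction to be the fraction of queries whose true word survives the reduction. The function names, the toy words, and the 3-dimensional vectors are all hypothetical.

```python
import math


def reduce_lexicon(query_vec, lexicon, k):
    """Return the k lexicon words whose vectors lie closest to query_vec.

    lexicon maps each word to a precomputed signature vector (assumed to be
    its W-TSV). Euclidean distance is an assumption made for illustration.
    """
    ranked = sorted(lexicon, key=lambda word: math.dist(query_vec, lexicon[word]))
    return ranked[:k]


def evaluate(queries, lexicon, k):
    """Compute (accuracy of reduction, degree of reduction) for a fixed k.

    queries is a list of (vector, true_word) pairs.
    """
    hits = sum(
        1 for vec, truth in queries if truth in reduce_lexicon(vec, lexicon, k)
    )
    accuracy = hits / len(queries)      # true word kept in the reduced lexicon
    degree = 1.0 - k / len(lexicon)     # share of the lexicon that was pruned
    return accuracy, degree


# Toy data: hypothetical words with made-up 3-dimensional signature vectors.
lexicon = {
    "kitab": (1.0, 0.0, 2.0),
    "qalam": (0.9, 0.1, 2.1),
    "bayt": (3.0, 1.0, 0.0),
    "madina": (0.0, 4.0, 1.0),
}
queries = [
    ((1.0, 0.05, 2.0), "kitab"),
    ((2.9, 1.1, 0.1), "bayt"),
    ((0.5, 3.0, 1.0), "madina"),
]

accuracy, degree = evaluate(queries, lexicon, k=2)
print(f"accuracy of reduction: {accuracy:.2f}, degree of reduction: {degree:.2f}")
```

Raising k keeps more candidates, which tends to raise accuracy and lower the degree of reduction. Results of this kind are usually reported as the degree reached at a fixed accuracy, as in the abstract's 90% operating point.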

Keywords: Lexicon reduction, Arabic handwritten documents, Ancient documents, Weighted topological signature vector (W-TSV), Graph indexing, IFN/ENIT, Ibn Sina database

Article history: Received 18 October 2011, Revised 1 February 2012, Accepted 22 February 2012, Available online 5 March 2012.

Official paper URL: https://doi.org/10.1016/j.patcog.2012.02.030