Mining historical manuscripts with local color patches

Authors: Qiang Zhu, Eamonn Keogh

Abstract

Initiatives such as the Google Print Library Project and the Million Book Project have already archived more than twelve million books in digital format, and within the next decade, the majority of the world's books will be online. Although most of this data will naturally be text, there will also be tens of millions of pages of images, many in color. While there is an active research community pursuing data mining of text from historical manuscripts, there has been very little work that exploits the rich color information that is often present. In this work, we introduce a simple color measure that both addresses and exploits typical features of historical manuscripts. To enable efficient mining of massive archives, we propose a tight lower bound to the measure. Beyond fast similarity search, we show how this lower bound allows us to build several higher-level data mining tools, including motif discovery and link analysis. We demonstrate our ideas on several data mining tasks using manuscripts dating back to the fifteenth century.
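The abstract does not define the color measure or its lower bound. The sketch below is therefore only a generic illustration of how a cheap lower bound speeds up nearest-neighbor search, not the authors' method. It assumes a stand-in measure: Euclidean distance between hypothetical feature vectors, bounded below by the reverse triangle inequality |‖q‖ − ‖x‖| ≤ ‖q − x‖. All names (`euclidean`, `norm`, `nearest_neighbor`) are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in for the 'expensive' exact distance (NOT the paper's color measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def norm(a: Sequence[float]) -> float:
    """Cheap per-item summary used by the lower bound."""
    return math.sqrt(sum(x * x for x in a))


def nearest_neighbor(
    query: Sequence[float], database: List[Sequence[float]]
) -> Tuple[Optional[int], float, int]:
    """Return (index of nearest item, its distance, number of exact distance calls)."""
    q_norm = norm(query)
    # Precompute the summaries; a real system would index these offline.
    norms = [norm(item) for item in database]

    def lower_bound(i: int) -> float:
        # Reverse triangle inequality: |‖q‖ - ‖x‖| <= ‖q - x‖.
        return abs(norms[i] - q_norm)

    # Visit candidates in order of increasing lower bound so good matches come early.
    order = sorted(range(len(database)), key=lower_bound)
    best_idx: Optional[int] = None
    best_dist = math.inf
    exact_calls = 0
    for i in order:
        if lower_bound(i) >= best_dist:
            # Every remaining candidate has an even larger bound, so none can win.
            break
        d = euclidean(query, database[i])
        exact_calls += 1
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx, best_dist, exact_calls


# Example usage on a tiny toy database.
db = [[0.0, 0.0], [3.0, 4.0], [10.0, 0.0], [0.0, 20.0]]
print(nearest_neighbor([3.0, 3.0], db))  # (1, 1.0, 1)
```

Here only one exact distance is computed. After the first candidate is found at distance 1.0, every remaining lower bound is at least 4.24, so the search stops. The tighter the bound, the more candidates are pruned. This is why the abstract stresses a *tight* lower bound.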

Keywords: Historical manuscripts, Color indexing

Paper URL: https://doi.org/10.1007/s10115-011-0401-9