Unsupervised clustering for nontextual web document classification
作者:
Highlights:
•
摘要
While the breath of vocabulary used in long documents may mislead the traditional keyword-based retrieval systems, the demands for techniques in nontextual Web classification and retrieval from a large document collection are mounting. Only a few prototype systems have attempted to classify hypertext on the basis of nontextual elements in order to locate unfamiliar documents. As a result, a large portion of Web documents having pictorial information in nature is far beyond the reach of most current search engines. In this research, we devise a novel quantitative model of nontextual World Wide Web (WWW) classification based on image information. An intelligent content-sensitive, attribute-rich image classifier is presented. An image similarity measure is used to deduce the likelihood among images. Different image feature vectors have been constructed and evaluated. Evaluation shows images judged to be similar by human form interesting clusters in our unsupervised learning. Comparison with other clustering technique, such as Hierarchical Agglomerative Clustering (HAC), demonstrates that our approach is found useful in content-based image information retrieval.
论文关键词:Unsupervised clustering,Image classification,Neural networks
论文评审过程:Received 19 January 2001, Accepted 2 July 2002, Available online 26 March 2003.
论文官网地址:https://doi.org/10.1016/S0167-9236(03)00035-6