A theoretical model for the automatic generation of tag clouds

Authors: Ursula Torres-Parejo, Jesús R. Campaña, M. Amparo Vila, Miguel Delgado

Abstract

This paper presents a new approach to information retrieval from non-structured attributes in databases, which involves the processing of text attributes. To make retrieval more effective, frequent text sequences are extracted and mathematically represented as intermediate forms which permit a clearer and more precise definition of operations on texts. These intermediate forms appear to users in the form of tag clouds to facilitate content identification, exploration, and querying. In this sense, tag cloud visualization is a simple, user-friendly visual interface to data. This paper proposes a theoretical model for the representation of frequent text sequences and their operations as well as a general procedure for generating tag clouds from text attributes in databases. The tag clouds thus obtained were compared with conventional tag clouds composed of single terms. Our study showed that automatically generated multi-term tag clouds provide better results than mono-term tag clouds.
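
As a rough illustration of the pipeline the abstract describes (extract frequent text sequences, then display them as a weighted tag cloud), the following is a minimal sketch and not the authors' model. It assumes that a "text sequence" is a contiguous run of words, that frequency means the number of texts containing the sequence, and that font size scales linearly with that count. The function names `frequent_sequences` and `tag_cloud` and the parameters `max_len`, `min_support`, `min_size`, and `max_size` are hypothetical.

```python
from collections import Counter


def frequent_sequences(texts, max_len=3, min_support=2):
    """Return {word sequence: number of texts containing it}.

    Assumption: a sequence is a contiguous run of 1..max_len words, and
    only sequences found in at least min_support texts are kept.
    """
    support = Counter()
    for text in texts:
        words = text.lower().split()
        seen = set()  # count each sequence at most once per text
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                seen.add(tuple(words[i:i + n]))
        support.update(seen)
    return {seq: count for seq, count in support.items() if count >= min_support}


def tag_cloud(frequent, min_size=10, max_size=30):
    """Map each sequence to a font size scaled linearly by its support."""
    if not frequent:
        return {}
    lo, hi = min(frequent.values()), max(frequent.values())
    span = (hi - lo) or 1  # avoid division by zero when all supports are equal
    return {
        " ".join(seq): min_size + (count - lo) * (max_size - min_size) // span
        for seq, count in frequent.items()
    }


# Hypothetical values of a text attribute in three database records.
texts = ["tag cloud visualization", "multi-term tag cloud", "semantic search"]
freq = frequent_sequences(texts)
cloud = tag_cloud(freq)
```

With these sample texts, the multi-term tag "tag cloud" is kept alongside the single terms "tag" and "cloud", which is the kind of multi-term entry the abstract contrasts with mono-term tag clouds.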

Keywords: Semantic search, Knowledge visualization, Multi-term, Tag cloud, Unstructured databases, Content identification, Set algebra, Lattices

Official paper URL: https://doi.org/10.1007/s10115-013-0651-9