An intelligent information system for organizing online text documents
Authors: Han-joon Kim, Sang-goo Lee
Abstract
This paper describes an intelligent information system for effectively managing large volumes of online text documents (such as Web documents) in a hierarchical manner. The system's organizational capabilities can evolve semi-automatically with minimal human input. The system starts with an initial taxonomy into which documents are automatically categorized, and then evolves to provide a good indexing service as the document collection grows or its usage changes. To this end, we propose a series of algorithms that use text-mining techniques such as document clustering, document categorization, and hierarchy reorganization. In particular, the clustering and categorization algorithms have been studied in depth so that both the hierarchical structure and the categorization criteria can evolve over time. Through experiments on the Reuters-21578 document collection, we evaluate the proposed clustering and categorization methods by comparing their performance with that of well-known conventional methods.
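As a generic illustration of the document-categorization step (Naïve Bayes appears among the keywords), the sketch below is a textbook multinomial Naïve Bayes classifier with Laplace smoothing. It is not the authors' algorithm, which the abstract does not specify; the toy training documents are hypothetical, and the two category labels merely borrow Reuters-style topic names.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Train multinomial Naive Bayes on (tokens, label) pairs (generic sketch)."""
    class_docs = Counter()            # number of training documents per category
    word_counts = defaultdict(Counter)  # per-category word frequencies
    for tokens, label in docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    n = len(docs)
    log_priors = {c: math.log(k / n) for c, k in class_docs.items()}
    totals = {c: sum(word_counts[c].values()) for c in class_docs}
    return log_priors, word_counts, totals, vocab


def classify(tokens, model):
    """Return the category with the highest log-posterior score."""
    log_priors, word_counts, totals, vocab = model
    v = len(vocab)
    best, best_score = None, -math.inf
    for c, score in log_priors.items():
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothed estimate of P(w | c)
                score += math.log((word_counts[c][w] + 1) / (totals[c] + v))
        if score > best_score:
            best, best_score = c, score
    return best


# Hypothetical toy training set (not data from the paper)
training = [
    ("wheat corn harvest grain".split(), "grain"),
    ("grain exports wheat prices".split(), "grain"),
    ("oil crude barrel opec".split(), "crude"),
    ("crude oil prices rise".split(), "crude"),
]
model = train_nb(training)
print(classify("wheat harvest".split(), model))  # grain
print(classify("opec barrel".split(), model))    # crude
```

Working in log space avoids floating-point underflow when many word probabilities are multiplied; the add-one smoothing keeps a single unseen category-word pair from zeroing out a category's score.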
Keywords: Document categorization, Document clustering, Fuzzy relations, Hierarchical agglomerative clustering, Information systems, Naïve Bayes, Topic hierarchy
Official URL: https://doi.org/10.1007/BF02637152