Topic discovery based on text mining techniques

Abstract

In this paper, we present a topic discovery system that aims to reveal the implicit knowledge present in news streams. This knowledge is expressed as a hierarchy of topics and subtopics, where each topic contains the set of documents related to it and a summary extracted from those documents. These summaries help users browse the generated hierarchies and select topics of interest. Our proposal consists of a new incremental hierarchical clustering algorithm that combines partitional and agglomerative approaches, taking the main benefits of both. We also propose a new summarization method based on Testor Theory to build the topic summaries. Experimental results on the TDT2 collection demonstrate the system's usefulness and effectiveness not only as a topic detection system, but also as a classification and summarization tool.

Keywords: Hierarchical clustering, Text summarization, Topic detection

Review history: Received 10 March 2006, Revised 31 May 2006, Accepted 3 June 2006, Available online 6 September 2006.

Paper URL: https://doi.org/10.1016/j.ipm.2006.06.001