Fuzzy evolutionary optimization modeling and its applications to unsupervised categorization and extractive summarization

作者:

Highlights:

摘要

Modern information retrieval (IR) systems consist of many challenging components, e.g. clustering, summarization, etc. Nowadays, without browsing the whole volume of datasets, IR systems present users with clusters of documents they are interested in, and summarize each document briefly which facilitates the task of finding the desired documents. This paper proposes a fuzzy evolutionary optimization modeling (FEOM) and its applications to unsupervised categorization and extractive summarization. In view of the nature of biological evolution, we take advantage of several fuzzy control parameters to adaptively regulate the behaviors of the evolutionary optimization, which can effectively prevent premature convergence to a local optimal solution. As a portable, modular and extensively executable model, FEOM is firstly implemented for clustering text documents. The searching capability of FEOM is exploited to explore appropriate partitions of documents such that the similarity metric of the resulting clusters is optimized. In order to further investigate its effectiveness as a generic data clustering model, FEOM is then applied to sentence clustering based extractive document summarization. It selects the most important sentence from each cluster to represent the overall meaning of document. We demonstrate the improved performance by a series of experiments using standard test sets, e.g. Reuter document collection, 20-newsgroup corpus, DUC01 and DUC02, as evaluated by some commonly used metrics, i.e. F-measure and ROUGE. The experimental results show that FEOM achieves performance as good as or better than state of arts of clustering and summarizing systems.

论文关键词:Extractive summarization,Topics estimation,Unsupervised categorization,Fuzzy evolutionary optimization,Normalized Google distance

论文评审过程:Available online 17 December 2010.

论文官网地址:https://doi.org/10.1016/j.eswa.2010.12.102