Discovery of interactive graphs for understanding and searching time-indexed corpora

Authors: Ilija Subašić, Bettina Berendt

Abstract

Rich information spaces (like the Web or scientific publications) are full of “stories”: sets of statements that evolve over time, manifested as, for example, collections of news articles reporting events that relate to an evolving crime investigation, sets of news articles and blog posts accompanying the development of a political election campaign, or sequences of scientific papers on a topic. In this paper, we formulate the problem of discovering such stories as Evolutionary Theme Pattern Discovery, Summary and Exploration (ETP3). We propose a method and a visualisation tool for solving ETP3 by understanding, searching and interacting with such stories and their underlying documents. In contrast to existing approaches, our method concentrates on relational information and on local patterns rather than on the occurrence of individual concepts and global models. In addition, it relies on interactive graphs rather than natural language as the abstracted story representations. Furthermore, we present an evaluation framework. Two real-life case studies are used to illustrate and evaluate the method and tool.

Keywords: Text mining, Web mining, Graphical user interfaces

Paper URL: https://doi.org/10.1007/s10115-009-0227-x