WebKey: a graph-based method for event detection in web news

Authors: Elham Rasouli, Sajjad Zarifzadeh, Amir Jahangard Rafsanjani

Abstract

With the rapid, large-scale publishing of news over the Internet, there is a surge of interest in detecting the underlying hot events in online news streams. There are two main challenges in event detection: accuracy and scalability. In this paper, we propose a fast and efficient method to detect events in news websites. First, we identify bursty terms, which suddenly appear in a large number of news documents. Then, we construct a novel co-occurrence graph between terms, in which nodes and edges are weighted based on important features such as click and document frequency within burst intervals. Finally, a weighted community detection algorithm is used to cluster terms and find events. We also propose a couple of techniques to reduce the size of the graph. The results of our evaluations show that the proposed method yields much higher precision and recall than previous methods, improving their harmonic mean (the F-measure) by at least 40%. Moreover, it reduces running time and memory usage by a factor of at least 2.
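The three stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and makes several simplifying assumptions: burstiness is approximated by comparing a term's document frequency in the current window against a smoothed background rate, edges are weighted by raw co-occurrence counts only (click data and node weights are omitted), and connected components over thresholded edges stand in for the weighted community detection step. All function names, parameters, thresholds, and the sample documents are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def bursty_terms(window_docs, background_docs, min_df=2, ratio=2.0):
    """Return terms whose document-frequency rate in the current window is
    at least `ratio` times their add-one-smoothed background rate."""
    cur = Counter(t for d in window_docs for t in set(d))
    bg = Counter(t for d in background_docs for t in set(d))
    n_cur, n_bg = len(window_docs), len(background_docs)
    result = set()
    for term, df in cur.items():
        cur_rate = df / n_cur
        bg_rate = (bg[term] + 1) / (n_bg + 1)  # smoothing avoids a zero rate
        if df >= min_df and cur_rate >= ratio * bg_rate:
            result.add(term)
    return result


def cooccurrence_graph(window_docs, terms):
    """Edge weight = number of window documents containing both terms."""
    weights = Counter()
    for doc in window_docs:
        present = sorted(set(doc) & terms)
        for a, b in combinations(present, 2):
            weights[(a, b)] += 1
    return weights


def cluster_terms(terms, weights, min_weight=2):
    """Stand-in for community detection: connected components over edges
    whose weight is at least `min_weight`."""
    adj = defaultdict(set)
    for (a, b), w in weights.items():
        if w >= min_weight:
            adj[a].add(b)
            adj[b].add(a)
    seen, events = set(), []
    for start in sorted(terms):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            u = stack.pop()
            component.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        events.append(sorted(component))
    return events


# Hypothetical tokenized documents: an older background set and a current window.
BACKGROUND = [
    ["the", "weather", "city"],
    ["the", "market", "city"],
    ["the", "city", "council"],
    ["the", "weather", "market"],
]
WINDOW = [
    ["the", "earthquake", "tsunami", "city"],
    ["the", "earthquake", "tsunami", "coast"],
    ["the", "earthquake", "tsunami"],
    ["the", "election", "vote", "city"],
    ["the", "election", "vote", "ballot"],
    ["the", "election", "vote"],
    ["the", "earthquake", "election"],
]

terms = bursty_terms(WINDOW, BACKGROUND)
graph = cooccurrence_graph(WINDOW, terms)
events = cluster_terms(terms, graph)
print(events)
```

In this toy data, the single document mentioning both "earthquake" and "election" produces an edge of weight 1, which falls below the threshold, so the two term groups stay separate. Pruning weak edges in this way is only loosely analogous to the graph-size reduction techniques the abstract mentions; the paper's actual techniques are not specified here.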

Keywords: Topic detection and tracking, Event detection, Data mining, Community detection

Paper URL: https://doi.org/10.1007/s10844-019-00576-7