Automatic trend detection: Time-biased document clustering

Authors:

Highlights:

Abstract

Identifying trending topics in journals and conferences helps clarify the role of authors, institutions, and funding agencies in the progression of knowledge within a field. However, many available clustering methods do not produce the temporally concentrated clusters that are typical of trends, in part because time of publication is often neglected as a feature. To demonstrate how time can be emphasized in trend detection, we introduce a weighted temporal feature that biases topic clustering toward articles published in a similar time frame, and we apply this approach to a set of finance journal abstracts from 1974 to 2020. Latent Dirichlet Allocation (LDA) is used to parameterize each abstract, followed by dimensionality reduction using Singular Value Decomposition (SVD). With this approach, we detect trending finance topics that are not identifiable with a standard clustering approach that has no temporal bias. To identify trending topics, we use a metric defined as the silhouette score divided by the standard deviation of clusters over time. We then isolate the topics identified by this metric and validate them using expert judgment. Our temporally biased clustering strategy can be readily applied in other fields to discover the rise and fall of trends.
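The temporal bias and the trend metric described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm, the time scaling, or the weight, so the k-means routine, the min-max scaling of the year, the value `weight=5.0`, the per-cluster reading of the metric, and the toy data are all assumptions made for illustration. The two-dimensional vectors stand in for LDA topic vectors after SVD reduction, which the sketch does not compute.

```python
import math
from statistics import mean, pstdev


def add_time_feature(vectors, years, weight):
    """Append a min-max scaled, weighted publication year to each vector.
    (Scaling scheme and weight are assumptions, not taken from the paper.)"""
    lo, hi = min(years), max(years)
    span = (hi - lo) or 1
    return [list(v) + [weight * (y - lo) / span] for v, y in zip(vectors, years)]


def kmeans(points, init_idx, iters=20):
    """Plain Lloyd's k-means with fixed initial centroids, so runs are deterministic."""
    centroids = [list(points[i]) for i in init_idx]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels


def silhouette(points, labels, cluster):
    """Mean silhouette coefficient of the points assigned to `cluster`.
    Requires at least two distinct cluster labels."""
    scores = []
    for i, p in enumerate(points):
        if labels[i] != cluster:
            continue
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == cluster and j != i]
        if not own:  # singleton cluster: silhouette is defined as 0
            scores.append(0.0)
            continue
        a = mean(own)
        b = min(mean(math.dist(p, q) for j, q in enumerate(points) if labels[j] == other)
                for other in set(labels) if other != cluster)
        scores.append((b - a) / max(a, b, 1e-12))
    return mean(scores)


def trend_score(points, labels, years, cluster):
    """Silhouette of a cluster divided by the standard deviation of its years."""
    spread = pstdev(y for y, lab in zip(years, labels) if lab == cluster)
    return silhouette(points, labels, cluster) / max(spread, 1e-9)


# Toy stand-ins for reduced topic vectors: two topics, each spanning old and new papers.
vectors = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (1.1, 1.0), (1.0, 1.1)]
years = [1980, 1982, 2018, 1981, 2019, 2020]

# Standard clustering: topic features only.
plain_labels = kmeans(vectors, init_idx=(0, 3))

# Time-biased clustering: same features plus the weighted year.
biased_points = add_time_feature(vectors, years, weight=5.0)
biased_labels = kmeans(biased_points, init_idx=(0, 3))

plain_score = trend_score(vectors, plain_labels, years, cluster=0)
biased_score = trend_score(biased_points, biased_labels, years, cluster=1)

print("plain labels: ", plain_labels)
print("biased labels:", biased_labels)
print("trend score, plain cluster 0: ", round(plain_score, 3))
print("trend score, biased cluster 1:", round(biased_score, 3))
```

On this toy data the unbiased run groups documents purely by topic, so each cluster spans four decades and scores low on the metric. With the weighted year appended, the clusters split by period instead, and the 2018 to 2020 cluster scores much higher. How large the weight should be is a tuning decision that the sketch does not address.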

Keywords: Text mining, Trend detection, Temporal biased clustering, Machine learning

Review history: Received 8 October 2020, Revised 25 February 2021, Accepted 25 February 2021, Available online 2 March 2021, Version of Record 15 March 2021.

Official paper URL: https://doi.org/10.1016/j.knosys.2021.106907