Broadcast news story segmentation using sticky hierarchical Dirichlet process

Authors: Jia Yu, Hongxiang Shao

Abstract

Hidden Markov models (HMMs) are a popular technique for story segmentation, where the hidden Markov states represent topics. The number of hidden states has to be set manually, but this number is often unknown. This paper proposes a nonparametric approach, called SHDP-HMM, to address this problem. By placing an HDP prior on transition matrices over a countably infinite state space, SHDP-HMM can infer the number of hidden states from the data automatically. In addition, to better model topic duration, we introduce a self-transition bias parameter that reduces the transition probabilities among redundant hidden states. Given a stream of text, a Gibbs sampler labels the hidden states with topic classes. A position where the topic shifts indicates a story boundary. Experiments show that the proposed SHDP-HMM approach outperforms traditional HMM-based approaches and confirm that the number of hidden states can be inferred automatically from the data.
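
The two mechanisms named in the abstract, the self-transition bias and reading story boundaries off the state sequence, can be illustrated with a small sketch. It assumes the standard weak-limit (finite truncation) form of a sticky HDP prior, in which row j of the transition matrix is drawn from Dirichlet(alpha * beta + kappa * delta_j); the abstract does not give the paper's exact parameterization. The function names, the symbols alpha, kappa and beta, and the truncation level are hypothetical choices for illustration, not the authors' implementation.

```python
import random


def sample_dirichlet(concentrations, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in concentrations]
    total = sum(draws)
    return [d / total for d in draws]


def sticky_transition_row(beta, j, alpha, kappa, rng):
    """Sample row j of a truncated transition matrix (assumed weak-limit form).

    beta  : shared global state weights (positive, summing to 1)
    alpha : concentration tying each row to beta
    kappa : self-transition bias; extra mass added only to entry j
    """
    conc = [alpha * b + (kappa if k == j else 0.0) for k, b in enumerate(beta)]
    return sample_dirichlet(conc, rng)


def story_boundaries(states):
    """Return positions t where the topic label differs from the one at t - 1."""
    return [t for t in range(1, len(states)) if states[t] != states[t - 1]]


if __name__ == "__main__":
    rng = random.Random(0)
    truncation = 5                          # hypothetical truncation level
    beta = [1.0 / truncation] * truncation  # uniform global weights, for illustration
    row = sticky_transition_row(beta, j=2, alpha=1.0, kappa=50.0, rng=rng)
    print("self-transition probability:", row[2])

    labels = [0, 0, 1, 1, 1, 2]             # toy per-sentence topic labels
    print("boundaries:", story_boundaries(labels))  # [2, 5]
```

A larger kappa concentrates each row on its own diagonal entry, so sampled state sequences stay in one topic longer and switch less often between near-duplicate states. In the full model the labels would come from the Gibbs sampler rather than being given as a toy list.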

Keywords: Story segmentation, Non-parametric, HDP prior, SHDP-HMM, Infinite state spaces

Paper URL: https://doi.org/10.1007/s10489-021-03098-4