Semantic-based topic representation using frequent semantic patterns
Abstract
Topic modeling discovers the hidden topics in a document collection. Most existing topic models focus only on word usage, generating topics from word frequency and co-occurrence without considering the meaning of the text. In this paper, we propose a novel approach that generates a semantic pattern-based topic representation, grounded in the meaning of the text, to represent the topics in a document collection. The approach considers both the semantics and the co-occurrence of words to generate a set of frequent semantic patterns for each topic. Semantics are captured by matching the words in each topic with concepts in the Probase ontology; a set of frequent semantic patterns is then generated for each topic from the co-occurrence of the matched words. Our approach therefore differs from traditional topic models in that each topic is represented by meaningful frequent semantic patterns derived from the ontology. The proposed topic representation was evaluated against a set of state-of-the-art systems in terms of topic quality and information filtering performance. The topic quality evaluation examined perplexity, coherence, and topic word distribution; the information filtering evaluation used the generated frequent semantic patterns as features. Our topic representation outperformed the compared systems in all evaluations.
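The two steps described above (matching topic words to ontology concepts, then mining frequent patterns from the co-occurrence of the matched words) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a hypothetical toy dictionary `TOY_CONCEPTS` standing in for Probase (the real ontology links each term to many concepts with probabilities; here each word maps to a single concept), treats each document as one transaction, and replaces a dedicated miner such as Apriori or FP-growth with a brute-force support count. The helper names `to_concepts` and `frequent_patterns` and the threshold `min_support` are illustrative.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy stand-in for Probase: each word maps to one concept label.
TOY_CONCEPTS = {
    "apple": "company", "microsoft": "company", "google": "company",
    "iphone": "device", "laptop": "device", "tablet": "device",
    "python": "programming language", "java": "programming language",
}


def to_concepts(doc_words, topic_words, concept_map):
    """Keep a document's words that belong to the topic and map them to concepts."""
    return frozenset(
        concept_map[w] for w in doc_words if w in topic_words and w in concept_map
    )


def frequent_patterns(transactions, min_support, max_size=3):
    """Return {concept tuple: support} for itemsets with support >= min_support.

    Support is the fraction of transactions containing the itemset. Candidates are
    counted by brute force, level by level; if no itemset of size k is frequent,
    no larger itemset can be, so the search stops early.
    """
    n = len(transactions)
    result = {}
    for size in range(1, max_size + 1):
        counts = Counter()
        for t in transactions:
            for combo in combinations(sorted(t), size):
                counts[combo] += 1
        level = {p: c / n for p, c in counts.items() if c / n >= min_support}
        if not level:
            break
        result.update(level)
    return result


# Usage: three documents touch the (hypothetical) topic; the fourth does not.
docs = [
    ["apple", "iphone", "python"],
    ["microsoft", "laptop"],
    ["google", "tablet", "java"],
    ["weather", "rain"],
]
topic = set(TOY_CONCEPTS)  # pretend the topic's words are exactly the dictionary keys
transactions = [to_concepts(d, topic, TOY_CONCEPTS) for d in docs]
transactions = [t for t in transactions if t]  # drop documents with no matched words

patterns = frequent_patterns(transactions, min_support=0.6)
for pattern, support in sorted(patterns.items()):
    print(pattern, round(support, 2))
```

Mapping words to concepts before counting lets different surface words ("apple", "google") reinforce the same pattern, which is the intuition behind using an ontology instead of raw word co-occurrence.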
Keywords: Topic representation, Semantics, Concepts, Patterns
Article history: Received 22 June 2020, Revised 13 December 2020, Accepted 23 January 2021, Available online 27 January 2021, Version of Record 3 February 2021.
DOI: https://doi.org/10.1016/j.knosys.2021.106808