Building semantic kernels for cross-document knowledge discovery using Wikipedia

Authors: Peng Yan, Wei Jin

Abstract

Research into text mining has progressed over the past decade. One of the main challenges now is how to take advantage of outside knowledge in the discovery process. In this work, to address the limitations of the traditional bag-of-words model and to expand the search scope beyond the document collections at hand, we present a new text mining approach that incorporates Wikipedia as background knowledge. Various semantic kernels are built from the extensive knowledge derived from Wikipedia and applied to the search scenario of detecting potential semantic relationships between topics. We demonstrate the effectiveness of our approach by comparing it with competitive baselines, as well as with alternative solutions in which only part of the Wikipedia resources (e.g., the Wiki-article contents or the associated Wiki-categories) is considered.
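The abstract does not spell out how the kernels are constructed. As a rough illustration of the general idea of a semantic kernel over bag-of-words vectors, the minimal sketch below smooths each document vector with a term-proximity matrix before comparing documents. The vocabulary, the proximity values, and all function names are assumptions made up for this example; they are not taken from the paper, and in practice the proximity matrix would be derived from a background resource such as Wikipedia.

```python
from math import sqrt

# Toy vocabulary (hypothetical, for illustration only).
VOCAB = ["car", "automobile", "banana"]

# Hypothetical symmetric term-proximity matrix P: P[i][j] is an assumed
# relatedness score between VOCAB[i] and VOCAB[j]. The values are invented.
P = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]


def smooth(doc_vec, proximity):
    """Map a bag-of-words vector d to the smoothed vector d' = d P."""
    n = len(proximity)
    return [sum(doc_vec[i] * proximity[i][j] for i in range(n)) for j in range(n)]


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = sqrt(dot(u, u)) * sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def semantic_kernel(d1, d2, proximity):
    """Normalised kernel k(d1, d2) = cos(d1 P, d2 P)."""
    return cosine(smooth(d1, proximity), smooth(d2, proximity))


doc_a = [1, 0, 0]  # contains only "car"
doc_b = [0, 1, 0]  # contains only "automobile"
doc_c = [0, 0, 1]  # contains only "banana"

bow_sim = cosine(doc_a, doc_b)              # no shared terms
sem_sim = semantic_kernel(doc_a, doc_b, P)  # related terms now contribute
unrelated = semantic_kernel(doc_a, doc_c, P)
```

Under these assumed values, the two documents share no terms, so plain bag-of-words cosine similarity is zero, while the smoothed comparison gives them a high score because "car" and "automobile" are marked as related. Documents whose terms have no assumed relatedness still score zero.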

Keywords: Semantic relatedness, Cross-document knowledge discovery, Document representation

Paper URL: https://doi.org/10.1007/s10115-016-0973-5