DF-Miner: Domain-specific facet mining by leveraging the hyperlink structure of Wikipedia

Abstract

Organizing a set of domain-specific terms into a meaningful hierarchical structure is an essential task for faceted search and knowledge organization. In this paper, we present an automatic approach, called domain-specific facet (DF)-Miner, that discovers DFs from the hyperlink structure among Wikipedia article pages. Each article page corresponds to a domain-specific term, and the hyperlinks among article pages represent the connections among these terms. The community structure of these connections within a domain-specific term set reveals the facets of the domain, and the terms with more connections provide important clues for labeling those facets. Accordingly, DF-Miner first constructs a domain-specific hyperlink graph from the Wikipedia article pages and then extracts a tree structure from the Wikipedia category pages. Next, DF-Miner groups the terms of the domain into multiple facets based on the result of community detection on the hyperlink graph. Finally, it selects a meaningful label for each facet based on the number of connections of its terms and on the tree structure extracted from the category pages. Two experiments were conducted on six real-world datasets to evaluate DF-Miner. The experimental results show that DF-Miner outperforms approaches based on textual content.
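The pipeline described in the abstract (hyperlink graph, category tree, community detection, facet labeling) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the community-detection algorithm or the exact labeling rule, so this sketch assumes plain label propagation for the former. For the latter, it picks the most specific category in a child-to-parent tree whose terms carry at least a hypothetical `min_share` of the facet's total degree, falling back to the facet's highest-degree term. All function names, input formats, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def build_hyperlink_graph(links, domain_terms):
    """Undirected graph over domain terms; links to off-domain pages are dropped."""
    graph = {term: set() for term in domain_terms}
    for src, targets in links.items():
        for dst in targets:
            if src in graph and dst in graph and dst != src:
                graph[src].add(dst)
                graph[dst].add(src)
    return graph

def label_propagation(graph, max_iter=100):
    """Deterministic label propagation (assumed stand-in community detector):
    each node takes the most frequent neighbour label, keeping its own label
    on ties, otherwise taking the smallest tied label."""
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue  # isolated terms keep their own singleton facet
            counts = Counter(labels[nb] for nb in graph[node])
            best = max(counts.values())
            tied = {lab for lab, c in counts.items() if c == best}
            new = labels[node] if labels[node] in tied else min(tied)
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:
            break
    communities = defaultdict(set)
    for node, lab in labels.items():
        communities[lab].add(node)
    return list(communities.values())

def ancestors(category, parent):
    """The category followed by its ancestors up to the root of the tree."""
    chain = []
    while category is not None and category not in chain:
        chain.append(category)
        category = parent.get(category)
    return chain

def label_facet(facet, graph, term_categories, parent, min_share=0.8):
    """Most specific category whose terms carry >= min_share of the facet's
    total degree; falls back to the facet's highest-degree (hub) term."""
    degree = {t: len(graph[t]) for t in facet}
    total = sum(degree.values()) or 1
    score = Counter()
    for term in facet:
        covered = {a for cat in term_categories.get(term, [])
                   for a in ancestors(cat, parent)}
        for cat in covered:
            score[cat] += degree[term]
    candidates = [c for c, s in score.items() if s / total >= min_share]
    if not candidates:
        return min(facet, key=lambda t: (-degree[t], t))
    # Prefer deeper (more specific) categories, then higher score, then name.
    return min(candidates,
               key=lambda c: (-len(ancestors(c, parent)), -score[c], c))

def mine_facets(links, domain_terms, term_categories, parent):
    graph = build_hyperlink_graph(links, domain_terms)
    facets = label_propagation(graph)
    return sorted((label_facet(f, graph, term_categories, parent), sorted(f))
                  for f in facets)

# Hypothetical toy input: article -> outgoing links, term -> categories,
# and a child -> parent map standing in for the extracted category tree.
LINKS = {"Python": ["Java", "C++", "Monty Python"],  # off-domain link dropped
         "Java": ["C++", "TCP"], "TCP": ["IP", "UDP"], "UDP": ["IP"]}
TERMS = ["Python", "Java", "C++", "TCP", "UDP", "IP"]
TERM_CATEGORIES = {t: ["Programming languages"] for t in ("Python", "Java", "C++")}
TERM_CATEGORIES.update({"TCP": ["Transport layer protocols"],
                        "UDP": ["Transport layer protocols"],
                        "IP": ["Internet layer protocols"]})
PARENT = {"Programming languages": "Computing",
          "Transport layer protocols": "Network protocols",
          "Internet layer protocols": "Network protocols",
          "Network protocols": "Computing"}

if __name__ == "__main__":
    for label, terms in mine_facets(LINKS, TERMS, TERM_CATEGORIES, PARENT):
        print(f"{label}: {', '.join(terms)}")
```

On the toy input, the single Java-TCP link does not merge the two term clusters. The script prints `Network protocols: IP, TCP, UDP` and `Programming languages: C++, Java, Python`. "Network protocols" wins over "Transport layer protocols" because UDP and TCP alone carry only 5 of the facet's 7 degree units. Label propagation was chosen here only because it needs no external library. A modularity-based detector could be swapped in without changing the rest of the sketch.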

Keywords: Domain-specific facet mining, Community structure, Hyperlink structure, Scale-free property, Wikipedia

Article history: Received 2 May 2014, Revised 16 November 2014, Accepted 1 January 2015, Available online 13 January 2015.

Official paper URL: https://doi.org/10.1016/j.knosys.2015.01.001