A cascaded classification approach to disambiguating polysemous mentions with social chains

Authors:

Highlights:

Abstract

This paper considers five features for personal name disambiguation: titles, community chains, terms, temporal expressions, and hostnames. Using nine test data sets covering three ambiguous personal names, we address three issues: the degree of awareness of an entity, the source of the materials, and web pages from different areas. In a single-clusterer approach, employing all features achieves an average F-score of 0.635, which is better than the 0.502 obtained with contextual terms only. When community chains are expanded using the web, the average F-score increases to 0.676. We also propose a multiple-clusterer approach, which cascades five clusterers corresponding to the five features; it further improves the average F-score to 0.684. Expanding community chains raises the average F-score of the multiple-clusterer approach to 0.697. In summary, the proposed features are useful, the cascaded multiple-clusterer approach outperforms the single-clusterer approach, and expanding community chains using the web has a positive effect on personal name disambiguation. The experiments show that the proposed approach yields significant improvements.
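
The cascading idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the cascade order, the similarity measure, or any thresholds, so the feature order in `FEATURES`, the Jaccard overlap, the `threshold` value, and the function names `merge_stage` and `cascade` are all assumptions made for illustration. Each stage acts as one clusterer that merges the clusters passed on by the previous stage whenever their pooled values for a single feature overlap enough.

```python
from itertools import combinations

# Assumed cascade order: the abstract names the five features
# but does not say in which order the clusterers are applied.
FEATURES = ["titles", "community_chains", "terms", "temporal", "hostnames"]


def jaccard(a, b):
    """Jaccard overlap of two sets; empty sets give no evidence (0.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def pooled(cluster, docs, feature):
    """Union of one feature's values over all documents in a cluster."""
    return set().union(*(docs[d].get(feature, set()) for d in cluster))


def merge_stage(clusters, docs, feature, threshold):
    """One clusterer: repeatedly merge a pair of clusters whose pooled
    values for `feature` reach the (assumed) Jaccard threshold."""
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            vi = pooled(clusters[i], docs, feature)
            vj = pooled(clusters[j], docs, feature)
            if jaccard(vi, vj) >= threshold:
                clusters[i] = clusters[i] | clusters[j]
                del clusters[j]
                merged = True
                break  # indices changed, so restart the pair scan
    return clusters


def cascade(docs, threshold=0.5):
    """Run the five clusterers in sequence; each stage starts from
    the clusters produced by the previous one."""
    clusters = [{i} for i in range(len(docs))]
    for feature in FEATURES:
        clusters = merge_stage(clusters, docs, feature, threshold)
    return clusters
```

Here each document is a dictionary that maps a feature name to a set of extracted values, for example `{"titles": {"professor"}, "hostnames": {"uni.example"}}`. A single-clusterer variant would instead combine all five features into one similarity score and cluster once.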

Keywords: Cascaded clusterers, Name disambiguation, Community chain, Single-clusterers

Review history: Available online 28 January 2010.

Official paper URL: https://doi.org/10.1016/j.eswa.2010.01.016