Swash: A collective personal name matching framework

Authors:

Highlights:

• The collective information of names, e.g. token frequency, can improve matching.

• To find possible candidates (blocking), considering name similarity is enough.

• To match candidates, name dissimilarities should be considered alongside similarities.

• Corresponding parts of personal names (e.g. first names) should be matched with each other.

• Similar names can help parse names without any gold-standard tags.
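
As a rough illustration of the first highlight, the sketch below shows one way collective token frequency could inform matching: tokens that are rare across the whole name collection count more than common ones when two names are compared. This is not the Swash algorithm. The function names `token_weights` and `weighted_similarity`, the smoothed inverse-frequency weighting, and the sample names are all assumptions chosen for the example.

```python
import math
from collections import Counter


def token_weights(names):
    """Hypothetical helper: smoothed inverse-frequency weight per token.

    A token that appears in few names of the collection gets a higher weight.
    """
    doc_freq = Counter()
    for name in names:
        # Count each token once per name, regardless of repetition.
        doc_freq.update(set(name.lower().split()))
    n = len(names)
    return {tok: math.log((1 + n) / (1 + df)) + 1.0 for tok, df in doc_freq.items()}


def weighted_similarity(a, b, weights, default=1.0):
    """Hypothetical helper: weighted Jaccard overlap of two names' token sets.

    Tokens missing from `weights` fall back to `default` (an assumption).
    """
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0
    shared = sum(weights.get(t, default) for t in tokens_a & tokens_b)
    total = sum(weights.get(t, default) for t in union)
    return shared / total


# Made-up sample collection in which "smith" is the most common token.
names = [
    "John Smith", "Jon Smith", "John Smyth",
    "Mary Smith", "Mary Jones", "Zoltan Smith",
]
weights = token_weights(names)

same_first = weighted_similarity("John Smith", "John Smyth", weights)
same_last = weighted_similarity("John Smith", "Mary Smith", weights)
```

Plain Jaccard overlap would score both pairs identically, since each shares one token out of three. With collection-level weights, the pair sharing the less frequent token scores higher than the pair sharing only the very common "smith".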

Keywords: Personal name matching, Entity matching, Collective matching, Entity resolution, Heterogeneous information network, Unsupervised learning

Review history: Received 14 April 2019, Revised 15 October 2019, Accepted 30 November 2019, Available online 2 December 2019, Version of Record 14 January 2020.

Official paper URL: https://doi.org/10.1016/j.eswa.2019.113115