Efficient identity matching using static pruning q-gram indexing approach

Authors:

Highlights:

• Identity features ranked by weight for effective identity matching: Passport > Name > Postal Address > Date of birth.

• A q-gram length of 4 produced the best performance; increasing the q-gram length also increases efficiency.

• Pruning from the index every q-gram block that contains more than 10% of the total records produced the best performance.

• Overall, the proposed approach performs better than the adaptive detection identity matching technique.
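The two indexing highlights (q-gram length 4, pruning blocks above 10% of the records) can be illustrated with a small q-gram index. This is a minimal sketch, not the authors' implementation: it assumes each record is reduced to one string key, normalized by lowercasing and dropping whitespace, and split into q-grams without padding. The function names `qgrams` and `build_pruned_index` are hypothetical.

```python
from collections import defaultdict


def qgrams(value: str, q: int = 4) -> set[str]:
    """Return the set of overlapping substrings of length q (no padding).

    Assumption for illustration: lowercase and drop whitespace first.
    """
    s = "".join(value.lower().split())
    if len(s) <= q:
        # Short keys become a single block key (or nothing if empty).
        return {s} if s else set()
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def build_pruned_index(records: dict[int, str], q: int = 4,
                       max_fraction: float = 0.10) -> dict[str, set[int]]:
    """Map each q-gram to the ids of records containing it, then statically
    prune every block holding more than max_fraction of all records."""
    index: dict[str, set[int]] = defaultdict(set)
    for rid, value in records.items():
        for gram in qgrams(value, q):
            index[gram].add(rid)
    limit = max_fraction * len(records)
    # Oversized blocks are dropped once, at index-construction time.
    return {gram: ids for gram, ids in index.items() if len(ids) <= limit}


if __name__ == "__main__":
    people = {1: "John Smith", 2: "Jon Smith", 3: "Mary Jones", 4: "Maria Jones"}
    idx = build_pruned_index(people, q=4, max_fraction=0.5)
    print(sorted(idx["smit"]))
```

The design intent this sketch assumes: a very common q-gram (for example, a frequent surname fragment) forms a large block that would generate many candidate comparisons while discriminating little, so removing it up front trades a small amount of recall for a large cut in comparisons.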

Abstract

Information overload is a growing problem for information management and analytics in many organizations. Identity matching techniques are used to manage and resolve millions of identity records in diverse domains such as health care information, telecom subscribers, insurance holders, law offenders, and the census. In this paper, we propose an identity matching technique that is efficient for large datasets without compromising matching effectiveness. Our experimental results provide strong evidence that our proposed identity matching technique outperforms the adaptive detection identity matching technique in terms of efficiency and effectiveness, reducing the number of required comparisons by almost 98% and the completion time by 97%, with promising scalability results. Furthermore, our proposed technique achieves better matching results than the most trusted pairwise identity matching approach.
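Comparison savings in blocking-based matching are usually counted as the distinct record pairs that share at least one block, set against some baseline count. The sketch below shows that arithmetic. Note that the abstract's "almost 98%" is measured against the adaptive detection technique, not against exhaustive all-pairs matching, so this only illustrates how such a percentage is computed. The helper names and the block dictionary layout are hypothetical.

```python
from itertools import combinations


def count_candidate_pairs(blocks: dict[str, set[int]]) -> int:
    """Count distinct record pairs sharing at least one block.

    A pair that co-occurs in several blocks is compared (and counted) once.
    """
    pairs: set[tuple[int, int]] = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return len(pairs)


def exhaustive_pairs(n_records: int) -> int:
    """Comparisons required by naive all-pairs matching: n(n-1)/2."""
    return n_records * (n_records - 1) // 2


def comparison_reduction(new_count: int, baseline_count: int) -> float:
    """Fraction of the baseline's comparisons avoided (0.98 means a 98% cut)."""
    if baseline_count <= 0:
        raise ValueError("baseline_count must be positive")
    return 1 - new_count / baseline_count


if __name__ == "__main__":
    blocks = {"smit": {1, 2}, "mith": {1, 2}, "jone": {3, 4}}
    print(count_candidate_pairs(blocks))
    print(comparison_reduction(count_candidate_pairs(blocks), exhaustive_pairs(4)))
```

The baseline can be any reference count, either `exhaustive_pairs(n)` or the comparisons made by a competing technique. For example, 2 comparisons against a baseline of 100 gives a reduction of about 0.98.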

Keywords: Q-gram indexing, Static index pruning, Identity matching, Identity management

Review history: Received 30 October 2013, Revised 9 January 2015, Accepted 19 February 2015, Available online 4 March 2015.

Official paper URL: https://doi.org/10.1016/j.dss.2015.02.015