MapReduce-based entity matching with multiple blocking functions

Authors: Cheqing Jin, Jie Chen, Huiping Liu

Abstract

Entity matching, which aims to find records that refer to the same real-world object, has been studied for decades. To avoid verifying every pair of records in a massive data set, a common approach known as the blocking-based method selects a small proportion of record pairs for verification, at a cost far lower than O(n^2), where n is the size of the data set. Furthermore, executing multiple blocking functions independently is critical, because many more matching records can be found this way, which significantly improves the quality of the query result.
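
The idea of blocking with several independent blocking functions can be illustrated with a small single-machine sketch. This is not the paper's MapReduce algorithm; the record fields, the helper names (`block_by`, `candidate_pairs`), and the two blocking functions are hypothetical choices made only for illustration. Each blocking function groups records by a key, only records sharing a key are paired, and a set removes pairs that more than one function produces.

```python
from collections import defaultdict
from itertools import combinations


def block_by(records, key_fn):
    """Group record ids by the blocking key that key_fn assigns to each record."""
    blocks = defaultdict(list)
    for rid, rec in records.items():
        blocks[key_fn(rec)].append(rid)
    return blocks


def candidate_pairs(records, blocking_fns):
    """Union of candidate pairs over all blocking functions, without duplicates."""
    pairs = set()
    for fn in blocking_fns:
        for ids in block_by(records, fn).values():
            # Only records that fall into the same block are compared.
            for a, b in combinations(sorted(ids), 2):
                pairs.add((a, b))  # the set drops pairs emitted by several functions
    return pairs


# Toy data (assumed for illustration only).
records = {
    1: {"name": "John Smith", "zip": "10001"},
    2: {"name": "Jon Smith", "zip": "10001"},
    3: {"name": "John Smyth", "zip": "20002"},
    4: {"name": "Alice Wong", "zip": "30003"},
}


def by_zip(rec):
    return rec["zip"]


def by_name_prefix(rec):
    return rec["name"][:2].lower()


pairs = candidate_pairs(records, [by_zip, by_name_prefix])
print(sorted(pairs))
```

Here `by_zip` alone would miss record 3, while `by_name_prefix` recovers it; the pair (1, 2) is produced by both functions but kept once. Four records have six possible pairs, and only three are selected for verification.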

Keywords: entity matching, MapReduce, load balancing, pair deduplication

Paper URL: https://doi.org/10.1007/s11704-016-5346-4