Dynamic repair of categorical data with edit rules

Authors:

Highlights:

• Data quality rules are studied dynamically in repeated search and repair steps.

• When rules are association rules, simple filters can ensure high precision.

• In some cases, repairing with association rules is very efficient (strong generation).

• Repeated search-and-repair converges relatively quickly on real datasets.

• Multiple search-and-repair steps boost recall with only a limited drop in precision.
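
The repeated search-and-repair idea in the highlights can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names (`mine_rules`, `repair_once`, `search_and_repair`), the restriction to single-attribute rules, and the support and confidence thresholds are all assumptions made for illustration. Each round searches for association rules of the form "attribute A = a implies attribute C = c", keeps only those passing simple filters, repairs the rows that violate them, and stops when a round changes nothing.

```python
from collections import Counter


def mine_rules(rows, min_support=3, min_confidence=0.75):
    """Search step (hypothetical): find rules (A = a) -> (C = c).

    Assumes min_confidence > 0.5, so each (A, a, C) has at most one consequent.
    """
    lhs_counts = Counter()
    pair_counts = Counter()
    for row in rows:
        for a_attr, a_val in row.items():
            lhs_counts[(a_attr, a_val)] += 1
            for c_attr, c_val in row.items():
                if c_attr != a_attr:
                    pair_counts[(a_attr, a_val, c_attr, c_val)] += 1

    rules = {}
    for (a_attr, a_val, c_attr, c_val), n in pair_counts.items():
        confidence = n / lhs_counts[(a_attr, a_val)]
        # Simple filters: keep only frequent, high-confidence rules.
        if n >= min_support and confidence >= min_confidence:
            rules[(a_attr, a_val, c_attr)] = c_val
    return rules


def repair_once(rows, rules):
    """Repair step (hypothetical): overwrite consequents that violate a rule."""
    repaired, changed = [], 0
    for row in rows:
        new_row = dict(row)
        for (a_attr, a_val, c_attr), c_val in rules.items():
            if new_row.get(a_attr) == a_val and new_row.get(c_attr) != c_val:
                new_row[c_attr] = c_val
                changed += 1
        repaired.append(new_row)
    return repaired, changed


def search_and_repair(rows, max_rounds=10, **filters):
    """Alternate search and repair until a round makes no change."""
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        rules = mine_rules(rows, **filters)
        rows, changed = repair_once(rows, rules)
        if changed == 0:
            break
    return rows, rounds
```

Re-mining rules after every repair round is what makes the process dynamic in this sketch: repaired rows can raise the support or confidence of rules that did not pass the filters earlier. The `max_rounds` cap is a safeguard, since this simplified version gives no convergence guarantee when rules conflict.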

Keywords: Data quality, Data repair, Edit rules

Review history: Received 19 May 2021, Revised 3 September 2021, Accepted 29 March 2022, Available online 18 April 2022, Version of Record 26 April 2022.

Official paper URL: https://doi.org/10.1016/j.eswa.2022.117132