Methods for evaluating and creating data quality

Authors:

Highlights:

Abstract

This paper surveys two classes of methods that can be used to determine and improve the quality of individual files or groups of files. The first class consists of edit/imputation methods, which maintain business rules and impute missing data. The second consists of data cleaning methods for finding duplicates within files or across files.

Keywords: Integer programming, Set covering, Data cleaning, Approximate string comparison, Unsupervised and supervised learning

Article history: Received 30 May 2003, Revised 1 November 2003, Accepted 15 December 2003, Available online 6 February 2004.

Official paper URL: https://doi.org/10.1016/j.is.2003.12.003