Potential Problem Data Tagging: Augmenting information systems with the capability to deal with inaccuracies
Authors:
Highlights:
• A useful data tag for accuracy can be created, without having to physically measure it.
• Data tags can help to avoid problems caused by inaccuracies, and find the inaccuracies themselves.
• The data tags perform best for high error rates and for problems with more decision alternatives.
Abstract
Data quality tags are a means of informing decision makers about the quality of the data they use from information systems. Despite their potential to assist decision makers, data quality tags have not been successfully adopted. One reason for the non-adoption is that maintaining the tags is expensive and time-consuming: a tag that represents accuracy, for example, would be very time-consuming to measure because it requires some physical observation of reality to check the true value. We argue that a useful surrogate tag for accuracy can be created, without having to physically measure it, by counting the number of times the data has been exposed to an event that could cause it to become inaccurate. Experimental results show that the tags can help to avoid problems caused by inaccuracies and to find the inaccuracies themselves.
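The counting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the event names in `RISKY_EVENTS`, the `TaggedRecord` class, and the `inspection_order` helper are all hypothetical, and the abstract does not say which events count or how the tag is used downstream. The sketch only assumes that each record carries a counter that is incremented whenever the record passes through an event that could introduce an error.

```python
from dataclasses import dataclass

# Hypothetical event types; the source does not enumerate them.
RISKY_EVENTS = {"manual_edit", "system_migration", "bulk_import"}


@dataclass
class TaggedRecord:
    key: str
    value: str
    exposure_count: int = 0  # surrogate accuracy tag: number of risky exposures

    def expose(self, event: str) -> None:
        """Increment the tag only when the event could cause an inaccuracy."""
        if event in RISKY_EVENTS:
            self.exposure_count += 1


def inspection_order(records):
    """Sort records so that the most-exposed ones are checked first."""
    return sorted(records, key=lambda r: r.exposure_count, reverse=True)


records = [
    TaggedRecord("A", "10 units"),
    TaggedRecord("B", "7 units"),
    TaggedRecord("C", "3 units"),
]
records[1].expose("manual_edit")
records[1].expose("system_migration")
records[2].expose("bulk_import")
records[0].expose("read")  # not in RISKY_EVENTS, so the tag stays at 0

for record in inspection_order(records):
    print(record.key, record.exposure_count)
```

Under these assumptions, a decision maker could treat records with a high count with more caution, and an auditor could inspect them first when searching for inaccuracies, without anyone having to verify every value against reality.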
Keywords: Data quality, Information quality, Accuracy, Metadata, Data analytics, Data tags
Article history: Received 4 July 2018, Revised 27 April 2019, Accepted 28 April 2019, Available online 30 April 2019, Version of Record 9 May 2019.
Official paper URL: https://doi.org/10.1016/j.dss.2019.04.007