Reflections on quality requirements for digital trace data in IS research

Highlights:

• Explaining the relevance of data quality in research using digital trace data

• Offering concrete examples of data quality issues in research using digital trace data

• Offering guidelines for evaluating and reporting on data quality using digital trace data

Abstract

In recent years an increasing number of academic disciplines, including IS, have sourced digital trace data for their research. Notwithstanding the potential of such data in (re)investigations of various phenomena of interest that would otherwise be difficult or impossible to study using other sources of data, we view the quality of digital trace data as an underappreciated issue in IS research. To initiate a discussion of how to evaluate and report on the quality of digital trace data in IS research, we couch our arguments within the broader tradition of research on data quality. We explain how the uncontrolled nature of digital trace data creates unique challenges for IS researchers, who need to collect, store, retrieve, and transform those data for the purpose of numerical analysis. We then draw parallels with concepts and patterns commonly used in data analysis projects and argue that, although IS researchers probably apply such concepts and patterns, this is not reported in publications, undermining the reader's ability to assess the reliability, statistical power and replicability of the findings. Using the case of GitHub to illustrate such challenges, we develop a preliminary set of guidelines to help researchers consider and report on the quality of the digital trace data they use in their research. Our work contributes to the debate on data quality and provides relevant recommendations for scholars and IS journals at a time when a growing number of publications are relying on digital trace data.

Keywords: Digital trace data, Data quality, GitHub

Article history: Received 12 February 2019, Revised 6 May 2019, Accepted 16 August 2019, Available online 19 August 2019, Version of Record 30 August 2019.

Official paper URL: https://doi.org/10.1016/j.dss.2019.113133