Numerical, secondary Big Data quality issues, quality threshold establishment, & guidelines for journal policy development

Authors:

Highlights:

• Availability, accessibility, cost, time, and/or complexity considerations could make primary Big Data acquisition infeasible.

• Several secondary sources of Big Data provide open access to data from credible primary sources.

• Using data from secondary sources often requires data wrangling to render the data suitable for the application at hand.

• Academic IS journals should encourage research based on carefully vetted data acquired from secondary Big Data sources.

• We provide a checklist that helps judge the quality of data sourced from secondary Big Data sites.

Abstract

An IS researcher may obtain Big Data from primary or secondary data sources. Sometimes, acquiring primary Big Data is infeasible due to availability, accessibility, cost, time, and/or complexity considerations. In this paper, we focus on Big Data-based IS research and discuss ways in which one may, post hoc, establish quality thresholds for numerical Big Data obtained from secondary sources. We also present guidelines for developing journal policies aimed at ensuring the veracity and verifiability of such data when used for research purposes.

Keywords: Data quality, Big data, Secondary data, Numerical data, Quality threshold

Article history: Received 23 February 2019, Revised 15 May 2019, Accepted 16 August 2019, Available online 10 September 2019, Version of Record 13 November 2019.

Official paper URL: https://doi.org/10.1016/j.dss.2019.113135