Representing the Twittersphere: Archiving a representative sample of Twitter data under resource constraints

Authors:

Highlights:

• We propose a new method for creating a representative archive of Twitter data.

• Our sample shows high similarity to the full Twitter data in volumes and topics.

• This archiving method enables a wide range of post-hoc analyses.

• This method makes Twitter data accessible to researchers with a limited budget.

Keywords: Twitter, Social media, Sampling, Representativeness, Data collection, API (application programming interface), LDA (latent Dirichlet allocation), LDP (Liberal Democratic Party), DP (Democratic Party)

Article history: Received 6 June 2018, Revised 25 January 2019, Accepted 25 January 2019, Available online 25 March 2019, Version of Record 25 March 2019.

Official paper URL: https://doi.org/10.1016/j.ijinfomgt.2019.01.019