Twitter user geolocation using web country noun searches
Authors:
Highlights:
• The goal is to estimate the implicit country of interest of Twitter users.
• We created a dataset with 3298 users from 54 different countries, writing in 48 languages.
• The proposed GTN model uses historical tweets and Google Trends (GT) frequent nouns.
• GTN is competitive when compared with a recent named-entity recognition method.
• To reduce GT querying time, we propose a machine learning GTN2 variant.
Abstract
Several Web and social media analytics tasks require user geolocation data. Although Twitter is a powerful source for social media analytics, geolocating its users is a nontrivial task. This paper presents a purely word distribution method for country-level geolocation of Twitter users. In particular, we focus on the frequencies of tweet nouns and their statistical matches with Google Trends world country distributions (the GTN method). Several experiments were conducted using a recently created dataset of 744,830 tweets produced by 3298 users from 54 countries and written in 48 languages. Overall, the proposed GTN approach is competitive when compared with a state-of-the-art world distribution geolocation method. To reduce the number of Google Trends queries, we also tested a machine learning variant (GTN2) that matches the GTN responses with 80% accuracy while being much faster than GTN.
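The idea of matching tweet noun frequencies against per-country search interest can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `estimate_country`, the scoring rule (frequency-weighted sum of normalized interest), and the toy interest values are all assumptions made for illustration. In practice the per-noun country distributions would come from Google Trends, and the nouns from a part-of-speech tagger run over the user's historical tweets.

```python
from collections import Counter


def estimate_country(user_nouns, noun_country_interest):
    """Pick the country whose search interest best matches a user's nouns.

    user_nouns: list of nouns extracted from the user's tweets.
    noun_country_interest: hypothetical mapping noun -> {country: interest},
        standing in for Google Trends interest-by-region values.
    Returns a country code, or None if no noun has interest data.
    """
    noun_counts = Counter(user_nouns)
    scores = {}
    for noun, count in noun_counts.items():
        interest = noun_country_interest.get(noun)
        if not interest:
            continue  # noun has no interest data; skip it
        total = sum(interest.values())
        if total == 0:
            continue
        # Normalize interest to a distribution over countries, then
        # weight it by how often the user wrote this noun.
        for country, value in interest.items():
            scores[country] = scores.get(country, 0.0) + count * value / total
    if not scores:
        return None
    return max(scores, key=scores.get)


# Toy data (invented values, for illustration only).
toy_interest = {
    "fado": {"PT": 100, "BR": 20},
    "football": {"GB": 100, "PT": 60, "BR": 80},
    "pizza": {"IT": 100, "US": 70},
}
toy_nouns = ["fado", "football", "fado", "pizza"]
predicted = estimate_country(toy_nouns, toy_interest)
```

With this toy data, `predicted` is `"PT"`, because "fado" appears twice and its interest is concentrated in Portugal. A weighted sum is only one plausible way to combine the per-noun distributions; the abstract does not specify the statistical matching that GTN uses.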
Keywords: Country geolocation, Google Trends, Machine learning, Natural language processing, Twitter
Article history: Received 26 October 2018, Revised 18 March 2019, Accepted 21 March 2019, Available online 29 March 2019, Version of Record 4 April 2019.
Official paper URL: https://doi.org/10.1016/j.dss.2019.03.006