Kernel-based features for predicting population health indices from geocoded social media data

Authors:

Highlights:

• Kernel-based textual features for social media data analysis

• Population health prediction

• Spatial decision support systems

• Advanced cluster computing (Apache Spark)

• Big geo-tagged data from Twitter

Abstract

When tweets are used to predict a population health index, the scale of the data has made it common practice to aggregate tweets by population before learning features that characterize the population. Aggregation reduces the computational cost of extracting features from each individual tweet. However, it can discard much information about the population, because the distribution of textual features within a population can be important for identifying that population's health index. In addition, relationships between features may themselves carry predictive information about the health index. In this paper, we propose mid-level features, namely kernel-based features, for predicting the health indices of populations from social media data. The kernel-based features are extracted from the distributions of textual features over a population's tweets and encode the relationships between individual textual features in a kernel function. We implemented our features using three different kernel functions and applied them to two case studies of population health prediction: across-year prediction and across-county prediction. The kernel-based features were evaluated and compared with existing features on a dataset collected from the Behavioral Risk Factor Surveillance System. Experimental results show that the kernel-based features achieved significantly higher prediction performance than existing techniques, by up to 16.3%, suggesting the potential and applicability of the proposed features to a wide spectrum of population-level data analytics applications.
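To make the idea of a feature computed over a distribution of tweet-level features more concrete, the following is a minimal sketch. It is not the authors' formulation: the abstract does not name the three kernel functions used, so the RBF kernel, the averaging over tweet pairs, the `gamma` parameter, and the function names `rbf_kernel`, `population_kernel`, and `kernel_features` are all assumptions chosen for illustration. Each population is represented as a matrix whose rows are per-tweet textual feature vectors, and a population is described by its average kernel similarity to a set of reference populations.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=1.0):
    """Pairwise RBF kernel between rows of X and rows of Y (assumed kernel)."""
    # Squared Euclidean distances via the expansion |x|^2 + |y|^2 - 2 x.y
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    # Clip tiny negative values caused by floating-point round-off
    return np.exp(-gamma * np.maximum(sq, 0.0))


def population_kernel(P, Q, gamma=1.0):
    """Average kernel value over all tweet pairs drawn from populations P and Q."""
    return rbf_kernel(P, Q, gamma).mean()


def kernel_features(populations, references, gamma=1.0):
    """One row per population; one column per (hypothetical) reference population."""
    return np.array(
        [[population_kernel(P, R, gamma) for R in references] for P in populations]
    )


# Synthetic stand-in data: 4 populations with differing tweet counts, 5 textual features
rng = np.random.default_rng(0)
populations = [rng.normal(size=(n, 5)) for n in (30, 45, 20, 60)]
references = populations[:2]  # hypothetical choice of reference populations
features = kernel_features(populations, references, gamma=0.1)
```

The resulting `features` matrix has one row per population and could be passed to any standard regressor. Because the kernel is averaged over all tweet pairs rather than computed on a single aggregated vector, the feature depends on how tweet-level features are distributed within each population, which is the property the abstract emphasizes.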

Keywords: Spatial decision support system, Georeferenced social media, Spatial big data, Health rankings, Twitter, Kernel function

Review history: Received 21 September 2016, Revised 27 May 2017, Accepted 30 June 2017, Available online 4 July 2017, Version of Record 18 September 2017.

Official paper URL: https://doi.org/10.1016/j.dss.2017.06.010