Emphasizing personal information for Author Profiling: New approaches for term selection and weighting
Abstract
The Author Profiling (AP) task aims to predict specific profile characteristics of authors by analyzing their written documents. Its relevance has grown thanks to several applications in computer forensics, security, and marketing. Most previous contributions in AP have been devoted to determining a suitable set of features to model the writing profile of authors. However, in social media this task is challenging due to the informal nature of communication. In this regard, we present a novel approach, which considers that terms located in phrases exposing personal information have a special value for discriminating the author’s profile. The aim of this research work is to emphasize the value of such personal phrases by means of two new proposals: a feature selection method and a term weighting scheme, both based on a novel measure called Personal Expression Intensity (PEI), which scores the quantity of personal information revealed by a term. To evaluate these ideas, we show experimental results in age and gender prediction of social media users on six different collections. Average improvements of 7.34% and 5.76% for age and gender classification, respectively, were obtained compared with the best state-of-the-art result, indicating that personal phrases play a key role in the AP task when used for selecting and weighting terms.
Keywords: Author profiling, Feature selection, Term weighting, Personal information, PEI
Article history: Received 6 June 2017, Revised 19 December 2017, Accepted 17 January 2018, Available online 31 January 2018, Version of Record 20 February 2018.
Paper URL: https://doi.org/10.1016/j.knosys.2018.01.014