Richer Document Embeddings for Author Profiling tasks based on a heuristic search

Authors:

Highlights:

• Users of Social Media can be profiled through their posts.

• Word Embeddings capture semantic meaning in an n-dimensional vector space.

• A Document can be represented as a weighted average of its Word Embeddings.

• A newly proposed statistic, Relevance Topic Value, is useful as a term-weighting scheme.

• Genetic Programming is useful for evolving competitive term-weighting schemes.
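
The weighted-average representation named in the highlights can be illustrated with a minimal sketch. It assumes word vectors and per-term weights are already available as plain dictionaries. The function name `document_embedding`, the toy vectors, and the toy weights are hypothetical and chosen only for illustration; the weights here are not the paper's Relevance Topic Value, and they are not a scheme evolved by Genetic Programming.

```python
from typing import Dict, List, Sequence


def document_embedding(
    tokens: Sequence[str],
    vectors: Dict[str, List[float]],
    weights: Dict[str, float],
    default_weight: float = 1.0,
) -> List[float]:
    """Return the weighted average of the word vectors of known tokens.

    Tokens without a vector are skipped. Tokens without an explicit
    weight fall back to `default_weight` (an assumption of this sketch).
    """
    if not vectors:
        return []
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        vec = vectors.get(tok)
        if vec is None:
            continue  # out-of-vocabulary token
        w = weights.get(tok, default_weight)
        for i, x in enumerate(vec):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return total  # no usable token: zero vector
    return [x / weight_sum for x in total]


# Toy data, invented for illustration only.
toy_vectors = {"good": [1.0, 0.0], "game": [0.0, 1.0]}
toy_weights = {"good": 3.0, "game": 1.0}

doc_vec = document_embedding(["good", "game", "unknown"], toy_vectors, toy_weights)
print(doc_vec)  # [0.75, 0.25]
```

Any term-weighting scheme can be plugged in through the `weights` dictionary; with all weights equal, the result reduces to the plain average of the word vectors.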

Keywords: Author profiling, Document embeddings, Word embeddings, Genetic programming, Weighting scheme

Review history: Received 27 June 2019, Revised 29 January 2020, Accepted 13 February 2020, Available online 29 February 2020, Version of Record 6 May 2020.

Official paper URL: https://doi.org/10.1016/j.ipm.2020.102227