Richer Document Embeddings for Author Profiling tasks based on a heuristic search
Authors:
Highlights:
• Users of social media can be profiled through their posts.
• Word embeddings encode semantic meaning in an n-dimensional vector space.
• A document can be represented as a weighted average of its word embeddings.
• A newly proposed statistic, Relevance Topic Value, is useful as a term-weighting scheme.
• Genetic programming can evolve competitive term-weighting schemes.
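
The third highlight, a document represented as a weighted average of its word embeddings, can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `document_embedding`, the toy vectors, and the weights are all hypothetical. The weights stand in for whatever term-weighting scheme is used; the definition of Relevance Topic Value is not given on this page, so it is not reproduced here.

```python
from typing import Dict, List, Sequence


def document_embedding(
    tokens: Sequence[str],
    embeddings: Dict[str, List[float]],
    weights: Dict[str, float],
    default_weight: float = 1.0,
) -> List[float]:
    """Return the weighted average of the word vectors of `tokens`.

    Tokens without an embedding are skipped. Tokens without an explicit
    weight fall back to `default_weight` (an assumption of this sketch).
    """
    if not embeddings:
        return []
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        vec = embeddings.get(tok)
        if vec is None:
            continue  # out-of-vocabulary token
        w = weights.get(tok, default_weight)
        for i, x in enumerate(vec):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return total  # no known tokens: zero vector
    return [x / weight_sum for x in total]


# Toy two-dimensional vectors and placeholder weights.
toy_embeddings = {"good": [1.0, 0.0], "movie": [0.0, 1.0]}
toy_weights = {"good": 3.0, "movie": 1.0}
doc_vec = document_embedding(["good", "movie", "unknown"], toy_embeddings, toy_weights)
```

With uniform weights this reduces to the plain average of the word vectors. A term-weighting scheme shifts the document vector toward the terms it scores as more relevant.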
Keywords: Author profiling, Document embeddings, Word embeddings, Genetic programming, Weighting scheme
Review history: Received 27 June 2019, Revised 29 January 2020, Accepted 13 February 2020, Available online 29 February 2020, Version of Record 6 May 2020.
Official paper URL: https://doi.org/10.1016/j.ipm.2020.102227