An unsupervised annotation of Arabic texts using multi-label topic modeling and genetic algorithm
Authors:
Highlights:
• The world produces a huge amount of unstructured text, which is useless without labeling.
• Humans can never annotate such massive textual data.
• In multi-label classification, we assign multiple labels to each instance.
• We propose multi-label topic modeling and genetic algorithms to annotate texts.
• Our automatic annotation reaches 79.3% agreement with crowdsourced human annotators.
Keywords: Arabic corpus, Topic modeling, Multi-label annotation, Genetic algorithm, Latent Dirichlet allocation, Crowdsourcing
Review history: Received 12 September 2021, Revised 6 January 2022, Accepted 25 April 2022, Available online 6 May 2022, Version of Record 13 May 2022.
Paper URL: https://doi.org/10.1016/j.eswa.2022.117384