Is my stance the same as your stance? A cross validation study of stance detection datasets
Authors:
Highlights:
• Cross-dataset stance detection models do not generalize well.
• Model generalizability can be improved by aggregating datasets.
• It is hard to ascertain the amount of extra data needed to fine-tune models trained on aggregated datasets.
• Possible reasons for poor model performance and generalizability are that texts are not easily differentiable by stance and that annotations are inconsistent within and across datasets.
• Model performance differences are due to indifferentiable texts and inconsistent stances.
Keywords: Stance detection, Natural language processing, Cross validation, Machine learning, Twitter
Review history: Received 10 July 2022, Revised 15 August 2022, Accepted 22 August 2022, Available online 5 September 2022, Version of Record 5 September 2022.
Official paper URL: https://doi.org/10.1016/j.ipm.2022.103070