Named entity recognition in Turkish: A comparative study with detailed error analysis
Authors:
Highlights:
• We implement 20 models and compare their performance on five datasets for Turkish named entity recognition.
• Our study includes a detailed error analysis with quantitative and qualitative factors.
• Transformer models achieve the highest weighted F1 scores, ranging from 80.8% on tweets to 96.1% on news articles.
• All models perform poorly on longer entities, though Transformer models are more robust.
• When we shuffle 80% of words to imitate flexible word order, performance deteriorates by 12% on well-written text and 7% on noisy text.
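The word-order perturbation in the last highlight can be illustrated with a minimal sketch. The highlights do not describe the exact shuffling procedure, so everything here is an assumption: the function name `shuffle_fraction`, the choice to permute a randomly selected 80% of token positions among themselves, and the use of per-token entity-type labels (without BIO prefixes) that travel with their tokens.

```python
import random


def shuffle_fraction(tokens, labels, fraction=0.8, seed=0):
    """Permute a fraction of token positions among themselves.

    Hypothetical helper, not the paper's procedure: each selected
    token moves together with its label, and the remaining tokens
    stay in place.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    rng = random.Random(seed)
    n = len(tokens)
    k = round(n * fraction)  # number of positions to permute
    positions = sorted(rng.sample(range(n), k))
    targets = positions[:]
    rng.shuffle(targets)  # random reassignment of the selected slots
    new_tokens, new_labels = list(tokens), list(labels)
    for src, dst in zip(positions, targets):
        new_tokens[dst] = tokens[src]
        new_labels[dst] = labels[src]
    return new_tokens, new_labels


# Illustrative sentence (made up for this sketch, not taken from the datasets).
tokens = ["Ali", "dün", "Ankara'ya", "gitti", "."]
labels = ["PERSON", "O", "LOCATION", "O", "O"]
shuffled_tokens, shuffled_labels = shuffle_fraction(tokens, labels)
```

Because tokens are moved one at a time, a multi-word entity could be split apart by this sketch; an evaluation that uses BIO tagging would need to handle that case explicitly, for example by moving whole entity spans as units.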
Keywords: Comparative analysis, Error analysis, Named entity recognition, Deep learning model, Turkish text, Transformer-based language model
Article history: Received 22 April 2022, Revised 12 August 2022, Accepted 14 August 2022, Available online 5 September 2022, Version of Record 5 September 2022.
Official paper URL: https://doi.org/10.1016/j.ipm.2022.103065