Learning Morpheme Representation for Mongolian Named Entity Recognition

Authors: Weihua Wang, Feilong Bao, Guanglai Gao

Abstract

Traditional approaches to Mongolian named entity recognition rely heavily on feature engineering. Worse, the complex morphological structure of Mongolian words makes the data sparser. To reduce the need for feature engineering and alleviate data sparsity in Mongolian named entity recognition, we propose a framework of recurrent neural networks with morpheme representation. We then study this framework in depth with different model variants. More specifically, the morpheme representation exploits the characteristics of the classical Mongolian script and can be learned from an unlabeled corpus. The model is further augmented with different character representations and auxiliary language model losses, which extract contextual knowledge from scratch. By decoding labels jointly with a Conditional Random Field layer, the model can learn the dependencies between labels. Experimental results show that feeding the morpheme representation into the neural networks outperforms feeding in the word representation. The additional character representation and the morpheme language model loss also improve performance.

Keywords: Named entity recognition, Mongolian morpheme representation, Language model auxiliary loss

Official paper URL: https://doi.org/10.1007/s11063-019-10044-6