Exploiting meta features for dependency parsing and part-of-speech tagging

Authors:

Abstract

In recent years, discriminative methods have made considerable progress in natural language processing tasks such as parsing, part-of-speech tagging, and word segmentation. For these methods, conventional features in a relatively high-dimensional feature space may suffer from sparseness and thus have less discriminative power on unseen data. This article presents a feature-transformation learning framework that addresses the sparseness problem by transforming sparse conventional base features into less sparse high-level features (i.e., meta features) with the help of a large amount of automatically annotated data. The meta features are derived by bucketing similar base features according to their frequencies in the large automatically annotated data, and they are used together with the base features in our final system. We apply the framework to part-of-speech tagging and dependency parsing. Experimental results show that our systems outperform the baseline systems on both tasks under standard evaluation. For dependency parsing, our parsers achieve state-of-the-art accuracy on the Chinese data and accuracy comparable to the best known systems on the English data. Further analysis indicates that the proposed approach is effective in handling unseen data and features.
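The bucketing step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `build_buckets` and `meta_features`, the `template=value` feature encoding, the three frequency buckets (the top 10% of distinct features map to `H`, those up to the top 30% map to `M`, the rest map to `L`), and the `U` label for features never seen in the automatically annotated data are all hypothetical choices. Each base feature is replaced by its template name plus its frequency bucket, so many rare, distinct base features collapse into a few dense meta features.

```python
from collections import Counter
from typing import Dict, Iterable, List


def build_buckets(auto_features: Iterable[str],
                  high_frac: float = 0.1,
                  mid_frac: float = 0.3) -> Dict[str, str]:
    """Assign each base feature seen in auto-annotated data to a frequency bucket.

    Assumption (hypothetical thresholds): distinct features are ranked by count;
    the top `high_frac` get "H", those up to `mid_frac` get "M", the rest get "L".
    """
    counts = Counter(auto_features)
    # Rank by descending frequency; break ties alphabetically for determinism.
    ranked = sorted(counts, key=lambda f: (-counts[f], f))
    n = len(ranked)
    high_cut = round(high_frac * n)  # integer cutoffs avoid float edge cases
    mid_cut = round(mid_frac * n)
    buckets: Dict[str, str] = {}
    for rank, feat in enumerate(ranked):
        if rank < high_cut:
            buckets[feat] = "H"
        elif rank < mid_cut:
            buckets[feat] = "M"
        else:
            buckets[feat] = "L"
    return buckets


def meta_features(base_feats: List[str], buckets: Dict[str, str]) -> List[str]:
    """Map each base feature to "<template>:<bucket>"; unseen features get "U"."""
    out = []
    for feat in base_feats:
        template = feat.split("=", 1)[0]  # hypothetical "template=value" encoding
        out.append(f"{template}:{buckets.get(feat, 'U')}")
    return out


# Toy "auto-annotated" data: feature w=i occurs (10 - i) times, for i in 0..9.
auto = [f"w={i}" for i in range(10) for _ in range(10 - i)]
buckets = build_buckets(auto)
print(meta_features(["w=0", "w=5", "p=NN"], buckets))  # ['w:H', 'w:L', 'p:U']
```

Because `w=3` through `w=9` all map to the single meta feature `w:L`, a model can share weight across rare features it would otherwise treat as unrelated. Consistent with the abstract, such meta features would be used alongside the base features, not instead of them.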

Keywords: Dependency parsing, Natural language processing, Meta-features, Part-of-speech tagging, Semi-supervised approach

Article history: Received 31 October 2013, Revised 8 August 2015, Accepted 7 September 2015, Available online 27 October 2015, Version of Record 27 October 2015.

Official paper URL: https://doi.org/10.1016/j.artint.2015.09.002