A comparative study on feature reduction approaches in Hindi and Bengali named entity recognition

Authors:

Highlights:

Abstract

Features used for named entity recognition (NER) are often high dimensional. Such feature sets cause overfitting when the training data is insufficient, and dimensionality reduction improves performance in these situations. There are a number of approaches to dimensionality reduction, based on either feature selection or feature extraction. In this paper we present a comprehensive comparative study of different dimensionality reduction approaches applied to the NER task. To compare the performance of the various approaches we consider two Indian languages, Hindi and Bengali. NER accuracies achieved in these languages are still comparatively poor, primarily due to the scarcity of annotated corpora. For both languages, dimensionality reduction is found to improve the performance of the classifiers. A comparative study of the effectiveness of several dimensionality reduction techniques is presented in detail.

Keywords: Named entity recognition, Dimension reduction, Feature selection, Feature clustering, Machine learning

Article history: Received 7 April 2011, Revised 17 August 2011, Accepted 21 September 2011, Available online 14 October 2011.

Official URL: https://doi.org/10.1016/j.knosys.2011.09.015