Automatic dominant character identification in fables based on verb analysis – Empirical study on the impact of anaphora resolution

Abstract

Named entity recognition (NER) is a subtask of information extraction that aims to locate atomic elements in text and classify them into predefined types. Various NER techniques and tools have been developed to suit the needs of different applications. However, most NER work has focused on the non-fiction domain. The fiction domain presents a more complex context for locating NEs, because its characters can span a diverse spectrum, ranging from living things (animals, plants, and persons) to non-living things (vehicles, furniture). Motivated by the hypothesis that there always exist verbs that specifically describe human conduct, in this paper we propose an NER system that autonomously identifies NEs performing human activities, based on verb analysis (VAHA). More specifically, our approach attempts to identify the dominant character (DC) by studying the nature of the verbs associated with human activity, using TreeTagger, Stanford packages, and WordNet. Our experimental results validate our initial hypothesis that NEs can be accurately identified by referring to their associated verbs that describe human activity. Our empirical study also shows that the approach is applicable to short articles. Another significant contribution of our approach is that it requires neither a training data set nor anaphora resolution.
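The verb-based idea above can be illustrated with a minimal Python sketch. It is not the VAHA implementation. It assumes that (subject, verb lemma) pairs have already been extracted by a tagger or parser such as TreeTagger or the Stanford tools, and it replaces the WordNet lookup with a small hypothetical verb set, `HUMAN_ACTIVITY_VERBS`. The hypothetical function `dominant_character` takes the dominant character to be the subject most often linked to such verbs. Pronoun subjects are simply skipped rather than resolved; this is a simplifying assumption of the sketch, not necessarily what the paper does.

```python
from collections import Counter

# Hypothetical stand-in for a WordNet lookup: verb lemmas assumed to describe
# human conduct. A real system would derive this from WordNet verb senses.
HUMAN_ACTIVITY_VERBS = {"say", "think", "decide", "ask", "laugh", "promise", "reply"}

# Assumption of this sketch: pronoun subjects are ignored, not resolved.
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}


def dominant_character(subject_verb_pairs, human_verbs=HUMAN_ACTIVITY_VERBS):
    """Return the subject that most often performs a human-activity verb.

    subject_verb_pairs: iterable of (subject, verb_lemma) tuples, assumed to be
    pre-extracted by a tagger/parser. Returns None if no non-pronoun subject
    is linked to a human-activity verb.
    """
    counts = Counter(
        subj.lower()
        for subj, verb in subject_verb_pairs
        if verb.lower() in human_verbs and subj.lower() not in PRONOUNS
    )
    if not counts:
        return None
    # Ties are broken by first appearance (Counter.most_common is ordered
    # by count, then by insertion order).
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy pairs loosely modelled on "The Fox and the Grapes".
    pairs = [("fox", "see"), ("fox", "think"), ("grapes", "hang"),
             ("fox", "say"), ("crow", "fly")]
    print(dominant_character(pairs))  # fox
```

Here "fox" wins because it is the subject of two verbs in the assumed human-activity set ("think", "say"), while "grapes" and "crow" only occur with verbs outside it. In a fuller pipeline the verb set would be replaced by a WordNet-based check, and the tie-breaking rule would need a principled choice rather than first appearance.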

Keywords: Named entity recognition, Fiction documents, Verb analysis, Anaphora resolution, Character identification

Article history: Received 17 December 2012, Revised 2 September 2013, Accepted 5 September 2013, Available online 13 September 2013.

Official URL: https://doi.org/10.1016/j.knosys.2013.09.009