E2EET: from pipeline to end-to-end entity typing via transformer-based embeddings

Authors: Michael Stewart, Wei Liu

Abstract

Entity typing (ET) is the process of identifying the semantic types of every entity within a corpus. ET involves labelling each entity mention with one or more class labels. As a multi-class, multi-label task, it is considerably more challenging than named entity recognition. This means existing entity typing models require pre-identified mentions and cannot operate directly on plain text. Pipeline-based approaches are therefore used to join a mention extraction model and an entity typing model to process raw text. Another key limiting factor is that these mention-level ET models are trained on fixed context windows, which makes the entity typing results sensitive to window size selection. In light of these drawbacks, we propose an end-to-end entity typing model (E2EET) using a Bi-GRU to remove the dependency on window size. To demonstrate the effectiveness of our E2EET model, we created a stronger baseline mention-level model by incorporating the latest contextualised transformer-based embeddings (BERT). Extensive ablative studies demonstrate the competitiveness and simplicity of our end-to-end model for entity typing.

Keywords: Entity typing, Sequence labelling, Bidirectional LSTM, Bidirectional GRU, Neural language models, Natural language processing

Paper URL: https://doi.org/10.1007/s10115-021-01626-9