Identifying entities from scientific publications: A comparison of vocabulary- and model-based methods

Authors:

Highlights:

• Five vocabulary- and model-based methods to extract terms from scientific publications are evaluated.

• Three conditional random fields (CRF)-based methods outperform the two vocabulary-based ones.

• The CRF method with a keyword-based dictionary performs best.

• The keyword-based method has higher recall, and the Wikipedia-based method has higher precision.

Keywords: Entity extraction, Vocabulary, Dictionary, Conditional random fields, Content-aware

Article history: Received 5 November 2014, Revised 22 April 2015, Accepted 22 April 2015, Available online 16 May 2015, Version of Record 16 May 2015.

Official paper URL: https://doi.org/10.1016/j.joi.2015.04.003