A hybrid model for finding abbreviation–definition pairs from biomedical abstracts using heuristics-based sequence labeling and perceptron linear classifier

Authors:

Highlights:

• A hybrid model is introduced for extracting acronym-definition pairs from biomedical text.

• Heuristics-based sequence labeling is introduced for the pattern recognition task.

• Three-level mapping strategies are proposed for the sequence labeling task.

• Valid abbreviation-definition pairs are recognized with a perceptron linear classifier.

• Recently published PubMed abstracts are drawn from Thalia, a semantic search engine.
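The highlights do not spell out the heuristics, the three-level mapping strategies, or the features given to the perceptron, so the following is only an illustrative sketch of the general idea (heuristic candidate generation followed by a perceptron that accepts or rejects each candidate pair). Everything specific is an assumption and not the authors' method: the parenthesized short-form pattern, the look-back window of min(|SF| + 5, 2·|SF|) words (a common heuristic in abbreviation extraction), the three toy features, and the hypothetical function names `candidates`, `features`, `train_perceptron` and `extract`.

```python
import re
from typing import List, Tuple

# Assumed short-form pattern: 2-10 characters in parentheses, starting with a letter.
SF_PATTERN = re.compile(r"\(([A-Za-z][A-Za-z0-9\-]{1,9})\)")


def candidates(sentence: str) -> List[Tuple[str, List[str]]]:
    """For every '(SF)' in the sentence, return (SF, [candidate long forms])."""
    out = []
    for m in SF_PATTERN.finditer(sentence):
        short = m.group(1)
        words = sentence[: m.start()].split()
        # Assumed look-back window: at most min(|SF| + 5, 2 * |SF|) words.
        window = min(len(short) + 5, 2 * len(short))
        # Candidates are the last k words before '(' for k = 1..window.
        spans = [" ".join(words[-k:]) for k in range(1, min(window, len(words)) + 1)]
        out.append((short, spans))
    return out


def features(short: str, long_form: str) -> List[float]:
    """Hypothetical features: [first-letter match, initials match, length mismatch, bias]."""
    letters = "".join(c for c in short if c.isalnum()).lower()
    words = long_form.split()
    initials = "".join(w[0] for w in words).lower()
    return [
        1.0 if initials[0] == letters[0] else 0.0,
        1.0 if initials == letters else 0.0,
        abs(len(words) - len(letters)) / len(letters),
        1.0,
    ]


def train_perceptron(data: List[Tuple[List[float], int]], epochs: int = 10) -> List[float]:
    """Classic perceptron: on each mistake, w <- w + y * x (labels y in {+1, -1})."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            score = sum(wi * xi for wi, xi in zip(w, x))
            if (1 if score > 0 else -1) != y:
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w


def extract(sentence: str, w: List[float]) -> List[Tuple[str, str]]:
    """Keep the best-scoring candidate per short form if the perceptron scores it > 0."""
    pairs = []
    for short, spans in candidates(sentence):
        scored = [(sum(a * b for a, b in zip(w, features(short, s))), s) for s in spans]
        if scored:
            best_score, best = max(scored)
            if best_score > 0:
                pairs.append((short, best))
    return pairs


# Toy training data from one hand-labelled sentence (illustrative only).
train_sentence = "We measured tumor necrosis factor (TNF) levels."
sf, spans = candidates(train_sentence)[0]
data = [(features(sf, s), 1 if s == "tumor necrosis factor" else -1) for s in spans]
weights = train_perceptron(data)

print(extract("Patients with chronic kidney disease (CKD) were enrolled.", weights))
```

On this toy data the perceptron converges after two passes, and the example call should print `[('CKD', 'chronic kidney disease')]`. A parenthetical that no candidate explains, such as `(see)`, gets a best score of zero and is rejected. A realistic system would use many more labelled pairs and richer features than these three.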

Keywords: Biomedical abbreviation-definition extraction, Text mining, Heuristics approach, Pattern recognition, Sequence labeling, Perceptron learning

Article history: Received 27 March 2020, Revised 12 September 2020, Accepted 23 September 2020, Available online 28 September 2020, Version of Record 9 October 2020.

Publisher link: https://doi.org/10.1016/j.eswa.2020.114049