Sentence identification of biological interactions using PATRICIA tree generated patterns and genetic algorithm optimized parameters
Abstract
An important task in information retrieval is to identify sentences that contain important relationships between key concepts. In this work, we propose a novel approach to automatically extract sentence patterns that contain interactions involving concepts of molecular biology. A pattern is defined in this work as a sequence of specialized Part-of-Speech (POS) tags that capture the structure of key sentences in the scientific literature. Each candidate sentence for the classification task is encoded as a POS array and then aligned to a collection of pre-extracted patterns. The quality of the alignment is expressed as a pairwise alignment score. The most innovative component of this work is the use of a genetic algorithm (GA) to maximize the classification performance of the alignment scoring scheme. The system achieves an average F-score of 0.796 in identifying sentences which describe interactions between co-occurring biological concepts. This performance is mostly affected by the quality of the preprocessing steps such as term identification and POS tagging.
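The alignment step described above can be illustrated with a minimal sketch. It assumes a simple global (Needleman-Wunsch style) scoring scheme with three parameters (`match`, `mismatch`, `gap`) and a decision `threshold`; the abstract does not specify the actual scoring scheme, so these names, their default values, and the tags `PROT`, `VB`, and `RB` are hypothetical. Parameters of this kind are what a genetic algorithm could tune to maximize classification performance; the GA itself is not shown.

```python
from typing import Iterable, Sequence


def alignment_score(
    sentence_tags: Sequence[str],
    pattern_tags: Sequence[str],
    match: float = 2.0,      # assumed reward for identical tags
    mismatch: float = -1.0,  # assumed penalty for differing tags
    gap: float = -1.0,       # assumed penalty for skipping a tag
) -> float:
    """Global alignment score between a sentence's POS array and one pattern."""
    n, m = len(sentence_tags), len(pattern_tags)
    # prev[j] holds the best score for aligning the first i-1 sentence tags
    # with the first j pattern tags; row 0 is all gaps.
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [i * gap] + [0.0] * m
        for j in range(1, m + 1):
            same = sentence_tags[i - 1] == pattern_tags[j - 1]
            substitution = prev[j - 1] + (match if same else mismatch)
            curr[j] = max(substitution, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[m]


def is_interaction_sentence(
    sentence_tags: Sequence[str],
    patterns: Iterable[Sequence[str]],
    threshold: float,
    **scoring: float,
) -> bool:
    """Label a sentence positive if its best pattern alignment reaches threshold."""
    best = max(
        (alignment_score(sentence_tags, p, **scoring) for p in patterns),
        default=float("-inf"),  # no patterns means no positive label
    )
    return best >= threshold


# Hypothetical pattern and candidate sentence encoded as POS arrays.
patterns = [["PROT", "VB", "PROT"]]
candidate = ["PROT", "RB", "VB", "PROT"]
print(alignment_score(candidate, patterns[0]))
print(is_interaction_sentence(candidate, patterns, threshold=4.0))
```

In this sketch a sentence is scored against every pattern and only the best score is compared with the threshold, which is one plausible way to turn pairwise alignment scores into a sentence-level decision.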
Keywords: Biological interactions patterns, PATRICIA tree, Pattern matching, Interaction sentences, Genetic algorithm
Article history: Received 2 February 2009, Revised 14 September 2009, Accepted 14 September 2009, Available online 23 September 2009.
Official paper URL: https://doi.org/10.1016/j.datak.2009.09.002