LILLIE: Information extraction and database integration using linguistics and learning-based algorithms

Authors:

Highlights:

• A novel, generic method to extract open information triples from unstructured text.

• Substantially outperforms state-of-the-art systems on CaRB and Re-OIE16 benchmarks.

• Combines linguistics and learning-based methods to balance precision and recall.

• Uses dependency-tree rules to refine the triples produced by a high-recall learning-based engine.

• Includes several augmentations to modify the generality and granularity of triples.

Keywords: Information extraction, Data integration, Machine learning for database systems

Article history: Received 25 October 2021, Revised 2 November 2021, Accepted 3 November 2021, Available online 18 November 2021, Version of Record 24 November 2021.

Paper URL: https://doi.org/10.1016/j.is.2021.101938