Information Extraction from Text Intensive and Visually Rich Banking Documents

Authors:

Highlights:

• First study to use both visual and textual information for deep-learning-based information extraction from text-intensive, visually rich scanned documents

• First study to investigate deep learning algorithms in banking document understanding

• Automation of customer banking order documents reduced cycle times significantly

• Investigated traditional and deep learning approaches to NER on noisy text

• Novel graph-based complex relation extraction algorithm outperforms previous methods

• N-ary, nested, and document-level complex relations, whose number is not known in advance, extracted successfully

• Incorporating document layout information improves performance substantially


Keywords: Information Extraction, Banking Documents, Deep Learning, Visually Rich Documents, Text Intensive Documents, Named Entity Recognition, Relation Extraction, NLP in Finance

Review history: Received 1 March 2020, Revised 2 July 2020, Accepted 16 July 2020, Available online 17 September 2020, Version of Record 20 October 2020.

Paper URL: https://doi.org/10.1016/j.ipm.2020.102361