Extracting cancer mortality statistics from death certificates: A hybrid machine learning and rule-based approach for common and rare cancers

Authors:

Highlights:

• A method for extracting cancer mortality statistics from death certificates is shown.

• Detailed features (terms and medical concepts) are extracted from death certificates.

• These are used to train a hybrid system, comprising Support Vector Machines and rules that classify according to ICD-10.

• Evaluation shows the system is effective at classifying common and rare cancers.

• The system allows monitoring cancer mortality in a timely and accurate manner.
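The hybrid design described in the highlights can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that hand-written rules are checked first and that a statistical classifier handles everything else; the highlights do not say how the two components are combined. The rule table, the function names, and the `toy_model` stand-in are all hypothetical. In practice the `ml_predict` callable could wrap a trained Support Vector Machine, for example a scikit-learn `LinearSVC`.

```python
import re
from typing import Callable, List, Optional

# Hypothetical rule table: regex pattern -> ICD-10 category.
# C45 is the ICD-10 category for mesothelioma.
RULES = [
    (re.compile(r"\bmesothelioma\b", re.IGNORECASE), "C45"),
]


def extract_terms(text: str) -> List[str]:
    """Return lowercase unigrams and bigrams as simple term features."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


def apply_rules(text: str) -> Optional[str]:
    """Return the code of the first matching rule, or None if no rule fires."""
    for pattern, code in RULES:
        if pattern.search(text):
            return code
    return None


def classify(text: str, ml_predict: Callable[[List[str]], str]) -> str:
    """Assumed combination: rules take precedence, the classifier is the fallback."""
    rule_code = apply_rules(text)
    if rule_code is not None:
        return rule_code
    return ml_predict(extract_terms(text))


def toy_model(terms: List[str]) -> str:
    """Keyword stand-in for a trained classifier (assumption, not an SVM)."""
    if "lung" in terms:
        return "C34"  # ICD-10: malignant neoplasm of bronchus and lung
    return "no-cancer"


if __name__ == "__main__":
    print(classify("Malignant mesothelioma of pleura", toy_model))
    print(classify("Carcinoma of the lung", toy_model))
```

Passing the classifier in as a callable keeps the rule layer independent of any particular learning library, so either component can be tested or replaced on its own.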

Keywords: Cancer classification, Death certificates, Machine learning, Natural language processing, Rules, Hybrid

Review history: Received 12 June 2017, Revised 26 April 2018, Accepted 30 April 2018, Available online 10 May 2018, Version of Record 27 July 2018.

Official paper URL: https://doi.org/10.1016/j.artmed.2018.04.011