Utilizing uncoded consultation notes from electronic medical records for predictive modeling of colorectal cancer

Authors:

Highlights:

• The paper studies several natural language processing (NLP) techniques to extract predictors from uncoded data in electronic medical records (EMRs).

• Some techniques are well known, while others have been developed specifically for this research.

• The approaches have been applied to a large dataset covering 90,000 patients in general practices.

• We focus on predictive modeling of colorectal cancer, which is challenging to study: it is a common type of cancer, yet its symptoms are highly nonspecific.

• The results show that some of the NLP techniques studied can complement the coded EMR data and hence result in improved predictive models.

Keywords: Natural language processing, Predictive modeling, Uncoded consultation notes, Colorectal cancer

Review history: Received 6 November 2015, Accepted 23 March 2016, Available online 31 March 2016, Version of Record 25 May 2016.

Official paper URL: https://doi.org/10.1016/j.artmed.2016.03.003