Prediction of breast cancer distant recurrence using natural language processing and knowledge-guided convolutional neural network

Authors:

Highlights:

• Integrating clinical notes, concept identifiers, and structured clinical features improves the performance of machine learning models for predicting distant breast cancer recurrence, yielding an AUC of over 0.88.

• A knowledge-guided convolutional neural network outperforms conventional machine learning configurations on distant breast cancer recurrence prediction, achieving an F1 score of 0.50.

• Natural language processing techniques, including bag-of-words, MetaMap, and word and entity embeddings, are used to represent progress notes and pathology reports.

• Detailed report review and error analysis identify common caveats of using clinical notes to predict cancer recurrence, which could inform future investigations.
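The highlights describe combining text-derived and structured features and report both AUC and F1. The Python sketch below, using only the standard library, illustrates two of these ideas under stated assumptions: early fusion (concatenating a bag-of-words vector, concept-identifier counts, and structured features into one input vector) and computing AUC and F1 from predicted scores. The vocabulary, the concept IDs `CUI_0001` and `CUI_0002`, the structured values, the toy labels and scores, and all function names are hypothetical. Concept IDs are assumed to have been extracted beforehand (for example by a tool such as MetaMap), and concatenation is only one common integration strategy, not necessarily the authors' method. The knowledge-guided CNN itself is not shown.

```python
from collections import Counter

# Hypothetical vocabulary and concept IDs for illustration only; the paper's
# actual vocabulary, concept set, and structured variables are not given here.
VOCAB = ["metastasis", "lesion", "bone", "liver", "stable"]
CONCEPTS = ["CUI_0001", "CUI_0002"]


def bag_of_words(note, vocab=VOCAB):
    """Count each vocabulary term in a lowercased, whitespace-tokenized note."""
    counts = Counter(note.lower().split())
    return [counts[term] for term in vocab]


def concept_counts(concept_ids, concepts=CONCEPTS):
    """Count concept IDs that are assumed to be already extracted from the note."""
    counts = Counter(concept_ids)
    return [counts[c] for c in concepts]


def fuse(note, concept_ids, structured):
    """Early fusion: concatenate text, concept, and structured feature vectors."""
    return bag_of_words(note) + concept_counts(concept_ids) + list(structured)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_at_threshold(labels, scores, threshold=0.5):
    """F1 of the binary predictions obtained by thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, imbalanced example (2 positives out of 10); not data from the paper.
LABELS = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
SCORES = [0.1, 0.2, 0.3, 0.35, 0.4, 0.45, 0.6, 0.7, 0.65, 0.9]

vec = fuse("Bone lesion suspicious for metastasis", ["CUI_0001"], [54, 2.1])
# → [1, 1, 1, 0, 0, 1, 0, 54, 2.1]
auc = roc_auc(LABELS, SCORES)          # → 0.9375
f1 = f1_at_threshold(LABELS, SCORES)   # → 0.666...
```

On these toy labels the scores rank the positives well (AUC 0.9375), yet thresholding at 0.5 still admits two false positives (F1 about 0.67). This illustrates why, under class imbalance, a model can report a high AUC alongside a more modest F1.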


Keywords: Breast cancer, Distant recurrence, Knowledge-guided convolutional neural network, Word embeddings, Entity embeddings

Article history: Received 20 January 2020, Revised 16 October 2020, Accepted 21 October 2020, Available online 1 November 2020, Version of Record 6 November 2020.

Official paper URL: https://doi.org/10.1016/j.artmed.2020.101977