Combining contextualized word representation and sub-document level analysis through Bi-LSTM+CRF architecture for clinical de-identification

Authors:

Highlights:

• De-identify entities belonging to various classes in unstructured medical records.

• Stack embeddings and extend the context to boost the performance of Bi-LSTM+CRF systems.

• Establish a new state of the art in the classification of entities at category level.

Keywords: Clinical de-identification, Named entity recognition, Deep learning, Contextualized embedding, Sub-document level analysis

Review history: Received 30 June 2020, Revised 29 September 2020, Accepted 1 December 2020, Available online 24 December 2020, Version of Record 24 December 2020.

Official paper URL: https://doi.org/10.1016/j.knosys.2020.106649