Using text mining to establish knowledge graph from accident/incident reports in risk assessment

Authors:

Abstract

To clarify the risk factors that affect railway safety and how they propagate, we learn from historical reports to build a connected network of hazards and accidents, forming a knowledge graph (KG), and apply it to railway hazard identification and risk assessment. First, open-source British railway accident/incident reports are selected as the data source. A text augmentation algorithm from text mining is introduced and optimized to augment the data. An ensemble model is then constructed from a hidden Markov model (HMM), a conditional random field (CRF), a bidirectional long short-term memory (Bi-LSTM) network, and a Bi-LSTM-CRF deep learning network to perform named entity recognition (NER) on the reports. Next, a random forest algorithm classifies the recognized entities into standardized categories, and a multi-dimensional KG network is established. Finally, after a series of safety-related feature parameters is defined, the resulting KG is applied to quantitatively assess the risk level of each hazard. The results show that, by exploring the topological relationships of the railway accident network, this approach visualizes and quantitatively describes the potential relationships among hazards, faults, and accidents, which can further assist in formulating preventive measures against railway risk.
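The abstract does not state which safety-related feature parameters are used, so the following is only a minimal sketch of the general idea of turning extracted hazard → fault → accident links into a graph and scoring hazards from its topology. It assumes each report has already been reduced by NER and entity classification to an ordered chain of standardized entities whose first element is the hazard. The function names (`build_kg`, `reachable`, `hazard_risk`), the severity scale, the example entities, and the scoring rule (hazard frequency × worst reachable accident severity) are all hypothetical illustrations, not the authors' method.

```python
from collections import defaultdict

# Hypothetical input: each report reduced to an ordered chain of standardized
# entities (hazard -> fault -> accident). The first element is assumed to be the hazard.
Chain = list[str]


def build_kg(chains: list[Chain]) -> dict[str, dict[str, int]]:
    """Directed graph: edge weight u -> v counts how many reports contain that link."""
    edges: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for chain in chains:
        for u, v in zip(chain, chain[1:]):
            edges[u][v] += 1
    return edges


def reachable(edges: dict[str, dict[str, int]], start: str) -> set[str]:
    """All nodes reachable from `start` along directed edges (iterative DFS)."""
    seen: set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, {}):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def hazard_risk(chains: list[Chain], severity: dict[str, float]) -> dict[str, float]:
    """Illustrative score only: relative hazard frequency x worst reachable accident severity."""
    edges = build_kg(chains)
    counts: dict[str, int] = defaultdict(int)
    for chain in chains:
        if chain:
            counts[chain[0]] += 1
    total = len(chains)
    risk: dict[str, float] = {}
    for hazard, n in counts.items():
        worst = max(
            (severity[a] for a in reachable(edges, hazard) if a in severity),
            default=0.0,
        )
        risk[hazard] = n / total * worst
    return risk


# Toy example with made-up entities and a made-up severity scale.
chains = [
    ["track defect", "broken rail", "derailment"],
    ["track defect", "broken rail", "derailment"],
    ["signal failure", "signal passed at danger", "collision"],
    ["signal failure", "signal passed at danger"],
]
severity = {"derailment": 3, "collision": 4}
print(hazard_risk(chains, severity))  # {'track defect': 1.5, 'signal failure': 2.0}
```

Note that plain reachability over-approximates causation when several hazards share a fault node, since a hazard then "reaches" every accident downstream of that fault; a more careful score would weight paths by the edge counts or restrict to chains observed in the same report.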

Keywords: Text mining, Knowledge graph, Named entity recognition, Machine learning, Risk assessment, Railway safety

Article history: Received 24 October 2021, Revised 13 June 2022, Accepted 26 June 2022, Available online 30 June 2022, Version of Record 15 July 2022.

Official paper URL: https://doi.org/10.1016/j.eswa.2022.117991