Learning heterogeneous graph embedding for Chinese legal document similarity

Authors:

Highlights:

Abstract

Measuring the similarity between legal documents, so that prior documents similar to a current one can be retrieved from a massive collection, is an essential component of legal assistant systems. Such systems can automatically link related legal documents to help ensure that identical situations are treated identically in judicial practice. Most existing approaches rely on text- and citation-based methods to calculate the similarity between legal documents. However, those methods have difficulty capturing the semantics of many legal entities and therefore struggle to produce accurate similarity scores. The main reason is the lack of legal domain knowledge and of citation relations between legal documents. To address these challenges, we introduce a practical, generic heterogeneous graph representation learning approach based on a legal heterogeneous knowledge graph. Specifically, we construct a heterogeneous knowledge graph containing legal entities and documents and develop a graph-based embedding model called L-HetGRL. A legal entity can simply be a legal-related encyclopedia entry that contains legal-domain knowledge, which is used to enhance document representation. L-HetGRL learns legal document information and external legal domain knowledge in a unified manner by jointly considering heterogeneous content. In addition, we design a legal case-aware semantic alignment module that effectively combines legal entities with their semantics in documents, thus improving the representation of entities. We conducted comprehensive experiments on two real-world datasets, covering similar case matching and charge prediction, to evaluate the performance of L-HetGRL. The experimental results demonstrate that L-HetGRL outperforms other competitive baselines. Finally, we present a series of suggestions for document representation in the legal domain, which provide valuable guidelines for follow-up studies.
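
To make the general idea concrete, the following is a minimal sketch of how knowledge from linked legal entities could enrich a document vector before similarity is computed. It is not the L-HetGRL model: the toy vectors, the `mentions` edges, the averaging step in `enrich`, and the use of cosine similarity are all assumptions made for illustration, since the abstract does not specify how embeddings are learned or compared.

```python
import math

# Hypothetical toy data (not from the paper): two node types, documents and
# legal entities, connected by document-to-entity "mentions" edges.
doc_vectors = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
entity_vectors = {
    "theft": [0.8, 0.2, 0.0],
    "fraud": [0.0, 0.1, 0.9],
}
mentions = {
    "doc_a": ["theft"],
    "doc_b": ["theft"],
    "doc_c": ["fraud"],
}


def enrich(doc_id):
    """Average a document's own vector with the vectors of its linked entities.

    Assumption: plain averaging stands in for a learned aggregation.
    """
    vectors = [doc_vectors[doc_id]]
    vectors += [entity_vectors[e] for e in mentions.get(doc_id, [])]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_similar(query_id):
    """Return the other documents sorted by descending similarity to the query."""
    query = enrich(query_id)
    scores = [
        (other, cosine(query, enrich(other)))
        for other in doc_vectors
        if other != query_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_similar("doc_a")` returns the remaining documents ordered by similarity; in this toy setup the document that shares the `theft` entity ranks first. A learned model would replace the fixed vectors and the averaging step with trained representations.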

Keywords: Legal document similarity, Legal-domain knowledge graph, Representation learning, Heterogeneous content, Semantic alignment

Article history: Received 25 July 2021, Revised 10 May 2022, Accepted 10 May 2022, Available online 19 May 2022, Version of Record 23 May 2022.

Official paper URL: https://doi.org/10.1016/j.knosys.2022.109046