A comparative study of automated legal text classification using random forests and deep learning

Authors:

Highlights:

• We apply domain concepts to legal text classification, using PCA and random forests (RFs), and demonstrate that this approach is effective on legal text.

• We conduct a systematic comparative study on a legal area classification dataset, comparing domain concept-based machine learning algorithms with deep learning algorithms based on pre-trained word embeddings.

• We propose a framework that includes a strategy for selecting machine learning models according to four indicators: data, performance, computation, and interpretation.

Keywords: Legal text classification, Machine learning, Deep learning, Domain concept, Word embedding, Random forests

Review history: Received 1 March 2021, Revised 10 July 2021, Accepted 17 October 2021, Available online 17 November 2021, Version of Record 17 November 2021.

Official paper URL: https://doi.org/10.1016/j.ipm.2021.102798