Data engineering for fraud detection

Authors:

Highlights:

• Companies increasingly rely upon data-driven methods for detecting fraud.

• Data engineering is essential for improving the performance of most machine learning models.

• Our data engineering process is decomposed into several feature and instance engineering steps.

• The benefits of data engineering are illustrated on a payment transactions data set from a large European bank.

Abstract

Financial institutions increasingly rely upon data-driven methods for developing fraud detection systems that automatically detect and block fraudulent transactions. From a machine learning perspective, detecting suspicious transactions is a binary classification problem, so many techniques can be applied. Interpretability, however, is of utmost importance for management to have confidence in the model and for designing fraud prevention strategies. Moreover, models that let fraud experts understand why a case is flagged as suspicious greatly facilitate their job of investigating those transactions. We therefore propose several data engineering techniques that improve the performance of an analytical model while retaining its interpretability. Our data engineering process is decomposed into several feature and instance engineering steps. We illustrate the performance improvement these steps bring to popular analytical models on a real payment transactions data set.
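The abstract does not say which feature or instance engineering steps are used, so the following is only a minimal sketch of what one step of each kind might look like. It assumes a hypothetical toy list of transactions. The field names, the `amount_ratio` feature, and the random undersampling scheme are illustrative assumptions, not the authors' method.

```python
import random
from collections import defaultdict

# Hypothetical toy transactions: (customer_id, amount, is_fraud).
# Names and values are illustrative only, not taken from the paper's data set.
transactions = [
    ("c1", 20.0, 0),
    ("c1", 25.0, 0),
    ("c1", 400.0, 1),
    ("c2", 100.0, 0),
    ("c2", 110.0, 0),
    ("c2", 90.0, 0),
    ("c3", 15.0, 0),
    ("c3", 300.0, 1),
]


def add_amount_ratio(rows):
    """Feature engineering (assumed example): amount relative to the
    customer's mean amount over earlier transactions.

    A customer's first transaction has no history, so its ratio is 1.0.
    """
    history = defaultdict(list)
    out = []
    for customer, amount, label in rows:
        prior = history[customer]
        ratio = amount / (sum(prior) / len(prior)) if prior else 1.0
        out.append({
            "customer": customer,
            "amount": amount,
            "amount_ratio": ratio,
            "is_fraud": label,
        })
        history[customer].append(amount)
    return out


def undersample(rows, legit_per_fraud=1, seed=0):
    """Instance engineering (assumed example): keep every fraud row and a
    random subset of legitimate rows to reduce class imbalance."""
    rng = random.Random(seed)
    fraud = [r for r in rows if r["is_fraud"] == 1]
    legit = [r for r in rows if r["is_fraud"] == 0]
    keep = min(len(legit), legit_per_fraud * len(fraud))
    return fraud + rng.sample(legit, keep)


features = add_amount_ratio(transactions)
balanced = undersample(features, legit_per_fraud=1)
```

Both steps leave the inputs human-readable: a ratio such as "this payment is many times the customer's usual amount" can be explained to a fraud investigator, which is the interpretability property the abstract emphasizes.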

Keywords: Decision analysis, Payment transactions fraud, Instance engineering, Feature engineering, Cost-based model evaluation

Review history: Received 15 July 2020, Revised 25 November 2020, Accepted 7 January 2021, Available online 12 January 2021, Version of Record 24 September 2021.

Paper URL: https://doi.org/10.1016/j.dss.2021.113492