Latent Gaussian process for anomaly detection in categorical data

Abstract

We propose a semi-supervised approach to anomaly detection in multivariate categorical data. Our goal is to learn a model that can identify anomalous instances given only a small set of training data from the normal class. To this end, our approach learns the probability distribution of normal instances under the assumption that the categorical data are generated from a continuous latent space. A Gaussian process is adopted to construct this generative model. As a non-parametric Bayesian model, a Gaussian process can adapt its complexity to the size of the data, so our approach can remain effective when the training set is small. Comprehensive experiments on several benchmarks demonstrate the effectiveness of our approach.
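
The abstract does not give the model's equations. As a rough illustration of the generative assumption only, the toy sketch below is our own construction and not the authors' inference procedure. It places latent points in a continuous space and draws one function per (attribute, category) pair from a zero-mean Gaussian process with a squared-exponential kernel. A softmax then turns these functions into category probabilities, and a categorical vector is scored by the negative log of its Monte Carlo marginal probability over the latent points. The kernel, the softmax link, the scoring rule, and every function name (`build_model`, `anomaly_score`, etc.) are hypothetical choices. Unlike the described approach, nothing is learned from normal training data here: latent positions and kernel hyperparameters are fixed by random draws.

```python
import math
import random

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two latent points (lists of floats)."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return variance * math.exp(-0.5 * sq / lengthscale ** 2)

def cholesky(k):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(k)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(k[i][i] - s)
            else:
                L[i][j] = (k[i][j] - s) / L[j][j]
    return L

def sample_gp(z, rng, jitter=1e-4):
    """Draw one function f ~ GP(0, k) evaluated at the latent points z (f = L @ eps)."""
    n = len(z)
    k = [[rbf_kernel(z[i], z[j]) + (jitter if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(k)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(L[i][m] * eps[m] for m in range(i + 1)) for i in range(n)]

def softmax(v):
    """Numerically stable softmax over a list of floats."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def build_model(num_latent, num_attrs, num_cats, latent_dim=2, seed=0):
    """Hypothetical toy model: probs[n][d][c] = p(attribute d = category c | latent point n)."""
    rng = random.Random(seed)
    z = [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in range(num_latent)]
    # f[d][c] is one GP draw for attribute d and category c, evaluated at every latent point.
    f = [[sample_gp(z, rng) for _ in range(num_cats)] for _ in range(num_attrs)]
    return [[softmax([f[d][c][n] for c in range(num_cats)]) for d in range(num_attrs)]
            for n in range(num_latent)]

def anomaly_score(x, probs):
    """-log of the Monte Carlo estimate p(x) ~= mean_n prod_d p(x_d | z_n); higher = more anomalous."""
    liks = [math.prod(p[d][x[d]] for d in range(len(x))) for p in probs]
    return -math.log(sum(liks) / len(liks))
```

For example, `anomaly_score([0, 1, 2], build_model(20, 3, 3))` returns a positive number. Vectors that the model considers unlikely receive larger scores. In a real detector, one would fit the latent positions and kernel hyperparameters to the normal training data and then pick a score threshold on held-out normal instances. Attributes are treated as conditionally independent given the latent point, which is what lets the product over attributes express the joint probability.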

Keywords: Anomaly detection, Categorical data, Gaussian process, Data-efficient learning

Article history: Received 17 June 2020, Revised 11 February 2021, Accepted 22 February 2021, Available online 25 February 2021, Version of Record 15 March 2021.

Official paper page: https://doi.org/10.1016/j.knosys.2021.106896