A novel approach for fraudulent reviewer detection based on weighted topic modelling and nearest neighbors with asymmetric Kullback–Leibler divergence

Authors:

Highlights:

• We propose a novel approach called ImDetector for detecting fraudulent reviewers under data imbalance, based on weighted LDA and asymmetric KL divergence.

• We develop a weighted LDA model to extract latent topics of reviewers and adopt asymmetric KL divergence to measure similarities between reviewers.

• The data imbalance problem in fraudulent reviewer detection is alleviated by the proposed ImDetector approach.

• Extensive experiments on the Yelp.com dataset demonstrate that the proposed ImDetector approach is superior to state-of-the-art techniques.

Abstract

The task of detecting fraudulent reviewers is of great importance to E-commerce platforms. Existing research has invested much effort into developing comprehensive features and advanced techniques to detect fraudulent reviewers. However, most of these studies have ignored the data imbalance problem inherent in fraudulent reviewer detection: in practice, non-fraudulent reviewers are the majority and fraudulent reviewers are the minority. To fill this gap, we propose a novel approach called ImDetector, based on weighted latent Dirichlet allocation (LDA) and Kullback–Leibler (KL) divergence, to detect fraudulent reviewers while handling data imbalance. Specifically, we develop a weighted LDA model to extract the latent topics of reviewers distributed on the review features. Asymmetric KL divergence is adopted to bias the similarity measure between reviewers toward the fraudulent minority when using K-nearest-neighbor (KNN) classification. By mapping the reviewers to the latent topics of features derived from the weighted LDA model and measuring the similarities between reviewers using asymmetric KL divergence, the data imbalance problem in fraudulent reviewer detection is alleviated. Extensive experiments on the Yelp.com dataset demonstrate that the proposed ImDetector approach is superior to the state-of-the-art techniques used for fraudulent reviewer detection. We also explain the experimental results and present the managerial implications of this paper.
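To make the role of KL divergence in nearest-neighbor classification concrete, the following is a minimal Python sketch and not the ImDetector implementation. It assumes each reviewer is already represented as a topic distribution (here a hand-made toy matrix rather than the output of the weighted LDA model), and it uses plain majority voting; how the paper biases the measure toward the fraudulent minority is not described in the abstract and is not reproduced here. The names `kl_divergence`, `knn_predict`, and the toy data are hypothetical.

```python
import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """Return D_KL(p || q) for two discrete distributions.

    eps is added before renormalizing so that zero entries do not
    produce log(0) or division by zero.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))


def knn_predict(query, train_topics, train_labels, k=3):
    """Majority vote among the k training reviewers with the smallest
    D_KL(query || neighbor). Labels: 0 = non-fraudulent, 1 = fraudulent.
    """
    dists = np.array([kl_divergence(query, t) for t in train_topics])
    nearest = np.argsort(dists)[:k]
    votes = np.bincount(np.asarray(train_labels)[nearest], minlength=2)
    return int(np.argmax(votes))


# Toy topic distributions (assumption): three majority-class reviewers
# and two minority-class (fraudulent) reviewers over three topics.
train_topics = np.array([
    [0.70, 0.20, 0.10],
    [0.60, 0.30, 0.10],
    [0.65, 0.25, 0.10],
    [0.10, 0.20, 0.70],
    [0.15, 0.15, 0.70],
])
train_labels = np.array([0, 0, 0, 1, 1])

query = np.array([0.12, 0.18, 0.70])
pred = knn_predict(query, train_topics, train_labels, k=3)

# KL divergence is asymmetric: swapping the arguments changes the value,
# so the choice of direction affects which neighbors count as "near".
forward = kl_divergence(train_topics[0], query)
reverse = kl_divergence(query, train_topics[0])
print(pred, forward, reverse)
```

Because `forward` and `reverse` differ, the direction in which the divergence is computed is a design choice that changes the neighbor ranking, which is the property an asymmetric measure can exploit when one class is rare.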

Keywords: E-commerce, Fraudulent reviewer detection, Imbalanced data, Weighted LDA, Kullback–Leibler divergence

Article history: Received 18 July 2021, Revised 18 January 2022, Accepted 22 February 2022, Available online 1 March 2022, Version of Record 12 April 2022.

Paper URL: https://doi.org/10.1016/j.dss.2022.113765