Fraudulent review detection model focusing on emotional expressions and explicit aspects: investigating the potential of feature engineering

Authors:

Highlights:

• Investigate the role of effective data pre-processing techniques in improving the accuracy of ML classifiers that predict the behavior of fraudulent reviewers

• Develop a novel feature engineering approach to detect fraudulent reviewers using NLP and text mining

• Present a novel feature engineering approach that extracts several “review-centric” and “reviewer-centric” features from the datasets and combines the cumulative effects of the feature distributions into a unified model representing the overall behavior of fraudulent reviewers

Abstract

Reading customer reviews before purchasing items online has become common practice; however, some companies use machine learning (ML) algorithms to generate false reviews in order to create positive brand images of their own products and negative images of competitors' offerings. Existing techniques use review content to identify fraudulent reviewers; however, spammers have become more sophisticated, learning from their mistakes and changing their tactics to evade detection. Thus, investigating how fraudulent accounts generate fake positive reviews for themselves or fake negative reviews for competitors, and the need for ML classifiers to identify such reviews, is more important than ever. In this research, we present a novel feature engineering approach in which we (1) extract several “review-centric” and “reviewer-centric” features from a dataset; (2) combine the cumulative effects of the feature distributions into a unified model that represents the overall behavior of fraudulent reviewers; (3) investigate the role of effective data pre-processing in improving detection accuracy; and (4) develop a probabilistic approach to detect fraudulent reviewers by learning a novel M-SMOTE model over a derived balanced dataset and feature distributions, which outperforms other ML models. Through these findings, our study contributes to the literature on digital platforms and fraudulent review detection, with significant managerial and theoretical implications.
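The abstract does not list the individual features, so the following is only a minimal sketch of what combining “review-centric” and “reviewer-centric” features into one row per review might look like. The toy data, the feature names (`n_words`, `n_exclaims`, `rating_deviation`, `n_reviews`, `mean_rating`, `duplicate_ratio`) and the helper functions are all hypothetical illustrations, not the features or code used in the paper, and the M-SMOTE step is not shown.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: (reviewer_id, product_id, rating, text)
REVIEWS = [
    ("u1", "p1", 5, "Amazing product!!! Best purchase ever!!!"),
    ("u1", "p2", 5, "Amazing product!!! Best purchase ever!!!"),
    ("u2", "p1", 3, "Works as described, battery life is average."),
    ("u3", "p1", 4, "Good value. Shipping took a week."),
]


def review_centric(text, rating, product_mean):
    """Illustrative features computed from a single review."""
    return {
        "n_words": len(text.split()),
        "n_exclaims": text.count("!"),
        # How far this rating is from the product's average rating.
        "rating_deviation": abs(rating - product_mean),
    }


def reviewer_centric(user_reviews):
    """Illustrative features computed from all reviews by one reviewer."""
    texts = [text for _, _, text in user_reviews]
    ratings = [rating for _, rating, _ in user_reviews]
    return {
        "n_reviews": len(texts),
        "mean_rating": mean(ratings),
        # Fraction of reviews that repeat an earlier text by the same user.
        "duplicate_ratio": 1 - len(set(texts)) / len(texts),
    }


def build_feature_rows(reviews):
    """Return one feature dict per review, merging both feature groups."""
    ratings_by_product = defaultdict(list)
    reviews_by_user = defaultdict(list)
    for user, product, rating, text in reviews:
        ratings_by_product[product].append(rating)
        reviews_by_user[user].append((product, rating, text))

    user_features = {u: reviewer_centric(rs) for u, rs in reviews_by_user.items()}

    rows = []
    for user, product, rating, text in reviews:
        row = review_centric(text, rating, mean(ratings_by_product[product]))
        row.update(user_features[user])  # attach the reviewer-level features
        rows.append(row)
    return rows


rows = build_feature_rows(REVIEWS)
```

Each resulting row could then be fed to a classifier after class balancing; which features, balancing method, and classifier the authors actually use is described in the full paper rather than in this abstract.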

Keywords: Online reviews, Digital platforms, Review manipulation, Machine learning, Opinion spamming, Feature engineering

Article history: Received 14 June 2021, Revised 14 December 2021, Accepted 28 December 2021, Available online 5 January 2022, Version of Record 21 February 2022.

Official URL: https://doi.org/10.1016/j.dss.2021.113728