Complementary or Substitutive? A Novel Deep Learning Method to Leverage Text-image Interactions for Multimodal Review Helpfulness Prediction

Abstract

With the flourishing of the mobile Internet, multimodal reviews (i.e., reviews with both text and images) are becoming prevalent and play an important role in customer decision making. However, multimodal review helpfulness prediction (MRHP) is difficult because of the information interaction between text and images. The information in review text (images) can be either complementary or substitutive to the visual (textual) review information. Moreover, the text (images) alone may predominantly constitute the review's diagnostic value in some cases, whereas in others text and images may be jointly perceived as useful by customers. In this study, we perform MRHP by modeling these text-image interactions. We propose a novel multimodal deep learning method that exploits the complementation and substitution effects between text and images and further coordinates them for MRHP. Empirical evaluation on a large-scale online review dataset shows that the proposed method outperforms the benchmarks, indicating its strong capability to predict the helpfulness of multimodal reviews. Exploratory analysis provides insights into the complementary-substitutive interaction patterns between review text and images.

Keywords: Review helpfulness prediction, Multimodal review, Text-image interaction, Complementation effect, Substitution effect

Review history: Received 24 October 2021, Revised 13 June 2022, Accepted 10 July 2022, Available online 14 July 2022, Version of Record 18 July 2022.

Official paper URL: https://doi.org/10.1016/j.eswa.2022.118138