Distributed matrix factorization based on fast optimization for implicit feedback recommendation

Authors: Lian Chen, Wangdong Yang, Kenli Li, Keqin Li

Abstract

In big data scenarios, matrix factorization (MF) is widely used in recommendation systems because it offers high accuracy and scalability. However, two problems arise when MF is applied to large-scale implicit feedback data. First, negative feedback is difficult to obtain reliably, which leads to relatively poor recommendation accuracy. Second, the limited resources of a single machine make model training inefficient, and acquiring negative feedback information further increases the time complexity of training. To address these two problems, we first propose a user-activity and item-popularity weighted matrix factorization (UIWMF) recommendation algorithm, which assigns each missing entry a different weight based on user activity and item popularity, captures negative feedback more realistically, and leads to better recommendation accuracy. To reduce the additional computational overhead introduced by the weighting strategy, we develop a fast optimization strategy that improves efficiency. To overcome the resource constraints of a single machine, we propose a distributed UIWMF (DUIWMF) algorithm based on Spark, which adopts an efficient parallel learning algorithm to train the model and uses cached in-block and out-block information to reduce communication overhead in a distributed environment. We conduct experiments on three public datasets, and the experimental results demonstrate that, compared with the baseline MF methods, the DUIWMF model has comparable performance in terms of recommendation accuracy and model training efficiency.
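
The weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual UIWMF weight formula, so everything below is an assumption made for illustration: the function name `missing_weights`, the parameters `w0`, `alpha`, and `beta`, and the choice to multiply normalized user activity by normalized item popularity. The intuition it shows is that an unobserved pair involving an active user and a popular item is treated as a more credible negative than one involving an inactive user or an obscure item.

```python
from collections import Counter


def missing_weights(interactions, n_users, n_items, w0=1.0, alpha=0.5, beta=0.5):
    """Return {(user, item): weight} for every unobserved user-item pair.

    Hypothetical weighting, not the paper's formula:
        weight = w0 * (activity_u / max_activity) ** alpha
                    * (popularity_i / max_popularity) ** beta
    """
    observed = set(interactions)
    # User activity: number of distinct items each user interacted with.
    activity = Counter(u for u, _ in observed)
    # Item popularity: number of distinct users who interacted with each item.
    popularity = Counter(i for _, i in observed)
    max_act = max(activity.values(), default=1)
    max_pop = max(popularity.values(), default=1)

    weights = {}
    for u in range(n_users):
        for i in range(n_items):
            if (u, i) in observed:
                continue  # observed pairs are positives, not missing data
            a = (activity[u] / max_act) ** alpha
            p = (popularity[i] / max_pop) ** beta
            weights[(u, i)] = w0 * a * p
    return weights


# Toy data: user 0 interacted with items 0 and 1, user 1 with item 0.
toy = [(0, 0), (0, 1), (1, 0)]
w = missing_weights(toy, n_users=2, n_items=3, alpha=1.0, beta=1.0)
```

This sketch enumerates all missing pairs explicitly, which is quadratic in the number of users and items. That cost is exactly the overhead the abstract says the fast optimization strategy is meant to avoid, so the loop here serves only to make the weights concrete.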

Keywords: Personalized recommendation, Collaborative filtering, User and item recommendation, Fast optimization, Distributed matrix factorization, Spark

Paper URL: https://doi.org/10.1007/s10844-020-00601-0