Scalability and sparsity issues in recommender datasets: a survey

作者：Monika Singh

摘要

Recommender systems have been widely used in various domains including movies, news, music with an aim to provide the most relevant proposals to users from a variety of available options. Recommender systems are designed using techniques from many fields, some of which are: machine learning, information retrieval, data mining, linear algebra and artificial intelligence. Though in-memory nearest-neighbor computation is a typical approach for collaborative filtering due to its high recommendation accuracy; its performance on scalability is still poor given a huge user and item base and availability of only few ratings (i.e., data sparsity) in archetypal merchandising applications. In order to alleviate scalability and sparsity issues in recommender systems, several model-based approaches were proposed in the past. However, if research in recommender system is to achieve its potential, there is a need to understand the prominent techniques used directly to build recommender systems or for preprocessing recommender datasets, along with its strengths and weaknesses. In this work, we present an overview of some of the prominent traditional as well as advanced techniques that can effectively handle data dimensionality and data sparsity. The focus of this survey is to present an overview of the applicability of some advanced techniques, particularly clustering, biclustering, matrix factorization, graph-theoretic, and fuzzy techniques in recommender systems. In addition, it highlights the applicability and recent research works done using each technique.

论文关键词：Recommender systems, Collaborative filtering, Memory-based technique, Model-based techniques, Hard clustering, Biclustering, Matrix factorization, Graph-theoretic technique, Fuzzy recommender systems, Scalability, Sparsity

论文评审过程：

论文官网地址：https://doi.org/10.1007/s10115-018-1254-2