Crowd labeling latent Dirichlet allocation

Authors: Luca Pion-Tonachini, Scott Makeig, Ken Kreutz-Delgado

Abstract

Large, unlabeled datasets are abundant nowadays, but getting labels for those datasets can be expensive and time-consuming. Crowd labeling is a crowdsourcing approach for gathering such labels from workers whose suggestions are not always accurate. While a variety of algorithms exist for this purpose, we present crowd labeling latent Dirichlet allocation (CL-LDA), a generalization of latent Dirichlet allocation that can solve a more general set of crowd labeling problems. We show that it performs as well as other methods and at times better on a variety of simulated and actual datasets while treating each label as compositional rather than indicating a discrete class. In addition, prior knowledge of workers’ abilities can be incorporated into the model through a structured Bayesian framework. We then apply CL-LDA to the EEG independent component labeling dataset, using its generalizations to further explore the utility of the algorithm. We discuss prospects for creating classifiers from the generated labels.

Keywords: Crowd labeling, Generative model, Bayesian, Latent Dirichlet allocation, EEG

Paper URL: https://doi.org/10.1007/s10115-017-1053-1