On fast supervised learning for normal mixture models with missing information

Authors:

Abstract

Handling missing values is an important research issue for mixture models. In this paper, we introduce computational strategies that use auxiliary indicator matrices to handle mixtures of multivariate normal distributions efficiently when the data are missing at random with an arbitrary missing-data pattern, meaning that missing values can occur anywhere. We develop a novel EM algorithm that can dramatically reduce computation time and can be applied in many settings, such as density estimation, supervised clustering and prediction of missing values. For multiple imputation of missing data, we also offer a data augmentation scheme using the Gibbs sampler. The proposed methodologies are illustrated on real data sets with varying proportions of missing values.
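To make the role of indicator matrices concrete, the following is a minimal sketch of the standard conditional-mean calculation for a normal mixture with missing coordinates. It is not the paper's algorithm; it only illustrates the general idea under stated assumptions. The function names (`selectors`, `log_marginal`, `impute`) are hypothetical, missing entries are assumed to be coded as NaN, the mixture parameters are assumed to be already known, and each observation is assumed to have at least one observed coordinate.

```python
import numpy as np


def selectors(x):
    """Return indicator matrices O and M (rows of the identity matrix)
    that pick out the observed and the missing coordinates of x."""
    miss = np.isnan(x)
    eye = np.eye(x.size)
    return eye[~miss], eye[miss]


def log_marginal(xo, O, mu, sigma):
    """Log density of the observed part xo under N(O mu, O sigma O^T)."""
    diff = xo - O @ mu
    s_oo = O @ sigma @ O.T
    _, logdet = np.linalg.slogdet(s_oo)
    quad = diff @ np.linalg.solve(s_oo, diff)
    return -0.5 * (xo.size * np.log(2.0 * np.pi) + logdet + quad)


def impute(x, weights, mus, sigmas):
    """Fill the NaN entries of x with the mixture conditional mean.

    Returns the completed vector and the posterior component
    probabilities computed from the observed coordinates only.
    """
    O, M = selectors(x)
    xo = O @ np.nan_to_num(x)  # zero out NaNs so the product stays finite

    # Posterior component probabilities, computed in log space for stability.
    logp = np.array([
        np.log(w) + log_marginal(xo, O, mu, sigma)
        for w, mu, sigma in zip(weights, mus, sigmas)
    ])
    resp = np.exp(logp - logp.max())
    resp /= resp.sum()

    # Component-wise conditional means of the missing part given xo.
    cond_means = [
        M @ mu + M @ sigma @ O.T @ np.linalg.solve(O @ sigma @ O.T, xo - O @ mu)
        for mu, sigma in zip(mus, sigmas)
    ]

    filled = x.copy()
    filled[np.isnan(x)] = sum(r * m for r, m in zip(resp, cond_means))
    return filled, resp
```

For example, with a single component with zero mean, unit variances and covariance 0.5, calling `impute` on `np.array([2.0, np.nan])` fills the second coordinate with 0.5 * 2.0 = 1.0. Because O and M are built per observation, the same code handles any missing-data pattern without reordering the coordinates.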

Keywords: Bayesian classifier, Data augmentation, EM algorithm, Incomplete features, Rao-Blackwellization

Review history: Received 1 July 2005, Revised 7 November 2005, Accepted 21 December 2005, Available online 14 February 2006.

Paper URL: https://doi.org/10.1016/j.patcog.2005.12.014