Heuristically repopulated Bayesian ant colony optimization for treating missing values in large databases

Abstract

Incomplete datasets with missing values are unsuitable for strategic decision-making because they lead to biased results. The problem is worse when the dataset is large and collected from many heterogeneous sources. This paper addresses missing-value scenarios that had not previously been handled together. The proposed Dual Repopulated Bayesian Ant Colony Optimization (DPBACO) handles both ignorable and non-ignorable missing values in the heterogeneous attributes of large datasets. DPBACO integrates Bayesian principles with the Ant Colony Optimization technique, since both are simple and efficient to implement. After the pheromone update, the solution pool is repopulated by dividing the population into two groups according to their fitness values and generating new offspring through a crossover operation. DPBACO is applied to six large mixed-attribute datasets to impute both kinds of missing values. Empirical and statistical results show that DPBACO outperforms existing methods at missing rates ranging from 5% to 50%.
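
The repopulation step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the DPBACO implementation: it assumes each candidate solution is a numeric vector of imputed values, that higher fitness is better, that the fitter half of the pool survives unchanged, and that the weaker half is replaced by single-point crossover offspring of the survivors. The names `repopulate` and `single_point_crossover`, the `seed` parameter, and the toy fitness function are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Solution = List[float]  # assumption: one imputed value per missing cell


def single_point_crossover(a: Sequence[float], b: Sequence[float],
                           rng: random.Random) -> Solution:
    """Child takes a's genes before a random cut point and b's genes after it."""
    if len(a) != len(b):
        raise ValueError("parents must have equal length")
    if len(a) < 2:
        return list(a)
    cut = rng.randint(1, len(a) - 1)
    return list(a[:cut]) + list(b[cut:])


def repopulate(pool: List[Solution], fitness: Callable[[Solution], float],
               seed: int = 0) -> List[Solution]:
    """Hypothetical repopulation: keep the fitter half, refill the rest by crossover."""
    rng = random.Random(seed)
    ranked = sorted(pool, key=fitness, reverse=True)  # higher fitness first
    elite = ranked[: (len(ranked) + 1) // 2]          # fitter half survives
    offspring: List[Solution] = []
    while len(elite) + len(offspring) < len(pool):
        # Draw two distinct survivors as parents when possible.
        p1, p2 = rng.sample(elite, 2) if len(elite) >= 2 else (elite[0], elite[0])
        offspring.append(single_point_crossover(p1, p2, rng))
    return elite + offspring


# Toy usage: fitness is closeness to a made-up reference vector
# (real imputation has no ground truth; this is illustrative only).
TARGET = [1.0, 2.0, 3.0]


def toy_fitness(s: Solution) -> float:
    return -sum((x - t) ** 2 for x, t in zip(s, TARGET))


pool = [[0.0, 0.0, 0.0], [1.0, 2.0, 2.5], [1.1, 1.9, 3.0], [5.0, 5.0, 5.0]]
new_pool = repopulate(pool, toy_fitness)
```

Keeping the elite unchanged preserves the best imputations found so far, while crossover among them explores recombinations of their values; the paper's actual split rule, parent selection, and crossover operator may differ.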

Keywords: Missing values, Heterogeneous attributes, Ant colony optimization, Bayesian methods, Repopulation

Publication history: Received 15 May 2016, Revised 23 June 2017, Accepted 26 June 2017, Available online 1 July 2017, Version of Record 4 September 2017.

Paper URL: https://doi.org/10.1016/j.knosys.2017.06.033