Outlier Detection and Data Cleaning in Multivariate Non-Normal Samples: The PAELLA Algorithm

Authors: Manuel Castejón Limas, Joaquín B. Ordieres Meré, Francisco J. Martínez de Pisón Ascacibar, Eliseo P. Vergara González

Abstract

A new method of outlier detection and data cleaning for both normal and non-normal multivariate data sets is proposed. It is based on an iterated local fit without a priori metric assumptions. The approach is supported by finite mixture clustering, which provides good results with large data sets. A multi-step structure consisting of three phases is developed. The importance of outlier detection in industrial modeling for open-loop control prediction is also described. The algorithm gives good results both in simulation runs with artificial data sets and with experimental data sets recorded in a rubber factory. Finally, the methodology is discussed.

Keywords: outlier, multivariate, non-normal, data cleaning, EM algorithm, cluster analysis, mixture model

Paper URL: https://doi.org/10.1023/B:DAMI.0000031630.50685.7c