An optimization approach with weighted SCiForest and weighted Hausdorff distance for noise data and redundant data

Authors: Yifeng Zheng, Guohe Li, Ying Li, Wenjie Zhang, Xueling Pan, Yaojin Lin

Abstract

With the development of intelligent technology, data obtained from practical applications often contain noise in the form of outlier or redundant samples. Such noise usually degrades the performance and robustness of classifiers. To address this problem, we propose an optimization method for Outlier samples and Redundant samples Detection (ORD). First, we use maximum information compression to eliminate irrelevant feature information. Second, we propose an outlier optimization filter, called WSCiForest, which uses a fusion strategy based on entropy weighting and group optimization theory to calculate the distribution estimated score for each sample. Finally, ORD adopts an improved Hausdorff distance to identify redundant samples effectively. Experimental results show that the proposed method can effectively optimize the data space.
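The abstract does not spell out the improved (weighted) Hausdorff distance, so the following is only a minimal sketch of the classical, unweighted Hausdorff distance between two sets of samples, to illustrate the quantity the redundancy step builds on. The function names `directed_hausdorff` and `hausdorff` and the toy data are assumptions made for illustration and are not taken from the paper.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Classical (unweighted) Hausdorff distance: the larger of the two
    directed distances. This is NOT the paper's weighted variant."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Toy sample sets (hypothetical data, for illustration only).
group_a = [(0.0, 0.0), (1.0, 0.0)]
group_b = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]

print(hausdorff(group_a, group_b))
```

A small distance between two groups of samples suggests that one group adds little information beyond the other, which is the intuition behind using such a distance to flag redundant samples. How ORD weights the distance and chooses a threshold is not specified in the abstract.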

Keywords: Data mining, Data preprocessing, Data optimization, Feature selection, Fusion strategy

Paper URL: https://doi.org/10.1007/s10489-021-02685-9