Approximation-based feature selection and application for algae population estimation

作者:Qiang Shen, Richard Jensen

摘要

This paper presents a data-driven approach for feature selection to address the common problem of dealing with high-dimensional data. This approach is able to handle the real-valued nature of the domain features, unlike many existing approaches. This is accomplished through the use of fuzzy-rough approximations. The paper demonstrates the effectiveness of this research by proposing an estimator of algae populations, a system that approximates, given certain water characteristics, the size of algae populations. This estimator significantly reduces computer time and space requirements, decreases the cost of obtaining measurements and increases runtime efficiency, making itself more viable economically. By retaining only information required for the estimation task, the system offers higher accuracy than conventional estimators. Finally, the system does not alter the domain semantics, making any distilled knowledge human-readable. The paper describes the problem domain, architecture and operation of the system, and provides and discusses detailed experimentation. The results show that algae estimators using a fuzzy-rough feature selection step produce more accurate predictions of algae populations in general.

论文关键词:Feature evaluation and selection, Data-driven knowledge acquisition, Classification, Fuzzy-rough sets, Algae population estimation

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10489-007-0058-y