A user-guided Bayesian framework for ensemble feature selection in life science applications (UBayFS)

Authors: Anna Jenul, Stefan Schrunner, Jürgen Pilz, Oliver Tomic

Abstract

Feature selection reduces the complexity of high-dimensional datasets and helps to gain insights into systematic variation in the data. These aspects are essential in domains that rely on model interpretability, such as life sciences. We propose a (U)ser-Guided (Bay)esian Framework for (F)eature (S)election, UBayFS, an ensemble feature selection technique embedded in a Bayesian statistical framework. Our generic approach considers two sources of information: data and domain knowledge. From data, we build an ensemble of feature selectors, described by a multinomial likelihood model. Using domain knowledge, the user guides UBayFS by weighting features and penalizing feature blocks or combinations, implemented via a Dirichlet-type prior distribution. Hence, the framework combines three main aspects: ensemble feature selection, expert knowledge, and side constraints. Our experiments demonstrate that UBayFS (a) allows for a balanced trade-off between user knowledge and data observations and (b) achieves accurate and robust results.
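The combination of a multinomial likelihood with a Dirichlet prior described above can be illustrated with a minimal sketch. It relies only on the standard conjugacy result that a Dirichlet prior with parameters alpha, updated with multinomial counts delta, gives a Dirichlet posterior with parameters alpha + delta. The function names, the toy selections, and the prior weights are hypothetical and are not taken from the paper or its software; side constraints on feature blocks or combinations are left out entirely.

```python
import numpy as np

def ensemble_counts(selections, n_features):
    """Count how often each feature was chosen across the ensemble.

    `selections` is a list of sets of feature indices, one set per
    elementary feature selector (hypothetical input format).
    """
    counts = np.zeros(n_features)
    for selected in selections:
        counts[list(selected)] += 1
    return counts

def posterior_importance(counts, prior_weights):
    """Posterior mean of feature importances under Dirichlet-multinomial conjugacy.

    Dirichlet(alpha) prior + multinomial counts delta -> Dirichlet(alpha + delta).
    The mean of a Dirichlet is its parameter vector divided by its sum.
    """
    posterior = np.asarray(prior_weights, dtype=float) + counts
    return posterior / posterior.sum()

# Toy ensemble: three selectors, each picking three of five features.
selections = [{0, 1, 2}, {0, 2, 3}, {0, 1, 4}]

# Assumed user prior: feature 3 is believed to be important, the rest are neutral.
prior_weights = np.array([1.0, 1.0, 1.0, 3.0, 1.0])

counts = ensemble_counts(selections, n_features=5)
importance = posterior_importance(counts, prior_weights)

# Rank features by posterior mean; the stable sort keeps index order on ties.
top_k = np.argsort(-importance, kind="stable")[:3]
print(counts, importance, top_k)
```

In this toy example, feature 3 is selected only once by the ensemble, yet its higher prior weight lifts it to the same posterior importance as feature 0, which is the kind of trade-off between user knowledge and data observations the abstract refers to.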

Keywords: Ensemble feature selection, Bayesian model, Dirichlet-multinomial, User constraints

Paper URL: https://doi.org/10.1007/s10994-022-06221-9