Feature Subset Selection by Bayesian network-based optimization
Authors:
Abstract
A new method for Feature Subset Selection in machine learning, FSS-EBNA (Feature Subset Selection by Estimation of Bayesian Network Algorithm), is presented. FSS-EBNA is an evolutionary, population-based, randomized search algorithm that can be executed when domain knowledge is not available. A wrapper approach, over the Naive-Bayes and ID3 learning algorithms, is used to evaluate the goodness of each visited solution. FSS-EBNA is based on the EDA (Estimation of Distribution Algorithm) paradigm and, in contrast to Genetic Algorithms, does not use crossover or mutation operators to evolve the populations. In the absence of these operators, evolution is guaranteed by factorizing the probability distribution of the best solutions found in each generation of the search. This factorization is carried out by means of Bayesian networks. Promising results are achieved in a variety of tasks where domain knowledge is not available. The paper explains the main ideas of Feature Subset Selection, Estimation of Distribution Algorithms and Bayesian networks, presenting related work on each concept. A study of the "overfitting" problem in the Feature Subset Selection process is also carried out, providing a basis for defining the stopping criteria of the new algorithm.
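The abstract describes the search loop only in words. The following is a minimal sketch under stated assumptions, not the authors' implementation. It keeps the EDA structure (sample candidate subsets from a probability model, select the best ones, re-estimate the model, with no crossover or mutation) and a wrapper fitness, but makes several simplifications that are assumptions of this sketch: the Bayesian network is replaced by independent per-feature Bernoulli marginals (the UMDA factorization); the wrapper uses a single hold-out split with a hand-written Naive Bayes instead of the paper's accuracy estimation; the previous elite is carried over between generations; and a fixed generation count stands in for the paper's stopping criteria. The data set, the function names (`toy_dataset`, `nb_accuracy`, `eda_fss`) and all parameter values are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Mask = Tuple[int, ...]                   # 1 = feature kept, 0 = feature dropped
Example = Tuple[Tuple[int, ...], int]    # (binary feature vector, binary class)


def toy_dataset(n: int, n_features: int, rng: random.Random) -> List[Example]:
    """Hypothetical data: features 0 and 1 are noisy copies of the class,
    every other feature is pure noise."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.randint(0, 1) for _ in range(n_features)]
        x[0] = y if rng.random() > 0.1 else 1 - y
        x[1] = y if rng.random() > 0.2 else 1 - y
        data.append((tuple(x), y))
    return data


def nb_accuracy(train: Sequence[Example], test: Sequence[Example], mask: Mask) -> float:
    """Wrapper fitness (assumed form): hold-out accuracy of a Laplace-smoothed
    Naive Bayes trained only on the features selected by `mask`."""
    feats = [i for i, keep in enumerate(mask) if keep]
    n_c = [0, 0]                                         # training examples per class
    cnt = [[[1, 1] for _ in feats] for _ in range(2)]    # smoothed value counts per class
    for x, y in train:
        n_c[y] += 1
        for j, i in enumerate(feats):
            cnt[y][j][x[i]] += 1
    correct = 0
    for x, y in test:
        scores = []
        for c in (0, 1):
            s = math.log((n_c[c] + 1) / (len(train) + 2))          # smoothed prior
            for j, i in enumerate(feats):
                s += math.log(cnt[c][j][x[i]] / (n_c[c] + 2))      # smoothed likelihood
            scores.append(s)
        pred = 0 if scores[0] >= scores[1] else 1
        correct += int(pred == y)
    return correct / len(test)


def eda_fss(fitness: Callable[[Mask], float], n_features: int, pop_size: int = 30,
            n_select: int = 15, generations: int = 10, seed: int = 0):
    """EDA loop over feature masks. Simplification: the model is a product of
    independent marginals (UMDA), not the Bayesian network FSS-EBNA learns."""
    rng = random.Random(seed)
    cache: Dict[Mask, float] = {}

    def score(m: Mask) -> float:
        if m not in cache:                   # each wrapper evaluation is costly
            cache[m] = fitness(m)
        return cache[m]

    probs = [0.5] * n_features               # initial model: uniform over subsets
    elite: List[Mask] = []
    history: List[float] = []
    for _ in range(generations):
        # 1. Sample new individuals from the model (no crossover, no mutation).
        sampled = [tuple(int(rng.random() < p) for p in probs) for _ in range(pop_size)]
        # 2. Select the best individuals; the previous elite competes too.
        elite = sorted(elite + sampled, key=score, reverse=True)[:n_select]
        history.append(score(elite[0]))
        # 3. Re-estimate the (smoothed) marginals from the selected individuals.
        probs = [(sum(m[i] for m in elite) + 1) / (len(elite) + 2)
                 for i in range(n_features)]
    return elite[0], history


rng = random.Random(42)
data = toy_dataset(300, 8, rng)
train, test = data[:200], data[200:]
best, history = eda_fss(lambda m: nb_accuracy(train, test, m), n_features=8)
print("selected features:", [i for i, keep in enumerate(best) if keep])
print("best hold-out accuracy per generation:", history)
```

Because the previous elite re-enters selection, the best fitness per generation can never decrease. Replacing the independent marginals with a Bayesian network learned over the feature indicators would let the model capture dependencies between features, which is the central idea of EBNA; that step is omitted here only to keep the sketch short.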
Keywords: Machine learning, Supervised learning, Feature Subset Selection, Wrapper, Predictive accuracy, Estimation of Distribution Algorithm, Estimation of Bayesian Network Algorithm, Bayesian network, Overfitting
Publication history: Received 12 December 1999, available online 2 November 2000.
Official paper URL: https://doi.org/10.1016/S0004-3702(00)00052-7