The feature selection bias problem in relation to high-dimensional gene data

Authors:

Highlights:

• We analyze seven gene datasets to show the effect of feature selection bias on the accuracy measure.

• We examine its importance by an empirical study of four feature selection methods.

• We use double cross-validation to evaluate feature selection performance.

• We also examine the stability of the feature selection methods.

• We recommend cross-validation for feature selection in order to reduce the selection bias.
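
The selection bias named in the highlights can be illustrated with a small sketch. It is not the authors' procedure: the mean-difference filter, the nearest-centroid classifier, the pure-noise data, and every function name below are assumptions chosen only to keep the example self-contained. Because the features carry no information about the labels, an accuracy well above chance can only come from selecting features on samples that are later used for testing.

```python
import random


def make_noise_data(n_samples=40, n_features=1000, seed=0):
    """Pure-noise data: features carry no information about the labels."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
         for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]  # balanced, arbitrary labels
    return X, y


def class_means(X, y, rows, feats):
    """Per-class mean of each feature in `feats`, using only `rows`."""
    means = {}
    for label in (0, 1):
        members = [i for i in rows if y[i] == label]
        means[label] = [sum(X[i][j] for i in members) / len(members)
                        for j in feats]
    return means


def select_features(X, y, rows, k):
    """Hypothetical filter: keep the k features with the largest
    absolute difference between class means."""
    all_feats = list(range(len(X[0])))
    m = class_means(X, y, rows, all_feats)
    scores = [abs(m[0][j] - m[1][j]) for j in all_feats]
    return sorted(all_feats, key=lambda j: scores[j], reverse=True)[:k]


def predict(X, means, feats, i):
    """Nearest-centroid prediction for sample i on the selected features."""
    def dist(label):
        return sum((X[i][j] - c) ** 2 for j, c in zip(feats, means[label]))
    return 0 if dist(0) <= dist(1) else 1


def cv_accuracy(X, y, k_feats=10, n_folds=5, select_inside=True):
    """Cross-validated accuracy.

    select_inside=True  : features are re-selected on each training fold.
    select_inside=False : features are selected once on all samples,
                          so the test folds leak into the selection.
    """
    n = len(X)
    folds = [list(range(f, n, n_folds)) for f in range(n_folds)]
    global_feats = select_features(X, y, list(range(n)), k_feats)
    correct = 0
    for test_rows in folds:
        held_out = set(test_rows)
        train_rows = [i for i in range(n) if i not in held_out]
        if select_inside:
            feats = select_features(X, y, train_rows, k_feats)
        else:
            feats = global_feats
        means = class_means(X, y, train_rows, feats)
        correct += sum(predict(X, means, feats, i) == y[i]
                       for i in test_rows)
    return correct / n


if __name__ == "__main__":
    X, y = make_noise_data()
    print("selection outside CV:", cv_accuracy(X, y, select_inside=False))
    print("selection inside CV: ", cv_accuracy(X, y, select_inside=True))
```

With selection inside each training fold the estimate should stay near chance level, while selection on the full dataset should report a much higher accuracy. Double cross-validation, as used in the paper, would add an inner loop on each training fold for choosing the feature subset or model parameters; the sketch keeps a single loop for brevity.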

Keywords: Feature selection bias, Convex and piecewise linear classifier, Support vector machine, Gene selection, Microarray data

Review history: Received 26 September 2014, Revised 14 September 2015, Accepted 3 November 2015, Available online 14 November 2015, Version of Record 26 February 2016.

Paper URL: https://doi.org/10.1016/j.artmed.2015.11.001