A methodology for comparing classification methods through the assessment of model stability and validity in variable selection

Abstract

Classification analysis uses features to separate observations into distinct groups for decision-making purposes. This study provides a systematic design for comparing the performance of six classification methods using Monte Carlo simulations, and illustrates that the variable selection process is integral to comparing methodologies: it ensures minimal bias, enhanced stability, and optimized performance. We quantify the variable selection bias and show that, for sufficiently large samples, this bias is minimized so that methods can be compared. We address topics relevant to model building and provide prescriptions for future comparisons, so as to build a body of evidence for recommending the use of these methods.
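As a rough illustration of the idea that variable selection becomes more reliable as the sample grows, the following minimal sketch runs a small Monte Carlo experiment. It is not the authors' design: the data-generating process, the mean-difference filter, the parameter values, and all function names (`simulate`, `select_top_k`, `selection_f_measure`) are assumptions made for illustration only. The F-measure is used here to score how well the selected variables match the truly informative ones.

```python
import random


def f_measure(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall (0.0 when undefined)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def simulate(n, p, informative, shift, rng):
    """Draw n observations with p unit-variance Gaussian features.

    Assumption for this sketch: informative features have mean `shift`
    in class 1 and mean 0 in class 0; all other features are pure noise.
    """
    X, y = [], []
    for i in range(n):
        label = i % 2  # balanced classes
        row = [rng.gauss(shift * label if j in informative else 0.0, 1.0)
               for j in range(p)]
        X.append(row)
        y.append(label)
    return X, y


def select_top_k(X, y, k):
    """Simple filter: keep the k features with the largest absolute
    difference in class means (a stand-in for any selection procedure)."""
    scores = []
    for j in range(len(X[0])):
        ones = [row[j] for row, label in zip(X, y) if label == 1]
        zeros = [row[j] for row, label in zip(X, y) if label == 0]
        diff = abs(sum(ones) / len(ones) - sum(zeros) / len(zeros))
        scores.append((diff, j))
    return {j for _, j in sorted(scores, reverse=True)[:k]}


def selection_f_measure(n, reps=50, p=10, informative=(0, 1, 2),
                        shift=1.0, seed=0):
    """Average F-measure of the selected set against the true informative
    set over `reps` Monte Carlo replicates of sample size n."""
    rng = random.Random(seed)
    truth = set(informative)
    total = 0.0
    for _ in range(reps):
        X, y = simulate(n, p, truth, shift, rng)
        chosen = select_top_k(X, y, k=len(truth))
        tp = len(chosen & truth)
        fp = len(chosen - truth)
        fn = len(truth - chosen)
        total += f_measure(tp, fp, fn)
    return total / reps


if __name__ == "__main__":
    for n in (10, 50, 400):
        print(n, round(selection_f_measure(n), 3))
```

Under these assumptions, the average F-measure of the selected set should rise toward 1 as `n` increases, which mirrors the abstract's point that classifiers are best compared once selection has stabilized.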

Keywords: Classification, Comparison, Prediction, Variable selection, Reliability, Validity, F-measure

Review history: Received 30 July 2010, Revised 14 May 2011, Accepted 1 August 2011, Available online 11 August 2011.

Official URL: https://doi.org/10.1016/j.dss.2011.08.001