Overly optimistic prediction results on imbalanced data: a case study of flaws and benefits when applying over-sampling
Authors:
Highlights:
• Several studies that report near-perfect prediction results on the TPEHGDB dataset obtain them through a methodological flaw in their data processing, specifically in how over-sampling is applied to counter class imbalance.
• When the proposed methods are reproduced with correct data processing, they often do not perform significantly better than random guessing.
• Over-sampling, when correctly applied, has a noticeable yet more moderate impact on prediction effectiveness.
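To make the first two highlights concrete, here is a minimal sketch in Python (standard library only). It assumes the flaw takes a common form: over-sampling the whole dataset before the cross-validation split, so that copies of a test sample can also appear in the training folds. The pure-noise data, the random-duplication over-sampler (a simpler stand-in for methods such as SMOTE), the 1-nearest-neighbour classifier and every function name are hypothetical illustrations, not the TPEHGDB data or the pipelines of the studies discussed.

```python
import random


def make_noise_data(n_major, n_minor, n_features, seed):
    """Gaussian features unrelated to the labels, so no model can truly beat chance."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
         for _ in range(n_major + n_minor)]
    y = [0] * n_major + [1] * n_minor  # label 1 is the minority class
    return X, y


def random_oversample(X, y, rng):
    """Duplicate randomly chosen minority (label 1) rows until both classes are equal in size."""
    minority = [i for i, label in enumerate(y) if label == 1]
    n_extra = (len(y) - len(minority)) - len(minority)
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return X + [list(X[i]) for i in extra], y + [1] * n_extra


def predict_1nn(X_train, y_train, x):
    """Label of the nearest training row under squared Euclidean distance."""
    best = min(range(len(X_train)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)))
    return y_train[best]


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall; 0.5 corresponds to random guessing with two classes."""
    recalls = []
    for cls in (0, 1):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / 2


def cross_validate(X, y, k, rng, oversample_before_split):
    """k-fold CV with 1-NN; toggles whether over-sampling happens before or after splitting."""
    if oversample_before_split:
        # FLAWED: duplicates of one sample can land in both training and test folds.
        X, y = random_oversample(X, y, rng)
    order = list(range(len(y)))
    rng.shuffle(order)
    folds = [order[f::k] for f in range(k)]
    y_true, y_pred = [], []
    for f in range(k):
        test_set = set(folds[f])
        X_tr = [X[i] for i in order if i not in test_set]
        y_tr = [y[i] for i in order if i not in test_set]
        if not oversample_before_split:
            # CORRECT: balance only the training fold; the test fold stays untouched.
            X_tr, y_tr = random_oversample(X_tr, y_tr, rng)
        for i in folds[f]:
            y_true.append(y[i])
            y_pred.append(predict_1nn(X_tr, y_tr, X[i]))
    return balanced_accuracy(y_true, y_pred)


X, y = make_noise_data(n_major=180, n_minor=20, n_features=10, seed=0)
flawed = cross_validate(X, y, k=5, rng=random.Random(1), oversample_before_split=True)
correct = cross_validate(X, y, k=5, rng=random.Random(1), oversample_before_split=False)
print(f"over-sample, then split: {flawed:.2f}")
print(f"split, then over-sample: {correct:.2f}")
```

Because the features are pure noise, any score well above 0.5 can only come from leakage. In the flawed ordering, a minority test sample usually has an exact copy in the training folds, which 1-NN finds at distance zero, so the score tends to come out far above chance. When over-sampling is confined to the training folds, the score stays close to 0.5.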
Keywords: Preterm birth risk estimation, Over-sampling, Electrohysterography
Article history: Received 15 January 2020, Revised 9 September 2020, Accepted 12 November 2020, Available online 20 November 2020, Version of Record 4 December 2020.
Official URL: https://doi.org/10.1016/j.artmed.2020.101987