The problem of bias in training data in regression problems in medical decision support

Authors:

Highlights:

Abstract

This paper describes a bias problem encountered in a machine learning approach to outcome prediction in anticoagulant drug therapy. The outcome to be predicted is a measure of the patient's clotting time; this measure is continuous, so the prediction task is a regression problem. Artificial neural networks (ANNs) are a powerful mechanism for learning to predict such outcomes from training data. However, experiments have shown that an ANN is biased towards the values that occur most commonly in the training data and is thus less likely to be correct when predicting extreme values. This bias in training data for regression problems is similar to the problem of minority classes in classification. The classification version of the problem, however, is well documented and is an ongoing area of research. In this paper, we consider stratified sampling and boosting as solutions to this bias problem and evaluate them on this outcome prediction problem and on two other datasets. Both approaches produce some improvement, with boosting showing the most promise.
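The abstract does not say how stratified sampling is applied to a continuous target, so the following is only a minimal sketch of one plausible reading: divide the target range into equal-width bins and resample so that each occupied bin contributes the same number of training examples. The function name `stratified_resample` and the parameters `n_bins`, `per_bin`, and `seed` are hypothetical and are not taken from the paper.

```python
import numpy as np

def stratified_resample(X, y, n_bins=5, per_bin=None, seed=0):
    """Resample (X, y) so each occupied target-value bin contributes equally.

    Illustrative assumption only: equal-width bins over the target range,
    sampling with replacement inside each bin.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y, dtype=float)
    # Interior edges of n_bins equal-width bins spanning the target range.
    edges = np.linspace(y.min(), y.max(), n_bins + 1)[1:-1]
    bin_ids = np.digitize(y, edges)  # integer bin index in 0..n_bins-1
    occupied = np.unique(bin_ids)
    if per_bin is None:
        # Default: bring every bin up to the size of the largest bin.
        per_bin = max(int(np.sum(bin_ids == b)) for b in occupied)
    chosen = []
    for b in occupied:
        members = np.flatnonzero(bin_ids == b)
        # Sampling with replacement lets sparse (extreme) bins reach per_bin.
        chosen.append(rng.choice(members, size=per_bin, replace=True))
    idx = rng.permutation(np.concatenate(chosen))
    return X[idx], y[idx]
```

Under this scheme, rare extreme target values are repeated until they appear as often as common ones, which is one way to counter the pull towards frequently occurring values. Duplicating a handful of extreme cases can also encourage overfitting to them, so the bin count and per-bin sample size would need tuning on held-out data.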

Keywords: Artificial neural networks, Medical decision support, Anticoagulant drug therapy, Regression

Review history: Received 15 January 2001, Revised 21 May 2001, Accepted 18 June 2001, Available online 30 December 2001.

Official paper URL: https://doi.org/10.1016/S0933-3657(01)00092-6