Discovering statistically non-redundant subgroups
Authors:
Abstract
The objective of subgroup discovery is to find groups of individuals in a large data set who differ statistically from the rest. Most existing subgroup quality measures are intuitive rather than statistical: they do not precisely capture how a group differs from the others, and the results they produce contain many redundant subgroups. The odds ratio is a statistically sound measure of the difference between two groups with respect to a given outcome, which makes it well suited to quantifying subgroup quality. In this paper, we propose a statistically sound framework for discovering statistically non-redundant subgroups: the quality of a subgroup is measured by its odds ratio, and statistical non-redundancy is defined by the error bounds of odds ratios. We show that the proposed method is faster than most existing methods and discovers the complete set of statistically non-redundant subgroups.
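To make the two ingredients named in the abstract concrete, the following is a minimal sketch of an odds ratio with its error bounds, computed from a 2x2 contingency table. The abstract does not give the paper's exact definitions, so everything here is an assumption: the function names are hypothetical, the error bounds use the standard Wald confidence interval on the log odds ratio, and the redundancy test (a more specific subgroup counts as redundant when its odds ratio falls inside the interval of a more general one) is only one plausible reading of "defined by the error bounds of odds ratios". All four cell counts are assumed to be nonzero.

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table.

    a: in subgroup, outcome present    b: in subgroup, outcome absent
    c: outside subgroup, outcome present    d: outside subgroup, outcome absent
    Assumes all four counts are nonzero.
    """
    return (a * d) / (b * c)

def odds_ratio_bounds(a, b, c, d, z=1.96):
    """Wald confidence interval for the odds ratio (z=1.96 gives about 95%)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

def is_redundant(general, specific, z=1.96):
    """Hypothetical redundancy test (an assumption, not the paper's definition):
    the specific subgroup is redundant if its odds ratio lies within the
    error bounds of the general subgroup's odds ratio."""
    lower, upper = odds_ratio_bounds(*general, z=z)
    return lower <= odds_ratio(*specific) <= upper

# Illustrative counts only, not data from the paper.
general = (20, 80, 10, 90)       # odds ratio 2.25
similar = (10, 30, 20, 140)      # odds ratio about 2.33
different = (15, 5, 15, 165)     # odds ratio 33
print(is_redundant(general, similar))
print(is_redundant(general, different))
```

Under this reading, a specialization is reported only when its odds ratio differs from that of its generalization by more than sampling error would explain, which is what would let a search prune statistically indistinguishable subgroups.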
Keywords: Subgroups, Non-redundancy, Odds ratio, Rules, Search space pruning
Review history: Received 2 January 2013, Revised 1 April 2014, Accepted 2 April 2014, Available online 2 May 2014.
Official paper URL: https://doi.org/10.1016/j.knosys.2014.04.030