TCRM: diagnosing tuple inconsistency for granulized datasets

Authors:

Highlights:

Abstract

Many granulization approaches have been proposed for knowledge discovery. However, the inconsistent tuples that exist in granulized datasets are rarely revealed. In this paper, we develop the tuple consistency recognition model (TCRM), which helps detect inconsistent tuples efficiently in granulized datasets. The model's main outputs are the inconsistent tuples it finds and the processing time it consumes. We also conducted an empirical test that diagnosed eighteen continuous real-life datasets, each granulized by the equal width interval technique with the embedded S-plus histogram binning algorithm (SHBA) and largest binning size algorithm (LBSA). The results are remarkable: almost 40% of the granulized datasets contain inconsistent tuples, and in 22% of them more than 20% of the tuples are inconsistent.
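To make the idea concrete, the following is a minimal Python sketch, not the TCRM implementation described in the paper. It assumes that a tuple is "inconsistent" when its granulized attribute values are identical to those of another tuple that carries a different class label; the abstract does not spell out the definition, so this is an assumption. The function names (`equal_width_bin`, `granulize`, `inconsistent_indices`) and the sample data are hypothetical, and the number of bins is passed in directly rather than derived by SHBA or LBSA.

```python
from collections import defaultdict


def equal_width_bin(values, n_bins):
    """Map each continuous value to an interval index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: everything falls into one interval
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def granulize(columns, n_bins):
    """Bin every column, then transpose back into one tuple per record."""
    binned = [equal_width_bin(col, n_bins) for col in columns]
    return list(zip(*binned))


def inconsistent_indices(tuples, labels):
    """Indices of tuples whose granulized values also occur with another label.

    Assumed definition of inconsistency; the paper may define it differently.
    """
    seen = defaultdict(set)
    for t, label in zip(tuples, labels):
        seen[t].add(label)
    return [i for i, t in enumerate(tuples) if len(seen[t]) > 1]


# Hypothetical data: two continuous attributes and one class label per record.
columns = [[1.0, 1.4, 2.9, 3.0], [10.0, 11.0, 19.0, 20.0]]
labels = ["a", "b", "b", "b"]

tuples = granulize(columns, n_bins=2)
print(tuples)
print(inconsistent_indices(tuples, labels))
```

In this toy data the first two records fall into the same intervals on both attributes but carry different labels, so both are flagged. Grouping by the granulized tuple in a single pass keeps the check linear in the number of records, which is one plausible way to keep processing time low.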

Keywords: Knowledge discovery, Granulization, SQL, Tuple consistency

Review history: Received 31 July 2001, Accepted 7 January 2002, Available online 9 May 2002.

Official paper URL: https://doi.org/10.1016/S0950-7051(02)00037-0