Discretization of continuous attributes through low frequency numerical values and attribute interdependency

Authors:

Highlights:

• A new discretization technique called LFD.

• Does not require any user input.

• Interval width, number, and frequency are determined automatically from the data.

• Minimizes information loss due to discretization by choosing low frequency cut points.

• Categorical attributes are taken as reference points for discretization.
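
The idea of placing cut points at low-frequency values can be illustrated with a small sketch. This is not the LFD procedure from the paper, whose details are not given here; it is a simplified heuristic that assumes a cut point is any distinct value whose frequency is a strict local minimum among its sorted neighbours. It also ignores the interdependency with categorical attributes. The function names `low_frequency_cut_points` and `discretize` and the sample data are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def low_frequency_cut_points(values):
    """Return distinct values whose frequency is a strict local minimum.

    Illustrative heuristic only (an assumption), not the paper's LFD algorithm.
    """
    counts = Counter(values)
    distinct = sorted(counts)
    cuts = []
    # Only interior values have two neighbours to compare against.
    for i in range(1, len(distinct) - 1):
        left = counts[distinct[i - 1]]
        mid = counts[distinct[i]]
        right = counts[distinct[i + 1]]
        if mid < left and mid < right:
            cuts.append(distinct[i])
    return cuts


def discretize(values, cuts):
    """Map each value to the index of the interval it falls in.

    Interval k holds values v with cuts[k-1] <= v < cuts[k].
    """
    return [bisect_right(cuts, v) for v in values]


# Made-up sample data: 21 and 24 occur less often than their neighbours.
ages = [20, 20, 20, 21, 22, 22, 22, 23, 23, 23, 24, 25, 25, 25]
cuts = low_frequency_cut_points(ages)
labels = discretize(ages, cuts)
print(cuts)
print(labels)
```

Because the cut points fall on rarely occurring values, few records sit right at an interval boundary, which is one intuition for why such cuts may lose less information. No parameter is supplied by the user in this sketch; the number and width of the intervals follow from the data alone.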

Keywords: Data discretization, Data pre-processing, Data cleansing, Missing value imputation, Corrupt data detection, Data mining

Review history: Received 26 October 2014, Revised 5 October 2015, Accepted 6 October 2015, Available online 20 October 2015, Version of Record 10 November 2015.

Official paper URL: https://doi.org/10.1016/j.eswa.2015.10.005