Geometrical codification for clustering mixed categorical and numerical databases
Authors: Fatima Barcelo-Rico, Jose-Luis Diez
Abstract
This paper presents an alternative approach to clustering mixed databases. The main idea is to propose a general method for clustering mixed data sets that is not very complex yet can reach performance levels similar to those of some good algorithms. The proposed approach codifies the categorical attributes and then applies a numerical clustering algorithm to the resulting database. The proposed codification is based on polar or spherical coordinates. It is easy to understand and apply, the increase in the length of the input matrix is not excessively large, and the codification error can be determined for each case. Combined with the well-known k-means algorithm, the proposed codification performed very well on different benchmarks. It was compared with both other codifications and other mixed clustering algorithms, and showed better or comparable performance in all cases.
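The general idea of codifying a categorical attribute geometrically and then running a numerical clustering algorithm can be illustrated with a minimal sketch. The abstract does not give the exact mapping, so the scheme below is an assumption: each category of an attribute is placed at an equally spaced angle on the unit circle, which turns one categorical column into two numerical columns. The function names `polar_encode` and `kmeans`, the toy data, and the evenly spaced seeding of centroids are all hypothetical and chosen only for illustration.

```python
import math

def polar_encode(values):
    """Assumed scheme: place each distinct category at an equally
    spaced angle on the unit circle and return (cos, sin) pairs."""
    categories = sorted(set(values))
    step = 2 * math.pi / len(categories)
    angle = {c: i * step for i, c in enumerate(categories)}
    return [(math.cos(angle[v]), math.sin(angle[v])) for v in values]

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means on tuples of floats.
    Seeds are points taken at evenly spaced positions in the input
    (a simplification for this toy example, not a general strategy)."""
    n = len(points)
    centroids = [list(points[i * n // k]) for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical mixed data: one categorical and one numerical attribute.
colors = ["red", "red", "blue", "blue", "green", "green"]
sizes = [1.0, 1.1, 5.0, 5.2, 9.0, 9.1]

encoded = polar_encode(colors)
rows = [(x, y, s) for (x, y), s in zip(encoded, sizes)]
labels = kmeans(rows, k=3)
```

In this sketch the categorical column grows from one field to two, regardless of the number of categories. Points on a circle are not mutually equidistant once an attribute has more than three categories, so an encoding of this kind introduces some distortion between category pairs. In practice the numerical attributes would usually be rescaled so that they do not dominate the encoded coordinates.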
Keywords: Mixed data, Clustering, Data conversion, k-means, Codification error
Paper URL: https://doi.org/10.1007/s10844-011-0187-y