On clustering categories of categorical predictors in generalized linear models
Authors:
Highlights:
• The paper proposes a method to cluster categorical features in Generalized Linear Models.
• The proposed approach uses a numerical method guided by the learning performance.
• The underlying structure of the categories and their relationship is identified using proximity graphs.
• Complexity is reduced, and accuracy is competitive with the benchmark one-hot encoding of categorical features.
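To make the complexity claim in the last highlight concrete, the sketch below compares the number of indicator columns produced by one-hot encoding the original categories with the number produced after merging categories into groups. It is only an illustration: the category levels, the grouping, and the helper `one_hot` are hypothetical, and the grouping is fixed by hand rather than found by the paper's search procedure.

```python
# Hypothetical category levels and a hand-picked grouping (assumptions,
# not taken from the paper; the paper finds groupings numerically).
levels = ["A", "B", "C", "D", "E", "F"]
clusters = {"A": "g1", "B": "g1", "C": "g2", "D": "g2", "E": "g3", "F": "g3"}


def one_hot(values, categories):
    """Return one indicator row per value, with one column per category."""
    return [[1 if v == c else 0 for c in categories] for v in values]


# A few hypothetical observations of the categorical predictor.
observations = ["A", "C", "F", "B", "E"]

# Benchmark encoding: one indicator column per original category.
X_full = one_hot(observations, levels)

# Clustered encoding: map each category to its group, then one-hot the groups.
groups = sorted(set(clusters.values()))
X_clustered = one_hot([clusters[v] for v in observations], groups)

n_full = len(X_full[0])            # columns with the original categories
n_clustered = len(X_clustered[0])  # columns after merging categories

print(n_full, n_clustered)
```

With fewer indicator columns, a generalized linear model fitted on the clustered encoding has fewer coefficients to estimate for this predictor, which is the sense in which complexity is reduced here.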
Keywords: Statistical learning, Interpretability, Greedy randomized adaptive search procedure, Proximity between categories
Review history: Received 3 March 2020, Revised 11 February 2021, Accepted 17 May 2021, Available online 24 May 2021, Version of Record 3 June 2021.
Paper URL: https://doi.org/10.1016/j.eswa.2021.115245