Neighbor-weighted K-nearest neighbor for unbalanced text corpus

Authors:

Abstract

Text categorization, or classification, is the automated assignment of text documents to predefined classes based on their content. Many classification algorithms assume that training examples are evenly distributed among classes. However, unbalanced data sets appear in many practical applications. To deal with such uneven text sets, we propose the neighbor-weighted K-nearest neighbor algorithm (NWKNN). Experimental results indicate that NWKNN achieves a significant improvement in classification performance on unbalanced corpora.
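The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea: neighbors from small classes receive larger weights than neighbors from large classes. The weighting form `(smallest / n) ** (1 / exponent)`, the `exponent` parameter, the cosine similarity measure, and all function names are assumptions made for illustration, not details confirmed by the text above.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def class_weights(labels, exponent=2.0):
    """Assumed weighting: the smallest class gets weight 1.0,
    larger classes get progressively smaller weights."""
    counts = Counter(labels)
    smallest = min(counts.values())
    return {c: (smallest / n) ** (1.0 / exponent) for c, n in counts.items()}


def nwknn_predict(train_vecs, train_labels, query, k=3, exponent=2.0):
    """Score each class by the similarity of its members among the k
    nearest neighbors, scaled by the class weight, and return the best."""
    weights = class_weights(train_labels, exponent)
    neighbors = sorted(
        ((cosine(vec, query), label)
         for vec, label in zip(train_vecs, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for sim, label in neighbors:
        scores[label] += sim * weights[label]
    return max(scores, key=scores.get)


# Toy unbalanced corpus: four documents of class "a", one of class "b".
train_vecs = [[1, 0], [1, 0.1], [1, 0.2], [1, 0.3], [0, 1]]
train_labels = ["a", "a", "a", "a", "b"]
prediction = nwknn_predict(train_vecs, train_labels, [0.6, 1], k=3)
```

In this toy corpus the query's single nearest neighbor belongs to the minority class, but two majority-class neighbors also fall within the top three. With the assumed weighting, the minority class wins; as `exponent` grows very large, the weights approach 1 and the sketch behaves like ordinary similarity-weighted KNN.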

Keywords: Text classification, K-nearest neighbor (KNN), Information retrieval, Data mining

Publication history: Available online 12 January 2005.

Paper URL: https://doi.org/10.1016/j.eswa.2004.12.023