An HMM-based over-sampling technique to improve text classification
Authors:
Highlights:
• An over-sampling balancing method based on document content is proposed.
• The technique includes an HMM that generates samples based on existing documents.
• The model is tested with an SVM classifier on two medical document collections.
• Results show that the method outperforms other widely used data balancing techniques.
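The highlights do not say how the HMM is structured or trained. As a rough illustration only, the Python sketch below assumes a plain discrete HMM whose hidden states emit words. It is fitted with Baum-Welch to the tokenized minority-class documents and then sampled to produce synthetic documents for over-sampling. The functions `baum_welch`, `sample` and `hmm_oversample`, the number of hidden states, the smoothing pseudocount and the example sentences are all hypothetical choices. They are not the authors' method.

```python
import random


def normalize(row):
    """Scale non-negative numbers so they sum to 1."""
    total = sum(row)
    return [x / total for x in row]


def forward_backward(seq, pi, A, B):
    """Scaled forward-backward pass; gamma_t(i) = alpha[t][i] * beta[t][i]."""
    n = len(pi)
    alpha, scale = [], []
    a = [pi[i] * B[i][seq[0]] for i in range(n)]
    for t in range(len(seq)):
        if t > 0:
            prev = alpha[-1]
            a = [sum(prev[j] * A[j][i] for j in range(n)) * B[i][seq[t]]
                 for i in range(n)]
        c = sum(a)
        alpha.append([x / c for x in a])
        scale.append(c)
    beta = [[1.0] * n for _ in seq]
    for t in range(len(seq) - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][seq[t + 1]] * beta[t + 1][j]
                       for j in range(n)) / scale[t + 1] for i in range(n)]
    return alpha, beta, scale


def baum_welch(seqs, n_states, n_symbols, n_iter=20, seed=0, eps=1e-3):
    """Fit a discrete HMM (pi, A, B) to integer sequences (hypothetical setup).

    eps is a small pseudocount that keeps every probability strictly positive.
    """
    rng = random.Random(seed)

    def rand_row(k):
        return normalize([rng.random() + 0.1 for _ in range(k)])

    pi = rand_row(n_states)
    A = [rand_row(n_states) for _ in range(n_states)]
    B = [rand_row(n_symbols) for _ in range(n_states)]
    for _ in range(n_iter):
        pi_acc = [eps] * n_states
        A_acc = [[eps] * n_states for _ in range(n_states)]
        B_acc = [[eps] * n_symbols for _ in range(n_states)]
        for seq in seqs:
            alpha, beta, scale = forward_backward(seq, pi, A, B)
            for t, o in enumerate(seq):
                for i in range(n_states):
                    gamma = alpha[t][i] * beta[t][i]  # P(state i at t | seq)
                    B_acc[i][o] += gamma
                    if t == 0:
                        pi_acc[i] += gamma
                    if t + 1 < len(seq):
                        nxt = seq[t + 1]
                        for j in range(n_states):
                            # xi_t(i, j): expected i -> j transition at step t
                            A_acc[i][j] += (alpha[t][i] * A[i][j] * B[j][nxt]
                                            * beta[t + 1][j] / scale[t + 1])
        pi = normalize(pi_acc)
        A = [normalize(row) for row in A_acc]
        B = [normalize(row) for row in B_acc]
    return pi, A, B


def sample(pi, A, B, length, rng):
    """Generate one symbol sequence of the given length from the HMM."""
    def draw(probs):
        return rng.choices(range(len(probs)), weights=probs)[0]
    state, out = draw(pi), []
    for _ in range(length):
        out.append(draw(B[state]))
        state = draw(A[state])
    return out


def hmm_oversample(minority_docs, n_new, n_states=3, seed=0):
    """Hypothetical helper: synthesize n_new documents from non-empty minority docs."""
    vocab = sorted({w for doc in minority_docs for w in doc})
    index = {w: k for k, w in enumerate(vocab)}
    seqs = [[index[w] for w in doc] for doc in minority_docs]
    pi, A, B = baum_welch(seqs, n_states, len(vocab), seed=seed)
    rng = random.Random(seed + 1)
    lengths = [len(doc) for doc in minority_docs]
    # Each synthetic document borrows its length from a random original.
    return [[vocab[k] for k in sample(pi, A, B, rng.choice(lengths), rng)]
            for _ in range(n_new)]


minority = [
    "patient reports chest pain and shortness of breath".split(),
    "chest pain radiating to left arm".split(),
    "shortness of breath on exertion".split(),
]
new_docs = hmm_oversample(minority, n_new=5)
for doc in new_docs:
    print(" ".join(doc))
```

In an over-sampling setup, the synthetic documents would be added to the minority class before the classifier is trained. The paper uses an SVM. This sketch only reuses the minority-class vocabulary and document lengths, and it models word order loosely through the hidden-state transitions.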
Keywords: Hidden Markov Model, Text classification, Oversampling techniques
Publication history: Available online 17 July 2013.
Paper URL: https://doi.org/10.1016/j.eswa.2013.07.036