Information fusion for text classification — an experimental comparison

Authors:

Abstract

This article reports experiments on the effectiveness of different feature sets, and of information fusion over combinations of them, in classifying free-text documents into a given number of categories. Neural network learning is integrated into the method. The feature sets are based on the “latent semantics” of a reference library, that is, a collection of documents that adequately represents the desired concepts. We found that a larger reference library is not necessarily better. Information fusion almost always gives better results than the individual constituent feature sets, with certain combinations doing better than others.
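
The abstract does not spell out how the features are computed or fused, so the following is only a minimal sketch of one common reading: latent semantic features obtained from a truncated SVD of a reference library's term-document matrix, with fusion done by concatenating the feature vectors from two libraries. The function names, the toy libraries, the raw-count weighting, and concatenation as the fusion rule are all assumptions made for illustration, not the authors' method.

```python
import numpy as np

def term_doc_matrix(docs, vocab):
    """Rows are terms, columns are documents; entries are raw counts."""
    index = {t: i for i, t in enumerate(vocab)}
    m = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc.lower().split():
            if tok in index:          # out-of-vocabulary tokens are ignored
                m[index[tok], j] += 1.0
    return m

def lsi_basis(reference_docs, k):
    """Return the vocabulary and the top-k left singular vectors
    of the reference library's term-document matrix."""
    vocab = sorted({t for d in reference_docs for t in d.lower().split()})
    a = term_doc_matrix(reference_docs, vocab)
    u, _, _ = np.linalg.svd(a, full_matrices=False)
    return vocab, u[:, :k]

def lsi_features(doc, vocab, u_k):
    """Project a document's term vector onto the k latent directions."""
    q = term_doc_matrix([doc], vocab)[:, 0]
    return u_k.T @ q

# Two hypothetical reference libraries (toy data, assumed for illustration).
lib_a = ["stocks market trading prices",
         "market prices inflation rates",
         "football match goal score"]
lib_b = ["team wins football league",
         "bank raises interest rates",
         "league match score"]

vocab_a, u_a = lsi_basis(lib_a, k=2)
vocab_b, u_b = lsi_basis(lib_b, k=2)

doc = "market prices and football score"
f_a = lsi_features(doc, vocab_a, u_a)
f_b = lsi_features(doc, vocab_b, u_b)

# Assumed fusion rule: concatenate the two feature vectors.
fused = np.concatenate([f_a, f_b])
```

Under these assumptions, `fused` would be the input vector for a classifier such as a small neural network; other fusion rules (for example, combining classifier outputs instead of features) fit the same interface.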

Keywords: Text classification, Features, Latent semantic indexing, Reference library, Neural networks, Information fusion

Review history: Received 26 March 1998, Revised 8 November 2000, Accepted 25 September 2001, Available online 30 August 2001.

Official URL: https://doi.org/10.1016/S0031-3203(00)00171-0