Information fusion for text classification — an experimental comparison

Authors:

Abstract

This article reports experiments on how effectively different feature sets, and information fusion over some of their combinations, classify free-text documents into a given number of categories. The method integrates neural network learning. The feature sets are based on the “latent semantics” of a reference library, that is, a collection of documents that adequately represents the desired concepts. We found that a larger reference library is not necessarily better. Information fusion almost always gives better results than the individual constituent feature sets, and certain combinations do better than others.
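The idea of deriving features from a reference library and then fusing them can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the feature sets are built or combined, so the sketch assumes standard latent semantic indexing (a truncated SVD of a term-document matrix, with new documents folded in) and assumes fusion by simple concatenation. The toy libraries `REF_A` and `REF_B`, the helper names, and the choice `k=2` are all hypothetical, and the neural network classifier that would consume the fused vector is left out.

```python
import numpy as np


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; any tokenizer would do)."""
    return text.lower().split()


def term_vector(text, index):
    """Raw term-count vector of `text` over the vocabulary in `index`."""
    vec = np.zeros(len(index))
    for tok in tokenize(text):
        if tok in index:
            vec[index[tok]] += 1.0
    return vec


def build_lsi_projector(reference_docs, k):
    """Return a function mapping a document to k latent-semantic features.

    The term-document matrix A (terms x documents) of the reference library
    is factored as A = U S V^T, and a document with term vector q is folded
    in as S_k^{-1} U_k^T q. k must not exceed the rank of A.
    """
    vocab = sorted({tok for doc in reference_docs for tok in tokenize(doc)})
    index = {term: i for i, term in enumerate(vocab)}
    a = np.column_stack([term_vector(doc, index) for doc in reference_docs])
    u, s, _ = np.linalg.svd(a, full_matrices=False)
    u_k, s_k = u[:, :k], s[:k]

    def project(text):
        return (u_k.T @ term_vector(text, index)) / s_k

    return project


# Two hypothetical reference libraries, each yielding its own feature set.
REF_A = [
    "stock market trading prices",
    "market prices fall",
    "football match score",
    "team wins football match",
]
REF_B = [
    "bank interest rates rise",
    "goal scored in final match",
    "rates and market prices",
]

project_a = build_lsi_projector(REF_A, k=2)
project_b = build_lsi_projector(REF_B, k=2)


def fuse(text):
    """Fusion by concatenation (assumed): one vector holding both feature sets."""
    return np.concatenate([project_a(text), project_b(text)])


fused = fuse("market prices rise")  # 4 features: 2 from each library
```

In a full pipeline, vectors like `fused` would be the inputs to a trained classifier. Comparing a classifier trained on `project_a` alone, on `project_b` alone, and on the fused vector mirrors the kind of comparison the abstract describes.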

Keywords: Text classification, Features, Latent semantic indexing, Reference library, Neural networks, Information fusion

Review history: Received 26 March 1998, Revised 8 November 2000, Accepted 25 September 2001, Available online 30 August 2001.

Official paper URL: https://doi.org/10.1016/S0031-3203(00)00171-0