Comparing and combining the effectiveness of latent semantic indexing and the ordinary vector space model for information retrieval

Authors:

Highlights:

Abstract

A retrieval system was built to find individuals with appropriate expertise within a large research establishment on the basis of their authored documents. The expert-locating system uses a new method for automatic indexing and retrieval based on singular value decomposition, a matrix decomposition technique related to factor analysis. Organizational groups, represented by the documents they write, and the terms contained in these documents are fit simultaneously into a 100-dimensional “semantic” space. User queries are positioned in the semantic space, and the most similar groups are returned to the user. We compared the standard vector-space model with this new technique and found that combining the two methods improved performance over either alone. We also examined the effects of several experimental variables on the system's retrieval accuracy: term weighting functions (in both semantic space construction and query construction), suffix stripping, and the use of lexical units larger than a single word.
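The pipeline described in the abstract can be illustrated with a minimal sketch: build a term-by-group matrix, truncate its singular value decomposition to k dimensions, project a query into that space, and rank groups by cosine similarity. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy matrix, the vocabulary, k = 2 (the paper uses 100 dimensions), the projection convention (query mapped with `U_k.T @ q`, groups represented by their scaled right singular vectors), and above all the combination rule, which is shown as a simple weighted average of the two cosine scores with a hypothetical weight `alpha`.

```python
import numpy as np

def cosine(a, B):
    """Cosine similarity between vector a and each row of matrix B."""
    denom = np.linalg.norm(a) * np.linalg.norm(B, axis=1)
    denom[denom == 0] = 1.0  # avoid division by zero for empty rows
    return (B @ a) / denom

def build_lsi(X, k):
    """Truncated SVD of a terms x groups matrix X.

    Returns U_k (terms x k) and the group coordinates (groups x k),
    i.e. the columns of X projected onto the top-k left singular vectors.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_k = U[:, :k]
    group_coords = Vt[:k, :].T * s[:k]
    return U_k, group_coords

def combined_scores(X, q, k, alpha=0.5):
    """Score every group against query q.

    alpha is a hypothetical mixing weight: 1.0 means LSI only,
    0.0 means the ordinary vector-space model only.
    """
    U_k, group_coords = build_lsi(X, k)
    q_latent = U_k.T @ q                      # query in the latent space
    lsi = cosine(q_latent, group_coords)      # similarity in latent space
    vsm = cosine(q, X.T)                      # similarity in raw term space
    return alpha * lsi + (1 - alpha) * vsm, lsi, vsm

# Toy data (assumed): rows are terms, columns are organizational groups.
terms = ["retrieval", "indexing", "speech", "acoustic", "query"]
X = np.array([
    [2, 0, 1],
    [1, 0, 1],
    [0, 3, 0],
    [0, 2, 0],
    [1, 0, 2],
], dtype=float)

# Query containing the terms "retrieval" and "query".
q = np.array([1, 0, 0, 0, 1], dtype=float)

scores, lsi, vsm = combined_scores(X, q, k=2, alpha=0.5)
ranking = np.argsort(-scores)  # group indices, best match first
print("combined scores:", np.round(scores, 3))
print("ranking:", ranking)
```

In this toy setup the group whose documents only mention speech-related terms (column index 1) shares no vocabulary with the query, so it ranks last under either method. In a real system the matrix entries would typically be weighted term frequencies rather than raw counts, and the choice of `alpha` would need to be tuned empirically.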

Keywords:

Review history: Received 4 January 1989, Accepted 24 February 1989, Available online 19 July 2002.

Paper URL: https://doi.org/10.1016/0306-4573(89)90100-3