Choosing document structure weights

Authors:

Highlights:

Abstract

Existing ranking schemes assume that all term occurrences in a given document have equal influence. Intuitively, terms occurring in some places should have a greater influence than those occurring elsewhere; for example, an occurrence in an abstract may be more important than an occurrence in the body text. Although this observation is not new, there remains the issue of finding good weights for each structure. Vector space, probabilistic, and Okapi BM25 ranking are extended to include structure weighting. Weights are then selected for the TREC WSJ collection using a genetic algorithm, and the learned weights are tested on an evaluation set of queries. Structure-weighted vector space inner product and structure-weighted probabilistic retrieval show an improvement of about 5% in mean average precision over their unstructured counterparts. Structure-weighted BM25 shows almost no improvement. Analysis suggests BM25 cannot be improved using structure weighting.
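To make the idea of structure weighting concrete, the following is a minimal sketch of a structure-weighted inner product score. It is not the paper's formulation: the structure names, the weight values, the whitespace tokenizer, and the omission of any inverse document frequency component are all assumptions made for illustration. The abstract states only that weights are learned with a genetic algorithm; the values below are placeholders.

```python
from collections import Counter

# Hypothetical structure weights chosen by hand for illustration only.
# The paper learns its weights with a genetic algorithm.
STRUCTURE_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}


def weighted_tf(doc, weights):
    """Return term frequencies where each occurrence counts as the
    weight of the structure it appears in (default weight 1.0)."""
    tf = Counter()
    for structure, text in doc.items():
        w = weights.get(structure, 1.0)
        for term in text.lower().split():
            tf[term] += w
    return tf


def inner_product_score(query, doc, weights):
    """Inner product of raw query term counts and structure-weighted
    document term frequencies (no idf, an assumed simplification)."""
    q_tf = Counter(query.lower().split())
    d_tf = weighted_tf(doc, weights)
    return sum(q_count * d_tf[term] for term, q_count in q_tf.items())


# A toy document split into structures (hypothetical content).
doc = {
    "title": "structure weights",
    "abstract": "ranking with structure",
    "body": "ranking ranking terms",
}

structured = inner_product_score("structure ranking", doc, STRUCTURE_WEIGHTS)
unstructured = inner_product_score("structure ranking", doc, {})
```

Passing an empty weight dictionary makes every structure count equally, which corresponds to the unstructured baseline that the structure-weighted variants are compared against.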

Keywords: Structured information retrieval, Genetic algorithms, Vector space model, Probability model

Article history: Received 9 July 2003, Accepted 16 October 2003, Available online 27 November 2003.

Official paper URL: https://doi.org/10.1016/j.ipm.2003.10.003