Techniques for improving web retrieval effectiveness

Authors:

Highlights:

Abstract

This paper presents several schemes for improving retrieval effectiveness in the named page finding task of web information retrieval (Overview of the TREC-2002 web track. In: Proceedings of the Eleventh Text Retrieval Conference TREC-2002, NIST Special Publication #500-251, 2003). These schemes were applied on top of a basic information retrieval model as additional mechanisms to improve the system. Using the titles of web pages was found to be effective. Anchor texts of incoming links were confirmed to be beneficial, as suggested in other works. Sentence–query similarity, a new type of information proposed here, was identified as the most useful evidence to exploit. Stratifying and re-ranking the retrieval list by the maximum number of index terms shared between a sentence and the query significantly improved performance. To demonstrate these findings, a large-scale web information retrieval system was developed and used for experimentation.
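
The stratify-and-re-rank idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no details of tokenization, sentence splitting, or index term selection, so the regular expressions and the helper names `index_terms`, `max_common_terms`, and `stratified_rerank` are assumptions made only for illustration. The sketch also assumes that each document is scored by its best-matching sentence and that the original ranking is preserved within each stratum.

```python
import re


def index_terms(text):
    """Lowercased word tokens; an assumed stand-in for the system's real index terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def max_common_terms(document, query):
    """Largest number of query terms shared with any single sentence of the document."""
    query_terms = index_terms(query)
    # Naive sentence splitting on ., ! and ? (an assumption, not from the paper).
    sentences = re.split(r"[.!?]+", document)
    return max((len(index_terms(s) & query_terms) for s in sentences), default=0)


def stratified_rerank(ranked_docs, query):
    """Order documents by stratum (shared-term count, highest first).

    sorted() is stable, so documents in the same stratum keep the order
    given by the underlying retrieval model.
    """
    return sorted(ranked_docs, key=lambda doc: -max_common_terms(doc[1], query))


# Hypothetical initial ranking: (document id, text), best first.
ranked = [
    ("d1", "Travel tips. Passport rules vary by country."),
    ("d2", "How to fill in the passport application form."),
    ("d3", "Application deadlines. Form downloads."),
]
reranked_ids = [doc_id for doc_id, _ in stratified_rerank(ranked, "passport application form")]
print(reranked_ids)
```

In this toy ranking, `d2` contains one sentence with all three query terms and moves to the top, while `d1` and `d3` share a stratum (one common term each) and keep their original relative order.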

Keywords: Web information retrieval, Named page finding, Retrieval effectiveness, Sentence–query similarity

Article history: Received 30 October 2003, Accepted 17 August 2004, Available online 7 October 2004.

Official URL: https://doi.org/10.1016/j.ipm.2004.08.002