Application of automatic topic identification on Excite Web search engine data logs

Abstract

The analysis of contextual information in search engine query logs enhances the understanding of Web users’ search patterns. Obtaining contextual information from Web search engine logs is difficult, since users submit few queries and search on multiple topics. Identifying topic changes within a search session is an important branch of search engine user behavior analysis. The purpose of this study is to investigate the properties of a specific topic identification methodology in detail and to test its validity. The performance of the topic identification algorithm is questionable in several cases. These cases are explored, and the reasons underlying the inconsistent performance of automatic topic identification are investigated using statistical analysis and experimental design techniques.

Keywords: Search engine, Topic identification, Session identification, Genetic algorithm, Dempster–Shafer Theory

Article history: Received 5 January 2004, Accepted 23 April 2004, Available online 13 July 2004.

Article URL: https://doi.org/10.1016/j.ipm.2004.04.018