Reasoning for Web document associations and its applications in site map construction

Authors:

Highlights:

Abstract

Recently, there has been interest in using associations between Web pages to provide users with pages relevant to what they are currently viewing. We believe that, to enable intelligent decisions, we need to answer the question “for a given set of pages, find out why they are associated”. We present a framework for reasoning about Web document associations. We start from the observation that the reasons for Web page associations are implicitly embedded in the content of the pages as well as in the links connecting them. The association reasoning scheme we propose is based on a random walk algorithm. This algorithm takes both link structure and page content into consideration and allows users to specify a focus. We then show how the proposed algorithm, combined with a logical domain identification technique, can be used for Web site summarization and Web site map construction to help users navigate complex corporate sites. To achieve this goal, it is essential to recover the Web authors' intentions and superimpose them on the users' retrieval contexts when summarizing Web sites. Therefore, we present a framework that uses logical neighborhoods, entry pages, and associations of entry pages to create context-sensitive summaries and maps of complex Web sites.
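
The abstract does not spell out the random walk itself, so the following is only a minimal sketch of one way such a scheme could look, not the authors' algorithm. It assumes a random walk with restart over an undirected view of the link graph, where the walker restarts at the given set of pages and prefers neighbors whose content is relevant to the user's focus. The function name `association_scores`, the `1 + relevance` weighting, and the `damping` value are all hypothetical choices made for illustration.

```python
def association_scores(links, relevance, seeds, damping=0.85, iterations=100):
    """Score pages by how strongly a focused random walk ties them to `seeds`.

    links:     dict mapping a page to the pages it links to (hypothetical input format)
    relevance: dict mapping a page to a non-negative focus-relevance value
    seeds:     the set of pages whose association we want to explain
    """
    nodes = set(links) | {v for targets in links.values() for v in targets} | set(seeds)

    # Treat links as undirected so association can flow both ways (an assumption).
    neighbors = {n: set() for n in nodes}
    for u, targets in links.items():
        for v in targets:
            if u != v:
                neighbors[u].add(v)
                neighbors[v].add(u)

    # The walker restarts uniformly at one of the seed pages.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(restart)

    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for u in nodes:
            mass = damping * score[u]
            if not neighbors[u]:
                # Isolated page: return its mass to the seeds.
                for n in nodes:
                    nxt[n] += mass * restart[n]
                continue
            # Content bias: neighbors relevant to the focus attract more of the walk.
            weights = {v: 1.0 + relevance.get(v, 0.0) for v in neighbors[u]}
            total = sum(weights.values())
            for v, w in weights.items():
                nxt[v] += mass * w / total
        score = nxt
    return score
```

Under these assumptions, pages that lie close to all seed pages and match the focus receive the highest scores, which makes them candidate explanations for why the seed pages are associated. For example, with `links = {"a": ["x", "y"], "b": ["x"], "y": ["z"]}`, seeds `{"a", "b"}`, and `relevance = {"x": 1.0}`, page `x` scores higher than `y` or `z`.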

Keywords: Reasoning about associations, Link analysis, Random walk, WWW, Topic distillation, Connectivity

Review history: Received 16 August 2001, Revised 12 December 2001, Accepted 6 February 2002, Available online 29 March 2002.

Paper URL: https://doi.org/10.1016/S0169-023X(02)00053-8