Word AdHoc Network: Using Google Core Distance to extract the most relevant information

Abstract

In recent years, finding the most relevant documents or search results in a search engine has become an important issue. Most previous research has focused on expanding the keyword into a more meaningful sequence or on using a higher-level concept to form a semantic search. All of those methods need predictive models built from training data or from Web logs of users’ browsing behavior. As a result, they can only be used in a single knowledge domain, both because of the complexity of model construction and because the keyword extraction methods are limited to certain areas. In this paper, we describe a new algorithm called “Word AdHoc Network” (WANET) and use it to extract the most important sequences of keywords in order to provide the most relevant search results to the user. Our method needs no pre-processing, and all execution happens in real time. Thus, we can use this system to extract keyword sequences from any of various knowledge domains. Our experiments show that the keyword sequences extracted from documents achieve high accuracy and, in most cases, find the most relevant information in the top-ranked search result. This new system can help users find useful information more effectively for the articles or research papers they are reading or writing.

Keywords: Similarity distance, Search engines, Information retrieval, Keyword sequence, n-gram

Review history: Received 4 March 2010, Revised 23 November 2010, Accepted 23 November 2010, Available online 29 November 2010.

Paper URL: https://doi.org/10.1016/j.knosys.2010.11.006