Coverage-based query subtopic diversification leveraging semantic relevance

Authors: Md. Shajalal, Masaki Aono

Abstract

Users are generally terse when describing their search intent in the queries they submit to a search engine. As a result, many search queries are short, ambiguous, and open to multiple interpretations. Given the enormous size of the web, ignoring the information needs underlying such queries can mislead the search engine. An effective way to mitigate these issues is to diversify the search results by considering query subtopics with diverse intents. The task of identifying the possible subtopics with diverse intents underlying a query is known as subtopic mining. This paper aims to mine and diversify the subtopics underlying a query. Our method first extracts noun phrases containing the query terms from the top-retrieved web documents. We also extract query suggestions and completions from commercial search engines. The extracted candidates that are highly related to the query are then selected as subtopics. We introduce a new relatedness score function to estimate the degree of relatedness between the query and a candidate. To estimate the relevance between the query and a subtopic, this paper introduces a semantic relevance measure based on a locally trained sentence embedding model. Finally, we propose a novel coverage-based diversification technique that ranks the subtopics by combining their relevance with their coverage as estimated from the web documents. Experimental results on two NTCIR English subtopic mining datasets demonstrate that our proposed method achieves new state-of-the-art performance and significantly outperforms some known related methods in terms of the relevance (D-nDCG) and diversity (D#-nDCG) metrics at a cutoff of 10.
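To make the final ranking step concrete, the following is a minimal sketch of one way relevance and coverage could be combined in a greedy ranking. It is not the authors' formulation, which the abstract does not specify. The function name `diversify`, the linear weight `lam`, and the definition of coverage as the fraction of documents not yet covered by earlier picks are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def diversify(
    relevance: Dict[str, float],
    coverage: Dict[str, Set[str]],
    lam: float = 0.5,
) -> List[str]:
    """Greedily rank subtopics by relevance plus newly covered documents.

    relevance: hypothetical relevance score per subtopic, assumed in [0, 1].
    coverage:  hypothetical map from subtopic to the ids of documents it covers.
    lam:       assumed trade-off weight between relevance and coverage.
    """
    all_docs: Set[str] = set().union(*coverage.values())
    covered: Set[str] = set()
    remaining = set(relevance)
    ranking: List[str] = []

    while remaining:
        def gain(subtopic: str) -> float:
            # Only documents not covered by earlier picks count as novel.
            new_docs = coverage.get(subtopic, set()) - covered
            novelty = len(new_docs) / len(all_docs) if all_docs else 0.0
            return lam * relevance[subtopic] + (1 - lam) * novelty

        # Sort first so that ties are broken deterministically by name.
        best = max(sorted(remaining), key=gain)
        ranking.append(best)
        covered |= coverage.get(best, set())
        remaining.remove(best)

    return ranking


if __name__ == "__main__":
    rel = {"a": 0.9, "b": 0.85, "c": 0.5}
    cov = {"a": {"d1", "d2"}, "b": {"d1", "d2"}, "c": {"d3", "d4"}}
    print(diversify(rel, cov))
```

In this toy input, subtopic `b` is nearly as relevant as `a` but covers the same documents, so the less relevant `c`, which covers new documents, is ranked ahead of it.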

Keywords: Subtopic mining, Relatedness, Sentence embedding, Coverage-based diversification

Paper URL: https://doi.org/10.1007/s10115-020-01470-3