Regularized query classification using search click information

Abstract

Hundreds of millions of users submit queries to Web search engines each day. User queries are typically very short, which makes query understanding a challenging problem. In this paper, we propose a novel approach to query representation and classification. By submitting a query to a Web search engine, the query can be represented as the set of terms found on the Web pages that the search engine returns. In this way, each query can be treated as a point in a high-dimensional space, and standard classification algorithms such as regression can be applied. However, traditional regression is too flexible when there are large numbers of highly correlated predictor variables, and it may overfit. By using search click information, the semantic relationship between queries can be incorporated into the learning system as a regularizer. Specifically, among all functions that minimize the empirical loss on the labeled queries, we select the one that best preserves the semantic relationship between queries. We present experimental evidence suggesting that the regularized regression algorithm uses search click information effectively for query classification.
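To make the idea of a click-based regularizer concrete, the following is a minimal sketch, not the paper's actual formulation, which the abstract does not spell out. It assumes a squared loss on the labeled queries, a graph Laplacian penalty built from a symmetric query similarity matrix `S` (imagined here as derived from shared clicks), and a small ridge term `mu` for numerical stability. The names `fit`, `score`, `solve`, `lam`, and `mu`, and the toy data, are all hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit(X, labels, S, lam=1.0, mu=1e-3):
    """Minimize, over weight vectors w (assumed objective, not from the paper):
         sum over labeled i of (w.x_i - y_i)^2
         + lam * (1/2) * sum over i, j of S[i][j] * (w.x_i - w.x_j)^2
         + mu * ||w||^2
    X: term vectors for all queries; labels: {query index: target value};
    S: symmetric similarity matrix between queries."""
    n, d = len(X), len(X[0])
    # Graph Laplacian L = D - S, where D holds the row sums of S.
    L = [[(sum(S[i]) if i == j else 0.0) - S[i][j] for j in range(n)]
         for i in range(n)]
    # Normal equations: (X_l^T X_l + lam * X^T L X + mu * I) w = X_l^T y
    A = [[0.0] * d for _ in range(d)]
    for a in range(d):
        for b in range(d):
            fit_term = sum(X[i][a] * X[i][b] for i in labels)
            graph_term = sum(X[i][a] * L[i][j] * X[j][b]
                             for i in range(n) for j in range(n))
            A[a][b] = fit_term + lam * graph_term + (mu if a == b else 0.0)
    rhs = [sum(X[i][a] * labels[i] for i in labels) for a in range(d)]
    return solve(A, rhs)


def score(w, x):
    """Linear prediction w.x for one query's term vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Toy data: four queries, each containing a different single term.
X = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
labels = {0: 1.0, 1: -1.0}            # only queries 0 and 1 are labeled
S = [[0, 0, 1, 0], [0, 0, 0, 1],      # query 2 shares clicks with query 0,
     [1, 0, 0, 0], [0, 1, 0, 0]]      # query 3 shares clicks with query 1

w = fit(X, labels, S, lam=1.0)
scores = [score(w, x) for x in X]
```

In this toy setup the unlabeled queries 2 and 3 share no terms with the labeled ones, so regression without the click term (`lam=0.0`) has nothing to say about them. With the click term switched on, each unlabeled query is pulled toward the label of the query it shares clicks with.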

Keywords: Query classification, Query representation, Web search, Regression, Regularization, User logs

Article history: Received 26 September 2006, Revised 1 October 2007, Accepted 9 January 2008, Available online 25 January 2008.

Official paper URL: https://doi.org/10.1016/j.patcog.2008.01.010