Combining compound and single terms under language model framework

Authors: Arezki Hammache, Mohand Boughanem, Rachid Ahmed-Ouamer

Abstract

Most existing information retrieval models, including the probabilistic and vector space models, are based on the term independence hypothesis. To go beyond this assumption and thereby capture the semantics of documents and queries more accurately, several works have incorporated phrases or other syntactic information into IR; such attempts have shown slight benefit, at best. In language modeling approaches in particular, this extension is achieved through the use of bigram or n-gram models. However, in these models all bigrams/n-grams are considered and weighted uniformly. In this paper we introduce a new approach to select and weight the relevant n-grams associated with a document. Experimental results on three TREC test collections showed an improvement over three strong state-of-the-art baselines: the original unigram language model, the Markov Random Field model, and the positional language model.
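To make the "uniform n-gram" setting concrete, here is a minimal sketch of a generic bigram query-likelihood model, in which every query bigram is used and receives the same interpolation weight. This is an illustrative baseline under stated assumptions, not the authors' selection-and-weighting method: the function name `bigram_query_likelihood`, the Dirichlet parameter `mu`, the interpolation weight `lam`, and the add-one smoothing of collection statistics are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def bigram_query_likelihood(query, doc, collection, mu=2000.0, lam=0.8):
    """Log P(query | doc) under a unigram+bigram document language model.

    Hypothetical baseline: every query bigram is used with the same weight
    `lam` (assumes 0 <= lam < 1), i.e. bigrams are weighted uniformly.
    """
    # Collection statistics; add-one smoothing (an assumption) keeps
    # probabilities of terms unseen in the collection above zero.
    coll_tf = Counter(t for d in collection for t in d)
    coll_len = sum(coll_tf.values())
    vocab = len(coll_tf)

    doc_tf = Counter(doc)
    doc_bi = Counter(zip(doc, doc[1:]))  # adjacent word pairs in the document
    dlen = len(doc)

    def p_uni(w):
        # Dirichlet-smoothed unigram probability P(w | D).
        p_coll = (coll_tf[w] + 1) / (coll_len + vocab + 1)
        return (doc_tf[w] + mu * p_coll) / (dlen + mu)

    def p_bi(prev, w):
        # Maximum-likelihood bigram estimate, linearly interpolated with the unigram.
        ml = doc_bi[(prev, w)] / doc_tf[prev] if doc_tf[prev] else 0.0
        return lam * ml + (1 - lam) * p_uni(w)

    if not query:
        return 0.0
    # First query term scored by the unigram model, the rest conditioned on the previous term.
    score = math.log(p_uni(query[0]))
    for prev, w in zip(query, query[1:]):
        score += math.log(p_bi(prev, w))
    return score


# Usage: same unigram counts, but only d1 contains "language model" as an adjacent pair.
d1 = ["language", "model", "for", "retrieval"]
d2 = ["model", "for", "language", "retrieval"]
coll = [d1, d2]
q = ["language", "model"]
print(bigram_query_likelihood(q, d1, coll) > bigram_query_likelihood(q, d2, coll))  # True
```

Because `lam` is the same for every bigram, a meaningless adjacent pair counts as much as a genuine compound term; that uniform treatment is the limitation the abstract says its n-gram selection and weighting is meant to address.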

Keywords: Compound term weighting, Term dominance, Information retrieval, Language model

Official paper link: https://doi.org/10.1007/s10115-013-0618-x