The Dilution/Concentration conditions for cross-language information retrieval models

Authors:

Highlights:

Abstract

Experimental results in cross-language information retrieval (CLIR) do not by themselves indicate why a model fails or how it could be improved. A basic research question is thus whether one can provide conditions under which any existing or new CLIR strategy can be evaluated analytically and which can guide the design of better CLIR models. Inspired by the heuristics of monolingual IR, this paper introduces Dilution/Concentration (D/C) conditions that characterize good CLIR models based on direct intuitions under artificial settings. The conditions, derived from first principles in CLIR, generalize the idea behind the query structuring approach. Empirical results with state-of-the-art CLIR models show that a method that fails to satisfy a condition is often non-optimal. In general, we find that the empirical performance of a retrieval formula is tightly related to how well it satisfies the conditions. Lastly, following the D/C conditions, we propose several novel CLIR models built on the information-based models, which again shows that the D/C conditions are effective at characterizing good CLIR models.

Keywords: Cross-language information retrieval, D/C condition, Information retrieval heuristic

Review history: Received 5 March 2017, Revised 17 November 2017, Accepted 26 November 2017, Available online 9 December 2017, Version of Record 9 December 2017.

Official paper URL: https://doi.org/10.1016/j.ipm.2017.11.008