Studying the effect and treatment of misspelled queries in Cross-Language Information Retrieval

Authors:

Highlights:

• We study the effects of misspelled queries on the performance of CLIR systems.

• Word-based approaches (as both indexing and translation units) are highly sensitive to the presence of misspellings.

• The use of spelling correction mechanisms can significantly reduce the negative effects of misspellings.

• Classical correction techniques are suitable for shorter queries, while context-based corrections are suitable for longer queries.

• Our approach based on character n-grams (as both indexing and translation units) shows remarkable robustness to misspellings.
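The following minimal Python sketch shows why character n-grams tolerate misspellings better than whole words. It is not the authors' system. Several details are illustrative assumptions:

- the n-gram length of 4;
- keeping words shorter than n whole;
- the set-overlap score;
- the hypothetical helper names `char_ngrams` and `overlap`.

When the only query term is misspelled, it loses its word-level match entirely. Most of its 4-grams still match the document.

```python
def char_ngrams(text: str, n: int = 4) -> list[str]:
    """Split each word of `text` into overlapping character n-grams.

    Assumption for illustration: words of length <= n are kept whole.
    """
    grams = []
    for word in text.lower().split():
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def overlap(query: list[str], doc: list[str]) -> float:
    """Hypothetical score: fraction of distinct query units found in the document."""
    q, d = set(query), set(doc)
    return len(q & d) / len(q) if q else 0.0


doc = "information retrieval"
correct = "retrieval"
misspelled = "retrievel"  # one substituted character

# Word units: the misspelled term matches nothing in the document.
print(overlap(correct.split(), doc.split()))     # 1.0
print(overlap(misspelled.split(), doc.split()))  # 0.0

# Character 4-grams: "retr", "etri", "trie", "riev" survive the error.
print(round(overlap(char_ngrams(correct), char_ngrams(doc)), 3))     # 1.0
print(round(overlap(char_ngrams(misspelled), char_ngrams(doc)), 3))  # 0.667
```

A real CLIR system would weight these units with a ranking function instead of a raw overlap ratio. In the n-gram approach described in the highlights, it would also translate at the n-gram level. Both steps are outside the scope of this sketch.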

Keywords: Misspelled queries, Cross-Language Information Retrieval, Machine translation, Spelling correction, Character n-grams

Article history: Received 14 October 2015, Revised 16 December 2015, Accepted 17 December 2015, Available online 12 January 2016, Version of Record 17 May 2016.

Official page: https://doi.org/10.1016/j.ipm.2015.12.010