Aspects of Swedish morphology and semantics from the perspective of mono- and cross-language information retrieval

Abstract

This paper analyzes the features of the Swedish language from the viewpoint of monolingual and cross-language information retrieval (IR and CLIR). The study was motivated by the fact that Swedish is poorly understood from the IR perspective. The paper shows that Swedish has unique features, in particular gender features, the use of fogemorphemes (joining morphemes) in the formation of compound words, and a high frequency of homographic words. Correct word normalization and compound splitting are essential, especially in dictionary-based CLIR. The study shows, however, that the publicly available morphological analysis tools used for normalization and compound splitting have pitfalls that might decrease the effectiveness of IR and CLIR. A comparative study was performed to test the degree of lexical ambiguity in Swedish, Finnish and English. The results suggest that part-of-speech tagging might be useful in Swedish IR due to the high frequency of homographic words.
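
As a rough illustration of why compound splitting and fogemorphemes matter for dictionary-based retrieval, the following minimal sketch splits a Swedish compound against a word list, optionally skipping a joining "s" between components. It is not the method or tooling evaluated in the paper: the function name `split_compound`, the toy lexicon, and the restriction to the single joining element "s" are all assumptions made for this example.

```python
# Hypothetical sketch: lexicon-driven compound splitting with an optional
# joining morpheme. The lexicon and the linker set are toy assumptions.

LEXICON = {"fotboll", "lag", "barn", "bok"}  # assumed toy word list
LINKERS = ("", "s")  # "" = plain concatenation, "s" = joining morpheme


def split_compound(word, lexicon=LEXICON, linkers=LINKERS):
    """Return a list of lexicon words that make up `word`, or None."""
    if word in lexicon:
        return [word]
    # Try every prefix that is a known word, shortest first.
    for i in range(1, len(word)):
        head = word[:i]
        if head not in lexicon:
            continue
        rest = word[i:]
        for link in linkers:
            if not rest.startswith(link):
                continue
            tail = rest[len(link):]
            if not tail:
                continue
            parts = split_compound(tail, lexicon, linkers)
            if parts is not None:
                return [head] + parts
    return None


print(split_compound("fotbollslag"))  # fotboll + s + lag
print(split_compound("barnbok"))      # barn + bok
```

Even this toy version hints at the kind of pitfall the abstract mentions: if the word "slag" were added to the lexicon, the plain-concatenation reading would be tried first and "fotbollslag" would be split as "fotboll" + "slag" instead of "fotboll" + "lag".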

Keywords: Text retrieval, Cross-language information retrieval, Swedish language, Natural language processing

Article history: Received 21 January 2000, Accepted 12 April 2000, Available online 6 December 2000.

DOI: https://doi.org/10.1016/S0306-4573(00)00024-8