Effective foreign word extraction for Korean information retrieval

Authors:

Abstract

Korean text makes frequent use of foreign words, most of which are transliterations of English words. Because most of them are technical terms or names, foreign words are usually very important index terms in Korean information retrieval, so accurate foreign word extraction is crucial for high retrieval performance. Accurate extraction is difficult, however, because it inevitably involves word segmentation and most foreign words are unknown words. In this paper, we present an effective method for recognizing and extracting foreign words. To extract foreign words accurately, we developed a word segmentation method that handles unknown foreign words. It draws on two sources: unknown-word information acquired through automatic dictionary compilation, and foreign word recognition information. Unlike the previously proposed method, our HMM-based foreign word recognition method does not require a large set of labeled examples for model training.
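The abstract does not specify the HMM's states, observations, or training procedure, so the following is only an illustrative sketch of how HMM-based foreign word recognition could look in principle. It assumes a hypothetical two-state, syllable-level model (K for native Korean syllables, F for transliterated foreign syllables) decoded with the standard Viterbi algorithm. All probabilities, the names `viterbi` and `foreign_spans`, and the `TOY_*` tables are invented for illustration and are not taken from the paper.

```python
import math

# Hypothetical state set: K = native Korean syllable, F = foreign (transliterated) syllable.
STATES = ("K", "F")

# Toy, hand-set parameters (assumptions for illustration, not values from the paper).
TOY_START = {"K": 0.5, "F": 0.5}
TOY_TRANS = {"K": {"K": 0.8, "F": 0.2}, "F": {"K": 0.3, "F": 0.7}}
TOY_EMIT = {
    "K": {"를": 0.4, "사": 0.2, "용": 0.2, "터": 0.05},
    "F": {"컴": 0.3, "퓨": 0.3, "터": 0.3},
}


def viterbi(syllables, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable K/F tag sequence for a list of syllables.

    Unseen (state, syllable) pairs receive the small probability `floor`.
    """
    if not syllables:
        return []

    def emit(state, syl):
        return math.log(emit_p[state].get(syl, floor))

    # scores[t][s]: best log-probability of any path ending in state s at position t.
    scores = [{s: math.log(start_p[s]) + emit(s, syllables[0]) for s in STATES}]
    back = []  # back[t-1][s]: best predecessor of state s at position t
    for syl in syllables[1:]:
        prev_row = scores[-1]
        row, ptr = {}, {}
        for s in STATES:
            best_prev = max(
                STATES, key=lambda p: prev_row[p] + math.log(trans_p[p][s])
            )
            row[s] = prev_row[best_prev] + math.log(trans_p[best_prev][s]) + emit(s, syl)
            ptr[s] = best_prev
        scores.append(row)
        back.append(ptr)

    # Trace the best path backwards from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    tags = [state]
    for ptr in reversed(back):
        state = ptr[state]
        tags.append(state)
    tags.reverse()
    return tags


def foreign_spans(syllables, tags):
    """Join maximal runs of F-tagged syllables into candidate foreign words."""
    spans, current = [], []
    for syl, tag in zip(syllables, tags):
        if tag == "F":
            current.append(syl)
        elif current:
            spans.append("".join(current))
            current = []
    if current:
        spans.append("".join(current))
    return spans


# Example: the transliteration of "computer" followed by the object particle.
example = ["컴", "퓨", "터", "를"]
example_tags = viterbi(example, TOY_START, TOY_TRANS, TOY_EMIT)
example_words = foreign_spans(example, example_tags)
```

In this toy setup, a run of F-tagged syllables yields both a candidate foreign word and a segmentation boundary where the tags switch back to K, which hints at why recognition and segmentation are coupled. A real system would estimate the parameters from data rather than set them by hand.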

Keywords: Information retrieval, Foreign word recognition, Word segmentation

Review history: Received 4 April 2000, Accepted 27 October 2000, Available online 29 August 2001.

Paper URL: https://doi.org/10.1016/S0306-4573(00)00065-0