English–Arabic collocation extraction to enhance Arabic collocation identification

Author: Chiraz Ben Othmane Zribi

Abstract

Bilingual collocation extraction could improve the performance of monolingual extraction. This is especially true for the English–Arabic pair, where bilingual extraction can help overcome the difficulties of extracting Arabic collocations. In this paper, we present two novel approaches, one for extracting monolingual collocations and one for extracting bilingual collocations. The monolingual approach is hybrid, combining linguistic patterns with statistical measures. During statistical filtering, we propose combining vector-based measures with several association measures through a voting procedure. The bilingual approach draws on several cues: position, frequency, cross-language correspondence between POS patterns, distribution, and translation. It enhances monolingual collocation extraction by considering not only collocation equivalents that are direct translations of each other but also those that are not. Indeed, it can validate unconfirmed collocations on the grounds that they translate confirmed ones. The results show, in particular, how extracting English–Arabic collocations can improve the extraction of Arabic collocations: the precision of Arabic collocation extraction rose from about 86% to 96%.
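
The abstract does not say which association measures, which vector-based measure, or which voting rule is used. As a minimal sketch under stated assumptions, the filter below uses PMI, Dice, and t-score as association measures, cosine similarity between the two words' vectors as a stand-in vector-based measure, and a simple majority vote. All counts, vectors, and thresholds (N, UNIGRAMS, BIGRAMS, VECTORS, THRESHOLDS) are hypothetical toy values, not data or settings from the paper.

```python
import math
from typing import Dict, List, Tuple

# Toy corpus statistics (hypothetical numbers, for illustration only).
N = 10_000  # total number of tokens
UNIGRAMS: Dict[str, int] = {"strong": 50, "tea": 40, "the": 800}
BIGRAMS: Dict[Tuple[str, str], int] = {("strong", "tea"): 20, ("the", "tea"): 8}

# Toy word vectors standing in for LSA or skip-gram embeddings (hypothetical).
VECTORS: Dict[str, List[float]] = {
    "strong": [0.9, 0.1, 0.3],
    "tea": [0.8, 0.2, 0.4],
    "the": [0.1, 0.9, 0.0],
}

# Per-measure acceptance thresholds (assumed values, not taken from the paper).
THRESHOLDS: Dict[str, float] = {"pmi": 3.0, "dice": 0.1, "t": 2.576, "cosine": 0.5}


def pmi(c_xy: int, c_x: int, c_y: int, n: int) -> float:
    """Pointwise mutual information: log2(P(x, y) / (P(x) * P(y)))."""
    return math.log2(c_xy * n / (c_x * c_y))


def dice(c_xy: int, c_x: int, c_y: int) -> float:
    """Dice coefficient: 2 * c(x, y) / (c(x) + c(y))."""
    return 2 * c_xy / (c_x + c_y)


def t_score(c_xy: int, c_x: int, c_y: int, n: int) -> float:
    """t-score: (observed - expected) / sqrt(observed)."""
    expected = c_x * c_y / n
    return (c_xy - expected) / math.sqrt(c_xy)


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors (assumed vector-based measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def scores_for(pair: Tuple[str, str]) -> Dict[str, float]:
    """Compute every measure for one candidate word pair."""
    x, y = pair
    c_xy, c_x, c_y = BIGRAMS[pair], UNIGRAMS[x], UNIGRAMS[y]
    return {
        "pmi": pmi(c_xy, c_x, c_y, N),
        "dice": dice(c_xy, c_x, c_y),
        "t": t_score(c_xy, c_x, c_y, N),
        "cosine": cosine(VECTORS[x], VECTORS[y]),
    }


def is_collocation(pair: Tuple[str, str]) -> bool:
    """Majority vote: accept when more than half the measures pass their threshold."""
    scores = scores_for(pair)
    votes = sum(1 for name, value in scores.items() if value >= THRESHOLDS[name])
    return votes > len(scores) / 2


for candidate in BIGRAMS:
    print(candidate, is_collocation(candidate))
```

With these toy numbers, ("strong", "tea") passes all four measures and is kept, while ("the", "tea") passes none and is filtered out. Majority voting is only one possible combination rule; weighted votes or a minimum number of agreeing measures would fit the same structure.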

Keywords: Monolingual collocation, Bilingual collocation, LSA, Word embeddings, Skip-gram, Association measure


Paper URL: https://doi.org/10.1007/s10115-019-01428-0