Improved Unsupervised Neural Machine Translation with Semantically Weighted Back Translation for Morphologically Rich and Low Resource Languages

Authors: Shweta Chauhan, Shefali Saxena, Philemon Daniel

Abstract

Back-translation is an effective method for exploiting monolingual data to improve neural machine translation models, and conducting it iteratively can improve performance further. In back-translation, pseudo sentence pairs are generated to train the translation system with a reconstruction loss, but not all of these pairs are of good quality, and the poor ones can severely degrade the performance of neural machine translation systems. This paper proposes an approach to unsupervised neural machine translation that uses weighted back-translation as part of the training process, giving more weight to good pseudo-parallel sentence pairs. The weight for each pseudo-parallel sentence pair is its round-trip semantic similarity score. This overcomes the limitation of earlier lexical metric-based approaches, especially for morphologically rich languages. Experimental results show an improvement of up to around 0.7% in BLEU score over the baseline for the morphologically rich language pairs (English–Hindi, English–Tamil, and English–Telugu) and 0.3% in BLEU score for the low-resource Hindi–Kangri pair.
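
The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formula, so the use of cosine similarity between sentence embeddings, the clipping to [0, 1], and the normalization by the sum of weights are all assumptions, and the function names are hypothetical. Sentence embeddings are taken as given vectors; in practice they would come from some multilingual sentence encoder.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def round_trip_weight(src_vec: Sequence[float], round_trip_vec: Sequence[float]) -> float:
    """Hypothetical weight: similarity between the embedding of the original
    sentence and the embedding of its round-trip (src -> tgt -> src) translation.
    Clipping negative similarities to 0 is an assumption of this sketch."""
    return max(0.0, cosine_similarity(src_vec, round_trip_vec))


def weighted_reconstruction_loss(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of per-sentence reconstruction losses.
    Normalizing by the sum of weights is an assumption, not taken from the paper."""
    total_weight = sum(weights)
    if total_weight == 0.0:
        return 0.0
    return sum(w * l for w, l in zip(weights, losses)) / total_weight


if __name__ == "__main__":
    # Toy embeddings: the first round trip preserves meaning, the second does not.
    src_vecs = [[1.0, 0.0], [1.0, 0.0]]
    round_trip_vecs = [[1.0, 0.0], [0.0, 1.0]]
    per_sentence_losses = [2.0, 10.0]

    weights = [round_trip_weight(s, r) for s, r in zip(src_vecs, round_trip_vecs)]
    print(weights, weighted_reconstruction_loss(per_sentence_losses, weights))
```

In this toy setup a pseudo pair whose round-trip translation drifts away from the source in embedding space contributes little or nothing to the training loss, which is the behavior the abstract describes for low-quality pseudo-parallel pairs.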

Keywords: Back translation, Neural machine translation, Evaluation metrics, Semantic analysis

Paper URL: https://doi.org/10.1007/s11063-021-10702-8