Bilingual recursive neural network based data selection for statistical machine translation

Authors:

Abstract

Data selection is a widely used and effective approach to domain adaptation in statistical machine translation (SMT). The dominant methods are perplexity-based; they do not consider whether the two sides of a sentence pair are mutual translations, and they tend to select short sentences. To address these problems, we propose bilingual semi-supervised recursive neural network data selection methods that differentiate domain-relevant data from out-of-domain data. The proposed methods are evaluated on the task of building domain-adapted SMT systems. We present extensive comparisons and show that the proposed methods outperform state-of-the-art data selection approaches.

Keywords: Data selection, Machine translation, Domain adaptation, Recursive neural network, Autoencoder

Review history: Received 26 October 2015, Revised 28 April 2016, Accepted 6 May 2016, Available online 9 May 2016, Version of Record 12 August 2016.

Paper URL: https://doi.org/10.1016/j.knosys.2016.05.003