Semantically Smooth Bilingual Phrase Embeddings Based on Recursive Autoencoders

Authors: Qian Lin, Jing Yang, Xiangwen Zhang, Hongji Wang, Yaojie Lu, Jinsong Su

Abstract

In this paper, we propose Semantically Smooth Bilingual Recursive Autoencoders to learn bilingual phrase embeddings. The intuition behind our work is to exploit the intrinsic geometric structure of the embedding space and constrain the learned phrase embeddings to be semantically smooth. Specifically, we extend conventional bilingual recursive autoencoders with regularization terms that preserve the translation and paraphrase probability distributions, so that the model simultaneously exploits richer explicit and implicit similarity constraints on bilingual phrase embeddings. To examine the effectiveness of our model, we incorporate two phrase-level similarity features based on the proposed model into a state-of-the-art phrase-based statistical machine translation system. Experiments on NIST Chinese–English test sets show that our model achieves substantial improvements over the baseline.

Keywords: Bilingual phrase embeddings, Similarity constraints, Machine translation

Paper URL: https://doi.org/10.1007/s11063-020-10210-1