Extracting lexical and phrasal paraphrases: a review of the literature

Authors: ChukFong Ho, Masrah Azrifah Azmi Murad, Shyamala Doraisamy, Rabiah Abdul Kadir

Abstract

Recent advances in natural language processing have increased the popularity of paraphrase extraction. Most of the attention, however, has focused on the extraction methods alone, without taking the resource factor into consideration. Yet there is a strong relationship between the two, and the resource factor plays an equally important role in paraphrase extraction. In addition, almost all previous studies have focused on corpus-based methods, which extract paraphrases from corpora based solely on syntactic similarity. Despite the popularity of corpus-based methods, a considerable body of research has consistently shown that these methods are vulnerable to several types of erroneous paraphrases. For these reasons, it is necessary to evaluate whether the trend is moving in a positive direction. This paper reviews the major research on paraphrase extraction methods in detail. It begins by exploring the definition of paraphrase from different perspectives to provide a better understanding of the concept of paraphrase extraction. It then studies the characteristics and potential uses of different types of paraphrase resources. Next, it divides paraphrase extraction methods into four main categories (heuristic-based, knowledge-based, corpus-based and hybrid-based) and summarizes their strengths and weaknesses. The paper concludes with some open research issues as directions for future work.

Keywords: Lexical paraphrase, Paraphrase acquisition, Paraphrase extraction, Phrasal paraphrase, Resource, Validation

Official paper page: https://doi.org/10.1007/s10462-012-9357-8