An external plagiarism detection system based on part-of-speech (POS) tag n-grams and word embedding

Authors:

Highlights:

• We propose a new method based on part-of-speech tag n-grams (POSNG) for external plagiarism detection.

• A combination of Word2Vec and Longest Common Subsequence (LCS) is used to measure semantic similarity.

• We conducted our experiments on the PAN-PC-11 corpus, which is used for the evaluation of automatic plagiarism detection algorithms.

• Compared with the PAN11 detectors, the method achieved the best overall performance on high-obfuscation paraphrasing.
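
To make the two ingredients named in the highlights concrete, here is a minimal Python sketch. It is not the authors' implementation: the hand-assigned POS tags, the two-dimensional `toy_vectors`, the `soft_lcs` helper, and the 0.8 cosine threshold are all illustrative assumptions. A real system would obtain tags from a POS tagger and vectors from a trained Word2Vec model. The sketch shows (1) how POS tag n-grams can expose a shared syntactic pattern even when the words differ, and (2) how an LCS can count embedding-similar words as matches.

```python
from math import sqrt


def ngrams(seq, n):
    """Return all contiguous n-grams of seq as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def soft_lcs(a, b, vectors, threshold=0.8):
    """LCS length where two words match if they are identical or their
    (hypothetical) embedding vectors have cosine similarity >= threshold."""
    def match(x, y):
        if x == y:
            return True
        if x in vectors and y in vectors:
            return cosine(vectors[x], vectors[y]) >= threshold
        return False

    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            best = max(dp[i - 1][j], dp[i][j - 1])
            if match(x, y):
                best = max(best, dp[i - 1][j - 1] + 1)
            dp[i][j] = best
    return dp[-1][-1]


# Toy data: tags and vectors are hand-assigned assumptions, not tagger/model output.
source = ["the", "quick", "fox", "jumps"]
suspicious = ["the", "fast", "fox", "leaps"]
source_tags = ["DT", "JJ", "NN", "VBZ"]
suspicious_tags = ["DT", "JJ", "NN", "VBZ"]
toy_vectors = {
    "quick": [1.0, 0.1], "fast": [0.9, 0.2],
    "jumps": [0.1, 1.0], "leaps": [0.2, 0.9],
}

# Step 1: POS tag trigrams shared by the two passages (syntactic candidate filter).
shared_pos_trigrams = set(ngrams(source_tags, 3)) & set(ngrams(suspicious_tags, 3))

# Step 2: semantic similarity as soft-LCS length normalized by the shorter passage.
similarity = soft_lcs(source, suspicious, toy_vectors) / min(len(source), len(suspicious))
```

In this toy case the paraphrased passage shares no exact trigram of words with the source, yet its POS tag trigrams are identical, and the soft LCS aligns "quick" with "fast" and "jumps" with "leaps". That is the intuition behind using the combination against heavily obfuscated paraphrases.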

Keywords: Plagiarism detection, Part-of-speech (POS) tagging, N-grams, Semantic similarity, Word embedding

Review history: Received 30 September 2020, Revised 18 January 2022, Accepted 11 February 2022, Available online 17 February 2022, Version of Record 23 February 2022.

Official paper URL: https://doi.org/10.1016/j.eswa.2022.116677