An external plagiarism detection system based on part-of-speech (POS) tag n-grams and word embedding
Authors:
Highlights:
• We propose a new method for external plagiarism detection based on part-of-speech tag n-grams (POSNG).
• A combination of Word2Vec and Longest Common Subsequence (LCS) is used to measure semantic similarity.
• We conducted our experiments on the PAN-PC-11 corpus, which is used for evaluating automatic plagiarism detection algorithms.
• Compared with the PAN11 detectors, our method achieved the best overall performance on highly obfuscated paraphrasing.
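The two ideas named in the highlights, POS tag n-grams and an LCS measure combined with word embeddings, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`pos_ngrams`, `jaccard`, `soft_lcs`), the Penn Treebank style tags, the two-dimensional toy vectors standing in for Word2Vec embeddings, and the 0.8 cosine threshold are all assumptions chosen for illustration.

```python
from math import sqrt


def pos_ngrams(tags, n=3):
    """Return the set of contiguous n-grams over a POS tag sequence."""
    return {tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap of two sets (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def soft_lcs(words_a, words_b, vectors, threshold=0.8):
    """LCS length where two words match if they are identical or if
    their embedding cosine similarity reaches the threshold."""
    def match(w1, w2):
        if w1 == w2:
            return True
        if w1 in vectors and w2 in vectors:
            return cosine(vectors[w1], vectors[w2]) >= threshold
        return False

    rows, cols = len(words_a), len(words_b)
    dp = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if match(words_a[i - 1], words_b[j - 1]):
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[rows][cols]


# Hypothetical, hand-tagged toy data (a real system would use a POS tagger
# and trained Word2Vec vectors).
suspicious_words = ["the", "quick", "dog", "runs"]
source_words = ["the", "fast", "dog", "runs"]
suspicious_tags = ["DT", "JJ", "NN", "VBZ"]
source_tags = ["DT", "JJ", "NN", "VBZ"]
toy_vectors = {"quick": [0.9, 0.1], "fast": [0.85, 0.2]}

structural_overlap = jaccard(pos_ngrams(suspicious_tags), pos_ngrams(source_tags))
semantic_lcs = soft_lcs(suspicious_words, source_words, toy_vectors)
print(structural_overlap, semantic_lcs)
```

In this toy case the two sentences share the same POS tag trigrams even though one word was replaced, which is why tag n-grams can flag paraphrased passages; the embedding-aware LCS then treats "quick" and "fast" as a match, whereas exact-match LCS would not.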
Keywords: Plagiarism detection, part-of-speech (POS) tagging, N-grams, Semantic similarity, Word embedding
Review history: Received 30 September 2020, Revised 18 January 2022, Accepted 11 February 2022, Available online 17 February 2022, Version of Record 23 February 2022.
Official paper URL: https://doi.org/10.1016/j.eswa.2022.116677