Quality flaw prediction in Spanish Wikipedia: A case of study with verifiability flaws

Authors:

Highlights:

• The first quality prediction study for Spanish articles in Wikipedia is presented.

• Two of the most frequent verifiability flaws were studied: Refimprove and Unreference.

• Two features were proposed to model the Refimprove flaw.

• 21% of the content marked with the Unreference flaw in fact suffers from the Refimprove flaw.

• Under-bagged decision trees with sum and majority voting rules, biased-SVM and centroid-based SVM achieved F1 scores around 94%.

Keywords: Information quality, Quality flaw prediction, Semi-supervised learning, Supervised learning, Wikipedia

Article history: Received 29 December 2017, Revised 6 August 2018, Accepted 6 August 2018, Available online 22 August 2018, Version of Record 22 August 2018.

Official paper URL: https://doi.org/10.1016/j.ipm.2018.08.003