Excavating the mother lode of human-generated text: A systematic review of research that uses the Wikipedia corpus

Authors:

Highlights:

• Wikipedia provides rich, natural semi-structured texts for information retrieval.

• It provides semantic information for keyword extraction from varied texts.

• It facilitates clustering, text classification and semantic relatedness analyses.

• It supplies a semantically structured knowledge base for studying ontologies.


Keywords: Information retrieval, Information extraction, Natural language processing, Ontologies, Wikipedia, Literature review

Article history: Received 30 November 2014, Revised 19 July 2016, Accepted 28 July 2016, Available online 27 October 2016, Version of Record 19 January 2017.

Official paper URL: https://doi.org/10.1016/j.ipm.2016.07.003