Transforming Wikipedia into a large scale multilingual concept network
Authors:
Abstract
A knowledge base for real-world language processing applications should consist of a large base of facts and reasoning mechanisms that combine them to induce novel and more complex information. This paper describes an approach to deriving such a large-scale and multilingual resource by exploiting several facets of the on-line encyclopedia Wikipedia. We show how we can build upon Wikipedia's existing network of categories and articles to automatically discover new relations and their instances. Working on top of this network allows for added information to influence the network and be propagated throughout it using inference mechanisms that connect different pieces of existing knowledge. We then exploit this gained information to discover new relations that refine some of those found in the previous step. The result is a network containing approximately 3.7 million concepts with lexicalizations in numerous languages and 49+ million relation instances. Intrinsic and extrinsic evaluations show that this is a high-quality resource and beneficial to various NLP tasks.
Keywords: Knowledge base, Multilinguality, Knowledge acquisition
Article history: Available online 25 June 2012.
Official paper URL: https://doi.org/10.1016/j.artint.2012.06.008