Graph integration of structured, semistructured and unstructured data for data journalism

Authors:

Highlights:

• We define novel integration graphs and we construct them from arbitrary datasets

• We build the graphs leveraging data integration, information extraction, and data management

• We propose a novel algorithm for finding matches across heterogeneous data sources

• We implement our approach on text, CSV, JSON, XML, RDF, PDF and relational datasets

• We evaluate our approach using a set of use cases with real journalistic datasets

Keywords: Data journalism, Heterogeneous data integration, Information extraction

Review history: Received 15 December 2020, Revised 21 April 2021, Accepted 13 May 2021, Available online 6 July 2021, Version of Record 23 November 2021.

Official paper URL: https://doi.org/10.1016/j.is.2021.101846