Balancing coverage and specificity for semantic labelling of subject columns

Authors:

Highlights:

Abstract

A large amount of data is published on the Web in tabular formats (e.g., spreadsheets). One of the main challenges for its effective (re)use is a general lack of semantics (e.g., column names are usually not standardized, and their meaning and content are not always clear). It is commonly understood that the reuse of tabular data can be improved by annotating it with the types used in knowledge graphs. In this paper, we present a novel approach to automatically type entity columns in tabular data with ontology classes. In contrast with existing state-of-the-art proposals, our approach does not require external linguistic resources, lookup services, model training, building a model of the knowledge graph beforehand, or having a human in the loop.

Keywords: Semantic labelling, Semantic annotation, Knowledge graph

Review history: Received 31 December 2020, Revised 1 November 2021, Accepted 2 November 2021, Available online 14 January 2022, Version of Record 29 January 2022.

Paper URL: https://doi.org/10.1016/j.knosys.2021.108092