IOTA: Interlinking of heterogeneous multilingual open fiscal DaTA

Authors:

Highlights:

• IOTA is designed as a scalable framework to interlink translated fiscal concepts.

• Nineteen similarity measures are evaluated within IOTA.

• Token Sort yields the highest F1 score but is not robust to threshold changes.

• TF-IDF has a good and robust F1 score, but it is computationally expensive.

• Results remain highly and positively correlated even when the translation pairs are changed.
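
To make the terms in the highlights concrete, here is a minimal sketch of a token-sort style similarity and of how an F1 score depends on the matching threshold. It is not the authors' implementation: the function names are hypothetical, Python's standard-library `difflib.SequenceMatcher` stands in for whatever edit-based ratio IOTA uses, and the scored pairs in the usage lines are made-up illustration values, not results from the paper.

```python
from difflib import SequenceMatcher


def token_sort_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] computed after lowercasing and sorting tokens.

    Assumption: SequenceMatcher.ratio() is used as a stand-in string ratio.
    """
    norm_a = " ".join(sorted(a.lower().split()))
    norm_b = " ".join(sorted(b.lower().split()))
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def f1_at_threshold(scored_pairs, threshold: float) -> float:
    """F1 when pairs scoring at or above `threshold` are predicted as links.

    scored_pairs: iterable of (similarity, is_true_match) tuples.
    """
    tp = fp = fn = 0
    for score, is_match in scored_pairs:
        predicted = score >= threshold
        if predicted and is_match:
            tp += 1
        elif predicted:
            fp += 1
        elif is_match:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Word order is ignored, so reordered labels compare as identical.
same = token_sort_similarity("public health spending", "spending public health")

# Hypothetical scored pairs: (similarity, whether the pair is a true link).
pairs = [(0.9, True), (0.8, False), (0.4, True), (0.2, False)]
f1_by_threshold = {t: f1_at_threshold(pairs, t) for t in (0.3, 0.5, 0.85)}
```

Sweeping the threshold as in the last line is one simple way to see whether a measure's F1 score is stable or sensitive to the cut-off, which is the sense of "robust" used in the highlights.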

Keywords: Data interlinking, Budget and spending data, String similarity measure, Open data, Translated string matching framework, Cluster computing

Review history: Received 26 August 2019, Revised 12 November 2019, Accepted 13 December 2019, Available online 18 December 2019, Version of Record 22 January 2020.

Official paper URL: https://doi.org/10.1016/j.eswa.2019.113135