IOTA: Interlinking of heterogeneous multilingual open fiscal DaTA
Authors:
Highlights:
• IOTA is designed as a scalable framework to interlink translated fiscal concepts.
• Nineteen similarity measures are evaluated within IOTA.
• Token Sort yields the highest F1 score, but it is not robust to threshold changes.
• TF-IDF has a good and robust F1 score, but it is computationally expensive.
• Results remain highly and positively correlated even when the translation pairs are changed.
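To make the threshold sensitivity mentioned in the highlights concrete, here is a minimal sketch of a token-sort style similarity with a match threshold. It is not the authors' implementation: the function names `token_sort_similarity` and `is_match`, the use of Python's standard `difflib.SequenceMatcher` as the underlying string comparison, and the default threshold of 0.8 are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def token_sort_similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1] that ignores word order.

    Hypothetical helper: both labels are lowercased, split on whitespace,
    and their tokens sorted before the character-level comparison.
    """
    norm_a = " ".join(sorted(a.lower().split()))
    norm_b = " ".join(sorted(b.lower().split()))
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def is_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Link two concept labels when their similarity reaches the threshold.

    The default of 0.8 is an illustrative assumption, not a value
    taken from the paper.
    """
    return token_sort_similarity(a, b) >= threshold
```

Because a pair is linked only when its score reaches the threshold, shifting the threshold changes which pairs count as matches, and with them the precision, recall, and F1 score. A measure whose scores cluster near the cut-off will therefore be more sensitive to that choice.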
Keywords: Data interlinking, Budget and spending data, String similarity measure, Open data, Translated string matching framework, Cluster computing
Review history: Received 26 August 2019, Revised 12 November 2019, Accepted 13 December 2019, Available online 18 December 2019, Version of Record 22 January 2020.
Official paper URL: https://doi.org/10.1016/j.eswa.2019.113135