A novel similarity measure for spatial entity resolution based on data granularity model: Managing inconsistencies in place descriptions
Authors: Mohammad Khodizadeh-Nahari, Nasser Ghadiri, Ahmad Baraani-Dastjerdi, Jörg-Rüdiger Sack
Abstract
Tremendous amounts of data are generated every day by different sources and stored in heterogeneous databases. Providing an integrated view by fusing these data is essential to enhance data utilization. Spatial data is an indispensable type of data, with diverse application domains, including GIS, e-commerce, military, and tourism. The concept of location forms a key part of user-generated data and raises serious challenges, including uncertainty. A particular location may have different names, and conversely, different locations may share the same name. Furthermore, the geographical coordinates of locations may not be expressed accurately in datasets. Other challenges exist that have received less attention: various data sources might describe locations at different levels of detail, which increases data inconsistency and decreases the quality of data fusion. This paper focuses on spatial data granulation to deal with this variety. If these differences are not taken into account, different descriptions of the same location may be interpreted differently and, in turn, not be fused. The contributions of this paper are: (a) a granular approach to measuring the similarity between two place descriptions that manages apparent differences, which improves the quality of the geocoding and data fusion phases; and (b) a novel data blocking method based on geographical features that decreases the number of pairwise comparisons. For evaluation, we developed a dataset from two real aviation accident datasets. The evaluation shows that the quality of entity recognition and data fusion improved when using the proposed data granulation technique.
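The abstract does not spell out the similarity measure or the blocking key, so the following is only a rough illustration of the general idea, not the authors' method. It assumes that a place description is a tuple of administrative names ordered from coarsest to finest (e.g. country, state, city), scores two descriptions only down to the coarser one's level of detail, and uses the coarsest level as a blocking key so that only records in the same block are compared. The names `granular_similarity`, `block_by_coarsest_level`, and `candidate_pairs` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical data model (assumption, not from the paper): a place description
# is a tuple of administrative names ordered from coarsest to finest granularity,
# e.g. ("usa", "texas", "houston").
Place = tuple[str, ...]


def normalize(place: Place) -> Place:
    """Lower-case and strip each level so trivial formatting differences do not count."""
    return tuple(level.strip().lower() for level in place)


def granular_similarity(a: Place, b: Place) -> float:
    """Compare two descriptions only down to the coarser one's granularity.

    Counts agreeing levels from the coarsest level until the first disagreement,
    divided by the depth of the less detailed description. A description that is
    merely less detailed (a prefix of the other) scores 1.0 instead of being
    penalized for the finer levels it lacks.
    """
    a, b = normalize(a), normalize(b)
    depth = min(len(a), len(b))
    if depth == 0:
        return 0.0
    matched = 0
    for x, y in zip(a, b):  # zip stops at the shorter description
        if x != y:
            break
        matched += 1
    return matched / depth


def block_by_coarsest_level(records: dict[str, Place]) -> dict[str, list[str]]:
    """Hypothetical blocking: group record ids by their coarsest level (e.g. country)."""
    blocks: dict[str, list[str]] = defaultdict(list)
    for rec_id, place in records.items():
        p = normalize(place)
        if p:
            blocks[p[0]].append(rec_id)
    return dict(blocks)


def candidate_pairs(records: dict[str, Place], threshold: float = 1.0):
    """Yield (id1, id2, score) for same-block pairs whose score is >= threshold."""
    for ids in block_by_coarsest_level(records).values():
        for i, j in combinations(sorted(ids), 2):
            score = granular_similarity(records[i], records[j])
            if score >= threshold:
                yield i, j, score


if __name__ == "__main__":
    records = {
        "r1": ("USA", "Texas", "Houston"),
        "r2": ("usa", "texas"),
        "r3": ("USA", "Texas", "Dallas"),
        "r4": ("Canada", "Ontario"),
    }
    print(list(candidate_pairs(records)))
```

With these four records, blocking leaves three comparisons inside the "usa" block instead of all six pairs, and `r2` matches both `r1` and `r3` because it is simply less detailed, while `r1` and `r3` disagree at the city level (score 2/3). Note that such a prefix-based score is not transitive, which is one reason a real fusion step would need more than this sketch.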
Keywords: Data fusion, Data integration, Data granulation, Entity recognition, Spatial data
Paper URL: https://doi.org/10.1007/s10489-020-01959-y