Querying on large and complex databases by content: Challenges on variety and veracity regarding real applications

Abstract

The amount and variety of digital data currently being generated, stored and analyzed, including images, videos, and time series, have brought challenges to data administrators, analysts and developers, who struggle to meet the expectations of both data owners and end users. Most applications demand searching complex data with queries that analyze different aspects of the data, and need the answers in a timely manner. Content-based similarity retrieval techniques are well suited to handling large databases, because they enable queries and analyses over features automatically extracted from the data, without user intervention. In this paper, we review and discuss the challenges posed to the database and related communities in providing techniques and tools that can meet the variety and veracity characteristics of big and complex data, while also considering semantic preservation and completeness of the data. Examples and results obtained over two decades of experience with real applications are presented and discussed.

Keywords: Similarity search, Content-based image retrieval, Feature extraction methods, Bags-of-visual-words, Missing data, Big-data characteristics, Variety, Veracity

Review history: Received 19 March 2018, Revised 16 November 2018, Accepted 26 March 2019, Available online 8 April 2019, Version of Record 30 July 2019.

Official paper URL: https://doi.org/10.1016/j.is.2019.03.012