Finding top-\(k\, r\)-cliques for keyword search from graphs in polynomial delay

作者:Mehdi Kargar, Aijun An

摘要

Keyword search over structured data offers an alternative method to explore and query databases for users that are not familiar with the structure of the data and/or a query language. Structured data are usually modeled as graphs. In this context, an answer is a substructure of the graph that contains all or some of the query keywords. In most of the previous works, a minimal tree that covers all the query keywords are found as the answer. Some recent works offer to find subgraphs rather than minimal trees and show that subgraphs might be more informative for the users. However, current methods suffer from the following problems. Although some of the content nodes (i.e., nodes that contain input keywords) are close to each other in an answer, others might be far from each other. While searching for the best answer, current methods explore the whole graph rather than only the content nodes. This might increase the run time and leads to poor performance. To address these problems, we propose to find top-\(k\, r\)-cliques as the answers to the graph keyword search problem. An \(r\)-clique is a set of content nodes that cover all the input keywords, and the distance between each pair of nodes is less than or equal to \(r\). We propose a new weight function that is the sum of distances between each pair of content nodes. We prove that minimizing the new weight function is NP-hard and propose an approximation algorithm that produces \(r\)-cliques with 2-approximation ratio in polynomial delay. We further improve the run time of the approximation algorithm with the cost of increasing the approximation ratio. Extensive performance studies using three large real datasets confirm the efficiency and accuracy of finding \(r\)-cliques in graphs.

论文关键词:Keyword search, Graph data, Polynomial delay, Approximation algorithm

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10115-014-0736-0