Formal concept analysis approach for data extraction from a limited deep web database

Authors: Zhuo Zhang, Juan Du, Liming Wang

Abstract

Few studies have addressed the problem of extracting data from a limited deep web database. We apply formal concept analysis to this problem and propose a novel algorithm called EdaliwdbFCA. Before a query Y is sent, the algorithm analyzes the local formal context K_L, which consists of the most recently extracted data, and predicts the size of the query result from the cardinality of the extent X of the formal concept (X, Y) derived from K_L. It can therefore be determined in advance whether Y is a query. Candidate query concepts are generated dynamically from the lower cover of the current concept (X, Y), so the method avoids building concrete concept lattices during extraction. In addition, two pruning rules are adopted to reduce redundant queries. Experiments were performed on controlled data sets and on real applications. The results confirm that the theory underlying the algorithm is correct and that the algorithm can be applied effectively in the real world.
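
The size prediction described in the abstract rests on the two standard derivation operators of formal concept analysis. The following is a minimal Python sketch of those operators over a small local formal context, not the EdaliwdbFCA algorithm itself. The names `extent`, `intent`, `predicted_result_size`, and `local_context`, as well as the sample records, are hypothetical and chosen only for illustration. The abstract does not state the rule that decides whether Y is a query, so the sketch stops at computing the predicted count.

```python
from typing import Dict, FrozenSet, Set

# A formal context as a mapping: object id -> set of attributes it has.
# (Hypothetical representation; the paper may use a different one.)
Context = Dict[str, FrozenSet[str]]


def extent(context: Context, attributes: Set[str]) -> Set[str]:
    """Return Y': all objects that have every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def intent(context: Context, objects: Set[str]) -> Set[str]:
    """Return X': all attributes shared by every object in `objects`."""
    if not objects:
        # By convention, the empty object set shares every attribute.
        all_attrs: Set[str] = set()
        for attrs in context.values():
            all_attrs |= attrs
        return all_attrs
    shared = None
    for obj in objects:
        attrs = set(context[obj])
        shared = attrs if shared is None else shared & attrs
    return shared


def predicted_result_size(context: Context, query: Set[str]) -> int:
    """Predict a query's result size as the cardinality of its extent
    in the local context (the data extracted so far)."""
    return len(extent(context, query))


# Illustrative local context: three already-extracted records.
local_context: Context = {
    "r1": frozenset({"author:Smith", "year:2010"}),
    "r2": frozenset({"author:Smith", "year:2011"}),
    "r3": frozenset({"author:Jones", "year:2010"}),
}

query = {"author:Smith"}
matching = extent(local_context, query)          # objects matching the query
closed_query = intent(local_context, matching)   # closure of the query
size = predicted_result_size(local_context, query)
```

Here `(matching, closed_query)` forms a formal concept of the local context, and `size` is the cardinality of its extent. Because the context holds only the data extracted so far, the count is a prediction rather than the true size of the result in the remote database.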

Keywords: Algorithms, Formal concept analysis, Lower cover, Data extraction, Limited web database

Paper URL: https://doi.org/10.1007/s10844-013-0242-y