Join optimization for inverted index technique on relational database management systems

Authors:

Highlights:

Abstract

In relational database management systems (RDBMSs), efficient join methods for text retrieval using an inverted index have been developed and implemented. However, the existing approach to intersecting inverted-index posting lists increases keyword search time on large texts because it performs unnecessary comparisons. A relation-based search produces its results by intersecting posting lists. To reduce query search time, this study proposes a multi-way skip-merge join algorithm. The proposed algorithm improves execution speed by exploiting the sorted order of the inverted-index posting lists to minimize unnecessary comparison operations during posting list intersection. The skip-merge join method, which uses an aggregate function to minimize unnecessary comparisons, is integrated with a multi-way join that replaces the existing two-way join method. The combined skip-merge and multi-way join algorithm performs well as the number of search keywords and the number of documents increase. The performance improvement in keyword search is verified by implementing the multi-way skip-merge join algorithm in PostgreSQL, an RDBMS.
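
The general idea of intersecting several sorted posting lists in one pass while skipping entries that cannot match can be illustrated with a minimal Python sketch. This is not the authors' PostgreSQL implementation: the function name `multiway_skip_merge`, the use of `max` as the aggregate that picks the skip target, and the use of binary search to perform the skip are all assumptions made for illustration.

```python
from bisect import bisect_left


def multiway_skip_merge(posting_lists):
    """Intersect sorted, duplicate-free lists of document IDs in one pass.

    Illustrative sketch only (hypothetical name and design): the largest
    current head is used as the skip target, which is an assumption about
    how an aggregate could drive the skipping.
    """
    if not posting_lists:
        return []
    cursors = [0] * len(posting_lists)
    result = []
    while True:
        # Once any list is exhausted, no further common document can exist.
        if any(c >= len(p) for c, p in zip(cursors, posting_lists)):
            return result
        heads = [p[c] for c, p in zip(cursors, posting_lists)]
        target = max(heads)  # aggregate over the current heads
        if min(heads) == target:
            # Every list points at the same document ID: emit it and advance.
            result.append(target)
            cursors = [c + 1 for c in cursors]
        else:
            # Skip each cursor to its first entry >= target instead of
            # comparing the smaller entries one at a time.
            cursors = [
                bisect_left(p, target, c)
                for c, p in zip(cursors, posting_lists)
            ]


# Usage: three keywords, each with a sorted posting list of document IDs.
common_docs = multiway_skip_merge([
    [1, 3, 5, 7, 9],
    [3, 4, 7, 10],
    [2, 3, 7, 8, 9],
])
```

Because all lists advance together toward a shared target, no intermediate two-way result has to be materialized, and the amount of skipped work grows with the gaps between matching document IDs.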

Keywords: Inverted index, Join processing, Relational database, Multi-way join

Review history: Received 16 March 2020, Revised 23 February 2022, Accepted 19 March 2022, Available online 23 March 2022, Version of Record 24 March 2022.

Paper URL: https://doi.org/10.1016/j.eswa.2022.116956