Extensible Parallel Query Processing for Exploratory Geoscientific Data Mining

Authors: Eddie C. Shek, Richard R. Muntz, Edmond Mesrobian

Abstract

Exploratory data mining and analysis requires a computing environment that provides facilities for the user-friendly expression and rapid execution of “scientific queries.” In this paper, we address research issues in the parallelization of scientific queries containing complex user-defined operations. In a parallel query execution environment, parallelizing a query execution plan involves determining how the input data streams to evaluators implementing logical operations can be divided so that clones of the same evaluator can process them in parallel. We introduce the concept of a “relevance window,” which characterizes the data lineage and data partitioning opportunities available for a user-defined evaluator. In addition, we develop a query parallelization framework by extending relational parallel query optimization algorithms so that the parallelization characteristics of user-defined evaluators guide the process of query parallelization in an extensible query processing environment. We demonstrate the utility of our system through experiments that mine cyclonic activity, blocking events, and upward wave-energy propagation features from several observational and model simulation datasets.
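
The abstract does not define the relevance window formally, so the following is only an illustrative sketch under a stated assumption: that a relevance window can be read as the span of input elements an evaluator needs in order to produce one output element. Under that reading, an input stream can be split among clones of the evaluator as long as each partition is padded with the neighbouring inputs inside the window. All names here (`moving_average`, `partition_with_window`, `run_clone`, `parallel_moving_average`) are hypothetical and are not taken from the paper's system.

```python
from concurrent.futures import ThreadPoolExecutor


def moving_average(values, radius):
    """Hypothetical user-defined evaluator: centered mean, truncated at the edges."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - radius)
        hi = min(len(values), i + radius + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def partition_with_window(n, num_clones, radius):
    """Split range(n) into contiguous chunks, each padded by `radius` inputs.

    Returns tuples (start, stop, lo, hi): the clone owns outputs [start, stop)
    and must read inputs [lo, hi) to compute them (the assumed window).
    """
    size = max(1, -(-n // num_clones))  # ceiling division, at least 1
    parts = []
    for start in range(0, n, size):
        stop = min(n, start + size)
        parts.append((start, stop, max(0, start - radius), min(n, stop + radius)))
    return parts


def run_clone(values, part, radius):
    """Run one clone on its padded slice and keep only the outputs it owns."""
    start, stop, lo, hi = part
    local = moving_average(values[lo:hi], radius)
    return local[start - lo:stop - lo]


def parallel_moving_average(values, num_clones, radius):
    """Evaluate the same operation with several clones and concatenate results."""
    parts = partition_with_window(len(values), num_clones, radius)
    with ThreadPoolExecutor(max_workers=num_clones) as pool:
        chunks = pool.map(lambda p: run_clone(values, p, radius), parts)
    return [x for chunk in chunks for x in chunk]
```

In this sketch the padded partitions let each clone compute exactly the outputs a single serial evaluator would, which is the property a partitioning scheme has to preserve. A thread pool is used only to keep the example short; it says nothing about how the paper's execution environment schedules clones.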

Keywords: parallel query processing, extensible user-defined operations, geoscientific data mining, cyclone, blocking events, upward wave-energy propagation

Paper URL: https://doi.org/10.1023/A:1011401111535